# Bonds

A dataset for illustrating the available visualizations needs a certain richness while staying a manageable size. The Bonds dataset contains an identifier (Fund.Number), three categorical variables (Type, Fees, Risk) and five quantitative indicators, enough to show what we might wish.

```r
library(tidyverse)  # ggplot2 and the %>% pipe used throughout
Bonds <- read.csv(url("https://raw.githubusercontent.com/robertwwalker/DADMStuff/master/BondFunds.csv"))
```

## A Summary

```r
library(skimr)
Bonds %>% skim()
```
| Name | Piped data |
|:--|:--|
| Number of rows | 184 |
| Number of columns | 9 |
| Column type frequency: | |
| character | 4 |
| numeric | 5 |
| Group variables | None |

Variable type: character

| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|:--|--:|--:|--:|--:|--:|--:|--:|
| Fund.Number | 0 | 1 | 4 | 6 | 0 | 184 | 0 |
| Type | 0 | 1 | 20 | 23 | 0 | 2 | 0 |
| Fees | 0 | 1 | 2 | 3 | 0 | 2 | 0 |
| Risk | 0 | 1 | 7 | 13 | 0 | 3 | 0 |

Variable type: numeric

| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|:--|--:|--:|--:|--:|--:|--:|--:|--:|--:|:--|
| Assets | 0 | 1 | 910.65 | 2253.27 | 12.40 | 113.72 | 268.4 | 621.95 | 18603.50 | ▇▁▁▁▁ |
| Expense.Ratio | 0 | 1 | 0.71 | 0.26 | 0.12 | 0.53 | 0.7 | 0.90 | 1.94 | ▂▇▅▁▁ |
| Return.2009 | 0 | 1 | 7.16 | 6.09 | -8.80 | 3.48 | 6.4 | 10.72 | 32.00 | ▁▇▅▁▁ |
| X3.Year.Return | 0 | 1 | 4.66 | 2.52 | -13.80 | 4.05 | 5.1 | 6.10 | 9.40 | ▁▁▁▅▇ |
| X5.Year.Return | 0 | 1 | 3.99 | 1.49 | -7.30 | 3.60 | 4.3 | 4.90 | 6.80 | ▁▁▁▅▇ |
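The numeric part of that summary consists of ordinary descriptive statistics. A minimal base-R sketch of the same quantities, assuming skimr's p0 to p100 are `quantile()` values with R's default type 7; the data frame `toy` is hypothetical, not the Bonds data:

```r
# Hypothetical toy data frame (not the Bonds data): one character and one numeric column
toy <- data.frame(
  Risk   = c("Low", "Average", "High", "Low", "Average"),
  Assets = c(10, 20, 30, 40, 100),
  stringsAsFactors = FALSE
)

# Column type frequency, as in the header table of skim()
type_freq <- table(vapply(toy, class, character(1)))

# Mean, sd and p0/p25/p50/p75/p100 (assumed to be type-7 quantiles)
numeric_summary <- function(x) {
  c(mean = mean(x), sd = sd(x),
    setNames(quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1), names = FALSE),
             c("p0", "p25", "p50", "p75", "p100")))
}

assets_summary <- numeric_summary(toy$Assets)
print(type_freq)
print(assets_summary)
```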

Both character (categorical) and numeric variables are represented. There is no date or time variable, so dates and the visualizations that go with time series are omitted.

# Data Visualization

First, let us look at visualizations for one quantitative variable. Let me focus on Assets.

## geom_histogram()

A histogram divides the range of the data into bins [intervals on x] and counts the observations in each bin. The width of the bins is set directly with binwidth=, or calculated from the range and the number of bins with bins=. I will store the base plot as Gen.Hist.
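The binning arithmetic can be sketched in base R. This is a minimal illustration of equal-width binning, not ggplot2's exact algorithm (stat_bin() also decides where the bin edges fall); the vector `assets` is a hypothetical toy example, not the Bonds data:

```r
# Hypothetical toy asset values (not the Bonds data)
assets <- c(12.4, 50, 113.7, 268.4, 300, 622, 900, 1500, 4000, 18603.5)

# Fixing the number of bins determines their width from the range of x
n_bins <- 10
bin_width <- diff(range(assets)) / n_bins

# Equal-width bin edges spanning the data, then a count per bin
breaks <- seq(min(assets), max(assets), length.out = n_bins + 1)
counts <- table(cut(assets, breaks = breaks, include.lowest = TRUE))
print(counts)
```

With a right-skewed variable like this, most observations land in the first bin, which is exactly the shape the Assets histogram shows.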

### A Base Histogram

```r
Gen.Hist <- Bonds %>% ggplot() + aes(x = Assets) + geom_histogram()
Gen.Hist
## stat_bin() using bins = 30. Pick better value with binwidth.
```

### Histograms [bins]

We can choose more bins. 50? That is far more than the default of 30.

```r
Bin50.Hist <- Bonds %>% ggplot() + aes(x = Assets) + geom_histogram(bins = 50)
Bin50.Hist
```

We can also choose fewer bins. I will choose 10.

```r
Bin10.Hist <- Bonds %>% ggplot() + aes(x = Assets) + geom_histogram(bins = 10)
Bin10.Hist
```

### Histograms [binwidth]

We can also set the width of the bins in the units of x. I will choose 500 [wider bins, so fewer of them].

```r
BinW500.Hist <- Bonds %>% ggplot() + aes(x = Assets) + geom_histogram(binwidth = 500)
BinW500.Hist
```

Alternatively, I will choose a width of 50; a smaller width makes more bins.

```r
BinW50.Hist <- Bonds %>% ggplot() + aes(x = Assets) + geom_histogram(binwidth = 50)
BinW50.Hist
```
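With binwidth= the number of bins follows from the range instead. Using the minimum and maximum of Assets from the skim() output (12.40 and 18603.50), this rough calculation shows how many bins each width needs; ggplot2 may add one more depending on where it places the edges, so treat the results as approximate:

```r
# Minimum and maximum of Assets, taken from the skim() summary
assets_range <- c(12.40, 18603.50)

# Approximate number of equal-width bins needed to cover the range
bins_needed <- function(range, binwidth) ceiling(diff(range) / binwidth)

bins_needed(assets_range, 500)  # wide bins, relatively few of them
bins_needed(assets_range, 50)   # narrow bins, many more of them
```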

## geom_dotplot()

geom_dotplot() places a dot for every observation in the relevant bin. We can control the size of the bins [in the original metric] with binwidth=.
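A rough sketch of the stacking idea with fixed-width bins. This is closer to geom_dotplot(method = "histodot") than to the default "dotdensity" method, which places its bins adaptively; the vector `x` is hypothetical:

```r
# Hypothetical toy values (not the Bonds data)
x <- c(5, 12, 14, 18, 31, 33, 35, 38, 70)
binwidth <- 10

# Fixed-width bin index for each observation, counted from the minimum
bin <- floor((x - min(x)) / binwidth)

# Height of each dot: its position within the stack of its bin (1, 2, 3, ...)
stack_pos <- ave(seq_along(x), bin, FUN = seq_along)

# Bin centers on the x axis, stacked positions on the y axis
dots <- data.frame(x = x, center = min(x) + (bin + 0.5) * binwidth, height = stack_pos)
print(dots)
```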

### Small binwidth

```r
Bonds %>% ggplot() + aes(x = Assets) + geom_dotplot(binwidth = 10)
```

### Large binwidth

```r
Bonds %>% ggplot() + aes(x = Assets) + geom_dotplot(binwidth = 1000)
```

### An "optimal" binwidth

Each dot represents one fund, and the bins are 100 units of Assets wide. I drop the y-axis label because the y scale of a dot plot is not meaningful.

```r
Bonds %>% ggplot() + aes(x = Assets) + geom_dotplot(binwidth = 100) + labs(y = "")
```

## geom_freqpoly()

geom_freqpoly() is the line equivalent of a histogram. The arguments are similar, but instead of drawing bars it connects the bin counts with lines.

```r
Bonds %>% ggplot(., aes(x = Assets)) + geom_freqpoly()
## stat_bin() using bins = 30. Pick better value with binwidth.
```
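A frequency polygon plots the same counts as a histogram, placed at the bin midpoints and joined by lines. A base-R sketch using hist(plot = FALSE) to do the binning, on hypothetical toy values:

```r
# Hypothetical toy values (not the Bonds data)
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 9)

# Bin the data as a histogram would, without drawing it
h <- hist(x, breaks = seq(0, 10, by = 2), plot = FALSE)

# Vertices of the polygon: bin midpoints on x, counts on y
freq_poly <- data.frame(mid = h$mids, count = h$counts)
print(freq_poly)
# graphics::lines(freq_poly$mid, freq_poly$count) would draw the polygon
```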

### More bins

```r
Bonds %>% ggplot(., aes(x = Assets)) + geom_freqpoly(bins = 50)
```

### Fewer bins

```r
Bonds %>% ggplot(., aes(x = Assets)) + geom_freqpoly(bins = 10)
```

## geom_area()

geom_area(stat = "bin") is a relative of the histogram: lines connect the midpoints of the bins, and the region between those lines and zero is filled.
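One way to see the connection to the histogram: if the outline is closed with an empty bin on each side, the trapezoids under it add up to n × binwidth, which is exactly the total area of the histogram bars. A sketch of that idealized case with hypothetical counts (geom_area(stat = "bin") need not pad its ends this way):

```r
# Hypothetical bin counts and a common bin width
counts <- c(3, 5, 1, 0, 1)
binwidth <- 2
mids <- seq(1, by = binwidth, length.out = length(counts))

# Close the outline with a zero-count bin on each side
x <- c(min(mids) - binwidth, mids, max(mids) + binwidth)
y <- c(0, counts, 0)

# Area under the piecewise-linear outline, summed trapezoid by trapezoid
area <- sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)

# Total area of the histogram bars for the same counts
bar_area <- sum(counts) * binwidth
all.equal(area, bar_area)
```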

### Defaults to 30 bins

```r
Bonds %>% ggplot(., aes(x = Assets)) + geom_area(stat = "bin")
## stat_bin() using bins = 30. Pick better value with binwidth.
```

### Small binwidth with a large number of bins

I will color in the area with magenta and clean up the theme.

```r
Bonds %>% ggplot(., aes(x = Assets)) + geom_area(stat = "bin", bins = 100, fill = "magenta") +
  theme_minimal()
```

## geom_density()

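A relative of the histogram and the area plots above, the density plot replaces the blocks of a histogram with a smooth kernel density estimate: each observation contributes a small bump, and the width of those bumps [known as the bandwidth] controls how smooth the curve is.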
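A minimal sketch of what the density curve computes, assuming a Gaussian kernel and the `bw.nrd0()` rule of thumb (to my understanding the defaults of ggplot2's stat_density(), `kernel = "gaussian"` and `bw = "nrd0"`): at each grid point, average a normal curve centred on every observation. The data are hypothetical:

```r
# Hypothetical toy values (not the Bonds data)
x <- c(2.1, 3.4, 3.9, 4.4, 5.0, 6.8, 9.5)

# Bandwidth from Silverman's rule of thumb
h <- bw.nrd0(x)

# Gaussian kernel density estimate evaluated at each point of `grid`
kde <- function(grid, x, h) {
  vapply(grid, function(g) mean(dnorm((g - x) / h)) / h, numeric(1))
}

grid <- seq(min(x) - 4 * h, max(x) + 4 * h, length.out = 2001)
dens <- kde(grid, x, h)

# A density should integrate to approximately 1 (Riemann sum over the grid)
dx <- diff(grid)[1]
sum(dens) * dx
```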

### geom_density() outlines

```r
Bonds %>% ggplot(., aes(x = Assets)) + geom_density(outline.type = "upper")
```

```r
Bonds %>% ggplot(., aes(x = Assets)) + geom_density(outline.type = "lower")
```

```r
Bonds %>% ggplot(., aes(x = Assets)) + geom_density(outline.type = "full")
```

### geom_density() adjust

adjust= multiplies the default bandwidth by a constant. Values greater than 1 make the bandwidth bigger [and the curve smoother]; values between 0 and 1 make the bandwidth smaller and the curve more jagged.

```r
Bonds %>% ggplot(., aes(x = Assets)) + geom_density(adjust = 2)
```

```r
Bonds %>% ggplot(., aes(x = Assets)) + geom_density(adjust = 1/2)
```
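The multiplication is easy to check with `stats::density()`, which takes the same adjust= argument (I understand stat_density() calls it under the hood) and reports the bandwidth it actually used. The data are hypothetical:

```r
# Hypothetical toy values (not the Bonds data)
x <- c(2.1, 3.4, 3.9, 4.4, 5.0, 6.8, 9.5)

base_bw <- bw.nrd0(x)  # default bandwidth (Silverman's rule of thumb)

# density() multiplies the selected bandwidth by `adjust`
smoother <- density(x, adjust = 2)$bw
rougher  <- density(x, adjust = 1/2)$bw

c(base = base_bw, adjust_2 = smoother, adjust_half = rougher)
```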

## geom_boxplot()

A boxplot draws a box from the first to the third quartile (the hinges) with a line at the median; geom_boxplot(notch = TRUE) adds a notch around the median. The whiskers extend from the hinges to the most extreme observations within 1.5 × IQR of the box, and points beyond the whiskers are drawn individually as possible outliers. To change the whisker length, adjust the multiplier with coef= (the default is coef = 1.5).

```r
Bonds %>% ggplot(., aes(x = Assets)) + geom_boxplot()
```
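The box and whisker limits can be computed by hand. A sketch assuming `quantile()`'s default type 7, which I believe stat_boxplot() uses (base R's boxplot() uses Tukey's `fivenum()` hinges instead, which can differ slightly); the data are hypothetical:

```r
# Hypothetical toy values (not the Bonds data) with one large value
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 50)
coef <- 1.5  # same role as geom_boxplot(coef = 1.5)

# Hinges (box edges) are the first and third quartiles; the middle value is the median
q <- quantile(x, c(0.25, 0.5, 0.75), names = FALSE)
iqr <- q[3] - q[1]

# Fences sit coef * IQR beyond each hinge
lower_fence <- q[1] - coef * iqr
upper_fence <- q[3] + coef * iqr

# Whiskers reach the most extreme observations inside the fences
lower_whisker <- min(x[x >= lower_fence])
upper_whisker <- max(x[x <= upper_fence])

# Anything beyond the fences is drawn as an individual point
outliers <- x[x < lower_fence | x > upper_fence]

list(hinges = q[c(1, 3)], median = q[2],
     whiskers = c(lower_whisker, upper_whisker), outliers = outliers)
```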

## geom_qq()

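geom_qq() compares empirical quantiles with theoretical quantiles. Comparing a distribution to the normal or to another reference distribution is common, and this is the tool for doing so. The default theoretical distribution is the standard normal (distribution = stats::qnorm).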
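It helps to see the two sets of numbers a normal Q-Q plot pairs up. A base-R sketch, assuming `ppoints()` plotting positions for the theoretical probabilities; the data are hypothetical:

```r
# Hypothetical toy values (not the Bonds data)
x <- c(4.2, 1.1, 3.3, 8.7, 2.5, 5.0, 2.9)

# Empirical quantiles: the sorted data
sample_q <- sort(x)

# Theoretical quantiles: standard normal quantiles at evenly spaced probabilities
# (assumes ppoints() plotting positions)
theoretical_q <- qnorm(ppoints(length(x)))

qq <- data.frame(theoretical = theoretical_q, sample = sample_q)
print(qq)
# Points close to a straight line suggest the data are roughly normal
```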

```r
Bonds %>% ggplot(aes(sample = Assets)) + geom_qq()
```

## stat_ecdf(geom = )

The empirical cumulative distribution function (ECDF) arises when we sort a quantitative variable and plot, for each value, the proportion of observations at or below it.
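Base R's `ecdf()` computes the same step function that stat_ecdf() draws. A small sketch on hypothetical values:

```r
# Hypothetical toy values (not the Bonds data)
x <- c(3, 1, 4, 1, 5)

# ecdf() returns a step function: Fn(v) = proportion of observations <= v
Fn <- ecdf(x)
Fn(c(0, 1, 4, 5))

# The same proportion computed directly for one value
mean(x <= 4)
```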

stat_ecdf() can draw the ECDF with most geometries through geom=. I will show a few.

### stat_ecdf(geom = "step")

```r
Bonds %>% ggplot(aes(x = Assets)) + stat_ecdf(geom = "step") +
  labs(y = "ECDF: Proportion at or below Assets") + theme_minimal()
```

### stat_ecdf(geom = "point")

```r
Bonds %>% ggplot(aes(x = Assets)) + stat_ecdf(geom = "point") +
  labs(y = "ECDF: Proportion at or below Assets") + theme_minimal()
```

### Combining two

```r
Bonds %>% ggplot(aes(x = Assets)) + stat_ecdf(geom = "point") + stat_ecdf(geom = "step",
  alpha = 0.1) + labs(y = "ECDF: Proportion at or below Assets") + theme_minimal()
```

### stat_ecdf(geom = "line")

```r
Bonds %>% ggplot(aes(x = Assets)) + stat_ecdf(geom = "line") + labs(y = "ECDF: Proportion at or below Assets") +
  theme_minimal()
```

### stat_ecdf(geom = "area")

```r
Bonds %>% ggplot(aes(x = Assets)) + stat_ecdf(geom = "area", alpha = 0.2) + labs(y = "ECDF: Proportion at or below Assets") +
  theme_minimal()
```

##### Robert W. Walker
###### Associate Professor of Quantitative Methods

My research interests include causal inference, statistical computation and data visualization.