Visualizing One Qualitative Variable

Bonds

A dataset for illustrating the available visualizations needs a certain richness at a manageable size. The Bonds dataset has 184 funds with three categorical variables (Type, Fees, and Risk) and five quantitative variables, which is enough to show what we need.

Loading the Data

Bonds <- read.csv(url("https://raw.githubusercontent.com/robertwwalker/DADMStuff/master/BondFunds.csv"))

A Summary

library(tidyverse)  # provides %>% and ggplot2, both used below
library(skimr)
Bonds %>% skim()
Table 1: Data summary
Name Piped data
Number of rows 184
Number of columns 9
_______________________
Column type frequency:
character 4
numeric 5
________________________
Group variables None

Variable type: character

skim_variable   n_missing   complete_rate   min   max   empty   n_unique   whitespace
Fund.Number             0               1     4     6       0        184            0
Type                    0               1    20    23       0          2            0
Fees                    0               1     2     3       0          2            0
Risk                    0               1     7    13       0          3            0

Variable type: numeric

skim_variable   n_missing  complete_rate    mean       sd      p0     p25    p50     p75      p100  hist
Assets                  0              1  910.65  2253.27   12.40  113.72  268.4  621.95  18603.50  ▇▁▁▁▁
Expense.Ratio           0              1    0.71     0.26    0.12    0.53    0.7    0.90      1.94  ▂▇▅▁▁
Return.2009             0              1    7.16     6.09   -8.80    3.48    6.4   10.72     32.00  ▁▇▅▁▁
X3.Year.Return          0              1    4.66     2.52  -13.80    4.05    5.1    6.10      9.40  ▁▁▁▅▇
X5.Year.Return          0              1    3.99     1.49   -7.30    3.60    4.3    4.90      6.80  ▁▁▁▅▇

Both character and numeric columns are represented. There is no time variable, so dates and the visualizations that go with time series are omitted.

Data Visualization

First, let us look at visualizations for one variable.

Bar plots and column plots

There are two ways to construct a barplot; we can let ggplot handle it on the raw data or calculate it ourselves. Let me focus on Risk.

geom_bar()

Bonds %>% ggplot() + aes(x = Risk) + geom_bar()
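To make the "let ggplot handle it" route concrete, here is a minimal sketch that checks the bar heights geom_bar() computes against a manual tally. The toy data frame is a made-up stand-in for Bonds (its level names and counts are assumptions, not the real data), so the check runs without downloading anything.

```r
library(ggplot2)

# Toy stand-in for Bonds$Risk; level names and counts are made up.
toy <- data.frame(Risk = rep(c("High", "Low", "Medium"), times = c(5, 3, 2)))

p <- ggplot(toy, aes(x = Risk)) + geom_bar()

# Heights computed by geom_bar() (through stat_count) versus a manual tally.
bar_heights <- layer_data(p)$count
manual_counts <- as.vector(table(toy$Risk))

all(bar_heights == manual_counts)
```

The same comparison should hold with Bonds in place of toy.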

Raw Data Bar Plot [color]

Bonds %>% ggplot() + aes(x = Risk, color = Risk) + geom_bar()

Raw Data Bar Plot [color and fill]

We can color the outlines and keep a white fill.

Bonds %>% ggplot() + aes(x = Risk, color = Risk) + geom_bar(fill = "white") + guides(color = "none")

Raw Data Bar Plot [Fill]

We can fill the shapes.

# guides(fill = "none") removes the legend; FALSE is deprecated in current ggplot2
Bonds %>% ggplot(., aes(x = Risk, fill = Risk)) + geom_bar() + guides(fill = "none")

geom_bar() meets fill

We can also map Risk to fill alone. Then x is no longer the category axis; x is set to a constant and the single bar is divided by fill according to the frequencies. This will require some prettying.

A Cumulative Bar Plot

Basic.Bar <- Bonds %>% ggplot(., aes(x = "", fill = Risk)) + geom_bar()
Basic.Bar

The prettying will require that I eliminate the x axis [set it to empty], include a theme, and give it proper labels.

Enhanced Cumulative Bar Plot

Bonds %>% ggplot(., aes(x = "", fill = Risk)) + geom_bar() + labs(x = "", y = "Number of Funds") + 
    theme_minimal() + theme(axis.text.x = element_blank())

Proportion Bar Plot

Bonds %>% ggplot(., aes(x = "", fill = Risk)) + geom_bar(position = "fill") + labs(x = "", 
    y = "Proportion of Funds")

The prettying here only requires adding a theme, since the labels are already in place.

Enhanced Proportion Bar Plot

Bonds %>% ggplot(., aes(x = "", fill = Risk)) + geom_bar(position = "fill") + labs(x = "", 
    y = "Proportion of Funds") + theme_minimal()
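The proportions that position = "fill" draws are just the counts divided by their total. A minimal sketch of that arithmetic in base R, on a made-up stand-in for the Risk column (the level names and counts are assumptions):

```r
# Toy stand-in for Bonds$Risk; level names and counts are made up.
toy <- data.frame(Risk = rep(c("High", "Low", "Medium"), times = c(5, 3, 2)))

# Counts per category, then each count divided by the total.
risk_counts <- table(toy$Risk)
risk_props <- prop.table(risk_counts)

round(risk_props, 2)
```

Running prop.table(table(Bonds$Risk)) should give the segment heights of the proportion bar plot.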

geom_col()

Risk.Table <- table(Bonds$Risk) %>% data.frame()
Risk.Table %>% ggplot(., aes(x = Var1, y = Freq)) + geom_col()
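table() leaves us with the generic names Var1 and Freq. As an alternative sketch, dplyr's count() keeps the original column name and lets us name the count; the toy data and the name Funds are my own choices for illustration, not part of the original analysis.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for Bonds; level names and counts are made up.
toy <- data.frame(Risk = rep(c("High", "Low", "Medium"), times = c(5, 3, 2)))

# One row per category with a named count column.
risk_counts <- toy %>% count(Risk, name = "Funds")

# geom_col() plots the precomputed counts as they are.
p_col <- risk_counts %>% ggplot(aes(x = Risk, y = Funds)) + geom_col()
p_col
```

With Bonds in place of toy, the axis labels then read Risk and Funds without any relabeling.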

Beautifying geom_col()

Now it really needs some beautification.

Risk.Table %>% ggplot(., aes(x = Var1, y = Freq, fill = Var1)) + geom_col() + labs(x = "Risk Levels", 
    y = "Number of Funds") + theme_minimal() + scale_fill_viridis_d() + guides(fill = "none")

position = "fill"

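geom_bar() and geom_col() are symmetric: with the category on x, each splits the data into one bar per category; with a constant x and the category on fill, each stacks the categories in a single bar. position = "fill" rescales that stack to proportions and will prove very useful with a two-dimensional table.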

Risk.Table %>% ggplot(., aes(x = 1, y = Freq, fill = Var1)) + geom_col(position = "fill") + 
    labs(x = "", y = "Proportion of Funds", fill = "Risk Level") + theme_minimal() + 
    theme(axis.text.x = element_blank()) + scale_fill_viridis_d()
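As a brief preview of the two-dimensional case, here is a sketch that puts one categorical variable on x and another on fill. The toy data frame, its Type labels, and its Risk labels are all made up for illustration; Bonds has Type and Risk columns that could be substituted.

```r
library(ggplot2)

# Toy two-variable stand-in for Bonds; all values are made up.
toy <- data.frame(
  Type = rep(c("A", "B"), times = c(6, 4)),
  Risk = c("High", "High", "Low", "Low", "Low", "Medium",
           "High", "Medium", "Medium", "Medium")
)

# One bar per Type, each rescaled to proportions of Risk.
p_two <- ggplot(toy, aes(x = Type, fill = Risk)) +
  geom_bar(position = "fill") +
  labs(y = "Proportion of Funds") +
  theme_minimal()

# The same row-wise proportions computed by hand.
two_way <- prop.table(table(toy$Type, toy$Risk), margin = 1)
two_way
```

Each row of two_way sums to one, which is why every bar in the plot has the same height.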

A lollipop chart

A lollipop chart is a combination of two geometries. It is a basic scatterplot combining one qualitative variable and the quantitative count of the number of observations. The head of the lollipop is a point while there is an accompanying line segment from (x,0) to (x,Freq) where Freq is the default name for a count from a table.

Basic Lollipop Chart

Risk.Table %>% ggplot(., aes(x = Var1, y = Freq, color = Var1)) + geom_point(size = 6) + 
    labs(x = "Risk Level", y = "Number of Funds", color = "Risk Level") + geom_segment(aes(xend = Var1, 
    y = 0, yend = Freq)) + theme_minimal()

Slicked Lollipop Chart by Adjusting Segment Size

Risk.Table %>% ggplot(., aes(x = Var1, y = Freq, color = Var1)) + geom_point(size = 6) + 
    labs(x = "Risk Levels", y = "Number of Funds") + geom_segment(aes(xend = Var1, 
    y = 0, yend = Freq), linewidth = 1.5) + theme_minimal() + guides(color = "none")  # linewidth needs ggplot2 >= 3.4.0; use size on older versions

Risk.Table %>% ggplot(., aes(x = Var1, y = Freq, color = Var1)) + geom_point(size = 6) + 
    labs(x = "Risk Levels", y = "Number of Funds") + geom_segment(aes(xend = Var1, 
    y = 0, yend = Freq)) + theme_minimal() + scale_color_viridis_d() + guides(color = "none") + 
    coord_flip()

A Lollipop Table [geom_label()]

Now I will replace the points with the actual values as text. For this, I use geom_label(), which requires a label aesthetic. I also add the segments before the labels so that the labels are drawn on top of the lines.

Risk.Table %>% ggplot(., aes(x = Var1, y = Freq, color = Var1, label = Freq)) + labs(x = "Risk Levels", 
    y = "Number of Funds") + geom_segment(aes(xend = Var1, y = 0, yend = Freq)) + 
    geom_label(size = 6) + theme_minimal() + scale_color_viridis_d() + guides(color = "none") + 
    coord_flip()

A Lollipop Table [geom_label() inverse]

A ggplot is built in layers, so drawing the segment before the label keeps the white text visible on top of the line. The fill and a discrete color scale are combined to create this graphical table.

# linewidth needs ggplot2 >= 3.4.0; use size on older versions
Risk.Table %>% ggplot(., aes(x = Var1, y = Freq, color = Var1, fill = Var1, label = Freq)) + 
    geom_segment(aes(xend = Var1, y = 0, yend = Freq), linewidth = 1.5) + geom_label(size = 6, 
    color = "white") + labs(x = "Risk Levels", y = "Number of Funds") + theme_minimal() + 
    scale_color_viridis_d() + scale_fill_viridis_d() + guides(fill = "none", color = "none") + 
    coord_flip()

I HATE PIE CHARTS

A pie chart is fairly easy to do. Let’s go back and show something that I find pretty amazing. A pie chart is a bar chart [the fill variety] with coordinates that fill a circle rather than a square. We take the most basic bar plot – Basic.Bar – and add three things: new coordinates that are polar, labels, and a blank theme to eliminate axis labels.

Basic.Bar + coord_polar("y", start = 0) + labs(x = "", y = "") + theme_void()
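The same trick works from a table of counts with geom_col(), and the slices can carry their shares as text. This is a sketch under assumptions: the counts and level names are made up, and the Share and Label columns are helpers I introduce for illustration.

```r
library(ggplot2)

# Toy counts shaped like Risk.Table; the values are made up.
risk_table <- data.frame(Var1 = c("High", "Low", "Medium"), Freq = c(5, 3, 2))

# Share of each category and a text label for it (hypothetical helper columns).
risk_table$Share <- risk_table$Freq / sum(risk_table$Freq)
risk_table$Label <- paste0(round(100 * risk_table$Share), "%")

# A stacked column wrapped into polar coordinates, labeled mid-slice.
pie <- ggplot(risk_table, aes(x = "", y = Freq, fill = Var1)) +
  geom_col() +
  geom_text(aes(label = Label), position = position_stack(vjust = 0.5), color = "white") +
  coord_polar("y", start = 0) +
  labs(x = "", y = "", fill = "Risk") +
  theme_void()
pie
```

position_stack(vjust = 0.5) places each label at the middle of its stacked segment, which becomes the middle of the slice once the coordinates are polar.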

Robert W. Walker
Associate Professor of Quantitative Methods

My research interests include causal inference, statistical computation and data visualization.
