The Office library(tidyverse) office_ratings <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-03-17/office_ratings.csv') A First Plot The number of episodes for the Office by season.
library(janitor) TableS <- office_ratings %>% tabyl(season) p1 <- TableS %>% ggplot(., aes(x=as.factor(season), y=n, fill=as.factor(season))) + geom_col() + labs(x="Season", y="Episodes", title="The Office: Episodes") + guides(fill=FALSE) p1 Ratings How are the various seasons and episodes rated?
p2 <- office_ratings %>% ggplot(., aes(x=as.factor(season), y=imdb_rating, fill=as.factor(season), color=as.factor(season))) + geom_violin(alpha=0.3) + guides(fill=FALSE, color=FALSE) + labs(x="Season", y="IMDB Rating") + geom_point() p2 Patchwork Using patchwork, we can combine multiple plots.
tidyTuesday on the Carbon Footprint of Feeding the Planet The tidyTuesday for this week relies on data scraped from the Food and Agricultural Organization of the United Nations. The blog post for obtaining the data can be found on r-tastic. The scraping exercise is nice and easy to follow and explored a case of cleaning up a very messy data structure. I took this exercise as practice for using pivot_wider and pivot_longer.
Trees in San Francisco This week’s data cover trees in San Francisco.
sf_trees <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-01-28/sf_trees.csv') library(tidyverse); library(ggmap); library(skimr) skim(sf_trees) Table 1: Data summary Name sf_trees Number of rows 192987 Number of columns 12 _______________________ Column type frequency: character 6 Date 1 numeric 5 ________________________ Group variables None Variable type: character
First, I wanted to acquire the distribution of letters and then play with that. I embedded the result here. The second step is to import the tidyTuesday data.
library(tidyverse) Letter.Freq <- data.frame(stringsAsFactors=FALSE, Letter = c("E", "T", "A", "O", "I", "N", "S", "R", "H", "D", "L", "U", "C", "M", "F", "Y", "W", "G", "P", "B", "V", "K", "X", "Q", "J", "Z"), Frequency = c(12.02, 9.1, 8.12, 7.68, 7.31, 6.95, 6.28, 6.
tidyTuesday: December 10, 2019 Replicating plots from simplystatistics. One nice twist is the development of a tidytuesdayR package to grab the necessary data in an easy way. You can install the package via github. I will also use fiftystater and ggflags.
devtools::install_github("thebioengineer/tidytuesdayR") devtools::install_github("ellisp/ggflags") devtools::install_github("wmurphyrd/fiftystater") tuesdata <- tidytuesdayR::tt_load(2019, week = 50) ## --- Downloading #TidyTuesday Information for 2019-12-10 ---- ## --- Identified 4 files available for download ---- ## --- Downloading files --- ## Warning in identify_delim(temp_file): Not able to detect delimiter for the file.
International Murders Are among the data for analysis in the tidyTuesday for December 10, 2019. These are made for a map.
library(tidyverse) library(leaflet) library(stringr) library(sf) library(here) library(widgetframe) library(htmlwidgets) library(htmltools) options(digits = 3) set.seed(1234) theme_set(theme_minimal()) library(tidytuesdayR) tuesdata <- tt_load(2019, week = 50) murders <- tuesdata$gun_murders There isn’t much data so it should make this a bit easier. Now for some data. As it happens, the best way I currently know how to do this is going to involve acquiring a spatial frame.
Philadelphia Map Use ggmap for the base layer.
library(ggmap); library(osmdata); library(tidyverse) PHI <- get_map(getbb("Philadelphia, PA"), maptype = "stamen", zoom=12) Get the Tickets Data TidyTuesday covers 1.26 million parking tickets in Philadelphia.
tickets <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-12-03/tickets.csv") ## Parsed with column specification: ## cols( ## violation_desc = col_character(), ## issue_datetime = col_datetime(format = ""), ## fine = col_double(), ## issuing_agency = col_character(), ## lat = col_double(), ## lon = col_double(), ## zip_code = col_double() ## ) Two Lines of Code Left library(lubridate); library(ggthemes) tickets <- tickets %>% mutate(Day = wday(issue_datetime, label=TRUE)) # use lubridate to extract the day of the month.
Searching and Mapping the Census Searching for the Asian Population via the Census To use tidycensus, there are limitations imposed by the available tables. There is ACS – a survey of about 3 million people – and the two main decennial census files [SF1] and [SF2]. I will search SF1 for the Asian population.
library(tidycensus); library(kableExtra) library(tidyverse); library(stringr) v10 <- load_variables(2010, "sf1", cache = TRUE) v10 %>% filter(str_detect(concept, "ASIAN")) %>% filter(str_detect(label, "Female")) %>% kable() %>% scroll_box(width = "100%") name label concept P012D026 Total!
The Economist’s Errors and Credit Where Credit is Due The Economist is serious about their use of data visualization and they have occasionally owned up to errors in their visualizations. They can be deceptive, uninformative, confusing, excessively busy, and present a host of other barriers to clean communication. Their blog post on their errors is great.
I have drawn the following example from a #tidyTuesday earlier this year that explores this.