Beer Distribution
The #tidyTuesday for March 31, 2020 is on beer. The essential elements and a method for pulling the data are shown:
A Comment on Scraping .pdf
The Tweet
The details on how the data were obtained are a nice overview of scraping .pdf files. The code for doing it is at the bottom of the page. @thomasmock has done a great job commenting his way through it.
The Office
    library(tidyverse)
    office_ratings <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-03-17/office_ratings.csv')
A First Plot
The number of episodes of The Office by season.
    library(janitor)
    # Count the episodes in each season
    TableS <- office_ratings %>% tabyl(season)
    # Bar chart of episode counts, one bar per season
    p1 <- TableS %>%
      ggplot(aes(x = as.factor(season), y = n, fill = as.factor(season))) +
      geom_col() +
      labs(x = "Season", y = "Episodes", title = "The Office: Episodes") +
      guides(fill = "none")  # the fill legend duplicates the x axis
    p1
Ratings
How are the various seasons and episodes rated?
    # Violin plot of IMDB ratings by season, with each episode overlaid as a point
    p2 <- office_ratings %>%
      ggplot(aes(x = as.factor(season), y = imdb_rating,
                 fill = as.factor(season), color = as.factor(season))) +
      geom_violin(alpha = 0.3) +
      guides(fill = "none", color = "none") +
      labs(x = "Season", y = "IMDB Rating") +
      geom_point()
    p2
Patchwork
Using patchwork, we can combine multiple plots.
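As a rough sketch of how that combination works, the snippet below builds two small plots and joins them with patchwork's `+` and `/` operators. The `toy` data frame and its values are made up for illustration and stand in for `office_ratings`; the names `left`, `right`, `combined` and `stacked` are hypothetical as well.

```r
library(ggplot2)
library(patchwork)

# Toy stand-in for office_ratings (values are invented for illustration)
toy <- data.frame(
  season = factor(c(1, 1, 2, 2, 2, 3)),
  imdb_rating = c(7.6, 8.3, 8.7, 8.2, 8.4, 9.0)
)

# Episode counts per season
left <- ggplot(toy, aes(x = season)) +
  geom_bar() +
  labs(x = "Season", y = "Episodes")

# Ratings per season
right <- ggplot(toy, aes(x = season, y = imdb_rating)) +
  geom_point() +
  labs(x = "Season", y = "IMDB Rating")

combined <- left + right  # side by side
stacked <- left / right   # one above the other
combined
```

With the plots from this post, the same idea would presumably read `p1 + p2` or `p1 / p2`.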
tidyTuesday on the Carbon Footprint of Feeding the Planet
The tidyTuesday for this week relies on data scraped from the Food and Agriculture Organization of the United Nations. The blog post on obtaining the data can be found on r-tastic. The scraping exercise is nice and easy to follow, and it explores a case of cleaning up a very messy data structure. I took this exercise as practice for using pivot_wider and pivot_longer.
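For readers new to the two functions, here is a minimal sketch of a round trip between wide and long layouts. The `wide` data frame, its column names and its values are hypothetical and are not the FAO data used in the post.

```r
library(tidyr)

# Hypothetical wide table: one row per country, one column per food
wide <- data.frame(
  country = c("A", "B"),
  beef = c(10, 20),
  rice = c(1, 2)
)

# Wide to long: one row per country-food pair
long <- pivot_longer(
  wide,
  cols = c(beef, rice),
  names_to = "food",
  values_to = "co2"
)

# Long back to wide: food names become columns again
back <- pivot_wider(long, names_from = food, values_from = co2)
```

The long layout is usually the one ggplot2 wants, since `food` can then be mapped to an aesthetic such as fill or facet.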
Trees in San Francisco
This week’s data cover trees in San Francisco.
    sf_trees <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-01-28/sf_trees.csv')
    library(tidyverse); library(ggmap); library(skimr)
    skim(sf_trees)
Table 1: Data summary
Name: sf_trees
Number of rows: 192987
Number of columns: 12
Column type frequency:
  character: 6
  Date: 1
  numeric: 5
Group variables: None
Variable type: character
First, I wanted to acquire the distribution of letters and then play with that. I embedded the result here. The second step is to import the tidyTuesday data.
    library(tidyverse)
    Letter.Freq <- data.frame(stringsAsFactors=FALSE,
      Letter = c("E", "T", "A", "O", "I", "N", "S", "R", "H", "D", "L", "U", "C", "M", "F", "Y", "W", "G", "P", "B", "V", "K", "X", "Q", "J", "Z"),
      Frequency = c(12.02, 9.1, 8.12, 7.68, 7.31, 6.95, 6.28, 6.
Adoptable Dogs
    # devtools::install_github("thebioengineer/tidytuesdayR", force=TRUE)
    tuesdata51 <- tidytuesdayR::tt_load(2019, week = 51)
    dog_moves <- tuesdata51$dog_moves
    dog_des <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-12-17/dog_descriptions.csv')
    library(tidyverse); library(scatterpie)
    library(rgeos)
    library(maptools)
    library(rgdal); library(usmap); library(ggthemes)
The Base Map
    # State polygons from usmap, drawn as white states with black borders
    My.Map <- us_map(regions = "states")
    Base.Plot <- ggplot() +
      geom_polygon(data=My.Map, aes(x=x, y=y, group=group), fill="white", color="black") +
      theme_map()
    Base.Plot
This gives a fifty-state map on which to plot the information.
    New.Dat <- left_join(My.Map, dog_moves, by= c("full" = "location"))
    ggplot() + geom_polygon(data=New.