Beer Distribution
The #tidyTuesday for March 31, 2020 is on beer. The essential elements and a method for pulling the data are shown:
Imgur
A Comment on Scraping .pdf
The Tweet
The details on how the data were obtained are a nice overview of scraping .pdf files. The code for doing it is at the bottom of the page. @thomasmock has done a great job commenting his way through it.
R to Import COVID Data
library(tidyverse)
library(gganimate)
COVID.states <- read.csv(url("http://covidtracking.com/api/states/daily.csv"))
COVID.states <- COVID.states %>% mutate(Date = as.Date(as.character(date), format = "%Y%m%d"))
The Raw Testing Incidence
I want to use patchwork to show the testing rate by state in the United States. Then I want to show where things currently stand. In both cases, a base-10 log is used on the number of tests.
tidyTuesday: December 10, 2019
Replicating plots from simplystatistics. One nice twist is the development of a tidytuesdayR package to grab the necessary data in an easy way. You can install the package via github. I will also use fiftystater and ggflags.
devtools::install_github("thebioengineer/tidytuesdayR")
devtools::install_github("ellisp/ggflags")
devtools::install_github("wmurphyrd/fiftystater")
tuesdata <- tidytuesdayR::tt_load(2019, week = 50)
## --- Downloading #TidyTuesday Information for 2019-12-10 ----
## --- Identified 4 files available for download ----
## --- Downloading files ---
## Warning in identify_delim(temp_file): Not able to detect delimiter for the file.
Philadelphia Map
Use ggmap for the base layer.
library(ggmap); library(osmdata); library(tidyverse)
PHI <- get_map(getbb("Philadelphia, PA"), maptype = "stamen", zoom=12)
Get the Tickets Data
TidyTuesday covers 1.26 million parking tickets in Philadelphia.
tickets <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-12-03/tickets.csv")
## Parsed with column specification:
## cols(
## violation_desc = col_character(),
## issue_datetime = col_datetime(format = ""),
## fine = col_double(),
## issuing_agency = col_character(),
## lat = col_double(),
## lon = col_double(),
## zip_code = col_double()
## )
Two Lines of Code Left
library(lubridate); library(ggthemes)
tickets <- tickets %>% mutate(Day = wday(issue_datetime, label=TRUE)) # use lubridate to extract the day of the month.
Searching and Mapping the Census
Searching for the Asian Population via the Census
To use tidycensus, there are limitations imposed by the available tables. There is ACS – a survey of about 3 million people – and the two main decennial census files [SF1] and [SF2]. I will search SF1 for the Asian population.
library(tidycensus); library(kableExtra)
library(tidyverse); library(stringr)
v10 <- load_variables(2010, "sf1", cache = TRUE)
v10 %>% filter(str_detect(concept, "ASIAN")) %>% filter(str_detect(label, "Female")) %>% kable() %>% scroll_box(width = "100%")
name
label
concept
P012D026
Total!
Fariss Data
Is neat and complete.
load("FarissHRData.RData")
skimr::skim(HR.Data)
Table 1: Data summary
Name
HR.Data
Number of rows
11717
Number of columns
27
_______________________
Column type frequency:
factor
1
numeric
26
________________________
Group variables
None
Variable type: factor
skim_variable
n_missing
complete_rate
ordered
n_unique
top_counts
COW_YEAR
0
1
FALSE
11717
100: 1, 100: 1, 100: 1, 100: 1
Variable type: numeric
Scraping NFL data
Note: An original version of this post had issues induced by overtime games. There is a better way to handle all of this that I learned from a brief analysis of a tie game between Cleveland and Pittsburgh in Week One.
The nflscrapR package is designed to make data on NFL games more easily available. To install the package, we need to grab it from github.
Archigos
Is an amazing collaboration that produced a comprehensive dataset of world leaders going pretty far back; see Archigos on the web. For thinking about leadership, it is quite natural. In this post, I want to do some reshaping into country year and leader year datasets and explore the basic confines of Archigos. I also want to use gganimate for a few things. So what do we know?