The #tidyTuesday for March 31, 2020 is on beer. The post covers the essential elements of the data and a method for pulling it.
A Comment on Scraping .pdf
The write-up of how the data were obtained is a nice overview of scraping .pdf files. The code for doing it is at the bottom of the page, and @thomasmock has done a great job commenting his way through it.
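For readers new to the technique, `pdftools::pdf_text()` returns one character string per page, with table rows separated by newlines and columns padded with spaces. Below is a minimal base-R sketch of turning such a string into a data frame. It assumes columns are separated by two or more spaces; the page text, item names, and numbers are hypothetical, not the beer data or @thomasmock's code.

```r
# Hypothetical page text shaped like one element of pdftools::pdf_text() output
page <- "Material      2019    2018\nItem A     100      95\nItem B      40      38\n"

# Split the page into lines, trim padding, and drop empty lines
lines <- trimws(strsplit(page, "\n", fixed = TRUE)[[1]])
lines <- lines[nzchar(lines)]

# Assumption: columns are separated by runs of two or more spaces
cells <- strsplit(lines, " {2,}")

# First line is the header; the remaining lines are data rows
parsed <- as.data.frame(do.call(rbind, cells[-1]), stringsAsFactors = FALSE)
names(parsed) <- cells[[1]]

# Every column after the first holds numbers stored as text
parsed[-1] <- lapply(parsed[-1], as.numeric)
parsed
```

Real PDFs rarely split this cleanly; rows that wrap or have blank cells usually need per-row fixes before the `rbind()` step.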
R to Import COVID Data
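library(dplyr)  # provides %>% and mutate()
COVID.states <- read.csv(url("http://covidtracking.com/api/states/daily.csv"))  # daily state-level data from the COVID Tracking Project
COVID.states <- COVID.states %>% mutate(Date = as.Date(as.character(date), format = "%Y%m%d"))  # turn YYYYMMDD integers into Dates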
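The `as.character()` step matters. As the `"%Y%m%d"` format string implies, the `date` column arrives as integers such as 20200331, and `as.Date()` applied to a bare number treats it as a day count rather than a calendar date. The values below are hypothetical and mirror that layout:

```r
# Hypothetical values in the YYYYMMDD integer layout of the `date` column
raw_dates <- c(20200330L, 20200331L)

# Convert to character first, then parse the compact year-month-day layout
dates <- as.Date(as.character(raw_dates), format = "%Y%m%d")
dates
```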
The Raw Testing Incidence
I want to use patchwork to show how testing has grown by state in the United States, and then where each state currently stands. In both panels, the number of tests is shown on a base-10 log scale.
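Here is a minimal base-R sketch of the data preparation behind the two panels: a log10 column for the time series, and each state's most recent row for "where things currently stand". The toy data and the column name `tests` are hypothetical stand-ins for whichever cumulative test-count column is plotted.

```r
# Hypothetical toy data; `tests` stands in for the plotted test-count column
covid <- data.frame(
  state = c("PA", "PA", "NY", "NY"),
  Date  = as.Date(c("2020-03-30", "2020-03-31", "2020-03-30", "2020-03-31")),
  tests = c(1000, 10000, 100, 100000)
)

# Base-10 log of the number of tests, used by both panels
covid$log10_tests <- log10(covid$tests)

# Sort by state and date, then keep the last (most recent) row per state
latest <- covid[order(covid$state, covid$Date), ]
latest <- latest[!duplicated(latest$state, fromLast = TRUE), ]
latest
```

With ggplot2, the first panel would plot `log10_tests` against `Date` by state and the second would use `latest`; patchwork then combines two ggplot objects with `p1 / p2` (stacked) or `p1 + p2` (side by side). Using `scale_y_log10()` on the raw counts is an alternative to precomputing the log.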
tidyTuesday: December 10, 2019
This week replicates plots from Simply Statistics. One nice twist is the new tidytuesdayR package, which makes it easy to grab the necessary data; it can be installed from GitHub. I will also use fiftystater and ggflags.
tuesdata <- tidytuesdayR::tt_load(2019, week = 50)
## --- Downloading #TidyTuesday Information for 2019-12-10 ----
## --- Identified 4 files available for download ----
## --- Downloading files ---
## Warning in identify_delim(temp_file): Not able to detect delimiter for the file.
Use ggmap for the base layer.
library(ggmap); library(osmdata); library(tidyverse)
PHI <- get_map(getbb("Philadelphia, PA"), source = "stamen", maptype = "terrain", zoom = 12)  # "stamen" is a source; "terrain" is one of its maptypes
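`getbb()` returns a 2 x 2 matrix with rows `x` (longitude) and `y` (latitude) and columns `min` and `max`. One way to keep plotted points on the base layer is to drop coordinates outside that box first. In this minimal base-R sketch, the box values and points are hypothetical and only roughly Philadelphia-shaped.

```r
# Hypothetical bounding box laid out like osmdata::getbb() output
bb <- matrix(c(-75.28, 39.87, -74.96, 40.14), nrow = 2,
             dimnames = list(c("x", "y"), c("min", "max")))

# Hypothetical point coordinates; the third lies west of the box
pts <- data.frame(lon = c(-75.16, -75.10, -76.00),
                  lat = c(39.95, 40.00, 39.95))

# TRUE where a point falls inside the box in both directions
inside <- pts$lon >= bb["x", "min"] & pts$lon <= bb["x", "max"] &
          pts$lat >= bb["y", "min"] & pts$lat <= bb["y", "max"]

# which() also drops rows whose coordinates are NA
pts_in <- pts[which(inside), ]
pts_in
```

The filtered points could then be layered on the map with `ggmap(PHI) + geom_point(aes(x = lon, y = lat), data = pts_in)`.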
Get the Tickets Data
TidyTuesday covers 1.26 million parking tickets in Philadelphia.
tickets <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-12-03/tickets.csv")
## Parsed with column specification:
## violation_desc = col_character(),
## issue_datetime = col_datetime(format = ""),
## fine = col_double(),
## issuing_agency = col_character(),
## lat = col_double(),
## lon = col_double(),
## zip_code = col_double()
Two Lines of Code Left
library(lubridate)  # wday() comes from lubridate
tickets <- tickets %>% mutate(Day = wday(issue_datetime, label = TRUE))  # day of the week as an ordered factor (Sun, Mon, ...)
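To see what `wday()` computes, here is a base-R sketch of the same day-of-week extraction, tabulated the way one might count tickets by day. The timestamps are hypothetical. `POSIXlt$wday` counts from 0 = Sunday, so adding 1 matches `wday()`'s default week start of Sunday = 1. The labels are fixed by hand to avoid locale-dependent names.

```r
# Hypothetical ticket timestamps (Mon, Tue, Tue, Sat)
issued <- as.POSIXct(c("2019-12-09 08:15:00", "2019-12-10 09:00:00",
                       "2019-12-10 14:30:00", "2019-12-14 11:45:00"), tz = "UTC")

# 0 = Sunday in POSIXlt, so shift to 1 = Sunday like lubridate::wday()
day_num <- as.POSIXlt(issued)$wday + 1

# Fixed English labels, independent of the session locale
day_lab <- factor(day_num, levels = 1:7,
                  labels = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"))
counts <- table(day_lab)
counts
```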
Searching and Mapping the Census
Searching for the Asian Population via the Census
Using tidycensus means working within the tables it exposes: the American Community Survey (ACS), an annual survey of roughly 3.5 million addresses, and the two main decennial census summary files, SF1 and SF2. I will search SF1 for the Asian population.
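library(tidycensus); library(tidyverse); library(knitr); library(kableExtra)
v10 <- load_variables(2010, "sf1", cache = TRUE)
# Keep SF1 variables whose concept mentions ASIAN and whose label mentions Female
v10 %>% filter(str_detect(concept, "ASIAN")) %>% filter(str_detect(label, "Female")) %>% kable() %>% scroll_box(width = "100%")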
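The search is just two pattern filters over the variable table's `label` and `concept` columns. The same logic in base R with `grepl()` looks like this. The rows and variable names are hypothetical and only mimic the layout of `load_variables()` output; they are not real SF1 identifiers.

```r
# Hypothetical rows laid out like tidycensus::load_variables() output
v10_toy <- data.frame(
  name    = c("VAR_A", "VAR_B", "VAR_C"),
  label   = c("Total!!Female", "Total!!Male", "Total!!Female"),
  concept = c("SEX BY AGE (ASIAN ALONE)", "SEX BY AGE (ASIAN ALONE)",
              "SEX BY AGE (WHITE ALONE)"),
  stringsAsFactors = FALSE
)

# Both conditions must hold: concept mentions ASIAN and label mentions Female
hits <- v10_toy[grepl("ASIAN", v10_toy$concept) & grepl("Female", v10_toy$label), ]
hits$name
```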
The summary is neat and complete.
Table 1: Data summary (number of rows and columns, column type frequency, and summaries of the factor and numeric variables)
Scraping NFL data
Note: The original version of this post had problems caused by overtime games. I learned a better way to handle all of this from a brief analysis of the tie game between Cleveland and Pittsburgh in Week One.
The nflscrapR package is designed to make data on NFL games more easily available. To install the package, we need to grab it from GitHub.
Archigos is an amazing collaboration that produced a comprehensive dataset of world leaders reaching pretty far back in time; see Archigos on the web. For thinking about leadership, it is a natural place to start. In this post, I want to reshape it into country-year and leader-year datasets and explore the basic scope of Archigos. I also want to use gganimate for a few things. So what do we know?
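The core of the leader-year reshape is expanding each leader's spell in office into one row per calendar year. Below is a minimal base-R sketch on hypothetical spells. The column names are modeled on Archigos's `leadid`, `ccode`, `startdate`, and `enddate`, but treat them as assumptions and check them against the release you download.

```r
# Hypothetical leader spells with Archigos-style column names
spells <- data.frame(
  leadid    = c("L1", "L2"),
  ccode     = c(2, 2),
  startdate = as.Date(c("1990-01-20", "1993-01-20")),
  enddate   = as.Date(c("1993-01-20", "1995-06-30")),
  stringsAsFactors = FALSE
)

# One row per leader per calendar year touched by the spell
leader_year <- do.call(rbind, lapply(seq_len(nrow(spells)), function(i) {
  yrs <- seq(as.integer(format(spells$startdate[i], "%Y")),
             as.integer(format(spells$enddate[i], "%Y")))
  data.frame(leadid = spells$leadid[i], ccode = spells$ccode[i], year = yrs,
             stringsAsFactors = FALSE)
}))
leader_year
```

A country-year version then needs a rule for years with more than one leader (1993 here). One option is to keep whoever was in office on a fixed date, such as December 31.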