# R Code

## COVID-19 in Oregon

### Oregon COVID data

I wanted to create a self-updating visualization of the COVID-19 data for the state of Oregon provided by OHA. I have yet to do that, but I decided to build this one to visualize the New York Times data. There is a separate page of daily maps. Oregon reports only daily snapshots, so following the progression requires ingesting new data each day; I began tracking it on March 20, and the scraping process is detailed in a separate file.

## COVID-19 Maps for Oregon

    library(tigris)
    library(rgdal)
    library(htmltools)
    library(viridis)
    library(sf)
    library(ggrepel)
    library(ggthemes)
    library(gganimate)
    library(patchwork)
    library(hrbrthemes)
    load(url(paste0("https://github.com/robertwwalker/rww-science/raw/master/content/R/COVID/data/OregonCOVID",Sys.Date(),".RData")))

Verifying, these data are current as of 2020-06-05 according to the loaded dataset.

### A base map

To build a map to work from, I need a map library. I load the tigris library, then grab the map as an sf object; there is a geom_sf that makes sf objects easy to work with. Finally, I join the map to the data.
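The base-map steps can be sketched roughly as follows. This is a minimal sketch, not the code from the post: the `county_cases` data frame and its values are hypothetical placeholders standing in for the loaded COVID data, and the join key `NAME` assumes the county-name column that tigris returns. Recent versions of tigris return sf objects by default; older versions may need the class requested explicitly.

```r
library(tigris)
library(dplyr)
library(ggplot2)

# Download Oregon county boundaries (requires an internet connection).
or_counties <- counties(state = "OR", cb = TRUE)

# Hypothetical placeholder data: one row per county, made-up counts.
county_cases <- tibble(
  NAME  = c("Multnomah", "Marion", "Washington"),
  cases = c(1, 2, 3)
)

# Join the data to the map; counties without a match get NA.
or_map <- left_join(or_counties, county_cases, by = "NAME")

# geom_sf() draws the geometry column directly.
p <- ggplot(or_map) +
  geom_sf(aes(fill = cases)) +
  scale_fill_viridis_c(na.value = "grey90") +
  theme_minimal()
p
```

Keeping the map on the left side of the join preserves every county polygon, so counties missing from the data still appear on the map.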

## Mapping Europe

### R Natural Earth Map of Europe

    library(tidyverse)
    library(rnaturalearth)
    Europe <- ne_countries(scale = 'medium', type = 'map_units', returnclass = 'sf', continent="Europe")
    ggplot(Europe) + geom_sf()

Whoops. Because the map units come from a map of the entire world, the latitude and longitude ranges are too big. To trim them, we need to crop the map. The returnclass above means we need a tool for sf data; that tool is st_crop from the sf library.
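A minimal sketch of the crop might look like the following. The bounding-box limits are my own assumption, chosen to frame continental Europe, and are not taken from the original post; adjust them to taste. Depending on the sf version, the crop may warn about planar assumptions on longitude/latitude data.

```r
library(sf)
library(rnaturalearth)
library(ggplot2)

Europe <- ne_countries(scale = "medium", type = "map_units",
                       returnclass = "sf", continent = "Europe")

# Assumed limits in degrees: longitude -25 to 45, latitude 34 to 72.
Europe.crop <- st_crop(Europe, xmin = -25, xmax = 45, ymin = 34, ymax = 72)

p <- ggplot(Europe.crop) + geom_sf()
p
```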

## COVID-19 Scraping

NB: This was last updated on March 25, 2020.

### Building Oregon COVID data

I have a few days of data now. To rebuild it, I will have to use the Wayback Machine. The files that I need to locate follow the updates to this page from Oregon’s OHA.

### A Scraper

Let me explain the logic for the scraper. NB: I had to rewrite it; the original versions of the website had three tables without data on hospitalizations.

## Breaking Predict for lm() with dollar.sign

As is often the case with $$R$$, there are many ways to do things that are equivalent or nearly equivalent. It is the nearly equivalent part that is frustrating; one of the first encounters with this can come when trying to predict from a regression. The ultimate source of the trouble is scoping and environments; the use of the \$ syntax sometimes has unintended side effects.

### lm() Syntax is Important

I will refer to an example from a recent homework on regression.
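The homework example is not reproduced here. As a stand-in, a minimal sketch using the built-in `mtcars` data (my assumption, not the original example) shows the side effect: when the model is specified with `$`, `predict()` cannot match the predictor to a column in `newdata`.

```r
# Fit the same regression two ways.
fit.dollar  <- lm(mtcars$mpg ~ mtcars$wt)      # variables referenced with $
fit.formula <- lm(mpg ~ wt, data = mtcars)     # variables looked up in data

# Two new cars for which we want predictions.
new.cars <- data.frame(wt = c(2.5, 3.5))

# The formula version finds `wt` in newdata and returns two predictions.
pred.formula <- predict(fit.formula, newdata = new.cars)
length(pred.formula)

# The $ version looks for a variable literally named `mtcars$wt`, falls back
# to the original data, and returns one fitted value per original row.
# R issues a warning about the row mismatch, suppressed here.
pred.dollar <- suppressWarnings(predict(fit.dollar, newdata = new.cars))
length(pred.dollar)
```

The two fits have identical coefficients; only the way the variables are named in the formula differs, and that is what `predict()` relies on.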

## Financial Analysis of SEC Reports in R

### The Package: finreportr

The key tool to facilitate the financial analysis of companies that file regular SEC reports of certain forms is finreportr. To make use of it, we must first have R install it and its dependencies. To install it, run `install.packages("finreportr", dependencies=TRUE)`.

### The Commands

The first command is CompanyInfo().

    library(finreportr)
    CompanyInfo("JPM")
    ##               company        CIK  SIC state state.inc FY.end     street.address
    ## 1 JPMORGAN CHASE & CO 0000019617 6021    NY        DE   1231 383 MADISON AVENUE
    ## city.

## A Quick and Dirty Introduction to R

### Some Data

I will start with some inline data.

    library(tidyverse); library(skimr);
    Support.Times <- structure(list(Screened = c(26.9, 28.4, 23.9, 21.8, 22.4, 25.9, 26.5, 20, 23.7, 23.7, 22.6, 19.4, 27.3, 25.3, 27.7, 25.3, 28.4, 24.2, 20.4, 29.6, 27, 23.6, 18.3, 28.1, 20.5, 24.1, 27.2, 26.4, 24.5, 25.6, 17.9, 23.5, 25.3, 20.2, 26.3, 27.9), Not.Screened = c(24.7, 19.1, 21, 17.8, 22.8, 24.4, 17.9, 20.5, 20, 26.2, 14.5, 22.4, 21.1, 24.3, 22, 24.3, 23.

## Working an Example on Proportions

### A Proportions Example

We started with an equation:

$$z = \frac{\hat{\pi} - \pi}{\sqrt{\frac{\pi(1-\pi)}{n}}}$$

In language, the difference between the sample proportion $$\hat{\pi}$$ (recall that with only two outcomes the sample proportion is between 0 [all No’s] and 1 [all Yes’s]) and the true probability $$\pi$$, divided by the standard error of the proportion $$\sqrt{\frac{\pi(1-\pi)}{n}}$$, has a $$z$$ [Normal(0,1)] distribution under the condition that $$n\pi > 10$$ and $$n(1-\pi) > 10$$.
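To make the arithmetic concrete, here is a minimal sketch in base R. The inputs (100 observations, 58 Yes outcomes, a hypothesized probability of 0.5) are hypothetical numbers chosen only for illustration and do not come from the original example.

```r
# Hypothetical inputs, chosen only to illustrate the arithmetic.
n    <- 100   # sample size
yes  <- 58    # number of "Yes" outcomes
pi.0 <- 0.5   # hypothesized true probability

# The normal approximation requires n*pi > 10 and n*(1 - pi) > 10.
stopifnot(n * pi.0 > 10, n * (1 - pi.0) > 10)

pi.hat <- yes / n                        # sample proportion
se     <- sqrt(pi.0 * (1 - pi.0) / n)    # standard error of the proportion
z      <- (pi.hat - pi.0) / se           # z statistic from the equation
z

# Two-sided p-value from the standard normal distribution.
p.value <- 2 * pnorm(-abs(z))
p.value
```

With these inputs the standard error is 0.05, so the sample proportion sits 1.6 standard errors above the hypothesized value.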

## The Duality of Hypothesis Tests and Confidence Intervals

### cars data

I will work with R’s internal dataset on cars: cars. There are two variables in the dataset; this is what they look like.

    plot(cars)

### An Hypothesis Test

I will work with the speed variable. The hypothesis to advance is that 17 or greater is the true average speed. The alternative must then be that the average speed is less than 17. Knowing only the sample size, I can figure out what $$t$$ must be to reject 17 or greater and conclude, with 90% confidence, that the true average must be less.
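The critical value and the test itself can be sketched in base R as follows. This is one reasonable reading of the setup, assuming a one-sided test with 10% in the lower tail, and it is not necessarily the code used in the original post.

```r
n <- nrow(cars)   # sample size

# Critical value: the t statistic must fall below this to reject
# "17 or greater" at the 90% confidence level (one-sided, lower tail).
t.crit <- qt(0.10, df = n - 1)
t.crit

# Observed t statistic for the speed variable against mu = 17.
t.stat <- (mean(cars$speed) - 17) / (sd(cars$speed) / sqrt(n))
t.stat

# Reject when the observed statistic is below the critical value.
reject <- t.stat < t.crit
reject

# The same test, plus the matching one-sided confidence interval.
result <- t.test(cars$speed, mu = 17, alternative = "less", conf.level = 0.90)
result
```

The duality shows up in the last step: the test rejects exactly when 17 lies above the upper bound of the one-sided 90% confidence interval.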

## Alluvial Plots

### Alluvial and Sankey Diagrams

The aforementioned plots are methods for visualising the flow of data through a stream of markers. I was motivated to show this because enough of you deal in orders, tickets, and the like that a flow visualisation of a system might prove of use. I will work with a familiar dataset. These are data on Admissions at the University of California Berkeley. The data exist as an internal R file in tabular form.
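As a rough illustration, one way to draw such a plot is with the ggalluvial package. The choice of package is my assumption, since the excerpt does not say which tool the post uses; the built-in table is `UCBAdmissions`, converted to a data frame of counts.

```r
library(ggplot2)
library(ggalluvial)

# Convert the built-in table to long form: Admit, Gender, Dept, Freq.
ucb <- as.data.frame(UCBAdmissions)

# Flows run from Gender to Dept, coloured by admission outcome.
p <- ggplot(ucb, aes(y = Freq, axis1 = Gender, axis2 = Dept)) +
  geom_alluvium(aes(fill = Admit), width = 1/12) +
  geom_stratum(width = 1/12, fill = "grey90", color = "grey40") +
  geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
  scale_x_discrete(limits = c("Gender", "Dept"), expand = c(0.05, 0.05)) +
  labs(y = "Applicants", fill = "Outcome")
p
```

Each band's thickness is proportional to the number of applicants, which is what makes the format useful for orders or tickets moving through stages.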