I am an Associate Professor of Quantitative Methods in the Atkinson Graduate School of Management at Willamette University. My research interests include panel/cross-sectional time series data, causal inference in observed populations (joint with Tim Johnson), political economy, and general applied statistics and statistical computing. I am also an Honorary Instructor at the University of Essex, where I lecture annually in the Essex Summer School in Social Science Data Analysis. I have held previous appointments at Dartmouth College, Harvard University, Texas A&M University, Washington University in Saint Louis, and Rice University. With Curt Signorino and Muhammet Bas, I was awarded the Warren Miller Prize for the best article in Political Analysis for "Statistical Backwards Induction."
Of greatest import, I am married to the love of my life, am the proud father of two wonderful sons, like the Pacific Northwest, and, to keep things in balance, am a lifelong Arsenal fan.
PhD in Political Science, 2005
University of Rochester
MA in Political Science, 2002
University of Rochester
BA in Post-Soviet and East European Studies, 1995
University of Texas at Austin
Russian Area Studies (Страноведение России), 1994
Moscow State Linguistic University (Московский государственный лингвистический университет)
tl;dr In September of 2018, I began to track email solicitations by the Trump Campaign. I have noticed a striking pattern of increasing fundraising activity that started just after the July 4 weekend, but I wanted to verify this over the span of the data. In short, something is up. The Data I will use the wonderful gmailr package to access my Gmail. You need a key and an ID; the package vignette gives guidance on obtaining both.
New York Times data for the US The New York Times has a wonderful compilation of United States data on the novel coronavirus. The data update automatically, so the following graphics were generated with data retrieved at 2020-10-23 17:07:05. The Basic State of Things options(scipen=9) library(tidyverse); library(hrbrthemes); library(patchwork); library(plotly); library(ggdark); library(ggrepel) CTP <- read.csv("https://covidtracking.com/api/v1/states/daily.csv") state.data <- read_csv(url("https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-states.csv")) Rect.NYT <- complete(state.data, state,date) Rect.NYT <- Rect.NYT %>% group_by(state) %>% mutate(New.Cases = cases - lag(cases, order_by = date), New.
tidyTuesday beyonce_lyrics Load the data. beyonce_lyrics <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-09-29/beyonce_lyrics.csv') ## ## ── Column specification ──────────────────────────────────────────────────────── ## cols( ## line = col_character(), ## song_id = col_double(), ## song_name = col_character(), ## artist_id = col_double(), ## artist_name = col_character(), ## song_line = col_double() ## ) str(beyonce_lyrics) ## tibble [22,616 × 6] (S3: spec_tbl_df/tbl_df/tbl/data.frame) ## $ line : chr [1:22616] "If I ain't got nothing, I got you" "If I ain't got something, I don't give a damn" "'Cause I got it with you" "I don't know much about algebra, but I know 1+1 equals 2" .
The datasaurus dozen The datasaurus dozen is a fantastic teaching resource for examining the importance of data visualization. Let’s have a look. datasaurus <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-10-13/datasaurus.csv') ## ## ── Column specification ──────────────────────────────────────────────────────── ## cols( ## dataset = col_character(), ## x = col_double(), ## y = col_double() ## ) Two libraries to make our work easy. library(tidyverse) library(skimr) First, the summary statistics. datasaurus %>% group_by(dataset) %>% skim() Table 1: Data summary Name Piped data Number of rows 1846 Number of columns 3 _______________________ Column type frequency: numeric 2 ________________________ Group variables dataset Variable type: numeric
Spending on Kids First, let me import the data. kids <- read.csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-09-15/kids.csv') # kids <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-09-15/kids.csv') Now let me summarise it and show a table of the variables. summary(kids) ## state variable year raw ## Length:23460 Length:23460 Min. :1997 Min. : -60139 ## Class :character Class :character 1st Qu.:2002 1st Qu.: 71985 ## Mode :character Mode :character Median :2006 Median : 252002 ## Mean :2006 Mean : 1181359 ## 3rd Qu.
DADM Kicks off
Exploring the grammar of graphics using Archigos, the database of leaders.
Basic ggplot one step at a time.
One and Two Sample Inference
tidy data and a crash course in tidyverse
Exploring the grammar of graphics using Archigos, the database of leaders.
Probability and Tables
The management of the fuzzy front end (FFE) phase of innovation is crucial to the ultimate success of new product and process initiatives. A critical challenge that teams face at this stage is dealing with equivocality – the extent to which project participants grapple with multiple, and plausibly conflicting, meanings and interpretations of the information available to them (Daft and Lengel, 1986; Weick, 1979). While a certain level of equivocality is initially beneficial for enhancing team creativity and preventing early closure, at some point it must be resolved in order for an idea to become a viable New Product Development (NPD) project. This study employs a social networks perspective to understand how different types of informal work-based relations and their structural properties affect equivocality on project teams in the FFE. In particular, it examines the structural effects of two types of social relations and their associated networks – technical-advice and friendship ties. The findings suggest that while high density in a project's technical-advice network is likely to reduce equivocality, high density in a project's friendship network is likely to increase it. More interestingly, having multiple members on projects who are highly central in the lab technical-advice network tends to increase equivocality unless it is balanced with members who occupy positions of high centrality in the lab friendship network. In addition to contributing to the scholarship on NPD, FFE, and social networks, the results offer managerial insights for deploying social networks in order to assemble NPD teams and structure the flows of communication on projects so as to resolve equivocality in the FFE.
To measure the effect of veterans’ preference on U.S. federal workforce quality, researchers have assessed whether military veterans advance in their federal careers at a different rate than nonveterans. This research, however, has produced mixed results. In research concerning recent employee cohorts, nonveterans advance faster than veterans, implying that veterans’ preference lessens employee quality. In older cohorts, veterans and nonveterans advance comparably. The latter research, however, controls for employees’ entry positions, whereas research concerning recent cohorts does not do so, thus inhibiting direct comparison of results. To facilitate such comparisons, we controlled for veterans’ and nonveterans’ entry positions in a study of career advancement among all white-collar, U.S. executive branch workers entering employment from 1992 to 2013. In these recent cohorts, we find roughly equivalent rates of career advancement among veterans and nonveterans when controlling for entry positions. This finding holds when using grade or pay increases as measures of advancement.
We examine models that relax proportionality in cumulative ordered regression models. Something fundamental arising from ordered variables and stochastic ordering implies a partitioning. Efforts to relax proportionality also relax the ability to collapse an inherently multidimensional problem to a partitioning of the (unidimensional) real line. It is surprising and unfortunate to find that deviations from proportionality are sufficient to generate internal contradictions; undecidable propositions must arise when proportional odds is relaxed without other relevant and significant changes in the underlying model. We prove a single theorem linking continuous support and partitions of a latent space to show that for these two characteristics to be simultaneously satisfied, the model must be the proportional-odds model. Conditioning on the adjacency that is closely related to the partitioning is fruitful, but at this point we join the class of continuation-ratio models. Alternatively, Anderson’s (1984) stereotype model is quite general and nests ordered and unordered choice models, but again we have left the domain of cumulative models. Adopting multidimensional cumulative models or imposing covariate-specific thresholds are the only certain methods for avoiding these troubles in the cumulative framework. It is generically impossible to generalize the cumulative class of ordered regression models in ways consistent with the spirit of generalized cumulative regression models. Monte Carlo studies also demonstrate the general principles.
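The tension described in this abstract can be illustrated with the textbook form of the cumulative model. The notation below (outcome $Y$, thresholds $\tau_j$, link $F$, coefficients $\beta$ and $\beta_j$) is assumed for illustration and is not taken from the paper; this is a standard sketch of why category-specific slopes cause trouble, not a restatement of the paper's theorem.

```latex
% Assumed notation: Y \in \{1,\dots,J\}, thresholds \tau_1 < \dots < \tau_{J-1},
% F a strictly increasing CDF (e.g. logistic), x a covariate vector.
\begin{align*}
  % Proportional (parallel) cumulative model: one slope vector for every cut.
  \Pr(Y \le j \mid x) &= F(\tau_j - x^\top \beta), \qquad j = 1, \dots, J-1, \\
  % Category probabilities are differences of adjacent cumulative probabilities
  % and are nonnegative for every x because \tau_j > \tau_{j-1}.
  \Pr(Y = j \mid x) &= F(\tau_j - x^\top \beta) - F(\tau_{j-1} - x^\top \beta) \;\ge\; 0. \\[6pt]
  % Relaxing proportionality: a separate slope vector \beta_j for each cut.
  \Pr(Y \le j \mid x) &= F(\tau_j - x^\top \beta_j), \\
  % Nonnegativity of \Pr(Y = j \mid x) now requires, for every x in the support,
  x^\top (\beta_j - \beta_{j-1}) &\le \tau_j - \tau_{j-1}.
\end{align*}
% If \beta_j \ne \beta_{j-1} and some covariate with a differing coefficient is
% unbounded, an x exists that violates the last inequality, so the implied
% "probability" of category j is negative at that x.
```

Under these assumptions, the thresholds partition a single latent line only when every cut shares the same slope; once the slopes differ, the cut lines cross somewhere in an unbounded covariate space.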