DUKE UNIVERSITY
Multivariate Dynamic Linear Models and NBA Player Tracking
This project investigates the utility of time series modeling applied to player tracking data from NBA games. The goal is to determine whether an NBA player's movements are statistically exchangeable. Multivariate dynamic linear models were fit, using discount-factor-based methods, to the tracking data for Avery Bradley, then a guard on the Boston Celtics. Multiple models were fit to two different uninterrupted sections of game play. Feed-forward intervention was used to account for Bradley's changes in direction throughout the game. The marginal likelihoods under different discount factors are compared to determine whether the player's movement is statistically exchangeable.
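As a rough illustration of the discount-factor comparison described above, the sketch below filters a univariate local-level dynamic linear model and accumulates its log marginal likelihood for each candidate discount factor. It is a simplification under stated assumptions, not the project's multivariate model: the observation variance is treated as known, there is no feed-forward intervention, and the function names, the prior, and the `positions` series are all hypothetical. In this simplified setting a discount factor of 1 corresponds to a static level, under which the observations are exchangeable, so a clear preference for a smaller discount factor is evidence against exchangeability.

```python
import math


def log_marginal_likelihood(y, delta, obs_var=1.0, m0=0.0, c0=10.0):
    """Log marginal likelihood of a local-level DLM with discount factor delta.

    Simplifying assumptions (not from the project): univariate series,
    known observation variance obs_var, and a N(m0, c0) prior on the level.
    """
    if not 0.0 < delta <= 1.0:
        raise ValueError("delta must lie in (0, 1]")
    m, c = m0, c0
    total = 0.0
    for obs in y:
        r = c / delta            # prior variance: discounting inflates c
        q = r + obs_var          # one-step forecast variance
        e = obs - m              # one-step forecast error
        total += -0.5 * (math.log(2.0 * math.pi * q) + e * e / q)
        gain = r / q             # adaptive coefficient
        m = m + gain * e         # posterior mean of the level
        c = r * obs_var / q      # posterior variance of the level
    return total


def best_discount(y, grid):
    """Return the discount factor in grid with the largest log marginal likelihood."""
    return max(grid, key=lambda d: log_marginal_likelihood(y, d))


# Hypothetical one-dimensional positions, made up for illustration only.
positions = [0.0, 0.4, 1.1, 1.9, 2.4, 3.2, 3.1, 2.5, 1.8, 1.0]
grid = [0.5, 0.7, 0.9, 0.99, 1.0]
scores = {d: log_marginal_likelihood(positions, d) for d in grid}
chosen = best_discount(positions, grid)
```

Here `scores` holds one log marginal likelihood per discount factor and `chosen` is the grid value that maximizes it.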
An Analysis of NBA Spatio-Temporal Data
This project examines the utility of spatio-temporal tracking data from professional basketball games by fitting models predicting whether a player will make a shot. The first part of the project explored the data, evaluated its issues, and generated features to use as covariates in the models. The second part fit various classification models and evaluated their predictive performance. The paper concludes with a discussion of methods to improve the models and future work.
Modeling Armed Robberies in Chicago
The City of Chicago is frequently listed as one of the most dangerous and crime-ridden cities in the US. President Donald Trump frequently discusses the high rate of crime in Chicago. According to the Chicago Tribune, there were 4,367 shooting victims in Chicago in 2016. In the same year, there were also 785 homicides. However, other reports conclude that Chicago should not be called the “crime capital” of America, as Chicago’s violence rate is lower than that of cities like St. Louis and Detroit. The goal of this project was to examine crime in Chicago, specifically armed robberies, from 2012 to 2016.
Making Music with Hidden Markov Models
Hidden Markov Models (HMMs) are widely used for modeling sequential data and have many applications. In this project, we used HMMs to capture the latent elements in music creation. The observed states are the pitch and velocity (volume) of each note in an existing song, while the latent states include elements such as chord progression, harmonics, dynamics and melody. In this paper, we implement three different HMMs trained on classical piano pieces and compare the resulting compositions for each model.
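To make the latent/observed split concrete, here is a minimal sketch of the generative side of an HMM: a hidden chord sequence follows a Markov chain, and each chord emits an observed pitch. It is not one of the three trained models from the project. The chord names, the probabilities, and the function `sample_hmm` are hypothetical values chosen for illustration, and velocity is left out to keep the example short.

```python
import random

# Hypothetical latent states (chords) and observed symbols (MIDI pitches).
STATES = ["C", "F", "G"]
START = [0.6, 0.2, 0.2]
# Row for each state: probabilities of moving to C, F, G respectively.
TRANSITION = {
    "C": [0.5, 0.25, 0.25],
    "F": [0.3, 0.4, 0.3],
    "G": [0.6, 0.1, 0.3],
}
# Each state emits one of its chord tones with the given weights.
EMISSION = {
    "C": ([60, 64, 67], [0.5, 0.25, 0.25]),
    "F": ([65, 69, 72], [0.5, 0.25, 0.25]),
    "G": ([67, 71, 74], [0.5, 0.25, 0.25]),
}


def sample_hmm(n_notes, seed=0):
    """Sample a hidden chord sequence and the pitches it emits."""
    rng = random.Random(seed)
    chords, pitches = [], []
    state = rng.choices(STATES, weights=START)[0]
    for _ in range(n_notes):
        notes, weights = EMISSION[state]
        chords.append(state)
        pitches.append(rng.choices(notes, weights=weights)[0])
        # Move the hidden state forward one step.
        state = rng.choices(STATES, weights=TRANSITION[state])[0]
    return chords, pitches


chords, pitches = sample_hmm(16, seed=1)
```

In a trained model the start, transition, and emission probabilities would be estimated from existing pieces rather than written by hand; only `pitches` would be heard, while `chords` stays hidden.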
Latent Dirichlet Allocation
Latent Dirichlet Allocation (LDA) is a method used to model the generative process of creating discrete data such as text corpora. LDA can take a large amount of data and create short descriptions of that data. This reduces the size of the data while still maintaining the relationships necessary to carry out various inferences. For this project, we implemented the LDA algorithm presented in the paper Latent Dirichlet Allocation, written by David M. Blei, Andrew Y. Ng, and Michael I. Jordan. The implementation is used to analyze the text of some famous American historical documents and to create clusters of movies based on user rating data.
Examining the Death Row Inmates of Texas
We investigate the similarities and differences between the inmates who have been executed in Texas using text mining and clustering techniques. The crimes and last statements of each inmate are assigned a topic through Latent Dirichlet Allocation (LDA). The k-modes clustering algorithm is used to group similar inmates and determine if similar inmates touch on the same subjects in their last statements. This project provides insight into the characteristics of executed prisoners in Texas, and extracts information from the descriptions of their crimes and last statements.
Selected Coursework
Selected work from courses taken during my two years in the Master's in Statistical Science program at Duke University.
AMHERST COLLEGE
What Makes A Batter Swing?
Major League Baseball's PITCHf/x data has allowed for granular investigation of individual pitches. This project looks into which aspects of a pitch might cause an MLB batter to swing. I created an interactive application that allows users to visualize pitches and whether the batter chose to swing. In addition, I fit a logistic model to predict whether a batter would swing at a particular pitch.
NBA Player Tracking
The purpose of this project was to investigate the player tracking data available for the NBA. We explored how the variables differ by player position and examined the distributions of some of them. A model was fit to predict the minutes per game that an NBA player can expect to play based on other game-level statistics. Finally, several candidate distributions were explored for select variables in the data and the best ones were selected.