Homework 1
- Initial due date, 1/23
- Revision date, 1/25
Instructions
This assignment is about GitHub and data management. The goal is to give you a chance to practice wrangling and tidying data. We do this very early in the class because we will soon start doing empirical analysis with real data, and the sooner you are comfortable with the datasets, the better. For more detailed instructions on how to submit your homework answers, please see the overview page here.
Building the data
The purpose of this part of the assignment is to practice database management. Most of your professional life will likely involve managing data. It can be tedious, but it is also extremely rewarding when you finally get to find out what's going on in the analysis stage. Anyway, let's get to work! All of these questions require you to use the Medicare Advantage GitHub Repo.
Enrollment Data
Run the R code to organize the Monthly Plan Enrollment Data. Once you've created your final dataset (it's called full_ma_data in my code), answer the following:
How many observations exist in your current dataset?
How many different plan_types exist in the data?
Provide a table of the count of plans under each plan type in each year. Your table should look something like Table 1.
# test.data: one row per plan type, one column per year (2010-2015)
knitr::kable(test.data, col.names = c("2010", "2011", "2012", "2013", "2014", "2015"),
             format = "html", caption = "Plan Count by Year", booktabs = TRUE)
Table 1: Plan Count by Year

| Plan type | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 |
|---|---|---|---|---|---|---|
| Type 1 | 12 | 32 | 29 | 10 | 12 | 12 |
| Type 2 | 30 | 27 | 18 | 41 | 32 | 30 |
| Type 3 | 25 | 17 | 16 | 16 | 31 | 25 |
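A minimal base R sketch of the first three questions, using a small made-up data frame in place of full_ma_data. It assumes the data have one row per plan-county observation with columns named contractid, planid, fips, year, and plan_type; these names are assumptions, so check them against your own build. Because the same plan appears in many counties, one reasonable reading of "count of plans" is the number of distinct contract-plan pairs per plan type and year, rather than the number of rows.

```r
# Toy stand-in for full_ma_data (hypothetical values; real column names may differ)
full_ma_data <- data.frame(
  contractid = c("H0001", "H0001", "H0002", "H0002", "H0003", "H0003"),
  planid     = c(1, 1, 2, 2, 5, 5),
  fips       = c(1001, 1003, 1001, 1001, 1003, 1005),
  year       = c(2010, 2010, 2010, 2011, 2011, 2011),
  plan_type  = c("HMO/HMOPOS", "HMO/HMOPOS", "Local PPO", "Local PPO",
                 "HMO/HMOPOS", "HMO/HMOPOS")
)

# Q1: number of observations (rows) in the dataset
n_obs <- nrow(full_ma_data)

# Q2: number of distinct plan types
n_types <- length(unique(full_ma_data$plan_type))

# Q3: keep one row per plan (contract + plan ID) per year, then cross-tabulate
plans <- unique(full_ma_data[, c("contractid", "planid", "year", "plan_type")])
plan_counts <- table(plans$plan_type, plans$year)

print(n_obs)
print(n_types)
print(plan_counts)
```

The resulting table can be converted with as.data.frame.matrix() and passed to knitr::kable() in the same way as test.data above.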
Remove all special needs plans (SNP), employer group health plans (EGHP), and all "800-series" plans. Provide an updated version of Table 1 after making these exclusions.
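One way to apply these exclusions, sketched in base R on made-up data. It assumes the enrollment data flag special needs and employer group plans in columns named snp and eghp coded "Yes"/"No", and that the plan ID is stored as a number in planid; these names and codings are assumptions to verify against the repo's output. "800-series" is taken here to mean plan IDs 800 through 899.

```r
# Toy enrollment data (hypothetical values and column codings)
ma_data <- data.frame(
  contractid = c("H0001", "H0001", "H0002", "H0003", "H0004", "H0005"),
  planid     = c(1, 801, 3, 4, 899, 10),
  year       = c(2010, 2010, 2010, 2011, 2011, 2011),
  plan_type  = c("HMO/HMOPOS", "HMO/HMOPOS", "Local PPO", "HMO/HMOPOS",
                 "PFFS", "Local PPO"),
  snp        = c("No", "No", "Yes", "No", "No", "No"),
  eghp       = c("No", "No", "No", "Yes", "No", "No")
)

# Flag plans to drop; %in% returns FALSE for NA, so rows with missing flags are kept
is_snp  <- ma_data$snp %in% "Yes"
is_eghp <- ma_data$eghp %in% "Yes"
is_800  <- ma_data$planid %in% 800:899

ma_restricted <- ma_data[!(is_snp | is_eghp | is_800), ]

# Rebuild the plan-type-by-year counts on the restricted data
plans <- unique(ma_restricted[, c("contractid", "planid", "year", "plan_type")])
restricted_counts <- table(plans$plan_type, plans$year)
print(restricted_counts)
```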
Merge the contract service area data into the enrollment data, and restrict the data to contracts that are approved in their respective counties. The R script to create the service area dataset is here: Contract Service Area. You can follow the _BuildFinalData.R script to see where and how I join the datasets. Limiting your dataset to plans with non-missing enrollment data, provide a graph showing the average number of Medicare Advantage enrollees per county from 2008 to 2015. Be sure to format your graph in a meaningful way.
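A minimal base R sketch of the merge and the county-level average, on made-up data. It assumes the service area data have columns contractid, fips, year, and status, with approved contracts coded "Approved", and that enrollment holds one enrollment value per plan-county-year; these column names and codings are hypothetical, and _BuildFinalData.R remains the reference for how the join is actually done. "Average enrollees per county" is interpreted here as total MA enrollment within each county-year, averaged across counties.

```r
# Toy enrollment data: one row per plan-county-year (hypothetical values)
enroll <- data.frame(
  contractid = c("H0001", "H0001", "H0002", "H0002", "H0003"),
  planid     = c(1, 1, 2, 2, 5),
  fips       = c(1001, 1003, 1001, 1003, 1001),
  year       = c(2010, 2010, 2010, 2011, 2011),
  enrollment = c(100, 40, 60, NA, 80)
)

# Toy service area data: contract approval status by county and year
service_area <- data.frame(
  contractid = c("H0001", "H0001", "H0002", "H0002", "H0003"),
  fips       = c(1001, 1003, 1001, 1003, 1001),
  year       = c(2010, 2010, 2010, 2011, 2011),
  status     = c("Approved", "Approved", "Pending", "Approved", "Approved")
)

# Keep only approved contract-county-years, then inner-join on the shared keys
approved <- service_area[service_area$status == "Approved",
                         c("contractid", "fips", "year")]
merged <- merge(enroll, approved, by = c("contractid", "fips", "year"))

# Drop plans with missing enrollment
merged <- merged[!is.na(merged$enrollment), ]

# Total enrollment in each county-year, then the average across counties by year
county_totals <- aggregate(enrollment ~ fips + year, data = merged, FUN = sum)
avg_by_year   <- aggregate(enrollment ~ year, data = county_totals, FUN = mean)
print(avg_by_year)
```

Plotting avg_by_year$enrollment against avg_by_year$year as a line (for example with ggplot2's geom_line()), with labeled axes and a title, covers the graph.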
Summary Questions
With all of this data work and these great summaries, let’s take a step back and think about what all this means.
Why did we drop the “800-series” plans?
Why do so many plans charge a $0 premium? What does that really mean to a beneficiary?
Briefly describe your experience working with these data (just a few sentences). Tell me one thing you learned and one thing that really aggravated you.