Homework 2
- Initial due date: 2/13
- Revision date: 2/15
Instructions
In this assignment, you’ll recreate the HCRIS data and answer a few questions along the way. First, make sure you’re working with the HCRIS GitHub repository and have downloaded all of the raw data sources. Once you have the data downloaded and the code running, answer the following questions:
Summarize the data
How many hospitals filed more than one report in the same year? Show your answer as a line graph of the number of such hospitals in each year.
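A minimal Python sketch of the count behind that line graph, assuming the raw reports have been reduced to `(provider_number, year)` pairs, one per filed report. That layout and the function name are hypothetical, not the repository's actual column names. The function returns one count per year, including zeros, so the series can be plotted directly.

```python
from collections import Counter


def hospitals_with_multiple_reports(reports):
    """Return {year: number of hospitals that filed more than one report that year}.

    `reports` is an iterable of (provider_number, year) pairs, one per filed report
    (a hypothetical layout chosen for illustration).
    """
    reports_per_hospital_year = Counter(reports)
    years = sorted({year for _, year in reports_per_hospital_year})
    multi = Counter(year for (_, year), n in reports_per_hospital_year.items() if n > 1)
    # Include years with zero such hospitals so the line graph has no gaps
    return {year: multi[year] for year in years}


reports = [("010001", 2010), ("010001", 2010), ("010002", 2010), ("010001", 2011)]
print(hospitals_with_multiple_reports(reports))  # {2010: 1, 2011: 0}
```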
After removing/combining multiple reports, how many unique hospital IDs (Medicare provider numbers) exist in the data?
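One hypothetical way to collapse duplicates, sketched in Python: keep a single report per hospital-year (here, the one with the latest processing date) and count distinct provider numbers in what remains. The field names `provider`, `year`, and `processed` and the keep-the-latest rule are illustrative assumptions. Combining the duplicate reports instead is another option, and whichever rule you choose should be stated in your answer.

```python
def collapse_and_count(reports):
    """Keep one report per (provider, year) and count distinct provider numbers.

    Hypothetical rule for illustration: among duplicate reports, keep the one with
    the latest `processed` date (ISO "YYYY-MM-DD" strings compare correctly as text).
    """
    kept = {}
    for r in reports:
        key = (r["provider"], r["year"])
        if key not in kept or r["processed"] > kept[key]["processed"]:
            kept[key] = r
    unique_ids = {provider for provider, _ in kept}
    return list(kept.values()), len(unique_ids)


reports = [
    {"provider": "010001", "year": 2010, "processed": "2011-03-01"},
    {"provider": "010001", "year": 2010, "processed": "2011-06-01"},
    {"provider": "010001", "year": 2011, "processed": "2012-04-01"},
    {"provider": "010002", "year": 2010, "processed": "2011-02-01"},
]
rows, n_ids = collapse_and_count(reports)
print(len(rows), n_ids)  # 3 2
```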
What is the distribution of total charges (tot_charges in the data) in each year? Show your results with a “violin” plot, with charges on the y-axis and years on the x-axis. For a nice tutorial on violin plots, look at Violin Plots with ggplot2.
What is the distribution of estimated prices in each year? Again, present your results with a violin plot, and recall the formula for estimating prices from class:
discount_factor = 1-tot_discounts/tot_charges
price_num = (ip_charges + icu_charges + ancillary_charges)*discount_factor - tot_mcare_payment
price_denom = tot_discharges - mcare_discharges
price = price_num/price_denom
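The price formula above, written as a small Python function for a single hospital-year. The two guards that return `None` (when total charges or non-Medicare discharges are not positive) are added assumptions; the formula itself says nothing about those cases, and without them the division either fails or produces meaningless prices.

```python
def estimate_price(ip_charges, icu_charges, ancillary_charges, tot_discounts,
                   tot_charges, tot_mcare_payment, tot_discharges, mcare_discharges):
    """Estimated price per non-Medicare discharge, following the class formula."""
    if tot_charges <= 0:
        return None  # assumption: discount factor undefined without positive charges
    discount_factor = 1 - tot_discounts / tot_charges
    price_num = (ip_charges + icu_charges + ancillary_charges) * discount_factor - tot_mcare_payment
    price_denom = tot_discharges - mcare_discharges
    if price_denom <= 0:
        return None  # assumption: no non-Medicare discharges to spread the price over
    return price_num / price_denom


print(estimate_price(600, 200, 200, 200, 1000, 300, 100, 50))  # 10.0
```

In this toy example, inpatient, ICU, and ancillary charges sum to 1,000; discounts of 200 on total charges of 1,000 give a discount factor of 0.8; subtracting Medicare payments of 300 leaves 500, spread over 100 - 50 = 50 non-Medicare discharges, for a price of 10.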
Estimate ATEs
For the rest of the assignment, you should include only observations in 2012. So we are now dealing with cross-sectional data in which some hospitals are penalized and some are not. Please also define penalty as whether the sum of the HRRP and HVBP amounts is negative (i.e., a net penalty under the two programs). Code to do this is in the Section 2 slides.
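A Python sketch of the 2012 restriction and the penalty indicator. It assumes each row is a dict with hypothetical keys `year`, `hrrp_payment`, and `hvbp_payment`, that both amounts are signed so a reduction is negative, and that a missing amount means no adjustment under that program. If the data record the HRRP amount as a positive reduction, its sign has to be flipped before summing; the Section 2 slides are the reference for that convention.

```python
def add_penalty_2012(rows):
    """Keep 2012 rows and flag a net penalty (HRRP + HVBP amounts summing below zero).

    Assumes signed amounts (reductions negative) and treats a missing amount (None)
    as no adjustment under that program; both are assumptions to check against the data.
    """
    out = []
    for r in rows:
        if r["year"] != 2012:
            continue
        hrrp = r.get("hrrp_payment") or 0.0
        hvbp = r.get("hvbp_payment") or 0.0
        out.append({**r, "penalty": int(hrrp + hvbp < 0)})
    return out


rows = [
    {"year": 2012, "hrrp_payment": -100.0, "hvbp_payment": 40.0},
    {"year": 2012, "hrrp_payment": None, "hvbp_payment": 10.0},
    {"year": 2011, "hrrp_payment": -5.0, "hvbp_payment": 0.0},
]
print([r["penalty"] for r in add_penalty_2012(rows)])  # [1, 0]
```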
Calculate the average price among penalized versus non-penalized hospitals.
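A minimal Python sketch of this comparison, assuming each 2012 row is a dict with a `price` (or `None` when it could not be computed) and a 0/1 `penalty` flag; both keys and the function name are hypothetical. The toy prices below are made up for illustration.

```python
from statistics import mean


def mean_price_by_penalty(rows):
    """Average price for non-penalized (0) and penalized (1) hospitals, skipping missing prices."""
    prices = {0: [], 1: []}
    for r in rows:
        if r["price"] is not None:
            prices[r["penalty"]].append(r["price"])
    return {group: mean(values) for group, values in prices.items() if values}


rows = [
    {"penalty": 1, "price": 9.0},
    {"penalty": 1, "price": 11.0},
    {"penalty": 0, "price": 9.5},
    {"penalty": 0, "price": None},
]
print(mean_price_by_penalty(rows))  # {0: 9.5, 1: 10.0}
```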
Split hospitals into quartiles based on bed size. To do this, create 4 new indicator variables, where each variable is set to 1 if the hospital’s bed size falls into the relevant quartile and 0 otherwise. Provide a table of the average price for the treated and control groups within each quartile.
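A Python sketch of the quartile split and the treated/control table, assuming rows with hypothetical keys `beds`, `penalty`, and `price`. Cut points come from `statistics.quantiles(..., n=4)` (Python 3.8+). A bed count equal to a cut point goes to the lower quartile, which is one convention among several, so counts per quartile can differ slightly under other quantile rules.

```python
from statistics import mean, quantiles


def add_bed_quartiles(rows):
    """Add bed_q1..bed_q4 indicators (1 if the hospital's bed count is in that quartile)."""
    c1, c2, c3 = quantiles([r["beds"] for r in rows], n=4)  # quartile cut points
    for r in rows:
        b = r["beds"]
        # Convention: a bed count equal to a cut point goes to the lower quartile
        q = 1 if b <= c1 else 2 if b <= c2 else 3 if b <= c3 else 4
        for k in range(1, 5):
            r[f"bed_q{k}"] = int(q == k)
    return rows


def price_table(rows):
    """Average price by (quartile, penalty) cell, skipping missing prices."""
    cells = {}
    for r in rows:
        if r["price"] is None:
            continue
        q = next(k for k in range(1, 5) if r[f"bed_q{k}"] == 1)
        cells.setdefault((q, r["penalty"]), []).append(r["price"])
    return {cell: mean(prices) for cell, prices in sorted(cells.items())}


# Toy data: 8 hospitals with 1..8 beds, alternating penalty status
rows = [{"beds": b, "penalty": b % 2, "price": 10.0 * b} for b in range(1, 9)]
table = price_table(add_bed_quartiles(rows))
print(table)
```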
Find the average treatment effect using each of the following estimators, and present your results in a single table:
- Nearest neighbor matching (1-to-1) with inverse variance distance based on quartiles of bed size
- Nearest neighbor matching (1-to-1) with Mahalanobis distance based on quartiles of bed size
- Inverse propensity weighting, where the propensity scores are based on quartiles of bed size
- Simple linear regression, adjusting for quartiles of bed size using dummy variables and appropriate interactions as discussed in class
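A standard-library Python sketch of the four estimators, under several stated assumptions. All function names are hypothetical. `X` holds one tuple per hospital with indicators for quartiles 2 to 4; quartile 1 is omitted because the four indicators sum to one, which makes their covariance matrix singular and the Mahalanobis distance impossible to compute. For matching, units tied at the minimum distance are averaged; other software may break ties differently, which can change the estimates. The propensity score is the share of treated hospitals in each quartile, which equals the fitted value of a logit on the quartile dummies because that model is saturated, and the weights are normalized. The regression interacts treatment with demeaned dummies so the treatment coefficient is an average effect rather than the effect in the base quartile. Every quartile needs both treated and control hospitals, and the matching loop is O(N²), which is acceptable for a few thousand hospitals.

```python
from statistics import mean, pvariance


def solve(A, b):
    """Solve the linear system A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * e for a, e in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def quartile_dummies(quartiles):
    """Indicators for quartiles 2-4; quartile 1 is the omitted base category."""
    return [tuple(int(q == k) for k in (2, 3, 4)) for q in quartiles]


def match_ate(X, D, Y, dist):
    """1-to-1 nearest-neighbour matching ATE; matches tied at the minimum distance are averaged."""
    effects = []
    for i in range(len(Y)):
        others = [j for j in range(len(Y)) if D[j] != D[i]]
        dists = [dist(X[i], X[j]) for j in others]
        best = min(dists)
        y_match = mean(Y[j] for j, d in zip(others, dists) if d == best)
        # Treated: observed minus imputed untreated outcome; control: the reverse
        effects.append(Y[i] - y_match if D[i] else y_match - Y[i])
    return mean(effects)


def inverse_variance_dist(X):
    """Squared distance with each covariate weighted by 1 / its variance."""
    variances = [pvariance(col) for col in zip(*X)]
    return lambda x, y: sum((a - b) ** 2 / v for a, b, v in zip(x, y, variances))


def mahalanobis_dist(X):
    """Squared Mahalanobis distance based on the covariance matrix of X."""
    k = len(X[0])
    means = [mean(col) for col in zip(*X)]
    S = [[mean((r[a] - means[a]) * (r[b] - means[b]) for r in X) for b in range(k)]
         for a in range(k)]
    # Columns of S^-1; S is symmetric, so these are also its rows
    S_inv = [solve(S, [float(a == b) for b in range(k)]) for a in range(k)]

    def dist(x, y):
        diff = [a - b for a, b in zip(x, y)]
        return sum(diff[a] * S_inv[a][b] * diff[b] for a in range(k) for b in range(k))
    return dist


def ipw_ate(X, D, Y):
    """Inverse propensity weighting ATE with normalized weights."""
    # Propensity = share treated among units with the same covariates, which matches
    # the fitted values of a logit on the quartile dummies because that model is saturated
    cells = {}
    for x, d in zip(X, D):
        cells.setdefault(x, []).append(d)
    p = [mean(cells[x]) for x in X]
    w1 = [d / pi for d, pi in zip(D, p)]
    w0 = [(1 - d) / (1 - pi) for d, pi in zip(D, p)]
    return (sum(w * y for w, y in zip(w1, Y)) / sum(w1)
            - sum(w * y for w, y in zip(w0, Y)) / sum(w0))


def regression_ate(X, D, Y):
    """OLS of Y on D, demeaned dummies, and D x demeaned dummies; the D coefficient is the ATE."""
    means = [mean(col) for col in zip(*X)]
    Z = []
    for x, d in zip(X, D):
        xc = [a - m for a, m in zip(x, means)]
        Z.append([1.0, d] + xc + [d * v for v in xc])
    k = len(Z[0])
    ZtZ = [[sum(z[a] * z[b] for z in Z) for b in range(k)] for a in range(k)]
    ZtY = [sum(z[a] * y for z, y in zip(Z, Y)) for a in range(k)]
    return solve(ZtZ, ZtY)[1]


X = quartile_dummies([1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4])  # toy bed-size quartiles
D = [1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0]                     # 1 = penalized
Y = [10, 12, 8, 20, 15, 17, 30, 34, 25, 27, 40, 38]          # toy prices
print(match_ate(X, D, Y, inverse_variance_dist(X)), match_ate(X, D, Y, mahalanobis_dist(X)),
      ipw_ate(X, D, Y), regression_ate(X, D, Y))
```

Collecting the four numbers into one dict or list makes it straightforward to print them as the single results table the question asks for.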
With these different treatment effect estimators, are the results similar, identical, or very different?
Do you think you’ve estimated a causal effect of the penalty? Why or why not? (just a couple of sentences)
Briefly describe your experience working with these data (just a few sentences). Tell me one thing you learned and one thing that really aggravated you.