PAF 9170: Research and Analysis I
Instructions: This exam has two parts, Part A (Policy Memo) and Part B (Statistical Calculation), each worth the number of points shown; most questions have several sub-parts. Answer every question. Show all your work so that you can receive partial credit even if your final answer is incorrect. The exam is due before class on 11/4. Please bring a hard copy of your answers.
Part A. Policy Memo (50 points)
You are a health policy analyst for the Finance Committee of the U.S. Senate. The committee is examining the potential impact of the recently passed “health care reform” and needs an analysis of factors associated with health care expenditures. To carry out the analysis, you will draw on 2006 data from one of the most comprehensive surveys of health care in the United States, the Medical Expenditure Panel Survey (MEPS). While this survey includes information on over 1,500 variables, you have extracted from it information on health care expenditures, sources of funding for health care, health condition, insurance coverage, income by source, and some demographic variables. The variables in the dataset for this assignment are listed at the end of this handout. The data is discussed in more detail in a separate handout.
The MEPS data is aggregated to the family level, where a family is defined as the family members who would usually be covered under the same family health insurance policy. (See the document on MEPS data for more detail.) Because the 2006 MEPS includes over 14,000 “families,” you will use a random sample of approximately 300 families in your analysis. A description of how to draw a random sample in SPSS follows the discussion of the assignment.
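The sample itself should be drawn in SPSS, as described after the assignment discussion. Purely as an illustration of what a simple random sample means here (every family equally likely to be chosen, no family chosen twice), the Python sketch below draws 300 distinct IDs from a hypothetical list of 14,000 family IDs. The function name, the IDs, and the seed are placeholders, not MEPS identifiers, and the sketch draws exactly n cases rather than an approximate percentage of cases.

```python
import random

def draw_simple_random_sample(ids, n, seed=None):
    """Draw n distinct IDs uniformly at random, without replacement (hypothetical helper)."""
    rng = random.Random(seed)  # seeding makes the draw reproducible
    return rng.sample(list(ids), n)

# Hypothetical IDs standing in for roughly 14,000 MEPS "families".
family_ids = range(1, 14001)
sample = draw_simple_random_sample(family_ids, 300, seed=9170)
print(len(sample), min(sample) >= 1, max(sample) <= 14000)
```

Recording the seed lets you redraw exactly the same sample later.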
Research Questions: Using the random sample you have drawn from the MEPS and three independent variables that you think might be related to total medical expenditures, carry out the following analysis (you may reuse results from previous assignments, but be sure to attach those results to the memo):
1) Provide a histogram of medical expenditures and of each of the independent variables you chose (at least three).
2) Provide summary statistics (mean, median and standard deviation) and a box plot for each of the independent variables.
3) Using cross-tabulation tables, examine the association between whether a family is uninsured and (a) the family’s race and (b) its census region. Are uninsured families systematically different from insured families?
4) Pick at least two continuous variables which you think might be related to medical expenditures. For each,
- Prepare a scatterplot
- Provide the correlation coefficient between medical expenditures and each of these variables
- Provide the regression line for medical expenditures regressed on each of these variables.
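The statistics for items 2) to 4) come from SPSS. As a hedged cross-check of the textbook formulas behind them, the Python sketch below computes them on small made-up data; the function and variable names (region, uninsured, income, spending) are illustrative only, not MEPS variable names. It uses the sample standard deviation (n - 1 in the denominator), which statistical packages typically report; row percentages computed within each category of the grouping variable; the correlation r = Sxy / sqrt(Sxx * Syy); and the least-squares line with slope b = Sxy / Sxx and intercept a = mean(y) - b * mean(x), where y is medical expenditures.

```python
import math
import statistics
from collections import Counter, defaultdict

def summarize(values):
    """Mean, median, and sample standard deviation (n - 1 in the denominator)."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),
    }

def crosstab_row_percents(row_var, col_var):
    """Percent of each column category within each row category (each row sums to 100)."""
    counts = defaultdict(Counter)
    for r, c in zip(row_var, col_var):
        counts[r][c] += 1
    return {
        r: {c: 100 * n / sum(counter.values()) for c, n in counter.items()}
        for r, counter in counts.items()
    }

def _deviation_sums(x, y):
    """Return Sxy, Sxx, Syy: sums of products of deviations from the means."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy, sxx, syy

def pearson_r(x, y):
    """Pearson correlation r = Sxy / sqrt(Sxx * Syy)."""
    sxy, sxx, syy = _deviation_sums(x, y)
    return sxy / math.sqrt(sxx * syy)

def regression_line(x, y):
    """Least-squares line y = a + b*x; returns (intercept a, slope b)."""
    sxy, sxx, _ = _deviation_sums(x, y)
    slope = sxy / sxx
    return statistics.mean(y) - slope * statistics.mean(x), slope

# Made-up toy data (not MEPS values), chosen so the results are easy to check by hand.
print(summarize([2, 4, 4, 4, 5, 5, 7, 9]))
region = ["NE", "NE", "S", "S", "S", "W"]
uninsured = ["yes", "no", "yes", "yes", "no", "no"]
print(crosstab_row_percents(region, uninsured))
income = [1, 2, 3, 4]
spending = [3, 5, 7, 9]
print(pearson_r(income, spending), regression_line(income, spending))
```

Comparing a variable's mean with its median is a quick check for skew, and in a cross-tabulation it is the row percentages, not the raw counts, that show whether the uninsured share differs across groups.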
ASSIGNMENT: Based on this analysis, prepare the following:
1) A memo (single-spaced, 1–2 pages) to your boss that discusses your findings and suggests which variables should be considered in a more in-depth analysis of medical expenditures. Don’t just answer the questions in the order presented; summarize the findings and key conclusions of your analysis in language accessible to the educated layman.
2) Your SPSS results, as an appendix.
I will use the following criteria when grading your memos:
Accuracy/thoroughness of analysis (50%)
– Do you answer the four questions using the appropriate statistics and interpret the statistics correctly?
– Do you use results from your analysis to support your conclusions?
– Can someone with no statistical background understand your memo?
– Does your memo follow the memo structure?
* Some useful tips about writing policy memos
– Writing Studio at Duke University
– Other Example
Part B. Statistical Calculation (50 points)
1. An investigator wants to study the relationship between physical and intellectual growth in children. He has data on the vocabulary size and the height of every child in a large elementary school. Fill in the blanks by choosing the appropriate answer from the options below. Which of the correlations in 1) and 2) gives the more meaningful comparison of these variables? Explain briefly. (4 points)
1) Within each of the grades, 1-6, the correlation between height and size of vocabulary is likely to be __________.
2) For all students in the school, the correlation is likely to be _________.
a) about zero; b) positive; c) negative; d) no way to tell
2. A professor has been asked to teach a course in social science statistics off-campus to a class of graduate students enrolled in the Continuing Education Program of the University. Because the professor has never taught in this program before, he does not know a great deal about the needs and backgrounds of the students in the class. In order to learn more about the students, on the first day of class he obtains from each of them the following information: age, major field as an undergraduate student, number of statistics courses taken previously, and interest in conducting empirical research (coded low, medium, high). This information is listed below:
Student #  Age  Major        Statistics Courses  Research Interest
1          24   Poli Sci     3                   High
2          55   Zoology      3                   High
3          26   Botany       0                   Low
4          55   Sociology    0                   Low
5          22   Poli Sci     1                   Low
6          23   Sociology    2                   Medium
7          24   Poli Sci     2                   Medium
8          55   Forestry     1                   Low
9          56   Engineering  9                   High
10         53   Poli Sci     1                   Medium
11         26   Chemistry    2                   Medium
12         24   Sociology    0                   Low
13         54   Physics      3                   High
14         51   Sociology    3                   High
15         55   Poli Sci     0                   Low
16         25   Poli Sci     1                   Medium
17         24   Sociology    1                   Medium
Based on these data, answer the following questions. (Total 13 points)
(a) What is the level of measurement of each of these variables: Age, Major, Statistics Courses, and Research Interest? (3 points)
(b) To summarize these data, the professor decides to calculate measures of central tendency for these variables. For each variable, which measures of central tendency can be calculated? Calculate each of those measures for each variable. (5 points)
(c) For each of these variables, which measure(s) of central tendency would it be most useful for the professor to bear in mind as he conducts the class? That is, for each of these variables, which measure(s) yield(s) the most typical or representative value for the class of students? For each variable, briefly explain your answer. (5 points)
3. Assume you work for the Army Corps of Engineers on flood control projects. After years of analyzing water levels on the Ohio River, the Corps has determined that water levels in a certain flood-prone stretch of the river are normally distributed with a mean of 20 feet and a standard deviation of 5 feet. Widespread flooding occurs when the water level reaches 26 feet. (Total 12 points)
1) What percent of the time is there a flood? (5 points)
2) The Corps wants to reduce the probability of a flood to only 3 percent. By how much will it have to raise the river bank? In other words, determine the river bank height that limits flooding to 3% of the time. (7 points)
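Both parts rest on standardizing with z = (x - μ)/σ: the flood probability is the upper-tail area P(Z ≥ z), and the required bank height works backward from a tail area to a level, x = μ + z·σ. A z-table is enough for the calculation; as a hedged cross-check only, the Python sketch below (Python 3.8 or later, for statistics.NormalDist) wraps both directions in two helper functions whose names are illustrative. It is demonstrated on the standard normal rather than on the river figures.

```python
from statistics import NormalDist

def exceedance_probability(level, mean, sd):
    """P(X >= level) for X ~ Normal(mean, sd), i.e. 1 - Phi((level - mean) / sd)."""
    return 1 - NormalDist(mean, sd).cdf(level)

def level_for_exceedance(p, mean, sd):
    """Level exceeded with probability p: mean + z * sd, where z = Phi^-1(1 - p)."""
    return NormalDist(mean, sd).inv_cdf(1 - p)

# Standard normal check: about 2.5% of the distribution lies above z = 1.96.
print(round(exceedance_probability(1.96, 0, 1), 4))
print(round(level_for_exceedance(0.025, 0, 1), 2))
```

The two functions are inverses of each other, so plugging the level returned by one into the other recovers the original probability.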
4. The city council of Greely received the following data from the local police department regarding the crimes committed in the community. The data show the race and sex of those who committed crimes in the last year and whether or not they received a jail term for their actions. (Total 6 points)
Crimes Committed in Greely
Jail Sentence  Male  Female  Total  Male  Female  Total
Yes            48    34      82     107   62      169
No             72    34      106    57    26      83
Total          120   68      188    164   88      252
Total N = 440
Help the city council answer the following questions.
1) Were men or women more likely to receive a jail sentence? By how much? (2 points)
2) Were blacks or whites more likely to receive a jail sentence? By how much? (2 points)
3) Do these data show evidence of discrimination against any group in receiving jail sentences? In other words, was any group (black males, black females, white males, or white females) more likely than the others to receive a jail sentence? (2 points)
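Because the groups in the table differ in size, comparisons should use the share of each group that received a jail sentence (jail sentences divided by the group total) rather than raw counts, and the difference between two such shares is stated in percentage points. The minimal Python sketch below shows that arithmetic with illustrative function names and made-up counts, not the Greely figures.

```python
def percent_jailed(jailed, total):
    """Share of a group that received a jail sentence, in percent."""
    return 100 * jailed / total

def percentage_point_gap(jailed_a, total_a, jailed_b, total_b):
    """Group A's jail rate minus group B's jail rate, in percentage points."""
    return percent_jailed(jailed_a, total_a) - percent_jailed(jailed_b, total_b)

# Made-up counts: 30 of 60 in group A versus 20 of 80 in group B.
print(percent_jailed(30, 60), percent_jailed(20, 80), percentage_point_gap(30, 60, 20, 80))
```

The same rate can be computed for each race-by-sex cell of the table and compared across cells.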
5. The leaders of a community health clinic want to evaluate the impact of a weight loss program for clients that was initiated a year ago. The program consisted of low-impact group exercise sessions offered three times a week. Clients varied greatly in their attendance at these sessions.
Program administrators decide to analyze data from the program using multiple regression. The dependent variable in the analysis is total pounds lost per client over the course of the program (WLOSS).
Program administrators have a variety of independent variables. The primary independent variable is the number of exercise sessions attended (EXERCISE). Weight (in pounds) at the start of the program (WEIGHT) is included as an independent variable. Age (AGE) and gender (GENDER) are also entered into the model. Gender is recoded as a dummy variable (male = 1, female = 0). (Total 15 points)
1) Interpret the coefficients of the independent variables (Exercise, Weight, Age, and Gender), the R², and the intercept. (5 points)
2) Predict the total pounds lost by a 35-year-old woman who weighed 200 pounds at the start of the program and attended 7 sessions. (5 points)
3) Do you think this program is effective for weight loss? What other variables might explain weight loss? In other words, if you were to add variables to the model, which ones would be appropriate for explaining weight loss? (5 points)
Variables Entered/Removed
Model  Variables Entered                 Variables Removed  Method
1      gender, weight, age, exercise[b]  .                  Enter
a. Dependent Variable: wloss
b. All requested variables entered.
Model Summary
Model  R        R Square  Adjusted R Square  Std. Error of the Estimate
1      .621[a]  .386      .360               11.1243
a. Predictors: (Constant), gender, weight, age, exercise
ANOVA
Model           Sum of Squares  df  Mean Square  F       Sig.
1  Regression   7378.559        4   1844.640     14.906  .000[b]
   Residual     11756.191       95  123.749
   Total        19134.750       99
a. Dependent Variable: wloss
b. Predictors: (Constant), gender, weight, age, exercise
Coefficients
                Unstandardized Coefficients  Standardized Coefficients
Model           B        Std. Error          Beta                       t       Sig.
1  (Constant)   -3.764   8.996                                          -.418   .677
   exercise     -.495    .080                -.557                      -6.185  .000
   weight       -.056    .036                -.137                      -1.556  .123
   age          .002     .087                .002                       .027    .978
   gender       -.393    2.382               -.014                      -.165   .869
a. Dependent Variable: wloss
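The summary figures in this output are internally consistent and can be reproduced by hand, which is a useful check before interpreting them: R² = SS(regression) / SS(total), adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1) with n = 100 cases (total df of 99, plus 1) and k = 4 predictors, and F = MS(regression) / MS(residual). A predicted value is the constant plus each unstandardized coefficient (B) multiplied by the value of its variable, with a dummy variable such as gender entering as 0 or 1. The Python sketch below is illustrative only: its function names are hypothetical, and the prediction example uses made-up coefficients rather than the wloss model.

```python
def r_squared(ss_regression, ss_total):
    """Share of the total variation in the dependent variable explained by the model."""
    return ss_regression / ss_total

def adjusted_r_squared(r2, n, k):
    """R-squared penalized for k predictors (constant excluded), with n cases."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def f_statistic(ss_regression, df_regression, ss_residual, df_residual):
    """Regression mean square divided by residual mean square."""
    return (ss_regression / df_regression) / (ss_residual / df_residual)

def predict(intercept, coefficients, values):
    """Predicted value: intercept + sum of (unstandardized B * variable value)."""
    return intercept + sum(b * values[name] for name, b in coefficients.items())

# Reproduce the Model Summary and ANOVA figures from the output above.
r2 = r_squared(7378.559, 19134.750)
print(round(r2, 3), round(adjusted_r_squared(r2, n=100, k=4), 3),
      round(f_statistic(7378.559, 4, 11756.191, 95), 3))

# Made-up model: y_hat = 2.0 + 0.5 * x1 - 1.0 * x2 (hypothetical, not the wloss model).
print(predict(2.0, {"x1": 0.5, "x2": -1.0}, {"x1": 4, "x2": 1}))
```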