PAF 9170: Research and Analysis I

2015 Fall

Midterm Exam

Name:

Instructions: This exam has two parts (Part A: Policy Memo and Part B: Statistical Calculation), most with several sub-parts, each worth the specified number of points. Answer every question. Show all your work so that you can receive partial credit even if your final answer is incorrect. The exam is due on 11/4, before class. Please bring a hard copy of your answers.

Part A. Policy Memo (50 points)

You are a health policy analyst for the Finance Committee of the U.S. Senate. The committee is examining the potential impact of the recently passed “health care reform” and needs an analysis of factors associated with health care expenditures. To carry out the analysis, you will draw on 2006 data from one of the most comprehensive surveys of health care in the United States, the Medical Expenditure Panel Survey (MEPS). Although this survey includes information on over 1,500 variables, you have extracted from it information on health care expenditures, sources of funding for health care, health condition, insurance coverage, income by source, and some demographic variables. The variables in the dataset for this assignment are listed at the end of this handout. The data are discussed in more detail in a separate handout.

The MEPS data are aggregated to the family level, where a family is defined as the family members who would usually be covered under the same family health insurance policy. (See the document on MEPS data for more detail.) Because the 2006 MEPS includes over 14,000 “families,” you will use a random sample of approximately 300 families in your analysis. A description of how to draw a random sample in SPSS follows the discussion of the assignment.
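The assignment expects the sample to be drawn in SPSS, following the separate instructions. Purely to illustrate the idea of simple random sampling without replacement, here is a minimal Python sketch; the function name `draw_family_sample`, the ID range 1 to 14,000, and the seed are hypothetical stand-ins, not part of the MEPS file or the SPSS procedure.

```python
import random


def draw_family_sample(family_ids, k=300, seed=None):
    """Return k distinct family IDs drawn by simple random sampling without replacement.

    family_ids and seed are hypothetical inputs used only for illustration.
    """
    rng = random.Random(seed)  # a seeded generator makes the draw reproducible
    return rng.sample(list(family_ids), k)


# Hypothetical example: IDs 1..14000 stand in for the MEPS family records.
sample = draw_family_sample(range(1, 14001), k=300, seed=9170)
print(len(sample), sorted(sample)[:5])
```

Fixing the seed means the same 300 families are drawn every time the sketch runs, which mirrors the reason for recording the seed when sampling in SPSS.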

Research Questions: Using the random sample you have drawn from the MEPS and three independent variables that you think might be related to total medical expenditures, carry out the following analysis (you may reuse results from previous assignments, but you must attach those results to the memo):

1) Provide a histogram for medical expenditures and for each of at least three independent variables of your choice.

2) Provide summary statistics (mean, median and standard deviation) and a box plot for each of the independent variables.

3) Using cross-tabulation tables, examine the association between whether a family is uninsured and (a) the family’s race and (b) its census region. Are uninsured families systematically different from those that are insured?

4) Pick at least two continuous variables which you think might be related to medical expenditures. For each,

- Prepare a scatterplot

- Provide the correlation coefficient between medical expenditures and each of these independent variables

- Provide a regression line for medical expenditures regressed on each of these independent variables.

ASSIGNMENT: Based on this analysis, prepare the following:

1) A memo (single-spaced, 1–2 pages) written to your boss discussing your findings and suggesting which variables should be considered in a more in-depth analysis of medical expenditures. Don’t just answer the questions in the order presented. Summarize the findings and key conclusions of your analysis in language accessible to an educated layperson.

2) Your SPSS results, as an appendix.

I will use the following criteria when grading your memos:

Accuracy/thoroughness of analysis (50%)

– Did you answer the 4 questions using the appropriate statistics and interpret the statistics correctly?

Presentation (30%)

– Do you use results from your analysis to support your conclusions?

– Can someone with no statistical background understand your memo?

Writing (20%)

– Does your memo follow the memo structure?

* Useful tips on writing policy memos:

– Writing Studio at Duke University

http://twp.duke.edu/uploads/media_items/policy-memo.original.pdf

– Another example

http://dspace.mit.edu/bitstream/handle/1721.1/36824/11-479Spring-2004/NR/rdonlyres/Urban-Studies-and-Planning/11-479Spring-2004/9CE4ACA2-EC3D-4C1D-91CC-27971E27DCF5/0/pmwriting.pdf

Part B. Statistical Calculation (50 points)

1. An investigator wants to study the relationship between physical and intellectual growth in children. He has data on the vocabulary size and the height of all children in a large elementary school. Fill in the blanks below by choosing the appropriate answer from options a) to d). Of the correlations in 1) and 2), which presents a more meaningful comparison of these variables? Explain briefly. (4 points)

1) Within each of the grades, 1-6, the correlation between height and size of vocabulary is likely to be __________.

2) For all students in the school, the correlation is likely to be _________.

a) about zero; b) positive; c) negative; d) no way to tell

2. A professor has been asked to teach a course in social science statistics off-campus to a class of graduate students enrolled in the Continuing Education Program of the University. Because the professor has never taught in this program before, he does not know a great deal about the needs and backgrounds of the students in the class. In order to learn more about the students, on the first day of class he obtains from each of them the following information: age, major field as an undergraduate student, number of statistics courses taken previously, and interest in conducting empirical research (coded low, medium, high). This information is listed below:

Student #  Age  Major         Statistics Courses  Research Interest
1          24   Poli Sci      3                   High
2          55   Zoology       3                   High
3          26   Botany        0                   Low
4          55   Sociology     0                   Low
5          22   Poli Sci      1                   Low
6          23   Sociology     2                   Medium
7          24   Poli Sci      2                   Medium
8          55   Forestry      1                   Low
9          56   Engineering   9                   High
10         53   Poli Sci      1                   Medium
11         26   Chemistry     2                   Medium
12         24   Sociology     0                   Low
13         54   Physics       3                   High
14         51   Sociology     3                   High
15         55   Poli Sci      0                   Low
16         25   Poli Sci      1                   Medium
17         24   Sociology     1                   Medium

Based on these data, answer the following questions. (Total 13 points)

(a) What is the level of measurement of each variable (Age, Major, Statistics Courses, and Research Interest)? (3 points)

(b) To summarize these data, the professor decides to calculate measures of central tendency for these variables. For each variable, which measures of central tendency can be calculated? Calculate each of those measures. (5 points)

(c) For each of these variables, which measure(s) of central tendency would it be most useful for the professor to bear in mind as he conducts the class? That is, for each of these variables, which measure(s) yield(s) the most typical or representative value for the class of students? For each variable, briefly explain your answer. (5 points)
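The hand calculations for (b) can be checked with a short Python sketch using the standard `statistics` module. It assumes the conventional classification (Age and Statistics Courses as ratio-level, Major as nominal, Research Interest as ordinal with Low < Medium < High) and computes mean, median, and mode only where that classification allows them; the helper and variable names are illustrative.

```python
from statistics import mean, median, median_low, multimode

# Data transcribed from the table above (students 1-17, in order).
age = [24, 55, 26, 55, 22, 23, 24, 55, 56, 53, 26, 24, 54, 51, 55, 25, 24]
major = ["Poli Sci", "Zoology", "Botany", "Sociology", "Poli Sci", "Sociology",
         "Poli Sci", "Forestry", "Engineering", "Poli Sci", "Chemistry",
         "Sociology", "Physics", "Sociology", "Poli Sci", "Poli Sci", "Sociology"]
stats_courses = [3, 3, 0, 0, 1, 2, 2, 1, 9, 1, 2, 0, 3, 3, 0, 1, 1]
interest = ["High", "High", "Low", "Low", "Low", "Medium", "Medium", "Low", "High",
            "Medium", "Medium", "Low", "High", "High", "Low", "Medium", "Medium"]

# Ordinal categories are ranked so a median can be taken; only the order
# Low < Medium < High is assumed, not equal spacing between categories.
RANK = {"Low": 1, "Medium": 2, "High": 3}
LABEL = {rank: name for name, rank in RANK.items()}


def interval_summary(values):
    """Mean, median, and mode(s) for interval/ratio data."""
    return {"mean": mean(values), "median": median(values), "mode": multimode(values)}


age_summary = interval_summary(age)
courses_summary = interval_summary(stats_courses)
major_mode = multimode(major)  # nominal: mode only
# ordinal: median (via ranks) and mode; median_low always returns an actual category
interest_median = LABEL[median_low([RANK[v] for v in interest])]
interest_mode = multimode(interest)

print("Age:", age_summary)
print("Statistics courses:", courses_summary)
print("Major mode(s):", major_mode)
print("Research interest median:", interest_median, "| mode(s):", interest_mode)
```

`multimode` returns every value tied for most frequent, which matters whenever two values share the highest count.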

3. Assume you work for the Army Corps of Engineers on flood control projects. After years of analyzing water levels on the Ohio River, the Corps has determined that water levels in a certain flood-prone stretch of the river are normally distributed with a mean of 20 feet and a standard deviation of 5 feet. Widespread flooding occurs when the water level reaches 26 feet. (Total 12 points)

1) What percent of the time is there a flood? (5 points)

2) The Corps wants to reduce the probability of a flood to only 3 percent. How much will it have to raise the river bank? In other words, determine the height of the river bank that limits flooding to 3% of the time. (7 points)
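Both parts reduce to standard normal calculations. As a cross-check on z-table work, here is a minimal sketch using Python's `statistics.NormalDist` (Python 3.8+). It assumes, as the question implies, that the current bank height equals the 26-foot flood level, so the required raise is the new height minus 26 feet.

```python
from statistics import NormalDist

levels = NormalDist(mu=20, sigma=5)  # water level in feet, per the Corps' estimate
FLOOD_LEVEL = 26                     # assumed current bank height (flooding starts here)

# Part 1: P(X >= 26), where z = (26 - 20) / 5 = 1.2
p_flood = 1 - levels.cdf(FLOOD_LEVEL)

# Part 2: the level exceeded only 3% of the time is the 97th percentile
bank_height = levels.inv_cdf(0.97)
raise_by = bank_height - FLOOD_LEVEL

print(f"P(flood) = {p_flood:.4f}")
print(f"bank height for 3% flooding = {bank_height:.2f} ft; raise by {raise_by:.2f} ft")
```

A z-table with the 97th-percentile z rounded to 1.88 gives essentially the same bank height (20 + 5 × 1.88 = 29.4 feet).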

4. The city council of Greely received the following data from the local police department regarding the crimes committed in the community. The data show the race and sex of those who committed crimes in the last year and whether or not they received a jail term for their actions. (Total 6 points)

Crimes Committed in Greely

               Blacks                  Whites
Jail Sentence  Male  Female  Total     Male  Female  Total
Yes              48      34     82      107      62    169
No               72      34    106       57      26     83
Total           120      68    188      164      88    252

Total N = 440

Help the city council answer the following questions.

1) Were men or women more likely to receive a jail sentence? By how much? (2 points)

2) Were blacks or whites more likely to receive a jail sentence? By how much? (2 points)

3) Do these data show evidence of discrimination against any group in receiving jail sentences? In other words, was any group (black males, black females, white males, or white females) more likely than the others to receive a jail sentence? (2 points)
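Each question compares the percentage of a group that was jailed (the Yes count divided by that group's Total), not raw counts, because the groups differ in size. A minimal Python sketch of those percentages follows; the `counts` dictionary is simply the table above keyed by race and sex, and the helper name `jail_rate` is illustrative.

```python
# Counts from the Greely table: (race, sex) -> (jailed, not jailed)
counts = {
    ("Black", "Male"): (48, 72),
    ("Black", "Female"): (34, 34),
    ("White", "Male"): (107, 57),
    ("White", "Female"): (62, 26),
}


def jail_rate(cells):
    """Share jailed among all people in the listed (race, sex) cells."""
    jailed = sum(counts[c][0] for c in cells)
    total = sum(sum(counts[c]) for c in cells)
    return jailed / total


male = jail_rate([("Black", "Male"), ("White", "Male")])
female = jail_rate([("Black", "Female"), ("White", "Female")])
black = jail_rate([("Black", "Male"), ("Black", "Female")])
white = jail_rate([("White", "Male"), ("White", "Female")])
by_group = {cell: jail_rate([cell]) for cell in counts}

print(f"men {male:.1%}, women {female:.1%}, difference {female - male:+.1%}")
print(f"blacks {black:.1%}, whites {white:.1%}, difference {white - black:+.1%}")
for (race, sex), rate in sorted(by_group.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{race} {sex}: {rate:.1%}")
```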

5. The leaders of a community health clinic want to evaluate the impact of a weight loss program for clients that was initiated a year ago. The program consisted of low-impact group exercise sessions offered three times a week. Clients varied greatly in their attendance at these sessions.

Program administrators decide to analyze data from the program using multiple regression. The dependent variable in the analysis is total pounds lost per client over the course of the program (WLOSS).

Program administrators have a variety of independent variables. The primary independent variable is the number of exercise sessions attended (EXERCISE). Weight in pounds at the start of the program (WEIGHT) is also included as an independent variable, and age (AGE) and gender (GENDER) are entered into the model as well. Gender is recoded as a dummy variable (male = 1, female = 0). (Total 15 points)

1) Interpret the coefficients of the independent variables (EXERCISE, WEIGHT, AGE, and GENDER), the R², and the intercept. (5 points)

2) Predict the total pounds lost by a 35-year-old female who attended 7 sessions and weighed 200 pounds at the start of the program. (5 points)

3) Do you think this program is effective for weight loss? What other variables might explain weight loss? In other words, if you could add variables to the model, which ones would be appropriate for predicting weight loss? (5 points)

Variables Entered/Removed(a)

Model  Variables Entered                 Variables Removed  Method
1      gender, weight, age, exercise(b)  .                  Enter

a. Dependent Variable: wloss
b. All requested variables entered.

Model Summary

Model  R        R Square  Adjusted R Square  Std. Error of the Estimate
1      .621(a)  .386      .360               11.1243

a. Predictors: (Constant), gender, weight, age, exercise

ANOVA(a)

Model          Sum of Squares  df  Mean Square  F       Sig.
1  Regression  7378.559         4  1844.640     14.906  .000(b)
   Residual    11756.191       95  123.749
   Total       19134.750       99

a. Dependent Variable: wloss
b. Predictors: (Constant), gender, weight, age, exercise

Coefficients(a)

                Unstandardized Coefficients  Standardized Coefficients
Model           B        Std. Error          Beta                       t       Sig.
1  (Constant)   -3.764   8.996                                          -.418   .677
   exercise     -.495    .080                -.557                      -6.185  .000
   weight       -.056    .036                -.137                      -1.556  .123
   age          .002     .087                .002                       .027    .978
   gender       -.393    2.382               -.014                      -.165   .869

a. Dependent Variable: wloss
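For 2), the prediction is the fitted regression equation evaluated at the stated values, using the unstandardized B column. A minimal Python sketch (the function name `predict_wloss` is illustrative) does that arithmetic and also recomputes R² and adjusted R² from the ANOVA sums of squares as a consistency check; n = 100 is inferred from the total df of 99. Because several coefficients are negative, check the sign of the fitted value before reading it as pounds lost.

```python
# Unstandardized coefficients (B) copied from the Coefficients table above.
B = {"const": -3.764, "exercise": -0.495, "weight": -0.056, "age": 0.002, "gender": -0.393}


def predict_wloss(exercise, weight, age, gender):
    """Fitted WLOSS = b0 + b1*EXERCISE + b2*WEIGHT + b3*AGE + b4*GENDER (male=1, female=0)."""
    return (B["const"] + B["exercise"] * exercise + B["weight"] * weight
            + B["age"] * age + B["gender"] * gender)


# R-squared and adjusted R-squared recomputed from the ANOVA table (n = 100, k = 4 predictors).
ss_reg, ss_total, n, k = 7378.559, 19134.750, 100, 4
r2 = ss_reg / ss_total
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)

pred = predict_wloss(exercise=7, weight=200, age=35, gender=0)
print(f"predicted WLOSS = {pred:.3f}")
print(f"R^2 = {r2:.3f}, adjusted R^2 = {adj_r2:.3f}")
```

The recomputed R² and adjusted R² should match the .386 and .360 reported in the Model Summary, which confirms the tables are internally consistent.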
