Chapter 5
In a survey conducted by Wright State University, senior high school students were asked if they had ever used
marijuana. Of the 1120 females surveyed, 445 responded yes. Of the 1156 males surveyed, 515 responded yes. The
researchers who conducted the survey were interested in knowing whether there is a difference between genders with regard to
having tried marijuana.
1. Is this an observational study or an experiment? Explain how you know.
Observational study. The explanatory variable, gender, was not assigned or manipulated by the researchers; it was only recorded.
2. Identify the explanatory and response variables, and classify them as categorical or quantitative.
Explanatory – gender, categorical
Response – use of marijuana, categorical
3. Do you think random sampling was employed in this scenario? Explain your reasoning.
It is unclear whether random sampling was used; the scenario does not say how the students were selected.
4. Do you think random assignment was employed in this scenario? Explain your reasoning.
Random assignment was not used, because the researchers cannot assign a gender to a student.
5. Write the hypotheses in symbols for this scenario.
$H_0: \pi_m - \pi_f = 0$
$H_A: \pi_m - \pi_f \neq 0$
6. Produce a 2x2 table of counts, with the explanatory variable in columns.
                  Male    Female    Total
Tried              515       445      960
Have not tried     641       675     1316
Total             1156      1120     2276
7. Calculate the proportion of subjects who have tried marijuana based on gender.
$\hat{p}_m = \frac{515}{1156} = 0.4455$
$\hat{p}_f = \frac{445}{1120} = 0.3973$
8. Calculate the relative risk of trying marijuana by gender.
$\frac{\hat{p}_m}{\hat{p}_f} = \frac{0.4455}{0.3973} = 1.1213$
9. Summarize what the relative risk reveals about the likelihood of trying marijuana based on gender.
Males in this sample were 1.1213 times as likely as females to have tried marijuana, that is, about 12% more
likely.
10. We can also use the difference in proportions to compare two groups. Calculate the difference in the proportion
of subjects who had tried marijuana by gender.
$\hat{p}_m - \hat{p}_f = 0.4455 - 0.3973 = 0.0482$
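The arithmetic in problems 7, 8, and 10 can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative and not part of the worksheet, and the counts are taken from the 2x2 table in problem 6.

```python
# Counts from the 2x2 table in problem 6 (variable names are illustrative).
tried_m, n_m = 515, 1156   # males who have tried / males surveyed
tried_f, n_f = 445, 1120   # females who have tried / females surveyed

# Problem 7: conditional proportions by gender.
p_m = tried_m / n_m
p_f = tried_f / n_f

# Problem 8: relative risk (males relative to females).
relative_risk = p_m / p_f

# Problem 10: difference in proportions (males minus females).
diff = p_m - p_f

print(f"p_m = {p_m:.4f}, p_f = {p_f:.4f}")
print(f"relative risk = {relative_risk:.4f}")
print(f"difference = {diff:.4f}")
```

The relative risk here is computed from the unrounded proportions; to four decimal places it agrees with the hand calculation that uses the rounded values.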
11. Suppose each group has been given 2276 cards. Each group writes “Tried” on 960 of them and “Not tried” on the
other 1316. What is special about 960 and 1316? Why do you think these numbers were selected?
960 is the total number of students who have tried marijuana and 1316 is the total number who have not. These
totals stay fixed in the simulation; only which group each card lands in changes.
12. Next, each group would shuffle the cards and deal them into two piles – one pile of 1120 and one of 1156. What
is special about 1120 and 1156? Why do you think these numbers were selected?
1120 is the total number of females and 1156 is the total number of males.
13. After redistributing into two piles, each group would find the number of “Tried” in each pile, calculate the
proportion of “Tried” in each pile, and finally calculate the difference between these proportions. This process is
the simulation process our Two Proportions applet goes through. Clearly, we do not want to sit here and shuffle
cards 1000 times to get a good idea of our null distribution. So, let’s use our applet. Simulate this
study/experiment 1000 times in the applet. Where is the null distribution centered? Is this what you expect?
Why?
Centered at 0. This makes sense because the simulation assumes no association between gender and having tried marijuana, so on average the two shuffled piles have the same proportion of "Tried".
14. Find the p-value using the applet. Report the p-value.
p-value ≈ 0.02. The exact value varies slightly from one run of the simulation to the next.
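The card-shuffling procedure from problems 11 to 14 can be sketched in Python as follows. This is not the applet's code, only an illustration of the same idea. The function name, the seed, and the choice of a two-sided p-value (matching the two-sided alternative in problem 5) are assumptions made for this sketch.

```python
import random

def simulate_null_diffs(n_tried=960, n_not=1316, n_m=1156, n_f=1120,
                        reps=1000, seed=0):
    """Shuffle the cards `reps` times and return the simulated differences
    in proportions (male pile minus female pile)."""
    assert n_tried + n_not == n_m + n_f
    rng = random.Random(seed)
    cards = [1] * n_tried + [0] * n_not      # 1 = "Tried", 0 = "Not tried"
    diffs = []
    for _ in range(reps):
        rng.shuffle(cards)
        tried_m = sum(cards[:n_m])           # "Tried" cards in the pile of 1156
        tried_f = n_tried - tried_m          # the rest fall in the pile of 1120
        diffs.append(tried_m / n_m - tried_f / n_f)
    return diffs

observed = 515 / 1156 - 445 / 1120           # observed difference from problem 10
diffs = simulate_null_diffs()

# Centre of the simulated null distribution (should be close to 0).
mean_diff = sum(diffs) / len(diffs)

# Two-sided p-value: share of shuffles at least as extreme as the observed difference.
p_value = sum(abs(d) >= abs(observed) for d in diffs) / len(diffs)

print(f"mean of simulated differences = {mean_diff:.4f}")
print(f"simulated two-sided p-value = {p_value:.3f}")
```

Changing the seed or the number of repetitions will change the simulated p-value slightly, which is why the applet's answer is only approximately 0.02.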
15. Using $\alpha = 0.05$, what can you conclude based on the p-value found in problem 14? Be sure you conclude in
context.
Since the p-value is less than $\alpha = 0.05$, we reject the null hypothesis. In context, we have evidence of a difference
between males and females in the long-run proportions who have tried marijuana.
16. Based on the study/experiment these researchers performed, would they be able to use the Theory-Based
approach? Why or why not?
Yes. Each of the four cells of the 2x2 table has at least 10 observations (the smallest count is 445).
17. Do the formal hypothesis test. Use $SE(\hat{p}_m - \hat{p}_f) = 0.021$ and $\alpha = 0.05$. The critical values for a one-sided test
are 1.6 and -1.6, and those for a two-sided test are 2 and -2. Is the conclusion here the same as in (15)?
1. $H_0: \pi_m - \pi_f = 0$
   $H_A: \pi_m - \pi_f \neq 0$
2. $\alpha = 0.05$. This is a two-sided test, so the critical values are -2 and 2.
3. $z = \frac{\hat{p}_m - \hat{p}_f}{SE(\hat{p}_m - \hat{p}_f)} = \frac{0.0482}{0.021} = 2.295$
4. 2.295 > 2, so we reject the null hypothesis.
5. We have strong evidence to conclude that there is a difference in the long-run proportions who have tried
marijuana by gender. This is the same conclusion as in problem 15.
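The test statistic and decision in problem 17 can be reproduced with a short Python sketch. The difference and standard error are the values given in the worksheet; the normal-approximation p-value is an extra step that the worksheet does not ask for, included here only for comparison with the simulated p-value.

```python
import math

diff = 0.0482          # observed difference in proportions (problem 10)
se = 0.021             # standard error supplied in problem 17
critical_value = 2     # two-sided critical value supplied in problem 17

# Standardized statistic: how many standard errors the difference is from 0.
z = diff / se

# Decision rule: reject H0 when |z| exceeds the two-sided critical value.
reject_null = abs(z) > critical_value

# Two-sided p-value under the standard normal distribution (not asked for in
# the worksheet). erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)).
p_two_sided = math.erfc(abs(z) / math.sqrt(2))

print(f"z = {z:.3f}, reject H0: {reject_null}")
print(f"two-sided normal p-value = {p_two_sided:.4f}")
```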
18. Find the 95% confidence interval for this study/experiment.
$CI = (\hat{p}_m - \hat{p}_f) \pm 2 \cdot SE(\hat{p}_m - \hat{p}_f)$
$= 0.0482 \pm 2 \cdot 0.021$
$= 0.0482 \pm 0.042 = (0.0062, 0.0902)$
Note: The applet gives (0.0077, 0.0887). The small discrepancy comes from rounding: the hand calculation uses the multiplier 2 and a standard error rounded to 0.021.
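To see how rounding affects the interval, the sketch below computes it two ways: once with the worksheet's rounded values, and once with the multiplier 1.96 and an unrounded, unpooled standard error. That the applet computes its interval this second way is an assumption, not something stated in the worksheet; the variable names are illustrative.

```python
import math

tried_m, n_m = 515, 1156
tried_f, n_f = 445, 1120
p_m, p_f = tried_m / n_m, tried_f / n_f
diff = p_m - p_f

# Hand calculation from the worksheet: rounded difference, SE = 0.021, multiplier 2.
hand_lo = round(0.0482 - 2 * 0.021, 4)
hand_hi = round(0.0482 + 2 * 0.021, 4)

# Assumed applet-style calculation: unpooled standard error and multiplier 1.96.
se = math.sqrt(p_m * (1 - p_m) / n_m + p_f * (1 - p_f) / n_f)
lo = diff - 1.96 * se
hi = diff + 1.96 * se

print(f"hand interval:      ({hand_lo:.4f}, {hand_hi:.4f})")
print(f"unrounded interval: ({lo:.4f}, {hi:.4f})")
```

Both intervals lie entirely above 0, so the conclusion does not depend on the rounding.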
19. Interpret the interval found in problem 18.
We are 95% confident that the long-run proportion of males who have tried marijuana is higher than the long-run
proportion of females who have tried it by between 0.0062 and 0.0902.
20. Is 0 contained in the interval? What does your answer imply?
No. Because 0 is not in the interval, "no difference" is not a plausible value, which implies there is an association between gender and the use of marijuana.