# Lab Activity 7: Hypothesis Testing (Data Set: SPX Monthly)

**P-value Guidelines when using Standard Normal Table (i.e. the Z-table):**

Keep this in mind: The method for finding the *p*-value is based on the **alternative** hypothesis. Minitab will provide a *p*-value, but if we need to calculate it by hand, we would use the Standard Normal Table and observe the following:

· For H_{a}: p < p_{0}, the *p*-value = P(Z ≤ z)

· For H_{a}: p > p_{0}, the *p*-value = P(Z ≥ z)

· For H_{a}: p ≠ p_{0}, the *p*-value = 2P(Z ≥ |z|). That is, find 1 – P(Z < |z|) and then multiply this probability by 2.
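The three guidelines above can be sketched in a few lines of Python. This is only an illustration, not part of the Minitab workflow: the function names `normal_cdf` and `p_value_from_z`, the labels `"less"`, `"greater"`, and `"two-sided"`, and the example value z = -1.00 are all assumptions made for this sketch. The standard normal CDF is computed from the error function instead of being read from the table.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, P(Z <= z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def p_value_from_z(z, alternative):
    """Return the p-value for a z statistic, given the form of Ha.

    alternative is a hypothetical label chosen for this sketch:
    "less" (Ha: p < p0), "greater" (Ha: p > p0), or "two-sided" (Ha: p != p0).
    """
    if alternative == "less":
        return normal_cdf(z)                      # P(Z <= z)
    if alternative == "greater":
        return 1.0 - normal_cdf(z)                # P(Z >= z)
    if alternative == "two-sided":
        return 2.0 * (1.0 - normal_cdf(abs(z)))   # 2 * P(Z >= |z|)
    raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")


# Illustrative z value only; it is not taken from any question in this activity.
z = -1.00
for alternative in ("less", "greater", "two-sided"):
    print(alternative, round(p_value_from_z(z, alternative), 4))
```

The same z statistic gives three different *p*-values, which is why the alternative hypothesis has to be settled before the table is used.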

**1. Hypothesis Testing for a Proportion – Example 1 Using Software**

No dataset is needed.

Prior to the start of a Stat 200 course, a survey was taken to find out students’ level of excitement for the course. It is generally believed that 40% of all World Campus students are excited to take elementary statistics. That means the **null** hypothesis is that the percentage of World Campus students excited to take elementary statistics, which was measured by a “3” on the survey, is 0.40 (40%). The **alternative** hypothesis is that the true percentage is less than 0.40 (40%).

**a.** In this case, *p* = true population proportion of World Campus students who are excited to take elementary statistics, measured by a “3” on the survey. Write the null and alternative hypotheses using statistical notation.

**Hint: Remember that hypotheses are always written about the population. Be sure to use notation for the correct population parameter (should we use notation for a mean or a proportion?)!**

**i.** Null Hypothesis:

**ii.** Alternative Hypothesis:

**b.** The survey for Stat 200 polled 32 students, of whom 10 responded that they are excited to take the elementary statistics course. What is the value of p̂ = proportion of students answering “3” in the **sample**? Round your answer to 4 decimal places.

**c.** How does p̂ in part (b) compare to the hypothesized population proportion of 0.40?

**d.** Use software to perform a hypothesis test for a proportion. Note that in this case, we do not have a data set to follow. Instead, what we have is called “summarized data,” in other words, a count of the final values. Following the written and video instructions in the online notes for **“Using Software to Perform a Summarized One Proportion Test Analysis,”** use software to test the hypotheses in part (a). This is different from the instructions for the case of “raw data!” Copy and paste the output.

**e.** What value is given for the test statistic Z in the output?

**f.** What is the *p*-value?

**g.** One way to determine statistical significance is by using a p-value. We compare the p-value to a pre-determined significance level, α. In general, a significance level of 0.05 is used.

**iii.** If the p-value is **less than** the significance level (0.05), we **reject** the null hypothesis. This means we have enough evidence to support the alternative hypothesis.

**iv.** If the p-value is **greater than** the significance level (0.05), we **fail to reject** the null hypothesis. This means we do not have enough evidence to support the alternative hypothesis.

Based on the p-value in part (f), do we “reject the null hypothesis” or “fail to reject the null hypothesis?” Explain why.

**h.** In real world terms (terms of the problem), write a conclusion about the proportion of World Campus students who are excited to take elementary statistics.

**i.** Suppose the study intended to find out if **more** than 40% of World Campus students are excited to take elementary statistics. In other words, the **alternative** hypothesis is that the percentage of World Campus students excited to take elementary statistics is **greater** than 0.40 (40%). Repeat parts a, b, c, d, e, f, and g. How are the answers different from before? Explain how the alternative hypothesis affects the results of the hypothesis testing.

**i.** Write the null and alternative hypotheses using statistical notation. Compare to part (a).

**ii.** Find p̂. Compare to part (b).

**iii.** Following the written and video instructions in the online notes for **“Using Software to Perform a Summarized One Proportion Test Analysis,”** use software to test the hypotheses from part (i) above. Remember to adjust the alternative hypothesis for a “**greater than**” case. Copy and paste the output.

**iv.** What value is given for the test statistic Z in the output? Compare this test statistic to the test statistic in part (e).

**v.** What is the *p*-value? Compare this *p*-value to the *p*-value in part (f).

**vi.** Based on the p-value in part (v), do we “reject the null hypothesis” or “fail to reject the null hypothesis?” Explain why.

**vii.** In real world terms (terms of the problem), write a conclusion about the proportion of students who are excited to take introductory statistics.

**2. Hypothesis Testing for a Proportion – Impact of Sample Size**

Use the same survey from question 1. This time, suppose the survey from Stat 200 had 100 ‘3’ responses out of 320 people (instead of 10 out of 32 in question 1).

**a.** What is the value of p̂ = proportion of ‘3’ responses?

**b.** How does p̂ above compare to the sample proportion found in question 1, part b?

**c.** Use software to test the hypotheses in question 1 part a. In other words, test:

H_{0}: p = 0.40

H_{a}: p **<** 0.40

Remember that this time the survey had 100 ‘3’ responses out of 320 people, and we are again testing a “<” alternative hypothesis. Copy and paste the output.

**d.** What value is given for the test statistic Z in the output?

**e.** What is the *p*-value?

**f.** Decide between the null hypothesis and the alternative hypothesis. Explain your decision.

**g.** In real world terms (terms of the problem), write a conclusion about the proportion of World Campus students who are excited to take elementary statistics.

**h.** Briefly explain how sample size affects the statistical significance of an observed result. As a starting point, note that the observed sample proportion is 0.3125 for both samples in question 1 part b and question 2 part a, yet we had different statistical conclusions from the p-values.
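As a cross-check on questions 1 and 2, the normal-approximation test statistic can be sketched in Python. This is a minimal sketch, not the software procedure from the online notes: the helper names `normal_cdf` and `one_proportion_z` are made up for illustration, and the calculation assumes the usual one-proportion z statistic with the standard error based on the null value 0.40. Software output can differ slightly depending on the method it uses.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, P(Z <= z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def one_proportion_z(successes, n, p0):
    """Return (p_hat, z) for a one-proportion z test of H0: p = p0."""
    p_hat = successes / n
    standard_error = math.sqrt(p0 * (1.0 - p0) / n)  # based on the null value p0
    z = (p_hat - p0) / standard_error
    return p_hat, z


# Same sample proportion, two different sample sizes (questions 1 and 2).
for successes, n in [(10, 32), (100, 320)]:
    p_hat, z = one_proportion_z(successes, n, p0=0.40)
    p_value = normal_cdf(z)  # left tail, because Ha: p < 0.40
    print(f"n={n}: p_hat={p_hat:.4f}, z={z:.2f}, p-value={p_value:.4f}")
```

Because the sample proportion is identical in both cases, the only thing that changes is the standard error, which shrinks as n grows. With ten times the sample size, the z statistic is larger in magnitude by a factor of √10.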

**3. Hypothesis Testing for a Proportion – Example 2 By Hand**

In a marketing survey for a coffee brand, 80 randomly selected coffee drinkers are asked if they only drink decaffeinated coffee. Of the 80 respondents, 7 said “yes.”

**a.** Let *p* = population proportion of coffee drinkers who only drink decaffeinated coffee. The marketing team wants to learn if less than 10% of coffee drinkers drink only decaffeinated coffee. Write the null and alternative hypotheses about *p* in statistical notation for this situation.

**Hint: The alternative hypothesis is usually what the researcher is trying to prove, as the null hypothesis is always a statement of “no difference” or “=”. And, the null value, or numerical value written in the hypotheses is always the population parameter (not the sample statistic)!**

**i.** Null Hypothesis:

**ii.** Alternative Hypothesis:

**b.** What is the value of p̂, the proportion of the sample that drinks only decaffeinated coffee?

**c.** Test the hypotheses stated in part **a** above. By hand, calculate the test statistic by using:

$$z = \frac{\hat{p} - p_{0}}{\sqrt{p_{0}(1 - p_{0})/n}}$$

Round your answer to two decimal places. Notice that this statistic is sensitive to the difference between the sample result and the null hypothesis value.

**d.** Use the Standard Normal Table to find the *p*-value associated with this test statistic. Use the **p-value guidelines** found at the beginning of this activity.

**e.** Following the written and video instructions in the online notes for **“Using Software to Perform a Summarized One Proportion Test Analysis,”** use software to test the hypotheses in part (a). Again, we have **summarized** data. Copy and paste the output.

**f.** What value is given for Z in the output?

**g.** What is the *p*-value?

**h.** Are the Z test statistic you found by hand in part c and the *p*-value from part d approximately equal to the Z statistic and *p*-value from the software output in part e?

**i.** Decide whether the result is significant based on the *p*-value from Minitab and report a conclusion in the context of this situation.

**j.** What would the *p*-value have been if the study wanted to test whether the proportion of coffee drinkers who drink only decaffeinated coffee differs from 10%? That is, test H_{0}: p = 0.10 versus H_{a}: p ≠ 0.10.

**4. Hypothesis Testing for a Mean**

A financial analyst wanted to answer a fundamental question faced by any investor: does investing in the S&P 500 stock index provide a long-term return that exceeds the inflation rate? The analyst collected monthly total return data for the S&P 500 Index since 1950. Based on the Consumer Price Index (CPI), the analyst estimated that the average monthly inflation rate is 0.21%. Use the **SPXMonthlyData** data set to test whether the mean S&P 500 monthly return is larger than the average monthly inflation rate of 0.21%. Perform the hypothesis test first by hand and then with software. The descriptive statistics from Minitab for the variable “**Return**” are:

**Descriptive Statistics: Return**

| Variable | N | N\* | Mean | SE Mean | StDev | Minimum | Maximum |
|---|---|---|---|---|---|---|---|
| Return | 776 | 0 | 0.00612 | 0.00150 | 0.04185 | -0.24543 | 0.15104 |

**a. **Write the null and alternative hypotheses using appropriate statistical notation.

**H_{0}:**

**H_{a}:**

**b.** Calculate the degrees of freedom, DF.

**c.** Calculate the *t*-statistic by hand using:

$$t = \frac{\bar{x} - \mu_{0}}{s/\sqrt{n}}$$

**d.** From the T-Table, what is the range of the *p*-value based on your *t*-statistic? NOTE: If you selected a two-sided H_{a} (i.e., used ≠), then you need to double the *p*-values found in the table.

**e.** Based on your *p*-value, do you “fail to reject the null hypothesis” or “reject the null hypothesis”? State your conclusion in real world terms.

**f.** Now use software to verify your results. Follow the written and video instructions **“Using Software to Perform a One Mean Test Analysis Using Raw Data”** provided in the online notes. Note that here, since we have a data set, we do not use the summarized data procedure. Rather, we use the column containing our data, in this case, “Return.” Do your results by hand and those from software roughly match?
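For anyone who wants a third check besides the hand calculation and Minitab, the by-hand steps in question 4 can be sketched in Python from the summary statistics alone. Two assumptions are made here and should be verified against the data set: that “Return” is recorded as a decimal fraction (so 0.21% is entered as 0.0021), and that the standard one-sample *t* statistic applies. The variable names are illustrative. The Python standard library has no *t* distribution, so the last step uses a normal approximation to the right-tail *p*-value, which is reasonable only because DF is large; the T-Table or software remains the proper source.

```python
import math

# Summary statistics copied from the Minitab output for "Return".
n = 776
sample_mean = 0.00612
sample_sd = 0.04185

# Assumption: returns are decimal fractions, so 0.21% is written as 0.0021.
mu_0 = 0.0021

df = n - 1                                   # degrees of freedom
standard_error = sample_sd / math.sqrt(n)    # should match "SE Mean" in the output
t_stat = (sample_mean - mu_0) / standard_error

# Normal approximation to P(T >= t); an approximation, not the exact t-based p-value.
approx_p_value = 0.5 * (1.0 - math.erf(t_stat / math.sqrt(2.0)))

print(f"DF = {df}")
print(f"SE = {standard_error:.5f}")
print(f"t = {t_stat:.2f}")
print(f"approximate right-tail p-value = {approx_p_value:.4f}")
```

If the computed standard error does not agree with the “SE Mean” column, the most likely cause is a units mismatch between the returns and the inflation rate.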