Data Analysis Assignment: Statistics.

__STAT 250 Fall 2018 Data Analysis Assignment 3__

Your submitted document should include the following items. Points will be deducted if the following are not included.

- Type your **Name**, **STAT 250** with your correct section number (e.g. STAT 250-xxx) right justified and **Data Analysis Assignment #3** centered on the top of page 1 of your document.
- Number your pages across your entire solutions document.
- Your document should include the **ANSWERS ONLY** with each answer labeled by its corresponding number and subpart. Keep the answers in order. Do NOT include the questions in your submitted document.
- Generate all requested graphs and tables using __StatCrunch__.
- Upload your document onto Blackboard as a Word or pdf document using the link provided by your instructor.
- You may not work with other individuals on this assignment. It is an honor code violation if you do.

**Elements of good technical writing:**

Use complete and coherent sentences to answer the questions.

Graphs must be appropriately titled and should refer to the context of the question.

Graphical displays must include labels with units if appropriate for each axis.

Units should always be included when referring to numerical values.

When making a comparison you must use comparative language, such as “greater than”, “less than”, or “about the same as.”

Ensure that all graphs and tables appear on one page and are not split across two pages.

Type all mathematical calculations when directed to compute an answer ‘by-hand.’

Pictures of actual handwritten work are not accepted.

When writing mathematical expressions into your document you may use either an equation editor or common shortcuts such as: √x can be written as sqrt(x), p̂ can be written as p-hat, and x̄ can be written as x-bar.

**Problem 1: Confidence Interval for Percentage of B’s.**

The data set “**STAT 250 Final Exam Scores**” contains a random sample of 269 STAT 250 students’ final exam scores (maximum of 80) collected over the past two years. Answer the following questions using this data set.

- What proportion of students in our sample earned B’s on the final exam? A letter grade of B is obtained with a score of between 64 and 71 inclusive. Hint: You can do this many ways, but in StatCrunch, go to Data → Row Selection → Interactive Tools. In the slider selectors box, click the variable “Scores” into the variable box. Then click compute. Use the slider to obtain the count by looking at the “# rows selected” presented in the first line of the box. Show your work (i.e. describe the method you used to obtain the number of B’s) and express this value as a proportion rounded to four decimal places.

- Using the sample proportion obtained in (a), construct a 90% confidence interval to estimate the population proportion of students who earned a B on the final exam. Please do this “by hand” using the formula and showing your work (please type your work, no images accepted here). Assume all Central Limit Theorem conditions hold. Round your confidence limits to four decimal places.

- Verify your result from part (b) using Stat → Proportion Stats → One Sample → With Summary. Inside the box, select confidence interval and click Compute! Copy and paste your StatCrunch result in your document.

- Interpret the StatCrunch confidence interval in part (c) in one sentence using the context of the question.

- Did this confidence interval capture the true population proportion p = 0.21 given in Problem 4 of Data Analysis 2? Answer this question in one sentence.

- Use the Confidence Interval applet (for a Proportion) in StatCrunch to simulate constructing one thousand 90% confidence intervals using p = 0.21 and n = 269. Once the window is open, **click reset and select (or click) 1000 intervals**. Copy and paste your image into your document.

  - Box 1: Enter the given population proportion, 0.21
  - Box 2: Enter the given confidence level, 0.90
  - Box 3: Enter the given sample size, n = 269

- Compare the “Prop. contained” value from part (f) to the confidence level associated with the simulation.

- Write a long-run interpretation for your confidence interval method in one sentence.

**Problem 2: Opinion on Sports Betting**

About one year ago, polling numbers began to show that public opinion about legalizing sports betting had changed. For the first time, a majority of Americans supported making wagering on professional sports legal. A researcher in the state of Virginia is currently interested in this topic. She wants to test the claim that more than 54% of Virginians would support this legalization. To test this claim she collected a random sample of 382 Virginia adults and then asked whether they support or oppose legalized professional sports betting. The responses (0 = Oppose and 1 = Support) are found in StatCrunch in a data set called “**Virginia Sports Betting Survey**.”

- Obtain the sample proportion of individuals who said “Support” using Stat → Tables → Frequency in StatCrunch. Only the value of the sample proportion is needed in your answer. Present this sample proportion as a fraction or a decimal rounded to 4 decimal places.

- Using α = 0.05, is there sufficient evidence to conclude that more than 54% of Virginia adults support legalized sports betting? Conduct a full hypothesis test by following the steps below.

- Define the population parameter in one sentence.
- State the null and alternative hypotheses using correct notation.
- State the significance level for this problem.
- Check the three conditions of the Central Limit Theorem that allow you to use the One-Proportion z-Test using one complete sentence for each condition. Show work for the numerical calculation.
- Calculate the test statistic “by-hand.” Show the work necessary to obtain the value by typing your work and provide the resulting test statistic. Do not round while doing the calculation. Then, round the test statistic to two decimal places after you complete the calculation.
- Calculate the *p*-value using the standard Normal table and provide the answer. Use four decimal places for the *p*-value.
- State whether you reject or do not reject the null hypothesis and the reason for your decision in one sentence (compare your *p*-value to the significance level to do this).
- State your conclusion in context of the problem (i.e. interpret your results and/or answer the question being posed) in one or two complete sentences.
- Use StatCrunch (Stat → Proportion Stats → One Sample → With Data) to verify your test statistic and p-value. Copy and paste this box into your document.

**Problem 3: Electric Cars**

According to both a Consumer Reports survey and a AAA survey, about 1 in 5 Americans will buy an electric car as their next vehicle. To test this claim, an independent surveyor obtained records for 2018 car sales. The surveyor generated a random sample of 175 car sales and found that 28 of these new car purchases were of electric cars.

- Check the three conditions of the Central Limit Theorem that allow you to use the One-Proportion Confidence Interval using one complete sentence for each condition. Show work for the numerical calculation.

- Construct a 99% confidence interval to estimate the population proportion of Americans’ new car purchases that were electric. Calculate this “by hand” using the formula and showing your work (please type your work, no images accepted here). Round your confidence limits to four decimal places.

- Verify your result in part (b) using Stat → Proportion Stats → One Sample → With Summary. Copy and paste your StatCrunch result in your document as well.

- Using α = 0.01, is there sufficient evidence to conclude that the proportion of Americans who purchased an electric car is different from 0.2? Conduct a full hypothesis test by following the steps below. Enter an answer for each of these steps in your document.

- Define the population parameter in one sentence.
- State the null and alternative hypotheses using correct notation.
- State the significance level for this problem.
- Calculate the test statistic “by-hand.” Show the work necessary to obtain the value by typing your work and provide the resulting test statistic. Do not round during the calculation. Then, round the test statistic to two decimal places after you complete the calculation.
- Calculate the *p*-value using the standard Normal table and provide the answer. Use four decimal places for the *p*-value.
- State whether you reject or do not reject the null hypothesis and the reason for your decision in one sentence (compare your *p*-value to the significance level to do this).
- State your conclusion in context of the problem (i.e. interpret your results and/or answer the question being posed) in one or two complete sentences.
- Use StatCrunch (Stat → Proportion Stats → One Sample → With Summary) to verify your test statistic and p-value. Copy and paste this box into your document.

- Explain the connection between the confidence interval and the hypothesis test in this problem (discuss this in relation to the decision made from your hypothesis test and connect it to the confidence interval you constructed in part (b)). Answer this question in one to two sentences.

**Problem 4: House Prices**

Use the “**Fairfax City Home Sales**” dataset for parts of this problem.

- Use StatCrunch to construct an appropriately titled and labeled relative frequency histogram of Fairfax home closing prices stored in the “Price” variable. Copy your histogram into your document.
- What is the shape of this distribution? Answer this question in one complete sentence.
- Assuming the population has a similar shape as the sample with population mean $510,000 and population standard deviation $145,000, calculate the probability that in a random sample of size 10, the mean of the sample will be greater than $600,000. You may assume a random sample was taken and the sample came from a large population. However, be sure to check the Central Limit Theorem condition of a large sample size before completing this problem using one complete sentence. If this condition is not met, you cannot complete the problem.
- Assuming the population has a similar shape as the sample with population mean $510,000 and population standard deviation $145,000, calculate the probability that in a random sample of size 36, the mean of the sample will be greater than $600,000. You may assume a random sample was taken and the sample came from a large population. However, be sure to check the Central Limit Theorem condition of a large sample size before completing this problem using one complete sentence. If this condition is not met, you cannot complete the problem.

**Name**, **STAT 250**

**Data Analysis Assignment #3**

**Problem 1: Confidence Interval for Percentage of B’s.**

- Proportion of students who earned B’s on the final exam; a score of between 64 and 71 inclusive.

**# rows selected:** 51

**Scores:** 64 – 71

StatCrunch was used to count the B’s, and the output is shown above. The number of students who scored a B (between 64 and 71 points inclusive) was 51. The corresponding sample proportion is p̂ = 51/269 ≈ 0.1896.

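- 90% confidence interval to estimate the population proportion of students who earned a B on the final exam.

$$
\begin{aligned}
\text{90\% CI} &= \hat{p} \pm z^{*}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \\
&= 0.1896 \pm 1.645\sqrt{\frac{0.1896(1-0.1896)}{269}} \\
&= 0.1896 \pm 0.0393 \\
&= [0.1503,\ 0.2289]
\end{aligned}
$$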

- Verification of result from part (b) using StatCrunch

**One sample proportion summary confidence interval:**

p : Proportion of successes

Method: Standard-Wald

**90% confidence interval results:**

| Proportion | Count | Total | Sample Prop. | Std. Err. | L. Limit | U. Limit |
|---|---|---|---|---|---|---|
| p | 51 | 269 | 0.18959108 | 0.023899285 | 0.15028025 | 0.2289019 |

The StatCrunch confidence interval matches the one computed by hand in part (b): [0.1503, 0.2289].
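The by-hand interval in part (b) and the StatCrunch result in part (c) can also be cross-checked with a short script. The sketch below is only an illustration, not part of the assignment: the function name `wald_interval` is hypothetical, and it assumes the same Standard-Wald method that StatCrunch reports, using Python's standard library for the critical value.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(successes, n, confidence):
    """Return (lower, upper) limits of a one-proportion Wald interval."""
    p_hat = successes / n
    # Two-sided critical value z*, e.g. about 1.645 for 90% confidence
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# 51 B's out of 269 sampled students, 90% confidence
lower, upper = wald_interval(51, 269, 0.90)
print(f"[{lower:.4f}, {upper:.4f}]")
```

Rounded to four decimal places, the limits agree with the StatCrunch L. Limit and U. Limit values in the table above.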

- Interpretation of the confidence interval in part (c).

We are 90% confident that the population proportion of STAT 250 students who earn a B on the final exam is between 0.1503 and 0.2289.

- The CI obtained was [0.1503, 0.2289] and the true population proportion p = 0.21 was captured there.
- Simulation of one thousand 90% confidence intervals using p = 0.21 and n = 269.

- The “Prop. contained” value from part (f) is about the same as the 90% confidence level used in the simulation.
- If we repeatedly took random samples of 269 students and constructed a 90% confidence interval from each sample, about 90% of those intervals would capture the true population proportion.
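The long-run behavior described in parts (f) through (h) can be imitated outside the StatCrunch applet. The following is a minimal sketch under stated assumptions: it draws each sample as 269 independent Bernoulli trials with p = 0.21, builds a Wald interval from each, and reports the fraction of intervals that contain 0.21. The function name `proportion_contained` and the fixed seed are illustrative choices, so the result will not match the applet's image exactly.

```python
import random
from math import sqrt
from statistics import NormalDist


def proportion_contained(p, n, confidence, intervals, seed=0):
    """Simulate Wald intervals and return the fraction that capture p."""
    rng = random.Random(seed)  # fixed seed only so the run is repeatable
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    hits = 0
    for _ in range(intervals):
        # One simulated sample: count successes in n Bernoulli(p) trials
        successes = sum(rng.random() < p for _ in range(n))
        p_hat = successes / n
        margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= p <= p_hat + margin:
            hits += 1
    return hits / intervals


prop_contained = proportion_contained(p=0.21, n=269, confidence=0.90, intervals=1000)
print(prop_contained)
```

With 1000 intervals the printed fraction should land near 0.90, and it moves a little from seed to seed, which is the same run-to-run variation the applet shows.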

**Problem 2: Opinion on Sports Betting**

- The sample proportion of individuals who said “Support”

Support (1) frequency = 213 out of 382 responses.

Proportion = 213/382 ≈ 0.5576

- For α = 0.05, is there sufficient evidence to conclude that more than 54% of Virginia adults support legalized sports betting? Conduct a full hypothesis test by following the steps below.
- Population parameter: p is the proportion of all Virginia adults who support legalizing professional sports betting; the hypothesized value is 0.54.
- H_{0}: p = 0.54 against H_{A}: p > 0.54
- Significance level = 0.05
- Conditions of the Central Limit Theorem.

The sample must be a random sample from the population, and here the 382 Virginia adults were randomly selected. The sample must be no larger than 10% of the population, and 382 adults is far less than 10% of all Virginia adults. The sample must be large enough that np ≥ 10 and n(1 − p) ≥ 10 under the null hypothesis, and here 382(0.54) = 206.28 ≥ 10 and 382(0.46) = 175.72 ≥ 10.

- Test statistic calculated “by-hand.”

$$
z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}} = \frac{0.5576 - 0.54}{\sqrt{\dfrac{0.54(0.46)}{382}}} \approx 0.69
$$

- *p*-value using the standard Normal table.

From the table, P(Z < 0.69) = 0.7549. Because the alternative hypothesis is p > 0.54, the *p*-value is the area to the right of the test statistic: *p*-value = P(Z > 0.69) = 1 – 0.7549 = 0.2451.

- The p-value is greater than the significance level of 0.05, so we do not reject the null hypothesis.
- Because the null hypothesis was not rejected, the result is not statistically significant. There is not sufficient evidence to support the claim that more than 54% of Virginia adults support legalizing professional sports betting.
- StatCrunch output

**One sample proportion hypothesis test:**

Outcomes in: Opinion

Success: 0.5576

Group by: Opinion

p : Proportion of Successes

H_{0} : p = 0.54

H_{A} : p > 0.54

**Hypothesis test results:**

| Opinion | Count | Total | Sample Prop. | Std. Err. | Z-Stat | P-value |
|---|---|---|---|---|---|---|
| 0 | 0 | 169 | 0 | 0.038338264 | -14.085145 | 1 |
| 1 | 0 | 213 | 0 | 0.034149629 | -15.812763 | 1 |

The null hypothesis was not rejected.
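The by-hand test statistic and *p*-value for Problem 2 can be cross-checked with a short script. This is a sketch under assumptions rather than the required StatCrunch workflow: the helper name `one_proportion_z_test` is hypothetical, and the count of 213 "Support" responses out of 382 is the value implied by the sample proportion 0.5576 and the Opinion = 1 total in the table above.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0, alternative):
    """Return (z, p_value) for a one-proportion z-test against p0."""
    p_hat = successes / n
    # The standard error uses the hypothesized proportion p0, not p_hat
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    cdf = NormalDist().cdf
    if alternative == "greater":
        p_value = 1 - cdf(z)
    elif alternative == "less":
        p_value = cdf(z)
    elif alternative == "two-sided":
        p_value = 2 * (1 - cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return z, p_value


# Assumed counts: 213 "Support" responses among 382 sampled Virginia adults
z, p_value = one_proportion_z_test(213, 382, 0.54, "greater")
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A right-tailed *p*-value well above 0.05 leads to the same decision stated above: do not reject the null hypothesis.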

**Problem 3: Electric Cars**

- Conditions of the Central Limit Theorem

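The sample must be a random sample from the population, and here the 175 car sales were randomly selected. The sample must be no larger than 10% of the population, and 175 sales is far less than 10% of all 2018 car sales. The sample must be large enough that it contains at least 10 successes and at least 10 failures, and here there were 175(0.16) = 28 electric purchases and 175(0.84) = 147 non-electric purchases.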

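- 99% confidence interval.

Sample proportion: p̂ = 28/175 = 0.16

$$
\begin{aligned}
\text{99\% CI} &= \hat{p} \pm z^{*}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \\
&= 0.16 \pm 2.576\sqrt{\frac{0.16(1-0.16)}{175}} \\
&= 0.16 \pm 0.0714 \\
&= [0.0886,\ 0.2314]
\end{aligned}
$$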

- StatCrunch output.

**One sample proportion summary confidence interval:**

p : Proportion of successes

Method: Standard-Wald

**99% confidence interval results:**

| Proportion | Count | Total | Sample Prop. | Std. Err. | L. Limit | U. Limit |
|---|---|---|---|---|---|---|
| p | 28 | 175 | 0.16 | 0.027712813 | 0.088616524 | 0.23138348 |

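- Hypothesis test at α = 0.01
- Population parameter: p is the population proportion of Americans who purchased an electric car; the hypothesized value is 0.2.
- H_{0}: p = 0.2 against H_{A}: p ≠ 0.2
- Significance level = 0.01
- Test statistic.

$$
z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}} = \frac{0.16 - 0.2}{\sqrt{\dfrac{0.2(0.8)}{175}}} = -1.32 \text{ (2 d.p.)}
$$

- *p*-value.

Because the alternative hypothesis is two-sided, *p*-value = 2 × P(Z < −1.32) = 2(0.0934) = 0.1868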

- Decision

Do not reject the null hypothesis; the p-value is greater than the significance level of 0.01.

- The null hypothesis was not rejected. There is not sufficient evidence to conclude that the proportion of Americans who purchased an electric car is different from 0.2.
- StatCrunch output

**One sample proportion summary hypothesis test:**

p : Proportion of successes

H_{0} : p = 0.2

H_{A} : p ≠ 0.2

**Hypothesis test results:**

| Proportion | Count | Total | Sample Prop. | Std. Err. | Z-Stat | P-value |
|---|---|---|---|---|---|---|
| p | 28 | 175 | 0.16 | 0.030237158 | -1.3228757 | 0.1859 |

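- The 99% confidence interval [0.0886, 0.2314] contains the hypothesized value 0.2, which agrees with the decision not to reject the null hypothesis at α = 0.01. Both results indicate that there is not sufficient evidence that the population proportion is different from 0.2.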
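The connection between the two-sided test and the confidence interval in Problem 3 can be illustrated numerically. The sketch below is an assumption-laden illustration, not the graded method: it recomputes the two-sided z-test at α = 0.01 and the 99% Wald interval from the same counts (28 of 175) and compares the two decisions. The variable names are illustrative. The test's standard error uses the hypothesized 0.2 while the interval's uses the sample proportion, so the two procedures usually agree, as they do here, but agreement is not guaranteed in every borderline case.

```python
from math import sqrt
from statistics import NormalDist

successes, n, p0, alpha = 28, 175, 0.20, 0.01
p_hat = successes / n
std_normal = NormalDist()

# Two-sided z-test: standard error is based on the hypothesized p0
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value = 2 * (1 - std_normal.cdf(abs(z)))
reject = p_value < alpha

# Wald interval at level 1 - alpha: standard error is based on p_hat
z_star = std_normal.inv_cdf(1 - alpha / 2)
margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - margin, p_hat + margin
contains_p0 = lower <= p0 <= upper

print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject}")
print(f"99% CI = [{lower:.4f}, {upper:.4f}], contains 0.2: {contains_p0}")
```

The software *p*-value differs slightly from the table-based 0.1868 because the table lookup rounds z to two decimal places first.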

**Problem 4: House Prices**

- Relative frequency histogram

- The distribution of closing prices is skewed to the right (positively skewed): most of the data are concentrated on the left side of the graph and the longer tail extends toward the higher prices.
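- Population mean $510,000, standard deviation $145,000, n = 10

Central Limit Theorem condition of a large sample size: because the population distribution is right-skewed rather than Normal, the sample must be large (n ≥ 30 is a common rule of thumb) for the sample mean to be approximately Normally distributed.

Here n = 10, which is less than 30, so the condition is not met and the problem cannot be completed.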

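- Population mean $510,000, standard deviation $145,000, n = 36

Central Limit Theorem condition of a large sample size: because the population distribution is right-skewed rather than Normal, the sample must be large (n ≥ 30 is a common rule of thumb).

Here n = 36, which is greater than 30, so the condition is met and the sample mean is approximately Normally distributed.

P(x̄ > 600,000)

$$
z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{600{,}000 - 510{,}000}{145{,}000/\sqrt{36}} = \frac{90{,}000}{24{,}166.67} \approx 3.72
$$

P(Z > 3.72) = 1 − 0.9999

= 0.0001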
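The sample-mean probability for the n = 36 case can be checked with a short script. This is a sketch that assumes the sampling distribution of the mean is approximately Normal with mean 510,000 and standard error 145,000/√n (in dollars), which is only justified when the large-sample condition holds. The function name `prob_sample_mean_exceeds` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def prob_sample_mean_exceeds(threshold, mu, sigma, n):
    """P(sample mean > threshold) under a Normal sampling distribution."""
    standard_error = sigma / sqrt(n)  # spread of the sample mean, not of one home
    return 1 - NormalDist(mu, standard_error).cdf(threshold)


# n = 36 meets the large-sample condition; n = 10 does not, so it is not computed
z_36 = (600_000 - 510_000) / (145_000 / sqrt(36))
p_36 = prob_sample_mean_exceeds(600_000, 510_000, 145_000, 36)
print(f"z = {z_36:.2f}, P(x-bar > 600,000) = {p_36:.4f}")
```

The larger sample shrinks the standard error from 145,000 to about 24,167, which is why a sample mean above 600,000 is so unlikely.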