High School: Statistics and Probability

Making Inferences and Justifying Conclusions HSS-IC.A.2

2. Decide if a specified model is consistent with results from a given data-generating process, e.g., using simulation. For example, a model says a spinning coin falls heads up with probability 0.5. Would a result of 5 tails in a row cause you to question this model?

Students should understand that sometimes, weird things happen in statistics. Not spooky weird, but an almost unnatural weird. Like flipping a coin ten times and ending up with ten heads. Odd.
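A quick simulation makes the point concrete. The sketch below is only an illustration, not part of the standard: it assumes a fair coin (heads with probability 0.5) and uses made-up names (`all_tails`, `estimate_streak_probability`) to estimate how often 5 flips in a row all come up tails. The exact answer is 0.5^5 = 0.03125, so the estimate should land close to 3%.

```python
import random


def all_tails(num_flips, p_heads, rng):
    """Flip a coin num_flips times; return True if every flip is tails."""
    # rng.random() is in [0, 1); a value below p_heads counts as heads.
    return all(rng.random() >= p_heads for _ in range(num_flips))


def estimate_streak_probability(num_flips=5, p_heads=0.5, trials=100_000, seed=0):
    """Estimate the chance that num_flips flips in a row are all tails."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(all_tails(num_flips, p_heads, rng) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    estimate = estimate_streak_probability()
    exact = 0.5 ** 5
    print(f"simulated: {estimate:.4f}, exact: {exact:.4f}")
```

Roughly 3 runs in 100 produce the streak, which is rare but far from impossible. Ten heads in ten flips (0.5^10, about 0.1%) is rarer still.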

Long ago, statisticians would seek the help of mental health professionals to figure out if they were crazy when these strange events kept happening. The mental health industry has been devastated ever since statisticians came up with standardized mathematical ways that help explain these phenomena. (The correlation between statisticians and the mental health industry has yet to be statistically confirmed.)

Students shouldn't freak out when things don't go exactly according to plan. There are ways to test whether results fit nicely into a statistical model or not. In statistics, the options for numerical tests are as numerous and appealing as germs on a hotel room comforter. By the way, good luck sleeping on your next vacation.

Students should be familiar with Goodness of Fit tests (which aren't the same tests used in JC Penney changing rooms). Students should also know that these tests help measure whether or not a statistical model fits certain observations.

The Chi-Squared Goodness of Fit Test (called the Chi-Squared test, for short) assumes that any discrepancy between our data and the model is the result of chance rather than a faulty model. We can use the Chi-Squared test provided we have a large enough population, an appropriate random sample, and all that other good stuff that comes with proper statistical studies.

Students should know how to calculate the value of χ², where

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

In the formula, O is our observed frequency value and E is our expected frequency value, and the sum runs over every category.
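As a minimal sketch of that calculation, the function below adds up (O − E)²/E over all categories. The name `chi_squared` and the sample counts are made up for illustration.

```python
def chi_squared(observed, expected):
    """Return the sum of (O - E)^2 / E over matching categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same number of categories")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts: 3 categories, 100 observations in total.
observed = [18, 55, 27]
expected = [25, 50, 25]
print(round(chi_squared(observed, expected), 2))  # 1.96 + 0.50 + 0.16 = 2.62
```

Each term is a squared difference divided by a positive expected count, so no term can ever be negative.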

Students also need to find the degrees of freedom, which equals the number of categories in our sample minus 1. So if we have 4 different kinds of fruit, that means we have 3 degrees of freedom. Simple enough.

As a side note, statisticians love tables. Not round tables or square tables or three-legged tables. We mean tables of values. Never-ending columns and rows of numbers upon numbers. Whatever floats their boat, right?

Students don't have to like these tables, but they should know how to use them. By that we mean comparing our χ² value to the number on the table corresponding to the degrees of freedom and a significance level of p = 0.05. If χ² is larger than that number (the critical value), then our data doesn't quite match the model. If χ² is less than the critical value, the model works well enough.
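The table lookup can be sketched in code, too. The snippet below assumes the usual critical values for a significance level of 0.05 (copied from a standard chi-squared table for 1 to 5 degrees of freedom); the function name `fits_model` and the fruit counts are hypothetical.

```python
# Critical values of the chi-squared distribution at significance level 0.05,
# keyed by degrees of freedom (taken from a standard chi-squared table).
CRITICAL_VALUES_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def fits_model(observed, expected):
    """Return (chi2, df, critical, fits) for a goodness-of-fit check at 0.05."""
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # number of categories minus 1
    critical = CRITICAL_VALUES_05[df]
    return chi2, df, critical, chi2 < critical


# Hypothetical example: 4 kinds of fruit, so 3 degrees of freedom.
observed = [22, 31, 19, 28]
expected = [25, 25, 25, 25]
chi2, df, critical, fits = fits_model(observed, expected)
print(f"chi2 = {chi2:.2f}, df = {df}, critical = {critical}, fits = {fits}")
```

Here χ² = 3.60 is below the critical value of 7.815, so these made-up fruit counts are consistent with the model.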

Drills

  1. Different types of cell phone brands were sold at the local store over a one-month period. The data below represents how many phones were sold and how many were projected to be sold. Did the model predict the expected values accurately enough? Assume p = 0.05.

    Category    Observed    Expected
    iPhone      198         213
    Motorola    32          28
    Nokia       15          10

    Correct Answer:

    Yes, the χ² value was too low to reject the model

    Answer Explanation:

    To start off, the Chi-Squared test assumes there is no significant difference between observed and predicted frequencies in cell phone model sales. But we need to calculate our χ² value to find out. If we do so, we end up with χ² = 4.13. We have two degrees of freedom and a p = 0.05 value, so the critical value on the table is 5.991. Our χ² is lower than that, so we've narrowed our answer choices down to (B) or (D), but it's important to remember that a low χ² value means the model fits the data well enough.


  2. A city crime commission has modeled the expected crime rates for certain crimes. The observed frequencies are detailed as well. What is the χ² value?

    Category    Observed    Expected
    Robbery     1283        1459
    Assault     456         345
    Murder      839         967
    Theft       5683        5671

    Correct Answer:

    73.9

    Answer Explanation:

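    All we need to do is apply our formula:

    $$\chi^2 = \sum \frac{(O - E)^2}{E}$$

    If we do that for each category (about 21.23 for robbery, 35.71 for assault, 16.94 for murder, and 0.03 for theft), we end up with a total of 73.91254, which is closest to (C). That value doesn't mean much without another value to compare it to, but that's what the question asked for.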


  3. When the χ² value is negative, what does this imply?

    Correct Answer:

    The calculation was performed incorrectly

    Answer Explanation:

    It's impossible to have a negative χ² value. After all, it's squared for a reason. Usually, students forget to square their (O − E) values, which might lead to negatives, but it's not a big deal. Just redo the calculation.


  4. An airport traffic model shows that 25% of airplanes arrive late, 25% of airplanes arrive early, and 50% of airplanes arrive on time. Out of 20 airplanes, 10 arrive late, 5 arrive early, and 5 arrive on time. Is this model valid? (Assume p = 0.05.)

    Correct Answer:

    No, the model is invalid

    Answer Explanation:

    Out of 20, our expected values for late, early, and on-time planes are 5, 5, and 10, respectively. This means our χ² value comes out to 7.5. With 2 degrees of freedom and p = 0.05, our critical value is 5.991. Since our χ² value is greater than the critical value, the model is invalid and (A) is the right answer.


  5. Your basketball team has decided to figure out the shooting percentages of all of its players by creating a model. The study sorts the players into categories by age: 14, 15, 16, 17, and 18. How many degrees of freedom are there?

    Correct Answer:

    4

    Answer Explanation:

    Degrees of freedom are the number of categories minus one. Our categories are the 5 different ages. Since 5 – 1 = 4, our answer is (C).


  6. Several quantitative financial analysts working for a giant bank have created a financial model for Apple's stock price. They predict the following prices over the next 12 months. The observed values are shown as well. Calculate the χ² value and find out whether or not their model is valid. (Assume p = 0.05.)

    Month    Observed Price    Expected Price
    Jan      334               333
    Feb      333               333
    March    332               333
    April    331               333
    May      325               335
    June     320               334
    July     315               320
    Aug      317               321
    Sept     319               315
    Oct      322               315
    Nov      320               316
    Dec      319               317

    Correct Answer:

    The model is valid; there is no observable statistically significant difference

    Answer Explanation:

    We have 12 – 1 = 11 degrees of freedom. After calculating the χ² value, we get a number close to 1.3. With 11 degrees of freedom and a significance level of 0.05, the critical value is 19.675, so our χ² value is well below it. This means the model is fairly accurate.


  7. Your teacher claims that if you were to come up to the front of the class and randomly select one of ten numbers, 1 through 10, his model would predict the numbers you choose with statistical accuracy. In fact, he's so confident he's willing to lower the p value to 0.01. How good is his model at predicting your behavior?

    Observed Value    Expected Value
    1                 4
    3                 1
    5                 7
    2                 2
    7                 6
    1                 4
    10                1
    4                 10
    8                 4
    4                 3

    Correct Answer:

    His model doesn't work, since the values are too different to be due to chance alone

    Answer Explanation:

    If we calculate our χ² value from the table, we end up with about 98.17. Our critical value is 21.666 (since we have 10 – 1 = 9 degrees of freedom and p = 0.01). Since our χ² value is greater than our critical value, your teacher's model doesn't work. He could learn a thing or two from you.


  8. Identify the mistake in this χ² value calculation.

    O        E        (O − E)²    χ²
    35       35       0           0
    235      235      0           0
    7563     7561     -2          -0.00026
    13423    13400    -23         -0.00172
    2532     2500     -32         -0.0128
    435      450      15          0.033333
    54645    54000    -645        -0.01194

    Correct Answer:

    The third column was not actually squared, despite what's written

    Answer Explanation:

    If you did the calculations or if you simply noticed the negative signs, you'd realize that the third column was simply not squared, leading to the wrong result. All other answers are completely untrue. While answer (B) is partially right, the χ² value can often be very small, which is not a cause for concern if all the calculations are performed correctly.


  9. After several hours of isolation, 16 people are asked to estimate the time. Scientists designed a model that predicts how far off in minutes their guesses would be. The observed differences and the expected differences were assembled into the following table. Was the model accurate? (Assume p = 0.05.)

    Observed    Expected
    1           5
    2           1
    5           6
    2           2
    8           8
    1           3
    4           1
    10          7
    16          2
    4           4
    32          6
    6           1
    2           7
    4           1
    7           5
    2           2

    Correct Answer:

    No

    Answer Explanation:

    With 16 – 1 = 15 degrees of freedom and p = 0.05, the critical value on the table is 24.996. A quick calculation shows that the χ² value is humongous (over 265) and far greater than that. Clearly, this model needs some more work.


  10. If our χ² value comes out to be greater than the critical value, which of the following can we assume?

    Correct Answer:

    There is a significant difference between the model and the results

    Answer Explanation:

    The greater our χ² value is, the greater the differences between the observed and expected values. That means that when our χ² value is larger than the critical value, the differences between the observed and expected values have grown too large to blame on chance alone, so the results no longer fit the model.
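The arithmetic in the numeric drills can be double-checked with a short script. This is a sketch rather than an official answer key: the names `chi_squared`, `DRILLS`, and `CRITICAL_VALUES_05` are made up, the observed and expected counts are copied from Drills 1, 2, 4, 6, and 9, and the critical values for p = 0.05 are assumed from a standard chi-squared table.

```python
# Critical values at significance level 0.05, keyed by degrees of freedom
# (from a standard chi-squared table; only the rows these drills need).
CRITICAL_VALUES_05 = {2: 5.991, 3: 7.815, 11: 19.675, 15: 24.996}


def chi_squared(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Drill 4 gives the model as proportions, so convert them to expected counts.
planes = 20
drill_4_expected = [share * planes for share in (0.25, 0.25, 0.50)]

# Drill number -> (observed counts, expected counts)
DRILLS = {
    1: ([198, 32, 15], [213, 28, 10]),
    2: ([1283, 456, 839, 5683], [1459, 345, 967, 5671]),
    4: ([10, 5, 5], drill_4_expected),  # late, early, on time
    6: ([334, 333, 332, 331, 325, 320, 315, 317, 319, 322, 320, 319],
        [333, 333, 333, 333, 335, 334, 320, 321, 315, 315, 316, 317]),
    9: ([1, 2, 5, 2, 8, 1, 4, 10, 16, 4, 32, 6, 2, 4, 7, 2],
        [5, 1, 6, 2, 8, 3, 1, 7, 2, 4, 6, 1, 7, 1, 5, 2]),
}

for number, (observed, expected) in DRILLS.items():
    chi2 = chi_squared(observed, expected)
    df = len(observed) - 1  # categories minus 1
    critical = CRITICAL_VALUES_05[df]
    verdict = "consistent with model" if chi2 < critical else "model rejected"
    print(f"Drill {number}: chi2 = {chi2:.2f}, df = {df}, "
          f"critical = {critical}, {verdict}")
```

The script reproduces the values quoted in the explanations: about 4.13, 73.91, 7.5, 1.3, and just over 265. Drill 2 only asks for the χ² value, so the verdict printed for it is extra.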

