One-Sample and Two-Sample Means Tests

1 Sample t Test
The 1 sample t test allows us to determine whether the mean of a sample data set is different from a known value.
Used when the population variance is not known.
Can be used when the sample size is small.
Use n-1 degrees of freedom.

For example, we are interested in determining if the mean per capita income of West Virginia counties is different from the national average, and we suspect based on a priori (beforehand) knowledge that it is lower.
Our hypotheses are:
Ho: The per capita income of West Virginia counties is not significantly less than the national average.
Ha: The per capita income of West Virginia counties is significantly less than the national average.

Procedure Steps
1. Determine if West Virginia county per capita income is normally distributed.
2. Proceed with the 1 sample t test.
We do not need to know the normality of US per capita income. All we need to know is the mean.

                 Per capita income   Median household income   Median family income   Population    Households
United States    27,334              51,914                    62,982                 308,745,538   116,716,292
West Virginia    19,443              38,380                    48,896                 1,852,994     763,831

This is why we suspect WV is lower. But is this difference statistically significant?
Important: remember that we develop null hypotheses based on theory and/or what we see in the data.

W/S Test for Normality
sd = 3,075.28
range = 16,778
q = w/s = 16,778 / 3,075.28 = 5.46
q critical = 3.90, 5.46
Since 3.90 < q = 5.46 ≤ 5.46, accept Ho: the data are not significantly different from normal (w/s = 5.46, p > 0.05). Just barely normal.
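The w/s calculation can be sketched in Python as follows. This is a minimal illustration rather than a full normality test: the function names are hypothetical, and the critical bounds still have to be read from a w/s table for the sample size at hand (3.90 and 5.46 are the values quoted for the West Virginia data).

```python
import statistics


def ws_statistic(sample_range, sd):
    """Return q = w / s: the sample range divided by the standard deviation."""
    return sample_range / sd


def ws_from_data(values):
    """Compute q directly from raw observations (sample standard deviation)."""
    return ws_statistic(max(values) - min(values), statistics.stdev(values))


def within_critical_bounds(q, lower, upper):
    """True when q lies inside the tabled critical range (Ho is accepted)."""
    return lower <= q <= upper


# West Virginia county per capita income, using the summary values above.
q = ws_statistic(16778, 3075.28)
print(round(q, 2))
print(within_critical_bounds(q, 3.90, 5.46))
```

When the raw county values are available, `ws_from_data` gives the same statistic without computing the range and standard deviation by hand.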

The ‘barely normal’ result of the w/s test is due to its testing of k3,so it is more sensitive to leptokurtic data.

The t statistic is calculated as:
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$
Where x̄ is the sample mean, µ is the true mean, s is the sample standard deviation, and n is the sample size.

The effect of sample size on t test results:
n = 7: $t = \frac{15 - 13}{2/\sqrt{7}} = \frac{2}{2/2.65} = \frac{2}{0.76} = 2.63$
n = 14: $t = \frac{15 - 13}{2/\sqrt{14}} = \frac{2}{2/3.74} = \frac{2}{0.53} = 3.77$
Small sample sizes make tests MORE conservative (i.e. harder to gain significance). Small but important differences may not be detected.
Increasing the sample size allows us to detect smaller, but statistically significant, differences. Larger sample sizes make statistical tests more powerful, but remember there is a point of diminishing returns.
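The sample-size effect is easy to check numerically. The sketch below assumes the same illustrative values as the example (sample mean 15, true mean 13, s = 2); the function name is hypothetical. Because it does not round intermediate steps, it gives about 2.65 and 3.74 instead of the hand-rounded 2.63 and 3.77.

```python
import math


def one_sample_t(sample_mean, mu, s, n):
    """t = (sample mean - true mean) / (s / sqrt(n))."""
    return (sample_mean - mu) / (s / math.sqrt(n))


# Same means and standard deviation, two different sample sizes.
for n in (7, 14):
    print(n, round(one_sample_t(15, 13, 2, n), 2))
```

Doubling n shrinks the standard error by a factor of √2, so the same 2-unit difference produces a larger t.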

          West Virginia   United States
Mean:     19253.2         27334
s:        3075.28
n:        54

$$t = \frac{19253.2 - 27334.0}{3075.28 / \sqrt{54}} = \frac{-8080.8}{418.5} = -19.31$$
df = 54 - 1 = 53
We take the calculated t value and the df to the t table to determine the probability.
Although we ignore the sign on the t table, we can tell from the sign that WV per capita income is less than the national average.

t = -19.31, df = 53
Since our df is not listed, use the next lower df.
Where does the calculated t (-19.31) fall?

Since |-19.31| > 1.684, reject H0.
The per capita income of West Virginia is significantly less than the national average (t = -19.31, p < 0.005).
West Virginia's per capita income is very low.
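The whole one-tailed decision can be sketched in Python. This is a minimal illustration, assuming the summary values quoted for West Virginia (mean 19253.2, s 3075.28, n 54) and the tabled one-tailed critical value 1.684; the function names are hypothetical. With these inputs the standard error s/√n is about 418.5, so t comes out at about -19.3.

```python
import math


def one_sample_t(sample_mean, mu, s, n):
    """Return the t statistic and its degrees of freedom (n - 1)."""
    t = (sample_mean - mu) / (s / math.sqrt(n))
    return t, n - 1


def reject_lower_tailed(t, t_critical):
    """Lower-tailed test: reject H0 when t falls below -t_critical."""
    return t < -t_critical


t, df = one_sample_t(19253.2, 27334.0, 3075.28, 54)
# 1.684 is read from the t table at the next lower listed df (assumed here).
print(round(t, 2), df, reject_lower_tailed(t, 1.684))
```

The sign matters in a one-tailed test: a large positive t would not support the hypothesis that West Virginia is lower.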

Note that the confidence intervals for the mean are calculated using the critical t value from the table. Use the 2-tailed value, or 0.025 if it is a 1-tailed table.
$$\Pr\left[\bar{x} - t_{\alpha,df}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha,df}\frac{s}{\sqrt{n}}\right] = 1 - \alpha$$
If the df you need is not in the table, use the next LOWER value.
$$\Pr\left[19553.2 - 2.021\frac{2757.04}{\sqrt{54}} \le \mu \le 19553.2 + 2.021\frac{2757.04}{\sqrt{54}}\right]$$
$$18794.9 \le \mu \le 20311.5$$
We are 95% confident that the mean income for West Virginia falls within this range.
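The interval arithmetic can be sketched as below. The function name is hypothetical, and the critical value 2.021 is taken from the t table (0.025 column, next lower df) as described, not computed.

```python
import math


def mean_confidence_interval(mean, s, n, t_critical):
    """Return (lower, upper) = mean -/+ t_critical * s / sqrt(n)."""
    half_width = t_critical * s / math.sqrt(n)
    return mean - half_width, mean + half_width


low, high = mean_confidence_interval(19553.2, 2757.04, 54, 2.021)
print(low, high)
```

With the values used in the worked interval, the half-width is about 758, which reproduces the limits of roughly 18794.9 and 20311.5.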

df = 53
We're using a 1-tailed table, so we need to split 0.05 between the tails. Therefore, use the 0.025 column.

Two-Sample T-Test for Means
Used to compare one sample mean to another.
Two different tests:
Equal variances
Unequal variances
Homoscedasticity: the assumption of equal variances.

When data meet the assumption of normality we can use confidence intervals to quantify the area of uncertainty (in this case 5%).
We are 95% confident that the mean from this group will not fall in this range.
Here we are 95% confident that the means do not overlap and that these two groups are significantly different.

In this example the means could occur in an overlapping region and we are less than 95% certain that they are significantly different.

When the variances are not equal, the region of overlap is asymmetrical and the probabilities associated with the location of the mean are not the same. To avoid a Type 1 error we use a more conservative approach.

Test 1: Equal Variances
The test statistic is:
$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}, \qquad df = n_1 + n_2 - 2$$
$s_1^2$ and $s_2^2$ are the variances (or squared standard deviations) for each sample, $n_1$ and $n_2$ are the group sample sizes.
Therefore t is the distance between the two means, in standard deviation units, taking into consideration sample size and spread.
For the 2-sample t test we know 2 means, therefore the degrees of freedom would be: df = n1 + n2 - 2.

When comparing two or more groups, each group MUST be tested for normality individually.
If we pool the data and test for normality, then we are assuming that the data are from the same population, which is what we are trying to determine with the t test.

To determine if the variances are equal, use the equation:
$$F = \frac{s_1^2}{s_2^2}, \qquad df = (n_1 - 1,\ n_2 - 1)$$
where n1 - 1 is the numerator df and n2 - 1 is the denominator df.
Which is just the ratio of one variance to the other.
The F results are then compared to the F table.

Note that there are several additional pages not shown

Example: 2-Sample T Test with Equal Variances
Research question: Is there a difference in the sitting height between the Arctic and Great Plains Native Americans?
H0: There is no significant difference in the sitting heights of Arctic and Great Plains Native Americans.
Ha: There is a significant difference in the sitting heights of Arctic and Great Plains Native Americans.

          N    Mean       S²      Range
Arctic    8    864.9 cm   488.2   71.2
Plains    16   882.7 cm   432.6   84.22

The order of operations for conducting the test is:
1. Test each group for normality. Proceed with the t test if both groups are normal.
2. Conduct an F test.
   A. If Ho is accepted then proceed with the equal variances t test.
   B. If Ho is rejected then proceed with Welch's approximate t test.
3. Conduct the appropriate t test.
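The order of operations above amounts to a small decision rule, sketched here in Python. The function name and the returned labels are hypothetical; the normality results and the F critical value are assumed to come from the w/s test and the F table.

```python
def choose_t_test(group1_normal, group2_normal, f_stat, f_critical):
    """Pick the two-sample test by following steps 1 to 3 in order."""
    # Step 1: both groups must pass the normality test individually.
    if not (group1_normal and group2_normal):
        return "t test not appropriate: normality assumption not met"
    # Step 2A: Ho of the F test accepted, so treat the variances as equal.
    if f_stat < f_critical:
        return "equal variances t test"
    # Step 2B: Ho of the F test rejected, so the variances differ.
    return "Welch's approximate t test"


print(choose_t_test(True, True, 1.13, 2.71))
print(choose_t_test(True, True, 11.8, 1.35))
```

The two calls use the F statistics and critical values from the sitting height and wetland examples in this section.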

Test each group for normality:
Arctic: n = 8, Range = 71.2, Variance = 488.2
q = 71.2 / √488.2 = 3.22
q critical = 2.50, 3.399
Since 2.50 < q = 3.22 < 3.399, accept Ho.
Arctic sitting height is not different from normal (q = 3.22, p > 0.05).
Plains: n = 16, Range = 84.22, Variance = 432.6
q = 84.22 / √432.6 = 4.049
q critical = 3.01, 4.24
Since 3.01 < q = 4.049 < 4.24, accept Ho.
Plains sitting height is not different from normal (q = 4.049, p > 0.05).
Since both groups are normal, proceed to the variance (F) test.

Variance or F Test:
H0: The variances are not significantly different.
Ha: The variances are significantly different.
α = 0.05
F = 488.2 / 432.6 = 1.13
df = (8 - 1, 16 - 1) = (7, 15), because the numerator variance (488.2) comes from the Arctic group with n = 8.
From the F table the critical value for 7,15 df is 2.71.

Note that on the F table the probabilities are read in the columns rather than the rows.
The calculated F statistic falls here between 0.25 and 0.50.

Since our calculated F statistic is less than the critical value from the table (1.13 < 2.71) we assume that the variances are equal.
The variances for the two sitting heights are not significantly different (F = 1.13, 0.50 > p > 0.25).
Now we can proceed with the two-sample t test.
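A short Python sketch of the variance ratio follows. It assumes the larger variance goes in the numerator, as in the worked examples, and reports the numerator df first, as in the F formula; the function name is hypothetical and the critical value 2.71 is taken from the F table, not computed.

```python
def variance_ratio(var_a, n_a, var_b, n_b):
    """Return (F, (numerator df, denominator df)), larger variance on top."""
    if var_a >= var_b:
        return var_a / var_b, (n_a - 1, n_b - 1)
    return var_b / var_a, (n_b - 1, n_a - 1)


# Arctic: variance 488.2, n = 8; Plains: variance 432.6, n = 16.
f_stat, df = variance_ratio(488.2, 8, 432.6, 16)
f_critical = 2.71  # from the F table (assumed input)
print(round(f_stat, 2), df, f_stat < f_critical)
```

Swapping the argument order gives the same F and df, since the function always puts the larger variance on top.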

          N    Mean       S²      Range
Arctic    8    864.9 cm   488.2   71.2
Plains    16   882.7 cm   432.6   84.22

$$t = \frac{864.9 - 882.7}{\sqrt{\frac{488.2}{8} + \frac{432.6}{16}}} = \frac{-17.8}{9.38} = -1.90$$
Since we do not care about direction (larger or smaller) the sign does not matter. If we do care about direction then the sign is very important.

For example:
No direction implied (doesn't matter), Arctic - Plains:
$$t = \frac{864.9 - 882.7}{\sqrt{\frac{488.2}{8} + \frac{432.6}{16}}} = \frac{-17.8}{9.38} = -1.90$$
Sign has no meaning.
Arctic tribe shorter than plains tribe, Arctic - Plains:
$$t = \frac{864.9 - 882.7}{\sqrt{\frac{488.2}{8} + \frac{432.6}{16}}} = \frac{-17.8}{9.38} = -1.90$$
Sign means the arctic tribe's mean is less than the plains tribe's mean.
Plains tribe taller than arctic tribe, Plains - Arctic:
$$t = \frac{882.7 - 864.9}{\sqrt{\frac{432.6}{16} + \frac{488.2}{8}}} = \frac{17.8}{9.38} = 1.90$$
Sign means the plains tribe's mean is greater than the arctic tribe's mean.

Remember that the positioning of each group in the numerator of the t equation is related to the sign of the results.
Be very careful how you construct the equation's numerator, AND the way you interpret the results.
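The effect of group order on the sign can be checked with a few lines of Python. This sketch implements the test statistic exactly as written in this section, with df = n1 + n2 - 2; the function name is hypothetical.

```python
import math


def two_sample_t(mean1, var1, n1, mean2, var2, n2):
    """Return (t, df) with group 1 first in the numerator."""
    standard_error = math.sqrt(var1 / n1 + var2 / n2)
    return (mean1 - mean2) / standard_error, n1 + n2 - 2


# Arctic - Plains: negative t, the Arctic mean is the smaller one.
t_arctic_first, df = two_sample_t(864.9, 488.2, 8, 882.7, 432.6, 16)
# Plains - Arctic: same magnitude, opposite sign.
t_plains_first, _ = two_sample_t(882.7, 432.6, 16, 864.9, 488.2, 8)
print(round(t_arctic_first, 2), round(t_plains_first, 2), df)
```

Only the sign changes when the groups are swapped, so a 2-tailed test can compare the absolute value of t against the critical value.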

Note: use a 2-tailed test since we are just interested in whether there is a difference, not if one is greater or less than the other.

In Table A.3 the critical value for df = 22 is 2.074.
Since 1.90 < 2.074 we therefore accept Ho:
There is no significant difference in the sitting heights of Arctic and Great Plains Native Americans (t = -1.90, 0.10 > p > 0.05).
The probability range (e.g. 0.10 > p > 0.05) is from the t-table.

Note: 2-tailed t = 1.90 for a df (v) of 22 falls between 0.10 and 0.05.

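Test 2: Unequal Variances (Welch's approximate t)
The test statistic is:
$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
and
$$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$$
$s_1^2$ and $s_2^2$ are the variances (or squared standard deviations) for each sample, $n_1$ and $n_2$ are the group sample sizes.
The df may be non-integer, in which case the next smaller integer should be used. Take this number to the t table.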

Is the vegetation index for wetland 35 less than the index for 23?
ID 23: Mean = 2544, Variance = 34055, n = 144
ID 35: Mean = 1887, Variance = 2884, n = 144
Ho: The variances are not significantly different.
F = 34055 / 2884 = 11.8
F critical (144,144) = 1.35, from the F table for df 120,120 (next lower on the table).
The wetland variances are significantly different (F = 11.8, p < 0.001).

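$$t = \frac{\bar{X}_{35} - \bar{X}_{23}}{\sqrt{\frac{s_{35}^2}{n_{35}} + \frac{s_{23}^2}{n_{23}}}} = \frac{1887 - 2544}{\sqrt{\frac{2884}{144} + \frac{34055}{144}}} = \frac{-657}{\sqrt{20.0 + 236.5}} = \frac{-657}{16.02} = -41.0$$
and
$$df = \frac{\left(\frac{34055}{144} + \frac{2884}{144}\right)^2}{\frac{(34055/144)^2}{144 - 1} + \frac{(2884/144)^2}{144 - 1}} = \frac{65792.25}{391.1 + 2.8} = 167$$
t critical = 1.654
The vegetation index for wetland 35 is significantly less than the index for wetland 23 (t = -41.0, p < 0.001).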
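Welch's statistic and its degrees of freedom can be sketched in Python as below. The function name is hypothetical; the df is rounded down to the next smaller integer, as the test description requires. With the wetland summary values, the square root of 20.0 + 236.5 is about 16.0, so t comes out near -41.0 with df = 167.

```python
import math


def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Return (t, df) for Welch's approximate t, df rounded down."""
    a = var1 / n1  # s1^2 / n1
    b = var2 / n2  # s2^2 / n2
    t = (mean1 - mean2) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, math.floor(df)


# Wetland 35 first, so a negative t means wetland 35 has the lower index.
t, df = welch_t(1887, 2884, 144, 2544, 34055, 144)
print(round(t, 1), df)
```

Because the question is directional (is wetland 35 lower?), t is compared against the one-tailed critical value, 1.654 in the example.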

Variable Distributions and Test Statistic Distributions
The variable may have a negative exponential distribution (e.g. frequency of Ebola cases per outbreak). Most locations have very few people with Ebola.

Take 100 samples from the previous distribution. Calculate a mean for each sample. The central limit theorem states that the sample means will be approximately normally distributed.
[Figure: histograms of 24, 48, and 72 sample means]

Some sample means will be greater than the true mean, some will be less. However, if the number of sample means is large they will take on the properties of a normal distribution. This is true even if the underlying population has a different distribution.
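A small simulation illustrates the idea. This is a sketch under stated assumptions, not a reproduction of the slides' data: the exponential rate of 1.0, the sample size of 50 and the seed are arbitrary choices, and the function name is hypothetical.

```python
import random
import statistics


def sample_means(n_samples, sample_size, rate, seed=0):
    """Draw n_samples samples from an exponential distribution and
    return the mean of each sample."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_samples):
        sample = [rng.expovariate(rate) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return means


means = sample_means(n_samples=100, sample_size=50, rate=1.0)
# The population mean is 1 / rate; the sample means cluster around it
# with a spread close to (1 / rate) / sqrt(sample_size).
print(statistics.fmean(means), statistics.stdev(means))
```

Plotting a histogram of `means` would show a roughly symmetric, bell-shaped pile even though the individual observations are strongly right-skewed.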

Robustness in Statistics
Robust: when a statistical technique remains useful even when one or more of its assumptions are violated.
In general, if the sample size is large, moderate deviations from the statistical technique's assumptions will not invalidate the results.
The F and t tests are considered to be fairly robust; other tests are not, so simply increasing the sample size may not be the best approach if the data violate assumptions.