Statistics FAQ

What is an outlier?

An outlier is a data point that does not appear to come from the same distribution as the rest of the data. This may occur, for example, if a participant is not following, or has misunderstood, the given instructions. If the data are normally distributed, one would expect 31.73% of the values to be at least 1 standard deviation (SD) away from the mean, 4.55% to be at least 2 SD from the mean, and 0.27% to be at least 3 SD from the mean. As a rule of thumb, unless you have over 100 participants, any value more than 3 SD from the mean should be excluded as an outlier.
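
As an illustration, the tail proportions and the 3 SD rule can be sketched in Python using only the standard library. The function names and the scores are made up for this example, and the mean and SD are calculated with the suspect value still in the data, which is an assumption about how the rule is applied.

```python
from statistics import NormalDist, mean, stdev


def two_tailed_proportion(k):
    """Proportion of a normal distribution lying at least k SD from the mean."""
    return 2 * (1 - NormalDist().cdf(k))


def flag_outliers(values, threshold=3.0):
    """Return the values lying more than `threshold` sample SDs from the mean."""
    m, sd = mean(values), stdev(values)
    return [v for v in values if abs(v - m) > threshold * sd]


for k in (1, 2, 3):
    print(f"At least {k} SD from the mean: {two_tailed_proportion(k):.2%}")

# Hypothetical scores: 20 typical values and one extreme value.
scores = [48, 49, 50, 51, 52] * 4 + [120]
print(flag_outliers(scores))
```

Note that in very small samples no value can reach 3 SD from the mean, because the extreme value itself inflates the SD.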

What should I do if the test assumptions are violated?

There are three options available to you:

  1. Simplify. Non-parametric tests are only available for simple experimental designs, but it may be possible to reduce a more complicated design to something that does have a non-parametric equivalent. For example, a 2x2 repeated or independent ANOVA (not mixed) could be reduced to a Friedman or Kruskal-Wallis test with four levels.
  2. Transform. If the dependent variable has a skewed distribution, try taking the square root or log if it is positively skewed, or squaring it or taking the inverse log (antilog) if it is negatively skewed. If you have a repeated independent variable, don't forget to transform the dependent variable for all levels of your repeated variable. You can still present descriptive statistics using the original untransformed data.
  3. Acknowledge. Parametric tests tend to be quite robust. If your assumptions aren't met then the likelihood is that the results will still be fairly reliable. You should acknowledge the problem, proceed with the original test, and then be cautious about borderline results (p>.010).
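
To see what the Transform option does, here is a minimal Python sketch that compares the skewness of some made-up, positively skewed scores before and after a square root and a log transform. The `skewness` helper is written for this example (it uses the population formula), and the log transform assumes every score is greater than zero.

```python
import math
from statistics import mean


def skewness(values):
    """Population skewness: third central moment divided by the cubed SD."""
    m = mean(values)
    m2 = mean([(v - m) ** 2 for v in values])
    m3 = mean([(v - m) ** 3 for v in values])
    return m3 / m2 ** 1.5


# Hypothetical positively skewed scores (a long tail of high values).
raw = [1, 2, 2, 3, 3, 3, 4, 5, 8, 20, 50]
root = [math.sqrt(v) for v in raw]
logged = [math.log(v) for v in raw]  # requires all values > 0

for label, data in [("raw", raw), ("square root", root), ("log", logged)]:
    print(f"{label}: skewness = {skewness(data):.2f}")
```

For these scores the log transform reduces the skew more than the square root does, so it is worth trying the milder transform first and checking the distribution again.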

Should I use Yates Correction?

Yates correction should only ever be applied to Chi Squared calculations where there is only one degree of freedom (i.e. a 2x2 contingency table). However, there is debate as to whether it is appropriate in the majority of cases, so the best policy is probably never to use it.

Is my Independent Samples t-test significant?

If you run an Independent Samples t-test on SPSS, you are given 2 different 2-Tail. Sig. values. The first is in a row labelled Equal and the second in a row labelled Unequal. In order to decide which row to use, you must look at the line labelled Levene's Test for Equality of Variances.
If the Levene's test is significant (P is less than or equal to 0.05) use the Unequal row.
If the Levene's test is not significant (P is greater than 0.05) use the Equal row.
If the 2-Tail. Sig. value is less than or equal to 0.05 in the relevant row then the test is significant at the 5% level.
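
The decision rule can be written out as a short Python sketch. The function name and the example values are hypothetical; you would copy the three values by hand from the SPSS output.

```python
def independent_t_sig(levene_p, equal_sig, unequal_sig, alpha=0.05):
    """Choose the row to report and say whether the t-test is significant."""
    if levene_p <= alpha:
        # Levene's test is significant: variances differ, use the Unequal row.
        row, sig = "Unequal", unequal_sig
    else:
        # Levene's test is not significant: use the Equal row.
        row, sig = "Equal", equal_sig
    return row, sig, sig <= alpha


# Hypothetical values read from the SPSS output.
print(independent_t_sig(levene_p=0.03, equal_sig=0.04, unequal_sig=0.06))
print(independent_t_sig(levene_p=0.20, equal_sig=0.04, unequal_sig=0.06))
```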

Is my Repeated Measures ANOVA significant?

Kinnear and Gray (1999) and Mauranen (1997)

If you run a Repeated Measures ANOVA and there is a within subjects factor with more than 2 levels, you are presented with Mauchly's Test of Sphericity. This is similar to the Levene's test which appears in the middle of the Independent Samples t-test.
If the Mauchly test is significant (P is less than or equal to 0.05) and the normal distribution assumption is violated, use the Pillais row in the Multivariate tables.
If the Mauchly test is significant (P is less than or equal to 0.05) but the distributions are normal, use the Greenhouse-Geisser row in the averaged summary table.
If the Mauchly test is not significant (P is greater than 0.05) use the Sphericity assumed row in the averaged summary table.
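
The same three rules as a minimal Python sketch. The function name is hypothetical, and whether the distributions are normal is something you have to judge separately and pass in yourself.

```python
def repeated_anova_row(mauchly_p, distributions_normal, alpha=0.05):
    """Return the name of the row to read, following the three rules above."""
    if mauchly_p > alpha:
        return "Sphericity assumed"
    if distributions_normal:
        return "Greenhouse-Geisser"
    return "Pillais"


print(repeated_anova_row(mauchly_p=0.30, distributions_normal=True))
print(repeated_anova_row(mauchly_p=0.02, distributions_normal=True))
print(repeated_anova_row(mauchly_p=0.02, distributions_normal=False))
```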

Is my stepwise multiple regression significant?

Stepwise regressions are designed for data mining (exploring data to look for trends which can be tested properly at a later date) rather than hypothesis testing. This is because the p values produced do not take into account all the variables which have been omitted from the model. An approximate solution to this would be to multiply each p value by the number of variables in the maximum possible model size and then divide by the number of variables left in the model. For example, if the maximum size of model has 10 variables and the best fit model has only 5 variables then all the p values for the individual variables left in the model should be doubled. N.B. I have based this method on the Bonferroni correction, but have not seen it explicitly used like this.
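
The adjustment described above can be sketched in Python as follows. The function name and p values are made up, and capping the adjusted values at 1 is an added assumption, since a probability cannot exceed 1.

```python
def adjust_stepwise_p(p_values, max_model_size, final_model_size):
    """Multiply each p value by max_model_size / final_model_size."""
    factor = max_model_size / final_model_size
    # Assumption: cap at 1.0, because a probability cannot be larger than 1.
    return [min(1.0, p * factor) for p in p_values]


# Maximum model of 10 variables, best fit model of 5: p values are doubled.
print(adjust_stepwise_p([0.01, 0.2, 0.6], max_model_size=10, final_model_size=5))
```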

What should I do if my results aren't significant?

If your significance value is over .100 then you should treat it as a non-result. You did not find sufficient evidence for the effect to report any differences you might see in the data. If you discuss the direction of non-significant differences then you are likely to lose marks.

If your significance value is over .050 but under .100 then you can report that you have "weak evidence for" an effect. Use "weak evidence for an effect of" instead of "significant effect of" in your results and discussion and be tentative about the conclusions you draw from it.

If you have applied a Bonferroni correction then you should use weak evidence to refer to effects that have significance values under .050 but are greater than the corrected value.
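
These wording rules can be summarised in a short Python sketch. The function name is hypothetical, and two boundary cases that the rules do not cover are filled in as labelled assumptions in the comments.

```python
def describe_evidence(p, corrected_alpha=None):
    """Return the wording suggested above for a significance value p."""
    if corrected_alpha is not None:
        # Bonferroni case: significant only at or below the corrected value.
        if p <= corrected_alpha:
            return "significant"
        # Assumption: values of .050 or more are treated as a non-result here.
        return "weak evidence" if p < 0.05 else "non-result"
    if p <= 0.05:
        return "significant"
    # Assumption: a value of exactly .100 is treated as weak evidence.
    return "weak evidence" if p <= 0.10 else "non-result"


print(describe_evidence(0.07))
print(describe_evidence(0.03, corrected_alpha=0.0167))
```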

How do I interpret partial eta squared?

Richardson (2011)

Small, medium and large effect sizes correspond to partial eta squared values of greater than .01, .06 and .14 respectively. These values have been rounded from the values .0099, .0588 and .1379 which were calculated by Cohen (1969, pp. 278-280) from his values of f.
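
The unrounded values can be reproduced with a few lines of Python. This sketch assumes Cohen's conventional benchmarks for f (.10, .25 and .40) and the standard conversion, eta squared = f squared / (1 + f squared); the function name is made up for the example.

```python
def f_to_partial_eta_squared(f):
    """Convert Cohen's f to (partial) eta squared: f^2 / (1 + f^2)."""
    return f ** 2 / (1 + f ** 2)


# Assumed benchmark values of f for small, medium and large effects.
for label, f in [("small", 0.10), ("medium", 0.25), ("large", 0.40)]:
    print(f"{label}: f = {f:.2f}, partial eta squared = {f_to_partial_eta_squared(f):.4f}")
```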

When reporting effect sizes it is important to report the value as a partial eta squared value to avoid confusion with eta squared itself. While eta squared should add up to 1 across all effects in the model, partial eta squared can add up to more than 1. It should not, therefore, be expressed as a percentage.

How do I run a 1-tailed t-test in SPSS?

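Run a normal 2-tailed test. The value of t for the 1-tailed test is the same, but you should divide the 2-tailed sig. value by 2. This only applies if the difference is in the direction you predicted; if it is in the opposite direction, the 1-tailed test is not significant.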
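
A minimal Python sketch of the conversion, assuming you record the predicted direction as +1 (t expected to be positive) or -1 (t expected to be negative). The function name and values are hypothetical. When t has the opposite sign to the prediction, the 1-tailed value is 1 minus half the 2-tailed value, so it can never be significant.

```python
def one_tailed_sig(two_tailed_sig, t, predicted_direction):
    """Convert a 2-tailed sig. value into a 1-tailed one.

    predicted_direction is +1 if t was predicted to be positive, -1 if negative.
    """
    half = two_tailed_sig / 2
    if t * predicted_direction > 0:
        return half          # effect in the predicted direction
    return 1 - half          # effect in the wrong direction


print(one_tailed_sig(0.08, t=2.1, predicted_direction=+1))
print(one_tailed_sig(0.08, t=2.1, predicted_direction=-1))
```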

Which post hoc test should I use for a One-Way ANOVA?

Howell (1992, pp 355-368)
SPSS only offers post hoc tests for the One-Way ANOVA. Except for the Fisher's Least-Significant Difference, the overall F value does not have to be significant. (Howell, 1992, p338)
You do not need a post hoc test if your factor has 2 levels.
Use Student-Newman-Keuls if your factor has 3 levels. Do not use it if your factor has more than 5 levels.
You can also use Fisher's Least-Significant Difference (Fisher's protected t) if your factor has 3 levels and the overall F is significant.
Use Tukey's b if your factor has 4 or 5 levels.
Use Tukey's Honestly Significant Difference if your factor has 6 or more levels.
Use Scheffé if you are also testing one or more contrasts.

Which post hoc test should I use for a normal ANOVA?

Howell (1992, pp 349-351)
Since SPSS will not produce post hoc tests for repeated factors or interactions in an ANOVA model, the easiest way to compare groups is to use Bonferroni t-tests. These are run as normal t-tests, but the p values should be multiplied by the total possible number of t-tests which could be run. For example, if Factor A has 3 levels and Factor B has 2 levels then there are 15 (6C2 = 6!/(4!2!) ) possible t-tests (A1B1-A2B1, A1B1-A3B1, A1B1-A1B2, A1B1-A2B2, A1B1-A3B2, A2B1-A3B1, A2B1-A1B2, A2B1-A2B2, A2B1-A3B2, A3B1-A1B2, A3B1-A2B2, A3B1-A3B2, A1B2-A2B2, A1B2-A3B2, & A2B2-A3B2). All the p values would have to be multiplied by 15. In other words, the t tests would have to have a 2-tailed sig of 0.0033 or less for the familywise error rate to be 0.05.
The better option is to use A Priori comparisons rather than post hoc. In this case you must decide which comparisons you wish to make before you look at the data. The p values of the t-tests can now be multiplied by the number of comparisons you have chosen to make, rather than the total possible number. In the above example, you might want to make the following 3 comparisons (A1B1-A1B2, A2B1-A2B2, A3B1-A3B2) and therefore the t-test p values would only have to be multiplied by 3 instead of 15. In other words, the t tests would have to have a 2-tailed sig of 0.0167 or less for the familywise error rate to be 0.05.
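
The arithmetic in the example can be checked with a short Python sketch. The function name and the p values are made up, and capping the corrected values at 1 is an added assumption.

```python
from math import comb


def bonferroni(p_values, n_comparisons):
    """Multiply each p value by the number of comparisons (capped at 1.0)."""
    return [min(1.0, p * n_comparisons) for p in p_values]


cells = 3 * 2                  # Factor A has 3 levels, Factor B has 2 levels
all_pairs = comb(cells, 2)     # every possible t-test between two cells
print(all_pairs, round(0.05 / all_pairs, 4))   # post hoc threshold
print(3, round(0.05 / 3, 4))                   # threshold for 3 a priori comparisons

# Hypothetical uncorrected p values for the 3 planned comparisons.
print(bonferroni([0.004, 0.020, 0.400], n_comparisons=3))
```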

What is Multicollinearity?

Conley and Pollard (1998)
Multicollinearity is when there is a high correlation between variables in a regression equation. This results in unstable significance levels and β values for the correlated predictors (i.e. a small difference in the data can result in very different significance levels and β values for the correlated predictors). A rule of thumb is that problems of multicollinearity are likely to occur if any of the correlations between the independent variables are greater than .7. If this is the case, the best courses of action are either to remove one of the correlated variables from the model, or to create a compound variable by adding the correlated variables together or performing a factor analysis.
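
The .7 rule of thumb can be checked before running the regression. The sketch below uses made-up predictor names and data, and it compares the absolute value of each correlation with the threshold, on the assumption that a strong negative correlation is as much of a problem as a strong positive one.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def collinear_pairs(predictors, threshold=0.7):
    """List the pairs of predictors whose correlation exceeds the threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(predictors.items(), 2):
        r = pearson_r(a, b)
        if abs(r) > threshold:  # assumption: the sign of r is ignored
            flagged.append((name_a, name_b, round(r, 2)))
    return flagged


# Hypothetical independent variables, one value per participant.
predictors = {
    "age": [21, 25, 30, 35, 40, 45],
    "experience": [1, 4, 9, 13, 19, 24],
    "score": [5, 3, 6, 2, 7, 4],
}
print(collinear_pairs(predictors))
```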

How do I run a repeated measures regression?

In a departure from normal practice, the data should be set out with multiple rows for each participant. The participant identifier and all the between subject factors should be copied onto every row for each participant.

  1. Calculate the mean score for each participant on the dependent variable ignoring the within subject factors.
  2. Create a column called Mean and enter the scores just calculated.
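  3. Regress the mean score on the between subjects factors (i.e. use the between subjects factors to predict the mean score).
  4. Regress the individual scores on the mean score and the within subjects factors (i.e. use the mean score and the within subjects factors to predict the individual scores).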

This method is called Criterion Scaling (Pedhazur, 1982).
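
Steps 1 and 2 (preparing the data) can be sketched in Python as below; the regressions themselves would still be run in your statistics package. The column names and scores are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def add_mean_column(rows, id_key="participant", dv_key="score"):
    """Add each participant's mean score to every one of their rows."""
    scores = defaultdict(list)
    for row in rows:
        scores[row[id_key]].append(row[dv_key])
    means = {pid: mean(values) for pid, values in scores.items()}
    return [{**row, "Mean": means[row[id_key]]} for row in rows]


# One row per participant per level of the within subjects factor.
rows = [
    {"participant": 1, "group": "control", "condition": "easy", "score": 12},
    {"participant": 1, "group": "control", "condition": "hard", "score": 16},
    {"participant": 2, "group": "treatment", "condition": "easy", "score": 20},
    {"participant": 2, "group": "treatment", "condition": "hard", "score": 26},
]
for row in add_mean_column(rows):
    print(row)
```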

How do I run a cross-lagged panel correlation?

Hahn (1999)
Under normal circumstances, correlations provide no causal information whatsoever. However, causality can be inferred if two quantities (A & B) are measured at two points in time (Time 1 & Time 2) and the correlation of A1 with B2 is significantly different from the correlation of B1 with A2. Hahn (1999) provides a DOS program which performs this calculation. The cross-lagged correlation requires an assumption of quasi-stationarity of the synchronous correlations, i.e. the correlation of A1 with B1 is the same as the correlation of A2 with B2.


Conley, Dalton and Pollard, David (1998, December 12). Model Comparisons. Direct and Indirect Effects. [WWW document]. URL:
Hahn, André (1999, May 11). Computer Programm für "cross-lagged" Korrelationen. [WWW document]. URL: (In German)
Howell, David C. (1992). Statistical Methods for Psychology. Third Edition. Belmont, California: Duxbury Press.
Kinnear, Paul R. and Gray, Colin D. (1999). SPSS for Windows Made Simple. Third Edition. Hove, East Sussex: Psychology Press Ltd.
Mauranen, Kari (1997, September 8). Analysis of Variance and Covariance I. [WWW document]. URL:
Pedhazur, Elazar J. (1982). Multiple Regression in Behavioral Research. Second edition. New York: CBS College Publishing.
Richardson, J.T.E. (2011). Eta squared and partial eta squared as measures of effect size in educational research. Educational Research Review, 6, 135-147.

Last modified: Wednesday, 28 January 2015, 2:19 PM