Skewness

A topic in the lecture "Assumption of Normality"

A distribution is skewed if it has one fat tail (has a greater than normal proportion of extreme scores in one tail only). The formula for measuring skewness in the sample is:
  • skewnessFormula.gif
All scores are changed into z scores, then cubed, and the statistic (g1) equals the mean of the cubed z scores. Note this has two important attributes:
  1. Because they are cubed, any large z scores (i.e. outliers) have a big effect on the statistic.
  2. Cubing a value retains its sign, so if there is a large positive z value (an extreme score above the mean, say z=3.1, so z cubed equals 29.79) and also a large negative z value (an extreme score below the mean, say z=-2.9, so z cubed equals -24.39), they more or less cancel each other out. So g1 can only be far from zero if most of the extreme scores are off in one direction only (i.e. the data are skewed). If the data are perfectly symmetrical then g1=0.
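To make these two attributes concrete, here is a minimal Python sketch of the statistic as described above (the mean of the cubed z scores). The function name and the data are made up for illustration, and I am assuming the z scores use the standard deviation with n in the denominator; the formula image may define it differently.

```python
import math

def sample_skewness(scores):
    """Return g1: the mean of the cubed z scores (illustrative sketch)."""
    n = len(scores)
    mean = sum(scores) / n
    # Assumption: standard deviation computed with n (not n - 1) in the denominator.
    sd = math.sqrt(sum((x - mean) ** 2 for x in scores) / n)
    if sd == 0:
        raise ValueError("skewness is undefined when all scores are equal")
    return sum(((x - mean) / sd) ** 3 for x in scores) / n

# Cubing keeps the sign, so opposite extremes largely cancel.
print(3.1 ** 3)                # about 29.79
print((-2.9) ** 3)             # about -24.39
print(3.1 ** 3 + (-2.9) ** 3)  # about 5.40

symmetric = [1, 2, 3, 4, 5]    # hypothetical symmetrical data
right_tail = [1, 2, 3, 4, 20]  # hypothetical data with one extreme high score

print(sample_skewness(symmetric))   # essentially zero
print(sample_skewness(right_tail))  # clearly positive
```

A single extreme score of 20 is enough to push g1 well above zero, which is the first attribute at work.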

The formula given above is for measuring the skewness of the sample (i.e. it is a statistic). What we want to know, however, is the skewness of the population (the parameter), because the assumption of normality has to do with the shape of the population. g1 is a biased estimate of the skewness of the population. As with the variance, a slight adjustment to the formula is needed to estimate the parameter. So which does SPSS give you when it reports a value for ‘skewness’: the statistic or the estimate of the parameter?

Most statistics books give the formula I gave above (the statistic). SPSS gives you the estimate of the parameter, which is good because that is what you want. The point, however, is that information on which formula a statistics program uses to compute a value is hard to come by. You won’t find this information in the SPSS help menu or in many of the books about SPSS. This is one of the challenges of using a statistics program: finding out exactly what it is computing. For SPSS I have found the book by Gardner (2001) to be most informative.
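The exact adjustment is not spelled out on this page. One commonly cited form is the adjusted Fisher-Pearson coefficient, G1 = g1 * sqrt(n(n-1)) / (n-2), and the sketch below assumes that is the adjustment in question. Treat that as an assumption and check it against your own SPSS output; the function names and data are hypothetical.

```python
import math

def sample_skewness(scores):
    """g1: mean of the cubed z scores, with n in the SD denominator (assumption)."""
    n = len(scores)
    mean = sum(scores) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in scores) / n)
    return sum(((x - mean) / sd) ** 3 for x in scores) / n

def adjusted_skewness(scores):
    """G1: g1 rescaled to estimate population skewness (assumed adjustment)."""
    n = len(scores)
    if n < 3:
        raise ValueError("the adjustment needs at least 3 scores")
    return sample_skewness(scores) * math.sqrt(n * (n - 1)) / (n - 2)

data = [1, 2, 3, 4, 20]  # hypothetical scores
print(sample_skewness(data))    # the statistic
print(adjusted_skewness(data))  # the estimate of the parameter, larger in absolute value
```

The correction factor is always greater than 1 and shrinks toward 1 as n grows, so the two values differ noticeably only in small samples.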

Back to skewness. If the data are negatively skewed then the value of g1 will be negative; if the data are positively skewed then the value of g1 will be positive. The more skewed the data, the bigger the absolute value of g1. But how big does g1 have to get to be worthy of your attention? Gardner suggests turning g1 into a standard score, which we know how to interpret.

  • stdscrSkewness.gif

SPSS will give you the value of skewness (the estimate of skewness in the population) and the standard error that goes with it. If the data are symmetrical (i.e. if H0 is true), the mean of the population of skewness scores is zero. With that information you can calculate the standard score of the skewness: subtract zero from the skewness value and divide by its standard error. If the absolute value of the standard score is greater than, say, 2, then you might want to worry (as you can see, this is not a precise thing). The measure of skewness and its standard error are available as part of the SPSS 'Analyze>>Descriptive Statistics>>Explore' menu item, and they also show up unbidden in the output of various other procedures.
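As a small illustration of that calculation, the Python sketch below turns a reported skewness and standard error into a standard score and applies the rough cutoff of 2. The skewness and standard error values are the ones reported in the examples on this page; the function names are hypothetical. The standard error formula is a commonly cited one for the adjusted skewness and is included as an assumption, not as documented SPSS behaviour.

```python
import math

def skewness_standard_score(skewness, std_error, expected=0.0):
    """Standard score of skewness: (observed - expected under H0) / standard error."""
    return (skewness - expected) / std_error

def approx_skewness_std_error(n):
    """Assumed formula for the standard error of skewness with sample size n."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

# (skewness, standard error) as reported in the examples on this page
examples = {
    "normal": (-0.04, 0.241),
    "negatively skewed": (-1.161, 0.229),
    "fat-tailed": (-0.22, 0.217),
}

for label, (skew, se) in examples.items():
    z = skewness_standard_score(skew, se)
    verdict = "worth a look" if abs(z) > 2 else "unremarkable"
    print(f"{label}: standard score = {z:.2f} ({verdict})")
```

Only the negatively skewed sample crosses the cutoff. Note that the standard error depends only on the sample size, so larger samples will flag smaller amounts of skew.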

Examples


Normal Distribution: this sample was drawn from a normal population, which is a symmetrical distribution, and thus the absolute value of the standard score of the measure of skewness should be low.

  • NormalHist.jpg Skewness=-.04, Std Error=.241, Standard Score=-0.17

Negatively Skewed Distribution: this sample was drawn from a negatively skewed population and thus the standard score of the measure of skewness should be a large negative number.

  • NegativeHist.jpg Skewness=-1.161, Std Error=.229, Standard Score=-5.07

Fat-Tailed Distribution: this sample was drawn from a fat-tailed population, which is a symmetrical distribution, and thus the absolute value of the standard score of the measure of skewness should be low.

  • FatHist.jpg Skewness=-.22, Std Error=.217, Standard Score=-1.01
