Further comments on this article are now closed.

Jon-Eric Steinbomer wrote:

Great article, and these distinctions are very much worth taking the time to clarify. Could you offer recommendations or pointers on applying inferential statistical analyses of data (Likert responses, counts, etc.) with small sample sizes (5-12 participants)?

-- posted at 05:53 PM on March 23, 2010
Reply to Jon-Eric Steinbomer

We don't recommend using inferential statistics with very small sample sizes. Small sample usability tests work fine when the goal is to find usability problems. In this case there is no need for inferential statistics.

But in summative testing, where one may wish to compare scores (including Likert scale responses) for more than one product or system, or for different types of users, a larger sample is needed. Sample size affects the sensitivity of a statistical test (its ability to detect whether there is a meaningful difference or "effect"). If the sample is too small, one runs the risk of missing a real effect (statisticians call this a Type II error). If effects are large, and if the study has been carefully designed to minimize random error, it is sometimes possible to detect them with as few as 10 or 12 participants. However, you could still miss smaller effects that may be important to your design or marketing team. ISO 20282 describes methods for measuring the ease of operation of everyday products and recommends testing with at least 50 participants.
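To make the link between effect size and sample size concrete, here is a rough sketch using the textbook normal approximation for a two-sided comparison of two independent group means. The function name `approx_n_per_group` and the default values (alpha of 0.05, power of 0.80) are assumptions chosen for illustration; they are not taken from the article or from ISO 20282.

```python
import math
from statistics import NormalDist


def approx_n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants needed per group (between-subjects, two groups).

    Normal approximation: n = 2 * ((z_alpha + z_power) / d) ** 2,
    where d is the standardized difference between means (Cohen's d).
    An exact t-test calculation gives slightly larger numbers.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Hypothetical effect sizes: small, medium, large (Cohen's conventions).
for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {approx_n_per_group(d)} participants per group")
```

The required sample grows with the inverse square of the effect size, which is why a study sized to catch a large effect can easily miss a small one. Within-subjects designs generally need fewer participants than this between-subjects approximation suggests.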

A complete answer to your question would need a lot more room than I have here, but you may find Jeff Sauro's websites, Measuring Usability and Usable Stats, helpful. They contain a wealth of valuable information, tips, and tutorials on statistics in usability. If you want to delve further, try the Witte & Witte book that is referenced in the article. It is one of the best we've seen.

-- posted at 08:41 AM on March 24, 2010

Philip: Just a note: you mentioned analysing Likert scale responses with ANOVA. You need to be careful here because the responses are likely to be ordinal rather than interval data, in which case you would need a non-parametric test: Kruskal-Wallis (between-subjects design) or Friedman (within-subjects). Nunnally (not sure of the year, but it's an old book) said that a Likert-type scale with 11 or more points could be considered continuous and thus suitable for ANOVA, but then the ANOVA has other assumptions (homogeneity of variance, normally distributed residuals, independent cases). And if the residuals are not normally distributed, then the non-parametric versions (Kruskal-Wallis and Friedman, above) can be used.
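As an illustration of the between-subjects case, the following is a minimal sketch of the Kruskal-Wallis H statistic written with the Python standard library only. The ratings are invented 5-point Likert responses for three hypothetical designs, and the helper names are my own. The p-value line uses a closed form that holds only for three groups (2 degrees of freedom), and the chi-square approximation behind it is rough for samples this small.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with the usual correction for ties."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: 1 - sum(t^3 - t) / (n^3 - n) over each set of t tied values.
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all responses are identical; H is undefined")
    return h / correction


# Invented ratings (1 = strongly disagree, 5 = strongly agree), one list per design.
design_a = [2, 3, 3, 2, 4]
design_b = [4, 4, 5, 3, 5]
design_c = [3, 2, 1, 2, 3]

h = kruskal_wallis_h(design_a, design_b, design_c)
# With 3 groups, H is compared to chi-square with 2 df, whose tail is exp(-H / 2).
p_approx = math.exp(-h / 2)
print(f"H = {h:.2f}, approximate p = {p_approx:.3f}")
```

Because the test works on ranks, it makes no assumption that the distance between "agree" and "strongly agree" equals the distance between any other pair of adjacent scale points.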

Jon-Eric (just came across this article by chance; hope things are good in MS for you): agreed with what Phil says about small samples but the power depends on the question. You can possibly go with 12 but that's the minimum. For Likert type scales, depends how many conditions (1, 2, 3+) and the design (within or between subjects). For frequencies, probably a chi-square.
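For the frequency case, a chi-square test on a 2x2 table of counts can be sketched as follows. The counts and the function name `chi_square_2x2` are made up for illustration. Note that with only 5-12 participants the expected counts will often fall below 5, where the chi-square approximation is unreliable and an exact test such as Fisher's is commonly preferred; the sketch returns the expected counts so that this can be checked.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2x2 table of counts.

    Returns (statistic, p_value, expected_counts). The p-value uses the exact
    tail of the chi-square distribution with 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one observation")
    expected = [[row_totals[i] * col_totals[j] / n for j in range(2)]
                for i in range(2)]
    stat = sum((table[i][j] - expected[i][j]) ** 2 / expected[i][j]
               for i in range(2) for j in range(2))
    p_value = math.erfc(math.sqrt(stat / 2))  # survival function for 1 df
    return stat, p_value, expected


# Invented counts: rows are two designs, columns are task success / failure.
counts = [[10, 20],
          [20, 10]]
stat, p_value, expected = chi_square_2x2(counts)
print(f"chi-square = {stat:.2f}, p = {p_value:.4f}")
print("smallest expected count:", min(min(row) for row in expected))
```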

-- posted at 12:58 PM on July 01, 2011