Great article, and these distinctions are very much worth taking the time to clarify. Could you offer recommendations or pointers on applying inferential statistical analyses to data (Likert responses, counts, etc.) with small sample sizes (5-12 participants)?
We don't recommend using inferential statistics with very small sample sizes. Small sample usability tests work fine when the goal is to find usability problems. In this case there is no need for inferential statistics.
But in summative testing, where one may wish to compare scores (including Likert scale responses) for more than one product or system, or for different types of users, a larger sample is needed. Sample size affects the sensitivity of a statistical test (its ability to detect whether there is a meaningful difference or "effect"). If the sample is too small, one runs the risk of missing a real effect (statisticians call this a Type II error). If effects are large, and if the study has been carefully designed to minimize random error, it is sometimes possible to detect them with as few as 10 or 12 participants. However, you could still miss smaller effects that may be important to your design or marketing team. ISO 20282 describes methods for measuring the ease of operation of everyday products and recommends testing with at least 50 participants.
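To make the link between sample size and sensitivity concrete, here is a rough sketch of a power calculation for comparing two independent groups. It is not taken from the article: the function name `approx_power`, the use of a standardized effect size (Cohen's d), and the reading of the sample size as "participants per group" are all assumptions made for illustration. It uses a normal approximation, which tends to be slightly optimistic for very small groups compared with an exact t-test calculation.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided comparison of two independent groups.

    effect_size is a standardized mean difference (Cohen's d, an assumption
    for this sketch). Uses the normal approximation, not the exact t-test.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)             # two-sided critical value
    shift = effect_size * sqrt(n_per_group / 2)   # expected test statistic
    # Probability of landing in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Hypothetical scenarios: a medium and a large effect, small and larger samples.
    for d in (0.5, 1.2):
        for n in (12, 50):
            print(f"d={d}, n per group={n}: power ~ {approx_power(d, n):.2f}")
```

Under these assumptions, a medium-sized effect is missed most of the time with 12 participants per group, while a large effect is detected far more reliably, which matches the point above about small samples and Type II errors.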
A complete answer to your question would need a lot more room than I have here, but you may find Jeff Sauro's websites, Measuring Usability and Usable Stats, helpful. They contain a wealth of valuable information, tips, and tutorials on statistics in usability. If you want to delve further, try the Witte & Witte book that is referenced in the article. It is one of the best we've seen.
Philip: Just a note: you mentioned analysing Likert scale responses with ANOVA. You need to be careful here because the responses are likely to be ordinal rather than interval-level, in which case you would need a non-parametric test: Kruskal-Wallis (between-subjects design) or Friedman (within-subjects design). Nunnally (not sure of the year but it's an old book) said that a Likert-type scale with 11 or more points could be considered continuous and thus suitable for ANOVA, but then the ANOVA has other assumptions (homogeneity of variance, normally distributed residuals, independent cases). And if the residuals are not normally distributed, then the non-parametric versions (Kruskal-Wallis and Friedman, above) can be used.
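For anyone who wants to see what the Kruskal-Wallis test for a between-subjects design actually computes, here is a minimal sketch of the H statistic using only the Python standard library. The function names and the ratings are hypothetical, made up for illustration. The p-value shortcut `exp(-H/2)` is only valid for exactly three groups (chi-square with 2 degrees of freedom), and the chi-square approximation itself is rough when groups are very small, so treat the p-value as indicative only.

```python
from collections import Counter
from math import exp


def average_ranks(values):
    """Rank values from 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with the standard correction for ties."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    return h / correction if correction > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical 5-point Likert ratings for three designs, different users each.
    ratings = [[2, 3, 3, 2, 1, 3], [4, 3, 5, 4, 4, 5], [3, 4, 2, 3, 4, 3]]
    h = kruskal_wallis_h(ratings)
    p = exp(-h / 2)  # chi-square survival function, valid for 3 groups only
    print(f"H = {h:.2f}, approximate p = {p:.3f}")
```

Likert data almost always contains ties, which is why the tie correction is included rather than left out for brevity.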
Jon-Eric (just came across this article by chance; hope things are good in MS for you): agreed with what Phil says about small samples, but the power depends on the question. You can possibly go with 12 but that's the minimum. For Likert-type scales, it depends on how many conditions you have (1, 2, 3+) and on the design (within or between subjects). For frequencies, probably a chi-square.
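As a rough illustration of the chi-square suggestion for frequencies, here is a sketch of a test of independence on a 2x2 table of counts, again using only the standard library. The function name and the counts are hypothetical. The sketch applies no continuity correction and assumes every row and column total is non-zero. A common rule of thumb is that expected counts should be at least 5, which very small samples often fail, so the function returns the expected counts for checking.

```python
from math import erfc, sqrt


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    Returns (statistic, p_value, expected_counts). No continuity correction.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    expected = [[row_totals[i] * col_totals[j] / n for j in range(2)]
                for i in range(2)]
    stat = sum((table[i][j] - expected[i][j]) ** 2 / expected[i][j]
               for i in range(2) for j in range(2))
    # Survival function of chi-square with 1 degree of freedom.
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value, expected


if __name__ == "__main__":
    # Hypothetical counts: rows are designs A and B, columns are
    # task completed / task not completed.
    counts = [[18, 7], [10, 15]]
    stat, p, expected = chi_square_2x2(counts)
    print(f"chi-square = {stat:.2f}, p = {p:.3f}")
    print("smallest expected count:", min(min(r) for r in expected))
```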