Weight of evidence

Someone once said, “There are no questions in usability.” I think it was me. I admit it’s not a great quote. It’s not up there with, “Never in the field of human conflict…” or, “One small step for man…” but it’s a good one — and it leads to a useful rule of thumb for UX researchers.

Let me explain.

Some years ago, while working for a large corporation, I was preparing a usability test when the project manager called and asked me to send over the list of usability questions.

“There are no questions in usability,” I replied.

“What do you mean?” she asked, “How can there be no questions? How are you going to find out if people like our new design?”

“But I’m not trying to find out if they like it,” I pointed out in a manner that, in hindsight, seems unnecessarily stroppy. “I’m trying to find out if they can use it. I have a list of tasks, not a list of questions.”

Good and bad data

Requests to shovel explicit, “What do you think?” questions into UX studies betray the fact that not only do some stakeholders not understand the purpose of a usability test, but also that they believe all customer responses are necessarily valuable. It shows that they are unaware of the concept of good data and bad data, and, as a result, believe that all customer feedback is grist to the mill.

But it isn’t.

There’s good grist and there’s useless grist. Similarly, there’s strong data and weak data. This holds true for all fields of research, whether developing a new medicine, discovering a new planet, solving a crime, or evaluating a new software interface.

User Experience research is about observing what people do. It’s not about canvassing people’s opinions. This is because, as data, opinions are worthless. For every 10 people who like your design 10 others will hate it and 10 more won't care one way or the other. Opinions are not evidence.

Behaviours, on the other hand, are evidence. This is why a detective would much rather catch someone ‘red-handed’ in the act of committing a crime than depend on hearsay and supposition. Hence the often-repeated advice: “Pay attention to what people do, not to what they say.” It’s almost become a UX cliché but it’s a good starting point for a discussion about something important: strength of evidence. This is the notion that some data provide strong evidence, some provide only moderately strong evidence, and some provide weak evidence. You don't want to base your product development efforts on weak evidence.

Evidence in user experience research

Evidence is what we use to support our claims and our reasoning. It’s what gives us credibility when we make decisions about a specific design parameter, about product features, about when to exit an iterative design-test loop, about go/no-go decisions, and about whether to launch a new product, service or website. Evidence is what we present to our development team and what we bring to the table to arbitrate disagreements and disputes. Evidence is what helps us avoid making knee-jerk, seat-of-the-pants decisions. We back our reasoning with evidence based on good data. Data are the stuff of research. “Data! Data! Data!” cried Sherlock Holmes. “I can’t make bricks without clay.”

It may look as though UX studies are ‘method-first’ events (“We need a usability test”, “I want a Contextual Inquiry study”, “Let’s do a Card Sort”) but the UX researcher, focusing on the underlying research question, thinks ‘data-first’: What kind of data must I collect to provide credible and compelling evidence on this issue? The method then follows.

What is strong evidence?

Strong evidence results from data that are valid and reliable.

Valid data are data that really do measure the construct that you think they are measuring. In a usability test, valid data measure things like task completion rate and efficiency rather than aesthetic appeal or preference.

Reliable data are data that can be replicated if you, or someone else, conducted the research again using the same method but with different test participants.

No matter what the method, research data must be valid and reliable — or the data should be discarded.

In UX research, strong data come from task-based studies, from studies that focus on observable user behaviours where the data are objective and unbiased—effectively catching the user ‘red-handed’ in the act. Strong data come with a confidence level, and assure us that further research is unlikely to change our degree of confidence in our findings.
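To make “come with a confidence level” concrete, here is a minimal sketch of how you might put a confidence interval around a task completion rate. It is an illustration under stated assumptions, not a method prescribed by this article: it uses the Wilson score interval (one common choice for small samples), the function name wilson_interval is hypothetical, and the figures in the usage note are made up for the example.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Return a (low, high) Wilson score interval for a completion rate.

    successes: number of participants who completed the task
    trials:    number of participants who attempted it
    z:         normal quantile; 1.96 corresponds to roughly 95% confidence
    """
    if trials <= 0:
        raise ValueError("trials must be a positive integer")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")

    p = successes / trials                      # observed completion rate
    denom = 1 + z ** 2 / trials
    centre = (p + z ** 2 / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / trials + z ** 2 / (4 * trials ** 2)) / denom
    )
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


if __name__ == "__main__":
    # Illustrative numbers only: 8 of 10 participants completed the task.
    low, high = wilson_interval(8, 10)
    print(f"Completion rate 80%, 95% interval: {low:.0%} to {high:.0%}")
```

With 8 completions out of 10, the interval runs from roughly 49% to 94%. That is a wide range, and it is an honest statement of how much a small behavioural sample can tell you. Run the same calculation on 80 out of 100 and the interval narrows considerably.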

The following is a brief taxonomy of methods based on levels of evidence—actually it’s a taxonomy of the types of data that result from the methods. It assumes, in all cases, that the method has been well designed and well conducted. It’s not an exhaustive list, but it includes methods the UX researcher is likely to consider in a typical user-centered design lifecycle.

Examples of strong UX evidence

Strong UX evidence invariably involves target users doing tasks or engaging in some activity that is relevant to the concept being designed or the issue being investigated. It includes data from:

  • Contextual research (field visits and other ethnography variants that record the behaviour of users as they currently do their work and achieve their goals).
  • Formative and summative usability tests in which actual users carry out actual tasks using an interface or product.
  • Web or search analytics and any kind of automatically collected usage data.
  • A/B or Multivariate testing.
  • Controlled Experiments.
  • Task analysis.
  • Secondary research of behavioural studies, drawing on meta-analyses and peer-reviewed papers, and on previous UX reports that fully describe the method used.

Examples of moderately strong UX evidence

To qualify for this category, data should come from studies that at least involve carrying out tasks, either by users or by usability experts, or that involve self-reporting of actual behaviours. These methods are often a precursor to methods from the ‘Strong’ category. They fall into this category because the data typically have a higher degree of variability or uncertainty. They include:

Examples of weak UX evidence

Decisions based on weak or flawed data can cost companies millions of dollars if they result in bad designs, poor marketing decisions or false product claims. So the obvious question is, why would you ever design a study to collect weak data?

You wouldn't.

Data from these methods have no place in UX research. They result from methods that are either badly flawed or are little better than guesswork. If you can choose between spending your UX budget on these methods or donating it to charity — opt for the latter.

  • Any kind of faux-usability test, for example tests that ask people which design they like best, or tests that rely heavily on interviewing for primary data collection.
  • Unmoderated think-aloud testing that allows users to act as if they were expert reviewers without actually doing the tasks.
  • Usability evaluations, even by experts, that are based on ‘just kicking the tyres’.
  • Focus groups (don't get me started).
  • Surveys (you’re allowed to disagree but only if you slept through the 2016 US election and its many polling results).
  • Intuition, appeal to authorities or personal experience.
  • The opinions of friends, family, work colleagues, your boss, company managers and executives.

How to judge the strength of evidence from a study or a report

Start by asking these questions:

  • Why should I believe your claim?
  • How good is your evidence?
  • Can I depend on these findings?

These are not trick questions: anyone presenting research findings should be able to answer them.

During a study you can ask yourself:

  • Am I observing people working (carrying out tasks with a prototype) rather than listening to what they are saying (giving their opinions about a design)?
  • Are people in an interview speculating on what they might do in the future? Or are they relating actual events that have happened to them in the past?

Some time ago I prepared a checklist for evaluating research studies. If you want to give a research study a good shakedown, you’ll find lots of useful checkpoints and questions there.

I started the article by promising a rule of thumb. Here it is. Use this as your mantra when evaluating the strength of user experience research: “Behavioural data are strong. Opinion data are weak.”

About the author

Philip Hodgson

Dr. Philip Hodgson (@bpusability on Twitter) holds a B.Sc., M.A., and Ph.D. in Experimental Psychology. He has over twenty years of experience as a researcher, consultant, and trainer in usability, user experience, human factors and experimental psychology. His work has influenced product and system design in the consumer, telecoms, manufacturing, packaging, public safety, web and medical domains for the North American, European, and Asian markets.


