Data collection for usability research

David Travis

How should you go about collecting data in usability tests? This article examines the data collection process in usability studies and describes some popular data logging solutions. Since most of these tools are expensive, I show you how you can use Microsoft Excel with Visual Basic macros to collect the data.

-- Todd Zazelenchuk, May 5, 2008


Taking notes in usability tests

Anyone who has ever conducted a usability evaluation of a website, software application, or consumer product knows that human behavior research often produces reams of data that can take significant time to analyze. To be productive, researchers must organize and reduce these data so that they can quickly perform their analysis and move on to improving the product.

People who are new to the field tend to take notes on paper or on a computer. Unfortunately, this approach can make data compilation cumbersome. More complex and powerful tools to assist with compilation and analysis are becoming more common, but they can be expensive and often provide more functions than most researchers need. In addition, such proprietary tools are frequently restricted to the Windows operating system, and they offer little, if any, ability for customization, making it difficult for practitioners to tailor the tool to meet their specific needs.

How to define behaviours and collect data in usability tests

Before discussing how to log the data that usability research produces, let's review some of the terminology used in the field.

The ISO Definition of usability: Effectiveness, Efficiency, and Satisfaction

According to the International Organization for Standardization (ISO 9241-11), usability comprises three primary attributes: effectiveness, efficiency, and satisfaction. Some usability experts argue that we should also consider additional contributing elements such as learnability and retention. Table 1 shows how some of the more popular definitions of usability map to one another.

Table 1. Three Popular Definitions of Usability

ISO 9241-11   | Nielsen (1993)               | Shneiderman (1998)
Efficiency    | Efficiency; Learnability     | Speed of performance; Time to learn
Effectiveness | Memorability; Errors; Safety | Retention over time; Rate of errors by users
Satisfaction  | Satisfaction                 | Subjective satisfaction

Distinguishing Quantitative/Qualitative and Objective/Subjective Data

The terms quantitative and qualitative have become pervasive in the user-centered design community. Because marketing and user experience groups work closely together in product design, these terms are often a source of confusion. In my experience, the confusion generally stems from using the same label for a research "method" and for the "data" it produces, as if the two always matched. They often do not.

For example, a survey of 1000 people is often regarded as quantitative research, yet it may collect both quantitative and qualitative data. Similarly, interviews and usability tests are often considered qualitative research, yet they too can collect both quantitative and qualitative data. By reducing the "research" to a single type, we fail to recognize that different types of data may be collected, and that different types of analysis are appropriate.

Another dimension of data that people find confusing is the distinction between objective and subjective data. In general, objective data (that which is "external to the mind", based on facts, and unbiased by opinion or interpretation) is regarded as more valuable than subjective data (that which "exists in the mind" and belongs to the thinking subject rather than the object of thought). Just as both quantitative and qualitative data may be collected in usability research, so may both objective and subjective data.

Which types of data to collect in a study depends on the research questions that need to be answered. Table 2 provides some examples of usability data collection to help convey the possibilities.

Table 2. Types of Data Collected for Effectiveness, Efficiency, and Satisfaction

Effectiveness
  Quantitative / Objective: Count of tasks completed successfully (according to predefined criteria); count of errors committed by the user during task performance (according to predefined criteria).
  Quantitative / Subjective: Likert scale rating by the participant of how well the product solves the intended job.
  Qualitative / Objective: A description of the observed sequence of steps performed by the user.
  Qualitative / Subjective: Participant's comments related to completing a given task.

Efficiency
  Quantitative / Objective: Time spent per completed task; count of clicks performed during task completion.
  Quantitative / Subjective: Likert scale rating by the participant of how efficient they perceive the product to be.
  Qualitative / Subjective: Participant's comments related to the perceived efficiency of the product.

Satisfaction
  Quantitative / Subjective: Likert scale rating of participant satisfaction.
  Qualitative / Subjective: Participant's comments related to satisfaction with the product; a description of observed behavior by the participant (frustration, delight, etc.).
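To make the quantitative measures in Table 2 concrete, the following Python sketch computes one measure per ISO 9241-11 attribute from per-participant task records: completion rate (effectiveness), mean time per completed task (efficiency), and mean post-task rating (satisfaction). The `TaskRecord` fields, the assumed 1-to-7 rating scale, and the sample values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskRecord:
    """One participant's attempt at one task (hypothetical logging format)."""
    participant: str
    completed: bool      # judged against predefined success criteria
    seconds: float       # time on task
    satisfaction: int    # post-task Likert rating, assumed to be on a 1-7 scale


def summarize(records: list[TaskRecord]) -> dict[str, float]:
    """Return one quantitative measure for each ISO 9241-11 attribute."""
    if not records:
        raise ValueError("no records to summarize")
    completed = [r for r in records if r.completed]
    return {
        # Effectiveness: share of attempts that met the success criteria.
        "completion_rate": len(completed) / len(records),
        # Efficiency: mean time, counting successful attempts only.
        "mean_seconds_per_completed_task": (
            float(mean(r.seconds for r in completed)) if completed else float("nan")
        ),
        # Satisfaction: mean post-task rating across all attempts.
        "mean_satisfaction": float(mean(r.satisfaction for r in records)),
    }


records = [
    TaskRecord("P1", True, 95.0, 6),
    TaskRecord("P2", False, 240.0, 2),
    TaskRecord("P3", True, 120.0, 5),
    TaskRecord("P4", True, 85.0, 7),
]
print(summarize(records))
```

Averaging only successful attempts follows the "time spent per completed task" wording in the table; that is a design choice, and some teams also report time on failed attempts separately.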

Distinguishing between Formative and Summative Evaluations

One final distinction in usability research is the one between formative and summative evaluations. In a formative evaluation, the emphasis is on the "formation" of the product's future design and direction. Data collected to help drive this future direction may include qualitative data based largely on users' observed behaviors and comments about the product. It may also include quantitative data, however, especially when the research question involves an A-B comparison between two early prototypes.

A summative evaluation is intended to provide a "summation" of the product's current state, ideally in the form of a measurable score. Because of this desire for a numerical score, quantitative data collection is generally the priority in summative research. Qualitative data may still be collected as a 'bonus' to supplement the value gained from the study, provided that collecting it does not interfere with or influence the collection of quantitative data. For example, a think-aloud protocol is generally not used during a summative study because thinking aloud affects the time required to complete a task. However, qualitative comments may be recorded after tasks and/or at the end of a study without adversely affecting other data collection.

Presenting Your Data with Confidence

Finally, it's worth noting a relatively recent enthusiasm in the usability community for including confidence intervals when presenting the effectiveness results from a usability study. Sauro (2005) makes a case for the benefit of including confidence intervals in addition to point estimates (i.e. the basic task completion percentage that is commonly reported) in order to help convey the margin of error associated with the results, and to "temper both excessive skepticism and overstated usability findings".

Calculating confidence intervals is relatively straightforward, although, as Lewis and Sauro (2006) point out, there are numerous subtleties to consider when choosing a method. In practice, however, the results seem to be quite similar across the different methods of calculation, so applying any one of their top two or three recommended methods will achieve the goal of communicating that your results carry a margin of error.
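As one illustration, the Python sketch below computes an adjusted Wald interval for a task completion rate, a method often recommended for the small samples typical of usability tests. Treat it as a sketch under stated assumptions: the 95% level (z = 1.96) and the clamping of bounds to [0, 1] are choices made here, and it is worth checking Lewis and Sauro (2006) before adopting it for your own data.

```python
import math


def adjusted_wald_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Confidence interval for a completion rate using the adjusted Wald method.

    z = 1.96 corresponds to a 95% confidence level.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    z2 = z * z
    # Add z^2/2 successes and z^2 trials, then apply the ordinary Wald formula.
    n_adj = trials + z2
    p_adj = (successes + z2 / 2) / n_adj
    margin = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    # A proportion cannot fall outside [0, 1], so clamp the bounds.
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)


low, high = adjusted_wald_interval(8, 10)
print(f"8/10 completed: point estimate 80%, 95% CI {low:.0%} to {high:.0%}")
```

With 8 of 10 participants completing a task, the point estimate is 80%, but the interval runs from roughly 48% to 95%, which is exactly the margin of error Sauro (2005) argues should accompany the point estimate.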

Datalogging Practices

What's the common practice?

As with most types of research, usability research is frequently characterized by "tailored approaches" refined and customized by organizations and individuals to meet their specific needs. This is particularly true when it comes to collecting data.

In an Idea Market session conducted at UPA 2004, Dr. David Dayton (Southern Polytechnic State University) delved into the details of how six different practitioners (including himself) logged usability data. As part of his review of datalogging practices, Dayton (2004) describes four popular types:

Problem Coding
Record predictable events and sort them on the fly into one or more categories. Analyze the resulting quantitative data with statistical methods and compare it to predefined benchmarks to assess the usability of the product. None of the six practitioners reported regular use of this technique.
Event Description
Record free-form handwritten notes to capture significant events and/or usability problems. After the test, analyze the notes, group and categorize events, and rate their severity. Four of the six practitioners reported regular use of this technique.
Event Description with Problem Coding
Combine problem coding and event description: code events into preset categories, and enter descriptive notes for later team review and discussion of the most significant problems. One practitioner reported regular use of this technique.
Event Description with Problem Coding & Video Time Stamps
Capture the "story" of a test session in shorthand notes. Code significant events into preset categories (“navigation problem”, “mental model gap”). Time-stamp the notes to sync them with the video. One practitioner reported regular use of this technique.
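The fourth technique can be sketched in Python as a simple record type: each entry keeps a shorthand note, an optional preset problem code, and the offset into the session video so the note can be synced with the recording later. The field names, the sample codes, and the helper functions are hypothetical illustrations, not the format used by any of Dayton's practitioners.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass
class LogEntry:
    """One logged event (hypothetical format)."""
    participant: str
    task: int
    video_offset: timedelta       # time elapsed since the recording started
    note: str                     # shorthand description of the event
    code: Optional[str] = None    # preset problem category, if one applies


def tally_codes(entries: list[LogEntry]) -> Counter:
    """Count coded events per category; uncoded notes are skipped."""
    return Counter(e.code for e in entries if e.code is not None)


def video_stamp(offset: timedelta) -> str:
    """Format an offset as H:MM:SS for locating the event in the recording."""
    total = int(offset.total_seconds())
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


log = [
    LogEntry("P1", 1, timedelta(minutes=2, seconds=5), "Looks for search in footer", "navigation problem"),
    LogEntry("P1", 1, timedelta(minutes=3, seconds=40), "Expects cart to save items", "mental model gap"),
    LogEntry("P2", 1, timedelta(minutes=1, seconds=15), "Scrolls past main menu", "navigation problem"),
    LogEntry("P2", 2, timedelta(minutes=6), "Says 'that was easy'"),  # descriptive note, no code
]

print(tally_codes(log).most_common())
for entry in log:
    print(video_stamp(entry.video_offset), entry.participant, entry.note)
```

Making `code` optional lets the same structure serve plain event description (no code), problem coding (code only matters for the tally), and the combined approaches.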

Dayton (2004) made two other interesting observations about data logging in his UPA session. He noted that the practitioners and attendees at the session generally agreed that an effective data log is one that "allows a team to find answers to its questions without having to review the session tapes." Interestingly, of the six practitioners whose practices he mapped, the only one who regularly included references to recorded video in the log was Dayton himself.

Dayton (2004) also found, from his work with experienced practitioners, that the most common mistake made by those new to logging is "observation overkill": recording excess information that ends up impeding the team when it reviews the logs during analysis sessions. In my experience, this signal-to-noise ratio of useful information to raw data certainly determines the value of your logged results.

The benefits of separating data collection from data analysis

Among the data collection practices described above, the 'on the fly' coding of data into preset categories raises an interesting issue. Researchers should consider how important it is for their study to separate data collection from data analysis, because this decision can affect both the time required to complete the analysis phase and the quality of the analysis performed.

Keeping data collection separate from analysis allows the researcher to concentrate on making quality observations, and leaves the analysis and pattern identification of problems to be performed later once all data has been gathered. This approach may be especially appropriate when the product being evaluated is new and a predetermined set of categories or codes may not be entirely appropriate for that product.

Alternatively, predefined categories for coding data 'on the fly' may help expedite a study by reducing the time spent on data analysis. When the product has been tested before, the appropriate categories are more likely to be anticipated accurately.

Data Collection: The low-tech solution

As Dayton's (2004) small sample from a UPA conference showed, data logging practices may consist of a basic paper notepad or electronic document. A pre-designed form or template can further facilitate logging by anticipating some of the patterns to be recorded and providing a checklist for common observations. Even with a well-designed form, however, this method can result in a significant "paper shuffle" at the end of the study. Multiple pages of scribbled notes and numbers, often out of order and inconsistently labeled, must then be collated, coded, entered into some type of sorting software, and categorized by task or question, all of this just so the analysis phase can begin!

Low-tech solutions are also limited when it comes to collecting efficiency measures. Typically, measuring efficiency requires a stopwatch or some other external timekeeping tool, whose results are then recorded by hand on the printed form. During analysis, these efficiency data may be entered manually a second time, from the paper form into analysis software, before being inserted into the final report. From the perspective of data integrity, practices that require little or no data re-entry are most desirable.
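One way to avoid the stopwatch-and-re-entry step is for the logging tool to record a timestamp whenever a task starts or ends, and to compute durations from those timestamps. The Python sketch below assumes a hypothetical event log of (participant, task, event, ISO 8601 timestamp) rows; it is an illustration, not how any particular product stores its data.

```python
from datetime import datetime

# Hypothetical event log: each row is written automatically when the
# facilitator marks a task start or end, so no time is copied by hand.
events = [
    ("P1", 1, "start", "2008-05-05T10:02:10"),
    ("P1", 1, "end", "2008-05-05T10:04:40"),
    ("P1", 2, "start", "2008-05-05T10:05:00"),
    ("P1", 2, "end", "2008-05-05T10:06:45"),
]


def time_on_task(rows: list[tuple[str, int, str, str]]) -> dict[tuple[str, int], float]:
    """Return {(participant, task): seconds} computed from start/end event pairs."""
    starts: dict[tuple[str, int], datetime] = {}
    durations: dict[tuple[str, int], float] = {}
    for participant, task, event, stamp in rows:
        key = (participant, task)
        moment = datetime.fromisoformat(stamp)
        if event == "start":
            starts[key] = moment
        elif event == "end":
            if key not in starts:
                raise ValueError(f"end without start for {key}")
            durations[key] = (moment - starts.pop(key)).total_seconds()
    return durations


print(time_on_task(events))
```

Because the durations are derived from the same rows that were logged during the session, there is no second manual entry step where transcription errors could creep in.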

Data Collection: the high-tech solution

In recent years, a growing number of software packages have been developed specifically to help usability researchers collect, code, analyze, and even report their usability data. Several of these programs are particularly strong at managing the video captured during a usability study. Like any solution, however, these applications have their strengths and weaknesses. The following section briefly reviews some of the solutions available on the market.

A Survey of Usability Data Logging Software

The following software products represent a range of solutions currently available:

Bit Debris
The Usability Activity Log "offers an effective means to easily and unobtrusively document observational data and task performance". This application can be synchronized with existing video equipment so that recorded observations are directly 'linked' to the accompanying video data for easy access by the researcher. The product is a Windows application and costs $300.00 USD per license.
Noldus
The Noldus Observer is "a professional system for the collection, analysis, presentation and management of observational data." This application is able to accommodate data entry directly from a computer, a handheld device, or a video recorder, and offers extensive coding and analysis options to the researcher. While the Observer is packed with powerful features that may be needed for extensive qualitative research, it may be overkill for many usability researchers. Observer is Windows only.
OvoStudios
OVO Logger comes in three different flavours (freeware, a la carte, fully featured) and provides extensive logging options for both notes and "tapeless" video as well as powerful bookmarking and reporting features designed to optimize the analysis and reporting phase of a study. While the fully featured version may be more than many researchers need or have a budget for, the freeware version may be just the right ticket. OVO Logger is a Windows-only product.
Techsmith
Morae is touted as "the only fully integrated, all-digital solution for analyzing human-computer interaction". This application takes full advantage of digital video technology to allow researchers to capture, store, locate, and edit their video data from a usability study. At $1495.00, it may be an attractively priced option for managing video data. The software also lets you enter notes and set markers that reference the corresponding video. Morae is Windows-only but can be used with TechSmith UserVue to collect data from users anywhere in the world.
Usability Systems/Alucid
UsabilityWare 4.0 is "a single program that can be relied upon as a beginning-to-end tool for all of your data collection, analysis, and final deliverables." This program allows you to enter your recruitment and scheduling details prior to a study, record your observations during the study, analyze the results, and build a report based on the data. The product is a Windows application and costs $4500.00 per license.

Pros and Cons of Commercial Software Dataloggers

Pros

Cons

How to use Microsoft Excel for Data Logging

Since VisiCalc (short for 'visible calculator') arrived on the scene in 1979, the computer spreadsheet has been considered by many to be the original 'killer application'. A few years later, Lotus 1-2-3 took the lead in the spreadsheet category, and shortly after that Microsoft Excel captured first place. Today, Microsoft Excel has one of the largest installed bases of any software application and is an integral part of the Microsoft Office suite of business applications.

Pros

Cons

Download Excel datalogger

If you'd like to try out an Excel datalogger, visit the Datalogger download page.
