UX Certification
Get hands-on practice in all the key areas of UX and prepare for the BCS Foundation Certificate.
How should you go about collecting data in usability tests? This article examines the data collection process in usability studies and describes some popular data logging solutions. Since most of these tools are expensive, I show you how you can use Microsoft Excel with Visual Basic macros to collect the data.
Anyone who has ever conducted a usability evaluation of a web site, software application, or consumer product knows that human behaviour research often produces reams of data that can take significant time to analyse. To be productive, researchers must organize and reduce these data so that they can quickly perform their analysis and proceed with improving the product.
People who are new to the field tend to take notes on paper or on a computer. Unfortunately, this approach can make data compilation cumbersome. More complex and powerful tools to assist with compilation and analysis are becoming more common, but they can be expensive and often provide more functions than most researchers need. In addition, such proprietary tools are frequently restricted to the Windows operating system, and they offer little, if any, scope for customization, making it difficult for practitioners to tailor the tool to their specific needs.
Before entering into a discussion on logging the data that arises from usability research, let's review some terminology associated with usability research.
According to the International Organization for Standardization (ISO 9241-11), three primary attributes make up usability: effectiveness, efficiency, and satisfaction. Some usability experts feel that we should also consider additional contributing elements such as learnability and retention. Table 1 shows how some of the more popular definitions of usability map to one another.
ISO 9241-11 | Nielsen 1993 | Shneiderman 1998 |
---|---|---|
Efficiency | Efficiency; Learnability | Speed of Performance; Time to Learn |
Effectiveness | Memorability; Errors; Safety | Retention over time; Rate of errors by users |
Satisfaction | Satisfaction | Subjective satisfaction |
The terms quantitative and qualitative have become pervasive in the user-centered design community. Because of the close working relationship between marketing and user experience groups in product design, there is often confusion surrounding these terms. In my experience, the confusion is generally due to the tendency to refer to research "methods" and "data" as if they are always the same. This is not always true, however.
For example, a survey of 1000 people is often regarded as quantitative research, yet it may collect both quantitative and qualitative data. Similarly, interviews and usability tests are often considered qualitative research, yet they too can collect both quantitative and qualitative data. By reducing the "research" to a single type, we fail to recognize that different types of data may be collected, and that different types of analysis are appropriate.
Another dimension of data that can confuse people is the distinction between objective and subjective. Objective data are "external to the mind": based on facts and unbiased by opinion or interpretation. Subjective data "exist in the mind" and belong to the thinking subject rather than the object of thought. In general, objective data are more valuable than subjective data. Just as both quantitative and qualitative data may be collected in usability research, both objective and subjective data may be collected.
Effectively determining the types of data to be collected in a study is really a function of the research questions that need to be answered. Table 2 provides some examples of usability data collection to help convey the possibilities.
Usability attribute | Quantitative, objective | Quantitative, subjective | Qualitative, objective | Qualitative, subjective |
---|---|---|---|---|
Effectiveness | Count of tasks completed successfully (according to predefined criteria). Count of errors committed by user during task performance (according to predefined criteria). | Likert scale rating by participant of how well the product solves the intended job. | A description of the observed sequence of steps performed by user. | Participant's comments related to completing a given task. |
Efficiency | Time spent per completed task. Count of number of clicks performed during task completion. | Likert scale rating by participant of how efficient they perceive the product to be. | | Participant's comments related to perceived efficiency of product. |
Satisfaction | | Likert scale rating of participant satisfaction. | A description of observed behaviour by participant (frustration, delight, etc.) | Participant's comments related to satisfaction with product. |
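The quantitative measures in Table 2 can be summarised from a simple per-task record. The following Python sketch is illustrative only: the `TaskRecord` fields, the 1 to 5 Likert range, and the `summarise` helper are assumptions made for this example and are not part of any tool described in this article.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskRecord:
    """One participant's attempt at one task (hypothetical structure)."""
    participant: str
    task: str
    completed: bool    # effectiveness: met the predefined success criteria
    errors: int        # effectiveness: errors counted against predefined criteria
    seconds: float     # efficiency: time spent on the task
    satisfaction: int  # satisfaction: Likert rating, assumed 1 (low) to 5 (high)


def summarise(records):
    """Return basic effectiveness, efficiency, and satisfaction measures."""
    if not records:
        raise ValueError("at least one record is required")
    completed = [r for r in records if r.completed]
    return {
        "tasks_completed": len(completed),
        "completion_rate": len(completed) / len(records),
        "total_errors": sum(r.errors for r in records),
        # Time is averaged over completed tasks only ("time spent per completed task").
        "mean_seconds_completed": mean(r.seconds for r in completed) if completed else None,
        "mean_satisfaction": mean(r.satisfaction for r in records),
    }
```

Qualitative data, such as participant comments, would be stored alongside these records as free text rather than summarised numerically.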
One final distinction that exists in the field of usability research is the one between formative and summative evaluations. In a formative evaluation, the emphasis is on the "formation" of the future design and direction of a product. Data collected to help drive this future direction may include qualitative data that is largely based on users' observed behaviours and comments about the product. It may also include quantitative data, however, especially when the research question involves an A-B comparison between two early prototypes.
A summative evaluation is intended to provide a "summation" of the product's current state, ideally in the form of a measurable score. Due to the desire for a numerical score, quantitative data collection is generally the priority in summative research. Qualitative data may still be collected as a 'bonus' to supplement the value gained from the study, provided it does not interfere with or influence the collection of quantitative data. For example, a think-aloud protocol is generally not performed during a summative study due to its impact on the time required to complete a task. However, qualitative comments may be recorded after tasks and/or at the end of a study without adversely affecting other data collection.
Finally, it's worth noting a relatively recent enthusiasm in the usability community for including confidence intervals when presenting the effectiveness results from a usability study. Sauro (2005) makes a case for the benefit of including confidence intervals in addition to point estimates (i.e. the basic task completion percentage that is commonly reported) in order to help convey the margin of error associated with the results, and to "temper both excessive scepticism and overstated usability findings".
Calculating confidence intervals is a relatively straightforward procedure, although as Lewis and Sauro (2006) point out, there are numerous subtleties to consider in your choice of method. In practice, however, the results seem to be quite similar across the different methods of calculation, such that applying any one of their top 2-3 recommended methods will allow you to achieve the goal of communicating that a margin of error is associated with your results.
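As a rough illustration of how such an interval might be calculated for a task completion rate, here is a minimal Python sketch. The two methods shown, the adjusted Wald interval and the Wilson score interval, are assumptions chosen for this example because they are widely used for binomial proportions from small samples; this article does not state which methods Lewis and Sauro recommend. The function names and the default `z=1.96` (approximately a 95% confidence level) are likewise illustrative.

```python
import math


def adjusted_wald_interval(successes, trials, z=1.96):
    """Adjusted Wald interval for a completion rate.

    Adds z^2/2 successes and z^2 trials before applying the ordinary
    Wald formula, then clips the result to the range 0..1.
    """
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    n_adj = trials + z ** 2
    p_adj = (successes + z ** 2 / 2) / n_adj
    margin = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a completion rate."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    p = successes / trials
    denom = 1 + z ** 2 / trials
    centre = (p + z ** 2 / (2 * trials)) / denom
    margin = z * math.sqrt(p * (1 - p) / trials + z ** 2 / (4 * trials ** 2)) / denom
    return max(0.0, centre - margin), min(1.0, centre + margin)
```

For example, if 7 of 10 participants complete a task, the point estimate is 70%, and both functions return an interval running from roughly 39% to 90%, which shows how wide the margin of error is with a typical usability sample.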
As with most types of research, usability research is frequently characterized by "tailored approaches" refined and customized by organizations and individuals to meet their specific needs. This is particularly true when it comes to collecting data.
In an Idea Market session conducted at UPA 2004, Dr. David Dayton (Southern Polytechnic State University) delved into the details of how six different practitioners (including himself) logged usability data. As part of his review of datalogging practices, Dayton (2004) describes four popular types:
Dayton (2004) made two other interesting observations about data logging in his UPA workshop. He noted that the participants and attendees at the session generally agreed that an effective data log is one that "allows a team to find answers to its questions without having to review the session tapes." Interestingly, of the six practitioners whose methods he mapped against these common practices, the only one who regularly included a reference to recorded video data was Dayton himself.
Dayton (2004) also discovered from his work with experienced practitioners that the most common mistake made by those new to logging is "observation overkill": the tendency to record excess information that ends up impeding the team when it reviews the logs during analysis sessions. In this author's experience, this signal-to-noise ratio of "data to information" certainly determines the value of your logged results.
With respect to the types of data collection commonly practiced, an interesting issue presents itself regarding the 'on the fly' coding of data according to preset categories. Researchers should consider how important it is to their study that they separate data collection from data analysis, for this decision may have implications on the time required to complete the analysis phase, as well as the quality of analysis performed.
Keeping data collection separate from analysis allows the researcher to concentrate on making quality observations, and leaves the analysis and pattern identification of problems to be performed later once all data has been gathered. This approach may be especially appropriate when the product being evaluated is new and a predetermined set of categories or codes may not be entirely appropriate for that product.
Alternatively, pre-defined categories for coding data 'on the fly' may help expedite a research study by reducing the amount of time spent in data analysis. When the product being tested has been tested previously, the appropriate categories are more likely to be anticipated with a high degree of accuracy.
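The trade-off between the two approaches can be sketched in Python as follows. The category names in `Code` are hypothetical and invented purely for illustration; a real study would define its own. Observations default to `UNCODED`, which corresponds to keeping collection separate from analysis, while supplying a code at logging time corresponds to coding 'on the fly'.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Code(Enum):
    """Hypothetical pre-defined categories, for illustration only."""
    NAVIGATION = "navigation"
    TERMINOLOGY = "terminology"
    ERROR_RECOVERY = "error recovery"
    UNCODED = "uncoded"  # left for coding later, during analysis


@dataclass
class Observation:
    participant: str
    task: str
    note: str
    code: Code = Code.UNCODED


def tally_by_code(observations):
    """Count how many observations fall under each category."""
    return Counter(obs.code for obs in observations)
```

A large `UNCODED` count at the end of a study suggests that the pre-defined categories did not fit the product well, and that more analysis time will be needed after all.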
As Dayton (2004) revealed by his small sample from a UPA conference, data logging practices may consist of a basic paper notepad or electronic document. A pre-designed form or template may further facilitate logging by anticipating in advance some of the patterns to be recorded and providing a checklist approach to recording common observations. Even with a well-designed form, however, this method can result in a significant "paper shuffle" at the end of the study. Multiple pages of documents with scribbled notes and numbers, often out of order and inconsistently labelled, must then be collated, coded, entered into some type of sorting software, and categorized by task or question — all of this just so the analysis phase can begin!
Low-tech solutions are also limited when it comes to collecting efficiency measures. Typically, measuring efficiency requires the use of a stopwatch or some external timekeeping tool whose results are then manually recorded onto the printed form. During analysis, these efficiency data may get manually entered a second time from the paper form into some analysis software before insertion into the final report. Practices which require little or no data re-entry are most desirable from the perspective of data-integrity.
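To illustrate what "little or no data re-entry" could look like in practice, here is a minimal Python sketch of a logger that times each task and writes the results directly to a CSV file that analysis software can open. The `TaskTimer` class, its method names, and the column names are all assumptions made for this example; it is not the Excel datalogger described in this article.

```python
import csv
import time


class TaskTimer:
    """Records time-on-task so results never need to be typed in by hand."""

    FIELDS = ["participant", "task", "completed", "seconds"]

    def __init__(self, clock=time.monotonic):
        # The clock is injectable so the timer can be tested without waiting.
        self._clock = clock
        self._started = {}
        self.rows = []

    def start(self, participant, task):
        """Note the moment the participant begins the task."""
        self._started[(participant, task)] = self._clock()

    def stop(self, participant, task, completed):
        """Record the elapsed time; raises KeyError if the task was never started."""
        elapsed = self._clock() - self._started.pop((participant, task))
        self.rows.append({
            "participant": participant,
            "task": task,
            "completed": completed,
            "seconds": round(elapsed, 1),
        })
        return elapsed

    def write_csv(self, stream):
        """Write all recorded rows, with a header line, to an open text stream."""
        writer = csv.DictWriter(stream, fieldnames=self.FIELDS)
        writer.writeheader()
        writer.writerows(self.rows)
```

Because the timings go straight from the clock to the file, there is no stopwatch reading to copy onto a form and no second round of manual entry before analysis.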
In recent years, an increasing number of computer software packages have been developed specifically to support usability researchers in collecting, coding, analysing, and even reporting their usability data. Several of these programs provide particularly good solutions for managing the video data captured during a usability study. Like any solution, however, these applications have their strengths and weaknesses. The following section presents a quick review of some of the available solutions on the market.
The following software products represent a range of solutions currently available:
Pros
Cons
Since VisiCalc (short for 'visible calculator') arrived on the scene in 1979, the computer spreadsheet has been considered by many to be the original 'killer software application'. A few years later, Lotus 1-2-3 assumed the lead position in the spreadsheet category, and some years after that Microsoft Excel captured first place. Today, Microsoft Excel holds one of the largest installed bases of any software application, and is an integral part of the Microsoft Office suite of business applications.
Pros
Cons
If you'd like to try out an Excel datalogger, visit the Datalogger download page.
Dr. Todd Zazelenchuk (@ToddZazelenchuk on Twitter) holds a BSc in Geography, a BEd, an MSc in Educational Technology and a PhD in Instructional Design. Todd is an associate of Userfocus and works in product design at Plantronics in Santa Cruz, CA where he designs integrated mobile, web, and client-based software applications that enhance the user experience with Plantronics' hardware devices.
This Excel spreadsheet allows you to measure task completion rates, time-on-task, analyse questionnaire data, and summarise participant comments. Usability test datalogger.
This article is tagged ISO 9241, metrics, questionnaires, tools, usability testing.
copyright © Userfocus 2021.