Usability Testing
CS 239
Experimental Methodologies for System Software
Peter Reiher
May 29, 2007

Outline
  What is usability testing?
  How do you design a usability test?
  How do you run a usability test?

What Is Usability Testing?
  Testing a system or product to see if it is sufficiently usable
    By its intended human users
  In some ways, fundamentally different than normal performance testing
  Looked at broadly, though, a form of performance testing

What Do We Mean By Usability?
  "Usability means that people who use a product can do so quickly and easily to accomplish their own tasks." [1]
  [1] Dumas and Redish, A Practical Guide to Usability Testing

Principles of Usability
  1. Usability means focusing on users
  2. People use products to be productive
  3. Users are busy people trying to accomplish tasks
  4. Users decide when a product is easy to use

Usability Testing Fundamentals
  Requires testing system with human users
    On real tasks representative of system
    And of real user behavior
  Requires observation of those users' behavior
  Must be performed objectively
  Goal is to improve the system
    Not to make designers feel good
    All systems could be made more usable
  Only useful if results change the system
    No point in bothering with usability testing if you won't change unusable elements

Usability Testing Vs. Performance Testing
  Usability testing always involves human users
    Performance testing usually doesn't
  Usability testing always has strong subjective elements
    Performance testing strives for objectivity
  Usability testing is performed assuming existence of problems, with intention of finding them
    Performance testing generally seeks to find how the system behaves
  Usability testing usually done by those developing a system

Usability Testing and the Usability Process
  Should be an integrated part of process of making usable system
  Not a checkmark at the end of the development
  But an opportunity to fix problems that slipped through rest of usability process

Other Elements of Usability Design
  Engineering usability into design and system
  Involving real users throughout process
  Allow usability to drive design
  Setting quantitative usability goals early in process
  Being committed to making technology work for people

Who Should Do Usability Testing?
  Not really researchers building prototypes of brand new ideas
    Unless those ideas closely related to user behavior
  Those designing products
  Those considering changing their organization's software

Why Do Usability Testing?
  People use products that are usable
    They use the parts of them that are most usable
  Many products fail because they're too hard to use
  Non-usable products cost their developers:
    Help desks
    Lost sales
    Wasted time building features that are never used

Basic Outline of a Usability Test
  Get system at reasonable stage of maturity
  Determine possible usability problems
  Design tests to investigate usability
  Plan test to determine usability
  Recruit users and run them through the test
  Interpret the results
  Improve the system based on results
  Iterate

What Is a Usability Test Like?
  Recruited users are brought into testing environment
  They are given tasks to perform on the system to be tested
  Test team observes their performance
    Keeping careful notes and records
  Notes and records for all test participants analyzed to determine what test told you

A Typical Test Situation
  The participant arrives and fills out paperwork
  The participant is instructed in how to do the test
  The participant is shown to the special testing room
  One or more test team members observe him performing the test
    Gathering data as he does so
  The participant performs test tasks for several hours
  There is a debriefing
    Often involving a questionnaire
  The participant leaves and the testers prepare for the next participant
  Whole process could take ½ to full day
  May involve videotaping
  Will certainly involve much analysis later

Designing a Usability Test
  Planning is vital
  Must be totally prepared before you test anything
  Planning may take more time than testing
    But testing time likely to be wasted if you haven't planned carefully

When Do You Test Usability?
  Early testing will allow easier response to major problems
    But you might not have a testable system early on
  Very late testing allows no time for non-trivial improvements
  Generally, start testing as soon as you have something usable
    Quick prototypes of user interfaces can be helpful at early stages

Important Questions for Planning the Test
  What aspects of system might not be as usable as they should be?
  How can we use a few test subjects to cover the range of real system users?
  What tasks should test users perform in the limited time you work with them?
  What information will you collect during test?
  How will you analyze that information?
  What will you do with the analysis?

Steps in Planning a Usability Test
  Define goals and concerns
  Decide who should participate
  Recruit participants
  Select and organize test tasks
  Decide how to measure usability
  Prepare testing environment and supporting test materials
  Prepare a test team
  Conduct a pilot test
  Then you're ready to go with the real test

Scheduling Issues
  Reasonable usability tests typically take 8-12 weeks
    From inception to completion
  4-6 weeks is possible
    With experienced testers, cutting some corners
  1 week allows only testing small things
    And only if you have lots of experience
  By tomorrow is right out

Goals and Concerns
  Goals describe how the system must behave to be usable
  Concerns are areas where you see potential for usability problems
  Where do they come from?
    System developers
    Usability experts
    Previous experience

Defining Usability Goals
  Goals will be specific to the system under test
  Usually stated in declarative sentence:
    "Users will be able to choose the right menu item in less than 30 seconds with no more than one mistake."
  Many goals are possible and appropriate
    You can only test for a few
    So choose most important ones

Usability Concerns
  They start general:
    "We've introduced an icon-driven method of controlling the product, and users might not understand it."
  They need to become specific, for testing:
    "New users to the system might tend to choose the wrong icons to perform the three most common tasks."

Choosing Participants
  Usability testing involves sitting real people down to test a system
  Results depend a lot on who sits down
  How do you choose the right set of participants for your test?

The Wrong Way
  Choose the most convenient ones:
    The system developers
    The secretaries in the office suites
    Your buddies
    The students in your classes
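
A goal stated as precisely as the menu-item example under Defining Usability Goals (less than 30 seconds, no more than one mistake) can be checked mechanically once the test data is in. The following is only a minimal sketch of that idea: the record layout, the function name, and the three sample attempts are invented for illustration and are not from the lecture.

```python
from dataclasses import dataclass


@dataclass
class TaskAttempt:
    """One participant's attempt at the menu-selection task (hypothetical layout)."""
    participant: str
    seconds: float  # time taken to choose the right menu item
    mistakes: int   # wrong menu items chosen before the right one


def meets_goal(attempt, max_seconds=30.0, max_mistakes=1):
    """True if the attempt took less than max_seconds with at most max_mistakes."""
    return attempt.seconds < max_seconds and attempt.mistakes <= max_mistakes


# Invented sample data, purely to show the check in use.
attempts = [
    TaskAttempt("P1", 12.5, 0),
    TaskAttempt("P2", 41.0, 1),
    TaskAttempt("P3", 22.0, 3),
]

failed = [a.participant for a in attempts if not meets_goal(a)]
print("Participants who missed the goal:", failed)
```

Writing the goal down in this form before testing also forces the team to decide what counts as a "mistake" and when the clock starts, which is the kind of specificity the slide asks for.
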

The Right Way
  Ask yourself who you're building the system for
  Recruit participants from that group
    If it's built for programmers, recruit programmers
    If it's built for consumers, recruit consumers
    If it's built for accountants, recruit accountants
  Using the wrong group of participants can give you completely bogus results
    Typically, you're only fooling yourself
    And not for long
    Since the real users will make their own judgment
  To repeat: Don't run usability tests with folks you've pulled out of your surrounding cubicles!

How Do I Get These Participants?
  Usually, you pay them
    Either advertise
    Or hire from a temp agency
  Sometimes you can find the right people elsewhere in the company
    But don't choose people involved in the system development
  Generally, you need to go where your expected users already are

Groups and the User Pool
  Some systems are meant for many types of users
    E.g., Outlook is designed to handle email for experts and novices
  May be important to recruit subgroups representing all constituencies
  Practicalities usually limit you to 1-3 groups of participants
    And probably only a handful in each group

Developing a User Profile
  Write down general characteristics you need in test users
  Which ones are most relevant to this test?
  Which ones shouldn't vary across subjects?
  Which ones do you want to vary intentionally?
  Note similarity to defining factors and levels in performance testing

Getting Information On Participants
  Best to use a questionnaire
    Consistency in treating different participants a continuing theme, here
  Ask about all important aspects of required background
  Also useful later when evaluating test results

How Big Will Your Test Be?
  6-12 participants is typical
  Each one takes several hours
    With two or more test team members
  Testing usually done somewhat under the wire
    Results expected in small number of weeks, at most
  Generally don't have test personnel to run multiple tests simultaneously

Selecting Tasks to Test
  Most systems being tested are fairly complex
  They are capable of doing many things
    Doing each takes some time
  You have a limited amount of time for each participant
  So, which tasks do you test?

Criteria for Selecting Tasks
  Select tasks that probe potential usability problems
  Select tasks that past experience has shown are important
  Select tasks that real users are likely to perform often
  Select tasks where problems could be disastrous

Determining Task Time
  Each task you choose will take some amount of participant time
  You probably have 2-8 hours of each participant's time
    Reflects on task selection
  Make best estimate of how long a task should take
  Also determine the maximum length you think users would find acceptable

Determining Other Task Resources
  What hardware will be required?
    Remote file access requires a second machine, e.g.
  What software will be required?
  Will the task require data?
    You'll probably need to create it, if it does
  List all resources for all tasks to be tested
  The general importance of writing everything down during planning can't be overstated

The Final Task List
  Sort of a script for each participant's experience
  An ordered list of tasks participants should perform
  Include all resources
  And any special instructions

Task Scenarios
  Participants respond better when given short scenarios
    Little stories telling them about the task to be done
  Must be:
    Short
    In user's terms, not developer's
    Unambiguous
    Inclusive of information needed to perform it
    Directly linked to what you want to test
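
The two per-task numbers from Determining Task Time (a best estimate and a maximum acceptable length) can be added up to see whether a candidate task list fits in one participant's session. This is a rough sketch under stated assumptions: treating the maximum acceptable length as a worst-case cutoff for budgeting is my reading rather than something the slides prescribe, and the task names, minutes, and overhead figure are all made up.

```python
def plan_session(tasks, session_minutes, overhead_minutes=0):
    """Sum expected and worst-case task time for one participant.

    tasks: list of (name, estimated_minutes, max_acceptable_minutes).
    overhead_minutes: briefing, questionnaires, breaks (an assumed lump sum).
    """
    expected = sum(est for _, est, _ in tasks) + overhead_minutes
    worst_case = sum(limit for _, _, limit in tasks) + overhead_minutes
    return {
        "expected": expected,
        "worst_case": worst_case,
        "fits": worst_case <= session_minutes,
    }


# Hypothetical task list for an email client test.
tasks = [
    ("Create a folder", 5, 10),
    ("File a message", 3, 8),
    ("Search the archive", 10, 20),
]

plan = plan_session(tasks, session_minutes=120, overhead_minutes=30)
print(plan)
```

If the worst case does not fit, the criteria for selecting tasks suggest which ones to drop first: those that are rarely performed and whose failure would not be disastrous.
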

An Example Scenario
  Task is to create a folder to save groups of related email messages
  Possible scenario:
    "You need to keep copies of messages related to your paychecks, like the one you just received. Create a folder for that purpose."
  Possible concern with this scenario: does the user know what you mean by "folder"?

Delivering Scenarios to Participants
  Common to set up a booklet for them to work with
    Each page contains one scenario
  Also possible to deliver scenarios via the computer
  Also possible to playact the scenarios
  Important that all participants get the same experience, though

Measuring Usability
  A key problem for usability testing
  Likely to be different for different kinds of systems and products
  Two key types of metrics:
    Performance measures
    Subjective measures
  Think about what you're going to do with them before you start gathering them

Performance Measures
  Count or time things
  How long did it take to do task 5?
  How many menus did the participant open before finding the right one?
  Did the participant select the wrong message?
    How many times before selecting the right one?

Timing Tasks
  How long it takes to do each task is usually important
  Either measure internally in system
  Or (maybe better) use a human tester with a stopwatch
  Good idea to arrange for participant to pause after each task
    Put instructions to wait after task completion in participant's test booklet

Subjective Measures
  Either quantitative or qualitative
  Can create scales
    On scale of 1 to 5, how much work was required to perform this task?
  Some useful information isn't quantifiable
    The user says this task was really hard
    Or you observe signs of obvious frustration
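
For the "measure internally in system" option under Timing Tasks, one possible shape is a small timer that the test harness starts when a scenario is handed over and stops when the participant pauses. The class below is a hypothetical sketch, not part of any system described in the lecture; the clock is injectable so the demo can use fixed readings instead of waiting in real time.

```python
import time


class TaskTimer:
    """Records how long each named task took (hypothetical helper)."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock      # any zero-argument function returning seconds
        self._task = None
        self._start = None
        self.durations = {}      # task name -> elapsed seconds

    def start(self, task):
        self._task = task
        self._start = self._clock()

    def stop(self):
        if self._start is None:
            raise RuntimeError("stop() called before start()")
        elapsed = self._clock() - self._start
        self.durations[self._task] = elapsed
        self._task = None
        self._start = None
        return elapsed


# Demo with a fake clock: readings of 10.0 s and 72.5 s stand in for real time.
readings = iter([10.0, 72.5])
timer = TaskTimer(clock=lambda: next(readings))
timer.start("task 5")
elapsed = timer.stop()
print(f"task 5 took {elapsed} seconds")
```

A timer like this only captures what happens inside the machine, so it complements the human observer with a stopwatch rather than replacing them.
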

How Do You Gather Measurements?
  Primarily by watching the users
  Can do some counting within software
    But some things happen outside the machine
    And software can't evaluate most subjective measurements
  Usability tests almost always involve a tester watching the participant
  Also can use questionnaires after test

Gathering and Saving Data
  Much of data must be gathered by humans
  Vital that it be written down
    And not lost
    And format of data not forgotten
  Good planning helps here
  Inexperienced testers often find things move very fast
    Making it hard to write down all important observations

Preparing Test Materials
  Usually should obtain legal consent
    Might also include non-disclosure agreement
  Usually need a pre-test questionnaire
  Might need questionnaires after some tasks
  Probably want post-test questionnaire

Legal Issues
  A very serious matter if the US federal government funds you
    Don't attempt any test involving human subjects without knowing the rules
  Important even if they don't fund you
    Want to avoid lawsuits if problems arise
  Key issue is generally getting informed consent from participants
    And ensuring entire process is fully voluntary
  But best to talk to your lawyers first

Scripting Everything
  Particularly important for tests involving human subjects
  Research has shown that test subjects respond to cues from test members
    Even unintentional or mistaken ones
  Maximize consistency in dealing with participants to minimize this effect
  Having as much as possible scripted helps

Preparing for Problems
  System under test is often not complete
    It might crash or behave oddly
  Be prepared for resets and restarts
  Be prepared to move participants to another task if disaster strikes
  Have a plan if a participant doesn't show
    Or changes his mind about participating

Test Teams
  You usually need a few people
  There are around half a dozen team roles
  One person can fill two or three roles, sometimes
  Three people is a typical team size

Team Roles
  Test administrator
  Briefer
  Camera operator
  Data recorder
  Product expert
  Narrator
  Help desk operator

Developers and the Test
  Usually the organization running the test developed the system
  Should the developers be involved?
  Yes, for some roles
  They tend not to be objective
    Not good for designing experiment
    Or interacting with participants
  But they have great expertise
    Useful for setting up system
    Vital for pinpointing problem areas
  Common to have them watch the test
    But important they don't interfere

The Pilot Test
  Experienced test administrators assume they've screwed up something
  They typically run a pilot test to find out what
  Preliminary complete run-through to find bugs and snags
  Not expected to produce usability results
  More likely to produce late nights of frantically fixing problems
    Before the real participants arrive
    Suggesting you should run pilot test sooner than the day before the first participant is scheduled

Running a Usability Test
  Caring for test participants
  Conducting the test
  Analyzing data and making recommendations

Caring for Test Participants
  Value of the test depends on the participants doing their part
  Experience suggests that proper treatment of participants makes big difference
    Improper treatment can bias results
  Make sure they understand the test is serious and important to you
    And you value and appreciate their participation
  Generally best to have one team member deal with most interactions

What's Proper Treatment?
  Treat them with respect
  Treat them consistently
  Don't lead or influence them
  Expect them to be a little nervous
    Most people haven't done this before
  Make clear you're testing the system, not them
  Allow breaks whenever they want
  Allow them to stop the test if they want

Having Participants Talk Through the Test
  Often useful to encourage participants to vocally describe what they're doing
  Offers valuable insights into how they use the system
    And why they don't behave as expected
  Often requires gentle encouragement
    But must be careful not to bias results

Possible Participant Issues
  The participant isn't really qualified
  The participant changes his mind about participating
  The participant can't complete a task in the time budgeted
  The participant becomes extremely frustrated
  Equipment or software malfunction interrupts the test

Conducting the Test
  Organization and preparation are key
  Know exactly what you need to do
  Have everything ready
  Make sure everything works

A Typical Test (1)
  Prepare everything in advance
  The participant arrives
  Describe briefly what the test will involve
    Important not to bias participant here
  Answer questions
  Fill out legal paperwork and pretest questionnaire

A Typical Test (2)
  Give any necessary brief instructions on using system
    Should be minimal, since you're testing usability
  Give participant scenario list
  Run tests and gather data

A Typical Test (3)
  After all tests completed, have participant fill out post-test questionnaire
  Thank and debrief participant
  Make sure all test results are properly marked and saved
    Test forms, electronically gathered test data, videotapes, whatever
  Reset testing environment for next participant

Gathering Observational Data
  Much interesting information must be gathered by a person
  Requires strong knowledge of test purpose and open mind
  Keep your attention on participant
  Write down anything interesting
    You'll forget it if you don't
  Don't think about fixing problems right now
  Remain detached
  Assume usability problems will be found
    Otherwise, we wouldn't need these tests

Dealing With the Data
  Usability test data is rather different than performance data
  Usually not subject to useful statistical analysis
    Too many variables to justify typical statistical assumptions
    Too few participants to wash those out
  Purpose of the data is to pinpoint problems

Surprises and Outliers
  Look for tasks that took unexpectedly long
    Even if just for one or two users
  Look for tasks with high variability of the time taken
  Look for tasks where users made many errors

Triangulating Your Data
  You typically have three kinds of data:
    The list of expected problems
      Augmented by problems observed during tests
    The quantitative data you gathered during the test
    Participants' comments from the post-test questionnaires and test team observations
  Look for intersections to find problems

Problem Scope
  Scope: how widespread is the problem throughout the system?
    Does it only occur for one type of action in the system?
    Or is it present on every menu/error message/icon?
  Don't solve global scope problems locally
    Unfortunately, they're usually expensive to solve globally
    But local solutions only band-aid the cases turned up in the tests
    Not how they'll pop up in the real deployment
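
The three checks listed under Surprises and Outliers can be run as a simple screening pass over the recorded times and error counts. The sketch below is a hedged illustration only: the thresholds (twice the expected time, a coefficient of variation above 0.5, three or more errors in total) are arbitrary assumptions, the data is invented, and in keeping with Dealing With the Data the output is a list of tasks worth a closer look, not a statistical result.

```python
from statistics import mean, pstdev


def flag_tasks(times, expected, errors,
               slow_factor=2.0, cv_limit=0.5, error_limit=3):
    """Return {task: [reasons]} for tasks that deserve a closer look.

    times:    task -> list of seconds, one entry per participant
    expected: task -> planner's estimate in seconds
    errors:   task -> list of error counts, one entry per participant
    """
    flags = {}
    for task, secs in times.items():
        reasons = []
        # Unexpectedly long, even if just for one participant.
        if any(s > slow_factor * expected[task] for s in secs):
            reasons.append("slow for at least one participant")
        # High variability, measured as standard deviation over mean.
        if len(secs) > 1 and pstdev(secs) / mean(secs) > cv_limit:
            reasons.append("high variability")
        # Many errors across all participants.
        if sum(errors.get(task, [])) >= error_limit:
            reasons.append("many errors")
        if reasons:
            flags[task] = reasons
    return flags


# Invented data for four participants.
times = {"create folder": [40, 45, 50, 180], "send reply": [20, 22, 25, 21]}
expected = {"create folder": 60, "send reply": 30}
errors = {"create folder": [0, 1, 0, 4], "send reply": [0, 0, 1, 0]}

flags = flag_tasks(times, expected, errors)
for task, reasons in flags.items():
    print(task, "->", ", ".join(reasons))
```

A task flagged here still needs to be cross-checked against the expected-problem list and the participants' comments, as the triangulation slide describes.
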

Problem Severity
  How critical is the problem?
  Level 1 problems prevent completion of a task
  Level 2 problems create delay and frustration
  Level 3 problems are minor annoyances
  Level 4 problems are subtle issues often related to the need for future enhancements

What Do You Do With The Results?
  Most often work with the development team
    You know what's wrong
    They know how the system works
  Work together to find the best solutions within the system (and schedule) scope
  For major products, usability testing may be iterative

Should YOU Do Usability Tests?
  Probably only if you work for a company
    Sometimes used to evaluate products you may buy
    Far more often to evaluate something you want to sell
  Not usually part of academic research
    Though there is a branch of research on how to run usability tests

For Experts Only?
  Evidence suggests that experts run much better usability tests than novices
  Usually, though, anything's better than nothing
  Don't allow unavailability of experts to prevent usability testing
  Learn what you can and do your best
  Some day, you might become the expert

A Useful Resource
  A Practical Guide to Usability Testing, Joseph S. Dumas and Janice C. Redish
  Authors are highly experienced usability testers
  Full of very practical advice
  Not worth getting if you're not going to do usability tests
    Probably worth its weight in gold if you are