Usability Testing November 14, 2016
Announcements
- Wednesday: HCI in industry
- VW: December 1 (no matter what)
Questions?
Today
- Usability testing
- Data collection and analysis
Usability test
- A usability test is a formal method for evaluating whether a design is learnable, efficient, and memorable, reduces errors, meets users' expectations, etc.
- Users are not being evaluated; the design is being evaluated
Usability test: rough outline
- Bring in real users
- Have them complete tasks with your design while you watch (ideally with your entire team)
- Measure and record things
  - task completion, task time, error rates
  - satisfaction, problem points, etc.
- Use a think-aloud protocol, so you can hear what they are thinking
Usability test: rough outline
- Use the data to
  - identify problems (major ones, minor ones)
  - provide design suggestions to the design/engineering team
- Iterate on the design, repeat
Important Considerations
- Usually takes place in a usability lab or other controlled space
- Major emphasis is on
  - selecting representative users
  - developing representative tasks
- 5-10 users are typically selected
- Tasks usually last no more than 30 minutes
- The test conditions should be the same for every participant
- An informed consent form explains ethical issues
Case Study: Testing MEDLINEplus
- Five tasks were developed
- Wanted to check categorization and navigation support
  - Task 1: Find information about whether a dark bump on your shoulder might be skin cancer
  - Task 2: Find information about whether it's safe to use Prozac during pregnancy
  - Task 3: Find information about whether there is a vaccine for hepatitis C
  - Task 4: Find recommendations about the treatment of breast cancer
  - Task 5: Find information about the dangers associated with drinking alcohol during pregnancy
Creating tasks
- A task is designed to probe a problem
- Tasks should be straightforward and require the user to find certain items or perform certain operations
- They can be more complex, such as solving particular problems
- Sample tasks for a weather network web site:
  - What is the forecasted weather for Winnipeg?
  - What is the air quality in Los Angeles today?
  - What is the level of humidity in Winnipeg?
  - What is the forecast for Ottawa for the upcoming weekend?
Case Study: Testing MEDLINEplus
- Selection of participants
  - 9 participants from health care practices in the DC area
  - 7 female, 2 male
How many participants is enough for usability testing?
- The number is largely a practical issue
- Depends on:
  - schedule for testing
  - availability of participants
  - cost of running tests
- Typically 5-10 participants
- Some experts argue that testing should continue until no new insights are gained
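One way to make the "continue until no new insights are gained" idea concrete is a simple stopping rule. The sketch below is an illustration, not a standard procedure: the function name, the `quiet_sessions` threshold, and the problem labels are all assumptions made up for this example.

```python
def participants_until_saturation(problems_per_participant, quiet_sessions=2):
    """Return how many participants were run before `quiet_sessions`
    sessions in a row revealed no new problem, or None if that never happens.

    `problems_per_participant` is a list with one collection of problem
    labels per participant, in the order the sessions were run.
    """
    seen = set()   # every problem observed so far
    quiet = 0      # consecutive sessions with nothing new
    for count, problems in enumerate(problems_per_participant, start=1):
        new = set(problems) - seen
        seen |= new
        quiet = 0 if new else quiet + 1
        if quiet >= quiet_sessions:
            return count
    return None


# Hypothetical problem labels noted during five sessions.
sessions = [
    {"navigation", "search"},
    {"navigation", "labels"},
    {"search"},
    {"labels", "navigation"},
    {"fonts"},
]
stop_after = participants_until_saturation(sessions)
print(stop_after)
```

With these made-up sessions the rule would have stopped after the fourth participant and missed the "fonts" problem seen by the fifth, which is one reason the number remains a practical judgment call.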
Activity
- You are developing a user test for a new CS web page. Identify 6 tasks for the test:
  - Task 1: Identify the instructor for Comp 3020
  - Task 2: Find the e-mail address of the Comp 3020 prof
  - Task 3: Find the admission requirements for the M.Sc. Program
  - Task 4: Find out the first day of classes next term
  - Task 5: Locate the requirements for being a Co-op student
  - Task 6: Identify whether the graduate Graphics course is a fundamentals course
Activity
- You are developing a user test for a new CS web page. Who are your participants?
  - Students (CS, or interested)
  - Faculty?
Data Analysis
- Qualitative data
  - Collected from interviews, some types of questionnaires, observation notes
  - Interpreted & used for telling a story about what was observed (difficult!)
- Quantitative data
  - Collected from interaction & video logs
  - Presented as values, tables, charts, graphs and treated statistically (safe!)
Fall 2016 COMP 3020
Making Sense of Your Data
- Affinity diagrams
Making Sense of Your Data
- Discussion with others who watched with you
Elements of Usability Testing
- Identify practical issues: select typical users
  - Make sure you have appropriate representation
  - e.g., it is a problem if an e-recipe product is primarily for families but 90% of the sample are single people
- Identify practical issues: prepare testing conditions
  - Lab preferably
- Identify practical issues: plan to run tests
  - Have scripts in place
  - Test equipment
  - Have recording material prepared
- Deal with ethical issues
  - Consent form
Elements of Usability Testing
- Evaluate, analyze, and present data
  - Report on times to complete tasks, number of errors
  - Provide simple statistical measures: mean, median, standard deviation
  - Describe interaction patterns, e.g., four ways that people may use the interface
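The simple measures named on this slide can be computed with Python's standard `statistics` module. This is a minimal sketch: the task times and error counts are made-up numbers for five hypothetical participants, and `summarize` is a helper name invented for the example.

```python
import statistics

# Hypothetical data for one task: one value per participant.
task_times = [48.0, 62.5, 55.0, 120.0, 51.5]   # seconds to complete the task
error_counts = [0, 2, 1, 4, 0]                 # errors made during the task


def summarize(values):
    """Return the sample size, mean, median, and sample standard deviation."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std_dev": statistics.stdev(values),  # sample (n - 1) standard deviation
    }


time_summary = summarize(task_times)
error_summary = summarize(error_counts)
print("task time:", time_summary)
print("errors:   ", error_summary)
```

In this made-up data one slow participant (120 s) pulls the mean well above the median, which is why it can help to report both.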
Usability Testing: Presenting the Results
- Rank issues in terms of severity
- Not only a list of problems and issues!
  - Provide small suggestions on how to address them
  - Provide evidence (video, quotes, examples) of people encountering issues
- ITERATE ON THE DESIGN!
More on data collection
Questionnaires
- Earlier in the term we discussed questionnaire design for gathering requirements
- Most user satisfaction questionnaires consist primarily of closed questions
- Participants are encouraged to leave their comments in the space provided on the page, or in the margins
- More on designing closed questions
Question and response format: Likert scales
- Likert scales are used for measuring opinions, attitudes, beliefs
- E.g., evaluating color on a web site can take the following forms:
  - The use of color is excellent (where 1 represents strongly disagree and 5 represents strongly agree): 1 2 3 4 5
  - The use of color is excellent: strongly disagree / disagree / ok / agree / strongly agree
Question and response format: Likert scales
- Steps for designing Likert scales:
  - Gather a pool of short statements about the features of the product that are to be evaluated
  - Divide the items into groups containing the same number of positive and negative statements
  - Create logical/conceptual groups
  - Decide on the scale (5-point/3-point/9-point)
  - Select items for the final questionnaire and reword as necessary
Odd/Even Likert scales response options
- If it is possible to have a 'neutral' response, use an odd number of options (central = neutral place)
- If judging whether something is good/bad, male/female, then look at two response options
- Even numbers 'force' respondents one way or another
  - May end up with random responses between the middle items
- How wide (1 to 3, 1 to 5, or even 1 to 12)?
  - How will the majority distinguish between the different levels?
  - If the majority are fairly uninformed about the topic, use a small number
  - If dealing with experts, you can use a much larger set
Anchors
- Anchors are the verbal comments above the numbers ('strongly agree', etc.)
- How many to include?
- In factual statements (or smaller scales)
  - Considered good to use anchors above all options; will give you accurate results
  - News: Daily / Weekly / Monthly / Never
- Larger scales
  - Helpful to indicate the central (neutral) point if meaningful; having numerous anchors may not be so important
  - The content in the website is clear (1-10): 1 (strongly disagree), 5 (neutral), 10 (strongly agree)
Guidelines for questionnaire design
- See notes from earlier in the term (recall)
- Conciseness: questions should be clear and specific
  - e.g., should the system include a user's manual? (YES/NO)
- Closed questions: when possible, ask closed questions and offer a range of answers
  - e.g., How often do you print checks? (1: very often, 5: never)
- Alternate option: consider including a no-opinion option for questions that seek opinions
  - e.g., the payroll module is essential (N/A)
- Order: think about the ordering of questions; general questions should precede specific ones
  - e.g., a question about a specific feature in, say, a payroll module should come after asking whether the payroll module is essential
Guidelines for questionnaire design
- Break up multiple questions: avoid complex multiple questions
  - e.g., is the payroll system and attendance manager efficient?
- Proper scales: when scales are used, make sure the ranges are appropriate and do not overlap
  - e.g., 10-30, 31-40, ...
- Language: avoid jargon
  - e.g., should the display be based on Bezier curves?
- Instructions: provide clear instructions on how to complete the questionnaire
  - e.g., please rate the performance of the following items
- Compactness: a balance must be struck between white space and the need to keep the questionnaire as compact as possible
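The "proper scales" guideline can be checked mechanically before a questionnaire goes out. The sketch below assumes each response option is an inclusive (low, high) pair of whole numbers; `find_overlaps` is a hypothetical helper written for this example.

```python
def find_overlaps(ranges):
    """Return pairs of inclusive (low, high) ranges that overlap.

    Ranges are checked in sorted order; each range is compared with the
    earlier range that reaches furthest to the right.
    """
    ordered = sorted(ranges)
    overlaps = []
    if not ordered:
        return overlaps
    furthest = ordered[0]
    for current in ordered[1:]:
        if current[0] <= furthest[1]:      # starts before the earlier range ends
            overlaps.append((furthest, current))
        if current[1] > furthest[1]:       # keep the range that reaches furthest
            furthest = current
    return overlaps


good_scale = [(10, 30), (31, 40), (41, 50)]
bad_scale = [(10, 30), (30, 40), (40, 50)]   # 30 and 40 each fit two options

print(find_overlaps(good_scale))
print(find_overlaps(bad_scale))
```

An empty result means no respondent could fit two options at once; in the second scale a respondent answering 30 would not know which option to pick.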
Designing questionnaires
- Callouts on the slide: "factual"; "Different typeface used for instructions on the questionnaire"
- Sample questionnaire:
  Participant #:
  Please circle the most appropriate selection
  Age Range: 21-29 / 30-44 / 45-60
  Gender: Male / Female
  Internet/Web Experience
    News: Daily / Weekly / Monthly / Never
    Research, Information gathering: Daily / Weekly / Monthly / Never
    Top stories usage: Daily / Weekly / Monthly / Never
  Please rate (i.e. check the box) agreement or disagreement with the following statements
    Response columns: Strongly Agree / Agree / Neutral / Disagree / Strongly Disagree
    Question: The navigation on the links is clear
    Question: The website contains information that is useful to me
Analyzing questionnaire data
- Helps to think about the analysis of a questionnaire even before its design
- Present results clearly; tables can be used for proper structure
- Simple statistics can say a lot, e.g., mean, median, mode, standard deviation
- Percentages are useful, but give the population size
- Bar graphs show categorical data well
- More advanced statistics can be used if needed
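As a rough sketch of this kind of analysis, the code below summarizes answers to one 5-point Likert item. It reports percentages together with the population size and draws a text bar chart. The responses are made up, and the 1-to-5 coding (1 = strongly disagree, 5 = strongly agree) is an assumption for the example.

```python
from collections import Counter
import statistics

# Assumed coding of the 5-point scale.
LABELS = {
    1: "Strongly Disagree",
    2: "Disagree",
    3: "Neutral",
    4: "Agree",
    5: "Strongly Agree",
}

# Hypothetical answers from nine participants to one statement.
responses = [4, 5, 3, 4, 2, 4, 5, 4, 3]


def summarize_likert(answers):
    """Return sample size, median, mode, and counts/percentages per option."""
    n = len(answers)
    counts = Counter(answers)
    return {
        "n": n,
        "median": statistics.median(answers),
        "mode": statistics.mode(answers),
        "counts": {LABELS[k]: counts.get(k, 0) for k in LABELS},
        "percent": {LABELS[k]: round(100 * counts.get(k, 0) / n, 1) for k in LABELS},
    }


summary = summarize_likert(responses)
print(f"n = {summary['n']}, median = {summary['median']}, mode = {summary['mode']}")
for label, count in summary["counts"].items():
    # One '#' per response gives a simple text bar chart.
    print(f"{label:<18} {'#' * count:<5} {summary['percent'][label]}%")
```

Printing `n` next to the percentages matters here: with only nine participants, a single answer moves a percentage by about 11 points.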
Observing People
- The majority of evaluations with users involve some form of observation
- Simple form of observation: the user is given a set of tasks, and the evaluator simply watches the user
- So...?
  - What do you watch?
  - What do you do?
  - What do you record?
Think-Aloud
- Gives insight into what the user is thinking
- Awkward/uncomfortable for the subject
- May alter the way people perform their task
- Hard to talk when they are concentrating
- User's personality may not align with thinking aloud
Participatory Observation (co-discovery learning)
- Main idea: remove the awkwardness of think-aloud
  - Two people sit down to complete tasks
  - Only one person is allowed to touch the interface
- Variation: use a semi-knowledgeable coach and a novice (only the novice gets to touch the design)
  - Creates a natural social situation
  - Novice subject asks questions
  - Semi-knowledgeable coach gives a little feedback, but not much
- The activity provides insights into the thinking process of both subjects
Indirect tracking of activities
- Direct observation can be obtrusive or impossible
- Alternatives:
  - Interaction logging
    - Recording key presses, mouse buttons, interface changes
    - Difficulty: need to correlate specific actions with the appropriate tasks and meaning (hard)
  - Diaries / experience sampling
    - What users did, when they did it, and what they thought about their interactions
    - Provide templates for users to fill in
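One way to ease the correlation difficulty is to tag every logged event with the task that is active when it happens. The sketch below is a toy in-memory log, not a real logging tool: the class name, event kinds, and timestamps are all assumptions made up for illustration.

```python
import time
from dataclasses import dataclass, field


@dataclass
class InteractionLog:
    """Toy interaction log that tags each event with the current task."""
    events: list = field(default_factory=list)
    current_task: str = "(no task)"

    def start_task(self, task_id, timestamp=None):
        """Mark the start of a task; later events are attributed to it."""
        self.current_task = task_id
        self.record("task_start", task_id, timestamp)

    def record(self, kind, detail, timestamp=None):
        """Store one event (key press, mouse click, interface change, ...)."""
        self.events.append({
            "time": time.time() if timestamp is None else timestamp,
            "task": self.current_task,
            "kind": kind,
            "detail": detail,
        })

    def events_for(self, task_id):
        """Return all events recorded while `task_id` was active."""
        return [e for e in self.events if e["task"] == task_id]


# Hypothetical session; timestamps are passed explicitly (seconds from start).
log = InteractionLog()
log.start_task("Task 1", timestamp=0.0)
log.record("key_press", "w", timestamp=1.5)
log.record("mouse_click", "Search button", timestamp=2.0)
log.start_task("Task 2", timestamp=10.0)
log.record("mouse_click", "Forecast link", timestamp=11.0)

for event in log.events_for("Task 1"):
    print(event["time"], event["kind"], event["detail"])
```

Tagging at record time only solves the "which task" half of the problem; working out what an action meant to the user still needs observation or follow-up questions.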
Observations: Obtrusive vs. Unobtrusive
- How people behave and how they explain their behavior are different, e.g., as with LOOK vs ASK
- Observation techniques can range from unobtrusive to obtrusive
  - Unobtrusive: observe test users but refrain from interacting with them; want to avoid influencing them or encouraging questions
  - Obtrusive: interact with users by asking questions, explaining design decisions, engaging the user in a discussion