University of Maryland. fzzj, basili, Empirical studies (Desurvire, 1994) (Jeries, Miller, USABILITY INSPECTION

Similar documents
Perspective-based Usability Inspection: An Empirical Validation of. Ecacy. University of Maryland. College Park, MD

Expert Reviews (1) Lecture 5-2: Usability Methods II. Usability Inspection Methods. Expert Reviews (2)

Cognitive Walkthrough

Learnability of software

NPTEL Computer Science and Engineering Human-Computer Interaction

Goals of Usability Evaluation

User Interface Evaluation

How to Conduct a Heuristic Evaluation

HCI Research Methods

Cognitive Walkthrough

Usefulness of Nonverbal Cues from Participants in Usability Testing Sessions

Heuristic evaluation is a usability inspection technique developed by Jakob Nielsen. The original set of heuristics was derived empirically from an

15/16 CSY2041 Quality and User-Centred Systems

iscreen Usability INTRODUCTION

EVALUATION OF THE USABILITY OF EDUCATIONAL WEB MEDIA: A CASE STUDY OF GROU.PS

Overview of Today s Lecture. Analytical Evaluation / Usability Testing. ex: find a book at Amazon.ca via search

Assistant Professor Computer Science. Introduction to Human-Computer Interaction

User Experience Report: Heuristic Evaluation

SFU CMPT week 11

cs465 principles of user interface design, implementation and evaluation

Student Usability Project Recommendations Define Information Architecture for Library Technology

Detection of Web-Site Usability Problems: Empirical Comparison of Two Testing Methods

THE USE OF PARTNERED USABILITY TESTING TO HELP TO IDENTIFY GAPS IN ONLINE WORK FLOW

Cognitive Walkthrough Evaluation

Towards a Pattern Based Usability Inspection Method for Industrial Practitioners

Introducing Evaluation

3d: Usability Testing Review

User Interface Evaluation

Analytical Evaluation

Heuristic evaluation of usability of public administration portal

MIT GSL week 4 Wednesday. User Interfaces II

Integrating Usability Design and Evaluation: Training Novice Evaluators in Usability Testing

Introducing Evaluation

HCI and Design SPRING 2016

evision Review Project - Engagement Simon McLean, Head of Web & IT Support Information & Data Services.

Assignment 5 is posted! Heuristic evaluation and AB testing. Heuristic Evaluation. Thursday: AB Testing

UI Evaluation: Cognitive Walkthrough. CS-E5220 User Interface Construction

Evaluation in Information Visualization. An Introduction to Information Visualization Techniques for Exploring Large Database. Jing Yang Fall 2005

The user action framework: a reliable foundation for usability engineering support tools

nding that simple gloss (i.e., word-by-word) translations allowed users to outperform a Naive Bayes classier [3]. In the other study, Ogden et al., ev

Analytical evaluation

Parametric and provisioning approaches are again obviously useful for various

Nektarios Kostaras, Mixalis Xenos. Hellenic Open University, School of Sciences & Technology, Patras, Greece

Amsterdam Medical Center Department of Medical Informatics. Improve. Usability evaluation of the sign up process of the Improve app

COGNITIVE WALKTHROUGH REPORT. The International Children s Digital Library

Usability Evaluation

Chapter 15: Analytical evaluation

CSCI 3160: User Interface Design

Usability. HCI - Human Computer Interaction

Usability of interactive systems: Current practices and challenges of its measurement

Interaction Design. Heuristic Evaluation & Cognitive Walkthrough

Design Heuristics and Evaluation

Our Three Usability Tests

The Pluralistic Usability Walk-Through Method S. Riihiaho Helsinki University of Technology P.O. Box 5400, FIN HUT

CS 160: Evaluation. Professor John Canny Spring /15/2006 1

User-centered design in technical communication

Heuristic Evaluation of Groupware. How to do Heuristic Evaluation of Groupware. Benefits

CS 160: Evaluation. Outline. Outline. Iterative Design. Preparing for a User Test. User Test

Methods: Deciding What To Design

Current Issues in the Determination of Usability Test Sample Size: How Many Users is Enough?

Additional reading for this lecture: Heuristic Evaluation by Jakob Nielsen. Read the first four bulleted articles, starting with How to conduct a

Web Accessibility for Older Readers: Effects of Font Type and Font Size on Skim Reading Webpages in Thai

EVALUATION OF PROTOTYPES USABILITY TESTING

CS 4317: Human-Computer Interaction

Monitoring the Usage of the ZEUS Analysis Grid

Human-Computer Interaction IS 4300

Rowena Cole and Luigi Barone. Department of Computer Science, The University of Western Australia, Western Australia, 6907

Systems Analysis and Design

ACSD Evaluation Methods. and Usability Labs

Evaluation Types GOMS and KLM PRICPE. Evaluation 10/30/2013. Where we are in PRICPE: Analytical based on your head Empirical based on data

Evaluation Plan for Fearless Fitness Center Website

Visual Appeal vs. Usability: Which One Influences User Perceptions of a Website More?

Evaluation techniques 1

Evaluation techniques 1

evaluation techniques goals of evaluation evaluation by experts cisc3650 human-computer interaction spring 2012 lecture # II.1

Evaluation and Design Issues of Nordic DC Metadata Creation Tool

Interaction Design DECO1200

Ryan Parsons Chad Price Jia Reese Alex Vassallo

A User Study on Features Supporting Subjective Relevance for Information Retrieval Interfaces

Department of. Computer Science. Remapping Subpartitions of. Hyperspace Using Iterative. Genetic Search. Keith Mathias and Darrell Whitley

Learning to Find Usability Problems in Internet Time

Folsom Library & RensSearch Usability Test Plan

Interaction Techniques. SWE 432, Fall 2017 Design and Implementation of Software for the Web

Usability Testing Experiences from the Facilitator s Perspective

Appendix A: Scenarios

CogSysIII Lecture 9: User Modeling with GOMS

UCD Method Collection Card-set

Analytical &! Empirical Evaluation

TREC-7 Experiments at the University of Maryland Douglas W. Oard Digital Library Research Group College of Library and Information Services University

1. Select/view stores based on product type/category- 2. Select/view stores based on store name-

CSE 440: Introduction to HCI User Interface Design, Prototyping, and Evaluation

ITT Technical Institute. IT390 Business Database Administration Onsite Course SYLLABUS

EVALUATION OF PROTOTYPES USABILITY TESTING

An Evaluation of Information Retrieval Accuracy. with Simulated OCR Output. K. Taghva z, and J. Borsack z. University of Massachusetts, Amherst

Carbon IQ User Centered Design Methods

esign an ser mterrace Evaluation TheOpen University DEBBIE STONE The Open University, UK CAROLINE JARRETT Effortmark Limited

Designing with Patterns: Possibilities and Pitfalls

BMVC 1996 doi: /c.10.41

Research Article Evaluation and Satisfaction Survey on the Interface Usability of Online Publishing Software

Memorandum Participants Method

Transcription:

AN EMPIRICAL STUDY OF PERSPECTIVE-BASED USABILITY INSPECTION Zhijun Zhang, Victor Basili, and Ben Shneiderman Department of Computer Science University of Maryland College Park, MD 20742, USA fzzj, basili, beng@cs.umd.edu Abstract Inspection is a fundamental means of achieving software usability. Past research showed that during usability inspection the success rate (percentage of problems detected) of each individual inspector was rather low. We developed perspective-based usability inspection, which divides the large variety of usability issues along dierent perspectives and focuses each inspection session on one perspective. We conducted a controlled experiment to study its eectiveness, using a post-test only control group experimental design, with 24 professionals as subjects. The control group used heuristic evaluation, which is the most popular technique for usability inspection. The experimental results are that 1) for usability problems covered by each perspective, the inspectors using that perspective had higher success rate than others 2) for all usability problems, perspective inspectors had higher average success rate than heuristic inspectors 3) for all usability problems, the union of three perspective inspectors (with one from each perspective) had higher average success rate than the union of three heuristic inspectors. INTRODUCTION Usability inspection (Nielsen & Mack, 1994) is an important approach to achieving usability. Dierent usability inspection techniques have been practiced, including heuristic evaluation, cognitive walkthrough, and pluralistic walkthrough, etc. (Nielsen & Mack). This work was supported by the National Science Foundation under grant NSF UMIACS-01-5-23394. Empirical studies (Desurvire, 1994) (Jeries, Miller, Wharton, & Uyeda, 1991) (Nielsen, 1992) showed that when using these techniques the percentage of usability problems detected by each inspector was rather low. We proposed the perspective-based usability inspection technique to improve the eectiveness of usability inspection. In this paper, we describe an empirical study conducted recently and its results. PERSPECTIVE-BASED USABILITY INSPECTION With the observation that it is dicult for inspectors to capture all dierent dimensions of usability issues at the same time, we propose perspective-based usability inspection, where each inspection session focuses on a subset of usability issues covered by one of several usability perspectives. Each perspective provides the inspector a point of view, a list of inspection questions that represent the usability issues to check, and a specic procedure for conducting the inspection. Our assumption is that with focused attention and a well-dened procedure, each inspection session can detect a higher percentage of the problems relating to the perspective used, and that the combination of dierent perspectives uncovers more problems than the combination of the same number of inspection sessions using the same and general inspection technique. This idea is supported by studies on defect-based reading (Porter, 1995) and perspective-based reading (Basili, 1996). These two studies showed that when inspecting software requirement documents, it is more eective to let each inspector focus on one class of de- 1

fects or inspect from one particular perspective than to let each inspector have the same and general responsibility. Currently we use the following usability perspectives: novice use, expert use, and error handling. \Error handling" has been separated from \novice use" and \expert use" situations, so that inspections on \novice use" and \expert use" will only deal with interactions along the correct paths. A unique procedure is used for inspections on \error handling". In developing the technique, we built a model for human-computer interaction by extending Norman's \seven stages of action" model (Norman, 1988) with error handling. A set of usability goals were dened for each perspective. The inspection questions for each perspective were derived by going through the model and see if the respective usability goals can be achieved based on the usability context, which includes the characteristics of the user interface, the users, the tasks, and the users' working environment. The abstract form of the human-computer interaction model is as follows: 1. Form the goal 2. Form the intention 3. Form the action 4. Execute the action 5. Perceive the feedback 6. Interpret the results 7. Understand the outcome 8. Deal with errors that may have occurred METHOD An experiment was conducted at a government organization with 24 professionals. We used a post-test only control group experimental design. The control group used heuristic evaluation. The experiment group was further divided into three sub-groups along the three perspectives. Each subject was assigned to use one technique to inspect two alternative interfaces of a Web-based Table 1: The experimental design Heuristic Novice Expert Error (12) (4) (4) (4) H/N, H/M, J 3 1 1 1 H/M, H/N, J 3 1 1 1 J, H/N, H/M 3 1 1 1 J, H/M, H/N 3 1 1 1 data collection form, namely interface H and J. Interface H had two variations, H/N and H/M. The subjects were randomized and assigned to dierent techniques and dierent interface orders. The layout of the experimental design is shown in Table 1, where the numbers indicate the number of subjects in each slot. The 24 subjects in the main study were familiar with both the interface domain and the task domain. They were either programmers, domain experts, technical researchers, or cognitive researchers. Eorts were made to evenly distribute participants of dierent backgrounds to dierent groups. Before the main study, we conducted a pilot study with 7 graduate students of Computer Science major to test out the instruments. We also asked two external usability experts to review the interfaces and report the usability problems they found. In the main study, each participant rst watched a video introduction of the project background and the inspection technique to be used. After signing the consent form, each inspector spent up to 100 minutes in one of the two \cognitive lab" rooms to conduct the inspection. All inspection sessions were observed from an adjacent room through one-way mirrors. The sessions were also videotaped, with two views: one of the computer screen and the other of the inspector's facial expression and upper-body movement. The usability heuristics used were: 1. Speak the users' language 2. Consistency 3. Minimize the users' memory load and fatigue 4. Flexibility and eciency of use 5. Use visually functional design 6. Design for easy navigation 7. Validation checks 2

8. Data entry 9. Provide sucient guidance Each heuristic had a detailed explanation about the related usability issues. For example, 8. Data entry: Easy to enter data. Data are visible and clearly displayed. Allow the users to change data previously entered. Easy to nd data already entered. Necessary entries are clearly dened. Entries are in correct format. For \novice use" perspective, inspectors were asked to think of novice users with a list of characteristics: being able to use a keyboard and a mouse, without visual disabilities, etc., which were dened based on the context of the application. Inspectors were given the description of the application and the user tasks. For each task, they were asked to think about whether a novice user would be able to choose the correct action, execute it successfully, and understand the outcome. They were provided with a list of detailed usability questions. For example, for data entry, Are formats for data entries indicated? For \expert use", inspectors were asked to think about expert users and check the interface for eciency, exibility, and consistency in supporting the user tasks. They were given a list of usability questions relating to these issues. For example, for data entry, Are possible short-cuts (e.g. using the Tab key to switch to the next eld) available? Are possible default values used? For \error handling", inspectors were given a classication of user errors. They were also given the characteristics of the users as the \novice use" inspectors were. For each user task, inspectors were asked to list the possible user errors and check the following questions for each user error: Does the user interface prevent the error as much as possible? Does the user interface minimize the side effects the error may cause? When the error occurs, will the user realize the error immediately and understand the nature of the error from the response of the user interface? When the error occurs, does the user interface provide guidance for error recovery? RESULTS Altogether 82 problems were detected for interface H, 61 for interface J. These problems were collectively identied by the 24 experiment subjects, 7 pilot subjects, and 2 external expert reviewers. The results for interface H were the combined results for H/M and H/N. A large portion of the problems (33 for H and 20 for J)were specic to the application domain of form design, and not covered by the usability heuristics or the usability questions for the perspectives. Since they would have negative inuence on the use of the forms, they were counted as usability problems. The performance of the 24 experiment subjects are presented and discussed as follows. The primary independent variable was the inspection technique. But another independent variable, the interface order, was confounded in the experimental design. An ANOVA test failed to reveal a signicant main eect by the interface order or its interaction with the inspection technique. The following statistics test the eect of inspection technique as the only source of dierences. The dependent variables include the number of all usability problems identied by each individual inspector and the number of certain types of usability problems identied by each individual inspector. The comparison of the individual detection eectiveness between the two technique groups is given in Table 2. It shows the mean and standard deviation (in parentheses) of the percentage of problems found by each individual inspector in each technique group. The improvement of perspective-based technique over heuristic evaluation is about 30% on the average. The p-values were the results of the t-test. 3

Table 2: Individual eectiveness (overall) Heur.(%) Pers.(%) Improve(%) p-value H 8.4(3.8) 10.4(3.4) 23.8 0.094 J 9.4(6.4) 13.4(6.0) 42.5 0.064 Both 9.1(3.8) 11.9(3.6) 31.2 0.037 Table 3: Individual eectiveness on dierent types of problems Category Heur.(12)(%) Pers.(4)(%) p-value H Novice 8.0(6.6) 18.5(9.0) 0.012 Expert 15.9(10.3) 20.5(8.7) 0.22 Error 14.3(12.2) 33.9(6.8) 0.0046 J Novice 11.7(9.1) 26.3(11.1) 0.010 Expert 14.9(11.1) 28.0(18.5) 0.052 Error 9.0(10.9) 29.3(13.5) 0.0043 Table 3 shows the individual detection eectiveness with respect to usability problems categorized by the three perspectives. For usability problems related to each perspective, the average percentage of such problems detected by the 4 inspectors using that perspective is compared against the average percentage by the 12 heuristic inspectors. The standard deviations are in parentheses. It shows that the use of the \novice use" and \error handling" perspectives greatly improved the inspector's detection eectiveness for problems within the category. The use of \expert use" perspective brought about improvement to a lesser extent, possibly because that the inspectors themselves were all experts in the application domain and user interface domain. Thus they were able to capture a large portion of the \expert use" problems even without help from the \expert use" perspective. Although all inspectors conducted the inspection individually, wewere interested in comparing the aggregated results of multiple inspectors. For example, we compared the number of unique problems identi- ed by 3 perspective inspectors (one from each of the three perspectives) and 3 heuristic inspectors. The results are shown in Table 4. There were 220 positive aggregations for heuristic evaluation and 64 for perspective technique. Since the data points in each group were not independent from each other, no statistical test was performed. We also did a permutation test (Edington, 1987) of simulated 12-person teams. This involves con- Table 4: Aggregated problems found by 3 inspectors Tech. Problems(%) Improve(%) H Heuristic 21.8(5.0) Perspective 27.7(4.4) 26.5 J Heuristic 24.1(7.2) Perspective 32.8(7.4) 35.7 Table 5: Permutation tests for team performance #Possible teams Rank of pers. team p-value H 2,704,156 262,577 0.097 J 2,704,156 122,993 0.045 structing all possible 12-person teams and see how the un-diluted perspective team ranked among all possible 12-person teams in terms of number of unique problems detected. Whether or not we can claim that the perspective-based technique had a benecial eect on team performance depends on how the un-diluted perspective team (with all 12 perspective inspectors) appears towards the top of the ranking. The p-value is the rank of the un-diluted team divided by the total number of teams. There were 2,704,156 possible 12-person teams out of the 24 subjects. The results of this test are given in Table 5. It shows that at p<0:10 level, the perspective-based inspection technique signicantly improved the eectiveness of an inspection team. DISCUSSION AND FUTURE DIRECTIONS The experimental results showed about 30% improvement at both the individual level and for the aggregation of 3 inspectors. As we had expected, the perspective inspectors found much more usability problems covered by the assigned perspective, as shown in Table 3. Furthermore, the average number of all the problems found by each perspective inspector was also higher than that of each heuristic inspector. In summary, the focused attention enabled inspectors to nd much more problems of certain categories without suering the total number of problems found in general. To generalize the results, the following issues need to be considered: Generating specic inspection questions. To ap- 4

ply perspective-based usability inspection eectively, the generic inspection questions for each perspective need to be instantiated to the specic context of the project. In the experiment, this was done by the experimenter. One question that remains is whether a usability practitioner can follow the provided guideline to develop the specic inspection questions that are equally effective. Domain experts vs. usability experts. The subjects in this experiment were all experts in the application domain, with some knowledge in usability. We need to know how the technique works for inspectors who are usability experts with some knowledge in the application domain, as well as for inspectors who are experts in both usability and the application domain. Inspection time. In the experiment, each inspector was given a time limit of 100 minutes to inspect the two interfaces. Although most participants nished inspection within the time limit, there was one case where a perspective inspector said that given the time limit she was not able to follow the technique very well. In some studies, inspectors were asked to conduct the inspection, besides doing their daily work, within two weeks. It would be interesting to test how manymore problems the inspectors can detect when they are given more time. Experience with the technique. In this experiment, both techniques were new to the subjects. We would like toknowhow the learning curve is like for each technique. We plan to conduct more empirical studies to address some of these issues. A lab package is being built to facilitate replications of the experiment by other researchers. We also plan to build an application package so that practitioners can learn and use the technique and provide some feedback that may address the other issues mentioned above. We thank all the subjects for their participation. We thank Pawan Vora of US WEST and Chris North of University of Maryland for their expert reviews of the interfaces. References Basili, V., Green, S., Laitenberger, O., Shull, F., Sorumgard, S., & Zelkowitz, M. (1996). The empirical investigation of perspective-based reading. Empirical Software Engineering, 1, 133-164. Desurvire, H. W. (1994). Faster, Cheaper!! Are Usability Inspection Methods as Eective as Empirical Testing? In J. Nielsen & R. L. Mack (Eds.), Usability Inspection Methods (pp. 173-202). New York, NY: John Wiley & Sons. Edington, E. S. (1987). Randomization Tests. New York, NY: Marcel Dekker. Jeries, R., Miller, J. R., Wharton, C., and Uyeda, K. M. (1991, April). User interface evaluation in the real world: A comparison of four methods. CHI'91 Conference Proceedings. 261-266. Nielsen, J. (1992, May). Finding usability problems through heuristic evaluation. CHI'92 Conference Proceedings. 373-380. Nielsen, J., & Mack, R. L. (Eds.). (1994). Usability Inspection Methods. New York, NY: John Wiley & Sons. Norman, D. (1988). The Design of Everyday Things. New York, NY: Basic Books. Porter, A., Votta, L., Basili, V. (1995, June). Comparing Detection Methods for Software Requirements Inspections: A Replicated Experiment. IEEE Transactions of Software Engineering, 21, 563-575. Acknowledgements We thank Elizabeth Nichols, Barbara Sedivi, Nancy VanDerveer, Ellen Soper, and Al Paez of the US Bureau of the Census for their cooperation in the study. 5