Evaluation and Testing. Andreas Butz, LMU Media Informatics (butz@lmu.de). Slides partially taken from the MMI class.



What and when to evaluate
Original title: "Evaluation and user testing" ...we'll talk about testing products, not users, right?
General mindset: accept the user as the reference!
"Evaluation" somehow sounds as if it only happens at the end, but evaluation methods can be used throughout the entire development process!
[Diagram: tests at multiple points throughout the development process]

The user as the ultima ratio... (Donald Norman)

What can be evaluated?
The usability of a system!
It depends on the stage of a project:
- Ideas and concepts
- Designs (paper and functional)
- Prototypes
- Implementations
- Products in use
It also depends on the goals. Approaches:
- Formative vs. summative evaluation
- Analytical vs. empirical evaluation
- Qualitative vs. quantitative results

Formative vs. Summative Evaluation
- Formative: what and how to (re)design (accompanies design and construction)
- Summative: how did we do? (judges the finished design)
M. Scriven: The Methodology of Evaluation, 1967

Analytical vs. Empirical Evaluation
Scriven, 1967: "If you want to evaluate a tool, say an axe, you might study the design of the bit, the weight distribution, the steel alloy used, the grade of hickory in the handle, etc., or you may just study the kind and speed of the cuts it makes in the hands of a good axeman."

Empirical and Analytic Methods are Complementary (not complimentary ;-)
- Empirical evaluation produces facts which need to be interpreted: if the axe does not cut well, what do we have to change? Analytic evaluation identifies the crucial characteristics.
- Analytical evaluation produces facts which need to be interpreted: why does the axe have a specially shaped handle? Empirical evaluation helps to understand the context for object properties.

Agenda for this class
- Intro & motivation
- Analytical
  - Formative: cognitive walkthrough
  - Summative: heuristic evaluation
- Empirical
  - Prototype user study
  - Controlled experiment
  - Usability lab test
  - Field studies
- Discussion and take-home thoughts


Cognitive Walkthrough
One or more evaluators go through a set of tasks, evaluating understandability and ease of learning.
Procedure, defining the input:
- Who will be the users of the system?
- What task(s) will be analyzed?
- What is the correct action sequence for each task?
- How is the interface defined?
During the walkthrough:
- Will the users try to achieve the right effect?
- Will the user notice that the correct action is available?
- Will the user associate the correct action with the effect to be achieved?
- If the correct action is performed, will the user see that progress is being made toward solution of the task?
(From www.usabilityhome.com)
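
The four walkthrough questions lend themselves to simple bookkeeping. Below is a minimal, hypothetical Python sketch (the names and structure are illustrative, not part of the method as defined) that records the yes/no answers per action step and collects every "no" as a potential problem:

```python
# Hypothetical bookkeeping helper for a cognitive walkthrough.
from dataclasses import dataclass

QUESTIONS = [
    "Will the user try to achieve the right effect?",
    "Will the user notice that the correct action is available?",
    "Will the user associate the correct action with the effect?",
    "Will the user see that progress is being made toward the task?",
]

@dataclass
class StepRecord:
    action: str    # one step of the correct action sequence
    answers: list  # one yes/no answer per question in QUESTIONS
    notes: str = ""

def report_problems(task, steps):
    """Collect every 'no' answer as a potential learnability problem."""
    problems = []
    for step in steps:
        for question, ok in zip(QUESTIONS, step.answers):
            if not ok:
                problems.append(f"{task} / {step.action}: {question} ({step.notes})")
    return problems

if __name__ == "__main__":
    steps = [StepRecord("Open the search dialog", [True, False, True, True],
                        "search icon hidden in overflow menu")]
    for problem in report_problems("Find a contact", steps):
        print(problem)
```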


Heuristic Evaluation
Heuristic evaluation is a discount usability inspection method: quick, cheap and easy evaluation of a UI design. (http://www.useit.com/papers/heuristic/)
Basic idea:
- A small set of evaluators examines the interface and judges its compliance with recognized usability principles (the "heuristics"), either just by inspection or by scenario-based walkthrough.
- The result is a list of critical issues, weighted by severity grade.
- The opinions of the evaluators are consolidated into one report.
Implicit assumptions:
- There is a fixed list of desirable properties of UIs (the "heuristics").
- They can be checked by experts with a clear and defined result.
(Jakob Nielsen)
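
The consolidation step can be illustrated with a short sketch. The rating scale and field names below are assumptions (Nielsen's 0-4 severity scale is used for illustration); duplicate issues from different evaluators are merged and their severity ratings averaged:

```python
# A minimal sketch of consolidating heuristic-evaluation findings
# into one severity-weighted issue list.
from collections import defaultdict

# (evaluator, heuristic violated, issue description, severity 0..4)
findings = [
    ("E1", "Visibility of system status", "No progress indicator on upload", 3),
    ("E2", "Visibility of system status", "No progress indicator on upload", 4),
    ("E2", "Error prevention", "Delete has no confirmation", 3),
]

def consolidate(findings):
    """Merge duplicate issues and average the severity ratings."""
    merged = defaultdict(list)
    for evaluator, heuristic, issue, severity in findings:
        merged[(heuristic, issue)].append(severity)
    report = [(sum(s) / len(s), heuristic, issue)
              for (heuristic, issue), s in merged.items()]
    return sorted(report, reverse=True)  # most severe first

for severity, heuristic, issue in consolidate(findings):
    print(f"[{severity:.1f}] {heuristic}: {issue}")
```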

Ten Usability Heuristics (Nielsen)
1. Visibility of system status
2. Match between system and the real world
3. User control and freedom
4. Consistency and standards
5. Error prevention
6. Recognition rather than recall
7. Flexibility and efficiency of use
8. Aesthetic and minimalist design
9. Help users recognize, diagnose, and recover from errors
10. Help and documentation

Detailed Checklist Example
http://www.stcsig.org/usability/topics/articles/he-checklist.html

Problems with Inspection Methods
- Validity of the findings: "Usability checklists and inspections can produce rapid feedback, but may call attention to problems that are infrequent or atypical in real world use." (Rosson/Carroll)
- Usage context for inspection: the selection of scenarios, or the decision not to use scenarios, may influence results heavily.
- Systematic contribution to the discipline of usability engineering? Heuristic evaluation relies very much on the creativity and experience of the evaluators. How can the knowledge available in the heads of expert evaluators be saved and reused?


Controlled Experiments
Answering specific additional, often quantitative, questions:
- Performance
- Satisfaction
Providing basic knowledge generic to many applications:
- Comparing input/output devices
- Comparing general design strategies
Basic idea:
- Selected participants carry out well-defined tasks.
- Specific values (variables) are measured and compared.
Principal experiment designs:
- Within-subjects design: the same participant is exposed to all test conditions.
- Between-subjects design: independent groups of participants for each test condition.

Variables in Experiment Design
Variables are manipulated and measured:
- Independent variables are manipulated; they set the conditions of the experiment (e.g. number of items in a list, text size, font, color).
- Dependent variables are measured.
The number of different values used for an independent variable is called its number of levels. The number of experimental conditions is the product of the levels: e.g., if the font can be Times or Arial (2 levels) and the background can be blue, green, or white (3 levels), this results in 6 experimental conditions (Times on blue, Times on green, ..., Arial on white); see the sketch below.
The dependent variables are the values that can be measured:
- Objective values: e.g. time to complete a task, number of errors, etc.
- Subjective values: e.g. ease of use, preferred option.
They should depend only on changes of the independent variables.
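
A minimal sketch of the "conditions = product of levels" rule, using the font/background example from this slide:

```python
# Enumerate experimental conditions as the cross product of the
# levels of each independent variable.
from itertools import product

independent_variables = {
    "font": ["Times", "Arial"],                # 2 levels
    "background": ["blue", "green", "white"],  # 3 levels
}

conditions = list(product(*independent_variables.values()))
print(len(conditions), "conditions:")          # 2 * 3 = 6
for font, background in conditions:
    print(f"  {font} on {background}")
```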

Hypotheses
A hypothesis is a prediction of the result of an experiment: it states how a change in the independent variables will affect the measured dependent variables. The experiment then tests whether the hypothesis is correct.
Usual approach:
- State a null hypothesis (which predicts that there is no effect).
- Carry out the experiment and use statistical measures to disprove the null hypothesis.
- When a statistical test shows a significant difference, it is probable that the effect is not random.
- Carefully apply statistical significance tests (see statistics lecture!).

Example: Study on Text Input
Is text input on a keyboard really better than using T9 on a phone?
Compare text input speed and errors made:
- QWERTZ keyboard on a notebook computer
- T9 on a mobile phone
Concentrate on text input only; ignore:
- the time to set up / boot / initialize the device
- the time to get into the application

Example: Study on Text Input
Participants:
- How many?
- Skills: computer users? Phone/T9 users?
Independent variables:
- Input method, 2 levels: keyboard and T9
- Text to input, 1 level: a text of about 10 words
Dependent variables:
- Time to input the text
- Number of errors made

Example: Study on Text Input
Experimental conditions:
- 2 conditions: T9 and keyboard
- The order of conditions is counterbalanced (see the sketch below): users 1, 3, 5, 7, 9 perform T9 first, then keyboard; users 2, 4, 6, 8, 10 perform keyboard first, then T9.
Open design questions:
- Use different texts in the first and second run?
- Which particular phone model?
Measurements:
- Completion time is measured (e.g. with a stopwatch or by the application).
- The number of errors/corrections is observed.
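
A minimal sketch of this counterbalancing scheme (an A-B / B-A design): alternating participants receive the two conditions in opposite order.

```python
# Counterbalance the order of two conditions across participants.
CONDITIONS = ("T9", "Keyboard")

def condition_order(participant_id):
    """Odd-numbered participants start with T9, even-numbered with Keyboard."""
    if participant_id % 2 == 1:
        return CONDITIONS
    return tuple(reversed(CONDITIONS))

for pid in range(1, 11):
    first, second = condition_order(pid)
    print(f"User {pid}: {first} then {second}")
```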

Example: Study on Text Input
Hypotheses:
- H-1: input by keyboard is quicker than by T9.
- H-2: fewer errors are made using keyboard input than using T9.
Null hypotheses (each assumes no effect):
- H0-1: there is no difference in input speed between keyboard and T9.
- H0-2: there is no difference in the number of errors made using keyboard input compared to T9.

Comparing Values
Are the differences between measurements significant?
[Figure: two frequency-distribution plots over the measured value, each showing a distribution with mean A and one with mean B.]

Significance
In statistics, a result is called significant if it is unlikely to have occurred by chance.
This does not mean that the result is of practical significance! A statistically significant speed difference of 0.1% between two text-entry methods may have little practical importance.
In hypothesis testing, the significance level is the probability that the null hypothesis ("no correlation") will be rejected in error although it is true.
Popular levels of significance are 5%, 1% and 0.1%.
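
The significance level can be written as a conditional probability; this standard formulation is added here for clarity and is not from the original slides:

```latex
% significance level = probability of a type I error
\alpha = P(\text{reject } H_0 \mid H_0 \text{ is true})
```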

t-test
The t-test is a test of the null hypothesis that the means of two normally distributed populations are equal.
The t-test gives the probability of observing the measured difference between the samples if both populations actually have the same mean (and thus the differences are due to random noise). A t-test result of 0.05 thus means a 5% chance of such a difference under the same mean.
Different variants of the t-test are used for paired samples (each sample in population A has a counterpart in population B) and unpaired samples. Examples:
- Paired: the speed of the same persons before and after a treatment (within-subjects design).
- Unpaired: the reading speeds of two different groups of people are compared (between-subjects design).
Student [William Sealy Gosset] (March 1908). "The probable error of a mean". Biometrika 6 (1): 1-25.

Excel: t-test
Real data from a user study. Excel functions used (function names are localized; shown here for German Excel):
- =MITTELWERT(C4:C13) computes the mean (AVERAGE in English Excel)
- =TTEST(C4:C13;D4:D13;2;1) computes the t-test
Also available via the menu: Tools > Data Analysis.
TTEST() parameters:
- data range 1
- data range 2
- tails (1 or 2, usually 2)
- type (1 = paired, 2 = two-sample with equal variance, 3 = two-sample with unequal variance)
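
The same computation works outside Excel. Below is a minimal SciPy sketch; the numbers are made up to stand in for the study data (the real measurements are not in the slides):

```python
# Paired t-test on (invented) completion times from the text-input study.
from scipy import stats

# Completion times in seconds, one pair per participant (within-subjects).
t9_times = [41.2, 38.5, 44.0, 39.8, 42.1, 40.3, 43.7, 37.9, 41.8, 40.6]
keyboard_times = [22.4, 25.1, 21.8, 24.6, 23.0, 22.9, 24.2, 21.5, 23.8, 22.7]

# Paired t-test, since each participant produced one value per condition;
# use stats.ttest_ind for unpaired / between-subjects data.
result = stats.ttest_rel(t9_times, keyboard_times)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")

if result.pvalue < 0.05:
    print("Difference is significant at the 5% level: reject H0-1.")
else:
    print("No significant difference: H0-1 cannot be rejected.")
```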

Usability Laboratory
- A specifically constructed testing room, instrumented with data collection devices (e.g. microphones, cameras).
- A separate observation room, usually connected to the testing room by a one-way mirror and an audio system, where data recording and analysis take place.
- Test users perform prepared scenarios, thinking aloud ("think-aloud technique").
- Problem: a very artificial setting, no communication.
(Image source: www.xperienceconsulting.com)

Poor Man's Usability Lab
Goal: integrate multiple views
- capture of the screen with the pointer
- a view of the person interacting with the system
- a view of the environment
Setup:
- A computer for the test user runs the application under test and exports its screen (e.g. via VNC).
- A computer for the observer shows the subject's screen; two webcams (one on the face, one on the entire user) are attached and displayed on the observer's screen, next to an editor for the observer's notes.
- This combined observer screen is captured (e.g. with QuickTime or Camtasia).
- Afterwards, discuss with the user: Why did you do this? What did you try here?
[Diagram: the test system exports the subject's screen to the observer system, which combines it with the two camera views and the notes editor along a time axis.]
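
As one concrete (and purely hypothetical) piece of such a setup, the sketch below stands in for the observer's notes editor: it timestamps free-text notes relative to a start time, so they can later be aligned with the captured screen and camera recordings. The lecture's setup simply uses an ordinary editor plus screen capture.

```python
# Timestamped observation notes, aligned with the start of the recording.
import time

def take_notes(logfile="observation_notes.txt"):
    start = time.time()
    with open(logfile, "a") as f:
        print("Type a note and press Enter (empty line to stop).")
        while (note := input("> ").strip()):
            elapsed = time.time() - start
            f.write(f"{elapsed:8.1f}s  {note}\n")  # offset into the recording

if __name__ == "__main__":
    take_notes()
```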

Screen video

Field Studies
Normal activities are studied in the normal environment.
Advantages:
- Can reveal results on user acceptance
- Allows longitudinal studies, including learning and adaptation
Problems:
- In general very expensive
- A highly reliable product (prototype, mockup) is needed
How to get observations?
- Collecting usage data
- Collecting incident stories
- On-line feedback
- Retrospective interviews, questionnaires


Paper Prototype Study


References
- Alan Dix, Janet Finlay, Gregory Abowd and Russell Beale: Human-Computer Interaction (third edition), Prentice Hall, 2003.
- Mary Beth Rosson, John M. Carroll: Usability Engineering. Morgan Kaufmann, 2002. Chapter 7.
- Discount Usability Engineering: http://www.useit.com/papers/guerrilla_hci.html
- Heuristic Evaluation: http://www.useit.com/papers/heuristic/
Further literature:
- Andy Field & Graham Hole: How to Design and Report Experiments, Sage.
- Jürgen Bortz: Statistik für Sozialwissenschaftler, Springer.
- Christel Weiß: Basiswissen Medizinische Statistik, Springer.
- Lothar Sachs, Jürgen Hedderich: Angewandte Statistik, Springer.
- Various books by Edward R. Tufte.
- Video by Eric Schaffer, Human Factors International: http://www.youtube.com/watch?v=bminulau47q


Intuitive Interfaces?
Given: an old-style water faucet
- 2 valves, 1 outlet
- Cylindrical valves, next to each other
- Left warm, right cold
Question: in which direction does each valve close?
Homework: find such faucets and determine which are intuitive, and why (not).