
Amsterdam Medical Center
Department of Medical Informatics

Improve
Usability evaluation of the sign up process of the Improve app

Author: L.J.M. Heerink
Principal investigator: Prof. Dr. M.W.M. Jaspers
Supervised by: Dr. ir. T.H.F. Broens
Daily supervisors: M. Woesthuis, M. Verhoeven

1 July 2016

Contents
1 Introduction
2 Usability evaluation methods
  2.1 Heuristic evaluation
    2.1.1 Procedure
  2.2 Cognitive walkthrough
    2.2.1 Procedure
  2.3 Think aloud
    2.3.1 Procedure
    2.3.2 Usability metrics
    2.3.3 Participants
3 Results usability evaluation
  3.1 Type of usability problems
  3.2 Task completion success rate
    3.2.1 Participants younger than 30 years
    3.2.2 Participants between 30 and 50 years
    3.2.3 Participants aged 50 years or older
  3.3 Task ratings
  3.4 Time on Task
  3.5 Number of errors
  3.6 Usability problems
    3.6.1 Categorizing usability problems
    3.6.2 Severity level usability problems
    3.6.3 Usability problem list
    3.6.4 Most severe usability problems
4 Recommendations for redesign
  4.1 Change the information in the invitation mail
  4.2 Provide the personal study code in a second mail or remove the need for a personal study code
  4.3 Remove the button 'start nu' and the tab 'home'
  4.4 Create more distinction between sign up and login
  4.5 Give appropriate feedback during the sign up process with email
  4.6 Make forward buttons on the keyboard functional everywhere
  4.7 Make buttons in the studies tab look more clickable
  4.8 Simplify logging out and change the label of the settings tab
5 Discussion and conclusion
References
Appendix A: Cognitive walkthrough
  A.1. Action sequences of the tasks serving as coding scheme
  A.2. Cognitive walkthrough reporting form
Appendix B: Think Aloud
  B.1. Information and tasks (in Dutch)
  B.2. Results per test person
Appendix C: Logbook

1 Introduction

This document describes a test plan for conducting three usability evaluation methods on the Improve app: a heuristic evaluation, a cognitive walkthrough, and a think aloud evaluation. The Improve app, developed by Open HealthHub, enables secure communication between patient and physician. Patients can take part in physicians' studies and are invited to participate in a study by email. This email contains instructions about how to install the app and how to sign up for a study using a personal study code. Despite these instructions, it turns out that patients have particular difficulty with the sign up process. Therefore, the major goal is to identify the usability problems associated with the sign up process. Three usability evaluation methods were used to identify the usability problems, since each method reveals different problems. A cognitive walkthrough is task specific, whereas a heuristic evaluation takes a holistic view to catch problems.

The usability test objectives are:
1. To determine design inconsistencies and usability problem areas within the user interface and content areas that prevent intuitive use and navigation of the Improve app.
2. To establish a list of suggestions for redesign, based on the identified user interface problems, in order to increase the usability of the Improve app.

This document is divided into three main chapters. In chapter 2, the different usability evaluation methods are described. The results of all usability evaluation methods are discussed in chapter 3. This includes completion rates, time on task, number of errors and task ratings. The usability problems found in all three evaluation methods were also combined into one list, categorized and prioritized by giving each problem a severity rating. Recommendations for redesign are discussed in chapter 4.

2 Usability evaluation methods

The major goal is to find all usability problems associated with the sign up process in order to redesign the app based on these problems. A variety of usability evaluation methods was needed because usability is a complex concept that should be looked at in many ways. Performing only one usability evaluation method will not detect all usability problems, since different methods reveal different issues. Therefore, several methods should be used to complement each other.

In general, there are two types of usability evaluation methods: user testing methods and usability inspection methods. User testing involves representative users as participants, while usability inspection can be applied without user involvement. Inspection methods can provide quick feedback, which was important because only a few weeks were available to perform the usability evaluation. Therefore, a heuristic evaluation and a cognitive walkthrough were performed. The main goal of a heuristic evaluation is to identify any problems associated with the design of the user interface. A cognitive walkthrough was also performed since it evaluates the ease of learning to use the app and therefore detects other usability problems: it looks at how easy and obvious goals and actions are and highlights areas of possible confusion. This is especially important since the app will also be used by less experienced users [1]. However, having real users evaluate the Improve app is essential, especially because only one expert was involved in performing the cognitive walkthrough and heuristic evaluation.
From the literature, it is known that one expert finds around 35% of all usability problems [3]. Therefore, a think aloud evaluation was also performed with real users. The end-users of the Improve app can be categorized into three user groups based on age. It is therefore important to have around 5 participants from each user group, in order to find around 75% of the usability problems [3].

2.1 Heuristic evaluation

In a heuristic evaluation an expert inspects the app. In this method, the user interface of the Improve app is compared against accepted usability principles. It is therefore a systematic inspection to see whether the user interface complies with design guidelines.

2.1.1 Procedure

The expert stepped through the interface twice: first to get an idea of the general scope of the app and its navigation structure, and second to focus on the screen layout and interaction structure in more detail, evaluating their design and implementation against the ten heuristics defined by Jakob Nielsen [1]. This resulted in a list of usability problems with references to the violated heuristics. The 10 heuristics are:
1. Visibility of system status
2. Match between system and the real world
3. User control and freedom
4. Consistency and standards
5. Error prevention
6. Recognition rather than recall
7. Flexibility and efficiency of use
8. Aesthetic and minimalist design
9. Help users recognize, diagnose, and recover from errors
10. Help and documentation
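The single-expert (around 35%) and five-participants-per-group (around 75%) figures cited at the start of this chapter follow from the problem-discovery model of Nielsen and Landauer [2], Found(n) = 1 - (1 - lambda)^n. A minimal sketch in Python, with illustrative values of the per-evaluator detection probability lambda (this value is not reported in the study):

    def proportion_found(n_evaluators: int, detection_prob: float) -> float:
        """Expected share of all usability problems found by n evaluators,
        Found(n) = 1 - (1 - lambda)^n, following Nielsen & Landauer [2]."""
        return 1.0 - (1.0 - detection_prob) ** n_evaluators

    # Illustrative detection probabilities only; the study does not report lambda.
    for lam in (0.25, 0.31, 0.35):
        print(f"lambda={lam:.2f}: 1 evaluator finds ~{proportion_found(1, lam):.0%}, "
              f"5 participants find ~{proportion_found(5, lam):.0%}")

With lambda roughly between 0.25 and 0.35, the model gives about a quarter to a third of the problems for a single evaluator and roughly 75-90% for five participants, which is the order of magnitude this report relies on.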

2.2 Cognitive walkthrough

The cognitive walkthrough is a usability inspection method that focuses on evaluating a design for learnability by exploration [1]. In order to perform the cognitive walkthrough, the app was explored, tasks were defined, and all possible action sequences and interface states were determined beforehand.

2.2.1 Procedure

Both a tablet and a smartphone were used to perform the cognitive walkthrough, since the user interface can differ between a tablet and a smartphone. A 10.5-inch Android tablet (Samsung Galaxy Tab S 10.5) and a 5.0-inch Android smartphone (Doogee X5) were used for the test. The Improve app was first explored by the author and all functionalities of the app were thoroughly tested. The expected goals of the user of the Improve app were transformed into 6 representative tasks. Initially, an extra task of requesting a new password was added to this list; however, requesting a new password did not function and could therefore not be tested, so only the following 6 tasks were part of the cognitive walkthrough. A usability goal was defined for each task.

1. Install and open the Improve app
   Goal: Find out whether the user can find the link in the mail or find the app in the Play Store, and whether the user can install and open the app.
2. Sign up for the app
   Goal: Find out whether the user can sign up for the app by creating a personal account.
3. Sign up for a study
   Goal: Find out whether the user can sign up for a study in the app by using the personal study code from the mail.
4. Log out and close the app
   Goal: Find out whether the user can safely close the app.
5. Open the app and log in with the account created in task 2
   Goal: Find out whether the user can log in again with their created account.
6. Answer a study
   Goal: Find out whether the user can start a study and answer the questions.

All possible routes to complete the tasks were analyzed, and action sequences were developed for each possible route. These action sequences included all steps a user should complete before reaching the goal of the task. All possible routes with corresponding action sequences are included in appendix A.1. The author walked through the usability tasks by means of a test plan. At each step in a task, the author answered four questions about the expected behavior of the user. To save time during the evaluation, a question was usually only provided with additional information when the answer was 'no'. The four questions were:
1. Will the user try to achieve the right effect?
2. Will the user notice that the correct action is available?
3. Will the user associate the correct action with the effect to be achieved?
4. If the correct action is performed, will the user see that progress is being made towards the solution of the task?
A potential usability problem was recorded whenever the answer to one of these four questions was 'no'. The list of usability problems was extended with the problems found in the cognitive walkthrough.

2.3 Think aloud

The think aloud method is a user testing method used to gain insight into the cognitive processes of potential users of the app. Testing with real users has the advantage that real usability problems can be found and that usability problems found with the cognitive walkthrough and heuristic evaluation can be confirmed.

2.3.1 Procedure

Participants were needed in order to perform the think aloud evaluation method.
Different age groups were included in this test, since the Improve app does not serve a specific age group. Participants were categorized into younger than 30 years old, between 30 and 50 years old, and 50 years or older. To find around 75% of all usability problems, 5 or 6 participants per age group were needed [2,3]. Therefore, 50 potential participants were invited by mail to participate in this study. The mail described what was expected of the participant and stated that the participant could only be included if he or she had experience with iOS or Android. In total, 18 participants responded and were willing to participate, giving an equal number of 6 participants per age group. In agreement with the test person, a date was set to conduct the test at the participant's home. He or she was allowed to decide whether to test with the Android tablet (Samsung Galaxy Tab S 10.5) or with the 5.0-inch Android smartphone (Doogee X5). Only the Gmail app and the Play Store were visible on both devices and both were placed on the home screen; all other apps were hidden. Both devices were chosen equally often by the participants.

Before the test started, the participant was asked to read one printed A4 with information about the app, an explanation of the goal of the test, and what was expected of the participant in this test. This information can be found in appendix B.1. It was explicitly mentioned that it did not matter if the participant made mistakes. Then the participant was asked to turn over the paper and read the first part of the text.

In this part it was explained that the participant had been invited by mail to participate in a study. It was mentioned that this mail contained instructions about how the app could be installed and how the participant could sign up for a study. It was also mentioned that the mail could be opened from the home screen of the smartphone or tablet. Before continuing, the participant was asked whether he or she understood everything, and it was verbally highlighted that the mail had been sent to a test mail account and not to their own email address. It was also highlighted that the participant had to think out loud and should read each task before continuing to the next one. Then the participant was ready to perform the tasks. The tasks defined for the cognitive walkthrough were also used during the think aloud method, but were provided with some more information. All participants were also asked to sign up with their test email account. The exact tasks are shown in appendix B.1. When the participant was ready, the screen recorder (ADV Screen Recorder) was started. The author of this report sat beside the participant to provide assistance when asked for, or when it seemed that the participant could not reach the goal of the task.

2.3.2 Usability metrics

The screen of the smartphone or tablet was recorded during the test. This gave the opportunity to measure not only the number and type of usability problems, but also times and completion rates. The completion rate, time on task (TOT), number of errors, and the participant's rating for each task were measured.

2.3.3 Participants

Participant characteristics were collected in task 6 by means of a questionnaire. These data are shown in table 1. Participants were sorted by age to show the characteristics per age group. Apart from age, the biggest differences between the age groups were the educational level and the experience level. Higher education was mainly found in participants younger than 30 years old, while lower education was found in participants aged 50 years and older. All 6 participants aged 50 years and older saw themselves as beginners with smartphones and/or tablets, while advanced users and experts were found among participants younger than 50 years. Because of the lower experience level, lower educational level and higher age, it was expected that the older participants would experience more problems during the tests and would need more time to complete a task.

Table 1: Participant characteristics (test device: S = smartphone, T = tablet)

Test number | Age | Gender | Province | City/Village | Educational level | Experience level | Experience with operating system | Test device
005AMC | 16 | M | OV | Village | Intermediate | Advanced | Android | S
009AMC | 21 | F | NH | City | Higher | Advanced | Android | T
011AMC | 22 | M | NH | City | Higher | Expert | Android | S
010AMC | 23 | F | NH | City | Higher | Expert | Android | S
013AMC | 25 | F | GD | City | Intermediate | Advanced | Android | S
001AMC | 26 | M | NB | City | Higher | Advanced | iOS | T
016AMC | 31 | F | OV | Village | Intermediate | Advanced | iOS | T
006AMC | 35 | F | OV | Village | Higher | Advanced | iOS | T
017AMC | 37 | M | OV | Village | Lower | Advanced | Android | T
018AMC | 42 | F | OV | Village | Intermediate | Advanced | Android | S
004AMC | 47 | F | OV | Village | Intermediate | Beginner | Android | T
015AMC | 49 | M | OV | Village | Intermediate | Beginner | Android | S
002AMC | 54 | F | OV | Village | Intermediate | Beginner | Android | S
007AMC | 54 | F | OV | Village | Lower | Beginner | Android | T
008AMC | 57 | M | OV | Village | Lower | Beginner | Android | S
012AMC | 57 | F | GD | City | Intermediate | Beginner | Android | S
003AMC | 60 | F | OV | City | Lower | Beginner | Android | T
014AMC | 62 | F | OV | Village | Lower | Beginner | iOS | T

3 Results usability evaluation

3.1 Type of usability problems

Three usability evaluation methods were performed. These revealed a total of 59 unique usability problems, which were categorized into 6 categories: 22 design issues, 11 content issues, 9 functionality issues, 8 wayfinding issues, 6 labeling issues, and 3 other issues.

A cognitive walkthrough of all 6 tasks was performed. All possible routes to complete the tasks were identified. Each route consisted of various actions that had to be accomplished in order to complete the task; many actions overlap when multiple routes are available for a task. For each unique step or action, the author answered four questions about the expected behavior of the user. In total there were 93 unique actions, so 372 questions were answered about the expected user behavior. This in-depth cognitive walkthrough analysis of the user interface of the Improve app revealed a total of 19 potential usability problems associated with the 93 unique actions performed in executing the 6 tasks.

In the heuristic evaluation the user interface was judged on whether each element followed the list of established usability heuristics defined by Jakob Nielsen. The heuristic evaluation revealed a total of 25 usability problems.

In the think aloud evaluation, a total of 18 participants performed 6 tasks almost identical to the tasks of the cognitive walkthrough. A total of 37 usability problems were found after analyzing the video material with voice recording. The number of usability problems found per evaluation method is shown in table 2.

Table 2: Number of usability problems per evaluation method

Category             | CW | HE | TA | Unique
Design issues        |  3 | 10 | 14 | 22
Content issues       |  4 |  5 |  8 | 11
Labeling issues      |  4 |  3 |  2 |  6
Wayfinding issues    |  4 |  2 |  5 |  8
Functionality issues |  4 |  3 |  7 |  9
Other issues         |  0 |  2 |  1 |  3
Total                | 19 | 25 | 37 | 59

The number of usability problems found per usability evaluation method is shown as a percentage in figure 1. It is clearly visible that each type of usability evaluation method revealed different types of problems. The cognitive walkthrough especially found labeling and wayfinding problems, while the heuristic evaluation found many more design problems and fewer wayfinding problems. The think aloud method revealed the most problems, which can be explained by the fact that 18 participants were involved in the think aloud evaluation, while only one expert was involved in the cognitive walkthrough and heuristic evaluation.

Figure 1: Percentage of usability problems found per method

       | Design | Content | Labeling | Wayfinding | Functionality | Other
CW (%) |  16,7  |  33,3   |  66,7    |  50,0      |  33,3         |  0,0
TA (%) |  77,8  |  66,7   |  33,3    |  62,5      |  58,3         | 33,3
HE (%) |  55,6  |  41,7   |  66,7    |  25,0      |  25,0         | 66,7
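Table 2 above was produced by merging the three problem lists and counting the unique problems per category. A minimal sketch of that bookkeeping in Python, using a few hypothetical problem identifiers rather than the study's actual 59-problem list:

    from collections import defaultdict

    # Hypothetical problem IDs per method and category, for illustration only.
    found = {
        "CW": {"Design": {"D01"}, "Labeling": {"L01", "L02"}},
        "HE": {"Design": {"D01", "D02"}, "Labeling": {"L02"}},
        "TA": {"Design": {"D02", "D03"}, "Labeling": {"L01", "L03"}},
    }

    unique_per_category = defaultdict(set)
    for method, categories in found.items():
        for category, problems in categories.items():
            unique_per_category[category] |= problems  # set union removes duplicates

    for category, problems in sorted(unique_per_category.items()):
        per_method = {m: len(found[m].get(category, set())) for m in found}
        print(category, per_method, "unique:", len(problems))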

3.2 Task completion success rate

In total, 18 participants took part in the think aloud evaluation. The completion rate per task and per age category is shown in figure 2. A task was counted as not completed when crucial help was needed, because otherwise the participant could not continue. Since all participants were using a test device, help with switching between apps and help with finding keys on the keyboard were not counted; this could be different on their own device. Some participants also ran into problems while trying to install the app: the Play Store stated that the app was already installed and that no device was compatible, and help was needed to solve this. Therefore, the installation task was considered completed when the participant had completed the task on their own up to the point where the button read 'Installed' instead of 'Install'. The completion rate was counted per age category.

Figure 2: Completion rate per task and per age category

Completion rate (%) | Task 1 | Task 2 | Task 3 | Task 4 | Task 5 | Task 6
Age < 30            | 100,0  | 100,0  |  66,7  | 100,0  | 100,0  | 100,0
Age 30-50           |  83,3  | 100,0  |  50,0  | 100,0  | 100,0  | 100,0
Age 50 >            |  83,3  |  66,7  |  33,3  |  66,7  |  83,3  |  83,3

3.2.1 Participants younger than 30 years

All participants younger than 30 years old completed task 1, task 2, task 4, task 5 and task 6. Four of the six (66,7%) completed task 3 without any necessary help. Two participants needed help because they were looking in the wrong mail and therefore could not find the personal study code.

3.2.2 Participants between 30 and 50 years

All participants between 30 and 50 years old completed task 2, task 4, task 5 and task 6. Five of the six (83,3%) completed task 1; one participant had difficulties installing the app because he or she was not aware of the search functionality in the Play Store. Three of the six (50,0%) completed task 3 without any necessary help. Of the three participants who could not complete task 3 without help, two were looking in the wrong mail and therefore could not find the personal study code. The third thought he or she should fill in the password used for creating an account and also tried to download the app again, since this was mentioned in the text below the personal study code in the mail.

3.2.3 Participants aged 50 years or older

Participants aged 50 years or older had the most difficulties during the tasks; no task reached a 100% completion rate in this age group. Five of the six (83,3%) completed task 1. One participant needed help in the Play Store: this participant searched for Improve but scrolled too far down the result list and had therefore already passed the app. Four of the six (66,7%) completed task 2 without any help. The other two participants needed help because they were not aware that they had to repeat the password, since the field was hidden behind the keyboard and no scrollbar was available. Task 3 had a completion rate of only 33,3%. One participant thought he or she should fill in the password used for creating an account. Another participant thought he or she should come up with a password of their own, just as when creating an account. One participant opened the mail with the study code, but thought he or she had to follow all the steps mentioned in the text below the personal study code again.
One participant did not even succeed in finding the button to sign up for a study. Four of the six (66,7%) completed task 4. The other two participants needed help because they had tried so many options that they no longer knew where to press. Five of the six (83,3%) completed task 5. One participant did not succeed in completing task 5 without help, because he or she tried to enter the personal study code as their password. Task 6 had a completion rate of 83,3%. One participant did not notice that a study was already available and therefore tried the personal study code again.
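The completion rates in figure 2 are simply the share of participants in each age group who finished a task without crucial help. A minimal sketch of the calculation in Python, with hypothetical pass/fail records rather than the study's raw observations:

    from collections import defaultdict

    # Hypothetical records: (age_group, task, completed_without_crucial_help)
    records = [
        ("Age < 30", "Task 3", True), ("Age < 30", "Task 3", True), ("Age < 30", "Task 3", False),
        ("Age 30-50", "Task 3", True), ("Age 30-50", "Task 3", False),
        ("Age 50 >", "Task 3", True), ("Age 50 >", "Task 3", False),
    ]

    attempts, completed = defaultdict(int), defaultdict(int)
    for group, task, ok in records:
        attempts[(group, task)] += 1
        completed[(group, task)] += ok

    for key in sorted(attempts):
        rate = 100.0 * completed[key] / attempts[key]
        print(f"{key[0]}, {key[1]}: {rate:.1f}% completion")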

3.3 Task ratings

In task 6, participants were asked to rate the first 5 tasks; task 6 itself was therefore not rated. The 10-point rating scale ranged from 1 (very difficult) to 10 (very easy). The average per task and per age category is shown in table 3 and figure 3. It is clearly visible that the rating decreases as age increases. Task 4 (logging out) and task 1 (installing the app) have the lowest ratings in all age categories, followed by task 3 (signing up for a study). The overall ratings are quite high compared to the number of errors made and the completion rate. One of the reasons could be that the rating slider was set to 10 (very easy) by default instead of a more neutral position, which may have steered participants in a more positive direction.

Table 3: Rating per task (scale 1-10)

Rating | Age < 30        | Age 30-50       | Age 50 >
       | Avg | Min | Max | Avg | Min | Max | Avg | Min | Max
Task 1 | 7,8 |  4  | 10  | 7,0 |  3  |  9  | 5,3 |  2  |  7
Task 2 | 9,3 |  7  | 10  | 8,7 |  7  | 10  | 6,5 |  5  |  8
Task 3 | 8,8 |  7  | 10  | 7,7 |  4  | 10  | 6,2 |  4  |  9
Task 4 | 7,2 |  6  |  8  | 6,8 |  4  |  9  | 6,0 |  5  |  7
Task 5 | 8,8 |  7  | 10  | 8,7 |  5  | 10  | 7,3 |  5  | 10

Figure 3: Task rating per age category (average rating over tasks 1-5: 8,4 for age < 30; 7,8 for age 30-50; 6,3 for age 50 >)

3.4 Time on Task

ADV Screen Recorder was used to record the time on task for each participant. Some tasks were inherently more difficult to complete than others, which was reflected in the median time on task. A major difference existed between using a smartphone and using a tablet during task 1: the time needed to install the app was considerably (minutes) longer on the phone than on the tablet, but this was not caused by the participant. Therefore, the time for task 1 was measured from the start of the task until the participant pressed the 'Install' button. For tasks 2 to 5 the total time was measured from start to end. The time required to fill in the questionnaire depended heavily on whether the participant explained why he or she had chosen a certain rating; therefore, the time on task for task 6 was measured from the start of the task until the participant tapped the button to start the questionnaire. The median instead of the average was calculated for each age group per task to deal with outliers, which mainly occurred in the age group of 50 years and older. Overall, each task required more time as the participants' age increased. This is shown in table 4 and figure 4. Major differences are visible in the time to complete task 3, signing up for a study using the personal study code from the mail. Participants younger than 30 years had a median completion time of 71,5 seconds (about 1.2 minutes), ranging from 30 to 130 seconds. Participants aged between 30 and 50 years had a median completion time of 155 seconds (about 2.6 minutes), ranging from 45 to 340 seconds. Participants aged 50 years and older had a median completion time of 201 seconds (about 3.4 minutes), ranging from 113 to 375 seconds. The lowest completion times were achieved by participants who copied the code directly from the email, even before installing the app. The highest completion times were seen for participants who had difficulties switching between the apps or had to write down the code.
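Medians rather than averages were used to summarise the times so that outliers, which mainly occurred in the oldest group, do not dominate the figures. A minimal sketch in Python of that choice (the timings below are made up for illustration and are not the study's measurements):

    from statistics import mean, median

    # Hypothetical task timings in seconds for one age group, with one outlier.
    timings = [45, 80, 95, 110, 120, 400]

    print("mean  :", round(mean(timings), 1))  # pulled upwards by the 400 s outlier
    print("median:", median(timings))          # closer to a typical participant's time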

Table 4: Time on Task per age category

Time (s) | Age < 30           | Age 30-50          | Age 50 >
         | Median | Min | Max | Median | Min | Max | Median | Min | Max
Task 1   |   56   |  33 | 130 |   66   |  31 | 142 |   76   |  45 | 103
Task 2   |   76   |  50 | 100 |   86   |  68 | 133 |  146   | 113 | 260
Task 3   |   72   |  30 | 130 |  155   |  45 | 340 |  201   | 113 | 375
Task 4   |   20   |  11 |  34 |   30   |  27 |  89 |   87   |  44 | 130
Task 5   |   45   |  17 |  73 |   77   |  43 | 219 |   91   |  68 | 256
Task 6   |   11   |  10 |  30 |   15   |   9 |  24 |   28   |  11 | 190

Figure 4: Time on Task (TOT) per task and age category (see table 4)

3.5 Number of errors

The author of this report recorded the number of errors participants made while they tried to complete the tasks. Errors include critical and non-critical errors. Situations in which a participant needed help during a task were counted as critical errors: help was needed because the participant deviated from the target of the task scenario, or because he or she got stuck and asked for help. Non-critical errors were errors that the participant recovered from, or that did not result in processing problems or unexpected results (for example, hitting the wrong button). In figure 5, the number of errors per task and per age category is displayed. More errors occurred with increasing age. In all age categories most errors were made in task 3 and task 4. Task 3 was about signing up for a study with the personal study code from the mail; a total of 22 errors were made by all participants during this task. In task 4 the participant was asked to log out and close the app; a total of 29 errors were made by all participants during this task. The fewest errors were made in task 1 (installing the app), task 5 (open the app and log in) and task 6 (answer a study).

Figure 5: Number of errors per task and per age category

Errors (n) | Task 1 | Task 2 | Task 3 | Task 4 | Task 5 | Task 6
Age < 30   |   0    |   1    |   4    |   3    |   1    |   1
Age 30-50  |   1    |   2    |   8    |   9    |   1    |   0
Age 50 >   |   1    |   7    |  10    |  17    |   4    |   4

3.6 Usability problems

The usability problems found by all three evaluation methods were combined into one list. Problems were coded based on where the problem occurred when performing the tasks; the coding scheme is identical to the action sequences used in the cognitive walkthrough and can be found in appendix A.1. Each usability problem was categorized and prioritized by means of a severity rating.

3.6.1 Categorizing usability problems

All usability problems found by the different evaluation methods were categorized into 6 simple categories. This helps Open HealthHub understand more easily how to solve the issues and gives insight into the type of usability problem. The categories were:
1. Design issues: confusion caused by color, placement or layout of a page.
2. Content issues: information or feedback is missing or not written clearly.
3. Labeling issues: it is not clear what will happen when a link is clicked.
4. Functionality issues: the app does not function as the user expects.
5. Wayfinding issues: it is difficult to find or navigate to the desired location.
6. Other issues: all other issues that cannot be categorized.

3.6.2 Severity level usability problems

When a usability problem was found in the think aloud method, the frequency of occurrence was noted. The frequency can be high, moderate or low:
1. High: 6/18 to 18/18 of the participants experienced the problem.
2. Moderate: 3/18 to 5/18 of the participants experienced the problem.
3. Low: 1/18 to 2/18 of the participants experienced the problem.
Each usability problem was also prioritized into one of three categories: high, medium or low severity. It is most important to focus first on the problems that occurred with high or moderate frequency; the severity rating therefore depends heavily on the extent to which the problem occurred in the think aloud evaluation.
1. High severity: problems that occurred frequently during the think aloud evaluation; problems that prevent wayfinding in the app; labels or error messages that might not be understood by the end-user.
2. Medium severity: problems that occurred with moderate or low frequency during the think aloud evaluation; problems that cause a significant delay.
3. Low severity: problems that did not occur during the think aloud evaluation; problems that only need small design changes.
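The frequency and severity rules above translate into a small decision procedure. A minimal sketch in Python (the function names and boolean flags are illustrative, not taken from the report):

    def frequency_level(occurrences: int) -> str:
        """Frequency of a problem in the 18 think aloud sessions (section 3.6.2)."""
        if occurrences >= 6:
            return "high"
        if occurrences >= 3:
            return "moderate"
        if occurrences >= 1:
            return "low"
        return "not observed"  # found only by the inspection methods

    def severity_level(occurrences: int, prevents_wayfinding: bool = False,
                       unclear_label_or_error: bool = False) -> str:
        """Rough severity rating following the rules in section 3.6.2."""
        freq = frequency_level(occurrences)
        if freq == "high" or prevents_wayfinding or unclear_label_or_error:
            return "high"
        if freq in ("moderate", "low"):
            return "medium"
        return "low"  # not seen in think aloud; small design change suffices

    print(severity_level(7))  # high: occurred frequently in the think aloud sessions
    print(severity_level(2))  # medium: low frequency of occurrence
    print(severity_level(0))  # low: inspection-only finding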

5 Discussion and conclusion

The goal of the evaluation of the sign up process of the Improve app was to identify design inconsistencies and usability problem areas within the user interface and content areas that prevent intuitive use and navigation of the Improve app. In total, 59 usability problems were found by performing three evaluation methods: a cognitive walkthrough, a heuristic evaluation, and a think aloud session with 18 participants. A total of 22 design issues, 11 content issues, 9 functionality issues, 8 wayfinding issues, 6 labeling issues, and 3 other issues were identified. Most errors were made during the sign up process for the app, signing up for a study, and logging out. Fewer errors were made during the installation of the app, logging in and answering a study. Recommendations were given based on the most severe and most frequently occurring usability problems. These recommendations were visualized as mock-ups made with the InVision prototyping software.

However, the results from the think aloud sessions are biased. First of all, participants were not randomly selected but were invited to participate by email; only people who knew the author were asked. The results are also biased because of self-selection: everyone who was willing to participate was included in the test. As a result, the sample may not be representative of the whole population, which may affect the results. For example, almost all participants aged 50 years and older had a low educational level and all of them were beginners with smartphones and tablets, whereas participants younger than 30 and those between 30 and 50 mostly had higher educational levels and were advanced or expert smartphone users. It is therefore not surprising that participants aged 50 years and older had, on average, a higher completion time, a lower completion rate and more errors; the results could be different if more advanced or expert smartphone users had been included in this age group. Nevertheless, this seemed the best approach given the limited time available in this internship. Overall, a reasonably representative user group exists when all 18 participants are considered as one group: participants had different ages, were both female and male, had different educational levels, had different levels of experience with smartphones and tablets, and did not all live in one province. It may therefore be assumed that representative usability problems were found during the think aloud evaluation.

The next step for Open HealthHub is to review the recommendations and to redesign the user interface in an attempt to improve the usability of the Improve app. Removing the need for a personal study code is the most important recommendation; it will solve many high and medium severity usability problems. New tests with real users should be performed after the redesign to find out whether the usability problems have been solved.

References

[1] Stone, D., Jarrett, C., Woodroffe, M., and Minocha, S. User Interface Design and Evaluation. Morgan Kaufmann Series in Interactive Technologies. San Francisco: Morgan Kaufmann / Elsevier, 2005.
[2] Nielsen, J., and Landauer, T.K. A mathematical model of the finding of usability problems. Proceedings of ACM INTERCHI'93 Conference (Amsterdam, The Netherlands, 24-29 April 1993), pp. 206-213.
[3] Nielsen, J. How to Conduct a Heuristic Evaluation, https://www.nngroup.com/articles/how-to-conduct-a-heuristic-evaluation/, 1995. Accessed on: 06-06-2016.