The Whyline. An Interrogative Debugging Interface for Asking Questions About Program Behavior. Andrew J. Ko and Brad A. Myers

Similar documents
Using Functions in Alice

Debugging Reinvented: Asking and Answering Why and Why Not Questions about Program Behavior

AO3. 1. Load Flash. 2. Under Create New click on Flash document a blank screen should appear:

CSCI 1100L: Topics in Computing Lab Lab 11: Programming with Scratch

Midterm Exam, October 24th, 2000 Tuesday, October 24th, Human-Computer Interaction IT 113, 2 credits First trimester, both modules 2000/2001

Getting Started with Java Using Alice. 1 Copyright 2013, Oracle and/or its affiliates. All rights reserved.

Lecture 1: Overview

Asking and Answering Why and Why Not Questions about Program Behavior. Andrew Ko Brad Myers


Lab 4: Imperative & Debugging 12:00 PM, Feb 14, 2018

Introduction to Programming

If Statements, For Loops, Functions

Hello World! Computer Programming for Kids and Other Beginners. Chapter 1. by Warren Sande and Carter Sande. Copyright 2009 Manning Publications

Getting Started with Java Using Alice. 1 Copyright 2013, Oracle and/or its affiliates. All rights reserved.

BONE CONTROLLER ASSET VERSION 0.1 REV 1

THE SNAP-ON APOLLO D8 GUIDES YOU TO ANSWERS. WHICH LEADS TO SOMETHING ELSE: CONFIDENCE

Laboratory 1: Eclipse and Karel the Robot

Blackfin Online Learning & Development

Powered by. How did trying to give apples away for free change the world?

Reliable programming

Midterm Exam Solutions March 7, 2001 CS162 Operating Systems

9 R1 Get another piece of paper. We re going to have fun keeping track of (inaudible). Um How much time do you have? Are you getting tired?

Lecture 9: July 14, How to Think About Debugging

Heuristic Evaluation of igetyou

The first thing we ll need is some numbers. I m going to use the set of times and drug concentration levels in a patient s bloodstream given below.

Text Input and Conditionals

Julie Rand LIS Fall Usability Study

Taskbar: Working with Several Windows at Once

Using Tab Stops in Microsoft Word

EXAMINING THE CODE. 1. Examining the Design and Code 2. Formal Review: 3. Coding Standards and Guidelines: 4. Generic Code Review Checklist:

One of the fundamental kinds of websites that SharePoint 2010 allows

Fractions and their Equivalent Forms

Post Experiment Interview Questions

Headshots in Alice. Duke University Professor Susan H. Rodger Gaetjens Lezin July 2008 Updates made June 2014 by Yossra Hamid

StoryStylus Scripting Help

CISC-124. Casting. // this would fail because we can t assign a double value to an int // variable

June Usability Test Results of the Perform Application

This project was originally conceived as a pocket database application for a mobile platform, allowing a

5 R1 The one green in the same place so either of these could be green.

Usable Privacy and Security Introduction to HCI Methods January 19, 2006 Jason Hong Notes By: Kami Vaniea

Installing and Using Trackside Cameras Revised November 2008

Usability Test Report: Requesting Library Material 1

COMP1730/COMP6730 Programming for Scientists. Testing and Debugging.

Project Report Number Plate Recognition

Programming Robots with ROS, Morgan Quigley, Brian Gerkey & William D. Smart

CS 4349 Lecture October 18th, 2017

CS 160: Evaluation. Outline. Outline. Iterative Design. Preparing for a User Test. User Test

CS 160: Evaluation. Professor John Canny Spring /15/2006 1

Spam. Time: five years from now Place: England

(I m not printing out these notes! Take your own.)

Outline. Usability Testing CS 239 Experimental Methodologies for System Software Peter Reiher May 29, What Do We Mean By Usability?

Parcel QA/QC: Video Script. 1. Introduction 1

Telling a Story Visually. Copyright 2012, Oracle. All rights reserved.

Scene 2, Flash. STEP THREE: Same with the mouths, if you have it touching the ears or something, move it so there is clearance.

Getting Started. Excerpted from Hello World! Computer Programming for Kids and Other Beginners

Clickbank Domination Presents. A case study by Devin Zander. A look into how absolutely easy internet marketing is. Money Mindset Page 1

Chapter01.fm Page 1 Monday, August 23, :52 PM. Part I of Change. The Mechanics. of Change

Testing is a very big and important topic when it comes to software development. Testing has a number of aspects that need to be considered.

SPRITES Moving Two At the Same Using Game State

Git. all meaningful operations can be expressed in terms of the rebase command. -Linus Torvalds, 2015

Yup, left blank on purpose. You can use it to draw whatever you want :-)

Excel Basics Rice Digital Media Commons Guide Written for Microsoft Excel 2010 Windows Edition by Eric Miller

MITOCW watch?v=flgjisf3l78

Contents. How to use Magic Ink... p Creating Magic Revealers (with Magic Ink)... p Basic Containers... p. 7-11

1.7 Limit of a Function

Ryan Parsons Chad Price Jia Reese Alex Vassallo

EMF Temporality. Jean-Claude Coté Éric Ladouceur

COMP390 (Design &) Implementation

Approximate Q-Learning 3/23/18

PHY Microprocessor Interfacing Techniques LabVIEW Tutorial - Part I Beginning at the Beginning

Introduction to User Stories. CSCI 5828: Foundations of Software Engineering Lecture 05 09/09/2014

Lesson 13 Transcript: User-Defined Functions

5.11 Parallelism and Memory Hierarchy: Redundant Arrays of Inexpensive Disks 485.e1

BDD and Testing. User requirements and testing are tightly coupled

CS 520 Theory and Practice of Software Engineering Fall 2018

MD-HQ Utilizes Atlantic.Net s Private Cloud Solutions to Realize Tremendous Growth

Testing. So let s start at the beginning, shall we

Reporting and analyzing bugs

Programming in the Real World. Dr. Baldassano Yu s Elite Education

CS 6371: Advanced Programming Languages

Princess & Dragon Part 4: Breathing Fire Adding Effects to Alice

Software Quality Assurance. David Janzen

6.001 Notes: Section 8.1

CMSC201 Computer Science I for Majors

Program development plan

Skater World: Part Three


The Essentials of Alice (Bunny) By Jenna Hayes under the direction of Professor Susan Rodger Duke University July 2008

CMSC 201 Fall 2016 Lab 09 Advanced Debugging

Creating Breakout - Part 2

Rethinking Usability for Responsive Web Design

Hyper Mesh Code analyzer

CS 220: Introduction to Parallel Computing. Arrays. Lecture 4

Interactive Tourist Map

How to approach a computational problem

CSCI 1100L: Topics in Computing Lab Lab 1: Introduction to the Lab! Part I

The name of our class will be Yo. Type that in where it says Class Name. Don t hit the OK button yet.

Outline More Security Protocols CS 239 Computer Security February 6, 2006

1 Build Your First App. The way to get started is to quit talking and begin doing. Walt Disney

Lab 7 Unit testing and debugging

Transcription:

The Whyline An Interrogative Debugging Interface for Asking Questions About Program Behavior Andrew J. Ko and Brad A. Myers Project Marmalade Human-Computer Interaction Institute Carnegie Mellon University My name is Andrew Ko, and today I ll be talking about some work that I ve done on debugging as part of project Marmalade. 4/15/04-20 minutes 58 seconds 4/19/04-17 minutes 50 seconds 4/20/04-17 minutes 51 seconds

Debugging Interrogative Debugging A new approach to debugging tools for programmers of all expertise. The Whyline, an end-user programming interrogative debugging prototype. What is Interrogative Debugging? What is the Whyline? - Demonstration - User Study Conclusions 2

Debugging Debugging Among all programming activities, debugging is still the most common and costly. - Software engineers spend 70-80% of their time testing and debugging, with the average bug taking 17.4 hours to fix. - Aspiring end-user programmers often abandon programming systems because of difficulty with debugging. But we ve had the same debugging tools for over 30 years... - Print statements are still the best. - Several research prototypes have been proposed, but few have been shown to reduce debugging time or effort. 3 Why is debugging so hard? And why have so many tools proven ineffective?

Debugging Understanding Debugging We asked 17 expert, novice, and non-programmers to think aloud while programming. We found several striking trends: - Programmers asked explicit questions about their program s failures, immediately after failure. - All were in the form why did (32%) or why didn t (68%). - Questions were always in terms of visual output, and never in terms of code and data. i.e., Why didn t my object disappear? 4

Debugging Assumptions Behind every question were implicit assumptions about what did or did not happen at runtime. Almost 80% of questions made false assumptions: Q: Why didn t that disappear? A: It did disappear, but then immediately reappeared. 50% of all of programming errors were due to decisions programmers had made because of false assumptions. 5 Even more striking... These observations led to approach to debugging interfaces, which we call...

Debugging Interrogative Debugging 1. Allow why did and why didn t questions about a program s behavior. 2. Reveal false assumptions about what did and did not occur at runtime. 3. Give answers in terms of the code and data directly responsible for the behavior in question. 6...Interrogative Debugging. It has three parts. Will this work? To find out...

The Whyline The Whyline An Interrogative Debugging interface for Alice. A Workspace that Helps You Link Instructions and Numbers to Events 7 We prototyped an Interrogative Debugging tool called the Whyline, in the Alice 3D programming system... Let me briefly describe the Alice programming system.

The Whyline Alice A 3D animation end-user programming system. Event-based, imperative, and concurrent. Drag and drop interaction prevents all type and syntax errors. A free gift from Randy Pausch, et al. at Carnegie Mellon s Entertainment Technology Center. http://www.alice.org 8

The Whyline What is the Whyline? Question Menu { Textual Answer { { Answer Visualization 9

The Whyline What is the Whyline? Question Menu { Textual Answer { { Answer Visualization 9

The Whyline The Question Menu Contains why did and why didn t questions about a program s behavior. Questions are organized by the object they relate to......then by animation and variable. 10

The Whyline What is the Whyline? The textual answer reveals assumptions and summarizes what did or didn t happen at runtime. 11

The Whyline What is the Whyline? The answer visualization is an interactive timeline. Shows data and control flow directly related to a question. Time { { { Data flow Control flow Time cursor 12

Demonstration Demonstration We ll recreate three scenarios directly from user studies. In our studies, programmers created a simple Pac-Man game. Pac tries to eat the dots and the Ghost tries to eat Pac. Pac Big Dot Ghost We ll watch a programmer: 1) Make Pac move. 2) Make Pac shrink if eaten by the Ghost. Small Dots 3) Make Pac invincible if he eats the Big Dot. 13

Demonstration Make Pac Move Make Pac move forward 3 meters/second, as seen below. 14

The programmer creates the movepac() method. 15

The programmer creates a move forward animation and tests. 16 Before we see the use of the Whyline, how have users in prior studies reacted to this failure?

Demonstration Without the Whyline Programmers wondered, why didn t Pac move forward? - Is there something wrong with my move command? - Maybe forward doesn t mean what I think? - Was I supposed to save before I played the world? Experts, novices and non-programmers spent 2-3 minutes diagnosing this failure. Every programmer made this type of error at least twice in our observations. 17 With the Whyline, however...

Why didn t Pac move forward 3? 18 The code a question refers to becomes highlighted.

The programmer clicks the question and the answer appears. 19

The programmer then inspects the answer. 20

The programmer makes an event that calls movepac(). 21

Demonstration Invariant Answers An example of an invariant answer. Pac would never move because the programmer forgot to call movepac(). The Whyline revealed this problem, directly diagnosing the failure. 2-3 minute task 10 second task! Reduced by a factor of 15! 22

Demonstration Make the Ghost Eat Pac If the Ghost touches Pac, make Pac shrink. 23

The programmer wraps a DoTogether around move. 24 The programmer wants to move pac and test the collision at the same time. He does this with a do together, which executes statements concurrently.

He adds an If-Else, checking if Ghost is within 2 meters of Pac. 25

The programmer adds a Pac resize 0 animation and tests. 26 We want Pac to shrink, so we ll resize him to 0 if he touches the ghost.

Demonstration Without the Whyline Programmers wondered, why didn t Pac resize? - Maybe is within threshold doesn t do what I thought... - Maybe I shouldn t have used the resize animation... - Maybe I wasn t supposed to put that in the DoTogether... Experts, novices and non-programmers alike spent 4-7 minutes diagnosing this failure. 27

Why didn t Pac resize 0? 28 Again, the code in question becomes selected as the programmer hovers over the questions.

The answer visualization shows what caused the resize. 29 So the resize did happen...

The time cursor moves backward and forward through the execution history, changing the output window. He scrubs the history, verifying that resize 0 had no effect. 30 The view of the output window changes while

The programmer changes resize 0 to resize.5, and tests. 31

Demonstration False Proposition Answers An example of a false proposition answer. The programmer made the reasonable assumption that resize 0 would have a visible effect. When it didn t, he assumed that it did not occur. The Whyline revealed this assumption, allowing him to isolate the problem to the resize animation. 4-7 minute task 30 second task! Reduced by a factor of 11! 32

Demonstration Make Pac Invincible If the Big Dot has been eaten, prevent Pac from shrinking. 33

The programmer makes Big Dot disappear if Pac touches it.34

Pac should only resize when the Big Dot is visible. 35

Demonstration Without the Whyline Programmers wondered, why did Pac resize? - I have no idea what just happened... - Maybe my If-Else is wrong... - Maybe Pac didn t get to the dot in time... - Maybe isshowing doesn t do what I think... Experts, novices and non-programmers alike spent 4-9 minutes diagnosing this failure. 36 Don t bother reading these.

Why did Pac resize.5? 37

Big Dot.isShowing was true when the Ghost touched Pac. 38

Why didn t Big Dot.isShowing change to false? 39 Asks a question in context, which adds the answer to the Whyline. You can see that the condition was tested before the value was changed.

The two conditionals were interleaved. 40

The Big Dot check should happen before the Ghost check. 41 Because the conditionals were in the same do together, they became interleaved. He fixes it by dragging the Big Dot check outside the do together.

Demonstration Control & Dataflow Answers An example of a control and dataflow answer. The programmer assumed that the Big Dot check would finish executing before the Ghost check. The Whyline revealed this assumption, illustrating that the two conditionals were interleaved. 4-9 minute task 30 second task! Reduced by a factor of 13! 42

Implementation Whyline Implementation The Whyline records each execution of code and each assignment and use of data. 43 We use this information to construct the question menu...

Implementation The Question Menu A question for each unique execution of an output statement. Does not contain what didn t happen. - Programmers rarely assumed something did happen when it actually didn t. A question for each output statement that could have executed... - Only need to include what programmers expect to happen (what they wrote). A question for everything that did happen, allowing for false propositions. Infinite number of things that could have happened, but you only have to include ones that programmers would expect (they expect what they wrote). Why did doesn t contain what didn t happen for false propositions, because these were never asked in user tests. To answer the questions... 44

Implementation The Question Menu A question for each unique execution of an output statement. Does not contain what didn t happen. - Programmers rarely assumed something did happen when it actually didn t. A question for each output statement that could have executed... - Only need to include what programmers expect to happen (what they wrote). A question for everything that did happen, allowing for false propositions. Infinite number of things that could have happened, but you only have to include ones that programmers would expect (they expect what they wrote). Why did doesn t contain what didn t happen for false propositions, because these were never asked in user tests. To answer the questions... 44

Implementation Answering Questions Why Did? Why Didn't? Did the statement execute? Yes False proposition answer No Invariant answer Yes Does it always execute? Could it ever execute? No Invariant answer No Yes Control and dataflow answer A control and dataflow answer for each unique execution of the conditional that prevented it. 45 We do answer invariant. Now let s briefly discuss our user studies of the Whyline.

User Study User Study versus + Whyline Expert, novice, and non-programmers were given 90 minutes to implement the Pac-Man game. - The game required programmers to implement 6 distinct behaviors. Programmers were asked to think aloud, and were videotaped over the shoulder. 46 Participants were all novice to Alice, and expressed interest in learning to program 3D simulations. We saw 3 simplified versions of these behaviors in the demo.

User Study How Often Was It Useful? Programmers used the Whyline for all 24 of their debugging questions.???????????????????????????????????????????????? 19 of 24 answers (80%) led directly to valid repairs. - In each, programmers had made false assumptions about their program s pattern of execution. 5 of 24 answers failed to help programmers with their debugging task. - In each, programmers already knew what had occurred, but didn t know why it was wrong. - All concerned complex Boolean expressions. 47 Say 80% and 20% here. So we re thinking about how to address this with other stuff

User Study Debugging Time 6 5 4 3 2 1 0 Debugging Time (minutes) 6 equivalent debugging scenarios No Whyline Whyline Reduced debugging time by an average factor of 7.8 (p <.02). 48 In terms of task completion,...

User Study Task Completion The Whyline prevented further errors by preventing programmers from editing code that was not at fault. As a result, the group of programmers with the Whyline correctly completed 40% more tasks (p <.02) 4 No Whyline Whyline Average # of behaviors correctly implemented. 3 2 1 0 49

Conclusions Discussion Interrogative Debugging was highly effective. When programmers made false assumptions about what occurred at runtime (80%), the Whyline revealed them. - Programmers were able to use the Whyline s answer to quickly and successfully correct their code. Even when programmers knew what had occurred at runtime, but didn t know why it was wrong (20%), the Whyline helped them focus on the code responsible. 50

Conclusions Future Work The Whyline raises several questions about Interrogative Debugging: - Does it scale to larger, more complex programs? - Would it help more experienced programmers? - Would it help comprehend unfamiliar code? - What types of questions are important in other domains? - How should answers be visualized in other domains? - Can questions involving before and after be answered effectively? 51 Look forward to working on several of these questions for the foreseeable future.

Q & A Project Marmalade http://www.cs.cmu.edu/~natprog/marmalade.html EUSES Consortium http://eecs.oregonstate.edu/euses/ Project Marmalade is funded by NSF grant IIS-0329090 and in part by the EUSES consortium (End Users Shaping Effective Software) under NSF ITR CCR-0324770. Thank you very much to NSF and EUSES and thank you for being here. Q: Barriers to Java implementation? A: What are the relevant questions? Efficiency. Q: Does it scale? A: