Experience in Using a Process Language to Define Scientific Workflow and Generate Dataset Provenance

1 Experience in Using a Process Language to Define Scientific Workflow and Generate Dataset Provenance
Leon J. Osterweil (1), Lori A. Clarke (1), Aaron Ellison (2), Rodion Podorozhny (3), Alexander Wise (1), Emery Boose (2), and Julian Hadley (2)
(1) Lab. for Advanced SW Engineering Research (LASER), Dept. of Computer Science, Univ. of Massachusetts, Amherst, MA
(2) Harvard Forest, Harvard University, Petersham, MA
(3) Dept. of Computer Science, Texas State Univ., San Marcos, TX

4 Or: Why software engineers should be interested in scientific data and how it is processed
Scientific data management raises problems in:
  Configuration management
  Abstraction/modularity
  Exception management
  Concurrency control
  Static analysis
  And more
Addressing these problems can shed light on fundamental issues in software engineering

5 Our approach
Evaluate applying the principles of
  o Modern programming languages
  o Software engineering
  o Other CS domains
In the form of the Analytic Web concept
  o Two interrelated types of graphs
To see how this helps address the problems of scientists
  o And what it tells us about the CS domains


7 "You're not a real scientist, you're a Computer Scientist" -- A Physicist/Astronomer, to Lee

8 "Computer Scientists do real science, but it is a science of abstract, non-tangible things. And this is a science that you need our help with" -- Lee, back to the Physicist/Astronomer

9 So, what do real scientists do (in the abstract)? The Traditional Structure of Science: The Scientific Method
Hypothesis
Experimental Design
  o Define experimentation process
Experimentation
  o Execute the process to produce data
Evaluation
  o Analyze datasets, producing more datasets
Reproduction by others
  o Reapply processes to compare results and increase confidence in the conclusions

10 Focus on Datasets
Datasets
  o Their production
  o Their reproduction by others
  o Their use by consumers
      E.g., in producing subsequent datasets
Dataset Provenance
  o Document where datasets come from
  o Who, What, When, Where (typical metadata)
  o How
      What computer programs
      What cleansing and interpolation algorithms
Process Provenance Metadata

11 Real Datasets: Real Complications
Is there really any raw data?
  o Mostly recorded automatically
  o Often preprocessed by computers
Considerable cleaning of datasets
  o Some by humans
  o Some by automation
Documentation is usually scant or absent
Reproduction is hard or impossible
  o Processes are complex and hard to document
Real threat to the foundations of science
  o Reproducibility is the bedrock of science

12 Perils in Inadequate Dataset Documentation
Some datasets are wrong
  o Incorrectly cleaned
  o Inappropriate algorithms applied incorrectly
  o Incompletely processed
Statistical data is often misunderstood/misused
  o Used under assumptions that are unwarranted
Incorrect/incorrectly used data can be the basis for gravely important decisions
  o Clear cut forests
  o Release of unsafe pharmaceuticals
How to know which datasets to trust?

13 The Internet Aggravates These Problems
Scientific datasets are published
  o In the literature
  o ON THE WEB (NSF mandate)
What might be published?
  o All the data
  o Analyzed datasets
  o With all of the fudge factors?
What does get published
  o Highly processed datasets
  o With important details omitted
THE DETAILS MATTER A GREAT DEAL

14 Key Problems to be Addressed
Producers need automation help for
  o Generation
  o Distribution
  o Provenance documentation
Consumers need help for
  o Understanding
  o Reproduction
  o Regeneration
  o Reuse
These are primarily process issues

15 A Real Ecological Research Problem: The Water Budget Problem
Measure and analyze the flow of water through a small forested watershed: ds = P - ET - Q
Precipitation (P)
  o Two gauges, P1 and P2, used to minimize gaps in data
Surface Discharge (Q)
  o Measured at a stream gauge
  o Gaps caused by sensor failure, ice build-up, etc.
  o Fill as a function of preceding P and Q
Evapotranspiration (ET)
  o Measured at an eddy-flux tower
Photosynthetically active radiation (PAR)
  o PAR can be used to estimate ET
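The water budget arithmetic on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: values are per time step in a common unit, a gap is represented as None, and the rule "use P1 where present, otherwise P2" is an assumed way of combining the two gauges. The names merge_gauges and storage_change are hypothetical and do not come from the original process.

```python
from typing import List, Optional

Reading = Optional[float]  # None marks a gap in the record (assumed convention)


def merge_gauges(p1: List[Reading], p2: List[Reading]) -> List[Reading]:
    """Combine two precipitation gauges: take P1 where present, else P2 (assumed rule)."""
    return [a if a is not None else b for a, b in zip(p1, p2)]


def storage_change(p: Reading, et: Reading, q: Reading) -> Reading:
    """Compute ds = P - ET - Q for one time step; None if any term is missing."""
    if p is None or et is None or q is None:
        return None
    return p - et - q


# Made-up example values, for illustration only.
p1 = [2.0, None, 0.0]
p2 = [2.5, 4.0, None]
et = [1.0, 1.5, 1.0]
q = [0.5, 1.0, 0.5]

p = merge_gauges(p1, p2)
ds = [storage_change(pi, ei, qi) for pi, ei, qi in zip(p, et, q)]
```

Filling gaps in Q as a function of preceding P and Q would need a model that the slide does not specify, so it is left out of the sketch.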

16 An Eddy Covariance (Flux) Tower
[Figure: eddy covariance tower, with CO2 fluxes labeled for photosynthesis and soil respiration]
Flux source area = f(canopy structure, wind speed, wind direction, turbulence)

17 Key Features of the Water Budget Process
Frequent measurements taken in real time
  o Some wirelessly
  o Considerable manipulation in real time
Datasets published in real time (?)
30-day retrospective revision of dataset
  o Maybe revisions at other (more immediate?) intervals
Other revisions done at other times
  o Adjustments for sensor drift
  o Alternative model evaluation
  o Use of data from other sites (?)
Different scientists want to do different things????

18 Data Flow Graph (DFG) of Water Budget Process

19 Some key issues to be dealt with
Processes are complex
  o Intertwined iterations
  o Exceptions (not shown) interrupt nominal flow
  o Concurrency
Create profusions of artifact instances
  o How to keep track of multiple instances of a single dataset type?
  o Need an artifact structure
  o And configuration management
Still other issues
  o Mixed human and automated agents

20 DFG does not do well with some of these issues
Producers generate datasets by executing paths
  o DFG has some spurious unexecutable paths
Consumers need provenance documentation
  o How datasets were generated
  o How data items were created
  o But DFG nodes are types. There are no instances
Both need reproducibility
  o Sufficient details are needed
  o And can be hard to produce
Both need analyses
  o Which requires rigor and precision
  o As well as details

21 Our Proposed Approach: Analytic Web
Represent scientific datasets as a web structure specifying:
  o What data were used to derive each new dataset
  o The process by which each dataset was derived
Use two complementary graphs
  o Process Definition Graph (PDG) (a visual program)
      Defines process in sufficient detail to support understanding, execution, reproducibility, scrutiny
  o Data Derivation Graphs (DDGs) (program traces)
      Document how each dataset instance was actually derived
      In terms of actual process steps applied and choices made

22 The Two Graphs
Process Definition Graph (PDG)
  o Defines the process
  o Types of tasks that are applied to types of datasets
Data Derivation Graph (DDG)
  o Which instances were derived from which others
  o By which activities
PDG execution leads to the creation of the DDG
Dataset provenance metadata created from DDG nodes

23 The Process Definition Challenges
How to represent processes
  o Accurately, completely, clearly, reproducibly
Real processes are large and complex
  o Even easy ones
How to support execution of these processes
How to communicate the processes to working scientists
How to validate these processes
  o Scientific scrutiny
  o Analysis to assure that composed processes are not anomalous

24 The Little-JIL Process Language
Executable, visual process language emphasizing
  o Use of abstraction
  o Exception management
  o Concurrency control
  o Resource specification and late binding
Use it to define PDGs to determine the characteristics and features needed
Execute the PDGs to generate DDGs
  o That define dataset provenance

25 Value of Hierarchy, Scoping, and Abstraction
Process definition is a hierarchical decomposition
Think of steps as procedure invocations
  o They define scopes
  o Copy and restore argument semantics
  o Creation of scopes
Encourages use of abstraction
  o E.g., process fragment reuse
Use of recursion helps define, propagate, exploit context information to elucidate iteration

26 Elaboration of one key step
[Little-JIL diagram. Step labels: Augment Realtime Dataset, Get RT Readings, Read Precip, Read Others, Clean Data, Select Values, Apply Models, Select Best Model, Fill Gaps in Data. Other labels: Data not OK, ~Data not OK, Dataset Review OK, ~Dataset Review OK]

27 Makes use of recursion
[Little-JIL diagram. Apply Models decomposes into Apply A Model and a recursive Apply Models; Apply A Model decomposes into Get Model and Compute Model Values]
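The recursive shape of Apply Models can be illustrated with a short sketch. It is a rough analogy in Python, not Little-JIL: the step names follow the slide, while the representation of a model as a plain function over readings, and the two example models, are assumptions made only for illustration.

```python
from typing import Callable, Dict, List, Tuple

Model = Callable[[float], float]  # hypothetical stand-in for a real model


def apply_a_model(model: Model, readings: List[float]) -> List[float]:
    """'Apply A Model': the model has been obtained; compute its values."""
    return [model(r) for r in readings]


def apply_models(models: List[Tuple[str, Model]],
                 readings: List[float]) -> Dict[str, List[float]]:
    """'Apply Models': apply the first model, then recurse on the rest."""
    if not models:
        return {}  # base case: no models left to apply
    (name, model), rest = models[0], models[1:]
    results = {name: apply_a_model(model, readings)}
    results.update(apply_models(rest, readings))
    return results


# Two made-up models, for illustration only.
models = [("Model 1", lambda p: p * 2.0), ("Model 2", lambda p: p + 1.0)]
results = apply_models(models, [1.0, 3.0])
```

Each recursive call sees only the models still to be applied, which is one way the recursion carries context from one iteration to the next.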

28 Use of different types of agents too
[Little-JIL diagram. Same steps as slide 27 (Apply Models, Apply A Model, Get Model, Compute Model Values), with the annotation "Agent is scientist"]

29 The whole process

30 Elaborate this corner

31 Need for complex exception handling
[Little-JIL diagram. Step labels: Read Precip, Handle No Precip, Read Precip 1, Read Precip 2, Choose Source, Get Airport, Put Null. Exception label: Sensor Down]
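The fallback behaviour suggested by this slide's labels (retry the sensor, then get a value from the airport, then record a null) can be sketched as ordinary exception handling. This is an assumed reading of the diagram, written in Python rather than Little-JIL; the function read_precip, its parameters, and the source tags it returns are hypothetical.

```python
from typing import Callable, Optional, Tuple


class SensorDown(Exception):
    """Raised when a source cannot be read (name borrowed from the slide's label)."""


def read_precip(read_sensor: Callable[[], float],
                read_airport: Callable[[], float],
                retries: int = 1) -> Tuple[Optional[float], str]:
    """Return (value, source), trying the sensor, a retry, the airport, then null."""
    for attempt in range(1 + retries):
        try:
            value = read_sensor()
        except SensorDown:
            continue  # sensor failed; try again if attempts remain
        source = "sensor" if attempt == 0 else "sensor-retry"
        return value, source
    try:
        return read_airport(), "airport"  # alternative source
    except SensorDown:
        return None, "null"  # corresponds to the "Put Null" step


def broken() -> float:
    raise SensorDown("gauge offline")


# Made-up airport value, for illustration only.
value, source = read_precip(broken, lambda: 3.2)
```

Returning the source alongside the value keeps a record of which path was actually taken, which is the kind of detail a provenance record needs.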

32 The whole process

33 Showing process module reuse

34 More process module reuse

35 The DDG
A DAG of instances
  o Dataset instances
  o Process step instances (and their agents)
Edges indicate which datasets are
  o Derived from executing which step instance
  o Using which dataset instances as inputs
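One possible in-memory shape for such a DAG is sketched below. It is a minimal illustration, not the authors' implementation: the class layout, the record method, and the identifier scheme (a counter appended to the step name) are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class DDG:
    """Dataset instances and step instances, linked by derivation edges."""
    data: Dict[str, object] = field(default_factory=dict)       # dataset instance id -> value
    steps: Dict[str, str] = field(default_factory=dict)         # step instance id -> agent
    edges: List[Tuple[str, str]] = field(default_factory=list)  # (from node, to node)
    _count: int = 0                                              # used to make instance ids unique

    def record(self, step: str, agent: str, inputs: List[str],
               output_name: str, value: object) -> str:
        """Record one step execution and return the id of the dataset it produced."""
        self._count += 1
        step_id = f"{step}#{self._count}"
        data_id = f"{output_name}@{step_id}"
        self.steps[step_id] = agent
        self.edges += [(i, step_id) for i in inputs]  # input datasets -> step instance
        self.edges.append((step_id, data_id))         # step instance -> output dataset
        self.data[data_id] = value
        return data_id


# Made-up readings, for illustration only.
ddg = DDG()
p1 = ddg.record("Read Precip 1", "sensor", [], "P1", 2.0)
p2 = ddg.record("Read Precip 2", "sensor", [], "P2", 2.5)
p = ddg.record("Read Precip", "automated", [p1, p2], "P", 2.0)
```

Because every call creates fresh instance ids, repeated executions of the same step type produce distinct nodes, which is the distinction between types and instances that the slides draw.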

36 Different executions lead to different DDGs
[Little-JIL diagram, as on slide 31. Step labels: Read Precip, Handle No Precip, Read Precip 1, Read Precip 2, Choose Source, Get Airport, Put Null. Exception label: Sensor Down]

37 DDG if both sensors are read successfully
[DDG diagram. Node labels: Read Precip 1, Read Precip 2, P1, P2, Read Precip, Get RT Readings]

38 DDG if Read of Precip 1 fails twice, then gets value from alternative source (airport)
[DDG diagram. Node labels include: Read Precip 1 (twice), Read Precip 2, Get Airport, Handle No Precip, Read Precip, Get RT Readings, P1, P2]

39 DDG showing consideration of two alternative models
[DDG diagram. Node labels: P1, P2, Model 1, Model 2, Apply A Model (twice), P from Model 1, P from Model 2, Add to Dataset, P]

40 Full DDG: Two Models; Both readings succeeded
[DDG diagram. Node labels: Read Precip 1, Read Precip 2, Read Precip, Get RT Readings, Apply Models, Apply A Model (twice), P1, P2, Model 1, Model 2]

41 DDG with two failed Precip readings, then airport read
[DDG diagram. Node labels include: Read Precip 1 (twice), Read Precip 2, Get Airport, Handle No Precip, Read Precip, Get RT Readings, Apply Models, Apply A Model (twice), P1, P2, Model 1, Model 2]

42 Eliding for Clarity (?)
[DDG diagram. Node labels include: Read Precip 1 (twice), Read Precip 2, Get Airport, Handle No Precip, Read Precip, "Get RT Readings: Apply Models", Apply A Model (twice), P1, P2, Model 1, Model 2]

43 A Further DDG
[DDG diagram. Node labels: Read Precip 1, Read Precip 2, "Read Precip; Get RT Readings; Apply Models", Apply A Model (twice), Add to Dataset, P1, P2, Model 1, Model 2, P from Model 1, P from Model 2, P]

44 DDG for Selection from among three Models
[DDG diagram. Node labels include: Read PModelChannel at times 1 to 3, IdofPModel 1 to 3, Auto Select Model at times 1 to 3, IdofPModel 1 to 3 After Auto Select Model, Eval Selected Model for Model1 to Model3, SensorDatum 1 to 3, SensorDatum.original, SensorDatum.modeled, PModelID, Cached data, Select Best Model]

45 The DDG further along
[DDG diagram extending slide 44 to a fourth time step. Additional node labels include: Get Airport, Bind Airport Datum As P1, P1 after Get Airport, Filter Range Data, Check Ranges, P1 after Check Ranges, Select Values of P, P set from P2, Enhanced Cached Data, Read PModelChannel at time 4, IdofPModel 4, Auto Select Model at time 4, Eval Selected Model for Model4, SensorDatum 4, Next PModelID, Next SensorDatum.modeled]

46 Process Provenance Metadata
Derived from the DDG
  o Root of the appropriate DDG subdag
  o Efficiencies by storing one DDG
      Provenance metadata stored as pointer
Tradeoffs possible
  o Cache some intermediate values
  o Rederive the rest
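Reading provenance off a stored DDG amounts to walking backwards from a dataset instance to everything it was derived from. The sketch below assumes the DDG is held as a mapping from each node to the nodes it was directly derived from; the function name provenance and the node "Read Discharge" are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Set


def provenance(parents: Dict[str, List[str]], root: str) -> Set[str]:
    """Return every node that `root` was derived from (the sub-DAG behind it)."""
    seen: Set[str] = set()
    stack = [root]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, []):
            if parent not in seen:  # shared ancestors are visited only once
                seen.add(parent)
                stack.append(parent)
    return seen


# A small made-up DDG: each node maps to the nodes it was directly derived from.
parents = {
    "P": ["Read Precip"],
    "Read Precip": ["P1", "P2"],
    "P1": ["Read Precip 1"],
    "P2": ["Read Precip 2"],
    "Q": ["Read Discharge"],
}
history = provenance(parents, "P")
```

With a single shared graph, the provenance metadata for a dataset can be just its node identifier (the "pointer"), and the history is recomputed on demand by a traversal like this one.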

47 Some Related Work in Scientific Workflow
The Kepler Project
  o Largest, best funded
  o Based on Ptolemy II Data Flow Graph approach
  o Oodles of tools
Taverna, Teuta, Chimera, JOpera
  o Different Data Flow Graph projects
Provenance Projects
  o Database approaches
  o Process history logging approaches

48 Problems with Current Scientific Workflow
Languages tend to be semantically weak
  o Based on box and arrow charts
Principal semantic weaknesses
  o Lack of abstraction facilities
  o Weak exception management
  o Non-trivial iteration can be hard to understand
Other problems (with some)
  o Lack of focus on data and data flow
  o Provenance viewed as a separate issue
  o Poorly defined semantics
      Restricts the rigor of analyses
Kepler is better than most

49 Kepler
Large toolset to support users
Highly visual
Box and arrow charts are defined by Directors, which provide semantics
  o But different levels of hierarchy may have different Directors
  o No standard usage of Directors
      E.g., for loops and exceptions
  o Hard to tell what a diagram is saying without looking at
      Its Directors
      Director of its parent
Programming language experience can inform work in this area

50 Important Clarification
DFG-based scientific workflow systems can do (most or) all of the above
BUT: Process definitions tend to be ungainly, opaque
The Turing Tarpit for process definition
Focus here:
  o What language constructs should be used to make these processes clear, evolvable, analyzable, etc.
  o Software engineering has much to say about this

51 Observations
Creating a PDG makes scientists think carefully
The PDG helps scientists create flexible and scalable processes
The PDG supports re-execution using different input data and/or different tools
The PDG provides a useful and efficient framework for structuring and generating the DDG
PDGs can be rigorously evaluated for logical errors, statistical errors, and propagation of measurement errors

52 Summary
Defined PDGs for ecological research processes
  o Carbon Flux
  o Water Budget
Key semantic features needed
  o Exception management
  o Abstraction/modularity/reuse
  o Agent definition
Not favorable for use of DFGs as PDGs
Little-JIL interpreter executed simple processes only
DDGs done only manually
  o Automation needed
  o Application to other process areas and problems?

53 Future work
SciWalker 2
  o Environment for building and maintaining PDGs and DDGs
Focus on DDGs
  o How to build, maintain, optimize, display them
  o What else are they good for?
      (Software) rework?
      Running benchmarks????
Integrate with Kepler
  o Use Kepler for low-level tasks
  o Little-JIL for high-level process definition

54 Questions and Discussion?


More information

Vocabulary-Driven Enterprise Architecture Development Guidelines for DoDAF AV-2: Design and Development of the Integrated Dictionary

Vocabulary-Driven Enterprise Architecture Development Guidelines for DoDAF AV-2: Design and Development of the Integrated Dictionary Vocabulary-Driven Enterprise Architecture Development Guidelines for DoDAF AV-2: Design and Development of the Integrated Dictionary December 17, 2009 Version History Version Publication Date Author Description

More information

History of object-oriented approaches

History of object-oriented approaches Prof. Dr. Nizamettin AYDIN naydin@yildiz.edu.tr http://www.yildiz.edu.tr/~naydin Object-Oriented Oriented Systems Analysis and Design with the UML Objectives: Understand the basic characteristics of object-oriented

More information

CS 360 Programming Languages Interpreters

CS 360 Programming Languages Interpreters CS 360 Programming Languages Interpreters Implementing PLs Most of the course is learning fundamental concepts for using and understanding PLs. Syntax vs. semantics vs. idioms. Powerful constructs like

More information

System Development Life Cycle Methods/Approaches/Models

System Development Life Cycle Methods/Approaches/Models Week 11 System Development Life Cycle Methods/Approaches/Models Approaches to System Development System Development Life Cycle Methods/Approaches/Models Waterfall Model Prototype Model Spiral Model Extreme

More information

Lambda Architecture for Batch and Stream Processing. October 2018

Lambda Architecture for Batch and Stream Processing. October 2018 Lambda Architecture for Batch and Stream Processing October 2018 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document is provided for informational purposes only.

More information

Intermediate Representation (IR)

Intermediate Representation (IR) Intermediate Representation (IR) Components and Design Goals for an IR IR encodes all knowledge the compiler has derived about source program. Simple compiler structure source code More typical compiler

More information

Programming Languages

Programming Languages Programming Languages As difficult to discuss rationally as religion or politics. Prone to extreme statements devoid of data. Examples: "It is practically impossible to teach good programming to students

More information

Roy Fielding s PHD Dissertation. Chapter s 5 & 6 (REST)

Roy Fielding s PHD Dissertation. Chapter s 5 & 6 (REST) Roy Fielding s PHD Dissertation Chapter s 5 & 6 (REST) Architectural Styles and the Design of Networkbased Software Architectures Roy Fielding University of California - Irvine 2000 Chapter 5 Representational

More information

Formal Verification! Prof. Leon Osterweil! Computer Science 520/620! Spring 2012!

Formal Verification! Prof. Leon Osterweil! Computer Science 520/620! Spring 2012! Formal Verification Prof. Leon Osterweil Computer Science 520/620 Spring 2012 Relations and Analysis A software product consists of A collection of (types of) artifacts Related to each other by myriad

More information

Working Together In Communities of Practice with Metadata Standards

Working Together In Communities of Practice with Metadata Standards International Open Government Data Conference Working Together In Communities of Practice with Metadata Standards Sandra Cannon, Ph.D., Assistant Director, Division of Research and Statistics, Board of

More information

DEPARTMENT OF COMPUTER SCIENCE

DEPARTMENT OF COMPUTER SCIENCE Department of Computer Science 1 DEPARTMENT OF COMPUTER SCIENCE Office in Computer Science Building, Room 279 (970) 491-5792 cs.colostate.edu (http://www.cs.colostate.edu) Professor L. Darrell Whitley,

More information

Violations of the contract are exceptions, and are usually handled by special language constructs. Design by contract

Violations of the contract are exceptions, and are usually handled by special language constructs. Design by contract Specification and validation [L&G Ch. 9] Design patterns are a useful way to describe program structure. They provide a guide as to how a program fits together. Another dimension is the responsibilities

More information

Lab 3: Digitizing in ArcGIS Pro

Lab 3: Digitizing in ArcGIS Pro Lab 3: Digitizing in ArcGIS Pro What You ll Learn: In this Lab you ll be introduced to basic digitizing techniques using ArcGIS Pro. You should read Chapter 4 in the GIS Fundamentals textbook before starting

More information

Big Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara

Big Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara Big Data Technology Ecosystem Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara Agenda End-to-End Data Delivery Platform Ecosystem of Data Technologies Mapping an End-to-End Solution Case

More information

D2K Support for Standard Content Repositories: Design Notes. Request For Comments

D2K Support for Standard Content Repositories: Design Notes. Request For Comments D2K Support for Standard Content Repositories: Design Notes Request For Comments Abstract This note discusses software developments to integrate D2K [1] with Tupelo [6] and other similar repositories.

More information

Using Metadata Queries To Build Row-Level Audit Reports in SAS Visual Analytics

Using Metadata Queries To Build Row-Level Audit Reports in SAS Visual Analytics SAS6660-2016 Using Metadata Queries To Build Row-Level Audit Reports in SAS Visual Analytics ABSTRACT Brandon Kirk and Jason Shoffner, SAS Institute Inc., Cary, NC Sensitive data requires elevated security

More information

Computer Science 520/620 Spring 2014 Prof. L. Osterweil" Use Cases" Software Models and Representations" Part 4" More, and Multiple Models"

Computer Science 520/620 Spring 2014 Prof. L. Osterweil Use Cases Software Models and Representations Part 4 More, and Multiple Models Computer Science 520/620 Spring 2014 Prof. L. Osterweil Software Models and Representations Part 4 More, and Multiple Models Use Cases Specify actors and how they interact with various component parts

More information

Section 1: True / False (2 points each, 30 pts total)

Section 1: True / False (2 points each, 30 pts total) Section 1: True / False (2 points each, 30 pts total) Circle the word TRUE or the word FALSE. If neither is circled, both are circled, or it impossible to tell which is circled, your answer will be considered

More information

PANDA A System for Provenance and Data. Example: Sales Prediction Workflow. Example: Sales Prediction Workflow. Backward Tracing. Item Sales.

PANDA A System for Provenance and Data. Example: Sales Prediction Workflow. Example: Sales Prediction Workflow. Backward Tracing. Item Sales. PANDA A System for Provenance and Data Example: Prediction Workflow Union Predict Agg -1 s 2 Example: Prediction Workflow Backward Tracing Union Predict Agg -1 s Name Amelie Jacques Isabelle Name Address

More information

Continued Investigation of Small-Scale Air-Sea Coupled Dynamics Using CBLAST Data

Continued Investigation of Small-Scale Air-Sea Coupled Dynamics Using CBLAST Data Continued Investigation of Small-Scale Air-Sea Coupled Dynamics Using CBLAST Data Dick K.P. Yue Center for Ocean Engineering Department of Mechanical Engineering Massachusetts Institute of Technology Cambridge,

More information

Process and data flow modeling

Process and data flow modeling Process and data flow modeling Vince Molnár Informatikai Rendszertervezés BMEVIMIAC01 Budapest University of Technology and Economics Fault Tolerant Systems Research Group Budapest University of Technology

More information

System R and the Relational Model

System R and the Relational Model IC-65 Advances in Database Management Systems Roadmap System R and the Relational Model Intro Codd s paper System R - design Anastasia Ailamaki www.cs.cmu.edu/~natassa 2 The Roots The Roots Codd (CACM

More information

challenges in domain-specific modeling raphaël mannadiar august 27, 2009

challenges in domain-specific modeling raphaël mannadiar august 27, 2009 challenges in domain-specific modeling raphaël mannadiar august 27, 2009 raphaël mannadiar challenges in domain-specific modeling 1/59 outline 1 introduction 2 approaches 3 debugging and simulation 4 differencing

More information

Information System Support for Analytical Methods

Information System Support for Analytical Methods Information System Support for Analytical Methods Martin Doerr Center for Cultural Informatics, Institute of Computer Science Foundation for Research and Technology - Hellas Heraklion, October 18, 2017

More information

Alan J. Perlis - Epigrams on Programming

Alan J. Perlis - Epigrams on Programming Programming Languages (CS302 2007S) Alan J. Perlis - Epigrams on Programming Comments on: Perlis, Alan J. (1982). Epigrams on Programming. ACM SIGPLAN Notices 17(9), September 1982, pp. 7-13. 1. One man

More information

Objective and Subjective Specifications

Objective and Subjective Specifications 2017-7-10 0 Objective and Subjective Specifications Eric C.R. Hehner Department of Computer Science, University of Toronto hehner@cs.utoronto.ca Abstract: We examine specifications for dependence on the

More information

Scripted Components: Problem. Scripted Components. Problems with Components. Single-Language Assumption. Dr. James A. Bednar

Scripted Components: Problem. Scripted Components. Problems with Components. Single-Language Assumption. Dr. James A. Bednar Scripted Components: Problem Scripted Components Dr. James A. Bednar jbednar@inf.ed.ac.uk http://homepages.inf.ed.ac.uk/jbednar (Cf. Reuse-Oriented Development; Sommerville 2004 Chapter 4, 18) A longstanding

More information

Scripted Components Dr. James A. Bednar

Scripted Components Dr. James A. Bednar Scripted Components Dr. James A. Bednar jbednar@inf.ed.ac.uk http://homepages.inf.ed.ac.uk/jbednar SAPM Spring 2012: Scripted Components 1 Scripted Components: Problem (Cf. Reuse-Oriented Development;

More information

USING THE BUSINESS PROCESS EXECUTION LANGUAGE FOR MANAGING SCIENTIFIC PROCESSES. Anna Malinova, Snezhana Gocheva-Ilieva

USING THE BUSINESS PROCESS EXECUTION LANGUAGE FOR MANAGING SCIENTIFIC PROCESSES. Anna Malinova, Snezhana Gocheva-Ilieva International Journal "Information Technologies and Knowledge" Vol.2 / 2008 257 USING THE BUSINESS PROCESS EXECUTION LANGUAGE FOR MANAGING SCIENTIFIC PROCESSES Anna Malinova, Snezhana Gocheva-Ilieva Abstract:

More information

Parallel Programming Patterns Overview CS 472 Concurrent & Parallel Programming University of Evansville

Parallel Programming Patterns Overview CS 472 Concurrent & Parallel Programming University of Evansville Parallel Programming Patterns Overview CS 472 Concurrent & Parallel Programming of Evansville Selection of slides from CIS 410/510 Introduction to Parallel Computing Department of Computer and Information

More information

Initiate a PSI-BLAST search simply by choosing the option on the BLAST input form.

Initiate a PSI-BLAST search simply by choosing the option on the BLAST input form. 1 2 Initiate a PSI-BLAST search simply by choosing the option on the BLAST input form. But note: invoking the algorithm is trivial. Using it correctly and interpreting the results, perhaps not so much.

More information

Quality Assured (QA) data

Quality Assured (QA) data Quality Assured (QA) data Towards DOI quality of data generated at the UFZ Mark Frenzel (Ecologist) & Thomas Schnicke (IT) DataCite / Helmholtz Open Science Workshop Leipzig, 12.01.2016 QA + DOI: Best

More information

Data Curation Profile Human Genomics

Data Curation Profile Human Genomics Data Curation Profile Human Genomics Profile Author Profile Author Institution Name Contact J. Carlson N. Brown Purdue University J. Carlson, jrcarlso@purdue.edu Date of Creation October 27, 2009 Date

More information

Software Testing and Maintenance 1. Introduction Product & Version Space Interplay of Product and Version Space Intensional Versioning Conclusion

Software Testing and Maintenance 1. Introduction Product & Version Space Interplay of Product and Version Space Intensional Versioning Conclusion Today s Agenda Quiz 3 HW 4 Posted Version Control Software Testing and Maintenance 1 Outline Introduction Product & Version Space Interplay of Product and Version Space Intensional Versioning Conclusion

More information

Design Rationale Modeling Representations

Design Rationale Modeling Representations Agenda Design Rationale Modeling Representations EE382V Software Architecture and Design Intent Pankaj Adhikari, Bill Reinhart 7 MAR 2006 Why Rationale? Design Rationale s Roots Modeling Rationale IBIS

More information

Data management is fun. Casey Dunn Assistant Professor Ecology and Evolutionary Biology

Data management is fun. Casey Dunn Assistant Professor Ecology and Evolutionary Biology Data management is fun Casey Dunn Assistant Professor Ecology and Evolutionary Biology What is science? The study of the natural world through observation and experiment. Reproducible study. Prove it isn

More information

Sardar Vallabhbhai Patel Institute of Technology (SVIT), Vasad M.C.A. Department COSMOS LECTURE SERIES ( ) (ODD) Code Optimization

Sardar Vallabhbhai Patel Institute of Technology (SVIT), Vasad M.C.A. Department COSMOS LECTURE SERIES ( ) (ODD) Code Optimization Sardar Vallabhbhai Patel Institute of Technology (SVIT), Vasad M.C.A. Department COSMOS LECTURE SERIES (2018-19) (ODD) Code Optimization Prof. Jonita Roman Date: 30/06/2018 Time: 9:45 to 10:45 Venue: MCA

More information

Problem Solving with C++

Problem Solving with C++ GLOBAL EDITION Problem Solving with C++ NINTH EDITION Walter Savitch Kendrick Mock Ninth Edition PROBLEM SOLVING with C++ Problem Solving with C++, Global Edition Cover Title Copyright Contents Chapter

More information

Wade Sheldon. Georgia Coastal Ecosystems LTER University of Georgia

Wade Sheldon. Georgia Coastal Ecosystems LTER University of Georgia Wade Sheldon Georgia Coastal Ecosystems LTER University of Georgia email: sheldon@uga.edu Regardless of Q/A procedures, data quality issues guaranteed with environmental sensor data Without good Q/C data

More information

A PRIMITIVE EXECUTION MODEL FOR HETEROGENEOUS MODELING

A PRIMITIVE EXECUTION MODEL FOR HETEROGENEOUS MODELING A PRIMITIVE EXECUTION MODEL FOR HETEROGENEOUS MODELING Frédéric Boulanger Supélec Département Informatique, 3 rue Joliot-Curie, 91192 Gif-sur-Yvette cedex, France Email: Frederic.Boulanger@supelec.fr Guy

More information

RAISE in Perspective

RAISE in Perspective RAISE in Perspective Klaus Havelund NASA s Jet Propulsion Laboratory, Pasadena, USA Klaus.Havelund@jpl.nasa.gov 1 The Contribution of RAISE The RAISE [6] Specification Language, RSL, originated as a development

More information

The SPL Programming Language Reference Manual

The SPL Programming Language Reference Manual The SPL Programming Language Reference Manual Leonidas Fegaras University of Texas at Arlington Arlington, TX 76019 fegaras@cse.uta.edu February 27, 2018 1 Introduction The SPL language is a Small Programming

More information

EXPERT SERVICES FOR IoT CYBERSECURITY AND RISK MANAGEMENT. An Insight Cyber White Paper. Copyright Insight Cyber All rights reserved.

EXPERT SERVICES FOR IoT CYBERSECURITY AND RISK MANAGEMENT. An Insight Cyber White Paper. Copyright Insight Cyber All rights reserved. EXPERT SERVICES FOR IoT CYBERSECURITY AND RISK MANAGEMENT An Insight Cyber White Paper Copyright Insight Cyber 2018. All rights reserved. The Need for Expert Monitoring Digitization and external connectivity

More information

Concept as a Generalization of Class and Principles of the Concept-Oriented Programming

Concept as a Generalization of Class and Principles of the Concept-Oriented Programming Computer Science Journal of Moldova, vol.13, no.3(39), 2005 Concept as a Generalization of Class and Principles of the Concept-Oriented Programming Alexandr Savinov Abstract In the paper we describe a

More information

Build and Provision: Two Sides of the Coin We Love to Hate

Build and Provision: Two Sides of the Coin We Love to Hate Build and Provision: Two Sides of the Coin We Love to Hate Ed Merks Eclipse Modeling Project Lead 1 The Software Pipeline Software artifacts flow from developer to developer and ultimately to the clients

More information

HDF5 Single Writer/Multiple Reader Feature Design and Semantics

HDF5 Single Writer/Multiple Reader Feature Design and Semantics HDF5 Single Writer/Multiple Reader Feature Design and Semantics The HDF Group Document Version 5.2 This document describes the design and semantics of HDF5 s single writer/multiplereader (SWMR) feature.

More information

Scientific Workflow Tools. Daniel Crawl and Ilkay Altintas San Diego Supercomputer Center UC San Diego

Scientific Workflow Tools. Daniel Crawl and Ilkay Altintas San Diego Supercomputer Center UC San Diego Scientific Workflow Tools Daniel Crawl and Ilkay Altintas San Diego Supercomputer Center UC San Diego 1 escience Today Increasing number of Cyberinfrastructure (CI) technologies Data Repositories: Network

More information

Categorizing Migrations

Categorizing Migrations What to Migrate? Categorizing Migrations A version control repository contains two distinct types of data. The first type of data is the actual content of the directories and files themselves which are

More information