Final Project Assignments. Promoter Identification Workflow (PIW)

Similar documents
KEPLER: Overview and Project Status

Scientific Workflows

Scientific Data & Workflow Engineering. Outline

ECS289F Winter 05 Scientific Data Management

The Ptolemy II Framework for Visual Languages

Scientific Data Integration: From the Big Picture to some Gory Details

Accelerating the Scientific Exploration Process with Kepler Scientific Workflow System

Actor-Oriented Design and The Ptolemy II framework

Scientific Workflow Tools. Daniel Crawl and Ilkay Altintas San Diego Supercomputer Center UC San Diego

Kepler/pPOD: Scientific Workflow and Provenance Support for Assembling the Tree of Life

Building Unreliable Systems out of Reliable Components: The Real Time Story

Modal Models in Ptolemy

The Future of the Ptolemy Project

Where we are so far. Intro to Data Integration (Datalog, mediators, ) more to come (your projects!): schema matching, simple query rewriting

Kepler: An Extensible System for Design and Execution of Scientific Workflows

San Diego Supercomputer Center, UCSD, U.S.A. The Consortium for Conservation Medicine, Wildlife Trust, U.S.A.

UC Berkeley Mobies Technology Project

The Gigascale Silicon Research Center

A High-Level Distributed Execution Framework for Scientific Workflows

Using Web Services and Scientific Workflow for Species Distribution Prediction Modeling 1

Process-Based Software Components Final Mobies Presentation

System-Level Design Languages: Orthogonalizing the Issues

7 th International Digital Curation Conference December 2011

Component-Based Design of Embedded Control Systems

A(nother) Vision of ppod Data Integration!?

Concurrent Component Patterns, Models of Computation, and Types

Automatic Transformation from Geospatial Conceptual Workflow to Executable Workflow Using GRASS GIS Command Line Modules in Kepler *

Component-Based Design of Embedded Control Systems

Embedded Software from Concurrent Component Models

Workflow Exchange and Archival: The KSW File and the Kepler Object Manager. Shawn Bowers (For Chad Berkley & Matt Jones)

Hierarchical FSMs with Multiple CMs

METROII AND PTOLEMYII INTEGRATION. Presented by: Shaoyi Cheng, Tatsuaki Iwata, Brad Miller, Avissa Tehrani

Actor-Oriented Design: Concurrent Models as Programs

Provenance Collection Support in the Kepler Scientific Workflow System

A Three Tier Architecture for LiDAR Interpolation and Analysis

A High-Level Distributed Execution Framework for Scientific Workflows

Advanced Tool Architectures. Edited and Presented by Edward A. Lee, Co-PI UC Berkeley. Tool Projects. Chess Review May 10, 2004 Berkeley, CA

Giotto Domain. 5.1 Introduction. 5.2 Using Giotto. Edward Lee Christoph Kirsch

Workflow Fault Tolerance for Kepler. Sven Köhler, Thimothy McPhillips, Sean Riddle, Daniel Zinn, Bertram Ludäscher

Hands-on tutorial on usage the Kepler Scientific Workflow System

Enhanced Access to High-Resolution LiDAR Topography through Cyberinfrastructure-Based Data Distribution and Processing

Introduction to Grid Computing

Heterogeneous Workflows in Scientific Workflow Systems

Hybrid System Modeling: Operational Semantics Issues

Accelerating Parameter Sweep Workflows by Utilizing Ad-hoc Network Computing Resources: an Ecological Example

A Dataflow-Oriented Atomicity and Provenance System for Pipelined Scientific Workflows

DESIGN AND SIMULATION OF HETEROGENEOUS CONTROL SYSTEMS USING PTOLEMY II

Overview of the Ptolemy Project

Embedded System Design and Modeling EE382V, Fall 2008

2/12/11. Addendum (different syntax, similar ideas): XML, JSON, Motivation: Why Scientific Workflows? Scientific Workflows

Hybrid-Type Extensions for Actor-Oriented Modeling (a.k.a. Semantic Data-types for Kepler) Shawn Bowers & Bertram Ludäscher

System-Level Design Languages: Orthogonalizing the Issues. Kees Vissers

Discrete-Event Modeling and Design of Embedded Software

Concurrent Models of Computation

A geoinformatics-based approach to the distribution and processing of integrated LiDAR and imagery data to enhance 3D earth systems research

Nimrod/K: Towards Massively Parallel Dynamic Grid Workflows

Future Directions. Edward A. Lee. Berkeley, CA May 12, A New Computational Platform: Ubiquitous Networked Embedded Systems. actuate.

Extending Ptolemy II

Kepler User Manual Version October, 2010

Concurrent Models of Computation for Embedded Software

Concurrent Models of Computation

USING THE BUSINESS PROCESS EXECUTION LANGUAGE FOR MANAGING SCIENTIFIC PROCESSES. Anna Malinova, Snezhana Gocheva-Ilieva

Kepler and Grid Systems -- Early Efforts --

VOLUME 1: INTRODUCTION TO PTOLEMY II. Document Version 3.0 for use with Ptolemy II 3.0 July 16, 2003

Ptolemy II The automotive challenge problems version 4.1

Datagridflows: Managing Long-Run Processes on Datagrids

Parameter Space Exploration using Scientific Workflows

Architectural Styles I

fakultät für informatik informatik 12 technische universität dortmund Data flow models Peter Marwedel TU Dortmund, Informatik /10/08

Understandable Concurrency

Managing Rapidly-Evolving Scientific Workflows

Interface Automata and Actif Actors

ISSN: Supporting Collaborative Tool of A New Scientific Workflow Composition

The International Journal of Digital Curation Volume 7, Issue

Overview. Scientific workflows and Grids. Kepler revisited Data Grids. Taxonomy Example systems. Chimera GridDB

Dynamic Dataflow Modeling in Ptolemy II. by Gang Zhou. Research Project

Appendix A - Glossary(of OO software term s)

Architectural Styles. Software Architecture Lecture 5. Copyright Richard N. Taylor, Nenad Medvidovic, and Eric M. Dashofy. All rights reserved.

6/20/2018 CS5386 SOFTWARE DESIGN & ARCHITECTURE LECTURE 5: ARCHITECTURAL VIEWS C&C STYLES. Outline for Today. Architecture views C&C Views

Integrating Existing Scientific Workflow Systems: The Kepler/Pegasus Example

Integrated Machine Learning in the Kepler Scientific Workflow System

Ant-based interactive workflow management: a case study on RMI

7. System Design: Addressing Design Goals

Simulation of LET Models in Simulink and Ptolemy

Building Synchronous DataFlow graphs with UML & MARTE/CCSL

Static Analysis of Actor Networks

Concurrent Programming Method for Embedded Systems

Internet Application Developer

Big Data Applications using Workflows for Data Parallel Computing

Multithreading and Interactive Programs

SDF Domain. 3.1 Purpose of the Domain. 3.2 Using SDF Deadlock. Steve Neuendorffer

Orleans. Actors for High-Scale Services. Sergey Bykov extreme Computing Group, Microsoft Research

Automating Real-time Seismic Analysis

Concurrency Demands New Foundations for Computing

Introduction 2 The first synchronous language (early 80 s) Gérard Berry and his team (École des Mines de Paris / INRIA Sophia-Antipolis) Imperative, s

SDL. Jian-Jia Chen (slides are based on Peter Marwedel) TU Dortmund, Informatik 年 10 月 18 日. technische universität dortmund

Architectural Design. Architectural Design. Software Architecture. Architectural Models

Overview of Dataflow Languages. Waheed Ahmad

Scientific Workflow, Provenance, and Data Modeling Challenges and Approaches

Compositionality in system design: interfaces everywhere! UC Berkeley

Transcription:

Final Project Assignments Schema Matching Ji-Yeong Chong Biological Pathways & Ontologies Russell D Sa FCA Theory and Practice Bill Man, Betty Chan Practice of Data Integration (GAV) Jenny Wang Kepler/Data Analysis (Biodiversity) Mike Kofi (Biodiversity) Carlos Rueda (visualization focus) Kepler/Data-intensive & SRB Tim Wong Kepler/ROADnet/Geostreams Haiyan Yang Promoter Identification Workflow (PIW) Source: Matt Coleman (LLNL) KEPLER/CSP: Contributors, Sponsors, Projects (or loosely coupled Communicating Sequential Persons ;-) Source: NIH BIRN (Jeffrey Grethe, UCSD) Ilkay Altintas SDM, Resurgence Kim Baldridge Resurgence, NMI Chad Berkley SEEK Shawn Bowers SEEK Terence Critchlow SDM Tobin Fricke ROADNet Jeffrey Grethe BIRN Christopher H. Brooks Ptolemy II Zhengang Cheng SDM Dan Higgins SEEK Efrat Jaeger GEON Matt Jones SEEK Werner Krebs, EOL Edward A. Lee Ptolemy II Kai Lin GEON Bertram Ludaescher SDM, SEEK, GEON, BIRN, ROADNet Mark Miller EOL Steve Mock NMI Steve Neuendorffer Ptolemy II Jing Tao SEEK Mladen Vouk SDM Xiaowen Xin SDM Yang Zhao Ptolemy II Bing Zhu SEEK www.kepler-project.org project.org Ptolemy II 1

GEON Dataset Generation & Registration (a co-development in KEPLER) % Makefile Makefile $> $> ant ant run run Matt,Chad, Dan et al. (SEEK) SQL database access (JDBC) Ptolemy II/KEPLER GUI (Vergil) Directors define the component interaction & execution semantics Efrat (GEON) Ilkay (SDM) Yang (Ptolemy) Xiaowen (SDM) Edward et al.(ptolemy) Large, polymorphic component ( Actors ) and Directors libraries (drag & drop) Web Services Actors (WS Harvester) Rapid Web Service-based Prototyping (Here: ROADNet Command & Control Services for LOOKING Kick-Off Mtg) 1 2 4 3 Minute-made (MM) WS-based application integration Similarly: MM workflow design & sharing w/o implemented components Source: Ilkay Altintas, SDM, NLADR ROADNet: Vernon, Orcutt et al Web services: Tony Fountain et al 2

An early example: Promoter Identification SSDBM, AD 2003 PIW Workflow Today Scientist models application as a workflow of connected components ( actors ) If all components exist, the workflow can be automated/ executed Different directors can be used to pick appropriate execution model (often pipelined execution: PN director) Run Window Enter initial inputs, Run and Display results Custom Visualization 3

Job Management (here: NIMROD) Some Recent Actor Additions Job management infrastructure in place Results database: under development Goal: 1000 s of GAMESS jobs (quantum mechanics) in KEPLER (w/ editable script) in KEPLER (interactive session) Source: Dan Higgins, Kepler/SEEK Source: Dan Higgins, Kepler/SEEK 4

Blurring Design (ToDo) and Execution Reengineering a Geoscientist s Mineral Classification Workflow Add semantic types to ports!! Scientific Workflows : Some Findings More dataflow than (business control-/) workflow DiscoveryNet, Kepler, SCIRun, Scitegic, Triana, Taverna,, Need for programming extensions Iterations over lists (foreach); filtering; functional composition; generic & higher-order operations (zip, map(f), ) Need for abstraction and nested workflows Need for data transformations (WS1 DT WS2) Need for rich user interaction & workflow steering: pause / revise / resume select & branch; e.g., web browser capability at specific steps as part of a coordinated SWF Need for high-throughput data transfers and CPU cyles: (Data-)Grid-enabling, streaming Need for persistence of intermediate products and provenance Scientific Workflows vs Business Workflows Scientific Workflows Dataflow and data transformations Data problems: volume, complexity, heterogeneity Grid-aspects Distributed computation Distributed data User-interactions/WF steering Data, tool, and analysis integration Dataflow and control-flow are often married! Business Workflows (BPEL4WS ) Task-orientation: travel reservations; credit approval; BPM; Tasks, documents, etc. undergo modifications (e.g., flight reservation from reserved to ticketed), but modified WF objects still identifiable throughout Complex control flow, complex process composition (danger of control flow/dataflow spaghetti ) Dataflow and control-flow are often divorced! 5

Focus on Actor-Oriented Design Object orientation: call Actor orientation: Input data class name data methods actor name data (state) parameters ports return Output data What flows through an object is sequential control What flows through an object is streams of data Object-Oriented vs. Actor-Oriented Interface Definitions Object Oriented TextToSpeech initialize(): void notify(): void isready(): boolean getspeech(): double[] OO interface definition gives procedures that have to be invoked in an order not specified as part of the interface definition. Actor Oriented AO interface definition says Give me text and I ll give you speech Source: Edward Lee et al. http://ptolemy.eecs.berkeley.edu/ Source: Edward Lee et al. http://ptolemy.eecs.berkeley.edu/ Domains: Semantics for Component Interaction CI Push/pull component interaction CSP concurrent threads with rendezvous CT continuous-time modeling DE discrete-event systems DDE distributed discrete events FSM finite state machines DT discrete time (cycle driven) Giotto synchronous periodic GR 2-D and 3-D graphics PN process networks SDF synchronous dataflow SR synchronous/reactive TM timed multitasking For (coarse grained) Scientific Workflows! Polymorphic Actors: Components Working Across Data Types and Domains Actor Data Polymorphism: Add numbers (int, float, double, Complex) Add strings (concatenation) Add complex types (arrays, records, matrices) Add user-defined types Actor Behavioral Polymorphism: In dataflow, add when all connected inputs have data In a time-triggered model, add when the clock ticks In discrete-event, add when any connected input has data, and add in zero time In process networks, execute an infinite loop in a thread that blocks when reading empty inputs In CSP, execute an infinite loop that performs rendezvous on input or output In push/pull, ports are push or pull (declared or inferred) and behave accordingly In real-time CORBA*, priorities are associated with ports and a dispatcher determines when to add *hey, Ptolemy has been out for long! By not choosing among these when defining the component, we get a huge increment in component reusability. But how do we ensure that the component will work in all these circumstances? Source: Edward Lee et al. http://ptolemy.eecs.berkeley.edu/ Source: Edward Lee et al. http://ptolemy.eecs.berkeley.edu/ 6

Directors and Combining Different Component Interaction Semantics KEPLER & ROADNet: Real-Time Scientific Workflows (Tobin Fricke et al.) Architecture: Seismic Waveforms Images other types of data Straightforward Example: Laser Strainmeter Channels in; Scientific Workflow; Earth-tide signal out Near-real-time database Real-time ORBserver Packet Buffer Target Directions: Complex Processing Results Cross-disciplinary signals analysis Geophysical Stream Algebras Source: Edward Lee et al. http://ptolemy.eecs.berkeley.edu/ptolemyii/ Scientific Workflow A Scientific Workflow Problem: Solved map(f)-style iterators Powerful type checking Generic, declarative programming constructs Generic data transformation actors Solution based on declarative, functional dataflow process network (= also a data streaming model!) Higher-order constructs: map(f) no control-flow spaghetti data-intensive apps Forward-only, abstractable subworkflow piw(geneid) free concurrent execution free type checking automatic support to go from piw(geneid) to PIW :=map(piw) over [GeneId] Optimization by Declarative Rewriting I map(f o g) instead of map(f) o map(g) Combination of map and zip PIW as a declarative, referentially transparent functional process optimization via functional rewriting possible e.g. map(f o g) = map(f) o map(g) Technical report &PIW specification in Haskell http://kbis.sdsc.edu/scidac-sdm/scidac-tn-map-constructs.pdf 7

Optimizing II: Streams & Pipelines Source: Source: Real-Time Real-Time Signal Signal Processing: Processing: Dataflow, Dataflow, Visual, Visual, and and Functional Functional Programming, Programming, Hideki Hideki John John Reekie, Reekie, University University of of Technology, Technology, Sydney Sydney Clean functional semantics facilitates algebraic workflow (program) transformations (Bird-Meertens); e.g. maps f maps g maps (f g) Traffic info for a list of highways: Uses iterate (higher-order map ) actor to access highway info web service repeatedly, sending out one email per highway. An (oversimplified) Model of the Grid Shipping and Handling Algebra (SHA) Hosts: {h1, h2, h3, } Data@Hosts: d1@{h i }, d2@{h j }, Functions@Hosts: f1@{h i }, f2@{h j }, f g X Y Z Given: data/workflow: as a functional plan: [ ; Y := f(x); Z := g(y); ] as a logic plan: [ ; f(x,y) g(y,z); ] f@a Logical view x@b y@c plan Y@C = F@A of X@B = 1. [ X@B to A, Y@A := F@A(X@A), Y@A to C ] 2. [ F@A => B, Y@B := F@B(X@B), Y@B to C ] x@b x@b f@a f@a f@a y@c y@c (1) (2) Find Host Assignment: d i h i, f j h j for all d i,f j s.t. [ ; d3@h3 := f@h2(d1@h1), ] is a valid plan 3. [ X@B to C, F@A => C, Y@C := F@C(X@C) ] Physical view: SHA Plans x@b y@c (3) 8