FastFlow: targeting distributed systems Massimo Torquati

Similar documents
FastFlow: targeting distributed systems

Evolution of FastFlow: C++11 extensions, distributed systems

Introduction to FastFlow programming

Overview. Daemon processes and advanced I/O. Source: Chapters 13&14 of Stevens book

An efficient Unbounded Lock-Free Queue for Multi-Core Systems

Introduction to FastFlow programming

ISA 563: Fundamentals of Systems Programming

Efficient streaming applications on multi-core with FastFlow: the biosequence alignment test-bed

Lecture 10 Overview!

Università di Pisa. Dipartimento di Informatica. Technical Report: TR FastFlow tutorial

A brief introduction to C programming for Java programmers

Parallel Programming using FastFlow

Assignment 2 Group 5 Simon Gerber Systems Group Dept. Computer Science ETH Zurich - Switzerland

CS5460/6460: Operating Systems. Lecture 24: Device drivers. Anton Burtsev April, 2014

I/O in scientific applications

Structured approaches for multi/many core targeting

Section 3: File I/O, JSON, Generics. Meghan Cowan

System Call. Preview. System Call. System Call. System Call 9/7/2018

Memory management. Johan Montelius KTH

Processes often need to communicate. CSCB09: Software Tools and Systems Programming. Solution: Pipes. Recall: I/O mechanisms in C

Goby3: A new open-source middleware for nested communication on autonomous marine vehicles

Operating Systems. Lecture 06. System Calls (Exec, Open, Read, Write) Inter-process Communication in Unix/Linux (PIPE), Use of PIPE on command line

The ASSIST Programming Environment

Files. Eric McCreath

Parallel Patterns for Window-based Stateful Operators on Data Streams: an Algorithmic Skeleton Approach

EEC-484/584 Computer Networks

C 1. Recap: Finger Table. CSE 486/586 Distributed Systems Remote Procedure Call. Chord: Node Joins and Leaves. Recall? Socket API

CSE 4/521 Introduction to Operating Systems. Lecture 24 I/O Systems (Overview, Application I/O Interface, Kernel I/O Subsystem) Summer 2018

CSci 4061 Introduction to Operating Systems. Input/Output: High-level

NETWORK PROGRAMMING. Instructor: Junaid Tariq, Lecturer, Department of Computer Science

New Communication Standard Takyon Proposal Overview

High Performance Computing Prof. Matthew Jacob Department of Computer Science and Automation Indian Institute of Science, Bangalore

Virtual File System (VFS) Implementation in Linux. Tushar B. Kute,

Kernel Modules. Kartik Gopalan

CS2028 -UNIX INTERNALS

Introduction to Operating Systems. Device Drivers. John Franco. Dept. of Electrical Engineering and Computing Systems University of Cincinnati

Lecture 2. Outline. Layering and Protocols. Network Architecture. Layering and Protocols. Layering and Protocols. Chapter 1 - Foundation

Network Implementation

Operating System Concepts Ch. 11: File System Implementation

we are here Page 1 Recall: How do we Hide I/O Latency? I/O & Storage Layers Recall: C Low level I/O

Inter-process communication (IPC)

CSE 333 SECTION 3. POSIX I/O Functions

Tunneling. Encapsulation and Tunneling. Performance implications of data transmission on high speed network devices

Input / Output. Kevin Webb Swarthmore College April 12, 2018

Programming with MPI

CS201 Some Important Definitions

Communicating Process Architectures in Light of Parallel Design Patterns and Skeletons

An application: foreign function bindings

Processes. Johan Montelius KTH

Processes COMPSCI 386

CSCI 4061: Virtual Memory

txzmq Documentation Release Andrey Smirnov

Chapter 4: Processes. Process Concept. Process State

A process. the stack

we are here I/O & Storage Layers Recall: C Low level I/O Recall: C Low Level Operations CS162 Operating Systems and Systems Programming Lecture 18

ECE 435 Network Engineering Lecture 2

Efficient Smith-Waterman on multi-core with FastFlow

MaRTE OS Misc utilities

Programming with MPI

Parallel stochastic simulators in systems biology: the evolution of the species

Start of Lecture on January 20, Chapter 3: Processes

libxcpc(3) Exception and resource handling in C libxcpc(3)

CS201- Introduction to Programming Latest Solved Mcqs from Midterm Papers May 07,2011. MIDTERM EXAMINATION Spring 2010

Heap Arrays. Steven R. Bagley

Types II. Hwansoo Han

Systems software design NETWORK COMMUNICATIONS & RPC SYSTEMS

o Code, executable, and process o Main memory vs. virtual memory

Order Is A Lie. Are you sure you know how your code runs?

Pointers, Arrays, and Strings. CS449 Spring 2016

COP 4610: Introduction to Operating Systems (Spring 2016) Chapter 3: Process. Zhi Wang Florida State University

Page 1. Analogy: Problems: Operating Systems Lecture 7. Operating Systems Lecture 7

Outline. OS Interface to Devices. System Input/Output. CSCI 4061 Introduction to Operating Systems. System I/O and Files. Instructor: Abhishek Chandra

Marco Aldinucci Salvatore Ruggieri, Massimo Torquati

Signal Example 1. Signal Example 2

Lab # 4. Files & Queues in C

CS C Primer. Tyler Szepesi. January 16, 2013

The MPI Message-passing Standard Lab Time Hands-on. SPD Course Massimo Coppola

UNIX input and output

Prepared by Prof. Hui Jiang Process. Prof. Hui Jiang Dept of Electrical Engineering and Computer Science, York University

I/O Systems. Amir H. Payberah. Amirkabir University of Technology (Tehran Polytechnic)

Sockets and Parallel Computing. CS439: Principles of Computer Systems April 11, 2018

Advanced Programming & C++ Language

Inter-Process Communication: Message Passing. Thomas Plagemann. Big Picture. message passing communication?

Process. Prepared by Prof. Hui Jiang Dept. of EECS, York Univ. 1. Process in Memory (I) PROCESS. Process. How OS manages CPU usage? No.

What Is A Process? Process States. Process Concept. Process Control Block (PCB) Process State Transition Diagram 9/6/2013. Process Fundamentals

COP 4610: Introduction to Operating Systems (Spring 2014) Chapter 3: Process. Zhi Wang Florida State University

Data Representation and Storage

CSE 333 Final Exam June 6, 2017 Sample Solution

Pointers, Dynamic Data, and Reference Types

Processes. CSE 2431: Introduction to Operating Systems Reading: Chap. 3, [OSC]

Jaguar: Enabling Efficient Communication and I/O in Java

CS11 C++ DGC. Spring Lecture 6

Intel Thread Building Blocks

COSC Operating Systems Design, Fall Lecture Note: Unnamed Pipe and Shared Memory. Unnamed Pipes

Topics. bool and string types input/output library functions comments memory allocation templates classes

Operating Systems CMPSCI 377, Lec 2 Intro to C/C++ Prashant Shenoy University of Massachusetts Amherst

G52CON: Concepts of Concurrency Lecture 15: Message Passing. Gabriela Ochoa School of Computer Science & IT

Introduction and Overview Socket Programming Higher-level interfaces Final thoughts. Network Programming. Samuli Sorvakko/Nixu Oy

CSC209: Software tools. Unix files and directories permissions utilities/commands Shell programming quoting wild cards files

CSC209: Software tools. Unix files and directories permissions utilities/commands Shell programming quoting wild cards files. Compiler vs.

Transcription:

FastFlow: targeting distributed systems Massimo Torquati May 17 th, 2012 torquati@di.unipi.it http://www.di.unipi.it/~torquati

FastFlow node FastFlow's implementation is based on the concept of node (ff_node class) A node is an abstraction which has an input and an output queue where another node can be connected to (queues can be bounded or unbounded) generic node Operations: get from the input queue, put to the output queue

FastFlow node (2) A sequential node is eventually (at run-time) a posix-thread There are 2 special nodes used in the farm skeleton which provide SPMC and MCSP queues using an active thread for scheduling and gathering policies control emitter collector An ongoing work is to implement SPMC and MCSP as a lock-free CDS

Nodes composition A node can be a sequential node, a pipeline, a farm or a combination of them The model exposed is a streaming network model farm pipeline pipeline (torus) NOTE: there are some limitations on the possible nesting of nodes when cycles are present

Scaling to multiple SMP workstations We need to scale to hundreds/thousands of cores We have to use more than a single multi-core platform The streaming network model exposed by FastFlow, can be easily extended to work outside the single workstation

From node to dnode A dnode (class ff_dnode) is a node (i.e. extends the ff_node class) with an external communication channel ext. channel generic dnode The ext. channel is specialized to be an input or an output channel (not both)

From node to dnode (2) The main idea is to have only the edge nodes of the FastFlow network to be able to talk to the outside world hosta hostb dnode node FastFlow farm FastFlow pipelinel Actually in the above scenario we have 2 FastFlow applications connected together

ff_dnode class sketch The ff_dnode offers the same interface of the ff_node In addition it encapsulates the external channel whose type is passed as template parameter The init method creates and initializes the communication end-point

Available communication patterns Unicast (one-to-one) Broadcast Scatter One-To-Many All Gather Collect from Any TODO: On-demand Many-To-One (the opposite of One-To- Many)

Communication pattern interface init and close The descriptor contains all implementation details get and put interface putmore used for multipart message (sender-side) done used for multipar message (receiver-side)

Communication Transport Interface

Communication patterns implementation At moment, the external channel of the dnode is implemented using the ZeroMQ library The implementation uses the TCP/IP transport layer We have planned to add more implementations based on different messaging framework

ZeroMQ messaging framework (1) ZeroMQ (or ØMQ) is a communication library It provides you a socket layer abstraction Sockets carry whole messages across various transports: in-process (threads), inter-proess, TCP/IP, multicast ØMQ is quite easy to use It is efficient enough to be used in cluster environment

ZeroMQ messaging framework (2) ZeroMQ offers an asynchronous I/O model Runs on most operating systems (Linux, Windows, OS X) Supports many programming languages: C++, Java,.NET, Python, C#, Erlang, Perl,... It is open-source, LGPL license Lots of documentation and examples available take a look at: www.zeromq.org

ZeroMQ messaging framework (3) Sockets can be used with different communication patterns Not only classical bidirectional communication between 2 peers (point-to-point) ØMQ offers the following patterns: request/reply, publish/subscribe, push/pull Communication patterns can be directly used in your application to solve specific communication need: take a look at zguide.zeromq.org for more details

ZeroMQ Hello World From ØMQ on-line manual

ZeroMQ programming Minor pitfalls you may come across with ØMQ: It is not possible to provide your pre-allocated buffer on the receiver side The message buffer allocation is in charge of the ZeroMQ runtime You must be carefull to mange multi-part messages Some kind of ØMQ sockets, if not used properly, start dropping messages without any alert.

How to define a dnode Implementation of the comm. pattern we want to use: broadcast true identifies a producer, false a consumer node

First distributed example: pipeline hosta hostb Node0 Node1 A Node2 Node3 FastFlow pipeline test11_pipe A 1 hostb:port hosta FastFlow pipeline test11_pipe A 0 hostb:port hostb A Node0 Node1 Node2 Node3 FastFlow pipeline test11_pipe A 1 hostb:port hosta:port B FastFlow pipeline test11_pipe A 0 hostb:port hosta:port

Marshalling/Unmarshalling Consider the case where two or more objects have to be sent in a single message If the two objects are non contiguous in memory we have to memcpy one of the two but can be quite costly in term of performance A classical solution to this problem is to use readv/writev-like primitives, i.e. multi-part messages (see man writev): ssize_t writev(int fd, const struct iovec *iov, int iovcnt); struct iovec { void *iov_base; // starting address size_t iov_len; // number of bytes to transfer };

Marshalling/Unmarshalling (2) The ff_dnode class provides 3 methods that can be (have to be) overloaded: 2 prepare methods (1 for the sender and 1 for the receiver), and 1 unmarshall method only for the receiver sender-side: the prepare method is called by the run-time before sending data to the channel receiver-side: the unmarshall method is called before passing the received data to the svc()

Marshalling/Unmarshalling (3) ptr Object definition: struct mystring_t { int length; char* str; }; mystring_t* ptr; Memory layout: 12 str Hello world! prepare (top one) creates 2 iovec for the 2 parts of memory - those pointed by ptr and str unmarshall arranges things to have a single pointer to the object

How to play with it You have to install ZeroMQ Package distribution (.rpm,.deb,.) Or download the tarball and compile it You have to have installed the uuid-dev package The distributed version of FastFlow is not yet available on sourceforge It will be soon... sourceforge.net/projects/mc-fastflow/ Ask me or to Prof. Marco Danelutto to have the tarball with the examples