Is It Possible to Code an Efficient Software Switch?

Size: px
Start display at page:

Download "Is It Possible to Code an Efficient Software Switch?"

Transcription

1 Is It Possible to Code an Efficient Software Switch? Linearizing the Heap, and the Pervasive Use of Hardware Accelerators! Nick Mitchell along with Ioana Baldini, Peter Sweeney IBM T.J. Watson Research Center!! July 31, 2014

2 What Nick Does: Work with IBM customers and developers to make their applications run well the data herein is backed by thousands of real apps

3 Summer School Means Nick Learns At Least As Much From You, As You From Him

4 0. The Apps

5 Most of Our Data Comes from Somewhere Else Your Service Request My Service MongoDB Response Hadoop FPGA, GPU

6 Each Source Has its Own Wire Protocol Thrift Your Service Request HTTP BSON MongoDB My Service Response HTTP RPC Hadoop streaming! binary FPGA, GPU

7 Each Service Has its Operational Form Your Service Request My Service MongoDB Response Hadoop FPGA, GPU

8 Solutions Denied: (but worth learning from)! keep everything in memory code in a systems language

9 Summary Programmers are plumbers who write software switches ECOOP Languages offer poor support for efficient software switches, because of the distance between operational and wire forms Amdahl s Law strikes again, severely limiting the use of specialized computational circuitry What are our options for doing better?

10 1. Initial Thought Exercises

11 Implement an Efficient Map Entry HashMap a conventional chained hash map Key Value ~3 pointers per entry ~3-5 cache misses per GET copying required for RPC c.f. Entry Key Value

12 Implement This Without Copying Any Bits Into the Heap Entry HashMap HashMap Key Value Java Heap Entry Key Value Entry Key Value Entry Key Value update this value

13 Implement an Efficient String String 4 8 bytes of data bytes of non-data 8 33% efficiency characters

14 2. Programmers as Plumbers

15 For Example (a JAX-RS RESTful Response addtocart(@cookieparam( session ) Cookie item ) int quantity ) int quantity) {! ShoppingCart cart = new ShoppingCart(session); cart.add(itemcode, quantity); return Response.ok(cart.status()).cookie(cart.toCookie()).build(); }

16 Dispatch Response addtocart(@cookieparam( session ) Cookie item ) int quantity ) int quantity) {! ShoppingCart cart = new ShoppingCart(session); cart.add(itemcode, quantity); return Response.ok(cart.status()).cookie(cart.toCookie()).build(); }

17 Response addtocart(@cookieparam( session ) Cookie item ) int quantity ) int quantity) {! ShoppingCart cart = new ShoppingCart(session); cart.add(itemcode, quantity); return Response.ok(cart.status()).cookie(cart.toCookie()).build(); }

18 Deserialization into Application Response addtocart(@cookieparam( session ) Cookie item ) int quantity ) int quantity) {! ShoppingCart cart = new ShoppingCart(session); cart.add(itemcode, quantity); return Response.ok(cart.status()).cookie(cart.toCookie()).build(); }

19 Response addtocart(@cookieparam( session ) Cookie item ) int quantity ) int quantity) {! ShoppingCart cart = new ShoppingCart(session); cart.add(itemcode, quantity); return Response.ok(cart.status()).cookie(cart.toCookie()).build(); } (a lonely increment operation)

20 Serialize Application Data Structures Back into Response addtocart(@cookieparam( session ) Cookie item ) int quantity ) int quantity) {! ShoppingCart cart = new ShoppingCart(session); cart.add(itemcode, quantity); return Response.ok(cart.status()).cookie(cart.toCookie()).build(); }

21 Marshalling and Data Formats operational form wire form serialize transfer deserialize

22 3. The Profitability Threshold

23 Trade-offs At what point does externalizing a computation become more pain than it s worth? granularity of kernel cost of externalization Amdahl s Law accelerator speedup

24 Amdahl s Law Tuned Kernel Grows Smaller Offloading Computation Hurts Offloading Computation Helps Externalization More Expensive

25 Amdahl s Law make kernel 6.25% faster Externalization More Expensive free externalization, kernel is 100% of overall computation

26 Amdahl s Law make kernel 6.25% faster Externalization More Expensive externalization equivalent to 1% of overall computation, kernel is 100% of overall computation

27 Amdahl s Law make kernel 12.5% faster

28 Amdahl s Law make kernel 25% faster

29 Amdahl s Law make kernel 50% faster

30 Amdahl s Law make kernel 2x faster

31 Amdahl s Law make kernel 4x faster

32 Amdahl s Law make kernel 8x faster

33 Amdahl s Law make kernel 16x faster

34 Amdahl s Law make kernel 32x faster

35 4. The Costs of Externalization

36 Rough Measurements Object Churn Fractal Webs of Invocations 3000 temps to ingest one document 70 temps to turn a SOAP date (as bytes) into a Java Calendar 200k calls to ingest that document 2000 calls to turn an XML timecard into a Java data structure 6 temps to turn month into int (Sevitsky 2006)

37 Spending CPU Cycles 43% 32% 21% Serialization String Operations Intra-object Copying Application Logic Caching Encryption Reflection Data-driven Dispatching Connection Management

38 Three Kinds of Expenses 43% 32% 21% Serialization String Operations Intra-object Copying Optimizable Data Motion Reflection Data-driven Dispatching Connection Management Amortizable Application Logic Caching Encryption Vital

39 Does the Story Vary? All Data Sets 50% Optimizable Data Motion 31% Amortizable 19% Vital Analytics Apps Web Apps SPECjbb % 53% Vital SPECjEnterprise 49% Optimizable 34% Amortizable 13% trade 54% Amortizable BENCHMARKS

40 What the JIT Doesn t Catch (numbers relative to original w/o JIT optimizations) Copies Comparisons ALU Loads/Stores 0% 25% 50% 75% 100% original with JIT optimizations handtuned w/o JIT optimizations (Xu 2009)

41 Allocate-Use Separation Thread.run How much do we have to inline to have a chance of removing temps and copies?

42 Method Inlining % temps eliminated Date field base J9 inliner JOLT inliner Eclipse 0.4% 1.9% JPetStore on Spring 0.7% 2.5% TPCW on JBoss 0% 4.3% DaCapo 3.4% 13.3% Inlining is hard! Small benchmarks don t reflect difficulties (Shankar 2008)

43 Spending Memory COBOL vs Java 01 Student occurs Name PIC X(10) 02 Score PIC 999V99 class Student { String name; BigDecimal score; } Student[] data = new Student[40]; bytes 4492 bytes (Suganuma 2008)

44 Memory Overhead 0-10% 10-20% 20-30% 30-40% Overhead 40-50% 50-60% 60-70% 70-80% 80-90% % Number of Heap Snapshots

45 5. Alternative Marshalling Schemes

46 Options operational form operational form operational form optimize the translation wire form wire form transmit less wire form transmit less

47 Options operational form operational form operational form optimize the translation Google protobuf, Apache Thrift, Apache Avro, Scala Pickling???? wire form wire form wire form transmit less transmit less

48 protobuf, avro, thrift declaratively specify schema of data automatically generate marshallers struct Work { 1: i32 num1 = 0, 2: i32 num2, 3: Operation op, 4: optional string comment, } operational form optimize the translation wire form operational form wire form transmit less

49 Does it Matter? java-manual protostuff-manual protostuff kryo protobuf thrift-compact thrift avro hessian java-built-in scala/java-built-in relative time to serialize and deserialize an object (

50 Does it Matter? Java Kryo v1 Kryo v2 Scala Pickling Pickler Combinators Unsafe Pickler Combinators Time [ms] e+06 Number of Elements (Miller 2013)

51 Experiment: specjbb % Fraction of CPU Spent in Serialization 40% 30% 20% 10% original implementation change ~100 lines of code in the marshaller 0% Stack Sample Frequency (seconds)

52 6. Alternative Storage Schemes

53 Value Types and Structs struct Student { char[] name; int score; } pay the header cost once per record

54 Ref Poisoning struct Student { String name; int score; }. unless we use a reference type in which case, we re back to where we started

55 Arrays of Records struct Student { chart[] name; int score; } Student[] data = new Student[10]; pay the header cost once per array

56 Marshalling Arrays of Records struct Student { string name; decimal score; } Student[] data = new Student[10]; RPC pay the header cost once per array data[3].name.charat(5) operational form and wire form

57 c.f. Cobol both are fine until we need to store non-scalar data, i.e. variable-length data

58 Column Stores class Student { String name; BigDecimal score; } namestart nameend scorestart scoreend scorepool Student # 1 Student # 2 PROS maintains easily serializability allows for mix-and-match use of attributes Student # CONS added overhead (start and end pointers) unspeakably horrible code

59 7. Alternative Compilation Schemes and Optimizable Language Kernels

60 High-level Languages for Targeting FPGAs LegUp (U. Toronto) Kiwi (Microsoft Research) Bluespec (MIT) make it easier to write computational kernels Lime (IBM Research)

61 asm.js! C/C++ Javascript LLVM subset that is easier to optimize Browser JIT with asm.js support

62 RPython (c.f. Truffle)! Python Human codes interpreter for language X subset that is easier to optimize JIT for language X

63 Models vs Storage class Student { String name; BigDecimal score; } code is hard to maintain, or impossible to express in the language, but performance is great code is easy to maintain, but performance sucks

64 Lowering class Student { String name; BigDecimal score; } Can we lower from one to the other?

65 Lowering class Student { String name; BigDecimal score; } What is lowering other than a partial evaluation of serialization?

66 Partial Marshal class Student { String name; BigDecimal score; } transformer?? student.getname(); transformer?? class StudentTable { CharTable names; int[] namestart; int[] nameend; IntTable scores; int[] scorestart; int[] scoreend; } students.getname(i); names.splice(students.namestart[i], students.nameend([i]);

67 Partial Marshal class Student { String name; BigDecimal score; } data model class StudentTable CharTable names; int[] namestart; int[] nameend; IntTable scores; int[] scorestart; int[] scoreend; } student.getname(); data access students.getname(i); names.splice(students.namestart[i], students.nameend([i]);

68 Attic

69 All Data Sets 50% Optimizable Data Motion 31% Amortizable 19% Vital Batch A Batch B 77% 55% Amortizable Framework T Framework U Framework Z Framework Y Framework X Framework W 59% 43% Amortizable J2EE Provider A jboss J2EE Provider B Analytics D Analytics C Analytics A Analytics F Analytics E SPECjEnterprise tradesoap SPECjbb2013 tradebeans 49% Optimizable 46% 35% 11% 65% Vital

CSCI-1680 RPC and Data Representation. Rodrigo Fonseca

CSCI-1680 RPC and Data Representation. Rodrigo Fonseca CSCI-1680 RPC and Data Representation Rodrigo Fonseca Today Defining Protocols RPC IDL Problem Two programs want to communicate: must define the protocol We have seen many of these, across all layers E.g.,

More information

Building Memory-efficient Java Applications: Practices and Challenges

Building Memory-efficient Java Applications: Practices and Challenges Building Memory-efficient Java Applications: Practices and Challenges Nick Mitchell, Gary Sevitsky (presenting) IBM TJ Watson Research Center Hawthorne, NY USA Copyright is held by the author/owner(s).

More information

CSCI-1680 RPC and Data Representation John Jannotti

CSCI-1680 RPC and Data Representation John Jannotti CSCI-1680 RPC and Data Representation John Jannotti Original Slides from Rodrigo Fonseca Today Defining Protocols RPC IDL Problem Two programs want to communicate: must define the protocol We have seen

More information

Kampala August, Agner Fog

Kampala August, Agner Fog Advanced microprocessor optimization Kampala August, 2007 Agner Fog www.agner.org Agenda Intel and AMD microprocessors Out Of Order execution Branch prediction Platform, 32 or 64 bits Choice of compiler

More information

Protocol Buffers, grpc

Protocol Buffers, grpc Protocol Buffers, grpc Szolgáltatásorientált rendszerintegráció Service-Oriented System Integration Dr. Balázs Simon BME, IIT Outline Remote communication application level vs. transport level protocols

More information

The Evolution of Big Data Platforms and Data Science

The Evolution of Big Data Platforms and Data Science IBM Analytics The Evolution of Big Data Platforms and Data Science ECC Conference 2016 Brandon MacKenzie June 13, 2016 2016 IBM Corporation Hello, I m Brandon MacKenzie. I work at IBM. Data Science - Offering

More information

John Davies LJP Tech. Understand your data

John Davies LJP Tech. Understand your data John Davies LJP Tech Understand your data Understand your data John Davies CTO 11 th October 2016 @jtdavies It worked so well someone bought it - yesterday Copyright 2016 LJP Technologies Ltd. 3 Why do

More information

Overview. Communication types and role of Middleware Remote Procedure Call (RPC) Message Oriented Communication Multicasting 2/36

Overview. Communication types and role of Middleware Remote Procedure Call (RPC) Message Oriented Communication Multicasting 2/36 Communication address calls class client communication declarations implementations interface java language littleendian machine message method multicast network object operations parameters passing procedure

More information

Apache Thrift Introduction & Tutorial

Apache Thrift Introduction & Tutorial Apache Thrift Introduction & Tutorial Marlon Pierce, Suresh Marru Q & A TIOBE Index Programming Language polyglotism Modern distributed applications are rarely composed of modules written in a single language.

More information

Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters. Hiroshi Inoue and Toshio Nakatani IBM Research - Tokyo

Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters. Hiroshi Inoue and Toshio Nakatani IBM Research - Tokyo Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters Hiroshi Inoue and Toshio Nakatani IBM Research - Tokyo June 15, 2012 ISMM 2012 at Beijing, China Motivation

More information

Chapter 5. Names, Bindings, and Scopes

Chapter 5. Names, Bindings, and Scopes Chapter 5 Names, Bindings, and Scopes Chapter 5 Topics Introduction Names Variables The Concept of Binding Scope Scope and Lifetime Referencing Environments Named Constants 1-2 Introduction Imperative

More information

Software Analysis. Asymptotic Performance Analysis

Software Analysis. Asymptotic Performance Analysis Software Analysis Performance Analysis Presenter: Jonathan Aldrich Performance Analysis Software Analysis 1 Asymptotic Performance Analysis How do we compare algorithm performance? Abstract away low-level

More information

Lecture 9 Dynamic Compilation

Lecture 9 Dynamic Compilation Lecture 9 Dynamic Compilation I. Motivation & Background II. Overview III. Compilation Policy IV. Partial Method Compilation V. Partial Dead Code Elimination VI. Escape Analysis VII. Results Partial Method

More information

Trace-based JIT Compilation

Trace-based JIT Compilation Trace-based JIT Compilation Hiroshi Inoue, IBM Research - Tokyo 1 Trace JIT vs. Method JIT https://twitter.com/yukihiro_matz/status/533775624486133762 2 Background: Trace-based Compilation Using a Trace,

More information

LIQUID METAL Taming Heterogeneity

LIQUID METAL Taming Heterogeneity LIQUID METAL Taming Heterogeneity Stephen Fink IBM Research! IBM Research Liquid Metal Team (IBM T. J. Watson Research Center) Josh Auerbach Perry Cheng 2 David Bacon Stephen Fink Ioana Baldini Rodric

More information

CSCI-1680 RPC and Data Representation. Rodrigo Fonseca

CSCI-1680 RPC and Data Representation. Rodrigo Fonseca CSCI-1680 RPC and Data Representation Rodrigo Fonseca Administrivia TCP: talk to the TAs if you still have questions! ursday: HW3 out Final Project (out 4/21) Implement a WebSockets server an efficient

More information

A Trace-based Java JIT Compiler Retrofitted from a Method-based Compiler

A Trace-based Java JIT Compiler Retrofitted from a Method-based Compiler A Trace-based Java JIT Compiler Retrofitted from a Method-based Compiler Hiroshi Inoue, Hiroshige Hayashizaki, Peng Wu and Toshio Nakatani IBM Research Tokyo IBM Research T.J. Watson Research Center April

More information

RISC Architecture Ch 12

RISC Architecture Ch 12 RISC Architecture Ch 12 Some History Instruction Usage Characteristics Large Register Files Register Allocation Optimization RISC vs. CISC 18 Original Ideas Behind CISC (Complex Instruction Set Comp.)

More information

Runtime Defenses against Memory Corruption

Runtime Defenses against Memory Corruption CS 380S Runtime Defenses against Memory Corruption Vitaly Shmatikov slide 1 Reading Assignment Cowan et al. Buffer overflows: Attacks and defenses for the vulnerability of the decade (DISCEX 2000). Avijit,

More information

JVA-563. Developing RESTful Services in Java

JVA-563. Developing RESTful Services in Java JVA-563. Developing RESTful Services in Java Version 2.0.1 This course shows experienced Java programmers how to build RESTful web services using the Java API for RESTful Web Services, or JAX-RS. We develop

More information

Spark 2. Alexey Zinovyev, Java/BigData Trainer in EPAM

Spark 2. Alexey Zinovyev, Java/BigData Trainer in EPAM Spark 2 Alexey Zinovyev, Java/BigData Trainer in EPAM With IT since 2007 With Java since 2009 With Hadoop since 2012 With EPAM since 2015 About Secret Word from EPAM itsubbotnik Big Data Training 3 Contacts

More information

ECE C61 Computer Architecture Lecture 2 performance. Prof. Alok N. Choudhary.

ECE C61 Computer Architecture Lecture 2 performance. Prof. Alok N. Choudhary. ECE C61 Computer Architecture Lecture 2 performance Prof Alok N Choudhary choudhar@ecenorthwesternedu 2-1 Today s s Lecture Performance Concepts Response Time Throughput Performance Evaluation Benchmarks

More information

CpE 442 Introduction to Computer Architecture. The Role of Performance

CpE 442 Introduction to Computer Architecture. The Role of Performance CpE 442 Introduction to Computer Architecture The Role of Performance Instructor: H. H. Ammar CpE442 Lec2.1 Overview of Today s Lecture: The Role of Performance Review from Last Lecture Definition and

More information

Netezza The Analytics Appliance

Netezza The Analytics Appliance Software 2011 Netezza The Analytics Appliance Michael Eden Information Management Brand Executive Central & Eastern Europe Vilnius 18 October 2011 Information Management 2011IBM Corporation Thought for

More information

Workload Optimization on Hybrid Architectures

Workload Optimization on Hybrid Architectures IBM OCR project Workload Optimization on Hybrid Architectures IBM T.J. Watson Research Center May 4, 2011 Chiron & Achilles 2003 IBM Corporation IBM Research Goal Parallelism with hundreds and thousands

More information

Computer Organization & Assembly Language Programming

Computer Organization & Assembly Language Programming Computer Organization & Assembly Language Programming CSE 2312 Lecture 11 Introduction of Assembly Language 1 Assembly Language Translation The Assembly Language layer is implemented by translation rather

More information

COS 140: Foundations of Computer Science

COS 140: Foundations of Computer Science COS 140: Foundations of Computer Science Variables and Primitive Data Types Fall 2017 Introduction 3 What is a variable?......................................................... 3 Variable attributes..........................................................

More information

Programming Web Services in Java

Programming Web Services in Java Programming Web Services in Java Description Audience This course teaches students how to program Web Services in Java, including using SOAP, WSDL and UDDI. Developers and other people interested in learning

More information

Operating Systems. 18. Remote Procedure Calls. Paul Krzyzanowski. Rutgers University. Spring /20/ Paul Krzyzanowski

Operating Systems. 18. Remote Procedure Calls. Paul Krzyzanowski. Rutgers University. Spring /20/ Paul Krzyzanowski Operating Systems 18. Remote Procedure Calls Paul Krzyzanowski Rutgers University Spring 2015 4/20/2015 2014-2015 Paul Krzyzanowski 1 Remote Procedure Calls 2 Problems with the sockets API The sockets

More information

Hash-Based Indexes. Chapter 11

Hash-Based Indexes. Chapter 11 Hash-Based Indexes Chapter 11 1 Introduction : Hash-based Indexes Best for equality selections. Cannot support range searches. Static and dynamic hashing techniques exist: Trade-offs similar to ISAM vs.

More information

COS 140: Foundations of Computer Science

COS 140: Foundations of Computer Science COS 140: Foundations of Variables and Primitive Data Types Fall 2017 Copyright c 2002 2017 UMaine School of Computing and Information S 1 / 29 Homework Reading: Chapter 16 Homework: Exercises at end of

More information

Memory Management: The Details

Memory Management: The Details Lecture 10 Memory Management: The Details Sizing Up Memory Primitive Data Types Complex Data Types byte: char: short: basic value (8 bits) 1 byte 2 bytes Pointer: platform dependent 4 bytes on 32 bit machine

More information

CS 417 9/18/17. Paul Krzyzanowski 1. Socket-based communication. Distributed Systems 03. Remote Procedure Calls. Sample SMTP Interaction

CS 417 9/18/17. Paul Krzyzanowski 1. Socket-based communication. Distributed Systems 03. Remote Procedure Calls. Sample SMTP Interaction Socket-based communication Distributed Systems 03. Remote Procedure Calls Socket API: all we get from the to access the network Socket = distinct end-to-end communication channels Read/write model Line-oriented,

More information

Darek Mihocka, Emulators.com Stanislav Shwartsman, Intel Corp. June

Darek Mihocka, Emulators.com Stanislav Shwartsman, Intel Corp. June Darek Mihocka, Emulators.com Stanislav Shwartsman, Intel Corp. June 21 2008 Agenda Introduction Gemulator Bochs Proposed ISA Extensions Conclusions and Future Work Q & A Jun-21-2008 AMAS-BT 2008 2 Introduction

More information

Distributed Systems. 03. Remote Procedure Calls. Paul Krzyzanowski. Rutgers University. Fall 2017

Distributed Systems. 03. Remote Procedure Calls. Paul Krzyzanowski. Rutgers University. Fall 2017 Distributed Systems 03. Remote Procedure Calls Paul Krzyzanowski Rutgers University Fall 2017 1 Socket-based communication Socket API: all we get from the OS to access the network Socket = distinct end-to-end

More information

CS Programming In C

CS Programming In C CS 24000 - Programming In C Week 16: Review Zhiyuan Li Department of Computer Science Purdue University, USA This has been quite a journey Congratulation to all students who have worked hard in honest

More information

C exam. IBM C IBM WebSphere Application Server Developer Tools V8.5 with Liberty Profile. Version: 1.

C exam.   IBM C IBM WebSphere Application Server Developer Tools V8.5 with Liberty Profile. Version: 1. C9510-319.exam Number: C9510-319 Passing Score: 800 Time Limit: 120 min File Version: 1.0 IBM C9510-319 IBM WebSphere Application Server Developer Tools V8.5 with Liberty Profile Version: 1.0 Exam A QUESTION

More information

Go with the Flow: Profiling Copies to Find Run-time Bloat

Go with the Flow: Profiling Copies to Find Run-time Bloat Go with the Flow: Profiling Copies to Find Run-time Bloat Guoqing Xu, Matthew Arnold, Nick Mitchell, Atanas Rountev, Gary Sevitsky Presented by Wei Fang February 10, 2015 Bloat Example A commercial document

More information

Soot A Java Bytecode Optimization Framework. Sable Research Group School of Computer Science McGill University

Soot A Java Bytecode Optimization Framework. Sable Research Group School of Computer Science McGill University Soot A Java Bytecode Optimization Framework Sable Research Group School of Computer Science McGill University Goal Provide a Java framework for optimizing and annotating bytecode provide a set of API s

More information

LogCA: A High-Level Performance Model for Hardware Accelerators

LogCA: A High-Level Performance Model for Hardware Accelerators Everything should be made as simple as possible, but not simpler Albert Einstein LogCA: A High-Level Performance Model for Hardware Accelerators Muhammad Shoaib Bin Altaf* David A. Wood University of Wisconsin-Madison

More information

CS558 Programming Languages

CS558 Programming Languages CS558 Programming Languages Fall 2016 Lecture 4a Andrew Tolmach Portland State University 1994-2016 Pragmatics of Large Values Real machines are very efficient at handling word-size chunks of data (e.g.

More information

G Programming Languages Spring 2010 Lecture 6. Robert Grimm, New York University

G Programming Languages Spring 2010 Lecture 6. Robert Grimm, New York University G22.2110-001 Programming Languages Spring 2010 Lecture 6 Robert Grimm, New York University 1 Review Last week Function Languages Lambda Calculus SCHEME review 2 Outline Promises, promises, promises Types,

More information

Pragmatic Clustering. Mike Cannon-Brookes CEO, Atlassian Software Systems

Pragmatic Clustering. Mike Cannon-Brookes CEO, Atlassian Software Systems Pragmatic Clustering Mike Cannon-Brookes CEO, Atlassian Software Systems 1 Confluence Largest enterprise wiki in the world 2000 customers in 60 countries J2EE application, ~500k LOC Hibernate, Lucene,

More information

Computer Principles and Components 1

Computer Principles and Components 1 Computer Principles and Components 1 Course Map This module provides an overview of the hardware and software environment being used throughout the course. Introduction Computer Principles and Components

More information

Java On Steroids: Sun s High-Performance Java Implementation. History

Java On Steroids: Sun s High-Performance Java Implementation. History Java On Steroids: Sun s High-Performance Java Implementation Urs Hölzle Lars Bak Steffen Grarup Robert Griesemer Srdjan Mitrovic Sun Microsystems History First Java implementations: interpreters compact

More information

C++\CLI. Jim Fawcett CSE687-OnLine Object Oriented Design Summer 2017

C++\CLI. Jim Fawcett CSE687-OnLine Object Oriented Design Summer 2017 C++\CLI Jim Fawcett CSE687-OnLine Object Oriented Design Summer 2017 Comparison of Object Models Standard C++ Object Model All objects share a rich memory model: Static, stack, and heap Rich object life-time

More information

STARCOUNTER. Technical Overview

STARCOUNTER. Technical Overview STARCOUNTER Technical Overview Summary 3 Introduction 4 Scope 5 Audience 5 Prerequisite Knowledge 5 Virtual Machine Database Management System 6 Weaver 7 Shared Memory 8 Atomicity 8 Consistency 9 Isolation

More information

Unit 2 : Computer and Operating System Structure

Unit 2 : Computer and Operating System Structure Unit 2 : Computer and Operating System Structure Lesson 1 : Interrupts and I/O Structure 1.1. Learning Objectives On completion of this lesson you will know : what interrupt is the causes of occurring

More information

There are 16 total numbered pages, 7 Questions. You have 2 hours. Budget your time carefully!

There are 16 total numbered pages, 7 Questions. You have 2 hours. Budget your time carefully! UNIVERSITY OF TORONTO FACULTY OF APPLIED SCIENCE AND ENGINEERING MIDTERM EXAMINATION, October 27, 2014 ECE454H1 Computer Systems Programming Closed book, Closed note No programmable electronics allowed

More information

Comprehensive Final Exam for Capacity Planning (CIS 4930/6930) Fall 2001 >>> SOLUTIONS <<<

Comprehensive Final Exam for Capacity Planning (CIS 4930/6930) Fall 2001 >>> SOLUTIONS <<< Comprehensive Final Exam for Capacity Planning (CIS 4930/6930) Fall 001 >>> SOLUTIONS

More information

Efficient Runtime Tracking of Allocation Sites in Java

Efficient Runtime Tracking of Allocation Sites in Java Efficient Runtime Tracking of Allocation Sites in Java Rei Odaira, Kazunori Ogata, Kiyokuni Kawachiya, Tamiya Onodera, Toshio Nakatani IBM Research - Tokyo Why Do You Need Allocation Site Information?

More information

Building up to today. Remote Procedure Calls. Reminder about last time. Threads - impl

Building up to today. Remote Procedure Calls. Reminder about last time. Threads - impl Remote Procedure Calls Carnegie Mellon University 15-440 Distributed Systems Building up to today 2x ago: Abstractions for communication example: TCP masks some of the pain of communicating across unreliable

More information

Application Development in JAVA. Data Types, Variable, Comments & Operators. Part I: Core Java (J2SE) Getting Started

Application Development in JAVA. Data Types, Variable, Comments & Operators. Part I: Core Java (J2SE) Getting Started Application Development in JAVA Duration Lecture: Specialization x Hours Core Java (J2SE) & Advance Java (J2EE) Detailed Module Part I: Core Java (J2SE) Getting Started What is Java all about? Features

More information

/ / JAVA TRAINING

/ / JAVA TRAINING www.tekclasses.com +91-8970005497/+91-7411642061 info@tekclasses.com / contact@tekclasses.com JAVA TRAINING If you are looking for JAVA Training, then Tek Classes is the right place to get the knowledge.

More information

QUIZ on Ch.5. Why is it sometimes not a good idea to place the private part of the interface in a header file?

QUIZ on Ch.5. Why is it sometimes not a good idea to place the private part of the interface in a header file? QUIZ on Ch.5 Why is it sometimes not a good idea to place the private part of the interface in a header file? Example projects where we don t want the implementation visible to the client programmer: The

More information

A new Mono GC. Paolo Molaro October 25, 2006

A new Mono GC. Paolo Molaro October 25, 2006 A new Mono GC Paolo Molaro lupus@novell.com October 25, 2006 Current GC: why Boehm Ported to the major architectures and systems Featurefull Very easy to integrate Handles managed pointers in unmanaged

More information

CS558 Programming Languages Winter 2018 Lecture 4a. Andrew Tolmach Portland State University

CS558 Programming Languages Winter 2018 Lecture 4a. Andrew Tolmach Portland State University CS558 Programming Languages Winter 2018 Lecture 4a Andrew Tolmach Portland State University 1994-2018 Pragmatics of Large Values Real machines are very efficient at handling word-size chunks of data (e.g.

More information

OpenACC Course. Office Hour #2 Q&A

OpenACC Course. Office Hour #2 Q&A OpenACC Course Office Hour #2 Q&A Q1: How many threads does each GPU core have? A: GPU cores execute arithmetic instructions. Each core can execute one single precision floating point instruction per cycle

More information

LegUp: Accelerating Memcached on Cloud FPGAs

LegUp: Accelerating Memcached on Cloud FPGAs 0 LegUp: Accelerating Memcached on Cloud FPGAs Xilinx Developer Forum December 10, 2018 Andrew Canis & Ruolong Lian LegUp Computing Inc. 1 COMPUTE IS BECOMING SPECIALIZED 1 GPU Nvidia graphics cards are

More information

Chapter 4 Communication

Chapter 4 Communication DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN Chapter 4 Communication Layered Protocols (1) Figure 4-1. Layers, interfaces, and protocols in the OSI

More information

CS 31: Intro to Systems Pointers and Memory. Kevin Webb Swarthmore College October 2, 2018

CS 31: Intro to Systems Pointers and Memory. Kevin Webb Swarthmore College October 2, 2018 CS 31: Intro to Systems Pointers and Memory Kevin Webb Swarthmore College October 2, 2018 Overview How to reference the location of a variable in memory Where variables are placed in memory How to make

More information

Building Robust Embedded Software

Building Robust Embedded Software Building Robust Embedded Software by Lars Bak, OOVM A/S Demands of the Embedded Industy Increased reliability Low cost -> resource constraints Dynamic software updates in the field Real-time capabilities

More information

Building loosely coupled and scalable systems using Event-Driven Architecture. Jonas Bonér Patrik Nordwall Andreas Källberg

Building loosely coupled and scalable systems using Event-Driven Architecture. Jonas Bonér Patrik Nordwall Andreas Källberg Building loosely coupled and scalable systems using Event-Driven Architecture Jonas Bonér Patrik Nordwall Andreas Källberg Why is EDA Important for Scalability? What building blocks does EDA consists of?

More information

19 File Structure, Disk Scheduling

19 File Structure, Disk Scheduling 88 19 File Structure, Disk Scheduling Readings for this topic: Silberschatz et al., Chapters 10 11. File: a named collection of bytes stored on disk. From the OS standpoint, the file consists of a bunch

More information

Vertical Profiling: Understanding the Behavior of Object-Oriented Applications

Vertical Profiling: Understanding the Behavior of Object-Oriented Applications Vertical Profiling: Understanding the Behavior of Object-Oriented Applications Matthias Hauswirth, Amer Diwan University of Colorado at Boulder Peter F. Sweeney, Michael Hind IBM Thomas J. Watson Research

More information

EITF20: Computer Architecture Part2.1.1: Instruction Set Architecture

EITF20: Computer Architecture Part2.1.1: Instruction Set Architecture EITF20: Computer Architecture Part2.1.1: Instruction Set Architecture Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Instruction Set Principles The Role of Compilers MIPS 2 Main Content Computer

More information

Hardware Acceleration for Tagging

Hardware Acceleration for Tagging Parallel Hardware Parallel Applications IT industry (Silicon Valley) Parallel Software Users Hardware Acceleration for Tagging Sarah Bird, David McGrogan, John Kubiatowicz, Krste Asanovic June 5, 2008

More information

Projects and Apache Thrift. August 30 th 2018 Suresh Marru, Marlon Pierce

Projects and Apache Thrift. August 30 th 2018 Suresh Marru, Marlon Pierce Projects and Apache Thrift August 30 th 2018 Suresh Marru, Marlon Pierce smarru@iu.edu, marpierc@iu.edu Todays Outline Final Project Proposals Apache Thrift Overview Open Discussion Project Life Cycle

More information

Introduction. L25: Modern Compiler Design

Introduction. L25: Modern Compiler Design Introduction L25: Modern Compiler Design Course Aims Understand the performance characteristics of modern processors Be familiar with strategies for optimising dynamic dispatch for languages like JavaScript

More information

Google Protocol Buffers for Embedded IoT

Google Protocol Buffers for Embedded IoT Google Protocol Buffers for Embedded IoT Integration in a medical device project Quick look 1: Overview What is it and why is useful Peers and alternatives Wire format and language syntax Libraries for

More information

Kasper Lund, Software engineer at Google. Crankshaft. Turbocharging the next generation of web applications

Kasper Lund, Software engineer at Google. Crankshaft. Turbocharging the next generation of web applications Kasper Lund, Software engineer at Google Crankshaft Turbocharging the next generation of web applications Overview Why did we introduce Crankshaft? Deciding when and what to optimize Type feedback and

More information

Integrate MATLAB Analytics into Enterprise Applications

Integrate MATLAB Analytics into Enterprise Applications Integrate Analytics into Enterprise Applications Lyamine Hedjazi 2015 The MathWorks, Inc. 1 Data Analytics Workflow Preprocessing Data Business Systems Build Algorithms Smart Connected Systems Take Decisions

More information

416 Distributed Systems. RPC Day 2 Jan 11, 2017

416 Distributed Systems. RPC Day 2 Jan 11, 2017 416 Distributed Systems RPC Day 2 Jan 11, 2017 1 Last class Finish networks review Fate sharing End-to-end principle UDP versus TCP; blocking sockets IP thin waist, smart end-hosts, dumb (stateless) network

More information

Sista: Improving Cog s JIT performance. Clément Béra

Sista: Improving Cog s JIT performance. Clément Béra Sista: Improving Cog s JIT performance Clément Béra Main people involved in Sista Eliot Miranda Over 30 years experience in Smalltalk VM Clément Béra 2 years engineer in the Pharo team Phd student starting

More information

Java without the Coffee Breaks: A Nonintrusive Multiprocessor Garbage Collector

Java without the Coffee Breaks: A Nonintrusive Multiprocessor Garbage Collector Java without the Coffee Breaks: A Nonintrusive Multiprocessor Garbage Collector David F. Bacon IBM T.J. Watson Research Center Joint work with C.R. Attanasio, Han Lee, V.T. Rajan, and Steve Smith ACM Conference

More information

Field Analysis. Last time Exploit encapsulation to improve memory system performance

Field Analysis. Last time Exploit encapsulation to improve memory system performance Field Analysis Last time Exploit encapsulation to improve memory system performance This time Exploit encapsulation to simplify analysis Two uses of field analysis Escape analysis Object inlining April

More information

Modern Processor Architectures. L25: Modern Compiler Design

Modern Processor Architectures. L25: Modern Compiler Design Modern Processor Architectures L25: Modern Compiler Design The 1960s - 1970s Instructions took multiple cycles Only one instruction in flight at once Optimisation meant minimising the number of instructions

More information

Virtual Memory. Kevin Webb Swarthmore College March 8, 2018

Virtual Memory. Kevin Webb Swarthmore College March 8, 2018 irtual Memory Kevin Webb Swarthmore College March 8, 2018 Today s Goals Describe the mechanisms behind address translation. Analyze the performance of address translation alternatives. Explore page replacement

More information

Towards Ruby3x3 Performance

Towards Ruby3x3 Performance Towards Ruby3x3 Performance Introducing RTL and MJIT Vladimir Makarov Red Hat September 21, 2017 Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 1 / 30 About Myself Red Hat, Toronto

More information

Attila Szegedi, Software

Attila Szegedi, Software Attila Szegedi, Software Engineer @asz Everything I ever learned about JVM performance tuning @twitter Everything More than I ever wanted to learned about JVM performance tuning @twitter Memory tuning

More information

Profiling & Optimization

Profiling & Optimization Lecture 11 Sources of Game Performance Issues? 2 Avoid Premature Optimization Novice developers rely on ad hoc optimization Make private data public Force function inlining Decrease code modularity removes

More information

Webservices In Java Tutorial For Beginners Using Netbeans Pdf

Webservices In Java Tutorial For Beginners Using Netbeans Pdf Webservices In Java Tutorial For Beginners Using Netbeans Pdf Java (using Annotations, etc.). Part of way) (1/2). 1- Download Netbeans IDE for Java EE from here: 2- Follow the tutorial for creating a web

More information

Agenda Process Concept Process Scheduling Operations on Processes Interprocess Communication 3.2

Agenda Process Concept Process Scheduling Operations on Processes Interprocess Communication 3.2 Lecture 3: Processes Agenda Process Concept Process Scheduling Operations on Processes Interprocess Communication 3.2 Process in General 3.3 Process Concept Process is an active program in execution; process

More information

Operating System. Chapter 4. Threads. Lynn Choi School of Electrical Engineering

Operating System. Chapter 4. Threads. Lynn Choi School of Electrical Engineering Operating System Chapter 4. Threads Lynn Choi School of Electrical Engineering Process Characteristics Resource ownership Includes a virtual address space (process image) Ownership of resources including

More information

Caching and Buffering in HDF5

Caching and Buffering in HDF5 Caching and Buffering in HDF5 September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 1 Software stack Life cycle: What happens to data when it is transferred from application buffer to HDF5 file and from HDF5

More information

High Performance Computing and Programming, Lecture 3

High Performance Computing and Programming, Lecture 3 High Performance Computing and Programming, Lecture 3 Memory usage and some other things Ali Dorostkar Division of Scientific Computing, Department of Information Technology, Uppsala University, Sweden

More information

CS 241 Honors Memory

CS 241 Honors Memory CS 241 Honors Memory Ben Kurtovic Atul Sandur Bhuvan Venkatesh Brian Zhou Kevin Hong University of Illinois Urbana Champaign February 20, 2018 CS 241 Course Staff (UIUC) Memory February 20, 2018 1 / 35

More information

WAY MORE THAN YOU EVER WANTED TO KNOW ABOUT C++ TEMPLATES AND C++ TEMPLATE METAPROGRAMMING DYLAN KNUTSON, UCSD 13-17

WAY MORE THAN YOU EVER WANTED TO KNOW ABOUT C++ TEMPLATES AND C++ TEMPLATE METAPROGRAMMING DYLAN KNUTSON, UCSD 13-17 WAY MORE THAN YOU EVER WANTED TO KNOW ABOUT C++ TEMPLATES AND C++ TEMPLATE METAPROGRAMMING DYLAN KNUTSON, UCSD 13-17 Creative Commons Attribution + Noncommercial WHO AM I Facebook Seattle s Core Systems/Infra

More information

C 1. Recap: Finger Table. CSE 486/586 Distributed Systems Remote Procedure Call. Chord: Node Joins and Leaves. Recall? Socket API

C 1. Recap: Finger Table. CSE 486/586 Distributed Systems Remote Procedure Call. Chord: Node Joins and Leaves. Recall? Socket API Recap: Finger Table Finding a using fingers CSE 486/586 Distributed Systems Remote Procedure Call Steve Ko Computer Sciences and Engineering University at Buffalo N102" 86 + 2 4! N86" 20 +

More information

"Web Age Speaks!" Webinar Series

Web Age Speaks! Webinar Series "Web Age Speaks!" Webinar Series Java EE Patterns Revisited WebAgeSolutions.com 1 Introduction Bibhas Bhattacharya CTO bibhas@webagesolutions.com Web Age Solutions Premier provider of Java & Java EE training

More information

MongoDB Web Architecture

MongoDB Web Architecture MongoDB Web Architecture MongoDB MongoDB is an open-source, NoSQL database that uses a JSON-like (BSON) document-oriented model. Data is stored in collections (rather than tables). - Uses dynamic schemas

More information

Rate-Limiting at Scale. SANS AppSec Las Vegas 2012 Nick

Rate-Limiting at Scale. SANS AppSec Las Vegas 2012 Nick Rate-Limiting at Scale SANS AppSec Las Vegas 2012 Nick Galbreath @ngalbreath nickg@etsy.com Who is Etsy? Marketplace for Small Creative Businesses Alexa says #51 for USA traffic > $500MM transaction volume

More information

CS 31: Introduction to Computer Systems. 03: Binary Arithmetic January 29

CS 31: Introduction to Computer Systems. 03: Binary Arithmetic January 29 CS 31: Introduction to Computer Systems 03: Binary Arithmetic January 29 WiCS! Swarthmore Women in Computer Science Slide 2 Today Binary Arithmetic Unsigned addition Subtraction Representation Signed magnitude

More information

Performance evaluation. Performance evaluation. CS/COE0447: Computer Organization. It s an everyday process

Performance evaluation. Performance evaluation. CS/COE0447: Computer Organization. It s an everyday process Performance evaluation It s an everyday process CS/COE0447: Computer Organization and Assembly Language Chapter 4 Sangyeun Cho Dept. of Computer Science When you buy food Same quantity, then you look at

More information

Overview. Prerequisites. Course Outline. Course Outline :: Apache Spark Development::

Overview. Prerequisites. Course Outline. Course Outline :: Apache Spark Development:: Title Duration : Apache Spark Development : 4 days Overview Spark is a fast and general cluster computing system for Big Data. It provides high-level APIs in Scala, Java, Python, and R, and an optimized

More information

CS506 Web Design & Development Final Term Solved MCQs with Reference

CS506 Web Design & Development Final Term Solved MCQs with Reference with Reference I am student in MCS (Virtual University of Pakistan). All the MCQs are solved by me. I followed the Moaaz pattern in Writing and Layout this document. Because many students are familiar

More information

Scalability, Performance & Caching

Scalability, Performance & Caching COMP 150-IDS: Internet Scale Distributed Systems (Spring 2015) Scalability, Performance & Caching Noah Mendelsohn Tufts University Email: noah@cs.tufts.edu Web: http://www.cs.tufts.edu/~noah Copyright

More information

CSCE 212: FINAL EXAM Spring 2009

CSCE 212: FINAL EXAM Spring 2009 CSCE 212: FINAL EXAM Spring 2009 Name (please print): Total points: /120 Instructions This is a CLOSED BOOK and CLOSED NOTES exam. However, you may use calculators, scratch paper, and the green MIPS reference

More information

Introduction to Parallel Programming

Introduction to Parallel Programming Introduction to Parallel Programming Linda Woodard CAC 19 May 2010 Introduction to Parallel Computing on Ranger 5/18/2010 www.cac.cornell.edu 1 y What is Parallel Programming? Using more than one processor

More information

Memory Management. Kevin Webb Swarthmore College February 27, 2018

Memory Management. Kevin Webb Swarthmore College February 27, 2018 Memory Management Kevin Webb Swarthmore College February 27, 2018 Today s Goals Shifting topics: different process resource memory Motivate virtual memory, including what it might look like without it

More information