Now What? Sean Parent, Principal Scientist. Adobe Systems Incorporated. All Rights Reserved.


Beauty

C++11 Standard: 1338 pages
C++98 Standard: 757 pages

Beauty
Nearly every addition to the language is intended to make it easier for developers to write beautiful code.
Does it succeed?

STL was intended to be an example of beautiful code.

Beauty

struct _LIBCPP_VISIBLE piecewise_construct_t { };
//constexpr
extern const piecewise_construct_t piecewise_construct;// = piecewise_construct_t();

template <class _T1, class _T2>
struct _LIBCPP_VISIBLE pair
{
    typedef _T1 first_type;
    typedef _T2 second_type;

    _T1 first;
    _T2 second;

    // pair(const pair&) = default;
    // pair(pair&&) = default;

    _LIBCPP_INLINE_VISIBILITY pair() : first(), second() {}

    _LIBCPP_INLINE_VISIBILITY pair(const _T1& x, const _T2& y)
        : first(x), second(y) {}

    template<class _U1, class _U2>
        _LIBCPP_INLINE_VISIBILITY
        pair(const pair<_U1, _U2>& p
#ifndef _LIBCPP_HAS_NO_ADVANCED_SFINAE
             ,typename enable_if<is_constructible<_T1, _U1>::value &&
                                 is_constructible<_T2, _U2>::value>::type* = 0
#endif
            )
            : first(p.first), second(p.second) {}

    _LIBCPP_INLINE_VISIBILITY
    pair(const pair& p)
        _NOEXCEPT_(is_nothrow_copy_constructible<first_type>::value &&

Beauty

    _T2&&
    get(pair<_T1, _T2>&& p) _NOEXCEPT {return _VSTD::forward<_T2>(p.second);}

#endif  // _LIBCPP_HAS_NO_RVALUE_REFERENCES
};

template <size_t _Ip, class _T1, class _T2>
_LIBCPP_INLINE_VISIBILITY inline
typename tuple_element<_Ip, pair<_T1, _T2> >::type&
get(pair<_T1, _T2>& p) _NOEXCEPT
{
    return get_pair<_Ip>::get(p);
}

template <size_t _Ip, class _T1, class _T2>
_LIBCPP_INLINE_VISIBILITY inline
const typename tuple_element<_Ip, pair<_T1, _T2> >::type&
get(const pair<_T1, _T2>& p) _NOEXCEPT
{
    return get_pair<_Ip>::get(p);
}

#ifndef _LIBCPP_HAS_NO_RVALUE_REFERENCES

template <size_t _Ip, class _T1, class _T2>
_LIBCPP_INLINE_VISIBILITY inline
typename tuple_element<_Ip, pair<_T1, _T2> >::type&&
get(pair<_T1, _T2>&& p) _NOEXCEPT
{
    return get_pair<_Ip>::get(_VSTD::move(p));
}

#endif  // _LIBCPP_HAS_NO_RVALUE_REFERENCES

#endif  // _LIBCPP_HAS_NO_VARIADICS

Complete std::pair: 372 lines
The compiler provided the copy and move constructors.

Beauty
The language is too large for anyone to master, so everyone lives within a subset.
Is there a beautiful subset?

Beauty
Is the library now intended to be primitive constructs?
How would I use pair or tuple to define a point or Euclidean vector class?
I still can't write:
    pair<int> x;
Unless I define it myself:
    template <typename T> using pair = pair<T, T>;
How much does language and library complexity matter?
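A minimal sketch of the alias from the slide above, assuming C++11. The namespace `geom` and the names `point` and `translate` are hypothetical and do not come from the talk; the alias is placed in its own namespace because it cannot be declared as `pair` next to `std::pair` itself.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

namespace geom {  // hypothetical namespace, not from the slides

// A pair whose two members share one type, so `pair<int>` becomes expressible.
template <typename T>
using pair = std::pair<T, T>;

// Using the alias as a 2D point still leaves the members named
// first/second, which is the slide's concern about primitive constructs.
using point = pair<int>;

inline point translate(point p, point delta) {
    return point(p.first + delta.first, p.second + delta.second);
}

}  // namespace geom
```

The alias saves typing, but `geom::point` is still exactly `std::pair<int, int>`: it carries no point-specific names or operations of its own.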

"We're getting an error that has something to do with rvalue references and std::pair."

Beauty

1>c:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\include\utility(163): error C2220: warning treated as error - no 'object' file generated
1>          c:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\include\utility(247) : see reference to function template instantiation 'std::_Pair_base<_Ty1,_Ty2>::_Pair_base<_Ty,int>(_Other1 &&,_Other2 &&)' being compiled
1>          with
1>          [
1>              _Ty1=std::_Tree_iterator<std::_Tree_val<std::_Tmap_traits<Mondo::num32,Mondo::CPhotoshopFormat *,std::less<Mondo::num32>,std::allocator<std::pair<const Mondo::num32,Mondo::CPhotoshopFormat *>>,false>>>,
1>              _Ty2=bool,
1>              _Ty=std::_Tree_iterator<std::_Tree_val<std::_Tmap_traits<Mondo::num32,Mondo::CPhotoshopFormat *,std::less<Mondo::num32>,std::allocator<std::pair<const Mondo::num32,Mondo::CPhotoshopFormat *>>,false>>>,
1>              _Other1=std::_Tree_iterator<std::_Tree_val<std::_Tmap_traits<Mondo::num32,Mondo::CPhotoshopFormat *,std::less<Mondo::num32>,std::allocator<std::pair<const Mondo::num32,Mondo::CPhotoshopFormat *>>,false>>>,

1>              _Traits=std::_Tmap_traits<Mondo::num32,Mondo::CPhotoshopFormat *,std::less<Mondo::num32>,std::allocator<std::pair<const Mondo::num32,Mondo::CPhotoshopFormat *>>,false>
1>          ]
1>          c:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\include\map(81) : see reference to class template instantiation 'std::_Tree<_Traits>' being compiled
1>          with
1>          [
1>              _Traits=std::_Tmap_traits<Mondo::num32,Mondo::CPhotoshopFormat *,std::less<Mondo::num32>,std::allocator<std::pair<const Mondo::num32,Mondo::CPhotoshopFormat *>>,false>
1>          ]
1>          c:\p4\m1710\khopps\dpxcode4\shared\mondo\source\photoshop\CPhotoshopFormat.h(35) : see reference to class template instantiation 'std::map<_Kty,_Ty>' being compiled
1>          with
1>          [
1>              _Kty=Mondo::num32,
1>              _Ty=Mondo::CPhotoshopFormat *
1>          ]
1>c:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\include\utility(163): warning C4800: 'int' : forcing value to bool 'true' or 'false' (performance warning)

Beauty
    template<class U, class V> pair(U&& x, V&& y);
For a pair<T, bool>, what happens if we pass an int to y?
Why would we pass an int?

Beauty
ADMStandardTypes.h: #define false 0
AGFConvertUTF.cpp: #define false 0
ASBasic.h: #define false 0
ASBasicTypes.h: #define false 0
ASNumTypes.h: #define false 0
ASTypes.h: #define false 0
basics.h: #define false ((Bool32) 0)
common.h: #define false 0
config_assert.h: #define false 0
ConvertUTF.cpp: #define false 0
CoreExpT.h: #define false 0
ICCUtils.h: #define false 0
isparameter.cpp: #define false 0
PITypes.h: #define false FALSE
piwinutl.h: #define false FALSE
PSSupportPITypes.h: #define false FALSE
stdbool.h: #define false false
t_9_017.cpp: #define false 0
WinUtilities.h: #define false FALSE
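The interaction between these macros and the forwarding constructor can be sketched roughly as follows, assuming C++11. `LEGACY_FALSE` and `make_entry` are hypothetical names. The sketch deliberately avoids redefining the keyword `false`, which a program that includes standard headers is not allowed to do.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for the legacy headers listed above: same
// expansion, different macro name.
#define LEGACY_FALSE 0

// The macro expands to an int literal, not a bool.
static_assert(std::is_same<decltype(LEGACY_FALSE), int>::value,
              "LEGACY_FALSE has type int");

// With pair(U&&, V&&), V is deduced as int here, so the int-to-bool
// conversion happens inside the library's constructor rather than at the
// call site. That is where a compiler can report a warning such as C4800.
inline std::pair<std::string, bool> make_entry(const std::string& key) {
    return std::pair<std::string, bool>(key, LEGACY_FALSE);
}
```

The stored value is still `false`; what changes is where the conversion occurs, which is why the diagnostic points into `<utility>` instead of at the user's code.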

<Placeholder> Insert your own beautiful code here.

Beauty
What we lack in beauty, we gain in efficiency?

Truth

Demo: Adobe Revel

Desktop Compute Power (8-core 3.5GHz Sandy Bridge + AMD Radeon 6950)
[Bar chart: compute power in GFlops, axis from 0 to 3000, with the technologies that reach each tier]
GPU: OpenGL, OpenCL, CUDA, Direct Compute, C++ AMP, DirectX
Vectorization: Intrinsics, Auto-vectorization, OpenCL
Multi-thread: TBB, GCD, OpenMP, C++11
Scalar: Straight C++

Two kinds of parallel
Functional
Data Parallel

Vectorization
SSE intrinsics: great speed potential, but...
[Diagram: 128-bit SSE register, bits 127 to 0]
    __m128i vdst = _mm_cvttps_epi32(_mm_mul_ps(_mm_cvtepi32_ps(vsum0), vinvarea));
Moving target: MMX, SSE, SSE2, SSE3, SSE 4.1, SSE 4.2, AVX, AVX2, AVX3
[Diagram: 256-bit AVX register, bits 255 to 0]
Solutions:
  Auto-vectorization
  #pragma SIMD
  CEAN: Dest[:] += src[start:length] + 2;
  OpenCL
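To give the intrinsic expression on the slide some context, here is a hedged sketch of a complete routine built around it. The function name `scale_sums`, its parameters, and the scalar fallback are assumptions made for illustration. Only the inner expression comes from the slide, and the surrounding code in the talk is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Multiplies each 32-bit sum by inv_area and truncates back to an integer.
inline void scale_sums(const std::int32_t* sums, float inv_area,
                       std::int32_t* dst, std::size_t n) {
    assert(sums != nullptr && dst != nullptr);
    std::size_t i = 0;
#if defined(__SSE2__)
    const __m128 vinvarea = _mm_set1_ps(inv_area);
    for (; i + 4 <= n; i += 4) {
        // Load four ints, convert to float, scale, truncate back to int.
        const __m128i vsum0 =
            _mm_loadu_si128(reinterpret_cast<const __m128i*>(sums + i));
        const __m128i vdst =
            _mm_cvttps_epi32(_mm_mul_ps(_mm_cvtepi32_ps(vsum0), vinvarea));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + i), vdst);
    }
#endif
    // Scalar tail, and the whole loop on targets without SSE2.
    for (; i < n; ++i) {
        dst[i] = static_cast<std::int32_t>(
            static_cast<float>(sums[i]) * inv_area);
    }
}
```

Even this small routine needs a preprocessor guard, unaligned loads and stores, and a scalar tail, and it would need another code path for each wider instruction set in the "moving target" list.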

Why Not Put Everything on the GPU?
Data parallel: 300 : 1
Sequential: 1 : 10

Truth
The typical object-oriented paradigm of using shared references to objects breaks down in a massively parallel environment.
Sharing implies either single-threaded execution or synchronization.

Amdahl's Law
[Chart: performance (1 to 16) versus number of processors (1 to 16)]
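The chart's curves did not survive transcription, so the following is a small sketch of the formula behind it rather than a reconstruction of the slide's data. The function name `amdahl_speedup` and its parameters are illustrative.

```cpp
#include <cassert>

// Amdahl's law: speedup = 1 / ((1 - p) + p / n), where p is the fraction of
// the work that can run in parallel and n is the number of processors.
inline double amdahl_speedup(double parallel_fraction, int processors) {
    assert(parallel_fraction >= 0.0 && parallel_fraction <= 1.0);
    assert(processors >= 1);
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / processors);
}
```

For example, a program that is 90% parallel reaches a speedup of only 6.4 on 16 processors, because the serial 10% dominates as the processor count grows.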

Truth
To utilize the hardware we need to move towards functional, declarative, reactive, and value semantic programming.
No raw loops.
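One way to read "no raw loops" is to express a computation through a named algorithm instead of hand-written iteration. The sketch below is an illustration of that idea, not code from the talk, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Raw loop: the intent (a sum of squares) is buried in index bookkeeping.
inline double sum_of_squares_loop(const std::vector<double>& v) {
    double total = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        total += v[i] * v[i];
    }
    return total;
}

// Algorithm call: std::inner_product of the range with itself states the
// operation directly and takes and returns values only.
inline double sum_of_squares(const std::vector<double>& v) {
    return std::inner_product(v.begin(), v.end(), v.begin(), 0.0);
}
```

Because the second form describes what is computed rather than how to iterate, a library is in a better position to substitute a vectorized or parallel implementation behind the same call.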

Truth
Without addressing vectorization, GPGPU, and scalable parallelism, standard C++ is just a scripting system to get to the other 99% of the machine through other languages and libraries.
Do we need such a complex scripting system?

Goodness

Content Ubiquity
Ubiquitous access to:
  calendar, contacts, notes & tasks, e-mail (corporate and personal)
  A full web experience
  Photos: Flickr, Facebook, Adobe Revel
  Music: iTunes Music Match, Spotify, Pandora
  Documents: Google Docs, Microsoft Office
  Movies: Netflix, Vudu
  Everything...

Content ubiquity is access to all your information, on all your devices, all of the time.

The Problem
Ubiquity has gone mainstream.
  A typical US household now has 3 TVs, 2 PCs, and 1 smartphone.
  1 in 3 households has an internet-connected TV.
  A typical US worker has access to a PC at work or is provided an e-mail solution for communication.
The deluge of digital information has become a challenge to manage.
  How do I get this contract to my phone?
  How do I get this video from my phone to my PC?
  Which computer has the latest version of this photo?
Content ubiquity has become the expectation.

The Technology is Here Now
3 Mbps broadband is available to 95% of the US population.
3 Mbps mobile broadband is available to 85%.
The US ranks 28th in broadband subscriptions per capita.
  Every other tier-one market is ahead of the US: France (12), Germany (19), UK (21), Japan (27).

No Excuses
Your data set is not too large.

No Excuses
Your application is not too interactive.


No Excuses
Current hardware is capable enough.
  Typical: 2x1GHz cores, 512MB RAM, 32GB SSD, GPU, 802.11n
  Revel runs the entire ACR image pipeline on an iPad 1 (half the above capabilities).

The Players

The Opportunity
Focus on content ubiquity:
  all your content, instantly, on any available device
  zero management overhead
Users don't want to care about "The Cloud"; users want their content.

The Challenge
Content ubiquity isn't a feature you can bolt on.
Dropbox and similar technologies that require management and synchronization aren't the solution.
Achieving a seamless experience requires rethinking:
  the data model, to support incremental changes
  transactional models, to support a dynamic mobile environment
  the editor model, to support partial editing (proxies, pyramid)
  the UI model, to support touch, small devices, and 10-foot interfaces

Content Ubiquity Opens the Door to Sharing and Collaboration
If you can make changes available to other devices immediately, then you can make changes available to other apps immediately (works with sandboxing technology).
If you can make documents available to all your devices, then you can make documents available to others, supporting both collaboration and sharing.

New Products and New Technologies
Start by putting yourself in today's customers' shoes.
Assume anything is possible.
Build it.
Invest in technology:
  peer-to-peer
  interactive streaming
  proxy and pyramidal editing
  transactional data structures

Developer Pain
The market is very fragmented: Windows, OS X, iOS, Android, Linux (for cloud service), browsers, Roku, ...
And it will become more so: Windows RT, ...

Developer Pain
Providing a solution requires you to write for multiple platforms.
And many vendors are focusing on proprietary technology to get to 99% of the machine.
C++ itself becomes a fragmented scripting system: Objective-C++, Managed C++.

Developer Pain
Vendor lock-in on commodity technologies only serves to slow development, including the incorporation of vendor-specific technology that provides user benefit.

Now What?
C++Next
Simplicity
Standardize access to modern hardware
