Arm's role in co-design for the next generation of HPC platforms

Size: px

Start display at page:

Download "Arm's role in co-design for the next generation of HPC platforms"

Harry Ross
5 years ago
Views:

1 Arm's role in co-design for the next generation of HPC platforms Arm HPC workshop at SJTU July 2018 Filippo Spiga Software and Large Scale Systems

2 Outline Arm and High Performance Computing Introducing SVE Simulation Tools: gem5 and DynamoRIO+ARMIE SVE Ecosystem Full Application Enablement 2

Arm Technology Already Connects the World Arm is Ubiquitous 21 billion chips sold by partners in 2017 alone Mobile/Embedded/IoT/ Automotive/Server/GPUs Partnership is key We design IP,

3 Arm Technology Already Connects the World Arm is Ubiquitous 21 billion chips sold by partners in 2017 alone Mobile/Embedded/IoT/ Automotive/Server/GPUs Partnership is key We design IP, not manufacture chips Partners build products for their target markets Arm is Ubiquitous One size is not always the best fit for all HPC is a great fit for codesign and collaboration 4

4 The Arm Business Model Deploy energy-efficient Arm-based technology, wherever computing happens Leading in wearables and the Internet of Things ~85% share of laptops, tablets, and smartphones Driving the transformation of the network and data center to an Intelligent Flexible Cloud Enabling innovation and creativity with embedded intelligence Taking mobile computing to the next four billion people Partnering to deliver data center efficiency 6

5 Arm Cortex Family Cortex - A Highest performance Optimised for rich operating systems Cortex - R Fast response Optimised for high performance, hard real-time applications Cortex - M Smallest/lowest power Optimised for embedded intelligence and microcontrollers 8

6 Arm application processors are everywhere Cortex-A CPUs cover a wide variety of markets Scale efficiently to substantially higher performance Fit even more compute in a smaller footprint with less power Mobile and Consumer Automotive Servers and networking IoT and Embedded 9

7 Introducing Arm Research Mission Partner to accelerate innovation and transfer research knowledge across Arm and the Arm Ecosystem Objectives Build a pipeline to create and bring future technology into the Arm Ecosystem Create and maintain the emerging technology landscape Enable innovative Academic research through collaboration and partnership Strategic Research Agendas Research Programs Technology Landscapes Collaboration Programs 10

8 Arm Research to End User Research ARM IP SOC OEM Service 5-10 years lead time 12

9 Arm Research Groups Architectures Systems and Memory Design Methodology Software and Large Scale Systems New Materials, Devices and Circuits IoT and Energy-Constrained Systems (+ Security) 14

10 Arm High Performance Computing strategy Enablement Address gaps in computational capability and data movement within Architecture Seed the software ecosystem with open source support for Armv8 and SVE libraries, tools, and optimized workloads Provide world class tools for compilation, analysis, and debug at large scale. Mission Enable the world s first Arm supercomputer(s) Strategy Enablement + Co-Design + Partnership Strategy: Enablement + Co-Design + Partnership Building Blocks Co-Design Work with key end-customers in DoE, DoD, RIKEN, and EU to design balanced architecture, uarchitecture and SoCs based on real-world workloads, not benchmarks. Develop simulation and modelling tools to support co-design development with end-customers, partners, and academia. Partnership Work with Architecture partners to quickly bring optimized solutions to market. Work with ATG & uarchitecture design teams to steer future designs to be more relevant for HPC, HPDA, and ML Work with key ISVs to enable mid-market 17

11 What it is Co-design? Abstract: Preparations for Exascale computing have led to the realization that future computing environments will be significantly different from those that provide Petascale capabilities. This change is driven by energy constraints, which is compelling architects to design systems that will require a significant re-thinking of how algorithms are developed and implemented. Co-design has been proposed as a methodology for scientific application, software and hardware communities to work together. This chapter gives an overview of co-design and discusses the early application of this methodology to HighPerformance Computing. [ ] The co-design strategy is based on developing partnerships with computer vendors and application scientists and engaging them in a highly collaborative and iterative design process well before a given system is available for commercial use. The process is built around identifying leading edge, high-impact scientific applications and providing concrete optimization targets rather than focusing on speeds andfeeds (FLOPs andbandwidth) andpercent ofpeak. On the Role of Co-design in High Performance Computing, Transition of HPC Towards Exascale Computing (2013) 18

12 Co-design with Arm Analysis of applications to devise the most efficient solutions Applications System Software & Tools Platform Hardware (the implementation ) Architecture Opportunities to exploit 19

13 History of Arm in HPC 2011 Calxada 32-bit ARMv7-A Cortex A AMD Opteron A bit ARMv8-A Cortex A Cores 2017 Cavium ThunderX 2 64-bit ARMv8-A 32 Cores Mont-Blanc 1 32-bit ARMv7-A Cortex A15 First Arm HPC cluster 2015 Cavium ThunderX 64-bit ARMv8-A 48 Cores 20

14 Introducing SVE

15 Performance Portability 1 st source of frustration for Applications Developers Start thinking massively parallel easy to say, hard in practice! Exploit massive parallelism without over-specifying the details Programmer may be able to provide hints to a compiler/runtime but how good in self-discover and self-tuning? Exploit heterogeneous compute elements is hard MPI+X still good enough? Will we ever move beyond it? One programming language to rule them all? Handling efficient data sharing across components The CPU has everything we need, can we (Arm) make CPU architecture better? 30

16 Evolution of Arm SIMD architectures Single Instruction Multiple Data = data level parallelism, many operations per instruction Armv6 SIMD bit integer/core register file Integer only 2x16-bit fixed-point data elements Armv7-A Advanced SIMD (aka Arm NEON) bit SIMD register file, supporting well-conditioned memory data layouts Non-IEEE single-precision floating-point and 8/16/32-bit fixed-point data elements Armv8-A AArch64 Advanced SIMD was an evolutionary step b SIMD register file, identical memory data layouts Full IEEE double-precision floating-point and 64-bit fixed-point data elements But new markets for AArch64 (HPC) have demanded more radical features Ability to vectorize irregular code and more complex data structures Longer vectors to extract more data-level parallelism per cycle? 31

17 Scalable Vector Extension (SVE) SVE does not mandate a single, fixed vector length Vector Length (VL) is hardware implementation choice of 128 to 2048 bits Vector Length Agnostic (VLA) programming paradigm made possible by the per-lane predication, predicate-driven loop control, vector partitioning and software-managed speculation SVE is not a simple extension of AArch64 Advanced SIMD A separate, optional architectural extension with a new set of instruction encodings (ARMv8.3) Initial focus is HPC and general-purpose server software, not media/image processing SVE begins to address traditional barriers to auto-vectorization Compilers often cannot vectorize due to intra-vector control and data dependencies Software-managed speculative vectorization allows more loops to be vectorized by a compiler 32

18 Introducing the Scalable Vector Extension (SVE) A vector extension to the ARMv8-A architecture with some major new features: Gather-load and scatter-store Loads a single register from several non-contiguous memory locations. + pred = Per-lane predication Operations work on individual lanes under control of a predicate register. for (i = 0; i < n; ++i) INDEX i n-2 n-1 n n+1 CMPLT n Predicate-driven loop control and management Eliminate scalar loop heads and tails by processing partial vectors. 33

19 Introducing the Scalable Vector Extension (SVE) A vector extension to the ARMv8-A architecture with some major new features: + pred Vector partitioning and software-managed speculation First Faulting Load instructions allow memory accesses to cross into invalid pages = = = = Extended floating-point horizontal reductions In-order and tree-based reductions trade-off performance and repeatability. 34

20 Example:daxpy (scalar) void daxpy(double *x, double *y, double a, int n) { for (int i = 0; i < n; i++) { y[i] = a * x[i] + y[i]; } } // x0 = &x[0] // x1 = &y[0] // x2 = &a // x3 = &n daxpy_: ldrsw x3, [x3] mov x4, #0 ldr d0, [x2] b.latch.loop: ldr d1, [x0, x4, lsl #3] ldr d2, [x1, x4, lsl #3] fmadd d2, d1, d0, d2 str d2, [x1, x4, lsl #3] add x4, x4, #1.latch: cmp x4, x3 b.lt.loop ret 36

21 daxpy (SVE) daxpy (scalar) daxpy_:.loop:.latch: ldrsw x3, [x3] mov x4, #0 whilelt p0.d, x4, x3 ld1rd z0.d, p0/z, [x2] ld1d z1.d, p0/z, [x0, x4, lsl #3] ld1d z2.d, p0/z, [x1, x4, lsl #3] fmla z2.d, p0/m, z1.d, z0.d st1d z2.d, p0, [x1, x4, lsl #3] incd x4 whilelt b.first ret p0.d, x4, x3.loop daxpy_:.loop:.latch: ldrsw x3, [x3] mov x4, #0 ldr d0, [x2] b.latch ldr d1, [x0, x4, lsl #3] ldr d2, [x1, x4, lsl #3] fmadd d2, d1, d0, d2 str d2, [x1, x4, lsl #3] add x4, x4, #1 cmp b.lt ret x4, x3.loop Q1: How do we handle the non-multiples of VL? Q2: What happens at different vector lengths? check bonus slides 37

22 Vector Architecture Design Trade-offs SLC SLC MC MC Example of Cache Coherent Vector Microarchitecture Cores with one or more SVE and Load/Store (LS) unit(s) NoC Private L1 and L2 caches (considered part of the core) System-level cache (SLC) shared among cores L2 Memory controllers (MC) L1 Network on chip (NoC) interconnects cores, SLCs and MCs SVE LS Core Core Disclaimer: Logical representation, not representative of a physical implementation 38

23 Simulation Tools: gem5 and DynamoRIO+ARMIE

gem5 gem5: open-source sophisticated processor and

Analyze workloads that customers use and care about

Comprehensive model library Memory and I/O devices

not a microarchitectural model out of the box!

x) Android Nougat New ideas can be tested quickly

24 gem5 gem5: open-source sophisticated processor and computer system simulator Runs real workloads Analyze workloads that customers use and care about including complex workloads such as Android Comprehensive model library Memory and I/O devices Full OS, Web browsers Rapid early prototyping But not a microarchitectural model out of the box! Ubuntu (Linux 4.x) Android Nougat New ideas can be tested quickly System-level impact can be quantified System-level insights Enables us to study complex memory-system 44

25 When Not to Use gem5 Performance validation gem5 is not a cycle-accurate microarchitecture model for any specific product This typically requires more accurate models such as RTL simulation. Commercial products such as Arm CycleModels operate in this space. Core microarchitecture exploration Only do this if you have a custom, detailed, CPU model! gem5 s core models were not designed to replace more accurate microarchitectural models. To validate functional correctness or test bleeding-edge ISA improvements gem5 is not as rigorously tested as commercial products. New (ARMv8.0+) or optional instructions are sometimes not implemented Commercial products such as Arm FastModels offer better reliability in this space. 45

26 Arm Research Enablement Kit System Modeling Using gem5 System-on-chip (SoC) systems are complex and expensive to prototype Simulation is a cost-effective way to evaluate new ideas gem5 models various system components, such as different CPU models, interconnection topologies, memory subsystems, etc. gem5 also models the interactions among system components Arm Research Enablement Kit is a guide through Arm-based system modeling using the gem5 simulator and a 64-bit processor model based on Armv8-A. 46

27 SVE Support in gem5 Commitment from Arm Research to open-source full support for SVE in gem5, with the goal of enabling research around SVE (and vector architectures in general) Main contributors: Giacomo Gabrielli Rekai Gonzalez Alberquilla Javier Setoain Gabor Dosza Still technically a work-in-progress, but most of the functionality is in place! 47

Where Do I Get It From? > git clone https://gem5.googlesource.

28 Where Do I Get It From? > git clone > git checkout sve/beta1 sve/beta1 is the latest development branch the plan is to merge all changes onto master after external review process Check README_SVE.md for up-to-date list of known issues and limitations 48

supports code transformations on any part of a program, while it executes.

29 DynamoRIO Open-source BSD project Support for AArch32, AArch64 (added by Arm Research), IA-32, and x86-64 Enable runtime code manipulation system that supports code transformations on any part of a program, while it executes. Provides utilities for memory address tracing Opportunity for custom instrumentations via clients 49

30 DynamoRIO + ArmIE SVE DynamoRIO does not offer (yet!) natively SVE support Arm Instruction Emulator (ArmIE) is used to execute and instrument any SVE instructions Memory mapped shared counter ensures output correctness for traces 3 Clients currently supported: Memory Tracing Instruction Tracing Instruction Count Individual traces can be merged and analyze via additional post-processing scripts DynamoRIO ArmIE SVE binary SVE instrumentation AArch64 instrumentation Shared Counter 51

31 DynamoRIO + ArmIE SVE Code intrinsics must be added to the applications to enable instrumentation These define the region-of-interest to evaluate DynamoRIO ArmIE SVE binary Shared Counter #define START_TRACE() { } #define STOP_TRACE() { } Full compatibility with single-threaded apps Multi-thread support currently under work SVE instrumentation AArch64 instrumentation 52

32 DynamoRIO + ARMIE for SVE Tracing Run ARMIE within DR Introduces noise to the instrumentation Avoid tracing ARMIE s memory accesses ARMIE library memory space is different from the application s memory space DR will only generate traces inside the application s memory region Memory mapped shared counter Atomically accessed by ARMIE and DR Post-process script merges traces based on the sequence number freedom of do your own analysis! ARM Executable LDR STR ADD SUB SVE LOAD SVE ADD LDR SVE STR DynamoRIO ARMIE Non-SVE 0x8000 0x x8004 0x x801C 0x x8000 0x x8004 0x x8014 0x x8014 0x x801C 0x x8020 0x x8020 0x40008 S SVE 0x8014 0x x8014 0x x8020 0x x8020 0x40008 S Merged trace Shared Counter 53

33 SVE Ecosystem

34 SVE Programming How to get started Computational Scientists, Domain Experts Assembly (not suggested) Full ISA Specification: The Scalable Vector Extension for ARMv8-A Lots of worked examples, see A sneak peek into SVE and VLA programming Intrinsics (only if truly needed) Arm C Language Extensions for SVE Arm Scalable Vector Extensions and application to Machine Learning Auto-vectorization via GCC, Arm Compiler for HPC, Cray, Fujitsu Compiler Hints to the compiler via OpenMP Best practice writing code #pragma omp parallel for simd 55

35 SVE Tools How to get started Performance Specialists, Computer Architects Arm Compiler Arm Instruction Emulator Arm v8-a Binary Arm v8-a Binary with new features Arm Instruction Emulator Linux Arm v8-a 56 Research Enablement Kit

Commercial C/C++/Fortran compiler with best-in-class performance Compilers tuned for Scientific

Tuned for Scientific Computing, HPC and Enterprise workloads Processor-specific optimizations

Arm-optimized OpenMP runtime Linux user-space compiler with latest features C++ 14 and Fortran

5* Support for Armv8-A and SVE architecture extension Based on LLVM and Flang, leading

36 Commercial C/C++/Fortran compiler with best-in-class performance Compilers tuned for Scientific Computing and HPC Latest features and performance optimizations Commercially supported by Arm Tuned for Scientific Computing, HPC and Enterprise workloads Processor-specific optimizations for various server-class Arm-based platforms Optimal shared-memory parallelism using latest Arm-optimized OpenMP runtime Linux user-space compiler with latest features C++ 14 and Fortran 2003 language support with OpenMP 4.5* Support for Armv8-A and SVE architecture extension Based on LLVM and Flang, leading open-source compiler projects Commercially supported by Arm Available for a wide range of Arm-based platforms running leading Linux distributions RedHat, SUSE and Ubuntu 57

37 Arm Compiler Building on LLVM, Clang and Flang projects Arm C/C++/Fortran Compiler C/C++ Files (.c/.cpp) C/C++ Frontend Clang based Optimizer LLVM based ARMv8-A code-gen LLVM based ARMv8-A binary Fortran Files (.f/.f90) PGI Flang based Fortran Frontend LLVM IR IR Optimizations Auto-vectorization Enhanced optimization for ARMv8-A and SVE LLVM IR SVE code-gen LLVM based SVE binary Language specific frontend Language agnostic optimization Architecture specific backend 58

Status of GCC for SVE Making progress towards the aim of getting SVE support into GCC 8 GCC support was developed by a team in Arm and released publicly in Nov 2016 This year Linaro have been

38 Status of GCC for SVE Making progress towards the aim of getting SVE support into GCC 8 GCC support was developed by a team in Arm and released publicly in Nov 2016 This year Linaro have been updating and incorporating those changes into upstream GCC Nearly 60% of the patches have been committed and a further 30% have been accepted but can t be committed yet Aim is to get the rest into GCC 8, due for release in Q Source: Arm HPC Workshop Tokyo 2017 by Linaro

Features of GCC for SVE Summary of features and scope of changes Vector-length agnostic and vector-length specific code generation Auto-vectorisation, including: Fully-predicated loops Predicated

39 Features of GCC for SVE Summary of features and scope of changes Vector-length agnostic and vector-length specific code generation Auto-vectorisation, including: Fully-predicated loops Predicated structure loads and stores Gather loads & scatter stores Aliasing checks with variable strides Non-associative reductions (FADDA) No support for SVE intrinsics yet: aim is to add that to GCC 9 61 Source: Arm HPC Workshop Tokyo 2017 by Linaro

40 Arm Instruction Emulator Develop your user-space applications for future hardware today Run Linux user-space code that uses new hardware features (SVE) on current Arm hardware Simple black box command line tool $ armclang hello.c --march=armv8+sve $./a.out Illegal instruction $ armie msve-vector-bits=256./a.out Hello Arm v8-a Binary Arm v8-a Binary with new features Arm Instruction Emulator Converts unsupported instructions to native Armv8-A instructions Linux Arm v8-a 62

41 Evaluating SVE Compile Emulate Analyse Arm Compiler C/C++/Fortran code SVE via auto-vectorization, intrinsics and assembly. Compiler Insight: Compiler places results of compiletime decisions and analysis in the resulting binary. Arm Instruction Emulator Runs userspace binaries for future Arm architectures on today s systems. Supported instructions run unmodified. Unsupported instructions are trapped and emulated. Arm Code Advisor Console or web-based output shows prioritized advice in-line with original source code. 63

insights are embedded into the application binary Instrumented OpenMP runtime traces OpenMP events Extensible

42 Arm Code Advisor Combines static and dynamic information to produce actionable insights Performance Advice Compiler vectorization hints Compilation flags advice OpenMP instrumentation Insight from compilation and runtime Compiler insights are embedded into the application binary Instrumented OpenMP runtime traces OpenMP events Extensible Architecture User can write plugins to add their own analysis information Data accessible via web browser or command line 64

43 Arm Code Advisor Typical workflow 65

44 Arm Code Advisor Screenshot 66

Exploring SVE for scientific applications Objective: understand if apps cat benefit from SVE, assess quality and readiness of tools Various Arm-based SoC (Huawei Taishan) Several applications of

45 Exploring SVE for scientific applications Objective: understand if apps cat benefit from SVE, assess quality and readiness of tools Various Arm-based SoC (Huawei Taishan) Several applications of interest: QE, KKRnano, GRID, BQCD Results on MiniKKR show Arm-based SoC (no SVE) similar performance versus x86 Estimate performance using instruction/branch counting (dynamic) and critical path analysis (static) Early Experience with ARM SVE, presented at SC 17 Arm SVE User Meeting by D. Pleiter (JSC) Exploring SVE for scientific applications, presented at HiPEAC 18 by S. Nassyr (JSC) 67

46 Full Application Enablement

Arm Allinea Studio A quick glance at what is in Arm Allinea

3) C/C++ Compiler Fortran Compiler Performance Libraries

47 Arm Allinea Studio A quick glance at what is in Arm Allinea Studio (latest 18.3) C/C++ Compiler Fortran Compiler Performance Libraries Forge (DDT and MAP) Performance Reports C++ 14 support OpenMP 4.5 without offloading SVE ready Fortran 2003 support Partial Fortran 2008 support OpenMP 3.1 SVE ready Optimized math libraries BLAS, LAPACK and FFT Threaded parallelism with OpenMP Profile, Tune and Debug Scalable debugging with DDT Parallel Profiling with MAP Analyze your application Memory, MPI, Threads, I/O, CPU metrics Tuned by Arm for a wide-range of server-class Arm-based platforms 69

Optimized BLAS, LAPACK and FFT Commercially supported by Arm Best serial and parallel performance

FFTW compatible interface for FFT routines Batch BLAS support Best serial and parallel performance

collaboration with silicon vendors Validated and supported by Arm Engineers Available for a wide range

48 Optimized BLAS, LAPACK and FFT Commercially supported by Arm Best serial and parallel performance Commercial 64-bit Armv8-A math libraries Commonly used low-level math routines - BLAS, LAPACK and FFT FFTW compatible interface for FFT routines Batch BLAS support Best serial and parallel performance Generic Armv8-A optimizations by Arm Tuning for specific platforms like Cavium ThunderX2 in collaboration with silicon vendors Validated and supported by Arm Engineers Available for a wide range of server-class Arm-based platforms Validated with NAG s test suite, a de-facto standard Validated with NAG test suite 70

49 18.2 release vs FFTW FFT timings on TX2 1-d complex-complex double precision Higher means new Arm Performance Libraries FFT implementation better than FFTW Results show: speed-up over FFTW 18.1 GROMACS LAMMPS NAMD Quantum ESPRESSO CASTEP 18.2 speeds up over FFTW Average about 30% faster than FFTW Lots of this comes from better usage of NEON vectors on TX2 Cases where we are still slower are: Very small (no major issue) Powers of primes better FFTW better

50 Arm Performance Library 18.3 release $ armclang code_with_math_routines.c \ -L${ARMPL_LIBRARIES} lamath $ armflang code_with_math_routines.f \ -L${ARMPL_LIBRARIES} -lamath 72

Arm Forge An interoperable toolkit for debugging and profiling Commercially supported by Arm Fully Scalable The

supported by Arm on x86, IBM Power, Nvidia GPUs, etc.

memory debugging) Sampling-based profiler to identify and understand bottlenecks Available at any scale (from serial

51 Arm Forge An interoperable toolkit for debugging and profiling Commercially supported by Arm Fully Scalable The de-facto standard for HPC development Available on the vast majority of the Top500 machines in the world Fully supported by Arm on x86, IBM Power, Nvidia GPUs, etc. State-of-the art debugging and profiling capabilities Powerful and in-depth error detection mechanisms (including memory debugging) Sampling-based profiler to identify and understand bottlenecks Available at any scale (from serial to petaflopic applications) Easy to use by everyone Unique capabilities to simplify remote interactive sessions Innovative approach to present quintessential information to users Very user-friendly 73

52 Software Ecosystem HPC Applications Porting GROMACS NAMD LAMMPS CESM MrBayes Bowtie AMBER Paraview SIESTA VMD WRF Quantum ESPRESSO CP2k MILC GEANT4 OpenFOAM GAMESS VisIT DL_POLY NEMO BLAST NWCHEM Abinit BWA QMCPACK Build recipes online at 74

Arm HPC Ecosystem website House for Arm s HPC ecosystem, information channels, and collaboration Latest events, news, blogs, and collateral including webinars, whitepapers and

53 Arm HPC Ecosystem website House for Arm s HPC ecosystem, information channels, and collaboration Latest events, news, blogs, and collateral including webinars, whitepapers and presentations Links to HPC open-source & commercial software packages Recipes for porting HPC apps Curated and moderated by Arm New Arm HPC User Group Forum 75

Wiki https://gitlab.com/arm-hpc/packages/wikis/home Dynamic list of common HPC applications Provides focus for porting progress Community driven. Maintained by Arm, but anyone can join and contribute.

54 Wiki Dynamic list of common HPC applications Provides focus for porting progress Community driven. Maintained by Arm, but anyone can join and contribute. Allows developers to share recipes, and learn from progress on other applications Provides a mechanism for tracking status of applications and package sets (e.g. OpenHPC packages, Mantevo, etc.) Up-to-date summary of package status 76

55 17-19 September 2018 Robinson College, Cambridge, UK Architecture & Memory Biotechnology Flexible Electronics High Performance Computing Internet of Things Large Scale Systems Machine Learning Security & Servers Agenda available online, early bird expires July 31 st! arm.com/summit

56 Thank You! Danke! Merci!!! Gracias! Kiitos! 78 Thanks to Eric Van Hensbergen, Chris Goodyer, Miguel Tairum- Cruz, Alex Rico, Jose Joao, Giacomo Gabrielli, Olly Perks

57 The Arm trademarks featured in this presentation are registered trademarks or trademarks of Arm Limited (or its subsidiaries) in the US and/or elsewhere. All rights reserved. All other marks featured may be trademarks of their respective owners. 79

Arm's role in co-design for the next generation of HPC platforms

Arm's role in co-design for the next generation of HPC platforms Filippo Spiga Software and Large Scale Systems What it is Co-design? Abstract: Preparations for Exascale computing have led to the realization