
Functional Description of the Architecture of a Special Purpose Processor for Orders of Magnitude Reduction in Run Time in Computational Electromagnetics
Tayfun Özdemir, Virtual EM Inc., Ann Arbor, Michigan, USA
tayfun@virtualem.com
SPI 2013, Paris, France, May 13, 2013

Organization of the Talk
1. Definition of the Problem
2. Algorithm
3. Processor Architecture
   - Chip and the User Interface
   - Scalable Run Time
   - Processor Nodes and Mapping of the Algorithm
4. Manufacturing of the Chip

1. Definition of the Problem
Uses of Computational Electromagnetics (CEM):
- Signal integrity during design
- EMI related to board design and packaging
- RF circuits
- Antennas
Slow rate of adoption of CEM tools due to long run times:
- Sequential programming
- Use of general-purpose hardware
Recent progress:
- Massively parallel machines
- Graphics Processing Units (GPUs)
- Multi-core Central Processing Units (CPUs)
Giga Floating Point Operations Per Second (Gflops) per dollar is still too low; MDGRAPE delivers small multiples of Gflops/$ but is expensive.

What is Needed
An orders-of-magnitude increase in the Gflops/$ ratio: 100x or 1,000x.
What is the hold-up?
- Current algorithms require sequential programming
- General-purpose hardware is used
The problem with the current algorithms:
- Based on Galerkin's or functional formulations
- Result in a linear system of equations
- Sequential programming for matrix fill and solve
- Poor scaling on High Performance Computing (HPC) platforms
Gflops/$ of GPUs and multi-core CPUs is increasing, but sequential programming is holding back the scaling, and economy of scale for GPUs and multi-core CPUs alone will not suffice. A new paradigm is needed: new algorithms implemented in the form of hardware designed specifically for CEM (example: FFT chips).

2. Algorithm
Hardware and the algorithm are inseparable: the algorithm is implemented in the form of hardware. What needs to happen to realize this?
- A new numerical algorithm that can be implemented in hardware in a scalable fashion
- A special purpose processor built to implement that algorithm
Rule of thumb: electrical numerical science lags mechanical by 20 years, and the Computational Fluid Dynamics (CFD) colleagues have already done it:
- Abandoned the Navier-Stokes equations in favor of the Boltzmann equation
- Simulate the flow of a Newtonian fluid with collision models
- Simulate streaming and collision processes across a limited number of particles to realize viscous flow behavior across greater dimensions

Lattice Boltzmann Method (LBM)
- PowerFLOW software by Exa Corporation (Burlington, MA, USA)
- Fictitious particles perform consecutive propagation and collision processes over a discrete lattice
- Yields the Navier-Stokes equations in an asymptotic expansion
- The algorithm is highly scalable on HPC platforms (a minimal code sketch follows below)
(After "Lattice Boltzmann Methods for Fluid Dynamics" by Steven Orszag, Yale University)
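To make the locality argument concrete, here is a minimal, generic D2Q9 BGK lattice Boltzmann step in C. It is an illustrative sketch, not the PowerFLOW implementation: each lattice node relaxes its own particle populations toward a local equilibrium (collision) and then hands each population to one nearest neighbor (streaming), which is why the method maps so naturally onto parallel hardware.

```c
/* Minimal sketch: one BGK collide-and-stream step of a D2Q9 lattice
 * Boltzmann model.  Every node needs only its own data and its immediate
 * neighbors' data per time step. */
#include <stdio.h>

#define NX 64
#define NY 64
#define Q  9

static const int cx[Q]   = { 0, 1, 0,-1, 0, 1,-1,-1, 1 };
static const int cy[Q]   = { 0, 0, 1, 0,-1, 1, 1,-1,-1 };
static const double w[Q] = { 4.0/9,
                             1.0/9, 1.0/9, 1.0/9, 1.0/9,
                             1.0/36, 1.0/36, 1.0/36, 1.0/36 };

static double f[NX][NY][Q], ftmp[NX][NY][Q];

void lbm_step(double tau)
{
    /* Collision: purely local relaxation toward the equilibrium distribution. */
    for (int x = 0; x < NX; x++)
        for (int y = 0; y < NY; y++) {
            double rho = 0.0, ux = 0.0, uy = 0.0;
            for (int q = 0; q < Q; q++) {
                rho += f[x][y][q];
                ux  += f[x][y][q] * cx[q];
                uy  += f[x][y][q] * cy[q];
            }
            ux /= rho; uy /= rho;
            for (int q = 0; q < Q; q++) {
                double cu  = cx[q]*ux + cy[q]*uy;
                double feq = w[q]*rho*(1.0 + 3.0*cu + 4.5*cu*cu - 1.5*(ux*ux + uy*uy));
                ftmp[x][y][q] = f[x][y][q] - (f[x][y][q] - feq)/tau;
            }
        }

    /* Streaming: each population moves to the neighboring node it points at
     * (periodic boundaries here); only nearest-neighbor communication. */
    for (int x = 0; x < NX; x++)
        for (int y = 0; y < NY; y++)
            for (int q = 0; q < Q; q++)
                f[(x+cx[q]+NX)%NX][(y+cy[q]+NY)%NY][q] = ftmp[x][y][q];
}

int main(void)
{
    /* Start from a uniform fluid at rest (populations equal to the weights). */
    for (int x = 0; x < NX; x++)
        for (int y = 0; y < NY; y++)
            for (int q = 0; q < Q; q++)
                f[x][y][q] = w[q];
    for (int t = 0; t < 100; t++)
        lbm_step(0.6);
    printf("done\n");
    return 0;
}
```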

Conventional CFD vs. LBM
(Comparison figure; after "Lattice Boltzmann Methods for Fluid Dynamics" by Steven Orszag, Yale University)

New Algorithm for CEM
- We are not proposing to replace Maxwell's equations.
- We are not proposing to abandon current efforts to develop algorithms aimed at sequential programming; such efforts will continue to exist and are essential to improving algorithmic efficiency.
- Rather, we propose replacing the Galerkin and functional methods that are used today to directly discretize Maxwell's equations.
- Perhaps the CEM analogy to the LBM of CFD is a multi-pole expansion of the field?
- The new algorithm must be Maxwellian.
Two options:
1. A new scalable algorithm for existing HPC platforms (as the CFD community did)
2. A new special purpose processor with an accompanying new algorithm (what is presented here)

3. Processor Architecture
- Inspired by the Finite Element Method (FEM): an FEM chip
- Expandable to the Finite Difference Time Domain (FDTD) method and the Time Domain Finite Element Method (TDFEM)
- The Method of Moments (MoM) requires a bit more thought

FEM Chip and User Interface
(Block diagram: a PC hosting the GUI, solver engine, and API sends the geometry, excitation, and boundary conditions over a PCI board to the FEM chip; the chip returns the unknowns, i.e., the E- or H-field vector.)
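For illustration only, the stub program below mirrors that data flow from the host PC's point of view. The femchip_* names, signatures, and the stubbed solve are assumptions invented for this sketch; the slide specifies only what goes to the chip over the PCI board and what comes back.

```c
/* Hypothetical host-side usage sketch.  The femchip_* functions are
 * placeholders, not a real driver interface; the "chip" is faked with
 * stubs so the sketch compiles and runs. */
#include <stdio.h>
#include <stdlib.h>

typedef struct { size_t n_unknowns; } femchip_t;   /* opaque board handle */

static femchip_t *femchip_open(size_t n_unknowns)
{
    femchip_t *h = malloc(sizeof *h);
    if (h) h->n_unknowns = n_unknowns;
    return h;
}
static void femchip_load_geometry(femchip_t *h, const char *mesh_file)
{
    (void)h; printf("geometry loaded from %s\n", mesh_file);
}
static void femchip_set_boundary(femchip_t *h, int surface_id, int bc_type)
{
    (void)h; printf("boundary condition %d on surface %d\n", bc_type, surface_id);
}
static void femchip_set_excitation(femchip_t *h, double freq_hz)
{
    (void)h; printf("excitation at %.2e Hz\n", freq_hz);
}
static void femchip_solve(femchip_t *h, double *field)
{
    for (size_t i = 0; i < h->n_unknowns; i++)
        field[i] = 0.0;        /* a real chip would stream the solution back */
}

int main(void)
{
    femchip_t *chip = femchip_open(1000);
    double *e_field = calloc(1000, sizeof *e_field);
    if (!chip || !e_field) return 1;

    femchip_load_geometry(chip, "antenna.mesh");          /* geometry           */
    femchip_set_boundary(chip, 1, 0);                     /* boundary condition */
    femchip_set_excitation(chip, 10.0e9);                 /* excitation, 10 GHz */
    femchip_solve(chip, e_field);                         /* unknowns returned  */
    printf("first unknown: %f\n", e_field[0]);

    free(e_field);
    free(chip);
    return 0;
}
```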

Scalable Run Time ~ O(N)
(Timing diagram: one solver iteration per cycle of a 250 MHz clock: ITER 1, ITER 2, ITER 3, ..., ITER N.)
One clock cycle = (1/250) microseconds.
Solution time = N clock cycles = (N / 250) x 10^-6 seconds.
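As a quick numerical check of the formula as stated (the run-time table on the following slide may fold in additional factors), a problem with N = 10^8 unknowns would take:

$$T_{\text{solve}} = N \cdot \frac{1}{250\,\text{MHz}} = \frac{N}{250}\times 10^{-6}\ \text{s} = \frac{10^{8}}{250}\times 10^{-6}\ \text{s} = 0.4\ \text{s}$$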

Run Times
Type of Problem (*)                     N        Run Time
A resonant antenna                      ~10^3    1 msec
Five-wavelength-long RF circuit         ~10^5    10 msec
Small boat                              ~10^8    1 sec
F-16 aircraft (**)                      ~10^9    1 min
Large ship                              ~10^11   5 hrs
(*) One frequency point and/or one look angle at 10 GHz
(**) U.S. Air Force challenge since the 1970s

Processor Nodes & Scalable Algorithm
- Multi-pole representation of the EM field (ongoing research)
- Mapping of mesh nodes to computing nodes (ongoing research)
(Figure: mesh nodes P1-P6 mapped one-to-one onto a grid of computing nodes P1-P6.)

New Paradigm
- Nodes must perform simple computations
- Data sharing must be local
- Must converge in O(N) iterations, i.e., O(N) clock cycles
- At each iteration, P1 = a*P1 + b*P2 + c*P3 + d*P5 (ongoing research; a software sketch follows below)
(Figure: the computational unit for node P1 receives the coefficient/value pairs (b, P2), (c, P3), and (d, P5) from its neighbors, sends (b, P1), (c, P1), and (d, P1) back to them, holds P1 in local RAM, and updates on each CLOCK edge.)
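A minimal software emulation of that update rule, assuming a hypothetical six-node connectivity and made-up coefficients (the slide gives only the form of the update for P1): each node reads only its own value and its neighbors' values, so on the proposed chip every node could update simultaneously in one clock cycle, whereas the loop below visits them one at a time.

```c
/* Sketch of the per-node update P1 = a*P1 + b*P2 + c*P3 + d*P5, emulated
 * in software.  Connectivity and coefficients below are illustrative only. */
#include <stdio.h>

#define NODES    6
#define MAX_NBRS 4
#define ITERS    100

typedef struct {
    int    nbr[MAX_NBRS];    /* indices of neighboring nodes (-1 = unused) */
    double coef[MAX_NBRS];   /* coupling coefficients b, c, d, ...          */
    double self_coef;        /* coefficient a applied to the node itself    */
} node_t;

int main(void)
{
    /* Node 0 (P1) couples to P2, P3, and P5, matching the rule on the slide;
     * the remaining rows are placeholders. */
    node_t nodes[NODES] = {
        { {1, 2, 4, -1}, {0.2, 0.2, 0.2, 0.0}, 0.4 },   /* P1 */
        { {0, 3, -1, -1}, {0.3, 0.3, 0.0, 0.0}, 0.4 },  /* P2 */
        { {0, 3, -1, -1}, {0.3, 0.3, 0.0, 0.0}, 0.4 },  /* P3 */
        { {1, 2, 5, -1}, {0.2, 0.2, 0.2, 0.0}, 0.4 },   /* P4 */
        { {0, 5, -1, -1}, {0.3, 0.3, 0.0, 0.0}, 0.4 },  /* P5 */
        { {3, 4, -1, -1}, {0.3, 0.3, 0.0, 0.0}, 0.4 },  /* P6 */
    };
    double p[NODES]   = { 1.0, 0.0, 0.0, 0.0, 0.0, 0.0 };  /* initial state */
    double nxt[NODES];

    for (int t = 0; t < ITERS; t++) {            /* one pass = one clock tick */
        for (int i = 0; i < NODES; i++) {
            double acc = nodes[i].self_coef * p[i];
            for (int k = 0; k < MAX_NBRS; k++)
                if (nodes[i].nbr[k] >= 0)
                    acc += nodes[i].coef[k] * p[nodes[i].nbr[k]];
            nxt[i] = acc;                        /* only local data was read */
        }
        for (int i = 0; i < NODES; i++)
            p[i] = nxt[i];                       /* synchronous update       */
    }
    for (int i = 0; i < NODES; i++)
        printf("P%d = %f\n", i + 1, p[i]);
    return 0;
}
```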


4. Manufacturing of the Chip
Chip design & manufacturing:
- Functionality (ongoing research)
- HDL via Verilog (~$300K)
- Low-volume prototype chip ($20K/unit via Asian foundries)
PCI card design & manufacturing
Application Programming Interface (API)
PC system and benchmarking
High-volume production

Manufacturing Challenges
- The chip has to have as many nodes as the number of unknowns: 1K for a resonant antenna, but 100M for a small boat at 10 GHz.
- A 3D chip is not possible today: the interconnects between the nodes form a 3D lattice, but the chip has to be 2D with today's manufacturing technology.
- The parallel nature of the above paradigm (and therefore the run-time scaling with N) must be compromised:
  - Introduce a level of sequential computational steps
  - Subdivide the three-dimensional solution space into sections, each of which can be mapped to a two-dimensional grid (see the sketch after this list)
- A reasonably high Gflops/$ ratio could still be achieved.
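A toy sketch of that compromise, under the assumption that the 3D lattice of unknowns is cut into z-slabs (the grid sizes and the placeholder update are invented for illustration): each 2D slab could update fully in parallel on a 2D chip, while the slabs are visited one after another, so a solver iteration now costs a number of clock cycles proportional to the slab count instead of one.

```c
/* Illustrative only: process a 3D lattice as NZ sequential 2D slabs. */
#include <stdio.h>

#define NX 32
#define NY 32
#define NZ 32   /* NZ slabs visited one after another */

static double field[NZ][NY][NX];

/* Update one 2D slab; on a 2D chip all NX*NY nodes of the slab could
 * update concurrently in a single clock cycle. */
static void update_slab(int z)
{
    for (int y = 0; y < NY; y++)
        for (int x = 0; x < NX; x++)
            field[z][y][x] = 0.5 * field[z][y][x];   /* placeholder local update */
}

int main(void)
{
    for (int z = 0; z < NZ; z++)
        for (int y = 0; y < NY; y++)
            for (int x = 0; x < NX; x++)
                field[z][y][x] = 1.0;

    /* One solver "iteration" now costs NZ sequential slab passes. */
    for (int z = 0; z < NZ; z++)
        update_slab(z);

    printf("processed %d slabs of %d x %d nodes each\n", NZ, NX, NY);
    return 0;
}
```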

GFlops/$ Wars
Vendor / Platform                        Gflops per $1,000
Acceleware Corp: GPUs                    ?
Impulse Technologies: FPGAs              ?
Appro International: blade clusters      4 (*)
IBM: BlueGene/L                          0.1
Virtual EM: MDGRAPE machine              21
Proposed scheme (estimated)              >100
(*) 2008 numbers

Next Steps
1. Confirm that the proposed architecture provides
   a) an order-of-magnitude increase in Gflops/$
   b) O(N) scaling of run time
   via
   a) simulations
   b) limited prototyping using simple micro-controllers serving as simple computational nodes
2. Research on algorithms that are
   a) scalable
   b) implementable in hardware
3. Manufacturing of the processor
   a) 3D chip (not possible in the near future)
   b) 2D chip for 3D problems with compromised scaling (most likely)
   c) 2D chip for 2D problems: 2D problems and Body of Revolution (BoR) problems

Next Steps (continued)
4. Improve scaling of current algorithms on today's hardware:
   - GPUs
   - Multi-core CPUs
   - FPGAs
   - ARM-based micro-controllers