By: Veenus A V, Associate GM & Lead, NeST-NVIDIA Center for GPU Computing, Trivandrum, India. Office: NeST/SFO Technologies, San Jose, CA



2 Sri Buddha: Do not simply believe in anything because you have heard it, no matter even if I have told it! Believe only after you observe and analyze. Reference: Anguttara Nikaya, Vol 1,

3 Application architecture policies: scientific visualization; software blends with the platform; demands of modern users; proof of concept >> product.

4 We are showing a few technical experiments for your understanding. Not a PRODUCT demonstration!

6 Pre, Solver, Post

7 The data structures are plain to process: maybe a few arrays. An undergraduate can understand all of these in plain form. Graphics is not that vast: compared to a typical game, it is a simple deal. Na?!! The results are a bit serious. Users will adjust!

8 Let me explain our background before continuing.

9 It will reveal why we are proceeding this way!

12 We make specific software solutions for your scientific needs.

13 We specialize in engineering software development. NeST-NVIDIA Center for GPU Computing: a lab specifically for GPU-based technologies, inaugurated by Dr. Bill Dally, Chief Scientist, NVIDIA.

14 How to architect the software for your futuristic hardware and software: proof-of-concept to product. Not giving emphasis on: features of the applications; algorithms.

15 In focus: scientific data visualization (Pre, Solver, Post)

16 Shapes and geometry; Outer surfaces; Analysis model (boundary & other params); Volume (v or thd); Multi physics Solver; Display Frame (image); Results Volume (tensor)

17 Shapes and geometry; Outer surfaces; Analysis model (boundary & other params); Historical Experiment Results; Known model (expert system db, some cases; e.g. inverse modeling process); Volume (v or thd); Multi physics Solver; Display Frame (image); Results Volume (tensor)

18 Workstation PC

19 Shapes and geometry: 48.8 KB; Historical Experiment Results: 2.5 TB; Outer surfaces: 591 MB; Known model (expert system db, some cases; e.g. inverse modeling process): 5.6 GB; Volume (tetrahedron): 3.2 GB; PC Display Frame (1920 x 1080): 5.93 MB; Multi physics Solver; Tablet Display Frame (1080 X): 2.22 MB; Results Volume: GB x 10
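The 5.93 MB figure for the PC display frame can be reproduced with simple arithmetic. The sketch below assumes an uncompressed frame at 3 bytes per pixel (24-bit RGB) and binary megabytes; the slide states only the resolution and the size, so the pixel format and the helper name `frame_size_mib` are assumptions made for illustration.

```python
# Uncompressed frame size, assuming 3 bytes per pixel (24-bit RGB).
# The pixel format is an assumption; the slide gives only resolution and size.

BYTES_PER_PIXEL = 3      # assumed: 8-bit R, G, B, no alpha channel
MIB = 1024 * 1024        # binary megabyte


def frame_size_mib(width: int, height: int,
                   bytes_per_pixel: int = BYTES_PER_PIXEL) -> float:
    """Return the size of one uncompressed frame in MiB."""
    return width * height * bytes_per_pixel / MIB


pc_frame = frame_size_mib(1920, 1080)
print(f"PC display frame: {pc_frame:.2f} MiB")   # prints 5.93
```

The point of the comparison on the slide survives the assumption: what finally reaches the user is a few megabytes per frame, while the inputs are gigabytes to terabytes.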

20 Workstation PC

21 10 GB/s HDD CPU RAM 340 MB/s SATA DDR3 70~130 MB/s CPU Cores Motherboard Bus SSD 350~550 MB/s 12 GB/s PCI Express Interface Gigabit Ethernet Interface GDDR5 GPU Memory 5.3 GB/s (Global) Fast Local Network Internet GPU Cores 42 GB/s Global Memory Texture Memory GPU Cores GPU Cache (2D) Intranet User (Tablet) Remote User (Tablet or Browser) GPU Cores (Shared Memory)

22 Algorithms are good. Mathematics has been doing fine for centuries: Newton's laws and Maxwell's equations still hold good. Proofs of concept might be the best in the world!

23 The data structures are plain to process: maybe a few arrays. An undergraduate can understand all of these in plain form. Graphics is not that vast: compared to a typical game, it is a simple deal. Na?!! The results are a bit serious. Users will adjust!

24 A popular myth: PCI Express cannot give data to the monitor! PCI Express can give a good frame rate if your data is ready in CPU memory. There are a lot of points like this when you closely watch the platform facts. GPU for FLOPS only.
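A rough upper bound makes the PCI Express point concrete: divide the link bandwidth by the size of one frame. This is only a sketch under stated assumptions. The bandwidth figures are taken from the platform diagram on slide 21, and pairing each number with a component (HDD 70 MB/s, SSD 350 MB/s, PCI Express 12 GB/s) is an assumption, as is the 24-bit RGB frame format. Real frame rates will be lower because of latency, driver overhead and rendering cost.

```python
# Upper bound on frames per second if one uncompressed frame must cross a link.
# Link figures come from slide 21; which number belongs to which component
# is an assumption, and so is the 3-bytes-per-pixel frame format.

FRAME_BYTES = 1920 * 1080 * 3   # assumed 24-bit RGB, full HD

ASSUMED_LINKS = {               # bytes per second, lower end of each range
    "HDD": 70e6,
    "SSD": 350e6,
    "PCI Express": 12e9,
}


def max_frames_per_second(link_bytes_per_s: float, frame_bytes: int) -> float:
    """Bandwidth-only bound: ignores latency, overhead and render time."""
    return link_bytes_per_s / frame_bytes


bounds = {name: max_frames_per_second(bw, FRAME_BYTES)
          for name, bw in ASSUMED_LINKS.items()}

for name, fps in bounds.items():
    print(f"{name:12s} ~{fps:8.1f} frames/s at most")
```

Under these assumptions the PCI Express link allows on the order of 1,900 frames per second, while a frame streamed straight from the HDD manages about 11. That is the sense of "if your data is ready in CPU memory".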

27 Shapes and geometry: 48.8 KB; Historical Experiment Results: 2.5 TB; Outer surfaces: 591 MB; Known model (expert system db, some cases; e.g. inverse modeling process): 5.6 GB; Volume (tetrahedron): 3.2 GB; PC Display Frame (1920 x 1080): 5.93 MB; Multi physics Solver; Tablet Display Frame (1080 X): 2.22 MB; Results Volume: GB

29 GPU means more FLOPS/$ and more FLOPS/real estate. Use GLSL for graphics (SH 5.0 gives you freedom of mesh quality too!). CUDA syntax is simple; do data flow analysis for maximum throughput. But don't forget to juice your CPU too!

30 Do offline processing before the graphics viewer, even if it means letting your user have a coffee before starting the analysis! Extra data: mind HDD space and transfer rate. Spatially order the data; the viewer will seek like that. Processor wait means DELAY! Use 2D locality of reference. Make an LoD arrangement. Users want response, not always details!
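One common way to "spatially order" tiled data during offline processing is a Morton (Z-order) key, which interleaves the bits of the tile coordinates so that 2D neighbours tend to land close together on disk. The slide does not name a specific ordering, so Morton order here is a swapped-in illustration, and the function names are hypothetical.

```python
# Morton (Z-order) keys for 2D tiles: an assumed choice of spatial ordering,
# not necessarily the one used in the talk.

def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Morton key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)        # x bits go to even positions
        key |= ((y >> i) & 1) << (2 * i + 1)    # y bits go to odd positions
    return key


def spatially_ordered(tiles):
    """Sort (x, y) tile coordinates by Morton key."""
    return sorted(tiles, key=lambda t: interleave_bits(t[0], t[1]))


tiles = [(x, y) for y in range(4) for x in range(4)]   # row-major 4x4 grid
ordered = spatially_ordered(tiles)
print(ordered[:4])   # [(0, 0), (1, 0), (0, 1), (1, 1)]: one 2x2 block
```

In row-major order the four tiles of a 2x2 block are spread over two rows; after sorting they are contiguous, so a viewer panning in 2D seeks less.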

31 Maximum parallelism: keep the WARP full, threads > cores. Only compute for the device and screen. Higher resolution is not always needed; the user wants responsive software. The pixel shader is your time eater. Resolution of RT and GPU utilized for other compute: decide these based on real response metrics. Do 2D bicubic instead.
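"WARP full, threads > cores" translates into a launch configuration whose block size is a whole number of warps and whose grid covers every work item. The sketch below does that arithmetic on the host side. The warp size of 32 is the NVIDIA value; the default of 256 threads per block and the helper name `launch_config` are assumptions for illustration.

```python
# Host-side arithmetic for a kernel launch that keeps warps full.
# The 256-thread default is an assumed, commonly used starting point.

WARP_SIZE = 32   # threads per warp on NVIDIA GPUs


def launch_config(n_items: int, threads_per_block: int = 256):
    """Return (blocks, threads_per_block) covering n_items with whole warps."""
    if threads_per_block % WARP_SIZE != 0:
        # Round the block size up to a whole number of warps.
        threads_per_block = -(-threads_per_block // WARP_SIZE) * WARP_SIZE
    blocks = -(-n_items // threads_per_block)    # ceiling division
    return blocks, threads_per_block


blocks, tpb = launch_config(1920 * 1080)         # one thread per screen pixel
print(blocks, tpb)                               # 8100 256
```

Sizing the work to the screen (one thread per displayed pixel) rather than to the full data set follows the slide's "only compute for the device and screen".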

32 Texelize, texelize. Read-only data: knowing this gives freedom to the GPU cache. Use the asynchronous system to the maximum. The processor is not the only active component on the board! Use CUDA streams or texture switching.
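The idea behind "use the asynchronous system" is to start the next transfer before the current computation finishes. The sketch below is a CPU-side analogy in Python, not CUDA streams themselves: a single loader thread fetches chunk i+1 while the main thread processes chunk i. `load_chunk` and `process_chunk` are hypothetical stand-ins for a real transfer and a real kernel.

```python
# Double-buffering analogy: overlap "transfer" and "compute".
# This illustrates the pattern only; it does not use the CUDA API.
from concurrent.futures import ThreadPoolExecutor


def load_chunk(index: int) -> list:
    """Stand-in for a slow transfer (disk read or host-to-device copy)."""
    return [index * 10 + k for k in range(4)]


def process_chunk(chunk: list) -> int:
    """Stand-in for the compute step."""
    return sum(chunk)


def pipelined(n_chunks: int) -> list:
    """Process chunk i while chunk i+1 is already being loaded."""
    results = []
    if n_chunks == 0:
        return results
    with ThreadPoolExecutor(max_workers=1) as loader:
        pending = loader.submit(load_chunk, 0)
        for i in range(n_chunks):
            chunk = pending.result()                        # wait for current chunk
            if i + 1 < n_chunks:
                pending = loader.submit(load_chunk, i + 1)  # start next load now
            results.append(process_chunk(chunk))            # overlaps the load
    return results


print(pipelined(3))   # [6, 46, 86]
```

With CUDA the same shape is usually expressed with two or more streams and asynchronous copies, so the copy engine and the compute cores stay busy at the same time.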

33 It's the time of BYOD. Do watch software systems on specific platforms. For googling: Kepler grid, cloud gaming.

34 Volume viewer: voxelized data. Geometric editor: mesh can be perfect! Preparation for solver: inverse modeling with GPU (only platform work). Remote visualization for post processor.

36 Video

38 Volume resolution and dimensions: avoid empty spaces; bricking, compression. Quality graphics demanded: Phong, SM.
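Bricking and empty-space avoidance can be sketched as follows: cut the volume into fixed-size bricks and keep only the bricks that contain at least one voxel above a threshold. This is a minimal pure-Python illustration; the nested-list layout `volume[z][y][x]`, the threshold, and the function name `non_empty_bricks` are assumptions, and a real viewer would do this offline on packed arrays.

```python
# Keep only bricks that contain data; empty bricks need not be stored,
# transferred, or ray-cast. Layout and threshold are assumed for illustration.

def non_empty_bricks(volume, brick: int, threshold: float = 0.0):
    """Return origins (z, y, x) of bricks with any voxel above threshold."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    kept = []
    for z0 in range(0, depth, brick):
        for y0 in range(0, height, brick):
            for x0 in range(0, width, brick):
                occupied = any(
                    volume[z][y][x] > threshold
                    for z in range(z0, min(z0 + brick, depth))
                    for y in range(y0, min(y0 + brick, height))
                    for x in range(x0, min(x0 + brick, width))
                )
                if occupied:
                    kept.append((z0, y0, x0))
    return kept


# 4x4x4 volume that is empty except for one voxel in the far corner.
size = 4
volume = [[[0.0] * size for _ in range(size)] for _ in range(size)]
volume[3][3][3] = 1.0

bricks = non_empty_bricks(volume, brick=2)
print(bricks)   # [(2, 2, 2)]: one of eight bricks survives
```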

39 Video

41 Algorithm based on the Laplacian. The operations involved are as follows: select an ROI in the mesh on the screen; draw a sketch on the screen suggesting the edited region of the mesh; the model is reshaped to fit the curve while still retaining its shape.
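Laplacian-based editing rests on differential coordinates: each vertex is described relative to the average of its neighbours, and the edit tries to preserve those offsets while the sketched handle vertices move. The sketch below computes only these coordinates, with uniform weights. The slide does not say which weighting or solver was used, so the uniform weights, the tiny example mesh and the function name are assumptions.

```python
# Uniform Laplacian (differential) coordinates of a small mesh.
# delta_i = v_i - mean(v_j for j in neighbours of i)
# Weights and mesh are assumed; the full editing step (a least-squares solve
# that keeps the deltas while fitting the sketched curve) is not shown.

def laplacian_coordinates(vertices, neighbours):
    """Return one delta vector per vertex."""
    deltas = []
    for i, v in enumerate(vertices):
        ring = neighbours[i]
        mean = [sum(vertices[j][k] for j in ring) / len(ring)
                for k in range(len(v))]
        deltas.append(tuple(v[k] - mean[k] for k in range(len(v))))
    return deltas


# A flat square with its centre vertex lifted to form a small bump.
vertices = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 2.0, 0.0),
            (0.0, 2.0, 0.0), (1.0, 1.0, 1.0)]
neighbours = {0: [1, 3, 4], 1: [0, 2, 4], 2: [1, 3, 4],
              3: [0, 2, 4], 4: [0, 1, 2, 3]}

deltas = laplacian_coordinates(vertices, neighbours)
print(deltas[4])   # (0.0, 0.0, 1.0): the bump sits one unit above its ring
```

Because the deltas encode local detail rather than absolute position, reshaping the ROI to follow the sketch can move the surface while "still retaining the shape".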

42 Taking 2D edge tracking to 3D was a challenge. Used a modified form of classic CPU algorithms; doing it on the GPU was difficult. Created regular triangles on the fly to give a neat result: same area, so isosceles or equilateral.

43 To make the model, real-world data is used. Huge data inputs: point cloud, volumetric, high data rate. Inverse modeling techniques are used by the preparatory algorithm. SVD to avoid non-significant information. Challenge: partial volume correlation.

44 Volume division optimized for maximum threads in GPU and MPI. Model the control flow (limit) as per the locality heuristics (expert system with direction vectors). Always handle the border separately (good for the processor). Each module may not be that fast! Win the war, not every battle!
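"Always handle the border separately" means the interior loop carries no bounds checks or branches, and the few border elements get their own pass. The sketch below shows the pattern on a 1D three-point average. The stencil and the clamped treatment of the edges are assumptions chosen to keep the example short; the talk applies the idea to volume partitions.

```python
# Interior and border handled in separate passes.
# The 3-point average and edge clamping are assumed for illustration.

def smooth(values):
    """3-point average with a branch-free interior loop."""
    n = len(values)
    out = [0.0] * n
    if n == 0:
        return out
    # Interior: every index has both neighbours, so no bounds checks.
    for i in range(1, n - 1):
        out[i] = (values[i - 1] + values[i] + values[i + 1]) / 3.0
    # Border: separate pass, missing neighbour clamped to the edge value.
    out[0] = (values[0] + values[0] + values[min(1, n - 1)]) / 3.0
    out[n - 1] = (values[max(n - 2, 0)] + values[n - 1] + values[n - 1]) / 3.0
    return out


result = smooth([0.0, 3.0, 6.0, 9.0])
print(result)   # [1.0, 3.0, 6.0, 8.0]
```

On a GPU the same split keeps all threads of an interior warp on one code path, and the comparatively small border can go to a separate kernel or to the CPU.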

45 Users demand BYOD: not all features, but a subset. KEPLER GRID: the most awaited hardware.

47 Features: HTML5 client; stream-based server; LoD-based RayCaster viewer to NVIDIA iray; serviced on a GPU cluster. Challenge: time-to-market (conversion of existing engine); multi-user support and faster data speed.

49 Proof-of-concept level complexities: algorithm-level research. Development process: how to manage projects that involve scientific work and new platform challenges. Test automation architecture. Deployment scenarios and hardware tune-up at the final level (it is always a fact!).

50 Remember the Kalama Sutta. Your questions may transform my thinking. Please ask, even after the session.

51 Do write to us on technical and business queries. Speaker: gmail.com Website: Business queries: hpc@nestgroup.net
