NOW and the Killer Network David E. Culler


1 NOW and the Killer Network David E. Culler NOW 1

2 Remember the Killer Micro [Figure: transistors per chip versus year, log scale, for microprocessors including i4004, i8080, i80286, i80386, R2000, R3000, Pentium, and R10000]

3 Timely Engineering of Large Systems [Figure: SpecInt and SpecFP versus year]

4 '90s Technological Breakthrough: the killer network. The single-chip, high-bandwidth, low-latency, high-reliability building block for scalable networks => the new LAN? At least a SAN (System Area Network).

5 The Last Two Inches Problem [Diagram: really fast workstations (ns) and fast scalable networks (µs), separated by network stacks, drivers, and network interfaces (ms)]

6 Goals of the Research. Fundamental change in how we design large-scale computing systems: snap together commodity components; self-managing, self-tuning, highly available. Make the killer network real: realize the potential of emerging hardware technology and push its effect through the rest of the system. Integrated system on a building-wide scale: pool of resources (proc, disk, mem); remote processor and memory closer than local disk; federation of systems with local and global role. The right way to build internet services.

7 NOW System Architecture [Diagram, top to bottom: large sequential apps and parallel apps; Sockets, Split-C, MPI, HPF, vsm; Global Layer Unix (resource management, Network RAM, distributed files, process migration); four Unix workstations, each with communication software and network interface hardware; fast commercial switch (Myrinet)]

8 Intelligent Network Interfaces: processing power and storage embedded in the NIC. [Diagram: Myricom network (160 MB/s); Myricom NIC with processor and memory; I/O bus (S-Bus, 50 MB/s); Sun Ultra 170 with processor, cache ($), and memory]

9 AM (Active Messages): Fast, Portable Communication [Figure: two bar charts in µs comparing NOW-SS20, NOW-Ultra, Paragon, and Meiko: overhead plus latency (L, Or, Os) and gap g (1/BW)]
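The parameters named on this slide (latency L, send overhead Os, receive overhead Or, and gap g) are the usual LogP-style costs of a small message. A minimal sketch of how they combine, assuming a receiver that keeps up with the sender; the class, the function names, and the numeric values are hypothetical and are not measurements from the slide:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogPParams:
    """LogP-style costs of one small message, in microseconds."""
    latency_us: float        # L: time in flight through the network
    send_overhead_us: float  # Os: CPU time spent by the sender
    recv_overhead_us: float  # Or: CPU time spent by the receiver
    gap_us: float            # g: minimum interval between messages (1/BW)


def one_way_time_us(p: LogPParams) -> float:
    """End-to-end time of a single small message: Os + L + Or."""
    return p.send_overhead_us + p.latency_us + p.recv_overhead_us


def round_trip_time_us(p: LogPParams) -> float:
    """Request plus reply, each paying the full one-way cost."""
    return 2 * one_way_time_us(p)


def pipelined_send_time_us(p: LogPParams, n_messages: int) -> float:
    """Time until the last of n back-to-back messages has been received.

    The sender can issue a new message no faster than max(g, Os);
    the final message then pays the full one-way cost.
    """
    if n_messages < 1:
        raise ValueError("n_messages must be at least 1")
    issue_interval = max(p.gap_us, p.send_overhead_us)
    return (n_messages - 1) * issue_interval + one_way_time_us(p)


# Hypothetical values chosen only to make the arithmetic easy to follow.
example = LogPParams(latency_us=5.0, send_overhead_us=3.0,
                     recv_overhead_us=3.0, gap_us=6.0)
print(one_way_time_us(example))
print(pipelined_send_time_us(example, 10))
```

For a single message the overheads and latency dominate, while for a long stream the gap sets the rate, which is why the slide reports the two quantities separately.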

10 MPI over AM [Table: start-up (µs) and peak BW (MB/s) for NOW, Paragon, Meiko CS, and IBM SP; values not recoverable]

11 Sockets over AM: Latency [Figure: time (µs) versus transfer size; legend: TCP 100BT, TCP, Fast Sockets, GAM]

12 netperf, ttcp bandwidth [Figure: throughput (MB/s) versus transfer size for FastSockets and Myrinet TCP, each measured with netperf and ttcp]

13 Application Sensitivity to Overhead [Figure: slowdown versus overhead (µs) for ebarnes, radix, em3d, sample, p-ray, murphi, connect, radb]

14 Sensitivity to gap (1/msg rate) [Figure: slowdown versus gap (µs) for ebarnes, radix, sample, em3d, p-ray, connect, murphi, radb]

15 Sensitivity to Latency [Figure: slowdown versus added latency (µs) for p-ray, ebarnes, connect, em3d, radix, radb, sample, murphi]

16 General Purpose, Fault-Tolerant Virtual Networks. [Diagram: many user and system processes above the NICs, with a segment driver on each host] Dynamic binding of multiple virtual network end-points directly to physical NIC resources. Deep integration with VM and threads. Smart NIC: mux-demux, errors, and flow control. Clean error model: return-to-sender.

17 Truly Distributed File System: XFS. [Diagram: eight processors, each with a local file cache, joined by a scalable low-latency communication network; cooperative caching; log-structured file] Stripe group: G = Node Comm BW / Disk BW.
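The stripe-group relation on this slide, G = Node Comm BW / Disk BW, can be read as the number of disks that one node's network link can keep busy. A minimal sketch under that reading; the function name, the rounding down to a whole number of disks, and the example bandwidths are assumptions and do not come from the slide:

```python
import math


def stripe_group_size(node_comm_bw_mb_s: float, disk_bw_mb_s: float) -> int:
    """Disks per stripe group: G = node communication BW / disk BW.

    Rounds down to a whole number of disks and never returns less
    than 1 (both choices are assumptions of this sketch).
    """
    if node_comm_bw_mb_s <= 0 or disk_bw_mb_s <= 0:
        raise ValueError("bandwidths must be positive")
    return max(1, math.floor(node_comm_bw_mb_s / disk_bw_mb_s))


# Hypothetical bandwidths: a 40 MB/s node link over 5 MB/s disks.
print(stripe_group_size(40.0, 5.0))
```

Striping across more than G disks would not help a single writer, because its network link is already saturated.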

18 Virtualization of O.S. Services. [Diagram: a standard Unix user process issues a system call; redirection passes it to system call translation, which issues a modified system call on the user's behalf to the conventional Unix kernel] Small kernel insertion provides redirection to a user-level translation facility. Translation provides global-local mapping of system functions for std. binaries.
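The global-local mapping described on this slide can be illustrated with a toy translation table. This is only a sketch of the idea: the `Translator` class, its method names, and the use of a `kill` call as the example are all hypothetical, and the code returns a description of the translated call instead of invoking the kernel:

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Translator:
    """Toy user-level translation layer mapping global pids to local ones."""
    local_node: str
    pid_map: Dict[int, Tuple[str, int]] = field(default_factory=dict)
    next_global_pid: int = 1

    def register(self, node: str, local_pid: int) -> int:
        """Assign a cluster-wide pid to a process living on some node."""
        global_pid = self.next_global_pid
        self.next_global_pid += 1
        self.pid_map[global_pid] = (node, local_pid)
        return global_pid

    def translate_kill(self, global_pid: int, signal: int) -> tuple:
        """Rewrite kill(global_pid, signal) into a call on the owning node."""
        node, local_pid = self.pid_map[global_pid]
        if node == self.local_node:
            # The process is local: issue the modified call directly.
            return ("local", "kill", local_pid, signal)
        # The process is remote: forward the modified call to its node.
        return ("forward", node, "kill", local_pid, signal)


translator = Translator(local_node="node-a")
first = translator.register("node-a", 4321)
second = translator.register("node-b", 77)
print(translator.translate_kill(first, 15))
print(translator.translate_kill(second, 15))
```

An unmodified binary would keep using the global identifier, and only the translation layer would know where the process actually runs.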

19 World-Record Disk-to-Disk Sort [Figures: Minute Sort, gigabytes sorted versus processors; Datamation Million Record Sort, seconds versus processors; both legends include SGI]

20 Toward a Web O.S. Build on basic technology developed for NOW to provide a powerful operating environment for advanced web applications. Global (URL-based) file system: imports home environment to NOW and vice versa; build services on a cache-coherent global file system. OS sandbox isolates foreign entity. Smart (Java-based) browsers provide scalable, interactive front-end. Rent-a-server when you're too hot. Cooperative web caching around the planet. Interactive services.

21 Toward Immense Disk Clusters

22 Clusters of SMPs (CLUMPS)

23 Overall System Configuration [Diagram: NOW-2 (Myricom scalable network, 100 Ultras, 200 disks); Tertiary Disk (24 PCs, 400 disks); 4 x 8 SMPs; switched Ethernet; servers; ATM backbone; LBL-NERSC; router; UCB; NPACI]

24 Invitation. System is operational enough for research. CS267 is using it heavily. Think about it for term projects: CS252, CS262, CS286, ... Ready to work with other research groups. see:
