Gigabyte Bandwidth Enables Global Co-Laboratories


Gigabyte Bandwidth Enables Global Co-Laboratories. Prof. Harvey Newman, Caltech; Jim Gray, Microsoft. Presented at the Windows Hardware Engineering Conference, Seattle, WA, 2 May 2004.

Credits: This represents the joint work of many people. CERN: Olivier Martin, Paolo Moroni. Caltech: Julian Bunn, Harvey Newman, Dan Nae, Sylvain Ravot, Xun Su, Yang Xia. S2io: Leonid Grossman. AMD: Brent Kelley. Newisys: John Jenne, Doug Norton, Rich Oehler, Dave Raddatz. Microsoft: Eric Beardsley, Jim Gray, Neel Jain, Peter Kukol, Clyde Rodriguez, Inder Sethi, Ahmed Talat, Brad Waters, Bruce Worthington. Ordinal: Charles Koester, Chris Nyberg.

Analysis and development: global, distributed, cooperative. Massive amounts of data: ~1 gigabyte per second, ~20 petabytes per year. Gigabyte bandwidth required. Today: 500+ physicists, 100+ institutes, 35+ countries. By 2007: 5,000+ physicists, 250+ institutes, 60+ countries. (CMS, ATLAS)

Thanks, Bill. Good morning! We are honored to have Harvey here today. He leads the effort connecting the experiments at CERN in Geneva, Switzerland with physicists in the Americas and co-leads the International Virtual Data Grid Laboratory. Harvey, what is the Virtual Data Grid? Large physics experiments have become international collaborations involving thousands of people. Current and new experiments share a common theme: groups all over the world are distributing, processing and analyzing massive datasets. The picture at right shows the CERN LHC accelerator, where four experiments will generate about 300 terabytes per second. Pre-filtering reduces that to about a gigabyte per second of recorded data, or 20 petabytes a year. Wow! The raw stream is about 2 zettabytes a year: the equivalent of sifting through a hundred million copies of the Library of Congress each year and saving 1,000 copies of it. And we continually reprocess and redistribute it, as we advance our ability to separate the rare discovery signals from the petabytes of standard physics events we already understand. That's why we need gigabyte per second bandwidth.
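As a rough check of that analogy (a back-of-the-envelope sketch only; the ~20 TB size of a Library of Congress copy is an assumption, not a figure from the talk), the numbers line up:

```python
# Quick check of the Library of Congress analogy above.
# Assumption (not stated in the talk): one "copy of the Library of
# Congress" is roughly 20 terabytes.

TB, PB, ZB = 1e12, 1e15, 1e21
LOC_COPY = 20 * TB

raw_per_year = 2 * ZB        # unfiltered data over a year, as quoted
recorded_per_year = 20 * PB  # recorded data per year, as quoted

print(raw_per_year / LOC_COPY)        # ~1e8: a hundred million copies sifted
print(recorded_per_year / LOC_COPY)   # ~1e3: a thousand copies saved
```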

[Slide: the tiered data architecture. The experiment produces of order a petabyte per second; the online filter reduces this to ~1 GBps into the CERN Tier 0+1 center; ~5 GBps flows out to the Tier 1 national centers (IN2P3, RAL, INFN, FNAL); ~1 GBps and ~0.1 GBps flows feed the lower tiers of institutes, physics data caches and workstations. 30 petabytes by 2008; 1 exabyte by 2014.]

We developed a four-tiered architecture to support these collaborations. Data processed at CERN is distributed among the Tier 1 national centers for further analysis. The Tier 1 sites act as a distributed data warehouse for the lower tiers, and refine the data by applying the physicists' latest algorithms and calibrations. The lower tiers are where most of the analysis gets done. They are also a massive source of simulated events. As a database guy, I am scared of managing 20 petabytes; that's hundreds of thousands of disks. Yes, and as the LHC intensity increases, the accumulation rate will increase, and we expect to reach an exabyte stored by 2013-15. All the flows in this picture are designated in gigabytes per second, so it's clear why we need a reliable gigabyte per second network. And our bandwidth demand is accelerating, along with many other fields of science, growing by a factor of two each year, or 1,000-fold per decade: much faster than Moore's Law. We have to innovate each year, learning to use the latest technologies effectively just to keep up.
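To see how stark that comparison is, here is a small arithmetic sketch; the 18-month doubling period used for Moore's Law is an assumed figure for illustration, not one from the talk:

```python
# Growth over a decade: bandwidth demand doubling yearly vs. Moore's Law.
# Assumption: Moore's Law taken as a doubling roughly every 18 months.

years = 10
bandwidth_growth = 2 ** years              # doubling every year
moores_law_growth = 2 ** (years / 1.5)     # doubling every 18 months

print(f"bandwidth demand over a decade: x{bandwidth_growth}")       # x1024
print(f"Moore's Law over a decade:      x{moores_law_growth:.0f}")  # ~x100
```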

OC-192 = 9.9 Gbps.

Harvey: This is the network we're building. We're working with our partners to extend it round the world. The next generation will be an intelligent global system, with real-time monitoring and tracking, and with adaptive agent-based services at all layers that control and optimize the system.

Send data from CERN to Caltech, fast, using standard TCP/IP. Rules set by Internet2: http://lsr.internet2.edu/. New speed record with multiple parallel streams. [Chart: Internet2 Land Speed Record throughput by year, 2000-2005, on a 0-7,000 Mbps scale, with the latest marks at 6.2 Gbps (~800 MBps) and 6.8 Gbps (~850 MBps).]

Internet2 is a consortium of universities building the Next Generation Internet. They defined a speed contest to recognize work on high-speed networking. Microsoft entered and won the first round of this contest, but we have not entered since. Indeed, Harvey's team at Caltech has been setting the records over the last two years. To set the records, including the one just certified by Internet2, we used the network shown on the previous slide and out-of-the-box TCP/IP to move data from CERN to Pasadena. Using Windows on Itanium 2 servers with Intel and S2io 10 Gigabit Ethernet cards, and Cisco switches, we reached 6.25 gigabits per second, or just under 800 megabytes per second.
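A sketch of why records like this are hard with out-of-the-box TCP (the ~180 ms CERN-Caltech round-trip time and the single-stream, Reno-style recovery behavior are illustrative assumptions, not figures from the talk):

```python
# Bandwidth-delay product on a long, fat pipe, and why one packet loss hurts.
# Assumptions: ~180 ms round-trip time CERN-Caltech, a single standard
# (Reno-style) TCP stream, 1500-byte segments.

link_bps = 6.25e9     # 6.25 gigabits per second
rtt_s = 0.180         # assumed round-trip time

bdp_bytes = link_bps * rtt_s / 8
print(f"data in flight: ~{bdp_bytes / 1e6:.0f} MB")   # window TCP must sustain

# After one loss, the congestion window halves and grows back by about
# one segment per round trip.
mss = 1500
recovery_time_s = (bdp_bytes / 2 / mss) * rtt_s
print(f"recovery after a single loss: ~{recovery_time_s / 60:.0f} minutes")
```

This is one reason the record runs used multiple parallel streams, and why the final slide calls for better congestion control.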

4:42 AM: 6.8 Gbps.

Last night we set a new record, actually at 4:30 a.m., using Windows on new Newisys AMD Opteron and Itanium 2 servers with S2io and Intel 10 Gigabit Ethernet cards and Cisco switches, and we reached 6.8 gigabits per second, or just about 850 megabytes per second. So that's a CD per second. We needed 64-bit machines for this; 32-bit Xeons top out at about 4 gigabits per second. Now we're working towards 1 gigabyte per second.
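A quick unit check of the "CD per second" claim (assuming a standard 700 MB CD, which is not stated in the talk):

```python
# Convert 6.8 gigabits per second to bytes and CDs per second.
# Assumption: one CD holds about 700 MB.

rate_bps = 6.8e9
rate_MBps = rate_bps / 8 / 1e6
cd_MB = 700

print(f"{rate_MBps:.0f} MB/s")                    # 850 MB/s
print(f"{rate_MBps / cd_MB:.1f} CDs per second")  # ~1.2, roughly a CD each second
```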

Workstation: 2 x AMD Opteron, Tyan motherboard, SATA + NTFS sequential, 20 disks, 1.0+ GBps read/write. Server: 4 x AMD Opteron, Newisys system, SATA + NTFS sequential, 48 disks, 2.0+ GBps read/write. Details at: http://research.microsoft.com/~peteku/

Harvey and I are collaborating on transcontinental disk-to-disk transfers at 1 GBps. That's the real science requirement. Harvey moves the data 11,000 kilometers. My job is to move it the first and last meter, from and to disk. We measured disk bandwidth on various machines. An NEC super-server can easily read and write at 3.5 GBps. Workstation-class AMD Opterons deliver about 1 GBps. A Newisys 4-way AMD Opteron server delivers over 2 GBps from NTFS stripe sets. Details of our disk measurements are on my website; they show that NTFS on AMD Opterons delivers several gigabytes per second. We are now working on moving data at 1 GBps from a 20-disk Newisys Opteron at CERN to an identical system at Caltech.
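For a sense of where those stripe-set numbers come from, here is a rough estimate; the ~55 MB/s sequential rate per drive is an assumed typical figure for 2004-era SATA disks, not a measurement from the talk:

```python
# Ideal aggregate throughput of a sequential stripe set.
# Assumption: ~55 MB/s sustained sequential rate per 2004-era SATA drive.

per_disk_MBps = 55

for disks in (20, 48):
    aggregate_MBps = disks * per_disk_MBps
    print(f"{disks} disks: ~{aggregate_MBps / 1000:.1f} GB/s ideal aggregate")

# 20 disks -> ~1.1 GB/s and 48 disks -> ~2.6 GB/s; the measured 1.0+ and
# 2.0+ GBps suggest controller and bus bandwidth caps the larger set.
```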

Beyond TCP/IP: predictable stability; better error and congestion control; hybrid networks with dynamic optical paths; higher-speed links; truly global collaboration and virtual organizations.

I'm confident we'll soon be able to move data at one gigabyte per second or more across the globe. But we still need a lot of work to make TCP/IP stable over long-distance networks at gigabyte per second data rates. And work is needed to put this into production use, which is what enables the science. Data-intensive science needs these technologies to allow our global scientific communities to collaborate effectively at gigabyte per second speeds. This is essential for the next round of discoveries at energies that were previously unobtainable. Networks are a lot faster than you think. They're running at gigabyte per second speeds. That means that you need to move things around inside the processor, IO system, and memory system at tens of gigabytes a second in order to keep up. Working with Caltech and working with CERN has been a wonderful thing for the Windows group. We've learned a huge amount from them, and I really want to thank Harvey for coming here today and telling us about his work. With that, we'll turn the podium back to Bill. Thank you very much.
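As a rough illustration of that last point (the copy count below assumes a conventional receive-to-disk path with no zero-copy optimizations; it is not a measurement from this work):

```python
# Approximate memory traffic behind 1 GB/s arriving on the wire and going
# to disk, assuming a conventional path: NIC DMA into a kernel buffer, a
# kernel-to-user copy, a user-to-file-cache copy, then DMA out to disk.
# Each CPU copy costs a read plus a write of the payload.

wire_GBps = 1.0
memory_touches = {
    "NIC DMA into kernel buffer": 1,   # write
    "kernel -> user copy":        2,   # read + write
    "user -> file cache copy":    2,   # read + write
    "file cache -> disk DMA":     1,   # read
}

traffic_GBps = wire_GBps * sum(memory_touches.values())
print(f"~{traffic_GBps:.0f} GB/s of memory traffic per 1 GB/s on the wire")
# Checksums, protocol headers, and any application processing add more,
# which is why wide 64-bit memory systems were needed to keep up.
```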