CERN network overview Ryszard Jurga CERN openlab Summer Student Programme 2010
Outline: CERN networks (campus network, LHC network); CERN openlab NCC 2
CERN campus network 3
Network Backbone Topology (diagram): Gigabit, 10 Gigabit, Multi-Gigabit and Multi-10-Gigabit links connecting the Computer Center, the Meyrin and Prevessin areas, the LHC area, minor starpoints, the experiment control/DAQ networks and the Internet, with monitoring. Original document: 2007/M.C.; latest update: 19-Feb-2009 O.v.d.V.
The CERN Networks: Facts and Figures. Three distinct multi-ten-gigabit backbones; 150+ very high performance routers; 3,700+ subnets; 2,200+ switches (increasing); 50,000 active user devices (exploding); 80,000 sockets; 5,000 km of UTP cable; 400+ starpoints (from 20 to 1,000 outlets); 5,000 km of fibers (CERN owned); 150 Gbps of WAN connectivity; 4.8 Tbps LCG core; desktops, HPC, VoIP, process control, etc. Extremely dynamic: 1,500+ requests for Moves-Adds-Changes per month; ISP-like (2x more visitors than staff). Multi-vendor site using only standards
Switches & Routers: Force10 for the core, HP for distribution
Network Management (1) CERN is a particular case: a very large infrastructure, all centrally managed; a large number of network devices from different manufacturers; very limited staff, hence rationalization; 10+ years of development of our own NMS by ~5 software developers; an extremely high level of automation (500,000+ lines of code, 150+ DB tables); only one commercial package, for network supervision: SPECTRUM (CA) 7
Network Management (2) (architecture diagram): a central network database covering devices, states, topology, addressing, cables, fibers, security (firewall), SLAs, user requests and tickets, accessed via SOAP; a Configuration Manager drives the ~2,500 network devices (Cisco, HP, 3Com, Force10); external sources include HR, Active Directory and others. "If it does not fit, it will not happen."
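As a rough illustration of this database-driven approach (hypothetical table and field names, not the actual CERN NMS code), a configuration manager can render per-device settings directly from the inventory database, so that anything not recorded in the database simply cannot end up on a device:

    # Minimal sketch of database-driven configuration; the "devices" table,
    # its columns and the generated syntax are assumptions for illustration.
    import sqlite3

    def render_config(device):
        """Render a tiny, vendor-neutral config snippet from one inventory row."""
        name, vendor, mgmt_ip, vlan = device
        return (
            f"hostname {name}\n"
            f"! vendor: {vendor}\n"
            f"interface mgmt0\n"
            f"  ip address {mgmt_ip}\n"
            f"  vlan {vlan}\n"
        )

    conn = sqlite3.connect("network.db")              # hypothetical inventory DB
    for row in conn.execute("SELECT name, vendor, mgmt_ip, vlan FROM devices"):
        print(render_config(row))                     # only what the DB knows gets configured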
DNS Stats ~2500 DNS queries/sec (all CERN DNS servers) 9
DHCP Stats (chart, annotated with the CERN annual closure) 10
LHC Network 11
(diagram) The LHC experiments CMS, ATLAS and ALICE and the CERN Computer Center
LHC Networking 13
The Roles of Tier Centers: 11 Tier 1s, over 100 Tier 2s; LHC computing will be more dynamic and network-oriented. Tier 0 (CERN): prompt calibration and alignment, reconstruction, storage of the complete set of RAW data. Tier 1: reprocessing, storage of part of the processed data. Tier 2: Monte Carlo production, physics analysis. Tier 3: physics analysis. 14
LHC Open Private Network: it consists of any T0-T1 or T1-T1 link dedicated to the transport of WLCG traffic; it is based on 10 Gbps light-path technologies; the infrastructure is provided by GEANT, US LHCNet, National Research and Education Networks (NRENs) and commercial links; 70,000 km of optical paths 15
LHCOPN map 16
T0-T1 Lambda Routing (map): optical paths from the T0 at Geneva to the Tier 1s (TRIUMF, ASGC, NDGF, BNL, FNAL, RAL, SARA, GRIDKA, PIC, IN2P3, CNAF) via GEANT, national and transatlantic links. Source: Michael Enrico (DANTE) 17
LHCOPN traffic (plot): annotated with "Preparation for the LHC startup", "STEP 09", "LHC is back", and the collisions and luminosity increase (10^27 to 10^30) 18
Network Competence Center Projects CINBAD (2007-) & WIND (2010-) 19
CINBAD codename deciphered: CERN Investigation of Network Behaviour and Anomaly Detection. Project goal: to understand the behaviour of large computer networks (10,000+ nodes) in High Performance Computing or large campus installations in order to detect traffic anomalies in the system, perform trend analysis, automatically take countermeasures, and provide post-mortem analysis facilities
CINBAD project principle (diagram): data sources feed collectors, which feed storage and analysis
sFlow: the main data source. Based on packet sampling (RFC 3176): on average, 1 out of N packets is sampled by an agent and sent to a collector; the packet header and payload are included (max 128 bytes), giving switching/routing/transport protocol information and application protocol data (e.g. HTTP, DNS); SNMP counters are included; low CPU/memory requirements; scalable. For more details, see our technical report http://openlab-mu-internal.web.cern.ch/openlab-mu-internal/documents/2_technical_documents/technical_reports/2007/rj-mm_samplingreport.pdf 22
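To make the sampling idea concrete, here is a small sketch (illustrative only, not taken from the CINBAD code base) of how 1-out-of-N sampling lets a collector estimate the total traffic: each sampled packet simply stands for N packets, and the relative error shrinks as samples accumulate.

    import random

    def estimate_traffic(packet_sizes, sampling_rate_n):
        """Estimate total packets and bytes from 1-out-of-N sampled packets."""
        sampled = [size for size in packet_sizes
                   if random.randrange(sampling_rate_n) == 0]   # agent-side sampling
        est_packets = len(sampled) * sampling_rate_n            # scale counts back up
        est_bytes = sum(sampled) * sampling_rate_n
        return est_packets, est_bytes

    # Example: 1,000,000 packets of 500 bytes, sampled 1-out-of-400
    sizes = [500] * 1_000_000
    print(estimate_traffic(sizes, 400))   # close to (1000000, 500000000)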
CINBAD sFlow data collection (diagram): sampling is configured via SNMP by a configurator, using data fed back for adjusting the sFlow configuration; up to 1 Gbps of raw sFlow is multicast to two redundant collectors (Level I processing and disk storage); unpacked and aggregated data feed the CINBAD DB (Level II processing and storage); Level III processing is data mining. Current collection is based on traffic from ~1,000 switches: ~6,000 sampled packets per second, ~3,500 SNMP counter sets per second, ~100 GB per day
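A minimal sketch of the receiving side of such a pipeline, assuming sFlow datagrams arrive on the standard UDP port 6343 (an illustration only, not the CINBAD collector):

    import socket

    SFLOW_PORT = 6343        # standard sFlow collector port

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", SFLOW_PORT))

    while True:
        datagram, (agent_ip, _) = sock.recvfrom(65535)    # one sFlow datagram per UDP packet
        # Level I processing would parse the sample records and SNMP counters here
        # and write them to disk before aggregation into the database.
        print(f"received {len(datagram)} bytes from agent {agent_ip}")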
CINBAD-eye (screenshots): data collection, host activity and connectivity, and switch activity trends 24
Tools for Anomaly Detection. Statistical analysis methods: detect a change from normal network behaviour; selection of suitable metrics is needed; can detect new, unknown anomalies; poor at identifying the anomaly type. Signature-based: we ported SNORT to work with sampled data; performs well against known problems; tends to have a low false positive rate; does not work against unknown anomalies 25
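As an illustration of the statistical approach (a generic sketch, not the CINBAD implementation), a detector can keep a running baseline of a chosen metric, for example DNS queries per second, and flag values that deviate from it by several standard deviations:

    import statistics

    def detect_anomalies(metric_series, window=60, threshold=3.0):
        """Flag points that deviate from the recent baseline by > threshold sigmas."""
        alerts = []
        for i in range(window, len(metric_series)):
            baseline = metric_series[i - window:i]
            mean = statistics.mean(baseline)
            stdev = statistics.pstdev(baseline) or 1.0   # avoid division by zero
            if abs(metric_series[i] - mean) > threshold * stdev:
                alerts.append((i, metric_series[i]))     # new, unknown anomalies are caught too
        return alerts

    # Example: a flat rate of ~2,500 queries/s with one sudden spike
    series = [2500] * 120
    series[100] = 9000
    print(detect_anomalies(series))   # [(100, 9000)]

Such a detector says that something changed but not what, which is why pairing it with the signature-based engine is useful.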
Synergy from both detection techniques (diagram): the incoming sFlow stream feeds both the sample-based SNORT evaluation engine and the statistical analysis engine; both raise anomaly alerts; the statistical engine maintains traffic profiles and new baselines and, via translation rules, produces new signatures for SNORT 26
Statistical and Signature-based Anomaly Detection 27
WIND 28
WIND codename deciphered: Wireless Infrastructure Network Deployment. Project goals: analyze the problems of large-scale wireless deployments and understand the constraints; simulate the behaviour of the WLAN; develop new optimisation algorithms; verify them in the real world; improve and refine the algorithms; deliver algorithms, guidelines and solutions
Why WIND? WLAN deployments are problematic: radio propagation is very difficult to predict; interference is an ever-present danger; WLANs are difficult to deploy properly; monitoring was not an issue when the first standards were developed; and when administrators are struggling just to operate the WLAN, performance optimisation is often forgotten
Problem example: RF interference (coverage maps of building 0031-S): maximum data rate when the APs work on 3 independent channels versus when the APs work on the same channel. Source: S. Ceuterickx 31
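One kind of optimisation WIND targets is illustrated by the channel problem above: neighbouring access points should use non-overlapping channels. The sketch below is a generic greedy colouring over a hypothetical interference graph, not the WIND algorithm itself:

    # Greedy channel assignment over an interference graph (illustrative only).
    CHANNELS = [1, 6, 11]   # the non-overlapping 2.4 GHz channels

    def assign_channels(neighbours):
        """neighbours maps each AP to the list of APs it interferes with."""
        channel_of = {}
        # Handle the most constrained APs first
        for ap in sorted(neighbours, key=lambda a: len(neighbours[a]), reverse=True):
            used = {channel_of[n] for n in neighbours[ap] if n in channel_of}
            free = [c for c in CHANNELS if c not in used]
            # If every channel is already taken nearby, reuse one (interference is unavoidable)
            channel_of[ap] = free[0] if free else CHANNELS[0]
        return channel_of

    # Hypothetical APs along one corridor, each interfering with its neighbours
    graph = {"AP1": ["AP2"], "AP2": ["AP1", "AP3"], "AP3": ["AP2", "AP4"], "AP4": ["AP3"]}
    print(assign_channels(graph))   # e.g. {'AP2': 1, 'AP3': 6, 'AP1': 6, 'AP4': 1}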
Acknowledgments IT/CS 32
Thank you! 33