Dynamic Graph Query Support for SDN Management Ramya Raghavendra IBM TJ Watson Research Center
Roadmap SDN scenario 1: Cloud provisioning Management/Analytics primitives
Current Cloud Offerings Limited control of the network Requires integration of third-party solutions Limits the opportunity to migrate production applications introduction of cloud networking functions Subnets and ACLs e.g., VPC enhancements Network monitoring e.g., CloudWatch VPN to the enterprise e.g., Virt Private Cloud Server load balancing e.g., Elastic Load Balancing persistent connectivity for services e.g., elastic IP base IP connectivity Third-party virtual appliances Examples of Missing Features No ability to create VLANs in the cloud No facility to manage bandwidth or QoS Limited ability to craft network segments No intelligence for dynamically structured networks reference: http://broadcast.oreilly.com/2010/12/ cloud-2011-the-year-of-the-network-in-the-cloud.html 3
Network as a Service Model High-level VM and network description virtual switches, hypervisors Network devices, switches, routers, firewalls, setup virtual switch on host Device state OpenFlow, SNMP, BladeOS etc Monitoring, notifications Cloud Controller Network controller network description Provide rich set of services isolation, custom addressing, service differentiation etc. Interact with variety of network devices support multiple cloud platforms, device management protocols Introduces a network controller network-aware VM placement, QoS support, real-time monitoring, diagnostics, management, security etc. * CloudNaaS: A Cloud Networking Platform for Enterprise Applications, Symposium on Cloud Computing, 2011
Cloud Networking-as-a-Service Cloud controller Provides base IaaS service for managing VM instances and images Self-service provisioning UI Connects VMs via host virtual switches self-service UI Network specification Cloud controller Network controller application Network controller Provides VM placement directives to cloud controller Generates virtual network between VMs Configures physical and virtual switches application middleware OS VM middleware OS VM virtual network application middleware OS VM 5
Prototype Cloud Controller: OpenNebula 1.4 Modified to accept user-specified network policies Modified to accept placement decisions from Network Controller Network Controller: NOX and OpenFlow-enabled switches Network controller implemented as a C++ NOX application (~2500 LOC) HP Procurve 5400 switches w/ OpenFlow 1.0 firmware VM2 VM4 Network Controller OpenNebula Cloud Controller HOST5 VM8 HOST1 VM1 VM5 SWITCH 2 SWITCH 3 SWITCH 5 VM3 HOST3 HOST2 SWITCH 1 SWITCH 4 HOST4
Roadmap SDN scenario 1: Cloud provisioning Management/Analytics primitives
Graph Queries for Network Management Support for efficient queries on data structure containing time-varying graph, e.g., shortest path between two nodes Utilized in various network management operations 9 24 14 15 5 20 30 18 44 2 11 16 6 19 6 Network graphs can represent: physical network elements such as routers, switches, and servers virtual elements such as VMs and virtual switches logical elements such as people, processes, web pages, etc. and links between them Algorithm support: shortest path spanning tree min flow
Cloud DCN Graphs Are Dynamic Cloud DCNs may experience frequent changes Deployment of new VMs, removal of old ones Migration of VMs Layer 2/3 network reconfiguration Graph updates edge/vertex insertion & deletion edge weight change (e.g. congestion level) Customer: this is what my workloads look like Monitoring, notifications High-level VM & network description Cloud controller Network controller High-level network description & commands OpenFlow 9
NetGraph Software Architecture Cloud Controller shortest path spanning tree reachability bipartite matching n/w aware placement Load balancing Broadcast SDN-enabled Network Controller Service Modules Isolation Routing Path computation OpenFlow SNMP QoS Monitoring Topology graph queries physical & virtual network graph, updates NetGraph shortest paths, spanning tree, transitiveclosure Physical and virtual network infrastructure * NetGraph: Dynamic Graph Query Primitives for SDN-based Cloud Network Management, HotSDN, Sigcomm 2012
System G Tools v1.0 Architecture Visualization Huge Network Visualization Network Propagation I2 3D Network Visualization Geo Network Visualization Graphical Model Visualization Analytics Communities Graph Search Network Info Flow Bayesian Networks Centralities Graph Query Shortest Paths Latent Net Inference Ego Net Features Graph Matching Graph Sampling Markov Networks Middleware Database System G Assets Open Source IBM Product Hardware BigInsights Hadoop Pthreads PERCS Coh. Clus. GBase (update, scan, operators, indexing)) HBase HDFS Shared Memory Run Time Library Graph Processing Interface Graphs RDMA Native Store Distr. Memory RT Library MPI Cluster (BladeCenter, BlueGene) Graph Data Interface Netezza DB2 RDF DB2 Graphs FPGA/ HMC Infosphere Streams (ISS) TinkerPop Compliant DBs
Graph Workload Types Type 1: Computations on graph structures / topologies Bayesian network to Junction tree Example converting Bayesian network into junction tree, graph traversal (BFS/DFS), etc. Characteristics Poor locality, irregular memory access, limited numeric operations 1 2 Type 2: Computations on graphs with rich properties Example Belief propagation: diffuse information through a graph using statistical models Characteristics 3 4 5 6 11 Locality and memory access pattern depend on vertex models Typically a lot of numeric operations Hybrid workload 7 10 8 9 1 2 3 4 5 6 11 7 10 8 9 1 2 3 4 5 6 11 7 10 8 9 6,4, 5 3,1, 2 5,3, 4 11,5,6 7,3, 5 10,7 9.8 8,7 7,3,5 9.8 3,1,2 5,3,4 8,7 6,4,5 10,7 11,5, 6 Type 3: Computations on dynamic graphs Example streaming graph clustering, incremental k-core, etc. Characteristics Poor locality, irregular memory access Operations to update a model (e.g., cluster, sub-graph) Hybrid workload 3-core subgraph
13 System G Analytics Overview System G is a comprehensive set of graph analytics libraries for IBM Big Data Analytics Communities Graph Search Network Info Flow Bayesian Networks Centralities Graph Query Shortest Paths Latent Net Inference Ego Net Features Graph Matching Graph Sampling Markov Networks Analytics target at big graphs Create HBase coprocessors to provide scalable graph analytics by distributing data and computation in a balanced way based on graph topology. They outperform MapReduce-based approaches by 2.8 times. Exploit RDMA and other hardware-based optimization for CPU intensive analytics to achieve high CPU utilization, low IO and good speed up Analytics target at dynamic graphs A major differentiator compared to competitors like GraphLab (dynamic graph clustering coefficients analytics on System G is up to 21 times faster than GraphLab) Include incremental K core, evolution aware clustering, and streaming graph clustering coefficients Exploit IBM Infosphere Stream framework
Analytics Characteristics Analytics Exact or Approx. Graph size and dynamics K-core Exact Tested against > 200M edge graph K-neighborhood Exact Scale free K shortest path Exact Scale free Connected component Exact Tested against > 70M edge graph Pagerank Exact Tested against > 70M edge graph Graph search & recommendation Exact Scale free Probabilistic Inference in Bayesian Networks Approx. Constrained by physical memory XRIME scalable community detection Approx. Tested against > 5M edge graph Snazzy community detection Exact Tested against > 1M edge graph Dynamic subgraph matching Exact Tested against > 150K edge graph. Change up to 30% within 10 sec Incremental streaming K-core Exact Tested against > 16M edge graph Streaming graph clustering Approx. Tested against 400K updates/sec Streaming graph clustering coefficient * Cloud Service Placement via Subgraph Matching, ICDE 2014 14 Exact Tested against > 1.8B edge graph, up to 24M updates/sec
System G Graph Data Interface API 15 Existing Implementations: GBase HBase implementation of Graph Native Store C++ Graph Store, in memory and on disc DB2RDF Tuned for SPARQL + RDF Non-IBM Graph DBs: TinkerGraph in memory graph Neo4J Titan (on HBase, BerkeleyDB, Cassandra) More
Sample queries What is the average hourly RTT at a router? When RTT is high, are the k hop links experiencing high RTT? What is the current shortest path between two network endpoints? Is the direct link more expensive than the overlay link?
Ongoing Work Graph + Time series Queries Graph store coupled with (distributed) time series support What was the shortest path at peak hour yesterday? How do routes vary on a daily basis? Graph + Location Queries RCA based on client locations, or service locations
Thank You! Ramya Raghavendra IBM Research rraghav@us.ibm.com