GlobalNOC Services Update 2015 Internet2 Global Summit
Annual Report http://globalnoc.iu.edu/annual-report/2014/ 4/28/15
Service Desk Year in Review: Welcomed ARE-ON and OSHEAN to the GlobalNOC Family; All I2 FootPrints Projects Consolidated Into 1 = 1/5 of the Former Notifications; Grown by 4 Staff and 1 Robot. April 28, 2015
Service Desk Year in Review: Conducted DR Exercise in Early December 2015 with Positive Result; Created and Implemented a Major Incident Communication Policy.
Service Desk Activity Metrics for 2014: 1.9 million alarms/year ~ 5,200/day; 30,000 tickets created/year ~ 82/day; 15,600 phone calls received/year ~ 43/day; 264,000 e-mails sent and received ~ 720/day.
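The per-day figures follow from dividing each annual total by the number of days in a year. A minimal sketch of that arithmetic, assuming a 365-day year (the divisor and the names `ANNUAL_TOTALS` and `per_day` are illustrative, not from the slides):

```python
# Annual totals as quoted on the slide.
ANNUAL_TOTALS = {
    "alarms": 1_900_000,
    "tickets created": 30_000,
    "phone calls received": 15_600,
    "e-mails sent and received": 264_000,
}


def per_day(annual_total, days_per_year=365):
    """Average daily volume for an annual total (365-day year is an assumption)."""
    return annual_total / days_per_year


daily_rates = {name: per_day(total) for name, total in ANNUAL_TOTALS.items()}

for name, rate in daily_rates.items():
    print(f"{name}: about {rate:.0f}/day")
```

The slide rounds these averages to convenient values (for example, about 5,200 alarms and about 720 e-mails per day).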
Year Ahead: Service Desk Pursuing ISO/IEC 20000 certification. Why? By When? What Will the Net Effect Be?
2015 Priorities
2015 Focus Areas
Automation
Goal: Find the worst things to do by hand. Make a machine do those things. Things that are: dangerous, slow, annoying.
Focus Areas: Business Processes: on-call button, auto-assign issues, auto-notify, auto-discover devices in a new network. Reporting: How many times did we call an engineer? Config automation: alerting on config drift, generate template config for new boxes, push & pipeline. Incident Advisor: auto-fix hints.
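Two of the config automation items, generating a template config for a new box and alerting on config drift, can be sketched together. This is only an illustration using the Python standard library; the template contents, the function names `render_config` and `config_drift`, and the example addresses are all hypothetical and do not describe GlobalNOC's actual tooling.

```python
import difflib
from string import Template

# Hypothetical template; a real device config would be far longer.
BASE_TEMPLATE = Template(
    "hostname $hostname\n"
    "snmp location $site\n"
    "ntp server $ntp_server\n"
)


def render_config(hostname, site, ntp_server):
    """Generate the expected config for a new box from the template."""
    return BASE_TEMPLATE.substitute(
        hostname=hostname, site=site, ntp_server=ntp_server
    )


def config_drift(expected, running):
    """Return unified-diff lines; an empty list means no drift."""
    return list(
        difflib.unified_diff(
            expected.splitlines(),
            running.splitlines(),
            fromfile="expected",
            tofile="running",
            lineterm="",
        )
    )


expected = render_config("rtr1.example.net", "Example Site", "192.0.2.1")
# Simulate someone changing the NTP server by hand on the device.
running = expected.replace("192.0.2.1", "192.0.2.99")

drift = config_drift(expected, running)
if drift:
    print("ALERT: config drift detected")
    print("\n".join(drift))
```

Comparing against the rendered template rather than a saved copy keeps the template as the single source of truth, which is one possible design choice among several.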
Service Management
Goal: MINIMIZE unplanned work, confusion, and inconsistency. Stay flexible, agile, and custom.
Huh? STANDARDIZE: for processes where consistency is most important. ORGANIZE: a simple, lightweight structure where custom and novel work happens.
2 Parts. Part 1: ISO/IEC 20000 Certification. Sparked by Internet2 effort, working to reach certification. Aligned with ITIL: Incident Management, Change Management, Capacity Management, Availability Management, etc.
2 Parts. Part 2: Other service-level improvements. Service Dashboard (end users, network owners); Prioritize improvements; Faster Turn-up; Change Management.
So what? It's not good enough anymore to talk about boxes and circuits. Everything is more complicated now. We don't deliver networks, we deliver services. That requires rigor to make sure those services work, and agility to make sure those services evolve quickly.
Example: What's the availability of everyone's IP Service for Internet2? Complexities: multiple sessions; connectors back each other up. Let's define available! First, a service is down if packets have to be retransmitted. So: Up = ALL BGP sessions are established, no loss known; At Risk = at least 1 session is down, but at least one route is still in the routing table; Down = no routes.
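The three states defined above can be written as a small classifier. This is a sketch under stated assumptions: the names `BgpPeering`, `ServiceState`, and `classify` are hypothetical, the route count is passed in directly, and the case the slide does not cover (all sessions established but loss known) is treated here as At Risk.

```python
from dataclasses import dataclass
from enum import Enum


class ServiceState(Enum):
    UP = "Up"
    AT_RISK = "At Risk"
    DOWN = "Down"


@dataclass
class BgpPeering:
    peer_ip: str
    asn: int
    established: bool


def classify(peerings, routes_in_table, loss_known=False):
    """Classify a routed service using the Up / At Risk / Down definitions."""
    # Down = no routes (a service with no peerings is also treated as down).
    if not peerings or routes_in_table == 0:
        return ServiceState.DOWN
    # Up = ALL BGP sessions are established, no loss known.
    if all(p.established for p in peerings) and not loss_known:
        return ServiceState.UP
    # At Risk = at least 1 session down, but at least one route remains.
    # Assumption: known loss with all sessions up also lands here.
    return ServiceState.AT_RISK


primary = BgpPeering("192.0.2.1", 64496, established=True)
backup = BgpPeering("192.0.2.2", 64496, established=False)
print(classify([primary, backup], routes_in_table=1).value)
```

Checking for routes first means a service with every session down and an empty routing table reports Down rather than At Risk.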
Data Model (diagram): Entity; Routed R&E Service; BGP Routing Data (Peer State, Routes); BGP Peering, BGP Peering (ASN, Peer IP); SLA Reporting Engine; Weekly Report.
Service Awareness
Corresponding process (flowchart; swimlanes: Sys, NPT, Dir of Op, Network Owner): report generated; SLA met? (no: send to NPT); outage in GRNOC control? (yes: recommend changes); Approve Changes?; Recommended Changes; Published Report; Published Report with Outline of Changes.
Work Management
Goal: Get a coherent system to manage our work: systems, tools, disciplines, processes. In other words, track, prioritize, and measure everything we do.
This means: For the people who do work: "Where do I go to see everything I'm supposed to be doing? What should I be doing first?" For the managers: "Are we too busy? Are we working on the right things?" For the strategic view: "Are we doing well/better than a year ago?"
How does work get tracked? Tickets, emails, Post-its, workflow records, meeting docs, many todo lists.
The future: Review ticketing; look at structured processes; project management; unified view of workload and results.
Recruiting
Goal: Make sure we have enough talented people now and 5 years from now.
Parts: Attract & hire; Pipeline; Get more students in; Improve development.
Attracting: How do we attract experts that fit? Challenges: scary job descriptions; people don't know what R&E or GlobalNOC does; Indiana - No really, it's a nice place!
Pipeline: Getting people into the pipeline: students have worked very well; Summer of Networking; how do we get more? Keeping the talent growing: develop people well; level up!
What's New With GlobalNOC Software?
SNAPP: High-performance SNMP measurement/visualization tool. 3 major revisions; project began in 2002. RRDtool-based storage. High-performance SNMP data collector. Web-based data browser and Web-services API.
SNAPP 4 with TSDS: Moving from RRDtool to a non-relational database. TSDS: database based on MongoDB. Sophisticated query language: TSQL. Rich meta-data integrated with data, which allows for powerful queries and long-term longitudinal analysis. General Time Series Data Store, not just SNMP data. Ex.: NOC activity metrics / key performance indicators; optical characteristics (light levels, loss, etc.); environmental/power data; aggregate flow data; OWAMP; BWCTL.
Alertmon Improvements: Alert Collapsing. Collapse services on a host when the host is not reachable. Root cause analysis based on a dependency graph allows for intelligent collapsing of alerts and suggests the root cause of multiple alerts. Monitoring of management VPN endpoints to collapse alerts behind the VPN when management network access is impaired.
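One way to picture dependency-graph collapsing: an alert is folded under another alert when the second sits somewhere upstream in its dependency chain, and alerts with no alerting dependency are reported as suggested root causes. The sketch below is an assumption-laden illustration, not Alertmon's implementation; the function names, the node naming scheme, and the requirement that the dependency graph is acyclic are all hypothetical.

```python
def alerting_ancestors(node, depends_on, alerting):
    """Return every alerting node that `node` depends on, directly or not."""
    seen = {node}
    found = set()
    stack = list(depends_on.get(node, ()))
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in alerting:
            found.add(current)
        stack.extend(depends_on.get(current, ()))
    return found


def collapse_alerts(alerting, depends_on):
    """Map each suggested root cause to the alerts collapsed beneath it.

    Assumes the dependency graph has no cycles.
    """
    ancestors = {n: alerting_ancestors(n, depends_on, alerting) for n in alerting}
    roots = {n for n, found in ancestors.items() if not found}
    collapsed = {root: [] for root in roots}
    for node, found in ancestors.items():
        for root in found & roots:
            collapsed[root].append(node)
    return {root: sorted(nodes) for root, nodes in collapsed.items()}


# Hypothetical topology: services depend on hosts, hosts on a management VPN.
depends_on = {
    "http@web1": ["web1"],
    "ssh@web1": ["web1"],
    "web1": ["mgmt-vpn"],
    "web2": ["mgmt-vpn"],
}

print(collapse_alerts({"http@web1", "ssh@web1", "web1"}, depends_on))
print(collapse_alerts({"mgmt-vpn", "web1", "web2", "http@web1"}, depends_on))
```

In the first call the host is the suggested root cause of its two service alerts; in the second, everything collapses behind the VPN endpoint.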
NOAA Operations Portal: High-level overview of network status: Operational Status Map; Performance Measurement Overview; Operations Calendars. Detailed data pulled from other GlobalNOC tools. Multi-network aggregate views.
SciPass Science DMZ: Campus networks are enterprise infrastructure: large number of small flows; security is a required capability; not elephant-flow friendly. Could just bypass, but that doesn't provide the required security. What about performance assurance?
Approach: Combine an OpenFlow switch, Bro, and perfSONAR to create a reactive system. Default to the secure / slow path. Use the IDS to control what goes on the fast path.
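The control logic of this approach can be sketched as a tiny state machine: every flow starts on the slow path through the firewall, and a verdict from the IDS moves it onto or off the fast path. This is a conceptual sketch only; the class and method names are hypothetical, and it does not use SciPass's real interfaces or emit actual OpenFlow rules.

```python
from dataclasses import dataclass, field

SLOW_PATH = "firewall"  # default: secure but slow
FAST_PATH = "bypass"    # used only for flows the IDS has cleared


@dataclass(frozen=True)
class Flow:
    src: str
    dst: str


@dataclass
class ReactiveBypass:
    # Flows that currently have a bypass rule installed.
    bypass_rules: set = field(default_factory=set)

    def path_for(self, flow):
        """Unknown flows always take the secure / slow path."""
        return FAST_PATH if flow in self.bypass_rules else SLOW_PATH

    def on_ids_verdict(self, flow, trusted):
        """React to the IDS: install a bypass for trusted flows, revoke otherwise."""
        if trusted:
            self.bypass_rules.add(flow)
        else:
            self.bypass_rules.discard(flow)


controller = ReactiveBypass()
transfer = Flow("198.51.100.10", "203.0.113.20")

print(controller.path_for(transfer))            # before any IDS verdict
controller.on_ids_verdict(transfer, trusted=True)
print(controller.path_for(transfer))            # after the IDS clears the flow
```

Defaulting to the firewall means a missed or delayed IDS verdict costs performance rather than security.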
Reactive Bypass Performance: 64 ms - time to detect and bypass; 250 ms - doubled throughput of firewall; 1.5 sec - same throughput as no firewall.
Find Out More: Software Page: https://globalnoc.iu.edu/sdn/scipass.html Code Repository: https://github.com/globalnoc/scipass Email: globalnoc@iu.edu, ebalas@iu.edu
FlowSpace Firewall: Developed in partnership with Internet2. Open Source Software. OpenFlow Hypervisor. Slices OpenFlow 1.0 based on VLAN ID. Currently running on Internet2 AL2S. Other deployments growing. We're interested in helping get FlowSpace Firewall running on your OpenFlow network. More Information/Download: http://globalnoc.iu.edu/sdn/fsfw.html/
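Slicing by VLAN ID can be pictured as a check the hypervisor applies to every rule a slice's controller tries to install: the rule passes only if its VLAN belongs to that slice. The sketch below illustrates the idea with hypothetical names (`Slice`, `is_allowed`, `overlapping_vlans`) and made-up VLAN ranges; it does not reflect FlowSpace Firewall's actual configuration format or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slice:
    name: str
    vlans: frozenset  # VLAN IDs this slice is allowed to program


def is_allowed(slice_, vlan_id):
    """A rule is accepted only if its VLAN ID belongs to the slice."""
    return vlan_id in slice_.vlans


def overlapping_vlans(a, b):
    """VLAN IDs claimed by both slices; non-empty means the slices are not isolated."""
    return a.vlans & b.vlans


# Hypothetical slices and VLAN ranges.
production = Slice("production", frozenset(range(100, 200)))
research = Slice("research", frozenset(range(200, 300)))

print(is_allowed(research, 250))   # rule inside the research slice
print(is_allowed(research, 150))   # rule reaching into production's VLANs
print(sorted(overlapping_vlans(production, research)))
```

Keeping each slice's VLAN set disjoint is what lets several controllers share one switch without seeing or altering each other's traffic.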