Government Needs in Big Data Analytics
Irina Vayndiner, Ken Smith, Peter Mork

Government Big Data Challenges
- Data volumes are growing fast
  - Need to ingest larger and larger amounts of data and to perform more and more complex analyses
  - More data, faster and cheaper
  - A sensor in every cave in Afghanistan, on every cargo container, on every possibly mad cow
  - Regulations call for archiving more types of data (e.g., e-mails) in searchable format, as well as other unstructured data
- Need for real-time data analysis is becoming crucial for many sponsors
- Technologies that worked for smaller data sets often do not scale
- Need to ensure cost effectiveness while providing high availability, e.g., space & power in data centers
- Big Data research is supported by Government, for example:
  - Department of Energy
  - National Science Foundation

"Many countries are involved in the physics as well as data analysis. They have a distribution challenge, sending the data across 150 data centers across dozens of countries. Some of our sponsors are dealing with data on a similar scale."
-- Michal Cenkl, Director, MITRE, in a comment after CERN presentation

Examples of Government Big Data
- Sensor data
- Biosecurity
- Emergency preparedness
- Logistics
- Economics data, including financial compliance
- Climate modeling
- Environmental data
- Cybersecurity
  - "US government computers are attacked an average of 1.8 billion times a month" [1]
- Healthcare, e.g., new healthcare initiative compliance
- Many others

[1] "Congress vulnerable to online attacks," Politico, Mar 22, 2010

Real-Time Data Analysis
- Getting real-time answers to complex questions through detailed analysis of historical data, using:
  - Pattern detection
  - Optimization
  - Demand prediction, predictive modeling
  - Link analysis
- Examples of pattern detection and optimization problems:
  - Early detection of epidemics, and other alerts
  - Fraud detection, e.g., stimulus fraud detection using link analysis
  - Compliance, e.g., financial
  - Optimized decisions about logistics

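To make "fraud detection using link analysis" concrete, here is a minimal sketch, not a description of any sponsor system. It assumes that claims sharing an attribute value (an address or a bank account) are linked, and it groups linked claims with a union-find structure so that unusually large clusters can be flagged for review. The claim records, the attribute format, and the names `find` and `linked_clusters` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical claims: (claim_id, set of attribute values). Illustrative data only.
CLAIMS = [
    ("c1", {"addr:12 Elm St", "acct:111"}),
    ("c2", {"addr:99 Oak Ave", "acct:111"}),
    ("c3", {"addr:99 Oak Ave", "acct:222"}),
    ("c4", {"addr:5 Pine Rd", "acct:333"}),
]


def find(parent, x):
    """Return the root of x's cluster, shortening the path as we walk up."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def linked_clusters(claims, min_size=3):
    """Group claims that share any attribute value; keep clusters of at least min_size."""
    parent = {claim_id: claim_id for claim_id, _ in claims}
    first_seen = {}  # attribute value -> first claim that carried it
    for claim_id, attrs in claims:
        for attr in attrs:
            if attr in first_seen:
                # Shared attribute: merge the two claims' clusters.
                root_a = find(parent, claim_id)
                root_b = find(parent, first_seen[attr])
                parent[root_a] = root_b
            else:
                first_seen[attr] = claim_id
    clusters = defaultdict(set)
    for claim_id, _ in claims:
        clusters[find(parent, claim_id)].add(claim_id)
    return sorted(sorted(c) for c in clusters.values() if len(c) >= min_size)


if __name__ == "__main__":
    print(linked_clusters(CLAIMS))
```

In this toy data, c1 and c2 share an account and c2 and c3 share an address, so the three claims form one cluster even though c1 and c3 have nothing directly in common. That indirect connection is what link analysis is meant to surface.
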
Analytics
- Map-Reduce with RDBMSs or as a standalone system
- Use of in-database analytics
- CPU-intensive complex analytics, e.g., with a lot of mathematical calculations
- Historical (or "old") and near real-time (or "new") data: their interaction, and compression and performance optimization for each type
- Scaling dimensions: data volume and variety, complexity of analytics, and shrinking time frame

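For readers unfamiliar with the map-reduce style mentioned on this slide, the following single-process sketch shows its three phases (map, shuffle, reduce) by computing a mean reading per sensor. It is an illustration only and does not model a distributed framework or any RDBMS integration. The record layout and the function names are assumptions.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical sensor records: (sensor_id, reading). Illustrative data only.
READINGS = [("s1", 3.0), ("s2", 5.0), ("s1", 7.0), ("s3", 1.0), ("s2", 9.0)]


def map_reading(record):
    """Map phase: emit (key, partial aggregate) for a single record."""
    sensor_id, value = record
    return (sensor_id, (value, 1))


def shuffle(pairs):
    """Shuffle phase: collect all partial aggregates that share a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_partials(partials):
    """Reduce phase: combine (sum, count) partials into one (sum, count)."""
    return reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]), partials)


def mean_per_sensor(records):
    """Run map, shuffle, and reduce in sequence and return the mean per key."""
    grouped = shuffle(map(map_reading, records))
    means = {}
    for key, partials in grouped.items():
        total, count = reduce_partials(partials)
        means[key] = total / count
    return means


if __name__ == "__main__":
    print(mean_per_sensor(READINGS))
```

Carrying (sum, count) pairs rather than raw means keeps the reduce step associative, which is what would allow partial results to be combined in any order across workers in a real deployment.
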
Other Government Needs in Data Management
- Collaboration, data sharing, data integration for very large data
- Data pedigree and provenance
- Cross-domain data sharing
- Data cleanliness at scale
- New data centers increase the risk of data security issues
- Thin pipes between operations in the field and data centers
- Edge decision making in the field
- Use of social media in government
- Clouds
- Other Government needs and challenges:
  - GOTS vs. COTS vs. open source solutions
  - Often not enough trained, cleared resources in newer technologies
  - Need for easier and more efficient data management, including workload management

XLDB Solutions Evolution
- Sponsors' needs are changing
  - Often needs are too broad, changing very fast, or unanticipated
  - Planned vs. evolved scalability
- The following things evolve:
  - Schemas
    - To accommodate changing requirements
  - Role of hardware acceleration
  - Analytics
    - Constant need for new analytics and ways to develop them
  - Hardware
    - E.g., need for near real-time data on non-indexed fields
    - New types of storage devices, faster servers, networking
  - DBMSs
    - E.g., space is needed for the results of Big Data queries
    - Importance of data compression as a driver of both cost and power/space/performance
    - Vendors are constantly innovating
- Needed: risk management for XLDB solutions

Conclusion
- Increasing data volumes
- Need for real-time processing
- Need for complex analytics

QUESTIONS?