Big Data in Research: Research Analytics Industry Solution
Stuart Long, CTO, Oracle Systems Asia Pacific and Japan
Information Architecture Capability Model (Oracle Enterprise Architecture Framework, Oracle Information Architecture Framework)
Capabilities: data technology management; information sharing & delivery; business intelligence & data warehousing; data security; data governance, quality & lifecycle management; enterprise data model; master data management; data integration; content management
Data realms: transaction data; master data; reference data; metadata; analytical data; unstructured data; big data
The Information Architecture Spectrum: Evaluating Economic and Architecture Tradeoffs
Data realm: master data, transactions, analytical data, metadata
  Structure: structured
  Volume: medium to high
  Security: database, application, and user access
  Storage & retrieval: RDBMS / SQL
  Modeling: pre-defined relational or dimensional modeling
  Processing/integration: ETL/ELT, CDC, replication, message
  Consumption: BI & statistical tools, operational applications
Data realm: reference data
  Structure: structured and semi-structured
  Volume: low to medium
  Security: platform security
  Storage & retrieval: XML / XQuery
  Modeling: flexible & extensible
  Processing/integration: ETL/ELT, message
  Consumption: system-based data consumption
Data realm: documents and content
  Structure: unstructured
  Volume: high
  Security: file system based
  Storage & retrieval: file system / search
  Modeling: free form
  Processing/integration: OS-level file movement
  Consumption: content management
Data realm: big data (weblogs, sensors, social media)
  Structure: structured, semi-structured, unstructured
  Volume: high
  Security: file system & database
  Storage & retrieval: distributed file system / NoSQL
  Modeling: flexible (key-value)
  Processing/integration: Hadoop, MapReduce, ETL/ELT, message
  Consumption: BI & statistical tools
Big Data
Figure: Evolution of ESA's EO data archives between 1986 and 2007 and future estimates (up to 2020). Vertical axis: total archive in terabytes (TB), 0 to 22,000. Horizontal axis: year, 1986 to 2020. Courtesy of BERIS.
Series: Future data estimates; LANDSAT 2-4 MSS (75-Dec 93); AQUA Modis (April 03-today); ENVISAT LR (March 02-today); ENVISAT HR (March 02-today); TERRA Modis (June 01-today); QUICK SCATT (01-today); /PROBA (May 02-today); LANDSAT 7 ETM (April 99-Dec 03); SEA STAR SeaWifs (Apr 98-today); ERS 2 HR (May 95-today); ERS 2 LBR (May 95-today); JERS SAR/OPS VNIR (92-Sep 98); ERS 1 HR (Jul 91-Mar 00); ERS 1 LBR (Jul 91-Mar 00); SPOT 1-4 HRV (87-today); MOS 1, 1b MESSR (87-Oct 93); NOAA 9-17 AVHRR (86-today); LANDSAT 5 TM (April 84-today); NIMBUS 7 (Nov 78-May 86), SEASAT (Jun-Oct 78)
- The LOFAR radio interferometer is producing 1.6 TB/sec (138 PB/day), setting new frontiers for radio astronomy.
- The volume of earth-observation data from the European Space Agency's satellites passed 3 PB in 2007, and the projection for 2020 is seven-fold.
- In genomics, the cost of sequencing is dropping by 50% every 5 months; analysis, not sequencing, will be the main expense hurdle (Cambridge University, UK).
- The volume of worldwide climate data is expanding rapidly, creating challenges for both physical archiving and sharing, and for ease of access to relevant information in a multidisciplinary environment (J T Overpeck et al. Science 2011;331:700-702).
- In high energy physics, the data recorded by each of the big experiments at the Large Hadron Collider will be enough to fill around 100,000 DVDs every year.
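The figures on this slide can be cross-checked with simple arithmetic. The sketch below is illustrative only: it assumes decimal units (1 PB = 1000 TB) and treats the 50% cost drop every 5 months as a steady exponential decay. The function names are hypothetical.

```python
# Cross-check the growth figures quoted on the slide.
SECONDS_PER_DAY = 24 * 60 * 60
TB_PER_PB = 1000  # assumption: decimal units (1 PB = 1000 TB)

def daily_volume_pb(rate_tb_per_sec: float) -> float:
    """Convert a sustained data rate in TB/s into PB per day."""
    return rate_tb_per_sec * SECONDS_PER_DAY / TB_PER_PB

def relative_cost(months: float, halving_period_months: float = 5.0) -> float:
    """Fraction of the starting cost left after `months`, halving every period."""
    return 0.5 ** (months / halving_period_months)

lofar_pb_per_day = daily_volume_pb(1.6)   # LOFAR rate quoted as 1.6 TB/sec
esa_2020_pb = 3 * 7                       # 3 PB in 2007, projected seven-fold

print(f"LOFAR: about {lofar_pb_per_day:.0f} PB/day")
print(f"ESA archive projection for 2020: about {esa_2020_pb} PB")
print(f"Sequencing cost after 20 months: {relative_cost(20):.4f} of the starting cost")
```

Under these assumptions the LOFAR rate works out to roughly 138 PB/day, which agrees with the slide, and the ESA projection of about 21 PB falls within the 0 to 22,000 TB range of the chart axis.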
The Challenges of Big Data
- Volume: very large quantities of data
- Velocity: extremely fast streams of data
- Variety: wide range of data type characteristics
- Value: high potential value if harnessed correctly
Intel Xeon 5500 Series: First Platform with End-to-End HW Virtualization
- Processor: Intel Virtualization Technology (Intel VT-x)
- Chipset: Intel VT for Directed I/O (Intel VT-d)
- Network: Intel VT for Connectivity (Intel VT-c)
Holistic, platform-centric approach for virtualization usages
Oracle First Platform with Data Embedded Instructions Oracle Enabling Technology SQL Smart Cache Low Latency Data Processing Unit DPU Data Aware Storage Data Defined Network Optimised for Data Processing and Database
Economies of Real Time Analytics: Waiting for Data
- Today's research applications are increasingly held back by slow storage.
- When requesting data, the server spends most of its time waiting for storage.
- Application performance remains sluggish regardless of the server CPU horsepower.
- The traditional remedy of adding more DRAM or short-stroking HDDs is both expensive and inefficient.
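The claim that CPU horsepower does not help a storage-bound application can be illustrated with an Amdahl-style estimate. This is a minimal sketch, not a measurement: the 90% storage-wait fraction and the speedup factors are hypothetical numbers chosen for illustration, and the function name is an assumption.

```python
def overall_speedup(io_fraction: float,
                    cpu_speedup: float = 1.0,
                    io_speedup: float = 1.0) -> float:
    """Amdahl-style estimate of end-to-end speedup.

    io_fraction: share of run time spent waiting on storage (0..1).
    cpu_speedup / io_speedup: how much faster each part becomes.
    """
    cpu_fraction = 1.0 - io_fraction
    return 1.0 / (cpu_fraction / cpu_speedup + io_fraction / io_speedup)

# Hypothetical workload that spends 90% of its time waiting for storage.
faster_cpu = overall_speedup(0.9, cpu_speedup=2.0)       # doubling CPU speed
faster_storage = overall_speedup(0.9, io_speedup=10.0)   # 10x faster storage

print(f"2x CPU:      {faster_cpu:.2f}x overall")
print(f"10x storage: {faster_storage:.2f}x overall")
```

With these illustrative inputs, doubling CPU speed yields only about a 5% gain, while a tenfold storage improvement yields more than a fivefold gain.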
Big Data inside the Research Lifecycle: Oracle's Engineered Systems Solution
Systems: Oracle Big Data Appliance; Oracle Exadata; Oracle Exalytics; InfiniBand
Stages: Acquire, Organize, Analyze, Visualize
The Research Industry Solutions
The Research Enterprise: Research Analytics; Research Data Management; Research Administration & Control
Our goal: to support researchers, their communities and their organizations in doing better research by providing cost-effective, reliable and open solutions.
Oracle Research Analytics
A platform that enables researchers to:
- Work collaboratively on extremely large data sets, with the performance and innovative tools needed to exploit the data
- Build workflows that best support science and the operations of complex research
- Run applications and adapt them to different scientific loads and challenges
Challenges to address
- Exponential growth in data and the ability to access critical information
- Enterprise infrastructure ability to quickly accommodate new data sources
- Evolve from data analysis to predictive science
- Ability to translate raw data into information and knowledge
- Managing resources across workloads and platforms
Oracle Differentiators
- Process high-volume, low-density information
- Support flexible data structures
- In-database deep analytics
- Perform analysis on big data
- Parallel execution for efficient processing
- Deep, rich set of analytics for extracting maximum business value
Diagram labels: Research Data Management; Research Mission; Research Infrastructure; Research Ecosystem; Research Administration
Research Analytics Flow: Organization, Discovery, Visualization, Sharing
Oracle Research Analytics: Overview
Stages: Organization, Discovery, Visualization, Sharing
Key capabilities:
- High-velocity loading and organization of information
- Ability to optimize workloads and system operations
- Ingest a wide range of data types
- Data integration
- MapReduce
- Statistical tools
- Analyze data across a wide variety of data characteristics using deep analytics
- Represent analysis findings
- Transform big data into something easy to analyze
- Load data quickly
- Ability to work on extremely large data sets, allowing researchers new ways to exploit data
- Ensure trust and security
- Interoperable access to distributed repositories of data
- Open standards-based environment
Key benefits:
- Minimize development time and effort
- Ensures appropriate levels of access
- Lower cost of research
- Facilitate innovative approaches to discovery and results
- Support a deep, rich set of analytics
- Reduce time-to-discovery
- Enables new science
- Facilitate manipulation of extremely large data sets
- Maximize analytic performance and achieve faster results
- Access to the latest investigative methods & tools
- Enables cross-disciplinary science & discovery
Oracle's Integrated Big Data Solution Stack
People. Process. Portfolio.
Oracle Integrated Solution for Big Data
Components: HDFS; Oracle NoSQL Database; Enterprise Applications; Hadoop; Oracle Big Data Connectors; Data Warehouse; In-Database Analytics; Analytic Applications; Interactive Discovery
Stages: Acquire, Organize, Analyze, Decide
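To make the Hadoop-based Organize stage more concrete, here is a minimal pure-Python sketch of the map, shuffle, and reduce pattern. It is not Hadoop's API or any Oracle connector; the log format, function names, and sample data are all hypothetical.

```python
from collections import defaultdict

def map_record(line):
    """Map step: emit (path, 1) for a log line shaped like 'METHOD PATH STATUS'."""
    _method, path, _status = line.split()
    yield path, 1

def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_group(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)

def map_reduce(lines):
    """Run map, shuffle, and reduce in sequence on an in-memory list of lines."""
    pairs = (pair for line in lines for pair in map_record(line))
    return dict(reduce_group(k, v) for k, v in shuffle(pairs).items())

# Hypothetical weblog sample.
sample_log = [
    "GET /data 200",
    "GET /home 200",
    "GET /data 404",
]
hits = map_reduce(sample_log)
print(hits)
```

In a real cluster the map and reduce steps would run in parallel on many nodes over data stored in a distributed file system; here everything runs in one process to show only the data flow.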
Oracle's Big Data Solution
Components: Oracle Big Data Appliance (Cloudera Hadoop, Oracle NoSQL, open-source R, Big Data Connectors, Oracle Data Integrator); InfiniBand; Oracle Exadata (Oracle Database, Oracle Advanced Analytics, Oracle Spatial and Graph); InfiniBand; Oracle Exalytics (Oracle Business Intelligence); Endeca Information Discovery
Stages: Acquire, Organize & Discover, Analyze, Decide
Oracle Big Data Appliance: Engineered Systems for Big Data
- Pre-configured and optimized for big data processing
- 18 servers, 864 GB RAM, 648 TB storage per rack; easy rack expansion
- NoSQL, Cloudera Hadoop, Oracle R
- Oracle Loader, Oracle Data Integrator, HDFS Connector for integration
- Integrates into your existing architecture
- Streams data into Exadata at 15 TB/hour
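The rack figures above imply per-server capacities and a lower bound on transfer time. The sketch below derives them, assuming resources are spread evenly across the 18 servers and that the 15 TB/hour rate is sustained. Both are simplifying assumptions rather than statements from the slide.

```python
# Figures quoted on the slide for one rack.
SERVERS_PER_RACK = 18
RAM_GB_PER_RACK = 864
STORAGE_TB_PER_RACK = 648
STREAM_RATE_TB_PER_HOUR = 15

# Assumption: RAM and storage are divided evenly across servers.
ram_gb_per_server = RAM_GB_PER_RACK / SERVERS_PER_RACK
storage_tb_per_server = STORAGE_TB_PER_RACK / SERVERS_PER_RACK

# Assumption: the quoted streaming rate is sustained for the whole transfer.
hours_to_stream_full_rack = STORAGE_TB_PER_RACK / STREAM_RATE_TB_PER_HOUR

print(f"RAM per server:     {ram_gb_per_server:.0f} GB")
print(f"Storage per server: {storage_tb_per_server:.0f} TB")
print(f"Time to stream a full rack of raw storage: {hours_to_stream_full_rack:.1f} hours")
```

Raw storage overstates usable capacity when data is replicated, so the transfer time here is best read as an upper bound on moving one rack's worth of raw disk.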
Oracle Exadata: Engineered Systems for Systems of Record
- Fastest data warehouse & OLTP: 10X-20X faster load and query times
- 10X storage savings, 80% less power, and a lot less space
- Optimized for in-database analytics: model functions execute in storage
- Optimized for network throughput: network connections in from big data capture and out to in-memory analytics
- 1/5th to 1/8th the cost of other alternatives
Oracle Advanced Analytics: Advanced In-Database Predictive Analytics
- Comprehensive predictive analytics platform built inside the database
- Data mining, text mining
- Statistical analysis (based on R)
- Built for data analysts / scientists
- Scalable and parallel: analyzes huge volumes of data
- Tightly integrated with SQL, enabling broad usage
- Works inside Exadata and Big Data Appliance
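The idea behind in-database analytics is that the computation runs where the data lives and only the results travel back to the client. The sketch below illustrates that idea with Python's built-in sqlite3 module as a stand-in. It does not use Oracle Advanced Analytics or any Oracle API, and the table and column names are hypothetical.

```python
import sqlite3

# Stand-in database: SQLite in memory, NOT Oracle Database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (sample_id TEXT, value REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?)",
    [("a", 1.0), ("a", 3.0), ("b", 10.0), ("b", 14.0)],
)

# The aggregation runs inside the database engine; only one summary row
# per sample is returned to the client instead of every raw measurement.
summary = conn.execute(
    """
    SELECT sample_id, COUNT(*) AS n, AVG(value) AS mean_value
    FROM measurements
    GROUP BY sample_id
    ORDER BY sample_id
    """
).fetchall()

for sample_id, n, mean_value in summary:
    print(sample_id, n, mean_value)

conn.close()
```

The same principle scales up: expressing the analysis in SQL lets the database parallelize it and avoids moving large data sets to a separate analysis server.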
Oracle Exalytics: In-Memory Engineered System for Analytics
- Spans relational, multi-dimensional, and unstructured analysis, combined with financial & operational planning
- In-memory optimized hardware
- In-memory Oracle BI, TimesTen, Essbase, and Endeca
- Several in-memory software innovations
- Tightly integrated with Exadata
Oracle Information Discovery: In-Memory Unstructured & Semi-Structured Analysis
- Hybrid in-memory search / analytic engine
- Combines unstructured/structured and internal/external data (big data)
- Enables search, navigation, and discovery of data and correlations
- Highly interactive UI for discovery/exploration
Features: unified search; faceted navigation; interactive exploration; data mashup; unified indexing; text analysis
Example uses: social media analytics; customer 360 analysis; competitive intelligence
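Faceted navigation means showing, for the current result set, how many records fall under each value of an attribute, and letting the user narrow the set by picking a value. The following is a minimal in-memory sketch of that idea. It is not Endeca's engine or API, and the record fields and function names are hypothetical.

```python
from collections import Counter

def refine(records, **selected):
    """Keep only records whose fields match every selected facet value."""
    return [r for r in records
            if all(r.get(field) == value for field, value in selected.items())]

def facet_counts(records, facet):
    """Count how many records carry each value of the given facet."""
    return Counter(r[facet] for r in records if facet in r)

# Hypothetical research data sets described by a few attributes.
datasets = [
    {"discipline": "genomics", "format": "text", "year": 2011},
    {"discipline": "genomics", "format": "table", "year": 2012},
    {"discipline": "climate", "format": "table", "year": 2012},
]

by_discipline = facet_counts(datasets, "discipline")
genomics_only = refine(datasets, discipline="genomics")
formats_within_genomics = facet_counts(genomics_only, "format")

print(dict(by_discipline))
print(dict(formats_within_genomics))
```

Each refinement recomputes the counts over the narrowed set, which is what lets a user explore data interactively without knowing the schema in advance.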
Customer Success in Big Data Architecture
Customer Success: Erasmus Medical Centre
Challenges
- Complex data processing and analysis
- Load huge volumes of data in minimum time
- Store these data and the genomic DNA research results on disk
- Have an efficient system that delivers query performance
Results
Thanks to an Exadata-based solution, Erasmus Medical Centre achieved:
- An 11-minute query reduced to 1 second, a major advantage for researchers who need immediate results
- Smart Scan and flash cards: improved performance in analyzing data
- Hybrid Columnar Compression: improved ability to manipulate terabytes of data (compression from 133 GB to 11 GB), with increased performance
- Oracle Database 11g features such as partitioning: further performance gains in manipulating and quantifying data obtained through the study of various genomes
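The Erasmus figures can be restated as ratios, which makes them easier to compare with other results. The short calculation below uses only the numbers quoted on the slide; the variable names are illustrative.

```python
# Query time: 11 minutes before, 1 second after (figures from the slide).
query_before_s = 11 * 60
query_after_s = 1
query_speedup = query_before_s / query_after_s

# Hybrid Columnar Compression: 133 units before, 11 after (same unit for both).
size_before = 133
size_after = 11
compression_ratio = size_before / size_after
space_saved = 1 - size_after / size_before

print(f"Query speedup:     {query_speedup:.0f}x")
print(f"Compression ratio: {compression_ratio:.1f}:1")
print(f"Space saved:       {space_saved:.0%}")
```

This gives a 660x query speedup and a compression ratio of roughly 12:1, which is about 92% less space.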
Customer Success: Oregon State University's COAS
COAS: College of Oceanic and Atmospheric Sciences
Challenges
- To expand its infrastructure to support its leading-edge scientific research on the ocean and atmosphere's influence on the Earth's climate
- To meet the data-intensive demands of its scientific research and foster an environment that will address current and future workflows
Results
- With Oracle, COAS has an easy-to-manage, integrated system that delivers the flexibility and scalability necessary to address the exponential data increases associated with its leading-edge research, and to adjust quickly to ever-changing data availability requirements.
- As a result of extending its infrastructure with Oracle, COAS has improved data movement and performance by approximately 3 to 4 times, reduced system administration and management time, and unified research silos to gain a holistic view of integrated data sets.
- COAS can now manage its unusually large input/output (I/O) loads, enabling the computation, storage, analysis and visualization of massive data flows.
Customer Success: Indiana University
Challenges
- To provide researchers with a first-class database environment that is secure, reliable and easy to use
- To gain insight into the data rapidly and effectively by building and managing research-oriented, data-intensive applications
- To provide the tools, templates and plug-ins researchers need to easily leverage research data, enhance their findings and increase productivity
Results
- Enables research and effective data analysis in different fields
- Provides and runs a robust, secure and cost-effective research environment, protecting data and ensuring that researchers have access to state-of-the-art technology
- For additional insight into research data, provides access to Oracle Data Mining, Oracle Spatial and Oracle OLAP, delivered as Database-as-a-Service to researchers both within Indiana University and at other universities around the country