Oracle R Technologies: R for the Enterprise
Mark Hornick, Director, Oracle Advanced Analytics
@MarkHornick, mark.hornick@oracle.com
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.
What technologies does Oracle provide?
  Oracle R Distribution
  ROracle
  Oracle R Enterprise
  Oracle R Advanced Analytics for Hadoop
Oracle R Distribution
An Oracle-Supported Redistribution of Open Source R
  Enhanced linear algebra performance via dynamically loaded libraries
  Improve scalability at client and database for embedded R execution
  Enterprise support for customers of Oracle Advanced Analytics option, Big Data Appliance, and Oracle Linux
  Free download
  Oracle contributes bug fixes and enhancements to open source R
[Diagram labels: Ability to dynamically load Intel Math Kernel Library, AMD Core Math Library, Solaris Sun Performance Library; Oracle Support]
ROracle
R package enabling scalable and performant connectivity to Oracle Database
  Open source, publicly available on CRAN
  Oracle is maintainer
  Oracle Database Interface (DBI) for R
  Re-implemented and optimized driver based on OCI
  Execute SQL statements from R interface
  Enables transactional behavior for insert, update, and delete
[Diagram labels: Oracle Database; ROracle]
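The SQL execution and transactional behavior listed above might look like the following from an R session. This is a minimal sketch under stated assumptions, not Oracle's reference usage: the table DEMO_EMP, the helper insert_rows_atomically, and the ORA_USER, ORA_PASSWORD, and ORA_DBNAME environment variables are hypothetical names chosen for illustration. The demo section runs only when ROracle is installed and those variables are set, and it assumes DEMO_EMP already exists with ID and NAME columns.

```r
# Sketch only: DEMO_EMP and the ORA_* environment variables are hypothetical.

# Insert all rows of a data frame as one unit of work:
# commit if every row succeeds, roll back if anything fails.
insert_rows_atomically <- function(con, rows) {
  tryCatch({
    # ROracle binds the data frame columns to :1 and :2, one execution per row.
    rs <- DBI::dbSendQuery(
      con,
      "INSERT INTO demo_emp (id, name) VALUES (:1, :2)",
      data = rows
    )
    DBI::dbClearResult(rs)
    DBI::dbCommit(con)      # make the inserted rows permanent
    TRUE
  }, error = function(e) {
    DBI::dbRollback(con)    # undo any partial work from this transaction
    FALSE
  })
}

# Demo: skipped unless ROracle and connection settings are available.
if (requireNamespace("ROracle", quietly = TRUE) && nzchar(Sys.getenv("ORA_USER"))) {
  con <- DBI::dbConnect(
    ROracle::Oracle(),
    username = Sys.getenv("ORA_USER"),
    password = Sys.getenv("ORA_PASSWORD"),
    dbname   = Sys.getenv("ORA_DBNAME")
  )
  new_rows <- data.frame(id = c(1L, 2L), name = c("Ann", "Bob"),
                         stringsAsFactors = FALSE)
  ok <- insert_rows_atomically(con, new_rows)
  print(ok)
  print(DBI::dbGetQuery(con, "SELECT COUNT(*) AS n FROM demo_emp"))
  DBI::dbDisconnect(con)
}
```

Wrapping the insert and the commit in a single tryCatch keeps the "all rows or none" decision in one place, which is the point of exposing commit and rollback to the R user.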
Oracle R Enterprise
Oracle Advanced Analytics Option to Oracle Database
  Eliminate memory constraint of client R engine
  Minimize or eliminate data movement latency
  Leverage Oracle Database as HPC environment
  Execute R scripts through database server machine for scalability and performance
  Leverage parallel, distributed in-database data mining algorithms
  Execute and manage R scripts via SQL
  Operationalize R scripts in production applications without porting R code
  Avoid reinventing code to integrate R results into existing applications
[Diagram labels: Client R Engine with ORE packages; Oracle Database with user tables and in-db stats; SQL Interfaces (SQL*Plus, SQLDeveloper); Database Server Machine]
Oracle R Advanced Analytics for Hadoop
  Transparent access to Hadoop Cluster from R
  Manipulate data in HDFS, Hive, database, and file system
  Write and execute MapReduce jobs with R
  Leverage CRAN R packages to work on HDFS-resident data
  Prepackaged parallel, distributed algorithms
[Diagram labels: R Client running ORD with an R script and {CRAN packages}; Hadoop Abstraction Layer with HCache, R HDFS, R MapReduce, R Hive, and R sqoop/olh; Hadoop Job with Mapper and Reducer; Hadoop Cluster with MapReduce Nodes ({CRAN packages}, ORD) and HDFS Nodes; Oracle Database]
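To make the idea of writing a MapReduce job as R functions concrete, here is a small base-R illustration of the map, shuffle, and reduce steps. It runs entirely in memory and does not use Hadoop or the Oracle R Advanced Analytics for Hadoop API. The names map_reduce, word_mapper, and sum_reducer are hypothetical and exist only for this sketch.

```r
# Base-R illustration of map -> shuffle -> reduce; no Hadoop involved.
map_reduce <- function(records, mapper, reducer) {
  # Map: each record yields a list of list(key = , value = ) pairs.
  pairs  <- unlist(lapply(records, mapper), recursive = FALSE)
  keys   <- vapply(pairs, function(p) p$key, character(1))
  values <- lapply(pairs, function(p) p$value)
  # Shuffle: group all values that share a key.
  grouped <- split(values, keys)
  # Reduce: collapse each group of values to one result per key.
  lapply(grouped, reducer)
}

# Mapper: emit (word, 1) for every word in a line of text.
word_mapper <- function(line) {
  words <- strsplit(tolower(line), "[^a-z]+")[[1]]
  words <- words[nzchar(words)]
  lapply(words, function(w) list(key = w, value = 1L))
}

# Reducer: sum the counts collected for one word.
sum_reducer <- function(values) sum(unlist(values))

lines  <- c("R on Hadoop", "Hadoop jobs in R")
counts <- map_reduce(lines, word_mapper, sum_reducer)
print(counts[["hadoop"]])  # 2
```

On a real cluster the mapper and reducer would run on different nodes against HDFS-resident data; the sketch only shows how the two user-written functions divide the work.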
Analytics Pain Points, for example:
  It takes too long to get my data or to get the right data
  I can't analyze or mine all of my data; it has to be sampled
  Putting R models and results into production is ad hoc and complex
  Recoding R models into SQL, C, or Java takes time and is error prone
  Our company is concerned about data security, backup and recovery
  We need to build 10s of thousands of models fast to meet business objectives
Oracle Strategy
1. Provide choice of data management infrastructure for analytics
  Bring algorithms to data to eliminate data movement, but don't dictate data management infrastructure
  Achieve scalability and parallelism using open source R environment as interface
2. Exploit evolving technology trends for reduced time to insight
  Distributed memory and distributed computation for fast terabyte-scale analytics
  Use as much data as the business problem requires for a quality solution; avoid compromise due to tool limits
  Solve real-world problems with infrastructure on demand, using high-performance, in-memory, parallel distributed algorithms
3. Enable agility across enterprise user types
  One size doesn't fit all: GUI users, data scientists, and application developers have different needs
  Deployment simplicity and speed are critical for all analytics
4. Leverage and contribute to open source
  Facilitate deploying open source packages in production
  Enable open source algorithms to work close to where data exists
Research Prototype: FastR
  New implementation of R in Java
  Uses the new Truffle interpreter framework and Graal optimizing compiler in conjunction with the HotSpot JVM for high performance, scalability and portability
  Dynamically compiles, adaptively optimizes and deoptimizes at run time
  Joint effort: Oracle Labs (Germany, USA, Austria), JKU Linz (Austria), Purdue University (USA), TU Dortmund (Germany)
  Open-source project, GPLv2
  https://bitbucket.org/allr/fastr
To Learn More about Oracle's R Technologies
http://oracle.com/goto/r