NEC Reference Architecture for SAP HANA & Hadoop

Using NEC High-Performance Appliance for SAP HANA and NEC Data Platform for Hadoop

Table of Contents

Executive Summary
Section 1: Introduction
Section 2: Solution Overview
  NEC Appliance for SAP HANA
    Turnkey appliance with NEC's SAP HANA certified Express5800/A2040d server
    Equipped with Intel Xeon Processor E7
    Fault-management functions using EXPRESSSCOPE Engine SP3
    Enhanced reliability, availability, and serviceability (RAS) for SAP HANA delivered through NEC-Red Hat Enterprise System Collaboration
  NEC Data Platform for Hadoop (DPH)
    High Performance, Scalable and Hortonworks Certified Platform
    Reduce Total Cost of Ownership (TCO)
  Platform & Data Management Services
  Vupico's Data Analysis Service
Section 3: Benefits of Integrated Platform
Section 4: SAP HANA & Hadoop Integrated Solution Use Cases
  Use Case: Data Warehouse Optimization
  Use Case: Business Intelligence and Analytics
Section 5: Platform Integration for Analytics
  Advantage of NEC Data Platform for Hadoop Integration with SAP HANA
    Unprecedented Scalability
    Common Data Lake Platform
    Lower TCO
  Proof of Concept: Intelligent Analytics across SAP HANA and Hadoop using SAP Vora and Spark by NEC and Vupico
    Business Use Case: Reduce lost opportunities by rapid and accurate evaluation of credit score
    POC Platform Configuration - Hardware & Software
    POC System Solution Component
    Analytical Model (Use Case Implementation) and Analytical Model Results
      Use Case Background
      Use Case Implementation
      SAP Vora Interfaces and Control
Section 6: Product Information
  HDP
  SAP HANA
  SAP Vora
  Tableau

Executive Summary

Managing the large volume of data generated by enterprise operations is one of the biggest challenges organizations face today. They need a more agile system that can collect data of every type from multiple sources, structured or unstructured, and analyze it quickly and at scale. Every organization has its own operational challenges. Most, however, share common business drivers: improving operational efficiency, customer retention and satisfaction, and product quality in order to gain a competitive advantage. Additional challenges include simplifying complex data management processes, reducing cost, consolidating platforms, and placing data intelligently for better analytics. Organizations need platforms and tools that bridge the gap between business-critical data and the huge volumes of data coming from new sources.

SAP Vora has emerged as one of the technologies that address this need. It is a distributed computing solution that uses the Apache Spark framework to deliver enriched interactive analytics on the Hadoop platform. SAP Vora is an in-memory query engine that lets organizations use SQL to analyze large volumes of data from several kinds of sources:
- enterprise applications
- data warehouses
- the Hadoop data lake
- real-time streaming data from IoT devices

This whitepaper describes how NEC's big data platform, "Data Platform for Hadoop" (hereinafter DPH), integrates with the NEC SAP HANA appliance and with analytics from Vupico to score customer credit loans in real time. For this use case, Vupico designed and developed an end-to-end data pipeline. The pipeline shortens the time from loan request submission to validation from days to minutes, which helps a financial institution decide on a customer's creditworthiness in real time.

Some of the important topics covered in this whitepaper are:
- Benefits of integration between SAP HANA and the Hadoop platform
- Key use cases that NEC's integrated platform can address
- Intelligent analytics across SAP HANA and Hadoop using SAP Vora and Spark by NEC and Vupico

Section 1: Introduction

New types of data have emerged and grown rapidly in recent years. This has forced companies to re-evaluate their data strategy and embark on a profound digital transformation using big data platforms and technologies, IoT and artificial intelligence. Today, almost 80% of the data generated and stored by enterprises is unstructured. Much of it remains unanalyzed for lack of the right platform, tools and people to quickly identify its potential value. To stay competitive, an organization must link business data from traditional systems with the large volumes of data from new sources, and derive real-time insights that lead to better business outcomes. This requires a new approach that combines and correlates structured data with unstructured data from new devices, social media or sensors in a cost-effective and timely manner.

Enterprises using SAP products such as ERP and CRM have been looking for ways to lower their total cost of ownership. Deploying SAP HANA as a transactional and analytical system for storing and processing data partly addresses this goal. However, unstructured data is growing in both volume and importance for in-depth business intelligence, and this has limited the relevance of SAP HANA on its own. When data volumes increase significantly, the high cost of data storage and data management makes an SAP HANA system very expensive. As modern data architectures and frameworks have evolved, organizations have looked for open systems that run on commodity hardware and scale flexibly as demand grows. This created the need for flexible, modular infrastructure that gives clients a cost-effective platform that is easy to expand.
As a result, the industry has seen growing demand for Hadoop/Spark-based platforms that distribute the processing of large data sets across clusters of computers in real time. Many enterprises have adopted such platforms for big data analytics. Hadoop is designed to scale from a single server to thousands of machines, each offering local computation and storage. It can store and process petabytes of data in any format, and it lets organizations ingest, store, process and visualize data on a common platform.

To address the challenge of storing and analyzing data cost-effectively, NEC and its partner Vupico have introduced a use-case-based reference architecture. It combines SAP HANA and Hadoop through SAP Vora and integrates seamlessly into existing enterprise data and big data environments. NEC's large-scale distributed processing platform, "Data Platform for Hadoop" (DPH), integrates Hortonworks Data Platform (HDP) with the NEC SAP HANA appliance and extends SAP HANA's capabilities through SAP Vora.

NEC and Vupico have joined forces to create an end-to-end integrated solution that combines the power of the Hadoop platform and SAP HANA. The solution enhances SAP HANA's capabilities while lowering overall storage and processing costs. In the integrated stack, DPH lowers the cost of data storage and offloads expensive ETL processes from SAP HANA. This increases profitability, because the freed SAP HANA capacity can be used for higher-value analytical workloads. Vupico has experience building business intelligence, data processing pipelines and advanced analytics on Hadoop and the SAP ecosystem. Drawing on that experience, it developed an interactive analytics and data tiering process with SAP Vora for effective loan scoring.

Section 2: Solution Overview

The solution combines three parts:
- the NEC High-Performance Appliance for SAP HANA
- NEC's large-scale distributed processing platform, DPH
- Vupico's analytical services

Together, these let organizations use Hadoop alongside SAP solutions to analyze and process very large volumes of data from many varied, structured and unstructured sources. This section describes SAP HANA systems and their integration with Hadoop and related technologies such as Spark and SAP Vora. The goal of that integration is to uncover business insights locked inside unstructured and underused information.

2.1 NEC Appliance for SAP HANA

NEC offers an end-to-end turnkey appliance for quick and easy deployment, built on NEC's SAP HANA certified Express5800/A2040d server. The server delivers fast processing performance designed for real-time analysis of big data and other applications.
The NEC appliance for SAP HANA combines SAP's innovative in-memory computing technology with NEC's dependable hardware platform and a host of other features. The result is high performance, high availability and ease of management. Some of the important features of the NEC SAP HANA appliance are:

Turnkey appliance with NEC's SAP HANA certified Express5800/A2040d server: The appliance is built on the NEC Express5800/A2040d scalable enterprise server. This scale-up server provides a massive resource pool for compute-intensive and memory-hungry applications in mission-critical and virtualized environments. It supports up to 4 processors with 96 cores (192 threads), 4TB of memory and 16 PCIe 3.1 slots.

Equipped with Intel Xeon Processor E7: The NEC SAP HANA appliance with the Intel Xeon Processor E7 v4 family offers the highest level of performance, availability and scalability. This makes it an ideal platform for mission-critical and database applications.

Fault-management functions using EXPRESSSCOPE Engine SP3: The NEC Express5800/A2040d is equipped with the NEC EXPRESSSCOPE Engine SP3, a purpose-built baseboard management controller that provides extensive remote management capabilities. Its Built-In Diagnostics (BID) can locate a failure down to an individual CPU core. This allows more in-depth failure analysis than is possible on regular IA servers. At startup, BID checks the health of the CPU and memory and also checks the I/O paths, which reduces the risk of failure once the system is in operation.

Enhanced reliability, availability, and serviceability (RAS) for SAP HANA delivered through NEC-Red Hat Enterprise System Collaboration: Even before the advent of in-memory systems, NEC collaborated with Red Hat on enterprise systems that delivered dynamic processing and memory functionality. As a result of this collaboration, faulty components can be removed from operation and system resources reallocated without a system outage, through standardized system calls to Red Hat Enterprise Linux. The NEC Express5800/A2040d offers the RAS features needed to support business-critical enterprise workloads and avoid SAP HANA downtime.

NEC's SAP HANA offering is available both as SAP certified appliances and through Tailored Datacenter Integration (TDI). TDI gives SAP HANA customers wider choice, because they can reuse their existing hardware components for their SAP HANA environment, provided those components are SAP HANA certified.
For a list of NEC's certified appliances for SAP HANA, refer to the online documentation at:

NEC Data Platform for Hadoop (DPH)

NEC DPH is a large-scale distributed processing platform that combines structured and unstructured data to deliver batch and real-time processing on one common platform. DPH is a pre-designed and pre-validated platform. It consists of NEC's world-class hardware optimized for big data workloads, the RHEL operating system and Hortonworks Hadoop. It is supported by a range of services such as platform integration, data management and analytics.

The solution analyzes various forms of unstructured data, such as text, images, audio and video, alongside traditional structured data sources. It does so through high-speed parallel processing and extreme density, providing a complete solution for big data utilization.

High Performance, Scalable and Hortonworks Certified Platform
NEC DPH is a modular infrastructure platform. It helps organizations accelerate business insights through rapid deployment and gives them unprecedented scalability as the volume, variety and velocity of their data and associated processes grow. The platform is built on NEC Express5800 series servers. Master nodes are 1U rack servers and worker nodes are 2U storage-rich servers, so compute and storage scale together. DPH is Hortonworks certified and optimized for Hadoop workloads. It also offers power efficiency and intelligent fan control that supports operation even at temperatures as high as degree Celsius.

Reduce Total Cost of Ownership (TCO)
NEC DPH is certified with HDP as a big data appliance, built and optimized for big data workloads. It is a pre-designed and pre-validated Hadoop platform that integrates hardware and HDP to shorten deployment time and reduce TCO. DPH stores and analyzes both structured and unstructured data, such as sensor data and SNS data, through batch and real-time processing on a single platform. It also lowers the additional expenditure needed to derive new business insights, so organizations can take appropriate action in real time and improve business performance. Finally, it provides pre-validated and certified upgrade paths, so customers can always run the latest Hadoop version with its updated features.

Platform & Data Management Services
NEC offers a range of data management services covering the entire life cycle of big data and analytics. These services help organizations plan, design and implement optimized infrastructure. They also support each phase of the data lifecycle: data ingestion, integration, security, data classification, tiered storage and delivery. NEC offers single-vendor support, all in one place, for platform design and deployment, upgrades, expansion, and product and operations support.

Vupico's Data Analysis Service
Vupico is an analytics consulting company that provides modern solutions in business intelligence, technology innovation, big data, machine learning and predictive analytics. Vupico specializes in helping clients turn data into business value through actionable analytics and insights. Its services center on modern architectures and the latest technologies, integrating big data, IoT, SAP HANA, Hadoop and predictive analytics into an information platform. Its consulting services help customers solve business problems through data analytics. Vupico has extensive experience delivering data-driven solutions across industries and verticals. It has expertise in end-to-end dataflows that ingest data from multiple sources and combine the best of Hadoop and SAP solutions.

NEC and Vupico have together designed a proof of concept that integrates NEC DPH and the SAP HANA appliance with Vupico's credit scoring analytics use case. Vupico developed the business use case and implemented a data pipeline that shortens the time between loan request submission and validation from days to minutes. To support better and more effective decisions, Vupico also:
- Identified the features that determine the creditworthiness of a loan applicant
- Defined an optimal machine learning prediction model to handle patterns that don't fit traditional linear regression models
- Proposed a credit score approach that leaves borderline cases to a human decision
- Implemented the flow from data ingestion through processing and storage up to reporting, optimizing data throughput with in-memory processing in Apache Spark and storage in SAP Vora and SAP HANA
- Built a comprehensive set of dashboards to support quick decision making

Section 3: Benefits of Integrated Platform

SAP HANA and DPH are two distinct solutions, each with its own strengths, and they show enormous potential when deployed together. The SAP HANA in-memory platform lets businesses analyze mass data in near real time. DPH overcomes cost and storage limitations with unprecedented scalability. Integrating DPH and SAP HANA therefore combines the advantages of both. The result is a platform that can process huge amounts of structured and unstructured data while running complex analytic processing at high speed.

Data is growing in volume and in variety, from conventional structured data to unstructured data whose potential has only recently been recognized. This has made integrating SAP HANA with a Hadoop-based platform a compelling business case. Frameworks such as Spark process the unstructured data in Hadoop and store the results as structured data in SAP HANA using Hive adapters. Because DPH runs on commodity hardware, it reduces data storage cost. Cold data sets from SAP HANA can be archived on DPH, which provides the required scalability and lowers the overall solution cost.

Some of the key benefits organizations gain from this integrated solution are:
- Customized promotional offers: companies combine social media data and logs with the CRM data in SAP HANA, and base offers on analysis of CRM and clickstream data together
- Preventive maintenance for equipment at remote locations: the sensor data (unstructured) received from the equipment is combined with its procurement date and maintenance schedule (structured)
- Offloading data and expensive processes from SAP HANA to the integrated platform, which overcomes processing bottlenecks and increases capacity, speed and flexibility
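The preventive maintenance scenario above can be illustrated with a minimal sketch that joins sensor readings with structured maintenance records. It is only an illustration under stated assumptions, not part of the NEC or Vupico solution. The field names (`equipment_id`, `vibration`), the `MaintenanceRecord` type and both thresholds are hypothetical. In a real deployment, this join would run in Spark or SAP HANA rather than in plain Python.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class MaintenanceRecord:
    """Structured master data (hypothetical schema), e.g. held in SAP HANA."""
    equipment_id: str
    procurement_date: date
    next_service: date


def flag_for_maintenance(readings, records, today,
                         vibration_limit=7.0, max_age_years=10):
    """Return sorted IDs of equipment that looks due for servicing.

    readings: parsed sensor messages (dicts), e.g. landed in Hadoop.
    records: MaintenanceRecord objects (structured data).
    The thresholds are illustrative assumptions, not values from the source.
    """
    by_id = {r.equipment_id: r for r in records}
    flagged = set()
    for reading in readings:
        rec = by_id.get(reading.get("equipment_id"))
        if rec is None:
            continue  # sensor without master data: nothing to join against
        overdue = today >= rec.next_service
        age_years = (today - rec.procurement_date).days / 365.25
        too_old = age_years > max_age_years
        abnormal = reading.get("vibration", 0.0) > vibration_limit
        if overdue or too_old or abnormal:
            flagged.add(rec.equipment_id)
    return sorted(flagged)


readings = [
    {"equipment_id": "P-1", "vibration": 9.2},  # abnormal reading
    {"equipment_id": "P-2", "vibration": 2.1},  # normal, but service overdue
    {"equipment_id": "P-3", "vibration": 1.0},  # normal and on schedule
]
records = [
    MaintenanceRecord("P-1", date(2015, 1, 1), date(2019, 1, 1)),
    MaintenanceRecord("P-2", date(2016, 1, 1), date(2017, 6, 1)),
    MaintenanceRecord("P-3", date(2017, 1, 1), date(2019, 1, 1)),
]
print(flag_for_maintenance(readings, records, today=date(2018, 1, 1)))
```

Here P-1 is flagged by its sensor reading and P-2 by its overdue schedule, which shows how each side of the join contributes to the decision.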

Section 4: SAP HANA & Hadoop Integrated Solution Use Cases

Integrating SAP HANA with Hadoop can help customers save significant costs and unlock potential value. The table of use cases that can be offered on the integrated platform groups them under three headings:
- Data Warehouse (DWH) Optimization
- DWH & Data Lake
- Business Intelligence & Analytics

The table covers the following scenarios:
- DPH as a data staging and landing area
- operational data store migration
- DPH as an active archive
- batch processing
- common storage for different types and sources of data
- ETL and visualization
- batch, real-time and interactive processing
- data exploration and visualization
- interactive processing

4.1. Use Case: Data Warehouse (DWH) Optimization

Transforming a legacy DWH architecture to support real-time data processing is a massive project. The modernization may involve hardware upgrades, changes to data models, or new platforms added to the environment as an extended arm of the existing DWH. An optimized DWH solution has the following key ingredients:
- A data pipeline for structured, semi-structured and unstructured data, capable of ingesting and storing voluminous data from a variety of disparate sources
- Horizontal scalability and elasticity with open source technologies to reduce costs
- Enterprise data warehouse storage augmented with Hadoop and Hive
- Flexible data organization that enables schema-on-read
- Support for advanced analytics scenarios without copying or migrating data to multiple systems
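Schema-on-read, listed among the ingredients above, means raw data is stored as received and a schema is applied only when it is queried. The minimal Python sketch below, using only the standard library, shows the idea. The event fields and the two "views" are hypothetical. On DPH the same principle is applied with Hive or Spark table definitions over files in HDFS, not with this code.

```python
import json

# Raw events land in the data lake exactly as received; no schema is enforced on write.
RAW_EVENTS = [
    '{"customer": "C1", "amount": "120.50", "channel": "web"}',
    '{"customer": "C2", "amount": 80, "extra": {"device": "mobile"}}',
    'not valid json',
]


def read_with_schema(lines, schema):
    """Apply a schema (field name -> converter) only at read time."""
    rows = []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # malformed input is skipped at read time, not rejected at load time
        if not isinstance(record, dict):
            continue  # valid JSON but not an object: no fields to project
        row = {}
        for field, convert in schema.items():
            value = record.get(field)
            # Missing fields become None; fields outside the schema (e.g. "extra") are ignored.
            row[field] = convert(value) if value is not None else None
        rows.append(row)
    return rows


# Two different schemas applied to the same raw data.
billing_view = read_with_schema(RAW_EVENTS, {"customer": str, "amount": float})
channel_view = read_with_schema(RAW_EVENTS, {"customer": str, "channel": str})
print(billing_view)
print(channel_view)
```

The trade-off is that errors surface at query time rather than at load time. In exchange, new sources can be ingested immediately and each consumer can project only the fields it needs.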

High-level architecture of DPH and data warehouse (SAP HANA) integrated solution

Extending SAP HANA with DPH lets end users and data scientists consume the information they need transparently from the same user interface, whether it resides in SAP HANA or in DPH, without compromising performance. The combined solution offers many features, yet many of its uses are simple and deliver compelling results. DWH optimization is one of these benefits, and its return on investment is immediate and easy to quantify.

Use Case: Business Intelligence and Analytics

Enterprise operational data and unstructured data each have unique value, and both are critical for business decisions. Spark with Hadoop offers real-time processing along with cost-effective storage and management of large volumes of unstructured data. However, combining data in one place has always been a challenge, because it requires access both to unstructured data and to the operational and business data held in data warehouses. Combining SAP HANA with Hadoop brings both kinds of data together. Data tiering allows data to be stored across SAP HANA and Hadoop. Combining the two systems provides a single logical view that ties business and operational data in SAP HANA to big data in Hadoop, so data can be analyzed interactively. Data scientists can access both data sets without moving data between the two systems. With this approach, they can build structured data hierarchies over the unstructured data in Hadoop and integrate them with data from SAP HANA. They can then use SAP Vora through the Spark SQL interface to run OLAP-style in-memory analysis on the combined data for better visualization.

Section 5: Platform Integration for Analytics

Organizations are increasingly looking for ways to use SAP HANA as a strategic platform and integrate it with big data platforms such as Hadoop. This enables new analytics capabilities and lowers total cost of ownership. NEC and Vupico help customers design and integrate an end-to-end solution using NEC DPH (a Hadoop big data appliance), the NEC SAP HANA appliance and SAP Vora. This integrated solution can store and process data according to its value (hot, warm or cold) and run analytics from a common platform.

Advantage of NEC DPH Integration with SAP HANA

NEC DPH helps customers lower total cost of ownership by enabling data and workload consolidation, and it scales at minimal cost as demand for processing or storage grows.

High-level design for integrated platform
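Storing data according to its value, as described above, can be sketched as a simple age-based tier assignment. This is a minimal illustration that assumes age is the measure of a record's value. The day thresholds and the mapping of tiers to SAP HANA, SAP Vora on DPH and an HDFS archive are hypothetical. In practice, the customer's own data lifecycle rules, for example in SAP HANA DLM, would decide placement.

```python
from datetime import date

# Hypothetical placement of each temperature tier (an assumption for illustration).
PLACEMENT = {
    "hot": "SAP HANA (in memory)",
    "warm": "SAP Vora on DPH",
    "cold": "HDFS archive on DPH",
}


def assign_tier(created, today, hot_days=90, warm_days=3 * 365):
    """Classify a record as hot, warm or cold by its age in days.

    The default thresholds are illustrative, not values from NEC, Vupico or SAP.
    """
    age_days = (today - created).days
    if age_days <= hot_days:
        return "hot"
    if age_days <= warm_days:
        return "warm"
    return "cold"


today = date(2018, 2, 1)
for created in (date(2018, 1, 1), date(2016, 1, 1), date(2010, 1, 1)):
    tier = assign_tier(created, today)
    print(created, tier, "->", PLACEMENT[tier])
```

Keeping the thresholds as parameters reflects the point that the hot/warm/cold boundaries are a business decision rather than a property of the platform.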

Some of the key advantages are:
- Unprecedented Scalability: NEC DPH lets customers start small and scale the analytics platform one node at a time as demand grows.
- Common Data Lake Platform: Customers can consolidate multiple analytics platforms onto a common data lake platform from NEC, giving them a single data access platform. Organizations can move workloads running on multiple clusters onto a single platform and eliminate duplicate copies of data, which directly results in large cost savings.
- Lower TCO: Consolidating data from multiple clusters and costly data warehouse systems onto a cost-effective data platform lets organizations distribute workloads effectively and reduce total cost of ownership.

Proof of Concept: Intelligent Analytics across SAP HANA and Hadoop using SAP Vora and Spark by NEC and Vupico

NEC and its solution partner Vupico jointly designed and developed an end-to-end data pipeline that shortens the time from loan request submission to validation from days to minutes. It combines in-memory processing and in-memory data storage: Apache Spark runs on the NEC Hadoop platform and is connected to the SAP HANA appliance through SAP Vora.

Business Use Case: Reduce lost opportunities by rapid and accurate evaluation of credit score

A financial organization offering loan services was losing business opportunities because its paper-based credit examination process could take several days. NEC and Vupico streamlined this process by integrating the NEC SAP HANA appliance with the Hadoop platform and SAP Vora. On top of this integration, they implemented a flow based on a machine learning model that assesses an applicant's capability to repay. The model delivers a score with 98% accuracy within minutes.
The model calculates a credit score on a scale from 225 to 900. A low score indicates a high-risk borrower and a high score a low-risk borrower. The score is based on multiple dimensions, such as:
- the applicant's past and present financial situation
- employment status
- assets owned
- the amount and purpose of the loan
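The score scale described above can be illustrated with a short sketch. It assumes the model outputs a probability of repayment and maps it linearly onto the 225 to 900 range. The linear mapping and the risk-band cut-offs (700 and 500) are hypothetical; the source does not state how Vupico's model derives the score or where the bands lie.

```python
SCORE_MIN, SCORE_MAX = 225, 900  # score range stated for the POC model


def to_credit_score(p_repay):
    """Map an assumed model output (probability of repayment) onto 225-900."""
    if not 0.0 <= p_repay <= 1.0:
        raise ValueError("p_repay must be a probability in [0, 1]")
    return round(SCORE_MIN + p_repay * (SCORE_MAX - SCORE_MIN))


def risk_band(score, low_risk_from=700, high_risk_below=500):
    """Label a score as low, moderate or high risk (hypothetical cut-offs)."""
    if score >= low_risk_from:
        return "low"
    if score < high_risk_below:
        return "high"
    return "moderate"  # borderline case, left to a human decision


for p in (0.2, 0.6, 0.9):
    score = to_credit_score(p)
    print(p, score, risk_band(score))
```

A linear mapping is the simplest choice for illustration. Production scorecards often scale the log-odds of default instead, so that a fixed number of points doubles the odds.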

End-to-end solution designed by NEC & Vupico

The high-level solution implemented in this POC addresses the analytics challenge by ingesting data from multiple sources into Hadoop. SAP Vora bridges the gap between the operational, high-value data in SAP HANA and all the structured and unstructured data in Hadoop. Using SAP Vora with Spark simplified data access between Hadoop and SAP HANA, so only recent data needs to reside in SAP HANA for in-memory processing.

A high-level solution overview implemented for the POC

5.2. POC Platform Configuration - Hardware & Software

The NEC High-Performance Appliance for SAP HANA and NEC Data Platform for Hadoop provide the building blocks for deploying both SAP HANA and SAP Vora optimally for the customer's business needs. NEC and Vupico have worked together to simplify the infrastructure design for customers and to provide a solution that helps drive new business models and insights. This white paper focuses on a predictive analytics use case, customer loan scoring, implemented by integrating SAP Vora with Spark and Hadoop on the following infrastructure:

- NEC High-Performance Appliance for SAP HANA running SAP HANA and DLM (Data Lifecycle Management)
- NEC Big Data Appliance "Data Platform for Hadoop" running Hortonworks HDP for Hadoop, Spark and SAP Vora

Example of SAP HANA and Hadoop integrated stack

POC System Solution Component

Below are the high-level system components and the hardware configuration used for the SAP HANA and Hadoop integration, along with some of the key use cases.

Hardware configuration of the platform for implemented POC

The detailed server configuration and the components installed on each node are listed below (System / Server Configuration / Component Description).

2 x Hadoop Master Controller
  Server: Express5800/R120e-1M
  CPU: 2x E5-2650v2 (2x 8C/2.60GHz)
  RAM: 128GB
  OS disk: 2x 100GB HDD (RAID1)
  Network: 2x 10GbE (2p)
  Hadoop: Hortonworks Data Platform 2.6
  OS: Red Hat Enterprise Linux 7.2
  Components: Ambari Server, App Timeline Server, History Server, Metrics Collector, Grafana, NameNode (HA), ResourceManager (HA), ZooKeeper, JournalNode, Metrics Monitor, ZKFailoverController and HDP clients

3 x Hadoop Worker Controller
  Server: Express5800/R120f-2E
  CPU: 2x E5-2620v3 (2x 6C/2.40GHz)
  RAM: 256GB
  OS disk: 2x 1TB HDD (OS, RAID1)
  Data disk: 12x 4TB HDD (data, JBOD)
  Network: 2x 10GbE (2p)
  Hadoop: Hortonworks Data Platform 2.6
  OS: Red Hat Enterprise Linux 7.2
  Components: ZooKeeper, DataNode, JournalNode, Metrics Monitor, NodeManager and HDP clients

SAP HANA Appliance
  Server: Express5800/A2040b
  CPU: 4x E7-4890v2 (60 cores in total, 2.8GHz)
  RAM: 1TB
  OS/data disk: 8x 900GB HDD (OS/data)
  Network: 2x 10GbE (2p)
  SAP: HANA 2.0 SPS02
  OS: Red Hat Enterprise Linux 7.2 (for SAP HANA)
  Components: SAP HANA 2.0 SPS02

Tableau Server
  Server: Express5800/R120e-1M
  CPU: 2x E5-2650v2 (2x 8C/2.60GHz)
  RAM: 128GB
  OS disk: 2x 100GB HDD (RAID1)
  Network: 2x 10GbE (2p)
  OS: Red Hat Enterprise Linux 7.2
  Components: Tableau Server, Tableau Desktop, Hive ODBC driver
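The worker configuration above makes it possible to estimate how much HDFS capacity each added node contributes. The estimate below is a rough back-of-the-envelope sketch. It uses the 12 x 4TB data disks per worker from the configuration list and the HDFS default replication factor of 3 (`dfs.replication`). The 25% headroom reserved for temporary data and growth is an assumption, not a figure from the POC.

```python
def usable_hdfs_tb(workers, disks_per_worker=12, disk_tb=4.0,
                   replication=3, headroom=0.25):
    """Rough usable HDFS capacity in TB for a given number of worker nodes.

    Defaults: 12 x 4TB data disks per worker (POC spec), HDFS default
    replication factor 3, and an assumed 25% headroom for temporary data.
    """
    raw_tb = workers * disks_per_worker * disk_tb
    return raw_tb / replication * (1.0 - headroom)


for n in (3, 4, 5):
    print(n, "workers:", usable_hdfs_tb(n), "TB usable")
```

Under these assumptions, the three-worker POC cluster offers 144TB raw, or about 36TB of usable space. Each additional worker adds about 12TB, which is what the "add one node at a time" scaling model looks like in practice.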

5.3. Analytical Model (Use Case Implementation) and Analytical Model Results

Use Case Background

In today's competitive environment, organizations providing loan services must innovate their lending services to maximize opportunities and reach under-served customers and verticals. Customers see a loan request as a complex, stressful and lengthy process. They must provide a lot of information and justification to prove they deserve credit. One of the key benefits a lender can offer is a seamless process that limits the applicant's stress by shortening the time between submitting an application and receiving a response. With this in mind, NEC and Vupico brought their extensive experience in mass data processing, machine learning and business process re-engineering to a financial institution. The institution wanted to learn how redesigning its scoring system around big data could shorten its scoring pipeline from days to hours or minutes.

Use Case Implementation

To implement the credit scoring use case, Vupico applied its analytic framework to first identify 21 major features that determine the creditworthiness of a loan applicant. Predicting creditworthiness from these features is a classification problem. Machine learning can handle such problems with several algorithms, each with its own merits and drawbacks. Based on past experience and the non-linear nature of the data, Vupico quickly concluded that traditional algorithms such as logistic regression would not perform well. It shortlisted two other algorithms for a performance comparison, Naïve Bayes and Random Forests. Both produce well-defined decision rules that are learned iteratively during training and can then be applied to predict outcomes for new applicants.
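To show how one of the two shortlisted algorithms classifies an applicant, here is a minimal categorical Naïve Bayes sketch in plain Python. It is not Vupico's model. It uses three hypothetical features instead of the 21 identified in the POC and a tiny invented data set, and it leaves out Random Forests entirely. Add-one (Laplace) smoothing keeps feature values that were never seen for a class from zeroing out the probability.

```python
import math
from collections import Counter, defaultdict


def train_nb(rows, labels):
    """Fit a categorical Naive Bayes model by counting values per class."""
    class_counts = Counter(labels)
    # feature_counts[label][i][value]: how often feature i took `value` in class `label`
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    distinct = defaultdict(set)  # distinct values observed for each feature index
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[label][i][value] += 1
            distinct[i].add(value)
    return class_counts, feature_counts, distinct


def predict_nb(model, row):
    """Return the class with the highest log posterior, using add-one smoothing."""
    class_counts, feature_counts, distinct = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior P(label)
        for i, value in enumerate(row):
            count = feature_counts[label][i][value]
            # Laplace-smoothed log likelihood P(value | label)
            score += math.log((count + 1) / (n + len(distinct[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: (employment, housing, loan purpose) -> outcome
rows = [
    ("employed", "owner", "car"),
    ("employed", "renter", "car"),
    ("employed", "owner", "home"),
    ("unemployed", "renter", "travel"),
    ("unemployed", "renter", "car"),
    ("self-employed", "renter", "travel"),
]
labels = ["repaid", "repaid", "repaid", "default", "default", "default"]
model = train_nb(rows, labels)
print(predict_nb(model, ("employed", "owner", "car")))
print(predict_nb(model, ("unemployed", "renter", "travel")))
```

Because each feature contributes a separate, inspectable likelihood term, the model's reasoning can be explained feature by feature. This ties in with the transparency goal discussed for the POC.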

Vupico created a clear and transparent view of the components and rationale behind the decisions made by the machine learning algorithm. This transparency is highly valued, because credit scoring systems are too often black boxes for customers, who simply have to trust whatever happens inside them. Instead of a hard yes/no decision on whether an applicant is granted a loan, Vupico labeled applicants as high, moderate or low risk based on the predictions. It also generated a FICO-like score on a scale of 225 to 900. This enables manual assessment of borderline cases and allows processing of loan applicants who would have been rejected by traditional criteria-based models.

After analyzing the data volume and potential throughput requirements, Vupico and NEC chose an architecture with an upstream integration that ingests and processes multiple loan applications at a time from the current systems. The pipeline uses Apache Kafka to decouple processing from data producers and consumers and to buffer messages. This makes it easy to move to a stream-oriented processing architecture when required. Apache NiFi plays the critical role of managing the incoming dataflow and orchestrating the execution of Apache Spark jobs.

High-level workflow for platform

Downstream, data visualization and dashboards were implemented in Tableau so that the customer can explore and analyze the data on its own. This was possible because of the in-memory capabilities of both SAP HANA and SAP Vora. Both were fed with the scored loan applications, giving the customer complete control over its operations. To offload data from SAP HANA and save storage cost on the SAP HANA system, a process was put in place that retains only the latest 3 years of data in SAP HANA. The rest of the historical data is transferred to SAP Vora on the Hadoop cluster. Dashboards were built in Tableau, and a calculation view in SAP HANA combines the data stored locally in SAP HANA with the data in SAP Vora. Users can therefore query not only the last 3 years but the whole data set within acceptable processing time.

SAP Vora Interfaces and Control

SAP Vora can be controlled in two ways: through the web interfaces provided, or programmatically in Spark through the SapSQLContext. Note that SapSQLContext also works with SAP HANA, so users can, for instance, load data from Hadoop into SAP HANA through Spark. SAP Vora has two main web interfaces. The first, SAP Vora Manager, is used to manage SAP Vora services. With it, users can:
- start and stop the different services
- delete all the data currently in memory
- configure the services
- assign which node is responsible for which service
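The 3-year retention rule and the combined query over both stores, described above, can be sketched in a few lines. In this minimal illustration, in-memory Python lists stand in for the SAP HANA table and the SAP Vora table, and the union function plays the role of the calculation view. None of this is the SAP HANA, SAP Vora or SapSQLContext API. The record fields (`id`, `date`, `score`) and the sample data are hypothetical.

```python
from datetime import date


def years_before(d, years):
    """Same calendar day `years` earlier (Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)


def split_by_retention(records, today, years=3):
    """Split records into (hot, historical) at a `years`-year cutoff.

    hot -> stays in the SAP HANA stand-in; historical -> moves to the SAP Vora stand-in.
    """
    cutoff = years_before(today, years)
    hot = [r for r in records if r["date"] >= cutoff]
    historical = [r for r in records if r["date"] < cutoff]
    return hot, historical


def query_all(hot, historical, predicate):
    """Query both stores as one logical data set, like a union in a calculation view."""
    return [r for store in (hot, historical) for r in store if predicate(r)]


loans = [
    {"id": "L1", "date": date(2018, 1, 10), "score": 450},
    {"id": "L2", "date": date(2015, 6, 1), "score": 700},
    {"id": "L3", "date": date(2014, 12, 31), "score": 480},
    {"id": "L4", "date": date(2012, 3, 3), "score": 820},
]
hot, historical = split_by_retention(loans, today=date(2018, 6, 1))
risky = query_all(hot, historical, lambda r: r["score"] < 500)
print([r["id"] for r in hot], [r["id"] for r in historical], [r["id"] for r in risky])
```

The query over all loans finds risky applications in both the recent and the historical data, which is what lets users look beyond the last 3 years without keeping everything in SAP HANA memory.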

The second interface, SAP Vora Tools, lets users manually perform most of the operations found in the SQL modeler. These include creating or dropping tables and views, executing SQL queries, and manually loading data into SAP Vora from the Hadoop Distributed File System (HDFS). Tables in SAP Vora load their data from HDFS. However, if the data was loaded from an ORC file and that file is later changed or updated, the update is not reflected in SAP Vora. In such cases, the file must either be reloaded manually from the SAP Vora Tools interface, or the data must be appended to the SAP Vora table by a Spark job.

Section 6: Product Information

6.1. Hortonworks Data Platform

Hortonworks Data Platform (HDP), powered by Apache Hadoop, is a massively scalable, 100% open source platform for storing, processing and analyzing large volumes of data. It is designed to handle data from many sources and formats quickly, easily and cost-effectively. HDP consists of the essential set of Apache Hadoop projects, including MapReduce, HDFS, HCatalog, Pig, Hive, HBase, ZooKeeper and Ambari. HDFS provides scalable, fault-tolerant, cost-efficient storage for the big data lake. YARN provides the centralized architecture that lets multiple workloads run simultaneously, with resource management and a pluggable architecture that enables a wide variety of data access methods. Hortonworks contributes to the Apache Hadoop project by committing code and proposing solutions to issues, and it strives to deliver the most advanced Hadoop at the right time.

Hortonworks Data Platform overview (Source: Hortonworks Inc.)

6.2. SAP HANA

SAP HANA is an in-memory, column-oriented, relational database management system. It provides a single platform with application building blocks for database, processing, integration and application services. SAP HANA offers significant performance benefits over conventional database platforms for both Online Analytical Processing (OLAP) and Online Transaction Processing (OLTP). It also provides application server and ETL capabilities as well as advanced analytics. The system can scale up or scale out to handle in-memory processing of terabytes of data.

Additionally, SAP HANA supports data tiering to manage data storage cost and processing at the database storage layer. Data tiering extends the platform so that data and its processing can be distributed intelligently to a low-cost, scalable platform. It does this by moving warm and cold data out of memory to an alternative disk-based solution such as Hadoop.

SAP HANA Overview (Source: SAP)

6.3. SAP Vora

SAP Vora provides contextually aware analytics by integrating the SAP HANA platform seamlessly with Hadoop. SAP Vora is an in-memory query engine that brings powerful contextual analytics across all data stored in Hadoop, enterprise systems and other distributed data sources. It lowers TCO by delivering low-cost, faster analytics on huge data sets.

SAP Vora extends Spark with richer SQL capabilities. It is an extended Spark execution framework that accelerates results by loading and processing Hadoop data and tables in memory. SAP Vora provides a simple graphical interface to model data and build star schemas, which helps boost SQL performance. Additionally, it can help build hierarchies and drill down into Hadoop data, which is otherwise difficult to achieve.

SAP Vora overview (Source: SAP)

SAP Vora bridges the gap between SAP HANA and Hadoop. It enables customers to run several key business use cases on an integrated platform at lower cost.

6.4. Tableau

Tableau is an interactive data visualization tool that enables users to create interactive, fit-for-purpose visualizations in the form of dashboards and worksheets to gain business insights. It allows users to easily create customized dashboards that provide insight into a broad spectrum of information. The characteristics of Tableau are as follows:
- It uses the patented VizQL technology for visualization, which makes data much easier to understand.
- Its intuitive interface and exceptional ease of use make it faster and simpler to gain new insights.
- With Tableau's server functionality, users can publish and share their visualizations so that anyone can use them.

Contact Us

NEC Corporation
VUPICO LLC
JAPAN: DEUX TOURS EAST 45F E Harumi, Chuo-ku, Tokyo
SINGAPORE: 31, St Thomas Walk 0403, St Thomas Suites Singapore
AUSTRALIA: 607/17 Grattan Close Glebe NSW, 2037
INDIA: 305 Adiya Trade Centre Ameerpet Hyderabad

NEC IS A HORTONWORKS CERTIFIED TECHNOLOGY PARTNER

Hortonworks, the Hortonworks logo and other Hortonworks trademarks are trademarks of Hortonworks Inc. in the United States and other countries. Apache, Hadoop, Falcon, Atlas, Tez, Sqoop, Flume, Kafka, Pig, Hive, HBase, Accumulo, Storm, Solr, Spark, Ranger, Knox, Ambari, ZooKeeper, Oozie, Phoenix, NiFi, Zeppelin, Slider, MapReduce, HDFS, YARN, the Hadoop elephant, and Apache project logos are either registered trademarks or trademarks of the Apache Software Foundation in the United States and other countries. SAP, the SAP logo, SAP HANA, SAP Vora, and other SAP products are trademarks or registered trademarks of SAP AG in Germany and several other countries. Tableau and all Tableau products mentioned in this document are trademarks or registered trademarks of Tableau Software Inc. Red Hat and Red Hat Enterprise Linux are trademarks of Red Hat, Inc., registered in the U.S. and other countries. Intel, the Intel logo, Intel Inside, the Intel Inside logo, and other Intel products are trademarks or registered trademarks of Intel Corporation in the United States and other countries. All other product and service names mentioned are the trademarks of their respective companies.


Disclaimer This presentation may contain product features that are currently under development. This overview of new technology represents no commitme PBO1046BES Simplifying the Journey To The Software Defined Datacenter Tikiri Wanduragala Senior Consultant Data Center Group, Lenovo EMEA VMworld 2017 Geoff Hunt Senior Product Manager Data Center Group,

More information

7 Things ISVs Must Know About Virtualization

7 Things ISVs Must Know About Virtualization 7 Things ISVs Must Know About Virtualization July 2010 VIRTUALIZATION BENEFITS REPORT Table of Contents Executive Summary...1 Introduction...1 1. Applications just run!...2 2. Performance is excellent...2

More information

Top 4 considerations for choosing a converged infrastructure for private clouds

Top 4 considerations for choosing a converged infrastructure for private clouds Top 4 considerations for choosing a converged infrastructure for private clouds Organizations are increasingly turning to private clouds to improve efficiencies, lower costs, enhance agility and address

More information

Lenovo Database Configuration

Lenovo Database Configuration Lenovo Database Configuration for Microsoft SQL Server OLTP on Flex System with DS6200 Reduce time to value with pretested hardware configurations - 20TB Database and 3 Million TPM OLTP problem and a solution

More information

Introducing SUSE Enterprise Storage 5

Introducing SUSE Enterprise Storage 5 Introducing SUSE Enterprise Storage 5 1 SUSE Enterprise Storage 5 SUSE Enterprise Storage 5 is the ideal solution for Compliance, Archive, Backup and Large Data. Customers can simplify and scale the storage

More information

MODERN BIG DATA DESIGN PATTERNS CASE DRIVEN DESINGS

MODERN BIG DATA DESIGN PATTERNS CASE DRIVEN DESINGS MODERN BIG DATA DESIGN PATTERNS CASE DRIVEN DESINGS SUJEE MANIYAM FOUNDER / PRINCIPAL @ ELEPHANT SCALE www.elephantscale.com sujee@elephantscale.com HI, I M SUJEE MANIYAM Founder / Principal @ ElephantScale

More information

ELASTIC DATA PLATFORM

ELASTIC DATA PLATFORM SERVICE OVERVIEW ELASTIC DATA PLATFORM A scalable and efficient approach to provisioning analytics sandboxes with a data lake ESSENTIALS Powerful: provide read-only data to anyone in the enterprise while

More information

Active Archive and the State of the Industry

Active Archive and the State of the Industry Active Archive and the State of the Industry Taking Data Archiving to the Next Level Abstract This report describes the state of the active archive market. New Applications Fuel Digital Archive Market

More information

Hortonworks University. Education Catalog 2018 Q1

Hortonworks University. Education Catalog 2018 Q1 Hortonworks University Education Catalog 2018 Q1 Revised 03/13/2018 TABLE OF CONTENTS About Hortonworks University... 2 Training Delivery Options... 3 Available Courses List... 4 Blended Learning... 6

More information

Optimizing Apache Spark with Memory1. July Page 1 of 14

Optimizing Apache Spark with Memory1. July Page 1 of 14 Optimizing Apache Spark with Memory1 July 2016 Page 1 of 14 Abstract The prevalence of Big Data is driving increasing demand for real -time analysis and insight. Big data processing platforms, like Apache

More information

NEW CONVERGED APPROACH FOR SAP POWERED BY ATOS

NEW CONVERGED APPROACH FOR SAP POWERED BY ATOS NEW CONVERGED APPROACH FOR SAP POWERED BY ATOS Michael Schmitter, Atos Tim Wörfel, Hitachi Vantara 28.02.2018 HITACHI and Atos Partnership More 9 Years Partnership Partnership covers main areas of the

More information

Smart Data Center From Hitachi Vantara: Transform to an Agile, Learning Data Center

Smart Data Center From Hitachi Vantara: Transform to an Agile, Learning Data Center Smart Data Center From Hitachi Vantara: Transform to an Agile, Learning Data Center Leverage Analytics To Protect and Optimize Your Business Infrastructure SOLUTION PROFILE Managing a data center and the

More information

Discover the all-flash storage company for the on-demand world

Discover the all-flash storage company for the on-demand world Discover the all-flash storage company for the on-demand world STORAGE FOR WHAT S NEXT The applications we use in our personal lives have raised the level of expectations for the user experience in enterprise

More information

Big Data Architect.

Big Data Architect. Big Data Architect www.austech.edu.au WHAT IS BIG DATA ARCHITECT? A big data architecture is designed to handle the ingestion, processing, and analysis of data that is too large or complex for traditional

More information

QLogic 16Gb Gen 5 Fibre Channel for Database and Business Analytics

QLogic 16Gb Gen 5 Fibre Channel for Database and Business Analytics QLogic 16Gb Gen 5 Fibre Channel for Database Assessment for Database and Business Analytics Using the information from databases and business analytics helps business-line managers to understand their

More information

OpenStack and Hadoop. Achieving near bare-metal performance for big data workloads running in a private cloud ABSTRACT

OpenStack and Hadoop. Achieving near bare-metal performance for big data workloads running in a private cloud ABSTRACT OpenStack and Hadoop Achieving near bare-metal performance for big data workloads running in a private cloud ABSTRACT IT organizations are increasingly turning to the open source Apache Hadoop software

More information

Configuring and Deploying Hadoop Cluster Deployment Templates

Configuring and Deploying Hadoop Cluster Deployment Templates Configuring and Deploying Hadoop Cluster Deployment Templates This chapter contains the following sections: Hadoop Cluster Profile Templates, on page 1 Creating a Hadoop Cluster Profile Template, on page

More information

EMC Virtual Infrastructure for Microsoft Applications Data Center Solution

EMC Virtual Infrastructure for Microsoft Applications Data Center Solution EMC Virtual Infrastructure for Microsoft Applications Data Center Solution Enabled by EMC Symmetrix V-Max and Reference Architecture EMC Global Solutions Copyright and Trademark Information Copyright 2009

More information

Virtualization Strategies on Oracle x86. Hwanki Lee Hardware Solution Specialist, Local Product Server Sales

Virtualization Strategies on Oracle x86. Hwanki Lee Hardware Solution Specialist, Local Product Server Sales Virtualization Strategies on Oracle x86 Hwanki Lee Hardware Solution Specialist, Local Product Server Sales Agenda Customer Business Needs Oracle VM for x86/x64 Summary Customer Business Needs Common IT

More information

Lenovo Big Data Reference Architecture for Hortonworks Data Platform Using System x Servers

Lenovo Big Data Reference Architecture for Hortonworks Data Platform Using System x Servers Lenovo Big Data Reference Architecture for Hortonworks Data Platform Using System x Last update: 12 December 2017 Version 1.1 Configuration Reference Number: BDAHWKSXX62 Describes the RA for Hortonworks

More information