The Complete Guide to Data Integration 2017
Simplifying data integration for the modern era
E-BOOK

AUDIENCE: BI Managers, Practitioners, Project Managers, Solution Architects

Unlocking the Value of Big Data

Big data is here, and it is transforming the very nature of commerce, enabling new insights and accelerating business decision-making. While the concept of big data is not new, its potential is only now being realized, as powerful tools to organize, manage, and analyze immense volumes of enterprise-generated and third-party data finally become available for mainstream use. However, for many organizations it is not so easy to unlock the value in this data. While data volume (the amount of data) and velocity (the speed at which data is generated) are in part what make it so valuable, volume and velocity also present significant challenges. Still more daunting is the broad variation in the types and sources of data (variety), including highly structured files, semi-structured text, and unstructured video and audio feeds.

Biggest Big Data Challenge for Businesses (chart): variety 49%, volume 35%, velocity 16%

In a recent Gartner study, 49% of organizations reported that they struggled most with the variety of big data, compared to 35% citing volume as their most significant problem and 16% naming velocity as their largest big data challenge.¹ Contending with data from multiple databases and systems has always been a challenge, but now, with increasingly varied types of data, the task has become overwhelming. In addition, with data distributed across disparate systems, sources, and silos, it can seem nearly impossible to obtain a unified, enterprise-wide view of the information available for analysis. For companies attempting to integrate this onslaught of data in the same manner as was popular 20 years ago, with traditional data warehouse approaches, it is indeed impossible, or close to it. To extract real value from data, organizations must ingest and process data from both internal and external sources and perform near real-time analysis, which is not an easy task. Faced with these challenges, traditional data warehouse solutions cannot keep up with rapidly changing data ecosystems.

1 Gartner, 2014, Survey Analysis: Big Data Adoption in 2013

In a typical IT environment, traditional data warehouses ingest, model, and store data through an extract, transform, and load (ETL) process. ETL jobs are used to move large amounts of data in a batch-oriented manner and are most commonly scheduled to run daily. Running these jobs daily means that, at best, the warehoused data is a few hours old, but it is typically a day or more old. Because ETL jobs consume significant CPU, memory, disk space, and network bandwidth, it is difficult to justify running them more than once daily. In a time when APIs were not as prevalent as they are now, ETL tools were the go-to solution for operational use cases. With APIs now in the picture, and the sheer variety of data they represent, the ETL method is becoming impractical.

However, even before the era of APIs and big data, ETL tools posed significant challenges, mainly because they require comprehensive knowledge of each operational database or application. Interconnectivity is complicated and requires thorough knowledge of each data source, all the way down to the field level. The greater the number of interconnected systems to be included in the data warehouse, the more complicated the effort. In this digital era, new requirements arise faster than ever before, and previous requirements change just as quickly, making development agility and responsiveness necessary factors for success. As a result, ETL-based data warehousing projects became infamous for appallingly high failure rates. When these projects do not fail outright, they are frequently plagued by cost overruns and delayed implementations. Great care is needed to conceptualize the database and thoroughly define requirements in order to avoid reworking complicated and brittle connections, since tightly coupled interdependencies often trigger unpredictable and far-reaching impacts even when slight changes are made. Another shortcoming of the ETL data warehouse approach is that business staff rarely get an opportunity to see results until after several months of development work have been completed. By this point it is common that requirements have changed, errors have been discovered, or the objective of the project has shifted. Any of these variables might force IT back to the drawing board to collect new requirements, and in all likelihood months of development effort will be scrapped. In fact, Gartner estimated that between 70 and 80 percent of corporate business intelligence projects failed to deliver the expected outcomes.²

2 Poor communication to blame for business intelligence failure, says Gartner
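To make the nightly batch pattern concrete, here is a minimal sketch of an ETL job in Python, with SQLite standing in for both the operational database and the warehouse. All table and column names (orders, fact_orders) are hypothetical, and a real deployment would run this under a scheduler such as cron.

```python
import sqlite3

# Minimal sketch of a nightly ETL batch job (hypothetical schemas).
# Extract: read yesterday's orders from the operational database.
source = sqlite3.connect("operational.db")
rows = source.execute(
    "SELECT order_id, customer_id, amount, order_date "
    "FROM orders WHERE order_date = date('now', '-1 day')"
).fetchall()
source.close()

# Transform: apply business rules in application code, e.g. adding VAT
# and filtering out rows without a customer.
transformed = [
    (order_id, customer_id, round(amount * 1.19, 2), order_date)
    for (order_id, customer_id, amount, order_date) in rows
    if customer_id is not None
]

# Load: append the cleansed rows into the warehouse fact table.
warehouse = sqlite3.connect("warehouse.db")
warehouse.execute(
    "CREATE TABLE IF NOT EXISTS fact_orders "
    "(order_id INTEGER, customer_id INTEGER, amount REAL, order_date TEXT)"
)
warehouse.executemany("INSERT INTO fact_orders VALUES (?, ?, ?, ?)", transformed)
warehouse.commit()
warehouse.close()
```

Even in this toy form, the drawbacks discussed above are visible: the job moves a full day's data in bulk, consumes resources on both systems while it runs, and anything it loads is already a day old by the time analysts see it.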

Data warehouses were originally built for operational reporting rather than for interactive data analysis, and using a traditional data warehouse for analytic queries requires carefully building just the right structure and performing extensive, specific performance optimization. If you later decide to use the data differently, you must change the data structure and re-optimize, which is a cumbersome and costly process. The inherent problems of the traditional ETL approach are compounded by the sheer number of data sources available and the myriad ways to access data, such as the proliferation of APIs that rely on importing and exporting data, each with its own access protocol. While it is technically possible to implement this sort of connectivity through ETL, the actual implementation would be overly complex, difficult to maintain, and costly to extend, problems that are made worse if the APIs do not use data exchange standards such as ODBC or JDBC. Because of issues like these, traditional data warehouses simply cannot cope with the needs of today's businesses and the broader trend of digital transformation. Out of these shortcomings, new approaches to data processing emerged, and what came next was the multi-dimensional OLAP methodology.

THE TRADITIONAL WAREHOUSE AT A GLANCE
+ Moves large amounts of data
- Built for operational reporting
- Significant consumption of bandwidth, CPU, etc.
- Long development cycles (several months)
- No interactive data analysis
- High complexity due to the number of potential ways to integrate data

OLAP

Online Analytical Processing (OLAP) and cubes are other words for multi-dimensional sets of data that essentially serve as a staging space in which to analyze information. These special online analytical processing databases hold data not in tables but in OLAP cubes, a mechanism used to store and query data in an organized, multi-dimensional structure specifically optimized for analysis. OLAP databases are designed to pre-calculate as many queries and combinations of data fields as possible in order to provide fast query responses. However, while these solutions perform better than classical relational databases, their multi-dimensional structure makes them inflexible and unable to accommodate change easily. In addition, storing large amounts of data in a cube causes a performance bottleneck. While OLAP databases are quite useful for basic use cases, large data sets require using capabilities from additional tools in tandem, which complicates analytical efforts and requires unique skills.

ROLAP

Another way to organize data for multi-dimensional querying is relational online analytical processing (ROLAP). ROLAP is a form of OLAP that performs multi-dimensional analysis of data stored in a relational database rather than in a multi-dimensional database, which is considered the OLAP standard. Although ROLAP technology performs better than OLAP databases when processing large amounts of data, it cannot beat the speed and efficiency of OLAP on smaller amounts of data. ROLAP databases require a great deal of manual maintenance and are difficult for business users to operate, so ROLAP is considered to be less flexible than OLAP cubes. OLAP and ROLAP are both still popular today, but neither technology can keep up with today's demands for near real-time data for analytics, nor can they handle unstructured data.
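The pre-calculation idea behind cubes can be illustrated with a plain aggregate table: combinations of dimension values are computed ahead of time so that queries become cheap lookups. A minimal sketch, reusing the hypothetical fact_orders table from the ETL example:

```python
import sqlite3

con = sqlite3.connect("warehouse.db")

# Pre-calculate one "slice" of a cube: totals per (customer, month).
# A real OLAP engine would materialize many such combinations up front.
con.execute("DROP TABLE IF EXISTS agg_orders_customer_month")
con.execute(
    """
    CREATE TABLE agg_orders_customer_month AS
    SELECT customer_id,
           strftime('%Y-%m', order_date) AS month,
           SUM(amount) AS total_amount,
           COUNT(*)    AS order_count
    FROM fact_orders
    GROUP BY customer_id, month
    """
)
con.commit()

# Queries against the pre-aggregated table are now simple, fast lookups.
for row in con.execute(
    "SELECT month, total_amount FROM agg_orders_customer_month "
    "WHERE customer_id = 42 ORDER BY month"
):
    print(row)
```

The flip side is visible too: adding a new dimension or measure later means rebuilding the aggregate, which is exactly the inflexibility described above.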

MULTI-DIMENSIONAL DATABASES (OLAP, ROLAP) AT A GLANCE
+ Store and query data in an organized way
+ Fast query response due to pre-calculation
+ Fast and efficient for small amounts of data
- Problems with large amounts of data
- Inflexibility due to the multi-dimensional structure
- Performance bottleneck due to the storage limitations of cubes
- Need for manual maintenance
- Difficult for business users to use
- Need for additional tools when dealing with high data volumes

Because both the data warehouse and OLAP approaches fall short of business expectations for speedy and comprehensive analytical data access, a new approach surfaced. Self-service business intelligence (SSBI) technologies like Qlik and Tableau introduced an approach to data analytics that enables business users to access and work with corporate information without the IT department's involvement. These SSBI tools can blend, or locally integrate, data from the data warehouse with any other data sources not stored in the data warehouse. This is accomplished by pulling copies of the data sources into a local data store where the analyst can blend or integrate the data as needed. These self-service tools are flexible, relatively easy to implement, and provide a good level of independence for data analysts, but there are clear disadvantages to the approach. The most prominent is that data analysis performed in this manner quickly becomes unmanageable, resulting in redundant work, inconsistent results, and, in short, chaotic reporting practices when used on a broad scale throughout an organization. Since everybody can define their own rules and calculations, it is both possible and likely that different groups and individuals will calculate the same KPIs and metrics in different ways, leading to an array of conflicting results and the publishing of confusing and contradictory information. Because these solutions have no permissions structure, there is no security layer to protect sensitive data, a severe vulnerability since analysts frequently and casually exchange data files. Also, the ability to transform the data is relatively limited in most cases. Further, because many machines are doing the same work for different users in parallel, powerful computing resources are used inefficiently, contributing to rising costs and lower system performance. For all of these reasons, pure SSBI tools can fill a limited and short-term need but fall short of being an end-to-end, enterprise-level analytical solution.

SELF-SERVICE BI TOOLS AT A GLANCE
+ Enable business users to perform analysis without IT support
+ Data blending of external data sources with the data warehouse
+ Flexible and easy to implement
- Different KPI calculations due to decentralized analytics
- No security layer
- Limited data transformation capabilities
- Inefficient use of resources due to parallel usage

As SSBI tools evolved, data scientists were still wrestling with the overall challenge of finding an analytical database as flexible for analytics as relational databases were for transactional data processing. Progressive software vendors sought to overcome the limitations of data warehouses, cubes, and SSBI tools, and began working toward databases that were both flexible and able to process analytical workloads. These analytical databases, or column stores, were the next step in the trend of giving business analysts the tools and flexibility they need. They have since evolved into massively parallel processing (MPP) analytical databases that are more flexible and more performant than cubes, even when large amounts of data are being stored and queried. However, these analytical databases require that data be copied into them using processes very similar to the aforementioned ETL processes, with similar drawbacks. The load processes are typically slower than in a traditional data warehouse based on row-based technology because an extra step is required to optimize the data for quick analytical retrieval: the data must be converted from a row-based format into a columnar format, and field-level data compression must then be applied. Although these extra steps provide significant performance improvements, they also require additional time that delays the analysts' ability to analyze the data. Because of this load-time latency, it is impossible to access real-time data in analytical databases.

ANALYTICAL DATABASES AT A GLANCE
+ Scalable and able to deal with huge workloads
+ Strong parallel processing
+ High scalability
- Slow load processes due to the conversion from row-based to column-based data and data compression
- No real-time data access
- Not agile
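Before moving on, the row-to-column conversion and field-level compression described above can be sketched in a few lines; the data and the two encodings (run-length and dictionary) are illustrative of what column stores do at load time.

```python
# Sketch of the extra load step in a column store: pivot rows into
# per-column arrays, then compress each column independently.
rows = [
    ("2017-01-01", "DE", 19.99),
    ("2017-01-01", "DE", 5.00),
    ("2017-01-02", "US", 19.99),
]

# Row-based -> columnar: one array per field.
dates, countries, amounts = map(list, zip(*rows))

# Field-level compression, e.g. run-length encoding for a sorted column ...
def run_length_encode(values):
    encoded = []
    for v in values:
        if encoded and encoded[-1][0] == v:
            encoded[-1][1] += 1
        else:
            encoded.append([v, 1])
    return encoded

# ... and dictionary encoding for a low-cardinality column.
def dictionary_encode(values):
    dictionary = {v: i for i, v in enumerate(dict.fromkeys(values))}
    return dictionary, [dictionary[v] for v in values]

print(run_length_encode(dates))      # [['2017-01-01', 2], ['2017-01-02', 1]]
print(dictionary_encode(countries))  # ({'DE': 0, 'US': 1}, [0, 0, 1])
```

These extra passes over the data are precisely where the load-time latency comes from, even though they make later scans and aggregations much faster.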

Next came the data lake strategy. Data lakes are storage repositories able to hold vast amounts of raw data in its native format until it is needed. In many cases data lakes are Hadoop-based systems, and they represent the next stage in both power and flexibility. A compelling benefit of the approach is that there is no need to structure (transform) the data before querying it (which would be referred to as schema on write). Instead, you can assign structure to the data at the time it is queried (referred to as schema on read). However, while data lakes can hold large amounts of unstructured data in a cost-effective manner, they are insufficient for interactive analysis when fast query response is required or when access to real-time data is needed.

The proliferation of data lakes enabled the switch from ETL to ELT (extract, load, and transform). Unlike ETL, where data is transformed before it is loaded into the database, ELT significantly accelerates load time by ingesting data in its raw state. The rationale behind this approach is that data lake storage technologies are not picky about the structure of the data, so no development time is required to transform the data into the right structure before it can be accessed for analytics. This means that all data can simply be parked, or dumped, into a data lake, and all further operations and transformations can occur within this repository if and when needed. While it is a tantalizing approach, the data lake falls short of expectations for several reasons. A primary objective of the data lake is to simplify and accelerate; however, the approach often complicates matters with extra steps to prepare data for analytics, and although it provides significant reductions in labor for data loads, it still requires that all data be moved or copied to a single location before it is accessible for analytical purposes. This drawback is shared with the traditional ETL-based data warehouse, since data-load latency cannot be eliminated from the analytical data supply chain, although the load-time latency is greatly reduced for a data lake compared to a data warehouse. Another disadvantage of the data lake is a phenomenon that has come to be known as the data swamp or data graveyard. Because storage is cheap, the data lake approach often leads to dumping and storing much more data than with ETL, but this save-everything approach means loading and storing far more data than businesses are prepared to analyze. Since any data load takes time and consumes disk space and network bandwidth, unnecessary loads can be expensive and cause additional latency that delays other, more analytically valuable data from being analyzed in a timely manner.
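A minimal sketch of the ELT pattern follows: raw records are dumped into the lake unchanged, and structure is imposed only when a question is asked (schema on read). The file name and event fields are hypothetical.

```python
import json

# Load: dump raw events into the lake without any upfront modeling.
raw_events = [
    '{"user": "a", "action": "click", "ts": "2017-03-01T10:00:00"}',
    '{"user": "b", "action": "buy", "amount": 19.99}',   # different shape
    '{"user": "a", "action": "buy", "amount": 5.00}',
]
with open("lake_events.json", "w") as lake:
    lake.write("\n".join(raw_events))

# Transform on read: structure is applied only when the query runs.
with open("lake_events.json") as lake:
    events = [json.loads(line) for line in lake]

revenue_per_user = {}
for event in events:
    if event.get("action") == "buy":          # tolerate missing fields
        user = event["user"]
        revenue_per_user[user] = revenue_per_user.get(user, 0) + event["amount"]

print(revenue_per_user)  # {'b': 19.99, 'a': 5.0}
```

Nothing stopped the mismatched record shapes from landing in the lake, which is both the appeal of the approach and the seed of the data swamp problem described above.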

Although data lakes and ELT bring data together into one place quickly, they cannot provide the fast query responses that analytical databases do, nor can they provide access to data in real time.

DATA LAKES AND ELT AT A GLANCE
+ Hold vast amounts of unstructured data
+ No need to structure data before querying it
+ Efficient data loads
- No real-time analysis possible
- Data needs to be moved to a single location before analysis
- Low costs encourage data graveyards, which decrease performance and increase costs

Looking back at both traditional data warehouses and data lakes, one commonality they share is that they rely on having all data in a physical, central repository. The idea was that before you could work with the data, you had to corral it into a single location. However, this assumption has been a barrier to accelerating data accessibility, and it is what is fundamentally wrong with all of the approaches discussed so far.

While the majority of data analysts were busy exploring the progression from relational databases to cubes, analytical databases, and data lakes, another camp was looking into using data federation to integrate data for analysis. Data federation allows analysts to instantly run queries joining multiple disparate databases without the need to copy or move data from the original operational sources to a central analytical repository. This approach is clearly a significant improvement on all of its predecessors in terms of the immediacy with which data can be analyzed. While the idea is sound and the value self-evident, data federation alone is not scalable for large amounts of data or for large numbers of simultaneous users. In addition, because it relies heavily on the speed and stability of the source systems and the network, its performance commonly suffers, for both data analysis and production operations. So, while data federation is quick and flexible, by itself it is not scalable or particularly dependable. But it was an important step in the right direction. The next stage of evolution was to combine data federation with caching repositories to address these issues. This hybrid approach used big data solutions to complement data warehousing. The result is a combination of repositories, virtualization, and distributed processes for data management that delivers the best capabilities of several technologies but still falls short of the expectation of a robust, agile, performant data warehouse. Caching can be problematic because cache loads must be scheduled around the performance concerns of source systems, and because the cache is loaded into a single repository that may or may not be optimized for different data sets and data types.
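The core idea of data federation, a single query spanning several independent databases without first copying them into a central store, can be sketched with SQLite's ATTACH mechanism; the two database files and their tables are hypothetical stand-ins for separate operational systems.

```python
import sqlite3

# Federation sketch: one query joins two physically separate databases.
# crm.db and shop.db stand in for two independent operational systems.
con = sqlite3.connect("crm.db")
con.execute("ATTACH DATABASE 'shop.db' AS shop")

# The join executes across both sources; no data is copied into a
# central repository beforehand.
query = """
    SELECT c.name, SUM(o.amount) AS revenue
    FROM customers AS c
    JOIN shop.orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY revenue DESC
"""
for name, revenue in con.execute(query):
    print(name, revenue)
```

A production federation layer adds query planning, push-down optimization, and connectors for heterogeneous sources, but the principle is the same: the join happens across the sources, not after a copy. The sketch also hints at the weakness noted above, since every query leans directly on the source systems.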

Still, in moving closer to the modern data warehouse, virtual data technology is essential, from simple federation to virtualization, as well as virtual views, indices, and semantics. Developing virtual or logical data views is faster than relocating all data physically and can be done easily through point-and-click operations. In addition, virtual views can be altered without the need to transform and reload data, as in earlier data warehouse integration approaches, meaning that changes can be presented live, immediately, without waiting for the data to populate through an overnight process. It is the virtualization of data integration that enables extreme agility in analytical development and significantly reduces build times and costs, all of which leads us to the next breakthrough in data warehousing.

DATA FEDERATION AT A GLANCE
+ Joins databases without the need to copy them into a central repository
+ Very fast data access
+ Flexible changes to virtual views
+ Virtual data integration enables extreme agility and reduces build times and costs
- Limited scalability (e.g., many simultaneous users)
- Caching repositories cause performance problems
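Continuing the same hypothetical setup, altering a virtual view is a metadata-only operation, which is why changes can go live immediately rather than waiting for an overnight load:

```python
import sqlite3

con = sqlite3.connect("crm.db")
con.execute("ATTACH DATABASE 'shop.db' AS shop")

# A virtual (session-scoped) view: defined as metadata, computed at
# query time, no data copied or moved.
con.execute("""
    CREATE TEMP VIEW customer_revenue AS
    SELECT c.id, c.name, SUM(o.amount) AS revenue
    FROM customers AS c
    JOIN shop.orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
""")

# Changing the definition is drop-and-recreate, touching no data at all.
# The new shape is live for the very next query.
con.execute("DROP VIEW customer_revenue")
con.execute("""
    CREATE TEMP VIEW customer_revenue AS
    SELECT c.id, c.name, SUM(o.amount) AS revenue, COUNT(*) AS order_count
    FROM customers AS c
    JOIN shop.orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
""")
```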

The First Logical Data Warehouse

A modern data integration strategy employs what is known as best-fit engineering, whereby each part of the data management infrastructure uses the most appropriate technology for its role, including data storage determined by business requirements and service-level agreements (SLAs). Unlike a data lake, this new architecture takes a distributed approach, aligning the choice of information storage with the way information is used and leveraging multiple data technologies that are fit for specific purposes. A hybrid approach can also significantly reduce costs and time to delivery when changes or additions to the warehouse are required. One term for this new architecture is the logical data warehouse. Another is the virtual data lake. In either case, the premise is that there is no single data repository. Instead, the logical data warehouse is an ecosystem of multiple fit-for-purpose repositories, technologies, and tools that interact synergistically to manage data storage and provide performant enterprise analytical capabilities. The original unmet analytical requirements of the traditional data warehouse were to retrieve data using a single query language, to get speedy query responses, and to quickly assemble different data models, or views of the data, to meet specific needs. By combining data federation, physical data integration, and a common query language (SQL), the logical data warehouse approach achieves all three of these goals without the need to copy or move all of the data to a central location.

Physical data integration is a robust feature of the logical data warehouse that ensures fast query response while decoupling performance from the source data stores and moving it to the logical data warehouse repository. In this manner, the effort-intensive physical transfer of data is minimized and simplified, effectively removing lengthy data movement delays from the critical path of data integration projects. In Understanding the Logical Data Warehouse: The Emerging Practice, Gartner weighed in on this approach, pointing out that it offers flexibility for companies that have different data requirements at different times. For example, many use cases require a central repository, such as a traditional data warehouse or analytical database, where data that is needed frequently, or with the greatest retrieval speed, can be stored and optimized for performance. Increasingly, data analysts must be able to explore data freely with guaranteed adequate query performance. Frequent use cases along these lines are sentiment analysis and fraud detection. These use cases require a distributed technology such as Hadoop to store the massive amounts of data available through social media feeds and clickstream activity logs. Additionally, they demand direct access to data sources via data federation. As Gartner rightly indicates, a logical layer is needed on top of these technologies in order to unify the architecture and allow queries and processes to operate on all systems concurrently as needed.
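Conceptually, the logical layer sits above the repositories and decides, per query, whether to answer from an optimized local replica or to federate out to the live sources. The sketch below is deliberately simplified and entirely illustrative: the class, the time-to-live rule, and the connection objects are invented for the example, not a description of any product's internals.

```python
import time

# Simplified sketch of a logical layer's routing decision: serve hot,
# recently materialized data from a local replica, and federate
# everything else out to the live source systems.
CACHE_TTL_SECONDS = 24 * 3600

class LogicalLayer:
    def __init__(self, replica, sources):
        self.replica = replica        # e.g. an analytical database connection
        self.sources = sources        # live connections, keyed by table name
        self.replicated_at = {}       # table -> timestamp of last cache load

    def query(self, table, sql):
        loaded = self.replicated_at.get(table)
        if loaded is not None and time.time() - loaded < CACHE_TTL_SECONDS:
            # Hot path: the table was materialized recently, answer locally.
            return self.replica.execute(sql)
        # Cold or real-time path: push the query down to the source system.
        return self.sources[table].execute(sql)
```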

As the first logical data warehouse, Data Virtuality provides this uniform layer over numerous data storage technologies, unifying these data stores and facilitating the use cases suggested above by Gartner. By routing queries among data stores behind the scenes as needed, the Data Virtuality technology offers great benefits to business users. The business can use the same platform for handling a variety of use cases, far more than could be handled by a traditional data warehouse. New approaches to data integration also become possible, enabling users to put business needs first and allow the technology platform to adapt as needed. By decoupling the semantic, unified data access layer with which business users interact from the actual data sources, changes occurring in the original data sources can be isolated from analytical processes. In a profound departure from past data accessibility strategies, business users can interact with data comfortably and easily, focusing on their objective rather than on the technological underpinnings. By consolidating relational and non-relational data sources, including real-time data, Data Virtuality enables immediate analysis via the SQL query language. Data Virtuality provides a central data cockpit, allowing all data sources, whether analytical or operational, to freely interchange data. Integrated connectors allow data to be immediately processed in analysis, planning, or statistics tools, or written back to source systems as needed. In addition, the logical data warehouse automatically adjusts to changes in the IT landscape and in user behavior, offering the highest possible degree of flexibility and speed with little administrative overhead. In a logical data warehouse project, a few clicks can seamlessly connect all data-producing and data-processing systems, including ERP and CRM systems, web shops, social media applications, and just about any SQL and NoSQL data source, all in real time. With instant access to the data, users can begin experimenting with these connections and joins until they achieve the results they want.

In stark contrast to traditional ETL solutions, the key difference with the logical data warehouse is that there is no need to move the data to analyze it. This greatly reduces development and database structuring time and costs. Equally flexible and responsive, the logical data warehouse is a completely different data integration paradigm from the inflexible traditional data warehouse approach. The logical data warehouse works by intelligently marrying two distinct technologies to create an entirely new manner of integrating data. The first is data federation, which connects two or more disparate databases and makes them all appear as if they were a single database. The second is analytical database management, which provides semantic, business-friendly data element naming and modeling, allowing flexible ingestion and modeling options. The results are profound. Data federation alone offers flexibility but cannot scale. Analytical database management scales beautifully but is inflexible. The combination of the two enables breakaway flexibility and performance and represents an entirely new paradigm in the way we think about, manage, and work with data. For example, a logical data warehouse can connect to a variety of data sources simultaneously, including classic relational databases like Oracle and MS SQL; NoSQL databases like MongoDB or Hadoop; column stores like Vertica or SAP HANA; and web services like Google Analytics, AdWords, Facebook, Twitter, and others. Once these have been connected, the resulting integrated, overarching view of the data appears within a data analysis tool as if everything were contained in a single SQL database, accessible with a common query language. Virtually any data analysis tool currently on the market (such as Qlik, Tableau, Aqua Data Studio, etc.) can connect to, query, and analyze data over the virtual layer with no need to pull or copy data from any location. The method offers vast new opportunities for data exploration, data discovery, rapid prototyping, and intuitive experimentation. Business users can get results instantly and can refactor data models just as quickly. Further, building logical data views as shareable components, including common KPIs and metrics, can ensure that every report, every visualization, and every query response conforms to the same corporate standards and definitions. Data Virtuality acts as a central hub connecting all systems and applications within the enterprise, enabling data exchange between systems and ensuring the latest data is available anywhere, at any time.

THE LOGICAL DATA WAREHOUSE AT A GLANCE
+ Consolidation of structured, unstructured, and real-time data by combining data federation and analytical database management
+ No need to move data for analysis
+ Immediate processing (analysis) or write-back to data sources
+ Central hub connecting all systems and applications within the enterprise
+ Ensures the latest data everywhere, at any time
- Needs at least 10 different data sources to show its full efficiency
- No integrated analytical tool
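The shareable-component idea mentioned above can be made concrete with a governed KPI published once as a view; every tool that selects from it inherits the same definition. A sketch, again using the hypothetical warehouse tables from the earlier examples:

```python
import sqlite3

con = sqlite3.connect("warehouse.db")

# One governed definition of a KPI, published as a view. Every report
# and visualization that selects from it shares the same calculation,
# instead of each analyst re-deriving it differently.
con.execute("DROP VIEW IF EXISTS kpi_average_order_value")
con.execute("""
    CREATE VIEW kpi_average_order_value AS
    SELECT strftime('%Y-%m', order_date) AS month,
           SUM(amount) * 1.0 / COUNT(*)  AS average_order_value
    FROM fact_orders
    GROUP BY month
""")

# Any SQL-speaking tool now consumes the KPI rather than recomputing it.
for month, aov in con.execute("SELECT * FROM kpi_average_order_value"):
    print(month, aov)
```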

A MODERN DATA WAREHOUSE

The logical data warehouse is essential for organizations that wish to combine big data and data warehousing in the enterprise.

A VIRTUAL DATA MART

A logical data warehouse makes it easy to create a virtual data mart for expediency. By combining an organization's primary data infrastructure with auxiliary data sources relevant to specific, data-driven business units, initiatives can move forward more quickly than if the data had to be on-boarded into a traditional data warehouse.

AN EVOLVING CORPORATION

Modern data integration allows rapidly changing organizations to quickly combine data from disparate business units and provide BI and analytical transparency to top management. This kind of flexibility is crucial for strategic changes, mergers and acquisitions, and other sensitive operations where there is no time to waste building a central data warehouse.

E-COMMERCE

Modern data integration offers a compelling solution for e-commerce and retail organizations with a great number of different systems in their IT landscape. For example, a typical e-commerce business has an ERP system, a CRM, web and mobile apps, analytics programs, online marketing, social media marketing, and other tools. With a logical data warehouse, all of these data sources can be joined quickly and flexibly to provide 360-degree views of customers, products, and more.

DIGITAL MARKETING

Digital marketing is extremely data-driven, relying on a volatile flow of real-time data. A logical data warehouse offers the only viable way to manage complexity of this kind, easily connecting to a host of digital marketing data providers for affiliate marketing, performance marketing, personalization, and other approaches.

MAKING DATA ACTIONABLE

Modern data integration methods go the extra mile by making data actionable. In addition to receiving data in one direction for analysis, a user can write data back, essentially triggering actions based on the data (see the sketch after this list). For example, the solution can analyze data from an ERP, a CRM, and a web shop simultaneously to trigger marketing campaigns unconstrained by traditional business hours.

REAL-TIME ANALYSIS

The logical data warehouse excels at manipulating real-time data and can flexibly model and re-model the data to fit the latest analytical initiatives.

INTEGRATING BIG DATA

The open-source big data solution Hadoop is adept at analyzing unstructured data and performing batch analysis but performs poorly in interactive situations. To achieve real-time functionality, companies must combine the traditional data warehouse with modern big data tools, often multiple ones, such as an Oracle warehouse with Hadoop and Greenplum. Unifying these data sources into one common view provides instant access to a 360-degree view of your organization.
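A minimal sketch of the write-back idea follows: an analysis across two hypothetical source systems produces rows that are written back into an operational queue to trigger a campaign. The tables, threshold, and campaign name are invented for illustration.

```python
import sqlite3

# Sketch of "actionable data": analyze across sources, then write the
# result back into an operational system. Tables and rule are invented.
con = sqlite3.connect("crm.db")
con.execute("ATTACH DATABASE 'shop.db' AS shop")

# Find customers whose spending crossed a threshold this month.
vip_candidates = con.execute("""
    SELECT c.id, SUM(o.amount) AS monthly_total
    FROM customers AS c
    JOIN shop.orders AS o ON o.customer_id = c.id
    WHERE strftime('%Y-%m', o.order_date) = strftime('%Y-%m', 'now')
    GROUP BY c.id
    HAVING monthly_total > 1000
""").fetchall()

# Write back: queue a campaign in a (hypothetical) marketing system,
# so the analysis directly triggers an action.
con.execute(
    "CREATE TABLE IF NOT EXISTS campaign_queue (customer_id INTEGER, campaign TEXT)"
)
con.executemany(
    "INSERT INTO campaign_queue VALUES (?, 'vip_upgrade_offer')",
    [(cid,) for cid, _ in vip_candidates],
)
con.commit()
```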

In this digital era, harnessing large amounts of data to make astute business decisions and improve operations is an imperative. While our ability to generate data still far outstrips our ability to analyze it effectively, we are making great progress toward closing that gap. Exciting new approaches are merging big data solutions with traditional enterprise data strategies. Without the need for a central repository, logical data warehouses hold enormous promise. By offering an ecosystem of multiple best-fit repositories, technologies, and tools, businesses can now effectively and rapidly analyze real-time data in pursuit of valuable insight. For organizations sifting through reams of data for treasure, these virtual data lakes represent the Holy Grail that can help them tailor products and fulfill desires we haven't yet dreamed of.

HADOOP AND SPARK

Hadoop and Spark work together to offer impressive in-memory data processing for big data applications. Although there has been hope that the in-memory capabilities of Spark would solve many of the latency issues related to Hadoop, both technologies have limitations and fall short of a one-size-fits-all solution. Apache Hadoop is an open-source software framework providing distributed storage and processing of very large data sets, data sets so large that it would not be economical to store them in almost any other data storage technology. Hadoop accomplishes this by using a multi-server clustering approach that removes many earlier constraints on storing and processing large data sets. To process data, Hadoop's MapReduce framework abandons the convention of moving data over a network to an application server for processing. Instead, MapReduce analyzes data on the individual servers where it resides and then compiles the results from those servers into a single response to the query. Hadoop itself is not a single system but rather an ecosystem of numerous interconnected products that allows users to run various types of analytics and operations on any type of data. Hadoop is open source, so it is constantly evolving and improving. While Hadoop is complex to use, startups and established companies alike are quickly creating tools to simplify and expand its use. For example, executing queries within the Hadoop ecosystem originally required extensive knowledge of newer and lesser-known programming frameworks and languages such as MapReduce, Pig, and Python. The result of this custom coding was that queries could be performed on data types previously impossible to query, such as unstructured data, but at the cost of there being fewer programmers available to write and run these queries.
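The MapReduce idea, running the computation next to each shard of the data and then merging the partial results, can be sketched in plain Python. This toy version runs in one process; Hadoop's contribution is executing the same pattern across a cluster of servers that each hold part of the data.

```python
from collections import Counter
from functools import reduce

# Each "server" holds a shard of the data; the map phase runs locally
# per shard, next to where the data lives.
shards = [
    ["click", "buy", "click"],   # data on server 1
    ["click", "click"],          # data on server 2
    ["buy"],                     # data on server 3
]

def map_phase(shard):
    # Count events locally on each server.
    return Counter(shard)

def reduce_phase(left, right):
    # Merge partial counts into a single response to the query.
    return left + right

partials = [map_phase(shard) for shard in shards]
total = reduce(reduce_phase, partials, Counter())
print(total)  # Counter({'click': 4, 'buy': 2})
```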

Today, however, numerous products allow the very popular SQL query language to be used to analyze data stored in Hadoop. Classic Hadoop is in itself batch-oriented and, as such, is capable of analyzing vast amounts of data with relative ease by distributing the work across a number of different Hadoop nodes that act in parallel to produce the results. However, analyzing smaller amounts of data requires just as much complexity and programming as processing large data sets, so overall it is a rather slow way to query data. Apache Spark and related technologies are making an effort to improve Hadoop query performance by adding a fast, in-memory data processing engine with development APIs. The objective is that technologies such as these will eventually allow data workers to execute streaming, machine learning, or SQL workloads on Hadoop in a timely manner and with less custom coding. While almost any analytical task can be undertaken with Hadoop, including analysis of very large amounts of data as in fraud and sentiment analysis, overall it remains a relatively immature technology whose ecosystem is not yet fully integrated and which requires custom coding at several junctures for complete functionality. Because it is highly technical and difficult to use, most often success with Hadoop comes in the form of an inexpensive data archive.
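A sketch of what the SQL layer buys analysts, assuming a PySpark installation; the file path and schema are hypothetical:

```python
# Sketch of SQL-on-Hadoop, assuming PySpark is installed and an
# events.json file exists in HDFS; path and fields are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-on-hadoop-sketch").getOrCreate()

# Read semi-structured data directly from distributed storage.
events = spark.read.json("hdfs:///data/events.json")
events.createOrReplaceTempView("events")

# Analysts write plain SQL instead of hand-coded MapReduce jobs; the
# engine distributes the work across the cluster.
top_actions = spark.sql("""
    SELECT action, COUNT(*) AS occurrences
    FROM events
    GROUP BY action
    ORDER BY occurrences DESC
""")
top_actions.show()
```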

Data Virtuality GmbH develops and distributes the software DataVirtuality, which affords companies an especially simple means of integrating and connecting a variety of data sources and applications. The solution is revolutionizing the technological concept of data virtualization and generates a data warehouse consisting of relational and non-relational data sources in just a few days. Using integrated connectors, the data can be immediately processed in analysis, planning, or statistics tools, or written back to source systems as needed. The data warehouse also automatically adjusts to changes in the IT landscape and in user behavior, which lends companies using DataVirtuality the highest possible degree of flexibility and swiftness with minimal administrative overhead. Founded in 2012, the Leipzig- and San Francisco-based company originated from a research initiative of the Chair of Information Technology at the Universität Leipzig and is financed by Technologiegründerfonds Sachsen (TGFS) and High-Tech Gründerfonds (HTGF).

COMPANY CONTACT:
Nick Golovin, Ph.D.
Founder and CEO
Data Virtuality GmbH
E-mail: nick.golovin@datavirtuality.com


More information

Hyper-Converged Infrastructure: Providing New Opportunities for Improved Availability

Hyper-Converged Infrastructure: Providing New Opportunities for Improved Availability Hyper-Converged Infrastructure: Providing New Opportunities for Improved Availability IT teams in companies of all sizes face constant pressure to meet the Availability requirements of today s Always-On

More information

Stages of Data Processing

Stages of Data Processing Data processing can be understood as the conversion of raw data into a meaningful and desired form. Basically, producing information that can be understood by the end user. So then, the question arises,

More information

SAP Agile Data Preparation Simplify the Way You Shape Data PUBLIC

SAP Agile Data Preparation Simplify the Way You Shape Data PUBLIC SAP Agile Data Preparation Simplify the Way You Shape Data Introduction SAP Agile Data Preparation Overview Video SAP Agile Data Preparation is a self-service data preparation application providing data

More information

IOTA ARCHITECTURE: DATA VIRTUALIZATION AND PROCESSING MEDIUM DR. KONSTANTIN BOUDNIK DR. ALEXANDRE BOUDNIK

IOTA ARCHITECTURE: DATA VIRTUALIZATION AND PROCESSING MEDIUM DR. KONSTANTIN BOUDNIK DR. ALEXANDRE BOUDNIK IOTA ARCHITECTURE: DATA VIRTUALIZATION AND PROCESSING MEDIUM DR. KONSTANTIN BOUDNIK DR. ALEXANDRE BOUDNIK DR. KONSTANTIN BOUDNIK DR.KONSTANTIN BOUDNIK EPAM SYSTEMS CHIEF TECHNOLOGIST BIGDATA, OPEN SOURCE

More information

Massive Scalability With InterSystems IRIS Data Platform

Massive Scalability With InterSystems IRIS Data Platform Massive Scalability With InterSystems IRIS Data Platform Introduction Faced with the enormous and ever-growing amounts of data being generated in the world today, software architects need to pay special

More information

12 Minute Guide to Archival Search

12 Minute Guide to  Archival Search X1 Technologies, Inc. 130 W. Union Street Pasadena, CA 91103 phone: 626.585.6900 fax: 626.535.2701 www.x1.com June 2008 Foreword Too many whitepapers spend too much time building up to the meat of the

More information

Data 101 Which DB, When Joe Yong Sr. Program Manager Microsoft Corp.

Data 101 Which DB, When Joe Yong Sr. Program Manager Microsoft Corp. 17-18 March, 2018 Beijing Data 101 Which DB, When Joe Yong Sr. Program Manager Microsoft Corp. The world is changing AI increased by 300% in 2017 Data will grow to 44 ZB in 2020 Today, 80% of organizations

More information

Accelerate Your Enterprise Private Cloud Initiative

Accelerate Your Enterprise Private Cloud Initiative Cisco Cloud Comprehensive, enterprise cloud enablement services help you realize a secure, agile, and highly automated infrastructure-as-a-service (IaaS) environment for cost-effective, rapid IT service

More information

VOLTDB + HP VERTICA. page

VOLTDB + HP VERTICA. page VOLTDB + HP VERTICA ARCHITECTURE FOR FAST AND BIG DATA ARCHITECTURE FOR FAST + BIG DATA FAST DATA Fast Serve Analytics BIG DATA BI Reporting Fast Operational Database Streaming Analytics Columnar Analytics

More information

Real Time for Big Data: The Next Age of Data Management. Talksum, Inc. Talksum, Inc. 582 Market Street, Suite 1902, San Francisco, CA 94104

Real Time for Big Data: The Next Age of Data Management. Talksum, Inc. Talksum, Inc. 582 Market Street, Suite 1902, San Francisco, CA 94104 Real Time for Big Data: The Next Age of Data Management Talksum, Inc. Talksum, Inc. 582 Market Street, Suite 1902, San Francisco, CA 94104 Real Time for Big Data The Next Age of Data Management Introduction

More information

The Evolution of Big Data Platforms and Data Science

The Evolution of Big Data Platforms and Data Science IBM Analytics The Evolution of Big Data Platforms and Data Science ECC Conference 2016 Brandon MacKenzie June 13, 2016 2016 IBM Corporation Hello, I m Brandon MacKenzie. I work at IBM. Data Science - Offering

More information

Safe Harbor Statement

Safe Harbor Statement Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment

More information

Databricks Delta: Bringing Unprecedented Reliability and Performance to Cloud Data Lakes

Databricks Delta: Bringing Unprecedented Reliability and Performance to Cloud Data Lakes Databricks Delta: Bringing Unprecedented Reliability and Performance to Cloud Data Lakes AN UNDER THE HOOD LOOK Databricks Delta, a component of the Databricks Unified Analytics Platform*, is a unified

More information

New Approaches to Big Data Processing and Analytics

New Approaches to Big Data Processing and Analytics New Approaches to Big Data Processing and Analytics Contributing authors: David Floyer, David Vellante Original publication date: February 12, 2013 There are number of approaches to processing and analyzing

More information

Full file at

Full file at Chapter 2 Data Warehousing True-False Questions 1. A real-time, enterprise-level data warehouse combined with a strategy for its use in decision support can leverage data to provide massive financial benefits

More information

Oracle Big Data SQL brings SQL and Performance to Hadoop

Oracle Big Data SQL brings SQL and Performance to Hadoop Oracle Big Data SQL brings SQL and Performance to Hadoop Jean-Pierre Dijcks Oracle Redwood City, CA, USA Keywords: Big Data SQL, Hadoop, Big Data Appliance, SQL, Oracle, Performance, Smart Scan Introduction

More information

FIVE BEST PRACTICES FOR ENSURING A SUCCESSFUL SQL SERVER MIGRATION

FIVE BEST PRACTICES FOR ENSURING A SUCCESSFUL SQL SERVER MIGRATION FIVE BEST PRACTICES FOR ENSURING A SUCCESSFUL SQL SERVER MIGRATION The process of planning and executing SQL Server migrations can be complex and risk-prone. This is a case where the right approach and

More information

White Paper: Delivering Enterprise Web Applications on the Curl Platform

White Paper: Delivering Enterprise Web Applications on the Curl Platform White Paper: Delivering Enterprise Web Applications on the Curl Platform Table of Contents Table of Contents Executive Summary... 1 Introduction... 2 Background... 2 Challenges... 2 The Curl Solution...

More information

WHITEPAPER. MemSQL Enterprise Feature List

WHITEPAPER. MemSQL Enterprise Feature List WHITEPAPER MemSQL Enterprise Feature List 2017 MemSQL Enterprise Feature List DEPLOYMENT Provision and deploy MemSQL anywhere according to your desired cluster configuration. On-Premises: Maximize infrastructure

More information

Four Steps to Unleashing The Full Potential of Your Database

Four Steps to Unleashing The Full Potential of Your Database Four Steps to Unleashing The Full Potential of Your Database This insightful technical guide offers recommendations on selecting a platform that helps unleash the performance of your database. What s the

More information

MODERNIZE INFRASTRUCTURE

MODERNIZE INFRASTRUCTURE SOLUTION OVERVIEW MODERNIZE INFRASTRUCTURE Support Digital Evolution in the Multi-Cloud Era Agility and Innovation Are Top of Mind for IT As digital transformation gains momentum, it s making every business

More information

E-Guide DATABASE DESIGN HAS EVERYTHING TO DO WITH PERFORMANCE

E-Guide DATABASE DESIGN HAS EVERYTHING TO DO WITH PERFORMANCE E-Guide DATABASE DESIGN HAS EVERYTHING TO DO WITH PERFORMANCE D atabase performance can be sensitive to the adjustments you make to design. In this e-guide, discover the affects database performance data

More information

Big Data The end of Data Warehousing?

Big Data The end of Data Warehousing? Big Data The end of Data Warehousing? Hermann Bär Oracle USA Redwood Shores, CA Schlüsselworte Big data, data warehousing, advanced analytics, Hadoop, unstructured data Introduction If there was an Unwort

More information

Data Lake Based Systems that Work

Data Lake Based Systems that Work Data Lake Based Systems that Work There are many article and blogs about what works and what does not work when trying to build out a data lake and reporting system. At DesignMind, we have developed a

More information

Enterprise Data Architecture: Why, What and How

Enterprise Data Architecture: Why, What and How Tutorials, G. James, T. Friedman Research Note 3 February 2003 Enterprise Data Architecture: Why, What and How The goal of data architecture is to introduce structure, control and consistency to the fragmented

More information

Big Data Specialized Studies

Big Data Specialized Studies Information Technologies Programs Big Data Specialized Studies Accelerate Your Career extension.uci.edu/bigdata Offered in partnership with University of California, Irvine Extension s professional certificate

More information

MAPR DATA GOVERNANCE WITHOUT COMPROMISE

MAPR DATA GOVERNANCE WITHOUT COMPROMISE MAPR TECHNOLOGIES, INC. WHITE PAPER JANUARY 2018 MAPR DATA GOVERNANCE TABLE OF CONTENTS EXECUTIVE SUMMARY 3 BACKGROUND 4 MAPR DATA GOVERNANCE 5 CONCLUSION 7 EXECUTIVE SUMMARY The MapR DataOps Governance

More information

The Business Value of Metadata for Data Governance: The Challenge of Integrating Packaged Applications

The Business Value of Metadata for Data Governance: The Challenge of Integrating Packaged Applications The Business Value of Metadata for Data Governance: The Challenge of Integrating Packaged Applications By Donna Burbank Managing Director, Global Data Strategy, Ltd www.globaldatastrategy.com Sponsored

More information

Lambda Architecture for Batch and Stream Processing. October 2018

Lambda Architecture for Batch and Stream Processing. October 2018 Lambda Architecture for Batch and Stream Processing October 2018 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document is provided for informational purposes only.

More information

An Oracle White Paper June Exadata Hybrid Columnar Compression (EHCC)

An Oracle White Paper June Exadata Hybrid Columnar Compression (EHCC) An Oracle White Paper June 2011 (EHCC) Introduction... 3 : Technology Overview... 4 Warehouse Compression... 6 Archive Compression... 7 Conclusion... 9 Introduction enables the highest levels of data compression

More information

Answer: A Reference:http://www.vertica.com/wpcontent/uploads/2012/05/MicroStrategy_Vertica_12.p df(page 1, first para)

Answer: A Reference:http://www.vertica.com/wpcontent/uploads/2012/05/MicroStrategy_Vertica_12.p df(page 1, first para) 1 HP - HP2-N44 Selling HP Vertical Big Data Solutions QUESTION: 1 When is Vertica a better choice than SAP HANA? A. The customer wants a closed ecosystem for BI and analytics, and is unconcerned with support

More information

QLIKVIEW ARCHITECTURAL OVERVIEW

QLIKVIEW ARCHITECTURAL OVERVIEW QLIKVIEW ARCHITECTURAL OVERVIEW A QlikView Technology White Paper Published: October, 2010 qlikview.com Table of Contents Making Sense of the QlikView Platform 3 Most BI Software Is Built on Old Technology

More information

Moving Technology Infrastructure into the Future: Value and Performance through Consolidation

Moving Technology Infrastructure into the Future: Value and Performance through Consolidation Moving Technology Infrastructure into the Future: Value and Performance through Consolidation An AMI-Partners Business Benchmarking White Paper Sponsored by: HP Autumn Watters Ryan Brock January 2014 Introduction

More information