The Complete Guide to Data Integration 2017
The Complete Guide to Data Integration 2017
Simplifying data integration for the modern era
E-BOOK
THE COMPLETE GUIDE TO DATA INTEGRATION

AUDIENCE: BI Managers, Practitioners, Project Managers, Solution Architects

Unlocking the Value of Big Data

Big data is here, and it's transforming the very nature of commerce and accelerating the generation of business insights. While the concept of big data isn't new, its potential is just now being realized, as powerful tools to organize, manage, and analyze immense volumes of enterprise-generated and third-party data finally become available for mainstream use. However, for many organizations, it's not so easy to unlock the value in this data. While data volume (the amount of data) and velocity (the speed at which data is generated) are in part what make it so valuable, volume and velocity also present significant challenges. Still more daunting is the broad variation in the types and sources of data (variety), including highly structured files, semi-structured text, and unstructured video and audio feeds.
Biggest Big Data Challenge for Businesses: variety 49%, volume 35%, velocity 16%

In a recent Gartner study, 49% of organizations reported that they struggled most with the variety of big data, compared to 35% citing volume as their most significant problem and 16% claiming velocity was their largest big data problem.¹ Contending with data from multiple databases and systems has always been a challenge, but now, with increasingly diverse types of data, the task has become overwhelming. In addition, with data distributed across disparate systems, sources, and silos, it can be a seemingly impossible challenge to obtain a unified, enterprise-wide view of the information available for analysis. For companies attempting to integrate this onslaught of data in the same manner as was popular 20 years ago, with traditional data warehouse approaches, it is indeed impossible, or close to it. To extract real value from data, organizations must ingest and process data from both internal and external sources and perform near real-time analysis, which is no easy task. Faced with these challenges, traditional data warehouse solutions cannot keep up with rapidly changing data ecosystems.

¹ Gartner, 2014, Survey Analysis: Big Data Adoption in 2013
In a typical IT environment, traditional data warehouses ingest, model, and store data through an extract, transform, and load (ETL) process. ETL jobs are used to move large amounts of data in a batch-oriented manner and are most commonly scheduled to run daily. Running these jobs daily means that, at best, the warehoused data is a few hours old, but it is typically a day or more old. Because ETL jobs consume significant CPU, memory, disk space, and network bandwidth, it is difficult to justify running them more than once daily. In a time when APIs were not as prevalent as they are now, ETL tools were the go-to solution for operational use cases. With APIs now in the picture, and given the sheer variety of data they represent, the ETL method is becoming impractical. However, even before the era of APIs and big data, ETL tools posed significant challenges, mainly because they require
comprehensive knowledge of each operational database or application. Interconnectivity is complicated and requires thorough knowledge of each data source, all the way down to the field level. The greater the number of interconnected systems to be included in the data warehouse, the more complicated the effort. In this digital era, new requirements arise faster than ever before, and previous requirements change just as quickly, making development agility and responsiveness necessary factors for success. As such, ETL-based data warehousing projects became infamous for appallingly high failure rates. When these projects don't fail outright, they are frequently plagued with cost overruns and delayed implementations. Great care is needed to conceptualize the database and thoroughly define requirements, to avoid having to re-work complicated and brittle connections, since tightly coupled interdependencies often trigger unpredictable and far-reaching impacts even when slight changes are made. Another shortcoming of the ETL data warehouse approach is that business staff rarely get an opportunity to see the results until after several months of development work has been completed. By this point it is common that requirements have changed, errors have been discovered, or the objective of the project has shifted. Any of these variables might force IT back to the drawing board to collect new requirements, and in all likelihood months of development effort will be scrapped. In fact, Gartner estimated that between 70 and 80 percent of corporate business intelligence projects failed to deliver the expected outcomes.²

² Gartner, Poor Communication to Blame for Business Intelligence Failure, Says Gartner
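The daily batch ETL pattern described above can be sketched in a few lines. The table names, columns, and transformation here are hypothetical, and an in-memory SQLite database stands in for both the operational source and the warehouse; a real deployment would run this on a scheduler (e.g. cron), which is exactly why the warehoused data is typically a day old:

```python
import sqlite3

def run_nightly_etl(source_conn, warehouse_conn):
    """Extract rows from the operational system, transform them, load the warehouse."""
    # Extract: pull raw rows from the operational system.
    rows = source_conn.execute(
        "SELECT order_id, amount_cents, country FROM orders").fetchall()

    # Transform: convert cents to a decimal amount and normalize country codes.
    transformed = [(oid, cents / 100.0, country.upper())
                   for oid, cents, country in rows]

    # Load: bulk-insert into the warehouse fact table.
    warehouse_conn.executemany(
        "INSERT INTO fact_orders (order_id, amount, country) VALUES (?, ?, ?)",
        transformed)
    warehouse_conn.commit()

# Demo with in-memory databases standing in for real systems.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (order_id INTEGER, amount_cents INTEGER, country TEXT)")
source.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                   [(1, 1999, "de"), (2, 4550, "us")])

warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE fact_orders (order_id INTEGER, amount REAL, country TEXT)")

run_nightly_etl(source, warehouse)
print(warehouse.execute("SELECT * FROM fact_orders").fetchall())
# → [(1, 19.99, 'DE'), (2, 45.5, 'US')]
```

Even this toy version shows why the approach is brittle: the job must know the source schema down to the field level, and any schema change in `orders` silently breaks the pipeline.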
Data warehouses were originally built for operational reporting rather than for interactive data analysis, and using a traditional data warehouse for analytic queries requires carefully building just the right structure and performing extensive, specific performance optimization. If you later decide to use the data differently, you must change the data structure and re-optimize, a cumbersome and costly process. The inherent problems of the traditional ETL approach are compounded by the sheer number of data sources available and the myriad ways to access data, such as the proliferation of APIs that rely on importing and exporting data, each with its own access protocol. While it is technically possible to implement this sort of connectivity through ETL, the actual implementation would be overly complex, difficult to maintain, and costly to extend, and these problems are made worse if the APIs do not support data exchange standards such as ODBC or JDBC. Because of issues like these, traditional data warehouses simply can't cope with the needs of today's businesses and the related, overall trend of digital transformation. Out of the shortcomings of the traditional data warehouse approach, new methods of data processing emerged, and what came next was the multi-dimensional OLAP methodology.
THE TRADITIONAL DATA WAREHOUSE AT A GLANCE
+ Moves large amounts of data
- Built for operational reporting, not analysis
- Significant consumption of bandwidth, CPU, etc.
- Long development cycles (several months)
- No interactive data analysis
- High complexity due to the large number of potential ways to integrate data

OLAP

Online Analytical Processing (OLAP) cubes are multi-dimensional sets of data that essentially serve as a staging space in which to analyze information. These special online analytic processing databases hold data not in tables but in OLAP cubes, a mechanism used to store and query data in an organized, multi-dimensional structure specifically optimized for analysis. OLAP databases are designed to pre-calculate as many queries and combinations of data fields as possible in order to provide fast query response. However, while these solutions perform better than classical relational databases, their multi-dimensional structure makes them inflexible and unable to accommodate changes easily. In addition, storing large amounts of data in a cube causes a performance bottleneck. While OLAP databases are quite useful for basic use cases, large data sets
require using capabilities from additional tools in tandem, which complicates analytical efforts and requires unique skills.

ROLAP

Another way to organize data for multi-dimensional querying is relational online analytical processing (ROLAP). ROLAP is a form of OLAP that performs multi-dimensional analysis of data stored in a relational database rather than in a multi-dimensional database, which is considered the OLAP standard. Although ROLAP technology performs better than OLAP databases when processing large amounts of data, it cannot beat the speed and efficiency of OLAP on smaller amounts of data. ROLAP databases require a great deal of manual maintenance and are difficult for business users to operate, so ROLAP is considered even less flexible than OLAP cubes. OLAP and ROLAP are both still popular today, but neither technology can keep up with today's demands for near real-time analytical data, nor can either handle unstructured data.
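The pre-calculation idea behind OLAP cubes can be illustrated with a toy sketch: for a small set of fact rows, aggregate a measure over every combination of dimensions up front, so that any later query is a simple lookup. The dimensions and figures below are invented for illustration:

```python
from itertools import combinations
from collections import defaultdict

# Hypothetical fact rows: (region, product, year, revenue).
facts = [
    ("EU", "widget", 2016, 100),
    ("EU", "gadget", 2016, 200),
    ("US", "widget", 2017, 300),
]
dimensions = ("region", "product", "year")

def build_cube(rows, dims):
    """Pre-aggregate revenue for every subset of dimensions, which is
    essentially what an OLAP cube materializes up front."""
    cube = defaultdict(float)
    for *dim_values, revenue in rows:
        record = dict(zip(dims, dim_values))
        # One cell per subset of dimensions, including the grand total ().
        for r in range(len(dims) + 1):
            for subset in combinations(dims, r):
                key = tuple((d, record[d]) for d in subset)
                cube[key] += revenue
    return cube

cube = build_cube(facts, dimensions)
print(cube[(("region", "EU"),)])   # revenue for EU → 300.0
print(cube[()])                    # grand total   → 600.0
```

The lookup is instantaneous, which explains the fast query response; but the number of pre-computed cells explodes with dimensions and data volume, which is exactly the inflexibility and bottleneck described above.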
MULTI-DIMENSIONAL DATABASES (OLAP, ROLAP) AT A GLANCE
+ Store and query data in an organized way
+ Fast query response due to pre-calculation
+ Fast and efficient for small amounts of data
- Problems with large amounts of data
- Inflexibility due to the multi-dimensional structure
- Performance bottleneck due to the storage limitations of cubes
- Need for manual maintenance
- Difficult for business users to use
- Need for additional tools when dealing with high data volumes

Because both the data warehouse and OLAP approaches fall short of business expectations for speedy and comprehensive analytical data access, a new approach surfaced. Self-service business intelligence (SSBI) technologies like Qlik and Tableau introduced an approach to data analytics that enables business users to access and work with corporate information without the IT department's involvement. These SSBI tools can blend, or locally integrate, data from the data warehouse with any other data sources not stored in the data warehouse. This is accomplished by pulling copies of the data sources
into a local data store where the analyst can blend or integrate the data as needed. These self-service tools are flexible, relatively easy to implement, and provide a good level of independence for data analysts, but there are clear disadvantages to the approach. The most prominent is that data analysis performed in this manner quickly becomes unmanageable, resulting in redundant work, inconsistent results, and, in short, chaotic reporting practices when used on a broad scale throughout organizations. Since everybody has the ability to define their own rules and calculations, it is both possible and likely for different groups and individuals to calculate the same KPIs and metrics in different ways, leading to an array of conflicting results and the publishing of confusing, contradictory information. Because these solutions have no permissions structure, there is no security layer to protect sensitive data, a severe vulnerability since analysts frequently and casually exchange data files. Also, the ability to transform the data is relatively limited in most cases. Further, because many machines are doing the same work for different users in parallel, powerful computing resources are used inefficiently, contributing to rising costs and lower system performance. For all of these reasons, pure SSBI tools can fill a limited, short-term need but fall short of being an end-to-end, enterprise-level analytical solution.
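The KPI-divergence problem is easy to reproduce. In this invented example, two analysts compute a "conversion rate" from the same raw numbers, but with different, equally defensible definitions, and publish conflicting figures:

```python
# Two analysts compute "conversion rate" from the same raw numbers.
visits = 1000          # total site visits
unique_visitors = 800  # distinct people
orders = 40

# Analyst A: orders per visit.
conversion_a = orders / visits            # 0.04

# Analyst B: orders per unique visitor.
conversion_b = orders / unique_visitors   # 0.05

print(f"A reports {conversion_a:.1%}, B reports {conversion_b:.1%}")
# → A reports 4.0%, B reports 5.0%
```

Neither analyst is wrong; without a single, shared definition there is simply no way to reconcile the two reports.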
SELF-SERVICE BI TOOLS AT A GLANCE
+ Enable business users to perform analysis without IT support
+ Blend external data sources with the data warehouse
+ Flexible and easy to implement
- Divergent KPI calculations due to decentralized analytics
- No security layer
- Limited data transformation capabilities
- Inefficient use of resources due to parallel usage

As SSBI tools evolved, data scientists were still wrestling with the overall challenge of finding an analytical database as flexible for analytics as relational databases were for transactional data processing. Progressive software vendors sought to overcome the limitations of data warehouses, cubes, and SSBI tools and began working towards databases that were both flexible and able to process analytical workloads. These analytical databases, or column stores, were the next step in the trend to provide business analysts the tools and flexibility they need. They have since evolved into massively parallel processing (MPP) analytical databases that are more flexible and more performant than cubes, even when large amounts of data are being stored and queried. However, these analytical databases require that data be copied into them using processes very similar to the aforementioned ETL processes, with similar drawbacks. The load processes are typically slower than in a traditional, row-based data warehouse because an extra step is required
to optimize the data for quick analytical retrieval: the data must be converted from a row-based format into a columnar format, and field-level data compression must then be applied. Although these extra steps provide significant performance improvements, they also require additional time that delays the analysts' ability to analyze the data. Due to this load-time latency, it is impossible to access real-time data in analytical databases.

ANALYTICAL DATABASES AT A GLANCE
+ Able to deal with huge workloads
+ Strong parallel processing
+ High scalability
- Slow load processes due to the conversion from row-based to column-based data and the compression step
- No real-time data access
- Not agile

Next came the data lake strategy. Data lakes are storage repositories able to hold vast amounts of raw data in its native format until needed. In many cases data lakes are Hadoop-based systems, and they represent the next stage in both power and flexibility. A compelling benefit of the approach is that there is no need to structure (transform) the data before loading it, as would be required under "schema on write." Instead, you can assign structure to the data at the time it is queried ("schema on read"). However, while data lakes are able to hold large amounts of unstructured data in a cost-effective manner, they are insufficient for interactive analysis when fast query response is required or when access to real-time data is needed.
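The schema-on-read idea can be sketched as follows: raw, heterogeneous records land in the "lake" untouched, and a schema is applied only at query time. The record layout and field names here are invented for illustration:

```python
import json

# Raw records land in the lake as-is; schema on write would have
# forced them into a fixed table first.
lake = [
    '{"user": "ann", "action": "click", "ms": 120}',
    '{"user": "bob", "action": "view"}',             # missing "ms" field
    '{"user": "ann", "action": "buy", "ms": 300}',
]

def query(records, schema):
    """Apply a schema only at read time: project the requested fields,
    substituting None where a raw record lacks them."""
    for raw in records:
        record = json.loads(raw)
        yield tuple(record.get(field) for field in schema)

# The same raw data can be read with whatever schema the analysis needs.
print(list(query(lake, ["user", "ms"])))
# → [('ann', 120), ('bob', None), ('ann', 300)]
```

The flexibility is real, but note that every query re-parses the raw records, which hints at why data lakes struggle to deliver interactive query speeds.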
The proliferation of data lakes enabled the switch from ETL to ELT (extract, load, and transform). Unlike ETL, where data is transformed before it is loaded into the database, ELT significantly accelerates load time by ingesting data in its raw state. The rationale behind this approach is that data lake storage technologies are not picky about the structure of the data, so no development time is required to transform the data into the right structure before it can be accessed for analytics. This means that all data can simply be parked, or dumped, into a data lake, and all further operations and transformations can occur within this repository if and when needed. While it is a tantalizing approach, the data lake falls short of expectations for several reasons. A primary objective of the data lake is to simplify and accelerate; however, the approach often complicates matters with extra steps to prepare data for analytics. And although it significantly reduces the labor of data loads, it still requires that all data be moved or copied to a single location before it is accessible for analytical purposes. This drawback is shared with the traditional ETL-based data warehouse: data load latency cannot be eliminated from the analytical data supply chain, although for the data lake it is greatly reduced compared to a data warehouse. Another disadvantage of the data lake is a phenomenon that has come to be known as the data swamp or data graveyard. Because storage is cheap, the data lake approach often leads to dumping and storing far more data than with ETL, but this save-everything approach means loading and storing much more data than businesses are prepared to analyze.
Since any data load takes time and consumes disk space and network bandwidth, unnecessary loads can be expensive and cause additional latency that delays other more analytically valuable data from being analyzed in a timely manner.
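The ELT pattern (load raw data first, transform inside the store only when the data is actually needed) can be sketched with an in-memory SQLite database standing in for the lake; the payload format and table names are hypothetical:

```python
import sqlite3

db = sqlite3.connect(":memory:")

# Load first: dump raw event payloads as-is, with no transformation step.
db.execute("CREATE TABLE raw_events (payload TEXT)")
db.executemany("INSERT INTO raw_events VALUES (?)",
               [("click,ann,120",), ("view,bob,80",), ("click,ann,300",)])

# Transform later, inside the store, only when the data is needed:
# strip the 'click,' prefix and keep just the click events.
db.execute("""
    CREATE TABLE clicks AS
    SELECT substr(payload, 7) AS detail
    FROM raw_events
    WHERE payload LIKE 'click,%'
""")
print(db.execute("SELECT detail FROM clicks").fetchall())
# → [('ann,120',), ('ann,300',)]
```

The load step is fast precisely because nothing is validated or structured up front; the cost is deferred to query time, and data that is never transformed simply accumulates as the "graveyard" described above.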
Although data lakes and ELT bring data together into one place quickly, they cannot provide fast query response as analytical databases do, nor can they provide access to data in real time.

DATA LAKES AND ELT AT A GLANCE
+ Hold vast amounts of unstructured data
+ No need to structure data before querying it
+ Efficient data load
- No real-time analysis possible
- Data needs to be moved to a single location before analysis
- Low costs encourage data graveyards, which decrease performance and increase costs

Looking back at both the traditional data warehouse and the data lake, one commonality they share is that they rely on having all data in a physical, central repository. The idea was that before you could work with the data, you had to corral it into a single location. This assumption has been a barrier to accelerating data accessibility, and it is what is fundamentally wrong with all of the approaches previously discussed.
While the majority of data analysts were busy exploring the progression from relational databases to cubes, analytical databases, and data lakes, another camp was looking into using data federation to integrate data for analysis. Data federation allows analysts to instantly run queries joining multiple disparate databases without the need to copy or move data from the original operational sources to a central analytical repository. This approach is clearly a significant improvement on all of its predecessors regarding the immediacy with which data can be analyzed. While the idea is sound and the value self-evident, data federation alone isn't scalable for large amounts of data or for large numbers of simultaneous users. In addition, because it relies heavily on the speed and stability of the source systems and the network, its performance commonly suffers, both for data analysis and for production operations. So, while data federation is quick and flexible, in itself it is neither scalable nor particularly dependable. But it was an important step in the right direction. The next stage of evolution was to combine data federation with caching repositories to address these issues. This hybrid approach used big data solutions to complement data warehousing. The result is a combination of repositories, virtualization, and distributed processes for data management that delivers the best capabilities of several technologies, but it still falls short of the expectation of a robust, agile, performant data warehouse. Caching can be problematic because cache loads must be scheduled around the performance concerns of source systems, and because the cache is loaded into a single repository that may or may not be optimized for different data sets and/or data types.
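Data federation can be shown in miniature with SQLite's ATTACH statement, which lets one query span two physically separate databases. This captures the core idea of joining sources without first copying them into a central repository; the ERP/CRM tables and figures are invented:

```python
import sqlite3

con = sqlite3.connect(":memory:")        # stands in for the ERP database
con.execute("ATTACH ':memory:' AS crm")  # a second, separate database

con.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
con.executemany("INSERT INTO orders VALUES (?, ?)",
                [(1, 50.0), (2, 75.0), (1, 25.0)])

con.execute("CREATE TABLE crm.customers (id INTEGER, name TEXT)")
con.executemany("INSERT INTO crm.customers VALUES (?, ?)",
                [(1, "Ann"), (2, "Bob")])

# One query spans both databases; no data is copied to a central store.
rows = con.execute("""
    SELECT c.name, SUM(o.amount)
    FROM orders o JOIN crm.customers c ON o.customer_id = c.id
    GROUP BY c.name ORDER BY c.name
""").fetchall()
print(rows)
# → [('Ann', 75.0), ('Bob', 75.0)]
```

In a real federated system the two sources would be remote databases, which is where the scalability and dependability concerns above come from: every query depends on the speed and availability of every source it touches.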
Still, in moving toward the modern data warehouse, virtual data technology is essential: from simple federation to full virtualization, including virtual views, indices, and semantics. Developing virtual, or logical, data views is faster than relocating all data physically and can be done with ease through point-and-click operations. In addition, virtual views can be altered without the need to transform and reload data, as in earlier data warehouse integration approaches, meaning the changes can be presented live, immediately, without waiting for the data to populate through an overnight process. It is the virtualization of data integration that enables extreme agility in analytical development and significantly reduces build times and costs, all of which leads us to the next breakthrough in data warehousing.

DATA FEDERATION AT A GLANCE
+ Joins databases without the need to copy them into a central repository
+ Very fast data access
+ Flexible alteration of virtual views
+ Virtual data integration enables extreme agility and reduces build times and costs
- Limited scalability (e.g. many simultaneous users)
- Caching repositories cause performance problems
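The live alteration of virtual views described above can be sketched in SQL: a view is redefined in place, and the change is visible immediately, with no reload of the underlying data. Table and view names are invented:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (region TEXT, amount REAL)")
db.executemany("INSERT INTO sales VALUES (?, ?)",
               [("EU", 100.0), ("US", 200.0), ("EU", 50.0)])

# A virtual view over the data; nothing is copied or reloaded.
db.execute("CREATE VIEW sales_by_region AS "
           "SELECT region, SUM(amount) AS total FROM sales GROUP BY region")

# Requirements change: redefine the view in place.  The change is live
# immediately, with no overnight reload of the underlying data.
db.execute("DROP VIEW sales_by_region")
db.execute("CREATE VIEW sales_by_region AS "
           "SELECT region, SUM(amount) AS total, COUNT(*) AS orders "
           "FROM sales GROUP BY region")

result = db.execute("SELECT * FROM sales_by_region ORDER BY region").fetchall()
print(result)
# → [('EU', 150.0, 2), ('US', 200.0, 1)]
```

Contrast this with the ETL world, where the same change would mean altering a physical table, rewriting the load job, and waiting for the next batch run.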
The First Logical Data Warehouse

A modern data integration strategy employs what's known as best-fit engineering, whereby each part of the data management infrastructure uses the most appropriate technology to perform its role, including storing data as determined by business requirements and service-level agreements (SLAs). Unlike a data lake, this new architecture takes a distributed approach, aligning information storage selection with information use and leveraging multiple data technologies that are fit for specific purposes. A hybrid approach can also significantly reduce costs and time to delivery when changes or additions to the warehouse are required. One term for this new architecture is logical data warehouse. Another is virtual data lake. In either case, the premise is that there is no single data repository. Instead, the logical data warehouse is an ecosystem of multiple, fit-for-purpose repositories, technologies, and tools that interact synergistically to manage data storage and provide performant, enterprise-grade analytical capabilities. The original unmet analytical requirements of the traditional data warehouse were to be able to retrieve data using a single query language, get speedy query response, and quickly assemble different data models, or views of the data, to meet specific needs. By combining data federation, physical data integration, and a common query language (SQL), the logical data warehouse approach achieves all three of these goals without the need to copy or move all of the data to a central location.
Physical data integration is a robust feature of the logical data warehouse that ensures fast query response while decoupling performance from the source data stores and moving it to the logical data warehouse repository. In this manner, the effort-intensive physical transfer of the data is minimized and simplified, effectively removing lengthy data movement delays from the critical path of data integration projects. In its research on the emerging practice of the logical data warehouse, Gartner weighed in on this approach, pointing out that it offers flexibility for companies that have different data requirements at different times. For example, many use cases require a central repository, such as a traditional data warehouse or analytical database, where data that is needed frequently, or with the greatest retrieval speed, can be stored and optimized for performance. Increasingly, data analysts must be able to explore data freely with guaranteed adequate query performance. Frequent use cases along these lines are sentiment analysis and fraud detection. These use cases require a distributed technology such as Hadoop to store the massive amounts of data available through social media feeds and clickstream activity logs. Additionally, they demand direct access to data sources via data federation. As Gartner rightly indicates, a logical layer is needed on top of these technologies in order to unify the architecture and allow queries and processes to operate on all systems concurrently, as needed.
As the first logical data warehouse, Data Virtuality provides this uniform layer over numerous data storage technologies, unifying these data stores and facilitating the use cases suggested above by Gartner. By routing queries among data stores behind the scenes as needed, the Data Virtuality technology offers great benefits to business users. The business can use the same platform for handling a wide variety of use cases, far more than could be handled by a traditional data warehouse. New approaches to data integration also become possible, enabling users to put business needs first and allow the technology platform to adapt as needed. By decoupling the semantic, unified data access layer with which business users interact from the actual data sources, changes occurring in an original data source can be isolated from interfering with analytical processes. In a profound departure from past data accessibility strategies, business users can interact with data comfortably and easily, focusing on their objective rather than the technological underpinnings. By consolidating relational and non-relational data sources, including real-time data, Data Virtuality enables immediate analysis via the SQL query language. Data Virtuality provides a central data cockpit, allowing all data sources, whether analytical or operational, to freely interchange data. Integrated connectors allow data to be immediately processed in analysis, planning, or statistics tools, or written back to source systems as needed. In addition, the logical data warehouse automatically adjusts to changes in the IT landscape and user behavior, offering the highest possible degree of flexibility and speed with little administrative overhead.
In a logical data warehouse project, a few clicks can seamlessly connect all data-producing and data-processing systems, including ERP and CRM systems, web shops, social media applications, and just about any SQL and NoSQL data source,
all in real time. With instant access to the data, users can begin experimenting with these connections and joins until they achieve the results they want. In stark contrast to traditional ETL solutions, the key difference with the logical data warehouse is that there is no need to move the data in order to analyze it. This greatly reduces development and database structuring time and costs. Equally flexible and responsive, the logical data warehouse is a completely different data integration paradigm from the inflexible traditional data warehouse approach. The logical data warehouse works by intelligently marrying two distinct technologies to create an entirely new manner of integrating data. The first technology is data federation, which connects two or more disparate databases and makes them all appear as if they were a single database. The second is analytical database management, which provides semantic, business-friendly data element naming and modeling, allowing flexible ingestion and modeling options. The results are profound. Data federation alone offers flexibility but can't scale. Analytical database management scales beautifully but is inflexible. The combination of the two enables breakaway flexibility and performance and represents an entirely new paradigm in the way we think about, manage, and work with data. For example, a logical data warehouse can connect to a variety of data sources simultaneously, including classic relational databases like Oracle and MS SQL; NoSQL databases like MongoDB or Hadoop; column stores like Vertica or SAP HANA; and web services like Google Analytics, AdWords, Facebook, Twitter, and others. Once these have been connected, the resulting integrated, overarching view of the data appears within a data analysis tool as if everything were contained in a single SQL
database, accessible with a common query language. Virtually any data analysis tool currently on the market (such as Qlik, Tableau, Aqua Data Studio, etc.) can connect to, query, and analyze data over the virtual layer with no need to pull or copy data from any location. The method offers vast new opportunities for data exploration, data discovery, rapid prototyping, and intuitive experimentation. Business users can get results instantly and can refactor data models just as quickly. Further, building logical data views as shareable components, including common KPIs and metrics, can ensure that every report, every visualization, and every query response conforms to the same corporate standards and definitions. Data Virtuality acts as a central hub connecting all systems and applications within the enterprise, enabling data exchange between systems and ensuring the latest data is available anywhere, anytime.

LOGICAL DATA WAREHOUSE AT A GLANCE
+ Consolidation of structured, unstructured, and real-time data through the combination of data federation and analytical database management
+ No need to move data for analysis
+ Immediate processing (analysis) or write-back to data sources
+ Central hub connecting all systems and applications within the enterprise
+ Ensures the latest data is available everywhere at any time
- Needs at least 10 different data sources to show its full efficiency
- No integrated analytical tool
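The shareable-KPI idea described above can be sketched as a SQL view defined once in the unified layer, so that every tool and team queries the identical definition rather than re-implementing it locally. The schema and figures below are invented:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER, amount REAL, returned INTEGER)")
db.executemany("INSERT INTO orders VALUES (?, ?, ?)",
               [(1, 100.0, 0), (2, 40.0, 1), (3, 60.0, 0)])

# Define the KPI once as a shared component; every report and dashboard
# queries this view instead of re-implementing the calculation.
db.execute("""
    CREATE VIEW kpi_net_revenue AS
    SELECT SUM(amount) AS net_revenue FROM orders WHERE returned = 0
""")

print(db.execute("SELECT net_revenue FROM kpi_net_revenue").fetchone())
# → (160.0,)
```

Because every consumer reads the same view, the conflicting-KPI problem of self-service BI disappears: a correction to the definition propagates to every report at once.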
A MODERN DATA WAREHOUSE

The logical data warehouse is essential for organizations that wish to combine big data and data warehousing in the enterprise.

A VIRTUAL DATA MART

A logical data warehouse makes it easy to create a virtual data mart for expediency. By combining an organization's primary data infrastructure with auxiliary data sources relevant to specific, data-driven business units, initiatives can move forward more quickly than if the data had to be on-boarded to a traditional data warehouse.

AN EVOLVING CORPORATION

Modern data integration allows rapidly changing organizations to quickly combine data from disparate business units and provide BI and analytical transparency to top management. This kind of flexibility is crucial for strategic changes, mergers and acquisitions, and other sensitive operations where there is no time to waste building a central data warehouse.

E-COMMERCE

Modern data integration offers a compelling solution for e-commerce and retail organizations with a great number of different systems in the IT landscape. For example, a typical e-commerce business has an ERP system, a CRM, web and mobile apps, analytics programs, online marketing, social media marketing, and other tools. With a logical data warehouse, all of these data sources can be joined quickly and flexibly to provide 360-degree views of customers, products, and more.
DIGITAL MARKETING

Digital marketing is extremely data-driven, relying on the volatile flow of real-time data. A logical data warehouse offers the only viable way to manage complexity of this kind, easily connecting to a host of digital marketing data providers for affiliate marketing, performance marketing, personalization, and other approaches.

MAKING DATA ACTIONABLE

Modern data integration methods go the extra mile by making data actionable. In addition to receiving data in one direction for analysis, a user can write data back, essentially triggering actions based on the data. For example, the solution can analyze data from ERP, CRM, and a web shop simultaneously to trigger marketing campaigns unconstrained by traditional business hours.

REAL-TIME ANALYSIS

The logical data warehouse excels at manipulating real-time data and can flexibly model and re-model the data to fit the latest analytical initiatives.

INTEGRATING BIG DATA

The open-source big data solution Hadoop is adept at analyzing unstructured data and performing batch analysis, but it performs poorly in interactive situations. To achieve real-time functionality, companies must combine the traditional data warehouse with modern big data tools, often multiple ones, such as an Oracle warehouse with Hadoop and Greenplum. Unifying these data sources into one common view provides instant access to a 360-degree view of your organization.
In this digital era, harnessing large amounts of data to make astute business decisions and improve operations is an imperative. While our ability to generate data still far outstrips our ability to effectively analyze it, we are making great progress in balancing the two. Exciting new approaches are merging big data solutions with traditional enterprise data strategies. Without the need for a central repository, logical data warehouses hold enormous promise. By offering an ecosystem of multiple best-fit repositories, technologies, and tools, businesses can now effectively and rapidly analyze real-time data in pursuit of valuable insight. For organizations sifting through reams of data for treasure, these virtual data lakes represent the Holy Grail that can help them tailor products and fulfill desires we haven't yet dreamed of.
Hadoop and Spark work together to offer impressive in-memory data processing for big data applications. Although there has been hope that the in-memory capabilities of Spark would solve many of the latency issues related to Hadoop, both technologies have limitations and fall short of a one-size-fits-all solution.

Apache Hadoop is an open-source software framework providing distributed storage and processing of very large data sets: data sets so large that it would not be economical to store them in almost any other data storage technology. Hadoop accomplishes this with a multi-server, clustered approach that removes many earlier constraints on storing and processing large data sets. To process data, Hadoop's MapReduce function abandons the convention of moving data over a network to an application server for processing. Instead, MapReduce analyzes the data on the individual servers where it resides and then compiles the results from those servers into a single response to the query.

Hadoop itself is not a single system, but rather an ecosystem of numerous interconnected products that allows users to run various types of analytics and operations on any type of data. Because Hadoop is open source, it is constantly evolving and improving. While Hadoop is complex to use, startups and established companies alike are quickly creating tools to simplify and expand its use. For example, executing queries within the Hadoop ecosystem originally required extensive knowledge of frameworks and languages such as MapReduce, Pig, and Python. The result of this custom coding was that queries could be performed on data types previously impossible to query, such as unstructured data, but at the cost of there being fewer programmers available to write
and run these queries. Today, however, numerous products are available that allow the very popular SQL query language to be used to analyze data stored in Hadoop.

Classic Hadoop is itself batch-oriented and, as such, is capable of analyzing vast amounts of data with relative ease by distributing the work across a number of different Hadoop nodes that act in parallel to produce the results. However, analyzing smaller amounts of data requires just as much complexity and programming as processing large data sets, so overall it is a rather slow way to query data. Spark and related technologies are attempting to improve Hadoop query performance by adding a fast, in-memory data processing engine with development APIs. The objective is that such technologies will eventually allow data workers to execute streaming, machine learning, or SQL workloads on Hadoop in a timely manner and with less custom coding.

While almost any analytical task can be undertaken with Hadoop, including analysis of very large amounts of data for use cases like fraud and sentiment analysis, it remains a relatively immature technology whose ecosystem is not yet fully integrated and requires custom coding at several junctures for complete functionality. Because it is highly technical and difficult to use, success with Hadoop most often comes in the form of an inexpensive data archive.
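The MapReduce pattern described above, where map tasks process their local chunk of data independently and a reduce step merges the partial results into a single answer, can be sketched in a few lines of Python. This is a single-process illustration of the idea, not Hadoop itself:

```python
from collections import Counter
from functools import reduce

# Each "server" holds its own chunk of data.
chunks = [
    "big data is here",
    "big data is transforming commerce",
]

def map_chunk(chunk):
    """Map step: count words within one server's local data."""
    return Counter(chunk.split())

def merge(a, b):
    """Reduce step: combine two partial results."""
    return a + b

partials = [map_chunk(c) for c in chunks]    # on Hadoop, runs in parallel per node
totals = reduce(merge, partials, Counter())  # compiled into a single response
print(totals["big"], totals["data"], totals["commerce"])
# 2 2 1
```

The key property is that the data never has to move to a central application server for the map step; only the small partial results travel over the network to be merged.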
Data Virtuality GmbH develops and distributes the software DataVirtuality, which gives companies an especially simple means of integrating and connecting a variety of data sources and applications. The solution is revolutionizing the technological concept of data virtualization and generates a data warehouse consisting of relational and non-relational data sources in just a few days. Using integrated connectors, the data can be immediately processed in analysis, planning, or statistics tools, or written back to source systems as needed. The data warehouse also automatically adjusts to changes in the IT landscape and in user behavior, which gives companies using DataVirtuality the highest possible degree of flexibility and speed with minimal administrative overhead. Founded in 2012, the Leipzig- and San Francisco-based company originated from a research initiative of the Chair of Information Technology at Universität Leipzig and is financed by Technologiegründerfonds Sachsen (TGFS) and High-Tech Gründerfonds (HTGF).

COMPANY CONTACT: Nick Golovin, Ph.D., Founder and CEO, Data Virtuality GmbH, phone: nick.golovin@datavirtuality.com