WHITE PAPER

How to Integrate Data into Tableau: A Comparison of 3 Approaches (ETL, Tableau Self-Service, and the Logical Data Warehouse)
The era of big data is upon us, and with it the dawn of a new industrial revolution. The theoretical benefits of all of this data are tantalizing. But that's just it: until data can be unlocked, those benefits remain elusive. While data is more important than ever, it's also more complex, and there's more of it, far more.
In order to yield insight, data must be integrated. The goal of data integration is to gather data from a variety of different sources, combine it, and present it as a unified whole. However, synchronizing huge quantities of variable, heterogeneous data from disparate, incompatible sources across an enterprise poses significant challenges. While integrating data has never been easy, the difficulty is only increasing with the proliferation of data sources, types, and stores. In addition to structured data, enormous amounts of raw data are being captured. Much of this data, such as JSON documents or social media, has no schema at all. Combining all of this data to produce meaningful insights is no small task.

At the same time, the pace of doing business has accelerated considerably. Businesses want to integrate data into their day-to-day operations to help them make important decisions and increase profits. Competitive pressures and new sources of data are creating new requirements, and business users are demanding the ability to answer their questions quickly and easily. The business needs to know immediately whether an idea is viable, and it expects IT to respond with a prototype that can be tweaked accordingly, in the moment. Slow, rigid systems are out of the question for these users and the IT teams that support them.

In this blog series, we explore data integration, with a special focus on integrating data into Tableau, an interactive data visualization product focused on business intelligence. When it comes to integrating data, business intelligence managers can choose from multiple data integration approaches. In blog 2 of our series, we'll explore three of them, with special attention paid to how they work with the popular BI tool Tableau. Specifically, we'll focus on ETL, Tableau Data Blending, and the logical data warehouse with Tableau.
ETL (AND THE TRADITIONAL DATA WAREHOUSE)

Tried-and-true ETL (extract, transform, load) tools can be used to move large amounts of data in a batch-oriented manner. However, when it comes to getting value out of data, these tools pose significant limitations. Because they require comprehensive knowledge of each operational database or application involved, as well as increasingly complex custom scripts, ETL-based projects tend to experience a high failure rate. And because these projects are architecturally rigid in nature, even small changes trigger large and unpredictable impacts. To avoid this, great care must be taken to conceptualize the database and determine requirements. By the time the business is able to see the results of the effort, months have passed and requirements have changed. The business wants quick answers; it wants to test an idea, cross it off if it fails, and move on to the next one.

In addition, as IT transitions to the cloud, lack of visibility into the internals of cloud databases and applications makes it virtually impossible to implement ETL-based solutions. The transition to the cloud also means greater value is placed on real-time updates, something primarily batch-oriented ETL tools cannot easily deliver. In the digital era, responsiveness is the name of the game, with new requirements arising faster than ever before. There's simply no time to read data from one system, copy it over a network, and write it into a new system. Repetitious, error-prone, time-consuming, and expensive, ETL tools represent a serious bottleneck. It's not uncommon for IT teams to finish an ETL job only to find it's no longer necessary.

With the emergence of application programming interfaces (APIs) and Software as a Service (SaaS), developers no longer have to start from scratch every time they write a program.
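To make the batch-oriented ETL pattern criticized above concrete, here is a minimal, hypothetical sketch in Python. The table and column names are invented for illustration, and real ETL tools are far more elaborate; the point is the shape of the work: read everything out of an operational store, reshape it, and rewrite a warehouse table wholesale on every run.

```python
import sqlite3

# Hypothetical nightly batch job: copy orders from an operational store
# into a reporting warehouse. Schemas are invented for illustration.

def run_etl(source_path: str, warehouse_path: str) -> int:
    src = sqlite3.connect(source_path)
    wh = sqlite3.connect(warehouse_path)

    # Extract: read all rows from the operational system.
    rows = src.execute(
        "SELECT id, amount_cents, country FROM orders").fetchall()

    # Transform: convert cents to currency units, normalize country codes.
    transformed = [(oid, cents / 100.0, country.upper())
                   for oid, cents, country in rows]

    # Load: rebuild the warehouse table from scratch each run -- the
    # all-or-nothing batch step that makes this approach slow and rigid.
    wh.execute("DROP TABLE IF EXISTS fact_orders")
    wh.execute(
        "CREATE TABLE fact_orders (id INTEGER, amount REAL, country TEXT)")
    wh.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", transformed)
    wh.commit()

    loaded = wh.execute("SELECT COUNT(*) FROM fact_orders").fetchone()[0]
    src.close()
    wh.close()
    return loaded
```

Any change to the source schema or to the transformation ripples through every step of such a job, which is why even small changes have the large, unpredictable impacts described above.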
Now they can contract out parts of the work to remote software that can do it better.

TABLEAU SELF-SERVICE AND DATA BLENDING

Tableau Software produces a family of interactive data visualization products focused on business intelligence. Tableau allows you to extract data into Tableau's fast in-memory data engine, where you can do ad-hoc visualization at interactive speeds. With this approach, you can query an extract of data without waiting for Hadoop's MapReduce queries to complete. A great strength of Tableau is its ability to connect to data sources directly. In this way, business users are able to explore data sources directly and provide value to the business quickly.

A powerful and popular tool, Tableau works very well as an in-house data-blending tool for smaller data sets. In the realm of truly big data, however, when data sets become unwieldy or diverse, the tool begins to falter. For example, with Tableau data blending it's not possible to perform different join operations, or to blend data sources with millions of records on each side. Also, the ability to store historical data, as in the case of tracking changes in data, is strongly limited when using Tableau extracts. At a certain threshold, it no longer makes sense to use Tableau alone. At that point, it becomes necessary to add a logical data warehouse tool such as Data Virtuality.

THE LOGICAL DATA WAREHOUSE

A logical data warehouse represents a new data management architecture for analytics, one that combines the strengths of traditional repository warehouses with alternative data management and access strategies. This new approach is made possible by the maturation of today's networks, which are now
sufficiently fast, reliable, and interoperable. Logical data warehouse solutions usually involve advanced forms of data integration, including federation and virtualization, which are key to unifying multiple data ecosystems. The main advantage of the logical data warehouse is that virtual views can be altered without needing to first transform and reload data. View technologies go hand in hand with in-memory functions, and data can be created, processed, and delivered on the fly, enabling purely semantic views of data structures. Lightly persisted data can be materialized into the view on an as-needed basis. In situations where data is time-sensitive, such as determining production yield on a shop floor, data virtualization techniques can produce results that are only seconds to a few minutes old. With optimization capabilities, execution speeds for queries can be increased ten-fold or more.

Another important concept in the logical data warehouse is that of an in-memory data fabric that stretches across the technology stack and around key applications such as finance, CRM, ERP, or a call center. The data fabric provides a unified view, or collection of views, of data in multiple systems across an enterprise: one look into the big picture. This makes it possible for BI managers to see into multiple databases, applications, and legacy platforms. As the interface layer, it is invisible to the user, who doesn't know whether the data is persisted, fetched, or materialized. This offers great scalability, flexibility, and speed for time-sensitive business practices such as lean data management.

Data Virtuality offers a logical data warehouse approach that allows organizations to keep the tools they currently have, abstract data from multiple sources, and create virtual views through a Web portal. This enables users to quickly query, share,
and, most importantly, integrate data, whether it resides in flat files, web services, an Oracle database, or a SQL Server. Data Virtuality allows you to take the next step in data blending: joining, correlating, and querying massive amounts of live data on the fly. A self-service business intelligence user can join new data and create new insights. Using a traditional ETL approach, a user would first need to have a clear data model in mind. In contrast, Data Virtuality allows you to test new approaches as you think of them, see different angles, and experiment as you go. Here, the direct-access capabilities of Tableau and Data Virtuality form a perfect combination: by using Tableau to access hundreds of data sources directly through Data Virtuality, the possibilities for data exploration rise to a hitherto unknown degree.

What really sets Data Virtuality apart, however, is the solution's ability to recognize and remember the data being queried. For example, the server can remember that a certain join between Postgres and Oracle was used in Tableau. As a result, the user can access that model from internal data storage with a single mouse click, as needed. This kind of immediacy is revolutionary: BI managers no longer have to create or populate models from scratch over and over again. In the past, if you wanted to try new combinations of data, you had to plan for it. You couldn't simply analyze your data immediately; you'd need to have your IT department prepare the data for you. With Data Virtuality, querying takes mere seconds, with the option to optimize execution speeds for results up to ten times faster than straight querying.

However, data exploration is not the only aspect of analytics that benefits from the combination of Data Virtuality and Tableau. Another important aspect is the ability to define centralized data models to be shared by all Tableau and non-Tableau reports. Finally, businesses can define once and for all how their KPIs are calculated, so that all business users have a single source of truth.

Using Data Virtuality, a user can build a data model over completely diverse data sources and join the data on the fly. They can pull the data in from the virtual layer, build a new data model using Tableau's own query builder, and define relationships in and among the data sets. This kind of virtual data modeling is not possible for disparate data sources using Tableau alone. If a number of tables were being ingested from a single system like MySQL, it would be possible; but when there are a number of different data sources, you need Data Virtuality. This is a big data model; it goes further than simple data blending.

TRANSFORMING BUSINESS WITH BIG DATA INSIGHTS DERIVED FROM TABLEAU AND DATA VIRTUALITY

As we've seen, organizations have a historic opportunity to mine big data to transform their business. Data can reveal new business opportunities and dramatically reduce costs. Modern solutions based on the logical data warehouse can give organizations a significant leg up in the race to gain insight and real business advantage from their data. In the past, serious expense and time were needed to upgrade existing BI infrastructures. Today, this is not the case. Data Virtuality automatically builds most of the database structures, creating a virtual layer around all data sources that allows users to start experimenting with the data immediately. The system then observes how the users work with the data and automatically arranges the data structures in the fastest way based on usage patterns. The system also allows users to create
their own data models and manage them centrally.

As a data virtualization platform, Data Virtuality requires a front-end tool such as Excel, Tableau, Looker, or QlikView to visualize the data. All common front-end solutions currently on the market can be connected to it. The beauty of Data Virtuality lies in its flexible nature; it works best when it has direct access to data sources and APIs.

Using Data Virtuality and Tableau together transforms requirements gathering. What was once a painful, error-prone, and arduous process becomes a data exploration exercise, with data profiling tools converging and merging with each other. Users gather requirements by looking at data through a logical layer in a virtual fashion, putting it together in minutes to show the business lead looking over their shoulder. The logical data warehouse enables us to think differently about data and development methods and to employ agile development. By enveloping the data of the source systems with a virtual view, Data Virtuality presents the data of all source systems as what appears to be a single large relational database, which can be handled in a unified way. User requests are translated transparently into queries to the diverse source systems, while Data Virtuality continually analyzes user behavior and automatically builds an internal data warehouse, ready with answers at the click of a mouse.
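As a closing illustration, the usage-driven optimization described above can be sketched in a few lines of Python. This is a hypothetical toy, not Data Virtuality's actual implementation: the first time a query is seen it runs against the live sources; repeated queries are answered from an internal materialized copy.

```python
# Toy sketch of usage-driven materialization (an assumption for
# illustration, not any vendor's real implementation): the first run of
# a query hits the live sources; repeats come from internal storage.

class MaterializingRunner:
    def __init__(self, execute_live):
        self.execute_live = execute_live  # callable taking SQL, returning rows
        self.materialized = {}            # internal storage keyed by query text
        self.live_calls = 0               # how often live sources were queried

    def run(self, sql):
        if sql not in self.materialized:
            # First occurrence: query the (slow) live sources and keep a copy.
            self.live_calls += 1
            self.materialized[sql] = self.execute_live(sql)
        # Repeats are served instantly from the materialized copy.
        return self.materialized[sql]
```

A real logical data warehouse goes much further, rewriting queries, tracking usage statistics, and choosing which joins to persist, but the principle of answering repeated questions from prepared internal structures is the same.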