Data Preparation for Enhancing Modern BI: Common Design Patterns
Introduction

If you want to be successful in analyzing big data to mine it for insights that will move the needle for your business, then the data you're analyzing needs to be clean, organized, detailed and well-understood. Raw data lacking the necessary pedigree for analysis can be fixed and shaped; the process of doing so is often referred to as data preparation.

In practice, data preparation involves several subtasks. The most common ones are data cleansing and data transformation. Yet there are other subtasks, such as looking inside the data for other data of interest. Enriching data by joining it to other data sets, whether from internal corporate systems or external, publicly available data sets, is another way to discover the data hidden within the apparent data.

There is a large laundry list of use cases for data preparation, potentially as large as the number of candidate data sources multiplied by the ways in which the data will be analyzed. Many drive very unique requirements. For example:

- You're processing a series of texts and wish to do lexical analysis to understand how people of different eras exhibited their attitudes through vocabulary.
- Your analysis involves social media posts, including status messages, tweets, comments and other responses, to gauge mood and sentiment, or to find mentions of a specific person, company or theme.
- You're applying geographical or geospatial content to perform garden-variety address matching and address correction, or to look at the number of customers in a geographical area defined by physical radius, drive time, ZIP code, congressional district or other mapped region.
- You want to look at purchasing patterns over time, break your customers down into demographic groupings, and correlate shopping activity at brick-and-mortar locations with weather information.

No matter what your analytic use case is, taking the data you have and changing it to the data you need is a critical step in the analysis.
Figure 1: Data Preparation Driving Various Analysis Use Cases (sentiment analysis, name and company mentions, fraud detection, weather impact, geographical slicing, clickstream processing)

Data Cleansing

As the need for analysis created the desire to take data offline and explore it in spreadsheet and desktop database applications, most datasets came in as extracts from transactional systems. The data in the originating systems was not meant for analysis and data discovery, and because of that, the data wasn't always so clean. When all a system needs to do is capture enough information to get a transaction out, the integrity of ancillary data may suffer. For example:

- Duplicate customer records (with slightly different spellings) were pretty common.
- Field names were often a combination of capital letters and underscores, and may also have been very short, making the extracts far from self-documenting.
- Extraneous characters could enter pretty much anywhere, and go uncorrected.
- Even the crucial data, such as line item amounts, could be incorrect, due to data entry errors, illegible source documents, or other sources of error.

Because of this, data cleansing became a common step prior to analysis. Features such as de-duping, address correction, field renaming and data value anomaly detection have become commonplace. Performing these tasks easily, quickly and with as much automated assistance as possible is an important aspect of data preparation, and ETL, data warehousing and OLAP tools readily address data preparation requirements for cleansing. These tools work on the notion of combining a fact table (containing important numerical data) with several dimension tables (which provide drill-down categories). These tables come directly from operational databases; they involve simple denormalizing of transactional tables, along with some level of aggregation. Data warehouses and OLAP subsist on data in that format, and ETL is there to automate getting transactional data into it.

Beyond Data Cleansing

However, cleaning the data is a small part of the battle, and even a perfectly clean data set is often not ready for analysis. It takes strategy and experience to start with data in its raw format and pre-process it to work well for analysis. Typically, the data that leads to insights requires some interpretation and transformation of the raw data. This isn't necessarily a needle-in-a-haystack search; it's more like an archeological dig, where you'll need to continually dust things off to get a better look.

Big data analytics in particular involves a different process, data discovery, in which data is pulled from multiple sources, some of which are not transactional, then sifted through until usable data can be found. That takes some constructivist thinking.

Data Preparation is More Than Just Cleansing

A relatively new set of products, collectively characterized as self-service data prep tools, has emerged in the market. These products are geared toward helping users of self-service BI tools fill a functionality gap: converting more complex data sets, which are often very raw and untreated, into organized formats the self-service BI tools can consume.
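The basic cleansing steps described earlier, such as normalizing near-duplicate spellings, renaming cryptic extract fields and flagging anomalous values, can be sketched in a few lines. This is an illustration only (the field names and rows are hypothetical); dedicated tools perform these steps visually and with automated assistance:

```python
# Hypothetical raw extract rows: cryptic field names, a near-duplicate
# customer spelling, and a data-entry error in an amount field.
raw = [
    {"CUST_NM": "Acme Corp", "AMT": "100.00"},
    {"CUST_NM": "ACME CORP", "AMT": "100.00"},   # duplicate, different spelling
    {"CUST_NM": "Globex",    "AMT": "2x5.00"},   # illegible/mistyped amount
]

def cleanse(rows):
    """Rename fields, normalize spellings for de-duping, flag bad amounts."""
    seen, clean = set(), []
    for row in rows:
        name = row["CUST_NM"].strip().title()    # normalize the spelling
        if name in seen:                         # drop post-normalization dupes
            continue
        seen.add(name)
        try:
            amount = float(row["AMT"])
        except ValueError:
            amount = None                        # anomaly: flag for review
        clean.append({"customer_name": name, "amount": amount})
    return clean

clean = cleanse(raw)
```

Even this toy version shows the pattern: duplicates collapse once spellings are normalized, and unparseable amounts are flagged rather than silently kept.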
However, with big data analytics there are a large number of use cases whose expanded requirements scale beyond what self-service data prep tools cover. Standard data transformation, cleansing and joins are a general part of data preparation, but there are additional needs, such as:

- Looking inside the data for other data
- Very precise parsing
- Pre-aggregation of data, often in complex ways
- Shredding of hierarchical information into individual rows of data, or vice versa
- Mining unformatted text for specific pieces of information
- Inserting value-added calculated or algorithmically derived fields
Preparation Is Integral to the Workflow

It is also important to think outside the functionality scope and understand the role of data preparation in the larger analytic workflow. Keep in mind that the word "preparation" is somewhat of a misnomer, as it implies that the data prep process takes place as an early step from which one moves on. In big data discovery, data preparation may take place at any point in an iterative data discovery process. After analysis, the data discovery may reveal flaws in the data or the need to add new data, revealing new requirements for how the data should be shaped, interpreted, enhanced or cleansed.

The diversity and amount of data available today is unprecedentedly large, creating data preparation requirements for big data analytics that expand beyond the scope of standard self-service data prep. The range of transformations you can perform on these vast datasets is also large, and differs markedly from the relatively simple, formal process of extract, transform and load (ETL).
Data Preparation Design Patterns

Data preparation involves exposing information in the data that is latent, obfuscated and/or dependent on initial analysis before it can be discovered or detected. This need generates requirements well beyond the scope of most self-service data prep tools. Following are a number of common big data preparation design patterns that go beyond traditional cleansing, joining and transformation. These will make the promise of big data preparation clearer, more navigable and, in general, more actionable. By the time you've finished reading this paper, you should have a list of things you want to get done with your data, a good set of ideas on how to proceed and a realistic expectation of the challenges you will likely encounter along the way.

Sessionization

Sessionization is capturing a number of discrete events into a specific window so that the window can be analyzed as a unique entity. Sessionization is typically associated with clickstream analysis but is often used in time-series analysis as well. When it comes to clickstream analytics, while web log data can often be grouped and sorted by user and date, most clickstream analysis requires you to sort it by actual web browser session. This is done by carefully looking at the timestamps and then collating all log entries that fall within a certain time window of each other.

Sessionization is very powerful. This type of affinity grouping between otherwise independent time-series events makes all kinds of downstream analysis easy. You can use the first and last timestamp in each group to calculate the length of the session, and count the number of rows in each group to get the number of clicks per session. Popular page pairs (i.e., two pages between which users often directly navigate) can be easily calculated as well.
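As a concrete sketch of the collating step, the following groups hypothetical log entries into sessions whenever consecutive clicks by the same user fall within a 30-minute window. The data and the gap threshold are illustrative assumptions, not a prescription:

```python
from datetime import datetime, timedelta

# Hypothetical web log entries (user, timestamp, page), sorted by user and time.
log = [
    ("u1", datetime(2016, 5, 1, 9, 0),  "/home"),
    ("u1", datetime(2016, 5, 1, 9, 5),  "/products"),
    ("u1", datetime(2016, 5, 1, 11, 0), "/home"),   # >30 min gap: new session
    ("u2", datetime(2016, 5, 1, 9, 2),  "/home"),
]

def sessionize(entries, gap=timedelta(minutes=30)):
    """Collate entries into sessions: same user, <= `gap` between clicks."""
    sessions = []
    for user, ts, page in entries:
        last = sessions[-1] if sessions else None
        if last and last["user"] == user and ts - last["end"] <= gap:
            last["clicks"].append(page)      # still within the same session
            last["end"] = ts
        else:                                # gap exceeded or new user
            sessions.append({"user": user, "start": ts, "end": ts, "clicks": [page]})
    return sessions

sessions = sessionize(log)
for s in sessions:
    # First/last timestamps give session length; row count gives clicks.
    s["duration"] = (s["end"] - s["start"]).total_seconds()
    s["click_count"] = len(s["clicks"])
```

The derived `duration` and `click_count` fields are exactly the downstream measures the text describes.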
Advanced grouping

Sessionizing is a kind of custom grouping. Speaking more generally, custom grouping involves taking a large array of data rows and segregating them not by discrete values in a column but by some discretization of continuous values. Custom binning can be used across use cases, including fraud analysis, preventive maintenance, shopping patterns of walk-in customers at retail stores, and more. Abilities like custom binning, time windowing, statistical grouping (based, for example, on standard deviation) and path analysis functions are each relevant here. Does your data prep tool offer this functionality? If not, you may not be covered for important analyses.
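Both flavors of grouping mentioned above can be sketched briefly: statistical grouping flags values by their distance from the mean, while custom binning discretizes a continuous column into labeled buckets. The amounts and bucket boundaries here are hypothetical:

```python
import statistics

# Hypothetical transaction amounts for one vendor category.
amounts = [20.0, 25.0, 22.0, 21.0, 24.0, 23.0, 500.0]

mean = statistics.mean(amounts)
stdev = statistics.stdev(amounts)

# Statistical grouping: flag anything beyond two standard deviations
# from the category average.
outliers = [a for a in amounts if abs(a - mean) > 2 * stdev]

# Custom binning: discretize the continuous amount column into buckets.
def bin_amount(a):
    if a < 50:
        return "small"
    if a < 250:
        return "medium"
    return "large"

binned = [bin_amount(a) for a in amounts]
```

The point is not the arithmetic but whether your data prep tool lets you express these groupings declaratively over millions of rows.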
Standard deviation and custom binning can be useful in applications like credit card fraud detection. Find all the transactions with amounts greater than two standard deviations from the average for transactions in the same vendor category and you have candidates for fraudulent transactions. Then cluster those records together in groups that represent, say, three-day transaction intervals, and you may see patterns of abuse that eliminate the false positives.

Column splitting

Column splitting refers to the process of breaking up columns that contain multiple pieces of information into a series of smaller columns that each contain a single value. Column splitting may be especially useful in the processing of data extracts from older transactional systems, some of which had a tendency to concatenate multiple values in the database and separate them only when populating fields on a screen.
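Both the easy and the hard variants of column splitting can be sketched in a couple of lines; the compound values below are hypothetical:

```python
import re

# Simple case: a compound column with one consistent delimiter.
rows = ["Jane Smith|Austin|TX", "Bob Jones|Portland|OR"]
split_rows = [r.split("|") for r in rows]

# Trickier case: inconsistent delimiters (comma, semicolon or pipe),
# handled with an expression-based separator instead of a single character.
messy = ["Jane Smith, Austin; TX", "Bob Jones|Portland|OR"]
split_messy = [re.split(r"\s*[,;|]\s*", r) for r in messy]
```

Either way, each compound value becomes a list of simple, single-value columns ready for analysis.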
Sometimes splitting a column can be done simply: specify a delimiter character and instantly you have a number of new, simple (non-compound) columns. Other times it's trickier, because the delimiters are inconsistent, and even consistent delimiters can be more complex than a single specific character. Some data prep tools make short work of the simple column splitting tasks: specify a compound column and a delimiter, and new, separate columns will be produced. Other products handle more complex splitting tasks. For example, breaking up free text into individual words or sentences, which may require expression-based separators, can help prepare data for sentiment analysis.

Data Enrichment

Data enrichment refers to the general task of integrating external data sets, be they commercial, public or proprietary, to derive more information from the data you already have. Enriching data is common in customer analytics, where you want to see the impact of external factors (e.g., weather or median household income) on internally tracked data like customer spending levels. It can also be useful in building predictive models. Transactional history of who bought what and when is a great baseline for analysis but, in today's markets, organizations need to identify deeper insights from their transactional data. For example, we might want to ask: what was the impact of weather, seasonality, customer gender, income, or education level on sales? One can only find these insights by enriching the transactional data.
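At its core, enrichment is a join between internal and external data on a shared key. A minimal sketch, with hypothetical sales and weather records keyed on (store, date):

```python
# Hypothetical internal transactions and an external weather data set,
# joined on (store, date) so each sale carries the conditions that day.
sales = [
    {"store": "S1", "date": "2016-05-01", "revenue": 1200},
    {"store": "S1", "date": "2016-05-02", "revenue": 400},
]
weather = {
    ("S1", "2016-05-01"): "sunny",
    ("S1", "2016-05-02"): "rain",
}

enriched = [
    {**row, "weather": weather.get((row["store"], row["date"]), "unknown")}
    for row in sales
]
```

With the weather column joined in, questions like "how much does rain depress walk-in revenue?" become simple group-and-aggregate operations.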
Many data prep products handle acquisition of public data sets, and some provide a graphical facility for blending or harmonizing them with homegrown data. Some products will also handle complex joins between data sets, with precise ways to specify them.

Column/row pivoting

Pivoting refers to the process of turning data rows into columns, and vice versa. It comes up repeatedly with data sets like product catalogs, which track items and a number of their attributes (like color or size). Attribute values often show up in rows, but are sometimes useful as classifications, in which case modeling them as columns may work better for specific analyses. For example, items and attributes stored in narrow tables:

  ID  Item              ID   Attribute
  1   Chair             101  Color
  2   Table             102  Size
  3   Coffee Maker      103  Material
                        104  Style

can be pivoted into a wide table with one column per attribute:

  ID  Item          Color  Size    Material     Style
  1   Chair         Blue   Large
  2   Table                        Cherry Wood  Shaker
  3   Coffee Maker         Medium

In relational databases, tables that have lots of columns, many of which are often empty, can be problematic. Such wide and sparse tables are not storage-efficient, and are sometimes reworked so that the would-be columns and their values are stored as pairs in a single, rather narrow table. Making these pseudo-rows into real columns is necessary for most analytical scenarios. And, again, at other times, rows may contain actual values that, simply for analytical purposes, can work well as columns. For example, you might want each country in which you do business to be a single column, so that you can see the revenue for each.
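The rows-to-columns direction of pivoting can be sketched with a product-catalog example like the one above; the attribute-value pairs below are hypothetical:

```python
# Narrow attribute-value storage: one row per (item, attribute, value).
narrow = [
    (1, "Chair", "Color", "Blue"),
    (1, "Chair", "Size", "Large"),
    (2, "Table", "Material", "Cherry Wood"),
    (2, "Table", "Style", "Shaker"),
    (3, "Coffee Maker", "Size", "Medium"),
]

# Pivot: each attribute becomes a column of one wide row per item.
wide = {}
for item_id, item, attr, value in narrow:
    row = wide.setdefault(item_id, {"item": item})
    row[attr] = value
```

Attributes an item lacks simply stay absent, which is the in-memory analogue of the sparse, mostly-empty columns the text warns about.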
Working with lists

List processing involves constructing a series of values into a single value, or extracting a particular value from such a series. Transforming a data set into lists can be useful in time-series analyses. Pulling values out can be useful when reading data from hierarchical file formats.

In the Sessionization section, we discussed grouping web log data rows by session. In that scenario, concatenating visited pages makes it easy to determine popular landing pages and common exit pages. (This is discussed in more detail in the Path Construction section, below.) The opposite type of transformation can also be helpful. Data stored in JavaScript Object Notation (JSON) format frequently contains a set of hierarchical child data as a list, which may need to be projected out into distinct rows for further grouping and aggregation.

In addition to processing all elements in a list, sometimes a particular list element, or a subset of elements, needs to be extracted. Perhaps you'd like to see just the members of a list that are also present in another list (useful when looking for commonalities or affinity); maybe you'd like to remove null or empty members from a list, or remove all duplicate elements. You get the idea. You can perform these list processing functions in many products. The question to keep in mind is how much work you need to do in the process. Do you need to perform these operations imperatively, by looping through all list elements, or is there a set of declarative functions such that each operation can be performed in a single step?

Advanced parsing

Parsing is the process of reading long text values and extracting smaller text passages, or individual values, from them. Breaking paragraphs into sentences, or sentences into words, are examples. Crawling web pages and looking for mentions of a particular phrase, or for subsidiary data in a specific format, are others.
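The list operations described under "Working with lists", such as removing nulls, de-duplicating, intersecting with another list and grabbing first and last elements, can each be a single declarative step rather than an imperative loop. A sketch over hypothetical per-session page lists:

```python
# Hypothetical concatenated page list for one session, plus a second
# list to intersect with (e.g. pages from a marketing campaign).
session_pages = ["/home", "/products", "/cart", "/products", None, "/checkout"]
campaign_pages = ["/products", "/checkout", "/faq"]

cleaned = [p for p in session_pages if p is not None]   # remove null members
deduped = list(dict.fromkeys(cleaned))                  # remove dupes, keep order
common = [p for p in deduped if p in campaign_pages]    # affinity/commonality
first, last = cleaned[0], cleaned[-1]                   # landing and exit page
```

Each operation is one expression; a tool with equivalent declarative functions saves you from writing (and debugging) the loop versions.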
The easier it is to parse text, the more parsing you can do. And the more complex the parsing patterns can be, the more powerful the resulting analyses. In general, parsing text, or encoding text into, and decoding text from, certain well-known formats like HTML, XML and JSON, are important capabilities. And a technical standard called regular expressions provides a powerful way to find specific text, based on specifications of patterns, and extract it so that it can be analyzed. Known to developers as REGEX, it can be the key to converting raw data to refined data.
REGEX syntax is obscure, but an investment in learning it can bring very significant rewards. In addition, there are many websites that can provide the REGEX syntax you might need for a variety of use cases.

For example, suppose you pulled down a web page, as HTML, containing names and US addresses of locations for an event. You know that to pull out the cities, states and ZIP codes for those event locations, you'll need to look for a proper name, followed by a comma, a space and a two-letter abbreviation. That text will be followed by another space and a 5-digit number (or a 9-digit number with a dash between digits 5 and 6). Or imagine you're sifting through tweets and you'd like to pull out all the hashtags you've found. To do so, you can just use a regular expression (like \B#\w+) that specifies you want to see all words that begin with a # character.

Some tools make advanced parsing easy. If your data preparation needs include advanced parsing, such as using REGEX, you should ensure the tools you select offer this capability.
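Both examples above translate directly into code. The hashtag pattern is the one given in the text; the city/state/ZIP pattern is a simplified sketch of the description (real address parsing needs a more forgiving expression), and the sample strings are hypothetical:

```python
import re

# Hashtags: every word that begins with a '#' character.
tweet = "Great keynote at #StrataConf today - more on #BigData and #ETL soon"
hashtags = re.findall(r"\B#\w+", tweet)

# Simplified city/state/ZIP sketch: proper name, comma, two-letter state,
# then a 5-digit ZIP (optionally ZIP+4 with a dash).
address = "Moscone Center, San Francisco, CA 94103"
match = re.search(r"([A-Z][A-Za-z ]+), ([A-Z]{2}) (\d{5}(?:-\d{4})?)", address)
```

The `\B` in the hashtag pattern keeps the match from triggering mid-word, so only standalone `#`-prefixed tokens are returned.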
Path Construction

Path analysis involves the manipulation of time-series data to illustrate the points or values encountered, and the sequence in which they occurred. It is the very essence of clickstream, log, IoT and other analytics scenarios. Path analysis also builds on other techniques we've discussed here, like list processing and sessionizing your data.

For example, to determine popular landing pages and common exit pages, start with simple web log data, then parse out the file name from the URL for each logged request. Next, sessionize the data and concatenate filenames from all the visited URLs into a list. That list will in fact describe a user's full path through your web site during their session. Now grab the first and last elements of each list to get the landing and exit page for each session. Combining these steps lets you see what landing pages visitors are gravitating toward, and also where you're losing them. Now, doing some profiling, like determining the top 10 in each category, is easy.

This analysis is surprisingly simple, but without data preparation functionality for grouping and coalescing your data the right way, you wouldn't even get close. And while you want the data to be as clean as possible before you work with it, the mere cleansing of it is nowhere close to the whole story. In fact, it's just the very beginning.
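Picking up where the sessionization and list-processing sketches left off, the final profiling step can be shown over hypothetical per-session paths:

```python
from collections import Counter

# Hypothetical per-session paths: each list is one session's page sequence,
# as produced by sessionizing and concatenating visited URLs.
session_paths = [
    ["/home", "/products", "/cart"],
    ["/home", "/faq"],
    ["/blog", "/products", "/cart", "/checkout"],
]

# First element of each path is the landing page; last is the exit page.
landing_pages = Counter(path[0] for path in session_paths)
exit_pages = Counter(path[-1] for path in session_paths)

# Profiling: e.g. the most popular landing page across sessions.
top_landing = landing_pages.most_common(1)[0]
```

Scaling `most_common(1)` to `most_common(10)` gives exactly the top-10 profiling the text describes.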
Conclusion

We've articulated a number of use cases and patterns around data preparation, all of which involve operations that go well beyond data quality and remediation. Data preparation is not just about de-duplicating and removing corrupted or dirty data. It's also about re-shaping, or transforming, that data so the data reveals its answers to you. Cleansing data is important, especially in the big data world, where data comes from unconventional sources. But a big part of data preparation, in its full scope, is to emancipate insight by reshaping data, and thus to make analysis more straightforward.

The data preparation design patterns shown here cut across different analytics scenarios, and are industry-agnostic. Tokenizing text is useful for sentiment analysis, but it's also useful in path analysis. Enriching data works really well for marketing optimization, but is also applicable for scenarios ranging from figuring out how weather impacts equipment breakdowns to understanding event attendance patterns in different geographies.

Ongoing Data Conditioning

Perhaps the most important thing to understand about data preparation, especially when it comes to the design patterns discussed in this paper, is that it isn't just a one-and-done, up-front process. Instead, you can think of the whole discipline as data conditioning, an ongoing, iterative process. An analysis may need a curated, well-prepared data set. Related analyses may require new datasets that further shape the data. This leads to analysts collaborating, sharing and building upon each other's prepared datasets, helping them save time and increasing their productivity. In addition, analytic pipelines that run daily or weekly might need to be revisited over time, exposing the need for more data shaping to get at information based on adjacent or more detailed results.
Realizing you have additional data shaping needs after doing some analysis isn't a failure; it's not even a matter of course-correcting. In fact, it's a very natural workflow, and an indicator of well-directed discovery. As you analyze data, you become more familiar and intimate with it. As that happens, you understand it better, and you then understand the next question you want to ask. This is a cyclical process, and all of it takes place after data quality checks and cleansing have been done.

To learn more about big data preparation, the key functionality required, and how it fits in with the larger big data analytics workflow, please visit the Datameer website.
FREE TRIAL: datameer.com/free-trial
LinkedIn: linkedin.com/company/datameer

2016 Datameer, Inc. All rights reserved. Datameer is a trademark of Datameer, Inc. Hadoop and the Hadoop elephant logo are trademarks of the Apache Software Foundation. Other names may be trademarks of their respective owners.
Statistics: Interpreting Data and Making Predictions Visual Displays of Data 1/31 Last Time Last time we discussed central tendency; that is, notions of the middle of data. More specifically we discussed
More informationQLIKVIEW SCALABILITY BENCHMARK WHITE PAPER
QLIKVIEW SCALABILITY BENCHMARK WHITE PAPER Measuring Business Intelligence Throughput on a Single Server QlikView Scalability Center Technical White Paper December 2012 qlikview.com QLIKVIEW THROUGHPUT
More informationTitle: Episode 11 - Walking through the Rapid Business Warehouse at TOMS Shoes (Duration: 18:10)
SAP HANA EFFECT Title: Episode 11 - Walking through the Rapid Business Warehouse at (Duration: 18:10) Publish Date: April 6, 2015 Description: Rita Lefler walks us through how has revolutionized their
More informationCreate-a-Product API. User Guide. - Updated: 6/2018 -
Create-a-Product API User Guide - Updated: 6/2018 - Copyright (c) 2018, Zazzle Inc. All rights reserved. Zazzle is a registered trademark of Zazzle Inc. All other trademarks and registered trademarks are
More informationXcelerated Business Insights (xbi): Going beyond business intelligence to drive information value
KNOWLEDGENT INSIGHTS volume 1 no. 5 October 7, 2011 Xcelerated Business Insights (xbi): Going beyond business intelligence to drive information value Today s growing commercial, operational and regulatory
More informationEcommerce Site Search. A Guide to Evaluating Site Search Solutions
Ecommerce Site Search A Guide to Evaluating Site Search Solutions Contents 03 / Introduction 13 / CHAPTER 4: Tips for a Successful Selection Process 04 / CHAPTER 1: The Value of Site Search 16 / Conclusion
More informationGoogle Analytics. Gain insight into your users. How To Digital Guide 1
Google Analytics Gain insight into your users How To Digital Guide 1 Table of Content What is Google Analytics... 3 Before you get started.. 4 The ABC of Analytics... 5 Audience... 6 Behaviour... 7 Acquisition...
More informationInformatica Enterprise Information Catalog
Data Sheet Informatica Enterprise Information Catalog Benefits Automatically catalog and classify all types of data across the enterprise using an AI-powered catalog Identify domains and entities with
More informationChapter 3 Process of Web Usage Mining
Chapter 3 Process of Web Usage Mining 3.1 Introduction Users interact frequently with different web sites and can access plenty of information on WWW. The World Wide Web is growing continuously and huge
More informationMining Web Data. Lijun Zhang
Mining Web Data Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Web Crawling and Resource Discovery Search Engine Indexing and Query Processing Ranking Algorithms Recommender Systems
More informationGuide to Google Analytics: Admin Settings. Campaigns - Written by Sarah Stemen Account Manager. 5 things you need to know hanapinmarketing.
Guide to Google Analytics: Google s Enhanced Admin Settings Written by Sarah Stemen Account Manager Campaigns - 5 things you need to know INTRODUCTION Google Analytics is vital to gaining business insights
More informationData Warehousing. Ritham Vashisht, Sukhdeep Kaur and Shobti Saini
Advance in Electronic and Electric Engineering. ISSN 2231-1297, Volume 3, Number 6 (2013), pp. 669-674 Research India Publications http://www.ripublication.com/aeee.htm Data Warehousing Ritham Vashisht,
More informationData Modeling in Looker
paper Data Modeling in Looker Quick iteration of metric calculations for powerful data exploration By Joshua Moskovitz The Reusability Paradigm of LookML At Looker, we want to make it easier for data analysts
More informationValue of Data Transformation. Sean Kandel, Co-Founder and CTO of Trifacta
Value of Data Transformation Sean Kandel, Co-Founder and CTO of Trifacta Organizations today generate and collect an unprecedented volume and variety of data. At the same time, the adoption of data-driven
More informationI CAN T FIND THE #$%& DATA. Why You Need a Data Catalog
I CAN T FIND THE #$%& DATA Why You Need a Data Catalog Data is everywhere It s embedded in our social media, streaming across the Internet of Things, and stored in the cloud. The volume of data available
More informationunderstanding media metrics WEB METRICS Basics for Journalists FIRST IN A SERIES
understanding media metrics WEB METRICS Basics for Journalists FIRST IN A SERIES Contents p 1 p 3 p 3 Introduction Basic Questions about Your Website Getting Started: Overall, how is our website doing?
More informationSetup Google Analytics
Setup Google Analytics 1.1 Sign Up Google Analytics 1. Once you have a Google account, you can go to Google Analytics (https://analytics.google.com) and click the Sign into Google Analytics button. You
More informationLambda Architecture for Batch and Stream Processing. October 2018
Lambda Architecture for Batch and Stream Processing October 2018 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document is provided for informational purposes only.
More informationData Mining Concepts & Techniques
Data Mining Concepts & Techniques Lecture No. 01 Databases, Data warehouse Naeem Ahmed Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology Jamshoro
More informationETL and OLAP Systems
ETL and OLAP Systems Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Software Development Technologies Master studies, first semester
More informationWHITE PAPER. The General Data Protection Regulation: What Title It Means and How SAS Data Management Can Help
WHITE PAPER The General Data Protection Regulation: What Title It Means and How SAS Data Management Can Help ii Contents Personal Data Defined... 1 Why the GDPR Is Such a Big Deal... 2 Are You Ready?...
More informationUsing Metadata Queries To Build Row-Level Audit Reports in SAS Visual Analytics
SAS6660-2016 Using Metadata Queries To Build Row-Level Audit Reports in SAS Visual Analytics ABSTRACT Brandon Kirk and Jason Shoffner, SAS Institute Inc., Cary, NC Sensitive data requires elevated security
More informationFacebook Page Insights
Facebook Product Guide for Facebook Page owners Businesses will be better in a connected world. That s why we connect 845M people and their friends to the things they care about, using social technologies
More informationTalend Big Data Sandbox. Big Data Insights Cookbook
Overview Pre-requisites Setup & Configuration Hadoop Distribution Download Demo (Scenario) Overview Pre-requisites Setup & Configuration Hadoop Distribution Demo (Scenario) About this cookbook What is
More informationBenchmarks Prove the Value of an Analytical Database for Big Data
White Paper Vertica Benchmarks Prove the Value of an Analytical Database for Big Data Table of Contents page The Test... 1 Stage One: Performing Complex Analytics... 3 Stage Two: Achieving Top Speed...
More informationBasics of Dimensional Modeling
Basics of Dimensional Modeling Data warehouse and OLAP tools are based on a dimensional data model. A dimensional model is based on dimensions, facts, cubes, and schemas such as star and snowflake. Dimension
More informationQlik Sense Desktop. Data, Discovery, Collaboration in minutes. Qlik Sense Desktop. Qlik Associative Model. Get Started for Free
Qlik Sense Desktop Data, Discovery, Collaboration in minutes With Qlik Sense Desktop making business decisions becomes faster, easier, and more collaborative than ever. Qlik Sense Desktop puts rapid analytics
More informationTexas Death Row. Last Statements. Data Warehousing and Data Mart. By Group 16. Irving Rodriguez Joseph Lai Joe Martinez
Texas Death Row Last Statements Data Warehousing and Data Mart By Group 16 Irving Rodriguez Joseph Lai Joe Martinez Introduction For our data warehousing and data mart project we chose to use the Texas
More informationMicroStrategy Analytics Desktop
MicroStrategy Analytics Desktop Quick Start Guide MicroStrategy Analytics Desktop is designed to enable business professionals like you to explore data, simply and without needing direct support from IT.
More informationDocument your findings about the legacy functions that will be transformed to
1 Required slide 2 Data conversion is a misnomer. This implies a simple mapping of data fields from one system to another. In reality, transitioning from one system to another requires a much broader understanding
More informationUAccess ANALYTICS Next Steps: Working with Bins, Groups, and Calculated Items: Combining Data Your Way
UAccess ANALYTICS Next Steps: Working with Bins, Groups, and Calculated Items: Arizona Board of Regents, 2014 THE UNIVERSITY OF ARIZONA created 02.07.2014 v.1.00 For information and permission to use our
More informationemetrics Study Llew Mason, Zijian Zheng, Ron Kohavi, Brian Frasca Blue Martini Software {lmason, zijian, ronnyk,
emetrics Study Llew Mason, Zijian Zheng, Ron Kohavi, Brian Frasca Blue Martini Software {lmason, zijian, ronnyk, brianf}@bluemartini.com December 5 th 2001 2001 Blue Martini Software 1. Introduction Managers
More informationWhy All Column Stores Are Not the Same Twelve Low-Level Features That Offer High Value to Analysts
White Paper Analytics & Big Data Why All Column Stores Are Not the Same Twelve Low-Level Features That Offer High Value to Analysts Table of Contents page Compression...1 Early and Late Materialization...1
More informationFeatured Articles AI Services and Platforms A Practical Approach to Increasing Business Sophistication
118 Hitachi Review Vol. 65 (2016), No. 6 Featured Articles AI Services and Platforms A Practical Approach to Increasing Business Sophistication Yasuharu Namba, Dr. Eng. Jun Yoshida Kazuaki Tokunaga Takuya
More informationOLAP Introduction and Overview
1 CHAPTER 1 OLAP Introduction and Overview What Is OLAP? 1 Data Storage and Access 1 Benefits of OLAP 2 What Is a Cube? 2 Understanding the Cube Structure 3 What Is SAS OLAP Server? 3 About Cube Metadata
More informationIntermediate Tableau Public Workshop
Intermediate Tableau Public Workshop Digital Media Commons Fondren Library Basement B42 dmc-info@rice.edu (713) 348-3635 http://dmc.rice.edu 1 Intermediate Tableau Public Workshop Handout Jane Zhao janezhao@rice.edu
More informationIT1105 Information Systems and Technology. BIT 1 ST YEAR SEMESTER 1 University of Colombo School of Computing. Student Manual
IT1105 Information Systems and Technology BIT 1 ST YEAR SEMESTER 1 University of Colombo School of Computing Student Manual Lesson 3: Organizing Data and Information (6 Hrs) Instructional Objectives Students
More informationWKU-MIS-B10 Data Management: Warehousing, Analyzing, Mining, and Visualization. Management Information Systems
Management Information Systems Management Information Systems B10. Data Management: Warehousing, Analyzing, Mining, and Visualization Code: 166137-01+02 Course: Management Information Systems Period: Spring
More informationEXCEL BASICS: MICROSOFT OFFICE 2010
EXCEL BASICS: MICROSOFT OFFICE 2010 GETTING STARTED PAGE 02 Prerequisites What You Will Learn USING MICROSOFT EXCEL PAGE 03 Opening Microsoft Excel Microsoft Excel Features Keyboard Review Pointer Shapes
More informationBUYING DECISION CRITERIA WHEN DEVELOPING IOT SENSORS
BUYING DECISION CRITERIA WHEN DEVELOPING IOT SENSORS PHILIP POULIDIS VIDEO TRANSCRIPT What is your name and what do you do? My name is Philip Poulidis and I m the VP and General Manager of Mobile and Internet
More informationTopics covered 10/12/2015. Pengantar Teknologi Informasi dan Teknologi Hijau. Suryo Widiantoro, ST, MMSI, M.Com(IS)
Pengantar Teknologi Informasi dan Teknologi Hijau Suryo Widiantoro, ST, MMSI, M.Com(IS) 1 Topics covered 1. Basic concept of managing files 2. Database management system 3. Database models 4. Data mining
More informationData Explorer: User Guide 1. Data Explorer User Guide
Data Explorer: User Guide 1 Data Explorer User Guide Data Explorer: User Guide 2 Contents About this User Guide.. 4 System Requirements. 4 Browser Requirements... 4 Important Terminology.. 5 Getting Started
More informationTURN DATA INTO ACTIONABLE INSIGHTS. Google Analytics Workshop
TURN DATA INTO ACTIONABLE INSIGHTS Google Analytics Workshop The Value of Analytics Google Analytics is more than just numbers and stats. It tells the story of how people are interacting with your brand
More informationMicroStrategy Academic Program
MicroStrategy Academic Program Creating a center of excellence for enterprise analytics and mobility. GEOSPATIAL ANALYTICS: HOW TO VISUALIZE GEOSPATIAL DATA ON MAPS AND CUSTOM SHAPE FILES APPROXIMATE TIME
More informationDecisionPoint For Excel
DecisionPoint For Excel Getting Started Guide 2015 Antivia Group Ltd Notation used in this workbook Indicates where you need to click with your mouse Indicates a drag and drop path State >= N Indicates
More informationGeospatial Day II Introduction to ArcGIS Editor for Open Street Map
Geospatial Day II Introduction to ArcGIS Editor for Open Street Map Geospatial Operations Support Team (GOST) Katie McWilliams kmcwilliams@worldbankgroup.org GOST@worldbank.org 0 Using OSM for Network
More informationMicrosoft Power BI for O365
Microsoft Power BI for O365 Next hour.. o o o o o o o o Power BI for O365 Data Discovery Data Analysis Data Visualization & Power Maps Natural Language Search (Q&A) Power BI Site Data Management Self Service
More informationData Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1395
Data Mining Introduction Hamid Beigy Sharif University of Technology Fall 1395 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1395 1 / 21 Table of contents 1 Introduction 2 Data mining
More informationMeaning & Concepts of Databases
27 th August 2015 Unit 1 Objective Meaning & Concepts of Databases Learning outcome Students will appreciate conceptual development of Databases Section 1: What is a Database & Applications Section 2:
More informationGOOGLE ANALYTICS HELP PRESENTATION. We Welcome You to. Google Analytics Implementation Guidelines
GOOGLE ANALYTICS HELP PRESENTATION We Welcome You to Google Analytics Implementation Guidelines 05/23/2008 Ashi Avalon - Google Analytics Implementation Presentation Page 1 of 28 1) What Is Google Analytics?
More informationFrom Insight to Action: Analytics from Both Sides of the Brain. Vaz Balasingham Director of Solutions Consulting
From Insight to Action: Analytics from Both Sides of the Brain Vaz Balasingham Director of Solutions Consulting vbalasin@tibco.com Insight to Action from Both Sides of the Brain Value Grow Revenue Reduce
More informationEasing into Data Exploration, Reporting, and Analytics Using SAS Enterprise Guide
Paper 809-2017 Easing into Data Exploration, Reporting, and Analytics Using SAS Enterprise Guide ABSTRACT Marje Fecht, Prowerk Consulting Whether you have been programming in SAS for years, are new to
More informationThis tutorial has been prepared for computer science graduates to help them understand the basic-to-advanced concepts related to data mining.
About the Tutorial Data Mining is defined as the procedure of extracting information from huge sets of data. In other words, we can say that data mining is mining knowledge from data. The tutorial starts
More informationRelease September 2018
Oracle Fusion Middleware What's New for Oracle Data Visualization Desktop E76890-11 September 2018 What s New for Oracle Data Visualization Desktop Here s an overview of the new features and enhancements
More informationSee Types of Data Supported for information about the types of files that you can import into Datameer.
Importing Data When you import data, you import it into a connection which is a collection of data from different sources such as various types of files and databases. See Configuring a Connection to learn
More information