Analysis of Big Data Tim Miller, Sr. Analytics Consultant Teradata Alexander Kolovos, Ph.D., Advanced Analytics Software Engineer Teradata
1 Analysis of Big Data Tim Miller, Sr. Analytics Consultant Teradata Alexander Kolovos, Ph.D., Advanced Analytics Software Engineer Teradata March 28, 2017
2 Your Presenters Tim Miller Senior Analytics Consultant Teradata Corporation Expertise in advanced analytic software, systems and methodologies. Principal engineer for the first commercial in-database data mining system, Teradata Warehouse Miner. Consultant to Teradata analytic partners (SAS, SPSS, etc.) and customers. Retired youth basketball, football, baseball, softball, soccer, etc. coach Alexander Kolovos, Ph.D. Advanced Analytics Software Engineer Teradata Corporation Expertise in analytical methodologies and platforms Specialization in space-time data and stochastic predictive analysis Ph.D. in Sciences and Engineering 4 years in Teradata (Analytics Engineer) 6 years in SAS (Spatial software expert) Loves language, theater, & rock music Teradata
3 Overview Big Data: Brief review Storage and Availability Information in your hands. Wanna keep it where? Analysis Exploratory and summary. Is this enough? Can pretty pictures tell stories? Recreate the world around you Tools and tricks of the trade Walking the walk The foundation, the strategy, the tools Managing the big story Model management Application management Teradata
5 Big Data: Brief Review The last half-century was marked by rapid technological advances that gradually provided unprecedented computing power, speeding up calculations that were previously time-consuming or even time-prohibitive, and that enabled progressively increasing monitoring, recording, and storage of empirical information, specifically data measurements. [Figure: schematic depiction of Moore's Law. Computational power doubles at a regular interval (commonly quoted as roughly every two years) until it approaches the limits set by the laws of physics.]
6 Big Data: Brief Review A more sober approach appears to have followed the initial frenzy about the availability of large volumes of information. According to Hortonworks Inc., a major contributor to the Apache Hadoop platform: "Big data describes the realization of greater business intelligence by storing, processing, and analyzing data that was previously ignored or siloed due to the limitations of traditional data management technologies."
7 Big Data: Brief Review Big Data is the 3 Vs: Variety: structured and unstructured data. Volume: tera- (10^12), peta- (10^15), and even exabytes (10^18) of data. Velocity: data flows into your organization at an increasing rate.
8 Big Data: Brief Review Big Data brings forward the issue of scaling: solve problems, old or new, trivial or elaborate, in entirely new frameworks characterized by increasing data sizes. Face new challenges: conceive and apply appropriate methodologies; maintain competitive performance; engineer effective hardware architectures; generate suitable software solutions. Example: handle matrix inversions for increasingly large matrix sizes. Example: deal with sparse data in very large dimensions.
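The sparse-data example above can be made concrete with a minimal sketch. It assumes, purely for illustration, that a sparse vector is stored as a dictionary of its non-zero entries; the function name `sparse_dot` and the sample vectors are hypothetical and not part of the presentation.

```python
# Hypothetical illustration: a sparse vector stored as {index: value},
# so memory and work scale with the non-zero entries, not the dimension.
def sparse_dot(a: dict, b: dict) -> float:
    """Dot product that only visits indices present in both vectors."""
    # Iterate over the smaller dictionary to minimise lookups.
    if len(a) > len(b):
        a, b = b, a
    return sum(value * b[index] for index, value in a.items() if index in b)


# Two vectors in a nominal one-billion-dimensional space with few non-zeros.
u = {3: 2.0, 500_000_000: 1.5, 999_999_999: -1.0}
v = {3: 4.0, 42: 7.0, 999_999_999: 2.0}

print(sparse_dot(u, v))  # 2.0*4.0 + (-1.0)*2.0 = 6.0
```

A dense representation of either vector would need a billion slots; the dictionary form touches only three entries per vector.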
10 Storing Your Information So you have this dazzling amount of information. Where will you keep it? Teradata
11 Storing Your Information Nowadays, the majority of options can be summarized as follows: Locally: computer / drive. Locally: company server. Cloud: remote server.
12 Storing Your Information Nowadays, the majority of options can be summarized as follows: Cloud: remote server. [Typically] somebody else's computer! No matter whether it is called Dropbox, Box, iCloud, AWS, Samsung Cloud, etc., it is what it is: somebody else's computer. Privacy concerns and data safety: huge topics in the era of Big Data!
13 Availability and Safekeeping If your data are not in your hands, then where are they? Cloud servers can be hardware located anywhere in the world. Data can be stored in multiple copies, possibly in different locations, too, to prevent loss in case of hardware or network failures Example: Amazon Web Services Global Infrastructure Teradata
14 Availability and Safekeeping What steps can be taken to protect your data? Formal legislation. Data encryption, safety protocols, restricted access. This is a very sensitive topic in the nascent stages of new technology. Technology offers great business opportunities, but caution is needed to avoid putting the cart before the horse.
16 Now What? Assume your data is kept safe somewhere. What comes next? Teradata
17 The Next Step In academic environments: Use data to answer scientific questions about a phenomenon or study an attribute of interest. In a business context: Gain insight about problem to understand market, optimize operations, increase profit, etc.; commonly expressed as aiming to increase Business Intelligence Teradata
18 Preliminary Data Exploration Big Data analysis is conceptually similar to any other data analysis. First step: Perform preliminary exploratory analysis. Obtain data: make records available within a data processing environment. Data may be accessed at the storage location or brought over locally. Ensure all analysis-relevant datasets are present, such as unprocessed / raw data and different contributing data collections. [Figure: Server and Desktop icons, with a decision-tree schematic for credit risk that splits on "Debt < 10% of Income", "Debt = 0%", and "Income > $40K" and ends in "Good Credit Risks" / "Bad Credit Risks" leaves.]
19 Preliminary Data Exploration Big Data analysis is conceptually similar to any other data analysis. First step: Perform preliminary exploratory analysis. Inspect data: for missing values and errors. Check for type mismatches. Missing values: remove the record? Assign a default? Additional quality checks: check for out-of-bound values (e.g., remove negative records from a positive-valued variable). Transform variables as needed (e.g., isolate the street number from an address string). Derive secondary variables as needed (e.g., time duration from a date range). Isolate and handle extremes, if it makes sense to do so.
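The inspection steps listed above might be sketched as follows. This is an illustrative example only: the record layout (`amount`, `address`, `start`, `end`), the choice of 0.0 as the default for a missing value, and the function name `clean_record` are assumptions, not something the slides prescribe.

```python
import re
from datetime import date


def clean_record(rec):
    """Return a cleaned copy of rec, or None if the record should be dropped."""
    # Missing value: this sketch assigns a default rather than removing the record.
    amount = rec.get("amount")
    if amount is None:
        amount = 0.0
    # Type mismatch: coerce strings such as "12.5" to float; drop if impossible.
    try:
        amount = float(amount)
    except (TypeError, ValueError):
        return None
    # Out-of-bound check: a positive-valued variable must not be negative.
    if amount < 0:
        return None
    # Transform: isolate the leading street number from the address string.
    match = re.match(r"\s*(\d+)", rec.get("address") or "")
    street_number = int(match.group(1)) if match else None
    # Secondary variable: duration in days derived from a date range.
    duration_days = (rec["end"] - rec["start"]).days
    return {
        "amount": amount,
        "street_number": street_number,
        "duration_days": duration_days,
    }


# Hypothetical sample records.
records = [
    {"amount": "12.5", "address": "221 Baker Street",
     "start": date(2017, 3, 1), "end": date(2017, 3, 28)},
    {"amount": -4, "address": "10 Main St",
     "start": date(2017, 1, 1), "end": date(2017, 1, 2)},
    {"amount": None, "address": "Elm Road",
     "start": date(2017, 2, 1), "end": date(2017, 2, 15)},
]

cleaned = [c for c in (clean_record(r) for r in records) if c is not None]
print(cleaned)
```

Whether a missing value gets a default or the record is removed is a per-variable decision; the sketch picks one option only to show where that decision lives.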
20 Preliminary Data Exploration Big Data analysis is conceptually similar to any other data analysis. First step: Perform preliminary exploratory analysis. Possibly run summary calculations: for example, compute the minimum, maximum, average, etc. Gain insight from charts, plots, and maps: visualization can be a huge aid for cognitive understanding.
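As a minimal sketch of the summary calculations mentioned above, the following uses only Python's standard `statistics` module. The variable `daily_sales` and its values are made up for illustration.

```python
import statistics


def summarize(values):
    """Compute a few common summary statistics for a list of numbers."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


# Hypothetical data with one extreme value (900).
daily_sales = [120, 135, 128, 900, 131, 126]
summary = summarize(daily_sales)
print(summary)
```

The single extreme value pulls the mean far above the median, which is one reason summary numbers alone can mislead and a plot of the data is worth having alongside them.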
21 Preliminary Data Exploration Scale matters in Big Data. Visualization can be all the more elaborate and important when exploring complexity and multiple dimensions in data. At the end of the day, are summary statistics and pretty pictures enough? Teradata
22 Transition To Data Workbench Most typically, one needs deeper insight to help drive decision-making. Very often, one may want to: predict a variable (examples: assess sales volume; project the number of passengers for an airline); select between multiple options when taking action (examples: accept or reject a transaction? has a threshold been exceeded?); classify or put in order a series of items (examples: how can a business distribute retail stores? which customers are loyal?). To provide answers to such problems, one must seek to understand the behavior of a target variable as a function of the data features.
23 Transition To Data Workbench One starts with the output of the first exploratory step, the so-called Analytics Data Set (ADS). The ADS contains all the features (variables) that are assessed to be relevant to the target variable. In each problem, one ultimately seeks to build an accurate enough representation of reality to enable inference about the target variable. This is done in the next step. Second step: Data modeling. A model is a representation of reality. Data modeling serves to discover and establish associations among the data in order to describe the target variable accurately.
24 Transition To Data Workbench How do you fit a model to data? Keep in mind the Indetermination Thesis: In principle, there may exist an infinite number of curves that satisfy a given dataset Teradata
27 Transition To Data Workbench How do you fit a model to data? A theoretical model can eliminate the bulk of possibilities and provide us with a few meaningful ones Teradata
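The Indetermination Thesis can be illustrated with a small sketch: for any finite dataset, adding a term that vanishes at every data point yields another curve that fits the data exactly. The three sample points and the function names below are hypothetical.

```python
# Three hypothetical data points that lie on the line y = 2x + 1.
points = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]


def line(x):
    """The 'theoretical' model: a straight line through the points."""
    return 2.0 * x + 1.0


def make_curve(c):
    """Line plus c * (x - 0)(x - 1)(x - 2).

    The added cubic term is zero at every data point, so each value of c
    gives a different curve that still passes through all the points.
    """
    def curve(x):
        bump = c
        for px, _ in points:
            bump *= (x - px)
        return line(x) + bump
    return curve


for c in (-3.0, 0.5, 10.0):
    curve = make_curve(c)
    fits = all(abs(curve(x) - y) < 1e-12 for x, y in points)
    print(c, fits, curve(0.5))
```

Every curve fits the data perfectly, yet they disagree between the points. A theoretical reason to prefer, say, a linear relationship is what narrows the infinite family down to a few meaningful candidates.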
28 Data Modeling: Prediction We distinguish 2 main data modeling categories: Prediction and clustering. Prediction: A model is trained by data to develop data-driven behavior Training data: The subset used to train and validate the model behavior Testing/scoring data: The subset used by the model to yield predictions Also known as: Supervised learning Teradata
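A minimal sketch of the training / testing idea, under stated assumptions: the data are synthetic (generated from y = 3x + 2 plus noise), the 40/10 split is arbitrary, and `fit_line` is a hand-written ordinary least-squares fit rather than any particular library's API.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic data: y = 3x + 2 with Gaussian noise (seeded for repeatability).
rng = random.Random(0)
data = [(x, 3.0 * x + 2.0 + rng.gauss(0, 0.5)) for x in range(50)]

# Shuffle, then hold out a subset that the model never sees during training.
rng.shuffle(data)
train, test = data[:40], data[40:]

slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])

# Score the held-out subset: mean squared error of the predictions.
test_mse = sum((slope * x + intercept - y) ** 2 for x, y in test) / len(test)
print(slope, intercept, test_mse)
```

Evaluating on data the model was not trained on is what gives an honest picture of how the predictions will behave on new records.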
29 Data Modeling: Prediction We distinguish 2 main data modeling categories: Prediction and clustering. Prediction: Commonly used methodologies include: Regression: a family of techniques suitable for a variety of tasks; some types are: Linear: suitable for continuous variable prediction. Logistic: suitable for prediction of binary outcome or class variables (discrete values). k-Nearest Neighbor: prediction based on the averaged response from the k nearest samples. Ridge: a regularized option for ill-posed problems, for example when predictors are strongly correlated or the fit is prone to overfitting. Neural Networks: parametric, layered models inspired by how neurons work. Decision Trees: node-based classification algorithm. Each node makes a decision by using a condition on one of the input features. Random Forests: an ensemble made up of multiple decision trees. Each tree performs a partial analysis; the answers from the trees are then averaged.
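Of the methodologies listed above, k-nearest-neighbor prediction is simple enough to sketch in a few lines. The example below assumes a single numeric feature and made-up training pairs; `knn_predict` is a hypothetical helper, not a library function.

```python
def knn_predict(train, query, k=3):
    """Predict y at `query` as the mean y of the k training samples nearest in x.

    `train` is a list of (x, y) pairs with a scalar feature x.
    """
    nearest = sorted(train, key=lambda pair: abs(pair[0] - query))[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Hypothetical training samples.
train = [(1.0, 10.0), (2.0, 12.0), (3.0, 14.0), (8.0, 30.0), (9.0, 33.0)]

print(knn_predict(train, 2.2, k=3))  # averages the responses at x = 1, 2, 3 -> 12.0
print(knn_predict(train, 8.6, k=2))  # averages the responses at x = 8, 9 -> 31.5
```

Sorting every training sample per query is fine for a handful of points; at Big Data scale one would typically rely on an indexed or approximate neighbor search instead.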
30 Data Modeling: Clustering We distinguish 2 main data modeling categories: Prediction and clustering. Clustering: Associate and categorize data in groups (clusters) on the basis of specified group characteristics Model can be used to split a data set in a desired number of clusters Also known as: Unsupervised learning Teradata
31 Data Modeling: Clustering We distinguish 2 main data modeling categories: Prediction and clustering. Clustering: Commonly used methodologies include: K-Means Algorithm: data are separated into a specified number of clusters around center points called centroids. The objective is to minimize the distance of each cluster's data from the cluster centroid. This is done by minimizing a criterion called inertia. Hierarchical Clustering: a family of clustering algorithms. The objective is to build nested clusters in the form of a dendrogram tree. The tree root is a unique cluster that gathers all the samples; the leaves are clusters with a single sample. Variations / combinations of the above include: Spectral Clustering: performs a low-dimensional embedding of the affinity matrix between samples; then applies K-Means. Interactive Clustering: first uses K-Means to create small clusters; then applies hierarchical clustering to each one of those.
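The K-Means description above can be sketched in plain Python. This is a bare-bones illustration, assuming 2-D points, caller-supplied starting centroids, and inertia defined as the sum of squared distances from each point to its nearest centroid; the sample points are made up.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, iterations=20):
    """Basic K-Means: alternate assignment and centroid update until stable."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # Inertia: sum of squared distances from each point to its nearest centroid.
    inertia = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, inertia


# Two well-separated hypothetical groups of points.
points = [(0, 0), (0, 2), (2, 0), (2, 2), (10, 10), (10, 12), (12, 10), (12, 12)]
centroids, inertia = kmeans(points, centroids=[(0, 0), (12, 12)])
print(centroids, inertia)  # [(1.0, 1.0), (11.0, 11.0)] 16.0
```

The result depends on the starting centroids; practical implementations usually run several random initializations and keep the one with the lowest inertia.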
32 Data Modeling: Contemporary Trends Aside from core statistical techniques adopted for data modeling Big Data analysis also gave rise to some very popular concepts in the field: Machine Learning: Actually, ML is a Computer Science domain Very similar to Computational Statistics. Focuses on using/combining statistical methodologies to solve problems through use of computers alone. Artificial Intelligence: A neighboring concept; relies on machine-only intelligence. Deep Learning: A class of ML algorithms Cascading multiple nonlinear processing units for feature transformation and/or extraction. Based on unsupervised learning of multiple levels of data features. Hierarchical representation: Higher level features derived from lower level ones. Multiple level approach: Different levels of abstraction; a hierarchy of concepts Teradata
33 Data Modeling: Business Applications Often enough, specific methodologies might be recommended more than others for particular business tasks.
Business Issue | Data Mining Analytical Approaches
Customer Segmentation | Clustering, Factor Analysis, Ranking and Tiering
Propensity to Buy | Induction Trees, Logistic Regression, Neural Nets
Attrition | Induction Trees, Logistic Regression, Neural Nets
Lifetime Value | Net Present Value, Structural Equation Modeling
Purchase Sequence | Association/Affinity and Sequence Analysis, Time Series
Sales Forecasting | Time Series, Neural Nets, Linear Regression
Customer Acquisition and Prospecting | Induction Trees, Logistic Regression, Neural Nets
Profitability Analysis | Activity-based costing, process-based costing
Campaign Effectiveness Assessment | Neural Nets, Rule Induction, Logistic Regression, Discriminant Analysis
35 Big Data Analysis Setup: Foundation Everything we saw to this point requires a core framework to be built on... Hardware, including: Servers Storage devices Network infrastructure Connectors Computing resources Teradata
36 Big Data Analysis Setup: Foundation Everything we saw to this point requires a core framework to be built on... Software for the engineering foundation to operate on to enable data transfers for hardware setup for communication between hardware, and processing requests Teradata
37 Big Data Analysis Setup: Architecture Everything we saw to this point requires an architectural approach strategy. Example: A very simple strategy: I can do everything from the comfort of my multi-core computer! Teradata
38 Big Data Analysis Setup: Architecture Everything we saw to this point requires an architectural approach strategy. Often in practice, one of the following computing architectures is adopted: Side-by-side, Distributed Computing, or In-Database. [Figure: one diagram per architecture, built from PC Client, Server or Cluster of Servers, and Database elements connected by Request, Results, and Data Extract flows.]
39 Big Data Analysis Setup: Architecture Everything we saw to this point requires an architectural approach strategy. In-Database architecture may carry significant speed advantages. [Figure: "Desktop and Server Analytic Architecture" (sample data, results) compared with "In-Database Analytic Architecture" (processing request, results), labeled "Exponential Performance Improvement"; both panels show the credit-risk decision-tree schematic.]
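The contrast between extracting data to a client and computing inside the database can be sketched with Python's built-in `sqlite3` module. SQLite is only a stand-in here (an assumption for illustration, not the platform discussed in the slides), and the `sales` table and its rows are made up.

```python
import sqlite3

# In-memory SQLite database standing in for a real data warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100.0), ("east", 50.0), ("west", 70.0), ("west", 30.0), ("west", 20.0)],
)

# Extract approach: every row is pulled to the client, then aggregated there.
totals_client = {}
for region, amount in conn.execute("SELECT region, amount FROM sales"):
    totals_client[region] = totals_client.get(region, 0.0) + amount

# In-database approach: the aggregation runs where the data live,
# and only one result row per region travels back to the client.
totals_in_db = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)

print(totals_client)
print(totals_in_db)
conn.close()
```

Both approaches give the same totals; the difference is how much data has to move. With five rows it does not matter, but with billions of rows the extract step can dominate the total run time.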
40 Big Data Analysis Setup: Architecture Everything we saw to this point requires an architectural approach strategy. In business applications, the architecture strategy further implies how data are accessed to enable lower cost and higher speed in decision-making. [Figure: "Fragmented vs. Integrated data approach". Two panels, each showing Operational Systems and Decision Makers, one of them including an Integrated Data Warehouse (IDW).]
41 Big Data Analysis Setup: Tools Everything we saw to this point requires programming and software. Programming languages and software packages provide the building blocks for implementation of suitable algorithms and analysis with Big Data Teradata
42 Big Data Analysis Setup: Tools Everything we saw to this point requires programming and software. Analytical frameworks and interfaces provide creative programming platforms that facilitate data analysis and understanding Teradata
43 Big Data Analysis Setup: People Above all, everything we saw to this point requires people at the helm. Sign of the times? The industry no longer needs just a mathematician, a statistician, a programmer, or a general scientist. The new superstar in the age of Big Data is the Data Scientist: a term coined for a person with a highly versatile skillset who performs all-in-one tasks such as: computationally handling datasets of any size; possessing statistical prowess and modeling skills; understanding and programming relational data storage (databases); solving problems, visualizing, and communicating well.
45 In Production: Model Management Aspects Data analysis yields an optimally fit model for prediction. For an individual or a small team, this is where a Big Data analysis might come to completion. In an industrial/commercial setting, however: Data refresh: customer data might be in constant flow. The deliverable must account for streaming data flows and continuous model use. [Figure: user groups (Data Scientists, Business Analysts, Marketing, Front-Line Workers, Engineers, Customers / Partners, Executives, Operational Systems) shown with tool categories (Languages, Math & Stats, Data Mining, Business Intelligence, Applications).]
46 In Production: Model Management Aspects Data analysis yields an optimally fit model for prediction. For an individual or a small team, this is where a Big Data analysis might come to completion. In an industrial/commercial setting, however: Scoring, validation, and model health: Model must be kept current. With time, data characteristics might change and prediction may deteriorate. Mechanisms should perform quality checks and support model retraining Teradata
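One possible shape for such a model health check is sketched below. The rule (retrain when the recent mean error exceeds the baseline by a tolerance factor), the 1.25 factor, and the function name `needs_retraining` are assumptions chosen for illustration; the slides do not specify a mechanism.

```python
def needs_retraining(baseline_error, recent_errors, tolerance=1.25):
    """Flag the model when mean recent error exceeds baseline_error * tolerance.

    baseline_error: error measured when the model was validated.
    recent_errors:  errors observed on recently scored data.
    tolerance:      hypothetical factor; 1.25 allows a 25% deterioration.
    """
    if not recent_errors:
        return False  # nothing scored yet, nothing to judge
    recent_mean = sum(recent_errors) / len(recent_errors)
    return recent_mean > baseline_error * tolerance


# Hypothetical monitoring windows.
print(needs_retraining(2.0, [2.1, 1.9, 2.2]))  # False: still close to baseline
print(needs_retraining(2.0, [2.8, 3.1, 2.9]))  # True: prediction has deteriorated
```

In production this check would run on a schedule against freshly scored data, and a positive result would trigger the retraining mechanism rather than just a printed flag.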
47 In Production: Model Management Aspects Data analysis yields an optimally fit model for prediction. For an individual or a small team, this is where a Big Data analysis might come to completion. In an industrial/commercial setting, however: Promotion: Model may need to circulate across an organization. Different teams might require training to learn about a model Champion model: Analysis may indicate multiple prevailing solutions. Different teams might need to select own champion model for usage among challenger ones. In addition, this might be a repeatable periodic process Teradata
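The champion / challenger selection could be sketched as a periodic comparison of validation errors. The model names, the error values, and the `min_gain` threshold below are hypothetical; a real process would also weigh factors such as stability and deployment cost.

```python
def pick_champion(scores, current_champion, min_gain=0.01):
    """Return the model to use going forward.

    scores:           {model_name: validation_error}; lower is better.
    current_champion: name of the model currently in use.
    min_gain:         hypothetical minimum improvement a challenger must
                      show before it replaces the champion.
    """
    best = min(scores, key=scores.get)
    gain = scores[current_champion] - scores[best]
    if best != current_champion and gain >= min_gain:
        return best
    return current_champion


# Hypothetical validation errors from one review cycle.
scores = {"logistic_v1": 0.182, "tree_v3": 0.175, "neural_v2": 0.160}

print(pick_champion(scores, "logistic_v1"))  # neural_v2 takes over
print(pick_champion(scores, "neural_v2"))    # champion is retained
```

Requiring a minimum gain avoids swapping models over differences that are within noise, which matters when the comparison is repeated every period.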
48 In Production: Application Management Aspects In industrial/commercial settings, a successful model could be converted into an application for broader adoption. Topics pertaining to this case are: Application development: Fitting a new application in existing ecosystem Effort to retain model functionality and provide users with control over parameters and variables to specify Deployment and Extensions: Distributing, maintaining and extending the application Teradata
49 Review: Flow Overview For Big Data Analysis What do you want to do with Big Data? A conceptual analytical framework. [Figure: framework diagram. Model Management: Data Refresh, Scoring / Validation, Champion & Challenger, Health Statistics, Promotion, Version Control. Application Management: Application Deployment, Application Extensions. Other labeled elements: Edge Systems, Streams, IoT; Workflow Management; Data Integration; Interface; Data Profiling; Provisioning; Lifecycle Dashboard; Tools; Algorithms; Feature Engineering; ADS; Data Science Lab; Data Science Workbench; Production Analytics; Common Scoring Library; Analytics (Batch, Stream); Data Ingest (Batch, Stream); Modeling; Analytic Data Set Repository; ADS Metadata; Data Stores; Scoring Data Set Repository; Scoring Metadata.]
More informationLecture 25: Review I