High-Performance Statistical Modeling

1 High-Performance Statistical Modeling
Koen Knapen
Academic Day, March 27th, 2014
SAS Tervuren

2 The Routes (Roots) of Confusion
- How do I get HP procedures? Just add "HP"?
- Single-machine mode, distributed mode, distributed-alongside mode
- Scalability
- REG vs. HPREG
- GENMOD vs. HPGENSELECT
- Symmetric vs. asymmetric mode
support.sas.com/statistics/papers

3 Part 1: General Considerations

4 GENERAL CONSIDERATIONS: Execution Modes
Single-machine mode
- Executes entirely on the server where SAS is installed.
- Also called client mode or SMP (symmetric multiprocessing) mode.
Distributed mode
- The major computations are done on an appliance (blade server).
- Also called MPP (massively parallel processing) mode.

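5 Single-Machine Mode
[Figure: SAS server]

    proc hpgenselect data=a2013;
       class c:;                                       /* all variables whose names start with c */
       model ypoisson = x: c: / distribution=poisson;  /* Poisson GLM; log link by default */
       selection method=stepwise;                      /* stepwise model building */
    run;

The HPA procedure determines the number of concurrent threads from the number of CPUs (cores) on the server.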
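The slide does not show how HPGENSELECT divides its work internally. As a loose illustration of the single-machine idea only, the Python sketch below does three things:
- It splits the rows into one chunk per core (`os.cpu_count()`).
- It computes additive sufficient statistics for a simple linear regression on each chunk in a thread pool.
- It sums those statistics to get the fit.

The names `partial_stats` and `fit_line` are hypothetical. CPython's global interpreter lock stops pure-Python threads from running this arithmetic truly in parallel. The sketch shows the partition-and-combine pattern, not real speed.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def partial_stats(rows):
    """Sufficient statistics for simple linear regression on one chunk of (x, y) rows."""
    n = sx = sy = sxx = sxy = 0.0
    for x, y in rows:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    return n, sx, sy, sxx, sxy


def fit_line(rows, n_threads=None):
    """Fit y = b0 + b1*x by splitting rows across threads and summing partial statistics."""
    # Default to one thread per available core, echoing the slide's rule.
    n_threads = n_threads or os.cpu_count() or 1
    chunk = -(-len(rows) // n_threads)  # ceiling division so every row lands in a chunk
    chunks = [rows[i:i + chunk] for i in range(0, len(rows), chunk)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        parts = list(pool.map(partial_stats, chunks))
    # Combine step: sufficient statistics simply add across chunks.
    n, sx, sy, sxx, sxy = (sum(p[i] for p in parts) for i in range(5))
    b1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b0 = (sy - b1 * sx) / n
    return b0, b1
```

For example, `fit_line([(x, 2 + 3 * x) for x in range(100)], n_threads=4)` recovers an intercept of 2 and a slope of 3. The thread count does not change the estimates, because the statistics add across chunks.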

6 Appliance: Racks of Blades and Software
[Figure: multi-socket, multi-core platform; commodity blade; chassis of blades]
An appliance (blade server) is a tightly integrated, homogeneous cluster of computers arranged in racks. The individual computers in each rack are called nodes or blades. Database appliances also include database software.

7 Database Appliance
[Figure: a controller connected to worker nodes]
- A table is stored in parts across multiple worker nodes.
- SQL queries operate in parallel on the different parts of the table.
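To make "a table stored in parts, queried in parallel" concrete, here is a hypothetical Python sketch. It is not how any particular appliance is implemented. It works in three steps:
- `distribute` hash-partitions rows by a distribution key across worker nodes.
- Each worker computes a local `SUM ... GROUP BY` on its own part.
- The controller merges the partial results.

The names and the choice of hash partitioning are assumptions made for illustration.

```python
from collections import defaultdict


def distribute(rows, n_workers, key):
    """Hash-partition rows across worker nodes; each worker holds one part of the table."""
    parts = [[] for _ in range(n_workers)]
    for row in rows:
        parts[hash(row[key]) % n_workers].append(row)
    return parts


def local_sum(part, group_col, value_col):
    """What one worker computes on its own part: SUM(value) GROUP BY group."""
    totals = defaultdict(float)
    for row in part:
        totals[row[group_col]] += row[value_col]
    return totals


def controller_sum(parts, group_col, value_col):
    """Merge the workers' partial sums into the final query result."""
    merged = defaultdict(float)
    # Sequential here; on an appliance each worker would compute its local result concurrently.
    for part in parts:
        for group, total in local_sum(part, group_col, value_col).items():
            merged[group] += total
    return dict(merged)
```

Because partial sums add, the controller only receives one small result per worker instead of the raw rows. This is what lets the query scale with the number of nodes.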

8 GENERAL CONSIDERATIONS: Data Access Features
- Client-data (or local-data) method: the data are moved from the SAS server to the distributed computing environment.
- Alongside-the-database method: the data are stored in a distributed DBMS and are read in parallel from the DBMS into a SAS analytic process that runs on the database appliance.
- Alongside-HDFS method: the data are stored in HDFS (Hadoop Distributed File System) on the appliance and are read in parallel by the SAS analytic process.
- Alongside-LASR method: the data are loaded from a SAS LASR Analytic Server that runs on the appliance.

9 Availability

10 AVAILABILITY: High-Performance Analytics Products

High-Performance Analytics Product | Associated MVA Product
SAS High-Performance Statistics | SAS/STAT
SAS High-Performance Econometrics | SAS/ETS
SAS High-Performance Optimization | SAS/OR
SAS High-Performance Data Mining | SAS Enterprise Miner
SAS High-Performance Text Mining | SAS Text Miner
SAS High-Performance Forecasting | SAS High-Performance Forecasting

MVA products include single-machine mode operation of HP procedures as part of the MVA product license.

11 AVAILABILITY: SAS High-Performance Product Offerings, Release 13.1
Available in December with SAS 9.4M
- High-Performance Statistics: HPLOGISTIC, HPREG, HPLMIXED, HPNLMOD, HPSPLIT, HPGENSELECT, HPQUANTSELECT, HPFMM, HPCANDISC, HPPRINCOMP
- High-Performance Data Mining: HPREDUCE, HPNEURAL, HPFOREST, HP4SCORE, HPDECIDE, HPCLUS, HPSVM, HPBNET
- High-Performance Text Mining: HPTMINE, HPTMSCORE
- High-Performance Optimization: OPTLSO; select features in OPTMILP, OPTLP, OPTMODEL
- High-Performance Econometrics: HPCOUNTREG, HPSEVERITY, HPQLIM, HPPANEL, HPCOPULA, HPCDM
- High-Performance Forecasting: HPFORECAST, HPTIMEDATA
- Common set: HPDS2, HPDMDB, HPSAMPLE, HPSUMMARY, HPIMPUTE, HPBIN, HPCORR

12 Part 2: High-Performance Statistical Modeling

13 HIGH-PERFORMANCE STATISTICAL MODELING: General Design Principles for HPA Procedures
1. Support single-machine and distributed modes.
2. Use multithreading to exploit all CPUs.
3. Support a variety of data sources.
4. Require syntactical consistency across modes.
5. Require syntactical consistency across HPA procedures.

14 HIGH-PERFORMANCE STATISTICAL MODELING: Design Principles for High-Performance Statistical Procedures
1. Focus on prediction rather than post-fit inference.
2. Standardize and improve syntax where needed.
3. Support model selection where appropriate.
4. Combine functionality from SAS/STAT procedures when appropriate.
5. Provide new functionality within the HPA framework when viable.

15 HIGH-PERFORMANCE STATISTICAL MODELING: Functionality of the HPGENSELECT Procedure
Fits generalized linear models
- Distributions: normal, Poisson, Tweedie, and others
- Link functions: log, logit, and others
- Linear predictors: effects involving continuous and classification variables
Provides model building
- Forward, backward, and stepwise methods
- Multiple criteria for choosing the model: AIC, AICC, SBC
- Splitting of classification effects
Writes DATA step code for computing predicted values
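The slide lists AIC, AICC, and SBC as selection criteria. Below is a minimal Python sketch of these criteria as they are commonly defined: $-2\log L$ penalized by the number of parameters $p$ and the number of observations $n$, where smaller is better. Check the HPGENSELECT documentation for the exact definitions it uses. The helper names are hypothetical, and the log-likelihood values in the usage example are made up for illustration.

```python
import math


def information_criteria(log_lik, n_params, n_obs):
    """AIC, AICC, and SBC computed from a model's maximized log likelihood."""
    if n_obs - n_params - 1 <= 0:
        raise ValueError("AICC needs n_obs > n_params + 1")
    neg2ll = -2.0 * log_lik
    aic = neg2ll + 2 * n_params
    aicc = neg2ll + 2 * n_params * n_obs / (n_obs - n_params - 1)  # small-sample correction
    sbc = neg2ll + n_params * math.log(n_obs)                      # also known as BIC
    return {"AIC": aic, "AICC": aicc, "SBC": sbc}


def best_model(candidates, n_obs, criterion="SBC"):
    """Return the candidate name with the smallest criterion.

    candidates maps a model name to a (log_lik, n_params) pair.
    """
    return min(
        candidates,
        key=lambda name: information_criteria(*candidates[name], n_obs)[criterion],
    )
```

Consider two made-up candidates fit to 50 observations: `{"x1": (-100.0, 3), "x1 x2": (-99.5, 4)}`. Both AIC and SBC prefer `"x1"`, because the extra parameter improves the log likelihood by only 0.5. A stepwise method repeats this kind of comparison at every step.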

16 HIGH-PERFORMANCE STATISTICAL MODELING: GENMOD or HPGENSELECT?
GENMOD
- Fits models with moderate-to-large data.
- Offers a rich set of methods for statistical inference: GEE methods for correlated responses, Bayesian inference, exact conditional regression.
- Provides a wide array of postfitting analyses: contrasts, estimates, tests, and more.
HPGENSELECT
- Fits and builds models with large-to-massive data.
- Is designed for large-data tasks such as predictive model building.

17 Performance Comparisons

18 Scalable Percentage
[Figure: total run time $t_1$ on one CPU, split into a not-scalable part and a scalable part of length $t_s$]
Scalable percentage $= 100 \, t_s / t_1 = 60\%$

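19 Amdahl's Law
[Figure: on 1 CPU, run time $t_1$ is 40% not scalable and 60% scalable ($t_s$); on 2 CPUs, the scalable part shrinks to $\tfrac{1}{2} t_s$, so run time $t_2$ is 57% not scalable and 43% scalable]
$t_2 = (t_1 - t_s) + \tfrac{1}{2} t_s = 0.4 \, t_1 + 0.3 \, t_1 = 0.7 \, t_1$
Speedup $= t_1 / t_2 = 1 / 0.7 \approx 1.43$ (143%)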
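The arithmetic on this slide generalizes to any number of CPUs. Below is a small Python sketch. It assumes, as Amdahl's law does, that the scalable part divides perfectly across CPUs and the not-scalable part stays the same. `amdahl` is a hypothetical helper name.

```python
def amdahl(scalable_fraction, n_cpus):
    """Relative run time, time shares, and speedup when only the scalable part is split."""
    if not 0.0 <= scalable_fraction <= 1.0 or n_cpus < 1:
        raise ValueError("need 0 <= scalable_fraction <= 1 and n_cpus >= 1")
    serial = 1.0 - scalable_fraction       # not-scalable share of t_1
    parallel = scalable_fraction / n_cpus  # scalable share of t_1 after splitting
    t_rel = serial + parallel              # t_p / t_1
    return {
        "t_rel": t_rel,
        "serial_share": serial / t_rel,      # fraction of t_p that is not scalable
        "scalable_share": parallel / t_rel,  # fraction of t_p that is scalable
        "speedup": 1.0 / t_rel,              # t_1 / t_p
    }
```

Here `amdahl(0.6, 2)` gives a relative run time of 0.7, shares of about 57% and 43%, and a speedup of about 1.43, matching the slide. Going to 16 CPUs, `amdahl(0.6, 16)`, raises the speedup only to about 2.29.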

20 HIGH-PERFORMANCE STATISTICAL MODELING: Scalability and Big Data
Amdahl's law implies a limit to scalability, because every job has some unavoidable serial component, for example:
- Reading data with a single I/O controller in single-machine mode
- Establishing connections to an appliance and database in distributed mode
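The limit can be written in the same quantities as slides 18 and 19. Let $f = t_s / t_1$ be the scalable fraction and $p$ the number of CPUs, and assume the scalable part divides perfectly across CPUs. Then:

```latex
S(p) = \frac{t_1}{t_p} = \frac{1}{(1 - f) + f/p},
\qquad f = \frac{t_s}{t_1}

S(2) = \frac{1}{0.4 + 0.3} \approx 1.43 \quad (f = 0.6)

\lim_{p \to \infty} S(p) = \frac{1}{1 - f} = \frac{1}{1 - 0.6} = 2.5
```

For the job that is 60% scalable, no number of CPUs can make it more than 2.5 times faster. Only a larger scalable fraction raises that ceiling.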

21 HIGH-PERFORMANCE STATISTICAL MODELING: Benefits
1. High-performance procedures in SAS/STAT deliver modeling methods and scalability for a wide range of problem sizes.
2. If you have SAS/STAT, you can run these procedures in single-machine mode and exploit all the cores.
3. As your problem size grows, you can take full advantage of all the cores and the large amounts of memory available in distributed computing environments.


More information

Data Analytics and Machine Learning: From Node to Cluster

Data Analytics and Machine Learning: From Node to Cluster Data Analytics and Machine Learning: From Node to Cluster Presented by Viswanath Puttagunta Ganesh Raju Understanding use cases to optimize on ARM Ecosystem Date BKK16-404B March 10th, 2016 Event Linaro

More information

BIG DATA AND HADOOP ON THE ZFS STORAGE APPLIANCE

BIG DATA AND HADOOP ON THE ZFS STORAGE APPLIANCE BIG DATA AND HADOOP ON THE ZFS STORAGE APPLIANCE BRETT WENINGER, MANAGING DIRECTOR 10/21/2014 ADURANT APPROACH TO BIG DATA Align to Un/Semi-structured Data Instead of Big Scale out will become Big Greatest

More information

System Requirements for SAS 9.4 Foundation for Solaris for x64

System Requirements for SAS 9.4 Foundation for Solaris for x64 System Requirements for SAS 9.4 Foundation for Solaris for x64 Copyright Notice The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2017. System Requirements for SAS 9.4

More information

Where We Are. Review: Parallel DBMS. Parallel DBMS. Introduction to Data Management CSE 344

Where We Are. Review: Parallel DBMS. Parallel DBMS. Introduction to Data Management CSE 344 Where We Are Introduction to Data Management CSE 344 Lecture 22: MapReduce We are talking about parallel query processing There exist two main types of engines: Parallel DBMSs (last lecture + quick review)

More information

Topics covered 10/12/2015. Pengantar Teknologi Informasi dan Teknologi Hijau. Suryo Widiantoro, ST, MMSI, M.Com(IS)

Topics covered 10/12/2015. Pengantar Teknologi Informasi dan Teknologi Hijau. Suryo Widiantoro, ST, MMSI, M.Com(IS) Pengantar Teknologi Informasi dan Teknologi Hijau Suryo Widiantoro, ST, MMSI, M.Com(IS) 1 Topics covered 1. Basic concept of managing files 2. Database management system 3. Database models 4. Data mining

More information

Integrate MATLAB Analytics into Enterprise Applications

Integrate MATLAB Analytics into Enterprise Applications Integrate Analytics into Enterprise Applications Dr. Roland Michaely 2015 The MathWorks, Inc. 1 Data Analytics Workflow Access and Explore Data Preprocess Data Develop Predictive Models Integrate Analytics

More information

Safe Harbor Statement

Safe Harbor Statement Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment

More information

System Requirements for SAS 9.4 Foundation for AIX

System Requirements for SAS 9.4 Foundation for AIX System Requirements for SAS 9.4 Foundation for AIX Copyright Notice The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2017. System Requirements for SAS 9.4 Foundation

More information

SAS/STAT 15.1 User s Guide The HPQUANTSELECT Procedure

SAS/STAT 15.1 User s Guide The HPQUANTSELECT Procedure SAS/STAT 15.1 User s Guide The HPQUANTSELECT Procedure This document is an individual chapter from SAS/STAT 15.1 User s Guide. The correct bibliographic citation for this manual is as follows: SAS Institute

More information

Advanced Analytics with Enterprise Guide Catherine Truxillo, Ph.D., Stephen McDaniel, and David McNamara, SAS Institute Inc.

Advanced Analytics with Enterprise Guide Catherine Truxillo, Ph.D., Stephen McDaniel, and David McNamara, SAS Institute Inc. Advanced Analytics with Enterprise Guide Catherine Truxillo, Ph.D., Stephen McDaniel, and David McNamara, SAS Institute Inc., Cary, NC ABSTRACT From SAS/STAT to SAS/ETS to SAS/QC to SAS/GRAPH, Enterprise

More information

10th August Part One: Introduction to Parallel Computing

10th August Part One: Introduction to Parallel Computing Part One: Introduction to Parallel Computing 10th August 2007 Part 1 - Contents Reasons for parallel computing Goals and limitations Criteria for High Performance Computing Overview of parallel computer

More information

Decision Management with DS2

Decision Management with DS2 Decision Management with DS2 Helen Fowler, Teradata Corporation, West Chester, Ohio Tho Nguyen, Teradata Corporation, Raleigh, North Carolina ABSTRACT We all make tactical and strategic decisions every

More information

Project Requirements

Project Requirements Project Requirements Version 4.0 2 May, 2016 2015-2016 Computer Science Department, Texas Christian University Revision Signatures By signing the following document, the team member is acknowledging that

More information

SAP HANA. Jake Klein/ SVP SAP HANA June, 2013

SAP HANA. Jake Klein/ SVP SAP HANA June, 2013 SAP HANA Jake Klein/ SVP SAP HANA June, 2013 SAP 3 YEARS AGO Middleware BI / Analytics Core ERP + Suite 2013 WHERE ARE WE NOW? Cloud Mobile Applications SAP HANA Analytics D&T Changed Reality Disruptive

More information

Some software included in SAS Foundation may display a release number other than 9.2.

Some software included in SAS Foundation may display a release number other than 9.2. Copyright Notice The correct bibliographic citation for this manual is as follows: SAS Institute Inc., SAS 9.2 Foundation System Requirements for AIX, Cary, NC: SAS Institute Inc., 2012. SAS 9.2 Foundation

More information

HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads

HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads Azza Abouzeid, Kamil Bajda-Pawlikowski, Daniel J. Abadi, Alexander Rasin and Avi Silberschatz Presented by

More information

Distributed Data Analytics Introduction

Distributed Data Analytics Introduction G-3.1.09, Campus III Hasso Plattner Institut Information Systems Team Prof. Felix Naumann Dr. Ralf Krestel Tim Repke Diana Stephan project DuDe Duplicate Detection Data Fusion Sebastian Kruse Data Change

More information

Evolving To The Big Data Warehouse

Evolving To The Big Data Warehouse Evolving To The Big Data Warehouse Kevin Lancaster 1 Copyright Director, 2012, Oracle and/or its Engineered affiliates. All rights Insert Systems, Information Protection Policy Oracle Classification from

More information

Agenda Process Concept Process Scheduling Operations on Processes Interprocess Communication 3.2

Agenda Process Concept Process Scheduling Operations on Processes Interprocess Communication 3.2 Lecture 3: Processes Agenda Process Concept Process Scheduling Operations on Processes Interprocess Communication 3.2 Process in General 3.3 Process Concept Process is an active program in execution; process

More information