Surfing the SAS cache

1 Surfing the SAS cache to improve optimisation
Michael Thompson
Department of Employment / Quantam Solutions

2 Background
Did my first basic SAS course in 1989 and didn't get it at all. Actively avoided SAS programming; I had very capable team members to do it for me. Went consulting in 1999. At my first interview I was asked a lot of SAS questions, which I could answer because I had had good SAS people working for me. Wasn't asked if I could program in SAS! Only realised on day one that SAS was required. Had to learn via the SAS tech support web page really quickly. Discovered quite quickly that SAS PROC SQL was both quick to program and ran fast. Yes, I am an unashamed SQL zealot!

3 Let's make this interactive
Don't be shy! Better ideas, observations, insights. There might be years of SAS experience in the room!

4 What is going on that affects us
Budget cuts and the drive for more efficiency: we must do more with less. Demands for more responsiveness to support the business of our organisations. Government must follow private enterprise and understand, at a finer level, the attributes and needs of the public we serve, and be able to quickly measure the effectiveness of both new and old policy.

5 What is going on that affects us
Big Data (sure, it is one of the latest buzzwords, however...). The data is growing massively and will not stop. Maybe in the next few years more data will be created than in the last 40,000 years. With so much data around, an important skill will be the ability to discern which data to ignore, but also the creativity to identify surprising new ways to use new data to serve our organisations.

6 What is going on that affects us
The advent of the micro-policy is coming: policies benefiting small numbers of people, developed and implemented quickly, and evaluated quickly. If they fail, make sure they fail fast. To facilitate concepts like micro-policies, it is up to us to ensure our IT areas have the right data available, are even more flexible and responsive, and that our output is trusted (both in perception and in reality).

7 What is going on that affects us
Trend to open public data to scrutiny. Obama's second administration directed government (taking account of privacy and national security) to publish government data and open it to scrutiny. "Given enough eyeballs, all insights are shallow": a slight twist on Linus's Law (Linus Torvalds, Linux creator), applied to crowdsourced research. This may mean our data processes need to be more robust, and we may have an increased workload. We may more often need to verify insights identified in our data by others.

8 What used to be most important
For SAS programmers (creativity aside): writing code so it executed quickly. Getting IF tests in the right order to reduce the number of tests executed. Using ELSE to avoid executing unneeded IF tests. Subsetting with WHERE statements after SETting data. Keeping the number of fields and their lengths to a minimum.
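The habits listed on this slide can be sketched in a single DATA step. This is only an illustration: the table work.customers, its variables and the age bands are hypothetical, and the 25 to 64 group is simply assumed to be the most common case.

```sas
/* Hypothetical sample data, for illustration only. */
data work.customers;
    input cust_id age;
    datalines;
1 34
2 71
3 .
4 19
;
run;

data work.age_bands (keep=cust_id age_band);
    /* KEEP= limits the fields read from the input table. */
    set work.customers (keep=cust_id age);
    /* WHERE subsets the rows before the IF tests run. */
    where age is not missing;
    length age_band $ 8;
    /* The branch assumed to match most often is tested first;  */
    /* ELSE skips the remaining tests once one condition is true. */
    if 25 <= age < 65 then age_band = '25to64';
    else if age < 25 then age_band = 'Under25';
    else age_band = '65plus';
run;
```

Which test belongs first depends on the actual distribution of the data, so the ordering here is an assumption, not a rule.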

9 Now things have changed
Moore's law has continued to work. CPU speeds have increased massively. Disk storage is getting larger and cheaper per GB.

10 Moore's Law
Transistor counts on integrated circuits double every 2 years (Ref: Wikipedia, Moore's Law).

11 Why change the way I do things now? Computers will just get faster and help me keep up!
Unfortunately, the speed at which data can be read, written and transmitted has not kept up. As our data volumes increase, even though processors are getting faster and storage is getting bigger and cheaper, read/write speeds lag behind. It doesn't matter how fast our CPUs are or how efficiently we write our code if the bottleneck is reading data from and writing data to storage.

12 As a SAS programmer what things can we do?
Write our programs or processes more quickly; this helps no matter what the data volumes. Where our data volumes are high, engineer processes which optimise (minimise) IO.

13 As a SAS programmer what things can we do? (to speed up joining data from 2 or more tables)
I personally believe that at the moment (and this may change as we utilise solid state memory more and more) the best thing we can do to speed up our SAS is to understand how the cache operates and work with it. When SAS reads data from storage it doesn't just read 1 record; it reads many into cache.

14 As a SAS programmer what things can we do? (to speed up joining data from 2 or more tables)
The #1 thing we can do to optimise the cache is to ensure the tables we are joining are sorted by the key variable we are joining on, and preferably indexed by that variable as well. This means that when we start to read the two tables into memory to join the first records from each table, we have also just read the data for thousands of subsequent matches.
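A minimal sketch of that preparation, assuming two small hypothetical tables joined on a key called ssr (the names echo the later example in this talk, but the data is invented for illustration): sort both tables by the key, index them on it, then join.

```sas
/* Hypothetical sample tables, deliberately out of key order. */
data work.benhist;
    input ssr bentype $;
    datalines;
3 NSA
1 AGE
2 DSP
;
run;

data work.customer;
    input ssr cob $;
    datalines;
2 AUS
3 NZL
1 GBR
;
run;

/* Sort both tables by the join key so matching rows sit together. */
proc sort data=work.benhist;
    by ssr;
run;

proc sort data=work.customer;
    by ssr;
run;

/* Add a simple index on the join key to each sorted table. */
proc datasets library=work nolist;
    modify benhist;
        index create ssr;
    run;
    modify customer;
        index create ssr;
    run;
quit;

/* Join the two prepared tables on the key. */
proc sql;
    create table work.joined as
    select a.ssr, a.bentype, b.cob
    from work.benhist as a
         inner join work.customer as b
         on a.ssr = b.ssr;
quit;
```

The sort comes before the index on purpose: sorting a table in place afterwards would remove the index.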

15 As a SAS programmer what things can we do? (to speed up joining data from 2 or more tables)
This use of the data flowing through cache can be further enhanced by ensuring the tables are compressed (preferably using SPDE binary compression). If you really need to make the match even faster, ensure that only the fields needed in the output are in the input files, allowing even more records to be read in each bite of the cache.
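As a rough sketch of both ideas, the step below writes a slimmed-down, binary-compressed copy of a table into an SPDE library. The libref fast, the table names and the variables are hypothetical; pointing the SPDE library at the WORK directory is borrowed from the workaround later in this talk and is only a convenience for the example.

```sas
/* Assumption: an SPDE library located in the WORK directory. */
libname fast spde "%sysfunc(pathname(work))";

/* Hypothetical wide source table. */
data work.benhist_wide;
    input ssr bentype $ start_dt end_dt notes $;
    datalines;
1 AGE 100 . a
2 DSP 120 300 b
3 NSA 150 . c
;
run;

/* Keep only the fields the join needs and store them  */
/* with binary compression in the SPDE library.        */
data fast.benhist_slim (compress=binary);
    set work.benhist_wide (keep=ssr bentype end_dt);
run;
```

On a table this small the compression will not pay off; the point is only the shape of the step.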

16 Caveats
Many of these advantages can be lost if more than 2 tables are joined at the same time. When joining 3 tables, SAS internally joins 2 tables, writes the output to WORK and then joins the work table to the 3rd. Unfortunately, work files are not SPDE compressed, so the reads and writes to WORK are slow.
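One way to act on this caveat, offered here as an assumption rather than something the talk prescribes, is to do the pairwise joins explicitly and stage the intermediate result in an SPDE library with binary compression. All table names, variables and the libref stage are hypothetical.

```sas
/* Assumption: an SPDE library located in the WORK directory. */
libname stage spde "%sysfunc(pathname(work))";

/* Three small hypothetical tables sharing the key id. */
data work.t1;
    input id x;
    datalines;
1 10
2 20
;
run;

data work.t2;
    input id y;
    datalines;
1 100
2 200
;
run;

data work.t3;
    input id z;
    datalines;
1 1000
2 2000
;
run;

proc sql;
    /* Step 1: join two tables; stage the result compressed and in key order. */
    create table stage.t12 (compress=binary) as
    select a.id, a.x, b.y
    from work.t1 as a
         inner join work.t2 as b
         on a.id = b.id
    order by a.id;

    /* Step 2: join the staged result to the third table. */
    create table work.t123 as
    select ab.id, ab.x, ab.y, c.z
    from stage.t12 as ab
         inner join work.t3 as c
         on ab.id = c.id;
quit;
```

Whether this beats a single three-way join would need to be measured on real data volumes.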

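17 Work around: only do this with great care
Some SAS processes will fail if you follow this idea, e.g. SAS/GRAPH.

/* The following code gets SAS to use SPDE compression by default for work files. Be careful! */
options obs=max compress=binary;
/* Find the physical path of the WORK library. */
%let path=%sysfunc(pathname(work));
/* Point an SPDE library at the same location. */
libname s spde "&path";
/* Send one-level dataset names to the SPDE library instead of WORK. */
options user=s;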

18 Questions

20 As a SAS programmer what things can we do? #1 Learn SQL
Data steps describe the path to solving a problem; SQL semantically describes the answer and delivers it. When joining tables (datasets), SQL almost always outperforms a SAS merge. Short SQL programs can often replace processes with 100s of lines of code. SQL can avoid complicated data steps containing RETAIN, SET and BY statements.

21 Results of testing file types

Method | Measure  | Regular SAS DS: no order, no index | Regular SAS DS: random order, indexed | Regular SAS DS: index ordered & indexed | SPDE: random order, indexed | SPDE: index ordered & indexed
Merge  | Elapsed  | 2m17s | 14m34s | 1m21 | 1hr15m54 | 1m05s
Merge  | User CPU | 16s   | 2m08s  | 23s  | 1hr25m13 | 31s
Merge  | Sys CPU  | 4s    | 2m47s  | 6s   | 5m13s    | 6s
SQL    | Elapsed  | 1m23s | 50s    | 41s  | 31s      | 24s
SQL    | User CPU | 14s   | 25s    | 16s  | 27s      | 20s
SQL    | Sys CPU  | 8s    | 6s     | 6s   | 2s       | 1s

A join between 10 million and 30 million row tables. Also shows the benefit of SPDE format for SAS tables (datasets). Note the worst performing SQL join was only just slower than the best merge.

22 Joining/merging data
There are many ways to join or merge data and give the correct result. Not all are equal. When choosing a strategy, what matters is: speed of coding; speed of execution (run-time/processor time); maintenance; size of data.

23 Joining/merging data
What makes a difference: file sizes; sort order; indexing; technique (merge/SQL join/sort/other); will the code be run regularly or as a one-off? Will SPDE help?

24 Join Types
Traditional join options: SQL join; merge.

25 Joining/merging data

/* SQL join: open benefit records (missing end date) with customer details */
proc sql;
    create table CALD_Profile as
    select a.ssr
          ,start
          ,a.bentype
          ,b.COB
          ,end
    from REDlast.benhist as a join
         redlast.customer as b
      on a.ssr eq b.ssr
    where a.end eq .;
quit;

NOTE: Table WORK.CALD_PROFILE created, with rows and 5 columns.
NOTE: PROCEDURE SQL used (Total process time):
      real time seconds
      user cpu time seconds
      system cpu time 1.82 seconds

/* Equivalent DATA step merge; both inputs must be sorted by ssr */
data CALD_Profile_MG (keep=ssr start bentype COB end);
    merge REDlast.benhist (in=a)
          redlast.customer;
    by ssr;
    if end eq . and a;
run;

NOTE: There were observations read from the data set REDLAST.BENHIST.
NOTE: There were observations read from the data set REDLAST.CUSTOMER.
NOTE: The data set WORK.CALD_PROFILE_MG has observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time seconds
      user cpu time seconds
      system cpu time seconds

26 As a SAS programmer what things can we do? #2 Look after the IO (when data volumes are high)
Understand how you can make the cache work for you. The order of your data can matter a lot. Learn how SPDE format data can help.

27 Demo: Joining/merging data
Sort order matters!

28 SPDE makes a big difference to IO/footprint
High-density data can actually increase in size by ~25% when regular SAS compression is applied. SPDE binary compression can cut the size of the same dataset in half: faster to write, faster to read! Remember: SPDE files CANNOT BE MOVED outside SAS.

NOTE: Compressing data set JKN.RAND_ORDER_INDEXED_MG increased size by percent.
      Compressed is pages; un-compressed would require pages.
NOTE: MODIFY was successful for JKS.BENHIST_RAND_ORDER_INDSPD.DATA.
NOTE: Compressing data set JKS.BENHIST_RAND_ORDER_INDSPD decreased size by percent.

The Right Read Optimization is Actually Write Optimization. Leif Walsh

The Right Read Optimization is Actually Write Optimization. Leif Walsh The Right Read Optimization is Actually Write Optimization Leif Walsh leif@tokutek.com The Right Read Optimization is Write Optimization Situation: I have some data. I want to learn things about the world,

More information

XP: Backup Your Important Files for Safety

XP: Backup Your Important Files for Safety XP: Backup Your Important Files for Safety X 380 / 1 Protect Your Personal Files Against Accidental Loss with XP s Backup Wizard Your computer contains a great many important files, but when it comes to

More information

It s not the Yellow Brick Road but the SAS PC FILES SERVER will take you Down the LIBNAME PATH= to Using the 64-Bit Excel Workbooks.

It s not the Yellow Brick Road but the SAS PC FILES SERVER will take you Down the LIBNAME PATH= to Using the 64-Bit Excel Workbooks. Paper FP_82 It s not the Yellow Brick Road but the SAS PC FILES SERVER will take you Down the LIBNAME PATH= to Using the 64-Bit Excel Workbooks. ABSTRACT William E Benjamin Jr, Owl Computer Consultancy,

More information

Embedded Technosolutions

Embedded Technosolutions Hadoop Big Data An Important technology in IT Sector Hadoop - Big Data Oerie 90% of the worlds data was generated in the last few years. Due to the advent of new technologies, devices, and communication

More information

CSE332: Data Abstractions Lecture 7: B Trees. James Fogarty Winter 2012

CSE332: Data Abstractions Lecture 7: B Trees. James Fogarty Winter 2012 CSE2: Data Abstractions Lecture 7: B Trees James Fogarty Winter 20 The Dictionary (a.k.a. Map) ADT Data: Set of (key, value) pairs keys must be comparable insert(jfogarty,.) Operations: insert(key,value)

More information

Pharmacy college.. Assist.Prof. Dr. Abdullah A. Abdullah

Pharmacy college.. Assist.Prof. Dr. Abdullah A. Abdullah The kinds of memory:- 1. RAM(Random Access Memory):- The main memory in the computer, it s the location where data and programs are stored (temporally). RAM is volatile means that the data is only there

More information

How To Rock with MyRocks. Vadim Tkachenko CTO, Percona Webinar, Jan

How To Rock with MyRocks. Vadim Tkachenko CTO, Percona Webinar, Jan How To Rock with MyRocks Vadim Tkachenko CTO, Percona Webinar, Jan-16 2019 Agenda MyRocks intro and internals MyRocks limitations Benchmarks: When to choose MyRocks over InnoDB Tuning for the best results

More information

Ten tips for efficient SAS code

Ten tips for efficient SAS code Ten tips for efficient SAS code Host Caroline Scottow Presenter Peter Hobart Managing the webinar In Listen Mode Control bar opened with the white arrow in the orange box Efficiency Overview Optimisation

More information

50 WAYS TO MERGE YOUR DATA INSTALLMENT 1 Kristie Schuster, LabOne, Inc., Lenexa, Kansas Lori Sipe, LabOne, Inc., Lenexa, Kansas

50 WAYS TO MERGE YOUR DATA INSTALLMENT 1 Kristie Schuster, LabOne, Inc., Lenexa, Kansas Lori Sipe, LabOne, Inc., Lenexa, Kansas Paper 103-26 50 WAYS TO MERGE YOUR DATA INSTALLMENT 1 Kristie Schuster, LabOne, Inc., Lenexa, Kansas Lori Sipe, LabOne, Inc., Lenexa, Kansas ABSTRACT When you need to join together two datasets, how do

More information

DB2 is a complex system, with a major impact upon your processing environment. There are substantial performance and instrumentation changes in

DB2 is a complex system, with a major impact upon your processing environment. There are substantial performance and instrumentation changes in DB2 is a complex system, with a major impact upon your processing environment. There are substantial performance and instrumentation changes in versions 8 and 9. that must be used to measure, evaluate,

More information

2

2 1 2 3 4 5 All resources: how fast, how many? If all the CPUs are pegged, that s as fast as you can go. CPUs have followed Moore s law, the rest of the system hasn t. Not everything can be made threaded,

More information

Microservices Smaller is Better? Eberhard Wolff Freelance consultant & trainer

Microservices Smaller is Better? Eberhard Wolff Freelance consultant & trainer Microservices Smaller is Better? Eberhard Wolff Freelance consultant & trainer http://ewolff.com Why Microservices? Why Microservices? Strong modularization Replaceability Small units Sustainable Development

More information

Getting the Most from Hash Objects. Bharath Gowda

Getting the Most from Hash Objects. Bharath Gowda Getting the Most from Hash Objects Bharath Gowda Getting the most from Hash objects Techniques covered are: SQL join Data step merge using BASE engine Data step merge using SPDE merge Index Key lookup

More information

Advanced Database Systems

Advanced Database Systems Lecture IV Query Processing Kyumars Sheykh Esmaili Basic Steps in Query Processing 2 Query Optimization Many equivalent execution plans Choosing the best one Based on Heuristics, Cost Will be discussed

More information

DIGITALGLOBE ENHANCES PRODUCTIVITY

DIGITALGLOBE ENHANCES PRODUCTIVITY DIGITALGLOBE ENHANCES PRODUCTIVITY WITH NVIDIA GRID High-performance virtualized desktops transform daily tasks and drastically improve staff efficiency. ABOUT DIGITALGLOBE FIVE REASONS FOR NVIDIA GRID

More information

Processor: Faster and Faster

Processor: Faster and Faster Chapter 4 Processor: Faster and Faster Most of the computers, no matter how it looks, can be cut into five parts: Input/Output brings things in and, once done, sends out the result; a memory remembers

More information

DATABASE PERFORMANCE AND INDEXES. CS121: Relational Databases Fall 2017 Lecture 11

DATABASE PERFORMANCE AND INDEXES. CS121: Relational Databases Fall 2017 Lecture 11 DATABASE PERFORMANCE AND INDEXES CS121: Relational Databases Fall 2017 Lecture 11 Database Performance 2 Many situations where query performance needs to be improved e.g. as data size grows, query performance

More information

Algorithm Performance Factors. Memory Performance of Algorithms. Processor-Memory Performance Gap. Moore s Law. Program Model of Memory I

Algorithm Performance Factors. Memory Performance of Algorithms. Processor-Memory Performance Gap. Moore s Law. Program Model of Memory I Memory Performance of Algorithms CSE 32 Data Structures Lecture Algorithm Performance Factors Algorithm choices (asymptotic running time) O(n 2 ) or O(n log n) Data structure choices List or Arrays Language

More information

Best Practice for Creation and Maintenance of a SAS Infrastructure

Best Practice for Creation and Maintenance of a SAS Infrastructure Paper 2501-2015 Best Practice for Creation and Maintenance of a SAS Infrastructure Paul Thomas, ASUP Ltd. ABSTRACT The advantage of using metadata to control and maintain data and access to data on databases,

More information

How TokuDB Fractal TreeTM. Indexes Work. Bradley C. Kuszmaul. Guest Lecture in MIT Performance Engineering, 18 November 2010.

How TokuDB Fractal TreeTM. Indexes Work. Bradley C. Kuszmaul. Guest Lecture in MIT Performance Engineering, 18 November 2010. 6.172 How Fractal Trees Work 1 How TokuDB Fractal TreeTM Indexes Work Bradley C. Kuszmaul Guest Lecture in MIT 6.172 Performance Engineering, 18 November 2010. 6.172 How Fractal Trees Work 2 I m an MIT

More information

THE MORE THINGS CHANGE THE MORE THEY STAY THE SAME FOR BACKUP!

THE MORE THINGS CHANGE THE MORE THEY STAY THE SAME FOR BACKUP! THE MORE THINGS CHANGE THE MORE THEY STAY THE SAME FOR BACKUP! Latest Macrium survey results take a detailed look into the backup and recovery space. INTRODUCTION WHO DID WE SPEAK TO? Where are you responsible

More information

Balancing the pressures of a healthcare SQL Server DBA

Balancing the pressures of a healthcare SQL Server DBA Balancing the pressures of a healthcare SQL Server DBA More than security, compliance and auditing? Working with SQL Server in the healthcare industry presents many unique challenges. The majority of these

More information

Why Hash? Glen Becker, USAA

Why Hash? Glen Becker, USAA Why Hash? Glen Becker, USAA Abstract: What can I do with the new Hash object in SAS 9? Instead of focusing on How to use this new technology, this paper answers Why would I want to? It presents the Big

More information

Another fundamental component of the computer is the main memory.

Another fundamental component of the computer is the main memory. Another fundamental component of the computer is the main memory. The main memory of the computer is called random-access memory (abbreviated to RAM). According to the Von Neumann architecture, the RAM

More information

Example File Systems Using Replication CS 188 Distributed Systems February 10, 2015

Example File Systems Using Replication CS 188 Distributed Systems February 10, 2015 Example File Systems Using Replication CS 188 Distributed Systems February 10, 2015 Page 1 Example Replicated File Systems NFS Coda Ficus Page 2 NFS Originally NFS did not have any replication capability

More information

The Implications of Multi-core

The Implications of Multi-core The Implications of Multi- What I want to do today Given that everyone is heralding Multi- Is it really the Holy Grail? Will it cure cancer? A lot of misinformation has surfaced What multi- is and what

More information

Enhancing Security With SQL Server How to balance the risks and rewards of using big data

Enhancing Security With SQL Server How to balance the risks and rewards of using big data Enhancing Security With SQL Server 2016 How to balance the risks and rewards of using big data Data s security demands and business opportunities With big data comes both great reward and risk. Every company

More information

In-Memory Data Management Jens Krueger

In-Memory Data Management Jens Krueger In-Memory Data Management Jens Krueger Enterprise Platform and Integration Concepts Hasso Plattner Intitute OLTP vs. OLAP 2 Online Transaction Processing (OLTP) Organized in rows Online Analytical Processing

More information

Estimate performance and capacity requirements for Access Services

Estimate performance and capacity requirements for Access Services Estimate performance and capacity requirements for Access Services This document is provided as-is. Information and views expressed in this document, including URL and other Internet Web site references,

More information

WHITE PAPER. Apache Spark: RDD, DataFrame and Dataset. API comparison and Performance Benchmark in Spark 2.1 and Spark 1.6.3

WHITE PAPER. Apache Spark: RDD, DataFrame and Dataset. API comparison and Performance Benchmark in Spark 2.1 and Spark 1.6.3 WHITE PAPER Apache Spark: RDD, DataFrame and Dataset API comparison and Performance Benchmark in Spark 2.1 and Spark 1.6.3 Prepared by: Eyal Edelman, Big Data Practice Lead Michael Birch, Big Data and

More information

Physical DB Issues, Indexes, Query Optimisation. Database Systems Lecture 13 Natasha Alechina

Physical DB Issues, Indexes, Query Optimisation. Database Systems Lecture 13 Natasha Alechina Physical DB Issues, Indexes, Query Optimisation Database Systems Lecture 13 Natasha Alechina In This Lecture Physical DB Issues RAID arrays for recovery and speed Indexes and query efficiency Query optimisation

More information

Strategy. 1. You must do an internal needs analysis before looking at software or creating an ITT

Strategy. 1. You must do an internal needs analysis before looking at software or creating an ITT Strategy 1. You must do an internal needs analysis before looking at software or creating an ITT It is very easy to jump straight in and look at database software before considering what your requirements

More information

Intro to Algorithms. Professor Kevin Gold

Intro to Algorithms. Professor Kevin Gold Intro to Algorithms Professor Kevin Gold What is an Algorithm? An algorithm is a procedure for producing outputs from inputs. A chocolate chip cookie recipe technically qualifies. An algorithm taught in

More information

Implementing Oracle Database 12c s Heat Map and Automatic Data Optimization to optimize the database storage cost and performance

Implementing Oracle Database 12c s Heat Map and Automatic Data Optimization to optimize the database storage cost and performance Implementing Oracle Database 12c s Heat Map and Automatic Data Optimization to optimize the database storage cost and performance Kai Yu, Senior Principal Engineer, Oracle Solutions Engineering, Dell Inc

More information

Object-Oriented Analysis and Design Prof. Partha Pratim Das Department of Computer Science and Engineering Indian Institute of Technology-Kharagpur

Object-Oriented Analysis and Design Prof. Partha Pratim Das Department of Computer Science and Engineering Indian Institute of Technology-Kharagpur Object-Oriented Analysis and Design Prof. Partha Pratim Das Department of Computer Science and Engineering Indian Institute of Technology-Kharagpur Lecture 06 Object-Oriented Analysis and Design Welcome

More information

ECE Lab 8. Logic Design for a Direct-Mapped Cache. To understand the function and design of a direct-mapped memory cache.

ECE Lab 8. Logic Design for a Direct-Mapped Cache. To understand the function and design of a direct-mapped memory cache. ECE 201 - Lab 8 Logic Design for a Direct-Mapped Cache PURPOSE To understand the function and design of a direct-mapped memory cache. EQUIPMENT Simulation Software REQUIREMENTS Electronic copy of your

More information

Computer Systems C S Cynthia Lee Today s materials adapted from Kevin Webb at Swarthmore College

Computer Systems C S Cynthia Lee Today s materials adapted from Kevin Webb at Swarthmore College Computer Systems C S 0 7 Cynthia Lee Today s materials adapted from Kevin Webb at Swarthmore College 2 Today s Topics TODAY S LECTURE: Caching ANNOUNCEMENTS: Assign6 & Assign7 due Friday! 6 & 7 NO late

More information

Massive Scalability With InterSystems IRIS Data Platform

Massive Scalability With InterSystems IRIS Data Platform Massive Scalability With InterSystems IRIS Data Platform Introduction Faced with the enormous and ever-growing amounts of data being generated in the world today, software architects need to pay special

More information

Search Lesson Outline

Search Lesson Outline 1. Searching Lesson Outline 2. How to Find a Value in an Array? 3. Linear Search 4. Linear Search Code 5. Linear Search Example #1 6. Linear Search Example #2 7. Linear Search Example #3 8. Linear Search

More information

CAPACITY PLANNING FOR THE DATA WAREHOUSE BY W. H. Inmon

CAPACITY PLANNING FOR THE DATA WAREHOUSE BY W. H. Inmon CAPACITY PLANNING FOR THE DATA WAREHOUSE BY W. H. Inmon The data warehouse environment - like all other computer environments - requires hardware resources. Given the volume of data and the type of processing

More information

Sorting. 4.2 Sorting and Searching. Sorting. Sorting. Insertion Sort. Sorting. Sorting problem. Rearrange N items in ascending order.

Sorting. 4.2 Sorting and Searching. Sorting. Sorting. Insertion Sort. Sorting. Sorting problem. Rearrange N items in ascending order. 4.2 and Searching pentrust.org Introduction to Programming in Java: An Interdisciplinary Approach Robert Sedgewick and Kevin Wayne Copyright 2002 2010 23/2/2012 15:04:54 pentrust.org pentrust.org shanghaiscrap.org

More information

CS 61C: Great Ideas in Computer Architecture. The Memory Hierarchy, Fully Associative Caches

CS 61C: Great Ideas in Computer Architecture. The Memory Hierarchy, Fully Associative Caches CS 61C: Great Ideas in Computer Architecture The Memory Hierarchy, Fully Associative Caches Instructor: Alan Christopher 7/09/2014 Summer 2014 -- Lecture #10 1 Review of Last Lecture Floating point (single

More information

So, coming back to this picture where three levels of memory are shown namely cache, primary memory or main memory and back up memory.

So, coming back to this picture where three levels of memory are shown namely cache, primary memory or main memory and back up memory. Computer Architecture Prof. Anshul Kumar Department of Computer Science and Engineering Indian Institute of Technology, Delhi Lecture - 31 Memory Hierarchy: Virtual Memory In the memory hierarchy, after

More information

Monitoring Tool Made to Measure for SharePoint Admins. By Stacy Simpkins

Monitoring Tool Made to Measure for SharePoint Admins. By Stacy Simpkins Monitoring Tool Made to Measure for SharePoint Admins By Stacy Simpkins Contents About the Author... 3 Introduction... 4 Who s it for and what all can it do?... 4 SysKit Insights Features... 6 Drillable

More information

CSCI-GA Multicore Processors: Architecture & Programming Lecture 3: The Memory System You Can t Ignore it!

CSCI-GA Multicore Processors: Architecture & Programming Lecture 3: The Memory System You Can t Ignore it! CSCI-GA.3033-012 Multicore Processors: Architecture & Programming Lecture 3: The Memory System You Can t Ignore it! Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com Memory Computer Technology

More information

Staleness and Isolation in Prometheus 2.0. Brian Brazil Founder

Staleness and Isolation in Prometheus 2.0. Brian Brazil Founder Staleness and Isolation in Prometheus 2.0 Brian Brazil Founder Who am I? One of the core developers of Prometheus Founder of Robust Perception Primary author of Reliable Insights blog Contributor to many

More information

How do you transform your business into a business service center?

How do you transform your business into a business service center? How do you transform your business into a business service center? accelerate your ambition Jessica Constantinidis Next Generation Datacenter Client Transitional Lead Europe Jessica.Constantinidis@dimensiondata.com

More information

Data in the Cloud and Analytics in the Lake

Data in the Cloud and Analytics in the Lake Data in the Cloud and Analytics in the Lake Introduction Working in Analytics for over 5 years Part the digital team at BNZ for 3 years Based in the Auckland office Preferred Languages SQL Python (PySpark)

More information

Notes From SUGI 24. Jack Hamilton. First Health West Sacramento, California. SUGI24OV.DOC 2:45 PM 29 April, 1999 Page 1 of 18

Notes From SUGI 24. Jack Hamilton. First Health West Sacramento, California. SUGI24OV.DOC 2:45 PM 29 April, 1999 Page 1 of 18 Notes From SUGI 24 Jack Hamilton First Health West Sacramento, California SUGI24OV.DOC 2:45 PM 29 April, 1999 Page 1 of 18 No News: Good News? The main news is that there s not much new of general interest

More information

4.1, 4.2 Performance, with Sorting

4.1, 4.2 Performance, with Sorting 1 4.1, 4.2 Performance, with Sorting Running Time As soon as an Analytic Engine exists, it will necessarily guide the future course of the science. Whenever any result is sought by its aid, the question

More information

_APP A_541_10/31/06. Appendix A. Backing Up Your Project Files

_APP A_541_10/31/06. Appendix A. Backing Up Your Project Files 1-59863-307-4_APP A_541_10/31/06 Appendix A Backing Up Your Project Files At the end of every recording session, I back up my project files. It doesn t matter whether I m running late or whether I m so

More information

Main Memory (RAM) Organisation

Main Memory (RAM) Organisation Main Memory (RAM) Organisation Computers employ many different types of memory (semi-conductor, magnetic disks, USB sticks, DVDs etc.) to hold data and programs. Each type has its own characteristics and

More information

Generalising LUTI Models to Systems of Cities: Web-Based Interfaces to Simulation

Generalising LUTI Models to Systems of Cities: Web-Based Interfaces to Simulation 1-4 November 2016 Generalising LUTI Models to Systems of Cities: Web-Based Interfaces to Simulation Michael Batty m.batty@ucl.ac.uk @jmichaelbatty http://www.complexcity.info/ http://www.spatialcomplexity.info/

More information

Managing the Database

Managing the Database Slide 1 Managing the Database Objectives of the Lecture : To consider the roles of the Database Administrator. To consider the involvmentof the DBMS in the storage and handling of physical data. To appreciate

More information

Data-Intensive Distributed Computing

Data-Intensive Distributed Computing Data-Intensive Distributed Computing CS 451/651 431/631 (Winter 2018) Part 5: Analyzing Relational Data (1/3) February 8, 2018 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo

More information

CS 31: Intro to Systems Caching. Kevin Webb Swarthmore College March 24, 2015

CS 31: Intro to Systems Caching. Kevin Webb Swarthmore College March 24, 2015 CS 3: Intro to Systems Caching Kevin Webb Swarthmore College March 24, 205 Reading Quiz Abstraction Goal Reality: There is no one type of memory to rule them all! Abstraction: hide the complex/undesirable

More information

Native POSIX Thread Library (NPTL) CSE 506 Don Porter

Native POSIX Thread Library (NPTL) CSE 506 Don Porter Native POSIX Thread Library (NPTL) CSE 506 Don Porter Logical Diagram Binary Memory Threads Formats Allocators Today s Lecture Scheduling System Calls threads RCU File System Networking Sync User Kernel

More information

Elementary IR: Scalable Boolean Text Search. (Compare with R & G )

Elementary IR: Scalable Boolean Text Search. (Compare with R & G ) Elementary IR: Scalable Boolean Text Search (Compare with R & G 27.1-3) Information Retrieval: History A research field traditionally separate from Databases Hans P. Luhn, IBM, 1959: Keyword in Context

More information

Outline. Course Administration /6.338/SMA5505. Parallel Machines in Applications Special Approaches Our Class Computer.

Outline. Course Administration /6.338/SMA5505. Parallel Machines in Applications Special Approaches Our Class Computer. Outline Course Administration 18.337/6.338/SMA5505 Parallel Machines in 2003 Overview Details Applications Special Approaches Our Class Computer Parallel Computer Architectures MPP Massively Parallel Processors

More information

A Technical Marketing Document

A Technical Marketing Document A Technical Marketing Document By Daniel Verbin TWR2009 July 2013 Contents Executive Summary... 2 Introduction... 3 Technical Overview... 4 New and Enhanced Features... 6 Success Stories. 10 Conclusion..

More information

Tips & Tricks. With lots of help from other SUG and SUGI presenters. SAS HUG Meeting, November 18, 2010

Tips & Tricks. With lots of help from other SUG and SUGI presenters. SAS HUG Meeting, November 18, 2010 Tips & Tricks With lots of help from other SUG and SUGI presenters 1 SAS HUG Meeting, November 18, 2010 2 3 Sorting Threads Multi-threading available if your computer has more than one processor (CPU)

More information

Flash Decisions: Which Solution is Right for You?

Flash Decisions: Which Solution is Right for You? Flash Decisions: Which Solution is Right for You? A Guide to Finding the Right Flash Solution Introduction Chapter 1: Why Flash Storage Now? Chapter 2: Flash Storage Options Chapter 3: Choosing the Right

CSE 141: Computer Architecture. Professor: Michael Taylor. UCSD Department of Computer Science & Engineering

CSE 141: Computer Architecture Professor: Michael Taylor UCSD Department of Computer Science & Engineering Computer Architecture from 10,000 feet foo(int x) {.. } Class of application Physics Computer

How TokuDB Fractal Tree™ Indexes Work. Bradley C. Kuszmaul. MySQL UC 2010

How TokuDB Fractal Tree™ Indexes Work Bradley C. Kuszmaul MySQL UC 2010 How Fractal Trees Work More Information You can download this talk and others at http://tokutek.com/technology

Archive-Tools. Powering your performance

Archive-Tools Powering your performance Archive-Tools Go for Smaller. Better. Faster. Stronger. Archive-Tools help you maximize your Return on Investment. Our products are designed to prolong the life

ECE 152 Introduction to Computer Architecture

ECE 152 Introduction to Computer Architecture Main Memory and Virtual Memory Copyright 2009 Daniel J. Sorin Duke University Slides are derived from work by Amir Roth (Penn) Spring 2009 Where We Are in This Course

New Vs. Old Under the Hood with Procs CONTENTS and COMPARE Patricia Hettinger, SAS Professional, Oakbrook Terrace, IL

Paper SS-03 New Vs. Old Under the Hood with Procs CONTENTS and COMPARE Patricia Hettinger, SAS Professional, Oakbrook Terrace, IL ABSTRACT There's SuperCE for comparing text files on the mainframe. Diff

Lecture S3: File system data layout, naming

Lecture S3: File system data layout, naming Review -- 1 min Intro to I/O Performance model: Log Disk physical characteristics/desired abstractions Physical reality Desired abstraction disks are slow fast

Computer Science 432/563 Operating Systems The College of Saint Rose Spring 2016 Topic Notes: Memory Hierarchy

Computer Science 432/563 Operating Systems The College of Saint Rose Spring 2016 Topic Notes: Memory Hierarchy We will revisit a topic now that cuts across systems classes: memory hierarchies. We often

Bits and Bytes. Here is a sort of glossary of computer buzzwords you will encounter in computer use:

Bits and Bytes Here is a sort of glossary of computer buzzwords you will encounter in computer use: Bit Computer processors can only tell if a wire is on or off. Luckily, they can look at lots of wires

Lecture 12. Lecture 12: The IO Model & External Sorting

Lecture 12: The IO Model & External Sorting Announcements 1. Thank you for the great feedback (post coming soon)! 2. Educational goals: 1. Tech changes, principles change more

Memory Hierarchy, Fully Associative Caches. Instructor: Nick Riasanovsky

Memory Hierarchy, Fully Associative Caches Instructor: Nick Riasanovsky Review Hazards reduce effectiveness of pipelining Cause stalls/bubbles Structural Hazards Conflict in use of datapath component Data

(Refer Slide Time 3:31)

(Refer Slide Time 3:31) Digital Circuits and Systems Prof. S. Srinivasan Department of Electrical Engineering Indian Institute of Technology Madras Lecture - 5 Logic Simplification In the last lecture we talked about logic functions

The attendee will get a deep dive into all the DDL changes needed in order to exploit DB2 V10 Temporal tables as well as the limitations.

The attendee will get a deep dive into all the DDL changes needed in order to exploit DB2 V10 Temporal tables as well as the limitations. A case study scenario using a live DB2 V10 system will be used

Background. Let's see what we prescribed.

Background Patient B's custom application had slowed down as their data grew. They'd tried several different relief efforts over time, but performance issues kept popping up especially deadlocks. They

Data Structures and Algorithms Dr. Naveen Garg Department of Computer Science and Engineering Indian Institute of Technology, Delhi.

Data Structures and Algorithms Dr. Naveen Garg Department of Computer Science and Engineering Indian Institute of Technology, Delhi Lecture 18 Tries Today we are going to be talking about another data

Media-Ready Network Transcript

Media-Ready Network Transcript Hello and welcome to this Cisco on Cisco Seminar. I'm Bob Scarbrough, Cisco IT manager on the Cisco on Cisco team. With me today are Sheila Jordan, Vice President of the

Chapter 2: Universal Building Blocks. CS105: Great Insights in Computer Science

Chapter 2: Universal Building Blocks CS105: Great Insights in Computer Science Homework 1 It is now available on our website. Answer questions in Word or any text editor and upload it via sakai. No paper

5 Fundamental Strategies for Building a Data-centered Data Center

5 Fundamental Strategies for Building a Data-centered Data Center June 3, 2014 Ken Krupa, Chief Field Architect Gary Vidal, Solutions Specialist Last generation Reference Data Unstructured OLTP Warehouse

The TokuFS Streaming File System

The TokuFS Streaming File System John Esmet Tokutek & Rutgers Martin Farach-Colton Tokutek & Rutgers Michael A. Bender Tokutek & Stony Brook Bradley C. Kuszmaul Tokutek & MIT First, What are we going to

Scalable Access to SAS Data Billy Clifford, SAS Institute Inc., Austin, TX

Scalable Access to SAS Data Billy Clifford, SAS Institute Inc., Austin, TX ABSTRACT Symmetric multiprocessor (SMP) computers can increase performance by reducing the time required to analyze large volumes

Extreme Storage Performance with exflash DIMM and AMPS

Extreme Storage Performance with exflash DIMM and AMPS 2014 by 60East Technologies, Inc. and Lenovo Corporation All trademarks or registered trademarks mentioned here are the property of their respective

An Annotated Guide: The New 9.1, Free & Fast SPDE Data Engine Russ Lavery, Ardmore PA, Independent Contractor Ian Whitlock, Kennett Square PA

An Annotated Guide: The New 9.1, Free & Fast SPDE Data Engine Russ Lavery, Ardmore PA, Independent Contractor Ian Whitlock, Kennett Square PA ABSTRACT SAS has been working hard to decrease clock time to

TWOO.COM CASE STUDY CUSTOMER SUCCESS STORY

TWOO.COM CUSTOMER SUCCESS STORY With over 30 million users, Twoo.com is Europe's leading social discovery site. Twoo runs the world's largest scale-out SQL deployment, with 4.4 billion transactions a day

Distributed Data Infrastructures, Fall 2017, Chapter 2. Jussi Kangasharju

Distributed Data Infrastructures, Fall 2017, Chapter 2 Jussi Kangasharju Chapter Outline Warehouse-scale computing overview Workloads and software infrastructure Failures and repairs Note: Term Warehouse-scale

Computer Architecture Review. ICS332 - Spring 2016 Operating Systems

Computer Architecture Review ICS332 - Spring 2016 Operating Systems ENIAC (1946) Electronic Numerical Integrator and Calculator Stored-Program Computer (instead of Fixed-Program) Vacuum tubes, punch cards

PROCESS VIRTUAL MEMORY. CS124 Operating Systems Winter 2015-2016, Lecture 18

PROCESS VIRTUAL MEMORY CS124 Operating Systems Winter 2015-2016, Lecture 18 Programs and Memory Programs perform many interactions with memory Accessing variables stored at specific memory locations

CS 147: Computer Systems Performance Analysis

CS 147: Computer Systems Performance Analysis Test Loads Overview Test Load Design

Vertica's Design: Basics, Successes, and Failures

Vertica's Design: Basics, Successes, and Failures Chuck Bear CIDR 2015 January 5, 2015 1. Vertica Basics: Storage Format Design Goals SQL (for the ecosystem and knowledge pool) Clusters of commodity hardware

Index Construction. Dictionary, postings, scalable indexing, dynamic indexing. Web Search

Index Construction Dictionary, postings, scalable indexing, dynamic indexing Web Search Overview Indexes Query Indexing Ranking Results Application Documents User Information analysis Query processing

Computers: Inside and Out

Computers: Inside and Out Computer Components To store binary information the most basic components of a computer must exist in two states State # 1 = 1 State # 2 = 0 1 Transistors Computers use transistors

Memory Hierarchy. Memory Flavors Principle of Locality Program Traces Memory Hierarchies Associativity. (Study Chapter 5)

Memory Hierarchy Why are you dressed like that? Halloween was weeks ago! It makes me look faster, don't you think? Memory Flavors Principle of Locality Program Traces Memory Hierarchies Associativity (Study

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2014 Lecture 14

CS24: INTRODUCTION TO COMPUTING SYSTEMS Spring 2014 Lecture 14 LAST TIME! Examined several memory technologies: SRAM volatile memory cells built from transistors! Fast to use, larger memory cells (6+ transistors

Anomaly Detection vs. Pattern Recognition

Anomaly Detection vs. Pattern Recognition WHITEPAPER The difference between anomaly detection and pattern recognition Scenarios and use cases By Dennis Zimmer CEO opvizor GmbH, VMware vExpert, VCP,

Subject: Top-Paying IT Certificates for 2015 (And Our New Courses)

ITProTV Emails What You Missed Email #1 Subject: Top-Paying IT Certificates for 2015 (And Our New Courses) If you're like me you're already thinking about your 2015 goals. So I thought I'd share a few

The 21 WORD EMAIL That Can Get You More Clients. Ian Brodie

The 21 WORD EMAIL That Can Get You More Clients Ian Brodie Hey there! Welcome to this short report on the 21 Word Email That Can Get You More Clients If

THE TRUTH ABOUT SEARCH 2.0

THE TRUTH ABOUT SEARCH 2.0 SEO A WORLD OF PERPETUAL CHANGE Twelve months ago we launched the Truth About Search in a bid to simplify exactly what is going on in the world of search. Within the last year

Don't Forget About SMALL Data

Don't Forget About SMALL Data Lisa Eckler Lisa Eckler Consulting Inc. September 25, 2015 Outline Why does Small Data matter? Defining Small Data Where to look for it How to use it examples What else might

4.2 Sorting and Searching. Section 4.2

4.2 Sorting and Searching Sequential Search Scan through array, looking for key. Search hit: return array index. Search miss: return -1. public static int search(String key, String[] a) { int N = a.length;

Introduction to OS. Introduction MOS Mahmoud El-Gayyar. Mahmoud El-Gayyar / Introduction to OS 1

Introduction to OS Introduction MOS 1.1 1.3 Mahmoud El-Gayyar elgayyar@ci.suez.edu.eg Mahmoud El-Gayyar / Introduction to OS Why an Operating Systems course? Understanding of inner workings of systems