Big Data Pragmaticalities Experiences from Time Series Remote Sensing
1 Big Data Pragmaticalities: Experiences from Time Series Remote Sensing
Edward King, Remote Sensing & Software Team Leader
3 September 2013
Marine & Atmospheric Research
2 Overview
- Remote sensing (RS) and RS time series (type of processing & scale)
- Opportunities for parallelism
- Compute versus data
- Scientific programming versus software engineering
- Some handy techniques
- Where next
3 Automated data collection.
4 Presto! Big Data(sets).
5 More Detail
Processing levels:
- Composites
- Remapped
- L2 (derived quantity)
- L1B (calibrated)
- L0 (raw sensor)
Examples:
- 1 km imagery: 3000 scenes/year x 500 MB/scene x 10 years = 15 TB
- 500 m imagery: x 4 = 60 TB
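The archive sizes above are simple arithmetic, and a few lines of Python make it easy to redo for other sensors. This is only a sketch: the function name is made up for illustration, decimal units (1 TB = 1,000,000 MB) are assumed, and the factor of 4 is read as the pixel count quadrupling when the pixel size halves.

```python
# Archive size estimate using the figures quoted on the slide.
SCENES_PER_YEAR = 3000
MB_PER_SCENE = 500
YEARS = 10

def archive_size_tb(scenes_per_year, mb_per_scene, years):
    """Return the archive size in TB (decimal units: 1 TB = 1,000,000 MB)."""
    return scenes_per_year * mb_per_scene * years / 1_000_000

size_1km = archive_size_tb(SCENES_PER_YEAR, MB_PER_SCENE, YEARS)
# Halving the pixel size (1 km -> 500 m) doubles the sample count along
# each axis, so every scene holds 2 x 2 = 4 times as many pixels.
size_500m = size_1km * (1000 // 500) ** 2

print(f"1 km archive:  {size_1km:.0f} TB")
print(f"500 m archive: {size_500m:.0f} TB")
```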
6 Recap - Big Picture View
- These archives are large
- They are often only stored in raw format
- We usually need to do a significant amount of processing to extract the geophysical variable(s) of interest
- We often need to process the whole archive to achieve consistency in the data
- For scientists without a background in high performance computing and data intensive science, this is a daunting prospect
- There are things that can make it easier
7 Output types
[Figure: scenes delivered directly to the user; composites built by combining several scenes (+ + = best pixels) and delivered to the user, etc]
8 Things to notice
- Some operations are done over and over again on data from different times. For example, processing Monday's data and processing Tuesday's data are independent.
- This is an opportunity to do things in parallel (ie all at the same time).
- Operations on one place in the data are completely independent of operations in other places. For example, processing data from WA doesn't depend on data from Tas.
- This is another opportunity to do things in parallel (ie all at the same time).
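Independence between days can be exploited with very little code. The sketch below is illustrative only: `process_day` is a hypothetical stand-in for whatever processing a real workflow would run, and the dates are invented.

```python
from concurrent.futures import ThreadPoolExecutor

def process_day(day):
    """Hypothetical stand-in for one day's processing.

    In a real workflow this would launch an external program (for example
    via subprocess.run) on that day's files; here it only builds a label.
    """
    return f"processed {day}"

days = ["20130901", "20130902", "20130903", "20130904"]

# Each day is independent of the others, so the calls can run concurrently.
# Threads are adequate when the workers mostly wait on external programs;
# pure-Python number crunching would call for a process pool instead.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(process_day, days))

for line in results:
    print(line)
```

The same pattern applies to independent regions: replace the list of days with a list of tiles or areas.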
9 12th ARSPC - Fremantle
Note: This general pattern is often referred to as MapReduce, and there are software frameworks that formalise it (eg Hadoop); it lies behind Google search indexing. (Disclaimer: I've never used one.)
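The pattern can be shown without any framework, using Python's built-in `map` and `functools.reduce`. This is a toy sketch, not how Hadoop works internally: the scene values are invented, cloudy pixels are marked with `None`, and "best pixel" is assumed to mean the largest value at each position.

```python
from functools import reduce

# Three tiny "scenes" of the same 2 x 3 area; None marks a cloudy pixel.
scenes = [
    [[0.2, None, 0.5], [0.1, 0.3, None]],
    [[None, 0.4, 0.6], [0.2, None, 0.1]],
    [[0.3, 0.1, None], [None, 0.2, 0.4]],
]

def mask_clouds(scene):
    """Map step: per-scene work, independent of every other scene."""
    return [[-1.0 if v is None else v for v in row] for row in scene]

def best_pixels(a, b):
    """Reduce step: keep the larger value at each pixel position."""
    return [[max(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

mapped = map(mask_clouds, scenes)
composite = reduce(best_pixels, mapped)
print(composite)
```

The map step is where the parallelism lives, since each scene is handled on its own; the reduce step combines the results into one composite.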
10 So what?
Our previous example:
- 10 yrs x mins/scene = 5000 hrs = 30 weeks
- Give me 200 CPUs = 25 hours
But what about the data flux?
- 15 TB / 30 weeks = 3 GB/hour
- 15 TB / 25 hours = 600 GB/hour
~0.5GB
The problem is transformed from compute bound to I/O bound.
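The shift from compute bound to I/O bound can be checked with the slide's own figures. The sketch below assumes perfect scaling across CPUs and decimal units; the function names are illustrative.

```python
TOTAL_CPU_HOURS = 5000      # single-CPU processing time from the slide
ARCHIVE_GB = 15_000         # 15 TB, decimal units
HOURS_PER_WEEK = 24 * 7

def wall_hours(cpus):
    """Elapsed time assuming perfect scaling across CPUs."""
    return TOTAL_CPU_HOURS / cpus

def required_gb_per_hour(cpus):
    """Data rate needed to keep that many CPUs busy."""
    return ARCHIVE_GB / wall_hours(cpus)

single_cpu_weeks = wall_hours(1) / HOURS_PER_WEEK

for cpus in (1, 200):
    print(cpus, "CPUs:", wall_hours(cpus), "hours,",
          required_gb_per_hour(cpus), "GB/hour")
```

The required data rate grows in direct proportion to the number of CPUs, which is why adding computing alone eventually stops helping.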
11 Key tradeoff #1: Can you supply data fast enough to make the most of your computing?
How much effort you put into this depends on:
- How big your data set is
- How much computing you have available
- How many times you have to do it
- How soon you need your result
Figuring out how to balance data organisation and supply against time spent computing is key to getting the best results. Unless you have an extraordinarily computationally intensive algorithm, you're (usually) better off focussing on steps to speed up data supply.
12 Computing Clusters
- Workstation: 2 CPUs (15 weeks)
- NCI (now obsolete): CPUs (20 mins)
- My first (& last) cluster (2002): 20 CPUs (1.5 weeks)
13 Plumbing & Software
Somehow we have to connect data to operations:
- Operations = atmospheric correction, remap, calibrate, mycleveralgorithm
- These might be pre-existing packages or your own special code (Fortran, C, Python, Matlab, IDL)
- Connect = provide the right data to the right operation and collect the results
- Usually you will use a scripting language, since you need to work with the operating system, run programs, analyse file names, and maybe read log files to see if something went wrong
Software for us is like glassware in a chem lab: a specialised setup for our experiments; you can get components off the shelf, but only you know how you want to connect them together.
Bottom line: you're going to be doing some programming of some sort.
14 Scientific Programming versus Software Engineering (Key Tradeoff #2)
- Do you want to do this processing only once, or many times?
- Which parts of your workflow are repeated, and which are one-off? Eg base processing many times, followed by one-off analysis experiments
- How does the cost of your time spent programming compare with the availability of computing and the time spent running your workflow?
- Why spend a week making something twice as fast if it already runs in two days? (Maybe because you need to do it many times?)
- Will you need to understand it later?
15 Proprietary fly in the ointment (#1)
- If you use licensed software (IDL, Matlab etc.) you need a licence for each CPU you want to run on. This may mean you can't use anything like as much computing as you otherwise could.
- These languages are good for prototyping and testing.
- But to really make the most of modern computing, you need to escape the licensing encumbrance = migrate to free software.
- PS: Windows is licensed software.
Example: we have complex IDL code that we run on a big data set at the NCI. We have only 4 licences. It runs in a week (6 days). If we had 50 licences -> 12 hours. We can live with that, since there would be weeks and weeks of coding and testing to port to Python.
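The licence example is a straight scaling estimate, sketched here under the assumption that run time is inversely proportional to the number of licensed CPUs (the function name is illustrative).

```python
def scaled_hours(hours_now, licences_now, licences_then):
    """Run time after changing the licence count, assuming perfect scaling."""
    return hours_now * licences_now / licences_then

current_hours = 6 * 24                      # 6 days on 4 licences
with_50 = scaled_hours(current_hours, 4, 50)
print(f"{with_50:.1f} hours with 50 licences")
```

The result comes out at roughly half a day, in line with the 12 hours quoted.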
16 How to do it
17 Maximise performance by:
1. Minimising the amount of programming you do
- Exploit existing tools (eg standard processing packages, operating system commands)
- Write things you can re-use (data access, logging tools)
- Choose file names that make it easy to figure out what to do
- Use the file-system as your database
2. Maximising your ability to use multiple CPUs
- Eliminate unnecessary differences (eg data formats, standards)
- Look for opportunities to parallelise
- Avoid licensing (eg proprietary data formats, libraries, languages)
3. Seeking data movement efficiency everywhere
- Data layout
- Compression
- RAM disks
4. Minimising the number of times you have to run your workflow
- Log everything (so there is no uncertainty about whether you did what you think you did)
18 RAM disks
- Tapes are slow
- Disks are less slow
- Memory is even less slow
- Cache is fast but small
Most modern systems have multiple GB of RAM for each CPU, which you can split between working memory and a virtual disk.
[Figure: TAPE, DISK, RAM, CPU cache]
If you have multiple processing steps that need intermediate file storage, use a RAM disk. This can give a factor of 10 improvement.
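One way to use a RAM disk from a script is to point temporary storage at a memory-backed directory when one is available. The sketch below assumes `/dev/shm`, which is a memory-backed file system on many Linux machines but is not guaranteed to exist; the helper name `pick_scratch_root` is made up for illustration, and the code falls back to the normal temporary directory.

```python
import os
import tempfile

def pick_scratch_root(candidates=("/dev/shm",)):
    """Return a RAM-backed directory if one is usable, else the default temp dir.

    "/dev/shm" is an assumption: it is a memory-backed tmpfs on many Linux
    systems, but the right path depends on how your machine is set up.
    """
    for path in candidates:
        if os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK):
            return path
    return tempfile.gettempdir()

scratch_root = pick_scratch_root()

# Intermediate files live only for the duration of the block, then vanish.
with tempfile.TemporaryDirectory(dir=scratch_root) as workdir:
    intermediate = os.path.join(workdir, "step1.tmp")
    with open(intermediate, "wb") as f:
        f.write(b"intermediate result")
    with open(intermediate, "rb") as f:
        data = f.read()

print(scratch_root, len(data))
```

Files on a RAM disk consume memory, so this suits intermediate products that are smaller than the RAM you can spare.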
19 Compression
- Data that is half the size takes half as long to move (you then have to uncompress it, but CPUs are faster than disks)
- Zip and gzip will usually get you a factor of 2-4 compression
- Bzip2 is often 10-15% better, BUT it is much slower (factor of 5)
- Don't store random precision (3.14 compresses more than the same value written with many more digits)
- Avoid recompressing: treat the compressed archive as read-only, ie copy-uncompress-use-delete; DO NOT move-uncompress-use-recompress-move back
[Figure: remote disk, File.gz, File, RAM, CPU (decompression)]
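The effect of precision on compression is easy to check with the standard library. The sketch below uses made-up random values written as text; the actual sizes depend on the data, so treat the printed numbers as indicative only.

```python
import bz2
import gzip
import random

random.seed(0)  # fixed seed so the comparison is repeatable
values = [random.uniform(0.0, 10.0) for _ in range(5000)]

# The same numbers written with full and with two-decimal precision.
full_text = "\n".join(repr(v) for v in values).encode("ascii")
short_text = "\n".join(f"{v:.2f}" for v in values).encode("ascii")

full_gz = gzip.compress(full_text)
short_gz = gzip.compress(short_text)
full_bz2 = bz2.compress(full_text)

print("full precision:", len(full_text), "->", len(full_gz), "bytes (gzip)")
print("two decimals:  ", len(short_text), "->", len(short_gz), "bytes (gzip)")
print("full precision:", len(full_text), "->", len(full_bz2), "bytes (bzip2)")

# Use the compressed copy read-only: decompress, use, discard.
restored = gzip.decompress(full_gz)
```

Decompressing into memory and discarding the result afterwards follows the copy-uncompress-use-delete pattern: the compressed original is never rewritten.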
20 Data Layout
Look at your data access patterns and organise your code/data to match.
- Eg 1. If your analysis uses multiple files repeatedly, reorganise the data so you reduce the number of open & close operations.
- Eg 2. Big files tend to end up as contiguous blocks on a disk, so try to localise access to data rather than jumping around, which entails waiting for the disk.
[Figure: access by row versus access by column]
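The cost of reading against the grain of the layout can be seen by counting contiguous reads. This sketch assumes a row-major file (each row stored end to end) and a hypothetical 4 x 6 grid of 4-byte values; the function names are illustrative.

```python
def row_offsets(row, n_cols, item_size):
    """Byte offsets of every element in one row of a row-major file."""
    start = row * n_cols * item_size
    return [start + c * item_size for c in range(n_cols)]

def column_offsets(col, n_rows, n_cols, item_size):
    """Byte offsets of every element in one column of a row-major file."""
    return [(r * n_cols + col) * item_size for r in range(n_rows)]

def contiguous_runs(offsets, item_size):
    """Count separate contiguous reads needed to fetch the given offsets."""
    runs = 1
    for prev, cur in zip(offsets, offsets[1:]):
        if cur != prev + item_size:
            runs += 1
    return runs

N_ROWS, N_COLS, ITEM = 4, 6, 4   # hypothetical 4 x 6 grid of 4-byte values
row_reads = contiguous_runs(row_offsets(1, N_COLS, ITEM), ITEM)
col_reads = contiguous_runs(column_offsets(1, N_ROWS, N_COLS, ITEM), ITEM)
print("reads for one row:   ", row_reads)
print("reads for one column:", col_reads)
```

A row comes back in a single read, while a column needs one read per row of the grid. If the analysis mostly works along columns (for example a time series per pixel), storing the data transposed reverses the situation.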
21 Data Formats (and metadata)
This is still a religious subject. Factors to consider:
- Avoid proprietary formats (may need licences or libraries for undocumented formats) in favour of open formats that are publicly documented
- Self-contained: keep header (metadata) and data together
- Self-documenting: formats have structure that can be decoded using only information already in the file
- Architectural independence: will work on different computers
- Storage efficiency: binary versus ASCII
- Access efficiency and flexibility: support for different layouts
- Interoperability: openness and standard conformance = reuse
- Need some conventions around metadata for consistency
- Automated metadata harvest (for indexing/cataloguing)
- Longevity (& migration)
Answer: use netCDF or HDF (or maybe FITS in astronomy)
22 The file-system is my database
Often in your multi-step processing of 1000s of files you will want to use a database to keep track of things. DON'T!
- Every time you do something, you have to update the DB
- It doesn't usually take long before inconsistencies arise (eg someone deletes a file by hand)
- Databases are a pain to work with by hand (SQL syntax, forgettable rules)
Use the file-system (folders, filenames) to keep track. Egs:
- Once file.nc has been processed, rename it to file.nc.done and just have your processing look for files matching *.nc (rename it back to file.nc to run it again, use ls or dir to see where things are up to, and rm to get rid of things that didn't work)
- Create zero size files as breadcrumbs: touch file.nc.FAIL.step2, then ls *.FAIL.* to see how many failures there were and at what step
- Use directories to group data that need to be grouped, for example all files for a particular composite
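The rename and breadcrumb conventions translate directly into a few helper functions. This is a minimal sketch using `pathlib` in a throwaway directory; the helper names are made up, and a real workflow would run its processing between the calls.

```python
import tempfile
from pathlib import Path

def pending(workdir):
    """Files still to be processed: anything whose name ends in .nc."""
    return sorted(workdir.glob("*.nc"))

def mark_done(path):
    """Rename file.nc to file.nc.done so it is skipped next time."""
    return path.rename(path.with_name(path.name + ".done"))

def mark_failed(path, step):
    """Leave a zero-size breadcrumb such as file.nc.FAIL.step2."""
    crumb = path.with_name(f"{path.name}.FAIL.{step}")
    crumb.touch()
    return crumb

with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)
    for name in ("a.nc", "b.nc", "c.nc"):
        (workdir / name).touch()

    mark_done(workdir / "a.nc")
    mark_failed(workdir / "b.nc", "step2")

    todo = [p.name for p in pending(workdir)]
    failures = sorted(p.name for p in workdir.glob("*.FAIL.*"))
    print("to do:", todo)
    print("failed:", failures)
```

Because the state lives in the file names, a plain directory listing always shows where the processing is up to, and deleting a file by hand cannot leave a stale record behind.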
23 Filenames are really important
Filenames are a good place to store metadata relevant to the processing workflow:
- They're easy to access without opening the file
- You can use file system tools to select data
Use YYYYMMDD (or YYYYddd) for dates in filenames, then they will automatically sort into time order (cf DDMMYY, DDmonYYYY).
Make it easy to get metadata out of file names:
- Fixed width numerical fields (F1A.dat, F10B.dat, F100C.dat is harder to interpret by program than F001A.dat, F010B.dat, F100C.dat)
- Structured names, but don't go overboard! D G-1455.P-aqua.C T-d000000n S-n.pds
- Eg. ls *.G-1[234]* to choose files at a particular time of day
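Both naming points can be demonstrated in a few lines. The file names and the pattern `F<3 digits><letter>.dat` below are hypothetical examples, not a scheme taken from the talk.

```python
import re

# YYYYMMDD names sort into time order as plain strings; DDMMYYYY names do not.
iso_names = ["sst_20130105.nc", "sst_20121231.nc", "sst_20130101.nc"]
dmy_names = ["sst_05012013.nc", "sst_31122012.nc", "sst_01012013.nc"]
iso_sorted = sorted(iso_names)
dmy_sorted = sorted(dmy_names)
print(iso_sorted)
print(dmy_sorted)

# Hypothetical naming scheme: F<3-digit number><letter>.dat
PATTERN = re.compile(r"^F(\d{3})([A-Z])\.dat$")

def parse_name(name):
    """Return (number, letter) for a fixed-width name, or None if it doesn't fit."""
    m = PATTERN.match(name)
    return (int(m.group(1)), m.group(2)) if m else None

print(parse_name("F010B.dat"))
print(parse_name("F10B.dat"))
```

With fixed-width fields a single simple pattern covers every file; variable-width names need extra cases, and a plain sort no longer puts them in numerical order.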
24 Logging and Provenance
Every time you do something (move data, feed it to a program, put it somewhere), write a time-stamped message to a log file.
- Write a function that automatically prepends a timestamp to a piece of text you give to it
- Time-stamps are really useful for profiling (identifying where the bottlenecks are) or figuring out if something has gone wrong
- Huge log files are a tiny marginal overhead
- Make them easy to read by program (eg grep)
- Make your processing code report a version (number, or description), and its inputs, to the log file
Write the log file into the output data file as a final step.
- This lets you understand what you did months later (so you don't do it again)
- It keeps the relevant log file with the data (so you don't lose it, or mix it up)
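A timestamp-prepending function of the kind described can be very small. The sketch below is one possible shape, not the author's code: the ISO 8601 UTC format, the function names, and the version string are all assumptions chosen for illustration.

```python
import io
from datetime import datetime, timezone

def stamp(message, now=None):
    """Return the message with an ISO 8601 UTC timestamp prepended."""
    now = now or datetime.now(timezone.utc)
    return f"{now.strftime('%Y-%m-%dT%H:%M:%SZ')} {message}"

def log(logfile, message):
    """Append one time-stamped line to an open log file."""
    logfile.write(stamp(message) + "\n")

VERSION = "remap 1.2 (hypothetical version string)"
fixed = datetime(2013, 9, 3, 12, 0, 0, tzinfo=timezone.utc)
first_line = stamp(f"START {VERSION} input=file.nc", now=fixed)
print(first_line)

# Any object with a write() method works; a real run would open a file.
buffer = io.StringIO()
log(buffer, "copied file.nc")
line = buffer.getvalue()
print(line, end="")
```

Putting the timestamp first in a fixed-width, sortable format keeps the log easy to search with grep and easy to subtract for profiling.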
25 Final Thoughts
- Most of this is applicable to other data intensive parallel processing tasks, eg spatio-temporal model output grids
- Advantages may vary depending on file size
- Data organisation has many subtleties: a little work in understanding can offer great returns in performance
- Keep an eye on file format capabilities
- More CPUs is a double edged sword
- Data efficiency will only become more important
- Haven't really touched on spatial metadata (v. important for ease of end-use/analysis, but tedious (= automatable))
- Get your data into a self-documenting, machine-readable, open file format and you'll never have to reformat by hand again
These are things we now do out of habit because they work for us. Perhaps they'll work for you?
26 Thank you
Marine & Atmospheric Research
Edward King, Team Leader: Remote Sensing & Software
e edward.king@csiro.au
Unix/Linux Operating System Introduction to Computational Statistics STAT 598G, Fall 2011 Sergey Kirshner Department of Statistics, Purdue University September 7, 2011 Sergey Kirshner (Purdue University)
More information[537] Fast File System. Tyler Harter
[537] Fast File System Tyler Harter File-System Case Studies Local - FFS: Fast File System - LFS: Log-Structured File System Network - NFS: Network File System - AFS: Andrew File System File-System Case
More informationCS 318 Principles of Operating Systems
CS 318 Principles of Operating Systems Fall 2017 Lecture 16: File Systems Examples Ryan Huang File Systems Examples BSD Fast File System (FFS) - What were the problems with the original Unix FS? - How
More informationGoogle Drive: Access and organize your files
Google Drive: Access and organize your files Use Google Drive to store and access your files, folders, and Google Docs anywhere. Change a file on the web, your computer, or your mobile device, and it updates
More informationWelcome Back! Without further delay, let s get started! First Things First. If you haven t done it already, download Turbo Lister from ebay.
Welcome Back! Now that we ve covered the basics on how to use templates and how to customise them, it s time to learn some more advanced techniques that will help you create outstanding ebay listings!
More informationCS 162 Operating Systems and Systems Programming Professor: Anthony D. Joseph Spring Lecture 18: Naming, Directories, and File Caching
CS 162 Operating Systems and Systems Programming Professor: Anthony D. Joseph Spring 2004 Lecture 18: Naming, Directories, and File Caching 18.0 Main Points How do users name files? What is a name? Lookup:
More informationBackground. Let s see what we prescribed.
Background Patient B s custom application had slowed down as their data grew. They d tried several different relief efforts over time, but performance issues kept popping up especially deadlocks. They
More information8/16/12. Computer Organization. Architecture. Computer Organization. Computer Basics
Computer Organization Computer Basics TOPICS Computer Organization Data Representation Program Execution Computer Languages 1 2 Architecture Computer Organization n central-processing unit n performs the
More informationIf Statements, For Loops, Functions
Fundamentals of Programming If Statements, For Loops, Functions Table of Contents Hello World Types of Variables Integers and Floats String Boolean Relational Operators Lists Conditionals If and Else Statements
More informationCS 162 Operating Systems and Systems Programming Professor: Anthony D. Joseph Spring Lecture 18: Naming, Directories, and File Caching
CS 162 Operating Systems and Systems Programming Professor: Anthony D. Joseph Spring 2002 Lecture 18: Naming, Directories, and File Caching 18.0 Main Points How do users name files? What is a name? Lookup:
More informationCS510 Operating System Foundations. Jonathan Walpole
CS510 Operating System Foundations Jonathan Walpole File System Performance File System Performance Memory mapped files - Avoid system call overhead Buffer cache - Avoid disk I/O overhead Careful data
More informationCS5460: Operating Systems Lecture 20: File System Reliability
CS5460: Operating Systems Lecture 20: File System Reliability File System Optimizations Modern Historic Technique Disk buffer cache Aggregated disk I/O Prefetching Disk head scheduling Disk interleaving
More informationWeek - 01 Lecture - 04 Downloading and installing Python
Programming, Data Structures and Algorithms in Python Prof. Madhavan Mukund Department of Computer Science and Engineering Indian Institute of Technology, Madras Week - 01 Lecture - 04 Downloading and
More informationFeature allows you to view, create, change, and delete IMS databases and application views (PSBs)
IMS Administration Tool ISPF Demo Script Key Feature: Database and Application Administration Feature allows you to view, create, change, and delete IMS databases and application views (PSBs) DBS/PSB source
More informationAutomating Digital Downloads
Automating Digital Downloads (Copyright 2018 Reed Hoffmann, not to be shared without permission) One of the best things you can do to simplify your imaging workflow is to automate the download process.
More informationStudent Success Guide
Student Success Guide Contents Like a web page, links in this document can be clicked and they will take you to where you want to go. Using a Mouse 6 The Left Button 6 The Right Button 7 The Scroll Wheel
More informationSearch Lesson Outline
1. Searching Lesson Outline 2. How to Find a Value in an Array? 3. Linear Search 4. Linear Search Code 5. Linear Search Example #1 6. Linear Search Example #2 7. Linear Search Example #3 8. Linear Search
More information10 Simple User Experience Best Practices
10 Simple User Experience Best Practices That Make Your Website Design 100% More Powerful Presented by We all know that feeling of frustration. As a user, it starts off as mild annoyance, but eventually
More informationSplunk is a great tool for exploring your log data. It s very powerful, but
Sysadmin David Lang David Lang is a site reliability engineer at Google. He spent more than a decade at Intuit working in the Security Department for the Banking Division. He was introduced to Linux in
More informationSciSpark 201. Searching for MCCs
SciSpark 201 Searching for MCCs Agenda for 201: Access your SciSpark & Notebook VM (personal sandbox) Quick recap. of SciSpark Project What is Spark? SciSpark Extensions scitensor: N-dimensional arrays
More informationProMAX Cache-A Overview
ProMAX Cache-A Overview Cache-A Archive Appliances With the proliferation of tapeless workflows, came an amazing amount of data that not only needs to be backed up, but also moved and managed. This data
More information