Newly invented and fully owned by Turbo Data Laboratories, Inc. (TDL)

1 Newly invented and fully owned by Turbo Data Laboratories, Inc. (TDL), 28 July 2017

2 Executive Summary

Universal & Designless, yet Far Faster than Legacy Technologies
Big Data technology has to cover many kinds of operations (interactive + batch) + IoT + AI. A technology that is universal and designless, and yet the fastest, is awaited.

Innovation Continued, Based on Mathematical Principles
Such a technology should start from mathematical principles, laid on a more fundamental level than the starting line of current technologies.
Turbo Data Laboratories, Inc. (TDL) is a company that has been developing data processing technologies that are orders of magnitude faster. TDL has been researching its technology since 1996, has raised its level every 2 or 3 years, and has now reached a critical point.
2001: ZAP-In: Big Data's Spread Sheet, x500 ~ x700 faster, at Fujitsu's benchmark.
2013: ZAP-Over: Searching / Gathering of Globally Distributed Big Data, x1,000 in total performance, at the National Tax Agency.
ZAP-Mass: PB class Super Big Data DB System, x400,000 faster at sorting.

3 Section 1. History of Turbo Data Laboratories: Linear Filtering Method (LFM) Theory

4 1. History of LFM Theory

LFM Theory: a revolutionary DB theory based on Algorithm Index.

1. Zap-In Technology: Big Data's Spread Sheet. 10 KByte - 1 TByte. Interactive. Access to local big data.
2. Zap-Over Technology: Globally Distributed R/O DB. 10 KByte - 10 TByte. Distributed. Refers to Earth-wide big data.
3. Zap-Mass Technology: Massive Parallel DB. 1 GByte - 100 PByte. Massive & Interactive. Access to Google-class big data.

5 2. Chart of Technologies and Products

The technology layers are as follows (ZAP-In / ZAP-Over use the first two):
1. Math Quark Theory: defines the substructures of a table and provides a universal foundation for Math Index.
2. Math Index Theory: provides universal multi-functional indexes for every field and/or ordered set.
3. Math Switch Theory: provides runtime partitioning to allot CPU / memory / communication.
4. PETA DB OS: provides preemptive multi-tasking and resource control.
5. PETA Sheet: provides a browsing / accessing / analyzing / programming platform to users.

Product series by layer:

Layer                                    | ZAP-In (2001-)                      | Zap-Over (2013-)                    | Zap-Mass (2017-)
Application                              | -                                   | -                                   | PETA Sheet
OS                                       | -                                   | -                                   | PETA DB OS
LFM Technology: Architecture             | -                                   | -                                   | Math Switch Theory
LFM Technology: Algorithms (LFM Index)   | Math Index Theory (for 1/3 model)   | Math Index Theory (for 4/6 model)   | Math Index Theory (for 3/5 model)
LFM Technology: Data Structure           | Math Quark Theory (for 1/3 model)   | Math Quark Theory (for 4/6 model)   | Math Quark Theory (for 3/5 model)

6 Section 2. Why Does the Linear Filtering Method (LFM) Always Work Well?

7 3. Every Data's Substructure: Math Quark

Math Quark Enables Math Index

Math Quark: A collection of arrays (= a table) has 3 basic substructures, the Math Quarks.
1st. Ordered Set: controls the selection status of, and the access order to, each member.
2nd. Value Number: abstracts real data into integers (fig. 1).
3rd. Value List: controls the existing values.
One combination of Math Quarks is equivalent to the original arrays (= table). Another combination of Math Quarks becomes an index for sorting / tabulation, and yet another becomes another Math Index.

Merits of Math Quarks:
1st. Because Math Quarks exist in every combination of fields and ordered sets, an algorithm (Math Index) is always available. Thus Math Quarks enable any cascading of algorithms (see fig. 2).
2nd. We can reuse existing Math Quarks to build transformation results. For example, in sorting / searching, we can reuse the Value Number and the Value List. That reduces the CPU steps in sorting from O(n*log(n)) to O(n).

Fig 1. A simple example of Math Quarks: original data with fields G. and Age, decomposed into an ordered set (OrdSet) plus a Value Number (VNo) and a Value List (VL) for each field.
Fig 2. Cascading of processes
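The decomposition described on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions, not TDL's implementation: it reads the Value List as the sorted distinct values of a field, the Value Number as each record's position in that list, and the Ordered Set as a list of record numbers. The function names and the sample data are hypothetical. With those pieces already in place, sorting an ordered set becomes a counting sort over value numbers, which takes O(n + k) steps for n records and k distinct values.

```python
def decompose(column):
    """Split one field into (value_numbers, value_list).

    value_list    : sorted distinct values of the field
    value_numbers : for each record, the position of its value in value_list
    """
    value_list = sorted(set(column))
    position = {value: i for i, value in enumerate(value_list)}
    value_numbers = [position[value] for value in column]
    return value_numbers, value_list


def sort_ordered_set(ordered_set, value_numbers, cardinality):
    """Stable counting sort of record numbers by value number: O(n + k)."""
    counts = [0] * (cardinality + 1)
    for rec in ordered_set:
        counts[value_numbers[rec] + 1] += 1
    for k in range(cardinality):          # prefix sums give start offsets
        counts[k + 1] += counts[k]
    result = [0] * len(ordered_set)
    for rec in ordered_set:
        vno = value_numbers[rec]
        result[counts[vno]] = rec
        counts[vno] += 1
    return result


# Hypothetical sample table with the two fields named in fig. 1.
gender = ["F", "M", "F", "M", "F"]
age = [30, 9, 25, 9, 30]

age_vno, age_vl = decompose(age)
everyone = list(range(len(age)))          # ordered set selecting all records

# The quarks reproduce the original column.
rebuilt = [age_vl[age_vno[rec]] for rec in everyone]

# Sorting reuses the existing value numbers; no value comparison is needed.
by_age = sort_ordered_set(everyone, age_vno, len(age_vl))

# Cascading: the same routine works on any subset (here: records 0, 2, 4).
subset_by_age = sort_ordered_set([0, 2, 4], age_vno, len(age_vl))
```

Note that building the value list in `decompose` still costs a sort over the distinct values; the linear-time step is the later reuse of value numbers that already exist.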

8 4. Ever-Existing Index: Math Index

Math Index Enables Math Switch

Data Index: A conventional index is data stored outside the data it indexes. The index and the indexed data are independent of each other. Because such an index is always defined by data, I named it a Data Index. It is strictly bound to specific data, so it can be used for that specific data only.
Math Index: A Math Index is defined by algorithms; it is an Algorithm Index. A Math Index is available at any time and in any case, because Math Quarks always exist.

Merits of Math Index:
1st. It doesn't use memory / storage. It can be transferred without communication cost.
2nd. It can go with every field and every ordered set (subset). It can always be cascaded.
3rd. It is rich in functions: OLTP / Join / Sorting / Search / Tabulation / Set Operations / etc. Every DB operation is possible.
4th. No need to update.
5th. It can utilize multi-core CPUs and can run on massively parallel systems.
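One way to picture an index that is "defined by algorithms" is a routine that answers a query directly from the Value List and Value Numbers, without a separately stored per-record index. The sketch below is a hypothetical reading of that idea, not the actual Math Index: the function names, the dictionary-encoded layout, and the sample data are all assumptions made for illustration. Each operation takes an ordered set and returns a result that the next operation can consume, which is the cascading the slide refers to.

```python
from bisect import bisect_left, bisect_right


def search_range(ordered_set, value_numbers, value_list, low, high):
    """Return the records of ordered_set whose value lies in [low, high].

    The value range is translated into a value-number range with two binary
    searches on the sorted value list; records are then filtered on integers.
    """
    lo = bisect_left(value_list, low)
    hi = bisect_right(value_list, high)
    return [rec for rec in ordered_set if lo <= value_numbers[rec] < hi]


def tabulate(ordered_set, value_numbers, value_list):
    """Count occurrences of each value among the records of ordered_set."""
    counts = [0] * len(value_list)
    for rec in ordered_set:
        counts[value_numbers[rec]] += 1
    return dict(zip(value_list, counts))


# Hypothetical sample data, already decomposed into value lists / numbers.
gender_vl, gender_vno = ["F", "M"], [0, 1, 0, 1, 0]
age_vl, age_vno = [9, 25, 30], [2, 0, 1, 0, 2]
everyone = [0, 1, 2, 3, 4]

# Cascade: search on one field, then search and tabulate on another field,
# always passing the resulting ordered set along.
women = search_range(everyone, gender_vno, gender_vl, "F", "F")
women_over_25 = search_range(women, age_vno, age_vl, 26, 40)
ages_of_women = tabulate(women, age_vno, age_vl)
```

Nothing here needs updating when an ordered set changes, because the only persistent structures are the value list and value numbers of each field.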

9 Section 3. Existing Product Series 1: ZAP-In Technology

10 1. Zap-In for Big Data's Spread Sheet

Interactive Big Data
Suitable data size: 10 KByte - 1 TByte
Very fast and quick in response: x25 faster than Spark; x500 ~ x700 at Fujitsu's benchmark, 2001 (next page)
Very rich in functionality: enables a Big Data spread sheet with RDB functions
Is a one-stop platform to access Big Data

11 1. Zap-In Technology (continued)

Spread Sheet for Big-data
Interactive operation like Excel, for Big-data (up to 1TB)
Quick operation even for Big-data
Quick system integration by Automatic Programming

                      | Zap-In Spread Sheet                                 | Excel     | Relational Database
Big-data              | OK (Up to 10TB)                                     | NG        | OK
Interactive Operation | OK (Easy)                                           | OK (Easy) | NO
Operation Speed       | Very Fast                                           | Slow      | Fast
Macro Recording       | OK (creates Python code)                            | OK        | NO
DB Operation          | OK (tabulation, sorting, search, join, union, ...)  | NG        | OK

12 Benchmark at Fujitsu, Zap-In (continued)

13 Track Record

Zap-In has been the main product series. The ZAP-In Engine, built with LFM technology, has turned many impossible tasks into possible ones.
It has been used by 2 kinds of users.
1st. Those who need absolutely fast Big Data batches.
2nd. Those who need interactive Big Data operations, such as cleansing / transformation / analytics / etc.
In 2001, Fujitsu benchmarked it and found that it runs x500 ~ x700 faster in BOM development and MRP. ZAP-In has therefore been used for Fujitsu's central procurement system. Fujitsu announced that it reduces $2.8B/y from a total of $30B/y.
Patents of ZAP-In have been licensed to SAP, NEC, Fujitsu BSC, and others. And other users about

14 Section 4. Existing Product Series 2: ZAP-Over Technology

15 2. Zap-Over Technology

Globally Distributed DB
DB operations over the Internet, including Union / Join / etc.
Interactive operation with quick response
Read only (mainly)
Suitable size of each table: 10 KByte - 10 TByte
Applications: Open Data Service, Distributed IoT, DB for Distributed Organizations

Figure (right): the case of TWO remote Big-data sources. Big-data A and Big-data B are each served by a Zap-Over Service and accessed by several Zap-Over Clients.

16 2. Zap-Over Technology (continued)

Big-data unification / search at the distributed branches of an enterprise
Super high speed unification, search & browsing

Before Zap-Over:
Big Data is carried by airlines
The merge operation takes a long time
Big-data operations are done at the center

After Zap-Over (x100, x10):
Big-data operations run over the Internet
The merge operation takes only 100 ms
Big-data operations can be done at any place

17 Track Record: Zap-Over

With ZAP-Over technology (2013-), one-stop searching / browsing over many Big Data sets at many locations becomes possible.
By looking up the deal logs of over 100 countries, tracing money laundering becomes possible. But each trace took 15 ~ 20 minutes, and the simultaneous user count was at most 2.
With ZAP-Over technology, the time for one trace was reduced to about 10 sec. (x100), and the simultaneous user count rose to 20 (x10). Multiplied together, x100 and x10 give x1,000 in total performance.
That system has been running at the National Tax Agency since 2013, to detect international money laundering.

18 Section 5. Future Product Series: ZAP-Mass Technology

19 1. ZAP-Mass Introduction

Cloud Computing: the main field where Amazon / Google / Microsoft / etc. are competing.
The next winner will be whoever succeeds in providing a PB class DB platform on the cloud to users, by conquering the following problems:
1. Too slow.
2. Too few functions.

Innovation Continued, Based on Mathematical Principles
Turbo Data Laboratories, Inc. (TDL) is a company that has been developing data processing technologies that are orders of magnitude faster. TDL has been researching its technology since 1996, has raised its level every 2 or 3 years, and has now reached a critical point.
2001: ZAP-In: Big Data's Spread Sheet, x500 ~ x700 faster, at Fujitsu's benchmark.
2013: ZAP-Over: Searching / Gathering of Globally Distributed Big Data, x1,000 in total performance, at the National Tax Agency.
ZAP-Mass: PB class Super Big Data DB System, x400,000 faster at sorting.

ZAP-Mass:
is a massively parallel Big Data DB system (Algorithm + Architecture + DB-OS + Application), with a dedicated communication chip in each server node.
can do PB class DB processing.
can enable versatile Big Data operations by end users themselves.

20 2. ZAP-Mass: Performance Simulation at PB Table

Example DB
DB total: 1PB, 100 fields, 2KB / record, 500,000,000,000 records

Example System & Architecture
System total: 32,768 servers, 2PB memory
Each server: 64GB memory, communication speed 50Gbps, storage 500MB/s, 128 Chip-Modules
Each Chip-Module: 1GB memory, 50 Gbps input, 50 Gbps output

Operation                                                                     | Hadoop, etc. (Estimation) | ZAP-Mass (Estimation) | Magnification
1. Sort by int. field, 100,000 cardinality                                    | 1,200,000 sec             | 3 sec                 | x 400,000
2. Extraction by search, 10% Hit                                              | 8 sec                     | sec                   | x 4,
3. Extraction by search, 50% Hit                                              | 40 sec                    | sec                   | x 20,
4. Tabulation, occurrence in 100,000 cardinality string field                 | 120 sec                   | 0.06 sec              | x 2,000
5. N:N sort Join, by 1 string key, that key has 100,000 cardinality (Zap-Mass only) | -                   | 4 sec                 | -
6. Distinct, by 2 string keys, each key has 100,000 cardinality (Zap-Mass only)     | -                   | sec                   | -
7. Insert or Delete 1,000 records (Zap-Mass only)                             | -                         | sec                   | -

Using current technologies causes severe limitations:
Sorting is too slow, and functions that use it are almost impossible.
Editing is almost impossible.
Impossible to use for common users.
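The figures on this slide that survive in the transcript can be cross-checked with simple arithmetic. The short Python sketch below only recomputes numbers stated on the slide; the variable names are illustrative, and it assumes decimal units for the table size (1 PB = 10^15 bytes, 2 KB = 2,000 bytes) and binary units for memory (1 PB = 1024^2 GB), since that is the reading under which the slide's totals come out exactly.

```python
# Example DB: 1 PB table with 2 KB records (decimal units assumed).
TOTAL_BYTES = 10**15
RECORD_BYTES = 2_000
records = TOTAL_BYTES // RECORD_BYTES

# Example system: 32,768 servers with 64 GB of memory each
# (binary units assumed: 1 PB = 1024 * 1024 GB).
SERVERS = 32_768
MEMORY_PER_SERVER_GB = 64
total_memory_gb = SERVERS * MEMORY_PER_SERVER_GB
total_memory_pb = total_memory_gb / 1024**2


def magnification(baseline_sec, zap_mass_sec):
    """Ratio of the estimated baseline time to the estimated ZAP-Mass time."""
    return baseline_sec / zap_mass_sec


sort_gain = magnification(1_200_000, 3)      # row 1: sort by int. field
tabulation_gain = magnification(120, 0.06)   # row 4: tabulation

print(f"records:          {records:,}")
print(f"total memory:     {total_memory_pb} PB")
print(f"sort gain:        x{sort_gain:,.0f}")
print(f"tabulation gain:  x{tabulation_gain:,.0f}")
```

Under these assumptions the record count, the 2PB memory total, and the x400,000 and x2,000 magnifications all agree with the slide.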

21 3. Math Switch Enables Dynamic Partitioning

Math Switch: Math Switch Theory is built on top of Math Index, which makes communication between nodes easy to handle in massively parallel ways.
Math Switch offers a multiple ring architecture, as shown in fig 5-1 (next page). That architecture has 2 directional symmetries.
1st. Ring-wise. The ring-wise direction assigns the pipeline length.
2nd. Inter ring-wise. The inter ring-wise direction assigns the degree of parallelism.
Math Switch offers dynamic partitioning in these 2 directions (fig 5-2, fig 5-3, next page).
Math Switch can assign a task's pipeline length and degree of parallelism by changing partition sizes.
Math Switch can also control the amount of resources for each task by changing partition sizes.
Math Switch also offers preemptive task switching (see fig 5-4, next page), which has not been easy for supercomputers.
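The two partitioning directions can be illustrated with a toy model. The sketch below is an assumption-laden picture, not the Math Switch design: it treats the system as a grid of rings and node positions, names nodes `n<ring><position>` as in fig 5-1, and represents a partition as a rectangular block of that grid. The `Partition` class and both split functions are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Partition:
    """A rectangular block of the ring grid assigned to one task."""
    rings: Tuple[int, ...]      # rings taking part -> degree of parallelism
    positions: Tuple[int, ...]  # node positions on each ring -> pipeline length

    @property
    def parallelism(self) -> int:
        return len(self.rings)

    @property
    def pipeline_length(self) -> int:
        return len(self.positions)

    def node_names(self) -> List[str]:
        return [f"n{r}{p}" for r in self.rings for p in self.positions]


def split_ring_wise(part: Partition, at: int) -> Tuple[Partition, Partition]:
    """Cut every ring at the same position: shorter pipelines, same parallelism."""
    return (Partition(part.rings, part.positions[:at]),
            Partition(part.rings, part.positions[at:]))


def split_inter_ring_wise(part: Partition, at: int) -> Tuple[Partition, Partition]:
    """Separate whole rings: same pipeline length, lower parallelism each."""
    return (Partition(part.rings[:at], part.positions),
            Partition(part.rings[at:], part.positions))


# 3 rings x 4 nodes, as drawn in fig 5-1 (n00..n03, n10..n13, n20..n23).
whole = Partition(rings=(0, 1, 2), positions=(0, 1, 2, 3))

front, back = split_ring_wise(whole, 2)
first_ring, other_rings = split_inter_ring_wise(whole, 1)
```

In this model, changing where the cuts fall changes how many nodes each task receives, which mirrors the slide's statement that partition sizes control pipeline length, degree of parallelism, and resource amounts.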

22 3. Math Switch Enables Dynamic Partitioning

fig 5-1. Multiple Ring Architecture (ring 0: n00 - n03; ring 1: n10 - n13; ring 2: n20 - n23)
fig 5-2. Division of Ring
fig 5-3. Horizontal Division
fig 5-4. Preemptive Task Switching Enabled by Switching Packets to Pass
Labels in the figures: (ring-wise), (inter ring-wise). Legend: data passed to next; data not passed to next.

23 4. PETA DB OS

PB class, Preemptive Multi Task, DB OS Over Massive Parallel Architecture

It controls the runtime partitioning ability of Math Switch.
With that partitioning, the system becomes much more scalable, easily and meaningfully.
It can keep and manage a large number of, and many kinds of, Big Data sets, which is not easy for other Big Data systems.
It can run many tasks in many partitions.
It can switch tasks preemptively in each partition.

24 5. PETA Sheet

A Big Data spread sheet with RDB functions, featuring the following functions:
A. Browsing Big Data
B. Accessing (cleansing / transforming / editing / etc.) Big Data
C. Analyzing (statistics / data mining / BI / etc.) Big Data
D. Programming Big Data
E. Control panel of PETA DB OS

25 6. Summary of Zap-Mass Technology

Enables a DB system on a massively parallel computer system, employing dedicated chips (to be designed).
Composed of: Math Quark, Math Index, Math Switch, PETA DB OS, PETA Sheet
Suitable data size: 1 TByte - 100 PByte
Suitable system size: 16 servers - 1,000,000 servers or more
Expected performance: about x10,000 faster than Hadoop, with the same number of servers

26 7. Zap-Mass Enables

ZAP-Mass enables versatile Big Data operations by end users themselves.

27 Thank you


More information

CIS 601 Graduate Seminar. Dr. Sunnie S. Chung Dhruv Patel ( ) Kalpesh Sharma ( )

CIS 601 Graduate Seminar. Dr. Sunnie S. Chung Dhruv Patel ( ) Kalpesh Sharma ( ) Guide: CIS 601 Graduate Seminar Presented By: Dr. Sunnie S. Chung Dhruv Patel (2652790) Kalpesh Sharma (2660576) Introduction Background Parallel Data Warehouse (PDW) Hive MongoDB Client-side Shared SQL

More information

COMP Data Structures

COMP Data Structures COMP 2140 - Data Structures Shahin Kamali Topic 5 - Sorting University of Manitoba Based on notes by S. Durocher. COMP 2140 - Data Structures 1 / 55 Overview Review: Insertion Sort Merge Sort Quicksort

More information

Linear Regression Optimization

Linear Regression Optimization Gradient Descent Linear Regression Optimization Goal: Find w that minimizes f(w) f(w) = Xw y 2 2 Closed form solution exists Gradient Descent is iterative (Intuition: go downhill!) n w * w Scalar objective:

More information

In-Memory Data Management

In-Memory Data Management In-Memory Data Management Martin Faust Research Assistant Research Group of Prof. Hasso Plattner Hasso Plattner Institute for Software Engineering University of Potsdam Agenda 2 1. Changed Hardware 2.

More information

CrateDB for Time Series. How CrateDB compares to specialized time series data stores

CrateDB for Time Series. How CrateDB compares to specialized time series data stores CrateDB for Time Series How CrateDB compares to specialized time series data stores July 2017 The Time Series Data Workload IoT, digital business, cyber security, and other IT trends are increasing the

More information

SAP IQ Software16, Edge Edition. The Affordable High Performance Analytical Database Engine

SAP IQ Software16, Edge Edition. The Affordable High Performance Analytical Database Engine SAP IQ Software16, Edge Edition The Affordable High Performance Analytical Database Engine Agenda Agenda Introduction to Dobler Consulting Today s Data Challenges Overview of SAP IQ 16, Edge Edition SAP

More information

Cloud Computing & Visualization

Cloud Computing & Visualization Cloud Computing & Visualization Workflows Distributed Computation with Spark Data Warehousing with Redshift Visualization with Tableau #FIUSCIS School of Computing & Information Sciences, Florida International

More information

Every SAS Cloud has a Silver Lining. Letting your data reign in the cloud

Every SAS Cloud has a Silver Lining. Letting your data reign in the cloud Every SAS Cloud has a Silver Lining Letting your data reign in the cloud DSS SAS SYSTEM Current Single Virtual Server unit with 16 cores upgraded to 32 cores 256 Gb RAM 150 registered users Data collector

More information

NEC Express5800 A2040b 22TB Data Warehouse Fast Track. Reference Architecture with SW mirrored HGST FlashMAX III

NEC Express5800 A2040b 22TB Data Warehouse Fast Track. Reference Architecture with SW mirrored HGST FlashMAX III NEC Express5800 A2040b 22TB Data Warehouse Fast Track Reference Architecture with SW mirrored HGST FlashMAX III Based on Microsoft SQL Server 2014 Data Warehouse Fast Track (DWFT) Reference Architecture

More information

Chuck Cartledge, PhD. 24 September 2017

Chuck Cartledge, PhD. 24 September 2017 Introduction Amdahl BD Processing Languages Q&A Conclusion References Big Data: Data Analysis Boot Camp Serial vs. Parallel Processing Chuck Cartledge, PhD 24 September 2017 1/24 Table of contents (1 of

More information

The Optimal CPU and Interconnect for an HPC Cluster

The Optimal CPU and Interconnect for an HPC Cluster 5. LS-DYNA Anwenderforum, Ulm 2006 Cluster / High Performance Computing I The Optimal CPU and Interconnect for an HPC Cluster Andreas Koch Transtec AG, Tübingen, Deutschland F - I - 15 Cluster / High Performance

More information

HANA Performance. Efficient Speed and Scale-out for Real-time BI

HANA Performance. Efficient Speed and Scale-out for Real-time BI HANA Performance Efficient Speed and Scale-out for Real-time BI 1 HANA Performance: Efficient Speed and Scale-out for Real-time BI Introduction SAP HANA enables organizations to optimize their business

More information

Unit 14 plan installation and maintenance of hardware in a technology system

Unit 14 plan installation and maintenance of hardware in a technology system Unit 14 plan installation and maintenance of hardware in a technology system In this assessment I will be describing the purpose and client requirements for the hardware, I will produce a plan for installing

More information

Performance Analysis of BLAS Libraries in SuperLU_DIST for SuperLU_MCDT (Multi Core Distributed) Development

Performance Analysis of BLAS Libraries in SuperLU_DIST for SuperLU_MCDT (Multi Core Distributed) Development Available online at www.prace-ri.eu Partnership for Advanced Computing in Europe Performance Analysis of BLAS Libraries in SuperLU_DIST for SuperLU_MCDT (Multi Core Distributed) Development M. Serdar Celebi

More information

Tokyo. Copyright 2013 FUJITSU LIMITED

Tokyo. Copyright 2013 FUJITSU LIMITED Shaping Tomorrow Through Modernization and Innovation Noriyuki Toyoki Corporate Senior Vice President Fujitsu Limited Tokyo Fujitsu Vision Human Centric Intelligent Society Fujitsu Technology and Service

More information

Efficient, Scalable, and Provenance-Aware Management of Linked Data

Efficient, Scalable, and Provenance-Aware Management of Linked Data Efficient, Scalable, and Provenance-Aware Management of Linked Data Marcin Wylot 1 Motivation and objectives of the research The proliferation of heterogeneous Linked Data on the Web requires data management

More information

Maximizing Fraud Prevention Through Disruptive Architectures Delivering speed at scale.

Maximizing Fraud Prevention Through Disruptive Architectures Delivering speed at scale. Maximizing Fraud Prevention Through Disruptive Architectures Delivering speed at scale. January 2016 Credit Card Fraud prevention is among the most time-sensitive and high-value of IT tasks. The databases

More information

How to integrate data into Tableau

How to integrate data into Tableau 1 How to integrate data into Tableau a comparison of 3 approaches: ETL, Tableau self-service and WHITE PAPER WHITE PAPER 2 data How to integrate data into Tableau a comparison of 3 es: ETL, Tableau self-service

More information

Progress DataDirect For Business Intelligence And Analytics Vendors

Progress DataDirect For Business Intelligence And Analytics Vendors Progress DataDirect For Business Intelligence And Analytics Vendors DATA SHEET FEATURES: Direction connection to a variety of SaaS and on-premises data sources via Progress DataDirect Hybrid Data Pipeline

More information

Query Evaluation Overview, cont.

Query Evaluation Overview, cont. Query Evaluation Overview, cont. Lecture 9 Slides based on Database Management Systems 3 rd ed, Ramakrishnan and Gehrke Architecture of a DBMS Query Compiler Execution Engine Index/File/Record Manager

More information

Understanding the latent value in all content

Understanding the latent value in all content Understanding the latent value in all content John F. Kennedy (JFK) November 22, 1963 INGEST ENRICH EXPLORE Cognitive skills Data in any format, any Azure store Search Annotations Data Cloud Intelligence

More information

Goro Watanabe. Bill King. OOW 2013 The Best Platform for Big Data and Oracle Database 12c. EVP Fujitsu R&D Center North America

Goro Watanabe. Bill King. OOW 2013 The Best Platform for Big Data and Oracle Database 12c. EVP Fujitsu R&D Center North America OOW 2013 The Best Platform for Big Data and Oracle Database 12c Goro Watanabe EVP Fujitsu R&D Center North America Bill King EVP Platform Products Group Fujitsu America, Inc. Overview 1. Fujitsu: Quick

More information

Introduction to Big-Data

Introduction to Big-Data Introduction to Big-Data Ms.N.D.Sonwane 1, Mr.S.P.Taley 2 1 Assistant Professor, Computer Science & Engineering, DBACER, Maharashtra, India 2 Assistant Professor, Information Technology, DBACER, Maharashtra,

More information

Netezza The Analytics Appliance

Netezza The Analytics Appliance Software 2011 Netezza The Analytics Appliance Michael Eden Information Management Brand Executive Central & Eastern Europe Vilnius 18 October 2011 Information Management 2011IBM Corporation Thought for

More information

Infor Lawson on IBM i 7.1 and IBM POWER7+

Infor Lawson on IBM i 7.1 and IBM POWER7+ Infor Lawson on IBM i 7.1 and IBM POWER7+ IBM Systems & Technology Group Mike Breitbach mbreit@us.ibm.com This document can be found on the web, Version Date: March, 2014 Table of Contents 1. Introduction...

More information

朱义普. Resolving High Performance Computing and Big Data Application Bottlenecks with Application-Defined Flash Acceleration. Director, North Asia, HPC

朱义普. Resolving High Performance Computing and Big Data Application Bottlenecks with Application-Defined Flash Acceleration. Director, North Asia, HPC October 28, 2013 Resolving High Performance Computing and Big Data Application Bottlenecks with Application-Defined Flash Acceleration 朱义普 Director, North Asia, HPC DDN Storage Vendor for HPC & Big Data

More information

Cloud Computing and Hadoop Distributed File System. UCSB CS170, Spring 2018

Cloud Computing and Hadoop Distributed File System. UCSB CS170, Spring 2018 Cloud Computing and Hadoop Distributed File System UCSB CS70, Spring 08 Cluster Computing Motivations Large-scale data processing on clusters Scan 000 TB on node @ 00 MB/s = days Scan on 000-node cluster

More information

Increase Value from Big Data with Real-Time Data Integration and Streaming Analytics

Increase Value from Big Data with Real-Time Data Integration and Streaming Analytics Increase Value from Big Data with Real-Time Data Integration and Streaming Analytics Cy Erbay Senior Director Striim Executive Summary Striim is Uniquely Qualified to Solve the Challenges of Real-Time

More information

SentryWire Next generation packet capture and network security.

SentryWire Next generation packet capture and network security. Next generation packet capture and network security. 1 The data landscape More data, more danger. Data proliferation brings many new opportunities but also many downsides: more data breaches, more sophisticated

More information

1 o Semestre 2007/2008

1 o Semestre 2007/2008 Efficient Departamento de Engenharia Informática Instituto Superior Técnico 1 o Semestre 2007/2008 Outline 1 2 3 4 5 6 7 Outline 1 2 3 4 5 6 7 Text es An index is a mechanism to locate a given term in

More information

In-Memory Data Management Jens Krueger

In-Memory Data Management Jens Krueger In-Memory Data Management Jens Krueger Enterprise Platform and Integration Concepts Hasso Plattner Intitute OLTP vs. OLAP 2 Online Transaction Processing (OLTP) Organized in rows Online Analytical Processing

More information

Defining The Software-Defined Technology Market Mario Blandini

Defining The Software-Defined Technology Market Mario Blandini Defining The Software-Defined Technology Market Mario Blandini HGST mario.blandini@hgst.com @SwiftMario Forward Looking Statement This presentation contains forward-looking statements that involve risks

More information

SentryWire Next generation packet capture and network security.

SentryWire Next generation packet capture and network security. Next generation packet capture and network security. 1 The data landscape 5 big cyber security trends for 2018 More data, more danger. Data proliferation brings many new opportunities but also many downsides:

More information

10 Million Smart Meter Data with Apache HBase

10 Million Smart Meter Data with Apache HBase 10 Million Smart Meter Data with Apache HBase 5/31/2017 OSS Solution Center Hitachi, Ltd. Masahiro Ito OSS Summit Japan 2017 Who am I? Masahiro Ito ( 伊藤雅博 ) Software Engineer at Hitachi, Ltd. Focus on

More information

zspotlight: Spark on z/os

zspotlight: Spark on z/os zspotlight: Spark on z/os Avijit Chatterjee, Ph.D. achatter@us.ibm.com, @ChatterAvijit STSM, IBM Competitive Project Office 1 CEOs are increasingly focused on customers as individuals leveraging contextual

More information

Spotfire Data Science with Hadoop Using Spotfire Data Science to Operationalize Data Science in the Age of Big Data

Spotfire Data Science with Hadoop Using Spotfire Data Science to Operationalize Data Science in the Age of Big Data Spotfire Data Science with Hadoop Using Spotfire Data Science to Operationalize Data Science in the Age of Big Data THE RISE OF BIG DATA BIG DATA: A REVOLUTION IN ACCESS Large-scale data sets are nothing

More information

Bring Context To Your Machine Data With Hadoop, RDBMS & Splunk

Bring Context To Your Machine Data With Hadoop, RDBMS & Splunk Bring Context To Your Machine Data With Hadoop, RDBMS & Splunk Raanan Dagan and Rohit Pujari September 25, 2017 Washington, DC Forward-Looking Statements During the course of this presentation, we may

More information

Introduction to the Mathematics of Big Data. Philippe B. Laval

Introduction to the Mathematics of Big Data. Philippe B. Laval Introduction to the Mathematics of Big Data Philippe B. Laval Fall 2017 Introduction In recent years, Big Data has become more than just a buzz word. Every major field of science, engineering, business,

More information

CompSci 516: Database Systems. Lecture 20. Parallel DBMS. Instructor: Sudeepa Roy

CompSci 516: Database Systems. Lecture 20. Parallel DBMS. Instructor: Sudeepa Roy CompSci 516 Database Systems Lecture 20 Parallel DBMS Instructor: Sudeepa Roy Duke CS, Fall 2017 CompSci 516: Database Systems 1 Announcements HW3 due on Monday, Nov 20, 11:55 pm (in 2 weeks) See some

More information