Newly invented and fully owned by Turbo Data Laboratories, Inc. (TDL)
- Magnus Shaw
1 Newly invented and fully owned by Turbo Data Laboratories, Inc. (TDL). July 28, 2017
2 Executive Summary
Universal and designless, yet far faster than legacy technologies
Big data technology has to cover many kinds of operations (interactive and batch) as well as IoT and AI. What is awaited is a technology that is universal, designless, and still the fastest.
Innovation continued, based on mathematical principles
Such a technology should start from mathematical principles, laid at a more fundamental level than the starting line of current technologies. Turbo Data Laboratories, Inc. (TDL) is a company that has been developing data processing technologies that are orders of magnitude faster. TDL has been researching its technology since 1996, has raised its level every 2 or 3 years, and has now reached a critical point.
2001: ZAP-In: Big Data's Spread Sheet, x500 to x700 faster in Fujitsu's benchmark.
2013: ZAP-Over: searching and gathering of globally distributed big data, x1,000 in total performance, at the National Tax Agency.
ZAP-Mass: PB-class super big data DB system, x400,000 faster at sorting.
3 Section 1: History of Turbo Data Laboratories. Linear Filtering Method (LFM) Theory
4 1. History of LFM Theory
1. Zap-In Technology: Big Data's Spread Sheet, 10 KByte - 1 TByte. Interactive. Access to local big data.
2. Zap-Over Technology: globally distributed R/O DB, 10 KByte - 10 TByte. Distributed. Reference to Earth-wide big data.
3. Zap-Mass Technology: massively parallel DB, 1 GByte - 100 PByte. Massive and interactive. Access to Google-class big data.
LFM Theory: a revolutionary DB theory based on the Algorithm Index.
5 2. Chart of Technologies and Products
The technology layers are as follows.
ZAP-In / ZAP-Over
1. Math Quark Theory: defines the substructures of a table and provides a universal foundation for Math Index.
2. Math Index Theory: provides universal multi-functional indexes for every field and/or ordered set.
3. Math Switch Theory: provides runtime partitioning to allot CPU, memory and communication.
4. PETA DB OS: provides preemptive multitasking and resource control.
5. PETA Sheet: provides users with a platform for browsing, accessing, analyzing and programming.
Product series layer (columns: ZAP-In (2001-) / Zap-Over (2013-) / Zap-Mass (2017-)):
Application: - / - / PETA Sheet
OS: - / - / PETA DB OS
Technology layer (LFM Technology, LFM Index):
Architecture: - / - / Math Switch Theory
Algorithms: Math Index Theory (for 1/3 model) / Math Index Theory (for 4/6 model) / Math Index Theory (for 3/5 model)
Data structure: Math Quark Theory (for 1/3 model) / Math Quark Theory (for 4/6 model) / Math Quark Theory (for 3/5 model)
6 Section 2: Why Does the Linear Filtering Method (LFM) Always Work Well?
7 3. Every Data's Substructure: Math Quark
Math Quark: a collection of arrays (= a table) has 3 basic substructures, the Math Quarks.
1st. Ordered Set: controls the selection status of, and the access order to, each member.
2nd. Value Number: abstracts the real data into integers (fig 1).
3rd. Value List: manages the values that exist.
One combination of Math Quarks is equivalent to the original arrays (= the table). Other combinations of Math Quarks become indexes for sorting and tabulation, and further Math Indexes.
Merits of Math Quarks:
1st. Because Math Quarks exist in every combination of fields and ordered sets, an algorithm (Math Index) is always available. Thus Math Quarks enable any cascading of algorithms (see fig 2).
2nd. Existing Math Quarks can be reused to build transformation results. For example, in sorting and searching, the Value Number and the Value List can be reused. That reduces the CPU steps in sorting from O(n*log(n)) to O(n).
Math Quark enables Math Index.
Fig 1. A simple example of Math Quarks (original data with fields G. and Age, decomposed into OrdSet, VNo and VL).
Fig 2. Cascading of processes.
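The decomposition described above can be illustrated with a small sketch. It is not TDL's implementation: it assumes that the Value List is the sorted list of distinct values, that the Value Number is each record's position in that list, and that the Ordered Set is a list of record ids. The function names `decompose` and `sort_ordered_set` are hypothetical. Under those assumptions, sorting can reuse the value numbers through a counting sort, which is one way to get O(n) instead of O(n*log(n)).

```python
from typing import List, Tuple


def decompose(column: List[str]) -> Tuple[List[int], List[int], List[str]]:
    """Split one field into (ordered_set, value_numbers, value_list)."""
    value_list = sorted(set(column))                 # distinct values, in order
    number_of = {v: i for i, v in enumerate(value_list)}
    value_numbers = [number_of[v] for v in column]   # one integer per record
    ordered_set = list(range(len(column)))           # all records, original order
    return ordered_set, value_numbers, value_list


def sort_ordered_set(ordered_set: List[int], value_numbers: List[int],
                     n_values: int) -> List[int]:
    """Stable counting sort of record ids by value number: O(n + n_values)."""
    counts = [0] * (n_values + 1)
    for rec in ordered_set:
        counts[value_numbers[rec] + 1] += 1
    for i in range(n_values):
        counts[i + 1] += counts[i]       # counts[v] = first output slot for v
    out = [0] * len(ordered_set)
    for rec in ordered_set:
        v = value_numbers[rec]
        out[counts[v]] = rec
        counts[v] += 1
    return out


gender = ["F", "M", "M", "F", "M"]
ordset, vno, vl = decompose(gender)
sorted_ordset = sort_ordered_set(ordset, vno, len(vl))
restored = [vl[vno[rec]] for rec in ordset]          # equals the original column
```

The sorted result is just another ordered set over the same value numbers and value list, so those two structures are reused rather than rebuilt.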
8 4. Ever-Existing Index: Math Index
Data Index: an existing index is data stored outside the data it indexes. The index and the data indexed by it are independent of each other. Because such an index is always defined by data, I named it a Data Index. It is strictly bound to specific data, so it can be used for that specific data only.
Math Index: a Math Index is defined by algorithms; it is an Algorithm Index. A Math Index is available at any time and in any case, because Math Quarks always exist.
Math Index enables Math Switch.
Merits of Math Index:
1st. It uses no memory or storage, and it can be transferred without communication cost.
2nd. It can go with every field and every ordered set (subset), so it can always be cascaded.
3rd. It is rich in functions: OLTP, join, sorting, search, tabulation, set operations, etc. Every DB operation is possible.
4th. It needs no updating.
5th. It can utilize multiple cores and can run on massively parallel systems.
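As a rough illustration of an index defined by an algorithm rather than by stored data, the sketch below runs a search and then a tabulation on the search result, using only value numbers, value lists and ordered sets. The functions `search` and `tabulate`, the sample data and the linear scan are all assumptions made for illustration; the slides do not describe the actual Math Index algorithms.

```python
from bisect import bisect_left
from collections import Counter
from typing import Dict, List


def search(ordered_set: List[int], value_numbers: List[int],
           value_list: list, wanted) -> List[int]:
    """Return the subset of ordered_set whose field equals `wanted`."""
    pos = bisect_left(value_list, wanted)        # binary search in the value list
    if pos == len(value_list) or value_list[pos] != wanted:
        return []
    return [rec for rec in ordered_set if value_numbers[rec] == pos]


def tabulate(ordered_set: List[int], value_numbers: List[int],
             value_list: list) -> Dict:
    """Count occurrences of each value among the records in ordered_set."""
    counts = Counter(value_numbers[rec] for rec in ordered_set)
    return {value_list[v]: c for v, c in sorted(counts.items())}


# Two fields of the same table, already decomposed (hypothetical data).
gender_vl, gender_vno = ["F", "M"], [0, 1, 1, 0, 1]
age_vl, age_vno = [9, 30, 41], [1, 0, 2, 0, 1]
all_records = list(range(5))

males = search(all_records, gender_vno, gender_vl, "M")   # first step
ages_of_males = tabulate(males, age_vno, age_vl)          # cascaded step
```

The output of `search` is itself an ordered set, so it can be fed straight into `tabulate` on a different field. Nothing is stored or updated between the two steps, which is the property the slide attributes to Math Index.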
9 Section 3: Existing Product Series 1, ZAP-In Technology
10 1. Zap-In for Big Data's Spread Sheet
Interactive big data. Suitable data size: 10 KByte - 1 TByte.
Very fast and quick in response:
- x25 faster than Spark
- x500 to x700 in Fujitsu's benchmark, 2001 (next page)
Very rich in functionality:
- Enables a spreadsheet for big data with RDB functions
- Is a one-stop platform for accessing big data
11 1. Zap-In Technology (continued)
Spread Sheet for Big-data
- Interactive operation like Excel for big data (up to 1TB)
- Quick operation even for big data
- Quick system integration by automatic programming
Comparison (Zap-In Spread Sheet / Excel / Relational Database):
Big-data: OK (up to 10TB) / NG / OK
Interactive operation: OK (easy) / OK (easy) / NO
Operation speed: Very fast / Slow / Fast
Macro recording: OK (creates Python code) / OK / NO
DB operation: OK (tabulation, sorting, search, join, union) / NG / OK
12 Benchmark at Fujitsu, Zap-In (continued)
13 Track Record
Zap-In has been the main product series. The ZAP-In engine, built with LFM technology, has turned many impossible tasks into possible ones. It has been used by 2 kinds of users.
1st. Those who need absolutely fast batch processing of big data.
2nd. Those who need interactive big data operations, such as cleansing, transformation and analytics.
In 2001, Fujitsu benchmarked it and found that it ran x500 to x700 faster in BOM development and MRP. ZAP-In has therefore been used for its central procurement system. Fujitsu announced that it reduces $2.8B/y out of a total of $30B/y.
Patents on ZAP-In have been licensed to SAP, NEC, Fujitsu BSC, and others. And other users about
14 Section 4: Existing Product Series 2, ZAP-Over Technology
15 2. Zap-Over Technology
Globally distributed DB: DB operations over the Internet, including union and join.
Interactive operation with quick response.
Read only (mainly).
Suitable size of each table: 10 KByte - 10 TByte.
Applications:
- Open data service
- Distributed IoT
- DB for distributed organizations
Figure (right): the case of two remote big data sets. Big-data A and Big-data B are each served by a Zap-Over Service and accessed by several Zap-Over Clients.
16 2. Zap-Over Technology (continued)
Big-data unification and search across the distributed branches of an enterprise: super high speed unification, search and browsing.
Before Zap-Over:
- Big data is carried by airlines.
- The merge operation takes a long time.
- Big-data operation at the center.
After Zap-Over:
- Big-data operations over the Internet.
- The merge operation takes only 100 ms.
- Big-data operation at any place.
Gains shown: x100 and x10.
17 Track Record: Zap-Over
ZAP-Over technology (2013-) makes one-stop searching and browsing over many big data sets at many locations possible.
By looking up the deal logs of over 100 countries, it becomes possible to trace money laundering. Previously, each trace took 15 to 20 minutes, and at most 2 users could work simultaneously. With ZAP-Over technology, the time per trace fell to about 10 seconds (x100), and the number of simultaneous users rose to 20 (x10).
That system has been running at the National Tax Agency since 2013 to detect international money laundering.
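The factors quoted on this slide can be checked with a few lines of arithmetic. Multiplying the per-trace speedup by the gain in simultaneous users is an assumption made here to relate them to the "x1,000 in total performance" figure in the executive summary; the slides do not state how that total was derived.

```python
# Figures quoted on the slide.
before_trace_sec = (15 * 60, 20 * 60)   # 15 to 20 minutes per trace
after_trace_sec = 10                    # about 10 seconds per trace
users_before, users_after = 2, 20

speedup_low = before_trace_sec[0] / after_trace_sec    # 90.0
speedup_high = before_trace_sec[1] / after_trace_sec   # 120.0
user_gain = users_after / users_before                 # 10.0

# Assumption: total throughput scales as (speed per trace) x (simultaneous users).
total_low = speedup_low * user_gain     # 900.0
total_high = speedup_high * user_gain   # 1200.0

print(f"per-trace speedup: x{speedup_low:.0f} to x{speedup_high:.0f}")
print(f"combined gain:     x{total_low:.0f} to x{total_high:.0f}")
```

The range of x900 to x1,200 brackets the rounded x100 times x10 = x1,000.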
18 Section 5: Future Product Series, ZAP-Mass Technology
19 1. ZAP-Mass Introduction
Cloud computing is the main field where Amazon, Google, Microsoft and others are competing. The next winner will be whoever manages to provide users with a PB-class DB platform on the cloud by overcoming the following problems:
1. Too slow.
2. Too few functions.
Innovation continued, based on mathematical principles
Turbo Data Laboratories, Inc. (TDL) is a company that has been developing data processing technologies that are orders of magnitude faster. TDL has been researching its technology since 1996, has raised its level every 2 or 3 years, and has now reached a critical point.
2001: ZAP-In: Big Data's Spread Sheet, x500 to x700 faster in Fujitsu's benchmark.
2013: ZAP-Over: searching and gathering of globally distributed big data, x1,000 in total performance, at the National Tax Agency.
ZAP-Mass: PB-class super big data DB system, x400,000 faster at sorting.
ZAP-Mass:
- is a massively parallel big data DB system (algorithm + architecture + DB-OS + application), with a dedicated communication chip in each server node.
- can do PB-class DB processing.
- enables end users themselves to perform versatile operations on big data.
20 2. ZAP-Mass: Performance Simulation on a PB Table
Example DB: 1 PB in total, 100 fields, 2 KB per record, 500,000,000,000 records.
Example system and architecture:
System total: 32,768 servers, 2 PB memory.
Each server: 64 GB memory, communication speed 50 Gbps, storage 500 MB/s.
Chip modules: 128 chip modules; 1 GB memory, 50 Gbps input, 50 Gbps output.
Operation: Hadoop, etc. (estimation) / ZAP-Mass (estimation) / magnification
1. Sort by int. field, 100,000 cardinality: 1,200,000 sec / 3 sec / x 400,000
2. Extraction by search, 10% hit: 8 sec / ... sec / x 4,...
3. Extraction by search, 50% hit: 40 sec / ... sec / x 20,...
4. Tabulation, occurrence in a 100,000-cardinality string field: 120 sec / 0.06 sec / x 2,000
Zap-Mass only:
5. N:N sort join by 1 string key with 100,000 cardinality: - / 4 sec / -
6. Distinct by 2 string keys, each with 100,000 cardinality: - / ... sec / -
7. Insert or delete 1,000 records: - / ... sec / -
Using current technologies causes severe limitations:
- Too slow.
- Sorting, and functions that use it, are almost impossible.
- Editing is almost impossible.
- Impossible for common users to use.
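The sizing figures on this slide are mutually consistent, as a short calculation shows. The sketch assumes decimal units (1 PB = 10^15 bytes, 2 KB = 2,000 bytes) and uses only numbers that survive in the transcript. The per-server share of the table is a derived figure, not one stated on the slide.

```python
table_bytes = 10**15            # 1 PB, decimal units assumed
record_bytes = 2_000            # 2 KB per record
records = table_bytes // record_bytes            # 500,000,000,000 records

servers = 32_768
mem_per_server_gb = 64
total_mem_gb = servers * mem_per_server_gb       # 2,097,152 GB, about 2 PB
bytes_per_server = table_bytes / servers         # about 30.5 GB per server

sort_speedup = 1_200_000 / 3                     # row 1: 400,000
tabulation_speedup = round(120 / 0.06)           # row 4: 2,000
hadoop_sort_days = 1_200_000 / 86_400            # about 13.9 days

print(f"records:            {records:,}")
print(f"total memory:       {total_mem_gb:,} GB")
print(f"table per server:   {bytes_per_server / 1e9:.1f} GB")
print(f"sort speedup:       x{sort_speedup:,.0f}")
print(f"tabulation speedup: x{tabulation_speedup:,}")
print(f"Hadoop sort time:   {hadoop_sort_days:.1f} days")
```

Under these assumptions each server holds roughly 30.5 GB of the table, which fits within its 64 GB of memory.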
21 3. Math Switch Enables Dynamic Partitioning
Math Switch: Math Switch Theory is built on top of Math Index, which makes communication between nodes easy to handle in massively parallel ways.
Math Switch offers a multiple ring architecture, as shown in fig 5-1 (next page). That architecture has symmetry in 2 directions.
1st. Ring-wise: the ring-wise direction assigns the pipeline length.
2nd. Inter-ring-wise: the inter-ring-wise direction assigns the degree of parallelism.
Math Switch offers dynamic partitioning in both directions (fig 5-2 and fig 5-3, next page). By changing partition sizes, Math Switch can set a task's pipeline length and degree of parallelism, and it can also control the amount of resources given to each task.
Math Switch also offers preemptive task switching (see fig 5-4, next page), which has not been easy for supercomputers.
22 3. Math Switch Enables Dynamic Partitioning (figures)
fig 5-1. Multiple Ring Architecture (ring 0: n00 to n03; ring 1: n10 to n13; ring 2: n20 to n23).
fig 5-2. Division of Ring (ring-wise).
fig 5-3. Horizontal Division (inter ring-wise).
fig 5-4. Preemptive Task Switching, Enabled by Switching Packets to Pass.
Legend: data passed to next; data not passed to next.
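The two partitioning directions can be pictured with a small sketch over the node grid of fig 5-1. This is an illustration only: the functions `build_rings` and `partition` are hypothetical, and treating a partition as a rectangular block of nodes is an assumption, since the slides do not specify how Math Switch represents partitions.

```python
from typing import List

Node = str


def build_rings(n_rings: int, ring_len: int) -> List[List[Node]]:
    """Name nodes as in fig 5-1: n<ring><position>, e.g. n00, n13, n22."""
    return [[f"n{r}{p}" for p in range(ring_len)] for r in range(n_rings)]


def partition(rings: List[List[Node]], pipeline_len: int,
              parallelism: int) -> List[List[List[Node]]]:
    """Cut the grid into blocks of `parallelism` rings x `pipeline_len` nodes."""
    blocks = []
    for r0 in range(0, len(rings), parallelism):            # inter-ring-wise cut
        for p0 in range(0, len(rings[0]), pipeline_len):    # ring-wise cut
            blocks.append([ring[p0:p0 + pipeline_len]
                           for ring in rings[r0:r0 + parallelism]])
    return blocks


rings = build_rings(3, 4)                                   # 3 rings of 4 nodes
small_tasks = partition(rings, pipeline_len=2, parallelism=1)
one_big_task = partition(rings, pipeline_len=4, parallelism=3)
```

Here `small_tasks` holds six partitions of two nodes each, while `one_big_task` is a single partition covering all twelve nodes. Changing the two sizes trades pipeline length and parallelism against the number of tasks that can run side by side.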
23 4. PETA DB OS: PB-class, preemptive multitasking DB OS over a massively parallel architecture
- It controls the runtime partitioning ability of Math Switch.
- Through that partitioning, the system becomes much more scalable, easily and meaningfully.
- It can keep and manage a large number and many kinds of big data sets, which is not easy for other big data systems.
- It can run many tasks in many partitions.
- It can switch tasks preemptively in each partition.
24 5. PETA Sheet
A Big Data's Spread Sheet with RDB functions, featuring the following functions:
A. Browsing big data
B. Accessing big data (cleansing, transforming, editing, etc.)
C. Analyzing big data (statistics, data mining, BI, etc.)
D. Programming big data
E. Control panel of PETA DB OS
25 6. Summary of Zap-Mass Technology
Enables a DB system on a massively parallel computer system, employing dedicated chips (to be designed).
Composed of: Math Quark, Math Index, Math Switch, PETA DB OS, PETA Sheet.
Suitable data size: 1 TByte - 100 PByte.
Suitable system size: 16 servers to 1,000,000 servers or more.
Expected performance: about x10,000 that of Hadoop, at the same number of servers.
26 7. What Zap-Mass Enables
ZAP-Mass enables end users themselves to perform versatile operations on big data.
27 Thank you
Shaping Tomorrow Through Modernization and Innovation Noriyuki Toyoki Corporate Senior Vice President Fujitsu Limited Tokyo Fujitsu Vision Human Centric Intelligent Society Fujitsu Technology and Service
More informationEfficient, Scalable, and Provenance-Aware Management of Linked Data
Efficient, Scalable, and Provenance-Aware Management of Linked Data Marcin Wylot 1 Motivation and objectives of the research The proliferation of heterogeneous Linked Data on the Web requires data management
More informationMaximizing Fraud Prevention Through Disruptive Architectures Delivering speed at scale.
Maximizing Fraud Prevention Through Disruptive Architectures Delivering speed at scale. January 2016 Credit Card Fraud prevention is among the most time-sensitive and high-value of IT tasks. The databases
More informationHow to integrate data into Tableau
1 How to integrate data into Tableau a comparison of 3 approaches: ETL, Tableau self-service and WHITE PAPER WHITE PAPER 2 data How to integrate data into Tableau a comparison of 3 es: ETL, Tableau self-service
More informationProgress DataDirect For Business Intelligence And Analytics Vendors
Progress DataDirect For Business Intelligence And Analytics Vendors DATA SHEET FEATURES: Direction connection to a variety of SaaS and on-premises data sources via Progress DataDirect Hybrid Data Pipeline
More informationQuery Evaluation Overview, cont.
Query Evaluation Overview, cont. Lecture 9 Slides based on Database Management Systems 3 rd ed, Ramakrishnan and Gehrke Architecture of a DBMS Query Compiler Execution Engine Index/File/Record Manager
More informationUnderstanding the latent value in all content
Understanding the latent value in all content John F. Kennedy (JFK) November 22, 1963 INGEST ENRICH EXPLORE Cognitive skills Data in any format, any Azure store Search Annotations Data Cloud Intelligence
More informationGoro Watanabe. Bill King. OOW 2013 The Best Platform for Big Data and Oracle Database 12c. EVP Fujitsu R&D Center North America
OOW 2013 The Best Platform for Big Data and Oracle Database 12c Goro Watanabe EVP Fujitsu R&D Center North America Bill King EVP Platform Products Group Fujitsu America, Inc. Overview 1. Fujitsu: Quick
More informationIntroduction to Big-Data
Introduction to Big-Data Ms.N.D.Sonwane 1, Mr.S.P.Taley 2 1 Assistant Professor, Computer Science & Engineering, DBACER, Maharashtra, India 2 Assistant Professor, Information Technology, DBACER, Maharashtra,
More informationNetezza The Analytics Appliance
Software 2011 Netezza The Analytics Appliance Michael Eden Information Management Brand Executive Central & Eastern Europe Vilnius 18 October 2011 Information Management 2011IBM Corporation Thought for
More informationInfor Lawson on IBM i 7.1 and IBM POWER7+
Infor Lawson on IBM i 7.1 and IBM POWER7+ IBM Systems & Technology Group Mike Breitbach mbreit@us.ibm.com This document can be found on the web, Version Date: March, 2014 Table of Contents 1. Introduction...
More information朱义普. Resolving High Performance Computing and Big Data Application Bottlenecks with Application-Defined Flash Acceleration. Director, North Asia, HPC
October 28, 2013 Resolving High Performance Computing and Big Data Application Bottlenecks with Application-Defined Flash Acceleration 朱义普 Director, North Asia, HPC DDN Storage Vendor for HPC & Big Data
More informationCloud Computing and Hadoop Distributed File System. UCSB CS170, Spring 2018
Cloud Computing and Hadoop Distributed File System UCSB CS70, Spring 08 Cluster Computing Motivations Large-scale data processing on clusters Scan 000 TB on node @ 00 MB/s = days Scan on 000-node cluster
More informationIncrease Value from Big Data with Real-Time Data Integration and Streaming Analytics
Increase Value from Big Data with Real-Time Data Integration and Streaming Analytics Cy Erbay Senior Director Striim Executive Summary Striim is Uniquely Qualified to Solve the Challenges of Real-Time
More informationSentryWire Next generation packet capture and network security.
Next generation packet capture and network security. 1 The data landscape More data, more danger. Data proliferation brings many new opportunities but also many downsides: more data breaches, more sophisticated
More information1 o Semestre 2007/2008
Efficient Departamento de Engenharia Informática Instituto Superior Técnico 1 o Semestre 2007/2008 Outline 1 2 3 4 5 6 7 Outline 1 2 3 4 5 6 7 Text es An index is a mechanism to locate a given term in
More informationIn-Memory Data Management Jens Krueger
In-Memory Data Management Jens Krueger Enterprise Platform and Integration Concepts Hasso Plattner Intitute OLTP vs. OLAP 2 Online Transaction Processing (OLTP) Organized in rows Online Analytical Processing
More informationDefining The Software-Defined Technology Market Mario Blandini
Defining The Software-Defined Technology Market Mario Blandini HGST mario.blandini@hgst.com @SwiftMario Forward Looking Statement This presentation contains forward-looking statements that involve risks
More informationSentryWire Next generation packet capture and network security.
Next generation packet capture and network security. 1 The data landscape 5 big cyber security trends for 2018 More data, more danger. Data proliferation brings many new opportunities but also many downsides:
More information10 Million Smart Meter Data with Apache HBase
10 Million Smart Meter Data with Apache HBase 5/31/2017 OSS Solution Center Hitachi, Ltd. Masahiro Ito OSS Summit Japan 2017 Who am I? Masahiro Ito ( 伊藤雅博 ) Software Engineer at Hitachi, Ltd. Focus on
More informationzspotlight: Spark on z/os
zspotlight: Spark on z/os Avijit Chatterjee, Ph.D. achatter@us.ibm.com, @ChatterAvijit STSM, IBM Competitive Project Office 1 CEOs are increasingly focused on customers as individuals leveraging contextual
More informationSpotfire Data Science with Hadoop Using Spotfire Data Science to Operationalize Data Science in the Age of Big Data
Spotfire Data Science with Hadoop Using Spotfire Data Science to Operationalize Data Science in the Age of Big Data THE RISE OF BIG DATA BIG DATA: A REVOLUTION IN ACCESS Large-scale data sets are nothing
More informationBring Context To Your Machine Data With Hadoop, RDBMS & Splunk
Bring Context To Your Machine Data With Hadoop, RDBMS & Splunk Raanan Dagan and Rohit Pujari September 25, 2017 Washington, DC Forward-Looking Statements During the course of this presentation, we may
More informationIntroduction to the Mathematics of Big Data. Philippe B. Laval
Introduction to the Mathematics of Big Data Philippe B. Laval Fall 2017 Introduction In recent years, Big Data has become more than just a buzz word. Every major field of science, engineering, business,
More informationCompSci 516: Database Systems. Lecture 20. Parallel DBMS. Instructor: Sudeepa Roy
CompSci 516 Database Systems Lecture 20 Parallel DBMS Instructor: Sudeepa Roy Duke CS, Fall 2017 CompSci 516: Database Systems 1 Announcements HW3 due on Monday, Nov 20, 11:55 pm (in 2 weeks) See some
More information