Newly invented and fully owned by Turbo Data Laboratories, Inc. (TDL)

Magnus Shaw
5 years ago
1 Newly invented and fully owned by Turbo Data Laboratories, Inc. (TDL). 28 July 2017

2 Executive Summary
Universal and Designless, yet Far Faster than Legacy Technologies
Big Data technology has to cover many kinds of operations (interactive and batch), as well as IoT and AI. What is awaited is a technology that is universal and designless, and yet the fastest.
Innovation Continued, Based on Mathematical Principles
Such a technology should start from mathematical principles, laid on a more fundamental level than the starting line of current technologies.
Turbo Data Laboratories, Inc. (TDL) is a company that has been developing data processing technologies that are orders of magnitude faster. TDL has been researching its technology since 1996, has raised its level every 2 or 3 years, and has now reached a critical point.
2001: ZAP-In: Big Data's Spread Sheet. x500 ~ x700 faster, at Fujitsu's benchmark.
2013: ZAP-Over: Searching / Gathering of Globally Distributed Big Data. x1,000 in total performance, at the National Tax Agency.
ZAP-Mass: PB-class Super Big Data DB System. x400,000 faster at sorting.

3 Section 1: History of Turbo Data Laboratories' Linear Filtering Method (LFM) Theory

4 1. History of LFM Theory
1. Zap-In Technology: Big Data's Spread Sheet. 10KByte - 1TByte. Interactive. Access to local Big Data.
2. Zap-Over Technology (2013-): Globally Distributed R/O DB. 10KByte - 10TByte. Distributed. Reference to Earth-wide Big Data.
3. Zap-Mass Technology (2017-): Massive Parallel DB. 1GByte - 100PByte. Massive & Interactive. Access to Google-class Big Data.
LFM Theory: a revolutionary DB theory based on Algorithm Index.

5 2. Chart of Technologies and Products
The technology layers are as follows:
1. Math Quark Theory: defines the substructures of a table and provides a universal foundation for Math Index.
2. Math Index Theory: provides universal multi-functional indexes to every field and/or ordered set.
3. Math Switch Theory: provides runtime partitioning to allot CPU / memory / communication.
4. PETA DB OS: provides preemptive multi-tasking & resource control.
5. PETA Sheet: provides a browsing / accessing / analyzing / programming platform to users.
Product Series Layer (Technology Layer: LFM Technology, LFM Index):
ZAP-In (2001-): Application: -. OS: -. Architecture: -. Algorithms: Math Index Theory (for 1/3 model). Data Structure: Math Quark Theory (for 1/3 model).
Zap-Over (2013-): Application: -. OS: -. Architecture: -. Algorithms: Math Index Theory (for 4/6 model). Data Structure: Math Quark Theory (for 4/6 model).
Zap-Mass (2017-): Application: PETA Sheet. OS: PETA DB OS. Architecture: Math Switch Theory. Algorithms: Math Index Theory (for 3/5 model). Data Structure: Math Quark Theory (for 3/5 model).

6 Section 2: Why Does the Linear Filtering Method (LFM) Always Work Well?

7 3. Every Data's Substructure: Math Quark
Math Quark: A collection of arrays (= a Table) has 3 basic substructures, called Math Quarks.
1st. Ordered Set: controls the select status of, and the access order to, each member.
2nd. Value Number: abstracts real data into integers (fig 1).
3rd. Value List: controls the existing values.
One combination of Math Quarks is equivalent to the original arrays (= Table). Other combinations of Math Quarks become indexes for sorting / tabulation and other Math Indexes.
Merits of Math Quarks:
1st. Because Math Quarks exist in every combination of fields and ordered sets, an algorithm (Math Index) is always available. Thus Math Quarks enable any cascading of algorithms (see fig 2).
2nd. Existing Math Quarks can be reused to build transformation results. For example, in sorting / searching, the Value Number and Value List can be reused. That reduces the CPU steps in sorting from O(n*log(n)) to O(n).
Math Quark Enables Math Index
Fig 1. A simple example of Math Quarks: original data with the fields G. (values F / M) and Age, decomposed into an OrdSet plus a VNo (Value Number) and a VL (Value List) for each field.
Fig 2. Cascading of processes.
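The decomposition described on this slide can be illustrated with a minimal Python sketch. It is not TDL's implementation: the names `Field`, `decompose`, `reconstruct` and `sort_ord_set` are hypothetical, and the sketch assumes that a Value List is the sorted list of distinct values, a Value Number is each record's position in that list, and an Ordered Set is a list of record numbers. Under those assumptions, sorting an Ordered Set by a field is a counting sort over the existing Value Numbers, which takes O(n + k) steps for n records and k distinct values.

```python
from dataclasses import dataclass


@dataclass
class Field:
    value_list: list  # sorted distinct values (assumed meaning of "Value List")
    value_no: list    # per-record index into value_list ("Value Number")


def decompose(column):
    """Split one column into a Value List and Value Numbers (one-off cost)."""
    value_list = sorted(set(column))
    position = {v: i for i, v in enumerate(value_list)}
    return Field(value_list, [position[v] for v in column])


def reconstruct(field, ord_set):
    """Rebuild the real values of the records selected by an Ordered Set."""
    return [field.value_list[field.value_no[r]] for r in ord_set]


def sort_ord_set(field, ord_set):
    """Stable counting sort of an Ordered Set by the field's Value Numbers."""
    counts = [0] * (len(field.value_list) + 1)
    for r in ord_set:
        counts[field.value_no[r] + 1] += 1
    for i in range(len(field.value_list)):
        counts[i + 1] += counts[i]  # counts[v] = first output slot for value v
    out = [0] * len(ord_set)
    for r in ord_set:
        v = field.value_no[r]
        out[counts[v]] = r
        counts[v] += 1
    return out


# Hypothetical sample data with the two fields named in fig 1.
gender = decompose(["F", "M", "F", "M", "M"])
age = decompose([30, 25, 41, 25, 30])
all_records = list(range(5))

# Cascading: sort by Age first, then stably by Gender, reusing the same quarks.
by_age = sort_ord_set(age, all_records)
by_gender_then_age = sort_ord_set(gender, by_age)
print(reconstruct(gender, by_gender_then_age))
print(reconstruct(age, by_gender_then_age))
```

Because each sort returns a new Ordered Set and leaves the Value Numbers and Value Lists untouched, the result of one operation can be fed directly into the next, which is one way to read the cascading claim.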

8 4. Ever Existing Index: Math Index
Data Index: A conventional index is data held outside the data it indexes; the index and the indexed data are independent of each other. Because such an index is always defined by data, I named it a Data Index. It is strictly bound to specific data, so it can be used only for that specific data.
Math Index: A Math Index is defined by algorithms; it is an Algorithm Index. A Math Index is available at any time and in any case, because Math Quarks always exist.
Math Index Enables Math Switch
Merits of Math Index:
1st. It doesn't use memory / storage. It can be transferred without communication cost.
2nd. It can go with every field and every ordered set (subset). It can always be cascaded.
3rd. It is rich in functions: OLTP / Join / Sorting / Search / Tabulation / Set Operations / etc. Every DB operation is possible.
4th. No need to update.
5th. It can utilize multi-core processors and can run on massive parallel systems.
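One way to read "an index defined by algorithms" is that search and tabulation are computed directly from the Value List and Value Numbers, with no separately stored index structure. The sketch below illustrates that reading only; the function names `search_range` and `tabulate` are hypothetical, and the layout (sorted Value List, integer Value Numbers, Ordered Set of record numbers) is an assumption rather than something the slides specify.

```python
from bisect import bisect_left, bisect_right


def search_range(value_list, value_no, ord_set, low, high):
    """Select records whose value lies in [low, high].

    The range is located once in the sorted Value List by binary search;
    each record is then tested with two integer comparisons.
    """
    lo = bisect_left(value_list, low)
    hi = bisect_right(value_list, high)
    return [r for r in ord_set if lo <= value_no[r] < hi]


def tabulate(value_list, value_no, ord_set):
    """Count occurrences of each value among the selected records."""
    counts = [0] * len(value_list)
    for r in ord_set:
        counts[value_no[r]] += 1
    return {v: c for v, c in zip(value_list, counts) if c}


# Hypothetical Age field: real values 30, 25, 41, 25, 30.
age_values = [25, 30, 41]
age_numbers = [1, 0, 2, 0, 1]
everyone = list(range(5))

# Cascade: search first, then tabulate the resulting subset.
selected = search_range(age_values, age_numbers, everyone, 25, 30)
print(selected)
print(tabulate(age_values, age_numbers, selected))
```

Both functions accept any Ordered Set, so they work on the full table or on the output of an earlier operation without any index having to be rebuilt or updated.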

9 Section 3: Existing Product Series 1: ZAP-In Technology

10 1. Zap-In for Big Data's Spread Sheet
Interactive Big Data. Suitable data size: 10KByte - 1TByte.
Very fast and quick in response: x25 faster than Spark; x500 ~ x700 at Fujitsu's benchmark, 2001 (next page).
Very rich in functionality: enables a Big Data spread sheet with RDB functions.
Is a one-stop platform to access Big Data.

11 1. Zap-In Technology (continued)
Spread Sheet for Big-data:
Interactive operation like Excel for Big-data (up to 1TB).
Quick operation even for Big-data.
Quick system integration by Automatic Programming.
Comparison (Zap-In Spread Sheet / Excel / Relational Database):
Big-data: OK (up to 10TB) / NG / OK
Interactive Operation: OK (easy) / OK (easy) / NO
Operation Speed: Very Fast / Slow / Fast
Macro Recording: OK (creates Python code) / OK / NO
DB Operation: OK (tabulation, sorting, search, join, union) / NG / OK

12 Benchmark at Fujitsu, Zap-In (continued)

13 Track Record: Zap-In
Zap-In has been the main product series. The ZAP-In Engine, built on LFM technology, has turned many impossible tasks into possible ones. It has been used by 2 kinds of users.
1st. Those who need absolutely fast Big Data batches.
2nd. Those who need interactive Big Data operations, such as Cleansing / Transformation / Analytics / etc.
In 2001, Fujitsu benchmarked it and found that it runs x500 ~ x700 faster in BOM development and MRP. ZAP-In has therefore been used for Fujitsu's central procurement system. Fujitsu announced that it reduces $2.8B/y from a total of $30B/y.
Patents of ZAP-In have been licensed to SAP, NEC, Fujitsu BSC, and others. And other users about

14 Section 4: Existing Product Series 2: ZAP-Over Technology

15 2. Zap-Over Technology
Globally distributed DB: DB operations over the Internet, including Union / Join.
Interactive operation with quick response.
Read only (mainly).
Suitable size of each table: 10KByte - 10TByte.
Applications: Open Data Service; Distributed IoT; DB for Distributed Organizations.
Figure (right), the case of TWO remote Big-data sources: Big-data A and Big-data B are each served by a Zap-Over Service and accessed by several Zap-Over Clients.

16 2. Zap-Over Technology (continued)
Big-data unification / search at the distributed branches of an enterprise: super high speed unification, search & browsing.
Before Zap-Over: Big Data is carried by airlines. The merge operation takes a long time. Big-data operations run at the center.
After Zap-Over: Big-data operations run over the Internet. The merge operation takes only 100ms. Big-data operations run at any place.
Gains shown: x100 and x10.

17 Track Record: Zap-Over
With ZAP-Over technology (2013-), one-stop searching / browsing over many Big Data sets at many locations becomes possible.
By looking up the deal logs of over 100 countries, tracing money laundering becomes possible. Previously, however, each trace took 15 ~ 20 minutes, and the simultaneous user count was at most 2.
With ZAP-Over technology, the time for 1 trace was reduced to about 10 sec. (x100), and the simultaneous user count rose to 20 (x10).
That system has been running at the National Tax Agency since 2013 to detect international money laundering.
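The factors quoted on this slide can be checked with a few lines of arithmetic. The sketch below uses only the figures given above; treating the overall gain as the product of the per-trace speedup and the concurrency gain is an assumption about how a combined figure would be derived, and the variable names are illustrative.

```python
# Figures taken from the slide.
trace_before_s = (15 * 60, 20 * 60)  # 15 ~ 20 minutes per trace, in seconds
trace_after_s = 10                   # about 10 seconds per trace
users_before, users_after = 2, 20    # simultaneous users

# Per-trace speedup for the low and high ends of the "before" range.
speedup = tuple(t / trace_after_s for t in trace_before_s)

# Gain in simultaneous users.
concurrency_gain = users_after / users_before

# Assumed combination: speedup multiplied by concurrency gain.
total_gain = tuple(s * concurrency_gain for s in speedup)

print(speedup, concurrency_gain, total_gain)
```

The per-trace speedup comes out between 90 and 120, consistent with the rounded x100, and the product with the x10 concurrency gain lands around x1,000.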

18 Section 5: Future Product Series: ZAP-Mass Technology

19 1. ZAP-Mass Introduction
Cloud computing is the main field where Amazon / Google / Microsoft / etc. are competing. The next winner will be whoever succeeds in providing a PB-class DB platform on the cloud to users, by conquering the following problems:
1. Too slow.
2. Too few functions.
Innovation Continued, Based on Mathematical Principles
Turbo Data Laboratories, Inc. (TDL) is a company that has been developing data processing technologies that are orders of magnitude faster. TDL has been researching its technology since 1996, has raised its level every 2 or 3 years, and has now reached a critical point.
2001: ZAP-In: Big Data's Spread Sheet. x500 ~ x700 faster, at Fujitsu's benchmark.
2013: ZAP-Over: Searching / Gathering of Globally Distributed Big Data. x1,000 in total performance, at the National Tax Agency.
ZAP-Mass: PB-class Super Big Data DB System. x400,000 faster at sorting.
ZAP-Mass:
is a massive parallel Big Data DB system (Algorithm + Architecture + DB-OS + Application), with a dedicated communication chip in each server node.
can do PB-class DB processing.
can enable versatile Big Data operations by end users themselves.

20 2. ZAP-Mass: Performance Simulation at PB Table
Example DB: 1PB in total, 100 fields, 2KB / record, 500,000,000,000 records.
Example System & Architecture:
System total: 32,768 servers, 2PB memory.
Each server: 64GB memory, communication speed 50Gbps, storage 500MB/s.
Each chip-module (ZAP-Mass only): 128 chip-modules, 1GB memory, 50Gbps input, 50Gbps output.
Operations (Hadoop, etc., estimation / ZAP-Mass, estimation / Magnification):
1. Sort by int. field, 100,000 cardinality: 1,200,000 sec / 3 sec / x 400,000
2. Extraction by search, 10% hit: 8 sec / ... sec / x 4,...
3. Extraction by search, 50% hit: 40 sec / ... sec / x 20,...
4. Tabulation, occurrence in a 100,000-cardinality string field: 120 sec / 0.06 sec / x 2,000
5. N:N sort join by 1 string key with 100,000 cardinality: Hadoop, etc.: -. ZAP-Mass: 4 sec.
6. Distinct by 2 string keys, each with 100,000 cardinality: ZAP-Mass: ... sec. Magnification: -.
7. Insert or delete 1,000 records: ZAP-Mass: ... sec. Magnification: -.
Using current technologies causes severe limitations:
Sorting is too slow, so functions that use it are almost impossible.
Editing is almost impossible.
Impossible to use for common users.
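The sizing and the two fully legible magnification figures on this slide can be cross-checked with simple arithmetic. The sketch below uses only numbers printed on the slide; it assumes decimal units for the record count (1PB = 10^15 bytes, 2KB = 2,000 bytes) and binary units for memory (1PB = 1,048,576GB), which is the reading under which the slide's totals come out exact. The variable names are illustrative.

```python
# Example DB: 1PB of 2KB records (decimal units assumed).
total_bytes = 10**15
record_bytes = 2_000
records = total_bytes // record_bytes

# Example system: 32,768 servers with 64GB each (binary units assumed).
servers = 32_768
memory_per_server_gb = 64
total_memory_pb = servers * memory_per_server_gb / (1024 * 1024)

# Magnification = estimated Hadoop time / estimated ZAP-Mass time.
sort_magnification = 1_200_000 / 3
tabulation_magnification = round(120 / 0.06)

print(records, total_memory_pb, sort_magnification, tabulation_magnification)
```

The record count matches the 500,000,000,000 records stated, the memory total matches 2PB, and the sort ratio matches the x400,000 quoted in the executive summary.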

21 3. Math Switch Enables Dynamic Partitioning
Math Switch: Math Switch Theory is built on Math Index, which makes communication between nodes easy to handle in massively parallel ways.
Math Switch offers a multiple ring architecture, as shown in fig 5-1 (next page). That architecture has 2 directional symmetries.
1st. Ring-wise: the ring-wise direction assigns the pipeline length.
2nd. Inter ring-wise: the inter ring-wise direction assigns the degree of parallelism.
Math Switch offers dynamic partitioning in 2 directions (fig 5-2, fig 5-3, next page).
Math Switch can assign a task's pipeline length and degree of parallelism by changing partition sizes.
Math Switch can also control the amount of resources for each task by changing partition sizes.
Math Switch also offers preemptive task switching (see fig 5-4, next page), which has not been easy for supercomputers.

22 3. Math Switch Enables Dynamic Partitioning
fig 5-1. Multiple Ring Architecture: ring 0 (nodes n00 - n03), ring 1 (nodes n10 - n13), ring 2 (nodes n20 - n23), with data passed along each ring.
fig 5-2. Division of Ring (ring-wise). Legend: data passed to next / data not passed to next.
fig 5-3. Horizontal Division (inter ring-wise).
fig 5-4. Preemptive Task Switching, Enabled by Switching Packets to Pass.
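The two partitioning directions can be pictured with a small model. The sketch below is purely illustrative and is not the Math Switch design: it assumes the nodes form a grid of rings, uses the node naming seen in fig 5-1 (n00 to n23), and introduces hypothetical helpers `make_grid` and `partition`. A partition is a rectangular block of the grid; the number of consecutive nodes taken from each ring stands for the pipeline length, and the number of rings included stands for the degree of parallelism.

```python
def make_grid(rings, nodes_per_ring):
    """Name nodes as n<ring><position>, matching the labels in fig 5-1."""
    return [[f"n{r}{p}" for p in range(nodes_per_ring)] for r in range(rings)]


def partition(grid, ring_range, pos_range):
    """Cut a rectangular partition out of the grid.

    ring_range selects rings (inter ring-wise direction, parallelism);
    pos_range selects positions on each ring (ring-wise direction, pipeline).
    Both ranges are half-open (start, stop) pairs.
    """
    r0, r1 = ring_range
    p0, p1 = pos_range
    return [row[p0:p1] for row in grid[r0:r1]]


def shape(part):
    """Return (degree of parallelism, pipeline length) of a partition."""
    return (len(part), len(part[0]) if part else 0)


grid = make_grid(rings=3, nodes_per_ring=4)

# Two hypothetical tasks sharing the machine with different shapes.
task_a = partition(grid, ring_range=(0, 2), pos_range=(0, 2))  # 2 rings x 2 nodes
task_b = partition(grid, ring_range=(2, 3), pos_range=(0, 4))  # 1 ring x 4 nodes

print(task_a, shape(task_a))
print(task_b, shape(task_b))
```

Changing the two ranges changes how many nodes, and therefore how much CPU, memory and communication capacity, a task receives, which is one way to read the slide's statement that resources are controlled by changing partition sizes.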

23 4. PETA DB OS
A PB-class, preemptive multi-task DB OS over a massive parallel architecture.
It controls the runtime partitioning ability of Math Switch.
With that partitioning, the system becomes much more scalable, easily and meaningfully.
It can keep and manage a large number of, and many kinds of, Big Data sets, which is not easy for other Big Data systems.
It can run many tasks in many partitions.
It can switch tasks preemptively in each partition.

24 5. PETA Sheet
A Big Data spread sheet with RDB functions, featuring the following functions:
A. for browsing Big Data
B. for accessing (Cleansing / Transforming / Editing / etc.) Big Data
C. for analyzing (Statistics / Data Mining / BI / etc.) Big Data
D. for programming Big Data
E. Control panel of PETA DB OS

25 6. Summary of Zap-Mass Technology
Enables a DB system on a massive parallel computer system, employing dedicated chips (to be designed).
Composed of: Math Quark, Math Index, Math Switch, PETA DB OS, PETA Sheet.
Suitable data size: 1TByte - 100PByte.
Suitable system size: 16 servers - 1,000,000 servers or more.
Expected performance: about x10,000 that of Hadoop, at the same count of servers.

26 7. Zap-Mass Enables
ZAP-Mass enables versatile Big Data operations by end users themselves.

27 Thank you
