1Table: A System for Managing Structured Web Data. Yang Zhang, with Alon Halevy, Mike Cafarella, Nodira Khoussainova, Eugene Wu and Daisy Zhe
2 Structured Web Data
- The Web is more than just text: tables, tags, lists, etc.
- 50% of pages have tables
- 25% of tables appear to be useful data tables (relational, entity, sets, etc.)
- No existing tools effectively query this data
  - RDBMSs don't scale and process noisy data poorly
  - Search engines are structure-blind
- 1Table fills the gap!
[Chart labels: No tables, Other tables, Data tables]
3 The 1Table Project
[Diagram: The 1Table Project, with components Data Visualization, Table Search, Synthetic Table Generation, Reference Reconciliation, Schema Reconciliation]
5 1Table Project HOBO: TABLE SEARCH
6 The Quest for Infrastructure
- _: limited indexing options, inefficient structure
- _: lots of hoops, unstructured
- _: little bang for the buck, slow setup, inefficient structure
- Wanted control over query model and ranking
- Hobo: poor man's text search
7 Challenges
- Millions of tables (~100M in Core)
- Noisy: many are not data tables (layout)
- Query by: attributes? values? similar examples?
- No structured metadata
Hobo
- Similar to traditional inverted index search
- Schema-agnostic structured query model
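To make "similar to traditional inverted index search" concrete, here is a minimal sketch of an inverted index over table cells. It is an illustration only, not Hobo's implementation: the `Posting` fields, `build_index`, and `tables_containing_all` are hypothetical names, and the per-table running `offset` is an assumption about how token positions might be recorded.

```python
from collections import defaultdict
from typing import NamedTuple


class Posting(NamedTuple):
    """One occurrence of a term inside a table (field names are illustrative)."""
    table_id: int
    row: int
    col: int
    offset: int  # running token position within the table (assumed layout)


def build_index(tables):
    """Map each lowercased token to the list of places it occurs.

    tables: dict of table id -> list of rows, each row a list of cell strings.
    """
    index = defaultdict(list)
    for table_id, rows in tables.items():
        offset = 0
        for r, cells in enumerate(rows):
            for c, cell in enumerate(cells):
                for token in cell.lower().split():
                    index[token].append(Posting(table_id, r, c, offset))
                    offset += 1
    return index


def tables_containing_all(index, terms):
    """Conjunctive lookup: ids of tables that contain every term."""
    id_sets = [{p.table_id for p in index.get(t, [])} for t in terms]
    return set.intersection(*id_sets) if id_sets else set()


tables = {
    1: [["country", "code"], ["united states", "us"], ["china", "cn"]],
    2: [["city", "country"], ["paris", "france"]],
}
index = build_index(tables)
hits = tables_containing_all(index, ["united", "states"])
```

Keeping row, column, and offset in each posting is what would let a structured query later ask where in the table a term appears, rather than only whether it appears.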
8 Hobo
[Architecture diagram. Labels: Query Processor, Master, Slave 0, Slave 1, ..., Slave 499; each slave with Shard Slaves, Table, TID, TID Index; GFS]
9 Processing Pipeline
[Pipeline diagram. Stages: extraction; filtering; labeling, annotation, munging; indexing; querying. Data: docjoins, raw tables, good tables, analyzed/cleaned tables, Hobo inverted index. Components: annotation servers, Daffie, query processor]
10 Recipe: Hobo Query Model
- Start with Google.com-style conjunction of disjunctions
- Add structural primitives: terms have attributes
- Introduce binding of variables to terms
- Impose binary relational constraints (½ cup)
- Mix bindings and constraints in arbitrary boolean expressions
- Serve and enjoy
11 Query Model
and { x = united, y = states }
where x.offset + 1 = y.offset
12 Query Model
and { x = france, z = germany, y = paris }
where x.row = y.row and x.col = z.col
13 Query Model
What attributes are currently available?
- Physical: offset, col, row
- Logical: source (header/body/context)
- For ranking: size, pagerank, isdatatable, hasheaders
Easy to add more!
Fast (poly-time) constraint verifier
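The two example queries (the adjacent pair united/states, and france/paris/germany related by row and column) can be sketched as variable binding followed by constraint checking. This is a naive illustration that enumerates every combination of candidate occurrences; it is not the verifier the slides refer to, and `Occ`, `occurrences`, and `matches` are hypothetical names.

```python
from itertools import product
from typing import NamedTuple


class Occ(NamedTuple):
    """A term occurrence carrying the physical attributes row, col, offset."""
    term: str
    row: int
    col: int
    offset: int


def occurrences(rows):
    """Flatten a table (list of rows of cell strings) into term occurrences."""
    occs, offset = [], 0
    for r, cells in enumerate(rows):
        for c, cell in enumerate(cells):
            for token in cell.lower().split():
                occs.append(Occ(token, r, c, offset))
                offset += 1
    return occs


def matches(rows, bindings, constraint):
    """Return every assignment of variables to occurrences that satisfies
    the constraint. bindings maps a variable name to its required term."""
    occs = occurrences(rows)
    names = list(bindings)
    candidates = [[o for o in occs if o.term == bindings[n]] for n in names]
    results = []
    for combo in product(*candidates):
        env = dict(zip(names, combo))
        if constraint(env):
            results.append(env)
    return results


# x.offset + 1 = y.offset: "united" immediately followed by "states".
phrase = matches(
    [["united states", "us"]],
    {"x": "united", "y": "states"},
    lambda e: e["x"].offset + 1 == e["y"].offset,
)

# x.row = y.row and x.col = z.col: france beside paris, above or below germany.
capitals = [["country", "capital"], ["france", "paris"], ["germany", "berlin"]]
grid = matches(
    capitals,
    {"x": "france", "y": "paris", "z": "germany"},
    lambda e: e["x"].row == e["y"].row and e["x"].col == e["z"].col,
)
```

Enumerating the full product is fine for a handful of variables on one table, but its cost grows with the product of the candidate list sizes.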
14 Query Languages
High-level template-based query language
  example: united states china prc us cn * to
  ((("united states") (us)) ((china prc) (cn)) ((_) (to)))
parser, rewriter
Low-level constraint-based query language:
  and {
    a = and {
      a = term { united }
      b = term { states }
      where a.pos + 1 = b.pos
    }
    b = or {
      term { china }
      term { prc }
    }
    c = us
    d = cn
    e = to
    where
      a.col == b.col
      c.col == d.col
      c.col == e.col
      a.row == c.row
      b.row == d.row
  }
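The five row/column constraints in the low-level query follow mechanically from where each cell sits in the template. The sketch below shows one way a rewriter could derive them; it is an assumption about the approach, not the actual parser or rewriter. The grid encoding (a list of alternatives per cell, `None` for the wildcard) and the function name `rewrite` are hypothetical.

```python
from string import ascii_lowercase


def rewrite(grid):
    """grid[r][c] is a list of alternative phrases, or None for a wildcard.

    Returns (variables, constraints): variables maps a name to its
    alternatives; constraints are strings in the slide's notation.
    Supports at most 26 non-wildcard cells (one letter each).
    """
    n_rows, n_cols = len(grid), len(grid[0])
    names = iter(ascii_lowercase)
    variables, position = {}, {}
    # Name cells column by column, skipping wildcards.
    for c in range(n_cols):
        for r in range(n_rows):
            if grid[r][c] is not None:
                name = next(names)
                variables[name] = grid[r][c]
                position[name] = (r, c)
    constraints = []
    # Tie each variable to the first variable seen in its column, then its row.
    for axis, attr in ((1, "col"), (0, "row")):
        first_seen = {}
        for name, pos in position.items():
            anchor = first_seen.setdefault(pos[axis], name)
            if anchor != name:
                constraints.append(f"{anchor}.{attr} == {name}.{attr}")
    return variables, constraints


template = [
    [["united states"], ["us"]],
    [["china", "prc"], ["cn"]],
    [None, ["to"]],
]
variables, constraints = rewrite(template)
```

For this template the sketch yields the same five constraints, in the same order, as the low-level query on the slide.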
15 Demo!
16 Areas for Future Work
- Low-hanging performance fruit
  - O(n) constraint verification by ordering/hashing
  - Smarter concurrent iteration over inverted index
  - Query rewriting
  - More resources
- Soft constraints: not required, but used for ranking
- Frontend: richer data visualization
- Ranking of results
- Easy integration into Dataspaces
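As an illustration of what verification "by hashing" could look like for an equality constraint such as x.row = y.row, the sketch below buckets one candidate list by the constrained attribute and probes it with the other. This is a standard hash join offered as an assumption about the intended direction, not a description of planned Hobo code; `join_on_equal` and the tuple layout are hypothetical.

```python
from collections import defaultdict


def join_on_equal(xs, ys, key):
    """Pair every x with every y for which key(x) == key(y).

    Bucketing ys by key avoids comparing all len(xs) * len(ys) pairs;
    the work is linear in the inputs plus the number of matches.
    """
    buckets = defaultdict(list)
    for y in ys:
        buckets[key(y)].append(y)
    return [(x, y) for x in xs for y in buckets.get(key(x), [])]


# Candidate occurrences as (term, row, col) tuples.
france = [("france", 1, 0), ("france", 7, 0)]
paris = [("paris", 1, 1), ("paris", 4, 2)]
pairs = join_on_equal(france, paris, key=lambda occ: occ[1])  # x.row == y.row
```

Constraints like x.offset + 1 = y.offset fit the same pattern if one side's key is shifted before probing.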
17 1Table Project TABLE SUGGEST
18 Synthetic Table Generation
What country corresponds to code tr?

  united states  us
  china          cn
                 tr

  united states  us
  china          cn
  turkey         tr
  japan          jp
  ...
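A minimal sketch of how the missing cell could be filled from a corpus of tables: trust any column pair that reproduces at least one completed example row, then let those column pairs vote on the value paired with the known cell. This is an illustrative heuristic under stated assumptions, not the TableSuggest algorithm; `suggest` and the toy tables are hypothetical.

```python
from collections import Counter


def suggest(tables, examples, known):
    """Suggest the missing left value of a pair whose right value is `known`.

    tables:   list of tables, each a list of rows of lowercase cell strings
    examples: completed (left, right) pairs from the spreadsheet
    Returns candidate values for the left cell, most-voted first.
    """
    votes = Counter()
    for rows in tables:
        if not rows:
            continue
        width = max(len(row) for row in rows)
        for i in range(width):
            for j in range(width):
                if i == j:
                    continue
                pairs = {(row[i], row[j]) for row in rows
                         if len(row) > max(i, j)}
                # Only trust a column pair that reproduces a known example.
                support = sum(1 for ex in examples if ex in pairs)
                if support == 0:
                    continue
                for left, right in pairs:
                    if right == known:
                        votes[left] += support
    return [value for value, _ in votes.most_common()]


tables = [
    [["united states", "us"], ["turkey", "tr"], ["japan", "jp"]],
    [["cn", "china"], ["tr", "turkey"]],       # same relation, columns swapped
    [["tr", "trinidad"], ["xx", "nowhere"]],   # unrelated table, no support
]
examples = [("united states", "us"), ("china", "cn")]
answer = suggest(tables, examples, "tr")
```

Requiring support from an example row is what keeps the unrelated third table from contributing "trinidad".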
19 Challenges
- Inconsistent/inaccurate information
- Resolving data from multiple sources
- Ad hoc semantics
- Data with nested (sub-cell) structure, e.g. ".us (united states)", "united states/us"
20 TableSuggest Features
- Spreadsheet that suggests values to fill in
- Can draw data from _ and Google Sets, but primarily 1Table (Hobo)
- Hodgepodge of techniques (thrown in ad hoc, based on inspecting results)
  - Type enumeration (_, Hobo)
  - Set expansion (Sets, Hobo)
  - Attribute resolution (Hobo)
  - Column clustering (1Table)
21 Demo!
22 Areas for Future Work
- More principled evaluation
- Implementation infelicities
- Support for numeric queries using a two-tier indexing structure with range buckets
- Richer sub-structure extraction (lists)
- Incremental indexing with live data feeds/sources
- Tailoring to specific domains
- Entity tables
- Aggregating values in denormalized tables
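The slides do not say how the two-tier structure with range buckets would work, so the following is only one plausible reading: a first tier of fixed-width buckets narrows the search, and a second tier of exact values inside each bucket filters precisely. The class name `RangeBucketIndex`, the fixed bucket width, and the stored `(value, table_id)` pairs are all assumptions.

```python
from collections import defaultdict


class RangeBucketIndex:
    """Tier 1: fixed-width buckets keyed by floor(value / width).
    Tier 2: the exact (value, table_id) entries inside each bucket."""

    def __init__(self, width):
        self.width = width
        self.buckets = defaultdict(list)

    def add(self, value, table_id):
        self.buckets[int(value // self.width)].append((value, table_id))

    def query(self, low, high):
        """Table ids holding some value v with low <= v <= high."""
        hits = set()
        first, last = int(low // self.width), int(high // self.width)
        for b in range(first, last + 1):
            for value, table_id in self.buckets.get(b, []):
                if low <= value <= high:
                    hits.add(table_id)
        return hits


idx = RangeBucketIndex(width=10)
for value, table_id in [(3, 1), (17, 2), (25, 3), (-4, 4)]:
    idx.add(value, table_id)
in_range = idx.query(10, 25)
```

Only the buckets that overlap the query range are scanned, so a narrow range touches a small part of the index regardless of how many numeric cells were indexed.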