Overview Problem Statement Related Work KLEE The Histogram Bloom Structure Candidate Filtering Evaluation Conclusion / Future Work
|
|
- Hortense Johnston
- 6 years ago
- Views:
Transcription
1 Sebastian Michel Max-Planck Institute for Informatics Saarbrücken, Germany KLEE: Approximate Distributed Top-k Query Algorithms Peter Triantafillou RACTI / Univ. of Patras Rio, Greece peter@ceid.upatras.gr Gerhard Weikum Max-Planck Institute for Informatics Saarbrücken, Germany weikum@mpi-inf.mpg.de Max-Planck-Institut Informatik University of Patras NetCInS Lab Overview Problem Statement Related Work KLEE The Histogram Bloom Structure Candidate Filtering Evaluation Conclusion / Future Work HDMS 25, Athens 2
2 Computational Model distributed aggregation queries: Query with m terms with index lists spread across m peers P... Pm P2 P5 P P3 P4 Applications: Internet traffic monitoring Sensor networks P2P Web search HDMS 25, Athens 3 Problem Statement Query initiator P serves as per-query coordinator P t d78.9 d23.8 d.8 d.7 d88.2 P P2 t 2 d64.8 d23.6 d.6 d.2 d78. P3 t 3 d.7 d78.5 d64.4 d99.2 d34. Consider network consumption per peer load latency (query response time) network I/O processing HDMS 25, Athens 4 2
3 Related Work Existing Methods: Distributed NRA/TA: NRA/TA (Fagin et al. `99/ 3, Güntzer et al. `, Nepal et al. `99) with batched access TPUT (Cao/Wang 24): ) fetch k best entries (d, s j ) from each of P... Pm and aggregate ( j=..m s j (d)) at P 2) ask each of P... Pm for all entries with s j > and aggregate results at P 3) fetch missing s for all candidates by random lookups at P... Pm + DNRA aims to minimize per-peer work - DTA/DNRA incur many messages + TPUT guarantees fixed number of message rounds - TPUT incurs high per-peer load and net BW HDMS 25, Athens 5 TPUT Coordinator Peer P current top-kcandidate set Peer Pi candidates top k Retrieve missing s Peer Pj candidates top k Retrieve missing s... Index List... Index List HDMS 25, Athens 6 3
4 KLEE: Key Ideas if mink / m is small TPUT retrieves a lot of data in Phase 2 high network traffic random accesses high per-peer load KLEE: Different philosophy: approximate answers! Efficiency: Reduces (docid, )-pair transfers no random accesses Two pillars: The HistogramBlooms structure The Candidate List Filter structure HDMS 25, Athens 7 KLEE 3/4: The KLEE Algorithms. Exploration Step: to get a better approximation of min-k threshold 2. Optimization Step: decide: 3 or 4 phases? 3. Candidate Filtering: a docid is a good candidate if high-d in many peers. 4. Candidate Retrieval: get all good docid candidates. HDMS 25, Athens 8 4
5 Histogram Bloom Structure Each peer pre-computes for each index list: an equi-width histogram #docs + Bloom filter for each cell + average per cell + upper/lower increase the mink / m threshold HDMS 25, Athens 9 Bloom Filter bit array of size m k hash functions h i : docid_space {,..,m} insert n docs by hashing the ids and settings the corresponding bits Membership Queries: document is in the Bloom Filter if the corresponding bits are set kn / m probability of false positives (pfp) pfp = ( e ) tradeoff accuracy vs. efficiency k HDMS 25, Athens 5
6 Exploration and Candidate Retrieval Coordinator Peer P current top-kcandidate set Peer Pi candidates top k c cells b bits Histogram Peer Pj candidates top k c cells b bits Histogram... Index List... Index List HDMS 25, Athens Candidate List Filter Matrix Goal: filter out unpromising candidate documents in step 2 estimate the max number of docs that are above the mink / m threshold number of documents threshold send this number and the threshold to the cohort peers HDMS 25, Athens 2 6
7 Candidate List Filter Matrix (2) Each cohort returns a Bloom Filter that contains all docs above the mink / m threshold Candidate List Filter Matrix (CLFM) Select all columns with at least R bits set HDMS 25, Athens 3 Coordinator Peer P KLEE Candidate Set Reduction current top-k candidate set candidate filter matrix x x x Peer Pi top k Peer Pj candidates... Index List HDMS 25, Athens 4 7
8 Coordinator Peer P KLEE Candidate Retrieval current top-k candidate set candidate filter matrix x x x Peer Pi top k Peer Pj candidates... Index List early stopping point HDMS 25, Athens 5 Enhanced Filtering BF represenation can be improved (d,.9) (d2,.6) (d5,.5) (d3,.3) (d4,.25) (d7,.8) (d9,.7) Cell Cell 2 Cell 3 d cell and d2 cell but s s2 =.3! Send byte-array with cell-numbers instead of bits Select columns with Sum over upper-bounds > min-k HDMS 25, Athens 6 8
9 Architecture/Testbed 2 KLEE Algorithmic Framework 4 3 open() get(k) getabove()... getwithbf(..) open() next() close() Extended IndexLists with BloomFilters, Histograms, and Batched Access SQL B+ Index Index Lists Oracle DB HDMS 25, Athens 7 Evaluation: Benchmarks GOV: TREC.GOV collection + 5 TREC-23 Web queries, e.g. juvenile delinquency XGOV: TREC.GOV collection + 5 manually expanded queries, e.g. juvenile delinquency youth minor crime law jurisdiction offense prevention IMDB: Movie Database, queries like actor = John Wayne; genre =western Synthetic Distribution (Zipf, different skewness): GOV collection but with synthetic s Synthetic Distribution + Synthetic Correlation: index lists HDMS 25, Athens 8 9
10 Evaluation: Metrics Relative recall w.r.t. to the actual results Score error Bandwidth consumption Rank distance Number of RA and number of SA Query response time - network cost (5ms RTT, 8Kb/s data transfer rate) - local I/O cost (8ms rotation latency + 8MB/s transfer delay) - processing cost HDMS 25, Athens 9 DTA: Evaluated Algorithms batched distributed threshold algorithm, batch size k. TPUT X-TPUT: approximate TPUT. No random accesses. KLEE-3 KLEE-4 C = % of the mass HDMS 25, Athens 2
11 Synthetic Score Benchmarks Zipf-GOV Total Total Average c=% # of Bytes Time in ms Recall DTA 7,752,769 3,532,8 TPUT 53,494,93 576,73 X-TPUT 53,,252 44,99.99 KLEE 3 49,86, ,93.97 KLEE 4 25,57,92 6, Zipf-XGOV Total Total Average c=% # of Bytes Time in ms Recall DTA 67,9,26 39,582,682 TPUT 377,928,88,599,58 X-TPUT 377,97,644,52,22.98 KLEE 3 287,294,82,89,89.9 KLEE 4 65,77,87 375,77.92 Bandwidth in Bytes 2.5E+7 2.E+7.5E+7.E+7 5.E+6.E+ Zipf-XGOV, c=%, θ=.7 DTA TPUT X-TPUT KLEE 3 KLEE Number of Query Terms θ =.7 HDMS 25, Athens 2 Synthetic Correlation Benchmark Overlap+Zipf Total Total Average c=% Ω = 3% # of Bytes Time in ms Recall DTA, ,42 TPUT 9,5,94 29,27 X-TPUT 9,5,94 28,335 KLEE 3 3,678,78 2,97.92 KLEE 4,92,74 6,546.9 randomly insert top k documents from list i in the top Ω documents of list j Ω Bandwidth in Bytes,, 9,, 8,, 7,, 6,, 5,, 4,, 3,, 2,,,, Overlap, c=%, θ=.7 DTA TPUT X-TPUT KLEE 3 KLEE Ω in % HDMS 25, Athens 22
12 GOV / XGOV GOV Total Total Average c=% # of Bytes Time in ms Recall DTA,72,446 9,259 TPUT,55,29 85,49 X-TPUT 597,99 3, KLEE 3 722,664 28,39.9 KLEE 4 44,868 39,564.9 XGOV Total Total Average c=% # of Bytes Time in ms Recall DTA 92,587,264 3,74,677 TPUT 7,44,884 2,346,882 X-TPUT 9,236,84 96,53.9 KLEE 3 6,69,92 88,27.83 KLEE 4 7,92,774 56, , GOV, c=% 6,, XGOV, c=% Bandwidth in Bytes 2,, 8, 6, 4, 2, DTA TPUT X-TPUT KLEE 3 KLEE Number of Query Terms Bandwidth in Bytes 5,, 4,, 3,, 2,,,, TA TPUT X-TPUT KLEE 3 KLEE Number of Query Terms HDMS 25, Athens 23 Conclusion / Future Work Conclusion KLEE: approximate top-k algorithms for wide-area networks significant performance benefits can be enjoyed, at only small penalties in result quality flexible framework for top-k algorithms, allowing for trading-off efficiency versus result quality and bandwidth savings versus the number of communication phases. various fine-tuning parameters Future Work Reasoning about parameter values Consider moving coordinator HDMS 25, Athens 24 2
13 Thanks for your attention! Questions? Comments? HDMS 25, Athens 25 3
KLEE: A Framework for Distributed Top-k Query Algorithms
KLEE: A Framework for Distributed Top-k Query Algorithms Sebastian Michel Max-Planck Institute for Informatics Saarbrücken, Germany smichel@mpi-inf.mpg.de Peter Triantafillou RACTI / Univ. of Patras Rio,
More informationKLEE: A Framework for Distributed Top-k Query Algorithms
KLEE: A Framework for Distributed Top-k Query Algorithms Sebastian Michel Max-Planck Institute for Informatics Saarbrücken, Germany smichel@mpi-inf.mpg.de Peter Triantafillou University of Patras Rio,
More informationImproving Collection Selection with Overlap Awareness in P2P Search Engines
Improving Collection Selection with Overlap Awareness in P2P Search Engines Matthias Bender Peter Triantafillou Gerhard Weikum Christian Zimmer and Improving Collection Selection with Overlap Awareness
More informationAlgebraic Query Optimization for Distributed Top-k Queries
Algebraic Query Optimization for Distributed Top-k Queries Thomas Neumann, Sebastian Michel Max-Planck-Institut für Informatik Saarbrücken, Germany {neumann, smichel}@mpi-inf.mpg.de Abstract: Distributed
More informationIO-Top-k at TREC 2006: Terabyte Track
IO-Top-k at TREC 2006: Terabyte Track Holger Bast Debapriyo Majumdar Ralf Schenkel Martin Theobald Gerhard Weikum Max-Planck-Institut für Informatik, Saarbrücken, Germany {bast,deb,schenkel,mtb,weikum}@mpi-inf.mpg.de
More informationSmooth Interpolating Histograms with Error Guarantees
Smooth Interpolating Histograms with Error Guarantees Thomas Neumann 1 and Sebastian Michel 2 1 Max-Planck-Institut Informatik, Saarbrücken, Germany neumann@mpi-inf.mpg.de 2 École Polytechnique Fédérale
More informationInformation Retrieval and Filtering over Self-Organising Digital Libraries
Information Retrieval and Filtering over Self-Organising Digital Libraries Paraskevi Raftopoulou 1,2, Euripides G.M. Petrakis 2, Christos Tryfonopoulos 1, and Gerhard Weikum 1 1 Max-Planck Institute for
More informationRIPPLE: A Scalable Framework for Distributed Processing of Rank Queries
RIPPLE: A Scalable Framework for Distributed Processing of Rank Queries George Tsatsanifos (NTUA) Dimitris Sacharidis ( Athena RC) Timos Sellis (RMIT) 17th International Conference on Extending Database
More informationReplication, Load Balancing and Efficient Range Query Processing in DHTs
Replication, Load Balancing and Efficient Range Query Processing in DHTs Theoni Pitoura, Nikos Ntarmos, and Peter Triantafillou R.A. Computer Technology Institute and Computer Engineering & Informatics
More informationMINERVA : A Scalable Efficient Peer-to-Peer Search Engine
MINERVA : A Scalable Efficient Peer-to-Peer Search Engine Sebastian Michel 1, Peter Triantafillou 2,andGerhardWeikum 1 1 Max-Planck-Institut für Informatik, 66123 Saarbrücken, Germany {smichel, weikum}@mpi-inf.mpg.de
More informationFast Approximations for Analyzing Ten Trillion Cells. Filip Buruiana Reimar Hofmann
Fast Approximations for Analyzing Ten Trillion Cells Filip Buruiana (filipb@google.com) Reimar Hofmann (reimar.hofmann@hs-karlsruhe.de) Outline of the Talk Interactive analysis at AdSpam @ Google Trade
More informationA Privacy Preserving Model for Ownership Indexing in Distributed Storage Systems
A Privacy Preserving Model for Ownership Indexing in Distributed Storage Systems Tiejian Luo tjluo@ucas.ac.cn Zhu Wang wangzhubj@gmail.com Xiang Wang wangxiang11@mails.ucas.ac.cn ABSTRACT The indexing
More informationMemory hierarchy review. ECE 154B Dmitri Strukov
Memory hierarchy review ECE 154B Dmitri Strukov Outline Cache motivation Cache basics Six basic optimizations Virtual memory Cache performance Opteron example Processor-DRAM gap in latency Q1. How to deal
More informationSigniTrend: Scalable Detection of Emerging Topics in Textual Streams by Hashed Significance Thresholds
SigniTrend: Scalable Detection of Emerging Topics in Textual Streams by Hashed Significance Thresholds Erich Schubert, Michael Weiler, Hans-Peter Kriegel! Institute of Informatics Database Systems Group
More informationAn Efficient and Versatile Query Engine for TopX Search
An Efficient and Versatile Query Engine for TopX Search Martin Theobald, Ralf Schenkel, Gerhard Weikum Max-Planck Institute for Informatics Stuhlsatzenhausweg 85, D-66123 Saarbruecken, Germany {mtb, schenkel,
More informationCounting at Large: Efficient Cardinality Estimation in Internet-Scale Data Networks
Counting at Large: Efficient Cardinality Estimation in Internet-Scale Data etworks ikos tarmos R.A.C.T.I. and C.E.I.D. University of Patras Rio, Greece ntarmos@ceid.upatras.gr Peter Triantafillou R.A.C.T.I.
More informationBest-Offset Hardware Prefetching
Best-Offset Hardware Prefetching Pierre Michaud March 2016 2 BOP: yet another data prefetcher Contribution: offset prefetcher with new mechanism for setting the prefetch offset dynamically - Improvement
More informationSearch Engines. Provide a ranked list of documents. May provide relevance scores. May have performance information.
Search Engines Provide a ranked list of documents. May provide relevance scores. May have performance information. 3 External Metasearch Metasearch Engine Search Engine A Search Engine B Search Engine
More informationSummary Cache based Co-operative Proxies
Summary Cache based Co-operative Proxies Project No: 1 Group No: 21 Vijay Gabale (07305004) Sagar Bijwe (07305023) 12 th November, 2007 1 Abstract Summary Cache based proxies cooperate behind a bottleneck
More informationSCREAM: Sketch Resource Allocation for Software-defined Measurement
SCREAM: Sketch Resource Allocation for Software-defined Measurement (CoNEXT 15) Masoud Moshref, Minlan Yu, Ramesh Govindan, Amin Vahdat Measurement is Crucial for Network Management Network Management
More informationImproving the Performance of OLAP Queries Using Families of Statistics Trees
Improving the Performance of OLAP Queries Using Families of Statistics Trees Joachim Hammer Dept. of Computer and Information Science University of Florida Lixin Fu Dept. of Mathematical Sciences University
More informationDesign of Experiments - Terminology
Design of Experiments - Terminology Response variable Measured output value E.g. total execution time Factors Input variables that can be changed E.g. cache size, clock rate, bytes transmitted Levels Specific
More informationIntegrating rankings: Problem statement
Integrating rankings: Problem statement Each object has m grades, oneforeachofm criteria. The grade of an object for field i is x i. Normally assume 0 x i 1. Typically evaluations based on different criteria
More informationSystems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2012/13
Systems Infrastructure for Data Science Web Science Group Uni Freiburg WS 2012/13 Data Stream Processing Topics Model Issues System Issues Distributed Processing Web-Scale Streaming 3 System Issues Architecture
More informationSummary Cache: A Scalable Wide-Area Web Cache Sharing Protocol
Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol Li Fan, Pei Cao and Jussara Almeida University of Wisconsin-Madison Andrei Broder Compaq/DEC System Research Center Why Web Caching One of
More informationImprove Web Application Performance with Zend Platform
Improve Web Application Performance with Zend Platform Shahar Evron Zend Sr. PHP Specialist Copyright 2007, Zend Technologies Inc. Agenda Benchmark Setup Comprehensive Performance Multilayered Caching
More informationCS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 3
CS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 3 Instructors: Krste Asanović & Randy H. Katz http://inst.eecs.berkeley.edu/~cs61c/ 10/19/17 Fall 2017 - Lecture #16 1 Parallel
More informationCS6200 Information Retrieval. David Smith College of Computer and Information Science Northeastern University
CS6200 Information Retrieval David Smith College of Computer and Information Science Northeastern University Indexing Process!2 Indexes Storing document information for faster queries Indexes Index Compression
More informationBDCC: Exploiting Fine-Grained Persistent Memories for OLAP. Peter Boncz
BDCC: Exploiting Fine-Grained Persistent Memories for OLAP Peter Boncz NVRAM System integration: NVMe: block devices on the PCIe bus NVDIMM: persistent RAM, byte-level access Low latency Lower than Flash,
More informationSearch Evaluation. Tao Yang CS293S Slides partially based on text book [CMS] [MRS]
Search Evaluation Tao Yang CS293S Slides partially based on text book [CMS] [MRS] Table of Content Search Engine Evaluation Metrics for relevancy Precision/recall F-measure MAP NDCG Difficulties in Evaluating
More informationMemory hier ar hier ch ar y ch rev re i v e i w e ECE 154B Dmitri Struko Struk v o
Memory hierarchy review ECE 154B Dmitri Strukov Outline Cache motivation Cache basics Opteron example Cache performance Six basic optimizations Virtual memory Processor DRAM gap (latency) Four issue superscalar
More informationgsketch: On Query Estimation in Graph Streams
gsketch: On Query Estimation in Graph Streams Peixiang Zhao (Florida State University) Charu C. Aggarwal (IBM Research, Yorktown Heights) Min Wang (HP Labs, China) Istanbul, Turkey, August, 2012 Synopsis
More informationA Detailed Evaluation of Threshold Algorithms for Answering Top-k queries in Peer-to-Peer Networks
TECHNICAL REPORT: ICS-FORTH/TR-407, JULY 2010, COPYRIGHT BY INSTITUTE OF COMPUTER SCIENCE, FOUNDATION FOR RESEARCH & TECHNOLOGY - HELLAS [2010] A Detailed Evaluation of Threshold Algorithms for Answering
More informationIndexing Web pages. Web Search: Indexing Web Pages. Indexing the link structure. checkpoint URL s. Connectivity Server: Node table
Indexing Web pages Web Search: Indexing Web Pages CPS 296.1 Topics in Database Systems Indexing the link structure AltaVista Connectivity Server case study Bharat et al., The Fast Access to Linkage Information
More informationMaking Session Stores More Intelligent KYLE J. DAVIS TECHNICAL MARKETING MANAGER REDIS LABS
Making Session Stores More Intelligent KYLE J. DAVIS TECHNICAL MARKETING MANAGER REDIS LABS What is a session store? A session store is An chunk of data that is connected to one user of a service user
More informationOracle Enterprise Manager 12 c : ASH in 3D
Oracle Enterprise Manager 12 c : ASH in 3D Deba Chatterjee and John Beresniewicz, Oracle Anton Topurov, CERN Agenda The DB Time Data Cube What is DB Time? ASH fundamentals Populating
More informationJ. Parallel Distrib. Comput.
J. Parallel Distrib. Comput. 71 (011) 30 315 Contents lists available at ScienceDirect J. Parallel Distrib. Comput. journal homepage: www.elsevier.com/locate/jpdc Top-k vectorial aggregation queries in
More informationOne-Pass Streaming Algorithms
One-Pass Streaming Algorithms Theory and Practice Complaints and Grievances about theory in practice Disclaimer Experiences with Gigascope. A practitioner s perspective. Will be using my own implementations,
More information15-744: Computer Networking. Overview. Queuing Disciplines. TCP & Routers. L-6 TCP & Routers
TCP & Routers 15-744: Computer Networking RED XCP Assigned reading [FJ93] Random Early Detection Gateways for Congestion Avoidance [KHR02] Congestion Control for High Bandwidth-Delay Product Networks L-6
More information10/19/17. You Are Here! Review: Direct-Mapped Cache. Typical Memory Hierarchy
CS 6C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 3 Instructors: Krste Asanović & Randy H Katz http://insteecsberkeleyedu/~cs6c/ Parallel Requests Assigned to computer eg, Search
More informationTrack Join. Distributed Joins with Minimal Network Traffic. Orestis Polychroniou! Rajkumar Sen! Kenneth A. Ross
Track Join Distributed Joins with Minimal Network Traffic Orestis Polychroniou Rajkumar Sen Kenneth A. Ross Local Joins Algorithms Hash Join Sort Merge Join Index Join Nested Loop Join Spilling to disk
More informationHow Flash-Based Storage Performs on Real Applications Session 102-C
How Flash-Based Storage Performs on Real Applications Session 102-C Dennis Martin, President August 2016 1 Agenda About Demartek Enterprise Datacenter Environments Storage Performance Metrics Synthetic
More informationInterdomain Routing Design for MobilityFirst
Interdomain Routing Design for MobilityFirst October 6, 2011 Z. Morley Mao, University of Michigan In collaboration with Mike Reiter s group 1 Interdomain routing design requirements Mobility support Network
More informationScalable Enterprise Networks with Inexpensive Switches
Scalable Enterprise Networks with Inexpensive Switches Minlan Yu minlanyu@cs.princeton.edu Princeton University Joint work with Alex Fabrikant, Mike Freedman, Jennifer Rexford and Jia Wang 1 Enterprises
More informationCS 425 / ECE 428 Distributed Systems Fall 2015
CS 425 / ECE 428 Distributed Systems Fall 2015 Indranil Gupta (Indy) Measurement Studies Lecture 23 Nov 10, 2015 Reading: See links on website All Slides IG 1 Motivation We design algorithms, implement
More informationCS 268: Computer Networking
CS 268: Computer Networking L-6 Router Congestion Control TCP & Routers RED XCP Assigned reading [FJ93] Random Early Detection Gateways for Congestion Avoidance [KHR02] Congestion Control for High Bandwidth-Delay
More informationMemC3: MemCache with CLOCK and Concurrent Cuckoo Hashing
MemC3: MemCache with CLOCK and Concurrent Cuckoo Hashing Bin Fan (CMU), Dave Andersen (CMU), Michael Kaminsky (Intel Labs) NSDI 2013 http://www.pdl.cmu.edu/ 1 Goal: Improve Memcached 1. Reduce space overhead
More informationInvestigating the Use of Synchronized Clocks in TCP Congestion Control
Investigating the Use of Synchronized Clocks in TCP Congestion Control Michele Weigle (UNC-CH) November 16-17, 2001 Univ. of Maryland Symposium The Problem TCP Reno congestion control reacts only to packet
More informationSIGNET: NETWORK-ON-CHIP FILTERING FOR COARSE VECTOR DIRECTORIES. Natalie Enright Jerger University of Toronto
SIGNET: NETWORK-ON-CHIP FILTERING FOR COARSE VECTOR DIRECTORIES University of Toronto Interaction of Coherence and Network 2 Cache coherence protocol drives network-on-chip traffic Scalable coherence protocols
More informationOverview Computer Networking Lecture 16: Delivering Content: Peer to Peer and CDNs Peter Steenkiste
Overview 5-44 5-44 Computer Networking 5-64 Lecture 6: Delivering Content: Peer to Peer and CDNs Peter Steenkiste Web Consistent hashing Peer-to-peer Motivation Architectures Discussion CDN Video Fall
More informationLB-MAP: LOAD-BALANCED MIDDLEBOX ASSIGNMENT IN POLICY-DRIVEN DATA CENTERS
LB-MAP: LOAD-BALANCED MIDDLEBOX ASSIGNMENT IN POLICY-DRIVEN DATA CENTERS MANAR ALQARNI DEPARTMENT OF COMPUTER SCIENCE CALIFORNIA STATE UNIVERSITY DOMINGUEZ HILLS 1 INTRODUCTION - Middleboxes network appliances
More informationPerformance Analysis and Culling Algorithms
Performance Analysis and Culling Algorithms Michael Doggett Department of Computer Science Lund University 2009 Tomas Akenine-Möller and Michael Doggett 1 Assignment 2 Sign up for Pluto labs on the web
More informationHPE ProLiant DL580 Gen10 and Ultrastar SS300 SSD 195TB Microsoft SQL Server Data Warehouse Fast Track Reference Architecture
WHITE PAPER HPE ProLiant DL580 Gen10 and Ultrastar SS300 SSD 195TB Microsoft SQL Server Data Warehouse Fast Track Reference Architecture BASED ON MICROSOFT SQL SERVER 2017 DATA WAREHOUSE FAST TRACK REFERENCE
More informationAgenda. Cache-Memory Consistency? (1/2) 7/14/2011. New-School Machine Structures (It s a bit more complicated!)
7/4/ CS 6C: Great Ideas in Computer Architecture (Machine Structures) Caches II Instructor: Michael Greenbaum New-School Machine Structures (It s a bit more complicated!) Parallel Requests Assigned to
More informationCopyright 2018, Oracle and/or its affiliates. All rights reserved.
Beyond SQL Tuning: Insider's Guide to Maximizing SQL Performance Monday, Oct 22 10:30 a.m. - 11:15 a.m. Marriott Marquis (Golden Gate Level) - Golden Gate A Ashish Agrawal Group Product Manager Oracle
More informationCombining Fuzzy Information - Top-k Query Algorithms. Sanjay Kulhari
Combining Fuzzy Information - Top-k Query Algorithms Sanjay Kulhari Outline Definitions Objects, Attributes and Scores Querying Fuzzy Data Top-k query algorithms Naïve Algorithm Fagin s Algorithm (FA)
More information! Parallel machines are becoming quite common and affordable. ! Databases are growing increasingly large
Chapter 20: Parallel Databases Introduction! Introduction! I/O Parallelism! Interquery Parallelism! Intraquery Parallelism! Intraoperation Parallelism! Interoperation Parallelism! Design of Parallel Systems!
More informationChapter 20: Parallel Databases
Chapter 20: Parallel Databases! Introduction! I/O Parallelism! Interquery Parallelism! Intraquery Parallelism! Intraoperation Parallelism! Interoperation Parallelism! Design of Parallel Systems 20.1 Introduction!
More informationChapter 20: Parallel Databases. Introduction
Chapter 20: Parallel Databases! Introduction! I/O Parallelism! Interquery Parallelism! Intraquery Parallelism! Intraoperation Parallelism! Interoperation Parallelism! Design of Parallel Systems 20.1 Introduction!
More informationIEEE Energy Efficient Ethernet. Beyond the PHY
Energy Efficient Ethernet Beyond the PHY Power savings in networked systems Rudy Klecka, Hugh Barrass (Cisco) Mai 2007, Genève, Suisse. 1 A wider view EEE study group has discussed saving power in the
More informationOptimizing near duplicate detection for peer-to-peer networks
Optimizing near duplicate detection for peer-to-peer networks Odysseas Papapetrou* Stefan Siersdorfer Sukriti Ramesh Wolfgang Nejdl Introduction Near duplicate on the content level: near duplicates: resources
More informationarxiv: v3 [cs.ni] 3 May 2017
Modeling Request Patterns in VoD Services with Recommendation Systems Samarth Gupta and Sharayu Moharir arxiv:1609.02391v3 [cs.ni] 3 May 2017 Department of Electrical Engineering, Indian Institute of Technology
More informationRecommender Systems. Techniques of AI
Recommender Systems Techniques of AI Recommender Systems User ratings Collect user preferences (scores, likes, purchases, views...) Find similarities between items and/or users Predict user scores for
More informationLimitations of XPath & XQuery in an Environment with Diverse Schemes
Exploiting Structure, Annotation, and Ontological Knowledge for Automatic Classification of XML-Data Martin Theobald, Ralf Schenkel, and Gerhard Weikum Saarland University Saarbrücken, Germany 23.06.2003
More informationThe OceanStore Write Path
The OceanStore Write Path Sean C. Rhea John Kubiatowicz University of California, Berkeley June 11, 2002 Introduction: the OceanStore Write Path Introduction: the OceanStore Write Path The Inner Ring Acts
More informationBe Fast, Cheap and in Control with SwitchKV Xiaozhou Li
Be Fast, Cheap and in Control with SwitchKV Xiaozhou Li Raghav Sethi Michael Kaminsky David G. Andersen Michael J. Freedman Goal: fast and cost-effective key-value store Target: cluster-level storage for
More informationEngineering at Scale. Paul Baecke
Engineering at Scale THE CHALLENGES OF PREDICTING QUERIES IN WEB SEARCH ENGINES Paul Baecke Introduction How is what we do Extreme Computing? What is the product Complexity online Complexity offline
More informationPower of Slicing in Internet Flow Measurement. Ramana Rao Kompella Cristian Estan
Power of Slicing in Internet Flow Measurement Ramana Rao Kompella Cristian Estan 1 IP Network Management Network Operator What is happening in my network? How much traffic flows towards a given destination?
More informationCourse Administration
Spring 207 EE 363: Computer Organization Chapter 5: Large and Fast: Exploiting Memory Hierarchy - Avinash Kodi Department of Electrical Engineering & Computer Science Ohio University, Athens, Ohio 4570
More informationLow Latency via Redundancy
Low Latency via Redundancy Ashish Vulimiri, Philip Brighten Godfrey, Radhika Mittal, Justine Sherry, Sylvia Ratnasamy, Scott Shenker Presenter: Meng Wang 2 Low Latency Is Important Injecting just 400 milliseconds
More informationDonnybrook: Enabling Large-Scale, High-Speed, Peer-to-Peer Games
Donnybrook: Enabling Large-Scale, High-Speed, Peer-to-Peer Games Ashwin Bharambe Jeffrey Pang Srinivasan Seshan Xinyu Zhuang John R. Douceur Jacob R. Lorch Thomas Moscibroda Donnybrook Jeffrey Pang (CMU)
More informationInformation Retrieval
Information Retrieval ETH Zürich, Fall 2012 Thomas Hofmann LECTURE 6 EVALUATION 24.10.2012 Information Retrieval, ETHZ 2012 1 Today s Overview 1. User-Centric Evaluation 2. Evaluation via Relevance Assessment
More informationChapter 18: Parallel Databases
Chapter 18: Parallel Databases Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 18: Parallel Databases Introduction I/O Parallelism Interquery Parallelism Intraquery
More informationChapter 18: Parallel Databases. Chapter 18: Parallel Databases. Parallelism in Databases. Introduction
Chapter 18: Parallel Databases Chapter 18: Parallel Databases Introduction I/O Parallelism Interquery Parallelism Intraquery Parallelism Intraoperation Parallelism Interoperation Parallelism Design of
More informationDesign and Performance Analysis of a DRAM-based Statistics Counter Array Architecture
Design and Performance Analysis of a DRAM-based Statistics Counter Array Architecture Chuck Zhao 1 Hao Wang 2 Bill Lin 2 Jim Xu 1 1 Georgia Institute of Technology 2 University of California, San Diego
More informationORACLE TRAINING CURRICULUM. Relational Databases and Relational Database Management Systems
ORACLE TRAINING CURRICULUM Relational Database Fundamentals Overview of Relational Database Concepts Relational Databases and Relational Database Management Systems Normalization Oracle Introduction to
More informationCSCI 5417 Information Retrieval Systems. Jim Martin!
CSCI 5417 Information Retrieval Systems Jim Martin! Lecture 7 9/13/2011 Today Review Efficient scoring schemes Approximate scoring Evaluating IR systems 1 Normal Cosine Scoring Speedups... Compute the
More informationAggregation and Degradation in JetStream: Streaming Analytics in the Wide Area
Aggregation and Degradation in JetStream: Streaming Analytics in the Wide Area Ariel Rabkin Princeton University asrabkin@cs.princeton.edu Work done with Matvey Arye, Siddhartha Sen, Vivek S. Pai, and
More informationRevisiting Aggregation for Data Intensive Applications: A Performance Study
manuscript No. (will be inserted by the editor) Revisiting Aggregation for Data Intensive Applications: A Performance Study Jian Wen Vinayak R. Borkar Michael J. Carey Vassilis J. Tsotras Aggregation has
More informationEfficient Lists Intersection by CPU- GPU Cooperative Computing
Efficient Lists Intersection by CPU- GPU Cooperative Computing Di Wu, Fan Zhang, Naiyong Ao, Gang Wang, Xiaoguang Liu, Jing Liu Nankai-Baidu Joint Lab, Nankai University Outline Introduction Cooperative
More informationA STUDY OF GOSSIP ALGORITHMS FOR INTERNET-SCALE CARDINALITY ESTIMATION OF DISTRIBUTED XML DATA
MS Thesis Defense of Vasil Slavov 1 A STUDY OF GOSSIP ALGORITHMS FOR INTERNET-SCALE CARDINALITY ESTIMATION OF DISTRIBUTED XML DATA Vasil G. Slavov Computer Science & Electrical Engineering University of
More informationManaging Redundant Content in Bandwidth Constrained Wireless Networks
Managing Redundant Content in Bandwidth Constrained Wireless Networks Tuan Dao, Amit K. Roy- Chowdhury, Srikanth V. Krishnamurthy U.C. Riverside Harsha V. Madhyastha University of Michigan Tom La Porta
More informationEvaluation. David Kauchak cs160 Fall 2009 adapted from:
Evaluation David Kauchak cs160 Fall 2009 adapted from: http://www.stanford.edu/class/cs276/handouts/lecture8-evaluation.ppt Administrative How are things going? Slides Points Zipf s law IR Evaluation For
More informationSecurity Control Methods for Statistical Database
Security Control Methods for Statistical Database Li Xiong CS573 Data Privacy and Security Statistical Database A statistical database is a database which provides statistics on subsets of records OLAP
More informationP6 Compression Server White Paper Release 8.2 December 2011 Copyright Oracle Primavera P6 Compression Server White Paper Copyright 2005, 2011, Oracle and/or its affiliates. All rights reserved. Oracle
More informationFast Design Space Subsetting. University of Florida Electrical and Computer Engineering Department Embedded Systems Lab
Fast Design Space Subsetting University of Florida Electrical and Computer Engineering Department Embedded Systems Lab Motivation & Greater Impact Energy & Data Centers Estimated¹ energy by servers data
More informationObject Detection Design challenges
Object Detection Design challenges How to efficiently search for likely objects Even simple models require searching hundreds of thousands of positions and scales Feature design and scoring How should
More informationDatabase Applications (15-415)
Database Applications (15-415) DBMS Internals- Part VI Lecture 14, March 12, 2014 Mohammad Hammoud Today Last Session: DBMS Internals- Part V Hash-based indexes (Cont d) and External Sorting Today s Session:
More informationChapter 13: Query Optimization. Chapter 13: Query Optimization
Chapter 13: Query Optimization Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 13: Query Optimization Introduction Equivalent Relational Algebra Expressions Statistical
More informationPivoting M-tree: A Metric Access Method for Efficient Similarity Search
Pivoting M-tree: A Metric Access Method for Efficient Similarity Search Tomáš Skopal Department of Computer Science, VŠB Technical University of Ostrava, tř. 17. listopadu 15, Ostrava, Czech Republic tomas.skopal@vsb.cz
More informationGeneric Topology Mapping Strategies for Large-scale Parallel Architectures
Generic Topology Mapping Strategies for Large-scale Parallel Architectures Torsten Hoefler and Marc Snir Scientific talk at ICS 11, Tucson, AZ, USA, June 1 st 2011, Hierarchical Sparse Networks are Ubiquitous
More informationOverlay and P2P Networks. Unstructured networks. PhD. Samu Varjonen
Overlay and P2P Networks Unstructured networks PhD. Samu Varjonen 25.1.2016 Contents Unstructured networks Last week Napster Skype This week: Gnutella BitTorrent P2P Index It is crucial to be able to find
More informationQLIKVIEW SCALABILITY BENCHMARK WHITE PAPER
QLIKVIEW SCALABILITY BENCHMARK WHITE PAPER Hardware Sizing Using Amazon EC2 A QlikView Scalability Center Technical White Paper June 2013 qlikview.com Table of Contents Executive Summary 3 A Challenge
More informationLearning Objectives : This chapter provides an introduction to performance tuning scenarios and its tools.
Oracle Performance Tuning Oracle Performance Tuning DB Oracle Wait Category Wait AWR Cloud Controller Share Pool Tuning 12C Feature RAC Server Pool.1 New Feature in 12c.2.3 Basic Tuning Tools Learning
More informationTopology and affinity aware hierarchical and distributed load-balancing in Charm++
Topology and affinity aware hierarchical and distributed load-balancing in Charm++ Emmanuel Jeannot, Guillaume Mercier, François Tessier Inria - IPB - LaBRI - University of Bordeaux - Argonne National
More informationCategory vs. instance recognition
Category vs. instance recognition Category: Find all the people Find all the buildings Often within a single image Often sliding window Instance: Is this face James? Find this specific famous building
More informationCompressed local descriptors for fast image and video search in large databases
Compressed local descriptors for fast image and video search in large databases Matthijs Douze2 joint work with Hervé Jégou1, Cordelia Schmid2 and Patrick Pérez3 1: INRIA Rennes, TEXMEX team, France 2:
More informationTopX and XXL at INEX 2005
TopX and XXL at INEX 2005 Martin Theobald, Ralf Schenkel, and Gerhard Weikum Max-Planck-Institut für Informatik, Saarbrücken, Germany {mtb, schenkel, weikum}@mpi-inf.mpg.de Abstract. We participated with
More informationChapter 2: Memory Hierarchy Design Part 2
Chapter 2: Memory Hierarchy Design Part 2 Introduction (Section 2.1, Appendix B) Caches Review of basics (Section 2.1, Appendix B) Advanced methods (Section 2.3) Main Memory Virtual Memory Fundamental
More informationPERSONALIZED TAG RECOMMENDATION
PERSONALIZED TAG RECOMMENDATION Ziyu Guan, Xiaofei He, Jiajun Bu, Qiaozhu Mei, Chun Chen, Can Wang Zhejiang University, China Univ. of Illinois/Univ. of Michigan 1 Booming of Social Tagging Applications
More information