Computational Social Choice in the Cloud

Similar documents
Winner Determination in Huge Elections with MapReduce

Computing the Schulze Method for Large-Scale Preference Data Sets

Parallel HITS Algorithm Implemented Using HADOOP GIRAPH Framework to resolve Big Data Problem

modern database systems lecture 10 : large-scale graph processing

Possibilities of Voting

One Trillion Edges. Graph processing at Facebook scale

Big Data. Big Data Analyst. Big Data Engineer. Big Data Architect

CS 347 Parallel and Distributed Data Processing

CS 347 Parallel and Distributed Data Processing

Thrill : High-Performance Algorithmic

Graph Data Management

MATH 1340 Mathematics & Politics

Voting. Xiaoyue Zhang

iihadoop: an asynchronous distributed framework for incremental iterative computations

Research challenges in data-intensive computing The Stratosphere Project Apache Flink

/ Cloud Computing. Recitation 13 April 14 th 2015

Practical Algorithms for Computing STV and Other Multi-Round Voting Rules

Apache Spark is a fast and general-purpose engine for large-scale data processing Spark aims at achieving the following goals in the Big data context

Lecture Map-Reduce. Algorithms. By Marina Barsky Winter 2017, University of Toronto

Distributed Graph Algorithms

MATH 1340 Mathematics & Politics

Distributed Systems. 21. Graph Computing Frameworks. Paul Krzyzanowski. Rutgers University. Fall 2016

Using Statistics for Computing Joins with MapReduce

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective

Sublinear Models for Streaming and/or Distributed Data

/ Cloud Computing. Recitation 13 April 12 th 2016

King Abdullah University of Science and Technology. CS348: Cloud Computing. Large-Scale Graph Processing

Distributed Graph Storage. Veronika Molnár, UZH

TI2736-B Big Data Processing. Claudia Hauff

Apache Giraph. for applications in Machine Learning & Recommendation Systems. Maria Novartis

Hadoop, Yarn and Beyond

The Plurality-with-Elimination Method

GraphCEP Real-Time Data Analytics Using Parallel Complex Event and Graph Processing

PREGEL AND GIRAPH. Why Pregel? Processing large graph problems is challenging Options

Voting Methods and Colluding Voters

Apache Giraph: Facebook-scale graph processing infrastructure. 3/31/2014 Avery Ching, Facebook GDM

Databases 2 (VU) ( / )

DATA SCIENCE USING SPARK: AN INTRODUCTION

Spark: A Brief History.

hyperx: scalable hypergraph processing

Distributed Systems. 20. Other parallel frameworks. Paul Krzyzanowski. Rutgers University. Fall 2017

Cloud Computing 3. CSCI 4850/5850 High-Performance Computing Spring 2018

CS November 2017

CS /21/2016. Paul Krzyzanowski 1. Can we make MapReduce easier? Distributed Systems. Apache Pig. Apache Pig. Pig: Loading Data.

15-388/688 - Practical Data Science: Big data and MapReduce. J. Zico Kolter Carnegie Mellon University Spring 2018

Graph-Processing Systems. (focusing on GraphChi)

Lecture 11: Graph algorithms! Claudia Hauff (Web Information Systems)!

I ++ Mapreduce: Incremental Mapreduce for Mining the Big Data

Spark Overview. Professor Sasu Tarkoma.

Big Graph Processing. Fenggang Wu Nov. 6, 2016

Big Data Infrastructures & Technologies

Distributed Machine Learning" on Spark

Department of Computer Science San Marcos, TX Report Number TXSTATE-CS-TR Clustering in the Cloud. Xuan Wang

Data Platforms and Pattern Mining

Pregel. Ali Shah

MapReduce Spark. Some slides are adapted from those of Jeff Dean and Matei Zaharia

Clash of the Titans: MapReduce vs. Spark for Large Scale Data Analytics

Big Data Hadoop Course Content

Mizan: A System for Dynamic Load Balancing in Large-scale Graph Processing

Authors: Malewicz, G., Austern, M. H., Bik, A. J., Dehnert, J. C., Horn, L., Leiser, N., Czjkowski, G.

CSE 444: Database Internals. Lecture 23 Spark

2/26/2017. Originally developed at the University of California - Berkeley's AMPLab

Clustering Large scale data using MapReduce

Preference Elicitation for Single Crossing Domain

Tegra Time-evolving Graph Processing on Commodity Clusters

Search Engines. Provide a ranked list of documents. May provide relevance scores. May have performance information.

Determining the k in k-means with MapReduce

Big Data Architect.

Fast Connected Components Computation in Large Graphs by Vertex Pruning

MATE-EC2: A Middleware for Processing Data with Amazon Web Services

Data Reduction and Problem Kernels for Voting Problems

Data Partitioning and MapReduce

Cloud Computing. Hwajung Lee. Key Reference: Prof. Jong-Moon Chung s Lecture Notes at Yonsei University

Data Clustering on the Parallel Hadoop MapReduce Model. Dimitrios Verraros

Automatic Scaling Iterative Computations. Aug. 7 th, 2012

Hadoop An Overview. - Socrates CCDH

An exceedingly high-level overview of ambient noise processing with Spark and Hadoop

Outline. Distributed File System Map-Reduce The Computational Model Map-Reduce Algorithm Evaluation Computing Joins

Hybrid MapReduce Workflow. Yang Ruan, Zhenhua Guo, Yuduo Zhou, Judy Qiu, Geoffrey Fox Indiana University, US

Graph-Parallel Problems. ML in the Context of Parallel Architectures

Data-Intensive Distributed Computing

Iterative Voting Rules

TI2736-B Big Data Processing. Claudia Hauff

Distributed Systems. 21. Other parallel frameworks. Paul Krzyzanowski. Rutgers University. Fall 2018

Key aspects of cloud computing. Towards fuller utilization. Two main sources of resource demand. Cluster Scheduling

Cluster Computing Architecture. Intel Labs

CS November 2018

Data Processing at Scale (CSE 511)

PREGEL: A SYSTEM FOR LARGE-SCALE GRAPH PROCESSING

The Pairwise-Comparison Method

Large-Scale Graph Processing 1: Pregel & Apache Hama Shiow-yang Wu ( 吳秀陽 ) CSIE, NDHU, Taiwan, ROC

Semi-Structured Data Management (CSE 511)

Available online at ScienceDirect. Procedia Computer Science 79 (2016 )

Data Analytics on RAMCloud

Hadoop Execution Environment

The Datacenter Needs an Operating System

Batch & Stream Graph Processing with Apache Flink. Vasia

Challenges for Data Driven Systems

Cloud, Big Data & Linear Algebra

Experimental Analysis of Distributed Graph Systems

Transcription:

Computational Social Choice in the Cloud Theresa Csar, Martin Lackner, Emanuel Sallinger, Reinhard Pichler Technische Universität Wien Oxford University PPI, Stuttgart, March 2017

By Sam Johnston [CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons

Cloud Computing Technologies MapReduce Pregel Giraph GraphX Spark Hadoop

MapReduce Map Phase The input data is mapped to (key,value)-pairs Shuffle Phase The (key,value)-pairs are assigned to the reduce tasks Reduce Phase Each reduce task performs a simple calculation on all its values

What s an election? Given as lists of preferences with n votes and m candidates. We are interested in finding the best candidate, or the set of best candidates.

MapReduce and Elections by Example Given a set of m = 3 candidates a,b,c and n voters. Each voter provides a ranking of candidates, e.g.: a > b > c Borda Scoring Rule: The candidate ranked first receives m 1 points, the second m 2 points, etc.

MapReduce and Elections by Example Borda Scoring Rule

Performance Analysis of a Mapreduce Computation data replication rate (rr) number of MapReduce rounds number of keys / reduce tasks wall clock time (wct): the maximum time consumed by a single computation path in the parallel execution of the algorithm total communication cost (tcc): number of values transferred during the computation.

Performance Borda Scoring Rule The scores of all candidates given a scoring rule can be computed using MapReduce with the following characteristics: rr = 1, # rounds= 1, # keys = m, wct n, and tcc mn.

Winner Determination in Elections Scoring Rules: Borda Scoring Rule, Copeland Set: The Copeland set is based on Copeland scores. The Copeland score of candidate a is defined as {b C : a > b} {b C : b < a}. The Copeland set is the set of candidates that have the maximum Copeland score. The Smith set is the (unique) smallest set of candidates that dominate all outside candidates. The Schwartz set is the union of minimal sets that are not dominated by outside candidates.

Winner Determination in Elections Input Data Preference-Lists (Scoring Rules) Number of Lists / Number of Votes Length of Votes / Number of Candidates Dominance Graph (Smith Set, Copeland Set, Schwartz Set) Number of Candidates

Smith Set Definition Candidate a is in the Smith set if and only if for every candidate b there is a path from a to b in the weak dominance graph. Brandt,Fischer and Harrenstein (2009) show that in the weak dominance graph a vertex t is not reachable from a vertex s if and only if there exists a vertex v such that D2(v) = D3(v), s D2(v), and t / D2(v). In other words: We only need paths of length 3 to find

Smith Set Algorithm-sketch Preprocessing step (create needed datastructure) 2 MR-Rounds: to find paths of length 2 und 3 (or 4) Postprocessing: find vertices contained in the Smith set

Smith Set Vertex Datastructure Think like a vertex Each vertex saves three sets storing information on incoming and outgoing edges for a vertex a as follows: the set old stores all vertices that have been found previously to be reachable from a; the set new stores all vertices that have been found in the last mapreduce round to be reachable from a; the set reachedby stores all vertices known to reach a;

Vertex Data Structure in action

Experimental Design Mapreduce Java Implementation github.com/theresacsar/bigvoting Amazon Web Services (AWS) Elastic Compute Cloud Synthetic Datasets with varying number of candidates and edges in the dominance graph (m=7000 candidates and m2/10 edges) up to 128 EC2 instances

Future Work Exploring other Technologies Pregel (Giraph, GraphX) Think like a vertex Pregel-like systems are better suited for iterative Graph computations Spark Interactivity Data is loaded in memory

Future Work Other Technologies Using real word data Results from search engines Other Rules for Winner Determination

Thank you for listening! ask questions now or send me an email csar@dbai.tuwien.ac.at