Introduction to Data Science Day 2

Size: px
Start display at page:

Download "Introduction to Data Science Day 2"

Transcription

1 Introduction to Data Science Day 2 Data Matters Summer workshop series in data science Sponsored by the Odum Institute, RENCI, and NCDS Thomas M. Carsey carsey@unc.edu

2 Examples of Data Science Google Flu Billion Prices Project Medicare Fraud medicare-fraud-strike-force-charges-90-individuals-forapproximately-260-million-in-false-billing Twitter trending Angie s List

3 More Examples (research) Agricultural crop insurance fraud Satellite data Diffusion of Policy innovation Plagiarism and networks networks and topic models Structure in Wikipedia Repeat offenders individual or group-based causes?

4 Big Data Computing MapReduce and Hadoop Designed to process large operations quickly Distributes the problem across multiple distribute servers The Map part filters and sorts data into bins or queues based on some share characteristic The Reduce part then executes some operation on each bin of data. Results are then reassembled It is like parallel processes, but distributed across servers rather than just processors Scalable and has a fault tolerance Hadoop is an open-source versoin

5 More on MapReduce Pig Software platform used for creating MapRedce programs used by Hadoop. Hive A date warehouse infrastructure built on top of Hadoop. Used to query, summarize, or analyzed data.

6 Database management SQL Structured Query Language A programming language designed for management of relational databases MySQL Open source implementation of an SQL-like system for management of relational databases (used by Wikipedia, Google, Facebook, Twitter, Flickr, YouTube) NoSQL (Not Only SQL) Used for databases where the data is in some form other than tabular relations like those used in relational databases Cassandra (Apache) Distributed database management with not single node of failure Scalable with no down time

7 Cloud Computing Standard client-server model where computing operations don t happen on the local (desktop) machine. What s new? Virtualization. You are not connecting to a specific server. Servers are virtual One server can run multiple virtual machines One virtual machine can use multiple servers This makes the machine scalable, moveable, configurable. Allows selling software, platforms, and even computing infrastructure as a service

8 Data Access and Research Transparency (DA-RT) Increased regulations from government to provide access to data funded by government dollars. Increased professional publication standards for sharing research data Increased pressure for transparency in research methods more generally, including data access Symposium in Political Science (PS: Political Science and Politics, Vol. 47, Issue 01.) jid=psc&volumeid=47&seriesid=0&issueid=01

9 Privacy and Ethics Several questions emerge from the explosion of digital data about us: Identity: What is the relationships between our offline and online identities? Privacy: Who should control access to online data? Ownership: Who owns data? Can ownership rights be transferred? What are the obligations of those who generate and use data? Reputation: How can we determine if data is trustworthy? How can we know if provider acquired data legitimately? There are entire industries on reputation management.

10 Ethical Claims Davis claims Big Data, like all technology, is ethically neutral However, business is a social/public enterprise, so you will encounter the values others hold. Big data magnifies this encounter. Google says between 15 and 50 billion webpages with 3.5 billion searches every day. YouTube has more than 1 billion unique visitors every month watching over 6 billion hours of video. 100 hours of video are uploaded every day 500 million Tweets every day billion Facebook users

11 Ethical Decision Points Inquiry: what are our values? Analysis: what are our current data handling practices? Do they align with our values? Articulation: explicit written expression of alignment and gaps between values and practices Action: specific plans and activities used to close those gaps.

12 Buying vs. Selling Company data policies: 34 of 50 Fortune-class companies said the would not sell personal data. No company said explicitly that the would sell personal data. No company made any explicit statement that they would not buy personal data 11 companies said explicitly that buying personal data was allowed. Why different standards for buying vs. selling personal data?

13 Privacy Physical Privacy Informational Privacy Organizational Privacy Privacy of Our communications Our behavior Our person Is digital communal living different than physical communal living? Who decides? Who regulates? Who Enforces?

14 Privacy as a Common Pool Resource? To govern our collective behavior: We need norms/rules that are agreed upon and known We need effective monitoring We need a mechanism to sanction violations The question is, do we need an external force or can we govern ourselves? (Ostrom s work)

15 Traps in Empirical Research Simpson s Paradox

16 Other Traps Aggregation effects more generally E.g. mis-report bias in voting Residential segregation Regression to the mean Special education programs Hawthorne effect Thanks for noticing Superstitious learning Midas touch vs. Damned if you do, damned if you don t.

17 How to Avoid Traps Theory Think critically Use simulations but that s another course!

Embedded Technosolutions

Embedded Technosolutions Hadoop Big Data An Important technology in IT Sector Hadoop - Big Data Oerie 90% of the worlds data was generated in the last few years. Due to the advent of new technologies, devices, and communication

More information

Topics. Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples

Topics. Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples Hadoop Introduction 1 Topics Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples 2 Big Data Analytics What is Big Data?

More information

CISC 7610 Lecture 2b The beginnings of NoSQL

CISC 7610 Lecture 2b The beginnings of NoSQL CISC 7610 Lecture 2b The beginnings of NoSQL Topics: Big Data Google s infrastructure Hadoop: open google infrastructure Scaling through sharding CAP theorem Amazon s Dynamo 5 V s of big data Everyone

More information

Cloud Computing 2. CSCI 4850/5850 High-Performance Computing Spring 2018

Cloud Computing 2. CSCI 4850/5850 High-Performance Computing Spring 2018 Cloud Computing 2 CSCI 4850/5850 High-Performance Computing Spring 2018 Tae-Hyuk (Ted) Ahn Department of Computer Science Program of Bioinformatics and Computational Biology Saint Louis University Learning

More information

Importance of the Data Management process in setting up the GDPR within a company CREOBIS

Importance of the Data Management process in setting up the GDPR within a company CREOBIS Importance of the Data Management process in setting up the GDPR within a company CREOBIS 1 Alain Cieslik Personal Data is the oil of the digital world 2 Alain Cieslik Personal information comes in different

More information

Hierarchy of knowledge BIG DATA 9/7/2017. Architecture

Hierarchy of knowledge BIG DATA 9/7/2017. Architecture BIG DATA Architecture Hierarchy of knowledge Data: Element (fact, figure, etc.) which is basic information that can be to be based on decisions, reasoning, research and which is treated by the human or

More information

Trends in Mobile Forensics from Cellebrite

Trends in Mobile Forensics from Cellebrite Trends in Mobile Forensics from Cellebrite EBOOK 1 Cellebrite Survey Cellebrite is a well-known name in the field of computer forensics, and they recently conducted a survey as well as interviews with

More information

Processing Unstructured Data. Dinesh Priyankara Founder/Principal Architect dinesql Pvt Ltd.

Processing Unstructured Data. Dinesh Priyankara Founder/Principal Architect dinesql Pvt Ltd. Processing Unstructured Data Dinesh Priyankara Founder/Principal Architect dinesql Pvt Ltd. http://dinesql.com / Dinesh Priyankara @dinesh_priya Founder/Principal Architect dinesql Pvt Ltd. Microsoft Most

More information

Functionality, Challenges and Architecture of Social Networks

Functionality, Challenges and Architecture of Social Networks Functionality, Challenges and Architecture of Social Networks INF 5370 Outline Social Network Services Functionality Business Model Current Architecture and Scalability Challenges Conclusion 1 Social Network

More information

Big Data - Some Words BIG DATA 8/31/2017. Introduction

Big Data - Some Words BIG DATA 8/31/2017. Introduction BIG DATA Introduction Big Data - Some Words Connectivity Social Medias Share information Interactivity People Business Data Data mining Text mining Business Intelligence 1 What is Big Data Big Data means

More information

Hadoop, Yarn and Beyond

Hadoop, Yarn and Beyond Hadoop, Yarn and Beyond 1 B. R A M A M U R T H Y Overview We learned about Hadoop1.x or the core. Just like Java evolved, Java core, Java 1.X, Java 2.. So on, software and systems evolve, naturally.. Lets

More information

Webinar Series TMIP VISION

Webinar Series TMIP VISION Webinar Series TMIP VISION TMIP provides technical support and promotes knowledge and information exchange in the transportation planning and modeling community. Today s Goals To Consider: Parallel Processing

More information

Online Bill Processing System for Public Sectors in Big Data

Online Bill Processing System for Public Sectors in Big Data IJIRST International Journal for Innovative Research in Science & Technology Volume 4 Issue 10 March 2018 ISSN (online): 2349-6010 Online Bill Processing System for Public Sectors in Big Data H. Anwer

More information

A REVIEW PAPER ON BIG DATA ANALYTICS

A REVIEW PAPER ON BIG DATA ANALYTICS A REVIEW PAPER ON BIG DATA ANALYTICS Kirti Bhatia 1, Lalit 2 1 HOD, Department of Computer Science, SKITM Bahadurgarh Haryana, India bhatia.kirti.it@gmail.com 2 M Tech 4th sem SKITM Bahadurgarh, Haryana,

More information

Introduction to Data Mining and Data Analytics

Introduction to Data Mining and Data Analytics 1/28/2016 MIST.7060 Data Analytics 1 Introduction to Data Mining and Data Analytics What Are Data Mining and Data Analytics? Data mining is the process of discovering hidden patterns in data, where Patterns

More information

MAPR DATA GOVERNANCE WITHOUT COMPROMISE

MAPR DATA GOVERNANCE WITHOUT COMPROMISE MAPR TECHNOLOGIES, INC. WHITE PAPER JANUARY 2018 MAPR DATA GOVERNANCE TABLE OF CONTENTS EXECUTIVE SUMMARY 3 BACKGROUND 4 MAPR DATA GOVERNANCE 5 CONCLUSION 7 EXECUTIVE SUMMARY The MapR DataOps Governance

More information

Big Data Hadoop Stack

Big Data Hadoop Stack Big Data Hadoop Stack Lecture #1 Hadoop Beginnings What is Hadoop? Apache Hadoop is an open source software framework for storage and large scale processing of data-sets on clusters of commodity hardware

More information

Big Data Architect.

Big Data Architect. Big Data Architect www.austech.edu.au WHAT IS BIG DATA ARCHITECT? A big data architecture is designed to handle the ingestion, processing, and analysis of data that is too large or complex for traditional

More information

Telkomtelstra Corporate Website Increase a Business Experience through telkomtelstra Website

Telkomtelstra Corporate Website Increase a Business Experience through telkomtelstra Website Telkomtelstra Corporate Website Increase a Business Experience through telkomtelstra Website Award for Innovation in Corporate Websites Asia Pacific Stevie Awards 2016 Table of Content Telkomtelstra Website

More information

The Hadoop Ecosystem. EECS 4415 Big Data Systems. Tilemachos Pechlivanoglou

The Hadoop Ecosystem. EECS 4415 Big Data Systems. Tilemachos Pechlivanoglou The Hadoop Ecosystem EECS 4415 Big Data Systems Tilemachos Pechlivanoglou tipech@eecs.yorku.ca A lot of tools designed to work with Hadoop 2 HDFS, MapReduce Hadoop Distributed File System Core Hadoop component

More information

Database Management Systems

Database Management Systems Database Management Systems Fall 2017 Knowledge is of two kinds: we know a subject ourselves, or we know where we can find information upon it. -- Samuel Johnson (1709-1784) Queries for Today Why? Who?

More information

Microsoft Big Data and Hadoop

Microsoft Big Data and Hadoop Microsoft Big Data and Hadoop Lara Rubbelke @sqlgal Cindy Gross @sqlcindy 2 The world of data is changing The 4Vs of Big Data http://nosql.mypopescu.com/post/9621746531/a-definition-of-big-data 3 Common

More information

PRIVACY POLICY TYPE AND USES OF INFORMATION WE COLLECT FROM YOU:

PRIVACY POLICY TYPE AND USES OF INFORMATION WE COLLECT FROM YOU: PRIVACY POLICY Rixmann Companies, Pawn America Minnesota, L.L.C. also d/b/a My Bridge Now, Pawn America Iowa, LLC, Pawn America Wisconsin, LLC, PayDay America, Inc., Pawn America Family Limited Partnership,

More information

Big Data with Hadoop Ecosystem

Big Data with Hadoop Ecosystem Diógenes Pires Big Data with Hadoop Ecosystem Hands-on (HBase, MySql and Hive + Power BI) Internet Live http://www.internetlivestats.com/ Introduction Business Intelligence Business Intelligence Process

More information

GROW YOUR BUSINESS ONLINE

GROW YOUR BUSINESS ONLINE GROW YOUR BUSINESS ONLINE Grow Your Business Online Connect with customers in moments that matter 76% of people who search on their smartphones for something nearby visit a business within a day. 1 80%

More information

IMPLEMENTING SECURITY, PRIVACY, AND FAIR DATA USE PRINCIPLES

IMPLEMENTING SECURITY, PRIVACY, AND FAIR DATA USE PRINCIPLES IMPLEMENTING SECURITY, PRIVACY, AND FAIR DATA USE PRINCIPLES Introductions Agenda Overall data risk and benefit landscape / shifting risk and opportunity landscape and market expectations Looking at data

More information

Review - Relational Model Concepts

Review - Relational Model Concepts Lecture 25 Overview Last Lecture Query optimisation/query execution strategies This Lecture Non-relational data models Source: web pages, textbook chapters 20-22 Next Lecture Revision Review - Relational

More information

Election Analysis and Prediction Using Big Data Analytics

Election Analysis and Prediction Using Big Data Analytics Election Analysis and Prediction Using Big Data Analytics Omkar Sawant, Chintaman Taral, Roopak Garbhe Students, Department Of Information Technology Vidyalankar Institute of Technology, Mumbai, India

More information

Nowcasting. D B M G Data Base and Data Mining Group of Politecnico di Torino. Big Data: Hype or Hallelujah? Big data hype?

Nowcasting. D B M G Data Base and Data Mining Group of Politecnico di Torino. Big Data: Hype or Hallelujah? Big data hype? Big data hype? Big Data: Hype or Hallelujah? Data Base and Data Mining Group of 2 Google Flu trends On the Internet February 2010 detected flu outbreak two weeks ahead of CDC data Nowcasting http://www.internetlivestats.com/

More information

Cloud Computing & Visualization

Cloud Computing & Visualization Cloud Computing & Visualization Workflows Distributed Computation with Spark Data Warehousing with Redshift Visualization with Tableau #FIUSCIS School of Computing & Information Sciences, Florida International

More information

Review Ch. 3 Connecting to the World s Information. 2010, 2006 South-Western, Cengage Learning

Review Ch. 3 Connecting to the World s Information. 2010, 2006 South-Western, Cengage Learning Review Ch. 3 Connecting to the World s Information 2010, 2006 South-Western, Cengage Learning Networks Two linked computers is a network A network of computers located within a short distance is called

More information

An Introduction to Big Data Formats

An Introduction to Big Data Formats Introduction to Big Data Formats 1 An Introduction to Big Data Formats Understanding Avro, Parquet, and ORC WHITE PAPER Introduction to Big Data Formats 2 TABLE OF TABLE OF CONTENTS CONTENTS INTRODUCTION

More information

Digital Layer Trends PostalVision 2020

Digital Layer Trends PostalVision 2020 Digital Layer Trends PostalVision 2020 Matt Swain Associate Director June 12, 2012 @SwainfoTrends @InfoTrends #PV2020 2012 InfoTrends www.infotrends.com 1 Comprehensive Research on Customer Communications

More information

Big Data. Big Data Analyst. Big Data Engineer. Big Data Architect

Big Data. Big Data Analyst. Big Data Engineer. Big Data Architect Big Data Big Data Analyst INTRODUCTION TO BIG DATA ANALYTICS ANALYTICS PROCESSING TECHNIQUES DATA TRANSFORMATION & BATCH PROCESSING REAL TIME (STREAM) DATA PROCESSING Big Data Engineer BIG DATA FOUNDATION

More information

Big Data Analytics using Apache Hadoop and Spark with Scala

Big Data Analytics using Apache Hadoop and Spark with Scala Big Data Analytics using Apache Hadoop and Spark with Scala Training Highlights : 80% of the training is with Practical Demo (On Custom Cloudera and Ubuntu Machines) 20% Theory Portion will be important

More information

Title II vs. Section 706

Title II vs. Section 706 Title II vs. Section 706 Patrick W. Gilmore, Chief Technology Officer Markley Cloud Services / Markley Group NANOG 62 October 7, 2014 Agenda This is a discussion on the United States Federal Communication

More information

Challenges for Data Driven Systems

Challenges for Data Driven Systems Challenges for Data Driven Systems Eiko Yoneki University of Cambridge Computer Laboratory Data Centric Systems and Networking Emergence of Big Data Shift of Communication Paradigm From end-to-end to data

More information

Fritztile is a brand of The Stonhard Group THE STONHARD GROUP Privacy Notice The Stonhard Group" Notice Whose Personal Data do we collect?

Fritztile is a brand of The Stonhard Group THE STONHARD GROUP Privacy Notice The Stonhard Group Notice Whose Personal Data do we collect? Fritztile is a brand of The Stonhard Group THE STONHARD GROUP Privacy Notice For the purposes of applicable data protection and privacy laws, The Stonhard Group, a division of Stoncor Group, Inc. ( The

More information

Big Data Technologies and Geospatial Data Processing:

Big Data Technologies and Geospatial Data Processing: Big Data Technologies and Geospatial Data Processing: A perfect fit Albert Godfrind Spatial and Graph Solutions Architect Oracle Corporation Agenda 1 2 3 4 The Data Explosion Big Data? Big Data and Geo

More information

Lecture 25 Overview. Last Lecture Query optimisation/query execution strategies

Lecture 25 Overview. Last Lecture Query optimisation/query execution strategies Lecture 25 Overview Last Lecture Query optimisation/query execution strategies This Lecture Non-relational data models Source: web pages, textbook chapters 20-22 Next Lecture Revision COSC344 Lecture 25

More information

Introduction to NoSQL by William McKnight

Introduction to NoSQL by William McKnight Introduction to NoSQL by William McKnight All rights reserved. Reproduction in whole or part prohibited except by written permission. Product and company names mentioned herein may be trademarks of their

More information

Databases and Big Data Today. CS634 Class 22

Databases and Big Data Today. CS634 Class 22 Databases and Big Data Today CS634 Class 22 Current types of Databases SQL using relational tables: still very important! NoSQL, i.e., not using relational tables: term NoSQL popular since about 2007.

More information

An overview of. Mobile Testing. By André Jacobs. A Jacobs

An overview of. Mobile Testing. By André Jacobs. A Jacobs An overview of Mobile Testing By André Jacobs THE RISE AND RISE OF MOBILE 3 The Apple Story A look at the company that arguably has sparked the explosive growth in smart devices and mobile applications

More information

/ Cloud Computing. Recitation 10 March 22nd, 2016

/ Cloud Computing. Recitation 10 March 22nd, 2016 15-319 / 15-619 Cloud Computing Recitation 10 March 22nd, 2016 Overview Administrative issues Office Hours, Piazza guidelines Last week s reflection Project 3.3, OLI Unit 4, Module 15, Quiz 8 This week

More information

Oracle Big Data Connectors

Oracle Big Data Connectors Oracle Big Data Connectors Oracle Big Data Connectors is a software suite that integrates processing in Apache Hadoop distributions with operations in Oracle Database. It enables the use of Hadoop to process

More information

We are ready to serve Latest Testing Trends, Are you ready to learn?? New Batches Info

We are ready to serve Latest Testing Trends, Are you ready to learn?? New Batches Info We are ready to serve Latest Testing Trends, Are you ready to learn?? New Batches Info START DATE : TIMINGS : DURATION : TYPE OF BATCH : FEE : FACULTY NAME : LAB TIMINGS : PH NO: 9963799240, 040-40025423

More information

Smile IT Ltd Privacy Policy. Hello, we re Smile IT Ltd. We offer computer and network support to businesses and home computer users.

Smile IT Ltd Privacy Policy. Hello, we re Smile IT Ltd. We offer computer and network support to businesses and home computer users. Smile IT Ltd Privacy Policy Hello, we re Smile IT Ltd. We offer computer and network support to businesses and home computer users. At Smile IT we value our clients and we re committed to protecting your

More information

IBM BigInsights on Cloud

IBM BigInsights on Cloud Service Description IBM BigInsights on Cloud This Service Description describes the Cloud Service IBM provides to Client. Client means the company and its authorized users and recipients of the Cloud Service.

More information

Future of the Data Center

Future of the Data Center Future of the Data Center Maine Digital Government Summit November 29, 2012 Dan Diltz Vice President, Public Sector 1 Session Description A new data center is the perfect opportunity to start fresh by

More information

Windows Azure Overview

Windows Azure Overview Windows Azure Overview Christine Collet, Genoveva Vargas-Solar Grenoble INP, France MS Azure Educator Grant Packaged Software Infrastructure (as a Service) Platform (as a Service) Software (as a Service)

More information

Website Acecore Technologies JL B.V.:

Website Acecore Technologies JL B.V.: Privacy policy Acecore Technologies JL B.V. B.V. Acecore Technologies JL B.V. (hereafter: Acecore technologies JL B.V.) focusses on the development and shipping of drones in the creative, industrial and

More information

PROFESSIONAL. NoSQL. Shashank Tiwari WILEY. John Wiley & Sons, Inc.

PROFESSIONAL. NoSQL. Shashank Tiwari WILEY. John Wiley & Sons, Inc. PROFESSIONAL NoSQL Shashank Tiwari WILEY John Wiley & Sons, Inc. Examining CONTENTS INTRODUCTION xvil CHAPTER 1: NOSQL: WHAT IT IS AND WHY YOU NEED IT 3 Definition and Introduction 4 Context and a Bit

More information

Scalable Web Programming. CS193S - Jan Jannink - 2/25/10

Scalable Web Programming. CS193S - Jan Jannink - 2/25/10 Scalable Web Programming CS193S - Jan Jannink - 2/25/10 Weekly Syllabus 1.Scalability: (Jan.) 2.Agile Practices 3.Ecology/Mashups 4.Browser/Client 7.Analytics 8.Cloud/Map-Reduce 9.Published APIs: (Mar.)*

More information

STATS Data Analysis using Python. Lecture 7: the MapReduce framework Some slides adapted from C. Budak and R. Burns

STATS Data Analysis using Python. Lecture 7: the MapReduce framework Some slides adapted from C. Budak and R. Burns STATS 700-002 Data Analysis using Python Lecture 7: the MapReduce framework Some slides adapted from C. Budak and R. Burns Unit 3: parallel processing and big data The next few lectures will focus on big

More information

Lecture 20: WSC, Datacenters. Topics: warehouse-scale computing and datacenters (Sections )

Lecture 20: WSC, Datacenters. Topics: warehouse-scale computing and datacenters (Sections ) Lecture 20: WSC, Datacenters Topics: warehouse-scale computing and datacenters (Sections 6.1-6.7) 1 Warehouse-Scale Computer (WSC) 100K+ servers in one WSC ~$150M overall cost Requests from millions of

More information

EIT Health UK-Ireland Privacy Policy

EIT Health UK-Ireland Privacy Policy EIT Health UK-Ireland Privacy Policy This policy describes how EIT Health UK-Ireland uses your personal information, how we protect your privacy, and your rights regarding your information. We promise

More information

A Glimpse of the Hadoop Echosystem

A Glimpse of the Hadoop Echosystem A Glimpse of the Hadoop Echosystem 1 Hadoop Echosystem A cluster is shared among several users in an organization Different services HDFS and MapReduce provide the lower layers of the infrastructures Other

More information

Parallel Programming Principle and Practice. Lecture 10 Big Data Processing with MapReduce

Parallel Programming Principle and Practice. Lecture 10 Big Data Processing with MapReduce Parallel Programming Principle and Practice Lecture 10 Big Data Processing with MapReduce Outline MapReduce Programming Model MapReduce Examples Hadoop 2 Incredible Things That Happen Every Minute On The

More information

Developing Enterprise Cloud Solutions with Azure

Developing Enterprise Cloud Solutions with Azure Developing Enterprise Cloud Solutions with Azure Java Focused 5 Day Course AUDIENCE FORMAT Developers and Software Architects Instructor-led with hands-on labs LEVEL 300 COURSE DESCRIPTION This course

More information

Our Data Protection Officer is Andrew Garrett, Operations Manager

Our Data Protection Officer is Andrew Garrett, Operations Manager Construction Youth Trust Privacy Notice We are committed to protecting your personal information Construction Youth Trust is committed to respecting and keeping safe any personal information you share

More information

Stages of Data Processing

Stages of Data Processing Data processing can be understood as the conversion of raw data into a meaningful and desired form. Basically, producing information that can be understood by the end user. So then, the question arises,

More information

TWOO.COM CASE STUDY CUSTOMER SUCCESS STORY

TWOO.COM CASE STUDY CUSTOMER SUCCESS STORY TWOO.COM CUSTOMER SUCCESS STORY With over 30 million users, Twoo.com is Europe s leading social discovery site. Twoo runs the world s largest scale-out SQL deployment, with 4.4 billion transactions a day

More information

Big Data It s not just for Google Any More

Big Data It s not just for Google Any More Big Data It s not just for Google Any More The Software and Compelling Economics of Big Data Computing EXECUTIVE SUMMARY Big Data holds out the promise of providing businesses with differentiated competitive

More information

Certified Big Data and Hadoop Course Curriculum

Certified Big Data and Hadoop Course Curriculum Certified Big Data and Hadoop Course Curriculum The Certified Big Data and Hadoop course by DataFlair is a perfect blend of in-depth theoretical knowledge and strong practical skills via implementation

More information

Canadian Anti-Spam Legislation (CASL)

Canadian Anti-Spam Legislation (CASL) Canadian Anti-Spam Legislation (CASL) FREQUENTLY ASKED QUESTIONS The purpose of this document is to assist and guide U of R employees regarding their obligations under the Canadian Anti-Spam Legislation

More information

THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES

THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES 1 THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES Vincent Garonne, Mario Lassnig, Martin Barisits, Thomas Beermann, Ralph Vigne, Cedric Serfon Vincent.Garonne@cern.ch ph-adp-ddm-lab@cern.ch XLDB

More information

Towards Practical Differential Privacy for SQL Queries. Noah Johnson, Joseph P. Near, Dawn Song UC Berkeley

Towards Practical Differential Privacy for SQL Queries. Noah Johnson, Joseph P. Near, Dawn Song UC Berkeley Towards Practical Differential Privacy for SQL Queries Noah Johnson, Joseph P. Near, Dawn Song UC Berkeley Outline 1. Discovering real-world requirements 2. Elastic sensitivity & calculating sensitivity

More information

EXTRACT DATA IN LARGE DATABASE WITH HADOOP

EXTRACT DATA IN LARGE DATABASE WITH HADOOP International Journal of Advances in Engineering & Scientific Research (IJAESR) ISSN: 2349 3607 (Online), ISSN: 2349 4824 (Print) Download Full paper from : http://www.arseam.com/content/volume-1-issue-7-nov-2014-0

More information

Gain Insights From Unstructured Data Using Pivotal HD. Copyright 2013 EMC Corporation. All rights reserved.

Gain Insights From Unstructured Data Using Pivotal HD. Copyright 2013 EMC Corporation. All rights reserved. Gain Insights From Unstructured Data Using Pivotal HD 1 Traditional Enterprise Analytics Process 2 The Fundamental Paradigm Shift Internet age and exploding data growth Enterprises leverage new data sources

More information

LazyBase: Trading freshness and performance in a scalable database

LazyBase: Trading freshness and performance in a scalable database LazyBase: Trading freshness and performance in a scalable database (EuroSys 2012) Jim Cipar, Greg Ganger, *Kimberly Keeton, *Craig A. N. Soules, *Brad Morrey, *Alistair Veitch PARALLEL DATA LABORATORY

More information

Data Clustering on the Parallel Hadoop MapReduce Model. Dimitrios Verraros

Data Clustering on the Parallel Hadoop MapReduce Model. Dimitrios Verraros Data Clustering on the Parallel Hadoop MapReduce Model Dimitrios Verraros Overview The purpose of this thesis is to implement and benchmark the performance of a parallel K- means clustering algorithm on

More information

Data Platforms and Pattern Mining

Data Platforms and Pattern Mining Morteza Zihayat Data Platforms and Pattern Mining IBM Corporation About Myself IBM Software Group Big Data Scientist 4Platform Computing, IBM (2014 Now) PhD Candidate (2011 Now) 4Lassonde School of Engineering,

More information

Privacy policy SIdP website EU 2016/679

Privacy policy SIdP website EU 2016/679 Privacy policy SIdP website EU 2016/679 Categories of data subjects: Website users and users of the members-only area Update of the privacy policy: 30/08/2018 The present document contains the information

More information

How Insurers are Realising the Promise of Big Data

How Insurers are Realising the Promise of Big Data How Insurers are Realising the Promise of Big Data Jason Hunter, CTO Asia-Pacific, MarkLogic A Big Data Challenge: Pushing the Limits of What's Possible The Art of the Possible Multiple Government Agencies

More information

Ready, Willing & Able. Michael Cover, Manager, Blue Cross Blue Shield of Michigan

Ready, Willing & Able. Michael Cover, Manager, Blue Cross Blue Shield of Michigan Ready, Willing & Able Michael Cover, Manager, Blue Cross Blue Shield of Michigan Agenda 1. Organization Overview 2. GRC Journey Story 3. GRC Program Roadmap 4. Program Objectives and Guiding Principals

More information

Data Storage Infrastructure at Facebook

Data Storage Infrastructure at Facebook Data Storage Infrastructure at Facebook Spring 2018 Cleveland State University CIS 601 Presentation Yi Dong Instructor: Dr. Chung Outline Strategy of data storage, processing, and log collection Data flow

More information

Hadoop محبوبه دادخواه کارگاه ساالنه آزمایشگاه فناوری وب زمستان 1391

Hadoop محبوبه دادخواه کارگاه ساالنه آزمایشگاه فناوری وب زمستان 1391 Hadoop محبوبه دادخواه کارگاه ساالنه آزمایشگاه فناوری وب زمستان 1391 Outline Big Data Big Data Examples Challenges with traditional storage NoSQL Hadoop HDFS MapReduce Architecture 2 Big Data In information

More information

Analytics in the Cloud Mandate or Option?

Analytics in the Cloud Mandate or Option? Analytics in the Cloud Mandate or Option? Rick Lower Sr. Director of Analytics Alliances Teradata 1 The SAS & Teradata Partnership Overview Partnership began in 2007 to improving analytic performance Teradata

More information

EMERGING TRENDS IN WHITE COLLAR CRIMES

EMERGING TRENDS IN WHITE COLLAR CRIMES EMERGING TRENDS IN WHITE COLLAR CRIMES Presented by: Collins Ojiambo Were CISA, CISM, Mciarb, AVSEC PMC Country Lead, Kroll Associates Thursday 24 th October 2018 Credibility. Professionalism. AccountAbility

More information

Oracle GoldenGate for Big Data

Oracle GoldenGate for Big Data Oracle GoldenGate for Big Data The Oracle GoldenGate for Big Data 12c product streams transactional data into big data systems in real time, without impacting the performance of source systems. It streamlines

More information

BIG DATA TECHNOLOGIES: WHAT EVERY MANAGER NEEDS TO KNOW ANALYTICS AND FINANCIAL INNOVATION CONFERENCE JUNE 26-29,

BIG DATA TECHNOLOGIES: WHAT EVERY MANAGER NEEDS TO KNOW ANALYTICS AND FINANCIAL INNOVATION CONFERENCE JUNE 26-29, BIG DATA TECHNOLOGIES: WHAT EVERY MANAGER NEEDS TO KNOW ANALYTICS AND FINANCIAL INNOVATION CONFERENCE JUNE 26-29, 2016 1 OBJECTIVES ANALYTICS AND FINANCIAL INNOVATION CONFERENCE JUNE 26-29, 2016 2 WHAT

More information

Big Data and Cloud Computing

Big Data and Cloud Computing Big Data and Cloud Computing Presented at Faculty of Computer Science University of Murcia Presenter: Muhammad Fahim, PhD Department of Computer Eng. Istanbul S. Zaim University, Istanbul, Turkey About

More information

Canadian Anti-Spam Legislation (CASL)

Canadian Anti-Spam Legislation (CASL) Canadian Anti-Spam Legislation (CASL) FREQUENTLY ASKED QUESTIONS The purpose of this document is to assist and guide U of R staff and faculty members to understand their obligations under the Canadian

More information

Making the Most of Hadoop with Optimized Data Compression (and Boost Performance) Mark Cusack. Chief Architect RainStor

Making the Most of Hadoop with Optimized Data Compression (and Boost Performance) Mark Cusack. Chief Architect RainStor Making the Most of Hadoop with Optimized Data Compression (and Boost Performance) Mark Cusack Chief Architect RainStor Agenda Importance of Hadoop + data compression Data compression techniques Compression,

More information

Large-Scale GPU programming

Large-Scale GPU programming Large-Scale GPU programming Tim Kaldewey Research Staff Member Database Technologies IBM Almaden Research Center tkaldew@us.ibm.com Assistant Adjunct Professor Computer and Information Science Dept. University

More information

Apache Spark: A Literature Review. Presenter: Aaron Sarson

Apache Spark: A Literature Review. Presenter: Aaron Sarson Apache Spark: A Literature Review Presenter: Aaron Sarson Outline Introduction to Spark Problem to be addressed Proposed Approach Ø Research Questions Contributions Results Ø RQ1, RQ2, RQ3 Conclusion &

More information

10 WEBSITE MISTAKES EVEN GREAT MARKETERS CAN MAKE. Jessica Bybee-Dziedzic Saffire

10 WEBSITE MISTAKES EVEN GREAT MARKETERS CAN MAKE. Jessica Bybee-Dziedzic Saffire 10 WEBSITE MISTAKES EVEN GREAT MARKETERS CAN MAKE Jessica Bybee-Dziedzic Saffire BUT SOMETHING WAS MISSING We wanted to HELP MORE PEOPLE! Beautiful, Unique Designs PRINT-AT-HOME TICKETS SCANNING

More information

We offer background check and identity verification services to employers, businesses, and individuals. For example, we provide:

We offer background check and identity verification services to employers, businesses, and individuals. For example, we provide: This Privacy Policy applies to the websites, screening platforms, mobile applications, and APIs (each, a Service ) owned and/or operated by Background Research Solutions, LLC ("we"/ BRS ). It also describes

More information

<Insert Picture Here> Introduction to Big Data Technology

<Insert Picture Here> Introduction to Big Data Technology Introduction to Big Data Technology The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into

More information

microsoft

microsoft 70-775.microsoft Number: 70-775 Passing Score: 800 Time Limit: 120 min Exam A QUESTION 1 Note: This question is part of a series of questions that present the same scenario. Each question in the series

More information

A New HadoopBased Network Management System with Policy Approach

A New HadoopBased Network Management System with Policy Approach Computer Engineering and Applications Vol. 3, No. 3, September 2014 A New HadoopBased Network Management System with Policy Approach Department of Computer Engineering and IT, Shiraz University of Technology,

More information

Deploying Applications on DC/OS

Deploying Applications on DC/OS Mesosphere Datacenter Operating System Deploying Applications on DC/OS Keith McClellan - Technical Lead, Federal Programs keith.mcclellan@mesosphere.com V6 THE FUTURE IS ALREADY HERE IT S JUST NOT EVENLY

More information

Microsoft Exam

Microsoft Exam Volume: 42 Questions Case Study: 1 Relecloud General Overview Relecloud is a social media company that processes hundreds of millions of social media posts per day and sells advertisements to several hundred

More information

NHS R&D Forum Privacy Policy: FINAL v0.1 May 25 th 2018

NHS R&D Forum Privacy Policy: FINAL v0.1 May 25 th 2018 NHS R&D Forum Privacy Policy: FINAL v0.1 May 25 th 2018 This privacy policy is published to provide transparent information about how we use, share and store any personal information that you may provide

More information

Distributed Systems CS6421

Distributed Systems CS6421 Distributed Systems CS6421 Intro to Distributed Systems and the Cloud Prof. Tim Wood v I teach: Software Engineering, Operating Systems, Sr. Design I like: distributed systems, networks, building cool

More information

"Charting the Course... Certified Information Systems Auditor (CISA) Course Summary

Charting the Course... Certified Information Systems Auditor (CISA) Course Summary Course Summary Description In this course, you will perform evaluations of organizational policies, procedures, and processes to ensure that an organization's information systems align with overall business

More information

BIG DATA TESTING: A UNIFIED VIEW

BIG DATA TESTING: A UNIFIED VIEW http://core.ecu.edu/strg BIG DATA TESTING: A UNIFIED VIEW BY NAM THAI ECU, Computer Science Department, March 16, 2016 2/30 PRESENTATION CONTENT 1. Overview of Big Data A. 5 V s of Big Data B. Data generation

More information

Scaling Up HBase. Duen Horng (Polo) Chau Assistant Professor Associate Director, MS Analytics Georgia Tech. CSE6242 / CX4242: Data & Visual Analytics

Scaling Up HBase. Duen Horng (Polo) Chau Assistant Professor Associate Director, MS Analytics Georgia Tech. CSE6242 / CX4242: Data & Visual Analytics http://poloclub.gatech.edu/cse6242 CSE6242 / CX4242: Data & Visual Analytics Scaling Up HBase Duen Horng (Polo) Chau Assistant Professor Associate Director, MS Analytics Georgia Tech Partly based on materials

More information

Turning NoSQL data into Graph Playing with Apache Giraph and Apache Gora

Turning NoSQL data into Graph Playing with Apache Giraph and Apache Gora Turning NoSQL data into Graph Playing with Apache Giraph and Apache Gora Team Renato Marroquín! PhD student: Interested in: Information retrieval. Distributed and scalable data management. Apache Gora:

More information

What is the maximum file size you have dealt so far? Movies/Files/Streaming video that you have used? What have you observed?

What is the maximum file size you have dealt so far? Movies/Files/Streaming video that you have used? What have you observed? Simple to start What is the maximum file size you have dealt so far? Movies/Files/Streaming video that you have used? What have you observed? What is the maximum download speed you get? Simple computation

More information