Data Quality and Cleaning
1 Data Quality and Cleaning: A Case of Mobile Phone Survey Data. Inna Kouper, Data to Insight Center, School of Informatics and Computing, Indiana University. September,
2 Why DQ. Data becomes: big, frequent, heterogeneous, collaborative, integrated, reusable, shared.
3 Agricultural Decision Making and Food Security in Africa: when to plant and harvest; what to plant; how to grow; what weather looks like. Are all of your maize fields planted now? Did it rain on your fields this week? Did you plant any maize in the last 7 days? How many 50kg bags of maize do you have in storage now? What seed variety did you plant?
4 Common activities impacting DQ. Source: Redman, Thomas C. Data Quality: The Field Guide. Digital Press.
5 DQ - Database / industry approach. Accuracy: the data was recorded correctly. Completeness: all relevant data was recorded. Uniqueness: entities are recorded once. Timeliness: the data is kept up to date. Consistency: the data agrees with itself. Source: Exploratory Data Mining and Data Quality, T. Dasu and T. Johnson, Wiley, 2004.
6 DQ - Government approach. Utility: the data is useful to the public. Objectivity: the data is accurate, clear, complete, and unbiased; the data is documented and/or reproducible; the data is subject to peer review. Integrity: the data is protected from corruption and unauthorized action.
7 DQ - Research lifecycle approach: validity, accuracy, consistency, integrity, completeness, context.
8 Factors affecting mobile data quality: sampling; medium / message; no interviewer; technical, e.g., poor network signal, discharged device; social / individual; errors; non-response; policy / economical, e.g., literacy level, no funds for texting; limitations of mobile and texting platforms.
9 Completeness
10 Accuracy 1. Did it rain this week? Please answer yes or no: Yes, yes, YES, YESS, YAS; No, no, No!, N0, NO, no rain; 50
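The answer variants on this slide suggest a normalization step before analysis. The following is a minimal sketch in Python, not the project's actual pipeline: the helper name `normalize_yes_no` and the variant sets are assumptions made for illustration. Anything that cannot be mapped with confidence (such as "50") is returned as `None` so it can be reviewed by hand.

```python
import re

# Hypothetical variant sets; a real list would be built from observed replies.
YES_VARIANTS = {"yes", "yess", "yas", "y"}
NO_VARIANTS = {"no", "n", "no rain"}


def normalize_yes_no(raw):
    """Map a free-text SMS reply to 'yes', 'no', or None (needs review)."""
    text = raw.strip().lower()
    # Treat a zero typed in place of the letter "o" (e.g. "N0") as "o".
    text = text.replace("0", "o")
    # Drop punctuation and digits, then collapse repeated whitespace.
    text = re.sub(r"[^a-z\s]", "", text)
    text = re.sub(r"\s+", " ", text).strip()
    if text in YES_VARIANTS:
        return "yes"
    if text in NO_VARIANTS:
        return "no"
    return None
```

Returning `None` rather than guessing keeps automated correction separate from cases that need a human decision.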
11 Accuracy / Consistency. Did it rain this week? Please answer yes or no: No Yes
12 Consistency. How many 50kg bags of maize do you have in storage now? Week 1: 30; Week 2: 25; Week 3: 25; Week 4: 20; Week 5: 60.
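The Week 5 value (60 bags after 20) is the kind of jump a simple consistency check can flag. A minimal sketch, assuming that stored maize should not rise between weekly reports unless a harvest or purchase took place; the function name `flag_storage_jumps` and the `max_increase` threshold are hypothetical.

```python
def flag_storage_jumps(weekly_bags, max_increase=0):
    """Return (week, previous, current) for each week-over-week increase
    larger than max_increase. Weeks are numbered from 1."""
    flagged = []
    for i in range(1, len(weekly_bags)):
        previous, current = weekly_bags[i - 1], weekly_bags[i]
        if current - previous > max_increase:
            flagged.append((i + 1, previous, current))
    return flagged


reports = [30, 25, 25, 20, 60]  # Weeks 1-5 from the slide
print(flag_storage_jumps(reports))  # [(5, 20, 60)]
```

A flagged value is not necessarily an error: the respondent may have harvested or bought maize, so the flag only marks a record whose interpretation needs more context.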
13 Context: Can you interpret the data?
14 Approaches to data quality. Preemptive: processes (data management); metadata and domain expertise. Diagnostic: statistics; databases (data mining). Retrospective: data cleaning.
15 Processes: decide where to store raw data and products; standardize content and formats; assign responsibility: data stewards; monitor data; archive data.
16 Data Processing Pipeline (diagram): TextIt Server; Quality Control; Metadata Management; Cleaning and anomaly detection; Build products; Development Server; Test Server; Production Server; MongoDB.
17 Data monitoring
18
19 Metadata. From the platform: "uuid": "f07a bc-ab11-572f4b562aa9", "name": "harvest flow 25 Apr 2016", "runs": 674, "completed_runs": 252, "label": "coll_fuelwood", "label": "harvest". From the team: Country; Season; Creator; Date created; Run start date; Run start time; Run end date; Run end time; Flow type; List of questions.
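The platform fields `runs` and `completed_runs` can be turned into a simple completeness indicator. A minimal sketch, assuming the metadata is available as a dictionary with those two keys; the helper name `completion_rate` is hypothetical, and the values are the ones shown on the slide.

```python
def completion_rate(flow):
    """Share of started runs that were completed, or None if there are no runs."""
    runs = flow.get("runs", 0)
    if not runs:
        return None
    return flow.get("completed_runs", 0) / runs


flow = {"name": "harvest flow 25 Apr 2016", "runs": 674, "completed_runs": 252}
rate = completion_rate(flow)
print(f"{rate:.1%}")  # 252 of 674 runs completed
```

Tracked per flow and over time, a ratio like this is one candidate indicator for monitoring completeness.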
20 Data cleaning: identify variable types; check for invalid / incorrect values and missing values; conduct frequency analysis (mean, median, min, max, STD); identify outliers; identify what can be corrected; decide how to treat outliers and missing values; evaluate time needed for automated and manual cleaning; use tools or manual correction. Are there patterns in missing data?
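Several of these steps (counting missing values, summary statistics, outlier identification) can be sketched with the Python standard library. This is an illustration under stated assumptions, not the method used in the talk: missing values are assumed to be encoded as `None`, outliers are identified with Tukey's 1.5 x IQR rule (one common choice among many), and the function names are hypothetical.

```python
import statistics


def summarize(values):
    """Descriptive statistics for a numeric column; None marks a missing value."""
    present = [v for v in values if v is not None]
    return {
        "n": len(values),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "min": min(present),
        "max": max(present),
        "std": statistics.stdev(present),
    }


def iqr_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    present = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in present if v < low or v > high]
```

Whether a flagged value is then corrected, dropped, or kept is the separate decision named in the list above; the code only identifies candidates.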
21 Using Google (Open) Refine to clean data: string to number; remove words ("bags"); cluster similar values ("NO", "no").
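The three operations named on this slide can also be sketched in plain Python. This is not OpenRefine itself or its API: the function names are hypothetical, and `cluster_key` is only a simplified key in the spirit of key-collision clustering (lowercase, strip punctuation, sort unique tokens).

```python
import re
from collections import defaultdict


def to_number(raw, words_to_remove=("bags",)):
    """Strip known filler words, then convert to int; None if not a clean number."""
    text = raw.lower()
    for word in words_to_remove:
        text = text.replace(word, "")
    text = text.strip()
    return int(text) if re.fullmatch(r"[0-9]+", text) else None


def cluster_key(raw):
    """Simplified key: lowercase, drop punctuation, sort unique tokens."""
    tokens = re.sub(r"[^\w\s]", "", raw.strip().lower()).split()
    return " ".join(sorted(set(tokens)))


def cluster(values):
    """Group raw values that share the same key."""
    groups = defaultdict(list)
    for value in values:
        groups[cluster_key(value)].append(value)
    return dict(groups)
```

For example, `to_number("30 bags")` yields `30`, while `cluster(["NO", "no", "No!"])` places all three spellings in one group. Values that do not convert cleanly come back as `None` and are left for manual review.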
22 Difficulties in cleaning. How many 50kg bags of maize do you expect to harvest? Responses: "!50"; "You cant tell now rains have jast staoted"; "5O KGS ONLY".
23 Questions for discussion: How can we define data quality to reflect differences in types of data and its uses and the dynamic nature of research? What indicators can help to track quality processes and improvements? Who is responsible for ensuring high data quality? What preemptive techniques can help to improve the quality of mobile data?