Large Scale Data Management of Astronomical Surveys with AstroSpark
1 Large Scale Data Management of Astronomical Surveys with AstroSpark
Mariem Brahem (1,2), Karine Zeitouni (1), Laurent Yeh (1)
(1) DAVID Lab, University of Versailles
(2) CNES (Centre National d'Études Spatiales), Toulouse
XLDB, Extremely Large Databases Conference, Clermont-Ferrand, October 2017
2 Context & Motivation
Big Data in the field of cosmology:
- Gaia mission: 3D map of our Galaxy, 1 billion stars observed, >1 PB (Dec …)
- LSST Project: >10 billion objects, >15 PB for the catalog (>2020)
- Astronomers need an efficient solution for large-scale astronomical data handling.
AstroSpark - XLDB
3 Contributions
We propose new algorithms and a framework, AstroSpark, which:
- extends Apache Spark, a distributed in-memory computing engine, to process and analyze astronomical data;
  - implements operators such as cone search, cross-match and kNN;
- offers an expressive programming interface by supporting ADQL, the unified astronomical query language;
- combines data partitioning and indexing with HEALPix, a pixelization of the sphere, to speed up query processing;
  - e.g., HX-Match uses HEALPix to speed up cross-matching;
- implements a query optimizer and provides a set of customized strategies for astronomical queries.
4 AstroSpark Architecture
Components: input data; data partitioning (HEALPix library); storage (HDFS); Spark Core; and a querying system built on the Astronomical Data Query Language (ADQL) of the International Virtual Observatory Alliance (IVOA), with a query parser and a query optimizer (an extension of Spark SQL's Catalyst).
Example ADQL query (cross-match with a radius of 2/3600 degree, i.e. 2 arcseconds):
SELECT * FROM gaia JOIN tycho2 ON 1 = CONTAINS(POINT('ICRS', gaia.ra, gaia.dec), CIRCLE('ICRS', tycho2.ra, tycho2.dec, 2/3600))
5 HEALPix-Based Data Partitioning
- Locality: close data points are likely to be in the same partition, and all points within a HEALPix pixel belong to the same partition.
- Balance: partitions have roughly the same size and adapt to data density.
- Each node can process many partitions.
(Figure: partitions, each covering a range of HEALPix pixels, assigned to Node 1, Node 2 and Node 3; visualization under Aladin.)
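The locality and balance properties above can be illustrated with a range partitioner over pixel IDs. This is a minimal sketch, not AstroSpark's implementation: it assumes every record has already been assigned an integer HEALPix pixel ID (in a numbering where nearby pixels tend to have close IDs), and the names `range_boundaries` and `partition_of` are hypothetical. Bounds are placed only between distinct pixel IDs, so a pixel is never split, and dense regions receive narrower ID ranges.

```python
from bisect import bisect_left
from collections import Counter

def range_boundaries(pixel_ids, num_partitions):
    """Pick inclusive upper pixel-ID bounds so partitions hold roughly equal record counts.

    Bounds fall between distinct pixel IDs, so all records of one pixel share a
    partition (locality); dense sky regions get narrower ID ranges (balance).
    Returns at most num_partitions - 1 bounds.
    """
    counts = sorted(Counter(pixel_ids).items())  # (pixel_id, count), ascending IDs
    target = sum(c for _, c in counts) / num_partitions
    bounds, running = [], 0
    for pixel, count in counts:
        running += count
        # Close the current partition once the cumulative count reaches its share.
        if len(bounds) < num_partitions - 1 and running >= target * (len(bounds) + 1):
            bounds.append(pixel)
    return bounds

def partition_of(pixel_id, bounds):
    """Index of the partition whose pixel-ID range contains pixel_id."""
    return bisect_left(bounds, pixel_id)
```

For example, pixel IDs `[1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]` split three ways give bounds `[1, 2]`, one pixel per partition. Spark itself offers comparable range partitioning (`RangePartitioner`, `Dataset.repartitionByRange`), which estimates bounds from a sample rather than exact counts; the slide does not say which mechanism AstroSpark uses.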
6 Cross-Matching Example
Identify and correlate objects belonging to different observations: pairs from R and S separated by at most an angular distance ɛ.
This could be expressed in Spark SQL (coordinates in radians) as:
SELECT * FROM R JOIN S ON (2*ASIN(SQRT(SIN((DEC2 - DEC)/2) * SIN((DEC2 - DEC)/2) + COS(DEC2) * COS(DEC) * SIN((RA2 - RA)/2) * SIN((RA2 - RA)/2))) <= ɛ)
But this is intractable: cross-matching only 200,000 records of Gaia and Tycho-2 takes 13.6 hours, and more than 12 days for 5 million objects in Gaia and Tycho-2!
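A minimal Python sketch of this naive theta-join (function names hypothetical): the haversine distance from the SQL predicate is evaluated for every pair of R and S, with coordinates in radians.

```python
import math

def angular_distance(ra1, dec1, ra2, dec2):
    """Great-circle distance (radians) between two sky positions given in radians.

    Haversine form, as in the Spark SQL predicate:
    2*asin(sqrt(sin^2(ddec/2) + cos(dec1)*cos(dec2)*sin^2(dra/2)))
    """
    s_dec = math.sin((dec2 - dec1) / 2)
    s_ra = math.sin((ra2 - ra1) / 2)
    h = s_dec * s_dec + math.cos(dec1) * math.cos(dec2) * s_ra * s_ra
    return 2 * math.asin(math.sqrt(min(1.0, h)))  # clamp guards against rounding above 1

def naive_cross_match(r, s, eps):
    """Theta-join on lists of (ra, dec): tests all |R| * |S| pairs."""
    return [
        (i, j)
        for i, (ra_r, dec_r) in enumerate(r)
        for j, (ra_s, dec_s) in enumerate(s)
        if angular_distance(ra_r, dec_r, ra_s, dec_s) <= eps
    ]
```

The work grows as |R| · |S| (2.5 × 10^13 distance evaluations for two sets of 5 million objects). Because the join condition contains no equality, Spark SQL generally has to plan it as a nested loop or Cartesian product.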
7 Cross-Matching in AstroSpark
- HX-Match leverages spatial indexing and partitioning along with the NASA HEALPix library to guide data access and limit the pairwise distance computations.
- It replaces the costly Cartesian product with an equi-join followed by a filter.
(Figures: query plan in Spark SQL vs. query plan in AstroSpark.)
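The equi-join + filter idea can be sketched as follows. This is not HX-Match itself: as a hypothetical stand-in for HEALPix pixels it uses a uniform 3-D grid over unit vectors, and it handles cell borders by probing each R object's 27 surrounding cells, a common technique assumed here rather than taken from the slide. Function names are hypothetical; coordinates are in radians.

```python
import math
from collections import defaultdict
from itertools import product

def unit_vector(ra, dec):
    """Sky position in radians -> point on the unit sphere."""
    cd = math.cos(dec)
    return (cd * math.cos(ra), cd * math.sin(ra), math.sin(dec))

def cell_key(v, size):
    """3-D grid cell holding unit vector v (hypothetical stand-in for a HEALPix pixel)."""
    return tuple(math.floor(c / size) for c in v)

def grid_cross_match(r, s, eps):
    """Cross-match two lists of (ra, dec) in radians: equi-join on cell keys, then filter.

    Two positions within angle eps are within chord length 2*sin(eps/2) <= eps,
    so with cells of side 2*eps a match lies in the same or an adjacent cell
    (the extra width also absorbs floating-point rounding at cell borders).
    """
    size = 2 * eps
    max_chord2 = (2 * math.sin(eps / 2)) ** 2
    index = defaultdict(list)                 # build side: cell -> S entries
    for j, (ra, dec) in enumerate(s):
        v = unit_vector(ra, dec)
        index[cell_key(v, size)].append((j, v))
    matches = []
    for i, (ra, dec) in enumerate(r):         # probe side: each R object vs 27 cells
        u = unit_vector(ra, dec)
        cx, cy, cz = cell_key(u, size)
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            for j, v in index.get((cx + dx, cy + dy, cz + dz), ()):
                # Filter: exact test, chord length <= chord of eps
                if sum((a - b) ** 2 for a, b in zip(u, v)) <= max_chord2:
                    matches.append((i, j))
    return sorted(matches)
```

Only pairs sharing a neighborhood of cells reach the distance test, and working on unit vectors avoids special cases at RA = 0/360° and at the poles. Each pair is found once, because every S object sits in exactly one cell.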
8 Results of cross-matching: Gaia DR1 and Tycho-2
(Figures, logarithmic scale: cross-matching Gaia DR1 with Tycho-2; gain of the partition materialization.)
9 Cone Search & kNN Search
Evaluation on Gaia DR1.
(Figure: effect of varying data size on cone search.)
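For reference, the semantics of the two operators evaluated here can be sketched in brute-force Python. The function names and the dictionary-of-positions catalog layout are hypothetical, and this is not how AstroSpark evaluates them: it only fixes the expected results, with coordinates in radians.

```python
import heapq
import math

def angular_distance(ra1, dec1, ra2, dec2):
    """Great-circle distance in radians (haversine form); inputs in radians."""
    s_dec = math.sin((dec2 - dec1) / 2)
    s_ra = math.sin((ra2 - ra1) / 2)
    h = s_dec * s_dec + math.cos(dec1) * math.cos(dec2) * s_ra * s_ra
    return 2 * math.asin(math.sqrt(min(1.0, h)))

def cone_search(points, ra, dec, radius):
    """IDs of all catalog points within `radius` of (ra, dec)."""
    return [pid for pid, (pra, pdec) in points.items()
            if angular_distance(ra, dec, pra, pdec) <= radius]

def knn(points, ra, dec, k):
    """IDs of the k catalog points closest to (ra, dec), nearest first."""
    return heapq.nsmallest(k, points, key=lambda pid: angular_distance(ra, dec, *points[pid]))
```

With a HEALPix-partitioned catalog, one would expect a cone search to read only the partitions whose pixels intersect the cone, and a kNN search to start from the query point's pixel and widen as needed.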
10 First Results & Impacts
Validated for 3 operators and ADQL queries on real datasets. Experiments have shown that AstroSpark is effective in processing astronomical data, scalable, and outperforms the state-of-the-art solutions.
Publications:
- M. Brahem, K. Zeitouni & L. Yeh, HX-MATCH: In-Memory Cross-Matching Algorithm for Astronomical Big Data, International Symposium on Spatial and Temporal Databases (SSTD 2017).
- M. Brahem, K. Zeitouni & L. Yeh, Large Scale Data Management of Astronomical Surveys with AstroSpark, Conference on Big Data from Space (BiDS 2017).
- K. Zeitouni, M. Brahem & L. Yeh, Large Scale Data Management of Astronomical Surveys with AstroSpark, European Week of Astronomy and Space Science (EWASS 2017).
- M. Brahem, K. Zeitouni & L. Yeh, AstroSpark: towards a distributed data server for big data in astronomy, Proceedings of the 3rd ACM SIGSPATIAL PhD Symposium, ACM 2016.