Preparing Digital Collections for Big Data Analysis. Sven Schlarb, Austrian Institute of Technology. e-archiving, Cordoba, Spain, 5 October 2018
2 Digital Transformation (image: copyright Doc Searls)
3 Digital Transformation (network diagram, CC BY-SA 4.0)
4 Archiving at internet scale
5 Is big data still a hype? 2014 (image: Jeremykemp at English Wikipedia, GFDL or CC BY-SA 3.0, from Wikimedia Commons)
6 Is big data still a hype? 2015 (image: Jeremykemp at English Wikipedia, GFDL or CC BY-SA 3.0, from Wikimedia Commons)
7 Is big data still a hype? 2018 (image: Jeremykemp at English Wikipedia, GFDL or CC BY-SA 3.0, from Wikimedia Commons)
8 To SQL or to NoSQL? Relational databases vs. NoSQL databases
9 Different NoSQL database types
Key-value: K1 → AAA,BBB,CCC; K2 → AAA,BBB; K3 → AAA,DDD; K4 → AAA,2,01/01/2018; K5 → 3,ZZZ,5623
Wide column (example tables):
  Participant (ID, Name, City): 1 John London; 2 Linda Palme
  Conference (Name, Address, City): PVC2018 Townroad 2 Manchester; TFC2018 Market 2 Berlin
Document:
  { "name": "Sven Schlarb", "email": "sven.schlarb@ait.ac.at", "events": [ { "name": "Kulturhackathon openglam.at", "date": "…T00:00:00.000Z" }, { "name": "e-archiving Cordoba", "date": "…T00:00:00.000Z" } ] }
Graph: Person nodes connected to Event nodes
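The following minimal Python sketch makes the four data models concrete. It represents each model with plain built-in structures and reuses the example values from the slide. Real NoSQL systems also provide persistence, indexing and distribution. The row keys such as `participant:1`, the node IDs and the `attends` relation are hypothetical.

```python
# Key-value: opaque values that can only be looked up by key (values from the slide).
key_value = {
    "K1": "AAA,BBB,CCC",
    "K2": "AAA,BBB",
    "K3": "AAA,DDD",
    "K4": "AAA,2,01/01/2018",
    "K5": "3,ZZZ,5623",
}

# Wide column: each row key maps to its own, possibly different, set of columns.
# The row keys are hypothetical.
wide_column = {
    "participant:1": {"name": "John", "city": "London"},
    "conference:PVC2018": {"address": "Townroad 2", "city": "Manchester"},
}

# Document: one nested, self-contained record.
# The dates are elided on the slide, so they are left out here.
document = {
    "name": "Sven Schlarb",
    "events": [
        {"name": "Kulturhackathon openglam.at"},
        {"name": "e-archiving Cordoba"},
    ],
}

# Graph: nodes plus typed edges between them (hypothetical IDs and relation name).
nodes = {
    "p1": {"type": "Person", "name": "Sven Schlarb"},
    "e1": {"type": "Event", "name": "e-archiving Cordoba"},
}
edges = [("p1", "attends", "e1")]


def events_of(person_id):
    """Follow 'attends' edges from a person node to the names of event nodes."""
    return [nodes[dst]["name"] for src, rel, dst in edges
            if src == person_id and rel == "attends"]


print(key_value["K4"])               # AAA,2,01/01/2018
print(wide_column["participant:1"])  # only the columns this row actually has
print([e["name"] for e in document["events"]])
print(events_of("p1"))               # ['e-archiving Cordoba']
```

The models differ in how the data can be queried. The key-value store only answers "what is stored under this key". The document and graph models let you navigate the structure inside a record or between records.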
10 E-ARK Experimental Cluster
Name Node / Job Tracker: CPU: 2 x 2.40 GHz quad-core CPU (16 hyper-threading cores); RAM: 24 GB; disk: 3 x 1 TB disks configured as RAID 5 (redundancy), 2 TB effective
Data nodes (Task Trackers), each: CPU: 1 x 2.53 GHz quad-core CPU (8 hyper-threading cores); RAM: 16 GB; disk: 2 x 1 TB disks configured as RAID 0 (performance), 2 TB effective
On each data node, the 8 hyper-threading cores are split into 5 for Map, 2 for Reduce and 1 for the OS. This gives 25 processing cores for Map tasks and 10 processing cores for Reduce tasks across the cluster.
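The capacity and slot figures on the slide can be cross-checked with a little arithmetic. The sketch below assumes five data nodes. The slide does not state the node count; it is inferred from 25 Map cores / 5 per node and 10 Reduce cores / 2 per node.

```python
def raid_effective_tb(level, disks, disk_tb):
    """Usable capacity for the two RAID levels used on the cluster."""
    if level == 0:
        return disks * disk_tb          # striping only: no redundancy, full capacity
    if level == 5:
        return (disks - 1) * disk_tb    # one disk's worth of capacity holds parity
    raise ValueError(f"unsupported RAID level: {level}")


def cluster_slots(data_nodes, map_per_node=5, reduce_per_node=2):
    """Total Map and Reduce task slots across all data nodes."""
    return data_nodes * map_per_node, data_nodes * reduce_per_node


print(raid_effective_tb(5, 3, 1))  # Name Node / Job Tracker: 2
print(raid_effective_tb(0, 2, 1))  # each data node: 2
print(cluster_slots(5))            # assumed 5 data nodes: (25, 10)
```

Both node types end up with 2 TB usable, but for opposite reasons. RAID 5 gives up one disk for parity so a single disk failure is survivable. RAID 0 stripes for throughput and loses data if either disk fails, which is acceptable on data nodes because HDFS replicates blocks across nodes.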
11 Reference Implementation
- Package transformation and ingest: modular package transformation workflows and metadata creation
- Full-text indexing and search: parallelized full-text indexing
- Access: fast random access to individual files
- Faceted search and data mining: aggregating data using facet queries; data mining (classification, NER)
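"Aggregating data using facet queries" can be sketched as a Solr request that returns only facet counts. The parameters `q`, `rows`, `wt`, `facet` and `facet.field` are standard Solr request parameters. The host, the core name `earkweb` and the field names `content_type` and `package` are hypothetical and would depend on the actual index schema.

```python
from urllib.parse import urlencode


def facet_query_url(base_url, core, fields, query="*:*"):
    """Build a Solr select URL that returns facet counts for `fields` and no documents."""
    params = [("q", query), ("rows", "0"), ("wt", "json"), ("facet", "true")]
    # facet.field may be repeated, once per field to aggregate on.
    params += [("facet.field", field) for field in fields]
    return f"{base_url}/solr/{core}/select?{urlencode(params)}"


url = facet_query_url("http://localhost:8983", "earkweb", ["content_type", "package"])
print(url)
```

Setting `rows=0` keeps the response small because only the aggregated counts are needed. In the JSON response, the counts appear under `facet_counts.facet_fields` as alternating value/count lists.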
12 E-ARK Information Package (simplified)
- Representations (from the SIP, extended by migrations)
- Changes along the way from SIP to DIP: metadata edits, migrations, added emulation information, DIP metadata
- Metadata: structural, provenance, descriptive and technical metadata
- Schemas and documentation
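The package components above can be pictured as a directory tree. The sketch below creates a simplified skeleton loosely modelled on the E-ARK common specification, under stated assumptions. The directory names and the mapping of technical and provenance metadata to `metadata/preservation` are assumptions for illustration; the normative rules are in the E-ARK CSIP. The representation names are hypothetical.

```python
import tempfile
from pathlib import Path


def create_package_skeleton(root, representations):
    """Create a simplified E-ARK-style package tree under `root` and list its paths."""
    root = Path(root)
    dirs = [
        "metadata/descriptive",   # descriptive metadata (assumed location)
        "metadata/preservation",  # technical and provenance metadata (assumed location)
        "schemas",
        "documentation",
    ]
    # One data folder per representation, e.g. the original and a migrated copy.
    dirs += [f"representations/{rep}/data" for rep in representations]
    for d in dirs:
        (root / d).mkdir(parents=True, exist_ok=True)
    # Structural metadata: an (empty placeholder) METS file at the package root.
    (root / "METS.xml").touch()
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))


with tempfile.TemporaryDirectory() as tmp:
    for path in create_package_skeleton(tmp, ["rep1", "rep1-migrated"]):
        print(path)
```

Keeping each representation in its own folder is what makes the package transformations on the slide possible without overwriting the original. A migration adds a new representation, and a metadata edit only touches the `metadata` subtree.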
13 earkweb
earkweb is based on Python and the Celery task execution system. It creates archival workflows from predefined tasks that can be executed in parallel on a computer cluster. Examples of such tasks are data validation, format migration, content extraction, database transformation, packaging, and interfacing with storage systems. earkweb provides a graphical interface and can be used both interactively and in batch mode.
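The idea of composing workflows from predefined tasks and running them in parallel can be illustrated without a cluster. This sketch uses the standard library's `ThreadPoolExecutor` as a stand-in for Celery workers; it is not earkweb's or Celery's API. In Celery itself, the same shape would typically be a `chain` of tasks per package, with a `group` running packages in parallel. The task functions and package dictionaries are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical predefined tasks: each takes a package dict and returns it updated.
def validate(pkg):
    pkg["valid"] = bool(pkg["files"])
    return pkg


def migrate(pkg):
    # Hypothetical migration step: only rewrites .tif file names to .jp2.
    pkg["files"] = [f.rsplit(".", 1)[0] + ".jp2" if f.endswith(".tif") else f
                    for f in pkg["files"]]
    return pkg


def package(pkg):
    pkg["status"] = "packaged" if pkg["valid"] else "rejected"
    return pkg


WORKFLOW = [validate, migrate, package]


def run_workflow(pkg, tasks=WORKFLOW):
    """Run the tasks of one workflow in order for a single package."""
    for task in tasks:
        pkg = task(pkg)
    return pkg


def run_batch(packages, workers=4):
    """Process independent packages in parallel, as cluster workers would."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_workflow, packages))


results = run_batch([{"id": "sip-1", "files": ["scan.tif", "mets.xml"]},
                     {"id": "sip-2", "files": []}])
for r in results:
    print(r["id"], r["status"], r["files"])
```

Parallelism is across packages, not within one package. The steps of a workflow depend on each other, while different packages are independent, which is what makes batch processing on many workers straightforward.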
14 Cluster Deployment Stack (diagram): information package status and task results; search and retrieval; decoupled notification; workers; staging/storage area (NAS); package transfer
15 Standalone Deployment Stack (diagram): information package status and task results; search and retrieval; indexing; workers; staging/storage area (NAS)
16 Data Mining/NLP
Purpose: analyse the digital resources of collections.
Selected use cases:
- Location names occurring in texts: named entity recognition and incorporation of geoinformation
- Text classification
17 Location names occurring in texts
- Stanford NER for named entity recognition
- Nominatim (the geocoding search engine behind openstreetmap.org) for georeferencing
- Peripleo for visualization
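A minimal sketch of the first two steps follows, assuming Stanford NER is run with its slash-tagged output format (`word/TAG`), in which location tokens are tagged `LOCATION`. Consecutive location tokens are merged into one place name, and a Nominatim search URL is built for each name. No request is sent here, and the example sentence is made up. The public Nominatim instance has a strict usage policy, so bulk georeferencing would normally go against a local installation.

```python
from urllib.parse import urlencode


def locations_from_slash_tags(tagged):
    """Merge consecutive word/LOCATION tokens from Stanford NER slashTags output."""
    places, current = [], []
    for token in tagged.split():
        word, _, tag = token.rpartition("/")
        if tag == "LOCATION":
            current.append(word)
        elif current:
            # A non-location token ends the current multi-word place name.
            places.append(" ".join(current))
            current = []
    if current:
        places.append(" ".join(current))
    return places


def nominatim_search_url(place):
    """Build a Nominatim search URL that asks for the best JSON match for a place."""
    return "https://nominatim.openstreetmap.org/search?" + urlencode(
        {"q": place, "format": "json", "limit": 1})


tagged = "The/O conference/O took/O place/O in/O Cordoba/LOCATION ,/O Spain/LOCATION ./O"
for place in locations_from_slash_tags(tagged):
    print(place, nominatim_search_url(place))
```

Merging adjacent tokens keeps names like "New York" together. The drawback is that two different places written next to each other without punctuation would be merged as well.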
18 Location names occurring in texts Peripleo - PELAGIOS Project
19 Geographical/timeline search
Provided: GML data and TIFF images of maps with metadata (coordinate system, time, etc.)
- Convert the GML data to Peripleo RDF
- Translate the coordinate system if necessary
- Use Peripleo to search for and visualize regions and filter by time
(Image: Peripleo, PELAGIOS Project)
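The sketch below illustrates the "translate coordinate system" step under stated assumptions. It reads the coordinates of a `gml:posList` and converts them from Web Mercator (EPSG:3857) to WGS84 longitude/latitude. The slide does not say which coordinate system the maps use, so Web Mercator is an assumption. The namespace shown is the one used by GML 2/3.1 (GML 3.2 uses `http://www.opengis.net/gml/3.2`). For arbitrary source systems, a projection library such as pyproj is the safer choice.

```python
import math
import xml.etree.ElementTree as ET

GML_NS = {"gml": "http://www.opengis.net/gml"}
EARTH_RADIUS_M = 6378137.0  # WGS84 semi-major axis, as used by Web Mercator


def web_mercator_to_wgs84(x, y):
    """Convert EPSG:3857 metres to (longitude, latitude) in degrees."""
    lon = math.degrees(x / EARTH_RADIUS_M)
    lat = math.degrees(2 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2)
    return lon, lat


def poslist_points(gml_text):
    """Read the first gml:posList and pair its numbers into (x, y) points."""
    root = ET.fromstring(gml_text)
    values = [float(v) for v in root.find(".//gml:posList", GML_NS).text.split()]
    return list(zip(values[0::2], values[1::2]))


sample = """<gml:LineString xmlns:gml="http://www.opengis.net/gml">
  <gml:posList>0 0 20037508.342789244 0</gml:posList>
</gml:LineString>"""
print([web_mercator_to_wgs84(x, y) for x, y in poslist_points(sample)])
```

The converted longitude/latitude pairs are what a gazetteer-style tool such as Peripleo can place on its map. The remaining conversion to Peripleo's RDF input format is not shown here.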
20 Geographical/timeline search Peripleo - PELAGIOS Project
21 Text classification using scikit-learn
- Prepare data to train an SVM classifier
- Dump the full texts of the repository into reusable packages
- Apply text classification and update the Solr records accordingly
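A minimal scikit-learn sketch of the classification step, assuming TF-IDF features and a linear SVM (`LinearSVC`). The slide names an SVM but not the feature extraction or kernel. The training texts, labels, document IDs and the Solr field name `category` are hypothetical. The output uses Solr's atomic-update syntax (`{"set": value}`), so only the predicted field is changed in each record.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical labelled training texts; in practice these come from the full-text dump.
train_texts = [
    "map of the river and the border region",
    "river valley map with coordinates",
    "invoice payment and tax budget",
    "annual budget report with tax figures",
]
train_labels = ["geography", "geography", "finance", "finance"]

# TF-IDF word features feeding a linear support vector classifier.
classifier = make_pipeline(TfidfVectorizer(), LinearSVC())
classifier.fit(train_texts, train_labels)


def solr_atomic_updates(doc_ids, texts, field="category"):
    """Build Solr atomic-update documents that set the predicted class per record."""
    predictions = classifier.predict(texts)
    return [{"id": doc_id, field: {"set": str(label)}}
            for doc_id, label in zip(doc_ids, predictions)]


updates = solr_atomic_updates(["doc-1", "doc-2"],
                              ["old map of the river", "tax payment invoice"])
print(updates)
```

The resulting list could be posted as JSON to the core's `/update` handler. Atomic updates require the other fields of the schema to be stored or have docValues, so that Solr can rebuild each document.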
22 Database archiving, rebuilding and analysis
Submit (e.g. from PostgreSQL) → archive (SIARD) → reconstruct (e.g. in Oracle) → analyse. RDBMS data of up to 80 TB.
(Image source: Wikipedia)
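Before a database is reconstructed, it can help to inspect what a SIARD archive contains. SIARD stores a relational database as a ZIP container with its metadata under `header/` and table data under `content/`. The sketch assumes table data lives at `content/schemaN/tableM/tableM.xml`; check this naming against the SIARD version actually in use. The in-memory archive is a hypothetical stand-in, not a real export.

```python
import io
import re
import zipfile

# Assumed SIARD layout: table data in content/schemaN/tableM/tableM.xml.
TABLE_PATH = re.compile(r"^content/(schema\d+)/(table\d+)/\2\.xml$")


def list_siard_tables(siard_file):
    """Return sorted (schema folder, table folder) pairs found in a SIARD archive."""
    with zipfile.ZipFile(siard_file) as archive:
        return sorted(
            (m.group(1), m.group(2))
            for m in map(TABLE_PATH.match, archive.namelist()) if m
        )


# Hypothetical in-memory stand-in for a SIARD file, used to exercise the function.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("header/metadata.xml", "<siardArchive/>")
    archive.writestr("content/schema0/table0/table0.xml", "<table/>")
    archive.writestr("content/schema0/table1/table1.xml", "<table/>")
buffer.seek(0)
print(list_siard_tables(buffer))
```

Reading only the ZIP directory is cheap even for very large exports, because the table XML files are not decompressed. For the actual reconstruction, `header/metadata.xml` describes the schemas, tables and column types.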
23 Thank you very much for your attention! Any questions?