GridNEWS: A distributed Grid platform for efficient storage, annotating, indexing and searching of large audiovisual news content

Similar documents
Architecture Proposal

30 Nov Dec Advanced School in High Performance and GRID Computing Concepts and Applications, ICTP, Trieste, Italy

Computer Science Section. Computational and Information Systems Laboratory National Center for Atmospheric Research

Cloud Computing. Up until now

vcloud Automation Center Reference Architecture vcloud Automation Center 5.2

Grid Architectural Models

Grid Middleware and Globus Toolkit Architecture

Chapter 4:- Introduction to Grid and its Evolution. Prepared By:- NITIN PANDYA Assistant Professor SVBIT.

The Integration of Grid Technology with OGC Web Services (OWS) in NWGISS for NASA EOS Data

Ioan Raicu. Everyone else. More information at: Background? What do you want to get out of this course?

A Performance Evaluation of WS-MDS in the Globus Toolkit

vsan Remote Office Deployment January 09, 2018

Integration of Cloud and Grid Middleware at DGRZR

glite Grid Services Overview

Introduction to Grid Computing

Virtualization. Michael Tsai 2018/4/16

Native vsphere Storage for Remote and Branch Offices

FREE SCIENTIFIC COMPUTING

Write a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical

EsgynDB Enterprise 2.0 Platform Reference Architecture

The Hadoop Distributed File System Konstantin Shvachko Hairong Kuang Sanjay Radia Robert Chansler

Grid services. Enabling Grids for E-sciencE. Dusan Vudragovic Scientific Computing Laboratory Institute of Physics Belgrade, Serbia

Introduction to SRM. Riccardo Zappi 1

The course modules of MongoDB developer and administrator online certification training:

Voldemort. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Overview Design Evaluation

Grid Programming: Concepts and Challenges. Michael Rokitka CSE510B 10/2007

Data Management 1. Grid data management. Different sources of data. Sensors Analytic equipment Measurement tools and devices

Database Assessment for PDMS

Efficient On-Demand Operations in Distributed Infrastructures

Overview of HEP software & LCG from the openlab perspective

Understanding StoRM: from introduction to internals

Performance Analysis of Applying Replica Selection Technology for Data Grid Environments*

Xen and CloudStack. Ewan Mellor. Director, Engineering, Open-source Cloud Platforms Citrix Systems

70-414: Implementing an Advanced Server Infrastructure Course 01 - Creating the Virtualization Infrastructure

Deploying virtualisation in a production grid

Empirical Evaluation of Latency-Sensitive Application Performance in the Cloud

Copyright 2012, Oracle and/or its affiliates. All rights reserved.

Peer to Peer Systems and Probabilistic Protocols

Knowledge Discovery Services and Tools on Grids

High Throughput WAN Data Transfer with Hadoop-based Storage

Parallel & Cluster Computing. cs 6260 professor: elise de doncker by: lina hussein

Performance & Scalability Testing in Virtual Environment Hemant Gaidhani, Senior Technical Marketing Manager, VMware

Grid Computing Middleware. Definitions & functions Middleware components Globus glite

By Ian Foster. Zhifeng Yun

Network Platform for Creating Services over Virtualized Networks

Open-source library and tools to support the OVF.

Profiling Grid Data Transfer Protocols and Servers. George Kola, Tevfik Kosar and Miron Livny University of Wisconsin-Madison USA

Course Content MongoDB

Data Staging: Moving large amounts of data around, and moving it close to compute resources

The glite middleware. Ariel Garcia KIT

CS555: Distributed Systems [Fall 2017] Dept. Of Computer Science, Colorado State University

SharePoint 2010 Technical Case Study: Microsoft SharePoint Server 2010 Enterprise Intranet Collaboration Environment

Sphinx: A Scheduling Middleware for Data Intensive Applications on a Grid

VMs at a Tier-1 site. EGEE 09, Sander Klous, Nikhef

GridFTP Scalability and Performance Results Ioan Raicu Catalin Dumitrescu -

A Decentralized Content-based Aggregation Service for Pervasive Environments

System Support for Internet of Things

Fusion Architecture. Planning for an on-premise deployment

HDFS: Hadoop Distributed File System. CIS 612 Sunnie Chung

Globus Toolkit 4 Execution Management. Alexandra Jimborean International School of Informatics Hagenberg, 2009

Advanced School in High Performance and GRID Computing November Introduction to Grid computing.

GridMonitor: Integration of Large Scale Facility Fabric Monitoring with Meta Data Service in Grid Environment

CLOUD COMPUTING IT0530. G.JEYA BHARATHI Asst.Prof.(O.G) Department of IT SRM University

A Case for High Performance Computing with Virtual Machines

Grid Computing Systems: A Survey and Taxonomy

SSIM Collection & Archiving Infrastructure Scaling & Performance Tuning Guide

Synergy: Sharing-Aware Component Composition for Distributed Stream Processing Systems

CompSci 356: Computer Network Architectures Lecture 21: Overlay Networks Chap 9.4. Xiaowei Yang

MapReduce. U of Toronto, 2014

GAIA CU6 Bruxelles Meeting (12-13 october 2006)

Xen Summit Spring 2007

Storage Virtualization. Eric Yen Academia Sinica Grid Computing Centre (ASGC) Taiwan

VIProf: A Vertically Integrated Full-System Profiler

Kenneth A. Hawick P. D. Coddington H. A. James

Zhengyang Liu! Oct 25, Supported by NSF Grant OCI

ICAT Job Portal. a generic job submission system built on a scientific data catalog. IWSG 2013 ETH, Zurich, Switzerland 3-5 June 2013

Intel Enterprise Edition Lustre (IEEL-2.3) [DNE-1 enabled] on Dell MD Storage

White Paper. Major Performance Tuning Considerations for Weblogic Server

Seasonal forecast modeling application on the GARUDA Grid infrastructure

OpenNebula on VMware: Cloud Reference Architecture

Predicting the Performance of a GRID Environment: An Initial Effort to Increase Scheduling Efficiency

UX - User Experience: Multi-Cloud Network Visibility

CLIENT DATA NODE NAME NODE

Cisco HyperFlex Hyperconverged Infrastructure Solution for SAP HANA

SDS: A Scalable Data Services System in Data Grid

Grid Scheduling Architectures with Globus

Layered Architecture

SuccessMaker Learning Management System System Requirements v1.0

S.No QUESTIONS COMPETENCE LEVEL UNIT -1 PART A 1. Illustrate the evolutionary trend towards parallel distributed and cloud computing.

THE GLOBUS PROJECT. White Paper. GridFTP. Universal Data Transfer for the Grid

Dynamic Load-Balanced Multicast for Data-Intensive Applications on Clouds 1

Pivot3 Acuity with Microsoft SQL Server Reference Architecture

A QoS-Aware Middleware for Fault Tolerant Web Services

Cluster Setup and Distributed File System

SuccessMaker. System Requirements SuccessMaker v1.0

Systematic Cooperation in P2P Grids

Peer-to-Peer Internet Applications: A Review

Toward Scalable Monitoring on Large-Scale Storage for Software Defined Cyberinfrastructure

Day 1 : August (Thursday) An overview of Globus Toolkit 2.4

AMGA metadata catalogue system

Transcription:

1st HellasGrid User Forum 10-11/1/2008 GridNEWS: A distributed Grid platform for efficient storage, annotating, indexing and searching of large audiovisual news content Ioannis Konstantinou School of ECE Computing Systems Laboratory National Technical University of Athens

Concept Text based search in audiovisual content Search results: Portions of video files containing selected keywords Example User searches for keyword Acropolis Video portions containing the spoken word Acropolis are located and presented in the user YouTube like functionality

Objectives Keyword extraction from video files using automatic speech recognition algorithms (ASR) Efficient and scalable distributed storage of large media content Indexing of extracted metadata for efficient keyword search YouTube like user interface for video searching/downloading Contribution to existing Grid Middleware using GGF standardized components

Addressed Issues Execution Distributed execution of CPU/Data intensive Speech Recognition Algorithms Storage Server load balancing using performance metrics Client transfer time optimization using bittorrent like algorithms Increase data availability Multi-organizational data storage support using Virtual Organizations (VOs)

Video file keyword extraction procedure MetaScheduler Application Software Storage Cluster Cluster Schedule Execution CVSP tool Data Client GET/PUT File Ops SE Export Cluster Import Metadata Parser SE SE Metadata File Update Metadata DB

Distributed Execution Platform Architecture MDS : Managing and Discovery Service PBS : Portable Batch Scheduler WSGRAM : Grid Resource and Allocation Management WN: Worker Node Central MDS Server User Interface GridWay Metascheduler Globus Toolkit 4 MDS Publish aggregate Cluster information SOAP/XML messages Computing Element PBS Scheduler Globus Toolkit 4 MDS + WSGRAM PBS client service Ganglia Tool Local MDS Server Local MDS Server Local MDS Server Ganglia Stats WN WN WN WN WN WN WN WN WN

Distributed Storage Platform Architecture MDS: Managing and Discovery Service GridFTP: Grid File Transfer Protocol DRLS: Distributed Replica Location Service SE: Storage Element PFN: Physical File Name LFN: Logical File Name DRLS Get/Put LFN to PFN mappings SOAP/XML Messages Pastry Peer to Peer Storage Subsystem Client Machine Query Storage Subsystem to obtain candidate storage servers using SOAP/XML Messages GridFTP Get/put SE SE Replication Algorithm using NWS stats SE GridNews DataClient component GridFTP Client Parallel Downloader SE GridFTP Server Network Weather Service (NWS) DRLSCom Service Globus WS Apache Axis Container

Distributed Replica Location Service Contains mappings of LFNs to PFNs DHT used: Pastry P2P Logarithmic routing In a network with n nodes, a query needs only log(n) messages (hops) Plaxton s algorithm minimizes query latencies Redundancy through replication Eliminates single point of failure situations Inherent load balancing capabilities Consistent hashing algorithms

Load Balancing Servers exchange load metrics CPU Bandwidth Free Disk Space Prediction algorithms (e.g. Linear regression) forecast future metrics from history data Weighted Normalized Metric WNM : Wm X(M t / M max ) Servers maintain sorted WNM list : [WNM 1,..WNM n ] WNM list periodically refreshed

Replication Upon a STORE client request: Top k servers are selected from WNM list k: configurable static replication factor Most suitable server is returned to the client Client initiates a single GridFTP file upload Server replicates the new file according to WNM list and factor k DRLS is informed about the new LFN->PFN mappings Client is informed Upon completion

Parallel Downloader Upon a GET client request: Server contacts DRLS and retrieves replica locations Client establishes N GridFTP connections Client initiates N parallel (threaded) small data chunk requests After each successful retrieval, client reinitiates another request Optimum file transfer time: The greater file portion is retrieved from the faster storage nodes

GridTorrent Metadata fields Current Id File size Piece size Hashes Distributed RLS instead of Tracker Partial GridFTP for actual transfer BitTorrent replica selection and tit-for-tat algorithm. Compatible with plain GridFTP servers PFN s prefix determines protocol (gtp://site.fully.qualified.d omain.name/path/to/file) size,piece_size, hashes, currentid Peer that wants to download a file New connection Bittorrent piece Peer1 GridTorrent client Bittorrent Bittorrent New connection piece publish pfns Striped GridFTP Peer2 GridTorrent client New connection Striped GridFTP piece piece piece Peer3 GridFTP Server Striped GridFTP piece Distributed RLS Peer set CurrentID :pfn1 pfn2 pfn3

EXISTING MIDDLEWARE CONTRIBUTION Design and development of a dynamic replica selection/placement algorithm Added support for multiple clusters using Gridway Metascheduler Enhanced bittorrent like file downloading from multiple sources Replace centralized replica location service with a scalable distributed peer to peer solution

Development Testbed Hardware 4X Dual Core AMD Opteron(tm) Processor 875 2.2GHz 8 virtual CPU 16Gb Ram Deployment of 5 Xen virtual machines, 2GB ram Software Globus Toolkit v4.0 Globus WS Core (Apache AXIS WS Container) Rice Pastry P2P (Java) Network Weather Service Torque (OpenPBS) scheduler GridWay Metascheduler

Virtualization Xen Hypervisor Paravirtualization tequnique Guest OS use special xen aware kernel Direct utilization of special CPU instructions Faster than full virtualization (VmWare) Use of Xen Hypervisor Easy prototype management/administration Simple control of the node lifecycle Facilitate prototype deployment in many actual nodes

Currently working Replace ParallelDownloader with GridTorrent Deploy prototype in the PlanetLab testbed Run experiments Fine-tune designed algorithms

Gridnews Portal Users can: Perform keyword search in the autoannotated multimedia content View the video from their browser in a youtube style Download only a fragment of the video where this keyword exists

Screenshots keyword Video URLs time position

Questions