MMT Modern Machine Translation


Second Design and Specifications Report

Author(s): Davide Caroselli, Nicola Bertoldi, Mauro Cettolo, Marcello Federico
Dissemination Level: Public
Date: July 1st, 2016

Grant agreement no:
Project acronym: MMT
Project full title: Modern Machine Translation
MMT will deliver a language-independent commercial online translation service based on a new open source machine translation distributed architecture.
Funding scheme: Collaborative project
Coordinator: Alessandro Cattelan (TRANSLATED)
Start date, duration: 1 January 2015, 36 months
Distribution: Public
Contractual date of delivery: April 1st, 2016
Actual date of delivery: July 1st, 2016
Deliverable number: 1.2
Deliverable title: Second Design and Specifications Report
Type: Report
Status and version: Final
Number of pages: 20
Contributing partners: TRANSLATED, FBK
WP leader: TRANSLATED
Task leader: TRANSLATED
Authors: Nicola Bertoldi, Davide Caroselli, Mauro Cettolo, Marcello Federico, David Madl, Luca Mastrostefano
EC project officer: Saila Rinne

The partners in MMT are:
- Translated S.r.l. (TRANSLATED), Italy
- Fondazione Bruno Kessler (FBK), Italy
- The University of Edinburgh (UEDIN), United Kingdom
- TAUS B.V. (TAUS), The Netherlands

For copies of reports, updates on project activities and other MMT related information, contact:

TRANSLATED, MMT
Alessandro Cattelan
Via Nepal, 29, Rome, Italy
Phone: (+39)
Fax: (+39)

© Nicola Bertoldi, Davide Caroselli, Mauro Cettolo, Marcello Federico, David Madl, Luca Mastrostefano

No part of this document may be reproduced or transmitted in any form, or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission from the copyright owner.

Table of Contents

1. Executive Summary
2. Introduction
3. Use Cases
   3.1 General Requirements
       3.1.1 Linguistic requirements
       3.1.2 Functional requirements
       3.1.3 Performance requirements
       3.1.4 Computational requirements
4. Architecture Specifications
   4.1 Distributed Infrastructure
       4.1.1 Fault tolerance
       4.1.2 Distribution of updates
       4.1.3 Scalability
5. Components Design and Specifications
   5.1 Context Analyzer
   5.2 Word Aligner
   5.3 Adaptive Language Model
   5.4 Adaptive Phrase Table
   5.5 Text Processing
       5.5.1 Tokenizer and detokenizer
       5.5.2 XML Tag Manager

1. Executive Summary

This document presents the updated design and specifications for the ModernMT prototypes and final product, which will be developed during the final 18 months of the project. Starting from the use cases addressed, we outline the overall requirements of the ModernMT product, and then explain how these are reflected in the design and specifications of the architecture and of the main components developed in the project.

2. Introduction

This report is an updated version of the previous deliverable D1.1, First Design and Specifications Report, which was prepared before the first prototype versions of the ModernMT software were developed. The present document follows quite extensive development and testing phases, performed first on a minimum viable product of ModernMT and then on several versions of the first prototype. In particular, testing activities included field tests with real potential customers, represented by large IT companies. The field tests compared each customer's current in-house MT solution against ModernMT under fair conditions: that is, with the same training data and with MT quality evaluated by external people. Experience gained from the field tests, discussions about requirements with the potential customers, and a preliminary market analysis conducted by the industrial partner inspired the new list of requirements described in this document.

This report is structured as follows. In Section 3, the two foreseen use cases of ModernMT are defined and the main use case is described in detail. The general requirements of the main use case are then defined in terms of linguistic, functional, performance, computational, and architectural requirements. In particular, linguistic requirements define the progression of translation directions that will be covered by ModernMT; functional requirements, the core operations that it will support; performance requirements, the level of MT quality to target; computational requirements, the processing speed perceived by the user; and architectural requirements, the way distributed processing and scalability should be improved during the second part of the project. In Section 4, the requirements for the ModernMT software architecture are described, covering, respectively, aspects related to distributed processing, reliability, updating policy, and scalability.
Finally, Section 5 covers the design and specifications of the main components of the architecture that the project is developing around the Moses core platform, that is, the context analyser, the word aligner, the language model manager, the translation model manager, and the pre- and post-processors. The specifications of each component are mainly defined in terms of required processing speed, taking into account, where possible, the contribution of that component to the core operations previously defined at the use case level.

3. Use Cases

ModernMT aims to develop an innovative solution for the translation industry by providing both better MT quality for post-editing and a better integration of MT with commercial CAT tools. Two use cases of ModernMT have been identified: (i) the enterprise use case, in which a language service provider or the localisation department of a large company installs ModernMT to manage its translation workflow; (ii) the translator use case, in which individual translators install the ModernMT plugin in their favorite CAT tool and use ModernMT as their preferred source of suggestions/matches for their daily workflow.

This document mainly focuses on the translator, or plugin, use case, which seems the most promising from a commercial perspective, but also the most demanding in terms of requirements and specifications. In fact, we believe that the requirements of the enterprise use case are included in those of the plugin use case.

The plugin use case assumes that professional translators can purchase a plugin that integrates naturally into their CAT tool and that provides suggestions from a machine translation engine that:
- Instantly adapts to the document they are translating
- Quickly learns from their data (translation memories) and their post-editing work

In terms of performance and usability, ModernMT should perform better than, and be simpler to use than, any available customizable commercial MT service (e.g. Microsoft Hub). At the same time, ModernMT should also perform better than popular online MT systems such as Google Translate. In order to meet the market demand implied by a CAT tool like MateCat, ModernMT should offer at least 60 language pairs and support at least 10,000 active users, with the same order of magnitude of translation memories uploaded to the system.

From a functional perspective, the ModernMT plugin should allow a user to perform the following operations:
1. Log in as a user
2. Upload one or more TMs
3. Connect ModernMT to a CAT tool through a key
4. Receive translation suggestions directly in the CAT tool
5. Seamlessly update ModernMT while using the CAT tool

An important aspect of ModernMT in general, and of the plugin scenario in particular, is the data privacy model offered to the customers/users who will upload their TMs through the ModernMT plugin. While encouraging users to share their data for the sake of the overall machine translation quality, we foresee two data privacy options:
- Standard privacy: data from one user can be used to generate machine translations for other users, but the source will be hidden from them. In other words, users will not know the origin of the translation fragments (phrases) that ModernMT used to assemble the machine translation of their text.
- Strong privacy: data supplied by a user will not be accessed to generate translations for another user.

While the first level of privacy will be guaranteed by default, the strong privacy modality will be offered as an option.

3.1 General Requirements

The use case introduced in the previous section implies requirements at different levels: linguistic, functional, performance, computational, and architectural. We briefly go through each of them.

3.1.1 Linguistic requirements

We define the following progression in terms of required language coverage and the language resources required for each language pair. The exact language pairs will be determined on the basis of commercial criteria by the industrial partner of the project (Translated).

Date    | Languages         | Resources
2016 Q2 | 5 language pairs  | 200 million parallel words, 5 billion monolingual words
2016 Q4 | 15 language pairs | 200 million parallel words, 5 billion monolingual words
2017 Q2 | 30 language pairs | 200 million parallel words, 5 billion monolingual words
2017 Q4 | 60 language pairs | 200 million parallel words, 5 billion monolingual words

3.1.2 Functional requirements

The plugin scenario introduces the following functional requirements on the back end of ModernMT, which are presented here with their timeline.

Date: Q, Q [1]
- Fast training: ModernMT can be quickly trained from scratch starting from a collection of parallel and monolingual data.
- Context-aware translation: ModernMT translates segments by considering the context in which they occur.
- Incremental training: ModernMT can be updated by adding a new TM to the available data, without restarting the system.
- Data privacy models: Both standard and strong privacy models can be applied.

Date: Q
- Online learning: ModernMT can be updated from a stream of post-edited data. ModernMT knows where the data comes from and where to add them.

[1] This requirement had been achieved by the time of writing this deliverable.

3.1.3 Performance requirements

Requirements for translation quality are described below for two scenarios: (i) the user does not provide a TM to translate a document; (ii) the user provides a TM to translate a document. For each condition, we have identified two commercial reference competitors, respectively offering a generic online system and a customizable online system. Our goal is to offer better translation quality than the competitors in terms of automatic scores (BLEU) as well as human evaluation based on quality ranking. Performance will be measured on benchmarks based on real translation projects. Improvements have to be statistically significant.

Condition          | Target                                                            | Description
Translation w/o TM | Better than reference online MT                                   | For all language pairs, and available industrial benchmarks
Translation w TM   | Better than reference online and commercial customizable engines | For all language pairs, and available industrial benchmarks

3.1.4 Computational requirements

From the user perspective, translation quality is not all that counts. The user experience is strongly related to how smoothly and seamlessly ModernMT integrates into the human workflow. In particular, we are concerned with the perceived response time of ModernMT during training, online learning and translating. Our requirements are listed below.

Function         | Performance                              | User experience
TM upload time   | 1 million words in 30 sec                | Training time must be comparable to upload time of user data.
Translation time | < 5 sec / segment (15 words on average)  | Translation time should not cause any delay in the workflow. Translation time should allow prefetching at steps of one segment.
Online learning  | < 5 sec / segment (15 words on average)  | Updates during online learning should be effective from the second next segment.
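As a rough illustration of what the TM upload requirement implies, the arithmetic below converts "1 million words in 30 sec" into a sustained ingestion rate, using the average segment length of 15 words quoted in the table. The function name is hypothetical and the figures are only a back-of-the-envelope reading of the table, not a measured result.

```python
def required_throughput(words, seconds, words_per_segment=15):
    """Return (words per second, segments per second) needed to meet a time budget."""
    words_per_s = words / seconds
    return words_per_s, words_per_s / words_per_segment


# TM upload requirement: 1 million words in 30 seconds.
words_per_s, segments_per_s = required_throughput(1_000_000, 30)
print(round(words_per_s), round(segments_per_s))  # prints: 33333 2222
```

In other words, the whole training pipeline would need to sustain roughly 33,000 words (about 2,200 segments) per second to keep training time comparable to upload time.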

3.1.5 Architectural requirements

The following requirements relate to the deployment of the ModernMT architecture as an online commercial service. Requirements are listed according to a temporal progression that gives priority first to the functional/performance aspects and then to the scalability aspects. Note that the ultimate goal is to permit scalability up to 10K users and 100K TMs.

Date: Q
Features: Distributed workload; Replicated static models
Description: Workload is automatically balanced among a fixed number of nodes. No scalability is allowed. Uses file-based models.

Date: Q
Features: Replicated dynamic models
Description: File-based models can be efficiently updated with new data.

Date: Q, Q
Features: Elastic architecture with scalability; Shared dynamic models; Efficient scalability
Description: Elastic with respect to workload (users). Performs efficient resync of models across nodes when models are updated. Scalability by replicating resources. Distributed models sharing information across the cluster. Scalability with efficient use of resources.

4. Architecture Specifications

The final architecture of ModernMT must be designed with the goal of fulfilling the requirements shown in the previous section. The general guidelines imply an infrastructure that can operate with multiple language pairs and that is scalable and resilient, in the sense that it must be fault-tolerant and must support a dynamic reorganization of the cluster in order to redistribute resources where they are most needed.

As a corollary of the main use case, the architecture of ModernMT must support real-time incremental learning: this is a fundamental step towards the project's goals and a very important challenge for the overall interaction between the components, at both high and low levels of abstraction. We have structured the Translation Engine's models into two parts: a background model and a foreground model. The foreground model is a lightweight, incremental data structure that holds customers' data. It is used to adjust the probabilities of the background model, a large, static and immutable data structure trained once per language pair on a large amount of data collected from the web. This design can be found in Word Aligner, Multiplexed Language Model and Suffix Array Phrase Table. The following sections show in detail the design choices for each component and the improvements made with respect to the previous architecture.

4.1 Distributed Infrastructure

One of the goals of this project is to design a Translation Technology capable of handling the high workload of a platform with thousands of users. Only an efficient and fault-tolerant distributed infrastructure can achieve such a goal: redistributing customers' translation requests across the whole cluster is the best option. In our use case, however, there is another important piece of information that must be spread across the whole cluster: translators' feedback and customers' TMs, which must be delivered to the translation engines with particular attention to data consistency. Failing to do so will result in a cluster that cannot ensure replicable behaviour, with a possible loss in translation quality and, even worse, errors in request handling.

4.1.1 Fault tolerance

Fault tolerance is a generic design principle that indicates the ability of a system to recover after an unexpected condition that has produced an error.
In this section we present how ModernMT is designed to be fault-tolerant in two distinct situations: a translation request error and a shutdown of a cluster node.

Translation error. The inability of the engine to complete a translation request must be handled in such a way that the user is not alerted with an error message until it is really necessary. A translation error could occur due to a temporary situation that prevented the system from operating properly, or it could be a deterministic error due to the request itself. In the first case, the engine must silently retry the translation without showing an error to the end user, while in the second the error must be reported to the user immediately. This strategy gives the system the ability to recover after a temporary problem and reduces the number of errors raised during execution.
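The retry policy described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the ModernMT implementation: the exception classes `TransientError` and `DeterministicError`, the function `translate_with_retry`, and the `max_attempts` and `delay_s` parameters are all hypothetical names introduced here.

```python
import time


class TransientError(Exception):
    """Hypothetical: a temporary failure; the same request may succeed if retried."""


class DeterministicError(Exception):
    """Hypothetical: a failure caused by the request itself; retrying cannot help."""


def translate_with_retry(translate, segment, max_attempts=3, delay_s=0.0):
    """Call translate(segment), silently retrying transient failures only."""
    for attempt in range(1, max_attempts + 1):
        try:
            return translate(segment)
        except DeterministicError:
            # The request itself is at fault: report to the user immediately.
            raise
        except TransientError:
            if attempt == max_attempts:
                # Retries exhausted: only now does the user see an error.
                raise
            time.sleep(delay_s)
```

The number of attempts and the pause between them are design choices that the report does not specify; they would have to fit within the 5 seconds per segment budget of Section 3.1.4.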

Cluster node shutdown. The ability of a cluster to recover after one of its nodes has halted unexpectedly depends on its topology. The topology that ModernMT will implement must allow our product to dynamically redistribute the workload even if one or more nodes shut down, without corrupting the cluster or preventing its normal operation. More specifically, the infrastructure must not have a single point of failure, that is, a configuration that stops working if one particular node halts (e.g. a Master-Slave configuration could lead to a global inability to operate if the Master node goes down). Furthermore, the ModernMT cluster must be able to reintegrate the broken node once it has fully recovered and seamlessly restore the original cluster operation.

4.1.2 Distribution of updates

In the ModernMT use case there are two sources of updates: the user uploading a new private TM, and the translator who is translating documents while working on a translation job. Both data sources are unbounded streams of parallel segments with source domain information attached; more precisely, in the first case the domain is brand new, while in the latter the user is appending new segments to an existing domain. This stream of data must be delivered by the distributed infrastructure to every node of the cluster while ensuring data consistency. This means that a node that had a temporary problem, and missed real-time updates for a particular time window, must be able to reconnect to the data stream from the last update received before crashing.

Figure 4.1

Figure 4.1 shows a possible implementation of an infrastructure that meets the requirements listed above. The update stream, made up of both individual segments and entire TMs, is backed by a distributed persistent queue; every node can read from the queue starting from any point in the past. This design allows:
- Regular nodes to receive the contributions from the users in real time.
- A recovering node to start reading updates from the latest checkpoint before crashing.
- A brand new node joining the cluster to build its models starting from the beginning of the queue.

4.1.3 Scalability

The scalability of a distributed system is the key to containing the costs of the infrastructure. Being able to redistribute resources when and where they are most needed allows the cluster to avoid wasting allocated memory and computational power on language pairs with less traffic. The ModernMT distributed architecture should support cluster resizing and language pair redistribution. The size of the cluster can change due to workload variation over a time period (e.g. daily or weekly); being able to allocate new nodes only when needed, and to shut them down as soon as they are no longer needed, can reduce the cost of the infrastructure in a cloud environment. On the other hand, allowing the architecture to give more computational power to the languages that are used most will allow the system to make better use of its hardware and to avoid wasting resources on rarely used language combinations.

Cluster resizing. The dynamic resizing of a fault-tolerant cluster does not add complexity to the infrastructure: in fact, a new node joining the cluster can be managed as a node that is recovering after an unexpected shutdown. Similarly, shutting down a node when it is not needed anymore can be managed in the same way the cluster handles a node shutting down unexpectedly.

Language pairs balancing. Not all language pairs have the same rate of translation requests. The ModernMT distributed infrastructure should be able to dynamically allocate more resources to the language pairs that are used most, while reducing resources for the rare language pairs. This optimization can be implemented by analyzing the traffic and the efficiency of the system and allowing it to dynamically load and unload translation engine resources (i.e. Language Model, Translation Model).
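One very simple way to picture language pair balancing is a planner that, given recent request counts, decides which engines a node should load and which it should unload. The sketch below is an assumption made for illustration only: the function `plan_engines`, the fixed `capacity` of engine slots per node, and the "keep the busiest pairs" policy are hypothetical and are not specified in this report.

```python
from collections import Counter


def plan_engines(request_counts, loaded, capacity):
    """Return (to_load, to_unload) so that the `capacity` busiest pairs are loaded.

    request_counts: dict mapping a language pair (e.g. "en-it") to recent requests.
    loaded: set of language pairs whose models are currently in memory.
    capacity: hypothetical number of engines the node can hold at once.
    """
    busiest = {pair for pair, _ in Counter(request_counts).most_common(capacity)}
    to_load = sorted(busiest - set(loaded))
    to_unload = sorted(set(loaded) - busiest)
    return to_load, to_unload


counts = {"en-it": 900, "en-fr": 500, "en-fi": 3, "en-de": 700}
print(plan_engines(counts, {"en-it", "en-fr", "en-fi"}, capacity=3))
# prints: (['en-de'], ['en-fi'])
```

A real policy would also need to weigh model loading time and memory footprint, which this sketch ignores.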
For example, if the system detects an overload for a particular language pair, it should reassign resources to it, evicting the models of less utilized language pairs.

5. Components Design and Specifications

The following diagram sketches the ModernMT architecture and the components developed during the project. In the following, we concentrate on the design and specifications of each component except the ModernMT core architecture, which was addressed in the previous section, and the Decoder, which will not be changed from the current implementation available in Moses.

5.1 Context Analyzer

One of the main goals of the ModernMT project is to provide context-dependent machine translation. The Context Analyzer is the component responsible for this process: it analyzes the context provided by the user and, through IR algorithms, computes weights that will be used to bias the behaviour of the machine translation components in order to generate a contextualized translation of the input sentence.

The Context Analyzer module is trained on the source documents during the training phase of ModernMT. It can be queried either through a REST API or natively in Java. Given as input either a text, a few sentences or a path to a local text file, it produces as output a distribution of weights over the most similar documents. The current version of the Context Analyzer computes the cosine similarity, in a word space model, between the input text and the training documents in order to estimate how similar those documents really are. A query to the Context Analyzer index should take less than 300 ms.

The Context Analyzer will be able to delete or update a TM that was previously added to the index during the training phase, or to add a new one at runtime. Updating the content of a TM must take less than 1 s.

5.2 Word Aligner

Word Aligner is the ModernMT module that performs the word alignment of parallel sentences required by the MT module for building its models, and by the tag management module (see Section 5.5.2) for re-inserting markup tags in the translated text.

Word Aligner is applied to a stream of sentence pairs, and generates a stream of triplets that include the original texts (for convenience) and the set of links between source and target words. For this purpose the module exploits a pre-trained word alignment model. Word Aligner is also able to estimate a word alignment model either from scratch or incrementally, given a parallel corpus. Word Aligner first estimates two directional models (source-to-target and target-to-source), and then combines them into one bidirectional (symmetrized) model. In the incremental modality, when new parallel data become available, Word Aligner adapts its models, already estimated on the previously existing data, to the new data, avoiding re-estimation from scratch on old and new data. Word Aligner provides its functionality for estimating word alignment models and for aligning new sentence pairs by means of APIs and standalone executables.

In order to comply with the computational requirements of the ModernMT system stated in Section 3.1.4, the Word Aligner component is expected to satisfy the following speed requirements:
- The alignment of a segment pair of 15-word average length should take less than 1 ms;
- The overall estimation of the symmetrized word alignment model should take less than 20 s for a corpus of 1M running words and 100K sentence pairs;
- The loading of pre-estimated models trained on 100M running words should take less than 30 seconds.

Word Aligner is expected to satisfy the following quality requirement:
- The translation quality achieved by the decoder exploiting the word alignment model should be no more than 2 BLEU points below that obtainable with GIZA++ up to model

5.3 Adaptive Language Model

The Adaptive Language Model (LM) module computes the LM scores of the target fragments, which the decoder requires to compute the overall score of the translation alternatives.
As already mentioned in Section 4.1.2, in the ModernMT use case all data are partitioned into pre-defined domains. New training data gathered from customers during the life cycle of the system are associated with either an existing or a brand new domain. The Adaptive LM relies on this partition to create its model.

In the bootstrap phase, the Adaptive LM estimates a language model from scratch, exploiting the domain-specific monolingual target corpora. In the incremental modality, when new target texts become available, the Adaptive LM shall adapt while avoiding re-estimation from scratch on old and new data. If the new data are associated with an existing domain, the Adaptive LM appends the new data to the old data and re-estimates a new domain-specific LM; when ready, the LM module replaces the old foreground LM. Otherwise, if the new data belong to a brand new domain, the Adaptive LM builds a new foreground LM for the new domain using the new data, and adds it to the ensemble of domain-specific LMs. Since the new training data sets are usually small, the Adaptive LM does not update the background LM, because the impact of the new data on it is likely negligible [2].

In the current design, the bootstrap and incremental training phases are kept independent from the runtime queries. This solution has several advantages:
- The system is always active and ready to serve translation requests;
- The incremental training can be performed in parallel without blocking the translation service, and possibly on different machines, without overloading those used for the translation service;
- The training of the domain-specific LMs can be performed in parallel;
- The replacement of the foreground LMs is an almost atomic operation, which does not require switching the system off and on.

We intend to support an overall number of 1,000 domains initially in 2016, and to extend this figure to the planned number of active users (10,000, see Section 3).

In order to cope with the overall computational requirements of MateCat, the Adaptive LM is expected to satisfy the following speed requirements:
- The estimation of a domain-specific LM should take less than 5 s for a corpus of 1M running words;
- The Adaptive LM should provide a query response time compatible with the overall translation time constraint.
[2] In any case, the system administrator can always decide to re-run the bootstrap training phase including new data as well.
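The incremental behaviour described in this section (append new data to a domain, re-estimate that domain's foreground LM off to the side, then swap it in) can be pictured with the toy sketch below. It is only an illustration under strong assumptions: a unigram count table stands in for a real language model, the class `DomainLMStore` and its methods are hypothetical names, and a single writer is assumed.

```python
import threading
from collections import Counter


class DomainLMStore:
    """Toy ensemble of per-domain foreground 'LMs' (unigram counts as a stand-in)."""

    def __init__(self):
        self._corpora = {}  # domain -> list of target sentences seen so far
        self._models = {}   # domain -> Counter of word frequencies
        self._lock = threading.Lock()

    def add_data(self, domain, sentences):
        """Append data to a domain (new or existing) and replace its foreground LM.

        Assumes a single writer; concurrent writers to one domain are not handled.
        """
        corpus = self._corpora.get(domain, []) + list(sentences)
        # Re-estimate off to the side, so queries keep using the old model meanwhile.
        new_model = Counter(word for s in corpus for word in s.split())
        with self._lock:
            # The swap itself is just two dictionary assignments.
            self._corpora[domain] = corpus
            self._models[domain] = new_model

    def prob(self, domain, word):
        """Relative frequency of `word` in the domain's foreground model."""
        with self._lock:
            model = self._models.get(domain)
        if not model:
            return 0.0
        return model[word] / sum(model.values())


store = DomainLMStore()
store.add_data("legal", ["the court", "the law"])
print(store.prob("legal", "the"))  # prints: 0.5
```

Because the new model is built before the lock is taken, readers are blocked only for the duration of the swap, which mirrors the "almost atomic" replacement mentioned above.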

5.4 Adaptive Phrase Table

Based on an index of the bilingual training corpus, the Adaptive Phrase Table module provides phrase translations and translation probabilities on the fly. As opposed to a static phrase table, this design allows domain-sensitive probabilities to be computed on the fly, based on domain information passed to the decoder at run time as a domain probability distribution.

The existing suffix-array-based Phrase Table implementation in Moses builds a static index of the training corpus on disk, which is queried in read-only mode. For this reason, the previous design of the suffix array prevents easy addition of training data to the index.

For the plugin use case, the Adaptive Phrase Table has to support incremental addition of new training data. The incremental training data arrives as a stream of individual segment pairs (see Section 4.1). The ability to incrementally add segment pairs naturally extends to the use case of adding an entire Translation Memory for a new project/customer.

New training data should influence the possible phrase translations and their probabilities immediately after its addition. This influence allows ModernMT to adapt to individual translators' domain of text while they are working. Therefore, the index of the training corpus must be incrementally updatable.

A single engine supports multiple translators working at the same time. Therefore, both the updates and the index itself should carry information about which domain the segments belong to. We intend to support an overall number of 1,000 domains initially in 2016, and to extend this figure to the planned number of active users (10,000, see Section 3). Finally, the Adaptive PT should also support the strong privacy data model, which forbids sampling from private translation memories.
For practicality, adding a new medium-sized Translation Memory (1 million words) to the training corpus and incremental index should not take more than 30 seconds (write latency, see Section 3.1.4). Ideally, a Phrase Table on a single cluster node (see Section 4.1.2) should be able to support thousands of different translators in terms of write throughput and the aforementioned write latency, so that scaling becomes possible even with few cluster nodes. All the while, the Phrase Table must continue to provide read access fast enough to meet the overall translation latency goal of less than 5 s per segment.
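To make the requirements of this section concrete, the sketch below shows an index that accepts segment pairs one at a time, tags each with its domain, makes new data visible to lookups immediately, and hides strong-privacy domains from other users. It is an illustration only: a word-level inverted index is swapped in for the suffix array, and the class `SegmentIndex`, its methods and the `private_owner` parameter are hypothetical names, not the ModernMT API.

```python
from collections import defaultdict


class SegmentIndex:
    """Toy incrementally updatable index of domain-tagged segment pairs."""

    def __init__(self):
        self._segments = []                # list of (source, target, domain)
        self._by_word = defaultdict(list)  # source word -> ids of segments containing it
        self._private_owner = {}           # domain -> owner, for strong-privacy domains

    def add(self, source, target, domain, private_owner=None):
        """Add one segment pair; it is visible to lookups as soon as this returns."""
        seg_id = len(self._segments)
        self._segments.append((source, target, domain))
        for word in set(source.split()):
            self._by_word[word].append(seg_id)
        if private_owner is not None:
            self._private_owner[domain] = private_owner

    def lookup(self, word, user):
        """Return segment pairs containing `word` that `user` is allowed to sample."""
        hits = []
        for seg_id in self._by_word.get(word, []):
            source, target, domain = self._segments[seg_id]
            owner = self._private_owner.get(domain)
            # Strong privacy: a private domain is only ever used for its owner.
            if owner is None or owner == user:
                hits.append((source, target, domain))
        return hits
```

For example, after `index.add("the contract is void", "il contratto è nullo", "acme-legal", private_owner="alice")`, a lookup of `"contract"` on behalf of another user would not return that pair, while a lookup on behalf of `"alice"` would. Returning the domain with each hit is what would let a decoder weight the pairs by the domain probability distribution mentioned above.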

5.5 Text Processing

5.5.1 Tokenizer and detokenizer

The tokenizer and detokenizer are the two components of the pre- and post-processing pipelines that are most language-dependent. Both need rules or models for tokenization that must be customized for every single language. While the project already has tokenizers for 45 languages, for all but two languages we still need to evaluate and optimize these components on real benchmarks.

Evaluating the tokenizers and the detokenizers in isolation is not very reliable, as their ultimate goal is to have a positive impact on the overall translation process. Moreover, the proper coupling of the two components is also important: a good tokenizer should provide enough information for the detokenizer to reconstruct the original sentence. The latter, in fact, has the duty of joining the tokens that the tokenizer has produced during training into a valid sentence. Too heavy a tokenization, for example, can turn detokenization into a very hard task, sometimes even an impossible one.

We define two different tests for tokenization/detokenization evaluation: the first aims to estimate the proper coupling of the two components, while the second evaluates the utility of a candidate implementation in terms of translation quality.

The simplest test is to tokenize and detokenize some text and to compare the result with the original. The fewer the edits needed to recover the original text, the better the tokenizer/detokenizer implementation.

A more expensive test is to define a benchmark for training, tuning and evaluating the translations of the ModernMT system. Improvements in the (de)tokenization process should lead to a higher translation quality. In order to also evaluate the quality of the post-processed text, we consider both the BLEU and the Post-Editing score.

The first test should be used to quickly evaluate and iterate over different implementations in order to find out which one is the most promising.
Only the second test, however, can definitively prove that the evaluated implementation leads to a real enhancement in the translation quality of the ModernMT engine.

In order to comply with the overall computational requirements of the ModernMT system, the (de)tokenization steps should take less than 2 s for a corpus of 1M running words.
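The first, cheaper test (tokenize, detokenize, and count the edits needed to recover the original) can be sketched as below. The `tokenize` and `detokenize` functions are deliberately naive stand-ins written for this example, not the ModernMT components, and character-level Levenshtein distance is one possible choice of edit measure that the report does not prescribe.

```python
import re


def tokenize(text):
    """Naive stand-in tokenizer: split into word runs and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def detokenize(tokens):
    """Naive stand-in detokenizer: join with spaces, then re-attach common punctuation."""
    return re.sub(r"\s+([.,;:!?])", r"\1", " ".join(tokens))


def edit_distance(a, b):
    """Character-level Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def round_trip_edits(text):
    """Edits needed to turn detokenize(tokenize(text)) back into the original text."""
    return edit_distance(text, detokenize(tokenize(text)))


print(round_trip_edits("Hello, world!"))  # prints: 0
print(round_trip_edits("Don't stop."))    # prints: 2
```

In the second example the toy tokenizer splits the apostrophe and the toy detokenizer does not know how to re-attach it, so two spurious spaces remain: exactly the kind of coupling problem this test is meant to expose.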

5.5.2 XML Tag Manager

An important requirement in the translation industry is the capability to reproduce the layout of the input document in the output document as faithfully as possible. In particular, XML tags such as formatting tags should be re-inserted in the correct positions, and spaces of any type (tabs, multiple and hidden spaces) should be reproduced perfectly. XML Tag Manager is the ModernMT module that addresses this task.

For instance, assuming that the English sentence

Who is the ModernMT Team?

translates into

Chi è il gruppo ModernMT?

then, given the tagged input

Who is the <i><b>ModernMT</b> Team</i>?

XML Tag Manager should produce the tagged output

Chi è il <i>gruppo <b>ModernMT</b></i>?

where the XML formatting tags <i> and <b> enclose the correct fragments, gruppo ModernMT and ModernMT, respectively. Notice that XML tags can be nested, overlapped, or even self-contained.

In the current version of the ModernMT system, XML Tag Manager:
- identifies, classifies and stores tags and spaces in the input sentence;
- removes tags;
- transforms spaces into standard spaces;
- handles characters with unusual encoding;
- sends the input to the MT decoder and receives the output translation, which does not include any tags and has standard spaces;
- sends input and output to a word aligner and receives back the word alignments in both forward and backward directions;
- symmetrizes the forward and backward alignments;
- re-inserts the stored tags and spaces in the best positions suggested by the symmetrized word alignment.

XML Tag Manager currently does not handle errors in the character encoding of the input.

XML Tag Manager is expected to have an overall accuracy above 80% for all language pairs taken into account by the ModernMT system, where the overall accuracy is the percentage of sentences that are correct in terms of the number and positions of XML tags and white spaces, among all sentences without character encoding errors.
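The last step, re-inserting stored tags at positions suggested by the word alignment, can be illustrated with the simplified sketch below. It is not the XML Tag Manager algorithm: the function `project_tags` and its input format are assumptions made for this example, tags are projected onto the smallest target span covering all aligned words, spacing is not restored, and spans that end up crossing in the target are not repaired.

```python
def project_tags(tag_spans, alignment, target_tokens):
    """Project source-side tag spans onto target tokens through a word alignment.

    tag_spans: list of (tag_name, first_src_index, last_src_index), inclusive.
    alignment: set of (src_index, tgt_index) links (symmetrized).
    Returns the target tokens with opening and closing tags interleaved.
    """
    opens, closes = {}, {}
    for name, first, last in tag_spans:
        linked = [t for s, t in alignment if first <= s <= last]
        if not linked:
            continue  # no aligned target word: this sketch simply drops the tag
        lo, hi = min(linked), max(linked)
        opens.setdefault(lo, []).append((hi - lo, name))
        closes.setdefault(hi, []).append((hi - lo, name))

    out = []
    for i, token in enumerate(target_tokens):
        # Wider spans open first ...
        for _, name in sorted(opens.get(i, []), reverse=True):
            out.append(f"<{name}>")
        out.append(token)
        # ... and close last, so nested tags stay properly nested.
        for _, name in sorted(closes.get(i, [])):
            out.append(f"</{name}>")
    return out


# Source tokens: Who(0) is(1) the(2) ModernMT(3) Team(4) ?(5)
target = ["Chi", "è", "il", "gruppo", "ModernMT", "?"]
links = {(0, 0), (1, 1), (2, 2), (3, 4), (4, 3), (5, 5)}
spans = [("i", 3, 4), ("b", 3, 3)]
print(project_tags(spans, links, target))
# prints: ['Chi', 'è', 'il', '<i>', 'gruppo', '<b>', 'ModernMT', '</b>', '</i>', '?']
```

Joined with the original spacing, this token sequence corresponds to the expected output of the example above. The alignment links used here are written by hand for the illustration; in the pipeline they would come from the Word Aligner of Section 5.2.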


GFS-python: A Simplified GFS Implementation in Python

GFS-python: A Simplified GFS Implementation in Python GFS-python: A Simplified GFS Implementation in Python Andy Strohman ABSTRACT GFS-python is distributed network filesystem written entirely in python. There are no dependencies other than Python s standard

More information

How To Guide: Long Term Archive for Rubrik. Using SwiftStack Storage as a Long Term Archive for Rubrik

How To Guide: Long Term Archive for Rubrik. Using SwiftStack Storage as a Long Term Archive for Rubrik Using SwiftStack Storage as a Long Term Archive for Rubrik Introduction 3 Solution Architecture 5 Example Design 5 Multi Region Cluster 6 Network Design 6 Minimum Supported Versions and Solution Limits

More information