Search Engines and Time Series Databases
Università degli Studi di Roma Tor Vergata
Dipartimento di Ingegneria Civile e Ingegneria Informatica

Search Engines and Time Series Databases
Course: Systems and Architectures for Big Data (Corso di Sistemi e Architetture per Big Data), A.A. 2017/18
Valeria Cardellini - SABD 2017/18

The reference Big Data stack
- High-level Interfaces
- Data Processing
- Data Storage
- Resource Management
- Support / Integration

Why search engines?
- How to add search to NoSQL data stores?
  - E.g., key-value data stores
- How to find documents that match queries?
  - With text search that is faster than in an RDBMS
- How to obtain specific features?
  - Such as highlighting, spatial search, suggestions, guided navigation, ...

Search engines
- Most popular search engines:
  - Apache Solr
  - Elasticsearch
- ETL process

Apache Solr
- Scalable, highly reliable and open-source framework for searching data
- Built on Apache Lucene
  - Open-source library for indexing and search
  - Used by Solr for full-text search
- Can index documents written in XML, JSON, CSV and binary formats
- Runs as a standalone application service
  - Provides a REST-like web service that exposes services to manage the lifecycle of documents in the index (indexing, querying, ...)
- Used by many popular Web apps (Apple, Instagram, LinkedIn, ...)

Solr: key features
- Faceting
  - To group the results based on a specific field or defined criteria, providing the count of each subset
  - Example: a shopping site can provide facets to narrow search results by manufacturer or price
- Auto-suggest
  - To present a list of possible query terms
- Spell check
  - To suggest corrected spelling of query terms
- Highlighting
- Document clustering
  - To group related documents in the search results
- Spatial search
  - To filter search results based on location

Solr: key features (continued)
- Pagination and ranking of search results
- Results grouping
  - To group the results based on a grouping field and return the top documents in each group
- Near real-time search
  - To search documents immediately after they have been indexed; useful for apps with dynamically changing content (e.g., news)
- More Like This
  - To identify other documents that are similar to one in a result set

[Figure: Solr feature example]
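
To make the faceting feature concrete, here is a minimal sketch in plain Python of what a facet computes: a count of results per field value, plus counts per price range. It does not call Solr; the documents, field names and bucket boundaries are hypothetical and chosen only to mirror the shopping-site example.

```python
from collections import Counter

# Hypothetical product documents; field names are illustrative only.
docs = [
    {"id": 1, "name": "laptop", "manufacturer": "Acme", "price": 900},
    {"id": 2, "name": "phone", "manufacturer": "Acme", "price": 400},
    {"id": 3, "name": "tablet", "manufacturer": "Globex", "price": 300},
    {"id": 4, "name": "monitor", "manufacturer": "Globex", "price": 150},
    {"id": 5, "name": "camera", "manufacturer": "Initech", "price": 450},
]


def facet_counts(results, field):
    """Count how many results carry each distinct value of `field`."""
    return Counter(doc[field] for doc in results)


def price_range_facets(results, boundaries):
    """Count results per half-open price bucket [low, high)."""
    counts = {}
    for low, high in zip(boundaries, boundaries[1:]):
        counts[f"{low}-{high}"] = sum(low <= d["price"] < high for d in results)
    return counts


by_manufacturer = facet_counts(docs, "manufacturer")
by_price = price_range_facets(docs, [0, 200, 500, 1000])
print(dict(by_manufacturer))
print(by_price)
```

A search front end would typically show these counts next to each filter, so that a user can narrow the result set by clicking a manufacturer or a price range.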

[Figure: Solr components]

Solr components
- Request Handlers: handle a client request at a URL
  - To query: a GET request to the /select handler
  - To index a document: a POST request to the /update handler
- Response Writers: serialize and stream the response to the client
- Search Components: part of a Search Handler, a componentized request handler
  - Includes: Query, Faceting, Highlighting, Debug, Distributed Search capable
- Update Handlers: handle an indexing request
- Update Processors chain: per-handler componentized chain that handles updates
- Query Parsing plugins
  - Mix and match query types in a single request
  - Function plugins for Function Query
- Text Analysis plugins: Analyzers, Tokenizers, TokenFilters
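
Since request handlers are addressed by URL, a client mostly needs to build the right URL and pick the right HTTP method. The sketch below only constructs the URLs and sends nothing. The host, port and collection name (`localhost:8983`, `techproducts`) are assumptions for illustration and do not come from the slides; `q`, `rows` and `commit` are standard Solr request parameters.

```python
from urllib.parse import urlencode

# Assumption: a local Solr instance with a collection named "techproducts".
SOLR_BASE = "http://localhost:8983/solr/techproducts"


def select_url(query, **params):
    """Build the URL for a GET request to the /select request handler."""
    return f"{SOLR_BASE}/select?" + urlencode({"q": query, **params})


def update_url(commit=True):
    """Build the URL for a POST request to the /update request handler."""
    return f"{SOLR_BASE}/update?" + urlencode({"commit": str(commit).lower()})


all_docs = select_url("*:*")                       # match every document
electronics = select_url("cat:electronics", rows=5)  # field:value query
index_target = update_url()

print(all_docs)
print(electronics)
print(index_target)
```

The resulting strings can be passed to curl or to any HTTP client; the documents to index would go in the body of the POST request.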

Basic searching
- Solr can be queried via REST clients, curl, wget, Chrome POSTMAN, etc., as well as via native clients available for many programming languages
- Example: to search all documents in the index via curl
    curl "[URL missing]"
- Example: to search for a single term
    curl "[URL missing]"
- Example: to search all electronics documents in the index
    curl "[URL missing]"

Scaling Solr: SolrCloud
- How to provide distributed indexing and search capabilities?
  - Up to millions of users and millions of indexed documents
- SolrCloud: deployment functionality of Solr that allows setting up clusters of Solr servers
  - Enables and simplifies horizontal scaling of a search index through replication and sharding
  - Sharding: incoming queries are distributed to the shards in the collection, and the per-shard results are merged into one response
  - Replication: to handle higher concurrent query load by spreading the requests over multiple servers
- No master node to allocate nodes, shards and replicas
  - SolrCloud uses ZooKeeper for storing shared configuration files and for coordination
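
The sharding behaviour described for SolrCloud can be pictured as scatter-gather: the query goes to every shard, each shard returns its local top hits, and the partial lists are merged. The following is a toy sketch of that idea in plain Python, not Solr's implementation; the shard contents and the term-count scoring are hypothetical simplifications.

```python
import heapq

# Hypothetical shards, each holding (doc_id, text) pairs.
shards = [
    [("a1", "solr search engine"), ("a2", "time series database")],
    [("b1", "search with lucene"), ("b2", "distributed search engine search")],
    [("c1", "graph database")],
]


def search_shard(shard, term, k):
    """Return the shard-local top-k hits as (score, doc_id) pairs.

    The score is simply the number of occurrences of `term` in the text.
    """
    hits = [(text.split().count(term), doc_id) for doc_id, text in shard]
    hits = [hit for hit in hits if hit[0] > 0]
    return heapq.nlargest(k, hits)


def distributed_search(all_shards, term, k=2):
    """Scatter the query to every shard, then merge the partial results."""
    partial = [hit for shard in all_shards for hit in search_shard(shard, term, k)]
    return [doc_id for _score, doc_id in heapq.nlargest(k, partial)]


top = distributed_search(shards, "search")
print(top)
```

Each shard only has to return k candidates, so the merging node handles at most k hits per shard no matter how large the index is.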

[Figure: Solr distributed architecture]

Elasticsearch
- Distributed, multitenant-capable and scalable full-text search engine with REST-based interface and schema-free JSON documents
- Search engine based on Apache Lucene
- Developed in Java
- Distributed
  - Indices can be divided into shards and each shard can have zero or more replicas
  - Each server hosts one or more shards, and acts as a coordinator to delegate operations to the correct shard(s)
  - Rebalancing and routing are done automatically
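
Delegating an operation to "the correct shard" usually comes down to hashing a routing value (by default the document id) modulo the number of primary shards. The sketch below illustrates that general idea. CRC32 is used here only as a readily available stand-in and is an assumption; Elasticsearch uses its own hash function internally, so the shard numbers will not match a real cluster.

```python
import zlib


def shard_for(routing_value: str, num_shards: int) -> int:
    """Map a routing value to a shard number (stand-in hash: CRC32)."""
    return zlib.crc32(routing_value.encode("utf-8")) % num_shards


def place_documents(doc_ids, num_shards):
    """Group hypothetical document ids by the shard they are routed to."""
    placement = {shard: [] for shard in range(num_shards)}
    for doc_id in doc_ids:
        placement[shard_for(doc_id, num_shards)].append(doc_id)
    return placement


doc_ids = [f"doc-{i}" for i in range(10)]
placement = place_documents(doc_ids, num_shards=3)
for shard, ids in placement.items():
    print(shard, ids)
```

Because the shard depends only on the routing value and the shard count, any node can act as coordinator and compute the target shard without consulting the others.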

Elastic (ELK) Stack
- Elasticsearch is closely integrated with Logstash and Kibana (Elastic Stack)
- Logstash
  - Server-side data processing pipeline that ingests data from a multitude of sources simultaneously, transforms it, and then sends it to Elasticsearch
- Kibana
  - Data visualization platform

Solr vs. Elasticsearch
[Figure: Elasticsearch vs Solr on Google Trends]
- Solr
  - Mature, widely deployed product
  - Active and large developer community
  - Provides a highly detailed functional environment; a wide range of plug-ins is available
- Elasticsearch
  - Newer, but already very widely used
  - Focus on extracting value from data generally, and not just on search
    - Part of ELK stack
  - Schema-free and document-oriented

Time series database (TSDB)
- How to analyze DevOps monitoring, application metrics, sensor data from smart factories, smart cities, or smart vehicles?
- Time series databases (TSDBs)
  - A possible solution, not the only one!
- Optimized for handling high-volume time series data
  - Time series: sequence of data points (arrays of numbers) indexed by time (a datetime or a datetime range), e.g.:
    - Stock prices (price curve)
    - Energy consumption (load profile)
    - Temperature values (temperature trace)
- Optimized for providing complex logic to analyze time series data
  - Queries for historical data, replete with time ranges, roll-ups and arbitrary time zone conversions, are difficult in a DBMS

TSDB: overview
- Create, enumerate, update and destroy various time series and organize them in some fashion
  - Series may be organized hierarchically and have companion metadata
- Provide basic calculations on a series as a whole (e.g., multiplying, adding, or combining various time series into a new time series)
- Filter on arbitrary patterns (e.g., day of the week, low value, high value)
- Provide additional statistical functions that are targeted to time series data
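
Two of the operations listed above, roll-ups over time ranges and filtering on a pattern such as the day of the week, are easy to sketch in plain Python. This is only an illustration of what a TSDB does natively and at scale; the sample temperature points and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def roll_up(points, bucket: timedelta):
    """Average the values per fixed-size time bucket.

    `points` is an iterable of (timezone-aware datetime, value) pairs.
    Returns a list of (bucket_start, mean) sorted by bucket start.
    """
    size = int(bucket.total_seconds())
    sums = defaultdict(lambda: [0.0, 0])
    for ts, value in points:
        start = int(ts.timestamp()) // size * size
        sums[start][0] += value
        sums[start][1] += 1
    return [
        (datetime.fromtimestamp(start, tz=timezone.utc), total / count)
        for start, (total, count) in sorted(sums.items())
    ]


def filter_weekday(points, weekday: int):
    """Keep only points that fall on the given weekday (Monday is 0)."""
    return [(ts, value) for ts, value in points if ts.weekday() == weekday]


# Hypothetical temperature trace; 2018-05-07 was a Monday.
base = datetime(2018, 5, 7, 10, 0, tzinfo=timezone.utc)
trace = [
    (base, 20.0),
    (base + timedelta(minutes=15), 22.0),
    (base + timedelta(minutes=70), 30.0),
    (base + timedelta(days=1, minutes=5), 10.0),
]

hourly = roll_up(trace, timedelta(hours=1))
mondays = filter_weekday(trace, 0)
for bucket_start, mean in hourly:
    print(bucket_start.isoformat(), mean)
```

Writing the same roll-up in a general-purpose DBMS typically requires grouping on a computed time expression, which is the kind of query the slides describe as difficult.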

TSDB: some products
- Some open-source products
  - CrateDB
  - Chronix
  - Graphite
    - Stores numeric time-series data and renders graphs of this data on demand
  - InfluxDB
  - KairosDB
    - Stores its time series in Cassandra
  - OpenTSDB
    - Stores its time series in HBase
  - Riak-TS
    - NoSQL key/value store optimized for time series data with masterless architecture (similar to Riak-KV)

InfluxDB
- Written in Go
- Supports high write loads and large data set storage
- Conserves space through downsampling
  - By automatically expiring and deleting unwanted data as well as backup and restore
- Provides an easy-to-use SQL-like query language for interacting with data
- Provides simple, high-performing write and query HTTP(S) APIs, e.g.:
  - To create a database
      curl -i -XPOST [URL missing] --data-urlencode "q=CREATE DATABASE mydb"
  - To write data
      curl -i -XPOST '[URL missing]' --data-binary 'cpu_load_short,host=server01,region=us-west value=[value missing]'

InfluxDB data store
- Data organized by time series, which contain a measured value, like cpu_load or temperature
- Time series have zero to many points, one for each discrete sample of the metric
- Points consist of:
  - time (a timestamp)
  - a measurement (e.g., cpu_load)
  - at least one key-value field (the measured value itself, e.g., value=0.64 or temperature=21.2)
  - zero to many key-value tags containing any metadata about the value (e.g., host=server01, region=emea, dc=frankfurt)

InfluxDB data store (continued)
- General format of points:
    <measurement>[,<tag-key>=<tag-value>...] <field-key>=<field-value>[,<field2-key>=<field2-value>...] [unix-nano-timestamp]
- Examples of points:
    cpu,host=servera,region=us_west value=0.64
    payment,device=mobile,product=notepad,method=credit billed=33,licenses=3i
    stock,symbol=aapl bid=127.46,ask=[value missing]
    temperature,machine=unit42,type=assembly external=25,internal=[value missing]
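
The general point format above can be produced with a small helper. The sketch below is a simplified formatter, not the official client: the function name `format_point` is hypothetical, tags are sorted by key for determinism, and the escaping of spaces, commas and equals signs required by the real line protocol is deliberately left out. Integer fields get the `i` suffix seen in the `licenses=3i` example.

```python
def format_field_value(value):
    """Render one field value in line-protocol style (simplified)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    return '"' + str(value).replace('"', '\\"') + '"'


def format_point(measurement, fields, tags=None, timestamp_ns=None):
    """Build '<measurement>[,tags] <fields> [timestamp]' as a string.

    Assumption: keys and tag values contain no spaces, commas or '=',
    so no escaping is performed.
    """
    if not fields:
        raise ValueError("a point needs at least one field")
    head = measurement
    for key, value in sorted((tags or {}).items()):
        head += f",{key}={value}"
    body = ",".join(f"{k}={format_field_value(v)}" for k, v in fields.items())
    line = f"{head} {body}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line


cpu = format_point("cpu", {"value": 0.64}, {"host": "servera", "region": "us_west"})
payment = format_point("payment", {"billed": 33.0, "licenses": 3}, {"device": "mobile"})
stamped = format_point("cpu", {"value": 0.64}, timestamp_ns=1434055562000000000)
print(cpu)
print(payment)
print(stamped)
```

When the timestamp is omitted, as in the first two calls, the server is expected to assign one on arrival, which is why that part of the format is optional.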

InfluxDB data store (continued)
- A measurement is like a SQL table, where the primary index is time
- With respect to a DBMS:
  - No need to define schemas up-front
  - Null values are not stored
- InfluxDB limitation
  - Horizontal scalability: clustered installation available only as an enterprise product

InfluxDB stack
- Integrated with Telegraf, Chronograf and Kapacitor (TICK stack)
  - To realize a MAPE (Monitor, Analyze, Plan, Execute) control loop

InfluxDB stack (continued)
- Telegraf: plugin-driven server agent for collecting and reporting metrics and events
  - Input plugins or integrations to source a variety of metrics
  - Output plugins to send metrics to a variety of other data stores, services, and message queues (InfluxDB, Graphite, OpenTSDB, Kafka, MQTT, ...)
- Chronograf: administrative user interface and visualization engine
  - To build dashboards with real-time visualizations of data and to create alerting and automation rules
- Kapacitor: native data processing engine
  - To process both stream and batch data from InfluxDB
  - E.g., to perform specific actions (e.g., dynamic load balancing) based on alerts (e.g., above load threshold)

References
- Apache Solr Reference Guide
- InfluxDB v1.5 Documentation
- Dunning and Friedman, Time Series Databases, O'Reilly
More informationFlash Storage Complementing a Data Lake for Real-Time Insight
Flash Storage Complementing a Data Lake for Real-Time Insight Dr. Sanhita Sarkar Global Director, Analytics Software Development August 7, 2018 Agenda 1 2 3 4 5 Delivering insight along the entire spectrum
More informationCSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2015 Lecture 14 NoSQL
CSE 544 Principles of Database Management Systems Magdalena Balazinska Winter 2015 Lecture 14 NoSQL References Scalable SQL and NoSQL Data Stores, Rick Cattell, SIGMOD Record, December 2010 (Vol. 39, No.
More informationNew Oracle NoSQL Database APIs that Speed Insertion and Retrieval
New Oracle NoSQL Database APIs that Speed Insertion and Retrieval O R A C L E W H I T E P A P E R F E B R U A R Y 2 0 1 6 1 NEW ORACLE NoSQL DATABASE APIs that SPEED INSERTION AND RETRIEVAL Introduction
More informationMicroservices log gathering, processing and storing
Microservices log gathering, processing and storing Siim-Toomas Marran Univeristy of Tartu J.Liivi 2 Tartu, Estonia siimtoom@ut.ee ABSTRACT The aim of this work is to investigate and implement one of the
More informationIncrease Value from Big Data with Real-Time Data Integration and Streaming Analytics
Increase Value from Big Data with Real-Time Data Integration and Streaming Analytics Cy Erbay Senior Director Striim Executive Summary Striim is Uniquely Qualified to Solve the Challenges of Real-Time
More informationIntroduction to Big Data
Università degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica Introduction to Big Data Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria Cardellini
More informationThe Hadoop Ecosystem. EECS 4415 Big Data Systems. Tilemachos Pechlivanoglou
The Hadoop Ecosystem EECS 4415 Big Data Systems Tilemachos Pechlivanoglou tipech@eecs.yorku.ca A lot of tools designed to work with Hadoop 2 HDFS, MapReduce Hadoop Distributed File System Core Hadoop component
More informationMigrating massive monitoring to Bigtable without downtime. Martin Parm, Infrastructure Engineer for Monitoring
Migrating massive monitoring to Bigtable without downtime Martin Parm, Infrastructure Engineer for Monitoring This is a big deal. -- Nicholas Harteau/VP, Engineering & Infrastructure https://news.spotify.com/dk/2016/02/23/announcing-spotify-infrastructures-googley-future/
More informationCIB Session 12th NoSQL Databases Structures
CIB Session 12th NoSQL Databases Structures By: Shahab Safaee & Morteza Zahedi Software Engineering PhD Email: safaee.shx@gmail.com, morteza.zahedi.a@gmail.com cibtrc.ir cibtrc cibtrc 2 Agenda What is
More informationBIG DATA COURSE CONTENT
BIG DATA COURSE CONTENT [I] Get Started with Big Data Microsoft Professional Orientation: Big Data Duration: 12 hrs Course Content: Introduction Course Introduction Data Fundamentals Introduction to Data
More informationFLORIDA DEPARTMENT OF TRANSPORTATION PRODUCTION BIG DATA PLATFORM
FLORIDA DEPARTMENT OF TRANSPORTATION PRODUCTION BIG DATA PLATFORM RECOMMENDATION AND JUSTIFACTION Executive Summary: VHB has been tasked by the Florida Department of Transportation District Five to design
More informationEnd to End Analysis on System z IBM Transaction Analysis Workbench for z/os. James Martin IBM Tools Product SME August 10, 2015
End to End Analysis on System z IBM Transaction Analysis Workbench for z/os James Martin IBM Tools Product SME August 10, 2015 Please note IBM s statements regarding its plans, directions, and intent are
More informationfor Multi-Services Gateways
KURA an OSGi-basedApplication Framework for Multi-Services Gateways Introduction & Technical Overview Pierre Pitiot Grenoble 19 février 2014 Multi-Service Gateway Approach ESF / Increasing Value / Minimizing
More informationCopyright 2013, Oracle and/or its affiliates. All rights reserved.
1 Oracle NoSQL Database: Release 3.0 What s new and why you care Dave Segleau NoSQL Product Manager The following is intended to outline our general product direction. It is intended for information purposes
More informationSpark Streaming: Hands-on Session A.A. 2017/18
Università degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica Spark Streaming: Hands-on Session A.A. 2017/18 Matteo Nardelli Laurea Magistrale in Ingegneria Informatica
More informationUn'introduzione a Kafka Streams e KSQL and why they matter! ITOUG Tech Day Roma 1 Febbraio 2018
Un'introduzione a Kafka Streams e KSQL and why they matter! ITOUG Tech Day Roma 1 Febbraio 2018 R E T H I N K I N G Stream Processing with Apache Kafka Kafka the Streaming Data Platform 1.0 Enterprise
More information