Using ElasticSearch to Enable Stronger Query Support in Cassandra
- Rosemary McDonald
Introduction

Relational databases have been in use for decades, but with the advent of big data there is a need to use NoSQL databases to handle the enormous amount of accumulating data. This leads to multiple challenges:

- Many operations such as aggregation, grouping, and ordering cannot be performed with many NoSQL databases
- Users are unable to adopt NoSQL data stores because of the restrictions they impose on executing complex and rich queries
- The learning curve for these databases is steep

Cassandra is one such data store with restricted query support. However, search servers like Elasticsearch enable extensive querying over indexed data. This paper provides an overview of Cassandra and Elasticsearch.

What is Cassandra?

Cassandra is a distributed database management system that easily handles enormous amounts of data. Cassandra is built to do the following:

- Overcome the challenges of high availability, with no single point of failure even as the size of clusters increases
- Require almost no configuration to add new nodes to existing clusters, thanks to its elastic scalability
- Handle increasing amounts of data with minimal changes at any point in time
- Provide flexible data storage
- Allow easy data distribution
- Provide operational simplicity

However, despite these strong features and functions, there are some fundamental limitations, such as primitive querying and search capabilities. Most NoSQL databases are built around querying and updating records by primary key, but relying on primary keys alone is a very limiting way to use them. Real-world queries frequently filter on non-primary-key fields, for example finding commodities whose price is greater or less than a given value, or listing employees whose address contains xyz in its Street field. Cassandra lacks the querying capabilities that such scenarios need, and this shortcoming keeps users from leveraging its data storage power to the fullest.
What is Elasticsearch?

Elasticsearch is a powerful search server based on Lucene and is used for real-time indexing. It provides a distributed, multitenant-capable, full-text search engine with a RESTful web interface and schema-free JSON documents. Elasticsearch is developed in Java and is released as open source under the terms of the Apache License.
It supports features such as:

- Suggestions
- Autocomplete
- Querying
- Filtering
- Post filtering
- Aggregations

Aggregation support enables users to perform numerical operations on data, which proves to be an exceptionally strong add-on. Elasticsearch is the second most popular enterprise search engine.

Aggregations in Elasticsearch

Aggregation is one of the most powerful features in Elasticsearch. Think of it as a unit of work that builds analytic information over a set of documents indexed in Elasticsearch. Used alongside a NoSQL database, this analytic processing can perform SQL-style operations (such as sum, min, max, count, order by, group by, having, and many more).

How does Elasticsearch support stronger queries in Cassandra?

One of the most effective approaches we have found is to use an indexing store in combination with a NoSQL database. Indexing stores can perform several operations directly on secondary indexes. The result of that process consists of IDs that can be used to retrieve the corresponding NoSQL data. In this way, the desired output can be achieved. This process is described in the diagram below:

[Diagram: data aggregation request flow between the client, the index store, and the NoSQL datastore]

- The data aggregation request arrives from the client, and an ES query is generated by parsing the request
- Aggregation is performed over secondary indexes by executing the generated ES query, which returns the required record IDs
- A NoSQL datastore-specific query is generated, and the aggregation values are passed on
- The NoSQL native query is executed for the record IDs returned by ES, and the records are returned to the client
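As a rough illustration of the aggregation support described in this section, the sketch below maps a few SQL aggregate keywords onto Elasticsearch metric aggregation bodies. The aggregation type names (`min`, `max`, `sum`, `avg`, `value_count`) and the `size` parameter are Elasticsearch's own. The helper names `metric_aggregation` and `aggregation_request` and the `MIN_price`-style naming are assumptions made for this example, not part of any library.

```python
# Hypothetical mapping from SQL aggregate keywords to Elasticsearch
# metric aggregation types (the type names on the right are Elasticsearch's).
SQL_TO_ES_METRIC = {
    "min": "min",
    "max": "max",
    "sum": "sum",
    "avg": "avg",
    "count": "value_count",
}


def metric_aggregation(sql_function: str, field: str) -> dict:
    """Build a one-entry body such as {"MIN_price": {"min": {"field": "price"}}}."""
    es_type = SQL_TO_ES_METRIC[sql_function.lower()]
    name = f"{sql_function.upper()}_{field}"
    return {name: {es_type: {"field": field}}}


def aggregation_request(*pairs: tuple) -> dict:
    """Combine several (sql_function, field) pairs into one request body."""
    aggs = {}
    for sql_function, field in pairs:
        aggs.update(metric_aggregation(sql_function, field))
    # "size": 0 asks Elasticsearch for aggregation results only, with no hits.
    return {"size": 0, "aggs": aggs}
```

Calling `aggregation_request(("min", "price"), ("max", "price"))` yields a single body carrying both metrics, so one round trip to the index store can answer several SQL-style aggregates.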
Here's how it works:

1. The client sends a request
2. This request is passed to the indexing store
3. The request is then sent to Cassandra
4. Then the indexing store performs the aggregation over the secondary indexes stored in it
5. Elasticsearch performs analytics over the indexes as required by the query, then returns the aggregated values and the respective record IDs
6. These record IDs are used to send native queries to Cassandra to fetch the complete records

You may be concerned that, in order to use the above approach, you will have to learn NoSQL databases as well as the indexing store's mechanism. The good news is that there is a tool that allows you to achieve all of the above simply by writing SQL-like queries. As a result, you can start using a NoSQL database along with the indexing store without learning any native database's API. This tool is called Kundera.

Kundera can take a JPA query (JPA is a popular API) as input and provide the desired output using the native APIs of the data stores and the index stores that it supports under the hood. The diagram below describes how one can create indexes and run complex queries on data in NoSQL using Kundera in the JPA way.

[Diagram: Kundera query flow. Components: Client, Kundera (Core Engine, Client Delegator, Response Wrapper), Index Store (sample index with ID, Category, and Price columns), NoSQL Datastore]

- The client sends a JPA query to Kundera
- The core engine generates the corresponding index store query and delegates to the index store
- The query response contains the aggregation results computed directly over secondary indexes in the case of an aggregated query; otherwise it returns the IDs of the records that match the query
- For a non-aggregation query, the index store returns the IDs of the records that match the criteria, in the required order. These IDs are then passed to the client delegator
- The client delegator generates a client-specific query for the respective IDs and executes it
- The NoSQL DB returns the queried records, which are then passed to the response wrapper
- The response wrapper returns the response to the client
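The two-phase lookup described above (resolve record IDs in the index store, then fetch full records by key) can be sketched with in-memory stand-ins. Everything here is hypothetical: `IndexStore` and `PrimaryStore` only imitate the roles of Elasticsearch and Cassandra and do not use either product's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    """One secondary-index row: the record ID plus the indexed columns."""
    record_id: str
    price: float
    category: str


class IndexStore:
    """Stand-in for the index store; holds indexed columns, not full records."""

    def __init__(self, entries):
        self._entries = list(entries)

    def ids_with_price_above(self, threshold):
        """Return record IDs whose price exceeds threshold, cheapest first."""
        matches = [e for e in self._entries if e.price > threshold]
        matches.sort(key=lambda e: e.price)
        return [e.record_id for e in matches]

    def min_price(self):
        """Aggregation answered from the index alone, with no datastore call."""
        return min(e.price for e in self._entries)


class PrimaryStore:
    """Stand-in for the NoSQL datastore; supports lookup by primary key only."""

    def __init__(self, records):
        self._records = dict(records)

    def get_many(self, record_ids):
        return [self._records[rid] for rid in record_ids]


def find_products_above(index, store, threshold):
    """Phase 1: resolve IDs in the index. Phase 2: fetch full records by key."""
    return store.get_many(index.ids_with_price_above(threshold))
```

The point of the split is that the datastore is only ever asked for records by primary key, which is the access pattern it handles well, while filtering, ordering, and aggregation stay in the index.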
Here's what's happening in the previous diagram:

1. The client submits a JPA query
2. Kundera's core engine analyzes the query
3. The query is then passed to the Elasticsearch client
4. The client generates an equivalent Elasticsearch aggregation
5. Elasticsearch processes the query over the secondary index data stored in it
6. Elasticsearch then sends the processed response back to the core engine
7. The core engine further analyzes this response
8. If the JPA query only needs the aggregation result, this response is redirected to the response wrapper. Otherwise, it is redirected to the respective NoSQL client delegator
9. The client delegator generates the client-specific query for the row IDs
10. These row IDs are the ones Elasticsearch has filtered according to the query criteria
11. The generated query is then passed to the NoSQL data store
12. The data store fetches the records and passes them to the response wrapper
13. The response wrapper prepares the results in the required manner and returns them to the client

Example

To demonstrate this with a query, let's take the example of a table Product with price as one of its columns. Let's say we need to find the minimum price. In SQL, this can be achieved using the following query:

    Select min(price) from Product

However, in Cassandra, there is no direct query support for finding the minimum of a column's values. To find it, we would have to fetch all the values and process them to find the minimum, which is a very inefficient approach. The cost is directly proportional to the number of values in the column.
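The routing decision in step 8 can be sketched as a check on the select clause: if every selected item is an aggregate function, the index store's answer is final; otherwise record IDs must go on to the client delegator. This is a simplified illustration using regular expressions, not Kundera's actual parser. The function names and the returned labels are assumptions, and forms such as `count(*)` are deliberately not handled.

```python
import re

# Matches a single aggregate select item such as "min(e.price)".
_AGGREGATE_ITEM = re.compile(
    r"^(min|max|sum|avg|count)\s*\(\s*[\w.]+\s*\)$", re.IGNORECASE
)
# Captures the select list between SELECT and FROM.
_SELECT_CLAUSE = re.compile(
    r"^\s*select\s+(.*?)\s+from\s", re.IGNORECASE | re.DOTALL
)


def is_aggregation_only(jpql: str) -> bool:
    """Return True when every item in the select list is an aggregate call."""
    match = _SELECT_CLAUSE.match(jpql)
    if match is None:
        raise ValueError("not a SELECT query")
    items = [item.strip() for item in match.group(1).split(",")]
    return all(_AGGREGATE_ITEM.match(item) is not None for item in items)


def route(jpql: str) -> str:
    """Pick the next stage: aggregates finish at the response wrapper,
    everything else needs the client delegator to fetch full records."""
    return "response_wrapper" if is_aggregation_only(jpql) else "client_delegator"
```

Under these assumptions, `Select min(price) from Product` is routed straight to the response wrapper, while a query that selects whole entities is routed to the client delegator.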
Using Kundera, you can achieve this simply by doing the following.

Create an entity corresponding to the Product table and specify which columns are to be indexed, in the following manner:

    @Entity
    @IndexCollection(columns = { @Index(name = "price"), @Index(name = "productcategory") })
    public class Product { }

Specify the indexer in Cassandra's persistence unit:

    <persistence-unit name="esindexertest">
      <provider>com.impetus.kundera.KunderaPersistence</provider>
      <properties>
        <property name="kundera.nodes" value="localhost" />
        <property name="kundera.port" value="9160" />
        <property name="kundera.keyspace" value="kunderaexamples" />
        <property name="kundera.dialect" value="cassandra" />
        <property name="kundera.ddl.auto.prepare" value="create" />
        <property name="kundera.client.lookup.class" value="com.impetus.client.cassandra.thrift.ThriftClientFactory" />
        <property name="kundera.indexer.class" value="com.impetus.client.es.index.ESIndexer" />
      </properties>
    </persistence-unit>

Here, setting the property "kundera.indexer.class" to "com.impetus.client.es.index.ESIndexer" tells Kundera's Cassandra client to create indexes using Elasticsearch and to run queries against them on Elasticsearch. In the background, Elasticsearch generates the indexes while data is persisted through Kundera. You can now use the Min aggregation of Elasticsearch to find the minimum value over the indexed data. Let's see how this happens.

The Min aggregation returns the minimum of the numeric values extracted from the aggregated indexed documents. Below is an example of a Min aggregation query:

    {
      "aggs" : {
        "min_price" : { "min" : { "field" : "price" } }
      }
    }

This aggregation returns the minimum value in the price column. The maximum value can be found in a similar way. Similar JPA queries can be run using Kundera to find the min and max within a product category:

    Select min(e.price), max(e.price) from Product e where e.productcategory = 'Category1'

When a JPA query is received, it is parsed and analyzed by Kundera's core engine. For any aggregation keyword found, a corresponding Elasticsearch query is generated. This query is delegated to Elasticsearch, and the response is passed to the response wrapper as shown in the diagram. Below is the corresponding Elasticsearch aggregation query:

    {
      "aggregations" : {
        "whereclause" : {
          "filter" : {
            "term" : { "productcategory" : "Category1" }
          },
          "aggregations" : {
            "MIN_price" : { "min" : { "field" : "price" } },
            "MAX_price" : { "max" : { "field" : "price" } }
          }
        }
      }
    }

In a similar manner, Elasticsearch can execute sum, average, and count queries over Cassandra data.

Here's another example. For the same Product table, suppose there is a column Product Category, and we need to find records grouped by product category. In SQL, we can achieve this using the following query:

    Select * from Product where price > 100 group by productcategory

Although there is no query in Cassandra that directly supports group by, this can be achieved using Kundera's Cassandra + Elasticsearch combination. Kundera translates the above query into Elasticsearch's Terms aggregation, which groups the records on the basis of a field's value. The following JPA query fetches the records grouped by product category:

    Select p from Product p where p.price > 100 group by p.productcategory

Equivalent Terms aggregation query:

    {
      "aggregations" : {
        "whereclause" : {
          "filter" : {
            "range" : {
              "price" : { "from" : "100", "to" : null }
            }
          },
          "aggregations" : {
            "group_by" : {
              "terms" : { "field" : "productcategory" }
            }
          }
        }
      }
    }

Here's what happens:

1. The Elasticsearch query fetches the records grouped according to the category field
2. Elasticsearch returns the IDs of the records that have a price greater than 100, grouped by the Product Category column value
3. These record IDs are then passed to the client delegator
4. The client delegator generates a Cassandra-specific query and executes it to fetch the complete records from Cassandra

In this way, we can get records grouped by a column's values using Elasticsearch.
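The two filtered aggregation bodies in this section share one shape: a filter aggregation that carries the where clause, with the metric or bucketing aggregations nested inside it. The sketch below builds both as plain Python dicts. The function names are hypothetical, and the range filter uses Elasticsearch's `gt` bound instead of the `from`/`to` form shown in the text. `terms` is Elasticsearch's bucketing aggregation type for grouping by field value.

```python
import json


def filtered_aggregation(filter_clause: dict, sub_aggregations: dict) -> dict:
    """Wrap sub-aggregations in a filter aggregation named "whereclause"."""
    return {
        "aggregations": {
            "whereclause": {
                "filter": filter_clause,
                "aggregations": sub_aggregations,
            }
        }
    }


def min_max_by_category(category: str) -> dict:
    """Body for: select min(price), max(price) where productcategory = category."""
    return filtered_aggregation(
        {"term": {"productcategory": category}},
        {
            "MIN_price": {"min": {"field": "price"}},
            "MAX_price": {"max": {"field": "price"}},
        },
    )


def group_by_category_above(min_price: float) -> dict:
    """Body for: where price > min_price group by productcategory."""
    return filtered_aggregation(
        {"range": {"price": {"gt": min_price}}},
        {"group_by": {"terms": {"field": "productcategory"}}},
    )


if __name__ == "__main__":
    # Print the request body as JSON, ready to send to a search endpoint.
    print(json.dumps(min_max_by_category("Category1"), indent=2))
```

Keeping the where clause in a filter aggregation means the metrics are computed only over matching documents, which mirrors how a SQL where clause restricts the rows seen by min, max, or group by.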
Using the above approach, we can perform aggregations and other complex queries that are not directly available in Cassandra. Elasticsearch offers many other rich aggregations whose strength can be leveraged to query NoSQL data. To summarize, selecting the right indexing mechanism, creating the required indexes, and using them in combination with a NoSQL database can enrich the querying support available for many NoSQL data stores.

About Impetus

Impetus is focused on creating big business impact through Big Data Solutions for Fortune 1000 enterprises across multiple verticals. The company brings together a unique mix of software products, consulting services, Data Science capabilities and technology expertise. It offers full life-cycle services for Big Data implementations and real-time streaming analytics, including technology strategy, solution architecture, proof of concept, production implementation and on-going support to its clients. Visit or write to us at bigdata@impetus.com

© 2015 Impetus Technologies, Inc. All rights reserved. Product and company names mentioned herein may be trademarks of their respective companies. June 2015
Elastify Cloud-Native Spark Application with PMEM Junping Du --- Chief Architect, Tencent Cloud Big Data Department Yue Li --- Cofounder, MemVerge Table of Contents Sparkling: The Tencent Cloud Data Warehouse
More informationSecurity and Performance advances with Oracle Big Data SQL
Security and Performance advances with Oracle Big Data SQL Jean-Pierre Dijcks Oracle Redwood Shores, CA, USA Key Words SQL, Oracle, Database, Analytics, Object Store, Files, Big Data, Big Data SQL, Hadoop,
More informationAmusing algorithms and data-structures that power Lucene and Elasticsearch. Adrien Grand
Amusing algorithms and data-structures that power Lucene and Elasticsearch Adrien Grand Agenda conjunctions regexp queries numeric doc values compression cardinality aggregation How are conjunctions implemented?
More informationTopics. History. Architecture. MongoDB, Mongoose - RDBMS - SQL. - NoSQL
Databases Topics History - RDBMS - SQL Architecture - SQL - NoSQL MongoDB, Mongoose Persistent Data Storage What features do we want in a persistent data storage system? We have been using text files to
More informationOpen Source Search. Andreas Pesenhofer. max.recall information systems GmbH Künstlergasse 11/1 A-1150 Wien Austria
Open Source Search Andreas Pesenhofer max.recall information systems GmbH Künstlergasse 11/1 A-1150 Wien Austria max.recall information systems max.recall is a software and consulting company enabling
More informationBig Data It s not just for Google Any More
Big Data It s not just for Google Any More The Software and Compelling Economics of Big Data Computing EXECUTIVE SUMMARY Big Data holds out the promise of providing businesses with differentiated competitive
More informationParsing the request. Part 2 - Creating a filter
Parsing the request Part 2 - Creating a filter Last example about parsing created SQL We showed you in the last lecture (Parsing the request) how to write a small parser which created an SQL SELECT statement
More informationA Non-Relational Storage Analysis
A Non-Relational Storage Analysis Cassandra & Couchbase Alexandre Fonseca, Anh Thu Vu, Peter Grman Cloud Computing - 2nd semester 2012/2013 Universitat Politècnica de Catalunya Microblogging - big data?
More informationData 101 Which DB, When Joe Yong Sr. Program Manager Microsoft Corp.
17-18 March, 2018 Beijing Data 101 Which DB, When Joe Yong Sr. Program Manager Microsoft Corp. The world is changing AI increased by 300% in 2017 Data will grow to 44 ZB in 2020 Today, 80% of organizations
More informationServerless Computing. Redefining the Cloud. Roger S. Barga, Ph.D. General Manager Amazon Web Services
Serverless Computing Redefining the Cloud Roger S. Barga, Ph.D. General Manager Amazon Web Services Technology Triggers Highly Recommended http://a16z.com/2016/12/16/the-end-of-cloud-computing/ Serverless
More informationBig Data Analytics. Rasoul Karimi
Big Data Analytics Rasoul Karimi Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany Big Data Analytics Big Data Analytics 1 / 1 Outline
More informationMaking Session Stores More Intelligent KYLE J. DAVIS TECHNICAL MARKETING MANAGER REDIS LABS
Making Session Stores More Intelligent KYLE J. DAVIS TECHNICAL MARKETING MANAGER REDIS LABS What is a session store? A session store is An chunk of data that is connected to one user of a service user
More informationOracle Big Data SQL brings SQL and Performance to Hadoop
Oracle Big Data SQL brings SQL and Performance to Hadoop Jean-Pierre Dijcks Oracle Redwood City, CA, USA Keywords: Big Data SQL, Hadoop, Big Data Appliance, SQL, Oracle, Performance, Smart Scan Introduction
More informationThe dialog boxes Import Database Schema, Import Hibernate Mappings and Import Entity EJBs are used to create annotated Java classes and persistence.
Schema Management In Hibernate Mapping Different Automatic schema generation with SchemaExport Managing the cache Implementing MultiTenantConnectionProvider using different connection pools, 16.3. Hibernate
More informationTour of Database Platforms as a Service. June 2016 Warner Chaves Christo Kutrovsky Solutions Architect
Tour of Database Platforms as a Service June 2016 Warner Chaves Christo Kutrovsky Solutions Architect Bio Solutions Architect at Pythian Specialize high performance data processing and analytics 15 years
More informationMicroservices log gathering, processing and storing
Microservices log gathering, processing and storing Siim-Toomas Marran Univeristy of Tartu J.Liivi 2 Tartu, Estonia siimtoom@ut.ee ABSTRACT The aim of this work is to investigate and implement one of the
More informationOracle Essbase XOLAP and Teradata
Oracle Essbase XOLAP and Teradata Steve Kamyszek, Partner Integration Lab, Teradata Corporation 09.14 EB5844 ALLIANCE PARTNER Table of Contents 2 Scope 2 Overview 3 XOLAP Functional Summary 4 XOLAP in
More information"Web Age Speaks!" Webinar Series
"Web Age Speaks!" Webinar Series Java EE Patterns Revisited WebAgeSolutions.com 1 Introduction Bibhas Bhattacharya CTO bibhas@webagesolutions.com Web Age Solutions Premier provider of Java & Java EE training
More informationTools, tips, and strategies to optimize BEx query performance for SAP HANA
Tools, tips, and strategies to optimize BEx query performance for SAP HANA Pravin Gupta TekLink International Produced by Wellesley Information Services, LLC, publisher of SAPinsider. 2016 Wellesley Information
More informationNosDB vs DocumentDB. Comparison. For.NET and Java Applications. This document compares NosDB and DocumentDB. Read this comparison to:
NosDB vs DocumentDB Comparison For.NET and Java Applications NosDB 1.3 vs. DocumentDB v8.6 This document compares NosDB and DocumentDB. Read this comparison to: Understand NosDB and DocumentDB major feature
More informationBig Data Analytics using Apache Hadoop and Spark with Scala
Big Data Analytics using Apache Hadoop and Spark with Scala Training Highlights : 80% of the training is with Practical Demo (On Custom Cloudera and Ubuntu Machines) 20% Theory Portion will be important
More informationBest Practices for Choosing Content Reporting Tools and Datasources. Andrew Grohe Pentaho Director of Services Delivery, Hitachi Vantara
Best Practices for Choosing Content Reporting Tools and Datasources Andrew Grohe Pentaho Director of Services Delivery, Hitachi Vantara Agenda Discuss best practices for choosing content with Pentaho Business
More informationThe Hadoop Ecosystem. EECS 4415 Big Data Systems. Tilemachos Pechlivanoglou
The Hadoop Ecosystem EECS 4415 Big Data Systems Tilemachos Pechlivanoglou tipech@eecs.yorku.ca A lot of tools designed to work with Hadoop 2 HDFS, MapReduce Hadoop Distributed File System Core Hadoop component
More informationARCHITECTURE ARCHITECTURE OVERVIEW
ARCHITECTURE ARCHITECTURE OVERVIEW The personalization of the customer experience is in every marketer s mind and this requirement has strong impacts on customer data integration, across channels and applications.
More informationSQL, Scaling, and What s Unique About PostgreSQL
SQL, Scaling, and What s Unique About PostgreSQL Ozgun Erdogan Citus Data XLDB May 2018 Punch Line 1. What is unique about PostgreSQL? The extension APIs 2. PostgreSQL extensions are a game changer for
More informationTurbocharge your MySQL analytics with ElasticSearch. Guillaume Lefranc Data & Infrastructure Architect, Productsup GmbH Percona Live Europe 2017
Turbocharge your MySQL analytics with ElasticSearch Guillaume Lefranc Data & Infrastructure Architect, Productsup GmbH Percona Live Europe 2017 About the Speaker Guillaume Lefranc Data Architect at Productsup
More informationWhat is database? Types and Examples
What is database? Types and Examples Visit our site for more information: www.examplanning.com Facebook Page: https://www.facebook.com/examplanning10/ Twitter: https://twitter.com/examplanning10 TABLE
More information