MarkLogic Technology Briefing

Transcription:

MarkLogic Technology Briefing. Edd Patterson, CTO/VP Systems Engineering, Americas. Slide 1

Agenda: Introductions; About MarkLogic; MarkLogic Server Deep Dive. Slide 2

MarkLogic Overview. Company Highlights: headquartered in Silicon Valley; founded in 2001; 40% CAGR revenue growth; privately held; patented, award-winning technology. Slide 3

Select Customers: Financial Services and Other Customers; Government Customers; Media Customers. Slide 4

Inside MarkLogic Server Slide 5

MarkLogic Powers the World's Big Data Applications. Use all your data to make your organization smarter. Analyze a wide variety of structured, unstructured and semi-structured data together to gain actionable insights. Bring these insights into operational business processes via real-time Big Data applications. A unified Big Data platform for both analytics and applications: any data, any volume, any structure, real-time. E.g. derivatives contracts, customer records, social media, medical records, intelligence assets, journal articles, etc. Slide 6. Copyright 2010-2012 MarkLogic Corporation. All rights reserved.

Elements of a Big Data Platform: Tools / APIs; Visualization; Data Mining / Analytics; Event Processing; Metadata; Search; Analytic DB; Operational DB; Unstructured Content; Ingest / Batch Analytics / Enrichment; Archive / Warm Long Tail Data Store. Slide 7

How is this usually implemented? Stitched together from multiple technologies. [Figure: BI tools and applications connected by lines to stats packages (SPSS, SAS, R, ...), stream / event processing, search and its search index, analytic DB, operational DB, metadata, unstructured content store, batch analytics (Hadoop MR) and archive (HDFS).] Each line is an opportunity for latency, ETL bugs, etc. Each component is managed, scaled, supported, etc. separately. Applications interface with many technologies, sometimes managed by different groups. Bottom line: data governance is compromised; impossible to react in real time; agility is lost.

MarkLogic Unified Platform for Big Data. MarkLogic Server is an operational DBMS, an analytic DBMS, an unstructured DBMS, a search engine, and an event processor, all in one technology. [Figure: the same component diagram as the previous slide (BI tools, applications, stats, analytic DB, stream / event processing, operational DB, metadata, search, search index, unstructured content store, batch analytics on Hadoop MR, archive on HDFS).]

How MarkLogic Server Works: Schema-Agnostic Design. Slide 10

Data Model. MarkLogic Server is a document-centric database. It supports data of any structure via a hierarchical (XML) data model. [Figure: example document trees, including a Document with Title, Author (First, Last), Metadata and Section nodes, and an FpML document with Trade, Product, Cashflow, Amount, ID, TradeLeg and Event nodes.] Slide 11

MarkLogic is Schema Agnostic. XML is self-describing.

<article>
  <title>MarkLogic Server:...</title>
  <author>
    <first-name>Dale</first-name>
    <last-name>Kim</last-name>
  </author>
  <abstract>.... <company>MarkLogic</company> </abstract>
  <body>
    <section>
      <section>...</section>
    </section>
    <section>... index... </section>
  </body>
  <copyright>Copyright... </copyright>
</article>

Slide 12

MarkLogic is Schema Agnostic. XML is self-describing. No Schema Needed! [Figure: the article XML from Slide 12 shown next to its parsed tree: <article> with children <title> ("MarkLogic Server:..."), <author> (<first-name> "Dale", <last-name> "Kim"), <abstract> (text plus <company> "MarkLogic"), <body> (<section> nodes, one containing "... index ...") and <copyright>.] Slide 13
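The "no schema needed" point can be illustrated with a minimal Python sketch. It is not MarkLogic code: it only uses the standard library parser to show that the element structure of the slide's article can be discovered from the document itself. The function name `element_paths` and the exact text content are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# The article from the slide; elided text ("...") is kept as-is.
ARTICLE = """<article>
  <title>MarkLogic Server:...</title>
  <author><first-name>Dale</first-name><last-name>Kim</last-name></author>
  <abstract>.... <company>MarkLogic</company></abstract>
  <body>
    <section><section>...</section></section>
    <section>... index...</section>
  </body>
  <copyright>Copyright...</copyright>
</article>"""


def element_paths(xml_text):
    """Return the sorted set of element paths found in a document.

    No schema is supplied: the structure comes from the tags alone.
    """
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}"
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return sorted(paths)


paths = element_paths(ARTICLE)
for path in paths:
    print(path)
```

A structure-aware index could use paths like these as terms, which is the idea the following slides build on.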

How MarkLogic Server Works: Indexing and Query. Slide 14

Universal Index. MarkLogic indexes: words, phrases, stemming, structure, values, collections, security permissions. [Figure: the universal index as a table with columns Term, Term List and Document References. Terms include words ("data", "base"), the phrase "data base", stemmed forms, structure (<article>, <article>/<abstract>, <section>/<product>), element values (<product>ims</product>), word-in-element terms (<title> contains "data"), collections (Collection(Red)) and security terms (Role:Editor + Action:Read). Each term maps to a term list of document ids, for example 123, 127, 129, 152, 344, 791...] Slide 15
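The term-list idea on this slide can be sketched in a few lines of Python. This is a toy inverted index, not MarkLogic's implementation: the term kinds ("word", "element", "collection"), the sample documents and the helper names are all assumptions for illustration. It shows how word, structure and collection terms can share one index and be combined by intersecting their term lists.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to a sorted term list of document ids."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for word in doc["text"].lower().split():
            index[("word", word)].add(doc_id)
        for element in doc["elements"]:
            index[("element", element)].add(doc_id)
        for collection in doc["collections"]:
            index[("collection", collection)].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, *terms):
    """AND query: intersect the term lists of all requested terms."""
    lists = [set(index.get(term, ())) for term in terms]
    return sorted(set.intersection(*lists)) if lists else []


# Hypothetical documents; the ids echo those on the slide but the content is invented.
docs = {
    123: {"text": "data base systems", "elements": ["article", "abstract"], "collections": ["Red"]},
    126: {"text": "base rates", "elements": ["article"], "collections": ["Red"]},
    130: {"text": "data models", "elements": ["article", "abstract"], "collections": []},
}
index = build_index(docs)
hits = search(index, ("word", "data"), ("element", "abstract"), ("collection", "Red"))
print(hits)
```

Because every kind of term resolves to the same kind of list, a query that mixes text, structure and collection constraints is a single intersection rather than a join across separate systems.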

Collections and Security. Directories: exclusive, hierarchical, analogous to a file system, based on URI. Collections: set-based, N:N relationship. Security: invisible to your app. Slide 16

Scalars. How many of the articles that contain "data base" were written in each of the last 5 decades? [Figure: the universal index from Slide 15 (terms, term lists, document references) with scalar entries such as YEAR and Volume added.] Slide 17

Range Indexes. Maps document ids to values, and values to document ids, in a compact memory representation.

DOC ID | VALUE        VALUE | DOC ID
1      | 2009         2002  | 3
3      | 2002         2003  | 10
4      | 2007         2004  | 5
5      | 2004         2004  | 11
8      | 2011         2007  | 4
10     | 2003         2007  | 17
11     | 2004         2009  | 1
17     | 2007         2011  | 8
...

Slide 18
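The two mappings in the table can be sketched as a small Python class. This is an illustrative toy, not MarkLogic's data structure: the class name `RangeIndex` and its methods are assumptions. It uses the slide's own values, answers a range query by binary search over the value-sorted side, and answers the "per decade" question from Slide 17 using the document-to-value side.

```python
import bisect
from collections import Counter


class RangeIndex:
    """Toy range index: doc id -> value, plus (value, doc id) pairs sorted by value."""

    def __init__(self, doc_values):
        self.by_doc = dict(doc_values)
        self.by_value = sorted((value, doc) for doc, value in self.by_doc.items())

    def docs_in_range(self, low, high):
        """Doc ids whose value lies in [low, high], found by binary search."""
        start = bisect.bisect_left(self.by_value, (low, float("-inf")))
        end = bisect.bisect_right(self.by_value, (high, float("inf")))
        return [doc for _, doc in self.by_value[start:end]]

    def count_by_decade(self, doc_ids):
        """Group a set of matching doc ids by decade using the doc -> value map."""
        return Counter(self.by_doc[doc] // 10 * 10 for doc in doc_ids if doc in self.by_doc)


# Values copied from the slide's table.
years = RangeIndex({1: 2009, 3: 2002, 4: 2007, 5: 2004, 8: 2011, 10: 2003, 11: 2004, 17: 2007})
recent = years.docs_in_range(2003, 2007)
# The doc ids below stand in for the result of a text query; they are hypothetical.
decades = years.count_by_decade([1, 3, 4, 8])
print(recent, dict(decades))
```

The value-to-document direction serves range constraints, while the document-to-value direction serves aggregation over a result set, which is why the slide shows both.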

Geospatial Index: A 2-Dimensional Range Index. Built-in support for: point, box, circle, polygon, complex polygon, polygon intersection, polygon containment. Fully composable with all other indexes! Slide 19
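As a rough illustration of two of the region tests named on this slide, here is a Python sketch of point-in-box and point-in-polygon checks. It assumes flat (planar) coordinates and uses the standard ray-casting technique; it says nothing about how MarkLogic evaluates these internally, and the function names are hypothetical.

```python
def point_in_box(point, south_west, north_east):
    """True if the point lies inside the axis-aligned box (edges included)."""
    x, y = point
    return south_west[0] <= x <= north_east[0] and south_west[1] <= y <= north_east[1]


def point_in_polygon(point, polygon):
    """Ray casting: count how many polygon edges a ray to the right of the point crosses."""
    x, y = point
    inside = False
    count = len(polygon)
    for i in range(count):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % count]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(point_in_box((2, 2), (0, 0), (4, 4)))
print(point_in_polygon((2, 2), square))
print(point_in_polygon((5, 2), square))
```

"Composable" in the slide's sense would mean that the document ids passing such a test can be intersected with term lists from any other index.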

How MarkLogic Server Works: Event Processing. Slide 20

Reverse Indexes (Alerting). 1. Load serialized queries as query documents. 2. For a given data document, find all queries that match. Can provide real-time alerts during loads, with no significant performance impact! Can let documents store values as "ranges": documents about cities self-defining their geo boundaries; person documents defining birthdays as ranges, sequences. Can power classifiers and "matchmaker" queries. Slide 21
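The two steps above can be sketched as follows. This is a simplified Python illustration, assuming each stored query is just a set of words that must all appear in the document; the class `ReverseIndex` and its methods are hypothetical names, and real serialized queries would be far richer.

```python
from collections import defaultdict


class ReverseIndex:
    """Toy reverse index: stored queries are indexed, incoming documents are matched."""

    def __init__(self):
        self.required = {}                # query id -> set of required words
        self.by_word = defaultdict(set)   # word -> ids of queries that need it

    def add_query(self, query_id, words):
        """Step 1: store a query and index it by the words it requires."""
        words = {word.lower() for word in words}
        self.required[query_id] = words
        for word in words:
            self.by_word[word].add(query_id)

    def match(self, text):
        """Step 2: for one document, return the ids of all stored queries it satisfies."""
        hits = defaultdict(int)
        for word in set(text.lower().split()):
            for query_id in self.by_word.get(word, ()):
                hits[query_id] += 1
        return sorted(q for q, n in hits.items() if n == len(self.required[q]))


alerts = ReverseIndex()
alerts.add_query("rates", ["interest", "rate"])
alerts.add_query("fx", ["currency", "swap"])
matched = alerts.match("Interest rate swap confirmed")
print(matched)
```

The cost of matching depends on the words in the incoming document rather than on scanning every stored query, which is what makes alerting during loads plausible.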

How MarkLogic Server Works: Scale-out. Slide 22

Databases Scale Out. A database of documents, stored in partitions. [Figure: a database divided into Partition 1, Partition 2 and Partition 3.] Slide 23

Shared-Nothing Architecture. [Figure: a tier of E-nodes above a tier of D-nodes (D-Node 1 through D-Node k), which host the forests (Forest 1 through Forest m).] Slide 24

How MarkLogic Server Works: Analytics. Slide 25

Range Indexes: A Built-In In-Memory Column Store. Maps document ids to values, and values to document ids, in a compact memory representation. Range indexes are equivalent to a built-in in-memory column store.

DOC ID | VALUE        VALUE | DOC ID
1      | 2009         2002  | 3
3      | 2002         2003  | 10
4      | 2007         2004  | 5
5      | 2004         2004  | 11
8      | 2011         2007  | 4
10     | 2003         2007  | 17
11     | 2004         2009  | 1
17     | 2007         2011  | 8
...

Slide 26

Scalar Queries and Aggregation Slide 27

In-Database MapReduce. [Figure: the E-node runs start, encode, decode, reduce and finish; each D-node (D-Node 1 through D-Node k, over Forest 1 through Forest m) runs decode, map, reduce and encode.] Slide 28
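The division of work in the figure can be sketched in Python. This is a single-process illustration only: each list stands in for the values held by one forest, `map_forest` plays the D-node role (map plus local reduce) and `reduce_partials` plays the E-node role. The encode and decode steps shown on the slide are omitted, and all names and values are assumptions.

```python
from collections import Counter
from functools import reduce


def map_forest(forest_values):
    """D-node side: aggregate one forest's values locally (map plus local reduce)."""
    return Counter(year // 10 * 10 for year in forest_values)


def reduce_partials(left, right):
    """E-node side: merge two partial results into one."""
    return left + right


# Hypothetical year values spread over three forests.
forests = [[2009, 2002], [2007, 2004, 2011], [2003, 2004, 2007]]
partials = [map_forest(values) for values in forests]
totals = reduce(reduce_partials, partials, Counter())
print(dict(totals))
```

Only the small per-forest summaries travel to the coordinating node, not the underlying values, which is the main reason to push aggregation down to where the data lives.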

Hadoop MapReduce via Bi-Directional Hadoop Connector. [Figure: raw data and intermediate intelligence on the Hadoop side; MarkLogic + Connector for Hadoop serving operational applications; numbered flows (1 to 3) between them, labelled bulk loading and progressive enhancement.] Slide 29

Co-Occurrence Slide 30

SQL and BI Tools: ODBC, SQL, Range Indexes. Slide 31

How MarkLogic Server Works: Transactions. Slide 32

MVCC. [Figure: two versions of the document /articles/codd.xml drawn as trees (Title, Author with First and Last, Metadata, Sections; one version also has a Year node). Each version is labelled with a creation timestamp (c) and a deleted timestamp (d); the values shown are 523, 628 and 628.] Timestamps can be: increasing integers (before MarkLogic 5); increasing wall time (starting with MarkLogic 5). Slide 33
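The creation and deleted timestamps on this slide suggest the following Python sketch of multi-version reads. It is a toy model, not MarkLogic's storage format. It assumes that the slide's 523 is the creation timestamp of the first version and that 628 is both its deleted timestamp and the creation timestamp of the second version; the class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    uri: str
    content: str
    created: int
    deleted: Optional[int] = None   # None means the version is still current


class VersionStore:
    """Toy MVCC store: updates never overwrite, they add a version and stamp the old one."""

    def __init__(self):
        self.versions = []

    def write(self, uri, content, timestamp):
        for version in self.versions:
            if version.uri == uri and version.deleted is None:
                version.deleted = timestamp
        self.versions.append(Version(uri, content, created=timestamp))

    def read(self, uri, at):
        """Return the content visible at timestamp `at`, without taking any lock."""
        for version in self.versions:
            alive = version.created <= at and (version.deleted is None or at < version.deleted)
            if version.uri == uri and alive:
                return version.content
        return None


store = VersionStore()
store.write("/articles/codd.xml", "first version", timestamp=523)
store.write("/articles/codd.xml", "second version", timestamp=628)
old = store.read("/articles/codd.xml", at=600)
new = store.read("/articles/codd.xml", at=628)
print(old, "|", new)
```

A reader pinned to timestamp 600 keeps seeing the first version even after the update at 628, which is the basis for the lock-free and point-in-time reads described on the next slides.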

MVCC Benefits. Very high throughput: read queries don't require locks; queries and updates do not conflict. ACID transactions: internal 2-phase commit between hosts (forest partitions). Zero latency between ingestion and indexing. [Figure: the document tree for /articles/codd.xml (Title, Author with First and Last, Metadata, Year, Sections) with timestamp 628.] Slide 34

The Four Forest Operations. 1. Create a new document: into the in-memory stand buffer. 2. Mark a document as expired: a memory-mapped timestamps document per stand. 3. Write buffer out to disk (checkpoint): our buffers are 100s of megabytes; for performance, double buffer. 4. Merge: a background process; an optimization that reduces the number of stands in the forest. Slide 35
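The four operations can be sketched as a small Python class. This is a heavily simplified model under stated assumptions: a stand is a dictionary of documents plus a set of expired URIs (standing in for the per-stand timestamps file), the checkpoint is triggered by a hypothetical `buffer_limit` document count rather than by buffer size, and there is no double buffering, journaling or background thread.

```python
class Forest:
    """Toy forest: one in-memory stand plus a list of on-disk stands."""

    def __init__(self, buffer_limit=2):
        self.buffer_limit = buffer_limit   # hypothetical; the slide cites buffers of 100s of MB
        self.memory = {"docs": {}, "expired": set()}
        self.disk = []

    def create(self, uri, doc):
        """Operation 1: add the document to the in-memory stand."""
        self.memory["docs"][uri] = doc
        if len(self.memory["docs"]) >= self.buffer_limit:
            self.checkpoint()

    def expire(self, uri):
        """Operation 2: mark the document expired in whichever stand holds it."""
        for stand in self.disk + [self.memory]:
            if uri in stand["docs"]:
                stand["expired"].add(uri)

    def checkpoint(self):
        """Operation 3: move the in-memory stand to disk and start a fresh buffer."""
        if self.memory["docs"]:
            self.disk.append(self.memory)
            self.memory = {"docs": {}, "expired": set()}

    def merge(self):
        """Operation 4: combine on-disk stands, dropping expired documents."""
        live = {}
        for stand in self.disk:
            for uri, doc in stand["docs"].items():
                if uri not in stand["expired"]:
                    live[uri] = doc
        self.disk = [{"docs": live, "expired": set()}] if live else []


forest = Forest(buffer_limit=2)
for uri in ["/a.xml", "/b.xml", "/c.xml", "/d.xml"]:
    forest.create(uri, f"<doc>{uri}</doc>")
stands_before = len(forest.disk)
forest.expire("/a.xml")
forest.merge()
print(stands_before, len(forest.disk), sorted(forest.disk[0]["docs"]))
```

Note that expiring a document only records a marker; the space is reclaimed later by the merge, which also brings the number of stands back down.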

Consistency And Throughput. 2-phase commit: transactions span forests. Recovery: forest journals. Lock-free read queries: query at a point in time; repeatable reads; increased throughput; time travel (and near-instant DB rollback). Slide 36

HA/DR Features of MarkLogic

Feature | Function | Benefit | Use Case
Database backup/restore | Make a backup of your database, then restore it | Recover from complete data loss | Disaster Recovery
Journal Archiving / Point-In-Time Recovery | Make a continuous backup; restore to a point in time, or to the point of failure | Recover from complete data loss; recover all your data, or to just before a Bad Event | Disaster Recovery
Snapshot backup | Very fast backup using mirrored disk | Recover from complete data loss; take a backup in seconds | Disaster Recovery, High Availability
Database rollback | Roll back to a point in time before a Bad Thing happened | Recover in seconds from human error or a rogue application | Disaster Recovery, High Availability
Automatic Failover (Shared-Disk, Local-Disk) | If a node fails, automatically fail over to another node | Recover from failure of a data node in a cluster | High Availability
Flexible Replication (part of Replication option) | Maintain a hot copy of (part of) a database in another data center | Move parts of a database, parts of documents, closer to users for improved performance | Information Sharing
Database Replication (part of Replication option) | Maintain a hot copy of a database in another data center | Recover from loss of a data center | Disaster Recovery, High Availability
Distributed Transactions | XA support for transactions that go across MarkLogic and other repositories | Keep an exact (synchronous) copy of your data in more than one place | Disaster Recovery, High Availability, Information Sharing

Slide 37. Copyright 2011-2012 MarkLogic Corporation. All rights reserved.

OTC: Derivatives and Exotics. Repository for derivatives and other exotic products (trades, options, swaps, etc.). Key Requirements: native JSON support; real-time queries on semi-structured data; 7-year retention; replication; BAR. Customers in Production: JP Morgan Chase, whose Derivatives Core Processing Platform enables risk management for $78 trillion in derivatives. Relevant Features: native JSON support; value-based lookups; tiered storage; fine-grained partition management; enterprise-class backup and restore features; replication; clustering. Slide 38

Equity Risk Systems. Currently using traditional RDBMS servers and file systems to store intraday and time series data. Customers in Production: where MarkLogic is providing similar solutions. Requirements: scale out on commodity servers; ease of data modeling for BLOBs (unstructured data); handle complex data. Relevant Features: schema agnostic with optional validation; bi-temporal API; user-defined functions; linear scalability on commodity hardware; clustering; binary support; fast native XPath; result and data caching. Slide 39

Document Modeling. Enterprise Data Group (EDG) is revamping the workflow for the creation and management of the negotiated documents. Instead of capturing the end image and some metadata, we are modeling the creation of the document through templates, XML or similar documents. The business groups need to dynamically change the agreements, and constantly add new information to meet the day-to-day needs. Customers in Production: Morgan Stanley, Citi, Docgenix, JetBlue, LexisNexis. Relevant Features: native support for XML, JSON and binary; schema agnostic with optional validation; ACID-compliant CRUD; value-based lookups; document library services; search indexing; LDAP and Kerberos integration; clustering; range indexes. Slide 40

Log Analysis. Enable analysis of system and user logs to evaluate user behavior and provide BI. Automatically capture and analyze logs from many sources (web, DB, DW). Perform correlation of events and performance within time-slices. Normalize and enhance data with metadata from applications. Analytics pipeline to compute aggregates and statistical data. Dashboards for each source. High data volumes: 1 TB/day for 45 days, ~45 TB of total data, 20M requests/day. Customers in Production: Bank of America; enabled Bank of America to map their internal reference data architecture through log analysis. Relevant Features: schema-agnostic data model; Hadoop integration; BI tool integration; range indexes; user-defined functions; transformation capabilities; processing framework; visualization framework; application server; linear scalability. Slide 41

In Conclusion Slide 42

MarkLogic Server is: an operational DBMS with an MVCC-based transaction model and high throughput; an analytic DBMS with an in-memory column store and in-database MapReduce; an unstructured DBMS with an XML data model and ad-hoc schema; a high-performance search engine with a transactional universal index; an event processor with serialized queries and alerting; a unified Big Data platform. Slide 43

Questions?? Slide 44