The Emerging Data Lake IT Strategy


An Evolving Approach for Dealing with Big Data & Changing Environments
bit.ly/datalake

SPEAKERS:
Thomas Kelly, Practice Director, Cognizant Technology Solutions
Sean Martin, Founder and CTO, Cambridge Semantics

We're living in an amazing world of information sharing, connecting with family, neighbors, vendors, and customers all over the world

Telling the world about what we like and don't like
#HIMYMfinale
@MLB is now following Cognizant Technology Solutions and Cambridge Semantics

What we're doing and how we're succeeding

We're deciding what advertising we want to see and what we don't
Unsubscribe
Influencing how business and customers engage

Many businesses have emerged that embrace this model of customer engagement
- 10 million stays in 2013, without owning a hotel
- Grew to nearly $75B in annual retail revenue in 2013, without opening a storefront
- Shares over 40 million photos each day
and we've said goodbye to businesses that didn't

Retail
Engaging in a more personalized shopping experience, retailers are building a stronger relationship with each customer

Customer Service
Delivering a positive and successful experience for each customer

Life Sciences and Healthcare
Combining health, genetic, clinical, and public sciences data to bring effective therapies to patients sooner

Financial Services
Delivering innovative products and services, based on a 360-degree view of the customer, across all business lines, engaging all available data assets, internal and external

The Challenges That We're Addressing

Onboarding and Integrating Data is Slow and Expensive
- Transforming data from a growing variety of technologies
- Custom coded ETL
- Existing ETL processes are not reusable
- Optimization for analytics is time-consuming and costly
- Often wait until there is a defined need for a set of data, delaying benefits realization while waiting to onboard the data

Data Provenance is Often Poorly Recorded
- Data meaning is lost in translation
- Data transformations tracked in spreadsheets
- Post-onboarding, maintenance and analysis cost for onboarded data is high
- Recreating data lineage is manual, time-consuming, and error-prone
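
As an illustration of the lineage challenge above, the sketch below records each transformation as a structured step instead of a spreadsheet row, so that lineage can be traced by a program rather than recreated by hand. It is only a minimal sketch: the `LineageStep` record, the `trace` function, and the dataset names are hypothetical and are not taken from the presentation.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class LineageStep:
    """One recorded transformation (hypothetical structure)."""
    target: str               # dataset produced by the step
    sources: Tuple[str, ...]  # datasets consumed by the step
    transform: str            # free-text description of what was done


def trace(target: str, steps: List[LineageStep]) -> List[str]:
    """Return every dataset that feeds `target`, upstream datasets first."""
    by_target = {step.target: step for step in steps}
    ordered: List[str] = []
    seen: Set[str] = set()

    def visit(name: str) -> None:
        if name in seen:
            return
        seen.add(name)
        step = by_target.get(name)
        if step is not None:
            for source in step.sources:
                visit(source)
        ordered.append(name)

    visit(target)
    return ordered


# Hypothetical example data.
STEPS = [
    LineageStep("clean_orders", ("raw_orders",), "drop test orders"),
    LineageStep("customer_360", ("clean_orders", "raw_customers"),
                "join orders to customers"),
]

print(trace("customer_360", STEPS))
```

Because every step is data, the same records could also be queried alongside the datasets they describe.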

The Challenges That We're Addressing

Target Data is Difficult to Consume
- Optimization favors known analytics, but is not well suited to new requirements
- A one-size-fits-all canonical view is used rather than fit-for-purpose views
- Or, lacks a conceptual model to easily consume the target data
- Difficult to identify what data is available, how to get access, and how to integrate the data to answer a question

Industrializing the Big Data Environment is Difficult to Manage
- Proliferation of data silos leads to inconsistency/syncing issues
- Conflicting objectives of opening access to data assets while managing security and privacy requirements
- Velocity of business change rapidly invalidates data organization and analytics optimizations
- Managing the integration/interaction with the multiple data management technologies that make up the Big Data environment

The Data Lake is made up of four key components
- Data Lake Management
- Data Ingestion
- Data Management
- Query Management

Delivering
- Low Cost, High Performance Storage
- Flexible, Easy-to-Use Data Organization
- Performance-Optimized Analytics
- Automation of most manual Development and Query Activities
- Self-Service End-User Features
- Intelligent Processing

Data Ingestion

Data Sources
- Desktop and Mobile
- Social Media and Cloud
- Operational Systems
- Linked Data
- Internet of Things (IoT)

Data Ingestion
- Model-Driven Semantic Tagging
- On-Demand Query
- Streaming
- Scheduled Batch Load
- Self-Service

Data Management
- Data Movement
- NoSQL
- In Memory
- Columnar
- Graph
- Semantic
- Provenance
- Map Reduce
- HDFS Storage
- Structured and Unstructured Data

Data Lake Management

Data Governance
- Focus on Shared Data
- Controlled Vocabulary
- Common Definitions

Standard Models
- Standards-based Data Views (FIBO, CDISC/RDF)

Data Mappings
- Source-to-Target Transformations

Business-Focused Models (ontologies, taxonomies, thesauri)
- Business Unit Data Organization and Terms
- Optimized to Assist Analytics

Data Assets Catalog
- Internal and External Data Assets
- Defined Data Orgs

Provenance Capture
- Processes
- Schedules

Workflow Monitoring
- Monitor and Manage Data Lake Operations

Access Management
- Authorization and Access Rules
- Rule-based Security
- Group, Role, and User Level Authorization
- Auditable Access
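
The Access Management items above (rule-based security, group, role, and user level authorization, auditable access) can be sketched as follows. This is a minimal sketch under stated assumptions, not the presenters' design: the `AccessRule` structure, the default-deny behaviour, and the precedence order (user rules before role rules before group rules) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class AccessRule:
    level: str      # "user", "role", or "group"
    principal: str  # hypothetical user, role, or group name
    asset: str      # hypothetical data asset name
    allow: bool


# Assumed precedence: the most specific level with a matching rule decides.
LEVELS = ("user", "role", "group")


def is_authorized(user: str, roles: Iterable[str], groups: Iterable[str],
                  asset: str, rules: List[AccessRule],
                  audit_log: List[Dict[str, object]]) -> bool:
    """Evaluate the rules for one request and record the decision."""
    principals = {"user": {user}, "role": set(roles), "group": set(groups)}
    decision, decided_by = False, "default-deny"
    for level in LEVELS:
        hits = [rule for rule in rules
                if rule.level == level and rule.asset == asset
                and rule.principal in principals[level]]
        if hits:
            # Within one level, any deny rule overrides allow rules.
            decision = all(rule.allow for rule in hits)
            decided_by = level
            break
    # Every decision is appended to the log so that access is auditable.
    audit_log.append({"user": user, "asset": asset,
                      "allowed": decision, "decided_by": decided_by})
    return decision


# Hypothetical rules: analysts may read claims, but one user is blocked.
RULES = [
    AccessRule("group", "analysts", "claims", True),
    AccessRule("user", "dana", "claims", False),
]
log: List[Dict[str, object]] = []
print(is_authorized("lee", [], ["analysts"], "claims", RULES, log))   # True
print(is_authorized("dana", [], ["analysts"], "claims", RULES, log))  # False
```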

Query Management
- Semantic Search
- Data Discovery
- Analytics Directed to the Best Query Engine
- Capture and Share Analytics Expertise
- Query Data, Metadata, and Provenance

Semantic Technology Delivers Smart Data
- Integrates a network of internal and external data assets, insulating end users from the details of the underlying technologies
- Captures expertise (logic, inferencing) and integrates it with the data, delivering smart data to non-expert users
- Manages a comprehensive inventory of the data assets
- Secures access to the right data assets by the right users

Key W3C Standards in Semantic Technology

Resource Description Framework (RDF)
- Framework for storing and integrating data and data definitions in the form of subject-predicate-object expressions, or triples. Relationships are organized in a logical graph model.
- Benefit: Reduced development time and cost; faster time-to-business value.

Web Ontology Language (OWL)
- An ontology is a comprehensive model of data definitions and relationships that is human- and machine-readable. Ontologies are inheritable and extensible.
- Benefit: Improved application quality, flexible iterative / investigative approach, easily adapts to business change.

SPARQL Query Language
- SQL-like query language for semantic data that can leverage the ontological relationships and constructs to execute smarter queries. Access multiple internal and external databases simultaneously in a single query.
- Benefit: Access and integrate data across business silos.

Inference
- Reasoning over data through business rules. Expertise is captured and embedded in the ontology model, accessible through user queries. This is the "smart" in Smart Data.
- Benefit: Easier end user access to expertise; intelligent systems capabilities.

Linked Data
- Connects data contained in different databases, allowing queries to find, share and combine data so insights can be identified across the Web.
- Benefit: Connect disparate databases to navigate and integrate data regardless of location or technology platform.

RDB to RDF Mapping Language (R2RML)
- Preserving current investments in relational technology, R2RML maps relational data to an ontology. SPARQL can query RDF and relational databases simultaneously.
- Benefit: Low cost of entry to use Semantic Technology to deliver high-value solutions.
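
To make the triple and inference ideas above concrete, here is a toy sketch in plain Python. It is not an RDF store, a SPARQL engine, or an OWL reasoner: it only shows data held as subject-predicate-object triples, a pattern match over them, and a single subclass rule applied until nothing new can be derived. The `ex:` identifiers and the helper names `match` and `infer_types` are hypothetical.

```python
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Toy data; every identifier here is made up for illustration.
TRIPLES: Set[Triple] = {
    ("ex:Acme", "rdf:type", "ex:RetailCustomer"),
    ("ex:RetailCustomer", "rdfs:subClassOf", "ex:Customer"),
    ("ex:Customer", "rdfs:subClassOf", "ex:Party"),
    ("ex:Acme", "ex:placedOrder", "ex:Order42"),
}


def match(triples: Set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> Iterator[Triple]:
    """Yield triples matching the pattern; None acts as a wildcard."""
    for triple in triples:
        if ((s is None or triple[0] == s)
                and (p is None or triple[1] == p)
                and (o is None or triple[2] == o)):
            yield triple


def infer_types(triples: Set[Triple]) -> Set[Triple]:
    """Apply one rule to a fixpoint:
    (x type C) and (C subClassOf D) implies (x type D)."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for x, _, cls in list(match(inferred, p="rdf:type")):
            for _, _, parent in list(match(inferred, s=cls, p="rdfs:subClassOf")):
                new_triple = (x, "rdf:type", parent)
                if new_triple not in inferred:
                    inferred.add(new_triple)
                    changed = True
    return inferred


closed = infer_types(TRIPLES)
# A query for customers now finds ex:Acme, although no triple said so directly.
print(sorted(t[0] for t in match(closed, p="rdf:type", o="ex:Customer")))
```

The query at the end benefits from knowledge held in the model (the subclass relationships) rather than in the query itself, which is the point the Inference entry makes.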

The Common Model is the Data Glue

Source Systems
- Lead (SFA system)
- Quote (Quote system)
- Order (OMS system)
- Contract (CMS system)

Common Model ("Data Glue")
- Different business entities in physical systems actually share many of the same concepts, meanings, and relationships
- Semantic data science exposes common business concepts and connects them with their physical expression in production systems
- Data is glued together by its business meaning, rather than physical structures dictated by the underlying technologies
- The conceptual model can be directly used by both business and IT users to operationalize data services, understand the data landscape, track data lineage, and conduct downstream analytics.
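
A minimal sketch of the "data glue" idea: records from the four source systems named above use different physical field names, and a mapping onto shared concepts lets them be grouped by business meaning. All field names, the concepts `customer` and `amount`, and the helper functions are hypothetical; the presentation does not specify them.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical mapping: physical field name -> common-model concept.
FIELD_TO_CONCEPT: Dict[str, Dict[str, str]] = {
    "SFA":   {"lead_company": "customer", "lead_value": "amount"},
    "Quote": {"cust_name": "customer", "quote_total": "amount"},
    "OMS":   {"buyer": "customer", "order_total": "amount"},
    "CMS":   {"counterparty": "customer", "contract_value": "amount"},
}


def to_common(system: str, record: Dict[str, object]) -> Dict[str, object]:
    """Re-express one source record in common-model terms."""
    mapping = FIELD_TO_CONCEPT[system]
    common: Dict[str, object] = {"source_system": system}
    for field_name, value in record.items():
        if field_name in mapping:  # unmapped fields are left out of the view
            common[mapping[field_name]] = value
    return common


def glue_by_customer(
    records: Iterable[Tuple[str, Dict[str, object]]]
) -> Dict[object, List[Dict[str, object]]]:
    """Group records from any system by the shared 'customer' concept."""
    by_customer = defaultdict(list)
    for system, record in records:
        common = to_common(system, record)
        if "customer" in common:
            by_customer[common["customer"]].append(common)
    return dict(by_customer)


# Hypothetical records from two of the systems.
RECORDS = [
    ("SFA", {"lead_company": "Acme", "lead_value": 1000, "rep": "jo"}),
    ("OMS", {"buyer": "Acme", "order_total": 900}),
]
print(glue_by_customer(RECORDS))
```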

Semantic Models Relate Data by Business Meaning
(Diagram labels: Customer, Life Style, Life Events, Personal Network, Music, Entertainment, Preferences, Interests, Purchasing, Profession)

Implications to the Existing IT Architecture and Practices
- Manages Secure Access
- Extends Existing Investments in IT Architecture
- Self-Service Data Feeds and Analytics
- Easier Access to External Data
- Reduction of Data Mart Silos
- User Tools to Discover and Optimize Data Relationships
- Structured and Unstructured Data, Voice, and Video
- Builds Out Enterprise Data Models, with Integration Hub Capabilities
- Infrastructure Capacity Elasticity
- Data Analysis Automation

Data Lake Approach to Meeting Business Needs

Business Need: Onboard New Data
- Traditional Technologies and Practices: Comprehensive analysis creates rigid structure that is difficult to change, or minimal definition of data organization requires detailed understanding of data contents.
- Data Lake Technologies and Practices: Flexible data model can be revised or extended without redesign of the database. Agile, evolutionary refinement of the data organization, leveraging new insights as users work with the data.

Business Need: Connect External Data
- Traditional Technologies and Practices: External data is collected and loaded into the analytics repository. Data is streamed, or is refreshed on a scheduled frequency.
- Data Lake Technologies and Practices: External data can be sourced from databases, spreadsheets, Web pages, news feeds, and more; data is queried through common methods, without regard to location, with real-time values delivered at query time.

Business Need: Integrate Data between Business Units or Business Partners
- Traditional Technologies and Practices: Governance activities establish common vocabulary and data definitions. Shared data is copied to an integrated database. Organization-specific definitions may require duplicating certain data in marts.
- Data Lake Technologies and Practices: Additionally, systems of record publish existing data specifications or an ontology model; each organization defines data in a manner that is best suited for its business. Federation and virtualization features provide choices in which data to copy and which data to retain in the system(s) of record. All models can be supported through a single copy of the data, maintained in the data lake or system of record.

Business Need: Capture and Embed Expertise
- Traditional Technologies and Practices: Expertise often captured in the reporting and analytics; change management challenge when updates are required.
- Data Lake Technologies and Practices: Expertise captured in the data definitions; single, shared definition minimizes change management efforts.

Lessons learned from early adopters
- Prioritize: Prioritize data onboarding by the data's ability to contribute to customer engagement
- Onboard: Onboard data assets as they become available
- Connect: Connect to available internal and external data assets
- Load: Load the data unfiltered/untransformed
- Organize: Use models to provide organization to the data
- Customize: Create models that are tailored to the needs of the business groups
- Search: Make it easy to find data
- Secure: Manage security and privacy, but make it easy to authorize access to data that users need
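
The Load, Organize, and Customize lessons above can be sketched together: data is stored exactly as received, and each business group reads the same raw copy through its own model. This is only an illustrative sketch. The in-memory `lake` list, the JSON payloads, and the `marketing` and `finance` models are assumptions, not details from the presentation.

```python
import json
from typing import Dict, Iterator, List


def load_raw(lake: List[Dict[str, str]], source: str, payload: str) -> None:
    """Store the payload exactly as received, unfiltered and untransformed."""
    lake.append({"source": source, "payload": payload})


# One model per business group: concept name -> raw field name (hypothetical).
MODELS: Dict[str, Dict[str, str]] = {
    "marketing": {"customer": "name", "interest": "favorite_category"},
    "finance": {"customer": "name", "balance": "outstanding"},
}


def view(lake: List[Dict[str, str]], group: str) -> Iterator[Dict[str, object]]:
    """Apply a group's model at read time; the stored data is never changed."""
    model = MODELS[group]
    for entry in lake:
        raw = json.loads(entry["payload"])
        yield {concept: raw.get(field_name) for concept, field_name in model.items()}


lake: List[Dict[str, str]] = []
load_raw(lake, "crm",
         '{"name": "Acme", "favorite_category": "tools", "outstanding": 120}')

print(list(view(lake, "marketing")))
print(list(view(lake, "finance")))
```

Adding a model for a new business group in this sketch needs no reload of the data, since the models live outside the stored payloads.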

Addressing Challenges
- Privacy vs Personal Value
- Granularity of customer understanding
- Delivering strategic objectives when projects tend to have a technical focus
- Opening access to data
- Need for executive sponsorship
- Access to external data
- Establishing firewalls
- Persistent, pervasive data quality issues

Clues to better customer engagement will be found in the ever-growing volume of data that we're creating

A Data Lake Strategy helps you to create a personalized, engaging experience with each customer
- Visibility
- Provenance
- Agile
- Internet Scale
- Open, yet Secure
- Smart
- Self-Service
- Universal Data Access
- Adaptable

Questions?

Thank you!