Lily 2.4 What's New: Product Release Notes

Table of Contents

Purpose and Overview of this Document
Product Overview
General
  Prerequisites
The Lily Data Repository
  Side Effect Processor
  Multi-Repository Support
  Metadata
  Record Type Inheritance
  HDFS Name Node Support
The Lily Customer Database
  Foundations
  Master Records
  Hive Integration
  Interaction Summary Attribute Calculation Engine
  Lily Customer Database Explorer
  User Interface
  Item Explorer
Lily Customer Intelligence Applications
  Rules-Based Recommendation Strategies
  Knowledge-Based Recommendations
Provisioning

Purpose and Overview of this Document

This document is a comprehensive, Customer-oriented overview of all feature additions, changes and enhancements available in Lily Release 2.4. Lily 2.4 was released on 22 July 2013. The main focus of the release is updates to the Lily Customer Database and the Lily Data Repository.

Product Overview

Lily, the Customer Intelligence Platform, Release 2.4 is composed of the following products:

- Lily Data Repository (DR)
- Lily Customer Database (CDB)
- Lily Customer Intelligence Applications (CA)
- Lily Enterprise tools:
  o Cluster installation
  o ETL tool connectivity
  o Hive connectivity

The Lily Customer Database relies on the Data Repository, and the Customer Intelligence Applications rely on the Customer Database.

General

Prerequisites

Lily 2.4 has been certified against Cloudera's Distribution Including Apache Hadoop (CDH4) and therefore requires the following:

- CDH4.2
  o Supported Operating Systems: http://www.cloudera.com/content/cloudera-content/cloudera-docs/cdh4/4.2.0/cdh4-requirements-and-supported-Versions/cdhrsv_topic_1.html
  o Supported Databases: http://www.cloudera.com/content/cloudera-content/cloudera-docs/cdh4/4.2.0/cdh4-requirements-and-Supported-Versions/cdhrsv_topic_2.html
  o Supported JDK: http://www.cloudera.com/content/cloudera-content/cloudera-docs/cdh4/4.2.0/cdh4-requirements-and-Supported-Versions/cdhrsv_topic_3.html

The Lily Data Repository

Release 2.4 of the Lily Data Repository includes a number of key enhancements and extensions of functionality. With these enhancements NGDATA has strengthened and expanded the foundation of the Lily Data Repository for all users and continues to provide a best-of-breed solution for all clients.

Side Effect Processor

A new component, the Side Effect Processor (SEP), has been developed for the Data Repository to process asynchronous update operations more efficiently. The SEP sits at the core of Lily's triggering and indexing mechanism and replaces parts of the existing RowLog engine. With this enhancement, Lily is three times faster with indexing switched off and five times faster with indexing switched on. Maintaining multiple indexes at the same time has no additional performance impact. The SEP is also used for computing aggregates and maintaining additional data indexes in the Customer Database.

Multi-Repository Support

With this release, records in the Lily Data Repository can be stored across multiple Lily Repositories within HBase. This supports the logical segregation of data, which provides additional performance gains and data security. Multiple data repositories can be hosted in one shared installation, e.g. to operate a shared Lily for multiple departments.

Metadata

With this release Lily supports field-level metadata. The metadata consists of key/value pairs, where the key is a string and the value is a simple type such as String, Integer, Long, Float, Double, Boolean, or byte. This lets Users record information such as Source, Creator, Timestamp, Quality, etc. without having to predefine a field for it in the schema, which makes it simpler for an application to add such data. Among other uses, field metadata may be used to store access control information.
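As an illustration of the field-level metadata described above, the following Python sketch models metadata as key/value pairs attached to a field value and rejects values that fall outside the simple types listed. It does not use the Lily API. The names FieldValue and set_metadata, the sample keys, and the mapping from the listed types to Python types are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

# Python stand-ins for the simple types named in the text
# (String, Integer/Long, Float/Double, Boolean, byte). This mapping is an assumption.
ALLOWED_TYPES = (str, int, float, bool, bytes)


@dataclass
class FieldValue:
    """A field value plus free-form metadata (hypothetical model, not the Lily API)."""
    value: Any
    metadata: Dict[str, Any] = field(default_factory=dict)

    def set_metadata(self, key: str, val: Any) -> None:
        # Keys must be strings; values must be one of the simple types.
        if not isinstance(key, str):
            raise TypeError("metadata keys must be strings")
        if not isinstance(val, ALLOWED_TYPES):
            raise TypeError(f"unsupported metadata type: {type(val).__name__}")
        self.metadata[key] = val


# No schema change is needed to attach Source, Quality or similar information.
email = FieldValue("jane@example.com")
email.set_metadata("source", "crm-export")
email.set_metadata("quality", 0.9)
email.set_metadata("verified", True)
```

The point of the sketch is that the metadata keys are not declared anywhere in advance; only the value types are constrained.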

Record Type Inheritance

When creating a schema definition, the Lily Data Repository now supports the concept of Supertypes. This allows a User to define a base record type and then inherit the structure of that base type in another record type that extends it. As an example:

    recordtypes: [
      {
        name: person,
        fields: [
          { name: name, mandatory: true },
          { name: address, mandatory: true }
        ]
      },
      {
        name: Customer,
        fields: [
          { name: Customer Number, mandatory: true }
        ],
        supertypes: [ { name: person } ]
      },
      {
        name: employee,
        fields: [
          { name: employeenumber, mandatory: true }
        ],
        supertypes: [ { name: person } ]
      }
    ]
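How supertypes resolve to a flat field list can be sketched in Python as follows. The sketch mirrors the schema above as plain dictionaries and is not Lily's implementation. The helper effective_fields and the rule that inherited fields are listed before a type's own fields are assumptions made for illustration.

```python
# Record types from the example above, keyed by name (plain-dict mirror of the schema).
RECORD_TYPES = {
    "person": {"fields": ["name", "address"], "supertypes": []},
    "Customer": {"fields": ["Customer Number"], "supertypes": ["person"]},
    "employee": {"fields": ["employeenumber"], "supertypes": ["person"]},
}


def effective_fields(type_name, record_types=RECORD_TYPES, _path=()):
    """Return inherited fields first, then the type's own fields (hypothetical helper)."""
    if type_name in _path:
        raise ValueError(f"cyclic supertype reference at {type_name!r}")
    record_type = record_types[type_name]
    fields = []
    # Collect fields from every supertype, skipping names already seen.
    for parent in record_type["supertypes"]:
        for name in effective_fields(parent, record_types, _path + (type_name,)):
            if name not in fields:
                fields.append(name)
    # Then add the fields declared on the type itself.
    for name in record_type["fields"]:
        if name not in fields:
            fields.append(name)
    return fields


print(effective_fields("Customer"))  # ['name', 'address', 'Customer Number']
print(effective_fields("employee"))  # ['name', 'address', 'employeenumber']
```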

In this example, the Customer record will have the fields Name, Address, and Customer Number, while the Employee record will have Name, Address, and Employee Number. With record type inheritance, shared fields are defined once in the base type, which makes schema design more elegant.

HDFS Name Node Support

Lily 2.4 Enterprise ships with deployment configuration and instructions to support high availability of the Hadoop HDFS Name Node service. This allows enterprise clients to run Hadoop in a fault-tolerant configuration and to maintain system uptime in the event of server failure.

The Lily Customer Database

Foundations

At the foundation of the Lily Customer Database, this release delivers two enhancements:

- Interaction Timestamps
  o Lily adds a Timestamp to every Interaction whose Timestamp is not specified in the input data, ensuring that the Timestamp field always contains a value
- Automatic Creation of Item or Customer
  o Lily adds a new Item and/or Customer record when an Interaction is logged for a non-existent Item and/or Customer, ensuring that 100% of the Interactions are prepared for further analysis

Master Records

An enhancement to the Lily Customer Database allows Users to specify that Customer records from multiple sources actually belong to the same Customer. When this occurs, a Master Customer record is created to combine all of the data from the different sources. Each Source Record is kept intact for auditing purposes, while the Master Record is updated to reflect a combination of the most recent data. This gives Lily Users a single view of their Customers in a central place, based on data from different systems.

Hive Integration

Release 2.4 expands the existing Lily Hive integration with a number of enhancements and added features, highlighted below:

- Generate SQL-like queries, called HiveQL, for searching data in the Customer Database through a wizard in the Lily User Interface
  o This User Interface allows a User to generate a Hive query that can be run offline
  o The query can be copied into popular BI or data exploration tools that support Hive, such as Tableau, Toad4Cloud, etc.
- Search data in Lily via Hive using field-level metadata
- Store data set statistics in the HDFS layer of Lily

By expanding the Hive integration, NGDATA continues to improve the ability of Users to analyze the data that is stored in Lily.

Interaction Summary Attribute Calculation Engine

The Lily Customer Database has been updated with an engine for both real-time and batch calculation of Summary Attributes that can be defined by the Client, such as number of visits, average amount spent, time on website, etc. Lily supports the following basic aggregators: Max, Min, Average, Count, Distinct Count, Last, Sum.

Summary Attributes are calculated values based on the analysis of incoming and processed interactions. Summary Attributes are always stored on a single Customer or Item record, which is comparable to an SQL query with an aggregation (e.g. SUM) and a GROUP BY Customer.

The real-time functionality of this calculation engine has been developed as a SEP component that computes the Summary Attributes during ingestion of the data. The calculated values can also be used in the Lily Customer Database Explorer for Faceted Search, allowing for a richer User experience in the User Interface and a continued expansion of the functionality of Lily as a data exploration tool. The use of Summary Attributes within Faceted Search can be enabled or disabled per Summary Attribute. In addition to the real-time functionality, Lily also supports a batch-based rebuild of all Summary Attributes using the parallel processing power of MapReduce.

Lily Customer Database Explorer

The Lily Customer Database Explorer has been enhanced to support facet-based searches using graph widgets (bar chart, pie chart, etc.). This type of User Interface navigation gives the User visual insight into the Lily Customer Database and allows a step-by-step creation of Customer or Item target views. The Lily Customer Database Explorer also makes it possible to select one or more Segments to aid in segment analysis, through interactive use and selection within the graph-based User Interface widgets.
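The Summary Attribute calculation described in this section can be sketched in Python as follows. The sketch updates per-Customer aggregates incrementally as interactions arrive (the real-time path) and compares the result with a batch query of the kind the text mentions, an SQL aggregation with GROUP BY Customer, run here against an in-memory SQLite table. The class name, the sample data and the use of SQLite are assumptions for illustration only. Lily itself computes these values through the SEP and MapReduce, and nothing here reflects its actual API.

```python
import sqlite3
from collections import defaultdict


class SummaryAttributes:
    """Incrementally maintained aggregates for one Customer (hypothetical model)."""

    def __init__(self):
        self.count = 0
        self.total = 0.0
        self.maximum = None
        self.minimum = None
        self.last = None
        self.channels = set()  # backs the Distinct Count aggregator

    def update(self, amount, channel):
        self.count += 1
        self.total += amount
        self.maximum = amount if self.maximum is None else max(self.maximum, amount)
        self.minimum = amount if self.minimum is None else min(self.minimum, amount)
        self.last = amount
        self.channels.add(channel)

    @property
    def average(self):
        return self.total / self.count if self.count else None


# (customer, amount spent, channel) in arrival order; invented sample data.
interactions = [
    ("c1", 20.0, "web"),
    ("c2", 5.0, "store"),
    ("c1", 30.0, "web"),
    ("c1", 10.0, "store"),
]

# Real-time path: update the Customer's summary as each interaction is ingested.
realtime = defaultdict(SummaryAttributes)
for customer, amount, channel in interactions:
    realtime[customer].update(amount, channel)

# Batch path: rebuild the same values with an aggregation and GROUP BY customer.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE interactions (customer TEXT, amount REAL, channel TEXT)")
db.executemany("INSERT INTO interactions VALUES (?, ?, ?)", interactions)
batch = {
    row[0]: row[1:]
    for row in db.execute(
        "SELECT customer, COUNT(*), SUM(amount), MAX(amount), MIN(amount), "
        "AVG(amount), COUNT(DISTINCT channel) "
        "FROM interactions GROUP BY customer"
    )
}

# Both paths agree on every aggregate that has a direct SQL counterpart.
for customer, s in realtime.items():
    assert batch[customer] == (
        s.count, s.total, s.maximum, s.minimum, s.average, len(s.channels)
    )
```

The Last aggregator is kept only on the incremental side because it depends on arrival order, which a plain GROUP BY does not preserve.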

The graph-based widgets use faceted data provided by the Lily Customer Database as input and support the following types of facets:

- Aggregated Facets: these facets use interaction summary attributes, such as shopping frequency, click count, etc.
- Field Facets: these facets do not use summarized data but basic field values such as gender, region, etc.

User Interface

With the delivery of Release 2.4, the Customer Database Explorer can drill down to individual Customer records based on a Faceted Search over the following areas of Customer classification:

- Identity data
- Profile and segmentation data
- Behavioral data aggregated from customer interactions
- Preferences calculated by scoring and learning engines
- Source selection

A User can filter on selected facet values through an Include/Exclude function, allowing for easy audience or item selection through a query-by-example interface.
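The facet counts behind such widgets and the Include/Exclude filter can be sketched in Python as below. This is a plain in-memory illustration and not the Explorer's implementation. The record layout, the facet names and the helpers facet_counts and apply_filters are assumptions.

```python
from collections import Counter

# Invented sample records: "gender" and "region" are field facets, while
# "shopping_frequency" stands in for an aggregated (summary attribute) facet.
customers = [
    {"id": 1, "gender": "F", "region": "North", "shopping_frequency": "high"},
    {"id": 2, "gender": "M", "region": "North", "shopping_frequency": "low"},
    {"id": 3, "gender": "F", "region": "South", "shopping_frequency": "high"},
    {"id": 4, "gender": "F", "region": "North", "shopping_frequency": "low"},
]


def facet_counts(records, facet):
    """Count how many records fall under each value of one facet."""
    return Counter(record[facet] for record in records)


def apply_filters(records, include=None, exclude=None):
    """Keep records that match every include facet and no exclude facet."""
    include = include or {}
    exclude = exclude or {}
    selected = []
    for record in records:
        included = all(record[f] in values for f, values in include.items())
        excluded = any(record[f] in values for f, values in exclude.items())
        if included and not excluded:
            selected.append(record)
    return selected


# Query by example: include gender F, exclude low shopping frequency.
audience = apply_filters(
    customers,
    include={"gender": {"F"}},
    exclude={"shopping_frequency": {"low"}},
)
print(facet_counts(customers, "region"))
print([c["id"] for c in audience])  # [1, 3]
```

A bar or pie chart widget would plot the counts returned by facet_counts, and selecting a segment of the chart corresponds to adding an include or exclude entry.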

Additionally, a User can drill down to view the results of the filter on a results screen. For each individual record, the results show ID, Identity, Behavior and Preference.

Item Explorer

With the release of Lily 2.4, a User can explore Item data through a Faceted Search as well. The User begins the search by product category, filters on facets, and then delves into the data of individual Items.

Lily Customer Intelligence Applications

Rules-Based Recommendation Strategies

The Rules-Based Recommendation Strategies functionality has been enhanced to support Strategies through dynamic business rules for pre- and post-processing of Recommendations. This includes selecting recommendation engines, selecting different models within the Recommendation Engines, bypassing recommendation engine consultation, and influencing scores based on decision tables. A variety of Business Rules may be defined specific to a Client's needs, and a User can change the calculations or activate and deactivate individual rules without having to redeploy a new version of the application.

Knowledge-Based Recommendations

This release of Lily delivers a new Knowledge-Based Recommendation Engine, which uses knowledge about Users and Items to reason about which products meet a User's requirements. It supports the configuration and application of custom-made, domain-specific recommendation scores based on aggregates and overall Customer interaction data. This functionality allows Users to determine recommendations for a Customer based solely on that Customer's behavior, without taking into account the interactions of other Customers, as the Lily Collaborative Filtering Machine Learning Recommendation Engine does.
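A knowledge-based score of this kind can be sketched in Python as below. It is a minimal illustration that scores Items from one Customer's own aggregates and ignores all other Customers. The catalogue, the aggregate views_by_category, the scoring formula and the rule that already purchased Items are skipped are all assumptions, and none of them come from the Lily engine.

```python
# Invented catalogue and one Customer's own aggregates.
items = [
    {"id": "tent", "category": "outdoor"},
    {"id": "stove", "category": "outdoor"},
    {"id": "novel", "category": "books"},
    {"id": "lamp", "category": "home"},
]
customer = {
    "views_by_category": {"outdoor": 8, "books": 2},
    "purchased": {"tent"},
}


def recommend(customer, items, top_n=2):
    """Score items by the Customer's share of views in the item's category."""
    views = customer["views_by_category"]
    total = sum(views.values())
    scored = []
    for item in items:
        if item["id"] in customer["purchased"]:
            continue  # domain rule (assumed): do not recommend items already bought
        score = views.get(item["category"], 0) / total if total else 0.0
        scored.append((score, item["id"]))
    # Highest score first; ties are broken by item id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(item_id, score) for score, item_id in scored[:top_n]]


print(recommend(customer, items))  # [('stove', 0.8), ('novel', 0.2)]
```

Because the score depends only on the Customer's own aggregates, it can be produced for a Customer with no overlap with any other Customer, which is where a collaborative filtering approach has little to work with.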

Provisioning

With the 2.4 release of Lily, a new provisioning framework has been created to assist in the installation and upgrade of Lily clusters. This framework is a Python solution based on Fabric and is capable of installing an entire Lily cluster, including Cloudera Manager and CDH4. With this tool it is possible to create a Lily cluster on EC2 and have it running within 15 minutes. It replaces the existing Whirr-based installer and allows for a variety of deployment automations.

For more information on Lily, please contact us at info@ngdata.com.