Query-Time JOIN for Active Intelligence Engine (AIE)

Similar documents
Introducing Active Security

How to Accelerate Merger and Acquisition Synergies

Xcelerated Business Insights (xbi): Going beyond business intelligence to drive information value

Powering Knowledge Discovery. Insights from big data with Linguamatics I2E

Informatica Enterprise Information Catalog

The Value of Data Modeling for the Data-Driven Enterprise

Qlik s Associative Model

Hybrid Data Platform

Shine a Light on Dark Data with Vertica Flex Tables

UNLEASHING THE VALUE OF THE TERADATA UNIFIED DATA ARCHITECTURE WITH ALTERYX

TDWI Data Modeling. Data Analysis and Design for BI and Data Warehousing Systems

Data Analytics at Logitech Snowflake + Tableau = #Winning

Progress DataDirect For Business Intelligence And Analytics Vendors

The strategic advantage of OLAP and multidimensional analysis

Enterprise Multimedia Integration and Search

CONSOLIDATING RISK MANAGEMENT AND REGULATORY COMPLIANCE APPLICATIONS USING A UNIFIED DATA PLATFORM

Teradata Analyst Pack More Power to Analyze and Tune Your Data Warehouse for Optimal Performance

Built for Speed: Comparing Panoply and Amazon Redshift Rendering Performance Utilizing Tableau Visualizations

FINANCIAL REGULATORY REPORTING ACROSS AN EVOLVING SCHEMA

Whitepaper. Solving Complex Hierarchical Data Integration Issues. What is Complex Data? Types of Data

Fig 1.2: Relationship between DW, ODS and OLTP Systems

WEB SEARCH, FILTERING, AND TEXT MINING: TECHNOLOGY FOR A NEW ERA OF INFORMATION ACCESS

TIBCO Spotfire Statement of Direction. Spotfire Product Management

Provide Real-Time Data To Financial Applications

COURSE 20466D: IMPLEMENTING DATA MODELS AND REPORTS WITH MICROSOFT SQL SERVER

How to integrate data into Tableau

Guide Users along Information Pathways and Surf through the Data

Cisco APIC Enterprise Module Simplifies Network Operations

The Data Explosion. A Guide to Oracle s Data-Management Cloud Services

Microsoft FAST Search Server 2010 for SharePoint Evaluation Guide

2 The IBM Data Governance Unified Process

Qlik s Associative Model

1Z0-526

Overview of Data Services and Streaming Data Solution with Azure

20466C - Version: 1. Implementing Data Models and Reports with Microsoft SQL Server

Content Enrichment. An essential strategic capability for every publisher. Enriched content. Delivered.

TECHNOLOGY BRIEF: CA ERWIN DATA PROFILER. Combining Data Profiling and Data Modeling for Better Data Quality

Abstract. The Challenges. ESG Lab Review InterSystems IRIS Data Platform: A Unified, Efficient Data Platform for Fast Business Insight

The Business Case for a Web Content Management System. Published: July 2001

Spotfire and Tableau Positioning. Summary

VAT/GST Analytics by Deloitte User Guide August 2017

Tableau Metadata Model

Adds Leading AI Data Engine to Oracle Cloud Applications, Providing Dynamic and Insightful Company-Level Data to Power Even Smarter Decisions

Fast Innovation requires Fast IT

Composite Software Data Virtualization The Five Most Popular Uses of Data Virtualization

Deploying, Managing and Reusing R Models in an Enterprise Environment

Přehled novinek v SQL Server 2016

MODERNIZE INFRASTRUCTURE

Semantics, Metadata and Identifying Master Data

OLAP Introduction and Overview

Introduction to K2View Fabric

SAS BI Dashboard 3.1. User s Guide Second Edition

The Value of Data Governance for the Data-Driven Enterprise

EMC Documentum xdb. High-performance native XML database optimized for storing and querying large volumes of XML content

From Single Purpose to Multi Purpose Data Lakes. Thomas Niewel Technical Sales Director DACH Denodo Technologies March, 2019

Data safety for digital business. Veritas Backup Exec WHITE PAPER. One solution for hybrid, physical, and virtual environments.

Smart Data Center From Hitachi Vantara: Transform to an Agile, Learning Data Center

Self-Service Data Preparation for Qlik. Cookbook Series Self-Service Data Preparation for Qlik

IBM InfoSphere Information Analyzer

The Associative Difference

Chapter 13 XML: Extensible Markup Language

Massive Scalability With InterSystems IRIS Data Platform

ATA DRIVEN GLOBAL VISION CLOUD PLATFORM STRATEG N POWERFUL RELEVANT PERFORMANCE SOLUTION CLO IRTUAL BIG DATA SOLUTION ROI FLEXIBLE DATA DRIVEN V

Aufbau agiler BI- & Discovery-Applikationen mit Oracle Endeca. DOAG 2012 Nürnberg, 20. November Harald Erb Solution Architect BI & DWH

Improving Data Governance in Your Organization. Faire Co Regional Manger, Information Management Software, ASEAN

Quest Central for DB2

Virtuoso Infotech Pvt. Ltd.

Help Your Security Team Sleep at Night

BUILD BETTER MICROSOFT SQL SERVER SOLUTIONS Sales Conversation Card

HPE Partner Ready Digital Marketing Program

Study on the possibility of automatic conversion between BusinessObjects Universes and Oracle Business Intelligence Repositories

Strategic Briefing Paper Big Data

Data warehouse architecture consists of the following interconnected layers:

Informatica PowerExchange for Tableau User Guide

The Evolution of Search:

Low Friction Data Warehousing WITH PERSPECTIVE ILM DATA GOVERNOR

Full file at

User Configurable Semantic Natural Language Processing

Enterprise Data-warehouse (EDW) In Easy Steps

REPORTING AND QUERY TOOLS AND APPLICATIONS

Distributed Hybrid MDM, aka Virtual MDM Optional Add-on, for WhamTech SmartData Fabric

Modelling Structures in Data Mining Techniques

A Survey Of Different Text Mining Techniques Varsha C. Pande 1 and Dr. A.S. Khandelwal 2

Introducing IBM Lotus Sametime 7.5 software.

DC Area Business Objects Crystal User Group (DCABOCUG) Data Warehouse Architectures for Business Intelligence Reporting.

The functions performed by a typical DBMS are the following:

The Emerging Data Lake IT Strategy

Information Retrieval

20762B: DEVELOPING SQL DATABASES

Qlik. 10 key elements of a successful data strategy and modern analytics platform. February 2019 Julie Kae Executive Director, Qlik.

CA ARCserve Backup. Benefits. Overview. The CA Advantage

THE JOURNEY OVERVIEW THREE PHASES TO A SUCCESSFUL MIGRATION ADOPTION ACCENTURE IS 80% IN THE CLOUD

Tamr Technical Whitepaper

The #1 Key to Removing the Chaos. in Modern Analytical Environments

Accelerating BI on Hadoop: Full-Scan, Cubes or Indexes?

Management Information Systems Review Questions. Chapter 6 Foundations of Business Intelligence: Databases and Information Management

When, Where & Why to Use NoSQL?

DATACENTER SERVICES DATACENTER

Accelerate Your Enterprise Private Cloud Initiative

Microsoft. [MS20762]: Developing SQL Databases

Transcription:

Query-Time JOIN for Active Intelligence Engine (AIE) Ad hoc JOINing of Structured Data and Unstructured Content: An Attivio-Patented Breakthrough in Information- Centered Business Agility An Attivio Technology Brief

Executive Summary & Overview Which organizations will successfully grow revenue, cut costs and increase competitiveness despite a challenging economy? According to IDC, those winning organizations will be informationcentered, with the ability to monitor and mine information [and] unite access to all data sources structured and unstructured. 1 Attivio s Active Intelligence Engine (AIE) provides information-centered business agility by enabling users to access and analyze all related enterprise information regardless of data type (structured, semi-structured or unstructured) and regardless of original data source. This informational agility is made possible by AIE s schema-less universal index along with AIE s ability to perform ad hoc relational JOINs ( AIE JOIN ) of all related data and content, with no advance data modeling or pre-defined joins between information sources required. The ability to perform ad hoc JOINs with the agility offered by the universal index is groundbreaking functionality that was conceived, developed, and patented by Attivio. AIE JOIN enables highly agile integration of data and content through uniquely flexible capabilities that surpass those of SQL JOIN (databases) and search engines. A comparison of the limitations of relational database and search engine technologies are summarized in the table below. Relational Databases (RDBMS) Enterprise Search Active Intelligence Engine Integrates Data? Yes, however: Database model (schema) must be fully defined in advance to enable SQL JOIN Limited: Must flatten databases to enable ingestion, losing relationships between data Addition of new databases will require complete re-indexing and reflattening of all data. Some solutions may require building pre-defined joins to enable relating data and content Yes: Complete with the ability to JOIN related data and content, with no flattening of databases and no advance data modeling required Integrates Content? Limited: Only content stored within a database; also may not perform full-text search of that content SQL text operators (e.g., LIKE, CONTAINS) are limited to pattern recognition, lacking linguistics and text analytics found in enterprise search Yes: with strong linguistic and text analytics capabilities to find related and relevant content with Google-like simplicity Yes: Provides the best capabilities of enterprise search and business intelligence (BI) to JOIN related and relevant content and data with Google-like simplicity or using BI tools and other access options 2

This technology brief will review in further detail the flexibility and agility of AIE JOIN and explore an example of AIE JOIN in action, powering a financial services dashboard that freely integrates and presents all data and content related to user queries. AIE JOIN: Combining the Best of Databases & Enterprise Search Accessing and correlating structured data and unstructured content together from separate information silos requires significant manual effort and precious IT resources, often resulting in stale, still-incomplete information. Attivio has solved this critical problem through its unified information access (UIA) platform, indexing, joining and presenting previously isolated structured and unstructured information sources, and enabling the comprehensive, easily-accessed, businessbuilding insights organizations are looking for, using the end user s tool of choice. Traditionally, the methods for accessing unstructured content and structured data were divided: enterprise search engines were used for finding unstructured content (documents, email, etc.) and relational database systems (RDBMS) were needed for retrieving data. Flattening Databases Loses Critical Information Meanwhile, enterprise search engines did not look at databases, leaving their contents out of search results. More recently, some search engines added an ability to ingest data by converting relational data to an index model. This requires running a query in the database to produce a single table a flattened (denormalized) view that can be indexed such that each row is a document, and each column is a field within it. Unfortunately, such flattened databases are quite bloated size-wise, as information originally included just once within one table of the original database now must be duplicated repeatedly. Even worse, the relationships between data is discarded, severely diminishing its business value. This sample database (Fig.1) contains three simple tables: Users, News, and Comments. Fig.1 Flattening these tables using User as the parent record enables querying users and their comments; for example, find users from the domain attivio.com whose comment included the word agree. Including news content in the query would require extensive duplication (Fig.2). Fig.2 Unfortunately, despite this flattened data no longer supports precise querying against news articles. For example, if you search for users who used the term agree in a comment for a news article with search in its title, you would get back both of the users in Fig. 2 above, which is incorrect: In the 3

first case, the agree comment relates to the second article, not the first one that contains the desired term search. This is the downside of flattening: it is a lossy process in which useful relational data is discarded. In contrast, Attivio's Active Intelligence Engine (AIE) uses a hybrid index structure that supports both flat and relational indexing models. AIE's query-time JOIN operator allows selection of relationships between tables as would be expressed in SQL, and use full-text search at the same time. JOIN elegantly solves the problems that result from the translation of relational data to inverted indexing. For example, AIE can index the users, news, and comment tables as separate tables - no flattening required - and queries, using the JOIN operator, can answer virtually any question about the data. Attivio s AIE JOIN capability is the key to solving the problem of assembling data from both content sources and databases without losing data relationships and without requiring building a data model (schema) in advance. The Devil s in the Details : Limitations of SQL JOIN While database developers are very well versed in the use of SQL JOIN, it is important to recognize the compelling new power AIE JOIN offers beyond SQL JOIN. JOIN originated as a Structured Query Language (SQL) clause for use in relational databases and is not available in search engines. The SQL JOIN operation combines records from two or more tables in a relational database, linking specific fields to create a single result set. SQL JOIN allows efficiency by enabling smaller table structures. Though very powerful, SQL JOINs, as previously noted, are limited to database use only, not content repositories. Relational databases and JOIN semantics, require fully predefined schemas and normalized data. In addition, most JOINs will need to be analyzed and tuned by Database Administrators for acceptable performance. All of this can become a very long, cumbersome process, demanding substantial time and resources to plan, develop, test and maintain. Most database methods do not perform full-text searching of content; those that do require the content to be stored and organized in database tables. It should be noted that SQL text operators such as LIKE and CONTAINS are limited to finding documents based on pattern recognition. Effective textual searching requires linguistics and text analytics commonly found in enterprise search technologies; including: Stemming reduces words to their root or stem word; e.g.; identifying fish as the stem word for fishing and fishy; while the stem word for better would be identified as good Lemmatization works similarly to stemming, but operates on the full text, recognizing parts of speech (e.g., using stemming and lemmatization, a search for bike would find documents with bicycle, biking and cycling as well as bike) without relying on string pattern recognition And much more: spelling correction, synonym expansion; entity extraction (people, places, etc.), key phrase extraction, etc. Key Innovations of Attivio s Patented AIE JOIN AIE JOIN was conceived, developed and patented by Attivio; it is not found in any other enterprise information management product. AIE JOIN offers unique functionality beyond the SQL JOIN of relational databases (RDBMS). The key innovations offered by AIE are: 4

Indexing databases on a table-by-table basis, retaining their relational structure (no flattening ) Treating data and content equally in the same universal index Making JOINs available for all unstructured content sources as well as all structured data sources included in the universal index Executing the JOIN operation on an ad hoc basis at query time, whereas database JOINs must be set up prior to queries Not requiring predefined data schemas (table definitions), relationships (any sub-queries can be joined), or models for how data is to be extracted from sources The JOIN operator in AIE, which is part of Attivio s Advanced Query Language, allows all matching information to be retrieved from disparate sources without any of the prior set-up and data modeling that are required for databases. In other words, with AIE you do not have to JOIN tables in advance. The AIE JOIN is executed when the query is issued. The information that is JOINed can originate in content or databases, including different databases whose tables are not linked with a SQL JOIN. This means that with one query, AIE JOIN allows you to gather and analyze all types of information, regardless of source or format. Queries that use AIE JOIN ask for multiple pieces of information, containing a string of at least two sub-queries, with each sub-query requesting one type of information. For example, a query with sub-queries could ask for January sales (one sub-query) for XYZ products (one sub-query) included in Wall Street Journal articles (one sub-query). AIE sub-queries can be a table or a complex query (including nested JOINs). The sub-queries are dynamically combined on any fields. Unlike relational databases, common keys are not required and relationships among fields and tables do not need to be pre-defined. Instead, a record is created on the fly to contain the data and content that match the query. AIE also leverages its JOIN operator and universal index to easily enforce the most complex user permissions with high performance. AIE s Active Security model stores permissions for users, groups, 5

roles, etc. and access control lists (ACLs) in a separate security table in the AIE universal index, without requiring a separate database. At query time, AIE will automatically JOIN and present all information that matches a user query, as well as the user s permissions. Only the data and content the user is allowed to see is presented. The scalability and fast performance of AIE s Active Security are expected in the database world, but not in the enterprise search world, where typical document-centric security models ( early binding and late binding ) are difficult to manage, require substantial ongoing system maintenance, and pose very challenging technical tasks, such as updating metadata and other parts of a record. No other enterprise information platform provides the agility of AIE and the scale and performance of Active Security yet another way Attivio leverages its JOIN capability. AIE JOIN in Action: A Financial/BI Dashboard Example This sample financial services customer dashboard leverages AIE JOIN to let users explore integrated and correlated data and content in ways not otherwise possible, including combining customer s past transactions, portfolio allocations, customer s areas of interest, preferences, etc. The dashboard is populated from large sets of structured data and unstructured content, all universally indexed in AIE. Behind the scenes, AIE leverages its universal index to JOIN structured data from transactional and CRM databases as well as unstructured content customer emails, chat logs, phone call transcripts, documents, etc. AIE also utilizes advanced text analytics, including categorizing previous customer inquiries, calculating levels of positive or negative customer sentiment expressed by the customer previously towards different financial investment products, etc. A query for any entity (noun phrase) in the search box, for example, presents a set of information data and content, JOINed on an ad hoc basis at query time about the entity(ies) the user requests. When the user issues a simple query, for example: John O. Smith (or selects this name from facet recommendations), the JOIN query is executed, and the customer management interface is populated with data and content related to that customer. Because AIE natively supports SQL with ODBC/JDBC connectivity, popular agile business intelligence (BI) tools such as TIBCO Spotfire, Tableau and QlikView can be used as a visual front end to access and present the unified data and content in AIE s universal index. 6

This unified customer financial data and content made possible by AIE s query-time JOIN capability empowers customer reps to proactively manage customer conversations in a helpful and relevant manner, such as directing a customer towards products and services for which that customer has previously expressed interest and positive sentiment. The result: significantly enhanced new revenue opportunities from highly satisfied existing customers. By comparison, the standard SQL required to accomplish similar results would: Depend on significant up-front work to define the data model, design the schema and structure the content in tables Access only database records; can t access unstructured content via search Require two multi-line queries (one for data, one for content) Offer only faceted navigation of structured data or no faceted navigation at all The above-noted complexity relate simply to formulating standard SQL similar to the results already provided behind the scenes by AIE JOIN. It does not address the challenges of integrating essential text analytics, sentiment analysis, etc. which, of course, are already incorporated within AIE. Additionally, AIE s syntax for building this dashboard is much simpler than SQL; for example, the SQL UNION that unites the two queries to access the data and content requires the same number of columns (with matching types) from both the data and content queries. AIE does not have this reliance on columns or on pre-defined schemas, so queries can be more flexible. Conclusion: AIE JOIN Enables Information-Centered Business Agility AIE JOIN ensures that all relevant enterprise information structured data, semi-structured data and unstructured content alike can be accessed with an unrivaled level of information agility: Schema-less indexing that enables dynamic, relational joining of all structured and unstructured information No advance data modeling required Databases retain their relational structure; no flattening of databases required Ad hoc JOINs executed at query time, returning all related data and content Information returned by AIE can be used to populate charts, tables or other interfaces and can include both data and content, including: Data tables Charts KPIs Lists of relevant unstructured content Data extracted from content Data calculated based on text analysis of content (e.g., sentiment analysis), in-engine analytics, and more As a powerful Analyze Everything platform, AIE provides highly flexible and easily implemented indexing, joining and presentation of all enterprise information, regardless of data type or desired end user access method. 7

About Attivio Attivio s unified information access platform, the Active Intelligence Engine (AIE), redefines the business impact of our customers information assets, so they can quickly seize opportunities, solve critical challenges and fulfill their strategic vision. Attivio correlates disparate silos of structured data and unstructured content in ways never before possible. Offering both intuitive search capabilities and the power of SQL, AIE seamlessly integrates with existing BI and big data tools to reveal insight that matters, through the access method that best suits each user s technical skills and priorities. Please visit us at www.attivio.com. Attivio and Active Intelligence Engine are trademarks of Attivio, Inc. All other names are trademarks of their respective companies. Attivio, Inc. 275 Grove Street Newton, MA 02466 USA o +1.857.226.5040 f +1.857.226.5072 info@attivio.com www.attivio.com Endnote 1 Sue Feldman, IDC, Forget the Query Box: Explore, Discover, Find (October 2011). 2013 Attivio, Inc. All rights reserved. Attivio, Active Intelligence Engine, and all other related logos and product names are registered trademarks or trademarks of Attivio in the United States and/or other countries. All other company, product, and service names are the property of their respective holders and may be registered trademarks or trademarks in the United States and/or other countries. 8