An Oracle White Paper October Oracle Social Cloud Platform Text Analytics

Similar documents
An Oracle White Paper November Primavera Unifier Integration Overview: A Web Services Integration Approach

Creating Custom Project Administrator Role to Review Project Performance and Analyze KPI Categories

Installation Instructions: Oracle XML DB XFILES Demonstration. An Oracle White Paper: November 2011

Oracle Data Provider for.net Microsoft.NET Core and Entity Framework Core O R A C L E S T A T E M E N T O F D I R E C T I O N F E B R U A R Y

An Oracle White Paper December, 3 rd Oracle Metadata Management v New Features Overview

October Oracle Application Express Statement of Direction

Automatic Receipts Reversal Processing

Oracle Learn Cloud. Taleo Release 16B.1. Release Content Document

Veritas NetBackup and Oracle Cloud Infrastructure Object Storage ORACLE HOW TO GUIDE FEBRUARY 2018

Correction Documents for Poland

Oracle NoSQL Database For Time Series Data O R A C L E W H I T E P A P E R D E C E M B E R

Generate Invoice and Revenue for Labor Transactions Based on Rates Defined for Project and Task

Configuring Oracle Business Intelligence Enterprise Edition to Support Teradata Database Query Banding

Tutorial on How to Publish an OCI Image Listing

Oracle CIoud Infrastructure Load Balancing Connectivity with Ravello O R A C L E W H I T E P A P E R M A R C H

Using the Oracle Business Intelligence Publisher Memory Guard Features. August 2013

Handling Memory Ordering in Multithreaded Applications with Oracle Solaris Studio 12 Update 2: Part 2, Memory Barriers and Memory Fences

April Understanding Federated Single Sign-On (SSO) Process

An Oracle White Paper October The New Oracle Enterprise Manager Database Control 11g Release 2 Now Managing Oracle Clusterware

Oracle Enterprise Data Quality New Features Overview

Loading User Update Requests Using HCM Data Loader

StorageTek ACSLS Manager Software Overview and Frequently Asked Questions

Oracle Spatial and Graph: Benchmarking a Trillion Edges RDF Graph ORACLE WHITE PAPER NOVEMBER 2016

Oracle Cloud Applications. Oracle Transactional Business Intelligence BI Catalog Folder Management. Release 11+

Extreme Performance Platform for Real-Time Streaming Analytics

Oracle DIVArchive Storage Plan Manager

Load Project Organizations Using HCM Data Loader O R A C L E P P M C L O U D S E R V I C E S S O L U T I O N O V E R V I E W A U G U S T 2018

Achieving High Availability with Oracle Cloud Infrastructure Ravello Service O R A C L E W H I T E P A P E R J U N E

JD Edwards EnterpriseOne Licensing

An Oracle White Paper October Deploying and Developing Oracle Application Express with Oracle Database 12c

Working with Time Zones in Oracle Business Intelligence Publisher ORACLE WHITE PAPER JULY 2014

A Distinctive View across the Continuum of Care with Oracle Healthcare Master Person Index ORACLE WHITE PAPER NOVEMBER 2015

Oracle Enterprise Performance Reporting Cloud. What s New in September 2016 Release (16.09)

VISUAL APPLICATION CREATION AND PUBLISHING FOR ANYONE

An Oracle White Paper June StorageTek In-Drive Reclaim Accelerator for the StorageTek T10000B Tape Drive and StorageTek Virtual Storage Manager

Oracle WebLogic Portal O R A C L E S T A T EM EN T O F D I R E C T IO N F E B R U A R Y 2016

Oracle Database Vault

An Oracle White Paper September Security and the Oracle Database Cloud Service

Oracle Data Masking and Subsetting

Oracle Social Network

Siebel CRM Applications on Oracle Ravello Cloud Service ORACLE WHITE PAPER AUGUST 2017

Oracle Service Cloud Agent Browser UI. November What s New

Hard Partitioning with Oracle VM Server for SPARC O R A C L E W H I T E P A P E R J U L Y

Oracle Event Processing Extreme Performance on Sparc T5

An Oracle White Paper July Oracle WebCenter Portal: Copying a Runtime-Created Skin to a Portlet Producer

Repairing the Broken State of Data Protection

An Oracle Technical Article March Certification with Oracle Linux 4

Integrating Oracle SuperCluster Engineered Systems with a Data Center s 1 GbE and 10 GbE Networks Using Oracle Switch ES1-24

Oracle Best Practices for Managing Fusion Application: Discovery of Fusion Instance in Enterprise Manager Cloud Control 12c

PeopleSoft Fluid Navigation Standards

An Oracle White Paper June Exadata Hybrid Columnar Compression (EHCC)

Oracle Service Registry - Oracle Enterprise Gateway Integration Guide

DATA INTEGRATION PLATFORM CLOUD. Experience Powerful Data Integration in the Cloud

Migration Best Practices for Oracle Access Manager 10gR3 deployments O R A C L E W H I T E P A P E R M A R C H 2015

Oracle Secure Backup. Getting Started. with Cloud Storage Devices O R A C L E W H I T E P A P E R F E B R U A R Y

Oracle Clusterware 18c Technical Overview O R A C L E W H I T E P A P E R F E B R U A R Y

Oracle Utilities CC&B V2.3.1 and MDM V2.0.1 Integrations. Utility Reference Model Synchronize Master Data

Leverage the Oracle Data Integration Platform Inside Azure and Amazon Cloud

TABLE OF CONTENTS DOCUMENT HISTORY 3

An Oracle White Paper September, Oracle Real User Experience Insight Server Requirements

Oracle Fusion Configurator

Oracle Big Data SQL. Release 3.2. Rich SQL Processing on All Data

Differentiate Your Business with Oracle PartnerNetwork. Specialized. Recognized by Oracle. Preferred by Customers.

Subledger Accounting Reporting Journals Reports

Frequently Asked Questions Oracle Content Management Integration. An Oracle White Paper June 2007

See What's Coming in Oracle Taleo Business Edition Cloud Service

Oracle Express CPQ for Salesforce.com

August 6, Oracle APEX Statement of Direction

Benefits of an Exclusive Multimaster Deployment of Oracle Directory Server Enterprise Edition

Oracle NoSQL Database Parent-Child Joins and Aggregation O R A C L E W H I T E P A P E R A P R I L,

An Oracle White Paper December Oracle Exadata Database Machine Warehouse Architectural Comparisons

See What's Coming in Oracle CPQ Cloud

Managing Metadata with Oracle Data Integrator. An Oracle Data Integrator Technical Brief Updated December 2006

Oracle Big Data Connectors

Handling Memory Ordering in Multithreaded Applications with Oracle Solaris Studio 12 Update 2: Part 1, Compiler Barriers

Migrating VMs from VMware vsphere to Oracle Private Cloud Appliance O R A C L E W H I T E P A P E R O C T O B E R

Differentiate Your Business with Oracle PartnerNetwork. Specialized. Recognized by Oracle. Preferred by Customers.

Sun Fire X4170 M2 Server Frequently Asked Questions

TABLE OF CONTENTS DOCUMENT HISTORY 3

Oracle Flash Storage System QoS Plus Operation and Best Practices ORACLE WHITE PAPER OCTOBER 2016

An Oracle Technical White Paper September Detecting and Resolving Oracle Solaris LUN Alignment Problems

Cloud Operations for Oracle Cloud Machine ORACLE WHITE PAPER MARCH 2017

An Oracle White Paper October Release Notes - V Oracle Utilities Application Framework

Oracle Mobile Hub. Complete Mobile Platform

Product Release Notes

An Oracle White Paper July Methods for Downgrading from Oracle Database 11g Release 2

Product Release Notes

NOSQL DATABASE CLOUD SERVICE. Flexible Data Models. Zero Administration. Automatic Scaling.

Oracle Enterprise Performance Management Cloud

Technical Upgrade Guidance SEA->SIA migration

Oracle Learn Cloud. What s New in Release 15B.1

Rapid Bottleneck Identification A Better Way to do Load Testing. An Oracle White Paper June 2008

Pricing Cloud: Upgrading to R13 - Manual Price Adjustments from the R11/R12 Price Override Solution O R A C L E W H I T E P A P E R A P R I L

Technical White Paper August Recovering from Catastrophic Failures Using Data Replicator Software for Data Replication

RAC Database on Oracle Ravello Cloud Service O R A C L E W H I T E P A P E R A U G U S T 2017

Oracle Database Security Assessment Tool

Oracle JD Edwards EnterpriseOne Object Usage Tracking Performance Characterization Using JD Edwards EnterpriseOne Object Usage Tracking

Oracle NoSQL Database Parent-Child Joins and Aggregation O R A C L E W H I T E P A P E R M A Y,

Oracle Responsys Release. Release Content Document. August 2016

Automatic Data Optimization with Oracle Database 12c O R A C L E W H I T E P A P E R S E P T E M B E R

Transcription:

An Oracle White Paper October 2012 Oracle Social Cloud Platform Text Analytics

Executive Overview Oracle s social cloud text analytics platform is able to process unstructured text-based conversations to find the signal through its method of categorization and enrichment. First, the platform defines posts as relevant or irrelevant to a category ( topic ) based on a combination of user defined key words and semantic accept or reject filters in context, instead of using key words alone. Second, the platform enriches categorized posts by looking for snippets (the 1-2 sentences) within posts that semantically match to key indicators (consumer intentions, interests, or psychographics) such as purchase intent, problems, complaints, referrals, single moms, etc. The platform also scores snippets by sentiment, readability and subjectivity, and scores authors by demographics and Klout/influencer scores (where data is available). This enrichment process adds multiple and valuable meta-data to relevant posts and authors who made them, to support effective routing of posts within business process workflows, CX applications, and big data analytics and business intelligence reporting systems. Because of this approach, the result from the platform is: Accurate The platform cleanses and removes irrelevant posts that would be included if using keywords only, due to multiple meanings and/or ambiguous meaning of many key words; Comprehensive The platform includes more relevant posts by avoiding the need to over filter as a method for excluding irrelevant posts; Insightful The platform discovers new topics or dimensions that are relevant to a business through open-ended white-space analysis Clean The platform cleans all incoming raw data to eliminate duplicate posts and SPAM. The platform s cleaning algorithms have been refined and optimized using social media content. 1

Oracle s Blended Technology Listening Platform Oracle has developed unique and proprietary IP around its text processing approach and specifically in its development and application of latent semantic analysis (LSA) as a method to automatically categorize and enrich text content in near real-time. Brief Overview of Latent Semantic Analysis Oracle s LSA is an advanced form of statistical language modeling (SLM). LSA is a method for exposing latent contextual-meaning within a large body of text. It does this by looking at word usage (specifically, word co-occurrence) within a set of documents. Words, which appear in similar contexts, are assumed to have similar meaning and/or relational significance. LSA constructs a large matrix of term-document association data. Each cell in the matrix contains a weighted value, which is proportional to the number of times each term appears within each document in the set. The weights are structured such that rarer terms have greater weights. This allows more relevant terms to carry more weight to construct more accurate vectors of what/how consumers are talking about a category, brand, or product. This matrix is then analyzed and reduced using singular value decomposition (SVD), a method of statistical transformation. SVD identifies relevant correlations between specific combinations of terms and documents. The process constructs document and term spaces for each term and document with an n-dimensional vector within the space. Following this process, each document occupies a specified position within the semantic space, with similar documents appearing near each other. Document-to-document similarity can then be performed by using simple vector distance measures such as cosine or Euclidean. Cosine is a trigonometric function that calculates the length of the adjacent sides (x and y-axis) of a hypotenuse. A Euclidean measure computes the square root of the sum of the squares of the differences between corresponding values deriving the shortest distance between two points that form a 90-degree angle on an x and y-axis. Oracle combines its LSA technology with other forms of text processing, including: Keyword/Boolean search to assist in quickly initiating topic or dimension categorization queries; Natural language processing (NLP) to conduct speech analysis to further enrich snippet data through sentiment and readability scoring, voice of customer, and word usage analysis. 2

Pre-Filter: Start with Keyword Search on one word that s all Narrow large data-sets for analysis with semantic processing Speed to Insights: Latent Semantic Analysis High quality simliarity measures High performance for speed and precision Easier Maintenance Accurate Categorization Spam identification Speech Analysis Natural Languge Processing Parsing content to diagram speech Sentiment scoring Figure 1. Oracle s Text Analytics Processing Approach Oracle has implemented this approach, as compared to most other sophisticated vendors in the marketplace (who rely on combination of keyword/natural language processing), for the following reasons: Cost of Client Setup - For new domains, other vendors can spend a lot of time teaching the system the exact meaning of the vocabulary for each new area a client wants to build topics on. For example having to teach the system the difference between "drive" as a verb in an automotive context and "drive" as a piece of hardware in a computer context. Oracle s latent semantic analysis captures this contextual difference automatically and on the fly, using semantic theming (also known as auto-clustering). Processing Speed - with social media content volumes reaching hundreds of millions of posts per day it is essential that content is processed as quickly as possible. Oracle s Latent Semantic Analysis approach enables our system to operate at scale, categorizing hundreds of thousands of messages against tens of thousands of topics in minutes. These speeds can be difficult to achieve in systems that require interaction with extensive knowledge bases to determine a post s meaning. Data Accuracy - Oracle s categorization technique delivers exceptional categorization accuracy without sacrificing ease of topic setup or processing speed. Oracle s blended text processing approach is most suitable for clients who want a high degree of categorization accuracy, precise enrichment and automated scoring from noisy consumer conversations in an efficient and effective manner. Oracle s Messaging Processing Platform Process Oracle s text analytics approach, part of its Oracle Social Engagement and Monitoring Cloud Service, delivers its insights by consuming massive amounts of social conversation and then applying unique IP processes to extract meaningful, relevant information from 3

that mountain of data. With access to this information, Oracle s customers are able to make intelligent business decisions and take more targeted engagement action. Figure 2. Oracle Social Cloud Platform Text Analytics Message Processing Pipeline Oracle s message processing occurs across three stages: 1. Cleaning 2. Categorization 3. Enrichment Snippetization Dimension Assignment Scoring Stage 1: Cleaning The first stage of message processing removes SPAM and duplicate posts from the incoming data stream. De-duping De-duping is accomplished in two ways: the Oracle system accepts only a single, unique instance of a URL for each post. This prevents the exact same social media message from entering the system as a result of a provider error or multiple providers collecting the same message. When the URL is distinct but the post content is exactly the same, such as with multiple posts for blogs, press releases and boards, an algorithm identifies the duplicate content and prevents the post from entering the system. (Note: Oracle does not eliminate duplicate content in the case of tweets because re-tweets are considered to be relevant message volume because they come from unique authors.) SPAM Removal 4

To remove SPAM, Oracle s system uses a Bayesian SPAM filter that has been trained and refined with social media content. Stage 2: Categorization The second stage of message processing identifies and assigns relevant topics to each post. This process utilizes the keyword and semantic filters that have been defined for each topic by end users. Users create topics, starting with a simple key-word based query, and then using the Latent Semantic Analysis engine s auto-clustering capability to choose which types of conversations to accept (green), and which types of conversations to reject (red). This straight forward process allows a user to achieve 95%+ categorization accuracy for both ambiguous and unambiguous search terms. Figure 3. Diagram Signal Accuracy of User Selection of Accept and Reject Filters to achieve This stage of processing is at the post level and produces a subset of relevant posts for each topic. Stage 3: Enrichment Snippetization/Indicator Assignment The final stage of message processing enriches the relevant posts for each topic by attributing the posts with values for each assigned indicator. To achieve the best accuracy, this process occurs at the snippet level. A snippet is a subset of a post that is made up of the words in close proximity to user defined anchor terms (a.k.a. the key word(s)). Using a snippet rather than a full post results in a better indicator assessment because the words being evaluated are in close proximity to the anchor terms. See Exhibit A for a Sample of Oracle s Library of Indicators (semantic based) Scoring Following snippetization and dimension assignment, each snippet is scored for the following attributes: Sentiment negative, neutral, positive Readability complexity of writing style (0..100) Subjectivity identification of individual expression (0..100) 5

For each author, scoring is provided for demographics when available (age, gender, and location) and Klout (influencer) score. These values are captured by our data aggregators and made available within our system. Conclusion - Post Processed, Tagged Conversations & Authors Oracle s Social Cloud Platform for Text Analytics can process large volumes of social conversations consistently, fast, at scale with topic/indicator assignment and other scoring meta-data for consumption in Oracle s Social Engagement and Monitoring Cloud Service, or via a data feed for consumption within appropriate client systems. Specifically, once configured the platform, the result is the ability to: Achieve pure signal by identifying social media posts as relevant to a specific business opportunity or problem this is the topic creation/filtering part of CI user setup; Define how the posts are relevant in a way that leverages the content into actionable information this is the enrichment/dimension-creation part of CI user setup. 6

Appendix A Sample of Oracle s Library of Indicators The following are semantically trained indicators that are used as KPIs for specific industries. Additional ones can be created as required. 7

October 2012 Oracle Corporation World Headquarters 500 Oracle Parkway Redwood Shores, CA 94065 U.S.A. Worldwide Inquiries: Phone: +1.650.506.7000 Fax: +1.650.506.7200 Copyright 2012, Oracle and/or its affiliates. All rights reserved. This document is provided for information purposes only and the contents hereof are subject to change without notice. This document is not warranted to be error-free, nor subject to any other warranties or conditions, whether expressed orally or implied in law, including implied warranties and conditions of merchantability or fitness for a particular purpose. We specifically disclaim any liability with respect to this document and no contractual obligations are formed either directly or indirectly by this document. This document may not be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose, without our prior written permission. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. AMD, Opteron, the AMD logo, and the AMD Opteron logo are trademarks or registered trademarks of Advanced Micro Devices. UNIX is a registered trademark of The Open Group. 0912 oracle.com