Social Business Intelligence in Action

Similar documents
Meta-Stars: Multidimensional Modeling for Social Business Intelligence

Social Business Intelligence in Action

Social Business Intelligence

Variety-Aware OLAP of Document-Oriented Databases

Parmenides. Semi-automatic. Ontology. construction and maintenance. Ontology. Document convertor/basic processing. Linguistic. Background knowledge

Meta-Stars: Multidimensional Modeling for Social Business Intelligence

Q1) Describe business intelligence system development phases? (6 marks)

Evaluating the Usefulness of Sentiment Information for Focused Crawlers

An Approach To Web Content Mining

Sentiment Web Mining Architecture - Shahriar Movafaghi

Modern Software Engineering Methodologies Meet Data Warehouse Design: 4WD

Prof. Ahmet Süerdem Istanbul Bilgi University London School of Economics

Empowering People with Knowledge the Next Frontier for Web Search. Wei-Ying Ma Assistant Managing Director Microsoft Research Asia

Chapter 3. Foundations of Business Intelligence: Databases and Information Management

An Oracle White Paper October Oracle Social Cloud Platform Text Analytics

Data Driven Performance Repository to Classify and Retrieve Storage Tuning Profiles Ismael Solis Moreno IBM

Managing Information Resources

Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p.

Things to consider when using Semantics in your Information Management strategy. Toby Conrad Smartlogic

Built for Speed: Comparing Panoply and Amazon Redshift Rendering Performance Utilizing Tableau Visualizations

Proposal of a new Data Warehouse Architecture Reference Model

Content Enrichment. An essential strategic capability for every publisher. Enriched content. Delivered.

A Survey Of Different Text Mining Techniques Varsha C. Pande 1 and Dr. A.S. Khandelwal 2

Information Retrieval

Powering Knowledge Discovery. Insights from big data with Linguamatics I2E

Question Bank. 4) It is the source of information later delivered to data marts.

ANNUAL REPORT Visit us at project.eu Supported by. Mission

IDU0010 ERP,CRM ja DW süsteemid Loeng 5 DW concepts. Enn Õunapuu

Part I: Data Mining Foundations

Enhancing applications with Cognitive APIs IBM Corporation

Data Management Glossary

A Database Driven Initial Ontology for Crisis, Tragedy, and Recovery

USER GUIDE DASHBOARD OVERVIEW A STEP BY STEP GUIDE

Text Mining. Representation of Text Documents

Implementing and Maintaining Microsoft SQL Server 2008 Analysis Services

BIG DATA TECHNOLOGIES: WHAT EVERY MANAGER NEEDS TO KNOW ANALYTICS AND FINANCIAL INNOVATION CONFERENCE JUNE 26-29,

Bing Liu. Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data. With 177 Figures. Springer

Building Search Applications

Comparing Sentiment Engine Performance on Reviews and Tweets

Data Mining and Warehousing

Developing SQL Data Models

Analysis of Nokia Customer Tweets with SAS Enterprise Miner and SAS Sentiment Analysis Studio

20489: Developing Microsoft SharePoint Server 2013 Advanced Solutions

Lily: A Geo-Enhanced Library for Location Intelligence

A B2B Search Engine. Abstract. Motivation. Challenges. Technical Report

Knowledge Engineering with Semantic Web Technologies

Contextual Search using Cognitive Discovery Capabilities

GIPO Observatory Tool flash session for NRIs

Teradata Aggregate Designer

Classifying Twitter Data in Multiple Classes Based On Sentiment Class Labels

Data-Intensive Distributed Computing

CHAPTER 3 Implementation of Data warehouse in Data Mining

Big Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara

TDWI Data Modeling. Data Analysis and Design for BI and Data Warehousing Systems

SharePoint SP380: SharePoint Training for Power Users (Site Owners and Site Collection Administrators)

Shine a Light on Dark Data with Vertica Flex Tables

COCKPIT FP Citizens Collaboration and Co-Creation in Public Service Delivery. Deliverable D Opinion Mining Tools 1st version

Data Analytics at Logitech Snowflake + Tableau = #Winning

CS377: Database Systems Data Warehouse and Data Mining. Li Xiong Department of Mathematics and Computer Science Emory University

A Study of Future Internet Applications based on Semantic Web Technology Configuration Model

TEXT PREPROCESSING FOR TEXT MINING USING SIDE INFORMATION

Modelling Structures in Data Mining Techniques

Maximizing the Value of STM Content through Semantic Enrichment. Frank Stumpf December 1, 2009

Introduction to Data Mining and Data Analytics

Chapter 6 VIDEO CASES

STRATEGIC INFORMATION SYSTEMS IV STV401T / B BTIP05 / BTIX05 - BTECH DEPARTMENT OF INFORMATICS. By: Dr. Tendani J. Lavhengwa

Developing SQL Data Models

Business Intelligence Roadmap HDT923 Three Days

Semantic Web Company. PoolParty - Server. PoolParty - Technical White Paper.

How Co-Occurrence can Complement Semantics?

Visualizing PI System Data with Dashboards and Reports

Xcelerated Business Insights (xbi): Going beyond business intelligence to drive information value

Improving Data Governance in Your Organization. Faire Co Regional Manger, Information Management Software, ASEAN

Néonaute: mining web archives for linguistic analysis

Security analytics: From data to action Visual and analytical approaches to detecting modern adversaries

INTRODUCTION. Chapter GENERAL

Evolution of Database Systems

Microsoft SharePoint Server

Introduction to Data Science

SAMPLE 2 This is a sample copy of the book From Words to Wisdom - An Introduction to Text Mining with KNIME

BIG DATA TESTING: A UNIFIED VIEW

Mining Social Media Users Interest

ETL4Social-Data: Modeling Approach for Topic Hierarchy

A data-driven framework for archiving and exploring social media data

DATA WAREHOUING UNIT I

Essential for Employee Engagement. Frequently Asked Questions

Advanced Data Management Technologies

USER GUIDE DESIGN A STEP BY STEP GUIDE

Microsoft End to End Business Intelligence Boot Camp

USING THE MUSICBRAINZ DATABASE IN THE CLASSROOM. Cédric Mesnage Southampton Solent University United Kingdom

Prof. Dr. Christian Bizer

Topics. Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples

Construction Change Order analysis CPSC 533C Analysis Project

What is Text Mining? Sophia Ananiadou National Centre for Text Mining University of Manchester

Microsoft Developing Microsoft SharePoint Server 2013 Advanced Solutions

A Benchmarking Criteria for the Evaluation of OLAP Tools

ACCELERATE YOUR SHAREPOINT ADOPTION AND ROI WITH CONTENT INTELLIGENCE

Data warehouse architecture consists of the following interconnected layers:

A MODEL OF EXTRACTING PATTERNS IN SOCIAL NETWORK DATA USING TOPIC MODELLING, SENTIMENT ANALYSIS AND GRAPH DATABASES

AWERProcedia Information Technology & Computer Science

Transcription:

Social Business Intelligence in ction Matteo Francia, nrico Gallinucci, Matteo Golfarelli, Stefano Rizzi DISI University of Bologna, Italy Introduction Several Social-Media Monitoring tools are available for UG analysis Perceived as self-standing apps: no integration with corporate data ffered as-a-service: lack in sufficient verticalization / personalization Project-oriented: narrow time-horizon, limited historical depth "Social Business Intelligence is the discipline that aims at combining corporate data with user-generated content (UG) to let decisionmakers analyze and improve their business based on the trends and moods perceived from the environment." 1

Introduction In SBI, social media monitoring becomes an integrated D process ross-analysis between enterprise and social data is fundamental to properly understand the impact of social events on the enterprise Social data become an asset of the company The goal of our approach is to discuss the architectural options available for an SBI project Help designers in finding the right cost-benefit compromise Based on the experience of several real world projects, including: ollaboration with madori, Italian leader in poultry industry Regional project on monitoring vaccine-related discussions and fears Ministerial project on the analysis of the 2014 uropean elections Introduction: the project The ebpolu Project aimed at studying the connection between politics and social media http://webpoleu.net SBI is used as an enabling technology for analyzing the UG hen: March 2014 to May 2014 (elections held on May 22-25, 2014) here: Germany, Italy, United Kingdom round 10 millions clips collected Facebook posts, tweets, blogs and forum posts, news feeds and comments, etc. 2

SBI architecture: classification In an SBI architecture, the roles of each component may vary from project to project Design complexity and control level by the user may vary ff-the-shelf solution dopt a full solution, supporting a set of standard reports and dashboards nd-to-nd solution cquire and tune an end-to-end software / service Best-of-Breed solution cquire specialized tools in one or more parts of the process Low skills Fast setup High skills Slow setup Low control High control SBI architecture: reference reference architecture for SBI has been proposed in [2] eb rawling Semantic nrichment ntology nalysis RM TL DS TL DM Document Base D [2] M. Francia, M. Golfarelli, S. Rizzi. methodology for social BI. In Proc. IDS, pp 207-216, 2014 3

SBI architecture: ebpolu ( No RM and no D are present in ebpolu ) Raw crawler files: 79 GB eb rawling Semantic nrichment ntology nalysis TL DS TL DM Document Base DS: 481 GB DM: 116 GB Text docs: 65 GB SBI architectural options rawling ontext The main burden is in ensuring a good compromise between too much / too little content onsiderations ff-the-shelf: the designer only carries out macro analysis nd-to-end: clipping is guaranteed by the service provider, querying is controlled by the designer Best-of-breed: all technical activities are in charge of the designer; potentially burdensome and very time-consuming In ebpolu rawling process relies on Brandwatch, a third-party service (end-to-end) It also extracts metadata (source, author, etc.) and derives clip sentiment S 4

SBI architectural options Semantic enrichment ontext wide spectrum of technological alternatives onsiderations Basic techniques may be sufficient to analyze raw data e.g., count topic occurrences NLP analysis techniques are powerful but potentially expensive e.g., extract lemmas, semantic relationships between lemmas, more detailed sentiment For inherently complex languages (e.g., German), automated analysis and interpretation of sentences is discouraged In ebpolu Semantic enrichment is achieved as the combination of both basic and advanced techniques NLP analysis is carried out by the commercial system SyN Semantic enter S SBI architectural options DS (perational Data Store) ontext The DS component is not strictly necessary onsiderations DS is actually recommended Buffering and early analysis (separate crawling from semantic enrichment) lip reprocessing (semantic enrichment is an iterative process) Data cleaning (more effective on materialized data rather than on-the-fly) Relational or NoSQL? NoSQL guarantees scalability Transactional workload is better handled with ID properties Metadata processing also favors a well-defined, normalized schema In ebpolu n RDBMS is used; a NoSQL repository is only used to enable text search S 5

SBI architectural options nalysis S ontext nalysis is a key component of SBI architectures; it can take a variety of shapes, with quite different capabilities Dashboards: small number of predefined views and navigations Text search: detailed analysis up to the single UG level Text mining: advanced analyses (e.g., clip clustering, topic discovery) LP: flexible analysis on the multidimensional metaphor onsiderations ff-the-shelf solutions: dashboards and text search, rarely text mining LP capabilities are clearly more powerful, but also complex to provide In ebpolu Dashboards, text search and LP capabilities are enabled SBI architectural options nalysis (ebpolu) S lip hierarchy: built with metadata from the crawling service Fact: occurrence of a topic within a clip 6

SBI architectural options nalysis(ebpolu) S The topic hierarchy is built from the domain ontology and modeled using an advanced technique, specific for topic hierarchies (meta-stars) [4] [4]. Gallinucci, M. Golfarelli, S. Rizzi. dvanced topic modeling for social business intelligence. Information Systems, pp. 53:87-106, 2015. ase study: effectiveness omparison of the two semantic enrichment techniques Topic occurrences in the clips dvanced only: 1,9 M Shared: 21,5 M (nglish dataset only) Basic only: 3,5 M dvanced: SyN semantic engine Basic: in-house procedure In most cases, the two techniques find the same topic occurrences Basic techniques could be sufficient for KPIs based on topic counting Sophisticated ontology-based techniques are required for deeper analyses (e.g., semantic co-occurrences) 7

ase study: effectiveness omparison of the two semantic enrichment techniques lip sentiment comparison 3500 3000 2500 2000 1500 1000 500 0 dvanced Basic greed (nglish dataset only) dvanced: SyN semantic engine (lexical analysis of the sentences) Basic: Brandwatch service (rule-based technique) Brandwatch hardly assigns a nonneutral sentiment to a clip Due to its inability / caution in assigning a non-neutral sentiment ase study: efficiency verage TL times Basic sem. enrich.: 20 msec/clip dvanced sem. enrich.: 1,6 sec/clip eb rawling Semantic nrichment ntology nalysis TL DS TL DM Document Base rawling to DS: 0,3 sec/clip IR cubes: 1,3 msec/clip NLP cubes: 6,8 msec/clip Document DB: 1,6 msec/clip 8

ase study: sustainability The design of the architecture is an iterative task First design iteration: 84 person-days Second design iteration: 30 person-days Main critical issues: ntology design: correctness of the results is deeply affected by the completeness of the domain ontology rawling setup: proper formulation within the boundaries set by the service provider (e.g., number of queries, query length) may become a nightmare TL & LP design: continuous tuning required to handle all possible unexpected results due to bad clipping onclusion Rules of thumb The adoption of sentiment analysis should be carefully evaluated side from specific sources/closed domains, sentiment accuracy easily drops Twitter is possibly the best source for sentiment analysis Due to the shortness of tweets and the high percentage of opinions ff-the-shelf solutions only provide quick-and-dirty answers To pursued only with limited available resources or at early stages LP analysis has been recognized as truly valuable by the ebpolu users Full LP capabilities will increasingly be provided as SBI gradually gain importance ebpolu data is going to be released as a benchmark for SBI The goal is to enable the possibility to test every task of the SBI process, thanks to expert-validated ground truth 9

Thanks rchitectural options nalysis (ebpolu) S 10

rchitecture: specifications eb SyNthema 6-nodes virtual cluster (12-cores processor, 10 GB of RM per node) rawling Semantic nrichment ntology nalysis TL DS TL DM Document Base Internal cluster 7-nodes cluster (4-cores processor, 32 GB of RM per node) Internal server 8-cores server, 64 GB of RM ase study: effectiveness lip sentiment accuracy Sample of 600 nglish clips, manually tagged by domain experts 100% 80% 60% 40% 20% 0% Negative Neutral Positive Standard Hard High Low Sentiment Text complexity lipping quality NLP IR random classifier (33%) 11

ase study: effectiveness lip sentiment accuracy Sample of 600 nglish clips, manually tagged by domain experts 100% 80% 60% 40% 20% 0% Blog Facebook Forum General News Twitter random classifier (33%) 12