Business Intelligence You can t manage what you can t measure. You can t measure what you can t describe Ahsan Kabir
A broad category of applications and technologies for gathering, storing, analyzing, sharing and providing access to data to help enterprise users make better business decisions -Gartner
Why BI? Performance management Identify trends Cash flow trend Fine-tune operations Sales pipeline analysis Future projections business Forecasting Decision Making Tools Convert data into information
What happened? What is happening? Why did it happen? What will happen? What do I want to happen? Past Present Future ERP CRM SCM 3Pty Black books Data How to Think?
Major Players in BI Market
Improving organizations by providing business insights to all employees leading to better, faster, more relevant decisions Advanced Analytics Self Service Reporting End-User Analysis Business Performance Management Operational Applications Microsoft Business Intelligence Vision
Corporate BI Commonly design, implement and maintain data warehouses, data models and integrated reporting and analytics. It require significant time, expertise and money but total business is not covered. Self-service BI (SSBI) SSBI is to empower analysts so that they can design, customize and maintain their own BI solutions. SSBI is a combination of corporate BI and extensions to empower analysts to more fully exploit it. Managed BI Ensuring responsible BI by managing review, approve and audit solutions Data is delivered in a compliant, responsive and secure way and access permissions are enforced BI implementations
SharePoint Dashboards & Scorecards SharePoint Collaboration Excel Workbooks PowerPivot Applications Analysis Services (SSAS) Integration Services (SSIS) DQS Reporting Services (SSRS) Master Data Services (MDS) ERP/CRM DB Cloud Born Data Social Network Microsoft Business Intelligence Components
Step 1 : Business Analysis Step 2 : SSIS Different Source of Data (RDBMS, FTP, Web Services, XML, CSV, EXCEL, etc.) DQS (Data Quality Services) Integration, cleansing, profiling MDS (Master Data Service ) Centrally managing organizational master data ETL (Extraction, Transformation and Loading) framework Step 3 : SSAS Create an OLAP multi-dimensional structure making data available for analytics and reporting SSAS can pre-calculates, summarizes and stores the data in a highly compressed format Reporting is provided by data through SSAS cubes Step 4 : SSRS SSRS (SQL Server Reporting Services) allows creating formatted and interactive reports Step 5 : PowerPivot, Power View, Excel services provide rapid data exploration, visualization, and presentation experience for users. It allows users to interrogate the data from various aspects by using charts, graphs, drill-down paths etc. Excel and PowerPivot services can be used for deploying Excel or PowerPivot to SharePoint in order to make it available to other people, turning Personal BI into Organizational BI. Microsoft Business Intelligence Road Map
Data Warehouse was designed specifically to be a central repository for all data in a company disparate data from transactional systems
DW is a relational database that is designed for query and analysis Ship and integrate data from different sources to the analyst Contains data derived from transaction, internal-external data & archived data But it s not a copy of a source database Characteristics DW
High query performance Analysis queries place extra load on transactional systems Query optimization is hard to do well Queries not visible outside warehouse Local processing at sources unaffected Can operate when sources unavailable Can query data not stored in a DBMS Summarized and Extremal data at warehouse Advantages of Warehousing
Before enter into warehouse Data is processed (cleansed and transformed) Data Marts Users query the data warehouse Warehouse Data is kept in a specific business line wise. DW Architecture
Data Warehouse Corporate/Enterprise-wide Union of all data marts Organized on E-R model* Data Mart Departmental Single business process Star-join* DW vs. Data Mart
Transactional Databases 1. ER modeling is used 2. 3NF Normalized 3. Data is spited into tables 4. Hard to visualize 5. Slows down the response time of the query and report Warehouse Database 1. Dimensional modeling 2. De-normalized 3. Data is kept in fact and dimension 4. Flexible for user perspective 5. Response time and increases the performance Transactional Databases vs. Data warehouse
Warehouse WID (PK) Location Address district WU_Code Client_Information CID (PK) Name Address Credit_Limit User_Profile UId (PK) Name Address Email CellNo Requisition RID(PK) CID (FK) WID (FK) UID (FK) Requestion_Date Product_Profile PID (PK) description brand category Requisition_Details RID (PK) RDD (FK) PID (FK) promotion_key (FK) dollars_sold units_sold dollars_cost Entity Relation Diagram 16
TIME time_key (PK) SQL_date day_of_week month STORE store_key (PK) store_id store_name address district floor_type CLERK clerk_key (PK) clerk_id clerk_name clerk_grade Sales - FACT time_key (FK) store_key (FK) clerk_key (FK) product_key (FK) customer_key (FK) promotion_key (FK) dollars_sold units_sold dollars_cost PRODUCT product_key (PK) SKU description brand category CUSTOMER customer_key (PK) customer_name purchase_profile credit_profile Address City country PROMOTION promotion_key (PK) promotion_name price_type ad_type DIMENSONAL MODEL 17
Data Warehouse Federated Database Extraction Query Rewritten Queries Query Answer Warehouse Answer Mediator Data warehouse Create a copy of all the data and Execute queries against the copy Federated database Pull data from source systems as needed to answer queries Source Systems Federated Databases vs. Data warehouse
Before After Name Address City House No DoB State Country Ahsan CDA Avenue CTG 181/1 05/11/1978 BD Kabir RB Avn CTG 41/6 23/04/1991 DHK Bangladesh Name Address City House No Data Quality problems DoB State Country Ahsan CDA Avenue CTG 181/1 05/11/1978 CT Bangladesh Kabir RB Avenue DHK 41/6 23/04/1991 DHK Bangladesh Indication : Completeness Accuracy Conformity Consistency
Data Quality Issue Sample Data Problem Standard Are data elements consistently defined and understood? Gender code = M, F, U in one system and Gender code = 0, 1, 2 in another system Complete Is all necessary data present? 20% of customers last name is blank, 50% of zip-codes are 99999 Accurate Valid Does the data accurately represent reality or a verifiable source? Do data values fall within acceptable ranges? A Supplier is listed as Active but went out of business six years ago Salary values should be between 60,000-120,000 Unique Data appears several times Both John Ryan and Jack Ryan appear in the system are they the same person? Data Quality Issues
Data Quality Services (DQS) is a Knowledge-Driven data quality solution, enabling to easily improve the quality of their data Data Quality Services (DQS)
Simplicity Users should understand the design Data model should match users conceptual model Queries should be easy and intuitive to write Expressiveness Include enough information to answer all important queries Include all relevant data (without irrelevant data) Performance An efficient physical design should be possible DW Design Consideration
DW consists of Fact tables and dimensions. The relationship between a Fact table and dimensions are based on the foreign key and primary key. Facts are numeric measurements or additive value that represent a specific business aspect or activity. Examples : Unit Cost, Sale Amount, Quantity Sold Salary Amount Purchase amount Dimension has a primary key, which is called the surrogate key. The primary key of the source system will be stored in the dimension table as the business key Dimension tables are tables that contain descriptive information. Dimension table contains a list of columns Example : Incase of Product Product Name Origin Category Manufacturer Date Sales Date The Fact table is a table with foreign keys pointing to surrogate keys of the dimension tables Component of Data Warehousing
TIME time_key (PK) SQL_date day_of_week month STORE store_key (PK) store_id store_name address district floor_type Sales - FACT time_key (FK) store_key (FK) clerk_key (FK) product_key (FK) customer_key (FK) promotion_key (FK) dollars_sold units_sold dollars_cost PRODUCT product_key (PK) SKU description brand category CUSTOMER customer_key (PK) customer_name purchase_profile credit_profile Address City country CLERK clerk_key (PK) clerk_id clerk_name clerk_grade PROMOTION promotion_key (PK) promotion_name price_type ad_type Dimensional Modeling 24
The diagram resembles a star Center of the star consists of one fact table Points of the star are the dimension tables Optimizes performance by keeping queries simple and Providing fast response time Star schema 25
Fact table Date Promotion ONE fact table Sales 4 dimension tables Product Dimension tables Store Star Schema for the retailer s DW 26
TIME time_key (PK) SQL_date day_of_week month STORE store_key (PK) store_id store_name address district floor_type CLERK clerk_key (PK) clerk_id clerk_name clerk_grade Sales - FACT time_key (FK) store_key (FK) clerk_key (FK) product_key (FK) customer_key (FK) promotion_key (FK) dollars_sold units_sold dollars_cost PRODUCT product_key (PK) SKU description brand category CUSTOMER customer_key (PK) customer_name purchase_profile credit_profile Address City country PROMOTION promotion_key (PK) promotion_name price_type ad_type DIMENSONAL MODEL
Simplicity Users should understand the design Data model should match users conceptual model Queries should be easy and intuitive to write Expressiveness Include enough information to answer all important queries Include all relevant data (without irrelevant data) Performance An efficient physical design should be possible Goals for Logical Design 28
Step 1 : Identify business subjects and fields of information of relevant subjects Step 2 : Discover entities and attributes and relationships Step 3 : Identify which information belongs to a central fact table Step 4 : Which information belongs to its associated dimension tables Step 5 : Identify cleansing points Step 6 : Which data need to mange centrally Step 7 : Define surrogate key and business key Step 8 : Make ETL Package Step 9 : Organize data structures on disk Steps of DW Implementation 29
Thanks