PRIME Deep Dive
Table of Contents

Introduction
Data Model and Load
  OLAP Cubes
  MTDI Cubes
Data Processing and Storage In-Memory
  Cube Publication Stages
  Cube Publication: Parallelization and Partitioning
  In-memory Data Storage
  Cube Publication Performance Profile and Driving Factors
Cube Reporting
  Functional Considerations for MTDI Cubes
    On Relationships and Data Correctness
    Table Row Count Metrics
    Drive Aggregation on a Table
  Performance Considerations for MTDI Cubes
    Partitioning Performance Effect & Guidelines
Dashboard Design Considerations
  Document Data Blending: Multi-cube Engine (MCE)
  Multi-table Data Import Cube (MTDI) Table Join
  Recommendation: Use MTDI for Joining
Troubleshooting and Diagnostics
  Useful Logs
    Engine Perf Log
    Other Logs
  Cube Health Checker

Introduction

MicroStrategy's in-memory cube technology is generally referred to as PRIME cubes. PRIME stands for Parallel Relational In-Memory Engine, and it encompasses two types of in-memory cubes: OLAP cubes, the traditional Intelligent Cubes that have existed since the 9.x releases, and the newer Multi-table Data Import (MTDI) cubes introduced in 10.x. Both types of cubes leverage the partitioning functionality that allows loading more data into memory and querying this data in parallel, leveraging multi-core hardware architectures.

The most obvious difference between OLAP and MTDI cubes is how they are authored and created. OLAP cubes are authored only through Developer and are tightly coupled to a MicroStrategy project schema; because of this, OLAP cubes are traditionally built and maintained by advanced BI developers or

architects. On the other hand, MTDI cubes are authored through MicroStrategy Web (via Visual Insight) or MicroStrategy Workstation. These cubes do not need a project schema, and therefore business users can create them themselves, by directly loading data from one or more sources and determining which columns are attributes (along with their relationships) and which are metrics. MTDI cubes are the self-service in-memory solution.

Aside from how they are exposed to end users, there are also more fundamental and intrinsic differences between these two types of cubes, and these distinctions are the basis for the performance profile and available functionality of each. These differences are summarized in the diagram below.

Data Model and Load

OLAP Cubes

OLAP cubes consist of a filter and a data template comprised of project attributes and metrics; metrics can be level metrics and can have transformations or conditionality. Cube publication, or the fetching of data from the underlying relational sources, is similar to that of a standard report: multiple SQL passes generate several intermediate tables (e.g., for transformations, conditional metrics, or aggregations from several fact tables), and all temporary result sets are joined together into a single final result table that is loaded into the Intelligence Server's memory as the Intelligent Cube instance.
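As an illustration of this multi-pass pattern, here is a minimal sketch using Python's sqlite3 as a stand-in for the data warehouse; the table names, columns, and passes are hypothetical, not MicroStrategy's actual generated SQL:

```python
# Sketch only: sqlite3 stands in for the relational source. Two intermediate
# passes aggregate each fact table; a final pass joins them into the single
# result table that would be loaded into the Intelligence Server's memory.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sales (year INT, revenue INT);
    CREATE TABLE costs (year INT, cost INT);
    INSERT INTO sales VALUES (2020, 100), (2020, 50), (2021, 70);
    INSERT INTO costs VALUES (2020, 30), (2021, 20);

    -- Passes 1 and 2: aggregate each fact table into an intermediate table.
    CREATE TEMP TABLE t1 AS SELECT year, SUM(revenue) AS revenue FROM sales GROUP BY year;
    CREATE TEMP TABLE t2 AS SELECT year, SUM(cost) AS cost FROM costs GROUP BY year;

    -- Final pass: join the intermediates into one result table.
    CREATE TEMP TABLE final AS
        SELECT t1.year, t1.revenue, t2.cost FROM t1 JOIN t2 ON t1.year = t2.year;
""")
print(con.execute("SELECT * FROM final ORDER BY year").fetchall())
# [(2020, 150, 30), (2021, 70, 20)]
```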

For more information on Intelligent Cubes, please refer to the In-Memory Analytics Guide, part of the official documentation.

The publication of OLAP Intelligent Cubes consists of the following stages, along with the most relevant settings that can help improve publication performance:

Query generation and execution: the generation of SQL passes and their execution. Outside of all the SQL generation VLDB settings available for report/cube executions, the following in particular can help optimize this step:
  o Parallel Query Execution (VLDB setting under Query Optimizations): allows some or all intermediate passes to execute in parallel. See KB44052: How does the feature in the MicroStrategy 9.3.x SQL engine "Parallel Query Execution" work.

Data fetch from the data source: the actual movement of data from the data source to Intelligence Server memory, and the data preparation within the Query Engine and MicroStrategy's In-Memory Engine. The following settings can help improve memory usage or overall data load performance:
  o Data population for Intelligent Cubes (VLDB setting under Query Optimizations): controls how data is normalized, either in the Intelligence Server at different stages of the publication workflow or at the database level, potentially impacting the amount of memory used during cube publication. The default is for the Intelligence Server to normalize data early in the publication workflow. See KB32540: Intelligent Cube Population methods in MicroStrategy.
  o Parallel Data Fetch (Data > Configure Intelligent Cube > Data Partition): allows fetching slices of the final result table in parallel through different database connections.

The parallel data fetch setting is coupled with the cube partitioning settings for Intelligent Cubes, in terms of both the attribute/column used to split the final result table into equal parts for data transfer and the number of parallel data fetch connections.
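A rough sketch of the parallel-fetch idea, with threads standing in for database connections and a dictionary standing in for the pre-sliced result table (all names and data are illustrative):

```python
# Sketch only: each "connection" (a thread here) fetches one slice of the
# final result table, and the slices are reassembled in memory.
from concurrent.futures import ThreadPoolExecutor

warehouse = {f"slice{i}": [(i, v) for v in range(3)] for i in range(4)}

def fetch_slice(name):
    # Stand-in for one database connection issuing
    # "SELECT ... WHERE partition_key = <name>".
    return warehouse[name]

with ThreadPoolExecutor(max_workers=4) as pool:  # number of fetch connections
    slices = list(pool.map(fetch_slice, sorted(warehouse)))

rows = [row for s in slices for row in s]
print(len(rows))  # 12 rows reassembled from 4 parallel fetches
```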

MTDI Cubes

MTDI cubes have a local, self-contained schema containing a set of tables. The attributes and metrics in this local schema only represent the columns in these tables; this contrasts with a project-level schema, where fact objects and attribute forms can represent SQL expressions that are executed in the data source. Moreover, cube publication for MTDI consists of loading data from multiple tables in parallel.

A summary of the modeling differences between an MTDI cube schema and the project schema:

Logical Representation of Data Source Objects
- Tables. Project Schema (OLAP): Tables. MTDI Schema: Tables.
- Fact columns. Project Schema (OLAP): Fact objects. MTDI Schema: Metrics. Comment: metrics directly represent aggregable columns, without facts.
- Dimensional columns. Project Schema (OLAP): Attributes & Attribute Forms. MTDI Schema: Attributes & Attribute Forms.

Attributes Modeling
- Attribute Form Expressions. OLAP: Yes. MTDI: No. In MTDI, any column-level transformation happens through Data Wrangling or, in Visual Insight, with derived attributes; alternatively, use SQL as the table in MTDI.
- Attribute Lookup Table. OLAP: Yes. MTDI: No. In MTDI, attributes do not have an explicitly defined lookup table; the lookup table is auto-generated (the union of all distinct elements of an attribute present in all tables).
- Multiple Forms. OLAP: Yes. MTDI: Restricted. MTDI only allows defining a few forms for an attribute: ID, DESC, DATE, SORT, Latitude, Longitude.

Other Schema Functionality
- Fact Expressions. OLAP: Yes. MTDI: No. In MTDI, any column-level transformation happens through Data Wrangling or, in Visual Insight, with derived attributes.
- Partition Mappings. OLAP: transparently supported; the SQL Engine brings data from multiple tables into the result table that is loaded into the OLAP cube. MTDI: multiple tables with the same columns can be loaded in parallel into a single in-memory table. Comment: in MTDI, multiple horizontally partitioned tables can be loaded into a single in-memory table.
- Transformations. OLAP: Yes. MTDI: No.
- Logical Tables. OLAP: Yes. MTDI: Yes. A table in MTDI can be a freeform SQL pass, for example.
- Table Aliases. OLAP: Yes. MTDI: No. Useful for attribute roles; in MTDI the same table can be imported multiple times to achieve a similar effect, but with potential performance issues.

Reporting Functionality
- Standalone filters or prompts. OLAP: Yes. MTDI: if project attributes are mapped to table columns.
- Conditional Metrics. OLAP: in cube or through derived metrics. MTDI: only through derived metrics.
- Level Metrics. OLAP: in cube or through derived metrics. MTDI: only through derived metrics.
- Transformation Metrics. OLAP: in cube. MTDI: No.
- Compound Metrics. OLAP: in cube or through derived metrics. MTDI: only through derived metrics. Alternatively, imported tables could be actual SQL, in which some columns are expressions.

Unlike OLAP cubes, MTDI cubes cannot contain in their definition transformation metrics, conditional metrics, or complex compound metrics; these need to be built as derived metrics when reporting from MTDI cubes. Derived metrics on top of both cube types have, in general, the same limitations: end-on-hand-type metrics (grouping = first/last value) are currently supported as a beta feature, and transformation metrics cannot be created as derived metrics.

Partitioning Configuration
- Data Partitioning. OLAP: Yes. MTDI: Yes. Allows loading more than 2 billion rows of data in memory and leverages multi-core architectures to query/calculate data in parallel.
- Auto-partitioning. OLAP: No. MTDI: Yes. Automatically determines whether partitioning should be enabled (largest cube table larger than 1 million rows) and, if enabled, determines the partition attribute.

For MTDI cubes, in terms of data loading, all tables are loaded into memory in parallel. Particularly for partitioned tables (multiple tables linked to a particular logical table, as when the same table coming from different sources is dragged multiple times into the Data Import dialog), the following setting controls the degree of parallelism for data loading (how many tables to load at the same time):
  o Parallel Query Execution (VLDB setting under Query Optimizations): allows some or all intermediate passes to execute in parallel.
See KB44052: How does the feature in the MicroStrategy 9.3.x SQL engine "Parallel Query Execution" work. All the SQL queries executed against relational sources are simple SELECT statements retrieving all columns of a table, with simple filtering criteria (if so defined); therefore, there are no joins or aggregations when selecting data from these relational sources.

The exception to the above is when the imported table is actually a user-entered SQL statement modeled as a table in the MTDI cube; in this case, the SQL statement may contain joins and aggregations.

Data Processing and Storage In-Memory

After data is transferred from the data sources to the Intelligence Server, it goes through some processing so that it is stored efficiently and can be queried in memory quickly. Both OLAP and MTDI cubes have the same data processing stages, with MTDI cubes having one more stage, related to relationship table building. Moreover, unlike OLAP cubes, which consist of a single table, processing for MTDI cubes is done for several tables.

Cube Publication Stages

When publishing a cube in memory, generally speaking, the Intelligence Server does the following:

Create lookup tables for attribute data: identify the set of unique data elements per attribute, assign numeric indices to these data elements for performance purposes, and attach lookup forms (description, etc.); these indices are used in metric indices later. Lookup table creation is done in parallel, for every attribute or attribute partition.

Create relationship tables: relationship tables relate attribute elements to improve performance when joining data. This only applies to MTDI cubes. Relationship table generation is relatively fast (just indices of multiple attributes based on data present in one or more tables). Relationship table building is done in parallel, by relationship table.

Build fact tables: fact tables represent the actual tables the users upload; these tables only contain attribute indices along with references to the metric data. This stage involves creating the fact table index, populating metric columns, and generating additional indices that help improve element filtering. Fact table generation is also done in parallel, by table or table partition.

Persist the cube: this step creates the corresponding cube files, allowing cubes to be loaded without the need to republish.

Cube Publication: Parallelization and Partitioning

The stages of publication are executed serially, one after the other, but each stage is processed in parallel as described above to leverage multi-core architectures. By default, the level of parallelization during cube publication varies with the number of CPUs in the host machine: at most half the CPUs can be used for cube publication (not exclusively). This means the executors for every cube publication can use at most half the number of CPUs.

Cube partitioning can further parallelize the cube publication process, allowing a single attribute or table to be processed in parallel according to its partitions. Following the diagram above, each individual section/table is processed in parallel within each publication stage. OLAP cube partitioning is essentially the same, but consists of only one table.

By default, MTDI cubes are configured to be auto-partitioned: if the largest table has more than 1 million rows, it is partitioned according to the attribute with the largest number of distinct elements (the lowest-level attribute), to ensure data is distributed equally across partitions. The default number of partitions is 4 (changeable from the Data Import Cube Edit dialog).
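The auto-partitioning defaults described above can be sketched as follows; `auto_partition` and `max_publication_executors` are hypothetical helper names for illustration, not MicroStrategy APIs:

```python
# Sketch only: if the largest table exceeds 1 million rows, partition on the
# attribute with the most distinct elements; parallel executors are capped at
# half the CPU count.
import os

def auto_partition(table_row_counts, attribute_cardinalities,
                   threshold=1_000_000, default_partitions=4):
    if max(table_row_counts.values()) <= threshold:
        return None                      # small cube: no partitioning
    partition_attribute = max(attribute_cardinalities,
                              key=attribute_cardinalities.get)
    return partition_attribute, default_partitions

def max_publication_executors():
    return max(1, (os.cpu_count() or 1) // 2)  # at most half the CPUs

tables = {"Order_Fact": 5_000_000, "LU_DAY": 3_650}
cards = {"Day": 3_650, "Order": 5_000_000, "Region": 8}
print(auto_partition(tables, cards))  # ('Order', 4)
```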

In-memory Data Storage

A few optimizations help reduce memory usage when loading cube data into memory:

1. Metric value compression: introduced in 9.x, this avoids keeping duplicate metric values in memory. It applies to metrics of type double and strings (essentially, all types larger than an integer). The current rule in the code: if all rows for a metric have the same value, or if the row count is greater than the number of distinct values times 10, the metric values are not repeated for every row; each value is saved once in memory.

2. Index compression: besides the fact that data is referred to by indices, the space used to hold indices in memory is optimized by using smaller data types for the index values, depending on the number of elements of, for example, an attribute. So instead of using an integer (4 bytes) to represent an index value, a short (2 bytes) may be used, which halves the index space.

3. Metric data type compression: this optimization changes the data types of columns internally; if all values in a column fit in a smaller data type, the smaller type is used to hold the data in memory (e.g., if all decimal/big decimal values can be represented with an integer, an integer is used).

4. Normalization of data: data is normalized in memory so that redundant data (such as attribute descriptions) is not saved repeatedly; indices are used to relate lookup data and fact data, without necessarily relying on the original attribute primary keys (which may not be as efficient for storing or relating data among tables, e.g., strings).

Cube Publication Performance Profile and Driving Factors

The general performance profile of the publication of a cube is shown below. There are factors that may affect each phase of the publication, and these factors depend on the cube schema and the data in the cube.
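A minimal sketch of optimizations 1 and 2 above; the function names and the exact type choices are illustrative, not the engine's actual implementation:

```python
# Sketch only: pick the smallest index data type for an attribute's
# cardinality, and apply the metric value compression rule (store values once
# when all rows are equal, or when row_count > distinct_count * 10).
def index_dtype(num_elements):
    if num_elements <= 2**8:
        return "uint8"    # 1 byte per index value
    if num_elements <= 2**16:
        return "uint16"   # 2 bytes instead of a 4-byte integer
    return "uint32"

def compress_metric(values):
    distinct = set(values)
    if len(distinct) == 1 or len(values) > len(distinct) * 10:
        # Store each distinct value once; rows keep only small references.
        return sorted(distinct)
    return list(values)   # not worth compressing

print(index_dtype(50_000))               # uint16
print(compress_metric([9.99] * 1000))    # [9.99], stored once
print(compress_metric([1.0, 2.0, 3.0]))  # [1.0, 2.0, 3.0]
```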

The general curve applies to both MTDI and OLAP cubes, with the difference that OLAP cubes only consist of one table and do not contain relationship tables (no need to join data among tables in-memory). The shape of the curve will vary based on the schema and data being published.

Every publication stage has main driving factors that can increase or decrease its execution time or memory usage, and these factors should be considered especially when publishing MTDI cubes. OLAP cubes depend on project schemas, which generally rely on database designs that optimize data organization with optimized data types for unique identifiers; in addition, OLAP cubes depend on architected, tested, and optimized MicroStrategy schemas. MTDI cubes, though, are created and defined on the fly at data import, allowing for many or any type of data schemas and data types; therefore, a few guidelines help improve the publication of these cubes.

Query Execution and Data Fetch
  Main task in stage: raw data is loaded into memory.
  Factors affecting performance: avoid loading a lot of redundant data for large datasets (normalize data); repeated data may be held in memory for some time until attribute lookup tables are finished. Use the smallest data type possible in the schema configuration for every column to minimize memory usage in this stage.

Lookup Table Generation
  Main task in stage: identifying unique elements for columns and sorting them; sorting strings is comparatively expensive.
  Factors affecting performance: strings or dates as the data type of attribute IDs are slower to process. Very high cardinality of individual attribute elements, in combination with string/date data types, can slow down the lookup generation phase.

Relationship Table Generation
  Main task in stage: identifying unique tuples of two related attributes. There is relationship table building validation logic: if two attributes are supposed to have a one-to-many relationship but the data appears otherwise as many-to-many, the relationship table is not built.
  Factors affecting performance: the number of automatically detected or user-defined relationship tables generated.

Fact Table Generation
  Main task in stage: main index generation and metric population, done for every loaded table.
  Factors affecting performance: larger fact tables take longer to index and populate; a larger number of metrics in a single fact table can delay the full fact table generation stage.
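The relationship-table validation logic described above can be sketched as follows (a hypothetical helper, not engine code): collect the distinct (parent, child) tuples, and refuse to build the table when the data turns out to be many-to-many:

```python
# Sketch only: build a relationship table from distinct (parent, child)
# tuples; if a child maps to two different parents, the relationship is really
# many-to-many and the table is not built.
rows = [("2020", "2020-Q1"), ("2020", "2020-Q2"), ("2021", "2021-Q1")]

def build_relationship(pairs):
    child_to_parent = {}
    for parent, child in set(pairs):           # unique tuples only
        if child_to_parent.get(child, parent) != parent:
            return None                        # many-to-many detected: skip
        child_to_parent[child] = parent
    return child_to_parent

print(sorted(build_relationship(rows).items()))
# [('2020-Q1', '2020'), ('2020-Q2', '2020'), ('2021-Q1', '2021')]
print(build_relationship(rows + [("2021", "2020-Q2")]))  # None
```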

Refresh operations on a single fact table cause full-table metric population; if only a small set of metrics changes, it may make sense to partition the table into smaller vertical sections. To see detailed logs of the full publication process, enable the Engine > Perf log in the MicroStrategy Diagnostics tool. Note that this log is quite verbose, so enabling it may affect the performance of executing objects and the cube's publication time.

Cube Reporting

Cube reporting for both OLAP and MTDI cubes consists of queries against MicroStrategy's In-Memory Engine, which executes the data queries on the in-memory cube. Because of the composition of the two cube types, OLAP cubes tend to be faster, because there is only one in-memory table in the cube, pre-joined with all attributes defined in the Intelligent Cube template. For MTDI cubes, in contrast, there is more processing when retrieving data, since data in tables needs to be joined to filter, project, and aggregate data.

OLAP: multiple queries (Cube Subset Instructions, CSI) against one single in-memory table for projection, aggregation, and filtering.

MTDI: multiple in-memory queries (CSIs) against multiple in-memory tables for joins, projection, aggregation, and filtering. Auto-join functionality (e.g., using imported tables as relationship tables): no explicit need to define relationships, but useful to

optimize queries. Not available: transformations, begin/end-on-hand metrics using metric dimensionality.

Functional Considerations for MTDI Cubes

MTDI cubes, whenever they are composed of multiple tables, have functional differences from OLAP cubes. This is a summary of the most commonly encountered ones.

On Relationships and Data Correctness

Data Import automatically tries to identify relationships among attributes in a table; most of the time, it makes sense to delete these automatically created relationships and define the minimum set of relationships that make sense for the user. Incorrect relationships can cause unexpected data.

It is possible to disable this automatic identification of relationships by modifying the following registry setting:

HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microstrategy\DataImport\AutoDetect

Add a DWORD entry named "NeedAutoDetect" with a value of "0" to disable relationship auto-detection. With any other value, or if the entry is not present, relationship auto-detection is enabled.

On One-to-Many Relationships

A relationship between two attributes has a direct effect on how data is aggregated, for example. Consider the following example, in which having or not having a relationship between Year and Quarter has a direct effect on how data is aggregated. Consider the following multi-table cube, where cost data is stored at the Quarter (and Category) level. Also consider a visualization with the quarterly cost along with another metric showing the corresponding yearly cost for every quarter; the per-quarter yearly cost metric is defined as a level metric.

Depending on whether the relationship is present or not, the metric above is calculated or aggregated at a different level. Therefore, relationships between attributes have a direct effect on data correctness; these one-to-many relationships in particular should be defined to obtain proper aggregation when dealing with level metrics.

Note, though, that inaccurate one-to-many relationships can also affect data correctness. For example, a fact table in an MTDI cube may have both Year and Month attributes and a Revenue metric. If there is a one-to-many relationship between Year and Month (one year can be related to many months, and a month can only be related to one year), then the Revenue metric's true level is the Month level; in this case, the metric's index can be expressed in terms of Month only. If these two attributes were not truly one-to-many, the reduced index may produce unexpected data (double counting, for example).

On Many-to-Many Relationships

Many-to-many relationships are for the most part unnecessary, so avoid defining them in the MTDI cube definition. The CSI engine may generate additional unnecessary joins, and the additional many-to-many generated relationship tables may duplicate data already present in user-uploaded tables. Unlike one-to-many relationships, which help improve performance by avoiding more expensive joins between tables and ensure proper data correctness, many-to-many relationships are not necessary: the CSI auto-join will identify that two unrelated attributes are in the same table and use that table as the relation between them. This makes it unnecessary to create separate many-to-many relationships between attributes.
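The double-counting risk described above for inaccurate one-to-many relationships can be sketched in plain Python (the data and helper are made up; this is not the in-memory engine's logic):

```python
# Sketch only: aggregate a month-level fact to Year through a relationship.
# If the relation is truly one-to-many, totals are correct; if the same month
# relates to two "years", joining through the relation repeats fact rows.
fact = [("2020-01", 100), ("2020-02", 200)]            # (month, revenue)
good_rel = [(2020, "2020-01"), (2020, "2020-02")]       # one year per month
bad_rel = good_rel + [(2021, "2020-02")]                # month under 2 years

def revenue_by_year(fact, rel):
    totals = {}
    for year, month in rel:
        for m, r in fact:
            if m == month:
                totals[year] = totals.get(year, 0) + r  # fact row re-joined
    return totals

print(revenue_by_year(fact, good_rel))  # {2020: 300}
print(revenue_by_year(fact, bad_rel))   # {2020: 300, 2021: 200}: 500 total,
                                        # but only 300 of revenue exists
```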

Table Row Count Metrics

MicroStrategy automatically creates a new base metric called Row Count [Table Name] for every table in an MTDI cube. This metric is equivalent to having a column with a value of 1 for every row, and it allows users to see the number of rows involved in any aggregation on the grid. This metric is visible from Visual Insight.
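Conceptually, the Row Count metric behaves like a constant column of 1s summed under whatever grouping the grid applies; a minimal sketch with hypothetical data:

```python
# Sketch only: counting rows per group is equivalent to summing an implicit
# column that holds 1 for every row.
rows = [
    {"category": "Books", "revenue": 10},
    {"category": "Books", "revenue": 15},
    {"category": "Music", "revenue": 7},
]

def row_count_by(rows, key):
    counts = {}
    for r in rows:
        counts[r[key]] = counts.get(r[key], 0) + 1  # the implicit "1" column
    return counts

print(row_count_by(rows, "category"))  # {'Books': 2, 'Music': 1}
```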

Drive Aggregation on a Table

Aggregation functions such as Count traditionally have a function parameter called FactID in project-schema metrics; this parameter forces the SQL Engine to count the given column in the table that maps to the provided fact. The same can be achieved in MTDI cubes through the MetricID function parameter.

In the example above, the metric counts the distinct months present in the Order_Fact table; another count metric could count the months in a different table, such as an inventory table.

Performance Considerations for MTDI Cubes

MTDI cubes, given their multi-table configuration, also have a different performance profile from OLAP cubes. When querying MTDI cubes, the dashboard in question internally generates a set of instructions (Cube Subset Instructions) that are executed by the In-Memory Engine. These CSI queries are similar to SQL statements.

In the example above, in an MTDI cube containing tables from MicroStrategy Tutorial, retrieving two metrics from two separate fact tables causes three queries: one retrieving data from each fact table, and a third joining the results of the first two passes. The main factor affecting performance when reporting off these cubes is the join operation. To name a few, these are the factors that can impact response time when querying MTDI cubes:

- One-to-many relationships. Correctly set relationships help improve query performance by avoiding more expensive joins between user-uploaded tables and instead leveraging automatically generated relationship tables, which tend to be smaller than the user-uploaded tables, in joins.
- Number of tables joined in a query. Retrieving metric data from many fact tables in the MTDI cube can increase execution time, as there are more join operations.
- Security filters. Security filters force the application of filters to ensure data is secured; if a table does not have the security attribute, the CSI/SQL engine, through its auto-join functionality, will find a way to join the main table holding the user-requested data with other necessary tables containing the attributes used in the security filter. This automatic joining may impact performance.

[Diagram: join chain. The table with the user-requested data is connected to the table with the attributes used in filters through a series of joins over intermediate tables (A,B; B,C; C,D; D,E).]

- Complex filters that result in multiple passes and joins. If multiple filter conditions are based on attributes not present in the same table, then, similar to the security filter case, the filter is applied by joining data.
- Many-to-many relationships. These are for the most part unnecessary, so avoid defining them in the MTDI cube definition. The CSI engine may generate additional unnecessary joins, and the additional many-to-many generated relationship tables may duplicate data already present in user-uploaded tables.

Partitioning Performance Effect & Guidelines

For cube reporting, partitioning can improve performance against both MTDI and OLAP cubes; nonetheless, there are multiple considerations. The benefit of partitioning relies on parallelizing the aggregation of data across the multiple CPUs present in the Intelligence Server host machine. Therefore, to take full advantage of partitioning, the partitions of a table should be similar in size, so that the processing load is distributed equally across the parallel/map tasks when querying cubes. Consider the following points:

Select the correct partition attribute/table:

o The partition attribute should be an attribute in the largest table (in the case of MTDI cubes); any table larger than 2 billion rows must be partitioned.
o The partition attribute should be the lowest-level attribute, with the largest number of distinct elements; this attribute should be aggregated over in most or all queries, and it will almost never be used in grids by users directly.
o The attribute ID should be numeric, date, or text.
o The partition attribute should not be a parent attribute of another attribute.

Do not partition relatively small cubes. As a sample case study, for an OLAP cube with 20,000 rows, a dashboard running off this cube degraded by 50% after the cube was split into 24 partitions; the reduce step cost more than not parallelizing/partitioning at all. Follow a guideline similar to the auto-partitioning threshold: if there are more than 1 million rows in a table/cube, consider partitioning.

Select the correct number of partitions:
o Do not set the number of partitions to more than half the number of (logical) CPUs. By default, for the cube publication process, the number of partitions processed in parallel is at most half the number of CPUs in the machine; additional partitions would not be processed in parallel (as explained in the section Data Processing and Storage In-Memory).
o Increasing the number of partitions alone does not improve performance. By default, to maintain performance under user concurrency, every cube query can use at most 4 CPUs (not exclusively), ensuring other CPUs have fully dedicated time for executing other tasks. This means that even if a cube is split into more than 4 partitions, no more than four map tasks will execute in parallel. This number can be changed through registry settings (or MSIReg.reg); contact Technical Support or Professional Services, especially if your environment has many cores (e.g., 86 cores) that could dedicate more CPUs per job.
Identify reporting scenarios that will parallelize; these include queries with any of the following types of functions:
o Additive (e.g., Sum, Count, Min, Max) and semi-additive (e.g., Avg, StdDev, Variance) functions. Additive functions can be re-aggregated; semi-additive functions can be decomposed into additive functions (e.g., Avg = Sum/Count) and re-aggregated, so these functions can also be evaluated in parallel.
o Count Distinct.
o Any scalar function.

Identify reporting scenarios that will NOT be parallelized and may see worse performance; these include cases that use non-additive functions (e.g., Median).

Dashboard Design Considerations

Both MTDI cubes and Dashboards/Dossiers allow joining data from multiple tables and datasets, respectively, though they have different mechanisms to achieve this joining. Document-level data blending is useful mainly for displaying data from multiple pre-existing reports, view reports, or OLAP cubes in single visualizations. Nonetheless, the recommendation is to model data joins from multiple datasets (multiple tables) within MTDI cubes when the datasets are MTDI cubes, instead of relying on the document-level data blending functionality: behavior is consistent and deterministic, it is more user-friendly (no need to know special settings), and there is less overhead for the document to execute.

Document Data Blending: Multi-cube Engine (MCE)

Documents have the capability to blend data from different datasets. This data blending is carried out by the document's Multi-Cube Engine (MCE). This engine determines what data is needed to render the visualizations in a dossier; these data requirements are materialized as multiple queries (document-level Cube Subset Instruction, CSI, statements) against the base document datasets. These document-level queries are processed by the individual datasets (somewhat like mini view reports). The MCE queries to the document datasets are based on an interim schema representing the datasets in the document; this schema is formed as a function of the linked attributes defined by the user.

The MCE generates the following structures in its interim schema:

- Document-level lookup tables for linked attributes: these hold the full set of distinct elements present across all datasets.
- Document-level relation tables used to blend data between datasets: the MCE identifies the necessary relationship tables based on linked attributes and populates them. In cases where different datasets have incomplete or complementary data, the document setting Data Combination can help build full relationship tables from all datasets, but this can incur increased memory usage and longer document execution time (more data joining in memory).
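The document-level lookup construction described above can be sketched as a union of distinct elements across datasets (the helper name and data are hypothetical):

```python
# Sketch only: a document-level lookup table for a linked attribute is the
# union of the distinct elements present in every dataset, with numeric
# indices assigned as for any in-memory lookup.
dataset_a = ["2020", "2021"]           # years present in dataset A
dataset_b = ["2021", "2022", "2022"]   # years present in dataset B

def document_lookup(*datasets):
    elements = sorted(set().union(*map(set, datasets)))
    return {element: index for index, element in enumerate(elements)}

print(document_lookup(dataset_a, dataset_b))
# {'2020': 0, '2021': 1, '2022': 2}
```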

Some side effects of using the MCE:

- Generation of relation tables has a performance overhead.
- Adding new datasets may modify the set of relation tables at the document level, potentially resulting in different data.
- Blending behavior may differ depending on the data in the base datasets; Data Combination should be turned on if datasets have partially common data (e.g., one dataset has year 2001, and a second has some 2001 and all of 2002 data). Using maximum relations may lead to potentially significant performance overhead.
- Grid-level settings such as outer/inner join for attributes or metrics do not take effect; to achieve similar effects for attributes when data blending, use the dataset join behavior.

[Figure: Dataset Join Behavior vs. Grid-level Join Behavior settings]

Besides the structures above, the MCE is also responsible for calculating derived attributes and element groupings, independently of whether there are multiple datasets or not. When there are multiple linked datasets, the derived attributes or groupings are populated based on the data coming from all datasets.

Multi-table Data Import Cube (MTDI) Table Join

Querying an MTDI cube is different: all tables are exposed as a MicroStrategy schema, that is, a collection of attributes, the relationships among them, their mappings to tables, and metrics. Depending on the set of attributes and metrics requested in the grid/visualization, a set of cube subset instructions (CSI) is generated, similar to the SQL generated by the SQL Engine, determining

24 which tables in the cube to get data from, their joins, the aggregations and filtering specifications; the intermediate passes, and the final pass. Unlike the document-level blending engine (MCE), the MTDI cubes, as explained in previous sections, reporting against MTDI cubes Have a deterministic behavior according to a stable cube schema based on the attribute relations defined by the user along with the attribute mappings to the different tables. Query behavior is independent of data contents (no necessary settings analogous to the Data Combinations setting for data blending). Only by changing the cube schema explicitly can the join behavior change. All the automatically generated lookup and relation tables are created at publication time, and data results are consistent. There is no need to create new relation tables when used by a Dossier; all are already loaded in memory. All inner/outer join at the grid/visualization level are honored; the dataset join behavior setting is ignored. Recommendation: Use MTDI for Joining All in all, document-level data blending is useful mainly to allow displaying data from multiple pre-existent reports, view reports or OLAP cubes in single visualizations. Nonetheless, the recommendation is to model data joins from multiple datasets (multiple tables) within one MTDI, instead of relying on multiple document-level data blending functionality. Behavior is consistent and deterministic, more user friendly (no need to know special settings), and there is less overhead for the document to execute. Troubleshooting and Diagnostics Useful Logs Engine Perf Log The Engine Perf log is enabled through the MicroStrategy Diagnostics utility, and provides detailed information on both cube publication and plethora of information on what is going on within the Analytical Engine. The log is mostly useful for detailed troubleshooting as the log can be verbose. 
Nonetheless, it may be useful at times to understand multiple details:

- Execution of any cube publication: allows seeing the data types and amount of data loaded to the server (MSI table description), the time spent building lookup, relation, and fact tables, and the level of parallelization (by observing the running threads).

- Execution of cube querying: allows seeing the CSI passes being executed and how long they take to run, along with the level of parallelization (by observing the threads running the different evaluation tasks).

When troubleshooting functional or performance issues, this log is very useful, but it must be run in single-user or single-job mode; in other words, the log should capture the information for only one job (one report execution, one cube publication, one document execution) in the whole server. Otherwise, the log will output information on multiple objects, and it is hard to relate those entries to the actual objects being run. For performance troubleshooting especially, it must be done in single-user, single-job mode.

Other Logs

To see the internal CSI passes only, without any other information, you can enable the Engine > CSI and MCE > Trace logs. The CSI log outputs the Cube Subset Instructions executed by the Analytical Engine; it makes it possible to see whether there are cross joins, or how tables are joined internally when querying MTDI cubes. The MCE Trace log also includes CSI passes, but these are document-execution specific. As explained in the Dashboard Design Considerations section, the multi-cube engine performs additional joins and queries; these passes are captured in the MCE Trace log.

Cube Health Checker

MicroStrategy introduced a new command in the Intelligence Server console, available only when running the Intelligence Server as an application from the command line, for troubleshooting purposes: cubehealthcheck. This command automatically verifies that the cube is structurally consistent and suggests corrective actions for the cube whenever possible. The command also collects relevant logs about the cube. It can perform diagnostics for one particular in-memory cube or for all loaded in-memory cubes (both Intelligent and Data Import cubes).

Once the server is running as an application (on Windows, run Program Files (x86)/MicroStrategy/Intelligence Server/MSTRSvr2_64; on Linux, run [MSTR_Install_PATH]/bin/mstrsvr), enter the following command:

cubehealthcheck {<cube_id> | allcube}

<cube_id> above represents the GUID of the cube report. If allcube is entered, diagnostics are run for all the cubes loaded in the Intelligence Server. After executing the command, the console outputs general information about the cube and suggested corrective actions for any problem found. A sample output looks like the following:

Project:3 Cube: CubeName...
[BEGIN] Check basic cube information
[INFO] Cube Name: CubeName...
[INFO] Cube ID: CD8C4EC511E5E61004DB0080EF552B41
[INFO] Version ID: F29A767711E637C048E20080EFC5C708
[INFO] The cube is a multi-table EMMA cube.
[INFO] The cube is not partitioned.
[INFO] The largest fact table has 96 row(s).
[INFO] The cube size is bytes.
[INFO] The report instance size is Bytes.
[INFO] Detailed information is in memory_usage.log
[END] Check basic cube information
[BEGIN] Dump cube structure logs
[INFO] cubeinfo.log is dumped.
[INFO] memory_usage.log is dumped.
[INFO] cube header is dumped.
[INFO] Server logs are copied from /home/diakite/mstr/LIN/BIN/Linux
[END] Dump cube structure logs
[BEGIN] Check Relation tables
[INFO] 1 relation tables in the cube
[INFO] Detailed information is in cubeinfo.log
[END] Check Relation tables
[BEGIN] Check fact tables
[INFO] There are 1 fact tables
[INFO] Detailed information is in cubeinfo.log
[BEGIN] Check fact table 0
[INFO] There are 96 row(s) in the table
[INFO] There are 2 attribute(s) in the table
[INFO] Attributes: State,INSTNM
[WARNING] There are level metrics in the cube: Metric with Nulls, Row Count - CubeName.xlsx
[WARNING] These level metrics have the same rows as main index, but in different order: Metric with Nulls, Row Count - CubeName.xlsx
[SUGGESTION] If the cube is published before MicroStrategy 10.3, you may run cubehealthcheck levelindex <cubeid> to check whether the cube is corrupted and fix it.
[END] Check fact table 0
[END] Check fact tables
[BEGIN] Check internal pointers
[END] Check internal pointers
====== Summary =====
[WARNING] There are level metrics in the cube: Metric with Nulls, Row Count - CubeName.xlsx
[WARNING] These level metrics have the same rows as main index, but in different order: Metric with Nulls, Row Count - CubeName.xlsx
[SUGGESTION] If the cube is published before MicroStrategy 10.3, you may run cubehealthcheck levelindex <cubeid> to check whether the cube is corrupted and fix it.
[INFO] logs are written to cubehealthchecker.log

Moreover, the Intelligence Server generates a set of logs for each cube.
The locations for these logs are generally the following for two sample operating systems:

Windows: Program Files (x86)/MicroStrategy/Intelligence Server/CubeHealthReport
Linux: [INSTALL_PATH]/IntelligenceServer/CubeHealthReport

If it is necessary to contact Technical Support about cube-related topics, please collect all the generated logs for the cubes in question and provide this information.
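The collection step can be scripted. The sketch below is illustrative only: it uses a stand-in directory (created on the fly so the sketch runs anywhere) in place of a real CubeHealthReport location, and bundles its contents into a single archive for a support case:

```python
# Illustrative sketch: bundle the generated CubeHealthReport logs into one
# archive to attach to a Technical Support case. The directory below is a
# stand-in; substitute the actual CubeHealthReport path from your install.
import tarfile
from pathlib import Path

report_dir = Path("/tmp/CubeHealthReport_demo")        # stand-in path
report_dir.mkdir(parents=True, exist_ok=True)
(report_dir / "cubeinfo.log").write_text("sample\n")   # stand-in log file

archive = Path("/tmp/cube_health_logs.tar.gz")
with tarfile.open(archive, "w:gz") as tar:
    tar.add(report_dir, arcname=report_dir.name)       # keep paths relative

print(f"created {archive}")
```

For a real case, point `report_dir` at the CubeHealthReport location listed above and attach the resulting archive to the support ticket.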


More information

Querying Data with Transact SQL

Querying Data with Transact SQL Course 20761A: Querying Data with Transact SQL Course details Course Outline Module 1: Introduction to Microsoft SQL Server 2016 This module introduces SQL Server, the versions of SQL Server, including

More information

TX DWA Contents RELEASE DOCUMENTATION

TX DWA Contents RELEASE DOCUMENTATION TX DWA 16.7 RELEASE DOCUMENTATION Contents Introduction... 2 New Features... 2 Differential Deployment... 3 Deployment Status Report... 3 Managed Deployment... 4 Data Export... 4 Dynamic Project Variables...

More information

COGNOS DYNAMIC CUBES: SET TO RETIRE TRANSFORMER? Update: Pros & Cons

COGNOS DYNAMIC CUBES: SET TO RETIRE TRANSFORMER? Update: Pros & Cons COGNOS DYNAMIC CUBES: SET TO RETIRE TRANSFORMER? 10.2.2 Update: Pros & Cons GoToWebinar Control Panel Submit questions here Click arrow to restore full control panel Copyright 2015 Senturus, Inc. All Rights

More information

ORANET- Course Contents

ORANET- Course Contents ORANET- Course Contents 1. Oracle 11g SQL Fundamental-l 2. Oracle 11g Administration-l 3. Oracle 11g Administration-ll Oracle 11g Structure Query Language Fundamental-l (SQL) This Intro to SQL training

More information

Optimizing Performance for Partitioned Mappings

Optimizing Performance for Partitioned Mappings Optimizing Performance for Partitioned Mappings 1993-2015 Informatica LLC. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording or otherwise)

More information

Performance Optimization for Informatica Data Services ( Hotfix 3)

Performance Optimization for Informatica Data Services ( Hotfix 3) Performance Optimization for Informatica Data Services (9.5.0-9.6.1 Hotfix 3) 1993-2015 Informatica Corporation. No part of this document may be reproduced or transmitted in any form, by any means (electronic,

More information

Computing Data Cubes Using Massively Parallel Processors

Computing Data Cubes Using Massively Parallel Processors Computing Data Cubes Using Massively Parallel Processors Hongjun Lu Xiaohui Huang Zhixian Li {luhj,huangxia,lizhixia}@iscs.nus.edu.sg Department of Information Systems and Computer Science National University

More information

Data Mining. Data preprocessing. Hamid Beigy. Sharif University of Technology. Fall 1395

Data Mining. Data preprocessing. Hamid Beigy. Sharif University of Technology. Fall 1395 Data Mining Data preprocessing Hamid Beigy Sharif University of Technology Fall 1395 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1395 1 / 15 Table of contents 1 Introduction 2 Data preprocessing

More information

Data Mining. Data preprocessing. Hamid Beigy. Sharif University of Technology. Fall 1394

Data Mining. Data preprocessing. Hamid Beigy. Sharif University of Technology. Fall 1394 Data Mining Data preprocessing Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1394 1 / 15 Table of contents 1 Introduction 2 Data preprocessing

More information

Table of Contents. Table of Contents

Table of Contents. Table of Contents Powered by 1 Table of Contents Table of Contents Dashboard for Windows... 4 Dashboard Designer... 5 Creating Dashboards... 5 Printing and Exporting... 5 Dashboard Items... 5 UI Elements... 5 Providing

More information

MicroStrategy Academic Program

MicroStrategy Academic Program MicroStrategy Academic Program Creating a center of excellence for enterprise analytics and mobility. HOW TO DEPLOY ENTERPRISE ANALYTICS AND MOBILITY ON AWS APPROXIMATE TIME NEEDED: 1 HOUR In this workshop,

More information

REPORTING AND QUERY TOOLS AND APPLICATIONS

REPORTING AND QUERY TOOLS AND APPLICATIONS Tool Categories: REPORTING AND QUERY TOOLS AND APPLICATIONS There are five categories of decision support tools Reporting Managed query Executive information system OLAP Data Mining Reporting Tools Production

More information

Sepand Gojgini. ColumnStore Index Primer

Sepand Gojgini. ColumnStore Index Primer Sepand Gojgini ColumnStore Index Primer SQLSaturday Sponsors! Titanium & Global Partner Gold Silver Bronze Without the generosity of these sponsors, this event would not be possible! Please, stop by the

More information

Data Validation Option Best Practices

Data Validation Option Best Practices Data Validation Option Best Practices 1993-2016 Informatica LLC. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording or otherwise) without

More information

Table of Contents: Microsoft Power Tools for Data Analysis #15 Comprehensive Introduction to Power Pivot & DAX. Notes from Video:

Table of Contents: Microsoft Power Tools for Data Analysis #15 Comprehensive Introduction to Power Pivot & DAX. Notes from Video: Microsoft Power Tools for Data Analysis #15 Comprehensive Introduction to Power Pivot & DAX Table of Contents: Notes from Video: 1) Standard PivotTable or Data Model PivotTable?... 3 2) Excel Power Pivot

More information

Querying Microsoft SQL Server

Querying Microsoft SQL Server Querying Microsoft SQL Server Course 20461D 5 Days Instructor-led, Hands-on Course Description This 5-day instructor led course is designed for customers who are interested in learning SQL Server 2012,

More information

Business Intelligence

Business Intelligence Business Intelligence The Metadata Layer Asroni Ver. 01 asroni@umy.ac.id Part IV Business Intelligence Applications 345 Applications In This Part Chapter 12: The Metadata Layer Chapter 13: Using the Pentaho

More information

CSE 544, Winter 2009, Final Examination 11 March 2009

CSE 544, Winter 2009, Final Examination 11 March 2009 CSE 544, Winter 2009, Final Examination 11 March 2009 Rules: Open books and open notes. No laptops or other mobile devices. Calculators allowed. Please write clearly. Relax! You are here to learn. Question

More information

SQL Interview Questions

SQL Interview Questions SQL Interview Questions SQL stands for Structured Query Language. It is used as a programming language for querying Relational Database Management Systems. In this tutorial, we shall go through the basic

More information

T-SQL Training: T-SQL for SQL Server for Developers

T-SQL Training: T-SQL for SQL Server for Developers Duration: 3 days T-SQL Training Overview T-SQL for SQL Server for Developers training teaches developers all the Transact-SQL skills they need to develop queries and views, and manipulate data in a SQL

More information