How Clean is Clean Enough? Determining the Most Effective Use of Resources in the Data Cleansing Process


Research-in-Progress
Thirty Fifth International Conference on Information Systems, Auckland 2014

Jeffery Lucas, The University of Alabama, Tuscaloosa, AL
Uzma Raja, The University of Alabama, Tuscaloosa, AL
Rafay Ishfaq, Auburn University, Auburn, AL

Abstract

Poor data quality can have a significant impact on system and organizational performance. With the significant increase in data gathering and storage, the number of data sources that must be merged in data warehouse and Enterprise Resource Planning (ERP) implementations has grown substantially. This makes data cleansing, performed as part of the implementation conversion, increasingly difficult. In this research we expand the traditional Extract-Transform-Load (ETL) process to identify subprocesses between the main stages. We then identify the decisions and tradeoffs related to the allocation of time, resources and accuracy constraints in the data cleansing process. We develop a mathematical model of the process to identify the optimal configuration of these factors in the data cleansing process. We use empirical data to test the feasibility of the proposed model. Multiple domain experts validate the range of constraints used for model testing. Three different levels of cleansing complexity are tested in the preliminary analysis to demonstrate the use and validity of the modeling process.

Keywords: Data Cleansing, Integer Programming, ETL, Optimization

Introduction

Data Warehouse and ERP implementations involve large data integration projects. As the amount of data and the number of data sources increase, this integration process becomes increasingly complex. By recent estimates, 88 percent of all data integration projects either fail completely or significantly overrun their budgets (Marsh 2005). Even in the projects that are completed, the post-conversion data quality is often less than desirable (Watts et al. 2009). Poor quality data within ERP and Data Warehouse systems can have a severe impact on organizational performance, including customer dissatisfaction, increased operational cost, less effective decision-making and a reduced ability to make and execute strategy (Redman 1998). As data sets become larger and more complex, the risks associated with inaccurate data will continue to increase (Watts et al. 2009). While the costs of poor data quality can be difficult to measure, many organizations have determined they are significant enough to take on data cleansing projects and to institute Master Data Management roles and techniques within the organization. Gartner estimates that revenue for data quality tools will reach $2 billion by the end of 2017 (Friedman and Bitterer 2006). As sources of data grow and data sets become larger, the impacts of poor data quality will continue to grow.

ETL is a process in which multiple software tools are used to extract data from several sources, cleanse it, customize it and insert it into a data warehouse. To address the need for quality data, organizations will often take on data cleansing activities as part of the conversion ETL process. Given that high quality data is associated with perceived system value (Wixom and Watson 2001), it is important to fully understand the cleansing process adopted by organizations during the ETL process. During the cleansing process, the expert conducts data analysis, defines mapping rules, completes data verification and manually cleans the data when necessary. The quality of these tasks depends upon the expertise of the individuals performing the cleansing (Galhardas 2011). This presents the conversion expert with the question of determining the best use of the experts' time. Depending on the risks and costs associated with dirty data compared with the costs of additional cleansing efforts, a conversion expert could choose to build additional and potentially more complex data cleansing rules, to manually clean the data, or to carry the inaccurate data into the production environment. The costs and risks associated with these choices must be weighed to determine a proper course of action and the best use of resources during the cleansing process.

In this paper, we investigate the interplay of these resource assignment and data quality issues. We draw from existing literature on data quality, the ETL process and the costs of quality. To the best of our knowledge, this is the first study to model the iterative nature of the data cleansing process within the ETL process and to use optimization techniques to identify the most suitable strategy for data cleansing. We present a detailed view of the data cleansing process that forms the framework for this study. We then discuss the methodology used in the study, followed by an integer programming mathematical model for the analysis.
We conclude by discussing our plan for conducting the analysis.

Literature Review

Data cleansing, also called data cleaning or scrubbing, deals with detecting and removing errors and inconsistencies from data in order to improve the quality of the data (Rahm and Do 2000). In general, the cleansing expert seeks to determine the potential error types, search the data to identify instances of those errors, and correct the errors once found (Maletic and Marcus 2000). From a process standpoint, the cleansing expert conducts data analysis, defines the workflow and mapping rules, verifies the results and conducts the transformation (Rahm and Do 2000). Thus the details of the process through which data cleansing is performed play an important role in the overall quality of the data. In this study we develop a granular view of the ETL process and its related stages.

Data cleansing, especially for large datasets, has significant costs associated with it. Poor quality of input data leads to increased time and effort expended on the cleansing process without guarantees of a high quality outcome. There are four types of costs associated with poor data quality (Haug et al. 2011): direct and hidden operational costs, such as manufacturing errors, payment errors, long lead times and employee dissatisfaction, and direct and hidden costs related to strategic decisions, including poor planning, poor pricing policies and decreased efficiency (Haug et al. 2011).

When examining decisions made in ETL processes, cost is an important factor to consider, since these direct and indirect costs can undermine organizational policies on data accuracy and availability. In this research we consider cost as a major factor in decisions regarding cleansing cycles and accuracy requirements.

When assessing the state of data quality within an organization, both subjective and objective measures must be considered (Pipino et al. 2002). Subjective measures deal primarily with the data consumer's perception of the data quality. These subjective measures can be captured with the use of a questionnaire. As mentioned earlier, these subjective measures are important as they drive user behavior. The most straightforward objective measure is a simple ratio, which measures the ratio of desired outcomes to total outcomes (Pipino et al. 2002). This study focuses on this objective measure of accuracy. The user's impact on the data cleansing process is determined using the ratio of inaccurate records to total records before and after the user's involvement. In this research we use accuracy requirements as one of the main constraints when making decisions regarding cleansing cycles.

Research Framework

Data conversions for ERP and data warehouse systems are extremely complex and multi-faceted. Current trends in data collection result in multiple sources of data that are available for use in enterprise systems. This data is merged and mapped into the proper format for loading into the target system, then loaded and verified: the classic ETL process. Most literature has focused either on the ETL process, including mapping rules and solving the merge/purge problem, or on data cleansing in existing systems. As mentioned in the introduction, these two processes are tightly bound during the conversion process, and both are typically required for success. Much of the research on ETL considers a simplistic view of the flow of data from extract to transform to load. While this holds true at a high level, in reality there is a complex mesh of decisions to be made within these processes that impact the cost, time and resource utilization of the data cleansing process required to obtain a target accuracy (data quality). To understand the complex nature of the decisions within the ETL process, we take a closer look into the black box. The actual flow of data is iterative in nature and involves decisions regarding expert resource assignment to tasks, time allocation, costs and accuracy compromises (Vassiliadis et al. 2009). A detailed view of the expanded ETL process is presented in Figure 1.

Figure 1. Expanded ETL Process Outline

Process Outline

Prior to beginning the ETL process, the expert must first define the initial conversion requirements. This includes identifying sources of data and understanding the layout and schema of the extracted data. The expert will also define mapping and transformation rules and any data correction rules that are known at the start of the project. Prior to the initial extraction of data, the expert will also work to understand the users' trust level with each source of data (Vassiliadis et al. 2002).

After developing the initial mapping and data cleansing rules, the data is ready for extraction. Following extraction, the data is loaded into the ETL tool. At this point the data is co-located, simplifying the data analysis and cleansing process. Data analysis and the transformation step form an iterative process. Initial data review and profiling produces simple frequency counts, population levels, simple numerical analysis, etc. At this point, experts can begin data editing, including checking for missing data, incorrect formats, values outside of the valid range, etc. Simple cross-field editing can also be used to ensure effective data synthesis; a sketch of such checks appears at the end of this subsection. After completing the initial data analysis, the data is prepared for more detailed analysis. The data is consolidated using the transformation and cleansing rules identified to this point. Once the data is consolidated, advanced techniques can be used to analyze the data, including association discovery, clustering and sequence discovery (Maletic and Marcus 2000). More detailed and specific business-level rules can also be applied at this point in the process.

After several rounds of refining the extraction, defaulting, mapping and consolidation rules through data analysis, the expert is ready to transform and load the data. While these transformations and mappings may be complex, they should be thoroughly defined and validated through the data analysis and cleansing process. Transforming the data into a format usable by the target system and loading it into the target system should then be a relatively straightforward process of applying the rules defined and tested in the previous steps.
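As an illustration of the kinds of edit checks applied during initial data analysis, the sketch below profiles a small record set for missing values, malformed dates, out-of-range values and one cross-field rule. The field names, formats and valid ranges are hypothetical examples, not fields from the study's data.

```python
# A minimal profiling/editing sketch: frequency counts of detected issues from
# missing-value, format, range and cross-field checks. Field names, formats and
# thresholds are hypothetical.
from collections import Counter
from datetime import datetime

records = [
    {"id": 1, "birth_date": "1961-04-12", "hire_date": "1984-06-01", "annual_pay": 52000},
    {"id": 2, "birth_date": "",           "hire_date": "1990-02-15", "annual_pay": 61000},
    {"id": 3, "birth_date": "1975-13-40", "hire_date": "1970-01-01", "annual_pay": -10},
]

def valid_date(s, fmt="%Y-%m-%d"):
    """Return the parsed date, or None if the value is missing or malformed."""
    try:
        return datetime.strptime(s, fmt).date()
    except ValueError:
        return None

issues = Counter()
for r in records:
    birth = valid_date(r["birth_date"])
    hire = valid_date(r["hire_date"])
    if birth is None:
        issues["missing_or_bad_birth_date"] += 1
    if hire is None:
        issues["missing_or_bad_hire_date"] += 1
    if not (0 <= r["annual_pay"] <= 1_000_000):       # range check
        issues["pay_out_of_range"] += 1
    if birth and hire and hire <= birth:               # cross-field edit
        issues["hired_before_birth"] += 1

print(dict(issues))   # simple frequency counts of detected error types
```

In practice these counts feed the next round of rule refinement: each issue type either gets a new mapping or defaulting rule, is routed to manual cleanup, or is accepted as-is.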
Data Cleansing

As data errors are identified during data analysis, there are three choices to consider. First, the mapping, defaulting and transformation rules can be updated to clean the data in an automated fashion; this could even include identifying new sources of conversion data. Second, the expert can choose to manually clean records. There are costs associated with cleansing data, including time, labor and computing resources, and at some point cleansing data becomes more costly than the ongoing costs of dirty data (Haug et al. 2011). This leaves the expert with the third option of leaving the data as is and accepting the ongoing costs and risks associated with the erroneous records.

Several factors drive the decision to automate, manually clean or leave the data as is, including population size, the complexity of the cleansing logic and the criticality of the erroneous data elements. The expert must first determine whether an automated approach is even feasible. In some cases the data needed to correct the erroneous records is not available and defaulting logic is not appropriate for the data set. If a suitable mapping rule or defaulting logic can be identified, the expert must weigh the cost of building the correction logic against the cost of manual cleansing.

The cost of cleansing a dataset manually tends to be linear in nature: the larger the data set, the higher the cost to cleanse it. This may not be the case with automated cleansing, where the majority of the costs lie in developing and testing the cleansing logic. Once developed, the cleansing logic can be run against varying population sizes with less impact on the overall cost of the project. The expert must also consider the criticality of the erroneous records. How often will the data be used? How will the data be used? What risks are introduced to ongoing processing and decision making if the data is left in its current state? The expert must balance these ongoing risks and costs against the costs of cleansing the data and the resources available for the effort; a simple break-even sketch of this tradeoff follows.
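To make the linear-versus-fixed-cost tradeoff concrete, the short sketch below compares a manual per-record cost with an automated approach that carries a fixed development cost plus a small per-record run cost, and solves for the break-even population size. All cost figures are hypothetical assumptions, not values from the study.

```python
# Hypothetical cost comparison: manual cleansing scales linearly with the number
# of erroneous records, while automated cleansing is dominated by a fixed
# development/testing cost. The break-even point shows roughly where building
# correction logic starts to pay off.
import math

manual_cost_per_record = 4.00       # assumed labor cost to fix one record by hand
rule_development_cost = 6000.00     # assumed cost to build and test the correction rule
automated_cost_per_record = 0.05    # assumed compute/verification cost per record

def manual_cost(n_records: int) -> float:
    return manual_cost_per_record * n_records

def automated_cost(n_records: int) -> float:
    return rule_development_cost + automated_cost_per_record * n_records

# Break-even: m*n = d + a*n  =>  n = d / (m - a)
break_even = math.ceil(rule_development_cost /
                       (manual_cost_per_record - automated_cost_per_record))
print(f"Automation pays off above roughly {break_even} erroneous records")

for n in (500, break_even, 20_000):
    cheaper = "automate" if automated_cost(n) < manual_cost(n) else "clean manually"
    print(n, round(manual_cost(n), 2), round(automated_cost(n), 2), cheaper)
```

The same comparison would also need to fold in the ongoing cost of leaving records dirty, which is the third option discussed above.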

Methodology

The framework in the previous section presented the process view and the decisions involved in the ETL data cleansing process. The challenge is to balance the requirements for data accuracy against the resource limitations in terms of time, cost and expertise. Timely delivery of high quality data at the output, while staying within the budget, is the ideal goal for the process. The effectiveness of the process depends upon decisions about the level of accuracy, the allocation of experts to cleansing tasks and the reaction time to each task. To evaluate the decisions and the tradeoffs involved in the data cleansing process, we use a binary integer programming methodology. This methodology allows us to formulate the system flows, develop a mathematical model and then experimentally evaluate various decisions and their interactions. Such a technique is useful for planning decisions in practical situations. The goal is to identify the optimal configuration of the ETL process.

The processes, stages, tasks and decisions in the granular ETL process discussed in the previous section can be represented as a network graph. The process flow of ETL is represented by a network graph consisting of sets of nodes and links, as shown in Figure 2. The set of nodes represents the different stages of the ETL process, whereas the set of links represents the process flow. The stages in the ETL process consist of two groups: main stages (extract, transform, and load) and intermediate stages (evaluate, re-work). The sequence of stages through which data flows in the ETL process constitutes a path. There are multiple paths in the graph, each organized as a sequence of stages through which data is processed. All paths begin at the Start node and terminate at the End node of the graph. Each stage processes the data using certain resources, which vary in their expertise and capabilities.

Figure 2. Network Graph Representation of ETL Process

Each stage in the graph is represented by a node and its attributes. Each node j has the following attributes: unit processing time t_j, unit cost c_j and the ability to improve data accuracy π_j. The expertise of a resource r processing data at stage j is given by e_rj. The objective of this model is to identify the stages through which data will be processed in order to achieve system-level goals related to data quality, processing time and project cost. These system-level goals are represented by σ (target overall data quality), τ (available total processing time) and µ (available monetary resources). A sequence of stages through which data is processed is referred to as a feasible path of the ETL process. The sequence of stages in a feasible path is represented by the variables x_ij. Each feasible path has a corresponding total processing time and cost incurred in achieving the target system-level goals. Note that all feasible paths of the ETL process must contain the main stages (extract, transform, and load). Each intermediate stage (evaluate, re-work) of data processing added to a path will improve data accuracy, although it will also increase the total processing time and cost. The optimal path of the ETL process is one that achieves the targeted accuracy level using the fewest intermediate data processing stages; an illustrative sketch of how a candidate path's totals compare with these targets follows.
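Before turning to the formal model, the following sketch illustrates the node attributes and the notion of a feasible path: given hypothetical per-stage times t_j, costs c_j and accuracy improvements π_j, it totals a candidate path and checks it against illustrative targets σ, τ and µ. All stage names and numbers are assumed placeholders, and the accuracy improvements are treated as additive for simplicity.

```python
# Illustrative node attributes and a feasibility check for candidate paths.
# Stage names, attribute values and targets are hypothetical placeholders.
from dataclasses import dataclass

@dataclass
class Stage:
    time: float       # unit processing time t_j (hours)
    cost: float       # unit processing cost c_j ($)
    accuracy: float   # accuracy improvement pi_j contributed by the stage

stages = {
    "extract":   Stage(time=40,  cost=2000, accuracy=0.10),
    "evaluate":  Stage(time=80,  cost=3000, accuracy=0.25),
    "transform": Stage(time=120, cost=5000, accuracy=0.40),
    "rework":    Stage(time=100, cost=4000, accuracy=0.20),
    "load":      Stage(time=40,  cost=1500, accuracy=0.05),
}

sigma, tau, mu = 0.95, 528, 20_000   # target accuracy, time budget (h), budget ($)

def evaluate_path(path):
    """Total time, cost and accuracy of a path, and whether it meets all targets."""
    total_t = sum(stages[s].time for s in path)
    total_c = sum(stages[s].cost for s in path)
    total_a = sum(stages[s].accuracy for s in path)   # additive accuracy assumption
    feasible = total_a >= sigma and total_t <= tau and total_c <= mu
    return total_t, total_c, total_a, feasible

print(evaluate_path(["extract", "transform", "load"]))                        # misses sigma
print(evaluate_path(["extract", "evaluate", "transform", "rework", "load"]))  # feasible
```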

Next we develop a mathematical formulation of the network graph to represent the ETL process. The network graph representation serves as the foundation for building the mathematical representation of the system. This formulation implements the relationships in the ETL process in the form of system parameters and decision variables. Solving this formulation gives the values of the decision variables, which are interpreted as choices about the configuration of the ETL process, i.e., the main stages and intermediate stages through which data will be processed. Given system-level targets of data accuracy, there will be tradeoffs between allocating experts to cleansing tasks, the time available to perform a process and the nature of the analysis performed in the sub-processes. The mathematical representation of the ETL process and the tradeoffs involved in the selection of data processing stages is developed using the following notation:

Sets:
S = set of stages
A = set of links between stages
R = set of resources

Parameters:
π_j = improvement in data accuracy after completing stage j
c_j = data processing cost at stage j
t_j = processing time at stage j
e_rj = expertise of resource r to process data in stage j
τ = available ETL completion time
µ = available ETL monetary budget
σ = data accuracy requirement at the end of the ETL process
b_i = constant; b_i = 1 if i is the Start node, b_i = -1 if i is the End node, b_i = 0 otherwise

Decision Variables:
x_ij = 1 if data is transferred to stage j after stage i, 0 otherwise

Model:

Minimize
  \sum_{(i,j) \in A} x_{ij}                                                        (1)

Subject to
  \sum_{j:(i,j) \in A} x_{ij} - \sum_{j:(j,i) \in A} x_{ji} = b_i, \quad \forall i \in S   (2)
  \sum_{j:(i,j) \in A} x_{ij} \le 1, \quad \forall i \in S                          (3)
  \sum_{(i,j) \in A} t_j \, x_{ij} \le \tau                                         (4)
  \sum_{(i,j) \in A} c_j \, x_{ij} \le \mu                                          (5)
  \sum_{(i,j) \in A} \pi_j \, x_{ij} \ge \sigma                                     (6)

The objective function of the mathematical formulation is given by (eq-1). The output of this model yields an optimal path that identifies the intermediate stages used in the ETL process. The constraints in (eq-2) implement data flow requirements for feasible paths from the Start node to the End node of the ETL network graph. The constraints in (eq-3) require that when there are multiple choices for selecting the next stage of the ETL process, only one stage is selected. The limitation on total available time for the ETL process is implemented by constraint (eq-4). Constraint (eq-5) limits the total cost of the ETL process to be within budget. Constraint (eq-6) implements the requirement that a feasible path consists of a sufficient number of intermediate (and main) stages to ensure that the ETL process achieves the target data accuracy level.
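For readers who want to experiment with the formulation, the sketch below implements (eq-1) through (eq-6) in Python with the open-source PuLP modeler rather than the AMPL/CPLEX setup used in the study. The toy graph, stage names and parameter values are the same hypothetical placeholders used in the earlier path sketch, and the accuracy constraint treats the per-stage improvements π_j as additive; none of this reproduces the paper's empirical data.

```python
# A minimal sketch of the binary integer program (eq-1)..(eq-6) using PuLP.
# Node names, parameter values and the toy graph are hypothetical placeholders.
import pulp

nodes = ["start", "extract", "evaluate", "transform", "rework", "load", "end"]
links = [("start", "extract"), ("extract", "evaluate"), ("extract", "transform"),
         ("evaluate", "transform"), ("transform", "rework"), ("transform", "load"),
         ("rework", "load"), ("load", "end")]

t  = {"start": 0, "extract": 40, "evaluate": 80, "transform": 120,
      "rework": 100, "load": 40, "end": 0}             # processing time t_j (hours)
c  = {"start": 0, "extract": 2000, "evaluate": 3000, "transform": 5000,
      "rework": 4000, "load": 1500, "end": 0}          # processing cost c_j ($)
pi = {"start": 0.0, "extract": 0.10, "evaluate": 0.25, "transform": 0.40,
      "rework": 0.20, "load": 0.05, "end": 0.0}        # accuracy improvement pi_j

tau, mu, sigma = 528, 20_000, 0.95                     # time, budget, accuracy targets
b = {n: 0 for n in nodes}
b["start"], b["end"] = 1, -1                           # flow-balance constants b_i

prob = pulp.LpProblem("etl_optimal_path", pulp.LpMinimize)
x = pulp.LpVariable.dicts("x", links, cat="Binary")    # x_ij = 1 if link (i,j) is used

# (1) minimize the number of stages (links) on the chosen path
prob += pulp.lpSum(x[a] for a in links)

for i in nodes:                                        # (2) flow balance, (3) one successor
    out_i = [a for a in links if a[0] == i]
    in_i = [a for a in links if a[1] == i]
    prob += pulp.lpSum(x[a] for a in out_i) - pulp.lpSum(x[a] for a in in_i) == b[i]
    prob += pulp.lpSum(x[a] for a in out_i) <= 1

prob += pulp.lpSum(t[j] * x[(i, j)] for (i, j) in links) <= tau     # (4) time budget
prob += pulp.lpSum(c[j] * x[(i, j)] for (i, j) in links) <= mu      # (5) monetary budget
prob += pulp.lpSum(pi[j] * x[(i, j)] for (i, j) in links) >= sigma  # (6) target accuracy

prob.solve()
path = [a for a in links if x[a].value() > 0.5]
print(pulp.LpStatus[prob.status], path)
```

On this toy instance the solver selects the longer path through the evaluate and re-work stages, since the direct extract-transform-load path cannot reach the accuracy target within the stated assumptions.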

Preliminary Analysis

The optimization model developed in the previous section served as the basis of our preliminary experimental study. The formulation was coded in AMPL, using its interactive command environment for setting up optimization models, and solved using the built-in library of solution algorithms available in IBM CPLEX (Fourer et al. 2002). Next, a full factorial experimental study was set up to examine the interplay of time, budget, expertise, accuracy and size; the output for each combination was the resulting process configuration. An instance of the formulation represents a given scenario solved by the solver, and each instance is determined by the size of the data.

Data Description

To test the feasibility of the model, data was generated for a model run. At least one instance of data is required for every combination of the factorial design. The output changes deterministically with changes in the parameter values, so by examining boundary values we can identify where the effects change, along with the factor and interaction effects on the outcome, i.e., the optimal configuration of the ETL process and its stages.

Experts in the field were recruited to provide insights into realistic levels of the parameters listed in the previous section. The first author of the paper has over 20 years of experience in the data conversion domain and proposed the initial accuracy rates. Experts with at least 15 years of experience were contacted and sent a description of the problem. The data was based on real problem sets from the insurance and benefits industries. Real data samples were used to identify realistic data sizes and the inaccuracies in the input population. The experts validated the realistic accuracy expectations, the competencies of experts for tasks, and the time and budget constraints. The error rates and accuracy estimates were also based on the actual numbers observed in the cleaning process. Three experts then independently verified the numbers and proposed modifications based on their experience. The suggested modifications were within the margin of error, and all three experts verified the final numbers to ensure there was no bias.

In order to narrow in on sample data, we focused on a single domain for the conversion and a single use for the data post-conversion. After narrowing our focus we were able to work with industry experts to define typical values for a conversion. The focus of our analysis is a Defined Benefit (DB) conversion. A DB plan is more commonly referred to as a pension plan. DB conversions tend to be very complex for several reasons. The plan administrator must track data on the plan participants for 30 or more years, for use in a calculation at retirement. There is a vast amount of data to track over a very long period of time. The problem is compounded by the fact that very few participants review their data until retirement, so issues are not found and corrected in a timely manner.
Plan sizes can vary greatly as well, from only a few hundred participants at small organizations to hundreds of thousands of participants at Fortune 500 companies or government entities. For the purpose of this analysis, we focused on plans with roughly 20,000 participants. The 20k participants represent the total number of participants (current employees, former employees and beneficiaries) in the conversion; this does not represent the true volume of data. For each participant the plan administrator must convert several types of data (tables in the database), and each type of data has several characteristics (fields on the table). Consider an example: for each person we would need personal indicative data and pay data. Personal indicative data likely contains fields such as first name, last name and birth date. Pay data would include the type of pay, the frequency of pay, the date it was paid and the amount. It can be seen that a conversion of 20k participants can actually involve a very large amount of data. For example, a plan that pays its employees twice a month and has 10k active employees will generate 240k rows of pay data per year (see the sketch following this paragraph). The complexity of a DB conversion can be driven by several other factors as well: the number of unions and the complexity of their rules, the number of mergers and acquisitions the organization has been through, and the number of past system conversions, just to name a few. For the purpose of this effort we defined three broad complexity levels: Simple, Average and Complex.
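The record-volume arithmetic above can be sketched as follows; the participant counts, pay frequency and history length are hypothetical assumptions used only to illustrate how quickly row counts grow.

```python
# Rough data-volume sketch for a hypothetical DB plan conversion.
# All inputs are illustrative assumptions, not figures from the study.
active_employees = 10_000       # participants still being paid
pay_periods_per_year = 24       # semi-monthly payroll
years_of_history = 30           # history the plan administrator must retain

pay_rows_per_year = active_employees * pay_periods_per_year
total_pay_rows = pay_rows_per_year * years_of_history

print(f"{pay_rows_per_year:,} pay rows per year")                      # 240,000
print(f"{total_pay_rows:,} pay rows over {years_of_history} years")    # 7,200,000
```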

The use of the data post-conversion in our analysis was real-time online retirement processing. This is important because it drives the requirement for data quality at the end of the ETL process. If the data were used only for reporting, or even for offline retirement processing, the required data quality could be significantly lower. Since the transactions are taken online, the user expects the system to work without interruption. In this environment, invalid data is tracked at the participant level. We may convert thousands of individual fields for each participant (the entire pay history, hours history, etc.), but if anything is wrong in any of the fields, we cannot process the transaction for that participant. The larger the participant count, the higher the data accuracy must be. If the plan has 5,000 participants and a 5% error rate, there are 250 participants who will need to be handled offline when they try to retire. If the plan has 500,000 participants and a 5% error rate, there are 25,000 participants to handle in a manual fashion.

The type of error dictates the experience level of the ETL team member needed to resolve it. The four types of errors considered in the analysis are:

Type 1 - Incorrect data that requires very little effort to clean up. An example may be a population that is missing a birth date where the data is available in another source, so the correction rule might be as simple as using the normal mapping if the data is there, else using an alternate mapping.

Type 2 - Incorrect data that requires effort to correct, but does not require specialized knowledge or skills and does not require overly complicated code. Using the missing birth date example, if we had the data to correct the birth date but had to evaluate several pieces of data to determine which source to use, it is considered a Type 2 error.

Type 3 - Incorrect data that requires significant effort to correct, or a specialized skill set or knowledge, e.g., an incorrect service amount that must be calculated at conversion.

Type 4 - Incorrect data that simply cannot be cleaned in an automated fashion. The data does not exist or can only be found manually with extensive research.

Table 1. Percentage of error types in each of the three conversions

Conversion    Total Errors    Type 1    Type 2    Type 3    Type 4
Simple            20%           10%       7%        2%        1%
Average           40%           20%      10%        6%        4%
Difficult         60%           25%      18%       10%        7%

The three types of conversion are described in Table 1 in terms of the percentages of the four error types. The first column defines the conversion type: Simple, Average and Complex. Next is the percentage of total errors. These figures were derived from actual empirical data and then validated by the three experts to ensure that the cases were not atypical. The following columns present the percentages of the individual error types listed earlier; a sketch translating these percentages into record counts appears after Table 2.

Table 2. Parameters for each conversion type and the estimated constraints (per-node values of t, c, e and π for the Programmer, Analyst and Analyst Lead resources at the Extract, Transform and Load stages; Simple conversion: τ = 528 h, σ = 0.95, $9.22; Average conversion: τ = 880 h, σ = 0.95, $20.39; Complex conversion: τ = 1584 h, σ = 0.95, $59.68)
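As a quick illustration of what the error-rate percentages in Table 1 imply for a plan of the size studied, the sketch below converts them into participant counts for a hypothetical 20,000-participant conversion; the resulting counts are illustrative, not results from the model.

```python
# Translate the Table 1 error-type percentages into participant counts for a
# hypothetical plan of 20,000 participants (illustration only).
participants = 20_000

# (total error rate, Type 1, Type 2, Type 3, Type 4) from Table 1
error_rates = {
    "Simple":    (0.20, 0.10, 0.07, 0.02, 0.01),
    "Average":   (0.40, 0.20, 0.10, 0.06, 0.04),
    "Difficult": (0.60, 0.25, 0.18, 0.10, 0.07),
}

for conversion, (total, t1, t2, t3, t4) in error_rates.items():
    counts = [round(participants * r) for r in (total, t1, t2, t3, t4)]
    print(conversion, dict(zip(["total", "type1", "type2", "type3", "type4"], counts)))
    # Type 4 records cannot be cleaned automatically, so they indicate the
    # minimum number of participants likely to need manual, offline handling.
```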

Parameter Description

The parameters identified in the model in the previous section were defined based on the empirical data for each of the three conversion types. The target accuracy, budget (cost), time permitted for the conversion process and the types of resources available were also derived from the sample data. Since the three experts were from different firms, they were able to validate the estimates for the sample data independently to ensure that the estimates were reasonable. The estimates of the accuracy of the cleansing process and the time taken for manual and automated cleaning were also extracted from the sample datasets used for the study. It must be noted that while our network implementation of the nodes in Figure 2 shows only two stages for each of the ETL processes, in reality there can be multiple iterations in each stage before the data is passed on to the next process. The parameters for each of the three conversion types, along with the percentage improvement at each node, are presented in Table 2.

Results and Discussion

The data described in the previous section was used in the optimization model. The output of the model was used to record the optimal path. The optimal path identified the different stages of the ETL process needed to achieve the target accuracy level while conforming to the budgetary and processing time constraints. The optimal path for the Simple conversion is: Start End. The optimal path for the Average conversion is: Start End. The optimal path for the Complex conversion is: Start End. The numbers in the optimal paths represent the nodes in Figure 2, and the order of the nodes represents the sequence of stages in the optimal ETL process.

For the case of simple conversions, the optimal path identified no data cleaning in the extract phase, while resolving most of the errors in the transform phase through a combination of manual and automated cleansing. This could be indicative of the nature of the conversion task, which makes cleansing improvements during the extract and load phases marginal. For conversions of average complexity, the optimal path indicated that cleansing at each node improves accuracy significantly. This result also shows that manual processing is optimal compared to automated processes for data errors of higher complexity. For the most complex conversion, the optimal path is similar to that of the average conversion, with the difference that an additional cleansing step was used in the transform stage. In this step, automated cleansing provided a better data cleaning option than the manual process. Hence, the nature and type of errors in a complex conversion made it different from the average and simple conversions. Furthermore, analysis of the results showed that the budget and available time for the three types of conversions also play key roles in the composition of optimal paths that satisfy the data cleansing requirements.

Conclusion

In this research in progress, we develop an optimization model of the ETL process. We identify the major stages and the subprocesses within the ETL process that impact the quality of data cleansing. We use the parameters of time, expert resources, budget and target data accuracy requirements to formulate a mathematical model of the system. We then use empirical data for conversions of three different complexity levels from one domain.
The estimates of the model parameters are derived from the sample data and validated independently by three industry experts. Preliminary results have been presented that show how the complexity of the task, the accuracy requirements, and the budget and time available can impact the optimal path for data cleansing. Sometimes cleansing upfront in the extract process is not the best solution, and vice versa. This preliminary analysis establishes the feasibility of the approach. Future studies will use additional data sources and include additional sub-stages within the ETL process. This research will help us understand the various facets of the data cleansing process within ETL and provide optimal combinations of decisions regarding resource allocation, budget and data accuracy.

References

Fourer, R., Gay, D. M., and Kernighan, B. W. 2002. AMPL: A Modeling Language for Mathematical Programming. Duxbury Press / Brooks/Cole Publishing Company.

Friedman, T., and Bitterer, A. 2006. "Magic Quadrant for Data Quality Tools," Gartner Group.

Galhardas, H., Lopes, A., and Santos, E. 2011. "Support for User Involvement in Data Cleaning," in Proceedings of the 2011 International Conference on Data Warehousing and Knowledge Discovery.

Haug, A., Zachariassen, F., and Liempd, D. 2011. "The Costs of Poor Data Quality," Journal of Industrial Engineering and Management (4:2).

Lee, Y., Strong, D., Kahn, B., and Wang, R. 2002. "AIMQ: A Methodology for Information Quality Assessment," Information & Management (40:2).

Maletic, J., and Marcus, A. 2000. "Data Cleansing: Beyond Integrity Analysis," in Proceedings of the 2000 Conference on Information Quality.

Marsh, R. 2005. "Drowning in Dirty Data? It's Time to Sink or Swim: A Four-Stage Methodology for Total Data Quality Management," Journal of Database Marketing & Customer Strategy Management (12:2).

Pipino, L., Lee, Y., and Wang, R. 2002. "Data Quality Assessment," Communications of the ACM (45:4).

Rahm, E., and Do, H. H. 2000. "Data Cleaning: Problems and Current Approaches," Bulletin of the IEEE Computer Society Technical Committee on Data Engineering (23:4).

Redman, T. 1998. "The Impact of Poor Data Quality on the Typical Enterprise," Communications of the ACM (41:2).

Vassiliadis, P., Simitsis, A., and Baikousi, E. 2009. "A Taxonomy of ETL Activities," in Proceedings of the 12th ACM International Workshop on Data Warehousing.

Watts, S., Shankaranarayanan, G., and Even, A. 2009. "Data Quality Assessment in Context: A Cognitive Perspective," Decision Support Systems (48:1).

Wixom, B., and Watson, H. 2001. "An Empirical Investigation of the Factors Affecting Data Warehousing Success," MIS Quarterly (25:1).


More information

Management Information Systems. B15. Managing Information Resources and IT Security

Management Information Systems. B15. Managing Information Resources and IT Security Management Information Systems Management Information Systems B15. Managing Information Resources and IT Security Code: 166137-01+02 Course: Management Information Systems Period: Spring 2013 Professor:

More information

Improving Data Governance in Your Organization. Faire Co Regional Manger, Information Management Software, ASEAN

Improving Data Governance in Your Organization. Faire Co Regional Manger, Information Management Software, ASEAN Improving Data Governance in Your Organization Faire Co Regional Manger, Information Management Software, ASEAN Topics The Innovation Imperative and Innovating with Information What Is Data Governance?

More information

Automatic New Topic Identification in Search Engine Transaction Log Using Goal Programming

Automatic New Topic Identification in Search Engine Transaction Log Using Goal Programming Proceedings of the 2012 International Conference on Industrial Engineering and Operations Management Istanbul, Turkey, July 3 6, 2012 Automatic New Topic Identification in Search Engine Transaction Log

More information

Sample Exam Syllabus

Sample Exam Syllabus ISTQB Foundation Level 2011 Syllabus Version 2.9 Release Date: December 16th, 2017. Version.2.9 Page 1 of 46 Dec 16th, 2017 Copyright 2017 (hereinafter called ISTQB ). All rights reserved. The authors

More information

Losing Control: Controls, Risks, Governance, and Stewardship of Enterprise Data

Losing Control: Controls, Risks, Governance, and Stewardship of Enterprise Data Losing Control: Controls, Risks, Governance, and Stewardship of Enterprise Data an eprentise white paper tel: 407.591.4950 toll-free: 1.888.943.5363 web: www.eprentise.com Author: Helene Abrams www.eprentise.com

More information

Lies, Damned Lies and Statistics Using Data Mining Techniques to Find the True Facts.

Lies, Damned Lies and Statistics Using Data Mining Techniques to Find the True Facts. Lies, Damned Lies and Statistics Using Data Mining Techniques to Find the True Facts. BY SCOTT A. BARNES, CPA, CFF, CGMA The adversarial nature of the American legal system creates a natural conflict between

More information

ACL Interpretive Visual Remediation

ACL Interpretive Visual Remediation January 2016 ACL Interpretive Visual Remediation Innovation in Internal Control Management SOLUTIONPERSPECTIVE Governance, Risk Management & Compliance Insight 2015 GRC 20/20 Research, LLC. All Rights

More information

Four Essential Steps for Removing Risk and Downtime from Your POWER9 Migration

Four Essential Steps for Removing Risk and Downtime from Your POWER9 Migration Four Essential Steps for Removing Risk and Downtime from Your POWER9 Migration Syncsort Four Essential Steps for Removing Risk and Downtime from Your POWER9 Migration With the introduction of IBM s POWER9

More information

Basics of Dimensional Modeling

Basics of Dimensional Modeling Basics of Dimensional Modeling Data warehouse and OLAP tools are based on a dimensional data model. A dimensional model is based on dimensions, facts, cubes, and schemas such as star and snowflake. Dimension

More information

Chapter XVIII Utility-Cost Tradeoffs in the Design of Data Resources

Chapter XVIII Utility-Cost Tradeoffs in the Design of Data Resources Chapter XVIII Utility-Cost Tradeoffs in the Design of Data Resources Adir Even Ben Gurion University of the Negev, Israel G. Shankaranarayanan Boston University School of Management, USA Paul D. Berger

More information

Trustwave Managed Security Testing

Trustwave Managed Security Testing Trustwave Managed Security Testing SOLUTION OVERVIEW Trustwave Managed Security Testing (MST) gives you visibility and insight into vulnerabilities and security weaknesses that need to be addressed to

More information

Chapter 8: SDLC Reviews and Audit Learning objectives Introduction Role of IS Auditor in SDLC

Chapter 8: SDLC Reviews and Audit Learning objectives Introduction Role of IS Auditor in SDLC Chapter 8: SDLC Reviews and Audit... 2 8.1 Learning objectives... 2 8.1 Introduction... 2 8.2 Role of IS Auditor in SDLC... 2 8.2.1 IS Auditor as Team member... 2 8.2.2 Mid-project reviews... 3 8.2.3 Post

More information

Answer: D. Answer: B. Answer: B

Answer: D. Answer: B. Answer: B 1. Management information systems (MIS) A. create and share documents that support day-today office activities C. capture and reproduce the knowledge of an expert problem solver B. process business transactions

More information

SQL Tuning Reading Recent Data Fast

SQL Tuning Reading Recent Data Fast SQL Tuning Reading Recent Data Fast Dan Tow singingsql.com Introduction Time is the key to SQL tuning, in two respects: Query execution time is the key measure of a tuned query, the only measure that matters

More information

BPS Suite and the OCEG Capability Model. Mapping the OCEG Capability Model to the BPS Suite s product capability.

BPS Suite and the OCEG Capability Model. Mapping the OCEG Capability Model to the BPS Suite s product capability. BPS Suite and the OCEG Capability Model Mapping the OCEG Capability Model to the BPS Suite s product capability. BPS Contents Introduction... 2 GRC activities... 2 BPS and the Capability Model for GRC...

More information

IBM InfoSphere Information Analyzer

IBM InfoSphere Information Analyzer IBM InfoSphere Information Analyzer Understand, analyze and monitor your data Highlights Develop a greater understanding of data source structure, content and quality Leverage data quality rules continuously

More information

The Analysis and Proposed Modifications to ISO/IEC Software Engineering Software Quality Requirements and Evaluation Quality Requirements

The Analysis and Proposed Modifications to ISO/IEC Software Engineering Software Quality Requirements and Evaluation Quality Requirements Journal of Software Engineering and Applications, 2016, 9, 112-127 Published Online April 2016 in SciRes. http://www.scirp.org/journal/jsea http://dx.doi.org/10.4236/jsea.2016.94010 The Analysis and Proposed

More information

Generating Cross level Rules: An automated approach

Generating Cross level Rules: An automated approach Generating Cross level Rules: An automated approach Ashok 1, Sonika Dhingra 1 1HOD, Dept of Software Engg.,Bhiwani Institute of Technology, Bhiwani, India 1M.Tech Student, Dept of Software Engg.,Bhiwani

More information

CA ERwin Data Profiler

CA ERwin Data Profiler PRODUCT BRIEF: CA ERWIN DATA PROFILER CA ERwin Data Profiler CA ERWIN DATA PROFILER HELPS ORGANIZATIONS LOWER THE COSTS AND RISK ASSOCIATED WITH DATA INTEGRATION BY PROVIDING REUSABLE, AUTOMATED, CROSS-DATA-SOURCE

More information

FIVE BEST PRACTICES FOR ENSURING A SUCCESSFUL SQL SERVER MIGRATION

FIVE BEST PRACTICES FOR ENSURING A SUCCESSFUL SQL SERVER MIGRATION FIVE BEST PRACTICES FOR ENSURING A SUCCESSFUL SQL SERVER MIGRATION The process of planning and executing SQL Server migrations can be complex and risk-prone. This is a case where the right approach and

More information

Case study: Database integration by Hokuriku Coca-Cola using a database appliance

Case study: Database integration by Hokuriku Coca-Cola using a database appliance H.Horiuchi Research Note June 25, 2010 Case study: Database integration by Hokuriku Coca-Cola using a database appliance Hokuriku Coca-Cola Bottling (Hokuriku Coca-Cola) integrated data used in3types of

More information

A Conceptual Framework for Data Cleansing A Novel Approach to Support the Cleansing Process

A Conceptual Framework for Data Cleansing A Novel Approach to Support the Cleansing Process A Conceptual Framework for Data Cleansing A Novel Approach to Support the Cleansing Process Kofi Adu-Manu Sarpong Valley View University, Accra-Ghana Faculty of Science P.O. Box VV 44, Oyibi-Accra Joseph

More information

TIBCO Data Virtualization for the Energy Industry

TIBCO Data Virtualization for the Energy Industry TIBCO Data Virtualization for the Energy Industry USE CASES DESCRIBED: Offshore platform data analytics Well maintenance and repair Cross refinery web data services SAP master data quality TODAY S COMPLEX

More information

WKU-MIS-B10 Data Management: Warehousing, Analyzing, Mining, and Visualization. Management Information Systems

WKU-MIS-B10 Data Management: Warehousing, Analyzing, Mining, and Visualization. Management Information Systems Management Information Systems Management Information Systems B10. Data Management: Warehousing, Analyzing, Mining, and Visualization Code: 166137-01+02 Course: Management Information Systems Period: Spring

More information