Assessing data quality in records management systems as implemented in Noark 5


Dimitar Ouzounov
School of Computing, Dublin City University, Dublin, Ireland
dimitar.ouzounov2@computing.dcu.ie

Abstract

Good data quality is crucial in records management, but these two areas have never been studied together. The main contribution of this paper lies in bridging that gap. We do so by examining the data quality issues in a Norwegian standard for records management called Noark 5, and by developing a software component that objectively measures data quality on the basis of user-defined DQ requirements. There was no Noark 5 system that we could use freely in our study. To address the issue, we designed an architecture for records management and DQ assessment and developed a Noark 5 compliant system which is in many respects superior to the existing commercial solutions. We also created an innovative language for specifying data quality requirements in the form of rules. The language allows data quality to be objectively measured in a wide variety of systems regardless of their underlying technology.

I. INTRODUCTION

Nowadays, electronic information plays a more important role in our society than ever. Before the advent of the World Wide Web, information was mostly contained within the organisations that created it. The Web, along with the emergence of web-based service-oriented architectures, allowed electronic data to be easily exchanged between different parties. The quality of the exchanged data has become fundamental to the relationships between citizens, businesses, and governmental institutions.

Organisations worldwide lose significant amounts of money as a direct consequence of poor data quality. The Data Warehousing Institute estimates that businesses in the United States alone lose around 600 billion dollars a year. These losses come not only from staff overtime and unnecessary printing and postage but also from the diminishing credibility of the organisation in the eyes of its customers and suppliers [6]. Another striking example of the consequences of poor data quality is the Year 2000 problem, which necessitated modifications to various software applications and databases that cost about 1.5 trillion dollars [2].

Although data quality is as important to governmental institutions as it is to businesses, research has mostly focused on business application domains including customer relationship management, supply chain management, and enterprise resource management [12]. One particularly under-researched area is that of records management. Records management systems are used by different types of organisations to track documents and organise them into a records structure, which is stored in a way that allows future users to grasp the meaning of the original documents and the context within which they were created. The content, structure and context of the documents are described by various kinds of metadata, which can be assigned manually or automatically by the system. In many countries, records management systems are widely used by the public administration, in which case the systems typically need to comply with a number of laws and regulations in order to ensure that the actions of the public bodies are properly documented. Norwegian law, for instance, obliges public bodies to keep records for all cases they handle, and specifies how the recordkeeping function should be organised, what must be stored in an archive, and for how long it should be retained.
The archive function of the public administration of Norway includes activities such as maintaining an overview of all documents assigned to a particular case, assigning documents to cases, archiving cases that have been handled, and responding to enquiries regarding the case handling status and the content of documents. After a specified period of time, the archive material is set aside and deposited at an archival repository such as the National Archives of Norway.

If a data quality error is introduced in a record (e.g. as a result of the data entry process), it becomes harder to fix the longer the time that has elapsed since the record was created. The reason is that the data in the record need to reflect facts about the state of the real world at the time the record was created, not at the time it is corrected. So, for example, if an incorrect address is entered today and the address changes after six months, fixing the record after one year will require the original correct address to be entered and not the current one. Once an erroneous record is transferred to the National Archives of Norway, it becomes impossible to fix the error for legal reasons. As a consequence, poor data quality in records management systems used by Norwegian public bodies leads to poor data quality in the national archive, which may render the data stored in the archive useless for future generations. Although it is impossible to know how the data in the national archive will be used in the future, the responsible parties need to ensure that the data are of high quality with respect to current standards.

In order to meet legal requirements, all public bodies in Norway are obliged to use records management systems based on the Noark 5 standard [15]. Every Noark 5 compliant system must include the so-called inner core, which provides means for storing and retrieving records from a database and an interface that allows administrators to modify records. A Noark 5 outer core is built on top of the inner core and includes functionality such as user administration, reporting, and case handling.

Finally, a Noark 5 complete system extends the outer core and includes even more functionality, but it is highly doubtful that this level of compliance has been achieved by any of the existing solutions.

There was no open-source Noark 5 system that we could use in our study. We addressed the issue by designing a comprehensive architecture for records management and DQ assessment, and by building a Noark 5 inner core from scratch on top of this architecture. Due to space limitations, we will publish a description of the architecture separately and only provide a brief overview of the system here. The new Noark 5 inner core is based on Enterprise JavaBeans (EJB) technology and was designed with modularity, flexibility, and scalability in mind. Some of the features of our system clearly distinguish it from the existing commercial solutions. First, it allows data quality to be objectively measured using DQ requirements, which are written in an innovative domain-specific language that we developed. Second, the system can be easily adapted to specific organisational scenarios. And third, it can be deployed both in a cluster and as a hosted service accessed over the Internet. We will release the Noark 5 system under an open-source license, which is an important contribution both to the records management community and to society in general.

The goal of this study is to develop an understanding of how data quality can be objectively measured in records management systems on the basis of user-defined DQ requirements. We begin our discussion in Section II with a brief description of the research methodology we used. Section III contains an overview of data quality, data quality dimensions and metrics, and data quality methodologies. In Section IV we discuss the Noark 5 standard with a focus on metadata, describe how cases are typically handled with a Noark 5 system, and identify several data quality dimensions that are applicable to records management systems. Section V presents the language that we created for specifying data quality requirements. Next, we describe in Section VI the data quality component that we developed for our Noark 5 system, and show the results of validating the component with two different datasets in Section VII. Finally, we present our conclusions in Section VIII.

II. RESEARCH METHODOLOGY

Research in the field of Information Systems (IS) embodies two fundamental paradigms [10]. The first one, behavioural science, aims to formulate and verify theories that describe human and organisational behaviour in the context of IS. The second paradigm, design science, focuses on solving organisational problems by developing innovative artifacts. Artifacts include, among other things, algorithms, practices, and prototype solutions, which aim to facilitate the design, development and management of information systems. We followed the design science research methodology in our study and observed the research guidelines outlined by Hevner et al. [10]. The three most important guidelines are described here. First, design science research should be conducted iteratively, seeking improvement of the artifact during each iteration. Second, research must result in artifacts which attempt to solve specific organisational problems. And third, research must clearly contribute to the area of the produced artifact.
All these guidelines are based on the fundamental principle of design science that new knowledge is acquired through the creation and application of useful artifacts [20], [13], [9].

III. BACKGROUND

A. Overview

A widely accepted definition of data quality, which reflects the fact that quality cannot be uniquely defined for a product such as information, is fitness for use [3]. Such a definition is flexible because it can be used to describe data quality in many different domains. Data quality is regarded as a multidimensional concept [21] and a number of models have been created to describe it. These models identify data quality dimensions using one of three approaches [2]. In the intuitive approach, dimensions are derived from the intuition and previous experience of the researcher. The theoretical approach makes use of formal models and logical analysis to construct the set of dimensions. In the empirical approach, dimensions are identified based on experiments, surveys, and interviews with data consumers. Regardless of the approach used, dimensions are either generic and apply to all kinds of data, or specific to a particular domain.

The perception of data quality is highly dependent on the perspective of the people who use the data [3]. Data quality, however, can be determined not only by surveying users but also by looking at the data itself and by examining the process of accessing the data [14]. Based on this, data quality dimensions can be classified into three categories. Subjective dimensions, such as understandability, can only be assessed by users based on their background and experience. Objective dimensions are assessed by analysing the information itself; an example of such a dimension is completeness. Process dimensions, such as response time, are assessed by querying the data. Both objective and process dimensions can be measured automatically and may have objective scores, while subjective dimensions do not allow for objective scores.

B. Data quality models

A data quality model developed using the intuitive approach is the one proposed by Redman [17]. He identified a number of dimensions and divided them into three categories. The first category includes dimensions related to the conceptual schema of data. The second category contains dimensions related to data values. Finally, the third category focuses on dimensions related to the internal representation of data.

Following the empirical approach, Wang and Strong [23] developed a data quality model on the basis of what data quality means to data consumers. Surveys were used as the main research tool and a framework was developed which, according to the authors, closely represents the aspects of data quality that are most important to data consumers. The framework includes twenty dimensions grouped into four categories: intrinsic data quality, contextual data quality, representational data quality, and accessibility data quality.

Intrinsic data quality includes dimensions which are fundamental to all types of data. Contextual data quality contains dimensions that must be considered with regard to the specific task at hand. Representational data quality includes dimensions that are related to the format and meaning of data. The last category, accessibility data quality, covers dimensions related to the accessibility and security of data.

A similar but more recent data quality model is the one developed by Bovee, Srivastava and Mak [3]. They classified data quality dimensions into four categories, namely integrity, interpretability, relevance and accessibility. The dimensions in the first category are intrinsic in nature, while the dimensions in the other three categories are extrinsic.

Wand and Wang [21] developed a theoretical model of data quality by looking at discrepancies between two different views of the real world: the one obtained by directly observing real-world phenomena, and the one obtained by inspecting an information system that represents the same phenomena. Four intrinsic data quality dimensions were derived which characterise data according to whether they are complete, unambiguous, meaningful, and correct. Incompleteness occurs when a real-world state is not represented in the information system. Ambiguity is observed when two or more real-world states are mapped to the same information system state. An information system state that does not map to any real-world state is meaningless. Finally, incorrectness occurs when a real-world state is mapped to the wrong information system state. The last two problems are a direct result of what the authors call garbling, which is usually caused by errors in the data entry process.

C. Metrics

The quality of the data in an information system can be determined using various metrics, which are either based on some fundamental principles or created on an ad hoc basis. Metrics are used in both subjective and objective evaluation of data quality. Pipino, Lee and Wang [16] propose three principles for developing DQ metrics: simple ratio, MIN and MAX operations, and weighted average. Simple ratio is used for dimensions such as completeness and consistency, which can be evaluated for a given piece of data by testing whether each of its data elements satisfies some condition. Data quality is commonly measured on a scale of 0 to 1, where 0 denotes poor quality and 1 denotes good quality. For this reason, simple ratio is defined as the number of undesirable outcomes divided by the total number of outcomes, subtracted from one. Next, MIN and MAX operations are useful for dimensions such as believability, where multiple DQ values are typically aggregated, e.g. by interviewing users. MIN is conservative because it assigns to the dimension the lowest of the DQ values, while MAX is liberal as it assigns the highest of the values. Finally, weighted average is appropriate for multivariate cases but requires a good understanding of how important each variable is for the dimension that is being evaluated. A number of metrics based on the three principles outlined here are proposed by Batini and Scannapieco [2].
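To make the three principles concrete, the following sketch expresses them as small aggregation functions in Java. It is an illustration written for this overview, not code from [16] or [2], and the class and method names are our own.

import java.util.List;

// Illustrative sketch of the three metric principles as aggregation functions.
public final class DqMetrics {

    // Simple ratio: one minus the fraction of undesirable outcomes,
    // so 1.0 means no violations and 0.0 means every test failed.
    public static double simpleRatio(long undesirable, long total) {
        if (total == 0) return 1.0;              // nothing to test
        return 1.0 - (double) undesirable / total;
    }

    // MIN: conservative aggregation of several DQ scores for one dimension.
    public static double min(List<Double> scores) {
        return scores.stream().mapToDouble(Double::doubleValue).min().orElse(1.0);
    }

    // MAX: liberal aggregation.
    public static double max(List<Double> scores) {
        return scores.stream().mapToDouble(Double::doubleValue).max().orElse(1.0);
    }

    // Weighted average: the weights are assumed to sum to one.
    public static double weightedAverage(double[] scores, double[] weights) {
        double result = 0.0;
        for (int i = 0; i < scores.length; i++) {
            result += scores[i] * weights[i];
        }
        return result;
    }
}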
D. Methodologies

A data quality methodology is a set of models, techniques and guidelines which define a systematic process for measuring and improving the quality of the data in an organisation [2]. Methodologies can be classified according to various criteria. One of them distinguishes between general-purpose and special-purpose methodologies. General-purpose methodologies include a variety of activities that can be applied in different domains. These methodologies are also known as management methodologies because they follow several key principles of quality management. Special-purpose methodologies address a particular activity or a specific application domain. For example, DQ assessment methodologies focus on measuring and assessing the data quality in an organisation, benchmarking the results against other organisations or a set of best practices, and suggesting suitable improvement steps. We provide here a summary of the key data quality management and assessment methodologies.

1) TDQM: The Total Data Quality Management methodology (TDQM) is a comprehensive methodology for data quality management, which is extensively based on the principles of Total Quality Management (TQM) [22]. Analogously to product manufacturing in TQM, information manufacturing in TDQM is the process in which a system works with raw data to create an information product (IP). TDQM includes four phases to define requirements for, measure, analyse, and improve data quality. Executing the phases iteratively is essential to producing high-quality IP. The methodology also distinguishes between four different types of stakeholders. Information suppliers gather or create raw data. Information manufacturers are responsible for designing and developing the systems that produce IP. Information consumers use the IP. Finally, IP managers are responsible for managing the whole IP production cycle.

2) TIQM: Another methodology for managing data quality is the Total Information Quality Management methodology (TIQM) developed by English [7]. TIQM includes five phases, which are executed iteratively and focus on assessing the quality of both information architecture and data, measuring costs and risks, correcting data, and improving business processes. There is a separate phase without a fixed beginning and end, which aims to make data quality an important part of the organisational culture. The author of the methodology makes an important point about data quality management: DQ problems can hardly be solved by a one-time improvement programme because they are caused by broken business processes. Unless the processes are improved, they will keep producing poor-quality data. For the same reason, better data quality cannot be achieved only by using data correction software.

3) HIQM: A more recently developed methodology is the Hybrid Information Quality Management methodology (HIQM) developed by Cappiello, Ficiaro and Pernici [4].

HIQM is based on TDQM but includes eight phases: environment analysis, resource management, quality requirements definition, quality measurement, analysis and monitoring, improvement, strategy correction, and warning management. Several of these phases distinguish HIQM from other methodologies. The environment analysis phase, for example, focuses on acquiring knowledge of the organisational processes and data and on determining the feasibility of introducing a data quality management programme in the organisation. Improvement steps at the strategic level are executed as part of the strategy correction phase. Finally, the warning management phase supports interactive analysis of any errors originating in the improvement phase.

4) CDQ: The Comprehensive Data Quality methodology (CDQ) attempts to integrate processes found in other methodologies such as TDQM and TIQM and is applicable to both structured and unstructured data [1]. It includes three major phases which are executed iteratively: state reconstruction, assessment, and choice of improvement steps. The state reconstruction phase aims at modelling the organisational units, processes, services, and data sources, along with the relationships between them. The assessment phase focuses on measuring and assessing data quality along different dimensions and on setting new quality targets. Before data quality is assessed, however, relevant DQ issues are identified by interviewing users. The last phase in the methodology helps organisations select the optimal improvement steps by evaluating different alternatives using cost-benefit analysis.

5) IPMAP: The last management methodology we describe is the IPMAP methodology, which was developed by Shankaranarayanan, Ziad and Wang [18]. It follows the principles of IP manufacturing defined by TDQM and aims to support decision makers in scenarios which are characterised by large volumes of data, data sources distributed across several locations, and multiple stakeholders. The methodology includes three major components. The first component, called IPMAP, provides a set of constructs for modelling the stages of IP manufacturing. The second component is a set of metadata elements which can be attached to any of the manufacturing stages. Metadata elements include, among other things, the business rules associated with a given stage, the business unit responsible for it, and the timeliness, accuracy and completeness dimensions associated with the data at that stage. The third component is a set of capabilities that allow the decision maker to estimate the time required to produce an IP and to determine the exact stage in the manufacturing process in which data quality problems originated.

6) AIMQ: The AIMQ methodology provides a comprehensive set of techniques for assessing and benchmarking data quality [11]. It includes three components. The first component is a DQ model which defines quality based on what it means to various stakeholders. The second component is a questionnaire for assessing quality in terms of the dimensions that are important to stakeholders. The last component provides two techniques to help organisations interpret the collected measurements. One of them calculates the gap between the assessments of different stakeholders, while the other is used for benchmarking data quality against a set of best practices.
7) Francalanci and Pernici: The assessment methodology developed by Francalanci and Pernici aims to overcome the limitations of other methodologies, which assess quality only by looking at the source where data are stored, without considering how users perceive quality [8]. Indeed, different users have different requirements, and their perceptions of the quality of the same dataset may vary. The methodology allows quality to be assessed from the perspective of different users by assigning them to classes, which group similar users together. Generally, users in the same class access the same system services and use the same types of data. Explicit data quality requirements are specified for each class by associating an evaluation function with each dimension and setting a minimum acceptable value for that dimension. Class-level requirements are inherited by all users in the class but can be modified by each user based on personal preferences. When a user requests data from a service, the system evaluates the quality of the data, checks whether the data satisfy the class-level and user-level requirements, and presents the results to the user.

E. Methodologies and Noark 5

A suitable DQ methodology can help public bodies that use Noark 5 systems to produce higher-quality data and thus increase data reusability and provide better services to citizens. However, in order to be applicable to records management, a methodology needs to include a number of important features. First, it must offer techniques for data and business process modelling. Second, it must allow data consumers to specify which DQ dimensions are important and to define DQ requirements in a format that can be parsed by software and used for objective DQ assessment. The methodology must also provide suitable data quality improvement techniques and a means for estimating the cost of improvement. Note that data correction tools which require little or no human involvement cannot be used in the context of Noark 5 systems because records in such systems may be modified only by designated people, for legal reasons. Next, the methodology has to include capabilities for benchmarking data quality, improving business processes that lead to poor DQ, and correcting the strategy of the organisation. Finally, it must offer guidelines for making DQ an important part of the organisational culture.

A careful analysis of the reviewed methodologies reveals that none of them supports all of the features outlined above. The analysis results are summarised in Table I. Although the methodologies are not directly applicable to Noark 5, some of their features can be combined in order to develop a comprehensive data quality management methodology for records management systems. Some of the important features are the data quality requirements definition in TDQM, the cost-benefit analysis in CDQ, the modelling techniques in IPMAP, and the techniques for DQ measurement and benchmarking in AIMQ.

TABLE I: Data quality methodologies. Features compared across TDQM, TIQM, HIQM, CDQ, IPMAP, AIMQ, and Francalanci & Pernici: modelling techniques, user-defined DQ dimensions, executable DQ requirements, objective assessment, benchmarking, business process improvement, improvement costs analysis, strategy correction, and organisational culture transformation.

Creating a new methodology is outside the scope of the present study, but we lay the groundwork for such an endeavour by identifying several dimensions that are applicable to Noark 5 and can be objectively measured, and by proposing an approach for measuring data quality along these dimensions on the basis of user-defined DQ requirements.

IV. THE NOARK 5 STANDARD

Apart from specifying functional requirements, the Noark 5 standard defines metadata that must be supported by compliant systems. Metadata are conceptually represented with an entity-relationship model, which we translated to EJB entities when we developed our system: entities in the model correspond to classes, attributes correspond to class fields, and relationships between entities correspond to fields that reference other classes. Noark 5 systems can handle both electronic and paper documents, which are stored externally and are thus outside the scope of the system. For this reason, our attention is drawn to the data quality of the metadata and not of the actual documents. This section provides an overview of the Noark 5 conceptual model, describes how cases are typically handled with Noark 5 systems, and presents several applicable data quality dimensions.

A. Metadata

The key entities in the Noark 5 conceptual model are depicted in Fig. 1, while a detailed description of their attributes can be found in the standard. The dotted lines in the diagram denote entity relationships which are part of the standard and fully supported by our system but were disregarded in this study for the sake of simplicity.

The work with any Noark 5 system begins by creating a fonds, i.e. an archive. A fonds is associated with one or more fonds creators and references a number of series. Each series points to case files, which represent cases in the system. Fonds, series, and case files all have associated storage locations, which denote where documents are physically stored (in the case of paper documents). Each series points to a classification system with a number of classes and keywords. A classification system may be based on social security numbers, in which case a given class will be the social security number of a particular person. Classes are therefore used to organise information; all records for one person, for example, will belong to the same class. Keywords, on the other hand, are used to facilitate searching. A case file includes one or more registry entries, each of which records a particular transaction and points to documents related to the case. The link from a registry entry to a document is indirect: a registry entry is first linked to a document description that references one or more document objects, which in turn point to either electronic or paper documents. Both case files and registry entries can be assigned to classes and can have associated keywords.
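To illustrate the mapping described above, the following sketch shows how a case file might be declared as a JPA/EJB entity. It is an illustration only: the field and relationship names follow the identifiers used in the DQ rule listings later in the paper rather than the normative Noark 5 metadata names, and the remaining entities, getters and setters are omitted.

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.ManyToOne;
import javax.persistence.OneToMany;
import java.time.LocalDate;
import java.util.Set;

// Hypothetical sketch: a conceptual-model entity mapped to a JPA/EJB entity class.
// Attributes become simple fields; relationships become fields that reference other entities.
@Entity
public class CaseFile {

    @Id
    @GeneratedValue
    private Long id;

    private String fileType;                        // attribute of the case file
    private String caseStatus;                      // e.g. "open" or "closed"
    private LocalDate createdDate;

    @ManyToOne
    private Series refSeries;                       // the series the case belongs to

    @ManyToOne
    private StorageLocation refLocation;            // physical storage location

    @OneToMany(mappedBy = "caseFile")
    private Set<RegistryEntry> refRegistryEntries;  // transactions recorded on the case

    // Series, StorageLocation and RegistryEntry would be mapped analogously;
    // getters and setters are omitted for brevity.
}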
B. Case handling

Users of every Noark 5 system are assigned a role on the basis of how they use the system. The two most important roles are archive leader and executive officer. An archive leader manages the archive of an organisation. Given a new Noark 5 system, the archive leader establishes the archive structure by creating a fonds, possibly several subfonds, a series, and so on. He also assigns cases to executive officers for handling, monitors the cases in his organisation or department, and ensures that they are handled both efficiently and effectively. An archive leader has superuser access rights to the system, which allow him, among other things, to modify the access privileges of other users, edit wrongly entered metadata, and move misclassified cases. Executive officers are responsible for handling the cases that have been assigned to them by the archive leader.

The Business Process Modelling Notation (BPMN) diagram in Fig. 2 shows the process of case handling. When a citizen submits an application to a public body, a Noark 5 system automatically creates a new case and sends it to the archive leader. At this stage, the case has the status registered. The archive leader then delegates the case to one of the executive officers. Once the officer opens the case, its status changes to reserved for editing. The officer adds a number of registry entries to the case over some period of time, and each entry includes documents related to the case. Once the case is complete, it gets the status finished.

At this point, the officer writes a response that will later be sent to the citizen and forwards both the response and the case to the archive leader for approval. The status of the case changes to pending approval. If the case is not complete, the archive leader sends it back to the officer. Otherwise, he signs the response, either electronically or on paper, sends it to the citizen, and sets the status of the case to archived.

Fig. 1: Noark 5 simplified conceptual model
Fig. 2: Case handling in Noark 5

In the presence of a DQ measurement component, officers would be able to check the quality of any given case before submitting it for approval. Furthermore, such a component would help the archive leader decide whether a case should be archived or sent back to an officer for corrections.
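The status progression just described can be summarised as a small state machine. The sketch below is our own reading of Fig. 2, given only as an illustration; Noark 5 itself does not prescribe these Java names or transitions.

import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Illustrative state machine for the case statuses used in the case handling process.
public enum CaseStatus {
    REGISTERED, RESERVED_FOR_EDITING, FINISHED, PENDING_APPROVAL, ARCHIVED;

    private static final Map<CaseStatus, Set<CaseStatus>> ALLOWED = Map.of(
        REGISTERED, EnumSet.of(RESERVED_FOR_EDITING),            // delegated and opened by an officer
        RESERVED_FOR_EDITING, EnumSet.of(FINISHED),              // all registry entries added
        FINISHED, EnumSet.of(PENDING_APPROVAL),                  // forwarded to the archive leader
        PENDING_APPROVAL, EnumSet.of(ARCHIVED, RESERVED_FOR_EDITING),  // approved, or sent back
        ARCHIVED, EnumSet.noneOf(CaseStatus.class)               // terminal state
    );

    public boolean canMoveTo(CaseStatus next) {
        return ALLOWED.get(this).contains(next);
    }
}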

C. Dimensions applicable to Noark 5

Many of the data quality dimensions identified in the literature are applicable to Noark 5, but we focus in our analysis only on those that can be objectively measured. They are listed below along with two other dimensions which are specific to Noark 5 and possibly to the domain of records management.

Completeness can be defined as the degree to which data are of sufficient depth, breadth, and scope for the task at hand [2]. We consider an object (entity) in Noark 5 to have good data quality in terms of completeness when the important fields in the object (as regarded by users of the data) are not NULL and, if they are strings, not empty.

Accuracy is defined by Redman [17] as the distance between a data value v and the real-world value v' which v represents. Batini and Scannapieco [2] call this type of accuracy semantic accuracy and also discuss syntactic accuracy, which unlike semantic accuracy can be measured objectively. In syntactic accuracy we are not interested in comparing v to v' but instead in comparing v to the possible values that it can take, i.e. its input domain. An object in Noark 5 has good data quality in terms of syntactic accuracy when each of its fields matches a value in its input domain.

Consistency is described by a set of constraints that must hold true for all objects in the system. These constraints can be defined for one or more fields in a single class, or in a class and a referenced class. There are two major types of integrity constraints, namely functional dependency and inclusion dependency. Functional dependency constraints specify which fields depend on what values of other fields in the same or in another object. Inclusion dependency constraints indicate that an object must refer to other objects, or that it includes objects that are also included in an object that it references.

Correctness describes the situation in which every real-world state maps to the correct system state [21]. A case in Noark 5, for example, can be assigned to the wrong class, which produces incorrectness that cannot be automatically detected. Other errors of the same type, however, can be detected. For example, it is possible to detect that a case is assigned to the wrong series when the creation date of the case does not fall between the start and end dates of the series.

Disposal and processing delay are two other dimensions that we believe are important to the domain of records management. A case in a Noark 5 system has low data quality along the disposal dimension when the age of the case exceeds some specified period of time and the case has not been removed from the system. Note that case disposal is a legal requirement. A case has low data quality along the processing delay dimension when no new registry entries have been added to it for some period of time but it has not been archived.

V. LANGUAGE FOR SPECIFYING DQ REQUIREMENTS

Noark 5 does not consider data quality, but we hope that this will change in future versions of the standard, which will not only need to specify what constitutes good data quality but also to define different levels of DQ compliance. Translating DQ requirements to code that measures quality is difficult and has to be done on an ad hoc basis unless a domain-specific language is used to describe the requirements. However, to the best of our knowledge, no such language exists, and DQ requirements are typically defined in a non-portable way even in commercial data quality software systems.

We developed a portable domain-specific language which allows DQ requirements to be specified in the form of DQ rules for the six dimensions described in Section IV. The language can be used to measure data quality in a wide variety of systems and does not impose our definitions of DQ dimensions on its users but instead provides constructs that allow users to specify what data quality aspects are included in a dimension. We wrote the language grammar in EBNF and used ANTLR to construct a language parser and interpreter. Due to space limitations, the grammar and the actual data quality rules that we developed for Noark 5 will be published separately. A description of the language and a representative set of the rules is provided here instead.
The language we developed works on objects and can therefore be used to measure data quality in systems that use object-oriented databases, but also in systems that use relational databases, although in the latter case an object-relational mapping is required to translate the relational schema to a set of interrelated objects. In fact, our system works with a relational database and uses Hibernate to implement the object-relational mapping. Focusing on the object-oriented view of the data, each DQ rule in our language describes what good data quality is for a given class in terms of its fields.

Evaluating the data quality of every case in our Noark 5 system is done in three steps. First, a set of DQ rules is defined for the classes that taken together form a case. Second, a language interpreter measures the DQ of all instances of these classes based on the previously defined rules. Finally, the measurements are combined into a single data quality value for each case.

Structure of DQ rules

Every DQ rule in our language includes three parts: header, body, and trailer. The header specifies a dimension that the rule contributes to and a class whose instances will be inspected by the rule. The body includes one or more DQ tests which are executed for each instance of the class and return a boolean value as a result. Note that TRUE indicates good DQ, while FALSE denotes poor DQ. A DQ test is made up of one or two logical predicates. Finally, the trailer marks the end of the rule with a fixed terminator string.

Predicates

Predicates use one of several supported operators to compare fields or a field with a literal, and return a boolean value. Supported operators include the relational operators <, <=, >, >=, == and !=, as well as seven other operators which will be illustrated later in the text using Noark-specific examples. The types of fields that can be used in a predicate are those present in Noark 5, i.e. integer, string, date, reference to an object, or a set of references to objects.

Measuring DQ

The language interpreter evaluates each DQ rule for a given dimension for all instances of the class specified in the header and returns as a result the number of successful and unsuccessful DQ tests for each instance. This result, along with the dimension that it applies to, is stored in a database using an approach similar to that proposed by Storey and Wang [19]. Storing DQ measurements in this way allows us to calculate data quality along a given dimension for a single object or for an arbitrary set of objects. Consequently, we can easily calculate data quality for objects in Noark 5 which stand on their own, such as the instances of StorageLocation, Author, and FondsCreator.

More importantly, we can calculate the data quality of cases, which consist of a number of objects of the classes CaseFile, RegistryEntry, DocumentDescription, and DocumentObject.

Example DQ rule

Listing 1 shows an example rule which will be applied to all instances of the Type1 class. Note that the rule specifies requirements not only for the Type1 instances but also for the Type2 objects that each Type1 instance refers to. The name of the reference is rtype2. A reference is essentially a pointer from one object to another.

DIM "Example" FOR Type1 t1, rtype2 t2
t1.intf >= 1;
WHEN t1.strf MATCHES "ab" THEN t2.intf > 0;
Listing 1: Example DQ rule

The example rule contributes to a dimension called Example and includes two DQ tests. The first test contains a single predicate and evaluates to TRUE when the integer field intf in the instance of Type1 is greater than or equal to 1. The second DQ test uses a WHEN-THEN construct with two predicates. This test returns TRUE either when the string field strf in the Type1 instance does not match ab, or when it matches ab and the integer field intf in the object pointed to by the field rtype2 is greater than 0.

Noark 5 DQ rules

We defined eighteen rules for Noark 5 with more than seventy DQ tests. In the following we provide a small subset of these rules in order to better illustrate the capabilities of the language we developed.

The rules in Listing 2 contribute to the completeness dimension. The first one works on CaseFile objects and returns TRUE when the object's file type is neither NULL nor an empty string. The second rule works on RegistryEntry objects and evaluates to TRUE when the object's created date is not NULL.

DIM "Completeness" FOR CaseFile f
EXISTS f.filetype;
DIM "Completeness" FOR RegistryEntry e
EXISTS e.createddate;
Listing 2: Completeness rules

The rule in Listing 3 applies to syntactic accuracy and works on CaseFile objects. It returns TRUE when the object's status matches either of the strings open and closed. Note that any regular expression can be placed on the right-hand side of the MATCHES operator.

DIM "Syntactic Accuracy" FOR CaseFile f
f.casestatus MATCHES "(open|closed)";
Listing 3: Syntactic accuracy rule

The rule in Listing 4 is part of the consistency dimension and works on CaseFile objects and on a Series object that each of them references. The first test evaluates to TRUE when the CaseFile object contains a non-empty set of references to RegistryEntry objects. The second test returns TRUE when the reference to StorageLocation in the CaseFile object is contained in the set of references to StorageLocation in the series that the case belongs to.

DIM "Consistency" FOR CaseFile f, rseries s
INCLUDES f.refregistryentries;
f.reflocation CONTAINED IN s.reflocations;
Listing 4: Consistency rule

The rule in Listing 5 contributes to the correctness dimension and works on CaseFile objects and on a Series object that each of them points to. The rule evaluates to TRUE when the case creation date is between the start and end dates of the corresponding series.

DIM "Correctness" FOR CaseFile f, refseries s
f.createddate BETWEEN s.startdate, s.enddate;
Listing 5: Correctness rule

The rule in Listing 6 is part of the disposal dimension and works on CaseFile objects. It evaluates to TRUE when the case age is less than 1825 days (five years). The creation date of the case is denoted by the createddate field in the CaseFile object.
DIM "Disposal" FOR CaseFile f
f.createddate AGE< 1825;
Listing 6: Disposal rule

Finally, the rule in Listing 7 applies to the processing delay dimension and works on CaseFile objects and on the set of referenced RegistryEntry objects. The rule evaluates to TRUE for cases that are currently being processed when the age of the newest registry entry in the case is less than five days.

DIM "Processing Delay" FOR CaseFile f, rentry r
WHEN f.casestatus MATCHES "open" THEN r.createddate AGE_LATEST< 5;
Listing 7: Processing delay rule
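As a rough illustration of the semantics the interpreter attaches to such rules, the sketch below shows how a single WHEN-THEN test could be represented and evaluated in Java. It is not the actual ANTLR-generated interpreter (the grammar is published separately), and the class, method and field-access details are our own simplifications.

import java.lang.reflect.Field;
import java.util.function.Predicate;

// Illustrative sketch of one DQ test: it fails only when the WHEN part holds
// and the THEN part does not, mirroring the description of the WHEN-THEN construct.
public final class DqTest {

    private final Predicate<Object> when;   // null means an unconditional test
    private final Predicate<Object> then;

    public DqTest(Predicate<Object> when, Predicate<Object> then) {
        this.when = when;
        this.then = then;
    }

    // TRUE indicates good quality for this instance, FALSE indicates poor quality.
    public boolean evaluate(Object instance) {
        if (when != null && !when.test(instance)) {
            return true;                    // WHEN not satisfied: the test passes vacuously
        }
        return then.test(instance);
    }

    // Helper for predicates in the style of MATCHES, reading the named field
    // reflectively from the inspected object.
    public static Predicate<Object> matches(String fieldName, String regex) {
        return instance -> {
            try {
                Field f = instance.getClass().getDeclaredField(fieldName);
                f.setAccessible(true);
                Object value = f.get(instance);
                return value != null && value.toString().matches(regex);
            } catch (ReflectiveOperationException e) {
                return false;               // a missing field counts as a failed test
            }
        };
    }
}

A test built with matches("caseStatus", "(open|closed)") would mirror the behaviour of Listing 3 for a single object, assuming the rule's field name corresponds to a declared field of the inspected class.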

VI. DATA QUALITY COMPONENT

The data quality component that we developed for our Noark 5 system is made up of two separate units: a backend which evaluates the data quality in the system, and a frontend which triggers data quality analysis and visualises the results as reported by the backend. We implemented the backend as an EJB module that can be easily plugged into our Noark 5 system. The frontend is a web application based on the ZK framework. It communicates with the backend either by using remote objects or web services.

The backend provides a number of functionalities to its clients. A client can start data quality analysis both synchronously and asynchronously, check whether data quality analysis is in progress, retrieve raw data quality results, i.e. results for each object for a given dimension, or retrieve the results for the cases handled by the system. When a client triggers the data quality analysis process, a language recogniser parses the DQ rules defined in a specific text file in the module. An interpreter then measures data quality based on the rules and stores the results in the database. Parsing the rules every time analysis is triggered is not an expensive operation, and at the same time it allows new rules to be added on the fly without restarting the system. When a client needs to retrieve the data quality results for all cases in the system, the module uses the raw data quality results stored in the database to calculate a data quality value ranging from 0 to 1 for each case, using simple ratio as defined by Pipino et al. [16]. This approach is preferred to fetching raw data quality results, as the latter may impose a significant communication overhead. However, the module also provides the necessary mechanisms to allow a client to fetch raw results and calculate a metric based on some principle other than simple ratio if it wishes to do so. Most of the functionality of the module is contained in a class library which can be reused in other systems that are not based on EJB.

Fig. 3: Visual appearance of the DQ frontend

The frontend (Fig. 3) allows its users to start DQ analysis of the data in the system and to visualise the results once the analysis is performed. It contains three graphical widgets: a list of all analysed dimensions, a list of the cases in the system, and a dial chart for displaying data quality measurements on a scale of 0 to 100. Both lists provide sorting capabilities; for example, a user can sort the available cases by name or author. When a user selects one or more dimensions and one or more cases from the lists, the application calculates the average quality for the selected cases in each of the selected dimensions, calculates the average of these values, and visualises it using the dial chart.
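The client-facing operations listed above could be summarised by an interface along the lines of the sketch below. This is a hypothetical paraphrase, not the actual EJB business interface of our module, and all names and record shapes are our own.

import java.util.List;
import java.util.concurrent.Future;

// Hypothetical client view of the backend module, paraphrasing the operations described above.
public interface DqAnalysisService {

    void runAnalysis();                        // synchronous analysis

    Future<Void> runAnalysisAsync();           // asynchronous analysis

    boolean isAnalysisInProgress();

    // Raw results: one outcome record per object for a given dimension.
    List<RawResult> getRawResults(String dimension);

    // Aggregated results: a simple-ratio score in [0, 1] for each case.
    List<CaseScore> getCaseScores();

    record RawResult(String className, long objectId, String dimension,
                     int passedTests, int failedTests) {}

    record CaseScore(long caseFileId, String dimension, double score) {}
}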
VII. RESULTS

We validated the data quality component by using it to analyse the quality of two different datasets, one with good and one with poor quality. Both datasets were created automatically by a tool that connects to the Noark 5 system, establishes a basic archive structure, and creates 100 cases. Each case includes a random number of RegistryEntry, DocumentDescription, and DocumentObject objects. The tool operates in two modes. In the first mode, it follows the data quality rules we specified for Noark 5 when it creates the cases. In the second mode, it randomly introduces the specific types of errors described below.

In order to lower the quality of the data in terms of completeness, the tool creates objects with NULL or empty fields. Accuracy is decreased by introducing syntactically incorrect values for string fields. Consistency is lowered by randomly breaking the inclusion and functional dependency rules that we specified. For example, the tool deliberately sets the finalised date of randomly selected cases to a date earlier than their creation date. It also sets the storage location of randomly chosen cases to a location that is not included in the series that the cases belong to. Correctness is decreased by changing the creation date of random cases to a date which is not between the start and end dates of the series that the cases belong to. The tool reduces data quality along the disposal dimension by setting the creation date of random cases to a date more than five years before the current date. Finally, quality along the processing delay dimension is lowered by setting the status of random cases to being processed and setting the creation date of all registry entries in those cases to a date more than five days in the past.

The data quality module successfully analysed both datasets and reported a data quality of 1.0 along the six dimensions for the dataset that follows our rules, and a lower quality for the other dataset. We verified that the module detects all the types of errors that we had deliberately introduced in one of the datasets.
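As an illustration of the kind of error injection just described, the sketch below back-dates a random subset of cases past the disposal limit used in Listing 6. It is not the actual test-data generator; the class name is ours, and it assumes the hypothetical CaseFile entity sketched in Section IV with a setter for its creation date.

import java.time.LocalDate;
import java.util.List;
import java.util.Random;

// Sketch only: lowering quality along the disposal dimension by back-dating
// randomly chosen cases beyond the five-year limit of the Disposal rule.
public final class DisposalErrorInjector {

    private static final int DISPOSAL_LIMIT_DAYS = 1825;  // five years, as in Listing 6
    private final Random random = new Random();

    public void inject(List<CaseFile> cases, double errorRate) {
        for (CaseFile caseFile : cases) {
            if (random.nextDouble() < errorRate) {
                // Older than the disposal limit, so the Disposal rule will report a failure.
                caseFile.setCreatedDate(LocalDate.now().minusDays(DISPOSAL_LIMIT_DAYS + 30));
            }
        }
    }
}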

VIII. CONCLUSION

The goal of the present study was to develop an understanding of how data quality can be objectively measured in records management systems on the basis of user-defined requirements. In order to answer this main research question, we identified a number of data quality issues in the Noark 5 standard, experimented with different measurement techniques for the identified issues, and developed a software component for DQ assessment. As there was no open-source Noark 5 system that we could use in our study, we designed a comprehensive architecture for records management and DQ assessment and developed a new Noark 5 core based on it. Our system is not only modular and flexible but, unlike existing commercial solutions, supports data quality analysis and can be deployed both in a clustered environment and as a hosted service.

We provided the background necessary for our study by investigating several approaches to identifying data quality dimensions, by looking at fundamental principles for developing metrics, and by outlining a number of DQ management and assessment methodologies. Our analysis of these methodologies showed that they cannot be directly applied in organisations that use Noark 5 systems due to the specifics of the records management domain. After identifying six applicable dimensions that can be objectively measured, we developed a domain-specific language for expressing data quality requirements. The language can be used to measure data quality in a variety of systems regardless of the technology they are based on. Moreover, it is applicable not only to Noark 5 but also to other records management standards such as MoReq2, which is used by both public bodies and private organisations throughout the European Union [5]. The approach of specifying data quality requirements with a domain-specific language is, to the best of our knowledge, novel, since even commercial DQ solutions specify such requirements in a non-portable way.

This paper makes an important contribution by bridging the domains of records management and data quality, which has not been done before. Our study showed that good data quality is crucial for records management systems, particularly for those based on Noark 5, but that the area is under-researched. DQ is not even part of the Noark 5 standard, and although the implications of poor-quality data are not widely understood and not currently observable, they may render data in the national archive of Norway unusable in the future. For this reason, data quality must be considered in the next version of the standard. The language we developed is also an important contribution because it allows, among other things, a uniform set of DQ requirements to be developed for public bodies in Norway and used to objectively measure and benchmark the quality of the data they produce.

Our study has two limitations, related to the set of DQ dimensions that we considered and to the practical utility of the domain-specific language. A study of other dimensions applicable to records management systems will help to better understand the scope and importance of DQ issues in such systems. Furthermore, the language we developed needs to be tested in different scenarios and possibly extended with new operators, constructs, and DQ tests.
The ultimate goal of research on data quality in records management systems should be the development of a comprehensive DQ management methodology for such systems, as well as a suite of software tools to support the methodology. This paper provides a good starting point from both a theoretical and a practical perspective.

REFERENCES

[1] C. Batini, F. Cabitza, C. Cappiello, and C. Francalanci. A comprehensive data quality methodology for web and structured data. International Journal of Innovative Computing and Applications, 1(3):205, 2008.
[2] C. Batini and M. Scannapieco. Data Quality: Concepts, Methodologies and Techniques. Springer, 2006.
[3] M. Bovee, R. P. Srivastava, and B. Mak. A conceptual framework and belief-function approach to assessing overall information quality. International Journal of Intelligent Systems, 18(1):51-74, 2003.
[4] C. Cappiello, P. Ficiaro, and B. Pernici. HIQM: a methodology for information quality monitoring, measurement, and improvement. In Advances in Conceptual Modeling - Theory and Practice, volume 4231 of Lecture Notes in Computer Science. Springer, Berlin, Heidelberg, 2006.
[5] European Commission. Model requirements for the management of electronic records (MoReq2) - update and extension, 2008.
[6] W. Eckerson. Data quality and the bottom line. Technical report, The Data Warehousing Institute, 2002.
[7] L. English. Total information quality management - a complete methodology for IQ management. Information Management Magazine.
[8] C. Francalanci and B. Pernici. Data quality assessment from the user's perspective. In Proceedings of the 2004 International Workshop on Information Quality in Information Systems (IQIS '04), page 68, Paris, France, 2004.
[9] A. Hevner and S. Chatterjee. Design science research in information systems. In Design Research in Information Systems, volume 22. Springer US, Boston, MA, 2010.
[10] A. R. Hevner, S. T. March, J. Park, and S. Ram. Design science in information systems research. MIS Quarterly, 28(1):75-105, 2004.
[11] Y. Lee. AIMQ: a methodology for information quality assessment. Information & Management, 40(2), 2002.
[12] S. E. Madnick, R. Y. Wang, Y. W. Lee, and H. Zhu. Overview and framework for data and information quality research. Journal of Data and Information Quality, 1(1):1-22, 2009.
[13] S. T. March and G. F. Smith. Design and natural science research on information technology. Decision Support Systems, 15(4), 1995.
[14] F. Naumann and C. Rolker. Assessment methods for information quality criteria, 2000.
[15] National Archival Services of Norway. Noark 5 standard for records management.
[16] L. L. Pipino, Y. W. Lee, and R. Y. Wang. Data quality assessment. Communications of the ACM, 45(4):211-218, 2002.
[17] T. C. Redman. Data Quality for the Information Age. Artech House, 1996.
[18] G. Shankaranarayanan, M. Ziad, and R. Y. Wang. Managing data quality in dynamic decision environments. Journal of Database Management, 14(4):14-32, 2003.
[19] V. C. Storey and R. Y. Wang. Modeling quality requirements in conceptual database design. Pages 64-87.
[20] K. Peffers, T. Tuunanen, M. A. Rothenberger, and S. Chatterjee. A design science research methodology for information systems research. Journal of Management Information Systems, 24(3):45-77, 2007.
[21] Y. Wand and R. Y. Wang. Anchoring data quality dimensions in ontological foundations. Communications of the ACM, 39(11):86-95, 1996.
[22] R. Y. Wang. A product perspective on total data quality management. Communications of the ACM, 41(2):58-65, 1998.
[23] R. Y. Wang and D. M. Strong. Beyond accuracy: what data quality means to data consumers. Journal of Management Information Systems, 12(4):5-33, 1996.


More information

Reverse Software Engineering Using UML tools Jalak Vora 1 Ravi Zala 2

Reverse Software Engineering Using UML tools Jalak Vora 1 Ravi Zala 2 IJSRD - International Journal for Scientific Research & Development Vol. 2, Issue 03, 2014 ISSN (online): 2321-0613 Reverse Software Engineering Using UML tools Jalak Vora 1 Ravi Zala 2 1, 2 Department

More information

PRINCIPLES AND FUNCTIONAL REQUIREMENTS

PRINCIPLES AND FUNCTIONAL REQUIREMENTS INTERNATIONAL COUNCIL ON ARCHIVES PRINCIPLES AND FUNCTIONAL REQUIREMENTS FOR RECORDS IN ELECTRONIC OFFICE ENVIRONMENTS RECORDKEEPING REQUIREMENTS FOR BUSINESS SYSTEMS THAT DO NOT MANAGE RECORDS OCTOBER

More information

A Practical Look into GDPR for IT

A Practical Look into GDPR for IT Andrea Pasquinucci, March 2017 pag. 1 / 7 A Practical Look into GDPR for IT Part 1 Abstract This is the first article in a short series about the new EU General Data Protection Regulation (GDPR) looking,

More information

Building Resilience to Disasters for Sustainable Development: Visakhapatnam Declaration and Plan of Action

Building Resilience to Disasters for Sustainable Development: Visakhapatnam Declaration and Plan of Action Building Resilience to Disasters for Sustainable Development: Visakhapatnam Declaration and Plan of Action Adopted at the Third World Congress on Disaster Management Visakhapatnam, Andhra Pradesh, India

More information

HOW AND WHEN TO FLATTEN JAVA CLASSES?

HOW AND WHEN TO FLATTEN JAVA CLASSES? HOW AND WHEN TO FLATTEN JAVA CLASSES? Jehad Al Dallal Department of Information Science, P.O. Box 5969, Safat 13060, Kuwait ABSTRACT Improving modularity and reusability are two key objectives in object-oriented

More information

Semantics, Metadata and Identifying Master Data

Semantics, Metadata and Identifying Master Data Semantics, Metadata and Identifying Master Data A DataFlux White Paper Prepared by: David Loshin, President, Knowledge Integrity, Inc. Once you have determined that your organization can achieve the benefits

More information

Semantic interoperability, e-health and Australian health statistics

Semantic interoperability, e-health and Australian health statistics Semantic interoperability, e-health and Australian health statistics Sally Goodenough Abstract E-health implementation in Australia will depend upon interoperable computer systems to share information

More information

2 The IBM Data Governance Unified Process

2 The IBM Data Governance Unified Process 2 The IBM Data Governance Unified Process The benefits of a commitment to a comprehensive enterprise Data Governance initiative are many and varied, and so are the challenges to achieving strong Data Governance.

More information

What our members see as being the value of TM 2.0

What our members see as being the value of TM 2.0 About TM 2.0 The TM 2.0 ERTICO Platform originated in 2011 by TomTom and Swarco-Mizar and was formally established on 17 June 2014 during the ITS Europe Congress in Helsinki. It now comprises more than

More information

Business Analysis for Practitioners - Requirements Elicitation and Analysis (Domain 3)

Business Analysis for Practitioners - Requirements Elicitation and Analysis (Domain 3) Business Analysis for Practitioners - Requirements Elicitation and Analysis (Domain 3) COURSE STRUCTURE Introduction to Business Analysis Module 1 Needs Assessment Module 2 Business Analysis Planning Module

More information

Level 5 Diploma in Computing

Level 5 Diploma in Computing Level 5 Diploma in Computing 1 www.lsib.co.uk Objective of the qualification: It should available to everyone who is capable of reaching the required standards It should be free from any barriers that

More information

Data Quality:A Survey of Data Quality Dimensions

Data Quality:A Survey of Data Quality Dimensions Data Quality:A Survey of Data Quality Dimensions 1 Fatimah Sidi, 2 Payam Hassany Shariat Panahy, 1 Lilly Suriani Affendey, 1 Marzanah A. Jabar, 1 Hamidah Ibrahim,, 1 Aida Mustapha Faculty of Computer Science

More information

Business Architecture concepts and components: BA shared infrastructures, capability modeling and guiding principles

Business Architecture concepts and components: BA shared infrastructures, capability modeling and guiding principles Business Architecture concepts and components: BA shared infrastructures, capability modeling and guiding principles Giulio Barcaroli Directorate for Methodology and Statistical Process Design Istat ESTP

More information

is easing the creation of new ontologies by promoting the reuse of existing ones and automating, as much as possible, the entire ontology

is easing the creation of new ontologies by promoting the reuse of existing ones and automating, as much as possible, the entire ontology Preface The idea of improving software quality through reuse is not new. After all, if software works and is needed, just reuse it. What is new and evolving is the idea of relative validation through testing

More information

Archives in a Networked Information Society: The Problem of Sustainability in the Digital Information Environment

Archives in a Networked Information Society: The Problem of Sustainability in the Digital Information Environment Archives in a Networked Information Society: The Problem of Sustainability in the Digital Information Environment Shigeo Sugimoto Research Center for Knowledge Communities Graduate School of Library, Information

More information

Terms in the glossary are listed alphabetically. Words highlighted in bold are defined in the Glossary.

Terms in the glossary are listed alphabetically. Words highlighted in bold are defined in the Glossary. Glossary 2010 The Records Management glossary is a list of standard records terms used throughout CINA s guidance and training. These terms and definitions will help you to understand and get the most

More information

Ontology based Model and Procedure Creation for Topic Analysis in Chinese Language

Ontology based Model and Procedure Creation for Topic Analysis in Chinese Language Ontology based Model and Procedure Creation for Topic Analysis in Chinese Language Dong Han and Kilian Stoffel Information Management Institute, University of Neuchâtel Pierre-à-Mazel 7, CH-2000 Neuchâtel,

More information

Executing Evaluations over Semantic Technologies using the SEALS Platform

Executing Evaluations over Semantic Technologies using the SEALS Platform Executing Evaluations over Semantic Technologies using the SEALS Platform Miguel Esteban-Gutiérrez, Raúl García-Castro, Asunción Gómez-Pérez Ontology Engineering Group, Departamento de Inteligencia Artificial.

More information

Chapter No. 2 Class modeling CO:-Sketch Class,object models using fundamental relationships Contents 2.1 Object and Class Concepts (12M) Objects,

Chapter No. 2 Class modeling CO:-Sketch Class,object models using fundamental relationships Contents 2.1 Object and Class Concepts (12M) Objects, Chapter No. 2 Class modeling CO:-Sketch Class,object models using fundamental relationships Contents 2.1 Object and Class Concepts (12M) Objects, Classes, Class Diagrams Values and Attributes Operations

More information

MONITORING DATA PRODUCT QUALITY

MONITORING DATA PRODUCT QUALITY Association for Information Systems AIS Electronic Library (AISeL) UK Academy for Information Systems Conference Proceedings 2010 UK Academy for Information Systems Spring 3-23-2010 MONITORING DATA PRODUCT

More information

European Conference on Quality and Methodology in Official Statistics (Q2008), 8-11, July, 2008, Rome - Italy

European Conference on Quality and Methodology in Official Statistics (Q2008), 8-11, July, 2008, Rome - Italy European Conference on Quality and Methodology in Official Statistics (Q2008), 8-11, July, 2008, Rome - Italy Metadata Life Cycle Statistics Portugal Isabel Morgado Methodology and Information Systems

More information

Data Governance Central to Data Management Success

Data Governance Central to Data Management Success Data Governance Central to Data Success International Anne Marie Smith, Ph.D. DAMA International DMBOK Editorial Review Board Primary Contributor EWSolutions, Inc Principal Consultant and Director of Education

More information

6. Relational Algebra (Part II)

6. Relational Algebra (Part II) 6. Relational Algebra (Part II) 6.1. Introduction In the previous chapter, we introduced relational algebra as a fundamental model of relational database manipulation. In particular, we defined and discussed

More information

MetaData for Database Mining

MetaData for Database Mining MetaData for Database Mining John Cleary, Geoffrey Holmes, Sally Jo Cunningham, and Ian H. Witten Department of Computer Science University of Waikato Hamilton, New Zealand. Abstract: At present, a machine

More information

A model of information searching behaviour to facilitate end-user support in KOS-enhanced systems

A model of information searching behaviour to facilitate end-user support in KOS-enhanced systems A model of information searching behaviour to facilitate end-user support in KOS-enhanced systems Dorothee Blocks Hypermedia Research Unit School of Computing University of Glamorgan, UK NKOS workshop

More information

Eight units must be completed and passed to be awarded the Diploma.

Eight units must be completed and passed to be awarded the Diploma. Diploma of Computing Course Outline Campus Intake CRICOS Course Duration Teaching Methods Assessment Course Structure Units Melbourne Burwood Campus / Jakarta Campus, Indonesia March, June, October 022638B

More information

A MAS Based ETL Approach for Complex Data

A MAS Based ETL Approach for Complex Data A MAS Based ETL Approach for Complex Data O. Boussaid, F. Bentayeb, J. Darmont Abstract : In a data warehousing process, the phase of data integration is crucial. Many methods for data integration have

More information

Impact of Dependency Graph in Software Testing

Impact of Dependency Graph in Software Testing Impact of Dependency Graph in Software Testing Pardeep Kaur 1, Er. Rupinder Singh 2 1 Computer Science Department, Chandigarh University, Gharuan, Punjab 2 Assistant Professor, Computer Science Department,

More information

Graph Theory for Modelling a Survey Questionnaire Pierpaolo Massoli, ISTAT via Adolfo Ravà 150, Roma, Italy

Graph Theory for Modelling a Survey Questionnaire Pierpaolo Massoli, ISTAT via Adolfo Ravà 150, Roma, Italy Graph Theory for Modelling a Survey Questionnaire Pierpaolo Massoli, ISTAT via Adolfo Ravà 150, 00142 Roma, Italy e-mail: pimassol@istat.it 1. Introduction Questions can be usually asked following specific

More information

SDMX self-learning package No. 3 Student book. SDMX-ML Messages

SDMX self-learning package No. 3 Student book. SDMX-ML Messages No. 3 Student book SDMX-ML Messages Produced by Eurostat, Directorate B: Statistical Methodologies and Tools Unit B-5: Statistical Information Technologies Last update of content February 2010 Version

More information

Content Management for the Defense Intelligence Enterprise

Content Management for the Defense Intelligence Enterprise Gilbane Beacon Guidance on Content Strategies, Practices and Technologies Content Management for the Defense Intelligence Enterprise How XML and the Digital Production Process Transform Information Sharing

More information

Managing Changes to Schema of Data Sources in a Data Warehouse

Managing Changes to Schema of Data Sources in a Data Warehouse Association for Information Systems AIS Electronic Library (AISeL) AMCIS 2001 Proceedings Americas Conference on Information Systems (AMCIS) December 2001 Managing Changes to Schema of Data Sources in

More information

A Visual Tool for Supporting Developers in Ontology-based Application Integration

A Visual Tool for Supporting Developers in Ontology-based Application Integration A Visual Tool for Supporting Developers in Ontology-based Application Integration Tobias Wieschnowsky 1 and Heiko Paulheim 2 1 SAP Research tobias.wieschnowsky@sap.com 2 Technische Universität Darmstadt

More information

PROJECT PERIODIC REPORT

PROJECT PERIODIC REPORT PROJECT PERIODIC REPORT Grant Agreement number: 257403 Project acronym: CUBIST Project title: Combining and Uniting Business Intelligence and Semantic Technologies Funding Scheme: STREP Date of latest

More information

Efficient, Scalable, and Provenance-Aware Management of Linked Data

Efficient, Scalable, and Provenance-Aware Management of Linked Data Efficient, Scalable, and Provenance-Aware Management of Linked Data Marcin Wylot 1 Motivation and objectives of the research The proliferation of heterogeneous Linked Data on the Web requires data management

More information

A Study on Website Quality Models

A Study on Website Quality Models International Journal of Scientific and Research Publications, Volume 4, Issue 12, December 2014 1 A Study on Website Quality Models R.Anusha Department of Information Systems Management, M.O.P Vaishnav

More information

Vendor: The Open Group. Exam Code: OG Exam Name: TOGAF 9 Part 1. Version: Demo

Vendor: The Open Group. Exam Code: OG Exam Name: TOGAF 9 Part 1. Version: Demo Vendor: The Open Group Exam Code: OG0-091 Exam Name: TOGAF 9 Part 1 Version: Demo QUESTION 1 According to TOGAF, Which of the following are the architecture domains that are commonly accepted subsets of

More information

A Transformation-Based Model of Evolutionary Architecting for Embedded System Product Lines

A Transformation-Based Model of Evolutionary Architecting for Embedded System Product Lines A Transformation-Based Model of Evolutionary Architecting for Embedded System Product Lines Jakob Axelsson School of Innovation, Design and Engineering, Mälardalen University, SE-721 23 Västerås, Sweden

More information

Data Management Glossary

Data Management Glossary Data Management Glossary A Access path: The route through a system by which data is found, accessed and retrieved Agile methodology: An approach to software development which takes incremental, iterative

More information

ISO INTERNATIONAL STANDARD. Information and documentation Managing metadata for records Part 2: Conceptual and implementation issues

ISO INTERNATIONAL STANDARD. Information and documentation Managing metadata for records Part 2: Conceptual and implementation issues INTERNATIONAL STANDARD ISO 23081-2 First edition 2009-07-01 Information and documentation Managing metadata for records Part 2: Conceptual and implementation issues Information et documentation Gestion

More information

Data Models: The Center of the Business Information Systems Universe

Data Models: The Center of the Business Information Systems Universe Data s: The Center of the Business Information Systems Universe Whitemarsh Information Systems Corporation 2008 Althea Lane Bowie, Maryland 20716 Tele: 301-249-1142 Email: Whitemarsh@wiscorp.com Web: www.wiscorp.com

More information

ICT & Computing Progress Grid

ICT & Computing Progress Grid ICT & Computing Progress Grid Pupil Progress ion 9 Select, Algorithms justify and apply appropriate techniques and principles to develop data structures and algorithms for the solution of problems Programming

More information

The Impact of Data Quality Tagging on Decision Outcomes

The Impact of Data Quality Tagging on Decision Outcomes Association for Information Systems AIS Electronic Library (AISeL) ACIS 2001 Proceedings Australasian (ACIS) 2001 The Impact of Data Quality Tagging on Decision Outcomes Graeme Shanks The University of

More information

Data Curation Handbook Steps

Data Curation Handbook Steps Data Curation Handbook Steps By Lisa R. Johnston Preliminary Step 0: Establish Your Data Curation Service: Repository data curation services should be sustained through appropriate staffing and business

More information

Alignment of Business and IT - ArchiMate. Dr. Barbara Re

Alignment of Business and IT - ArchiMate. Dr. Barbara Re Alignment of Business and IT - ArchiMate Dr. Barbara Re What is ArchiMate? ArchiMate is a modelling technique ("language") for describing enterprise architectures. It presents a clear set of concepts within

More information

Participatory Quality Management of Ontologies in Enterprise Modelling

Participatory Quality Management of Ontologies in Enterprise Modelling Participatory Quality Management of Ontologies in Enterprise Modelling Nadejda Alkhaldi Mathematics, Operational research, Statistics and Information systems group Vrije Universiteit Brussel, Brussels,

More information

Usability Evaluation as a Component of the OPEN Development Framework

Usability Evaluation as a Component of the OPEN Development Framework Usability Evaluation as a Component of the OPEN Development Framework John Eklund Access Testing Centre and The University of Sydney 112 Alexander Street, Crows Nest NSW 2065 Australia johne@testingcentre.com

More information

NEW DATA REGULATIONS: IS YOUR BUSINESS COMPLIANT?

NEW DATA REGULATIONS: IS YOUR BUSINESS COMPLIANT? NEW DATA REGULATIONS: IS YOUR BUSINESS COMPLIANT? What the new data regulations mean for your business, and how Brennan IT and Microsoft 365 can help. THE REGULATIONS: WHAT YOU NEED TO KNOW Australia:

More information

On The Theoretical Foundation for Data Flow Analysis in Workflow Management

On The Theoretical Foundation for Data Flow Analysis in Workflow Management Association for Information Systems AIS Electronic Library (AISeL) AMCIS 2005 Proceedings Americas Conference on Information Systems (AMCIS) 2005 On The Theoretical Foundation for Data Flow Analysis in

More information

Diplomatic Analysis. Case Study 03: HorizonZero/ZeroHorizon Online Magazine and Database

Diplomatic Analysis. Case Study 03: HorizonZero/ZeroHorizon Online Magazine and Database Diplomatic Analysis Case Study 03: HorizonZero/ZeroHorizon Online Magazine and Database Tracey Krause, UBC December 2006 INTRODUCTION The InterPARES 2 case study 03 was proposed to explore the distinction

More information

GUIDELINES FOR MASTER OF SCIENCE INTERNSHIP THESIS

GUIDELINES FOR MASTER OF SCIENCE INTERNSHIP THESIS GUIDELINES FOR MASTER OF SCIENCE INTERNSHIP THESIS Dear Participant of the MScIS Program, If you have chosen to follow an internship, one of the requirements is to write a Thesis. This document gives you

More information

A METHODOLOGY FOR INFORMATION QUALITY MANAGEMENT IN SELF-HEALING WEB SERVICES (Completed paper)

A METHODOLOGY FOR INFORMATION QUALITY MANAGEMENT IN SELF-HEALING WEB SERVICES (Completed paper) A METHODOLOGY FOR INFORMATION QUALITY MANAGEMENT IN SELF-HEALING WEB SERVICES (Completed paper) Cinzia Cappiello Barbara Pernici Politecnico di Milano, Milano, Italy {cappiell, pernici}@elet.polimi.it

More information

TEL2813/IS2820 Security Management

TEL2813/IS2820 Security Management TEL2813/IS2820 Security Management Security Management Models And Practices Lecture 6 Jan 27, 2005 Introduction To create or maintain a secure environment 1. Design working security plan 2. Implement management

More information

2/18/2009. Introducing Interactive Systems Design and Evaluation: Usability and Users First. Outlines. What is an interactive system

2/18/2009. Introducing Interactive Systems Design and Evaluation: Usability and Users First. Outlines. What is an interactive system Introducing Interactive Systems Design and Evaluation: Usability and Users First Ahmed Seffah Human-Centered Software Engineering Group Department of Computer Science and Software Engineering Concordia

More information

ISO/IEC TR TECHNICAL REPORT. Information technology Procedures for achieving metadata registry (MDR) content consistency Part 1: Data elements

ISO/IEC TR TECHNICAL REPORT. Information technology Procedures for achieving metadata registry (MDR) content consistency Part 1: Data elements TECHNICAL REPORT ISO/IEC TR 20943-1 First edition 2003-08-01 Information technology Procedures for achieving metadata registry (MDR) content consistency Part 1: Data elements Technologies de l'information

More information

Network protocols and. network systems INTRODUCTION CHAPTER

Network protocols and. network systems INTRODUCTION CHAPTER CHAPTER Network protocols and 2 network systems INTRODUCTION The technical area of telecommunications and networking is a mature area of engineering that has experienced significant contributions for more

More information

THE STATE OF IT TRANSFORMATION FOR RETAIL

THE STATE OF IT TRANSFORMATION FOR RETAIL THE STATE OF IT TRANSFORMATION FOR RETAIL An Analysis by Dell EMC and VMware Dell EMC and VMware are helping IT groups at retail organizations transform to business-focused service providers. The State

More information

Digital Archives: Extending the 5S model through NESTOR

Digital Archives: Extending the 5S model through NESTOR Digital Archives: Extending the 5S model through NESTOR Nicola Ferro and Gianmaria Silvello Department of Information Engineering, University of Padua, Italy {ferro, silvello}@dei.unipd.it Abstract. Archives

More information

Generic Statistical Business Process Model

Generic Statistical Business Process Model Joint UNECE/Eurostat/OECD Work Session on Statistical Metadata (METIS) Generic Statistical Business Process Model Version 3.1 December 2008 Prepared by the UNECE Secretariat 1 I. Background 1. The Joint

More information

RETRACTED ARTICLE. Web-Based Data Mining in System Design and Implementation. Open Access. Jianhu Gong 1* and Jianzhi Gong 2

RETRACTED ARTICLE. Web-Based Data Mining in System Design and Implementation. Open Access. Jianhu Gong 1* and Jianzhi Gong 2 Send Orders for Reprints to reprints@benthamscience.ae The Open Automation and Control Systems Journal, 2014, 6, 1907-1911 1907 Web-Based Data Mining in System Design and Implementation Open Access Jianhu

More information

Distributed Hybrid MDM, aka Virtual MDM Optional Add-on, for WhamTech SmartData Fabric

Distributed Hybrid MDM, aka Virtual MDM Optional Add-on, for WhamTech SmartData Fabric Distributed Hybrid MDM, aka Virtual MDM Optional Add-on, for WhamTech SmartData Fabric Revision 2.1 Page 1 of 17 www.whamtech.com (972) 991-5700 info@whamtech.com August 2018 Contents Introduction... 3

More information

International Journal of Software and Web Sciences (IJSWS) Web service Selection through QoS agent Web service

International Journal of Software and Web Sciences (IJSWS)   Web service Selection through QoS agent Web service International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) ISSN (Print): 2279-0063 ISSN (Online): 2279-0071 International

More information

Evaluation and Design Issues of Nordic DC Metadata Creation Tool

Evaluation and Design Issues of Nordic DC Metadata Creation Tool Evaluation and Design Issues of Nordic DC Metadata Creation Tool Preben Hansen SICS Swedish Institute of computer Science Box 1264, SE-164 29 Kista, Sweden preben@sics.se Abstract This paper presents results

More information

Solving the Enterprise Data Dilemma

Solving the Enterprise Data Dilemma Solving the Enterprise Data Dilemma Harmonizing Data Management and Data Governance to Accelerate Actionable Insights Learn More at erwin.com Is Our Company Realizing Value from Our Data? If your business

More information

European Commission - ISA Unit

European Commission - ISA Unit DG DIGIT Unit.D.2 (ISA Unit) European Commission - ISA Unit INTEROPERABILITY QUICK ASSESSMENT TOOLKIT Release Date: 12/06/2018 Doc. Version: 1.1 Document History The following table shows the development

More information

Designing a System Engineering Environment in a structured way

Designing a System Engineering Environment in a structured way Designing a System Engineering Environment in a structured way Anna Todino Ivo Viglietti Bruno Tranchero Leonardo-Finmeccanica Aircraft Division Torino, Italy Copyright held by the authors. Rubén de Juan

More information

Opinion 02/2012 on facial recognition in online and mobile services

Opinion 02/2012 on facial recognition in online and mobile services ARTICLE 29 DATA PROTECTION WORKING PARTY 00727/12/EN WP 192 Opinion 02/2012 on facial recognition in online and mobile services Adopted on 22 March 2012 This Working Party was set up under Article 29 of

More information

XETA: extensible metadata System

XETA: extensible metadata System XETA: extensible metadata System Abstract: This paper presents an extensible metadata system (XETA System) which makes it possible for the user to organize and extend the structure of metadata. We discuss

More information

Exploring the Concept of Temporal Interoperability as a Framework for Digital Preservation*

Exploring the Concept of Temporal Interoperability as a Framework for Digital Preservation* Exploring the Concept of Temporal Interoperability as a Framework for Digital Preservation* Margaret Hedstrom, University of Michigan, Ann Arbor, MI USA Abstract: This paper explores a new way of thinking

More information

data elements (Delsey, 2003) and by providing empirical data on the actual use of the elements in the entire OCLC WorldCat database.

data elements (Delsey, 2003) and by providing empirical data on the actual use of the elements in the entire OCLC WorldCat database. Shawne D. Miksa, William E. Moen, Gregory Snyder, Serhiy Polyakov, Amy Eklund Texas Center for Digital Knowledge, University of North Texas Denton, Texas, U.S.A. Metadata Assistance of the Functional Requirements

More information

What is cloud computing? The enterprise is liable as data controller. Various forms of cloud computing. Data controller

What is cloud computing? The enterprise is liable as data controller. Various forms of cloud computing. Data controller A guide to CLOUD COMPUTING 2014 Cloud computing Businesses that make use of cloud computing are legally liable, and must ensure that personal data is processed in accordance with the relevant legislation

More information

Harmonization of usability measurements in ISO9126 software engineering standards

Harmonization of usability measurements in ISO9126 software engineering standards Harmonization of usability measurements in ISO9126 software engineering standards Laila Cheikhi, Alain Abran and Witold Suryn École de Technologie Supérieure, 1100 Notre-Dame Ouest, Montréal, Canada laila.cheikhi.1@ens.etsmtl.ca,

More information

CoE CENTRE of EXCELLENCE ON DATA WAREHOUSING

CoE CENTRE of EXCELLENCE ON DATA WAREHOUSING in partnership with Overall handbook to set up a S-DWH CoE: Deliverable: 4.6 Version: 3.1 Date: 3 November 2017 CoE CENTRE of EXCELLENCE ON DATA WAREHOUSING Handbook to set up a S-DWH 1 version 2.1 / 4

More information

Intranets and Organizational Learning: Impact of Metadata Filters on Information Quality, User Satisfaction and Intention to Use

Intranets and Organizational Learning: Impact of Metadata Filters on Information Quality, User Satisfaction and Intention to Use Intranets and Organizational Learning: Impact of Metadata Filters on Information Quality, User Satisfaction and Intention to Use Suparna Goswami suparnag@comp.nus.edu.sg Hock Chuan Chan chanhc@comp.nus.edu.sg

More information

Developing Web-Based Applications Using Model Driven Architecture and Domain Specific Languages

Developing Web-Based Applications Using Model Driven Architecture and Domain Specific Languages Proceedings of the 8 th International Conference on Applied Informatics Eger, Hungary, January 27 30, 2010. Vol. 2. pp. 287 293. Developing Web-Based Applications Using Model Driven Architecture and Domain

More information

"Charting the Course... ITIL 2011 Managing Across the Lifecycle ( MALC ) Course Summary

Charting the Course... ITIL 2011 Managing Across the Lifecycle ( MALC ) Course Summary Course Summary Description ITIL is a set of best practices guidance that has become a worldwide-adopted framework for IT Service Management by many Public & Private Organizations. Since early 1990, ITIL

More information

International Roaming Charges: Frequently Asked Questions

International Roaming Charges: Frequently Asked Questions MEMO/06/144 Brussels, 28 March 2006 International Roaming Charges: Frequently Asked Questions What is international mobile roaming? International roaming refers to the ability to use your mobile phone

More information

Security Management Models And Practices Feb 5, 2008

Security Management Models And Practices Feb 5, 2008 TEL2813/IS2820 Security Management Security Management Models And Practices Feb 5, 2008 Objectives Overview basic standards and best practices Overview of ISO 17799 Overview of NIST SP documents related

More information