mygrid Requirements for the information repository and management of information in mygrid
Document Class: Requirements document


Document Reference: PL2
Issue No: 0.2
Author: Peter Li
Institution: University of Newcastle
Date:
Pages: 49

Abstract: This document reports on a study to identify the requirements for the information repository and the management of information in the mygrid project. It is intended that the document will provide a reference for the design of the mygrid information repository and stimulate discussion amongst the mygrid developers, investigators, and the wider life sciences community regarding its specification and implementation.

0 Document Information
0.1 Table of Contents

0 Document Information
   0.1 Table of Contents
   0.2 Document History
   0.3 Forecast Changes
   0.4 References
1 Introduction
2 Requirements Gathering
3 Terminology
4 Data Storage
   4.1 Data formats
      Flat files
      XML files
      Relational data
   4.2 Data Archiving
5 Metadata
   5.1 Metadata types
      Technical metadata
      Contextual metadata
      Ownership
      Versioning
      Life Sciences Identifier
   5.2 Metadata storage
6 Provenance
   6.1 Contents of Provenance
   6.2 Workflows and provenance of MIR usage
7 Query Capability
   7.1 Data Formats
      Relational data
      XML data
      Flat file data
   7.2 Metadata querying
   7.3 Free text searching
8 Distributed Query Processing
   8.1 Types of repositories in mygrid
   8.2 Data Formats
   8.3 Distributed Query Processing and Workflows
   8.4 Wrapping of public databases
   8.5 Views
9 Data Processing and Transformation
   9.1 Data Transformation
      User-Defined Functions
   9.2 Management Information Service
10 User Management
   10.1 User accounts
   Project working
   Notification
   Distributed Annotation System
11 Security
   Security Issues
   Authentication
   Authorisation
   Role-Based Access and Privileges
   Views
   Auditing mygrid Information Repository Activities
Capacity and Performance
   Data Volume
   Indexing
   OGSA-DAI
Fault Tolerance
   Counter measures
   Transactions
   Concurrency Control
   Database Backup and Recovery
Personalisation
   Personal Annotation of Data
   Integration of Legacy Data
   Views
   Notification Profiles
Appendix I
   User requirements for biologists (five Bio entries)
   User requirements for specialist bioinformaticians (four Bioinf entries)
   User requirements for system administrators (S entry)
   User requirements for tool builders (T entry)
Appendix II An example of a flat file entry in Swiss-Prot
Appendix III An example sequence entry in FASTA format

Ref PL1 Issue 0.2 Page 2 of 49

0.2 Document History

Initial issue to mygrid WP3 Newcastle.
Second draft based on comments received from Paul Watson and Anil Wipat. Released to mygrid WP3 Newcastle and Manchester. The next draft will take into account the comments received from WP3 Manchester, Alan Robinson and Nick Sharman.

0.3 Forecast Changes

The requirements described in this document are subject to change based on the outcome of the following issues in mygrid:
- Requirements for mygrid gathered from industrial users at GlaxoSmithKline, AstraZeneca, Merck and Non-Linear Dynamics.
- The storage of metadata in the form of RDF.
- Provision of a mygrid e-lab book.

0.4 References

[1] mygrid User Group web pages (2002).
[2] Apgar et al. (2002) Life Sciences Identifier (LSID): Draft Specification for Review and Comment.
[3] Werner, P. (2002) Life Sciences Identifier (LSID): A Foundation for Wide Area, Scientific Collaboration and Informatics Interoperability.
[4] Greenhalgh, C. (2002) Towards a simple operational model for mygrid?
[5] Smith, J. et al. (2002) Distributed Query Processing on the Grid. Proc. Grid Computing GRID, ed. M. Parashar, Baltimore, USA, November, LNCS 2536, Springer Verlag.
[6] Watson, P. (2002) Databases and the Grid. Newcastle University Computing Science Technical Report CS-TR-755.
[7] IBM DiscoveryLink web pages.
[8] Atkinson, M. et al. (2002) Grid database access and integration: Requirements and functionalities.
[9] Pearson, D. (2002) Data requirements for the Grid: Scoping study report.
[10] Goble, C. et al. (2001) mygrid project proposal.
[11] Distributed Annotation System.

1 Introduction

This report documents the results of a requirements analysis for the mygrid information repository (MIR) and the integration of its data with external user repositories and public databases. The Information Repository Management work package in the mygrid project deemed this analysis necessary to provide a basis for the design and implementation of the information repository and its distributed query processing service. Following this introduction, Section 2 describes how the requirements were gathered and Section 3 defines the terminology used in this report. The remaining sections set out the generic requirements for the MIR and for the distributed querying of data repositories.

2 Requirements Gathering

The requirements for the MIR were derived from discussions within the Information Repository Management work package in the mygrid project and from resources available on the web. The results of the work undertaken by the mygrid User Group were also considered in this requirements analysis [1]. The mygrid User Group developed a taxonomy of mygrid end users comprising five main types: biologists, bioinformaticians, application tool builders, system administrators and managers. These user types are underlined in Figure 1. People fulfilling these roles were recruited from various academic institutions in the UK, and their requirements of the mygrid platform were captured using semi-structured interviews. The results of these interviews are presented in Appendix I.

Figure 1. A taxonomy of mygrid users (nodes: MyGrid Users, Biologists, Computer Specialists, Managers, Rare Users, Occasional Users, Bioinformaticians, Tool Builders, Systems Administrators, Project Managers, Bioinformatics Managers, Bioinformatics Tool Builders).

The requirements for the MIR were either extracted directly from, or inferred from, the results of the interviews with end users. The requirements of the developers on the mygrid project were also considered.

3 Terminology

This section introduces several terms used in this document. Data is a collective term for the values assigned to data items, and the data format describes how the data is structured. Data is generated by a data producer such as an application program or a laboratory experiment. A database is an organised collection of data which is stored and managed by a database management system (DBMS). A service is a capability provided by a resource, which could be a data analysis program or a data repository. The personal repository is referred to in this document as the mygrid information repository (MIR), since this name conveys its role in mygrid more accurately. Each mygrid user possesses data which is stored in a MIR deployed by their organisation. Databases in the mygrid environment can be categorised into three types: the MIR, public databases such as EMBL and Swiss-Prot, and external user repositories. The latter contain biological data which is proprietary to a user and/or their research group.

4 Data Storage

This section describes the requirements for storing data in the MIR. It describes the formats in which these data may be structured and how the integrity of data in these formats should be maintained. The MIR is required to accommodate data generated from laboratory and in silico experiments. Examples of the types of data which might be stored in the MIR include gene sequences, protein structures, signalling pathways and abstracts from scientific papers. Other types of data requiring storage are provenance records and intermediary data generated from the execution of workflows. The workflow definitions constructed by users should also be stored in their personal repositories.

4.1 Data formats

In the life sciences, data is structured in a number of different formats: flat files, XML and relational data. Biological data needs to be stored in the MIR in each of these formats.

Flat files

A flat file is a file containing data which have no structured interrelationship and no regard to its visual representation. Flat files are a common format for storing textual data in the life sciences, and various flat file formats are used to represent the different types of biological data. Examples of flat file formats representing DNA information include EMBL and GenBank. Common file formats holding protein information are Swiss-Prot and PDB. These flat files represent data by flagging each line with a short letter code (two letters in EMBL and Swiss-Prot entries) that indicates the type of data present on that line. An example of a Swiss-Prot entry is shown in Appendix II. DNA, RNA and protein sequences are also commonly represented in the FASTA format. A sequence in FASTA format begins with a single-line description, followed by lines of sequence data. The description line is distinguished from the sequence data by a greater-than (">") symbol in the first column.
An example sequence in FASTA format is shown in Appendix III.

XML files

XML is becoming an increasingly common format for structuring biological data. Examples of XML being used as a general framework for annotating data in the life sciences include the Bioinformatic Sequence Markup Language (BSML) and the Biopolymer Markup Language (BioML). In mygrid, workflows are being represented in XML using the Web Services Flow Language (WSFL), and provenance records generated from the execution of workflows are also being represented in XML. The structure of XML documents is defined by Document Type Definitions (DTDs) and XML Schemas. These files should either be stored in, or be made accessible to, the MIR so that they can be used to validate the XML files stored there and so maintain data integrity.

Relational data

Biological data may be structured into relational tables which embody different aspects of the data but contain overlapping information. Users should be able to store relational data in the MIR.

The integrity of relational data must be maintained in the MIR. This requires the ability to control relationships between data using DBMS capabilities such as referential integrity, check constraints and triggers.

Referential Integrity

Referential integrity should be used to enforce the relationships between tables in relational data. For example, a data record in a table cannot exist for a user ID if that user ID does not exist in the Users table.

Check Constraints

Check constraints should be used to enforce rules on a column of data. Examples of rules which should be enforced using check constraints are that a protein sequence cannot contain the letters J, O, U or X, and that a DNA sequence can only contain the letters A, T, C and G.

Triggers

Triggers are procedures stored in a database which are executed automatically when specified events, such as the insertion, update or deletion of rows, occur in the database. Triggers should be used to enforce rules other than referential integrity between tables, firing before or after the insertion, update and deletion of relational data in the MIR. For example, triggers could be used to remove the dependencies on a piece of data which a user wants deleted. The functionality provided by triggers can also be used as a basis for notifying data change events in the MIR to the general mygrid notification service.

4.2 Data Archiving

Industrial organisations are compelled to store outdated data for extended periods of time (decades) for legal and regulatory reasons. They will also wish to do this so that potentially valuable information is not lost. This may create significant capacity and performance demands on the MIR. Therefore, a facility for transferring old data to a separate off-line data archive may be required.
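The referential integrity, check constraint and trigger requirements of Section 4.1 can be illustrated with SQLite from the Python standard library. This is a minimal sketch under stated assumptions: SQLite stands in for whichever DBMS implements the MIR, the table and column names (`users`, `dna_sequences`, `change_log`) are hypothetical rather than part of any MIR schema, and only the DNA rule from the text is encoded as a check constraint.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative, not the MIR schema.
SCHEMA = """
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE dna_sequences (
    seq_id   INTEGER PRIMARY KEY,
    -- Referential integrity: a sequence cannot exist for an unknown user ID.
    owner_id INTEGER NOT NULL REFERENCES users(user_id),
    -- Check constraint: non-empty and only A, T, C and G (GLOB is case-sensitive).
    residues TEXT NOT NULL CHECK (residues <> '' AND residues NOT GLOB '*[^ATCG]*')
);
CREATE TABLE change_log (
    seq_id INTEGER NOT NULL,
    action TEXT NOT NULL
);
-- Trigger: log each deletion so that a notification service could react to it.
CREATE TRIGGER log_sequence_delete AFTER DELETE ON dna_sequences
BEGIN
    INSERT INTO change_log (seq_id, action) VALUES (OLD.seq_id, 'deleted');
END;
"""


def open_repository() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is enabled on the connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def try_insert(conn: sqlite3.Connection, owner_id: int, residues: str) -> bool:
    """Return True if the row was stored, False if an integrity rule rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO dna_sequences (owner_id, residues) VALUES (?, ?)",
                (owner_id, residues),
            )
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    conn = open_repository()
    with conn:
        conn.execute("INSERT INTO users (user_id, name) VALUES (1, 'bio3')")
    print(try_insert(conn, 1, "ATCGGA"))  # True
    print(try_insert(conn, 99, "ATCG"))   # False: user 99 does not exist
    print(try_insert(conn, 1, "ATXG"))    # False: X is not a DNA base
    with conn:
        conn.execute("DELETE FROM dna_sequences WHERE seq_id = 1")
    print(conn.execute("SELECT seq_id, action FROM change_log").fetchall())  # [(1, 'deleted')]
```

The deletion log written by the trigger is one way the notification role of triggers mentioned above could be realised; a production DBMS would offer richer trigger languages.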

5 Metadata

This section describes the requirements for metadata and its role in performing data operations in the MIR.

5.1 Metadata types

Metadata is a term for data that describes other data. Metadata is important to the MIR since it provides references, context and meaning to the data stored in it. The types of metadata required by the MIR to perform its data operations can be grouped into technical, contextual, currency and ownership metadata.

Technical metadata

User data stored in the MIR and in external databases needs to be described in terms of its technical characteristics: location, data structure and data resource characteristics.

Location

The MIR needs to know where data is located in order to retrieve it. This is especially important if legacy data belonging to a user is kept in another database, since the MIR must be able to retrieve it if it is required for processing by mygrid services. The data location may be expressed as a logical reference to the data such as a full file pathname, a uniform resource locator (URL) or an object name in a database.

Data structure

The data structure defines the logical groupings of data items in a data source and their interrelationships, along with the order of appearance, format, size and type of the items within each logical grouping. Knowledge of the data structure of a data resource is required to navigate through its contents and to access specific pieces of data directly. Examples of data structures include database schemas, such as the MIR schema, which is required by the mygrid portal; the record structures of flat files; and the DTDs and XML Schemas that describe XML files.

Data resource characteristics

The technical characteristics of data resources are required for determining effective and efficient methods of discovering, retrieving and managing data from those resources.
Size is an important data resource characteristic which might be required for determining where data should be stored and how much local space must be allocated before it is downloaded. Information on data resource characteristics is particularly important for distributed query processing, which relies on a data dictionary containing statistics such as the number of processor nodes, the distribution of data volume over these nodes and the availability of the resource.

Contextual metadata

The data stored in the MIR must be furnished with contextual metadata to provide it with meaning and context. The contextual metadata used should conform to a specific standard so that data can be defined accurately and without ambiguity by the MIR and by other mygrid services. Data in the pre-prototype and 0.1 versions of the MIR was annotated with its concept type, e.g. … These concept type terms were obtained from ontologies provided by the Metadata and Ontologies work package.

Ownership

The MIR must assign an owner to each item of data it stores. This ownership metadata is important for a number of reasons: it is required for maintaining data security in the MIR, for establishing intellectual property rights over data and for users to credit the owner when using their data. Knowledge of the data owner also provides an indication of the quality and value of the data.

Versioning

Data in the life sciences is volatile. An example of a process which generates extremely volatile data is contig assembly. This volatility is caused by updates to data, which can involve the amendment or deletion of data contents or the addition of new data. However, previous versions of data may need to be retrieved if, for example, a biologist disagrees with the new functional annotation of a protein and needs to re-assess the previously annotated record of the protein. It is therefore essential that the MIR can distinguish between data in the different states which arise and co-exist over time, by annotating data with versioning metadata.

Life Sciences Identifier

The Life Sciences Identifier (LSID) is a naming scheme for uniquely identifying biological data in federated repositories, which has been submitted to the I3C for approval [2]. These data identifiers are unique because they incorporate technical metadata. The LSID is a two-tiered naming system which separates the name of a data item from its physical location.
The name of a data item is based on the DNS domain of the authority defining the data and a namespace which denotes a particular database, constraining the scope of the object ID, an alphanumeric identifier that is unique within that database. An optional field containing a unique integer representing the version of the object ID can also be incorporated into the LSID [3]. The general form of an LSID, and an example for an entry in the Swiss-Prot database housed at the European Bioinformatics Institute, are shown below:

    URN:LSID:<AuthorityID>:<NamespaceID>:<ObjectID>:<Version>
    URN:LSID:ebi.ac.uk:swiss-prot:P10166:3

Data items in the 0.1 version of the MIR and in other external repositories are identified using an alphanumeric or integer identifier. These identifiers are meaningless without knowledge of the database that they refer to and the location of that database. For example, consider an integer that is the unique identifier for a gene in a database: which database does the identifier refer to, and where is that database physically located? For these reasons, it is suggested that the MIR supports the use of LSIDs to distinguish data held in multiple mygrid information repositories and other external databases. Adopting LSIDs to describe the origin and ownership of data will also benefit the recording of provenance in mygrid: it is essential that the origin of the data being analysed is known, and this will be conveniently provided by its LSID. In addition, the LSID standard provides a way of incorporating versioning metadata into identifiers, thereby enabling the MIR to distinguish between two or more versions of data.
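The LSID layout above can be parsed and compared in a few lines, which shows how the optional version field lets the MIR tell two versions of the same data item apart. This is a sketch that follows only the format shown above: it assumes no component contains a colon and that the version is an integer, as the text describes; the full LSID specification [2] may permit more than this. The `LSID` class and `parse_lsid` function are hypothetical names.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class LSID:
    authority: str                 # DNS domain of the naming authority, e.g. ebi.ac.uk
    namespace: str                 # database that scopes the object ID
    object_id: str                 # identifier unique within the namespace
    version: Optional[int] = None  # optional version of the object

    def __str__(self) -> str:
        parts = ["URN", "LSID", self.authority, self.namespace, self.object_id]
        if self.version is not None:
            parts.append(str(self.version))
        return ":".join(parts)

    def unversioned(self) -> "LSID":
        """The same data item with the version stripped, for comparing versions."""
        return replace(self, version=None)


def parse_lsid(text: str) -> LSID:
    parts = text.strip().split(":")
    # The URN:LSID prefix is compared case-insensitively.
    if len(parts) not in (5, 6) or parts[0].upper() != "URN" or parts[1].upper() != "LSID":
        raise ValueError(f"not an LSID: {text!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"empty LSID component in {text!r}")
    version = int(parts[5]) if len(parts) == 6 else None
    return LSID(authority, namespace, object_id, version)


if __name__ == "__main__":
    v3 = parse_lsid("URN:LSID:ebi.ac.uk:swiss-prot:P10166:3")
    v2 = parse_lsid("urn:lsid:ebi.ac.uk:swiss-prot:P10166:2")
    print(v3.namespace, v3.object_id, v3.version)  # swiss-prot P10166 3
    print(v2.unversioned() == v3.unversioned())    # True: same item, different versions
    print(v3)                                      # URN:LSID:ebi.ac.uk:swiss-prot:P10166:3
```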

5.2 Metadata storage

There is an issue of where and how metadata should be stored in the mygrid environment so that it can be queried by mygrid services. In mygrid 0.1, the metadata required by the MIR and other mygrid services was stored in the MIR itself. The types of metadata stored in the MIR were defined by its database schema; consequently, this schema had to be modified every time a new type of metadata needed to be stored. The Resource Description Framework (RDF) has been suggested as a less rigid and more flexible framework for describing data [4]. If RDF repositories are used for metadata storage, then the MIR will be required to query and manipulate metadata stored in this format. Another issue is the location of the metadata repository. Since the MIR relies heavily on metadata to perform its data operations, it is natural to continue to store metadata with data in the MIR.
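The flexibility argument for RDF rests on its subject-predicate-object model: every statement has the same shape, so a new kind of metadata is a new predicate rather than a schema change. The sketch below imitates that model with a single SQLite table. It is not an RDF implementation (no RDF/XML parsing, no URI semantics); an RDF toolkit such as Jena would be used in practice. The predicate names (`mygrid:owner`, `mygrid:conceptType`, `mygrid:curationStatus`) are hypothetical.

```python
import sqlite3
from typing import List


def open_metadata_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # A single generic table holds every statement, so adding a new kind of
    # metadata means adding rows, not altering the schema.
    conn.execute(
        "CREATE TABLE triples ("
        " subject TEXT NOT NULL, predicate TEXT NOT NULL, object TEXT NOT NULL)"
    )
    return conn


def add_statement(conn: sqlite3.Connection, subject: str, predicate: str, obj: str) -> None:
    with conn:
        conn.execute("INSERT INTO triples VALUES (?, ?, ?)", (subject, predicate, obj))


def objects_of(conn: sqlite3.Connection, subject: str, predicate: str) -> List[str]:
    rows = conn.execute(
        "SELECT object FROM triples WHERE subject = ? AND predicate = ? ORDER BY object",
        (subject, predicate),
    )
    return [row[0] for row in rows]


def subjects_where(conn: sqlite3.Connection, predicate: str, obj: str) -> List[str]:
    rows = conn.execute(
        "SELECT subject FROM triples WHERE predicate = ? AND object = ? ORDER BY subject",
        (predicate, obj),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = open_metadata_store()
    item = "URN:LSID:ebi.ac.uk:swiss-prot:P10166:3"
    add_statement(conn, item, "mygrid:owner", "bio3")
    add_statement(conn, item, "mygrid:conceptType", "protein_sequence")
    # A new type of metadata introduced later: no ALTER TABLE is needed.
    add_statement(conn, item, "mygrid:curationStatus", "reviewed")
    print(objects_of(conn, item, "mygrid:curationStatus"))  # ['reviewed']
    print(subjects_where(conn, "mygrid:owner", "bio3"))     # ['URN:LSID:ebi.ac.uk:swiss-prot:P10166:3']
```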

6 Provenance

This section describes the role of the MIR in provenance in mygrid. It also describes the relationship between workflows and the provenance of the data operations made by users on the MIR.

6.1 Contents of Provenance

Provenance is a form of metadata that describes the history and origin of data and thereby gives an indication of the value and trustworthiness of data. This is essential in the life sciences, where research is based on analysing data which has been created and maintained by someone else, for example the DNA and protein entries in the EMBL and Swiss-Prot databases maintained by the European Bioinformatics Institute. Similarly, users will want to know how data in their personal repositories has been generated so that they can ascertain its quality and value. This requires the provenance record to be tightly linked to the data it describes in the MIR. A provenance record can be considered as an audit trail which traces the sourcing, moving and processing of data by recording the metadata describing each of these steps. The metadata in a provenance record for MIR data generated by the execution of workflows includes its date of creation, its owner and the bioinformatics services which were used in creating the data. These types of metadata form a core set which should be available in every provenance record. However, provenance records will also require other metadata specific to the type of data analysis being performed by the user, such as the parameters used by services within workflows. For example, a biologist might want to know what E-value threshold was used in determining homology between sequences in a BLAST analysis. This will require a flexible means of adding new metadata to the provenance record to support the interpretation and analysis of data by the user.
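A provenance record combining a fixed core (data identifier, owner, creation date, services used) with open-ended per-service parameters could be modelled as follows. This is a sketch, not a mygrid data format: the class names, the service name `blastp` and the parameter keys are hypothetical, and a dictionary stands in for whatever extensible mechanism (for example RDF) is eventually chosen.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Any, Dict, List


@dataclass
class ServiceStep:
    service: str  # name of the bioinformatics service that was invoked
    # Open-ended, analysis-specific metadata such as an E-value threshold.
    parameters: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ProvenanceRecord:
    # Core metadata that every record carries.
    data_id: str
    owner: str
    created: date
    steps: List[ServiceStep] = field(default_factory=list)

    def parameter(self, service: str, name: str) -> Any:
        """Return the value of a parameter that was passed to the given service."""
        for step in self.steps:
            if step.service == service and name in step.parameters:
                return step.parameters[name]
        raise KeyError(f"{service} has no recorded parameter {name!r}")


if __name__ == "__main__":
    record = ProvenanceRecord(
        data_id="URN:LSID:example.org:mir:blast-hits-42",
        owner="bio3",
        created=date(2002, 11, 5),
        steps=[ServiceStep("blastp", {"database": "swiss-prot", "evalue": 1e-5})],
    )
    # "What E-value threshold was used in this BLAST analysis?"
    print(record.parameter("blastp", "evalue"))  # 1e-05
```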
6.2 Workflows and provenance of MIR usage

The operations that are performed by users on data in the MIR can also be considered as metadata which are important to provenance. As more functionality is added to the MIR and used for manipulating and transforming data, writing these operations to the provenance record will be essential. Moreover, the data operation steps in transforming data from one format to another may form the basis of mini-workflows which users may want to repeat, share with other users and incorporate into other workflows.
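If each MIR operation is recorded by name as it is applied, the same log serves both as provenance and as a reusable mini-workflow. The sketch below assumes a hypothetical registry of named string operations (`strip_whitespace`, `uppercase`, `dna_to_rna`); these are illustrative stand-ins, not MIR functions.

```python
from typing import Callable, Dict, List

# Hypothetical registry of named MIR data operations (illustrative only).
OPERATIONS: Dict[str, Callable[[str], str]] = {
    "strip_whitespace": lambda s: "".join(s.split()),
    "uppercase": str.upper,
    "dna_to_rna": lambda s: s.replace("T", "U"),
}


class OperationLog:
    """Applies named operations and records each one for the provenance trail."""

    def __init__(self) -> None:
        self.steps: List[str] = []

    def apply(self, name: str, data: str) -> str:
        result = OPERATIONS[name](data)
        self.steps.append(name)
        return result


def replay(steps: List[str], data: str) -> str:
    """Re-run a recorded sequence of operations as a mini-workflow on new data."""
    for name in steps:
        data = OPERATIONS[name](data)
    return data


if __name__ == "__main__":
    log = OperationLog()
    seq = log.apply("strip_whitespace", "atg gcc\ntta")
    seq = log.apply("uppercase", seq)
    seq = log.apply("dna_to_rna", seq)
    print(seq, log.steps)  # AUGGCCUUA ['strip_whitespace', 'uppercase', 'dna_to_rna']
    # The recorded steps can be shared and re-applied to other data.
    print(replay(log.steps, "ttt aaa"))  # UUUAAA
```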

7 Query Capability

This section describes the requirements for querying data in the MIR.

7.1 Data Formats

The MIR must have the means to retrieve any data which has been stored there by its users. Since data will be stored as relational data, XML and flat files, the MIR will need to be able to query data at all levels of granularity associated with these formats in order to select the specific piece of data which is required for retrieval.

Relational data

Structured Query Language (SQL) is the standard language for querying relational data in relational database management systems. It is therefore essential that the DBMS used for storing relational data implements ANSI SQL. Operators will be required to compare and categorise data, and aggregate functions to summarise it. Relational data will also have to be grouped and sorted, and textually restructured using character functions. More sophisticated queries will require joins to integrate data between relations, and subqueries to compute values that are not known in advance and to combine multiple queries into one.

XML data

The MIR will need to query all parts of an XML document. This will require the MIR to support an XML query language such as XPath, which allows queries to specify location paths through the data. These location paths are the sequences of element names which identify the elements and attributes to be extracted. The DBMS should also allow elements or attributes in XML documents to be indexed if they are frequently queried.

Flat file data

The querying of data in flat files is addressed in the Data Processing and Transformation section of this document.

7.2 Metadata querying

RDF has been suggested as a format for storing metadata in mygrid. If it is adopted, the information repository will need to be able to query, retrieve and manipulate metadata in RDF repositories in order to perform its data operations.
This will require the information repository to support Jena or another application programming interface that can be used to query and manage RDF.

7.3 Free text searching

The ability to query free text found in flat files, XML and relational data stored in the MIR will be a useful function, since biologists often want to retrieve information based on a keyword search. For example, a scientist might want to find out which projects are developing a drug compound against tyrosine kinase receptors, which would require a search of the MIR using the keywords "tyrosine", "kinase" and "drug".
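Keyword searches like the one in Section 7.3 are usually served by an inverted index mapping each word to the items that contain it, with a Boolean AND across keywords. The sketch below is a minimal in-memory version; the document IDs and texts are hypothetical, and a DBMS full-text index would normally add stemming, ranking and persistence. Because there is no stemming, "kinases" would not match the keyword "kinase".

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set


def tokenize(text: str) -> Set[str]:
    """Lower-case alphanumeric words; no stemming, so 'kinases' != 'kinase'."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(documents: Dict[str, str]) -> Dict[str, Set[str]]:
    """Inverted index: word -> IDs of the documents containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index: Dict[str, Set[str]], keywords: Iterable[str]) -> Set[str]:
    """IDs of documents containing every keyword (Boolean AND)."""
    matches = [index.get(word, set()) for kw in keywords for word in tokenize(kw)]
    return set.intersection(*matches) if matches else set()


if __name__ == "__main__":
    # Hypothetical MIR entries (IDs and text are illustrative).
    docs = {
        "project-17": "Screening a drug compound against tyrosine kinase receptors",
        "note-3": "Kinase assay buffer recipe",
        "abstract-9": "Tyrosine phosphorylation in drug-resistant cell lines",
    }
    index = build_index(docs)
    print(search(index, ["tyrosine", "kinase", "drug"]))  # {'project-17'}
```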

8 Distributed Query Processing

This section describes the requirements for distributed query processing (DQP) in mygrid.

8.1 Types of repositories in mygrid

Data repositories in the mygrid user environment can be grouped into three types: the MIR, public databases such as EMBL and Swiss-Prot, and external user repositories. The latter are proprietary databases of biological data that users wish to integrate with mygrid. The analyses that users want to perform may involve federating data which is distributed amongst these three types of repository, and combining data from them will require DQP.

8.2 Data Formats

The latest work on DQP in the Information Repository Management work package has focused on federating structured data from relational and object-oriented databases [5]. Since data in the MIR will also be stored as flat files and XML, DQP will be required to integrate data in these formats whilst masking the differences, idiosyncrasies and implementation details of the underlying data sources from the user.

8.3 Distributed Query Processing and Workflows

mygrid users will want to incorporate DQP into their workflows. One example of the use of DQP is to compare data, such as a gene sequence, that is referenced by the same identifier in local and remote repositories, and then send the latest version to a mygrid service for analysis. This step should be incorporated into workflows whenever the user requires the latest versions of data from repositories for analysis.

8.4 Wrapping of public databases

The DQP service being provided by WP3 will require the databases containing the data being federated to be wrapped as OGSA-DAI Grid services. The use cases provided by the mygrid User Group indicate that EMBL, Swiss-Prot and PDB/MSD would be popular candidates for OGSA-DAI service wrapping amongst biologists and bioinformaticians.
Using these databases, scientists can obtain information on genes, proteins and protein structures; this information can be used, for example, to study the relationship between protein sequence and protein structure, which is an area of research for Bioinf2 [1]. Furthermore, the biologist referred to as Bio3 in the mygrid use cases studies single nucleotide polymorphisms (SNPs) and might want to integrate SNP data stored in the MIR with data from EMBL, Swiss-Prot and PDB/MSD to determine whether the SNPs in genes are silent or produce a change in the corresponding protein sequence and protein structure [1].

8.5 Views

See the Personalisation section of this document.
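The Section 8.3 example, comparing copies of the same item held under one identifier in local and remote repositories and keeping the latest, amounts to a merge step over the results of several source queries. The sketch below assumes those queries have already been run (in mygrid, through OGSA-DAI-wrapped services) and their results are in memory; a real DQP engine would also plan and distribute the queries. The `Record` fields and sample values are hypothetical, and versions are assumed to be comparable integers, as in the LSID version field.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Record:
    repository: str  # where the copy was found, e.g. "MIR" or "Swiss-Prot"
    identifier: str  # identifier shared across repositories
    version: int     # higher means newer
    payload: str     # the data itself, e.g. a sequence


def latest_versions(*sources: Iterable[Record]) -> Dict[str, Record]:
    """Merge results from several repositories, keeping the newest copy per identifier.

    Sources are consulted in order, so on a version tie the earlier source wins.
    """
    latest: Dict[str, Record] = {}
    for source in sources:
        for record in source:
            current = latest.get(record.identifier)
            if current is None or record.version > current.version:
                latest[record.identifier] = record
    return latest


if __name__ == "__main__":
    # Hypothetical results of querying a local MIR and a remote public database.
    local = [Record("MIR", "P10166", 2, "seq-v2"), Record("MIR", "LOCAL-7", 1, "seq-a")]
    remote = [Record("Swiss-Prot", "P10166", 3, "seq-v3")]
    for identifier, record in sorted(latest_versions(local, remote).items()):
        print(identifier, record.repository, record.version)
    # LOCAL-7 MIR 1
    # P10166 Swiss-Prot 3
```

Listing the local MIR first means a local copy is preferred when versions are equal, which avoids a needless remote transfer; the selected records are what would then be sent to the analysis service.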

9 Data Processing and Transformation

This section describes the requirements for transforming data between different formats. It also describes the requirements for processing data held in the information repository into management information, and the caveats of producing this type of information.

9.1 Data Transformation

The transformation of data between different formats is a task frequently undertaken by bioinformaticians and biologists. Much of the in silico analysis undertaken by the scientists interviewed by the mygrid User Group involved workflows composed of multiple computational steps: database queries and applications of analytical tools or algorithms. However, users were continually faced with the problem of transforming, exporting or saving their data in a different format that could be imported into another application, which is a tedious and time-consuming process. Because of these interoperability problems, the MIR is required to seamlessly marshal the flow of data from one application or repository to the next. This involves manipulating or transforming the saved intermediary data into the form required by the following application. The use of XML to store data has solved some of these problems, but data present in XML elements may also need to be manipulated. For example, the data analysis performed by the biologist referred to as Bio3 in the use cases involved identifying signal peptides in protein sequences using a bioinformatics application called SignalP.
Since SignalP requires only the first 60 amino acids of each protein sequence, Bio3 had to write a Perl script to extract the first 60 amino acids from each of the protein sequences to be analysed.

User-Defined Functions

Due to the diversity of flat file formats in the life sciences, the ability to create and execute user-defined functions (UDFs) will be required for data transformation in the DBMS used to implement the MIR. A UDF is a procedure, written in a host programming language and registered with the DBMS, that can be incorporated into an SQL statement. Open source libraries such as BioJava, which contain classes that manipulate bioinformatics data, could be employed in the creation of UDFs. For example, classes are available in BioJava which can be used to transcribe DNA to RNA, perform in silico enzymatic digests of proteins and convert between flat file formats. Having the MIR mediate the flow of data between services in workflows will provide gains in computational performance which would not be realised if data transformation were performed by the user portal or by a third-party transformation service.

9.2 Management Information Service

Management information systems are the computer systems in an organisation that provide information about its business operations, such as sales, inventories and other data that would help in managing the running of the organisation. Value can be added to the MIR by processing data from all personal repositories into management information which can then be used to aid managers in their decision making. For example, a manager might need to justify the decision to purchase a licence for a commercial database of genomic information after its 90-day trial period. Management information on how often the genome database was queried during its trial period would be useful in deciding whether buying a licence is worthwhile.

Generating management information requires the data stored in the MIR to be processed into a form which managers can use as an aid to decision-making in managing their organisations. The generation of management information is a service which should be provided by the MIR. This management information service (MIS) can be delivered using data warehousing techniques, triggers, complex SQL queries or a combination of the three. The collection of management information requires responsible information handling, since the management information will include users' personal data. If the MIS is used, its existence and the uses made of the management information should be made publicly known. Data access security needs to be configurable to prevent the contravention of data privacy laws in the country in which the MIR is deployed. Appropriate security measures, such as views, should be used to restrict access to sensitive data.
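The trial-licence example and the privacy caveat can be combined in one small sketch: a raw query log that contains personal data, and an aggregating view that exposes only per-database counts to managers. SQLite is used for brevity, but it has no GRANT/REVOKE, so here the view only demonstrates what managers would see; in a multi-user DBMS they would be granted SELECT on the view and not on the base table. The table, view and database names (`query_log`, `database_usage`, `GenomeDB-X`) and the trial dates are hypothetical.

```python
import sqlite3

SCHEMA = """
-- Raw usage log: contains personal data (who ran each query).
CREATE TABLE query_log (
    user_id    TEXT NOT NULL,
    db_name    TEXT NOT NULL,
    queried_on TEXT NOT NULL  -- ISO-8601 date, so string order is date order
);
-- Management view: aggregated counts only, no user identities.
CREATE VIEW database_usage AS
    SELECT db_name, queried_on, COUNT(*) AS n_queries
    FROM query_log
    GROUP BY db_name, queried_on;
"""


def queries_during(conn: sqlite3.Connection, db_name: str, start: str, end: str) -> int:
    """Total queries against db_name between start and end inclusive, read from the view."""
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(n_queries), 0) FROM database_usage "
        "WHERE db_name = ? AND queried_on BETWEEN ? AND ?",
        (db_name, start, end),
    ).fetchone()
    return total


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    with conn:
        conn.executemany(
            "INSERT INTO query_log VALUES (?, ?, ?)",
            [
                ("u1", "GenomeDB-X", "2002-11-01"),
                ("u2", "GenomeDB-X", "2002-11-01"),
                ("u1", "GenomeDB-X", "2002-12-15"),
                ("u1", "EMBL", "2002-11-02"),
                ("u3", "GenomeDB-X", "2003-03-01"),  # after the hypothetical trial
            ],
        )
    print(queries_during(conn, "GenomeDB-X", "2002-11-01", "2003-01-29"))  # 3
```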

10 User Management

This section describes the requirements for user management in the MIR. It also describes the requirements for project working amongst users.

10.1 User accounts

A stable user management system is mandatory for maintaining the security of the MIR, since users must be properly managed in order to protect the data held in it. New users of the personal repository will require accounts to be created for them on their organisation's MIR, with the default profile and the access privileges they require to accomplish their duties. Users will also need to have their accounts dropped from the MIR if they no longer require access to the personal repository. The ability to alter a user's profile after the account has been created will be required in case the user's role changes.

Project working

Much of the work in the life sciences is undertaken in an environment which requires the sharing of information on a regular basis. A data sharing facility is required which allows people to access project-specific data collections, create and share annotations about data, and view the results of in silico analyses. Managers will need to be able to create projects and provide them with metadata, e.g. a project name, for identification. The project administrator will need to be able to add members to the project and allocate them the appropriate access permissions and privileges.

Notification

The ability of the MIR to notify users when a piece of data has been inserted, modified (e.g. annotated) or deleted is required for project working. An example of the need for notification during collaborative working is when a user is waiting to analyse data which is being generated by a colleague in the project.

Distributed Annotation System

The creation and sharing of annotations on sequence data could be facilitated by supporting the Distributed Annotation System (DAS) in the MIR [11].
The DAS standard allows scientists to view and compare annotations which are distributed across the web on other DAS servers. In addition to storing DAS annotations, the MIR will need to act as a Grid-enabled DAS annotation server to allow annotations to be shared amongst mygrid users within project teams, research groups and organisations. A Grid-enabled DAS reference server will also be required to provide the genome maps and sequences on which the DAS annotations in the MIR are based.
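The notification requirement in this section could be met with database triggers. The sketch below, again using Python's sqlite3 module purely for illustration, attaches AFTER INSERT, UPDATE and DELETE triggers to a hypothetical `annotation` table so that every change is queued in a `notification` table for each member of the project that owns the data. The table names, columns and the project membership model are assumptions, not part of the MIR design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical schema: projects own annotations; users are members of projects.
    CREATE TABLE project_member (project TEXT, username TEXT);
    CREATE TABLE annotation (id INTEGER PRIMARY KEY, project TEXT, body TEXT);
    CREATE TABLE notification (username TEXT, event TEXT, annotation_id INTEGER);

    -- One trigger per kind of change; each fans the event out to project members.
    CREATE TRIGGER notify_insert AFTER INSERT ON annotation
    BEGIN
        INSERT INTO notification
        SELECT username, 'inserted', NEW.id FROM project_member WHERE project = NEW.project;
    END;

    CREATE TRIGGER notify_update AFTER UPDATE ON annotation
    BEGIN
        INSERT INTO notification
        SELECT username, 'modified', NEW.id FROM project_member WHERE project = NEW.project;
    END;

    CREATE TRIGGER notify_delete AFTER DELETE ON annotation
    BEGIN
        INSERT INTO notification
        SELECT username, 'deleted', OLD.id FROM project_member WHERE project = OLD.project;
    END;
    """
)

conn.executemany(
    "INSERT INTO project_member VALUES (?, ?)",
    [("p1", "alice"), ("p1", "bob"), ("p2", "carol")],
)
conn.execute("INSERT INTO annotation VALUES (1, 'p1', 'putative kinase')")
conn.execute("UPDATE annotation SET body = 'confirmed kinase' WHERE id = 1")
conn.execute("DELETE FROM annotation WHERE id = 1")

def pending(conn, username):
    """Return the queued (event, annotation_id) pairs for one user, oldest first."""
    return conn.execute(
        "SELECT event, annotation_id FROM notification WHERE username = ? ORDER BY rowid",
        (username,),
    ).fetchall()

print(pending(conn, "alice"))  # [('inserted', 1), ('modified', 1), ('deleted', 1)]
print(pending(conn, "carol"))  # [] because carol is not a member of project p1
```

A notification service would then poll or drain the queue per user; how mygrid actually delivers the messages is outside the scope of this sketch.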

11 Security

This section describes the requirements for controlling access to data stored in the MIR.

Security Issues

Protecting data from unauthorised use is of the utmost importance to mygrid users of the information repository. The use of mygrid will be greatly affected by users' confidence in how secure their data is in the MIR and in how it will be used. The data in the MIR should be kept secure using the security measures provided by the DBMS employed to implement the MIR. Furthermore, the security measures used in the MIR will also have to be integrated with mygrid-wide security processes.

Security in the MIR is also important for a number of other reasons. Organisations will need to control who has access to their MIR and what kinds of data those users can retrieve for viewing, so that data is not disclosed to unauthorised users. For example, a manager will need to be able to view the data of the mygrid users who are members of his group and to decide what to publish and to whom. Threats involving the malicious or accidental modification and deletion of MIR data also need to be prevented, such as the deletion of project data by a hacker or the unauthorised amendment of sequence data by a student. There are four key areas of data security which the MIR must address: authentication, authorisation, privileges and integrity.

Authentication

Authentication is the process by which the DBMS verifies a user's identity. This is the first layer of security required from the MIR. The MIR will be required to integrate its user authentication procedure with the security facility for mygrid services provided by the Architecture work package, whose security model is to be based on an X.509 certificate authentication mechanism.

Authorisation

Authorisation is the next layer of security required from the MIR. This is the process by which the DBMS determines which data operations a user can perform and which database objects they can access.
The level and type of data access will vary amongst users of the MIR, from a single element in an XML file to all of the data contained in a repository. Authorisation mechanisms must be present in the MIR which can provide this range of granularity in data access.

Role-Based Access and Privileges

The fine-grained control over data access required by the MIR can be achieved through the use of privileges. Four types of data access privilege can be granted to MIR users: read, write, update and delete. Granting a large number of individuals different combinations of these privileges at various levels of data granularity would make security difficult to manage in the MIR. Instead, roles could be created for MIR users, each configured with only those data access privileges that enable the user to perform their job duties. These roles should be based on the types of users identified by the mygrid User Group: biologist, bioinformatician, system administrator, application tool builder and manager. Database privileges need to be defined for each of these roles so that the correct authority levels are set for data access and manipulation, and for system administration. A role will need to be associated with each user when their account on the MIR database is created.

The work undertaken by a user may cross the boundaries of two or more types of mygrid end user. For example, a biologist may also perform activities associated with a bioinformatician. In these cases, users will need to be registered with two or more roles in order to accomplish their work.

Views

The data stored in the MIR is sensitive, since workflows and provenance contain information about the activities of its users on mygrid. Various mygrid services, such as the MIS, will require access to this data for retrieval and processing. However, this personal information may need to be made anonymous before it is released from the MIR, so that users' activities cannot be traced back to a specific individual. This is necessary to comply with any data protection and confidentiality legislation in the country where the MIR is deployed. If personal information needs to be made anonymous, the specific pieces of data which identify users can be hidden through the use of views in the DBMS. Because views are defined by predefined queries, they can serve as a security measure that prevents restricted data, for example usernames, from being accessed by a service or a user.

Auditing mygrid Information Repository Activities

Authentication and authorisation procedures can be employed to control access to data by known users, but they are not sufficient for preventing unknown or unauthorised access to the MIR. Monitoring the operations performed on the MIR can improve the regulation of data access and ultimately help prevent unauthorised access. This will require an audit facility which records the data operations performed on the MIR.
These operations can then be examined to determine whether they were authorised and, if not, who was responsible for performing them.
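To make the role-based model and the audit facility described in this section concrete, the following Python sketch gives each of the five mygrid user roles a set of the four privileges (read, write, update, delete), lets a user hold several roles (their effective privileges are the union), and records every attempted operation in an audit log that can later be searched for denied attempts. Which privileges each role receives is purely illustrative, since the document leaves that assignment to be defined; the `AccessController` class and the object identifiers are hypothetical, and a real MIR would rely on its DBMS's own role, grant and auditing mechanisms rather than application code.

```python
from dataclasses import dataclass, field

# The four data access privileges named in the requirements.
READ, WRITE, UPDATE, DELETE = "read", "write", "update", "delete"

# Hypothetical role definitions: the actual privileges per role are still to be decided.
ROLE_PRIVILEGES = {
    "biologist": {READ, WRITE},
    "bioinformatician": {READ, WRITE, UPDATE},
    "system administrator": {READ, WRITE, UPDATE, DELETE},
    "application tool builder": {READ},
    "manager": {READ},
}

@dataclass
class AccessController:
    user_roles: dict = field(default_factory=dict)  # username -> set of role names
    audit_log: list = field(default_factory=list)   # (username, privilege, object, allowed)

    def create_account(self, username, *roles):
        unknown = set(roles) - set(ROLE_PRIVILEGES)
        if unknown:
            raise ValueError(f"unknown roles: {sorted(unknown)}")
        self.user_roles[username] = set(roles)

    def privileges(self, username):
        # A user registered with several roles gets the union of their privileges.
        roles = self.user_roles.get(username, ())
        return set().union(*(ROLE_PRIVILEGES[r] for r in roles))

    def check(self, username, privilege, obj):
        allowed = privilege in self.privileges(username)
        self.audit_log.append((username, privilege, obj, allowed))  # record every attempt
        return allowed

    def unauthorised_attempts(self):
        """Who attempted which operations they were not authorised to perform."""
        return [(u, p, o) for u, p, o, ok in self.audit_log if not ok]

ac = AccessController()
ac.create_account("alice", "biologist", "bioinformatician")
ac.create_account("dave", "application tool builder")
print(ac.check("alice", UPDATE, "sequence:example-1"))  # True, via the bioinformatician role
print(ac.check("dave", DELETE, "project:example-2"))    # False, and logged
print(ac.unauthorised_attempts())  # [('dave', 'delete', 'project:example-2')]
```

Granting privileges to roles rather than to individuals keeps the number of grants proportional to the number of roles, which is the manageability argument made above.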

12 Capacity and Performance

This section indicates the volume of data that the MIR will have to accommodate and handle. It also describes the measures the MIR should adopt to improve its scalability and performance.

Data Volume

The growth of data in the life sciences has been fuelled by high-throughput technologies and the proliferation of computational tools for data analysis and processing. The volume of data in the life sciences is currently estimated to be in the petabyte range [7]. Public databases store data in the range of hundreds of gigabytes; for example, EMBL currently stores approximately 150 gigabytes of genomic data [10].

It is difficult to predict in advance how much data will be stored in the MIR, since the figure will depend on how mygrid is used, which will differ between users and organisations. However, some performance requirements will apply to every MIR regardless of the number of users or the type of organisation it serves. Low response times for complex queries will be required by applications that retrieve subsets of data for further processing, such as the management information service provided by WP3. In addition, the MIR will have to support high access throughput to cope with large numbers of clients accessing data simultaneously. To this end, the MIR must be designed and deployed as a mission-critical database that can scale to the level of performance required by its users. In this respect the MIR is more akin to the databases found at the heart of financial systems than to a typical scientific database.

Indexing

The large volumes of data, provenance and workflows that will be stored in the MIR will have a detrimental effect on the speed of data query and retrieval.
The efficiency of data query and retrieval should be improved by creating indices in the MIR on the data that are frequently queried by users and mygrid services.

OGSA-DAI

The analyses performed by mygrid users may involve intensive computation over large datasets. This will require the MIR, external user databases and public databases to be wrapped by OGSA-DAI Grid services to provide efficient transfer of data between the user and the data repositories.
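Whether an index actually serves a frequent query can be checked with the DBMS's query planner. In this sketch, which uses SQLite through Python's sqlite3 module only as a stand-in for the MIR's DBMS, a hypothetical `provenance` table is given a composite index on the columns a "show my history" query filters and sorts on, and `EXPLAIN QUERY PLAN` is compared before and after the index is created. The table, column and index names are assumptions, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE provenance (id INTEGER PRIMARY KEY, username TEXT, service TEXT, run_at TEXT)"
)
conn.executemany(
    "INSERT INTO provenance (username, service, run_at) VALUES (?, ?, ?)",
    [(f"user{i % 50}", "blastp", f"2002-06-{i % 28 + 1:02d}") for i in range(1000)],
)

# A query a user-facing service might issue frequently.
QUERY = "SELECT service, run_at FROM provenance WHERE username = ? ORDER BY run_at"

def plan(conn, sql):
    """Join the human-readable 'detail' column (last field) of each plan row."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, ("user7",))
    return " | ".join(row[-1] for row in rows)

before = plan(conn, QUERY)
# Composite index: equality lookup on username, rows already ordered by run_at.
conn.execute("CREATE INDEX idx_prov_user_time ON provenance (username, run_at)")
after = plan(conn, QUERY)

print("before:", before)  # expected: a full table SCAN of provenance
print("after: ", after)   # expected: a SEARCH using idx_prov_user_time
```

Putting `run_at` second in the index lets the same index satisfy both the filter and the ORDER BY; indices do add write overhead, so they are best reserved for the frequently queried data the requirement mentions.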

13 Fault Tolerance

This section describes the requirements for fault tolerance mechanisms in the MIR. It also describes the procedures required to recover the MIR in the event of a database crash.

Counter measures

Procedures must be available to counter situations which might lead to inconsistencies in the data stored by the MIR. For example, the failure of a long-lived workflow that interacts with the MIR could leave its data violating referential integrity.

Transactions

Transactions are required in the MIR so that the data operations performed by a user or mygrid service can be grouped into units of work. To minimise the amount of recomputation needed should a workflow fail, the transaction control commands Commit, Rollback and Savepoint must be available in the MIR and used appropriately according to whether a transaction executes successfully or not.

Concurrency Control

Collaborative working carries with it the potential for inconsistencies in data that are shared and operated on by several users. These inconsistencies arise when the same piece of data is manipulated simultaneously by operations in multiple transactions. Concurrency control protocols are required to schedule transactions so that they do not interfere with one another. These protocols may involve locking methods, which deny a transaction access to a data item that is already being accessed by another transaction in the MIR, or timestamping methods, which grant a transaction read/write access to a data item if the last update to that item was performed by an older transaction.

Database Backup and Recovery

The MIR will need to be recovered in the event of a failure such as a power failure or a system crash.
The following facilities will be required for recovering the MIR:
- A backup mechanism to make off-line backup copies of the MIR at regular intervals, which can then be used to restore a crashed MIR.
- A logging facility to record the current state of transactions and their associated data operations. A log file is also required for security, in auditing database operations, and for supporting user sessions.
- A checkpoint facility to record when the MIR and its log file have been synchronised.
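As an illustration of how the Savepoint command named in the transactions requirement limits recomputation, the sketch below runs a hypothetical three-step workflow inside one transaction and wraps each step in its own savepoint. A failing step is rolled back on its own, while the results of the other steps are kept and committed. It uses Python's sqlite3 module with `isolation_level=None` so that the transaction control statements are issued explicitly; the step functions, the `result` table and the simulated failure are assumptions for illustration only.

```python
import sqlite3

# isolation_level=None: the sqlite3 module issues no implicit BEGIN, so the
# transaction control statements below are sent to SQLite exactly as written.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE result (step TEXT PRIMARY KEY, value TEXT)")

def run_step(conn, step):
    """Run one workflow step inside its own savepoint; undo only that step on failure."""
    name = step.__name__  # a trusted identifier, never user input
    conn.execute(f"SAVEPOINT {name}")
    try:
        step(conn)
    except Exception:
        conn.execute(f"ROLLBACK TO SAVEPOINT {name}")  # discard this step's writes
        conn.execute(f"RELEASE SAVEPOINT {name}")      # then drop the savepoint
        return False
    conn.execute(f"RELEASE SAVEPOINT {name}")          # keep writes in the outer transaction
    return True

# Hypothetical workflow steps.
def fetch_sequence(conn):
    conn.execute("INSERT INTO result VALUES ('sequence', 'MKTAYIAKQR')")

def run_blast(conn):
    conn.execute("INSERT INTO result VALUES ('blast_hit', 'P12345')")  # partial output
    raise RuntimeError("remote BLAST service unavailable")            # simulated failure

def annotate(conn):
    conn.execute("INSERT INTO result VALUES ('annotation', 'putative kinase')")

conn.execute("BEGIN")
outcome = [run_step(conn, s) for s in (fetch_sequence, run_blast, annotate)]
conn.execute("COMMIT")

print(outcome)  # [True, False, True]
print(conn.execute("SELECT step FROM result ORDER BY step").fetchall())
# [('annotation',), ('sequence',)]: the failed step's partial output was undone
```

After the failure only `run_blast` would need to be re-run; a plain Rollback of the whole transaction would instead have discarded the work of every step.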

14 Personalisation

This section describes how the Information Repository Management work package can contribute to the personalisation of mygrid for its users. Personalisation may be defined as the process through which the working environment is altered to suit the needs and individual preferences of users. The MIR, together with DQP, can address personalisation by storing personal annotations of data, provenance and workflows; by integrating legacy data with the MIR; and by providing views of the user's own data in the MIR and in public databases.

Personal Annotation of Data

A laboratory scientist's paper log book acts as a persistent store for their data and provenance, a function similar to that of the MIR. However, the major difference between these two data stores is that the mygrid 0.1 version of the MIR does not record the textual commentary that is also found in log books. This commentary is important since it adds further context, from the user's perspective, to the results obtained from experiments. For example, it is common for scientists to comment on the quality of the data they have generated based on the provenance captured while an experiment was running. The MIR should provide a facility that allows users to furnish data, provenance and workflows with personal annotations.

Integration of Legacy Data

Most laboratories will have accumulated legacy data during their existence, and this data may also form the basis of the research performed by their current scientists. In addition, a mygrid user may not wish to store their data in the MIR, or it might not be practical to do so. In these scenarios, a facility is required to integrate the MIR with repositories containing the laboratory's legacy data and external user data, so that this data can be analysed by mygrid services. This could be achieved using federation middleware such as IBM's DiscoveryLink software or the DQP service currently being developed by work package 3.
If the latter is adopted, the DBMS used to store legacy data will need to be wrapped with OGSA-DAI services.

Views

In addition to their security role, views can be used to let users access data in a way that is customised to their needs. Most of the biologists and bioinformaticians interviewed worked on multiple projects, running several wet lab and in silico experiments at any one time. The volume of data, provenance and workflows generated by these experiments will soon become difficult to manage without a facility to sort them into groups. However, the structure and naming of these groups will differ between users, since they depend on the nature of each user's analyses and their personal preferences. Views should be used to present the data in the MIR, with its associated provenance and workflows, as a hierarchical file directory, using metadata attached to each data item that specifies its position in the hierarchy. Currently, only domain entities such as gene and protein sequences can be sorted into groups in the 0.1 version of the MIR.

Once data has been sorted into user-defined groups, views can then be used as a form of data aggregation to provide summaries of data in the MIR. These summaries are required by biologists who need to record their wet lab and in silico experiments in paper log books. For example, the mygrid use cases describe a biologist referred to as Bio1 who printed out a hard copy of the results of her bioinformatics analyses for storage. The MIR should allow users to collate data from in silico experiments and present it in a user-friendly format for printing and pasting into paper log books.

Users will only want to view certain types of the metadata associated with domain entities, provenance records and workflows that are found in their flat files or XML files. For example, when examining a protein, a user may only want to see its amino acid sequence, information about its functional domains and its associated journal reference. The MIR should provide a facility which allows users to specify which types of metadata they wish to view along with a domain entity, provenance record or workflow.

Furthermore, the stock views of data provided by public repositories may not suit the user, so the display of entries from public databases should be adaptable to users' requirements. For example, a user may not be interested in all the fields of an EMBL entry and should be able to choose the fields they are interested in reading, e.g. gene name, description and reference. Views should be employed to display only the data users are interested in from entries in public repositories. Creating these views will involve using DQP to select the required data from OGSA-DAI-wrapped public repositories.

Notification

All of the scientists interviewed by the mygrid user group indicated that they wished to be notified of changes in the data content of public databases. The changes users wanted to be notified about depended upon their area of research, and the granularity of those changes also varied from user to user.
For example, Bio1 wanted to be notified of any changes in sequence data relating to human chromosomes 9 and 10, whilst Bioinf4 was only interested in changes to genes expressed in the excretory cells of C. elegans.

Profiles

Personalisation depends on gathering information about users' activities and storing it persistently, so that mygrid remains personalised when users return to it. Whilst this information can be requested directly from the user, it can also be gathered actively from the data held in the MIR, using complex queries similar to those in the WP3 management information service, and used to create a user profile. This profile can then be employed to tailor mygrid services and components to the work performed by the life scientist. For example, knowledge of a user's field of research can be used to configure the notification service so that the user is notified about entries in public databases that match keywords taken from their user profile. Similarly, information about the services a user frequently uses can be employed to arrange services or tools on the portal GUI to suit that user.
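The profile-driven notification described in this section could work roughly as follows. This Python sketch derives profile keywords from the terms a user has queried most often (a simple frequency count standing in for the complex queries mentioned above) and then selects, from a batch of public database change records, those whose description mentions a profile keyword. The `EntryChange` record, the stop-word list, the query history and the change records (including the `HYPO` accessions) are all hypothetical and are not real EMBL or Swiss-Prot data.

```python
import re
from collections import Counter
from dataclasses import dataclass

STOP_WORDS = {"and", "in", "of", "the", "for", "to"}  # illustrative only

@dataclass(frozen=True)
class EntryChange:
    database: str
    accession: str
    description: str

def tokens(text):
    """Lower-case alphanumeric terms in a piece of text."""
    return re.findall(r"[a-z0-9]+", text.lower())

def profile_keywords(query_history, top_n=3):
    """Return the top_n most frequent non-stop-word terms in a user's past queries."""
    counts = Counter(w for q in query_history for w in tokens(q) if w not in STOP_WORDS)
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:top_n]]

def changes_to_notify(changes, keywords):
    """Keep the changes whose description contains at least one profile keyword."""
    wanted = set(keywords)
    return [c for c in changes if wanted & set(tokens(c.description))]

history = [
    "excretory cell genes C. elegans",
    "excretory canal expression elegans",
    "elegans excretory cell markers",
]
changes = [
    EntryChange("EMBL", "HYPO0001", "C. elegans excretory cell gene, updated annotation"),
    EntryChange("EMBL", "HYPO0002", "Human chromosome 9 sequence revision"),
]

keywords = profile_keywords(history)
print(keywords)  # ['elegans', 'excretory', 'cell']
print([c.accession for c in changes_to_notify(changes, keywords)])  # ['HYPO0001']
```

Keyword matching is deliberately crude here; a production service might instead match on ontology terms or structured fields such as organism and chromosome, which would better capture requests like Bio1's.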


BPS Suite and the OCEG Capability Model. Mapping the OCEG Capability Model to the BPS Suite s product capability. BPS Suite and the OCEG Capability Model Mapping the OCEG Capability Model to the BPS Suite s product capability. BPS Contents Introduction... 2 GRC activities... 2 BPS and the Capability Model for GRC...

More information

MASTERCARD PRICELESS SPECIALS INDIA PRIVACY POLICY

MASTERCARD PRICELESS SPECIALS INDIA PRIVACY POLICY Effective Date: 12 September 2017 MASTERCARD PRICELESS SPECIALS INDIA PRIVACY POLICY Mastercard respects your privacy. This Privacy Policy describes how we process personal data, the types of personal

More information

testo Comfort Software CFR 4 Instruction manual

testo Comfort Software CFR 4 Instruction manual testo Comfort Software CFR 4 Instruction manual 2 1 Contents 1 Contents 1 Contents... 3 2 Specifications... 4 2.1. Intended purpose... 4 2.2. 21 CFR Part 11 and terminology used... 5 3 First steps... 9

More information

Sherpa Archive Attender. Product Information Guide Version 3.5

Sherpa Archive Attender. Product Information Guide Version 3.5 Sherpa Archive Attender Product Information Guide Version 3.5 Last updated May 28, 2010 Table of Contents Introduction 3 Benefits 4 Reduce Installation and Deployment Time 4 Recover Space on the Exchange

More information

Business Insight Authoring

Business Insight Authoring Business Insight Authoring Getting Started Guide ImageNow Version: 6.7.x Written by: Product Documentation, R&D Date: August 2016 2014 Perceptive Software. All rights reserved CaptureNow, ImageNow, Interact,

More information

Guidance on completing the HASS record form (EPR-RSR10) Making and Amending Records

Guidance on completing the HASS record form (EPR-RSR10) Making and Amending Records Guidance on completing the HASS record form (EPR-RSR10) Making and Amending Records 1a Date record made 1b Replaces record made on 1c Amends information about Most of the information you record about each

More information

Answer: D. Answer: B. Answer: B

Answer: D. Answer: B. Answer: B 1. Management information systems (MIS) A. create and share documents that support day-today office activities C. capture and reproduce the knowledge of an expert problem solver B. process business transactions

More information

DIGITAL STEWARDSHIP SUPPLEMENTARY INFORMATION FORM

DIGITAL STEWARDSHIP SUPPLEMENTARY INFORMATION FORM OMB No. 3137 0071, Exp. Date: 09/30/2015 DIGITAL STEWARDSHIP SUPPLEMENTARY INFORMATION FORM Introduction: IMLS is committed to expanding public access to IMLS-funded research, data and other digital products:

More information

Advent IM Ltd ISO/IEC 27001:2013 vs

Advent IM Ltd ISO/IEC 27001:2013 vs Advent IM Ltd ISO/IEC 27001:2013 vs 2005 www.advent-im.co.uk 0121 559 6699 bestpractice@advent-im.co.uk Key Findings ISO/IEC 27001:2013 vs. 2005 Controls 1) PDCA as a main driver is now gone with greater

More information

The Clinical Data Repository Provides CPR's Foundation

The Clinical Data Repository Provides CPR's Foundation Tutorials, T.Handler,M.D.,W.Rishel Research Note 6 November 2003 The Clinical Data Repository Provides CPR's Foundation The core of any computer-based patient record system is a permanent data store. The

More information

Grid Security Policy

Grid Security Policy CERN-EDMS-428008 Version 5.7a Page 1 of 9 Joint Security Policy Group Grid Security Policy Date: 10 October 2007 Version: 5.7a Identifier: https://edms.cern.ch/document/428008 Status: Released Author:

More information

B.H.GARDI COLLEGE OF MASTER OF COMPUTER APPLICATION. Ch. 1 :- Introduction Database Management System - 1

B.H.GARDI COLLEGE OF MASTER OF COMPUTER APPLICATION. Ch. 1 :- Introduction Database Management System - 1 Basic Concepts :- 1. What is Data? Data is a collection of facts from which conclusion may be drawn. In computer science, data is anything in a form suitable for use with a computer. Data is often distinguished

More information

20331B: Core Solutions of Microsoft SharePoint Server 2013

20331B: Core Solutions of Microsoft SharePoint Server 2013 20331B: Core Solutions of Microsoft SharePoint Server 2013 Course Details Course Code: Duration: Notes: 20331B 5 days This course syllabus should be used to determine whether the course is appropriate

More information

Interact2 Help and Support

Interact2 Help and Support Sharing content stored in Interact2 Content Collection Contents Terminology... 2 What is the Content Collection?... 2 Student view of the Content Collection in course and subject sites... 4 Summary of

More information

Bioinformatics Data Distribution and Integration via Web Services and XML

Bioinformatics Data Distribution and Integration via Web Services and XML Letter Bioinformatics Data Distribution and Integration via Web Services and XML Xiao Li and Yizheng Zhang* College of Life Science, Sichuan University/Sichuan Key Laboratory of Molecular Biology and Biotechnology,

More information

PORTAL RESOURCES INFORMATION SYSTEM: THE DESIGN AND DEVELOPMENT OF AN ONLINE DATABASE FOR TRACKING WEB RESOURCES.

PORTAL RESOURCES INFORMATION SYSTEM: THE DESIGN AND DEVELOPMENT OF AN ONLINE DATABASE FOR TRACKING WEB RESOURCES. PORTAL RESOURCES INFORMATION SYSTEM: THE DESIGN AND DEVELOPMENT OF AN ONLINE DATABASE FOR TRACKING WEB RESOURCES by Richard Spinks A Master s paper submitted to the faculty of the School of Information

More information

Unit title: IT in Business: Advanced Databases (SCQF level 8)

Unit title: IT in Business: Advanced Databases (SCQF level 8) Higher National Unit Specification General information Unit code: F848 35 Superclass: CD Publication date: January 2017 Source: Scottish Qualifications Authority Version: 02 Unit purpose This unit is designed

More information

Spree Privacy Policy

Spree Privacy Policy Spree Privacy Policy Effective as at 21 November 2018 Introduction Spree respects your privacy and it is important to us that you have an enjoyable experience buying and selling with us but also that you

More information

Guidance for completing High Activity Sealed Source (HASS) report Form EPR-RSR16

Guidance for completing High Activity Sealed Source (HASS) report Form EPR-RSR16 Guidance for completing High Activity Sealed Source (HASS) report Form EPR-RSR16 Please read these guidance notes carefully before you fill in the form. This guidance will help you complete the HASS Reporting

More information

The DPM metamodel detail

The DPM metamodel detail The DPM metamodel detail The EBA process for developing the DPM is supported by interacting tools that are used by policy experts to manage the database data dictionary. The DPM database is designed as

More information

Credentials Policy. Document Summary

Credentials Policy. Document Summary Credentials Policy Document Summary Document ID Credentials Policy Status Approved Information Classification Public Document Version 1.0 May 2017 1. Purpose and Scope The Royal Holloway Credentials Policy

More information

Data Management Glossary

Data Management Glossary Data Management Glossary A Access path: The route through a system by which data is found, accessed and retrieved Agile methodology: An approach to software development which takes incremental, iterative

More information

Computer Security Policy

Computer Security Policy Administration and Policy: Computer usage policy B 0.2/3 All systems Computer and Rules for users of the ECMWF computer systems May 1995 Table of Contents 1. The requirement for computer security... 1

More information

Plus500UK Limited. Website and Platform Privacy Policy

Plus500UK Limited. Website and Platform Privacy Policy Plus500UK Limited Website and Platform Privacy Policy Website and Platform Privacy Policy Your privacy and trust are important to us and this Privacy Statement (Statement) provides important information

More information

CA IT Client Manager / CA Unicenter Desktop and Server Management

CA IT Client Manager / CA Unicenter Desktop and Server Management CA GREEN BOOKS CA IT Client Manager / CA Unicenter Desktop and Server Management Object Level Security Best Practices LEGAL NOTICE This publication is based on current information and resource allocations

More information

2.4. Target Audience This document is intended to be read by technical staff involved in the procurement of externally hosted solutions for Diageo.

2.4. Target Audience This document is intended to be read by technical staff involved in the procurement of externally hosted solutions for Diageo. Diageo Third Party Hosting Standard 1. Purpose This document is for technical staff involved in the provision of externally hosted solutions for Diageo. This document defines the requirements that third

More information

Today Learning outcomes LO2

Today Learning outcomes LO2 2015 2016 Phil Smith Today Learning outcomes LO2 On successful completion of this unit you will: 1. Be able to design and implement relational database systems. 2. Requirements. 3. User Interface. I am

More information

Policy Based Security

Policy Based Security BSTTech Consulting Pty Ltd Policy Based Security The implementation of ABAC Security through trusted business processes (policy) and enforced metadata for people, systems and information. Bruce Talbot

More information

CA IdentityMinder. Glossary

CA IdentityMinder. Glossary CA IdentityMinder Glossary 12.6.3 This Documentation, which includes embedded help systems and electronically distributed materials, (hereinafter referred to as the Documentation ) is for your informational

More information

GDPR Draft: Data Access Control and Password Policy

GDPR Draft: Data Access Control and Password Policy wea.org.uk GDPR Draft: Data Access Control and Password Policy Version Number Date of Issue Department Owner 1.2 21/01/2018 ICT Mark Latham-Hall Version 1.2 last updated 27/04/2018 Page 1 Contents GDPR

More information

BECOMING A DATA-DRIVEN BROADCASTER AND DELIVERING A UNIFIED AND PERSONALISED BROADCAST USER EXPERIENCE

BECOMING A DATA-DRIVEN BROADCASTER AND DELIVERING A UNIFIED AND PERSONALISED BROADCAST USER EXPERIENCE BECOMING A DATA-DRIVEN BROADCASTER AND DELIVERING A UNIFIED AND PERSONALISED BROADCAST USER EXPERIENCE M. Barroco EBU Technology & Innovation, Switzerland ABSTRACT Meeting audience expectations is becoming

More information

Microsoft SQL Server Training Course Catalogue. Learning Solutions

Microsoft SQL Server Training Course Catalogue. Learning Solutions Training Course Catalogue Learning Solutions Querying SQL Server 2000 with Transact-SQL Course No: MS2071 Two days Instructor-led-Classroom 2000 The goal of this course is to provide students with the

More information

User Scripting April 14, 2018

User Scripting April 14, 2018 April 14, 2018 Copyright 2013, 2018, Oracle and/or its affiliates. All rights reserved. This software and related documentation are provided under a license agreement containing restrictions on use and

More information

BW C SILWOOD TECHNOLOGY LTD. Safyr Metadata Discovery Software. Safyr User Guide

BW C SILWOOD TECHNOLOGY LTD. Safyr Metadata Discovery Software. Safyr User Guide BW C SILWOOD TECHNOLOGY LTD Safyr Metadata Discovery Software Safyr User Guide S I L W O O D T E C H N O L O G Y L I M I T E D Safyr User Guide Safyr 7.1 This product is subject to the license agreement

More information

INCOMMON FEDERATION: PARTICIPANT OPERATIONAL PRACTICES

INCOMMON FEDERATION: PARTICIPANT OPERATIONAL PRACTICES INCOMMON FEDERATION: PARTICIPANT OPERATIONAL PRACTICES Participation in the InCommon Federation ( Federation ) enables a federation participating organization ("Participant") to use Shibboleth identity

More information

Security Enterprise Identity Mapping

Security Enterprise Identity Mapping System i Security Enterprise Identity Mapping Version 6 Release 1 System i Security Enterprise Identity Mapping Version 6 Release 1 Note Before using this information and the product it supports, be sure

More information

Controls Electronic messaging Information involved in electronic messaging shall be appropriately protected.

Controls Electronic messaging Information involved in electronic messaging shall be appropriately protected. I Use of computers This document is part of the UCISA Information Security Toolkit providing guidance on the policies and processes needed to implement an organisational information security policy. To

More information

DATABASE DEVELOPMENT (H4)

DATABASE DEVELOPMENT (H4) IMIS HIGHER DIPLOMA QUALIFICATIONS DATABASE DEVELOPMENT (H4) December 2017 10:00hrs 13:00hrs DURATION: 3 HOURS Candidates should answer ALL the questions in Part A and THREE of the five questions in Part

More information

Database Systems: Design, Implementation, and Management Tenth Edition. Chapter 1 Database Systems

Database Systems: Design, Implementation, and Management Tenth Edition. Chapter 1 Database Systems Database Systems: Design, Implementation, and Management Tenth Edition Chapter 1 Database Systems Objectives In this chapter, you will learn: The difference between data and information What a database

More information

Duration: 5 Days Course Code: M20764 Version: B Delivery Method: Elearning (Self-paced)

Duration: 5 Days Course Code: M20764 Version: B Delivery Method: Elearning (Self-paced) Administering a SQL Database Infrastructure Duration: 5 Days Course Code: M20764 Version: B Delivery Method: Elearning (Self-paced) Overview: This five-day instructor-led course provides students who administer

More information

ISO INTERNATIONAL STANDARD. Health informatics Genomic Sequence Variation Markup Language (GSVML)

ISO INTERNATIONAL STANDARD. Health informatics Genomic Sequence Variation Markup Language (GSVML) INTERNATIONAL STANDARD ISO 25720 First edition 2009-08-15 Health informatics Genomic Sequence Variation Markup Language (GSVML) Informatique de santé Langage de balisage de la variation de séquence génomique

More information

Master's Thesis Defense

Master's Thesis Defense Master's Thesis Defense Development of a Data Management Architecture for the Support of Collaborative Computational Biology Lance Feagan 1 Acknowledgements Ed Komp My mentor. Terry Clark My advisor. Victor

More information

Course Outline and Objectives: Database Programming with SQL

Course Outline and Objectives: Database Programming with SQL Introduction to Computer Science and Business Course Outline and Objectives: Database Programming with SQL This is the second portion of the Database Design and Programming with SQL course. In this portion,

More information

Part 11 Compliance SOP

Part 11 Compliance SOP 1.0 Commercial in Confidence 16-Aug-2006 1 of 14 Part 11 Compliance SOP Document No: SOP_0130 Prepared by: David Brown Date: 16-Aug-2006 Version: 1.0 1.0 Commercial in Confidence 16-Aug-2006 2 of 14 Document

More information

Losing Control: Controls, Risks, Governance, and Stewardship of Enterprise Data

Losing Control: Controls, Risks, Governance, and Stewardship of Enterprise Data Losing Control: Controls, Risks, Governance, and Stewardship of Enterprise Data an eprentise white paper tel: 407.591.4950 toll-free: 1.888.943.5363 web: www.eprentise.com Author: Helene Abrams www.eprentise.com

More information

If you require more information that is not included in this document, please contact us and we will be happy to provide you with further detail.

If you require more information that is not included in this document, please contact us and we will be happy to provide you with further detail. Summary This document is an introduction to how Neuxpower has designed and built NXPowerLite for File Servers to be a powerful technology, while respecting customer data and taking a safety-first approach

More information

APF!submission!!draft!Mandatory!data!breach!notification! in!the!ehealth!record!system!guide.!

APF!submission!!draft!Mandatory!data!breach!notification! in!the!ehealth!record!system!guide.! enquiries@privacy.org.au http://www.privacy.org.au/ 28September2012 APFsubmission draftmandatorydatabreachnotification intheehealthrecordsystemguide. The Australian Privacy Foundation (APF) is the country's

More information

SAE MOBILUS USER GUIDE Subscription Login Dashboard Login Subscription Access Administration... 5

SAE MOBILUS USER GUIDE Subscription Login Dashboard Login Subscription Access Administration... 5 May 2018 TABLE OF CONTENTS SAE MOBILUS USER GUIDE 1. Logging into SAE MOBILUS... 2 1.1. Subscription Login... 2 1.2. Dashboard Login... 3 2. Subscription Access... 4 3. Administration... 5 3.1 Testing

More information

normalization are being violated o Apply the rule of Third Normal Form to resolve a violation in the model

normalization are being violated o Apply the rule of Third Normal Form to resolve a violation in the model Database Design Section1 - Introduction 1-1 Introduction to the Oracle Academy o Give examples of jobs, salaries, and opportunities that are possible by participating in the Academy. o Explain how your

More information

11. Architecture of Database Systems

11. Architecture of Database Systems 11. Architecture of Database Systems 11.1 Introduction Software systems generally have an architecture, ie. possessing of a structure (form) and organisation (function). The former describes identifiable

More information

Introduction to SURE

Introduction to SURE Introduction to SURE Contents 1. Introduction... 3 2. What is SURE?... 4 3. Aim and objectives of SURE... 4 4. Overview of the facility... 4 5. SURE operations and design... 5 5.1 Logging on and authentication...

More information