Data management for clinical research

by W. L. SIBLEY, M. D. HOPWOOD, G. F. GRONER, W. H. JOSEPHS and N. A. PALLEY
The RAND Corporation
Santa Monica, California

ABSTRACT

This paper discusses a prototype system intended for the personal use of physicians engaged in clinical research. In particular, the prototype is a highly integrated, interactive, minicomputer based data management and analysis system. The facilities offered by the system allow the clinical investigator to store, recall, analyze, and display his research data without resorting to computer programming. Modern data base techniques are available to the physician as aids in organizing, storing, and retrieving his data. The data base concepts are expressed in terms that are familiar to a clinical researcher.

PURPOSE

The purpose of this paper is to discuss a prototype system intended for the personal use of physicians engaged in clinical research. In particular, the prototype is a highly integrated, interactive, minicomputer based data management and analysis system.

THE CLINFO PROJECT

The CLINFO prototype data management and analysis system described in this paper has been developed as part of the CLINFO project, a scientific inquiry sponsored by the Division of Research Resources (DRR) of the National Institutes of Health (NIH). The goals of the project are to identify and characterize the information analytic tasks and the information flows in clinical research, and to develop methods for facilitating these tasks and flows. The project is being conducted by clinical investigators at the Baylor College of Medicine, the University of Washington, the University of Oklahoma, and Vanderbilt University; by information scientists at the Rand Corporation; and by staff members of the DRR.
We have thus far (1) interviewed clinical investigators both formally and informally, (2) characterized their information processing needs, (3) identified data management and analysis as major problems, (4) examined existing computer systems aimed at satisfying those needs, (5) designed, built, and installed three copies of a prototype system, and (6) started the analysis of instrumentation data and user interviews to help us evaluate the system. References 1, 2, 3, and 4 discuss various aspects of the foregoing points.

CLINICAL STUDY DATA VOLUMES AND ACTIVITIES

A General Clinical Research Center (GCRC) provides facilities which clinical research personnel can use in conducting a variety of medical studies. Each study tends to be separate and distinct from other studies and provides a natural focal point for data collection and analysis. A large study involves approximately 75 subjects (patients) and a total collection of approximately 67,000 data values (Reference 1). Generally, a clinical study involves the following activities:

  - During the study design the investigator decides what data are to be collected, at what rate, and in what volume.
  - As the study progresses, data are collected and recorded in a central data file. Considerable care is taken to ensure the validity of those data.
  - As the study progresses, the investigator reviews and summarizes the data. Part of the process of summarization involves transcribing the data into a form suitable for statistical analysis as well as the preparation of plots, graphs, and reports.

The CLINFO prototype is designed to accommodate 25 such studies.

SYSTEM OVERVIEW

The prototype is implemented on a commercial minicomputer (Data General Eclipse S200) using multi-user extended BASIC (Reference 5). Each study has its own portion of a 25 million character disk.
When an investigator (or a member of his staff) uses the system, he interacts with the CLINFO software (primarily by responding to prompts) and is prevented from using BASIC procedures to affect his data. Figure 1 illustrates the main features of the CLINFO prototype. The ellipses encompass the three general activities which were mentioned above. The rectangles indicate the various ways in which data (or descriptions of data) are stored in the prototype. The diagram encompasses a single study (or research protocol).

National Computer Conference, 1977

[Figure 1. Command (---) and data (-) flow in the CLINFO prototype for a single protocol. Activities shown: DESCRIBE, ENTER, RETRIEVE, ANALYZE/DISPLAY. Data structures shown: SCHEMA, UPDATE, PATIENTS, SUBSETS, STUDY DATA, WORKSHEETS, COMM FILE.]

This paper is concerned primarily with the activities labeled "DESCRIBE," "ENTER," and "RETRIEVE" and the data contained in the "SCHEMA," "UPDATE," "PATIENTS," "SUBSETS," and the "STUDY DATA" structures. The "ANALYZE/DISPLAY," "INPUT," "OUTPUT," and "BASIC" activities and their associated structures will be described, but in less detail.

THE DATA BASE

The organization of a CLINFO study data base is predicated on three observations:

  - The fundamental unit of analysis is usually the patient.
  - Data are collected about that patient over time.
  - The data have natural groupings in time (e.g., the vital signs of a patient are sampled at essentially the same time).

The time-oriented nature of clinical research data has been discussed by Fries (Reference 6). Dr. Fries' ideas have been further reinforced by our observations of the current manual techniques for recording that data. For example, data are recorded in patient flowsheets (one for each patient) which are two-dimensional arrays with the rows representing the data items to be measured and the columns containing the values of an item for different times of measurement. We have further observed that some of the rows can be grouped naturally into related items (examples of such groups are vital signs, blood serum tests, urine analyses, etc.). However, the groupings differ widely from study to study. The CLINFO term for a grouping is "PANEL."
The concept parallels records or groups in the CODASYL Data Base Task Group (DBTG) report (Reference 7).

PATIENTS/SUBSETS

The key structure is the PATIENT file which contains an eight character abbreviation for each patient (assigned by the clinician when the patient is added to the study) and an internal numeric key (assigned by the system). This file can contain up to 392 different entries. The entries are ordered alphabetically on the patient abbreviation to allow for efficient searching. A subset is a named file (e.g., LOWBLOOD) with the same structure as the PATIENT file but containing patients with particular properties. No particular patient ordering is maintained for subsets. There can be as many subsets as space allows.

The numeric key associated with each patient references a block of data in a PATIENT DIRECTORY (not shown). That block contains some redundant identification data and two tables. The first table contains pointers to the patient's set of panels (see above) in the STUDY DATA file and counts of the numbers of instances of the panels. The second contains data concerning the state of the various EVENTS (to be described later) for this patient.

DESCRIBE/SCHEMA

The SCHEMA (Reference 7) provides the means for adapting the CLINFO data base structure to the needs of a particular investigator. It is a description of the panels and their contents. The schema exists in two forms: the external textual form and the internal compiled form. The former is for human interaction, the latter is for computer processing. The DESCRIBE activity is in essence an interactive editor which allows the clinician to create, edit, and compile his schema. The schema describes three main entities: ITEMS of data, PANELS of items, and EVENTS triggered by the occurrence of certain values for particular items.

PANELS have eight character names, and references to them by name imply reference to all of their items. There are two types of panels, numeric and textual.
The type of the panel determines the types of the data items included in that panel. Each panel can contain at most 30 items, all of which are either 32-bit floating decimal numbers, or character strings at most 70 characters in length. A sample panel entry in the schema might be

    PANEL 1 hist,patient ID & history,numeric;

Reading from left to right, this is the first panel (PANEL 1), its name is "hist", it contains "patient ID & history," and its items are all numeric.

ITEMS also have eight character names which are used to reference the data associated with an item in a wide variety of circumstances. Again, an item may either be numeric or textual. In addition, there are parts of the item description that serve to screen, validate, and possibly encode the value associated with that item. For example:

    ITEM 9 HT,Height,inches,num,range,(55,80),no;

Reading from left to right, this is the ninth item in the schema (ITEM 9), its name is "HT", its values represent height in inches, its values are numeric and must lie in the range 55 to 80 inches inclusive, and its value need not be considered confidential ("no"). "Height" and "inches" are purely descriptive and do not imply automatic units checking by the system. The sequence "num,range,(55,80)" illustrates one option available for data screening. Other options for items in numeric panels are:

    date,range( , )               the item value is a date in the indicated range ( to inclusive).
    time,range(1300, 1330)        the item value is a time in the indicated range (13:00 to 13:30 inclusive).
    num,check(4),(7,10,6,5)       the item value is numeric and one of the listed 4 values.
    char,code(3),(yes,no,unk )    the item value is externally one of the listed 3 character strings and is carried internally as one less than the ordinal position of the string in the list (i.e., 0, 1, or 2).

The data types date, time, and num may require no validating, e.g., num,none.

The textual panels and items are of the form:

    PANEL 2,demo,demographic data,text;
    ITEM 20,name,patient name,text(30);

The item "name" will then be allocated 30 characters in each occurrence of a "demo" panel.

EVENTS are time markers maintained by the CLINFO prototype system to aid the user in the retrieval of his data. It is common in clinical research to relate the response of a patient to a procedure by referencing some event which marks the application of that procedure. For example, blood sugar levels are measured at regular intervals after the ingestion of glucose. Events provide the means for dealing with relative time. A sample event is:

    EVENT 3 test,glucose ingest,gluc,first;

Reading from left to right, this is the third event in the schema, its name is "test," it relates to glucose ingestion, and it records the time at which the item "gluc" FIRST takes on a value.
Time is taken here to mean a combination of date and time. An event may be triggered by the "FIRST," "LAST," "MAX," and "MIN" values of its governing item. In addition, an event may be triggered at the first or last time the item takes on a particular value.

When the clinician is satisfied with his description of the data he intends to collect, he compiles the schema (the system provides appropriate feedback about errors in the compilation). He then requests that disk space be allocated to accommodate his study. He assists the allocation by estimating the number of patients he expects to include in his study and an approximate number of occurrences of each panel type for the average patient. He is then ready to start entering patients and their data into the data base.

ENTER

The main function of the ENTER activity is to process data (entered interactively) in the light of the data screening specifications contained in the schema. The data that are accepted are not passed directly to the STUDY DATA file, but are recorded in an UPDATE file whose contents are merged with the STUDY DATA file at some convenient time. This buffering of the input data allows a simple structure for the STUDY DATA file, reduces the exposure of the system to computer malfunctions, and provides a means of reviewing the input data before it is actually recorded in the STUDY DATA file.

Again, a CLINFO data base is patient- and time-oriented. In order to enter data, a "context" must be established for that data. That is, the data must

  - Belong to a patient already in the PATIENT file.
  - Have an associated date and time of day at which the data are assumed to have been sampled.

After a context has been established (by interactive prompts and responses) the clinician may then enter values for items by either

  - Typing the name of an item followed by its value, or
  - Typing the name of a panel, whereupon the system prompts for each of the items in that panel.
The context stays in force for all data entered subsequently until the clinician desires to change it. The context may be changed at any time by typing "patient:," "date:," or "time:" followed by the corresponding value.

As each value (including patient abbreviations, dates, times, and panel and item names) is entered, that value is checked for reasonableness. The patient abbreviation must be in the PATIENT file; dates and times must have the correct format; panel and item names must exist in the schema; the value for an item must satisfy the criteria in the schema. Incorrect values are refused and re-prompted for by the system. An exception is the case of an out-of-range numeric value which may be forced upon the system by entering the initials of the data enterer. Incorrect but reasonable values that have been accepted by the system may be corrected by re-typing them (perhaps after reestablishing the correct context).

ENTER has three additional functions:

  - Adding patient abbreviations to the PATIENT file.
  - Reviewing data in the UPDATE file.
  - Copying and screening data from a worksheet into the UPDATE file.
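
The screening that ENTER applies against the schema can be illustrated with a short modern sketch. The Python below is not the prototype's code (which was written in extended BASIC); the names Item and screen, and the representation of a screening rule, are hypothetical. It assumes only what the text states: range and check screening for numeric items, coded items stored as one less than their ordinal position, and the forcing of an out-of-range numeric value by supplying the enterer's initials.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Item:
    """Hypothetical in-memory form of one schema ITEM entry."""
    name: str
    kind: str                                   # "range", "check", "code", or "none"
    bounds: Optional[Tuple[float, float]] = None
    allowed: Sequence = ()


def screen(item: Item, raw: str, initials: str = ""):
    """Return the value to store, or raise ValueError so the caller can re-prompt."""
    if item.kind == "code":
        # Coded items are typed in their external form and stored as
        # (ordinal position - 1), i.e. 0, 1, 2, ...
        if raw not in item.allowed:
            raise ValueError(f"{item.name}: {raw!r} is not one of {list(item.allowed)}")
        return item.allowed.index(raw)

    value = float(raw)                          # malformed numbers raise ValueError
    if item.kind == "check" and value not in item.allowed:
        raise ValueError(f"{item.name}: {value} is not one of the listed values")
    if item.kind == "range":
        low, high = item.bounds
        # An out-of-range value is refused unless the enterer forces it
        # by supplying his or her initials.
        if not (low <= value <= high) and not initials:
            raise ValueError(f"{item.name}: {value} is outside {low}..{high}")
    return value


height = Item("HT", "range", bounds=(55, 80))
answer = Item("answer", "code", allowed=("yes", "no", "unk"))
```

With these definitions, screen(height, "70") is accepted, screen(height, "90") is refused, and screen(height, "90", initials="WLS") is accepted as a forced value. The item name "answer" is invented for the example.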

As an alternative to item-by-item data entry, an entire two-dimensional array (i.e., a "worksheet") of data may be entered. A worksheet is a two-dimensional array of numeric entries extended to include 8-character labels for the rows and columns as well as a worksheet name, title, date of creation and date of last modification. A worksheet is stored by the system as a file with the same name as that of the worksheet. Worksheets provide the data organization required by the ANALYZE/DISPLAY activities. The row and column labels can be used to reference patients, relative times, and items. These labels are used to direct the data entry process. Data entry via worksheets provides the facility for entering "derived" data (i.e., data produced as the result of the analysis of raw data) into the STUDY DATA file. It also provides another data entry route that may be more convenient in some circumstances.

As indicated above, the clinician can interactively review the contents of the UPDATE file and edit it to the extent of deleting blocks of data. When he is satisfied that the data are correct, the clinician can request that the UPDATE file be merged with the data already in the STUDY DATA file. The newly entered data are organized automatically into panels; those panels are marked with the internal patient identifier, the panel number, the date and time of sample, the date and time of data entry, and the initials of the data enterer; and the results are merged by date and time of sample. The merge process checks for existing instances of panels with the same date and time as those being merged and for item values previously entered in those instances.
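
The buffering and merge just described can be sketched in Python as follows. This is an illustration under stated assumptions, not the prototype's file layout: the function names, the tuple form of an UPDATE entry, and the dictionary standing in for the STUDY DATA file are all hypothetical. The paper says that the merge checks for previously entered values but not what it does with them; the sketch assumes the new value replaces the old one and the clash is reported.

```python
from datetime import datetime


def merge_update(study_data, update, enterer, entered_at):
    """Merge buffered UPDATE entries into study_data and empty the buffer.

    study_data maps (patient_key, panel, sample_time) to a panel instance.
    Each UPDATE entry is (patient_key, panel, sample_time, item, value).
    Returns the list of item values that had been entered previously.
    """
    conflicts = []
    for patient_key, panel, sampled, item, value in update:
        key = (patient_key, panel, sampled)
        # A panel instance carries the entry time and the enterer's initials.
        instance = study_data.setdefault(
            key, {"entered": entered_at, "by": enterer, "items": {}}
        )
        if item in instance["items"]:
            conflicts.append((key, item, instance["items"][item], value))
        # ASSUMPTION: the newer value wins; the paper does not specify this.
        instance["items"][item] = value
    update.clear()
    return conflicts


def sample_times(study_data, patient_key, panel):
    """Sample times of one patient's instances of a panel, in time order."""
    return sorted(k[2] for k in study_data if k[0] == patient_key and k[1] == panel)


study = {}
buffer = [
    (1, "vitals", datetime(1977, 1, 3, 9, 0), "HT", 70.0),
    (1, "vitals", datetime(1977, 1, 2, 9, 0), "HT", 69.0),
]
first = merge_update(study, buffer, "WLS", datetime(1977, 1, 4))
```

Keeping the buffer separate from the main store is what allows the review step: nothing reaches study_data until merge_update is called.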
The investigator is now in a position to retrieve his data from the STUDY DATA file and to format it in a variety of ways: into worksheets, as a display at the terminal, or as a file to be passed to an externally generated applications package (such a file is a "communication" file in CLINFO terminology).

RETRIEVE

The data in the STUDY DATA file may be viewed as values associated with points in a three dimensional space. Figure 2 illustrates that space. The axes are labeled "PATIENT," "TIME," and "ITEM" to reflect the parameters that uniquely identify a data value. The three dimensional space is sparsely populated with data points, especially if the TIME axis is taken to represent absolute dates and times. Moreover, not all data are collected about all patients for all of the expected times.

[Figure 2. A three-dimensional representation of a time-oriented data file, with axes PATIENT, TIME, and ITEM. The marker |-| is a panel of items for patient P at time T.]

It is the intent of the RETRIEVE activity to identify a planar slice of the data space and to abstract part of that slice into a two-dimensional form (i.e., a worksheet). Selecting a slice as illustrated in Figure 2 results in what was referred to above as a flowsheet for that patient. A slice parallel to the PATIENT-ITEM plane produces an array of items for a group of patients at a particular time. That time may be either an absolute date and time, or it can be a time relative to an event (e.g., two hours after glucose ingestion). Finally, a slice parallel to the PATIENT-TIME plane results in a worksheet containing the time history of a particular item for a group of patients (again either absolute or relative time).

The clinician constructs his retrieval request interactively under the control of the schema.
In particular, coded items (e.g., sex may be either "male" or "female", but is encoded as either 0 or 1) are referred to in terms of the external form ("male," "female") while the system deals with them as numeric values (0 or 1). The values of items play a role if the clinician wishes to restrict the panels being retrieved by specifying conditions that the values must satisfy. When he completes the specification, the system surveys the appropriate parts of the STUDY DATA file and produces a worksheet with the rows and columns appropriately labeled.

The retrievals discussed above are limited to numeric (or coded) items. However, textual panels may be retrieved and displayed at the terminal or passed in their entirety to a communication file. This facility allows the clinician to review his notes about a patient or to pass demographic data to a BASIC program that, for example, produces mailing lists.

For those retrievals in which a list of patient abbreviations is appropriate (e.g., a slice parallel to the PATIENT-ITEM plane), the investigator may provide that list by giving the name of a SUBSET.

SUBSETS

As mentioned above, subsets are files containing patient abbreviations and numeric keys. The files have the same structure as the PATIENT file.
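
The planar slices described under RETRIEVE, restricted to the patients named in a subset, can be sketched in Python. The sketch is illustrative only: the sparse data space is modelled as a dictionary keyed by (patient, time, item), times are taken to be hours relative to an event, and the function names and sample values are invented for the example. Missing points in the sparse space come back as None.

```python
def patient_item_slice(data, patients, items, time):
    """Slice parallel to the PATIENT-ITEM plane: one row per patient,
    one column per item, all at a single time."""
    return [[data.get((p, time, i)) for i in items] for p in patients]


def flowsheet(data, patient):
    """Slice for one patient: rows are items, columns are times."""
    times = sorted({t for (p, t, _) in data if p == patient})
    items = sorted({i for (p, _, i) in data if p == patient})
    rows = [[data.get((patient, t, i)) for t in times] for i in items]
    return items, times, rows


# Sparse data space: (patient abbreviation, hours after event, item) -> value.
data = {
    ("case1", 0, "gluc"): 95.0,
    ("case1", 2, "gluc"): 140.0,
    ("case1", 2, "systol"): 120.0,
    ("case2", 2, "gluc"): 155.0,
}

# A subset is simply a named list of patient abbreviations.
LOWBLOOD = ["case1", "case2"]

two_hour = patient_item_slice(data, LOWBLOOD, ["gluc", "systol"], 2)
items, times, rows = flowsheet(data, "case1")
```

Here two_hour has one row per patient in LOWBLOOD and one column per requested item, which is the shape of the worksheet the retrieval would produce; the row and column labels would be the patient abbreviations and item names.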

Subsets are created in a variety of ways:

  - By listing patient abbreviations.
  - By the usual set operations (e.g., intersection and union) among existing subsets.
  - By requiring that a patient's data in the STUDY DATA file satisfy a set of conditions.
  - By requiring that a patient's data in a worksheet (e.g., the values in a row labeled with the patient's abbreviation) satisfy a set of conditions.

The sets of conditions mentioned in the last two members of the above list are made up of lists of either conjunctions or disjunctions (but not both) of range restrictions on the values of items. A sample set of conditions is:

    If 30 ≤ age ≤ 65
    and if 120 ≤ systol ≤ 140
    and if sex = male

The last line illustrates the use of a coded item which takes on one of a set of discrete values. In addition to the conditions placed on item values, the time at which a sample was taken can play a part in the selection. For example, the above conditions might be required to hold in the third hour after the ingestion of glucose. In any case, the usual result of a retrieval using either a subset or all of the PATIENT file is a worksheet.

ANALYZE/DISPLAY

The CLINFO approach to data analysis is to abstract the desired data from the STUDY DATA file (which can cause very intensive accessing of the disk storage) into a worksheet which fits into the user's working space in main memory. The worksheet can then be manipulated efficiently, especially since its contents need not be re-abstracted from the STUDY DATA file each time a new question is asked of those contents. A sample worksheet has the format illustrated in Figure 3.

    WORKSHEET  CLINICAL
    TITLE      Clinical Characteristics of Diabetic Patients
    CREATED               MODIFIED
    # OF ROWS  7          # OF COLS  4

    ROWS/COLS      1      2        3          4
    LABELS         age    sex      %idl wt    dur diab
    1  case 1      27     male
    2  case 2      24     female
    3  case 3      32     male
    4  case 4      34     male
    5  case 5      21     male
    6  case 6      23     male
    7  case 7      28     female   99

    (Note: ... indicates missing values)

    [The creation and modification dates and most entries in the "%idl wt" and "dur diab" columns are not legible in this copy.]

Figure 3. A sample worksheet

A worksheet can hold approximately 2200 entries and its dimensions can vary within that restriction. In the case above, the row labels are patient abbreviations and the column labels are item names. These labels and the contents of the worksheet would ordinarily be supplied by the retrieval process. In addition, Figure 1 indicates an INPUT facility that allows the direct creation of worksheets, the labelling of their rows and columns, and the entry of data into those rows and columns. This facility allows the investigator to use CLINFO's analysis and display features independently of the STUDY DATA file.

CLINFO maintains a "current" worksheet so that the same worksheet can be used in a variety of situations without the user respecifying it as the worksheet of interest. Most of the worksheet analysis and display functions apply to the current worksheet. The main worksheet manipulation facilities are:

  - Select a worksheet as the current one by typing its name. If there is no worksheet by that name, create one as specified by the user.
  - Display a worksheet at the terminal (see Figure 3).
  - Label worksheet rows or columns.
  - Enter data into a worksheet by row or by column or by individual cell.
  - Discard (destroy) a worksheet.
  - Sort a worksheet by rows or columns.
  - Edit a worksheet by adding, deleting, or moving rows or columns.
  - Copy a sub-array of a worksheet, with labels, into the same or another worksheet.
  - Print a worksheet on the system printer.
  - Display a list of all the worksheets in this study.

Worksheets created by the retrieval process or by direct means can be used by the analysis facilities. In general, the analyses can be made to apply to sub-arrays within the body of a worksheet. The results of some analyses may be stored in worksheets to minimize transcription and to make those results available for further analysis. The CLINFO analysis facilities include:

  - Descriptive Statistics (means, standard deviations, etc.).
  - T Test.
  - Chi Square Test.
  - Linear Regression (simple and multiple).
  - Analysis of Variance.
  - Cross Tabs.
  - Scatter Plots and Bar Charts.
  - Histograms.
  - Frequency Distributions.
  - Normality Test.
  - Non-parametric Paired Tests.

In addition, special calculations may be performed on a worksheet, and the results of these calculations are stored in the worksheet. This facility replaces in part the need for special purpose computer programming. These calculations are:

  - Generate values in a row (or column) as the result of evaluating a single algebraic-like expression involving other rows (or columns) as variables.
  - Generate truth values (0, 1) on the same basis.
  - Perform special calculations such as cumulative sum, time differences, etc.

SUMMARY

The foregoing discussion does not describe the properties and facilities of the CLINFO prototype in their entirety. The intent is rather to outline those properties and facilities and to emphasize the idea of a personal, integrated, highly interactive system placed in the hands of personnel whose primary interest is to use the system as a tool.

At least one CLINFO prototype has been in daily use by medical personnel since January of [year illegible]. The acceptance of the system seems to be good and it is being used in productive ways. Moreover, it is being used personally by senior medical staff, that is, the research staff for which it was designed.

Our experience to date indicates that modern data base techniques (e.g., the SCHEMA), when expressed in the user's own terminology, are readily understood, accepted, and used by those personnel. The use of terms such as "PATIENT," "PANEL," and "WORKSHEET" and the time orientation of the STUDY DATA file are not accidental. They play an important role in making the computer scientist's techniques understandable by and useful to the clinical researcher.

Another point of interest is the utility of two distinct data structures, the STUDY DATA file (and its associated files) and worksheets. The STUDY DATA file acts as a pool of data whose contents can be abstracted and arranged into a form more suitable for viewing and analyzing (i.e., a worksheet). In addition, worksheets serve to store data and to communicate them between steps in the analysis process.

ACKNOWLEDGMENTS

Although the authors accept the responsibility for the words in this paper, a much wider group is responsible for the content those words express.
We would like to express our appreciation to our co-contractors Howard K. Thompson, Jr., M.D., at the Baylor College of Medicine, T. Graham Christopher, M.D., at the University of Washington, and Arthur W. Nunnery, M.D., at the University of Oklahoma, to our project officer, William R. Baker, Jr., Ph.D., of the NIH, and to our colleague Thomas L. Lincoln, M.D., of the Rand Corporation. They have all contributed significantly to the design and continuing development of the CLINFO prototype.

This research was supported by Contract No. NOI-RR from the Division of Research Resources, National Institutes of Health.

REFERENCES

1. Palley, N. A., and G. F. Groner, "Information Processing Needs and Practices of Clinical Investigators-Survey Results," AFIPS Conference Proceedings (1975), Vol. 44, AFIPS Press, Montvale, New Jersey, 1975, pp.
2. Groner, G. F., N. A. Palley and N. Z. Shapiro, "A Structural Characterization of Clinical Research Participants and Their Activities," R NIH, The Rand Corporation, January.
3. Groner, G. F., M. D. Hopwood, N. A. Palley, N. Z. Shapiro, and W. L. Sibley, "A Plan for the Development and Evaluation of a Data Management and Analysis System for Clinical Investigators," R-1541-NIH, The Rand Corporation, August.
4. Sibley, W. L., M. D. Hopwood, G. F. Groner, and N. A. Palley, "A Prototype Data Management and Analysis System for Clinical Investigators: An Initial Functional Description," R-1621-NIH, The Rand Corporation, August.
5. Data General Corporation, Extended BASIC User's Manual, publication , September.
6. Fries, James F., "Time-Oriented Patient Records and A Computer Databank," The Journal of the American Medical Association, Vol. 222, December 18.
7. CODASYL Data Base Task Group (DBTG) Report, April. Available from the Association for Computing Machinery.


CERTIFICATE IN BIG DATA TECHNIQUES ON SMALL DATA FORMAT UTILIZING MICROSOFT EXCEL CERTIFICATE IN BIG DATA TECHNIQUES ON SMALL DATA FORMAT UTILIZING MICROSOFT EXCEL Conducted by: Palani Murugappan Contact Palani / Aaron : http://www.malaysia-training.com Certificate from the United Kingdom

More information

Clearing Out Legacy Electronic Records

Clearing Out Legacy Electronic Records For whom is this guidance intended? Clearing Out Legacy Electronic Records This guidance is intended for any member of University staff who has a sizeable collection of old electronic records, such as

More information

ASSOCIATION BETWEEN VARIABLES: SCATTERGRAMS (Like Father, Like Son)

ASSOCIATION BETWEEN VARIABLES: SCATTERGRAMS (Like Father, Like Son) POLI 300 Handouts #11 N. R. Miller ASSOCIATION BETWEEN VARIABLES: SCATTERGRAMS (Like Father, Like Son) Though it is not especially relevant to political science, suppose we want to research the following

More information

To make sense of data, you can start by answering the following questions:

To make sense of data, you can start by answering the following questions: Taken from the Introductory Biology 1, 181 lab manual, Biological Sciences, Copyright NCSU (with appreciation to Dr. Miriam Ferzli--author of this appendix of the lab manual). Appendix : Understanding

More information

Tutorial #1: Using Latent GOLD choice to Estimate Discrete Choice Models

Tutorial #1: Using Latent GOLD choice to Estimate Discrete Choice Models Tutorial #1: Using Latent GOLD choice to Estimate Discrete Choice Models In this tutorial, we analyze data from a simple choice-based conjoint (CBC) experiment designed to estimate market shares (choice

More information

Data Preprocessing. S1 Teknik Informatika Fakultas Teknologi Informasi Universitas Kristen Maranatha

Data Preprocessing. S1 Teknik Informatika Fakultas Teknologi Informasi Universitas Kristen Maranatha Data Preprocessing S1 Teknik Informatika Fakultas Teknologi Informasi Universitas Kristen Maranatha 1 Why Data Preprocessing? Data in the real world is dirty incomplete: lacking attribute values, lacking

More information

SAS (Statistical Analysis Software/System)

SAS (Statistical Analysis Software/System) SAS (Statistical Analysis Software/System) Clinical SAS:- Class Room: Training Fee & Duration : 23K & 3 Months Online: Training Fee & Duration : 25K & 3 Months Learning SAS: Getting Started with SAS Basic

More information
