WORKSHOP: Using the Health Survey for England, 2014

There are three sections to this workshop, each with a separate worksheet. The worksheets are designed to be accessible to those who have no prior experience of data analysis and/or SPSS. If you have experience with both, feel free to be flexible with how you use Worksheet 3; for example, you might want to conduct analyses on variables of your own choice, depending upon your own interests, rather than those specified. However, we suggest everyone works through Worksheets 1 and 2 fully.

Worksheet 1: Exploring the documentation for HSE, 2014
Worksheet 2: Using Nesstar to explore HSE data online
Worksheet 3: Analysing HSE 2014 data using SPSS 23

Worksheet 1: Exploring the documentation for HSE, 2014

When conducting secondary data analyses, it is tempting to skip past exploring the documentation. However, this step is essential to ensuring the validity of your research: it helps you select the correct variables for your analysis and treat them appropriately (e.g. applying the correct weights, doing any recodes), and it can help you identify possible limitations of the data. This worksheet guides you through some sections of the HSE, 2014 documentation.

1. Finding the HSE data

First, we must find the HSE data on the UK Data Service website.

a. Go to www.ukdataservice.ac.uk
b. Click on the Get Data tab near the top of the page to open the Discover search tool.

From here you can do a basic search for any data that the UK Data Service holds. However, we will be doing a more advanced search.

c. Click Go in the search bar (it is fine to leave the search bar empty).

You will see that several filter options have appeared on the left-hand side. We are going to look for a moment at the options under Type. Many of the data sets form part of a series, usually because the same survey was conducted in multiple years (as is the case for the HSE). In the Discover search tool, you can search for individual years of a survey by typing in the name and date of that survey (e.g. HSE 2014). However, you can also search for the entire series of a survey, which we will do now.

d. On the left-hand side of the Discover page, click the plus sign next to Type, uncheck Data collections and check Series.
e. Type HSE in the search bar and click Go.
f. Select the Health Survey for England (SN: 2000021).

This page is the series record for the HSE. It gives an overview of the survey and lists all the years available. To select HSE, 2014:

g. Click the plus sign under DATA ACCESS to reveal all the years available for the HSE.
h. Select Health Survey for England, 2014 (SN: 7919) at the top of the list.

You are now on the catalogue page for the Health Survey for England, 2014, which gives a description of that year of the survey (each year of the survey has its own catalogue page).

NB: Remember that you can go straight to this individual catalogue page from the basic Discover search tool by clicking Get Data on the UK Data Service main page and typing HSE 2014, or by checking Data collections under Type in the advanced Discover search tool and searching for HSE 2014.

The catalogue page provides several key pieces of information about the data set, including the main subjects covered and an overview of the methodology. Scroll down for a brief look at the sections available.

2. Finding the documentation for HSE 2014

One of the sections on the catalogue page for a data set is Documentation. All the documentation available for a data set can be accessed via this catalogue page, though it will also be included with the data if you download it. Several different documentation files can be included, such as copies of the questionnaire, a general user guide, and a list of the variables. To access the list of documentation:

a. On the catalogue page for HSE 2014, click on Documentation under the study title Health Survey for England, 2014. This will take you straight to the relevant section.

3. Browsing the User Guide

In exploring the documentation, a good place to start is the User Guide.

a. Open this file (7919_hse2014_user_guide.pdf), listed fifth in the documents.

This document provides an overview of the main features of the HSE, including the survey design, the variables, and the weights. As you scan the document, consider the following questions. If it helps, you can record your answers, but there is no requirement to.

Questions
1. Which two organisations conducted the HSE, 2014?
2. How many addresses were included in the survey sample?
3. How many individuals (adults and children) were interviewed?
4. As well as during the interview, when/how else were (some of) the respondents measured?
5. What are the two data sets available for the HSE?
6. What are some of the reasons a respondent's blood pressure measurement is defined as invalid?
7. Look at Appendix A on page 18. What age range of respondents were asked to complete the Wellbeing (Warwick Edinburgh Scale) questions?

A brief explanation of survey weights

A weight is a derived variable, meaning it is calculated by the data producers after the survey has been conducted rather than collected directly from the survey respondents. Weights are continuous variables. In a weight variable, each case (respondent) is assigned a value according to how much that case will count in the statistical analysis. How much a case counts (or how much weight it is given) is determined by whether the characteristics of that case are over- or under-represented in the sample. For example, if a sample has far more women than men, the female cases will be given less weight (lower weight values) than the male cases. Weights are always positive but are often fractions, meaning that a case counts for less than 1 case in the analysis. Cases with a weight greater than 1 count for more than 1 case in the analysis. Weights will often be calculated to take account of two or more ways in which each case is over- or under-represented in the survey sample, e.g. taking account of characteristics such as age, region and ethnicity. More information about survey weights can be found in the UK Data Service guide What is Weighting? https://www.ukdataservice.ac.uk/media/285227/weighting_2_1.pdf

8. How many different weights are there in the HSE (see section 5 of the User Guide)?
9. What does the Interview weight for adults take account of?
10. How do you know which weights should be applied to the data before you do your analysis?

4. Exploring the Variables

Once we have an overview of the data set, it is good practice to look at the list of variables so we can select those we want to use in our analyses.

a. Go back to the list of documentation files on the UK Data Service webpage for HSE, 2014.
b. Open the document List of Variables and Derived Variables.
c. Have a look at the different groups of variables listed in the contents page.
d. We will be using a measure of Body Mass Index (BMI) in our analyses, so we will explore the different BMI variables via this document. Use the contents page to identify where we might find the BMI measures.

Questions
1. How many different BMI variables are there in the data set?
2. Which measure(s) of BMI do you think would be most reliable and why?

In our analyses, we will use two different measures of BMI: one continuous (BMIVAL2), where a respondent's BMI is recorded as an exact amount, and one categorical (BMIVG3), where a respondent's BMI is grouped into a given range (e.g. 0-20). More information about these variables, including how they were derived, is contained in the second half of the variables document.

3. How many categories are there in the grouped BMI variable BMIVG3 and what are they? (The quickest way to find this is by searching for the variable name in the document using the search function.)

Categorical versus continuous variables

There are two main types of variables we can analyse: categorical, and continuous or scalar. Continuous or scalar variables are variables where the values (or numbers) in that variable are meaningful in themselves, e.g. age, height, weight. For example, a continuous or scalar measure of age will have a set of values that represent the age of the individual: 16, 17, 18, 19, 20, ..., 78, 79, 80 and so on. For categorical variables, the values are not meaningful in themselves but are attached to a particular group or category. For example, a measure of sex has two categories, male and female. Each category will be assigned an arbitrary value, e.g. Male = 1, Female = 2. Statistical analysis software such as SPSS will record the meanings of these values so the analyst can know what they are (more on this later).

N.B. A variable which we assume to be continuous/scalar might in fact be categorical, for example if age is grouped (e.g. 20-29, 30-39, 40-49 etc.). In these cases, the values are again arbitrary and we would need to know what they mean. For example, a grouped age variable might be coded as follows: 1 = 20-29 years; 2 = 30-39 years; 3 = 40-49 years. This is the case with our two different measures of BMI: one is categorical (BMIVG3), one is continuous (BMIVAL2).

We are interested in factors that might be associated with a person's BMI. You can use the list of variables in the variables document to identify those that you might be interested in testing. In this workshop, we are going to look at the relationship between BMI and several categorical variables: sex, age and four different health/well-being measures. These variables are:

SEX: sex
AG16G10: Age 16-17+ in ten year age bands
GENHELF4: General health 4 categories
PAIN: How much experienced pain or discomfort
ANXIETY: Whether anxious or depressed
ENERGY: How often had energy to spare

Using the same document, we can look up the response categories for each of these variables as we did for BMIVG3.

4. What are the response categories for GENHELF4? (Again, the quickest way to find this is by searching for the variable name in the document using the search function.)

Now that we have a list of the variables we are interested in, we can explore exactly where these variables come from, including whether they are raw or derived.
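
To make the distinction between continuous and categorical variables concrete, here is a minimal sketch in Python (Python is not part of this workshop and is used purely for illustration). It turns a continuous age into the example bands given above (1 = 20-29 years, 2 = 30-39 years, 3 = 40-49 years). The names `AGE_BAND_LABELS` and `group_age` and the sample ages are hypothetical and are not taken from the HSE files.

```python
# Value labels for the illustrative grouped age variable described in the text.
# The codes 1, 2, 3 are arbitrary; only the labels give them meaning.
AGE_BAND_LABELS = {1: "20-29 years", 2: "30-39 years", 3: "40-49 years"}


def group_age(age):
    """Return the category code for a continuous age, or None if outside the bands."""
    if 20 <= age <= 29:
        return 1
    if 30 <= age <= 39:
        return 2
    if 40 <= age <= 49:
        return 3
    return None


# Hypothetical continuous ages (meaningful in themselves).
ages = [23, 35, 41, 29]

# The derived categorical version: arbitrary codes plus their labels.
codes = [group_age(a) for a in ages]
labels = [AGE_BAND_LABELS[c] for c in codes]

print(codes)
print(labels)
```

The point of the sketch is that the codes alone tell you nothing; an analyst needs the label lookup, which is what SPSS stores as value labels.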

Raw and Derived Variables

Data can consist of raw and/or derived variables. Raw variables are those taken directly from the questionnaire. For example, a respondent's weight may be recorded in an interview and kept in its raw form in the data. Derived variables are raw variables that have been treated in some way to present them differently. For example, the respondent's weight recorded in an interview might then be grouped into ranges (0-50kg, 51-70kg), creating a derived measure of weight.

The second half of the variables document contains information about how the derived variables included in HSE 2014 were derived. It is in this section of the document that you will have found the four categories of the variable GENHELF4 (on page 101 of the document) when answering question 4. Indeed, GENHELF4 is a derived variable.

5. Using the information on this page, can you work out from which variable GENHELF4 was derived and how?

NB: The SPSS Syntax you see here is the coding language that was used to derive the variable GENHELF4.

Finally, now that we have selected the variables we are interested in analysing, we can look at the questionnaire for the HSE to see which specific survey question was used to create those variables.

e. Go back to the catalogue page for HSE 2014 and open the document Questionnaires, Showcards, Coding Frames and Consent Booklets.
f. Search the document for the variable GENHELF to find the exact wording of the relevant question (remember we will not find GENHELF4 because it is a derived variable).

5. Selecting a weight variable

Now that we know which variables are going to be used in the analyses, we can identify the appropriate weight variable to apply to the data.

N.B. Remember the user guide states that a weight should be selected according to the stage of data collection (e.g. interview, nurse visit) at which the variables you are analysing were collected. If you use variables collected at more than one stage, use the weight relevant to the latest stage (e.g. if you are using variables collected during the interview and at the nurse visit, use the nurse weight, as the nurse visit happened later than the interview).

a. Open the document Interviewer, Nurse, Coding, Measurement and Editing Instructions.
b. Using the tables on pages 4 and 5 of this document (Section 1.5 The interviewer visit, Section 1.6 The nurse visit), work out whether the variables to be analysed later (BMI, age, sex, general health and well-being) were collected at the interview stage or the nurse visit.

Questions
1. Based upon the above, which weight should be used in the analyses of the variables selected for later analyses (BMIVG3, BMIVAL2, SEX, AG16G10, GENHELF4, PAIN, ANXIETY, ENERGY)?
2. What is the name of this weight variable? (Revisit the document listing all the variables to find out.) Jot it down here to use later.

Other resources

Not every piece of information available on the HSE is included in the list of documentation available via the UK Data Service. Other useful resources can be found online, including via the website of NatCen (who manage the survey). See their page on the Health Survey for England here: http://natcen.ac.uk/our-research/research/health-survey-for-england/ and scroll down to View reports and trend tables 2004-2015 for links to reports from individual survey years.
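
As a rough illustration of the weighting idea explained earlier in this worksheet, the Python sketch below (for illustration only; Python is not used in the workshop) builds weights for a sample with too many women and shows how they change a mean. Everything here is an assumption made for the example: the sample of 6 women and 2 men, the half-and-half population, the BMI values, and the rule "population share divided by sample share". The real HSE weights are calculated by the data producers and take account of more factors, as described in the User Guide.

```python
def weighted_mean(values, weights):
    """Mean of values where each case counts in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical sample: 6 women and 2 men, so women are over-represented
# if the population is assumed to be half women and half men.
sex = ["F"] * 6 + ["M"] * 2
bmi = [24.0] * 6 + [28.0] * 2  # invented BMI values, not HSE data

population_share = {"F": 0.5, "M": 0.5}  # assumption for the example
sample_share = {s: sex.count(s) / len(sex) for s in population_share}

# Over-represented groups get weights below 1, under-represented groups above 1.
weight_by_sex = {s: population_share[s] / sample_share[s] for s in population_share}
weights = [weight_by_sex[s] for s in sex]

unweighted = sum(bmi) / len(bmi)
weighted = weighted_mean(bmi, weights)

print(weight_by_sex)
print(unweighted, weighted)
```

In this invented sample each woman counts for less than one case and each man for more than one, so the weighted mean moves towards the men's values.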

Worksheet 2: Using Nesstar to explore HSE data online

This worksheet demonstrates Nesstar, an online data discovery and exploration tool used by several data archives, including the UK Data Service. Nesstar can be really useful if you want to explore the data and variables before analysing them in a statistical package such as SPSS, Stata or R.

1. One-way frequency distributions

Nesstar allows you to explore the data online before registering with the UK Data Service.

a. From the catalogue page for the HSE, 2014, click Access online near the top of the page.

This takes you to the Nesstar page, which lists all the surveys available via this tool. The HSE 2014 is automatically selected, but you can navigate to other surveys (and other years of the HSE) via the menu on the left. The right window contains general information about the HSE. In the left window, clicking on the signs opens a new list, through which you can find any variable from the HSE you are interested in and run analyses on it. We are going to look at a couple of the variables identified in Worksheet 1 via these drop-down menus.

b. Click on the sign next to Variable Description under Health Survey for England, 2014.
c. Click on the sign next to Individual Data File.
d. Click on the sign next to Anthropometric Measurements.
e. Click on the sign next to Measurements.
f. Click the variable BMI grouped combining underweight and normal, overweight and combining obese and morbidly obese.

This is the variable BMIVG3, which we will be using later. You should be able to see the distribution of data in this variable.

Questions
1. How many valid cases are there for the variable BMIVG3?
2. How many missing cases are there for the variable BMIVG3?

Missing cases

Most variables will have one or more missing cases. These are cases that do not have a valid response for a given variable and so should not be included in the analyses of that variable. There may be a specific reason for a case being missing from a variable, for example if the variable does not apply to the respondent or if they refused to give certain information. However, cases can sometimes be missing for unspecified reasons.

In the top right corner of the page are several icons representing functions that Nesstar can perform. If you hover over each with the mouse, you can see what they are. Many of them are not accessible without logging into the UK Data Service, so you will not be able to click on them, but we will look at some below.

2. Creating a subset of the data

The subset icon allows you to look at a smaller population, or sub-group, within the data set.

a. Click on the subset icon.
b. Locate again the variable BMI grouped combining underweight and normal, overweight and combining obese and morbidly obese, click on this variable in the left-hand list and select add to subset from the drop-down menu.
c. By way of practice, we want to select just those in the data set who are overweight or obese. To do so, click on the scroll function next to the equals sign and choose the symbol >=
d. In the categories list, select 2 Overweight and select ADD to its left and select the
e. Click OK.

This means that any analysis you conduct will only include the population with a value of 2 or more on the variable BMIVG3, which includes those in the Overweight category (value 2) and those in the Obese category (value 3). We could instead have selected just those in any one of the categories by using the = sign.

3. Using Weights

By default, the figures displayed via Nesstar are the raw statistics for given variables, but they can be weighted using the weight icon.

a. Click on the weight icon at the top right. This brings up a screen which lists all the weighting variables associated with the dataset. Select HSE 2014 Weight for analysis of core interview sample, move it to the right-hand window using the arrow, and click OK.

4. Creating cross-tabs

Both the functions we have practised above (selecting a subset and applying weights) direct you immediately to the cross-tab or Tabulation feature of Nesstar. Nesstar will not apply weights or a subset selection to a one-way frequency. You should see at the bottom of the table that the filter (subset selection) and weight are on.

The tabulation feature can be used to create cross-tabulations of two variables and conduct simple analyses. However, you need to be registered with the UK Data Service to use this feature. If you try to populate the table by clicking on a variable in the left window and choosing add to row or add to column, you will be asked for your username and password.

Worksheet 3: Analysing HSE 2014 data using SPSS 23

1. Downloading the data

Now we will use the HSE 2014 data in SPSS to run our own analyses. There are two different data sets for the HSE 2014, the individual and the household; we will be using the individual data set. If you are registered with the UK Data Service, you can download this data from the HSE 2014 catalogue page. However, the data is already on your laptop, and we will open it now.

a. Open SPSS by clicking on the Start menu and typing SPSS in the search bar.
b. Find IBM SPSS Statistics 23 under programs and click to open.
c. In the left-hand box Recent Files, click Open (another) file.
d. Select the C drive (C:\Work).
e. Click on hse2014ai.sav to open.

SPSS: A brief overview

In SPSS there are two ways to view the data: the Data View or the Variable View. Switch between the two views using the tabs at the bottom left of the screen.

Click on the Data View tab (bottom left of screen). Each row represents an individual respondent to the survey. Each column represents a variable in the survey. A variable is something that varies between respondents (e.g. age, sex, ethnicity); it is often a response to a question or has been derived from answers to a question.

Click on the Variable View tab (bottom left of screen). This screen gives more information about each variable in the data set. Each row represents a variable and each column provides information about the variable, including the name, label and values.

Values (listed in the values column of the Variable View) are the numeric values within a variable. If the variable is a scale or continuous variable (e.g. age, income, weight), the values are meaningful in themselves and do not need value labels. If the variable is categorical (responses consisting of different groups or categories, e.g. ethnicity, sex, BMI grouped), then each category is assigned a value (1, 2, 3 etc.) and each value is assigned a label (e.g. 1 = male, 2 = female). The value labels are listed in the values column in the Variable View. N.B. See the earlier description of the difference between categorical and continuous variables.

Missing values: Most variables will have one or more missing values. These are values assigned to a response that is in some way defined as missing data, for example if someone refused to answer a certain question or did not give an answer because the question did not apply to them. Missing values are usually given a negative value so they are obvious within the data. Value labels (e.g. did not apply, refused to answer) are assigned to these values in the same way as to other values representing categorical responses. They are also listed in the values column of the Variable View.
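
The relationship between values, value labels and missing values can be sketched in a few lines of Python (for illustration only). The labels 1 = Male and 2 = Female come from the example above; the missing codes -9 and -1 and their labels are hypothetical stand-ins, so check the Variable View for the codes actually used by each HSE variable. The helper names `is_missing` and `describe` are also invented for this sketch.

```python
# Value labels for a categorical variable, as in the example in the text.
SEX_LABELS = {1: "Male", 2: "Female"}

# Hypothetical missing-value codes; real codes are listed in the Variable View.
MISSING_LABELS = {-9: "Refused", -1: "Not applicable"}


def is_missing(value):
    """Treat any negative code as missing (an assumption based on the convention above)."""
    return value < 0


def describe(value):
    """Return the label for a value, flagging missing codes."""
    if is_missing(value):
        return "Missing: " + MISSING_LABELS.get(value, "unspecified")
    return SEX_LABELS[value]


responses = [1, 2, -9, 2, -1]  # invented responses
valid = [v for v in responses if not is_missing(v)]

print([describe(v) for v in responses])
print(len(valid), "valid of", len(responses))
```

Only the valid responses would contribute to an analysis of this variable; the missing codes are kept in the data so the reason for missingness is not lost.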

2. Applying weights to the data

Before we do any analyses, we will apply the appropriate weight to our data. Remember the user guide (p.14) states that the weight from the latest stage of the survey used in the analyses should be applied to the data. Because we are using data collected only at the interview stage, we want to use the Interview weight.

a. From the drop-down menus, select Data > Weight Cases.
b. Select the interview weight (wt_int) from the list of variables in the dialogue box. NB: To help you find the variable you want in this list, right click on the list, display the variable names instead of the labels, and sort alphabetically.
c. On the right-hand side of the dialogue box, select Weight cases by, click the icon and click OK.

If the weight has been applied properly, it will read Weight On at the bottom right-hand corner of the data window.

3. Univariate analysis: Exploring the variables (one-way frequencies)

In both the Data View and Variable View windows, you can use the drop-down menus at the top of the page to do analyses. The menu we will be using is Analyze > Descriptive Statistics

We will begin by producing some frequency tables and statistics to explore the distribution of the variables we identified in Worksheet 1. The variables are:

BMIVAL2: Valid BMI measurements using estimated weight if >200kg
BMIVG3: BMI grouped combining underweight and normal, overweight and combining obese and morbidly obese
SEX: sex
AG16G10: Age 16-17+ in ten year age bands
GENHELF4: General health 4 categories
PAIN: How much experienced pain or discomfort
ANXIETY: Whether anxious or depressed
ENERGY: How often had energy to spare

The way in which we present the frequencies of each variable depends upon the type of variable (categorical or continuous). All but one of the variables listed above (BMIVAL2) are categorical, so we will run frequencies on those first. For the categorical variables, we will present frequency tables and graphs to show the distribution of the data.

a. From the drop-down menus select Analyze > Descriptive Statistics > Frequencies. This opens the Frequencies dialogue box.
b. In the Frequencies dialogue box, select the categorical variables you are interested in from the list above (BMIVG3, SEX, AG16G10, GENHELF4, PAIN, ANXIETY, ENERGY) and move them into the right-hand box. NB: Again, to help you find the variable you want in this list, right click on the list, display the variable names instead of the labels, and sort alphabetically.
c. Make sure Display frequency tables is checked.
d. To produce graphs, click on the Charts box.
e. In the Charts dialogue box, under Chart Type select Bar charts and under Chart Values select Percentages. Click Continue.
f. Click OK.
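
For readers who want to see the arithmetic behind a frequency table, here is a minimal Python sketch (illustration only; SPSS does all of this for you). For each category it computes the count, the percentage of all cases, the percentage of non-missing cases, and the running total of the latter. The function name `frequency_table`, the rule that negative codes are missing, and the codes in `codes` are assumptions made for the example, and the sketch ignores weights for simplicity.

```python
from collections import Counter


def frequency_table(values):
    """Return rows of (value, frequency, percent, valid percent, cumulative percent)."""
    # Assumption: negative codes mark missing data and are left out of the rows.
    valid = [v for v in values if v >= 0]
    counts = Counter(valid)

    rows = []
    cumulative = 0.0
    for value in sorted(counts):
        freq = counts[value]
        percent = 100 * freq / len(values)       # share of all cases
        valid_percent = 100 * freq / len(valid)  # share of non-missing cases
        cumulative += valid_percent              # this value or lower
        rows.append((value, freq, percent, valid_percent, cumulative))
    return rows


# Invented category codes for ten respondents, one of them missing (-9).
codes = [1, 1, 2, 3, 2, 1, -9, 2, 3, 1]

for row in frequency_table(codes):
    print(row)
```

Because one case is missing, the percentages of all cases do not add up to 100 across the rows, while the valid percentages do.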

A frequency table (like that shown below) and graph for each variable should appear in an Output window, which is separate from the data set. You can save your output as files separate to the data set. The columns in the frequency tables are self-explanatory. The first column contains counts of the number of cases observed with that value (frequency) The second contains that number expressed as a percentage of all cases in the data (percent) The third contains that number expressed as a percentage of all cases in the data with nonmissing values (valid percent) The final column contains the cumulative percentage (those which have the same or a lower value). Questions 5. Which of the categories of BMI grouped has the highest percentage of individuals? 6. What percentage of respondents only had energy to spare rarely or never? (See cumulative percentage in frequency table). 7. What percentage of respondents had at least good health? (see cumulative percentages in frequency table). Now we will explore the frequency distribution of the continuous measure of BMI (BMIVAL2). We do not want to produce frequency tables or bar charts for these variables as there would be too many cells. Instead we can use descriptive statistics (e.g. mean) and histograms. g. From the drop-down menus select Analyze > Descriptive Statistics > Frequencies. h. In the frequencies dialogue box select the variable BMIVAL2 from the list on the left and move it into the right-hand box. i. Make sure Display frequency tables is unchecked (we do not want to produce a frequency table as it would be very large) j. Click on Statistics and under Central Tendency select Mean 14

k. Click Continue.
l. Click Charts and select Histograms under Chart Type.
m. Click Continue and OK.

4. What is the mean BMI?
5. What can we learn from the histogram about the way in which the BMI data is distributed?

NB: Note that there are fewer missing cases on this measure of BMI than on the grouped variable BMIVG3. This is because the continuous measure of BMI includes respondents of all ages, but the grouped variable excludes those aged less than 16.

4. Bivariate analysis: Exploring associations with BMI

Now we will look at whether BMI is associated with our selected variables. Again, the way we do this depends upon the type of variables we are using (categorical or continuous). Here we will focus on the associations between pairs of categorical variables by using cross-tabs. We will look at the association between BMIVG3 and each of the other categorical variables (SEX, AG16G10, GENHELF4, PAIN, ANXIETY, ENERGY). We will start with the age and sex variables. Do this as follows:

a. Select from the drop-down menus Analyze > Descriptive Statistics > Crosstabs.
b. As we have done before, we need to select which variables we want to include in our table. However, as we are selecting two variables for this table we need to choose which will go in the rows and which will go in the columns. This time, select BMIVG3 for the rows (again select the variable from the list on the left, and click the arrow to move it into the Row(s) box). Now put the age and sex variables (SEX, AG16G10) in the Column(s) box. This will create two cross-tabs: one for sex by BMI and one for age by BMI.
c. Click on Cells on the right-hand side of the dialogue box.
d. Under Percentages click Column and click Continue.

As a general rule, when you are creating a cross-tab, put the independent variable in the columns and the dependent variable in the rows, and then select percentages for the columns.
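The two menu sequences above (the mean and histogram for BMIVAL2, and the cross-tabs in steps a-d) can also be expressed as SPSS Syntax, which section 5 introduces. The commands below are a sketch of what Paste typically generates for these selections. They assume the variable names used in this worksheet, and the default subcommands may vary between SPSS versions.

```spss
* Mean and histogram for continuous BMI, with the frequency table suppressed.
FREQUENCIES VARIABLES=bmival2
  /FORMAT=NOTABLE
  /STATISTICS=MEAN
  /HISTOGRAM
  /ORDER=ANALYSIS.

* Cross-tabs of grouped BMI in the rows by sex and by age band in the columns.
* COLUMN requests column percentages alongside the counts.
CROSSTABS
  /TABLES=bmivg3 BY sex ag16g10
  /FORMAT=AVALUE TABLES
  /CELLS=COUNT COLUMN
  /COUNT ROUND CELL.
```

In the CROSSTABS command the variable before BY goes in the rows and the variables after BY go in the columns, so listing two column variables yields two separate tables.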

Selecting the independent and dependent variable in bivariate analysis

When looking at the relationship between two variables there will typically be one independent variable and one dependent variable. The independent variable is the variable you think is affecting the dependent variable. It will be the more fixed of the two variables, e.g. age (which a person cannot change). As age is more fixed than BMI, it is more likely that a person's age will be affecting their BMI rather than their BMI affecting their age! Some relationships are less clear, such as the relationship between BMI and depression. In this case, each might affect the other and you might want to test the relationship both ways in a cross-tab.

Questions
1. Looking at the cross-tab for sex and BMI, can you see a relationship between sex and BMI? E.g. do men or women look more likely to be overweight or obese?
2. Looking at the cross-tab for age and BMI, what tends to happen to the percentage of obese people amongst older age groups?

e. Using instructions a-d, now create cross-tabs for BMI and each measure of health/well-being (GENHELF4, PAIN, ANXIETY, ENERGY). Treat the health/well-being measures as the dependent variables, putting them in the rows of the cross-tabs and BMIVG3 in the columns.

3. Look at each of the cross-tabs in turn. What relationships do you observe?

The above are just suggestions of some of the many relationships you might want to explore with the HSE data. If you have time, try some other associations with different pairs of variables that you identify in the data set. Use the documentation to explore the full meanings of these different variables.

5. Syntax (optional extra)

If you are not already familiar with Syntax but think you might use SPSS frequently, you may want to work through this task. In conducting the analysis above, we used drop-down menus in SPSS. However, SPSS Syntax offers a more efficient way of issuing commands (e.g. adding weights, producing graphs, creating cross-tabs). Instead of using the drop-down menus, in SPSS Syntax you type commands in the Syntax Editor. Rather than memorising what all the different commands are, we can use the Paste function, which you might have seen when we were producing tables and graphs earlier. We will try this now by using Syntax to produce a one-way frequency distribution of GENHELF4.

a. Select Analyze > Descriptive Statistics > Frequencies from the drop-down menus just as you did earlier.

b. Select GENHELF4 from the list of variables on the left and move it into the variables list using the arrow icon.
c. Click on Charts and select Bar charts and Percentages (again as you did earlier) and click Continue.
d. Now back in the Frequencies dialogue box, instead of clicking OK as you did before (which would create the selected output), click Paste right next to it. This brings up the Syntax Editor with the command language necessary to produce the output you selected (frequency table and bar chart for GENHELF4).
e. We still do not have any output. To produce the output via the Syntax Editor, click the large green arrow in the tool bar.

This process so far seems less efficient than using the drop-down menus. However, it means we now have a record of the command language needed to create frequency tables and bar charts with percentages. We can therefore manually type into the Syntax Editor the names of other variables for which we might want to produce the same output. We will try this now.

f. In the Syntax Editor, type the names of the following variables next to genhelf4: energy pain anxiety bmivg3

g. Run this command using the large green arrow.

Now we see the potential for this process to be much quicker than using the drop-down menus. We will try the same process for producing cross-tabs.

h. Select Analyze > Descriptive Statistics > Crosstabs.
i. Put pain in the rows and bmivg3 in the columns (you could put any variables in there and change them manually in the Syntax Editor later).
j. Click on Cells and select Column under Percentages, then click Continue.
k. Click Paste.
l. Before running this command, create the command to produce a different cross-tab of energy BY genhelf4 by copying the command you have just produced, pasting it below and then changing the variable names.
m. Produce both cross-tabs by clicking on the large green arrow.

NB: To produce only one of the cross-tabs instead of both you can highlight the one you want and then click the green arrow (it will ignore the one not highlighted).

NB: If we were producing a second cross-tab where the column or row variable was the same as for the first cross-tab (i.e. pain was the row variable or bmivg3 was the column variable) we could have just added the new variable to the existing command. For example, if we wanted to produce two cross-tabs, one showing the relationship between pain and bmivg3 and the other showing the relationship between energy and bmivg3, the command could be written as follows:
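A sketch of such a command, assuming the layout that Paste typically generates for Crosstabs (the exact subcommands may differ by SPSS version), followed by the edited Frequencies command from step f for comparison:

```spss
* Two cross-tabs sharing the column variable bmivg3, one for pain and one for energy.
CROSSTABS
  /TABLES=pain energy BY bmivg3
  /FORMAT=AVALUE TABLES
  /CELLS=COUNT COLUMN
  /COUNT ROUND CELL.

* Step f for comparison, with the extra variable names typed after genhelf4.
FREQUENCIES VARIABLES=genhelf4 energy pain anxiety bmivg3
  /BARCHART PERCENT
  /ORDER=ANALYSIS.
```

Each command ends with a full stop. If it is left out, SPSS reads the following lines as part of the same command.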

You can use Syntax to do anything in SPSS. If you have time, try to use it to add different weights to the data. Remember, create the command using the drop-down menu (Data > Weight Cases) but select Paste rather than OK.

Syntax Editor files can be saved just like output files. These can act as a useful record of your analysis which you can edit as your ideas change.

Other Resources

As well as providing access to data, the UK Data Service provides many resources to help with different methods of data analysis. From the UK Data Service main page (www.ukdataservice.ac.uk) click Use Data near the top of the page to locate our wide range of guides and tutorials. Some useful ones are listed here:

https://www.ukdataservice.ac.uk/media/285227/weighting_2_1.pdf
https://www.ukdataservice.ac.uk/media/398743/complexsampleguide_1_2.pdf
https://www.ukdataservice.ac.uk/media/359156/whatisstata_8.pdf
https://www.ukdataservice.ac.uk/media/455362/changeovertime.pdf
https://www.ukdataservice.ac.uk/media/277533/whatarehierarchicalfiles.pdf
https://www.ukdataservice.ac.uk/media/342808/usingspssforwindows.pdf
https://www.ukdataservice.ac.uk/media/398726/usingr.pdf
https://www.ukdataservice.ac.uk/use-data/secondary-analysis/reusing-quantitative-data
https://www.ukdataservice.ac.uk/use-data/tutorials