Incluvie: Actor Data Collection Ada Gok, Dana Hochman, Lucy Zhan

Size: px
Start display at page:

Download "Incluvie: Actor Data Collection Ada Gok, Dana Hochman, Lucy Zhan"

Transcription

1 Incluvie: Actor Data Collection Ada Gok, Dana Hochman, Lucy Zhan Figure 0. Our partner company: Incluvie. 1. Project Task Incluvie is a platform that promotes and celebrates diversity within Hollywood. The platform allows movie viewers to voice their opinions on a movie regarding diversity and displays movie diversity ratings. Our project with Incluvie is focused on actors and actresses. Our task is to create a method that extracts specific data parameters related to gender and ethnic backgrounds of the actor. With our actor dataset, we continue to create data visualizations to illustrate the gender and cultural diversity in Hollywood. Incluvie will use our code and data visualizations to display diversity in Hollywood. In addition, we will use the dataset to analyze any possible correlation between the gender and race of an actor and his/her popularity. Furthermore, Incluvie will be able to use the datasets/code to compile actor profiles for its users. Utilizing this data, users can easily determine the kind of gender and cultural diversity stance a movie holds by viewing a movie s cast and the actors gender/ethnic information. 2. Approach Our main method of data extraction involves taking in an input text file. The input text file is a list of actors full names listed line-by-line. Given that there is not yet a full list of actors Incluvie wants to generate data for, we start with grabbing data for the top 100 actors in Hollywood. The goal is to first write the code for data extraction of the top 100 and have it automated for all actors in the future. We chose two lists to generate two datasets: Top 100 of the Highest Grossing Stars of at the Domestic Box Office and Top 100 on IMDB Starmeter 2. These lists are compiled to a text file in the same format as previously stated. After the method reads each text file, it parses it line-byline. For each line (each actor), multiple APIs and Python packages are used to search the web for the actor s ethnicity and gender. In this project, we are focusing on the TMDB API, Wikipedia API, Ethnicelebs and Beautiful Soup package. TMDB API takes in an actor s name and does a search for his/her TMDB page. It proceeds to retrieve data from the actor s TMDB page. The specific data parameters we retrieve are gender, birth place and popularity score 3. Next, Wikipedia API retrieves actor s Wikipedia page and the page content in HTML format. Using the Beautiful Soup package, we parse the HTML content to search for the Early Life section of the Wikipedia page (we recognize that the Early Life section generally 1 "Highest Grossing Stars of 2018 at the Domestic Box Office." 2 "IMDb Top Star Meter." user generated list 3 Popularity score is a score determined by TMDB based on the actor s number of views for the day and scores from previous days.

2 contains an actor s cultural background). We proceed to extract that section and search for keywords ( ethnicity, ancestral, descent, ancestry, mother, father, mom, dad, parents). Within the same sentence the keywords appear, we search for ethnicity terms. We have a list of common ethnicities in a csv file that can be converted into a python list using ethnicities_list.py. This is our python code that specifically parses a csv file (in the format of our ethnicities.csv file) and converts it into a python list. We use this ethnicities list to do our search for ethnicity titles. The final ethnicity titles get saved as ethnicity list for the actor. Ethnicelebs is a website that allows users to submit specific actors and data regarding their ethnicities. A majority of our list of actors are found on this site. Although Ethnicelebs does not have an API, we can parse the HTML code of the actor s page using Beautiful Soup to find ethnicities of the actor. Ethnicelebs has a specific section that lists out the ethnicities of the actors therefore, we retrieve this section this string and clean it up to save only the ethnicity titles. We clean up the line by searching for the titles matching those in our ethnicities_list.py list. The algorithm is very similar to the one used for Wikipedia. Once we finish the data extracting process, we store it all in a data frame and export the data frame as an excel sheet. We then proceed to generate data visualization to showcase the gender and cultural distribution of Hollywood using the Python library Matplotlib, which provides graphing abilities. With this library, we were able to generate an assortment of bar charts, pie charts and histograms (see the resulting figures in the graph appendix). In addition to graph visualizations, we want to analyze any correlations between gender and race with the popularity of actor (based on TMDB popularity score). We plan to find the correlation by testing through decision trees. We create an algorithm using decision trees that classifies an actor's popularity based on the actor's race and gender. The decision tree takes in a data frame of gender and race for all our actors (represented as binary variables {0,1}). Using this data, the tree predicts whether an actor s popularity score falls within the range of low, medium, or high. We split the actor s popularity into these three categories by splitting the popularity range (difference between lowest and highest score) into thirds. The decision tree visual (see graph appendix Figure 27) follows the path our algorithm takes in computing the popularity score. The decision tree trains 80% of the data and tests on 20%. The goal is to see how accurately a decision tree can classify an actor s popularity score based solely on race and gender. 3. Dataset and Metric For this project we have two final excel files of data: topactors_data and topimdb_data. The file alldata.xlsx contains the combined data of both. Each file represents data for a different list of corresponding actors (Top 100 of the Highest Grossing Stars of 2018 at the Domestic Box Office 4 as topactors.txt and Top 100 on IMDB Starmeter 5 as topimdb.txt, respectively) 6. A sample of the excel sheets are attached at the graph appendix (refer to figure 1-4). 4 "Highest Grossing Stars of 2018 at the Domestic Box Office." 5 "IMDb Top Star Meter." user generated list 6 From our analysis, IMDB star meter seems to focus more on the recently popular stars

3 The final excel sheets not only contain the data parameters that were extracted (gender, birth place, ethnicity list from Wikipedia, ethnicity list from Ethnicelebs, and popularity score), but also additional columns: percent similar, percent difference, race (sourced) and race (gut check). Percent similar/different are parameters for cross-checking the error between the ethnicities list generated from Wikipedia and Ethnicelebs. We calculate the value for percent similar by finding the number of ethnicity titles that match in both the Wikipedia and Ethnicelebs list and dividing it by the number of total unique ethnicity titles in both lists (for example, if Wikipedia and Ethnicelebs match on 3 out of 4 total ethnicities for one actor, this would be 3/4 = 75% similarity). The percent different parameter is 100% - percent similarity. This parameter represents how many of the ethnicity titles are not found in both lists. Race (sourced) represents the actor s race based on the actors ethnicities on Wikipedia and Ethnicelebs. Race (gut check) represents the actor s race determined by guessing 7. The reason we incorporate both race (sourced) and race (gut check) is to cross-check the data to ensure accuracy. Using the data, we generate multiple graphs relating to gender, popularity, ethnicity and race (e.g. gender histogram, popularity distribution histogram, birth country distribution histogram, etc.), which can be found in the graph appendix (refer to figure 5-26). These visualizations display the data to allow the viewer to easily draw conclusions. In order to also utilize our data, we created a decision tree to classify an actor s popularity based on gender and race. 4. Disclaimer/Assumptions Regarding this project and the methods, we want to make a few disclaimers and clarify some assumptions that were made. First, the actor dataset we are providing is made of data grabbed from three sites: TMDB, Wikipedia, and Ethnicelebs. TMDB is a database similar to that of IMDB and so the assumption is that the data is relatively accurate. Wikipedia and Ethniceleb pages consist of data that is edited and compiled by users, so the accuracy and reliability is questionable. However, we assume Wikipedia and Ethnicelebs users cross-check this data as they make edits and thus, it can be reliable. Furthermore, extracting data from both Wikipedia and Ethnicelebs helps validate the ethnicities. Additionally, our percent similarity/different parameters provide a value that can be a status of reliability. Second, we want to make the claim that not all data is available online, so there is missing data in our datasets. Third, regarding the data we added manually (e.g. race -- sourced and gut check), we labeled actors as Latino(a). We are aware that people of Hispanic origin do not classify Latino(a) as their race, but it was a specific factor we found necessary to clarify. Fourth, in reference to our ethnicities list generated from Wikipedia and Ethnicelebs, we are aware that certain titles (e.g. American, African American, and European ) are not ethnicity terms. However, Wikipedia and 7 We understand some bias could be involved in this process.

4 Ethnicelebs both list these as ethnicities and we felt it was important to include them so as to not lose data. This disclaimer is mirrored for the figures and figures in the graph appendix. 5. Results Referring to our gender bar graphs (figure 5 and figure 15 in the graph appendix), we can make specific conclusions regarding the gender representation in Hollywood. Figure 5 is the gender split amongst the Highest Grossing Stars of 2018 at the Domestic Box Office 8 and figure 16 is the gender split amongst the top Top 100 on IMDB Starmeter 9 (which we observe to be the current actors rising in fame ). We conclude that the highestgrossing stars have a 60:40 male to female ratio (figure 5), but the rising stars have a lower male to female ratio (figure 15). Furthermore, we analyze graphs regarding ethnicity distribution amongst all actors. Figure 10 and 11 (refer to graph appendix) are examples of the ethnicity distribution among the Highest Grossing Stars of 2018 at the Domestic Box Office 10. Observing these graphs, we can conclude that the majority of our actors are of European or American descent. Likewise, we analyze nationality distribution among the actors. Figure 9 of the graph appendix shows a nationality distribution stack bar chart by gender. The bar chart does not show any significant takeaways regarding gender and cultural representation in Hollywood. Most of the actors are from the US and within the actors with US nationality, majority are also male. Additionally, we create some graphs regarding race (sourced) shown in figure 15 and 26 of the graph appendix. These graphs continue to show that the white race overthrows all other races in terms of representation in Hollywood (like the ethnicity graphs). To visually demonstrate where actors are born, we created a geographical heat map to showcase where most actors in Hollywood come from. This is also a nice interactive addition for the Incluvie website (refer to figure 28 in graph appendix). For the decision tree, we run all 200 data points over it. The resulting accuracy is very high: 90% accuracy. 6. Conclusion As we wrap up our project for Incluvie, we want to reflect on the project, the challenges and the learning experience. Gathering data was one of the bigger challenges. It took time to find APIs that retrieved data we were looking for and figuring out how to parse through information on webpages. This was especially true with parsing through Wikipedia, where we had to continually expand our keywords list in order to find the ethnicities mentioned within paragraphs of text. However, through our data extracting, we learned how to effectively utilize APIs and the best methods for parsing through webpages. One concern that still remains is accuracy, as our data is not complete and is reliant on websites with user inputs (ex. Wikipedia and Ethnicelebs). 8 Highest Grossing Stars of 2018 at the Domestic Box Office." 9 "IMDb Top Star Meter." user generated list 10 Highest Grossing Stars of 2018 at the Domestic Box Office."

5 Overall, from this project, we have some conclusions to make about representation in Hollywood. We found that the gender representation is definitely unbalanced. When we analyze the data for the top 100 grossing actors, we get a 60:40 ratio of male to female. However, we realize that the actors rising in popularity (top IMDB starmeter) has a gender ratio that leans towards females. In one way, this can be viewed as a good sign, as it shows that there is improvement in gender representation in Hollywood. However, it can also be viewed negatively because despite the growing popularity of female actors, they are still being paid less. In addition, we can make conclusions about the cultural representation. As showcased by both ethnicity and race graphs, the White race conquers the Hollywood industry. From the decision tree we were able to classify actor popularity score ranges with a 90% accuracy. A high accuracy generally infers a high correlation -- thus, we can conclude that there is a high correlation between popularity, and the actors race and gender. However, we must recognize that we have a small data set and most of our actors have scores in the low range. The fact that a majority of our top actors have low popularity scores also casts some doubt on the TMDB popularity score. It is somewhat unclear as to how it is calculated, and there is no rationale as to why the range of popularity scores among certain actors varies so widely. A metric such as the F1 score could pinpoint the balance between precision and recall of the results. This would tell us how accurately the model identifies and returns all relevant popularity scores of the actor dataset. Perhaps it would be best to use a different measurement of popularity for future analysis. 7. Future Work While working on this project, we believe there are parts that can be improved in the future. As we ran the code, we believe there are parts that could be refactored to improve efficiency. The current code for our data extraction is not very time efficient. This could be the cause of parsing through long HTML codes. The run time does not affect our current work as the dataset only needs to be generated once. If we want to run it in the future for a larger set of actors and do so more regularly, it will be very time consuming. Therefore, it is necessary to optimize the run time. In addition, we wish to fix the code for data extraction and make it more dynamic. As of now, the code takes in a text file of a list of actor names. For future work, we want to find a reliable list of top 100 actors that is automatically updated daily/weekly. As the list updates, it would be ideal if we could automatically extract new data or use old data to update the graph visualizations for the 100 actors in Hollywood. Furthermore, we wish to find a more reliable source to gather a list of ethnicities for actors. Wikipedia and Ethnicelebs are both based on user input, which is not fully reliable. A more reliable list would not only increase the accuracy of our data, but also increase the amount of data, as we currently have some missing ethnicities for some actors. References 1. "Highest Grossing Stars of 2018 at the Domestic Box Office." The Numbers - Where Data and Movies Meet. Accessed December 03,

6 2. "IMDb Top Star Meter." IMDb. Accessed December 03,

7 Graph Appendix Figure 1. Data frame of Top 100 Highest Grossing Actors of 2018

8 Figure 2. Topactors_data.xlsx snippet

9 Figure 3. Data frame of Top 100 IMDB Starmeter Actor

10 Figure 4. TopIMDB_data.xlsx snippet

11 Figure 5. Gender Bar Chart for Top 100 Highest Grossing Actors of 2018

12 Figure 6. Popularity Distribution Histogram for Top 100 Highest Grossing Actors of 2018

13 Figure 7. Nationality Bar Chart for Top 100 Highest Grossing Actors of 2018

14 Figure 8. Nationality Distribution Pie Chart for Top 100 Highest Grossing Actors of 2018

15 Figure 9. Nationality Stack Bar Chart Split by Gender for Top 100 Highest Grossing Actors of 2018

16 Figure 10. Ethnicity Bar Chart for Top 100 Highest Grossing Actors of 2018

17 Figure 11. Ethnicity Distribution Pie Chart for Top 100 Highest Grossing Actors of 2018

18 Figure 12. Ethnicity Stacked Bar Chart by Gender for Top 100 Highest Grossing Actors of 2018

19 Figure 13. Nationality (U.S. States) Bar Chart for Top 100 Highest Grossing Actors of 2018

20 Figure 14. Nationality (U.S. States) Distribution Pie Chart for Top 100 Highest Grossing Actors of 2018

21 Figure 15. Race Bar Chart for Top 100 Highest Grossing Actors of 2018

22 Figure 16. Gender Bar Chart for Top 100 IMDB Starmeter Actors

23 Figure 17. Popularity Distribution Histogram for Top 100 IMDB Starmeter Actors

24 Figure 18. Nationality Bar Chart for Top 100 IMDB Starmeter Actors

25 Figure 19. Nationality Distribution Pie Chart for Top 100 IMDB Starmeter Actors

26 Figure 20. Nationality Stack Bar Chart Split by Gender for Top 100 IMDB Starmeter Actors

27 Figure 21. Ethnicity Bar Chart for Top 100 IMDB Starmeter Actors

28 Figure 22. Ethnicity Distribution Pie Chart for Top 100 IMDB Starmeter Actors

29 Figure 23. Ethnicity Stacked Bar Chart by Gender for Top 100 IMDB Starmeter Actors

30 Figure 24. Nationality (U.S. States) Bar Chart for Top 100 IMDB Starmeter Actors

31 Figure 25. Nationality (U.S. States) Distribution Pie Chart for Top 100 IMDB Starmeter Actors

32 Figure 26. Race Bar Chart for Top 100 IMDB Starmeter Actors

33 Figure 27. Decision Tree in regards to popularity

34 Figure 28. Snippet of Heat Map of the Birth Place of all data points

Texas Death Row. Last Statements. Data Warehousing and Data Mart. By Group 16. Irving Rodriguez Joseph Lai Joe Martinez

Texas Death Row. Last Statements. Data Warehousing and Data Mart. By Group 16. Irving Rodriguez Joseph Lai Joe Martinez Texas Death Row Last Statements Data Warehousing and Data Mart By Group 16 Irving Rodriguez Joseph Lai Joe Martinez Introduction For our data warehousing and data mart project we chose to use the Texas

More information

Predict the box office of US movies

Predict the box office of US movies Predict the box office of US movies Group members: Hanqing Ma, Jin Sun, Zeyu Zhang 1. Introduction Our task is to predict the box office of the upcoming movies using the properties of the movies, such

More information

Hybrid Recommendation System Using Clustering and Collaborative Filtering

Hybrid Recommendation System Using Clustering and Collaborative Filtering Hybrid Recommendation System Using Clustering and Collaborative Filtering Roshni Padate Assistant Professor roshni@frcrce.ac.in Priyanka Bane B.E. Student priyankabane56@gmail.com Jayesh Kudase B.E. Student

More information

5/13/2009. Introduction. Introduction. Introduction. Introduction. Introduction

5/13/2009. Introduction. Introduction. Introduction. Introduction. Introduction Applying Collaborative Filtering Techniques to Movie Search for Better Ranking and Browsing Seung-Taek Park and David M. Pennock (ACM SIGKDD 2007) Two types of technologies are widely used to overcome

More information

WellComm: A Speech and Language Toolkit for Screening and Intervention in the Early Years. Revised Edition Report Wizard: User s Guide

WellComm: A Speech and Language Toolkit for Screening and Intervention in the Early Years. Revised Edition Report Wizard: User s Guide WellComm: A Speech and Language Toolkit for Screening and Intervention in the Early Years. Revised Edition Report Wizard: User s Guide 1. Overview of the Report Wizard The Report Wizard allows WellComm

More information

Parents Names Mom Cell/Work # Dad Cell/Work # Parent List the Math Courses you have taken and the grade you received 1 st 2 nd 3 rd 4th

Parents Names Mom Cell/Work # Dad Cell/Work # Parent   List the Math Courses you have taken and the grade you received 1 st 2 nd 3 rd 4th Full Name Phone # Parents Names Birthday Mom Cell/Work # Dad Cell/Work # Parent email: Extracurricular Activities: List the Math Courses you have taken and the grade you received 1 st 2 nd 3 rd 4th Turn

More information

User Guide for the ILDS Illinois Higher Education Information System (HEIS)

User Guide for the ILDS Illinois Higher Education Information System (HEIS) User Guide for the ILDS Illinois Higher Education Information System (HEIS) Data Submission Portal https://www2.ibhe.org/web/lds-public/ Helpdesk 217-557-7384 IHEIS@ibhe.org Table of Contents Change History...

More information

CS229 Final Project: Predicting Expected Response Times

CS229 Final Project: Predicting Expected  Response Times CS229 Final Project: Predicting Expected Email Response Times Laura Cruz-Albrecht (lcruzalb), Kevin Khieu (kkhieu) December 15, 2017 1 Introduction Each day, countless emails are sent out, yet the time

More information

Jessica J. Schuler Department of Resource Analysis, Saint Mary s University of Minnesota, Minneapolis, MN 55404

Jessica J. Schuler Department of Resource Analysis, Saint Mary s University of Minnesota, Minneapolis, MN 55404 Metadata Management: Visualization of Data Relationships Jessica J. Schuler Department of Resource Analysis, Saint Mary s University of Minnesota, Minneapolis, MN 55404 Keywords: Geographic Information

More information

Contents. Introduction. CDI V3 Help Quick Search

Contents. Introduction. CDI V3 Help Quick Search Contents Introduction... 1 A The interface... 2 A-1 Introduction... 2 A-2. Results list... 2 A-3: Additional menu options... 3 B. Searching the CDI entries metadata... 4 B-1 Narrowing down your result

More information

PERTS Default Privacy Policy

PERTS Default Privacy Policy PERTS Default Privacy Policy Version 1.3 2017-07-15 About PERTS PERTS is a center at Stanford University that helps educators apply evidence-based strategies in order to increase student engagement and

More information

T&F MEET MAGAGER 4.0 NEW FEATURES. Released January 14, 2013 Updated February 20, Track & Field

T&F MEET MAGAGER 4.0 NEW FEATURES. Released January 14, 2013 Updated February 20, Track & Field T&F MEET MAGAGER 4.0 NEW FEATURES Released January 14, 2013 Updated February 20, 2013 1 ABOUT T&F MEET MANAGER 4.0 MEET MANAGER 4.0 for (TFMM) is HY-TEK's 5th generation of Meet Management software. Provides

More information

Age & Stage Structure: Elephant Model

Age & Stage Structure: Elephant Model POPULATION MODELS Age & Stage Structure: Elephant Model Terri Donovan recorded: January, 2010 Today we're going to be building an age-structured model for the elephant population. And this will be the

More information

Facebook Page Insights

Facebook Page Insights Facebook Product Guide for Facebook Page owners Businesses will be better in a connected world. That s why we connect 845M people and their friends to the things they care about, using social technologies

More information

Data Analyst Nanodegree Syllabus

Data Analyst Nanodegree Syllabus Data Analyst Nanodegree Syllabus Discover Insights from Data with Python, R, SQL, and Tableau Before You Start Prerequisites : In order to succeed in this program, we recommend having experience working

More information

WellComm: A Speech and Language Toolkit for Screening and Intervention in the Early Years. Revised Edition Report Wizard: User s Guide

WellComm: A Speech and Language Toolkit for Screening and Intervention in the Early Years. Revised Edition Report Wizard: User s Guide WellComm: A Speech and Language Toolkit for Screening and Intervention in the Early Years. Revised Edition Report Wizard: User s Guide 1. Overview of the Report Wizard The Report Wizard allows WellComm

More information

Family Map Client Specification

Family Map Client Specification Family Map Client Specification 1 Contents Contents... 2 Acknowledgements... 4 Introduction... 4 Purposes... 4 Family Map Client: A Quick Overview... 4 Activities... 5 Main Activity... 5 Login Fragment...

More information

Orange3 Data Fusion Documentation. Biolab

Orange3 Data Fusion Documentation. Biolab Biolab Mar 07, 2018 Widgets 1 IMDb Actors 1 2 Chaining 5 3 Completion Scoring 9 4 Fusion Graph 13 5 Latent Factors 17 6 Matrix Sampler 21 7 Mean Fuser 25 8 Movie Genres 29 9 Movie Ratings 33 10 Table

More information

Value of YouTube to the music industry Paper V Direct value to the industry

Value of YouTube to the music industry Paper V Direct value to the industry Value of YouTube to the music industry Paper V Direct value to the industry June 2017 RBB Economics 1 1 Introduction The music industry has undergone significant change over the past few years, with declining

More information

DATA STRUCTURE AND ALGORITHM USING PYTHON

DATA STRUCTURE AND ALGORITHM USING PYTHON DATA STRUCTURE AND ALGORITHM USING PYTHON Common Use Python Module II Peter Lo Pandas Data Structures and Data Analysis tools 2 What is Pandas? Pandas is an open-source Python library providing highperformance,

More information

Intermediate SPSS. If you have an SPSS dataset (*.sav), you can open it in the following way:

Intermediate SPSS. If you have an SPSS dataset (*.sav), you can open it in the following way: Center for Teaching, Research & Learning Research Support Group at the Social Science Research lab American University, Washington, D.C. http://www.american.edu/provost/ctrl/ 202-885-3862 Intermediate

More information

Family Map Server Specification

Family Map Server Specification Family Map Server Specification Acknowledgements The Family Map project was created by Jordan Wild. Thanks to Jordan for this significant contribution. Family Map Introduction Family Map is an application

More information

Hacking Airbnb s search rank algorithm

Hacking Airbnb s search rank algorithm TABLE OF CONTENTS Foreword... 2 Introduction... 2 Methodology... 2 Software used:... 2 Getting the data... 3 Getting an initial data set... 3 Diving Deeper... 4 Put it all together... 6 Results... 8 Data

More information

6 TOOLS FOR A COMPLETE MARKETING WORKFLOW

6 TOOLS FOR A COMPLETE MARKETING WORKFLOW 6 S FOR A COMPLETE MARKETING WORKFLOW 01 6 S FOR A COMPLETE MARKETING WORKFLOW FROM ALEXA DIFFICULTY DIFFICULTY MATRIX OVERLAP 6 S FOR A COMPLETE MARKETING WORKFLOW 02 INTRODUCTION Marketers use countless

More information

Facebook Page Insights

Facebook Page Insights Facebook Product Guide for Facebook Page owners Businesses will be better in a connected world. That s why we connect 800M people and their friends to the things they care about, using social technologies

More information

The WellComm Report Wizard Guidance and Information

The WellComm Report Wizard Guidance and Information The WellComm Report Wizard Guidance and Information About Testwise Testwise is the powerful online testing platform developed by GL Assessment to host its digital tests. Many of GL Assessment s tests are

More information

Empirical Study on Impact of Developer Collaboration on Source Code

Empirical Study on Impact of Developer Collaboration on Source Code Empirical Study on Impact of Developer Collaboration on Source Code Akshay Chopra University of Waterloo Waterloo, Ontario a22chopr@uwaterloo.ca Parul Verma University of Waterloo Waterloo, Ontario p7verma@uwaterloo.ca

More information

Help Guide DATA INTERACTION FOR PSSA /PASA CONTENTS

Help Guide DATA INTERACTION FOR PSSA /PASA CONTENTS Help Guide Help Guide DATA INTERACTION FOR PSSA /PASA 2015+ CONTENTS 1. Introduction... 4 1.1. Data Interaction Overview... 4 1.2. Technical Support... 4 2. Access... 4 2.1. Single Sign-On Accoutns...

More information

Bar Code File Preparation, Transmission & Validation Frequently Asked Questions

Bar Code File Preparation, Transmission & Validation Frequently Asked Questions Bar Code File Preparation, Transmission & Validation Frequently Asked Questions 2017-2018 Updates The Department of Education requires bar code labels for all students taking the Iowa Assessments. To comply

More information

Chapter 2 Assignment (due Thursday, April 19)

Chapter 2 Assignment (due Thursday, April 19) (due Thursday, April 19) Introduction: The purpose of this assignment is to analyze data sets by creating histograms and scatterplots. You will use the STATDISK program for both. Therefore, you should

More information

Chapter 3 Analyzing Normal Quantitative Data

Chapter 3 Analyzing Normal Quantitative Data Chapter 3 Analyzing Normal Quantitative Data Introduction: In chapters 1 and 2, we focused on analyzing categorical data and exploring relationships between categorical data sets. We will now be doing

More information

74 Wyner Math Academy I Spring 2016

74 Wyner Math Academy I Spring 2016 74 Wyner Math Academy I Spring 2016 CHAPTER EIGHT: SPREADSHEETS Review April 18 Test April 25 Spreadsheets are an extremely useful and versatile tool. Some basic knowledge allows many basic tasks to be

More information

Heuristic Evaluation of [Pass It On]

Heuristic Evaluation of [Pass It On] Heuristic Evaluation of [Pass It On] Evaluator #A: Janette Evaluator #B: John Evaluator #C: Pascal Evaluator #D: Eric 1. Problem Pass It On aims to transform some of the numerous negative and stressful

More information

2013 Association Marketing Benchmark Report

2013 Association  Marketing Benchmark Report 2013 Association Email Marketing Benchmark Report Part I: Key Metrics 1 TABLE of CONTENTS About Informz.... 3 Introduction.... 4 Key Findings.... 5 Overall Association Metrics... 6 Results by Country of

More information

Module 9 Kelsie Donaldson Casey Boland Nitish Pahwa. IMDb, August 13th, 2002

Module 9 Kelsie Donaldson Casey Boland Nitish Pahwa. IMDb, August 13th, 2002 Module 9 Kelsie Donaldson Casey Boland Nitish Pahwa IMDb, August 13th, 2002 IMDb.com Sitemap Landing Page Navigation bar News Forums Awards Movies TV Box office Search bar Header Site title logo Footer

More information

Introduction to Nesstar

Introduction to Nesstar Introduction to Nesstar Nesstar is a software system for online data analysis. It is available for use with many of the large UK surveys on the UK Data Service website. You will know whether you can use

More information

Python & Web Mining. Lecture Old Dominion University. Department of Computer Science CS 495 Fall 2012

Python & Web Mining. Lecture Old Dominion University. Department of Computer Science CS 495 Fall 2012 Python & Web Mining Lecture 6 10-10-12 Old Dominion University Department of Computer Science CS 495 Fall 2012 Hany SalahEldeen Khalil hany@cs.odu.edu Scenario So what did Professor X do when he wanted

More information

Now that you ve got about 15 seed words, let s go examine the Keyword Planner tool.

Now that you ve got about 15 seed words, let s go examine the Keyword Planner tool. Googles New Keyword Planner Googles New Keyword Planner tool is a great way to research keywords and come up with new ideas for your website content. While you can use it for both Adwords and SEO, you

More information

1. Basic Steps for Data Analysis Data Editor. 2.4.To create a new SPSS file

1. Basic Steps for Data Analysis Data Editor. 2.4.To create a new SPSS file 1 SPSS Guide 2009 Content 1. Basic Steps for Data Analysis. 3 2. Data Editor. 2.4.To create a new SPSS file 3 4 3. Data Analysis/ Frequencies. 5 4. Recoding the variable into classes.. 5 5. Data Analysis/

More information

IBM MANY EYES USABILITY STUDY

IBM MANY EYES USABILITY STUDY IBM MANY EYES USABILITY STUDY Team Six: Rod Myers Dane Petersen Jay Steele Joe Wilkerson December 2, 2008 I543 Interaction Design Methods Fall 2008 Dr. Shaowen Bardzell EXECUTIVE SUMMARY We asked three

More information

Chapter 2 - Graphical Summaries of Data

Chapter 2 - Graphical Summaries of Data Chapter 2 - Graphical Summaries of Data Data recorded in the sequence in which they are collected and before they are processed or ranked are called raw data. Raw data is often difficult to make sense

More information

In-class activities: Sep 25, 2017

In-class activities: Sep 25, 2017 In-class activities: Sep 25, 2017 Activities and group work this week function the same way as our previous activity. We recommend that you continue working with the same 3-person group. We suggest that

More information

8.4 Organizing Data In Excel. MDM - Term 3 Statistics

8.4 Organizing Data In Excel. MDM - Term 3 Statistics 8.4 Organizing Data In Excel MDM - Term 3 Statistics WARM UP Log in to your computer Open Microsoft Excel Open a new sheet OBJECTIVES Enter data into Excel Use Excel to create a pie chart and personalize

More information

Excel Tips and FAQs - MS 2010

Excel Tips and FAQs - MS 2010 BIOL 211D Excel Tips and FAQs - MS 2010 Remember to save frequently! Part I. Managing and Summarizing Data NOTE IN EXCEL 2010, THERE ARE A NUMBER OF WAYS TO DO THE CORRECT THING! FAQ1: How do I sort my

More information

Things you ll know (or know better to watch out for!) when you leave in December: 1. What you can and cannot infer from graphs.

Things you ll know (or know better to watch out for!) when you leave in December: 1. What you can and cannot infer from graphs. 1 2 Things you ll know (or know better to watch out for!) when you leave in December: 1. What you can and cannot infer from graphs. 2. How to construct (in your head!) and interpret confidence intervals.

More information

Whitepaper. Dashboard Design Tips & Tricks.

Whitepaper. Dashboard Design Tips & Tricks. Whitepaper Dashboard Design Tips & Tricks Introduction Dashboards are popular but, let s be honest, many of them are useless. Either because they are visually boring or because zealous design too often

More information

Website Designs Australia

Website Designs Australia Proudly Brought To You By: Website Designs Australia Contents Disclaimer... 4 Why Your Local Business Needs Google Plus... 5 1 How Google Plus Can Improve Your Search Engine Rankings... 6 1. Google Search

More information

Part 1: Collecting and visualizing The Movie DB (TMDb) data

Part 1: Collecting and visualizing The Movie DB (TMDb) data CSE6242 / CX4242: Data and Visual Analytics Georgia Tech Fall 2015 Homework 1: Analyzing The Movie DB dataset; SQLite; D3 Warmup; OpenRefine Due: Friday, 11 September, 2015, 11:55PM EST Prepared by Meera

More information

Survey Questions and Methodology

Survey Questions and Methodology Survey Questions and Methodology Winter Tracking Survey 2012 Final Topline 02/22/2012 Data for January 20 February 19, 2012 Princeton Survey Research Associates International for the Pew Research Center

More information

CS140 Final Project. Nathan Crandall, Dane Pitkin, Introduction:

CS140 Final Project. Nathan Crandall, Dane Pitkin, Introduction: Nathan Crandall, 3970001 Dane Pitkin, 4085726 CS140 Final Project Introduction: Our goal was to parallelize the Breadth-first search algorithm using Cilk++. This algorithm works by starting at an initial

More information

Assignment 2: Processing New York City Open Data

Assignment 2: Processing New York City Open Data Assignment 2: Processing New York City Open Data Due date: Sept. 28, 11:59PM EST. 1 Summary This assignment is designed to expose you to the use of open data. Wikipedia has a good description of open data:

More information

Charts Data Importer

Charts Data Importer Charts Data Importer Plugin for Joomla! Editor extension for Aimy Charts This manual documents version 3.0.x of the Joomla! extension. http://www.aimy-extensions.com/joomla/charts-data-importer.html 1

More information

Automation.

Automation. Automation www.austech.edu.au WHAT IS AUTOMATION? Automation testing is a technique uses an application to implement entire life cycle of the software in less time and provides efficiency and effectiveness

More information

Female Brown Bear Weights

Female Brown Bear Weights CC-20 Normal Distributions Common Core State Standards MACC.92.S-ID..4 Use the mean and standard of a data set to fit it to a normal distribution and to estimate population percentages. Recognize that

More information

DADC Submission Instructions

DADC Submission Instructions System Purpose: The purpose of the Designated Agency Data Collection (DADC) is for each designated agency (DA) to report candidate information to CDE. Pursuant to state statute, CRS 22-2112(q)(I), CDE

More information

Identifying Important Communications

Identifying Important Communications Identifying Important Communications Aaron Jaffey ajaffey@stanford.edu Akifumi Kobashi akobashi@stanford.edu Abstract As we move towards a society increasingly dependent on electronic communication, our

More information

Improving Stack Overflow Tag Prediction Using Eye Tracking Alina Lazar Youngstown State University Bonita Sharif, Youngstown State University

Improving Stack Overflow Tag Prediction Using Eye Tracking Alina Lazar Youngstown State University Bonita Sharif, Youngstown State University Improving Stack Overflow Tag Prediction Using Eye Tracking Alina Lazar, Youngstown State University Bonita Sharif, Youngstown State University Jenna Wise, Youngstown State University Alyssa Pawluk, Youngstown

More information

How Often and What StackOverflow Posts Do Developers Reference in Their GitHub Projects?

How Often and What StackOverflow Posts Do Developers Reference in Their GitHub Projects? How Often and What StackOverflow Posts Do Developers Reference in Their GitHub Projects? Saraj Singh Manes School of Computer Science Carleton University Ottawa, Canada sarajmanes@cmail.carleton.ca Olga

More information

Surfing the Web Student Response

Surfing the Web Student Response Surfing the Web Student Response CT.B.1.4.1 The student uses efficient search methods to locate information. Name: Date: Locate the Websites for the following information and give the complete URL address.

More information

The Rise of the Connected Viewer

The Rise of the Connected Viewer JULY 17, 2012 The Rise of the Connected Viewer 52% of adult cell owners use their phones while engaging with televised content; younger audiences are particularly active in these connected viewing experiences

More information

PDQ-Notes. Reynolds Farley. PDQ-Note 3 Displaying Your Results in the Expert Query Window

PDQ-Notes. Reynolds Farley. PDQ-Note 3 Displaying Your Results in the Expert Query Window PDQ-Notes Reynolds Farley PDQ-Note 3 Displaying Your Results in the Expert Query Window PDQ-Note 3 Displaying Your Results in the Expert Query Window Most of your options for configuring your query results

More information

Survey Questions and Methodology

Survey Questions and Methodology Survey Questions and Methodology Spring Tracking Survey 2012 Data for March 15 April 3, 2012 Princeton Survey Research Associates International for the Pew Research Center s Internet & American Life Project

More information

User Services Spring 2008 OBJECTIVES Introduction Getting Help Instructors

User Services Spring 2008 OBJECTIVES  Introduction Getting Help  Instructors User Services Spring 2008 OBJECTIVES Use the Data Editor of SPSS 15.0 to to import data. Recode existing variables and compute new variables Use SPSS utilities and options Conduct basic statistical tests.

More information

Fulbright Distinguished Awards in Teaching Program Partner Organization Application Manual. Institute of International Education

Fulbright Distinguished Awards in Teaching Program Partner Organization Application Manual. Institute of International Education Fulbright Distinguished Awards in Teaching Program Partner Organization Application Manual Institute of International Education 2015-2016 Contents International Application...3 Overview...3 Login for Applicants...3

More information

USER MANUAL BULK UPLOAD CONTENTS FEBRUARY 2014 INTRODUCTION 2 1. HOW DO I ACCESS THE BULK UPLOAD FACILITY? 3 2. BULK UPLOAD CODES 4

USER MANUAL BULK UPLOAD CONTENTS FEBRUARY 2014 INTRODUCTION 2 1. HOW DO I ACCESS THE BULK UPLOAD FACILITY? 3 2. BULK UPLOAD CODES 4 USER MANUAL BULK UPLOAD CONTENTS INTRODUCTION 2. HOW DO I ACCESS THE BULK UPLOAD FACILITY? 3 2. BULK UPLOAD CODES 4 3. BULK UPLOAD TEMPLATE - EXCEL FORMAT 5 4. BULK UPLOAD TEMPLATE - CSV FORMAT 0 5. ACW

More information

Yawkey Scholars Program for Massachusetts Residents

Yawkey Scholars Program for Massachusetts Residents Massachusetts Residents 2018-2019 Instructions & Pre-Application Checklist Thank you for your interest in the Yawkey Scholars Program. Please read these instructions carefully before filling out your application.

More information

CHAPTER 2: DESCRIPTIVE STATISTICS Lecture Notes for Introductory Statistics 1. Daphne Skipper, Augusta University (2016)

CHAPTER 2: DESCRIPTIVE STATISTICS Lecture Notes for Introductory Statistics 1. Daphne Skipper, Augusta University (2016) CHAPTER 2: DESCRIPTIVE STATISTICS Lecture Notes for Introductory Statistics 1 Daphne Skipper, Augusta University (2016) 1. Stem-and-Leaf Graphs, Line Graphs, and Bar Graphs The distribution of data is

More information

CS231A Course Project Final Report Sign Language Recognition with Unsupervised Feature Learning

CS231A Course Project Final Report Sign Language Recognition with Unsupervised Feature Learning CS231A Course Project Final Report Sign Language Recognition with Unsupervised Feature Learning Justin Chen Stanford University justinkchen@stanford.edu Abstract This paper focuses on experimenting with

More information

Business Club. Decision Trees

Business Club. Decision Trees Business Club Decision Trees Business Club Analytics Team December 2017 Index 1. Motivation- A Case Study 2. The Trees a. What is a decision tree b. Representation 3. Regression v/s Classification 4. Building

More information

SOLOMON: Parentage Analysis 1. Corresponding author: Mark Christie

SOLOMON: Parentage Analysis 1. Corresponding author: Mark Christie SOLOMON: Parentage Analysis 1 Corresponding author: Mark Christie christim@science.oregonstate.edu SOLOMON: Parentage Analysis 2 Table of Contents: Installing SOLOMON on Windows/Linux Pg. 3 Installing

More information

Students Preferences for Receiving Communication from the University: A Report from the Student Life Survey

Students Preferences for Receiving Communication from the University: A Report from the Student Life Survey Preferences for Receiving Communication from the University: A Report from the Student Life Survey Center for the Study of Student Life March 2017 INTRODUCTION This report explores questions on the Student

More information

Data Science Essentials

Data Science Essentials Data Science Essentials Lab 6 Introduction to Machine Learning Overview In this lab, you will use Azure Machine Learning to train, evaluate, and publish a classification model, a regression model, and

More information

2013 Local Arts Agency Salary & Benefits Summary EXECUTIVE DIRECTOR / PRESIDENT / CEO

2013 Local Arts Agency Salary & Benefits Summary EXECUTIVE DIRECTOR / PRESIDENT / CEO Local Arts Agency Salary & Benefits Summary EXECUTIVE DIRECTOR / PRESIDENT / CEO PRIVATE LAAS ONLY PUBLIC LAAS ONLY The Executive Director / President / Chief Executive Officer (CEO) is the chief staff

More information

Graphical Presentation for Statistical Data (Relevant to AAT Examination Paper 4: Business Economics and Financial Mathematics) Introduction

Graphical Presentation for Statistical Data (Relevant to AAT Examination Paper 4: Business Economics and Financial Mathematics) Introduction Graphical Presentation for Statistical Data (Relevant to AAT Examination Paper 4: Business Economics and Financial Mathematics) Y O Lam, SCOPE, City University of Hong Kong Introduction The most convenient

More information

Classification of Procedurally Generated Textures

Classification of Procedurally Generated Textures Classification of Procedurally Generated Textures Emily Ye, Jason Rogers December 14, 2013 1 Introduction Textures are essential assets for 3D rendering, but they require a significant amount time and

More information

Data analysis using Microsoft Excel

Data analysis using Microsoft Excel Introduction to Statistics Statistics may be defined as the science of collection, organization presentation analysis and interpretation of numerical data from the logical analysis. 1.Collection of Data

More information

Describable Visual Attributes for Face Verification and Image Search

Describable Visual Attributes for Face Verification and Image Search Advanced Topics in Multimedia Analysis and Indexing, Spring 2011, NTU. 1 Describable Visual Attributes for Face Verification and Image Search Kumar, Berg, Belhumeur, Nayar. PAMI, 2011. Ryan Lei 2011/05/05

More information

File Size Distribution on UNIX Systems Then and Now

File Size Distribution on UNIX Systems Then and Now File Size Distribution on UNIX Systems Then and Now Andrew S. Tanenbaum, Jorrit N. Herder*, Herbert Bos Dept. of Computer Science Vrije Universiteit Amsterdam, The Netherlands {ast@cs.vu.nl, jnherder@cs.vu.nl,

More information

Didacticiel - Études de cas

Didacticiel - Études de cas Subject In some circumstances, the goal of the supervised learning is not to classify examples but rather to organize them in order to point up the most interesting individuals. For instance, in the direct

More information

Data to Story Project: SPSS Cheat Sheet for Analyzing General Social Survey Data

Data to Story Project: SPSS Cheat Sheet for Analyzing General Social Survey Data Data to Story Project: SPSS Cheat Sheet for Analyzing General Social Survey Data This guide is intended to help you explore and analyze the variables you have selected for your group project. Conducting

More information

About Intellipaat. About the Course. Why Take This Course?

About Intellipaat. About the Course. Why Take This Course? About Intellipaat Intellipaat is a fast growing professional training provider that is offering training in over 150 most sought-after tools and technologies. We have a learner base of 700,000 in over

More information

FRPG 189I The Craft of Acting Professor Matt Salzberg. Introduction to Library Research. Spring 2014

FRPG 189I The Craft of Acting Professor Matt Salzberg. Introduction to Library Research. Spring 2014 FRPG 189I The Craft of Acting Professor Matt Salzberg Introduction to Library Research Spring 2014 St. Lawrence University Libraries Homepage RESEARCH is front and center Two ways to search library holdings

More information

Principal Components Analysis with Spatial Data

Principal Components Analysis with Spatial Data Principal Components Analysis with Spatial Data A SpaceStat Software Tutorial Copyright 2013, BioMedware, Inc. (www.biomedware.com). All rights reserved. SpaceStat and BioMedware are trademarks of BioMedware,

More information

CS 229 Final Project - Using machine learning to enhance a collaborative filtering recommendation system for Yelp

CS 229 Final Project - Using machine learning to enhance a collaborative filtering recommendation system for Yelp CS 229 Final Project - Using machine learning to enhance a collaborative filtering recommendation system for Yelp Chris Guthrie Abstract In this paper I present my investigation of machine learning as

More information

HOW TO ACCESS ROTARY CLUB CENTRAL

HOW TO ACCESS ROTARY CLUB CENTRAL HOW TO ACCESS ROTARY CLUB CENTRAL 1 Go to My Rotary and select Sign In or Register. Or go to rotary.org/clubcentral to reach the site directly. You ll be prompted to sign in to My Rotary or create an account

More information

HOW TO ACCESS ROTARY CLUB CENTRAL

HOW TO ACCESS ROTARY CLUB CENTRAL HOW TO ACCESS ROTARY CLUB CENTRAL 1 Go to My Rotary and select Sign In or Register. Or go to rotary.org/clubcentral to reach the site directly. You ll be prompted to sign in to My Rotary or create an account

More information

Tree-based methods for classification and regression

Tree-based methods for classification and regression Tree-based methods for classification and regression Ryan Tibshirani Data Mining: 36-462/36-662 April 11 2013 Optional reading: ISL 8.1, ESL 9.2 1 Tree-based methods Tree-based based methods for predicting

More information

EXCELLING WITH ANALYSIS AND VISUALIZATION

EXCELLING WITH ANALYSIS AND VISUALIZATION EXCELLING WITH ANALYSIS AND VISUALIZATION A PRACTICAL GUIDE FOR DEALING WITH DATA Prepared by Ann K. Emery July 2016 Ann K. Emery 1 Welcome Hello there! In July 2016, I led two workshops Excel Basics for

More information

ClubSTATS USER GUIDE. Version 1.0. A window into the local community

ClubSTATS USER GUIDE. Version 1.0. A window into the local community A window into the local community USER GUIDE Version 1.0 July 2014 Contents 1 interface 3 1.1 Installation of the Tableau reader 3 1.2 Navigating the dashboards 3 1.3 Dashboards features and controls 4

More information

Concept Production. S ol Choi, Hua Fan, Tuyen Truong HCDE 411

Concept Production. S ol Choi, Hua Fan, Tuyen Truong HCDE 411 Concept Production S ol Choi, Hua Fan, Tuyen Truong HCDE 411 Travelust is a visualization that helps travelers budget their trips. It provides information about the costs of travel to the 79 most popular

More information

Name: Date: Period: Chapter 2. Section 1: Describing Location in a Distribution

Name: Date: Period: Chapter 2. Section 1: Describing Location in a Distribution Name: Date: Period: Chapter 2 Section 1: Describing Location in a Distribution Suppose you earned an 86 on a statistics quiz. The question is: should you be satisfied with this score? What if it is the

More information

How to Use the Cancer-Rates.Info/NJ

How to Use the Cancer-Rates.Info/NJ How to Use the Cancer-Rates.Info/NJ Web- Based Incidence and Mortality Mapping and Inquiry Tool to Obtain Statewide and County Cancer Statistics for New Jersey Cancer Incidence and Mortality Inquiry System

More information

Bank Manager. Bank Manager is an add on module that allows you to import your bank statements that you download from your internet banking.

Bank Manager. Bank Manager is an add on module that allows you to import your bank statements that you download from your internet banking. Overview is an add on module that allows you to import your bank statements that you download from your internet banking. No more manual capturing of bank statements. When the first bank statement is imported,

More information

APPLICATION OF SOFTMAX REGRESSION AND ITS VALIDATION FOR SPECTRAL-BASED LAND COVER MAPPING

APPLICATION OF SOFTMAX REGRESSION AND ITS VALIDATION FOR SPECTRAL-BASED LAND COVER MAPPING APPLICATION OF SOFTMAX REGRESSION AND ITS VALIDATION FOR SPECTRAL-BASED LAND COVER MAPPING J. Wolfe a, X. Jin a, T. Bahr b, N. Holzer b, * a Harris Corporation, Broomfield, Colorado, U.S.A. (jwolfe05,

More information

Let s get started with the module Getting Data from Existing Sources.

Let s get started with the module Getting Data from Existing Sources. Welcome to Data Academy. Data Academy is a series of online training modules to help Ryan White Grantees be more proficient in collecting, storing, and sharing their data. Let s get started with the module

More information

Family Map Server Specification

Family Map Server Specification Family Map Server Specification Acknowledgements Last Modified: January 5, 2018 The Family Map project was created by Jordan Wild. Thanks to Jordan for this significant contribution. Family Map Introduction

More information

Annex 6, Part 2: Data Exchange Format Requirements

Annex 6, Part 2: Data Exchange Format Requirements Annex 6, Part 2: Data Exchange Format Requirements Data Exchange Format Requirements Post-Event Response Export File Descriptions Please Note: Required data fields are marked. Data Exchange consists of

More information

Part 1: Written Questions (60 marks):

Part 1: Written Questions (60 marks): COMP 352: Data Structure and Algorithms Fall 2016 Department of Computer Science and Software Engineering Concordia University Combined Assignment #3 and #4 Due date and time: Sunday November 27 th 11:59:59

More information

Website Usability Study: The American Red Cross. Sarah Barth, Veronica McCoo, Katelyn McLimans, Alyssa Williams. University of Alabama

Website Usability Study: The American Red Cross. Sarah Barth, Veronica McCoo, Katelyn McLimans, Alyssa Williams. University of Alabama Website Usability Study: The American Red Cross Sarah Barth, Veronica McCoo, Katelyn McLimans, Alyssa Williams University of Alabama 1. The American Red Cross is part of the world's largest volunteer network

More information

Construct an optimal tree of one level

Construct an optimal tree of one level Economics 1660: Big Data PS 3: Trees Prof. Daniel Björkegren Poisonous Mushrooms, Continued A foodie friend wants to cook a dish with fresh collected mushrooms. However, he knows that some wild mushrooms

More information