Incluvie: Actor Data Collection Ada Gok, Dana Hochman, Lucy Zhan
|
|
- Candace Reeves
- 5 years ago
- Views:
Transcription
1 Incluvie: Actor Data Collection Ada Gok, Dana Hochman, Lucy Zhan Figure 0. Our partner company: Incluvie. 1. Project Task Incluvie is a platform that promotes and celebrates diversity within Hollywood. The platform allows movie viewers to voice their opinions on a movie regarding diversity and displays movie diversity ratings. Our project with Incluvie is focused on actors and actresses. Our task is to create a method that extracts specific data parameters related to gender and ethnic backgrounds of the actor. With our actor dataset, we continue to create data visualizations to illustrate the gender and cultural diversity in Hollywood. Incluvie will use our code and data visualizations to display diversity in Hollywood. In addition, we will use the dataset to analyze any possible correlation between the gender and race of an actor and his/her popularity. Furthermore, Incluvie will be able to use the datasets/code to compile actor profiles for its users. Utilizing this data, users can easily determine the kind of gender and cultural diversity stance a movie holds by viewing a movie s cast and the actors gender/ethnic information. 2. Approach Our main method of data extraction involves taking in an input text file. The input text file is a list of actors full names listed line-by-line. Given that there is not yet a full list of actors Incluvie wants to generate data for, we start with grabbing data for the top 100 actors in Hollywood. The goal is to first write the code for data extraction of the top 100 and have it automated for all actors in the future. We chose two lists to generate two datasets: Top 100 of the Highest Grossing Stars of at the Domestic Box Office and Top 100 on IMDB Starmeter 2. These lists are compiled to a text file in the same format as previously stated. After the method reads each text file, it parses it line-byline. For each line (each actor), multiple APIs and Python packages are used to search the web for the actor s ethnicity and gender. In this project, we are focusing on the TMDB API, Wikipedia API, Ethnicelebs and Beautiful Soup package. TMDB API takes in an actor s name and does a search for his/her TMDB page. It proceeds to retrieve data from the actor s TMDB page. The specific data parameters we retrieve are gender, birth place and popularity score 3. Next, Wikipedia API retrieves actor s Wikipedia page and the page content in HTML format. Using the Beautiful Soup package, we parse the HTML content to search for the Early Life section of the Wikipedia page (we recognize that the Early Life section generally 1 "Highest Grossing Stars of 2018 at the Domestic Box Office." 2 "IMDb Top Star Meter." user generated list 3 Popularity score is a score determined by TMDB based on the actor s number of views for the day and scores from previous days.
2 contains an actor s cultural background). We proceed to extract that section and search for keywords ( ethnicity, ancestral, descent, ancestry, mother, father, mom, dad, parents). Within the same sentence the keywords appear, we search for ethnicity terms. We have a list of common ethnicities in a csv file that can be converted into a python list using ethnicities_list.py. This is our python code that specifically parses a csv file (in the format of our ethnicities.csv file) and converts it into a python list. We use this ethnicities list to do our search for ethnicity titles. The final ethnicity titles get saved as ethnicity list for the actor. Ethnicelebs is a website that allows users to submit specific actors and data regarding their ethnicities. A majority of our list of actors are found on this site. Although Ethnicelebs does not have an API, we can parse the HTML code of the actor s page using Beautiful Soup to find ethnicities of the actor. Ethnicelebs has a specific section that lists out the ethnicities of the actors therefore, we retrieve this section this string and clean it up to save only the ethnicity titles. We clean up the line by searching for the titles matching those in our ethnicities_list.py list. The algorithm is very similar to the one used for Wikipedia. Once we finish the data extracting process, we store it all in a data frame and export the data frame as an excel sheet. We then proceed to generate data visualization to showcase the gender and cultural distribution of Hollywood using the Python library Matplotlib, which provides graphing abilities. With this library, we were able to generate an assortment of bar charts, pie charts and histograms (see the resulting figures in the graph appendix). In addition to graph visualizations, we want to analyze any correlations between gender and race with the popularity of actor (based on TMDB popularity score). We plan to find the correlation by testing through decision trees. We create an algorithm using decision trees that classifies an actor's popularity based on the actor's race and gender. The decision tree takes in a data frame of gender and race for all our actors (represented as binary variables {0,1}). Using this data, the tree predicts whether an actor s popularity score falls within the range of low, medium, or high. We split the actor s popularity into these three categories by splitting the popularity range (difference between lowest and highest score) into thirds. The decision tree visual (see graph appendix Figure 27) follows the path our algorithm takes in computing the popularity score. The decision tree trains 80% of the data and tests on 20%. The goal is to see how accurately a decision tree can classify an actor s popularity score based solely on race and gender. 3. Dataset and Metric For this project we have two final excel files of data: topactors_data and topimdb_data. The file alldata.xlsx contains the combined data of both. Each file represents data for a different list of corresponding actors (Top 100 of the Highest Grossing Stars of 2018 at the Domestic Box Office 4 as topactors.txt and Top 100 on IMDB Starmeter 5 as topimdb.txt, respectively) 6. A sample of the excel sheets are attached at the graph appendix (refer to figure 1-4). 4 "Highest Grossing Stars of 2018 at the Domestic Box Office." 5 "IMDb Top Star Meter." user generated list 6 From our analysis, IMDB star meter seems to focus more on the recently popular stars
3 The final excel sheets not only contain the data parameters that were extracted (gender, birth place, ethnicity list from Wikipedia, ethnicity list from Ethnicelebs, and popularity score), but also additional columns: percent similar, percent difference, race (sourced) and race (gut check). Percent similar/different are parameters for cross-checking the error between the ethnicities list generated from Wikipedia and Ethnicelebs. We calculate the value for percent similar by finding the number of ethnicity titles that match in both the Wikipedia and Ethnicelebs list and dividing it by the number of total unique ethnicity titles in both lists (for example, if Wikipedia and Ethnicelebs match on 3 out of 4 total ethnicities for one actor, this would be 3/4 = 75% similarity). The percent different parameter is 100% - percent similarity. This parameter represents how many of the ethnicity titles are not found in both lists. Race (sourced) represents the actor s race based on the actors ethnicities on Wikipedia and Ethnicelebs. Race (gut check) represents the actor s race determined by guessing 7. The reason we incorporate both race (sourced) and race (gut check) is to cross-check the data to ensure accuracy. Using the data, we generate multiple graphs relating to gender, popularity, ethnicity and race (e.g. gender histogram, popularity distribution histogram, birth country distribution histogram, etc.), which can be found in the graph appendix (refer to figure 5-26). These visualizations display the data to allow the viewer to easily draw conclusions. In order to also utilize our data, we created a decision tree to classify an actor s popularity based on gender and race. 4. Disclaimer/Assumptions Regarding this project and the methods, we want to make a few disclaimers and clarify some assumptions that were made. First, the actor dataset we are providing is made of data grabbed from three sites: TMDB, Wikipedia, and Ethnicelebs. TMDB is a database similar to that of IMDB and so the assumption is that the data is relatively accurate. Wikipedia and Ethniceleb pages consist of data that is edited and compiled by users, so the accuracy and reliability is questionable. However, we assume Wikipedia and Ethnicelebs users cross-check this data as they make edits and thus, it can be reliable. Furthermore, extracting data from both Wikipedia and Ethnicelebs helps validate the ethnicities. Additionally, our percent similarity/different parameters provide a value that can be a status of reliability. Second, we want to make the claim that not all data is available online, so there is missing data in our datasets. Third, regarding the data we added manually (e.g. race -- sourced and gut check), we labeled actors as Latino(a). We are aware that people of Hispanic origin do not classify Latino(a) as their race, but it was a specific factor we found necessary to clarify. Fourth, in reference to our ethnicities list generated from Wikipedia and Ethnicelebs, we are aware that certain titles (e.g. American, African American, and European ) are not ethnicity terms. However, Wikipedia and 7 We understand some bias could be involved in this process.
4 Ethnicelebs both list these as ethnicities and we felt it was important to include them so as to not lose data. This disclaimer is mirrored for the figures and figures in the graph appendix. 5. Results Referring to our gender bar graphs (figure 5 and figure 15 in the graph appendix), we can make specific conclusions regarding the gender representation in Hollywood. Figure 5 is the gender split amongst the Highest Grossing Stars of 2018 at the Domestic Box Office 8 and figure 16 is the gender split amongst the top Top 100 on IMDB Starmeter 9 (which we observe to be the current actors rising in fame ). We conclude that the highestgrossing stars have a 60:40 male to female ratio (figure 5), but the rising stars have a lower male to female ratio (figure 15). Furthermore, we analyze graphs regarding ethnicity distribution amongst all actors. Figure 10 and 11 (refer to graph appendix) are examples of the ethnicity distribution among the Highest Grossing Stars of 2018 at the Domestic Box Office 10. Observing these graphs, we can conclude that the majority of our actors are of European or American descent. Likewise, we analyze nationality distribution among the actors. Figure 9 of the graph appendix shows a nationality distribution stack bar chart by gender. The bar chart does not show any significant takeaways regarding gender and cultural representation in Hollywood. Most of the actors are from the US and within the actors with US nationality, majority are also male. Additionally, we create some graphs regarding race (sourced) shown in figure 15 and 26 of the graph appendix. These graphs continue to show that the white race overthrows all other races in terms of representation in Hollywood (like the ethnicity graphs). To visually demonstrate where actors are born, we created a geographical heat map to showcase where most actors in Hollywood come from. This is also a nice interactive addition for the Incluvie website (refer to figure 28 in graph appendix). For the decision tree, we run all 200 data points over it. The resulting accuracy is very high: 90% accuracy. 6. Conclusion As we wrap up our project for Incluvie, we want to reflect on the project, the challenges and the learning experience. Gathering data was one of the bigger challenges. It took time to find APIs that retrieved data we were looking for and figuring out how to parse through information on webpages. This was especially true with parsing through Wikipedia, where we had to continually expand our keywords list in order to find the ethnicities mentioned within paragraphs of text. However, through our data extracting, we learned how to effectively utilize APIs and the best methods for parsing through webpages. One concern that still remains is accuracy, as our data is not complete and is reliant on websites with user inputs (ex. Wikipedia and Ethnicelebs). 8 Highest Grossing Stars of 2018 at the Domestic Box Office." 9 "IMDb Top Star Meter." user generated list 10 Highest Grossing Stars of 2018 at the Domestic Box Office."
5 Overall, from this project, we have some conclusions to make about representation in Hollywood. We found that the gender representation is definitely unbalanced. When we analyze the data for the top 100 grossing actors, we get a 60:40 ratio of male to female. However, we realize that the actors rising in popularity (top IMDB starmeter) has a gender ratio that leans towards females. In one way, this can be viewed as a good sign, as it shows that there is improvement in gender representation in Hollywood. However, it can also be viewed negatively because despite the growing popularity of female actors, they are still being paid less. In addition, we can make conclusions about the cultural representation. As showcased by both ethnicity and race graphs, the White race conquers the Hollywood industry. From the decision tree we were able to classify actor popularity score ranges with a 90% accuracy. A high accuracy generally infers a high correlation -- thus, we can conclude that there is a high correlation between popularity, and the actors race and gender. However, we must recognize that we have a small data set and most of our actors have scores in the low range. The fact that a majority of our top actors have low popularity scores also casts some doubt on the TMDB popularity score. It is somewhat unclear as to how it is calculated, and there is no rationale as to why the range of popularity scores among certain actors varies so widely. A metric such as the F1 score could pinpoint the balance between precision and recall of the results. This would tell us how accurately the model identifies and returns all relevant popularity scores of the actor dataset. Perhaps it would be best to use a different measurement of popularity for future analysis. 7. Future Work While working on this project, we believe there are parts that can be improved in the future. As we ran the code, we believe there are parts that could be refactored to improve efficiency. The current code for our data extraction is not very time efficient. This could be the cause of parsing through long HTML codes. The run time does not affect our current work as the dataset only needs to be generated once. If we want to run it in the future for a larger set of actors and do so more regularly, it will be very time consuming. Therefore, it is necessary to optimize the run time. In addition, we wish to fix the code for data extraction and make it more dynamic. As of now, the code takes in a text file of a list of actor names. For future work, we want to find a reliable list of top 100 actors that is automatically updated daily/weekly. As the list updates, it would be ideal if we could automatically extract new data or use old data to update the graph visualizations for the 100 actors in Hollywood. Furthermore, we wish to find a more reliable source to gather a list of ethnicities for actors. Wikipedia and Ethnicelebs are both based on user input, which is not fully reliable. A more reliable list would not only increase the accuracy of our data, but also increase the amount of data, as we currently have some missing ethnicities for some actors. References 1. "Highest Grossing Stars of 2018 at the Domestic Box Office." The Numbers - Where Data and Movies Meet. Accessed December 03,
6 2. "IMDb Top Star Meter." IMDb. Accessed December 03,
7 Graph Appendix Figure 1. Data frame of Top 100 Highest Grossing Actors of 2018
8 Figure 2. Topactors_data.xlsx snippet
9 Figure 3. Data frame of Top 100 IMDB Starmeter Actor
10 Figure 4. TopIMDB_data.xlsx snippet
11 Figure 5. Gender Bar Chart for Top 100 Highest Grossing Actors of 2018
12 Figure 6. Popularity Distribution Histogram for Top 100 Highest Grossing Actors of 2018
13 Figure 7. Nationality Bar Chart for Top 100 Highest Grossing Actors of 2018
14 Figure 8. Nationality Distribution Pie Chart for Top 100 Highest Grossing Actors of 2018
15 Figure 9. Nationality Stack Bar Chart Split by Gender for Top 100 Highest Grossing Actors of 2018
16 Figure 10. Ethnicity Bar Chart for Top 100 Highest Grossing Actors of 2018
17 Figure 11. Ethnicity Distribution Pie Chart for Top 100 Highest Grossing Actors of 2018
18 Figure 12. Ethnicity Stacked Bar Chart by Gender for Top 100 Highest Grossing Actors of 2018
19 Figure 13. Nationality (U.S. States) Bar Chart for Top 100 Highest Grossing Actors of 2018
20 Figure 14. Nationality (U.S. States) Distribution Pie Chart for Top 100 Highest Grossing Actors of 2018
21 Figure 15. Race Bar Chart for Top 100 Highest Grossing Actors of 2018
22 Figure 16. Gender Bar Chart for Top 100 IMDB Starmeter Actors
23 Figure 17. Popularity Distribution Histogram for Top 100 IMDB Starmeter Actors
24 Figure 18. Nationality Bar Chart for Top 100 IMDB Starmeter Actors
25 Figure 19. Nationality Distribution Pie Chart for Top 100 IMDB Starmeter Actors
26 Figure 20. Nationality Stack Bar Chart Split by Gender for Top 100 IMDB Starmeter Actors
27 Figure 21. Ethnicity Bar Chart for Top 100 IMDB Starmeter Actors
28 Figure 22. Ethnicity Distribution Pie Chart for Top 100 IMDB Starmeter Actors
29 Figure 23. Ethnicity Stacked Bar Chart by Gender for Top 100 IMDB Starmeter Actors
30 Figure 24. Nationality (U.S. States) Bar Chart for Top 100 IMDB Starmeter Actors
31 Figure 25. Nationality (U.S. States) Distribution Pie Chart for Top 100 IMDB Starmeter Actors
32 Figure 26. Race Bar Chart for Top 100 IMDB Starmeter Actors
33 Figure 27. Decision Tree in regards to popularity
34 Figure 28. Snippet of Heat Map of the Birth Place of all data points
Texas Death Row. Last Statements. Data Warehousing and Data Mart. By Group 16. Irving Rodriguez Joseph Lai Joe Martinez
Texas Death Row Last Statements Data Warehousing and Data Mart By Group 16 Irving Rodriguez Joseph Lai Joe Martinez Introduction For our data warehousing and data mart project we chose to use the Texas
More informationPredict the box office of US movies
Predict the box office of US movies Group members: Hanqing Ma, Jin Sun, Zeyu Zhang 1. Introduction Our task is to predict the box office of the upcoming movies using the properties of the movies, such
More informationHybrid Recommendation System Using Clustering and Collaborative Filtering
Hybrid Recommendation System Using Clustering and Collaborative Filtering Roshni Padate Assistant Professor roshni@frcrce.ac.in Priyanka Bane B.E. Student priyankabane56@gmail.com Jayesh Kudase B.E. Student
More information5/13/2009. Introduction. Introduction. Introduction. Introduction. Introduction
Applying Collaborative Filtering Techniques to Movie Search for Better Ranking and Browsing Seung-Taek Park and David M. Pennock (ACM SIGKDD 2007) Two types of technologies are widely used to overcome
More informationWellComm: A Speech and Language Toolkit for Screening and Intervention in the Early Years. Revised Edition Report Wizard: User s Guide
WellComm: A Speech and Language Toolkit for Screening and Intervention in the Early Years. Revised Edition Report Wizard: User s Guide 1. Overview of the Report Wizard The Report Wizard allows WellComm
More informationParents Names Mom Cell/Work # Dad Cell/Work # Parent List the Math Courses you have taken and the grade you received 1 st 2 nd 3 rd 4th
Full Name Phone # Parents Names Birthday Mom Cell/Work # Dad Cell/Work # Parent email: Extracurricular Activities: List the Math Courses you have taken and the grade you received 1 st 2 nd 3 rd 4th Turn
More informationUser Guide for the ILDS Illinois Higher Education Information System (HEIS)
User Guide for the ILDS Illinois Higher Education Information System (HEIS) Data Submission Portal https://www2.ibhe.org/web/lds-public/ Helpdesk 217-557-7384 IHEIS@ibhe.org Table of Contents Change History...
More informationCS229 Final Project: Predicting Expected Response Times
CS229 Final Project: Predicting Expected Email Response Times Laura Cruz-Albrecht (lcruzalb), Kevin Khieu (kkhieu) December 15, 2017 1 Introduction Each day, countless emails are sent out, yet the time
More informationJessica J. Schuler Department of Resource Analysis, Saint Mary s University of Minnesota, Minneapolis, MN 55404
Metadata Management: Visualization of Data Relationships Jessica J. Schuler Department of Resource Analysis, Saint Mary s University of Minnesota, Minneapolis, MN 55404 Keywords: Geographic Information
More informationContents. Introduction. CDI V3 Help Quick Search
Contents Introduction... 1 A The interface... 2 A-1 Introduction... 2 A-2. Results list... 2 A-3: Additional menu options... 3 B. Searching the CDI entries metadata... 4 B-1 Narrowing down your result
More informationPERTS Default Privacy Policy
PERTS Default Privacy Policy Version 1.3 2017-07-15 About PERTS PERTS is a center at Stanford University that helps educators apply evidence-based strategies in order to increase student engagement and
More informationT&F MEET MAGAGER 4.0 NEW FEATURES. Released January 14, 2013 Updated February 20, Track & Field
T&F MEET MAGAGER 4.0 NEW FEATURES Released January 14, 2013 Updated February 20, 2013 1 ABOUT T&F MEET MANAGER 4.0 MEET MANAGER 4.0 for (TFMM) is HY-TEK's 5th generation of Meet Management software. Provides
More informationAge & Stage Structure: Elephant Model
POPULATION MODELS Age & Stage Structure: Elephant Model Terri Donovan recorded: January, 2010 Today we're going to be building an age-structured model for the elephant population. And this will be the
More informationFacebook Page Insights
Facebook Product Guide for Facebook Page owners Businesses will be better in a connected world. That s why we connect 845M people and their friends to the things they care about, using social technologies
More informationData Analyst Nanodegree Syllabus
Data Analyst Nanodegree Syllabus Discover Insights from Data with Python, R, SQL, and Tableau Before You Start Prerequisites : In order to succeed in this program, we recommend having experience working
More informationWellComm: A Speech and Language Toolkit for Screening and Intervention in the Early Years. Revised Edition Report Wizard: User s Guide
WellComm: A Speech and Language Toolkit for Screening and Intervention in the Early Years. Revised Edition Report Wizard: User s Guide 1. Overview of the Report Wizard The Report Wizard allows WellComm
More informationFamily Map Client Specification
Family Map Client Specification 1 Contents Contents... 2 Acknowledgements... 4 Introduction... 4 Purposes... 4 Family Map Client: A Quick Overview... 4 Activities... 5 Main Activity... 5 Login Fragment...
More informationOrange3 Data Fusion Documentation. Biolab
Biolab Mar 07, 2018 Widgets 1 IMDb Actors 1 2 Chaining 5 3 Completion Scoring 9 4 Fusion Graph 13 5 Latent Factors 17 6 Matrix Sampler 21 7 Mean Fuser 25 8 Movie Genres 29 9 Movie Ratings 33 10 Table
More informationValue of YouTube to the music industry Paper V Direct value to the industry
Value of YouTube to the music industry Paper V Direct value to the industry June 2017 RBB Economics 1 1 Introduction The music industry has undergone significant change over the past few years, with declining
More informationDATA STRUCTURE AND ALGORITHM USING PYTHON
DATA STRUCTURE AND ALGORITHM USING PYTHON Common Use Python Module II Peter Lo Pandas Data Structures and Data Analysis tools 2 What is Pandas? Pandas is an open-source Python library providing highperformance,
More informationIntermediate SPSS. If you have an SPSS dataset (*.sav), you can open it in the following way:
Center for Teaching, Research & Learning Research Support Group at the Social Science Research lab American University, Washington, D.C. http://www.american.edu/provost/ctrl/ 202-885-3862 Intermediate
More informationFamily Map Server Specification
Family Map Server Specification Acknowledgements The Family Map project was created by Jordan Wild. Thanks to Jordan for this significant contribution. Family Map Introduction Family Map is an application
More informationHacking Airbnb s search rank algorithm
TABLE OF CONTENTS Foreword... 2 Introduction... 2 Methodology... 2 Software used:... 2 Getting the data... 3 Getting an initial data set... 3 Diving Deeper... 4 Put it all together... 6 Results... 8 Data
More information6 TOOLS FOR A COMPLETE MARKETING WORKFLOW
6 S FOR A COMPLETE MARKETING WORKFLOW 01 6 S FOR A COMPLETE MARKETING WORKFLOW FROM ALEXA DIFFICULTY DIFFICULTY MATRIX OVERLAP 6 S FOR A COMPLETE MARKETING WORKFLOW 02 INTRODUCTION Marketers use countless
More informationFacebook Page Insights
Facebook Product Guide for Facebook Page owners Businesses will be better in a connected world. That s why we connect 800M people and their friends to the things they care about, using social technologies
More informationThe WellComm Report Wizard Guidance and Information
The WellComm Report Wizard Guidance and Information About Testwise Testwise is the powerful online testing platform developed by GL Assessment to host its digital tests. Many of GL Assessment s tests are
More informationEmpirical Study on Impact of Developer Collaboration on Source Code
Empirical Study on Impact of Developer Collaboration on Source Code Akshay Chopra University of Waterloo Waterloo, Ontario a22chopr@uwaterloo.ca Parul Verma University of Waterloo Waterloo, Ontario p7verma@uwaterloo.ca
More informationHelp Guide DATA INTERACTION FOR PSSA /PASA CONTENTS
Help Guide Help Guide DATA INTERACTION FOR PSSA /PASA 2015+ CONTENTS 1. Introduction... 4 1.1. Data Interaction Overview... 4 1.2. Technical Support... 4 2. Access... 4 2.1. Single Sign-On Accoutns...
More informationBar Code File Preparation, Transmission & Validation Frequently Asked Questions
Bar Code File Preparation, Transmission & Validation Frequently Asked Questions 2017-2018 Updates The Department of Education requires bar code labels for all students taking the Iowa Assessments. To comply
More informationChapter 2 Assignment (due Thursday, April 19)
(due Thursday, April 19) Introduction: The purpose of this assignment is to analyze data sets by creating histograms and scatterplots. You will use the STATDISK program for both. Therefore, you should
More informationChapter 3 Analyzing Normal Quantitative Data
Chapter 3 Analyzing Normal Quantitative Data Introduction: In chapters 1 and 2, we focused on analyzing categorical data and exploring relationships between categorical data sets. We will now be doing
More information74 Wyner Math Academy I Spring 2016
74 Wyner Math Academy I Spring 2016 CHAPTER EIGHT: SPREADSHEETS Review April 18 Test April 25 Spreadsheets are an extremely useful and versatile tool. Some basic knowledge allows many basic tasks to be
More informationHeuristic Evaluation of [Pass It On]
Heuristic Evaluation of [Pass It On] Evaluator #A: Janette Evaluator #B: John Evaluator #C: Pascal Evaluator #D: Eric 1. Problem Pass It On aims to transform some of the numerous negative and stressful
More information2013 Association Marketing Benchmark Report
2013 Association Email Marketing Benchmark Report Part I: Key Metrics 1 TABLE of CONTENTS About Informz.... 3 Introduction.... 4 Key Findings.... 5 Overall Association Metrics... 6 Results by Country of
More informationModule 9 Kelsie Donaldson Casey Boland Nitish Pahwa. IMDb, August 13th, 2002
Module 9 Kelsie Donaldson Casey Boland Nitish Pahwa IMDb, August 13th, 2002 IMDb.com Sitemap Landing Page Navigation bar News Forums Awards Movies TV Box office Search bar Header Site title logo Footer
More informationIntroduction to Nesstar
Introduction to Nesstar Nesstar is a software system for online data analysis. It is available for use with many of the large UK surveys on the UK Data Service website. You will know whether you can use
More informationPython & Web Mining. Lecture Old Dominion University. Department of Computer Science CS 495 Fall 2012
Python & Web Mining Lecture 6 10-10-12 Old Dominion University Department of Computer Science CS 495 Fall 2012 Hany SalahEldeen Khalil hany@cs.odu.edu Scenario So what did Professor X do when he wanted
More informationNow that you ve got about 15 seed words, let s go examine the Keyword Planner tool.
Googles New Keyword Planner Googles New Keyword Planner tool is a great way to research keywords and come up with new ideas for your website content. While you can use it for both Adwords and SEO, you
More information1. Basic Steps for Data Analysis Data Editor. 2.4.To create a new SPSS file
1 SPSS Guide 2009 Content 1. Basic Steps for Data Analysis. 3 2. Data Editor. 2.4.To create a new SPSS file 3 4 3. Data Analysis/ Frequencies. 5 4. Recoding the variable into classes.. 5 5. Data Analysis/
More informationIBM MANY EYES USABILITY STUDY
IBM MANY EYES USABILITY STUDY Team Six: Rod Myers Dane Petersen Jay Steele Joe Wilkerson December 2, 2008 I543 Interaction Design Methods Fall 2008 Dr. Shaowen Bardzell EXECUTIVE SUMMARY We asked three
More informationChapter 2 - Graphical Summaries of Data
Chapter 2 - Graphical Summaries of Data Data recorded in the sequence in which they are collected and before they are processed or ranked are called raw data. Raw data is often difficult to make sense
More informationIn-class activities: Sep 25, 2017
In-class activities: Sep 25, 2017 Activities and group work this week function the same way as our previous activity. We recommend that you continue working with the same 3-person group. We suggest that
More information8.4 Organizing Data In Excel. MDM - Term 3 Statistics
8.4 Organizing Data In Excel MDM - Term 3 Statistics WARM UP Log in to your computer Open Microsoft Excel Open a new sheet OBJECTIVES Enter data into Excel Use Excel to create a pie chart and personalize
More informationExcel Tips and FAQs - MS 2010
BIOL 211D Excel Tips and FAQs - MS 2010 Remember to save frequently! Part I. Managing and Summarizing Data NOTE IN EXCEL 2010, THERE ARE A NUMBER OF WAYS TO DO THE CORRECT THING! FAQ1: How do I sort my
More informationThings you ll know (or know better to watch out for!) when you leave in December: 1. What you can and cannot infer from graphs.
1 2 Things you ll know (or know better to watch out for!) when you leave in December: 1. What you can and cannot infer from graphs. 2. How to construct (in your head!) and interpret confidence intervals.
More informationWhitepaper. Dashboard Design Tips & Tricks.
Whitepaper Dashboard Design Tips & Tricks Introduction Dashboards are popular but, let s be honest, many of them are useless. Either because they are visually boring or because zealous design too often
More informationWebsite Designs Australia
Proudly Brought To You By: Website Designs Australia Contents Disclaimer... 4 Why Your Local Business Needs Google Plus... 5 1 How Google Plus Can Improve Your Search Engine Rankings... 6 1. Google Search
More informationPart 1: Collecting and visualizing The Movie DB (TMDb) data
CSE6242 / CX4242: Data and Visual Analytics Georgia Tech Fall 2015 Homework 1: Analyzing The Movie DB dataset; SQLite; D3 Warmup; OpenRefine Due: Friday, 11 September, 2015, 11:55PM EST Prepared by Meera
More informationSurvey Questions and Methodology
Survey Questions and Methodology Winter Tracking Survey 2012 Final Topline 02/22/2012 Data for January 20 February 19, 2012 Princeton Survey Research Associates International for the Pew Research Center
More informationCS140 Final Project. Nathan Crandall, Dane Pitkin, Introduction:
Nathan Crandall, 3970001 Dane Pitkin, 4085726 CS140 Final Project Introduction: Our goal was to parallelize the Breadth-first search algorithm using Cilk++. This algorithm works by starting at an initial
More informationAssignment 2: Processing New York City Open Data
Assignment 2: Processing New York City Open Data Due date: Sept. 28, 11:59PM EST. 1 Summary This assignment is designed to expose you to the use of open data. Wikipedia has a good description of open data:
More informationCharts Data Importer
Charts Data Importer Plugin for Joomla! Editor extension for Aimy Charts This manual documents version 3.0.x of the Joomla! extension. http://www.aimy-extensions.com/joomla/charts-data-importer.html 1
More informationAutomation.
Automation www.austech.edu.au WHAT IS AUTOMATION? Automation testing is a technique uses an application to implement entire life cycle of the software in less time and provides efficiency and effectiveness
More informationFemale Brown Bear Weights
CC-20 Normal Distributions Common Core State Standards MACC.92.S-ID..4 Use the mean and standard of a data set to fit it to a normal distribution and to estimate population percentages. Recognize that
More informationDADC Submission Instructions
System Purpose: The purpose of the Designated Agency Data Collection (DADC) is for each designated agency (DA) to report candidate information to CDE. Pursuant to state statute, CRS 22-2112(q)(I), CDE
More informationIdentifying Important Communications
Identifying Important Communications Aaron Jaffey ajaffey@stanford.edu Akifumi Kobashi akobashi@stanford.edu Abstract As we move towards a society increasingly dependent on electronic communication, our
More informationImproving Stack Overflow Tag Prediction Using Eye Tracking Alina Lazar Youngstown State University Bonita Sharif, Youngstown State University
Improving Stack Overflow Tag Prediction Using Eye Tracking Alina Lazar, Youngstown State University Bonita Sharif, Youngstown State University Jenna Wise, Youngstown State University Alyssa Pawluk, Youngstown
More informationHow Often and What StackOverflow Posts Do Developers Reference in Their GitHub Projects?
How Often and What StackOverflow Posts Do Developers Reference in Their GitHub Projects? Saraj Singh Manes School of Computer Science Carleton University Ottawa, Canada sarajmanes@cmail.carleton.ca Olga
More informationSurfing the Web Student Response
Surfing the Web Student Response CT.B.1.4.1 The student uses efficient search methods to locate information. Name: Date: Locate the Websites for the following information and give the complete URL address.
More informationThe Rise of the Connected Viewer
JULY 17, 2012 The Rise of the Connected Viewer 52% of adult cell owners use their phones while engaging with televised content; younger audiences are particularly active in these connected viewing experiences
More informationPDQ-Notes. Reynolds Farley. PDQ-Note 3 Displaying Your Results in the Expert Query Window
PDQ-Notes Reynolds Farley PDQ-Note 3 Displaying Your Results in the Expert Query Window PDQ-Note 3 Displaying Your Results in the Expert Query Window Most of your options for configuring your query results
More informationSurvey Questions and Methodology
Survey Questions and Methodology Spring Tracking Survey 2012 Data for March 15 April 3, 2012 Princeton Survey Research Associates International for the Pew Research Center s Internet & American Life Project
More informationUser Services Spring 2008 OBJECTIVES Introduction Getting Help Instructors
User Services Spring 2008 OBJECTIVES Use the Data Editor of SPSS 15.0 to to import data. Recode existing variables and compute new variables Use SPSS utilities and options Conduct basic statistical tests.
More informationFulbright Distinguished Awards in Teaching Program Partner Organization Application Manual. Institute of International Education
Fulbright Distinguished Awards in Teaching Program Partner Organization Application Manual Institute of International Education 2015-2016 Contents International Application...3 Overview...3 Login for Applicants...3
More informationUSER MANUAL BULK UPLOAD CONTENTS FEBRUARY 2014 INTRODUCTION 2 1. HOW DO I ACCESS THE BULK UPLOAD FACILITY? 3 2. BULK UPLOAD CODES 4
USER MANUAL BULK UPLOAD CONTENTS INTRODUCTION 2. HOW DO I ACCESS THE BULK UPLOAD FACILITY? 3 2. BULK UPLOAD CODES 4 3. BULK UPLOAD TEMPLATE - EXCEL FORMAT 5 4. BULK UPLOAD TEMPLATE - CSV FORMAT 0 5. ACW
More informationYawkey Scholars Program for Massachusetts Residents
Massachusetts Residents 2018-2019 Instructions & Pre-Application Checklist Thank you for your interest in the Yawkey Scholars Program. Please read these instructions carefully before filling out your application.
More informationCHAPTER 2: DESCRIPTIVE STATISTICS Lecture Notes for Introductory Statistics 1. Daphne Skipper, Augusta University (2016)
CHAPTER 2: DESCRIPTIVE STATISTICS Lecture Notes for Introductory Statistics 1 Daphne Skipper, Augusta University (2016) 1. Stem-and-Leaf Graphs, Line Graphs, and Bar Graphs The distribution of data is
More informationCS231A Course Project Final Report Sign Language Recognition with Unsupervised Feature Learning
CS231A Course Project Final Report Sign Language Recognition with Unsupervised Feature Learning Justin Chen Stanford University justinkchen@stanford.edu Abstract This paper focuses on experimenting with
More informationBusiness Club. Decision Trees
Business Club Decision Trees Business Club Analytics Team December 2017 Index 1. Motivation- A Case Study 2. The Trees a. What is a decision tree b. Representation 3. Regression v/s Classification 4. Building
More informationSOLOMON: Parentage Analysis 1. Corresponding author: Mark Christie
SOLOMON: Parentage Analysis 1 Corresponding author: Mark Christie christim@science.oregonstate.edu SOLOMON: Parentage Analysis 2 Table of Contents: Installing SOLOMON on Windows/Linux Pg. 3 Installing
More informationStudents Preferences for Receiving Communication from the University: A Report from the Student Life Survey
Preferences for Receiving Communication from the University: A Report from the Student Life Survey Center for the Study of Student Life March 2017 INTRODUCTION This report explores questions on the Student
More informationData Science Essentials
Data Science Essentials Lab 6 Introduction to Machine Learning Overview In this lab, you will use Azure Machine Learning to train, evaluate, and publish a classification model, a regression model, and
More information2013 Local Arts Agency Salary & Benefits Summary EXECUTIVE DIRECTOR / PRESIDENT / CEO
Local Arts Agency Salary & Benefits Summary EXECUTIVE DIRECTOR / PRESIDENT / CEO PRIVATE LAAS ONLY PUBLIC LAAS ONLY The Executive Director / President / Chief Executive Officer (CEO) is the chief staff
More informationGraphical Presentation for Statistical Data (Relevant to AAT Examination Paper 4: Business Economics and Financial Mathematics) Introduction
Graphical Presentation for Statistical Data (Relevant to AAT Examination Paper 4: Business Economics and Financial Mathematics) Y O Lam, SCOPE, City University of Hong Kong Introduction The most convenient
More informationClassification of Procedurally Generated Textures
Classification of Procedurally Generated Textures Emily Ye, Jason Rogers December 14, 2013 1 Introduction Textures are essential assets for 3D rendering, but they require a significant amount time and
More informationData analysis using Microsoft Excel
Introduction to Statistics Statistics may be defined as the science of collection, organization presentation analysis and interpretation of numerical data from the logical analysis. 1.Collection of Data
More informationDescribable Visual Attributes for Face Verification and Image Search
Advanced Topics in Multimedia Analysis and Indexing, Spring 2011, NTU. 1 Describable Visual Attributes for Face Verification and Image Search Kumar, Berg, Belhumeur, Nayar. PAMI, 2011. Ryan Lei 2011/05/05
More informationFile Size Distribution on UNIX Systems Then and Now
File Size Distribution on UNIX Systems Then and Now Andrew S. Tanenbaum, Jorrit N. Herder*, Herbert Bos Dept. of Computer Science Vrije Universiteit Amsterdam, The Netherlands {ast@cs.vu.nl, jnherder@cs.vu.nl,
More informationDidacticiel - Études de cas
Subject In some circumstances, the goal of the supervised learning is not to classify examples but rather to organize them in order to point up the most interesting individuals. For instance, in the direct
More informationData to Story Project: SPSS Cheat Sheet for Analyzing General Social Survey Data
Data to Story Project: SPSS Cheat Sheet for Analyzing General Social Survey Data This guide is intended to help you explore and analyze the variables you have selected for your group project. Conducting
More informationAbout Intellipaat. About the Course. Why Take This Course?
About Intellipaat Intellipaat is a fast growing professional training provider that is offering training in over 150 most sought-after tools and technologies. We have a learner base of 700,000 in over
More informationFRPG 189I The Craft of Acting Professor Matt Salzberg. Introduction to Library Research. Spring 2014
FRPG 189I The Craft of Acting Professor Matt Salzberg Introduction to Library Research Spring 2014 St. Lawrence University Libraries Homepage RESEARCH is front and center Two ways to search library holdings
More informationPrincipal Components Analysis with Spatial Data
Principal Components Analysis with Spatial Data A SpaceStat Software Tutorial Copyright 2013, BioMedware, Inc. (www.biomedware.com). All rights reserved. SpaceStat and BioMedware are trademarks of BioMedware,
More informationCS 229 Final Project - Using machine learning to enhance a collaborative filtering recommendation system for Yelp
CS 229 Final Project - Using machine learning to enhance a collaborative filtering recommendation system for Yelp Chris Guthrie Abstract In this paper I present my investigation of machine learning as
More informationHOW TO ACCESS ROTARY CLUB CENTRAL
HOW TO ACCESS ROTARY CLUB CENTRAL 1 Go to My Rotary and select Sign In or Register. Or go to rotary.org/clubcentral to reach the site directly. You ll be prompted to sign in to My Rotary or create an account
More informationHOW TO ACCESS ROTARY CLUB CENTRAL
HOW TO ACCESS ROTARY CLUB CENTRAL 1 Go to My Rotary and select Sign In or Register. Or go to rotary.org/clubcentral to reach the site directly. You ll be prompted to sign in to My Rotary or create an account
More informationTree-based methods for classification and regression
Tree-based methods for classification and regression Ryan Tibshirani Data Mining: 36-462/36-662 April 11 2013 Optional reading: ISL 8.1, ESL 9.2 1 Tree-based methods Tree-based based methods for predicting
More informationEXCELLING WITH ANALYSIS AND VISUALIZATION
EXCELLING WITH ANALYSIS AND VISUALIZATION A PRACTICAL GUIDE FOR DEALING WITH DATA Prepared by Ann K. Emery July 2016 Ann K. Emery 1 Welcome Hello there! In July 2016, I led two workshops Excel Basics for
More informationClubSTATS USER GUIDE. Version 1.0. A window into the local community
A window into the local community USER GUIDE Version 1.0 July 2014 Contents 1 interface 3 1.1 Installation of the Tableau reader 3 1.2 Navigating the dashboards 3 1.3 Dashboards features and controls 4
More informationConcept Production. S ol Choi, Hua Fan, Tuyen Truong HCDE 411
Concept Production S ol Choi, Hua Fan, Tuyen Truong HCDE 411 Travelust is a visualization that helps travelers budget their trips. It provides information about the costs of travel to the 79 most popular
More informationName: Date: Period: Chapter 2. Section 1: Describing Location in a Distribution
Name: Date: Period: Chapter 2 Section 1: Describing Location in a Distribution Suppose you earned an 86 on a statistics quiz. The question is: should you be satisfied with this score? What if it is the
More informationHow to Use the Cancer-Rates.Info/NJ
How to Use the Cancer-Rates.Info/NJ Web- Based Incidence and Mortality Mapping and Inquiry Tool to Obtain Statewide and County Cancer Statistics for New Jersey Cancer Incidence and Mortality Inquiry System
More informationBank Manager. Bank Manager is an add on module that allows you to import your bank statements that you download from your internet banking.
Overview is an add on module that allows you to import your bank statements that you download from your internet banking. No more manual capturing of bank statements. When the first bank statement is imported,
More informationAPPLICATION OF SOFTMAX REGRESSION AND ITS VALIDATION FOR SPECTRAL-BASED LAND COVER MAPPING
APPLICATION OF SOFTMAX REGRESSION AND ITS VALIDATION FOR SPECTRAL-BASED LAND COVER MAPPING J. Wolfe a, X. Jin a, T. Bahr b, N. Holzer b, * a Harris Corporation, Broomfield, Colorado, U.S.A. (jwolfe05,
More informationLet s get started with the module Getting Data from Existing Sources.
Welcome to Data Academy. Data Academy is a series of online training modules to help Ryan White Grantees be more proficient in collecting, storing, and sharing their data. Let s get started with the module
More informationFamily Map Server Specification
Family Map Server Specification Acknowledgements Last Modified: January 5, 2018 The Family Map project was created by Jordan Wild. Thanks to Jordan for this significant contribution. Family Map Introduction
More informationAnnex 6, Part 2: Data Exchange Format Requirements
Annex 6, Part 2: Data Exchange Format Requirements Data Exchange Format Requirements Post-Event Response Export File Descriptions Please Note: Required data fields are marked. Data Exchange consists of
More informationPart 1: Written Questions (60 marks):
COMP 352: Data Structure and Algorithms Fall 2016 Department of Computer Science and Software Engineering Concordia University Combined Assignment #3 and #4 Due date and time: Sunday November 27 th 11:59:59
More informationWebsite Usability Study: The American Red Cross. Sarah Barth, Veronica McCoo, Katelyn McLimans, Alyssa Williams. University of Alabama
Website Usability Study: The American Red Cross Sarah Barth, Veronica McCoo, Katelyn McLimans, Alyssa Williams University of Alabama 1. The American Red Cross is part of the world's largest volunteer network
More informationConstruct an optimal tree of one level
Economics 1660: Big Data PS 3: Trees Prof. Daniel Björkegren Poisonous Mushrooms, Continued A foodie friend wants to cook a dish with fresh collected mushrooms. However, he knows that some wild mushrooms
More information