Chameleon Metadata's Data Science Basics Tutorial Series. DSB-2: Information Gain (IG) By Eric Thornton


Data Science Basics Syllabus for DSB-2 (DSB-2-Infographic-1)

Download PDF version here: DSB-2-Information-Gain-V10.pdf

By Eric Thornton, ET@ChameleonMetadata.com (20 September, 2017 V11)

Links to Chameleon Metadata's Data Science Basics Series Papers: DSB-1 DSB-2 DSB-3 DSB-4 DSB-5 DSB-6 DSB-7 (links will be activated on an ongoing basis as each paper is published)

INTRODUCTION

In the first of this series of six tutorials, DSB-1: Information Entropy (Ѥ), we walked through how one calculates Ѥ for datasets which had a binary target variable, with each record classified as a Type 1 or Type 2 record. When we calculate Ѥ for a dataset, the result only tells us how much disorder is present in that dataset. Remember that the dataset's Ѥ value we calculated in DSB-1 applies to the entire dataset as a whole and tells us nothing about the individual attributes (or features) of the dataset.

In DSB-1 we were looking for a dataset with a higher Ѥ value (higher value = more disordered) as a starting point. Once we have a dataset with a high Ѥ value, the next step is to calculate entropy for each individual attribute/feature relative to the same target variable we used in DSB-1. In other words, we count the number of Type 1 and Type 2 records grouped by each possible value of a single attribute. For example, with the possible values for GENDER being { Female, Male }, how many Type 1 and Type 2 records are there when GENDER = Female, and how many when GENDER = Male?

In this second of Chameleon Metadata's Data Science Basics tutorial series, DSB-2: Information Gain (IG), we will walk through how to figure out which attribute is the best one to use as your initial split criterion based on its IG value relative to our target variable.

This process of attribute-driven division of datasets is also the foundation for more complicated calculations we will explore in later tutorials. By DSB-5 and DSB-6 we will learn how to do probability estimates by classification. But for now, let's get started on learning how to calculate Information Gain (IG) for the table uci_dataset_flags_plus_target.

Topics covered in DSB-2: Information Gain (IG) will be:
1. Relationship Between Ѥ and Information Gain (IG)
2. Calculating Attribute Entropy (Æ) for Each Attribute of the UCI Flags Plus Target
3. Calculating Information Gain (IG) for Each Attribute of the UCI Flags Plus Target

BULK LEARNING LAB SETUP

File Learning Lab Setup & Labs File (DSB-2-SQL-BULK-1.sql)

All SQL & DDL used in this tutorial may be downloaded at Chameleon's Learning Lab as file: DSB-2-SQL-BULK-1.

URL for DSB-2 Student Lab Setup Bulk SQL File MySQL Syntax (DSB-2-SQL-BULK-1)

1. RELATIONSHIP BETWEEN INFORMATION ENTROPY (Ѥ) AND INFORMATION GAIN (IG)

Review of Information Entropy (Ѥ)

In DSB-1, we walked through how to calculate Ѥ for a dataset with a binary target variable. As a reminder, a binary target variable is one with only two possible outcomes, like having only Heads or Tails as possible outcomes for a given coin toss.

The problem with using the outcomes of a coin toss is that there are very few variables affecting the outcome of each toss. Typically, and assuming the same coin is used, there isn't anything variable about each toss. Sure, you could use multiple different €2 coins (i.e. coins with a face value of 2 Euros), document differences between them (i.e. this €2 coin weighs 0.04 grams less than that €2 coin) and so provide another attribute/feature (coin-weight) about each coin. Then it's possible to group the number of Heads or Tails by coin weight. But that's not how typical coin toss results work out, so we used the UCI Flags Plus Target dataset instead for tutorial DSB-1, because it gave us a good number of other attributes with which to work. In that dataset, we find attributes/features other than the target variable (the Dependent Variable) which we can use as Independent Variables to help us understand how much influence they have over the dependent variable.

Dataset Entropy vs. Attribute Entropy (DSB-2-Infographic-2):
- The more disordered the dataset, the higher its Information Entropy value
- As the proportion of target values skews towards one value, the Information Entropy value decreases
- When only a single target value exists, the set is Pure and Information Entropy = 0

As we saw in Graph-1 from the first tutorial, and recopied below, we are looking for a dataset with a higher Ѥ value if it is to be of any value in predicting the answers to business questions. Why? Well, think about it: if a trick-coin with Heads on both sides always produces Heads whenever it is tossed, nobody needs data science to tell people that the probability of the next toss being Heads is effectively 100%. And, with such a trick-coin, we also know the probability of getting Tails on a future toss is effectively 0%. Because of this, notice in Graph-1 below that Ѥ approaches zero as the probability of either binary target value moves towards 100% or 0%.
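The shape of Graph-1 can be reproduced with a few lines of Python. This is only a minimal sketch of the standard binary entropy calculation; the function name `binary_entropy` and the sample probabilities are illustrative assumptions, not part of the tutorial's lab files.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy (in bits) of a binary target whose first value has probability p."""
    if p in (0.0, 1.0):
        return 0.0  # a pure set has no disorder; this also avoids log2(0)
    q = 1.0 - p
    return -(p * math.log2(p)) - (q * math.log2(q))


# Entropy is highest at a 50/50 mix and falls to zero at 0% and 100%.
for p in (0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0):
    print(f"p = {p:.2f} -> entropy = {binary_entropy(p):.4f}")
```

The two-sided trick-coin corresponds to `binary_entropy(1.0)`, which is zero: there is nothing left to predict.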

Introduction to Attribute Entropy (Æ)

[Figure: Entropy for Dataset with Binary Target Variable (DSB-1-Graph-1)]

Attribute Entropy (Æ) is the next piece of this puzzle. We need it to understand whether a relationship exists between each of the non-target attributes and the target value of each observation and, if one does exist, how strong it is. Up to this point we have been looking for a highly disordered dataset, meaning one with a high Ѥ value for the entire dataset. Here is a quick review of the Ѥ calculation for the UCI Flags Plus Target Dataset from DSB-1:

[Figure: UCI Flags Plus Target Dataset - Chameleon Target Variable Counts (Type 1 vs. Type 2): 57.73% Type 1, 42.27% Type 2 (DSB-2-Infographic-3)]

$$\text{Ѥ}_{\text{Flags Dataset}} = \left((-1) \cdot p_{\text{Type-1}} \log_2 p_{\text{Type-1}}\right) + \left((-1) \cdot p_{\text{Type-2}} \log_2 p_{\text{Type-2}}\right)$$

$$\text{Ѥ}_{\text{Flags Dataset}} \approx \left((-1) \cdot 0.5773 \cdot (-0.7926)\right) + \left((-1) \cdot 0.4227 \cdot (-1.2424)\right)$$

$$\text{Ѥ}_{\text{Flags Dataset}} \approx 0.4576 + 0.5251 \approx 0.9827$$

Information Entropy Ѥ of the UCI Flags Plus Target Dataset as Calculated in Tutorial DSB-1 (DSB-1-Formula-5)
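As a cross-check on the review above, the dataset-level entropy can be recomputed from raw counts. This is a sketch under one assumption: the counts 112 (Type 1) and 82 (Type 2) are not quoted in this section but follow from the 194 records and the 57.73% Type 1 share, and they match the sums of the animate_image rows shown later in the DSB-2-SQL-4 results. The helper name `entropy_from_counts` is hypothetical.

```python
import math


def entropy_from_counts(counts):
    """Entropy in bits for a list of class counts; empty classes contribute nothing."""
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)


# Assumed target counts: 112 Type 1 and 82 Type 2 records out of 194.
type_1_count, type_2_count = 112, 82
total_records = type_1_count + type_2_count

type_1_share = type_1_count / total_records
dataset_entropy = entropy_from_counts([type_1_count, type_2_count])

print(f"Type 1 share: {type_1_share:.2%}")
print(f"Dataset entropy: {dataset_entropy:.4f}")
```

A value this close to 1.0 (the maximum for a binary target) is what the tutorial means by a highly disordered dataset.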

This dataset has a high Parent Ѥ value (Ѥ Flags Dataset ≈ 0.9827). Review all the interim calculations used for Ѥ of the UCI Flags Plus Target Dataset at Chameleon's Learning Lab:

Link to Chameleon Learning Lab Dashboard (DSB-1-Dashboard-1) (DSB-1-URL-4)

However, when we calculate Æ for each non-target attribute, we are looking for an attribute which reduces the disorder with respect to its expected target value. In other words, an attribute with a low Æ value can be used to bring order to a disordered dataset. Nice.

So, in a nutshell, we want datasets with high Ѥ values, like the UCI Flags Plus Target Dataset, and within such a dataset we are looking for non-target attributes with low Æ. High Ѥ at the dataset level means there is disorder (mixed target values) in the target variable (i.e. a mix of Heads and Tails toss results, or a mix of Type-1 and Type-2 records). If we can discover an attribute within such a dataset which does a good job of predicting what its record's target value will be, it reduces disorder. We know we have a good attribute when its Æ value is much less than the overall Ѥ value, which is calculated across the entire dataset with no consideration given to the value of any non-target attribute.

NOTICE: EXAMPLES USE MySQL DATABASE
The remainder of this tutorial will be done using MySQL examples. For other databases (Oracle, SQL Server, PostgreSQL, etc.) the basic SQL shouldn't change much. But you'll need to modify the catalog tables and their columns in the SQL, as they will be different from the MySQL database catalog.

NOTICE: USERS OF Python, R, Weka, etc.
The same approach would be used with Python, R, Weka, etc. Just keep in mind that where I use an intermediate results table called uci_dataset_flags_child_attribute_counts, you would simply use Data Frames instead. To repeat: the overall approach is the same for Python, R, Weka, etc., as for the MySQL examples which follow below.

2. CALCULATING ATTRIBUTE ENTROPY (Æ) FOR EACH ATTRIBUTE OF THE UCI FLAGS PLUS TARGET DATASET

I'll begin by saying the text that follows is a little lengthy. I did this intentionally, because so many tutorials I review seem to skip or omit key pieces of information you need to understand the material completely. This means that there is some overlap, especially with the SQL statements which follow. For example, all the functionality of SQL statements DSB-2-SQL-5, DSB-2-SQL-6 and DSB-2-SQL-7 below could have been accomplished with just DSB-2-SQL-7. But the other two, DSB-2-SQL-5 and DSB-2-SQL-6, show the interim results and, hopefully, give the learner more insight.

Calculating Attribute Entropy (Æ) Components

Using one of my favorite DBA tricks, we will use the database catalog to auto-generate the SQL needed for these commands. Statement DSB-2-SQL-1 generates the first SQL statement we will use.

    SELECT DISTINCT CONCAT(
        "UNION SELECT '", information_schema.columns.column_name, "' AS independent_variable, ",
        "COUNT(DISTINCT ", information_schema.columns.column_name, ") AS independent_value_count ",
        " FROM ", information_schema.columns.table_name,
        " GROUP BY independent_variable"
    ) AS sql_statement
    FROM information_schema.`columns`
    WHERE information_schema.`columns`.table_schema = 'ericth5_chameleon_tutorials_data_science'
      AND information_schema.`columns`.table_name = 'uci_dataset_flags_plus_target'
    GROUP BY information_schema.columns.column_name

SQL Statement for Mining Attributes and Distinct Value Counts from UCI Dataset Flags Plus Target (DSB-2-SQL-1)

NOTICE: CATALOG GENERATED SQL ALWAYS NEEDS EDITS BEFORE EXECUTION
Whenever SQL statements are generated directly from a database catalog, it should be done using a repeatable pattern, with a UNION in front of each table attribute's SELECT so that every attribute's results are combined. However, that means the generated SQL begins with a UNION and won't run as generated. So, remove the first UNION.

IMPORTANT: The first UNION needs to be removed before executing the SQL statement DSB-2-SQL-1 generates, or it will not run.

    -- UNION   <- generated, but must be removed before execution
    SELECT 'animate_image' AS independent_variable, COUNT(DISTINCT animate_image) AS independent_value_count
      FROM uci_dataset_flags_plus_target GROUP BY independent_variable
    UNION SELECT 'area' AS independent_variable, COUNT(DISTINCT area) AS independent_value_count
      FROM uci_dataset_flags_plus_target GROUP BY independent_variable
    UNION SELECT 'bars' AS independent_variable, COUNT(DISTINCT bars) AS independent_value_count
      FROM uci_dataset_flags_plus_target GROUP BY independent_variable
    UNION SELECT 'black' AS independent_variable, COUNT(DISTINCT black) AS independent_value_count
      FROM uci_dataset_flags_plus_target GROUP BY independent_variable
    UNION SELECT 'blue' AS independent_variable, COUNT(DISTINCT blue) AS independent_value_count
      FROM uci_dataset_flags_plus_target GROUP BY independent_variable
    -- ... statements for the remaining attributes not shown ...
    UNION SELECT 'white' AS independent_variable, COUNT(DISTINCT white) AS independent_value_count
      FROM uci_dataset_flags_plus_target GROUP BY independent_variable
    UNION SELECT 'zone' AS independent_variable, COUNT(DISTINCT zone) AS independent_value_count
      FROM uci_dataset_flags_plus_target GROUP BY independent_variable
    ORDER BY 2 DESC

While not required, adding the ORDER BY 2 DESC clause sorts the results with the highest counts at the top.

Output from DSB-2-SQL-1 with the First UNION Removed and ORDER BY Added for Execution Syntax Becomes Query DSB-2-SQL-2 (DSB-2-SQL-1-Results)

See the output from DSB-2-SQL-1 at Chameleon's online student lab environment, click here: Output from Query DSB-2-SQL-1 (DSB-2-SQL-1-Results)

Next, we modify the output of SQL statement DSB-2-SQL-1 above and execute that statement (now DSB-2-SQL-2), with its results shown below. The attributes chameleon_key and data_source_url were added as part of a Data Vault onboarding scheme and will be ignored as we go forward, because those attributes were not part of the original dataset.

    Independent Variable         Distinct Value Count
    chameleon_key                194
    name                         194
    area                         136
    population                   48
    count_of_sunstars            14
    stripes                      12
    language                     10
    religion                     8
    count_of_colors              8
    bottom_right_color           8
    mainhue                      8
    top_left_color               7
    landmass                     6
    bars                         5
    count_of_circles             4
    zone                         4
    count_of_crosses             3
    count_of_quarters            3
    count_of_saltires            2
    text                         2
    black                        2
    green                        2
    white                        2
    blue                         2
    icon_inanimate_image         2
    red                          2
    animate_image                2
    gold                         2
    chameleon_target_variable    2
    orange                       2
    cresent                      2
    triangle                     2
    data_source_url              1

Output from Query DSB-2-SQL-2 (DSB-2-SQL-2-Results)
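For readers following along in Python rather than MySQL, the distinct-value count per attribute can be sketched without any catalog tricks. The four rows below are a hypothetical miniature used only for illustration (the real table has 194 rows), and `distinct_value_counts` is an assumed helper name.

```python
# Hypothetical miniature of the flags table; values are made up for illustration.
rows = [
    {"name": "Flag A", "bars": 0, "zone": 1, "red": 1},
    {"name": "Flag B", "bars": 3, "zone": 1, "red": 1},
    {"name": "Flag C", "bars": 0, "zone": 4, "red": 0},
    {"name": "Flag D", "bars": 0, "zone": 2, "red": 1},
]


def distinct_value_counts(rows):
    """Return (column, number of distinct values) pairs, highest count first."""
    counts = {col: len({row[col] for row in rows}) for col in rows[0]}
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


for column, n_distinct in distinct_value_counts(rows):
    print(f"{column:<6} {n_distinct}")
```

As in the SQL results, a column whose distinct count equals the row count (here `name`) stands out immediately at the top of the list.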

See the output from DSB-2-SQL-2 at Chameleon's online student lab environment, click here: Output from Query DSB-2-SQL-2 (DSB-2-SQL-2-Results-URL)

Now that we have all the columns, let's take a second and discuss what we can learn just from looking at the results shown above as DSB-2-SQL-2-Results. The first and last rows will be ignored, as they weren't part of the original dataset, but what else can we learn?

The chameleon_key attribute is simply a surrogate key added for data management purposes. But it also tells us there are 194 rows in the dataset. Knowing that, we can immediately decide to ignore the name attribute: it has a 1:1 relationship with the surrogate key and, therefore, no two records can be grouped under the same name. The name attribute will have the best IG value, but at the cost of being too narrow in focus at this point in the process. Groups of one are most useful when one knows the exact identity in advance (i.e. with your credit card number, I can easily find the one person associated with that card, because the lookup returns a group of one). If something can't be used for grouping, it's not a good candidate for classification either. So, we dump the attribute name and won't use it going forward.

The attributes area and population are a little more complicated. On the one hand, there are a lot of possible values for both: 136 and 48 respectively. But, unlike name, these two attributes are measurement values, which means we can't so easily dismiss them. While this will be covered more in later tutorials (DSB-4 and DSB-5), for now I'll just say that variables like these are typically handled using ranges of values, creating a smaller number of groups than the 136 and 48 distinct values the two attributes have in the original dataset. Population has values ranging from 0 through 1008 in the UCI Flags Plus Target dataset, with about 50% of the values being either 0 or 1. Maybe we split the population attribute into two groups (Population <= 1 and Population > 1)... maybe? But you get the point: we usually group attributes with many distinct possible values by range rather than by each specific value.

Once again employing one of my favorite DBA tricks, we use the database catalog to auto-generate the SQL needed, this time with NOT conditions added to exclude chameleon_target_variable along with the other five attributes discussed above.
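The range-based grouping floated above might look like the following in Python. This is a sketch only: the threshold of 1 is the tentative split mentioned in the text, the sample population values are hypothetical, and `population_bucket` is an assumed helper name.

```python
from collections import Counter


def population_bucket(population: int, threshold: int = 1) -> str:
    """Map a raw population value to one of two range labels."""
    return f"<={threshold}" if population <= threshold else f">{threshold}"


# Hypothetical sample values within the 0..1008 range described in the text.
sample_populations = [0, 0, 1, 1, 4, 56, 231, 1008]

bucket_counts = Counter(population_bucket(p) for p in sample_populations)
for bucket, count in bucket_counts.items():
    print(f"Population {bucket}: {count} records")
```

Once bucketed, the attribute has two distinct values instead of 48 and can be counted against the target variable like any other attribute.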

After the NOT statements are added, we now have SQL statement DSB-2-SQL-3:

    SELECT DISTINCT CONCAT(
        "UNION SELECT DISTINCT '", information_schema.columns.column_name, "', ",
        information_schema.columns.column_name, ", ",
        "chameleon_target_variable, ",
        "COUNT(", information_schema.columns.column_name, ") AS attribute_target_value_xref", " ",
        " FROM ", information_schema.columns.table_name,
        " GROUP BY ", information_schema.columns.column_name, ", chameleon_target_variable"
    ) AS sql_statement
    FROM information_schema.`columns`
    WHERE information_schema.`columns`.table_schema = 'ericth5_chameleon_tutorials_data_science'
      AND information_schema.`columns`.table_name = 'uci_dataset_flags_plus_target'
      -- Exclude the keys, the target itself and the attributes with too many distinct values
      AND NOT information_schema.columns.column_name = 'chameleon_key'
      AND NOT information_schema.columns.column_name = 'chameleon_target_variable'
      AND NOT information_schema.columns.column_name = 'data_source_url'
      AND NOT information_schema.columns.column_name = 'name'
      AND NOT information_schema.columns.column_name = 'area'
      AND NOT information_schema.columns.column_name = 'population'
    GROUP BY information_schema.columns.column_name

SQL Statement for Calculating Target Variable Value Counts for Each Non-Target Attribute's Distinct Values (DSB-2-SQL-3)

The results from SQL statement DSB-2-SQL-3 do not include the columns excluded with NOT. The generated statement becomes DSB-2-SQL-4 once the first UNION is removed to correct the syntax of the overall statement.

IMPORTANT: The first UNION needs to be removed before executing the SQL statement DSB-2-SQL-3 generates, or it will not run.

    -- UNION   <- generated, but must be removed before execution
    SELECT DISTINCT 'animate_image', animate_image, chameleon_target_variable,
           COUNT(animate_image) AS attribute_target_value_xref
      FROM uci_dataset_flags_plus_target GROUP BY animate_image, chameleon_target_variable
    UNION SELECT DISTINCT 'bars', bars, chameleon_target_variable,
           COUNT(bars) AS attribute_target_value_xref
      FROM uci_dataset_flags_plus_target GROUP BY bars, chameleon_target_variable
    UNION SELECT DISTINCT 'count_of_circles', count_of_circles, chameleon_target_variable,
           COUNT(count_of_circles) AS attribute_target_value_xref
      FROM uci_dataset_flags_plus_target GROUP BY count_of_circles, chameleon_target_variable
    -- ... most statements not shown ...
    UNION SELECT DISTINCT 'triangle', triangle, chameleon_target_variable,
           COUNT(triangle) AS attribute_target_value_xref
      FROM uci_dataset_flags_plus_target GROUP BY triangle, chameleon_target_variable
    UNION SELECT DISTINCT 'white', white, chameleon_target_variable,
           COUNT(white) AS attribute_target_value_xref
      FROM uci_dataset_flags_plus_target GROUP BY white, chameleon_target_variable
    UNION SELECT DISTINCT 'zone', zone, chameleon_target_variable,
           COUNT(zone) AS attribute_target_value_xref
      FROM uci_dataset_flags_plus_target GROUP BY zone, chameleon_target_variable

Output from DSB-2-SQL-3 with First UNION Removed for Execution Syntax Then Becomes Query DSB-2-SQL-4 (DSB-2-SQL-4-Results)

See the output from DSB-2-SQL-3 at Chameleon's online student lab environment, click here: Output from Query DSB-2-SQL-3 (DSB-2-SQL-3-Results-URL)

The abbreviated results from SQL statement DSB-2-SQL-4 are below (with 213 results not shown).

    Independent Variable   Value   Target Value   Count of Attribute Value + Target Value
    animate_image          0       Type 1         88
    animate_image          0       Type 2         67
    animate_image          1       Type 1         24
    animate_image          1       Type 2         15
    bars                   0       Type 1         95
    bars                   0       Type 2         64
    bars                   1       Type 1         1
    bars                   1       Type 2         5
    bars                   2       Type 1         3
    bars                   2       Type 2         4
    bars                   3       Type 1         12
    bars                   3       Type 2         9
    bars                   5       Type ...
    ... Results not shown ...
    zone                   3       Type 1         11
    zone                   3       Type 2         5
    zone                   4       Type 1         34
    zone                   4       Type 2         24

Abbreviated Output from Query DSB-2-SQL-4 (DSB-2-SQL-4-Results)

See the entire output from the DSB-2-SQL-4 query at Chameleon's online student lab environment here: Output from Query DSB-2-SQL-4 (DSB-2-SQL-4-Results-URL)

Interim Results Required for Calculation of Attribute Entropy (Æ)

The next step is to create the SQL statement DSB-2-SQL-5, which returns the total number of records for each {Independent Variable, Independent Variable Value} combination, summed across both target values. We need the results of DSB-2-SQL-5 to know what proportion of the records in each group are Type 1 vs. Type 2. For example, using the attribute animate_image, we can see that about 80% of the records have animate_image = 0 and about 20% have animate_image = 1. We also need to know that for animate_image = 0 there are 155 records spanning both target values (Type 1 and Type 2). Of those records, 56.77% have their target value equal to Type 1 and 43.23% have their target value equal to Type 2.

Of course, there's an SQL statement for this too. To make the SQL less lengthy, I'll create a new MySQL table based exactly on SQL statement DSB-2-SQL-4 and call the new table uci_dataset_flags_child_attribute_counts. This new MySQL table is populated with exactly the same records as the results from DSB-2-SQL-4 above.
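The percentages quoted for animate_image can be verified directly from the four animate_image rows in the abbreviated DSB-2-SQL-4 results. The sketch below is one possible Python equivalent of the two levels of counting; the variable names (`child_counts`, `value_totals`, `target_share`) are illustrative assumptions rather than names from the lab files.

```python
from collections import defaultdict

# (attribute, value, target value, record count) rows quoted from DSB-2-SQL-4-Results.
child_counts = [
    ("animate_image", 0, "Type 1", 88),
    ("animate_image", 0, "Type 2", 67),
    ("animate_image", 1, "Type 1", 24),
    ("animate_image", 1, "Type 2", 15),
]

# Level 1: total records per {attribute, value}, summed across both target values.
value_totals = defaultdict(int)
for attribute, value, _target, count in child_counts:
    value_totals[(attribute, value)] += count

grand_total = sum(row[3] for row in child_counts)

# Level 2: share of each target value within its {attribute, value} group.
target_share = {
    (attribute, value, target): count / value_totals[(attribute, value)]
    for attribute, value, target, count in child_counts
}

for (attribute, value), total in value_totals.items():
    print(f"{attribute} = {value}: {total} records ({total / grand_total:.1%} of all records)")
for key, share in target_share.items():
    print(key, f"{share:.2%}")
```

The group totals add up to the full 194 records, which is the same check the tutorial applies to the DSB-2-SQL-5 results.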

Now, I'll need calculations at two different levels:
- Counts for each {variable + variable value + target value} combination, from DSB-2-SQL-4; and
- Counts for each {variable + variable value} combination, from DSB-2-SQL-5, for each attribute.

The new table uci_dataset_flags_child_attribute_counts is created and loaded (the SQL needed to do it yourself is linked below) using the long SQL statement DSB-2-SQL-4_uci_dataset_flags_child_attribute_counts.sql. Since that SQL is really long, I didn't put the NOT statements inline below, to save space. In DSB-2-SQL-5 below and in the statements that follow, we will be joining this temporary table, uci_dataset_flags_child_attribute_counts, to itself to get the various component variables needed for the calculations from this point forward.

Once again, the entire set of calculations discussed here in DSB-2 can also be done with Python, R, Weka, etc. In those cases, data frames would be used instead of my temporary uci_dataset_flags_child_attribute_counts table.

SQL to create and load the uci_dataset_flags_child_attribute_counts table from the Learning Lab is here:

Creates: uci_dataset_flags_child_attribute_counts
SQL (DDL) to create and load the uci_dataset_flags_child_attribute_counts (DSB-2-SQL-DDL-1-URL)

Remember: this tutorial is using a MySQL table here. When using Python, R, Weka, etc., you just use Data Frames instead of a MySQL table.

Now SQL statement DSB-2-SQL-5 may be executed against the table uci_dataset_flags_child_attribute_counts:

    SELECT independent_variable_name,
           independent_value,
           SUM(target_value_count)
      FROM uci_dataset_flags_child_attribute_counts
     GROUP BY independent_variable_name, independent_value
     ORDER BY independent_variable_name, independent_value

SQL Statement for Calculating Attribute Value Counts for Each Non-Target Attribute's Distinct Values (DSB-2-SQL-5)

The abbreviated results from SQL statement DSB-2-SQL-5 are below (with 211 results not shown). Notice that regardless of the number of possible distinct values for each attribute, the counts across all of an attribute's possible values always add up to 194. The only difference is how many classification groups we get for each attribute. That's why we removed attributes such as name and population: they create too many classes to be useful at the value level.

    Independent Variable   Value   Count of Records with This Value
    animate_image          0       155
    animate_image          1       39
    bars                   0       159
    bars                   1       6
    bars                   2       7
    bars                   3       21
    bars                   5       1
    black                  ...
    ... Results not shown ...
    white                  0       48
    white                  ...
    zone                   1       91
    zone                   2       29
    zone                   3       16
    zone                   4       58

Abbreviated Output from Query DSB-2-SQL-5 (DSB-2-SQL-5-Results)

See the entire output from the DSB-2-SQL-5 query at Chameleon's online student lab environment here: Output from Query DSB-2-SQL-5 (DSB-2-SQL-5-Results-URL)

Final Attribute Entropy (Æ) Calculations

Here is the basic formula for calculating Æ for an attribute with n possible target values, where p_i is the proportion of records having the i-th target value:

$$\text{Æ} = \left((-1) \cdot p_1 \log_2 p_1\right) + \left((-1) \cdot p_2 \log_2 p_2\right) + \dots + \left((-1) \cdot p_n \log_2 p_n\right)$$

Basic Formula for Calculating Attribute Entropy (Æ) (DSB-2-Formula-1)

DSB-2-Formula-1 above is the base formula for calculating Æ of an attribute with n possible target values. Let's walk through the formula using one of the attributes from the UCI Flags Plus Target. But first we need to compute all the variables which go into the Information Gain for each attribute. This is SQL statement DSB-2-SQL-6, which executes against the table uci_dataset_flags_child_attribute_counts.

    SELECT flags_count_1.independent_variable_name AS attribute_name,
           flags_count_1.independent_value AS attribute_value,
           flags_count_1.target_value,
           -- Total records for this attribute value, across both target values
           SUM(flags_count_2.target_value_count) AS attribute_value_count,
           -- Records for this attribute value with this target value
           flags_count_1.target_value_count AS attribute_and_target_count,
           ((flags_count_1.target_value_count)/(SUM(flags_count_2.target_value_count))) AS proportion_this_target_value,
           (-1*(LOG2(((flags_count_1.target_value_count)/(SUM(flags_count_2.target_value_count)))))) AS log2_proportion,
           -- proportion * (-1 * LOG2(proportion))
           ((((flags_count_1.target_value_count)/(SUM(flags_count_2.target_value_count))))
            * ((-1*(LOG2(((flags_count_1.target_value_count)/(SUM(flags_count_2.target_value_count)))))))
           ) AS value_plus_target_entropy
      FROM uci_dataset_flags_child_attribute_counts flags_count_1,
           uci_dataset_flags_child_attribute_counts flags_count_2
     WHERE flags_count_1.independent_variable_name = flags_count_2.independent_variable_name
       AND flags_count_1.independent_value = flags_count_2.independent_value
     GROUP BY flags_count_1.independent_variable_name,
              flags_count_1.independent_value,
              flags_count_1.target_value,
              flags_count_1.target_value_count
     ORDER BY 1,2,3

SQL Statement for Attribute Entropy Base Calculations (DSB-2-SQL-6)

The abbreviated results from SQL statement DSB-2-SQL-6 are below (with 279 results not shown).

[Table: Abbreviated Output from Query DSB-2-SQL-6 (DSB-2-SQL-6-Results). Columns: Attribute Name; Attribute Value (AV); Target Value (TV); Total AV Count; TV Count This AV; Proportion AV + TV; LOG2 Proportion AV + TV; Entropy Attribute + AV + TV. Rows shown, each for Type 1 and Type 2: animate_image 0 and 1; bars 0 and 1; ... results not shown ...; white 1; zone 1, 2, 3 and 4.]
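The columns produced by DSB-2-SQL-6 can be mirrored in plain Python for readers who want to check individual rows. This is a sketch, not the lab's implementation: it uses only rows quoted in the abbreviated DSB-2-SQL-4 results, and the function name `entropy_terms` and the tuple layout are assumptions made for illustration.

```python
import math
from collections import defaultdict

# (attribute, value, target value, record count) rows quoted from DSB-2-SQL-4-Results.
child_counts = [
    ("animate_image", 0, "Type 1", 88),
    ("animate_image", 0, "Type 2", 67),
    ("animate_image", 1, "Type 1", 24),
    ("animate_image", 1, "Type 2", 15),
    ("zone", 3, "Type 1", 11),
    ("zone", 3, "Type 2", 5),
    ("zone", 4, "Type 1", 34),
    ("zone", 4, "Type 2", 24),
]


def entropy_terms(child_counts):
    """One output row per input row, with the same columns as DSB-2-SQL-6."""
    totals = defaultdict(int)
    for attribute, value, _target, count in child_counts:
        totals[(attribute, value)] += count

    rows = []
    for attribute, value, target, count in child_counts:
        total = totals[(attribute, value)]
        proportion = count / total
        neg_log2 = -math.log2(proportion)
        rows.append((attribute, value, target, total, count,
                     proportion, neg_log2, proportion * neg_log2))
    return rows


for row in entropy_terms(child_counts):
    attribute, value, target, total, count, proportion, neg_log2, term = row
    print(f"{attribute:<14}{value:<3}{target:<8}{total:>4}{count:>4}"
          f"{proportion:>8.4f}{neg_log2:>8.4f}{term:>8.4f}")
```

Each entropy term is the proportion multiplied by the negated log, so every term is zero or positive.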

See the entire output from the DSB-2-SQL-6 query at Chameleon's online student lab environment here: Output from Query DSB-2-SQL-6 (DSB-2-SQL-6-Results-URL)

Now that we have the pieces, we execute DSB-2-SQL-7 against table uci_dataset_flags_child_attribute_counts. Notice that I am joining the same table to itself three (3) times: once for the Type 1 counts (flags_count_1); a second time for the Type 2 counts (flags_count_2); and a third time (flags_count_3) to hold the SUM of the Type 1 and Type 2 record counts. The flags_count_3 values are needed as interim results when we calculate the percentage for each target value in DSB-2-SQL-7 below.

SQL TIP: WHY JOINING A TABLE TO ITSELF IS SO USEFUL
You will notice that much of the SQL in this tutorial uses the uci_dataset_flags_child_attribute_counts table by joining the table to itself two or three times. Using SQL statement DSB-2-SQL-7 as an example, joining the table to itself three times allows calculations to be done as each result is processed: one alias for the Type 1 counts, one for the Type 2 counts and one for the SUM of the Type 1 and Type 2 counts.

    SELECT flags_count_1.independent_variable_name AS attribute_name,
           flags_count_1.independent_value AS attribute_value,
           -- Type 1 Entropy for Each Attribute Value + Target Value
           ((((flags_count_1.target_value_count)/(SUM(flags_count_3.target_value_count))))
            * ((-1*(LOG2(((flags_count_1.target_value_count)/(SUM(flags_count_3.target_value_count)))))))
           ) AS type_1_entropy,
           -- Type 2 Entropy for Each Attribute Value + Target Value
           ((((flags_count_2.target_value_count)/(SUM(flags_count_3.target_value_count))))
            * ((-1*(LOG2(((flags_count_2.target_value_count)/(SUM(flags_count_3.target_value_count)))))))
           ) AS type_2_entropy
      FROM uci_dataset_flags_child_attribute_counts flags_count_1,
           uci_dataset_flags_child_attribute_counts flags_count_2,
           uci_dataset_flags_child_attribute_counts flags_count_3
     WHERE flags_count_1.independent_variable_name = flags_count_3.independent_variable_name
       AND flags_count_2.independent_variable_name = flags_count_3.independent_variable_name
       AND flags_count_1.independent_value = flags_count_3.independent_value
       AND flags_count_2.independent_value = flags_count_3.independent_value
       AND flags_count_1.target_value = 'Type 1'
       AND flags_count_2.target_value = 'Type 2'
     GROUP BY flags_count_3.independent_variable_name,
              flags_count_3.independent_value,
              -- listed so the query also runs when ONLY_FULL_GROUP_BY is enabled
              flags_count_1.target_value_count,
              flags_count_2.target_value_count
     ORDER BY 1,2

SQL Statement for Calculating Attribute Entropy for Attribute Value + Target Value (DSB-2-SQL-7)

The abbreviated results from SQL statement DSB-2-SQL-7 are below (with 105 results not shown).

[Table: Abbreviated Output from Query DSB-2-SQL-7 (DSB-2-SQL-7-Results). Columns: Attribute Name; Value; Type 1 Attribute Entropy Æ; Type 2 Attribute Entropy Æ. Rows shown: animate_image (2 rows), bars (4 rows), black (2 rows), blue (2 rows), ... results not shown ..., triangle (2 rows), white (2 rows), zone (4 rows).]

See the entire output from the DSB-2-SQL-7 query at Chameleon's online student lab environment here: Output from Query DSB-2-SQL-7 (DSB-2-SQL-7-Results-URL)

3. CALCULATING INFORMATION GAIN (IG) FOR EACH ATTRIBUTE OF THE UCI FLAGS PLUS TARGET DATASET

Calculating IG for a given attribute (called a Feature in R) simply involves knowing the difference between the dataset's Ѥ and each attribute's Æ. The greater the difference, the more IG that attribute value provides.

[Figure: Dataset Entropy, Attribute Entropy and Information Gain (DSB-2-Infographic-4)]
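The difference described above can be sketched in Python using the animate_image counts from the DSB-2-SQL-4 results. Two calculations are shown: the per-value difference as this tutorial describes it, and, as a clearly labeled addition, the conventional decision-tree formulation in which each value's entropy is weighted by its share of the records before subtracting. The names `entropy`, `per_value_gain` and `information_gain` are illustrative assumptions.

```python
import math


def entropy(counts):
    """Entropy in bits for a sequence of class counts."""
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)


# animate_image counts as (Type 1, Type 2) per attribute value, from DSB-2-SQL-4-Results.
animate_image = {0: (88, 67), 1: (24, 15)}

# Dataset-level target counts are the column sums across attribute values.
dataset_counts = [sum(pair[i] for pair in animate_image.values()) for i in range(2)]
total_records = sum(dataset_counts)
dataset_entropy = entropy(dataset_counts)

# Tutorial's description: dataset entropy minus the entropy of each attribute value.
per_value_gain = {
    value: dataset_entropy - entropy(pair) for value, pair in animate_image.items()
}

# Conventional formulation (an addition, not from the tutorial): weight each
# value's entropy by its share of the records, then subtract from dataset entropy.
weighted_entropy = sum(
    (sum(pair) / total_records) * entropy(pair) for pair in animate_image.values()
)
information_gain = dataset_entropy - weighted_entropy

print(f"Dataset entropy: {dataset_entropy:.4f}")
for value, gain in per_value_gain.items():
    print(f"animate_image = {value}: per-value difference = {gain:+.4f}")
print(f"Weighted information gain for animate_image: {information_gain:.4f}")
```

One thing this makes visible: the unweighted per-value difference can be slightly negative for an individual value (here animate_image = 0, whose Type 1 share is closer to 50% than the dataset's), whereas the weighted form cannot drop below zero.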

We revisit the basic formula for calculating Æ for an attribute with n possible target values, first shown in the text above (immediately after SQL statement DSB-2-SQL-5).

$$\text{IG}_{\text{each attribute}} = \text{Ѥ}_{\text{dataset}} - \text{Æ}_{\text{each attribute}}$$

Basic Formula for Calculating Attribute Entropy (Æ) (DSB-2-Formula-1)

Now that we have all the components needed to calculate each attribute's Æ, we just modify SQL statement DSB-2-SQL-7 above by adding a + between the two blocks, removing AS type_1_entropy from the Type 1 block of SQL, and changing the computed column's label to AS classification_entropy. The result is DSB-2-SQL-8.

Remember: SQL treats two hyphens (--) and anything written after them on the same line as a comment, and ignores it.

    SELECT flags_count_1.independent_variable_name AS attribute_name,
           flags_count_1.independent_value AS attribute_value,
           -- Total Entropy for Each Attribute Value
           (
             -- Type 1 Entropy for Each Attribute Value + Target Value
             ((((flags_count_1.target_value_count)/(SUM(flags_count_3.target_value_count))))
              * ((-1*(LOG2(((flags_count_1.target_value_count)/(SUM(flags_count_3.target_value_count))))))))
             +
             -- Type 2 Entropy for Each Attribute Value + Target Value
             ((((flags_count_2.target_value_count)/(SUM(flags_count_3.target_value_count))))
              * ((-1*(LOG2(((flags_count_2.target_value_count)/(SUM(flags_count_3.target_value_count))))))))
           ) AS classification_entropy
      FROM uci_dataset_flags_child_attribute_counts flags_count_1,
           uci_dataset_flags_child_attribute_counts flags_count_2,
           uci_dataset_flags_child_attribute_counts flags_count_3
     WHERE flags_count_1.independent_variable_name = flags_count_3.independent_variable_name
       AND flags_count_2.independent_variable_name = flags_count_3.independent_variable_name
       AND flags_count_1.independent_value = flags_count_3.independent_value
       AND flags_count_2.independent_value = flags_count_3.independent_value
       AND flags_count_1.target_value = 'Type 1'
       AND flags_count_2.target_value = 'Type 2'
     GROUP BY flags_count_3.independent_variable_name,
              flags_count_3.independent_value,
              -- listed so the query also runs when ONLY_FULL_GROUP_BY is enabled
              flags_count_1.target_value_count,
              flags_count_2.target_value_count
     ORDER BY 1, 2

SQL Statement for Calculating Class Entropy by Attribute Value for Both Target Values (DSB-2-SQL-8)

WARNING

ENTROPY CALCULATIONS SHOULD NEVER BE NEGATIVE
When I first wrote DSB-2-SQL-8, I was getting a couple of results which were negative numbers. That is mathematically impossible. So, if you ever come up with a negative result, you did something wrong. In my case, I forgot to multiply by the percentage of each target value, which is shown above as Proportion AV + TV in DSB-2-SQL-6-Results.

The abbreviated results from SQL statement DSB-2-SQL-8 are below (with 107 results not shown). Remember that the results from DSB-2-SQL-8 reflect the entropy of a class of records (attribute_name + value) spanning both target values (i.e. both Type 1 and Type 2). All that's needed here is to add the Type 1 Æ to the Type 2 Æ to get the entropy for the attribute value across both target values.

[Table: Abbreviated Output from Query DSB-2-SQL-8 (DSB-2-SQL-8-Results). Columns: Attribute Name; Value; Class Entropy. Rows shown: animate_image (2 rows), bars (4 rows), black (2 rows), blue (2 rows), ... results not shown ..., white (2 rows), zone (4 rows).]

Using attribute value animate_image = 0 as an example, we simply add both results from DSB-2-SQL-7-Results to demonstrate how the first row of DSB-2-SQL-8-Results is calculated.

$$(\text{Type 1 Æ} \approx 0.4637) + (\text{Type 2 Æ} \approx 0.5230) \approx 0.9867$$

Formula for Calculating Attribute Entropy (Æ) for the animate_image attribute of the UCI Flags Plus Target Dataset (DSB-2-Formula-2)

Put another way, the lower the Class Entropy, the greater the IG of the class. And the greater the IG of a class, the better that attribute will be at bringing more order (low class entropy) to a disordered dataset (one with a high Ѥ value), such as the UCI Flags Plus Target dataset used here. The terms class entropy and attribute entropy will be assumed to mean the same thing from this point forward.
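The class entropy step, together with the "never negative" sanity check, can be sketched in Python as follows. The bars counts come from the abbreviated DSB-2-SQL-4 results; the function name `class_entropy` is an assumption, and treating a missing target value as a count of zero is a design choice of this sketch rather than something the tutorial specifies.

```python
import math


def class_entropy(type_1_count: int, type_2_count: int) -> float:
    """Entropy of one attribute value across both target values (Type 1 Æ + Type 2 Æ)."""
    total = type_1_count + type_2_count
    result = 0.0
    for count in (type_1_count, type_2_count):
        if count == 0:
            continue  # an absent target value contributes no disorder
        proportion = count / total
        # Multiply the negated log by the proportion; each term is then >= 0.
        result += proportion * -math.log2(proportion)
    return result


# (Type 1, Type 2) counts for the bars values quoted in DSB-2-SQL-4-Results.
bars = {0: (95, 64), 1: (1, 5), 2: (3, 4), 3: (12, 9)}

for value, (type_1, type_2) in bars.items():
    value_entropy = class_entropy(type_1, type_2)
    assert 0.0 <= value_entropy <= 1.0, "binary class entropy must lie between 0 and 1"
    print(f"bars = {value}: class entropy = {value_entropy:.4f}")
```

A side note on pure classes: because DSB-2-SQL-7 and DSB-2-SQL-8 require both a Type 1 row and a Type 2 row in the join, an attribute value that occurs with only one target value (for example a value held by a single record) does not appear in their results at all. Handling a zero count explicitly, as above, returns an entropy of 0 for such a value instead.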

See the entire output from the DSB-2-SQL-8 query at Chameleon's online student lab environment here: Output from Query DSB-2-SQL-8 (DSB-2-SQL-8-Results-URL)

Since we're looking for the lowest attribute/class entropy, statement DSB-2-SQL-8 is rewritten with a new ORDER BY and becomes DSB-2-SQL-9. Since this query sorts on the Æ value, and since the lowest Æ (or class entropy) means the most IG, it returns the (attribute_name + value) combination with the most Information Gain (IG), your best candidate, at the top of the results. Typically, the (attribute_name + value) combination with the highest IG is the first branch of a classification tree. However, there are no definitive rules in this work, which is why I said "typically".

Finally, if you were paying close attention to the details, you will have noticed that the WHERE clause conditions which eliminate the attributes with too many distinct values haven't been used since query DSB-2-SQL-3. These are the NOT conditions in DSB-2-SQL-3's WHERE clause. Of course, they could be inserted into DSB-2-SQL-4 through DSB-2-SQL-8 too, but they were left out so those statements wouldn't take up so much space. However, since we're now looking for the candidate (attribute_name + value) combinations with the lowest entropy, we need to reintroduce the NOT conditions as part of DSB-2-SQL-9 so those attributes are not included in the final result set.

The SQL statement DSB-2-SQL-9 is on the next page so the entire statement is on one page.

Remember: SQL treats two hyphens (--) and anything after them on the same line as a comment and ignores it.

SELECT flags_count_1.independent_variable_name AS attribute_name,
       flags_count_1.independent_value AS attribute_value,
       -- Total Entropy for Each Attribute Value
       -- Type 1 Entropy for Each Attribute Value + Target Value
       ((flags_count_1.target_value_count / SUM(flags_count_3.target_value_count))
        * (-1 * LOG2(flags_count_1.target_value_count / SUM(flags_count_3.target_value_count))))
       -- Type 1 Entropy for Each Attribute Value + Target Value
       +
       -- Type 2 Entropy for Each Attribute Value + Target Value
       ((flags_count_2.target_value_count / SUM(flags_count_3.target_value_count))
        * (-1 * LOG2(flags_count_2.target_value_count / SUM(flags_count_3.target_value_count))))
       AS classification_entropy
       -- Type 2 Entropy for Each Attribute Value + Target Value
       -- Total Entropy for Each Attribute Value
FROM uci_dataset_flags_child_attribute_counts flags_count_1,
     uci_dataset_flags_child_attribute_counts flags_count_2,
     uci_dataset_flags_child_attribute_counts flags_count_3
WHERE flags_count_1.independent_variable_name = flags_count_3.independent_variable_name
  AND flags_count_2.independent_variable_name = flags_count_3.independent_variable_name
  AND flags_count_1.independent_value = flags_count_3.independent_value
  AND flags_count_2.independent_value = flags_count_3.independent_value
  AND flags_count_1.target_value = 'Type 1'
  AND flags_count_2.target_value = 'Type 2'
  -- Exclude the attributes with too many distinct values
  AND NOT flags_count_3.independent_variable_name = 'chameleon_key'
  AND NOT flags_count_3.independent_variable_name = 'chameleon_target_variable'
  AND NOT flags_count_3.independent_variable_name = 'data_source_url'
  AND NOT flags_count_3.independent_variable_name = 'name'
  AND NOT flags_count_3.independent_variable_name = 'area'
  AND NOT flags_count_3.independent_variable_name = 'population'
GROUP BY flags_count_3.independent_variable_name, flags_count_3.independent_value
ORDER BY 3

SQL Statement for Calculating Class Entropy by Attribute Value for Both Target Values (DSB-2-SQL-9)
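For readers who want to check the logic outside the database, here is a minimal Python sketch of what a query like DSB-2-SQL-9 does: compute the class entropy for each (attribute_name + value) combination, skip the excluded attributes, and sort ascending so the lowest entropy comes first. The counts dictionary is invented purely for illustration, and the names counts, EXCLUDED, class_entropy and ranked are hypothetical. Only the excluded attribute names come from the statement above.

```python
import math

# Hypothetical input: (attribute_name, attribute_value) -> (Type 1 count, Type 2 count)
counts = {
    ("bars", "0"): (18, 2),
    ("zone", "1"): (10, 10),
    ("stripes", "3"): (12, 8),
    ("name", "Fiji"): (1, 0),  # belongs to an excluded, high-cardinality attribute
}

# Attributes removed by the NOT conditions in the WHERE clause
EXCLUDED = {
    "chameleon_key", "chameleon_target_variable", "data_source_url",
    "name", "area", "population",
}

def class_entropy(type1_count, type2_count):
    """Add the Type 1 and Type 2 entropy terms for one class."""
    total = type1_count + type2_count
    entropy = 0.0
    for count in (type1_count, type2_count):
        if count > 0:
            proportion = count / total
            entropy += proportion * (-1 * math.log2(proportion))
    return entropy

# Equivalent of ORDER BY 3: sort ascending on the third column (class entropy)
ranked = sorted(
    (
        (name, value, class_entropy(type1, type2))
        for (name, value), (type1, type2) in counts.items()
        if name not in EXCLUDED
    ),
    key=lambda row: row[2],
)

for attribute_name, attribute_value, entropy in ranked:
    print(attribute_name, attribute_value, round(entropy, 4))
```

With these made-up counts, the most lopsided class sorts to the top and the evenly split class, with an entropy of 1, sorts to the bottom.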

The abbreviated results from SQL statement DSB-2-SQL-9 are below:

Attribute Name       Value    Class Entropy
bars
count_of_colors
landmass
triangle
stripes
language
... Results not shown ...
count_of_sunstars    6        1
stripes              9        1
language             5        1
count_of_sunstars    3        1

Abbreviated Output from Query DSB-2-SQL-9 (DSB-2-SQL-9-Results)

See the entire output from the DSB-2-SQL-9 query at Chameleon's online student lab environment here: Output from Query DSB-2-SQL-9 (DSB-2-SQL-9-Results-URL)

That's the end of this tutorial! In the next of this series of tutorials, DSB-3: Loading UCI Flags Plus Target into A Data Vault, I'll walk the learner through how we keep all the data, SQL, images and interim results managed so they are not constantly recalculated over and over again. When that happens, different people recalculate the same metric using different methods, tools (Python vs. R) or data. Why do you think statements like this are heard so often? "The numbers from these two reports should match, but they're not even close."

Tutorial DSB-3: Loading UCI Flags Plus Target into A Data Vault will explain how to avoid that problem with a Data Vault ETL environment working in conjunction with Chameleon Metadata, so you do it once and reuse it everywhere. See you then.

###

By Eric Thornton, ET@ChameleonMetadata.com

Lastly, in case you don t already know this, and don t have Excel on your computers, you can get it for free through IT s website under software.

Lastly, in case you don t already know this, and don t have Excel on your computers, you can get it for free through IT s website under software. Welcome to Basic Excel, presented by STEM Gateway as part of the Essential Academic Skills Enhancement, or EASE, workshop series. Before we begin, I want to make sure we are clear that this is by no means

More information

Table of Laplace Transforms

Table of Laplace Transforms Table of Laplace Transforms 1 1 2 3 4, p > -1 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 Heaviside Function 27 28. Dirac Delta Function 29 30. 31 32. 1 33 34. 35 36. 37 Laplace Transforms

More information

5 R1 The one green in the same place so either of these could be green.

5 R1 The one green in the same place so either of these could be green. Page: 1 of 20 1 R1 Now. Maybe what we should do is write out the cases that work. We wrote out one of them really very clearly here. [R1 takes out some papers.] Right? You did the one here um where you

More information

Term Definition Introduced in: This option, located within the View tab, provides a variety of options to choose when sorting and grouping Arrangement

Term Definition Introduced in: This option, located within the View tab, provides a variety of options to choose when sorting and grouping Arrangement 60 Minutes of Outlook Secrets Term Definition Introduced in: This option, located within the View tab, provides a variety of options to choose when sorting and grouping Arrangement messages. Module 2 Assign

More information

SQL stands for Structured Query Language. SQL is the lingua franca

SQL stands for Structured Query Language. SQL is the lingua franca Chapter 3: Database for $100, Please In This Chapter Understanding some basic database concepts Taking a quick look at SQL Creating tables Selecting data Joining data Updating and deleting data SQL stands

More information

The name of our class will be Yo. Type that in where it says Class Name. Don t hit the OK button yet.

The name of our class will be Yo. Type that in where it says Class Name. Don t hit the OK button yet. Mr G s Java Jive #2: Yo! Our First Program With this handout you ll write your first program, which we ll call Yo. Programs, Classes, and Objects, Oh My! People regularly refer to Java as a language that

More information

Excel Basics: Working with Spreadsheets

Excel Basics: Working with Spreadsheets Excel Basics: Working with Spreadsheets E 890 / 1 Unravel the Mysteries of Cells, Rows, Ranges, Formulas and More Spreadsheets are all about numbers: they help us keep track of figures and make calculations.

More information

QUICK EXCEL TUTORIAL. The Very Basics

QUICK EXCEL TUTORIAL. The Very Basics QUICK EXCEL TUTORIAL The Very Basics You Are Here. Titles & Column Headers Merging Cells Text Alignment When we work on spread sheets we often need to have a title and/or header clearly visible. Merge

More information

MITOCW watch?v=zm5mw5nkzjg

MITOCW watch?v=zm5mw5nkzjg MITOCW watch?v=zm5mw5nkzjg The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To

More information

1.7 Limit of a Function

1.7 Limit of a Function 1.7 Limit of a Function We will discuss the following in this section: 1. Limit Notation 2. Finding a it numerically 3. Right and Left Hand Limits 4. Infinite Limits Consider the following graph Notation:

More information

Access Intermediate

Access Intermediate Access 2010 - Intermediate 103-134 Advanced Queries Quick Links Overview Pages AC116 AC117 Selecting Fields Pages AC118 AC119 AC122 Sorting Results Pages AC125 AC126 Specifying Criteria Pages AC132 AC134

More information

SCRATCH MODULE 3: NUMBER CONVERSIONS

SCRATCH MODULE 3: NUMBER CONVERSIONS SCRATCH MODULE 3: NUMBER CONVERSIONS INTRODUCTION The purpose of this module is to experiment with user interactions, error checking input, and number conversion algorithms in Scratch. We will be exploring

More information

Sql Server Check If Global Temporary Table Exists

Sql Server Check If Global Temporary Table Exists Sql Server Check If Global Temporary Table Exists I am trying to create a temp table from the a select statement so that I can get the schema information from the temp I have yet to see a valid justification

More information

Access Intermediate

Access Intermediate Access 2013 - Intermediate 103-134 Advanced Queries Quick Links Overview Pages AC124 AC125 Selecting Fields Pages AC125 AC128 AC129 AC131 AC238 Sorting Results Pages AC131 AC136 Specifying Criteria Pages

More information

Advanced Reporting Tool

Advanced Reporting Tool Advanced Reporting Tool The Advanced Reporting tool is designed to allow users to quickly and easily create new reports or modify existing reports for use in the Rewards system. The tool utilizes the Active

More information

The attendee will get a deep dive into all the DDL changes needed in order to exploit DB2 V10 Temporal tables as well as the limitations.

The attendee will get a deep dive into all the DDL changes needed in order to exploit DB2 V10 Temporal tables as well as the limitations. The attendee will get a deep dive into all the DDL changes needed in order to exploit DB2 V10 Temporal tables as well as the limitations. A case study scenario using a live DB2 V10 system will be used

More information

Section 0.3 The Order of Operations

Section 0.3 The Order of Operations Section 0.3 The Contents: Evaluating an Expression Grouping Symbols OPERATIONS The Distributive Property Answers Focus Exercises Let s be reminded of those operations seen thus far in the course: Operation

More information

Furl Furled Furling. Social on-line book marking for the masses. Jim Wenzloff Blog:

Furl Furled Furling. Social on-line book marking for the masses. Jim Wenzloff Blog: Furl Furled Furling Social on-line book marking for the masses. Jim Wenzloff jwenzloff@misd.net Blog: http://www.visitmyclass.com/blog/wenzloff February 7, 2005 This work is licensed under a Creative Commons

More information

DB2 is a complex system, with a major impact upon your processing environment. There are substantial performance and instrumentation changes in

DB2 is a complex system, with a major impact upon your processing environment. There are substantial performance and instrumentation changes in DB2 is a complex system, with a major impact upon your processing environment. There are substantial performance and instrumentation changes in versions 8 and 9. that must be used to measure, evaluate,

More information

Creating Custom Financial Statements Using

Creating Custom Financial Statements Using Creating Custom Financial Statements Using Steve Collins Sage 50 Solution Provider scollins@iqacct.com 918-851-9713 www.iqaccountingsolutions.com Financial Statement Design Sage 50 Accounting s built in

More information

Introduction to Access 97/2000

Introduction to Access 97/2000 Introduction to Access 97/2000 PowerPoint Presentation Notes Slide 1 Introduction to Databases (Title Slide) Slide 2 Workshop Ground Rules Slide 3 Objectives Here are our objectives for the day. By the

More information

Adding content to your Blackboard 9.1 class

Adding content to your Blackboard 9.1 class Adding content to your Blackboard 9.1 class There are quite a few options listed when you click the Build Content button in your class, but you ll probably only use a couple of them most of the time. Note

More information

Math 7 Notes Unit Three: Applying Rational Numbers

Math 7 Notes Unit Three: Applying Rational Numbers Math 7 Notes Unit Three: Applying Rational Numbers Strategy note to teachers: Typically students need more practice doing computations with fractions. You may want to consider teaching the sections on

More information

Conference Users Guide for the GCFA Statistical Input System.

Conference Users Guide for the GCFA Statistical Input System. Conference Users Guide for the GCFA Statistical Input System http://eagle.gcfa.org Published: November 29, 2007 TABLE OF CONTENTS Overview... 3 First Login... 4 Entering the System... 5 Add/Edit Church...

More information

Best Practices for. Membership Renewals

Best Practices for. Membership Renewals Best Practices for Membership Renewals For many associations, it s easy to get caught up in the marketing efforts associated with attracting new members. But as important as membership growth is, renewal

More information

GENERAL MATH FOR PASSING

GENERAL MATH FOR PASSING GENERAL MATH FOR PASSING Your math and problem solving skills will be a key element in achieving a passing score on your exam. It will be necessary to brush up on your math and problem solving skills.

More information

Java/RealJ Troubleshooting Guide

Java/RealJ Troubleshooting Guide Java/RealJ Troubleshooting Guide by Bob Clark / Sharon Curtis / Simon Jones, September 2000 Some of these tips you will come across during your practical sessions, however we felt it would be helpful to

More information

Fractions and their Equivalent Forms

Fractions and their Equivalent Forms Fractions Fractions and their Equivalent Forms Little kids use the concept of a fraction long before we ever formalize their knowledge in school. Watching little kids share a candy bar or a bottle of soda

More information

If Statements, For Loops, Functions

If Statements, For Loops, Functions Fundamentals of Programming If Statements, For Loops, Functions Table of Contents Hello World Types of Variables Integers and Floats String Boolean Relational Operators Lists Conditionals If and Else Statements

More information

Lecture 1: Overview

Lecture 1: Overview 15-150 Lecture 1: Overview Lecture by Stefan Muller May 21, 2018 Welcome to 15-150! Today s lecture was an overview that showed the highlights of everything you re learning this semester, which also meant

More information

CMSC424: Database Design. Instructor: Amol Deshpande

CMSC424: Database Design. Instructor: Amol Deshpande CMSC424: Database Design Instructor: Amol Deshpande amol@cs.umd.edu Databases Data Models Conceptual representa1on of the data Data Retrieval How to ask ques1ons of the database How to answer those ques1ons

More information

On the Web sun.com/aboutsun/comm_invest STAROFFICE 8 DRAW

On the Web sun.com/aboutsun/comm_invest STAROFFICE 8 DRAW STAROFFICE 8 DRAW Graphics They say a picture is worth a thousand words. Pictures are often used along with our words for good reason. They help communicate our thoughts. They give extra information that

More information

6 Tips to Help You Improve Configuration Management. by Stuart Rance

6 Tips to Help You Improve Configuration Management. by Stuart Rance 6 Tips to Help You Improve Configuration Management by Stuart Rance Introduction Configuration management provides information about what assets you own, how they are configured, and how they are connected

More information

CSCI 1100L: Topics in Computing Lab Lab 11: Programming with Scratch

CSCI 1100L: Topics in Computing Lab Lab 11: Programming with Scratch CSCI 1100L: Topics in Computing Lab Lab 11: Programming with Scratch Purpose: We will take a look at programming this week using a language called Scratch. Scratch is a programming language that was developed

More information

DO EVEN MORE WITH TABLEAU. At BlueGranite, our unique approach and extensive expertise helps you get the most from your Tableau products.

DO EVEN MORE WITH TABLEAU. At BlueGranite, our unique approach and extensive expertise helps you get the most from your Tableau products. DO EVEN MORE WITH TABLEAU At BlueGranite, our unique approach and extensive expertise helps you get the most from your Tableau products. WHAT WE DO WE PLAN, DESIGN AND BUILD SOLUTIONS WITH TABLEAU TECHNOLOGY.

More information

CheckBook Pro 2 Help

CheckBook Pro 2 Help Get started with CheckBook Pro 9 Introduction 9 Create your Accounts document 10 Name your first Account 11 Your Starting Balance 12 Currency 13 We're not done yet! 14 AutoCompletion 15 Descriptions 16

More information

(Updated 29 Oct 2016)

(Updated 29 Oct 2016) (Updated 29 Oct 2016) 1 Class Maker 2016 Program Description Creating classes for the new school year is a time consuming task that teachers are asked to complete each year. Many schools offer their students

More information

Touring the Mac S e s s i o n 4 : S A V E, P R I N T, C L O S E & Q U I T

Touring the Mac S e s s i o n 4 : S A V E, P R I N T, C L O S E & Q U I T Touring the Mac S e s s i o n 4 : S A V E, P R I N T, C L O S E & Q U I T Touring_the_Mac_Session-4_Feb-22-2011 1 To store your document for later retrieval, you must save an electronic file in your computer.

More information

Excel Basics Rice Digital Media Commons Guide Written for Microsoft Excel 2010 Windows Edition by Eric Miller

Excel Basics Rice Digital Media Commons Guide Written for Microsoft Excel 2010 Windows Edition by Eric Miller Excel Basics Rice Digital Media Commons Guide Written for Microsoft Excel 2010 Windows Edition by Eric Miller Table of Contents Introduction!... 1 Part 1: Entering Data!... 2 1.a: Typing!... 2 1.b: Editing

More information

Text Input and Conditionals

Text Input and Conditionals Text Input and Conditionals Text Input Many programs allow the user to enter information, like a username and password. Python makes taking input from the user seamless with a single line of code: input()

More information

Query Processing & Optimization

Query Processing & Optimization Query Processing & Optimization 1 Roadmap of This Lecture Overview of query processing Measures of Query Cost Selection Operation Sorting Join Operation Other Operations Evaluation of Expressions Introduction

More information

CPSC 320: Intermediate Algorithm Design and Analysis. Tutorial: Week 3

CPSC 320: Intermediate Algorithm Design and Analysis. Tutorial: Week 3 CPSC 320: Intermediate Algorithm Design and Analysis Author: Susanne Bradley Tutorial: Week 3 At the time of this week s tutorial, we were approaching the end of our stable matching unit and about to start

More information

It s possible to get your inbox to zero and keep it there, even if you get hundreds of s a day.

It s possible to get your  inbox to zero and keep it there, even if you get hundreds of  s a day. It s possible to get your email inbox to zero and keep it there, even if you get hundreds of emails a day. It s not super complicated, though it does take effort and discipline. Many people simply need

More information

Modeling of RAS and Relays in Power Flow Contingency Analysis. Jamie Weber

Modeling of RAS and Relays in Power Flow Contingency Analysis. Jamie Weber Modeling of RAS and Relays in Power Flow Contingency Analysis Jamie Weber weber@powerworld.com 217 384 6330 ext. 13 2001 South First Street Champaign, Illinois 61820 +1 (217) 384.6330 support@powerworld.com

More information

GSAK (Geocaching Swiss Army Knife) GEOCACHING SOFTWARE ADVANCED KLASS GSAK by C3GPS & Major134

GSAK (Geocaching Swiss Army Knife) GEOCACHING SOFTWARE ADVANCED KLASS GSAK by C3GPS & Major134 GSAK (Geocaching Swiss Army Knife) GEOCACHING SOFTWARE ADVANCED KLASS GSAK - 102 by C3GPS & Major134 Table of Contents About this Document... iii Class Materials... iv 1.0 Locations...1 1.1 Adding Locations...

More information

LOOPS. Repetition using the while statement

LOOPS. Repetition using the while statement 1 LOOPS Loops are an extremely useful feature in any programming language. They allow you to direct the computer to execute certain statements more than once. In Python, there are two kinds of loops: while

More information

Grade 6 Math Circles November 6 & Relations, Functions, and Morphisms

Grade 6 Math Circles November 6 & Relations, Functions, and Morphisms Faculty of Mathematics Waterloo, Ontario N2L 3G1 Centre for Education in Mathematics and Computing Relations Let s talk about relations! Grade 6 Math Circles November 6 & 7 2018 Relations, Functions, and

More information

Workbooks (File) and Worksheet Handling

Workbooks (File) and Worksheet Handling Workbooks (File) and Worksheet Handling Excel Limitation Excel shortcut use and benefits Excel setting and custom list creation Excel Template and File location system Advanced Paste Special Calculation

More information

Sql 2008 Copy Table Structure And Database To

Sql 2008 Copy Table Structure And Database To Sql 2008 Copy Table Structure And Database To Another Table Different you can create a table with same schema in another database first and copy the data like Browse other questions tagged sql-server sql-server-2008r2-express.

More information

Introduction to Scientific Computing

Introduction to Scientific Computing Introduction to Scientific Computing Dr Hanno Rein Last updated: October 12, 2018 1 Computers A computer is a machine which can perform a set of calculations. The purpose of this course is to give you

More information

Spectroscopic Analysis: Peak Detector

Spectroscopic Analysis: Peak Detector Electronics and Instrumentation Laboratory Sacramento State Physics Department Spectroscopic Analysis: Peak Detector Purpose: The purpose of this experiment is a common sort of experiment in spectroscopy.

More information

Liquibase Version Control For Your Schema. Nathan Voxland April 3,

Liquibase Version Control For Your Schema. Nathan Voxland April 3, Liquibase Version Control For Your Schema Nathan Voxland April 3, 2014 nathan@liquibase.org @nvoxland Agenda 2 Why Liquibase Standard Usage Tips and Tricks Q&A Why Liquibase? 3 You would never develop

More information

Project 1 Balanced binary

Project 1 Balanced binary CMSC262 DS/Alg Applied Blaheta Project 1 Balanced binary Due: 7 September 2017 You saw basic binary search trees in 162, and may remember that their weakness is that in the worst case they behave like

More information

Publications Database

Publications Database Getting Started Guide Publications Database To w a r d s a S u s t a i n a b l e A s i a - P a c i f i c!1 Table of Contents Introduction 3 Conventions 3 Getting Started 4 Suggesting a Topic 11 Appendix

More information

Earthquake data in geonet.org.nz

Earthquake data in geonet.org.nz Earthquake data in geonet.org.nz There is are large gaps in the 2012 and 2013 data, so let s not use it. Instead we ll use a previous year. Go to http://http://quakesearch.geonet.org.nz/ At the screen,

More information

n! = 1 * 2 * 3 * 4 * * (n-1) * n

n! = 1 * 2 * 3 * 4 * * (n-1) * n The Beauty and Joy of Computing 1 Lab Exercise 9: Problem self-similarity and recursion Objectives By completing this lab exercise, you should learn to Recognize simple self-similar problems which are

More information

Design and Analysis of Algorithms Prof. Madhavan Mukund Chennai Mathematical Institute. Module 02 Lecture - 45 Memoization

Design and Analysis of Algorithms Prof. Madhavan Mukund Chennai Mathematical Institute. Module 02 Lecture - 45 Memoization Design and Analysis of Algorithms Prof. Madhavan Mukund Chennai Mathematical Institute Module 02 Lecture - 45 Memoization Let us continue our discussion of inductive definitions. (Refer Slide Time: 00:05)

More information

Modular Arithmetic. is just the set of remainders we can get when we divide integers by n

Modular Arithmetic. is just the set of remainders we can get when we divide integers by n 20181004 Modular Arithmetic We are accustomed to performing arithmetic on infinite sets of numbers. But sometimes we need to perform arithmetic on a finite set, and we need it to make sense and be consistent

More information

ITConnect KEEPING TRACK OF YOUR EXPENSES WITH YNAB

ITConnect KEEPING TRACK OF YOUR EXPENSES WITH YNAB ITConnect Technology made practical for home APRIL 06 Edit PDF files with Word Word is the best tool we have at hand to edit PDFs without having to purchase extra software. Viruses distributed by email

More information

Design and Analysis of Algorithms Prof. Madhavan Mukund Chennai Mathematical Institute. Week 02 Module 06 Lecture - 14 Merge Sort: Analysis

Design and Analysis of Algorithms Prof. Madhavan Mukund Chennai Mathematical Institute. Week 02 Module 06 Lecture - 14 Merge Sort: Analysis Design and Analysis of Algorithms Prof. Madhavan Mukund Chennai Mathematical Institute Week 02 Module 06 Lecture - 14 Merge Sort: Analysis So, we have seen how to use a divide and conquer strategy, we

More information

NON-CALCULATOR ARITHMETIC

NON-CALCULATOR ARITHMETIC Mathematics Revision Guides Non-Calculator Arithmetic Page 1 of 30 M.K. HOME TUITION Mathematics Revision Guides: Level: GCSE Foundation Tier NON-CALCULATOR ARITHMETIC Version: 3.2 Date: 21-10-2016 Mathematics

More information

SERIES 1 (SS1) (V ) SELECT SERIES 1 REFRESH (SS1R) (V ) SELECT SERIES 2 (SS2) (V ) ADDENDUM

SERIES 1 (SS1) (V ) SELECT SERIES 1 REFRESH (SS1R) (V ) SELECT SERIES 2 (SS2) (V ) ADDENDUM WHY USE AN ADDENDUM: INROADS SURVEY V8i SELECT SERIES 1 (SS1) (V08.11.07.229) SELECT SERIES 1 REFRESH (SS1R) (V08.11.07.246) SELECT SERIES 2 (SS2) (V08.11.07.428) ADDENDUM This addendum is meant to accompany

More information

Students received individual feedback throughout year on assignments.

Students received individual feedback throughout year on assignments. ACS108 No exam. Students received individual feedback throughout year on assignments. ACS123 In general, during ACS123 exam session, students have shown satisfactory performance, clear understanding of

More information

EDITING AN EXISTING REPORT

EDITING AN EXISTING REPORT Report Writing in NMU Cognos Administrative Reporting 1 This guide assumes that you have had basic report writing training for Cognos. It is simple guide for the new upgrade. Basic usage of report running

More information

EEN118 LAB TWO. 1. A Five-Pointed Star.

EEN118 LAB TWO. 1. A Five-Pointed Star. EEN118 LAB TWO The purpose of this lab is to get practice with defining and using your own functions. The essence of good structured programming is to split large problems into smaller and smaller sub-problems.

More information

CS1114: Matlab Introduction

CS1114: Matlab Introduction CS1114: Matlab Introduction 1 Introduction The purpose of this introduction is to provide you a brief introduction to the features of Matlab that will be most relevant to your work in this course. Even

More information

NCSS: Databases and SQL

NCSS: Databases and SQL NCSS: Databases and SQL Tim Dawborn Lecture 1, January, 2016 Motivation SQLite SELECT WHERE JOIN Tips 2 Outline 1 Motivation 2 SQLite 3 Searching for Data 4 Filtering Results 5 Joining multiple tables

More information

Burning CDs in Windows XP

Burning CDs in Windows XP B 770 / 1 Make CD Burning a Breeze with Windows XP's Built-in Tools If your PC is equipped with a rewritable CD drive you ve almost certainly got some specialised software for copying files to CDs. If

More information

SharePoint 2010 Site Owner s Manual by Yvonne M. Harryman

SharePoint 2010 Site Owner s Manual by Yvonne M. Harryman SharePoint 2010 Site Owner s Manual by Yvonne M. Harryman Chapter 9 Copyright 2012 Manning Publications Brief contents PART 1 GETTING STARTED WITH SHAREPOINT 1 1 Leveraging the power of SharePoint 3 2

More information

VISUAL GUIDE to. RX Scripting. for Roulette Xtreme - System Designer 2.0. L J Howell UX Software Ver. 1.0

VISUAL GUIDE to. RX Scripting. for Roulette Xtreme - System Designer 2.0. L J Howell UX Software Ver. 1.0 VISUAL GUIDE to RX Scripting for Roulette Xtreme - System Designer 2.0 L J Howell UX Software 2009 Ver. 1.0 TABLE OF CONTENTS INTRODUCTION...ii What is this book about?... iii How to use this book... iii

More information

Oracle Cloud. Content and Experience Cloud ios Mobile Help E

Oracle Cloud. Content and Experience Cloud ios Mobile Help E Oracle Cloud Content and Experience Cloud ios Mobile Help E82090-01 February 2017 Oracle Cloud Content and Experience Cloud ios Mobile Help, E82090-01 Copyright 2017, 2017, Oracle and/or its affiliates.

More information

Information Technology Virtual EMS Help https://msum.bookitadmin.minnstate.edu/ For More Information Please contact Information Technology Services at support@mnstate.edu or 218.477.2603 if you have questions

More information

Database Management System Dr. S. Srinath Department of Computer Science & Engineering Indian Institute of Technology, Madras Lecture No.

Database Management System Dr. S. Srinath Department of Computer Science & Engineering Indian Institute of Technology, Madras Lecture No. Database Management System Dr. S. Srinath Department of Computer Science & Engineering Indian Institute of Technology, Madras Lecture No. # 5 Structured Query Language Hello and greetings. In the ongoing

More information

Heads Up! (Continued)

Heads Up! (Continued) . Name Date A c t i v i t y 6 Heads Up! (Continued) In this activity, you will do more experiments with simulations and use a calculator program that will quickly simulate multiple coin tosses. The Problem

More information
