INSTITUTO SUPERIOR TÉCNICO GESTÃO E TRATAMENTO DE INFORMAÇÃO


Number: Name:

INSTITUTO SUPERIOR TÉCNICO
GESTÃO E TRATAMENTO DE INFORMAÇÃO
Exam 2 - solution
30 January 2015

The duration of this exam is 2.5 hours. You can access your own written materials, but the exam is to be done individually. You are not allowed to use computers, tablets, or mobile phones. The maximum grade of the exam is 20 pts. Write your answers below the questions. Write your number and name at the top of each page. Present all calculations performed. After the exam starts, you can leave the room one hour after delivering the exam.

The following table is to be used by instructors ONLY: SUM


1. (4 pts) XML Data Management Technology

Consider the following XML document:

    <dvdcollection>
      <dvd>
        <title>Good Night, and Good Luck</title>
        <release-year>2005</release-year>
        <director>George Clooney</director>
        <actors>
          <actor>George Clooney</actor>
          <actor>Jeff Daniels</actor>
          <actor>David Strathairn</actor>
        </actors>
      </dvd>
      <dvd>
        <title>They Live</title>
        <release-year>1988</release-year>
        <director>John Carpenter</director>
        <actors>
          <actor>Roddy Piper</actor>
          <actor>Keith David</actor>
          <actor>Meg Foster</actor>
        </actors>
      </dvd>
      <!-- list of remaining dvds -->
    </dvdcollection>

1.1. (2.5 pts) Present XPath expressions that, using the XML document, answer the following information needs:

What are the titles of movies directed by John Carpenter, where Roddy Piper was the leading actor (i.e., the first actor appearing in the list of actors)?

    //dvd[./director="John Carpenter"][.//actor[1]="Roddy Piper"]/title

Who are the actors, in the XML dataset, that are also directors of movies released after 1995?

    //actor[text() = //dvd[./release-year > 1995]/director]

Who is the director of the oldest movie featuring Jeff Daniels as an actor?

    //dvd[.//actor="Jeff Daniels" and ./release-year = min(//dvd[.//actor="Jeff Daniels"]/release-year)]/director

1.2. (1 pt) Present an XQuery expression that, using the XML document, lists all movies that were directed by actors in the movie entitled Good Night, and Good Luck. Movies in the results should be sorted according to the release year, from oldest to newest.

    let $a := //dvd[./title="Good Night, and Good Luck"]//actor
    for $m in //dvd
    where $m/director/text() = $a/text()
    order by $m/release-year ascending
    return $m

1.3. (0.5 pt) Present an XQuery updating expression for changing the XML document, deleting all but the leading actor in the movies that were released prior to 1990, and adding an attribute rating = "awesome" to the dvd elements corresponding to movies directed by John Carpenter.

    (
      for $m in //dvd[release-year < 1990]
      let $a := $m/actors/actor[position() > 1]
      return delete nodes $a,
      for $m in //dvd[director="John Carpenter"]
      return insert node attribute rating { "awesome" } into $m
    )
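As a rough cross-check of the first XPath answer in 1.1 and of the XQuery in 1.2, the same selections can be sketched in Python. This is only an illustration under assumptions: it uses the standard-library `xml.etree.ElementTree` module (which supports just a small XPath subset, so the filtering is done in plain Python rather than with the exam's expressions), the sample document is typed with names capitalised as in the question text, and the helper names `titles_by_director_and_lead` and `directed_by_cast_of` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample document from question 1 (only the two dvd elements shown there).
XML = """<dvdcollection>
  <dvd>
    <title>Good Night, and Good Luck</title>
    <release-year>2005</release-year>
    <director>George Clooney</director>
    <actors>
      <actor>George Clooney</actor>
      <actor>Jeff Daniels</actor>
      <actor>David Strathairn</actor>
    </actors>
  </dvd>
  <dvd>
    <title>They Live</title>
    <release-year>1988</release-year>
    <director>John Carpenter</director>
    <actors>
      <actor>Roddy Piper</actor>
      <actor>Keith David</actor>
      <actor>Meg Foster</actor>
    </actors>
  </dvd>
  <!-- list of remaining dvds -->
</dvdcollection>"""

root = ET.fromstring(XML)


def titles_by_director_and_lead(root, director, lead):
    """Titles of movies with the given director whose first listed actor is `lead`."""
    return [
        dvd.findtext("title")
        for dvd in root.iter("dvd")
        if dvd.findtext("director") == director
        # findtext returns the text of the FIRST matching element, i.e. the leading actor
        and dvd.findtext("actors/actor") == lead
    ]


def directed_by_cast_of(root, title):
    """dvd elements directed by any actor of `title`, oldest release first."""
    cast = {
        actor.text
        for dvd in root.iter("dvd")
        if dvd.findtext("title") == title
        for actor in dvd.iter("actor")
    }
    hits = [dvd for dvd in root.iter("dvd") if dvd.findtext("director") in cast]
    return sorted(hits, key=lambda dvd: int(dvd.findtext("release-year")))


print(titles_by_director_and_lead(root, "John Carpenter", "Roddy Piper"))
print([d.findtext("title") for d in directed_by_cast_of(root, "Good Night, and Good Luck")])
```

On the two sample movies, the first call selects They Live and the second selects only Good Night, and Good Luck, since George Clooney is the only cast member who also appears as a director.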

2. (4 pts) Web Data Extraction

Consider the following trees, representing two data records encoding information about a family tree.

2.1. (2.5 pts) Compute the similarity (i.e., the number of matching nodes), using the Simple Tree Matching (STM) algorithm, and considering that two nodes can be aligned if they share the same label.

2.2. (1 pt) Compute the alignment between the trees, using the calculations performed for the previous question (make clear the backtracking process that reaches the specified alignment).

The backtracking is shown in pink in the previous question.

2.3. (0.5 pt) Knowing that the STM algorithm is a simplification of a more general tree matching algorithm, give an example of two HTML trees containing a data record that would not be captured by STM, but could be captured if the general algorithm was used. Explain why this would happen.

Consider, for example, HTML pages containing data records with information on books, where in some cases the title is encoded using <strong> and in others using <em>. This could be captured by the general algorithm but not by STM, since STM discards nodes with different labels.
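The recursion behind Simple Tree Matching can be sketched as follows. This is a minimal illustration, not the exam's worked solution: the trees used in the exam are not part of this transcription, so the two trees below are hypothetical, and the tuple representation `(label, [children])` and the function name `simple_tree_matching` are assumptions made for the example.

```python
def simple_tree_matching(a, b):
    """Number of matching nodes between trees a and b under STM.

    A tree is a tuple (label, children), with children a list of trees.
    Roots with different labels do not match at all; otherwise the children
    are aligned in order with a dynamic-programming table, as in sequence
    alignment, and 1 is added for the matched roots.
    """
    label_a, kids_a = a
    label_b, kids_b = b
    if label_a != label_b:
        return 0
    m, n = len(kids_a), len(kids_b)
    # table[i][j] = best matching between the first i children of a
    # and the first j children of b
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            table[i][j] = max(
                table[i][j - 1],
                table[i - 1][j],
                table[i - 1][j - 1] + simple_tree_matching(kids_a[i - 1], kids_b[j - 1]),
            )
    return table[m][n] + 1


# Hypothetical trees (not the ones from the exam figure).
tree_1 = ("a", [("b", []), ("c", [("d", [])]), ("e", [])])
tree_2 = ("a", [("c", [("d", [])]), ("e", []), ("f", [])])

print(simple_tree_matching(tree_1, tree_2))
```

For these two trees the matched nodes are the roots, the `c` subtrees (two nodes) and the `e` leaves, so the similarity is 4. Replacing the label of one root makes the result 0, which is the behaviour the answer to 2.3 refers to.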

3. (4 pts) Data Integration

Suppose a data source S storing the following tables:

    Movie (movie name, year, director name)
    Play (movie name, person name)
    Person (person name, nationality)

3.1. (2.5 pts) Rewrite the following SQL query as a conjunctive query:

    SELECT movie name, director name
    FROM Movie m, Play p, Person a
    WHERE m.movie name = p.movie name
      AND p.person name = a.person name
      AND a.nationality = 'Portuguese'
    UNION ALL
    SELECT movie name, director name
    FROM Movie m
    WHERE m.year = 1995

    Q(m, d) :- Movie(m, y, d), Play(m, p), Person(p, 'Portuguese')
    Q(m, d) :- Movie(m, 1995, d)

Suppose you have the following mediated schema M:

    Portuguese-movies(movie name, year)

which represents the names and years of movies whose actors are Portuguese or whose director is Portuguese. Write a global-as-view mapping between the mediated schema M and the data source schema S.

    Portuguese-movies(m, y) = Movie(m, y, d), Play(m, p), Person(p, 'Portuguese')
    Portuguese-movies(m, y) = Movie(m, y, d), Person(d, 'Portuguese')

Write a conjunctive query in terms of the mediated schema that returns the names of Portuguese movies directed after 1995. Then, unfold it and rewrite it in terms of the tables of data source S.

    Q(m) :- Portuguese-movies(m, y), y >= 1995

Unfolding:

    Q(m) :- Movie(m, y, d), Play(m, p), Person(p, 'Portuguese'), y >= 1995

    Q(m) :- Movie(m, y, d), Person(d, 'Portuguese'), y >= 1995

(1 pt) Suppose you have a pre-computed view:

    Portuguese-Person(m, p) :- Play(m, p), Person(p, 'Portuguese')

How would you write the conjunctive query of the earlier question using the view Portuguese-Person?

    Portuguese-movies(m, y) = Movie(m, y, d), Portuguese-Person(m, p)
    Portuguese-movies(m, y) = Movie(m, y, d), Person(d, 'Portuguese')

3.2. (0.5 pt) For the following pair of queries, state which relationship exists (equivalence or containment) between them. Justify.

    Q1(A, B, E) :- T(A, B, C), R(C, E), T(A, B, E), R(E, C)
    Q2(U, V, Z) :- T(U, V, Z), R(Z, 5)

There is no relationship. Q2 requires a fact R(Z, 5) with the constant 5, which Q1 does not guarantee, so Q1 is not contained in Q2; Q1 requires both R(C, E) and R(E, C), which Q2 does not guarantee, so Q2 is not contained in Q1.
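The answer to 3.2 can be checked mechanically with a brute-force containment-mapping test. The sketch below is an illustration under assumptions: it covers only conjunctive queries without comparison predicates under set semantics, it treats strings starting with an uppercase letter as variables and everything else as constants, and the names `is_var` and `contained_in` are hypothetical. Q1 is contained in Q2 exactly when some mapping of Q2's variables to terms of Q1 sends the head of Q2 to the head of Q1 and every body atom of Q2 to a body atom of Q1.

```python
from itertools import product


def is_var(term):
    """Convention for this sketch: variables are strings starting with an uppercase letter."""
    return isinstance(term, str) and term[:1].isupper()


def contained_in(q1, q2):
    """True if q1 is contained in q2, i.e. a containment mapping from q2 into q1 exists.

    A query is (head, body): head is a tuple of terms, body a list of
    (predicate, args) atoms. Exhaustive search, fine for exam-sized queries.
    """
    head1, body1 = q1
    head2, body2 = q2
    vars2 = sorted(
        {t for _, args in body2 for t in args if is_var(t)}
        | {t for t in head2 if is_var(t)}
    )
    terms1 = sorted({t for _, args in body1 for t in args}, key=str)
    atoms1 = {(pred, tuple(args)) for pred, args in body1}
    for image in product(terms1, repeat=len(vars2)):
        mapping = dict(zip(vars2, image))

        def apply(term):
            # constants are not in the mapping and therefore map to themselves
            return mapping.get(term, term)

        if tuple(apply(t) for t in head2) != tuple(head1):
            continue
        if all((pred, tuple(apply(t) for t in args)) in atoms1 for pred, args in body2):
            return True
    return False


Q1 = (("A", "B", "E"),
      [("T", ("A", "B", "C")), ("R", ("C", "E")), ("T", ("A", "B", "E")), ("R", ("E", "C"))])
Q2 = (("U", "V", "Z"),
      [("T", ("U", "V", "Z")), ("R", ("Z", 5))])

print(contained_in(Q1, Q2), contained_in(Q2, Q1))
```

Both calls return False for the queries of 3.2, which agrees with the answer that neither containment holds. The search is exponential in the number of variables, which is acceptable only because the queries are tiny.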

4. (4 pts) Data Cleaning and Integration

4.1. (2.5 pts) Suppose the following two tuples:

    Tuple 1:
      movie name: Good Night, and Good Luck
      year:       2005
      director:   George Clooney
      actors:     George Cloony, Jeff Daniels, David Strathairn
      review:     nice well directed exceptional actors

    Tuple 2:
      movie name: Good Night Good Luck
      year:       2006
      director:   George Clooney
      actors:     Jeff Daniels and George Clooney and David Strahtairn
      review:     wonderful nicely directed good actors

of a table with schema:

    Movies (movie name, year, director, actors, review)

The goal is to automatically detect that the two tuples refer to the same movie.

Which string matching algorithm would you use to compare the movie names? Justify. Would you use the same string matching algorithm to compare the reviews? Justify.

We could use edit distance, for instance, because the movie names are medium-sized strings. To compare the reviews, edit distance would not give good results, because the same words can occur in different positions. A possibility is to use TF/IDF.

Now, imagine you want to identify if the lists of actors of the two tuples are similar. Would you apply a string matching algorithm directly to the two strings that represent the actors in each record? If not, what would you do?

We cannot apply a string matching algorithm directly to the two strings, because the actor names are separated by different separators and do not occur in the same order. It would be better to first split the actor field into one tuple per actor and store the actor tuples in a distinct table. Then a string matching algorithm could be applied to the individual actor names.

Which string matching algorithm is appropriate to compare person names? Use that algorithm to compute the similarity between Clooney and Cloony in the two tuples and between Strahtair and Strathair. Do they return the same value? Why?

The Jaro measure is well suited to short names. With c common characters and t transposed characters:

    Jaro(x, y) = 1/3 [ c/|x| + c/|y| + (c - t/2)/c ]

    Jaro(Clooney, Cloony): |x| = 7, |y| = 6
    Common chars: 6
    Transposed: 0
    Jaro = 1/3 (6/7 + 6/6 + 6/6) = 1/3 (0.857 + 1 + 1) = 0.95

    Jaro(Strahtair, Strathair): |x| = 9, |y| = 9

    Common chars: 9
    Transposed: 2
    Jaro = 1/3 (9/9 + 9/9 + (9 - 1)/9) = 0.96

Although in both pairs the number of common characters equals the length of one of the words, one of the pairs has 2 transposed characters, which decreases the similarity value.

(1 pt) Consider now only the possible values of the attribute review. Besides the two values represented above (denoted t1 and t2, respectively) that correspond to positive reviews, consider that you have another two instances denoted t3 and t4 that correspond to negative reviews. Suppose as well that the review attribute values have undergone a normalization process. The resulting set of reviews is as follows:

    t1: {nice, well, directed, exceptional, actor}    positive
    t2: {wonderful, nice, directed, good, actor}      positive
    t3: {medium, film, terrible, direction, actor}    negative
    t4: {poor, directed, medium, film}                negative

Now, suppose we have another table with schema T(Y) and we have one tuple of that table <nice, well, actor, good, directed>. Use a Naïve Bayes Learner to learn with the four possible instances of the review attribute of the Movie table (t1, t2, t3, and t4) and then to predict whether the value of attribute Y refers to a positive or a negative review.

    d: {nice, well, actor, good, directed}

    P(positive | d) = P(d | positive) P(positive) / P(d)
    P(negative | d) = P(d | negative) P(negative) / P(d)
    Cd = arg max_ci [P(d | ci) P(ci)], where ci is positive or ci is negative

We need P(d | ci) and P(ci).

P(ci) is the proportion of the training instances with label ci:

    P(positive) = 0.5
    P(negative) = 0.5

N(ci) is the total number of words in the training instances with label ci:

    N(positive) = 10
    N(negative) = 9

    P(d | positive) = P(nice | positive) . P(well | positive) . P(actor | positive) . P(good | positive) . P(directed | positive)
    P(nice | positive) = n(nice, positive)/n(positive) = 2/10
    P(good | positive) = n(good, positive)/n(positive) = 1/10
    P(actor | positive) = 2/10
    P(well | positive) = 1/10
    P(directed | positive) = 2/10
    P(d | positive) P(positive) = 0.5 * 8/(10*10*10*10*10) = 0.00004

    P(d | negative) = P(good | negative) . P(nice | negative) . P(actor | negative) . P(well | negative) . P(directed | negative)
    P(good | negative) = n(good, negative)/n(negative) = 0
    P(nice | negative) = 0
    P(actor | negative) = 1/9
    P(well | negative) = 0
    P(directed | negative) = 1/9
    P(d | negative) = 0
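The Naïve Bayes scores for the review tuple can be reproduced with a few lines of Python. This is a sketch under assumptions rather than a reference implementation: it uses word counts without any smoothing (so a single unseen word drives a class score to zero), and the names `TRAINING`, `train`, `score` and `classify` are hypothetical.

```python
from collections import Counter

# Normalised training reviews t1..t4 with their labels.
TRAINING = [
    (["nice", "well", "directed", "exceptional", "actor"], "positive"),
    (["wonderful", "nice", "directed", "good", "actor"], "positive"),
    (["medium", "film", "terrible", "direction", "actor"], "negative"),
    (["poor", "directed", "medium", "film"], "negative"),
]


def train(examples):
    """Return class priors and per-class word counts."""
    label_counts = Counter(label for _, label in examples)
    priors = {label: n / len(examples) for label, n in label_counts.items()}
    word_counts = {label: Counter() for label in label_counts}
    for words, label in examples:
        word_counts[label].update(words)
    return priors, word_counts


def score(words, label, priors, word_counts):
    """P(d | label) * P(label), with P(w | label) = n(w, label) / n(label), unsmoothed."""
    total = sum(word_counts[label].values())
    result = priors[label]
    for word in words:
        result *= word_counts[label][word] / total
    return result


def classify(words, priors, word_counts):
    """Label with the highest unnormalised posterior."""
    return max(priors, key=lambda label: score(words, label, priors, word_counts))


priors, word_counts = train(TRAINING)
d = ["nice", "well", "actor", "good", "directed"]
print(score(d, "positive", priors, word_counts))
print(score(d, "negative", priors, word_counts))
print(classify(d, priors, word_counts))
```

With these counts the positive score is 0.5 * 8/10^5 and the negative score is exactly 0, because "nice", "well" and "good" never occur in a negative review. In practice a smoothed estimate (for example Laplace smoothing) is often used to avoid such zero probabilities, but that is not what the computation above does.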

So the answer is: positive review.

(0.5 pt) Suppose that you have 1 million tuples stored in the Movies table. Which method do you suggest to use to optimize the time needed to find all the tuples that refer to the same movie? Describe it briefly and point out one limitation of the method.

Sorted neighborhood method. It consists of a first phase where a key composed of parts of every attribute is chosen, a second phase where the tuples are sorted according to this key, and a third phase where a fixed-size window slides over the sorted tuples and only the tuples within the window are compared using a set of matching rules. One limitation of this method is the possibility of losing matches: two duplicates whose keys sort far apart never fall within the same window and are therefore never compared.
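The three phases of the sorted neighborhood method can be sketched as below. Everything specific in this example is an assumption made for illustration: the key (first four letters of the title plus the year), the matching rule (Jaccard overlap of title words of at least 0.7), the window size, and all records other than the two Good Night tuples are hypothetical.

```python
import re


def make_key(record):
    """Phase 1: build a sorting key from parts of the attributes (hypothetical choice)."""
    title, year = record
    letters = "".join(re.findall(r"[a-z]", title.lower()))
    return letters[:4].upper() + str(year)


def titles_match(r, s, threshold=0.7):
    """Hypothetical matching rule: Jaccard overlap of the title word sets."""
    words_r = set(re.findall(r"[a-z]+", r[0].lower()))
    words_s = set(re.findall(r"[a-z]+", s[0].lower()))
    return len(words_r & words_s) / len(words_r | words_s) >= threshold


def sorted_neighborhood(records, key, window, match):
    """Phases 2 and 3: sort by key, then compare only records within the sliding window."""
    ordered = sorted(records, key=key)
    pairs = []
    for i, r in enumerate(ordered):
        # a window of size w holds r and the w - 1 records that follow it
        for s in ordered[i + 1 : i + window]:
            if match(r, s):
                pairs.append((r, s))
    return pairs


records = [
    ("They Live", 1988),
    ("Good Night Good Luck", 2006),
    ("The Thing", 1982),
    ("Good Night, and Good Luck", 2005),
    ("Gone Girl", 2014),
]

print(sorted_neighborhood(records, make_key, window=2, match=titles_match))
```

With a window of 2 only neighbouring records are compared, so 5 records need 4 comparisons instead of 10. The limitation mentioned in the answer shows up directly in the key: a typo in the first letters of one title would sort the two Good Night tuples apart and the pair would be missed.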

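The Jaro values computed in question 4.1 can be reproduced with the short sketch below. It is an illustration under assumptions: it follows the usual formulation of the measure, in which two equal characters count as common only if their positions differ by at most floor(max(|x|, |y|)/2) - 1, it takes both misspelled surnames to have nine letters (Strahtair and Strathair), and the function name `jaro` is hypothetical.

```python
def jaro(x, y):
    """Jaro similarity: 1/3 * (c/|x| + c/|y| + (c - t/2)/c)."""
    if not x or not y:
        return 0.0
    window = max(max(len(x), len(y)) // 2 - 1, 0)
    x_used = [False] * len(x)
    y_used = [False] * len(y)
    # find common characters: equal and at most `window` positions apart
    for i, ch in enumerate(x):
        lo = max(0, i - window)
        hi = min(len(y), i + window + 1)
        for j in range(lo, hi):
            if not y_used[j] and y[j] == ch:
                x_used[i] = y_used[j] = True
                break
    c = sum(x_used)
    if c == 0:
        return 0.0
    x_common = [ch for ch, used in zip(x, x_used) if used]
    y_common = [ch for ch, used in zip(y, y_used) if used]
    # t counts common characters that appear in a different order in the two strings
    t = sum(a != b for a, b in zip(x_common, y_common))
    return (c / len(x) + c / len(y) + (c - t / 2) / c) / 3


print(round(jaro("Clooney", "Cloony"), 2))
print(round(jaro("Strahtair", "Strathair"), 2))
```

The first pair has 6 common characters and no transposition, giving 20/21; the second has 9 common characters of which 2 are out of order, giving 26/27. The two values differ, as stated in the answer.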
5. (4 pts) Miscellaneous

5.1. (1.5 pt) In this course you have seen dynamic programming at work in several algorithms/techniques. In string matching, what is dynamic programming used for? How does it work? Explain in your own words. Use a diagram or example if needed, but do not copy content from the slides.

Answer: In string matching, dynamic programming is used to calculate the (minimum) edit distance between two given strings, where the possible edit operations are insertion, deletion, or substitution of characters. Basically, we build a matrix and in each cell of that matrix we consider the possibility of using each of those edit operations, but only with respect to the neighboring cells (the neighbor on top, the neighbor on the left, and the neighbor on the diagonal top-left). Usually, each edit operation is defined as having a cost of 1 (one). The cost is 0 (zero) if there is a match between the characters in both strings. As we build the matrix (by filling in the value in each cell), we choose the option that yields the minimum accumulated cost. Once the matrix is fully built, we backtrack over those options to find the corresponding edit operations (which gives us the alignment between both strings).

(1.5 pt) In Hidden Markov Models (HMMs), what is dynamic programming used for? How does it work? Explain in your own words. Use a diagram or example if needed, but do not copy content from the slides.

Answer: In HMMs, dynamic programming is used to find the most likely sequence of states for a given observed sequence of symbols. This is called the Viterbi algorithm. Basically, we need to find which state generated each symbol. At first sight, it could seem that we would have to consider every possibility of each state generating each symbol in the observed sequence. However, there are transition probabilities between states (and symbol emission probabilities in each state), so if we know which state generated symbol i, we can determine which state is more likely to have generated symbol i+1. Therefore, at each step we keep, for each state, only the transition that maximizes such probability (instead of keeping all possible transitions). Once we reach the end of the sequence, we can backtrack over the sequence of states which yields the highest total probability.

(1 pt) Now that you have seen dynamic programming at work in different places, what is the essence of dynamic programming? How would you describe it in general terms? What is so special about dynamic programming that makes it a good choice to solve certain problems? What do these problems have in common?

Answer: In string matching, dynamic programming allows us to find a globally optimal alignment by doing a local minimization of the accumulated cost between neighboring cells. In HMMs, dynamic programming allows us to find a globally optimal sequence of states by doing a local maximization of the transition (and symbol emission) probabilities between consecutive states. Therefore, it seems that dynamic programming can be applied to those problems where a globally optimal solution can be found by a series of locally optimal decisions.
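The edit-distance use of dynamic programming described in 5.1 can be sketched as follows. This is a minimal illustration, assuming unit cost for insertion, deletion and substitution and zero cost for a match; the function names `edit_distance_table` and `backtrack` are hypothetical.

```python
def edit_distance_table(s, t):
    """Fill the DP matrix: d[i][j] is the edit distance between s[:i] and t[:j]."""
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all i characters of s[:i]
    for j in range(n + 1):
        d[0][j] = j  # insert all j characters of t[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # neighbor on top: deletion
                d[i][j - 1] + 1,         # neighbor on the left: insertion
                d[i - 1][j - 1] + cost,  # diagonal neighbor: match or substitution
            )
    return d


def backtrack(s, t, d):
    """Walk back from the last cell to recover one optimal sequence of operations."""
    i, j = len(s), len(t)
    ops = []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (s[i - 1] != t[j - 1]):
            kind = "match" if s[i - 1] == t[j - 1] else "substitute"
            ops.append((kind, s[i - 1], t[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append(("delete", s[i - 1], "-"))
            i -= 1
        else:
            ops.append(("insert", "-", t[j - 1]))
            j -= 1
    return ops[::-1]


table = edit_distance_table("Clooney", "Cloony")
print(table[-1][-1])
print(backtrack("Clooney", "Cloony", table))
```

Each cell depends only on its three neighbors, which is the local decision the answer to the last question refers to; the value in the bottom-right cell is the globally minimal cost, and the backtracking step recovers an alignment that achieves it.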


More information

Shallow Parsing Swapnil Chaudhari 11305R011 Ankur Aher Raj Dabre 11305R001

Shallow Parsing Swapnil Chaudhari 11305R011 Ankur Aher Raj Dabre 11305R001 Shallow Parsing Swapnil Chaudhari 11305R011 Ankur Aher - 113059006 Raj Dabre 11305R001 Purpose of the Seminar To emphasize on the need for Shallow Parsing. To impart basic information about techniques

More information

Classification. 1 o Semestre 2007/2008

Classification. 1 o Semestre 2007/2008 Classification Departamento de Engenharia Informática Instituto Superior Técnico 1 o Semestre 2007/2008 Slides baseados nos slides oficiais do livro Mining the Web c Soumen Chakrabarti. Outline 1 2 3 Single-Class

More information

Name: Lirong TAN 1. (15 pts) (a) Define what is a shortest s-t path in a weighted, connected graph G.

Name: Lirong TAN 1. (15 pts) (a) Define what is a shortest s-t path in a weighted, connected graph G. 1. (15 pts) (a) Define what is a shortest s-t path in a weighted, connected graph G. A shortest s-t path is a path from vertex to vertex, whose sum of edge weights is minimized. (b) Give the pseudocode

More information

Lecture 24: Image Retrieval: Part II. Visual Computing Systems CMU , Fall 2013

Lecture 24: Image Retrieval: Part II. Visual Computing Systems CMU , Fall 2013 Lecture 24: Image Retrieval: Part II Visual Computing Systems Review: K-D tree Spatial partitioning hierarchy K = dimensionality of space (below: K = 2) 3 2 1 3 3 4 2 Counts of points in leaf nodes Nearest

More information

Introduction to Algorithms October 12, 2005 Massachusetts Institute of Technology Professors Erik D. Demaine and Charles E. Leiserson Quiz 1.

Introduction to Algorithms October 12, 2005 Massachusetts Institute of Technology Professors Erik D. Demaine and Charles E. Leiserson Quiz 1. Introduction to Algorithms October 12, 2005 Massachusetts Institute of Technology 6.046J/18.410J Professors Erik D. Demaine and Charles E. Leiserson Quiz 1 Quiz 1 Do not open this quiz booklet until you

More information
