INSTITUTO SUPERIOR TÉCNICO GESTÃO E TRATAMENTO DE INFORMAÇÃO
- Kevin Carson
- 5 years ago
Number: Name:

Exam 2 - solution
30 January 2015

The duration of this exam is 2,5 hours. You can access your own written materials, but the exam is to be done individually. You are not allowed to use computers, tablets, or mobile phones. The maximum grade of the exam is 20 pts. Write your answers below the questions. Write your number and name at the top of each page. Present all calculations performed. After the exam starts, you can leave the room one hour after delivering the exam.

The following table is to be used by instructors only: [grading table with a SUM column]
1. (4 pts) XML Data Management Technology

Consider the following XML document:

<dvdcollection>
  <dvd>
    <title>Good Night, and Good Luck</title>
    <release-year>2005</release-year>
    <director>George Clooney</director>
    <actors>
      <actor>George Clooney</actor>
      <actor>Jeff Daniels</actor>
      <actor>David Strathairn</actor>
    </actors>
  </dvd>
  <dvd>
    <title>They Live</title>
    <release-year>1988</release-year>
    <director>John Carpenter</director>
    <actors>
      <actor>Roddy Piper</actor>
      <actor>Keith David</actor>
      <actor>Meg Foster</actor>
    </actors>
  </dvd>
  <!-- list of remaining dvds -->
</dvdcollection>

1.1. (2,5 pts) Present XPath expressions that, using the XML document, answer the following information needs:

What are the titles of movies directed by John Carpenter, where Roddy Piper was the leading actor (i.e., the first actor appearing in the list of actors)?

//dvd[./director="John Carpenter"][.//actor[1]="Roddy Piper"]/title

Who are the actors, in the XML dataset, that are also directors of movies released after 1995?

//actor[text() = //dvd[./release-year > 1995]/director]

Who is the director of the oldest movie featuring Jeff Daniels as an actor?

//dvd[.//actor="Jeff Daniels" and ./release-year = min(//dvd[.//actor="Jeff Daniels"]/release-year)]/director
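The three information needs of question 1.1 can be cross-checked with a small script. This is only a sketch: it uses Python's standard xml.etree.ElementTree, which supports just a subset of XPath, so the predicates are written as plain Python instead of being passed as XPath strings. The helper names (actors, year) and the variables q1, q2, q3 are illustrative and not part of the exam.

```python
import xml.etree.ElementTree as ET

# The two dvd entries shown in the exam (the remaining dvds are not known).
XML = """<dvdcollection>
  <dvd>
    <title>Good Night, and Good Luck</title>
    <release-year>2005</release-year>
    <director>George Clooney</director>
    <actors>
      <actor>George Clooney</actor>
      <actor>Jeff Daniels</actor>
      <actor>David Strathairn</actor>
    </actors>
  </dvd>
  <dvd>
    <title>They Live</title>
    <release-year>1988</release-year>
    <director>John Carpenter</director>
    <actors>
      <actor>Roddy Piper</actor>
      <actor>Keith David</actor>
      <actor>Meg Foster</actor>
    </actors>
  </dvd>
</dvdcollection>"""

root = ET.fromstring(XML)
dvds = root.findall("dvd")


def actors(dvd):
    """Actor names of one dvd, in document order."""
    return [a.text for a in dvd.findall("actors/actor")]


def year(dvd):
    """Release year of one dvd as an integer."""
    return int(dvd.findtext("release-year"))


# Titles of movies directed by John Carpenter with Roddy Piper as first actor.
q1 = [d.findtext("title") for d in dvds
      if d.findtext("director") == "John Carpenter"
      and actors(d)[:1] == ["Roddy Piper"]]

# Actors who are also directors of movies released after 1995.
recent_directors = {d.findtext("director") for d in dvds if year(d) > 1995}
q2 = sorted({a for d in dvds for a in actors(d) if a in recent_directors})

# Director of the oldest movie featuring Jeff Daniels.
with_daniels = [d for d in dvds if "Jeff Daniels" in actors(d)]
oldest = min(year(d) for d in with_daniels)
q3 = [d.findtext("director") for d in with_daniels if year(d) == oldest]

print(q1, q2, q3)
```

On the two dvds listed in the exam, the script prints the title They Live for the first need and George Clooney for the other two.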
1.2. (1 pt) Present an XQuery expression that, using the XML document, lists all movies that were directed by actors in the movie entitled Good Night, and Good Luck. Movies in the results should be sorted according to the release year, from oldest to newest.

let $a := //dvd[./title="Good Night, and Good Luck"]//actor
for $m in //dvd
where $m/director/text() = $a/text()
order by $m/release-year ascending
return $m

1.3. (0,5 pt) Present an XQuery updating expression for changing the XML document, deleting all but the leading actor in the movies that were released prior to 1990, and adding an attribute rating = "awesome" to the dvd elements corresponding to movies directed by John Carpenter.

(
  for $m in //dvd[release-year < 1990]
  let $a := $m/actors/actor[position() > 1]
  return delete nodes $a,
  for $m in //dvd[director="John Carpenter"]
  return insert node attribute rating { "awesome" } into $m
)
2. (4 pts) Web Data Extraction

Consider the following trees, representing two data records encoding information about a family tree.

2.1. (2,5 pts) Compute the similarity (i.e., the number of matching nodes), using the Simple Tree Matching (STM) algorithm, and considering that two nodes can be aligned if they share the same label.
2.2. (1 pt) Compute the alignment between the trees, using the calculations performed for the previous question (make clear the backtracking process that reaches the specified alignment).

The backtracking is shown in pink in the previous question.

2.3. (0,5 pt) Knowing that the STM algorithm is a simplification of a more general tree matching algorithm, give an example of two HTML trees containing a data record that would not be captured by STM, but could be captured if the general algorithm were used. Explain why this would happen.

Consider, for example, HTML pages containing data records with information on books, where in some cases the title is encoded using <strong> and in others using <em>. This could be captured by the general algorithm but not by STM, since STM discards nodes with different labels.
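The recurrence behind Simple Tree Matching can be sketched as follows. The trees of the exam figure are not available in this transcript, so the two trees below are hypothetical book records in the spirit of the answer to 2.3 (one title in <strong>, the other in <em>); the tuple encoding (label, children) and the function name are assumptions made for the illustration.

```python
def simple_tree_matching(a, b):
    """Number of matching nodes between trees a and b.

    Each tree is a (label, children) tuple. Two roots with different
    labels contribute nothing; otherwise the children are aligned, in
    order, with a dynamic programming matrix, and 1 is added for the
    matching roots.
    """
    if a[0] != b[0]:
        return 0
    kids_a, kids_b = a[1], b[1]
    m = [[0] * (len(kids_b) + 1) for _ in range(len(kids_a) + 1)]
    for i in range(1, len(kids_a) + 1):
        for j in range(1, len(kids_b) + 1):
            w = simple_tree_matching(kids_a[i - 1], kids_b[j - 1])
            m[i][j] = max(m[i][j - 1], m[i - 1][j], m[i - 1][j - 1] + w)
    return m[len(kids_a)][len(kids_b)] + 1


# Hypothetical data records: the title cell uses <strong> in one
# record and <em> in the other.
record_1 = ("tr", [("td", [("strong", [])]), ("td", [])])
record_2 = ("tr", [("td", [("em", [])]), ("td", []), ("td", [])])

print(simple_tree_matching(record_1, record_2))  # 3: tr and two td nodes
print(simple_tree_matching(record_1, record_1))  # 4: every node matches
```

The first call shows the limitation discussed in 2.3: the strong and em nodes carry the same information but have different labels, so they are never counted as a match.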
3. (4 pts) Data Integration

Suppose a data source S storing the following tables:

Movie(movie_name, year, director_name)
Play(movie_name, person_name)
Person(person_name, nationality)

3.1. (2,5 pts) Rewrite the following SQL query as a conjunctive query:

SELECT m.movie_name, m.director_name
FROM Movie m, Play p, Person a
WHERE m.movie_name = p.movie_name
  AND p.person_name = a.person_name
  AND a.nationality = 'Portuguese'
UNION ALL
SELECT m.movie_name, m.director_name
FROM Movie m
WHERE m.year = 1995

Q(m, d) :- Movie(m, y, d), Play(m, p), Person(p, 'Portuguese')
Q(m, d) :- Movie(m, 1995, d)

Suppose you have the following mediated schema M:

Portuguese-movies(movie_name, year)

which represents the names and years of movies whose actors are Portuguese or whose director is Portuguese. Write a global-as-view mapping between the mediated schema M and the data source schema S.

Portuguese-movies(m, y) = Movie(m, y, d), Play(m, p), Person(p, 'Portuguese')
Portuguese-movies(m, y) = Movie(m, y, d), Person(d, 'Portuguese')

Write a conjunctive query in terms of the mediated schema that returns the names of Portuguese movies directed after 1995. Then, unfold it and rewrite it in terms of the tables of data source S.

Q(m) :- Portuguese-movies(m, y), y >= 1995

Unfolding:

Q(m) :- Movie(m, y, d), Play(m, p), Person(p, 'Portuguese'), y >= 1995
Q(m) :- Movie(m, y, d), Person(d, 'Portuguese'), y >= 1995

(1 pt) Suppose you have a pre-computed view:

Portuguese-Person(m, p) :- Play(m, p), Person(p, 'Portuguese')

How would you write the conjunctive query of the question above using the view Portuguese-Person?

Portuguese-movies(m, y) = Movie(m, y, d), Portuguese-Person(m, p)
Portuguese-movies(m, y) = Movie(m, y, d), Person(d, 'Portuguese')

3.2. (0,5 pt) For the following pair of queries, state which relationship exists (equivalence or containment) between them. Justify.

Q1(A, B, E) :- T(A, B, C), R(C, E), T(A, B, E), R(E, C)
Q2(U, V, Z) :- T(U, V, Z), R(Z, 5)

There is no relationship. No containment mapping exists in either direction: mapping Q2 into Q1 would require an atom R(E, 5) in Q1, and mapping Q1 into Q2 would require an atom R(Z, Z) in Q2.
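The answer to 3.2 can be verified mechanically by searching for containment mappings. The sketch below is a brute-force check for conjunctive queries without comparison predicates; the encoding (variables as strings, constants as integers, a query as a pair of head and body) and the function name contained_in are assumptions made for this illustration.

```python
from itertools import product


def is_var(term):
    """In this encoding, variables are strings and constants are ints."""
    return isinstance(term, str)


def contained_in(q1, q2):
    """True if q1 is contained in q2.

    q1 is contained in q2 exactly when there is a containment mapping
    from the variables of q2 to the terms of q1 that sends the head of
    q2 to the head of q1 and every body atom of q2 to a body atom of q1.
    All candidate mappings are enumerated (fine for exam-sized queries).
    """
    head1, body1 = q1
    head2, body2 = q2
    vars2 = sorted({t for _, args in body2 for t in args if is_var(t)}
                   | {t for t in head2 if is_var(t)})
    terms1 = sorted({t for _, args in body1 for t in args}, key=str)
    atoms1 = {(pred, tuple(args)) for pred, args in body1}

    for image in product(terms1, repeat=len(vars2)):
        h = dict(zip(vars2, image))

        def apply(terms):
            return tuple(h[t] if is_var(t) else t for t in terms)

        if apply(head2) != tuple(head1):
            continue
        if all((pred, apply(args)) in atoms1 for pred, args in body2):
            return True
    return False


Q1 = (("A", "B", "E"),
      [("T", ("A", "B", "C")), ("R", ("C", "E")),
       ("T", ("A", "B", "E")), ("R", ("E", "C"))])
Q2 = (("U", "V", "Z"),
      [("T", ("U", "V", "Z")), ("R", ("Z", 5))])

print(contained_in(Q1, Q2), contained_in(Q2, Q1))  # False False
```

Both checks fail, which agrees with the answer that neither containment nor equivalence holds between Q1 and Q2.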
4. (4 pts) Data Cleaning and Integration

4.1. (2,5 pts) Suppose the following two tuples:

Tuple 1:
  movie_name: Good Night, and Good Luck
  year: 2005
  director: George Clooney
  actors: George Cloony, Jeff Daniels, David Strathairn
  review: nice well directed exceptional actors

Tuple 2:
  movie_name: Good Night Good Luck
  year: 2006
  director: George Clooney
  actors: Jeff Daniels and George Clooney and David Strahtairn
  review: wonderful nicely directed good actors

of a table with schema:

Movies(movie_name, year, director, actors, review)

The goal is to automatically detect that the two tuples refer to the same movie.

Which string matching algorithm would you use to compare the movie names? Justify. Would you use the same string matching algorithm to compare the reviews? Justify.

We could use edit distance, for instance, because the movie names are medium-sized strings. To compare the reviews, edit distance would not give good results, because the same words can occur in different positions. A possibility is to use TF/IDF.

Now, imagine you want to identify if the lists of actors of the two tuples are similar. Would you apply a string matching algorithm directly to the two strings that represent the actors in each record? If not, what would you do?

We cannot apply a string matching algorithm directly to the two strings, because the actor names are separated by different separators and do not occur in the same order. It would be better to first split the actor field into one tuple per actor and store the actor tuples in a distinct table. Then a string matching algorithm could be applied to the individual actor names.

Which string matching algorithm is appropriate to compare person names? Use that algorithm to compute the similarity between Clooney and Cloony in the two tuples and between Strahtair and Strathair. Do they return the same value? Why?
The Jaro measure is well suited to short strings such as names.

Jaro = 1/3 [c/|x| + c/|y| + (c - t/2)/c], where c is the number of common characters and t the number of transposed characters.

Jaro(Clooney, Cloony): |x| = 7, |y| = 6, c = 6, t = 0
Jaro = 1/3 (6/7 + 6/6 + 6/6) = 1/3 (0.857 + 1 + 1) = 0.95

Jaro(Strahtair, Strathair): |x| = 9, |y| = 9, c = 9, t = 2
Jaro = 1/3 (9/9 + 9/9 + (9 - 1)/9) = 0.96

The two values are not the same. Although in both pairs the number of common characters equals the length of one of the words, the second pair has 2 transposed characters, which decreases its similarity value.

(1 pt) Consider now only the possible values of the attribute review. Besides the two values represented above (denoted t1 and t2, respectively) that correspond to positive reviews, consider that you have another two instances, denoted t3 and t4, that correspond to negative reviews. Suppose as well that the review attribute values have undergone a normalization process. The resulting set of reviews is as follows:

t1: {nice, well, directed, exceptional, actor} positive
t2: {wonderful, nice, directed, good, actor} positive
t3: {medium, film, terrible, direction, actor} negative
t4: {poor, directed, medium, film} negative

Now, suppose we have another table with schema T(Y) and we have one tuple of that table, <nice, well, actor, good, directed>. Use a Naïve Bayes Learner to learn with the four possible instances of the review attribute of the Movies table (t1, t2, t3, and t4) and then to predict whether the value of attribute Y refers to a positive or a negative review.

d = {nice, well, actor, good, directed}

P(positive | d) = P(d | positive) P(positive) / P(d)
P(negative | d) = P(d | negative) P(negative) / P(d)

c_d = arg max over c_i of [P(d | c_i) P(c_i)], where c_i is positive or negative.

This requires P(d | c_i) and P(c_i), where P(c_i) is the portion of the training instances with label c_i:

P(positive) = 0.5
P(negative) = 0.5

Number of words in the reviews of each class:

N(positive) = 10
N(negative) = 9

P(d | positive) = P(nice | positive) . P(well | positive) . P(actor | positive) . P(good | positive) . P(directed | positive)

P(nice | positive) = n(nice, positive)/N(positive) = 2/10
P(good | positive) = n(good, positive)/N(positive) = 1/10
P(actor | positive) = 2/10
P(well | positive) = 1/10
P(directed | positive) = 2/10

P(d | positive) P(positive) = 0.5 * 8/(10*10*10*10*10) = 0.00004

P(d | negative) = P(good | negative) . P(nice | negative) . P(actor | negative) . P(well | negative) . P(directed | negative)

P(good | negative) = n(good, negative)/N(negative) = 0
P(nice | negative) = 0
P(actor | negative) = 1/9
P(well | negative) = 0
P(directed | negative) = 1/9

P(d | negative) = 0
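The calculation above can be reproduced with a few lines of Python. This is a sketch of the unsmoothed estimate used in the solution (no Laplace smoothing, so one unseen word drives a class score to zero); the function names train and score are illustrative.

```python
from collections import Counter

# Normalized training reviews from the question.
TRAINING = [
    (["nice", "well", "directed", "exceptional", "actor"], "positive"),
    (["wonderful", "nice", "directed", "good", "actor"], "positive"),
    (["medium", "film", "terrible", "direction", "actor"], "negative"),
    (["poor", "directed", "medium", "film"], "negative"),
]


def train(examples):
    """Count instances per class and word occurrences per class."""
    class_counts = Counter(label for _, label in examples)
    word_counts = {label: Counter() for label in class_counts}
    for words, label in examples:
        word_counts[label].update(words)
    return class_counts, word_counts


def score(doc, label, class_counts, word_counts):
    """P(d | label) * P(label), with P(w | label) = n(w, label) / N(label)."""
    prior = class_counts[label] / sum(class_counts.values())
    total_words = sum(word_counts[label].values())
    likelihood = 1.0
    for word in doc:
        likelihood *= word_counts[label][word] / total_words
    return prior * likelihood


class_counts, word_counts = train(TRAINING)
d = ["nice", "well", "actor", "good", "directed"]
scores = {c: score(d, c, class_counts, word_counts) for c in class_counts}
prediction = max(scores, key=scores.get)
print(scores, prediction)
```

The positive score is 0.5 * 8/10^5 and the negative score is 0, because good, nice and well never occur in a negative review.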
So the answer is: positive review.

(0,5 pt) Suppose that you have 1 million tuples stored in the Movies table. Which method do you suggest to use to optimize the time needed to find all the tuples that refer to the same movie? Describe it briefly and point out one limitation of the method.

Sorted neighborhood method. It consists of a first phase where a key composed of parts of every attribute is chosen, a second phase where the tuples are sorted according to this key, and a third phase where a fixed-size window slides over the sorted tuples and only those that are within the window are compared using a set of matching rules. One limitation of this method is the possibility of missing matches: two tuples that refer to the same movie but whose keys sort far apart never fall within the same window.
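The three phases of the sorted neighborhood method, together with the Jaro measure from question 4.1 as one possible matching rule, can be sketched as below. The key definition, the 0.9 threshold, the window size and the third record are assumptions chosen for the illustration, not values given in the exam.

```python
def jaro(s, t):
    """Jaro similarity: 1/3 [c/|s| + c/|t| + (c - t/2)/c]."""
    if not s or not t:
        return 0.0
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_hit = [False] * len(s)
    t_hit = [False] * len(t)
    for i, ch in enumerate(s):
        lo, hi = max(0, i - window), min(len(t), i + window + 1)
        for j in range(lo, hi):
            if not t_hit[j] and t[j] == ch:
                s_hit[i] = t_hit[j] = True
                break
    s_common = [ch for ch, hit in zip(s, s_hit) if hit]
    t_common = [ch for ch, hit in zip(t, t_hit) if hit]
    c = len(s_common)
    if c == 0:
        return 0.0
    # Common characters that appear in a different order in s and t.
    transposed = sum(a != b for a, b in zip(s_common, t_common))
    return (c / len(s) + c / len(t) + (c - transposed / 2) / c) / 3


def make_key(record):
    """Phase 1: key built from parts of every attribute (hypothetical choice)."""
    title, year, director = record
    return title[:3].lower() + str(year)[:3] + director[:3].lower()


def same_movie(r1, r2):
    """Matching rule (assumed): close years and similar director names."""
    return abs(r1[1] - r2[1]) <= 1 and jaro(r1[2], r2[2]) >= 0.9


def sorted_neighborhood(records, window=2):
    """Phases 2 and 3: sort by key, then compare only within the window."""
    ordered = sorted(records, key=make_key)
    matches = []
    for i, record in enumerate(ordered):
        for other in ordered[i + 1:i + window]:
            if same_movie(record, other):
                matches.append((record, other))
    return matches


movies = [
    ("They Live", 1988, "John Carpenter"),
    ("Good Night, and Good Luck", 2005, "George Clooney"),
    ("Good Night Good Luck", 2006, "George Cloony"),
]
matches = sorted_neighborhood(movies)
print(matches)
```

With a window of size w, each tuple is compared only with the w - 1 tuples that follow it in key order, so the number of comparisons grows linearly with the table size instead of quadratically; this is also why matches whose keys sort far apart are lost.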
5. (4 pts) Miscellaneous

5.1. (1,5 pt) In this course you have seen dynamic programming at work in several algorithms/techniques. In string matching, what is dynamic programming used for? How does it work? Explain in your own words. Use a diagram or example if needed, but do not copy content from the slides.

Answer: In string matching, dynamic programming is used to calculate the (minimum) edit distance between two given strings, where the possible edit operations are insertion, deletion, or substitution of characters. Basically, we build a matrix and in each cell of that matrix we consider the possibility of using each of those edit operations, but only with respect to the neighboring cells (the neighbor on top, the neighbor on the left, and the neighbor on the diagonal top-left). Usually, each edit operation is defined as having a cost of 1 (one). The cost is 0 (zero) if there is a match between the characters in both strings. As we build the matrix (by filling in the value in each cell), we choose the option that yields the minimum accumulated cost. Once the matrix is fully built, we backtrack over those options to find the corresponding edit operations (which gives us the alignment between both strings).

(1,5 pt) In Hidden Markov Models (HMMs), what is dynamic programming used for? How does it work? Explain in your own words. Use a diagram or example if needed, but do not copy content from the slides.

Answer: In HMMs, dynamic programming is used to find the most likely sequence of states for a given observed sequence of symbols. This is called the Viterbi algorithm. Basically, we need to find which state generated each symbol. At first sight, it could seem that we would have to consider every possibility of each state generating each symbol in the observed sequence.
However, there are transition probabilities between states (and symbol emission probabilities in each state), so if we know how likely each state is to have generated symbol i, we can determine how likely each state is to have generated symbol i+1. Therefore, at each step we keep, for each state, only the most probable path leading to it (instead of keeping all possible transitions). Once we reach the end of the sequence, we can backtrack over the sequence of states that yields the highest total probability.

(1 pt) Now that you have seen dynamic programming at work in different places, what is the essence of dynamic programming? How would you describe it in general terms? What is so special about dynamic programming that makes it a good choice to solve certain problems? What do these problems have in common?

Answer: In string matching, dynamic programming allows us to find a globally optimal alignment by doing a local minimization of the accumulated cost between neighboring cells. In HMMs, dynamic programming allows us to find a globally optimal sequence of states by doing a local maximization of the transition (and symbol emission) probabilities between consecutive states. Therefore, it seems that dynamic programming can be applied to those problems where a globally optimal solution can be found by a series of locally optimal decisions.
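The two uses of dynamic programming discussed in question 5 can be put side by side in code. The sketch below fills the edit distance matrix with unit costs and runs Viterbi on a made-up two-state HMM; the state names, symbols and probabilities of that toy model are purely hypothetical and serve only to exercise the recurrence.

```python
def edit_distance(s, t):
    """Minimum number of insertions, deletions and substitutions."""
    d = [[0] * (len(t) + 1) for _ in range(len(s) + 1)]
    for i in range(len(s) + 1):
        d[i][0] = i  # delete the first i characters of s
    for j in range(len(t) + 1):
        d[0][j] = j  # insert the first j characters of t
    for i in range(1, len(s) + 1):
        for j in range(1, len(t) + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # neighbor on top
                          d[i][j - 1] + 1,          # neighbor on the left
                          d[i - 1][j - 1] + cost)   # diagonal neighbor
    return d[len(s)][len(t)]


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely state sequence and its probability."""
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]
    for symbol in observations[1:]:
        row, pointers = {}, {}
        for s in states:
            # Keep only the best predecessor of each state.
            prev = max(states, key=lambda r: best[-1][r] * trans_p[r][s])
            row[s] = best[-1][prev] * trans_p[prev][s] * emit_p[s][symbol]
            pointers[s] = prev
        best.append(row)
        back.append(pointers)
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])  # backtrack one step
    return list(reversed(path)), best[-1][last]


# Hypothetical toy HMM.
states = ["Healthy", "Fever"]
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {"Healthy": {"Healthy": 0.7, "Fever": 0.3},
           "Fever": {"Healthy": 0.4, "Fever": 0.6}}
emit_p = {"Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
          "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6}}

path, prob = viterbi(["normal", "cold", "dizzy"], states,
                     start_p, trans_p, emit_p)
print(edit_distance("Clooney", "Cloony"), path, prob)
```

In both functions the value of a cell depends only on a few previously computed cells (three neighbors in the matrix, the previous column in Viterbi), which is the shared structure the answer to the last question points at.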
4/2/2012 Searching with PSI-BLAST 0 th order Markov Model 1 st order Markov Model 1 st order Markov Model 1 st order Markov Model What are Markov Models good for? Background sequence composition Spam Hidden
More informationExam I Computer Science 420 Dr. St. John Lehman College City University of New York 12 March 2002
Exam I Computer Science 420 Dr. St. John Lehman College City University of New York 12 March 2002 NAME (Printed) NAME (Signed) E-mail Exam Rules Show all your work. Your grade will be based on the work
More informationModule 4. Implementation of XQuery. Part 0: Background on relational query processing
Module 4 Implementation of XQuery Part 0: Background on relational query processing The Data Management Universe Lecture Part I Lecture Part 2 2 What does a Database System do? Input: SQL statement Output:
More informationProblem Description Earned Max 1 CSS 20 2 PHP 20 3 SQL 10 TOTAL Total Points 50
CSE 154, Autumn 2014 Midterm Exam, Friday, November 7, 2014 Name: Quiz Section: Student ID #: TA: Rules: You have 50 minutes to complete this exam. You may receive a deduction if you keep working after
More informationExam 2 Study Guide. Denny Hood Computer Science 101
Exam 2 Study Guide Denny Hood denny.hood@mail.wvu.edu Computer Science 101 A Brief Word About Your Exam Your exam will be MONDAY, APRIL 10. You will have 50 minutes to complete Exam 2. 1. If you arrive
More informationCOMP718: Ontologies and Knowledge Bases
1/35 COMP718: Ontologies and Knowledge Bases Lecture 9: Ontology/Conceptual Model based Data Access Maria Keet email: keet@ukzn.ac.za home: http://www.meteck.org School of Mathematics, Statistics, and
More informationCPSC 310: Database Systems / CSPC 603: Database Systems and Applications Exam 2 November 16, 2005
CPSC 310: Database Systems / CSPC 603: Database Systems and Applications Exam 2 November 16, 2005 Name: Instructions: 1. This is a closed book exam. Do not use any notes or books, other than your two 8.5-by-11
More informationExact Inference: Elimination and Sum Product (and hidden Markov models)
Exact Inference: Elimination and Sum Product (and hidden Markov models) David M. Blei Columbia University October 13, 2015 The first sections of these lecture notes follow the ideas in Chapters 3 and 4
More informationDescribe The Differences In Meaning Between The Terms Relation And Relation Schema
Describe The Differences In Meaning Between The Terms Relation And Relation Schema describe the differences in meaning between the terms relation and relation schema. consider the bank database of figure
More informationQuestion Score Points Out Of 25
University of Texas at Austin 6 May 2005 Department of Computer Science Theory in Programming Practice, Spring 2005 Test #3 Instructions. This is a 50-minute test. No electronic devices (including calculators)
More informationQuery Processing & Optimization
Query Processing & Optimization 1 Roadmap of This Lecture Overview of query processing Measures of Query Cost Selection Operation Sorting Join Operation Other Operations Evaluation of Expressions Introduction
More informationIntroduction to Graphical Models
Robert Collins CSE586 Introduction to Graphical Models Readings in Prince textbook: Chapters 10 and 11 but mainly only on directed graphs at this time Credits: Several slides are from: Review: Probability
More informationCSE 190D Spring 2017 Final Exam
CSE 190D Spring 2017 Final Exam Full Name : Student ID : Major : INSTRUCTIONS 1. You have up to 2 hours and 59 minutes to complete this exam. 2. You can have up to one letter/a4-sized sheet of notes, formulae,
More informationAnnouncement. Reading Material. Overview of Query Evaluation. Overview of Query Evaluation. Overview of Query Evaluation 9/26/17
Announcement CompSci 516 Database Systems Lecture 10 Query Evaluation and Join Algorithms Project proposal pdf due on sakai by 5 pm, tomorrow, Thursday 09/27 One per group by any member Instructor: Sudeepa
More informationPart 4. Decomposition Algorithms Dantzig-Wolf Decomposition Algorithm
In the name of God Part 4. 4.1. Dantzig-Wolf Decomposition Algorithm Spring 2010 Instructor: Dr. Masoud Yaghini Introduction Introduction Real world linear programs having thousands of rows and columns.
More informationEukaryotic Gene Finding: The GENSCAN System
Eukaryotic Gene Finding: The GENSCAN System BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 2016 Anthony Gitter gitter@biostat.wisc.edu These slides, excluding third-party material, are licensed under CC
More informationAdvanced Data Management Technologies Written Exam
Advanced Data Management Technologies Written Exam 02.02.2016 First name Student number Last name Signature Instructions for Students Write your name, student number, and signature on the exam sheet. This
More informationComputer Science E-119 Practice Midterm
Name Computer Science E-119 Practice Midterm This exam consists of two parts. Part I has 5 multiple-choice questions worth 3 points each. Part II consists of 3 problems; show all your work on these problems
More informationData Definition Language (DDL), Views and Indexes Instructor: Shel Finkelstein
Data Definition Language (DDL), Views and Indexes Instructor: Shel Finkelstein Reference: A First Course in Database Systems, 3 rd edition, Chapter 2.3 and 8.1-8.4 Important Notices Reminder: Midterm is
More informationEvaluating XPath Queries
Chapter 8 Evaluating XPath Queries Peter Wood (BBK) XML Data Management 201 / 353 Introduction When XML documents are small and can fit in memory, evaluating XPath expressions can be done efficiently But
More informationBook 5. Chapter 1: Slides with SmartArt & Pictures... 1 Working with SmartArt Formatting Pictures Adjust Group Buttons Picture Styles Group Buttons
Chapter 1: Slides with SmartArt & Pictures... 1 Working with SmartArt Formatting Pictures Adjust Group Buttons Picture Styles Group Buttons Chapter 2: Slides with Charts & Shapes... 12 Working with Charts
More informationCS1800 Discrete Structures Fall 2016 Profs. Aslam, Gold, Ossowski, Pavlu, & Sprague December 16, CS1800 Discrete Structures Final
CS1800 Discrete Structures Fall 2016 Profs. Aslam, Gold, Ossowski, Pavlu, & Sprague December 16, 2016 Instructions: CS1800 Discrete Structures Final 1. The exam is closed book and closed notes. You may
More informationNESTED QUERIES AND AGGREGATION CHAPTER 5 (6/E) CHAPTER 8 (5/E)
1 NESTED QUERIES AND AGGREGATION CHAPTER 5 (6/E) CHAPTER 8 (5/E) 2 LECTURE OUTLINE More Complex SQL Retrieval Queries Self-Joins Renaming Attributes and Results Grouping, Aggregation, and Group Filtering
More informationECE521 W17 Tutorial 10
ECE521 W17 Tutorial 10 Shenlong Wang and Renjie Liao *Some of materials are credited to Jimmy Ba, Eric Sudderth, Chris Bishop Introduction to A4 1, Graphical Models 2, Message Passing 3, HMM Introduction
More informationCS 564 PS1. September 10, 2017
CS 564 PS1 September 10, 2017 Instructions / Notes: Using the IPython version of this problem set is strongly recommended, however you can use only this PDF to do the assignment, or replicate the functionality
More informationThe University of British Columbia
The University of British Columbia Computer Science 304 Midterm Examination February 23, 2005 Time: 50 minutes Total marks: 50 Instructor: George Tsiknis Name (PRINT) (Last) (First) Signature This examination
More informationCSCI-6421 Final Exam York University Fall Term 2004
6 December 2004 CS-6421 Final Exam p. 1 of 7 CSCI-6421 Final Exam York University Fall Term 2004 Due: 6pm Wednesday 15 December 2004 Last Name: First Name: Instructor: Parke Godfrey Exam Duration: take
More informationNESTED QUERIES AND AGGREGATION CHAPTER 5 (6/E) CHAPTER 8 (5/E)
1 NESTED QUERIES AND AGGREGATION CHAPTER 5 (6/E) CHAPTER 8 (5/E) 2 LECTURE OUTLINE More Complex SQL Retrieval Queries Self-Joins Renaming Attributes and Results Grouping, Aggregation, and Group Filtering
More informationCS 245 Midterm Exam Solution Winter 2015
CS 245 Midterm Exam Solution Winter 2015 This exam is open book and notes. You can use a calculator and your laptop to access course notes and videos (but not to communicate with other people). You have
More informationExcel 1. Module 6 Data Lists
Excel 1 Module 6 Data Lists Revised 4/17/17 People s Resource Center Module Overview Excel 1 Module 6 In this module we will be looking at how to describe a database and view desired information contained
More informationNJIT Department of Computer Science PhD Qualifying Exam on CS 631: DATA MANAGEMENT SYSTEMS DESIGN. Summer 2012
JIT Department of Computer Science PhD Qualifying Exam on CS 63: DATA MAAGEMET SYSTEMS DESIG Summer 202 o book or other document is allowed. Duration of the exam: 2.5 hours. The total number of points
More informationK-Means and Gaussian Mixture Models
K-Means and Gaussian Mixture Models David Rosenberg New York University June 15, 2015 David Rosenberg (New York University) DS-GA 1003 June 15, 2015 1 / 43 K-Means Clustering Example: Old Faithful Geyser
More information