Database Selection and Keyword Search of Structured Databases: Powerful Search for Naive Users

Size: px
Start display at page:

Download "Database Selection and Keyword Search of Structured Databases: Powerful Search for Naive Users"

Transcription

1 Database Selection and Keyword Search of Structured Databases: Powerful Search for Naive Users Mohammad Reda Alhajjl Mike J. Ridley" Ken Barked " School of Informatics Bradford University, Bradford West Yorkshire, BD7 1DP United Kingdom ADSA Lab, Computer Science Dept University of Calgary Calgary, Alberta T2N ln4 Canada {alhajj, Abstract - The main target of the wrk described in this paper is to provide a powerful approach for naile users to search structured databases. Such a study is necessary especially to satisfi web users who expect the abilify to access all web contents in a unified way, regardless of the structure of the available information. Given a set of distributed structured databases and a que?y which consists of a set of keywords connected by logical operators, the approach proposed in this paper adapts both web text files search techniques and information retrieval techniques to rank the existing databases based on their relevance to the posed query. For each keyword, the user specifies a level of search, which may be column, record, or table. We developed an estimation method with statistical foundations to estimate the usefulness of individual relational databases. The system gives a hint of what databases might be useful for the user's query, based on wrd-frequency information kept for each database. Some experiments have been conducted to demonstrate the efectiveness of the proposed method in determining promising sources for a given query. As nai've end-user satisfaction is a main target and motive. w developed a prototype system with a user friendly web-based interface that accomplishes our goals in a simple andpowiful wy. 1 Introduction The way we search for information is one of the most important differences between relational and text or document databases, at least from end userk perspective. It is more complicated to access structured databases, especially in a dynamic environment like the web, where varying or unknown database structures make the query formulation process a very difficult task. An approach taken in some cases is to export data from structured databases to web pages; then provide text search on web documents. This approach results in duplication of data, with the problem of keeping the versions up-to-date, in addition tu space and time overheads. Further, it is not feasible to export every possible combination, especially with highlyconnected data. The problem handled in this paper can be stated as follows. Consider a set of distributed heterogeneous relational databases DB and a query q, which consists of keywords connected by logical operators; it is required to allow users to search and access, and eventually manipulate certain information by using a keyword search tu submit q. such that no a priori knowledge of the database schemas or the place of the required information is necessary. The system selects a subset of DB, which consists of good candidate databases for submitting q. To make this selection, the system uses an estimator function, which assesses how good each database in DE is with respect to q. It is also required to rank the databases, i.e., decide on the order according to which they should he visited in order to provide the most effective response to q. Our solution to this problem involves building a server that can suggest databases to search and the order according to which they should he visited. The system gives a hint ofwhat databases might be useful for the user query, based on word-frequency information kept for each database; this process is known as the database selection problem. These are hard problems that have already been addressed hy many other researchers, e.g., [SI. The mainstream approaches in the development of database selection techniques was document or text oriented. Our approach described in this paper addresses structured relational databases. Simply, we present a novel architecture for a global information system, which contains an arbitrary number of distributed heterogeneous autonomous relational databases. It enables users to find information of interest and the databases that contain the targeted information without a priori knowledge about the structure of the databases involved or the location of the required information. The databases are ranked /03/ $ IEEE 175

2 according to retrieval function computation. The proposed mechanism accomplishes these goals without affecting the autonomy and heterogeneity of the participating databases. The database selection problem can be described as follows. Given a query and a set of databases, we wish either to select a subset of databases to which we will send the query, or order the databases such that we may send the query to the databases in that order or send the query to the top ranked n databases. We follow the latter interpretation. We view the database selection problem as producing a ranked list of a given set of databases, in decreasing order, according to their estimated potential usefulness to a given query. The rest of the paper is organized as follows. The related work is summarized in Section 2. An overview of the proposed system is given in Section 3. The representation of the database related information is covered in Section 4. Query representation together with the estimator function and the database selection process are presented in Section 5. Section 6 includes a summary and the conclusions. 2 Related Work Adapting keyword search to structured databases has already attracted the attention of several researchers. However, existing approaches mainly concentrate on searching a single database. They also have one thing in common; almost all of them use a graph to implement the basic database representation. For instance, a framework for keyword search on databases when the schema is not known to the user is presented in [Ill. The main limitation of this work is that all keywords must be contained in the same tuple. The approach described in [7] deals with the problem of keyword search over XML documents. Goldman et al [9] address the problem of proximity search over semi-structured stores. They restrict results to tuples from one relation near a set of keywords. DataSpot [5] is a commercial system that supports keyword-based search by extracting the contents ofthe database into a hyperbase; it duplicates the contents of the database, making data integrity and maintenance difficult. Other approaches include DISCOVER [IO], DBXplorer [I], BBQ [IZ], as well as those described in [3] and [13]. Note that the growth in online resources demands efficient techniques for improving the search space in a distributed environment [6], i.e., database selection is a fundamental problem in distributed search. As the number of databases involved in the process increases, sending each query to all databases becomes no longer a feasible and reasonable approach. The primary goal is to select a small set of databases to send a query to, without sacrificing retrieval effectiveness. Our proposed solution to the usefulness estimation problem is an extension of the approach employed in GIOSS [8]. We simply took the level of search and the structure of the relational database into account. The frequency information that GIOSS keeps about the databases are different from ours; our system keeps information related to the granularity of each relational database. Also, the final query result or the chosen set according to GIOSS is different from the one we are interested in. Finally, the mainstream approaches in the development of IR systems and database selection techniques, including GIOSS, are document or text oriented. Our system has been implemented as a structured relational database application that enables users to search through the granularity of a relational database in a distributed environment. 3 An Overview of the Proposed Approach Given a set of relational databases, we can provide efficient and effective access to them by applying the following approach. I- Each site constructs its Master Relation, which is a single table that combines all the data required to be searched within the corresponding database. Then, each site computes its local database -levels of database- frequencies for each term in this master relation, and the summary index that contains the total number of elements at each level. The system requires that each database cooperates and periodically updates these frequencies, following some predefined protocol. Both, the master relation and the statistical information (frequencies) could be easily built mechanically in any relational database management system. We developed and implemented an application for building these indexes as part of our prototype. 2- A central database selection server imports these frequencies from each remote database. We developed and implemented an application to import this information from distributed remote databases. 3- The server creates a union database frequency and a union summary index. We developed and implemented an application to build this union database frequency and the union summary index. Note that the estimation method depends on these unions during the database selection process. 4- Given a query q. which consists of a set of kejwords, logical operators, search-level and search 176

3 method, the conceptual database selection algorithm proceeds as follows: 0 Compute keywords estimator according to the specified level and the specified logical operators. Computeiselect the relevant se1 according to the specified search method. Sort the relevant set databases selected in the previous step, and return the result as specified by the search method. All of the above enumerated steps have been incorporated in a prototype, which has been developed and implemented as a user-friendly web-based application. 4 Representation of Database Related Information In a distributed search, different types of database representatives for each database may be available to the searching system. These representatives that characterize the contents of the database are used to estimate the usefulness of a database to a given query hy applying a certain database selection algorithm. Depending on the kind of information and the database representatives available, different approaches have been proposed to estimate the usefulness of a database. Most approaches require information about the terms that appear in the database and the statistical information related to these terms, such as, term frequencies, document frequencies and term weights. One of the most popular representatives used in distributed search systems and information retrieval systems is the inverted file. To serve our purpose, we need to record whether a keyword could represent a table, a record or an attribute. In traditional IR systems, such a distinction is not required because IR systems ignore the structure; they mainly depend on a document as the granularity. As a result, we propose to use an extended inverted file that stores information at row, column and table level granularity. This extended inverted file is part of the required database representative, which we call the database uordrfrequency information; it is created as a table within a relational database. While calculating these frequencies, we just count each record that contains the considered word once, even if the word appears more than once within the record. The same thing applies for columns and tables. We may reduce the words to their stems using a stemming algorithm. We may also leave out the often frequent words with little semantics using a stopping algorithm. Another schema is also needed as part of the database representative to store the total number of elements at each level of the master relation for each database, which we call the summary index. These two relations represent the required database representative. Our approach depends on this structure to create union Frequency information, which is required for estimating the usemness of a database with respect to a given query. Database administrators are also responsible for creating this database word-frequency information and the summary index in their own local systems, to be extracted by the server as a next step. We developed an application for building these relations as part of our prototype. Note that creating the master table in relational databases environment can be done easily either using SQL or some QBE-like interface. Creating the database word-frequency information and the summary index could also be done mechanically using simple algorithms that work on the already described master table. AAer preparing the database word-frequency information at local sources, the server can extract them directly from each local source; this depends on a predefined protocol, which assumes a cooperative system. We developed and implemented an application to import this information from the distributed remote databases palticipating in the system. As a result, this application produces union frequency information that consists of two relations, namely the union database word-frequency information, and the union summary index. Building these relations is also an easy task, and could be done mechanically. We developed an application to build these relations. Afier building the union frequency information, the system server keeps the following information about each database: IS(db), the level size of database db, which is the total number of elements at each level I of db, V dbc DB and I E {t,c,r), where 1, c and r stand for table, column and record, respectively. (The union summary index) *flw,l,db), the number of elements at each level I of database db that contains keyword w, V dbe DB, I E {t,c,r), and for all keywords w. (The union database word-frequency information) This information facilitates applying the database selection process and estimating the usefulness of a database with respect to a given query. The value of flw,l,db) is the size of the result of the query: find keyword w at level 1 of database db. 177

4 The system dws not need to store the value of flw,l,db) explicitly if it is zero. This way, if no information is found by the system about the value of flw,/,db), then it is assumed as zero. The database selection process is presented in the next section, after introducing query representation supported by the proposed approach. 5 A Closer look at the Proposed Approach 5.1 Query Representation Our approach considers boolean queries that consist of atomic subqueries connected by the boolean AND operator, denoted A. However, OR and NOT queries are also possible by the same way. An atomic subquery is a single keyword. Since all words in the searchable level of each database have to be explicitly integrated in its database word-frequency information, supporting phrase search would lead to an enormous overhead. Therefore, our implementation will only consider single word subqueries. So, a query in our model can be represented by an unordered subset of W, where W={wl,w2,..., w.} is the set of all words in the union database word-frequency information relation. Since we are searching relational databases, the word(s) may be associated with what we call a level o/ search, which is a structure within relational databases, such as table, column or values. The set of these levels is denoted L=(t c, r). and the definition of query q is extended to incorporate the words and the associated level as follows: 9=f([w,,w,..., wj0i w, EW,/EL) j= a... Example 1 Consider the following statement that represents a query in our system: 4 = (([Honda I Red], r)}, which is equivalent to: find Honda A red within record level. This query looks for the two atomic subqueries (keywords), namely Honda, and Red; and the record level has been specified as the search level. Note that it is also possible to specify different levels of search for individual words in the same query. For a non-experienced end-user, levels can be named in a more intuitive manner; some hints or classification about the semantics of these levels have been added to the user interface. We consider only boolean queries because most current commercial online services and information vendors worldwide as well as traditional library systems support boolean query models to access their databases, offering well-maintained information systems in a number of fields such as science, business, and law [4]. Also, more web search engines, e.g., AltaVista, Excite, and WehCrawler are adopting boolean queries in their advanced interfaces. Other important reasons for adopting boolean model is the size of the database representative it requires, and boolean queries could be used and expressed-well in searching relational databases hy utilizing RDBMS functionality. 5.2 Estimated Merit of a Query Consider ls(db) andflw,l,db) for a set of databases DE, which we intend to search in order to satisfy some query 9; we assume that each database dbe DE has some merit with respect to query 9, denoted m(q,db). Merit could be defined as the actual number of elements present at the specified level of db and that satisfy query 9. Database selection algorithms do not know the actual merit of a database with respect to a query. Rather, they provide a means which can be used to estimate the merit. Such estimates are used to produce estimated ranking, which may not be the same as the desired ranking; the baseline ranking is based on actual merit. One approach to evaluating a database selection technique determines the degree for which the selection technique is able to produce database estimated ranking that approximates the desired baseline ranking. Each database dbe DB has an estimated merit with respect to the given query q, denoted Em(q,db), which is computed by a database selection algorithm. This estimated merit estimates the potential usefulness of a database for the given query. We expect that Em(q,db) is an attempt to implicitly or explicitly estimate the actual merit m(q,db). Ideally, To illustrate these definitions, consider a set of six databases DE=(dbl, db2, dbj, db,, dbs, dba}, that we wish to search in order to satisfy some query q, and assume that each database db E DE has some merit with respect to q; merit values are 3, 0, 6, 2, 0, and 4, respectively. As we would like to search the databases in order of decreasing merit with respect to q. for this example, we would like to report the databases in the following order: (db,. db6, db,. db4, db2, db,). Because db, and dbshave no merit with respect to 9, we should exclude them from the produced estimated ranking list to report only (dbj. dbs. db,, db3. 178

5 The tasks of the database selection algorithm are: 1) Calculate the estimated merit for each database with respect to the given query; 2) Select only databases that have non-zero estimated merit; and 3) Rank databases selected in step 2, based on their estimated merits, to produce the desired estimated databases ranking list. More details about each of these tasks are presented next, starting with the estimator function used to calculate the estimated merit for each database with respect to a given query. Table 1: Portion of the database frequency information kept by the saver for four databases dbj dbi I d63 dbr elements which are present at level 1 in db and satisfy the query:find wia w,h.. A w. within kvel I, is given according to this type of estimator as: Estimate ($nd[w, A w2 A... A w,, 11, db) = The produced estimated value represents the estimated merit of the database with respect to the given query, i.e., Em(q.db). Example 2 Consider four databases dbl, dbl, dbs. db,, and suppose that the server has collected the corresponding statistics, a portion of which is given in Table 1. Further, assume that the system received the query: q = {([Honda, Red], r)) which is equivalent to: 5.3 The Estimator Function The estimator is a function that estimates the merit of a database with respect to a given query. So, given the frequenciesj(w,l,db) and the level size ls(db), for a set of databases DB, the system uses the estimator function to assess for each database db in DE, how many elements at level 1 of db satisfy query q. This estimator is built based on the assumption that keywords appear in various elements of any level of a database following independent and uniform probability distribution. By uniform distribution we mean that the events in any sample mace are "equally likely" to occur. Saying that events A and B are independent means that the Occurrence or non-occurrence of event B provides no information about whether event A has also occurred. An equation that defines the independence of events A and E can be stated as follow. Two events, A and E, are independent yand only if P(AnB)=P(A)P(B), where P(A) and P(B) are, the probabilities of events A and E, respectively. This equation can be extended to n events in a straightforward manner. find "Honda" A Ted" within record level This query searches for records that contain both words, Honda and Red; the system estimates the number of matching records in each of the four databases in the following way. Database dbl contains 500 records, 50 of which contain the word Honda. Therefore, the probability that a record in db, contains the word Honda is 0.1. Similarly, the probability that a record in dbl contains the word Red is 0.2. Under the assumption that words appear independently at the record level, i.e., the probability that a record in db, contains both words is 0.1x0.2=0.02. Consequently, we can estimate the merit of db, with respect to q as: so 100 Em(q,db )--x-x500 tnn ' - snn Similarly, Em(% Em(q,db )--x-x700=0, ' Em(q, db )--x-x =IO records = -X- loo I5O x1000=15, and = records, Such estimators are usually called independent estimators [SI. Under this assumption, given a database 5.4 The SdeCtiOn Function for Relevant db. the total number of elements at level I of db, and any Search n keywords WI,..., and w,, the probability that any element at level I contains all ofw,,,,,, and wn, is given by: Once the estimated merit Em(%db) has calculated for each database, we can determine the databases that Mx,,,xm Ndb) IS(db> ' the estimated number Of should be included within the final estimated list, which we intend to report to users, as follows: 179

6 Equation 3 identifies databases that only have positive values for their estimated merits. These values assess the matching information based on the specified level. The subset of databases identified by Equation 3 is called Estimated Relevant databases; they are produced hy a search process called Relevant Search method. Relevance is usually defined in the context of IR as an abstract measure of how well a source (documentidatabase) satisfies user information needs. Ideally, a relevant source is one that would interest the user who issued the query. Unfortunately, this is a subjective notation and difficult to quantify; also it is not possible to know its exact value. One way to address this problem is to consider as relevant any database matching the user query. We think it is fair in our approach to simply look at databases with matching level elements as relevant databases. Therefore, we define Relevant databases to he the set of databases that have positive actual merits, i.e., positive number of level elements matching q. Formally, Rm(q,DB) =(dbe DB Im(q,db) > 0) (4) In order to evaluate the selection function, we need to compare the estimated relevant subset of databases S,(q.DB) against the actual relevant subset of databases with respect to q, i.e., R,(q, DE), Concerning Example 2, we can determine the selected set using Equation 3, and the obtained result is the subset of databases with positive merits, i.e., (db,, db2, dbr). This subset is also considered as the estimated relevant databases. The final step in the database selection algorithm is to rank the selected subset of databases according to the estimated merits in decreasing order, and the reported final ranked set of databases is, (db2, dbi, db4). 5.5 The Selection Functions for the Other Search Methods The result produced using the selection function in Equation 3, and the corresponding database selection algorithm described above is based on the assumption that the user is interested in the set of all relevant databases. However, users may be interest in different types of results. For instance, the user may he interested only in the best database for the given query, which is the database that contains more matching elements than any other database. There are several reasons that may impose choosing this type of result such as limited resources, e.g., time, space and money. In other words, the user wants to submit the query only to a single database which is expected to return the highest result. An intermediate case between reporting all relevant databases and reporting only the best database is to report a set of best databases, i.e., users may be interested in the set of best databases that have matching elements. This is an important case especially, when there is more than one database predicted by the estimator to have estimated merit values very close, within a threshold kom the highest estimated merit with respect to query q. This case neglects databases with low merit values compared to the highest merit Best database search method Consider frequencies /(w,l,db) and IS(&) for a set of databases DE, which we wish to search in order to satisfy some query q; the target is to search only the best single database that contains information relevant to q, i.e., the database that has the maximum estimated merit. The steps for the algorithm that achieves this target are: 1) Calculating the estimated merit for each database with respect to the given query (as described before). 2) Selecting and reporting only the database that has the maximum non-zero estimated merit. In the first step, the estimated merit for each database is computed using the estimator function in Equation 2, and to decide on the database to be reported, we apply the selection function: Sdq, DB) = (dbs DBI E<% db) > 0 I\ 64% db) = $2: E<q, db')) Equation 5 identifies a single database that has the maximum positive value for its estimated merit. We consider the database identified by Equation 5 as the Estimated Best Database; it is produced by a search process known as the Best Search Method. To evaluate this selection function, we need to compare the estimated hest database produced by this function against the actual best database with respect to query q. We define the Best Database, denoted B.(q,DB), as the database that has the maximum positive actual merit, i.e., the highest positive number of elements matching query q. Formally, B,(q,DB) =(db~ DB 1 m(q,db) > O h m(q,db) = max m(q,db')) *se08 (6) (5) 180

7 To illustrate this, again consider Example 2; if the user is interested only in the best database that contains the required information, then the estimated merit must be computed for each database by the same way as in the case of the relevant search method. This way, we can determine the estimated best database using Equation 5, and as a result (db2) is selected and reported to the user Best-group search method Consider database frequencies flw,l,db) and IS(db), for a set of databases DE, which we wish to search in order to satisfy some query q; we would like to search the subset for the best databases that satisfy query q. In this case, the desired behavior of the database selection algorithm is-to select and report the database with the maximum estimated merit along with all databases that have estimated merit close within a threshold to the maximum. So, the selection function in this case should be extended to make the definition of the best database more looser, by also letting databases with estimated merit close to the maximum be considered as best databases. The database selection algorithm for this search method proceeds as follows: I) Calculate the estimated merit for each database with respect to the given query. 2) Specify the maximum non-zero estimated merit. 3) Sclect databases that have estimated merit close to the maximum. 4) Rank databases selected in step 3, based on their estimated merits, and produce the desired estimated databases ranking list. In the first step, the estimated merit for each database is computed using the estimator function in Equation 2. Then, the maximum estimated merit is specified, and to select the databases to be reported to the user, we apply the following selection function, for a given threshold 20: Equation 7 identifies a subset of databases that have the highest positive estimated merits. We consider the identified databases as the Estimated Best-Group Databases, and the search process that produced them as the Best-Group Search Method. This function may be considered as a special case of the function defined by Gravano [8], which was introduced as an improvement to GIOSS system. The (7) selected databases identified by Equation 7 should be ranked first according to their estimated merits, and then reported to the user as the fmal estimated list of ranked databases. Parameter E changes the best database selection function in Equation 5 to facilitate selecting databases that are close to the predicted maximum one. Therefore, the larger E is, the more databases will be included in the selected best-group databases. In order to evaluate this selection function, we need to compare the estimated best-group databases produced by this function against the actual best-group subset of DB with respect to query q. We define the Best-group Databases B,(q,DB) to include databases that have the highest positive actual merit values, close to the maximum merit, i.e., the highest positive number of elements matching query q. Formally, (8) Here, it is worth mentioning that Equations 3 and 4 coincide with Equations 7 and 8, respectively, for el. Also, Equations 5 and 6 coincide with Equations 7 and 8, respectively, for 4. Note that the user who submits a query does not specify a value for e. The semer should fix the value of E according to the desired meaning of the best-group. Some other factors may affect selecting the value of E, like the total number of databases in the system, i.e., if the number of databases in the system is too large, then E may be set io have a small value. In general, the higher the value of E, the more databases tend to be selected by the selection function. To illustrate the impact of this new selection function, let us consider the databases in Example 2. If the user is interested in the best-group databases that contain the required information, and if E is specified by the server as 0.5, then the estimated merits will be computed for each database by the same way as in the case of relevant search method. This way, the maximum estimated merit is specified as 15; then the best-group databases is estimated using Equation 8, and based on the following calculations: Em(l,db,)=lO, which is greater than zero and HSO~; Em(l,db2)=15, has the maximum estimated merit; Em(l,db+O; and Em(l.db,)=l.O67, which is greater than zero, but w

8 Since only db, and db2 satisfy Equation 8, the set {db,, db2) will be selected as the best-group. The final step is to rank the selected subset of databases in decreasing order according to their estimated merits. So, the final estimated ranking list that will be reported to the user is (db2, db,). 6 Summary and Conclusions In this paper, we described a mechanism that enables users to find information of interest and databases that contain such information without any a priori knowledge about its location. This mechanism employs the statistics information extracted from each local database in a step to estimate the number of potentially useful databases. Our estimation methods are based upon established statistics theory. This mechanism also accomplishes these goals with no effect on the autonomy and heterogeneity of the participating databases. Further, the flexibility and ease with which the databases join and leave the system, and organize themselves within the system is one of the most important features of our system. To support relational query processing using keyword search, we gave the necessary extensions to the inverted file in order to take the stmcture of relational databases into account. The experiment results are encouraging. Finally, a prototype that incorporated each part of the architecture has been proposed, which includes: building the index and summary tables for each database; extracting the index and summary tables from individual databases; building the union index and the union summary tables; and building the main system with a web-based interface. Currently, we are working on different evaluation parameters. Also, we are also planning to evaluate the storage requirements of the proposed approach. References [5] S. Dar, et al, DTL s DataSpot: Database Exploration Using Plain Language, Proceedings of the International Conference on Very Large Databases, pp , [6] J. French, et al, Comparing the performance of Database Selection Algorithms, Proceedings of ACM SIGIR Conference on Research and Development in IR, pp , [7] D. Florescu, D. Kossmann and 1. Manolescu, Integrating Keyword Search into XML Query Processing, Proceedings of the international Conference on WWW, [8] L. Gravano, H. Garcia-Molina and A. Tomasic, GIOSS: Text-source discovery over the internet, ACM-TODS, Vo1.24, No.2, pp , June [9] R. Goldman, et al, Proximity search in databases, Proceedings of the International Conference on Very Large Databases, pp.26-37, [IO] V. Hristidis and Y. Papakonstantinou, DISCOVER Keyword Search in Relational Databases, Proceedings of the International Conference on Very Large Databases, [Ill U. Masermann and G. Vossen, Design and Implementation of a Novel Approach to Keyword Searching in Relational Databases, Proceedings of ADBIS-DASFAA, [I21 K.D. Munroe and Y. Papakonstantinou, BBQ: A visual interface for integrated browsing and querying of XML, Proceedings of IEEE International conference on Data Engineering, nnn- LUUL. S. Agrawal, S. Chaudhuri, and G. Das, DBXplorer: A System for Keyword-Based Search [13] N.L. Sards and A.J. Mragyati, A system for over Relational Databases, Proceedings of IEEE keyword-based searching in databases, International conference on Data Engineering, I R. Baeza-Yates and B. Ribeiro-Neto, Modem Information Retrieval, ACM Press, G. Bhalotia, et al, Keyword Searching and Browsing in Databases using BANKS, Proceedings of IEEE International conference on Data Engineering, C. Chuan, et al, Predicate Rewriting for Translating Boolean Queries in a Heterogeneous Information System, ACM Transaction on Information Systems, Vol. 17, No.1, pp.1-39,

Keyword search in relational databases. By SO Tsz Yan Amanda & HON Ka Lam Ethan

Keyword search in relational databases. By SO Tsz Yan Amanda & HON Ka Lam Ethan Keyword search in relational databases By SO Tsz Yan Amanda & HON Ka Lam Ethan 1 Introduction Ubiquitous relational databases Need to know SQL and database structure Hard to define an object 2 Query representation

More information

Top-k Keyword Search Over Graphs Based On Backward Search

Top-k Keyword Search Over Graphs Based On Backward Search Top-k Keyword Search Over Graphs Based On Backward Search Jia-Hui Zeng, Jiu-Ming Huang, Shu-Qiang Yang 1College of Computer National University of Defense Technology, Changsha, China 2College of Computer

More information

Implementation of Skyline Sweeping Algorithm

Implementation of Skyline Sweeping Algorithm Implementation of Skyline Sweeping Algorithm BETHINEEDI VEERENDRA M.TECH (CSE) K.I.T.S. DIVILI Mail id:veeru506@gmail.com B.VENKATESWARA REDDY Assistant Professor K.I.T.S. DIVILI Mail id: bvr001@gmail.com

More information

Supporting Fuzzy Keyword Search in Databases

Supporting Fuzzy Keyword Search in Databases I J C T A, 9(24), 2016, pp. 385-391 International Science Press Supporting Fuzzy Keyword Search in Databases Jayavarthini C.* and Priya S. ABSTRACT An efficient keyword search system computes answers as

More information

Information Discovery, Extraction and Integration for the Hidden Web

Information Discovery, Extraction and Integration for the Hidden Web Information Discovery, Extraction and Integration for the Hidden Web Jiying Wang Department of Computer Science University of Science and Technology Clear Water Bay, Kowloon Hong Kong cswangjy@cs.ust.hk

More information

RELATIVE QUERY RESULTS RANKING FOR ONLINE USERS IN WEB DATABASES

RELATIVE QUERY RESULTS RANKING FOR ONLINE USERS IN WEB DATABASES RELATIVE QUERY RESULTS RANKING FOR ONLINE USERS IN WEB DATABASES Pramod Kumar Ghadei Research Scholar, Sathyabama University, Chennai, Tamil Nadu, India pramod-ghadei@yahoo.com Dr. S. Sridhar Research

More information

Searching Databases with Keywords

Searching Databases with Keywords Shan Wang et al.: Searching Databases with Keywords 1 Searching Databases with Keywords Shan Wang and Kun-Long Zhang School of Information, Renmin University of China, Beijing, 100872, P.R. China E-mail:

More information

Distributed KIDS Labs 1

Distributed KIDS Labs 1 Distributed Databases @ KIDS Labs 1 Distributed Database System A distributed database system consists of loosely coupled sites that share no physical component Appears to user as a single system Database

More information

A New Technique to Optimize User s Browsing Session using Data Mining

A New Technique to Optimize User s Browsing Session using Data Mining Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 3, March 2015,

More information

Ontology Based Prediction of Difficult Keyword Queries

Ontology Based Prediction of Difficult Keyword Queries Ontology Based Prediction of Difficult Keyword Queries Lubna.C*, Kasim K Pursuing M.Tech (CSE)*, Associate Professor (CSE) MEA Engineering College, Perinthalmanna Kerala, India lubna9990@gmail.com, kasim_mlp@gmail.com

More information

Data Warehousing Alternatives for Mobile Environments

Data Warehousing Alternatives for Mobile Environments Data Warehousing Alternatives for Mobile Environments I. Stanoi D. Agrawal A. El Abbadi Department of Computer Science University of California Santa Barbara, CA 93106 S. H. Phatak B. R. Badrinath Department

More information

Keyword Search over Hybrid XML-Relational Databases

Keyword Search over Hybrid XML-Relational Databases SICE Annual Conference 2008 August 20-22, 2008, The University Electro-Communications, Japan Keyword Search over Hybrid XML-Relational Databases Liru Zhang 1 Tadashi Ohmori 1 and Mamoru Hoshi 1 1 Graduate

More information

Keyword query interpretation over structured data

Keyword query interpretation over structured data Keyword query interpretation over structured data Advanced Methods of Information Retrieval Elena Demidova SS 2018 Elena Demidova: Advanced Methods of Information Retrieval SS 2018 1 Recap Elena Demidova:

More information

DIRA : A FRAMEWORK OF DATA INTEGRATION USING DATA QUALITY

DIRA : A FRAMEWORK OF DATA INTEGRATION USING DATA QUALITY DIRA : A FRAMEWORK OF DATA INTEGRATION USING DATA QUALITY Reham I. Abdel Monem 1, Ali H. El-Bastawissy 2 and Mohamed M. Elwakil 3 1 Information Systems Department, Faculty of computers and information,

More information

Ontology based Model and Procedure Creation for Topic Analysis in Chinese Language

Ontology based Model and Procedure Creation for Topic Analysis in Chinese Language Ontology based Model and Procedure Creation for Topic Analysis in Chinese Language Dong Han and Kilian Stoffel Information Management Institute, University of Neuchâtel Pierre-à-Mazel 7, CH-2000 Neuchâtel,

More information

Leveraging Set Relations in Exact Set Similarity Join

Leveraging Set Relations in Exact Set Similarity Join Leveraging Set Relations in Exact Set Similarity Join Xubo Wang, Lu Qin, Xuemin Lin, Ying Zhang, and Lijun Chang University of New South Wales, Australia University of Technology Sydney, Australia {xwang,lxue,ljchang}@cse.unsw.edu.au,

More information

GlOSS: Text-Source Discovery over the Internet

GlOSS: Text-Source Discovery over the Internet GlOSS: Text-Source Discovery over the Internet LUIS GRAVANO Columbia University HÉCTOR GARCÍA-MOLINA Stanford University and ANTHONY TOMASIC INRIA Rocquencourt The dramatic growth of the Internet has created

More information

Keyword query interpretation over structured data

Keyword query interpretation over structured data Keyword query interpretation over structured data Advanced Methods of IR Elena Demidova Materials used in the slides: Jeffrey Xu Yu, Lu Qin, Lijun Chang. Keyword Search in Databases. Synthesis Lectures

More information

Designing Views to Answer Queries under Set, Bag,and BagSet Semantics

Designing Views to Answer Queries under Set, Bag,and BagSet Semantics Designing Views to Answer Queries under Set, Bag,and BagSet Semantics Rada Chirkova Department of Computer Science, North Carolina State University Raleigh, NC 27695-7535 chirkova@csc.ncsu.edu Foto Afrati

More information

Modelling Structures in Data Mining Techniques

Modelling Structures in Data Mining Techniques Modelling Structures in Data Mining Techniques Ananth Y N 1, Narahari.N.S 2 Associate Professor, Dept of Computer Science, School of Graduate Studies- JainUniversity- J.C.Road, Bangalore, INDIA 1 Professor

More information

Leveraging Transitive Relations for Crowdsourced Joins*

Leveraging Transitive Relations for Crowdsourced Joins* Leveraging Transitive Relations for Crowdsourced Joins* Jiannan Wang #, Guoliang Li #, Tim Kraska, Michael J. Franklin, Jianhua Feng # # Department of Computer Science, Tsinghua University, Brown University,

More information

Incompatibility Dimensions and Integration of Atomic Commit Protocols

Incompatibility Dimensions and Integration of Atomic Commit Protocols The International Arab Journal of Information Technology, Vol. 5, No. 4, October 2008 381 Incompatibility Dimensions and Integration of Atomic Commit Protocols Yousef Al-Houmaily Department of Computer

More information

Database Technology Introduction. Heiko Paulheim

Database Technology Introduction. Heiko Paulheim Database Technology Introduction Outline The Need for Databases Data Models Relational Databases Database Design Storage Manager Query Processing Transaction Manager Introduction to the Relational Model

More information

Relational Databases

Relational Databases Relational Databases Jan Chomicki University at Buffalo Jan Chomicki () Relational databases 1 / 49 Plan of the course 1 Relational databases 2 Relational database design 3 Conceptual database design 4

More information

An Evolution of Mathematical Tools

An Evolution of Mathematical Tools An Evolution of Mathematical Tools From Conceptualization to Formalization Here's what we do when we build a formal model (or do a computation): 0. Identify a collection of objects/events in the real world.

More information

A MODEL FOR ADVANCED QUERY CAPABILITY DESCRIPTION IN MEDIATOR SYSTEMS

A MODEL FOR ADVANCED QUERY CAPABILITY DESCRIPTION IN MEDIATOR SYSTEMS A MODEL FOR ADVANCED QUERY CAPABILITY DESCRIPTION IN MEDIATOR SYSTEMS Alberto Pan, Paula Montoto and Anastasio Molano Denodo Technologies, Almirante Fco. Moreno 5 B, 28040 Madrid, Spain Email: apan@denodo.com,

More information

Principles of Dataspaces

Principles of Dataspaces Principles of Dataspaces Seminar From Databases to Dataspaces Summer Term 2007 Monika Podolecheva University of Konstanz Department of Computer and Information Science Tutor: Prof. M. Scholl, Alexander

More information

modern database systems lecture 4 : information retrieval

modern database systems lecture 4 : information retrieval modern database systems lecture 4 : information retrieval Aristides Gionis Michael Mathioudakis spring 2016 in perspective structured data relational data RDBMS MySQL semi-structured data data-graph representation

More information

Chapter S:II. II. Search Space Representation

Chapter S:II. II. Search Space Representation Chapter S:II II. Search Space Representation Systematic Search Encoding of Problems State-Space Representation Problem-Reduction Representation Choosing a Representation S:II-1 Search Space Representation

More information

Extending Keyword Search to Metadata in Relational Database

Extending Keyword Search to Metadata in Relational Database DEWS2008 C6-1 Extending Keyword Search to Metadata in Relational Database Jiajun GU Hiroyuki KITAGAWA Graduate School of Systems and Information Engineering Center for Computational Sciences University

More information

Querying Data with Transact SQL

Querying Data with Transact SQL Course 20761A: Querying Data with Transact SQL Course details Course Outline Module 1: Introduction to Microsoft SQL Server 2016 This module introduces SQL Server, the versions of SQL Server, including

More information

Information Retrieval CSCI

Information Retrieval CSCI Information Retrieval CSCI 4141-6403 My name is Anwar Alhenshiri My email is: anwar@cs.dal.ca I prefer: aalhenshiri@gmail.com The course website is: http://web.cs.dal.ca/~anwar/ir/main.html 5/6/2012 1

More information

Effective Top-k Keyword Search in Relational Databases Considering Query Semantics

Effective Top-k Keyword Search in Relational Databases Considering Query Semantics Effective Top-k Keyword Search in Relational Databases Considering Query Semantics Yanwei Xu 1,2, Yoshiharu Ishikawa 1, and Jihong Guan 2 1 Graduate School of Information Science, Nagoya University, Japan

More information

International Journal of Advance Foundation and Research in Science & Engineering (IJAFRSE) Volume 1, Issue 2, July 2014.

International Journal of Advance Foundation and Research in Science & Engineering (IJAFRSE) Volume 1, Issue 2, July 2014. A B S T R A C T International Journal of Advance Foundation and Research in Science & Engineering (IJAFRSE) Information Retrieval Models and Searching Methodologies: Survey Balwinder Saini*,Vikram Singh,Satish

More information

WEIGHTING QUERY TERMS USING WORDNET ONTOLOGY

WEIGHTING QUERY TERMS USING WORDNET ONTOLOGY IJCSNS International Journal of Computer Science and Network Security, VOL.9 No.4, April 2009 349 WEIGHTING QUERY TERMS USING WORDNET ONTOLOGY Mohammed M. Sakre Mohammed M. Kouta Ali M. N. Allam Al Shorouk

More information

NON-CENTRALIZED DISTINCT L-DIVERSITY

NON-CENTRALIZED DISTINCT L-DIVERSITY NON-CENTRALIZED DISTINCT L-DIVERSITY Chi Hong Cheong 1, Dan Wu 2, and Man Hon Wong 3 1,3 Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong {chcheong, mhwong}@cse.cuhk.edu.hk

More information

ResPubliQA 2010

ResPubliQA 2010 SZTAKI @ ResPubliQA 2010 David Mark Nemeskey Computer and Automation Research Institute, Hungarian Academy of Sciences, Budapest, Hungary (SZTAKI) Abstract. This paper summarizes the results of our first

More information

RSDC 09: Tag Recommendation Using Keywords and Association Rules

RSDC 09: Tag Recommendation Using Keywords and Association Rules RSDC 09: Tag Recommendation Using Keywords and Association Rules Jian Wang, Liangjie Hong and Brian D. Davison Department of Computer Science and Engineering Lehigh University, Bethlehem, PA 18015 USA

More information

A CORBA-based Multidatabase System - Panorama Project

A CORBA-based Multidatabase System - Panorama Project A CORBA-based Multidatabase System - Panorama Project Lou Qin-jian, Sarem Mudar, Li Rui-xuan, Xiao Wei-jun, Lu Zheng-ding, Chen Chuan-bo School of Computer Science and Technology, Huazhong University of

More information

Relational Algebra and Calculus

Relational Algebra and Calculus and Calculus Marek Rychly mrychly@strathmore.edu Strathmore University, @ilabafrica & Brno University of Technology, Faculty of Information Technology Advanced Databases and Enterprise Systems 7 September

More information

60-538: Information Retrieval

60-538: Information Retrieval 60-538: Information Retrieval September 7, 2017 1 / 48 Outline 1 what is IR 2 3 2 / 48 Outline 1 what is IR 2 3 3 / 48 IR not long time ago 4 / 48 5 / 48 now IR is mostly about search engines there are

More information

Processing Rank-Aware Queries in P2P Systems

Processing Rank-Aware Queries in P2P Systems Processing Rank-Aware Queries in P2P Systems Katja Hose, Marcel Karnstedt, Anke Koch, Kai-Uwe Sattler, and Daniel Zinn Department of Computer Science and Automation, TU Ilmenau P.O. Box 100565, D-98684

More information

Inverted Index for Fast Nearest Neighbour

Inverted Index for Fast Nearest Neighbour Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IMPACT FACTOR: 5.258 IJCSMC,

More information

Data Access Paths for Frequent Itemsets Discovery

Data Access Paths for Frequent Itemsets Discovery Data Access Paths for Frequent Itemsets Discovery Marek Wojciechowski, Maciej Zakrzewicz Poznan University of Technology Institute of Computing Science {marekw, mzakrz}@cs.put.poznan.pl Abstract. A number

More information

STRUCTURE-BASED QUERY EXPANSION FOR XML SEARCH ENGINE

STRUCTURE-BASED QUERY EXPANSION FOR XML SEARCH ENGINE STRUCTURE-BASED QUERY EXPANSION FOR XML SEARCH ENGINE Wei-ning Qian, Hai-lei Qian, Li Wei, Yan Wang and Ao-ying Zhou Computer Science Department Fudan University Shanghai 200433 E-mail: wnqian@fudan.edu.cn

More information

A Data warehouse within a Federated database architecture

A Data warehouse within a Federated database architecture Association for Information Systems AIS Electronic Library (AISeL) AMCIS 1997 Proceedings Americas Conference on Information Systems (AMCIS) 8-15-1997 A Data warehouse within a Federated database architecture

More information

International Journal of Advance Engineering and Research Development. Performance Enhancement of Search System

International Journal of Advance Engineering and Research Development. Performance Enhancement of Search System Scientific Journal of Impact Factor(SJIF): 3.134 International Journal of Advance Engineering and Research Development Volume 2,Issue 7, July -2015 Performance Enhancement of Search System Ms. Uma P Nalawade

More information

Designing and Building an Automatic Information Retrieval System for Handling the Arabic Data

Designing and Building an Automatic Information Retrieval System for Handling the Arabic Data American Journal of Applied Sciences (): -, ISSN -99 Science Publications Designing and Building an Automatic Information Retrieval System for Handling the Arabic Data Ibrahiem M.M. El Emary and Ja'far

More information

CHAPTER THREE INFORMATION RETRIEVAL SYSTEM

CHAPTER THREE INFORMATION RETRIEVAL SYSTEM CHAPTER THREE INFORMATION RETRIEVAL SYSTEM 3.1 INTRODUCTION Search engine is one of the most effective and prominent method to find information online. It has become an essential part of life for almost

More information

Processing Structural Constraints

Processing Structural Constraints SYNONYMS None Processing Structural Constraints Andrew Trotman Department of Computer Science University of Otago Dunedin New Zealand DEFINITION When searching unstructured plain-text the user is limited

More information

Deep Web Content Mining

Deep Web Content Mining Deep Web Content Mining Shohreh Ajoudanian, and Mohammad Davarpanah Jazi Abstract The rapid expansion of the web is causing the constant growth of information, leading to several problems such as increased

More information

Roadmap. Roadmap. Ranking Web Pages. PageRank. Roadmap. Random Walks in Ranking Query Results in Semistructured Databases

Roadmap. Roadmap. Ranking Web Pages. PageRank. Roadmap. Random Walks in Ranking Query Results in Semistructured Databases Roadmap Random Walks in Ranking Query in Vagelis Hristidis Roadmap Ranking Web Pages Rank according to Relevance of page to query Quality of page Roadmap PageRank Stanford project Lawrence Page, Sergey

More information

Clustering Algorithms for Data Stream

Clustering Algorithms for Data Stream Clustering Algorithms for Data Stream Karishma Nadhe 1, Prof. P. M. Chawan 2 1Student, Dept of CS & IT, VJTI Mumbai, Maharashtra, India 2Professor, Dept of CS & IT, VJTI Mumbai, Maharashtra, India Abstract:

More information

6. Relational Algebra (Part II)

6. Relational Algebra (Part II) 6. Relational Algebra (Part II) 6.1. Introduction In the previous chapter, we introduced relational algebra as a fundamental model of relational database manipulation. In particular, we defined and discussed

More information

CS143: Relational Model

CS143: Relational Model CS143: Relational Model Book Chapters (4th) Chapters 1.3-5, 3.1, 4.11 (5th) Chapters 1.3-7, 2.1, 3.1-2, 4.1 (6th) Chapters 1.3-6, 2.105, 3.1-2, 4.5 Things to Learn Data model Relational model Database

More information

Outline. q Database integration & querying. q Peer-to-Peer data management q Stream data management q MapReduce-based distributed data management

Outline. q Database integration & querying. q Peer-to-Peer data management q Stream data management q MapReduce-based distributed data management Outline n Introduction & architectural issues n Data distribution n Distributed query processing n Distributed query optimization n Distributed transactions & concurrency control n Distributed reliability

More information

UML-Based Conceptual Modeling of Pattern-Bases

UML-Based Conceptual Modeling of Pattern-Bases UML-Based Conceptual Modeling of Pattern-Bases Stefano Rizzi DEIS - University of Bologna Viale Risorgimento, 2 40136 Bologna - Italy srizzi@deis.unibo.it Abstract. The concept of pattern, meant as an

More information

Consistency and Set Intersection

Consistency and Set Intersection Consistency and Set Intersection Yuanlin Zhang and Roland H.C. Yap National University of Singapore 3 Science Drive 2, Singapore {zhangyl,ryap}@comp.nus.edu.sg Abstract We propose a new framework to study

More information

Enabling Users to Visually Evaluate the Effectiveness of Different Search Queries or Engines

Enabling Users to Visually Evaluate the Effectiveness of Different Search Queries or Engines Appears in WWW 04 Workshop: Measuring Web Effectiveness: The User Perspective, New York, NY, May 18, 2004 Enabling Users to Visually Evaluate the Effectiveness of Different Search Queries or Engines Anselm

More information

Chapter 2 Overview of the Design Methodology

Chapter 2 Overview of the Design Methodology Chapter 2 Overview of the Design Methodology This chapter presents an overview of the design methodology which is developed in this thesis, by identifying global abstraction levels at which a distributed

More information

Incompatibility Dimensions and Integration of Atomic Commit Protocols

Incompatibility Dimensions and Integration of Atomic Commit Protocols Preprint Incompatibility Dimensions and Integration of Atomic Protocols, Yousef J. Al-Houmaily, International Arab Journal of Information Technology, Vol. 5, No. 4, pp. 381-392, October 2008. Incompatibility

More information

second_language research_teaching sla vivian_cook language_department idl

second_language research_teaching sla vivian_cook language_department idl Using Implicit Relevance Feedback in a Web Search Assistant Maria Fasli and Udo Kruschwitz Department of Computer Science, University of Essex, Wivenhoe Park, Colchester, CO4 3SQ, United Kingdom fmfasli

More information

EXTRACTION INFORMATION ADAPTIVE WEB. The Amorphic system works to extract Web information for use in business intelligence applications.

EXTRACTION INFORMATION ADAPTIVE WEB. The Amorphic system works to extract Web information for use in business intelligence applications. By Dawn G. Gregg and Steven Walczak ADAPTIVE WEB INFORMATION EXTRACTION The Amorphic system works to extract Web information for use in business intelligence applications. Web mining has the potential

More information

Elementary IR: Scalable Boolean Text Search. (Compare with R & G )

Elementary IR: Scalable Boolean Text Search. (Compare with R & G ) Elementary IR: Scalable Boolean Text Search (Compare with R & G 27.1-3) Information Retrieval: History A research field traditionally separate from Databases Hans P. Luhn, IBM, 1959: Keyword in Context

More information

Applying the KISS Principle for the CLEF- IP 2010 Prior Art Candidate Patent Search Task

Applying the KISS Principle for the CLEF- IP 2010 Prior Art Candidate Patent Search Task Applying the KISS Principle for the CLEF- IP 2010 Prior Art Candidate Patent Search Task Walid Magdy, Gareth J.F. Jones Centre for Next Generation Localisation School of Computing Dublin City University,

More information

Addressed Issue. P2P What are we looking at? What is Peer-to-Peer? What can databases do for P2P? What can databases do for P2P?

Addressed Issue. P2P What are we looking at? What is Peer-to-Peer? What can databases do for P2P? What can databases do for P2P? Peer-to-Peer Data Management - Part 1- Alex Coman acoman@cs.ualberta.ca Addressed Issue [1] Placement and retrieval of data [2] Server architectures for hybrid P2P [3] Improve search in pure P2P systems

More information

STRUCTURED ENTITY QUERYING OVER UNSTRUCTURED TEXT JIAHUI JIANG THESIS

STRUCTURED ENTITY QUERYING OVER UNSTRUCTURED TEXT JIAHUI JIANG THESIS c 2015 Jiahui Jiang STRUCTURED ENTITY QUERYING OVER UNSTRUCTURED TEXT BY JIAHUI JIANG THESIS Submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science

More information

DBMS. Relational Model. Module Title?

DBMS. Relational Model. Module Title? Relational Model Why Study the Relational Model? Most widely used model currently. DB2,, MySQL, Oracle, PostgreSQL, SQLServer, Note: some Legacy systems use older models e.g., IBM s IMS Object-oriented

More information

INCONSISTENT DATABASES

INCONSISTENT DATABASES INCONSISTENT DATABASES Leopoldo Bertossi Carleton University, http://www.scs.carleton.ca/ bertossi SYNONYMS None DEFINITION An inconsistent database is a database instance that does not satisfy those integrity

More information

CS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University

CS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University CS473: CS-473 Course Review Luo Si Department of Computer Science Purdue University Basic Concepts of IR: Outline Basic Concepts of Information Retrieval: Task definition of Ad-hoc IR Terminologies and

More information

Chapter 27 Introduction to Information Retrieval and Web Search

Chapter 27 Introduction to Information Retrieval and Web Search Chapter 27 Introduction to Information Retrieval and Web Search Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 27 Outline Information Retrieval (IR) Concepts Retrieval

More information

Data integration supports seamless access to autonomous, heterogeneous information

Data integration supports seamless access to autonomous, heterogeneous information Using Constraints to Describe Source Contents in Data Integration Systems Chen Li, University of California, Irvine Data integration supports seamless access to autonomous, heterogeneous information sources

More information

Requirements Engineering for Enterprise Systems

Requirements Engineering for Enterprise Systems Association for Information Systems AIS Electronic Library (AISeL) AMCIS 2001 Proceedings Americas Conference on Information Systems (AMCIS) December 2001 Requirements Engineering for Enterprise Systems

More information

II B.Sc(IT) [ BATCH] IV SEMESTER CORE: RELATIONAL DATABASE MANAGEMENT SYSTEM - 412A Multiple Choice Questions.

II B.Sc(IT) [ BATCH] IV SEMESTER CORE: RELATIONAL DATABASE MANAGEMENT SYSTEM - 412A Multiple Choice Questions. Dr.G.R.Damodaran College of Science (Autonomous, affiliated to the Bharathiar University, recognized by the UGC)Re-accredited at the 'A' Grade Level by the NAAC and ISO 9001:2008 Certified CRISL rated

More information

ISSN: (Online) Volume 2, Issue 3, March 2014 International Journal of Advance Research in Computer Science and Management Studies

ISSN: (Online) Volume 2, Issue 3, March 2014 International Journal of Advance Research in Computer Science and Management Studies ISSN: 2321-7782 (Online) Volume 2, Issue 3, March 2014 International Journal of Advance Research in Computer Science and Management Studies Research Article / Paper / Case Study Available online at: www.ijarcsms.com

More information

Updates through Views

Updates through Views 1 of 6 15 giu 2010 00:16 Encyclopedia of Database Systems Springer Science+Business Media, LLC 2009 10.1007/978-0-387-39940-9_847 LING LIU and M. TAMER ÖZSU Updates through Views Yannis Velegrakis 1 (1)

More information

Databases and Information Retrieval Integration TIETS42. Kostas Stefanidis Autumn 2016

Databases and Information Retrieval Integration TIETS42. Kostas Stefanidis Autumn 2016 + Databases and Information Retrieval Integration TIETS42 Autumn 2016 Kostas Stefanidis kostas.stefanidis@uta.fi http://www.uta.fi/sis/tie/dbir/index.html http://people.uta.fi/~kostas.stefanidis/dbir16/dbir16-main.html

More information

LECTURE 8: SETS. Software Engineering Mike Wooldridge

LECTURE 8: SETS. Software Engineering Mike Wooldridge LECTURE 8: SETS Mike Wooldridge 1 What is a Set? The concept of a set is used throughout mathematics; its formal definition matches closely our intuitive understanding of the word. Definition: A set is

More information

Handout 9: Imperative Programs and State

Handout 9: Imperative Programs and State 06-02552 Princ. of Progr. Languages (and Extended ) The University of Birmingham Spring Semester 2016-17 School of Computer Science c Uday Reddy2016-17 Handout 9: Imperative Programs and State Imperative

More information

WEB PAGE RE-RANKING TECHNIQUE IN SEARCH ENGINE

WEB PAGE RE-RANKING TECHNIQUE IN SEARCH ENGINE WEB PAGE RE-RANKING TECHNIQUE IN SEARCH ENGINE Ms.S.Muthukakshmi 1, R. Surya 2, M. Umira Taj 3 Assistant Professor, Department of Information Technology, Sri Krishna College of Technology, Kovaipudur,

More information

Effective Keyword Search in Relational Databases for Lyrics

Effective Keyword Search in Relational Databases for Lyrics Effective Keyword Search in Relational Databases for Lyrics Navin Kumar Trivedi Assist. Professor, Department of Computer Science & Information Technology Divya Singh B.Tech (CSe) Scholar Pooja Pandey

More information

Indexing. Week 14, Spring Edited by M. Naci Akkøk, , Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel

Indexing. Week 14, Spring Edited by M. Naci Akkøk, , Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel Indexing Week 14, Spring 2005 Edited by M. Naci Akkøk, 5.3.2004, 3.3.2005 Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel Overview Conventional indexes B-trees Hashing schemes

More information

DATA MODELS FOR SEMISTRUCTURED DATA

DATA MODELS FOR SEMISTRUCTURED DATA Chapter 2 DATA MODELS FOR SEMISTRUCTURED DATA Traditionally, real world semantics are captured in a data model, and mapped to the database schema. The real world semantics are modeled as constraints and

More information

Implementation Techniques

Implementation Techniques V Implementation Techniques 34 Efficient Evaluation of the Valid-Time Natural Join 35 Efficient Differential Timeslice Computation 36 R-Tree Based Indexing of Now-Relative Bitemporal Data 37 Light-Weight

More information

Downloading Hidden Web Content

Downloading Hidden Web Content Downloading Hidden Web Content Alexandros Ntoulas Petros Zerfos Junghoo Cho UCLA Computer Science {ntoulas, pzerfos, cho}@cs.ucla.edu Abstract An ever-increasing amount of information on the Web today

More information

Enhancing Internet Search Engines to Achieve Concept-based Retrieval

Enhancing Internet Search Engines to Achieve Concept-based Retrieval Enhancing Internet Search Engines to Achieve Concept-based Retrieval Fenghua Lu 1, Thomas Johnsten 2, Vijay Raghavan 1 and Dennis Traylor 3 1 Center for Advanced Computer Studies University of Southwestern

More information

XQuery Optimization Based on Rewriting

XQuery Optimization Based on Rewriting XQuery Optimization Based on Rewriting Maxim Grinev Moscow State University Vorob evy Gory, Moscow 119992, Russia maxim@grinev.net Abstract This paper briefly describes major results of the author s dissertation

More information

Automated Online News Classification with Personalization

Automated Online News Classification with Personalization Automated Online News Classification with Personalization Chee-Hong Chan Aixin Sun Ee-Peng Lim Center for Advanced Information Systems, Nanyang Technological University Nanyang Avenue, Singapore, 639798

More information

Bipartite Graph Partitioning and Content-based Image Clustering

Bipartite Graph Partitioning and Content-based Image Clustering Bipartite Graph Partitioning and Content-based Image Clustering Guoping Qiu School of Computer Science The University of Nottingham qiu @ cs.nott.ac.uk Abstract This paper presents a method to model the

More information

CS 377 Database Systems

CS 377 Database Systems CS 377 Database Systems Relational Data Model Li Xiong Department of Mathematics and Computer Science Emory University 1 Outline Relational Model Concepts Relational Model Constraints Relational Database

More information

Querying Data with Transact-SQL

Querying Data with Transact-SQL Course 20761A: Querying Data with Transact-SQL Page 1 of 5 Querying Data with Transact-SQL Course 20761A: 2 days; Instructor-Led Introduction The main purpose of this 2 day instructor led course is to

More information

Information Management (IM)

Information Management (IM) 1 2 3 4 5 6 7 8 9 Information Management (IM) Information Management (IM) is primarily concerned with the capture, digitization, representation, organization, transformation, and presentation of information;

More information

Database System Concepts and Architecture

Database System Concepts and Architecture CHAPTER 2 Database System Concepts and Architecture Copyright 2017 Ramez Elmasri and Shamkant B. Navathe Slide 2-2 Outline Data Models and Their Categories History of Data Models Schemas, Instances, and

More information

Chapter 6: Information Retrieval and Web Search. An introduction

Chapter 6: Information Retrieval and Web Search. An introduction Chapter 6: Information Retrieval and Web Search An introduction Introduction n Text mining refers to data mining using text documents as data. n Most text mining tasks use Information Retrieval (IR) methods

More information

Relational Data Model

Relational Data Model Relational Data Model 1. Relational data model Information models try to put the real-world information complexity in a framework that can be easily understood. Data models must capture data structure

More information

2. Discovery of Association Rules

2. Discovery of Association Rules 2. Discovery of Association Rules Part I Motivation: market basket data Basic notions: association rule, frequency and confidence Problem of association rule mining (Sub)problem of frequent set mining

More information

INTERNATIONAL JOURNAL OF RESEARCH IN COMPUTER APPLICATIONS AND ROBOTICS ISSN EFFECTIVE KEYWORD SEARCH OF FUZZY TYPE IN XML

INTERNATIONAL JOURNAL OF RESEARCH IN COMPUTER APPLICATIONS AND ROBOTICS ISSN EFFECTIVE KEYWORD SEARCH OF FUZZY TYPE IN XML INTERNATIONAL JOURNAL OF RESEARCH IN COMPUTER APPLICATIONS AND ROBOTICS ISSN 2320-7345 EFFECTIVE KEYWORD SEARCH OF FUZZY TYPE IN XML Mr. Mohammed Tariq Alam 1,Mrs.Shanila Mahreen 2 Assistant Professor

More information

CS425 Fall 2016 Boris Glavic Chapter 1: Introduction

CS425 Fall 2016 Boris Glavic Chapter 1: Introduction CS425 Fall 2016 Boris Glavic Chapter 1: Introduction Modified from: Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Textbook: Chapter 1 1.2 Database Management System (DBMS)

More information

Equivalence Detection Using Parse-tree Normalization for Math Search

Equivalence Detection Using Parse-tree Normalization for Math Search Equivalence Detection Using Parse-tree Normalization for Math Search Mohammed Shatnawi Department of Computer Info. Systems Jordan University of Science and Tech. Jordan-Irbid (22110)-P.O.Box (3030) mshatnawi@just.edu.jo

More information

Predictive Indexing for Fast Search

Predictive Indexing for Fast Search Predictive Indexing for Fast Search Sharad Goel, John Langford and Alex Strehl Yahoo! Research, New York Modern Massive Data Sets (MMDS) June 25, 2008 Goel, Langford & Strehl (Yahoo! Research) Predictive

More information