Data warehouse access using multi-agent system


Distrib Parallel Databases (2009) 25: DOI /s

Nader Kolsi, Abdelaziz Abdellatif, Khaled Ghedira

Published online: 21 February 2009. Springer Science+Business Media, LLC 2009

Abstract The approach proposed in this paper deals with the dynamic distribution of the data warehouse (DWH) over a set of servers. This distribution differs from the classical one, which depends on how the data is used: here, data is distributed when a machine reaches its storage capacity limit. The proposed approach ensures scalability and exploits the storage and processing resources available in the organization using the DWH. Our approach is based on a multi-agent model combined with the scalable distribution proposed by the Scalable Distributed Data Structures. The multi-agent model is made up of stationary agent classes (Client, Dispatcher, Domain and Server) and a mobile agent class (Messenger). These agents collaborate to perform automatically the storage, splitting, redirection and access operations on the distributed DWH. In this paper, we focus on the global dynamic of the data access operation and present the corresponding experimental results.

Keywords Data warehouse, Dynamic distribution, Data access, Multi-agent system, Mobile agent, Scalable and distributed data structures

Communicated by Ladjel Bellatreche.

N. Kolsi, Higher Institute of Business Administration of Sfax, Sfax, Tunisia, nader.kolsi@fsegs.rnu.tn
A. Abdellatif, University of Sciences of Tunis, Tunis, Tunisia, abdelaziz.abdellatif@fst.rnu.tn
K. Ghedira, National School of Informatics Sciences, Manouba Campus University, Tunis, Tunisia, khaled.ghedira@isg.rnu.tn

1 Introduction

The data warehouse (DWH), as defined by its inventor W.H. Inmon [19], is a collection of data that is subject-oriented, integrated, time-variant, non-volatile, and used to support decision making. It is a repository of data collected from heterogeneous and autonomous distributed sources, and it is used for analytical tasks in business. The DWH usually contains a very large amount of data, because of the length of the period that it must cover (historical data) and the diversity of the sources from which data are extracted.

The DWH is a principal component of the information systems of organizations and is therefore the subject of many research works. This research deals with five main topics, as shown in [29]: (1) data warehouse modeling and design, (2) data warehouse architectures, (3) data warehouse maintenance, (4) operational issues, and (5) optimization. Our research focuses mainly on operational issues and optimization, but also touches on data warehouse architectures and design.

Our work aims at solving the problems of storage space and performance by: (1) developing a dynamic system that can manage the DWH automatically (data storage, data distribution on a set of servers, and data access), (2) taking advantage of the storage and processing resources available in the organization (processors, memory, hard disks, etc.), (3) reducing the data storage time, and (4) improving the query response time.

This paper is organized as follows. Sect. 2 gives an overview of related works and discusses the problems related to optimization. Sect. 3 presents the Scalable and Distributed Data Structures. Sect. 4 presents the multi-agent system concepts. Sect. 5 describes the proposed multi-agent model. Sect. 6 details the global dynamic of the data access operation. Sect. 7 reports the experimental results. Finally, Sect. 8 concludes and gives an outlook on future work.

2 Related works

So far, the distribution of data warehouses has not attracted much attention in research. The use of a DWH with a distributed structure appeared only with data marts [11, 18]. Although small data marts (data warehouses) were the first attempt to solve the problems of space and performance, data marts are basically stand-alone and raise data integration problems in a global data warehouse context. In addition, the performance of many distributed queries is normally poor, mainly because of load balancing problems. Furthermore, each individual data mart is primarily designed and tuned to answer the queries related to its own subject area, whereas the response to global queries depends on the global system tuning and the network speed.

Consequently, most research on optimization proposes solutions based either on a centralized DWH or on a partitioning model that stores the fact table in pieces, instead of as one large monolithic object, on a set of I/O devices of a multiprocessor machine or on a centralized database. The latter is very expensive because of its large setup costs, and it is not very flexible because of its centralized nature [5].

These works propose several query optimization techniques, which can be classified in two categories [2]: (1) redundant structures, such as materialized views and indexes [3, 15, 22], which compete for the same resource (storage space) and incur maintenance overhead in the presence of updates [28]; and (2) non-redundant structures, such as horizontal partitioning [9], which do not require the extra space needed by the first category.

All these techniques are supported by current database management systems (DBMS). However, the improvements made to these systems for managing large amounts of data are not sufficient to keep up with the data growth of the DWH. In addition, the static data fragmentation schema currently used in these systems constitutes a major handicap.

In our approach, we use both kinds of techniques (non-redundant and redundant structures). Horizontal partitioning is used to distribute the data warehouse on a set of machines. Materialized views and indexes are used on each individual machine, which must be tuned and optimized for performance.

Most research on data warehouse distribution proposes solutions based on studies made on production databases under the name of very large databases. These solutions rely on the classic data distribution, which depends on the data use and has a static distribution plan defined at the design phase. In [9], the authors propose a solution to make this distribution plan dynamic.
They present an algorithm that finds the optimal vertical schema fragmentation based on particle swarm optimization. Other works [30, 31] use abstract state machines [7] as a flexible and quality-oriented formal method to design and optimize a distributed DWH and OLAP (On Line Analytical Processing) applications.

In our approach, the data distribution that we consider differs from the commonly used ones [21]. It is not defined at the design phase; instead, it is imposed by the storage capacity. When a machine reaches its storage capacity limit, we add another one and distribute the data on the two machines to obtain a balanced load.

There are several ways to divide a relation horizontally. Typically, we can assign tuples to the processors in a round-robin fashion (round-robin partitioning), we can use hashing (hash partitioning), or we can assign tuples to the processors by ranges of values (range partitioning) [5]. The authors of [5, 6, 8, 14] use the Data Warehouse Striping (DWS) technique, a round-robin data partitioning approach especially designed for distributed data warehouse environments. With DWS, the fact table is distributed over an arbitrary number of machines that is fixed at the beginning. Consequently, the queries are executed in parallel by all of the machines [8]. The round-robin distribution is simple to use and guarantees load balancing, but its major disadvantage is that the machines must have the same processing and storage capacities. Otherwise, some machines will be too busy and the others will be underused.

In our approach, we use the range partitioning applied by the scalable and distributed data structures (see Sect. 3). The queries are therefore executed in parallel not by all the machines but only by those that contain the necessary partitions. Furthermore, the data distribution is dynamic and automatic: each time a machine reaches its capacity limit, it starts the data distribution operation without any external intervention (administrator). Moreover, the number of machines used in our approach is not fixed. The storage capacity of the DWH is therefore theoretically unbounded, because other machines can be added dynamically at any moment. In the following section, we present the principle of the scalable and distributed data structures.

3 Scalable and Distributed Data Structures

The Scalable and Distributed Data Structures (SDDS) deal with the storage of a large amount of data on a set of interconnected machines. The SDDS principle consists in distributing the file contents in a way that benefits from the memory available on a set of interconnected machines [4, 10]. This distribution is based on the identifiers (keys): the keys residing on one machine must lie between a lower bound mark and an upper bound mark (see Sect. 5.1). When the content of the file grows, the file is split. This principle has been extended from files to operational databases [24, 26, 27]. The unbounded storage capacity and the dynamic data distribution are guaranteed by the SDDS principle [23]. In the rest of this paper, we use the two terms splitting and distributing with the same meaning. In the following section, we present the multi-agent system concepts.
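
As an illustration of the range-based placement used by the SDDS principle, the following Python sketch locates the machine responsible for a key from its lower and upper bound marks. It is only a sketch under stated assumptions: the `Partition` class, the `locate` function, the integer keys and the machine names are hypothetical and do not come from the paper.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    machine: str   # hypothetical machine name
    low: int       # lower bound mark (inclusive)
    high: int      # upper bound mark (inclusive)


def locate(partitions, key):
    """Return the machine whose [low, high] interval contains `key`.

    `partitions` must be sorted by `low` and must not overlap.
    """
    lows = [p.low for p in partitions]
    i = bisect_right(lows, key) - 1   # last partition with low <= key
    if i >= 0 and key <= partitions[i].high:
        return partitions[i].machine
    raise KeyError(f"no partition holds key {key}")


# Two machines sharing the key range [0, 99] (illustrative values).
partitions = [Partition("M1", 0, 50), Partition("M2", 51, 99)]
```

With this layout, `locate(partitions, 50)` selects `M1` and `locate(partitions, 51)` selects `M2`, so only the machine holding the relevant interval needs to be contacted.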

4 Multi-agent system

The agent paradigm is currently in vogue in many research domains. An agent can be a physical or virtual entity that acts autonomously (without the direct intervention of humans or others), on behalf of entities (person, organisation, etc.), in response to input from its environment. Agents have a social ability: they may communicate with the users, system resources and other agents as required to achieve their goals and tendencies. Moreover, more advanced agents may cooperate with other agents to carry out tasks beyond the capability of a single agent. Agents therefore contain some level of intelligence, ranging from pre-defined rules up to self-learning artificial intelligence inference machines. This intelligence enables agents to act not only reactively, but sometimes also proactively.

An agent can be static or mobile. A mobile agent is a particular class of agent that can migrate dynamically during execution (code, data and execution state) from one machine to another, where it resumes its execution, in order to reach data or remote resources. It has been suggested that mobile agent technology, amongst other things, can help to reduce network traffic and to overcome network latencies [17]. Moreover, mobile agents have shown high performance for accessing data distributed on a set of interconnected machines [1] and for storing these data [20].

A multi-agent system (MAS) is a system composed of multiple autonomous agents and comprises the following elements [13]:
1. An environment E, which is a space that generally has volume.
2. A set of situated objects O, that is to say, it is possible at a given moment to associate any object with a position in E.
3. An assembly of agents A, which are specific objects (a subset of O) and represent the active entities in the system.
4. An assembly of relations R, which link objects (and therefore agents) to one another.
5. An assembly of operations Op, which allow the agents of A to perceive, produce, transform, and manipulate objects in O.
6. Operators with the task of representing the application of these operations and the reaction of the world to this attempt at modification, which we shall call the laws of the universe.

The following section presents the data distribution principle and the proposed multi-agent model.

5 Proposed model

The aim of our proposed model is to solve the problems of the DWH context using the resources available in the organization. These problems are related to data storage, splitting and access. In the proposed approach, the DWH is distributed on a set of machines. In this case, data management needs the collaboration and interaction of those machines in order to answer the user's queries while ensuring the parallel processing of these queries. We have therefore chosen to use a Multi-Agent System (MAS) with mobile agents as essential actors.
The MAS makes it possible to follow the progress of the dynamic data distribution, facilitates the collaboration, interaction, and independence of the different machines, and improves the parallel execution of the user queries. The use of mobile agents in the proposed solution is helpful because it allows: (1) decreasing the network load, (2) freeing the client machines during the preparation of the results, which generally requires a long execution time, and (3) essentially, securing the data transported over the network (see Sect. 7). We use the SDDS principle, based on data distribution through intervals (range partitioning), to distribute the data of the DWH on a set of machines. This type of distribution decomposes the DWH into a set of domains. Each domain can be stored on one or more machines according to its data size.

5.1 Principle of data distribution

The DWH is horizontally distributed on a set of machines that have the same DBMS and the same star schema (see Fig. 1). Furthermore, on each machine, we can use materialized views and indexes to tune and optimize the performance. The principle is to start with a single machine for which we define: (1) the storage capacity limit up to which the DBMS gives its highest performance (for data access and storage), and (2) both the lower bound mark and the upper bound mark of each fact table key. When this machine reaches its limit, we add another one and distribute the data on the two machines to obtain a balanced load. In most cases, it is the fact table that undergoes the splitting operation, because of its large volume. The dimension tables are distributed when their key constitutes a distribution criterion. Otherwise, they are duplicated.

In Table 1, we present a data splitting scenario. Machine 1 starts the first splitting operation when it reaches its storage capacity limit. First, we search for the key value that gives two balanced partitions (e.g. Product_Id, which is a two-digit integer). Then, we move the data related to the new interval to machine 2. Finally, we update the intervals. The second splitting operation is launched by machine 2 (e.g. on Date_Id, which is a date). The same process is restarted whenever a machine reaches its capacity limit. The data distribution can continue according to the same criteria or to other ones (Customer_Id, Region_Id). We notice that each SALES table record belongs to only one DWH partition. If we consider that each of these DWH partitions is stored in a separate database, we must, on the one hand, split the Date table and the Product table according to the same criteria used for the SALES table.
On the other hand, we duplicate the other tables in order to (1) facilitate the checking of the integrity constraints, (2) ensure the autonomy of the databases, and (3) improve the join time when we access the data.

Fig. 1 Distributed data warehouse

Table 1 Splitting scenario

              Start       First splitting         Second splitting                     ...
              Machine 1   M1         M2           M1    M2           M3                ...
Customer Id   [A, Z]      [A, Z]     [A, Z]       ...   [A, Z]       [A, Z]            ...
Product Id    [0, 99]     [0, 50]    [51, 99]     ...   [51, 99]     [51, 99]
Region Id     [AA, ZZ]    [AA, ZZ]   [AA, ZZ]     ...   [AA, ZZ]     [AA, ZZ]
Date Id       [Jan, Dec]  [Jan, Dec] [Jan, Dec]   ...   [Jan, Jun]   [Jul, Dec]
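
The splitting scenario can be sketched in Python as follows. This is a minimal illustration under assumptions, not the authors' implementation: the `Machine` class, the `split` and `insert` functions, the use of a row count as capacity limit and the single integer key are all hypothetical. The sketch searches for a median key, moves the upper interval to a new machine and updates both intervals.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    name: str
    low: int        # lower bound mark of the key (inclusive)
    high: int       # upper bound mark of the key (inclusive)
    capacity: int   # hypothetical storage limit, counted in rows
    keys: list = field(default_factory=list)


def split(machine, new_name):
    """Split `machine` into two roughly balanced intervals."""
    ordered = sorted(machine.keys)
    if ordered[0] == ordered[-1]:
        # All rows share one key value: another criterion is needed.
        raise ValueError("cannot split on this key")
    pivot = ordered[(len(ordered) - 1) // 2]   # median key stays on `machine`
    if pivot == ordered[-1]:
        # Many duplicates of the largest key: step down to the next value.
        pivot = max(k for k in ordered if k < ordered[-1])
    new = Machine(new_name, pivot + 1, machine.high, machine.capacity,
                  [k for k in ordered if k > pivot])
    machine.keys = [k for k in ordered if k <= pivot]
    machine.high = pivot                        # update the interval
    return new


def insert(machines, key):
    """Store `key` on the machine owning its interval, splitting if needed."""
    target = next(m for m in machines if m.low <= key <= m.high)
    target.keys.append(key)
    if len(target.keys) > target.capacity:
        machines.append(split(target, f"M{len(machines) + 1}"))


machines = [Machine("M1", 0, 99, capacity=4)]
for product_id in (10, 20, 60, 80, 90):
    insert(machines, product_id)
```

After the fifth insertion the single machine exceeds its limit, so the key range [0, 99] is divided between `M1` and a new machine `M2`, in the spirit of the first splitting step of Table 1.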

The following part deals with the architecture of the proposed multi-agent model and the waiting database notion that we use in our approach.

5.2 The proposed multi-agent model

The proposed model consists of five static agent classes (Client, Dispatcher, Splitting, Domain and Server) and a mobile agent class (Messenger). Each agent class is defined by its knowledge (static or dynamic), its acquaintances (the agents that it knows and with which it can communicate), and its behavior [12]. Figure 2 illustrates the interaction between the different agents.

The Client agents act as an interface between the user and the DWH management system (Dispatcher agent). The user uses the Client agent to send the data storage and data access operations (queries) to the Dispatcher agent. Each Client agent has the Dispatcher agent as an acquaintance. Its static knowledge is made up of its name and its address. This agent class does not have dynamic knowledge.

The Dispatcher agent arranges the received operations according to their arrival order. These operations are processed by the Messenger agents. When the Dispatcher agent receives the operation results from the Messenger agents, it sends them to the Client agent if the latter is connected. Otherwise, it saves them until the Client agent reconnects. The acquaintances of the Dispatcher agent are: (i) the Client agents, which send queries, (ii) the Messenger agents, which take charge of executing these operations, and (iii) the Splitting agent. Its static knowledge consists of its name and its address. Its dynamic knowledge is made up of a list containing all the Domain agents existing in the system and two waiting queues. The first queue is used to store the operations received from the Client agents. The second one is used to store the results provided by the Messenger agents.
Then, the Dispatcher agent sends these results to the Client agent that issued the operation (as described above).

The Messenger agents take charge of executing each operation found in the operations waiting queue of the Dispatcher agent. Each Messenger agent builds the execution plan of its operation. Then, it visits all the Domain agents concerned with this operation. Finally, it gives the final results to the Dispatcher agent. Each Messenger agent has as acquaintances the Dispatcher agent and the Domain agents necessary to execute the operation. Its static knowledge is made up of its name and the maximum size of data that it can transport. This maximum depends on the network characteristics. The dynamic knowledge of the Messenger agent consists of: (i) the list of Domain agents to visit to execute the operation, (ii) the operation to execute, (iii) the list of data to store (if the operation is data storage), or the list of data collected from the visited Domain agents (if the operation is data access), and (iv) the size of the transported data. The Messenger agent has a very important role in our architecture because it allows: (1) reducing the message traffic on the network, (2) accelerating the data storage and access operations, and, essentially, (3) securing the data circulation on the network (see Sect. 7).

Fig. 2 The proposed multi-agent model architecture

The Domain agents are responsible for sending the operations to the Server agents that they control. Then, they collect the replies sent by the Server agents and transmit the final result to the Messenger agent. The Domain agent has as acquaintances: (i) the Server agents that are under its control, (ii) the Messenger agents with which it has operations to execute, and (iii) the Splitting agent. Its static knowledge is composed of its name, its address, the disk space limit of each Server agent, the maximum number of Server agents it can manage, and the maximum size of data it can receive from the Messenger agents. This maximum depends on the machine characteristics (memory, processor, etc.). Its dynamic knowledge consists of the descendant list, the size of the memorized data, and two waiting queues. The first queue is used to store the operations brought by the Messenger agents.
The second one is used to store the replies sent by the Server agents. Later on, the Domain agent sends them to the appropriate Messenger agent.

The Server agents execute the received operations and send the replies to the Domain agent. Each Server agent has the Domain agent to which it belongs as its acquaintance. Its static knowledge is made up of its name and its address. Its dynamic knowledge is a waiting queue used to store the operations received from the Domain agent.

The Splitting agent is responsible for the splitting operations and for maintaining the data road map that allows finding the data location. The splitting operation is started when a machine reaches its storage capacity limit. The role of this agent consists of the following steps. First, it creates a new Domain agent when it receives a splitting request. Then, it informs the Domain agent that asked for the splitting of the location and the characteristics of the new one. Finally, it sends to the Dispatcher agent the new information concerning the two Domain agents in order to update the Domain agents list. The Splitting agent has as acquaintances the Dispatcher agent and the Domain agents that ask for splitting. Its static knowledge consists of its name and its address. Its dynamic knowledge is the list of splitting requests sent by the Domain agents.

The Dispatcher agent manages a metabase that allows it to follow the evolution of the data distribution on the Domain agents, the network status and the load rate of the Messenger agents (see Fig. 3). This metabase is also used by the Messenger agents to build the execution plans of the received operations and to determine the Domain agents to visit. The Splitting agent also uses this metabase for the splitting operations and updates it at the end of each splitting operation. Furthermore, each Domain agent has its own metabase in order to follow the evolution of the data distribution on its descendants (Server agents) (see Fig. 3, the framed tables). In the following section, we detail the dynamic of the proposed model for the data access operation.

Fig. 3 Agent MetaBases tables

6 Multi-agent dynamic for the data access operation

The proposed model is designed to support the different management operations of the data warehouse, namely data storage, splitting, redirection and access. In this paper, we present only the data access operation and we do not consider the case where the system is interrupted. The sequence diagrams presented below describe both the interactions and the agent behaviors involved in the data access operation. The formalism used to represent these diagrams is MA-UML (Mobile Agent UML) [16], an extension of AUML (Agent UML) that allows modeling the behaviors of mobile agents.

The agents used in this operation are the Client agents, the Dispatcher agent, the Messenger agents, the Domain agents, and the Server agents. These agents exchange different messages in order to accomplish the data access operation. This exchange is shown in the diagram presented in Fig. 4.

Fig. 4 Data access operation

The data access operation starts when the users submit their queries to the Client agents, which send them to the Dispatcher agent. The Client agent is satisfied when it receives a result for each query sent. Otherwise, it sends the query again to the Dispatcher agent or, if the query contains syntax errors, it asks the user to correct them.

When it receives the queries, the Dispatcher agent assigns each query to a Messenger agent. If no Messenger agent is available, the Dispatcher agent creates one for each query. The Dispatcher agent is satisfied when it receives a result for each query. This result is sent to the appropriate Client agent. If the latter is not connected, the Dispatcher agent places the received result in its results queue. The Dispatcher agent is unsatisfied if the Messenger agent informs it that the query contains syntax errors. In this case, the Dispatcher agent, in its turn, informs the Client agent that sent the query.

The Messenger agent is in charge of the query execution. When it receives the query, it determines the list of Domain agents holding the data that answer the query. The Messenger agent uses the information available in the metabase and the WHERE clause of the query to determine these agents and their addresses. If this clause does not exist, the list contains all the Domain agents in the system. The Messenger agent clones itself as many times as there are Domain agents to visit. Each cloned Messenger agent moves to one of the selected Domain agents. When it receives the reply from the visited agent, it returns to the original Messenger agent, gives it the partial result of the query and kills itself. The cloned Messenger agent is satisfied when it receives the reply from the visited Domain agent.

If the query has a GROUP BY clause and/or an ORDER BY clause, the original Messenger agent creates a temporary table, corresponding to the query, to save the received partial results. When it has received all the partial results, the original Messenger agent executes the query on the temporary table to get the final result, sends it to the Dispatcher agent and drops the table. If the query has neither of these two clauses, the original Messenger agent gathers the partial results and then sends the final result to the Dispatcher agent. The original Messenger agent is satisfied when all the cloned Messenger agents have returned with the partial results.
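
The gathering of partial results by the original Messenger agent can be illustrated with the Python sketch below. The paper does not detail how partial aggregates are combined, so this is an assumption: each clone is modeled as a thread that brings back a count, a sum, a minimum and a maximum, and the average is derived from the merged sum and count. The names `Partial`, `local_aggregate`, `merge` and `run_query` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Partial:
    count: int
    total: float
    minimum: float
    maximum: float


def local_aggregate(values):
    """Partial result one clone brings back from one non-empty partition."""
    return Partial(len(values), sum(values), min(values), max(values))


def merge(partials):
    """Combine the partial results into the final aggregate values."""
    count = sum(p.count for p in partials)
    total = sum(p.total for p in partials)
    return {
        "count": count,
        "sum": total,
        "avg": total / count,   # derived from sum and count, not from averages
        "min": min(p.minimum for p in partials),
        "max": max(p.maximum for p in partials),
    }


def run_query(partitions):
    """One worker per partition stands in for one cloned Messenger agent."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(local_aggregate, partitions))
    return merge(partials)


# Two partitions of illustrative sale quantities.
result = run_query([[1, 2, 3], [10]])
```

Merging the sum and the count gives an average of 4.0 here, whereas averaging the two local averages would give (2 + 10) / 2 = 6, which is why the sketch carries both values rather than a local average.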
When it receives the query from the cloned Messenger agent, the Domain agent verifies whether the data requested by the query belongs to the Server agents that are under its responsibility. If so, the Domain agent sends the query to the appropriate Server agents. Otherwise, the Domain agent forwards the query to the right Domain agent. The latter case occurs when a splitting operation has happened before the query arrives. The Domain agent is satisfied when it receives the results from all the Server agents. These results are sent to the cloned Messenger agent.

The Server agent executes the query and sends the result to the responsible Domain agent. It is satisfied when it has replied to all the received queries. If the query contains syntax errors, the Server agent is unsatisfied and informs the Domain agent. In the following section, we present the results obtained for the data access operation.

7 Experimental evaluation

In order to validate our model for the data access operation, we have implemented three prototypes and measured the query execution time. One of them accesses data on a centralized database (DB). The others access data on a set of machines. As described below, we have run the experiments using one machine that sends the query and N (three, then five) machines that contain the DWH partitions. These machines have the same configuration: P4 and 256 MB of RAM. We have used JDeveloper 10g as a development toolkit, Oracle as a DBMS, and IBM Aglets as a multi-agent platform. We have programmed an engine that inserts the data in the DWH partitions.

In the first prototype, we have used two machines (client/server) and programmed an engine that accesses, from the client machine, the data stored on the server machine (centralized DWH).

In the second prototype, we have programmed an access engine, without MAS, that accesses the data distributed on a set of machines (three, then five machines) using database links, etc. Each machine contains 1/N of the used data size. The given results (see Figs. 5 and 6) illustrate the aggregate functions (count, sum, avg, max, and min) with this type of query:

    Select aggregate_function (s)
    From ((Select aggregate_function (sale_qt) s From Sales@dwh1)
          Union all (...)
          ...
          Union all (Select aggregate_function (sale_qt) s From Sales@dwhN));

Fig. 5 Experimental results with data size = 600 MB without Group by/order by

Fig. 6 Experimental results with data size = 2.1 GB without Group by/order by

Fig. 7 Experimental results with data size = 600 MB with Group by/order by

In the last prototype, we have programmed the MAS dynamic (see Sect. 6). In this prototype, the machines are used as follows: (1) one machine hosts the Dispatcher agent, the metabase (MB), the Client agent and the Messenger agents, and (2) each of the other N machines hosts a Domain agent, a partition of the DWH database (DWHi) containing 1/N of the used data size, a MB and a Server agent. The query type used to get the given results (see Figs. 5 and 6) is:

    Select aggregate_function (sale_qt) From Sales;

We have tested these prototypes using different data sizes: records equivalent to 600 MB and records equivalent to 2.1 GB. We have also tested our model (see Figs. 7 and 8) using this type of query:

    Select... Group by region_id Order by region_id
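
For the second prototype, the union-all query over N database links follows a regular pattern, so it can be generated rather than written by hand. The short Python sketch below shows one way to do this; the function `union_all_query` and its parameters are hypothetical, and only the query shape, the `sale_qt` column, the `Sales` table and the `dwh1` ... `dwhN` link names are taken from the query given in the text.

```python
def union_all_query(func, n_links, column="sale_qt", table="Sales"):
    """Build the union-all query of the prototype without MAS.

    `func` is the aggregate function name and `n_links` the number of
    database links dwh1 ... dwhN (one per machine).
    """
    branches = [
        f"(Select {func}({column}) s From {table}@dwh{i})"
        for i in range(1, n_links + 1)
    ]
    return f"Select {func}(s) From ({' Union all '.join(branches)});"


query = union_all_query("max", 2)
```

Applying the same function at both levels, as in the query pattern of the text, is exact for sum, max and min. For count, the outer level would presumably have to sum the partial counts, and an average of averages is exact only when the partitions have equal sizes, as in the 1/N setup used here.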

Fig. 8 Experimental results with data size = 2.1 GB with Group by/order by

Table 2 Average gains percentage compared to the centralized DWH
Columns: query without group by/order by (data size = 600 MB, data size = 2.1 GB); query with group by/order by (data size = 600 MB, data size = 2.1 GB)
Rows: MAS; Distributed DWH without MAS; each with 3 and 5 machines

In Table 2, we present the average gains percentage obtained when we compare the distributed prototypes to the centralized prototype. When we compare the query execution time of the prototype using a distributed DWH without MAS to that of the prototype using a centralized DWH, we observe that the average gains are positive in most cases. This is explained by the facts that: (1) each query accesses only a small part of the fact table, and (2) the query is executed on the fact table parts in parallel. These averages turn negative when a small amount of data is distributed on a set of machines. In these cases, the data load time becomes a sizeable part of the query execution time.

We note that the average gains given by our model are the best. These gains result from reducing: (1) the network load (the Messenger agent encapsulates the partial result) and (2) the communications between machines (each machine executes the query locally). In Table 3, we give the time needed to execute each query step on each machine to demonstrate these gains. We take as examples the Avg query and the ALL functions query using 3 machines and data size = 2.1 GB.

Table 3 Execution time by query step and by machine (time in ms)
Columns: Avg query (M1, M2, M3); ALL functions query (M1, M2, M3)
(a) Without group by and order by: Query execution (ServA); Tuple transmission (MessA); MAS coordination; Execution time (DomA); Total execution time
(b) With group by and order by: Query execution (ServA); Data grouping and sorting (ServA); Tuple transmission (MessA); Insertion of partial results in temporary table (MessA); MAS coordination; Execution time (DomA); Creation of the temporary table (MessA); Making of the final result (MessA); Total execution time
ServA = Server Agent, MessA = Messenger Agent, DomA = Domain Agent

We note that the time needed to execute the query on each machine is equal to the time needed to execute the query on a centralized DWH (AVG query (a) = ms, AVG query (b) = ms, ALL query (a) = ms, ALL query (b) = ms) divided by three. In addition, the time required to transmit tuples, to coordinate the MAS and to make the final result increases slightly when the number of returned tuples increases. This time is approximately 6500 ms.

For the query without group by and order by clauses, when we distribute the data on 5 machines, the query execution time is reduced by approximately 1000 ms on average. For the query with group by and order by clauses, the query execution time is reduced by approximately 4500 ms on average. For these two query types, the time needed to coordinate the MAS and to make the final result increases by approximately 2500 ms. This is why the times obtained for the same queries, when using 5 machines, are as follows: AVG query (a) = ms, AVG query (b) = ms, ALL query (a) = ms, ALL query (b) = ms.

Our model not only gives the best access time but also secures the data circulation on the network. We have written a function that the cloned Messenger agent executes each time it reaches a machine. This function allows the cloned Messenger agent to check whether the address of the reached machine belongs to its address list. If the address is not found, the cloned Messenger agent tries to leave this machine. If it cannot leave, it destroys the transported data and kills itself.

8 Conclusion

In this article, we have presented research works that deal with data distribution in the DWH context and with multi-agent systems. Then, we have described our proposed multi-agent model and its global dynamic for the data access operation. Finally, we have shown the improvements obtained by using the MAS and the Messenger agents in the data access operation. We can conclude that when the number of machines increases, the average gains given by our model increase. However, the appropriate number of machines is relative to the data size and the query complexity: if a small amount of data is distributed on a large number of machines, the travel time of the cloned Messenger agents becomes sizeable and access to a centralized DWH is more efficient. These results will be taken into account when performing the data splitting operation. For each query, we estimate the execution time if we distribute the data on two machines.
If this time is lower than the time obtained when the data are centralized, we split the data. As near-future work, we will test our model with benchmarks (TPC-H and APB-1) and compare the results with those reported in the literature. We will also implement the query redirection process. Another future direction is to study how to make our system robust enough to deal with the momentary unavailability of one or more machines.

References

1. Arcangeli, J., Hameurlain, A., Migeon, F., Morvan, F.: Apport des agents mobiles à l'évaluation et l'optimisation de requêtes bases de données réparties à grande échelle. Technical Report, laboratory IRIT, Université Paul Sabatier (2002)
2. Bellatreche, L., Boukhalfa, K.: An evolutionary approach to schema partitioning selection in a data warehouse. In: DAWAK 2005
3. Bellatreche, L., Schneider, M., Lorinquer, H., Mohania, M.: Bringing together partitioning, materialized views and indexes to optimize performance of relational data warehouses. In: Proceedings of the International Conference on Data Warehousing and Knowledge Discovery (DAWAK 2004), September 2004

4. Bennour, F.: Les structures de données distribuées et scalables sous windows: tendance hachage linéaire. Doctoral Thesis, U. Paris 9
5. Bernardino, J., Madeira, H.: Data warehousing and OLAP: improving query performance using distributed computing. In: 12th Conference on Advanced Information Systems Engineering. Stockholm, Sweden
6. Bernardino, J., Furtado, P.S., Madeira, H.C.: Approximate query answering using data warehouse striping. J. Intell. Inf. Syst. 19(2) (2002)
7. Börger, E., Stärk, R.: Abstract State Machines. Springer, Berlin, Heidelberg, New York (2003)
8. Almeida, R., Vieira, J., Vieira, M., Madeira, H., Bernardino, J.: Efficient data distribution for DWS. In: Proc. of the 10th International Conference on Data Warehousing and Knowledge Discovery (DaWaK 08), Turin, Italy, September. Lecture Notes in Computer Science. Springer, Berlin (2008)
9. Derrar, H., Boussaïd, O., Ahmed-Nacer, M.: Une approche de répartition des données d'un entrepôt basée sur l'optimisation par essaim particulaire. In: 4èmes journées francophones sur les Entrepôts de Données et l'Analyse en ligne (EDA 2008), Toulouse, Juin 2008; RNTI, vol. B-4, Cépaduès, Toulouse
10. Diene, Litwin, W.: Performance measurements of RP*: a scalable distributed data structure for range partitioning. In: Int. Conf. on Information Society in the 21st Century: Emerging Techn. and New Challenges. Aizu City, Japan
11. Informatica white paper. Enterprise-scalable data marts: a new strategy for building and deploying fast, scalable data warehousing systems. (1997)
12. Ferber, J.: Les Systemes Multi-Agents vers une Intelligence Collective. InterEditions, Paris (1995)
13. Ferber, J.: Multi-Agent System: An Introduction to Distributed Artificial Intelligence. Addison-Wesley, Longman, Harlow (1999)
14. Furtado, P.: Experimental evidence on partitioning in parallel data warehouses. In: DOLAP 04 Workshop of the Int'l Conference on Information and Knowledge Management (CIKM), Washington, November
15. Gupta, H.: Selection and maintenance of views in a data warehouse. Ph.D. thesis, Stanford University, September (1999)
16. Hachicha, H., Loukil, A., Ghédira, K.: MA-UML: une extension de A-UML aux agents mobiles. In: JFIADSMA 2002, Lille, France
17. Harrison, C.G., Chess, D.M., Kershenbaum, A.: Mobile agents: are they a good idea? Technical report, IBM Research Division (1995)
18. Hewlett-Packard white paper. HP Intelligent Warehouse. (1997)
19. Inmon, W.: Building the data warehouse. QED Technical Publishing Group (1992)
20. Kolsi, N., Abdellatif, A., Ghedira, K.: Agent based dynamic data storage and distribution in data warehouses. In: KES-AMSTA
21. Kolsi, N., Ghedira, K., Abdellatif, A.: Utilisation d'un système multi-agents pour la répartition et la scalabilité des données d'un data warehouse. In: Acts of the Fourth Scientific Days, Tome 1, Borj El Amri Aviation School, Tunis, Tunisia, May
22. Kotidis, Y., Roussopoulos, N.: Dynamat: a dynamic view management system for data warehouses. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, June
23. Litwin, W., Neimat, M.A., Schneider, D.: RP*: a family of order-preserving scalable distributed data structures. In: 20th Intl. Conf. on Very Large Data Bases VLDB
24. Litwin, W., Risch, T., Schwarz, Th.: An architecture for a scalable distributed DBS: application to SQL server 2000, Extended abstract. In: 2nd Intl. Workshop on Cooperative Internet Computing (CIC 2002), Hong Kong, August
25. Narasayya, S.V.R., Yang, B.: Integrating vertical and horizontal partitioning into automated physical database design. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, June
26. Ndiaye, Y., Diene, A., Litwin, W., Risch, W.: AMOS-SDDS: a scalable distributed data manager for windows multicomputers. In: ISCA 14th Intl. Conf. on Par. and Distr. Computing Systems, Texas, USA, August 8-10
27. Sahri, S., Litwin, W., Schwartz, T.: An overview of a scalable distributed database system SD-SQL server. In: Bell, D., Hong, J. (eds.) Flexible and Efficient Information Handling: 23rd British National Conference on Databases, BNCOD 2006, Belfast, Northern Ireland, UK, July 2006 Proceedings. Lecture Notes in Computer Science, vol. 4942. Springer, Berlin, Heidelberg, New York (2006)
28. Surajit, S.C., Narasayya, V.R.: Automated selection of materialized views and indexes in Microsoft SQL server. In: Proceedings of the International Conference on Very Large Databases, September
29. Wu, M., Buchmann, A.: Research issues in data warehousing. In: BTW 97, March
30. Zhao, J., Ma, H.: Quality-assured design of on-line analytical processing systems using abstract state machines. In: Ehrich, H.-D., Schewe, K.-D. (eds.) Proceedings of the Fourth International Conference on Quality Software (QSIC 2004), Braunschweig, Germany, IEEE Computer Society Press, Los Alamitos (2004)
31. Zhao, J., Schewe, K.-D.: Using abstract state machines for distributed data warehouse design. In: Hartmann, S., Roddick, J. (eds.) Conceptual Modelling 2004, First Asia-Pacific Conference on Conceptual Modelling, Dunedin, New Zealand, CRPIT, vol. 31. Australian Computer Society, Sydney (2004)

Agent Based Architecture in Distributed Data Warehousing

Agent Based Architecture in Distributed Data Warehousing International Journal of Scientific and Research Publications, Volume 2, Issue 5, May 2012 1 Agent Based Architecture in Distributed Data Warehousing Bindia, Jaspreet Kaur Sahiwal Department of Computer

More information

Data Warehouse Design Using Row and Column Data Distribution

Data Warehouse Design Using Row and Column Data Distribution Int'l Conf. Information and Knowledge Engineering IKE'15 55 Data Warehouse Design Using Row and Column Data Distribution Behrooz Seyed-Abbassi and Vivekanand Madesi School of Computing, University of North

More information

Crises Management in Multiagent Workflow Systems

Crises Management in Multiagent Workflow Systems Crises Management in Multiagent Workflow Systems Małgorzata Żabińska Department of Computer Science, AGH University of Science and Technology, al. Mickiewicza 30, 30-059 Kraków, Poland zabinska@agh.edu.pl

More information

Chapter 6. Foundations of Business Intelligence: Databases and Information Management VIDEO CASES

Chapter 6. Foundations of Business Intelligence: Databases and Information Management VIDEO CASES Chapter 6 Foundations of Business Intelligence: Databases and Information Management VIDEO CASES Case 1a: City of Dubuque Uses Cloud Computing and Sensors to Build a Smarter, Sustainable City Case 1b:

More information

Managing Data Resources

Managing Data Resources Chapter 7 Managing Data Resources 7.1 2006 by Prentice Hall OBJECTIVES Describe basic file organization concepts and the problems of managing data resources in a traditional file environment Describe how

More information

IJSRD - International Journal for Scientific Research & Development Vol. 4, Issue 06, 2016 ISSN (online):

IJSRD - International Journal for Scientific Research & Development Vol. 4, Issue 06, 2016 ISSN (online): IJSRD - International Journal for Scientific Research & Development Vol. 4, Issue 06, 2016 ISSN (online): 2321-0613 Tanzeela Khanam 1 Pravin S.Metkewar 2 1 Student 2 Associate Professor 1,2 SICSR, affiliated

More information

A MAS Based ETL Approach for Complex Data

A MAS Based ETL Approach for Complex Data A MAS Based ETL Approach for Complex Data O. Boussaid, F. Bentayeb, J. Darmont Abstract : In a data warehousing process, the phase of data integration is crucial. Many methods for data integration have

More information

Novel Materialized View Selection in a Multidimensional Database

Novel Materialized View Selection in a Multidimensional Database Graphic Era University From the SelectedWorks of vijay singh Winter February 10, 2009 Novel Materialized View Selection in a Multidimensional Database vijay singh Available at: https://works.bepress.com/vijaysingh/5/

More information

4th National Conference on Electrical, Electronics and Computer Engineering (NCEECE 2015)

4th National Conference on Electrical, Electronics and Computer Engineering (NCEECE 2015) 4th National Conference on Electrical, Electronics and Computer Engineering (NCEECE 2015) Benchmark Testing for Transwarp Inceptor A big data analysis system based on in-memory computing Mingang Chen1,2,a,

More information

Designing and Implementing an Object Relational Data Warehousing System

Designing and Implementing an Object Relational Data Warehousing System Designing and Implementing an Object Relational Data Warehousing System Abstract Bodgan Czejdo 1, Johann Eder 2, Tadeusz Morzy 3, Robert Wrembel 3 1 Department of Mathematics and Computer Science, Loyola

More information

Data Mining Concepts & Techniques

Data Mining Concepts & Techniques Data Mining Concepts & Techniques Lecture No. 01 Databases, Data warehouse Naeem Ahmed Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology Jamshoro

More information

Improving the Performance of OLAP Queries Using Families of Statistics Trees

Improving the Performance of OLAP Queries Using Families of Statistics Trees Improving the Performance of OLAP Queries Using Families of Statistics Trees Joachim Hammer Dept. of Computer and Information Science University of Florida Lixin Fu Dept. of Mathematical Sciences University

More information

Dynamic Optimization of Generalized SQL Queries with Horizontal Aggregations Using K-Means Clustering

Dynamic Optimization of Generalized SQL Queries with Horizontal Aggregations Using K-Means Clustering Dynamic Optimization of Generalized SQL Queries with Horizontal Aggregations Using K-Means Clustering Abstract Mrs. C. Poongodi 1, Ms. R. Kalaivani 2 1 PG Student, 2 Assistant Professor, Department of

More information

Advances in Databases and Information Systems 1997

Advances in Databases and Information Systems 1997 ELECTRONIC WORKSHOPS IN COMPUTING Series edited by Professor C.J. van Rijsbergen Rainer Manthey and Viacheslav Wolfengagen (Eds) Advances in Databases and Information Systems 1997 Proceedings of the First

More information

Design Patterns for Description-Driven Systems

Design Patterns for Description-Driven Systems Design Patterns for Description-Driven Systems N. Baker 3, A. Bazan 1, G. Chevenier 2, Z. Kovacs 3, T Le Flour 1, J-M Le Goff 4, R. McClatchey 3 & S Murray 1 1 LAPP, IN2P3, Annecy-le-Vieux, France 2 HEP

More information

QUERY RECOMMENDATION SYSTEM USING USERS QUERYING BEHAVIOR

QUERY RECOMMENDATION SYSTEM USING USERS QUERYING BEHAVIOR International Journal of Emerging Technology and Innovative Engineering QUERY RECOMMENDATION SYSTEM USING USERS QUERYING BEHAVIOR V.Megha Dept of Computer science and Engineering College Of Engineering

More information

A generic conceptual framework for selfmanaged

A generic conceptual framework for selfmanaged A generic conceptual framework for selfmanaged environments E. Lavinal, T. Desprats, and Y. Raynaud IRIT, UMR 5505 - Paul Sabatier University 8 route de Narbonne, F-3062 Toulouse cedex 9 {lavinal, desprats,

More information

UMCS. Annales UMCS Informatica AI 6 (2007) Fault tolerant control for RP* architecture of Scalable Distributed Data Structures

UMCS. Annales UMCS Informatica AI 6 (2007) Fault tolerant control for RP* architecture of Scalable Distributed Data Structures Annales Informatica AI 6 (2007) 5-13 Annales Informatica Lublin-Polonia Sectio AI http://www.annales.umcs.lublin.pl/ Fault tolerant control for RP* architecture of Scalable Distributed Data Structures

More information

Database Architectures

Database Architectures Database Architectures CPS352: Database Systems Simon Miner Gordon College Last Revised: 4/15/15 Agenda Check-in Parallelism and Distributed Databases Technology Research Project Introduction to NoSQL

More information

Performance Measurements of RP* : A Scalable Distributed Data Structure For Range Partitioning

Performance Measurements of RP* : A Scalable Distributed Data Structure For Range Partitioning Performance Measurements of RP* : A Scalable Distributed Data Structure For Range Partitioning Aly Wane Diène & Witold Litwin CERIA University Paris 9 Dauphine http://ceria ceria.dauphine..dauphine.fr

More information

e ara e om utin stems:

e ara e om utin stems: ~cs~l? MPCS'94 25,0. SJ~ First International Conference on - BSSI e ara e om utin stems: The Challenges of General-Purpose and Special-Purpose Computing May 2-6, 1994 Ischia, Italy ~IEEE Computer Society

More information

Evaluation of Parallel Programs by Measurement of Its Granularity

Evaluation of Parallel Programs by Measurement of Its Granularity Evaluation of Parallel Programs by Measurement of Its Granularity Jan Kwiatkowski Computer Science Department, Wroclaw University of Technology 50-370 Wroclaw, Wybrzeze Wyspianskiego 27, Poland kwiatkowski@ci-1.ci.pwr.wroc.pl

More information

An Overview of Cost-based Optimization of Queries with Aggregates

An Overview of Cost-based Optimization of Queries with Aggregates An Overview of Cost-based Optimization of Queries with Aggregates Surajit Chaudhuri Hewlett-Packard Laboratories 1501 Page Mill Road Palo Alto, CA 94304 chaudhuri@hpl.hp.com Kyuseok Shim IBM Almaden Research

More information

Mouse Pointer Tracking with Eyes

Mouse Pointer Tracking with Eyes Mouse Pointer Tracking with Eyes H. Mhamdi, N. Hamrouni, A. Temimi, and M. Bouhlel Abstract In this article, we expose our research work in Human-machine Interaction. The research consists in manipulating

More information

Mobile Agent-Based Load Monitoring System for the Safety Web Server Environment

Mobile Agent-Based Load Monitoring System for the Safety Web Server Environment Mobile -Based Load Monitoring System for the Safety Web Server Environment H.J. Park 1, K.J. Jyung 2, and S.S. Kim 3 1 School of Computer Information and Communication Engineering, Sangji University, Woosandong,

More information

Job Re-Packing for Enhancing the Performance of Gang Scheduling

Job Re-Packing for Enhancing the Performance of Gang Scheduling Job Re-Packing for Enhancing the Performance of Gang Scheduling B. B. Zhou 1, R. P. Brent 2, C. W. Johnson 3, and D. Walsh 3 1 Computer Sciences Laboratory, Australian National University, Canberra, ACT

More information

Chapter 18: Parallel Databases

Chapter 18: Parallel Databases Chapter 18: Parallel Databases Introduction Parallel machines are becoming quite common and affordable Prices of microprocessors, memory and disks have dropped sharply Recent desktop computers feature

More information

Chapter 5 INTRODUCTION TO MOBILE AGENT

Chapter 5 INTRODUCTION TO MOBILE AGENT Chapter 5 INTRODUCTION TO MOBILE AGENT 135 Chapter 5 Introductions to Mobile Agent 5.1 Mobile agents What is an agent? In fact a software program is containing an intelligence to help users and take action

More information

Management Information Systems Review Questions. Chapter 6 Foundations of Business Intelligence: Databases and Information Management

Management Information Systems Review Questions. Chapter 6 Foundations of Business Intelligence: Databases and Information Management Management Information Systems Review Questions Chapter 6 Foundations of Business Intelligence: Databases and Information Management 1) The traditional file environment does not typically have a problem

More information

1 Dulcian, Inc., 2001 All rights reserved. Oracle9i Data Warehouse Review. Agenda

1 Dulcian, Inc., 2001 All rights reserved. Oracle9i Data Warehouse Review. Agenda Agenda Oracle9i Warehouse Review Dulcian, Inc. Oracle9i Server OLAP Server Analytical SQL Mining ETL Infrastructure 9i Warehouse Builder Oracle 9i Server Overview E-Business Intelligence Platform 9i Server:

More information

AMOS-SDDS: A Scalable Distributed Data Manager for Windows Multicomputers

AMOS-SDDS: A Scalable Distributed Data Manager for Windows Multicomputers To be presented at the ISCA 14th Intl. Conf. on Par. and Distr. Computing Systems, Texas, USA, August 8-10, 2001 AMOS-SDDS: A Scalable Distributed Data Manager for Windows Multicomputers Yakham Ndiaye,

More information

DC Area Business Objects Crystal User Group (DCABOCUG) Data Warehouse Architectures for Business Intelligence Reporting.

DC Area Business Objects Crystal User Group (DCABOCUG) Data Warehouse Architectures for Business Intelligence Reporting. DC Area Business Objects Crystal User Group (DCABOCUG) Data Warehouse Architectures for Business Intelligence Reporting April 14, 2009 Whitemarsh Information Systems Corporation 2008 Althea Lane Bowie,

More information

Performance Problems of Forecasting Systems

Performance Problems of Forecasting Systems Performance Problems of Forecasting Systems Haitang Feng Supervised by: Nicolas Lumineau and Mohand-Saïd Hacid Université de Lyon, CNRS Université Lyon 1, LIRIS, UMR5205, F-69622, France {haitang.feng,

More information

An Overview of various methodologies used in Data set Preparation for Data mining Analysis

An Overview of various methodologies used in Data set Preparation for Data mining Analysis An Overview of various methodologies used in Data set Preparation for Data mining Analysis Arun P Kuttappan 1, P Saranya 2 1 M. E Student, Dept. of Computer Science and Engineering, Gnanamani College of

More information

Correlation Based Feature Selection with Irrelevant Feature Removal

Correlation Based Feature Selection with Irrelevant Feature Removal Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 4, April 2014,

More information

SAS Scalable Performance Data Server 4.3

SAS Scalable Performance Data Server 4.3 Scalability Solution for SAS Dynamic Cluster Tables A SAS White Paper Table of Contents Introduction...1 Cluster Tables... 1 Dynamic Cluster Table Loading Benefits... 2 Commands for Creating and Undoing

More information

MODELING THE PHYSICAL DESIGN OF DATA WAREHOUSES FROM A UML SPECIFICATION

MODELING THE PHYSICAL DESIGN OF DATA WAREHOUSES FROM A UML SPECIFICATION MODELING THE PHYSICAL DESIGN OF DATA WAREHOUSES FROM A UML SPECIFICATION Sergio Luján-Mora, Juan Trujillo Department of Software and Computing Systems University of Alicante Alicante, Spain email: {slujan,jtrujillo}@dlsi.ua.es

More information

DIRA : A FRAMEWORK OF DATA INTEGRATION USING DATA QUALITY

DIRA : A FRAMEWORK OF DATA INTEGRATION USING DATA QUALITY DIRA : A FRAMEWORK OF DATA INTEGRATION USING DATA QUALITY Reham I. Abdel Monem 1, Ali H. El-Bastawissy 2 and Mohamed M. Elwakil 3 1 Information Systems Department, Faculty of computers and information,

More information

StreamOLAP. Salman Ahmed SHAIKH. Cost-based Optimization of Stream OLAP. DBSJ Japanese Journal Vol. 14-J, Article No.

StreamOLAP. Salman Ahmed SHAIKH. Cost-based Optimization of Stream OLAP. DBSJ Japanese Journal Vol. 14-J, Article No. StreamOLAP Cost-based Optimization of Stream OLAP Salman Ahmed SHAIKH Kosuke NAKABASAMI Hiroyuki KITAGAWA Salman Ahmed SHAIKH Toshiyuki AMAGASA (SPE) OLAP OLAP SPE SPE OLAP OLAP OLAP Due to the increase

More information

Fig 1.2: Relationship between DW, ODS and OLTP Systems

Fig 1.2: Relationship between DW, ODS and OLTP Systems 1.4 DATA WAREHOUSES Data warehousing is a process for assembling and managing data from various sources for the purpose of gaining a single detailed view of an enterprise. Although there are several definitions

More information

Database Architectures

Database Architectures Database Architectures CPS352: Database Systems Simon Miner Gordon College Last Revised: 11/15/12 Agenda Check-in Centralized and Client-Server Models Parallelism Distributed Databases Homework 6 Check-in

More information

CS614 - Data Warehousing - Midterm Papers Solved MCQ(S) (1 TO 22 Lectures)

CS614 - Data Warehousing - Midterm Papers Solved MCQ(S) (1 TO 22 Lectures) CS614- Data Warehousing Solved MCQ(S) From Midterm Papers (1 TO 22 Lectures) BY Arslan Arshad Nov 21,2016 BS110401050 BS110401050@vu.edu.pk Arslan.arshad01@gmail.com AKMP01 CS614 - Data Warehousing - Midterm

More information

La Fragmentation Horizontale Revisitée: Prise en Compte de l Interaction de Requêtes

La Fragmentation Horizontale Revisitée: Prise en Compte de l Interaction de Requêtes National Engineering School of Mechanic & Aerotechnics 1, avenue Clément Ader - BP 40109-86961 Futuroscope cedex France La Fragmentation Horizontale Revisitée: Prise en Compte de l Interaction de Requêtes

More information

Fundamentals of. Database Systems. Shamkant B. Navathe. College of Computing Georgia Institute of Technology PEARSON.

Fundamentals of. Database Systems. Shamkant B. Navathe. College of Computing Georgia Institute of Technology PEARSON. Fundamentals of Database Systems 5th Edition Ramez Elmasri Department of Computer Science and Engineering The University of Texas at Arlington Shamkant B. Navathe College of Computing Georgia Institute

More information

Efficient integration of data mining techniques in DBMSs

Efficient integration of data mining techniques in DBMSs Efficient integration of data mining techniques in DBMSs Fadila Bentayeb Jérôme Darmont Cédric Udréa ERIC, University of Lyon 2 5 avenue Pierre Mendès-France 69676 Bron Cedex, FRANCE {bentayeb jdarmont

More information

Developing InfoSleuth Agents Using Rosette: An Actor Based Language

Developing InfoSleuth Agents Using Rosette: An Actor Based Language Developing InfoSleuth Agents Using Rosette: An Actor Based Language Darrell Woelk Microeclectronics and Computer Technology Corporation (MCC) 3500 Balcones Center Dr. Austin, Texas 78759 InfoSleuth Architecture

More information

Managing Data Resources

Managing Data Resources Chapter 7 OBJECTIVES Describe basic file organization concepts and the problems of managing data resources in a traditional file environment Managing Data Resources Describe how a database management system

More information

Evolution of Database Systems

Evolution of Database Systems Evolution of Database Systems Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Intelligent Decision Support Systems Master studies, second

More information

This tutorial will help computer science graduates to understand the basic-to-advanced concepts related to data warehousing.

This tutorial will help computer science graduates to understand the basic-to-advanced concepts related to data warehousing. About the Tutorial A data warehouse is constructed by integrating data from multiple heterogeneous sources. It supports analytical reporting, structured and/or ad hoc queries and decision making. This

More information

Using Tcl Mobile Agents for Monitoring Distributed Computations

Using Tcl Mobile Agents for Monitoring Distributed Computations Using Tcl Mobile Agents for Monitoring Distributed Computations Dilyana Staneva, Emil Atanasov Abstract: Agents, integrating code and data mobility, can be used as building blocks for structuring distributed

More information

Revisiting Join Site Selection in Distributed Database Systems

Revisiting Join Site Selection in Distributed Database Systems Revisiting Join Site Selection in Distributed Database Systems Haiwei Ye 1, Brigitte Kerhervé 2, and Gregor v. Bochmann 3 1 Département d IRO, Université de Montréal, CP 6128 succ Centre-Ville, Montréal

More information

Horizontal Aggregations for Mining Relational Databases

Horizontal Aggregations for Mining Relational Databases Horizontal Aggregations for Mining Relational Databases Dontu.Jagannadh, T.Gayathri, M.V.S.S Nagendranadh. Department of CSE Sasi Institute of Technology And Engineering,Tadepalligudem, Andhrapradesh,

More information

Research Article ISSN:

Research Article ISSN: Research Article [Srivastava,1(4): Jun., 2012] IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY An Optimized algorithm to select the appropriate Schema in Data Warehouses Rahul

More information

Computing Data Cubes Using Massively Parallel Processors

Computing Data Cubes Using Massively Parallel Processors Computing Data Cubes Using Massively Parallel Processors Hongjun Lu Xiaohui Huang Zhixian Li {luhj,huangxia,lizhixia}@iscs.nus.edu.sg Department of Information Systems and Computer Science National University

More information

Chapter 3. The Multidimensional Model: Basic Concepts. Introduction. The multidimensional model. The multidimensional model

Chapter 3. The Multidimensional Model: Basic Concepts. Introduction. The multidimensional model. The multidimensional model Chapter 3 The Multidimensional Model: Basic Concepts Introduction Multidimensional Model Multidimensional concepts Star Schema Representation Conceptual modeling using ER, UML Conceptual modeling using

More information

Adapting Mixed Workloads to Meet SLOs in Autonomic DBMSs

Adapting Mixed Workloads to Meet SLOs in Autonomic DBMSs Adapting Mixed Workloads to Meet SLOs in Autonomic DBMSs Baoning Niu, Patrick Martin, Wendy Powley School of Computing, Queen s University Kingston, Ontario, Canada, K7L 3N6 {niu martin wendy}@cs.queensu.ca

More information

Least-Connection Algorithm based on variable weight for multimedia transmission

Least-Connection Algorithm based on variable weight for multimedia transmission Least-onnection Algorithm based on variable weight for multimedia transmission YU SHENGSHENG, YANG LIHUI, LU SONG, ZHOU JINGLI ollege of omputer Science Huazhong University of Science & Technology, 1037

More information

DATA MINING TRANSACTION

DATA MINING TRANSACTION DATA MINING Data Mining is the process of extracting patterns from data. Data mining is seen as an increasingly important tool by modern business to transform data into an informational advantage. It is

More information

data dependence Data dependence Structure dependence

data dependence Data dependence Structure dependence data dependence Structure dependence If the file-system programs are affected by change in the file structure, they exhibit structuraldependence. For example, when we add dateof-birth field to the CUSTOMER

More information

Database system development lifecycles

Database system development lifecycles Database system development lifecycles 2009 Yunmook Nah Department of Electronics and Computer Engineering School of Computer Science & Engineering Dankook University 이석호 ä ± Á Ç ºÐ ¼ ¼³ è ± Çö î µ ½Ã

More information

Managing Changes to Schema of Data Sources in a Data Warehouse

Managing Changes to Schema of Data Sources in a Data Warehouse Association for Information Systems AIS Electronic Library (AISeL) AMCIS 2001 Proceedings Americas Conference on Information Systems (AMCIS) December 2001 Managing Changes to Schema of Data Sources in

More information
