Modeling Object Flows from Distributed and Federated RFID Data Streams for Efficient Tracking and Tracing (Supplemental)

Size: px
Start display at page:

Download "Modeling Object Flows from Distributed and Federated RFID Data Streams for Efficient Tracking and Tracing (Supplemental)"

Transcription

1 1 Modeling Object Flows from Distributed and Federated RFID Data Streams for Efficient Tracking and Tracing (Supplemental) Yanbo Wu, Quan Z. Sheng, Member, IEEE, Hong Shen, and Sherali Zeadally F 1 WHY CENTRALIZED MODELS DO NOT WORK If data from different nodes are stored in a centralized database using the schema above, the definitions themselves can answer the queries. Proper indices can improve the performance. However, there are still some significant performance issues in large scale systems such as IoT where the number of nodes and objects could be very large. We discuss below why a centralized solution does not scale by considering the characteristics of RFID data management. Frequent Updates. Data in IoT are generated as streams and update becomes a frequent operation comparing to traditional database management systems that are optimized for frequent-read scenarios. With a centralized database (or database clusters), frequent updates to the database not only increase the storage cost, but more importantly, cause high costs in index maintenance. For example, for streaming data, a B + tree index could be potentially huge. The space cost for the index itself is O(Num objects Length path ) where Num objects is the total number of objects in the whole network, Length path is the average number of nodes of paths that the object moves along. Row Level Security Requirement. Business applications often require high security. Even in a federated system, the partners want to control their own data. For example, a supermarket wants to hide the buying information about the same product from one supplier to another, while the suppliers can access the data related to their own products. This requires row level authentication and the authentication overhead for space is high. Archiving. RFID data, similar to other time series data, is sensitive to time. Recent records are more interesting to the analyzers than the distant ones. For the purpose of storage efficiency, it is often necessary Y. Wu is with Beijing Jiaotong University. ybwu@bjtu.edu.cn Q.Z. Sheng and H. Shen are with the University of Adelaide. S. Zeadally is with the University of the District of Columbia. EECi0 EECi1 EECi2 EECi3 EECi4 Fig. 1: Example of LTTF Slot to archive the old data in order to make room for the new data. However, different participants may have different definitions of oldness (i.e., when the data should be archived). As a result, when old records are archived, a range query on End with a selection on Node has to be performed. Furthermore, because the records for different nodes are likely to be stored in different pages, after deletion, the pages will have a lot of fragments. The rearrangement of records in these pages could be very costly and time consuming. Mining Efficiency. Stream mining techniques such as online aggregation, critical layers or popular path materialization all require a significant amount of memory in a central server. 2 AN EXAMPLE OF EEC SLOT Figure 1 shows an example structure of the slot s i in LTTF. Slot s i represents the data of the time interval [(2 i 1) T 0, (2 i+1 1) T 0 ). It is split into w s EECs. In this example, w s is 5. For each EEC, we maintain a histogram which models the distribution of data from/to a neighbor. The current size of a slot is defined as the number of used EECs in it, so s i.size apple w s. The reason why we split the slots further into EECs and maintain histograms for each of them, instead of maintaining one histogram for each slot, is because we want a finer control over the memory usage. By splitting the slots, we can limit the memory usage for each slot by changing

2 2 Algorithm : Merge the Model: merge Input: The slot which is being merged s i The model M Output: The merged model 1: if s i+1 does not exist in M 2: s i+1 new slot, M.append(s i+1 ) 3: end if 4: if s i+1 is full, merge(s i+1, M) 5: for each neighbor b j in s i 6: h i+1,j s i+1 [b j ] 7: if h i+1,j is nil 8: s i+1 [b j ] h i+1,j new array size of w s 9: end if 10: move h i+1,j [1] h i+1,j [w s /2] to h i+1,j [w s /2+1] h i+1,j [w s ] 11: for k 1 to w s /2 12: h i+1,j [k] s i [b j ][2 k]+s i [b j ][2 k +1] 13: end for 14: end for Fig. 2: Algorithm to Merge the LTTF Model nodes in the network (but objects are still moving from/to that node). In this case, the information of connections relevant to that node becomes unavailable. Secondly, a node n i and one of its source nodes n j may both be the direct source node of n k. We call this situation the Sideway Problem. Due to this, the algorithms in Section 5 of main file may fail to get the actual source node. As shown in Figure 3a, suppose object o moved from n j to n i, then from n i to n k. Both n i and n j are in the n k s business neighbor set, and for the past few event cycles, n j has more objects moving to n k. Since n j is the first to be queried, it will be treated as the source node of object o from which it moves to n k. Clearly, this is not correct because the source node should be n i. n k the value of w s. The larger w s is, the more memory the model requires, and the more accurate the model is. In an extreme case, when w s is 1, the slots are not split and the accuracy is the lowest. The height of the bars in the histogram of each EEC is the volume of objects flow from/to a neighbor for the time represented by that EEC. The x-axis of the histogram represents the neighbors. The length of the model is defined as the number of slots within it. A model of length l can store the history of w s w e P l 1 i=0 2i. We can calculate the length of the model which stores the history for the past t time units using: t l = log 2 ( + 1) = log w s w 2 ( t + 1) (1) e T 0 Suppose we define the width of an event cycle as an hour and for each slot, there are 24 EECs with in it, i.e., w e is an hour and w s is 24, we only need dlog 2 365e =9slots to store the history of one year. This efficiency in space enables the storage of these slots in main memory. 3 MERGING WITH THE NEXT SLOT When the new event cycle fills the most recent time slot s 0, the second most recent slot s 1 will be merged to the succeeding slots recursively. The merge algorithm in Figure 2 first checks whether the next slot in the model is full. If so, it will be merged (line 4). This process is done recursively until either a non-full slot is found or all the existing slots are full. In the latter case, a new slot is created and appended to the model. Then the slot being merged will be integrated into the first non-full slot (line 5 14). It is compressed by merging two consecutive EECs into one (line 12). Note that in this algorithm, we assume that the width of slots w s is even. This assumption does not affect the performance of the algorithm. 4 THE BUSINESS NEIGHBOR TREE There are two critical issues in real-life applications. Firstly, a node may go offline with or without notification to other n j n i n k n a n b (a) Sideway Problem n j n i n j (b) Source Tree Fig. 3: Sideway Problem and the Source Node Tree A source tree at a node n k is a subgraph of the whole network topology, with n k as the root, representing the topology of nodes that have either direct or indirect connections to n k. Figure 3b depicts the source tree maintained at n k in the sub-network shown in Figure 3a. The root of the tree is n k. Its child nodes are the direct source nodes of n k (i.e., n i, n j and n b in Figure 3b). The maintenance of the source tree is done simultaneously with the flow synopsis building. In the algorithm to gather the source node information, neighbors also include its source tree in the result set R. This source tree is then merged with the local source tree. In this way, the source tree is built recursively. This structure adds more connectivity to the network, which increases the stability of the system in the case that some nodes may leave. However, it does not require replication of any form. It also helps to solve the Sideway Problem. Essentially, the source tree gathers the nodes that are often on the same paths in the object flows. It does not store item-level information and statistics from other nodes. It is impossible for a node to get these information without permission from neighbors. As a result, although we stored a subgraph locally, we do not expose confidential information. Solving Node Leaving Problem. With the source node tree, when a neighbor n i leaves the network, we can query its direct source nodes S(n i ) to check whether an object comes from n i, because if an object comes from a node s n b n a

3 3 direct source node n i, it must also come from one of n i s direct source nodes. In the source node tree, the LTTFs for all the nodes in S(n i ) are maintained, together with the LTTF for n i itself. If an object comes from a node n a in S(n i ), it is used in the maintenance of the LTTFs for both n i and n a. In this way, even when n i is offline, its LTTF is kept accurate. We do not remove n i from the tree when it leaves. Instead, we mark it as disconnected. After it is back online, we remove the disconnected mark. This is useful to deal with temporary departure of nodes (e.g., short server breakdown). The LTTFs for nodes in S(n i ) are deleted because we no longer need them to build the flow synopsis. In this way, even when some nodes in the network leave, we can still respond to tracing queries and continue to build flow synopsis. If after a certain configurable time, n i is still offline, it will be removed from the tree and its LTTF will be deleted or archived. Solving Sideway Problem. To solve the Sideway problem, for the algorithms to answer tracing queries and building the flow synopsis, if a node has both direct and indirect connection to the root node, it will be queried regardless of the order of probabilities. For example, in Figure 3b, not only n j is the direct source node of the root node n k, but there is also an indirect connection (n j! n i! n k ) between n j and n k. n j is queried regardless of its order in the sorted neighbor list. After receiving the result set, we compare the arrival timestamps of the objects which are included in the result sets from more than one neighbors and select the node with the latest timestamp as the source node. For example, if the arrival timestamp for an object is earlier in n j than that in n i, it is clear that the object was sent through the path n j! n i! n k. 5 DETERMINING w s AND w e Generally, our model is a synopsis for the RFID stream. A generic description of such a data structure is that it supports two operations, update and computeanswer [1]. In our model, the update operation takes more time than computeanswer, because it involves network queries (See Section 6). More importantly, it may be so slow that the histograms for the past event cycle have not yet been constructed when it starts constructing the histograms for the current event cycle. In this case, the most recent histograms in slot s 0 is actually an old one. If this happens, the most valuable data is not used because it is not ready yet. Figure 4 illustrates such a situation. At the end of the second event cycle (EC2), the construction of histograms for EC1 has not been done (which will be done at w e + t u, where t u is the time used for update). To avoid this problem, w e should be large enough, i.e., w e should be larger than the maximum possible value of t u. But it should not be too large. If it is too large, the pattern of object flow may change during an event cycle. In this case, the change is not captured. t u is determined by the EC1 0 we 2we we + tu Fig. 4: When the Update Time Is Too Long size of the sample, the number of the business neighbors and the size of the whole network as discussed in Section 6. It is a dynamic value. A good way to determine w e would be to make it a function of the three parameters. However maintaining the size of the whole network is not an easy task, and is costly. In this work, we propose an adaptive approach. When a node joins the network, it sets w e to a default value. During the maintenance process of the model, if t 1 u becomes larger than w e or smaller than w e 2d, w e d t u (d > 1) (d is a constant between 1 and 2). In real applications, after the network becomes stable, t u will likely become stable too, so w e is almost a constant. Our algorithms, which are based on the assumption that w e is a constant, are not affected. To deal with the unstable phase of model construction, each event cycle in the model is parameterized with a timestamp t end for the end of the cycle. The maximum number of EECs in a slot (i.e., w s ) has the influence on the memory usage of the LTTF model. Generally, the memory usage of the model is O(l w s n) (l is the length of model and n is the number of neighbors). w s can be large, as long as the memory is sufficient. Larger w s ensures the model with a finer granularity. 6 PERFORMANCE ANALYSIS 6.1 Model Maintenance Cost The time complexity for maintaining the TISH model consists of three parts: i) sampling the input, ii) querying the source nodes, and iii) updating the model. Suppose t u is the time used to update the model for an event cycle, we have: t u = t sample + t query + t update (2) Using the reservoir algorithm, we only need to scan the input once, and this can be done on-the-fly. So the cost for this part is O(x), where x is the total number of objects in the current event cycle. For the worst case, the model update algorithms (Section 4 of the main file) will go through the whole model, with a complexity of O(l), where l is the number of slots in the model. It should be noted that we can avoid the movement of histograms shown in Figure 2 (line 10). l is often small. The most costly part is querying the source nodes, as the network latency is typically much longer than the 1. t u is an observable variable. EC2 EC3

4 4 B N 1 N 2 N 3 N 4 N B 0 1 ( =0) N 1 N 3 N 2 N 4 N B 0 2 ( =2) N 1 N 2 N 4 N 5 N B 0 3 ( =1) N 1 N 2 N 4 N 3 N Fig. 5: Example in Modeling Accuracy local processing time. Obviously, when querying the source nodes, the queries can be sent out simultaneously. However, the network traffic cost will be O(n) (n is the number of neighbors). This is not economical because building the flow synopsis is not necessary to be done in real time. Indeed, the TISH model is only updated when an event cycle ends, and the maintenance time t u is controlled to be less than an event cycle as discussed in Section 5. With the algorithm shown in Section 5 of the main file, we can save the network traffic cost by querying the nodes which have most possibility to be the source node first. Moreover, we can also avoid the underlying P2P queries when there are no new business neighbors joining. Even if there is one, after the first event cycle, the new neighbor will be in the LTTF model with high probability. As a result, the network cost is reduced. The accuracy of the model is very important for reducing network traffic cost to maintain the model. We represent the real distribution of objects source nodes during a specific time period as B, which is the neighbor list sorted in descending order by the real distributions (e.g., B in Figure 5). For example, in Figure 5, there are five possible source nodes (N 1 to N 5 ) with the possibilities of 50% for N 1, 30% for N 2, and 20% for N 3. The distributions modeled by the TISH model in the same period of time is denoted as B 0. Figure 5 depicted different scenarios for the accuracy of the TISH model. Suppose that, the m objects (the value of m does not matter in this discussion), which we want to query their source nodes, come from y different nodes. In this case, y apple m. In Figure 5, y is 3, because in the real distribution B, the m objects can only come from N 1, N 2 and N 3. In the best case scenario, y queries are needed. Because we have to query the first y nodes in order to find source nodes for all objects. B1 0 is one of the best scenarios. Although the order of N 2 and N 3 are not the same as B, 3 queries are still enough to find the source nodes. In general, the best scenario happens as long as the first y nodes in the model B 0 are the same as the first y nodes in the real distribution B, regardless of the order. In the worst case scenario, all the nodes in the neighbor list are queried. B2 0 is one of these cases. The neighbor N 3 which indeed is a source node, is ranked the last in B2, 0 so we have to query all the 5 neighbors to find the source nodes for all objects. In general, the worst case happens when there exists one node in the first y nodes of B being ranked the last in B 0. Suppose is the number of extra queries made in addition to the optimized value y, we can conclude that: = max y i=1 (B0.rank(B[i])) y (3) For example, in Figure 5, is 0 for B1 0 because all first 3 nodes in B are still ranked first 3 in B1, 0 thus max 3 i=1 (B0 1.rank(B[i])) is 3. For B2, 0 N 3 in first 3 of B is ranked 5 in B2, 0 thus max 3 i=1 (B0 2.rank(B[i])) is 5, so is 2 (i.e., 5-3). Similarly for B3, 0 N 3 is ranked the 4 th in B3, 0 so is 1. The goal of our model is to keep the average value of as small as possible. We will demonstrate through extensive simulation experiments (described in Section 7) that our proposed TISH model is very accurate in this sense. 6.2 Tracing and Tracking Cost The cost of tracking and tracing consists of two parts. If the initiating node of the query is not on the movement path of the object, the query should first be redirected to a certain node on the path. The cost of this step is the cost of the underlying P2P overlay lookup cost. However, in real applications, it is rare that the query is initiated by a third party. So in most cases, this part of the cost does not exist. The second part is the query cost. The total cost of a trace query (t trace ) is a function of the length of the object s moving path. In general, t trace = P l i=1 t i, where t i is the time used to establish the i th segment in the path and l is the length of the path. We assume that t i is proportional to the number of queries (q) made, so t i = q T where T is a constant representing normal remote query processing time. Clearly, the cost is determined by q. It is easy to infer that if the object comes from the j th neighbor in B, then j queries are needed to find it. Thus, the optimized average value of q is (in this section, we reuse the definition of notations in Section 6.1.): q = yx (j B[j].probability) (4) j=1 i.e., the order of the querying follows the actual order of the neighbors sorted by the probability that the object comes from them. The number of queries q 0 by using the TISH model is: yx q 0 = (B 0.rank(B[j]) B[j].probability) (5) j=1 Figure 6 shows an example of the optimized cost and the cost with the TISH model by using the examples in Figure 5. If the actual order (B) is followed when querying the source node, we only need, on average, =1.7 queries. But in a different order, for example, in B 0 1, the order of N 2 and N 3 is reverted. In this case, we have to use 0.5 B 0 1.rank(N 1 )+0.3 B 0 1.rank(N 2 )+0.2 B 0 1.rank(N 3 )= =1.8 queries. The goal of tracing query processing is to make q 0 as close to q as possible. We will show in Section 7 that the

5 5 Number of Queries B[j] probability B B1 0 B2 0 B3 0 N N N q or q Fig. 6: Examples of Tracing Efficiency Fig. 8: Number of Network Calls against c Fig. 7: Distribution of Number of Network Calls for Model Maintenance TISH model can make the difference less than 0.1, i.e., in most cases, B 0 contains the same neighbors with the same order as B. 7 SUPPLEMENTAL EXPERIMENTS 7.1 Model Maintenance Cost An interesting performance evaluation is to investigate how many network calls are used per query for the tracking and tracing algorithm, and the type of distribution we obtain. We illustrate the result in Figure 7, by using a histogram. We note that the majority of the queries use less than 10 (the average number of fan-out) network calls in order to maintain the TISH model. According to the analysis in Section 6.1, we expect that the average number of network calls for model maintenance is close to the effective fanouts, which is the average number of active (non-idle) connections for all the nodes (5 in our experiments). The average number of network calls in this experiment is 5.11, which is close to what we expect. 7.2 Effects of Constant c We are interested in how the constant c affects the performance of the model. In this experiment, we examined the performance of the model maintenance with various values of c from 1 to 5. It is clear that when c is big, the accuracy becomes worse. This is because by using larger c, more old data are included and they biases the model. So we limit the test to a reasonable range. We ran the tests with both Constant Patterns -only and Mixed Without Random settings as defined in Section 6 of the main file. Figure 8 shows the result for this experiment. With Mixed Without Random setting, the performance is the best when c is 3. This is because that when c is smaller than 3, only two of the most recent event cycles are used. The model becomes too sensitive to the changes of the patterns. The little bias in the sampling will be expanded in the model. When c is greater than 3, more historical data are used to calculate the probabilities. This makes the model not sensitive to the changes of the patterns. However, with Constant Pattern, when c is small (less than 4 in our result), it does not affect the performance. This is because the network topology does not change. The only dynamic change is the idle/active state switch of connections. If c is too large, the TISH model becomes insensitive to these dynamic changes and the performance becomes worse. 8 ADDITIONAL LITERATURE REVIEW 8.1 Data Structures and Data Transformation A raw RFID record is a triple tuple (ID, time, location) [2], which presents little value until it is transformed into a form suitable for applicationlevel interactions. In addition, such data has implicit meanings and associated relationships with other RFID records where applications have to make appropriate inferences [15]. In one of the earliest efforts on RFID data modeling, Wang et al. propose an RFID data model that abstracts static and dynamic entities including object, reader, location and transaction [16], [17]. It models the interaction between them as either state or event based relationships. The data model also provides a rule-based data filter engine. In [12], RFID applications are classified into a set of typical scenarios and a generalized data modeling framework with constructs for each typical scenario is proposed. In [6], Hu et al. focus on the path encoding by using a Bitmap data type and they demonstrated that significant storage savings can be achieved. In [7], a novel data structure and algorithms to efficiently encode, decode and query an object s moving path in an RFID database are proposed. The approach has been further improved very recently in [8] to cope with the situation where the moving path is long. Finally, Lin et al. propose in [11] a Multi-Table model in which path and containment relationships can be defined.

6 6 8.2 Knowledge Discovery For an RFID data warehousing approach proposed by Gonzalez et al. in [4], the authors observe that individual objects tend to move and stay together (i.e., bulky object movements in supply chain). They propose a novel model to compress RFID data without information loss. The model consists of a hierarchy of highly compact summaries of data aggregated at different abstraction levels where analysis takes place. These summaries are represented as RFIDcuboids. Each RFID-cuboid records object movements and stores product information for each RFID object, information on objects that stay together at a location, and path information necessary to link multiple stay records. This work is further improved in [3] by discovering the Gateway nodes that have either high fan-in or high fan-out edges. The RFID cuboids are created based on this discovery to save more space. This method is restricted to process static data in centralized environments. It is not suitable for processing RFID data streams. In [9], Lee and Park propose a dynamic tracing model for supply chain management, which considers not only the movements, but also the combination and splitting of RFID-attached objects. In [13], an algorithm to extract frequent sequential patterns from RFID data streams is proposed. The authors also introduce a decision tree model to determine the prime movements of tagged objects by using the extracted patterns. TMS-RFID [10] is a system to manage temporal RFID data, which extracts complex temporal event patterns over RFID streams. In [5], the authors propose a probability evaluation model and algorithms for moving range query processing. These models focus on different aspects of modeling RFID data, but they all have the same limitations as RFID-cuboid. SPIRE [14] is a novel RFID data interpretation and compression system. It extracts location and containment relationships over RFID streams by applying a probabilistic algorithm over a time-varying graph model. This model can be deployed in either centralized or distributed environments. However, it still requires full access to all the data. The BRIDGE project 2 is an implementation of the EPCglobal framework. It includes a probabilistic model to predict future location of RFID-attached objects using Markov Chain. However, this model is designed specifically for supply chain management systems. It requires synchronization with static supply chain model, which makes it inflexible. [3] H. Gonzalez, J. Han, H. Cheng, X. Li, D. Klabjan, and T. Wu. Modeling Massive RFID Data Sets: A Gateway-Based Movement Graph Approach. TKDE, 22:90 104, [4] H. Gonzalez, J. Han, X. Li, and D. Klabjan. Warehousing and Analyzing Massive RFID Data Sets. In Proceedings of the 22nd International Conference on Data Engineering (ICDE 06), Atlanta, Georgia, USA, April [5] Y. Gu, G. Yu, N. Guo, and Y. Chen. Probabilistic Moving Range Query over RFID Spatio-temporal Data Streams. In Proceeding of the 18th ACM Conference on Information and Knowledge Management (CIKM 09), Hong Kong, China, [6] Y. Hu, S. Sundara, T. Chorma, and J. Srinivasan. Supporting RFID- Based Item Tracking Applications in Oracle DBMS Using a Bitmap Datatype. In Proceedings of the 31st International Conference on Very Large Data Bases (VLDB 05), Trondheim, Norway, September [7] C.-H. Lee and C.-W. Chung. Efficient Storage Scheme and Query Processing for Supply Chain Management Using RFID. In Proceedings of the 2008 ACM International Conference on Management of Data (SIGMOD 08), Vancouver, Canada, [8] C.-H. Lee and C.-W. Chung. RFID Data Processing in Supply Chain Management Using a Path Encoding Scheme. IEEE Transactions on Knowledge and Data Engineering, 23(5): , [9] D. Lee and J. Park. RFID-based Traceability in the Supply Chain. Industrial Management and Data Systems, 108(6): , [10] X. Li, J. Liu, Q. Z. Sheng, S. Zeadally, and W. Zhong. TMS- RFID: Temporal Management of Large-scale RFID Applications. Information Systems Frontiers, 13(4): , [11] D. Lin, H. G. Elmongui, E. Bertino, and B. C. Ooi. Data Management in RFID Applications. In Proceedings of the 18th International Conference on Database and Expert Systems Applications (DEXA 07), Regensburg, Germany, [12] S. Liu, F. Wang, and P. Liu. Integrated RFID Data Modeling: An Approach for Querying Physical Objects in Pervasive Computing. In Proceedings of the 15th ACM International Conference on Information and Knowledge Management, Arlington, Virginia, USA, [13] T. Nakahara, T. Uno, and K. Yada. Extracting Promising Sequential Patterns from RFID Data Using the LCM Sequence. In Proceedings of the 14th International Conference on Knowledge-Based and Intelligent Information and Engineering Systems (KES 10), Cardiff, UK, [14] Y. Nie, R. Cocci, Z. Cao, Y. Diao, and P. Shenoy. SPIRE: Efficient Data Interpretation and Compression over RFID Streams. IEEE Transactions on Knowledge and Data Engineering, 24(1): , [15] Q. Z. Sheng, X. Li, and S. Zeadally. Enabling Next-Generation RFID Applications: Solutions and Challenges. IEEE Computer, 41(9):21 28, September [16] F. Wang and P. Liu. Temporal Management of RFID Data. In Proceedings of the 31st International Conference on Very Large Data Bases (VLDB 05), Trondheim, Norway, September [17] F. Wang, S. Liu, and P. Liu. A Temporal RFID Data Model for Querying Physical Objects. Pervasive and Mobile Computing, 6(3): , REFERENCES [1] B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom. Models and Issues in Data Stream Systems. In Proceedings of the 21th ACM Symposium on Principles of Database Systems (PODS 02), Madison, Wisconsin, [2] S. S. Chawathe, V. Krishnamurthy, S. Ramachandran, and S. Sarma. Managing RFID Data. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB 04), Toronto, Canada, September

A Highly Accurate Method for Managing Missing Reads in RFID Enabled Asset Tracking

A Highly Accurate Method for Managing Missing Reads in RFID Enabled Asset Tracking A Highly Accurate Method for Managing Missing Reads in RFID Enabled Asset Tracking Rengamathi Sankarkumar (B), Damith Ranasinghe, and Thuraiappah Sathyan Auto-ID Lab, The School of Computer Science, University

More information

M. P. Ravikanth et al, / (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 3 (3), 2012,

M. P. Ravikanth et al, / (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 3 (3), 2012, An Adaptive Representation of RFID Data Sets Based on Movement Graph Model M. P. Ravikanth, A. K. Rout CSE Department, GMR Institute of Technology, JNTU Kakinada, Rajam Abstract Radio Frequency Identification

More information

Architecture and Implementation of a Content-based Data Dissemination System

Architecture and Implementation of a Content-based Data Dissemination System Architecture and Implementation of a Content-based Data Dissemination System Austin Park Brown University austinp@cs.brown.edu ABSTRACT SemCast is a content-based dissemination model for large-scale data

More information

Modeling Massive RFID Datasets: A Gateway-Based Movement-Graph Approach

Modeling Massive RFID Datasets: A Gateway-Based Movement-Graph Approach 1 Modeling Massive RFID Datasets: A Gateway-Based Movement-Graph Approach Hector Gonzalez, Jiawei Han, Hong Cheng, Xiaolei Li, Diego Klabjan Department of Computer Science University of Illinois at Urbana-Champaign

More information

RFID Intelligence. Tiago Peralta.

RFID Intelligence. Tiago Peralta. RFID Intelligence Tiago Peralta 50267 tiago.peralta@tagus.ist.utl.pt Abstract. The constant search for a greater visibility of product movement in the organizations supply chains as turned Radio Frequency

More information

VisoLink: A User-Centric Social Relationship Mining

VisoLink: A User-Centric Social Relationship Mining VisoLink: A User-Centric Social Relationship Mining Lisa Fan and Botang Li Department of Computer Science, University of Regina Regina, Saskatchewan S4S 0A2 Canada {fan, li269}@cs.uregina.ca Abstract.

More information

Appropriate Item Partition for Improving the Mining Performance

Appropriate Item Partition for Improving the Mining Performance Appropriate Item Partition for Improving the Mining Performance Tzung-Pei Hong 1,2, Jheng-Nan Huang 1, Kawuu W. Lin 3 and Wen-Yang Lin 1 1 Department of Computer Science and Information Engineering National

More information

An Improved Frequent Pattern-growth Algorithm Based on Decomposition of the Transaction Database

An Improved Frequent Pattern-growth Algorithm Based on Decomposition of the Transaction Database Algorithm Based on Decomposition of the Transaction Database 1 School of Management Science and Engineering, Shandong Normal University,Jinan, 250014,China E-mail:459132653@qq.com Fei Wei 2 School of Management

More information

PathStack : A Holistic Path Join Algorithm for Path Query with Not-predicates on XML Data

PathStack : A Holistic Path Join Algorithm for Path Query with Not-predicates on XML Data PathStack : A Holistic Path Join Algorithm for Path Query with Not-predicates on XML Data Enhua Jiao, Tok Wang Ling, Chee-Yong Chan School of Computing, National University of Singapore {jiaoenhu,lingtw,chancy}@comp.nus.edu.sg

More information

Parallelizing Structural Joins to Process Queries over Big XML Data Using MapReduce

Parallelizing Structural Joins to Process Queries over Big XML Data Using MapReduce Parallelizing Structural Joins to Process Queries over Big XML Data Using MapReduce Huayu Wu Institute for Infocomm Research, A*STAR, Singapore huwu@i2r.a-star.edu.sg Abstract. Processing XML queries over

More information

Early Measurements of a Cluster-based Architecture for P2P Systems

Early Measurements of a Cluster-based Architecture for P2P Systems Early Measurements of a Cluster-based Architecture for P2P Systems Balachander Krishnamurthy, Jia Wang, Yinglian Xie I. INTRODUCTION Peer-to-peer applications such as Napster [4], Freenet [1], and Gnutella

More information

Detect tracking behavior among trajectory data

Detect tracking behavior among trajectory data Detect tracking behavior among trajectory data Jianqiu Xu, Jiangang Zhou Nanjing University of Aeronautics and Astronautics, China, jianqiu@nuaa.edu.cn, jiangangzhou@nuaa.edu.cn Abstract. Due to the continuing

More information

Code Transformation of DF-Expression between Bintree and Quadtree

Code Transformation of DF-Expression between Bintree and Quadtree Code Transformation of DF-Expression between Bintree and Quadtree Chin-Chen Chang*, Chien-Fa Li*, and Yu-Chen Hu** *Department of Computer Science and Information Engineering, National Chung Cheng University

More information

Deep Web Content Mining

Deep Web Content Mining Deep Web Content Mining Shohreh Ajoudanian, and Mohammad Davarpanah Jazi Abstract The rapid expansion of the web is causing the constant growth of information, leading to several problems such as increased

More information

Improving Range Query Performance on Historic Web Page Data

Improving Range Query Performance on Historic Web Page Data Improving Range Query Performance on Historic Web Page Data Geng LI Lab of Computer Networks and Distributed Systems, Peking University Beijing, China ligeng@net.pku.edu.cn Bo Peng Lab of Computer Networks

More information

Video Inter-frame Forgery Identification Based on Optical Flow Consistency

Video Inter-frame Forgery Identification Based on Optical Flow Consistency Sensors & Transducers 24 by IFSA Publishing, S. L. http://www.sensorsportal.com Video Inter-frame Forgery Identification Based on Optical Flow Consistency Qi Wang, Zhaohong Li, Zhenzhen Zhang, Qinglong

More information

Full-Text and Structural XML Indexing on B + -Tree

Full-Text and Structural XML Indexing on B + -Tree Full-Text and Structural XML Indexing on B + -Tree Toshiyuki Shimizu 1 and Masatoshi Yoshikawa 2 1 Graduate School of Information Science, Nagoya University shimizu@dl.itc.nagoya-u.ac.jp 2 Information

More information

Space-Efficient Support for Temporal Text Indexing in a Document Archive Context

Space-Efficient Support for Temporal Text Indexing in a Document Archive Context Space-Efficient Support for Temporal Text Indexing in a Document Archive Context Kjetil Nørvåg Department of Computer and Information Science Norwegian University of Science and Technology 7491 Trondheim,

More information

DISCOVERING ACTIVE AND PROFITABLE PATTERNS WITH RFM (RECENCY, FREQUENCY AND MONETARY) SEQUENTIAL PATTERN MINING A CONSTRAINT BASED APPROACH

DISCOVERING ACTIVE AND PROFITABLE PATTERNS WITH RFM (RECENCY, FREQUENCY AND MONETARY) SEQUENTIAL PATTERN MINING A CONSTRAINT BASED APPROACH International Journal of Information Technology and Knowledge Management January-June 2011, Volume 4, No. 1, pp. 27-32 DISCOVERING ACTIVE AND PROFITABLE PATTERNS WITH RFM (RECENCY, FREQUENCY AND MONETARY)

More information

Compression of the Stream Array Data Structure

Compression of the Stream Array Data Structure Compression of the Stream Array Data Structure Radim Bača and Martin Pawlas Department of Computer Science, Technical University of Ostrava Czech Republic {radim.baca,martin.pawlas}@vsb.cz Abstract. In

More information

Taccumulation of the social network data has raised

Taccumulation of the social network data has raised International Journal of Advanced Research in Social Sciences, Environmental Studies & Technology Hard Print: 2536-6505 Online: 2536-6513 September, 2016 Vol. 2, No. 1 Review Social Network Analysis and

More information

Yunfeng Zhang 1, Huan Wang 2, Jie Zhu 1 1 Computer Science & Engineering Department, North China Institute of Aerospace

Yunfeng Zhang 1, Huan Wang 2, Jie Zhu 1 1 Computer Science & Engineering Department, North China Institute of Aerospace [Type text] [Type text] [Type text] ISSN : 0974-7435 Volume 10 Issue 20 BioTechnology 2014 An Indian Journal FULL PAPER BTAIJ, 10(20), 2014 [12526-12531] Exploration on the data mining system construction

More information

Design and Implementation of A P2P Cooperative Proxy Cache System

Design and Implementation of A P2P Cooperative Proxy Cache System Design and Implementation of A PP Cooperative Proxy Cache System James Z. Wang Vipul Bhulawala Department of Computer Science Clemson University, Box 40974 Clemson, SC 94-0974, USA +1-84--778 {jzwang,

More information

Open Access The Three-dimensional Coding Based on the Cone for XML Under Weaving Multi-documents

Open Access The Three-dimensional Coding Based on the Cone for XML Under Weaving Multi-documents Send Orders for Reprints to reprints@benthamscience.ae 676 The Open Automation and Control Systems Journal, 2014, 6, 676-683 Open Access The Three-dimensional Coding Based on the Cone for XML Under Weaving

More information

Kathleen Durant PhD Northeastern University CS Indexes

Kathleen Durant PhD Northeastern University CS Indexes Kathleen Durant PhD Northeastern University CS 3200 Indexes Outline for the day Index definition Types of indexes B+ trees ISAM Hash index Choosing indexed fields Indexes in InnoDB 2 Indexes A typical

More information

Comprehensive Guide to Evaluating Event Stream Processing Engines

Comprehensive Guide to Evaluating Event Stream Processing Engines Comprehensive Guide to Evaluating Event Stream Processing Engines i Copyright 2006 Coral8, Inc. All rights reserved worldwide. Worldwide Headquarters: Coral8, Inc. 82 Pioneer Way, Suite 106 Mountain View,

More information

Clustering from Data Streams

Clustering from Data Streams Clustering from Data Streams João Gama LIAAD-INESC Porto, University of Porto, Portugal jgama@fep.up.pt 1 Introduction 2 Clustering Micro Clustering 3 Clustering Time Series Growing the Structure Adapting

More information

A Hybrid Approach to CAM-Based Longest Prefix Matching for IP Route Lookup

A Hybrid Approach to CAM-Based Longest Prefix Matching for IP Route Lookup A Hybrid Approach to CAM-Based Longest Prefix Matching for IP Route Lookup Yan Sun and Min Sik Kim School of Electrical Engineering and Computer Science Washington State University Pullman, Washington

More information

Random Sampling over Data Streams for Sequential Pattern Mining

Random Sampling over Data Streams for Sequential Pattern Mining Random Sampling over Data Streams for Sequential Pattern Mining Chedy Raïssi LIRMM, EMA-LGI2P/Site EERIE 161 rue Ada 34392 Montpellier Cedex 5, France France raissi@lirmm.fr Pascal Poncelet EMA-LGI2P/Site

More information

Mining Frequent Itemsets from Data Streams with a Time- Sensitive Sliding Window

Mining Frequent Itemsets from Data Streams with a Time- Sensitive Sliding Window Mining Frequent Itemsets from Data Streams with a Time- Sensitive Sliding Window Chih-Hsiang Lin, Ding-Ying Chiu, Yi-Hung Wu Department of Computer Science National Tsing Hua University Arbee L.P. Chen

More information

What is a multi-model database and why use it?

What is a multi-model database and why use it? What is a multi-model database and why use it? An When it comes to choosing the right technology for a new project, ongoing development or a full system upgrade, it can often be challenging to define the

More information

SDC: A Distributed Clustering Protocol for Peer-to-Peer Networks

SDC: A Distributed Clustering Protocol for Peer-to-Peer Networks SDC: A Distributed Clustering Protocol for Peer-to-Peer Networks Yan Li 1, Li Lao 2, and Jun-Hong Cui 1 1 Computer Science & Engineering Dept., University of Connecticut, CT 06029 2 Computer Science Dept.,

More information

A Framework for Clustering Massive Text and Categorical Data Streams

A Framework for Clustering Massive Text and Categorical Data Streams A Framework for Clustering Massive Text and Categorical Data Streams Charu C. Aggarwal IBM T. J. Watson Research Center charu@us.ibm.com Philip S. Yu IBM T. J.Watson Research Center psyu@us.ibm.com Abstract

More information

A Highly Accurate and Scalable Approach for Addressing Location Uncertainty in Asset Tracking Applications

A Highly Accurate and Scalable Approach for Addressing Location Uncertainty in Asset Tracking Applications A Highly Accurate and Scalable Approach for Addressing Location Uncertainty in Asset Tracking Applications Rengamathi Sankarkumar, Damith C. Ranasinghe Auto-ID Labs, The University of Adelaide, Adelaide

More information

Answering Aggregate Queries Over Large RDF Graphs

Answering Aggregate Queries Over Large RDF Graphs 1 Answering Aggregate Queries Over Large RDF Graphs Lei Zou, Peking University Ruizhe Huang, Peking University Lei Chen, Hong Kong University of Science and Technology M. Tamer Özsu, University of Waterloo

More information

Mining Frequent Itemsets for data streams over Weighted Sliding Windows

Mining Frequent Itemsets for data streams over Weighted Sliding Windows Mining Frequent Itemsets for data streams over Weighted Sliding Windows Pauray S.M. Tsai Yao-Ming Chen Department of Computer Science and Information Engineering Minghsin University of Science and Technology

More information

Data Hiding on Text Using Big-5 Code

Data Hiding on Text Using Big-5 Code Data Hiding on Text Using Big-5 Code Jun-Chou Chuang 1 and Yu-Chen Hu 2 1 Department of Computer Science and Communication Engineering Providence University 200 Chung-Chi Rd., Shalu, Taichung 43301, Republic

More information

A RFID Spatio-temporal Data Model Oriented Supply Chain

A RFID Spatio-temporal Data Model Oriented Supply Chain Send Orders for Reprints to reprints@benthamscience.ae The Open Automation and Control Systems Journal, 2015, 7, 31-41 31 A RFID Spatio-temporal Data Model Oriented Supply Chain Open Access Wang Yonghui

More information

Proxy Server Systems Improvement Using Frequent Itemset Pattern-Based Techniques

Proxy Server Systems Improvement Using Frequent Itemset Pattern-Based Techniques Proceedings of the 2nd International Conference on Intelligent Systems and Image Processing 2014 Proxy Systems Improvement Using Frequent Itemset Pattern-Based Techniques Saranyoo Butkote *, Jiratta Phuboon-op,

More information

Mining Temporal Association Rules in Network Traffic Data

Mining Temporal Association Rules in Network Traffic Data Mining Temporal Association Rules in Network Traffic Data Guojun Mao Abstract Mining association rules is one of the most important and popular task in data mining. Current researches focus on discovering

More information

Bitmap index-based decision trees

Bitmap index-based decision trees Bitmap index-based decision trees Cécile Favre and Fadila Bentayeb ERIC - Université Lumière Lyon 2, Bâtiment L, 5 avenue Pierre Mendès-France 69676 BRON Cedex FRANCE {cfavre, bentayeb}@eric.univ-lyon2.fr

More information

Spatiotemporal Access to Moving Objects. Hao LIU, Xu GENG 17/04/2018

Spatiotemporal Access to Moving Objects. Hao LIU, Xu GENG 17/04/2018 Spatiotemporal Access to Moving Objects Hao LIU, Xu GENG 17/04/2018 Contents Overview & applications Spatiotemporal queries Movingobjects modeling Sampled locations Linear function of time Indexing structure

More information

Hybrid Feature Selection for Modeling Intrusion Detection Systems

Hybrid Feature Selection for Modeling Intrusion Detection Systems Hybrid Feature Selection for Modeling Intrusion Detection Systems Srilatha Chebrolu, Ajith Abraham and Johnson P Thomas Department of Computer Science, Oklahoma State University, USA ajith.abraham@ieee.org,

More information

Preliminary Research on Distributed Cluster Monitoring of G/S Model

Preliminary Research on Distributed Cluster Monitoring of G/S Model Available online at www.sciencedirect.com Physics Procedia 25 (2012 ) 860 867 2012 International Conference on Solid State Devices and Materials Science Preliminary Research on Distributed Cluster Monitoring

More information

Quadrant-Based MBR-Tree Indexing Technique for Range Query Over HBase

Quadrant-Based MBR-Tree Indexing Technique for Range Query Over HBase Quadrant-Based MBR-Tree Indexing Technique for Range Query Over HBase Bumjoon Jo and Sungwon Jung (&) Department of Computer Science and Engineering, Sogang University, 35 Baekbeom-ro, Mapo-gu, Seoul 04107,

More information

Indexing. Jan Chomicki University at Buffalo. Jan Chomicki () Indexing 1 / 25

Indexing. Jan Chomicki University at Buffalo. Jan Chomicki () Indexing 1 / 25 Indexing Jan Chomicki University at Buffalo Jan Chomicki () Indexing 1 / 25 Storage hierarchy Cache Main memory Disk Tape Very fast Fast Slower Slow (nanosec) (10 nanosec) (millisec) (sec) Very small Small

More information

XML Data Stream Processing: Extensions to YFilter

XML Data Stream Processing: Extensions to YFilter XML Data Stream Processing: Extensions to YFilter Shaolei Feng and Giridhar Kumaran January 31, 2007 Abstract Running XPath queries on XML data steams is a challenge. Current approaches that store the

More information

Draft Notes 1 : Scaling in Ad hoc Routing Protocols

Draft Notes 1 : Scaling in Ad hoc Routing Protocols Draft Notes 1 : Scaling in Ad hoc Routing Protocols Timothy X Brown University of Colorado April 2, 2008 2 Introduction What is the best network wireless network routing protocol? This question is a function

More information

Small-World Overlay P2P Networks: Construction and Handling Dynamic Flash Crowd

Small-World Overlay P2P Networks: Construction and Handling Dynamic Flash Crowd Small-World Overlay P2P Networks: Construction and Handling Dynamic Flash Crowd Ken Y.K. Hui John C. S. Lui David K.Y. Yau Dept. of Computer Science & Engineering Computer Science Department The Chinese

More information

Advanced Database Systems

Advanced Database Systems Lecture IV Query Processing Kyumars Sheykh Esmaili Basic Steps in Query Processing 2 Query Optimization Many equivalent execution plans Choosing the best one Based on Heuristics, Cost Will be discussed

More information

Research on Parallelized Stream Data Micro Clustering Algorithm Ke Ma 1, Lingjuan Li 1, Yimu Ji 1, Shengmei Luo 1, Tao Wen 2

Research on Parallelized Stream Data Micro Clustering Algorithm Ke Ma 1, Lingjuan Li 1, Yimu Ji 1, Shengmei Luo 1, Tao Wen 2 International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2015) Research on Parallelized Stream Data Micro Clustering Algorithm Ke Ma 1, Lingjuan Li 1, Yimu Ji 1,

More information

Mining Data Streams. Outline [Garofalakis, Gehrke & Rastogi 2002] Introduction. Summarization Methods. Clustering Data Streams

Mining Data Streams. Outline [Garofalakis, Gehrke & Rastogi 2002] Introduction. Summarization Methods. Clustering Data Streams Mining Data Streams Outline [Garofalakis, Gehrke & Rastogi 2002] Introduction Summarization Methods Clustering Data Streams Data Stream Classification Temporal Models CMPT 843, SFU, Martin Ester, 1-06

More information

Top-k Keyword Search Over Graphs Based On Backward Search

Top-k Keyword Search Over Graphs Based On Backward Search Top-k Keyword Search Over Graphs Based On Backward Search Jia-Hui Zeng, Jiu-Ming Huang, Shu-Qiang Yang 1College of Computer National University of Defense Technology, Changsha, China 2College of Computer

More information

Mining Rare Periodic-Frequent Patterns Using Multiple Minimum Supports

Mining Rare Periodic-Frequent Patterns Using Multiple Minimum Supports Mining Rare Periodic-Frequent Patterns Using Multiple Minimum Supports R. Uday Kiran P. Krishna Reddy Center for Data Engineering International Institute of Information Technology-Hyderabad Hyderabad,

More information

An Efficient Density Based Incremental Clustering Algorithm in Data Warehousing Environment

An Efficient Density Based Incremental Clustering Algorithm in Data Warehousing Environment An Efficient Density Based Incremental Clustering Algorithm in Data Warehousing Environment Navneet Goyal, Poonam Goyal, K Venkatramaiah, Deepak P C, and Sanoop P S Department of Computer Science & Information

More information

Web-based Energy-efficient Cache Invalidation in Wireless Mobile Environment

Web-based Energy-efficient Cache Invalidation in Wireless Mobile Environment Web-based Energy-efficient Cache Invalidation in Wireless Mobile Environment Y.-K. Chang, M.-H. Hong, and Y.-W. Ting Dept. of Computer Science & Information Engineering, National Cheng Kung University

More information

Chapter 12: Indexing and Hashing. Basic Concepts

Chapter 12: Indexing and Hashing. Basic Concepts Chapter 12: Indexing and Hashing! Basic Concepts! Ordered Indices! B+-Tree Index Files! B-Tree Index Files! Static Hashing! Dynamic Hashing! Comparison of Ordered Indexing and Hashing! Index Definition

More information

Towards New Heterogeneous Data Stream Clustering based on Density

Towards New Heterogeneous Data Stream Clustering based on Density , pp.30-35 http://dx.doi.org/10.14257/astl.2015.83.07 Towards New Heterogeneous Data Stream Clustering based on Density Chen Jin-yin, He Hui-hao Zhejiang University of Technology, Hangzhou,310000 chenjinyin@zjut.edu.cn

More information

Filters for XML-based Service Discovery in Pervasive Computing

Filters for XML-based Service Discovery in Pervasive Computing Filters for XML-based Service Discovery in Pervasive Computing Georgia Koloniari and Evaggelia Pitoura Department of Computer Science, University of Ioannina, GR45110 Ioannina, Greece {kgeorgia, pitoura}@cs.uoi.gr

More information

Pattern Mining. Knowledge Discovery and Data Mining 1. Roman Kern KTI, TU Graz. Roman Kern (KTI, TU Graz) Pattern Mining / 42

Pattern Mining. Knowledge Discovery and Data Mining 1. Roman Kern KTI, TU Graz. Roman Kern (KTI, TU Graz) Pattern Mining / 42 Pattern Mining Knowledge Discovery and Data Mining 1 Roman Kern KTI, TU Graz 2016-01-14 Roman Kern (KTI, TU Graz) Pattern Mining 2016-01-14 1 / 42 Outline 1 Introduction 2 Apriori Algorithm 3 FP-Growth

More information

A Schema Extraction Algorithm for External Memory Graphs Based on Novel Utility Function

A Schema Extraction Algorithm for External Memory Graphs Based on Novel Utility Function DEIM Forum 2018 I5-5 Abstract A Schema Extraction Algorithm for External Memory Graphs Based on Novel Utility Function Yoshiki SEKINE and Nobutaka SUZUKI Graduate School of Library, Information and Media

More information

Max-Count Aggregation Estimation for Moving Points

Max-Count Aggregation Estimation for Moving Points Max-Count Aggregation Estimation for Moving Points Yi Chen Peter Revesz Dept. of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588, USA Abstract Many interesting problems

More information

The Research on Coding Scheme of Binary-Tree for XML

The Research on Coding Scheme of Binary-Tree for XML Available online at www.sciencedirect.com Procedia Engineering 24 (2011 ) 861 865 2011 International Conference on Advances in Engineering The Research on Coding Scheme of Binary-Tree for XML Xiao Ke *

More information

UAPRIORI: AN ALGORITHM FOR FINDING SEQUENTIAL PATTERNS IN PROBABILISTIC DATA

UAPRIORI: AN ALGORITHM FOR FINDING SEQUENTIAL PATTERNS IN PROBABILISTIC DATA UAPRIORI: AN ALGORITHM FOR FINDING SEQUENTIAL PATTERNS IN PROBABILISTIC DATA METANAT HOOSHSADAT, SAMANEH BAYAT, PARISA NAEIMI, MAHDIEH S. MIRIAN, OSMAR R. ZAÏANE Computing Science Department, University

More information

An Efficient Algorithm for finding high utility itemsets from online sell

An Efficient Algorithm for finding high utility itemsets from online sell An Efficient Algorithm for finding high utility itemsets from online sell Sarode Nutan S, Kothavle Suhas R 1 Department of Computer Engineering, ICOER, Maharashtra, India 2 Department of Computer Engineering,

More information

Chapter 12: Indexing and Hashing

Chapter 12: Indexing and Hashing Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree Index Files Static Hashing Dynamic Hashing Comparison of Ordered Indexing and Hashing Index Definition in SQL

More information

Warehousing and Analyzing Massive RFID Data Sets

Warehousing and Analyzing Massive RFID Data Sets Warehousing and Analyzing Massive RFID Data Sets Hector Gonzalez Jiawei Han Xiaolei Li Diego Klabjan University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA {hagonzal, hanj, xli10, klabjan}@uiuc.edu

More information

Accelerating XML Structural Matching Using Suffix Bitmaps

Accelerating XML Structural Matching Using Suffix Bitmaps Accelerating XML Structural Matching Using Suffix Bitmaps Feng Shao, Gang Chen, and Jinxiang Dong Dept. of Computer Science, Zhejiang University, Hangzhou, P.R. China microf_shao@msn.com, cg@zju.edu.cn,

More information

Mobility Data Management and Exploration: Theory and Practice

Mobility Data Management and Exploration: Theory and Practice Mobility Data Management and Exploration: Theory and Practice Chapter 4 -Mobility data management at the physical level Nikos Pelekis & Yannis Theodoridis InfoLab, University of Piraeus, Greece infolab.cs.unipi.gr

More information

An Improved Timestamp-Based Password Authentication Scheme Using Smart Cards

An Improved Timestamp-Based Password Authentication Scheme Using Smart Cards An Improved Timestamp-Based Password Authentication Scheme Using Smart Cards Al-Sakib Khan Pathan and Choong Seon Hong Department of Computer Engineering, Kyung Hee University, Korea spathan@networking.khu.ac.kr

More information

Lambda Architecture for Batch and Stream Processing. October 2018

Lambda Architecture for Batch and Stream Processing. October 2018 Lambda Architecture for Batch and Stream Processing October 2018 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document is provided for informational purposes only.

More information

Announcements. Reading Material. Recap. Today 9/17/17. Storage (contd. from Lecture 6)

Announcements. Reading Material. Recap. Today 9/17/17. Storage (contd. from Lecture 6) CompSci 16 Intensive Computing Systems Lecture 7 Storage and Index Instructor: Sudeepa Roy Announcements HW1 deadline this week: Due on 09/21 (Thurs), 11: pm, no late days Project proposal deadline: Preliminary

More information

Handout 12 Data Warehousing and Analytics.

Handout 12 Data Warehousing and Analytics. Handout 12 CS-605 Spring 17 Page 1 of 6 Handout 12 Data Warehousing and Analytics. Operational (aka transactional) system a system that is used to run a business in real time, based on current data; also

More information

Database Applications (15-415)

Database Applications (15-415) Database Applications (15-415) DBMS Internals- Part IV Lecture 14, March 10, 015 Mohammad Hammoud Today Last Two Sessions: DBMS Internals- Part III Tree-based indexes: ISAM and B+ trees Data Warehousing/

More information

Intelligent Caching in Data Virtualization Recommended Use of Caching Controls in the Denodo Platform

Intelligent Caching in Data Virtualization Recommended Use of Caching Controls in the Denodo Platform Data Virtualization Intelligent Caching in Data Virtualization Recommended Use of Caching Controls in the Denodo Platform Introduction Caching is one of the most important capabilities of a Data Virtualization

More information

Mobile Sink to Track Multiple Targets in Wireless Visual Sensor Networks

Mobile Sink to Track Multiple Targets in Wireless Visual Sensor Networks Mobile Sink to Track Multiple Targets in Wireless Visual Sensor Networks William Shaw 1, Yifeng He 1, and Ivan Lee 1,2 1 Department of Electrical and Computer Engineering, Ryerson University, Toronto,

More information

An Cross Layer Collaborating Cache Scheme to Improve Performance of HTTP Clients in MANETs

An Cross Layer Collaborating Cache Scheme to Improve Performance of HTTP Clients in MANETs An Cross Layer Collaborating Cache Scheme to Improve Performance of HTTP Clients in MANETs Jin Liu 1, Hongmin Ren 1, Jun Wang 2, Jin Wang 2 1 College of Information Engineering, Shanghai Maritime University,

More information

Mining Association Rules in Temporal Document Collections

Mining Association Rules in Temporal Document Collections Mining Association Rules in Temporal Document Collections Kjetil Nørvåg, Trond Øivind Eriksen, and Kjell-Inge Skogstad Dept. of Computer and Information Science, NTNU 7491 Trondheim, Norway Abstract. In

More information

Mining Recent Frequent Itemsets in Data Streams with Optimistic Pruning

Mining Recent Frequent Itemsets in Data Streams with Optimistic Pruning Mining Recent Frequent Itemsets in Data Streams with Optimistic Pruning Kun Li 1,2, Yongyan Wang 1, Manzoor Elahi 1,2, Xin Li 3, and Hongan Wang 1 1 Institute of Software, Chinese Academy of Sciences,

More information

Physical Database Design: Outline

Physical Database Design: Outline Physical Database Design: Outline File Organization Fixed size records Variable size records Mapping Records to Files Heap Sequentially Hashing Clustered Buffer Management Indexes (Trees and Hashing) Single-level

More information

Design and Simulation Implementation of an Improved PPM Approach

Design and Simulation Implementation of an Improved PPM Approach I.J. Wireless and Microwave Technologies, 2012, 6, 1-9 Published Online December 2012 in MECS (http://www.mecs-press.net) DOI: 10.5815/ijwmt.2012.06.01 Available online at http://www.mecs-press.net/ijwmt

More information

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe CHAPTER 26 Enhanced Data Models: Introduction to Active, Temporal, Spatial, Multimedia, and Deductive Databases 26.1 Active Database Concepts and Triggers Database systems implement rules that specify

More information

International Journal of Advance Engineering and Research Development. Performance Enhancement of Search System

International Journal of Advance Engineering and Research Development. Performance Enhancement of Search System Scientific Journal of Impact Factor(SJIF): 3.134 International Journal of Advance Engineering and Research Development Volume 2,Issue 7, July -2015 Performance Enhancement of Search System Ms. Uma P Nalawade

More information

Comparative Evaluation of Probabilistic and Deterministic Tag Anti-collision Protocols for RFID Networks

Comparative Evaluation of Probabilistic and Deterministic Tag Anti-collision Protocols for RFID Networks Comparative Evaluation of Probabilistic and Deterministic Tag Anti-collision Protocols for RFID Networks Jihoon Choi and Wonjun Lee Division of Computer and Communication Engineering College of Information

More information

FM-WAP Mining: In Search of Frequent Mutating Web Access Patterns from Historical Web Usage Data

FM-WAP Mining: In Search of Frequent Mutating Web Access Patterns from Historical Web Usage Data FM-WAP Mining: In Search of Frequent Mutating Web Access Patterns from Historical Web Usage Data Qiankun Zhao Nanyang Technological University, Singapore and Sourav S. Bhowmick Nanyang Technological University,

More information

Web page recommendation using a stochastic process model

Web page recommendation using a stochastic process model Data Mining VII: Data, Text and Web Mining and their Business Applications 233 Web page recommendation using a stochastic process model B. J. Park 1, W. Choi 1 & S. H. Noh 2 1 Computer Science Department,

More information

International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.7, No.3, May Dr.Zakea Il-Agure and Mr.Hicham Noureddine Itani

International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.7, No.3, May Dr.Zakea Il-Agure and Mr.Hicham Noureddine Itani LINK MINING PROCESS Dr.Zakea Il-Agure and Mr.Hicham Noureddine Itani Higher Colleges of Technology, United Arab Emirates ABSTRACT Many data mining and knowledge discovery methodologies and process models

More information

Probabilistic Graph Summarization

Probabilistic Graph Summarization Probabilistic Graph Summarization Nasrin Hassanlou, Maryam Shoaran, and Alex Thomo University of Victoria, Victoria, Canada {hassanlou,maryam,thomo}@cs.uvic.ca 1 Abstract We study group-summarization of

More information

An Algorithm for Mining Frequent Itemsets from Library Big Data

An Algorithm for Mining Frequent Itemsets from Library Big Data JOURNAL OF SOFTWARE, VOL. 9, NO. 9, SEPTEMBER 2014 2361 An Algorithm for Mining Frequent Itemsets from Library Big Data Xingjian Li lixingjianny@163.com Library, Nanyang Institute of Technology, Nanyang,

More information

Chapter 11: Indexing and Hashing

Chapter 11: Indexing and Hashing Chapter 11: Indexing and Hashing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 11: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files B-Tree

More information

Where Next? Data Mining Techniques and Challenges for Trajectory Prediction. Slides credit: Layla Pournajaf

Where Next? Data Mining Techniques and Challenges for Trajectory Prediction. Slides credit: Layla Pournajaf Where Next? Data Mining Techniques and Challenges for Trajectory Prediction Slides credit: Layla Pournajaf o Navigational services. o Traffic management. o Location-based advertising. Source: A. Monreale,

More information

An approach to the model-based fragmentation and relational storage of XML-documents

An approach to the model-based fragmentation and relational storage of XML-documents An approach to the model-based fragmentation and relational storage of XML-documents Christian Süß Fakultät für Mathematik und Informatik, Universität Passau, D-94030 Passau, Germany Abstract A flexible

More information

USING FREQUENT PATTERN MINING ALGORITHMS IN TEXT ANALYSIS

USING FREQUENT PATTERN MINING ALGORITHMS IN TEXT ANALYSIS INFORMATION SYSTEMS IN MANAGEMENT Information Systems in Management (2017) Vol. 6 (3) 213 222 USING FREQUENT PATTERN MINING ALGORITHMS IN TEXT ANALYSIS PIOTR OŻDŻYŃSKI, DANUTA ZAKRZEWSKA Institute of Information

More information

Estimation of Network Partition Problem in Mobile Ad hoc Network

Estimation of Network Partition Problem in Mobile Ad hoc Network Estimation of Network Partition Problem in Mobile Ad hoc Network Varaprasad Golla B.M.S.College of Engineering Bangalore, India varaprasad555555@yahoo.co.in ABSTRACT: A node cannot communicate with others

More information

Master s Thesis. A Construction Method of an Overlay Network for Scalable P2P Video Conferencing Systems

Master s Thesis. A Construction Method of an Overlay Network for Scalable P2P Video Conferencing Systems Master s Thesis Title A Construction Method of an Overlay Network for Scalable P2P Video Conferencing Systems Supervisor Professor Masayuki Murata Author Hideto Horiuchi February 14th, 2007 Department

More information

The Vagabond Temporal OID Index: An Index Structure for OID Indexing in Temporal Object Database Systems

The Vagabond Temporal OID Index: An Index Structure for OID Indexing in Temporal Object Database Systems The Vagabond Temporal OID Index: An Index Structure for OID Indexing in Temporal Object Database Systems Kjetil Nørvåg Department of Computer and Information Science Norwegian University of Science and

More information

SQL Query Optimization on Cross Nodes for Distributed System

SQL Query Optimization on Cross Nodes for Distributed System 2016 International Conference on Power, Energy Engineering and Management (PEEM 2016) ISBN: 978-1-60595-324-3 SQL Query Optimization on Cross Nodes for Distributed System Feng ZHAO 1, Qiao SUN 1, Yan-bin

More information

City, University of London Institutional Repository

City, University of London Institutional Repository City Research Online City, University of London Institutional Repository Citation: Andrienko, N., Andrienko, G., Fuchs, G., Rinzivillo, S. & Betz, H-D. (2015). Real Time Detection and Tracking of Spatial

More information

COOCHING: Cooperative Prefetching Strategy for P2P Video-on-Demand System

COOCHING: Cooperative Prefetching Strategy for P2P Video-on-Demand System COOCHING: Cooperative Prefetching Strategy for P2P Video-on-Demand System Ubaid Abbasi and Toufik Ahmed CNRS abri ab. University of Bordeaux 1 351 Cours de la ibération, Talence Cedex 33405 France {abbasi,

More information

Computer Based Image Algorithm For Wireless Sensor Networks To Prevent Hotspot Locating Attack

Computer Based Image Algorithm For Wireless Sensor Networks To Prevent Hotspot Locating Attack Computer Based Image Algorithm For Wireless Sensor Networks To Prevent Hotspot Locating Attack J.Anbu selvan 1, P.Bharat 2, S.Mathiyalagan 3 J.Anand 4 1, 2, 3, 4 PG Scholar, BIT, Sathyamangalam ABSTRACT:

More information