Mining Sensor Streams for Discovering Human Activity Patterns Over Time


Parisa Rashidi
EECS Department, Washington State University, Pullman, US
Email: prashidi@wsu.edu

Diane J. Cook
EECS Department, Washington State University, Pullman, US
Email: cook@eecs.wsu.edu

Abstract: In recent years, newly emerging application domains have introduced new constraints and methods in the data mining field. One such application domain is activity discovery from sensor data. Activity discovery and recognition play an important role in a wide range of applications, from assisted living to security and surveillance. Most of the current approaches for activity discovery assume a static model of the activities and ignore the problem of mining and discovering activities from a data stream over time. Inspired by the unique requirements of the activity discovery application domain, in this paper we propose a new stream mining method for finding sequential patterns over time from streaming non-transaction data using multiple time granularities. Our algorithm is able to find sequential patterns even if the patterns exhibit discontinuities (interruptions) or variations in the sequence order. Our algorithm also addresses the problem of dealing with rare events across space and over time. We validate the results of our algorithms using data collected from two different smart apartments.

Index Terms: Activity Data Mining; Smart Environments; Sensor Data; Stream Sequence Mining

I. INTRODUCTION

In recent years, different emerging application domains have introduced new constraints and methods for data mining. One such application domain is activity discovery and recognition in smart environments. Due to the increasing aging population and the rising demand for in-place monitoring, smart environments have attracted many researchers from different disciplines. A smart environment refers to an environment which is equipped with different types of sensors, such as infrared motion sensors, contact switch sensors, RFID tags, power-line controllers, etc.
The low-level sensor data obtained from different sensors is mined and analyzed to detect residents' activity patterns. Recognizing residents' activities and their daily routines can greatly help in providing automation, security, and more importantly in remote health monitoring of the elderly or people with disabilities. For example, by monitoring the daily routines of a person with dementia, the system can track how completely and consistently the daily routines are performed. It can also determine when the resident needs assistance, or raise an alarm if needed.

A variety of supervised methods have already been proposed for human activity recognition, e.g. neural networks [1], naive Bayes [2], conditional random fields [3], decision trees [4], Markov models [5], and dynamic Bayes networks [6]. In a real-world situation, using supervised methods is not very practical, as it requires labeled data for training. Manual labeling of human activity data is time consuming, laborious and error-prone. Besides, one usually needs to deploy invasive devices in the environment during the data collection phase to obtain reliable annotations. Another option is to ask the residents to report their activities. This puts the burden on the residents, and in the case of elderly residents with memory problems such as dementia it would be out of the question.

In an effort to address the annotation problem, a few unsupervised methods have recently been proposed for mining human activity data. Those methods include frequent sensor mining [7], mining discontinuous activity patterns [8], mining mixed frequent-periodic activity patterns [9], detecting activity structure using low dimensional eigenspaces [10], and discovering discontinuous varied order patterns [11]. None of these mining approaches takes into account the streaming nature of the data, nor the possibility that the patterns might change over time. In a real-world situation, in a smart environment we have to deal with a potentially infinite and unbounded flow of data.
Also, the discovered activity patterns can change over time. Mining the stream of data over time not only allows us to find new emerging patterns in the data, but also allows us to detect changes in the patterns. Detecting changes in the patterns can be beneficial for many applications. For example, a caregiver can look at the pattern trends over time and spot any suspicious changes immediately.

In the last decade, many stream mining methods have been proposed as a result of different emerging application domains, such as network traffic analysis, Web click stream mining, and power consumption measurements. Most of the proposed methods try to find frequent itemsets over data streams [12]-[15]. Some methods have also been proposed for finding frequent sequences over data streams [16]-[18]. To the best of our knowledge, no stream mining method has been proposed so far for mining human activity patterns from sensor data in the context of smart environments.

In this paper, we extend the tilted-time window approach proposed by Giannella et al. [13] in order to discover activity pattern sequences over time. The tilted-time window approach finds the frequent itemsets using a set of tilted-time windows, such that the frequency of the item is kept at a finer level for recent time frames and at a coarser level for older time frames. Such a tilted window approach can be quite useful for human activity pattern discovery. For example, a caregiver is usually interested in the recent changes of the patient at a finer level, and in the older patterns (e.g. from three months ago) at a coarser level.

Due to the special requirements of our application domain, we cannot directly use the method proposed by Giannella et al. [13]. First of all, the tilted-time approach [13], as well as most of the other similar stream mining methods [16]-[18], was designed to find sequences or itemsets in transaction-based streams. The data obtained in a smart environment is a continuous stream of unbounded sensor events with no boundary between episodes or activities. Second, as discussed in [11], due to the complex and erratic nature of human activity, we need to consider an activity pattern as a sequence of events. In such a sequence, the patterns might be interrupted by irrelevant events (called a discontinuous pattern). Also, the order of events in the sequence might change from occurrence to occurrence (called a varied order pattern). Finding variations of a general pattern and determining their relative importance can be beneficial in many applications. For example, in a study, Hayes et al. [19] found that variation in the overall activity performance at home was correlated with mild cognitive impairment. This highlights the fact that it is important to recognize and monitor all the activities and their variations which are performed regularly by an individual in a daily environment. Third, we also need to address the problem of varying frequencies for activities performed in different regions of the space. A person might spend the majority of his/her time in the living room during the day, and only go to the bedroom for sleeping.
We still need to discover the sleep pattern, even though its sensor support count might be substantially less than the support count of the sensors in the living room.

In this paper, we extend the DVSM method proposed by Rashidi and Cook [11] into a streaming version based on using a tilted-time window [13]. Our proposed method allows us to find discontinuous varied-order patterns in streaming non-transaction sensor data over time. As another extension to this work [11], we also address the problem of varying frequencies to better handle real-life data and to find a higher percentage of the interesting patterns over time. This represents the first reported stream mining method for discovering human activity patterns in sensor data over time. Besides activity mining, our discontinuous and varied order stream mining method can be useful in other application domains where different variations of a pattern can reveal useful information, such as Web click mining.

The remainder of the paper is organized as follows. First we explain the related stream mining works in more detail in section II. Next we describe the tilted-time window in more detail in section III. Our proposed solution is explained in section IV. We then show the results of our experiments on data obtained from two different smart apartments in section V. Finally we end the paper with our conclusions and a discussion of future work.

II. STREAM MINING RELATED WORKS

Sequential pattern mining has been studied for more than a decade [20] and many methods have been proposed for finding sequential patterns in data [20]-[23]. Compared to the classic problem of mining sequential patterns from a static database, mining sequential patterns over data streams is far more challenging. In a data stream, new elements are generated continuously and no blocking operation can be performed on the data. Despite being more challenging, with the rapid emergence of new application domains over the past few years the stream mining problem has also been studied in a wide range of different application domains.
A few such application domains include network traffic analysis, fraud detection, Web click stream mining, power consumption measurements and trend learning [24].

For finding patterns in a data stream, approximation and using a relaxed support threshold is a key concept [15], [25]. The first approach was introduced by Manku et al. [15] based on using a landmark model and calculating the count of the patterns from the start of the stream. Later Li et al. [14] proposed methods for incorporating the discovered patterns into a prefix tree. They also designed methods for regression-based stream mining algorithms [26]. More recent approaches have introduced methods for managing the history of items over time [13], [26]. The main idea is that one is usually more interested in recent changes in more detail, while older changes are preferred at a coarser granularity over the long term.

There have also been several methods for finding sequential patterns over data streams, including the SPEED algorithm [18], methods for finding approximate sequential patterns in Web usage [17], a data cubing algorithm [27] and mining multidimensional sequential patterns over data streams [28]. All of these approaches consider data to be in a transactional format. However, the input data stream in a smart environment is a continuous flow of unbounded data. Figure 1 depicts the difference between transaction data and sensor data. As can be seen in Figure 1a, for transaction data, each single transaction is associated with a set of items and is identified by a transaction ID, making it clearly separated from the next transaction. The sensor data has no boundaries separating different activities or episodes from each other, and it is just a continuous stream of sensor events over time.

Approaches proposed by the sensor stream mining community [29], [30] try to turn a sensor stream into a transactional dataset using techniques such as the Apriori technique [20] to group frequent events together. Another method is to simply use fixed or varied clock ticks [30].
In our scenario, using such simple techniques does not allow us to deal with complex activity patterns that can be discontinuous, varied order, and of arbitrary length. Instead, we extend the DVSM method [11] to group together the co-occurring events into varied-order discontinuous activity patterns.

It should be noted that researchers from the ubiquitous computing and smart environment community have also proposed methods for finding patterns from sensor data [7]-[11]. However, none of these approaches addresses the stream mining problem. To the best of our knowledge, this is the first work addressing activity pattern discovery from sensor data over time.

Transaction ID | Items
1              | {A, B, D, F, G, H}
2              | {D, F, G, H, X}
3              | {A, B, C, X, Y}
(a)

[Sensors M1 to M5 plotted against time 0 to 8; event stream: M2 M3 M1 M5 M1 M4 ...]
(b)

Fig. 1: Transaction data vs. sensor data.

III. TILTED-TIME WINDOW MODEL

In this section, we explain the tilted window model [13] in more detail. Figure 2 shows an example of a natural tilted-time window where the frequency of the most recent item is kept with an initial precision granularity of an hour, in another level of granularity in the last 24 hours and then again at another level in the last 31 days. As new data items arrive over time, the history of items will be shifted back in the tilted-time window to reflect the changes. Other variations such as the logarithmic tilted-time window have also been proposed to provide more efficient storage [13].

[Time axis divided into: 31 days, 24 hours, 4 qtrs]
Fig. 2: Natural tilted-time window.

The tilted-time window model uses a relaxed threshold to find patterns according to the following definition.

Definition 1. Let the minimum support be denoted by $\sigma$, and the maximum support error be denoted by $\epsilon$. An itemset $I$ is said to be frequent if its support is no less than $\sigma$. If the support of $I$ is less than $\sigma$, but no less than $\sigma - \epsilon$, it is sub-frequent; otherwise it is considered to be infrequent.

Using the approximation approach for frequencies allows the sub-frequent patterns to become frequent later, while discarding infrequent patterns. To reduce the number of frequency records in the tilted-time windows, the old frequency records of an itemset $I$ are pruned. Let $f_j(I)$ denote the computed frequency of $I$ in time unit $j$, and let $N_j$ denote the number of transactions received within time unit $j$. Also let $\tau$ refer to the most recent time point. For some $m$ where $1 \le m \le \tau$, the frequency records $f_1(I), \ldots, f_m(I)$ are pruned if Equation 1 and Equation 2 hold [31].

\[ \exists n \le \tau,\ \forall i,\ 1 \le i \le n: \quad f_i(I) < \sigma N_i \tag{1} \]

\[ \forall l,\ 1 \le l \le m \le n: \quad \sum_{j=1}^{l} f_j(I) < (\sigma - \epsilon) \sum_{j=1}^{l} N_j \tag{2} \]

Equation 1 finds a point $n$ in the stream such that before that point, the computed frequency of the itemset $I$ is always less than the minimum frequency required. Equation 2 finds a point $m$, where $1 \le m \le n$, such that before that point, the sum of the computed support of $I$ is always less than the relaxed minimum support threshold. In this case the frequency records of $I$ from 1 to $m$ are considered unpromising and are pruned. This type of pruning is referred to as tail pruning. In our model, we will extend the above definitions and pruning techniques for discontinuous, varied order patterns.

IV. PROPOSED MODEL

In the following subsections, we first give an overview of definitions and notations, and then describe our model in more detail.

A. Definitions

The input data in our model is an unbounded stream of sensor events, each in the form of $e = \langle s, t \rangle$, where $s$ refers to a sensor ID and $t$ refers to the timestamp when sensor $s$ has been activated. Table I shows an example of several such sensor events.

TABLE I: Example sensor data.
Timestamp (t)       | Sensor ID (s)
7/17/2009 09:52:25  | M4
7/17/2009 09:56:55  | M30
7/17/2009 14:12:20  | M15

We define an activity instance as a sequence of $n$ sensor events $\langle e_1, e_2, \ldots, e_n \rangle$. Note that in our notation an activity instance is considered a sequence of sensor events, not a set of unordered events. We assume that the input data is broken into batches $B_{a_1}^{b_1} \ldots B_{a_n}^{b_n}$, where each $B_{a_i}^{b_i}$ is associated with a time period $[a_i..b_i]$, and the most recent batch is denoted by $B_{a_\tau}^{b_\tau}$, or $B_\tau$ for short. Each batch $B_{a_i}^{b_i}$ contains a number of sensor events, denoted by $|B_{a_i}^{b_i}|$.

As we mentioned before, we use a tilted-time window for maintaining the history of patterns over time. Instead of maintaining the frequency records, we maintain the compression records, which will be explained in more detail in section IV-B. Also, because in our model the frequency of an item is not the single deciding factor, and other factors such as the length of the pattern and its continuity also play a role, we will use the term interesting pattern instead of frequent pattern.
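The event and batch notation of this subsection can be made concrete with a small sketch. It is only an illustration under stated assumptions: the names `SensorEvent`, `Batch`, and `split_into_batches` are hypothetical, and the use of fixed-length, half-open batch periods is an assumption rather than something the text specifies.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Iterable, List


@dataclass(frozen=True)
class SensorEvent:
    sensor_id: str        # s: ID of the sensor that was activated
    timestamp: datetime   # t: activation time


@dataclass
class Batch:
    start: datetime       # a_i
    end: datetime         # b_i (treated as exclusive in this sketch)
    events: List[SensorEvent] = field(default_factory=list)

    def __len__(self) -> int:
        # |B|: number of sensor events in the batch
        return len(self.events)


def split_into_batches(events: Iterable[SensorEvent],
                       period: timedelta) -> List[Batch]:
    """Cut a time-ordered event stream into consecutive fixed-length batches."""
    batches: List[Batch] = []
    for event in events:
        if not batches:
            batches.append(Batch(event.timestamp, event.timestamp + period))
        # Open new (possibly empty) batches until the event fits the latest one.
        while event.timestamp >= batches[-1].end:
            last_end = batches[-1].end
            batches.append(Batch(last_end, last_end + period))
        batches[-1].events.append(event)
    return batches
```

With the three events of Table I and a one-week period, all events fall into a single batch. A shorter period simply yields more batches, some of which may be empty.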

[Figure 3: Our tilted-time window. Slots from oldest to most recent: Month, ..., Month, Week, Week, Week, Week.]

The tilted-time window used in our model is depicted in Figure 3. This tilted-time window keeps the history records of a pattern during the past 4 weeks at the finer level of week granularity. History records older than 4 weeks are only kept at the month granularity. First, regarding our application domain and considering its end users, a natural tilted-time window provides a more natural representation than a logarithmic tilted-time window. For example, it would be easier for a nurse or caregiver to interpret the pattern trend using a natural representation. Second, as we don't expect the activity patterns to change substantially over a very short period of time, we omit the day and hour information for the sake of a more efficient representation. For example, in the case of dementia patients it takes weeks and months for changes to develop in their daily activity patterns. Using such a schema we only need to maintain 15 compression records instead of 365 × 24 × 4 records in a normal natural tilted-time window keeping day and hour information. Note that we chose such a representation in this study for the reasons mentioned above. However, if necessary, one can adopt other tilted-window models such as the logarithmic windows, as the choice of tilted-time window has no effect on the model except for efficiency.

To update the tilted-time window, whenever a new batch of data arrives, we replace the compression values at the finest level of time granularity and shift back to the next finest level of time granularity. During shifting, we check if the intermediate window is full. If so, the window is shifted back even more; otherwise the shifting stops.

B. Mining Activity Patterns

Our goal is to develop a method that can automatically discover resident activity patterns over time from streaming sensor data, even if the patterns are somewhat discontinuous or have different event orders across their instances.
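The update rule for the week/month tilted-time window described at the end of Section IV-A can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name `TiltedTimeWindow` is hypothetical, and both the choice of folding a full week level into a single month record by averaging and the cap of 11 month slots (4 + 11 = 15 records) are assumptions made for the example.

```python
from collections import deque
from statistics import mean
from typing import Deque, List


class TiltedTimeWindow:
    """Compression history of one pattern: recent weeks, then older months."""

    def __init__(self, weeks_per_month: int = 4, max_months: int = 11) -> None:
        self.weeks_per_month = weeks_per_month
        self.weeks: Deque[float] = deque()                     # newest first
        self.months: Deque[float] = deque(maxlen=max_months)   # newest first

    def add_batch(self, compression: float) -> None:
        """Record the newest weekly value and shift older history back."""
        if len(self.weeks) == self.weeks_per_month:
            # The week level is full: fold it into one month record
            # (assumed aggregation: the mean) and shift it to the month level.
            # A full month level silently drops its oldest record.
            self.months.appendleft(mean(self.weeks))
            self.weeks.clear()
        self.weeks.appendleft(compression)

    def records(self) -> List[float]:
        """All stored records, most recent first."""
        return list(self.weeks) + list(self.months)
```

However many batches arrive, the window never holds more than 4 weekly and 11 monthly records in this sketch, which is what keeps the per-pattern history small.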
Discontinuities and order variations both occur quite frequently when dealing with human activity data. For example, consider the meal preparation activity. Most people will not perform this activity in exactly the same way each time; rather, some of the steps or the order of the steps might be changed (variations). In addition, the activity might be interrupted by irrelevant events such as answering the phone (discontinuous). The DVSM method proposed in [11] finds such patterns in a static dataset. For example, the pattern $\langle a, b \rangle$ can be discovered from instances {b, x, c, a}, {a, b, q}, and {a, u, b}, despite the fact that the events are discontinuous and have varied orders.

We discover the sequential patterns from the current data batch $B_\tau$ by using an extended version of DVSM that is able to find patterns in streaming data and is also able to deal with varying frequencies across different regions of a physical space. After finding patterns in the current data batch $B_\tau$, we update the tilted-time windows and prune any pattern that seems to be unpromising.

To find patterns in the data, first a reduced batch $B_\tau^r$ is created from the current data batch $B_\tau$. The reduced batch contains only frequent and subfrequent sensor events, which will be used for constructing longer patterns. A minimum support is required to identify such frequent and subfrequent events. DVSM uses a global minimum support, and it only identifies the frequent events. Here, we introduce the maximum sensor support error $\epsilon_s$ to allow the subfrequent patterns to be discovered as well. We will also automatically derive multiple minimum support values corresponding to different regions of the space.

In mining real-life activity patterns, the frequency of sensor events can vary across different regions of the home or other space. If the differences in sensor event frequencies across different regions of the space are not taken into account, the patterns that occur in less frequently used areas of the space might be ignored.
For example, if the resident spends most of his/her time in the living room during the day and only goes to the bedroom for sleeping, then the sensors will be triggered more frequently in the living room than in the bedroom. Therefore, when looking for frequent patterns, the sensor events in the bedroom might be ignored and consequently the sleep pattern might not be discovered. The same problem happens with different types of sensors, as usually the motion sensors are triggered much more frequently than other types of sensors such as cabinet sensors. This problem is known as the rare item problem in market basket analysis and is usually addressed by providing multiple minimum support values [32]. Our proposed solution for solving the problem of rare items in a data stream can be applied to other application domains, such as Web click mining. For example, a web page might have a lower chance of being visited by visitors, but we still might be interested in finding click patterns in such pages.

We automatically derive multiple minimum sensor support values across space and over time. To do this, we identify different regions of the space using location tags $l$, corresponding to functional areas such as bedroom, bathroom, etc. Different types of sensors might also exhibit varying frequencies. In our experiments, we categorized the sensors into motion sensors and interaction-based sensors. The motion sensors are those sensors tracking the motion of a person around a home, e.g. infrared motion sensors. Interaction-based sensors, which we will call key sensors, are the non-motion tracking sensors, such as cabinet sensors or RFID tags on items. Based on observing and analyzing sensor frequencies in multiple smart homes, we found that a motion sensor might have a higher chance of being triggered than a key sensor in some regions. Hence we derive separate minimum sensor supports for different sensor categories.
For the current data batch $B_\tau$, we compute the minimum regional support for different categories of sensors as in Equation 3. Here $l$ refers to a specific location, $c$ refers to the sensor's category, and $S_c$ refers to the set of sensors in a category $c$. Also, $f_T(s)$ refers to the frequency of a sensor $s$ over a time period $T$.

\[ \sigma_T^c(l) = \frac{1}{|S_c^l|} \sum_{s \in S_c^l} f_T(s) \quad \text{s.t.} \quad S_c^l = \{ s \mid s \in l \wedge s \in S_c \} \tag{3} \]

As an illustrative example, Figure 4 shows the computed minimum regional sensor support values for a smart home used in our experiments.

[Figure 4 residue: floor plan annotated with regional values σ_k = 0.02, σ_m = 0.02, σ_k = 0.02, σ_m = 0.02, σ_m = 0.03, σ_m = 0.06, σ_k = 0.01, σ_m = 0.03]
Fig. 4: The frequent/subfrequent sensors are selected based on the minimum regional support, instead of a global support.

Using the minimum regional sensor frequencies, frequent and subfrequent sensors are defined as follows.

Definition 2. Let $s$ be a sensor of category $c$ located in location $l$. The frequency of $s$ over a time period $T$, denoted by $f_T(s)$, is the number of times in time period $T$ in which $s$ occurs. The support of $s$ in location $l$ and over time period $T$ is $f_T(s)$ divided by the total number of sensor events of the same category occurring in $l$ during $T$. Let $\epsilon_s$ be the maximum sensor support error. Sensor $s$ is said to be frequent if its support is no less than $\sigma_T^c(l)$. It is sub-frequent if its support is less than $\sigma_T^c(l)$, but no less than $\sigma_T^c(l) - \epsilon_s$; otherwise it is infrequent.

Only the sensor events from the frequent and subfrequent sensors will be added to the reduced batch $B_\tau^r$, which is then used for constructing longer sequences. We use a pattern growth method as in [9] which grows a pattern by its prefix and suffix. To account for the variations in the patterns, the concept of a general pattern is introduced. A general pattern is the set of all variations that have a similar structure in terms of comprising sensor events, but have different event orders [11]. During pattern growth, if an already discovered variation matches a newly discovered pattern, its frequency and continuity information will be updated. If the newly discovered pattern matches the general pattern, but does not exactly match any of the variations, it is added as a new variation. Otherwise it will be considered a new general pattern.
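The regional threshold of Equation 3 and the sensor classification of Definition 2 can be read as the filtering step sketched below. This is a hedged illustration rather than the authors' code: the function names are hypothetical, and the sketch assumes that $f_T(s)$ in Equation 3 is expressed on the same normalized scale as the support in Definition 2 (a sensor's share of the events in its own location and category), which is one possible reading of how the two fit together.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Group = Tuple[str, str]  # (location tag l, sensor category c)


def classify_sensors(events: Sequence[str],
                     sensor_group: Dict[str, Group],
                     eps_s: float) -> Dict[str, str]:
    """Label each known sensor as frequent, subfrequent, or infrequent."""
    counts = Counter(events)
    members: Dict[Group, List[str]] = defaultdict(list)
    for sensor, group in sensor_group.items():
        members[group].append(sensor)

    labels: Dict[str, str] = {}
    for sensors in members.values():
        total = sum(counts[s] for s in sensors)
        for s in sensors:
            labels[s] = "infrequent"  # default, also covers silent groups
        if total == 0:
            continue
        # Support of s within its own (location, category) group.
        support = {s: counts[s] / total for s in sensors}
        # Regional minimum support: mean support over the group (assumption).
        sigma = sum(support.values()) / len(sensors)
        for s in sensors:
            if support[s] >= sigma:
                labels[s] = "frequent"
            elif support[s] >= sigma - eps_s:
                labels[s] = "subfrequent"
    return labels


def reduce_batch(events: Sequence[str], labels: Dict[str, str]) -> List[str]:
    """Keep only events from frequent and subfrequent sensors."""
    return [s for s in events if labels.get(s) in ("frequent", "subfrequent")]
```

Because each threshold is computed per region and category, a bedroom sensor that fires only a handful of times can be kept while a living-room sensor with a larger absolute count is dropped, which is the rare-item behavior the text asks for.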
At the end of each pattern growth iteration, infrequent or highly discontinuous patterns and variations will be discarded as uninteresting patterns. Instead of solely using a pattern's frequency as a measure of interest, we use a compression objective based on the minimum description length (MDL) [33]. Using a compression objective allows us to take into account the ability of the pattern to compress a dataset with respect to the pattern's length and continuity. The compression value of a general pattern $a$ over a time period $T$ is defined as in Equation 4. The compression value of a variation $a_i$ of a general pattern over a time period $T$ is defined as in Equation 5. Here $L$ refers to the description length as defined by the Minimum Description Length (MDL) principle, and $\Gamma$ refers to continuity as defined in [11]. Continuity basically shows how contiguous the component events of a pattern or a variation are. It is computed in a bottom-up manner, such that for a variation, continuity is defined in terms of the average number of infrequent sensor events separating each two successive events of the variation. For a general pattern, continuity is defined as the average continuity of its variations.

\[ \alpha_T(a) = \frac{L(B_T) \cdot \Gamma_a}{L(a) + L(B_T \mid a)} \tag{4} \]

\[ \beta_T(a_i) = \frac{\left( L(B_T \mid a) + L(a) \right) \cdot \Gamma_{a_i}}{L(B_T \mid a_i) + L(a_i)} \tag{5} \]

Variation compression measures the capability of a variation to compress a general pattern compared to the other variations. Compression of a general pattern shows the overall capability of the general pattern to compress the dataset with respect to its length and continuity. Based on the compression values and a maximum compression error, we define interesting, sub-interesting and uninteresting patterns and variations.

Definition 3. Let the compression of a general pattern $a$ be defined as in Equation 4 over a time period $T$. Also let $\sigma_g$ and $\epsilon_g$ denote the minimum compression and maximum compression error. The general pattern $a$ is said to be interesting if its compression $\alpha$ is no less than $\sigma_g$.
It is sub-interesting if its compression is less than $\sigma_g$, but no less than $\sigma_g - \epsilon_g$; otherwise it is uninteresting.

We also give a similar definition for identifying interesting/sub-interesting variations of a pattern. Let the average variation compression of all variations of a general pattern $a$ over a time period $T$ be defined as in Equation 6. Here the number of variations of a general pattern is denoted by $n_a$.

$$\bar{\beta}_T(a) = \frac{1}{n_a} \sum_{i=1}^{n_a} \frac{\left(L(B_T \mid a) + L(a)\right)\Gamma_{a_i}}{L(B_T \mid a_i) + L(a_i)} \qquad (6)$$

Definition 4. Let the compression of a variation $a_i$ of a general pattern $a$ be defined as in Equation 5 over a time period $T$. Also let $\epsilon_v$ denote the maximum variation compression error. A variation $a_i$ is said to be interesting over a time period $T$ if its compression $\beta_T(a_i)$ is no less than $\bar{\beta}_T(a)$. It is sub-interesting if its compression is less than $\bar{\beta}_T(a)$, but no less than $\bar{\beta}_T(a) - \epsilon_v$; otherwise it is uninteresting.

During each pattern growth iteration, based on the above definitions, the uninteresting patterns and variations are pruned, i.e., those patterns and variations that are either highly discontinuous or infrequent (with respect to their length). We also prune redundant non-maximal general patterns, i.e., those patterns that are totally contained in another larger pattern. To maintain only the most relevant variations of a pattern, irrelevant variations are also discarded based on mutual information [34] as in Equation 7. It allows us to find core sensors for each general pattern $a$. Finding the set of core sensors allows us to prune the irrelevant variations of a pattern which do not contain the core sensors. Here $P(s, a)$ is the joint probability distribution of a sensor $s$ and general pattern $a$, while $P(s)$ and $P(a)$ are the marginal probability distributions. A high mutual information value indicates the sensor is a core sensor.

$$MI(s, a) = P(s, a) \log \frac{P(s, a)}{P(s)\,P(a)} \qquad (7)$$

We continue extending the patterns by prefix and suffix at each iteration until no more interesting patterns are found. A post-processing step records attributes of the patterns, such as event durations and start times. We refer to the pruning process performed during the pattern growth on the current data batch as normal pruning. Note that it is different from the tail pruning process, which is performed on the tilted-time window to discard the unpromising patterns over time. In the following subsection we describe how the tilted-time window is updated after discovering the patterns of the current data batch.

C. Updating the Tilted-time Window

After discovering the patterns in the current data batch as described in the previous subsection, the tilted-time window will be updated. Each general pattern is associated with a tilted-time window. The tilted-time window keeps track of the general pattern's history as well as its variations. Whenever a new batch arrives, after discovering its interesting general patterns, we replace the compressions at the finest level of granularity with the recently computed compressions. If a variation of a general pattern is not observed in the current batch, we set its recent compression to 0. If none of the variations of a general pattern are perceived in the current batch, then the general pattern's recent compression is also set to 0.

In order to reduce the number of maintained records and to remove unpromising general patterns, we propose the following tail pruning mechanisms as an extension of the original tail pruning method in [13]. Let $\alpha_j(a)$ denote the computed compression of general pattern $a$ in time unit $j$. Also let $\tau$ refer to the most recent time point. For some $m$, where $1 \le m \le \tau$, the compression records $\alpha_1(a), \ldots, \alpha_m(a)$ are pruned if Equations 8 and 9 hold.

$$\exists n \le \tau,\ \forall i,\ 1 \le i \le n,\quad \alpha_i(a) < \sigma_g \qquad (8)$$

$$\forall l,\ 1 \le l \le m \le n,\quad \sum_{j=1}^{l} \alpha_j(a) < l\,(\sigma_g - \epsilon_g) \qquad (9)$$

Equation 8 finds a point $n$ in the stream such that before that point, the computed compression of the general pattern $a$ is always less than the minimum compression required. Equation 9 computes the time unit $m$, where $1 \le m \le n$, such that before that point, the sum of the computed compression of $a$ is always less than the relaxed minimum compression threshold. In this case the compression records of $a$ from 1 to $m$ are considered unpromising and are pruned.

We define a similar procedure for pruning the variations of a general pattern. We prune a variation $a_k$ if the conditions in Equations 10 and 11 hold.
$$\exists n \le \tau,\ \forall i,\ 1 \le i \le n,\quad \beta_i(a_k) < \bar{\beta}_i(a) \qquad (10)$$

$$\forall l,\ 1 \le l \le m \le n,\quad \sum_{j=1}^{l} \beta_j(a_k) < l\,(\bar{\beta}_l(a) - \epsilon_v) \qquad (11)$$

Equation 10 finds a point in time where the computed compression of a variation is less than the average computed compression of all variations in that time unit. Equation 11 computes the time unit $m$, where $1 \le m \le n$, such that before that point the sum of the computed compression of $a_k$ is always less than the relaxed threshold. In this case the compression records of $a_k$ from 1 to $m$ are considered unpromising and are pruned.

V. EXPERIMENTS

The performance of the system was evaluated on the data collected from two smart apartments. The layout of the apartments, including sensor placement and location tags, is shown in Figure 5. We will refer to the apartments in Figures 5a and 5b as apartment 1 and apartment 2. The apartments were equipped with infrared motion sensors installed on ceilings, infrared area sensors installed on walls, and switch contact sensors to detect the open/close status of the doors and cabinets. The data was collected during 17 weeks for apartment 1, and during 19 weeks for apartment 2. During data collection, the resident in apartment 2 was away for approximately 20 days, once during week 12 and once during week 17. Also, the last week of data collection in both apartments does not include a full cycle. In our experiments, we constrain each batch to contain approximately one week of data. We set the maximum errors $\epsilon_s$, $\epsilon_g$ and $\epsilon_v$ to 0.1, as suggested in the literature. Also, $\sigma_g$ was set to 0.75 based on several runs of experiments.
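The tail pruning conditions for general patterns (Equations 8 and 9) can be illustrated with a short sketch that uses the thresholds reported above as defaults. The function name `tail_prune_count` is hypothetical, and the sketch assumes that the compression records are stored oldest first, so that list index 0 corresponds to time unit 1. It is one reading of the two conditions, not the authors' code.

```python
def tail_prune_count(compressions, sigma_g=0.75, eps_g=0.1):
    """Return m, the number of oldest compression records to prune.

    compressions -- compression values alpha_1, ..., alpha_tau, oldest first
    sigma_g      -- minimum compression
    eps_g        -- maximum compression error
    """
    # Equation 8: n is the length of the longest prefix in which every
    # record is below the minimum compression.
    n = 0
    for value in compressions:
        if value >= sigma_g:
            break
        n += 1

    # Equation 9: m <= n is the longest prefix in which every running sum
    # stays below l times the relaxed threshold (sigma_g - eps_g).
    m = 0
    running = 0.0
    for l, value in enumerate(compressions[:n], start=1):
        running += value
        if running >= l * (sigma_g - eps_g):
            break
        m = l
    return m


# Hypothetical history: three weak weeks followed by a strong one.
history = [0.2, 0.3, 0.7, 0.9, 0.1]
m = tail_prune_count(history)
remaining = history[m:]
```

In this example the first three records fall below both thresholds and are dropped, while the record that reaches the minimum compression stops the scan. The same loop applies to variations (Equations 10 and 11) if the fixed threshold is replaced by the per-unit average variation compression.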

Fig. 5: Sensor map and location tags for each apartment. On the map, circles show motion sensors while triangles show switch contact sensors. (a) Apartment 1. (b) Apartment 2.

Fig. 6: Total number of recorded sensor events over time (time unit = weeks). (a) Apartment 1. (b) Apartment 2.

Fig. 7: Total number of distinct discovered patterns over time. (a) Apartment 1 (time unit = weeks). (b) Apartment 2 (time unit = weeks).

To be able to evaluate the results of our algorithms based on a ground truth, each of the datasets was annotated with activities of interest. A total of 10 activities were noted for each apartment. Those activities include bathing, bed-toilet transition, eating, leave home/enter home, housekeeping, meal preparation (cooking), personal hygiene, sleeping in bed, sleeping not in bed (relaxing) and taking medicine. Apartment 1 includes 193,592 sensor events and 3,384 annotated activity instances. Apartment 2 includes 132,550 sensor events and 2,602 annotated activity instances. Figures 6a and 6b show the number of recorded sensor events over time. As we mentioned, the resident of apartment 2 was not at home during two different time periods, hence the gaps in Figure 6b.

We ran our algorithms on both apartments' datasets. Figures 7a and 7b show the number of distinct discovered patterns over time when using a global support, as employed by DVSM, versus using multiple regional supports as in our proposed method. The results confirm our hypothesis that our proposed method is able to detect a higher percentage of interesting patterns using multiple regional support values. Some of the patterns that have not been discovered are indeed quite difficult to spot and in some cases also less frequent. For example, the housekeeping activity happens every 2-4 weeks and is not associated with any specific sensor.
Also, some of the similar patterns are merged together, as they use the same set of sensors, such as the eating and relaxing activities. It should be noted that some of the activities are discovered multiple times in the form of different patterns, as the activity might be performed in a different motion trajectory using different sensors. One can also see that the number of discovered patterns increases at the beginning and then is adjusted over time depending on the perceived patterns in the data. The number of discovered patterns depends on the perceived patterns in the current data batch and previous batches, as well as the compression of patterns in the tilted-time window records. Therefore, some of the patterns might disappear and reappear over time, which can be a measure of how consistently the resident performs those activities.

As already mentioned, to reduce the number of discovered patterns over time, our algorithm performs two types of pruning. The first type, called normal pruning, prunes patterns and variations while processing the current data batch. The second type is based on tail pruning to discard unpromising patterns and variations stored in the tilted-time window. Figures 8a and 8b show the results of both types of pruning on the first dataset. Figures 8d and 8e show the results of both types of pruning on the second dataset. Figures 8c and 8f show the tail pruning results in the tilted-time window over time. Note that the gaps for apartment 2 results are due

to the 20 days when the resident was away. By comparing the results of normal pruning in Figures 8a and 8d against the number of recorded sensor events in Figures 6a and 6b, one can see that the normal pruning roughly follows the pattern of recorded sensor events. If more sensor events are available, more patterns are obtained, and more patterns are pruned. For the tail pruning results, depicted in Figures 8b, 8e, 8c and 8f, the number of tail-pruned patterns at first increases in order to discard the many unpromising patterns at the beginning. Then the number of tail-pruned patterns decreases over time as the algorithm stabilizes.

To better illustrate the discovered patterns and variations, several patterns are shown in Figure 9. We have developed visualization software that allows us to visualize the patterns and variations along with their statistics. Figure 9a shows a taking medication activity in apartment 1 that was pruned at the third week due to its low compression. Figure 9b shows two variations of the leave home activity pattern in apartment 2. Note that we use color coding to differentiate between variations if multiple variations are chosen to be shown simultaneously. Figure 9c shows the meal preparation pattern in apartment 2.

To see how consistent the variations of a certain general pattern are, we use a measure called variation consistency, based on the externally provided labels. We define the variation consistency as in Equation 12. Here $v_{11}$ refers to the number of variations that have the same label as their general pattern, and $v_{01}$ and $v_{10}$ refer to the number of variations that have a label different from their general pattern's label.

$$purity(P_i) = \frac{v_{11}}{v_{11} + v_{01} + v_{10}} \qquad (12)$$

Figures 10a and 10b show the average variation consistency for apartments 1 and 2. As mentioned already, for each current batch of data the irrelevant variations are discarded using a mutual information method.
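As a small worked example of Equation 12, the sketch below computes the variation consistency from the three counts. The function name `variation_consistency` and the sample counts are hypothetical, and returning 0.0 for a pattern with no variations is an assumption of the sketch, since the text does not cover that case.

```python
def variation_consistency(v11, v01, v10):
    """Variation consistency (purity) of one general pattern, Equation 12.

    v11      -- variations whose label matches the general pattern's label
    v01, v10 -- variations whose label differs from the general pattern's
    """
    total = v11 + v01 + v10
    if total == 0:
        # Assumption: a pattern without variations gets consistency 0.
        return 0.0
    return v11 / total


# Hypothetical pattern with 8 matching and 2 mismatching variations.
score = variation_consistency(8, 1, 1)
```

Discarding a mismatching variation lowers the denominator without changing $v_{11}$, which is why pruning irrelevant variations can only keep the measure the same or raise it.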
The results confirm that the variation consistency increases at the beginning, and then quickly stabilizes because irrelevant variations are discarded for each batch.

To show the changes of a specific pattern over time, we show the results of our algorithm for the taking medication activity. Figure 11a shows the number of discovered variations over time for the taking medication activity. Figure 11b shows the same results in the tilted-time window over time. We can clearly see that the number of discovered variations quickly drops due to the tail pruning process. This shows that although we maintain time records for all variations, many of the uninteresting, unpromising and irrelevant variations are pruned over time, making our algorithm more efficient in practice. We also show how the average duration of the taking medication pattern changes over time in Figure 11c. Presenting such information can help caregivers detect anomalous events in the patterns. Figure 11d shows the consistency of the taking medication variations over time. Similar to the results obtained for the average variation consistency of all patterns, we see that the variation consistency increases and then stabilizes quickly.

In summary, the results of our experiments confirm that we can find sequential patterns from a stream of sensor data over time. They also show that using two types of pruning techniques allows a large number of unpromising, uninteresting and irrelevant patterns and variations to be discarded, in order to achieve a more efficient solution that can be used in practice.

VI. CONCLUSIONS AND FUTURE WORK

In this paper we presented a method for discovering sequential patterns over time from a stream of sensor data. We provided an extension of the tilted-time window model for continuity-based, varied-order sequential patterns according to the special requirements of our application domain.
Not only can our proposed method be used in an activity discovery and recognition system, but it can also be applied to other application domains, such as Web click mining. In the future, we intend to apply our method to other application domains, as well as to use our model in a fully functional system deployed in a real home. We also intend to design anomaly detection methods to detect anomalies in the observed data over time.

REFERENCES

[1] M. C. Mozer, R. H. Dodier, M. Anderson, L. Vidmar, R. F. C. III, and D. Miller, "The neural network house: An overview," in Proceedings of the American Association for Artificial Intelligence Spring Symposium on Intelligent Environments, 1998, pp. 110-114.
[2] O. Brdiczka, J. Maisonnasse, and P. Reignier, "Automatic detection of interaction groups," in Proceedings of the 7th International Conference on Multimodal Interfaces, 2005, pp. 32-36.
[3] M. Philipose, K. Fishkin, M. Perkowitz, D. Patterson, D. Fox, H. Kautz, and D. Hahnel, "Inferring activities from interactions with objects," IEEE Pervasive Computing, vol. 3, no. 4, pp. 50-57, Oct.-Dec. 2004.
[4] U. Maurer, A. Smailagic, D. P. Siewiorek, and M. Deisher, "Activity recognition and monitoring using multiple sensors on different body positions," in BSN '06: Proceedings of the International Workshop on Wearable and Implantable Body Sensor Networks, 2006, pp. 113-116.
[5] L. Liao, D. Fox, and H. Kautz, "Location-based activity recognition using relational Markov networks," in Proceedings of the International Joint Conference on Artificial Intelligence, 2005, pp. 773-778.
[6] T. Inomata, F. Naya, N. Kuwahara, F. Hattori, and K. Kogure, "Activity recognition from interactions with objects using dynamic Bayesian network," in Casemans '09: Proceedings of the 3rd ACM International Workshop on Context-Awareness for Self-Managing Systems, 2009, pp. 39-42.
[7] T. Gu, Z. Wu, X. Tao, H. Pung, and J. Lu, "epSICAR: An emerging patterns based approach to sequential, interleaved and concurrent activity recognition," in Proceedings of the IEEE International Conference on Pervasive Computing and Communication, 2009.
[8] J. Pei, J. Han, and W. Wang, "Constraint-based sequential pattern mining: the pattern-growth methods," Journal of Intelligent Information Systems, vol. 28, no. 2, pp. 133-160, 2007.
[9] P. Rashidi and D. J. Cook, "Keeping the resident in the loop: Adapting the smart home to the user," IEEE Transactions on Systems, Man, and Cybernetics, Part A, vol. 39, no. 5, pp. 949-959, September 2009.
[10] B. Schiele, "Unsupervised discovery of structure in activity data using multiple eigenspaces," in 2nd International Workshop on Location and Context Awareness. Springer, 2006.
[11] P. Rashidi, D. J. Cook, L. Holder, and M. Schmitter-Edgecombe, "Discovering activities to recognize and track in a smart environment," IEEE Transactions on Knowledge and Data Engineering, 2010.
[12] J. Ren and C. Huo, "Mining closed frequent itemsets in sliding window over data streams," in Innovative Computing Information and Control, 2008. ICICIC '08. 3rd International Conference on, 18-20 2008, pp. 76-76.

Fig. 8: Total number of tail-pruned variations over time. For the bar charts, W1-W4 refers to weeks 1-4, and M1-M5 refers to months 1-5. (a) Apartment 1 (time unit = weeks). (b) Apartment 1 (time unit = weeks). (c) Apartment 1 (time unit = tilted-time frame). (d) Apartment 2 (time unit = weeks). (e) Apartment 2 (time unit = weeks). (f) Apartment 2 (time unit = tilted-time frame).

Fig. 9: Visualization of patterns and variations. (a) A visualization of the taking medication variation in apartment 1. (b) Two variations of leave home pattern in apartment 1. (c) Meal preparation pattern in apartment 2.

[13] C. Giannella, J. Han, J. Pei, X. Yan, and P. S. Yu, Mining Frequent Patterns in Data Streams at Multiple Time Granularities. MIT Press, 2003, ch. 3.
[14] H.-F. Li, S.-Y. Lee, and M.-K. Shan, "An efficient algorithm for mining frequent itemsets over the entire history of data streams," in Proc. of First International Workshop on Knowledge Discovery in Data Streams, 2004.
[15] G. S. Manku and R. Motwani, "Approximate frequency counts over data streams," in VLDB '02: Proceedings of the 28th International Conference on Very Large Data Bases. VLDB Endowment, 2002, pp. 346-357.
[16] G. Chen, X. Wu, and X. Zhu, "Sequential pattern mining in multiple streams," in ICDM '05: Proceedings of the Fifth IEEE International Conference on Data Mining. Washington, DC, USA: IEEE Computer Society, 2005, pp. 585-588.
[17] A. Marascu and F. Masseglia, "Mining sequential patterns from data streams: a centroid approach," Journal of Intelligent Information Systems, vol. 27, no. 3, pp. 291-307, 2006.
[18] C. Raïssi, P. Poncelet, and M. Teisseire, "Need for speed: Mining sequential patterns in data streams," in BDA05: Actes des 21iemes Journees Bases de Donnees Avancees, October 2005.
[19] T. L. Hayes, M. Pavel, N. Larimer, I. A. Tsay, J. Nutt, and A. G. Adami, "Distributed healthcare: Simultaneous assessment of multiple individuals," IEEE Pervasive Computing, vol. 6, no. 1, pp. 36-43, 2007.
[20] R. Agrawal and R. Srikant, "Mining sequential patterns," in ICDE '95: Proceedings of the Eleventh International Conference on Data Engineering. Washington, DC, USA: IEEE Computer Society, 1995, pp. 3-14.
[21] J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal, and M. Hsu, "PrefixSpan: Mining sequential patterns by prefix-projected growth," in Proceedings of the 17th International Conference on Data Engineering. Washington, DC, USA: IEEE Computer Society, 2001, pp. 215-224.
[22] J. Wang and J. Han, "BIDE: Efficient mining of frequent closed sequences," in ICDE '04: Proceedings of the 20th International Conference on Data Engineering. Washington, DC, USA: IEEE Computer Society, 2004, p. 79.

Fig. 10: Total number of distinct discovered patterns and their variation consistency over time. (a) Apartment 1 (time unit = tilted-time frame). (b) Apartment 2 (time unit = weeks).

Fig. 11: Number of discovered variations, duration and consistency for taking medication activity pattern over time. (a) Number of discovered variations (time unit = weeks). (b) Number of discovered variations (time unit = tilted-time frame). (c) Duration (time unit = weeks). (d) Variation consistency (time unit = weeks).

[23] F. Masseglia, F. Cathala, and P. Poncelet, "The PSP approach for mining sequential patterns," in PKDD '98: Proceedings of the Second European Symposium on Principles of Data Mining and Knowledge Discovery. London, UK: Springer-Verlag, 1998, pp. 176-184.
[24] M. Garofalakis, J. Gehrke, and R. Rastogi, "Querying and mining data streams: you only get one look a tutorial," in SIGMOD '02: Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data. New York, NY, USA: ACM, 2002, pp. 635-635.
[25] J. H. Chang and W. S. Lee, "Finding recent frequent itemsets adaptively over online data streams," in KDD '03: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, NY, USA: ACM, 2003, pp. 487-492.
[26] W.-G. Teng, M.-S. Chen, and P. S. Yu, "A regression-based temporal pattern mining scheme for data streams," in VLDB 2003: Proceedings of the 29th International Conference on Very Large Data Bases. VLDB Endowment, 2003, pp. 93-104.
[27] J. Han, Y. Chen, G. Dong, J. Pei, B. W. Wah, J. Wang, and Y. D. Cai, "Stream cube: An architecture for multi-dimensional analysis of data streams," Distrib. Parallel Databases, vol. 18, no. 2, pp. 173-197, 2005.
[28] C. Raïssi and M. Plantevit, "Mining multidimensional sequential patterns over data streams," in DaWaK '08: Proceedings of the 10th International Conference on Data Warehousing and Knowledge Discovery. Berlin, Heidelberg: Springer-Verlag, 2008, pp. 263-272.
[29] S. Papadimitriou, A. Brockwell, and C. Faloutsos, "Adaptive, hands-off stream mining," in VLDB 2003: Proceedings of the 29th International Conference on Very Large Data Bases. VLDB Endowment, 2003, pp. 560-571.
[30] K. K. Loo, I. Tong, B. Kao, and D. Cheung, "Online algorithms for mining inter-stream associations from large sensor networks," in PAKDD, 2005.
[31] J. Cheng, Y. Ke, and W. Ng, "A survey on algorithms for mining frequent itemsets over data streams," Knowledge and Information Systems, vol. 16, no. 1, pp. 1-27, 2008.
[32] B. Liu, W. Hsu, and Y. Ma, "Mining association rules with multiple minimum supports," in KDD '99: Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, NY, USA: ACM, 1999, pp. 337-341.
[33] J. Rissanen, "Modeling by shortest data description," Automatica, vol. 14, pp. 465-471, 1978.
[34] I. Guyon and A. Elisseeff, "An introduction to variable and feature selection," Machine Learning Research, vol. 3, pp. 1157-1182, 2003.