A top down approach for determining the load profiles of consumers. Nimai

Elvin Lucas
5 years ago

2 INTRODUCTION Load profiles are a useful tool in the retail power market, where small consumers generally do not have the appropriate metering equipment. Information about consumers' load profiles can help distribution companies improve their market strategies, develop new tariffs and offer new services. Because buying and installing new meters would dramatically increase costs, a payment method based on load profiles must be established. Our aim is therefore to propose a methodology that is not time-consuming and that forms clear, representative groups. The following slides outline a framework from which a model for determining load profiles can readily be built.

3 Profile-Determining Methodology The methodology broadly consists of the following four steps: I. Acquiring consumer data. II. Preprocessing this data to reduce noise and to normalize it. III. Breaking the data down into clusters. IV. Allocating the load profile clusters (LPCs) to groups of eligible consumers.

4 I. Acquiring consumer data Different distribution companies carried out measurements of the load profiles of individual, eligible consumers. From these we select the appropriate range of consumers (e.g., in [1], consumers in the range 41 kW to 300 kW are selected). Note that all the measurements need to be simultaneous and spread across the target area. Taking the example of [1], the load-profile sampling interval for the Slovenian distribution system was 15 minutes, so each measured load profile (MLP) obtained from the distribution companies is represented by 96 values per day, each value being the average power over one sampling interval. For the USA, a good place to start is with two publicly available, national-scope data sets on household-level electricity usage: the Consumer Expenditure Survey (CEX) and the EIA's Residential Energy Consumption Survey (RECS). The CEX collects data through quarterly interviews of about 7,500 households, while the RECS collects information on residential energy use from fewer than 5,000 households and is conducted about every five years.
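The arithmetic behind the 96 values per day (24 h × 60 min / 15 min) and the slicing of a raw meter series into daily MLPs can be sketched in Python as below. This is a minimal illustration under assumptions: the helper names are hypothetical, readings are taken to be contiguous and to start at midnight, and an incomplete final day is simply dropped.

```python
SAMPLING_MINUTES = 15
VALUES_PER_DAY = 24 * 60 // SAMPLING_MINUTES  # 1440 min / 15 min = 96 values


def split_into_daily_mlps(readings):
    """Split 15-minute average-power readings (kW) into daily 96-value MLPs.

    Hypothetical helper: assumes readings start at midnight and are
    contiguous; samples from an incomplete final day are dropped.
    """
    n_days = len(readings) // VALUES_PER_DAY
    return [
        readings[day * VALUES_PER_DAY:(day + 1) * VALUES_PER_DAY]
        for day in range(n_days)
    ]


def daily_energy_kwh(mlp):
    """Energy of one day: each value is an average power over 0.25 h."""
    return sum(mlp) * SAMPLING_MINUTES / 60


# Synthetic readings: two full days at a constant 50 kW plus 10 extra samples.
readings = [50.0] * (2 * VALUES_PER_DAY + 10)
mlps = split_into_daily_mlps(readings)
print(len(mlps), len(mlps[0]))     # 2 96
print(daily_energy_kwh(mlps[0]))   # 1200.0
```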

5 II. Noise Elimination De-noising is done using wavelet multiresolution analysis, which is based on the wavelet transform and can perform a fast analysis of a signal, in our case a measured load profile (MLP). Wavelet analysis is a powerful technique that can also be used to draw further conclusions about the MLP beyond noise reduction, so the next few slides provide a basic understanding of it.

6 Introduction to Wavelets Fourier analysis breaks down a signal into constituent sinusoids of different frequencies; in other words, it transforms the view of the signal from time-based to frequency-based. But with the Fourier transform we lose the time information: when looking at the Fourier transform of a signal, it is impossible to tell when a particular event took place.
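A small Python check makes the loss of time information concrete: a single spike placed early or late in a signal has exactly the same magnitude spectrum. The naive `dft` helper is written out only to keep the sketch dependency-free. Strictly, the timing is still encoded in the phase, but it cannot be read off the magnitude spectrum that is usually inspected.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform, O(N^2); adequate for a short demo."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


# The same spike (an "event") occurring early and late in a 16-sample signal.
early = [0.0] * 16
early[2] = 1.0
late = [0.0] * 16
late[11] = 1.0

mag_early = [abs(c) for c in dft(early)]
mag_late = [abs(c) for c in dft(late)]

# Both magnitude spectra are flat and identical, so they cannot reveal when
# the spike happened; the timing survives only in the phase.
print(all(abs(a - b) < 1e-9 for a, b in zip(mag_early, mag_late)))  # True
```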

7 Short-Time Fourier Analysis In an effort to correct this deficiency, Dennis Gabor (1946) adapted the Fourier transform to analyze only a small section of the signal at a time, a technique called windowing the signal. Gabor's adaptation, called the Short-Time Fourier Transform (STFT), maps a signal into a two-dimensional function of time and frequency. The STFT represents a sort of compromise between the time-based and frequency-based views of a signal. It provides some information about both when and at what frequencies a signal event occurs. However, you can only obtain this information with limited precision, and that precision is determined by the size of the window. Many signals require a more flexible approach, one in which the window size can vary to determine either time or frequency more accurately.
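A minimal STFT can be sketched by windowing successive segments of the signal and taking the DFT of each. The Hann window, the 32-sample window and hop, and the synthetic signal (a slow oscillation followed by a fast one) are illustrative assumptions. The dominant frequency bin per frame shows when the frequency changes, but only to within one window length, which is exactly the fixed-precision limitation described above.

```python
import cmath
import math


def stft_magnitudes(x, window_len, hop):
    """Short-time Fourier transform: |DFT| of Hann-windowed segments.

    Returns one list of magnitudes (bins 0..window_len // 2) per frame."""
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * i / window_len)
            for i in range(window_len)]
    frames = []
    for start in range(0, len(x) - window_len + 1, hop):
        seg = [x[start + i] * hann[i] for i in range(window_len)]
        frames.append([
            abs(sum(seg[t] * cmath.exp(-2j * math.pi * k * t / window_len)
                    for t in range(window_len)))
            for k in range(window_len // 2 + 1)
        ])
    return frames


# 128 samples: 2 cycles per 32 samples for the first half, then 8 cycles per 32.
signal = [math.sin(2 * math.pi * (2 if t < 64 else 8) * t / 32) for t in range(128)]
frames = stft_magnitudes(signal, window_len=32, hop=32)
dominant = [max(range(len(f)), key=f.__getitem__) for f in frames]
print(dominant)  # [2, 2, 8, 8]
```

A longer window would sharpen the frequency bins but blur the moment of the switch; a shorter one does the opposite.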

8 Wavelet Analysis Wavelet analysis represents the next logical step: a windowing technique with variable-sized regions. Wavelet analysis allows the use of long time intervals where we want more precise low-frequency information, and shorter regions where we want high-frequency information. One major advantage afforded by wavelets is the ability to perform local analysis -- that is, to analyze a localized area of a larger signal. Wavelet analysis is capable of revealing aspects of data that other signal analysis techniques miss, aspects like trends, breakdown points, discontinuities in higher derivatives, and self-similarity. Furthermore, because it affords a different view of data than those presented by traditional techniques, wavelet analysis can often compress or de-noise a signal without appreciable degradation.

9 What is Wavelet Analysis? A wavelet is a waveform of effectively limited duration that has an average value of zero. Fourier analysis consists of breaking up a signal into sine waves of various frequencies. Similarly, wavelet analysis is the breaking up of a signal into shifted and scaled versions of the original (or mother) wavelet. Just looking at pictures of wavelets and sine waves, you can see intuitively that signals with sharp changes might be better analyzed with an irregular wavelet than with a smooth sinusoid, just as some foods are better handled with a fork than a spoon. It also makes sense that local features can be described better with wavelets that have local extent

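10 Continuous Wavelet Transform The Fourier transform of a signal f(t) is $F(\omega) = \int_{-\infty}^{\infty} f(t)\, e^{-i\omega t}\, dt$. The continuous wavelet transform (CWT) is defined analogously, as the integral over all time of the signal multiplied by scaled, shifted versions of the wavelet $\psi$: $C(\text{scale}, \text{position}) = \int_{-\infty}^{\infty} f(t)\, \psi(\text{scale}, \text{position}, t)\, dt$. The following slides show how the CWT of a signal can easily be computed.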

11 Step 1: Take a wavelet and compare it to a section at the start of the original signal. Step 2: Calculate a number, C, that represents how closely correlated the wavelet is with this section of the signal. The higher C is, the greater the similarity.

12 Step 3: Shift the wavelet to the right and repeat steps 1-2 until you've covered the whole signal. Step 4: Scale (stretch) the wavelet and repeat steps 1-3. Step 5: Repeat steps 1 through 4 for all scales.
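Steps 1-5 translate almost directly into a brute-force CWT. This Python sketch assumes a Mexican-hat (Ricker) mother wavelet, which the slides do not specify, and a synthetic signal with a single bump; practical toolboxes compute the same correlations far more efficiently.

```python
import math


def mexican_hat(t):
    """Mexican-hat (Ricker) mother wavelet, unnormalized; it has zero mean."""
    return (1.0 - t * t) * math.exp(-t * t / 2.0)


def naive_cwt(signal, scales):
    """Brute-force CWT following steps 1-5: for every scale, slide the
    stretched wavelet along the signal and record the correlation C."""
    coefs = []
    for a in scales:                          # steps 4-5: every scale
        row = []
        for b in range(len(signal)):          # step 3: every shift
            # steps 1-2: correlate the scaled, shifted wavelet with the signal
            c = sum(f * mexican_hat((t - b) / a) for t, f in enumerate(signal))
            row.append(c / math.sqrt(a))      # usual 1/sqrt(scale) normalization
        coefs.append(row)
    return coefs  # coefs[i][b] is C(scales[i], b)


# A flat signal with one smooth bump centred on sample 30.
signal = [math.exp(-((t - 30) / 3.0) ** 2) for t in range(64)]
scales = [1, 2, 4, 8]
coefs = naive_cwt(signal, scales)
best_i, best_pos = max(
    ((i, b) for i in range(len(scales)) for b in range(len(signal))),
    key=lambda ib: coefs[ib[0]][ib[1]],
)
print(scales[best_i], best_pos)  # 4 30
```

On this signal the largest coefficient C sits at sample 30, the centre of the bump, and at scale 4, the tried scale that best matches the bump's width.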

13 How to make sense of all these coefficients? You could make a plot on which the x-axis represents position along the signal (time), the y-axis represents scale, and the color at each x-y point represents the magnitude of the wavelet coefficient C. These are the coefficient plots generated by the graphical tools.

14 Discrete Wavelet Transform Calculating wavelet coefficients at every possible scale is a fair amount of work, and it generates an awful lot of data. What if we choose only a subset of scales and positions at which to make our calculations? It turns out, rather remarkably, that if we choose scales and positions based on powers of two (so-called dyadic scales and positions), then our analysis will be much more efficient and just as accurate. We obtain such an analysis from the discrete wavelet transform (DWT). An efficient way to implement this scheme using filters was developed in 1988 by Mallat. The decomposition process can be iterated (Fig. a shows a single iteration), with successive approximations being decomposed in turn, so that one signal is broken down into many lower-resolution components (with downsampling carried out to control the number of samples). This is called the wavelet decomposition tree (Fig. b); each decomposition yields its own set of coefficients. [Fig. a: the input signal passes through a low-pass filter (LPF), giving the approximation A, and a high-pass filter (HPF), giving the detail D. Fig. b: the wavelet decomposition tree.]
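Mallat's filter-bank scheme is easiest to see with the Haar wavelet, whose low-pass and high-pass filters are pairwise sums and differences scaled by 1/√2. The Python sketch below assumes Haar filters (the slides do not name a wavelet) and an even-length signal at every level; each level filters and downsamples by two, and iterating on the approximation builds the decomposition tree.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_step(x):
    """One level of Fig. a: low-pass (A) and high-pass (D) Haar filtering,
    each followed by downsampling by two. len(x) must be even."""
    pairs = list(zip(x[0::2], x[1::2]))
    approx = [(a + b) / SQRT2 for a, b in pairs]
    detail = [(a - b) / SQRT2 for a, b in pairs]
    return approx, detail


def haar_decompose(x, levels):
    """Decomposition tree: keep splitting the approximation.

    Returns [A_L, D_L, ..., D_2, D_1] for L = levels."""
    approx, details = list(x), []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return [approx] + details[::-1]


signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
coeffs = haar_decompose(signal, levels=3)
print([len(c) for c in coeffs])  # [1, 1, 2, 4]
```

Because the Haar filters are orthonormal, the coefficients keep the signal's energy (sum of squares), which makes them convenient to threshold.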

15 Wavelet Reconstruction Reconstruction (or synthesis) is the process of assembling all the components back into the original signal. As one would expect, it reverses the decomposition described on the previous slide, with upsampling (inserting a zero between every two coefficients) in place of downsampling.
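Reconstruction and wavelet de-noising (step II of the methodology) can be sketched together, again assuming the Haar wavelet. For Haar, upsampling by zero insertion followed by the synthesis filters reduces to the two-term formula in `haar_reconstruct`. The soft-threshold rule and the threshold value 0.5 are illustrative assumptions, not the settings used in [1], and the "noise" is a synthetic alternating ±0.3 pattern added to an idealised two-level daily profile of 96 values.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_decompose(x, levels):
    """Haar decomposition tree: returns [A_L, D_L, ..., D_1]."""
    approx, details = list(x), []
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / SQRT2 for a, b in pairs])
        approx = [(a + b) / SQRT2 for a, b in pairs]
    return [approx] + details[::-1]


def haar_reconstruct(coeffs):
    """Inverse transform: upsample and filter level by level (Haar case)."""
    approx = coeffs[0]
    for detail in coeffs[1:]:
        out = []
        for a, d in zip(approx, detail):
            out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
        approx = out
    return approx


def soft_threshold(c, thr):
    """Shrink a coefficient toward zero; |c| <= thr becomes exactly 0."""
    return math.copysign(max(abs(c) - thr, 0.0), c)


def denoise(x, levels, thr):
    """Threshold the detail coefficients, keep the approximation, rebuild."""
    coeffs = haar_decompose(x, levels)
    cleaned = [coeffs[0]] + [[soft_threshold(c, thr) for c in d] for d in coeffs[1:]]
    return haar_reconstruct(cleaned)


clean = [1.0] * 48 + [3.0] * 48                     # idealised two-level day
noisy = [v + (0.3 if t % 2 else -0.3) for t, v in enumerate(clean)]
restored = haar_reconstruct(haar_decompose(noisy, levels=3))
smoothed = denoise(noisy, levels=3, thr=0.5)
print(max(abs(a - b) for a, b in zip(restored, noisy)) < 1e-9)   # True
print(max(abs(a - b) for a, b in zip(smoothed, clean)) < 1e-9)   # True
```

Without thresholding the signal is rebuilt exactly; with it, the small level-1 details carrying the alternating pattern are removed while the step in the profile survives.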

16 III. Clustering Methods Clustering methods are a part of pattern-recognition methods. They are a popular approach to unsupervised classification, in which each pattern is assigned to a hitherto unknown class. The pattern-recognition literature offers many different types of clustering algorithms, such as self-organizing maps (SOMs), k-means clustering (KM), fuzzy c-means (FCM) and hierarchical clustering (HC), each with its own advantages and limitations. The next few slides cover the basics of clustering methods.

17 What is Clustering? Organizing data into clusters such that there is high intra-cluster similarity and low inter-cluster similarity; informally, finding natural groupings among objects. Organizing data into clusters shows the internal structure of the data (e.g., clustering genes). Sometimes the partitioning is itself the goal (e.g., market segmentation). Clustering techniques are useful for knowledge discovery in data (e.g., underlying rules, recurring patterns, topics).

18 There is no objectively "correct" clustering algorithm; as has been noted, "clustering is in the eye of the beholder."

19 Desirable Properties of a Clustering Algorithm Ability to deal with different data types; minimal requirements for domain knowledge to determine input parameters; ability to deal with noise and outliers; insensitivity to the order of input records; incorporation of user-specified constraints; interpretability and usability.

20 Two Types of Clustering Partitional algorithms: construct various partitions and then evaluate them by some criterion. Hierarchical algorithms: create a hierarchical decomposition of the set of objects using some criterion. [Figure: hierarchical vs. partitional clustering]

21 Hierarchical Clustering Bottom-up (agglomerative): starting with each item in its own cluster, find the best pair to merge into a new cluster. Repeat until all clusters are fused together. Top-down (divisive): starting with all the data in a single cluster, consider every possible way to divide the cluster into two. Choose the best division and recursively operate on both sides. To decide which clusters should be combined (for agglomerative) or where a cluster should be split (for divisive), a measure of dissimilarity between sets of observations is required. In most methods of hierarchical clustering, this is achieved by using an appropriate metric (a measure of distance between pairs of observations) and a linkage criterion, which specifies the dissimilarity of sets as a function of the pairwise distances of observations in the sets.

22 Example of a Bottom-Up Approach We begin with a distance matrix which contains the distances between every pair of objects in our database. [Figure: example distance matrix with pairwise distances D(·,·), e.g. 8]
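Because the slide's example matrix did not survive, the Python sketch below uses a made-up 5-object distance matrix. It implements the bottom-up procedure from the previous slide with single linkage (minimum pairwise distance between sets) as an assumed linkage criterion; passing `max` gives complete linkage instead. The naive search is roughly O(n³) overall and is meant only for small examples.

```python
def agglomerative(dist, linkage=min):
    """Bottom-up clustering from a symmetric distance matrix.

    linkage=min gives single linkage, linkage=max complete linkage.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples."""
    clusters = [frozenset([i]) for i in range(len(dist))]
    history = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Dissimilarity of two sets from the pairwise distances.
                d = linkage(dist[p][q] for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        history.append((sorted(clusters[i]), sorted(clusters[j]), d))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return history


# Made-up distances between five objects.
dist = [
    [0, 2, 6, 10, 9],
    [2, 0, 5, 9, 8],
    [6, 5, 0, 4, 5],
    [10, 9, 4, 0, 3],
    [9, 8, 5, 3, 0],
]
for step in agglomerative(dist):
    print(step)
# ([0], [1], 2)
# ([3], [4], 3)
# ([2], [3, 4], 4)
# ([0, 1], [2, 3, 4], 5)
```

The merge distances, read bottom to top, are the heights of the dendrogram.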

23 Summary of Hierarchical Clustering Methods No need to specify the number of clusters in advance. The hierarchical nature maps nicely onto human intuition for some domains. They do not scale well: time complexity of at least O(n^2), where n is the total number of objects. Like any heuristic search algorithm, they can get stuck in local optima. Interpretation of results is (very) subjective.

24 Partitional Clustering Nonhierarchical: each instance is placed in exactly one of K non-overlapping clusters. Since only one set of clusters is output, the user normally has to input the desired number of clusters K. (This poses a real problem, since there is no deterministic way to choose a K that guarantees efficient clustering.) Some of the popular methods are discussed in the next few slides. [Figure: example partition with K = 2]

25 K-Means Clustering Given a set of observations $(x_1, x_2, \ldots, x_n)$, where each observation is a d-dimensional real vector, k-means clustering aims to partition the n observations into $k \le n$ sets $S = \{S_1, S_2, \ldots, S_k\}$ so as to minimize the within-cluster sum of squares. In other words, its objective is to find $\arg\min_{S} \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^2$, where $\mu_i$ is the mean of the points in $S_i$.

26 Algorithm k-means 1. Decide on a value for k. 2. Initialize the k cluster centers (randomly, if necessary). 3. Decide the class memberships of the N objects by assigning them to the nearest cluster center. 4. Re-estimate the k cluster centers, by assuming the memberships found above are correct. 5. If none of the N objects changed membership in the last iteration, exit. Otherwise go to 3.
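The five steps map onto a short pure-Python version of Lloyd's algorithm. The toy data (two well-separated groups of four points), the fixed random seed, and the rule that an emptied cluster keeps its previous center are assumptions for illustration; the last lines evaluate the within-cluster sum of squares objective from the previous slide.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    return [sum(coord) / len(points) for coord in zip(*points)]


def k_means(points, k, seed=0, max_iter=100):
    """Lloyd's algorithm following steps 1-5 (step 1, choosing k, is the caller's)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]      # step 2: random init
    labels = None
    for _ in range(max_iter):
        new_labels = [                                        # step 3: nearest center
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:                              # step 5: no change, stop
            break
        labels = new_labels
        for j in range(k):                                    # step 4: re-estimate
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an emptied cluster keeps its old center (an assumption)
                centers[j] = mean(members)
    return centers, labels


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centers, labels = k_means(points, k=2)
wcss = sum(squared_distance(p, centers[lab]) for p, lab in zip(points, labels))
print(labels, wcss)
```

Rerunning with several seeds and keeping the lowest `wcss` is a common, simple guard against poor local optima.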

27 K-means Clustering: Step [figure]

28 K-means Clustering: Step [figure]

29 K-means Clustering: Step [figure]

30 Comments on the K-Means Method Strengths: relatively efficient, O(tkn), where n is the number of objects, k the number of clusters and t the number of iterations; normally k, t ≪ n. It often terminates at a local optimum; the global optimum may be found using techniques such as deterministic annealing and genetic algorithms. Weaknesses: applicable only when a mean is defined (so what about categorical data?); k, the number of clusters, must be specified in advance; unable to handle noisy data and outliers; not suitable for discovering clusters with non-convex shapes.

31 Nearest Neighbor Clustering Items are iteratively merged into the existing clusters that are closest. The method is incremental: a threshold t determines whether an item is added to an existing cluster or starts a new cluster.
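An incremental nearest-neighbor scheme can be sketched in Python as follows. Here "closest" is taken to mean the distance to the nearest item already in a cluster, which is one reasonable reading of the slide rather than a fixed definition; the threshold t and the sample points are illustrative.

```python
import math


def nearest_neighbor_clustering(items, threshold):
    """Incremental clustering: each item joins the cluster holding its nearest
    already-clustered item if that distance is <= threshold; otherwise it
    starts a new cluster. Results depend on the order of the items."""
    clusters = []
    for x in items:
        best_cluster, best_dist = None, math.inf
        for cluster in clusters:
            d = min(math.dist(x, y) for y in cluster)
            if d < best_dist:
                best_cluster, best_dist = cluster, d
        if best_cluster is not None and best_dist <= threshold:
            best_cluster.append(x)
        else:
            clusters.append([x])
    return clusters


items = [(0, 0), (0, 1), (5, 5), (0, 2), (5, 6), (10, 0)]
print(nearest_neighbor_clustering(items, threshold=1.5))
# [[(0, 0), (0, 1), (0, 2)], [(5, 5), (5, 6)], [(10, 0)]]
```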

32 Fuzzy C-Means (FCM) Clustering The FCM algorithm attempts to partition a finite collection of n elements $X = \{x_1, \ldots, x_n\}$ into a collection of c fuzzy clusters with respect to some given criterion. Given a finite set of data, the algorithm returns a list of c cluster centers $C = \{c_1, \ldots, c_c\}$ and a partition matrix $W = (w_{ij})$, $w_{ij} \in [0, 1]$, where each element $w_{ij}$ tells the degree to which element $x_i$ belongs to cluster $c_j$. Like k-means clustering, FCM aims to minimize an objective function: $\arg\min_{C} \sum_{i=1}^{n} \sum_{j=1}^{c} w_{ij}^{m} \lVert x_i - c_j \rVert^2$, where $w_{ij} = 1 \Big/ \sum_{k=1}^{c} \left( \frac{\lVert x_i - c_j \rVert}{\lVert x_i - c_k \rVert} \right)^{2/(m-1)}$. This differs from the k-means objective function by the addition of the membership values $w_{ij}$ and the fuzzifier m, with $m \ge 1$. The fuzzifier determines the level of cluster fuzziness. A large m results in smaller memberships $w_{ij}$ and hence fuzzier clusters. In the limit $m \to 1$, the memberships $w_{ij}$ converge to 0 or 1, which implies a crisp partitioning. In the absence of experimentation or domain knowledge, m is commonly set to 2.
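The alternating center and membership updates behind FCM can be sketched in pure Python as below, with the common default m = 2. Random initial memberships, the convergence tolerance, and the toy two-group data are assumptions; in [1] the inputs would be normalized 96-value load profiles rather than 2-D points. Taking the largest membership in each row ("defuzzifying") recovers a crisp partition.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Fuzzy c-means: alternate center and membership updates until the
    membership matrix w changes by less than tol."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Random initial memberships; each row w[i] sums to 1.
    w = []
    for _ in points:
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        w.append([v / total for v in row])
    centers = []
    for _ in range(max_iter):
        # Centers are means of all points weighted by w_ij ** m.
        centers = []
        for j in range(c):
            weights = [row[j] ** m for row in w]
            total = sum(weights)
            centers.append([
                sum(wt * p[d] for wt, p in zip(weights, points)) / total
                for d in range(dim)
            ])
        # w_ij = 1 / sum_k (||x_i - c_j|| / ||x_i - c_k||) ** (2 / (m - 1))
        new_w = []
        for p in points:
            dists = [max(math.dist(p, cj), 1e-12) for cj in centers]  # avoid /0
            new_w.append([
                1.0 / sum((dj / dk) ** (2.0 / (m - 1.0)) for dk in dists)
                for dj in dists
            ])
        change = max(abs(a - b) for old, new in zip(w, new_w) for a, b in zip(old, new))
        w = new_w
        if change < tol:
            break
    return centers, w


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centers, w = fuzzy_c_means(points, c=2)
hard = [max(range(2), key=row.__getitem__) for row in w]  # defuzzify
print(hard)
```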

33 Example of FCM: Iteration 1. The cluster means are randomly assigned.

34 [Figure: FCM cluster means after iterations 2, 5 and 25]

35 How can we tell the right number of clusters? In general, this is an unsolved problem. The correct choice of k is often ambiguous, with interpretations depending on the shape and scale of the distribution of points in a data set and on the clustering resolution desired by the user. In addition, increasing k without penalty will always reduce the amount of error in the resulting clustering, down to the extreme case of zero error when each data point is its own cluster (i.e., when k equals the number of data points, n). Intuitively, then, the optimal choice of k strikes a balance between maximum compression of the data, using a single cluster, and maximum accuracy, assigning each data point to its own cluster.

36 Rule of Thumb One simple rule of thumb sets the number of clusters to $k \approx \sqrt{n/2}$, with n as the number of objects (data points). Elbow Method Another method looks at the percentage of variance explained as a function of the number of clusters: one should choose a number of clusters such that adding another cluster doesn't give much better modeling of the data. More precisely, if one plots the percentage of variance explained by the clusters against the number of clusters, the first clusters will add much information (explain a lot of variance), but at some point the marginal gain will drop, giving an angle in the graph. The number of clusters is chosen at this point, hence the "elbow criterion". This "elbow" cannot always be unambiguously identified. In the example plot, the elbow is indicated by the red circle, so the number of clusters chosen is 4.
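Both heuristics fit in a few lines of Python. The explained-variance values below are made up for illustration, and `elbow_k` operationalizes the elbow with an assumed minimum-gain cut-off of 5 percentage points; other elbow detectors exist and can disagree with it.

```python
import math


def rule_of_thumb_k(n):
    """Rule of thumb k ~ sqrt(n / 2), rounded to the nearest integer (at least 1)."""
    return max(1, round(math.sqrt(n / 2)))


def elbow_k(explained, min_gain=0.05):
    """Smallest k after which one more cluster adds less than min_gain of
    explained variance. explained[i] is the fraction of variance explained
    with k = i + 1 clusters; the min_gain cut-off is an assumption."""
    for i in range(len(explained) - 1):
        if explained[i + 1] - explained[i] < min_gain:
            return i + 1
    return len(explained)


print(rule_of_thumb_k(32))   # 4
# Illustrative (made-up) explained-variance curve for k = 1..6.
explained = [0.0, 0.50, 0.75, 0.90, 0.92, 0.93]
print(elbow_k(explained))    # 4
```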

37 IV. Allocation of the load profile clusters Probabilistic neural networks (PNNs) are used to assign each type of activity to a particular cluster. The process can be summarized as follows: the average load profile of an individual cluster, obtained by the FCM algorithm, was used as a target vector, while the average load profiles of the respective activity types were used as input vectors in the PNN architecture, which classified business activities (or whatever criteria we choose to divide customers by) into their most probable clusters. These clusters were the most representative, and their average load profiles were taken as typical load profiles (TLPs). The other clusters were rejected.
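A PNN is essentially a Parzen-window classifier: one Gaussian kernel per stored pattern, a summation per class, and a winner-take-all output. The Python sketch assumes a smoothing parameter sigma = 0.3 and uses toy 4-point profiles in place of the 96-value cluster and activity averages; the activity names are hypothetical. With one stored pattern per cluster, as here, the decision reduces to choosing the nearest cluster average.

```python
import math


def pnn_classify(x, patterns, labels, sigma=0.3):
    """Probabilistic neural network (Parzen-window classifier).

    Pattern layer: a Gaussian kernel centred on each stored pattern.
    Summation layer: mean kernel activation per class.
    Output layer: the class with the largest mean activation."""
    sums, counts = {}, {}
    for p, lab in zip(patterns, labels):
        sq = sum((a - b) ** 2 for a, b in zip(x, p))
        sums[lab] = sums.get(lab, 0.0) + math.exp(-sq / (2.0 * sigma ** 2))
        counts[lab] = counts.get(lab, 0) + 1
    return max(sums, key=lambda lab: sums[lab] / counts[lab])


# Toy 4-point profiles standing in for 96-value average load profiles.
cluster_profiles = [[0.2, 0.9, 0.9, 0.3], [0.8, 0.4, 0.3, 0.9]]
cluster_ids = ["C1", "C2"]
activities = {  # hypothetical activity types and their average profiles
    "office": [0.25, 0.85, 0.80, 0.35],
    "bakery": [0.90, 0.50, 0.20, 0.80],
}
assignment = {name: pnn_classify(profile, cluster_profiles, cluster_ids)
              for name, profile in activities.items()}
print(assignment)  # {'office': 'C1', 'bakery': 'C2'}
```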

38 References
[1] D. Gerbec, S. Gašperič, I. Šmon and F. Gubina: Determining the load profiles of consumers based on fuzzy logic and probability neural networks.
[2] B. D. Pitt and D. S. Kirschen: Application of data mining techniques to load profiling.
[3] M. Misiti, Y. Misiti, G. Oppenheim and J.-M. Poggi: Wavelet Toolbox for use with MATLAB.
