Optimized Clusters for Disaggregated Electricity Load Forecasting

1 Optimized Clusters for Disaggregated Electricity Load Forecasting
Jean-Michel Poggi, Lab. de Mathématique, Univ. Paris-Sud, Orsay & Univ. Paris Descartes
Joint work with M. Misiti, Y. Misiti, G. Oppenheim

2 Outline
- Introduction
- Data and real-world problem
- Aggregated versus disaggregated
- Disaggregation for prediction
- A 4-step procedure
- Cross-prediction dissimilarity
- Cluster optimization
- Results and performance
- Future developments

3 Introduction
This work is part of a scientific collaboration with EDF (the French electricity company): EDF R&D, Osiris dept., Clamart (A. Dessertaine).
Goal:
- To improve the accuracy of electrical load forecasts
- To make it possible to take into account the variations affecting EDF due to the liberalization of the electricity market
Idea: to conveniently disaggregate the basic signal
Prediction accuracy:
- High in a stable market (a short-term MAPE of about 1%)
- Decreases in new situations
Main objective: to study whether a significant gain can be reached using such a strategy

4 The data
Load curve: amount of electric power used over a period of time
- 2309 individual industrial customers
- Load curves recorded every hour during 2 years (17520 points)
- Averaged weeks
ISF 2008, J-M. Poggi, M. Misiti, Y. Misiti, G. Oppenheim

5 Raw data: 2 years 2000 and

6 Raw data: 5 weeks of

7 Raw data: 1 week of

8 Eventail-like forecasting model
We consider a fully automatic version of Eventail, the EDF operational model.
See PhD thesis, V. Lefieux, Rennes 2, oct
A recent reference: Bruhns, A., Deurveilher, G., Roy, J. S. (2005), A non linear regression model for mid-term load forecasting and improvements in seasonality, Proceedings of the 15th Power Systems Computation Conference 2005, Liege, Belgium

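9 Aggregated versus disaggregated
1- Result for 2 clusters (but true for p) under independence hypothesis:

Disaggregated: from the data $X_1, \dots, X_{n-1}$ and $Y_1, \dots, Y_{n-1}$, predict each signal separately,
$$\hat{X}_n = E(X_n \mid X_1, \dots, X_{n-1}), \qquad \hat{Y}_n = E(Y_n \mid Y_1, \dots, Y_{n-1}),$$
$$\mathrm{Err}_{DISAG} = E\big[\big(X_n + Y_n - (\hat{X}_n + \hat{Y}_n)\big)^2\big].$$

Aggregated: from $S_i = X_i + Y_i$, that is the data $S_1, \dots, S_{n-1}$, predict the sum directly,
$$\hat{S}_n = E(S_n \mid S_1, \dots, S_{n-1}), \qquad \mathrm{Err}_{AGGR} = E\big[(X_n + Y_n - \hat{S}_n)^2\big].$$

Result: $\mathrm{Err}_{DISAG} \le \mathrm{Err}_{AGGR}$

2- Stratified representative sampling:
$$\mathrm{Var}(\bar{Y}) = \frac{\sigma^2(Y)}{n} \quad \text{to be compared to} \quad \mathrm{Var}(\bar{Y}_{strat}) = \frac{\sigma^2_{within}(Y)}{n}$$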
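One way to see why the disaggregated error cannot exceed the aggregated one is sketched below. It assumes that the two signals are independent and that each predictor is the conditional expectation given the past of its own signal; the notations $\mathcal{F}_{n-1}$ and $\mathcal{G}_{n-1}$ are introduced here for illustration only and do not appear on the slide.

```latex
\begin{align*}
\mathcal{F}_{n-1} &= \sigma(X_1,\dots,X_{n-1},\,Y_1,\dots,Y_{n-1}), \qquad
\mathcal{G}_{n-1} = \sigma(S_1,\dots,S_{n-1}) \subseteq \mathcal{F}_{n-1},\\
\hat{X}_n + \hat{Y}_n &= E(X_n \mid \mathcal{F}_{n-1}) + E(Y_n \mid \mathcal{F}_{n-1})
  = E(S_n \mid \mathcal{F}_{n-1}) \quad \text{(by independence of the two signals)},\\
\mathrm{Err}_{DISAG} &= E\big[\big(S_n - E(S_n \mid \mathcal{F}_{n-1})\big)^2\big]
  \le E\big[\big(S_n - E(S_n \mid \mathcal{G}_{n-1})\big)^2\big] = \mathrm{Err}_{AGGR}.
\end{align*}
```

The inequality holds because the conditional expectation given $\mathcal{F}_{n-1}$ minimizes the mean squared error over all $\mathcal{F}_{n-1}$-measurable predictors, and any predictor built from the past of $S$ alone is one of them.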

10 Aggregated versus disaggregated
Disaggregate the global signal to improve forecasts
Idea: find a tradeoff between
- the within-cluster homogeneity
- the quality of the model estimation
The first increases with the number of clusters while the second decreases
Aggregate similar individual customers to:
- Decrease between-cluster variability effects
- Improve within-cluster estimation quality
Ascending strategy mixing:
- Preprocessing using wavelets and clustering
- Optimization driven by predictability

11 Disaggregation: a 4-step procedure
A. Preprocessing using wavelets
- Standardization
- Coding using Ca6 (approximation coefficients at level 6)
B. Preliminary clustering
- Around numerous centroids, typically 90 clusters of very homogeneous customers
- Each cluster is then associated with the corresponding aggregated signal
C. Iterative optimization
- Supervised by a cross-prediction dissimilarity index
- A discrete gradient type procedure based on dissimilarities explores the set of partitions
D. Consolidation
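Step A can be illustrated with a minimal sketch. The slides do not name the wavelet, so the Haar wavelet is used here purely as an assumption; the function names `standardize` and `haar_approximation` are hypothetical. With an orthonormal Haar basis, each level replaces pairs of coefficients by their sum divided by the square root of 2, so level 6 summarizes blocks of 64 hourly values.

```python
import math

def standardize(signal):
    """Center and scale a signal to zero mean and unit variance."""
    n = len(signal)
    mean = sum(signal) / n
    var = sum((x - mean) ** 2 for x in signal) / n
    std = math.sqrt(var) or 1.0  # constant signal: avoid dividing by zero
    return [(x - mean) / std for x in signal]

def haar_approximation(signal, level=6):
    """Approximation coefficients at the given level (Haar wavelet, an assumption)."""
    coeffs = list(signal)
    for _ in range(level):
        if len(coeffs) < 2:
            raise ValueError("signal too short for the requested level")
        # Pairwise scaled sums; a trailing unpaired sample is dropped.
        coeffs = [(coeffs[2 * i] + coeffs[2 * i + 1]) / math.sqrt(2)
                  for i in range(len(coeffs) // 2)]
    return coeffs

# A toy hourly curve over 2 years (17520 points) with a daily cycle.
curve = [100.0 + 10.0 * math.sin(2 * math.pi * t / 24) for t in range(17520)]
code = haar_approximation(standardize(curve), level=6)
```

In this sketch each customer is then described by a few hundred coefficients instead of 17520 hourly values, which is what makes the preliminary clustering of step B cheaper.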

12 Wavelet-based clustering
For each signal, and for a given wavelet and a maximum decomposition level J:
1. Wavelet decomposition
2. Denoise by wavelet coefficients thresholding
3. Generate partitions, using
   - various common compression wavelet bases
   - hierarchical clustering using Ward method and Euclidean distance
4. Select the best partition according to a quality index
   - choose a decomposition level $j^*$
   - choose a number of clusters $C^*$

Scheme: for $1 \le j \le J$, the signals are coded by the coefficients $(ca_J, cd_J, \dots, cd_j)$ and clustered; each resulting partition $P$ of the coded signals $Z$ is scored by a quality index $I(Z, P)$ built from $\mathrm{Var}_{between}(Z, P)$ and $\mathrm{Var}_{within}(Z, P)$; the final partition is the one obtained for $j^*$ and $C^*$.
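The exact quality index is not recoverable from the slide, so the sketch below only illustrates the two ingredients it names, the between-cluster and within-cluster variability of a partition. The plain ratio returned by `variance_ratio` is an assumption made for illustration, not necessarily the authors' index, and all function names are hypothetical.

```python
def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def between_within(points, labels):
    """Return (between, within) sums of squares of a partition."""
    overall = centroid(points)
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    between = within = 0.0
    for members in clusters.values():
        c = centroid(members)
        between += len(members) * sq_dist(c, overall)
        within += sum(sq_dist(p, c) for p in members)
    return between, within

def variance_ratio(points, labels):
    """Between/within ratio: larger means better separated clusters."""
    b, w = between_within(points, labels)
    return b / w if w > 0 else float("inf")

points = [[0.0], [1.0], [10.0], [11.0]]
good = variance_ratio(points, [0, 0, 1, 1])  # natural grouping
bad = variance_ratio(points, [0, 1, 0, 1])   # mixed grouping
```

Comparing such a score across decomposition levels and numbers of clusters is one way to pick a partition; a raw ratio tends to favor many clusters, which is why indices of this family are usually normalized by the number of clusters.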

13 Cross-prediction dissimilarity
Dissimilarity index between 2 elements $s_k$ and $s_j$:
$$\mathrm{forec}_{kj} = \mathrm{forecast}(s_j, s_k), \qquad E_{kj} = \mathrm{error}(s_k, \mathrm{forec}_{kj})$$
- Predictability of $s_k$ using a model for $s_j$, and vice versa
- $\mathrm{forec}_{kj}$: forecast of 2001 for $s_k$ using the model fitted using $s_j$ during 2000
- Error: short-term or long-term (ST, LT) MAPE (Mean Absolute Percentage Error)
$$D_{jk} = (E_{kj} + E_{jk})/2$$
Dissimilarity matrix:
$$D = (D_{jk})_{j,k} = \tfrac{1}{2}\,(E + E^{T})$$
Computationally expensive
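The construction of D can be sketched as follows. The forecasting model itself is left pluggable: `fit` and `predict` are hypothetical callables standing in for the Eventail-like model, and the toy model below (a constant forecast equal to the training mean) is only an assumption chosen to keep the example short.

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def cross_prediction_dissimilarity(train, test, fit, predict):
    """Symmetric matrix D with D[j][k] = (E[k][j] + E[j][k]) / 2."""
    n = len(train)
    models = [fit(series) for series in train]
    # E[k][j]: error when forecasting s_k with the model fitted on s_j.
    E = [[mape(test[k], predict(models[j], len(test[k]))) for j in range(n)]
         for k in range(n)]
    return [[(E[k][j] + E[j][k]) / 2.0 for j in range(n)] for k in range(n)]

# Toy model (hypothetical): forecast the training mean at every step.
def fit_mean(series):
    return sum(series) / len(series)

def predict_constant(model, horizon):
    return [model] * horizon

train = [[10.0, 10.0], [20.0, 20.0]]  # first year
test = [[10.0, 10.0], [20.0, 20.0]]   # second year
D = cross_prediction_dissimilarity(train, test, fit_mean, predict_constant)
```

Filling E requires one forecast evaluation per ordered pair of elements, which is consistent with the remark that the index is expensive to compute.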

14 Zooming in on optimization
Iterative optimization of the initial partition, supervised by a cross-prediction dissimilarity that should be adapted to:
- the prediction horizon
- the error criterion
Scheme:
- Discrete gradient type procedure, via a neighborhood definition through a dissimilarity between an element and a cluster induced by D
- Iterative exploration of elements using nearest D-neighbors
- Elements are always candidates for cluster change
- Only the partition evolves, and the basic step consists in moving an element from one cluster to another
- Generates a non-monotonic sequence of partitions (this is not a hierarchical approach)
- The number of clusters decreases slowly along the iterations, and a cluster disappears if it becomes empty

15 Optimization step
step 0 (performed once): compute D, the dissimilarities between elements
step 1 (without updating D): compute the dissimilarities (element, current clusters) using D and a linkage function (e.g. the minimum); select a neighbor: a couple (E, C)
step 2: test the gain of the assignment, i.e. the error of the disaggregated prediction associated with the resulting partition
  if the error does not decrease then
    if there are remaining candidates then examine the next candidate and go to step 2
    else end {no possible improvement by moving an element from one cluster to another}
  else {the error decreases}
    modify the partition and go to step 1
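The loop above can be sketched in a few lines. This is a reading of the steps under stated assumptions, not the authors' implementation: `error` is a hypothetical callable standing in for the disaggregated forecast error of a partition, candidates are tried from the nearest (element, cluster) couple to the farthest using the minimum linkage, and the toy error used in the example (within-cluster spread plus a penalty per cluster) is invented for illustration.

```python
def optimize_partition(partition, D, error):
    """Move one element at a time while the partition error decreases.

    partition: list of sets of element indices
    D: dissimilarity matrix between elements (computed once, never updated)
    error: hypothetical callable scoring a partition (lower is better)
    """
    partition = [set(c) for c in partition if c]
    best = error(partition)
    improved = True
    while improved:
        improved = False
        # Step 1: dissimilarity (element, cluster) with the minimum linkage.
        candidates = []
        for src, cluster in enumerate(partition):
            for e in cluster:
                for dst, other in enumerate(partition):
                    if dst != src:
                        d = min(D[e][o] for o in other)
                        candidates.append((d, e, src, dst))
        candidates.sort()
        # Step 2: test candidates in order until one decreases the error.
        for _, e, src, dst in candidates:
            trial = [set(c) for c in partition]
            trial[src].remove(e)
            trial[dst].add(e)
            trial = [c for c in trial if c]  # an emptied cluster disappears
            err = error(trial)
            if err < best:
                partition, best = trial, err
                improved = True
                break  # back to step 1 with the new partition
    return partition, best

# Toy setting: elements are numbers, D is their absolute difference.
values = [1.0, 1.1, 5.0, 5.2, 1.05]
D = [[abs(a - b) for b in values] for a in values]

def toy_error(partition):
    total = 0.0
    for cluster in partition:
        m = sum(values[i] for i in cluster) / len(cluster)
        total += sum((values[i] - m) ** 2 for i in cluster)
    return total + 0.1 * len(partition)  # penalty mimics estimation cost

start = [{0, 1, 2}, {3}, {4}]
final, best = optimize_partition(start, D, toy_error)
```

Because a move is accepted only when the error strictly decreases, the loop terminates, and it can be stopped at any iteration with a valid partition in hand.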

16 Global quantitative result
Starting from 90 clusters, the optimized partition reaches:

                       1 cluster   Optimized partition    Gain
MAPE Long Term (LT)    4.06%       2.39% (19 clusters)    %
MAPE Short Term (ST)   2.47%       1.51% (28 clusters)    38.86%

Anytime procedure
From 90 to 19 clusters: 195 steps
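The gain reported above reads as a relative reduction of the MAPE with respect to the single-cluster (aggregated) forecast. A minimal sketch of that arithmetic, assuming this definition and using the rounded MAPE values from the slide (the function name `relative_gain` is hypothetical):

```python
def relative_gain(mape_single, mape_optimized):
    """Relative MAPE reduction, in percent, versus the 1-cluster forecast."""
    return 100.0 * (mape_single - mape_optimized) / mape_single

st_gain = relative_gain(2.47, 1.51)  # short term, 28 clusters
lt_gain = relative_gain(4.06, 2.39)  # long term, 19 clusters
print(f"ST gain: {st_gain:.2f}%")  # prints "ST gain: 38.87%"
print(f"LT gain: {lt_gain:.2f}%")  # prints "LT gain: 41.13%"
```

The short-term figure differs from the reported 38.86% in the last digit, presumably because the slide's gain was computed from unrounded MAPE values.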

17 Preprocessing steps?
The first two steps (A and B), wavelet preprocessing and initial clustering, seem to be useful. Indeed:
- Perform hierarchical clustering of the original 2309 customers using D
- Then optimize the associated 90-cluster partition
- The MAPE-LT error stabilizes around 2.7% instead of 2.4%

19-cluster partitions derived from KM90 by hierarchical agglomerative clustering with various linkage distances:

Linkage                                KM90        Single   Ward     Complete  Average  Centroid  Median   Weight
Initial MAPE LT                        2.617%      3.213%   2.844%   2.854%    2.881%   2.879%    2.893%   2.786%
Final MAPE LT                          2.393%      3.028%   2.744%   2.765%    2.774%   2.775%    2.763%   2.744%
Final number of clusters
Diagonal count relative to KM90
Diagonal percentage relative to KM90   […].00%     25.21%   29.84%   29.28%    28.28%   30.01%    28.54%   29.45%
Initial MAPE ST                        […].588%    1.870%   1.749%   1.779%    1.781%   1.772%    1.806%   1.775%

18 Optimization step?
The optimization step seems to be useful. Indeed, starting from the 60 (or 90) cluster partition, if one constructs the hierarchy of partitions (by hierarchical clustering using D):
- It is difficult to select a critical number of clusters
- The MAPE LT error of the 19-cluster partition remains about 2.6% instead of 2.3%
[Figure: hierarchies for «single linkage» and «Ward method». Annotations: "Cut for 19 clusters"; "Irrelevant cut for 19 clusters"; "Very «flat» (log scale for «Ward»)"]

19 Future developments
First direction:
- Experiment on these electrical data with forecasting methods using wavelets: Antoniadis, Paparoditis, Sapatinas, JRSS B. (2006); AminGhafari, Poggi, IJ Wav. Inf. Proc. (2007)
- Adapt forecasting models to the clusters, mimicking the approach of Hathaway, Bezdek, IEEE Fuzzy Systems (1993)
- Take advantage of external information (meteorological and economic) for interpretation and performance improvement
- Study theoretically the conditions that maximize the benefits of disaggregation in more general contexts
Second direction: integrate in a global procedure
- The wavelet and the representation basis
- The obtained partition and the adaptation of the model to cluster specificities

20 References
AminGhafari, Poggi, IJ Wav. Inf. Proc. (2007)
Antoniadis et al., JRSS B (2006)
Biau et al., IEEE Trans. Inf. Th. (2005)
Biau et al., IEEE Trans. Inf. Th. (2007)
Bruhns et al., Proc. PSC Conf. (2005)
Calinski, Harabasz, Comm. Stat. (1974)
Hathaway, Bezdek, IEEE Fuzzy Systems (1993)
Kaufman, Rousseeuw, Wiley (2005)
James, Sugar, JASA (2003)
Mallat, Academic Press (1998)
Misiti et al., Hermes, ISTE (2007)
Misiti et al., Lect. Not. Comp. Sc. (2007)
Ramsay, Silverman, Springer (1997)
Tibshirani et al., JRSS B (2001)
Vlachos et al., SIAM Conf. Data Mining (2003)
Vannucci et al., Chem. and Int. Lab. Syst. (2005)
