Robust and Reversible Relational Database Watermarking Algorithm Based on Clustering and Polar Angle Expansion

Zhiyong Li, Junmin Lu and Weicheng Tao
College of Information Science and Engineering, Hunan University, Changsha, China
zhyong.l@hnu.edu.cn, 32686094@qq.com, weshens985@hotmail.com

Abstract. Digital watermarking has been widely applied to relational databases for ownership protection and information hiding. However, robustness and reversibility remain two key challenges because database maintenance operations are frequently performed on the tuples. This paper proposes a novel relational database watermarking scheme based on a fast and stable method for clustering database tuples, which adopts the Mahalanobis distance as the similarity measure. Before watermark embedding and detection, the database tuples are adaptively clustered into groups according to the length of the binary watermark. The watermark segments are then embedded into, or detected from, those groups using the Least Significant Bit (LSB) of a numeric field and polar angle expansion. A majority decision strategy determines the value of each watermark bit during blind detection. The experimental results indicate that the proposed scheme achieves higher robustness and reversibility under blind detection against database maintenance operations.

Keywords: Database watermarking, robustness, reversibility, blind detection, tuples clustering, polar angle expansion.

1 Introduction

Digital watermarking has been developed in recent years as a potential key technology for information security. It can determine the ownership or originality of digital content by embedding perceivable or unperceivable information in digital works [1], and it has good characteristics in terms of security, invisibility and robustness [2]. Similarly, database watermarking has been proposed for security control of large databases. However, there are some differences between relational databases and multimedia data [3].
So database watermarking should also support real-time updates and blind detection, and it cannot directly adopt multimedia watermarking methods. It is also more difficult to ensure the robustness and reversibility of database watermarking.

In recent years, scholars have carried out extensive research on database watermarking. The groundbreaking studies in this area were conducted by R. Agrawal and R. Sion in 2002 [4], [5]. In 2003, X.M. Niu proposed that a meaningful string could be inserted into a relational database as the watermark [6]. Y.J. Li proposed inserting a watermark by changing the order of the relational data index [7]; this changes neither the physical location nor the data values, so it does not impair use of the data. Y. Zhang converted image information into watermark cloud droplets according to D.Y. Li's cloud model, and then embedded them into relational data [8]; on extraction, the cloud droplets must be compared with the original copyright image. Moreover, Y. Zhang put forward a reversible watermarking method for relational databases [9] that takes the differences at the end of the relational data, expands them using a wavelet transformation, and then embeds the watermark information. G. Gupta utilized difference expansion and the Lowest-Effective-Bit on integers to achieve embedding and blind detection of the watermark, but the method applies only to integer data and is therefore not universal [10]. Many other researchers have also worked to advance database watermarking [11]-[14], yet current studies still have shortcomings in two respects. On the one hand, watermark robustness is too weak to resist various conventional database operations and illegal watermark attacks such as selection, addition and modification. On the other hand, the original relation cannot be restored from the watermarked relation. Improving the robustness and reversibility of database watermarking is therefore difficult and significant work. To this end, this paper puts forward an adaptive relational database watermarking scheme based on clustering and polar angle expansion.
2 Method

Given the unordered nature of tuples and attributes, the insufficient redundant space in a database, and the weak robustness of general database watermarking algorithms, it is practicable to realize watermark embedding and robust detection with a stable, highly efficient and large-capacity method for clustering database tuples, which serves as the basis of the watermarking algorithm in this paper. Meanwhile, frequent database maintenance operations on tuples and attributes affect the robustness of the watermark, so we use the majority decision method when extracting the watermark. Moreover, for a highly available database the original data should be restored exactly after the watermark is extracted, which means that the watermark should be not only robust but also reversible. We previously studied a reversible and blind database watermarking method based on polar angle expansion [15], which maps the attributes to polar coordinates and embeds the watermark into those points by expanding the polar angle. Building on the tuple clustering, the majority decision strategy and our preliminary study, this paper proposes a robust and reversible database watermarking method based on clustering and polar angle expansion for numerical data. The main idea is as follows. First, we classify the tuples into a given number of categories, each of which represents a particular meaning. Next, we use a key as the seed of a pseudo-random number generator to select the watermark embedding position in each category, then map these attributes to polar coordinates one by one and embed the watermark into those points by expanding the polar angle. Finally, we use the LSB method to extract the watermark. The notations used in the watermarking algorithms are given in Table 1.

Table 1. Notations used in the watermarking algorithms.

Notation                                      Explanation
$|W|$                                         Binary bit length of the watermark
$W = w_1, w_2, \dots, w_n$                    Binary representation of the watermark
$R$                                           Original database
$R'$                                          Watermarked database
$Y = (y_1, y_2, \dots, y_n)$                  Database tuple attribute where the watermark will be embedded
$Y' = (y'_1, y'_2, \dots, y'_n)$              Database tuple attribute where the watermark has been embedded
$f(x) = hash(x)$                              Hash function that meets $hash(Y) = hash(Y')$
$\alpha = (\alpha_1, \alpha_2, \dots, \alpha_n)$    Polar angle corresponding to $Y$
$\beta = (\beta_1, \beta_2, \dots, \beta_n)$        Expanded polar angle corresponding to $Y'$ that meets $Y' = hash(Y') \tan(\beta)$
$p(D)$                                        Watermark detection rate
$G = (g_1, g_2, \dots, g_k)$                  Tuples clustering
$L = \{L_1, L_2, \dots, L_k\}$                Accumulation (cluster) points
$\mu$                                         The upper limit of data change

2.1 Tuples Clustering

Here we apply the fast clustering method to the classification of database tuples. It begins by classifying the samples roughly and then adjusts the categories gradually, following certain rules, based on the distance between samples. It is suitable for cluster analysis of large data sets. The similarity of samples is measured by distance. Because the attributes in a database have different units, this paper adopts the Mahalanobis distance to cluster the tuples in order to eliminate the influence of dimension.

Definition 1 (Mahalanobis Distance): Let $x_i = (x_{i1}, x_{i2}, \dots, x_{iq})^T$ for $i = 1, 2, \dots, n$ represent n samples. The Mahalanobis distance is $d(x_i, x_j) = \sqrt{(x_i - x_j)^T s^{-1} (x_i - x_j)}$, where s is the covariance matrix of the samples.

Definition 2 (Clustering): For a data set $A = (a_1, a_2, \dots, a_n)$, a clustering algorithm classifies A into k categories $G = (g_1, g_2, \dots, g_k)$ according to a given rule. Members of each category are highly similar to one another but differ greatly from those of other categories, and the categories satisfy $\bigcup_{i=1}^{k} g_i = A$, where $g_i \cap g_j = \emptyset$ for $i \ne j$.

Definition 3: q and n respectively denote the number of attributes and tuples in database R, so the n tuples can be taken as n samples in q-dimensional space.

Procedure of Fast Clustering.

Step 1. Suppose the set $L^0 = \{x_1^0, x_2^0, \dots, x_k^0\}$ includes k initial cluster points.

Step 2. Obtain the initial classification according to the following rule:

$G_i^0 = \{x : d(x, x_i^0) \le d(x, x_j^0),\ j = 1, 2, \dots, k,\ j \ne i\}, \quad i = 1, 2, \dots, k$

Thus the n samples are divided into k non-intersecting categories $G^0 = \{G_1^0, G_2^0, \dots, G_k^0\}$ by their respective closest initial cluster point.

Step 3. Calculate the new cluster point set $L^1 = \{x_1^1, x_2^1, \dots, x_k^1\}$ based on $G^0$, where

$x_i^1 = \frac{1}{n_i} \sum_{x_l \in G_i^0} x_l, \quad i = 1, 2, \dots, k$

is the barycenter of $G_i^0$ and $n_i$ is the number of samples in $G_i^0$. Next, classify the samples again by $L^1$ to get a new classification $G^1 = \{G_1^1, G_2^1, \dots, G_k^1\}$.

Then continue in the same way. Assume we obtain the classification $G^t = \{G_1^t, G_2^t, \dots, G_k^t\}$ in step t, where $x_i^t$ is the barycenter of $G_i^{t-1}$ and is in general neither a sample nor the barycenter of $G_i^t$. As t increases, the classification tends to become stable: $x_i^t$ approximates the barycenter of $G_i^t$, so that $x_i^{t+1} \approx x_i^t$ and $G_i^{t+1} \approx G_i^t$, and the calculation can be stopped. In practical calculation the classifications $G^{t+1} = \{G_1^{t+1}, G_2^{t+1}, \dots, G_k^{t+1}\}$ and $G^t = \{G_1^t, G_2^t, \dots, G_k^t\}$ are sometimes exactly the same from some step t onward, at which point the calculation can end. As a result, we can use the fast clustering method with the Mahalanobis distance to classify the original database tuples into the desired categories. The convergence condition is as follows: the algorithm terminates when the maximum displacement of the cluster points is less than or equal to a specified value multiplied by the minimum distance between the original cluster points.

2.2 Database Watermarking Algorithm

Adaptive Factor. Before giving the specific watermarking algorithm, we analyze the influence of the embedded watermark on the data. Suppose the clustering result of data set A is $G = (g_1, g_2, \dots, g_k)$ before the watermark is embedded and $G' = (g'_1, g'_2, \dots, g'_k)$ afterwards. An interleaved class is defined as follows.
Definition 4 (Interleaved Class): For $x \in A$, if x belongs to a category before the watermark is embedded but no longer belongs to it afterwards, x is called an interleaved class, that is, $(x \in g_i)\ \&\&\ (x \in g'_j)$, where $i \ne j$.

Assume the minimum distance between any two adjacent categories is $d_{ab} = \min\{ d(x_i, x_j) : x_i \in g_a,\ x_j \in g_b,\ a \ne b \}$. In order to avoid interleaved classes, the change of data should satisfy the following conditions:

$\eta \le \mu$    (1)

$\eta \le \frac{g_a + d_{ab}}{2}, \quad a, b = 1, 2, \dots, k,\ a \ne b$

The neighborhood $\eta$ of x is shown in Fig. 1, where x is any sample of category $g_a$, $x'$ is the sample after the watermark is embedded, and $d_{ab}$ is the nearest distance between $g_a$ and $g_b$. Thus, the change of data just needs to satisfy $\lVert x - x' \rVert \le \frac{g_a + d_{ab}}{2}$, $a, b = 1, 2, \dots, k$, $a \ne b$.

Watermark Embedding Algorithm.

Step 1. Generate the binary watermark, use the fast clustering method to classify the database tuples into $|W|$ categories, and list them in sequence.
Step 2. Use a hash mapping, which takes the key and the tuple primary key as parameters, to select the watermark embedding locations.
Step 3. Select the watermark embedding attribute $Y = (y_1, y_2, \dots, y_n)$ and calculate the corresponding polar angle $\alpha$ according to [15].
Step 4. Obtain the expanded polar angle $\beta$ by combining the polar angle of each category with one watermark bit in turn.
Step 5. Calculate the watermarked attribute and write it back to the database.

The embedding multiplicity is m, and the watermark is embedded by changing the least significant bit. The database owner holds the key, the embedding multiplicity and the length of the watermark.

Watermark Detection and Data Recovery Algorithm. Watermark detection and data recovery are the inverse of the embedding process.

Step 1. Classify the test database into $|W|$ categories and use the ranking function, which takes the secret key as a parameter and lists the sequence based on the tuple primary key, to synchronize watermark detection.
Step 2. Use the key to find the positions of the embedded watermark and calculate the corresponding polar angle $\beta$.
Step 3. Extract the watermark from $\beta$ by means of the LSB and the majority decision method, and obtain the polar angle $\alpha$.
Step 4. Restore the original attribute and write it back to the database.

Fig. 1. Situation that the change of data needs to meet.
Fig. 2. Detection situation of subset selection attack.
Fig. 3. Detection situation of subset addition attack.
Fig. 4. Detection situation of subset modification attack.
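Step 1 of both algorithms relies on the fast clustering of Section 2.1, and detection Step 3 relies on a majority decision. The following is a minimal pure-Python sketch of those two pieces only, not the authors' implementation. The function names, the `ratio` and `max_iter` parameters, the tie-breaking rule in `majority_bit` and the sample data are all illustrative assumptions; the polar angle expansion itself is defined in [15] and is not reproduced here.

```python
import math
from itertools import combinations


def covariance(samples):
    """Sample covariance matrix s (q x q) of n q-dimensional samples."""
    n, q = len(samples), len(samples[0])
    mean = [sum(x[d] for x in samples) / n for d in range(q)]
    return [[sum((x[a] - mean[a]) * (x[b] - mean[b]) for x in samples) / (n - 1)
             for b in range(q)] for a in range(q)]


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (assumes a non-singular matrix)."""
    q = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(q)]
           for i, row in enumerate(matrix)]
    for c in range(q):
        p = max(range(c, q), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(q):
            if r != c:
                factor = aug[r][c]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[c])]
    return [row[q:] for row in aug]


def mahalanobis(x, y, s_inv):
    """d(x, y) = sqrt((x - y)^T s^-1 (x - y))."""
    diff = [a - b for a, b in zip(x, y)]
    quad = sum(diff[a] * s_inv[a][b] * diff[b]
               for a in range(len(diff)) for b in range(len(diff)))
    return math.sqrt(max(quad, 0.0))  # clamp tiny negative rounding errors


def fast_cluster(samples, init_points, ratio=0.01, max_iter=100):
    """Cluster samples around k initial cluster points; returns (groups, centers)."""
    s_inv = invert(covariance(samples))
    centers = [tuple(p) for p in init_points]
    k = len(centers)
    # Stop once no cluster point moves more than ratio * (minimum initial spacing).
    threshold = ratio * min(
        (mahalanobis(a, b, s_inv) for a, b in combinations(centers, 2)),
        default=0.0)
    groups = [[] for _ in range(k)]
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for x in samples:
            nearest = min(range(k), key=lambda i: mahalanobis(x, centers[i], s_inv))
            groups[nearest].append(x)
        # New cluster point = barycenter of each group (an empty group keeps its point).
        new_centers = [tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
                       for i, g in enumerate(groups)]
        shift = max(mahalanobis(a, b, s_inv) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= threshold:
            break
    return groups, centers


def majority_bit(votes):
    """Majority decision over the copies of one watermark bit (ties resolve to 0)."""
    return 1 if 2 * sum(votes) > len(votes) else 0


# Hypothetical tuples with two numeric attributes on very different scales.
tuples = [(0.0, 0.0), (1.0, 10.0), (0.5, 20.0),
          (10.0, 5.0), (11.0, 15.0), (10.5, 0.0),
          (0.5, 1000.0), (1.0, 1010.0), (0.0, 990.0)]
groups, centers = fast_cluster(tuples, [tuples[0], tuples[3], tuples[6]])
print([len(g) for g in groups])       # cluster sizes
print(majority_bit([1, 0, 1, 1, 0]))  # decided bit from five extracted copies
```

The covariance matrix is computed once over all samples, so the second attribute's much larger scale does not dominate the distance. Passing the initial cluster points explicitly mirrors Step 1 of the procedure, which takes them as given.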

3 Simulation Experiment and Analysis

We use the open-source database MySQL for the research and simulation of database watermarking, with Visual Studio as the front end. The test database has 100000 tuples, each with 12 attributes (attribute values are generated randomly by computer). We select 10 numeric attributes as candidates for watermark embedding and insert the watermark HNU into the database 100 times. The experimental results are compared with the algorithm of [10] on the same data set.

Invisibility. The overall influence of the embedded watermark on each attribute column (rounded to 3 decimal places) is shown in Table 2. The error caused by watermark embedding is very small and close to that of the algorithm in [10].

Table 2. Invisibility after embedding watermark.

            Changed ratio of mean (%)              Changed ratio of variance (%)
Attribute   Algorithm of [10]   Our algorithm      Algorithm of [10]   Our algorithm
a1          0                   0                  0.002               0.00
a2          0                   0                  0.006               0.000
a3          0                   0                  0.002               0.004
a4          0                   0.03               0.00                0.000
a5          0                   0                  0.004               0.024
a6          0                   0                  0.005               0.009
a7          0                   0                  0.004               0.003
a8          0                   0                  0.004               0.006
a9          0                   0                  0.00                0.000
a10         0.043               0.052              0.04                0.0

Test of Database Reversibility. Due to space limitations, we report only one group of 24 watermarked attributes, as shown in Table 3. The restoration is satisfactory.

Table 3. Situation before and after data restoration.

Before restoration   After restoration   Before restoration   After restoration   Before restoration   After restoration
32.00                32.34               90.00                89.99               356.25               356.25
28.25                28.05               84.5                 84.50               40.50                4.00
504.00               504.20              270.00               270.30              54.00                54.20
35.00                35.00               80.00                80.0                477.75               478.00
20.00                20.25               20.00                20.25               20.25                20.0
72.00                73.00               39.50                392.00              67.50                67.90
420.00               422.0               326.25               326.25              322.50               322.47
56.25                56.20               4.00                 4.00                276.00               276.00
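For the invisibility test summarized in Table 2, the paper does not spell out how the changed ratios are computed. The sketch below assumes the ratio is the absolute relative change of the column mean and of the population variance, in percent. The function name `change_ratios` and the sample columns are hypothetical and are not taken from the experiment.

```python
from statistics import mean, pvariance


def change_ratios(original, watermarked):
    """Return (mean change %, variance change %) for one attribute column."""
    m0, m1 = mean(original), mean(watermarked)
    v0, v1 = pvariance(original), pvariance(watermarked)
    # Assumes the original column has a non-zero mean and a non-zero variance.
    return abs(m1 - m0) / abs(m0) * 100, abs(v1 - v0) / v0 * 100


column = [10.0, 20.0, 30.0, 40.0]   # hypothetical attribute values
marked = [10.0, 20.0, 30.0, 41.0]   # the same column after a small change
mean_pct, var_pct = change_ratios(column, marked)
print(round(mean_pct, 3), round(var_pct, 3))
```

The same function could be applied to each candidate attribute before and after embedding to produce one table row per attribute.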

Test of Watermarking Robustness. The simulated attacks include subset selection, subset addition and subset modification. The attack tests take the current system time as the random seed and select tuples and attributes randomly; each result is the average of 20 tests. The results of the simulation experiments are shown in Figs. 2, 3 and 4. Fig. 2 shows that detection under the subset selection attack is better than with the algorithm of [10], an improvement of nearly 5%. Fig. 3 shows that robustness under the subset addition attack is good and relatively stable. Fig. 4 shows that robustness under the subset modification attack is, on the whole, the same as that of the algorithm in [10].

Analysis of Algorithm Time Complexity.

Algorithm 1: The basic operations of fast clustering, which classifies n samples into k categories, include calculating the distance between two samples, comparing sizes, and calculating cluster points. Suppose $f(n) = O(n + k) + O(nq(k - 1))$ represents the time complexity of the i-th iteration, where the first term is the time complexity of computing the cluster points and the second term is the asymptotic time complexity of one clustering pass. The algorithm stops after t iterations, so the overall asymptotic time complexity is

$T_1(n) = O(tkqn)$    (2)

where t is the number of iterations, k is the number of clusters, q is the number of attributes (dimensionality) and n is the number of samples.

Algorithm 2: Watermark embedding includes binarizing, clustering, sorting and embedding, thus the asymptotic time complexity of Algorithm 2 is $T_2(n) = O(k) + O(tkqn) + O(n \log n) + O(k g_i)$, where $g_i$ is the number of samples in category i, whose maximum is $g_i = n$. Since k and q are far less than n, $T_2(n) = O(tkqn) + O(n \log n)$. However, due to the local convergence of fast clustering [16], the number of iterations t is uncertain.
The convergence criterion defined in Algorithm 1 indicates that t is less than n, therefore the time complexity of the algorithm in the worst case is

$T_2(n) = O(n^2)$    (3)

Algorithm 3: Similarly, the time complexity of Algorithm 3 in the worst case is $T_3(n) = O(n^2)$.

Capacity. The length of the watermark sequence is $|W|$, the number of tuples is n and the embedding multiplicity is m, thus the capacity is $c = n / (|W| \cdot m)$. Once the number of tuples n in the database and the watermark are fixed, only the embedding multiplicity m can be adjusted to reduce the capacity and thereby keep data modifications small. This is acceptable given the higher watermarking robustness required for copyright protection.

Robustness Analysis. Suppose the attackers select each tuple with equal probability $p(t) = 1/n$ and choose each attribute of a tuple with equal probability $p(a) = 1/q$. In the following, we analyze the watermark detection rate under attacks such as random bit flipping, subset selection, sorting, subset substitution and subset addition.

Random Bit Flipping: We assume that the attackers know the number of database categories, namely the length $|W|$ of the watermark sequence. The most extreme way to destroy watermark detection is to apply random bit flipping to the category with the fewest data records. Suppose this category contains $\nu$ tuple records, and the attackers randomly choose $\xi$ tuples and flip the LSBs of all attributes without affecting the usability of the data. The watermark can then be detected with probability

$p(D) = 1 - \binom{\nu - m}{\xi - m} \Big/ \binom{\nu}{\xi}$    (4)

where $\xi \ge m$.

Subset Selection: Similarly, suppose the attackers know the number of categories. If they pick only tuples without the watermark from the category with the fewest data records, they destroy watermark detection just as with random bit flipping. The probability of detecting the watermark successfully is

$p(D) = 1 - \binom{\nu - m}{\xi} \Big/ \binom{\nu}{\xi}$    (5)

Sorting: Randomly re-sorting the database tuples makes no difference to watermark detection. We only need to run fast clustering on the database, recover the original order of each category by rearranging it with the secret key, and then extract the watermark.

Subset Substitution: Subset substitution is similar to subset selection.

Subset Addition: Subset addition only increases the number of tuple records in each category. Since the embedding location is determined by the secret key and the hash mapping of the tuple primary key during both embedding and detection, subset addition has no great impact.

Secondary Watermark Addition: Suppose A inserts watermark $w_a$ into R to get $R_a$, while B pirates A's database and performs some of the operations above to obtain the attacked database $R'_a$, then adds its own watermark $w_b$ to get database $R_b$. A can then detect watermark $w_a$ from $R_b$ with probability $p(D)$, while B can detect watermark $w_b$ from $R_a$ only with probability $\rho \approx 0$. As a result, the scheme resists the secondary watermark addition attack with probability $p(D)$.
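Returning to the random bit flipping and subset selection cases above, the detection probabilities can be explored numerically. The sketch below rests on an assumed hypergeometric reading of the analysis, which is an interpretation rather than a formula confirmed by the source. Under that reading, bit flipping defeats detection only when all m marked tuples of the smallest category are among the $\xi$ flipped ones, and subset selection defeats it only when none of the $\xi$ selected tuples is marked. The function names are hypothetical.

```python
from math import comb


def p_detect_bit_flip(nu, m, xi):
    """Detection probability when xi of nu tuples have their LSBs flipped."""
    if xi < m:
        return 1.0  # too few tuples flipped to cover all m marked ones
    # Probability that all m marked tuples fall inside the xi flipped tuples.
    p_destroy = comb(nu - m, xi - m) / comb(nu, xi)
    return 1 - p_destroy


def p_detect_selection(nu, m, xi):
    """Detection probability when the attacker keeps xi of nu tuples."""
    # Probability that all xi selected tuples are unmarked.
    p_destroy = comb(nu - m, xi) / comb(nu, xi)
    return 1 - p_destroy


# Hypothetical category of 10 tuples carrying 2 marked tuples.
print(p_detect_bit_flip(10, 2, 2))
print(p_detect_selection(10, 2, 3))
```

Under this assumed model, increasing the embedding multiplicity m raises both probabilities, which matches the trade-off between capacity and robustness discussed under Capacity.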
4 Conclusion

This paper provides a novel adaptive watermarking scheme based on clustering and polar angle expansion for relational databases. It first takes advantage of the unordered character of database tuples to cluster them by Mahalanobis distance, and then combines this with the polar angle expansion strategy to embed and extract the watermark. The scheme shows high robustness under blind detection against subset selection, addition and modification attacks, and can also restore the original data more faithfully. Due to the local convergence of fast clustering and the error in the restored data, it cannot yet satisfy applications that require high-accuracy data. The next step is to adopt a new update strategy to speed up convergence and achieve global convergence, and then to design a completely reversible database watermarking algorithm and prove it in theory.

Acknowledgments. This work was supported by the National Natural Science Foundation of China (67307), the research project of Education Ministry and Science Ministry, Guangdong Province, China (20A09000027) and the Science and Technology Plan of Changsha, Hunan Province, China (K09099-).

References

1. R.G. van Schyndel, A.Z. Tirkel, and C.F. Osborne, A Digital Watermark, Proc. ICIP 94, vol. 2, pp. 86-90 (1994)
2. I. Cox, M. Miller, J. Bloom, and Chris Honsinger, Digital Watermarking, Academic Press, USA (2002)
3. R. Sion, M. Atallah, and S. Prabhakar, Rights Protection for Relational Data, IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 12, pp. 1509-1525 (2004)
4. R. Agrawal and J. Kiernan, Watermarking Relational Databases, Proc. VLDB 02, pp. 155-166 (2002)
5. R. Sion, M. Atallah, and S. Prabhakar, On Watermarking Numeric Sets, Proc. IWDW, pp. 2-5 (2002)
6. Xiamu Niu et al, Watermarking Relational Databases for Ownership Protection, Chinese Journal of Electronics (in Chinese), vol. 31, no. 12A, pp. 2050-2053 (2003)
7. Y.J. Li, V. Swarup, and S. Jajodia, Fingerprinting Relational Databases: Schemes and Specialties, IEEE Transactions on Dependable and Secure Computing, vol. 2, no. 1, pp. 34-45 (2005)
8. Y. Zhang, X.M. Niu, and D.N. Zhao, A Method of Protecting Relational Databases Copyright with Cloud Watermark, Proc. World Academy of Science, Engineering and Technology, vol. 3, pp. 68-72 (2005)
9. Y. Zhang, B. Yang, and X.M. Niu, Reversible Watermarking for Relational Database Authentication, Journal of Computers, vol. 17, no. 2, pp. 59-65 (2006)
10. G. Gupta and J. Pieprzyk, Reversible and Blind Database Watermarking Using Difference Expansion, International Journal of Digital Crime and Forensics, vol. 1, no. 2, pp. 42-54 (2009)
11. I. Kamel, A Schema for Protecting the Integrity of Databases, Computers & Security, vol. 28, no. 7, pp. 698-709 (2009)
12. A.H. Ali et al, Copyright Protection of Relational Database Systems, Proc. 2nd International Conference on Networked Digital Technologies, vol. 87, pp. 143-150 (2010)
13. G.A. David, Query-Preserving Watermarking of Relational Databases and XML Documents, ACM Transactions on Database Systems, vol. 36, no. 1, pp. 30-324 (2011)
14. Mahmoud E. Farfoura et al, A blind reversible method for watermarking relational databases based on a time-stamping protocol, vol. 39, no. 3, pp. 3185-3196 (2012)
15. W.C. Tao, Z.Y. Li, and H.F. Li, Reversible and Blind Database Watermark Algorithm Based on Polar Angle Expansion, Computer Engineering (in Chinese), vol. 36, no. 22, pp. 55-57 (2010)
16. Z.Q. Wen and Z.X. Cai, Convergence Analysis of Mean Shift Algorithm, Journal of Software (in Chinese), vol. 18, no. 2, pp. 205-22 (2007)