Fast Color Space Transformation for Embedded Controller by SA-C Reconfigurable Computing

International Journal of Information and Electronics Engineering, Vol., No., July

Jian-Long Kuo

Abstract: This paper proposes nominal decomposition for calculating color space transformation in a single assignment language in order to speed up execution time. Chromaticity coordinate transformation is discussed within the framework of nominal decomposition. The gamma curve look-up table for function mapping is also discussed with reference to Taylor expansion, using two LUT types with different numbers of points. Each numerical value is decomposed into a nominal value and a base value. K-Table matrices for different bases are also proposed in order to provide the base change for the associated derivation. Such an algorithm is suitable for programming in Single Assignment C. VHDL code can be generated from the Data Flow Graph in a Single Assignment C compiler. The results show that the proposed algorithm can shorten the execution time to a certain extent. It is evident that the proposed algorithm can be very helpful in implementing the hardware interface or software driver for processor display and printer systems.

Index Terms: Color Space, Chromaticity Coordinate, Data Flow Graph (DFG), Look-up Table (LUT), Nominal Decomposition, Parallel Computing, Reconfigurable Computing System (RCS), Single Assignment C (SA-C), VHDL.

I. INTRODUCTION
A. Single Assignment C
Recently, systems on a chip, such as FPGA/CPLD, have drawn considerable attention in consumer electronics IC design because they provide a powerful IC design and development platform. However, design languages such as VHDL or Verilog still involve a tedious implementation procedure for a complicated high-level signal-processing algorithm when an embedded system is considered. It is not easy to realize a high-level signal-processing algorithm directly in VHDL or Verilog, and a more powerful language is required to realize complicated signal-processing algorithms or embedded systems. C-based languages have become more and more popular because they are easy to program, for example SystemC or Handel-C. Recently, the SA-C language was developed to meet the VHDL design requirement [1]-[5]. Research on the Cameron Project, originally run at Colorado State University, has rapidly spread worldwide. The purpose of the Cameron Project is to make FPGAs and other adaptive computing systems available to more application designers.

To this end, SA-C has been developed as an alternative to the C programming language, together with an optimizing compiler for parallel operation [6]. The language and compiler can translate high-level programs directly into FPGA designs, especially for complicated signal processing or image processing, as well as many other system integration applications. As shown in Fig. 1, SA-C source programs are initially translated into a DFG, which has token-driven semantics [7]. SA-C can be further translated into a type of graph called the data dependence and control flow (DDCF) graph, which is convenient for advanced optimization. A conventional DFG, however, does not have nodes with internal state, for example, registers. Therefore, another round of optimization and translation transforms the data flow graph into an abstract hardware architecture graph, which is a data flow graph with stateful nodes (mostly registers and handshaking signals). The abstract hardware graph is then optimized one last time before VHDL is generated.

Figure 1. SA-C compiler procedure for the reconfigurable computing system

Manuscript received June, ; revised June 3,. This work was supported in part by the National Science Council in Taiwan. J. L. Kuo is with the Institute of System Information and Control, Mechanical and Automation Engineering, National Kaohsiung First University of Science and Technology (NKFUST), Nan-Tze, Kaohsiung, TAIWAN (e-mail: jlkuo@ccms.nkfust.edu.tw).

B. Color Space Transformation Using Nominal Decomposition
Increasing demands for computing power have led to rapid improvement in parallel processing hardware [8]. Many techniques have been used to increase the speed of parallel computations at the algorithm and architecture levels. Selecting the correct parallel algorithm can have a significant effect on the performance of certain parallel problems. Various applications have forced parallel programmers to develop new algorithms in order to meet performance requirements. This paper proposes an algorithm called nominal decomposition for color space transformation. An actual floating-point value can be decomposed into two independent parts: the nominal part and the base part. The nominal part can be declared as a fixed-point data type in SA-C, and the base part can be declared as an integer data type. Such an algorithm has the advantage of data-flow-independent arithmetic operations, which are suitable for parallel SA-C computation.

In most processor systems, such as the operating system and peripheral video interface, sRGB follows the CIE standard for every information product. The printer CMYK color system follows the ICC standard to keep the true color of printed documents. Among the variety of chromaticity coordinates, such as XYZ and YUV, sRGB is the standard color space used as the intermediate of color space transformation in the processor system [9]. As shown in Fig. 2, the proposed algorithm can convert the color space of video signals, photos, or images, and it is suitable for parallel operation owing to nominal decomposition. Video information products have an interface that follows the sRGB standard. In addition, a gamma curve accompanies the color space to modify the color mapping, and gamma regulation can make an image clearer. The PC Windows system uses the standard gamma 2.2 for its OS, whereas the Mac computer uses gamma 1.8. The proposed algorithm can be implemented by either the peripheral-hardware or the driver-software approach. The color space transformation defined in the CIE chromaticity chart has many matrix conversion types. Since the video interface requires high-speed operation in many products, high-speed parallel computation suitable for speedup is developed in this paper. The gamma characteristic of the color space might affect the image shown in the processor display system. The tristimulus XYZ value can be derived from the standard sRGB coordinate by the appropriate matrix operation.
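The split of a real value into a nominal part and a base part can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the author's SA-C code: it takes the base to be a power of two (in line with the right-shift remarks later in the paper), uses `math.frexp`, which returns a mantissa with magnitude in [0.5, 1) and an integer exponent, and quantizes the nominal part with a hypothetical 15-bit fraction width `FRAC_BITS`.

```python
import math

FRAC_BITS = 15  # hypothetical fraction width of the fixed-point nominal part


def nominal_base(value: float) -> tuple[float, int]:
    """Split value into (nominal, k) with value == nominal * 2**k.

    For nonzero input, math.frexp returns 0.5 <= |nominal| < 1, so the
    nominal part fits a fixed-point type and k is a small integer base
    exponent. For 0.0 it returns (0.0, 0).
    """
    return math.frexp(value)


def to_fixed(nominal: float, frac_bits: int = FRAC_BITS) -> int:
    """Quantize a nominal value in (-1, 1) to a signed fixed-point integer."""
    return round(nominal * (1 << frac_bits))


def from_parts(q: int, k: int, frac_bits: int = FRAC_BITS) -> float:
    """Rebuild the real value from fixed-point nominal q and base exponent k."""
    return math.ldexp(q, k - frac_bits)


n, k = nominal_base(12.92)
print(n, k)                         # 0.8075 4, i.e. 12.92 = (12.92/16) * 2**4
print(from_parts(to_fixed(n), k))   # 12.919921875
```

The 12.92 example splits into 0.8075 × 2^4, the same 12.92/16 factor that appears later for the linear segment of the gamma curve; the small reconstruction error comes only from the 15-bit quantization of the nominal part.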
The transformation formula can be further decomposed into two parts for arithmetic operation: the nominal value and the base value. Such decomposition has the advantage of accelerating the execution time. Only simple data types, such as integer and fixed point, are required; floating-point and double data types are not required in the proposed algorithm. In SA-C, the nominal value is declared as a fixed-point data type, such as fix or ufix, and the base value is declared as an integer data type, such as int or uint. Therefore, the proposed algorithm has the potential to accelerate the execution time, because only integer and fixed-point arithmetic operations are required. Such a data-flow-independent algorithm is especially suitable for SA-C programming with parallel operation. As shown in Fig. 3, the data-flow graph can be converted into a reconfigurable computing system. The compiler can implement the proposed algorithm as circuit-level models in VHDL, without much data dependency, in the RCS instead of the host, as shown in Fig. 1. As shown in Fig. 4, the nominal value operation and the base value operation are independent of each other.

Figure 2. Proposed nominal decomposition representation
Figure 3. Paradigm for the DFG generation of the for loop

II. COLOR SPACE TRANSFORMATION
A. Representation of Arrays in SA-C
In general, arrays in SA-C can be represented by two vectors [10]. A data vector contains all array elements in row-major order, and a shape vector specifies the number of elements per axis. Let A be an n-dimensional array with shape vector sv and data vector dv, respectively:
$$sv = [sv_1, sv_2, \ldots, sv_n], \qquad dv = [dv_0, dv_1, \ldots, dv_{l-1}] \qquad (1\text{-}1)$$
Then the length l of the data vector should be
$$l = \prod_{i=1}^{n} sv_i \qquad (1\text{-}2)$$
Sub-arrays or elements of the array may be addressed by index vectors of the set
$$\{\, [v_1, v_2, \ldots, v_m] \mid (m \le n) \wedge (\forall j \in \{1, 2, \ldots, m\} : v_j < sv_j) \,\} \qquad (2)$$
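The shape-vector/data-vector representation can be illustrated with a small Python sketch that uses plain lists in place of SA-C arrays and zero-based indices (both assumptions made for illustration; the function names are hypothetical). Because the data vector is stored in row-major order, an index vector [v_1, ..., v_m] picks out one contiguous run of the data vector whose start and length follow from the shape vector alone.

```python
from math import prod


def data_length(sv: list[int]) -> int:
    """Length l of the data vector: the product of the shape vector entries."""
    return prod(sv)


def select(sv: list[int], dv: list, v: list[int]) -> tuple[list[int], list]:
    """Return (shape, data) of the sub-array addressed by index vector v.

    With row-major storage the selection is the contiguous slice
    dv[p : p + q], and its shape is the remaining axes sv[m:].
    """
    m, n = len(v), len(sv)
    if m > n or any(not 0 <= vj < svj for vj, svj in zip(v, sv)):
        raise IndexError("index vector outside the array shape")
    p = sum(v[j] * prod(sv[j + 1:]) for j in range(m))  # start offset
    q = prod(sv[m:])                                     # number of elements
    return sv[m:], dv[p:p + q]


sv, dv = [2, 3], [0, 1, 2, 3, 4, 5]
print(select(sv, dv, [1]))     # second row
print(select(sv, dv, [1, 2]))  # single element
print(select(sv, dv, []))      # whole array
```

With sv = [2, 3], the index vector [1] selects the second row ([3], [3, 4, 5]), [1, 2] selects the single element 5 with an empty shape, and the empty index vector returns the whole array.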

An index vector v = [v_1, v_2, ..., v_m] selects the sub-array with shape vector [sv_{m+1}, ..., sv_n] and data vector [dv_p, dv_{p+1}, ..., dv_{p+q-1}], where p and q satisfy
$$p = \sum_{j=1}^{m}\Big(v_j \prod_{t=j+1}^{n} sv_t\Big), \qquad q = \prod_{t=m+1}^{n} sv_t \qquad (3)$$
The special cases (m = 0) and (m = n) specify the selection of the whole array and the selection of the single array element dv_p, respectively.

B. Matrix Operation Algorithm for SA-C
SA-C has modified data types: signed or unsigned integers and fixed-point numbers with user-defined bit widths. In addition, SA-C has multidimensional rectangular arrays whose extents are determined dynamically or statically. For example, the type declaration int14 M[:,6] declares a matrix M of 14-bit signed integers. The left dimension is determined dynamically, whereas the right dimension is specified by the user. Most important is the for loop structure, which has three parts: a generator, a body, and a return expression. By using the for loop in SA-C, many parallel matrix operations can be realized very easily; e.g.,

    for _ in [n] return (tile (B))
    for _,_ in [m,n] return (tile (B))

produce n and m × n iterations, respectively, without having to declare an iteration variable. This is helpful for the matrix expansion or flattening discussed in this paper.

Type I: Flattening a matrix to a vector:

    int16[:] main (int16 A[:,:])
      return (for V (~,:) in A return (tile (V)));

Type II: Restructuring a vector to a matrix:

    int16[:,:] main (int16 A[:], int16 n)
    {
      int16 s = extents(A);
      assert (s % n == 0, "not rectangular", s, n);
      int16 res[:,:] = for window V[n] in A step (n)
                       return (array(V));
    } return (res);

C. Chromaticity Coordinate Transformation from sRGB to YUV
The YUV model defines a color space in terms of one luminance and two chrominance components. YUV is used in the PAL and NTSC video signal systems for the processor system, and it is the standard in much of the world [11].
YUV models human perception of color more closely than the standard RGB model used in processor graphics hardware, but not as closely as the HSL and HSV color spaces. Y stands for the luminance component, and U and V are the chrominance components. The YCbCr and YPbPr color spaces used in video interfaces are derived from YUV; Cb/Pb and Cr/Pr are simply scaled versions of U and V and are sometimes inaccurately called YUV. YUV signals are created from an original RGB source. The weighted values of R, G, and B are added together to produce a single Y signal, representing the overall brightness, or luminance, of the pixel. The U signal is then created by subtracting Y from the blue signal of the original RGB and scaling, and V is created by subtracting Y from the red signal and scaling by another factor. The following equations can be used to derive Y, U, and V from R, G, and B in compact form:
$$C_{YUV} = M_{YR}\, C_{sRGB} \qquad (4)$$
The standard elements of the above matrix can be described in detail as follows:
$$\begin{aligned} Y &= 0.299R + 0.587G + 0.114B \\ U &= 0.492(B - Y) = -0.147R - 0.289G + 0.436B \\ V &= 0.877(R - Y) = 0.615R - 0.515G - 0.100B \end{aligned} \qquad (5)$$
M_YR is defined with respect to different color temperature conditions.

D. Chromaticity Coordinate Transformation from sRGB to XYZ
In a processor system, the quantitative three-color gray scale is clearly defined by the CIE 1931 standard [12]. The coordinate transformation between the chromaticity coordinate XYZ and standard RGB (sRGB) is the basic operation for the processor display system [13]:
$$C_{XYZ} = M_{XR}\, C_{sRGB} \qquad (6)$$
In complete form, the matrix can be written as [14]:
$$\begin{aligned} X &= 0.4124R + 0.3576G + 0.1805B \\ Y &= 0.2126R + 0.7152G + 0.0722B \\ Z &= 0.0193R + 0.1192G + 0.9505B \end{aligned} \qquad (7)$$
Note that M_XR is defined with respect to different color temperature conditions. To implement the coordinate transformation in hardware or software, nominal decomposition is proposed for speeding up the numerical operation. In some interface cards, the nominal numerical expression is especially suitable for arithmetic operation. Some processor interfaces adopt the ITU-R BT.709 color system [15].
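As a floating-point reference for Eqs. (5) and (7), the two matrices can be applied directly. The Python sketch below is an illustrative check rather than part of the proposed fixed-point method, and its function names are hypothetical. Feeding the white input R = G = B = 1 through the XYZ matrix and normalizing gives chromaticity coordinates close to the standard D65 white point (x, y) = (0.3127, 0.3290), while the YUV matrix maps white to Y = 1 with U and V essentially zero.

```python
# Coefficients copied from Eq. (5) (sRGB -> YUV) and Eq. (7) (sRGB -> XYZ).
M_YR = [
    [0.299, 0.587, 0.114],
    [-0.147, -0.289, 0.436],
    [0.615, -0.515, -0.100],
]
M_XR = [
    [0.4124, 0.3576, 0.1805],
    [0.2126, 0.7152, 0.0722],
    [0.0193, 0.1192, 0.9505],
]


def mat_vec(m, c):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(a * b for a, b in zip(row, c)) for row in m]


def chromaticity(xyz):
    """Normalize tristimulus XYZ to chromaticity (x, y, z) summing to 1."""
    s = sum(xyz)
    return [t / s for t in xyz]


white_xyz = mat_vec(M_XR, [1.0, 1.0, 1.0])  # about [0.9505, 1.0, 1.089]
print(chromaticity(white_xyz))              # about [0.3127, 0.3290, 0.3583]
print(mat_vec(M_YR, [1.0, 1.0, 1.0]))       # Y about 1, U and V about 0
```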
In the ITU-R BT.709 color system, the white point is defined as D65, with a color temperature of 6500 K. When a digital camera receives an image, the processor software processes the image and then saves it in a specific image format, for which D65 is often used. However, the printer system usually prints the image under the D50 format. Therefore, color space transformation has to be considered to keep the true color of the images or documents. The related white-point calculation is performed in the verification.

E. Generalized Derivation for the Color Space Transformation
To describe the two transformations of Eqs. (4) and (6) in detail, the generalized equation can be written in the following form, with three scaling factors per row:

$$\begin{aligned} X &= m_{11}R + m_{12}G + m_{13}B \\ Y &= m_{21}R + m_{22}G + m_{23}B \\ Z &= m_{31}R + m_{32}G + m_{33}B \end{aligned} \qquad (8)$$
Take the X variable as an example: the matrix operation can be decomposed into two components to speed up the manipulation. The derivation for the other variables, Y and Z in the XYZ color space and Y, U, and V in the YUV color space, follows the same process and is omitted for clarity. The original formula can be decomposed into:
$$X = (m_{11})_n (m_{11})_b (R)_n (R)_b + (m_{12})_n (m_{12})_b (G)_n (G)_b + (m_{13})_n (m_{13})_b (B)_n (B)_b \qquad (9)$$
Here (m_{11})_n, (m_{12})_n, and (m_{13})_n are defined within the same numerical range [0.0, 1.0], which is a fixed-point data type in SA-C. The bases (m_{11})_b, (m_{12})_b, and (m_{13})_b are selected as the powers of 2, 2^{k_RX}, 2^{k_GX}, and 2^{k_BX}, to obtain the nominal values. The base values are integer data types in SA-C:
$$(m_{11})_b = 2^{k_{RX}}, \quad (m_{12})_b = 2^{k_{GX}}, \quad (m_{13})_b = 2^{k_{BX}}, \qquad 2^{k_{RX}},\, 2^{k_{GX}},\, 2^{k_{BX}} > \max\{m_{11}, m_{12}, m_{13}\} \qquad (10)$$
Likewise, (R)_n, (G)_n, and (B)_n are defined within the same numerical range [0.0, 1.0], and (R)_b, (G)_b, and (B)_b are selected as 2^{k_R}, 2^{k_G}, and 2^{k_B} to normalize the values:
$$(R)_b = 2^{k_R}, \quad (G)_b = 2^{k_G}, \quad (B)_b = 2^{k_B} > \max\{R, G, B\} \qquad (11)$$
Since RGB is a nominal value, its base can be selected as 1. The linear combination for X can then be expressed in the following form based on the proposed decomposition:
$$X = (m_{11})_n 2^{k_{RX}} (R)_n + (m_{12})_n 2^{k_{GX}} (G)_n + (m_{13})_n 2^{k_{BX}} (B)_n \qquad (12\text{-}1)$$
To be consistent with the bases on the left-hand and right-hand sides, let the three base factors be equal to 2^{k_comX}; then
$$X = 2^{k_{comX}} \big( (m_{11})_n (R)_n + (m_{12})_n (G)_n + (m_{13})_n (B)_n \big) \qquad (15)$$
With nominal decomposition, the above expression can be split into the following two independent equations, one for the base and one for the nominal value:
$$(X)_b = 2^{k_{comX}}, \qquad (X)_n = (m_{11})_n (R)_n + (m_{12})_n (G)_n + (m_{13})_n (B)_n \qquad (16)$$
Note that a base factor of 2^{-1} = 0.5 is a simple right-shift operation in the nominal operation. Since there are three terms in the linear combination, the scaling factor 0.5 can be specified further under the proposed algorithm so that numerical saturation is avoided. Any software-based driver or hardware-based processor can implement the algorithm very easily by using only the fixed-point and integer data types in SA-C. The algorithm is based on fixed-point operation.
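The split of one matrix row into nominal parts and a shared power-of-two base can be sketched with integer-only arithmetic. This Python sketch is not the paper's SA-C code. The 15-bit nominal width, the rule that picks the common exponent k_com as the largest per-coefficient exponent (so that every nominal part stays below 1), the 8-bit integer RGB inputs, and the round-to-nearest shift are all assumptions made for illustration, and the function names are hypothetical.

```python
import math

FRAC_BITS = 15  # hypothetical fraction width of the fixed-point nominal parts


def decompose_row(row, frac_bits=FRAC_BITS):
    """Give one matrix row a common power-of-two base 2**k_com.

    Assumption: k_com is the largest frexp exponent in the row, so each
    nominal part m / 2**k_com has magnitude below 1 and is stored as a
    signed integer q with q / 2**frac_bits == m / 2**k_com (rounded).
    """
    k_com = max(math.frexp(m)[1] for m in row)
    nominals = [round(math.ldexp(m, frac_bits - k_com)) for m in row]
    return nominals, k_com


def apply_row_fixed(nominals, k_com, rgb, frac_bits=FRAC_BITS):
    """Integer-only evaluation of m1*R + m2*G + m3*B for integer R, G, B.

    The nominal products are accumulated first; the fixed-point scale and
    the base 2**k_com are then applied together as one right shift.
    """
    acc = sum(q * c for q, c in zip(nominals, rgb))  # nominal products only
    shift = frac_bits - k_com
    assert shift > 0, "sketch assumes k_com < frac_bits"
    return (acc + (1 << (shift - 1))) >> shift       # round to nearest


x_nom, x_k = decompose_row([0.4124, 0.3576, 0.1805])  # X row of Eq. (7)
print(x_nom, x_k)                                      # [27027, 23436, 11829] -1
print(apply_row_fixed(x_nom, x_k, (255, 255, 255)))    # 242
```

For the X row of Eq. (7), k_com = -1 and the combined shift is 16 bits; the white input (255, 255, 255) gives 242, compared with 0.9505 × 255 ≈ 242.4 in floating point. For the Z row, the nominal parts sum to more than one, which is the situation where the additional scaling mentioned in the text would be needed to avoid saturating a fixed-width accumulator.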
A fixed-point processor or the FPGA/CPLD chip in an interface card can adopt the algorithm very easily by using the SA-C compiler. Since the operation is decomposed into a nominal value and a base value, the calculation can be further implemented by parallel operation without any coupling terms.

F. K-Table Matrix for Base Change
The above derivation has described the detailed nominal decomposition for the X variable, which can be summarized as
$$X = (m_{11})_n 2^{k_{RX}} (R)_n + (m_{12})_n 2^{k_{GX}} (G)_n + (m_{13})_n 2^{k_{BX}} (B)_n, \qquad \text{where } 2^{k_{RX}} = 2^{k_{GX}} = 2^{k_{BX}} = 2^{k_{comX}} \qquad (12\text{-}2)$$
$$k_{comX} = \min(k_{RX}, k_{GX}, k_{BX}) \qquad (13)$$
To keep away from numerical saturation, the following condition has to be satisfied:
$$k_{RX} = k_{GX} = k_{BX} = k_{comX} \qquad (14)$$
For simplification, the Y and Z variables follow the same derivation, which can be summarized as a K-Table for the base change. The color space transformation for YUV is the same and can be defined as another K-Table, as shown in Fig. 4. The K-Tables can be realized by using the manifold array operations provided in SA-C:
$$K_{tabXYZ} = \begin{bmatrix} k_{RX} & k_{RY} & k_{RZ} \\ k_{GX} & k_{GY} & k_{GZ} \\ k_{BX} & k_{BY} & k_{BZ} \end{bmatrix}, \qquad K_{tabY'UV} = \begin{bmatrix} k_{RY'} & k_{RU} & k_{RV} \\ k_{GY'} & k_{GU} & k_{GV} \\ k_{BY'} & k_{BU} & k_{BV} \end{bmatrix}$$
$$K_{comXYZ} = [\,k_{comX}\;\; k_{comY}\;\; k_{comZ}\,], \qquad K_{comY'UV} = [\,k_{comY'}\;\; k_{comU}\;\; k_{comV}\,]$$
Figure 4. Nominal decomposition for the Taylor expansion arithmetic operation (left side) and K-Table memory allocation (right side)

By the definition in SA-C, the K-Table matrix for XYZ can be formed with three base factors,
$$K_{comXYZ} = [\,k_{comX}\;\; k_{comY}\;\; k_{comZ}\,] \qquad (17\text{-}1)$$
Then for _,_ in [3,] return (tile (K_comXYZ)) creates a matrix with extents in SA-C with parallel operation. The K-Table matrix for YUV can be formed with another three base factors,

$$K_{comY'UV} = [\,k_{comY'}\;\; k_{comU}\;\; k_{comV}\,] \qquad (17\text{-}2)$$
Then for _,_ in [3,] return (tile (K_comY'UV)) creates a matrix with extents in SA-C with parallel operation. To distinguish the Y of YUV from the Y of XYZ in notation, Y' is used for the purpose of identification.

III. GAMMA CURVE MAPPING FUNCTION
A. Color Modification with LUT
In a processor system, a digital camera with CCD/CMOS devices can capture images from outside the processor system. Digital sensors such as CCD/CMOS are linear: the voltage generated in each pixel and the pixel level emerging from the ADC are proportional to exposure, so gamma correction is required to create better image quality. To provide a unified coordinate for the color space of Windows and peripheral systems, sRGB is the color space standard defined by Hewlett-Packard and Microsoft Corporation. The digital encoding for sRGB has 8-bit resolution, and the proposed algorithm can help keep the maximum resolution of the numerical operation. In general, gamma 2.2 is widely adopted through the following calculation formula [16]:
If R, G, B ≤ 0.00304:
$$R' = 12.92R, \qquad G' = 12.92G, \qquad B' = 12.92B \qquad (18\text{-}1)$$
If R, G, B > 0.00304:
$$R' = 1.055R^{(1.0/2.4)} - 0.055, \qquad G' = 1.055G^{(1.0/2.4)} - 0.055, \qquad B' = 1.055B^{(1.0/2.4)} - 0.055 \qquad (18\text{-}2)$$
First, the case in which the value is larger than 0.00304 is discussed. The above calculation can be divided into two aspects. One aspect is the following, which is simply a linear relation with offset:
$$R' = 1.055R'' - 0.055, \qquad G' = 1.055G'' - 0.055, \qquad B' = 1.055B'' - 0.055 \qquad (19)$$
The other aspect is the nonlinear gamma characteristic of the processor display. This nonlinear function is hard to calculate with a fixed-point-based algorithm, so a linear approximation, namely a Taylor expansion combined with a look-up table (LUT), is considered. Once the linear approximation is derived, the nominal decomposition algorithm can be used again to derive a complete algorithm suitable for nominal operation in the processor system. Only a few points need to be built into the LUT.
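A floating-point reference for the gamma formula of Eq. (18), split into the linear-with-offset part and the nonlinear power part exactly as described above, can be sketched as follows. The function names and the 8-bit wrapper are illustrative assumptions; the power term is the part that the proposed Taylor-expansion LUT is meant to replace in fixed-point hardware.

```python
def gamma_encode(c: float) -> float:
    """Eq. (18): linear value c in [0, 1] -> gamma-encoded value in [0, 1]."""
    if c <= 0.00304:
        return 12.92 * c            # linear segment, Eq. (18-1)
    c_pp = c ** (1.0 / 2.4)         # nonlinear part R'' that the LUT approximates
    return 1.055 * c_pp - 0.055     # linear relation with offset, Eq. (19)


def gamma_encode_8bit(v: int) -> int:
    """Apply gamma_encode to an 8-bit code value in 0..255."""
    return round(255 * gamma_encode(v / 255))


print([gamma_encode_8bit(v) for v in (0, 1, 64, 128, 255)])
```

Keeping the two aspects in separate lines mirrors the decomposition in the text: only the power term needs approximation, while the multiply-and-offset step is already a linear operation suited to nominal decomposition.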
The gamma characteristics for the three colors are represented as:
$$R'' = R^{(1.0/2.4)} = f_R(R), \qquad G'' = G^{(1.0/2.4)} = f_G(G), \qquad B'' = B^{(1.0/2.4)} = f_B(B) \qquad (20)$$
The above three functions are nonlinear, and an LUT is required to obtain the associated function mapping. To compute the functions easily, piecewise linear approximation is required. Taylor expansion combined with an LUT is used to expand the approximation locally, and the local pieces are then integrated by the proposed global Taylor approximation. The local limitation is usually the weak point of a Taylor expansion. By dividing the range into N sections to keep each local approximation valid, the global approximation is assembled from the local approximations in the individual sections, so the validity of the Taylor expansion within each local section keeps the global approximation effective. The accuracy range for the division can be further determined by considering the maximum acceptable error.
The case in which the value is smaller than 0.00304 can be decomposed as follows:
$$R' = 12.92R = (12.92)_n (12.92)_b (R)_n (R)_b = (12.92/16)\, 2^{4} (R)_n (R)_b = (12.92/16)(R)_n\, 2^{4+k} \qquad (21)$$
with (R)_b = 2^k. For 8-bit operation, the proposed method can achieve the maximum digital resolution for the numerical operation:
$$R' = (R')_n (R')_b = (12.92/16)(R)_n\, 2^{4+k}, \qquad (R')_b = 2^{4+k} = 2^{0} = 1, \qquad 4 + k = 0, \qquad k = -4 \qquad (22)$$
B. Gamma Curve Implementation for Color Space Transformation
This paper proposes a nominal decomposition algorithm suitable for the digital implementation of color space transformation in processor display and printer systems. The algorithm can be used in either hardware or software. Nominal decomposition converts the sRGB signal into the CIE chromaticity coordinate. Gamma correction is also discussed as a way to regulate image quality in different processor display systems. The gamma value for the PC Windows system is 2.2, and that for the Mac computer system is 1.8; each processor system exhibits its own display texture. The gamma characteristic curve can be generally defined as:
$$Y_j = X_j^{(\gamma_{spec})_j}, \qquad j = R, G, B \qquad (23)$$
where (γ_spec)_j is the gamma curve specification of the j-th color for the processor display system.
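The N-section Taylor expansion combined with an LUT, described in Section III-A, can be sketched in Python for the nonlinear part f(x) = x^(1/2.4). The equal-width sections over [0.05, 1], the example values N = 10 and N = 20, and the function names are assumptions for illustration; the paper's own point counts and table settings are not reproduced here. The remainder bound is the standard Lagrange form, evaluated conservatively at the left edge of each section where |f''| is largest.

```python
GAMMA = 1.0 / 2.4  # exponent of the nonlinear part f(x) = x**GAMMA


def f(x: float) -> float:
    return x ** GAMMA


def df(x: float) -> float:
    return GAMMA * x ** (GAMMA - 1.0)


def build_lut(n: int, x_min: float = 0.05, x_max: float = 1.0):
    """Store (X_i, f(X_i), f'(X_i)) at the centres of n equal sections."""
    h = (x_max - x_min) / n
    table = []
    for i in range(n):
        xc = x_min + (i + 0.5) * h
        table.append((xc, f(xc), df(xc)))
    return h, x_min, table


def taylor_lut(x: float, lut) -> float:
    """First-order Taylor value f(X_i) + f'(X_i) (x - X_i) in the section of x."""
    h, x_min, table = lut
    i = max(0, min(int((x - x_min) / h), len(table) - 1))  # section index
    xc, fc, dfc = table[i]
    return fc + dfc * (x - xc)


def remainder_bound(xc: float, h: float) -> float:
    """Bound on |R_i| for |x - X_i| <= h/2 (Lagrange remainder).

    |f''(x)| = GAMMA * (1 - GAMMA) * x**(GAMMA - 2) decreases with x, so
    evaluating it at the left edge X_i - h/2 gives a safe constant.
    """
    left = xc - h / 2
    return GAMMA * (1.0 - GAMMA) * left ** (GAMMA - 2.0) * h * h / 8


for n in (10, 20):
    lut = build_lut(n)
    h, x_min, _ = lut
    grid = [x_min + k * (1.0 - x_min) / 1000 for k in range(1001)]
    worst = max(abs(taylor_lut(x, lut) - f(x)) for x in grid)
    print(n, worst, remainder_bound(x_min + h / 2, h))
```

Only three numbers per section are stored, and each evaluation is one multiply and one add, which is the kind of linear expression the nominal decomposition can handle. Doubling N reduces the worst error roughly fourfold, consistent with the h² factor in the bound, and the error is largest in the first section near x_min, where the curvature is greatest.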
The LUT unit is required to regulate the gamma curve for the color signal. However, the LUT unit usually requires a larger memory to

map the nonlinear function. Conventional point-to-point mapping may be required to obtain the function value, and such an approach occupies a large amount of memory in order to be precisely accurate. To save memory space for gamma correction, a Taylor expansion with a parallel structure is proposed in this paper. As shown in Fig. 4, the proposed algorithm can accelerate the execution of the color space transformation owing to the parallel structure. SA-C-based programming can implement the proposed algorithm very conveniently: SA-C compiles the algorithm into a DFG and translates the DFG into VHDL source code. The algorithm for gamma correction can be implemented in the software-based system driver or in the hardware-based interface card.

C. Global Taylor Expansion Combined with LUT
Usually, a look-up table needs considerable memory space to implement function mapping [17]. Taylor expansion combined with an LUT can be used to save memory space. Since the gamma curve is a function with a slow rate of change, a Taylor expansion for the look-up table should be sufficient for approximation. As shown in Fig. 5, the input variable is divided into N sections to reasonably approximate the gamma curve over the global region. Two values of N are illustrated in this paper, and the corresponding two LUT types are examined in the verification. By dividing the global region into N sections, the central points of the N sections are X_1 ~ X_N. To specify the individual gamma value for each color, the index j is used to identify the color:
$$Y_j = X^{(\gamma_{spec})_j} = f(X_i) + f'(X_i)(X - X_i), \qquad i = 1 \sim N, \quad j = R, G, B \qquad (24)$$
where X_1 ~ X_N are the specified points defined in the N-point look-up table, and h = X_{i+1} - X_i is the neighborhood distance between two adjacent points. The function approximation in the neighborhood of X_i is
$$\text{if } X_i - h/2 \le X < X_i + h/2, \text{ then } Y = f_i + f'_i (X - X_i) \qquad (25)$$
where X_1 ~ X_N are the central points defined in the look-up table for the N sections. The look-up table for the three colors can be built up as shown in Fig. 5. With the LUT, the gamma curve can be calculated in the form of a linear function, that is, a linear combination of the related terms. The addition and multiplication operations can be used in further SA-C programming, which is easily translated into VHDL for FPGAs. As shown in Fig. 5, the single-color LUT can be expanded into the three-color LUT required for the gamma curve. Each individual gamma value can be further regulated independently to account for the color temperature effect. The white balance function can be modified by regulating the individual gamma values to compensate for the color temperature of the image in the processor system.

To speed up the proposed algorithm, nominal decomposition is used again. For clarity, the equation can be represented as a linear function with slope m and y-intercept b:
$$Y_j = X^{(\gamma_{spec})_j} = f(X_i) + f'(X_i)(X - X_i) = mX + b \qquad (26)$$
To decompose the above equation, the nominal decomposition can be further expressed as
$$Y = (m)_n (m)_b (X)_n (X)_b + (b)_n (b)_b \qquad (27)$$
Figure 5. N-section representation for the N-point Taylor expansion with LUT, for the two LUT types (upper and lower)

The three factors k_m, k_x, and k_b define the bases of the three variables as powers of 2:
$$(m)_b = 2^{k_m}, \qquad (X)_b = 2^{k_x}, \qquad (b)_b = 2^{k_b} \qquad (28)$$
The nominal variables are defined in the following form:
$$(m)_n = m/(m)_b, \qquad (b)_n = b/(b)_b, \qquad (X)_n = X/(X)_b \qquad (29)$$
k_m, k_x, and k_b should be selected as the minimal values that satisfy the following conditions:
$$2^{k_m} \ge \max(m), \qquad 2^{k_x} \ge \max(X), \qquad 2^{k_b} \ge \max(b) \qquad (30)$$
$$2^{k_m + k_x} = 2^{k_b} = 2^{k_{com}}, \qquad k_{com} = \min(k_m + k_x,\, k_b) \qquad (31)$$
$$Y = mX + b = 2^{k_{com}}\big((m)_n (X)_n + (b)_n\big) = (Y)_b\, (Y)_n \qquad (32)$$
The K-Table matrix can be formed by the definition K_commxb = [k_comm k_comx k_comb]. Then the semantic expression for _,_ in [3,] return (tile (K_commxb)) creates a matrix by using extents in SA-C with parallel operation. To keep the nominal output bounded, (Y)_n < 1.0, the scaling factor should be selected as follows to avoid numerical saturation. Therefore, the parameters k_b and k_m + k_x can be determined by the above

inequality. This means that reasonable factors can be selected to perform the right-shift operation, with k_m and k_x chosen subject to
$$k_m + k_x = k_b \qquad (33)$$
D. Bounded Proof for Taylor Expansion
This section discusses the accuracy of the remainder function R_i(X) of the Taylor series approximation:
$$Y_j = X^{(\gamma_{spec})_j} = f(X_i) + f'(X_i)(X - X_i) + R_i(X), \qquad X_i - h/2 < \xi < X_i + h/2 \qquad (34)$$
$$R_i(X) = \frac{f''(\xi)}{2!}(X - X_i)^2 = \frac{\gamma_{spec}(\gamma_{spec} - 1)}{2!}\,\xi^{(\gamma_{spec} - 2)}(X - X_i)^2, \qquad |R_i(X)| < \frac{|\gamma_{spec}(\gamma_{spec} - 1)|}{2!}\,X_{i+1}^{(\gamma_{spec} - 2)}\,\frac{h^2}{4} \qquad (35)$$
The accuracy function can be derived in the following compact form. Since (γ_spec - 1) < 0 is negative in this case (γ_spec = 1/2.4), c_i can be written as
$$c_i = \frac{\gamma_{spec}(1 - \gamma_{spec})}{8}\,X_{i+1}^{(\gamma_{spec} - 2)}\,h^2 \qquad (36)$$
The above shows that the larger X_{i+1} is, the smaller the accuracy constant c_i becomes. In the next verification section, the accuracy functions for the two LUT types are provided.

IV. VERIFICATIONS
A. Comparison of Execution Speed
Color space transformations are often applied in processor peripheral and interface cards. Therefore, the proposed algorithm requires rapid computing to fulfill the real-time image-processing requirement. The SA-C compiler can generate parallel-computing VHDL for the circuit-level models of the associated algorithm. Some speedup tests are illustrated to show the capability of fast computing. As shown in Table 1, the verification illustrates three cases for study: the ANSI-C VC program running on a PC, and SA-C running on the Linux system on a PC with fixed-point and with floating-point data. Two realization approaches were compared: the fixed-point realization (nominal decomposition used) and the floating-point realization (nominal decomposition not used). The following cases were verified on a Linux system on a GHz Pentium PC.
The VHDL was verified on a Xilinx FPGA platform under a 4 MHz clock.
Case 1: Algorithm run using SA-C with the fix data type; the SA-C compiler compiles the fixed-point operation for the proposed algorithm.
Case 2: Algorithm run using SA-C with the float data type; the SA-C compiler compiles the direct floating-point operation.
Case 3: Algorithm run using ANSI-C with the float data type; the ANSI-C VC compiler compiles the direct floating-point operation.

TABLE 1: PERFORMANCE COMPARISON FOR THE CASE STUDY
Columns: Case | Pentium PC computation time (ms) | percent | FPGA cycles | FPGA computation time (ms) | percent | flip-flops ratio
Case 1: 4 %. %
Case 2: 3 46%.9 46% 57
Case 3: 6 49%.48 49% 5

As shown in Table 1, Case 1 spends the least time to calculate the algorithm, and Case 3 spends the most. The results show that the proposed algorithm combined with SA-C programming is faster than conventional programming using SA-C only, which declares the numerical operation by using the float data type directly. The verification thus shows that the proposed algorithm is faster than the conventional technique. SA-C programming also proved to be faster than the conventional, sequentially programmed ANSI-C VC language. The sequential C programming technique might have critical compatibility problems with circuit-level models in VHDL description, because VHDL executes the computation based on the logic design, which is considered at the circuit level. Whereas sequential C programming has to execute the instructions one instruction cycle at a time, SA-C has the reconfigurable capability to perform parallel computing.

B. Computing of Color Space Transformation
The tristimulus value, nominal tristimulus value, and chromaticity diagrams were calculated and plotted as shown in Figs. 6 and 7, and they are close to the standard CIE 1931 results [18]. To verify the validity, a number of white-point cases under different color temperatures are illustrated in Table 2. Note that axes without units denote normalized values. The gamma curve effect [19] on the test image can be observed in Fig. 8. Gamma 2.2
will make the original value higher, and the image becomes brighter. Usually the gamma curve with gamma 2.2 is realized by gamma 2.4 owing to the digital implementation, with gamma 2.4 slightly modified by linear scaling and offsetting [16]. In the end, they were

almost the same, as verified in Fig. 9.

C. Calculation of Taylor Expansion with LUT
The global Taylor approximation was plotted for comparison with the nonlinear function, as shown in Fig. 10. The Taylor approximation matches the original function quite well. Since the Taylor approximation is a first-order linear function, the LUT can calculate the function very easily, and by using the proposed Taylor expansion for the LUT in SA-C, the required memory is very small. As shown in Table 3, there are two LUT types, which differ in the number of points; each requires only a few words of memory space, so the required memory space is very small for both types. LUT methods can speed up the gamma curve mapping, and the accuracy of both types is acceptable. A look-up table often requires a large memory space, such as M points, to build up a mapping function in a point-to-point manner. Out of the M points, we selected only N points (N << M) to reduce the memory requirement. The results show that the accuracy depends on the X_{i+1} value, as shown in Eq. (36), and the relation between accuracy and X_{i+1} can be easily observed in Fig. 10. For the X_i interval, the accuracy is proved to be bounded by a constant c_i, which is associated with the given section point X_{i+1}, the section distance h, and the gamma value γ_spec; c_i is the accuracy of the linear approximation. For the adjacent X_{i+1} interval, the accuracy is likewise bounded by a constant c_{i+1}. Therefore, the total accuracy of the global Taylor approximation can be expressed as the linear combination of c_i and c_{i+1}.

V. DISCUSSION
1) The proposed algorithm is not the same as floating-point operation. Data-dependency decomposition is used to speed up the calculation in the pre-processing stage. Only right-shift and left-shift operations are required.
2) Unlike floating-point operation, which is processed in the SA-C compiler, the proposed algorithm for color space transformation is actually a pre-processing stage before the SA-C compiler. No floating-type data is declared in the algorithm; only integer-type data is used, and the processing of floating-point data differs from that of integer operation. The proposed algorithm is clearly defined as a pre-processing step before compilation by the SA-C compiler, and this pre-processing helps speed up the execution time. At the pre-processing stage, the nominal value and base value are separated independently to be suitable for parallel operation.
3) The K-table in Section 2.6 is clearly defined in the corresponding table. In the preceding Section 2.5, the expressions for the X variable are derived in detail; the Y and Z variables can be deduced in the same way.
4) Equation (10) defines the base required for the associated derivation. The base has to be larger than the maximum value of the three variables m_11, m_12, and m_13.
5) The flow chart in Fig. 4 shows that the data dependency can be considered separately by a calculation machine: the left and right parts of the flow chart can execute independently. The four matrices are defined as shown in Fig. 4.
6) Each column of Table 1 compares the execution time under the same percent cycle and percent flip-flops. Since the color space transformation is modeled by matrix operation with real numbers, floating-point operation would otherwise be required to perform the calculation. Before the SA-C compiler is used, this paper develops the formulation to be suitable for fixed-point operation, so the fixed-point data type implementation is required in the pre-processing stage. Fixed-point data without the proposed algorithm cannot speed up the color space transformation.

VI. CONCLUSIONS
This paper has successfully proposed nominal decomposition algorithms for color space transformation. The gamma curve mapping function was also decomposed into a structure suitable for SA-C circuit-level parallel operation. With the DFG embedded in SA-C, the required VHDL code can be further obtained.
The results show that the objectives of the speedup can be achieved. The color space transformation for chromaticity was successfully implemented. Taylor expansion combined with LUT was able to approximate the required gamma curve very well by using the two LUT types. K-Table matrices were built up for the base change in the nominal decomposition. It is evident that the proposed algorithm can be very helpful for color space transformation in hardware-based interface cards and software-based system drivers.

Figure 6. The tristimulus function versus wavelength for the RGB colors: (a) tristimulus value for xyz (x-, y-, and z-tristimulus versus wavelength, nm); (b) normalized value for XYZ (X-, Y-, and Z-normalized versus wavelength, nm)

Internatonal Journal of Informaton and Electroncs Engneerng, Vol., No., July.9.8.7.6 Z vs X.5.4.3.....3.4.5.6.7.8 (bz-x Fgure 7. The chromatcty dagram for Y-X and Z-X TABLE : THE XYZ AND RGB VALUE FOR THE WHITE-POINTS FOR DIFFERENT STANDARDS CALCULATED FROM THE PROPOSED ALGORITHM standard gamma X Y Z R 8bt G 8bt B 8bt D65 Gamma..37.39.3583 7 7 7 D65 Gamma..37.39.3583 48 55 54 D5 Gamma..3457.355.338 97 7 54 D5 Gamma..3457.355.338 67 48 33 D93 Gamma..83.97.4 5 63 87 D93 Gamma..83.97.4 3 4 6 Gamma..33.39.358 7 7 69 Gamma..33.39.358 48 47 46 (b Fgure 9. (a Gamma curve. and modfed gamma.4 comparson (b generated DFG representaton TABLE 3: GAMMA CURVE FUNCTION MAPPING LUT USING TAYLOR EXPANSION pont LUT LUT pont LUT LUT x>., h=. LUT LUT x>., h=. ponts 9 ponts.5<x<. h=. LUT ponts.5<x<. h=. LUT ponts x<.5 Lnear Lnear x<.5 Y=x Y=x Accuracy for x>.5 Large.5 Accuracy for x>.5 Small.5..8.6.4. -pont accuracy -. -...4.6.8. (a Gamma curve wth -pont LUT..8.6.4 -pont accuracy. -. -...4.6.8.. (b Gamma curve wth -pont LUT (a Gamma. (b Gamma. Fgure 8. Gamma curve modfcaton under full-color YUV color system.5..5 accuracy -pt..4.6.8. -.5..5. (c Accuracy under -pont LUT accuracy -pt.5..4.6.8. -.5 (a (d Accuracy under -pont LUT Fgure. LUT llustratons usng Taylor expanson for -pont and -pont LUT 3
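Table 3 and the conclusions refer to approximating the gamma curve with a Taylor expansion stored in a LUT. One way to realize this, sketched here in Python under stated assumptions, is a first-order expansion f(x) ≈ f(x_i) + f'(x_i)(x - x_i) around the tabulated node x_i at or below x, with nodes spaced h apart on [0, 1]. The target curve f(x) = x^γ with γ = 2.2, the point counts 9, 17, and 33, and the function names are hypothetical, because the paper's gamma values and LUT sizes are not legible in this text.

```python
GAMMA = 2.2  # hypothetical gamma exponent; not taken from the paper


def f(x):
    """Target gamma curve y = x ** GAMMA on [0, 1]."""
    return x ** GAMMA


def df(x):
    """Derivative f'(x) = GAMMA * x ** (GAMMA - 1)."""
    return GAMMA * x ** (GAMMA - 1)


def build_lut(points):
    """Tabulate f and f' at `points` equally spaced nodes on [0, 1]."""
    h = 1.0 / (points - 1)
    nodes = [i * h for i in range(points)]
    return h, [f(x) for x in nodes], [df(x) for x in nodes]


def lut_eval(x, lut):
    """First-order Taylor expansion around the node at or below x (0 <= x <= 1)."""
    h, fv, dv = lut
    i = min(int(x / h), len(fv) - 1)
    return fv[i] + dv[i] * (x - i * h)


def max_error(points, samples=1001):
    """Largest absolute deviation from f over evenly spaced samples in [0, 1]."""
    lut = build_lut(points)
    xs = [k / (samples - 1) for k in range(samples)]
    return max(abs(lut_eval(x, lut) - f(x)) for x in xs)


for n in (9, 17, 33):
    print(f"{n:2d}-point LUT: max error {max_error(n):.5f}")
```

For a first-order expansion the worst-case error is bounded by h^2/2 · max|f''|, so doubling the number of intervals cuts the error by roughly a factor of four; this is the LUT-size versus accuracy trade-off that Table 3 contrasts. The linear segment Y = x that Table 3 lists for small x is left out of the sketch because its threshold is not legible.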

ACKNOWLEDGMENT
Financial support from the National Science Council project is appreciated. The authors wish to thank Chi-Lin Tech. Inc. for providing the testing equipment, and Tzeng Tseng for typing this article enthusiastically.
REFERENCES
[1] J. Villarreal et al., "Improving Software Performance with Configurable Logic," J. Design Automation of Embedded Systems, pp. 35-339, Nov.
[2] Y. Li et al., "Hardware-Software Co-Design of Embedded Reconfigurable Architectures," Proc. Design Automation Conf. (DAC), ACM Press, pp. 57-5.
[3] W. Bohm et al., "Mapping a Single Assignment Programming Language to Reconfigurable Systems," J. Supercomputing, pp. 7-3.
[4] R. Rinker et al., "An Automated Process for Compiling Dataflow Graphs into Hardware," IEEE Trans. VLSI, pp. 3-39, Feb.
[5] W. Bohm et al., "Compiling ATR Probing Codes for Execution on FPGA Hardware," Proc. IEEE Symp. Field-Programmable Custom Computing Machines (FCCM), IEEE CS Press, pp. 3-3.
[6] J. Hammes, R. Rinker, W. Najjar, and B. Draper, "A high-level algorithmic programming language and compiler for reconfigurable system," The International Engineering of Reconfigurable Hardware/Software Objects Workshop.
[7] R. Rinker, M. Carter, A. Patel, M. Chawathe, C. Ross, J. Hammes, W. Najjar, and W. Bohm, "An automated process for compiling dataflow graphs into reconfigurable hardware," IEEE Transactions on VLSI Systems, vol. 9, pp. 3-39.
[8] M. Clement, S. Yu, Q. Snell, and B. Morse, "Parallel algorithms for image convolution," International Parallel and Distributed Processing Techniques and Applications Conference, paper no. 35p, 1998.
[9] B. Lauterbach and W. Anheier, "Segmentation of scanned maps in uniform color spaces," Machine Vision Applications Workshop, pp. 3-35, 1994.
[10] D. Kreye, "A compiler backend for generic array programming," Ph.D. thesis, University of Kiel, 3.
[11] C. Connolly and T. Fliess, "A study of efficiency and accuracy in the transformation from RGB to CIELAB color space," IEEE Transactions on Image Processing, vol. 6, no. 7, pp. 46-48, 1997.
[12] G. Zamora and S. Mitra, "Lossless coding of color images using color space transformations," IEEE Computer-Based Medical Systems Symposium, pp. 3-8, 1998.
[13] Y. V. Haeghen, J. M. Naeyeart, and I. Lemahieu, "Consistent digital color image acquisition of the skin," Proceedings of the Annual International IEEE Engineering in Medicine and Biology Society Conference, pp. 944-949, 1998.
[14] E. L. van den Broek and E. M. van Rikxoort, "Evaluation of color representation for texture analysis," Proceedings of the Artificial Intelligence Conference, pp. 35-4, 4.
[15] A. Stockman and L. T. Sharpe, "Cone spectral sensitivities and color matching," Cambridge University Press, London, pp. 53-87, 1999.
[16] "Reference Input Medium Metric RGB Color Encoding," RIMM RGB White Paper, Eastman Kodak Company.
[17] C. Connolly and T. Fliess, "A study of efficiency and accuracy in the transformation from RGB to CIELAB color space," IEEE Transactions on Image Processing, vol. 6, no. 7, pp. 46-48, 1997.
[18] G. Starkweather, "Color space interchange using," Microsoft TechPaper, sec. S, 3.
[19] C. Grana, G. Pellacani, S. Seidenari, and R. Cucchiara, "Color calibration for a dermatological video camera system," Proceedings of the International Pattern Recognition Conference, vol. 3, pp. 798-8, 4.
Jian-Long Kuo received a B.S. degree in Electrical Engineering from National Sun Yat-Sen University at Kaoh-Siung in 99, and a Ph.D. degree from the same institute in 1995. He is now an associate professor at the Institute of System Information and Control, Mechanical and Automation Engineering, National Kaohsiung First University of Science and Technology (NKFUST), Nan-Tze, Kaohsiung, TAIWAN. He was a project leader and research scientist at the Electro-technology Division, Energy & Resources Labs. of ITRI, Hsin-Chu, TAIWAN. His research interests include FPGA/CPLD analog and digital circuit design and 3C consumer electronic system integration technology in audio and video display systems. The Chinese Institute of Engineers awarded him the Prize for Excellent Engineering Student in June 99.
He has been awarded three national patents and two excellent scholastic prizes. He is now a member of IEICE, IEEE, IEE, and PHI-TOU-PHI.