Engineer-to-Engineer Note EE-295 Technicl notes on using Anlog Devices DSPs, processors nd development tools Visit our Web resources http://www.nlog.com/ee-notes nd http://www.nlog.com/processors or e-mil processor.support@nlog.com or processor.tools.support@nlog.com for technicl support. Implementing Dely Lines on SHARC Processors Contributed by Divy Sunkr Rev 1 October 25, 2006 Introduction This EE-Note discusses the implementtion of dely line on the ADSP-2126x, ADSP-21362, ADSP-21363, ADSP-21364, ADSP-21365, nd ADSP-21366 SHARC series of processors, herefter representtively referred to s ADSP- 21262 processors. Becuse ADSP-21262 processors ccess externl memory vi the prllel port, specil considertions re required while implementing dely lines on these processors when compred to other SHARC processors. Discrete filters cn be implemented with smplebsed dely lines nd set of coefficients tht define the filter. Due to memory constrints, dely lines re usully implemented in externl memory using vrious techniques. This document discusses two dely-line implementtions using the prllel port. First, smple-bsed dely line using prllel port core ccess is discussed. Lter in the document, techniques used in block-bsed dely line implemented vi prllel port DMA re described. Smple-Bsed Dely Lines Dely effects re chieved by implementing finite impulse response (FIR) or infinite impulse response (IIR) filters. For smple-bsed dely lines, this document focuses on the implementtion of simple stereo dely effect lgorithm with cross feedbck. The left nd right chnnels re coupled by introducing crossfeedbck coefficients, so tht the delyed output of one chnnel is fed into the input of the other chnnel, creting stereo dely effect. For detils, refer to the smple-bsed dely line code for the ADSP-21262 processor locted in the ssocited.zip file. The implementtion of the bove lgorithm requires two dely lines: one for left chnnel, nd one for the right chnnel. The dely lines re implemented s circulr buffers in externl memory. The length of the dely line depends on the required dely nd the input smpling rte. For exmple, 0.5-second dely requires 24000 tp dely line for n 48-KHz input smpling rte. Since this implementtion requires feedbck from the delyed output, the totl dely length is 24001, which includes the previous smple for the dely. Since the externl memory is 8 bits wide, ech tp (32-bit word) requires four externl memory loctions. Hence, the totl dely-line buffer length is four times the number of tps. Due to its feedbck structure, this lgorithm requires red from the dely-line buffer followed by write. The red nd write occur in smple-bsed mnner. Becuse the core ccesses externl memory vi prllel port DMA registers (EIPP, EMPP, nd ECPP), the circulr buffers in externl memory re implemented by using the DAG registers. Even though the DAG registers cnnot ccess externl memory, the DAG registers' circulr ddress cpbilities re Copyright 2006, Anlog Devices, Inc. All rights reserved. Anlog Devices ssumes no responsibility for customer product design or the use or ppliction of customers products or for ny infringements of ptents or rights of others which my result from Anlog Devices ssistnce. All trdemrks nd logos re property of their respective holders. Informtion furnished by Anlog Devices pplictions nd development tools engineers is believed to be ccurte nd relible, however no responsibility is ssumed by Anlog Devices regrding technicl ccurcy nd topiclity of the content provided in Anlog Devices Engineer-to-Engineer Notes.
used to clculte the ddress to ccess the externl memory vi the prllel port. The DAG register pointing to externl circulr buffers re then ssigned to the prllel port externl index register (EIPP) in order for the core to ccess the dt. Initilly, the red pointer points to the buffer's lst tp nd the write pointer points to the first tp. Following ech red nd write, both pointers re decremented nd the trnsfers continue smple by smple. For smple-bsed processing, single output is trnsmitted t prticulr time. Core-driven trnsfers to the externl memory vi the prllel port cn be initited using four techniques s described in the ADSP-2126x SHARC Processors Peripherls Mnul s well s in the ADSP-2136x SHARC Processor Hrdwre Reference. The known-durtion ccess technique, which is the most efficient of ll the ccesses, is used for reds nd writes. With 8-bit externl memory, ech 32-bit ccess vi the prllel port requires known durtion of 15 cycles [(1 ALE-cycle * 3 CCLK) + (4 dtcycles * 3 CCLK)] with PPDUR (prllel port durtion setting) of 3. The progrm cn use the 15 cycles tht occur fter the dt is written to the trnsmit register (or red from the receive register of prllel port) efficiently for other purposes. Another considertion while ccessing dt vi the prllel port is the ddressing of externl memory. This is due to the fct tht the prllel port no longer ccepts logicl ddresses but requires physicl ddresses of the externl memory to trnsfer dt. Consecutive 32-bit vlues in externl memory re ddressed by n offset of 4 becuse ech 32-bit vlue occupies four-byte-wide loctions of externl memory. Figure 1. FIR Filter The impulse response of the filter is vilble in Figure 2. Figure 3, which illustrtes the FIR implementtion using block-bsed dely lines in externl memory, depicts the implementtion with block of eight smples, filter length of six, nd two tp ccesses (tp2 nd tp4). Figure 1 shows the FIR implementtion of single block of dt. The FIR implementtion of single block of input dt requires three processes: write DMA, red DMAs, nd the dely process (convolution of input nd coefficient vlues). Ech of these processes is discussed in detil below. Even though the block-bsed dely line code implements the filter shown in Figure 1 nd Figure 2, Figure 3 shows the implementtion of only two tps of the filter to simplify the illustrtion; the dditionl tps follow the sme templte. Block-Bsed Dely Lines For the block-bsed dely-line technique, this document discusses the implementtion of lowreflection setting FIR s shown in Figure 1. Figure 2. FIR Filter Response Implementing Dely Lines on SHARC Processors (EE-295) Pge 2 of 7
Figure 3. FIR Filter Implementtion Write DMA The smple dt rrives s block of NUM_SAMPLES. The dt in the input buffer contins the left nd right chnnel smples s shown in Figure 4. Therefore, two dely lines re required for ech of the chnnel. The processing on ech of the chnnels is identicl. The length of ech dely line is equl to: Filter Length + ((NUM_SAMPLES/2) -1) For block-bsed FIR implementtions, the processing strts with writing block of dt (write DMA) to the dely-line buffer in externl memory. Figure 4 illustrtes the write DMA process to the left nd right dely lines. With block length (NUM_SAMPLES) of eight nd filter length of six, the dely length of ech of the chnnels is nine. Ech of the nine loctions of the dely line shown in Figure 4 corresponds to four loctions in externl byte-wide memory. Therefore, the totl dely line length in externl memory is 28 (9 x 4). The write pointers for ech of the chnnels re initilized to the first loctions of the left nd right dely line buffers shown s WrLptr nd WrRptr in Figure 4. Two write DMAs tke plce, one for ech chnnel. The function used to perform write DMA for both chnnels is the sme (Listing 1). The prmeters pssed to this function vry, nd re bsed on prticulr chnnel. The DMA for the left chnnel tkes plce with the internl index register pointing to TrLptr nd the externl index register of prllel port pointing to WrLptr. Similrly, for the right chnnel, the internl index register points to TrRptr nd externl index register points to WrRptr. The internl modify register is set to 2 nd the externl modify register is set to -1, plcing ech chnnel dt in counter clockwise direction in externl dely line memory strting with the first smple locted t the strt of the dely line buffer s shown in Figure 4. void TrnsDMA(int *Tinptr, int Tcount, int *Textptr) *pppctl = 0; /* initite prllel port DMA registers for first DMA*/ *piipp = Tinptr; *pimpp = 2; *picpp = Tcount; *pempp = -1; *peipp = Textptr; /* For 8-bit externl memory, the Externl count is four times the internl count */ *pecpp = Tcount * 4; *pppctl = PPTRAN PPBHC PPDUR4; *pppctl = PDEN PPEN; // enlble prllel port DMA /*poll to ensure prllel port hs completed the trnsfer*/ do ; while( (*pppctl & (PPBS PPDS) )!= 0); *pppctl = 0; // disble prllel port // strt of second DMA Listing 1. TrnsmitDMA.c Implementing Dely Lines on SHARC Processors (EE-295) Pge 3 of 7
void WriteDMA( int *Inptr, int *Wrptr, int *TestDMA, int *BUF) int DMAcount1,DMAcount2,Inc; int*wrptr1,*wrptr2,*inptr1,*inptr2; if (Wrptr < TestDMA) DMAcount2 = (TestDMA-Wrptr)/4; DMAcount1 = ((NUM_SAMPLES)/2) - DMAcount2; Wrptr1 = Wrptr; // Pointer to the strt DMA Inc = -(DMAcount1 * 4); Wrptr2 = Wrptr1; // externl pointer for first DMA Wrptr2 = circptr(wrptr2, Inc, BUF, LENGTH); // externl pointer for second DMA Inptr1 = Inptr; // internl pointer for first DMA Inptr2 = Inptr1+ DMAcount1*2; // internl pointer for second DMA TrnsDMA(Inptr1, DMAcount1, Wrptr1); TrnsDMA(Inptr2, DMAcount2, Wrptr2); Figure 4. Write DMA Circulr Buffering Since the prllel port DMA prmeter registers do not tke into ccount the wrpping of circulr buffers, if the write pointer needs to wrp round the end, the DMA is split into two seprte DMA sections. One of the DMA sections strts t the write pointer nd the other DMA section begins t the end of circulr dely line buffer for tht chnnel. For exmple, if the DMA for left chnnel strts t WrLptr (see Figure 4), it would be split into two DMAs with one strting t WrLptr nd trnsferring one word (4 bytes), nd the other strting t the end of the left chnnel dely line buffer, trnsferring four words (16 bytes). The function tht tests for this wrpping ppers in Listing 2. In the function, the TestDMA vrible points to Num_Smple/2 loctions (the number of smples trnsferred in prticulr chnnel per DMA) offset from the strt of externl dely line of prticulr chnnel. else TrnsDMA(Inptr, NUM_SAMPLES/2, Wrptr); RdptrInit= Wrptr; // Next Red pointer for red DMA Inc =-((NUM_SAMPLES/2)*4); WrptrTemp = Wrptr; WrptrTemp = circptr(wrptrtemp,inc,buf,length); // Next write pointer for write DMA Listing 2. writedma.c Red DMA For ech write DMA to the externl dely line buffers, the number of red DMAs correspond to the number of tps ccessed in the FIR filter. Figure 5 depicts the red DMA from the left nd right dely line buffers. As described bove, two tps re ccessed in this illustrtion (tp2 nd tp4), nd there re two red DMAs for every write DMA. The length of the internl buffer to perform these red DMAs is clculted by multiplying the number of smples trnsferred per DMA times the number of tps ccessed. With regrd to Figure 5, the internl buffer length for red DMAs of ech of the chnnels is eight [4 (number of smples per DMA) * 2 (number of tps ccessed)]. The function Implementing Dely Lines on SHARC Processors (EE-295) Pge 4 of 7
tht performs the DMA from the dely line buffer to the internl buffer is given in Listing 3. The DMA tkes plce with the internl modify vlue set to number of tps nd the externl modify vlue set to -1. As shown in Figure 5, two red DMAs re performed for ech of the chnnels with the externl index register pointing to RdLptr1 nd RdLptr2 for the left chnnel (nd RdRptr1 nd Rdrptr2 for the right chnnel), nd the corresponding internl index register points to ReLptr1 nd ReLptr2 for left chnnel (nd RdRptr1 nd RdRptr2 for right chnnel). Ech smple trnsferred from externl memory requires corresponding loction in internl memory, bsed on the modify register vlues s shown in Figure 5. RdLptr nd RdRptr ct s the reference pointers, obtining the externl index register vlues for red DMAs for ech of the chnnels; they re lwys initilized to the current write DMA pointers (WrLptr nd WrRptr) of ech of the chnnels, respectively. The red pointer for ech of the tps is derived by offsetting the tp number from the reference. The internl index for ech successive red DMAs is incremented by one. The function tht sets the externl nd internl index pointers for ech of the red DMA is shown in Listing 4. This function lso checks for wrpping of circulr buffers similr to tht discussed in the Write DMA section of this document. The reference pointer for the red DMAs of the left nd right chnnels for next set of dt is set to the write pointers for the next set of dt, which re NextWrLptr (left chnnel write pointer) nd NextWrRptr (right chnnel write pointer) with respect to Figure 4. Figure 5. Red DMAs void RecivDMA(int *Rinptr, int Rcount, int *Rextptr) *pppctl = 0; *piipp = Rinptr; *pimpp = NUM_TAPS; *picpp = Rcount; *pempp = -1; *peipp = Rextptr; /* For 8-bit externl memory, the Externl count is four times the internl count */ *pecpp = Rcount * 4; *pppctl = PPBHC PPDUR4; /* Receive(Red) */ *pppctl = PPDEN PPEN; // Enble prllel port DMA /*poll to ensure prllel port hs completed the trnsfer*/ do ; while( (*pppctl & (PPBS PPDS) )!= 0); *pppctl = 0; // disble prllel port Listing 3. ReceiveDMA.c Implementing Dely Lines on SHARC Processors (EE-295) Pge 5 of 7
void RedDMA(int *Inptr, int *Rdptr, int *TestDMA, int *BUF) int DMAcount1,DMAcount2,Inc; int *Rdptr1,*Rdptr2,*Inptr1,*Inptr2; int *RdptrTemp,i,InptrTemp; int Tpno[NUM_TAPS] = 993,1315,2524,2700,3202,2700,3119,3123,3202,32 68,3321,3515; InptrTemp = Inptr; for(i=0; i<num_taps; i++) RdptrTemp = Rdptr; Inc = Tpno[i] *4; RdptrTemp = circptr(rdptrtemp, Inc,BUF, LENGTH); if (RdptrTemp < TestDMA) DMAcount2 = (TestDMA-RdptrTemp)/4; DMAcount1 = ((NUM_SAMPLES)/2) - DMAcount2; Rdptr1 = RdptrTemp; Inc = -(DMAcount1 *4); Rdptr2 = Rdptr1; Rdptr2 = circptr(rdptr2, Inc, BUF, LENGTH); Inptr1 = InptrTemp; Inptr2 = Inptr1+ DMAcount1*NUM_TAPS; // internl pointer for second DMA RecivDMA(Inptr1, DMAcount1, Rdptr1); RecivDMA(Inptr2, DMAcount2, Rdptr2); else RecivDMA(InptrTemp, NUM_SAMPLES/2, RdptrTemp); InptrTemp = Inptr+1; Listing 4. RedDMA.c Dely Process After the red DMAs for ll the required tps re completed, processing tkes plce on the vlues red into the internl memory. Figure 6 shows the dely process on the internl memory dt. This process is identicl for both chnnels. The tp vlues for ech of the chnnels re in consecutive loctions in internl memory s result of the red DMA. The dely process implements n FIR filter, which involves multiplying the tp dt vlue times its corresponding coefficient vlues nd dding the results to get the smple output. Consecutive smples re clculted in the sme mnner for both the left nd right chnnels. The processed left nd right smples re plced in lternte loctions to be trnsmitted out in I2S mode. DelyProcess(int*InBlockptr,int*InBufptr) int *TempInBufptr; flot ErlyRef, InSmple; int x,j,m,i,k,l; flot TpVlue[NUM_TAPS]=0.490,0.346,0.192,0.181,0.1 76,0.181,0.180,0.181,0.176, 0.142,0.167,0.134; for (k=0;k<(num_samples/2);k++) TempInBufptr = InBufptr; j = k*2; // increment for block pointer m = k*num_taps;// increment for Buffer pointer InSmple = *(InBlockptr + j)*0.33 ; ErlyRef = 0.0; for (i=0; i<num_taps; i++) x=m+i; ErlyRef = ErlyRef + (*(TempInBufptr+x) * TpVlue[i]); *(InBlockptr + j) = (int)(insmple + ErlyRef * 0.33); Listing 5. Delyprocessing.c Implementing Dely Lines on SHARC Processors (EE-295) Pge 6 of 7
Figure 6. Dely Process Conclusion This document discusses the methods used to implement dely lines for prllel port-bsed processors such s the ADSP-2126x, ADSP- 21362, ADSP-21363, ADSP-21364, ADSP- 21365 nd ADSP-21366 SHARC series of processors. Implementtion methods for smplebsed nd block-bsed dely line re described. The progrm code for ech of these techniques for ADSP-21262 processors is vilble in the ssocited.zip file. For n FIR filter implementtion with blockbsed dely line, n input block of 512 smples with filter length (dely) of 3520 nd 18 tp ccesses tkes 463,814 core clock cycles t clock rte of 200 MHz for ADSP-21262. Incresing the number of input block smples or the number of tp ccesses increses the processing time for implementing the filter. References [1] ADSP-2126x SHARC Processor Peripherls Mnul. Revision 3.0, December, 2005. Anlog Devices, Inc. [2] ADSP-2136x SHARC Processor Hrdwre Reference. Revision 1.0, October 2005. Anlog Devices, Inc. [3] Introduction to Signl Processing, S. J. Orfnidis, Prentice-Hll,1996 [4] About This Reverbertion Business, J. A. Moorer, Computer Music Journl, vol. 3(2), pp.13-18 (1979). Document History Revision Rev 1 October 25, 2006 by D.Sunkr Description Initil version Implementing Dely Lines on SHARC Processors (EE-295) Pge 7 of 7