A lifting wavelet based lossless and lossy ECG compression processor for wireless sensors

LETTER
IEICE Electronics Express, Vol.14, No.20, 1-11

Jiahui Luo1, Zhijian Chen1a), Xiaoyan Xiang2, and Jianyi Meng2
1 Institute of VLSI Design, Zhejiang University, Hangzhou, China
2 State Key Laboratory of ASIC and System, Fudan University, Shanghai, China
a) chenzj@vlsi.zju.edu.cn

Abstract: This work presents an electrocardiogram (ECG) compression processor for wireless sensors with configurable lossless and lossy compression. The processor decomposes the signal with 9/7-M and 5/3 lifting wavelet transforms instead of a traditional wavelet. A hybrid encoding scheme improves compression efficiency. It encodes the higher scales of decomposed coefficients with a modified embedded zero-tree wavelet (EZW) coder and the lowest scale with Huffman coding. In addition, a transposable register matrix buffers coefficients during EZW encoding, which lowers the processing frequency without extra register resources. Implemented in a SMIC 40 nm CMOS process, the processor has a total gate count of only 10.8 K and consumes 92 nW at a 0.5 V supply. It achieves a compression ratio of 2.71 for lossless compression and 14.9 for lossy compression with a PRD of 0.39%.

Keywords: ECG compression, lifting wavelet transform, ultra low power, wireless monitoring
Classification: Integrated circuits

References
[1] R. F. Yazicioglu, et al.: A 30 µW analog signal processor ASIC for portable biopotential signal monitoring, IEEE J. Solid-State Circuits 46 (2011) 209 (DOI: 10.1109/JSSC.2010.2085930).
[2] C. J. Deepu, et al.: An ECG-on-chip with 535 nW/channel integrated lossless data compressor for wireless sensors, IEEE J. Solid-State Circuits 49 (2014) 2435 (DOI: 10.1109/JSSC.2014.2349994).
[3] J. Luo, et al.: A dual-mode ECG processor with difference-insensitive QRS detection and lossless compression, IEICE Electron. Express 14 (2017) (DOI: 10.1587/elex.14.20170524).
[4] J.-J. Wei, et al.: ECG data compression using truncated singular value decomposition, IEEE Trans. Inf. Technol. Biomed. 5 (2001) 290 (DOI: 10.1109/4233.966104).
[5] S. Lee, et al.: A real-time ECG data compression and transmission algorithm for an e-health device, IEEE Trans. Biomed. Eng. 58 (2011) 2448 (DOI: 10.1109/TBME.2011.2156794).
[6] J. Ma, et al.: A novel ECG data compression method using adaptive Fourier decomposition with security guarantee in e-health applications, IEEE J. Biomed. Health Inform. 19 (2015) 986 (DOI: 10.1109/JBHI.2014.2357841).
[7] Y. Zou, et al.: An energy-efficient design for ECG recording and R-peak detection based on wavelet transform, IEEE Trans. Circuits Syst. II, Exp. Briefs 62 (2015) 119 (DOI: 10.1109/TCSII.2014.2368619).
[8] C. J. Deepu, et al.: A hybrid data compression scheme for power reduction in wireless sensors for IoT, IEEE Trans. Biomed. Circuits Syst. 11 (2017) 245 (DOI: 10.1109/TBCAS.2016.2591923).
[9] C. I. Ieong, et al.: A 0.45 V 147-375 nW ECG compression processor with wavelet shrinkage and adaptive temporal decimation architectures, IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 25 (2017) 1307 (DOI: 10.1109/TVLSI.2016.2638826).
[10] R. Benzid, et al.: Fixed percentage of wavelet coefficients to be zeroed for ECG compression, Electron. Lett. 39 (2003) 830 (DOI: 10.1049/el:20030560).
[11] M. D. Adams and F. Kossentini: Reversible integer-to-integer wavelet transforms for image compression: Performance evaluation and analysis, IEEE Trans. Image Process. 9 (2000) 1010 (DOI: 10.1109/83.846244).
[12] M. L. Hilton: Wavelet and wavelet packet compression of electrocardiograms, IEEE Trans. Biomed. Eng. 44 (1997) 394 (DOI: 10.1109/10.568915).
[13] C. H. Luo, et al.: An ECG acquisition system prototype design with flexible PDMS dry electrodes and variable transform length DCT-IV based compression algorithm, IEEE Sensors J. 16 (2016) 8244 (DOI: 10.1109/JSEN.2016.2584648).

1 Introduction
Wireless and wearable healthcare devices have received considerable attention over the past decade. This is driven by the growing demand for real-time, continuous monitoring of physiological signals. Such monitoring is a prerequisite for disease prevention, diagnosis, and timely alarms based on big-data analysis in the Internet of Things (IoT), and it is therefore of great importance to the evolution of healthcare systems.
The electrocardiogram (ECG) is a physiological signal widely used for health monitoring and for the diagnosis of cardiovascular diseases. ECG monitoring must run 24/7 on continuously sampled data, so data transmission dominates the total power consumption of such a wireless system [1]. Researchers have therefore focused on ECG compression to reduce the amount of data to be transmitted.

For lossless compression, temporal methods such as linear predictors [2, 3] are commonly used because of their small hardware cost. However, they yield a limited compression ratio (CR) and hence only a limited reduction in transmission power. Lossy algorithms such as [4, 5, 6] achieve a higher CR and larger power savings at the cost of some signal distortion. These transform-based algorithms are usually implemented in software and require substantial resources for computation and data buffering. In hardware, this can lead to large resource overhead and high power consumption, which may offset or even exceed the power saved in transmission. [7] proposed an ECG compression ASIC based on the discrete wavelet transform (DWT). Its power consumption is still high, however, because of the complex bior4.4 wavelet computation and large memory usage.

In addition, signal quality and power efficiency may have different priorities in different situations, so supporting only lossless or only lossy compression can be too rigid for practical applications. [8] proposed a lossless and lossy compression scheme based on a simple temporal method, the Fan algorithm, which achieves only modest compression performance. [9] presented another DWT-based design for lossless and lossy compression. Although it uses a simpler wavelet basis to reduce cost, the quality of the reconstructed signal drops quickly as CR increases. Moreover, as in many transform-based methods, fractional arithmetic is involved. The resulting rounding errors in the coded values make the compression only near-lossless.

In this work, we present an ECG compression processor with configurable lossless and lossy compression based on the integer-to-integer lifting wavelet transform (LWT). The 9/7-M and 5/3 wavelets are applied to different scales of the LWT. This achieves high decomposition performance with less resource overhead than traditional wavelets. Based on the characteristics of the decomposed coefficients, we propose a hybrid encoding scheme to improve compression efficiency. It applies a modified embedded zero-tree wavelet (EZW) coder to the higher scales of coefficients and Huffman coding to the lowest scale. In addition, the hardware is optimized with a transposable register matrix that stores coefficients orthogonally, achieving a low processing frequency and a balanced workload across clock cycles. Its low complexity and high compression efficiency make the proposed design well suited to wireless monitoring systems.

2 Method
Fig. 1. Block diagram of the proposed ECG compression processor.
The block diagram of the proposed ECG compression design is illustrated in Fig. 1. The sampled ECG signal is fed into a 5-scale LWT for data decomposition.
The highest 4 scales of decomposed coefficients are buffered for EZW scanning and adaptive region encoding. The lowest-scale coefficients are encoded with Huffman coding, and only when lossless compression is required. The encoded data from the two encoding modules are then packaged into fixed-length words for transmission.

2.1 Data decomposition
Traditional DWTs, such as the bior4.4 wavelet (also known as 9/7) widely used for ECG compression [7, 10], involve fractional arithmetic when computing the coefficients. A multi-scale DWT must therefore either provide extra buffers to hold the fractional part of the intermediate results, or discard or round that part at each scale. Discarding or rounding loses information, which hinders the use of DWT for lossless compression. To overcome this, this work applies LWT for data decomposition. The advantage of LWT is that it is a reversible integer-to-integer transform [11], so no resolution is lost when scales are cascaded.

Considering both resource overhead and decomposition performance, we employ the 9/7-M and 5/3 lifting wavelets [11] to perform a 5-scale LWT on the ECG signal. Both have short step sizes and light computation loads. The 9/7-M wavelet is applied to the first and second scales. At these scales the signal samples are more closely correlated, and 9/7-M, with its longer step size, performs better on fast-changing parts of the signal such as the QRS complex. With down-sampling at each scale, however, the correlation decreases, and 9/7-M performs no better than the 5/3 wavelet. We therefore apply the 5/3 wavelet to the three higher scales because of its lower complexity.

Table I compares the hardware resources required for the 9/7-M and 5/3 wavelets with those of the commonly used bior4.4 wavelet and the bior3.1 wavelet adopted in [9]. bior4.4 requires a large buffer and extra multiplications, whereas the other bases use much simpler arithmetic. Although bior3.1 is also simple, it is not an integer-to-integer transform.

Table I. Resources for different wavelet bases.
Wavelet basis | Buffer | Multiplication | Addition | Integer-to-integer transform
bior4.4 | 9 | 4+5 | 6+8 | No
bior3.1 | 4 | 0 | 4+4 | No
9/7-M | 7 | 0 | 5+2 | Yes
5/3 | 4 | 0 | 2+2 | Yes

To evaluate the decomposition performance of these wavelet bases, a 5-scale wavelet transform is performed with each, and the results are compared in Fig. 2. Fig. 2(a) shows that, with the combination of 9/7-M and 5/3, more coefficients fall into small ranges, which makes encoding more efficient. Fig. 2(b) applies the typical wavelet-based compression method of [10], which sets a fixed percentage of coefficients to zero. Under that method, the proposed combination achieves reconstructed signal quality comparable to bior4.4 and higher than bior3.1. Such high decomposition performance at low complexity is highly desirable for low-power wireless systems.

Fig. 2. (a) 5-scale DWT/LWT performance based on MIT-BIH Arrhythmia database channel 1.

2.2 Hybrid encoding strategy
The 5-scale decomposed coefficients are encoded with a hybrid strategy that applies two coding methods to different scales.

2.2.1 Modified EZW
EZW [12] is an efficient method for encoding wavelet coefficients that exploits the correlation of coefficients across scales. We find it well suited to hardware implementation, because the coefficients can be mapped to a binary matrix and encoded by scanning that matrix bit by bit. To further lower the hardware overhead and improve compression efficiency, we modify the algorithm in several respects, as described below.

The ECG signal changes slowly, and 9/7-M decomposes it well. As a result, the first-scale detail coefficients (D1), which make up half of all coefficients, generally have small amplitudes and carry little signal information. We therefore exclude them from EZW. This halves the resources needed for coefficient buffering, with only minor information loss in the reconstructed signal.

The coefficients of the higher 4 scales (D2 to D5 and A5) are buffered in a binary matrix, as shown in Fig. 3. Each coefficient is represented by its absolute value and a sign bit. In this binary format, encoding is done by scanning the matrix bit by bit from MSB to LSB. In the m-th loop, the m-th bit of each coefficient is scanned. If the coefficient belongs to the searching list, the scanned bit is coded according to the following rules:
1) If the scanned bit is 1, it is coded as significant (S).
2) If the scanned bit is 0 and the corresponding bits of all its descendants in the searching list are also 0, it is coded as root (R). Fig. 3 shows the root-descendant relationship of the coefficients.
3) Otherwise, it is coded as zero (Z).
If the coefficient belongs to the refinement list, the value of the scanned bit is sent as the code. The searching list and refinement list are updated in the same way as in traditional EZW. However, only three code types (S/R/Z) are possible, because the sign bit of each coefficient is coded as an extra bit following the LSB.

Fig. 3. Scanning order, region partition and corresponding coding strategies of the modified EZW.

To improve encoding efficiency, we propose an adaptive region encoding method that encodes S/R/Z according to the region partition of the binary matrix, as depicted in Fig. 3.
- In Region A, A5 contains much of the signal energy and usually has the largest amplitude, so S is assigned the shortest code.
- In Region B, coefficient amplitudes generally decrease from higher to lower scales, so R is the most common case and is coded with the highest priority.
- In Region C, D2 has no descendants, so only two cases (S/R) are possible, requiring only a 1-bit code.

In addition, when the less significant bits and the sign bit are scanned, the remaining coefficients in the searching list have similar ranges. They are therefore treated as individuals without descendants and coded like D2.

Fig. 4. Compression ratio of different coding methods. K is the number of the lowest bits not coded.

Fig. 4 compares adaptive region encoding with the original EZW encoding [12] and with entropy encoding based on the overall symbol probabilities. Adaptive region encoding improves compression performance over both, outperforming them by about 10% to 30% across coding precisions.

2.2.2 Huffman encoding for D1
When lossless compression is required, the first-scale detail coefficients D1 that are excluded from EZW are encoded with Huffman coding. D1 usually has small amplitudes concentrated around zero. An entropy coding method such as Huffman coding is therefore more efficient than EZW for D1, and it has lower complexity because no resources are needed for coefficient buffering.

3 Hardware implementation
The hardware architecture of the proposed ECG compression processor is shown in Fig. 5. The 5-scale LWT is implemented with multiple FIFOs and serial logic adders.
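The following Python sketch serves as a software reference for what these FIFOs and adders compute: the 5-scale decomposition of Section 2.1. It is a minimal sketch under stated assumptions, not the authors' hardware. The assumptions are:
- The 9/7-M and 5/3 lifting steps are written as we understand them from [11]. For 9/7-M, this is a four-tap predict with weights (-1, 9, 9, -1)/16 and a two-tap update with weight 1/4. For 5/3, it is a two-tap predict with weight 1/2 and the same update.
- Block edges use whole-sample symmetric extension.
- Blocks are 32 samples long, so D2 to D5 and A5 yield exactly 16 coefficients.

All function names are illustrative.

```python
"""Sketch only: integer-to-integer 5-scale LWT (9/7-M at scales 1-2, 5/3 at 3-5).
Lifting steps are assumed from [11]; this is not the processor's RTL."""


def mirror(i, n):
    """Map index i into [0, n) with whole-sample symmetric extension (assumed)."""
    period = 2 * (n - 1)
    if period == 0:
        return 0
    i = abs(i) % period
    return period - i if i >= n else i


def predict_53(x, k):
    """5/3 predictor for odd sample 2k+1: floor((x[2k] + x[2k+2]) / 2)."""
    n = len(x)
    return (x[mirror(2 * k, n)] + x[mirror(2 * k + 2, n)]) >> 1


def predict_97m(x, k):
    """9/7-M predictor: floor((9*(x[2k]+x[2k+2]) - (x[2k-2]+x[2k+4])) / 16 + 1/2)."""
    n = len(x)
    near = x[mirror(2 * k, n)] + x[mirror(2 * k + 2, n)]
    far = x[mirror(2 * k - 2, n)] + x[mirror(2 * k + 4, n)]
    return (9 * near - far + 8) >> 4


def update(d, k):
    """Shared update step: floor((d[k-1] + d[k]) / 4 + 1/2), with d[-1] taken as d[0]."""
    return (d[max(k - 1, 0)] + d[k] + 2) >> 2


def forward(x, predict):
    """One lifting level on an even-length list; returns (approximation, detail)."""
    half = len(x) // 2
    d = [x[2 * k + 1] - predict(x, k) for k in range(half)]
    s = [x[2 * k] + update(d, k) for k in range(half)]
    return s, d


def inverse(s, d, predict):
    """Undo one lifting level exactly: restore even samples first, then odd ones."""
    x = [0] * (2 * len(s))
    for k in range(len(s)):
        x[2 * k] = s[k] - update(d, k)
    for k in range(len(s)):
        x[2 * k + 1] = d[k] + predict(x, k)  # predict() reads only even samples
    return x


# Scales 1 and 2 use 9/7-M; scales 3 to 5 use 5/3.
SCHEDULE = [predict_97m, predict_97m, predict_53, predict_53, predict_53]


def decompose(x):
    """Return (A5, [D1, D2, D3, D4, D5]) for a block whose length is a multiple of 32."""
    a, details = list(x), []
    for predict in SCHEDULE:
        a, d = forward(a, predict)
        details.append(d)
    return a, details


def reconstruct(a, details):
    """Invert decompose() scale by scale, from scale 5 back to scale 1."""
    for predict, d in zip(reversed(SCHEDULE), reversed(details)):
        a = inverse(a, d, predict)
    return a


if __name__ == "__main__":
    block = [(37 * i) % 101 - 50 for i in range(32)]  # arbitrary integer block
    a5, ds = decompose(block)
    print([len(d) for d in ds], len(a5))  # [16, 8, 4, 2, 1] 1
    assert reconstruct(a5, ds) == block
```

Each lifting step adds or subtracts a floored integer that the inverse recomputes from values it has already restored. Reconstruction is therefore exact, which is the property that makes the LWT usable for lossless compression.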
The timing constraint for LWT computation is loose, owing to the low sampling rate of ECG signals (on the order of kHz). Data bypassing is therefore applied between scales to reduce the FIFO depth needed for coefficient computation, which saves about 10% of the LWT resources. For EZW encoding, a 16×16-bit transposable register matrix holds the latest wavelet coefficients. An FSM manages the writes and reads, and several buffers control the scanning process. In addition, a packaging unit orders the encoded data from the EZW and Huffman coding modules and packages them into fixed-length words for storage and transmission.

Fig. 5. Hardware architecture overview of the proposed ECG compression processor.

3.1 Transposable register matrix
Each EZW scan handles a binary matrix of 16 coefficients, each 16 bits wide. At 1 bit per cycle, a scan takes at most 16×16 cycles. However, EZW scanning can start only when all 16 coefficients are present in the binary matrix, and a scan takes many cycles. If new coefficients become ready before the current scan completes, they must be buffered elsewhere.

A resource-saving solution is to complete the scan before the next valid coefficient is ready, as processing clock A shows in Fig. 6. The next coefficient (D2) will be ready 4 sample clock cycles later, and the scan must finish before then. For an ECG signal with sampling frequency $f_s$, the processing frequency must then be $f_p = 16 \times 16 \times f_s / 4$. This high processing frequency is inefficient, however, because the circuit works only for a short time and then sits idle. To balance the workload across clock cycles without extra resources for coefficient buffering, we propose a transposable register matrix with a time-sharing mechanism, as depicted in Fig. 7.

Fig. 6. Different processing clocks for EZW scanning.

Fig. 7. Coefficient storing mechanism of the proposed transposable register matrix. (a) Storing horizontally. (b) Storing vertically.

Normally, the 16 coefficients are held at fixed locations in the register matrix and scanned from MSB to LSB, as in the case shown in Fig. 7(a). Once one bit of every coefficient has been scanned, the information in that column is no longer needed. A newly arriving coefficient can then be stored there, as shown in Fig. 7(b). In this way, while the old coefficients are being scanned, the new coefficients are buffered in the transpose of the original register matrix at the same locations. The register matrix is thus written alternately in the horizontal and vertical directions, making full use of it.

The processing clock then only needs to ensure that one scanning loop completes before that group of registers is required for a new coefficient. The processing frequency can therefore be set to $f_p = 16 \times f_s / 2$, which balances the workload across cycles, as processing clock B shows in Fig. 6. Each scanning loop is allocated exactly 16 cycles to process its 16 bits.

3.2 Code packaging
After encoding, the coded data are packaged into fixed-length 16-bit words for temporary storage and transmission. For lossy compression, the encoded data from EZW scanning are packaged in order, as shown in Fig. 8(a). At the beginning of an EZW loop, a 4-bit max_code indicates the bit position of the first significant bit scanned. The scanned and encoded data then follow in order.
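The lossy code stream described above, a max_code header followed by the scanned symbols, can be illustrated with a simplified model of the modified EZW scan from Section 2.2.1 for one group of 16 coefficients. The sketch rests on assumptions that the text does not fix:
- Coefficients are indexed in the order A5, D5, D4, D3, D2, so A5 has the single child D5, and coefficient i (1 to 7) has children 2i and 2i+1.
- The sign bits of significant coefficients are appended after the LSB loop.
- Symbols are returned as characters (S/R/Z, or '0'/'1' for refinement and sign bits) instead of the region-dependent binary codes of Fig. 3, whose exact lengths are not given.

The parameter k_skip stands in for K in Fig. 4. The function names are hypothetical.

```python
"""Simplified model of the modified EZW scan over one group of 16 coefficients.
Assumptions (tree layout, sign handling, symbol alphabet) are noted inline."""


def children(i):
    """Assumed tree over the order A5, D5, D4[0:2], D3[0:4], D2[0:8] (indices 0..15)."""
    if i == 0:
        return [1]                       # A5 -> D5
    if i < 8:
        return [2 * i, 2 * i + 1]
    return []                            # D2 has no descendants


def descendants(i):
    """All indices below coefficient i in the tree."""
    out, stack = [], list(children(i))
    while stack:
        j = stack.pop()
        out.append(j)
        stack.extend(children(j))
    return out


def ezw_scan(coeffs, k_skip=0):
    """Return (max_code, symbols); bit planes below k_skip are not coded (lossy)."""
    assert len(coeffs) == 16
    mags = [abs(c) for c in coeffs]
    top = max(m.bit_length() for m in mags) - 1
    if top < 0:
        return 0, []                     # all-zero group: nothing to code
    refining = [False] * 16              # True = coefficient is in the refinement list
    symbols = []
    for plane in range(top, k_skip - 1, -1):
        bit = [(m >> plane) & 1 for m in mags]
        covered = set()                  # descendants implied by an R code this loop
        newly = []
        for i in range(16):              # scan order: parents before children
            if refining[i]:
                symbols.append(str(bit[i]))          # refinement list: send the bit
            elif i in covered:
                continue
            elif bit[i]:
                symbols.append("S")
                newly.append(i)
            elif all(refining[j] or bit[j] == 0 for j in descendants(i)):
                symbols.append("R")
                covered.update(descendants(i))
            else:
                symbols.append("Z")
        for i in newly:                  # list update after the loop, as in EZW
            refining[i] = True
    # Assumption: one sign bit per significant coefficient after the last loop.
    symbols += ["1" if coeffs[i] < 0 else "0" for i in range(16) if refining[i]]
    return top, symbols
```

For example, ezw_scan([5, -3] + [0] * 14) returns max_code 2 and the symbols S R, 0 S R R, 1 1 R R, followed by the sign bits 0 1. A D2 coefficient (index 8 to 15) has no descendants in this model, so it can only produce S or R, which matches the 1-bit code of Region C.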
For lossless compression, new D1 detail coefficients keep arriving and must be encoded during EZW scanning, as shown in Fig. 6. To ensure correct decoding, the Huffman code for D1 is placed after every loop of EZW scanning, as depicted in Fig. 8(b).
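The lossless code order can be modelled as one bit string cut into 16-bit words. The sketch below is illustrative only and makes three assumptions:
- The D1 Huffman table is built from sample counts. The text does not give the processor's table, and a fixed table is an equally plausible hardware choice.
- Each EZW loop is passed in as an already-coded bit string.
- The last word is zero-padded.

build_huffman, pack_16bit, and lossless_frame are hypothetical names.

```python
import heapq
from collections import Counter


def build_huffman(samples):
    """Return {value: bit string}, a Huffman code built from value counts (assumed table)."""
    counts = Counter(samples)
    if len(counts) == 1:                       # degenerate case: one distinct value
        return {next(iter(counts)): "0"}
    # Heap entries: (count, unique tie-breaker, {value: partial code}).
    heap = [(c, i, {v: ""}) for i, (v, c) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, t1 = heapq.heappop(heap)        # merge the two least frequent subtrees
        c2, _, t2 = heapq.heappop(heap)
        merged = {v: "0" + code for v, code in t1.items()}
        merged.update({v: "1" + code for v, code in t2.items()})
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]


def pack_16bit(bits):
    """Cut a '0'/'1' string into 16-bit words, zero-padding the last word (assumed)."""
    bits += "0" * (-len(bits) % 16)
    return [int(bits[i:i + 16], 2) for i in range(0, len(bits), 16)]


def lossless_frame(max_code, ezw_loops, d1_values, table):
    """4-bit max_code, then each coded EZW loop followed by the D1 codes that arrived during it."""
    bits = format(max_code, "04b")
    for loop_bits, arrivals in zip(ezw_loops, d1_values):
        bits += loop_bits + "".join(table[v] for v in arrivals)
    return pack_16bit(bits)
```

As an example, build_huffman([0, 0, 0, 0, 1, 1, -1, 2]) gives the most frequent value 0 the 1-bit code "0". Using that table, a frame with max_code 2, coded EZW loops "10" and "01", and D1 arrivals [0] and [-1] packs into the single word 0x28E0.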

Fig. 8. Code order for lossy and lossless compression. (a) Code order for lossy compression. (b) Code order for lossless compression.

4 Experimental results
The proposed ECG processor is implemented in a SMIC 40 nm CMOS process, with a total area of 12882 µm² and a gate count of 10.8 K, as shown in Fig. 9. The 5-scale LWT and the transposable register matrix take more than 80% of the total resources. The variable length of the Huffman codes for D1 complicates the packaging logic, which accounts for 9% of the area. Even so, the register matrix for coefficient buffering is halved in size, and the total area is reduced. To further decrease power consumption, a near-threshold supply voltage is applied, and the processor can run at 23 kHz under a 0.5 V supply. For evaluation, we use the well-known MIT-BIH Arrhythmia database (MITDB), which has 11-bit resolution and a 360 Hz sampling rate. For MITDB, the processor works at a low frequency of 2.88 kHz and consumes 92 nW at 0.5 V.

Fig. 9. The layout photograph of the processor and its specifications.

The compression performance is evaluated in terms of CR and signal distortion. Signal distortion is estimated with the percentage root-mean-square difference (PRD), given in equation (1). Here x_i is the raw signal, y_i is the reconstructed signal, and n is the total number of samples.

$$\mathrm{PRD} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - y_i)^2}{\sum_{i=1}^{n} x_i^2}} \times 100\% \qquad (1)$$

Table II. Comparison of compression performance with other algorithms.
Reference | Method | CR / PRD (%)
[5] 2011 | DCT & Huffman encoding | 5.19 / 0.23; 14.68 / 1.02; 21.30 / 1.75
[6] 2015 | Adaptive Fourier decomposition & SS encoding | 18.0 / 0.80; 25.64 / 1.05; 33.85 / 1.47
[13] 2016 | Variable-length DCT-IV & Huffman encoding | 6.86 / 0.18
[9] 2017 | Bior3.1 WT & Huffman encoding | 5.24 / 0.42; 10.00 / 0.66; 26.91 / 3.11
Proposed | 9/7-M and 5/3 LWT & modified EZW encoding & Huffman encoding | 5.53 / 0.17; 14.87 / 0.39; 22.80 / 0.68; 33.34 / 1.34

Table II lists the compression performance of the proposed work and several previous works. The proposed work supports different degrees of lossy compression by configuring the number of EZW scanning loops. It outperforms the wavelet-based method of [9], whose PRD degrades as CR increases. It also performs comparably to more complex software-based methods based on DCT [5, 13] and Fourier decomposition [6]. For example, for CR below 10, the proposed work performs better than the others except [13], which achieves a higher CR at a similar PRD. For CR around 20, the PRD of the proposed work is only 0.68%, much lower than the rest.

A comparison with existing hardware implementations is shown in Table III. [8] uses a simple temporal compression method and has the smallest gate count. [7] and [9] employ DWT methods and have a larger gate count or area. The proposed work applies the integer-to-integer LWT. Its gate count is smaller than that of [9], and its area is smaller than those of both [7] and [9]. Leakage power dominates the total power of ECG monitoring systems running at low (kHz) processing frequencies. With its smaller gate count and area, the proposed work therefore also consumes less power than [7] and [9].
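The kHz-range processing frequencies follow from the two clocking schemes of Section 3.1. As a worked check for MITDB ($f_s$ = 360 Hz), we write $f_p^{A}$ and $f_p^{B}$ for processing clocks A and B; this notation is ours:

$$
f_p^{A} = \frac{16 \times 16 \times f_s}{4} = 64 \times 360\ \text{Hz} = 23.04\ \text{kHz}, \qquad
f_p^{B} = \frac{16 \times f_s}{2} = 8 \times 360\ \text{Hz} = 2.88\ \text{kHz}.
$$

Clock B reproduces the 2.88 kHz operating frequency reported for MITDB, an eight-fold reduction from clock A. For reference, clock A would need about 23 kHz, and the processor is reported to run at 23 kHz under a 0.5 V supply.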
As for compression efficiency, [8] has the lowest complexity, but it also achieves the lowest performance for both lossless and lossy compression. [9] achieves the highest lossless CR of 2.89, yet its compression is only near-lossless because fractional results are encoded as integers. With LWT and an optimized hybrid encoding strategy, this work achieves a high lossless CR of 2.71. Among the four works, it also achieves the highest lossy CR of 14.87 with the lowest PRD of 0.39%.
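Readers reproducing the CR and PRD figures in Tables II and III can compute both metrics as shown below. PRD follows equation (1). The CR definition, original bits over compressed bits with 11-bit MITDB samples, is our assumption, since the text does not state it explicitly. The function names are hypothetical.

```python
import math


def prd_percent(x, y):
    """Percentage root-mean-square difference, as in equation (1)."""
    num = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    den = sum(xi ** 2 for xi in x)
    return 100.0 * math.sqrt(num / den)


def compression_ratio(n_samples, compressed_bits, bits_per_sample=11):
    """Assumed CR definition: raw bits (11-bit MITDB samples) over compressed bits."""
    return n_samples * bits_per_sample / compressed_bits


if __name__ == "__main__":
    raw = [1, 2, 2]
    rec = [1, 2, 0]
    print(prd_percent(raw, rec))
    print(compression_ratio(1000, 4000))  # 2.75
```

Because the denominator of equation (1) is the energy of the raw signal, PRD depends on any DC offset left in x. Published PRD values are therefore only directly comparable when the same baseline convention is used.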

Table III. Comparison with other hardware designs of ECG compressors.
Reference | [7] 2015 | [8] 2017 | [9] 2017 | This work
Process | 65 nm | 350 nm | 180 nm | 40 nm
Gate count | - | 3.98 K | 19.5 K | 10.8 K
Area (mm²) | 0.17 | - | 0.86 | 0.013
Voltage (V) | 0.7 | 3 | 0.45 | 0.5
Frequency (Hz) | 9 K | 360 | 360 | 2.88 K
Power | 49 µW | 295 nW | 147-375 nW | 92 nW
CR (lossless) | - | 2.11 | 2.89 | 2.71
CR (lossy) | 10.3 | 7.86 | 10.00 | 14.87
PRD | 0.64% | 0.51% | 0.66% | 0.39%

5 Conclusion
This work presents a low-complexity lossless and lossy ECG compression processor. Two features yield an efficient compression processor with lower complexity than state-of-the-art works: the combination of 9/7-M and 5/3 LWT for ECG signal decomposition, and the proposed hybrid encoding strategy. In addition, a transposable register matrix lowers the processing frequency without adding resource overhead. Implemented in a 40 nm CMOS process, the proposed processor has a small gate count of 10.8 K and a low power consumption of 92 nW. On MITDB, it achieves a lossless CR of 2.71 and a scalable lossy CR of 4.24 to 33.34, with a low PRD of 0.11% to 1.34%. Its high power efficiency and compression performance make the proposed processor an attractive choice for wireless ECG monitoring systems.

Acknowledgments
The authors acknowledge the support of the Fundamental Research Funds for the Central Universities (grant No. 2015QNA4018) and the State Key Laboratory of ASIC and System (grant No. 2015KF009).