# Novel Approach On Efficient Hardware Architecture For 2D-Discrete Wavelet Transform

Lenin Raja<sup>1</sup>, Student Member, IEEE, A. Merline<sup>2</sup>, Dr. R. Ganesan<sup>3</sup>

Abstract- A high-speed, reduced-area 2-D discrete wavelet transform (2D-DWT) architecture is proposed. Previous DWT architectures are mostly based on the modified weighted lifting scheme in order to achieve a critical path with only one multiplier. Experimental measurements of design performance in terms of area, speed, and power for a 90-nm complementary metal oxide semiconductor (CMOS) implementation are presented. Results indicate that, while the bit-parallel (BP) design exhibits inherent speed advantages, the digit-serial (DS) design requires significantly fewer hardware resources with increased precision and DWT level. In addition to the BP and DS designs, a novel flexible DWT processor is presented, which supports run-time configuration of the DWT parameters and improves performance. The proposed approach gives efficient hardware support to the VLSI architecture by means of the Weighted Lifted Wavelet Transform (WLWT).

Keywords- Fixed-point arithmetic, fractional bit, image coding, VLSI, wavelet transforms.

#### **I. INTRODUCTION**

The discrete wavelet transform (DWT) is a multi-resolution analysis tool with excellent characteristics in the time and frequency domains. The coding efficiency and the quality of image restoration with the DWT are higher than those with the traditional discrete cosine transform. Moreover, it is easy to obtain a high compression ratio. As a result, the DWT is widely used in signal processing and image compression, for example in MPEG-4 and JPEG2000 [1], [2]. Although there is a rich literature on different hardware implementations of the DWT [2]–[6] and novel DWT algorithms [7], much less attention has been directed to approaches in which the precision of the DWT computation is specifically considered as a design goal. Traditional DWT architectures [3], [4] are based on convolution. The second-generation DWTs, which are based on lifting algorithms, were proposed later [5], [6]. Compared with convolution-based architectures, lifting-based architectures not only have lower computational complexity but also require less memory. Nevertheless, directly mapping these algorithms to hardware [7] leads to a relatively long data path and low efficiency.

Recently, several novel architectures based on the lifting scheme have been proposed [8]–[10]. The flipping structure, proposed by Huang et al. [11], is another important DWT architecture. With a five-stage pipeline, its critical path can also be reduced to one multiplier. However, the flipping structure requires a large temporal buffer, and using fewer pipeline stages leads to a longer critical path delay. The work in [9] conducts a similar analysis with the fixed-point data path fixed to 12 bits of integer and 12 bits of fractional precision, which provides sufficient dynamic range to compute a six-level DWT with over 50-dB PSNR. The work in [10] examines the effect on PSNR of quantizing filter coefficients for a convolution-based 9/7 DWT, and [11] focuses on analyzing the dynamic range requirements of the DWT across different subbands and decomposition levels. In addition, a novel architecture is developed to implement the 2-D DWT based on the above modified scheme. The parallel scanning method is employed to reduce the size of the transposing buffer. As a result, our design achieves higher efficiency.



Fig. 1. Illustration of a two-level wavelet decomposition. The dotted portions are the final wavelet-transformed data.

Fig. 2. Flipping structure for the lifting-based 1-D 9/7 DWT.

The rest of this paper is organized as follows. Section II formulates the computation of the 2-D DWT. Section III describes the proposed algorithm, including the weighted lifting-based DWT. Section IV describes the design of the precision-aware BP DWT architecture, whereas Section V discusses the precision-aware DS approach. Section VI presents a configurable DS DWT architecture that provides the flexibility to change DWT levels and precision at run time, together with experimental results, and concluding remarks are given in Section VII.

#### II. FORMULATIONS FOR THE COMPUTATION OF THE 2-D DWT

In the 2-D DWT, the signal is successively decomposed in a spatial multiresolution domain by low-pass and high-pass FIR filters along each of the two dimensions. The four FIR filters, denoted as high pass-high pass (HH), high pass-low pass (HL), low pass-high pass (LH), and low pass-low pass (LL) filters, produce, respectively, the HH, HL, LH, and LL subband data of the decomposed signal at a given resolution level. The samples of the four subbands of the decomposed signal at each level are decimated by a factor of two in each of the two dimensions. For the first level of decomposition, the given 2-D signal is used as input, whereas for the succeeding levels of decomposition, the decimated LL subband signal from the previous decomposition level is used as input.
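The decomposition described above can be sketched in a few lines of Python. This is a minimal illustration and not the proposed hardware architecture: the two-tap averaging and differencing filters are a stand-in for the actual low-pass and high-pass FIR filters, and the function names (`analyze_1d`, `dwt2_level`, `dwt2`) and the subband naming order are assumptions made for the example.

```python
def analyze_1d(x):
    """Split an even-length sequence into low-pass and high-pass halves.

    Each output is decimated by two. The averaging/differencing pair is an
    illustrative stand-in for the real low-pass and high-pass FIR filters.
    """
    half = len(x) // 2
    low = [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(half)]
    high = [(x[2 * i] - x[2 * i + 1]) / 2 for i in range(half)]
    return low, high


def transpose(block):
    return [list(col) for col in zip(*block)]


def filter_rows(block):
    """Apply the 1-D analysis step to every row of a 2-D block."""
    lows, highs = [], []
    for row in block:
        low, high = analyze_1d(row)
        lows.append(low)
        highs.append(high)
    return lows, highs


def dwt2_level(image):
    """One decomposition level: filter along rows, then along columns."""
    low_rows, high_rows = filter_rows(image)
    # Column filtering is row filtering applied to the transposed block.
    ll, lh = (transpose(b) for b in filter_rows(transpose(low_rows)))
    hl, hh = (transpose(b) for b in filter_rows(transpose(high_rows)))
    return ll, hl, lh, hh


def dwt2(image, levels):
    """Multi-level decomposition: each level re-uses the previous LL subband."""
    details = []
    current = image
    for _ in range(levels):
        current, hl, lh, hh = dwt2_level(current)
        details.append((hl, lh, hh))
    return current, details
```

With an 8x8 input and two levels, the first-level detail subbands are 4x4 and the final LL subband is 2x2, which reflects the decimation by two in each dimension at every level.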





#### III. PROPOSED ALGORITHM

#### **A.** Wavelet Transform and Multi-scale Representation of Images

In the past decades, the wavelet transform has been widely employed in different applications, such as signal processing and image processing [17]–[20]. In the image compression area especially, wavelet-based image compression has been regarded as a breakthrough in image compression technologies, and the new still-picture compression standard JPEG2000 is based on it [21]. The advantage of the wavelet transform is that it decomposes the signal into different scales, so that one can choose the appropriate scales in the wavelet transform domain whilst ignoring or reducing the contribution of the other scales. For example, in wavelet transform-based image compression, doing so yields a high compression ratio and good-quality reconstructed images [20], and in image denoising problems, the noise can be removed effectively [22], [23]. The wavelet transform has also been employed in image enhancement for a long time, and several image enhancement algorithms are based on it [9], [11], [12]. The advantage of wavelet transform/multiscale enhancement methods is that mammograms contain features with varying scale characteristics [24]: subtle features, such as calcifications, are mostly contained within small scales, while larger objects with smooth borders, such as masses, are mostly contained in coarse scales [13]. Thus, different features can be selected to be enhanced within different scales. In order to develop our enhancement algorithm, we need the multiscale representation of the image.

#### **B.** Lifting Approach

There are of course many references describing the DWT. For clarity, we briefly describe the DWT aspects that are directly relevant to the subsequent design discussion. Fig. 1 illustrates the steps for performing a two-level DWT on an image. The 1-D DWT is first performed on the rows of the image, producing low-frequency L1 and high-frequency H1 components. After a 1-D DWT is performed again on the columns of L1 and H1, the first level of decomposition is complete, and the subbands LL1, HL1, LH1, and HH1 are obtained. This process can be recursively applied to LL1 to produce the LL2, HL2, LH2, and HH2 subbands. The 9/7 DWT was originally implemented via convolution-based methods, in which low-pass and high-pass FIR filters are employed. In 1998, Daubechies and Sweldens [12] showed that the DWT can be decomposed into a finite sequence of lifting steps, which provides several advantages, including lower computation and memory requirements and easier boundary management [13]. When lifting is used, the 9/7 filter can be expressed using the following steps:

$$P(z) = \begin{bmatrix} 1 & \alpha(1+z^{-1}) \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ \beta(1+z) & 1 \end{bmatrix} \begin{bmatrix} 1 & \gamma(1+z^{-1}) \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ \delta(1+z) & 1 \end{bmatrix} \begin{bmatrix} \zeta & 0 \\ 0 & 1/\zeta \end{bmatrix}$$

where $\alpha = -1.586134342$, $\beta = -0.05298011854$, $\gamma = 0.8829110762$, $\delta = 0.4435068522$, and $\zeta = 1.149604398$. Fig. 2 illustrates the flipping structure described by Huang et al. [2] for the lifting-based 1-D 9/7 DWT. Although the flipping structure shares the same computational complexity as the traditional lifting scheme, it reduces the critical path considerably by flipping computation units with the inverses of the multiplier coefficients. Constants C<sub>0</sub>, ..., C<sub>5</sub> are given by

| Constant | Expression and value |
|----------|----------------------|
| $C_0$ | $1/\alpha = -0.6304636206$ |
| $C_1$ | $1/(\alpha\beta) = 0.7437502472$ |
| $C_2$ | $1/(\beta\gamma) = -0.6680671710$ |
| $C_3$ | $1/(\gamma\delta) = 0.6384438531$ |
| $C_4$ | $\alpha\beta\gamma\delta\zeta = 2.065244244$ |
| $C_5$ | $\alpha\beta\gamma\delta\zeta = 2.421021152$ |
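The lifting factorization of the 9/7 filter can be illustrated in software. The sketch below is only a floating-point reference under stated assumptions: it uses the conventional predict/update ordering with the coefficients $\alpha$, $\beta$, $\gamma$, $\delta$, and $\zeta$ given above, clamps indices at the signal boundaries, and scales the low-pass output by $\zeta$ and the high-pass output by $1/\zeta$ (scaling conventions differ between references). It does not model the flipping structure or the constants C<sub>0</sub>, ..., C<sub>5</sub>, and the function names are hypothetical.

```python
ALPHA = -1.586134342
BETA = -0.05298011854
GAMMA = 0.8829110762
DELTA = 0.4435068522
ZETA = 1.149604398


def _lift(target, source, coeff, offset):
    """In place: target[i] += coeff * (source[i+offset] + source[i+offset+1]).

    Out-of-range indices are clamped to the nearest valid sample
    (an assumed boundary rule for this sketch).
    """
    last = len(source) - 1
    for i in range(len(target)):
        left = source[min(max(i + offset, 0), last)]
        right = source[min(max(i + offset + 1, 0), last)]
        target[i] += coeff * (left + right)


def forward_97(x):
    """One level of the 1-D 9/7 DWT by lifting; returns (low, high)."""
    if len(x) < 2 or len(x) % 2:
        raise ValueError("input length must be even and at least 2")
    s = [float(v) for v in x[0::2]]  # even samples
    d = [float(v) for v in x[1::2]]  # odd samples
    _lift(d, s, ALPHA, 0)    # first predict step
    _lift(s, d, BETA, -1)    # first update step
    _lift(d, s, GAMMA, 0)    # second predict step
    _lift(s, d, DELTA, -1)   # second update step
    return [ZETA * v for v in s], [v / ZETA for v in d]


def inverse_97(low, high):
    """Undo forward_97 by running the lifting steps in reverse order."""
    s = [v / ZETA for v in low]
    d = [v * ZETA for v in high]
    _lift(s, d, -DELTA, -1)
    _lift(d, s, -GAMMA, 0)
    _lift(s, d, -BETA, -1)
    _lift(d, s, -ALPHA, 0)
    x = [0.0] * (2 * len(s))
    x[0::2] = s
    x[1::2] = d
    return x
```

Because every lifting step only adds a function of the other channel, the inverse is obtained by subtracting the same quantities in reverse order, so `inverse_97(*forward_97(x))` recovers `x` up to floating-point rounding.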

#### **IV. BP DWT DESIGN**

We first consider a bit-parallel (BP) approach, which is appropriate when computing speed is the primary goal. Given the lifting framework described earlier, the design challenge lies in determining the appropriate number of integer and fractional bits to use in representing all the signals utilized during the computation.

#### TABLE I

BP APPROACH INTEGER BIT WIDTHS (IB) AND FRACTIONAL BIT WIDTHS (FB) FOR A TWO-LEVEL DWT WITH A PRECISION REQUIREMENT FB<sub>HH1</sub>=8. THE IBS IN THE BRACKETS ARE THE IBS BEFORE THE INCREASE DUE TO BINARY POINT ALIGNMENT.

| Signal              | IB, Level 1 Row | IB, Level 1 Column | IB, Level 2 Row | IB, Level 2 Column | FB      |
|---------------------|---------|---------|---------|---------|---------|
| $C_0$               | 1       | 1       | 1       | 1       | 15      |
| $C_1$               | 1       | 1       | 1       | 1       | 18      |
| $C_2$               | 1       | 1       | 1       | 1       | 14      |
| $C_3$               | 1       | 1       | 1       | 1       | 19      |
| $C_4$               | 3       | 3       | 3       | 3       | 12      |
| $C_5$               | 3       | 3       | 3       | 3       | 12      |
| $x_i$ / $\hat{x_i}$ | 0 (0)   | 1 (1)   | 2 (1)   | 3 (2)   | 8 or 19 |
| $D_0$               | 2 (0)   | 3 (0)   | 4 (1)   | 5 (1)   | 16      |
| $D_1$               | 0 (0)   | 1 (1)   | 2 (1)   | 3 (2)   | 18      |
| $D_2$               | 2 (1)   | 3 (2)   | 4 (2)   | 5 (3)   | 16      |
| $D_3$               | 2 (2)   | 3 (2)   | 4 (3)   | 5 (3)   | 15      |
| $D_4$               | 0 (-1)  | 1 (-1)  | 2 (0)   | 3 (0)   | 18      |
| $D_5$               | 1 (1)   | 2 (1)   | 3 (2)   | 4 (2)   | 18      |
| $D_6$               | 1 (1)   | 2 (2)   | 3 (2)   | 4 (3)   | 18      |
| $D_7$               | 1 (1)   | 2 (1)   | 3 (2)   | 4 (2)   | 18      |
| $D_8$               | 0 (0)   | 1 (0)   | 2 (1)   | 3 (1)   | 16      |
| $D_9$               | 0 (0)   | 1 (0)   | 2 (1)   | 3 (1)   | 15      |
| $D_{10}$            | 0 (0)   | 1 (0)   | 2 (1)   | 3 (1)   | 16      |
| $D_{11}$            | 0 (0)   | 1 (0)   | 2 (1)   | 3 (1)   | 16      |
| $s_i$ / $\hat{s_i}$ | 1 (1)   | 2 (1)   | 3 (2)   | 4 (2)   | 19      |
| $d_i$ / $\hat{d_i}$ | 1 (1)   | 2 (1)   | 3 (2)   | 4 (2)   | 19      |

In the discussions that follow, two's complement fixed-point representation is used for all signals. The number of integer bits, fractional bits, and the total number of bits of a signal z are denoted by **IBz**, **FBz**, and **Bz**, respectively, where **Bz** = **IBz** + **FBz**. Choosing too many integer bits leads to unnecessary use of resources, whereas too few leads to overflow problems due to insufficient range. Too many fractional bits again waste resources, whereas too few cause unwanted loss of precision.
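The trade-off between integer and fractional bits can be made concrete with a small fixed-point model. The sketch below is an illustration only: the helper names `to_fixed` and `from_fixed` are hypothetical, and it assumes that the integer bit count includes the sign bit, which may differ from the convention used in Table I.

```python
def to_fixed(value, ib, fb):
    """Quantize value to a two's complement word with ib integer bits
    (assumed here to include the sign bit) and fb fractional bits.

    Returns the raw integer word; raises OverflowError if the value does
    not fit in the available range.
    """
    raw = round(value * (1 << fb))
    lowest = -(1 << (ib + fb - 1))
    highest = (1 << (ib + fb - 1)) - 1
    if not lowest <= raw <= highest:
        raise OverflowError("too few integer bits for this value")
    return raw


def from_fixed(raw, fb):
    """Convert a raw fixed-point word back to a real number."""
    return raw / (1 << fb)


coefficient = 2.065244244  # value listed for C4 in the constants table

# Enough integer bits, many fractional bits: small quantization error.
fine = from_fixed(to_fixed(coefficient, ib=3, fb=12), fb=12)

# Same integer bits, few fractional bits: larger quantization error.
coarse = from_fixed(to_fixed(coefficient, ib=3, fb=4), fb=4)
```

With round-to-nearest quantization the error is at most half of one least significant bit, that is, $2^{-(FB+1)}$, while reducing `ib` to 2 in this example makes the value overflow the representable range.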

#### V. DS DWT DESIGN

While digit-serial (DS) arithmetic has a significant advantage over BP in terms of circuit area, a key challenge in DS design involves minimizing the number of iterations. For the DS representations used here, we use a radix-2 signed-digit (SD) redundant number system [19]. Due to redundancy, SD operations do not propagate carries and hence are able to run in most-significant-digit-first (MSDF) [19] mode (also known as on-line arithmetic). This MSDF property makes SD attractive for the DS-DWT approach since it allows the number of iterations to be varied to obtain different precision. In radix-2 SD, the following set is used to represent a digit: {-1, 0, 1}. We use binary bits b'10, b'00, and b'01 to indicate -1, 0, and 1, respectively.
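The properties of the radix-2 SD system mentioned above (redundancy, two-bit digit encoding, and precision that grows with the number of MSDF digits processed) can be illustrated as follows. This is a sketch under assumptions: digits are interpreted as a fraction with the first digit weighted $2^{-1}$, the two-bit codes follow the mapping of b'10, b'00, and b'01 to -1, 0, and 1 as read from the text, and the function names are hypothetical.

```python
# Two-bit code for each signed digit, as read from the text above.
SD_CODE = {-1: "10", 0: "00", 1: "01"}


def sd_value(digits):
    """Value of a fractional radix-2 SD number, most significant digit first.

    The first digit has weight 2**-1, the second 2**-2, and so on
    (an assumed scaling for this example).
    """
    return sum(d * 2.0 ** -(i + 1) for i, d in enumerate(digits))


def sd_encode(digits):
    """Encode each signed digit as its two-bit code."""
    return [SD_CODE[d] for d in digits]


def truncation_error(digits, kept):
    """Error made by stopping after the first `kept` digits (MSDF order)."""
    return abs(sd_value(digits) - sd_value(digits[:kept]))


# Redundancy: two different digit strings represent the same value 0.25.
same_value = sd_value([1, -1]) == sd_value([0, 1])

# Stopping after k digits leaves an error smaller than 2**-k.
digits = [1, -1, 0, 1, 1, -1]
errors = [truncation_error(digits, k) for k in range(1, len(digits) + 1)]
```

Since every discarded digit has magnitude at most one, the error after `k` digits is bounded by the sum of the remaining weights, which is below $2^{-k}$. This is why the iteration count can be used directly as a precision control.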

Fig. 3 illustrates the DS 1-D 9/7 DWT solution. The incoming two's complement data is first serialized and converted into SD representation. The serial SDs are then passed into the DS-DWT, which is partitioned into nine pipeline stages that run in parallel. After the last stage, the DWT-transformed data is converted back into two's complement representation [20] and parallelized into words. This approach reduces the memory requirement since two's complement occupies half the area of the equivalent SD representation. Both SD addition and SD multiplication produce one digit per cycle, starting from the most significant digit. For SD addition, we use the architecture described by Koren [21]. For SD multiplication, where one of the operands is a constant, we use a structure similar to the one proposed by Ercegovac and Lang [22].
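The conversion at both ends of the pipeline can be sketched in software. The example below is a simplified model with hypothetical function names: it maps a two's complement word to SD digits by giving the sign bit the digit -1, which is one straightforward choice and not necessarily the converter used in the design, and it counts two storage bits per SD digit as implied by the two-bit digit encoding.

```python
def twos_to_sd(word, width):
    """Serialize a width-bit two's complement word into SD digits, MSD first.

    The sign bit has weight -2**(width-1), so it becomes digit -1 when set;
    all other bits map directly to digits 0 or 1.
    """
    bits = [(word >> (width - 1 - i)) & 1 for i in range(width)]
    return [-bits[0]] + bits[1:]


def sd_to_int(digits):
    """Accumulate integer-weighted SD digits, most significant digit first."""
    value = 0
    for digit in digits:
        value = 2 * value + digit
    return value


def int_to_twos(value, width):
    """Pack a signed integer into a width-bit two's complement word."""
    return value & ((1 << width) - 1)


def storage_bits(width):
    """Storage per word: one bit per two's complement bit, two per SD digit."""
    return {"twos_complement": width, "signed_digit": 2 * width}
```

For example, the 4-bit word `0b1101` (the value -3) becomes the digit sequence `[-1, 1, 0, 1]`, and converting back yields the same word. Storing results in two's complement rather than SD halves the number of bits per word in this model.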

#### VI. RESULT AND IMPLEMENTATION

The BP and digit-serial architectures discussed in Sections IV and V enable optimized computation of a single level of the DWT at a single precision requirement. However, many DWT applications involve multilevel DWT decompositions. Thus, it is of high interest to have a single reconfigurable DWT processor that supports different DWT levels (e.g., levels 1–7) and precisions (e.g., precision 2–14) at run time. Varying these parameters provides the ability to vary compression ratios, image quality, and processing time. Adding this flexibility to the BP approach would mean that all of the operators would need to be large enough to support the highest level and precision. When performing a DWT at a low level and/or precision, this would involve significant hardware inefficiency. The DS approach, however, is well suited for this reconfigurability, as the number of iterations and, thus, the precision can simply be varied at run time as a function of the DWT level.
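A run-time configuration of this kind can be modeled as a small validated parameter record. The sketch below is hypothetical: the class name `DwtConfig`, its fields, and the default image size are assumptions for illustration, and the accepted ranges simply reuse the example ranges quoted above (levels 1–7, precision 2–14).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DwtConfig:
    """Hypothetical run-time parameters of a configurable DWT processor."""

    levels: int
    precision: int
    image_size: int = 512  # side length of a square N x N input image

    def __post_init__(self):
        if not 1 <= self.levels <= 7:
            raise ValueError("levels must be in the range 1-7")
        if not 2 <= self.precision <= 14:
            raise ValueError("precision must be in the range 2-14")
        if self.image_size % (1 << self.levels):
            raise ValueError("image_size must be divisible by 2**levels")

    def input_sizes(self):
        """Side length of the input block processed at each level.

        Level 1 takes the full image; each later level takes the LL subband
        of the previous one, which is half as large in each dimension.
        """
        return [self.image_size >> (level - 1)
                for level in range(1, self.levels + 1)]


config = DwtConfig(levels=4, precision=8)
sizes = config.input_sizes()
```

For a four-level transform of a 512x512 image, the per-level input sizes are 512, 256, 128, and 64, so later levels process far fewer samples even when they are allotted more iterations per sample.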



Fig. 4. Configurable shift register used in the runtime configurable design.

#### TABLE II

NUMBER OF DIGITS ALLOCATED FOR THE COEFFICIENTS IN A FOUR-LEVEL DWT WITH A PRECISION REQUIREMENT OF FB<sub>HH1</sub>=8

| SIGNAL | C <sub>0</sub> | C <sub>1</sub> | C <sub>2</sub> | C <sub>3</sub> | C <sub>4</sub> | C <sub>5</sub> |
|--------|----------------|----------------|----------------|----------------|----------------|----------------|
| DIGITS | 17             | 16             | 18             | 19             | 19             | 21             |

#### TABLE III

NUMBER OF ITERATIONS ALLOCATED TO EACH 1-D DWT STEP IN A FOUR-LEVEL DWT WITH A PRECISION REQUIREMENT OF FB<sub>HH1</sub>=8

| LEVEL             | 1  | 2  | 3  | 4  |
|-------------------|----|----|----|----|
| ROW ITERATIONS    | 25 | 26 | 27 | 28 |
| COLUMN ITERATIONS | 25 | 27 | 27 | 28 |

The following JPEG2000 images illustrate the weighted LWT approach for the precision-aware hardware structure; the PSNR is improved by the weighted LWT with the BP DWT and DS DWT designs.











The proposed method was implemented for a four-level DWT. With this method, we achieved high efficiency and improved PSNR for N×N images on a VLSI architecture.

#### TABLE IV EXISTING APPROACH

| APPROACH     | AREA (GATES) | CLOCK SPEED (MHz) | PROCESSING TIME (ms) | DYNAMIC POWER (mW) | STATIC POWER (mW) | DYNAMIC ENERGY (mJ) | STATIC ENERGY (mJ) | TOTAL ENERGY (mJ) |
|--------------|-------|-----|------|------|-----|------|------|------|
| BIT PARALLEL | 35228 | 56  | 7.4  | 6.8  | 3.5 | 0.05 | 17.3 | 17.4 |
| DIGIT SERIAL | 18680 | 435 | 29.8 | 23.5 | 1.7 | 0.70 | 8.6  | 9.3  |

#### TABLE V PROPOSED APPROACH

COMPARISON BETWEEN BP AND DS IMPLEMENTATIONS FOR A FOUR-LEVEL 8-BIT DWT. THE PROCESSING TIME IS THE TIME TAKEN FOR THE ENTIRE FOUR-LEVEL DWT ON A 512X512 IMAGE. THE POWER RESULTS ARE OBTAINED WITH SYNOPSYS DESIGN COMPILER USING A SET OF TEST VECTORS. THE STATIC ENERGY IS COMPUTED ASSUMING A 5-S STANDBY TIME.

| APPROACH     | AREA (GATES) | CLOCK SPEED (MHz) | PROCESSING TIME (ms) | DYNAMIC POWER (mW) | STATIC POWER (mW) | DYNAMIC ENERGY (mJ) | STATIC ENERGY (mJ) | TOTAL ENERGY (mJ) |
|--------------|-------|-----|------|------|-----|-------|------|-------|
| BIT PARALLEL | 34826 | 64  | 6.4  | 6.2  | 3.2 | 0.05  | 16.8 | 16.85 |
| DIGIT SERIAL | 18440 | 524 | 27.6 | 21.8 | 1.5 | 0.682 | 8.2  | 8.882 |
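The relationship between the power, time, and energy columns can be sketched as a simple calculation. The code below is an illustration with hypothetical function names: it assumes that dynamic energy is dynamic power multiplied by processing time, that static energy is static power multiplied by the 5-s standby time mentioned in the caption, and that total energy is their sum. With power in mW and time in seconds, the product is in mJ. The inputs are the Table IV power and time figures; the tabulated energies need not match exactly, since they may have been computed from unrounded values.

```python
def energy_mj(power_mw, time_s):
    """Energy in millijoules from power in milliwatts and time in seconds."""
    return power_mw * time_s


def energy_breakdown(dynamic_power_mw, processing_time_ms,
                     static_power_mw, standby_s=5.0):
    """Return (dynamic, static, total) energy in mJ.

    Assumes dynamic power is drawn only during processing and static power
    is integrated over the standby time.
    """
    dynamic = energy_mj(dynamic_power_mw, processing_time_ms / 1000.0)
    static = energy_mj(static_power_mw, standby_s)
    return dynamic, static, dynamic + static


# Power and time figures taken from Table IV.
bit_parallel = energy_breakdown(6.8, 7.4, 3.5)
digit_serial = energy_breakdown(23.5, 29.8, 1.7)
```

In this model the static term dominates the total for both designs, because the standby time is far longer than the processing time.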

#### VII. CONCLUSION

We have presented precision-aware approaches and associated hardware implementations for performing the DWT. Both BP and DS design methodologies and results have been presented. These methods enable use of an optimal amount of hardware resources in the DWT computation. Moreover, this framework enables quantization, which is traditionally performed after the DWT in algorithms such as JPEG 2000, to be specifically incorporated into the computation of the DWT itself.

We have also presented a highly flexible configurable DWT processor and examined the energy and power tradeoffs between the associated BP and DS designs, in particular, highlighting the differing respective roles of static and dynamic power in each. We believe that design methods and architectures such as those presented here play a significant role in the design of future

# **REFERENCES**

[1] G. Xing, J. Li, and Y. Q. Zhang, "Arbitrarily shaped video-object coding by wavelet," *IEEE Trans. Circuits Syst. Video Technol.*, vol. 11, no. 10, pp. 1135–1139, Oct. 2001.

[2] S. C. B. Lo, H. Li, and M. T. Freedman, "Optimization of wavelet decomposition for image compression and feature preservation," *IEEE Trans. Med. Imag.*, vol. 22, no. 9, pp. 1141–1151, Sep. 2003.

[3] K. K. Parhi and T. Nishitani, "VLSI architecture for discrete wavelet transforms," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 1, no. 2, pp. 191–202, Jun. 1993.

[4] P. Wu and L. Chen, "An efficient architecture for two-dimensional discrete wavelet transform," *IEEE Trans. Circuits Syst. Video Technol.*, vol. 11, no. 4, pp. 536–545, Apr. 2001.

[5] W. Sweldens, "The new philosophy in biorthogonal wavelet constructions," in *Proc. SPIE.*, 1995, vol. 2569, pp. 68–79.

[6] I. Daubechies and W. Sweldens, "Factoring wavelet transform into lifting steps," *J. Fourier Anal. Appl.*, vol. 4, no. 3, pp. 245–267, Mar. 1998.

[7] J. M. Jou, Y. H. Shiau, and C. C. Liu, "Efficient VLSI architectures for the biorthogonal wavelet transform by filter bank and lifting scheme," in *Proc. IEEE ISCAS*, May 2001, vol. 2, pp. 529–532.

[8] S. Barua, K. Kotteri, A. Bell, and J. Carletta, "Optimal quantized lifting coefficients for the 9/7 wavelet," in *Proc. IEEE Int. Conf. Acoust., Speech, Signal Process.*, 2004, vol. 5, pp. 193–196.

[9] V. Spiliotopoulos, N. Zervas, Y. Andreopoulos, G. Anagnostopoulos, and C. Goutis, "Quantization effect on VLSI implementations for the 9/7 DWT filters," in *Proc. IEEE Int. Conf. Acoust., Speech, Signal Process.*, 2001, vol. 2, pp. 1197–1200.

[10] K. Kotteri, A. Bell, and J. Carletta, "Design of multiplierless, high-performance, wavelet filter banks with image compression applications," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 51, no. 3, pp. 483–494, Mar. 2004.

[11] A. Benkrid, K. Benkrid, and D. Crookes, "Optimal wordlength calculation for forward and inverse discrete wavelet transform architectures," *Opt. Eng.*, vol. 43, no. 2, pp. 455–463, Feb. 2004.

[12] I. Daubechies and W. Sweldens, "Factoring wavelet transforms into lifting steps," *J. Fourier Anal. Appl.*, vol. 4, no. 3, pp. 247–269, May 1998.

[13] T. Acharya and C. Chakrabarti, "A survey on lifting-based discrete wavelet transform architectures," *J. VLSI Signal Process.*, vol. 42, no. 3, pp. 321–339, Mar. 2006.

[14] M. Marcellin, M. Lepley, A. Bilgin, T. Flohr, T. Chinen, and J. Kasner, "An overview of quantization in JPEG 2000," *Signal Process.: Image Commun.*, vol. 17, no. 1, pp. 73–84, Jan. 2002.

[15] K. Varma and A. Bell, "JPEG2000—Choices and tradeoffs for encoders," *IEEE Signal Process. Mag.*, vol. 21, no. 6, pp. 70–75, Nov. 2004.

[16] M. Weeks, "Precision for 2-D discrete wavelet transform processors," in *Proc. IEEE Workshop Signal Process. Syst.*, 2000, pp. 80–89.

[17] M. V. Wickerhauser, "High-resolution still compression," *Digital Signal Processing*, vol. 2, pp. 204–226, 1992.

[18] M. Antonini, M. Barlaud, P. Mathieu, and I. Daubechies, "Image coding using wavelet transform," *IEEE Trans. Image Process.*, vol. 1, no. 2, pp. 205–220, Feb. 1992.

[19] S. G. Chang, Z. Cvetkovic, and M. Vetterli, "Resolution enhancement of images using wavelet transform extrema extrapolation," in *Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing*, May 1995, vol. 4, pp. 2379–2382.

[20] J. M. Shapiro, "Embedded image coding using zerotree of wavelet coefficients," *IEEE Trans. Signal Process.*, vol. 41, no. 12, pp. 3445–3462, Dec. 1993.

[21] [Online]. Available: http://www.jpeg.org/JPEG2000.htm

[22] S. G. Chang, B. Yu, and M. Vetterli, "Image denoising via lossy compression and wavelet thresholding," in *Proc. Int. Conf. Image Processing*, Oct. 1997, vol. 1, pp. 604–607.

[23] S. G. Chang and M. Vetterli, "Spatial adaptive wavelet thresholding for image denoising," in *Proc. Int. Conf. Image Processing*, Oct. 1997, vol. 2, pp. 374–377.

[24] E. D. Pisano, R. E. Hendrick, M. Yaffe, E. F. Conant, and C. Gatsonis, "Should breast imaging practices convert to digital mammography? A response from members of the DMIST executive committee," *Radiology*, vol. 245, pp. 12–13, 2007.

# **Author Profile**



Lenin Raja received his B.E. degree in Electronics and Communication Engineering from Anna University Coimbatore, Tamil Nadu, in 2010. He also received his electrical diplomas from K.L.N.M Polytechnic College, Madurai, and Mohammed Sathak A.J. College of Engineering, Chennai, in 2003 and 2006, respectively. He is currently pursuing his M.E. (VLSI Design) at Sethu Institute of Technology, Kariyapatti, Virudhunagar Dist., Tamil Nadu, India. His research interests include signal processing, architecture design, and VLSI implementation of digital systems.