# Design of Enhanced Half Ripple Carry Adder for VLSI Implementation of Two-Dimensional Discrete Wavelet Transform

J. Vinoth Kumar<sup>1\*</sup> and C. Kumar Charlie Paul<sup>2</sup>

<sup>1</sup>St. Peter's University, Chennai - 600054, Tamil Nadu, India; Stpetervinothkumar.vlsi@gmail.com

<sup>2</sup>A. S. L. Pauls College of Engineering and Technology, Coimbatore - 641032, Tamil Nadu, India; charliepaul1970@gmail.com

#### Abstract

The aim of this work is to design an efficient image compression technique based on the two-dimensional Discrete Wavelet Transformation (DWT). To improve performance, an Enhanced Half-Ripple Carry Adder (EHRCA) has been designed. Verilog Hardware Description Language (Verilog HDL) is used to model both the EHRCA and the DWT. The DWT is built from two types of filter, a Low Pass Filter (LPF) and a High Pass Filter (HPF). The DWT process performs three levels of decomposition, and each level has two compression steps called "Row Wise Compression" and "Column Wise Compression". In the proposed DWT models, adders are identified as having greater potential for optimization than the other components, which motivates the design of the EHRCA. The proposed EHRCA circuit offers a 10.71% improvement in hardware slice utilization and an 11.78% improvement in total power consumption over the traditional Binary to Excess 1 Conversion (BEC) based Square Root Carry Select Adder (SQRT CSLA). The proposed adder has further been incorporated into Row Wise Compression and Column Wise Compression to improve the architectural performance of the DWT. In future, the proposed EHRCA based DWT will be useful in Discrete Cosine Transformation (DCT) and in hybrid and lifting based DWT techniques.

**Key words:** Binary to Excess 1 Conversion based Carry Select Adder, Carry Select Adder, Hybrid and Lifting based Discrete Wavelet Transformation Technique, Row and Column Wise Compression, Very Large Scale Integration

## 1. Introduction

Two-Dimensional (2-D) Discrete Wavelet Transformation (DWT) techniques are widely used for image and video compression<sup>5</sup>. The 2-D DWT has a multiresolution decomposition capability, and it therefore plays a role in many engineering fields<sup>10</sup>. However, the large volume of data accumulated across the decomposition levels of the transform makes it computationally very intensive. Considerable effort has gone into designing architectures that provide high speed 2-D DWT computation with reasonable hardware utilization. These architectures can be classified as separable and non-separable. In a separable architecture, the 2-D filtering operation is carried out as two 1-D filtering operations, one processing the data row-wise and the other processing it column-wise. The decomposition levels of the input image can be computed by either a Recursive Pyramid Algorithm (RPA) or a lifting operation. Because a separable architecture uses a 1-D filtering structure to perform the 2-D DWT, it incurs additional computational complexity between the two 1-D filtering processes. This increases the latency as well as the memory size of the architecture. Non-separable architectures reduce this limitation, since they compute the 2-D DWT directly by using 2-D filters. However, the speed of the DWT process is very low for non-separable architectures. To overcome this problem, a pipelining technique is used in the DWT architecture<sup>10</sup>.

\*Author for correspondence

In general, the Haar Discrete Wavelet Transform (HDWT) is used to compress signals and images<sup>6</sup>. Precision-aware self-quantizing architectures have been proposed to increase the compression ability<sup>3</sup>. Distributed Arithmetic (DA) based multiplication has been used to generate the DWT coefficients<sup>2</sup>. A DA based multiplier performs multiplication with the help of Look Up Tables (LUTs) rather than dedicated multiplier hardware, which is the basis for its reported performance advantage over other multipliers. One-dimensional DWT techniques have been implemented in a Very Large Scale Integration (VLSI) system design environment<sup>9</sup>, and VLSI based high speed 2-D DWT implementations have also been reported<sup>1</sup>.

In this paper, a 2-D DWT technique is designed using the Enhanced Half Ripple Carry Adder (EHRCA). The EHRCA is a type of Ripple Carry Adder (RCA) whose hardware complexity and power consumption are lower than those of the traditional RCA circuit. The performance of the DWT in terms of silicon area and power consumption can therefore be improved when the EHRCA is incorporated into the DWT process.

## 2. Discrete Wavelet Transformation (DWT)

Discrete Wavelet Transformation (DWT) is a technique for decomposing and compressing images. The DWT represents an image as a sum of wavelet functions (wavelets) with different locations and scales. It represents the data as a set of low pass and high pass coefficients. The input data is passed through a set of low pass and high pass filters, and the outputs of both filters are down sampled by 2. The output of the low pass filter is an average coefficient and the output of the high pass filter is a detail coefficient. The schematic diagram of the 1-D DWT method is shown in Figure 1.



Figure 1. Block diagram of 1-D DWT.
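As a minimal sketch of the structure in Figure 1, the following Python function performs one level of a 1-D DWT. It assumes Haar filters (the filter taps are not stated at this point in the paper) and an even-length input; the function name `haar_dwt_1d` is illustrative only.

```python
import math

def haar_dwt_1d(signal):
    """One level of a 1-D Haar DWT: filter, then keep every second output."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    scale = 1.0 / math.sqrt(2.0)
    average = []  # low pass branch, down sampled by 2
    detail = []   # high pass branch, down sampled by 2
    for i in range(0, len(signal), 2):
        a, b = signal[i], signal[i + 1]
        average.append((a + b) * scale)
        detail.append((a - b) * scale)
    return average, detail

average, detail = haar_dwt_1d([4, 6, 10, 12, 8, 8, 2, 0])
```

Each output list has half the length of the input, and a pair of equal samples (such as `8, 8`) produces a zero detail coefficient.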

In the 2-D DWT, the input data is passed through a set of low pass and high pass filters in two directions, along the rows and along the columns. As in the 1-D DWT, the outputs of the low pass and high pass filters are down sampled by 2 in each direction. Figure 2 shows the block diagram of the 2-D DWT. As shown in Figure 2, the output is a set of four coefficients: LL, HL, LH and HH. In this notation, the first letter represents the transform along the rows, whereas the second letter represents the transform along the columns. L denotes a low pass signal and H denotes a high pass signal.

In this paper, three levels of decomposition are performed to compress the image with the help of the EHRCA. The structure of the DWT levels is shown in Figure 3. The input data can be represented at multiple resolutions by decomposing the LL coefficient further at each level. In reconstruction, the compressed data is up-sampled by a factor of 2 and interpolated in order to recover the original input data.



Figure 2. Block diagram of 2-D DWT.



Figure 3. Structure of DWT levels.
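The row-wise and column-wise filtering of Figure 2 and the repeated decomposition of the LL sub-band in Figure 3 can be sketched as follows. This is an illustrative model only: it assumes Haar filters, image dimensions that are a multiple of 8, and hypothetical helper names (`haar_rows`, `haar_dwt_2d`, `decompose`) that do not come from the paper.

```python
import math

SCALE = 1.0 / math.sqrt(2.0)

def haar_rows(matrix):
    """Apply one 1-D Haar step to every row; return (low, high) halves."""
    low, high = [], []
    for row in matrix:
        low.append([(row[i] + row[i + 1]) * SCALE for i in range(0, len(row), 2)])
        high.append([(row[i] - row[i + 1]) * SCALE for i in range(0, len(row), 2)])
    return low, high

def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

def haar_dwt_2d(image):
    """One 2-D level: row-wise filtering followed by column-wise filtering."""
    row_low, row_high = haar_rows(image)
    # Column-wise filtering is row-wise filtering of the transpose.
    ll, lh = (transpose(m) for m in haar_rows(transpose(row_low)))
    hl, hh = (transpose(m) for m in haar_rows(transpose(row_high)))
    return ll, hl, lh, hh

def decompose(image, levels=3):
    """Repeat the 2-D step on the LL sub-band; return the sub-bands per level."""
    result = []
    current = image
    for _ in range(levels):
        ll, hl, lh, hh = haar_dwt_2d(current)
        result.append({"LL": ll, "HL": hl, "LH": lh, "HH": hh})
        current = ll
    return result

# Synthetic 8x8 test pattern (not data from the paper).
image = [[(r * 8 + c) % 16 for c in range(8)] for r in range(8)]
pyramid = decompose(image, levels=3)
sizes = [(len(p["LL"]), len(p["LL"][0])) for p in pyramid]
```

For the 8x8 test pattern, the LL sub-band shrinks from 4x4 to 2x2 to 1x1 over the three levels, matching the halving in each direction described above.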

## 3. Image Compression using DWT

An input image is passed through a series of filters to calculate the DWT coefficients. The procedure starts with passing this image through a half band digital low pass filter with impulse response h[n]. Filtering an image signal corresponds to the numerical operation of convolution of an image signal with the impulse response of the filter. The convolution operation in discrete time is defined as follows:

$$y[n] = \sum_{k=-\infty}^{\infty} x[k] \cdot h[n-k]$$
(1)
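For finite-length sequences, Equation (1) reduces to a double loop. The sketch below assumes that samples outside the stored range are zero; the two-tap averaging filter in the example is an arbitrary illustration, not a filter specified in the paper.

```python
def convolve(x, h):
    """Discrete convolution y[n] = sum_k x[k] * h[n - k] for finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n in range(len(y)):
        for k in range(len(x)):
            if 0 <= n - k < len(h):  # h is zero outside its stored taps
                y[n] += x[k] * h[n - k]
    return y

y = convolve([1, 2, 3, 4], [0.5, 0.5])
```

With this averaging filter, each interior output sample is the mean of two neighbouring input samples.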

A half band low pass filter removes all frequencies that are above half of the highest frequency in the signal, which can be interpreted as losing half of the information. Resolution is related to the amount of information in the signal, and it is therefore affected by filtering operations. However, the subsampling operation after filtering does not affect the resolution, since removing half of the spectral components from the input signal makes half the number of samples redundant anyway. In summary, half band low pass filtering halves the resolution but leaves the scale unchanged. The filtered signal is then subsampled by 2, as expressed in Equation (2), since half of its samples are redundant. The procedure can be expressed mathematically as follows:

$$y[n] = \sum_{k=-\infty}^{\infty} h[k] \cdot x[2n-k]$$
(2)

The input image signals are decomposed into average information and detail information. The average and detail information are described as follows

$$y_{high}[k] = \sum_{n} x[n] \cdot g[2k-n]$$
(3)

$$y_{low}[k] = \sum_{n} x[n] \cdot h[2k-n]$$
(4)

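In Equations (3) and (4), g and h are the impulse responses of the high pass and low pass filters, and $y_{high}[k]$ and $y_{low}[k]$ are the resulting detail and average coefficients. In reconstruction, the reverse process is applied to recover the original image: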

$$x[n] = \sum_{k=-\infty}^{\infty} \left( y_{high}[k] \cdot g[2k-n] + y_{low}[k] \cdot h[2k-n] \right)$$
(5)
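Equations (3) to (5) can be checked numerically. The sketch below assumes two-tap Haar filters for h and g (an assumption, since the paper does not list the taps) and treats samples outside the signal as zero, which is why one extra coefficient is produced at the boundary. The names `tap`, `analyze` and `synthesize` are illustrative.

```python
import math

S = 1.0 / math.sqrt(2.0)
H = [S, S]    # low pass (average) filter, assumed Haar
G = [S, -S]   # high pass (detail) filter, assumed Haar

def tap(f, i):
    """Filter coefficient f[i], zero outside the filter support."""
    return f[i] if 0 <= i < len(f) else 0.0

def analyze(x):
    """Equations (3) and (4): y[k] = sum_n x[n] * f[2k - n]."""
    count = len(x) // 2 + 1  # extra coefficient covers the zero-padded edge
    y_high = [sum(x[n] * tap(G, 2 * k - n) for n in range(len(x)))
              for k in range(count)]
    y_low = [sum(x[n] * tap(H, 2 * k - n) for n in range(len(x)))
             for k in range(count)]
    return y_high, y_low

def synthesize(y_high, y_low, length):
    """Equation (5): x[n] = sum_k (y_high[k]*g[2k-n] + y_low[k]*h[2k-n])."""
    return [
        sum(y_high[k] * tap(G, 2 * k - n) + y_low[k] * tap(H, 2 * k - n)
            for k in range(len(y_high)))
        for n in range(length)
    ]

x = [4.0, 6.0, 10.0, 12.0, 8.0, 8.0]
y_high, y_low = analyze(x)
x_rec = synthesize(y_high, y_low, len(x))
```

Under these assumptions `x_rec` matches `x` to within floating-point rounding, which illustrates that the synthesis step inverts the analysis step for this filter pair.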

In general, the DWT can be implemented by two methods: (1) the Matrix Multiplication Method and (2) the Linear Equation Method. In the linear equation method, every set of four pixels is considered in order to compute the DWT coefficients. These four pixels are processed using Equation (6):

$$DWT_{coeff} = (Pixel_1 + Pixel_2 + Pixel_3 + Pixel_4)/2$$
(6)
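A small worked example of Equation (6) is given below. It assumes that each "set of four pixels" is a non-overlapping 2x2 block of the image, which the text does not state explicitly; the function name `ll_coefficients` and the pixel values are hypothetical.

```python
def ll_coefficients(image):
    """Apply Equation (6) to each non-overlapping 2x2 block of pixels."""
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("image dimensions must be even")
    return [
        [
            (image[r][c] + image[r][c + 1]
             + image[r + 1][c] + image[r + 1][c + 1]) / 2
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]

# Two 2x2 blocks: (10, 12, 14, 16) and (20, 22, 24, 26).
coeffs = ll_coefficients([[10, 12, 20, 22], [14, 16, 24, 26]])
```

Each coefficient needs three additions, which is why the adder dominates the cost of this step. In hardware the division by 2 would typically be a one-bit right shift rather than a divider.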


From Equation (6), it is clear that addition is required at every level of DWT based image compression. To implement the 2-D DWT, the EHRCA is used in our work.

## 4. Conventional Carry Select Adder

The Carry Select Adder (CSLA) is a fast adder for binary addition. In the CSLA architecture, dual RCAs are used, one for carry input 0 and one for carry input 1, and multiplexers are used in the final stage of the addition process. A single RCA structure has four Full Adders (FAs), so a dual RCA structure has eight FAs. A large number of gates is therefore required to design the CSLA. This adder is generally called the Square Root Carry Select Adder (SQRT CSLA), because it requires $\sqrt{N}$ sets of dual RCAs to compute an N-bit binary addition. All sets of dual RCAs can execute in parallel. The final stage of the SQRT CSLA uses the multiplexers to produce the final sum. Hence only the final stage incurs Carry Propagation Delay (CPD), whereas in an RCA circuit the entire structure incurs CPD.
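The selection principle described above can be modelled at bit level as follows. This is a behavioural sketch, not the paper's Verilog: it assumes uniform 4-bit groups for a 16-bit adder and, for simplicity, uses a dual RCA in every group including the first. All function names are illustrative.

```python
def ripple_carry_add(a_bits, b_bits, carry_in):
    """Add two little-endian bit lists with a chain of full adders."""
    sum_bits = []
    carry = carry_in
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return sum_bits, carry

def to_bits(value, width):
    return [(value >> i) & 1 for i in range(width)]

def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))

def carry_select_add(a, b, width=16, group=4):
    """Carry-select addition: dual RCAs per group, then a multiplexer."""
    a_bits, b_bits = to_bits(a, width), to_bits(b, width)
    result = []
    carry = 0
    for start in range(0, width, group):
        ga = a_bits[start:start + group]
        gb = b_bits[start:start + group]
        # Both candidates depend only on the operands, so in hardware
        # they can be computed in parallel for every group.
        sum0, cout0 = ripple_carry_add(ga, gb, 0)
        sum1, cout1 = ripple_carry_add(ga, gb, 1)
        # Multiplexer: the incoming carry selects one candidate.
        if carry:
            result.extend(sum1)
            carry = cout1
        else:
            result.extend(sum0)
            carry = cout0
    return from_bits(result), carry

total, carry_out = carry_select_add(0xBEEF, 0x1234)
```

Only the `if carry` selection depends on the previous group, which mirrors the statement that carry propagation is confined to the multiplexer stage.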



Figure 4. Structure of 16-bit BEC based SQRT CSLA.

Further, the RCA circuit for carry input 1 has been replaced by a Binary to Excess 1 Converter (BEC) to improve performance. The BEC circuit uses fewer gates to perform the operation of the RCA for carry input 1. For instance, a 16-bit BEC based SQRT CSLA is illustrated in Figure 4. It consists of four RCA-BEC sets to add two 16-bit binary integers. It reduces silicon area utilization and power consumption compared with the traditional SQRT CSLA circuit. However, the silicon area requirement of the combined RCA-BEC circuit is still high, and it consumes considerable power to perform a 16-bit binary addition. Hence, to reduce this problem, the EHRCA circuit is designed in this paper. A brief description of the EHRCA is presented in the next section.
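The idea behind the BEC replacement can be illustrated with a short behavioural model: adding 1 to the result of the carry-input-0 RCA gives the same bits as a second RCA with carry input 1. The sketch below assumes a 4-bit group and is not taken from the paper's Verilog source; the names `bec` and `group_candidates` are illustrative.

```python
def ripple_carry_add(a_bits, b_bits, carry_in):
    """Add two little-endian bit lists with a chain of full adders."""
    sum_bits, carry = [], carry_in
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return sum_bits, carry

def bec(bits):
    """Binary to Excess-1: add 1 to a little-endian bit list (same width)."""
    out = []
    all_lower_ones = 1  # AND of all lower-order input bits
    for bit in bits:
        out.append(bit ^ all_lower_ones)
        all_lower_ones &= bit
    return out

def group_candidates(a_bits, b_bits):
    """Return the (sum, carry) pairs for Cin = 0 and Cin = 1 using one RCA."""
    sum0, cout0 = ripple_carry_add(a_bits, b_bits, 0)
    plus_one = bec(sum0 + [cout0])  # (n+1)-bit BEC on sum and carry
    return (sum0, cout0), (plus_one[:-1], plus_one[-1])

def to_bits(value, width=4):
    return [(value >> i) & 1 for i in range(width)]

# Exhaustive check over all 4-bit operand pairs.
mismatches = 0
for a in range(16):
    for b in range(16):
        _, with_bec = group_candidates(to_bits(a), to_bits(b))
        if with_bec != ripple_carry_add(to_bits(a), to_bits(b), 1):
            mismatches += 1
```

The BEC needs only XOR and AND gates per bit, compared with a full adder per bit in the RCA it replaces, which is the source of the gate saving described above.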

## 5. Enhanced Half Ripple Carry Adder

The RCA is one of the basic adders for binary addition. However, CPD is its main disadvantage, i.e., every stage must wait for the carry signal from the previous stage. To reduce the CPD problem of the RCA circuit, the Enhanced Half Ripple Carry Adder (EHRCA) is developed in our work. The circuit diagram of the developed 4-bit EHRCA is illustrated in Figure 5. It consists of Half Adders (HAs), OR and AND gates, and multiplexers. As the name suggests, only the final half of the circuit (the multiplexer part) has to wait for the carry signal from the previous stage; the remaining circuits can execute in parallel. Hence, this adder is named the Enhanced Half Ripple Carry Adder. The structure of this circuit also resembles that of the SQRT CSLA. Instead of the RCA-BEC combination used in the CSLA for Cin = 0 and Cin = 1 respectively, a simplified circuit is designed as shown in Figure 5. The carry input is considered only in the final stage of the EHRCA, whereas the remaining circuit performs its computation in parallel using the available input data. EHRCA circuits for 8-bit and 16-bit operands can be designed in the same way as in Figure 5. Further, the EHRCA is incorporated into the addition process of Equation (6) to increase the performance of the 2-D DWT. Three levels of decomposition are performed in this paper for image compression. The performance of the conventional SQRT CSLA and the developed EHRCA circuits is analyzed in the Results and Discussions section of this paper.



Figure 5. Circuit diagram for 4-bit EHRCA circuit.
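The gate-level wiring of Figure 5 is not reproduced in this text, so the following is only a behavioural sketch of the general principle described above: a carry-independent stage built from half adders and simple gates, followed by a multiplexer stage that is the only part waiting for the carry. The specific arrangement (precomputing both candidate sum and carry bits per position) is an assumption and should not be read as the authors' circuit; all names are hypothetical.

```python
def half_adder(a, b):
    """Return (sum, carry) of two bits."""
    return a ^ b, a & b

def mux(select, when0, when1):
    """2:1 multiplexer."""
    return when1 if select else when0

def precompute(a_bits, b_bits):
    """Carry-independent stage: every bit position can run in parallel."""
    stage = []
    for a, b in zip(a_bits, b_bits):
        p, g = half_adder(a, b)
        stage.append({
            "sum0": p, "sum1": p ^ 1,      # sum bit if carry-in is 0 / 1
            "carry0": g, "carry1": a | b,  # carry-out if carry-in is 0 / 1
        })
    return stage

def select(stage, carry_in):
    """Carry-dependent stage: a chain of multiplexers picks the results."""
    sum_bits, carry = [], carry_in
    for bit in stage:
        sum_bits.append(mux(carry, bit["sum0"], bit["sum1"]))
        carry = mux(carry, bit["carry0"], bit["carry1"])
    return sum_bits, carry

def ehrca_like_add(a, b, carry_in=0, width=4):
    """Add two unsigned integers with the two-stage model above."""
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]
    sum_bits, carry_out = select(precompute(a_bits, b_bits), carry_in)
    return sum(bit << i for i, bit in enumerate(sum_bits)), carry_out

result, carry_out = ehrca_like_add(9, 7)
```

In this model only `select` depends on the carry, so the ripple path consists solely of multiplexers, while `precompute` corresponds to the part that can execute in parallel.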

## 6. Results and Discussions

In this paper, the Enhanced Half Ripple Carry Adder (EHRCA) circuit is designed using Verilog Hardware Description Language (Verilog HDL). The proposed adder circuit is validated using ModelSim 6.3C, and synthesis results are obtained using the Xilinx 10.1i design tool. The levels of decomposition of the image using the 2-D DWT are evaluated using MATLAB. The RCA circuit was first realized and its redundant logic operations were identified. Based on the identified redundant logic, the EHRCA circuit is designed in our work. The EHRCA circuit is most similar to the conventional BEC based SQRT CSLA. Hence, the performance of the conventional BEC based SQRT CSLA and the developed EHRCA circuit for 16-bit operands is compared in Table 1.

Table 1. Comparison of the 16-bit conventional BEC based SQRT CSLA and the developed 16-bit EHRCA circuits

| Type                                    | Slices | LUTs | Delay (ns) | Power (mW) |
|-----------------------------------------|--------|------|------------|------------|
| 16-bit conventional BEC based SQRT CSLA | 28     | 47   | 15.971     | 280        |
| 16-bit developed EHRCA                  | 25     | 42   | 16.707     | 247        |

From Table 1, it is clear that the developed 16-bit EHRCA circuit offers a 10.71% reduction in silicon area (slices) and an 11.78% reduction in power consumption compared with the conventional BEC based SQRT CSLA, at the cost of a slightly longer delay (16.707 ns versus 15.971 ns). Therefore, the developed EHRCA circuit is well suited to 2-D DWT implementation where area and power are the main concerns. Further, the developed EHRCA circuit is incorporated into the 2-D DWT addition process to improve its performance. The simulation result for the 2-D DWT is illustrated in Figure 6, which shows the pixel values obtained from the input image. Three levels of decomposition are performed in this paper for image compression with the help of the DWT and the EHRCA. The input image used to determine the DWT coefficients is shown in Figure 7. The three levels of decomposed images are illustrated in Figure 8.
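The percentages quoted from Table 1 can be reproduced with a few lines of arithmetic. The values below are copied from Table 1; the dictionary layout and the helper name `reduction_percent` are illustrative only.

```python
# Values copied from Table 1.
table = {
    "BEC based SQRT CSLA": {"slices": 28, "lut": 47, "delay_ns": 15.971, "power_mw": 280},
    "EHRCA": {"slices": 25, "lut": 42, "delay_ns": 16.707, "power_mw": 247},
}

def reduction_percent(baseline, proposed):
    """Relative reduction in percent; a negative value means an increase."""
    return (baseline - proposed) / baseline * 100.0

base = table["BEC based SQRT CSLA"]
new = table["EHRCA"]
changes = {key: reduction_percent(base[key], new[key]) for key in base}
```

The slice reduction evaluates to 3/28, about 10.71%, and the power reduction to 33/280, about 11.79% (reported in the text as 11.78%). The delay entry is negative, reflecting the longer delay of the EHRCA in Table 1.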



Figure 6. Simulation result for image compression using 2-D DWT.



Figure 7. Input Image.



Figure 8. Three levels of decomposed images (Level 1, Level 2 and Level 3).

## 7. Conclusion

In this paper, 2-D DWT based image compression is developed with the help of the Enhanced Half Ripple Carry Adder (EHRCA). The EHRCA is designed and incorporated into the DWT computation using Verilog HDL. The developed EHRCA circuit uses fewer hardware resources and less power than the conventional BEC based SQRT CSLA: it offers a 10.71% reduction in silicon area and an 11.78% reduction in power consumption. Further, the developed EHRCA circuit is incorporated into the addition process of the 2-D DWT for image compression. Three levels of decomposition are performed in this paper. The simulation results for image compression using the 2-D DWT are validated with both ModelSim 6.3C and MATLAB. In future, the developed EHRCA based 2-D DWT will be helpful for image processing applications such as compression, segmentation and fragmentation.

## 8. References

1. Chao C, Parhi KK. High-speed VLSI implementations of 2-D discrete wavelet transform. IEEE Transactions on Signal Processing. 2008; 56(1):393–403.
2. Devangkumar S, Vithlani CH. VLSI-oriented lossy image compression approach using DA-based 2D-discrete wavelet. Int Arab J Inf Technol. 2014; 11(1):59–68.
3. Lee D-U, Kim L-K, Villasenor JD. Precision-aware self-quantizing hardware architectures for the discrete wavelet transform. IEEE Transactions on Image Processing. 2012; 21(2):768–77.
4. Manju P, Rohil H. Optimized image steganography using Discrete Wavelet Transform (DWT). IJRDET. 2014; 2(2):75–81.
5. Mohanty BK, Meher PK. Memory-efficient high-speed convolution-based generic structure for multilevel 2-D DWT. IEEE Transactions on Circuits and Systems for Video Technology. 2013; 23(2):353–63.
6. Monika R, Vij A. Image compression using discrete haar wavelet transforms. IJEIT. 2014; 3(12):47–51.
7. Nahvi N, Sharma OC. Implementation of discrete wavelet transform for multimodal medical image fusion. IJETAE. 2014; 4(7):312–7.
8. Nanammal V, Abirami BM, Venugopalakrishnan J. VLSI based design of an efficient hybrid water marking scheme for multimedia content protection. Indian Journal of Science and Technology. 2015; 8(19).
9. Vishwanath M, Michael RO, Irwin MJ. VLSI architectures for the discrete wavelet transform. IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing. 1995; 42(5):305–16.
10. Zhang C, Wang C, Ahmad MO. A pipeline VLSI architecture for fast computation of the 2-D discrete wavelet transform. IEEE Transactions on Circuits and Systems I: Regular Papers. 2012; 59(8):1775–85.