# Digit-Serial Design of FIR and Adaptive Filter, With Multiple Constant Multiplications (MCM)

Tiya Reba Stepheni PG Student Department of ECE SKCET, Coimbatore Meera Nandhini U<sub>2</sub> PG Student Department of ECE SKCET, Coimbatore,

Nithiya Devi G<sub>3</sub> PG Student Department of ECE SKCET, Coimbatore

Abstract- Efficient algorithms and architectures already exist for the design of low-complexity bitparallel multiple constant multiplication (MCM). This operation dominates the complexity of many digital signals processing system. Alternative to this, digitserial MCM design is available with low complexity. But it is not as much popular as the former one. In this paper, it's been tried to optimize the gate -level area and power of digit-serial MCM design. So initially from the basic parallel designs, like shiftadds implementation, the common sub-expression elimination and graph-based method are used .The efficient one is selected that is the GB technique and is applied to digit-serial design. Then the newly designed MCM block will be placed to the multiplier block of an FIR filter. Thus comparing to bit-parallel FIR filter design, digit-serial design has 41% of power reduction and 40.5% of area reduction and are independent of data word-length. LMS adaptive filter with digit-serial design can be used to reduce noise.

Index Terms-digit-serial arithmetic, finite impulse response (FIR) filters, multiple constant multiplications (MCM), common subexpression elimination (CSE), graph-based (GB).

## I. INTRODUCTION

Filtering is the most important technique in signal processing to choose the limited frequency spectrum in some specific frequency band [1-2]. In general, the digital filter is one kind of the generic filter, but used widely. It is necessary to pay attention to the design of the digital filter. In digital signal processing (DSP) systems finite impulse response (FIR) filters are of great importance. The FIR filters are of non-recursive type, whereby the present output sample depends on the present input samples and previous input samples. In the common case, the impulse response is finite because there is no feedback in the FIR. A lack of feedback guarantees that the impulse response will be finite.

Therefore, the term "finite impulse response" is nearly synonymous with "no feedback". Their characteristics in linear-phase and

feed-forward implementations make them very useful for building stable high-performance filters. Two types of filter implementations are using, they are (1) direct-form [Fig1. (a)] (2) transposed-form [Fig.1(b)]. Both architectures have similar complexity in hardware. But the transposed form is generally preferred because of its higher performance and power efficiency [1].

The multiplier block has significant impact on the complexity and performance of the filter design; because a large number of constant multiplications are required. Multiplications are often implemented with shift-and-add operations for hardware efficiency and are generally known as the multiple constant multiplications (MCM) [Fig. 1(c)] operation. The decomposition of multiplications into shifts and adds is such that as much intermediate computation results (partial products) can be reused as possible.

Full flexibility of a multiplier is not needed for the constant multiplication, since filter coefficients are fixed and determined beforehand by the DSP algorithms. So area, delay, and powerefficient multiplier architectures, such as Wallace [2] and modified Booth multipliers [3] are not needed. Hence, the multiplication of filter coefficients with the input data is generally implemented under a shift-adds architecture, where each constant multiplication is realized using addition/subtraction and shift operations in an MCM operation [Fig. 1(c)]. For the shift-adds implementation of constant multiplications, generally a straightforward method known as digit-based recoding is used. Initially it defines the constants in binary. Then, in the binary representation of the constant, for each "1" according to its bit position, it shifts the variable and adds up the shifted variables to obtain the result. As a simple example, consider the constant multiplications 29x and 43x. Their decompositions in binary are shown below:

29x = (11101) bin x = x << 4 + x << 3 + x << 2 + x43x = (101011) bin x = x << 5 + x << 3 + x << 1 + x



Fig.1. FIR filter implementations. (a) Direct form.
(b) Transposed form with generic multipliers. (c)
Transposed form with an MCM block.

In the fig.2. Shift-add implementation of 29x and 43x is shown. Here sharing of partial product is not done. Digit- based decoding [5] technique does not allow sharing of common partial product, which in turn will reduce the area and power dissipation of the MCM design at the gate level. The MCM problem can be defined by finding minimum number of addition and subtraction operations, which will implement the constant multiplications.



Fig. 2. Shift-adds implementations of 29x and 43x without partial product sharing.

This operation requires 6 additions as shown above. If the common partial products are shared, it will reduce the number of operations and thus power dissipation and area of the MCM design

at the gate level. Hence, the fundamental optimization problem, called the MCM problem, is defined as finding the minimum number of addition and subtraction operations that implement the constant multiplications. Note that, in bit-parallel design of constant multiplications, shifts can be realized only using wires in hardware, without representing any area cost.

The MCM problem can be solved using algorithms: common subexpression elimination (CSE) algorithms [6]-[7] and graphbased (GB) [8]-[9] techniques. In the CSE algorithms, initially extracting all possible subexpressions from the binary, canonical signed digit (CSD) [6], or minimal signed digit (MSD) [10] representations of the constants. Then finds the "best" subexpression generally the most common to be shared among the constant multiplications. The GB methods usually yields better solution and are not limited to any particular number representation. It will consider a larger number of different implementation of constants.



Fig 3.a. Exact CSE algorithm b. GB technique

In Fig. 3.a the implementation of 29x and 43x using the exact CSE algorithm [7] gives a solution with four operations by finding the most common partial products  $3x = (11)_{bin}x$  and  $5x = (101)_{bin}x$ . Here constants are defined using binary. In fig.3.b. The exact GB algorithm is implemented by sharing the common partial product 7x in both multiplication and finds a solution with minimum number of operations. In the exact CSE algorithm the partial product  $7x = (111)_{bin}x$  is not possible to extract from the binary representation of 43x.

Above mentioned algorithms are implemented by parallel processing of the input data. Bit-parallel processing is the existing method. In order to improve the efficiency, by reducing the area and power digit-serial method can be adopted.

In order to reduce the noise problem, LMS adaptive filter is used. The Least Mean Square (LMS) adaptive filter is the most popular and most widely used adaptive filter, not only because of its simplicity but also because of its satisfactory convergence performance [1], [2]. The direct-form LMS adaptive filter involves a long critical path due to an inner-product computation to obtain the filter output. The critical path is required to be reduced by pipelined implementation when it

exceeds the desired sample period. Since the conventional LMS algorithm does not support pipelined implementation because of its recursive behavior, it is modified to a form called the delayed LMS (DLMS) algorithm [3]–[5], which allows pipelined implementation of the filter.

In adaptive filter, the filter output is compared with the desired signal. If there is any error the filter weights will be adjusted according to the error. The rest of this paper proceeds as follows. Section 2 describes the background concept Section 3 proceeds with related work. Section 4 describes the proposed work. In section 5 experimental results are shown and in section 6 gives the conclusion.



Fig.4. Structure of the conventional delayed LMS adaptive filter.

### II. BACKGROUND

### A. Number Representation

For the binary representation of a number, it is been decomposed in to a set of additions of power of 2.In the case of representing a number using a signed digit system, it makes use of positive and negative digits, \{1,0,-1\}. For example the CSD representation [11] is a signed digit system that has a unique representation for each n number and it will verifies the following main properties: 1) two nonzero digits are not adjacent; and 2) the number of nonzero digits is minimum For obtaining the MSD [10] representation, the first property of CSD has to be dropped [2]. Thus, using MSD and CSD, a constant can be represented in several ways with a minimum number of nonzero digits.

### B. Digit-Serial Arithmetic



Fig. 5. Digit-serial operations when d is equal to 2.
(a) Addition operation. (b) Subtraction operation.
(c) Left shift by one time. (d) Left shift by two times

For digit-serial design, the input data is processed serially by dividing it into d bits and then applying each d-bit data in parallel. When the digit size d is equal to 1 and equal to input data wordlength, then a special case called bit-serial and bit-parallel will occur respectively. The basic digit-serial arithmetic operations are, the digit-serial addition, subtraction, and left shift operations are depicted in Fig. 5. Here digit size d equal to 2 is used., where the bits with index 0 and 1 denote the least significant bit (LSB) and the most significant bit (MSB), respectively.

Notice from Fig. 5(a) that a digit-serial addition operation, in general, requires d full adders (FAs) and one D flip-flop. The subtraction operation [Fig. 5(b)] is implemented using 2's complement, requiring the initialization of the D flip-flop with 1 and additional d inverter gates with respect to the digit-serial addition operation. In a left shift operation [Fig. 5(c) and (d)], the number of required D flip-flops is equal to the shift amount and is realized in d layers (one for each bit).



Fig. 6. Digit-serial design of shift-adds implementation of 29x and 43x given in Fig. 3(b) when d is 2.

As an example of digit-serial realization of constant multiplications under the shift-adds architecture, Fig. 6 presents the digit-serial implementation of 29x and 43x illustrated in Fig. 2(c) when d is 2. For the sake of clarity, the initializations of D flip-flops are omitted in this figure. As can be easily observed, the network includes two digit-serial additions, one digit-serial subtraction, and five D flip-flops for all the left shift operations. In this network, at each clock cycle, two bits of the input data x(x1x0) are applied to the network input, and at the outputs of each digit-serial addition/subtraction operation two bits of a constant multiplication are computed. In general, dbits are processed at each clock cycle. The digit-serial design of the MCM operation occupies significantly less area when compared to

its bit-parallel design since the area of the digitserial design is not dependent on the bit width of the input data. However, the latency is determined in terms of clock cycles.

## **III. RELATED WORK**

The exact CSE algorithms that formalize the MCM problem as a 0–1 ILP problem were introduced in [12] and [13]. The target constants are defined under a number representation in these algorithms and all possible implementations of constant multiplications are extracted from the representations of constants. The model simplification and problem reduction techniques for the exact CSE algorithms were presented in [7] and

[14]. Prominent CSE heuristics were proposed in [7], [8], and [15]. In paper [7] author proposed an algorithm that finds all the minimum singed digit (MSD) representation of a constant and it also presents an algorithm to synthesize digital filters based on the MSD representation. The exact GB algorithms define a solution with the minimum number of operations. These are introduced in [8], [9], and [16]. The approximate algorithm [12] computes all possible intermediate constants. This can be synthesized with the current set of implemented constants by using a single operation and chooses the one that leads to the largest number of synthesized target constants.

### **IV. PROPOSED WORK**

In this project digit-serial FIR filter is implemented. When the bit-parallel implementation cannot meet the delay requirements, digit-serial computation is used. Thus, the trade-off between area and delay can be explored by changing the digit-size. Also bit-parallel design requires excessive hardware. Here, the data words are divided in to digit sets, consisting of d bits which are processed one by one. Comparing to bit-parallel design, digit-serial architectures offers lower complexity. This is possible because of the less area of digit-serial operators and is independent of data wordlength. Bit-parallel designs are free of hardware but in digit-serial, the shifts require the use D flip- flops. Hence, while choosing the algorithms, the high-level algorithms should take into account for the sharing of shift operations as well as the sharing of addition/subtraction

operations in digit-serial MCM design. Furthermore, finding the minimum number of operations realizing an MCM operation does not always yield an MCM design with optimal area at the gate level. Hence, the high-level algorithms

# **VI. CONCLUSION**

In this paper, we introduced the CSE algorithm and GB technique for designing digit-serial MCM

should consider the implementation cost of each digit-serial operation at the gate level. [18]

In this paper, we initially implement the shift-add implementation, the exact CSE algorithm. Since there are still instances which the exact CSE algorithm cannot handle, we describe the approximate GB algorithm [14] that iteratively finds the "best" partial product which leads to the optimal area in digit-serial MCM design at the gate level. From these the better one, that is the GB based technique is selected and is implemented in the digit serial design. This digit-serial design is then implemented in the MCM block of the FIR filter. The power and area of this design is then compared with the existing bit-parallel design.

Finally to reduce the noise problem, LMS adaptive filter is used. Here according to the error a particular architecture has to be selected, the implementation of the filter in the above mentioned digit serial architecture is difficult. So another architecture is using. In Fig.7 a digit-serial implementation with d=1 of a constant multiplication is illustrated. Here, the constant 45 is obtained as 5.9 = (1 + 22)(1+23),i.e., two and three shifts are required in the first and second adder stage, respectively. For simplicity, adaptive filters are implemented in this method.



Fig.7 Multiplication with the constant 45 implemented using digit-serial arithmetic with d=1

### V. EXPERIMENTAL RESULTS

The simulation result of digit-serial and bit-parallel FIR filters is obtained using XILINX ISE 8.1i with SPARTAN 2E XCS150E as target device. Simulation is also carried out in ModelSim SE 6.3f using VHDL language. The simulation result shows that the digit-serial design of FIR filter is having less area compared to the bit-parallel design. The power analysis shows that the new design has a power consumption of 119mw which is less than the other approaches. The total equivalent gate count is 320 wherein the other approach is having gate count of 644. Thus it is clear that the digit-serial design is having 41% less power compared to the existing method.

operation. Since there are still instances with which the exact CSE algorithm cannot cope, we proposed an approximate GB algorithm that finds the best partial products in each iteration which yield the optimal gate-level area and power reduction in digit-serial MCM design compared to parallel design. This paper also introduced the design architectures for the digit-serial MCM operation and FIR filters. It was shown that the realization of digit-serial FIR filters under the shift-adds architecture yields 40.5% area reduction and 41% of power reduction, when compared to the filter designs whose multiplier blocks are implemented using bit-parallel constant multipliers. It is observed that a designer can find the circuit that fits best in a application by changing the digit size. As an application, the noise in the ECG signals can be removed.

### **REFERENCES**

- L. Wanhammar, DSP Integrated Circuits. New York: Academic, 1999.
- [2]. C. Wallace, "A suggestion for a fast multiplier," *IEEE Trans. Electron. Comput.*, vol. 13, no. 1, pp. 14–17, Feb. 1964.
- [3]. W. Gallagher and E. Swartzlander, "High radix booth multipliers using reduced area adder trees," in *Proc. Asilomar Conf. Signals, Syst. Comput.*, vol. 1. Pacific Grove, CA, Oct.–Nov. 1994, pp. 545–549.
- [4]. J. McClellan, T. Parks, and L. Rabiner, "A computer program for designing optimum FIR linear phase digital filters," *IEEE Trans. Audio Electroacoust.*, vol. 21, no. 6, pp. 506–526, Dec. 1973.
- M. Ercegovac and T. Lang, Digital Arithmetic. San Mateo, CA: Morgan Kaufmann, 2003.
- [6]. R. Hartley, "Subexpression sharing in filters using canonic signed digit multipliers," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 43, no. 10, pp. 677–688, Oct. 1996.
- [7]. L. Aksoy, E. Costa, P. Flores, and J. Monteiro, "Exact and approximate algorithms for the optimization of area and delay in multiple constant multiplications," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 27, no. 6, pp. 1013–1026, Jun. 2008.
- [8]. A. Dempster and M. Macleod, "Use of minimumadder multiplier blocks in FIR digital filters," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 42, no. 9, pp. 569–577, Sep. 1995.
- [9]. Y. Voronenko and M. Püschel, "Multiplierless multiple constant multiplication," *ACM Trans. Algor.*, vol. 3, no. 2, pp. 1–39, May 2007.
- [10]. I.-C. Park and H.-J. Kang, "Digital filter synthesis based on minimal signed digit representation," in *Proc. DAC*, 2001, pp. 468–473.
- [11]. A. Avizienis, "Signed-digit number representations for fast parallel arithmetic," *IRE Trans. Electron. Comput.*, vol. 10, no. 3, pp. 389–400, Sep. 1961.
  - [12]. P. Flores, J. Monteiro, and E. Costa, "An exact algorithm for the maximal sharing of partial terms in multiple constant multiplications," in Proc. Int. Conf. Comput. - Aided Design, Nov. 2005, pp. 13–16.

- [13]. O. Gustafsson and L. Wanhammar, "ILP modelling of the common subexpression sharing problem," in *Proc. ICECS*, 2002, pp. 1171–1174.
- [14]. Y.-H. Ho, C.-U. Lei, H.-K. Kwan, and N. Wong, "Global optimization of common subexpressions for multiplierless synthesis of multiple constant multiplications," in *Proc. ASPDAC*, 2008, pp. 119–124.
- [15]. M. Potkonjak, M. Srivastava, and A. Chandrakasan, "Multiple constant multiplications: Efficient and versatile framework and algorithms for exploring common subexpression elimination," *IEEE Trans. Comput-Aided Design Integr. Circuits Syst.*, vol. 15, no. 2, pp. 151–165, Feb.1996.
- [16]. L. Aksoy, E. Gunes, and P. Flores, "Search algorithms for the multiple constant multiplications problem: Exact and approximate," *J. Micro-process. Microsyst.*, vol. 34, no. 5, pp. 151–162, Aug. 2010. pp. 407–413, Oct. 1994.