A Low Power Multiplier using a 24-Transistor Latch Adder

Background: Multiplication is one of the most power-hungry operations in a digital system. It is used extensively in digital signal processing applications and in general-purpose processors. Hence, an efficient hardware realization of the multiplier is crucial to ensure that processors operate within their power limits without overheating. Method: To make multipliers more power efficient, the spurious glitching in the internal nodes of the multiplier is curtailed. A latch adder with delay lines is used in the multiplier to equalize the delay of the partial products. Findings: In this paper, a novel 24-transistor Latch Adder (LA) is proposed. It is validated using the Wallace tree multiplier as a benchmarking circuit. The Wallace tree multiplier is implemented using the proposed latch adder and delay lines in the internal nodes. A comparison is made with multipliers constructed using various full adder configurations available in the literature. Conclusion: The proposed multiplier circuit is shown to achieve a power reduction of 20% compared to the multiplier using the 16T full adder. The multiplier is simulated using the industry-standard Cadence® Virtuoso tool with 180 nm technology library files, and the simulation results confirm the low-power operation of the multiplier.


Introduction
The increase in chip complexity with every new technology node, as predicted by Moore's law, has brought the power problem to the forefront of digital design. Power has now become a third dimension of the design space, together with area and speed, and designers can no longer ignore power consumption during the design phase. Modern processors with billions of transistors have a power dissipation density comparable to or greater than that of nuclear reactors. The operations performed by these processors can be reduced to the basic operation of addition at the lowest level. At a slightly higher level, multiplication is performed as a part of many complex operations. For this purpose, dedicated multipliers are embedded in the logic fabric of the processors. Multiplication can also be implemented with basic shift and add operations, for example where multiplication or division by powers of 2 is involved. Studies have shown that the main component of power dissipation in multipliers is the spurious switching in the internal nodes. In fact, it was shown that almost 50% of the power consumption in an array multiplier is due to such extraneous switching, which does not influence the final result in any manner [1].
Thus, to make multipliers more power efficient, ways have been found to curtail the spurious glitching in the internal nodes of the multiplier. Various methods have been proposed, and different types of multipliers have evolved over the past years, with a focus on reducing this switching activity. A row bypassing scheme has been used to equalize the delay between the sum and carry bits in the basic full adder module, and the resulting multiplier configuration is called the leapfrog multiplier [2]. A similar architecture has been employed in [3], using multiplexers to select the carry. A full adder structure called the Latch Adder (LA) has been employed together with delay lines to equalize the delay of the intermediate partial products as they propagate to the next stage [4]. In addition to these, different types of full adders are used as the basic building blocks of the multiplier modules.
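The effect of unequal path delays on spurious switching can be illustrated with a minimal Python sketch. It is not a circuit-level simulation: it assumes a unit-time waveform model of a single XOR gate, and the helper names (`delayed`, `xor_output`, `count_transitions`) as well as the delay values are hypothetical and are not taken from the cited designs.

```python
def delayed(wave, d):
    """Shift a sampled waveform right by d time steps, holding its initial value."""
    return [wave[0]] * d + wave[:len(wave) - d]


def count_transitions(wave):
    """Count the number of 0->1 and 1->0 changes in a sampled waveform."""
    return sum(1 for p, q in zip(wave, wave[1:]) if p != q)


def xor_output(a, b, delay_a=0, delay_b=0):
    """Output of an ideal XOR gate whose inputs arrive through different path delays."""
    a_d = delayed(a, delay_a)
    b_d = delayed(b, delay_b)
    return [x ^ y for x, y in zip(a_d, b_d)]


# Both inputs rise at the same logical instant, so the settled output is 0
# before and after the change (0 ^ 0 == 1 ^ 1 == 0).
a = [0, 0, 1, 1, 1, 1, 1, 1]
b = [0, 0, 1, 1, 1, 1, 1, 1]

# Unequal path delays: b arrives two time steps late, so the output pulses high.
skewed = xor_output(a, b, delay_a=0, delay_b=2)

# Equalized path delays: both inputs arrive together, so the output never moves.
balanced = xor_output(a, b, delay_a=2, delay_b=2)

skewed_transitions = count_transitions(skewed)      # spurious pulse: up and down
balanced_transitions = count_transitions(balanced)  # no switching at all
```

In this toy model the skewed case produces a pulse that does not change the final result but still toggles the node twice, whereas the delay-equalized case produces no transition.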
The array multiplier has been in use for quite some time because of its regular pattern, which makes the layout design easier than that of more irregular structures such as the Wallace tree multiplier, which is, on the other hand, faster. A Wallace tree multiplier designed using the 18-transistor full adder has proved to be more power efficient than the Latch Adder based multiplier [5]. In this paper, the scheme of latching the intermediate partial products is incorporated in a 16-transistor full adder. A Wallace tree multiplier is designed using the resulting 24T latch adder. This new multiplier is shown to be more power efficient.
The rest of the paper is structured as follows. Section 2 introduces some of the different kinds of full adders published in the literature and presents their comparative power performance. Section 3 deals with the 4 × 4 Wallace tree multiplier configurations available in the literature. The circuits are designed using the full adders under consideration, and their power dissipation characteristics are analyzed. It is shown that the multiplier consumes the least power when the 16T full adder is used as the basic adder module. Section 4 presents the design of the new 24T Latch Adder. Section 5 presents the Wallace tree multiplier constructed using the 24T latch adder along with the delay lines that activate the latch. Results and the corresponding discussions are presented in Section 6, where the proposed design is found to consume the least power owing to the reduction of spurious glitches. Finally, the conclusion is given in Section 7.

Different Types of Full Adders
A variety of full adders have been published in the literature, starting from the basic full adder structure in the 28T mirror adder configuration. In addition, several adders have been proposed that differ mainly in structure, such as the 28T latch adder [4], 18T full adder [5], 14T full adder [6], 16T full adder [7], 10T full adder [7], 8T full adder and so on.
These adders were simulated in the Cadence® Virtuoso Analog Design Environment using the 180 nm technology library files. An arbitrary set of random input vectors was chosen for the simulations, with a bit period of 10 ns and a vector length of 5 bits, which is equivalent to a duration of 50 ns.
The power consumption of the full adders is shown in Table 1. It can be seen that although the 8T full adder has the lowest transistor count, it consumes the most power and is clearly inferior to the other adders. The 42T full adder corresponds to the implementation of the full adder using two half adders and one OR gate, with 12T XOR gates. The Modified Full Adder 2 (MFA) is the compressor-based adder structure. From the simulation results, it is found that the latch adder consumes the lowest power of all the adders. However, it does not perform well in terms of power when embedded as a full adder inside a multiplier. The increased power consumption of the 8T full adder is unexpected, since it would be expected to consume the least power owing to its small transistor count. However, the simulation results show that the 8T full adder suffers from degraded logic levels, which are not constant with time and vary with the pattern of the applied input data. These degraded voltage levels are in fact due to glitches, and the longer the glitches last, the more they increase the total power dissipation of the 8T full adder.
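The composition of a full adder from two half adders and one OR gate, as described for the 42T full adder, can be checked at the logic level with a short Python sketch. This is a behavioral model only and says nothing about transistor count or power; the function names are hypothetical.

```python
from itertools import product


def half_adder(a, b):
    """Half adder: XOR gives the sum bit, AND gives the carry bit."""
    return a ^ b, a & b


def full_adder(a, b, cin):
    """Full adder composed of two half adders and one OR gate."""
    s1, c1 = half_adder(a, b)      # first half adder adds a and b
    s, c2 = half_adder(s1, cin)    # second half adder adds the carry-in
    return s, c1 | c2              # OR gate merges the two carry outputs


def check_full_adder():
    """Exhaustively compare the gate-level model with integer addition."""
    for a, b, cin in product((0, 1), repeat=3):
        s, cout = full_adder(a, b, cin)
        if 2 * cout + s != a + b + cin:
            return False
    return True


all_inputs_correct = check_full_adder()
```

The two carry outputs `c1` and `c2` can never both be 1 for the same input, which is why a plain OR gate is sufficient to merge them.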

Multiplier Design
The 4 × 4 Wallace tree multipliers using the various adders discussed above were designed, and their power performance is tabulated for comparison in Table 2. The results obtained for the 4 × 4 array multiplier using the 28T mirror adder in carry save configuration are also presented for reference and comparison. The multiplier circuit was simulated using random input vectors of 4-bit length in all the cases, with each bit period equal to 10 ns and a total simulation period of 40 ns. It is found that the performance of the Wallace tree multipliers using the 8T, 14T and 10T full adders, and of the array multiplier using the 28T full adder, was not satisfactory. However, the array multiplier using the 28T full adder was found to be better than the Wallace tree counterparts mentioned above. The best performance was exhibited by the Wallace tree structure using the 16T full adder. These results can be explained as follows. The high power consumption of the multiplier structures using the 8T, 14T and 10T full adders is mainly due to spurious switching activity, as observed from the simulation output. This spurious switching can be minimized by appropriately balancing the unequal path delays, which is the aim of the present work. Section 4 presents the design of the proposed 24T latch adder and its comparative performance benefits.
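For readers unfamiliar with the Wallace tree structure, the following Python sketch gives a purely behavioral model of a 4 × 4 multiplier: partial products are formed with AND gates, columns are compressed with full and half adders, and a final ripple addition produces the product. It is a generic column-compression scheme written under our own assumptions, not the exact wiring of the circuits compared in Table 2, and all function names are hypothetical.

```python
def full_adder(a, b, c):
    """3:2 compressor: returns (sum, carry) for three input bits."""
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)


def half_adder(a, b):
    """2:2 compressor: returns (sum, carry) for two input bits."""
    return a ^ b, a & b


def wallace_multiply(x, y, n=4):
    """Behavioral n x n unsigned multiplier using Wallace-style column reduction."""
    # cols[k] holds all bits of weight 2**k; start with the AND partial products.
    cols = [[] for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            cols[i + j].append(((x >> i) & 1) & ((y >> j) & 1))

    # Reduction stages: compress every column until at most two bits remain.
    while max(len(bits) for bits in cols) > 2:
        nxt = [[] for _ in range(len(cols) + 1)]
        for k, bits in enumerate(cols):
            i = 0
            while len(bits) - i >= 3:            # groups of three: full adder
                s, c = full_adder(bits[i], bits[i + 1], bits[i + 2])
                nxt[k].append(s)
                nxt[k + 1].append(c)
                i += 3
            if len(bits) - i == 2:               # leftover pair: half adder
                s, c = half_adder(bits[i], bits[i + 1])
                nxt[k].append(s)
                nxt[k + 1].append(c)
            elif len(bits) - i == 1:             # single bit passes through
                nxt[k].append(bits[i])
        cols = nxt

    # Final stage: ripple-carry addition of the two remaining rows.
    result, carry = 0, 0
    for k, bits in enumerate(cols):
        a = bits[0] if len(bits) > 0 else 0
        b = bits[1] if len(bits) > 1 else 0
        s, carry = full_adder(a, b, carry)
        result |= s << k
    return result


# Exhaustive check against integer multiplication for all 4-bit operands.
all_products_correct = all(
    wallace_multiply(x, y) == x * y for x in range(16) for y in range(16)
)
```

The model confirms functional correctness only; glitching and power depend on gate delays, which this sketch does not represent.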

24T Latch Adder Design
In the previous section, it was found that the Wallace tree multiplier with the 16T full adder performs better in terms of power than those using the other adder topologies. This result is exploited in designing the 24T latch adder. The latch adder has previously been proposed using the 28-transistor (28T) mirror adder [8,9]. The idea is to make the partial product unavailable to the input of the next stage by disabling the adder for the time required to perform the computation, and to activate it only when the computation based on the current input pattern at the input of a stage of adders is complete. This prevents unnecessary glitches caused by partially computed partial products from appearing at the output.
With such a scheme, in which an inbuilt latch is employed to reduce the glitches, it is possible to reduce the power.
The structure of the new 24T latch adder, which is designed using the 16T full adder, is shown in Figure 2. The schematic is similar to that of the 16T full adder of Figure 1 except for the additional Transmission Gates (TGs) at the input. These TGs are used as latches that pass the input data only when enabled. The total transistor count of the circuit is 24, which includes the devices in the inverter structure. The inverter inverts the enabling signal En to produce the Enb signal, so that when En is high, Enb is low. These two signals are used to activate or deactivate the latching action. When En is high and Enb is low, the TGs are enabled and the input bit patterns enter the adder. When En is low and its complement Enb is high, the TGs are disabled, the inputs are not allowed to enter the adder, and the adder continues to hold the previous value of the output. The En and Enb signals are generated using a cascade of inverter chains [4]. Note that the bulk terminals of the MOS transistors are connected to the appropriate potentials, i.e., the bulk of the PMOS devices to VDD and the bulk of the NMOS devices to VSS.
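The enable and hold behavior described above can be summarized with a small behavioral Python sketch. It assumes ideal transmission gates that are transparent when En is high and hold the last value when En is low; the class name `LatchAdder` and its method are hypothetical, and the model ignores all analog effects such as charge leakage on the held nodes.

```python
class LatchAdder:
    """Behavioral model of a full adder with enable-controlled input latches."""

    def __init__(self):
        # Values held at the adder inputs behind the transmission gates.
        self._a = 0
        self._b = 0
        self._cin = 0

    def evaluate(self, a, b, cin, en):
        """Return (sum, carry) given the external inputs and the enable En."""
        if en:
            # En high (Enb low): the gates conduct and new inputs enter the adder.
            self._a, self._b, self._cin = a, b, cin
        # En low (Enb high): the gates are off and the previous inputs are kept,
        # so the output keeps its previous value.
        s = self._a ^ self._b ^ self._cin
        cout = (self._a & self._b) | (self._cin & (self._a ^ self._b))
        return s, cout


la = LatchAdder()
first = la.evaluate(1, 1, 0, en=1)   # inputs accepted: 1 + 1 + 0
held = la.evaluate(0, 0, 0, en=0)    # inputs change while disabled: output holds
second = la.evaluate(1, 0, 0, en=1)  # re-enabled: new inputs are evaluated
```

While the adder is disabled, changes on the external inputs do not reach the output, which is the property that keeps partially computed partial products away from the next stage.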

Multiplier Design Using the New 24T LA
The performance analysis of the new adder and its comparison with the other adder types are done by testing a 4 × 4 Wallace tree multiplier used as a test bench. The schematic structure of the multiplier is as depicted in Figure 2. The multiplier is tested with the same set of input vectors as that used in Section 2. Delay elements are used to activate the En and Enb signals. The first stage of the multiplier consists of two 24T LAs, which are activated by the delay element D1. When they are activated, the inputs enter the LAs and their outputs are evaluated, but these outputs are not yet taken in by the second stage because the LAs in that stage are still disabled: the output of the delay element D2 is inactive and does not yet activate the second stage LAs. After the finite delay introduced by D2, the second stage LAs are activated. By this time, the partial products from the first stage are fully computed and available at the input of the second stage. The next set of partial products is then evaluated by the second stage. The final stage of the structure consists of a cascade of 28T mirror adders that sums the final partial products and generates the output. Such an arrangement reduces the spurious transitions at the final output and, in effect, reduces the power consumption.

Results and Discussions
The power measurement results for the multipliers constructed using all the adder types under consideration are plotted in Figure 3. The power consumed by the 24T latch adder based multiplier, as measured from the simulations, is 97.3 μW, which is 20% less than that of the multiplier using the normal 16T full adder. From the structure of the circuits, it may appear that the power consumption of the 24T latch adder based multiplier should be higher than that of its 16T full adder based counterpart, simply because of the increase in the number of transistors (8 additional transistors). On the contrary, the simulations show that the major share of the power consumption in multipliers is due to the spurious switching in the internal nodes of the circuit. As also shown in Figure 3, the 8T full adder based Wallace tree multiplier incurs the maximum power dissipation. Since the proposed 24T LA is aimed at reducing the internal spurious switching activity, it realizes an appreciable power reduction, as shown in Figure 3. It may also be noted that some glitches are present in the outputs s3 and s4, since they are connected to the longest path from the input to the output (a latency of 3 adders plus the delay of the AND gates used to generate the product terms). However, these glitches are found to be fewer than those incurred by the circuits using the other adders. The reduced switching of the 24T LA therefore more than offsets the effect of the additional devices. It must be acknowledged that, owing to the extra delay elements and transistors, the area penalty of the proposed circuit may be somewhat larger. The layout problem may be further exacerbated because the enable signal En has to be routed. However, where power performance is of primary concern, the proposed multiplier can very well be employed.

Conclusion
A new 24T latch adder has been proposed, and its power performance has been tested and validated by using it as the basic building block of a 4 × 4 Wallace tree multiplier. The proposed adder reduces the glitches, though it does not eliminate them completely. The power dissipation of the 24T LA based multiplier is found to be 20% lower than that of the 16T full adder based circuit found in the literature.

Table 2. Multiplier power comparison