© Universiti Tun Hussein Onn Malaysia Publisher's Office



IJIE

Journal homepage: http://penerbit.uthm.edu.my/ojs/index.php/ijie

The International Journal of Integrated Engineering

# ISSN : 2229-838X e-ISSN : 2600-7916

## Low Leakage Low Power Domino Logic Technique for Wide Fan-In Applications, 40-Bit Tag Comparator

### Sapna Rani Ghimiray<sup>1\*</sup>, Pranab Kishore Dutta<sup>1</sup>

<sup>1</sup>Department of Electronics & Communication Engineering, North Eastern Regional Institute of Science and Technology, Nirjuli, 791109, INDIA

\*Corresponding Author

DOI: https://doi.org/10.30880/ijie.2022.14.04.019 Received 10 March 2021; Accepted 03 August 2021; Available online 20 June 2022

Abstract: Advances in sub-micron technologies make low power consumption and delay major concerns for present-day systems. These parameters play a critical role in the performance of the widely used wide fan-in dynamic logic gates. Such gates are employed in high-speed tag comparators, which are critical blocks of cache memory. This paper examines a 40-bit tag comparator built with the Stack-Scheme Clocked-Bleeder Domino (SS-CBD) technique in terms of power consumption and noise immunity, with the aim of producing a high-performance logic style. The approach limits the voltage swing at the dynamic node with the help of a stack transistor placed between the dynamic node and the clocked bleeder transistor. This technique reduces overall power consumption and improves gate speed for constant noise immunity. Simulations are carried out at a 1 GHz clock frequency, 0.9 V supply and 27 °C using Cadence Virtuoso Spectre and Layout editor with GPDK 90 nm CMOS technology.

Keywords: Dynamic logic, tag comparator, VLSI, Low Power, stacking effect, UNG, EDP, UNA

#### 1. Introduction

The faster operational speed of domino logic makes it preferable to static CMOS logic for large circuit designs. The domino design style is extensively employed in high-performance applications such as data path design [1-5]. The tag comparator is one such application. Deciding whether the data word requested by the processor resides in a cache line (HIT) or not (MISS) requires a comparison of memory address blocks, which is typically done with a tag comparator. The performance of cache memory depends heavily on these tag comparators, which generate the HIT/MISS control signals for the cache controller [1]. The wide fan-in gates of a tag comparator are fast, but the power consumption of a domino circuit is exacerbated in a wide fan-in gate, leading to high power consumption. The increase in dynamic node capacitance and the per-clock discharge lead to an increase in power consumption [6]. Moreover, the low switching threshold voltage ( $V_{TH}$ ) of the domino technique degrades the noise margin of the circuit [7-8].

Noise immunity and power dissipation are directly affected by the lowering of $V_{TH}$, which increases sub-threshold leakage current in short-channel devices [9-11]. Increasing crosstalk noise also makes noise immunity more critical as technology scales [12-13]. Optimizing noise immunity and performance comes at the cost of increased power consumption, which reflects the noise-power trade-off in domino circuits. Thus, an improved domino structure in terms of noise margin, speed and efficiency with low leakage power is required.

To decrease total power dissipation, emphasis should be placed on reducing switching, leakage and short-circuit power. Power reduction techniques for dynamic logic can be grouped into three categories:

- By use of control circuitry for performance improvement of keeper transistor,
- By reduction of voltage swing at N\_Dynamic node,
- By re-modelling and structuring of the footer transistor for evaluation network

\*Corresponding author: <u>sapna2203.sanu@gmail.com</u>

The use of control circuitry helps in reducing the contention current. The second approach reduces the switching power, and the third approach lowers the $V_{TH}$ of the pull-down network (PDN), leading to an overall reduction in leakage current.

In this paper, we propose a new technique called Stack-Scheme Clocked-Bleeder Domino (SS-CBD), which aims to reduce power consumption with the help of a low voltage swing and the sub-threshold leakage reduction created by the stacking effect.

The paper is organized as follows. Section 2 gives an overview of the literature on domino techniques that remodel the footer circuit and evaluation network. The proposed circuit technique is explained in Section 3. The 40-bit tag comparator implementation is covered in Section 4. Section 5 covers simulation results and post-layout analysis, and Section 6 concludes the paper.

#### 2. Literature Overview

The standard footless and footed domino circuits proposed by R. Krambeck et al. [14] use a weak keeper to overcome leakage and boost the robustness of the circuit. However, the keeper increases the contention current between the evaluation network and the dynamic node, leading to increased power consumption and degraded speed. This problem is more prominent in wide fan-in gates with a large number of parallel leaky legs.

The diode-partitioned domino (DPD) presented by Suzuki [13] employs an enhanced diode to boost the gate voltage of the NMOS diode that serves as the common node connecting the low fan-in PDNs of the logic design. This improves delay by decreasing the dynamic node capacitance. The output is realized by breaking the large evaluation network into smaller partitions, each with four parallel leaky paths, and a smaller keeper is employed for each partition in place of a single large keeper. However, this technique suffers from large area overhead and weaker noise immunity.

Nasserian et al. [6] proposed a modified charging-scheme based domino (MCSD) approach, which employs a bleeder transistor to reduce power consumption in wide fan-in gates by compensating for the effects of substrate, crosstalk noise and supply voltage. This domino style suffers from increased delay and area overhead.

Asyaei [7] proposed a controlled-current comparison-based domino (C3D) circuit technique, which compares two varying currents with respect to the voltage across the evaluation network. It reduces the PDN voltage swing, lowering power consumption at the expense of increased delay and degraded performance.

Sandeep [8] proposed a foot-driven stacked-transistor domino logic (FDSTDL) circuit approach, which introduces an additional discharge path for the dynamic node through the stacked transistors N2 and N3. This path uses the voltage at the foot node to raise the circuit's speed. However, the circuit lacks robustness and has increased power consumption owing to the output-controlled leaky transistor N3 introduced in the additional discharge path.

Sandeep [9] also proposed the low-power series-connected foot-driven transistors logic (LPSC-FDTL), which aims to increase circuit performance by dividing the current at the foot node to reduce leakage current. However, this approach is ineffective in controlling delay.

#### 3. Proposed Stack-Scheme Based Clocked-Bleeder Domino Structure

Existing design techniques improve performance in terms of power consumption, area, delay or noise immunity at the expense of the other parameters. The proposed stack-scheme based clocked-bleeder domino structure tries to improve these parameters while taking the others into account. A wide fan-in OR gate implemented with the proposed domino technique is shown in Fig 1. Unlike the conventional footless domino technique, the PDN of the proposed circuit is connected between the dynamic node and the N\_Foot node via the drain terminal of bleeder transistor M1. The evaluation network is therefore connected to the output static inverter through the dynamic node only, and a reduction in voltage swing across the PDN is achieved.

In addition to the pre-charge transistor $M_{PRE}$, the keeper transistor $M_K$ and the PDN, the proposed circuit comprises three more transistors, namely $M_N$, $M_P$ and the bleeder transistor M1, along with a static inverter. These transistors create the stacking effect and allow the bleeder transistor to be clocked, which reduces overall leakage. The drain of transistor $M_N$ is connected to the dynamic node, and its source is connected to the source of transistor $M_P$. The gates of $M_N$ and $M_P$ are each connected to the other's drain terminal. The drain of the clock-fed bleeder transistor M1 is connected to the drain of transistor $M_P$; this node is identified as N\_Foot. The switching of $M_N$ and $M_P$ depends on the voltage difference between the dynamic node and the drain of bleeder transistor M1. The gate voltages of the stack-scheme transistors $M_N$ and $M_P$ and of bleeder transistor M1 thus depend collectively on the drain voltages, the clock state and the inputs applied to the PDN. Hence, this approach helps in reducing leakage in the pre-charge as well as the evaluation phase.



Fig. 1 - Schematic of Proposed Stack-Scheme Based Clocked-Bleeder Domino Scheme

The operation of the proposed domino structure is explained in two phases. During the pre-charge phase, CLK is low, which turns $M_{PRE}$ ON and charges the dynamic node to $V_{DD}$; this in turn makes the output low. The low output turns ON transistor $M_K$, which helps hold the dynamic node at $V_{DD}$. When bleeder transistor M1 is OFF and all applied inputs are high, the N\_Foot node sits slightly below $V_{DD}$, i.e. at $(V_{DD}-V_{TH})$. Transistors $M_N$ and $M_P$ are tied between the dynamic node and the N\_Foot node: the drain of $M_N$ is connected to $V_{DD}$, while the drain of $M_P$ is connected to $(V_{DD}-V_{TH})$. The conducting PDN turns ON transistor $M_N$ and keeps it in the saturation region, with $V_{DS}$ greater than or equal to $(V_{GS}-V_{TH})$, while transistor $M_P$ is OFF in the cut-off region with its drain voltage equal to its gate voltage, i.e. $(V_D=V_G=V_{DD})$. This gives rise to a buffer-like situation in which the voltage level of the dynamic node lies between $V_{DD}$ and $(V_{DD}-V_{TH})$. Since dynamic power depends directly on voltage swing, limiting the level to the range $V_{DD}$ to $(V_{DD}-V_{TH})$ helps reduce dynamic power in the pre-charge phase. If the PDN is non-conducting, the N\_Foot node settles at $V_{TH}$ and transistor $M_N$ is turned OFF. This results in a stacking effect with two OFF transistors in a stack (i.e. $M_N$ and bleeder transistor M1), which leads to lower leakage, along with charging and discharging of the dynamic node through a constant current. This helps in reducing leakage current in the pre-charge phase.
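The link between voltage swing and dynamic power can be illustrated with a small numerical sketch. It assumes the standard switching-power expression $P = \alpha \cdot C \cdot V_{DD} \cdot V_{swing} \cdot f$ and assumes that the swing is reduced from $V_{DD}$ to $(V_{DD}-V_{TH})$. The capacitance and threshold values are illustrative assumptions, not values taken from the paper; only the supply voltage and clock frequency come from the text.

```python
def dynamic_power(c_node, v_dd, v_swing, f_clk, activity=1.0):
    """Switching power of a node that swings by v_swing while charged from v_dd."""
    return activity * c_node * v_dd * v_swing * f_clk

V_DD = 0.9       # supply voltage stated in the paper (V)
F_CLK = 1e9      # clock frequency stated in the paper (Hz)
C_DYN = 10e-15   # ASSUMED dynamic-node capacitance (F), for illustration only
V_TH = 0.25      # ASSUMED threshold voltage (V), for illustration only

full_swing = dynamic_power(C_DYN, V_DD, V_DD, F_CLK)
reduced_swing = dynamic_power(C_DYN, V_DD, V_DD - V_TH, F_CLK)

# Relative saving; algebraically this equals V_TH / V_DD.
saving = 1.0 - reduced_swing / full_swing
print(f"full swing: {full_swing:.3e} W, reduced swing: {reduced_swing:.3e} W")
print(f"relative saving: {saving:.1%}")
```

Under these assumptions the relative saving depends only on the ratio $V_{TH}/V_{DD}$, so the benefit of a reduced swing grows as the supply voltage is scaled down.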

During the evaluation phase, CLK is high; transistor $M_{PRE}$ turns OFF and bleeder transistor M1 turns ON. Depending on the inputs applied to the evaluation network, the dynamic node either stays charged or is discharged; a discharge is reflected as a high (1) output, which in turn turns the keeper transistor $M_K$ OFF. If all applied inputs are low, the dynamic node remains charged to $V_{DD}$, keeping the keeper transistor ON. The stack-scheme transistors, along with the bleeder transistor, then follow exactly the same operation as in the pre-charge phase. If any one or all of the inputs go high, the dynamic node starts to discharge, and the high clock ensures that transistor M1 is ON, which leads to the N\_Foot node at $V_{TH}$ and the dynamic node at 0. This makes transistor $M_N$ operate near the cut-off region, with transistor $M_P$ in saturation.

Transistor $M_N$ creates a stacking effect at node N\_Foot with a low switching voltage for the bleeder transistor, which helps limit the substrate effect and leakage current and thereby reduces overall power. Because of the presence of bleeder transistor M1, the dynamic node is discharged to $V_{TH}$. This value is nevertheless enough for the $M_{N,inv}$ transistor of the static inverter to turn ON, which draws the output of the structure to $(V_{DD} - V_{TH})$, i.e. a high value. In general, the circuit operates in both the pre-charge and evaluation phases based on the input conditions of the PDN. The use of the stack scheme (i.e. NMOS in series with PMOS) improves performance in terms of reduced delay and power at the expense of a lower noise margin.
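The two-phase operation described above can be summarized with a purely logical model of a wide fan-in domino OR gate. This is a minimal behavioral sketch: it tracks only logic levels, ignores the reduced voltage levels ($V_{TH}$, $V_{DD}-V_{TH}$) and the stack transistors, and the function name `domino_or` is an illustrative choice rather than anything defined in the paper.

```python
def domino_or(clk, inputs):
    """Return (dynamic_node, output) logic levels for one clock phase."""
    if clk == 0:
        # Pre-charge: M_PRE is ON, so the dynamic node is pulled high
        # regardless of the inputs (the bleeder M1 is OFF).
        dynamic = 1
    else:
        # Evaluation: any high input opens a discharge path through the PDN.
        dynamic = 0 if any(inputs) else 1
    output = 1 - dynamic  # static inverter at the dynamic node
    return dynamic, output

# Pre-charge phase with all inputs high: output stays low.
print(domino_or(0, [1] * 8))
# Evaluation phase with all inputs low: dynamic node holds, output low.
print(domino_or(1, [0] * 8))
# Evaluation phase with one input high: dynamic node discharges, output high.
print(domino_or(1, [0, 0, 1, 0, 0, 0, 0, 0]))
```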

#### 3.1 Transistor Sizing

Proper transistor sizing is essential in critical circuits. For the proposed circuit, the aspect ratio (W/L) is taken as 3:1, with the length (L) taken as 90 nm. To minimize the pre-charge delay relative to the other operations, the aspect ratio of transistor $M_{PRE}$ is taken as 4:1. Further, the W/L ratio of bleeder transistor M1 is set to $(W/L)_{M1} = (6L_{min}/2L_{min})$ to achieve a constant noise environment, also known as the iso-robustness condition. The transistor sizing of the proposed circuit and of the techniques from the literature under the iso-robustness condition (i.e. constant Unity Noise Gain) is given in Table 1. The width ratio of the output inverter's transistors is set to 2. The standard domino logics, i.e. footless and footed, are sized to minimum width and length to achieve the same UNG condition.

| Fan-in = 8; UNG = 546 mV |  |  |  |  |  |  |  |  |
|---|---|---|---|---|---|---|---|---|
| Proposed circuit | $(W/L)_{PRE}$ | $(W/L)_{P}$ | $(W/L)_{N}$ | $(W/L)_{1}$ | $(W)_{K}$ | $(W_P/W_N)_{INV}$ |  |  |
|  | $3L_{min}/4L_{min}$ | $3L_{min}/2L_{min}$ | $3L_{min}/2L_{min}$ | $6L_{min}/2L_{min}$ | $3L_{min}$ | $12L_{min}/6L_{min}$ |  |  |
| M-CSD [6] | $(W/L)_{Mn}$ | $(W/L)_{4}$ | $(W/L)_{K1}$ |  | $(W/L)_{K2}$ | $(W_P/W_N)_{INV}$ |  |  |
|  | $6L_{min}/4L_{min}$ | $6L_{min}/2L_{min}$ | $2L_{min}/2L_{min}$ |  | $4L_{min}/2L_{min}$ | $8L_{min}/4L_{min}$ |  |  |
| C3D circuit [7] | $(W/L)_{PRE}$ | $(W/L)_{Dis}$ | $(W)_{Eval} = (W)_{1}$ | $(W)_{Diode}$ | $(W)_{Pre}$ | $(W_2 = W_3 = W_4)$ | $(W_P/W_N)_{INV}$ |  |
|  | $3L_{min}/4L_{min}$ | $3L_{min}/2L_{min}$ | $13L_{min}$ |  | $4L_{min}$ | $4L_{min}$ | $12L_{min}/6L_{min}$ |  |
| FDSTDL [8] | $(W/L)_{P1}$ | $(W/L)_{P2}$ | $(W)_{N1\_Footer}$ | $(W)_{N2}$ | $(W)_{N3}$ | $(W/L)_{Eval}$ | $(W_P/W_N)_{INV}$ |  |
|  | $6L_{min}/4L_{min}$ | $6L_{min}/2L_{min}$ | $4L_{min}$ | $4L_{min}$ | $4L_{min}$ | $10L_{min}/2L_{min}$ | $12L_{min}/6L_{min}$ |  |
| LPSC-FDTL circuit [9] | $(W/L)_{MP1}$ | $(W/L)_{MP2}$ | $(W)_{MN1}$ | $(W)_{MN2}$ | $(W)_{MP4}$ | $(W)_{MN4}$ | $(W/L)_{Eval}$ | $(W_P/W_N)_{INV}$ |
|  | $6L_{min}/4L_{min}$ | $6L_{min}/2L_{min}$ | $4L_{min}$ | $4L_{min}$ | $24L_{min}$ | $4L_{min}$ | $10L_{min}/2L_{min}$ | $12L_{min}/6L_{min}$ |

Table 1 - Transistor sizing for all 8-input OR gates

Table 2 - Transistor sizing for 40-bit Tag Comparator

|  | Initial Stage |  | Intermediate Stage |  |  |
|---|---|---|---|---|---|
|  | $M_{Pre}$ | $M_{D}$ | $M_{Pre}$ | $M_{Eval}$ | $M_{K1}$ |
| Proposed 40-bit Tag Comparator | $8L_{min}/6L_{min}$ | $10L_{min}/1L_{min}$ | $\ldots 0L_{min}/1L_{min}$ | $8L_{min}/6L_{min}$ | $3L_{min}/(5 \times 4L_{min})$ |
|  | Final Stage |  |  |  |  |
|  | $M_{K}$ | $M_{P}$ | $M_{N}$ | $M_{1}$ | $M_{P\_inv}/M_{N\_inv}$ |
|  | $3L_{min}/4L_{min}$ | $6L_{min}/4L_{min}$ | $6L_{min}/4L_{min}$ | $12L_{min}/4L_{min}$ | $18L_{min}/9L_{min}$ |

#### 4. Low Power-High Speed 40-bit Tag Comparator

The demand for high-performance caches, driven by high-frequency microprocessors, has forced designers to implement cache memory with a high hit rate. Tag comparators designed using wide fan-in OR gates are the building blocks the cache controller uses to generate the MISS/HIT bit [6-7,17-19]. The tag comparator is one of the most power-consuming and speed-deteriorating components in a microprocessor, which creates the need for a low-power, high-speed tag comparator. A conventional 40-bit tag comparator requires 80 parallel input bits, with each 2-input XOR gate consisting of 2 legs. This makes tag comparator realization using a conventional dynamic structure tedious owing to the stacking of a large number of transistors, and it also increases the parasitic capacitance at the dynamic node.
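The logic function of such a comparator can be sketched at the bit level: every tag bit contributes one 2-input XOR built from two legs, and a wide OR over all legs yields the MISS bit. The sketch below is a functional model only, with hypothetical function names; it says nothing about the electrical behavior of the domino gate.

```python
def xor_legs(a, b):
    """Two PDN legs realizing a XOR b: (a AND NOT b) and (NOT a AND b)."""
    return [a & (1 - b), (1 - a) & b]

def tag_miss(stored_tag, requested_tag, width=40):
    """Return (miss_bit, number_of_legs) for a width-bit tag comparison."""
    legs = []
    for i in range(width):
        a = (stored_tag >> i) & 1
        b = (requested_tag >> i) & 1
        legs.extend(xor_legs(a, b))
    # Wide fan-in OR: a single conducting leg is enough to signal a MISS.
    return int(any(legs)), len(legs)

tag = 0xABCDE12345                 # arbitrary 40-bit example value
print(tag_miss(tag, tag))          # identical tags: HIT (miss bit 0)
print(tag_miss(tag, tag ^ 1))      # tags differ in one bit: MISS (miss bit 1)
```

For a 40-bit tag the model produces 80 legs, which matches the 80 parallel inputs mentioned above.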

Nasserian [6] designed a 40-bit tag comparator that employs 4-bit partitions (i.e. 8 legs) and a smaller keeper for each partition in place of one large keeper for the complete evaluation network. Suzuki [13] employs 2-bit partitions (i.e. 4 legs) in place of 4-bit partitions. Of these two partitioning schemes, the comparator circuit proposed by Nasserian [6] is considered the better structure for designing a tag comparator owing to its efficient leakage-tolerant domino structure and 4-bit partitioning.

The 40-bit tag comparator realized with the proposed domino circuit approach is shown in Fig 2. The complete tag comparator is designed in three stages. The initial stage is similar to [6][13] and comprises the enhanced diode. The intermediate stage comprises the evaluation network, and the final stage is the proposed domino approach. In Fig 2(a), $M_{D[1]} - M_{D[N]}$ are the diode-connected transistors used to realize the enhanced diode and $P_{[1]} - P_{[N]}$ are the partitions of the complete PDN. For the 40-bit tag comparator, five PDN partitions are used, each handling 8 bits, i.e. 16 legs, as shown in Fig 2(b). Each 8-bit partition is connected to the enhanced diode through the diode-connected transistors $M_{D[1]} - M_{D[5]}$, leading to a common dynamic node at the final stage, named COMM. A small keeper, $K_{P1}$, is associated with each partition, as shown in Fig 2(b) and Fig 2(c); the keepers are separated from each other based on the output node generating MISS/HIT. As in the DPD-based comparator [13], the diode of each partition, i.e. $M_{D[1]} - M_{D[5]}$, is charged to $V_{DD}$ with the help of the $M_{PRE}$ transistor.



Fig. 2 - Implementation of the proposed 40-bit tag comparator circuit: a) basic block diagram, b) structure of the 8-bit PDN network, c) complete schematic of the 40-bit tag comparator, and d) structure of the final stage.

In the final stage, shown in Fig 2(d), the COMM node from each partition determines the charging and discharging of the dynamic node, followed by the turning ON/OFF of the keeper, which produces the final MISS/HIT bit in the manner discussed in Section 3. While designing this comparator, the size of the small keeper depends on the number of partitions N, i.e.,

$$\left(\frac{W}{L}\right)_{Small\_Keeper} = \frac{1}{N} \times \left(\frac{W}{L}\right)_{Large\_Keeper} = \frac{1}{5} \times \left(\frac{W}{L}\right)_{Large\_Keeper}$$
(1)
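The partitioning arithmetic and Equation (1) can be checked with a short sketch. The function names are illustrative; the large-keeper ratio of 3/4 is read from the $(M)K$ entry of Table 2 ($3L_{min}/4L_{min}$), and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def partition_plan(tag_bits=40, bits_per_partition=8, legs_per_bit=2):
    """Return (number of partitions, PDN legs per partition)."""
    n_partitions = tag_bits // bits_per_partition
    legs_per_partition = bits_per_partition * legs_per_bit
    return n_partitions, legs_per_partition

def small_keeper_ratio(large_keeper_ratio, n_partitions):
    """Equation (1): (W/L) of each small keeper is 1/N of the large keeper."""
    return Fraction(large_keeper_ratio) / n_partitions

n, legs = partition_plan()
k_small = small_keeper_ratio(Fraction(3, 4), n)  # 3/4 taken from Table 2, (M)K

print(f"{n} partitions, {legs} legs each")
print(f"small keeper W/L = {k_small}")
```

The resulting ratio of 3/20 agrees with the $3L_{min}/(5 \times 4L_{min})$ entry listed for $(M)K1$ in Table 2.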

The transistor sizing for the 40-bit tag comparator is summarized in Table 2. The sizing is chosen to achieve low-power, high-speed operation with constant noise immunity (robustness).

#### 5. Simulations

The proposed design has been simulated using Cadence Spectre in 90 nm technology with a 1 GHz operating clock, a 0.9 V supply voltage and a temperature of 27 °C. An 8-input OR gate was simulated using the proposed logic technique. The simulation results compare the proposed design with the existing designs, namely SFLD, SFD, DPD, MCSD, C3D, FDSTDL and LPSC-FDTL, in terms of power dissipation, noise margin, delay and area. These circuits were implemented for iso-robustness, i.e. constant noise margin, with an output load capacitance of 5 fF [5]. The 8-input OR gate serves as a benchmark to verify the functionality and performance of the proposed domino technique.

#### 5.1 Design Metrics

#### 5.1.1. Noise Immunity

The metric used for measuring the noise immunity of a dynamic circuit is the unity noise gain (UNG) [4-5,10,18-22]. It is defined as the DC voltage applied as input noise that results in a noise pulse of the same amplitude at the output node [18]. It is expressed as in Equation (2):

$$UNG = \left\{ V_{in\_noise} : V_{in\_noise} = V_{out\_noise} \right\}$$
(2)
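Equation (2) describes a fixed point of the gate's noise transfer characteristic, which can be located numerically. The sketch below uses bisection on a deliberately simple, hypothetical transfer function (`toy_gate`, a clamped cubic with an assumed knee of 0.5 V); in practice the output noise amplitude would come from circuit simulation rather than a closed-form model.

```python
def find_ung(noise_out, lo, hi, tol=1e-6):
    """Bisection for the input noise amplitude v where noise_out(v) == v.

    Requires noise_out(lo) < lo (noise attenuated) and
    noise_out(hi) > hi (noise amplified).
    """
    def gap(v):
        return noise_out(v) - v

    if gap(lo) >= 0 or gap(hi) <= 0:
        raise ValueError("bracket must satisfy noise_out(lo) < lo and noise_out(hi) > hi")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

V_DD = 0.9  # supply voltage stated in the paper (V)

def toy_gate(v_in, knee=0.5):
    """HYPOTHETICAL noise transfer curve: attenuates small noise, amplifies large noise."""
    return min(V_DD, v_in ** 3 / knee ** 2)

ung = find_ung(toy_gate, 0.05, 0.8)
print(f"UNG of the toy model: {ung:.4f} V")
```

For this toy model the fixed point coincides with the assumed knee voltage, so the search converges to 0.5 V; the number has no relation to the 546 mV reported for the simulated gates.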

The level of input noise can be altered by varying the amplitude or duration of the noise pulse. However, the durations of the input and output noise pulses differ in the case of UNG. Therefore, another noise immunity metric, the Unity Noise Average (UNA), is used to account for this problem [5,7]. UNA is defined as the input noise amplitude that results in the same average noise voltage at the output node. It can be expressed as:

$$UNA = \{V_{in\_noise} : V_{in\_noise(AVG)} = V_{out\_noise(AVG)}\}$$
(3)

#### 5.1.2. Figure of Merit (FOM)

The Figure of Merit (FOM) is a measure used to compare the effectiveness of existing techniques with that of the proposed technique in terms of delay, power, area and noise [7, 23-24]. It is expressed as:

$$FOM = \frac{UNA_{Norm}}{P_{total\_Norm} \times t_{p\_Norm}^2 \times \sigma_{delay\_Norm} \times A_{Norm}}$$
(4)

where $(P_{total\_Norm} \times t_{p\_Norm}^2)$, $UNA_{Norm}$, $\sigma_{delay\_Norm}$ and $A_{Norm}$ are the Energy-Delay-Product (a critical efficiency metric), the unity noise average, the standard deviation of delay and the die area of the circuit, respectively, all normalized to the parameters of the 8-input SFLD OR gate. To measure the FOM parameters, the iso-UNA condition (same noise immunity for all circuits) is considered for simulation. The area is estimated from the transistor sizes, and $\sigma_{delay}$ is obtained by Monte Carlo simulation.
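Equation (4) translates directly into a small helper. The sketch below is an illustration of the formula only: the second set of inputs is a made-up placeholder chosen to show how the terms combine, not data from the tables in this paper.

```python
def figure_of_merit(una_norm, p_norm, tp_norm, sigma_norm, area_norm):
    """Equation (4): FOM from metrics normalized to the reference (SFLD) gate."""
    return una_norm / (p_norm * tp_norm ** 2 * sigma_norm * area_norm)

# The reference circuit has every normalized metric equal to 1, so its FOM is 1.
reference = figure_of_merit(1.0, 1.0, 1.0, 1.0, 1.0)

# HYPOTHETICAL circuit: half the power, 0.8x delay, 2x delay spread, 1.25x area.
example = figure_of_merit(1.0, 0.5, 0.8, 2.0, 1.25)

print(f"reference FOM = {reference:.2f}, example FOM = {example:.2f}")
```

Because delay enters squared, a delay reduction improves the FOM more strongly than an equal relative reduction in power, spread or area.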

| Parameters | SFLD | SFD | DPD | M-CSD | C3D | FDSTDL | LPSC-FDTL | Proposed |
|---|---|---|---|---|---|---|---|---|
| Power (µW) | 8.32 | 10.83 | 12.44 | 6.40 | 16.84 | 12.04 | 8.12 | 2.82 |
| Norm_Power | 1 | 1.30 | 1.50 | 0.77 | 2.02 | 1.45 | 0.98 | 0.34 |
| Clk_Power (pW) | 289 | 321 | 445 | 345 | 295 | 359 | 298 | 248 |
| Std_by_Pwr (nW) | 394.3 | 420.6 | 289.9 | 275.1 | 442.9 | 521.1 | 395.4 | 131.5 |
| Leakage (nA) | 438.2 | 467.3 | 322.2 | 305.7 | 492.1 | 533.8 | 465.3 | 146.1 |
| Delay (ps) | 52.60 | 54.38 | 47.40 | 38.20 | 64.92 | 43.0 | 68.5 | 27.60 |
| Norm_Delay | 1 | 1.03 | 0.90 | 0.73 | 1.23 | 0.82 | 1.30 | 0.52 |
| UNG (mV) | 546 | 546 | 546 | 546 | 546 | 546 | 546 | 546 |
| EDP ($10^{-27}\ \text{J}\cdot\text{s}$) | 23.0 | 32.0 | 27.9 | 9.33 | 70.9 | 22.3 | 38.1 | 2.14 |
| Norm_EDP | 1 | 1.39 | 1.17 | 0.41 | 3.08 | 0.97 | 1.66 | 0.09 |
| UNG/EDP ($10^{25}\ \text{V}/(\text{J}\cdot\text{s})$) | 2.37 | 1.71 | 1.95 | 5.85 | 0.77 | 2.44 | 1.43 | 25.5 |
| Norm_UNG/EDP | 1 | 0.72 | 0.82 | 2.46 | 0.32 | 1.03 | 0.60 | 10.76 |

Table 3 - Comparison of parameters of all designed 8-input OR gates

The simulation results for the proposed 8-input OR gate and the techniques from the literature are summarized in Table 3 under the same UNG criterion. Standby power, leakage current, power dissipation and clock power were compared and found to be lowest for the proposed 8-input OR circuit. The efficiency of the proposed circuit is therefore greatly improved: its UNG/EDP is the highest of all the circuits, with a normalized value of 10.76. The lowest leakage current, obtained for the proposed circuit, is attributed to the reduction in sub-threshold leakage and in the voltage swing at the dynamic node.
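The EDP and UNG/EDP rows of Table 3 can be recomputed from the power and delay rows, using the definition of EDP as power times delay squared given with Equation (4). The sketch below does this for the SFLD reference and the proposed gate; the helper names are illustrative.

```python
def edp(power_w, delay_s):
    """Energy-delay product: power times the square of the delay."""
    return power_w * delay_s ** 2

def ung_per_edp(ung_v, power_w, delay_s):
    """Efficiency metric UNG/EDP."""
    return ung_v / edp(power_w, delay_s)

UNG = 0.546                                   # V, same for all circuits (Table 3)
SFLD = {"power": 8.32e-6, "delay": 52.60e-12}      # W, s (Table 3)
PROPOSED = {"power": 2.82e-6, "delay": 27.60e-12}  # W, s (Table 3)

edp_sfld = edp(SFLD["power"], SFLD["delay"])
edp_prop = edp(PROPOSED["power"], PROPOSED["delay"])
norm_edp = edp_prop / edp_sfld
norm_eff = (ung_per_edp(UNG, PROPOSED["power"], PROPOSED["delay"])
            / ung_per_edp(UNG, SFLD["power"], SFLD["delay"]))

print(f"EDP SFLD = {edp_sfld:.3e}, EDP proposed = {edp_prop:.3e}")
print(f"normalized EDP = {norm_edp:.3f}, normalized UNG/EDP = {norm_eff:.2f}")
```

The recomputed normalized EDP is about 0.09 and the normalized UNG/EDP is about 10.7, close to the tabulated 0.09 and 10.76; the small difference is consistent with the table entries having been rounded before normalization.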

The FOM of all the 8-input OR gates simulated under the iso-UNG condition is tabulated in Table 4. To assess effectiveness, the FOMs are normalized to the parameters of the 8-input SFLD OR gate. The die area is estimated from the transistor sizes. The FOM of the proposed circuit is found to be 3.74 times that of SFLD. The high FOM is the result of the lower delay and power consumption achieved by the proposed circuit. Fig 3 shows the normalized FOM of the proposed circuit and the existing circuits for different fan-in values. It shows that above 8 inputs the proposed circuit performs better than the existing circuits in terms of power and speed metrics. This indicates that the proposed scheme will provide better performance when used in wide fan-in applications.

Circuit robustness becomes more critical as process variation grows with technology scaling [26]. Therefore, to account for process, voltage and temperature (PVT) variation, the proposed OR gate is simulated at four process corners and two temperatures (27 °C and 110 °C). Table 5 lists the main parameters of the 8-input OR gates at the fast-fast (FF), fast-slow (FS), slow-fast (SF) and slow-slow (SS) process corners at 27 °C and 110 °C with a 0.9 V power supply. The results indicate proper operation of the 8-input OR gate under process and temperature variation. The proposed circuit consumes less power than SFLD at every corner, which confirms its power saving. The best efficiency of the proposed 8-input OR gate in terms of normalized UNG/EDP occurs at the SS corner at 110 °C, whereas the worst efficiency occurs at the FF corner at 27 °C.

| Parameters                | SFLD | SFD  | DPD   | M-CSD | C3D   | FDSTDL | LPSC-FDTL | Proposed |
|---------------------------|------|------|-------|-------|-------|--------|-----------|----------|
| # of Transistors          | 12   | 13   | 22    | 15    | 19    | 15     | 16        | 15       |
| Norm_Power                | 1    | 1.30 | 1.50  | 0.77  | 2.02  | 1.45   | 0.98      | 0.34     |
| Norm_Delay                | 1    | 1.03 | 0.90  | 0.73  | 1.23  | 0.82   | 1.30      | 0.52     |
| σ<sub>Delay</sub>         | 8.02 | 9.88 | 11.32 | 18.03 | 28.20 | 19.78  | 22.21     | 18.0     |
| Norm_σ<sub>Delay</sub>    | 1    | 1.23 | 1.41  | 2.24  | 3.51  | 2.46   | 2.77      | 2.25     |
| Area (µm<sup>2</sup>)     | 234  | 253  | 377   | 272   | 344   | 295    | 309       | 278      |
| Norm_Area                 | 1    | 1.08 | 1.61  | 1.16  | 1.47  | 1.26   | 1.32      | 1.18     |
| UNA (mV)                  | 467  | 467  | 467   | 467   | 467   | 467    | 467       | 467      |
| Norm_una                  | 1    | 1    | 1     | 1     | 1     | 1      | 1         | 1        |
| FOM                       | 1    | 0.54 | 0.38  | 0.94  | 0.06  | 0.27   | 0.21      | 3.74     |

 Table 4 - FOM calculation for 8-input OR gates



Fig. 3 - FOM realization for designed Wide Fan-in OR gates

Table 5 - Comparison for 8-input OR gate characteristics at varying temperature and process corners

| Process | Temp. (°C) | SFLD Power (µW) | SFLD Leakage (nA) | SFLD Delay (ps) | SFLD UNG (mV) | SFLD EDP (10<sup>-27</sup> J<sup>2</sup>) | SFLD UNG/EDP | Proposed Power (µW) | Proposed Leakage (nA) | Proposed Delay (ps) | Proposed UNG (mV) | Proposed EDP (10<sup>-27</sup> J<sup>2</sup>) | Proposed UNG/EDP |
|---------|------------|------|-------|------|-----|------|---|------|-------|------|-----|------|------|
| FF      | 27         | 9.23 | 522.2 | 50.2 | 533 | 23.2 | 1 | 3.86 | 201.1 | 24.7 | 533 | 2.35 | 9.8  |
| FF      | 110        | 9.01 | 548.3 | 54.4 | 560 | 26.6 | 1 | 3.41 | 222.4 | 25.4 | 560 | 2.20 | 12.1 |
| FS      | 27         | 6.42 | 491.1 | 53.5 | 492 | 18.3 | 1 | 1.94 | 198.2 | 26.1 | 492 | 1.32 | 13.7 |
| FS      | 110        | 6.13 | 504.4 | 56.2 | 510 | 19.4 | 1 | 1.71 | 208.8 | 27.2 | 510 | 1.36 | 14.4 |
| SF      | 27         | 8.84 | 477.7 | 58.8 | 610 | 30.5 | 1 | 3.81 | 165.5 | 28.9 | 610 | 3.18 | 9.9  |
| SF      | 110        | 8.43 | 493.7 | 62.4 | 640 | 32.8 | 1 | 3.30 | 182.3 | 30.3 | 640 | 3.03 | 11.1 |
| SS      | 27         | 6.23 | 462.6 | 64.6 | 570 | 25.9 | 1 | 1.88 | 154.4 | 31.4 | 570 | 1.85 | 14.0 |
| SS      | 110        | 6.04 | 469.9 | 66.9 | 590 | 27.0 | 1 | 1.60 | 169.0 | 34.5 | 590 | 1.91 | 14.2 |
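The EDP entries of Table 5 are numerically consistent with power multiplied by the square of the delay. This relation is inferred from the tabulated numbers and is an assumption here, not a definition given in this section. The sketch below recomputes EDP and the SFLD-normalized UNG/EDP for two of the eight rows; the function and variable names are hypothetical.

```python
def edp_e27(power_uw, delay_ps):
    """Power x delay^2 expressed in units of 1e-27 (uW * ps^2 = 1e-30)."""
    return power_uw * delay_ps ** 2 * 1e-3


# (power in uW, delay in ps) copied from Table 5 for two of the eight rows.
ROWS = {
    ("FF", 27): {"SFLD": (9.23, 50.2), "Proposed": (3.86, 24.7)},
    ("SS", 110): {"SFLD": (6.04, 66.9), "Proposed": (1.60, 34.5)},
}

results = {}
for corner, circuits in ROWS.items():
    edp = {name: edp_e27(*pd) for name, pd in circuits.items()}
    # UNG is equal for both circuits at a given corner, so the
    # SFLD-normalized UNG/EDP reduces to a ratio of the two EDP values.
    results[corner] = (edp["SFLD"], edp["Proposed"], edp["SFLD"] / edp["Proposed"])
    print(corner, [round(v, 2) for v in results[corner]])
```

The recomputed values agree with the tabulated EDP and UNG/EDP columns to within rounding, which suggests the UNG/EDP columns are normalized to SFLD at each corner.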



Fig. 4 - Effect of voltage variation on proposed 8-in OR gate circuit parameters at typical-typical (TT) process corner and 27 °C temperature; a) UNG vs V<sub>DD</sub>, b) Norm Power vs V<sub>DD</sub>, and c) Norm (UNG/EDP) vs V<sub>DD</sub>.

Table 6 - Standard deviations and mean values of power and delay for 8-input OR gates

| Circuits         | $\sigma_{Power}(\mu W)$ | $\sigma_{\text{Delay}}(\text{ps})$ | $\sigma_{Delay}/\mu_{Delay}$ | $\sigma_{Power}/\mu_{Power}$ |
|------------------|-------------------------|------------------------------------|------------------------------|------------------------------|
| SFLD             | 0.787                   | 4.442                              | 0.085                        | 0.088                        |
| SFD              | 0.754                   | 4.823                              | 0.087                        | 0.075                        |
| DPD              | 0.892                   | 5.569                              | 0.116                        | 0.074                        |
| M-CSD            | 0.416                   | 2.253                              | 0.194                        | 0.063                        |
| C3D              | 0.726                   | 2.758                              | 0.043                        | 0.040                        |
| FDSTDL           | 0.710                   | 2.987                              | 0.158                        | 0.050                        |
| LPSC-FDTL        | 0.856                   | 4.258                              | 0.048                        | 0.091                        |
| Proposed Circuit | 0.254                   | 1.853                              | 0.067                        | 0.090                        |

Table 7 - Specifications of Wide Fan-in Tag comparator circuits

| Fan-in    | Parameters      | SFLD    | M-CSD  | Proposed |
|-----------|-----------------|---------|--------|----------|
| 2-Bit Tag | Avg_Pwr (µW)    | 4.49    | 2.86   | 1.74     |
| 2-Bit Tag | Leakage (nA)    | 243.44  | 68.12  | 54.12    |
| 2-Bit Tag | Std_by_Pwr (nW) | 270.48  | 75.69  | 60.13    |
| 2-Bit Tag | Delay (ps)      | 110.20  | 69.28  | 56.86    |
| 2-Bit Tag | UNG (mV)        | 563     | 563    | 563      |
| 4-Bit Tag | Avg_Pwr (µW)    | 8.23    | 3.48   | 2.06     |
| 4-Bit Tag | Leakage (nA)    | 568.01  | 146.23 | 113.44   |
| 4-Bit Tag | Std_by_Pwr (nW) | 631.11  | 162.48 | 126.04   |
| 4-Bit Tag | Delay (ps)      | 146.41  | 77.85  | 68.84    |
| 4-Bit Tag | UNG (mV)        | 548     | 548    | 548      |
| 8-Bit Tag | Avg_Pwr (µW)    | 16.68   | 10.23  | 2.30     |
| 8-Bit Tag | Leakage (nA)    | 1000.2  | 646.43 | 243.26   |
| 8-Bit Tag | Std_by_Pwr (nW) | 1113.55 | 718.26 | 270.29   |
| 8-Bit Tag | Delay (ps)      | 168.0   | 89.28  | 71.14    |
| 8-Bit Tag | UNG (mV)        | 516     | 516    | 516      |

Apart from process and temperature variation, the effect of supply voltage variation on the circuit is very important in low power applications, because of delay uncertainty, which limits speed performance, and overloading problems. Usually, the deviation of power around its mean value increases with supply voltage. Hence, the 8-input OR gate is simulated at the typical corner and room temperature with  $V_{DD}$ varied from 0.7 V to 1.2 V. The UNG and power dissipation of both the proposed and the SFLD OR gate increase with  $V_{DD}$, as shown in Fig 4(a) and Fig 4(b).



Fig. 5 - Parameters comparison for designed 8-input OR gates under iso-UNG condition



Fig. 6 - Transient waveform of proposed 40-bit tag comparator with 1 GHz clock and 0.9V supply

However, the power saving capacity of the proposed circuit is higher due to the increase in the value of  $(V_{DD} - V_{TH})/V_{DD}$  at the dynamic node with reduction in supply voltage. This leads to a steady increase in Normalized (UNG/EDP) with increasing supply voltage, as shown in Fig 4(c), so the circuit remains usable under voltage variation as well.

Monte Carlo simulation has been carried out on all designed 8-input OR gates to estimate the effect of process variation on power and delay, in the same manner as in [3]. For suitable accuracy, N = 2000 Monte Carlo points are considered. Power and delay variations are then obtained by simulating the circuits with the standard deviations of both channel length ( $\sigma_{\Delta\omega/\omega}(\%)$ ) and threshold voltage ( $\sigma_{Vth}(mV)$ ) taken as per [27]. Table 6 lists the results, where  $\sigma$  and  $\mu$  denote the standard deviation and mean value of a circuit parameter, respectively. The variation in  $\sigma_{Delay} / \mu_{Delay}$ is larger than the variation in  $\sigma_{Power} / \mu_{Power}$, due to the variable delay of the designed OR gates. This variation increases further in wide fan-in OR gates due to the increase in dynamic node capacitance and parallel leaky legs in the evaluation networks. Also, the  $\sigma_{Power} / \mu_{Power}$  of the proposed OR gate (0.090) is among the highest, which relates to its power saving ability.
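The way $\sigma$, $\mu$ and $\sigma/\mu$ summarize a set of Monte Carlo runs can be sketched as follows. This is a minimal illustration, not the simulation flow used for Table 6: the samples are synthetic Gaussian values standing in for simulator output, and the assumed mean (30 ps), spread (2 ps) and all names are hypothetical.

```python
import random
import statistics


def sigma_mu(samples):
    """Return (sigma, mu, sigma/mu) for a list of Monte Carlo samples."""
    mu = statistics.mean(samples)
    sigma = statistics.stdev(samples)  # sample standard deviation
    return sigma, mu, sigma / mu


# Synthetic stand-in for simulator output: N = 2000 delay samples (ps)
# drawn from a Gaussian with an assumed mean of 30 ps and sigma of 2 ps.
rng = random.Random(0)  # fixed seed so the run is repeatable
N = 2000
delays_ps = [rng.gauss(30.0, 2.0) for _ in range(N)]

sigma, mu, cv = sigma_mu(delays_ps)
print(f"sigma = {sigma:.3f} ps, mu = {mu:.3f} ps, sigma/mu = {cv:.4f}")
```

With N = 2000 points the estimated ratio typically lands within a few percent of the underlying value, which is one reason a sample count of this order gives suitable accuracy.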

To compare power, delay and efficiency along with noise immunity across all designed OR gate circuits, the power, normalized delay and (UNG/EDP) are evaluated and shown in Fig 5. A higher value of (UNG/EDP) indicates better overall efficiency of a circuit. Among the circuits from the literature, MCSD performs best. However, because the proposed scheme consumes the least power, the MCSD technique becomes less attractive for low power, high-speed operation.

A 40-bit tag comparator (Fig 2) using the proposed domino scheme is implemented, as discussed in section 4, to further verify the proposed technique in a high-speed cache controller. Instead of designing the 40-bit tag comparator directly, we first simulated 2-bit, 4-bit and 8-bit tag comparators using the proposed scheme. The results were compared with the second most efficient technique, i.e. MCSD, and the standard footless technique in terms of leakage current, average power, delay, static power and UNG. Table 7 lists the parameters of the 2-, 4- and 8-bit tag comparator designs. The results show that the MCSD structure is better than the SFLD technique in all parameters up to the 4-bit partition, but as the number of parallel legs increases, i.e. at the 8-bit partition, its performance degrades considerably, resulting in higher leakage current and average power.

However, the delay is within a tolerable range for the same UNG criterion. For the proposed circuit, as the fan-in of the tag comparator increases, the parameters remain within a tolerable range and are better than those of both the SFLD and MCSD structures, owing to the reduced voltage swing obtained with the clocked-bleeder transistor and stacking transistors. Therefore, the 40-bit tag comparator implemented with the proposed circuit consists of an enhanced diode, five 8-bit partitions of the PDN network and a final-stage domino structure.

The transient output of the proposed 40-bit tag comparator is illustrated in Fig 6. The plotted waveforms are layout waveforms. Here, signal EN\_D\_D\_Lay is the input applied to all diode-connected transistors, i.e.  $M_{D[1]} - M_{D[5]}$. This helps to maintain the dynamic nodes of all partitions,  $P_{[1]} - P_{[5]}$, below  $V_{DD}$  in both the pre-charge and evaluation phases. The MISS\_Lay bit is maintained near  $V_{DD}$  for a high input A<sub>39</sub> and a low input D<sub>39</sub>. During the pre-charge phase, signal COMM\_Lay is high. In the evaluation phase, as the MISS\_Lay signal goes high, the COMM\_Lay signal starts to discharge and is held below  $V_{TH}$  as long as the input mismatch exists. Power is saved because the MISS\_Lay signal does not discharge in every clock cycle and stays high most of the time.

The layout of the complete proposed tag comparator is illustrated in Fig 7. The layout is divided into three parts: Fig 7(a) shows the layout of the enhanced diode, Fig 7(b) the layout of an 8-bit partition of the PDN network, and Fig 7(c) the layout of the final stage. The total die area of the comparator is 1029.72  $\mu m^2$, which is less than the die area of the MCSD based 40-bit tag comparator.








Fig. 7 - Layout of proposed 40-bit tag comparator circuit, a) Initial stage, b) 8-bit PDN partition, and c) final stage.

| Parameters                | M-CSD Pre-Layout | M-CSD Post-Layout | Proposed Pre-Layout | Proposed Post-Layout |
|---------------------------|------------------|-------------------|---------------------|----------------------|
| # of Transistors          | 167              | 167               | 167                 | 167                  |
| Power, (µW)               | 20.99            | 32.32             | 8.72                | 13.51                |
| Leakage, (µA)             | 3.51             | 3.51              | 1.17                | 1.17                 |
| Std_by_Pwr, (µW)          | 3.16             | 3.16              | 1.06                | 1.06                 |
| Delay, (ps)               | 322.67           | 424.83            | 288.86              | 347.55               |
| EDP, $(10^{-24} J^2)$     | 2.18             | 5.83              | 0.72                | 1.63                 |
| Norm_EDP                  | 1                | 1                 | 0.33                | 0.27                 |
| UNG, (mV)                 | 492              | 468.2             | 492                 | 484.2                |
| Norm_UNG                  | 1                | 1                 | 1                   | 1.03                 |
| σ<sub>Delay</sub>         | 22.5             | 16.0              | 18.8                | 19.2                 |
| Norm_σ<sub>Delay</sub>    | 1                | 1                 | 0.84                | 1.2                  |
| Area (µm<sup>2</sup>)     | 989.2            | 1454.41           | 889.2               | 1029.718             |
| Norm_Area                 | 1                | 1                 | 0.90                | 0.71                 |
| FOM                       | 1                | 1                 | 4.01                | 4.34                 |

Table 8 - Characteristics of proposed 40-bit tag comparator

Table 9 - PVT variation effect on av\_extracted 40-bit tag comparator

| Temp   | Parameters                 | M-CSD FF | M-CSD FS | M-CSD SF | M-CSD SS | Proposed FF | Proposed FS | Proposed SF | Proposed SS |
|--------|----------------------------|----------|----------|----------|----------|-------------|-------------|-------------|-------------|
| 27 °C  | Power, (µW)                | 38.32    | 33.63    | 35.37    | 28.23    | 17.03       | 11.98       | 18.43       | 10.76       |
| 27 °C  | Delay, (ps)                | 520.4    | 480.9    | 560.7    | 613.3    | 380.3       | 360.7       | 490.6       | 567.6       |
| 27 °C  | Leakage, (µA)              | 4.20     | 4.01     | 2.33     | 1.98     | 2.04        | 2.0         | 0.614       | 0.608       |
| 27 °C  | Std_by_Pwr, (µW)           | 3.78     | 3.61     | 2.10     | 1.78     | 1.84        | 1.80        | 0.553       | 0.548       |
| 27 °C  | UNG, (mV)                  | 454      | 433      | 512      | 498      | 462         | 442         | 532.4       | 516         |
| 27 °C  | EDP, $(10^{-24} J^2)$      | 10.40    | 7.78     | 11.10    | 10.62    | 2.46        | 1.56        | 4.44        | 3.47        |
| 27 °C  | UNG/EDP, $(10^{23} V/J^2)$ | 0.43     | 0.55     | 0.46     | 0.47     | 1.87        | 2.83        | 1.19        | 1.48        |
| 110 °C | Power, (µW)                | 38.83    | 34.23    | 36.00    | 29.93    | 17.56       | 12.41       | 18.69       | 11.34       |
| 110 °C | Delay, (ps)                | 534.7    | 510.6    | 580.3    | 632.8    | 392.4       | 372.3       | 508.4       | 580.8       |
| 110 °C | Leakage, (µA)              | 6.32     | 6.11     | 4.48     | 4.16     | 4.57        | 4.36        | 1.54        | 1.51        |
| 110 °C | Std_by_Pwr, (µW)           | 5.69     | 5.49     | 4.03     | 3.74     | 4.12        | 3.93        | 1.39        | 1.36        |
| 110 °C | UNG, (mV)                  | 442      | 419      | 496      | 471      | 454         | 434         | 502         | 491         |
| 110 °C | EDP, $(10^{-24} J^2)$      | 11.10    | 8.90     | 12.12    | 11.92    | 2.70        | 1.72        | 4.83        | 3.83        |
| 110 °C | UNG/EDP, $(10^{23} V/J^2)$ | 0.39     | 0.47     | 0.41     | 0.40     | 1.68        | 2.52        | 1.03        | 1.28        |

The area is the *Length*×*Breadth* of the total area covered by the initial stage layout, the 5 partitions of the 8-bit PDN network layout and the final stage layout. Also, the parasitic capacitances and resistances associated with the layout of the proposed tag comparator are 1152 and 331, respectively, which are lower than those of the MCSD based tag comparator (i.e. pcap = 1536 and pres = 426).

The simulated parameters of the 40-bit tag comparators based on the proposed circuit and on the circuit in [6] are compared in Table 8, which includes both pre-layout and post-layout simulation data. The UNG of the proposed tag comparator degrades somewhat after layout, but it stays within the 5% tolerable range. Also, the energy-delay product (EDP) of the proposed circuit is the lowest, due to its smaller delay compared to the MCSD based tag comparator. For the pre-layout area estimate, transistor sizes are considered, while for the post-layout die area, the area covered by the av\_extracted view of the circuit is considered. Finally, the FOM is calculated for both the schematic and the av\_extracted view, and all parameters are normalized with respect to the MCSD tag comparator. The pre-layout and post-layout FOM values of 4.01 and 4.34, respectively, reflect the efficacy of the proposed scheme in high-speed applications with lower power consumption.
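The relative savings implied by Table 8 follow from simple ratios of the tabulated values. The sketch below, with hypothetical names, computes the percentage reduction in power and area of the proposed comparator relative to the M-CSD one for both the pre-layout and post-layout columns.

```python
def saving_percent(reference, proposed):
    """Percentage reduction of `proposed` relative to `reference`."""
    return 100.0 * (1.0 - proposed / reference)


# (M-CSD, proposed) pairs copied from Table 8.
POWER_UW = {"pre-layout": (20.99, 8.72), "post-layout": (32.32, 13.51)}
AREA_UM2 = {"pre-layout": (989.2, 889.2), "post-layout": (1454.41, 1029.718)}

power_saving = {view: saving_percent(*pair) for view, pair in POWER_UW.items()}
area_saving = {view: saving_percent(*pair) for view, pair in AREA_UM2.items()}

for view in POWER_UW:
    print(f"{view}: power saving = {power_saving[view]:.1f}%, "
          f"area saving = {area_saving[view]:.1f}%")
```

Both views give a power saving of roughly 58%, and the post-layout area ratio matches the Norm_Area entry of 0.71.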

The proposed 40-bit tag comparator is simulated at all four process corners and two temperatures (27 °C and 110 °C) at a 0.9 V supply voltage to analyse the effect of PVT variations. These effects are observed on the av\_extracted view of the tag comparator circuits to estimate performance. Table 9 lists the results of process and temperature variation for the designed 40-bit tag circuit. The best efficiency, i.e. (UNG/EDP), occurs at the FS corner and 27 °C, whereas the worst case occurs at the SF corner and 110 °C. The results indicate that the efficiency of the proposed circuit does not degrade with process and temperature variations.
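The best and worst corners can be read off Table 9 mechanically. The short sketch below copies the UNG/EDP row of the proposed comparator (in units of 10<sup>23</sup>) and selects the extremes; the dictionary name is hypothetical.

```python
# UNG/EDP of the proposed comparator from Table 9, keyed by (temp in C, corner).
UNG_OVER_EDP = {
    (27, "FF"): 1.87, (27, "FS"): 2.83, (27, "SF"): 1.19, (27, "SS"): 1.48,
    (110, "FF"): 1.68, (110, "FS"): 2.52, (110, "SF"): 1.03, (110, "SS"): 1.28,
}

best = max(UNG_OVER_EDP, key=UNG_OVER_EDP.get)
worst = min(UNG_OVER_EDP, key=UNG_OVER_EDP.get)

print("best :", best, UNG_OVER_EDP[best])
print("worst:", worst, UNG_OVER_EDP[worst])
```

The selection agrees with the corners named in the text: FS at 27 °C for the best case and SF at 110 °C for the worst.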

Since the designed 40-bit tag comparator is to be used in the cache controller of a microprocessor to generate the MISS bit, it is important to verify the accuracy of its output signal. For this analysis, an eye diagram measurement is performed on the high-speed MISS\_Lay signal generated by the av\_extracted proposed 40-bit tag comparator. Table 10 shows the eye amplitude, width and height along with the deterministic jitter. The duty cycle distortion, given by the threshold crossing standard deviation (Std\_Dev), is smaller for the MISS\_Lay signal of the proposed tag comparator (278.81 ps) than for the MCSD based MISS signal (321.23 ps). The bit period is 167.14 ps, giving a high data rate of ~ 6 Gb/s at 1 GHz clock frequency and 0.9 V V<sub>DD</sub>. The peak-peak jitter and rms jitter are 41.40 ps and 23.34 ps, respectively, which are lower than the jitter values of the MCSD based MISS signal. This means that the data bits of the proposed MISS signal deviate less from the ideal input signal timing. Also, a higher eye height means that noise affects the MISS signal less; hence the measure of eye closure, i.e. the eye height of 747.77 mV at 0.9 V V<sub>DD</sub>, ensures a high signal-to-noise ratio (22.746) for the proposed circuit. The results summarized in Table 10 therefore indicate good signal quality along with better system performance.

Fig 8 illustrates the eye compliance mask for the MISS\_Lay signal of the proposed av\_extracted 40-bit tag comparator. The area covered by the pentagon indicates the amplitude and duration limits of this signal. Voltages above the pentagon line fail because it marks the maximum voltage range, and voltages below the pentagon lines fail because they mark the minimum voltage range for this system. The complete green region indicates the acceptable voltage range.

Table 10 - Eye measurement for AV\_Extracted proposed 40-bit Tag Comparator

| Measurement                | Value     |
|----------------------------|-----------|
| Threshold crossing average | 482.55 ps |
| Threshold crossing StdDev  | 278.81 ps |
| Level 0 mean               | 4.82 mV   |
| Level 0 StdDev             | 1.66 mV   |
| Level 1 mean               | 866.20 mV |
| Level 1 StdDev             | 36.31 mV  |
| Eye Amplitude              | 861.38 mV |
| Eye Height                 | 747.77 mV |
| Eye Width                  | 167.14 ps |
| Eye S/N                    | 22.74     |
| Eye Rise time              | 141.83 ps |
| Eye Fall time              | 162.76 ps |
| Random Jitter average      | 41.40 ps  |
| Random Jitter StdDev       | 23.34 ps  |
| Deterministic Jitter       | 114.34 ps |
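Two of the Table 10 entries can be cross-checked directly from the others. The sketch below assumes, as the text does, that the eye width is taken as the bit period, and it uses the common definition of eye amplitude as the difference between the two level means; the variable names are hypothetical.

```python
# Values copied from Table 10.
LEVEL0_MEAN_MV = 4.82
LEVEL1_MEAN_MV = 866.20
EYE_WIDTH_PS = 167.14  # treated as the bit period, as in the text

# Eye amplitude as the difference between the two level means.
eye_amplitude_mv = LEVEL1_MEAN_MV - LEVEL0_MEAN_MV

# Data rate = 1 / bit period; 1/ps equals 1e12 1/s, i.e. 1e3 Gb/s.
data_rate_gbps = 1e3 / EYE_WIDTH_PS

print(f"eye amplitude = {eye_amplitude_mv:.2f} mV")
print(f"data rate     = {data_rate_gbps:.2f} Gb/s")
```

The amplitude reproduces the tabulated 861.38 mV, and the data rate comes out just under 6 Gb/s, consistent with the approximate figure quoted for the MISS signal.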



Fig. 8 - Simulated Eye diagram measurement and mask for av\_extracted MISS signal of 40-bit proposed tag comparator

#### 6. Conclusion

A new domino scheme, the Stack-Scheme Clocked-Bleeder Domino (SS-CBD), is introduced in this paper for wide fan-in implementation of dynamic logic. The technique holds the dynamic node below VDD in the pre-charge as well as the evaluation phase, depending on the inputs, which reduces dynamic power consumption through a lower voltage swing. The performance of the proposed technique is compared against conventional domino techniques under the same environmental conditions, and it is found to be efficient in terms of power and speed with relatively low load. The FOM and (UNG/EDP) of the proposed 8-input OR gate are better than those of the other compared techniques, reflecting the high circuit efficacy obtained from the stacking effect. Further, a 40-bit tag comparator is implemented using the proposed scheme, yielding a high data rate of ~ 6 Gb/s at 1 GHz clock frequency. Under the iso-robust condition, the power saving is 58% along with a ~ 10.8% enhancement in speed in comparison to prior work. The proposed domino based 40-bit tag comparator is useful in cache design for high-speed microprocessors: it requires relatively less time for memory mapping and is less leaky than other designs.

#### Acknowledgement

The authors would like to acknowledge MHRD, Govt. of India, for supporting this research.

#### References

- [1] J.M. Rabaey, A. Chandrakasan, B. Nikolic. (2003). Digital Integrated Circuits: A Design Perspective. Second Ed., Prentice Hall, Englewood Cliffs.
- [2] M. Asyaei, A. Peiravi. (2014). Low power wide gates for modern power efficient processors. Integr. VLSI J. 47 272–283.
- [3] A. Peiravi, M. Asyaei. (2013). Current comparison based domino: New low leakage high speed domino circuit for wide fan-in gates. IEEE Trans. Very Large Scale Integr. Syst, 21(5), 934–943.
- [4] Moradi Farshad, Tuan-Vu Cao, et al. (2013). Domino logic designs for high-performance and leakage-tolerant applications. Integration the VLSI journal. 46(3), 247-254
- [5] M. Asyaei. (2015). A new leakage-tolerant domino circuit using voltage comparison for wide fan-in gates in deep sub-micron technology. Integr. VLSI J. 51, 61–71
- [6] M. Nasserian, M. Kafi-Kangi, M. M. Nejad. (2016). A low-power fast tag comparator by modifying charging scheme of wide fan-in dynamic OR gates. Integration, the VLSI J. 52, 129-141
- [7] M. Asyaei, Farshad Moradi. (2018). A Domino Circuit Technique for Noise-Immune High Fan-In Gates. Journal of Circuits, Systems, and Computers. 27(10), 1850151
- [8] Garg S, Gupta TK. (2019). FDSTDL: low-power technique for FinFET domino circuits. Int J Circ Theor Appl. 47(6), 917-940
- [9] Garg, S. and Gupta, T.K. (2020). A 4:1 multiplexer using low-power high-speed domino technique for large fanin gates using FinFET. Circuit World. https://doi.org/10.1108/CW-09-2019-0128
- [10] D. Park, S. Yoon, I. Jung and C. Kim. (2007). Noise-aware split-path domino logic and its clock delaying scheme. Journal of Circuits, Systems, and Computers. 16, 139–154
- [11] N. Ekekwe and R. Etienne-Cummings. (2006). Power dissipation sources and possible control techniques in ultradeep submicron CMOS technologies. Microelectronics. J. 37, 851–860
- [12] R. Kumar. (2003). Interconnect and noise immunity design for the Pentium 4 processor. Proc. 40th Annual Design Automation Conf. (ACM) (Anaheim, CA, USA). 938–943
- [13] H. Suzuki, C. H. Kim, K. Roy. (2007). Fast tag comparator using diode partitioned domino for 64-bit microprocessors. IEEE Trans. Circuits Syst. I: Regular Papers. 54(2), 322–328
- [14] R. Krambeck, C. M. Lee, H. F. Law. (1982). High-speed compact circuits with CMOS. IEEE J. Solid-State Circuits. 17, 614–619
- [15] D. V. Ponomarev, G. Kucuk, O. Ergin, K. Ghose, P. M. Kogge. (2003). Energy-efficient issue queue design. IEEE Trans. Very Large Scale Integr. Syst 11(5), 789–800
- [16] H. Mizuno, et al. (1996). A 1-V, 100-MHz, 10-mW cache using a separated bit-line memory hierarchy architecture and domino tag comparators. IEEE J. Solid-State Circuits 31(11), 1618–1624
- [17] S. Hui, G. Jing, W. Jiajing, and Zh. Qianling. (2003). A 1.8-V 64-Kb four-way set-associative CMOS cache memory using fast sense amplifier and split dynamic tag comparators. Proceedings International Conference ASIC. 474–477
- [18] L. Wang, R. Krishnamurthy, K. Soumyanath and N. R. Shanbhag. (2000). An energy-efficient leakage-tolerant dynamic circuit technique. Proc. 13th Annual IEEE Int. ASIC/SOC Conf. (IEEE, Arlington, VA, USA), 221–225
- [19] L. Wang and N. R. Shanbhag. (2000). An energy-efficient noise-tolerant dynamic circuit technique. IEEE Trans. Circuits Syst. II Analog Digit. Signal Process. 47, 1300–1306
- [20] A. Alvandpour, R. K. Krishnamurthy, K. Soumyanath and S. Y. Borkar. (2002). A sub-130-nm conditional keeper technique. IEEE J. Solid-State Circuits 37, 633–638

- [21] H. Mahmoodi-Meimand and K. Roy. (2004). Diode-footed domino: A leakage-tolerant high fan-in dynamic circuit design style. IEEE Trans. Circuits Syst. I Regul. Pap. 51, 495–503
- [22] V. Mahor and M. Pattanaik. (2015). Low leakage and highly noise immune FinFET-based wide fan-in dynamic logic design. J. Circuits Syst. Comput. 24, 1550073
- [23] A. Peiravi and M. Asyaei. (2012). Noise-immune dual-rail dynamic circuit for wide fan-in gates in asynchronous designs. IEEJ Trans. Electr. Electron. Eng. 7, 613–621
- [24] Ghimiray SR, Meher P, Dutta PK. (2018). Ultralow power, noise immune stacked-double stage clocked-inverter domino technique for ultradeep submicron technology. International Journal of Circuit Theory and Applications. 46(11), 1953–1967
- [25] Understanding Data Eye Diagram Methodology for Analysing High Speed Digital Signals: Semiconductor Components Industries. LLC. 2015: www.onsemi.com (AND9075/D)
- [26] M. Alioto, G. Palumbo and M. Pennisi. (2010). Understanding the effect of process variations on the delay of static and domino logic. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 18, 697–710
- [27] P. R. Kinget. (2005). Device mismatch and trade-offs in the design of analog circuits. IEEE J. Solid-State Circuits, 1212–1224