Puneet Gupta Research

My research interests lie primarily in CAD for VLSI physical design and manufacturing. An overview of my main research work follows.

Analysis of Manufacturing Variability
We proposed a new framework for assessing the impact of process variation on circuit performance and product value with respect to such relevant metrics as parametric yield at selling point, and amount of required design guardbanding. We also evaluate the merits of taking into account such previously unconsidered phenomena as correlations among process parameters. Our results indicate that the impact of variability is decreasing as technology scales. We also proposed the concept of Design for Value to explicitly aim at maximizing a $/wafer metric in circuit optimization.
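As a rough illustration of why correlations among process parameters matter for parametric yield, the sketch below runs a small Monte Carlo experiment. The linear delay model, the sensitivities `sens_l` and `sens_vth`, and the target value are hypothetical placeholders chosen for the example; they are not taken from the framework described above.

```python
import math
import random


def parametric_yield(target, rho, n=20000, seed=0,
                     sens_l=0.05, sens_vth=0.05):
    """Fraction of sampled dies whose normalized delay is <= target.

    Assumed (hypothetical) model: delay = 1 + sens_l*dL + sens_vth*dVth,
    where dL and dVth are unit Gaussians with correlation rho.
    """
    rng = random.Random(seed)
    passed = 0
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        d_l = z1
        # Build a second variable with correlation rho to the first.
        d_vth = rho * z1 + math.sqrt(1.0 - rho * rho) * z2
        delay = 1.0 + sens_l * d_l + sens_vth * d_vth
        if delay <= target:
            passed += 1
    return passed / n


if __name__ == "__main__":
    for rho in (0.0, 0.8):
        print(rho, parametric_yield(target=1.08, rho=rho))
```

Under this toy model, positively correlated parameters widen the delay distribution, so the yield at a fixed target drops and a larger guardband would be needed to recover it.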

Reducing Cost of Lithographic Correction
Resolution enhancement technique (RET) insertion and the resulting photomask complexity are becoming a major cost of design, especially for low-volume ASICs. RETs such as optical proximity correction (OPC) are oblivious to the designer's intent and design function (e.g., timing criticalities, sensitivities, etc.); this results in unnecessary and costly overcorrection. We have proposed a novel minimum cost of correction (MinCorr) methodology to determine the minimum level of correction for each layout feature subject to the constraint that the prescribed parametric yield is attained. We highlight potential solutions to the MinCorr problem and conclude that making OPC design-aware can significantly reduce the total cost of OPC while still meeting yield and cycle time targets. In follow-up work with Photronics, such a technique achieved a 30% reduction in mask write time on a real mask writer.

Separately, during my internship at IBM T.J. Watson Research Center, we proposed a new library-based OPC flow which can save orders of magnitude in OPC runtime and make the impact of OPC predictable during design. The results suggest almost no loss of CD control compared to traditional full-chip OPC.

Performance-Aware Fill Insertion
Dummy fill insertion, post-tapeout, is a standard practice to improve uniformity of chemical-mechanical planarization steps in wafer processing. The timing impact of dummy fill is typically worst-cased during physical design, or else ignored altogether. Dummy fill insertion is completely unaware of the design, and current methods at best rely on simple rules to limit capacitance impact. We review and develop estimates for capacitance and timing overhead of area fill insertion. We then give the first formulation of the performance impact limited fill (PIL-Fill) insertion problem. We propose ILP formulations as well as heuristics to solve the PIL-Fill problem. Our results indicate significant improvements in post-fill timing without loss in layout density control. We are currently improving the proposed algorithms to account for multi-layer effects.
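To make the idea of performance-aware fill concrete, here is a deliberately simplified sketch, not the ILP or heuristics from this work. It assumes a single tile in which every candidate fill slot has the same area and an independent, additive delay cost (both hypothetical simplifications), so picking the cheapest slots is enough to meet the density requirement.

```python
def choose_fill_slots(slots, required):
    """Pick `required` fill slots with the smallest estimated delay cost.

    slots: list of (slot_id, delay_cost) pairs for one tile. The cost
    values are hypothetical estimates of added delay per fill feature.
    """
    if required > len(slots):
        raise ValueError("not enough free slots to meet the density target")
    ranked = sorted(slots, key=lambda s: s[1])
    return [slot_id for slot_id, _ in ranked[:required]]


if __name__ == "__main__":
    tile = [("a", 5.0), ("b", 0.5), ("c", 2.0), ("d", 0.1)]
    print(choose_fill_slots(tile, 2))
```

Once costs interact (for example, several fill features coupling to the same critical net), this simple ranking no longer suffices, which is where an ILP formulation becomes useful.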

Scan Chain Synthesis
Scan chain insertion can have a large impact on routability, wirelength and timing of the design. We have proposed a routing-based scan chain ordering flow which achieves up to 80% reduction in wirelength impact compared to the traditional placement-based approach. We have extended this work to achieve timing-feasible scan insertion wherein we find the minimum-wirelength timing-feasible connection point for a scan connection. In a related work, we have investigated scan chain ordering for improved, layout-aware delay fault coverage. We propose a multi-fragment greedy algorithm that solves the associated asymmetric traveling salesman problem in a manner that permits exploration of the tradeoff between test coverage and layout impact. We see up to 200% improvement in delay fault coverage with just 20% increase in wirelength compared to layout-driven scan chain ordering.
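The multi-fragment greedy idea can be sketched generically as follows. This is a minimal textbook-style version for an open path over an asymmetric cost matrix, not the implementation from this work; the `cost` matrix is a hypothetical input in which `cost[u][v]` is the cost of connecting the scan-out of cell `u` to the scan-in of cell `v`.

```python
def multi_fragment_order(cost):
    """Greedy multi-fragment ordering for an asymmetric cost matrix.

    Repeatedly accepts the cheapest directed edge u -> v such that u has
    no successor yet, v has no predecessor yet, and the edge does not
    close a cycle. Returns the resulting visiting order as a list.
    """
    n = len(cost)
    if n == 0:
        return []
    edges = sorted((cost[u][v], u, v)
                   for u in range(n) for v in range(n) if u != v)
    parent = list(range(n))  # union-find over path fragments

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    successor = [None] * n
    has_pred = [False] * n
    added = 0
    for _, u, v in edges:
        if added == n - 1:
            break
        if successor[u] is not None or has_pred[v]:
            continue
        ru, rv = find(u), find(v)
        if ru == rv:  # same fragment: the edge would close a cycle
            continue
        successor[u] = v
        has_pred[v] = True
        parent[ru] = rv
        added += 1

    # Walk the single remaining path from its head.
    order = [has_pred.index(False)]
    while successor[order[-1]] is not None:
        order.append(successor[order[-1]])
    return order
```

One way to explore a coverage versus wirelength tradeoff with such a routine would be to build `cost` as a weighted sum of a wirelength term and a coverage penalty and sweep the weight; that weighting scheme is an assumption made here for illustration.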

Power Estimation and Reduction
I have worked on quantifying the error in dynamic power estimation in CMOS circuits as done by conventional methods. Our experiments show that incorrectly accounting for capacitive coupling can be a major cause of discrepancy.
We have also proposed fast closed-form expression based methods for power grid analysis and optimization which can be used within a layout optimization flow. The proposed methods achieve almost perfect correlation with more involved numerical techniques, and the optimization achieves up to 30% reduction in power grid area.
Finally, we have investigated the use of small gate length biases to reduce leakage power and variability in a cell-based design context. This layout-transparent technique can achieve up to 30% reduction in leakage and 40% reduction in variability at 130nm, with potentially higher improvements at 90nm and 65nm.

Systematic Variation Aware Design Methodologies
At IBM, we proposed a novel static timing methodology which correctly takes into account the effects of pattern-dependent and focus-dependent variation. Isolated and dense lines print systematically differently at the best-focus condition and behave differently through focus. Though OPC and assist features try to correct for these distortions, a large residual remains, resulting in as much as 10% CD variation. We propose and implement a static timing analysis flow and show up to 40% reduction in timing uncertainty caused by CD variation. In follow-up work, we use this property of focus-dependent CD variation to propose a novel self-compensated design methodology that achieves 25% less area overhead than restricted design rule (RDR) approaches with the same level of robustness. In a separate work, we tried to quantify the impact of various layout design rules on actual manufacturability in terms of CD variation.

Conventional design automation models and algorithms have been geared towards rectangular wires and gates and hence are ill-equipped to handle simulated wafer shapes for power/performance analyses. We have proposed a full-chip timing and power analysis methodology, covering both wires and gates, to analyze such litho-simulated contours. It is interesting to see that at 100nm lithographic defocus, leakage increases by up to 68%, cycle time improves by up to 14%, and dynamic power reduces by up to 2%. We have also proposed the first model for non-rectangular channels which accounts for narrow width effects and has significantly less error than the previous simplified models. We have also proposed simplified prediction models for lithographic error from drawn layout.

Placement for Improved Process Window
We have proposed perturbation of detailed placement of standard cells to redistribute whitespace in order to remove the so-called "forbidden pitches" in the layout and make the layout more amenable to SRAF and etch dummy insertion downstream. The proposed dynamic programming based algorithm reduces lithographic edge placement errors by 90%-100% in resist CD and by 70%-100% in etch CD.
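A toy version of the whitespace redistribution idea can be written as a dynamic program over the cells of one row. The sketch below is a simplification made for illustration: it treats the gap between adjacent cells (in placement sites) as the only quantity that can be forbidden, keeps the cell order fixed, and minimizes total displacement. The function name, the `forbidden` gap set, and the `max_shift` limit are all hypothetical.

```python
def place_row(widths, orig, row_width, forbidden, max_shift=2):
    """Shift cells by at most max_shift sites so no gap is forbidden.

    widths, orig: per-cell width and original left edge, in sites.
    forbidden: set of gap sizes (in sites) that must not occur.
    Returns new left edges minimizing total displacement, or None.
    """
    layers = []  # layers[i][x] = (best cost with cell i at x, previous x)
    for i, (w, x0) in enumerate(zip(widths, orig)):
        cur = {}
        lo = max(0, x0 - max_shift)
        hi = min(row_width - w, x0 + max_shift)
        for x in range(lo, hi + 1):
            move = abs(x - x0)
            if i == 0:
                cur[x] = (move, None)
                continue
            for px, (pcost, _) in layers[-1].items():
                gap = x - (px + widths[i - 1])
                if gap < 0 or gap in forbidden:
                    continue  # overlap or forbidden spacing
                cand = pcost + move
                if x not in cur or cand < cur[x][0]:
                    cur[x] = (cand, px)
        layers.append(cur)
    if not layers or not layers[-1]:
        return None
    # Backtrack from the cheapest position of the last cell.
    x = min(layers[-1], key=lambda k: layers[-1][k][0])
    positions = []
    for layer in reversed(layers):
        positions.append(x)
        x = layer[x][1]
    return positions[::-1]
```

In a real flow the forbidden condition would come from the pitches between printed features rather than from cell-to-cell gaps, so this should be read only as an illustration of the dynamic programming structure.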

Topography Aware OPC
We propose a novel method to drive OPC with a topography map of the layout that is generated by CMP simulation. The wafer topography variations result in local defocus, which we explicitly model in our OPC insertion and verification flows by algorithmically partitioning the layout into multiple defocus marking layers. The proposed topography-aware OPC yields 67%-80% reduction in worst-case edge placement errors compared to conventional OPC.

Crosstalk Reduction
We proposed a wire-swizzling technique to reduce timing uncertainty caused by capacitive coupling in long parallel buses. We show up to 31% reduction in worst-case delay.

Evaluating Interconnect Architectures
We propose a bandwidth-based metric to evaluate multi-layer interconnect stack architectures, including via blockage estimates. We evaluate a few publicly available 130nm and 90nm interconnect stacks.

Undergraduate Projects
Reconfigurable Computing
We proposed heuristics for scheduling operations of a program on a hypothetical completely reconfigurable processor. We extended this work to a codesign setup wherein a general purpose microprocessor communicates with FPGA based reconfigurable hardware. We proposed simulated annealing and genetic algorithm based hardware-software partitioning algorithms. The testbed was a modified MIPS simulator running ADPCM code.
Signature Detection Using Adaptive Fuzzy Networks
We studied the problem of online signature (cursive English alphabet) recognition. The implementation was done in C. We used angle frequency histograms for signature characterization. For recognition we used an adaptive neuro-fuzzy inference system (ANFIS) with a hybrid learning rule.
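An angle frequency histogram of a pen trajectory can be sketched as below. This is an illustrative Python version (the original implementation was in C), and the choice of 8 bins centered on multiples of 45 degrees is an assumption made for the example.

```python
import math


def angle_histogram(points, bins=8):
    """Normalized histogram of stroke directions along a trajectory.

    points: sequence of (x, y) pen samples in drawing order.
    Bin k is centered on the direction k * (360 / bins) degrees.
    """
    counts = [0] * bins
    bin_width = 2.0 * math.pi / bins
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        if dx == 0 and dy == 0:
            continue  # repeated sample, no direction
        angle = math.atan2(dy, dx) % (2.0 * math.pi)
        counts[int(round(angle / bin_width)) % bins] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * bins
    return [c / total for c in counts]


if __name__ == "__main__":
    square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
    print(angle_histogram(square))
```

The resulting fixed-length vector is the kind of feature that could then be fed to a classifier such as ANFIS.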
Secure Digital Music
As a hobby project, we modified open-source mp3 players to support a digitally encrypted and watermarked format which permits use-based and time-based restrictions on song playback.
Phonetic Search
As another hobby project, we developed a phonetic search engine for Indian languages based on a modified Soundex algorithm and a database backend.
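For reference, the classic English Soundex code that such a modified scheme starts from can be sketched as follows. This is the standard American Soundex only; the modifications made for Indian languages are not described here and are not reproduced.

```python
# Consonant groups of standard American Soundex.
_CODES = {}
for _digit, _letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                         "4": "l", "5": "mn", "6": "r"}.items():
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word):
    """Return the 4-character Soundex code of an ASCII word."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate equal codes
        code = _CODES.get(ch, "")  # vowels and y map to ""
        if code and code != prev:
            result += code
        prev = code
        if len(result) == 4:
            break
    return result.ljust(4, "0")


if __name__ == "__main__":
    for name in ("Robert", "Rupert", "Ashcraft", "Tymczak"):
        print(name, soundex(name))
```

Names that sound alike map to the same key, so a phonetic search can index the key in a database column and compare keys instead of raw spellings.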