COMPOSITE THERMAL CAPACITORS FOR TRANSIENT
THERMAL MANAGEMENT OF MULTICORE MICROPROCESSORS
A Dissertation Presented to
The Academic Faculty
by
Craig Elkton Green
In Partial Fulfillment
of the Requirements for the Degree
Doctor of Philosophy in the
George W. Woodruff School of Mechanical Engineering
Georgia Institute of Technology
Dr. Andrei G. Fedorov, Advisor, School of Mechanical Engineering, Georgia Institute of Technology
Dr. Yogendra K. Joshi, Advisor, School of Mechanical Engineering, Georgia Institute of Technology
Dr. Baratunde Cola, School of Mechanical Engineering, Georgia Institute of Technology
Dr. Muhannad Bakir, School of Electrical and Computer Engineering, Georgia Institute of Technology
Dr. Sudhakar Yalamanchili, School of Electrical and Computer Engineering, Georgia Institute of Technology
Date Approved: May 16th, 2012

To Sophia, my parents, Sharon and Herman, and all my loved ones who inspire and support me.
I would like to thank the members of my research groups, the Multiscale Integrated Thermofluidics Laboratory and the Microelectronics and Emerging Technology Thermal Laboratory, for their valuable time, feedback and suggestions that have helped to streamline my research process.
I would like to acknowledge the support of the Interconnect Focus Center, one of five research centers funded under the Focus Center Research Program, a Semiconductor Research Corporation program, for both its financial support and its technical input, which have helped to make my research more meaningful.
Finally, I would like to thank my family. I thank my wife, Sophia, for her endless support, patience, encouragement, and understanding. I thank my mother for her unconditional love, which has carried me through difficult times. I thank my father for the lessons of hard work, perseverance, and audacity that he taught me as a child, which drive me to continually strive to be the best version of myself.
Table 2.1 Thermophysical properties evaluated for cylindrical spreading investigation
Table 2.2 PCM material properties evaluated for grid and residual convergence of the
Figure 3.3 Time-on-a-core before reaching a 90°C threshold for the isotropic array of Si islands layout, and different material properties (for quantitative comparison, a baseline Si-only layer yields 3 ms of operation before reaching a 90°C threshold)
Figure 4.1 Schematic of 3D computational domain and boundary conditions
Figure 4.2 SSC integration approaches (a) SSC integrated into vertically adjacent tier (as
Figure 5.14 Comparison of thermal response of ~130 W/cm² heat flux hotspots on Pyrex substrate with CTC attached using Kapton tape TIM vs. spreadable (Ceramique) TIM
Figure 5.15 Temperature histories of ~395 W/cm² heat flux hotspots with 2 mm diameter CTCs monolithically integrated as a part of the device under test, both compared to a Si
Figure 5.16 Comparison of temperature histories of ~395 W/cm² heat flux hotspots with 2 mm and 3 mm 50% PCM fraction CTCs monolithically integrated as a part of the device
Figure 5.17 Temperature histories of ~300 W/cm² heat flux hotspots with CTCs monolithically integrated as a part of the device under test, compared to a Si baseline
Figure 5.18 Duty cycle of test chips with CTC and attached TEC for regeneration (a) full
Figure A.2 Process flow of the major stand-alone CTC fabrication steps
Figure A.3 Process flow of major fabrication steps for monolithically integrated CTC and
RCA    Radio Corporation of America: clean process named after the company where it was first developed
Re     Reynolds number
RF     Radio frequency
RIE    Reactive ion etch
RPM    Rotations per minute
RTD    Resistance temperature detector
While 3D stacked multiprocessor technology offers the potential for significant computing advantages, these architectures also face the challenge of small, localized hotspots with very large heat fluxes, arising from the placement of asymmetric cores, heterogeneous devices, and performance-driven layouts. In this thesis, a new thermal management solution is introduced that seeks to maximize the performance of microprocessors with dynamically managed power profiles. To mitigate the nonuniformities in chip temperature profiles that result from dynamic power maps, solid-liquid phase change materials (PCMs) with an embedded heat spreader network are strategically positioned near localized hotspots, greatly increasing the local thermal capacitance in these problematic areas.
Theoretical analysis shows that this increase in local thermal capacitance results in an almost twenty-fold increase in the time that a thermally constrained core can operate before a power gating or core migration event is required. Coupled to the PCMs are solid-state coolers (SSCs), which provide fast regeneration of the PCMs during the cool-down periods associated with throttling events. This combined PCM/SSC approach allows devices to operate with the desirable combination of low throttling frequency and large overall core duty cycles, thus maximizing computational throughput.
The impact of the thermophysical properties of the PCM on the device operating characteristics has been investigated from first principles in order to better inform the PCM selection or design process.
solution, a prototype device called a “Composite Thermal Capacitor” (CTC), which monolithically integrates microheaters, PCMs, and a spreader matrix into a Si test chip, was fabricated and tested to validate the efficacy of the concept. The prototype CTC was shown to increase allowable device operating times by more than 7× and to address heat fluxes of up to ~395 W/cm². Several methods for regenerating the CTC have been investigated, including air, liquid, and solid-state cooling, and operational duty cycles of over 60% have been demonstrated.
The tremendous growth in the performance of electronic devices over the past few decades has been accompanied by significant thermal challenges, including power consumption, heat generation, and large nonuniformities in chip temperature profiles [1, 2]. Localized hotspots with heat fluxes exceeding 200-300 W/cm² have become more common as chip architectures cluster high-power units on the processor to minimize overall chip size. As a result, thermal management systems must typically be designed to handle not only large background heat fluxes but also localized hotspots.
Much of the current work on on-chip hotspot cooling uses solid-state refrigeration with thermoelectric coolers (TECs) [4-7]. An alternative solid-state cooling (SSC) approach, which avoids traditional bulk thermoelectric elements, uses thin-film thermoelectric elements or superlattice coolers [8, 9]. These thin-film SSCs offer advantages in the heat fluxes they can dissipate and in their integration with conventional electronics. However, SSC technology still cannot dissipate the largest hotspot heat fluxes, which approach 1 kW/cm².
Direct liquid cooling is a promising alternative to SSC cooling. Liquid cooling is the most energy-efficient of the available chip cooling approaches.
Furthermore, liquid cooling can address large heat fluxes: localized heat fluxes in excess of 500 W/cm² were recently removed using evaporative cooling. Heat
cooling . Many of the methods currently being considered for applying direct liquid cooling to hotspots are reviewed in .
While liquid cooling is efficient and effective, a key limitation to adopting it in electronic systems is packaging. Liquid-based systems must prevent leakage, require external regeneration of the coolant, and sometimes need large and costly pumps. Addressing these thermal challenges will be further complicated in next-generation 3D architectures, which stack multiple microprocessors or electronic devices on top of one another to enhance computing performance. Vertically integrating the cores in a 3D multitier package offers a number of additional design advantages, including shorter wire lengths, increased packaging density, and heterogeneous technology integration, which translate into potential performance benefits such as reduced noise, capacitance, and power consumption. In these 3D architectures, however, the lack of access to the internal tiers of the stack will make integrating hotspot cooling solutions such as those investigated in [2, 12, 14] increasingly difficult.
As an alternative to active hotspot cooling, computational control schemes such as Dynamic Core Migration (DCM) can potentially even out the thermal profile across a chip by actively migrating computations from hotter to cooler areas of the die, preventing any one area of the chip from overheating. DCM schemes are enabled by the move towards multicore and many-core microprocessors in response to Pollack’s and Amdahl’s scaling laws. Pollack’s and Amdahl’s scaling laws indicate that for power
massively parallelizable.
To avoid limiting computation speed because of the serial portions of the code, asymmetric core architectures can be implemented, in which a few higher-power serial cores augment the low-power cores to provide additional throughput.
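The serial-code limitation, and why a few larger cores can help, can be illustrated with Hill and Marty's multicore extension of Amdahl's law, combined with Pollack's rule that single-core performance grows roughly as the square root of the resources devoted to that core. This is a textbook model swapped in for illustration, not the analysis used in this thesis; the function names, the 64-unit resource budget (in base-core equivalents, BCEs), and the 90% parallel fraction are all hypothetical.

```python
import math

# Pollack's rule (assumed): single-core performance scales roughly as the
# square root of the resources, in base-core equivalents (BCEs), spent on it.
def perf(r: float) -> float:
    return math.sqrt(r)

def speedup_symmetric(f: float, n: int, r: int) -> float:
    """Hill-Marty symmetric chip: n BCEs split into n/r identical cores of r BCEs.

    f is the parallelizable fraction of the work (Amdahl's law).
    """
    serial_time = (1.0 - f) / perf(r)
    parallel_time = f * r / (perf(r) * n)
    return 1.0 / (serial_time + parallel_time)

def speedup_asymmetric(f: float, n: int, r: int) -> float:
    """Hill-Marty asymmetric chip: one large r-BCE core plus n - r single-BCE cores.

    The large core runs the serial fraction alone; during the parallel
    fraction it works alongside all of the small cores.
    """
    serial_time = (1.0 - f) / perf(r)
    parallel_time = f / (perf(r) + n - r)
    return 1.0 / (serial_time + parallel_time)

f, n = 0.9, 64  # hypothetical: 90% parallel code on a 64-BCE chip budget
print(f"64 small symmetric cores:        {speedup_symmetric(f, n, 1):.2f}")   # 8.77
print(f"one 16-BCE core + 48 small ones: {speedup_asymmetric(f, n, 16):.2f}")  # 23.64
```

Under these assumptions, devoting 16 of the 64 units to a single serial core nearly triples the speedup, which is the throughput argument for asymmetric designs; the cost is a concentrated, higher-power core that is more prone to hotspots.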
While a DCM approach can mitigate some hotspots, nonhomogeneous architectures that introduce dedicated components such as serial cores may still experience hotspots because of the potentially higher heat fluxes, larger size, and decreased redundancy of these components. To compensate for their higher power densities, the serial cores will experience either more throttling events during an intra-migration time slice or higher migration frequencies [19, 20]. A recent study of different multiplexing techniques for reducing maximum chip temperatures showed that very small (sub-ms) inter-migration time slices had to be used to avoid hotspots.
In DCM schemes, each throttling event carries a parasitic computational cost that can become significant over time when cycling is too rapid.
In addition to the computational cost, core migration consumes power, and this consumption increases with migration frequency. Furthermore, rapid thermal cycling can reduce the lifetime reliability of the chip. To minimize the performance losses associated with these gating and throttling events, an optimized system should be able to operate for long periods without requiring an idle period for cool-down, and should keep each idle period as short as possible.
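The trade-off between long on-times and short idle times can be made concrete with a simple cycle model: the duty cycle is t_on/(t_on + t_off), the throttling frequency is 1/(t_on + t_off), and a fixed parasitic cost per event reduces the useful fraction of time. This is a minimal sketch under stated assumptions: only the 3 ms Si-only on-time (Figure 3.3) and the roughly twenty-fold extension come from this document, while the off-times, the 0.1 ms per-event overhead, and the class and field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ThrottleCycle:
    """One on/off thermal cycle of a core (times in seconds)."""
    t_on: float        # run time before the core hits its thermal threshold
    t_off: float       # idle time needed to cool down / regenerate
    t_overhead: float  # hypothetical parasitic cost charged per throttling event

    @property
    def period(self) -> float:
        return self.t_on + self.t_off

    @property
    def duty_cycle(self) -> float:
        """Fraction of wall-clock time the core is active."""
        return self.t_on / self.period

    @property
    def throttle_frequency(self) -> float:
        """Throttling (or migration) events per second."""
        return 1.0 / self.period

    @property
    def useful_fraction(self) -> float:
        """Active fraction left after subtracting the per-event overhead."""
        return max(0.0, self.t_on - self.t_overhead) / self.period

# Hypothetical inputs: 3 ms echoes the Si-only baseline of Figure 3.3, 60 ms
# mirrors the "almost twenty-fold" estimate; off-times and overhead are guesses.
baseline = ThrottleCycle(t_on=3e-3, t_off=3e-3, t_overhead=0.1e-3)
with_pcm = ThrottleCycle(t_on=60e-3, t_off=20e-3, t_overhead=0.1e-3)

for name, c in (("Si only", baseline), ("PCM/SSC", with_pcm)):
    print(f"{name:8s} duty={c.duty_cycle:.2f}  events/s={c.throttle_frequency:6.1f}  "
          f"useful={c.useful_fraction:.3f}")
```

In this toy case, lengthening t_on raises the duty cycle and cuts the event rate by more than an order of magnitude, so the fixed per-event overhead matters far less; shortening t_off (for example through SSC regeneration) raises the duty cycle further.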
To address the unique challenges associated with thermal nonuniformities in 3D many-core architectures, this thesis proposes an approach that departs from the traditional strategy of bringing a specialized liquid cooling device to the hotspot to locally enhance heat transfer. Instead of attempting to increase heat transfer coefficients in the hard-to-access internal layers of a 3D stack, the proposed design locally increases the thermal capacitance in thermally troublesome areas of the chip, maximizing the time that a core or device can operate before reaching its thermal threshold.
As shown schematically in Figure 1.1(a), for dynamically operated microarchitectures, increasing the local thermal capacitance of a device can significantly decrease the required frequency of core hopping, gating, or throttling events. This in turn reduces the parasitic computational overhead associated with the DCM implementation.
Thus, matching a device’s thermal capacitance to its intrinsic dynamics of power dissipation can “homogenize” the thermal time scales of devices with very different power dissipation profiles.
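Why added capacitance stretches a core's time-to-threshold, and how it can equalize cores with different power levels, can be seen in a single-node lumped RC model. This is a deliberate simplification that ignores spatial spreading and phase change; the function names, the thermal resistance, the capacitances, the powers, and the 45 °C starting temperature are hypothetical, and only the 90 °C threshold is taken from Figure 3.3.

```python
import math

def time_to_threshold(P: float, R: float, C: float, T_amb: float, T_th: float) -> float:
    """Single-node lumped RC model, C dT/dt = P - (T - T_amb)/R, with T(0) = T_amb.

    Solution: T(t) = T_amb + P*R*(1 - exp(-t/(R*C))). Returns the time at which
    T reaches T_th, or math.inf if the steady state T_amb + P*R stays at or below it.
    """
    dT_needed = T_th - T_amb
    dT_steady = P * R
    if dT_steady <= dT_needed:
        return math.inf
    return -R * C * math.log(1.0 - dT_needed / dT_steady)

def capacitance_for_time(t_target: float, P: float, R: float, T_amb: float, T_th: float) -> float:
    """Capacitance that makes a core dissipating P reach T_th after t_target.

    time_to_threshold is linear in C, so one evaluation at C = 1 suffices.
    """
    t_per_unit_c = time_to_threshold(P, R, 1.0, T_amb, T_th)
    if math.isinf(t_per_unit_c):
        return 0.0  # never reaches the threshold; no extra capacitance needed
    return t_target / t_per_unit_c

# Hypothetical values: same R for both cores, 90 °C threshold as in Figure 3.3.
R, T_AMB, T_TH = 2.0, 45.0, 90.0  # K/W, °C, °C
t_small = time_to_threshold(P=30.0, R=R, C=0.05, T_amb=T_AMB, T_th=T_TH)
c_big = capacitance_for_time(t_small, P=60.0, R=R, T_amb=T_AMB, T_th=T_TH)
print(f"30 W core, C = 0.05 J/K: reaches 90 °C after {t_small * 1e3:.1f} ms")
print(f"60 W core needs C = {c_big:.3f} J/K to match that time")
```

Because the time-to-threshold is proportional to C, the 60 W core in this toy model needs roughly three times the capacitance of the 30 W core to share its thermal time scale, which is the sense in which matched capacitance "homogenizes" the devices.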
To locally alter the dynamic thermal response of the devices, a portion of the silicon on the inactive back side of the chip can be etched away, and a material with a higher thermal capacitance, for example a solid-liquid phase change material (PCM), can be placed in the resulting cavity (Figure 1.2). PCMs, so named because they reversibly melt and solidify during heating and cooling, can absorb a large amount of thermal energy at a relatively constant temperature. One challenge of using certain PCMs is that their typically low thermal conductivities limit the amount of material that can be melted before the device reaches its threshold temperature. This can be mitigated by using a “composite thermal capacitor” (CTC), consisting of PCM incorporated into a high thermal conductivity matrix that enhances heat spreading and therefore improves PCM utilization.
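The storage-versus-spreading trade-off behind the CTC can be estimated per unit volume: latent storage grows with the PCM volume fraction φ, while the effective conductivity lies somewhere between the parallel (upper) and series (lower) bounds of the two constituents. In this sketch the Si properties are approximate room-temperature values, the matrix is assumed to be Si, the PCM is a hypothetical paraffin-like material, the 20 K window is arbitrary, and the PCM is assumed to melt completely within that window; the names are illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Material:
    rho: float           # density, kg/m^3
    cp: float            # specific heat, J/(kg K)
    k: float             # thermal conductivity, W/(m K)
    latent: float = 0.0  # latent heat of fusion, J/kg (0 for non-PCMs)

SI = Material(rho=2329.0, cp=705.0, k=148.0)               # approx. bulk Si near 300 K
PCM = Material(rho=800.0, cp=2000.0, k=0.2, latent=200e3)  # hypothetical paraffin-like PCM

def stored_energy_per_volume(phi: float, pcm: Material, matrix: Material, dT: float) -> float:
    """J/m^3 absorbed over a temperature rise dT, assuming the PCM fully melts within dT."""
    sensible = (phi * pcm.rho * pcm.cp + (1 - phi) * matrix.rho * matrix.cp) * dT
    latent = phi * pcm.rho * pcm.latent
    return sensible + latent

def k_bounds(phi: float, pcm: Material, matrix: Material) -> tuple[float, float]:
    """Parallel (upper) and series (lower) bounds on the composite conductivity."""
    k_par = phi * pcm.k + (1 - phi) * matrix.k
    k_ser = 1.0 / (phi / pcm.k + (1 - phi) / matrix.k)
    return k_par, k_ser

DT = 20.0  # K, hypothetical temperature window below the threshold
e_si = stored_energy_per_volume(0.0, PCM, SI, DT)  # phi = 0 is plain Si
for phi in (0.0, 0.25, 0.5, 0.75):
    e = stored_energy_per_volume(phi, PCM, SI, DT)
    k_par, k_ser = k_bounds(phi, PCM, SI)
    print(f"phi={phi:.2f}  storage={e / 1e6:6.1f} MJ/m^3 ({e / e_si:.1f}x Si)  "
          f"k in [{k_ser:.2f}, {k_par:.1f}] W/(m K)")
```

The wide gap between the two conductivity bounds (about 0.4 versus 74 W/(m K) at φ = 0.5 with these inputs) illustrates why the geometry of the spreader matrix, and not only its volume fraction, governs how much PCM can actually be melted before the threshold is reached.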