Near Threshold Computing Technology Methods and Applications 1st Edition Michael Hübner
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.
Printed on acid-free paper
Springer International Publishing AG Switzerland is part of Springer Science+Business Media (www.springer.com)
“He had the vague sense of standing on a threshold, the crossing of which would change everything.”
– Kate Morton, The Forgotten Garden
Preface
To confront the power/utilization wall posing the dark silicon problem, near-threshold computing (NTC) has emerged as one of the most promising approaches to achieve an order of magnitude or more improvement in the energy efficiency of microprocessors and reconfigurable hardware. NTC takes advantage of the quadratic relation between the supply voltage (Vdd) and the dynamic power by lowering the supply voltage of chips to a value only slightly higher than the threshold voltage. The reduction in power, however, comes with associated drawbacks that include low operating frequency, less reliable operation of both logic and memory, and much higher sensitivity to parameter variability. Industry and academia are actively investigating the technology and architecture issues, and some promising results have already been achieved. However, many challenges remain before NTC can become mainstream.
This book provides deep insight into the most relevant topics related to near-threshold computing from different perspectives. The chapters are organized in three main parts presenting highly relevant research results on specific design and technological challenges (Part I), micro-architectural concerns including the energy-efficient management of voltage islands (Part II), and the very important design of the memory subsystem for NTC (Part III).
Overall, we believe the chapters cover a set of important and timely issues impacting present and future research on near-threshold computing, and we sincerely hope the book will become a solid reference in the coming years. The authors have made a great effort to clearly present their research contributions, outlining the potential impact and the open challenges. We would like to thank all the authors who agreed to contribute to the book.
Michael Hübner, Bochum, Germany
Cristina Silvano, Milano, Italy
July 2015
Contents

Part I NTC Opportunities, Challenges and Limits

1 Extreme Energy Efficiency by Near Threshold Voltage Operation
Shekhar Borkar

Part II Micro-Architecture Challenges and Energy Management at NTC

2 Many-Core Architecture for NTC: Energy Efficiency from the Ground Up
Josep Torrellas

3 Variability-Aware Voltage Island Management for Near-Threshold Computing with Performance Guarantees
Ioannis Stamelakos, Sotirios Xydis, Gianluca Palermo, and Cristina Silvano

Part III Memory System Design for NTC

4 Resizable Data Composer (RDC) Cache: A Near-Threshold Cache Tolerating Process Variation via Architectural Fault Tolerance
Avesta Sasan, Fadi J. Kurdahi, and Ahmed M. Eltawil

5 Memories for NTC
Tobias Gemmeke, Mohamed M. Sabry, Jan Stuijt, Pieter Schuddinck, Praveen Raghavan, and Francky Catthoor
Part I
NTC Opportunities, Challenges and Limits
Chapter 1 Extreme Energy Efficiency by Near Threshold Voltage Operation
Shekhar Borkar
Abstract Technology scaling will continue providing an abundance of transistors for integration, limited only by energy consumption. Near threshold voltage (NTV) operation has the potential to improve energy efficiency by an order of magnitude. We discuss benefits, challenges, and circuit and system design considerations for reliable operation over a wide range of supply voltage, from nominal down to the subthreshold region. A system designed for NTV can thus dynamically select modes of operation, from high performance, to high energy efficiency, to the lowest power.
Introduction
VLSI technology scaling has continued over the last several decades, enabling affordable, efficient gadgets that enrich lives and are now taken for granted. There were several challenges along the way threatening progress: design productivity in the 1980s, power consumption in the 1990s, and leakage in the last decade. Advances in design automation for productivity, clock gating, and power management came to the rescue. Although the technology scaling treadmill has continued doubling transistors every generation, supply voltage scaling has slowed, and consequently energy per operation no longer drops enough to utilize all the transistors. Therefore, the next challenge we face is energy efficiency: not just low power, but continuing to deliver logic throughput with much less energy consumption. An order of magnitude reduction in energy per operation will be required.
Subthreshold operation of circuits, where supply voltage is reduced below the threshold voltage of the transistor, was believed to be the most efficient operating point. Although this mode of operation consumes much lower power, it is not necessarily the most energy efficient, as we will show later. Rather, near threshold voltage (NTV) operation, where supply voltage is reduced close to the threshold, provides higher energy efficiency. We will describe the benefits of the NTV operation, issues and design challenges, experimental results, and opportunities to enable this new design paradigm.
S. Borkar (*)
Intel Corporation, 2111 NE 25th Ave, Hillsboro, OR 97124, USA
M. Hübner, C. Silvano (eds.), Near Threshold Computing, DOI 10.1007/978-3-319-23389-5_1
Benefits of Near-Threshold-Voltage (NTV)
At nominal operating voltage, the frequency of operation reduces almost linearly with reduction in the supply voltage, reducing performance linearly and reducing active energy per operation quadratically. Leakage power, too, reduces exponentially; therefore, reducing the supply voltage should not only reduce power but also improve energy efficiency. We expected this effect to continue through the subthreshold region, providing extreme energy efficiency, and conducted an experiment by designing a simple accelerator in 65 nm CMOS, taking into consideration all the design challenges described later, and evaluating it for energy efficiency [1]. The primary goal of this experiment was to evaluate the complex tradeoffs in performance, active energy, leakage energy, and overall energy efficiency; we expected energy efficiency to continue to improve with reduction in voltage, extending well into the subthreshold region of operation with even greater efficiency.
The results were, however, a little surprising, as shown in Fig. 1.1. As the supply voltage is reduced, the frequency reduces (a) and the energy efficiency increases (b) as expected; however, the efficiency peaks near the threshold voltage of the transistor and then starts dropping in the subthreshold region. This unexpected reduction is explained by the following argument: in the subthreshold region leakage power dominates, and although it reduces with voltage, the reduction in frequency is larger than the reduction in leakage power, which reduces energy efficiency. Therefore, it is desirable to operate close to the threshold voltage of the transistor for maximum energy efficiency, providing an order of magnitude higher energy efficiency compared to operating at the nominal supply voltage. Subthreshold operation does yield even lower power consumption, but at the expense of reduced energy efficiency, which may be desirable in some applications.

Fig. 1.1 Energy efficiency of NTV operation
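The shape of this tradeoff can be reproduced with a simple first-order model. The sketch below is purely illustrative: the smooth EKV-style current expression and every constant (threshold voltage, slope factor, leakage magnitude) are assumptions, not values fitted to the 65 nm accelerator.

```python
import math

PHI_T = 0.026   # thermal voltage at room temperature (V)
N = 1.5         # subthreshold slope factor (assumed)
VT = 0.35       # threshold voltage (assumed, V)
C = 1.0         # normalized switched capacitance
I_LEAK0 = 5e-4  # normalized leakage current magnitude (assumed)

def on_current(v):
    # EKV-style smooth drive current: exponential below VT, ~quadratic above.
    return (N * PHI_T) ** 2 * math.log(1 + math.exp((v - VT) / (2 * N * PHI_T))) ** 2

def frequency(v):
    # Gate delay ~ C * V / I_on, so frequency ~ I_on / (C * V).
    return on_current(v) / (C * v)

def energy_per_op(v):
    e_dynamic = C * v ** 2                  # active CV^2 energy per operation
    e_leakage = I_LEAK0 * v / frequency(v)  # leakage power times cycle time
    return e_dynamic + e_leakage

voltages = [0.20 + 0.01 * i for i in range(101)]   # sweep 0.20 V .. 1.20 V
v_best = min(voltages, key=energy_per_op)          # minimum lands just above VT
```

Sweeping the supply reproduces the qualitative curve of Fig. 1.1: dynamic energy shrinks quadratically while the leakage term (leakage power divided by an exponentially collapsing frequency) blows up in subthreshold, so total energy per operation bottoms out slightly above the threshold voltage.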
Subsequent experiments show that the benefits of NTV operation continue with technology scaling, with measurements confirming benefits in 45, 32, and 22 nm technologies [2–4]. Notice that this includes even the new tri-gate (FinFET) transistor technology (22 nm), clearly showing benefits across today's technology generations, with more to come.
NTV Design Challenges and Solutions
A design following conventional design practices will scale in voltage, improving energy efficiency; however, the voltage scaling will be limited for several reasons. First, process variations will play an important role, hindering the effectiveness of voltage scaling. Second, voltage-sensitive circuits will start failing well before the supply voltage reaches the threshold voltage; the circuits have to be designed to operate in the NTV mode, comprehending the side effects of lowering the voltage. Third, subthreshold leakage power becomes a substantial portion of the total power. Finally, the impact on reliability, such as soft errors, must be considered. In this section we discuss some of the major design challenges and solutions. A detailed discussion of this topic may be found in Dreslinski et al. [5].
Effect of Process Variations at NTV
As the supply voltage approaches the threshold voltage, a small change in the supply voltage results in a large change in logic delay or frequency of operation. Figure 1.2 shows modeling of the frequency of a logic block with voltage scaling. The frequency reduces almost linearly, as expected. However, even a 5 % change in the supply voltage or in the threshold voltage (process variation) produces an increasingly large spread in frequency as the voltage is reduced. As much as 50 % variation in frequency may be expected near the threshold voltage.
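A first-order delay model makes this growing sensitivity concrete. The alpha-power-law constants below are assumptions for illustration, not values extracted from the figure:

```python
def rel_freq(v, vt=0.35, alpha=1.3):
    # Alpha-power-law frequency model: f ~ (V - Vt)^alpha / V (assumed constants).
    return (v - vt) ** alpha / v

def spread(v_nom, pct=0.05):
    # Fractional frequency spread caused by a +/- pct supply-voltage variation.
    lo = rel_freq((1 - pct) * v_nom)
    hi = rel_freq((1 + pct) * v_nom)
    return (hi - lo) / rel_freq(v_nom)

s_nominal = spread(1.10)   # roughly a 10 % spread at nominal voltage
s_ntv = spread(0.45)       # close to a 50 % spread near threshold
```

The same ±5 % supply perturbation that barely moves the frequency at 1.1 V swings it by nearly half near threshold, because the gate overdrive (V − Vt) itself becomes a small number.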
Figure 1.3a shows Monte Carlo simulations of the spread of frequency at nominal voltage as well as at NTV. At nominal voltage the spread is ±18 %, and it increases to ±2× at NTV. Figure 1.3b shows the impact of temperature, increasing the spread from ±5 % at nominal voltage to ±2× at NTV across the temperature range. It is important to note that this effect is fundamental; that is, logic designed at nominal voltage will encounter large variations when operated at NTV, and so will logic designed specifically for NTV operation.
To compensate for logic performance variations, several techniques have been proposed, including applying body bias. These conventional techniques will have limited scope because deeply scaled technologies have either no body, or little body effect left, and the energy cost of fine-grain variation control could reduce the energy benefit.

Fig. 1.2 Modeling frequency variation

Fig. 1.3 Modeling and measurements of variations

Fig. 1.4 Frequency assignment in a many-core system
We propose to tolerate the effect of variations using system-level techniques. For example, in a many-core system where the number of cores is very large, the cores will exhibit different frequencies of operation due to variations. Each core is assigned the nearest supported frequency of operation, and due to the law of large numbers, the overall logic throughput of the chip will not be affected, as shown in Fig. 1.4. This can be achieved by intelligent system software that dynamically reconfigures the system by introspection: considering the instantaneous throughput requirement, energy consumption, and frequency of operation of each core, and managing the system within established limits. That is why hardware/software co-design will be an important consideration in harvesting the benefits of NTV.
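The frequency-binning idea can be sketched as a small simulation. The per-core frequency distribution and the set of supported bins are assumptions; the point is only that the chip-level sum concentrates near its mean even though individual cores differ by whole bins:

```python
import random

random.seed(1)
N_CORES = 1000
BINS = [0.5 + 0.1 * i for i in range(9)]   # supported clock bins 0.5..1.3 (assumed)

# Per-core maximum frequency under process variation (assumed distribution);
# cores slower than the lowest bin would be disabled in practice.
native = [random.gauss(1.0, 0.08) for _ in range(N_CORES)]

# Each core runs at the fastest supported bin it can still meet.
assigned = [max((b for b in BINS if b <= f), default=BINS[0]) for f in native]

# Law of large numbers: aggregate throughput stays close to N_CORES times the
# mean binned frequency, with only a tiny relative deviation.
throughput = sum(assigned)
```

Any single core may lose close to a full bin of speed, but the relative deviation of the 1000-core sum shrinks roughly as 1/sqrt(N), which is why chip-level throughput is essentially unaffected.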
Subthreshold Leakage
The subthreshold leakage power at NTV shows two adverse effects: (1) disproportionately large leakage power, making it a substantial portion of the total power, and (2) higher variability in the leakage power itself. Careful examination of Fig. 1.1 shows that across the entire supply voltage range the total power reduces by four orders of magnitude, but the leakage power reduces by only three orders of magnitude. The active power reduces cubically, but the leakage power does not; that is why a disproportionately large percentage of subthreshold leakage power is expected with NTV operation.
Figure 1.5 shows modeling of subthreshold leakage power in successive generations of technologies. Assuming 20 % of the total power is leakage in each generation at nominal voltage, it shows the percentage of leakage power increasing with NTV. As much as 50 % of the total power could be leakage, and with much increased variability. The total power consumption of the system is much lower, but a substantial portion of that power will be leakage. We will confirm this later with a sizable design experiment. This disproportionately large leakage current with high variability poses most of the design challenges at NTV.
Most logic designs show low average activity; hence, at NTV the active power is low and the leakage power dominates, reducing the effectiveness of NTV for energy efficiency. Therefore, fine-grain leakage power management, with sleep transistors or power-gating techniques, will be even more important.
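A rough model shows why the leakage share grows. All constants below (the alpha-power exponent, the DIBL-style leakage sensitivity, and the 20 % nominal calibration) are assumptions chosen to mirror the trend of Fig. 1.5, not technology data:

```python
import math

def rel_freq(v, vt=0.35, alpha=1.3):
    # Alpha-power-law frequency model (assumed constants).
    return (v - vt) ** alpha / v

V_NOM, V_NTV = 1.10, 0.45
DIBL = 1.5   # leakage-current sensitivity to Vdd, 1/V (assumed)

def i_leak(v):
    # Subthreshold leakage current shrinks somewhat at lower Vdd (DIBL effect).
    return math.exp(DIBL * (v - V_NOM))

# Calibrate switched capacitance so leakage is 20 % of total power at nominal.
ACT = 4 * i_leak(V_NOM) * V_NOM / (V_NOM ** 2 * rel_freq(V_NOM))

def leakage_fraction(v):
    p_active = ACT * v ** 2 * rel_freq(v)   # ~ C * V^2 * f: drops super-cubically
    p_leak = i_leak(v) * v                  # drops much more slowly
    return p_leak / (p_active + p_leak)
```

Because active power falls roughly with the cube of the voltage (and faster still as frequency collapses near threshold) while leakage power falls far more gently, the leakage share in this toy model grows from the calibrated 20 % at nominal to around half the total at NTV, consistent with the trend described above.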
Designing SRAM and Register File
Small signal arrays, such as static memory, are designed to operate in a narrow voltage range and need significant design consideration for NTV operation. 6T static memory cells are typically designed with small transistors for higher density, and thus have stability and yield issues at lower voltages. There are two potential solutions for static memory: (1) employ larger 6T memory cells, or 8T or 10T cells, which can operate at lower voltages, all at a cost in area, or (2) do not operate the static memory blocks at NTV. Since static memory's active energy consumption is relatively low in a system, the latter may be a good compromise.
Register file circuits at NTV are limited by contention in read/write circuits due to process variation, which becomes worse with technology scaling; minimum-sized devices are worst in this respect. Also, at lower voltages, increased write contention between the strong PMOS pull-up and the weak NMOS transfer devices across process variations could result in faulty behavior.
The register file circuit can be made NTV friendly by equipping the conventional dual-ended write cell with full transmission gates [6, 7], as shown in Fig. 1.6.
On the one hand, upsizing the NMOS transfer devices in a conventional dual-ended write cell would improve write contention; on the other hand, a higher threshold voltage in the cross-coupled inverter devices caused by process variation still increases write completion delay, limiting voltage scaling. Replacing the NMOS transfer devices with full transmission gates improves both contention and voltage scaling because: (a) it provides two paths to write a "1" or "0" to both bit-line nodes, averaging random variation across two transistors, (b) it writes a strong "1" and "0" on both sides, and (c) cell symmetry (NMOS and PMOS) reduces the effect of systematic variation.
This NTV-tolerant register file design does incur higher area and higher active energy, due to the transmission gate in place of a simple NMOS transfer device in the cell.
Fig. 1.5 Subthreshold leakage power
Designing Latches and Flip-Flops
The storage nodes in latches and flip-flops have weak keepers and large transmission gates. When the transmission gate for the slave stage of a conventional master-slave flip-flop circuit is turned off, the weak on-current from the slave keeper contends with the large off-current through the transmission gate. This causes the node voltage to drop, affecting the stability of the storage node. Low-voltage reliability of the flip-flops can be improved by using non-minimum channel length devices in the transmission gates to reduce off-currents, and by upsizing the keepers to improve on-currents and restore charge lost to leakage. The write operation remains unaffected since the keepers are interruptible. The circuit modifications shown in Fig. 1.7 reduce the worst-case droop by 4× in the ultra-low-voltage optimized design.

Fig. 1.6 NTV tolerant register file
To tolerate the effects of variations at low voltages, an averaging technique can be employed, as shown in Fig. 1.8 and described in Hsu et al. [4]. Vector flip-flops across two adjacent cells, with shared local minimum-sized clock inverters to average variation, reduce low-voltage hold time violations and improve the minimum supply voltage by 175 mV. The stacked min-delay buffers also limit variation-induced transistor speedup, improving hold time margin at low voltage by 7–30 %.
Multiplexers and Logic Gates
Wide multiplexers are also prone to static droops on nodes shared by transmission gates at low voltages. Such structures are typical of one-hot multiplexers, where the on-current of the one selected input contends with the off-currents of the remaining unselected inputs. To avoid this effect, wide multiplexers should be remapped using 2:1 multiplexers, as shown in Fig. 1.9, thereby reducing the worst-case off-current contention. Remapping a one-hot 4:1 multiplexer to an encoded 4:1 multiplexer composed of 2:1 multiplexers results in up to a 3× reduction in the worst-case static droop.

Fig. 1.7 NTV friendly flip-flop design

Fig. 1.8 Vector flip-flop

Fig. 1.9 Multiplexers redesigned for NTV
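A back-of-the-envelope count shows where the 3× figure comes from: a one-hot mux has one shared node where width − 1 off devices contend with the winner, while an encoded tree of 2:1 muxes bounds contention at one off device per shared node. The sketch below is just that arithmetic, not a circuit simulation:

```python
def onehot_contention(width):
    # One-hot WIDTH:1 mux: a single shared output node where the on-path of
    # the selected input fights the off-state leakage of (width - 1) passgates.
    return width - 1

def encoded_tree_contention(width):
    # Encoded tree of 2:1 muxes: every shared node merges exactly two paths,
    # so at most one off-state passgate contends at any node.
    return 1

# Worst-case contender ratio for the 4:1 remap discussed in the text.
droop_reduction = onehot_contention(4) / encoded_tree_contention(4)
```

For a 4:1 mux the contender count drops from 3 to 1, consistent with the "up to 3×" static-droop reduction quoted above; wider one-hot muxes would benefit even more.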
Static logic gates and combinational logic, too, need special consideration for NTV operation, as discussed in Seok et al. [8] and Jain et al. [9]. Figure 1.10 shows the impact of random process variations (6σ) on relative logic performance, considering (1) depth of stacks in logic gates (fan-in), (2) width of multiplexers, (3) choice of threshold voltage (Vt), and (4) transistor widths in the gates.
Figure 1.10a shows that delay increases exponentially with the depth of the stack, limiting logic fan-in to 2 or 3 inputs. This limitation on fan-in could increase the number of logic gates in a logic path and needs careful attention. Figure 1.10c shows that wide transmission-gate-based multiplexers, too, need to be limited to 2 or 3 inputs, again potentially increasing the number of gates in a given logic path. In general, limiting the depth of the stack, or gate fan-in, results in more gates in the logic path, which is preferred for NTV but not necessarily optimal for a nominal-voltage design. Figure 1.10b, d clearly show that nominal Vt and increased device width are the optimal choices for NTV design, which is not the case for a nominal-voltage design. Therefore, a logic design optimized for NTV is probably not optimal for nominal operation and vice versa; one has to pick the optimal design point.
Level Shifters
The use of multiple supply voltage domains results in the need for level shifter circuits at the low-to-high voltage domain boundaries. A conventional level shifter uses a CVSL stage to provide the up-conversion functionality, with the associated contention currents contributing a significant portion of the level shifter power. Driving the output load directly with the CVSL stage increases its size, while using additional gain stages after the level shifter to reduce CVSL stage loading results in increased delay.
Figure 1.11a shows a 2-stage cascaded split-output level shifter. An intermediate supply voltage for up-conversion over such a large voltage range limits the maximum current ratio between the higher-supply PMOS pull-up and lower-supply NMOS pull-down devices for correct CVSL stage functionality. Energy-efficient up-conversion from subthreshold voltage levels to nominal supply outputs is achieved by decoupling the CVSL stage of this level shifter from the output, enabling a downsized CVSL stage for the same load without extra gates in the critical path. Reduced contention currents in a downsized CVSL stage enable the split-output design to achieve up to 20 % energy reduction for equal fan-out and delay.
Ultra-low-voltage split-output level shifters are described in Hsu et al. [4] and shown in Fig. 1.11b. This level shifter decouples the CVSL stage from the output driver stage and interrupts the contention devices, improving the minimum supply voltage by 125 mV. For equal fan-in/out, the level shifter weakens the contention devices, thereby reducing power by 25–32 %.
Fig. 1.10 Logic design considerations
Soft Errors and Reliability
Single-event upsets (soft errors) are of concern, especially with NTV operation, because the lower supply voltage increases susceptibility. These errors are caused by alpha particles and, more importantly, cosmic rays (neutrons) hitting silicon chips, creating enough charge on a node to flip a memory cell or a logic latch. These errors are transient and random. It is relatively easy to detect them in memories by protecting the memories with parity, and correcting them is also relatively straightforward by employing error-correcting codes. However, if such a single-event upset occurs in random logic state, it is difficult to detect and correct. The soft error rate per bit has been decreasing with technology scaling; however, the number of bits almost doubles each generation, with the net effect of increased soft errors at the system level.
Recent results show that soft error rates do not increase as rapidly with NTV operation as previously feared [10]. This experiment shows that the error rate increase is less than an order of magnitude as the supply voltage is reduced, as shown in Fig. 1.12. Nevertheless, this remains an active topic of investigation in the community.
NTV operation will have some positive impacts on reliability. Due to the reduced supply voltage, electric fields are reduced, and lower power consumption will yield lower junction temperatures. Therefore, device aging effects, such as NBTI, will be less of a concern. Lower temperature and lower currents will also reduce electromigration-related defects.
Fig. 1.11 Level shifters
Experimental NTV Processors
Following the NTV design guidelines, several experimental designs have been reported [9, 10] with encouraging results. We highlight an experimental Pentium® processor designed to operate from nominal voltage to NTV, as well as in the subthreshold region, with varying performance, power, and energy efficiency.
The experimental processor was designed in a 32 nm bulk CMOS process, following all of the design guidelines discussed before, with the goal of operating over the full voltage range, from nominal to subthreshold. The fabricated processor was housed in a standard PC platform, booted popular operating systems, and ran several industry-standard applications including benchmarks.
The results show that at nominal supply voltage it provides the highest performance with modest power and modest energy efficiency. In the subthreshold region it provides the lowest power, with reduced performance and modest energy efficiency. At NTV, however, it provides the highest energy efficiency, with three orders of magnitude lower power than the original design of two decades ago in 0.7 μm technology, yet delivering the same performance it did then. This example shows that an NTV design can provide a wide dynamic range, from high performance to low power to high energy efficiency, as shown in Fig. 1.13. Notice that this experiment reports only a 5× improvement in energy efficiency at NTV for two reasons: (1) the original (microarchitecture) design in 0.7 μm technology did not comprehend NTV guidelines, and (2) the SRAM (caches) in this design had limited voltage scalability towards NTV.
Figure 1.14 shows the measured results. The processor voltage scales from a maximum of 1.2 V down to the subthreshold region below 300 mV, but the memory voltage scales down to only 550 mV, as explained before. The frequency reduces with supply voltage, and the total power consumption, too, reduces almost cubically. The total energy per cycle reduces by almost 4.7× as the supply voltage reaches NTV, but starts increasing as it enters the subthreshold region. Although the dynamic energy per cycle keeps reducing in the subthreshold region, the leakage energy per cycle increases exponentially, thus increasing the total energy consumed per cycle.

Fig. 1.12 Soft error rate at NTV
Figure 1.15 gives further insight into power consumption, considering dynamic and leakage power, logic and memory, and the three modes of operation (superthreshold, NTV, and subthreshold). In the superthreshold mode, most of the power is active power, with only 11 % of the total power in logic leakage and a small portion in the memory. In the NTV mode, 53 % of the power is in active logic and 27 % in logic leakage, with 15 % in memory leakage. This confirms that leakage power becomes a substantial portion of the total power at NTV. In the subthreshold region, the power consumption is dominated by the leakage power of both logic and memory.
These results were expected, as discussed before, but they are now quantified and confirmed with a significant prototype design, boosting our confidence.
System Level Optimization
Although NTV has the potential to improve the energy efficiency of logic throughput by an order of magnitude, careful system-level optimization is required to determine the most efficient NTV operating point.
Fig. 1.13 NTV Pentium® processor
Fig. 1.14 Frequency and power measurements
Fig. 1.15 Active and leakage power breakdown
Fig. 1.16 Compute and global interconnect energy
In future technologies, logic energy (with its own local interconnect) will scale disproportionately with respect to global interconnect energy, as shown in Fig. 1.16. That is, energy per operation will reduce faster than the energy to move data over a fixed distance. Since NTV reduces the frequency of operation, it reduces the throughput of a logic block; hence, more logic will be needed for constant throughput (for example, increased parallelism). This may incur more data movement, adding data-movement energy to the system. As the supply voltage comes closer to the threshold with NTV, the system's logic energy reduces but the data-movement energy increases. Hence, a global optimization at the system level is required to determine the optimal NTV operating point.
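This tradeoff can be sketched as a toy optimization. Everything below is assumed for illustration (the frequency model, the movement coefficient, the sqrt-of-area wire-length scaling); the takeaway is only that accounting for data-movement energy pushes the optimal supply voltage above the logic-only minimum:

```python
def rel_freq(v, vt=0.35, alpha=1.3):
    # Alpha-power-law frequency model (illustrative constants).
    return (v - vt) ** alpha / v

M_MOVE = 0.05   # data-movement energy coefficient (assumed)

def system_energy_per_op(v, m=M_MOVE):
    n_units = 1.0 / rel_freq(v)        # parallelism needed for fixed throughput
    e_logic = v ** 2                   # ~ CV^2 logic energy per operation
    e_move = m * n_units ** 0.5        # average wire length ~ sqrt(area) (assumed)
    return e_logic + e_move

voltages = [0.40 + 0.01 * i for i in range(81)]            # sweep 0.40 .. 1.20 V
v_opt = min(voltages, key=system_energy_per_op)            # with data movement
v_logic_only = min(voltages, key=lambda v: system_energy_per_op(v, 0.0))
```

Lowering the voltage shrinks logic energy but demands more parallel units for the same throughput, stretching wires and raising movement energy; the system-level optimum therefore sits somewhat above the point that logic energy alone would pick.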
Prospects for NTV
The great old days of Moore's law scaling, typified by dramatic improvements in transistor density, speed, and energy, delivered a 1000-fold performance improvement. The progress continues, but it will be more difficult: technology scaling will keep improving transistor density, but with comparatively little improvement in transistor speed and energy. As a result, in the future, the frequency of operation will increase slowly, and energy will be the key limiter of performance. That is why there is a fear of dark silicon, unused silicon or idle transistors, simply because of energy. With business as usual, and without continued innovation, this would be a likely scenario, but it is far from inevitable. Future designs will use large-scale parallelism with heterogeneous cores, a few large cores and a large number of small cores, operating at low frequency and low voltage near threshold (NTV) for extreme energy efficiency [11]. Aggressive use of various types of customized accelerators will yield the highest performance and greatest energy efficiency on many applications. The objective will be the purest form of energy-proportional computing, at the minimum levels of energy possible. Heterogeneity in compute and communication hardware will be essential to optimize performance for energy-proportional computing and to cope with variability, all made possible by NTV.
Conclusion
Moore's law will continue providing an abundance of transistors for integration, limited only by energy consumption. Near threshold voltage (NTV) operation of logic can improve energy efficiency by an order of magnitude. We have discussed several NTV design techniques for such future designs, allowing them to operate over a wide range of supply voltage and to dynamically select modes of operation, from high performance, to high energy efficiency, to the lowest power.
References
1. Kaul H et al (2009) A 320 mV 56 μW 411 GOPS/Watt ultra-low voltage motion estimation accelerator in 65 nm CMOS. IEEE Journal of Solid-State Circuits 44(1):107–114
2. Kaul H et al (2010) A 300 mV 494 GOPS/W reconfigurable dual-supply 4-way SIMD vector processing accelerator in 45 nm CMOS. IEEE Journal of Solid-State Circuits 45(1):95–102
3. Kaul H et al (2012) A 1.45 GHz 52-to-162 GFLOPS/W variable-precision floating-point fused multiply-add unit with certainty tracking in 32 nm CMOS. In: IEEE International Solid-State Circuits Conference (ISSCC), Feb 2012, pp 182–184
4. Hsu S et al (2012) A 280 mV–1.1 V 256b reconfigurable SIMD vector permutation engine with 2-dimensional shuffle in 22 nm CMOS. In: IEEE International Solid-State Circuits Conference (ISSCC), Feb 2012, pp 178–180
5. Dreslinski R et al (2010) Near-threshold computing: reclaiming Moore's law through energy efficient integrated circuits. Proceedings of the IEEE 98(2):253–266
6. Kaul H et al (2012) Near-threshold voltage (NTV) design: opportunities and challenges. In: Design Automation Conference (DAC), June 2012, pp 1149–1154
7. Agarwal A et al (2010) A 32 nm 8.3 GHz 64-entry × 32b variation tolerant near-threshold voltage register file. In: VLSI Circuits Symposium, 2010, pp 105–106
8. Seok M et al (2008) The Phoenix processor: a 30 pW platform for sensor applications. In: VLSI Circuits Symposium, 2008, pp 188–189
9. Jain S et al (2012) A 280 mV-to-1.2 V wide-operating-range IA-32 processor in 32 nm CMOS. In: IEEE International Solid-State Circuits Conference (ISSCC), Feb 2012, pp 66–68
10. Pawlowski R et al (2014) Characterization of radiation-induced SRAM and logic soft errors from 0.33 V to 1.0 V in 65 nm CMOS. In: Custom Integrated Circuits Conference (CICC), 2014, pp 1–4
11. Borkar S et al (2011) The future of microprocessors. Communications of the ACM 54(5):67–77
Part II
Micro-Architecture Challenges and Energy Management at NTC
Chapter 2
Many-Core Architecture for NTC: Energy Efficiency from the Ground Up
Josep Torrellas
Abstract The high energy efficiency of NTC enables multicore architectures with unprecedented levels of integration, such as multicores that include 1000 sizable cores and substantial memory on the die. However, to construct such a chip, we need to fundamentally rethink the whole compute stack from the ground up for energy efficiency. First of all, we need techniques that minimize and tolerate process variation. It is also important to conceive highly efficient voltage regulation, so that each region of the chip can operate at the most efficient voltage and frequency point. At the architecture level, we want simple cores organized in a hierarchy of clusters. Moreover, techniques to reduce the leakage power of on-chip memories are also needed, as well as dynamic voltage guard-band reduction in variation-afflicted on-chip networks. It is also crucial to develop techniques to minimize data movement, which is a major source of energy waste. Among the techniques proposed are automatically managing the data in the cache hierarchy, processing in near-memory compute engines, and efficient fine-grained synchronization. Finally, we need core-assignment algorithms that are both effective and simple to implement. In this chapter, we describe these issues.
Introduction
As semiconductor devices continue to shrink, it is clear that we are about to witness stunning levels of integration on a chip. Sometime early in the next decade, as we reach 7 nm, we will be able to integrate, for example, 1000 sizable cores and substantial memory on a single die. There are many unknowns as to what kind of architecture such a many-core chip should have to make it general purpose. What is clear, however, is that the main challenge will be to make it highly energy efficient. Energy and power consumption have emerged as the true limiters to developing more capable architectures.
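The energy argument sketched in the preface rests on the quadratic dependence of dynamic power on the supply voltage. The short sketch below works that relation through numerically; the capacitance, voltage, and frequency values are assumed, illustrative numbers, not measurements from any chip discussed in this book.

```python
# Back-of-the-envelope dynamic-power arithmetic for voltage scaling.
# P_dyn = C_eff * Vdd^2 * f. All numbers are assumed, illustrative values.

def dynamic_power(c_eff, vdd, freq):
    """Dynamic switching power in watts: P = C_eff * Vdd^2 * f."""
    return c_eff * vdd ** 2 * freq

C_EFF = 1e-9  # effective switched capacitance in farads (assumed)

# Nominal operation vs. near-threshold operation (assumed operating points).
p_nominal = dynamic_power(C_EFF, vdd=1.0, freq=2e9)
p_ntc = dynamic_power(C_EFF, vdd=0.5, freq=0.5e9)

# Energy per cycle (P / f) depends only on C_eff * Vdd^2, so halving Vdd
# cuts energy per operation to a quarter, even though the clock must slow.
e_nominal = p_nominal / 2e9
e_ntc = p_ntc / 0.5e9

print(f"power ratio (NTC / nominal):        {p_ntc / p_nominal:.4f}")
print(f"energy-per-cycle ratio (NTC / nom): {e_ntc / e_nominal:.4f}")
```

This is the ideal case only: as the preface and this chapter note, leakage, lower operating frequency, and parameter variability claw back part of the quadratic saving in practice.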
J. Torrellas (*)
University of Illinois, Urbana-Champaign, Champaign, IL, USA
M. Hübner, C. Silvano (eds.), Near Threshold Computing, DOI 10.1007/978-3-319-23389-5_2
—A groove must be plowed the full length of a piece to work it to advantage. Where a mortise-and-tenon joint is to be made in which the grooved surface is to become a part, the tenon must be so cut as to allow its filling the groove. The mortise should be cut before the groove is plowed. The tenon, after being worked the full width, is gaged from the face edge to a width equal to the length of the mortise and worked to that size. Fig. 183.
Fig. 183.
Especial care must be taken in gluing up the frame that no glue shall get into the grooves or on the edges of the panel.
109. Rabbeting.
Fig. 184 shows a corner of a frame rabbeted to receive a glass. Rabbets are best worked with either a rabbet plane or the combination plane. In rabbeting across the grain the spur must be set parallel with the edges of the cutter.
Fig. 184.
Fig. 185.
Since the parts of the frame are rabbeted the full length for convenience, a special joint is necessary at the corners. The mortises are cut before the rabbets are worked. The tenons are laid out so that the shoulder on one side shall extend as far beyond the shoulder on the opposite side as the rabbet is deep. Fig. 185.
Where rabbeting must be worked with a chisel alone, Fig. 186 illustrates the manner of loosening up the wood preparatory to removing it, when the rabbet extends along the grain of the wood.
Fig. 186.
To place glass panels in rabbets, first place a slight cushion of putty in the rabbet that the glass may rest against it. A light cushion between the glass and the fillet will serve to keep the glass from breaking and will keep it from rattling. Fig. 187.
Fig. 187.
110. Fitting a Door.
—A door is a frame with a panel or a combination of panels. The names of the parts of a door and their relative positions are indicated in Fig. 188.
Fig. 188.
(1) Mark with a trysquare and saw off the lugs, the parts of the stiles which project beyond the rails. (2) Plane an edge of the door until it fits a side of the frame against which it is to be hung. If the frame is straight, this edge may be planed straight. It is not wise to take for granted the squareness or straightness of a frame. A test or series of tests may first be made with square and straight-edge. A mechanic, however, usually planes an edge until it fits the frame, testing by holding the door against the frame as near to its position as its size will allow. (3) Plane the bottom or top edge of the door until it fits the frame properly when the first planed edge is in position. (4) Measure the width of the frame at its top and bottom, Fig. 189, and transfer these dimensions to the top and bottom of the door, connecting them with a straight edge. When approaching the line, in planing, place the door against the frame often enough to see where the allowances must be made for irregularities in the frame. (5) The length of the frame may next be measured on each side and these dimensions transferred to the door. Connect them with a straight edge and plane and fit as was directed in the third step.
A door to work well must not be fitted perfectly tight; it must have a little “play,” the amount depending upon the size of the door.
The edge of the door which is to swing free is usually planed slightly lower at the back arris than at the front. An examination of
the movement of an ordinary house door will show the reason for this.
111. Hinging a Door.
—The hinges most commonly used in cabinet making and carpentry are the kind known as butts. Where the door stands in a vertical position, hinges in which the two parts are joined by a loose pin are generally used. By removing the pins the door may be removed without taking the screws out of the hinge. Such hinges are more easily applied than those with the fixed pin.
Fig. 190.
Fig. 191.
(1) Place the door in position; keep it tight against the top and the hinge side of the frame. (2) Measure from top and bottom of the door to locate the position for the top of the higher hinge and the bottom of the lower hinge. Usually, the lower hinge is placed somewhat farther from the bottom than the higher hinge is from the top. (3) With
the knife or chisel mark on both door and frame at the points just located, Fig. 190. (4) Take out the door, place the hinge as in Fig. 191, and mark along the ends, with a knife. In a similar manner mark the frame. (5) Make certain that the openings on door and on frame are laid off so as to correspond before proceeding further. (6) Set the gage for the depth the hinge is to be sunk and gage both door and frame. (7) Set another gage for width of openings and gage both door and frame, keeping the head of the gage against the front of the door. (8) Chisel out these gains on door and frame. (9) If loose-pin butts are used, separate the parts and fasten them in place. Use a brad awl to make openings for the screws. To insure the hinges’ pulling tight against the side of the gain make the holes just a little nearer the back side of the screw hole of the hinge. Put the door in place and insert the pins. It is a good mechanic who can make a door hang properly the first time it is put up. It is better, therefore, to insert but one or two screws in each part of a hinge until the door has been tried. (10) If the door hangs away from the frame on the hinge side, take it off; take off hinge on door or frame, or both if the crack is large; chisel the gain deeper at its front. By chiseling at the front only and feathering the cut towards the back, the gain needs to be cut but about one-half as deep as if the whole hinge were sunk. If the door should fail to shut because the hinge edge strikes the frame too soon, the screws of the offending hinge must be loosened and a piece of heavy paper or cardboard inserted along the entire edge of the gain. Fasten the screws and cut off the surplus paper with a knife. If plain butt hinges are used the operations are similar to those just described except that the whole hinge must be fastened to the door and the door held in place while fastening the hinges to the frame.
112. Locks.
—Locks which are fastened upon the surface of a door are called rim locks. Those which are set into mortises cut in the edge of the door are called mortise locks. Locks are placed somewhat above the middle of the door for convenience as well as appearance. Three styles of cabinet locks such as are used on drawers and small boxes are shown in Fig. 192.
Fig. 192.
The manner of applying a cabinet lock will be suggested by the lock itself. On surface locks, (1) the lock is held against the inside of the door or drawer and the position of the keyhole is marked. (2) This hole is bored. (3) The lock is screwed in place, and (4) the escutcheon fastened to the outer or front surface. If a face plate is used, the door is closed, the position marked, after which the door is opened and the plate is set. The face plate is mortised into the frame so that its outer surface shall be slightly lower than that of the wood. With a lock such as the box lock, Fig. 192, sufficient wood must be removed from the mortise so that the bolt may act properly before the plate is screwed fast.
PART III. WOOD AND WOOD FINISHING.
CHAPTER X.
WOOD.
113. Structure.
—For convenience, tree structure is usually studied (1) in transverse or cross section, (2) radially, (3) tangentially.
Fig. 193. V, vessels or pores; T.S., tangential section; C.S., transverse section.
A transverse section is obtained by cutting a log at right angles to its length; a radial section by cutting it along the radius; a tangential section by making a cut at right angles to a radius. Fig. 193.
Fig. 194.
If we should cut transversely a young tree, a sprout, or branch of an oak or similar tree, we should find it composed of three layers of tissue (1) pith or medulla, (2) wood, (3) bark, Fig. 194. These tissues, if magnified, would be found composed of little closed tubes or cells. Fig. 195.
Examine the end of a log cut from a tree such as the oak; we shall find that the center, which in the young tree was soft, has become hard and dry, and that upon it are marked a series of concentric rings —rings having a common center. These rings are known as annual rings because one is added each year.
Usually, about three-quarters of the rings from the center outward will be found to have a different color from the remaining ones. These inner rings form what is called heartwood. The wood of the remaining rings will be found softer and to contain a larger proportion of sap. This part is called sapwood. Young trees are composed mainly of sapwood. As the tree grows older more of it is changed to
heartwood, the heartwood becoming greater in proportion to the sapwood with age.
Fig. 195.
Upon examining these rings each will be found to be made up of two layers; one a light, soft, open, rapid growth formed in the spring, the other, a dark, hard, close, slow growth formed in the summer.
Frequently, the center of the annual rings is not in the center of the log. Fig. 196. This is due to the action of the sun in attracting more nourishment to one side than to the other.
Surrounding the sapwood is the bark. The inner part of the bark is called bast and is of a stringy or fibrous nature. Bark is largely dead matter formed from bast, Fig. 195. Its function is to protect the living tissues.
Between the bast and the last ring of the woody tissue is a thin layer called the cambium. This layer is the living and growing part of the tree. Its cells multiply by division and form new wood cells on the inside and new bast cells on the outside.
Fig. 196.
Heartwood is dead so far as any change in its cells is concerned. Its purpose is merely to stiffen and support the weight of the tree. Sapwood, on the other hand, has many active cells which assist in the life processes of the tree, tho only in the outer layer of cells, the cambium, does the actual growing or increasing process take place.

Again examining the end of the log, we shall find bright lines radiating from the center. They are composed of the same substance as the pith or medulla and are called pith or medullary rays. These rays are present in all trees which grow by adding ring upon ring but in some they are hardly visible. The purpose of these horizontal cells is to bind the vertical cells together and to assist in distributing and storing up plant food.
Fig. 197 shows a log cut longitudinally or lengthwise. The lines we call grain, it will be seen, are the edges of the annual rings, the light streaks being an edge view of the spring layer and the dark streaks an edge view of the summer or autumn wood.
Fig. 197.
Fig. 198.
Knots are formed by the massing or knotting of the fibers of the tree through the growth of a branch. Fig. 198 shows the manner in which the fibers are turned. This packing of the fibers is what causes a knot to be so much harder than the rest of the wood.
114. Growth.
—Sap is the life blood of the tree. In the winter when most of the trees are bare of leaves there is but very little circulation of the sap. The coming of spring with its increase of heat and light, causes the tree to begin to take on new life; that is, the sap begins to circulate. This movement of sap causes the roots to absorb from the soil certain elements such as hydrogen, oxygen, nitrogen and carbon, also mineral salts in solution. The liquid thus
absorbed works its way upward, mainly by way of the sapwood and medullary cells. Upon reaching the cambium layer, the nourishment which it provides causes the cells to expand, divide and generate new cells. It also causes the buds to take the form of leaves.
When the sap reaches the leaves a chemical change takes place. This change takes place only in the presence of heat and light, and is caused by the action of a substance called chlorophyll. The importance of the work performed by chlorophyll cannot be overestimated. Nearly all plant life depends upon it to change mineral substance into food. Animals find food in plant life because of this change.
Assimilation is the process of taking up and breaking up, by the leaves, of carbonic acid gas with which the cells containing chlorophyll come in contact. Carbon, one of the elements, is retained, but oxygen, the other element, is returned to the air. Carbon is combined with the oxygen and hydrogen of the water, which came up from the roots, to form new chemical compounds. Nitrogen and earthy parts, which came with the water, are also present.
Chlorophyll gives to leaves and young bark their green color.
The roots of the trees are constantly drinking plant food in the daytime of spring and early summer. From midsummer until the end of summer the amount of moisture taken in is very small so that the flow of sap almost ceases.
The leaves, however, are full of sap which, not being further thinned by the upward flow, becomes thickened thru the addition of carbonic acid gas and the loss of oxygen.
Toward the end of summer this thickened sap sinks to the under side of the leaf and gradually flows out of the leaf and down thru the bast of the branch and trunk, where another process of digestion takes place. One part of this descending sap which has been partly digested in the leaves and partly in the living tissues of root, trunk and branch, spreads over the wood formed in the spring and forms the summer wood. The second part is changed to bark. What is not used at once is stored until needed.
The leaves upon losing their sap change color, wither and drop off. By the end of autumn the downward flow of changed sap from the
leaves is completed and the tree has prepared itself for the coming winter.
It must be remembered that the foregoing changes are made gradually. After the first movement of the sap in the early spring has nourished the buds into leaves of a size sufficient to perform work, there begins a downward movement of food materials—slight at first, to be sure, but ever increasing in volume until the leaves are doing full duty. We may say, therefore, that the upward movement of the sap thru the sapwood and the downward flow of food materials thru the bast takes place at the same time, their changes being of relative volumes rather than of time.
115. Respiration and Transpiration.
—Plants, like animals, breathe; like animals they breathe in oxygen and breathe out carbonic acid gas. Respiration which is but another name for breathing, goes on day and night, but is far less active than assimilation, which takes place only during the day. Consequently more carbonic acid gas is taken in than is given out except at night when, to a slight extent, the reverse takes place, small quantities of carbonic acid gas being given off and oxygen taken in.
Very small openings in the bark called lenticles, furnish breathing places. Oxygen is also taken in thru the leaves.
Transpiration is the evaporation of water from all parts of the tree above ground, principally from the leaves.
The amount of water absorbed by the roots is greatly in excess of what is needed. That fresh supplies of earthy matter may reach the leaves, the excess of water must be got rid of. In trees with very thick bark, transpiration takes place thru the lenticles in the bottom of the deep cracks.
116. Moisture.
—Water is present in all wood. It may be found (1) in the cavities of the lifeless cells, fibers and vessels; (2) in the cell walls; and (3) in the living cells of which it forms over ninety per cent. Sapwood contains more water than heartwood.
Water-filled wood lacks the strength of wood from which the greater part of the moisture has been expelled by evaporation.
117. Shrinkage.
—Water in the cell walls—it makes no difference whether the cells are filled or empty—causes their enlargement and consequently an increase in the volume of the block or plank. The removal of this water by evaporation causes the walls to shrink; the plank becomes smaller and lighter. Thick walled cells shrink more than thin ones and summer wood more than spring wood. Cell walls do not shrink lengthwise and since the length of a cell is often a hundred or more times as great as its diameter the small shrinkage in the thickness of the cell walls at A and B, in Fig. 199, is not sufficient to make any noticeable change in the length of the timber.
Fig. 199.
Fig. 200.
Since the cells of the pith or medullary rays extend at right angles to the main body, Fig. 200, their smaller shrinkage along the radius of the log opposes the shrinkage of the longitudinal fibers. This is one reason why a log shrinks more circumferentially, that is along the rings, than it does radially or along the radii. A second cause lies in the fact that greatly shrinking bands of summer wood are interrupted, along the radii, by as many bands of slower shrinking spring wood, while they are continuous along the rings.
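The relative shrinkages described above can be put into rough figures. The sketch below assumes illustrative shrinkage fractions (tangential about twice radial, longitudinal nearly nil); the percentages and board dimensions are assumed examples, not values given in this text or belonging to any particular species.

```python
# Rough arithmetic for the drying shrinkage of a sawed board.
# Shrinkage fractions below are assumed, illustrative figures.

TANGENTIAL_SHRINK = 0.06     # 6% along the rings (assumed)
RADIAL_SHRINK = 0.03         # 3% along the radii (assumed)
LONGITUDINAL_SHRINK = 0.001  # nearly nil along the grain (assumed)

def dried_size(green, fraction):
    """Dimension after drying, given a fractional shrinkage."""
    return green * (1.0 - fraction)

# A flat-sawn board 10 in. wide, 1 in. thick, 96 in. long: its width
# runs roughly along the rings, its thickness roughly along a radius.
width = dried_size(10.0, TANGENTIAL_SHRINK)     # about 9.4 in.
thickness = dried_size(1.0, RADIAL_SHRINK)      # about 0.97 in.
length = dried_size(96.0, LONGITUDINAL_SHRINK)  # about 95.9 in.

print(width, thickness, length)
```

The width loses twice the proportion that the thickness does, which is the same unequal pull that makes boards cup and logs check, as Fig. 201 illustrates.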
Fig. 201.
This tendency of the log to shrink more tangentially, that is along the rings, leads to permanent checks. Fig. 201 A. It causes logs sawed into boards to take forms as shown in Fig. 201 B.
Warping is caused by uneven shrinkage. Sapwood, as a rule, shrinks more than heartwood of the same weight. The wood of pine,