Int. J. of High Performance System Architecture

A Survey of Architectural Techniques for DRAM Power Management

Sparsh Mittal
Electrical and Computer Engineering, Iowa State University, Iowa, USA, 50014
Email: sparsh@iastate.edu

Abstract: Recent trends of CMOS technology scaling and the widespread use of multicore processors have dramatically increased the power consumption of main memory. It has been estimated that modern data-centers spend more than 30% of their total power consumption in main memory alone. This excessive power dissipation has created the problem of the "memory power wall", which has emerged as a major design constraint inhibiting further performance scaling. Recently, several techniques have been proposed to address this issue. The focus of this paper is to survey several architectural techniques designed for improving the power efficiency of main memory systems, specifically DRAM systems. To help the reader gain insight into the similarities and differences between the techniques, this paper also presents a classification of the techniques on the basis of their characteristics. The aim of the paper is to equip engineers and architects with knowledge of state-of-the-art DRAM power saving techniques and to motivate them to design novel solutions for addressing the challenges presented by the memory power wall problem.

Keywords: Architectural techniques, power efficiency, energy saving, DRAM, main memory, survey, review, classification

Reference to this paper should be made as follows: Mittal, S. 'A Survey of Architectural Techniques for DRAM Power Management', Int. J. of High Performance System Architecture, Vol. 4, No. 2, pp. 110-119.

Biographical notes: Sparsh Mittal received his B.Tech. degree in Electronics and Communications Engineering from the Indian Institute of Technology, Roorkee, India. He was the graduating topper of his batch and his major project was awarded the Institute Silver medal. At present, he is pursuing his PhD degree in Electrical and Computer Engineering at Iowa State University, USA. He has been awarded scholarship and fellowship from IIT Roorkee and ISU. His research interests include memory system power efficiency, cache architectures in multicore systems, and real-time systems.

1 Introduction

Recent years have witnessed a dramatic increase in the power consumption of computing systems. As an example, in the year 2006 alone, the data-centers and servers in the U.S. consumed 61 billion kilowatt hours (kWh) of electricity [1]. Further, the energy consumption of main memory is becoming an increasing fraction of the total energy consumption of the system, in processors ranging from low-end to high-end. It has been estimated that as much as 40% of the total power consumed in smartphones and datacenters is attributed to the memory system [2–4]. Studies conducted on real server systems show that the memory system can consume as much as 50% more power than the processor cores [3].

There are several architectural and technological trends that mandate the use of a large amount of main memory resources and thus contribute to the increase in memory power consumption. Firstly, in modern processors, the number of cores on a single chip is on the rise [5, 6], and hence the pressure on the memory system has been increasing. Secondly, as we move to the exascale era, the requirements of data storage and computation are growing exponentially [7–10], and to fully exploit the value of such data, modern processors are using main memory (as opposed to persistent storage) as the primary data storage for critical applications. Thirdly, modern data-centers exhibit low average utilization [2] but frequent, brief bursts of activity, and thus, to meet the requirements of service-level agreements (SLAs), operators are forced to provision a large amount of main memory, sized for the worst-case requirement. To cater to these demands, modern processors use main memory with high bandwidth, frequency and capacity. Finally, CMOS technology scaling has enabled higher transistor packing density, which has further exacerbated the problems of power consumption and heating and has also inhibited the effectiveness of cooling solutions. Thus, improving the power efficiency of memory systems has become extremely important to continue to scale performance [8] and also to achieve the goals of sustainable computing.

Recently, many techniques have been proposed for optimizing the power efficiency of memory systems. In this paper, we review several of these techniques. As it is practically infeasible to review all the techniques proposed in the literature, we take the following approach to limit the scope of the paper. We only include techniques proposed for saving energy in DRAM systems and do not discuss other emerging storage technologies (e.g. phase change memory). We consider DRAM since it has traditionally been used as main memory because of its properties such as high density, high capacity, low cost and device standardization [11]. Also, we focus on architecture-level techniques which allow runtime power management, and do not discuss circuit-level innovations for reducing power consumption. Further, although techniques aimed at improving memory performance (e.g. reducing latency) are also likely to improve memory power efficiency, we only include those techniques that have been evaluated for improving memory power efficiency. Finally, since different techniques have been evaluated using different experimentation methodologies, we do not present their quantitative results; rather, we only discuss the key ideas of those techniques.

The remainder of the paper is organized as follows. In section 2, we briefly discuss the terminology used in DRAM systems and the sources of power consumption in them. Understanding the sources of power consumption also helps in gaining insight into the opportunities available for improving power efficiency. In section 3, we present a classification of the techniques proposed for managing DRAM power consumption, to highlight the similarities and differences among them. A more detailed discussion of these techniques is provided in section 4. Finally, section 5 provides the concluding remarks.
2 Background

In this section, we briefly discuss DRAM terminology [12] and the sources of power consumption in DRAM systems, to aid the discussion of the DRAM power management techniques in the following sections.

2.1 DRAM terminology

In DRAM terminology, a column is the smallest addressable portion of the DRAM device, and a row is a group of bits in the DRAM array that are sensed together upon receiving an activate signal. A DRAM bank is an array of DRAM cells which can be activated independently and has the same data bus width as the external output bus width. A DRAM rank is a group of DRAM devices which operate together to service requests from the memory controller. A DIMM (dual in-line memory module) is a printed circuit board containing one or more DRAM ranks, which provides an interface to the memory bus. A DRAM channel is a group of one or more DIMMs that handle requests from the memory controller. A typical DRAM system can have 2 channels, 2 DIMMs per channel, 2 ranks per DIMM and 8 banks per rank, for a total of 64 (= 2 × 2 × 2 × 8) banks.
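To make the hierarchy concrete, the short Python sketch below decodes a physical address into these coordinates for the example organization above. The field widths and their ordering are illustrative assumptions only; real memory controllers use many different (often hashed) interleaving schemes.

```python
# Toy physical-address decomposition for the example organization above:
# 2 channels x 2 DIMMs x 2 ranks x 8 banks, with 2^15 rows and 2^10
# columns per bank. Field widths and ordering are illustrative only.
FIELDS = [("column", 10), ("bank", 3), ("rank", 1), ("dimm", 1),
          ("channel", 1), ("row", 15)]  # low-order bits first

def decode(addr: int) -> dict:
    """Split a physical address into DRAM coordinates."""
    coords = {}
    for name, width in FIELDS:
        coords[name] = addr & ((1 << width) - 1)
        addr >>= width
    return coords

print(decode(0x12345678))
```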
2.2 Sources of Power Consumption

The power consumption of a DRAM memory is broadly classified into three categories, namely activation power, read/write power and background power (for a more detailed analysis, see [11, 13, 14]). Activation power refers to the power dissipated in activating a memory array row and in precharging the array's bitlines. Read/write power refers to the power consumed when data moves either into or out of the memory device. Background power is independent of the DRAM access activity and is due to transistor leakage, the peripheral circuitry and the data refresh operations. Note that DRAM memory cells store data using capacitors that lose their charge over time and must be periodically recharged; this is referred to as data refreshing.
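As a rough illustration of how these three components combine, the following back-of-the-envelope sketch splits the energy of one device over an interval into activation, read/write and background terms. All device parameters are invented placeholders rather than datasheet values; a rigorous calculation follows the methodology of [13, 14].

```python
# Back-of-the-envelope DRAM energy estimate over an interval, split into
# the three components described above. All device parameters below are
# invented placeholders, not datasheet values.
VDD = 1.5        # supply voltage (V)
I_ACT = 0.095    # extra current while a row is open (A)
I_RW = 0.250     # extra current while bursting data (A)
I_BG = 0.065     # background current, including refresh (A)

def dram_energy(t_total, t_active, t_burst):
    """Energy (J) for one device over t_total seconds, of which it
    spends t_active with a row open and t_burst moving data."""
    activation = VDD * I_ACT * t_active
    read_write = VDD * I_RW * t_burst
    background = VDD * I_BG * t_total   # paid regardless of activity
    return activation + read_write + background

# One second with a row open 20% of the time and data bursting 10%:
print(dram_energy(1.0, 0.20, 0.10))
```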
3 A Classification of DRAM Power Management Techniques

In this section, we present a classification of the DRAM power management techniques based on their characteristics.

As shown by Barroso et al. [2], modern servers operate most of the time between 10% and 50% of maximum utilization. Thus, considerable opportunities exist to transition idle memory banks into low-power modes. These power modes can be either state-preserving (i.e. the data is retained) or state-destroying (i.e. the data is not retained). For this purpose, DRAM chips provision several modes of operation. Each mode is characterized by its power consumption and the time it takes to transition back to the active mode; this time is referred to as the resynchronization latency or exit latency. Typically, the modes with lower energy consumption also have higher reactivation time, and vice versa. Also, a DRAM module may enter a low-power state-preserving mode when it is idle, but must return to the active mode to service a request. A large number of techniques have been proposed which utilize the adaptive power saving capability offered by modern multi-banked memory systems [4, 15–59].
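Selecting among these modes is essentially a break-even calculation: a mode is worth entering only if the expected idle period is long enough that the energy saved outweighs the cost of resynchronization. A minimal sketch of this trade-off follows; the per-mode power and exit-latency numbers are invented for illustration.

```python
# Break-even analysis for state-preserving low-power modes. A mode is
# worth entering only if the idle period exceeds the time needed for
# the power savings to pay back the resynchronization cost. All numbers
# are illustrative, not from any datasheet.
ACTIVE_P = 1.00                          # W, normalized
MODES = {                                # name: (power W, exit latency s)
    "power_down":   (0.30, 20e-9),
    "self_refresh": (0.05, 200e-9),
}

def break_even(mode):
    power, exit_lat = MODES[mode]
    # Assume resynchronization burns roughly active power for exit_lat.
    return ACTIVE_P * exit_lat / (ACTIVE_P - power)

def best_mode(idle_time):
    """Deepest mode whose break-even time the idle period exceeds."""
    candidates = [m for m in MODES if idle_time > break_even(m)]
    return min(candidates, key=lambda m: MODES[m][0], default=None)

for t in (10e-9, 100e-9, 10e-6):
    print(t, best_mode(t))   # None, power_down, self_refresh
```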
Some techniques perform memory access redistribution (also called memory traffic reshaping), which involves changing the address mapping in DRAM, migrating data within DRAM, etc., to increase the idle period of certain memory banks or to increase memory reference locality [15, 17, 24, 34, 38–40, 48–50, 60–64].

Several other techniques reduce the power consumed in each DRAM access by accessing only part of the DRAM in each memory access [11, 54, 57, 65–68]. Thus, these techniques provision activating a much smaller portion of the DRAM circuitry than what is activated in conventional DRAMs.

Many techniques for reducing memory power consumption use mechanisms based on memory access scheduling [15, 32, 44, 69–76]. This includes mechanisms such as memory access throttling, buffering or coalescing. Also, techniques have been proposed which reduce the number of accesses to memory through smart management of last-level caches, program-level transformations, etc. [15, 77–79].

Some techniques use data compression to reduce the memory footprint of the application [49, 80–82]. This helps in reducing the number of memory banks occupied by application data. Techniques based on data replication increase the idle time of memory banks by duplicating their selected read-only data blocks on other active banks [49].

Like other CMOS circuits, DRAM power also depends on operating frequency and supply voltage, and hence the DVFS (dynamic voltage/frequency scaling) mechanism has been used in several techniques to save memory power [29, 64, 70, 83]. Some techniques reduce the DRAM refresh power by trading off either performance or reliability [43, 78, 84–89]. Some techniques account for the effect of temperature while optimizing for memory energy [17, 70, 90–92]; such techniques are referred to as thermal-aware memory energy saving techniques.

Also, while most of the techniques work to reduce the total power consumption of the DRAM system, a few techniques work to limit its average power consumption [23], while some others work to limit its peak power consumption [22, 30].

4 DRAM Power Management Techniques

In this section, we review several DRAM power saving techniques. As discussed before, we only present the key ideas of each technique and do not discuss their quantitative results.

Lebeck et al. [40] propose a technique for turning DRAM chips into low-power mode. Their technique works by controlling the virtual-to-physical address mapping such that the physical pages of an application are clustered into a minimum number of DRAM chips, and the unused chips are transitioned to low-power modes. In addition, their technique monitors the time period between accesses to a chip as a metric for measuring the frequency of reuse of the chip. When this time is greater than a certain threshold, the chip is transitioned to the low-power mode.

Fan et al. [31] present an analytical model to approximate the idle time of memory chips. Based on this, they identify the threshold time after which a memory chip can be transitioned to a low-power state. They observe that, for their experimentation framework, the simple policy of immediately transitioning a DRAM chip to a low-power mode when it becomes idle performs better than more sophisticated policies that predict DRAM chip idle time.

Li et al. [42] propose a technique for saving memory energy which adaptively transitions memory modules to low-power modes while also providing guarantees on the maximum performance loss.

Delaluz et al. [26] propose software- and hardware-based approaches to save memory energy. Their hardware-based approach works by estimating the time of the next access to a memory bank and then, depending upon this time, switching the bank to a suitable low-power mode. The software-directed approach uses compiler analysis to insert memory module transition instructions in the program binary. To avoid the time overhead of resynchronization, they propose bringing the memory bank to active mode before its next use. Depending upon a break-even analysis of the length of the idle time and the power saving opportunity of different power saving modes, a suitable power saving mode is chosen.

Delaluz et al. [27] describe an OS scheduler-based power mode control scheme for saving DRAM energy. Their scheme tracks the memory banks that are being used by different applications and selectively turns these banks on/off at context switch points.

A limitation of the approaches based on power mode control is that most of the idle times between different accesses to memory ranks are shorter than the resynchronization time between different power modes. To address this issue, Huang et al. [34] propose a method for saving memory energy by concentrating the memory access activity on merely a few memory ranks, such that the rest of the ranks can be switched to low-power modes. Their method migrates the frequently-accessed pages to "hot" ranks and the infrequently-used and unmapped pages to "cold" ranks. This also helps in elongating the idle periods of the cold ranks.
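A toy sketch of such hot/cold placement is shown below, assuming an epoch-based replanning step driven by per-page access counts; the rank counts and capacities are illustrative parameters, not the mechanism of [34].

```python
# Sketch of access-count-driven page placement in the spirit of Huang
# et al. [34]: keep the hottest pages on a few "hot" ranks so that the
# remaining "cold" ranks see long idle periods. Parameters and data
# structures are assumptions for illustration.
from collections import Counter

HOT_RANKS, TOTAL_RANKS = 2, 8
PAGES_PER_RANK = 1          # kept tiny for the demo

def replan(access_counts: Counter) -> dict:
    """Map page -> rank, packing frequently used pages on hot ranks."""
    placement = {}
    hot_capacity = HOT_RANKS * PAGES_PER_RANK
    ranked = [p for p, _ in access_counts.most_common()]
    for i, page in enumerate(ranked):
        if i < hot_capacity:                 # hottest pages
            placement[page] = i % HOT_RANKS
        else:                                # cold / infrequent pages
            placement[page] = HOT_RANKS + i % (TOTAL_RANKS - HOT_RANKS)
    return placement

counts = Counter({"pA": 900, "pB": 850, "pC": 3, "pD": 1})
print(replan(counts))   # pA, pB on hot ranks 0-1; pC, pD on cold ranks
```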
Delaluz et al. [24] propose a technique to save memory energy by dynamically placing arrays with temporal affinity into the same set of banks. This increases the opportunities for exploiting deeper sleep modes (more energy-saving operating modes) and for keeping modules in low-power modes for longer durations. Using the same principle, they also propose an array interleaving mechanism [25] for clustering multiple arrays, which are accessed simultaneously, into a single common data space. Interleaving enhances the spatial locality of the program and reduces the number of accesses to the off-chip memory. Along with interleaving the arrays, their mechanism also transforms the code accordingly by replacing the original array references and declarations with their transformed equivalents.

Huang et al. [33] propose a technique for saving memory energy using virtual memory management. Their technique works by using virtual memory remapping to reduce the memory footprint of each application and transitioning the unused memory modules to low-power modes.

Zhou et al. [59] discuss a utility-based memory allocation scheme where different applications are allocated memory in proportion to their utility (i.e. the performance benefit gained by the allocation of the memory). After allocation, the rest of the memory is transitioned to low-power modes for saving power. For estimating the utility of allocating memory to different applications, their scheme dynamically tracks the page miss-rate curve (MRC) of the virtual memory system using either hardware or software methods.

Lyuh et al. [44] propose a technique which uses an analytical model for saving memory power. Their technique selects a suitable low-power mode for a memory bank by synergistically controlling the assignment of variables to memory banks and the scheduling of memory access operations, such that the total memory power consumption is minimized.

Bi et al. [19] propose methods to hide the resynchronization latency of memory ranks in low-power modes by exploiting knowledge of system input/output (I/O) calls. Their technique works on the observation that, since a majority of file-I/O accesses are made through system calls, the operating system knows the completion time of these accesses. Using this knowledge, their technique transitions idle memory ranks into low-power modes. Further, to hide the resynchronization delay, their technique uses a prediction mechanism to estimate the rank most likely to be accessed on a system call entry and speculatively turns on that rank. On a correct prediction, the rank transition completes before the memory request arrives, and thus the resynchronization latency is fully hidden.

Pandey et al. [50] propose a technique for saving energy in DMA (direct memory access) transfers. Since DMA transfers are usually larger than the transfers initiated by the processors, they are divided into multiple smaller transfer operations. However, because only short time gaps are available between any two DMA-memory requests, the opportunity for transitioning the memory to low-power mode remains small. To address this, Pandey et al. propose temporally aligning DMA requests coming from different I/O buses to the same memory device. For this purpose, their technique delays DMA-memory requests directed to a memory chip which is in low-power mode and tries to gather enough requests from other I/O buses before transitioning that chip to the normal power mode. This helps in elongating the idle time of memory chips and also maximizes the utilization of their active time.

Koc et al. [39] discuss a data-recomputation approach for increasing the idle time of memory banks to save energy. When an access to a bank in low-power mode is made, their technique first checks the active banks. If the requested data can be recomputed using the data stored in already-active banks, their technique does not activate the bank in low-power mode; rather, the data request is fulfilled based on computations performed on the data obtained from the already-active banks.

To reduce the refresh power of DRAM devices, Ghosh et al. [84] propose an adaptive refresh method. The conventional refresh mechanism of DRAM periodically refreshes all the memory rows to retain the data. However, from the standpoint of data retention, an access to a memory row performs an operation equivalent to a regular refresh. The technique proposed by Ghosh et al. uses this observation to avoid refreshing a memory row if it has recently been read or written by the processor. To track the recency of memory access operations, they use a counter for each row in the memory module. Using this technique, the number of regular row-sweeping refresh operations is greatly reduced, which results in power savings.
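The following minimal sketch illustrates the idea behind such access-aware refresh, assuming a per-row countdown counter that is reset on every access so that only rows about to lose charge are refreshed; the counter width and tick granularity are illustrative.

```python
# Sketch of the Smart Refresh idea [84]: a per-row countdown is reset
# whenever the row is read or written, so a row is refreshed only if it
# has not been accessed within a retention period. Parameters are
# illustrative assumptions.
RETENTION_TICKS = 8              # ticks a row retains data unrefreshed
rows = [RETENTION_TICKS] * 16    # per-row countdown counters

def on_access(row):
    rows[row] = RETENTION_TICKS  # a read/write acts as an implicit refresh

def refresh_tick():
    """Called once per tick: refresh only rows about to lose charge."""
    refreshed = 0
    for r in range(len(rows)):
        rows[r] -= 1
        if rows[r] == 0:         # about to expire: must refresh now
            rows[r] = RETENTION_TICKS
            refreshed += 1
    return refreshed

total = 0
for tick in range(8):
    if tick == 4:                # two rows get accessed midway
        on_access(3); on_access(7)
    total += refresh_tick()
print(total)                     # 14: the two accessed rows skip a refresh
```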
J. Liu et al. [85] propose a technique for saving DRAM energy by avoiding unbeneficial refreshes. Their technique works on the observation that only a small number of DRAM cells actually need to be refreshed at the conservative minimum refresh interval; the rest of the cells can be refreshed much less frequently while still maintaining their charge. Based on this observation, their technique groups DRAM rows into multiple bins and uses a different refresh interval for each bin. Thus, by refreshing most of the cells less frequently than the leaky cells, their technique reduces the number of refresh operations required and reduces memory power consumption.
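A small sketch of such retention-aware binning follows; the bin intervals and row retention values are invented for illustration, and a real design would profile retention times in hardware.

```python
# Sketch of RAIDR-style retention binning [85]: place each row into the
# bin with the longest refresh interval its retention time tolerates,
# then refresh each bin at its own rate. All values are illustrative.
BINS_MS = [64, 128, 256]                 # refresh intervals per bin

def bin_for(retention_ms):
    """Longest interval the row's retention time still tolerates."""
    for interval in reversed(BINS_MS):
        if retention_ms >= interval:
            return interval
    raise ValueError("row leaks faster than the minimum interval")

def refreshes_per_second(row_retentions_ms):
    return sum(1000 / bin_for(r) for r in row_retentions_ms)

# 1000 rows: only a handful are leaky; most tolerate 256 ms.
rows = [64] * 5 + [128] * 45 + [256] * 950
print(refreshes_per_second(rows))   # ~4141 vs ~15625 if all at 64 ms
```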
Isen and John [78] discuss a technique for utilizing program semantics to save memory energy. Their technique uses memory allocation/deallocation information to identify inconsequential data and avoids refreshing it. For example, the regions of memory which are free (unallocated and invalid) or freshly allocated (allocated but invalid) do not store meaningful data, and hence retaining the data of those regions is not important. Thus, their technique saves power by not refreshing such data.

S. Liu et al. [43] propose an application-level technique to reduce refresh power in DRAM memories. They show that many applications are tolerant to errors in non-critical data, and such errors show little or no impact on the application's final result. Based on this observation, their technique uses programmer-supplied information to identify critical and non-critical data in the programs. Using this information, at runtime, these data are allocated in different modules of the memory. The memory modules containing critical data are refreshed at the regular refresh rate, while the modules containing non-critical data are refreshed at substantially lower rates. The use of lower refresh rates leads to savings in refresh power; however, it also increases the probability of data corruption. Thus, their technique exercises a trade-off between energy saving and data corruption.

Sudan et al. [63] propose a technique for saving memory power using an OS management approach. Their technique works by controlling the address mapping of OS pages to DRAM devices such that clusters of cache blocks from different OS pages which have similar access counts are co-located in a row-buffer. This improves the hit rate of the row-buffer and thus saves memory power. For co-locating pages, Sudan et al. propose two techniques. The first reduces the OS page size such that the frequently accessed blocks are clustered together in the new, reduced-size page (called a "micro-page"); the hot micro-pages are then migrated into the same row-buffer. The second technique uses a hardware scheme which introduces a layer of translation between the physical addresses assigned by the OS and those used by the memory controller to access the DRAM devices. By taking advantage of this layer of mapping, hot pages are migrated into the same row-buffer.

Trajkovic et al. [73] propose a buffering-based technique for reducing memory power consumption. Their technique works on the observation that if, in a synchronous DRAM, two memory access (i.e. read/write) operations are done in the same activate-precharge cycle, the cost of activation and precharging can be avoided, because DRAMs allow the row to be left 'on' after a memory access. Based on this observation, on read accesses their technique prefetches additional cache blocks, and for write accesses it combines multiple blocks which are to be written to the same DRAM row. To store the extra prefetched lines, their technique uses a small storage structure in the memory controller; similarly, a small storage structure is used to buffer the writes destined for the same DRAM row. By adapting this prefetching and write-combining scheme for each application, their technique achieves a reduction in memory power consumption.

Zheng et al. [57] propose a technique for saving memory power by reducing the number of memory chips involved in each memory access; this is referred to as the "rank-subsetting" approach. Their technique adds a small buffer called a "mini-rank buffer" between each DIMM and the memory bus. Using this, a DRAM rank, which normally provides a 64-bit datapath, can be internally organized as either eight 8-bit, four 16-bit or two 32-bit "mini-ranks". With this support, on any memory access only a single mini-rank is activated, and the other mini-ranks can be transitioned to low-power modes.

Fang et al. [66] extend the mini-rank approach to a heterogeneous mini-rank design which adapts the number of mini-ranks according to the memory access behavior and memory bandwidth requirement of each workload. Based on this information, for a latency-sensitive application their technique uses a mini-rank configuration which does not degrade application performance, while for a latency-insensitive application it uses a mini-rank configuration which achieves memory power savings.

Yoon et al. [75] propose a technique for saving memory power by intelligently utilizing low-power mobile DRAM components. Their technique uses a buffering mechanism to aggregate the data outputs from multiple ranks of low-frequency mobile DRAM devices (e.g. 400MHz LPDDR2) to collectively provide bandwidth and storage capacity equal to server-class DRAM devices (e.g. 1600MHz DDR3).

Yoon et al. [74] propose a technique for saving memory power by dynamically changing the granularity of data transferred in each DRAM access. Their technique works by managing virtual memory such that a specific access granularity can be used for each page, based on the spatial locality present in each application. For applications with high spatial locality their technique uses coarse-grained data accesses, while for applications with low spatial locality it uses fine-grained data accesses.

Several researchers have proposed techniques which use the DVFS mechanism to save memory energy. Deng et al. [28, 29] use memory DVFS to save memory energy. At times of low memory activity, their technique lowers the frequency of the DRAM devices, memory channels and memory controllers such that the performance loss is minimal. This leads to savings in memory power consumption. They also extend their technique to coordinate DVFS across multiple memory controllers, memory channels and memory devices to minimize the overall system power consumption.
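The sketch below illustrates the flavor of such a policy: pick the lowest memory frequency whose predicted slowdown stays within a performance-loss budget. The frequency ladder and the crude linear slowdown model are assumptions for illustration, not the analytic model of [29].

```python
# Sketch of a MemScale-like frequency selection policy [29]. The
# frequency ladder and slowdown model below are invented assumptions.
FREQS_MHZ = [800, 667, 533, 400]         # available memory frequencies
LOSS_BUDGET = 0.05                       # tolerate at most 5% slowdown

def predicted_slowdown(util, f_mhz, f_max=800):
    """Crude model: slowdown grows with memory bandwidth utilization
    as the channel slows down relative to full speed."""
    return util * (f_max / f_mhz - 1.0)

def choose_frequency(bandwidth_util):
    for f in sorted(FREQS_MHZ):          # try the lowest frequency first
        if predicted_slowdown(bandwidth_util, f) <= LOSS_BUDGET:
            return f
    return FREQS_MHZ[0]                  # fall back to full speed

print(choose_frequency(0.04))   # idle memory -> 400 MHz
print(choose_frequency(0.60))   # busy memory -> stays at 800 MHz
```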
Similarly, they use ILP formulation technique uses compiler analysis to restructure the code to find best possible data replication scheme which such that the cache misses from the last-level cache are increases idle time of certain banks by duplicating their clustered together. Clustering of the cache misses also selected read-only data blocks on other active banks. leads to the clustering of cache hits. Thus, the memory They also use ILP formulation to find the best time to accesses and memory idle cycles are also clustered. compress and/or migrate the data between banks. This increases the memory access activities in certain Several researchers have used domain-specific banks and the other banks become idle for a long time. optimizations to save DRAM power. Kim et al. [61] and By taking advantage of this, idle memory banks are Li et al. [62] propose techniques for reducing DRAM transitioned to low-power modes. power consumption in video processing domain. Video As the computational requirements of state-of-the-art processing applications are characterized by abundant applications is increasing [93], the pressure on memory spatial and temporal image data correlations, and systems is also on rise and to mitigate this pressure, unbalanced accesses to frames (e.g. certain image frames researchers have proposed techniques to intelligently are accessed much more frequently than other image manage the last level caches (LLCs) in the processors. frames). Hence, to take advantage of these properties, Mazumdar et al. [79] propose a technique for reducing their techniques map image data in DRAM in a way the number of memory accesses in multicore systems which minimizes the number of row-activations. Thus, by cache aggregation approach. Their technique works the power consumption of DRAM is reduced. on the observation that due to the availability of high- Chen et al. [21] propose a technique for tuning the bandwidth point-to-point interconnects between sockets, garbage collector (GC) in Java to reduce memory power a read from the LLC of a connected chip consumes less consumption. GC is an tool used in Java virtual machine time and energy than an access to DRAM. Based on (JVM) for automatic reclamation of unused memory. this, their technique uses the LLC of an idle processor Chen et al. propose using GC to turn off the memory in a connected socket for holding the evicted data from banks that do not hold live data. They also observe the active processor. This reduces the number of accesses that the pattern of object allocation and the number to DRAM and thus reduces the power consumption of of memory banks available in the DRAM architecture DRAM. crucially influence the effectiveness of GC in optimizing Phadke et al. [88] propose a heterogeneous main energy. memory architecture which comprises of three different Pisharath et al. [51] propose an approach to memory modules. Each memory module is optimized for reduce memory power consumption in memory-resident latency, bandwidth, power consumption, respectively, at database management systems (DBMS). One of their the expense of the other two. Their technique works techniques uses hardware monitors to detect the by using offline analysis to characterize an application frequency of use of memory banks during query based on its LLC (last level cache) miss rate and memory execution and based on this, switches the idle banks level parallelism. Using this information, at runtime, the into low-power mode. 
Yang et al. [82] discuss a software-based RAM compression technique for saving power in embedded systems. Their technique uses memory compression only for those applications which may gain performance or energy benefits from compression. For such applications, their technique compresses memory data and swapped-out pages in an online manner, dynamically adjusting the size of the compressed RAM area. Thus, their technique saves power by using compression to increase the effective size of the memory.

Ozturk et al. [49] integrate different approaches such as dynamic data migration, data compression and data replication to effectively transition a large number of memory banks into low-power modes. They formulate the DRAM energy minimization problem as an integer linear programming (ILP) problem and solve it using an ILP solver. Using the ILP formulation, they find the (nonuniform) bank architecture and accompanying data mapping strategy which best suit the application's data access patterns. Similarly, they use the ILP formulation to find the best possible data replication scheme, which increases the idle time of certain banks by duplicating their selected read-only data blocks on other active banks. They also use the ILP formulation to find the best time to compress and/or migrate data between banks.

Several researchers have used domain-specific optimizations to save DRAM power. Kim et al. [61] and Li et al. [62] propose techniques for reducing DRAM power consumption in the video processing domain. Video processing applications are characterized by abundant spatial and temporal image data correlations and by unbalanced accesses to frames (e.g. certain image frames are accessed much more frequently than others). To take advantage of these properties, their techniques map image data in DRAM in a way which minimizes the number of row activations, thus reducing the power consumption of DRAM.

Chen et al. [21] propose a technique for tuning the garbage collector (GC) in Java to reduce memory power consumption. GC is a tool used in the Java virtual machine (JVM) for automatic reclamation of unused memory. Chen et al. propose using the GC to turn off the memory banks that do not hold live data. They also observe that the pattern of object allocation and the number of memory banks available in the DRAM architecture crucially influence the effectiveness of GC in optimizing energy.

Pisharath et al. [51] propose an approach to reduce memory power consumption in memory-resident database management systems (DBMS). One of their techniques uses hardware monitors to detect the frequency of use of memory banks during query execution and, based on this, switches the idle banks into low-power mode. Another technique uses a software approach. In a DBMS, when a query is submitted, it is first parsed and then sent to the query optimizer, which uses the query tree to find the best-suited plan for executing the query [51]. At this point, the query optimizer knows the database tables which will be accessed to answer the query. Based on this information, their technique changes the table-to-bank mapping such that memory accesses can be clustered. Also, the queries presented to the database are augmented with explicit bank turn-off and turn-on instructions. Using this support, at runtime, the memory banks are dynamically transitioned into low-power mode.

Since leakage (static) power varies exponentially with temperature, the dissipation of power in DRAM increases the device temperature, which further increases the leakage power dissipation. This may lead to thermal emergencies. Also, many of the above-mentioned approaches move or map frequently accessed pages to merely a few active memory modules, which is also likely to increase the temperature of the active modules. To address this, Ayoub et al. [17] propose a technique which monitors the temperature of the active modules. When the temperature reaches a threshold, it selectively migrates a small number of memory pages between active and dormant memory modules and transitions the active modules into the self-refresh mode. Since this approach spreads the memory accesses over multiple modules, it reduces the power density of the active modules and thus avoids thermal emergencies.
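The following sketch illustrates the reactive core of such a scheme: when an active module crosses a temperature threshold, a batch of its hottest pages is migrated to a dormant module. The threshold, batch size and data structures are illustrative assumptions.

```python
# Sketch of reactive thermal-aware migration in the spirit of Ayoub et
# al. [17]: when an active module crosses a temperature threshold, move
# some of its hottest pages to a dormant module to spread power density.
# Threshold, batch size and the temperature source are assumptions.
T_THRESHOLD_C = 85.0
PAGES_PER_ROUND = 64

def maybe_migrate(temps_c, hot_pages_by_module, dormant_modules):
    """Return a list of (page, src, dst) migrations for this interval."""
    moves = []
    for module, temp in temps_c.items():
        if temp >= T_THRESHOLD_C and dormant_modules:
            dst = dormant_modules[0]
            for page in hot_pages_by_module[module][:PAGES_PER_ROUND]:
                moves.append((page, module, dst))
    return moves

temps = {"dimm0": 88.0, "dimm1": 55.0}
hot = {"dimm0": ["p1", "p2", "p3"], "dimm1": []}
print(maybe_migrate(temps, hot, dormant_modules=["dimm2"]))
```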
C. Lin et al. [90] propose a technique for addressing memory thermal issues which works by orchestrating thread scheduling and page allocation. Their technique groups the program threads into multiple groups such that all the threads in a group can be active simultaneously. Each group is then mapped to certain DIMMs, and at any time only one group and its corresponding DIMMs remain active; the rest of the DIMMs are deactivated to reduce their temperature. Similarly, J. Lin et al. [70, 91] propose techniques to mitigate overheating in the memory system by adjusting the memory throughput to stay below the emergency level.

5 Conclusion

Recent advances in CMOS fabrication and chip design have greatly increased the power consumption of main memory in modern computing systems. To provide a solution to this problem, several research efforts have been directed towards managing the power consumption of main memory. In this paper, we surveyed several architectural techniques which are designed for improving DRAM memory power efficiency. We also presented a classification of the proposed techniques across several parameters, to highlight their similarities and differences. We believe that this survey will help researchers and designers to understand the state of the art in approaches pursued for reducing memory power consumption. At the same time, it will also encourage them to design innovative solutions for the memory systems of future green computing infrastructure.

References

[1] R. Brown, E. Masanet, B. Nordman, B. Tschudi, A. Shehabi, J. Stanley, J. Koomey, D. Sartor, P. Chan, J. Loper, et al., "Report to congress on server and data center energy efficiency," Public law 109-431, 2007.
[2] L. Barroso and U. Hölzle, "The datacenter as a computer: An introduction to the design of warehouse-scale machines," Synthesis Lectures on Computer Architecture, vol. 4, no. 1, pp. 1–108, 2009.
[3] C. Lefurgy, K. Rajamani, F. Rawson, W. Felter, M. Kistler, and T. Keller, "Energy management for commercial servers," Computer, vol. 36, no. 12, pp. 39–48, 2003.
[4] M. Ware, K. Rajamani, M. Floyd, B. Brock, J. Rubio, F. Rawson, and J. Carter, "Architecting for power management: The IBM POWER7 approach," in HPCA, pp. 1–11, IEEE, 2010.
[5] S. Borkar, "Thousand core chips: a technology perspective," in Proceedings of the 44th Annual DAC, pp. 746–749, ACM, 2007.
[6] Intel. http://ark.intel.com/products/53575/.
[7] A. Agrawal et al., "A new heuristic for multiple sequence alignment," in IEEE EIT, pp. 215–217, 2008.
[8] K. Bergman et al., "Exascale computing study: Technology challenges in achieving exascale systems," tech. rep., DARPA, 2008.
[9] S. Khaitan, J. McCalley, and M. Raju, "Numerical methods for on-line power system load flow analysis," Energy Systems, vol. 1, no. 3, pp. 273–289, 2010.
[10] M. Raju et al., "Domain decomposition based high performance parallel computing," International Journal of Computer Science Issues, 2009.
[11] E. Cooper-Balis and B. Jacob, "Fine-grained activation for power reduction in DRAM," IEEE Micro, vol. 30, no. 3, pp. 34–47, 2010.
[12] B. Jacob, S. Ng, and D. Wang, Memory Systems: Cache, DRAM, Disk. Morgan Kaufmann, 2007.
[13] "Calculating memory system power for DDR3." http://download.micron.com.
[14] T. Vogelsang, "Understanding the energy consumption of dynamic random access memories," in MICRO, pp. 363–374, 2010.
[15] A. Amin and Z. Chishti, "Rank-aware cache replacement and write buffering to improve DRAM energy efficiency," in ISLPED, pp. 383–388, ACM, 2010.
[16] V. Anagnostopoulou, S. Biswas, H. Saadeldeen, A. Savage, R. Bianchini, T. Yang, D. Franklin, and F. T. Chong, "Barely alive memory servers: Keeping data active in a low-power state," ACM Journal on Emerging Technologies in Computing Systems, Special Issue on Sustainable and Green Computing Systems, April 2012.
[17] R. Ayoub, K. Indukuri, and T. Rosing, "Energy efficient proactive thermal management in memory subsystem," in ISLPED, pp. 195–200, IEEE, 2010.
[18] H. Ben Fradj, C. Belleudy, and M. Auguin, "System level multi-bank main memory configuration for energy reduction," in PATMOS, vol. 4148 of Lecture Notes in Computer Science, pp. 84–94, Springer, 2006.
[19] M. Bi, R. Duan, and C. Gniady, "Delay-hiding energy management mechanisms for DRAM," in HPCA, pp. 1–10, IEEE, 2010.
[20] K. Chandrasekar, B. Akesson, and K. Goossens, "Run-time power-down strategies for real-time SDRAM memory controllers," in Proceedings of the 49th Annual DAC, pp. 988–993, ACM, 2012.
[21] G. Chen, R. Shetty, M. Kandemir, N. Vijaykrishnan, M. J. Irwin, and M. Wolczko, "Tuning garbage collection for reducing memory system energy in an embedded Java environment," ACM Transactions on Embedded Computing Systems, vol. 1, pp. 27–55, Nov. 2002.
[22] M. Chen, X. Wang, and X. Li, "Coordinating processor and main memory for efficient server power control," in ICS, pp. 130–140, ACM, 2011.
[23] H. David, E. Gorbatov, U. R. Hanebutte, R. Khanna, and C. Le, "RAPL: Memory power estimation and capping," in ISLPED, pp. 189–194, Aug. 2010.
[24] V. Delaluz, M. Kandemir, and I. Kolcu, "Automatic data migration for reducing energy consumption in multi-bank memory systems," in DAC, pp. 213–218, IEEE, 2002.
[25] V. Delaluz, M. Kandemir, N. Vijaykrishnan, M. Irwin, A. Sivasubramaniam, and I. Kolcu, "Compiler-directed array interleaving for reducing energy in multi-bank memories," in ASPDAC, pp. 288–293, IEEE, 2002.
[26] V. Delaluz, M. Kandemir, N. Vijaykrishnan, A. Sivasubramaniam, and M. Irwin, "DRAM energy management using software and hardware directed power mode control," in HPCA, pp. 159–169, IEEE, 2001.
[27] V. Delaluz, A. Sivasubramaniam, M. Kandemir, N. Vijaykrishnan, and M. Irwin, "Scheduler-based DRAM energy management," in DAC, pp. 697–702, ACM, 2002.
[28] Q. Deng, D. Meisner, A. Bhattacharjee, T. F. Wenisch, and R. Bianchini, "MultiScale: Memory system DVFS with multiple memory controllers," in ISLPED, July 2012.
[29] Q. Deng, D. Meisner, L. Ramos, T. Wenisch, and R. Bianchini, "MemScale: active low-power modes for main memory," ACM SIGPLAN Notices, vol. 46, no. 3, pp. 225–238, 2011.
[30] B. Diniz, D. Guedes, W. Meira Jr., and R. Bianchini, "Limiting the power consumption of main memory," ACM SIGARCH Computer Architecture News, vol. 35, pp. 290–301, 2007.
[31] X. Fan, C. Ellis, and A. Lebeck, "Memory controller policies for DRAM power management," in ISLPED, pp. 129–134, ACM, 2001.
[32] M. Floyd, S. Ghiasi, T. Keller, K. Rajamani, F. Rawson, J. Rubio, and M. Ware, "System power management support in the IBM POWER6 microprocessor," IBM Journal of Research and Development, vol. 51, no. 6, pp. 733–746, 2007.
[33] H. Huang, P. Pillai, and K. Shin, "Design and implementation of power-aware virtual memory," in USENIX Annual Technical Conference, pp. 57–70, 2003.
[34] H. Huang, K. Shin, C. Lefurgy, and T. Keller, "Improving energy efficiency by making DRAM less randomly accessed," in ISLPED, pp. 393–398, ACM, 2005.
[35] I. Hur and C. Lin, "A comprehensive approach to DRAM power management," in HPCA, pp. 305–316, IEEE, 2008.
[36] S. Irani, S. Shukla, and R. Gupta, "Online strategies for dynamic power management in systems with multiple power-saving states," ACM Transactions on Embedded Computing Systems, vol. 2, no. 3, pp. 325–346, 2003.
[37] M. Kandemir, U. Sezer, and V. Delaluz, "Improving memory energy using access pattern classification," in ICCAD, pp. 201–206, IEEE Press, 2001.
[38] B. Khargharia, S. Hariri, and M. S. Yousif, "Self-optimization of performance-per-watt for interleaved memory systems," in HiPC, pp. 368–380, Springer-Verlag, 2007.
[39] H. Koc, O. Ozturk, M. Kandemir, and E. Ercanli, "Minimizing energy consumption of banked memories using data recomputation," in ISLPED, pp. 358–361, 2006.
[40] A. Lebeck, X. Fan, H. Zeng, and C. Ellis, "Power aware page allocation," ACM SIGPLAN Notices, vol. 35, no. 11, pp. 105–116, 2000.
[41] X. Li, R. Gupta, S. Adve, and Y. Zhou, "Cross-component energy management: Joint adaptation of processor and memory," ACM Transactions on Architecture and Code Optimization, vol. 4, no. 3, p. 14, 2007.
[42] X. Li, Z. Li, F. David, P. Zhou, Y. Zhou, S. Adve, and S. Kumar, "Performance directed energy management for main memory and disks," ACM SIGARCH Computer Architecture News, vol. 32, pp. 271–283, 2004.
[43] S. Liu, K. Pattabiraman, T. Moscibroda, and B. Zorn, "Flikker: Saving DRAM refresh-power through critical data partitioning," ACM SIGPLAN Notices, vol. 46, no. 3, pp. 213–224, 2011.
[44] C. Lyuh and T. Kim, "Memory access scheduling and binding considering energy minimization in multi-bank memory systems," in Proceedings of the 41st Annual DAC, pp. 81–86, ACM, 2004.
[45] K. T. Malladi, F. A. Nothaft, K. Periyathambi, B. C. Lee, C. Kozyrakis, and M. Horowitz, "Towards energy-proportional datacenter memory with mobile DRAM," in ISCA, pp. 37–48, June 2012.
[46] D. Meisner, B. Gold, and T. Wenisch, "PowerNap: eliminating server idle power," ACM SIGPLAN Notices, vol. 44, no. 3, pp. 205–216, 2009.
[47] J. Mukundan and J. F. Martinez, "MORSE: Multi-objective reconfigurable self-optimizing memory scheduler," in HPCA, pp. 1–12, 2012.
[48] O. Ozturk, G. Chen, M. Kandemir, and M. Karakoy, "Cache miss clustering for banked memory systems," in ICCAD, pp. 244–250, ACM, 2006.
[49] O. Ozturk and M. Kandemir, "ILP-based energy minimization techniques for banked memories," ACM Transactions on Design Automation of Electronic Systems, vol. 13, pp. 50:1–50:40, July 2008.
[50] V. Pandey, W. Jiang, Y. Zhou, and R. Bianchini, "DMA-aware memory energy management," in HPCA, pp. 133–144, Feb. 2006.
[51] J. Pisharath, A. Choudhary, and M. Kandemir, "Reducing energy consumption of queries in memory-resident database systems," in CASES, pp. 35–45, ACM, 2004.
[52] I. Rodero, S. Chandra, M. Parashar, R. Muralidhar, H. Seshadri, and S. Poole, "Investigating the potential of application-centric aggressive power management for HPC workloads," in HiPC, pp. 1–10, Dec. 2010.
[53] K. Sudan, K. Rajamani, W. Huang, and J. Carter, "Tiered memory: An iso-power memory architecture to address the memory power wall," IEEE Transactions on Computers, 2012.
[54] A. Udipi, N. Muralimanohar, R. Balasubramonian, A. Davis, and N. Jouppi, "LOT-ECC: LOcalized and Tiered reliability mechanisms for commodity memory systems," in ISCA, 2012.
[55] A. Udipi, N. Muralimanohar, N. Chatterjee, R. Balasubramonian, A. Davis, and N. Jouppi, "Rethinking DRAM design and organization for energy-constrained multi-cores," ACM SIGARCH Computer Architecture News, vol. 38, pp. 175–186, 2010.
[56] Z. Wang and X. Hu, "Energy-aware variable partitioning and instruction scheduling for multibank memory architectures," ACM Transactions on Design Automation of Electronic Systems, vol. 10, no. 2, pp. 369–388, 2005.
[57] H. Zheng, J. Lin, Z. Zhang, E. Gorbatov, H. David, and Z. Zhu, "Mini-rank: Adaptive DRAM architecture for improving memory power efficiency," in MICRO, pp. 210–221, IEEE, 2008.
[58] H. Zheng and Z. Zhu, "Power and performance trade-offs in contemporary DRAM system designs for multicore processors," IEEE Transactions on Computers, vol. 59, no. 8, pp. 1033–1046, 2010.
[59] P. Zhou, V. Pandey, J. Sundaresan, A. Raghuraman, Y. Zhou, and S. Kumar, "Dynamic tracking of page miss ratio curve for memory management," ACM SIGOPS Operating Systems Review, vol. 38, pp. 177–188, 2004.
[60] D. Kaseridis, J. Stuecheli, and L. K. John, "Minimalist open-page: a DRAM page-mode scheduling policy for the many-core era," in MICRO, pp. 24–35, ACM, 2011.
[61] H. Kim and I.-C. Park, "High-performance and low-power memory-interface architecture for video processing applications," IEEE Transactions on Circuits and Systems for Video Technology, vol. 11, pp. 1160–1170, Nov. 2001.
[62] Y. Li and T. Zhang, "Reducing DRAM image data access energy consumption in video processing," IEEE Transactions on Multimedia, vol. 14, pp. 303–313, April 2012.
[63] K. Sudan, N. Chatterjee, D. Nellans, M. Awasthi, R. Balasubramonian, and A. Davis, "Micro-pages: increasing DRAM efficiency with locality-aware data placement," ACM SIGARCH Computer Architecture News, vol. 38, pp. 219–230, 2010.
[64] M. E. Tolentino, J. Turner, and K. W. Cameron, "Memory MISER: Improving main memory energy efficiency in servers," IEEE Transactions on Computers, vol. 58, pp. 336–350, Mar. 2009.
[65] J. Ahn, N. Jouppi, C. Kozyrakis, J. Leverich, and R. Schreiber, "Future scaling of processor-memory interfaces," in Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis (SC), p. 42, ACM, 2009.
[66] K. Fang, H. Zheng, and Z. Zhu, "Heterogeneous mini-rank: Adaptive, power-efficient memory architecture," in ICPP, pp. 21–29, IEEE, 2010.
[67] O. Seongil, S. Choo, and J. H. Ahn, "Exploring energy-efficient DRAM array organizations," in MWSCAS, pp. 1–4, Aug. 2011.
[68] G. Zhang, H. Wang, X. Chen, S. Huang, and P. Li, "Heterogeneous multi-channel: fine-grained DRAM control for both system performance and power efficiency," in Proceedings of the 49th Annual DAC, pp. 876–881, ACM, 2012.
[69] H. Hanson and K. Rajamani, "What computer architects need to know about memory throttling," in Computer Architecture, pp. 233–242, Springer, 2012.
[70] J. Lin, H. Zheng, Z. Zhu, E. Gorbatov, H. David, and Z. Zhang, "Software thermal management of DRAM memory for multicore systems," ACM SIGMETRICS Performance Evaluation Review, vol. 36, pp. 337–348, 2008.
[71] J. Lin, H. Zheng, Z. Zhu, Z. Zhang, and H. David, "DRAM-level prefetching for fully-buffered DIMM: Design, performance and power saving," in ISPASS, pp. 94–104, IEEE, 2007.
[72] S. Liu, S. Memik, Y. Zhang, and G. Memik, "A power and temperature aware DRAM architecture," in DAC, pp. 878–883, IEEE, 2008.
[73] J. Trajkovic, A. Veidenbaum, and A. Kejariwal, "Improving SDRAM access energy efficiency for low-power embedded systems," ACM Transactions on Embedded Computing Systems, vol. 7, no. 3, p. 24, 2008.
[74] D. Yoon, M. Jeong, and M. Erez, "Adaptive granularity memory systems: a tradeoff between storage efficiency and throughput," in ISCA, pp. 295–306, ACM, 2011.
[75] D. H. Yoon, J. Chang, N. Muralimanohar, and P. Ranganathan, "BOOM: Enabling mobile memory based low-power server DIMMs," in ISCA, pp. 25–36, June 2012.
[76] H. Zheng, J. Lin, Z. Zhang, and Z. Zhu, "Decoupled DIMM: building high-bandwidth memory system using low-speed DRAM devices," in ISCA, pp. 255–266, ACM, 2009.
[77] N. Aggarwal, J. Cantin, M. Lipasti, and J. Smith, "Power-efficient DRAM speculation," in HPCA, pp. 317–328, IEEE, 2008.
[78] C. Isen and L. John, "ESKIMO: Energy savings using semantic knowledge of inconsequential memory occupancy for DRAM subsystem," in MICRO, pp. 337–346, IEEE, 2009.
[79] S. Mazumdar, D. Tullsen, and J. Song, "Inter-socket victim cacheing for platform power reduction," in ICCD, pp. 509–514, IEEE, 2010.
[80] G. Chen, M. Kandemir, N. Vijaykrishnan, M. J. Irwin, and W. Wolf, "Energy savings through compression in embedded Java environments," in CODES, pp. 163–168, ACM, 2002.
[81] R. Tremaine, P. Franaszek, J. Robinson, C. Schulz, T. Smith, M. Wazlowski, and P. Bland, "IBM memory expansion technology (MXT)," IBM Journal of Research and Development, vol. 45, no. 2, pp. 271–285, 2001.
[82] L. Yang, R. P. Dick, H. Lekatsas, and S. Chakradhar, "Online memory compression for embedded systems," ACM Transactions on Embedded Computing Systems, vol. 9, pp. 27:1–27:30, Mar. 2010.
[83] H. David, C. Fallin, E. Gorbatov, U. R. Hanebutte, and O. Mutlu, "Memory power management via dynamic voltage/frequency scaling," in ICAC, pp. 31–40, ACM, 2011.
[84] M. Ghosh and H. Lee, "Smart refresh: An enhanced memory controller design for reducing energy in conventional and 3D die-stacked DRAMs," in MICRO, pp. 134–145, IEEE Computer Society, 2007.
[85] J. Liu, B. Jaiyen, R. Veras, and O. Mutlu, "RAIDR: Retention-aware intelligent DRAM refresh," in ISCA, pp. 1–12, June 2012.
[86] T. Ohsawa, K. Kai, and K. Murakami, "Optimizing the DRAM refresh count for merged DRAM/logic LSIs," in ISLPED, pp. 82–87, ACM, 1998.
[87] K. Patel, L. Benini, E. Macii, and M. Poncino, "Energy-efficient value-based selective refresh for embedded DRAMs," in PATMOS, 2005.
[88] S. Phadke and S. Narayanasamy, "MLP aware heterogeneous memory system," in DATE, pp. 1–6, IEEE, 2011.
[89] J. Stuecheli, D. Kaseridis, H. Hunter, and L. John, "Elastic refresh: Techniques to mitigate refresh penalties in high density memory," in MICRO, pp. 375–384, IEEE, 2010.
[90] C. Lin, C. Yang, and K. King, "PPT: Joint performance/power/thermal management of DRAM memory for multi-core systems," in ISLPED, pp. 93–98, ACM, 2009.
[91] J. Lin, H. Zheng, Z. Zhu, H. David, and Z. Zhang, "Thermal modeling and management of DRAM memory systems," in ISCA, ACM, 2007.
[92] S. Liu et al., "Hardware/software techniques for DRAM thermal management," in HPCA, pp. 515–525, 2011.
[93] S. Khaitan et al., "Fast parallelized algorithms for on-line extended-term dynamic cascading analysis," in IEEE/PES PSCE, pp. 1–7, 2009.