
IBM's Drive Temperature Indicator Processor (Drive-TIP)
helps ensure high drive reliability

by Gary Herbst

Operating electronic components such as disk drives at high temperatures can dramatically reduce their reliability. In many computer systems, failures in cooling components (such as clogged fan filters) can go undetected for an extended time. The resulting thermal stress can lead to unexpected failures and even data loss. To prevent this, IBM has integrated temperature sensors into its new Ultrastar 9LP, 18XP, and 9ZX server disk drives. High-temperature conditions are reported to the host system using the Self-Monitoring, Analysis and Reporting Technology (S.M.A.R.T.) standard. Once the computer system is alerted to a temperature problem, the user or system administrator can take action.

This white paper describes how a new Ultrastar feature, the Drive Temperature Indicator Processor (Drive-TIP), works and how it benefits users of data-intensive applications.

Today's applications require outstanding drive reliability
Network computing has elevated the role of servers from supporting small departmental workgroups to providing essential information and services for the world's largest enterprises. Today, servers and workstations are being called upon to deliver mission-critical applications to more people than ever before. From collaborative workgroup applications to image processing, video editing to data mining, OnLine Transaction Processing (OLTP) to OnLine Analytical Processing (OLAP), today's data-intensive applications are placing much higher demands on disk storage devices. In turn, these devices must provide more reliable access to much more data faster than ever before.

Figure 1: Together, PFA and Drive-TIP work to provide the industry's best information for preventing drive failures.

When it comes to capacity, performance, and reliability, one name stands above the rest: the IBM Ultrastar family of high-capacity, high-performance disk drives. IBM Ultrastar was the first drive family to implement the features now defined in the S.M.A.R.T. standard. That capability, called Predictive Failure Analysis* (PFA), monitors parameters such as head flying height, noise and signal amplitude, signal coherence, and write parameters. PFA predicts impending drive failures using algorithms robust enough to help avoid failing good drives.¹ Likewise, IBM is first to market with temperature-sensing drives. Building on PFA, the Drive-TIP feature is also expected to find widespread use as an aid to improving data availability.

Heat has a major effect on drive reliability
Disk drives are complex electro-mechanical devices that can suffer performance degradation or failures due to a single event or a combination of events occurring over time. Environmental conditions that affect drive reliability include ambient temperature, cooling air flow rate, voltage, duty cycle, shock/vibration, and relative humidity. Fortunately, it is possible to predict certain types of failures by measuring environmental conditions. One of the worst enemies of hard disk drives is heat. Within a drive, the reliability of both the electronics and the mechanics (such as the spindle motor and actuator bearings) degrades as temperature rises. Running any disk drive at extreme temperatures for long periods of time is detrimental and can eventually lead to permanent data loss.

Figure 2: Drive reliability decreases significantly as temperature rises above recommended levels

Figure 2 shows the dramatic effect that temperature has on the overall reliability of a hard disk drive. Deviations from a nominal operating temperature (assumed to be maintained over the life of a drive) result in a deviation from the nominal failure rate. Once the temperature exceeds the recommended level, the failure rate increases by two to three percent for every one-degree rise above it. For example, a hard disk drive running for an extended period of time five degrees above the recommended temperature can experience a 10 to 15 percent increase in failure rate. Conversely, operating a drive below the recommended temperature can extend drive life.
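The rule of thumb above is simple arithmetic. The sketch below assumes the increase is linear (2 to 3 percent per degree above the recommended temperature, with no compounding) and returns zero at or below the recommended temperature, because the paper gives no figure for the benefit of running cooler. The function name and the 50-degree recommended temperature in the example are illustrative assumptions, not values from a drive specification.

```python
def failure_rate_increase(temp_c: float, recommended_c: float,
                          pct_per_degree: tuple = (2.0, 3.0)) -> tuple:
    """Estimate the (low, high) percent increase in failure rate.

    Assumes a linear model: each degree above the recommended
    temperature adds pct_per_degree percent. Temperatures at or below
    the recommended level return (0.0, 0.0); the benefit of running
    cooler is not modeled.
    """
    excess = max(0.0, temp_c - recommended_c)
    low, high = pct_per_degree
    return (excess * low, excess * high)


# Five degrees above an (assumed) 50-degree recommended temperature
print(failure_rate_increase(55.0, 50.0))  # (10.0, 15.0)
```

The output reproduces the paper's example: five degrees over the recommended level gives a 10 to 15 percent higher failure rate.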

Several failure modes within a disk drive are exacerbated by temperature. Thermal tilt of the disk stack and actuator arms can occur very quickly and cause off-track writes, corrupting data on adjacent cylinders. Outgassing of the lubricants in the spindle motor and voice coil motor occurs at high temperatures (experienced over a relatively short 30- to 60-day period) and can lead to stiction failures or a possible head crash. Over an extended period of time, the bearings can wear out and cause mechanical failures.

Heat can build up within computer systems due to a clogged fan, failure of air conditioning in a room, operating more drives than the cooling system can handle, and so on. Unfortunately, these conditions can go completely unnoticed until a failure occurs. Because of the essential nature of today's workstations and servers, such risks are unacceptable for many users. What is needed is a way to identify high-temperature situations before they affect data integrity.

Drive-TIP helps warn of extreme temperatures
Because disk drives are the most critical component for retaining vital information, IBM created Drive-TIP (shown in Figure 3) specifically to protect its drives from the long-term effects of excessive temperature. Drive-TIP automatically monitors the temperature within the drive and alerts the drive controller when the drive exceeds its maximum allowable temperature. The temperature is measured by an electronic sensor mounted on the back side of the electronics card, close to the base casting and the spindle motor hub, and the drive's microprocessor continuously monitors the sensor's output.

Two temperature trip points are preprogrammed into Ultrastar drives. The first trip point is set by the system provider (or in some cases the system administrator) in the Vendor Unique Parameter mode page (00h) of the drive. Typically, it is set to the expected nominal temperature; the default is 50 degrees Celsius. The second trip point is 65 degrees Celsius, the maximum allowable temperature of the base casting.

If the first temperature trip point is exceeded, Drive-TIP sets an internal flag in the drive, and a warning is sent to the drive controller when the PFA interval timer expires. The Informational Exceptions Control (IEC) mode page (1Ch) controls the interval at which PFA errors and warnings are posted.
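The flag-then-report behavior can be modeled as a small state machine: a reading sets a flag immediately, but the warning only reaches the controller when the interval timer expires. This is a hypothetical sketch, not the drive firmware. The class name, the 60-minute default interval, the use of minutes (rather than the mode page's own units), and the choice to leave the flag set after reporting are all assumptions; the paper does not say when the flag is cleared.

```python
class TemperatureWarningLatch:
    """Hypothetical model of the flag-then-report behavior described above.

    A reading above the first trip point sets a flag; the warning is only
    posted to the controller when the PFA interval timer expires.
    Assumptions: times are in minutes, and the flag stays set after a
    warning is posted.
    """

    def __init__(self, first_trip_c: float = 50.0, interval_min: float = 60.0):
        self.first_trip_c = first_trip_c
        self.interval_min = interval_min
        self.flag = False
        self._elapsed_min = 0.0

    def sample(self, temp_c: float) -> None:
        """Record a reading; set the flag if it is above the first trip point."""
        if temp_c > self.first_trip_c:
            self.flag = True

    def tick(self, minutes: float) -> bool:
        """Advance the interval timer; return True if a warning is posted now."""
        self._elapsed_min += minutes
        if self._elapsed_min < self.interval_min:
            return False
        self._elapsed_min %= self.interval_min  # restart the timer
        return self.flag


latch = TemperatureWarningLatch(first_trip_c=50.0, interval_min=60.0)
latch.sample(52.0)        # flag set, but nothing is reported yet
print(latch.tick(30.0))   # False: timer has not expired
print(latch.tick(30.0))   # True: timer expired with the flag set
```

Separating the flag from the report lets the drive sample temperature on its own schedule while the host-visible warning rate stays under the control of the IEC mode page.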

Figure 3: Back side of the electronics card showing the thermal sensor

As part of the Drive-TIP algorithm, the drive microprocessor reads the temperature at power-on and every 25 minutes thereafter. The temperature warning is generated in compliance with the SCSI-3 standard, as shown in Figure 4, an excerpt from Table 66 (ASC and ASCQ Assignments) of the SCSI-3 Primary Commands (SPC) document. A unique Unit Error Code (UEC) of 22F is also returned on a subsequent Sense command.

Table 66 - ASC and ASCQ Assignments (excerpt)

ASC    ASCQ   Device types    Description
0Bh    00h    DTLPWRSOMCAE    Warning
0Bh    01h    DTLPWRSOMCAE    Warning - specified temperature exceeded
0Bh    02h    DTLPWRSOMCAE    Warning - enclosure degraded
50h    00h    T               Write append error
61h    00h    S               Video acquisition error
65h    00h    DTLPWRSOMCAE    Voltage fault

Figure 4: SCSI-3 Definition of temperature warnings
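On the host side, recognizing the warning amounts to checking the ASC/ASCQ pair in the sense data. The sketch below assumes fixed-format sense data (response code 70h or 71h), where the sense key is the low nibble of byte 2 and the ASC and ASCQ are bytes 12 and 13. The sense key that accompanies the warning and the location of the vendor-specific UEC are not given in the paper, so the sketch does not check or decode them; the function and table names are hypothetical.

```python
# ASC/ASCQ pairs from the Table 66 excerpt above
ASC_ASCQ_DESCRIPTIONS = {
    (0x0B, 0x00): "Warning",
    (0x0B, 0x01): "Warning - specified temperature exceeded",
    (0x0B, 0x02): "Warning - enclosure degraded",
    (0x50, 0x00): "Write append error",
    (0x61, 0x00): "Video acquisition error",
    (0x65, 0x00): "Voltage fault",
}


def decode_fixed_sense(sense: bytes) -> tuple:
    """Return (sense_key, asc, ascq, description) from fixed-format sense data."""
    if len(sense) < 14:
        raise ValueError("fixed-format sense data must be at least 14 bytes")
    response_code = sense[0] & 0x7F  # bit 7 is the VALID bit
    if response_code not in (0x70, 0x71):
        raise ValueError(f"not fixed-format sense data (response code 0x{response_code:02X})")
    sense_key = sense[2] & 0x0F
    asc, ascq = sense[12], sense[13]
    description = ASC_ASCQ_DESCRIPTIONS.get((asc, ascq), "not in this excerpt")
    return sense_key, asc, ascq, description


def is_temperature_warning(sense: bytes) -> bool:
    """True if the sense data reports ASC 0Bh / ASCQ 01h."""
    _, asc, ascq, _ = decode_fixed_sense(sense)
    return (asc, ascq) == (0x0B, 0x01)


# Example: an 18-byte buffer with only the fields used above filled in
example = bytearray(18)
example[0] = 0x70    # current error, fixed format
example[7] = 10      # additional sense length
example[12] = 0x0B   # ASC
example[13] = 0x01   # ASCQ
print(is_temperature_warning(bytes(example)))  # True
```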

When the first temperature trip point is exceeded, the sampling period changes from 25 minutes to 15 minutes, and an entry recording the temperature and the Power-On Hours (POH) is made in the permanent drive error log. As long as the temperature remains above the first trip point, the drive continues to make log entries. If the temperature exceeds the 65-degree trip point, the sampling period changes from 15 minutes to 10 minutes, and log entries continue to be written to the permanent drive error log at the 10-minute interval.
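The sampling schedule can be simulated directly. This sketch assumes that "exceeded" means strictly greater than the trip point, that one log entry is made at each reading above the first trip point, and that POH is approximated as minutes since power-on divided by 60 (a real drive's POH count is cumulative over its life). The function names and the temperature profile in the example are hypothetical.

```python
def sampling_interval_min(temp_c: float, first_trip_c: float = 50.0,
                          second_trip_c: float = 65.0) -> int:
    """Minutes until the next reading, per the schedule described above."""
    if temp_c > second_trip_c:
        return 10
    if temp_c > first_trip_c:
        return 15
    return 25


def simulate_readings(temp_at, duration_min: float, first_trip_c: float = 50.0,
                      second_trip_c: float = 65.0) -> tuple:
    """Simulate readings from power-on (minute 0) up to duration_min.

    temp_at is any function mapping minutes since power-on to degrees C.
    Returns (reading_times_min, log_entries); each log entry is
    (power_on_hours, temp_c) for a reading above the first trip point.
    """
    t = 0.0
    readings, log = [], []
    while t <= duration_min:
        temp = temp_at(t)
        readings.append(t)
        if temp > first_trip_c:
            log.append((t / 60.0, temp))
        t += sampling_interval_min(temp, first_trip_c, second_trip_c)
    return readings, log


# Drive runs at 45 C for the first 30 minutes, then at 55 C
times, log = simulate_readings(lambda t: 45.0 if t < 30 else 55.0, 90)
print(times)  # [0.0, 25.0, 50.0, 65.0, 80.0]
```

The readings fall 25 minutes apart until the first reading above 50 degrees, then tighten to 15 minutes, and each of those hot readings produces a log entry.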

All log entries in the media error and hardware error logs also include the temperature at the time of the error, as do all unit start and unit stop entries. In addition, the disk drive records the accumulated power-on hours spent above each trip point and the maximum temperature experienced during the life of the drive. This information is stored in the non-customer data cylinders on the drive.
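The lifetime statistics described above reduce to two accumulators: hours above each trip point and a running maximum. In this hypothetical sketch, each reading's temperature is charged for the whole interval that follows it, which is an assumption; the paper does not say how the drive attributes time between readings. The class name is illustrative.

```python
class TemperatureHistory:
    """Hypothetical lifetime statistics: hours above each trip point and peak temperature.

    Assumption: each reading's temperature is charged for the whole
    interval that follows it, until the next reading.
    """

    def __init__(self, trip_points_c=(50.0, 65.0)):
        self.trip_points_c = tuple(trip_points_c)
        self.hours_above = [0.0] * len(self.trip_points_c)
        self.max_temp_c = None  # None until the first reading

    def record(self, temp_c: float, interval_hours: float) -> None:
        """Update the running maximum and the per-trip-point hour totals."""
        if self.max_temp_c is None or temp_c > self.max_temp_c:
            self.max_temp_c = temp_c
        for i, trip_c in enumerate(self.trip_points_c):
            if temp_c > trip_c:
                self.hours_above[i] += interval_hours


history = TemperatureHistory()
history.record(45.0, 0.5)
history.record(55.0, 0.25)
history.record(70.0, 0.25)
print(history.hours_above, history.max_temp_c)  # [0.5, 0.25] 70.0
```

A reading above the second trip point counts toward both totals, since it is also above the first.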

Applications of Drive-TIP
Drive-TIP works with S.M.A.R.T., the industry-wide standard developed to monitor and predict device performance and reliability. Today, S.M.A.R.T.-capable workstations and servers can warn users of some pending device failures so that action can be taken before data is lost or operations are impacted. With sufficient warning, users can back up vital data and replace suspect devices. Now, with the addition of Drive-TIP, IBM's S.M.A.R.T. capability is enhanced to provide new levels of data integrity and availability. Figure 5 summarizes how Drive-TIP works with S.M.A.R.T.

Figure 5: Summary of the Drive-TIP function

If system users or administrators recognize and act on the warning, corrective action can save data. Systems also now have the information needed to vary cooling capacity according to component needs. For example, fan speed can be controlled based on the temperature within the system, improving the reliability of customer data.
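One simple way a system could use drive temperatures for cooling is a linear fan ramp driven by the hottest drive. This is a hypothetical sketch: the paper does not describe a control law, and every threshold here (40 degrees for idle, 65 degrees for full speed, 30 percent minimum duty) is an assumption, with the 65-degree point chosen only to match the drive's second trip point. How the host obtains the temperatures is not covered, so the function simply takes a list of numbers.

```python
def fan_duty_percent(drive_temps_c, low_c: float = 40.0, high_c: float = 65.0,
                     min_duty: float = 30.0, max_duty: float = 100.0) -> float:
    """Map the hottest drive temperature to a fan duty cycle (percent).

    At or below low_c the fans idle at min_duty; at or above high_c they
    run at max_duty; in between the duty cycle ramps linearly.
    """
    hottest = max(drive_temps_c)
    if hottest <= low_c:
        return min_duty
    if hottest >= high_c:
        return max_duty
    fraction = (hottest - low_c) / (high_c - low_c)
    return min_duty + fraction * (max_duty - min_duty)


print(fan_duty_percent([38.0, 52.5, 47.0]))  # 65.0
```

Keying on the hottest drive makes cooling respond to the worst case in the enclosure rather than an average that could hide one overheating drive.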

The Ultrastar Family: Storage Solutions for Data-Intensive Applications
IBM, the company that pioneered disk drive technology, is today one of the world's leading providers of advanced data storage solutions. The Ultrastar family of server drives continues this legacy by providing two ways of predicting future drive failures: PFA and Drive-TIP. End users can use this information to correct problems before they result in data loss, and system providers can use it to optimize the design of their computer systems.

Product description data represents design objectives and is provided for comparative purposes; actual results may vary depending on a variety of factors. Product claims are true as of the date of the first printing. This product data does not constitute a warranty. Questions regarding IBM warranty terms or the methodology used to derive this data should be referred to an IBM representative. Data subject to change without notice.

¹ For more information about PFA, see Predictive Failure Analysis, Advanced Condition Monitoring, an Exclusive of IBM.

© International Business Machines Corporation 1997
All Rights Reserved

* The following are trademarks or registered trademarks of the IBM Corporation in the United States, other countries, or both: IBM, Ultrastar, Drive-TIP, No-ID, and Predictive Failure Analysis.
Other product names are trademarks or registered trademarks of their respective companies.
References in this publication to IBM products, programs, or services do not imply that IBM intends to make them available in all countries in which IBM operates.

IBM Storage Systems Division
5600 Cottle Road
San Jose, CA 95193

IBM hard disk drive product information and technical support center:
United States:   1-888-IBM-5214 (1-888-426-5214)
Other countries: 507-253-4110

IBM TECHFAX:  1-408-256-5418 (requires a touch-tone phone)
International customers must call from a fax machine.

Japan Sales Branch Office
81-466 45-1039
