Facebook: Learning Lessons at the Prineville Data Center


Facebook’s data center in Prineville has been one of the most energy-efficient data center facilities in the world since it became operational early this year. Innovative features of the electrical distribution system include DC backup and high-voltage (480 VAC) distribution, which eliminate the need for a centralized UPS and for 480V-to-208V transformation. The built-in penthouse houses the chiller-less air conditioning system, which uses 100 percent airside economization and evaporative cooling to maintain the operating environment.

Thursday, November 17, 2011 · Posted by Veerendra Mulay at 8:57 PM

These features have enabled Facebook to reduce the energy consumption of the data center significantly, which is reflected in the facility's power usage effectiveness (PUE). The PUE of the Prineville data center was 1.07 at full load, as verified during commissioning. Since then, during normal operation of the facility, the PUE has varied between 1.06 and 1.1.
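For readers less familiar with the metric, PUE is the ratio of total facility power to the power delivered to IT equipment, so a PUE of 1.07 means roughly 7 percent overhead on top of the IT load. The short Python sketch below illustrates the arithmetic. The function names and the 1,000 kW IT load are hypothetical, chosen only for illustration, and are not figures from the Prineville facility.

```python
def pue(total_facility_kw, it_load_kw):
    """Power usage effectiveness: total facility power divided by IT power."""
    if it_load_kw <= 0:
        raise ValueError("IT load must be positive")
    return total_facility_kw / it_load_kw


def overhead_kw(pue_value, it_load_kw):
    """Power spent on everything other than IT equipment (cooling, losses, etc.)."""
    return (pue_value - 1.0) * it_load_kw


# Hypothetical example: a 1,000 kW IT load drawing 1,070 kW at the facility level.
example_pue = pue(1070.0, 1000.0)
example_overhead = overhead_kw(example_pue, 1000.0)
print(f"PUE = {example_pue:.2f}, overhead = {example_overhead:.0f} kW")
```

With these illustrative numbers, the facility spends about 70 kW on cooling and distribution losses for every 1,000 kW of IT load.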

Challenges in Operations

Although these features have resulted in high efficiency, we have learned some lessons along the way. As part of our commitment to openness via the Open Compute Project, we are sharing our experiences and lessons learned with the community so that everyone might benefit from them.

One challenge we encountered was keeping our air handler lineups from “fighting” with each other as they dealt with the rapid changes in the temperature and humidity of the outside air between day and night. For example, if the outside air dampers of one lineup were at 70 percent, the adjacent lineups would have their outside air dampers at 20-30 percent. This alternating modulation, or fighting, often led to stratification of the air streams.

Another, more significant, issue was an error in the sequence of operation controls that led to complete closure of the outside air dampers, causing the one-pass airflow system to function like a recirculating system. The problem began to manifest in late June as outside air conditions started changing rapidly. The economizer demand signal began responding to the changes; that’s when the erroneous control sequence drove economizer demand to 0, leading to complete closure of the outside air dampers. As a result, the data center was recirculating hot exhaust air at high temperature and low humidity. The evaporative cooling system reacted to this high temperature and low humidity by spraying at 100 percent to maintain the maximum allowed supply temperature and dew point temperature. This resulted in a cold aisle supply temperature exceeding 80°F and relative humidity exceeding 95 percent. The Open Compute servers deployed within the data center reacted to these extreme changes: numerous servers rebooted, and a few shut down automatically because of power supply unit failures.

The hot, humid supply air caused condensation on the concrete slab floor: concrete has high thermal mass, and the slab had been in contact with much cooler supply air for a long time, so its surface remained below the dew point of the incoming air. Similarly, when we investigated the failed power supply units, we observed that the failures were condensation-related.

Issue Analysis

We began investigating this failure by subjecting a server to rapidly changing temperature and humidity conditions in a controlled test chamber. The relative humidity was raised to 97 percent and the temperature was ramped from 15°C to 30°C (59°F to 86°F) over 10 minutes. Under these conditions, condensation was observed on the non-heated components, and the server chassis was dripping wet. The motherboard, however, showed no signs of condensation because it always ran above the dew point temperature.

Condensation was also evident on the surfaces of power supply components such as capacitors and inductors.

The plot shows that the surface of CAP1 falls below the dew point at about 6 minutes into the temperature ramp. This is exactly when the borescope video starts showing a slight change in the reflectivity of the component surfaces. The condensation then continues for another 9 minutes, until the surface temperature of CAP1 rises above the dew point. During the entire test interval, the PCB in the power supply ran above the dew point temperature and showed no signs of condensation.
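The dew point comparison described above can be approximated numerically. The Python sketch below uses the Magnus approximation, a common textbook formula, to estimate the dew point from air temperature and relative humidity, and then checks whether a surface is cold enough to collect condensation. The constants, the function names, and the example surface temperatures are assumptions made for illustration; they are not the method or the measurements used in the chamber test.

```python
import math

# Magnus approximation constants for water vapor over liquid water.
# These are an assumption of this sketch, not values from the test report.
MAGNUS_A = 17.62
MAGNUS_B = 243.12  # degrees Celsius


def dew_point_c(air_temp_c, rel_humidity_pct):
    """Estimate the dew point (in °C) from air temperature and relative humidity."""
    if not 0.0 < rel_humidity_pct <= 100.0:
        raise ValueError("relative humidity must be in (0, 100]")
    gamma = (math.log(rel_humidity_pct / 100.0)
             + MAGNUS_A * air_temp_c / (MAGNUS_B + air_temp_c))
    return MAGNUS_B * gamma / (MAGNUS_A - gamma)


def will_condense(surface_temp_c, air_temp_c, rel_humidity_pct):
    """A surface collects condensation when it is colder than the air's dew point."""
    return surface_temp_c < dew_point_c(air_temp_c, rel_humidity_pct)


# Conditions at the end of the chamber ramp: 30°C air at 97 percent RH.
print(f"dew point: {dew_point_c(30.0, 97.0):.1f} °C")

# Hypothetical surface temperatures: an unheated part still near the
# 15°C starting temperature, and a powered part running at 40°C.
print(will_condense(15.0, 30.0, 97.0))
print(will_condense(40.0, 30.0, 97.0))
```

At 97 percent relative humidity the dew point sits only about half a degree below the air temperature, so any component whose surface lags the air during a fast ramp will collect moisture, while powered components that run warmer than the air stay dry.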

All these findings suggest that the failures were caused by water droplets being blown onto the PCB of the power supply, rather than by condensation occurring on the PCB itself. As shown in Figure 7, water droplets were observed on the AC/DC cables and connectors. It is highly likely that these droplets were blown into the power supply units when the facility’s maintenance staff increased the airflow in an effort to mitigate the problem.

Corrective Actions

The erroneous control sequence was promptly corrected, and additional safeguards were added to prevent such an event from recurring. These safeguards include a reevaluated minimum economizer demand setting, which prevents complete closure of the outside air dampers. Several monitoring points and alarm settings were also modified to give advance notice when outside air conditions begin to change rapidly. The supply air humidity, which exceeded 95 percent at times, was outside the operational range of the power supply units (10-90 percent RH, non-condensing). Even so, conformal coating has been applied locally to selected areas of the PCB to protect against condensation and to harden the power supply units against such corner cases.
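The two control-side safeguards described above, a floor on economizer demand and an early warning on rapidly changing outside air, can be sketched in a few lines of Python. This is a minimal illustration only: the 10 percent floor, the 0.5°C-per-minute threshold, and all function names are hypothetical and do not reflect the actual building management system settings.

```python
# Hypothetical settings, chosen for illustration only.
MIN_ECONOMIZER_DEMAND_PCT = 10.0   # floor that keeps outside air dampers from closing fully
MAX_TEMP_RATE_C_PER_MIN = 0.5      # outside air rate of change that triggers an alarm


def clamp_economizer_demand(requested_pct, floor_pct=MIN_ECONOMIZER_DEMAND_PCT):
    """Limit economizer demand to the range [floor_pct, 100]."""
    return max(floor_pct, min(100.0, requested_pct))


def rapid_change_alarm(samples, max_rate=MAX_TEMP_RATE_C_PER_MIN):
    """Return True if any two consecutive samples change faster than max_rate.

    samples is a time-ordered list of (minute, temperature_c) pairs.
    """
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if t1 > t0 and abs(v1 - v0) / (t1 - t0) > max_rate:
            return True
    return False


# A faulty sequence requesting 0 percent demand is held at the floor instead.
print(clamp_economizer_demand(0.0))

# A 15°C rise over 10 minutes (1.5°C per minute) trips the hypothetical alarm.
print(rapid_change_alarm([(0, 15.0), (10, 30.0)]))
```

Applying the floor at the last step before the damper command means that an error upstream in the control sequence cannot drive the dampers fully closed, while the rate check gives operators notice before conditions reach the servers.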


About Author

Founded in 1994 by the late Pamela Hulse Andrews, Cascade Business News (CBN) became Central Oregon’s premier business publication. CascadeBusNews.com • CBN@CascadeBusNews.com
