INFO: Handling and emergency procedures for very hot outside temperatures in the Karlsruhe region
Very hot outside temperatures, cooling of the server rooms and operation of the VMs
Most bwCloud users are certainly aware that very hot outside temperatures (> 35 degrees Celsius in the shade) are a massive challenge for any cooling system. The problem of adequate indoor cooling at very hot outdoor temperatures is not unique to Deutsche Bahn; server and infrastructure operators have to solve it as well. To keep the temperature and humidity in server rooms constant and to dissipate the heat generated by the hardware, large and complex cooling and air conditioning systems consisting of several components are used. The hardware is not always cooled (and thus protected) exclusively with "classic cold air"; often it is an interplay of heat removal directly in the hardware and cooling of the room air.
Nevertheless, every system reaches the limit of its capacity at a certain outdoor temperature. If other unfavorable factors, such as a fluctuating power supply, occur during such hot periods, the cooling systems can become overloaded and have to be shut down temporarily.
Measures to be taken in the event of a temporary shutdown of the cooling systems
The overriding directives for all measures are:
- Ensure the integrity of the data (VMs, attached storage, etc.), avoid data loss.
- Ensure the functionality of the hardware.
The measures taken in the event of a temporary shutdown of the cooling system follow these two directives.
Should this happen in the bwCloud Karlsruhe region, we will always try to shut down all running VMs properly before the hardware is switched off.
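The ordering described above (VMs first, hardware second) can be illustrated with a minimal Python sketch. It is not the actual bwCloud procedure or tooling: the `Host` class, the `emergency_shutdown` function and the `stop_vm` / `power_off` callbacks are hypothetical names introduced here only to show the two phases.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """A hypothetical compute host and the names of the VMs it runs."""
    name: str
    vms: list[str] = field(default_factory=list)
    powered_on: bool = True


def emergency_shutdown(hosts, stop_vm, power_off):
    """Stop every VM first, then switch off the hardware.

    stop_vm(vm_name) and power_off(host_name) are placeholder callbacks
    (assumptions) standing in for whatever the operator really uses.
    Returns the ordered list of actions that were taken.
    """
    actions = []

    # Phase 1: shut down all VMs properly while every host is still running,
    # so that VM data and attached storage are left in a consistent state.
    for host in hosts:
        for vm in host.vms:
            stop_vm(vm)
            actions.append(("stop_vm", vm))

    # Phase 2: only after all VMs are stopped, switch off the hardware.
    for host in hosts:
        power_off(host.name)
        host.powered_on = False
        actions.append(("power_off", host.name))

    return actions
```

Called with no-op callbacks, for example `emergency_shutdown(hosts, lambda vm: None, lambda h: None)`, the returned list shows every `stop_vm` entry before the first `power_off` entry, which reflects the first directive (data integrity) taking effect before the hardware is touched.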
Questions & Answers
- Question: Will the users be informed via e-mail before such an emergency shutdown of the bwCloud Karlsruhe region?
- Answer: No, there is usually no time left for such measures.
- Question: Will VMs that were shut down be restarted automatically once cooling has been restored?
- Answer: We try to start the VMs that were shut down once cooling has been restored. In some cases there may be delays because problems have to be investigated manually.
- Question: Where can I find information about the current status if the Karlsruhe region has been taken offline?
- Answer: The first point of contact is this website. However, its web server also runs in a virtualization environment, so it may be offline as well. In that case, the alternative site https://scc.fail is still available.
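For users who want to script this lookup, the fallback described in the last answer could look roughly like the following Python sketch. Only the address https://scc.fail comes from the text; the function names, the `primary_url` parameter and the timeout value are assumptions made for illustration.

```python
import urllib.error
import urllib.request

FALLBACK_URL = "https://scc.fail"  # alternative site named in the answer above


def is_reachable(url, timeout=5.0):
    """Return True if an HTTP GET to url succeeds without an error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # Covers DNS failures, refused connections, timeouts, HTTP errors
        # and malformed URLs.
        return False


def pick_status_page(primary_url, probe=is_reachable):
    """Return primary_url if it answers, otherwise the fallback site.

    primary_url is a placeholder for the address of this website;
    probe can be replaced, for example in tests, to avoid network access.
    """
    if probe(primary_url):
        return primary_url
    return FALLBACK_URL
```

The sketch does not check whether the fallback site itself is reachable; it only decides which page to open first.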