Friday, September 7, 2007

Remote Rebooter

Every system administrator knows the joy of crashing a server remotely, then having to phone someone on-site to get them to hit the reset button. After doing this a few times, the people on-site start to get annoyed. Remote rebooters are devices which allow you to power-cycle a crashed server without annoying anyone.

Commercial remote reboot devices are usually miniature computers which sit on the network. You point a web browser at the device's IP address, enter your password, and then press a button. This trips a relay which cuts the power to your (crashed) server for a few seconds. If all goes well, the offending server will reboot normally and reappear on the network a few minutes later. This solution usually costs around $300.

Here's a simple alternative which cost about $30 to build. Our setup has two servers sitting side by side. The rebooter attaches to each server's parallel port, and each server's power plug runs through the rebooter. The device is wired such that either computer can cut the power to the other computer. As long as both computers don't crash simultaneously, one can always bring the other back from the dead.

The biggest headache was trying to avoid a vicious reboot cycle. Imagine A is commanded to reboot B. As the power is restored to B, B accidentally happens to issue the command to reboot A. As A is power-cycled, it accidentally issues the command to reboot B. Rinse. Repeat.

The first method of avoiding this was to make the command an unlikely one to appear accidentally. Since all-bits-high and all-bits-low both happen naturally during a PC's reboot, CMOS logic was added so that a specific pattern of four-bits-high and one-bit-low was required to issue a reboot command. The second method was to make sure the trigger pattern would be ignored if it was just a transient spasm. This was done using some analog electronics that required the valid input to be held for five seconds before it would activate the relays.
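
The two safeguards above are implemented in hardware, but the logic can be sketched in software to make it concrete. The following is only a simulation under stated assumptions: the exact bit positions are not given here, so `TRIGGER_PATTERN` (four bits high, lowest bit low) is a hypothetical choice, and `trigger_fires` is an illustrative name, not part of the actual device.

```python
# Hypothetical 5-bit trigger: four bits high, one bit low.
# The real bit assignment on the parallel port is not specified; this is an assumption.
TRIGGER_PATTERN = 0b11110
PATTERN_MASK = 0b11111
HOLD_SECONDS = 5.0  # the pattern must be held this long before the relays activate


def pattern_matches(port_value):
    """Return True only for the exact trigger pattern (not all-high, not all-low)."""
    return (port_value & PATTERN_MASK) == TRIGGER_PATTERN


def trigger_fires(samples):
    """Decide whether a reboot would be triggered.

    samples: iterable of (timestamp_seconds, port_value) pairs in time order.
    The trigger fires only if the pattern is seen continuously for HOLD_SECONDS.
    """
    held_since = None
    for timestamp, value in samples:
        if pattern_matches(value):
            if held_since is None:
                held_since = timestamp  # pattern just appeared, start timing
            if timestamp - held_since >= HOLD_SECONDS:
                return True
        else:
            held_since = None  # any interruption resets the timer
    return False
```

In this model, the all-bits-high and all-bits-low states seen during a reboot never match, and a pattern that appears only briefly is discarded because the timer resets as soon as the input changes.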

One quirk of our servers was that merely cutting and restoring power wasn't sufficient; we needed to press the power button as well. So there's a second pair of relay circuits which hooks up to the power button's pins on each server's motherboard. First the controlling server opens the 120 V power relay for 15 seconds. Then it closes the power button's reed relay for 5 seconds.
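
The two-step sequence can be written out as a short sketch. This is an illustration only, not the actual control code: `reboot_sequence`, `set_relay`, and the relay names `"power"` and `"button"` are hypothetical, and the relay driver and delay function are passed in so the timing can be checked without hardware.

```python
import time

POWER_OFF_SECONDS = 15    # how long the 120 V relay stays open
BUTTON_PRESS_SECONDS = 5  # how long the power-button reed relay stays closed


def reboot_sequence(set_relay, sleep=time.sleep):
    """Power-cycle the other server, then press its power button.

    set_relay(name, state) is a hypothetical callback that drives a relay;
    name is "power" or "button", state is "open" or "closed".
    """
    # Step 1: cut mains power by opening the power relay.
    set_relay("power", "open")
    sleep(POWER_OFF_SECONDS)
    set_relay("power", "closed")

    # Step 2: simulate a press of the front-panel power button.
    set_relay("button", "closed")
    sleep(BUTTON_PRESS_SECONDS)
    set_relay("button", "open")
```

Calling `reboot_sequence(print)` prints each relay change in order and takes about 20 seconds to complete.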

The resulting gadget can be controlled with a parallel port monitor.
