Letter from Chris @ Virtbiz:
Dear VIRTBIZ Customer,
You are receiving this notice because your service may have been
impacted by a power failure affecting our DAL1 (Canton Street) facility.
The purpose of this notice is to explain the cause of the failure and
the steps taken to prevent recurrence.
As always, we strive to provide you with the best information
available, presented in a straightforward, understandable manner. We
also feel it is valuable to give you enough detail to have a complete
picture of the situation. Below, therefore, is information compiled by
our technicians, management, consultants and vendors.
At approximately 4:03pm CST, our primary 480V feed was taken offline due
to a failure at ONCOR, the utility responsible for maintaining the
electric grid in our area. We are told by our representative there that
due to high winds in the area, a high-voltage conductor wire was blown
off its transmission pole. This caused a number of safety devices to
shunt, effectively disconnecting a large portion of East Dallas,
including DART (light rail and bus) service.
Our emergency power systems immediately initiated their automatic
program, which waits approximately 30 seconds for utility power to
return and, failing that, starts the generator and transfers the load
to it. The generator started but would not continue to run and
supply the load.
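As a rough illustration of the sequence described above, the decision
can be sketched as follows. This is not the actual controller logic:
the function name, the Source values and the parameters are
hypothetical, and only the roughly 30-second wait comes from this
notice.

```python
from enum import Enum


class Source(Enum):
    UTILITY = "utility"
    GENERATOR = "generator"
    UPS_BATTERY = "ups_battery"


def transfer_decision(seconds_since_outage, utility_restored,
                      generator_running, wait_seconds=30):
    """Return the source that should carry the load at this moment.

    Hypothetical model only. wait_seconds defaults to the roughly
    30-second wait mentioned in the notice.
    """
    if utility_restored:
        return Source.UTILITY
    if seconds_since_outage < wait_seconds:
        # Still waiting for utility power to return; UPS carries the load.
        return Source.UPS_BATTERY
    if generator_running:
        return Source.GENERATOR
    # Generator did not start or did not keep running:
    # the load stays on UPS batteries until that is resolved.
    return Source.UPS_BATTERY
```

In this model, the last branch corresponds to the situation during this
event: the wait had expired, but the generator was not sustaining the
load, so the UPS batteries continued to carry it.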
We immediately called hotlines for our generator service contractor, UPS
manufacturer, and consulting electrician to begin troubleshooting the
system.
At 4:30pm CST, the UPS entered a low battery state and began to shed load.
At 4:32pm CST, the fault in the electrical system was diagnosed and
emergency power was brought online. The datacenter ran on
generator-supplied power until ONCOR was able to restore utility power.
The datacenter was restored to normal power shortly after 5:00pm CST, at
which time technicians immediately began identifying and tending to
affected systems.
Routing was not lost during this event: the routing core never lost
power, and BGP sessions stayed up with all providers.
Analysis and discussion with key personnel have identified an engineering
defect in the transfer of power from utility to emergency, as well as a
flaw in the emergency power testing procedures heretofore in place.
The correction for this issue is to configure the coolant pumps and
dry-coolers to power on in a stepped sequence rather than all at once,
which limits the inrush draw placed on the generator at transfer. Work
has already begun on this reconfiguration project.
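To show why a stepped sequence helps, here is a minimal sketch that
compares the peak current when several motors start together with the
peak when their starts are spaced apart. Every figure in it (motor
count, running and inrush amps, timings) is a made-up assumption for
illustration, not a measurement from our facility.

```python
def peak_demand(loads, stagger_seconds, inrush_seconds=2):
    """Peak total current (amps) when loads start stagger_seconds apart.

    loads: list of (running_amps, inrush_amps) tuples, all hypothetical.
    Each load draws inrush_amps for inrush_seconds after it starts and
    running_amps afterwards. stagger_seconds=0 starts everything at once.
    """
    if not loads:
        return 0
    starts = [i * stagger_seconds for i in range(len(loads))]
    horizon = starts[-1] + inrush_seconds + 1
    peak = 0
    for t in range(horizon):
        total = 0
        for start, (running, inrush) in zip(starts, loads):
            if t < start:
                continue  # this load has not started yet
            total += inrush if t - start < inrush_seconds else running
        peak = max(peak, total)
    return peak


# Four hypothetical motors: 20 A running, 120 A inrush each.
motors = [(20, 120)] * 4
print(peak_demand(motors, stagger_seconds=0))  # 480 (all at once)
print(peak_demand(motors, stagger_seconds=5))  # 180 (stepped)
```

With these assumed numbers, stepping the starts means the generator
only ever sees one motor's inrush on top of the motors already running.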
Had it not been for the unanticipated inrush draw, had we resolved
the trouble 2 minutes sooner, or had we had 2 more minutes of UPS time
available, there would have been no impact to service. Unfortunately,
despite our best efforts, there was an impact to some customers.
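The 2-minute figure follows from the times given earlier in this
notice, as the short calculation below shows. The times are converted
to 24-hour form, and the helper name is only illustrative.

```python
from datetime import datetime

FMT = "%H:%M"  # 24-hour clock; all times CST, same day


def minutes_between(start, end):
    """Whole minutes from start to end, both given as HH:MM strings."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds() // 60)


utility_lost = "16:03"      # primary 480V feed taken offline
ups_low_battery = "16:30"   # UPS began to shed load
emergency_online = "16:32"  # emergency power brought online

print(minutes_between(utility_lost, ups_low_battery))      # 27
print(minutes_between(ups_low_battery, emergency_online))  # 2
```

The UPS carried the load for about 27 minutes before shedding began,
and emergency power came online about 2 minutes after that.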
Although the actual service interruption may have been minimal, we
recognize that sometimes these issues have a cascading effect. We
regret and apologize for this.
If you have questions, please do not hesitate to ask any member of our
staff for assistance.
Sincerely,
Chris Gebhardt