Friday, July 15, 2011

Did an obscure "piecashit" gizmo bring down Tele2 in the Baltics?

I got an official description of what happened on July 14 to knock out Tele2's core switch (?) and take down services to around two million customers in Latvia, Lithuania and Estonia (for some in Latvia, well into the night). Despite Tele2 being a Swedish-based international telecoms company, the network crash in the Baltics got, as far as I could see, exactly fuck all coverage in the Swedish media, which, like much of Scandinavia, is zoned out in a hammock somewhere or on a beach in Greece before, to use the Latvian expression, the place goes to the Devil's mother.
Basically, what happened is this: the core switch, possibly a Nokia MCSI, runs on 48-volt DC, which is fed to the device through an AC/DC transformer connected to the 220 V grid. The transformer apparently can get hot, so it has "climate control" (read: an air conditioner or chiller) on it. Because the AC/DC transformer is mission critical, the climate control comes with a temperature sensor and some kind of alarm that alerts Tele2 technical staff when the cooling has failed, giving them enough time to prevent damage to the transformer. The alarm and the sensor are what may have been the "piecashit" gizmos that ultimately crashed the network. The alarm failed to go off until it was too late. By then, the transformer had overheated and shorted out, knocking out the switch. With no transformer, there was no way to power up the switch until extensive repairs had been made. In addition, very complex systems like mobile phone network core switches do not usually reboot easily, especially after a power-failure-induced crash.
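The failure chain described above can be sketched in a few lines of code. This is purely illustrative: the function name, thresholds and "stuck sensor" failure mode are my assumptions, not anything known about Tele2's actual monitoring setup.

```python
# Hypothetical over-temperature alarm logic, illustrating the failure mode
# described above. All names and numbers are made up for the example.

ALARM_THRESHOLD_C = 55.0   # page staff well before the transformer cooks
SHUTDOWN_TEMP_C = 85.0     # point of no return for the transformer

def check_alarm(sensor_reading_c, sensor_healthy):
    """Return 'ALARM' if staff should be paged, 'OK' otherwise.

    The flaw: if the sensor silently fails and keeps returning its last
    good value, this logic happily reports 'OK' while the transformer
    overheats -- which is roughly the failure mode the post describes.
    """
    if not sensor_healthy:
        # A robust design would treat a dead sensor as an alarm in itself.
        # A "piecashit" gizmo just keeps serving the stale reading.
        pass
    if sensor_reading_c >= ALARM_THRESHOLD_C:
        return "ALARM"
    return "OK"

# Stuck sensor reads 40 C forever while the real temperature sails past 85 C.
print(check_alarm(40.0, sensor_healthy=False))  # -> OK (silent failure)
print(check_alarm(60.0, sensor_healthy=True))   # -> ALARM
```

The point of the sketch: monitoring that trusts the sensor unconditionally fails silent, which is the worst way for mission-critical gear to fail.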
Utility power was still on: Latvenergo's press secretary freaked out a little when the media blamed electricity for the failure and said, rightly, that the electricity from the utility was never interrupted; it all happened inside the walls of the Tele2 facility. So it does look like one fucked gizmo brought down everything.
Except -- was there really no UPS (providing DC electricity) attached directly to the switch to keep it going for a while until the techies fixed whatever broke or switched over to generators? Perhaps the emergency protocol was to go directly to the generator, overlooking the possibility that the transformer itself, thanks to some cheapo gizmo, could blow? This is how we learn...
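The gap in the design can be shown with a toy model of the two emergency-power schemes the paragraph contrasts. Again, this is a hypothetical sketch; the real Tele2 power architecture is unknown.

```python
# Toy model: can the 48 V DC switch stay up under various failures?
# Illustrative only -- names and topology are assumptions for the example.

def power_available(mains_ok, rectifier_ok, generator_ok, ups_present):
    """Without a battery UPS on the DC side, the switch dies the instant
    the AC/DC transformer (rectifier) fails, even though mains and the
    generator are both fine -- the generator sits upstream of the same
    broken rectifier. A DC battery bank bridges that gap.
    """
    if mains_ok and rectifier_ok:
        return True        # normal operation
    if generator_ok and rectifier_ok:
        return True        # generator covers a mains outage...
    return ups_present     # ...but only a DC-side UPS covers a dead rectifier

# July 14 scenario: mains fine, rectifier shorted, no DC-side UPS.
print(power_available(mains_ok=True, rectifier_ok=False,
                      generator_ok=True, ups_present=False))  # -> False
```

In other words, a generator-only protocol protects against the grid going down, not against the single transformer between the grid and the switch going down.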


Jānis Kiršteins said...

Seems like a poor redundancy for a core system.

D.d said...

Mhm, do we (they) really learn?