Friday, July 15, 2011

Did an obscure "piecashit" gizmo bring down Tele2 in the Baltics?

I got an official description of what happened on July 14 to knock out Tele2's core switch (?) and take down services to around two million customers in Latvia, Lithuania and Estonia (briefly for some, but in Latvia, well into the night). Despite Tele2 being a Swedish-based international telecoms company, the network crash in the Baltics got (as far as I could see) exactly fuck-all coverage in the Swedish media, which, like much of Scandinavia, is zoned out in a hammock somewhere or on the beach in Greece before, to use the Latvian expression, the place goes to the Devil's mother.
Basically what happened is this: the core switch, possibly a Nokia MCSI, runs on 48-volt DC, which is fed to the device through an AC/DC transformer drawing from the 220 V grid. The transformer apparently can get hot, so it has "climate control" (read: an air conditioner or chiller) on it. Because the AC/DC transformer is mission critical, the climate control comes with a temperature sensor and some kind of alarm that alerts Tele2 technical staff that the cooling has failed, in time for them to prevent damage to the transformer. The alarm and the sensor are what may have been the "piecashit" gizmos that ultimately crashed the network. The alarm failed to go off until it was too late. By then, the transformer had overheated and shorted out, knocking out the switch. With no transformer, there was no way to power up the switch until extensive repairs had been made. On top of that, very complex systems like mobile phone network core switches do not usually reboot very easily, especially after a power-failure-induced crash.
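To make that concrete, here is a rough sketch (my own, in Python, with invented thresholds and placeholder functions -- nothing here is Tele2's actual monitoring setup) of the kind of watchdog logic that seems to have been in play, and of how a silently failed sensor means nobody ever gets paged:

import time

ALARM_THRESHOLD_C = 45.0   # assumed trip point for the transformer room
POLL_INTERVAL_S = 30       # assumed polling interval

def read_temperature() -> float:
    # Stand-in for the real sensor. Imagine it stuck at a "safe" reading --
    # the "piecashit" gizmo scenario.
    return 25.0

def page_technicians(message: str) -> None:
    # Stand-in for whatever actually alerts the technical staff.
    print(message)

def watchdog() -> None:
    while True:
        temp = read_temperature()
        if temp >= ALARM_THRESHOLD_C:
            page_technicians(f"Transformer room at {temp:.1f} C -- check climate control")
        # If the sensor reads low or the alarm path is broken, the branch above
        # never fires and the transformer quietly cooks itself.
        time.sleep(POLL_INTERVAL_S)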
Utility power was still on -- Latvenergo's press secretary freaked out a little when the media blamed the electricity supply for the failure and said, rightly, that power from the utility was never interrupted; it all happened inside the walls of the Tele2 facility. So it does look like one fucked gizmo brought down everything.
Except -- was there really no UPS (providing DC power) attached directly to the switch to keep it going for a while until the techies could fix whatever broke or switch to generators? Perhaps the emergency protocol was to go straight to the generator, forgetting the possibility that the transformer, thanks to some cheapo gizmo, could blow? This is how we learn...
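The question above is really one of hold-up time: how long a battery string or UPS on the 48 V side could have carried the switch after the transformer died. A back-of-the-envelope calculation, with numbers I have made up purely for illustration (Tele2 has not said what DC backup, if any, the switch had):

battery_capacity_ah = 1000   # assumed capacity of a 48 V battery string, in amp-hours
switch_load_a = 200          # assumed steady draw of the core switch, in amps

holdup_hours = battery_capacity_ah / switch_load_a
print(f"Estimated hold-up time: {holdup_hours:.1f} hours")   # -> 5.0 hours

Even a buffer on that order would, in principle, have bought time to bring in a spare rectifier or bypass the damaged one, which is why its apparent absence is the interesting part.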

Thursday, July 14, 2011

Charlie Foxtrot visits Tele2 in Latvia and the Baltics

As I understand it, you weren't supposed to use obscenity on US Army radios, so instead of saying that something was a clusterfuck, you said Charlie Foxtrot. Well, today, and to some extent still tonight (2100 local time, July 14), Charlie Foxtrot visited Swedish-owned Tele2 in Latvia and took a chunk out of Lithuania and Estonia as well, knocking well over two million customers off the network (just over a million in Latvia, a million pre-paid users in Lithuania, and the undisclosed prepaid part of a total of 467 000 users in Estonia).
The problems started just after 1400 local time when, according to Tele2's official version, a disturbance in the electricity supply took down a major switch. To me, this was an immediate red-flag WTF?? because mission-critical switches have, by default, big motherfuckers of UPSes (uninterruptible power supplies) that keep things going until utility power is restored or the load is switched to emergency generators.
My theory, rather, is that there was some kind of perfect-storm event, or that someone tripped over a power cable (between the UPS and the Mother of All Switches, if that is possible), causing enough of a power fluctuation to crash the switch at the software level and perhaps fuck up some vital hard disks. That is just my guess.
Business and post-paid customers in Lithuania were unaffected, and in Estonia, pre-paid customers were only down for around 20 minutes, or so the spokesperson said.
Whatever happened, it possibly showed the downside of Tele2 -- and possibly other operators -- rationalizing their networks by concentrating services for smaller countries like the Baltics in one switching center. It appears that prepaid services (billing, switching) for all three Baltic countries were run on servers/switches in Riga. Given how mission critical the Riga switch is (more so than if the supporting devices and software systems were distributed), one wonders how any event involving electric power could take it down. Some part of the truth may come out tomorrow (July 15).
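As a crude illustration of that blast-radius point, using the rough subscriber figures quoted earlier in this post (Estonia's prepaid share is undisclosed, so its 467 000 total is only an upper bound):

prepaid_affected = {
    "Latvia": 1_000_000,     # "just over a million"
    "Lithuania": 1_000_000,  # "a million pre-paid users"
    "Estonia": 467_000,      # upper bound -- actual prepaid share not disclosed
}

# One Riga switching center serving all three markets: a single failure strands everyone.
centralized_blast_radius = sum(prepaid_affected.values())

# Per-country switching: the same failure strands only the local market.
distributed_blast_radius = max(prepaid_affected.values())

print(f"{centralized_blast_radius} vs {distributed_blast_radius}")   # -> 2467000 vs 1000000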