Google explains the causes of Sunday's blackout

On Sunday, overseas users had to deal with a prolonged blackout that involved several Google services, from YouTube to Drive to Gmail. The alarm was over within a few hours. Today the Mountain View group returns to the story, explaining what caused the malfunction, its impact and the measures put in place to prevent it from happening again.

The post shared by Big G on its official blog talks about delays that affected the search engine and errors generated by some of the platforms it runs. The cause lies in a change made to the configuration of some servers: instead of being applied to a small number of machines located in a single region, it was rolled out on a larger scale, cutting the capacity to handle inbound and outbound traffic from multiple data centers by more than half. The infrastructure that remained operational thus had to cope with an unexpected volume of requests, which led to congestion and, in turn, to the slowdowns users experienced.

As a result, the servers prioritized the requests that were least demanding in terms of bandwidth. Google explains this with a comparison: it is as if it had kept delivering the most urgent packages by bicycle, along roads blocked by a traffic jam. Mountain View engineers identified the anomaly in seconds, while diagnosing and correcting the problem took a few minutes, and a normal situation was restored only hours later. The same slowdown that affected users also slowed down the technicians' intervention.
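To make the mechanism concrete, here is a minimal Python sketch of that kind of triage. It is not Google's actual system: the `Request` class, the `triage` function, the request names and the capacity figures are all hypothetical, chosen only to show how serving the least bandwidth-hungry requests first plays out when available capacity drops by more than half.

```python
from dataclasses import dataclass


@dataclass
class Request:
    name: str
    bandwidth: int  # hypothetical units of network capacity required


def triage(requests, capacity):
    """Serve the least bandwidth-hungry requests first until capacity runs out."""
    served, dropped = [], []
    remaining = capacity
    # Sort so that the lightest requests are considered first.
    for req in sorted(requests, key=lambda r: r.bandwidth):
        if req.bandwidth <= remaining:
            served.append(req.name)
            remaining -= req.bandwidth
        else:
            dropped.append(req.name)
    return served, dropped


# Illustrative workload; the numbers are assumptions, not measured values.
requests = [
    Request("search query", 1),
    Request("email send", 2),
    Request("video stream", 40),
    Request("file upload", 30),
]

# Normal capacity versus capacity cut by more than half.
full_served, full_dropped = triage(requests, capacity=100)
half_served, half_dropped = triage(requests, capacity=40)

print("full capacity dropped:", full_dropped)
print("reduced capacity dropped:", half_dropped)
```

With these made-up numbers, everything is served at full capacity, while at reduced capacity the light requests still get through and the heaviest one is dropped.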

During the blackout, YouTube views decreased by about 10% globally, while traffic to cloud storage services shrank by 30%. In addition, roughly 1% of Gmail users ran into some kind of malfunction: a small share, but one that, considering how widely the platform is used around the world, translates into millions of people unable to send or receive messages. The search engine, by contrast, was only subject to slowdowns in handling queries.

The post highlights that Google is still working to understand the dynamics of the problem in every detail, as well as the reasons it took so long for everything to return to normal. The company also commits to making sure that such incidents do not occur again. The post closes with this comment:

We know that people all over the world rely on Google's services and over the years have grown accustomed to expecting everything to work all the time. We take this expectation very seriously: it is our mission, our inspiration. When we fall short of it, as happened on Sunday, it motivates us to learn as much as possible and make our services even better, faster and more reliable.

