Why Gmail went down yesterday: the verdict

When users were unable to connect to the popular Gmail service yesterday and kept receiving errors for at least 15-20 minutes, they rushed to Twitter to express their panic, posting thousands of frustrated tweets. But what exactly happened over at Google HQ?

Gmail runs on thousands and thousands of overlapping mail servers, any of which can take the reins if another one fails, because user data is replicated and spread across many machines. In front of those servers sit request routers, which forward each Gmail request to whichever server is available at the time.
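To make the routing idea concrete, here is a minimal toy sketch in Python. It is not Google's actual design: the `RequestRouter` class, its round-robin policy and the server names are all hypothetical, and it assumes every backend holds a replica of the data, so any healthy one can serve a request.

```python
import itertools


class RequestRouter:
    """Hypothetical toy router: forwards each request to the next healthy backend.

    Assumes every backend holds a replica of the data, so any healthy one will do.
    """

    def __init__(self, backends):
        # backends: mapping of server name -> True if the server is currently healthy
        self.backends = dict(backends)
        self._cycle = itertools.cycle(sorted(self.backends))

    def mark_down(self, name):
        # Take a server out of rotation (e.g. for an upgrade).
        self.backends[name] = False

    def route(self, request):
        # Try each backend at most once, in round-robin order, skipping unhealthy ones.
        for _ in range(len(self.backends)):
            name = next(self._cycle)
            if self.backends[name]:
                return name
        raise RuntimeError("no backend available for " + request)


router = RequestRouter({"mail-a": True, "mail-b": True, "mail-c": True})
router.mark_down("mail-b")
print([router.route("inbox") for _ in range(4)])  # ['mail-a', 'mail-c', 'mail-a', 'mail-c']
```

Taking `mail-b` offline does not stop requests from being served; the router simply skips it. The weak point, as the outage showed, is the routing layer itself.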

Google has now blamed the downtime on some recent changes to the request routers. Ironically, at least some of those changes were meant to improve Gmail's ability to stay online, but Google underestimated the load they would place on the routers when it took a relatively small number of servers offline for upgrades.

Google engineering VP Ben Treynor explains:

At about 12:30 pm Pacific a few of the request routers became overloaded and in effect told the rest of the system “stop sending us traffic, we’re too slow!”. This transferred the load onto the remaining request routers, causing a few more of them to also become overloaded, and within minutes nearly all of the request routers were overloaded. As a result, people couldn’t access Gmail via the web interface because their requests couldn’t be routed to a server. IMAP/POP access and mail processing continued to work normally because these requests don’t use the same routers.

I’d like to apologize to all of you — today’s outage was a Big Deal, and we’re treating it as such. We’ve already thoroughly investigated what happened, and we’re currently compiling a list of things we intend to fix or improve as a result of the investigation.
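The cascade Treynor describes can be illustrated with a small toy model in Python. Everything here is an assumption for illustration: the function name `simulate_cascade`, the even split of load and the capacity numbers are invented, not taken from Google. Load is divided evenly among routers still accepting traffic; any router whose share exceeds its capacity drops out ("stop sending us traffic"), and its share lands on the survivors in the next round.

```python
def simulate_cascade(total_load, capacities):
    """Toy model of load shedding between request routers.

    Returns the sorted list of routers still serving after each round,
    starting with the full pool.
    """
    active = set(range(len(capacities)))
    history = [sorted(active)]
    while active:
        # Spread the total load evenly over routers still accepting traffic.
        share = total_load / len(active)
        # Routers pushed past capacity refuse further traffic.
        overloaded = {r for r in active if share > capacities[r]}
        if not overloaded:
            break  # stable: every remaining router can cope
        active -= overloaded
        history.append(sorted(active))
    return history


caps = [9, 10, 11, 12, 14]
print(simulate_cascade(45, caps))  # [[0, 1, 2, 3, 4]]
print(simulate_cascade(50, caps))  # [[0, 1, 2, 3, 4], [1, 2, 3, 4], [4], []]
```

With a load of 45 the pool is stable, but a modest increase to 50 knocks out the weakest router, which pushes three more over the edge, and the last one cannot carry everything alone. That is the same "within minutes nearly all of the request routers were overloaded" pattern, in miniature.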

Looks like Gmail, although it has just become the third-largest email service in the U.S., isn’t immune to technical difficulties. They too can mess up every once in a while, just like the best of us!
