By Greg Bensinger, Shubham Kalia and Deborah Mary Sophia
SAN FRANCISCO (Reuters) - Amazon.com said on Monday that a cloud computing unit at its data center in northern Virginia had largely contained fallout from a widespread internet outage that caused global turmoil among thousands of sites, including some of the web's most popular apps like Snapchat and Reddit.
Amazon said it had addressed the underlying issue and was close to a resolution, but some users were still complaining of lingering difficulties using services such as digital wallet Venmo and video calling site Zoom.
The disruption knocked workers from London to Tokyo offline and halted others from conducting normal everyday tasks like paying hairdressers or changing their airline tickets.
It was the largest internet disruption since last year's CrowdStrike malfunction hobbled technology systems in hospitals, banks and airports, highlighting the vulnerability of the world's interconnected technologies.
It was at least the third time in five years that AWS's northern Virginia cluster, known as US-EAST-1, contributed to a major internet meltdown. Amazon did not address a request for more clarity about why that particular data center keeps being impacted, instead pointing to an online statement that said the matter had been "fully mitigated."
The problems stemmed from what is known as the Domain Name System, or DNS, which prevented applications from finding the correct address for AWS's DynamoDB API, a cloud database relied upon to store user information and other critical data.
After hours of disruptions, many applications were gradually coming back online in the afternoon in the U.S. But AWS acknowledged that elevated errors were still affecting several services.
There were "tons of broken internal services still now as individual resolution and repair occurring," read language from an internal problem ticket describing the outage and reviewed by Reuters.
Lambda, one of AWS's computing services, was experiencing errors due to issues with an internal subsystem, AWS had said earlier. "We are taking steps to recover this internal Lambda system," it said.
Earlier, AWS said the root cause of the outage was an underlying subsystem that monitors the health of its network load balancers used to distribute traffic across several servers.
The issue, AWS said, originated from within the "EC2 internal network." EC2 refers to Amazon's "Elastic Compute Cloud" service, which provides on-demand cloud capacity within AWS. Businesses use EC2 to run virtual servers to develop, launch and host applications.
AWS had said earlier in the day it was seeing signs of recovery for EC2 use at a few data centers. It was taking similar measures at the remaining locations and expected the problems to subside, AWS added, without providing a specific timeline.
While some apps like Reddit and Roblox had largely stabilized, according to outage tracking website Downdetector, others, including Snapchat and Duolingo, were showing a resurgence in issues seen earlier in the day.
Ken Birman, a computer science professor at Cornell University, said software developers need to build better fault tolerance into their code. He said AWS provides tools developers can use to protect themselves in the event of a problem at any one of the data centers in its sprawling network, and developers can also create backups with other cloud providers.
"When people cut costs and cut corners to try to get an application up, and then forget that they skipped that last step and didn't really protect against an outage, those companies are the ones who really ought to be scrutinized later," Birman told Reuters.
ISSUE ORIGINATED FROM AWS SITE KNOWN FOR PREVIOUS OUTAGES
AWS provides computing power, data storage and other digital services to companies, governments and individuals and is the world's largest cloud provider, followed by Microsoft's Azure and Alphabet's Google Cloud.
Disruptions to its servers can cause outages across websites and platforms – ranging from food delivery apps to gaming platforms and airline systems – that rely on its cloud infrastructure.
AWS said on its status page that Monday's outage originated at its US-EAST-1 location in northern Virginia, its oldest and largest for web services. The site suffered outages in 2021 and 2020.
According to documentation on the AWS website, the US-EAST-1 site is often the default region for many AWS services.
"FRAGILE INFRASTRUCTURES"
The problem highlights how interconnected everyday digital services have become and their reliance on a small number of global cloud providers, with one glitch wreaking havoc on business and day-to-day life, experts and academics said.
"This outage once again highlights the dependency we have on relatively fragile infrastructures," said Jake Moore, global cybersecurity advisor at European cybersecurity firm ESET.
In Britain, Lloyds Bank, Bank of Scotland and telecom service providers Vodafone and BT were all hit, according to Downdetector's UK website, as was UK tax, payments and customs authority HMRC's website.
"The main reason for this issue is that all these big companies have relied on just one service," said Nishanth Sastry, director of research at the University of Surrey's Department of Computer Science.
Ookla, which owns Downdetector, said over 4 million users reported issues due to the incident.
"For major businesses, hours of cloud downtime translate to millions in lost productivity and revenue," said Ryan Griffin, U.S. cyber practice leader at insurance broker McGill and Partners.
Wall Street was largely unfazed, sending Amazon shares 1.6% higher to $216.48.
FROM SNAPCHAT TO VENMO: OUTAGE TAKES DOWN APPS
Ookla said at least a thousand companies were affected by the outage. Snapchat last had over 7,500 reports on Downdetector, lower than the peak of more than 22,000 but still higher than the 4,000 outage instances at around 7:00 a.m. ET.
Artificial intelligence startup Perplexity, cryptocurrency exchange Coinbase and trading app Robinhood all experienced platform disruptions and attributed them to AWS.
Amazon's own services, including its shopping website, Prime Video and Alexa, were also hit, although Downdetector last showed a decrease in severity.
Fortnite, owned by Epic Games; Clash Royale and Clash of Clans were among the gaming platforms affected. Uber rival Lyft was also knocked down in the United States.
In a post on X, Signal President Meredith Whittaker confirmed the messaging app was hit by the outage as well, though billionaire Elon Musk, who owns X, said his platform continued to work.
(Reporting by Shubham Kalia, Devika Nair, Ananya Palyekar and Deborah Sophia in Bengaluru; Additional reporting by James Pearson, Jaspreet Singh and Arsheeya Bajwa; Editing by Saumyadeb Chakrabarty, Joe Bavier, Richard Chang and David Gregorio)
(The article has been published through a syndicated feed. Except for the headline, the content has been published verbatim. Liability lies with original publisher.)