Disaster Recovery
9 Data Center Outages at AWS and Azure
Published on:
Sunday, March 3, 2024
By Khursheed Hassan
Cloud data centers have become business-critical infrastructure as tens of thousands of businesses have migrated to the cloud. Data center uptime is paramount to running today’s businesses reliably. Outages can have significant consequences for customers, ranging from operational panic to substantial financial losses. Amazon Web Services (AWS) and Microsoft Azure are the two largest cloud service providers (CSPs), each offering a vast array of services to businesses worldwide. Let's compare their track records when it comes to data center outages.
AWS Outages: Amazon Web Services has experienced several notable outages over the years, affecting a range of services and regions. A few interesting ones are:
February 2017 S3 Outage: One of the most notable AWS outages occurred in February 2017, when an engineer's typo during routine maintenance caused widespread disruption to Amazon S3 (Simple Storage Service) in the US-East-1 region. The incident lasted several hours and affected numerous websites and applications that rely on S3 for storage, causing temporary service interruptions for many customers.
November 2020 Kinesis Outage: In November 2020, AWS experienced an outage affecting its Kinesis Data Streams service in the US-East-1 region. This incident impacted various AWS customers and services relying on Kinesis for real-time data processing, including popular platforms like Netflix, Disney+, and Slack.
August 2021 US-East-1 Outage: In August 2021, AWS encountered another significant outage in its US-East-1 region, affecting multiple services, including EC2 (Elastic Compute Cloud), S3, and RDS (Relational Database Service). The outage was attributed to issues with the power supply in an AWS data center, leading to disruptions for numerous customers and applications.
December 2021 Kinesis Outage: Towards the end of 2021, AWS experienced another outage related to its Kinesis Data Streams service, impacting customers across various regions. The incident resulted in disruptions for applications relying on Kinesis for data streaming and processing.
Azure Outages: Similarly, Microsoft Azure has encountered its share of outages, though they have been less publicized than AWS's incidents.
September 2018 South Central US Outage: In September 2018, Azure experienced a major outage in its South Central US region due to severe weather conditions, including lightning strikes that caused a cooling system failure. This incident affected various Azure services, including Virtual Machines, Azure Active Directory, and Azure SQL Database, leading to service interruptions for many customers.
March 2019 Global Azure Outage: In March 2019, Azure encountered a global outage affecting several services across multiple regions. The outage was caused by an issue with Azure's DNS (Domain Name System) service, which prevented users from accessing Azure resources and services for several hours.
September 2020 Azure AD Outage: Azure Active Directory (Azure AD) experienced an outage in September 2020, affecting authentication processes for various Azure services and Microsoft 365 applications. The incident resulted in users being unable to log in to their accounts and access cloud-based resources for a significant period.
March 2021 Azure DNS Outage: Azure experienced another outage in March 2021, primarily affecting its DNS service in multiple regions. The issue resulted in DNS resolution failures, impacting users' ability to access Azure services and external websites hosted on Azure infrastructure.
April 2021 Azure Storage Outage: In April 2021, Azure Storage encountered an outage in its East US region, affecting storage services such as Blob Storage and Queue Storage. The incident caused data access issues for customers relying on Azure Storage for their applications and data storage needs.
Both AWS and Azure have invested in redundancy and fault tolerance within their infrastructure. However, incidents still occur, which highlights the importance of robust disaster recovery plans for businesses that rely on cloud services. Businesses evaluating cloud providers should consider factors such as outage history, infrastructure resilience, and provider communication so they can make informed decisions based on their specific needs and priorities. Learn more about how Cloudidr can assess the reliability of your cloud operations by contacting us at https://www.cloudidr.com/