When AWS stumbled – twice – in October 2025, many teams discovered that “we are in the cloud” is not the same as “we have disaster recovery”.
Applications went offline, customer-facing portals returned errors, and internal dashboards that teams rely on every morning failed to load.
Most of those systems were already running on managed cloud services. They had Multi-AZ databases, Auto Scaling groups, and health checks. What they did not have was a clear answer to three simple questions:
How much data can we afford to lose?
How long can we be down?
Where do we run

