Incident Postmortem - 1Password Cloud Services Degraded
- Date of Incident: 2025-10-20
- Time of Incident (UTC): 07:26:00 - 20:55:00
- Service(s) Affected: 1Password.com website, Sign in, Access to passwords and other items
- Impact Duration: 13 hours, 29 minutes
Summary
On October 20, 2025, at 07:26:00 UTC, 1Password.com experienced intermittent latency, authentication failures, and degraded service availability due to a major AWS outage in the us-east-1 region. This was not a security incident, and no customer data was affected.
As a result, the 1Password server-side application experienced degraded performance and intermittent failures, affecting up to 50% of traffic in the US region. Full service restoration occurred in conjunction with AWS’s final mitigations around 18:30 UTC.
Impact on Customers
All US customers accessing 1Password cloud services experienced intermittent latency, authentication failures, and degraded availability on 1Password.com.
- File Share: Sharing of passwords via links could intermittently fail
- Login: Users logging into vaults experienced timeout errors and slow responses
- Web Access: Users accessing their vault through the web interface experienced timeout errors and slow responses
- API Access: CLI users and API requests received timeout errors and slow responses
What Happened?
At 07:11:00 UTC, AWS began experiencing DNS resolution failures in the us-east-1 region, initially affecting DynamoDB and rapidly cascading to multiple AWS services. 1Password monitoring detected the impact at 07:26:00 UTC, when alerts fired for an inability to scale up clusters, and an incident was declared.
1Password immediately deployed mitigations inside our infrastructure to ensure there was adequate compute capacity to serve our US-based users, which included pausing deployments and scaling down any services not critical to key user functionality.
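The scale-down mitigation described above can be sketched as a simple capacity plan. The deployment names, replica counts, and criticality flags below are hypothetical illustrations, not 1Password's actual services:

```python
from dataclasses import dataclass

@dataclass
class Deployment:
    name: str
    replicas: int
    critical: bool  # does this serve key user functionality?

def plan_scale_down(deployments: list[Deployment]) -> dict[str, int]:
    """Return target replica counts: keep critical services as-is,
    scale non-critical services to zero to reclaim compute capacity."""
    return {d.name: (d.replicas if d.critical else 0) for d in deployments}

# Illustrative fleet, not 1Password's real topology
fleet = [
    Deployment("auth-api", 12, critical=True),
    Deployment("vault-api", 20, critical=True),
    Deployment("report-batch", 6, critical=False),
]
print(plan_scale_down(fleet))
# {'auth-api': 12, 'vault-api': 20, 'report-batch': 0}
```

In practice the resulting targets would be applied through the cluster's orchestration tooling; the point of the sketch is that non-critical capacity is released rather than competing with user-facing services when new compute cannot be provisioned.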
Timeline of Events (UTC):
- 06:55:05 - 1Password monitoring triggers a warning for unavailable pods in a deployment (caused by an inability to obtain AWS IAM credentials)
- 07:03:06 - 1Password monitoring alerts on 5xx errors at the auth start endpoint (caused by an inability to obtain AWS IAM credentials) and pages the authentication team, but the alert recovers within minutes
- 07:26:00 - 1Password monitoring alerts on an inability to scale clusters; engineers begin investigating and an incident is declared
- 07:26:41 - AWS confirms elevated error rates across multiple services
- 07:49:06 - 1Password monitoring alerts on 5xx errors at the auth start endpoint (caused by an inability to obtain AWS IAM credentials)
- 07:51:09 - AWS identifies DNS as the root cause and begins mitigation
- 08:02:13 - 1Password suspends auto-scaling tooling to retain existing capacity
- 09:27:33 - AWS reports significant signs of recovery
- 10:35:37 - AWS declares the DNS issue fully mitigated; services begin recovering
- 14:14:00-15:43:00 - AWS announces full recovery across all services while continuing to throttle EC2 launches
- 16:42:49 - 1Password tooling and users begin reporting 503 errors and an inability to log in due to the volume of traffic
- 16:50:00 - 1Password services are restarted to reset and flush connections, prioritizing post-recovery traffic
- 20:53:00 - AWS resolves their incident
- 20:55:00 - 1Password engineers overscale deployments for stability and overnight observation
- Oct 21, 2025 - Incident is resolved after confirmation of complete upstream recovery
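The 08:02 and 20:55 entries above (suspending auto-scaling, then overscaling deployments) both amount to pinning replica counts by hand rather than trusting the autoscaler while the upstream provider is unstable. A minimal sketch of the overscaling arithmetic, with a hypothetical headroom factor and cap:

```python
import math

def overscale(baseline_replicas: int, headroom: float = 1.5, cap: int = 100) -> int:
    """Pin a deployment above its normal replica count so recovery-traffic
    spikes are absorbed without waiting on scale-up (EC2 launches were
    throttled during AWS's recovery). Headroom and cap are illustrative."""
    return min(cap, math.ceil(baseline_replicas * headroom))

print(overscale(20))   # 30: 1.5x headroom over a 20-replica baseline
print(overscale(90))   # 100: capped to bound cost and cluster pressure
```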
How Was It Resolved?
- Mitigation Steps: 1Password paused deployments and automated management of cluster capacity to ensure enough capacity remained available to serve users through peak access times. As demand outstripped available capacity, 1Password engineers reset the circuit breaker to allow additional connections to the service.
- Resolution Steps: AWS announced system restoration and a reduction in throttling of EC2 API calls. To ensure sufficient capacity for peak traffic, 1Password engineers updated the required number of pods for core services the following business day and resumed automated management of cluster capacity, then verified the health of systems, deployments, and service auto-scaling.
- Verification of Resolution: Engineers observed monitoring systems and cluster management tooling logs to ensure system health.
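The circuit-breaker reset mentioned in the mitigation steps can be illustrated with a minimal breaker. The class, threshold, and reset semantics here are a generic sketch of the pattern, not 1Password's actual implementation:

```python
class CircuitBreaker:
    """Generic circuit-breaker sketch; threshold is illustrative."""

    def __init__(self, failure_threshold: int = 5):
        self.failure_threshold = failure_threshold
        self.failures = 0
        self.open = False

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.open = True  # trip: stop admitting new connections

    def allow_request(self) -> bool:
        return not self.open

    def reset(self) -> None:
        # Manual operator reset, as in the mitigation above: clear the
        # failure count and admit connections once upstream has recovered.
        self.failures = 0
        self.open = False

cb = CircuitBreaker(failure_threshold=3)
for _ in range(3):
    cb.record_failure()
print(cb.allow_request())  # False: breaker tripped under sustained failures
cb.reset()
print(cb.allow_request())  # True: connections admitted again
```

A breaker like this protects a recovering backend from a thundering herd, but it also has to be reopened deliberately once capacity returns, which is the manual step the mitigation describes.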
Root Cause Analysis
The root cause was external to 1Password: DNS resolution failures in AWS's us-east-1 region, initially affecting DynamoDB and cascading to other AWS services, prevented 1Password workloads from obtaining AWS IAM credentials and from scaling cluster capacity.
What We Are Doing to Prevent Future Incidents
- Improve Incident Response: Create additional backup protocols for when our incident response tooling is unavailable.
- Improve multi-service outage response: Create strong break-glass runbooks in the event of a multi-service cloud provider outage.
Next Steps and Communication
No action is required from our customers at this time.
We are committed to providing a reliable and stable service, and we are taking the necessary steps to learn from this event and prevent it from happening again. Thank you for your understanding.
Sincerely,
The 1Password Team