1Password service was unavailable

Incident Report for 1Password

Postmortem

Date of Incident: 2025-04-22
Time of Incident (ET): 8:51am - 9:31am
Service(s) Affected: SSO, Web Sign In, Sign Up, Web Interface, CLI
Impact Duration: 40 minutes

Summary

On April 22, 2025, 1Password’s web interface and APIs were unavailable for all customers accessing the US region. This was not the result of a security incident, and customer data was not affected. A database query consumed database resources to the point where other queries began to fail. As a result, requests to the web interface and to APIs in the region returned an error.

Impact on Customers

During the incident:

  • Web interface, Administration: Customers were unable to log in to the web interface for 1Password. Users already logged in could not use the web application. Administrators were unable to use the administration tools. Users were presented with the error “upstream connect error or disconnect/reset before headers. reset reason: overflow.”
  • Single Sign-on (SSO), Multi-factor Authentication (MFA): Users on accounts with SSO or MFA enabled were unable to sign in and were presented with the error above, or “An unexpected error occurred.”
  • Command Line Interface (CLI): CLI users received the above errors when accessing our web APIs.
  • Browser Extension: Users who needed to authenticate via the web interface were unable to unlock their vaults.

Scope

  • All users accessing services in the US/Global (non-Canada and non-EU) region were affected.

What Happened?

A problem with database infrastructure prevented services from reading or writing data. There were no recent changes to the infrastructure, but additional load from multiple operations contributed to the issue.

  • Timeline of Events:

    • 2025-04-22 08:51 AM ET: Database performance began to degrade.
    • 2025-04-22 08:54 AM ET: Automation detected the issue and tripped a circuit breaker on requests to the database. The web app began returning errors. The system raised alerts.
    • 2025-04-22 08:58 AM ET: Incident team began investigating.
    • 2025-04-22 09:18 AM ET: Issue was identified.
    • 2025-04-22 09:25 AM ET: Database connections were reset to enable recovery.
    • 2025-04-22 09:31 AM ET: Monitors confirmed service was restored and database was operating normally.
  • Root Cause Analysis: The database stopped writing data to its underlying data store. We are continuing to investigate the cause of this fault.

  • Contributing Factors:

    • The incident happened during the peak of daily traffic.
    • Two subsystems were running database operations during or just before the incident.
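
For readers unfamiliar with the circuit breaking noted at 08:54 AM ET in the timeline, the following is a minimal, generic sketch of the pattern. It is not 1Password's implementation: the class name, the failure threshold, and the reset timeout are all hypothetical, chosen only to illustrate why requests fail fast with an error once a database stops responding.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is rejected without running."""


class CircuitBreaker:
    """Hypothetical breaker: opens after consecutive failures, retries after a timeout."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # assumed value, for illustration
        self.reset_timeout = reset_timeout          # assumed value, in seconds
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the breaker is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Fail fast instead of sending more load to a struggling database.
                raise CircuitOpenError("database circuit is open")
            # Timeout elapsed: fall through and allow one trial call (half-open).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many consecutive failures, (re)opens the breaker.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the breaker and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

While the breaker is open, callers receive an immediate error rather than waiting on the database, which matches the errors customers saw during the incident. Once the timeout elapses, a single successful trial call closes the breaker again.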

How Was It Resolved?

  • Mitigation Steps: We halted non-critical processes to reduce load on the database.
  • Resolution Steps: All database connections were routed to a healthy replica. This restored service.
  • Verification of Resolution: We closely observed monitoring systems for 1 hour to confirm that error rates had returned to normal.

What We Are Doing to Prevent Future Incidents

  • Improve monitoring: We are updating our monitoring systems to better detect database issues like this before they can impact customers.
  • Improve database performance: We are tuning queries and refactoring services to improve performance and reduce load on the database.
  • Review database configuration: We are reviewing database size and configuration to optimize performance and enhance monitoring capabilities.

Next Steps

  • No action is needed from customers.

We are committed to providing a reliable and stable service, and we are taking the necessary steps to learn from this event and prevent it from happening again. Thank you for your understanding.

Sincerely,

The 1Password Team

Posted Apr 25, 2025 - 13:51 EDT

Resolved

This incident has been resolved. Our engineering team identified the cause of the incident and made changes to prevent recurrence. We will share more information as soon as it is available.
Posted Apr 22, 2025 - 16:08 EDT

Update

We are continuing to monitor for any further issues.
Posted Apr 22, 2025 - 14:07 EDT

Monitoring

We have recovered from a service disruption affecting the 1Password web interface. Our engineering team has restored all services, with some slowness due to inbound traffic spikes as clients reconnect. The engineering team continues to investigate to determine the root cause.
Posted Apr 22, 2025 - 09:36 EDT

Update

We are continuing to investigate a service disruption affecting the 1Password web interface. Our engineering team has restored some services, but some APIs are still returning errors. The engineering team is actively closing in on a root cause.
Posted Apr 22, 2025 - 09:26 EDT

Investigating

We are currently investigating a service disruption affecting the 1Password web interface. Our engineering team is actively working to identify and resolve the issue.
Posted Apr 22, 2025 - 09:09 EDT
This incident affected: USA/Global (Sign in, Sign up, Admin console, SSO (Single Sign On), Multi-factor Authentication (MFA), Command Line Interface (CLI)).