Root Cause: After reviewing our monitoring and database logs, the dev team determined that one of the schedule tables had been growing slowly over the last 30 days, causing a gradual decay in performance. Table growth isn't unusual, but this particular table is a maintained table that requires a predictable record count to satisfy schedule search performance requirements. The table, and the temporary tables derived from it, reached a critical size at which query performance dropped drastically, pushing response times to 20-60 s, well outside acceptable limits. Normal response times for the search endpoint are expected to be at or below 500-750 ms.
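For illustration only, the kind of record-count check involved might look like the sketch below. The table name `schedule_slots`, the row ceiling, and the use of an in-memory SQLite database as a stand-in are all assumptions; the actual schema, database engine, and thresholds are not part of this report.

```python
import sqlite3

EXPECTED_MAX_ROWS = 50_000  # hypothetical ceiling the maintained table is tuned for

def check_table_size(conn, table="schedule_slots"):
    """Compare the table's current row count against the expected ceiling."""
    (count,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
    if count > EXPECTED_MAX_ROWS:
        print(f"WARNING: {table} holds {count:,} rows, above the {EXPECTED_MAX_ROWS:,} "
              "ceiling the schedule-search queries are tuned for")
    else:
        print(f"{table} row count ({count:,}) is within the expected range")

# Minimal demo against an in-memory stand-in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE schedule_slots (id INTEGER PRIMARY KEY, starts_at TEXT)")
check_table_size(conn)
```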
Impact: The schedule page and schedule search were unresponsive or very slow to load during the incident window, impacting operational teams' and customers' ability to view the schedule or enroll in new lessons. The issue was reported primarily by internal users.
Response and Recovery: The dev team reviewed the queries behind the lagging endpoints, identified the data growth problem, and immediately began clearing the unnecessary records from the table. Given the volume and size of the excess records, this process took about 30 minutes to complete, as it required careful planning and scripting so as not to cause further disruption. The cleanup had an immediate positive impact on query and endpoint performance, bringing both back within acceptable ranges.
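A minimal sketch of the batched-cleanup approach described above is shown below; it is not the team's actual script. The table name, the `is_stale` flag used as the deletion predicate, the batch size, and the SQLite stand-in are all hypothetical. Deleting in small batches with a commit between batches keeps each transaction short so live schedule queries are not blocked for the full duration of the cleanup.

```python
import sqlite3
import time

BATCH_SIZE = 5_000  # hypothetical: small batches keep each transaction short

def purge_stale_rows(conn):
    """Delete flagged rows in small batches, committing between batches."""
    removed = 0
    while True:
        cur = conn.execute(
            "DELETE FROM schedule_slots WHERE id IN "
            "(SELECT id FROM schedule_slots WHERE is_stale = 1 LIMIT ?)",
            (BATCH_SIZE,),
        )
        conn.commit()          # release locks so live queries are not blocked
        removed += cur.rowcount
        if cur.rowcount < BATCH_SIZE:
            return removed
        time.sleep(0.5)        # brief pause between batches to yield to production traffic

# Minimal demo: half of the rows are flagged as stale and get removed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE schedule_slots (id INTEGER PRIMARY KEY, is_stale INTEGER)")
conn.executemany("INSERT INTO schedule_slots (is_stale) VALUES (?)",
                 [(i % 2,) for i in range(20)])
print(f"removed {purge_stale_rows(conn)} stale rows")
```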
Next Steps: We identified the source of the table bloat and have already released a fix. We are monitoring the table's record count to confirm the fix is effective. We also identified another potentially problematic query; a fix is pending and will be deployed by EOD 4/15/25. In addition, we identified two new database monitoring points, which have been created with alerting. We are also investigating further monitoring options to gain better visibility into, and earlier warning of, endpoint latency issues.
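As a rough sketch of what one such monitoring point could look like, the snippet below samples the table's row count on a schedule and raises a warning when day-over-day growth exceeds a limit, catching slow growth long before it affects search latency. The table names, the growth limit, the history table, and the use of a log warning in place of the real alerting system are assumptions, not a description of the team's actual tooling.

```python
import logging
import sqlite3

DAILY_GROWTH_LIMIT = 500   # hypothetical: flag unexpected growth well before it affects queries
logger = logging.getLogger("schedule_table_monitor")

def record_and_check_growth(conn):
    """Intended to run on a schedule: store the current row count and compare to the last sample."""
    conn.execute("CREATE TABLE IF NOT EXISTS row_count_history ("
                 "observed_at TEXT DEFAULT CURRENT_TIMESTAMP, row_count INTEGER)")
    (count,) = conn.execute("SELECT COUNT(*) FROM schedule_slots").fetchone()
    previous = conn.execute(
        "SELECT row_count FROM row_count_history ORDER BY rowid DESC LIMIT 1"
    ).fetchone()
    conn.execute("INSERT INTO row_count_history (row_count) VALUES (?)", (count,))
    conn.commit()
    if previous is not None and count - previous[0] > DAILY_GROWTH_LIMIT:
        # In production this would feed the alerting system; a warning log stands in here.
        logger.warning("schedule_slots grew by %d rows since the last check (limit %d)",
                       count - previous[0], DAILY_GROWTH_LIMIT)

# Minimal demo against an in-memory stand-in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE schedule_slots (id INTEGER PRIMARY KEY)")
record_and_check_growth(conn)   # first sample; later runs compare against it
```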