Resolved -
This incident has been resolved. There was a catastrophic machine failure with a provider that caused the queue manager to eject. Traffic has been shifted, and all services should return to functional.
Mar 17, 19:26 UTC
Identified -
We are receiving 500s from our compute provider for replica training and replica video generation. Shifting traffic off.
Mar 17, 18:31 UTC