Kubernetes Rollout Stuck

Quick post to help anyone else stuck in the same situation I was in today. The internet, or at least my Google-fu, failed me.

Today one of our Kubernetes deployments got stuck. All health checks looked fine, but the rollout stalled somewhere between demoting the current replica set to be the old one and creating a new one.

This seemed to put all replica set creation in limbo, as other deployments also ended up stuck in a similar state.

All health checks I could think of came back clear. Attempts to undo the rollout seemed to be only partially actioned, and the command kubectl rollout status deployment/app kept returning the message:

Waiting for deployment spec update to be observed...
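
For what it's worth, kubectl prints that message while the deployment's metadata.generation is ahead of its status.observedGeneration, in other words while the deployment controller has not yet picked up the latest spec. Here is a minimal sketch for checking that directly, assuming the deployment/app name from the command above. The spec_observed and check_deployment helpers are made up for illustration and are not part of kubectl.

```shell
#!/bin/sh
# Sketch only: compare a deployment's spec generation with the generation
# the controller reports having observed. Helper names are hypothetical.

# Succeeds (exit 0) when the controller has caught up with the spec.
# $1 = metadata.generation, $2 = status.observedGeneration (may be empty)
spec_observed() {
  [ "${2:-0}" -ge "$1" ]
}

# $1 = resource to inspect, e.g. deployment/app (the name used in this post)
check_deployment() {
  gen=$(kubectl get "$1" -o jsonpath='{.metadata.generation}')
  obs=$(kubectl get "$1" -o jsonpath='{.status.observedGeneration}')
  if spec_observed "$gen" "$obs"; then
    echo "controller has observed generation $gen"
  else
    echo "controller is behind: spec=$gen observed=${obs:-none}"
  fi
}

# Usage (needs access to the cluster):
#   check_deployment deployment/app
```

If the observed generation stays behind across several deployments, that points at the controller rather than at any one deployment, which would fit with what I saw.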

Nothing I tried seemed to help. Eventually I resorted to killing one of the controller-manager processes on a master node, which seemed to kick things back to life. It felt like an extreme measure, but I’d run out of non-destructive ideas by this point.

My wild speculation is that an in-memory lock was blocking the replica set creation and that killing the process broke the lock.

Hopefully this helps someone else, or me, in the future.