Quick post to help anyone else stuck in the same situation I was in today. The internet, or at least my Google-fu, failed me.
Today we had a Kubernetes deployment get stuck. All health checks looked healthy, but the rollout stalled somewhere between demoting the current replica set to be the old one and creating a new one.
This seemed to put all replica set creation in limbo, as other deployments also ended up stuck in a similar state.
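For anyone trying to confirm they are in the same state, here is a rough sketch of the kind of checks that would show it. The function names, the default namespace, and the 30 second timeout are all placeholders of mine, not what was actually run during the incident; only standard kubectl subcommands and flags are used.

```shell
#!/bin/sh
# Sketch only: function names, namespace default and timeout are illustrative.

# Report the rollout state of a deployment, giving up after 30s
# instead of blocking indefinitely on a stuck rollout.
rollout_state() {
  deploy="$1"
  ns="${2:-default}"
  kubectl -n "$ns" rollout status "deployment/$deploy" --timeout=30s
}

# List replica sets in a namespace, oldest first, so that a missing
# new replica set stands out at the bottom of the list.
list_replica_sets() {
  ns="${1:-default}"
  kubectl -n "$ns" get replicasets --sort-by=.metadata.creationTimestamp
}

# Show the revision history that a rollout undo would step back through.
rollout_history() {
  deploy="$1"
  ns="${2:-default}"
  kubectl -n "$ns" rollout history "deployment/$deploy"
}
```

With these loaded, `rollout_state app` followed by `list_replica_sets` should make it fairly clear whether the old replica set has been scaled down without a new one appearing.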
All health checks I could think of came back clear. Attempts to undo the rollout seemed to be partially actioned, with the command kubectl rollout status deployment/app returning the message:
Nothing I tried seemed to help. Eventually I resorted to kill-ing one of the controller-manager processes on a master node, which seemed to kick things back to life. This seemed an extreme measure, but I’d run out of non-destructive ideas by this point.
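If you end up at the same last resort, it may help to know which controller-manager instance is the active leader before touching anything. This is a minimal sketch under two assumptions: that the cluster records leader election in a Lease named kube-controller-manager in kube-system (recent Kubernetes versions do; older ones recorded it elsewhere), and that the controller-manager pods carry the kubeadm-style label component=kube-controller-manager. Neither was checked against the cluster in this post, and the function names are made up for illustration.

```shell
#!/bin/sh
# Sketch only: assumes Lease-based leader election and kubeadm-style labels.

# Print the identity of the controller-manager instance that currently
# holds the leader lease (usually the host name plus a unique suffix).
controller_manager_leader() {
  kubectl -n kube-system get lease kube-controller-manager \
    -o jsonpath='{.spec.holderIdentity}'
}

# List the controller-manager pods together with the nodes they run on,
# so the leader identity can be matched to a master node.
controller_manager_pods() {
  kubectl -n kube-system get pods -l component=kube-controller-manager -o wide
}
```

Only the leader is actively reconciling deployments, so restarting a non-leader instance would presumably not have the same effect.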
My wild speculation is that an in-memory lock was blocking the replica set creation and that killing the process broke the lock.
Hopefully this helps someone else, or me, in the future.