Cilium Node Taints After k3s Restart Caused by Clock Skew

After restarting k3s, all nodes got stuck with node.cilium.io/agent-not-ready:NoSchedule taints. Cilium agents were in CrashLoopBackOff, the operator couldn’t remove taints, and nothing would schedule.

problem

After a k3s server restart, Cilium agents crashed with Unauthorized when contacting the API server. The API server logs showed:

"Unable to authenticate the request"
  err="[invalid bearer token, service account token is not valid yet]"

The key phrase is “not valid yet”: the tokens were correctly signed, but their timestamps were in the future.
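One way to confirm a time-based rejection is to decode the token payload and compare its timestamps against the current clock. The sketch below assumes the service account token is a standard JWT carrying `iat`/`nbf` claims; it decodes the payload without verifying the signature. `make_demo_token` is a hypothetical helper that only exists to build a sample token for the demo.

```python
import base64
import json
import time


def decode_claims(token: str) -> dict:
    """Decode the payload of a JWT. The signature is NOT verified."""
    payload_b64 = token.split(".")[1]
    # JWTs use unpadded base64url, so restore the padding before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def seconds_until_valid(claims: dict, now: float) -> float:
    """A positive result means the token is future-dated relative to `now`."""
    not_before = claims.get("nbf", claims.get("iat", 0))
    return not_before - now


def make_demo_token(claims: dict) -> str:
    """Hypothetical helper: build an unsigned, JWT-shaped string for the demo."""
    def b64(obj: dict) -> str:
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()
    return f"{b64({'alg': 'none'})}.{b64(claims)}.sig"


if __name__ == "__main__":
    now = time.time()
    skew = 5 * 3600 + 30 * 60  # the 5h30m offset from this incident
    token = make_demo_token({"iat": int(now) + skew, "nbf": int(now) + skew})
    wait = seconds_until_valid(decode_claims(token), now)
    print(f"token becomes valid in {wait / 3600:.1f} h")
```

Against a real pod, the token would normally be read from /var/run/secrets/kubernetes.io/serviceaccount/token inside the container.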

root cause

The Proxmox host’s hardware clock (RTC) had been reset during PSU repairs. It was storing local time (IST, UTC+05:30) while the system was configured to read the RTC as UTC. As a result, the system clock started 5h30m ahead on every boot and stayed there until NTP corrected it.

16:30 IST  system boots, reads RTC value "11:00" as UTC, thinks it's 16:30 IST (real time: 11:00 IST)
16:30 IST  k3s starts, issues service account tokens timestamped at 16:30 IST
11:01 IST  NTP syncs, clock jumps backward 5h30m to 11:01 IST
11:01 IST  API server rejects all tokens: "not valid yet" (they say 16:30)
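The arithmetic behind the timeline can be reproduced in a few lines. This is a minimal sketch: the boot date is hypothetical, and `misread_rtc` is an illustrative name for what happens when an RTC holding local wall-clock digits is interpreted as UTC.

```python
from datetime import datetime, timedelta, timezone

IST = timezone(timedelta(hours=5, minutes=30))


def misread_rtc(true_now: datetime, local_tz: timezone) -> datetime:
    """Return the system time obtained when an RTC that stores local
    wall-clock time is read as if it were UTC."""
    # The digits the RTC actually holds (local time, no zone attached).
    rtc_digits = true_now.astimezone(local_tz).replace(tzinfo=None)
    # The system attaches UTC to those digits.
    return rtc_digits.replace(tzinfo=timezone.utc)


if __name__ == "__main__":
    true_now = datetime(2024, 1, 1, 11, 0, tzinfo=IST)  # hypothetical boot time
    system = misread_rtc(true_now, IST)
    print("system clock at boot:", system.astimezone(IST).strftime("%H:%M IST"))
    print("skew:", system - true_now)
```

The skew always equals the zone's UTC offset, which is why a jump of exactly 5h30m points at an RTC local-time/UTC mix-up rather than ordinary drift.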

Every pod started before the NTP correction held a future-dated token. Not just Cilium - CoreDNS, metrics-server, and the Cilium operator were all affected.

why it got stuck

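Cilium agents have a 60-second timeout for the initial API server connection. After 60s of rejections, the agent crashes, and Kubernetes restarts it with exponential backoff (capped at 5 min). Each restart reuses the same pod and therefore the same bad token, so the rejections repeat. Left alone, the tokens stay invalid until the real clock catches up with their timestamps, so self-recovery takes hours.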
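To get a feel for how many futile restarts that implies, here is a rough simulation. It assumes the backoff schedule described in the Kubernetes documentation (10s base, doubling, capped at 300s) and that every attempt runs for the full 60-second timeout before crashing; real kubelet timing will differ somewhat.

```python
def backoff_delays(restarts: int, base: float = 10.0, cap: float = 300.0) -> list:
    """Delay before each restart: base, 2*base, 4*base, ... capped at `cap`."""
    return [min(base * 2 ** i, cap) for i in range(restarts)]


def failed_attempts(window_s: float, attempt_s: float = 60.0,
                    base: float = 10.0, cap: float = 300.0) -> int:
    """Count attempts that start and finish inside `window_s` seconds,
    where each attempt is followed by the next backoff delay."""
    elapsed = 0.0
    attempts = 0
    while elapsed + attempt_s <= window_s:
        elapsed += attempt_s + min(base * 2 ** attempts, cap)
        attempts += 1
    return attempts


if __name__ == "__main__":
    print("first delays (s):", backoff_delays(7))
    window = 5 * 3600 + 30 * 60  # time until the future-dated tokens become valid
    print("failed attempts in the window:", failed_attempts(window))
```

Once the cap is reached, each cycle costs about six minutes (60s attempt plus 300s wait), so the agent simply keeps failing at that pace until the token's timestamp is no longer in the future.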

The Cilium operator (not agents) is responsible for removing node taints. The operator also held a bad token, so even after agents recovered, taints stayed:

"Failed to patch node while removing taint" error=Unauthorized
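To see which nodes still carry the taint, the node list can be filtered by taint key. A small sketch, assuming the input is the JSON printed by `kubectl get nodes -o json`; the sample document in the demo is made up for illustration.

```python
import json

TAINT_KEY = "node.cilium.io/agent-not-ready"


def tainted_nodes(nodes_json: str, key: str = TAINT_KEY) -> list:
    """Return the names of nodes that carry a taint with the given key."""
    items = json.loads(nodes_json).get("items", [])
    return [
        node["metadata"]["name"]
        for node in items
        # spec.taints is absent (or null) on untainted nodes.
        if any(t.get("key") == key for t in node.get("spec", {}).get("taints") or [])
    ]


if __name__ == "__main__":
    # Hypothetical two-node cluster: one node still tainted, one clean.
    sample = json.dumps({"items": [
        {"metadata": {"name": "node-a"},
         "spec": {"taints": [{"key": TAINT_KEY, "effect": "NoSchedule"}]}},
        {"metadata": {"name": "node-b"}, "spec": {}},
    ]})
    print(tainted_nodes(sample))
```

An empty result after the operator pod has been recreated indicates that taint removal is working again.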

solution

immediate fix

Delete all pods holding bad tokens to force fresh ones:

kubectl delete pod -n kube-system -l k8s-app=cilium
kubectl delete pod -n kube-system -l name=cilium-operator
kubectl delete pod -n kube-system -l k8s-app=kube-dns

New pods came up within 30 seconds with correctly timestamped tokens.

permanent fix

Manually set the correct UTC time in the Proxmox host’s BIOS/UEFI. The hardware clock holds the value across reboots. Don’t bother with timedatectl.

timedatectl set-time didn’t work even on the host. With chrony running, systemd rejects manual time changes, so you’d need to run timedatectl set-ntp false first. But even then, it would only fix the running system clock. On the next reboot the host reads the RTC again before NTP corrects anything. NTP was broken anyway: Tailscale had overwritten /etc/resolv.conf, so chrony couldn’t resolve any NTP server hostnames. The BIOS change was the right fix; afterwards I fixed the Tailscale DNS problem with --accept-dns=false.
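A quick sanity check of the host's clock configuration can be scripted by parsing the key=value output of `timedatectl show`. The sketch below only looks at the `LocalRTC` and `NTPSynchronized` properties; the sample output is hypothetical. It reports configuration and sync state only, so it would not by itself detect an RTC whose digits were set to local time while `LocalRTC=no`, which is what happened here.

```python
def parse_timedatectl(show_output: str) -> dict:
    """Parse `timedatectl show` output (one KEY=VALUE per line) into a dict."""
    props = {}
    for line in show_output.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines without '='
            props[key.strip()] = value.strip()
    return props


def clock_warnings(props: dict) -> list:
    """Return human-readable warnings for suspicious clock settings."""
    warnings = []
    if props.get("LocalRTC") == "yes":
        warnings.append("RTC is configured as local time")
    if props.get("NTPSynchronized") != "yes":
        warnings.append("system clock is not NTP-synchronized")
    return warnings


if __name__ == "__main__":
    # Hypothetical output from a host whose NTP client cannot reach any server.
    sample = "Timezone=Asia/Kolkata\nLocalRTC=no\nNTP=yes\nNTPSynchronized=no\n"
    for warning in clock_warnings(parse_timedatectl(sample)):
        print("warning:", warning)
```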

debugging trail

  1. Set k8sServiceHost to bypass ClusterIP - didn’t help. Cilium connected to the API server directly but still got Unauthorized. Not a routing problem.
  2. Checked RBAC - service accounts, ClusterRoleBindings all intact. kubectl worked fine from the node.
  3. Deleted a stuck pod - fresh pod came up healthy in 22 seconds. Proved the problem was specific to pods that existed during the restart.
  4. Agents healthy but taints persisted - discovered the operator (not agents) manages taint removal, and it also held a bad token.
  5. Disproved signing key rotation theory - both service.key and service.current.key had identical md5 hashes. k3s does NOT rotate keys on restart.
  6. Found “not valid yet” in API server logs - time-based rejection, not signature-based.
  7. Found the clock jump - journalctl --boot -u systemd-timesyncd showed a 5h30m backward jump when NTP synced. 5h30m is exactly UTC+05:30 (IST). The RTC had been storing local time (IST) while the system was reading it as UTC, so every boot added 5h30m to the system clock until NTP corrected it.
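The key comparison in step 5 can be sketched as follows. This version uses SHA-256 rather than the md5 hashes mentioned above (either works for an equality check), and takes the two file paths as arguments since their location depends on the installation. The demo writes two temporary files with made-up contents instead of touching real keys.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path, algo: str = "sha256") -> str:
    """Hex digest of a file's contents."""
    return hashlib.new(algo, Path(path).read_bytes()).hexdigest()


def same_key(path_a, path_b) -> bool:
    """True if both files have identical contents (compared by digest)."""
    return file_digest(path_a) == file_digest(path_b)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-ins for service.key and service.current.key.
        a = Path(tmp, "service.key")
        b = Path(tmp, "service.current.key")
        a.write_bytes(b"demo key material")
        b.write_bytes(b"demo key material")
        print("keys identical:", same_key(a, b))
```

Identical digests mean the signing key did not change across the restart, which rules out key rotation as the cause of the rejections.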