Your Kubernetes cluster is not a single server. It is a distributed system with its own API server, its own identity layer, its own networking model, and dozens of moving parts that traditional vulnerability scanners were never designed to assess. A Nessus scan that covers your VM fleet tells you nothing about whether your etcd is encrypted, whether your RBAC policies grant excessive permissions, or whether the container images running in production contain known CVEs.
Kubernetes security requires a different mental model. The attack surface spans the cluster control plane, the node operating system, the container runtime, the images themselves, and the network policies that govern traffic between pods. Each layer has its own class of vulnerabilities, and each requires its own scanning and hardening strategy.
The Kubernetes Attack Surface: Five Layers
Understanding where vulnerabilities live in a Kubernetes environment requires thinking in layers. Each layer has distinct vulnerability classes and different remediation approaches.
Layer 1: Container Images
Container images are the most prolific source of CVEs in any Kubernetes environment. A typical production cluster runs hundreds of unique images, each containing an operating system layer, language runtimes, application dependencies, and the application code itself. Every one of those layers can introduce known vulnerabilities.
The numbers are sobering. Research from Sysdig's 2025 Container Security Report found that 87% of container images in production contain at least one high or critical severity CVE. The average image contains 127 known vulnerabilities. Most of these are inherited from base images that development teams never audit.
- Base image selection matters enormously. A `ubuntu:latest` base image carries hundreds of packages you probably do not need. Distroless images from Google, Alpine-based images, or Chainguard's hardened images reduce the attack surface by 80-95%.
- Multi-stage builds are non-negotiable. Build dependencies (compilers, package managers, test frameworks) should never appear in production images. Use multi-stage Dockerfiles to separate build-time from run-time.
- Pin versions explicitly. Using `:latest` tags means your image content changes without your knowledge. Pin to digest-level references for reproducibility.
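Digest pinning, per the last bullet, looks like this in a pod spec. The registry path and digest below are hypothetical placeholders; resolve your image's actual digest from your registry or build tooling.

```yaml
# Sketch: pinning an image by digest instead of a mutable tag.
# Registry path and digest are placeholders, not real artifacts.
apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  containers:
  - name: myapp
    # Immutable reference: the content behind this digest cannot change
    image: registry.example.com/myapp@sha256:4f53cda18c2baa0c0354bb5f9a3ecbe5ed12ab4d8e11ba873c2f11161202b945
```

Unlike a tag, a digest reference fails closed: if the registry no longer has that exact content, the pull fails rather than silently running something else.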
Layer 2: Cluster Configuration (RBAC and API Server)
RBAC misconfigurations are the most common Kubernetes security finding in penetration tests, and they rarely show up in traditional vulnerability scans. The Kubernetes API server grants fine-grained permissions through Roles, ClusterRoles, RoleBindings, and ClusterRoleBindings. When these are overly permissive, an attacker who compromises a single pod can escalate to cluster-admin.
Common RBAC misconfigurations include:
- Wildcard verbs or resources: Roles that grant `*` on all resources give far more access than intended. Always scope permissions to the minimum required verbs and resource types.
- Default service accounts with elevated privileges: Every namespace has a `default` service account. If you bind cluster-level roles to it, every pod in that namespace inherits those permissions.
- Excessive use of ClusterRoles: ClusterRoles apply across all namespaces. Many teams create ClusterRoles when namespace-scoped Roles would suffice, granting cross-namespace access unnecessarily.
- Impersonation rights: The `impersonate` verb allows a subject to act as any other user. This is effectively cluster-admin if not scoped carefully.
# Dangerous: ClusterRole with wildcard access
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: too-permissive
rules:
- apiGroups: ["*"]
  resources: ["*"]
  verbs: ["*"]
# Better: Scoped Role with minimum necessary permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: app-reader
  namespace: production
rules:
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "watch"]
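A Role grants nothing until it is bound to a subject. As a sketch, the app-reader Role above would be attached to a workload identity like this (the service account name `myapp` is a hypothetical example):

```yaml
# Sketch: bind the app-reader Role to one service account only,
# instead of to the namespace's default service account or a group.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: app-reader-binding
  namespace: production
subjects:
- kind: ServiceAccount
  name: myapp            # hypothetical workload identity
  namespace: production
roleRef:
  kind: Role
  name: app-reader
  apiGroup: rbac.authorization.k8s.io
```

Because this is a RoleBinding rather than a ClusterRoleBinding, the read access stays confined to the production namespace.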
Layer 3: Network Policies
By default, Kubernetes allows all pod-to-pod communication within a cluster. This means a compromised pod in the staging namespace can reach the database pods in the production namespace without any restriction. Network policies are the Kubernetes-native mechanism for implementing micro-segmentation.
The reality in most organizations: fewer than 30% of production Kubernetes clusters have any network policies applied. Of those that do, many have policies that are too broad to be meaningful.
Effective network policy strategy follows three principles:
- Default deny all ingress and egress. Start with a deny-all policy in every namespace, then explicitly allow the traffic flows your applications require.
- Label-based selection. Use pod labels to define allowed communication paths. This creates a dynamic firewall that automatically applies to new pods matching the selector.
- Namespace isolation. Prevent cross-namespace traffic except where explicitly required. Most applications should only communicate within their own namespace and to shared infrastructure services.
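The default-deny starting point from the first principle is a minimal policy like this, applied once per namespace:

```yaml
# Sketch: deny all ingress and egress for every pod in the namespace.
# Required traffic is then re-allowed with explicit, label-scoped policies.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}        # empty selector selects all pods in this namespace
  policyTypes:
  - Ingress
  - Egress
```

Note that network policies only take effect if your CNI plugin (Calico, Cilium, and similar) enforces them; on a cluster without policy support the object is accepted but silently ignored.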
Layer 4: Secrets and etcd Security
Kubernetes secrets are base64-encoded by default, not encrypted. Anyone with read access to the etcd datastore or to the Kubernetes API can read every secret in the cluster. This includes database credentials, API keys, TLS certificates, and service account tokens.
Critical hardening measures for secrets include:
- Enable etcd encryption at rest. Write an `EncryptionConfiguration` file and pass it to the API server via the `--encryption-provider-config` flag to encrypt secrets in etcd, using AES-CBC, AES-GCM (only with automated key rotation), or a KMS provider. Without this, secrets are stored in plaintext in the etcd database files.
- Use external secret stores. HashiCorp Vault, AWS Secrets Manager, or Azure Key Vault with the CSI Secrets Store driver keep secrets out of etcd entirely and provide audit logging for secret access.
- Restrict etcd access. The etcd API should never be exposed outside the control plane network. Require mutual TLS for all etcd client connections.
- Rotate secrets automatically. Kubernetes does not rotate secrets natively. Use operators or external tooling to enforce rotation schedules, especially for service account tokens.
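As a sketch, an EncryptionConfiguration that encrypts secrets at rest looks like the following. It is a file read by the API server (via `--encryption-provider-config`), not an object you apply to the cluster, and the key material shown is a placeholder:

```yaml
# Sketch: encrypt secrets in etcd. The first listed provider is used for
# writes; later providers can still decrypt existing data.
# The key below is a placeholder - generate a real 32-byte key, e.g.:
#   head -c 32 /dev/urandom | base64
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
- resources:
  - secrets
  providers:
  - aescbc:
      keys:
      - name: key1
        secret: c2VjcmV0IGlzIHNlY3VyZQ==   # placeholder, not a real key
  - identity: {}   # fallback so pre-existing plaintext secrets stay readable
```

After enabling this, existing secrets remain plaintext in etcd until rewritten; a bulk `kubectl get secrets -A -o json | kubectl replace -f -` forces re-encryption.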
Layer 5: Admission Controllers
Admission controllers are the gatekeepers of your Kubernetes cluster. They intercept API requests after authentication and authorization but before the object is persisted to etcd. This is where you enforce security policies at deployment time, rather than discovering violations after workloads are running.
Essential admission controller configurations:
- Pod Security Standards (PSS): The replacement for the deprecated PodSecurityPolicy. Enforce the `restricted` profile by default, which prevents privilege escalation, host namespace sharing, and running as root.
- Image policy enforcement: Block images from untrusted registries. Only allow pulls from your private registry and verified public sources. Tools like Kyverno and OPA Gatekeeper provide policy-as-code frameworks for this.
- Resource limits: Require CPU and memory limits on all containers. Without limits, a single compromised pod can consume all node resources and cause cluster-wide denial of service.
- Image signing verification: Use Sigstore Cosign to sign container images and verify signatures at admission time. This ensures only images built by your CI/CD pipeline can run in production.
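Pod Security Standards are enforced through namespace labels rather than a separate policy object. A production namespace enforcing the restricted profile looks like this:

```yaml
# Sketch: enforce the restricted Pod Security Standard on a namespace.
# warn and audit modes surface violations without blocking,
# which is useful while rolling the policy out.
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
```

Start with `warn` and `audit` on existing namespaces, fix the violations they report, then add `enforce` once the namespace is clean.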
Building a Kubernetes Vulnerability Scanning Pipeline
Effective Kubernetes security scanning operates at multiple points in the software lifecycle. Waiting until runtime to discover vulnerabilities is too late. By then, the vulnerable image has been deployed, secrets have been exposed, and misconfigurations are already exploitable.
Stage 1: Build-Time Image Scanning
Integrate image scanning into your CI/CD pipeline so that vulnerable images never reach your container registry. Tools like Trivy, Grype, and Snyk Container can scan images during the build phase and fail the pipeline if critical CVEs are detected.
# GitHub Actions example: Trivy scan on every PR
- name: Scan container image
  uses: aquasecurity/trivy-action@master
  with:
    image-ref: 'myapp:${{ github.sha }}'
    format: 'sarif'
    severity: 'CRITICAL,HIGH'
    exit-code: '1' # Fail the build on findings
Build-time scanning catches the low-hanging fruit, but it has limitations. EPSS and KEV data change daily. An image that was clean at build time may contain actively exploited vulnerabilities a week later. This is why runtime scanning is equally important.
Stage 2: Registry Scanning
Scan every image in your container registry on a scheduled basis, not just at push time. Most container registries (Harbor, ECR, GCR, ACR) offer built-in vulnerability scanning, but the quality varies significantly. Consider running your own scanner against the registry API for consistent coverage across multi-cloud environments.
Stage 3: Runtime Scanning and Configuration Auditing
Runtime scanning identifies what is actually running in your cluster right now. This is where tools like kube-bench (CIS Kubernetes Benchmark), kubeaudit, and Falco provide visibility that image scanning alone cannot.
- kube-bench: Audits your cluster against the CIS Kubernetes Benchmark. Checks control plane configuration, node security, RBAC policies, and network settings.
- kubeaudit: Scans live workloads for security misconfigurations. Detects containers running as root, missing resource limits, missing network policies, and mounted service account tokens.
- Falco: Runtime security monitoring. Detects anomalous behavior in running containers, such as unexpected process execution, file access, or network connections.
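kube-bench can be run in-cluster as a one-shot Job. The sketch below is a simplified version of the job manifest the project publishes; consult the kube-bench repository for the current, distribution-specific manifest with its full set of host mounts:

```yaml
# Sketch: run kube-bench as a Job on a node (simplified; the upstream
# job.yaml mounts additional host paths depending on the platform).
apiVersion: batch/v1
kind: Job
metadata:
  name: kube-bench
spec:
  template:
    spec:
      hostPID: true          # needed to inspect node processes
      restartPolicy: Never
      containers:
      - name: kube-bench
        image: docker.io/aquasec/kube-bench:latest
        command: ["kube-bench"]
        volumeMounts:
        - name: etc-kubernetes
          mountPath: /etc/kubernetes
          readOnly: true
      volumes:
      - name: etc-kubernetes
        hostPath:
          path: /etc/kubernetes
```

The results land in the Job's logs (`kubectl logs job/kube-bench`), which makes it straightforward to schedule weekly and ship the output to your SIEM.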
Prioritizing Kubernetes Vulnerabilities
The volume of CVEs in a Kubernetes environment makes prioritization essential. A cluster running 200 unique container images might surface 25,000 individual CVE findings. You cannot patch all of them, and you do not need to.
Effective prioritization in Kubernetes environments requires additional context beyond what CVSS provides:
- Is the vulnerable package reachable? Many CVEs in container images affect libraries that are installed but never loaded at runtime. Reachability analysis (available in tools like Snyk and Endor Labs) dramatically reduces false positives.
- Is the container exposed to the network? A vulnerable image running in an isolated pod with no ingress is lower risk than the same image behind a LoadBalancer service.
- What EPSS and KEV data say. Cross-reference every CVE finding with EPSS probability scores and the CISA KEV catalog. A CVE with EPSS 0.94 in a network-exposed pod is a genuine emergency. The same CVE with EPSS 0.001 in an isolated batch job can wait.
- Does the pod have elevated privileges? A vulnerability in a pod running as root with `hostPID: true` can lead to full node compromise. The same vulnerability in a restricted pod has a much smaller blast radius.
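Shrinking that blast radius is mostly a matter of the pod's securityContext. A container spec that satisfies the restricted Pod Security Standard looks like this (the pod name and image are hypothetical):

```yaml
# Sketch: a securityContext that limits what an exploited process can do -
# no root, no privilege escalation, no capabilities, default seccomp.
apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: myapp
    image: registry.example.com/myapp:1.4.2   # hypothetical image
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop: ["ALL"]
```

With capabilities dropped and escalation blocked, the same CVE that would hand an attacker the node in a privileged pod often yields only a constrained, non-root process.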
Common Kubernetes CVEs: Patterns and Lessons
Studying historical Kubernetes vulnerabilities reveals recurring patterns that inform defensive strategy:
- CVE-2024-21626 (Leaky Vessels): A container escape vulnerability in runc that allowed attackers to break out of containers and access the host filesystem. Affected every container runtime using runc. Lesson: keep your container runtime patched independent of Kubernetes version updates.
- CVE-2023-5528: A Kubernetes privilege escalation via Windows node volumes. Demonstrated that node-level vulnerabilities are distinct from cluster-level ones and require separate scanning.
- CVE-2022-0185: A Linux kernel vulnerability exploitable from within containers that had `CAP_SYS_ADMIN`. Lesson: dropping capabilities and enforcing restricted Pod Security Standards prevents entire classes of container escape CVEs.
Hardening Checklist for Production Clusters
A minimum-viable hardening baseline for production Kubernetes clusters:
- Enable Pod Security Standards at the `restricted` level for all production namespaces
- Implement default-deny network policies in every namespace
- Encrypt secrets in etcd at rest (AES-CBC, AES-GCM with key rotation, or a KMS provider)
- Scan all images in CI/CD and block deployments with critical CVEs
- Run kube-bench weekly against the CIS Kubernetes Benchmark
- Audit RBAC quarterly and remove wildcard permissions
- Use distroless or minimal base images to reduce CVE surface area
- Enable audit logging on the API server and ship logs to your SIEM
- Require image signing and verify signatures at admission
- Rotate service account tokens and use short-lived credentials via projected volumes
The Bottom Line
Kubernetes security is not a single tool or a single scan. It is a layered strategy that spans build-time image scanning, cluster configuration auditing, runtime monitoring, and intelligent prioritization of findings. The organizations that get this right treat Kubernetes as its own security domain with its own policies, its own scanning cadence, and its own remediation workflows.
The organizations that get breached are the ones still running `kubectl apply` with images tagged `:latest`, no network policies, and cluster-admin bound to the `default` service account. The gap between these two postures is not budget. It is process.