Machine Identity Management Guide
A comprehensive guide to managing machine identities including certificates, service accounts, API keys, secrets rotation, PKI setup, and machine-to-machine authentication patterns.
Every organization invests heavily in managing human identities — user accounts, passwords, MFA, access reviews. But the majority of identities in a modern enterprise are not human. They are machines: servers, containers, microservices, CI/CD pipelines, IoT devices, APIs, and automated scripts. Gartner estimates that machine identities outnumber human identities by a factor of 45:1 in typical enterprises, and that ratio is growing.
Machine identities are often poorly managed: service accounts with static passwords that never rotate, API keys hardcoded in source code, certificates that expire without warning, and secrets spread across configuration files and environment variables. Each of these is a breach waiting to happen.
This guide provides a structured approach to machine identity management — from discovery through lifecycle automation.
What You Will Learn
- The types of machine identities and their use cases
- How to design a machine identity governance framework
- Setting up a Private PKI for certificate-based authentication
- Managing service accounts, API keys, and secrets securely
- Automating secrets rotation with zero downtime
- Machine-to-machine authentication patterns (mTLS, OAuth, SPIFFE)
Prerequisites
- Machine identity inventory — Before you can manage machine identities, you need to know what exists. Prepare to discover service accounts, certificates, API keys, and secrets across your environment.
- Secrets management platform — A centralized vault (HashiCorp Vault, AWS Secrets Manager, Azure Key Vault, CyberArk Conjur) for storing and distributing secrets.
- Certificate infrastructure — Either a cloud-based CA (AWS Private CA, Google Cloud CA Service) or the ability to deploy a private PKI.
- Configuration management — A way to deploy configuration changes to machines (Ansible, Puppet, Chef, or Kubernetes ConfigMaps/Secrets).
- Monitoring stack — Logging and alerting infrastructure to monitor certificate expiry, secret access, and anomalous machine behavior.
Architecture Overview
Machine identity management involves several interconnected systems:
- Certificate Authority (CA): Issues, renews, and revokes X.509 certificates for servers, services, and devices.
- Secrets Vault: Centrally stores and controls access to API keys, database credentials, encryption keys, and other secrets. Provides dynamic secrets where possible.
- Identity Broker: Enables machine-to-machine authentication using short-lived credentials. SPIFFE is the emerging standard for workload identity, and SPIRE is its reference implementation.
- Discovery Engine: Scans the environment for machine identities — certificates on load balancers, secrets in environment variables, service accounts in AD/cloud IAM.
- Lifecycle Automation: Handles credential rotation, renewal, and revocation without human intervention.
Types of machine identities:
| Type | Format | Typical Lifetime | Use Case |
|------|--------|-----------------|----------|
| X.509 certificate | PEM/DER | 90 days - 1 year | TLS, mTLS, code signing |
| Service account | Username/password or token | Permanent (bad) / rotated (good) | Application-to-database, AD service |
| API key | String token | Permanent (bad) / rotated (good) | SaaS API access, third-party integrations |
| OAuth client credential | client_id + client_secret | Secret rotated, ID permanent | Machine-to-machine API authorization |
| SSH key | RSA/Ed25519 key pair | Varies | Server access, Git operations |
| SPIFFE SVID | X.509 or JWT | Minutes to hours | Workload-to-workload in service mesh |
| Managed identity | Platform token | Minutes (auto-rotated) | Cloud-native workloads (AWS IAM roles, Azure MI) |
Step-by-Step Implementation
Step 1: Discover Existing Machine Identities
You cannot secure what you do not know about. Run a comprehensive discovery:
Certificates:
# Scan the 10.0.0.0/16 range for TLS certificates served on port 443
nmap --script ssl-cert -p 443 10.0.0.0/16 -oX cert-scan.xml
# Check certificate details
openssl s_client -connect server:443 -servername server.example.com </dev/null 2>/dev/null | \
openssl x509 -noout -subject -issuer -dates -serial
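To feed these results into an inventory, the `notAfter` date printed by `openssl x509 -dates` can be turned into a days-remaining figure. The sketch below assumes the output format shown in the sample string; the function names are illustrative and not part of any tool mentioned above.

```python
import ssl
from datetime import datetime, timezone


def parse_not_after(openssl_output: str) -> datetime:
    """Extract the notAfter timestamp from `openssl x509 -noout -dates` output."""
    for line in openssl_output.splitlines():
        if line.startswith("notAfter="):
            # cert_time_to_seconds parses the "Apr  1 00:00:00 2025 GMT" style
            seconds = ssl.cert_time_to_seconds(line.split("=", 1)[1].strip())
            return datetime.fromtimestamp(seconds, tz=timezone.utc)
    raise ValueError("no notAfter line found")


def days_until_expiry(openssl_output: str, now: datetime) -> int:
    """Whole days between `now` and the certificate's expiry."""
    return (parse_not_after(openssl_output) - now).days


sample = "notBefore=Jan  1 00:00:00 2025 GMT\nnotAfter=Apr  1 00:00:00 2025 GMT\n"
print(days_until_expiry(sample, datetime(2025, 3, 2, tzinfo=timezone.utc)))  # 30
```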
Service accounts in Active Directory:
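# Find all managed service accounts (MSA/gMSA); extra properties must be requested explicitly
Get-ADServiceAccount -Filter * -Properties PasswordLastSet, LastLogonDate |
Select-Object Name, Enabled, PasswordLastSet, LastLogonDate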
# Find accounts with non-expiring passwords
Get-ADUser -Filter {PasswordNeverExpires -eq $true -and Enabled -eq $true} -Properties PasswordLastSet |
Where-Object { $_.PasswordLastSet -lt (Get-Date).AddDays(-90) }
Secrets in source code (run the scanner against repositories in CI/CD, not on production systems):
# Use a secret scanner like truffleHog or gitleaks
gitleaks detect --source /path/to/repo --report-format json --report-path leaks.json
Cloud IAM service accounts:
# AWS: List IAM roles and their creation dates
# (list-roles does not return last-used data; call `aws iam get-role` per role for RoleLastUsed)
aws iam list-roles --query 'Roles[*].[RoleName,CreateDate]' --output table
# GCP: List service accounts
gcloud iam service-accounts list --format="table(email, disabled)"
# Azure: List service principals
az ad sp list --all --query "[].{Name:displayName, AppId:appId}" --output table
Create a central inventory of all discovered machine identities, including owner, purpose, credential type, rotation status, and expiry date.
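One way to picture an inventory entry is as a small record with the fields listed above. The following is a minimal sketch, not a schema from any particular product; the class name, field names, and the 30-day warning window are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class MachineIdentity:
    name: str
    credential_type: str        # e.g. "x509", "api_key", "service_account"
    purpose: str
    owner: Optional[str]        # None means nobody has claimed it yet
    auto_rotated: bool
    expires_on: Optional[date]  # None for credentials without an expiry date


def needs_attention(identity: MachineIdentity, today: date, warn_days: int = 30) -> List[str]:
    """Return the reasons this inventory entry should be reviewed."""
    findings = []
    if not identity.owner:
        findings.append("no owner")
    if not identity.auto_rotated:
        findings.append("manual rotation")
    if identity.expires_on is not None and (identity.expires_on - today).days <= warn_days:
        findings.append("expiring soon")
    return findings


legacy_key = MachineIdentity("billing-api-key", "api_key", "SaaS billing sync",
                             owner=None, auto_rotated=False, expires_on=None)
print(needs_attention(legacy_key, date(2025, 1, 1)))  # ['no owner', 'manual rotation']
```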
Step 2: Deploy a Secrets Management Platform
Centralize all machine secrets in a vault:
# HashiCorp Vault: Enable secrets engine for database credentials
vault secrets enable database
# Configure dynamic database credentials
vault write database/config/mydb \
plugin_name=postgresql-database-plugin \
allowed_roles="app-readonly" \
connection_url="postgresql://{{username}}:{{password}}@db.example.com:5432/mydb" \
username="vault_admin" \
password="REDACTED"
# Create a role that generates short-lived credentials
vault write database/roles/app-readonly \
db_name=mydb \
creation_statements="CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}'; GRANT SELECT ON ALL TABLES IN SCHEMA public TO \"{{name}}\";" \
default_ttl="1h" \
max_ttl="24h"
With dynamic secrets, the vault generates a unique database credential each time an application requests one. With the role above, each credential is valid for 1 hour by default and can be renewed up to a maximum of 24 hours. When the lease expires, the vault automatically revokes the credential. No more shared, static database passwords.
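The interaction between `default_ttl` and `max_ttl` can be illustrated with a small model. This is a simplified sketch of lease arithmetic only, not Vault's implementation; the `Lease` class and its methods are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Lease:
    issued_at: datetime
    expires_at: datetime
    max_ttl: timedelta

    @classmethod
    def issue(cls, now, default_ttl=timedelta(hours=1), max_ttl=timedelta(hours=24)):
        # A fresh credential is valid for default_ttl from the moment of issue.
        return cls(issued_at=now, expires_at=now + default_ttl, max_ttl=max_ttl)

    def renew(self, now, increment=timedelta(hours=1)):
        if now >= self.expires_at:
            raise RuntimeError("lease already expired; request a new credential")
        # A renewal extends from 'now' but never beyond issued_at + max_ttl.
        hard_limit = self.issued_at + self.max_ttl
        self.expires_at = min(now + increment, hard_limit)
        return self.expires_at


t0 = datetime(2025, 1, 1, 12, 0)
lease = Lease.issue(t0)
print(lease.expires_at)                            # 2025-01-01 13:00:00
print(lease.renew(t0 + timedelta(minutes=30)))     # 2025-01-01 13:30:00
```

Once the hard limit is reached, the application has to request a brand-new credential, which is what keeps any single database user short-lived.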
Step 3: Implement Certificate-Based Machine Authentication
For service-to-service communication, mTLS (mutual TLS) with short-lived certificates is the gold standard:
# SPIFFE/SPIRE configuration for workload identity
# spire-server.conf
server {
    bind_address = "0.0.0.0"
    bind_port = "8081"
    trust_domain = "example.com"
    data_dir = "/opt/spire/data/server"
    ca_ttl = "24h"
    default_x509_svid_ttl = "1h"
}
plugins {
    # A DataStore plugin is required; SQLite is shown as an example
    DataStore "sql" {
        plugin_data {
            database_type = "sqlite3"
            connection_string = "/opt/spire/data/server/datastore.sqlite3"
        }
    }
    NodeAttestor "k8s_psat" {
        plugin_data {
            clusters = {
                "production" = {
                    service_account_allow_list = ["spire:spire-agent"]
                }
            }
        }
    }
    KeyManager "disk" {
        plugin_data {
            keys_path = "/opt/spire/data/server/keys.json"
        }
    }
}
SPIFFE (Secure Production Identity Framework for Everyone) assigns each workload a cryptographic identity (SVID — SPIFFE Verifiable Identity Document) in the form of an X.509 certificate or JWT. SPIRE is the reference implementation that automates issuance and rotation.
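A SPIFFE ID is a URI of the form `spiffe://<trust domain>/<workload path>`. The sketch below shows how a relying service might check that an ID belongs to its own trust domain before authorizing a caller. The function name and the example path are illustrative, and the validation is deliberately simpler than the full SPIFFE ID specification.

```python
from urllib.parse import urlsplit


def parse_spiffe_id(spiffe_id: str, expected_trust_domain: str) -> str:
    """Return the workload path if the ID belongs to the expected trust domain."""
    parts = urlsplit(spiffe_id)
    if parts.scheme != "spiffe":
        raise ValueError("not a SPIFFE ID")
    # SPIFFE IDs carry no user info, port, query, or fragment.
    if "@" in parts.netloc or ":" in parts.netloc or parts.query or parts.fragment:
        raise ValueError("malformed SPIFFE ID")
    if parts.netloc != expected_trust_domain:
        raise ValueError("untrusted trust domain")
    return parts.path


print(parse_spiffe_id("spiffe://example.com/ns/prod/sa/web", "example.com"))
# /ns/prod/sa/web
```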
Step 4: Implement API Key Rotation
For third-party API keys and SaaS integrations that cannot use mTLS or OAuth:
# Automated API key rotation script.
# vault_client and saas_admin_api are placeholders for your vault SDK and the
# SaaS provider's admin client; log_rotation_event is your audit-logging hook.
import time
from datetime import datetime, timezone

import vault_client
import saas_admin_api

def rotate_api_key(service_name, vault_path, grace_seconds=300):
    # Step 1: Record the ID of the key currently in use so it can be revoked later
    old_key_id = vault_client.read(vault_path)["key_id"]

    # Step 2: Generate a new API key at the SaaS provider
    now = datetime.now(timezone.utc).isoformat()
    new_key = saas_admin_api.create_api_key(
        service=service_name,
        name=f"{service_name}-{now}",
        permissions=["read", "write"],
    )

    # Step 3: Store the new key and its ID in the vault
    vault_client.write(vault_path, {
        "api_key": new_key.key,
        "key_id": new_key.id,
        "created_at": now,
        "expires_at": new_key.expires_at,
    })

    # Step 4: Wait for applications to pick up the new key
    # Applications should read from vault on each request or on a short poll
    time.sleep(grace_seconds)  # 5 minutes grace period by default

    # Step 5: Revoke the old key
    saas_admin_api.revoke_api_key(service=service_name, key_id=old_key_id)

    # Step 6: Log the rotation event
    log_rotation_event(service_name, old_key_id, new_key.id)
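The grace period only works if applications re-read the key instead of caching it forever. Below is a minimal sketch of the "short poll" approach on the application side. `RefreshingSecret` is a hypothetical helper, and `fetch` stands in for whatever call your vault SDK provides.

```python
import time
from typing import Callable, Optional


class RefreshingSecret:
    """Caches a secret and re-fetches it once it is older than max_age_seconds."""

    def __init__(self, fetch: Callable[[], str], max_age_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._max_age = max_age_seconds
        self._clock = clock
        self._value: Optional[str] = None
        self._fetched_at: Optional[float] = None

    def get(self) -> str:
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at >= self._max_age:
            # Cache is empty or stale: read the current value from the vault.
            self._value = self._fetch()
            self._fetched_at = now
        return self._value


# Usage with a stand-in fetch function (a real one would call the vault).
api_key = RefreshingSecret(fetch=lambda: "key-from-vault", max_age_seconds=60)
print(api_key.get())  # key-from-vault
```

With a 60-second maximum age, every instance picks up a rotated key well inside the 5-minute grace period used above.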
Step 5: Secure Service Accounts
Service accounts in Active Directory and cloud IAM require special treatment:
- Eliminate password-based service accounts. Migrate to Group Managed Service Accounts (gMSA) in AD, which automatically rotate passwords. In cloud environments, use managed identities (AWS IAM roles for EC2/Lambda, Azure Managed Identity, GCP Workload Identity).
- Apply least privilege. Every service account should have the minimum permissions required. Audit and remove unused permissions quarterly.
- Disable interactive login. Service accounts should never be used for interactive (human) login. Disable interactive logon rights and monitor for violations.
# Create a Group Managed Service Account (gMSA)
New-ADServiceAccount -Name "svc-webapp" `
-DNSHostName "svc-webapp.contoso.com" `
-PrincipalsAllowedToRetrieveManagedPassword "WebServers" `
-KerberosEncryptionType AES256
# Install gMSA on the target server
Install-ADServiceAccount -Identity "svc-webapp"
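The quarterly least-privilege audit mentioned above boils down to comparing what each account is granted with what it actually used. This is a rough sketch that assumes you can export both sets from your IAM system and access logs; the permission strings and account names are made up for illustration.

```python
from typing import Dict, List, Set


def unused_permissions(granted: Dict[str, Set[str]],
                       used: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Map each service account to the granted permissions it never exercised."""
    report = {}
    for account, permissions in granted.items():
        unused = permissions - used.get(account, set())
        if unused:
            report[account] = sorted(unused)
    return report


granted = {
    "svc-webapp": {"db:read", "db:write", "queue:publish"},
    "svc-report": {"db:read"},
}
used_last_quarter = {
    "svc-webapp": {"db:read", "queue:publish"},
    "svc-report": {"db:read"},
}
print(unused_permissions(granted, used_last_quarter))  # {'svc-webapp': ['db:write']}
```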
Step 6: Implement Monitoring and Alerting
Machine identities need continuous monitoring:
# Monitoring rules for machine identity health
alerts:
  - name: "Certificate expiring within 30 days"
    query: |
      certificate_expiry_days < 30 AND certificate_auto_renew = false
    severity: warning
    action: notify_certificate_owner
  - name: "Service account password not rotated in 90 days"
    query: |
      service_account_password_age_days > 90
    severity: critical
    action: notify_security_team
  - name: "API key used from unexpected IP"
    query: |
      api_key_usage WHERE source_ip NOT IN allowed_ips
    severity: high
    action: alert_security_ops
  - name: "Batch service account used outside its expected schedule"
    query: |
      service_account_login WHERE service_type = "batch_job"
      AND login_hour NOT IN expected_schedule
    severity: medium
    action: investigate
  - name: "Vault secret access anomaly"
    query: |
      vault_secret_reads WHERE count > baseline * 3
    severity: high
    action: alert_security_ops
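Two of these rules are easy to express directly in code, which helps when testing thresholds before wiring them into an alerting system. The sketch below assumes the baseline is the mean of recent read counts and that the expected schedule is a set of hours; both are illustrative choices, not requirements of any monitoring product.

```python
from statistics import mean
from typing import List, Set


def read_anomaly(current_reads: int, recent_reads: List[int], factor: float = 3.0) -> bool:
    """Flag secret reads that exceed `factor` times the recent average."""
    baseline = mean(recent_reads)
    return current_reads > baseline * factor


def outside_schedule(login_hour: int, expected_hours: Set[int]) -> bool:
    """Flag a batch-job login at an hour that is not in its expected schedule."""
    return login_hour not in expected_hours


print(read_anomaly(350, [100, 120, 80]))   # True: 350 > 3 * 100
print(outside_schedule(14, {1, 2, 3}))     # True: nightly job seen at 14:00
```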
Configuration Best Practices
- Eliminate static secrets. Replace every static credential with a dynamic one (vault-issued, managed identity, SPIFFE SVID) or at minimum enforce automated rotation.
- Use managed identities in cloud. AWS IAM roles, Azure Managed Identity, and GCP Workload Identity eliminate the need for static credentials entirely in cloud-native workloads.
- Certificate lifetimes should be short. 90 days maximum for server certificates, 1 hour for workload identity certificates (SPIFFE SVIDs). Shorter lifetimes reduce the window of exposure from a compromised credential.
- Separate machine and human identities. Never share credentials between humans and machines. Service accounts should have dedicated credentials, dedicated permissions, and dedicated monitoring.
- Tag every machine identity with an owner. Every service account, API key, and certificate must have a documented owner who is responsible for its lifecycle.
- Automate everything. Manual rotation does not scale. Invest in automation for issuance, rotation, and revocation from day one.
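The lifetime limits in the list above can be enforced mechanically at issuance time. A minimal sketch, assuming two credential kinds named `server_certificate` and `workload_svid` (hypothetical labels) with the 90-day and 1-hour ceilings stated above:

```python
from datetime import timedelta

# Maximum lifetimes taken from the best practices above; the keys are illustrative.
MAX_LIFETIME = {
    "server_certificate": timedelta(days=90),
    "workload_svid": timedelta(hours=1),
}


def violates_lifetime_policy(kind: str, requested: timedelta) -> bool:
    """True if the requested lifetime exceeds the maximum for this credential kind."""
    return requested > MAX_LIFETIME[kind]


print(violates_lifetime_policy("server_certificate", timedelta(days=365)))  # True
print(violates_lifetime_policy("workload_svid", timedelta(minutes=30)))     # False
```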
Testing and Validation
- Rotation testing: Trigger a manual rotation for each credential type and verify that applications continue to function without downtime.
- Revocation testing: Revoke a certificate and verify that the relying service rejects it on its next OCSP check (allowing for cached OCSP responses) or within the CRL refresh interval.
- Failover testing: Simulate a vault outage. Verify that applications gracefully degrade (use cached credentials) rather than crash.
- Secret scanner testing: Intentionally commit a dummy secret to a test repository and verify that the CI/CD scanner catches it.
- Expiry alerting: Set a test certificate to expire in 7 days and verify that the alerting pipeline fires correctly.
- Least privilege audit: Attempt to use a service account for an operation outside its granted permissions. Verify the operation is denied.
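For the failover test, it helps to be clear about what "gracefully degrade" means in code. The following sketch shows one possible shape: fall back to the last known credential when the vault is unreachable. `VaultUnavailable` and `get_credential` are hypothetical names, and `fetch` stands in for a real vault client call.

```python
from typing import Callable, Dict


class VaultUnavailable(Exception):
    """Raised by the fetch function when the vault cannot be reached."""


def get_credential(fetch: Callable[[str], str], cache: Dict[str, str], key: str) -> str:
    """Prefer a fresh credential; fall back to the cached copy during an outage."""
    try:
        value = fetch(key)
    except VaultUnavailable:
        if key in cache:
            return cache[key]  # degrade gracefully with the last known value
        raise                  # nothing cached: surface the outage to the caller
    cache[key] = value
    return value


def failing_fetch(key: str) -> str:
    raise VaultUnavailable("simulated outage")


cache = {}
print(get_credential(lambda k: "secret-v1", cache, "db/password"))  # secret-v1
print(get_credential(failing_fetch, cache, "db/password"))          # secret-v1 (from cache)
```

A cached credential is only useful while it is still valid, so the cache lifetime should stay below the credential's TTL.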
Common Pitfalls and Troubleshooting
| Problem | Cause | Solution |
|---------|-------|----------|
| Application crashes during secret rotation | App caches the credential and does not re-read from vault | Implement credential refresh logic; use vault agent for sidecar injection |
| Certificate renewal fails silently | ACME client or renewal script has a bug | Monitor for renewal failures with dedicated alerts; test renewal before expiry |
| Service account lockout | Too many failed auth attempts from a misconfigured app | Implement circuit breakers; alert on repeated auth failures |
| Secrets sprawl | Developers copy secrets to local .env files and config maps | Enforce vault-only access; scan for secrets in CI/CD and at deploy time |
| Orphaned machine identities | Service decommissioned but identity not cleaned up | Implement automated deprovisioning tied to infrastructure lifecycle (Terraform, Kubernetes) |
| mTLS breaks after certificate rotation | Client cached the old server certificate | Use trust bundles that include both old and new CA certificates during rollover |
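The circuit breaker suggested for the lockout problem can be sketched in a few lines. The idea is to stop retrying after a few consecutive authentication failures so that a misconfigured app does not lock the account. The class name and the default thresholds are assumptions for illustration.

```python
import time
from typing import Callable, Optional


class AuthCircuitBreaker:
    """Blocks further auth attempts after repeated failures until a cooldown passes."""

    def __init__(self, max_failures: int = 3, cooldown_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._max_failures = max_failures
        self._cooldown = cooldown_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow_attempt(self) -> bool:
        if self._opened_at is None:
            return True
        # After the cooldown, permit a trial attempt (the "half-open" state).
        return self._clock() - self._opened_at >= self._cooldown

    def record(self, success: bool) -> None:
        if success:
            self._failures = 0
            self._opened_at = None
            return
        self._failures += 1
        if self._failures >= self._max_failures:
            self._opened_at = self._clock()  # open (or re-open) the circuit


breaker = AuthCircuitBreaker(max_failures=3, cooldown_seconds=60)
for _ in range(3):
    breaker.record(success=False)
print(breaker.allow_attempt())  # False: stop before the account locks out
```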
Security Considerations
- Machine identity compromise is stealthy. Unlike human account compromise (which may trigger MFA challenges or user reports), machine identity compromise can go undetected for months. Invest heavily in behavioral monitoring for machine identities.
- The vault is a crown jewel. If an attacker compromises your secrets vault, they gain access to every credential it manages. Treat the vault with the same security rigor as your domain controllers: dedicated infrastructure, hardware-backed encryption, strict access controls, and comprehensive audit logging.
- Supply chain attacks target machine identities. The SolarWinds attack demonstrated how compromised build pipelines and signing certificates can have catastrophic consequences. Protect CI/CD credentials with the same vigor as production credentials.
- Certificate authority compromise is existential. If your private CA's root key is compromised, every certificate it issued is untrustworthy. Use an HSM for root key storage and keep the root CA offline.
- Orphaned machine identities are a backdoor. A service account that belongs to a decommissioned application but was never disabled is an attacker's gift. Implement automated identity lifecycle tied to infrastructure provisioning tools.
Conclusion
Machine identity management is one of the most underinvested areas of enterprise security, yet machine identities far outnumber human identities. The attack surface they present is enormous: static credentials, expired certificates, over-privileged service accounts, and secrets scattered across configuration files.
The path forward is clear: centralize secrets in a vault, automate credential rotation, use short-lived certificates for service authentication, migrate to managed identities in the cloud, and monitor machine identity usage with the same rigor you apply to human identity. Each step materially reduces risk and brings your organization closer to a mature machine identity posture.
FAQs
Q: How do I start if I have no machine identity inventory? A: Begin with discovery. Scan your network for certificates, query Active Directory for service accounts, audit cloud IAM for service principals, and run a secrets scanner across your code repositories. Build the inventory incrementally.
Q: Should I use a separate vault for each environment? A: Yes. Production, staging, and development should have separate vault instances (or at minimum separate namespaces with strict access controls). This prevents development credentials from accidentally being used in production and limits blast radius.
Q: How often should API keys be rotated? A: At minimum every 90 days. For high-sensitivity integrations, rotate every 30 days. If possible, use dynamic credentials (OAuth client credentials with short-lived tokens) instead of static API keys.
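As an illustration of the OAuth alternative, the client credentials grant (RFC 6749, section 4.4) is a single POST to the authorization server's token endpoint. The sketch below only builds the request and does not send it. The token URL, client ID, and scope are placeholders, and it assumes the client ID and secret contain no characters that would need URL-encoding before Basic authentication.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(token_url: str, client_id: str, client_secret: str, scope: str) -> Request:
    """Build (but do not send) an OAuth 2.0 client credentials token request."""
    body = urlencode({"grant_type": "client_credentials", "scope": scope}).encode("ascii")
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode("utf-8")).decode("ascii")
    return Request(
        token_url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


# Placeholder values; in practice the secret would be read from the vault.
request = build_token_request("https://auth.example.com/oauth/token",
                              "inventory-service", "REDACTED", "inventory.read")
print(request.data)  # b'grant_type=client_credentials&scope=inventory.read'
```

The access token returned by the server is short-lived, so only the client secret itself needs a rotation schedule.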
Q: What is SPIFFE and do I need it? A: SPIFFE is a standard for workload identity that assigns each service a cryptographic identity. You need it if you run microservices and want to authenticate service-to-service communication without managing individual certificates. It is especially valuable in Kubernetes environments.
Q: How do I handle machine identities in hybrid environments? A: Use a secrets vault that spans both on-premises and cloud (HashiCorp Vault with multi-datacenter replication). For authentication, use SPIFFE for cross-environment workload identity or OAuth 2.0 client credentials for API-based integration.
Q: What is the cost of machine identity management? A: Secrets management platforms range from free (HashiCorp Vault Community Edition) to $50K+/year for enterprise features. Cloud-native managed identities are free. The biggest cost is the engineering effort to migrate from static credentials to dynamic ones, which is a one-time investment that pays off in reduced breach risk and operational overhead.