UCC: Connection Saturation and High Response Times#
Overview#
USD Content Cache (UCC) serves cached USD assets to render workers over HTTP/HTTPS. When connection limits are exceeded or worker capacity is insufficient, UCC response times degrade, client requests time out, and simulations fail. UCC uses NGINX as its foundation, which has per-worker connection limits that must be sized appropriately for the workload.
Connection saturation occurs when:
Worker connection limits (worker_connections) are undersized for concurrent client requests
CPU cores allocated to UCC are insufficient for the connection handling workload
Replica count is too low to distribute connection load across the cluster
Client concurrency spikes exceed UCC’s configured capacity
HTTP/1.1 is used instead of HTTP/2, requiring one connection per concurrent request
When connection saturation occurs, UCC cannot accept new connections, client requests queue or time out, and P99 response times increase dramatically (from milliseconds to seconds or tens of seconds). This manifests as simulation failures with timeout errors or “connection refused” messages.
The recommended sizing formula for UCC connections is:
worker_connections = (GPU_count * client_concurrency) / replica_count / vCPU_count * safety_margin
For example, with 66 GPUs, 256 client concurrency, 5 replicas, and 32 vCPUs per replica:
worker_connections = (66 * 256) / 5 / 32 * 1.5 ~ 160
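The same arithmetic can be scripted as a quick sanity check. The shell below is an illustrative sketch, not part of any UCC tooling; the variable names are placeholders and the values match the example above.
# Sketch: evaluate the sizing formula for your own fleet
GPU_COUNT=66
CLIENT_CONCURRENCY=256
REPLICA_COUNT=5
VCPU_COUNT=32
awk -v g="$GPU_COUNT" -v c="$CLIENT_CONCURRENCY" -v r="$REPLICA_COUNT" -v v="$VCPU_COUNT" \
    'BEGIN { printf "worker_connections >= %.0f\n", g * c / r / v * 1.5 }'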
Symptoms and Detection Signals#
Visible Symptoms#
High P99 response times - Response times exceeding 5-10 seconds at the 99th percentile
Client timeout errors - Render workers reporting connection timeouts or “connection refused”
Simulation failures - Simulations failing with gRPC UNKNOWN errors or HTTP timeout errors
Connection queue buildup - Connections waiting for available worker slots
Log Messages#
Connection Refused#
Where to find these logs:
Location: Render Worker Pod
Application: OmniClient / Storage Service
Description: Logs indicating connection failures to UCC
# Look for timeout or connection errors in render worker logs
# (namespace and pod names are placeholders for your deployment)
kubectl logs -n <render-namespace> <render-worker-pod> | grep -iE "refused|timeout|connect|dial"
Timeout Errors#
Where to find these logs:
Location: Render Worker Pod
Application: Storage Service
Description: Logs indicating request timeouts from UCC
# Look for timeout errors in render worker logs
kubectl logs -n <render-namespace> <render-worker-pod> | grep -iE "timeout|deadline|context"
Metric Signals#
The following Prometheus metrics can be used to detect connection saturation before it causes simulation failures. Monitor these metrics proactively to identify capacity issues early.
nginx_connections_active{pod=~"usd-content-cache-.*", namespace="ucc"}
nginx_connections_waiting{pod=~"usd-content-cache-.*", namespace="ucc"}
nginx_http_request_duration_seconds{quantile="0.99", pod=~"usd-content-cache-.*"}
rate(nginx_http_requests_total{pod=~"usd-content-cache-.*", status=~"5.."}[5m])
container_network_receive_bytes_total{pod=~"usd-content-cache-.*", namespace="ucc"}
Connection capacity is tracked through NGINX connection metrics that
reveal worker utilization and queueing. The nginx_connections_active
metric reports active connections per NGINX worker process. High values
approaching the configured worker_connections limit indicate
connection saturation risk, requiring alerts when exceeding 80% of the
configured capacity. When connection limits are reached, NGINX begins
queueing incoming connections, tracked by nginx_connections_waiting.
Non-zero values indicate connection queueing has begun, with high values
signaling severe saturation. This metric should typically remain at zero
or very low under normal operation.
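As a sketch, the 80% threshold and the queueing check might be encoded as PromQL alert expressions like the following. The threshold uses the common 1,024 default for illustration; substitute your configured worker_connections, and scale by worker count if your exporter reports per-pod totals rather than per-worker values.
# Active connections above 80% of the configured connection limit
nginx_connections_active{pod=~"usd-content-cache-.*", namespace="ucc"} > 0.8 * 1024
# Any connection queueing (pair with a "for: 30s" clause in the alert rule)
nginx_connections_waiting{pod=~"usd-content-cache-.*", namespace="ucc"} > 0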
Request performance degradation manifests in latency metrics tracked
through nginx_http_request_duration_seconds. The P99 quantile (99th
percentile) reveals tail latency, where values exceeding 5-10 seconds
indicate severe performance degradation. Compare against baseline
performance, typically under 500ms for cache hits, to identify
connection saturation impacts. When saturation prevents requests from
being processed, error rates increase, captured in
nginx_http_requests_total with 5xx status codes. Sharp increases in
this rate may indicate connection saturation causing request failures,
as normal operation should have minimal 5xx errors.
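Corresponding sketch expressions for the latency and error-rate signals; the thresholds are illustrative and should be tuned against your measured baseline.
# P99 latency beyond the severe-degradation threshold described above
nginx_http_request_duration_seconds{quantile="0.99", pod=~"usd-content-cache-.*"} > 5
# More than one 5xx response per second; normal operation should be near zero
rate(nginx_http_requests_total{pod=~"usd-content-cache-.*", status=~"5.."}[5m]) > 1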
Network capacity constraints can contribute to connection issues,
tracked through container_network_receive_bytes_total, which
measures total inbound traffic to UCC pods. High values approaching NIC
capacity may indicate network saturation contributing to connection
problems. Compare against VM SKU NIC limits to determine if network
bandwidth is becoming a constraint alongside connection capacity.
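A sketch expression for the bandwidth comparison, assuming a 10 Gbps NIC; substitute your VM SKU's actual limit.
# Inbound throughput in bits/s per pod vs. 70% of an assumed 10 Gbps NIC
rate(container_network_receive_bytes_total{pod=~"usd-content-cache-.*", namespace="ucc"}[5m]) * 8 > 0.7 * 10e9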
Root Cause Analysis#
Known Causes#
Connection saturation in UCC is typically caused by undersized worker connection limits, insufficient CPU cores, or too few replicas to handle the workload.
Undersized Worker Connection Limits#
The worker_connections parameter in NGINX controls the maximum
number of simultaneous connections each worker process can handle. The
default value (often 1,024) is insufficient for high-concurrency
simulation workloads. Each NGINX worker process runs on one CPU core, so
total connection capacity is
worker_connections * vCPU_count * replica_count.
For example, with default worker_connections=1024, 32 vCPUs, and 5
replicas:
Total capacity = 1,024 * 32 * 5 = 163,840 connections
However, this includes both inbound (client→UCC) and outbound (UCC→S3) connections. A workload with 66 GPUs and 256 client concurrency requires:
Required connections ~ 66 * 256 = 16,896 inbound, plus additional outbound connections to S3
If worker_connections is too low, NGINX rejects new connections once
the limit is reached, causing “connection refused” errors and queueing.
Check current worker connection configuration:
# Check Helm values for worker_connections
helm get values <ucc-release-name> -n ucc | grep worker_connections
# If not set, check default from ConfigMap
kubectl get configmap -n ucc <ucc-configmap> -o yaml | grep worker_connections
Insufficient CPU Cores#
NGINX spawns one worker process per CPU core. If CPU allocation is too
low, even with properly sized worker_connections, UCC cannot handle
the connection load because there are not enough worker processes to
distribute connections across.
Check CPU allocation:
# Check UCC pod CPU limits and requests
kubectl get pods -n ucc -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].resources}{"\n"}{end}'
# Check actual CPU usage
kubectl top pods -n ucc
Too Few Replicas#
UCC replica count may be too low to distribute connection load. The sizing guidance is bandwidth-based: provision 3.3 Gbps of network bandwidth per GPU (recommended), or at least 1 Gbps per GPU as an absolute minimum.
For example, with 66 GPUs requiring 217.8 Gbps total bandwidth, and VMs with 10 Gbps NICs:
Required replicas ~ 217.8 Gbps / 10 Gbps per node ~ 22 pods
In practice, network bandwidth is typically the primary sizing factor for replicas; connection limits are secondary.
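A minimal sketch of the replica calculation, assuming the 3.3 Gbps per GPU recommendation and a 10 Gbps NIC:
# Illustrative bandwidth-based replica sizing, rounded up
awk 'BEGIN {
  gpus = 66; gbps_per_gpu = 3.3; nic_gbps = 10
  r = gpus * gbps_per_gpu / nic_gbps
  printf "required replicas ~ %d\n", int(r) + (r > int(r))
}'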
Check current replica count:
# Check UCC StatefulSet replica count
kubectl get statefulset -n ucc
# Check Helm values
helm get values <ucc-release-name> -n ucc | grep replicas
Other Possible Causes#
HTTP/1.1 Instead of HTTP/2
HTTP/1.1 requires one connection per concurrent request
HTTP/2 multiplexes multiple requests over a single connection
Using HTTP/1.1 exhausts connections faster than HTTP/2
Check client HTTP version support and UCC HTTP/2 configuration
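One way to check the negotiated HTTP version, assuming UCC is reachable at a placeholder endpoint:
# Prints "2" if HTTP/2 was negotiated, "1.1" if the connection fell back to HTTP/1.1
curl -sI --http2 -o /dev/null -w '%{http_version}\n' https://<ucc-endpoint>/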
Load Balancer Session Affinity Disabled
Without session affinity, client retries hit different UCC pods
Retry storms amplify connection pressure across all pods
Each retry counts as a new connection without affinity
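For a Kubernetes Service, a sketch of enabling ClientIP affinity with the 60-second timeout suggested later in this guide; cloud provider load balancers typically use provider-specific annotations instead.
# Enable ClientIP session affinity on the UCC Service (service name is a placeholder)
kubectl patch svc <ucc-service> -n ucc -p \
  '{"spec":{"sessionAffinity":"ClientIP","sessionAffinityConfig":{"clientIP":{"timeoutSeconds":60}}}}'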
Cloud Provider SNAT Port Exhaustion
Outbound connections to S3 consume SNAT ports
SNAT exhaustion prevents new upstream connections
More common in cloud environments with NAT gateways or load balancers
Network Latency or Packet Loss
High network latency increases connection lifetime
Packet loss triggers retransmissions and connection delays
Connections remain open longer, consuming worker slots
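A rough probe for latency and loss from a render worker toward UCC, assuming the worker image includes ping (names are placeholders):
# Look for non-zero packet loss or high/variable round-trip times
kubectl exec -n <render-namespace> <render-worker-pod> -- \
  ping -c 20 <ucc-service>.ucc.svc.cluster.local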
Troubleshooting Steps#
Diagnostic Steps for Known Root Causes#
1. Monitor Connection Metrics and Identify Saturation#
Check active and waiting connection counts to determine if saturation is occurring.
# Access UCC metrics endpoint
kubectl port-forward -n ucc svc/<ucc-service-name> 9145:9145
curl http://localhost:9145/metrics | grep "nginx_connections"
# Query Prometheus metrics:
# - nginx_connections_active{pod=~"usd-content-cache-.*"}
# - nginx_connections_waiting{pod=~"usd-content-cache-.*"}
# Check if active connections approach worker_connections limit
# Alert threshold: active > 0.8 * worker_connections
Analysis:
Active connections consistently near the worker_connections limit indicate saturation.
Non-zero nginx_connections_waiting indicates connection queueing.
P99 response times >5s correlate with high active connection counts.
Compare active connections across pods to identify load distribution issues.
Resolution:
If active connections approach the limit, increase worker_connections (see step 2).
If queueing occurs, scale replicas or increase CPU allocation (see steps 3-4).
Monitor connection metrics after changes to verify improvements.
2. Increase Worker Connection Limits#
If connection saturation is detected, increase the
worker_connections parameter in NGINX configuration.
# Get current Helm values
helm get values <ucc-release-name> -n ucc -o yaml > current-values.yaml
# Edit: Add or update nginx.workerConnections
# Recommended: (GPU_count * concurrency) / replicas / vCPU * 1.5
# Example: (66 * 256) / 5 / 32 * 1.5 ~ 160
# Apply updated values
helm upgrade <ucc-release-name> <chart-path> -n ucc -f current-values.yaml
# Verify configuration applied
kubectl get configmap -n ucc <ucc-nginx-config> -o yaml | grep worker_connections
# Monitor connection metrics after upgrade
kubectl port-forward -n ucc svc/<ucc-service-name> 9145:9145
curl http://localhost:9145/metrics | grep "nginx_connections_active"
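One way to make the values edit, assuming the chart exposes the nginx.workerConnections key mentioned above; verify against your chart's values schema, and merge by hand if an nginx block already exists.
# Append the override to the exported values file
cat >> current-values.yaml <<'EOF'
nginx:
  workerConnections: <calculated-value>
EOF
# <calculated-value> comes from the sizing formula in the Overview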
Analysis:
Compare the current worker_connections value against the calculated requirement to determine whether an increase is needed.
Post-upgrade, active connections should remain well below the new limit (target <70%).
Verify no connection queueing (nginx_connections_waiting should be zero).
Resolution:
Set worker_connections to the calculated value based on the sizing formula.
Apply via Helm upgrade: helm upgrade <release> <chart> -n ucc -f values.yaml
Restart UCC pods if configuration hot-reload is not supported.
Monitor metrics for 24-48 hours to validate capacity improvements.
3. Scale UCC Replicas to Distribute Load#
If connection saturation persists after increasing worker limits, scale the number of UCC replicas to distribute load across more pods.
# Check current replica count
kubectl get statefulset -n ucc
# Calculate required replicas based on network bandwidth
# Required bandwidth = GPU_count * 3.3 Gbps (recommended) or 1 Gbps (minimum)
# Example: 66 GPUs * 3.3 Gbps = 217.8 Gbps
# VM NIC capacity: 10 Gbps → required replicas ~ 22
# Update Helm values: cluster.replicas
# Edit current-values.yaml
# Apply updated replica count
helm upgrade <ucc-release-name> <chart-path> -n ucc -f current-values.yaml
# Wait for new pods to become ready
kubectl get pods -n ucc -w
# Verify load distribution across replicas
kubectl top pods -n ucc
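For immediate relief while the Helm change rolls out, the StatefulSet can also be scaled directly; note that Helm will reconcile this back to the chart value on the next upgrade, so persist the change in values as well.
# Temporary manual scale-out (replica count from the bandwidth example above)
kubectl scale statefulset <ucc-statefulset> -n ucc --replicas=22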
Analysis:
Current replica count compared to calculated requirement determines scaling needs.
Post-scaling, connection load should distribute evenly across pods.
Network bandwidth per pod should be well below NIC capacity (target <70%).
Resolution:
Scale replicas to match calculated requirement (network bandwidth-based sizing).
Apply via Helm upgrade: helm upgrade <release> <chart> -n ucc -f values.yaml
Monitor connection distribution and response times across all replicas.
Verify load balancer distributes traffic evenly across new pods.
4. Increase CPU Allocation for More Worker Processes#
If CPU utilization is high (>80%) and connection saturation persists, increase CPU allocation to spawn more NGINX worker processes (one per core).
# Check current CPU allocation
kubectl get pods -n ucc -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].resources}{"\n"}{end}'
# Check actual CPU usage
kubectl top pods -n ucc
# If CPU usage consistently >80%, increase CPU limits
# Edit Helm values: cluster.container.resources.limits.cpu
# Example: increase from 16 to 32 vCPUs
# Apply updated CPU allocation
helm upgrade <ucc-release-name> <chart-path> -n ucc -f current-values.yaml
# Verify NGINX spawned more workers
kubectl exec -n ucc <ucc-pod> -- ps aux | grep "nginx: worker process" | wc -l
# Should equal vCPU count
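A sketch of the corresponding values edit, assuming the cluster.container.resources.limits.cpu key referenced above; verify against your chart, and keep requests consistent if the chart sets them separately.
# Append the CPU override to the exported values file
cat >> current-values.yaml <<'EOF'
cluster:
  container:
    resources:
      limits:
        cpu: "32"
EOF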
Analysis:
High CPU utilization (>80%) with connection queueing indicates CPU bottleneck.
Number of NGINX worker processes equals vCPU count (one per core).
More workers allow more concurrent connection handling.
Resolution:
Increase CPU allocation to match workload needs.
Verify worker process count equals new vCPU allocation.
Monitor CPU and connection metrics post-upgrade.
Consider upgrading VM SKU if CPU limits are reached.
Other Diagnostic Actions#
Check load balancer distribution: Verify traffic distributes evenly across UCC replicas:
# Check request distribution across pods (from UCC metrics)
kubectl port-forward -n ucc svc/<ucc-service-name> 9145:9145
curl http://localhost:9145/metrics | grep "nginx_http_requests_total"
# Compare request counts per pod
# Uneven distribution may indicate load balancer issues
Review client session affinity: Check if load balancer has session affinity enabled:
# Check Service configuration
kubectl get svc -n ucc <ucc-service> -o yaml | grep -A 5 "sessionAffinity"
# For cloud provider load balancers, check annotations
kubectl get svc -n ucc <ucc-service> -o yaml | grep -i "affinity\|sticky"
Monitor cloud provider SNAT usage: Check if SNAT port exhaustion is contributing:
# For Azure AKS:
# az monitor metrics list \
#   --resource <load-balancer-resource-id> \
#   --metric "UsedSnatPorts,AllocatedSnatPorts" \
#   --interval PT1M --aggregation Average
# For AWS:
# Check NAT Gateway or load balancer connection tracking metrics
Prevention#
Proactive Monitoring#
Set up alerts for:
Connection utilization thresholds: Alert when active connections exceed 80% of the worker_connections limit
Connection queueing: Alert when nginx_connections_waiting is non-zero for >30 seconds
P99 response time degradation: Alert when P99 exceeds baseline by 3x (e.g., >1.5s if baseline is 500ms)
CPU utilization: Alert when CPU usage exceeds 80% for >5 minutes
5xx error rate increases: Alert on sharp increases in 5xx response codes
Configuration Best Practices#
Size worker connections appropriately: Use the sizing formula: (GPU_count * concurrency) / replicas / vCPU * 1.5
Provision adequate CPU: Allocate vCPUs to match connection handling needs (one worker process per core)
Scale replicas for network bandwidth: Provision 3.3 Gbps per GPU (recommended), or at least 1 Gbps as an absolute minimum
Enable HTTP/2: Configure HTTP/2 on both UCC and clients to multiplex requests over fewer connections
Enable session affinity: Configure load balancer session affinity (30-60s timeout) to improve retry efficiency
Monitor connection trends: Track connection usage over time to predict when scaling is needed
Plan for traffic spikes: Size capacity for peak concurrent simulations, not average load
Capacity Planning#
Calculate connection requirements: Use the formula above to determine worker_connections based on GPU count and client concurrency
Account for multiple concurrent simulations: Multiply requirements by concurrent simulation count
Provision headroom: Add 50% safety margin to calculated values to handle traffic bursts
Plan VM SKU upgrades: Select VM SKUs with adequate vCPUs and network bandwidth for workload
Test under load: Validate configuration with representative workload before production deployment
Monitor during scale-out: Track metrics during GPU fleet expansion to predict UCC scaling needs