Scalability

This section provides guidance on scaling the Storage Service for larger deployments. All settings below refer to the Storage Service Helm chart values.

Sizing guideline

For a cluster of 100 GPUs, use 5 storage nodes.

Set the number of replicas with replicaCount. The optional values replicaMinCount and replicaScalingFactor also apply; the effective count is max(replicaMinCount, ceil(replicaCount / replicaScalingFactor)). The 5-per-100 ratio is a practical starting point; adjust it based on your workload (e.g. request rate, object size, and use of metadata or enumeration operations). Monitor metrics (see Observability (Metrics, Traces, Logs)) and scale up or down as needed.
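The effective-count formula above can be checked with a few lines of Python. This is a minimal sketch, not the chart's actual template logic: the function names are hypothetical, the defaults of 1 for replicaMinCount and replicaScalingFactor are assumptions, and extending the 5-per-100 ratio linearly to other cluster sizes is also an assumption.

```python
import math


def effective_replicas(replica_count, replica_min_count=1, replica_scaling_factor=1):
    """Effective count per the documented formula:
    max(replicaMinCount, ceil(replicaCount / replicaScalingFactor)).

    The default arguments are assumptions for illustration, not chart defaults.
    """
    if replica_scaling_factor <= 0:
        raise ValueError("replicaScalingFactor must be positive")
    return max(replica_min_count, math.ceil(replica_count / replica_scaling_factor))


def suggested_storage_nodes(gpu_count, nodes_per_100_gpus=5):
    """Starting-point node count, assuming the 5-per-100-GPU ratio scales linearly."""
    return max(1, math.ceil(gpu_count * nodes_per_100_gpus / 100))
```

For example, effective_replicas(5, 1, 2) gives 3, because ceil(5 / 2) = 3 exceeds the minimum of 1.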

How the Storage Service scales

The Storage Service can be scaled by running multiple replicas. It maintains internal caches to improve performance of storage operations (e.g. metadata and listing). Whether scaling is appropriate depends on cache behavior and your client workloads:

  • Without bucket notifications: Cache entries expire after a time-to-live (TTL), and each pod has its own cache. You can scale by increasing replicaCount. TTL is configured per cache in the Helm values: config.smallObjectCache.timeToLive, config.statCache.timeToLive, and config.listCache.timeToLive (set config.listCache.enabled as well if you use the list cache).

    • Stale reads are possible: clients may see slightly outdated data for non-version-specific (e.g. “latest”) objects until the TTL expires. If your client workloads read “latest” and do not require immediate consistency, scaling with caches and TTL is fine. To reduce staleness, lower the TTL values or disable the caches (config.smallObjectCache.enabled, config.statCache.enabled, config.listCache.enabled), at the cost of more backend calls.

  • With bucket notifications enabled: Bucket notifications are enabled when config.storageEvents.sqs.enabled or config.storageEvents.azureServiceBus.enabled is true. The service then invalidates caches when it receives storage events (e.g. object created or deleted) from the configured queue; config.statCache.invalidateOnUpdate and config.listCache.invalidateOnUpdate control whether those caches are invalidated on writes and on notification events. The current Helm chart does not provide a recommended way for multiple pods to consume from the same queue.

    • When bucket notifications are enabled, run a single Storage Service instance: set replicaCount to 1, and ensure that replicaMinCount and replicaScalingFactor do not raise the effective replica count above 1.
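The two cases above can be expressed as a small pre-deployment check. The sketch below is illustrative only: check_scaling and its helper are hypothetical and not part of the chart, the values are assumed to be already loaded into a nested dict that mirrors the Helm keys named above, and missing keys are treated as false or as 1, which may not match the chart's real defaults.

```python
import math


def _get(values, dotted_key, default):
    """Look up a dotted Helm key (e.g. 'config.statCache.enabled') in a nested dict."""
    node = values
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


def check_scaling(values):
    """Return a list of warnings for a Helm values dict (hypothetical helper)."""
    warnings = []
    # Effective replica count, using the formula from the sizing guideline.
    replicas = max(
        _get(values, "replicaMinCount", 1),
        math.ceil(
            _get(values, "replicaCount", 1) / _get(values, "replicaScalingFactor", 1)
        ),
    )
    notifications = bool(
        _get(values, "config.storageEvents.sqs.enabled", False)
        or _get(values, "config.storageEvents.azureServiceBus.enabled", False)
    )
    if notifications and replicas > 1:
        # With notifications enabled, a single instance is the supported setup.
        warnings.append(
            f"bucket notifications enabled with {replicas} effective replicas; use 1"
        )
    if not notifications:
        # Without notifications, enabled caches only refresh when their TTL expires.
        for cache in ("smallObjectCache", "statCache", "listCache"):
            if _get(values, f"config.{cache}.enabled", False):
                warnings.append(
                    f"{cache} enabled: reads may be stale until "
                    f"config.{cache}.timeToLive expires"
                )
    return warnings
```

An empty list means the values are consistent with the guidance in this section; a stale-read warning is informational and can be ignored if the workload tolerates reading slightly outdated “latest” objects.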

Summary

  • Use 5 storage nodes for 100 GPUs as a starting point; set replicaCount (and related replica values) and tune from metrics.

  • You can scale with multiple replicas when config.storageEvents.sqs.enabled and config.storageEvents.azureServiceBus.enabled are both false, with or without internal caches. With caches and TTL (config.smallObjectCache.timeToLive, config.statCache.timeToLive, config.listCache.timeToLive), clients may see stale reads for non-version-specific objects until the TTL expires; this is acceptable if workloads tolerate it (e.g. reading “latest”).

  • With notifications enabled (either config.storageEvents.sqs.enabled or config.storageEvents.azureServiceBus.enabled set to true), run a single instance (replicaCount = 1).