FLEXCACHE — SMART DATA PROXIMITY

Cache Smarter,
Compute Anywhere

GPU capacity is scarce. Your data shouldn't be the bottleneck. Amazon FSx for NetApp ONTAP FlexCache moves your data logically — not physically — to wherever compute exists.

The Problem

GPU Scarcity Is a Storage Problem

AI and ML workloads are exploding, but GPU and accelerated compute capacity is unevenly distributed across AWS regions and availability zones. When your data lives in us-east-1 but available GPU capacity is in us-west-2, you face an impossible choice: wait months for GPU allocation or perform an expensive, time-consuming full data migration.

GPU/accelerator instances unavailable in your data's home region
Full data migration takes days or weeks and racks up steep cross-region egress charges
Duplicated datasets drift out of sync — introducing model training errors
Cross-region latency makes real-time reads from the origin impractical

Estimated cost impact: Data copy vs. FlexCache for a 500 TB dataset
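The comparison behind that estimate can be sketched with back-of-envelope shell arithmetic. The $0.02/GB inter-region transfer rate and the 10% hot working set below are illustrative assumptions, not quoted AWS pricing:

```shell
# Illustrative only: assumed $0.02/GB inter-region rate, assumed 10% hot
# working set. Adjust both for your own pricing and access profile.
DATASET_GB=$((500 * 1024))     # 500 TB dataset, in GB
HOT_PCT=10                     # fraction of blocks actually read
RATE_CENTS=2                   # $0.02/GB, in cents for integer math

full_copy_usd=$((DATASET_GB * RATE_CENTS / 100))
flexcache_usd=$((DATASET_GB * HOT_PCT / 100 * RATE_CENTS / 100))

echo "Full dataset copy:   \$${full_copy_usd}"    # $10240
echo "FlexCache (hot set): \$${flexcache_usd}"    # $1024
```

Even under these rough assumptions, transferring only the hot working set is an order of magnitude cheaper than copying the whole dataset.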

The Solution

FSx ONTAP FlexCache

FlexCache creates a sparse, read-through cache volume that appears as a full dataset to compute nodes — without physically copying data. Only the blocks your workload actually reads are transferred and cached.

🗄️ Origin Volume
FSx for ONTAP · us-east-1
Full dataset lives here. Single source of truth.

    │ Read-through, on-demand blocks
    ▼

FlexCache Volume
FSx for ONTAP · us-west-2
Sparse cache. Only hot blocks stored.

    │ Low-latency NFS / SMB reads
    ▼

🖥️ GPU Cluster
p4d / trn1 / inf2 · us-west-2
Reads from local cache. Full training throughput.
🔄
Single Source of Truth
Writes always flow back to the origin. No dataset drift, no reconciliation headaches.
💾
Pay for What You Read
Only accessed blocks are cached. A 500 TB dataset may need only 50 TB of local cache storage.
🌐
Native NFS / SMB
No app changes required. GPU training scripts mount the cache volume exactly like a local filesystem.
Business Value

Break Free from Compute Constraints

FlexCache turns a capacity problem into a non-event. Instead of waiting for GPU allocations to open up in your home region, spin up a cache in any region where instances are available and start training within minutes.

🚀

Time-to-Train

Start training in hours instead of weeks. No waiting for data migration pipelines.

💰

Egress Savings

Cache only the data you read. Avoid paying to transfer a 500 TB dataset when your workload touches just 10% of it.

🔒

Data Governance

Origin remains the master copy — security policies, encryption, and auditing stay centralized.

⚡

Low Latency

After the first read, hot blocks are served from the local cache at full NVMe/SSD speeds.

Typical read latency: cross-region direct access vs. FlexCache (after warm-up)

More Use Cases

FlexCache Beyond GPU Scarcity

Any workload that reads data across geographic or network boundaries benefits from bringing the data closer to compute.

🏭

Multi-AZ Read Scaling

High-Availability Reads

Deploy FlexCache volumes in multiple AZs within the same region. Read-heavy applications (web serving, analytics dashboards) get sub-millisecond local reads without any data replication overhead.

Read scale-out · AZ affinity
🧪

Dev / Test Acceleration

Lightweight Sandboxes

Give each developer team a FlexCache volume pointing at the production dataset. Teams read live data without cloning terabytes — and writes are redirected to isolated sandbox volumes, keeping production untouched.

Zero copy · Instant provisioning
🌍

Global Collaboration

Distributed Teams

Engineering teams in APAC and EU read the same media, seismic, genomic, or design datasets without suffering cross-Atlantic or cross-Pacific latency. Each region gets a FlexCache; the origin stays authoritative.

Multi-region · Single namespace
☁️

Hybrid Cloud Bursting

On-Prem to Cloud

On-premises data served via ONTAP can be cached in AWS for burst compute jobs. FlexCache over AWS Direct Connect gives cloud workloads near-local read latency against on-prem datasets — no lift-and-shift required.

Direct Connect · Hybrid
🛡️

DR Read Access

Active Disaster Recovery

DR sites often sit idle, burning budget. FlexCache lets the DR region serve live read traffic — reporting, analytics, model inference — while SnapMirror keeps the origin authoritative. Your DR investment starts delivering ROI on day one.

Active-passive · Read offload
📡

Edge / Satellite Offices

Remote Data Locality

Retail, manufacturing, and media organizations with remote sites can run FlexCache in the nearest AWS region, pulling only the files each site actually needs. Employees get fast local-equivalent performance without maintaining separate storage infrastructure at every site.

Edge · Distributed

Solution approach comparison across key dimensions

Why FlexCache Wins

vs. The Alternatives

Teams facing compute scarcity typically reach for one of three alternatives — each with significant drawbacks.

📦

Full Dataset Copy (S3 / EFS)

Weeks of transfer time, full egress cost, ongoing sync complexity. Datasets drift out of sync, causing silent training errors.

⏳

Wait for GPU Availability

Weeks or months of delay. Missed product deadlines, wasted engineering cycles, and competitive disadvantage.

🌐

Read Directly Cross-Region

Every read pays an inter-region round trip, often 60 ms or more coast to coast, which cripples random-read workloads. GPU utilization collapses while the cluster waits on I/O.

✅

FlexCache (Recommended)

Spin up in minutes. Pay only for blocks read. Single source of truth. No app changes. GPU utilization stays high.

Quick Start

Get Running in 4 Steps

1

Peer the SVMs

Create a cluster and SVM peer relationship between the origin FSx ONTAP file system (us-east-1) and the cache file system (us-west-2).
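A minimal sketch of the peering step in the ONTAP CLI, assuming placeholder names (origin SVM svm_origin, cache SVM svm_cache) and intercluster LIFs that are already reachable between the two file systems; exact arguments depend on your deployment:

```
# On the cache cluster (us-west-2): peer with the origin cluster.
::> cluster peer create -address-family ipv4 -peer-addrs <origin_intercluster_LIF_IPs>

# Still on the cache cluster: peer the SVMs for FlexCache use.
::> vserver peer create -vserver svm_cache -peer-vserver svm_origin \
      -peer-cluster <origin_cluster_name> -applications flexcache

# On the origin cluster (us-east-1): accept the SVM peer request.
::> vserver peer accept -vserver svm_origin -peer-vserver svm_cache
```

Run the equivalent cluster peer create on the origin side as well so both clusters hold the relationship.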

2

Create FlexCache

Run volume flexcache create on the cache cluster, pointing to the origin volume. Size the cache to cover your hot working set.
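Assuming the peer relationship from step 1 and the same placeholder names, the create command might look like this; here the cache is sized at 50 TB to cover a 10% working set of a 500 TB origin volume:

```
# On the cache cluster: create a sparse cache of the origin volume.
::> volume flexcache create -vserver svm_cache -volume vol_ml_cache \
      -aggr-list aggr1 -origin-vserver svm_origin -origin-volume vol_ml_data \
      -size 50TB -junction-path /vol_ml_cache
```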

3

Mount & Train

Mount the FlexCache volume on your GPU instances via NFS. Point your training framework at the mount path. No code changes needed.
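A hedged example of the mount on a GPU instance; the SVM DNS name and junction path are placeholders, and the NFS version and transfer sizes should be tuned for your workload:

```
sudo mkdir -p /mnt/train
sudo mount -t nfs -o nfsvers=3,rsize=262144,wsize=262144,hard \
    <cache-svm-dns-name>:/vol_ml_cache /mnt/train
```

Training jobs then read from /mnt/train exactly as they would from a local directory.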

4

Pre-warm (Optional)

Use volume flexcache prepopulate to pre-stage hot files before training starts — eliminating even first-read latency.
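A sketch of the prepopulate call with the same placeholder names; parameter spelling can vary slightly across ONTAP releases, so check the CLI reference for your version:

```
# On the cache cluster: pre-stage the training directory into the cache.
::> volume flexcache prepopulate start -cache-vserver svm_cache \
      -cache-volume vol_ml_cache -prepopulate-dirs /dataset/train
```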