RisiAi Consulting

AI Strategy & Implementation Expert

AWS FSx for Linux Architecture

Linux-based file storage architecture with FSx for Lustre and EFS integration

Services

FSx for Lustre · EFS · EC2 · HPC

Use Case

High-Performance Computing Storage

🐧

AWS FSx for Linux

High-Performance Linux File Systems · Lustre · OpenZFS · NetApp ONTAP
FSx for Lustre delivers sub-millisecond latency, up to hundreds of GB/s throughput, and millions of IOPS. Purpose-built for HPC, ML training, media processing, and EDA workloads. Natively integrates with S3 as a data repository.
1
Compute & Client Access
Amazon EC2 (Linux)
Amazon Linux 2023 / Ubuntu
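Lustre Client (lustre-client on Amazon Linux 2023, lustre-client-modules on Ubuntu)
$ sudo dnf install -y lustre-client
# <mount-name> is the file system's MountName; it is "fsx" only for SCRATCH_1 deployments
$ sudo mount -t lustre -o relatime,flock \
  fs-0abc.fsx.us-east-1.amazonaws.com@tcp:/<mount-name> /mnt/lustre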
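The mount target in the command above has the form DNS name, `@tcp:/`, then the file system's mount name. As a rough sketch, the helper below assembles that command from one entry of a `describe_file_systems` response. The function name, the sample dictionary, and the `relatime,flock` mount options are illustrative assumptions, not part of the original architecture.

```python
def lustre_mount_command(file_system: dict, mount_point: str = "/mnt/lustre") -> str:
    """Build a Lustre mount command from one describe_file_systems entry."""
    dns_name = file_system["DNSName"]
    # MountName lives under LustreConfiguration in the FSx API response
    mount_name = file_system["LustreConfiguration"]["MountName"]
    return (
        f"sudo mount -t lustre -o relatime,flock "
        f"{dns_name}@tcp:/{mount_name} {mount_point}"
    )


# Hypothetical response entry; real values come from the FSx API
example_fs = {
    "DNSName": "fs-0abc.fsx.us-east-1.amazonaws.com",
    "LustreConfiguration": {"MountName": "abcdefgh"},
}

print(lustre_mount_command(example_fs))
```

Reading the mount name from the API response avoids hard-coding `/fsx`, which is only valid for some deployment types.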
EKS / ECS Containers
FSx CSI Driver (aws-fsx-csi-driver) · PersistentVolumeClaim
Kubernetes-native storage class for dynamic provisioning. Shared volumes across pods for distributed workloads.
HPC & ML Workloads
AWS ParallelCluster · SageMaker Training Jobs · AWS Batch
Lustre's parallel I/O architecture provides the throughput needed for distributed training across hundreds of GPU nodes.
DEV
Python Management (boto3)
SDK
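import boto3

# Client for the Amazon FSx control-plane API
fsx = boto3.client('fsx', region_name='us-east-1')

# Describe a single file system by ID (placeholder ID)
response = fsx.describe_file_systems(
    FileSystemIds=['fs-0abc123']
)

fs = response['FileSystems'][0]
print(f"DNS: {fs['DNSName']}")
print(f"Storage: {fs['StorageCapacity']} GiB")  # capacity is reported in GiB
print(f"Type: {fs['FileSystemType']}")

# Link /datasets on the file system to an S3 bucket through a
# data repository association (DRA)
fsx.create_data_repository_association(
    FileSystemId='fs-0abc123',
    FileSystemPath='/datasets',
    DataRepositoryPath='s3://my-data',
    S3={
        # Import metadata for new and changed S3 objects
        'AutoImportPolicy': {'Events': ['NEW', 'CHANGED']},
        # Export new, changed, and deleted files back to S3
        'AutoExportPolicy': {'Events': ['NEW', 'CHANGED', 'DELETED']},
    },
)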
Lustre TCP
2
VPC & Network Architecture
AZ-A
Availability Zone A
Primary Subnet
Private Subnet A
10.0.1.0/24 · FSx ENI + Compute
AZ-B
Availability Zone B
Compute Overflow
Private Subnet B
10.0.2.0/24 · Compute nodes
Security Groups
TCP 988 — Lustre
TCP 1018–1023 — Lustre
All other — DENY
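The port list above can be turned into security group rules programmatically. The sketch below builds the `IpPermissions` structure that the EC2 `authorize_security_group_ingress` call accepts, using the two subnet CIDRs from this diagram. The helper name and the security group ID in the trailing comment are hypothetical.

```python
# Lustre traffic uses TCP 988 plus the 1018-1023 range
LUSTRE_PORT_RANGES = [(988, 988), (1018, 1023)]


def lustre_ingress_rules(client_cidrs):
    """Return IpPermissions entries allowing Lustre traffic from the given CIDRs."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": low,
            "ToPort": high,
            "IpRanges": [
                {"CidrIp": cidr, "Description": "Lustre clients"}
                for cidr in client_cidrs
            ],
        }
        for low, high in LUSTRE_PORT_RANGES
    ]


rules = lustre_ingress_rules(["10.0.1.0/24", "10.0.2.0/24"])

# With boto3 installed and credentials configured, the rules could be applied as:
#   ec2 = boto3.client("ec2", region_name="us-east-1")
#   ec2.authorize_security_group_ingress(
#       GroupId="sg-0123456789abcdef0",  # hypothetical ID
#       IpPermissions=rules,
#   )
```

Security groups deny inbound traffic by default, so no explicit deny rule is needed for the remaining ports.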
VPC Endpoints
com.amazonaws.<region>.fsx
com.amazonaws.<region>.s3 (Gateway)
com.amazonaws.<region>.kms
com.amazonaws.<region>.logs
DNS & Routing
Route 53 Resolver
VPC DHCP Option Sets
Transit Gateway (multi-VPC)
On-prem via Direct Connect
Private ENI
3
Amazon FSx — Lustre
Primary
File System Config
Persistent / Scratch
Deployment: PERSISTENT_2
Storage: 64 TB SSD
Throughput: 1,000 MB/s/TiB
Compression: LZ4 enabled
Stripe Count: Auto
Root Squash: 65534:65534
PERSISTENT_2 = data replicated within a single AZ
S3 Data Repository
Auto-Import (NEW + CHANGED) · Auto-Export (NEW + CHANGED + DELETED) · Lazy Load on First Access · HSM Archive to S3
Transparent S3 linking. Data is loaded from S3 on first read and can be written back automatically. HSM commands for manual tiering.
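The manual tiering mentioned above is done with the `lfs hsm_*` subcommands of the Lustre client. The sketch below only assembles the command lines, so it can run on a machine without Lustre installed. The helper name and the file path are illustrative assumptions, as is the use of `sudo` for every action.

```python
import shlex

# Map a short action name to the corresponding lfs subcommand
HSM_ACTIONS = {
    "archive": "hsm_archive",  # export a file's data to the linked S3 repository
    "release": "hsm_release",  # free local storage for a file already in S3
    "restore": "hsm_restore",  # load a released file's data back from S3
    "state": "hsm_state",      # show a file's HSM state flags
}


def lfs_hsm_command(action: str, paths: list) -> list:
    """Return the argv list for an lfs HSM command on the given paths."""
    if action not in HSM_ACTIONS:
        raise ValueError(f"unknown HSM action: {action!r}")
    return ["sudo", "lfs", HSM_ACTIONS[action], *paths]


# Hypothetical path under the mount point used elsewhere on this page
cmd = lfs_hsm_command("archive", ["/mnt/lustre/datasets/results.bin"])
print(shlex.join(cmd))
```

On a client with the file system mounted, the returned list could be passed to `subprocess.run(cmd, check=True)`.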
Performance Profile
Max Throughput: hundreds of GB/s (cluster)
IOPS: millions (parallel)
Latency: sub-millisecond
Stripe Width: OSTs × chunk size
Parallel file system — performance scales linearly with storage capacity. Each TiB adds throughput.
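To make the linear scaling concrete, here is a small worked calculation using the configuration shown on this page (64 TB of storage at 1,000 MB/s/TiB). It treats the 64 TB figure as 64 TiB, which is an assumption, and it gives the provisioned baseline rather than a measured result.

```python
def aggregate_throughput_mb_s(storage_tib: float, per_tib_mb_s: float) -> float:
    """Provisioned throughput grows in proportion to storage capacity."""
    return storage_tib * per_tib_mb_s


total = aggregate_throughput_mb_s(64, 1000)
print(f"{total:,.0f} MB/s (~{total / 1000:.0f} GB/s)")  # 64,000 MB/s (~64 GB/s)

# Doubling the capacity at the same per-TiB tier doubles the baseline
assert aggregate_throughput_mb_s(128, 1000) == 2 * total
```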
DR
Backup & Recovery
Auto Backup: Daily · 35 days
S3 Export: HSM archive command
Cross-Region: S3 replication
Data persists in S3. Scratch file systems are ephemeral — use PERSISTENT for durability.
KMS
IAM
Audit
4
Security & Encryption
AWS KMS — CMK
Customer Managed Key
AES-256 encryption at rest
Annual automatic key rotation
Key policy: fsx.amazonaws.com only
CloudTrail key usage audit
IAM & Resource Policies
fsx:CreateFileSystem — Admin only
fsx:DeleteFileSystem — Deny SCP
fsx:CreateBackup — Ops role
fsx:TagResource — Required tags
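The "Deny SCP" entry for `fsx:DeleteFileSystem` could look roughly like the policy document below. This is a minimal sketch: the statement ID is made up, and a real policy would probably add a condition that exempts a break-glass role.

```python
import json


def deny_fsx_delete_scp() -> dict:
    """Return a service control policy that denies FSx file system deletion."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyFsxFileSystemDeletion",  # hypothetical statement ID
                "Effect": "Deny",
                "Action": ["fsx:DeleteFileSystem"],
                "Resource": "*",
            }
        ],
    }


scp_document = json.dumps(deny_fsx_delete_scp(), indent=2)
print(scp_document)
```

The resulting JSON string is what would be attached as policy content through AWS Organizations.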
POSIX Permissions
UID / GID mapping
Root squash (Lustre)
NFS export options (OpenZFS)
ACL support (ONTAP)
5
Monitoring & Observability
CloudWatch Metrics
DataReadBytes / DataWriteBytes
FreeDataStorageCapacity
MetadataOperations (Lustre)
Alarm: Storage < 10% free
Alarm: Throughput > 80% cap
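The "Storage < 10% free" alarm can be expressed as parameters for CloudWatch's `put_metric_alarm`. The sketch below only builds the keyword arguments, so it runs without AWS credentials. The alarm name, the SNS topic ARN, and the choice of the `Sum` statistic over a 60-second period are assumptions to check against the metric's documentation.

```python
GIB = 1024 ** 3


def low_free_storage_alarm(file_system_id, capacity_gib, topic_arn, free_fraction=0.10):
    """Build put_metric_alarm kwargs that fire when free storage drops below a fraction."""
    return {
        "AlarmName": f"{file_system_id}-low-free-storage",  # hypothetical naming scheme
        "Namespace": "AWS/FSx",
        "MetricName": "FreeDataStorageCapacity",  # reported in bytes
        "Dimensions": [{"Name": "FileSystemId", "Value": file_system_id}],
        "Statistic": "Sum",  # assumption: sum across storage targets
        "Period": 60,
        "EvaluationPeriods": 5,
        "Threshold": capacity_gib * GIB * free_fraction,
        "ComparisonOperator": "LessThanThreshold",
        "AlarmActions": [topic_arn],
    }


params = low_free_storage_alarm(
    "fs-0abc123",
    capacity_gib=64 * 1024,  # 64 TiB expressed in GiB
    topic_arn="arn:aws:sns:us-east-1:123456789012:fsx-alerts",  # hypothetical ARN
)

# With boto3 available:
#   boto3.client("cloudwatch", region_name="us-east-1").put_metric_alarm(**params)
```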
CloudTrail
All FSx API calls logged
S3 bucket with lifecycle
EventBridge rule triggers
Cost Management
Per-GB storage cost tracking
Throughput capacity billing
Budget alerts via SNS
Cost allocation tags
6
Linux FSx Comparison
Feature | Lustre ⚡ | OpenZFS 🗃️ | NetApp ONTAP 🔷
Protocol | Lustre | NFS (v3/v4) | NFS + SMB + iSCSI
Max Throughput | hundreds of GB/s | 12.5 GB/s | 4 GB/s
Latency | sub-ms | sub-ms | sub-ms (SSD tier)
S3 Integration | Native (DRA) | Manual | FabricPool tiering
Multi-AZ | No (single AZ) | Yes | Yes
Snapshots | No | Yes (COW) | Yes (NetApp Snap)
Cloning | No | Yes (instant) | Yes (FlexClone)
Compression | LZ4 | ZSTD / LZ4 | Inline + Post
Best For | HPC, ML, EDA | General Linux | Enterprise hybrid
Deployment Checklist
1
Choose FSx Type
Lustre for HPC/ML throughput. OpenZFS for general Linux NFS. NetApp ONTAP for multi-protocol enterprise.
2
Provision VPC & Subnets
Create private subnets in target AZs. Configure security groups for protocol ports (Lustre 988 and 1018–1023, NFS 2049, iSCSI 3260).
3
Create KMS CMK
Customer-managed key with annual rotation. Key policy scoped to fsx.amazonaws.com service role.
4
Deploy File System
Select deployment type, storage capacity, throughput tier. Link S3 data repository (Lustre) or configure volumes.
5
Configure IAM Policies
Least-privilege roles. SCP to deny deletion. Required tagging enforcement. Cross-account access if needed.
6
Mount on Clients
Install Lustre client or nfs-utils. Add /etc/fstab entry. Configure CSI driver for EKS workloads.
7
Enable Monitoring
CloudWatch alarms for capacity and throughput. CloudTrail for API audit. Budget alerts for cost control.
8
Validate & Benchmark
Run IOR/fio benchmarks. Test backup + restore. Verify security group rules. Load test with production workload.
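For the /etc/fstab entry in step 6, the sketch below generates a line for a Lustre file system. The mount options are an assumption based on common practice for network file systems under systemd, and the mount name `abcdefgh` is a hypothetical value. Both should be checked against the actual file system and distribution.

```python
def lustre_fstab_entry(dns_name: str, mount_name: str, mount_point: str = "/mnt/lustre") -> str:
    """Return an /etc/fstab line that mounts a Lustre file system at boot."""
    options = ",".join(
        [
            "defaults",
            "relatime",
            "flock",
            "_netdev",  # wait for networking before mounting
            "x-systemd.automount",
            "x-systemd.requires=network.service",
        ]
    )
    return f"{dns_name}@tcp:/{mount_name} {mount_point} lustre {options} 0 0"


entry = lustre_fstab_entry("fs-0abc.fsx.us-east-1.amazonaws.com", "abcdefgh")
print(entry)
```

Running `sudo mount -a` after adding the line is a quick way to confirm the entry parses before the next reboot.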
AWS FSx for Linux · Lustre · OpenZFS · NetApp ONTAP · Production Reference Architecture
