AWS EC2 Trn1 Instances Now Available

by Lyle Smith
Amazon Web Services (AWS) has announced the general availability of AWS EC2 Trn1 instances. Powered by AWS-designed Trainium chips, Trn1 instances are built specifically for high-performance training of machine learning models in the cloud, with Amazon claiming "cost-to-train" savings of up to 50% compared to similar GPU-based instances.

AWS EC2 Trn1 - ultracluster scale out

According to Amazon, AWS EC2 Trn1 instances provide the fastest time to train popular machine learning models on AWS. This allows customers to reduce training times, iterate quickly on models to increase accuracy, and improve overall productivity for workloads such as natural language processing, speech and image recognition, semantic search, recommendation engines, fraud detection, and forecasting.

Pricing for Trn1 instances is flexible as well: there are no minimum commitments or upfront fees, and customers pay only for the compute they use.

Sizes and Specifications of AWS EC2 Trn1 Instances

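Instance Name | vCPUs | AWS Trainium Chips | Accelerator Memory | NeuronLink | Instance Memory | Instance Networking | Local Instance Storage
--------------|-------|--------------------|--------------------|------------|-----------------|---------------------|-----------------------
trn1.2xlarge  | 8     | 1                  | 32 GB              | N/A        | 32 GB           | Up to 12.5 Gbps     | 1x 500 GB NVMe
trn1.32xlarge | 128   | 16                 | 512 GB             | Supported  | 512 GB          | 800 Gbps            | 4x 2 TB NVMe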

Previously, even with the fastest accelerated instances available, training more complex machine learning models remained expensive and time-consuming. Amazon says the new AWS EC2 Trn1 instances offer the best price performance and the fastest machine learning model training on AWS.

Other notable features include the following:

  • Those looking to get started without significantly changing code can use AWS Neuron, the software development kit (SDK) for Trn1 instances. It is also integrated into popular frameworks for machine learning like PyTorch and TensorFlow.
  • Trn1 instances feature up to 16 AWS Trainium accelerators that are specifically designed for training deep learning models.
  • To improve efficiency, Trn1 is the first Amazon EC2 instance to offer up to 800 Gbps of networking bandwidth via the 2nd-gen AWS Elastic Fabric Adapter (EFA) network interface.
  • To speed up training, Trn1 instances also use NeuronLink, a high-speed, intra-instance interconnect.

Amazon EC2 UltraClusters

Customers can deploy Trn1 instances in Amazon EC2 UltraClusters, which comprise tens of thousands of Trainium accelerators, to quickly train the most complex deep learning models, even those with trillions of parameters. With EC2 UltraClusters, organizations can scale machine learning training to as many as 30,000 Trainium accelerators interconnected with EFA petabit-scale networking. Amazon says this gives organizations on-demand access to supercomputing-class performance, which can cut training time from months to days.
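To put the UltraCluster figure in context, the following is a rough back-of-the-envelope sketch in Python. It uses only the numbers quoted in this article (16 accelerators and 512 GB of accelerator memory per trn1.32xlarge, up to 30,000 accelerators per UltraCluster). It assumes a cluster built entirely from trn1.32xlarge instances and decimal units (1 TB = 1,000 GB); the function and variable names are illustrative and do not come from AWS.

```python
import math

# Figures quoted in the article for trn1.32xlarge and EC2 UltraClusters.
ACCELERATORS_PER_INSTANCE = 16
ACCELERATOR_MEMORY_PER_INSTANCE_GB = 512
ULTRACLUSTER_ACCELERATORS = 30_000


def instances_needed(accelerators, per_instance=ACCELERATORS_PER_INSTANCE):
    """Number of instances required to supply the given accelerator count."""
    return math.ceil(accelerators / per_instance)


# Accelerator memory attached to each Trainium chip (512 GB / 16 chips).
memory_per_accelerator_gb = (
    ACCELERATOR_MEMORY_PER_INSTANCE_GB / ACCELERATORS_PER_INSTANCE
)

# Assumption: the cluster consists only of trn1.32xlarge instances.
cluster_instances = instances_needed(ULTRACLUSTER_ACCELERATORS)

# Aggregate accelerator memory across the cluster, in decimal terabytes.
cluster_memory_tb = ULTRACLUSTER_ACCELERATORS * memory_per_accelerator_gb / 1000

print(f"Instances required: {cluster_instances}")
print(f"Memory per accelerator: {memory_per_accelerator_gb:.0f} GB")
print(f"Aggregate accelerator memory: {cluster_memory_tb:.0f} TB")
```

Under those assumptions, a full 30,000-accelerator cluster corresponds to 1,875 trn1.32xlarge instances and roughly 960 TB of aggregate accelerator memory.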

Each AWS EC2 Trn1 instance supports up to 8 TB of local NVMe SSD storage, while AWS Trainium supports a wide range of data types (FP32, TF32, BF16, FP16, and configurable FP8). Trainium also supports stochastic rounding, a probabilistic rounding method, to enable high performance and higher accuracy. In addition, AWS Trainium supports dynamic tensor shapes and custom operators, providing a flexible infrastructure that can adapt to customer training needs.
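The appeal of stochastic rounding is easiest to see with a toy example. The sketch below is plain Python and illustrates the general technique only; it is not how Trainium implements it in hardware. The grid step, update size, and function names are all hypothetical choices made for the illustration. It repeatedly adds a small update to a value stored on a coarse grid, which mimics accumulating small gradient updates in a low-precision format.

```python
import math
import random


def round_nearest(x, step):
    """Deterministically round x to the nearest multiple of step."""
    return round(x / step) * step


def round_stochastic(x, step, rng):
    """Round x to one of its two neighboring multiples of step.

    The upper neighbor is chosen with probability equal to the fractional
    distance from the lower neighbor, so the result is correct on average.
    """
    scaled = x / step
    lower = math.floor(scaled)
    frac = scaled - lower
    return (lower + (1 if rng.random() < frac else 0)) * step


def accumulate(update, n_steps, step, rounder):
    """Add `update` n_steps times, rounding the running total after each add."""
    total = 0.0
    for _ in range(n_steps):
        total = rounder(total + update, step)
    return total


# Hypothetical setup: a grid of 0.01 and an update ten times smaller.
STEP = 0.01
UPDATE = 0.001
N_STEPS = 1000  # exact sum would be 1.0

rng = random.Random(0)  # fixed seed so the run is repeatable
nearest_total = accumulate(UPDATE, N_STEPS, STEP, round_nearest)
stochastic_total = accumulate(
    UPDATE, N_STEPS, STEP, lambda x, s: round_stochastic(x, s, rng)
)

print(f"round-to-nearest total: {nearest_total}")
print(f"stochastic total:       {stochastic_total:.2f}")
```

With round-to-nearest, every update is smaller than half a grid step and is rounded away, so the total never leaves zero. With stochastic rounding, each update has a proportional chance of bumping the total up by a full step, so the accumulated value lands close to the exact sum of 1.0.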

AWS Nitro System

Trn1 instances are built on the AWS Nitro System, a collection of AWS-designed hardware and software innovations that streamline the delivery of isolated multi-tenancy, private networking, and fast local storage. To deliver the necessary performance, the Nitro System offloads CPU virtualization, storage, and networking functions to dedicated hardware and software.

AWS EC2 Trn1 Instances Availability

AWS Trn1 instances can be purchased now as On-Demand Instances (with Savings Plans), Reserved Instances, or Spot Instances. They are currently available in the US East (N. Virginia) and US West (Oregon) Regions, with availability in additional AWS Regions expected soon.

They will also be available through the following AWS services:

  • Amazon SageMaker
  • Amazon Elastic Kubernetes Service (Amazon EKS)
  • Amazon Elastic Container Service (Amazon ECS)
  • AWS Batch
