Deploy Custom Nova Models with Amazon SageMaker Inference | AWS

by Rachel Kim – Technology Editor

Amazon Web Services (AWS) has made its SageMaker Inference service for custom Amazon Nova models generally available, enabling businesses to deploy and scale customized AI models with greater control and efficiency. The launch, announced today, follows the initial customization capabilities released at the AWS Summit in New York City in 2025.

The new service allows customers to train Nova Micro, Nova Lite, and Nova 2 Lite models using Amazon SageMaker Training Jobs or Amazon HyperPod, then deploy them on managed inference infrastructure through Amazon SageMaker AI. Customers had previously asked AWS to extend the capabilities available for customizing open-weight models in Amazon SageMaker Inference to Amazon Nova, along with more granular control over instance types, auto-scaling, context length, and concurrency settings.

According to AWS, SageMaker Inference for custom Nova models reduces inference costs through optimized GPU utilization, running workloads on lower-cost Amazon Elastic Compute Cloud (Amazon EC2) G5 and G6 instances rather than P5 instances where the model size allows. Auto-scaling based on five-minute usage patterns and configurable inference parameters further contribute to cost reduction. The service supports continued pre-training, supervised fine-tuning, or reinforcement fine-tuning for specific use cases.
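Endpoint auto-scaling of this kind is typically configured through Application Auto Scaling. Below is a minimal sketch of a target-tracking policy for a SageMaker endpoint variant, with a five-minute cooldown matching the usage window described above; the endpoint name, variant name, and target value are illustrative placeholders, not values from the announcement.

```python
def scaling_policy_config(endpoint_name, variant="AllTraffic",
                          target_invocations=50.0):
    """Build kwargs for application-autoscaling's put_scaling_policy().

    Hypothetical example values; the resource-ID format and scalable
    dimension are the standard ones for SageMaker endpoint variants.
    """
    resource_id = f"endpoint/{endpoint_name}/variant/{variant}"
    return {
        "PolicyName": f"{endpoint_name}-target-tracking",
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            # Scale so each instance handles ~target_invocations requests
            # per minute on average.
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
            },
            "TargetValue": target_invocations,
            "ScaleInCooldown": 300,   # 5 minutes
            "ScaleOutCooldown": 300,  # 5 minutes
        },
    }


def attach_policy(endpoint_name):
    """Register the endpoint variant and attach the policy (requires AWS credentials)."""
    import boto3  # deferred so the config builder stays dependency-free

    aas = boto3.client("application-autoscaling")
    cfg = scaling_policy_config(endpoint_name)
    aas.register_scalable_target(
        ServiceNamespace="sagemaker",
        ResourceId=cfg["ResourceId"],
        ScalableDimension=cfg["ScalableDimension"],
        MinCapacity=1,
        MaxCapacity=4,
    )
    aas.put_scaling_policy(**cfg)
```

The policy-building function is separated from the AWS calls so the configuration can be inspected or tested without credentials.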

Models can be deployed through the SageMaker Studio interface or the SageMaker AI SDK. Supported instance types at the general availability launch include g5.12xlarge, g5.24xlarge, g5.48xlarge, g6.12xlarge, g6.24xlarge, g6.48xlarge, and p5.48xlarge for the Nova Micro model; g5.48xlarge, g6.48xlarge, and p5.48xlarge for the Nova Lite model; and p5.48xlarge for the Nova 2 Lite model.
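Programmatic deployment follows the usual SageMaker pattern of creating an endpoint configuration and then an endpoint. The sketch below uses boto3 with one of the instance types listed above; the model and endpoint names are placeholders, and it assumes a model resource for the customized Nova artifact has already been created in SageMaker.

```python
def build_endpoint_config(config_name, model_name,
                          instance_type="ml.g5.12xlarge", instance_count=1):
    """Build kwargs for sagemaker's create_endpoint_config().

    Names are hypothetical; instance_type should be one of the types
    supported for the chosen Nova model at GA.
    """
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "InstanceType": instance_type,
            "InitialInstanceCount": instance_count,
        }],
    }


def deploy(model_name, endpoint_name, instance_type="ml.g5.12xlarge"):
    """Create the endpoint config and endpoint (requires AWS credentials)."""
    import boto3  # deferred so the config builder stays dependency-free

    sm = boto3.client("sagemaker")
    cfg = build_endpoint_config(f"{endpoint_name}-config", model_name,
                                instance_type)
    sm.create_endpoint_config(**cfg)
    sm.create_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=cfg["EndpointConfigName"],
    )
```

Deploying through SageMaker Studio accomplishes the same thing interactively; the API route is useful when deployments are driven from CI or infrastructure-as-code.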

The service supports both synchronous endpoints for real-time inference, with streaming and non-streaming modes, and asynchronous endpoints for batch processing. An example provided by AWS demonstrates a streaming chat request with parameters for max tokens, temperature, top_p, top_k, logprobs, and reasoning effort.
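A streaming request like the one AWS describes can be sketched with the SageMaker Runtime streaming API. The request body below follows the Nova messages format used elsewhere in AWS's model APIs; the exact schema accepted by a custom Nova endpoint, and the field names for logprobs and reasoning-effort settings, are not spelled out in the announcement, so treat this as an assumption.

```python
import json


def build_chat_request(prompt, max_tokens=512, temperature=0.7,
                       top_p=0.9, top_k=50):
    """Build a chat request body with common inference parameters.

    Assumed schema (Nova-style messages + inferenceConfig); logprobs and
    reasoning-effort fields are omitted because their names are not given
    in the announcement.
    """
    return {
        "messages": [
            {"role": "user", "content": [{"text": prompt}]},
        ],
        "inferenceConfig": {
            "maxTokens": max_tokens,
            "temperature": temperature,
            "topP": top_p,
            "topK": top_k,
        },
    }


def stream_chat(endpoint_name, prompt):
    """Invoke the endpoint in streaming mode and print tokens as they arrive."""
    import boto3  # deferred so the request builder stays dependency-free

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint_with_response_stream(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(build_chat_request(prompt)),
    )
    # The response body is an event stream; each PayloadPart carries a
    # chunk of the generated output.
    for event in response["Body"]:
        if "PayloadPart" in event:
            print(event["PayloadPart"]["Bytes"].decode("utf-8"), end="")
```

For non-streaming synchronous calls, `invoke_endpoint` returns the full response in one body; asynchronous endpoints instead accept an S3 input location and write results back to S3.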

Amazon SageMaker Inference for custom Nova models is currently available in US East (N. Virginia) and US West (Oregon) AWS Regions. AWS has published a regional availability roadmap on its AWS Capabilities by Region page. Pricing information is available on the Amazon SageMaker AI Pricing page.

AWS encourages users to try the service in the Amazon SageMaker AI console and provide feedback through AWS re:Post for SageMaker or standard AWS Support channels.
