Amazon SageMaker Launches Bidirectional Streaming for Real-Time AI Inference
SEATTLE – Amazon SageMaker AI Inference has added support for bidirectional streaming, dramatically reducing latency in real-time applications like speech-to-text transcription. The new capability enables continuous processing of audio and immediate return of partial transcripts as speech occurs, paving the way for more responsive voice agents.
Previously, developers building AI-powered voice applications faced challenges due to the lack of managed infrastructure for bidirectional streaming. They often had to build and maintain custom WebSocket implementations and streaming protocols, a process that could consume weeks of engineering time.
Now, with the Bidirectional Stream API, users can stream audio to a deployed speech-to-text model simply by invoking their endpoint. The client opens an HTTP/2 connection to the SageMaker AI runtime, and SageMaker AI automatically establishes a WebSocket connection to the user’s container, handling the complexities of streaming audio frames and delivering partial transcripts in real time. The system is designed to work with any container implementing the SageMaker AI contract, meaning existing models like those from Deepgram can be deployed without code modifications.
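The send-while-receiving pattern described above can be illustrated with a small local simulation. This is a sketch under stated assumptions, not the SageMaker AI API: it makes no AWS calls, and every name in it (`FRAME_BYTES`, `split_into_frames`, `fake_model`, `send_audio`, `receive_transcripts`, `run_demo`) is hypothetical. Two in-memory queues stand in for the two directions of the stream, and a stub coroutine stands in for the model container.

```python
import asyncio
from typing import List, Optional

# Hypothetical frame size: 10 ms of 16 kHz, 16-bit mono audio.
FRAME_BYTES = 320


def split_into_frames(audio: bytes, frame_bytes: int = FRAME_BYTES) -> List[bytes]:
    """Cut a byte buffer into fixed-size frames; the last one may be shorter."""
    return [audio[i:i + frame_bytes] for i in range(0, len(audio), frame_bytes)]


async def fake_model(inbound: asyncio.Queue, outbound: asyncio.Queue) -> None:
    """Stand-in for a model container: emits one partial result per frame."""
    received = 0
    while True:
        frame: Optional[bytes] = await inbound.get()
        if frame is None:  # end-of-stream sentinel from the sender
            await outbound.put(None)
            return
        received += len(frame)
        await outbound.put(f"partial: {received} bytes heard")


async def send_audio(frames: List[bytes], inbound: asyncio.Queue) -> None:
    """Client-to-model direction: push frames, then signal end of stream."""
    for frame in frames:
        await inbound.put(frame)
        await asyncio.sleep(0)  # yield so responses can interleave with sends
    await inbound.put(None)


async def receive_transcripts(outbound: asyncio.Queue) -> List[str]:
    """Model-to-client direction: collect partial results until the stream ends."""
    results: List[str] = []
    while True:
        message = await outbound.get()
        if message is None:
            return results
        results.append(message)


async def stream(audio: bytes) -> List[str]:
    inbound: asyncio.Queue = asyncio.Queue()
    outbound: asyncio.Queue = asyncio.Queue()
    model = asyncio.create_task(fake_model(inbound, outbound))
    # Sending and receiving run concurrently, which is what makes the
    # stream bidirectional rather than request-then-response.
    _, transcripts = await asyncio.gather(
        send_audio(split_into_frames(audio), inbound),
        receive_transcripts(outbound),
    )
    await model
    return transcripts


def run_demo(audio: bytes) -> List[str]:
    """Run the simulation on a byte buffer and return the partial results."""
    return asyncio.run(stream(audio))


if __name__ == "__main__":
    for line in run_demo(bytes(800)):
        print(line)
```

In a real deployment the two queues would be replaced by the managed connection, and the stub by the deployed container; the point here is only that partial results can arrive before the last frame has been sent.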
This advancement eliminates substantial infrastructure development overhead, allowing data scientists and machine learning engineers to concentrate on model accuracy and agent functionality.
Bidirectional streaming is currently available in the following AWS Regions: Canada (Central), South America (São Paulo), Africa (Cape Town), Europe (Paris), Asia Pacific (Hyderabad), Asia Pacific (Jakarta), Israel (Tel Aviv), Europe (Zurich), Asia Pacific (Tokyo), AWS GovCloud US (West), AWS GovCloud US (East), Asia Pacific (Mumbai), Middle East (Bahrain), US West (Oregon), China (Ningxia), US West (Northern California), Asia Pacific (Sydney), Europe (London), Asia Pacific (Seoul), US East (N. Virginia), Asia Pacific (Hong Kong), US East (Ohio), China (Beijing), Europe (Stockholm), Europe (Ireland), Middle East (UAE), Asia Pacific (Osaka), Asia Pacific (Melbourne), Europe (Spain), Europe (Frankfurt), Europe (Milan), Asia Pacific (Singapore).
Further information can be found on the AWS News Blog and in the SageMaker AI documentation.