AT&T Cuts AI Costs 90% with Small Language Models & Agent Orchestration

by Rachel Kim – Technology Editor

AT&T is achieving up to 90% cost savings in its AI operations by shifting away from reliance on massive language models and embracing a multi-agent system built on smaller, more focused AI components, according to the company’s chief data officer, Andy Markus.

The telecom giant faced a significant scaling challenge: at 8 billion tokens of daily usage, processing everything through large reasoning models was impractical and expensive. Markus and his team responded by rebuilding the orchestration layer for their internal “Ask AT&T” personal assistant, creating a LangChain-based stack in which “super agents” – large language models – direct smaller “worker” agents designed for specific tasks.
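The super-agent/worker-agent pattern can be sketched in plain Python. This is an illustrative stand-in, not AT&T's implementation: the production stack uses LangChain, and the worker names and routing heuristic below are invented. The idea is that an expensive large model only classifies the request, while cheap specialized models do the actual work.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class WorkerAgent:
    """A small model wrapped for one narrow task."""
    name: str
    handle: Callable[[str], str]

def sql_worker(query: str) -> str:
    # Stand-in for a small natural-language-to-SQL model.
    return f"[sql-worker] translated '{query}' to SQL"

def doc_worker(query: str) -> str:
    # Stand-in for a small document-processing model.
    return f"[doc-worker] summarized document for '{query}'"

WORKERS: Dict[str, WorkerAgent] = {
    "sql": WorkerAgent("sql", sql_worker),
    "docs": WorkerAgent("docs", doc_worker),
}

def super_agent_route(request: str) -> str:
    """Stand-in for the LLM 'super agent': pick a worker by intent.
    A real system would ask a large model to classify the request;
    here a keyword check fakes that decision."""
    intent = "sql" if ("table" in request or "query" in request) else "docs"
    return WORKERS[intent].handle(request)
```

The cost win comes from the split: the large model runs once per request for routing, and the per-token-cheap workers handle the bulk of the 8 billion daily tokens.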

This new architecture has dramatically reduced latency and improved response times, Markus told VentureBeat. The cost reductions are a key benefit, but the approach also allows for greater flexibility. “I believe the future of agentic AI is many, many, many small language models (SLMs),” Markus said. “We find small language models to be just about as accurate as, if not more accurate than, a large language model on a given domain area.”

The re-architected system, deployed with Microsoft Azure, powers “Ask AT&T Workflows,” a no-code/low-code agent builder now available to AT&T employees. The tool allows users to automate tasks by dragging and dropping agents that access proprietary AT&T tools for document processing, natural language-to-SQL conversion, and image analysis. “As the workflow is executed, it’s AT&T’s data that’s really driving the decisions,” Markus explained. “We’re asking questions of our data, and we bring our data to bear to make sure it focuses on our information as it makes decisions.”
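The workflow-builder concept can be illustrated as a simple pipeline: each "agent" is a step that takes a running payload and returns an updated one, and the builder chains the steps the user drags into place. The step names below (`extract_text`, `to_sql`) are hypothetical, not AT&T's actual proprietary tools.

```python
from typing import Any, Callable, Dict, List

# A workflow step: takes the shared payload, returns the updated payload.
Step = Callable[[Dict[str, Any]], Dict[str, Any]]

def extract_text(payload: Dict[str, Any]) -> Dict[str, Any]:
    # Stand-in for a document-processing agent.
    payload["text"] = f"text of {payload['document']}"
    return payload

def to_sql(payload: Dict[str, Any]) -> Dict[str, Any]:
    # Stand-in for a natural-language-to-SQL agent.
    payload["sql"] = f"SELECT * FROM docs WHERE body LIKE '%{payload['text']}%'"
    return payload

def run_workflow(steps: List[Step], payload: Dict[str, Any]) -> Dict[str, Any]:
    """Execute each agent step in order, threading the payload through."""
    for step in steps:
        payload = step(payload)
    return payload
```

Because each step only sees and enriches the shared payload, the company's own data is what drives each decision, which is the point Markus makes above.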

Despite the automation, human oversight remains central to the process. All agent actions are logged, data is isolated, and role-based access controls are enforced as workloads are passed between agents. “Things do happen autonomously, but the human on the loop still provides a check and balance of the entire process,” Markus said.
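The guardrails described, logged actions and role-based access control on every handoff, might look like the following sketch. The agent names, roles, and permission strings are invented for illustration.

```python
from typing import Any, Dict

# Append-only record of every inter-agent action, for human review.
AUDIT_LOG = []

# Hypothetical role-based access control table.
ROLE_PERMISSIONS = {
    "network-agent": {"read:telemetry"},
    "billing-agent": {"read:billing"},
}

def handoff(sender: str, receiver: str, permission: str,
            data: Dict[str, Any]) -> Dict[str, Any]:
    """Pass data between agents, enforcing RBAC and logging the action."""
    allowed = permission in ROLE_PERMISSIONS.get(receiver, set())
    AUDIT_LOG.append({"from": sender, "to": receiver,
                      "perm": permission, "allowed": allowed})
    if not allowed:
        raise PermissionError(f"{receiver} lacks {permission}")
    return data
```

Every handoff lands in the audit log whether it succeeds or not, which is what lets the "human on the loop" reconstruct and check the autonomous steps after the fact.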

AT&T is prioritizing the use of readily available, “interchangeable and selectable” models rather than attempting to build everything from scratch. The company plans to replace internally developed tools with off-the-shelf options as industry functionality matures. “In this space, things change every week, if we’re lucky, sometimes multiple times a week,” Markus said. “We need to be able to pilot, plug in and plug out different components.”
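One common way to get that plug-in/plug-out flexibility is a registry that binds components to stable task names, so a model can be swapped without touching callers. This is a generic sketch of the pattern, not AT&T's (non-public) implementation.

```python
from typing import Callable, Dict

class ModelRegistry:
    """Maps stable task names to interchangeable model callables."""

    def __init__(self) -> None:
        self._models: Dict[str, Callable[[str], str]] = {}

    def register(self, task: str, model: Callable[[str], str]) -> None:
        """Bind (or rebind) a model to a task name; rebinding swaps it out."""
        self._models[task] = model

    def run(self, task: str, prompt: str) -> str:
        return self._models[task](prompt)

registry = ModelRegistry()
# Pilot an in-house model behind the "summarize" task name...
registry.register("summarize", lambda p: f"v1 summary of {p}")
# ...then plug in an off-the-shelf replacement; callers are unaffected.
registry.register("summarize", lambda p: f"v2 summary of {p}")
```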

The company rigorously evaluates both its own and external models. AT&T’s “Ask Data with Relational Knowledge Graph” currently leads the Spider 2.0 text-to-SQL accuracy leaderboard, and other tools have performed well on the BIRD SQL benchmark. The team utilizes LangChain as a core framework, grounds models with retrieval-augmented generation (RAG), fine-tunes them with proprietary algorithms, and leverages Microsoft’s search functionality for its vector store.
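A minimal RAG sketch makes the grounding step concrete: retrieve the most relevant passages, then stuff them into the prompt as context. The word-overlap scoring below is a toy stand-in for the vector-store similarity search the article mentions, and the corpus is invented.

```python
from typing import List

def retrieve(query: str, corpus: List[str], k: int = 2) -> List[str]:
    """Rank passages by naive word overlap with the query.
    A production system would use embedding similarity instead."""
    q_words = set(query.lower().split())
    return sorted(corpus,
                  key=lambda doc: len(q_words & set(doc.lower().split())),
                  reverse=True)[:k]

def build_grounded_prompt(query: str, corpus: List[str]) -> str:
    """Prepend retrieved passages so the model answers from company data."""
    context = "\n".join(retrieve(query, corpus))
    return f"Context:\n{context}\n\nQuestion: {query}"
```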

Markus cautioned against overcomplicating AI solutions, emphasizing the importance of accuracy, cost, and responsiveness. He suggested builders should assess whether a task truly requires an agentic approach, considering whether a simpler, single-turn generative solution could achieve sufficient accuracy or if the task can be broken down into smaller, more precise components. “Sometimes we overcomplicate things,” he said. “Sometimes I’ve seen a solution over-engineered.”

“Ask AT&T Workflows” has been deployed to over 100,000 employees, with more than half reporting daily usage. Active users have reported productivity gains as high as 90%. Markus noted that “stickiness” – repeated use – is a key indicator of success.

The agent builder offers both a pro-code option, allowing users to program in Python, and a no-code visual interface. Surprisingly, even technically proficient users are increasingly opting for the low-code drag-and-drop interface. At a recent hackathon, more than half of the participants – all experienced programmers – chose the no-code option.

Employees are utilizing the agents across various functions. For example, network engineers are building agents to address alerts and restore connectivity for customers. These agents can correlate telemetry data to identify network issues, check change logs, and open trouble tickets. Downstream agents can then propose solutions, write code to implement patches, and generate summaries with preventative measures. A human engineer oversees the entire process, ensuring the agents perform as expected.
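The triage flow described above can be sketched end to end, with each step small enough to hand to a dedicated worker agent. All data, thresholds, and field names here are invented for illustration.

```python
from collections import Counter
from typing import Dict, List, Optional

def correlate_telemetry(alarms: List[Dict[str, str]]) -> Optional[str]:
    """Flag a probable root cause when multiple alarms share a node."""
    counts = Counter(a["node"] for a in alarms)
    node, n = counts.most_common(1)[0]
    return node if n >= 2 else None

def triage(alarms: List[Dict[str, str]], change_log: Dict[str, str]) -> Dict:
    """Correlate alarms, check the change log, and draft a trouble ticket."""
    node = correlate_telemetry(alarms)
    if node is None:
        return {"action": "monitor"}
    return {
        "node": node,
        "recent_change": change_log.get(node),
        "action": "open_ticket",
        "proposal": f"roll back last change on {node}",
    }  # a human engineer reviews the ticket before anything is applied
```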

This approach of breaking down complex tasks into smaller, purpose-built components is also transforming AT&T’s software development process, leading to what Markus calls “AI-fueled coding.” He likened the process to RAG, where developers use agile methods and “function-specific” build archetypes within an integrated development environment (IDE). The resulting code is nearly production-ready, requiring minimal iteration, unlike “vibe coding” which often requires extensive refinement.

Markus believes this technique is “tangibly redefining” the software development cycle, shortening timelines and increasing the output of production-grade code. Non-technical teams can also use plain language prompts to build software prototypes. His team recently built an internal data product in 20 minutes using this method, a task that would have previously taken six weeks. “We develop software with it, modify software with it, do data science with it, do data analytics with it, do data engineering with it,” Markus said. “So it’s a game changer.”
