Snapchat Boosts Data Processing with NVIDIA & Google Cloud | 4x Faster Experiments

Snap, Snapchat’s parent company, is cutting the cost of its A/B testing infrastructure by running NVIDIA’s open data processing libraries on Google Cloud. The shift to GPU-accelerated data processing has yielded a 76% reduction in daily costs, according to internal data collected between January 1 and February 28, 2026.

Every new feature released to Snapchat, which has over 940 million monthly active users, undergoes rigorous A/B testing. Each experiment varies a feature across subsets of users and measures approximately 6,000 metrics related to engagement, app performance, and monetization. Snap runs thousands of these experiments monthly, processing over 10 petabytes of data within a three-hour window each morning using the Apache Spark distributed framework.
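To put that volume in perspective, a rough calculation gives the sustained throughput the morning window implies. This is only a back-of-the-envelope sketch: it assumes decimal units (1 PB = 10^15 bytes), treats "over 10 petabytes" as exactly 10, and spreads the work evenly across the full three hours, none of which the article states.

```python
# Rough throughput implied by the article's figures.
# Assumptions (not from Snap): decimal petabytes, exactly 10 PB,
# and work spread evenly across the whole three-hour window.
PETABYTE = 10**15

data_bytes = 10 * PETABYTE
window_seconds = 3 * 60 * 60  # three hours

bytes_per_second = data_bytes / window_seconds
gigabytes_per_second = bytes_per_second / 10**9

print(f"Sustained throughput: about {gigabytes_per_second:.0f} GB/s")
```

Because the article gives 10 petabytes as a floor, the real figure would be at least this high.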

The core of the improvement lies in the adoption of Apache Spark accelerated by NVIDIA cuDF. This allows Snap to achieve up to 4x faster runtime with the same number of machines, providing a cost-effective path to scale. Prudhvi Vatala, senior engineering manager at Snap, stated, “Experimentation is at the core of our company. Changing our data infrastructure from CPUs to GPUs allows us to efficiently scale this experimentation to more features, more metrics and more users over time.”
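For readers unfamiliar with how a Spark job is pointed at GPUs, the sketch below assembles the kind of spark-submit flags typically used to enable the RAPIDS Accelerator for Apache Spark, the plugin that runs Spark SQL and DataFrame operations on cuDF. The configuration keys are the accelerator's standard settings; the jar path, the resource amounts, and the helper function names are illustrative placeholders, not Snap's actual configuration.

```python
# Sketch: build spark-submit flags that enable GPU acceleration.
# The jar path and resource amounts are placeholders for illustration
# and do not reflect Snap's production settings.

def rapids_spark_conf(gpus_per_executor=1, tasks_per_gpu=4):
    """Return Spark settings that load the RAPIDS Accelerator plugin."""
    return {
        # Load the plugin that replaces supported CPU operators with GPU ones.
        "spark.plugins": "com.nvidia.spark.SQLPlugin",
        "spark.rapids.sql.enabled": "true",
        # Tell the Spark scheduler how many GPUs each executor owns.
        "spark.executor.resource.gpu.amount": str(gpus_per_executor),
        # A fractional amount lets several tasks share a single GPU.
        "spark.task.resource.gpu.amount": str(1 / tasks_per_gpu),
    }


def to_submit_args(conf, jar_path):
    """Flatten the settings into a spark-submit argument list."""
    args = ["--jars", jar_path]
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Hypothetical jar location; the real file name includes a version.
    for arg in to_submit_args(rapids_spark_conf(), "/opt/jars/rapids-4-spark.jar"):
        print(arg)
```

Existing Spark SQL and DataFrame code generally runs unchanged under the plugin, with operators that have no GPU implementation falling back to the CPU, which is consistent with the same number of machines delivering a shorter runtime.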

The transition wasn’t simply a hardware upgrade. Snap paired NVIDIA’s GPU-optimized software, including the CUDA-X libraries, with Google’s infrastructure management services, specifically Google Kubernetes Engine. This created a full-stack platform optimized for large-scale data processing. The company also utilized cuDF microservices, which automatically validate, test, configure, and optimize Spark workloads for large-scale GPU-accelerated environments.

Collaboration with NVIDIA experts further refined the process, optimizing pipelines on Google Cloud’s G2 virtual machines equipped with NVIDIA L4 GPUs. Data collected between January 1 and March 13, 2026, showed the number of GPUs required concurrently falling from approximately 5,500 to around 2,100.
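The size of that drop is easy to work out from the two figures quoted. The short calculation below is a sketch that treats the article's approximate counts as exact, so the result should be read as a rough estimate.

```python
# Relative reduction in concurrent GPUs, using the article's
# approximate figures as if they were exact.
gpus_before = 5_500
gpus_after = 2_100

reduction = (gpus_before - gpus_after) / gpus_before
ratio = gpus_before / gpus_after

print(f"{reduction:.1%} fewer concurrent GPUs ({ratio:.1f}x reduction)")
```

This works out to a little under two-thirds fewer GPUs running at once.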

Joshua Sambasivam, a backend engineer on Snap’s A/B testing team, noted the unexpectedly significant impact of the change. “The initial results were really amazing. We saw much larger cost savings than anticipated, and the Spark accelerator fit our workload perfectly.”

Snap plans to expand the use of Spark acceleration beyond the A/B testing team, applying it to a broader range of production workloads. As of March 17, 2026, the company had migrated only its two largest pipelines, which leaves substantial room for further optimization and cost savings. “We didn’t realize how much potential we had,” Vatala said. “We’ve migrated the two largest pipelines so far, and there are many more opportunities ahead.”
