Microsoft’s AI Red Team: Stress-Testing AI for Security Risks & Threats
Since 2018, a team within Microsoft has been working to proactively identify vulnerabilities in the company’s artificial intelligence systems before those systems are released to the public. Known as the AI Red Team, the group simulates attacks against AI models and applications, exploring potential safety and security concerns ranging from loss of control to risks involving chemical, biological, and nuclear threats.
The team’s work extends beyond simply testing prompts. Researchers evaluate whether AI-generated code compiles and runs, and whether certain programming languages increase the likelihood of harmful outputs. In one instance, the Red Team collaborated with other Microsoft researchers to assess the potential for AI to assist in cyberattacks, including generating or refining malware. Researchers framed requests in benign terms, such as describing a student project or security research, then attempted to elicit increasingly detailed outputs from the systems.
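The compile-and-run check in particular lends itself to automation. The following is a minimal sketch of that idea in Python, not a description of Microsoft’s actual tooling: the `query_model` client is a hypothetical stand-in, and real harnesses would run generated code only inside a sandbox.

```python
import re

def extract_code_blocks(response_text: str) -> list[str]:
    """Pull fenced Python blocks out of a model's text response."""
    return re.findall(r"```(?:python)?\n(.*?)```", response_text, re.DOTALL)

def compiles(source: str) -> bool:
    """Check whether a generated snippet at least byte-compiles.

    Compiling is cheap and safe to do locally; actually *running*
    untrusted generated code would require a sandbox, which this
    sketch deliberately omits.
    """
    try:
        compile(source, "<generated>", "exec")
        return True
    except (SyntaxError, ValueError):
        return False

def audit_response(prompt: str, query_model) -> dict:
    """Score one model response for compilable code.

    `query_model` is a hypothetical callable standing in for whatever
    client a red team uses; it is not a real Microsoft API.
    """
    response = query_model(prompt)
    blocks = extract_code_blocks(response)
    return {
        "prompt": prompt,
        "code_blocks": len(blocks),
        "compilable": sum(compiles(b) for b in blocks),
    }
```

A harness like this can be run across many prompt variants and languages, which is one plausible way to measure the article’s other claim: whether certain programming languages make harmful, working output more likely.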
According to Pete Bryan, principal AI security research lead on the Red Team, the systems sometimes produced code comparable to what a low- to mid-level hacker might write. Following this discovery, the team refined its detection systems to better flag such behavior. “We witness a really, really diverse set of tech,” says Tori Westerhoff, principal AI security researcher on the Microsoft AI Red Team. “Part of the kind of magic of the team is that we can see anything from a product feature to a system to a copilot to a frontier model, and we get to see how tech is integrated across all of those, and how AI is growing, and evolving.”
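The article does not describe how those refined detection systems work. Purely as an illustration, a first-pass filter might look for the very pattern the researchers exploited: benign framing paired with repeated requests for offensive capability. The signal lists and threshold below are invented for this sketch and are not Microsoft’s detectors.

```python
import re

# Illustrative toy signals only -- a production system would use
# trained classifiers over full conversation context, not keywords.
BENIGN_FRAMING = re.compile(
    r"\b(student project|homework|security research|for educational purposes)\b",
    re.IGNORECASE,
)
CAPABILITY_TERMS = re.compile(
    r"\b(keylogger|ransomware|privilege escalation|reverse shell|payload)\b",
    re.IGNORECASE,
)

def flag_conversation(turns: list[str]) -> bool:
    """Flag when benign framing co-occurs with escalating capability asks.

    The threshold of two capability hits is arbitrary; it just encodes
    the idea that escalation across turns is the suspicious signal.
    """
    framing = any(BENIGN_FRAMING.search(t) for t in turns)
    capability_hits = sum(bool(CAPABILITY_TERMS.search(t)) for t in turns)
    return framing and capability_hits >= 2
```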
The need for such internal security measures comes as AI systems face increasing scrutiny for potential harms. Recent criticism has focused on allegations that AI software has contributed to mental illness and suicide, facilitated the creation of nonconsensual deepfake images, and aided malicious actors in cybercrime. At the same time, methods for bypassing AI safeguards are growing more sophisticated, from jailbreak prompts disguised as poetry to prompt-injection attacks that smuggle malicious instructions into data a model ingests.
The broader AI cybersecurity landscape is evolving rapidly, with a growing number of companies focused on securing AI models, data, and infrastructure. According to a report published in January 2026, 74% of IT security professionals have experienced critical impacts from AI-fueled cyberattacks. Companies such as CrowdStrike, Cybereason, and Palo Alto Networks are building AI-driven cybersecurity platforms, while others, such as Mindgard, specialize in autonomous red teaming and continuous security testing for AI systems. Vectra AI focuses on network detection and response, applying machine learning to analyze traffic and user behavior.
Consulting firms are entering the field as well: Canada-based Guard Eye – Proactive Security was offering AI security services as of March 11, 2026, according to rankings published by Clutch.
