AI Predicts Novel Proteins by Learning Bacterial Genome Patterns

by Rachel Kim – Technology Editor

Evo: A New Approach to Genomic Sequencing

Researchers have developed a system called Evo that applies the approach behind large language models (LLMs) to understanding and generating genomic sequences. The core principle behind Evo is its ability to “link nucleotide-level patterns to kilobase-scale genomic context,” meaning it can interpret fragments of genomic DNA much as an LLM interprets text and produce relevant outputs. This allows Evo to predict and even create functional genetic material.

Initial tests focused on Evo’s ability to reconstruct known proteins. When given as little as 30% of a gene sequence, Evo predicted a meaningful portion of the missing information, achieving 85% completion. With 80% of a gene sequence provided, it accurately reconstructed the entire sequence. Furthermore, it demonstrated the ability to identify and reinstate missing genes within established functional clusters.
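The article does not say how completion was scored. As a rough illustration only, the prompt-and-recover setup could be sketched as follows. The function names `split_prompt` and `recovery` are hypothetical, and exact position-by-position matching is an assumption, not the researchers' actual metric.

```python
def split_prompt(gene: str, fraction: float) -> tuple[str, str]:
    """Split a gene into a prompt (the first `fraction`) and the held-out rest."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    cut = round(len(gene) * fraction)
    return gene[:cut], gene[cut:]


def recovery(held_out: str, predicted: str) -> float:
    """Fraction of held-out positions that the prediction matches exactly.

    Assumption: simple position-wise identity. Positions missing from a
    prediction that is too short count as mismatches.
    """
    if not held_out:
        raise ValueError("held_out must be non-empty")
    matches = sum(a == b for a, b in zip(held_out, predicted))
    return matches / len(held_out)


# Toy example: hold back 70% of a made-up 20-base sequence.
prompt, held_out = split_prompt("ATGGCCAAGTTTGACCTGAA", 0.3)
# A model would be asked to continue `prompt`; its output would then be
# scored against `held_out` with recovery().
```

A real evaluation would more likely align the predicted and true sequences before scoring, since a single inserted or deleted base shifts every later position.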

This accuracy stems from Evo’s extensive training on a vast dataset of bacterial genomes. This training enabled the system to recognise critical regions within proteins and, when making alterations to sequences, to confine those changes to areas where genetic variation is naturally tolerated. Essentially, Evo has learned the evolutionary constraints governing gene structure and function.

To explore Evo’s potential for innovation, the researchers challenged it to generate entirely new protein sequences. They focused on bacterial toxins, which are often paired with antitoxins that protect the producing cell. The team designed a novel toxin, distantly related to known toxins and lacking a corresponding antitoxin, and used its sequence as a prompt for Evo. To ensure novelty, any generated sequences resembling known antitoxins were excluded from the results. This experiment aimed to determine whether Evo could produce genuinely new genetic information, rather than simply replicating existing sequences.
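The novelty filter described above can be pictured with a minimal sketch. It assumes a crude string-similarity score from Python's standard `difflib` module and an arbitrary cutoff of 0.5. The article gives neither the similarity measure nor the threshold the team used, and a real pipeline would more likely rely on a dedicated sequence-alignment tool.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Rough similarity between two sequences, from 0.0 to 1.0.

    Stand-in for a proper alignment score; autojunk is disabled so that
    long sequences are compared in full.
    """
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def filter_novel(candidates: list[str], known: list[str],
                 max_similarity: float = 0.5) -> list[str]:
    """Keep only candidates that do not resemble any known sequence.

    `max_similarity` is a hypothetical cutoff chosen for illustration.
    """
    return [
        c for c in candidates
        if all(similarity(c, k) <= max_similarity for k in known)
    ]


# Toy example with made-up protein fragments.
known_antitoxins = ["MKTAYIAKQR"]
generated = ["MKTAYIAKQR", "GGGGGGGGGG"]
novel = filter_novel(generated, known_antitoxins)
```

Here the first generated sequence is identical to a known one and is dropped, while the second shares no residues with it and is kept.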
