Long-context language models and their applications in biology

Eric Nguyen, a PhD candidate at Stanford University, joins us today. We discuss his work on long-context foundation models, their application to biology in particular, and how that work evolved into the Hyena DNA and Evo models. We talk about Hyena, a convolution-based language model created to address the challenges of long context lengths in language modeling. We examine the limitations of transformers on longer sequences, the advantages of convolutional models over transformers, the architecture and training of these models, the role of the FFT in their computational optimizations, and the interpretability of long-sequence convolutions. We also discuss Hyena DNA, a genomic foundation model pre-trained at context lengths of up to one million tokens and designed to capture long-range dependencies in DNA sequences. Lastly, Eric presents Evo, a seven-billion-parameter hybrid model that combines the convolutional architecture of Hyena DNA with attention layers. We discuss language models for both DNA generation and design, the trade-offs between state-of-the-art models, zero-shot versus few-shot performance, evaluation benchmarks, and the exciting potential of applications such as CRISPR-Cas gene editing.
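For readers curious about the FFT point above, here is a minimal, illustrative sketch of an FFT-based long convolution, the kind of operation that lets convolutional models scale to long sequences in O(L log L) time instead of the O(L^2) cost of full self-attention. The function name, shapes, and filter are assumptions made for illustration; this is not Hyena's actual implementation.

```python
# Illustrative sketch only: FFT-based long convolution (not the Hyena codebase).
import numpy as np

def fft_long_conv(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Convolve a length-L signal x with a length-L filter k in O(L log L)
    time via the FFT, versus O(L^2) for direct convolution."""
    L = x.shape[-1]
    n = 2 * L  # zero-pad so the circular FFT convolution matches linear convolution
    X = np.fft.rfft(x, n=n)
    K = np.fft.rfft(k, n=n)
    y = np.fft.irfft(X * K, n=n)
    return y[..., :L]  # keep only the first L outputs

# Tiny usage example on a random sequence.
rng = np.random.default_rng(0)
seq = rng.standard_normal(1_024)          # stand-in for a 1,024-token signal
filt = np.exp(-np.arange(1_024) / 64.0)   # a long, smoothly decaying filter

out = fft_long_conv(seq, filt)

# Sanity check against direct (quadratic-time) convolution.
ref = np.convolve(seq, filt)[:1_024]
assert np.allclose(out, ref, atol=1e-8)
print(out.shape)  # (1024,)
```

The same trick extends to the million-token context lengths discussed in the episode, where a quadratic-cost operation would be impractical.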
