Bridging the gap between the language of life (proteins) and human language through machine learning and large language models.
I'm a passionate PhD student and Data Scientist working at the intersection of computational biology and natural language processing. My research focuses on developing machine learning approaches to understand protein sequences and to translate between them and human language.
With a strong background in machine learning, deep learning, and large language models, I'm particularly interested in how NLP techniques can help us understand biological sequences, and in how the structure of biological sequences can inform NLP methods in turn.
When I'm not training models or analyzing protein sequences, you can find me contributing to open-source projects, mentoring aspiring data scientists, or exploring the latest advancements in AI research.
Investigating how transformer-based architectures can be adapted to understand and generate protein sequences, drawing parallels between amino acids and the words of natural language.
Developing novel attention mechanisms that capture the unique properties of protein sequences while maintaining the benefits of modern NLP architectures.
Exploring methods to translate between protein sequences and their functional descriptions in human language, enabling better interpretation of biological data.
Creating frameworks that allow researchers to query protein databases using natural language and receive meaningful, interpretable results.
A transformer-based model trained on millions of protein sequences to predict structure and function.
A framework for translating between protein sequences and natural language descriptions of their function.
A question-answering system for protein databases using natural language queries.
Biotech Company • 2021-Present
Developing machine learning models for protein sequence analysis and drug discovery. Leading NLP projects to bridge biological and human language.
AI Research Lab • 2019-2021
Implemented transformer-based models for various NLP tasks. Optimized distributed training pipelines using PyTorch and Spark.
University of Science • 2021-Present
Researching protein language models and their applications in understanding biological sequences and functions.
Tech Institute • 2018-2020
Specialized in machine learning and big data technologies. Thesis on attention mechanisms in sequence models.
Interested in collaborating or learning more about my work? Feel free to reach out!