Hi, I'm Khairi 👋

PhD Student & Data Scientist

Bridging the gap between the language of life (proteins) and human language through machine learning and large language models.

Khairi Abidi

About Me

I'm a PhD student and data scientist working at the intersection of computational biology and natural language processing. My research develops machine learning methods for understanding protein sequences (the language of life) and translating between them and human language.

With a background in machine learning, deep learning, and large language models, I'm particularly interested in how NLP techniques can help us understand biological sequences, and in what biological sequences can teach us about language modeling in return.

When I'm not training models or analyzing protein sequences, you can find me contributing to open-source projects, mentoring aspiring data scientists, or exploring the latest advancements in AI research.

My Research

Bridging Protein Language and Human Language

Protein Language Models

Investigating how transformer-based architectures can be adapted to understand and generate protein sequences, drawing parallels between amino acid "words" and natural language words.

Developing novel attention mechanisms that capture the unique properties of protein sequences while maintaining the benefits of modern NLP architectures.
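To make the amino acid "words" analogy concrete, here is a minimal sketch of residue-level tokenization, where each of the 20 standard amino acids is treated as one token. The special tokens and the `encode`/`decode` names are illustrative assumptions, not the tokenizer of any particular model.

```python
# Residue-level tokenizer sketch: one token per amino acid.
# The special tokens and id layout are assumptions for illustration only.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 standard amino acids
SPECIAL_TOKENS = ["<pad>", "<unk>", "<cls>", "<eos>"]

# Special tokens take ids 0-3, amino acids take ids 4-23.
VOCAB = {token: idx for idx, token in enumerate(SPECIAL_TOKENS + list(AMINO_ACIDS))}
ID_TO_TOKEN = {idx: token for token, idx in VOCAB.items()}


def encode(sequence: str) -> list[int]:
    """Map a protein sequence to token ids, wrapped in <cls> ... <eos>."""
    unk_id = VOCAB["<unk>"]
    ids = [VOCAB["<cls>"]]
    # Non-standard residue letters (e.g. X) fall back to <unk>.
    ids.extend(VOCAB.get(residue, unk_id) for residue in sequence.upper())
    ids.append(VOCAB["<eos>"])
    return ids


def decode(ids: list[int]) -> str:
    """Map token ids back to a sequence string, dropping special tokens."""
    return "".join(
        ID_TO_TOKEN[i] for i in ids if ID_TO_TOKEN[i] not in SPECIAL_TOKENS
    )


if __name__ == "__main__":
    ids = encode("MKTAYIAK")
    print(ids)
    print(decode(ids))
```

The resulting integer ids can be fed to an embedding layer in the same way word or subword ids are in an NLP transformer.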

Cross-Modal Translation

Exploring methods to translate between protein sequences and their functional descriptions in human language, enabling better interpretation of biological data.

Creating frameworks that allow researchers to query protein databases using natural language and receive meaningful, interpretable results.
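As a toy illustration of querying protein records in natural language, the sketch below ranks functional descriptions by bag-of-words cosine similarity to the query. The record ids and the `search` helper are hypothetical, and a real system would likely use learned embeddings rather than word counts.

```python
# Toy natural-language search over protein function descriptions.
# Record ids are made up for this example; the scoring is plain
# bag-of-words cosine similarity, chosen only to keep the sketch small.
import math
import re
from collections import Counter

RECORDS = {
    "demo-1": "Hemoglobin subunit: transports oxygen in red blood cells",
    "demo-2": "Insulin: hormone that regulates blood glucose levels",
    "demo-3": "Lysozyme: enzyme that breaks down bacterial cell walls",
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search(query: str, records: dict[str, str] = RECORDS, top_k: int = 1) -> list[str]:
    """Return up to top_k record ids whose descriptions best match the query."""
    query_counts = Counter(tokenize(query))
    scored = [
        (cosine(query_counts, Counter(tokenize(description))), record_id)
        for record_id, description in records.items()
    ]
    scored.sort(reverse=True)
    # Drop records that share no words with the query.
    return [record_id for score, record_id in scored[:top_k] if score > 0]


if __name__ == "__main__":
    print(search("Which protein transports oxygen?"))
```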

Research Objectives

  • Develop protein-specific language models that outperform current sequence alignment methods
  • Create bidirectional translation systems between protein function and natural language descriptions
  • Improve interpretability of protein language models for biological discovery
  • Enable natural language interfaces for protein databases and analysis tools

Featured Projects

ProtLM: Protein Language Model

A transformer-based model trained on millions of protein sequences to predict structure and function.

PyTorch • Transformers • Bioinformatics

BioTranslator

A framework for translating between protein sequences and natural language descriptions of their function.

PyTorch • Hugging Face • NLP

ProteinQA

A question-answering system for protein databases using natural language queries.

Spark • Scala • LLMs

Technical Skills

Core Competencies

Machine Learning
Deep Learning
LLM Training
Data Analysis
Statistical Modeling
Bioinformatics
Natural Language Processing

Technical Tools

Languages

  • Python
  • Scala
  • R

Frameworks

  • PyTorch
  • Transformers
  • Spark

Libraries

  • Hugging Face
  • pandas/NumPy
  • scikit-learn

Tools

  • Git
  • Docker
  • AWS/GCP

Experience & Education

Professional Experience

Data Scientist

Biotech Company • 2021-Present

Developing machine learning models for protein sequence analysis and drug discovery. Leading NLP projects to bridge biological and human language.

Machine Learning Engineer

AI Research Lab • 2019-2021

Implemented transformer-based models for various NLP tasks. Optimized distributed training pipelines using PyTorch and Spark.

Education

PhD in Computational Biology

University of Science • 2021-Present

Researching protein language models and their applications in understanding biological sequences and functions.

MSc in Data Science

Tech Institute • 2018-2020

Specialized in machine learning and big data technologies. Thesis on attention mechanisms in sequence models.

Get In Touch

Interested in collaborating or learning more about my work? Feel free to reach out!

Contact Information

Email

khairi.abidi@example.com

LinkedIn

linkedin.com/in/khairiabidi

GitHub

github.com/khairiabidi

Location

Research Lab, University of Science

Let's Connect