Abdelkareem Elkhateb

Arabic NLP Researcher & AI Engineer

Building efficient AI models, semantic search systems, and production ML pipelines for Arabic and multilingual applications.

Arabic NLP · Embeddings · On-Device ML · Production AI


Areas of Expertise

Arabic NLP

  • Arabic embedding models and sparse autoencoders
  • Efficient transformers (Berthash-Femto: 113× smaller)
  • Tiny Arabic models for edge deployment
  • Arabic ColPali for vision-language retrieval

On-Device & Edge ML

  • LiteRT model conversion and optimization
  • Qualcomm AI Hub for NPU compilation
  • CPU inference for Arabic transformers
  • Benchmarking on real Android devices

Semantic Search & RAG

  • ColGrep semantic code search
  • Arabic embedding-based retrieval
  • Self-hosted AI chat widgets
  • Production-grade Arabic RAG systems

Production AI Systems

  • End-to-end ML pipelines at Xbites (Darin)
  • Developer tools (GPUVec, SEO Rat)
  • Arabic ASR streaming and tool-calling models
  • Bridging research and production

اللغة ليست عِلمًا .. بل هي شيء فوق العلم
لغتنا في خطر داهم .. ونحن أيضًا

“Language is not a science; it is something above science. Our language is in imminent danger, and so are we.”


Research

Arabic Embedding

Zarra & Bojji

Tiny Arabic language models optimized for mobile and edge deployment. → Read more

Vision Language Models

Arabic ColPali

Vision-language model for Arabic document retrieval. Retrieves images and documents from Arabic text queries using the ColPali architecture. → HuggingFace
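
ColPali-style retrieval scores a query against a page with late interaction: each query-token embedding is matched to its most similar page-patch embedding, and those maxima are summed. The sketch below illustrates that scoring step only. It is not the Arabic ColPali code: the toy 2-dimensional vectors, the page names, and the helpers `maxsim_score` and `rank_pages` are made up for illustration, and a real model would produce the embeddings.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: List[Vector], page_patches: List[Vector]) -> float:
    """Late-interaction score: best patch match per query token, summed."""
    return sum(max(dot(q, p) for p in page_patches) for q in query_tokens)


def rank_pages(query_tokens: List[Vector], pages: Dict[str, List[Vector]]) -> List[str]:
    """Return page names ordered from highest to lowest score."""
    scores = {name: maxsim_score(query_tokens, patches) for name, patches in pages.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy embeddings (assumed values, standing in for model output).
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "invoice": [[0.9, 0.1], [0.2, 0.8]],
    "poster": [[0.1, 0.1], [0.3, 0.2]],
}
ranking = rank_pages(query, pages)
```

Because every query token keeps its own embedding, a page can rank highly when different regions of it match different words of the query.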

Speech and Audio

Ara-Nemotron 3.5 ASR

Streaming Arabic speech recognition model for real-time transcription. Supports multiple Arabic dialects. → HuggingFace

Arabic TTS (in progress)

Building a from-scratch Arabic text-to-speech model targeting all major dialects.

Agentic and Tool Use

Gemma3 Arabic Tool Calling

Fine-tuned Gemma 3 for Arabic function calling. A no-RAG approach that teaches the model to call tools directly in Arabic. → HuggingFace
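
On the application side, function calling usually means parsing a structured call emitted by the model and dispatching it to a local function. The sketch below shows that host-side step under stated assumptions: the JSON shape with `name` and `arguments`, the `get_weather` tool, and the `dispatch` helper are all hypothetical and are not taken from the Gemma3 Arabic Tool Calling model card.

```python
import json
from typing import Any, Callable, Dict


def get_weather(city: str) -> str:
    # Hypothetical tool: a real one would query a weather service.
    return f"الطقس في {city}: مشمس"


# Registry mapping tool names to Python callables.
TOOLS: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def dispatch(model_output: str) -> Any:
    """Parse a JSON tool call (assumed format) and run the matching tool."""
    call = json.loads(model_output)
    name = call["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**call.get("arguments", {}))


# Stand-in for text the model might emit for an Arabic request.
model_output = '{"name": "get_weather", "arguments": {"city": "القاهرة"}}'
result = dispatch(model_output)
```

Rejecting unknown tool names up front keeps a malformed or hallucinated call from reaching arbitrary code.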


Tools

GPUVec

GPU pricing comparison and ML benchmarking platform. → Check it

SEO Rat

Open-source SEO analysis tool for static sites using Google Search Console data. → GitHub