Abdelkareem Elkhateb
Arabic NLP Researcher & AI Engineer
Building efficient AI models, semantic search systems, and production ML pipelines for Arabic and multilingual applications.
Areas of Expertise
Arabic NLP
- Arabic embedding models and sparse autoencoders
- Efficient transformers (BertHash-Femto: 113× smaller than AraBERTv2)
- Tiny Arabic models for edge deployment
- Arabic ColPali for vision-language retrieval
On-Device & Edge ML
- LiteRT model conversion and optimization
- Qualcomm AI Hub for NPU compilation
- CPU inference for Arabic transformers
- Benchmarking on real Android devices
Semantic Search & RAG
- ColGrep semantic code search
- Arabic embedding-based retrieval
- Self-hosted AI chat widgets
- Production-grade Arabic RAG systems
Production AI Systems
- End-to-end ML pipelines at Xbites (Darin)
- Developer tools (GPUVec, SEO Rat)
- Arabic ASR streaming and tool-calling models
- Bridging research and production
اللغة ليست عِلمًا .. بل هي شيء فوق العلم
لغتنا في خطر داهم .. ونحن أيضًا
“Language is not a science — it is something above science. Our language is in danger — and so are we.”
Research
Arabic Embedding
BertHash-Femto
113× smaller than AraBERTv2, 94% accuracy. An Arabic transformer that runs on edge devices. → GitHub
Zarra & Bojji
Tiny Arabic language models optimized for mobile and edge deployment. → Read more
Vision Language Models
Qari-OCR
State-of-the-art Arabic OCR using multimodal LLM adaptation. Achieves a WER of 0.160 and a CER of 0.061 on diacritically rich Arabic text. Published on arXiv (2025). → Paper · → GitHub
Arabic ColPali
Vision-language model for Arabic document retrieval. Retrieves images and documents from Arabic text queries using the ColPali architecture. → HuggingFace
Speech and Audio
Ara-Nemotron 3.5 ASR
Streaming Arabic speech recognition model for real-time transcription. Supports multiple Arabic dialects. → HuggingFace
Arabic TTS (in progress)
Building a from-scratch Arabic text-to-speech model targeting all major dialects.
Agentic and Tool Use
Gemma3 Arabic Tool Calling
Fine-tuned Gemma 3 for Arabic function calling. A no-RAG approach that teaches the model to call tools directly in Arabic. → HuggingFace
Tools
GPUVec
GPU pricing comparison and ML benchmarking platform. → Check it
SEO Rat
Open-source SEO analysis tool for static sites using Google Search Console data. → GitHub
Featured Writing
Fine-Tuning Gemma 3 for Arabic Tool Calling
A no-RAG approach to teaching Gemma 3 to call tools in Arabic: extracting keywords, delegating to grep, and building training data with MIRACL.
HyperRun + ColGrep: Self-Hosted Alternative to RunLLM
Add an AI-powered “Ask AI” chat widget to any docs site using ColGrep semantic code search and FastHTML.
View all posts → · Embedding series →
فقالَ: إن تَصدُقِ اللَّهَ يَصدُقْكَ
“He said: If you are truthful with God, He will be truthful with you.”
Qabilah