kareem’s Blog

---
title: Zaraah Embedding models
description: Zaraah model2vec family model analysis and testing on Arabic Embeddings tasks.
date: 2025-05-15
categories:
  - blogging
  - embedding
  - minishlab
  - model2vec
  - arabic
image: images/minishlab.jpg
order: 1
draft: false
featured: true
author: kareem
execute:
  echo: false
jupyter: python3
---

from model2vec

Arabic Embedding Models

This blog post introduces the Zaraah family of static embedding models, designed for Arabic language tasks and built using the model2vec distillation technique from MinishLab.

These models distill knowledge from larger transformer models, such as SBERT, into compact, efficient embeddings.

This approach balances performance with speed and resource efficiency.

Below, I explore the Zaraah models, their relationship to Potion models, their strengths and limitations, and their applications in Arabic embedding tasks.

What are Potion Models?

Potion models combine innovative techniques to create high-performing, compact static embeddings.

I liken them to Bojji from Ousama Ranking: small in size but capable of competing with giants like Jina AI and BGE models.

Key features of Potion models include:

Superior Performance: They outperform traditional static embeddings like GloVe and FastText across various tasks, matching the performance of models like all-MiniLM-L6-v2 in English.
Compact Size: With approximately 2–4 million parameters, they are ~55 times smaller than GloVe, with model sizes ranging from 8 MB to 30 MB.
Efficiency: Designed for CPU execution and browser-based applications, they are ideal for edge devices and low-resource environments.
MTEB Performance: They achieve an average MTEB score above 50%, making them highly competitive for their size.

What is the model2vec Distillation Method?

The model2vec distillation method addresses the challenge of creating fast, compact sentence transformers.

It transforms large sentence transformer models into static embeddings that are up to 500x faster and 15x smaller, with only a minor performance trade-off.

Unlike traditional methods like GloVe, model2vec captures knowledge from large sentence transformers, producing uncontextualized word vectors.

While this sacrifices some contextual nuance, it offers significant advantages in:

Speed: Up to 500x faster inference.
Size: Models reduced by up to 50x, ranging from 8 MB to 30 MB.
Versatility: Sufficient word representations for most NLP applications.

For more details, refer to the MinishLab blog and GitHub repository.
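
To make the idea of uncontextualized vectors concrete, here is a minimal sketch of how a static embedding model can encode a sentence: look up one fixed vector per token and average them. The toy vocabulary, the 3-dimensional vectors, and the `encode` helper are invented for illustration; they are not model2vec's API or weights.

```python
# Toy static embedding table: one fixed vector per token (values are invented).
TOY_VECTORS = {
    "bank": [0.9, 0.1, 0.0],
    "river": [0.1, 0.8, 0.1],
    "money": [0.8, 0.0, 0.2],
}
UNKNOWN = [0.0, 0.0, 0.0]  # hypothetical fallback for out-of-vocabulary tokens


def encode(sentence: str) -> list[float]:
    """Embed a sentence as the mean of its tokens' static vectors."""
    tokens = sentence.lower().split()
    if not tokens:
        return list(UNKNOWN)
    vectors = [TOY_VECTORS.get(tok, UNKNOWN) for tok in tokens]
    dim = len(UNKNOWN)
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


# "bank" contributes the same vector in both sentences, whatever its neighbors are.
print(encode("river bank"))
print(encode("money bank"))
```

Because encoding is only a table lookup and an average, no transformer forward pass is needed at inference time, which is where the speed advantage comes from. The cost is that a word like "bank" gets the same vector in every context.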

Jina Embeddings v3 for Arabic

The jina-embeddings-v3 model is currently the top-performing open-source, zero-shot embedding model for Arabic on the MTEB leaderboard. It excels across various tasks and has been validated in production for Arabic applications.

However, its large size and high memory requirements make it computationally expensive and slow compared to other embedding models. To address this, I used model2vec to create a compact Arabic version, the Zaraah family, which retains strong performance while being significantly smaller and faster.

Zaraah Family

The Zaraah models are the first static embedding models for Arabic trained with tokenlearn on the Arabic subset of the C4 dataset.

They are optimized for Arabic-specific tasks and come in multiple sizes:

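All variants are available in float32 and int8-quantized form with negligible performance loss (in the summary table below, jina_zaraah_256_arabic averages 0.4822 in float32 and 0.4809 in int8), making them highly efficient for resource-constrained environments.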
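
As a rough illustration of why int8 storage costs so little accuracy, the sketch below applies a generic symmetric quantization scheme to a single vector and measures how close the restored vector is to the original. This is an assumption for illustration, not necessarily the scheme used for the Zaraah models; the example vector and all function names are invented.

```python
import math


def quantize_int8(vector: list[float]) -> tuple[list[int], float]:
    """Map floats to integers in [-127, 127] using one scale per vector."""
    max_abs = max(abs(x) for x in vector)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [round(x / scale) for x in vector], scale


def dequantize(codes: list[int], scale: float) -> list[float]:
    """Recover approximate floats from the int8 codes."""
    return [c * scale for c in codes]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


original = [0.12, -0.53, 0.31, 0.07, -0.22]  # invented example vector
codes, scale = quantize_int8(original)
restored = dequantize(codes, scale)
print(codes)
print(f"cosine(original, restored) = {cosine(original, restored):.4f}")
```

Each value shrinks from 4 bytes to 1 byte, while the direction of the vector, which is what cosine similarity depends on, barely changes.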

Zaraah Model vs. Competitors

To evaluate Zaraah’s performance, I compared it against several multilingual and Arabic-specific sentence transformer models using MTEB tasks tailored for Arabic.

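import os

results_dir = "/home/ai/kobo/bert_world/static_embedding/results"

# List every folder under the results directory and show what it contains
for name in sorted(os.listdir(results_dir)):
    path = os.path.join(results_dir, name)
    if os.path.isdir(path):
        print(f"Processing {name}")
        print(os.listdir(path))
        print("=======================================")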

import json
import os
import pandas as pd

dirs = {
    "muffakir_embedding": "/home/ai/kobo/bert_world/static_embedding/results/mohamed2811/Muffakir_Embedding/mohamed2811__Muffakir_Embedding/no_revision_available",
    "get_multilingual_base": "/home/ai/kobo/bert_world/static_embedding/results/Alibaba-NLP/gte-multilingual-base/Alibaba-NLP__gte-multilingual-base/ca1791e0bcc104f6db161f27de1340241b13c5a4",
    "arabic_retrieval_v1.0": "/home/ai/kobo/bert_world/static_embedding/results/omarelshehy/Arabic-Retrieval-v1.0/omarelshehy__Arabic-Retrieval-v1.0/no_revision_available",
    "arabic_sts_matryoshka": "/home/ai/kobo/bert_world/static_embedding/results/omarelshehy/Arabic-STS-Matryoshka/omarelshehy__Arabic-STS-Matryoshka/no_revision_available",
    "arabic_triplet_matryoshka_v2": "/home/ai/kobo/bert_world/static_embedding/results/Omartificial-Intelligence-Space/Arabic-Triplet-Matryoshka-V2/Omartificial-Intelligence-Space__Arabic-Triplet-Matryoshka-V2/ed357f222f0b6ea6670d2c9b5a1cb93950d34200",
    "gate_arabert-v1": "/home/ai/kobo/bert_world/static_embedding/results/Omartificial-Intelligence-Space/GATE-AraBert-v1/Omartificial-Intelligence-Space__GATE-AraBert-v1/no_revision_available",
    "all_minilm_l6_v2": "/home/ai/kobo/bert_world/static_embedding/results/sentence-transformers/all-MiniLM-L6-v2/sentence-transformers__all-MiniLM-L6-v2/8b3219a92973c328a8e22fadcfa821b5dc75636a",
    "paraphrase-multilingual-MiniLM-L12-v2": "/home/ai/kobo/bert_world/static_embedding/results/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2/sentence-transformers__paraphrase-multilingual-MiniLM-L12-v2/bf3bf13ab40c3157080a7ab344c831b9ad18b5eb",
    "Arabic-MiniLM-L12-v2-all-nli-triplet": "/home/ai/kobo/bert_world/static_embedding/results/Omartificial-Intelligence-Space/Arabic-MiniLM-L12-v2-all-nli-triplet/Omartificial-Intelligence-Space__Arabic-MiniLM-L12-v2-all-nli-triplet/6916465c43b984e955aa6dc72851474f0128f428",
    "silma_ai_embedding_sts_v0.1": "/home/ai/kobo/bert_world/static_embedding/results/silma-ai/silma-embedding-sts-v0.1/silma-ai__silma-embedding-sts-v0.1/no_revision_available",
    "jina_zaraah_256_arabic": "/home/ai/kobo/bert_world/static_embedding/results/jina_zaraaah_256_arabic/no_model_name_available/no_revision_available",
    "jina_zaraah_256_arabic_int8": "/home/ai/kobo/bert_world/static_embedding/results/jina_zaraaah_256_arabic_int8/no_model_name_available/no_revision_available",
    "jina_zaraah_32": "/home/ai/kobo/bert_world/static_embedding/results/jina_zaraaah_32_arabic/no_model_name_available/no_revision_available",
    "jina_zaraah_32_int8_arabic": "/home/ai/kobo/bert_world/static_embedding/results/jina_zaraaah_32_arabic_int8/no_model_name_available/no_revision_available",
    "jina_zaraah_64_int8_arabic": "/home/ai/kobo/bert_world/static_embedding/results/jina_zaraaah_64_int8_arabic/no_model_name_available/no_revision_available",
    # "jina_zaraah_64_arabic": "/home/ai/kobo/bert_world/static_embedding/results/jina_zaraaah_64_arabic/no_model_name_available/no_revision_available",
    # "jina_zaraah_16_arabic_int8": "/home/ai/kobo/bert_world/static_embedding/results/jina_zaraaah_16_arabic_int8/no_model_name_available/no_revision_available",
    # "jina_zaraah_16_arabic": "/home/ai/kobo/bert_world/static_embedding/results/jina_zaraaah_16_arabic/no_model_name_available/no_revision_available",
    # "jina_zarrah_4_arabic_int8": "/home/ai/kobo/bert_world/static_embedding/results/jina_zaraaah_4_arabic_int8/no_model_name_available/no_revision_available",
    # "jina_zarrah_4_arabic": "/home/ai/kobo/bert_world/static_embedding/results/jina_zaraaah_4_arabic/no_model_name_available/no_revision_available",
    # "jina_zarrah_8_arabic_int8": "/home/ai/kobo/bert_world/static_embedding/results/jina_zaraaah_8_arabic_int8/no_model_name_available/no_revision_available",
    # "jina_zarrah_8_arabic": "/home/ai/kobo/bert_world/static_embedding/results/jina_zaraaah_8_arabic/no_model_name_available/no_revision_available",
    # "jina_zarrah_2_arabic": "/home/ai/kobo/bert_world/static_embedding/results/jina_zaraaah_2_arabic/no_model_name_available/no_revision_available",
    # "jina_zarrah_2_arabic_int8": "/home/ai/kobo/bert_world/static_embedding/results/jina_zaraaah_2_arabic_int8/no_model_name_available/no_revision_available",
    # "jina_zarrah_256_superbpe": "/home/ai/kobo/bert_world/static_embedding/results/jinaai/jina-embeddings-v3_distilled_superbpe_256/no_model_name_available/no_revision_available",
    "potion-multilingual-128M": "/home/ai/kobo/bert_world/static_embedding/results/minishlab/potion-multilingual-128M/minishlab__potion-multilingual-128M/38ebd7f10f71e67fa8db898290f92b82e9cfff2a",
    # "jina_zarrah_256_superbe_int8":"",
    # "jina_zarrah_384_superbe":"/home/ai/kobo/bert_world/static_embedding/results/zaraah_jinav3_v02_385D/no_model_name_available/no_revision_available"
    }

result_files = [
    "STS17.json",
    "STS22.v2.json",
    "MLQARetrieval.json",
    "MassiveIntentClassification.json",
    "MultiHateClassification.json",
    "MIRACLRetrievalHardNegatives.json",
    "XNLI.json",
]

all_data = []

for model_name, model_path in dirs.items():
    model_results = {"model_name": model_name}
    for file_name in result_files:
        json_path = os.path.join(model_path, file_name)
        task_name = file_name.replace(".json", "")  # Use filename as task identifier

        main_score_val = None
        eval_time_val = None

        if os.path.exists(json_path):
            try:
                with open(json_path, "r", encoding="utf-8") as f:
                    data = json.load(f)

                eval_time_val = data.get("evaluation_time")

                # Extract main_score. It can be under 'test', 'validation', or 'dev'.
                # MTEB usually prioritizes 'test', then 'dev' (for retrieval), then 'validation'.
                scores_section = data.get("scores", {})

                score_entry = None
                if "test" in scores_section and scores_section["test"]:
                    score_entry = scores_section["test"][0]
                elif (
                    "dev" in scores_section and scores_section["dev"]
                ):  # For MIRACL style
                    score_entry = scores_section["dev"][0]
                elif (
                    "validation" in scores_section and scores_section["validation"]
                ):  # For XNLI style if test is missing
                    score_entry = scores_section["validation"][0]

                if score_entry:
                    main_score_val = score_entry.get("main_score")

            except json.JSONDecodeError:
                print(f"Error decoding JSON for: {json_path}")
            except Exception as e:
                print(f"An error occurred while processing {json_path}: {e}")
        else:
            print(f"File not found: {json_path}")  # Helpful for debugging

        model_results[f"{task_name}_main_score"] = main_score_val
        # model_results[f"{task_name}_evaluation_time"] = eval_time_val

    all_data.append(model_results)

# Create DataFrame from the collected data
df = pd.DataFrame(all_data)

# Set model_name as index
df.set_index("model_name", inplace=True)

# Calculate the average of main_score columns
score_columns = [col for col in df.columns if col.endswith("_main_score")]
df["Average_main_score"] = df[score_columns].mean(axis=1)

# Create a MultiIndex for columns for better organization
if not df.empty:
    df.columns = pd.MultiIndex.from_tuples(
        [tuple(col.rsplit("_", 1)) if col != "Average_main_score" else ("Average", "main_score") for col in df.columns],
        names=["Task", "Metric"]
    )
    # Sort columns by Metric first, then Task name, so the Average column comes first
    df = df.sort_index(axis=1, level=[1, 0])

# Sort DataFrame by Average_main_score in descending order
df = df.sort_values(("Average", "main_score"), ascending=False)

# Create a rich table
from rich.console import Console
from rich.table import Table

console = Console()
table = Table(title="Model Evaluation Summary")

# Add columns to the table with abbreviated names
table.add_column("Model", style="cyan", no_wrap=True)
# Create short aliases for task names (first 5 characters or less if shorter)
task_aliases = {task: task[:5] for task, _ in df.columns}
task_aliases["Average"] = "Avg"  # Short alias for Average column
for task, metric in df.columns:
    table.add_column(task_aliases[task], justify="right")

# Add rows to the table
for model_name, row in df.iterrows():
    row_data = [model_name]
    for value in row:
        row_data.append(f"{value:.4f}" if isinstance(value, (int, float)) and not pd.isna(value) else "-")
    table.add_row(*row_data)

# Display the table
console.print(table)

                                            Model Evaluation Summary                                             
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ Model                                 ┃    Avg ┃  MIRAC ┃  MLQAR ┃  Massi ┃  Multi ┃  STS17 ┃  STS22 ┃  XNLI_ ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ arabic_triplet_matryoshka_v2          │ 0.6610 │ 0.6262 │ 0.5093 │ 0.5577 │ 0.5868 │ 0.8531 │ 0.6396 │ 0.8542 │
│ muffakir_embedding                    │ 0.6494 │ 0.6424 │ 0.5267 │ 0.5462 │ 0.5943 │ 0.8485 │ 0.6291 │ 0.7583 │
│ arabic_retrieval_v1.0                 │ 0.6473 │ 0.6159 │ 0.5674 │ 0.5832 │ 0.5993 │ 0.8002 │ 0.6254 │ 0.7393 │
│ gate_arabert-v1                       │ 0.6444 │ 0.5774 │ 0.4808 │ 0.5345 │ 0.5847 │ 0.8278 │ 0.6310 │ 0.8746 │
│ get_multilingual_base                 │ 0.6440 │ 0.7177 │ 0.5698 │ 0.5071 │ 0.5521 │ 0.7881 │ 0.6145 │ 0.7584 │
│ arabic_sts_matryoshka                 │ 0.6413 │ 0.5828 │ 0.4840 │ 0.5457 │ 0.5494 │ 0.8290 │ 0.6242 │ 0.8740 │
│ silma_ai_embedding_sts_v0.1           │ 0.6138 │ 0.3799 │ 0.5011 │ 0.5600 │ 0.5749 │ 0.8559 │ 0.6122 │ 0.8125 │
│ Arabic-MiniLM-L12-v2-all-nli-triplet  │ 0.5431 │ 0.2240 │ 0.3612 │ 0.4775 │ 0.5698 │ 0.8111 │ 0.5540 │ 0.8043 │
│ paraphrase-multilingual-MiniLM-L12-v2 │ 0.5208 │ 0.2191 │ 0.3496 │ 0.4515 │ 0.5573 │ 0.7916 │ 0.4908 │ 0.7859 │
│ jina_zaraah_256_arabic                │ 0.4822 │ 0.2295 │ 0.3473 │ 0.4119 │ 0.5237 │ 0.6469 │ 0.6218 │ 0.5942 │
│ jina_zaraah_256_arabic_int8           │ 0.4809 │ 0.2313 │ 0.3464 │ 0.4121 │ 0.5256 │ 0.6460 │ 0.6113 │ 0.5936 │
│ potion-multilingual-128M              │ 0.4699 │ 0.1658 │ 0.3150 │ 0.4285 │ 0.5338 │ 0.6511 │ 0.5951 │ 0.5999 │
│ jina_zaraah_64_int8_arabic            │ 0.4276 │ 0.1612 │ 0.1998 │ 0.3084 │ 0.5113 │ 0.5999 │ 0.6290 │ 0.5833 │
│ jina_zaraah_32_int8_arabic            │ 0.3892 │ 0.0924 │ 0.0986 │ 0.2186 │ 0.5144 │ 0.5787 │ 0.6323 │ 0.5894 │
│ jina_zaraah_32                        │ 0.3889 │ 0.0934 │ 0.0987 │ 0.2192 │ 0.5112 │ 0.5801 │ 0.6294 │ 0.5902 │
│ all_minilm_l6_v2                      │ 0.2843 │ 0.0005 │ 0.0064 │ 0.1905 │ 0.4934 │ 0.5089 │ 0.2518 │ 0.5384 │
└───────────────────────────────────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┘
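
The score-extraction step in the script above can be isolated into a small function and tried on in-memory data. This is a minimal sketch: the `extract_main_score` name and the sample payloads are hypothetical, shaped only after the keys the script reads (`scores`, a split name, and `main_score`).

```python
def extract_main_score(result: dict):
    """Return main_score from the first available split: test, dev, validation."""
    scores = result.get("scores", {})
    for split in ("test", "dev", "validation"):
        entries = scores.get(split)
        if entries:
            return entries[0].get("main_score")
    return None


# Hypothetical result payloads shaped like the files the script reads.
with_test = {"evaluation_time": 1.2, "scores": {"test": [{"main_score": 0.62}]}}
dev_only = {"scores": {"dev": [{"main_score": 0.23}]}}
empty = {"scores": {}}

print(extract_main_score(with_test))
print(extract_main_score(dev_only))
print(extract_main_score(empty))
```

Returning `None` for a missing score matches the script's behavior, where such cells end up as "-" in the table and are skipped by the pandas mean.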

I selected the MTEB tasks most relevant to Arabic, keeping only those that support Arabic script; the evaluation script is in the references below. The average scores for the jina_zaraah models are low compared to the other models, but don't let the average fool you: an average is sensitive to outliers, so one very low task score drags the final number down.

At first glance, the average of jina_zaraah_256_arabic (0.48) is close to that of the Arabic versions of MiniLM-L12 (0.52 to 0.54), and on sentence similarity for STS22 its score (0.62) is very good for a static embedding model.
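
To see how much a single weak task affects the average, the short check below compares the mean and the median of the jina_zaraah_256_arabic scores, copied from the summary table. Only the choice of the median as a comparison point is my addition.

```python
from statistics import mean, median

# Per-task main scores for jina_zaraah_256_arabic, copied from the summary table.
scores = {
    "MIRACLRetrievalHardNegatives": 0.2295,
    "MLQARetrieval": 0.3473,
    "MassiveIntentClassification": 0.4119,
    "MultiHateClassification": 0.5237,
    "STS17": 0.6469,
    "STS22.v2": 0.6218,
    "XNLI": 0.5942,
}

values = list(scores.values())
print(f"mean:   {mean(values):.4f}")
print(f"median: {median(values):.4f}")

# Dropping the lowest-scoring task shows how much it pulls the mean down.
lowest = min(scores, key=scores.get)
rest = [v for k, v in scores.items() if k != lowest]
print(f"mean without {lowest}: {mean(rest):.4f}")
```

The mean reproduces the 0.4822 from the table, while the median is 0.5237; leaving out the hard-negatives retrieval task alone lifts the mean to about 0.52.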

Understanding MTEB Tasks for Arabic

The Massive Text Embedding Benchmark (MTEB) evaluates embedding models across various tasks. Here’s a breakdown of the tasks used to assess Zaraah:

MIRACLRetrievalHardNegatives: Measures retrieval quality when the candidate pool includes hard negatives (passages that look relevant but are not), which is critical for search and question-answering systems. Zaraah’s lower score here reflects its static embedding nature, which sacrifices some contextual nuance.
MLQARetrieval: Tests retrieval performance on multilingual question-answering datasets, where jina_zaraah_256_arabic (0.35) performs comparably to the MiniLM models (0.35 to 0.36).
STS17 & STS22: Evaluate semantic textual similarity. Zaraah is strongest on STS22, where its scores (0.61 to 0.63) rival those of much larger models (best score 0.64); on STS17 it trails the transformer models (0.65 versus roughly 0.79 to 0.86).
XNLI: Assesses natural language inference, where Zaraah (about 0.59) matches the other static model, potion-multilingual-128M (0.60), but trails most transformer models (0.74 to 0.87).

These tasks highlight Zaraah’s strengths in semantic similarity and efficiency, making it a good fit for applications like chatbots and lightweight search systems.
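
For readers unfamiliar with how a similarity task is scored: STS benchmarks typically compare a model's similarity scores with human ratings using Spearman rank correlation. The sketch below is a plain-Python version of that statistic under this assumption; the gold ratings and predicted similarities are invented, and this is not MTEB's own implementation.

```python
import math


def ranks(values: list[float]) -> list[float]:
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Invented example: human similarity ratings vs. a model's cosine similarities.
gold = [4.8, 1.0, 3.2, 0.5, 2.7]
predicted = [0.91, 0.35, 0.58, 0.40, 0.62]
print(f"Spearman: {spearman(gold, predicted):.3f}")
```

Only the ordering of the pairs matters here, not the absolute similarity values, which helps explain how a compact static model can still score well on a task like STS22.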



You can see the performance on each MTEB task below:

from rich.console import Console
from rich.table import Table
from rich.text import Text
import pandas as pd

# Initialize rich Console
console = Console()

# Filter for score columns only
score_columns = df.columns[df.columns.get_level_values('Metric') == 'score']

# List of model name patterns to bold
bold_names = ['jina_zaraah_32_int8', 'jina_zaraah_256_arabic_int8']

# Iterate over each score column
for col in score_columns:
    # Sort DataFrame by the current column (descending)
    df_sorted = df.sort_values(by=col, ascending=False)
    
    # Create a rich Table
    table = Table(title=f"Sorted by {col[0]} (Score)", title_style="bold magenta", show_lines=True)
    
    # Add columns: Model Name and the specific score column
    table.add_column("Model Name", style="cyan", no_wrap=True)
    table.add_column(f"{col[0]}", justify="right", style="green")
    
    # Add rows: Model Name and the value for the sorted column
    for idx, row in df_sorted.iterrows():
        value = row[col]
        # Color-code based on value
        color_style = "green" if value > 0.6 else "yellow" if value > 0.3 else "red"
        # Highlight the row if the model name contains any bold_names pattern
        is_bold = any(name.lower() in str(idx).lower() for name in bold_names)
        row_style = f"bold {color_style}" if is_bold else color_style
        table.add_row(
            Text(str(idx), style="bold green" if is_bold else "cyan"),
            Text(f"{value:.3f}", style=row_style),
        )
    
    # Print the table
    console.print(table)
    console.print("\n")  # Add spacing between tables

             Sorted by MIRACLRetrievalHardNegatives_main (Score)             
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Model Name                            ┃ MIRACLRetrievalHardNegatives_main ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ get_multilingual_base                 │                             0.718 │
├───────────────────────────────────────┼───────────────────────────────────┤
│ muffakir_embedding                    │                             0.642 │
├───────────────────────────────────────┼───────────────────────────────────┤
│ arabic_triplet_matryoshka_v2          │                             0.626 │
├───────────────────────────────────────┼───────────────────────────────────┤
│ arabic_retrieval_v1.0                 │                             0.616 │
├───────────────────────────────────────┼───────────────────────────────────┤
│ arabic_sts_matryoshka                 │                             0.583 │
├───────────────────────────────────────┼───────────────────────────────────┤
│ gate_arabert-v1                       │                             0.577 │
├───────────────────────────────────────┼───────────────────────────────────┤
│ silma_ai_embedding_sts_v0.1           │                             0.380 │
├───────────────────────────────────────┼───────────────────────────────────┤
│ jina_zaraah_256_arabic_int8           │                             0.231 │
├───────────────────────────────────────┼───────────────────────────────────┤
│ jina_zaraah_256_arabic                │                             0.230 │
├───────────────────────────────────────┼───────────────────────────────────┤
│ Arabic-MiniLM-L12-v2-all-nli-triplet  │                             0.224 │
├───────────────────────────────────────┼───────────────────────────────────┤
│ paraphrase-multilingual-MiniLM-L12-v2 │                             0.219 │
├───────────────────────────────────────┼───────────────────────────────────┤
│ potion-multilingual-128M              │                             0.166 │
├───────────────────────────────────────┼───────────────────────────────────┤
│ jina_zaraah_64_int8_arabic            │                             0.161 │
├───────────────────────────────────────┼───────────────────────────────────┤
│ jina_zaraah_32                        │                             0.093 │
├───────────────────────────────────────┼───────────────────────────────────┤
│ jina_zaraah_32_int8_arabic            │                             0.092 │
├───────────────────────────────────────┼───────────────────────────────────┤
│ all_minilm_l6_v2                      │                             0.001 │
└───────────────────────────────────────┴───────────────────────────────────┘

             Sorted by MLQARetrieval_main (Score)             
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┓
┃ Model Name                            ┃ MLQARetrieval_main ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━┩
│ get_multilingual_base                 │              0.570 │
├───────────────────────────────────────┼────────────────────┤
│ arabic_retrieval_v1.0                 │              0.567 │
├───────────────────────────────────────┼────────────────────┤
│ muffakir_embedding                    │              0.527 │
├───────────────────────────────────────┼────────────────────┤
│ arabic_triplet_matryoshka_v2          │              0.509 │
├───────────────────────────────────────┼────────────────────┤
│ silma_ai_embedding_sts_v0.1           │              0.501 │
├───────────────────────────────────────┼────────────────────┤
│ arabic_sts_matryoshka                 │              0.484 │
├───────────────────────────────────────┼────────────────────┤
│ gate_arabert-v1                       │              0.481 │
├───────────────────────────────────────┼────────────────────┤
│ Arabic-MiniLM-L12-v2-all-nli-triplet  │              0.361 │
├───────────────────────────────────────┼────────────────────┤
│ paraphrase-multilingual-MiniLM-L12-v2 │              0.350 │
├───────────────────────────────────────┼────────────────────┤
│ jina_zaraah_256_arabic                │              0.347 │
├───────────────────────────────────────┼────────────────────┤
│ jina_zaraah_256_arabic_int8           │              0.346 │
├───────────────────────────────────────┼────────────────────┤
│ potion-multilingual-128M              │              0.315 │
├───────────────────────────────────────┼────────────────────┤
│ jina_zaraah_64_int8_arabic            │              0.200 │
├───────────────────────────────────────┼────────────────────┤
│ jina_zaraah_32                        │              0.099 │
├───────────────────────────────────────┼────────────────────┤
│ jina_zaraah_32_int8_arabic            │              0.099 │
├───────────────────────────────────────┼────────────────────┤
│ all_minilm_l6_v2                      │              0.006 │
└───────────────────────────────────────┴────────────────────┘

             Sorted by MassiveIntentClassification_main (Score)             
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Model Name                            ┃ MassiveIntentClassification_main ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ arabic_retrieval_v1.0                 │                            0.583 │
├───────────────────────────────────────┼──────────────────────────────────┤
│ silma_ai_embedding_sts_v0.1           │                            0.560 │
├───────────────────────────────────────┼──────────────────────────────────┤
│ arabic_triplet_matryoshka_v2          │                            0.558 │
├───────────────────────────────────────┼──────────────────────────────────┤
│ muffakir_embedding                    │                            0.546 │
├───────────────────────────────────────┼──────────────────────────────────┤
│ arabic_sts_matryoshka                 │                            0.546 │
├───────────────────────────────────────┼──────────────────────────────────┤
│ gate_arabert-v1                       │                            0.534 │
├───────────────────────────────────────┼──────────────────────────────────┤
│ get_multilingual_base                 │                            0.507 │
├───────────────────────────────────────┼──────────────────────────────────┤
│ Arabic-MiniLM-L12-v2-all-nli-triplet  │                            0.478 │
├───────────────────────────────────────┼──────────────────────────────────┤
│ paraphrase-multilingual-MiniLM-L12-v2 │                            0.451 │
├───────────────────────────────────────┼──────────────────────────────────┤
│ potion-multilingual-128M              │                            0.428 │
├───────────────────────────────────────┼──────────────────────────────────┤
│ jina_zaraah_256_arabic_int8           │                            0.412 │
├───────────────────────────────────────┼──────────────────────────────────┤
│ jina_zaraah_256_arabic                │                            0.412 │
├───────────────────────────────────────┼──────────────────────────────────┤
│ jina_zaraah_64_int8_arabic            │                            0.308 │
├───────────────────────────────────────┼──────────────────────────────────┤
│ jina_zaraah_32                        │                            0.219 │
├───────────────────────────────────────┼──────────────────────────────────┤
│ jina_zaraah_32_int8_arabic            │                            0.219 │
├───────────────────────────────────────┼──────────────────────────────────┤
│ all_minilm_l6_v2                      │                            0.190 │
└───────────────────────────────────────┴──────────────────────────────────┘

             Sorted by MultiHateClassification_main (Score)             
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Model Name                            ┃ MultiHateClassification_main ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ arabic_retrieval_v1.0                 │                        0.599 │
├───────────────────────────────────────┼──────────────────────────────┤
│ muffakir_embedding                    │                        0.594 │
├───────────────────────────────────────┼──────────────────────────────┤
│ arabic_triplet_matryoshka_v2          │                        0.587 │
├───────────────────────────────────────┼──────────────────────────────┤
│ gate_arabert-v1                       │                        0.585 │
├───────────────────────────────────────┼──────────────────────────────┤
│ silma_ai_embedding_sts_v0.1           │                        0.575 │
├───────────────────────────────────────┼──────────────────────────────┤
│ Arabic-MiniLM-L12-v2-all-nli-triplet  │                        0.570 │
├───────────────────────────────────────┼──────────────────────────────┤
│ paraphrase-multilingual-MiniLM-L12-v2 │                        0.557 │
├───────────────────────────────────────┼──────────────────────────────┤
│ get_multilingual_base                 │                        0.552 │
├───────────────────────────────────────┼──────────────────────────────┤
│ arabic_sts_matryoshka                 │                        0.549 │
├───────────────────────────────────────┼──────────────────────────────┤
│ potion-multilingual-128M              │                        0.534 │
├───────────────────────────────────────┼──────────────────────────────┤
│ jina_zaraah_256_arabic_int8           │                        0.526 │
├───────────────────────────────────────┼──────────────────────────────┤
│ jina_zaraah_256_arabic                │                        0.524 │
├───────────────────────────────────────┼──────────────────────────────┤
│ jina_zaraah_32_int8_arabic            │                        0.514 │
├───────────────────────────────────────┼──────────────────────────────┤
│ jina_zaraah_64_int8_arabic            │                        0.511 │
├───────────────────────────────────────┼──────────────────────────────┤
│ jina_zaraah_32                        │                        0.511 │
├───────────────────────────────────────┼──────────────────────────────┤
│ all_minilm_l6_v2                      │                        0.493 │
└───────────────────────────────────────┴──────────────────────────────┘

             Sorted by STS17_main (Score)             
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Model Name                            ┃ STS17_main ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ silma_ai_embedding_sts_v0.1           │      0.856 │
├───────────────────────────────────────┼────────────┤
│ arabic_triplet_matryoshka_v2          │      0.853 │
├───────────────────────────────────────┼────────────┤
│ muffakir_embedding                    │      0.849 │
├───────────────────────────────────────┼────────────┤
│ arabic_sts_matryoshka                 │      0.829 │
├───────────────────────────────────────┼────────────┤
│ gate_arabert-v1                       │      0.828 │
├───────────────────────────────────────┼────────────┤
│ Arabic-MiniLM-L12-v2-all-nli-triplet  │      0.811 │
├───────────────────────────────────────┼────────────┤
│ arabic_retrieval_v1.0                 │      0.800 │
├───────────────────────────────────────┼────────────┤
│ paraphrase-multilingual-MiniLM-L12-v2 │      0.792 │
├───────────────────────────────────────┼────────────┤
│ get_multilingual_base                 │      0.788 │
├───────────────────────────────────────┼────────────┤
│ potion-multilingual-128M              │      0.651 │
├───────────────────────────────────────┼────────────┤
│ jina_zaraah_256_arabic                │      0.647 │
├───────────────────────────────────────┼────────────┤
│ jina_zaraah_256_arabic_int8           │      0.646 │
├───────────────────────────────────────┼────────────┤
│ jina_zaraah_64_int8_arabic            │      0.600 │
├───────────────────────────────────────┼────────────┤
│ jina_zaraah_32                        │      0.580 │
├───────────────────────────────────────┼────────────┤
│ jina_zaraah_32_int8_arabic            │      0.579 │
├───────────────────────────────────────┼────────────┤
│ all_minilm_l6_v2                      │      0.509 │
└───────────────────────────────────────┴────────────┘

             Sorted by STS22.v2_main (Score)             
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Model Name                            ┃ STS22.v2_main ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ arabic_triplet_matryoshka_v2          │         0.640 │
├───────────────────────────────────────┼───────────────┤
│ jina_zaraah_32_int8_arabic            │         0.632 │
├───────────────────────────────────────┼───────────────┤
│ gate_arabert-v1                       │         0.631 │
├───────────────────────────────────────┼───────────────┤
│ jina_zaraah_32                        │         0.629 │
├───────────────────────────────────────┼───────────────┤
│ muffakir_embedding                    │         0.629 │
├───────────────────────────────────────┼───────────────┤
│ jina_zaraah_64_int8_arabic            │         0.629 │
├───────────────────────────────────────┼───────────────┤
│ arabic_retrieval_v1.0                 │         0.625 │
├───────────────────────────────────────┼───────────────┤
│ arabic_sts_matryoshka                 │         0.624 │
├───────────────────────────────────────┼───────────────┤
│ jina_zaraah_256_arabic                │         0.622 │
├───────────────────────────────────────┼───────────────┤
│ get_multilingual_base                 │         0.615 │
├───────────────────────────────────────┼───────────────┤
│ silma_ai_embedding_sts_v0.1           │         0.612 │
├───────────────────────────────────────┼───────────────┤
│ jina_zaraah_256_arabic_int8           │         0.611 │
├───────────────────────────────────────┼───────────────┤
│ potion-multilingual-128M              │         0.595 │
├───────────────────────────────────────┼───────────────┤
│ Arabic-MiniLM-L12-v2-all-nli-triplet  │         0.554 │
├───────────────────────────────────────┼───────────────┤
│ paraphrase-multilingual-MiniLM-L12-v2 │         0.491 │
├───────────────────────────────────────┼───────────────┤
│ all_minilm_l6_v2                      │         0.252 │
└───────────────────────────────────────┴───────────────┘

             Sorted by XNLI_main (Score)             
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Model Name                            ┃ XNLI_main ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ gate_arabert-v1                       │     0.875 │
├───────────────────────────────────────┼───────────┤
│ arabic_sts_matryoshka                 │     0.874 │
├───────────────────────────────────────┼───────────┤
│ arabic_triplet_matryoshka_v2          │     0.854 │
├───────────────────────────────────────┼───────────┤
│ silma_ai_embedding_sts_v0.1           │     0.813 │
├───────────────────────────────────────┼───────────┤
│ Arabic-MiniLM-L12-v2-all-nli-triplet  │     0.804 │
├───────────────────────────────────────┼───────────┤
│ paraphrase-multilingual-MiniLM-L12-v2 │     0.786 │
├───────────────────────────────────────┼───────────┤
│ get_multilingual_base                 │     0.758 │
├───────────────────────────────────────┼───────────┤
│ muffakir_embedding                    │     0.758 │
├───────────────────────────────────────┼───────────┤
│ arabic_retrieval_v1.0                 │     0.739 │
├───────────────────────────────────────┼───────────┤
│ potion-multilingual-128M              │     0.600 │
├───────────────────────────────────────┼───────────┤
│ jina_zaraah_256_arabic                │     0.594 │
├───────────────────────────────────────┼───────────┤
│ jina_zaraah_256_arabic_int8           │     0.594 │
├───────────────────────────────────────┼───────────┤
│ jina_zaraah_32                        │     0.590 │
├───────────────────────────────────────┼───────────┤
│ jina_zaraah_32_int8_arabic            │     0.589 │
├───────────────────────────────────────┼───────────┤
│ jina_zaraah_64_int8_arabic            │     0.583 │
├───────────────────────────────────────┼───────────┤
│ all_minilm_l6_v2                      │     0.538 │
└───────────────────────────────────────┴───────────┘

Zaraah vs. all-MiniLM

The Zaraah models outperform the all-MiniLM family from SBERT in Arabic tasks while being significantly faster and capable of running on CPU. This makes Zaraah an excellent lightweight alternative for applications requiring efficient Arabic embeddings.

In most of the evaluations, the Zaraah models also score on par with or better than potion-multilingual-128M, which suggests that these techniques work for languages other than English.
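To make the comparison concrete, here is a small sketch that computes per-task score gaps from the STS17, STS22.v2 and XNLI tables above. The numbers are copied from those tables; the helper name `score_gaps` and the choice of `jina_zaraah_256_arabic` as the reference model are my own, for illustration only.

```python
# Scores copied from the STS17, STS22.v2 and XNLI tables above.
SCORES = {
    "STS17": {
        "jina_zaraah_256_arabic": 0.647,
        "potion-multilingual-128M": 0.651,
        "all_minilm_l6_v2": 0.509,
    },
    "STS22.v2": {
        "jina_zaraah_256_arabic": 0.622,
        "potion-multilingual-128M": 0.595,
        "all_minilm_l6_v2": 0.252,
    },
    "XNLI": {
        "jina_zaraah_256_arabic": 0.594,
        "potion-multilingual-128M": 0.600,
        "all_minilm_l6_v2": 0.538,
    },
}


def score_gaps(scores, model, baseline):
    """Per-task difference (model minus baseline), rounded to three decimals."""
    return {
        task: round(row[model] - row[baseline], 3)
        for task, row in scores.items()
    }


vs_minilm = score_gaps(SCORES, "jina_zaraah_256_arabic", "all_minilm_l6_v2")
vs_potion = score_gaps(SCORES, "jina_zaraah_256_arabic", "potion-multilingual-128M")

for task in SCORES:
    print(f"{task:9s} vs all_minilm_l6_v2: {vs_minilm[task]:+.3f}   "
          f"vs potion-multilingual-128M: {vs_potion[task]:+.3f}")
```

A positive value means Zaraah scores higher than the baseline on that task. Swapping in another Zaraah variant only requires adding its row values from the tables.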

Arabic RAG Leaderboard

To complement the MTEB evaluations, I tested Zaraah on the Arabic-RAG Leaderboard, which provides a robust benchmark for Arabic-specific tasks. Zaraah ranks 37th out of 45 models with an average score of 36.84. This is impressive, as Zaraah is the smallest model on the leaderboard, which highlights its efficiency and competitive performance in resource-constrained settings.

Speed comparison

import time
import csv
import torch
from model2vec import StaticModel
from sentence_transformers import SentenceTransformer
from rich.console import Console
from rich.table import Table
import numpy as np

# Initialize rich console
console = Console()

# Check for GPU availability
if not torch.cuda.is_available():
    console.print("[red]GPU not available. Falling back to CPU for all models.[/red]")

# Define devices for each model
cpu_device = torch.device("cpu")
gpu_device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Define models with their respective devices
models = {
    "jina_zaraah_256": {"model": StaticModel.from_pretrained("Abdelkareem/zaraah_jina_v3"), "device": cpu_device},
    "jina_zaraah_32": {"model": StaticModel.from_pretrained("Abdelkareem/zaraah_jina_v3_32D"), "device": cpu_device},
    "potion-multilingual-128M": {"model": StaticModel.from_pretrained("minishlab/potion-multilingual-128M"), "device": cpu_device},
    "paraphrase-multilingual-MiniLM-L12-v2": {"model": SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2", device=gpu_device), "device": gpu_device},
    "silma_ai_embedding_sts_v0.1": {"model": SentenceTransformer("silma-ai/silma-embedding-sts-v0.1", device=gpu_device), "device": gpu_device},
    "muffakir_embedding": {"model": SentenceTransformer("mohamed2811/Muffakir_Embedding", device=gpu_device), "device": gpu_device},
    "get_multilingual_base": {"model": SentenceTransformer("Alibaba-NLP/gte-multilingual-base", device=gpu_device, trust_remote_code=True), "device": gpu_device},
    "arabic_retrieval_v1.0": {"model": SentenceTransformer("omarelshehy/Arabic-Retrieval-v1.0", device=gpu_device), "device": gpu_device},
    "arabic_triplet_matryoshka_v2": {"model": SentenceTransformer("Omartificial-Intelligence-Space/Arabic-Triplet-Matryoshka-V2", device=gpu_device), "device": gpu_device},
}

# Dataset: Synthetic multilingual sentences
sentences = (
    ["This is a short sentence."] * 3000 +
    ["هذه جملة قصيرة."] * 3000 +  # Arabic: "This is a short sentence."
    ["Este es un texto largo " + "word " * 100] * 4000  # Long Spanish sentences
)
batch_size = 32

# Prepare results storage
results = []

# Benchmark each model
for name, config in models.items():
    model = config["model"]
    device = config["device"]
    
    if model is None:
        console.print(f"[yellow]Skipping {name}: Model not loaded[/yellow]")
        results.append({"Model": name, "Speed (sentences/second)": "N/A", "Device": str(device)})
        continue

    # For SentenceTransformer models, ensure device is set explicitly
    if isinstance(model, SentenceTransformer):
        model.to(device)
    
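    # perf_counter is a monotonic, high-resolution clock suited to timing code
    start_time = time.perf_counter()
    # SentenceTransformer models were moved to their device above; static models run on CPU
    embeddings = model.encode(sentences, batch_size=batch_size, show_progress_bar=False)
    elapsed_time = time.perf_counter() - start_time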
    speed = len(sentences) / elapsed_time
    results.append({"Model": name, "Speed (sentences/second)": f"{speed:.2f}", "Device": str(device)})
    console.print(f"Completed {name} on {device}: {speed:.2f} sentences/second")

# Save results to CSV
with open("benchmark_results.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["Model", "Speed (sentences/second)", "Device"])
    writer.writeheader()
    writer.writerows(results)

# Display results using rich
table = Table(title="Model Benchmark Results")
table.add_column("Model", style="cyan")
table.add_column("Speed (sentences/second)", style="magenta")
table.add_column("Device", style="green")

for result in results:
    table.add_row(result["Model"], result["Speed (sentences/second)"], result["Device"])

console.print(table)
# console.print("[green]Results saved to benchmark_results.csv[/green]")

Completed jina_zaraah_256 on cpu: 11342.11 sentences/second

Completed jina_zaraah_32 on cpu: 10947.39 sentences/second

Completed potion-multilingual-128M on cpu: 9973.48 sentences/second

Completed paraphrase-multilingual-MiniLM-L12-v2 on cuda: 2120.08 sentences/second

Completed silma_ai_embedding_sts_v0.1 on cuda: 611.62 sentences/second

Completed muffakir_embedding on cuda: 606.21 sentences/second

Completed get_multilingual_base on cuda: 876.47 sentences/second

Completed arabic_retrieval_v1.0 on cuda: 600.66 sentences/second

Completed arabic_triplet_matryoshka_v2 on cuda: 590.78 sentences/second

                           Model Benchmark Results                           
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃ Model                                 ┃ Speed (sentences/second) ┃ Device ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ jina_zaraah_256                       │ 11342.11                 │ cpu    │
│ jina_zaraah_32                        │ 10947.39                 │ cpu    │
│ potion-multilingual-128M              │ 9973.48                  │ cpu    │
│ paraphrase-multilingual-MiniLM-L12-v2 │ 2120.08                  │ cuda   │
│ silma_ai_embedding_sts_v0.1           │ 611.62                   │ cuda   │
│ muffakir_embedding                    │ 606.21                   │ cuda   │
│ get_multilingual_base                 │ 876.47                   │ cuda   │
│ arabic_retrieval_v1.0                 │ 600.66                   │ cuda   │
│ arabic_triplet_matryoshka_v2          │ 590.78                   │ cuda   │
└───────────────────────────────────────┴──────────────────────────┴────────┘
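The numbers above come from a single timed pass per model. As a rough sketch of how the measurement could be made less sensitive to one-off start-up costs, the helper below runs a short warm-up pass and then keeps the best of several timed runs. The function name `sentences_per_second`, its parameters, and the stand-in encoder are all hypothetical; this is not the code that produced the table.

```python
import time
from typing import Callable, Sequence


def sentences_per_second(
    encode: Callable[[Sequence[str]], object],
    sentences: Sequence[str],
    warmup_size: int = 64,
    repeats: int = 3,
) -> float:
    """Return the best throughput observed over `repeats` timed runs."""
    # Warm-up pass: absorbs one-off costs such as lazy initialisation
    # so that they are not counted in the timed runs.
    encode(sentences[:warmup_size])

    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        encode(sentences)
        best = min(best, time.perf_counter() - start)

    # Guard against a zero reading on very fast encoders.
    return len(sentences) / max(best, 1e-9)


# Stand-in encoder so the sketch runs without downloading a model.
def fake_encode(batch):
    return [len(text) for text in batch]


demo_sentences = ["هذه جملة قصيرة."] * 1000
speed = sentences_per_second(fake_encode, demo_sentences)
print(f"{speed:.0f} sentences/second")
```

With a real model, the `encode` argument could be something like `lambda batch: model.encode(batch, batch_size=32, show_progress_bar=False)`, which works for both `StaticModel` and `SentenceTransformer` objects.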

import csv
from huggingface_hub import model_info
from rich.console import Console
from rich.table import Table

# Initialize rich console
console = Console()

# Define models
models = {
    "jina_zaraah_32": "Abdelkareem/zaraah_jina_v3_32D",
    "jina_zaraah_256": "Abdelkareem/zaraah_jina_v3",
    "potion-multilingual-128M": "minishlab/potion-multilingual-128M",
    "paraphrase-multilingual-MiniLM-L12-v2": "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
    "silma_ai_embedding_sts_v0.1": "silma-ai/silma-embedding-sts-v0.1",
    "muffakir_embedding": "mohamed2811/Muffakir_Embedding",
    "arabic_retrieval_v1.0": "omarelshehy/Arabic-Retrieval-v1.0",
    "arabic_triplet_matryoshka_v2": "Omartificial-Intelligence-Space/Arabic-Triplet-Matryoshka-V2",
    "get_multilingual_base": "Alibaba-NLP/gte-multilingual-base",
}


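# Fallback sizes in MB, keyed by repository name, for models without safetensors metadata.
# Left empty here; add entries only for models whose size you have measured.
model2vec_sizes = {}


def get_model_info(model_id):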
    try:
        model_data = model_info(model_id)
        
        if hasattr(model_data, 'safetensors') and model_data.safetensors:
            num_parameters = sum(model_data.safetensors.parameters.get(precision, 0) for precision in model_data.safetensors.parameters)
            num_parameters = round(num_parameters / 1e6, 2)  # Parameters in millions
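            # Note: safetensors.total is a parameter count, not a byte count, so this value is
            # parameters / 2**20. Multiply by the bytes per parameter (4 for float32) for on-disk MB.
            size_mb = model_data.safetensors.total / (1024 ** 2) if model_data.safetensors.total else num_parameters * 4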
        else:
            num_parameters = 0
            size_mb = model2vec_sizes.get(model_id.split('/')[-1], 0)

        return num_parameters, size_mb
    except Exception as e:
        console.print(f"[yellow]Error: Could not fetch model information for {model_id}. {str(e)}[/yellow]")
        return 0, 0

# Fetch model information
def fetch_model_information(model_name):
    try:
        return get_model_info(model_name)
    except Exception as e:
        console.print(f"[red]Error: Could not fetch model information for {model_name}. {str(e)}[/red]")
        return 0, 0

# Collect results
results = []
for name, path in models.items():
    num_parameters, size_mb = fetch_model_information(path)
    results.append({
        "Model": name,
        "Parameters (M)": f"{num_parameters:.2f}" if num_parameters else "N/A",
        "Size (MB)": f"{size_mb:.2f}" if size_mb else "N/A",
    })

# Calculate relative size and "less than largest" factor
max_size = max(float(result["Size (MB)"]) for result in results if result["Size (MB)"] != "N/A") if any(result["Size (MB)"] != "N/A" for result in results) else 1
for result in results:
    size_mb = float(result["Size (MB)"]) if result["Size (MB)"] != "N/A" else 0
    result["Relative to Largest (%)"] = f"{(size_mb / max_size * 100):.2f}" if size_mb else "N/A"
    result["Less than Largest (x)"] = f"{(max_size / size_mb):.2f}" if size_mb else "N/A"

# Save results to CSV
try:
    with open("model_info_results.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["Model", "Parameters (M)", "Size (MB)", "Relative to Largest (%)", "Less than Largest (x)"])
        writer.writeheader()
        writer.writerows(results)
    console.print("[green]Results saved to model_info_results.csv[/green]")
except IOError as e:
    console.print(f"[red]Failed to save CSV: {e}[/red]")

# Display results using rich
table = Table(title="Model Information Results")
table.add_column("Model", style="cyan")
table.add_column("Parameters (M)", style="yellow")
table.add_column("Size (MB)", style="green")
table.add_column("Relative to Largest (%)", style="magenta")
table.add_column("Less than Largest (x)", style="blue")

for result in results:
    table.add_row(
        result["Model"],
        result["Parameters (M)"],
        result["Size (MB)"],
        result["Relative to Largest (%)"],
        result["Less than Largest (x)"]
    )

console.print(table)

Results saved to model_info_results.csv

                                             Model Information Results                                             
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Model                            ┃ Parameters (M) ┃ Size (MB) ┃ Relative to Largest (%) ┃ Less than Largest (x) ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━┩
│ jina_zaraah_32                   │ 8.00           │ 7.63      │ 2.62                    │ 38.17                 │
│ jina_zaraah_256                  │ 64.00          │ 61.03     │ 20.96                   │ 4.77                  │
│ potion-multilingual-128M         │ 128.09         │ 122.16    │ 41.95                   │ 2.38                  │
│ paraphrase-multilingual-MiniLM-… │ 117.65         │ 112.20    │ 38.53                   │ 2.60                  │
│ silma_ai_embedding_sts_v0.1      │ 135.19         │ 128.93    │ 44.27                   │ 2.26                  │
│ muffakir_embedding               │ 135.19         │ 128.93    │ 44.27                   │ 2.26                  │
│ arabic_retrieval_v1.0            │ 135.19         │ 128.93    │ 44.27                   │ 2.26                  │
│ arabic_triplet_matryoshka_v2     │ 135.19         │ 128.93    │ 44.27                   │ 2.26                  │
│ get_multilingual_base            │ 305.37         │ 291.22    │ 100.00                  │ 1.00                  │
└──────────────────────────────────┴────────────────┴───────────┴─────────────────────────┴───────────────────────┘
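One caveat when reading the Size (MB) column: as far as I can tell from the numbers (8.00 M parameters giving 7.63), it is the parameter count divided by 2^20 rather than a size in bytes. The sketch below estimates storage size from per-dtype parameter counts, in the shape of the `safetensors.parameters` dictionary used above. The function name `estimate_size_mb`, the bytes-per-dtype mapping, and the float32 example are my assumptions, not values read from the model repositories.

```python
# Bytes per parameter for common safetensors dtype labels (assumed mapping).
BYTES_PER_DTYPE = {
    "F64": 8, "F32": 4, "F16": 2, "BF16": 2,
    "I64": 8, "I32": 4, "I16": 2, "I8": 1, "U8": 1,
}


def estimate_size_mb(parameters_by_dtype):
    """Estimate weight storage in MiB from a {dtype_label: parameter_count} dict."""
    total_bytes = 0
    for dtype, count in parameters_by_dtype.items():
        if dtype not in BYTES_PER_DTYPE:
            raise ValueError(f"Unknown dtype label: {dtype}")
        total_bytes += count * BYTES_PER_DTYPE[dtype]
    return total_bytes / (1024 ** 2)


# Hypothetical example: 8 million parameters stored as float32.
example_mb = estimate_size_mb({"F32": 8_000_000})
print(f"{example_mb:.2f} MB")
```

Because every model in the table is scaled the same way, the two relative columns stay valid as long as all models store their weights in the same dtype.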

What’s Next for Zaraah?

This is just the start. Beyond these initial tests, there is more to explore in the base models, the datasets, and the new features from MinishLab, which aim to narrow the gap between model2vec and sentence-transformers.