Retrieval Overview

What do you need?

You will need an environment for running Okareo. TypeScript and Python SDKs are both available; see the SDK sections for how to set up each.

Cookbook examples for this guide are available.

Tip: If you prefer the command line, review the Okareo CLI documentation for more information.

Step 1: Install Okareo

Install the Okareo Python SDK and have your Okareo API key ready.

pip install okareo

Step 2: Register your embedding model and vector DB

Register a Model: Okareo supports any embedding model and vector DB through the CustomModel class. It also has built-in support for Cohere embedding models and for Pinecone and Qdrant vector DBs.


When the retrieval model is queried, the input is first embedded by the embedding model. See below for how an embedding model can be set up.

To find a model ID, see Cohere's list of embedding models. For example, the model ID could be embed-english-v3.0.

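import os

from okareo import Okareo
from okareo.model_under_test import CohereModel

# Create the Okareo client (assumes your API key is stored in the OKAREO_API_KEY environment variable)
okareo = Okareo(os.environ["OKAREO_API_KEY"])

model_under_test = okareo.register_model(
    name="Cohere embedding Model",
    model=[
        CohereModel(
            model_type="embed",
            # Add your model id here, e.g. "embed-english-v3.0"
            model_id="your-model-here",
        ),
        # Placeholder: replace with your vector DB model (for example, QdrantDB)
        YourVectorDB(),
    ],
)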

Next, the embedding is sent to the vector DB, which returns the documents with the highest similarity scores. See below for the various ways this can be achieved.

We have built-in support for connecting to a hosted Qdrant instance.

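import os

from okareo import Okareo
from okareo.model_under_test import QdrantDB

# Create the Okareo client (assumes your API key is stored in the OKAREO_API_KEY environment variable)
okareo = Okareo(os.environ["OKAREO_API_KEY"])

model_under_test = okareo.register_model(
    name="Your retrieval model",
    model=[
        # Placeholder: replace with your embedding model (for example, CohereModel)
        YourEmbeddingModel(),
        QdrantDB(
            # Your Qdrant instance url
            url="...qdrant.io:port",
            # Name of the collection within your Qdrant instance
            collection_name="your collection name",
            # How many top results should be returned from the vector search
            top_k=10,
        ),
    ],
)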
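Conceptually, the vector DB step ranks stored document embeddings by their similarity to the query embedding and returns the top_k best matches. The sketch below illustrates that ranking in plain Python, assuming cosine similarity and tiny hand-written vectors; the names `cosine_similarity` and `top_k` and the toy data are hypothetical. It is not how Qdrant works internally: Qdrant uses approximate nearest-neighbor indexes, and a collection may be configured with a different distance such as dot product or Euclidean.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int) -> list[tuple[str, float]]:
    """Return the k (doc_id, score) pairs with the highest similarity, best first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 3-dimensional "embeddings", for illustration only
docs = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.7, 0.7, 0.0],
    "doc-c": [0.0, 0.0, 1.0],
}
query = [1.0, 0.1, 0.0]
print(top_k(query, docs, k=2))
```

The returned IDs are what a retrieval scenario's expected results are compared against, which is why the scenario must use the same document IDs as the vector DB.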

Step 3: Create a Scenario to Evaluate

For a retrieval scenario, the input is the text to be embedded, and the result is the list of IDs of the relevant documents, as stored in the vector database.

The ScenarioType.SEED generation type asks Okareo not to use a synthetic data generator. If you'd like to use our synthetic data generators to create more scenarios, learn more about scenario generation.

import os

from okareo import Okareo
from okareo_api_client.models import ScenarioSetCreate, SeedData, ScenarioType

# Create the Okareo client (assumes your API key is stored in the OKAREO_API_KEY environment variable)
okareo = Okareo(os.environ["OKAREO_API_KEY"])

scenario_set_create = ScenarioSetCreate(
    name="Your Scenario set name",
    generation_type=ScenarioType.SEED,
    number_examples=1,
    seed_data=[
        SeedData(
            input_="<add your input here>",
            # Result should be the IDs of the correct documents to retrieve
            result=['a9234-da23...', 'c9d24-da23...'],
        ),
        # add the rest of your scenario ...
    ],
)

scenario = okareo.create_scenario_set(scenario_set_create)
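If your labeled queries already live in a dictionary of query text to relevant document IDs, a small helper can turn them into rows with the same fields as SeedData (`input_` and `result`) and catch IDs that do not exist in your vector DB, since such documents could never be retrieved. This is a minimal sketch; `build_seed_rows`, the corpus IDs, and the labels are hypothetical, and wrapping each row as `SeedData(**row)` is an assumption based on the keyword arguments used above.

```python
def build_seed_rows(labels: dict[str, list[str]], corpus_ids: set[str]) -> list[dict]:
    """Turn {query text: [relevant doc IDs]} into rows shaped like SeedData's fields.

    Raises ValueError if a label references an ID that is not in the corpus,
    because that document could never be returned by the vector DB.
    """
    rows = []
    for query, doc_ids in labels.items():
        unknown = [doc_id for doc_id in doc_ids if doc_id not in corpus_ids]
        if unknown:
            raise ValueError(f"{query!r} references IDs not in the corpus: {unknown}")
        rows.append({"input_": query, "result": list(doc_ids)})
    return rows


# Hypothetical corpus IDs and labels, for illustration only
corpus_ids = {"doc-1", "doc-2", "doc-3"}
labels = {
    "How do I reset my password?": ["doc-2"],
    "Which plans include SSO?": ["doc-1", "doc-3"],
}
rows = build_seed_rows(labels, corpus_ids)
print(rows)
# Each row could then be wrapped as SeedData(**row) inside seed_data.
```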

Step 4: Evaluate the Scenario

Now that you have a model and scenario, you can run an evaluation and assess the performance of your retrieval.


The metrics for a retrieval evaluation are configured through the metrics_kwargs parameter. It is a dictionary where each key is the name of a metric and each value is a list of the k values the metric should be calculated at. A description of the metrics can be found here.

from okareo_api_client.models.test_run_type import TestRunType

# Indicate which values of k the metrics should be calculated for
at_k_intervals = [1, 2, 3, 5, 10, 20]

# Placeholders: replace with the API keys for your embedding model and vector DB
YOUR_EMBEDDING_MODEL_API_KEY = "<your embedding model API key>"
YOUR_VECTOR_DB_API_KEY = "<your vector DB API key>"

# model_under_test is the object returned by okareo.register_model(...) in Step 2
evaluation = model_under_test.run_test(
    name="Your test run name",
    test_run_type=TestRunType.INFORMATION_RETRIEVAL,
    calculate_metrics=True,
    # Use the scenario you have created in the previous step
    scenario=scenario,
    # If you are using a custom model, this parameter is not needed
    api_key=[YOUR_EMBEDDING_MODEL_API_KEY, YOUR_VECTOR_DB_API_KEY],
    # Metrics that will be calculated for this test run
    metrics_kwargs={
        "accuracy_at_k": at_k_intervals,
        "precision_recall_at_k": at_k_intervals,
        "ndcg_at_k": at_k_intervals,
        "mrr_at_k": at_k_intervals,
        "map_at_k": at_k_intervals,
    },
)

print(f"View the results in Okareo: {evaluation.app_link}")
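To build intuition for what each entry in metrics_kwargs measures, the sketch below computes common textbook definitions of these metrics for a single query with binary relevance. It is an illustration, not Okareo's implementation: the function name and example IDs are hypothetical, conventions vary between libraries (for example, the denominator of average precision@k), and MRR@k and MAP@k are the means of the per-query reciprocal rank and average precision over all scenario rows.

```python
import math


def retrieval_metrics_at_k(retrieved: list[str], relevant: set[str], k: int) -> dict[str, float]:
    """Per-query retrieval metrics at cutoff k, treating relevance as binary."""
    hits = [1 if doc_id in relevant else 0 for doc_id in retrieved[:k]]
    num_hits = sum(hits)

    # Accuracy@k: 1 if at least one relevant document appears in the top k
    accuracy = 1.0 if num_hits > 0 else 0.0
    precision = num_hits / k
    recall = num_hits / len(relevant) if relevant else 0.0

    # Reciprocal rank of the first relevant document (0 if none in the top k)
    reciprocal_rank = next((1.0 / (i + 1) for i, h in enumerate(hits) if h), 0.0)

    # NDCG@k: discounted gain of this ranking divided by that of an ideal ranking
    dcg = sum(h / math.log2(i + 2) for i, h in enumerate(hits))
    idcg = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    ndcg = dcg / idcg if idcg > 0 else 0.0

    # Average precision@k: precision at each rank holding a relevant document,
    # normalized by min(|relevant|, k) (one common convention among several)
    precisions, found = [], 0
    for i, h in enumerate(hits):
        if h:
            found += 1
            precisions.append(found / (i + 1))
    average_precision = sum(precisions) / min(len(relevant), k) if relevant else 0.0

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "reciprocal_rank": reciprocal_rank,
        "ndcg": ndcg,
        "average_precision": average_precision,
    }


# Hypothetical ranking returned for one query, and its labeled relevant documents
retrieved = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}
for k in [1, 2, 3]:
    print(k, retrieval_metrics_at_k(retrieved, relevant, k))
```

Comparing the values across k shows why several cutoffs are useful: a relevant document ranked second is missed entirely at k=1 but counted at k=2 and above.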

Step 5: Review Results

Results: To view the evaluation results, navigate to your latest evaluation, either in app.okareo.com or directly through the link printed by the example.