# Scenarios in Okareo: An Introduction




## 🎯 Goals

After using this notebook, you will be able to:
- Upload your test cases to Okareo as a Seed scenario by:
   1. Uploading a file
   2. Defining input data statically
- Generate new synthetic test cases using Scenario generators
- Chain generators together to make more complex test cases

First, import the Okareo library and use your [API key](https://okareo.com/docs/using-okareo/api_key) to authenticate.

In [2]:
import os
from okareo import Okareo

OKAREO_API_KEY = os.environ["OKAREO_API_KEY"]
okareo = Okareo(OKAREO_API_KEY)

## Uploading a Seed Scenario

Here we use an existing `.jsonl` file to create a seed scenario with the `upload_scenario_set` method. The file contains short articles about a fictitious company called "WebBizz."

In [3]:
import os
import requests

file_path_articles = "webbizz_10_articles.jsonl"
scenario_name_articles = "WebBizz Articles"

def load_or_download_file(file_path, scenario_name):
    try:
        # try to upload the local file to Okareo
        source_scenario = okareo.upload_scenario_set(file_path=file_path, scenario_name=scenario_name)
    except Exception:
        print(f"- Loading file {file_path} to Okareo failed. Temporarily downloading the file from GitHub...")

        # the file is most likely missing locally, so download it
        file_url = f"https://raw.githubusercontent.com/okareo-ai/okareo-python-sdk/main/examples/{file_path}"
        response = requests.get(file_url, timeout=30)
        response.raise_for_status()  # fail loudly instead of writing an error page to disk
        with open(file_path, "wb") as f:
            f.write(response.content)

        # upload the downloaded file to Okareo
        source_scenario = okareo.upload_scenario_set(file_path=file_path, scenario_name=scenario_name)

        # delete the temporary download
        os.remove(file_path)
    return source_scenario

source_scenario = load_or_download_file(file_path_articles, scenario_name_articles)
print(f"{scenario_name_articles}: {source_scenario.app_link}")

WebBizz Articles: https://app.okareo.com/project/21b8a05b-f5b6-4578-a36a-fc264036d9d3/scenario/3a904bd0-3dfa-42e5-a10a-d42135b9077e
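If you want to upload your own seed file instead of the WebBizz articles, the sketch below writes one in JSON Lines form (one JSON object per line). It assumes each row uses `input` and `result` keys, mirroring the `input_`/`result` fields of `SeedData` used later in this notebook; the file name and row values are hypothetical, so check the format against the Okareo docs before relying on it.

```python
import json

# Hypothetical seed rows. The "input"/"result" key names are an assumption,
# chosen to mirror the SeedData fields (input_, result) used later on.
rows = [
    {"input": "WebBizz offers 24/7 customer support.", "result": "article-1"},
    {"input": "WebBizz uses two-factor authentication at checkout.", "result": "article-2"},
]

def write_jsonl(path, records):
    """Write one JSON object per line, which is what the .jsonl format expects."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")

def read_jsonl(path):
    """Parse each non-empty line independently, as a JSONL reader would."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

write_jsonl("my_seed_scenario.jsonl", rows)
print(read_jsonl("my_seed_scenario.jsonl")[1]["result"])  # article-2
```

If the format matches, the file could then be passed to `okareo.upload_scenario_set(file_path="my_seed_scenario.jsonl", scenario_name=...)` in the same way as the WebBizz file.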


With a seed scenario defined, let's use Okareo to generate some synthetic scenarios.

## Rephrasing generator

Suppose we want to generate a scenario in which the same `result`s should be retrieved even when the `input`s are worded slightly differently. We can achieve this with the Rephrasing generator, which attempts to reword each sentence in a given `input`.

In [4]:
from okareo_api_client.models import ScenarioType

generated_scenario = okareo.generate_scenarios(
 source_scenario=source_scenario.scenario_id,
 name="Retrieval Articles Scenario: Rephrased",
 number_examples=1, # number of examples to generate per row in seed scenario
 generation_type=ScenarioType.REPHRASE_INVARIANT
)

print(generated_scenario)
print(generated_scenario.app_link)

ScenarioSetResponse(scenario_id='4972d1d2-d33d-4512-b16a-de997c780b25', project_id='21b8a05b-f5b6-4578-a36a-fc264036d9d3', time_created=datetime.datetime(2024, 3, 8, 21, 20, 24, 90865), type='REPHRASE_INVARIANT', tags=['seed:3a904bd0-3dfa-42e5-a10a-d42135b9077e'], name='Retrieval Articles Scenario: Rephrased', seed_data=[], scenario_count=0, scenario_input=[], app_link='https://app.okareo.com/project/21b8a05b-f5b6-4578-a36a-fc264036d9d3/scenario/4972d1d2-d33d-4512-b16a-de997c780b25', additional_properties={})
https://app.okareo.com/project/21b8a05b-f5b6-4578-a36a-fc264036d9d3/scenario/4972d1d2-d33d-4512-b16a-de997c780b25


Now let's compare the generated scenarios to their corresponding seed scenarios.

In [8]:
# helper methods to compare the seed/generated articles
def make_points_to_dict(points):
    """
    Group scenario data points by `result`, mapping each result to the list
    of inputs that share it.

    note: using the `result`s as keys will not generalize to every scenario,
    but it works for the scenarios in this notebook.
    """
    d = {}
    for p in points:
        # some results are stored as lists; use the first element as the key
        res = p.result if isinstance(p.result, str) else p.result[0]
        d.setdefault(res, []).append(p.input_)
    return d

def compare_seed_and_generated(seed_scenario, generated_scenario):
    gen_points = okareo.get_scenario_data_points(generated_scenario.scenario_id)
    seed_points = okareo.get_scenario_data_points(seed_scenario.scenario_id)
    gen_d = make_points_to_dict(gen_points)
    seed_d = make_points_to_dict(seed_points)
    for i, key in enumerate(seed_d.keys()):
        print("-"*8 + f"Seed #{i}" + "-"*8)
        print(seed_d[key][0])
        # a seed row may have no generated rows, so default to an empty list
        for j, gen_input in enumerate(gen_d.get(key, [])):
            print("-"*5 + f"Generated #{j}" + "-"*6)
            print(gen_input)
        print("-"*4 + f"End of Seed #{i}" + "-"*5)

In [9]:
compare_seed_and_generated(source_scenario, generated_scenario)

--------Seed #0--------
WebBizz is dedicated to providing our customers with a seamless online shopping experience. Our platform is designed with user-friendly interfaces to help you browse and select the best products suitable for your needs. We offer a wide range of products from top brands and new entrants, ensuring diversity and quality in our offerings. Our 24/7 customer support is ready to assist you with any queries, from product details, shipping timelines, to payment methods. We also have a dedicated FAQ section addressing common concerns. Always ensure you are logged in to enjoy personalized product recommendations and faster checkout processes.
-----Generated #0------
WebBizz prioritizes a smooth digital shopping journey for our customers. Our platform is tailored with straightforward interfaces for easier product browsing and selection. We present a broad selection of items from well-recognized brands and emerging ones, keeping diversity and high standards. We have round-th

## Term Relevance generator

Suppose we want to generate a scenario whose `input`s are keywords drawn from the original `input`s. The Term Relevance generator does this by extracting the most relevant terms according to [term frequency-inverse document frequency (tf-idf)](https://en.wikipedia.org/wiki/Tf%E2%80%93idf).
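To make the tf-idf idea concrete, here is a minimal keyword-ranking sketch in plain Python. It is not Okareo's implementation: the tokenizer, the small `STOPWORDS` set, and the `top_terms` helper are assumptions made for illustration. Each term is scored as (count in the document / document length) multiplied by log(number of documents / number of documents containing the term).

```python
import math
import re
from collections import Counter

# A tiny illustrative stopword list (an assumption of this sketch, not Okareo's).
STOPWORDS = {"a", "an", "and", "from", "in", "of", "on", "the", "to", "with", "your"}

def tokenize(text):
    """Lowercase, split on non-word characters, and drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9']+", text.lower()) if t not in STOPWORDS]

def top_terms(docs, doc_index, k=3):
    """Return the k terms with the highest tf-idf score in docs[doc_index]."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # document frequency: the number of documents each term appears in
    df = Counter(term for tokens in tokenized for term in set(tokens))
    counts = Counter(tokenized[doc_index])
    total = len(tokenized[doc_index])
    # tf = count / document length; idf = log(N / df)
    scores = {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
    # highest score first; ties broken alphabetically so the output is deterministic
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]

docs = [
    "WebBizz offers products from top brands and new brands.",
    "WebBizz protects your data with encryption.",
    "WebBizz offers free shipping on many products.",
]
print(top_terms(docs, 0))  # ['brands', 'new', 'top']
```

Because "webbizz" appears in every document, its idf is log(1) = 0, so a name shared by all articles never ranks as a keyword, while "brands" wins by appearing twice in only one document.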

In [10]:
generated_scenario_tr = okareo.generate_scenarios(
 source_scenario=source_scenario.scenario_id,
 name="Retrieval Articles Scenario: Term Relevance",
 number_examples=1, # number of examples to generate per seed data point
 generation_type=ScenarioType.TERM_RELEVANCE_INVARIANT
)

print(generated_scenario_tr.app_link)

https://app.okareo.com/project/21b8a05b-f5b6-4578-a36a-fc264036d9d3/scenario/6a500f0f-9483-47dd-a9a6-99bfe0fd77fd


In [11]:
compare_seed_and_generated(source_scenario, generated_scenario_tr)

--------Seed #0--------
WebBizz is dedicated to providing our customers with a seamless online shopping experience. Our platform is designed with user-friendly interfaces to help you browse and select the best products suitable for your needs. We offer a wide range of products from top brands and new entrants, ensuring diversity and quality in our offerings. Our 24/7 customer support is ready to assist you with any queries, from product details, shipping timelines, to payment methods. We also have a dedicated FAQ section addressing common concerns. Always ensure you are logged in to enjoy personalized product recommendations and faster checkout processes.
-----Generated #0------
dedicated products product
----End of Seed #0-----
--------Seed #1--------
Safety and security of your data is our top priority at WebBizz. Our platform employs state-of-the-art encryption methods ensuring your personal and financial information remains confidential. Our two-factor authentication at checkout pr

## Seed Scenario Creation via Static Definition 

In addition to uploading a list of JSON objects from a `.jsonl` file, we can also define a scenario statically by explicitly creating `SeedData` objects. The following cell shows the required imports and arguments.

In [17]:
from okareo_api_client.models import ScenarioSetCreate, SeedData

# list of statically defined seed data
seed_data=[
 SeedData(input_="The quick brown fox jumps over the lazy dog", result="result1"),
 SeedData(input_="The rain in Spain falls mainly on the plain", result="result2"),
 SeedData(input_="Lorem ipsum dolor sit amet, consectetur adipiscing elit", result="result3")
]

# request for scenario set creation 
scenario_set_create = ScenarioSetCreate(
 name="Statically Defined Scenario: Seed",
 generation_type=ScenarioType.SEED,
 number_examples=len(seed_data), # for a SEED scenario, set to the number of seed rows
 seed_data=seed_data
)

source_scenario_static = okareo.create_scenario_set(scenario_set_create)

print(source_scenario_static.app_link)

https://app.okareo.com/project/21b8a05b-f5b6-4578-a36a-fc264036d9d3/scenario/64056783-d6f9-471e-b978-dfb25a6baa35


## Misspelling generator

Using this statically defined scenario, suppose we want to introduce errors into the `input`s. The Misspelling generator does this by emulating human typing errors, i.e., by replacing random characters in the input with adjacent keys on a typical QWERTY keyboard.
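For intuition, here is a minimal sketch of the adjacent-key idea. The `QWERTY_NEIGHBORS` map and the `add_typo` helper are hypothetical and only approximate what the Okareo generator does; its actual error model may differ.

```python
import random

# Adjacency map for the 26 lowercase letters on a US QWERTY layout
# (an assumption of this sketch; digits and punctuation are left untouched).
QWERTY_NEIGHBORS = {
    "q": "wa", "w": "qeas", "e": "wrds", "r": "etdf", "t": "ryfg", "y": "tugh",
    "u": "yihj", "i": "uojk", "o": "ipkl", "p": "ol",
    "a": "qwsz", "s": "awedxz", "d": "serfcx", "f": "drtgvc", "g": "ftyhbv",
    "h": "gyujnb", "j": "huikmn", "k": "jiolm", "l": "kop",
    "z": "asx", "x": "zsdc", "c": "xdfv", "v": "cfgb", "b": "vghn", "n": "bhjm", "m": "njk",
}

def add_typo(text, rng):
    """Replace one random letter with a key adjacent to it on a QWERTY keyboard."""
    positions = [i for i, ch in enumerate(text) if ch.lower() in QWERTY_NEIGHBORS]
    if not positions:
        return text  # nothing to corrupt
    i = rng.choice(positions)
    original = text[i]
    typo = rng.choice(QWERTY_NEIGHBORS[original.lower()])
    # keep the original capitalization, e.g. "T" can become "R"
    if original.isupper():
        typo = typo.upper()
    return text[:i] + typo + text[i + 1:]

rng = random.Random(0)  # fixed seed so the typos are reproducible
for _ in range(2):
    print(add_typo("The quick brown fox jumps over the lazy dog", rng))
```

Passing a seeded `random.Random` instance, rather than using the module-level functions, keeps each run reproducible without affecting other random state in the notebook.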

In [13]:
generated_scenario_mis = okareo.generate_scenarios(
 source_scenario=source_scenario_static.scenario_id,
 name="Statically Defined Scenario: Misspelling",
 number_examples=2, # number of examples to generate per seed data point
 generation_type=ScenarioType.COMMON_MISSPELLINGS
)

print(generated_scenario_mis)
print(generated_scenario_mis.app_link)

ScenarioSetResponse(scenario_id='bb29a3b3-d69d-46ac-9693-ae7fa99c444d', project_id='21b8a05b-f5b6-4578-a36a-fc264036d9d3', time_created=datetime.datetime(2024, 3, 8, 21, 23, 27, 187544), type='COMMON_MISSPELLINGS', tags=['seed:0c29d21a-e91c-4950-b6e9-818a0635c552'], name='Statically Defined Scenario: Misspelling', seed_data=[], scenario_count=0, scenario_input=[], app_link='https://app.okareo.com/project/21b8a05b-f5b6-4578-a36a-fc264036d9d3/scenario/bb29a3b3-d69d-46ac-9693-ae7fa99c444d', additional_properties={})
https://app.okareo.com/project/21b8a05b-f5b6-4578-a36a-fc264036d9d3/scenario/bb29a3b3-d69d-46ac-9693-ae7fa99c444d


In [14]:
compare_seed_and_generated(source_scenario_static, generated_scenario_mis)

--------Seed #0--------
The quick brown fox jumps over the lazy dog
-----Generated #0------
The quick brown fox jumps over the lazt dog
-----Generated #1------
The quick brown fox humps over the lazy dog
----End of Seed #0-----
--------Seed #1--------
The rain in Spain falls mainly on the plain
-----Generated #0------
Rhe rain in Spain falls mainly on the plain
-----Generated #1------
The rain in Spain falls mainly on tge plain
----End of Seed #1-----
--------Seed #2--------
Lorem ipsum dolor sit amet, consectetur adipiscing elit
-----Generated #0------
Lorem ipsum dolor sit amet, consrctetur adipiscing elit
-----Generated #1------
Loeem ipsum dolor sit amet, consectetur adipiscing elit
----End of Seed #2-----


## Contraction generator

Contractions and abbreviations are common in human-written text. The Contraction generator produces a scenario in which strings in the `input`s are replaced with common abbreviations of those strings.
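The sample outputs later in this section (for example, "brown" becomes "brwn" and "consectetur" becomes "consctetur") look like vowel-dropping abbreviations. The sketch below imitates only that one pattern; `drop_one_vowel` is a hypothetical helper, not Okareo's implementation, and the real generator may produce other kinds of contractions as well.

```python
import random
import re

VOWELS = set("aeiouAEIOU")

def drop_one_vowel(text, rng, min_len=4):
    """Delete one randomly chosen interior vowel from a word of at least min_len letters."""
    words = list(re.finditer(r"[A-Za-z]+", text))
    # absolute positions of vowels that are neither the first nor the last letter of a word
    candidates = [
        m.start() + j
        for m in words if len(m.group()) >= min_len
        for j in range(1, len(m.group()) - 1)
        if m.group()[j] in VOWELS
    ]
    if not candidates:
        return text  # no word is long enough to abbreviate
    i = rng.choice(candidates)
    return text[:i] + text[i + 1:]

print(drop_one_vowel("The rain in Spain falls mainly on the plain", random.Random(0)))
```

Keeping the first and last letters intact is a deliberate choice in this sketch: abbreviations such as "brwn" usually stay readable because the word's outline survives.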

In [15]:
generated_scenario_con = okareo.generate_scenarios(
 source_scenario=source_scenario_static.scenario_id,
 name="Statically Defined Scenario: Contractions",
 number_examples=1, # number of examples to generate per seed data point
 generation_type=ScenarioType.COMMON_CONTRACTIONS
)

print(generated_scenario_con.app_link)

https://app.okareo.com/project/21b8a05b-f5b6-4578-a36a-fc264036d9d3/scenario/ecb0e6f8-e1bf-4527-ae23-de7d5d9b10c3


In [16]:
compare_seed_and_generated(source_scenario_static, generated_scenario_con)

--------Seed #0--------
The quick brown fox jumps over the lazy dog
-----Generated #0------
The quick brwn fox jumps over the lazy dog
----End of Seed #0-----
--------Seed #1--------
The rain in Spain falls mainly on the plain
-----Generated #0------
The rain in Spain falls manly on the plain
----End of Seed #1-----
--------Seed #2--------
Lorem ipsum dolor sit amet, consectetur adipiscing elit
-----Generated #0------
Lorem ipsum dolor sit amet, consctetur adipiscing elit
----End of Seed #2-----


## Reverse Question generator

Suppose we would like to generate some common user queries based on the original `input`s. The Reverse Question generator enables this by generating questions whose answers are contained in the original `input`.

In [22]:
generated_scenario_question = okareo.generate_scenarios(
 source_scenario=source_scenario.scenario_id,
 name="Retrieval Articles Scenario: Reverse Question",
 number_examples=1, # number of examples to generate per seed data point
 generation_type=ScenarioType.TEXT_REVERSE_QUESTION
)

print(generated_scenario_question)
print(generated_scenario_question.app_link)

ScenarioSetResponse(scenario_id='b5b990d0-8a00-4c83-8973-76d8ddc51202', project_id='21b8a05b-f5b6-4578-a36a-fc264036d9d3', time_created=datetime.datetime(2024, 3, 8, 21, 24, 45, 358913), type='TEXT_REVERSE_QUESTION', tags=['seed:3a904bd0-3dfa-42e5-a10a-d42135b9077e'], name='Retrieval Articles Scenario: Reverse Question', seed_data=[], scenario_count=0, scenario_input=[], app_link='https://app.okareo.com/project/21b8a05b-f5b6-4578-a36a-fc264036d9d3/scenario/b5b990d0-8a00-4c83-8973-76d8ddc51202', additional_properties={})
https://app.okareo.com/project/21b8a05b-f5b6-4578-a36a-fc264036d9d3/scenario/b5b990d0-8a00-4c83-8973-76d8ddc51202


In [23]:
compare_seed_and_generated(source_scenario, generated_scenario_question)

--------Seed #0--------
WebBizz is dedicated to providing our customers with a seamless online shopping experience. Our platform is designed with user-friendly interfaces to help you browse and select the best products suitable for your needs. We offer a wide range of products from top brands and new entrants, ensuring diversity and quality in our offerings. Our 24/7 customer support is ready to assist you with any queries, from product details, shipping timelines, to payment methods. We also have a dedicated FAQ section addressing common concerns. Always ensure you are logged in to enjoy personalized product recommendations and faster checkout processes.
-----Generated #0------
What features does WebBizz offer to enhance the customer's online shopping experience?
----End of Seed #0-----
--------Seed #1--------
Safety and security of your data is our top priority at WebBizz. Our platform employs state-of-the-art encryption methods ensuring your personal and financial information remain

## Conditional generator + Chaining generators

Now suppose we would like to rephrase the questions produced by the Reverse Question generator. The Conditional generator attempts to do this by adding a qualifying phrase to the beginning of each question.

Observe that the `source_scenario` argument below is the ID of the previous generation's output (`generated_scenario_question`), so the generated Reverse Question scenario serves as the input to the Conditional generator. This is one way to *chain* generators, which lets you layer different generators on one another to create composite scenarios.
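The same pattern extends to longer chains. Below is a minimal sketch of a hypothetical `chain_generators` helper that feeds each generated scenario into the next generator. It assumes only what this notebook already uses: a client whose `generate_scenarios` accepts the `source_scenario`, `name`, `number_examples`, and `generation_type` keyword arguments, and return values with a `scenario_id` attribute.

```python
def chain_generators(client, source_scenario_id, steps, number_examples=1):
    """Apply generators one after another, feeding each output into the next.

    `steps` is a list of (name, generation_type) pairs. Returns every
    generated scenario, in order, so intermediate results can be inspected.
    """
    scenarios = []
    current_id = source_scenario_id
    for name, generation_type in steps:
        generated = client.generate_scenarios(
            source_scenario=current_id,
            name=name,
            number_examples=number_examples,
            generation_type=generation_type,
        )
        scenarios.append(generated)
        # the scenario just generated becomes the source for the next step
        current_id = generated.scenario_id
    return scenarios
```

Calling it with `okareo`, `source_scenario.scenario_id`, and the steps `[("Retrieval Articles Scenario: Reverse Question", ScenarioType.TEXT_REVERSE_QUESTION), ("Retrieval Articles Scenario: Conditional", ScenarioType.CONDITIONAL)]` should reproduce the two generation cells in this section as a single call.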

In [24]:
# generate a Conditional scenario with the generated Reverse Question scenario as a source
generated_scenario_conditional = okareo.generate_scenarios(
 source_scenario=generated_scenario_question.scenario_id,
 name="Retrieval Articles Scenario: Conditional",
 number_examples=1, # number of examples to generate per seed data point
 generation_type=ScenarioType.CONDITIONAL
)

print(generated_scenario_conditional.app_link)

https://app.okareo.com/project/21b8a05b-f5b6-4578-a36a-fc264036d9d3/scenario/cf7d3acf-df7f-40ee-b059-29bb6d3457a3


In [25]:
compare_seed_and_generated(generated_scenario_question, generated_scenario_conditional)

--------Seed #0--------
What features does WebBizz offer to enhance the customer's online shopping experience?
-----Generated #0------
Considering WebBizz, what features are offered to improve the online shopping experience for customers?
----End of Seed #0-----
--------Seed #1--------
What measures does WebBizz take to ensure the security of personal and financial data?
-----Generated #0------
Given your stake in WebBizz, what measures does the company take to ensure the safety of personal and financial information?
----End of Seed #1-----
--------Seed #2--------
What are some benefits of being a 'Premium Club' member at WebBizz?
-----Generated #0------
Given your status as a 'Premium Club' member at WebBizz, what are the potential benefits?
----End of Seed #2-----
--------Seed #3--------
How does the 'Wishlist' feature on WebBizz aid in enhancing the shopping experience?
-----Generated #0------
Considering the 'Wishlist' feature on WebBizz, how might it enhance the shopping experienc