Compare
In this example we'll use a utility function from the multigen module that builds a table of answers to a list of questions, as generated by multiple models. This can be very helpful to compare how two or more models react to the same input.
This function generates a 2-D table of [input, model], where each row holds the answers of the different models to the same question or input. Such a table can be printed or saved as a CSV file.
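To make the layout concrete, here's a sketch of such a table for the sentiment example in this page, with values taken from the actual run shown further down (the exact CSV formatting may differ):

input                                    llamacpp:openchat-3.5...   openai:gpt-4
The user manual was confusing, but...    neutral                    neutral
This widget changed my life! It's...     positive                   positive
I'm disappointed with the product...     negative                   negative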
For the local model, make sure you have its file in the folder "../../models". You can use any GGUF-format model - see here for how to download the OpenChat model used below. If you use a different one, don't forget to set its filename in the local_name variable below, after the "llamacpp:" prefix.
Jupyter notebook and Python script versions are available in the example's folder.
Instead of creating models directly, as we've done in previous examples, multigen will create them via the Models class directory.
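For contrast, here's roughly what direct creation looked like in earlier examples - a sketch only, assuming sibila's LlamaCppModel and OpenAIModel classes (check the earlier examples for the exact arguments):

# direct creation (earlier examples) - illustrative sketch:
# local = LlamaCppModel("../../models/openchat-3.5-1210.Q4_K_M.gguf")
# remote = OpenAIModel("gpt-4")
# directory-based creation (this example), driven by a single name string:
# model = Models.create("llamacpp:openchat-3.5-1210.Q4_K_M.gguf")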
We'll start by choosing a local and a remote model that we'll compare.
# load env variables like OPENAI_API_KEY from a .env file (if available)
try: from dotenv import load_dotenv; load_dotenv()
except ImportError: ...
from sibila import Models
# to use a local model, assuming it's in ../../models:
# setup models folder:
Models.setup("../../models")
# set the model's filename - change to your own model
local_name = "llamacpp:openchat-3.5-1210.Q4_K_M.gguf"
# to use an OpenAI model:
remote_name = "openai:gpt-4"
Now let's define a list of reviews on which we'll ask the two models to perform sentiment analysis.
These are generic product reviews, like those you could find in an online store.
reviews = [
"The user manual was confusing, but once I figured it out, the product more or less worked.",
"This widget changed my life! It's sleek, efficient, and worth every penny.",
"I'm disappointed with the product quality. It broke after just a week of use.",
"The customer service team was incredibly helpful in resolving my issue with the device.",
"I'm blown away by the functionality of this gadget. It exceeded my expectations.",
"The packaging was damaged upon arrival, but the product itself works great.",
"I've been using this tool for months, and it's still as good as new. Highly recommended!",
"I regret purchasing this item. It doesn't perform as advertised.",
"I've never had so much trouble with a product before. It's been a headache from day one.",
"I bought this as a gift for my friend, and they absolutely love it!",
"The price seemed steep at first, but after using it, I understand why. Quality product.",
"This gizmo is a game-changer for my daily routine. Couldn't be happier with my purchase!"
]
# model instructions text, also known as system message
inst_text = "You are a helpful assistant that analyses text sentiment."
Since we just want to obtain a sentiment classification, we'll use a convenient enumeration: a list with three values: positive, neutral, or negative.
Let's try the first review on the local model:
sentiment_enum = ["positive", "neutral", "negative"]
in_text = "Each line is a product review. Extract the sentiment associated with each review:\n\n" + reviews[0]
print(reviews[0])
local_model = Models.create(local_name)
out = local_model.extract(sentiment_enum,
                          in_text,
                          inst=inst_text)
# to clear memory
del local_model
print(out)
The user manual was confusing, but once I figured it out, the product more or less worked.
neutral
Definitely, 'neutral' is a good answer for this one.
Let's now try the remote model:
print(reviews[0])
remote_model = Models.create(remote_name)
out = remote_model.extract(sentiment_enum,
                           in_text,
                           inst=inst_text)
del remote_model
print(out)
The user manual was confusing, but once I figured it out, the product more or less worked.
neutral
And the remote model (GPT-4) seems to agree on neutrality.
By using the query_multigen() function, which we'll import from sibila.multigen, we'll be able to compare what multiple models generate in response to each input.
In our case the inputs will be the list of reviews. This function accepts these interesting arguments:
- text: type of text output, which can be the word "print" or a text filename to which it will save.
- csv: type of CSV output, which can also be "print" or a filename to save into.
- out_keys: what we want listed for each model: the generated raw text ("text"), the extracted value ("value"), a Python dict ("dict") or a Pydantic object ("obj"). Since we're extracting a single enum value, "value" is the right one here.
- gencall: a function that will actually call the model for each input. We use a convenient predefined function, make_extract_gencall(), and provide it with the sentiment_enum definition.
Let's run it with our two models:
from sibila.multigen import (
    query_multigen,
    make_extract_gencall
)
sentiment_enum = ["positive", "neutral", "negative"]
out = query_multigen(reviews,
                     inst_text,
                     model_names=[local_name, remote_name],
                     text="print",
                     csv="sentiment.csv",
                     out_keys=["value"],
                     gencall=make_extract_gencall(sentiment_enum)
                     )
////////////////////////////////////////////////////////////
The user manual was confusing, but once I figured it out, the product more or less worked.
////////////////////////////////////////////////////////////
==================== llamacpp:openchat-3.5-1210.Q4_K_M.gguf -> OK_STOP
'neutral'
==================== openai:gpt-4 -> OK_STOP
'neutral'
////////////////////////////////////////////////////////////
This widget changed my life! It's sleek, efficient, and worth every penny.
////////////////////////////////////////////////////////////
==================== llamacpp:openchat-3.5-1210.Q4_K_M.gguf -> OK_STOP
'positive'
==================== openai:gpt-4 -> OK_STOP
'positive'
////////////////////////////////////////////////////////////
I'm disappointed with the product quality. It broke after just a week of use.
////////////////////////////////////////////////////////////
==================== llamacpp:openchat-3.5-1210.Q4_K_M.gguf -> OK_STOP
'negative'
==================== openai:gpt-4 -> OK_STOP
'negative'
////////////////////////////////////////////////////////////
The customer service team was incredibly helpful in resolving my issue with the device.
////////////////////////////////////////////////////////////
==================== llamacpp:openchat-3.5-1210.Q4_K_M.gguf -> OK_STOP
'positive'
==================== openai:gpt-4 -> OK_STOP
'positive'
////////////////////////////////////////////////////////////
I'm blown away by the functionality of this gadget. It exceeded my expectations.
////////////////////////////////////////////////////////////
==================== llamacpp:openchat-3.5-1210.Q4_K_M.gguf -> OK_STOP
'positive'
==================== openai:gpt-4 -> OK_STOP
'positive'
////////////////////////////////////////////////////////////
The packaging was damaged upon arrival, but the product itself works great.
////////////////////////////////////////////////////////////
==================== llamacpp:openchat-3.5-1210.Q4_K_M.gguf -> OK_STOP
'positive'
==================== openai:gpt-4 -> OK_STOP
'neutral'
////////////////////////////////////////////////////////////
I've been using this tool for months, and it's still as good as new. Highly recommended!
////////////////////////////////////////////////////////////
==================== llamacpp:openchat-3.5-1210.Q4_K_M.gguf -> OK_STOP
'positive'
==================== openai:gpt-4 -> OK_STOP
'positive'
////////////////////////////////////////////////////////////
I regret purchasing this item. It doesn't perform as advertised.
////////////////////////////////////////////////////////////
==================== llamacpp:openchat-3.5-1210.Q4_K_M.gguf -> OK_STOP
'negative'
==================== openai:gpt-4 -> OK_STOP
'negative'
////////////////////////////////////////////////////////////
I've never had so much trouble with a product before. It's been a headache from day one.
////////////////////////////////////////////////////////////
==================== llamacpp:openchat-3.5-1210.Q4_K_M.gguf -> OK_STOP
'negative'
==================== openai:gpt-4 -> OK_STOP
'negative'
////////////////////////////////////////////////////////////
I bought this as a gift for my friend, and they absolutely love it!
////////////////////////////////////////////////////////////
==================== llamacpp:openchat-3.5-1210.Q4_K_M.gguf -> OK_STOP
'positive'
==================== openai:gpt-4 -> OK_STOP
'positive'
////////////////////////////////////////////////////////////
The price seemed steep at first, but after using it, I understand why. Quality product.
////////////////////////////////////////////////////////////
==================== llamacpp:openchat-3.5-1210.Q4_K_M.gguf -> OK_STOP
'positive'
==================== openai:gpt-4 -> OK_STOP
'positive'
////////////////////////////////////////////////////////////
This gizmo is a game-changer for my daily routine. Couldn't be happier with my purchase!
////////////////////////////////////////////////////////////
==================== llamacpp:openchat-3.5-1210.Q4_K_M.gguf -> OK_STOP
'positive'
==================== openai:gpt-4 -> OK_STOP
'positive'
The output format is as follows - see the comments next to the -----> arrows:
//////////////////////////////////////////////////////////// -----> This is the model input, a review text:
This gizmo is a game-changer for my daily routine. Couldn't be happier with my purchase!
////////////////////////////////////////////////////////////
==================== llamacpp:openchat-3.5-1210.Q4_K_M.gguf -> OK_STOP <----- Local model name and result status
'positive' <----- What the local model generated
==================== openai:gpt-4 -> OK_STOP <----- Remote model name and result status
'positive' <----- What the remote model generated
We also requested the creation of a CSV file with the results: sentiment.csv.
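If you'd like to inspect the saved file from Python, here's a minimal sketch using only the standard library - assuming one row per review with a column per model, as in the table layout described at the top of this page:

import csv

# print each row of the CSV produced by query_multigen()
with open("sentiment.csv", newline="", encoding="utf-8") as f:
    for row in csv.reader(f):
        print(row)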