# Sibila
Extract structured data from remote or local LLMs. Predictable output is important for serious use of LLMs.
- Query structured data into Pydantic objects, dataclasses or simple types.
- Access remote models from OpenAI, Anthropic, Mistral AI and other providers.
- Use vision models like GPT-4o to extract structured data from images.
- Run local models like Llama-3, Phi-3, OpenChat or any other GGUF file model.
- Sibila is also a general-purpose model access library: generate plain text or free-form JSON results with the same API for local and remote models.
No matter how well you craft a prompt begging a model for the format you need, it can always respond with something else. Extracting structured data is a big step toward getting predictable behavior from your models.
See What can you do with Sibila?
## Structured data
To extract structured data using a local model:
```python
from sibila import Models
from pydantic import BaseModel

class Info(BaseModel):
    event_year: int
    first_name: str
    last_name: str
    age_at_the_time: int
    nationality: str

model = Models.create("llamacpp:openchat")

model.extract(Info, "Who was the first man in the moon?")
```
Returns an instance of class Info, created from the model's output:
```
Info(event_year=1969,
     first_name='Neil',
     last_name='Armstrong',
     age_at_the_time=38,
     nationality='American')
```
Or, to use a remote model like OpenAI's GPT-4, simply replace the model's name. The sketch below assumes an OpenAI API key is configured; any remote model name registered in Models works the same way:
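```python
# Assumes an OpenAI API key is available in your environment/configuration.
# The "openai:gpt-4" name is illustrative - use any registered remote model.
model = Models.create("openai:gpt-4")

model.extract(Info, "Who was the first man in the moon?")
```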
If Pydantic BaseModel objects are too much for your project, Sibila supports similar functionality with plain Python dataclasses, as sketched below. It also includes asynchronous access to remote models.
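A minimal sketch of the dataclass route, assuming extract() accepts a dataclass the same way it accepts a Pydantic model (the fields mirror the Info example above):

```python
from dataclasses import dataclass

@dataclass
class InfoDC:
    """Same fields as the Pydantic Info class above, as a plain dataclass."""
    event_year: int
    first_name: str
    last_name: str
    age_at_the_time: int
    nationality: str

model.extract(InfoDC, "Who was the first man in the moon?")
```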
## Vision models
Sibila supports image input alongside text prompts. For example, to extract the fields from a receipt in a photo:
```python
from pydantic import Field

model = Models.create("openai:gpt-4o")

class ReceiptLine(BaseModel):
    """Receipt line data"""
    description: str
    cost: float

class Receipt(BaseModel):
    """Receipt information"""
    total: float = Field(description="Total value")
    lines: list[ReceiptLine] = Field(description="List of lines of paid items")

info = model.extract(Receipt,
                     ("Extract receipt information.",
                      "https://upload.wikimedia.org/wikipedia/commons/6/6a/Receipts_in_Italy_13.jpg"))
info
```
Returns receipt fields structured in a Pydantic object:
```
Receipt(total=5.88,
        lines=[ReceiptLine(description='BIS BORSE TERM.S', cost=3.9),
               ReceiptLine(description='GHIACCIO 2X400 G', cost=0.99),
               ReceiptLine(description='GHIACCIO 2X400 G', cost=0.99)])
```
Another example extracts the most important elements in a photo:
photo = "https://upload.wikimedia.org/wikipedia/commons/thumb/3/32/Hohenloher_Freilandmuseum_-_Baugruppe_Hohenloher_Dorf_-_Bauerngarten_-_Ansicht_von_Osten_im_Juni.jpg/640px-Hohenloher_Freilandmuseum_-_Baugruppe_Hohenloher_Dorf_-_Bauerngarten_-_Ansicht_von_Osten_im_Juni.jpg"
model.extract(list[str],
("Extract up to five of the most important elements in this photo.",
photo))
Returns a list with these five strings:
```
['House with red roof and beige walls',
 'Large tree with green leaves',
 'Garden with various plants and flowers',
 'Clear blue sky',
 'Wooden fence']
```
Local vision models based on llama.cpp/llava can also be used.
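For instance, a local llava-based model could be used in the same way. The "llamacpp:llava" name below is an assumption; use whatever vision-capable GGUF model is registered in your Models configuration:

```python
# Assumed model name - replace with the llava-based GGUF model
# registered in your local Models configuration.
local_model = Models.create("llamacpp:llava")

local_model.extract(list[str],
                    ("Extract up to five of the most important elements in this photo.",
                     photo))
```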
⭐ Like our work? Give us a star!