Dataclass
We can also extract objects whose structure is given by a dataclass definition:
Example
```python
from dataclasses import dataclass

from sibila import Models

# Configure the models directory, then create a model by name
Models.setup("../models")
model = Models.create("llamacpp:openchat")

# The dataclass fields define the structure of the object to extract
@dataclass
class Person:
    first_name: str
    last_name: str
    age: int
    occupation: str
    source_location: str

in_text = """\
Seated at a corner table was Lucy Bennett, a 28-year-old journalist from London,
her pen poised to capture the essence of the world around her.
Her eyes sparkled with curiosity, mirroring the dynamic energy of her beloved city.
"""

model.extract(Person,
              in_text)
```
See the Pydantic version here.
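How sibila turns a dataclass into guidance for the model is internal to the library. As a rough sketch of the idea, the standard `dataclasses.fields()` function already exposes every field name and type, which is enough to build a JSON-schema-like description. The helper `dataclass_to_schema` and the `_JSON_TYPES` table below are hypothetical illustrations, not part of sibila's API.

```python
from dataclasses import dataclass, fields

@dataclass
class Person:
    first_name: str
    last_name: str
    age: int
    occupation: str
    source_location: str

# Hypothetical mapping from Python types to JSON-schema type names
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}

def dataclass_to_schema(cls) -> dict:
    """Hypothetical helper: describe a flat dataclass as a JSON-schema-like dict."""
    # f.type is the real type object here because annotations are not strings
    props = {f.name: {"type": _JSON_TYPES[f.type]} for f in fields(cls)}
    return {
        "type": "object",
        "properties": props,
        "required": [f.name for f in fields(cls)],
    }

schema = dataclass_to_schema(Person)
print(schema["properties"]["age"])  # {'type': 'integer'}
```

This only handles flat fields of basic types; nested dataclasses, lists and unions would need more cases.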
We can extract a list of Person objects by using list[Person]:
Example
```python
in_text = """\
Seated at a corner table was Lucy Bennett, a 28-year-old journalist from London,
her pen poised to capture the essence of the world around her.
Her eyes sparkled with curiosity, mirroring the dynamic energy of her beloved city.
Opposite Lucy sat Carlos Ramirez, a 35-year-old architect from the sun-kissed
streets of Barcelona. With a sketchbook in hand, he exuded creativity,
his passion for design evident in the thoughtful lines that adorned his face.
"""

model.extract(list[Person],
              in_text)
```
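A target such as `list[Person]` is an ordinary generic alias (Python 3.9+), so it can be taken apart with the standard `typing` module: `get_origin` gives the container and `get_args` the item type. The sketch below shows that decomposition and then builds `Person` objects from a list of dicts. The dicts are hand-written stand-ins for a model's JSON answer, not real sibila output, and sibila's own conversion may work differently.

```python
from dataclasses import dataclass, is_dataclass
from typing import get_args, get_origin

@dataclass
class Person:
    first_name: str
    last_name: str

target = list[Person]
origin = get_origin(target)        # the container type: list
(item_type,) = get_args(target)    # the single item type: Person

# Hand-written stand-in for a model's answer (illustrative, not real output)
raw = [
    {"first_name": "Lucy", "last_name": "Bennett"},
    {"first_name": "Carlos", "last_name": "Ramirez"},
]
people = [item_type(**d) for d in raw]

print(origin is list, is_dataclass(item_type))  # True True
print(people[1].last_name)                      # Ramirez
```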
Field annotations
As when extracting to simple types, we could also provide instructions by setting the inst argument. However, instructions are general by nature, and when extracting structured data it's harder to write instructions that target specific fields.
For this purpose, field annotations are more effective than instructions: they describe what we want extracted for each individual field.
For dataclasses this is done with Annotated[type, "description"] - see the "start" and "end" attributes of the Period class:
Example
```python
from typing import Annotated, Literal, Optional, Union

Weekday = Literal["Monday", "Tuesday", "Wednesday", "Thursday",
                  "Friday", "Saturday", "Sunday"]

@dataclass
class Period:
    start: Annotated[Weekday, "Day of arrival"]
    end: Annotated[Weekday, "Day of departure"]

model.extract(Period,
              "Right, well, I was planning to arrive on Wednesday and "
              "only leave Sunday morning. Would that be okay?")
```
This way, the model is told what is wanted for each individual field.
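Python keeps the string inside `Annotated` as metadata on the type hint, which is what makes per-field descriptions readable by a library. The sketch below uses only the standard `typing` module (Python 3.9+): `get_type_hints(..., include_extras=True)` preserves the `Annotated` wrapper, and `get_args` splits it into the base type and the description. It shows where the descriptions live; how sibila actually passes them to the model is not shown here.

```python
from dataclasses import dataclass
from typing import Annotated, Literal, get_args, get_type_hints

Weekday = Literal["Monday", "Tuesday", "Wednesday", "Thursday",
                  "Friday", "Saturday", "Sunday"]

@dataclass
class Period:
    start: Annotated[Weekday, "Day of arrival"]
    end: Annotated[Weekday, "Day of departure"]

# include_extras=True keeps Annotated; without it the metadata is stripped
hints = get_type_hints(Period, include_extras=True)

# get_args(Annotated[T, "text"]) returns (T, "text")
descriptions = {name: get_args(hint)[1] for name, hint in hints.items()}

# The base type is the Literal; its args are the allowed values
allowed = get_args(get_args(hints["start"])[0])

print(descriptions)  # {'start': 'Day of arrival', 'end': 'Day of departure'}
print(allowed[:2])   # ('Monday', 'Tuesday')
```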
Optional, default and Union fields
A field can be marked as optional by annotating with Optional[Type] and setting a default value, as in the "person_name" field:
Example
```python
@dataclass
class Period:
    start: Annotated[Weekday, "Day of arrival"]
    end: Annotated[Weekday, "Day of departure"]
    person_name: Annotated[Optional[str], "Person name if any"] = None

model.extract(Period,
              "Right, well, I was planning to arrive on Wednesday and "
              "only leave Sunday morning. Would that be okay?")
```
Due to dataclass rules, fields with default values must appear after all fields without defaults.
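The ordering rule is enforced by Python itself when the class is defined, before any model is involved. This sketch triggers the error on purpose and then shows the valid ordering; `BadPeriod` and `GoodPeriod` are hypothetical names, and `start` is simplified to `str` to keep the example short. The exact error wording varies slightly between Python versions, so the message is printed rather than quoted.

```python
from dataclasses import dataclass
from typing import Annotated, Optional

error_message = ""
try:
    # Invalid: a field without a default follows a field with a default
    @dataclass
    class BadPeriod:
        person_name: Annotated[Optional[str], "Person name if any"] = None
        start: str
except TypeError as err:
    error_message = str(err)

# Valid: the defaulted field comes last
@dataclass
class GoodPeriod:
    start: str
    person_name: Annotated[Optional[str], "Person name if any"] = None

print(error_message)
print(GoodPeriod("Wednesday"))  # GoodPeriod(start='Wednesday', person_name=None)
```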
A field can also be marked as a union of alternative types with Union[Type1, Type2, ...] - see the "bags" field below:
Example
```python
@dataclass
class Period:
    start: Annotated[Weekday, "Day of arrival"]
    end: Annotated[Weekday, "Day of departure"]
    # Union field: an int count, a str voucher, or None
    bags: Annotated[Union[int, str, None], "Number of bags, bag voucher or none"]
    # Defaulted fields must come after all fields without defaults
    person_name: Annotated[Optional[str], "Person name if any"] = None

model.extract(Period,
              "Right, well, I was planning to arrive on Wednesday and "
              "only leave Sunday morning. Would that be okay?")
```
Check the Extract dataclass example for a more sophisticated case of structured data extraction.
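A union type such as the one used for the "bags" field is, at runtime, just a set of alternative types that `typing.get_args` can list. The sketch below uses that to check a value against the union; `matches_union` is a hypothetical helper for illustration, not a sibila function, and it also shows that `Optional[X]` is shorthand for `Union[X, None]`.

```python
from typing import Optional, Union, get_args

# Same union as the "bags" field: a count, a voucher string, or nothing
BagsType = Union[int, str, None]

def matches_union(value, union_type) -> bool:
    """Hypothetical helper: True if value is an instance of any union member."""
    # get_args gives the member types, e.g. (int, str, NoneType), and
    # isinstance accepts a tuple of types. Subclasses pass too (bool counts as int).
    return isinstance(value, get_args(union_type))

print(get_args(BagsType))            # (<class 'int'>, <class 'str'>, <class 'NoneType'>)
print(matches_union(2, BagsType))    # True
print(matches_union(2.5, BagsType))  # False
print(Optional[str] == Union[str, None])  # True
```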