Pydantic
Besides simple types and enums, we can also extract objects whose structure is defined by a class derived from Pydantic's BaseModel:
Example
from sibila import Models
from pydantic import BaseModel
Models.setup("../models")
model = Models.create("llamacpp:openchat")
class Person(BaseModel):
    first_name: str
    last_name: str
    age: int
    occupation: str
    source_location: str
in_text = """\
Seated at a corner table was Lucy Bennett, a 28-year-old journalist from London,
her pen poised to capture the essence of the world around her.
Her eyes sparkled with curiosity, mirroring the dynamic energy of her beloved city.
"""
model.extract(Person,
              in_text)
See the dataclass version here.
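To make the target structure concrete, the sketch below builds by hand the kind of Person object we would hope to get back for the text above. The values are read off the passage ourselves and are not actual model output. The snippet uses only Pydantic (version 2 is assumed, for model_dump()), so it runs without Sibila or a local model.

```python
from pydantic import BaseModel, ValidationError


class Person(BaseModel):
    first_name: str
    last_name: str
    age: int
    occupation: str
    source_location: str


# Hand-written values taken from the passage (not produced by a model)
expected = Person(
    first_name="Lucy",
    last_name="Bennett",
    age=28,
    occupation="journalist",
    source_location="London",
)
print(expected.model_dump())

# Pydantic rejects data that does not fit the declared field types
try:
    Person(
        first_name="Lucy",
        last_name="Bennett",
        age="unknown",  # not convertible to int
        occupation="journalist",
        source_location="London",
    )
    rejected = False
except ValidationError:
    rejected = True
print("Invalid age rejected:", rejected)
```

Because the class carries the field types, any value that ends up in a Person instance has already passed these checks.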
We can extract a list of Person objects by using list[Person]:
Example
in_text = """\
Seated at a corner table was Lucy Bennett, a 28-year-old journalist from London,
her pen poised to capture the essence of the world around her.
Her eyes sparkled with curiosity, mirroring the dynamic energy of her beloved city.
Opposite Lucy sat Carlos Ramirez, a 35-year-old architect from the sun-kissed
streets of Barcelona. With a sketchbook in hand, he exuded creativity,
his passion for design evident in the thoughtful lines that adorned his face.
"""
model.extract(list[Person],
              in_text)
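Note that list[Person] is an ordinary Python generic type (the built-in list[...] syntax needs Python 3.9 or later). As a sketch that is independent of Sibila, Pydantic 2's TypeAdapter can validate plain data against that same type. The dictionaries below are written by hand from the passage and are not model output.

```python
from pydantic import BaseModel, TypeAdapter


class Person(BaseModel):
    first_name: str
    last_name: str
    age: int
    occupation: str
    source_location: str


# Adapter for the same type that was passed to model.extract() above
people_adapter = TypeAdapter(list[Person])

# Hand-written data mirroring the two people described in the text
raw = [
    {"first_name": "Lucy", "last_name": "Bennett", "age": 28,
     "occupation": "journalist", "source_location": "London"},
    {"first_name": "Carlos", "last_name": "Ramirez", "age": 35,
     "occupation": "architect", "source_location": "Barcelona"},
]

people = people_adapter.validate_python(raw)
for person in people:
    print(person.first_name, person.age)
```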
Field annotations
As when extracting to simple types, we can also provide instructions by setting the inst argument. However, instructions are general by nature, so when extracting structured data it is hard to use them to give guidance for individual fields.
For this purpose, field annotations are more effective than instructions: they can be provided to clarify what we want extracted for each specific field.
For Pydantic this is done with Field(description="description") - see the "start" and "end" attributes of the Period class:
Example
from typing import Literal, Optional, Union
from pydantic import BaseModel, Field
Weekday = Literal[
    "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"
]
class Period(BaseModel):
    start: Weekday = Field(description="Day of arrival")
    end: Weekday = Field(description="Day of departure")
model.extract(Period,
              "Right, well, I was planning to arrive on Wednesday and "
              "only leave Sunday morning. Would that be okay?")
In this manner, the model can be informed of what is wanted for each specific field.
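To see where these descriptions live, we can inspect the JSON schema that Pydantic generates for the class. This is only a sketch using Pydantic 2's model_json_schema(); how Sibila actually passes the schema or descriptions to the model is not covered here and should not be inferred from it.

```python
from typing import Literal

from pydantic import BaseModel, Field

Weekday = Literal[
    "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"
]


class Period(BaseModel):
    start: Weekday = Field(description="Day of arrival")
    end: Weekday = Field(description="Day of departure")


# Each Field description becomes a "description" entry in the field's schema
schema = Period.model_json_schema()
for name, prop in schema["properties"].items():
    print(name, "->", prop["description"])
```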
Optional, default and Union fields
A field can be marked as optional by annotating with Optional[Type] and setting a default value, as in the "person_name" field:
Example
class Period(BaseModel):
    start: Weekday = Field(description="Day of arrival")
    end: Weekday = Field(description="Day of departure")
    person_name: Optional[str] = Field(default=None, description="Person name if any")
model.extract(Period,
              "Right, well, I was planning to arrive on Wednesday and "
              "only leave Sunday morning. Would that be okay?")
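The effect of the default can be checked with Pydantic alone. In the sketch below (Pydantic 2 assumed), the Period objects are built by hand rather than extracted, and "Lucy" is just an illustrative value: leaving person_name out is valid and yields None.

```python
from typing import Literal, Optional

from pydantic import BaseModel, Field

Weekday = Literal[
    "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"
]


class Period(BaseModel):
    start: Weekday = Field(description="Day of arrival")
    end: Weekday = Field(description="Day of departure")
    person_name: Optional[str] = Field(default=None, description="Person name if any")


# No name given: the default applies
no_name = Period(start="Wednesday", end="Sunday")
print(no_name.person_name)

# Name given (illustrative value): it is stored as a string
with_name = Period(start="Wednesday", end="Sunday", person_name="Lucy")
print(with_name.person_name)

# The field is reported as not required because it has a default
name_required = Period.model_fields["person_name"].is_required()
print("person_name required:", name_required)
```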
A field can also be marked as a union of alternative types with Union[Type1,Type2,...] as in the "bags" field below:
Example
class Period(BaseModel):
    start: Weekday = Field(description="Day of arrival")
    end: Weekday = Field(description="Day of departure")
    person_name: Optional[str] = Field(default=None, description="Person name if any")
    bags: Union[int, str, None] = Field(description="Number of bags, bag voucher or none")
model.extract(Period,
              "Right, well, I was planning to arrive on Wednesday and "
              "only leave Sunday morning. Would that be okay?")
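As a sketch of how Pydantic itself treats this Union field (Pydantic 2 assumed; the voucher string is a made-up illustrative value), each alternative type is accepted as-is. Note that bags has no default, so under Pydantic 2 a value must always be supplied, even if that value is None.

```python
from typing import Literal, Optional, Union

from pydantic import BaseModel, Field, ValidationError

Weekday = Literal[
    "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"
]


class Period(BaseModel):
    start: Weekday = Field(description="Day of arrival")
    end: Weekday = Field(description="Day of departure")
    person_name: Optional[str] = Field(default=None, description="Person name if any")
    bags: Union[int, str, None] = Field(description="Number of bags, bag voucher or none")


base = {"start": "Wednesday", "end": "Sunday"}

two_bags = Period(**base, bags=2)              # kept as an int
voucher = Period(**base, bags="voucher A12")   # hypothetical voucher code, kept as a str
no_bags = Period(**base, bags=None)            # explicit None is allowed

# Omitting bags entirely fails validation because the field has no default
try:
    Period(**base)
    missing_rejected = False
except ValidationError:
    missing_rejected = True
print("Missing bags rejected:", missing_rejected)
```

If we wanted bags to be omittable as well, we could give it default=None, in the same way as person_name.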
See the Extract Pydantic example for a fuller demonstration of structured extraction.