
Pydantic

Besides simple types and enums, we can also extract objects whose structure is defined by a class derived from Pydantic's BaseModel:

Example

from sibila import Models
from pydantic import BaseModel

Models.setup("../models")
model = Models.create("llamacpp:openchat")

class Person(BaseModel):
    first_name: str
    last_name: str
    age: int
    occupation: str
    source_location: str

in_text = """\
Seated at a corner table was Lucy Bennett, a 28-year-old journalist from London, 
her pen poised to capture the essence of the world around her. 
Her eyes sparkled with curiosity, mirroring the dynamic energy of her beloved city.
"""

model.extract(Person,
              in_text)

Result

Person(first_name='Lucy', 
       last_name='Bennett',
       age=28, 
       occupation='journalist',
       source_location='London')
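Sibila handles the conversion from the model's reply to a Person object internally. The sketch below does not show Sibila's internals. It assumes Pydantic v2 and a reply that is a JSON object matching the class fields, and shows how Pydantic itself validates such text. The JSON string is hypothetical and written by hand, not produced by a model.

```python
from pydantic import BaseModel, ValidationError


class Person(BaseModel):
    first_name: str
    last_name: str
    age: int
    occupation: str
    source_location: str


# Hypothetical model reply, written by hand: one JSON key per Person field.
reply = ('{"first_name": "Lucy", "last_name": "Bennett", "age": 28, '
         '"occupation": "journalist", "source_location": "London"}')

# Parse the JSON and check every field against its declared type.
person = Person.model_validate_json(reply)
print(person)

# An incomplete reply is rejected rather than silently accepted.
try:
    Person.model_validate_json('{"first_name": "Lucy"}')
    missing = 0
except ValidationError as err:
    missing = err.error_count()
print(missing, "validation errors")  # 4: one per absent field
```

Validation is what makes the typed result trustworthy. For example, age arrives as an int, not as the string "28".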

See the dataclass version of this example.

We can extract a list of Person objects by using list[Person]:

Example

in_text = """\
Seated at a corner table was Lucy Bennett, a 28-year-old journalist from London, 
her pen poised to capture the essence of the world around her. 
Her eyes sparkled with curiosity, mirroring the dynamic energy of her beloved city.

Opposite Lucy sat Carlos Ramirez, a 35-year-old architect from the sun-kissed 
streets of Barcelona. With a sketchbook in hand, he exuded creativity, 
his passion for design evident in the thoughtful lines that adorned his face.
"""

model.extract(list[Person],
              in_text)

Result

[Person(first_name='Lucy', 
        last_name='Bennett',
        age=28, 
        occupation='journalist',
        source_location='London'),
 Person(first_name='Carlos', 
        last_name='Ramirez',
        age=35,
        occupation='architect',
        source_location='Barcelona')]
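A bare list[Person] is not a BaseModel subclass. Pydantic v2 handles such types through TypeAdapter. The sketch below uses a hand-written, hypothetical reply. It validates a JSON array into Person objects and builds the JSON schema that describes the list. It does not claim that Sibila uses TypeAdapter internally.

```python
from pydantic import BaseModel, TypeAdapter


class Person(BaseModel):
    first_name: str
    last_name: str
    age: int
    occupation: str
    source_location: str


# TypeAdapter gives validation and schema generation for non-model types.
people_adapter = TypeAdapter(list[Person])

# Hypothetical reply: a JSON array with one object per person.
reply = """[
  {"first_name": "Lucy", "last_name": "Bennett", "age": 28,
   "occupation": "journalist", "source_location": "London"},
  {"first_name": "Carlos", "last_name": "Ramirez", "age": 35,
   "occupation": "architect", "source_location": "Barcelona"}
]"""

people = people_adapter.validate_json(reply)
for p in people:
    print(p.first_name, p.age, p.source_location)

# The schema is an array whose items reference the Person definition.
schema = people_adapter.json_schema()
print(schema["type"], list(schema["$defs"]))
```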

Field annotations

As when extracting simple types, we can also provide instructions by setting the inst argument. However, instructions are general by nature, and when extracting structured data it is hard to aim them at individual fields.

Field annotations are more effective for this purpose: each annotation tells the model what should be extracted into one specific field.

With Pydantic, this is done with Field(description="..."), as in the "start" and "end" fields of the Period class:

Example

from typing import Literal, Optional, Union
from pydantic import BaseModel, Field

Weekday = Literal["Monday", "Tuesday", "Wednesday", "Thursday",
                  "Friday", "Saturday", "Sunday"]

class Period(BaseModel):
    start: Weekday = Field(description="Day of arrival")
    end: Weekday = Field(description="Day of departure")

model.extract(Period,
              "Right, well, I was planning to arrive on Wednesday and "
              "only leave Sunday morning. Would that be okay?")

Result

Period(start='Wednesday', end='Sunday')

In this manner, the model can be informed of what is wanted for each specific field.
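To see what a field annotation contributes, look at the JSON schema that Pydantic v2 generates for the class. Each description is attached to its property, alongside the allowed Literal values. The sketch assumes that this schema, or something derived from it, is what guides the model. That is common for structured extraction but not confirmed here for Sibila. The invalid value "Someday" is made up to show that Literal constraints are enforced.

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError

Weekday = Literal["Monday", "Tuesday", "Wednesday", "Thursday",
                  "Friday", "Saturday", "Sunday"]


class Period(BaseModel):
    start: Weekday = Field(description="Day of arrival")
    end: Weekday = Field(description="Day of departure")


schema = Period.model_json_schema()
for name, prop in schema["properties"].items():
    # Each property carries its description plus the allowed Literal values.
    print(name, "->", prop["description"], prop["enum"])

# Values outside the Literal are rejected during validation.
try:
    Period(start="Wednesday", end="Someday")
    rejected = False
except ValidationError:
    rejected = True
print("invalid day rejected:", rejected)
```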

Optional, default and Union fields

A field can be made optional by annotating it with Optional[Type] and giving it a default value, as in the "person_name" field:

Example

class Period(BaseModel):
    start: Weekday = Field(description="Day of arrival")
    end: Weekday = Field(description="Day of departure")
    person_name: Optional[str] = Field(default=None, description="Person name if any")

model.extract(Period,
              "Right, well, I was planning to arrive on Wednesday and "
              "only leave Sunday morning. Would that be okay?")

Result

Period(start='Wednesday', end='Sunday', person_name=None)
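The default is what makes the field optional. The minimal Pydantic v2 sketch below shows this: person_name is left out of the schema's required list, and validating data without it falls back to None. The name "Lucy" is only sample data.

```python
from typing import Literal, Optional

from pydantic import BaseModel, Field

Weekday = Literal["Monday", "Tuesday", "Wednesday", "Thursday",
                  "Friday", "Saturday", "Sunday"]


class Period(BaseModel):
    start: Weekday = Field(description="Day of arrival")
    end: Weekday = Field(description="Day of departure")
    person_name: Optional[str] = Field(default=None, description="Person name if any")


schema = Period.model_json_schema()
print(schema["required"])  # person_name is absent: it has a default

# Omitted entirely: the default None is used.
p = Period.model_validate({"start": "Wednesday", "end": "Sunday"})
print(p.person_name)

# Supplied: the string value is kept.
named = Period.model_validate({"start": "Monday", "end": "Friday", "person_name": "Lucy"})
print(named.person_name)
```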

A field can also accept one of several alternative types by annotating it with Union[Type1, Type2, ...], as in the "bags" field below:

Example

class Period(BaseModel):
    start: Weekday = Field(description="Day of arrival")
    end: Weekday = Field(description="Day of departure")
    person_name: Optional[str] = Field(default=None, description="Person name if any")
    bags: Union[int, str, None] = Field(description="Number of bags, bag voucher or none")

model.extract(Period,
              "Right, well, I was planning to arrive on Wednesday and "
              "only leave Sunday morning. Would that be okay?")

Result

Period(start='Wednesday', end='Sunday', person_name=None, bags=None)
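The sketch below assumes Pydantic v2 and shows how the "bags" union behaves during validation. Each alternative (int, str or None) is kept with its own type. Because bags has no default, it remains required even though None is an allowed value, so a value must always be supplied. The voucher code "VOUCHER-17" is made up for illustration.

```python
from typing import Literal, Optional, Union

from pydantic import BaseModel, Field, ValidationError

Weekday = Literal["Monday", "Tuesday", "Wednesday", "Thursday",
                  "Friday", "Saturday", "Sunday"]


class Period(BaseModel):
    start: Weekday = Field(description="Day of arrival")
    end: Weekday = Field(description="Day of departure")
    person_name: Optional[str] = Field(default=None, description="Person name if any")
    bags: Union[int, str, None] = Field(description="Number of bags, bag voucher or none")


base = {"start": "Wednesday", "end": "Sunday"}

# Each alternative of the Union is accepted and keeps its type.
values = [Period(**base, bags=b).bags for b in (2, "VOUCHER-17", None)]
print(values)

# No default means the field is required, even though None is allowed.
try:
    Period(**base)
    bags_required = False
except ValidationError:
    bags_required = True
print("bags required:", bags_required)
```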

See the Extract Pydantic example for a more complete case of structured extraction.