Dataclass

We can also extract objects whose structure is given by a dataclass definition:

Example

from sibila import Models
from dataclasses import dataclass

Models.setup("../models")
model = Models.create("llamacpp:openchat")

@dataclass
class Person:
    first_name: str
    last_name: str
    age: int
    occupation: str
    source_location: str

in_text = """\
Seated at a corner table was Lucy Bennett, a 28-year-old journalist from London, 
her pen poised to capture the essence of the world around her. 
Her eyes sparkled with curiosity, mirroring the dynamic energy of her beloved city.
"""

model.extract(Person,
              in_text)

Result

Person(first_name='Lucy', 
       last_name='Bennett',
       age=28, 
       occupation='journalist',
       source_location='London')

See the Pydantic version here.

We can extract a list of Person objects by using list[Person]:

Example

in_text = """\
Seated at a corner table was Lucy Bennett, a 28-year-old journalist from London, 
her pen poised to capture the essence of the world around her. 
Her eyes sparkled with curiosity, mirroring the dynamic energy of her beloved city.

Opposite Lucy sat Carlos Ramirez, a 35-year-old architect from the sun-kissed 
streets of Barcelona. With a sketchbook in hand, he exuded creativity, 
his passion for design evident in the thoughtful lines that adorned his face.
"""

model.extract(list[Person],
              in_text)

Result

[Person(first_name='Lucy', 
        last_name='Bennett',
        age=28, 
        occupation='journalist',
        source_location='London'),
 Person(first_name='Carlos', 
        last_name='Ramirez',
        age=35,
        occupation='architect',
        source_location='Barcelona')]

Field annotations

As when extracting to simple types, we can also provide instructions by setting the inst argument. However, instructions are general by nature, which makes it hard to give specific guidance for individual fields when extracting structured data.

For this purpose, field annotations are more effective than instructions: they can be provided to clarify what we want extracted for each specific field.

For dataclasses this is done with Annotated[type, "description"] - see the "start" and "end" attributes of the Period class:

Example

from typing import Annotated, Literal, Optional, Union

Weekday = Literal["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

@dataclass
class Period():
    start: Annotated[Weekday, "Day of arrival"]
    end: Annotated[Weekday, "Day of departure"]

model.extract(Period,
              "Right, well, I was planning to arrive on Wednesday and "
              "only leave Sunday morning. Would that be okay?")

Result

Period(start='Wednesday', end='Sunday')

In this manner, the model can be informed of what is wanted for each specific field.
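
To see what such an annotation carries, the standard library can read the description back. The sketch below is independent of Sibila and does not claim to show how the library processes annotations internally; it only illustrates that the text given in Annotated[...] is stored as metadata next to the type. The helper name field_descriptions is hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Annotated, Literal, get_args, get_origin, get_type_hints

Weekday = Literal["Monday", "Tuesday", "Wednesday", "Thursday",
                  "Friday", "Saturday", "Sunday"]

@dataclass
class Period:
    start: Annotated[Weekday, "Day of arrival"]
    end: Annotated[Weekday, "Day of departure"]

def field_descriptions(cls) -> dict:
    """Hypothetical helper: map each field name to its Annotated description."""
    # include_extras=True keeps the Annotated metadata instead of stripping it
    hints = get_type_hints(cls, include_extras=True)
    result = {}
    for f in fields(cls):
        hint = hints[f.name]
        if get_origin(hint) is Annotated:
            # get_args returns (base_type, metadata, ...); take the first metadata item
            result[f.name] = get_args(hint)[1]
    return result

descriptions = field_descriptions(Period)
print(descriptions)  # {'start': 'Day of arrival', 'end': 'Day of departure'}
```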

Optional, default and Union fields

A field can be marked as optional by annotating with Optional[Type] and setting a default value, as in the "person_name" field:

Example

@dataclass
class Period():
    start: Annotated[Weekday, "Day of arrival"]
    end: Annotated[Weekday, "Day of departure"]
    person_name: Annotated[Optional[str], "Person name if any"] = None

model.extract(Period,
              "Right, well, I was planning to arrive on Wednesday and "
              "only leave Sunday morning. Would that be okay?")

Result

Period(start='Wednesday', end='Sunday', person_name=None)

Due to dataclass rules, fields with default values must appear after all fields without defaults.
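
This ordering rule is enforced by Python when the class is defined, not by the extraction step. A minimal sketch, using a hypothetical BadPeriod class, of what happens when a field without a default follows one that has a default:

```python
from dataclasses import dataclass
from typing import Optional

error_message = None
try:
    @dataclass
    class BadPeriod:  # hypothetical class, for illustration only
        person_name: Optional[str] = None  # has a default value
        start: str                         # no default after a default: rejected
except TypeError as exc:
    # The dataclass decorator raises TypeError at class definition time
    error_message = str(exc)

print(error_message)
```

The exact wording of the message varies between Python versions, but the class is never created, so the mistake surfaces before any call to extract.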

A field can also be marked as a union of alternative types with Union[Type1,Type2,...] - see the "bags" field below:

Example

@dataclass
class Period():
    start: Annotated[Weekday, "Day of arrival"]
    end: Annotated[Weekday, "Day of departure"]
    bags: Annotated[Union[int, str, None], "Number of bags, bag voucher or none"]
    person_name: Annotated[Optional[str], "Person name if any"] = None

model.extract(Period,
              "Right, well, I was planning to arrive on Wednesday and "
              "only leave Sunday morning. Would that be okay?")

Result

Period(start='Wednesday', end='Sunday', bags=None, person_name=None)
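
For reference, Optional[str] is shorthand for Union[str, None], so both annotations describe a set of alternative types in which None is simply one of the alternatives. The standard-library sketch below (independent of Sibila, with BagsType as a hypothetical alias) lists the alternatives declared for a field like "bags":

```python
from typing import Annotated, Optional, Union, get_args

# Optional[T] is the same type as Union[T, None]
assert Optional[str] == Union[str, None]

# Hypothetical alias mirroring the annotation of the "bags" field
BagsType = Annotated[Union[int, str, None], "Number of bags, bag voucher or none"]

# First unwrap Annotated, then list the members of the Union
base_type, description = get_args(BagsType)
alternatives = get_args(base_type)

print(alternatives)  # (<class 'int'>, <class 'str'>, <class 'NoneType'>)
print(description)
```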

See the Extract dataclass example for a more sophisticated case of structured data extraction.