LlamaCpp#
LlamaCppTokenizer #
Tokenizer for llama.cpp loaded GGUF models.
Source code in sibila/llamacpp.py
encode #
Encode text into model tokens. Inverse of `decode()`.
Parameters:

Name | Type | Description | Default
---|---|---|---
`text` | `str` | Text to be encoded. | required
Returns:

Type | Description
---|---
`list[int]` | A list of ints with the encoded tokens.
Source code in sibila/llamacpp.py
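The encode/decode pair is expected to round-trip: `decode(encode(text)) == text`. Here is a minimal character-level sketch of that contract; it is illustrative only, not sibila's implementation, which uses the model's subword vocabulary:

```python
# Character-level sketch of the encode/decode contract.
# NOT sibila's implementation -- real models use subword vocabularies.
def encode(text: str) -> list[int]:
    """Encode text into a list of token ids (here: Unicode code points)."""
    return [ord(ch) for ch in text]

def decode(token_ids: list[int]) -> str:
    """Decode token ids back to text. Inverse of encode()."""
    return "".join(chr(t) for t in token_ids)

tokens = encode("Hello")   # list of ints
text = decode(tokens)      # round-trips back to "Hello"
```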
decode #
Decode model tokens to text. Inverse of `encode()`.
Used instead of llama-cpp-python's decoding to fix an error: the first character after a bos token is removed only if it is a space.
Parameters:

Name | Type | Description | Default
---|---|---|---
`token_ids` | `list[int]` | List of model tokens. | required
`skip_special` | `bool` | Don't decode special tokens like bos and eos. Defaults to True. | `True`
Returns:

Type | Description
---|---
`str` | Decoded text.
Source code in sibila/llamacpp.py
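The bos-space fix described above can be sketched as follows. The bos token id and the raw decoder are illustrative assumptions, not sibila's actual internals:

```python
BOS_ID = 1  # assumed id for the bos token, for illustration only

def decode_fix(token_ids: list[int], raw_decode) -> str:
    """Decode token_ids via raw_decode, stripping the space that
    tokenizers insert after a bos token -- but only if it is a space,
    rather than unconditionally dropping the first character."""
    if token_ids and token_ids[0] == BOS_ID:
        text = raw_decode(token_ids[1:])
        if text.startswith(" "):
            text = text[1:]  # remove only an actual leading space
        return text
    return raw_decode(token_ids)
```

With this check, text that legitimately starts with a non-space character after bos is left intact, while the tokenizer-inserted space is still removed.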
OpenAI#
OpenAITokenizer #
Tokenizer for OpenAI models.
Source code in sibila/openai.py
encode #
Encode text into model tokens. Inverse of `decode()`.
decode #
Decode model tokens to text. Inverse of `encode()`.
Parameters:

Name | Type | Description | Default
---|---|---|---
`token_ids` | `list[int]` | List of model tokens. | required
`skip_special` | `bool` | Don't decode special tokens like bos and eos. Defaults to True. | `True`
Returns:

Type | Description
---|---
`str` | Decoded text.
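The effect of `skip_special` can be illustrated with a toy vocabulary. The ids and special-token strings below are assumptions for illustration, not any model's actual vocabulary:

```python
# Toy vocabulary; ids and special-token names are illustrative only.
VOCAB = {0: "<s>", 1: "Hello", 2: " world", 3: "</s>"}
SPECIAL_IDS = {0, 3}  # bos and eos

def toy_decode(token_ids: list[int], skip_special: bool = True) -> str:
    """Decode token ids to text, optionally dropping special tokens."""
    if skip_special:
        token_ids = [t for t in token_ids if t not in SPECIAL_IDS]
    return "".join(VOCAB[t] for t in token_ids)
```

With the default `skip_special=True`, bos/eos markers are removed from the output; passing `False` keeps them in the decoded text.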