caikit.interfaces.nlp.data_model.text

Data structures for text representations

Attributes

log

Classes

Token

Tokens here are the basic units of text. Tokens can be characters, words, sub-words, or other segments of text or code.

TokenizationResults

Tokenization result generated from a text.

TokenizationStreamResult

Streaming tokenization result that indicates how far into the stream processing has progressed.

ChunkerTokenizationStreamResult

Streaming tokenization result that provides a pointer to the input chunk processed.

Module Contents

caikit.interfaces.nlp.data_model.text.log

class caikit.interfaces.nlp.data_model.text.Token

Bases: caikit.core.DataObjectBase

Tokens here are the basic units of text. Tokens can be characters, words, sub-words, or other segments of text or code, depending on the method of tokenization chosen or the task being implemented.

start: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(1)]
end: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(2)]
text: py_to_proto.dataclass_to_proto.Annotated[str, FieldNumber(3)]
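The relationship between the three fields can be sketched with a plain dataclass that mirrors them. This is a stand-in, not the caikit class itself; the helper `whitespace_tokens` is hypothetical, and treating `end` as an exclusive character offset is an assumption the reference above does not state.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    """Plain-dataclass stand-in mirroring the documented fields (not the caikit class)."""
    start: int
    end: int
    text: str


def whitespace_tokens(text: str) -> List[Token]:
    """Hypothetical helper: split on whitespace and record character offsets."""
    tokens = []
    pos = 0
    for word in text.split():
        start = text.index(word, pos)
        end = start + len(word)  # assumption: end is exclusive
        tokens.append(Token(start=start, end=end, text=word))
        pos = end
    return tokens


source = "Hello caikit world"
tokens = whitespace_tokens(source)
# Under the exclusive-end assumption, slicing the source recovers each token.
for tok in tokens:
    assert source[tok.start:tok.end] == tok.text
```

Under that assumption, `start` and `end` let a consumer map each token back to its span in the original string without re-tokenizing.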

class caikit.interfaces.nlp.data_model.text.TokenizationResults

Bases: caikit.core.DataObjectBase

Tokenization result generated from a text.

results: py_to_proto.dataclass_to_proto.Annotated[List[Token] | None, FieldNumber(1)]
token_count: py_to_proto.dataclass_to_proto.Annotated[int | None, FieldNumber(4)]
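Both fields are optional, so a consumer may need to handle either one being absent. The sketch below uses plain dataclasses as stand-ins for the documented types. The helper `count_tokens` and its fallback rule are assumptions for illustration, not documented caikit behavior.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    """Stand-in mirroring the documented Token fields."""
    start: int
    end: int
    text: str


@dataclass
class TokenizationResults:
    """Stand-in mirroring the documented optional fields."""
    results: Optional[List[Token]] = None
    token_count: Optional[int] = None


def count_tokens(res: TokenizationResults) -> int:
    """Hypothetical helper: prefer token_count, else fall back to len(results)."""
    if res.token_count is not None:
        return res.token_count
    return len(res.results or [])


with_tokens = TokenizationResults(results=[Token(0, 2, "hi")])
count_only = TokenizationResults(token_count=7)
empty = TokenizationResults()
```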

class caikit.interfaces.nlp.data_model.text.TokenizationStreamResult

Bases: TokenizationResults

Streaming tokenization result that indicates how far into the stream processing has progressed.

processed_index: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(2)]
start_index: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(3)]
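One way to read the two indices is as character offsets into the concatenated stream: `start_index` marks where a result begins and `processed_index` marks how far processing has reached. That reading is an assumption. The sketch below uses stand-in dataclasses and a hypothetical `stream_tokenize` generator, and it further assumes that no word straddles two chunks.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Token:
    """Stand-in mirroring the documented Token fields."""
    start: int
    end: int
    text: str


@dataclass
class TokenizationStreamResult:
    """Stand-in mirroring the documented stream-result fields."""
    results: List[Token]
    start_index: int
    processed_index: int


def stream_tokenize(chunks: Iterable[str]) -> Iterator[TokenizationStreamResult]:
    """Hypothetical generator: one result per chunk, with stream-wide offsets."""
    offset = 0  # characters consumed from earlier chunks
    for chunk in chunks:
        tokens = []
        pos = 0
        for word in chunk.split():
            start = chunk.index(word, pos)
            pos = start + len(word)
            tokens.append(Token(offset + start, offset + pos, word))
        yield TokenizationStreamResult(
            results=tokens,
            start_index=offset,
            processed_index=offset + len(chunk),
        )
        offset += len(chunk)


outputs = list(stream_tokenize(["Hello ", "caikit world"]))
```

With these inputs, the second result starts where the first one stopped, so a consumer can tell which part of the stream each result covers.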

class caikit.interfaces.nlp.data_model.text.ChunkerTokenizationStreamResult

Bases: TokenizationStreamResult

Streaming tokenization result that provides a pointer to the input chunk processed.

input_start_index: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(20)]
input_end_index: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(21)]
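The two extra fields can be read as positions in the sequence of input stream items, as opposed to character offsets. That interpretation, and treating `input_end_index` as inclusive, are assumptions. The sketch below uses stand-in dataclasses and a hypothetical `chunk_sentences` generator that buffers input pieces until one ends with a period, then emits a single result covering the pieces consumed.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Token:
    """Stand-in mirroring the documented Token fields."""
    start: int
    end: int
    text: str


@dataclass
class ChunkerTokenizationStreamResult:
    """Stand-in mirroring the documented chunker stream-result fields."""
    results: List[Token]
    start_index: int
    processed_index: int
    input_start_index: int
    input_end_index: int


def chunk_sentences(pieces: Iterable[str]) -> Iterator[ChunkerTokenizationStreamResult]:
    """Hypothetical chunker: emit one result each time a piece ends with '.'."""
    buffer = ""
    char_offset = 0   # characters already emitted
    first_piece = 0   # index of the first input piece in the current buffer
    for i, piece in enumerate(pieces):
        if not buffer:
            first_piece = i
        buffer += piece
        if piece.rstrip().endswith("."):
            end = char_offset + len(buffer)
            yield ChunkerTokenizationStreamResult(
                results=[Token(char_offset, end, buffer)],
                start_index=char_offset,
                processed_index=end,
                input_start_index=first_piece,
                input_end_index=i,  # assumption: inclusive index of last piece
            )
            char_offset = end
            buffer = ""


chunks = list(chunk_sentences(["Hi ", "there. ", "Bye."]))
```

Here the first result spans input pieces 0 through 1 and the second covers piece 2 alone, while `start_index` and `processed_index` keep tracking character positions.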