caikit.interfaces.nlp.data_model.text

Data structures for text representations

Attributes

log

Classes

`Token`	Tokens here are the basic units of text.
`TokenizationResults`	Tokenization result generated from a text.
`TokenizationStreamResult`	Streaming tokenization result that indicates how far into the stream processing has progressed.
`ChunkerTokenizationStreamResult`	Streaming tokenization result that provides a pointer to the processed input chunk.

Module Contents

caikit.interfaces.nlp.data_model.text.log

class caikit.interfaces.nlp.data_model.text.Token

Bases: caikit.core.DataObjectBase

Tokens here are the basic units of text. Tokens can be characters, words, sub-words, or other segments of text or code, depending on the method of tokenization chosen or the task being implemented.

start: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(1)]

end: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(2)]

text: py_to_proto.dataclass_to_proto.Annotated[str, FieldNumber(3)]
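The three fields can be read as a character span plus the text it covers. The sketch below uses a plain `dataclasses` stand-in rather than the real `caikit` class, and it assumes that `start` is inclusive and `end` is exclusive (Python slice convention), so that `text[start:end] == token.text`. The `whitespace_tokenize` helper is hypothetical and only illustrates one possible tokenization method (word level).

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    """Hypothetical stand-in mirroring the Token fields (not the caikit class)."""

    start: int  # assumed: inclusive character offset into the source text
    end: int  # assumed: exclusive character offset
    text: str  # the substring covered by the token


def whitespace_tokenize(text: str) -> List[Token]:
    """Hypothetical word-level tokenizer: one Token per run of non-whitespace."""
    return [
        Token(start=m.start(), end=m.end(), text=m.group())
        for m in re.finditer(r"\S+", text)
    ]


source = "Tokens are units"
tokens = whitespace_tokenize(source)
for tok in tokens:
    # Under the exclusive-end assumption, the offsets slice back to the token text.
    assert source[tok.start:tok.end] == tok.text
```

A sub-word or character tokenizer would produce more, shorter spans over the same text; only the span boundaries change, not the shape of `Token`.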

class caikit.interfaces.nlp.data_model.text.TokenizationResults

Bases: caikit.core.DataObjectBase

Tokenization result generated from a text.

results: py_to_proto.dataclass_to_proto.Annotated[List[Token] | None, FieldNumber(1)]

token_count: py_to_proto.dataclass_to_proto.Annotated[int | None, FieldNumber(4)]
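Both fields are optional in the schema. A minimal sketch, assuming `results` holds the per-token spans and `token_count` their number, and that a caller who only needs the count may leave `results` as `None`. The stand-in classes, the `tokenize` helper, and its `return_tokens` flag are all hypothetical, not part of `caikit`.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    start: int
    end: int
    text: str


@dataclass
class TokenizationResults:
    """Hypothetical stand-in mirroring TokenizationResults (not the caikit class)."""

    results: Optional[List[Token]] = None
    token_count: Optional[int] = None


def tokenize(text: str, return_tokens: bool = True) -> TokenizationResults:
    """Hypothetical whitespace tokenizer that can omit the token list."""
    tokens = [Token(m.start(), m.end(), m.group()) for m in re.finditer(r"\S+", text)]
    return TokenizationResults(
        # Assumed usage: drop the (possibly large) token list when not requested.
        results=tokens if return_tokens else None,
        token_count=len(tokens),
    )


full = tokenize("a b c")
count_only = tokenize("a b c", return_tokens=False)
```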

class caikit.interfaces.nlp.data_model.text.TokenizationStreamResult

Bases: TokenizationResults

Streaming tokenization result that indicates how far into the stream processing has progressed.

processed_index: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(2)]

start_index: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(3)]
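One plausible reading of the two index fields, sketched below: `start_index` is the character offset in the concatenated stream where a result's processed text begins, and `processed_index` is the offset up to which text has been processed. Both meanings are assumptions, as are the stand-in class (which flattens the inherited `results` and `token_count` fields for brevity) and the hypothetical `stream_tokenize` generator. Text after the last whitespace is held back because it may be a partial word.

```python
import re
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List


@dataclass
class Token:
    start: int
    end: int
    text: str


@dataclass
class TokenizationStreamResult:
    """Hypothetical stand-in; the index semantics are assumptions."""

    results: List[Token] = field(default_factory=list)
    token_count: int = 0
    processed_index: int = 0  # assumed: stream offset up to which text is processed
    start_index: int = 0  # assumed: stream offset where this result's text begins


def _emit(buffer: str, offset: int, upto: int) -> TokenizationStreamResult:
    """Tokenize buffer[:upto], reporting offsets relative to the whole stream."""
    tokens = [
        Token(offset + m.start(), offset + m.end(), m.group())
        for m in re.finditer(r"\S+", buffer[:upto])
    ]
    return TokenizationStreamResult(tokens, len(tokens), offset + upto, offset)


def stream_tokenize(chunks: Iterable[str]) -> Iterator[TokenizationStreamResult]:
    """Hypothetical streaming whitespace tokenizer.

    Text after the last whitespace may be a partial word, so it is held back
    until a later chunk (or the end of the stream) completes it.
    """
    buffer = ""  # received but not yet processed text
    offset = 0  # stream offset of buffer[0]
    for chunk in chunks:
        buffer += chunk
        # Position just after the last whitespace character, or 0 if none.
        cut = max((m.end() for m in re.finditer(r"\s", buffer)), default=0)
        if cut:
            yield _emit(buffer, offset, cut)
            buffer, offset = buffer[cut:], offset + cut
    if buffer:
        # End of stream: the held-back text is now known to be complete.
        yield _emit(buffer, offset, len(buffer))


chunks = ["Hello wo", "rld foo", " bar"]
out = list(stream_tokenize(chunks))
```

Here the word `world` arrives split across the first two chunks and is emitted once, in the second result, with offsets into the joined stream.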

class caikit.interfaces.nlp.data_model.text.ChunkerTokenizationStreamResult

Bases: TokenizationStreamResult

Streaming tokenization result that provides a pointer to the processed input chunk.

input_start_index: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(20)]

input_end_index: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(21)]
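To make the chunk pointers concrete, the sketch below maps a result's character range in the concatenated stream back to the input chunks it came from. It assumes that `input_start_index` and `input_end_index` are zero-based positions of the first and last contributing input chunks (both inclusive), and that `start_index` and `processed_index` bound a non-empty, half-open character range. The `chunk_span` helper and the stand-in class are hypothetical, not part of `caikit`.

```python
import bisect
from dataclasses import dataclass
from itertools import accumulate
from typing import List, Tuple


@dataclass
class ChunkerTokenizationStreamResult:
    """Hypothetical stand-in carrying only the index fields (tokens omitted)."""

    start_index: int
    processed_index: int
    input_start_index: int  # assumed: first contributing input chunk, inclusive
    input_end_index: int  # assumed: last contributing input chunk, inclusive


def chunk_span(chunks: List[str], start_index: int, processed_index: int) -> Tuple[int, int]:
    """Hypothetical helper: map a stream character range to input chunk indices."""
    # Stream offset at which each chunk begins, e.g. [0, 8, 15] for the chunks below.
    starts = [0, *accumulate(len(c) for c in chunks)][:-1]
    first = bisect.bisect_right(starts, start_index) - 1
    last = bisect.bisect_right(starts, processed_index - 1) - 1
    return first, last


chunks = ["Hello wo", "rld foo", " bar"]
# (start_index, processed_index) pairs as a streaming tokenizer might report them.
spans = [(0, 6), (6, 12), (12, 16), (16, 19)]
annotated = [
    ChunkerTokenizationStreamResult(s, p, *chunk_span(chunks, s, p)) for s, p in spans
]
```

The second result (`"world "`) points at chunks 0 through 1, because its text began in the first input chunk and finished in the second.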