caikit.interfaces.nlp.data_model.text
=====================================

.. py:module:: caikit.interfaces.nlp.data_model.text

.. autoapi-nested-parse::

   Data structures for text representations.


Attributes
----------

.. autoapisummary::

   caikit.interfaces.nlp.data_model.text.log


Classes
-------

.. autoapisummary::

   caikit.interfaces.nlp.data_model.text.Token
   caikit.interfaces.nlp.data_model.text.TokenizationResults
   caikit.interfaces.nlp.data_model.text.TokenizationStreamResult
   caikit.interfaces.nlp.data_model.text.ChunkerTokenizationStreamResult


Module Contents
---------------

.. py:data:: log

.. py:class:: Token

   Bases: :py:obj:`caikit.core.DataObjectBase`

   Tokens are the basic units of text. A token can be a character, a word,
   a sub-word, or another segment of text or code, depending on the
   tokenization method chosen or the task being implemented.

   .. py:attribute:: start
      :type: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(1)]

   .. py:attribute:: end
      :type: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(2)]

   .. py:attribute:: text
      :type: py_to_proto.dataclass_to_proto.Annotated[str, FieldNumber(3)]


.. py:class:: TokenizationResults

   Bases: :py:obj:`caikit.core.DataObjectBase`

   Tokenization result generated from a text.

   .. py:attribute:: results
      :type: py_to_proto.dataclass_to_proto.Annotated[Optional[List[Token]], FieldNumber(1)]

   .. py:attribute:: token_count
      :type: py_to_proto.dataclass_to_proto.Annotated[Optional[int], FieldNumber(4)]


.. py:class:: TokenizationStreamResult

   Bases: :py:obj:`TokenizationResults`

   Streaming tokenization result that indicates how far into the stream
   has been processed.

   .. py:attribute:: processed_index
      :type: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(2)]

   .. py:attribute:: start_index
      :type: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(3)]


.. py:class:: ChunkerTokenizationStreamResult

   Bases: :py:obj:`TokenizationStreamResult`

   Streaming tokenization result that provides a pointer to the input chunk
   that was processed.

   .. py:attribute:: input_start_index
      :type: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(20)]

   .. py:attribute:: input_end_index
      :type: py_to_proto.dataclass_to_proto.Annotated[int, FieldNumber(21)]
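
The relationship between the fields documented above can be sketched in plain Python. This is an illustration only: the classes below are stand-in dataclasses that mirror the documented field names and are not the caikit classes. The helpers ``whitespace_tokenize`` and ``stream_tokenize`` are hypothetical. The sketch also assumes that ``start`` and ``end`` are character offsets with an exclusive ``end``, and that ``start_index`` and ``processed_index`` delimit the part of the stream covered by one result. None of these semantics is stated in this reference, so treat them as assumptions.

```python
# Illustrative stand-ins only: plain dataclasses that mirror the documented
# field names. They are NOT the caikit classes, and the offset semantics
# noted in the comments are assumptions.
import re
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional


@dataclass
class Token:
    start: int  # assumed: offset of the token's first character
    end: int    # assumed: exclusive end offset
    text: str


@dataclass
class TokenizationResults:
    results: Optional[List[Token]] = None
    token_count: Optional[int] = None


@dataclass
class TokenizationStreamResult(TokenizationResults):
    # Defaults exist only so the stand-in dataclass can inherit cleanly.
    processed_index: int = 0  # assumed: stream offset consumed so far
    start_index: int = 0      # assumed: stream offset where this result begins


def whitespace_tokenize(text: str, offset: int = 0) -> List[Token]:
    """Hypothetical helper: split on whitespace and keep character offsets."""
    return [
        Token(start=m.start() + offset, end=m.end() + offset, text=m.group())
        for m in re.finditer(r"\S+", text)
    ]


def stream_tokenize(chunks: Iterable[str]) -> Iterator[TokenizationStreamResult]:
    """Hypothetical helper: yield one stream result per incoming chunk."""
    position = 0
    for chunk in chunks:
        tokens = whitespace_tokenize(chunk, offset=position)
        chunk_end = position + len(chunk)
        yield TokenizationStreamResult(
            results=tokens,
            token_count=len(tokens),
            processed_index=chunk_end,
            start_index=position,
        )
        position = chunk_end


stream = list(stream_tokenize(["Hello world ", "from caikit"]))
```

Under these assumptions, each token's offsets refer to the concatenated stream rather than to the individual chunk, so ``full_text[token.start:token.end]`` recovers ``token.text`` for any token in any result.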