caikit.interfaces.ts.data_model
===============================

.. py:module:: caikit.interfaces.ts.data_model

.. autoapi-nested-parse::

   Data model definitions for structures in the time series domain


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/caikit/interfaces/ts/data_model/_single_timeseries/index
   /autoapi/caikit/interfaces/ts/data_model/backends/index
   /autoapi/caikit/interfaces/ts/data_model/package/index
   /autoapi/caikit/interfaces/ts/data_model/time_types/index
   /autoapi/caikit/interfaces/ts/data_model/timeseries/index
   /autoapi/caikit/interfaces/ts/data_model/timeseries_evaluation/index
   /autoapi/caikit/interfaces/ts/data_model/toolkit/index


Attributes
----------

.. autoapisummary::

   caikit.interfaces.ts.data_model.TS_PACKAGE


Classes
-------

.. autoapisummary::

   caikit.interfaces.ts.data_model.PeriodicTimeSequence
   caikit.interfaces.ts.data_model.PointTimeSequence
   caikit.interfaces.ts.data_model.Seconds
   caikit.interfaces.ts.data_model.TimeDuration
   caikit.interfaces.ts.data_model.TimePoint
   caikit.interfaces.ts.data_model.ValueSequence
   caikit.interfaces.ts.data_model.SingleTimeSeries
   caikit.interfaces.ts.data_model.TimeSeries
   caikit.interfaces.ts.data_model.Id
   caikit.interfaces.ts.data_model.EvaluationRecord
   caikit.interfaces.ts.data_model.EvaluationResult


Package Contents
----------------

.. py:data:: TS_PACKAGE
   :value: 'caikit_data_model.timeseries'


.. py:class:: PeriodicTimeSequence

   Bases: :py:obj:`caikit.core.DataObjectBase`

   A PeriodicTimeSequence represents an indefinite time sequence where ticks
   occur at a regular period


   .. py:attribute:: start_time
      :type: py_to_proto.dataclass_to_proto.Annotated[TimePoint, FieldNumber(1)]


   .. py:attribute:: period_length
      :type: py_to_proto.dataclass_to_proto.Annotated[TimeDuration, FieldNumber(2)]


.. py:class:: PointTimeSequence

   Bases: :py:obj:`caikit.core.DataObjectBase`

   A PointTimeSequence represents a finite sequence of time points that may or
   may not be evenly distributed in time


   .. py:attribute:: points
      :type: py_to_proto.dataclass_to_proto.Annotated[List[TimePoint], FieldNumber(1)]


.. py:class:: Seconds

   Bases: :py:obj:`caikit.core.DataObjectBase`

   A nanosecond value that can be interpreted as either a datetime or a
   timedelta


   .. py:attribute:: seconds
      :type: py_to_proto.dataclass_to_proto.Annotated[float, FieldNumber(1)]


   .. py:method:: as_datetime() -> datetime.datetime

      Return a python datetime object. The returned object will have
      timezone.utc set as its timezone info.


   .. py:method:: as_timedelta() -> datetime.timedelta

      Interpret these nanoseconds as a duration


   .. py:method:: from_datetime(time_point: datetime.datetime) -> Seconds
      :classmethod:

      Create a Seconds from a datetime


   .. py:method:: from_timedelta(time_delta: datetime.timedelta) -> Seconds
      :classmethod:

      Create a Seconds from a timedelta


.. py:class:: TimeDuration

   Bases: :py:obj:`caikit.core.DataObjectBase`

   The core data model object for a TimeDuration


   .. py:attribute:: time
      :type: Union[py_to_proto.dataclass_to_proto.Annotated[int, OneofField('dt_int'), FieldNumber(1)], py_to_proto.dataclass_to_proto.Annotated[float, OneofField('dt_float'), FieldNumber(2)], py_to_proto.dataclass_to_proto.Annotated[str, OneofField('dt_str'), FieldNumber(3)], py_to_proto.dataclass_to_proto.Annotated[Seconds, OneofField('dt_sec'), FieldNumber(4)]]


.. py:class:: TimePoint

   Bases: :py:obj:`caikit.core.DataObjectBase`

   The core data model object for a TimePoint


   .. py:attribute:: time
      :type: Union[py_to_proto.dataclass_to_proto.Annotated[int, OneofField('ts_int'), FieldNumber(1)], py_to_proto.dataclass_to_proto.Annotated[float, OneofField('ts_float'), FieldNumber(2)], py_to_proto.dataclass_to_proto.Annotated[Seconds, OneofField('ts_epoch'), FieldNumber(3)]]


.. py:class:: ValueSequence

   Bases: :py:obj:`caikit.core.DataObjectBase`

   A ValueSequence is a finite list of contiguous values, each representing the
   value of a given attribute for a specific observation within a TimeSeries


   .. py:class:: IntValueSequence

      Bases: :py:obj:`caikit.core.DataObjectBase`

      Nested value sequence of integers


      .. py:attribute:: values
         :type: py_to_proto.dataclass_to_proto.Annotated[List[int], FieldNumber(1)]


   .. py:class:: FloatValueSequence

      Bases: :py:obj:`caikit.core.DataObjectBase`

      Nested value sequence of floats


      .. py:attribute:: values
         :type: py_to_proto.dataclass_to_proto.Annotated[List[float], FieldNumber(1)]


   .. py:class:: StrValueSequence

      Bases: :py:obj:`caikit.core.DataObjectBase`

      Nested value sequence of strings


      .. py:attribute:: values
         :type: py_to_proto.dataclass_to_proto.Annotated[List[str], FieldNumber(1)]


   .. py:class:: VectorValueSequence

      Bases: :py:obj:`caikit.core.DataObjectBase`

      Nested value sequence of vectors


      .. py:attribute:: values
         :type: py_to_proto.dataclass_to_proto.Annotated[List[Vector], FieldNumber(1)]


      .. py:method:: __post_init__()


      .. py:method:: _convert_np_to_list(v)


      .. py:method:: to_dict()

         Convert to a dictionary representation.


      .. py:method:: fill_proto(proto)

         Populate a protobufs with the values from this data model object.

         Args:
             proto: A protocol buffer to be populated.

         Returns:
             protobufs: The filled protobufs.

         Notes:
             The protobufs is filled in place, so the argument and the return
             value are the same at the end of this call.


      .. py:method:: from_proto(proto)
         :classmethod:

         Build a DataBase from protobufs.

         Args:
             proto: A protocol buffer to serialize from.

         Returns:
             protobufs: A DataBase object.


   .. py:class:: TimePointSequence

      Bases: :py:obj:`caikit.core.DataObjectBase`

      Nested value sequence of TimePoints


      .. py:attribute:: values
         :type: py_to_proto.dataclass_to_proto.Annotated[List[str], FieldNumber(1)]


   .. py:class:: AnyValueSequence

      Bases: :py:obj:`caikit.core.DataObjectBase`

      Nested value sequence of Any objects


      .. py:attribute:: values
         :type: py_to_proto.dataclass_to_proto.Annotated[List[str], FieldNumber(1)]


      .. py:method:: decode_values(values: Tuple[str])
         :classmethod:

         Cached class method to enable caching of decoded representations


      .. py:method:: to_dict()

         Convert to a dictionary representation.


      .. py:method:: fill_proto(proto)

         Populate a protobufs with the values from this data model object.

         Args:
             proto: A protocol buffer to be populated.

         Returns:
             protobufs: The filled protobufs.

         Notes:
             The protobufs is filled in place, so the argument and the return
             value are the same at the end of this call.


      .. py:method:: from_proto(proto)
         :classmethod:

         Build a DataBase from protobufs.

         Args:
             proto: A protocol buffer to serialize from.

         Returns:
             protobufs: A DataBase object.


   .. py:attribute:: sequence
      :type: Union[py_to_proto.dataclass_to_proto.Annotated[ValueSequence.IntValueSequence, OneofField('val_int'), FieldNumber(1)], py_to_proto.dataclass_to_proto.Annotated[ValueSequence.FloatValueSequence, OneofField('val_float'), FieldNumber(2)], py_to_proto.dataclass_to_proto.Annotated[ValueSequence.StrValueSequence, OneofField('val_str'), FieldNumber(3)], py_to_proto.dataclass_to_proto.Annotated[ValueSequence.TimePointSequence, OneofField('val_timepoint'), FieldNumber(4)], py_to_proto.dataclass_to_proto.Annotated[ValueSequence.AnyValueSequence, OneofField('val_any'), FieldNumber(5)], py_to_proto.dataclass_to_proto.Annotated[ValueSequence.VectorValueSequence, OneofField('val_vector'), FieldNumber(6)]]


.. py:class:: SingleTimeSeries(*args, **kwargs)

   Bases: :py:obj:`caikit.core.DataObjectBase`

   The TimeSeries object is the central data container for the library.
   At present it wraps either a pandas.DataFrame or a pyspark.sql.DataFrame to
   bind into the caikit data model.


   .. py:class:: StringIDSequence

      Bases: :py:obj:`caikit.core.DataObjectBase`

      Nested value sequence of strings


      .. py:attribute:: values
         :type: py_to_proto.dataclass_to_proto.Annotated[List[str], FieldNumber(1)]


   .. py:class:: IntIDSequence

      Bases: :py:obj:`caikit.core.DataObjectBase`

      Nested value sequence of ints


      .. py:attribute:: values
         :type: py_to_proto.dataclass_to_proto.Annotated[List[int], FieldNumber(1)]


   .. py:attribute:: time_sequence
      :type: Union[py_to_proto.dataclass_to_proto.Annotated[caikit.interfaces.ts.data_model.time_types.PeriodicTimeSequence, OneofField('time_period'), FieldNumber(10)], py_to_proto.dataclass_to_proto.Annotated[caikit.interfaces.ts.data_model.time_types.PointTimeSequence, OneofField('time_points'), FieldNumber(20)]]


   .. py:attribute:: values
      :type: py_to_proto.dataclass_to_proto.Annotated[List[caikit.interfaces.ts.data_model.time_types.ValueSequence], FieldNumber(1)]


   .. py:attribute:: timestamp_label
      :type: py_to_proto.dataclass_to_proto.Annotated[str, FieldNumber(2)]


   .. py:attribute:: value_labels
      :type: py_to_proto.dataclass_to_proto.Annotated[List[str], FieldNumber(3)]


   .. py:attribute:: ids
      :type: Union[py_to_proto.dataclass_to_proto.Annotated[SingleTimeSeries.IntIDSequence, OneofField('id_int'), FieldNumber(30)], py_to_proto.dataclass_to_proto.Annotated[SingleTimeSeries.StringIDSequence, OneofField('id_str'), FieldNumber(40)]]


   .. py:attribute:: _DEFAULT_TS_COL
      :value: 'timestamp'


   .. py:method:: _get_pd_df() -> Tuple[pandas.DataFrame, str, Iterable[str]]

      Convert the data to a pandas DataFrame, efficiently if possible


   .. py:method:: __len__() -> int

      Return the length of the single time series object.

      Returns:
          int: Length


   .. py:method:: __eq__(other: SingleTimeSeries) -> bool

      Equivalence operator for SingleTimeSeries objects. Performs ordering of
      data based on timestamp_label prior to checking for equivalence. Relies
      on underlying pandas equivalence testing function
      `pd.testing.assert_frame_equal`.

      Args:
          other (SingleTimeSeries): SingleTimeSeries to test against.

      Returns:
          bool: True if the SingleTimeSeries are equivalent.


   .. py:method:: _as_pandas_ops(adf, include_timestamps: Union[None, bool] = False)

      Operate on a pandas-like object instead of strictly pandas


   .. py:method:: as_pandas(include_timestamps: Optional[bool] = None) -> pandas.DataFrame

      Get the view of this timeseries as a pandas DataFrame

      Args:
          include_timestamps (bool, optional): Control the addition or removal
              of timestamps. True will include timestamps, generating if
              needed, while False will remove timestamps. Use None to return
              what is available, leaving unchanged. Defaults to None.

      Returns:
          pd.DataFrame: The view of the data as a pandas DataFrame


   .. py:method:: as_spark(include_timestamps: Optional[bool] = None) -> caikit.interfaces.ts.data_model.toolkit.optional_dependencies.pyspark.sql.DataFrame

      Get the view of this timeseries as a spark DataFrame

      Args:
          include_timestamps (bool, optional): Control the addition or removal
              of timestamps. True will include timestamps, generating if
              needed, while False will remove timestamps. Use None to return
              what is available, leaving unchanged. Defaults to None.

      Returns:
          pyspark.sql.DataFrame: The view of the data as a spark DataFrame


.. py:class:: TimeSeries(*args, **kwargs)

   Bases: :py:obj:`caikit.core.DataObjectBase`

   A DataObject is a data model class that is backed by a @dataclass.

   Data model classes that use the @dataobject decorator must derive from this
   base class.


   .. py:attribute:: timeseries
      :type: List[caikit.interfaces.ts.data_model._single_timeseries.SingleTimeSeries]


   .. py:attribute:: id_labels
      :type: List[str]


   .. py:attribute:: producer_id
      :type: caikit.core.data_model.ProducerId


   .. py:attribute:: _DEFAULT_ID_COL
      :value: '_TS_RESERVED'


   .. py:attribute:: _DEFAULT_TS_COL
      :value: 'timestamp'


   .. py:method:: __len__() -> int

      Return the length of the time series object.

      Returns:
          int: Length


   .. py:method:: __eq__(other: TimeSeries) -> bool

      Equivalence operator for TimeSeries objects.

      Args:
          other (TimeSeries): TimeSeries to test against.

      Returns:
          bool: True if the TimeSeries are equivalent.


   .. py:method:: _get_pd_df() -> Tuple[pandas.DataFrame, Iterable[str], str, Iterable[str]]

      Convert the data to a pandas DataFrame, efficiently if possible


   .. py:method:: as_pandas(include_timestamps: Optional[bool] = None, is_multi: Optional[bool] = None) -> pandas.DataFrame

      Get the view of this timeseries as a pandas DataFrame

      Args:
          include_timestamps (bool, optional): Control the addition or removal
              of timestamps. True will include timestamps, generating if
              needed, while False will remove timestamps. Use None to return
              what is available, leaving unchanged. Defaults to None.
          is_multi (bool, optional): Controls how id_labels are handled in the
              output. If the id_labels are specified in the data model, they
              are always returned. If there are no id_labels specified,
              setting is_multi to True will add a new column with generated id
              labels (0), while False or None will not add any id_labels.

      Returns:
          pd.DataFrame: The view of the data as a pandas DataFrame


   .. py:method:: as_spark(include_timestamps: Optional[bool] = None, is_multi: Optional[bool] = None) -> caikit.interfaces.ts.data_model.toolkit.optional_dependencies.pyspark.sql.DataFrame

      Get the view of this timeseries as a spark DataFrame

      Args:
          include_timestamps (bool, optional): Control the addition or removal
              of timestamps. True will include timestamps, generating if
              needed, while False will remove timestamps. Use None to return
              what is available, leaving unchanged. Defaults to None.
          is_multi (bool, optional): Controls how id_labels are handled in the
              output. If the id_labels are specified in the data model, they
              are always returned. If there are no id_labels specified,
              setting is_multi to True will add a new column with generated id
              labels (0), while False or None will not add any id_labels.

      Returns:
          pyspark.sql.DataFrame: The view of the data as a spark DataFrame


.. py:class:: Id

   Bases: :py:obj:`caikit.core.DataObjectBase`

   A single instance of Id

   Representation of ids that can be either text or index.
   Customized this way to be able to work with repeated


   .. py:attribute:: value
      :type: Union[py_to_proto.dataclass_to_proto.Annotated[str, OneofField('text'), FieldNumber(1)], py_to_proto.dataclass_to_proto.Annotated[int, OneofField('index'), FieldNumber(2)]]


.. py:class:: EvaluationRecord(id_values=None, metric_values=None, offset=None)

   Bases: :py:obj:`caikit.core.DataObjectBase`

   A single EvaluationRecord for EvaluationResult

   Representation of EvaluationRecord for each row in the dataframe
   EvaluationRecord{id_values=["A", "B"], metric_values=[0.234, 0.568, 0.417], offset="overall"}


   .. py:attribute:: id_values
      :type: py_to_proto.dataclass_to_proto.Annotated[List[Id], FieldNumber(1)]


   .. py:attribute:: metric_values
      :type: py_to_proto.dataclass_to_proto.Annotated[List[float], FieldNumber(2)]


   .. py:attribute:: offset
      :type: py_to_proto.dataclass_to_proto.Annotated[Id, FieldNumber(3)]


.. py:class:: EvaluationResult(records=None, id_cols=None, metric_cols=None, offset_col=None, df=None, producer_id=None)

   Bases: :py:obj:`caikit.core.DataObjectBase`

   EvaluationResult containing the evaluation results

   Representation of EvaluationResult: stores rows of the dataframe as a list
   of records, plus string lists to keep track of id and metric columns


   .. py:attribute:: records
      :type: py_to_proto.dataclass_to_proto.Annotated[List[EvaluationRecord], FieldNumber(1)]


   .. py:attribute:: id_cols
      :type: py_to_proto.dataclass_to_proto.Annotated[List[str], FieldNumber(2)]


   .. py:attribute:: metric_cols
      :type: py_to_proto.dataclass_to_proto.Annotated[List[str], FieldNumber(3)]


   .. py:attribute:: offset_col
      :type: py_to_proto.dataclass_to_proto.Annotated[str, FieldNumber(4)]


   .. py:attribute:: producer_id
      :type: py_to_proto.dataclass_to_proto.Annotated[caikit.core.data_model.ProducerId, FieldNumber(5)]


   .. py:method:: as_pandas() -> pandas.DataFrame

      Generate and return a pandas DataFrame
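
The time types documented on this page (``Seconds`` and ``PeriodicTimeSequence``) can be illustrated with a minimal standard-library sketch. It does not import caikit: ``SecondsSketch`` and ``periodic_ticks`` are hypothetical stand-ins written for this example, and the choice to read the float as seconds since the Unix epoch is an assumption, not something this page states. The sketch only shows how a start time plus a regular period could yield an open-ended series of UTC ticks.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from itertools import islice
from typing import Iterator


@dataclass
class SecondsSketch:
    """Hypothetical stand-in mirroring the documented Seconds methods."""

    seconds: float

    def as_datetime(self) -> datetime:
        # Assumption: the value counts seconds since the Unix epoch.
        # The result carries timezone.utc, as the documentation describes.
        return datetime.fromtimestamp(self.seconds, tz=timezone.utc)

    def as_timedelta(self) -> timedelta:
        # Interpret the same value as a duration instead of a point in time.
        return timedelta(seconds=self.seconds)

    @classmethod
    def from_datetime(cls, time_point: datetime) -> "SecondsSketch":
        return cls(seconds=time_point.timestamp())

    @classmethod
    def from_timedelta(cls, time_delta: timedelta) -> "SecondsSketch":
        return cls(seconds=time_delta.total_seconds())


def periodic_ticks(start: SecondsSketch, period: SecondsSketch) -> Iterator[datetime]:
    """Yield start, start + period, start + 2 * period, ... without end."""
    k = 0
    while True:
        yield start.as_datetime() + k * period.as_timedelta()
        k += 1


# Take the first three ticks of a 15-minute periodic sequence.
start = SecondsSketch.from_datetime(datetime(2024, 1, 1, tzinfo=timezone.utc))
period = SecondsSketch.from_timedelta(timedelta(minutes=15))
first_three = list(islice(periodic_ticks(start, period), 3))
```

Because the sequence is indefinite, the generator never terminates on its own; callers bound it explicitly, here with ``itertools.islice``.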