caikit.interfaces.ts.data_model

Data model definitions for structures in the time series domain

Attributes

TS_PACKAGE

Classes

`PeriodicTimeSequence`	A PeriodicTimeSequence represents an indefinite time sequence where ticks occur at a regular period
`PointTimeSequence`	A PointTimeSequence represents a finite sequence of time points that may or may not be evenly distributed in time
`Seconds`	A value in seconds that can be interpreted as either a datetime or a timedelta
`TimeDuration`	The core data model object for a TimeDuration
`TimePoint`	The core data model object for a TimePoint
`ValueSequence`	A ValueSequence is a finite list of contiguous values, each representing the value of a given attribute for a specific observation within a TimeSeries
`SingleTimeSeries`	The TimeSeries object is the central data container for the library.
`TimeSeries`	A DataObject is a data model class that is backed by a @dataclass.
`Id`	A single instance of Id
`EvaluationRecord`	A single EvaluationRecord for EvaluationResult
`EvaluationResult`	EvaluationResult containing the evaluation results

Package Contents

caikit.interfaces.ts.data_model.TS_PACKAGE = 'caikit_data_model.timeseries'

class caikit.interfaces.ts.data_model.PeriodicTimeSequence[source]

Bases: caikit.core.DataObjectBase

A PeriodicTimeSequence represents an indefinite time sequence where ticks occur at a regular period

start_time: py_to_proto.dataclass_to_proto.Annotated[TimePoint, FieldNumber(1)]

period_length: py_to_proto.dataclass_to_proto.Annotated[TimeDuration, FieldNumber(2)]
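As a rough illustration of what "ticks at a regular period" means, the sketch below generates the first few ticks from a start time and a period using only the standard library. `periodic_ticks` is a hypothetical helper, not part of caikit, and it works on raw `datetime` and `timedelta` values rather than on `TimePoint` and `TimeDuration` objects.

```python
from datetime import datetime, timedelta, timezone
from itertools import count, islice
from typing import Iterator


def periodic_ticks(start: datetime, period: timedelta) -> Iterator[datetime]:
    """Yield start, start + period, start + 2 * period, ... without end."""
    for i in count():
        yield start + i * period


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
# The sequence is indefinite, so take only a finite slice of it.
first_three = list(islice(periodic_ticks(start, timedelta(hours=1)), 3))
```

Because the sequence never ends, a caller has to bound it explicitly, here with `islice`.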

class caikit.interfaces.ts.data_model.PointTimeSequence[source]

Bases: caikit.core.DataObjectBase

A PointTimeSequence represents a finite sequence of time points that may or may not be evenly distributed in time

points: py_to_proto.dataclass_to_proto.Annotated[List[TimePoint], FieldNumber(1)]

class caikit.interfaces.ts.data_model.Seconds[source]

Bases: caikit.core.DataObjectBase

A value in seconds, stored as a float, that can be interpreted as either a datetime or a timedelta

seconds: py_to_proto.dataclass_to_proto.Annotated[float, FieldNumber(1)]

as_datetime() → datetime.datetime[source]: Return a Python datetime object. The returned object will have timezone.utc set as its timezone info.

as_timedelta() → datetime.timedelta[source]: Interpret this value as a duration

classmethod from_datetime(time_point: datetime.datetime) → Seconds[source]: Create a Seconds from a datetime

classmethod from_timedelta(time_delta: datetime.timedelta) → Seconds[source]: Create a Seconds from a timedelta
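The four conversion methods can be pictured with a small standard-library stand-in. This is a sketch under assumptions, not the caikit implementation: `SecondsSketch` is a hypothetical class, and it assumes the float `seconds` field counts seconds since the Unix epoch when read as a datetime.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class SecondsSketch:
    """Hypothetical stand-in for Seconds; not the caikit class."""

    seconds: float

    def as_datetime(self) -> datetime:
        # Assumption: the value counts seconds since the Unix epoch.
        return datetime.fromtimestamp(self.seconds, tz=timezone.utc)

    def as_timedelta(self) -> timedelta:
        return timedelta(seconds=self.seconds)

    @classmethod
    def from_datetime(cls, time_point: datetime) -> "SecondsSketch":
        return cls(seconds=time_point.timestamp())

    @classmethod
    def from_timedelta(cls, time_delta: timedelta) -> "SecondsSketch":
        return cls(seconds=time_delta.total_seconds())


moment = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
round_trip = SecondsSketch.from_datetime(moment).as_datetime()
ninety = SecondsSketch.from_timedelta(timedelta(minutes=1, seconds=30))
```

The same stored number serves both readings; which one applies depends on whether the value is used as a point in time or as a duration.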

class caikit.interfaces.ts.data_model.TimeDuration[source]

Bases: caikit.core.DataObjectBase

The core data model object for a TimeDuration

time: py_to_proto.dataclass_to_proto.Annotated[int, OneofField('dt_int'), FieldNumber(1)] | py_to_proto.dataclass_to_proto.Annotated[float, OneofField('dt_float'), FieldNumber(2)] | py_to_proto.dataclass_to_proto.Annotated[str, OneofField('dt_str'), FieldNumber(3)] | py_to_proto.dataclass_to_proto.Annotated[Seconds, OneofField('dt_sec'), FieldNumber(4)]

class caikit.interfaces.ts.data_model.TimePoint[source]

Bases: caikit.core.DataObjectBase

The core data model object for a TimePoint

time: py_to_proto.dataclass_to_proto.Annotated[int, OneofField('ts_int'), FieldNumber(1)] | py_to_proto.dataclass_to_proto.Annotated[float, OneofField('ts_float'), FieldNumber(2)] | py_to_proto.dataclass_to_proto.Annotated[Seconds, OneofField('ts_epoch'), FieldNumber(3)]
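The `time` fields of `TimeDuration` and `TimePoint` are declared as unions of `OneofField` variants, so a single value is carried by exactly one named variant. The sketch below mimics that selection for `TimePoint` with a plain type check. `timepoint_variant` and `SecondsStandIn` are hypothetical names, and the type-based dispatch is an assumption made for illustration; only the variant names come from the signature above.

```python
from typing import Union


class SecondsStandIn:
    """Placeholder for the Seconds data object (hypothetical)."""

    def __init__(self, seconds: float) -> None:
        self.seconds = seconds


def timepoint_variant(value: Union[int, float, SecondsStandIn]) -> str:
    """Return the name of the oneof variant that would carry `value`."""
    if isinstance(value, int):
        return "ts_int"
    if isinstance(value, float):
        return "ts_float"
    if isinstance(value, SecondsStandIn):
        return "ts_epoch"
    raise TypeError(f"unsupported time value: {type(value).__name__}")


variants = [
    timepoint_variant(3),
    timepoint_variant(3.5),
    timepoint_variant(SecondsStandIn(1.0)),
]
```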

class caikit.interfaces.ts.data_model.ValueSequence[source]

Bases: caikit.core.DataObjectBase

A ValueSequence is a finite list of contiguous values, each representing the value of a given attribute for a specific observation within a TimeSeries

class IntValueSequence[source]

Bases: caikit.core.DataObjectBase

Nested value sequence of integers

values: py_to_proto.dataclass_to_proto.Annotated[List[int], FieldNumber(1)]

class FloatValueSequence[source]

Bases: caikit.core.DataObjectBase

Nested value sequence of floats

values: py_to_proto.dataclass_to_proto.Annotated[List[float], FieldNumber(1)]

class StrValueSequence[source]

Bases: caikit.core.DataObjectBase

Nested value sequence of strings

values: py_to_proto.dataclass_to_proto.Annotated[List[str], FieldNumber(1)]

class VectorValueSequence[source]

Bases: caikit.core.DataObjectBase

Nested value sequence of vectors

values: py_to_proto.dataclass_to_proto.Annotated[List[Vector], FieldNumber(1)]

__post_init__()[source]

_convert_np_to_list(v)[source]

to_dict()[source]: Convert to a dictionary representation.

fill_proto(proto)[source]

Populate a protobuf message with the values from this data model object.

Args: proto: A protocol buffer message to be populated.
Returns: protobufs: The filled protobuf message.
Notes: The protobuf message is filled in place, so the argument and the return value are the same object at the end of this call.

classmethod from_proto(proto)[source]

Build a DataBase from a protobuf message.

Args: proto: A protocol buffer message to deserialize from.
Returns: DataBase: The data model object built from proto.

class TimePointSequence[source]

Bases: caikit.core.DataObjectBase

Nested value sequence of TimePoints

values: py_to_proto.dataclass_to_proto.Annotated[List[str], FieldNumber(1)]

class AnyValueSequence[source]

Bases: caikit.core.DataObjectBase

Nested value sequence of Any objects

values: py_to_proto.dataclass_to_proto.Annotated[List[str], FieldNumber(1)]

classmethod decode_values(values: Tuple[str])[source]: Cached class method to enable caching of decoded representations
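One way to read the `Tuple[str]` parameter is that a tuple is hashable and can therefore serve as a cache key, which a list cannot. The sketch below shows that pattern with `functools.lru_cache`. It is not the caikit implementation: `AnyValuesSketch` is a hypothetical class, and decoding each string with `json.loads` is an assumption made for illustration.

```python
import json
from functools import lru_cache
from typing import Any, List, Tuple


class AnyValuesSketch:
    """Hypothetical stand-in showing a cached decoding class method."""

    @classmethod
    @lru_cache(maxsize=128)
    def decode_values(cls, values: Tuple[str, ...]) -> List[Any]:
        # Assumption: each element is a JSON-encoded value.
        return [json.loads(value) for value in values]


encoded = ("1", '"a"', "[2, 3]")
first = AnyValuesSketch.decode_values(encoded)
second = AnyValuesSketch.decode_values(encoded)  # served from the cache
```

Note that a cached call returns the very same list object, so callers of such a method should treat the result as read-only.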

to_dict()[source]: Convert to a dictionary representation.

fill_proto(proto)[source]

Populate a protobuf message with the values from this data model object.

Args: proto: A protocol buffer message to be populated.
Returns: protobufs: The filled protobuf message.
Notes: The protobuf message is filled in place, so the argument and the return value are the same object at the end of this call.

classmethod from_proto(proto)[source]

Build a DataBase from a protobuf message.

Args: proto: A protocol buffer message to deserialize from.
Returns: DataBase: The data model object built from proto.

sequence: py_to_proto.dataclass_to_proto.Annotated[ValueSequence.IntValueSequence, OneofField('val_int'), FieldNumber(1)] | py_to_proto.dataclass_to_proto.Annotated[ValueSequence.FloatValueSequence, OneofField('val_float'), FieldNumber(2)] | py_to_proto.dataclass_to_proto.Annotated[ValueSequence.StrValueSequence, OneofField('val_str'), FieldNumber(3)] | py_to_proto.dataclass_to_proto.Annotated[ValueSequence.TimePointSequence, OneofField('val_timepoint'), FieldNumber(4)] | py_to_proto.dataclass_to_proto.Annotated[ValueSequence.AnyValueSequence, OneofField('val_any'), FieldNumber(5)] | py_to_proto.dataclass_to_proto.Annotated[ValueSequence.VectorValueSequence, OneofField('val_vector'), FieldNumber(6)]
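Since each `ValueSequence` holds the values of one attribute across observations, a set of sequences can be pictured as columns of a table. The sketch below pairs labels with columns and rebuilds one record per observation. `rows_from_columns` is a hypothetical helper working on plain lists, and pairing labels with sequences by position is an assumption.

```python
from typing import Any, Dict, List, Sequence


def rows_from_columns(
    labels: Sequence[str], columns: Sequence[Sequence[Any]]
) -> List[Dict[str, Any]]:
    """Pair each label with its column and emit one dict per observation."""
    if len(labels) != len(columns):
        raise ValueError("need exactly one column per label")
    lengths = {len(column) for column in columns}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same length")
    # zip(*columns) walks the columns in lockstep, one observation at a time.
    return [dict(zip(labels, row)) for row in zip(*columns)]


rows = rows_from_columns(
    ["temperature", "status"],
    [[20.5, 21.0, 19.8], ["ok", "ok", "low"]],
)
```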

class caikit.interfaces.ts.data_model.SingleTimeSeries(*args, **kwargs)[source]

Bases: caikit.core.DataObjectBase

The TimeSeries object is the central data container for the library. At present it wraps either a pandas.DataFrame or a pyspark.sql.DataFrame to bind it into the caikit data model.

class StringIDSequence[source]

Bases: caikit.core.DataObjectBase

Nested value sequence of strings

values: py_to_proto.dataclass_to_proto.Annotated[List[str], FieldNumber(1)]

class IntIDSequence[source]

Bases: caikit.core.DataObjectBase

Nested value sequence of ints

values: py_to_proto.dataclass_to_proto.Annotated[List[int], FieldNumber(1)]

time_sequence: py_to_proto.dataclass_to_proto.Annotated[caikit.interfaces.ts.data_model.time_types.PeriodicTimeSequence, OneofField('time_period'), FieldNumber(10)] | py_to_proto.dataclass_to_proto.Annotated[caikit.interfaces.ts.data_model.time_types.PointTimeSequence, OneofField('time_points'), FieldNumber(20)]

values: py_to_proto.dataclass_to_proto.Annotated[List[caikit.interfaces.ts.data_model.time_types.ValueSequence], FieldNumber(1)]

timestamp_label: py_to_proto.dataclass_to_proto.Annotated[str, FieldNumber(2)]

value_labels: py_to_proto.dataclass_to_proto.Annotated[List[str], FieldNumber(3)]

ids: py_to_proto.dataclass_to_proto.Annotated[SingleTimeSeries.IntIDSequence, OneofField('id_int'), FieldNumber(30)] | py_to_proto.dataclass_to_proto.Annotated[SingleTimeSeries.StringIDSequence, OneofField('id_str'), FieldNumber(40)]

_DEFAULT_TS_COL = 'timestamp'

_get_pd_df() → Tuple[pandas.DataFrame, str, Iterable[str]][source]: Convert the data to a pandas DataFrame, efficiently if possible

__len__() → int[source]

Return the length of the single time series object.

Returns: int: Length

__eq__(other: SingleTimeSeries) → bool[source]

Equivalence operator for SingleTimeSeries objects.

Sorts the data by timestamp_label before checking for equivalence. Relies on the pandas testing function pd.testing.assert_frame_equal.

Args: other (SingleTimeSeries): SingleTimeSeries to test against.
Returns: bool: True if the SingleTimeSeries are equivalent.
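The idea of ordering by timestamp before comparing can be sketched without pandas. The example below swaps in plain list comparison for `pd.testing.assert_frame_equal` so that it stays dependency-free; `equal_after_sorting` is a hypothetical helper and does not reproduce the tolerance or dtype checks that the pandas function performs.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def equal_after_sorting(
    left: List[Row], right: List[Row], timestamp_label: str
) -> bool:
    """Compare two row lists after ordering each by its timestamp column."""

    def by_time(row: Row) -> Any:
        return row[timestamp_label]

    return sorted(left, key=by_time) == sorted(right, key=by_time)


a = [{"timestamp": 2, "value": 0.5}, {"timestamp": 1, "value": 0.1}]
b = [{"timestamp": 1, "value": 0.1}, {"timestamp": 2, "value": 0.5}]
c = [{"timestamp": 1, "value": 0.1}, {"timestamp": 2, "value": 9.9}]
same = equal_after_sorting(a, b, "timestamp")
different = equal_after_sorting(a, c, "timestamp")
```

Sorting first means two series with the same observations in a different row order still compare as equal.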

_as_pandas_ops(adf, include_timestamps: None | bool = False)[source]: Operate on a pandas-like object rather than strictly a pandas DataFrame

as_pandas(include_timestamps: bool | None = None) → pandas.DataFrame[source]

Get the view of this timeseries as a pandas DataFrame

Args: include_timestamps (bool, optional): Controls the addition or removal of timestamps. True includes timestamps, generating them if needed, while False removes them. Use None to return what is available, leaving it unchanged. Defaults to None.
Returns: pd.DataFrame: The view of the data as a pandas DataFrame
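The three states of `include_timestamps` can be sketched on a plain dictionary of columns. `apply_include_timestamps` is a hypothetical helper, not the library's code, and the way missing timestamps are generated here (0, 1, 2, ...) is purely an assumption.

```python
from typing import Dict, List, Optional

Columns = Dict[str, List[float]]


def apply_include_timestamps(
    columns: Columns,
    include_timestamps: Optional[bool] = None,
    timestamp_label: str = "timestamp",
) -> Columns:
    """Return a copy of `columns` with the timestamp column handled per the flag."""
    result = dict(columns)
    if include_timestamps is None:
        return result  # leave whatever is available unchanged
    if include_timestamps is False:
        result.pop(timestamp_label, None)  # drop timestamps if present
        return result
    if timestamp_label not in result:
        # Assumption: generated timestamps are simply 0, 1, 2, ...
        n_rows = len(next(iter(result.values()), []))
        result[timestamp_label] = [float(i) for i in range(n_rows)]
    return result


data = {"value": [1.0, 2.0, 3.0]}
with_ts = apply_include_timestamps(data, True)
unchanged = apply_include_timestamps(data)
without_ts = apply_include_timestamps({"timestamp": [5.0], "value": [1.0]}, False)
```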

as_spark(include_timestamps: bool | None = None) → caikit.interfaces.ts.data_model.toolkit.optional_dependencies.pyspark.sql.DataFrame[source]

Get the view of this timeseries as a Spark DataFrame

Args: include_timestamps (bool, optional): Controls the addition or removal of timestamps. True includes timestamps, generating them if needed, while False removes them. Use None to return what is available, leaving it unchanged. Defaults to None.
Returns: pyspark.sql.DataFrame: The view of the data as a Spark DataFrame

class caikit.interfaces.ts.data_model.TimeSeries(*args, **kwargs)[source]

Bases: caikit.core.DataObjectBase

A DataObject is a data model class that is backed by a @dataclass.

Data model classes that use the @dataobject decorator must derive from this base class.

timeseries: List[caikit.interfaces.ts.data_model._single_timeseries.SingleTimeSeries]

id_labels: List[str]

producer_id: caikit.core.data_model.ProducerId

_DEFAULT_ID_COL = '_TS_RESERVED'

_DEFAULT_TS_COL = 'timestamp'

__len__() → int[source]

Return the length of the time series object.

Returns: int: Length

__eq__(other: TimeSeries) → bool[source]

Equivalence operator for TimeSeries objects.

Args: other (TimeSeries): TimeSeries to test against.
Returns: bool: True if the TimeSeries are equivalent.

_get_pd_df() → Tuple[pandas.DataFrame, Iterable[str], str, Iterable[str]][source]: Convert the data to a pandas DataFrame, efficiently if possible

as_pandas(include_timestamps: bool | None = None, is_multi: bool | None = None) → pandas.DataFrame[source]

Get the view of this timeseries as a pandas DataFrame

Args:

include_timestamps (bool, optional): Controls the addition or removal of timestamps. True includes timestamps, generating them if needed, while False removes them. Use None to return what is available, leaving it unchanged. Defaults to None.

is_multi (bool, optional): Controls how id_labels are handled in the output. If the id_labels are specified in the data model, they are always returned. If there are no id_labels specified, setting is_multi to True will add a new column with generated id labels (0), while False or None will not add any id_labels.

Returns:

pd.DataFrame: The view of the data as a pandas DataFrame
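The `is_multi` rules described above can be sketched on a dictionary of columns. `apply_is_multi` is a hypothetical helper; naming the generated column after `_DEFAULT_ID_COL` (`'_TS_RESERVED'`) is an assumption, while the generated id value 0 follows the description.

```python
from typing import Any, Dict, List, Optional, Sequence

Columns = Dict[str, List[Any]]


def apply_is_multi(
    columns: Columns,
    id_labels: Sequence[str],
    is_multi: Optional[bool] = None,
    default_id_col: str = "_TS_RESERVED",
) -> Columns:
    """Return a copy of `columns`, adding a generated id column when asked."""
    result = dict(columns)
    if id_labels:
        return result  # id columns declared in the data model are always kept
    if is_multi:
        # No id labels were declared: add one column of generated ids (all 0).
        n_rows = len(next(iter(result.values()), []))
        result[default_id_col] = [0] * n_rows
    return result


single = {"value": [1.0, 2.0]}
generated = apply_is_multi(single, [], True)
plain = apply_is_multi(single, [], None)
labeled = apply_is_multi({"site": ["a", "b"], "value": [1.0, 2.0]}, ["site"], True)
```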

as_spark(include_timestamps: bool | None = None, is_multi: bool | None = None) → caikit.interfaces.ts.data_model.toolkit.optional_dependencies.pyspark.sql.DataFrame[source]

Get the view of this timeseries as a Spark DataFrame

Args:

include_timestamps (bool, optional): Controls the addition or removal of timestamps. True includes timestamps, generating them if needed, while False removes them. Use None to return what is available, leaving it unchanged. Defaults to None.

is_multi (bool, optional): Controls how id_labels are handled in the output. If the id_labels are specified in the data model, they are always returned. If there are no id_labels specified, setting is_multi to True will add a new column with generated id labels (0), while False or None will not add any id_labels.

Returns:

pyspark.sql.DataFrame: The view of the data as a Spark DataFrame

class caikit.interfaces.ts.data_model.Id[source]

Bases: caikit.core.DataObjectBase

A single instance of Id. Representation of ids that can be either text or index. Customized this way to be able to work with repeated

value: py_to_proto.dataclass_to_proto.Annotated[str, OneofField('text'), FieldNumber(1)] | py_to_proto.dataclass_to_proto.Annotated[int, OneofField('index'), FieldNumber(2)]

class caikit.interfaces.ts.data_model.EvaluationRecord(id_values=None, metric_values=None, offset=None)[source]

Bases: caikit.core.DataObjectBase

A single EvaluationRecord for EvaluationResult. Represents one row of the dataframe, for example: EvaluationRecord{id_values=["A", "B"], metric_values=[0.234, 0.568, 0.417], offset="overall"}

id_values: py_to_proto.dataclass_to_proto.Annotated[List[Id], FieldNumber(1)]

metric_values: py_to_proto.dataclass_to_proto.Annotated[List[float], FieldNumber(2)]

offset: py_to_proto.dataclass_to_proto.Annotated[Id, FieldNumber(3)]

class caikit.interfaces.ts.data_model.EvaluationResult(records=None, id_cols=None, metric_cols=None, offset_col=None, df=None, producer_id=None)[source]

Bases: caikit.core.DataObjectBase

EvaluationResult containing the evaluation results. It stores the rows of the dataframe as a list of records, together with string lists that keep track of the id and metric columns.

records: py_to_proto.dataclass_to_proto.Annotated[List[EvaluationRecord], FieldNumber(1)]

id_cols: py_to_proto.dataclass_to_proto.Annotated[List[str], FieldNumber(2)]

metric_cols: py_to_proto.dataclass_to_proto.Annotated[List[str], FieldNumber(3)]

offset_col: py_to_proto.dataclass_to_proto.Annotated[str, FieldNumber(4)]

producer_id: py_to_proto.dataclass_to_proto.Annotated[caikit.core.data_model.ProducerId, FieldNumber(5)]

as_pandas() → pandas.DataFrame[source]: Generate and return a pandas DataFrame
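How a list of records plus the column-name lists could be turned back into table rows is sketched below with plain Python. `RecordSketch` and `records_to_rows` are hypothetical names, the column names are made up, and the column order (ids, then metrics, then offset) is an assumption; the record values reuse the example from the EvaluationRecord description.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Sequence, Union

IdValue = Union[str, int]


@dataclass
class RecordSketch:
    """Hypothetical stand-in for EvaluationRecord using plain values."""

    id_values: List[IdValue]
    metric_values: List[float]
    offset: IdValue


def records_to_rows(
    records: Sequence[RecordSketch],
    id_cols: Sequence[str],
    metric_cols: Sequence[str],
    offset_col: str,
) -> List[Dict[str, Any]]:
    """Build one dict per record, keyed by the tracked column names."""
    rows = []
    for record in records:
        row: Dict[str, Any] = dict(zip(id_cols, record.id_values))
        row.update(zip(metric_cols, record.metric_values))
        row[offset_col] = record.offset
        rows.append(row)
    return rows


record = RecordSketch(["A", "B"], [0.234, 0.568, 0.417], "overall")
rows = records_to_rows([record], ["id1", "id2"], ["m1", "m2", "m3"], "offset")
```

A list of dictionaries in this shape is one of the inputs that the `pandas.DataFrame` constructor accepts, which is why the records-plus-column-names layout maps naturally onto a dataframe.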