caikit.interfaces.ts.data_model.backends.spark_util

Internal utilities supporting the Spark backend implementations.

Functions

iteritems_workaround(series[, force_list]) → Iterable

pyspark.pandas.Series objects do not support iteration.

mock_pd_groupby(a_df_like, by[, return_pandas_api])

Roughly mocks the behavior of pandas groupby, but on a Spark DataFrame.

Module Contents

caikit.interfaces.ts.data_model.backends.spark_util.iteritems_workaround(series: Any, force_list: bool = False) → Iterable

pyspark.pandas.Series objects do not support iteration. For native pandas.Series objects this function will be a no-op.

For pyspark.pandas.Series or other iterable objects, the function first tries to_numpy() (unless force_list is True); if that fails, it falls back to to_list().
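The fallback order described above can be sketched as follows. This is a minimal illustration, not caikit's actual implementation: the function name iterate_series_like and the stand-in class NonIterableSeries are hypothetical, and the stand-in returns a tuple from to_numpy() so that the example runs without NumPy or PySpark installed.

```python
from typing import Any, Iterable

try:  # pandas is optional for this sketch
    import pandas as pd
except ImportError:
    pd = None


def iterate_series_like(series: Any, force_list: bool = False) -> Iterable:
    """Hypothetical sketch of the documented fallback order (not caikit code)."""
    # Native pandas.Series objects already iterate, so return them unchanged.
    if pd is not None and isinstance(series, pd.Series):
        return series
    # Assumption: objects with no conversion helpers are passed through as-is.
    if not hasattr(series, "to_numpy") and not hasattr(series, "to_list"):
        return series
    if not force_list:
        try:
            return series.to_numpy()
        except Exception:
            pass  # fall through to the list conversion
    return series.to_list()


class NonIterableSeries:
    """Hypothetical stand-in for a series type that refuses iteration."""

    def __init__(self, values, fail_numpy: bool = False):
        self._values = list(values)
        self._fail_numpy = fail_numpy

    def __iter__(self):
        raise TypeError("iteration is not supported")

    def to_numpy(self):
        if self._fail_numpy:
            raise RuntimeError("simulated to_numpy failure")
        return tuple(self._values)  # a tuple stands in for an array here

    def to_list(self):
        return list(self._values)


for item in iterate_series_like(NonIterableSeries([1, 2, 3], fail_numpy=True)):
    print(item)
```

Passing force_list=True skips the to_numpy() attempt entirely, which may be useful when a plain Python list is wanted regardless of whether the array conversion would succeed.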

caikit.interfaces.ts.data_model.backends.spark_util.mock_pd_groupby(a_df_like, by: List[str], return_pandas_api=False)

Roughly mocks the behavior of pandas groupby, but on a Spark DataFrame.
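For orientation, iterating over a pandas groupby yields (key, group) pairs, with keys in sorted order by default. The sketch below reproduces that contract on plain Python dictionaries. It is an assumption that this iteration contract is the behavior being mocked; the helper group_rows is hypothetical, is not part of caikit, and always uses tuple keys for simplicity.

```python
from collections import defaultdict
from typing import Any, Dict, Iterator, List, Tuple

Row = Dict[str, Any]


def group_rows(rows: List[Row], by: List[str]) -> Iterator[Tuple[tuple, List[Row]]]:
    """Hypothetical helper: yield (key, rows) pairs, one per distinct key."""
    buckets: Dict[tuple, List[Row]] = defaultdict(list)
    for row in rows:
        # The key is the tuple of values in the grouping columns.
        key = tuple(row[col] for col in by)
        buckets[key].append(row)
    # Emit groups in sorted key order, as pandas groupby does by default.
    for key in sorted(buckets):
        yield key, buckets[key]


rows = [
    {"id": "b", "value": 1},
    {"id": "a", "value": 2},
    {"id": "b", "value": 3},
]
for key, group in group_rows(rows, by=["id"]):
    print(key, [r["value"] for r in group])
```

Within each group the rows keep their original relative order, which also matches pandas groupby.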