caikit.interfaces.ts.data_model.toolkit.sparkconf

Defines functions for obtaining Spark configurations.

Attributes

WE_HAVE_PYSPARK

Functions

`sparkconf_local`([master, executor_memory, ...])	Returns a SparkConf object configured for spark-local operation.
`sparkconf_k8s`(app_name, namespace, executor_image, ...)	Returns a Spark configuration object for use on a Kubernetes cluster.

Module Contents

caikit.interfaces.ts.data_model.toolkit.sparkconf.WE_HAVE_PYSPARK

caikit.interfaces.ts.data_model.toolkit.sparkconf.sparkconf_local(master: str = 'local[2]', executor_memory: str = '2g', driver_memory: str = '2g', app_name: str = 'unnamed', **kwargs)[source]

Returns a SparkConf object configured for spark-local operation.

Args:

master (str, optional): The Spark master specification. Defaults to “local[2]”.
executor_memory (str, optional): Executor memory. Defaults to “2g”.
driver_memory (str, optional): Driver memory. Defaults to “2g”.
app_name (str, optional): Spark application name. Defaults to “unnamed”.
kwargs: Pass-through key/value arguments that will be added to the Spark configuration.

Returns:

SparkConf: A Spark configuration object.
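
As a rough illustration of what a spark-local configuration amounts to, the sketch below collects the corresponding standard Spark properties in a plain dictionary. The helper name `local_spark_properties` and the mapping of arguments to property keys are assumptions made for illustration; the actual `sparkconf_local` implementation may set different or additional properties.

```python
from typing import Dict


def local_spark_properties(
    master: str = "local[2]",
    executor_memory: str = "2g",
    driver_memory: str = "2g",
    app_name: str = "unnamed",
    **kwargs: str,
) -> Dict[str, str]:
    """Hypothetical helper mirroring the sparkconf_local signature.

    Returns standard Spark property keys rather than a SparkConf, so it
    runs without PySpark installed.
    """
    props = {
        "spark.master": master,
        "spark.app.name": app_name,
        "spark.executor.memory": executor_memory,
        "spark.driver.memory": driver_memory,
    }
    # Pass-through entries are applied last, so they override the defaults above.
    props.update(kwargs)
    return props


# Dotted property names are not valid Python identifiers, so they are
# passed by unpacking a dictionary.
local_props = local_spark_properties(app_name="demo", **{"spark.ui.enabled": "false"})
```

With PySpark installed, a dictionary like this can be turned into a configuration object with `SparkConf().setAll(list(local_props.items()))`.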

caikit.interfaces.ts.data_model.toolkit.sparkconf.sparkconf_k8s(app_name: str, namespace: str, executor_image: str, driver_image: str, master: str = 'k8s://https://kubernetes.default.svc:443', num_executors: str = '2', executor_memory: str = '1g', executor_cores: str = '2', driver_memory: str = '1g', driver_cores: str = '2', pvc_mount_path: str | None = None, pvc_claim_name: str | None = None, python_path: str | None = None, k8s_service_account: str | None = None, **kwargs)[source]

Returns a Spark configuration object for use on a Kubernetes cluster. For more information on what some of these parameters are for, see https://spark.apache.org/docs/latest/running-on-kubernetes.html

NOTE: If you are simply running a local Spark job, we advise using the sparkconf_local function instead, as it has fewer parameters and more defaults to get you going quickly.

Args:

app_name (str): The application name (useful for keeping track of jobs on a multiuser cluster).
namespace (str): The k8s namespace in which this job will run (e.g., “default”).
executor_image (str): The container image to use for Spark executors.
driver_image (str): The Spark driver image to use (typically the same as the executor image).
master (str, optional): The master specification. Defaults to “k8s://https://kubernetes.default.svc:443”.
num_executors (str, optional): The number of executors to run. Defaults to “2”.
executor_memory (str, optional): The maximum memory allocated to each executor (use g or M notation). Defaults to “1g”.
executor_cores (str, optional): The maximum number of cores per executor. Defaults to “2”.
driver_memory (str, optional): The maximum memory allocated to the driver. Defaults to “1g”.
driver_cores (str, optional): The maximum number of cores allocated to the driver. Defaults to “2”.
pvc_mount_path (str | None, optional): The PVC mount path for executors and driver to mount (this usually has to be rwX). Defaults to None.
pvc_claim_name (str | None, optional): The PVC claim name associated with the PVC mount. Defaults to None.
python_path (str | None, optional): The Python path to use in Python jobs in executor and driver Python processes. Defaults to None.
k8s_service_account (str | None, optional): The k8s service account to use. Defaults to None.
kwargs: Pass-through key/value arguments that will be added to the Spark configuration.

Returns:

SparkConf: A Spark configuration that has been defined in a way that makes it compatible with time series use cases and intended for use with a k8s cluster.
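
To make the parameters above more concrete, the following sketch shows one plausible way they could map onto the property names documented in Spark's "Running on Kubernetes" guide. The helper name `k8s_spark_properties`, the volume name `data`, the image name, and the mapping itself are all assumptions for illustration; the source does not state which properties `sparkconf_k8s` actually sets.

```python
from typing import Dict, Optional


def k8s_spark_properties(
    app_name: str,
    namespace: str,
    executor_image: str,
    driver_image: str,
    master: str = "k8s://https://kubernetes.default.svc:443",
    num_executors: str = "2",
    executor_memory: str = "1g",
    executor_cores: str = "2",
    driver_memory: str = "1g",
    driver_cores: str = "2",
    pvc_mount_path: Optional[str] = None,
    pvc_claim_name: Optional[str] = None,
    python_path: Optional[str] = None,
    k8s_service_account: Optional[str] = None,
    **kwargs: str,
) -> Dict[str, str]:
    """Hypothetical helper mirroring the sparkconf_k8s signature."""
    props = {
        "spark.master": master,
        "spark.app.name": app_name,
        "spark.kubernetes.namespace": namespace,
        "spark.kubernetes.executor.container.image": executor_image,
        "spark.kubernetes.driver.container.image": driver_image,
        "spark.executor.instances": num_executors,
        "spark.executor.memory": executor_memory,
        "spark.executor.cores": executor_cores,
        "spark.driver.memory": driver_memory,
        "spark.driver.cores": driver_cores,
    }
    # Mount the same PVC on driver and executors only when both parts are given.
    if pvc_mount_path and pvc_claim_name:
        volume = "data"  # assumed volume name, not taken from the source
        for role in ("driver", "executor"):
            prefix = f"spark.kubernetes.{role}.volumes.persistentVolumeClaim.{volume}"
            props[f"{prefix}.mount.path"] = pvc_mount_path
            props[f"{prefix}.options.claimName"] = pvc_claim_name
    # Expose the Python path to both executor and driver processes.
    if python_path:
        props["spark.executorEnv.PYTHONPATH"] = python_path
        props["spark.kubernetes.driverEnv.PYTHONPATH"] = python_path
    if k8s_service_account:
        props["spark.kubernetes.authenticate.driver.serviceAccountName"] = k8s_service_account
    # Pass-through entries are applied last, so they override anything above.
    props.update(kwargs)
    return props


k8s_props = k8s_spark_properties(
    app_name="ts-demo",
    namespace="default",
    executor_image="example.invalid/spark:latest",  # placeholder image name
    driver_image="example.invalid/spark:latest",
    pvc_mount_path="/data",
    pvc_claim_name="ts-data",
)
```

Numeric settings such as `num_executors` are kept as strings here because the documented signature types them as `str`.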