Using the STACKIT Spark Image with Extra Packages
STACKIT provides ready-to-use Spark images that are optimized for running Spark workloads on Kubernetes. You can use these images directly in your Workflows DAGs and extend them with additional Python packages at runtime. In this tutorial you’ll learn how to configure your DAG to use a STACKIT Spark image and install extra libraries on the fly.
-
Create a DAG in your project
dags/my_extra_packages_dag.py

    import pendulum
    from airflow.decorators import dag
    from stackit_workflows.airflow_plugin.decorators import stackit

    # Specify the STACKIT Spark image
    default_kwargs = {"image": "schwarzit-xx-sit-dp-customer-artifactory-docker-local.jfrog.io/stackit-spark:spark3.5.3-0.1.2"}


    @dag(
        schedule=None,
        start_date=pendulum.datetime(2021, 1, 1, tz="UTC"),
        catchup=False,
        tags=["demo"],
        dag_id="07_extra_packages",
    )
    def packages():
        # The STACKIT Spark image provides a writable Python environment.
        # You can install extra libraries at runtime using pip, conda, or mamba.
        @stackit.spark_kubernetes_task(**default_kwargs)
        def tell_jokes():
            import subprocess, sys

            # Install an extra package at runtime inside the Spark container
            subprocess.check_call([sys.executable, "-m", "pip", "install", "Joking"])
            import Joking

            print(Joking.random_joke())

        tell_jokes()


    packages()

How it works
-
STACKIT Spark image
- Defined via the image parameter (spark3.5.3-0.1.2).
- Provides a maintained Spark runtime plus a writable Python environment.
- You don’t need to build your own base image to get started.
-
Runtime installation
- Since the environment is writable, you can install additional libraries with pip, conda, or mamba.
- mamba is recommended because it resolves dependencies faster and ships optimized binaries.
-
Best practice
- Runtime installs consume resources each time the pod starts.
- For tasks you run often, build a custom image based on the STACKIT Spark image with all required libraries pre-installed.
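The runtime-install pattern described above can be made a little cheaper by skipping pip when the module is already importable. The following is a minimal sketch using only the standard library, not part of any STACKIT API: `ensure_package` and its `installer` parameter are hypothetical names introduced for illustration.

```python
import importlib.util
import subprocess
import sys


def ensure_package(module_name, requirement=None, installer=subprocess.check_call):
    """Install `requirement` with pip unless `module_name` is already importable.

    Returns True if the installer was invoked, False if nothing had to be done.
    `ensure_package` is a hypothetical helper, not part of any STACKIT library.
    """
    if importlib.util.find_spec(module_name) is not None:
        # Module already present in the environment: skip the install.
        return False
    # Fall back to the module name when no explicit requirement is given.
    requirement = requirement or module_name
    # Use the interpreter running the task so pip targets the same environment.
    installer([sys.executable, "-m", "pip", "install", requirement])
    # Make sure the import system notices the newly installed files.
    importlib.invalidate_caches()
    return True
```

Inside the task, a call such as `ensure_package("Joking")` could stand in for the direct `subprocess.check_call`. Passing a pinned requirement string as the second argument keeps repeated runs reproducible. This only trims work when the module is already present. A fresh pod still pays the full install cost, which is why a custom image remains the better choice for frequently run tasks.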
-
Push the DAG to your environment and trigger it in Airflow.
-
Check the task logs
- You’ll see pip fetching and installing the Joking package.
- The task will then print a random joke from the installed library.
-
Inspect the Spark pod
- Confirm that the pod is using the STACKIT Spark image you specified.
- The additional library is available only during this task’s runtime.
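To make the log check easier, the task can also print which version of the extra package ended up installed. This is a minimal sketch using only the standard library. `describe_package` is a hypothetical helper name, and the example assumes the distribution is named `Joking`, as in the pip command of the DAG.

```python
from importlib import metadata


def describe_package(dist_name):
    """Return 'name==version' for an installed distribution, or a 'not installed' note."""
    try:
        return f"{dist_name}=={metadata.version(dist_name)}"
    except metadata.PackageNotFoundError:
        # The distribution is absent from the current Python environment.
        return f"{dist_name} (not installed)"
```

Calling `print(describe_package("Joking"))` right after the install step writes the resolved version to the task log. Because the library exists only for the lifetime of the pod that installed it, the same call in a different task would be expected to report that the package is not installed.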