Running Jupyter Notebooks with Papermill in STACKIT Workflows

Before you begin, make sure that:

  • You have an active STACKIT Workflows instance.
  • You have a Git repository connected to your Workflows instance that contains your notebooks.
  • You have a STACKIT Object Storage bucket (if you want to persist notebook execution outputs).

In this guide, you will learn how to automate the execution of Jupyter Notebooks using the STACKITSparkScriptOperator and Papermill.

Papermill is a tool for parameterizing and executing Jupyter Notebooks. Within STACKIT Workflows, the STACKITSparkScriptOperator automatically detects .ipynb files and uses Papermill to run them as Spark jobs, which lets you treat notebooks as production-ready data pipelines.

When developing notebooks that run both interactively in Notebooks and in production via Papermill/Workflows, you can use the STACKIT__PAPERMILL environment variable to detect the execution context. This lets you handle concerns such as secret management, progress bars, and logging differently in each environment.
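For example, a notebook can branch on this variable. A minimal sketch; it only assumes the variable is present in Papermill runs, not any particular value:

```python
import os


def running_under_papermill() -> bool:
    # STACKIT Workflows sets STACKIT__PAPERMILL when the notebook is
    # executed via Papermill; we check only for presence, not a value.
    return "STACKIT__PAPERMILL" in os.environ


if running_under_papermill():
    # Production run: e.g. quiet logging, no progress bars.
    log_level = "WARNING"
else:
    # Interactive run in Notebooks: verbose output is fine.
    log_level = "INFO"
```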

When running a notebook, you need to decide where the executed version (containing the results and logs) should be stored. You can either push it to an S3-compatible bucket or stream the output directly to the Airflow logs (stdout).

To store results in S3, ensure you have an Airflow Connection configured with your Object Storage credentials. In this example, we use a connection ID named s3_papermill_output.
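If you prefer not to create the connection through the Airflow UI, Airflow (2.3 and later) can also resolve a connection from an AIRFLOW_CONN_<CONN_ID> environment variable holding a JSON document. A minimal sketch with placeholder credentials and a purely illustrative endpoint URL:

```python
import json
import os

# Placeholder values: substitute your real Object Storage credentials
# and endpoint; the endpoint URL below is purely illustrative.
connection = {
    "conn_type": "aws",
    "login": "MY_ACCESS_KEY_ID",                    # read as s3_connection.login
    "password": "MY_SECRET_ACCESS_KEY",             # read as s3_connection.password
    "host": "https://object.storage.example.com",   # read as s3_connection.host
}

# Airflow resolves the connection ID "s3_papermill_output" from this variable.
os.environ["AIRFLOW_CONN_S3_PAPERMILL_OUTPUT"] = json.dumps(connection)
```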

The following DAG demonstrates two tasks: one that saves the output notebook to S3 and another that prints the output to the Airflow task logs.

```python
import pendulum

from airflow.decorators import dag
from airflow.models.connection import Connection
from stackit_workflows.workflows_plugin.operators import STACKITSparkScriptOperator

# Fetch S3 credentials from Airflow Secrets
s3_connection = Connection.get_connection_from_secrets("s3_papermill_output")


@dag(
    schedule=None,
    start_date=pendulum.datetime(2021, 1, 1, tz="UTC"),
    catchup=False,
    tags=["demo"],
    dag_id="demo_jupyter_notebook",
)
def spark_full_configuration():
    output_s3 = STACKITSparkScriptOperator(
        task_id="papermill_s3",
        # Use a specific Spark image or leave empty for the default
        image="schwarzit-xx-sit-dp-customer-artifactory-docker-local.jfrog.io/stackit-spark:spark3.5.3-0.1.2",
        get_logs=True,
        # Set explicitly or let the operator auto-detect via the file extension
        is_papermill=True,
        # Path relative to your git-sync folder
        script="Demo/scripts/my_spark_notebook.ipynb",
        # Target S3 path for the executed notebook
        papermill_target_folder="s3://internal-airflow-papermill-output/notebooks",
        s3_arguments={
            "AWS_ACCESS_KEY_ID": s3_connection.login,
            "AWS_SECRET_ACCESS_KEY": s3_connection.password,
            "BOTO3_ENDPOINT_URL": s3_connection.host,
        },
        lakehouse_connections=["lakehouse-rest"],
    )
    output_stdout = STACKITSparkScriptOperator(
        task_id="papermill_stdout",
        image="schwarzit-xx-sit-dp-customer-artifactory-docker-local.jfrog.io/stackit-spark:spark3.5.3-0.1.2",
        get_logs=True,
        is_papermill=True,
        script="Demo/scripts/my_spark_notebook.ipynb",
        # Setting the folder to "-" redirects the notebook JSON to the Airflow logs
        papermill_target_folder="-",
        s3_arguments={
            "AWS_ACCESS_KEY_ID": s3_connection.login,
            "AWS_SECRET_ACCESS_KEY": s3_connection.password,
            "BOTO3_ENDPOINT_URL": s3_connection.host,
        },
        lakehouse_connections=["lakehouse-rest"],
    )
    output_stdout >> output_s3


spark_full_configuration()
```
  1. Upload your Notebook

    Ensure your .ipynb file is committed to the Git repository linked to your STACKIT Workflows instance. The script parameter in the operator must match the relative path in the repository. For a sample notebook that uses Papermill and Spark, see Using Spark.

  2. Upload the DAG

    Just like your .ipynb file, commit and push the DAG that runs the notebook (see above). Let's assume it is named demo_jupyter_notebook, as in the example above.

  3. Trigger the DAG

    Navigate to the Airflow UI, locate demo_jupyter_notebook, and trigger it manually.

  4. Monitor Execution

    Click on the papermill_stdout task and select Logs. You will see the Spark session initializing and Papermill executing each cell sequentially.

  5. Verify Output

    Once the papermill_s3 task completes, check your STACKIT Object Storage bucket. You will find a new notebook file containing both the original code and the outputs generated by the run.
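A note on parameterization: Papermill can inject values into a notebook cell carrying the Jupyter cell tag "parameters". Whether and how STACKITSparkScriptOperator passes such parameters is not covered here, but on the notebook side a parameters cell simply defines defaults (the variable names below are hypothetical):

```python
# Contents of the notebook cell tagged "parameters".
# At run time, Papermill inserts a new cell directly after this one
# that overrides these defaults with the supplied values.
run_date = "2021-01-01"             # hypothetical parameter: partition to process
input_path = "s3://my-bucket/raw/"  # hypothetical parameter: source data location
sample_fraction = 0.1               # hypothetical parameter: sampling rate
```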