Workflows
Overview
Features
STACKIT Workflows delivers all the power of Apache Airflow with enterprise-grade enhancements. The service provides fully managed infrastructure, eliminating the need to provision, configure, or maintain Airflow components. Dynamic resource allocation automatically scales infrastructure based on workload demands, while Airflow’s flexible retry mechanisms ensure robust workflow execution.
Key features include:
- Intuitive Airflow Web UI for monitoring and managing workflows
- Wide range of pre-installed operators for common tasks
- Secure by design: connect your Identity Provider (IdP) via OIDC with Role-Based Access Control (RBAC). Templates are available for Keycloak, Entra ID, Okta, Google, and AWS Cognito.
- Connect your own Git repository for DAG storage with continuous polling for changes
- Web-based DAG Development Environment (DDE) for developing and testing DAGs in the actual runtime environment (coming soon)
- Easy-to-use operators and decorators for Spark jobs and custom Python code
- Isolated task execution in dedicated Kubernetes pods (no noisy neighbors)
- Seamless STACKIT Observability integration with pre-defined dashboards
- Support for KubernetesPodOperator with custom Docker images (see the sketch after this list)
- Dynamically scaled Kubernetes infrastructure for high availability and performance
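As a minimal sketch of how several of these pieces combine, assuming a recent Airflow 2.x runtime: a TaskFlow-decorated Python task feeds a KubernetesPodOperator running a custom Docker image, with retries configured via default_args. The image name, bucket path, and schedule are illustrative placeholders, and the KubernetesPodOperator import path can differ between versions of the cncf-kubernetes provider.

```python
from datetime import datetime, timedelta

from airflow.decorators import dag, task
# Older cncf-kubernetes provider versions expose this under ...operators.kubernetes_pod
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator


@dag(
    schedule="@daily",
    start_date=datetime(2024, 1, 1),
    catchup=False,
    default_args={
        # Airflow's retry mechanism: rerun a failed task up to 3 times
        "retries": 3,
        "retry_delay": timedelta(minutes=5),
    },
)
def example_workflow():
    @task
    def prepare_input() -> str:
        # Plain Python executed as its own isolated task
        return "s3://my-bucket/input/"  # placeholder path

    # Custom Docker image executed in a dedicated Kubernetes pod
    process = KubernetesPodOperator(
        task_id="process_in_pod",
        name="process-in-pod",
        image="registry.example.com/etl-job:1.0",  # placeholder image
        cmds=["python", "process.py"],
    )

    prepare_input() >> process


example_workflow()
```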
Use cases
Data pipeline orchestration
Workflows excels at coordinating complex data pipelines that span multiple systems and require precise timing and dependency management. Whether processing batch data, reacting to external system changes, or ingesting data, Workflows provides the reliability and scalability you need.
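As an illustration of dependency management, the sketch below fans two independent extract tasks into a single join-and-load step; the downstream task only starts once both upstream tasks have succeeded. The source systems and return values are stand-ins.

```python
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@hourly", start_date=datetime(2024, 1, 1), catchup=False)
def pipeline_orchestration():
    @task
    def extract_orders() -> list:
        return [{"order_id": 1}]  # stand-in for a real source system

    @task
    def extract_customers() -> list:
        return [{"customer_id": 1}]

    @task
    def join_and_load(orders: list, customers: list) -> None:
        # Runs only after both extracts have completed successfully
        print(f"joining {len(orders)} orders with {len(customers)} customers")

    # Passing the upstream results as arguments wires up the fan-in dependency
    join_and_load(extract_orders(), extract_customers())


pipeline_orchestration()
```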
ETL/ELT process management
Automate extract, transform, and load operations across diverse data sources. Workflows orchestrates data movement between databases, data lakes, and analytics platforms while handling error recovery and data quality checks. STACKIT Spark integration simplifies data extraction from various sources and loading into the STACKIT Data Platform.
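One possible shape of such a pipeline, sketched with the generic SparkSubmitOperator from the apache-spark provider rather than any STACKIT-specific Spark operator; the connection id and job files are placeholders.

```python
from datetime import datetime

from airflow.decorators import dag
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def elt_pipeline():
    # Extract from source systems and load raw data with a Spark job
    extract_and_load = SparkSubmitOperator(
        task_id="extract_and_load",
        conn_id="spark_default",                  # placeholder connection
        application="/opt/jobs/extract_load.py",  # placeholder job file
    )

    # Transform the raw data into analytics-ready tables
    transform = SparkSubmitOperator(
        task_id="transform",
        conn_id="spark_default",
        application="/opt/jobs/transform.py",
    )

    extract_and_load >> transform


elt_pipeline()
```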
Machine learning pipeline automation
Streamline ML workflows from data preparation through model deployment. Coordinate data preprocessing, feature engineering, model training, validation, and deployment in a single, manageable pipeline.
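A hedged sketch of such a pipeline with a validation gate: a @task.branch step decides whether the trained model proceeds to deployment. The metric, threshold, and task bodies are illustrative placeholders.

```python
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@weekly", start_date=datetime(2024, 1, 1), catchup=False)
def ml_pipeline():
    @task
    def preprocess() -> str:
        return "features-v1"  # placeholder feature set identifier

    @task
    def train(feature_set: str) -> float:
        # Stand-in for real training; returns a validation score
        return 0.93

    @task.branch
    def validate(score: float) -> str:
        # Route to deployment only if the model clears the quality bar
        return "deploy" if score >= 0.9 else "skip_deployment"

    @task
    def deploy() -> None:
        print("model deployed")

    @task
    def skip_deployment() -> None:
        print("model rejected")

    score = train(preprocess())
    validate(score) >> [deploy(), skip_deployment()]


ml_pipeline()
```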
Infrastructure automation
Automate routine infrastructure tasks, system maintenance, and operational procedures. Schedule regular backups, system health checks, and automated responses to common operational scenarios.
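For instance, a nightly maintenance DAG might run a database backup followed by a health check on a cron schedule; the shell commands, paths, and endpoint below are placeholders.

```python
from datetime import datetime

from airflow.decorators import dag
from airflow.operators.bash import BashOperator


@dag(schedule="0 2 * * *", start_date=datetime(2024, 1, 1), catchup=False)
def nightly_maintenance():
    backup = BashOperator(
        task_id="backup_database",
        # Placeholder backup command and target path
        bash_command="pg_dump mydb > /backups/mydb_$(date +%F).sql",
    )

    health_check = BashOperator(
        task_id="health_check",
        bash_command="curl -fsS https://example.com/health",  # placeholder endpoint
    )

    backup >> health_check


nightly_maintenance()
```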
Data quality and governance
Implement automated data quality checks, lineage tracking, and compliance reporting. Ensure data integrity across your organization with scheduled validation and monitoring workflows.
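A small sketch of scheduled validation using SQLCheckOperator from the common SQL provider, which fails a task when its query returns a falsy value; the connection id, table, and checks are placeholders.

```python
from datetime import datetime

from airflow.decorators import dag
from airflow.providers.common.sql.operators.sql import SQLCheckOperator


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def data_quality_checks():
    # Fails the task if any value in the returned row evaluates to false
    no_null_ids = SQLCheckOperator(
        task_id="no_null_order_ids",
        conn_id="warehouse_default",  # placeholder connection
        sql="SELECT COUNT(*) = 0 FROM orders WHERE order_id IS NULL",
    )

    row_count_positive = SQLCheckOperator(
        task_id="orders_not_empty",
        conn_id="warehouse_default",
        sql="SELECT COUNT(*) > 0 FROM orders",
    )

    no_null_ids >> row_count_positive


data_quality_checks()
```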