Core features and use cases of AI Model Experiments
AI Model Experiments is built to provide reliability, transparency, and seamless collaboration when managing the machine learning lifecycle. Here are the core features and use cases that make it an essential tool for data science teams:
Core features
Managed service
We provide a fully managed MLflow™ Tracking Server, including the backend metadata database and a hosted UI, so you can start logging experiments in seconds.
Experiment versioning & lineage
Track every detail of your model’s evolution. Log parameters, metrics, tags, and code versions to ensure that every result is 100% reproducible by any member of the team.
Sovereign artifact storage
Maintain full control over your data. While we manage the experiment metadata, all model binaries, datasets, and plots are stored directly within your own STACKIT project space via Object Storage.
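MLflow’s S3-compatible artifact client can be pointed at an external object store through environment variables; assuming the service uses this standard mechanism, a minimal setup might look like the sketch below. The endpoint and credentials are placeholders, not real STACKIT values:

```python
import os

# Placeholder values: substitute the Object Storage endpoint and
# credentials from your own STACKIT project.
os.environ["MLFLOW_S3_ENDPOINT_URL"] = "https://object-storage.example.com"
os.environ["AWS_ACCESS_KEY_ID"] = "example-access-key"
os.environ["AWS_SECRET_ACCESS_KEY"] = "example-secret-key"

# With these set, calls such as mlflow.log_artifact("model.pkl") upload
# the file to the experiment's s3:// artifact location in your bucket,
# while only lightweight metadata goes to the managed tracking server.
```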
Dedicated instance provisioning
Admins can spin up isolated MLflow™ instances for different departments, projects, or staging environments. Each instance acts as a dedicated sandbox with its own configuration and access tokens.
Token-based access
Admins generate scoped access tokens for AI Engineers, ensuring secure programmatic interaction with the server via the Python SDK without sharing credentials.
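The MLflow client honors the `MLFLOW_TRACKING_URI` and `MLFLOW_TRACKING_TOKEN` environment variables for bearer-token authentication; assuming the managed service uses this standard mechanism, a minimal setup might look like this (URI and token are placeholders for the values an Admin issues):

```python
import os

# Placeholder values: an Admin issues the instance URI and a scoped token.
os.environ["MLFLOW_TRACKING_URI"] = "https://mlflow.example.com"
os.environ["MLFLOW_TRACKING_TOKEN"] = "example-scoped-token"

# The standard MLflow client reads both variables automatically, so the
# training code itself never needs to embed credentials.
```

Setting these in the environment of a notebook, script, or CI job keeps secrets out of version control.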
Interactive experiment UI
Visualize your progress through the hosted MLflow™ UI. Compare multiple runs side-by-side and identify the best-performing hyperparameters at a glance.
Python SDK & REST API integration
Integrate effortlessly into your existing codebases. Whether you are using Jupyter Notebooks, local scripts, or remote training clusters, the service works with the standard MLflow™ library.
LLM Tracing
Gain full observability into Generative AI workflows by capturing granular, step-by-step “traces” of model interactions. Log prompts, system instructions, and model responses alongside evaluation metrics, and visualize the entire execution chain, including prompts, tool calls, and intermediate reasoning. Automated evaluation metrics such as relevance, professional tone, or factual accuracy let you standardize quality assessment across runs.
Use cases
AI Model Experiments provides the centralized “source of truth” needed for modern AI development. Here are the key scenarios where the service adds maximum value:
Hyperparameter optimization
Run hundreds of variations of a model with different tuning parameters. Use the tracking server to automatically record the results of each permutation and programmatically retrieve the “Best Run” for deployment.
Collaborative model development
Break down silos between data scientists. By using a shared managed server, team members can review each other’s experiments, provide feedback in the UI, and avoid duplicating work on failed architectural approaches.
Compliance & auditable AI
Meet regulatory requirements by maintaining a permanent record of how a model was trained, including the exact parameters, code version, and metrics behind every run.
CI/CD for Machine Learning
Integrate experiment tracking into your automated pipelines. Use the SDK to log results during automated retraining cycles and trigger deployment workflows only when a new model surpasses a specific performance threshold.
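The gating step can be sketched in plain Python. The margin and the scores are illustrative; in a real pipeline the candidate and production metrics would be fetched from the tracking server (for example via `mlflow.search_runs`):

```python
# Hypothetical deployment gate: promote the candidate only if it beats
# the current production metric by a minimum margin.
def should_deploy(candidate_acc: float, production_acc: float,
                  margin: float = 0.01) -> bool:
    """Return True when the retrained model clearly outperforms production."""
    return candidate_acc >= production_acc + margin

deploy_ok = should_deploy(0.94, 0.92)    # clears the 0.01 margin
deploy_skip = should_deploy(0.925, 0.92) # improvement too small
```

In a CI job, a `False` result would simply end the workflow, while `True` would trigger the downstream deployment stage.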
Resource monitoring
Monitor long-running training jobs. Log system metrics and loss curves in real-time to the hosted UI to detect early signs of overfitting or hardware bottlenecks before the job completes.
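One illustrative overfitting check, a plain-Python sketch rather than a built-in MLflow feature: flag a run when validation loss has risen for several consecutive logged steps while training loss keeps falling. The window size and loss values are assumptions:

```python
def overfitting_warning(train_loss, val_loss, window=3):
    """Flag likely overfitting from logged loss curves.

    True when validation loss rose over the last `window` steps while
    training loss kept falling over the same steps.
    """
    if len(val_loss) < window + 1 or len(train_loss) < window + 1:
        return False
    recent_val = val_loss[-(window + 1):]
    recent_train = train_loss[-(window + 1):]
    val_rising = all(b > a for a, b in zip(recent_val, recent_val[1:]))
    train_falling = all(b < a for a, b in zip(recent_train, recent_train[1:]))
    return val_rising and train_falling

# Toy curves: training loss keeps dropping while validation loss climbs.
train = [1.0, 0.8, 0.6, 0.5, 0.4]
val = [1.1, 0.9, 0.95, 1.0, 1.05]
```

Running such a check against metrics pulled from the tracking server lets a monitoring job stop a diverging run early instead of waiting for it to finish.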
Model benchmarking
Compare different model architectures on the same dataset. Use the centralized dashboard to standardize evaluation metrics and select the most efficient champion model.