Core features and use cases of AI Model Experiments
AI Model Experiments is built to provide reliability, transparency, and seamless collaboration when managing the machine learning lifecycle. Here are the core features and use cases that make it an essential tool for data science teams:
Core features
Managed service
We provide a fully managed MLflow™ Tracking Server, including the backend metadata database and a hosted UI, so you can start logging experiments in seconds.
Experiment versioning & lineage
Track every detail of your model’s evolution. Log parameters, metrics, tags, and code versions to ensure that every result is 100% reproducible by any member of the team.
Sovereign artifact storage
Maintain full control over your data. While we manage the experiment metadata, all model binaries, datasets, and plots are stored directly within your own STACKIT project space via Object Storage.
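MLflow’s S3-compatible artifact client can be pointed at an external object store through environment variables; assuming the service uses this standard mechanism, a minimal setup might look like the sketch below. The endpoint and credentials are placeholders, not real STACKIT values:

```python
import os

# Placeholder values: substitute the Object Storage endpoint and
# credentials from your own STACKIT project.
os.environ["MLFLOW_S3_ENDPOINT_URL"] = "https://object-storage.example.com"
os.environ["AWS_ACCESS_KEY_ID"] = "example-access-key"
os.environ["AWS_SECRET_ACCESS_KEY"] = "example-secret-key"

# With these set, calls such as mlflow.log_artifact("model.pkl") upload
# the file to the experiment's s3:// artifact location in your bucket,
# while only lightweight metadata goes to the managed tracking server.
```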
Dedicated instance provisioning
Admins can spin up isolated MLflow™ instances for different departments, projects, or staging environments. Each instance acts as a dedicated sandbox with its own configuration and access tokens.
Token-based access
Admins generate scoped access tokens for AI Engineers, ensuring secure programmatic interaction with the server via the Python SDK without sharing credentials.
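The MLflow client honors the `MLFLOW_TRACKING_URI` and `MLFLOW_TRACKING_TOKEN` environment variables for bearer-token authentication; assuming the managed service uses this standard mechanism, a minimal setup might look like this (URI and token are placeholders for the values an Admin issues):

```python
import os

# Placeholder values: an Admin issues the instance URI and a scoped token.
os.environ["MLFLOW_TRACKING_URI"] = "https://mlflow.example.com"
os.environ["MLFLOW_TRACKING_TOKEN"] = "example-scoped-token"

# The standard MLflow client reads both variables automatically, so the
# training code itself never needs to embed credentials.
```

Setting these in the environment of a notebook, script, or CI job keeps secrets out of version control.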
Interactive experiment UI
Visualize your progress through the hosted MLflow™ UI. Compare multiple runs side-by-side and identify the best-performing hyperparameters at a glance.
Python SDK & REST API integration
Integrate effortlessly into your existing codebases. Whether you are using Jupyter Notebooks, local scripts, or remote training clusters, the service works with the standard MLflow™ library.
LLM Tracing
Gain full observability into Generative AI workflows by capturing granular, step-by-step “traces” of model interactions. Log prompts, system instructions, and model responses alongside evaluation metrics, and visualize the entire execution chain, including prompts, tool calls, and intermediate reasoning. Automated evaluation metrics such as relevance, professional tone, or factual accuracy let you standardize quality assessment across runs.
Use cases
AI Model Experiments provides the centralized “source of truth” needed for modern AI development. Here are the key scenarios where the service adds maximum value:
Hyperparameter optimization
Run hundreds of variations of a model with different tuning parameters. Use the tracking server to automatically record the results of each permutation and programmatically retrieve the “Best Run” for deployment.
Collaborative model development
Break down silos between data scientists. By using a shared managed server, team members can review each other’s experiments, provide feedback in the UI, and avoid duplicating work on failed architectural approaches.
Compliance & auditable AI
Meet regulatory requirements by maintaining a permanent record of how a model was trained, including the exact parameters, code version, and metrics behind every run.
CI/CD for Machine Learning
Integrate experiment tracking into your automated pipelines. Use the SDK to log results during automated retraining cycles and trigger deployment workflows only when a new model surpasses a specific performance threshold.
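The gating step can be sketched in plain Python. The margin and the scores are illustrative; in a real pipeline the candidate and production metrics would be fetched from the tracking server (for example via `mlflow.search_runs`):

```python
# Hypothetical deployment gate: promote the candidate only if it beats
# the current production metric by a minimum margin.
def should_deploy(candidate_acc: float, production_acc: float,
                  margin: float = 0.01) -> bool:
    """Return True when the retrained model clearly outperforms production."""
    return candidate_acc >= production_acc + margin

deploy_ok = should_deploy(0.94, 0.92)    # clears the 0.01 margin
deploy_skip = should_deploy(0.925, 0.92) # improvement too small
```

In a CI job, a `False` result would simply end the workflow, while `True` would trigger the downstream deployment stage.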
Resource monitoring
Monitor long-running training jobs. Log system metrics and loss curves in real-time to the hosted UI to detect early signs of overfitting or hardware bottlenecks before the job completes.
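One illustrative overfitting check, a plain-Python sketch rather than a built-in MLflow feature: flag a run when validation loss has risen for several consecutive logged steps while training loss keeps falling. The window size and loss values are assumptions:

```python
def overfitting_warning(train_loss, val_loss, window=3):
    """Flag likely overfitting from logged loss curves.

    True when validation loss rose over the last `window` steps while
    training loss kept falling over the same steps.
    """
    if len(val_loss) < window + 1 or len(train_loss) < window + 1:
        return False
    recent_val = val_loss[-(window + 1):]
    recent_train = train_loss[-(window + 1):]
    val_rising = all(b > a for a, b in zip(recent_val, recent_val[1:]))
    train_falling = all(b < a for a, b in zip(recent_train, recent_train[1:]))
    return val_rising and train_falling

# Toy curves: training loss keeps dropping while validation loss climbs.
train = [1.0, 0.8, 0.6, 0.5, 0.4]
val = [1.1, 0.9, 0.95, 1.0, 1.05]
```

Running such a check against metrics pulled from the tracking server lets a monitoring job stop a diverging run early instead of waiting for it to finish.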
Model benchmarking
Compare different model architectures on the same dataset. Use the centralized dashboard to standardize evaluation metrics and select the most efficient champion model.