Release notes

    • announcement

      STACKIT AI Model Serving: New Model Release GPT-OSS 20B (Replacement for Llama-8B and Nemo)

      We are excited to announce that we are upgrading our model lineup by introducing openai/gpt-oss-20b, which will serve as the successor to our current Mistral-Nemo and Llama 3.1 8B offerings.

      By leveraging 4-bit (MXFP4) quantization, this new 20-billion parameter model provides a significant boost in reasoning capabilities while maintaining the low-latency performance our customers expect. Applications such as real-time chatbots, retrieval-augmented generation (RAG), and agentic workflows will benefit from improved tool-calling and higher throughput.

      As part of this transition, we are officially deprecating the following models:

      • neuralmagic/Mistral-Nemo-Instruct-2407-FP8
      • neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8

      We kindly ask all customers to migrate their workloads to the new model openai/gpt-oss-20b before 4 June 2026.
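For most workloads, migrating means swapping the deprecated model identifier for `openai/gpt-oss-20b` in the request body. A minimal sketch, assuming an OpenAI-style chat-completions payload (the helper and mapping below are illustrative, not part of the service API):

```python
# Hypothetical migration helper: map deprecated model IDs to the replacement.
DEPRECATED_TO_REPLACEMENT = {
    "neuralmagic/Mistral-Nemo-Instruct-2407-FP8": "openai/gpt-oss-20b",
    "neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8": "openai/gpt-oss-20b",
}

def migrate_payload(payload: dict) -> dict:
    """Return a copy of a chat-completion request body with a deprecated
    model field rewritten; unaffected payloads pass through unchanged."""
    migrated = dict(payload)
    migrated["model"] = DEPRECATED_TO_REPLACEMENT.get(payload["model"], payload["model"])
    return migrated

request_body = {
    "model": "neuralmagic/Mistral-Nemo-Instruct-2407-FP8",
    "messages": [{"role": "user", "content": "Hello!"}],
}
print(migrate_payload(request_body)["model"])  # -> openai/gpt-oss-20b
```

Because only the `model` field changes, existing prompts and message structures can usually be kept as they are.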

      Explore our full model portfolio, and access detailed examples and tutorials in our documentation. Our Help Center is always at your disposal if you have any questions.

    • announcement

      STACKIT AI Model Serving: New Model Release Qwen3-VL-Embedding-8B as Multi-Modal Embedding

      We are excited to announce the addition of Qwen3-VL-Embedding-8B to our shared LLM model portfolio. This is a state-of-the-art multimodal embedding model designed to bridge the gap between visual and textual data.

      Unlike traditional text-only models, Qwen3-VL-Embedding-8B projects both text and images into a unified semantic vector space. This release unlocks powerful Cross-Modal Retrieval capabilities for your applications, allowing you to perform text-to-image search, image-to-text search, and complex multimodal RAG (Retrieval-Augmented Generation) workflows.

      This generation delivers comprehensive improvements in vector representation and retrieval accuracy:

      • Unified Multimodality: Computes semantic embedding vectors from chat messages containing both text and images.
      • High-Fidelity Embeddings: Features an output dimension of 4096 and 8 billion parameters for deep semantic nuance.
      • Extended Context: Supports a maximum input of 32,000 tokens, enabling the processing of dense documents and high-resolution visual inputs.
      • Multi-language Reach: Optimized support for over 30 languages.
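Once text and images are embedded into the shared vector space, cross-modal retrieval reduces to nearest-neighbor search over those vectors. A minimal sketch, assuming each catalog item has already been embedded (the toy 3-dimensional vectors below stand in for real 4096-dimensional embeddings):

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def search(query_vec, catalog):
    """Rank (id, vector) catalog entries by similarity to the query embedding."""
    return sorted(catalog, key=lambda item: cosine(query_vec, item[1]), reverse=True)

# Toy image embeddings; in practice these come from Qwen3-VL-Embedding-8B.
catalog = [("img_a", [1.0, 0.0, 0.0]), ("img_b", [0.0, 1.0, 0.1])]
query = [0.9, 0.1, 0.0]  # embedding of a text query such as "red running shoes"
print(search(query, catalog)[0][0])  # -> img_a
```

The same ranking works in every direction (text-to-image, image-to-text, image-to-image), since all modalities live in one vector space; production systems would typically replace the linear scan with a vector index.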

      Explore our full model portfolio, and access detailed examples and tutorials in our documentation. Our Help Center is always at your disposal if you have any questions.

    • announcement

      STACKIT AI Model Serving: New Model Release Qwen3-VL-235B-A22B

      We’re excited to announce the release of Qwen3-VL-235B-A22B, the most powerful vision-language model in the Qwen series to date, to our shared LLM model portfolio. This model brings a major leap in reasoning, tool calling, long-context reliability, and visual understanding.

      This generation delivers comprehensive upgrades across the board: superior text understanding & generation, deeper visual perception & reasoning, extended context length, enhanced spatial and video dynamics comprehension, and stronger agent interaction capabilities.
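A typical way to exercise the visual capabilities is to send an image alongside a text question in one chat message. A hedged sketch using OpenAI-style multimodal content parts (the model identifier and field layout are assumptions; check the documentation for the exact request format):

```python
import json

def build_vision_request(question: str, image_url: str) -> dict:
    """Build a chat-completion body pairing an image with a text question.
    Illustrative only: model id and content-part schema are assumed."""
    return {
        "model": "Qwen3-VL-235B-A22B",  # assumed model id
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": question},
            ],
        }],
    }

body = build_vision_request("What is shown in this chart?", "https://example.com/chart.png")
print(json.dumps(body, indent=2))
```

Video and multi-image inputs follow the same pattern of mixing content parts within a single user message.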

      Explore our full model portfolio, and access detailed examples and tutorials in our documentation. Our Help Center is always at your disposal if you have any questions.

    • announcement

      STACKIT AI Model Serving: New Model Release GPT-OSS-120B

      We’re excited to announce the release of GPT-OSS-120B, the most capable model in the GPT-OSS family to date, to our shared LLM model portfolio. This model brings a major leap in reasoning, tool calling, and long-context reliability.

      GPT-OSS-120B is designed for agentic workflows, with strong instruction-following and reasoning capabilities. The model provides full chain-of-thought (CoT) and supports Structured Outputs.
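Structured Outputs constrain the model to emit JSON matching a schema you supply. A hedged sketch of such a request body, following the OpenAI-style `response_format` convention (field names and the model identifier are assumptions; verify them against the documentation):

```python
import json

def build_structured_request(question: str) -> dict:
    """Build a chat-completion body that asks for schema-constrained JSON.
    Illustrative only: schema wrapper follows the OpenAI convention."""
    return {
        "model": "openai/gpt-oss-120b",  # assumed model id
        "messages": [{"role": "user", "content": question}],
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "answer",
                "schema": {
                    "type": "object",
                    "properties": {
                        "answer": {"type": "string"},
                        "confidence": {"type": "number"},
                    },
                    "required": ["answer", "confidence"],
                },
            },
        },
    }

print(json.dumps(build_structured_request("What is the capital of France?"), indent=2))
```

Schema-constrained responses are what make the model reliable inside agentic pipelines, since downstream code can parse the output without guardrail retries.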

      Explore our full model portfolio, and access detailed examples and tutorials in our documentation. Our Help Center is always at your disposal if you have any questions.

    • announcement

      As of 6 May 2025, the new STACKIT AI Model Serving service is available.

      STACKIT AI Model Serving offers you easy pay-as-you-go access to proven GenAI models, such as Llama 3.3 or Gemma, in a secure environment on the data-sovereign STACKIT Cloud. As a building block of our Data & AI Platform, STACKIT AI Model Serving enables you to use various Large Language Models (LLMs) with maximum data sovereignty. Your data and your queries are neither stored nor used to train models. You choose the LLM that is right for you and receive a seamless user experience when integrating it into your applications thanks to our API.
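Integration via the API can be as small as one authenticated POST. A minimal stdlib-only sketch that assembles such a request without sending it; the base URL, auth scheme, and model identifier below are placeholders, not real service values:

```python
import json
import urllib.request

BASE_URL = "https://api.example.stackit.cloud/v1"  # placeholder endpoint
API_TOKEN = "YOUR_TOKEN"  # placeholder credential

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Assemble an OpenAI-style chat-completion request object.
    Calling urllib.request.urlopen(req) would send it."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {API_TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("openai/gpt-oss-20b", "Summarize data sovereignty in one line.")
print(req.full_url)
```

Because the endpoint follows a familiar chat-completions shape, existing OpenAI-compatible client libraries can generally be pointed at the service simply by overriding the base URL and token.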

      Our Help Center is always at your disposal if you have any questions.