Release notes

    • announcement

      STACKIT AI Model Serving: New Model Release GPT-OSS 20B (Replacement for Llama-8B and Nemo)

      We are excited to announce that we are upgrading our model lineup by introducing openai/gpt-oss-20b, which will serve as the successor to our current Mistral-Nemo and Llama 3.1 8B offerings.

      By leveraging 4-bit (MXFP4) quantization, this new 20-billion parameter model provides a significant boost in reasoning capabilities while maintaining the low-latency performance our customers expect. Applications such as real-time chatbots, retrieval-augmented generation (RAG), and agentic workflows will benefit from improved tool-calling and higher throughput.

      As part of this transition, we are officially deprecating the following models:

      • neuralmagic/Mistral-Nemo-Instruct-2407-FP8
      • neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8

      We kindly ask all customers to migrate their workloads to the new model openai/gpt-oss-20b before 4 June 2026.
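For most workloads, migrating means swapping the deprecated model identifier for `openai/gpt-oss-20b` in the request body. A minimal sketch, assuming an OpenAI-style chat-completions payload (the helper and mapping below are illustrative, not part of the service API):

```python
# Hypothetical migration helper: map deprecated model IDs to the replacement.
DEPRECATED_TO_REPLACEMENT = {
    "neuralmagic/Mistral-Nemo-Instruct-2407-FP8": "openai/gpt-oss-20b",
    "neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8": "openai/gpt-oss-20b",
}

def migrate_payload(payload: dict) -> dict:
    """Return a copy of a chat-completion request body with a deprecated
    model field rewritten; unaffected payloads pass through unchanged."""
    migrated = dict(payload)
    migrated["model"] = DEPRECATED_TO_REPLACEMENT.get(payload["model"], payload["model"])
    return migrated

request_body = {
    "model": "neuralmagic/Mistral-Nemo-Instruct-2407-FP8",
    "messages": [{"role": "user", "content": "Hello!"}],
}
print(migrate_payload(request_body)["model"])  # -> openai/gpt-oss-20b
```

Because only the `model` field changes, existing prompts and message structures can usually be kept as they are.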

      Explore our full model portfolio, and access detailed examples and tutorials in our documentation. Our Help Center is always at your disposal if you have any questions.

    • announcement

      STACKIT AI Model Serving: New Model Release Qwen3-VL-Embedding-8B as Multi-Modal Embedding

      We are excited to announce the addition of Qwen3-VL-Embedding-8B to our shared LLM model portfolio. This is a state-of-the-art multimodal embedding model designed to bridge the gap between visual and textual data.

      Unlike traditional text-only models, Qwen3-VL-Embedding-8B projects both text and images into a unified semantic vector space. This release unlocks powerful Cross-Modal Retrieval capabilities for your applications, allowing you to perform text-to-image search, image-to-text search, and complex multimodal RAG (Retrieval-Augmented Generation) workflows.

      This generation delivers comprehensive improvements in vector representation and retrieval accuracy:

      • Unified Multimodality: Computes semantic embedding vectors from chat messages containing both text and images.
      • High-Fidelity Embeddings: Features an output dimension of 4096 and 8 billion parameters for deep semantic nuance.
      • Extended Context: Supports a maximum input of 32,000 tokens, enabling the processing of dense documents and high-resolution visual inputs.
      • Multi-language Reach: Optimized support for over 30 languages.
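Once text and images are embedded into the shared vector space, cross-modal retrieval reduces to nearest-neighbor search over those vectors. A minimal sketch, assuming each catalog item has already been embedded (the toy 3-dimensional vectors below stand in for real 4096-dimensional embeddings):

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def search(query_vec, catalog):
    """Rank (id, vector) catalog entries by similarity to the query embedding."""
    return sorted(catalog, key=lambda item: cosine(query_vec, item[1]), reverse=True)

# Toy image embeddings; in practice these come from Qwen3-VL-Embedding-8B.
catalog = [("img_a", [1.0, 0.0, 0.0]), ("img_b", [0.0, 1.0, 0.1])]
query = [0.9, 0.1, 0.0]  # embedding of a text query such as "red running shoes"
print(search(query, catalog)[0][0])  # -> img_a
```

The same ranking works in every direction (text-to-image, image-to-text, image-to-image), since all modalities live in one vector space; production systems would typically replace the linear scan with a vector index.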

      Explore our full model portfolio, and access detailed examples and tutorials in our documentation. Our Help Center is always at your disposal if you have any questions.

    • announcement

      STACKIT AI Model Serving: New Model Release Qwen3-VL-235B-A22B

      We’re excited to announce the release of Qwen3-VL-235B-A22B, the most powerful vision-language model in the Qwen series to date, to our shared LLM model portfolio. This model brings a major leap in reasoning, tool calling, long-context reliability, and visual understanding.

      This generation delivers comprehensive upgrades across the board: superior text understanding & generation, deeper visual perception & reasoning, extended context length, enhanced spatial and video dynamics comprehension, and stronger agent interaction capabilities.
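A typical way to exercise the visual capabilities is to send an image alongside a text question in one chat message. A hedged sketch using OpenAI-style multimodal content parts (the model identifier and field layout are assumptions; check the documentation for the exact request format):

```python
import json

def build_vision_request(question: str, image_url: str) -> dict:
    """Build a chat-completion body pairing an image with a text question.
    Illustrative only: model id and content-part schema are assumed."""
    return {
        "model": "Qwen3-VL-235B-A22B",  # assumed model id
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": question},
            ],
        }],
    }

body = build_vision_request("What is shown in this chart?", "https://example.com/chart.png")
print(json.dumps(body, indent=2))
```

Video and multi-image inputs follow the same pattern of mixing content parts within a single user message.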

      Explore our full model portfolio, and access detailed examples and tutorials in our documentation. Our Help Center is always at your disposal if you have any questions.

    • announcement

      STACKIT AI Model Serving: New Model Release GPT-OSS-120B

      We’re excited to announce the release of GPT-OSS-120B, the most capable model in the GPT-OSS family to date, to our shared LLM model portfolio. This model brings a major leap in reasoning, tool calling, and long-context reliability.

      GPT-OSS-120B is designed for agentic workflows, with strong instruction-following and reasoning capabilities. The model provides full chain-of-thought (CoT) and supports Structured Outputs.
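Structured Outputs constrain the model to emit JSON matching a schema you supply. A hedged sketch of such a request body, following the OpenAI-style `response_format` convention (field names and the model identifier are assumptions; verify them against the documentation):

```python
import json

def build_structured_request(question: str) -> dict:
    """Build a chat-completion body that asks for schema-constrained JSON.
    Illustrative only: schema wrapper follows the OpenAI convention."""
    return {
        "model": "openai/gpt-oss-120b",  # assumed model id
        "messages": [{"role": "user", "content": question}],
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "answer",
                "schema": {
                    "type": "object",
                    "properties": {
                        "answer": {"type": "string"},
                        "confidence": {"type": "number"},
                    },
                    "required": ["answer", "confidence"],
                },
            },
        },
    }

print(json.dumps(build_structured_request("What is the capital of France?"), indent=2))
```

Schema-constrained responses are what make the model reliable inside agentic pipelines, since downstream code can parse the output without guardrail retries.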

      Explore our full model portfolio, and access detailed examples and tutorials in our documentation. Our Help Center is always at your disposal if you have any questions.

    • announcement

      As of 6 May 2025, the new STACKIT AI Model Serving service is available.

      STACKIT AI Model Serving offers you easy pay-as-you-go access to proven GenAI models, such as Llama 3.3 or Gemma, in a secure environment on the data-sovereign STACKIT Cloud. As a building block of our Data & AI Platform, STACKIT AI Model Serving enables you to use various Large Language Models (LLMs) with maximum data sovereignty. Your data and your queries are neither stored nor used to train models. You choose the LLM that is right for you and receive a seamless user experience when integrating it into your applications thanks to our API.
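Integration via the API can be as small as one authenticated POST. A minimal stdlib-only sketch that assembles such a request without sending it; the base URL, auth scheme, and model identifier below are placeholders, not real service values:

```python
import json
import urllib.request

BASE_URL = "https://api.example.stackit.cloud/v1"  # placeholder endpoint
API_TOKEN = "YOUR_TOKEN"  # placeholder credential

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Assemble an OpenAI-style chat-completion request object.
    Calling urllib.request.urlopen(req) would send it."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {API_TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("openai/gpt-oss-20b", "Summarize data sovereignty in one line.")
print(req.full_url)
```

Because the endpoint follows a familiar chat-completions shape, existing OpenAI-compatible client libraries can generally be pointed at the service simply by overriding the base URL and token.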

      Our Help Center is always at your disposal if you have any questions.