Getting started with shared models

Last updated on Mar 11, 2026

The term “Shared Model” refers to models that are used communally by all clients. Through the shared hosting of our LLMs, we enable a large number of users to cost-effectively access these powerful models and utilize them for their specific applications.

For the licenses and endpoints of the provided models, see Available shared models.

To get started with shared models, you need to enable AI Model Serving and create an auth token. Once you have done so, you are ready to use the inference API. In this guide, you choose a model and run your first inference request.

Prerequisites

You have a STACKIT customer account: Create a customer account
You have a STACKIT user account: Create a user account
You have a STACKIT project: Create a project

Enable AI Model Serving and create an auth token

To enable AI Model Serving, log in to the Customer portal and click AI Model Serving in the sidebar on the left. If the feature is not yet enabled, enable it there and confirm the activation.

After you have enabled AI Model Serving, create an authentication token:

In the top bar, click Create token.
In the new pane, enter a token name and, optionally, a lifetime in days.
To confirm, click Order fee-based.
Save the generated token to a safe location.

You cannot retrieve the token again after closing the pane.
Click Close.
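
One common way to keep the saved token out of scripts and shell history is to read it from an environment variable. The following is a minimal sketch, not part of the product: the variable name STACKIT_MODEL_SERVING_TOKEN and the helper load_auth_token are hypothetical and chosen only for illustration.

```python
import os

# Hypothetical variable name (an assumption); pick whatever fits your environment.
TOKEN_ENV_VAR = "STACKIT_MODEL_SERVING_TOKEN"


def load_auth_token(env=os.environ):
    """Return the auth token from the environment, or raise a clear error."""
    token = env.get(TOKEN_ENV_VAR, "").strip()
    if not token:
        raise RuntimeError(
            f"Set {TOKEN_ENV_VAR} to your AI Model Serving auth token."
        )
    return token
```

Failing early with a clear message avoids sending requests with an empty Authorization header.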

After you have created and saved your authentication token, decide which model you want to use. Available Shared Models lists all available models. This guide uses Llama 3.3 70B. In this tutorial, you send a message to AI Model Serving and receive the answer, which is the basic usage pattern of the service.

Write a first message and receive the answer

To write the first message to your chat model, you need to use the API. There is no chat window yet. This is by design, because the product is built API-first.

Run the following command to write your first message to the chat model. Use the following parameters:

Parameter	Meaning	Example
auth-token	The AI Model Serving auth token	`BZasjkdasbu...`

In addition to this parameter, the request accepts others, such as the system prompt and the temperature. This tutorial leaves them unchanged. Replace [auth-token] with your token and copy the whole command into your shell:

curl -X POST https://api.openai-compat.model-serving.eu01.onstackit.cloud/v1/chat/completions \
  --header "Authorization: Bearer [auth-token]" \
  --header "Content-Type: application/json" \
  --data '{"model": "cortecs/Llama-3.3-70B-Instruct-FP8-Dynamic", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Why is this documentation great?"}], "max_completion_tokens": 250, "temperature": 0.1}'
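
For a client application, the same request can be sketched in Python using only the standard library. This is a minimal illustration under stated assumptions, not an official client: the URL, model name, and payload are taken from the curl command, while the helper names build_chat_request and send_chat_request are hypothetical.

```python
import json
import urllib.request

API_URL = "https://api.openai-compat.model-serving.eu01.onstackit.cloud/v1/chat/completions"
MODEL = "cortecs/Llama-3.3-70B-Instruct-FP8-Dynamic"


def build_chat_request(auth_token, user_message):
    """Build the same POST request as the curl command, without sending it."""
    payload = {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message},
        ],
        "max_completion_tokens": 250,
        "temperature": 0.1,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {auth_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the decoded JSON response as a dict."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling send_chat_request(build_chat_request(token, "Why is this documentation great?")) with a valid token performs the request. Keeping the building and sending steps separate makes it possible to inspect the request before anything goes over the network.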

The chat model will answer something like this:

{"id":"cmpl-a7a2f78e5ff74fc5b975b8d0059a0001","object":"text_completion","created":1729776894,"model":"cortecs/Llama-3.3-70B-Instruct-FP8-Dynamic","choices":[{"index":0,"text":"¶\n\nThis documentation is designed to help you understand how to use the PyTorch library, which is a popular open-source machine learning framework. Here are some reasons why you should use this documentation:\n\n1. **Comprehensive coverage**: This documentation covers all aspects of PyTorch, including its core features, modules, and tools. You'll find detailed explanations, examples, and tutorials to help you master PyTorch.\n2. **Official source**: This documentation is maintained by the PyTorch team, ensuring that the information is accurate, up-to-date, and authoritative.\n3. **Easy to navigate**: The documentation is organized in a logical and intuitive way, making it easy to find the information you need. You can browse by topic, search for specific keywords, or use the table of contents to navigate.\n4. **Code examples and tutorials**: The documentation includes numerous code examples, tutorials, and guides to help you get started with PyTorch. You can learn by doing, and the examples will help you understand how to apply PyTorch to your own projects.\n5. **Community involvement**: The PyTorch community is active and engaged, and the documentation is open-source. This means that you can contribute to the documentation,","logprobs":null,"finish_reason":"length","stop_reason":null,"prompt_logprobs":null}],"usage":{"prompt_tokens":8,"total_tokens":258,"completion_tokens":250}}

The answer is printed to your console. A real-world client application would parse this application/json response and process the data further.
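
Parsing the response in a client could look like the following sketch. The helper name extract_answer is hypothetical. As an assumption, the code accepts two shapes: chat-style responses that nest the answer under message.content (the usual layout of OpenAI-compatible chat endpoints), and the "text" field that appears in the sample response above. The sample string is a shortened stand-in for that response.

```python
import json


def extract_answer(raw_response):
    """Return (answer_text, finish_reason, usage) from a JSON response string."""
    data = json.loads(raw_response)
    choice = data["choices"][0]
    # Chat-style responses nest the text under message.content;
    # the sample above carries it in a "text" field on the choice instead.
    message = choice.get("message") or {}
    text = message.get("content") or choice.get("text", "")
    return text, choice.get("finish_reason"), data.get("usage", {})


# Shortened stand-in for the response shown above.
sample = (
    '{"choices":[{"index":0,"text":"This documentation is ...",'
    '"finish_reason":"length"}],'
    '"usage":{"prompt_tokens":8,"total_tokens":258,"completion_tokens":250}}'
)
text, finish_reason, usage = extract_answer(sample)
print(finish_reason, usage["completion_tokens"])  # prints: length 250
```

A finish_reason of "length" together with 250 completion tokens indicates that the answer was cut off at the max_completion_tokens limit set in the request, which is why the sample text ends mid-sentence.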

After you have exchanged your first message with the chat model, you can dig deeper and continue with the How-tos or the Tutorials.