
First steps with common frameworks

LangChain is one of the most popular frameworks for building LLM-powered applications. This section demonstrates how to set up a runnable chain, the centrepiece of the library.

import os
from dotenv import load_dotenv
load_dotenv("../.env")
model = os.environ["STACKIT_MODEL_SERVING_MODEL"] # select a chat-model from https://docs.stackit.cloud/stackit/en/models-licenses-319914532.html
base_url = os.environ["STACKIT_MODEL_SERVING_BASE_URL"] # e.g. "https://api.openai-compat.model-serving.eu01.onstackit.cloud/v1"
model_serving_auth_token = os.environ["STACKIT_MODEL_SERVING_AUTH_TOKEN"] # e.g. "ey..."

First, we load configuration and secrets, e.g., as environment variables. Choose a reasonably sized chat model; to get an overview of the available models, read Available Shared Models. Furthermore, consult Manage auth tokens to set up your STACKIT AI Model Serving Auth Token.
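The variable names above are the ones the snippet reads; a minimal ../.env file might look like the following sketch (the values are placeholders, use your own model name, endpoint, and token):

# Hypothetical ../.env file with placeholder values
STACKIT_MODEL_SERVING_MODEL="<chat-model-name>"
STACKIT_MODEL_SERVING_BASE_URL="https://api.openai-compat.model-serving.eu01.onstackit.cloud/v1"
STACKIT_MODEL_SERVING_AUTH_TOKEN="ey..."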

from langchain_openai import ChatOpenAI
llm = ChatOpenAI(
    model=model,
    base_url=base_url,
    api_key=model_serving_auth_token,
)

With langchain_openai.ChatOpenAI, any OpenAI API-compatible chat model can be accessed. Read Available Shared Models to select an appropriate model.
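As an optional sanity check, the configured model can be invoked directly before building anything on top of it; this is a minimal sketch assuming the credentials above are valid:

# Optional sanity check: send a single message to verify the configuration.
reply = llm.invoke("Say hello in one short sentence.")
print(reply.content)  # the returned AIMessage's content holds the generated text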

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers.string import StrOutputParser
prompt = ChatPromptTemplate([
    ("system", "You are a helpful AI bot."),
    ("human", "Hello, how are you doing?"),
    ("ai", "I'm doing well, thanks!"),
    ("human", "{demand}"),
])

A ChatPromptTemplate is one possible way to structure the prompt that is sent to an LLM.

It provides a list of messages that mimics a previous conversation. Typical roles are human (or user), AI, and system. The system message usually comes first, as it declares the behavior expected from the AI bot in this use case and thus serves as an instruction for the model. After the system message, a conversation of arbitrary length may set the tone of the interaction between the human and the AI. The last message should be a human message stating the actual user request. Be aware that the complete template is passed to the LLM on every invocation. Therefore, the template, the actual demand of the user, and the answer must fit within the context window of the chosen model. It is recommended to use concise templates to keep costs low, since these services are billed per token.

A chat template may contain any number of placeholders; any word in curly brackets is treated as a placeholder, and a value for each one must be provided upon invocation. Typically, the last message is (or contains) the actual user request, which is the sole placeholder in this example. An obvious extension would be a second placeholder in an additional system instruction that demands the answer in a specific language chosen at invocation time, as sketched below.
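A minimal sketch of such an extension, using a hypothetical {answer_language} placeholder; both values must then be supplied when the template is used:

# Sketch: a template with two placeholders, {demand} and a hypothetical {answer_language}.
prompt_with_language = ChatPromptTemplate([
    ("system", "You are a helpful AI bot."),
    ("system", "Answer in {answer_language}."),
    ("human", "{demand}"),
])
# Both placeholder values are provided on invocation.
messages = prompt_with_language.invoke({"demand": "Ask me a riddle.", "answer_language": "German"})
# messages is a ChatPromptValue containing the filled-in conversation.

The remainder of this section continues with the single-placeholder template defined above.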

chain = prompt | llm | StrOutputParser()
demand = "Ask me a riddle."
answer = chain.invoke({"demand": demand})
# Output
# > "When is a door not a door?"

In the first line, a runnable chain is constructed via the LangChain Expression Language (LCEL). This simple chain can be read as a sequence of executables: on invocation, the placeholders in the prompt template are replaced with the actual values, the resulting messages are fed into the model, and the generated response is parsed into a plain string. Several chains can be connected to execute a more complex task, as sketched below; such tasks typically include multiple model calls and may utilize advanced concepts like agents or information retrievers.
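As a minimal sketch of such a composition (assuming the chain and llm defined above), the answer of the first chain is piped into a second, hypothetical translation chain:

# Sketch: feed the output of the first chain into a second prompt (a translation step).
translate_prompt = ChatPromptTemplate([
    ("system", "Translate the following text into German."),
    ("human", "{text}"),
])
translate_chain = translate_prompt | llm | StrOutputParser()
# The lambda maps the first chain's string output to the second chain's input variable.
combined = chain | (lambda text: {"text": text}) | translate_chain
german_riddle = combined.invoke({"demand": "Ask me a riddle."})

The combined runnable again behaves like a single chain and can be invoked, streamed, or composed further.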

LlamaIndex is another major framework for building LLM-driven applications. This section demonstrates how to set up a basic LLM interface to gain hands-on experience.

import os
from dotenv import load_dotenv
load_dotenv("../.env")
model = os.environ["STACKIT_MODEL_SERVING_MODEL"] # select a chat-model from https://docs.stackit.cloud/stackit/en/models-licenses-319914532.html
base_url = os.environ["STACKIT_MODEL_SERVING_BASE_URL"] # e.g. "https://api.openai-compat.model-serving.eu01.onstackit.cloud/v1"
model_serving_auth_token = os.environ["STACKIT_MODEL_SERVING_AUTH_TOKEN"] # e.g. "ey..."

The first steps are the same as those described above for LangChain. We need to select a chat model; consult Available Shared Models to pick a proper one, and consult Manage auth tokens to set up a STACKIT AI Model Serving Auth Token.

from llama_index.llms.openai import OpenAI
llm = OpenAI(
    model=model,
    api_key=model_serving_auth_token,
    api_base=base_url,
)

When working with LlamaIndex, the llama_index.llms.openai.OpenAI interface is used to access any OpenAI API-compatible model. Read Available Shared Models to select an appropriate model.
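As an optional sanity check, a single message can be sent before building templates; a minimal sketch assuming the configuration above is valid:

# Optional sanity check: send a single user message to verify the configuration.
from llama_index.core.llms import ChatMessage
print(llm.chat([ChatMessage(role="user", content="Say hello in one short sentence.")]))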

from llama_index.core import ChatPromptTemplate
from llama_index.core.llms import ChatMessage, MessageRole
messages_template = ChatPromptTemplate(
    [
        ChatMessage(
            role=MessageRole.SYSTEM,
            content=(
                "You are {kind_and_mood_of_assistant}\n"
                "--------------------------------------\n"
                "Give your answers in a concise way. Do not use more than 150 characters."
            ),
        ),
        ChatMessage(role=MessageRole.USER, content="{demand}"),
    ]
)

The code snippet above demonstrates how to prepare an instruction for the LLM chat bot assistant.
Similar to the LangChain example, we utilize a structure called ChatMessage. In the prompt template, each message claims a certain role. The role MessageRole.SYSTEM is used to phrase an instruction; it is common to instruct the assistant to behave like a certain persona, and the "helpful AI assistant" is widely used. Every statement in curly brackets is a placeholder, which is replaced when a request is made.

The template above is designed to formulate a certain demand (or request) and also specify the role that the LLM shall act like.

response = llm.chat(
    messages_template.format_messages(
        kind_and_mood_of_assistant="a pirate with a colorful personality",
        demand="Who is your best friend?",
    )
)
print(response)
# Output
# > assistant: Me parrot, Squawks! Always by me side, through calm seas and stormy skies.
response = llm.chat(
    messages_template.format_messages(
        kind_and_mood_of_assistant="the Mad Hatter",
        demand="Who is your best friend?",
    )
)
print(response)
# Output
# > assistant: The March Hare, of course! We enjoy many a mad tea party together.

The LLM is accessed via llm.chat(...).

The template placeholders are filled in via the method format_messages. In contrast to LangChain, the placeholder names are not passed as the keys of a dict but as named arguments of the method. Therefore, they must be valid Python identifiers, e.g., using underscores instead of spaces.
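Because the placeholders map to keyword arguments, values gathered in a dict can still be supplied via standard Python unpacking; a small sketch with hypothetical values:

# Placeholder values collected in a dict can be passed with ** unpacking,
# as long as the keys are valid Python identifiers (underscores, no spaces).
values = {"kind_and_mood_of_assistant": "a stoic librarian", "demand": "Recommend a book."}
print(llm.chat(messages_template.format_messages(**values)))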

The output clearly demonstrates the impact of the specified role, which makes it a valuable control parameter.