Large Language Models (LLMs) excel in generating text but often struggle to produce structured output. By leveraging Pydantic’s type validation and prompt engineering, we can enforce and validate the output generated by LLMs.

All code examples in this blog post are written in Python. The LLM used is OpenAI’s gpt-3.5-turbo.

Query the LLM

To query the LLM, we use the following function:

import openai

def query(prompt: str) -> str:
    """Query the LLM with the given prompt."""
    completion = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {
                "role": "user",
                "content": prompt,
            }
        ],
        temperature=0.0,
    )
    return completion.choices[0].message.content

Query the model

We can query the model with a simple question:

response = query("What is the largest planet in our solar system?")
print(response)
'The largest planet in our solar system is Jupiter.'

Enforcing JSON output with a prompt

In our prompt, we can ask the LLM to respond in a certain format:

prompt = """
I will ask you questions and you will respond. Your response should be in the following format:
```json
{
	"thought": "How you think about the question",
	"answer": "The answer to the question"
}
```
"""

Then, we query the model:

question = "What is the largest planet in our solar system?"
response = query(prompt + question)
print(response)
'{
	"thought": "This is a factual question that can be answered with scientific knowledge.",
	"answer": "The largest planet in our solar system is Jupiter."
}'

This is great, because we can easily parse the structured output:

import json

parsed_response = json.loads(response)
print(parsed_response["answer"])
'The largest planet in our solar system is Jupiter.'

Validating the output

Instead of parsing the response into a plain dictionary, we can define a Pydantic model that describes the expected response format and validate the raw response against it:

from pydantic import BaseModel


class ThoughtAnswerResponse(BaseModel):
    thought: str
    answer: str


raw_response = query(prompt + question)

# Note: When you are using pydantic<2.0, use parse_raw instead of model_validate_json
validated_response = ThoughtAnswerResponse.model_validate_json(raw_response)

print(validated_response)
thought='This is a factual question that can be answered with scientific knowledge.' answer='The largest planet in our solar system is Jupiter.'

print(type(validated_response))
<class 'ThoughtAnswerResponse'>

Using the Pydantic model in the prompt

At this point, we describe our response format in two places:

  • a JSON description in our prompt
  • a corresponding Pydantic model

When we want to update the response format, we need to change both the prompt and the Pydantic model. This can cause inconsistencies.

We can solve this by exporting the Pydantic model to a JSON schema and adding that schema to the prompt. This keeps the response format in the prompt consistent with the Pydantic model.

response_schema_dict = ThoughtAnswerResponse.model_json_schema()
response_schema_json = json.dumps(response_schema_dict, indent=2)

prompt = f"""
I will ask you questions, and you will respond.
Your response should be in the following format:
```json
{response_schema_json}
```
"""

The prompt will now look like this:

I will ask you questions, and you will respond. Your response should be in the following format:
```json
{
	"properties": {
		"thought": { "title": "Thought", "type": "string" },
		"answer": { "title": "Answer", "type": "string" }
	},
	"required": ["thought", "answer"],
	"title": "ThoughtAnswerResponse",
	"type": "object"
}
```

The response will look like this:

{
  "thought": "The largest planet in our solar system is Jupiter.",
  "answer": "Jupiter"
}

Now, whenever you change the Pydantic model, the corresponding schema automatically ends up in the prompt. Note that the schema is more complex than our hand-written format description was. One benefit of this is that it allows us to be more specific about the responses we require, as sketched below.
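
For example, here is a sketch (not used in the rest of this post) of how Pydantic's Field could be used to add per-field descriptions; the description texts below simply reuse the wording from our earlier prompt:

from pydantic import BaseModel, Field


class ThoughtAnswerResponse(BaseModel):
    # The descriptions are exported into the JSON schema,
    # and therefore end up in the prompt as well
    thought: str = Field(description="How you think about the question")
    answer: str = Field(description="The answer to the question")

These descriptions appear next to the corresponding properties in the generated schema, so the instructions we previously wrote by hand in the prompt now live in a single place: the model.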

Error handling

The LLM may still produce results that are not consistent with our model. We can add some code to catch this:

from pydantic import ValidationError

try:
    validated_response = ThoughtAnswerResponse.model_validate_json(raw_response)
except ValidationError as e:
    print("Unable to validate LLM response.")
    # Add your own error handling here
    raise e
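
Instead of immediately raising, one common strategy is to retry the query a few times and feed the validation error back to the LLM so it can correct itself. The helper below is only a sketch; query_validated and max_retries are names introduced here for illustration:

from pydantic import ValidationError


def query_validated(prompt: str, max_retries: int = 3) -> ThoughtAnswerResponse:
    """Query the LLM and retry when the response does not validate."""
    current_prompt = prompt
    for _ in range(max_retries):
        raw_response = query(current_prompt)
        try:
            return ThoughtAnswerResponse.model_validate_json(raw_response)
        except ValidationError as error:
            # Tell the LLM what was wrong with its previous response
            current_prompt = (
                f"{prompt}\n\nYour previous response was invalid:\n{error}\n"
                "Respond again, strictly following the requested format."
            )
    raise ValueError("LLM did not return a valid response after retries.")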

Enforce specific values using a Literal

Sometimes, you want to enforce the use of specific values for a given field. We add the field “difficulty” to our response object. The LLM should use it to provide information about the difficulty of the question. In a regular prompt, we would do the following:

prompt = """Your response should be in the following format:
```json
{
  "thought": "How you think about the question",
  "answer": "The answer to the question",
  "difficulty": "How difficult the question was. One of easy, medium or hard"
}
```
"""

Of course, the model could potentially still use other values. To validate it, we would need to write custom code.
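
For comparison, this is roughly what that custom validation code could look like when parsing the JSON by hand; the helper below is only illustrative:

import json

ALLOWED_DIFFICULTIES = {"easy", "medium", "hard"}


def parse_and_check_difficulty(raw_response: str) -> dict:
    """Manually parse the response and check the difficulty value."""
    parsed = json.loads(raw_response)
    if parsed.get("difficulty") not in ALLOWED_DIFFICULTIES:
        raise ValueError(f"Unexpected difficulty: {parsed.get('difficulty')!r}")
    return parsed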

With Pydantic, it is a lot easier. We create a new type called Difficulty using a Literal, which restricts a field to a fixed set of allowed values. We then add a Difficulty type hint to the difficulty field in our Pydantic model:

from typing import Literal

from pydantic import BaseModel


# We create a new type
Difficulty = Literal["easy", "medium", "hard"]


class ThoughtAnswerResponse(BaseModel):
    thought: str
    answer: str
    difficulty: Difficulty
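
Because the prompt is generated from the model's JSON schema, the allowed values are now also communicated to the LLM. The exact schema layout can differ between Pydantic versions, but inspecting the difficulty property should show something like this:

print(ThoughtAnswerResponse.model_json_schema()["properties"]["difficulty"])
{'enum': ['easy', 'medium', 'hard'], 'title': 'Difficulty', 'type': 'string'}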

The LLM may still respond with a value we do not allow:

{
  "thought": "The largest planet in our solar system is Jupiter.",
  "answer": "Jupiter",
  "difficulty": "Unknown"
}

When we parse this result, Pydantic validates the value of the difficulty field. 'Unknown' does not match any of the values specified in the Literal type we defined, so we get the following error:

validated_response = ThoughtAnswerResponse.model_validate_json(response)

ValidationError: 1 validation error for ThoughtAnswerResponse
difficulty
    Input should be 'easy', 'medium' or 'hard' [type=literal_error, input_value='Unknown', input_type=str]

Conclusion

By using Pydantic and prompt engineering, you can enforce and validate the output of LLMs. This gives you greater control over the LLM output and allows you to build more robust AI systems.