# Building Type-Safe Structured Outputs with Rust and OpenAI
I built a Hacker News AI news summarizer and AI relevancy scorer and, as the type safety and consistency enthusiast I am, I decided to explore structured outputs when interacting with LLMs.

In short: structured outputs work by supplying a JSON Schema with your request, ensuring that the reply follows the correct format. This includes specifying what each field should be and how the LLM should fill it. A step forward from telling the LLM to output a JSON object and then praying that it decides to comply.

The news summarizer project may be the subject of a future blog post. For now, let's dive into what structured outputs are and how they work. This approach assumes we set `strict = true`, as shown in all OpenAI documentation examples. An example of this implementation can be found in the complementary GitHub repo.

We need a schema, and we have two options: maintain a handwritten (or externally generated) JSON Schema, or derive it from our Rust types. Naturally, I chose the rusty way. I find it too tedious to constantly generate and copy external schemas, relying only on hope that they'll work correctly. I settled on Schemars for my schema generation, though I discovered I needed features from the 1.0 alpha branch to create a compliant schema.

## TLDR: Five Easy Steps
1. Set `strict = true`, as shown in all OpenAI documentation examples.
2. Use Schemars `1.0.0-alpha.17` to enable `with_transform` together with `RecursiveTransform`. See the docs.
3. Add `#[serde(deny_unknown_fields)]` to comply with `strict = true`.
4. Use `RecursiveTransform` to strip `format` from all fields.
5. Convert the schema to a `serde_json::Value` and send it to OpenAI.

## Deep Dive
We begin by defining the Chat Completion API request, enabling structured outputs by setting the `response_format` field. The messages are our prompts, but I'll return to those later when we query the OpenAI API.
Moving on, let's define the `ResponseFormat` struct. The `type` is always set to `"json_schema"` to enable the structured outputs feature. Now we supply our schema. First, we need to name it, and then we have a generic `serde_json::Value` to contain the generated schema.
Now that we have our correctly defined API request, it's time to define the schema. We do this by deriving `schemars::JsonSchema` for our schema. Here we encounter our first roadblock. Reading the Schemars documentation, we naively follow the happy path:

```rust
let schema = schema_for!(SimpleResponseSchema);
let schema = serde_json::to_value(&schema).unwrap();
```
And we're hit with our first error:

```
Error querying api: HTTP status client error (400 Bad Request) for url (https://api.openai.com/v1/chat/completions)
```

It turns out that we need to strip the `format` field from all our types. Fortunately, Schemars has a 1.0-alpha branch allowing us to do that:

```rust
let schema = SchemaSettings::default()
    // The `with_transform` and `RecursiveTransform`
    // come from the 1.0-alpha branch.
    .with_transform(RecursiveTransform(|schema: &mut Schema| {
        // Drop the unsupported `format` keyword from every subschema.
        schema.remove("format");
    }))
    .into_generator()
    .into_root_schema_for::<SimpleResponseSchema>();
```

Let's try again with the same schema.

```
Response from OpenAI: Error querying api: HTTP status client error (400 Bad Request) for url (https://api.openai.com/v1/chat/completions)
```
Right… since we set `strict = true` like all the documentation examples, we need to deny unknown fields. Let's look into our schema to see what we need to change: the `additionalProperties` field is required to be supplied, and to be `false`, when `strict = true`. We can add that to our schema by supplying `#[serde(deny_unknown_fields)]`, since Schemars complies with Serde directives when constructing the schema.
Giving us the expected schema. And it works! But we want our answer to be two paragraphs! Let's add descriptions to our schema, growing `SimpleResponseSchema` into a documented `ResponseSchema`.
The OpenAI models respond to our guidance, and we get two blathery “paragraphs.” There we have it. We now use the schema together with the prompt to generate our structured output. Let's explore a few final pieces that make it all come together.

## Naming the Schema

Next, we need to name the schema. Either we let the user provide a name, or we utilize diagnostic functions to automatically generate something adequate. Let's take the simpler approach:
```rust
let name = std::any::type_name::<SimpleResponseSchema>();
```

Giving us:

```rust
let name = "llm_structured_outputs::SimpleResponseSchema";
```

Seems reasonable!
Well… the API informs us that naming things is challenging:

```
Response from OpenAI: Error querying api: HTTP status client error (400 Bad Request) for url (https://api.openai.com/v1/chat/completions)
```

Of course, there's a regex pattern to match: `^[a-zA-Z0-9_-]+$`. This is 2025 after all, and UTF-8 compatibility still presents challenges. Let's remove the “special” characters:
```rust
let name = std::any::type_name::<SimpleResponseSchema>()
    .replace("::", "_")
    .replace('<', "_")
    .replace('>', "_");
```

And it works! I suppose detecting characters not in the regex would also work and might be more robust, but having to deal with the regex once, in the error message, is enough.
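As a sanity check, a generic version of this sanitizer (the function name is mine) can be verified against the allowed character class:

```rust
// Turn `std::any::type_name` output into something matching
// OpenAI's ^[a-zA-Z0-9_-]+$ schema-name pattern.
fn schema_name<T>() -> String {
    std::any::type_name::<T>()
        .replace("::", "_")
        .replace('<', "_")
        .replace('>', "_")
}

#[allow(dead_code)]
struct SimpleResponseSchema;

fn main() {
    // Generics exercise both the `::` and the angle-bracket replacements.
    let name = schema_name::<Vec<SimpleResponseSchema>>();
    // Every remaining character is in the allowed class.
    assert!(name
        .chars()
        .all(|c| c.is_ascii_alphanumeric() || c == '_' || c == '-'));
    println!("{name}");
}
```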
## Generic Schema-Safe Querying of OpenAI

Putting it all together, let's create a generic function for schema generation from any type implementing the `schemars::JsonSchema` trait:

```rust
/// Create an OpenAI compatible schema from a Rust type. Utilizes
/// a diagnostic version of the desired response schema's type name
/// for the schema name sent to OpenAI.
fn create_schema<T: JsonSchema>() -> (String, serde_json::Value) {
    // …
}
```
To query an LLM model, we of course need to send our prompts and specify who they are from. We also need to create some types to deserialize the output.

```rust
/// Query OpenAI with a message and a schema defined by the generic
/// type T. The schema is used to enforce structured output from the
/// OpenAI API and parse the response into said Rust type.
async fn query_openai<T: JsonSchema + DeserializeOwned>(messages: Vec<Message>) -> Result<T> {
    // …
}
```
```rust
/// Query the OpenAI API with a message and a schema.
async fn query_openai_raw(messages: Vec<Message>, schema: serde_json::Value) -> Result<String> {
    // …
}
```

Finally, we can utilize the generated schema to query the OpenAI API and directly deserialize the output.
Let's try it out!

```rust
let response: ResponseSchema = query_openai(messages).await?;
```

Giving us the expected output: a `ResponseSchema` with both paragraphs populated. And we're done! We can now query the OpenAI API with a schema and deserialize the response into our desired type. This approach allows us to add the complexity we need, as long as we conform to the requirements of structured outputs, be it lists of objects, enums, or even nested schemas. The full code is available on GitHub.