Understanding Schema: Generative Search's Big Star

Author

Corey Vilhauer

When we search for something on the web, we’re not looking for “results.” When we’re looking for the nearest children’s hospital, or the nearest credit union, we often think we’re looking for a list of options — a list of results — but that’s not really what we want. We don’t want more decisions. We don’t want randomly associated web pages.

We want answers.

This is a huge reason why generative search gets all the attention. After all, generative search gives us answers, or, at least, it gives us answers in a way that’s much easier to digest than a set of search results.

But generative search is still a search tool: it attempts to answer your search query with the data that it has collected by pairing your request with a loose collection of ideas that are most commonly connected to your query. It seems like it’s thinking, but in reality it’s still just a search engine. As anyone who has gotten a sideways answer or hallucination knows, generative search is not human. It still needs us to help it understand.

Thankfully, there’s an entire system already in place to help us assign meaning and context to our web content: schemas.

What are schemas?

So first, what is a schema?

As a base definition, a “schema” (fun fact: the plural is both “schemas” and “schemata,” for those who need to know such things) represents a system for understanding context. We use them to help us understand things and ideas — schemas are mostly concerned with assigning meaning to complexity. In its most basic form, you can think of a schema as a blueprint that describes the structure and constraints of a thing or idea, such as a schema for how to communicate a recipe, or the schema that helps us find books at the library.

For web content, “schema” has a more specific definition: it refers to a collaboratively maintained vocabulary for structured data markup (most notably schema.org) that helps search engines like Google better understand what your website content actually means. (If you want to see an example, you can head to any page on schema.org and scroll to the bottom, such as this page for the “Organization” schema. Select “JSON-LD” and you’ll see what the code looks like.)
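
For a rough idea of what that markup looks like without leaving this page, here is a minimal sketch in Python that builds an “Organization” description and wraps it in the script tag JSON-LD is usually embedded in. The company name, URL, and email address are placeholders, and the helper name to_json_ld_script is made up for this example.

```python
import json

# Placeholder organization details, used purely for illustration.
organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Company",
    "description": "A placeholder description of what the company does.",
    "url": "https://www.example.com",
    "contactPoint": {
        "@type": "ContactPoint",
        "contactType": "customer service",
        "email": "hello@example.com",
    },
}


def to_json_ld_script(data: dict) -> str:
    """Wrap a dictionary in the script tag that JSON-LD is usually embedded in."""
    body = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(to_json_ld_script(organization))
```

The printed block is what would typically sit in the HTML of the page. Visitors never see it, but crawlers can read it.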

The key is understanding that schema helps to assign context. When it comes to web content, it always comes back to context.

How does a schema work?

To understand how we apply schemas to web content, we have to remember that neither site crawlers nor content management systems are thoughtful or reasoning — they are, as I often joke, robots. They don’t track and scan content for understanding — they only interact with content based on how we programmed them to interact with content. Which means the only thing a site crawler or content management system can interpret from your content is what you’ve told it.

This is the discipline of content modeling: using fields and interactions to help define content in a way that the robots in the search engines can understand.

Some of this comes from how we tag our text: a first-level heading is understood to represent the main point of the page, while a meta description provides a short paragraph that is treated as a general summary of the page. There’s an added benefit here: these are also the fields that traditional search engines use to pass along your intended meaning to the users who are performing a web search.
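
To make the “robots only know what you’ve told them” idea concrete, here is a small sketch that reads a page the way a very simple crawler might, pulling out only the first-level heading and the meta description. It uses Python’s built-in html.parser module. The class name PageSummaryParser and the sample page are invented for this example, and real crawlers are far more sophisticated.

```python
from html.parser import HTMLParser


class PageSummaryParser(HTMLParser):
    """Collects the first <h1> text and the meta description from a page."""

    def __init__(self):
        super().__init__()
        self.h1 = None
        self.description = None
        self._in_h1 = False
        self._h1_parts = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "h1" and self.h1 is None:
            self._in_h1 = True
        elif tag == "meta" and (attr_map.get("name") or "").lower() == "description":
            self.description = attr_map.get("content")

    def handle_data(self, data):
        # Only text inside the first <h1> is kept; everything else is ignored.
        if self._in_h1:
            self._h1_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "h1" and self._in_h1:
            self._in_h1 = False
            self.h1 = "".join(self._h1_parts).strip()


# Invented sample page for illustration.
SAMPLE_PAGE = """
<html>
  <head>
    <title>Flu shots</title>
    <meta name="description" content="Where and when to get a flu shot this season.">
  </head>
  <body>
    <h1>Getting your flu shot</h1>
    <p>Clinics are open on weekdays.</p>
  </body>
</html>
"""

parser = PageSummaryParser()
parser.feed(SAMPLE_PAGE)
print(parser.h1)
print(parser.description)
```

Everything this parser “knows” about the page comes from two tagged fields. If the heading were just bold text in a paragraph, it would find nothing.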

But applying a schema to our web content takes it beyond the structure of the page itself and provides real-world context to the words we’re publishing. Think of it as adding invisible labels to your web pages that tell search engines things like “this page is about a product, and that product has a price,” or “this chunk of content is a customer review,” or “this is an upcoming event.” What’s more, schema helps “store” this information: many AI tools can only hold so many “tokens” of information at once, and schema can boil down information more concisely, helping preserve it within a tool for longer.
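
Here is a sketch of what those invisible labels might look like for a product with a price and a customer review, and for an upcoming event. The type and property names come from schema.org, while the product, reviewer, event, and dates are placeholders. The labels helper is a made-up function that simply lists which labels a crawler would find.

```python
import json

# Placeholder product with a price and a customer review.
product = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Example Widget",
    "offers": {
        "@type": "Offer",
        "price": "49.00",
        "priceCurrency": "USD",
        "availability": "https://schema.org/InStock",
    },
    "review": {
        "@type": "Review",
        "author": {"@type": "Person", "name": "A. Customer"},
        "reviewRating": {"@type": "Rating", "ratingValue": "5"},
        "reviewBody": "Placeholder review text.",
    },
}

# Placeholder upcoming event.
event = {
    "@context": "https://schema.org",
    "@type": "Event",
    "name": "Example Webinar",
    "startDate": "2030-01-15T13:00:00-06:00",
    "location": {
        "@type": "VirtualLocation",
        "url": "https://www.example.com/webinar",
    },
}


def labels(node):
    """Recursively collect every @type label in a JSON-LD structure."""
    found = []
    if isinstance(node, dict):
        if "@type" in node:
            found.append(node["@type"])
        for value in node.values():
            found.extend(labels(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(labels(item))
    return found


print(labels(product))
print(labels(event))
print(json.dumps(product, indent=2))
```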

Simply put, when search engines can clearly identify your content, they can display it more prominently in search results through rich snippets, knowledge panels, and other enhanced features that make your listings stand out from competitors.

What schemas do I need to worry about?

Traditional web crawlers have been dabbling in schema for a while now, especially when it comes to providing context for Google Rich Results or locations. In fact, much of the schema work we saw up until the generative search explosion was focused on providing better results for businesses via tools like Google Business Profile (formerly Google My Business). Even now, advice on how to boost your site using schema is often tied to helping coffee shops and clothing stores drive online traffic via location-based schema.

That was primarily because traditional search relies on a complicated algorithm that defines certain aspects of a webpage as “valuable,” including specific types of schema. Today’s generative search does not rely on the same algorithm. Instead of scoring a result based on existing data alone, generative search uses the logical structure of HTML (headings, sections, chunking) to play a very impressive game of pattern recognition and auto-complete. There are no checkmarks to reach or specific data points to touch on. Instead, we can provide context to any part of the page and relay that context to the tools that crawl our sites.

It sounds complicated, and from an engineering standpoint it certainly is. But the overall concept can be summed up like this: in the old days, when we wanted to identify parts of a website, we were given a set of pre-printed stickers that we could then use to provide limited context. Now, we have a lot more stickers, and a lot of blank ones, so we can dive a lot deeper into the specific kind of context we need to provide.

To start, we can look to the schemas we’ve already established. There are hundreds of schema types — many of them formalized by schema.org — but not all of them are as impactful as others. From what we’ve seen so far, the following schemas are a great start for a general website:

Organization — Who are you exactly? This establishes your company entity, including foundational details like name, description, and contact information.
Service — Key for implementation and support service pages, detailing consulting, training, and ongoing support offerings.
Product — A product can be many things. For a software company, for example, this can be applied to software packages or modules, including pricing, availability, and technical specifications, even if they’re not formally sold via a shopping cart on your site.
Offer — Editors can relay pricing plans, subscription models, and service packages with clear terms and availability.
Article — Blog posts, articles, resource offerings, and general thought leadership get this schema to help identify them as sources of information.
FAQ — Allows you to say, literally, that the text in the accordion dropdown is not just a bunch of words, but an official answer. This is hugely important because AI tools are usually trying to answer a question. FAQPage is the page-level schema.org type that addresses a similar need, wrapping individual Question and Answer items.
Author — Generative search wants to give answers that have been vetted, which means building author authority is important — especially if that author is already known for a specific topic. If you work in a space where scholarly articles need to be reviewed, they can also be handled using this schema.
Case Study — To showcase implementation success stories and tie outside organizations to yours.
Event — For webinars, training sessions, and user conferences.
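
To show how a couple of the items above might translate into markup, here is a minimal sketch for an article with an author and for a set of frequently asked questions. Some of the labels in the list are descriptive rather than exact schema.org type names: on schema.org, FAQ content is typically marked up with FAQPage, Question, and Answer, and an author is usually expressed through the author property pointing to a Person (or an Organization). The function names and the sample question are made up for this example.

```python
import json


def article(headline: str, author_name: str) -> dict:
    """Build Article markup with the author expressed as a Person."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name},
    }


def faq_page(pairs) -> dict:
    """Build FAQPage markup from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


this_post = article(
    "Understanding Schema: Generative Search's Big Star",
    "Corey Vilhauer",
)

# Invented question and answer for illustration.
faq = faq_page([
    ("What is a schema?", "A system for assigning meaning and context to content."),
])

print(json.dumps(this_post, indent=2))
print(json.dumps(faq, indent=2))
```

The point is less the exact properties and more the pattern: each piece of content says what it is, and who or what it belongs to.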

Where do we start?

For modern generative search, your content needs structure. If your content is already relatively well-structured, this can be a lighter lift: the goal becomes assigning schema markup to those already structured fields. If your content is not well-structured, well, there’s a bit more to work on.

Audit your overall content structure. First, we’ll need to understand what context is already being conveyed. Are pages using structured fields within unique content types? Or is your site using standard or unstructured pages to publish articles or other thought leadership content? In order for generative search to fully embrace your articles, you’ll need to tell generative search that your article is, in fact, an article.
Determine relevant schema. You can go hog wild assigning schemas within the content model, if you want. But not all schema is necessary for every application. While (as we mention above) schema like “Organization” and “Article” are nearly universal, others are relatively industry specific. If you have a web shop with products for sale, you’ll want to employ the “Product” schema, but you wouldn’t need to do that if your site is focused on providing guidance for getting a flu shot. (Don’t know where to start? Consider asking the robots themselves: I asked Claude to generate a list of potential schema for this page announcing a new Hüsker Dü box set from Numero Group and got a very helpful response.)
Add schema to the site. For structured fields, adding schema is relatively easy: it’s a template-level change that can often be done quickly by your development team. For unstructured content, this is a much larger task — in order for schema to be assigned at the page level, it will need to be associated with some kind of structured content, which may require both a reworking of that entire page template and (even harder) a full re-imagining of the content itself.
Make context part of your content. Adding schema markup does not immediately make things work. Just like traditional search optimization relies on writing for understanding, generative search relies on using real words that real people look for. Every piece of content that is added to the site should go through a process that asks “are we saying what someone needs to find this?” The development team will add the code, but the entire organization needs to make context and understanding a major part of ongoing content review.
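
The audit and template steps above can be sketched as a simple mapping from structured content fields to schema properties. This is a rough illustration, not how any particular content management system works: the field names (title, summary, published_on, author_name), the sample entries, and both function names are assumptions made for this example.

```python
import json

# Assumed mapping from CMS field names to schema.org Article properties.
ARTICLE_FIELD_MAP = {
    "title": "headline",
    "summary": "description",
    "published_on": "datePublished",
}


def article_markup(entry: dict) -> dict:
    """Template-level step: turn structured fields into Article markup."""
    markup = {"@context": "https://schema.org", "@type": "Article"}
    for field, prop in ARTICLE_FIELD_MAP.items():
        value = entry.get(field)
        if value:
            markup[prop] = value
    if entry.get("author_name"):
        markup["author"] = {"@type": "Person", "name": entry["author_name"]}
    return markup


def missing_fields(entry: dict) -> list:
    """Audit step: list the structured fields an entry does not provide."""
    expected = list(ARTICLE_FIELD_MAP) + ["author_name"]
    return [field for field in expected if not entry.get(field)]


# Invented entries: one from a structured content type, one unstructured page.
structured_entry = {
    "title": "Example article",
    "summary": "A one-sentence summary of the article.",
    "published_on": "2030-01-15",
    "author_name": "Jane Example",
}
unstructured_page = {
    "title": "Old page",
    "body": "<p>Everything lives in one big rich text field.</p>",
}

print(json.dumps(article_markup(structured_entry), indent=2))
print(missing_fields(unstructured_page))
```

For the structured entry, the markup falls out of the fields almost for free. For the unstructured page, the audit shows what is missing, which is exactly the content work that has to happen before any schema can be assigned.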

It’s not complicated, but it’s definitely new.

Providing context to web content has been around for as long as semantic HTML has been around, and Google’s been relying on certain schema.org collections for a while. The underlying ideas are not new. What has changed is how much more important and useful they have become, now that we need to state meaning explicitly where it was once only implied.

In reality, the work of building a rich generative search presence has less to do with structure and more to do with day-to-day work: listening to the people who use your product or service, understanding what they’re looking for, and providing content that answers those questions. It’s a natural extension of the work you might already be doing to make potential customers and clients feel included and cared for.

If you’re looking for some guidance on what schema might work best for your existing site, give us a call. If you’re wondering if it’s all worth it — it definitely is.