What We Find Most Often in GEO Audits

Author

Corey Vilhauer

Worried your content won't show up in AI search? Here are the five problems we find most often in GEO audits — from easy heading fixes to deeper editorial work.

We've been offering generative engine optimization (GEO) audits for a while now: for our financial services and B2B clients, as free check-ups for conference attendees, and even on our own web properties. The goal is to show how well a site's content is structured to appear in answer engine results.

While the specifics vary from site to site, the patterns don't. This makes sense: the work we as an industry have done over the past twenty years has shifted and flowed alongside web trends, so when a newer trend like GEO pops up, we reassess what we have and realize we're all kind of in the same boat.

While GEO is a new trend, it's one that will clearly be around for a while. Whatever your views on AI, the push for GEO is an exciting time for those in service of clarity. What GEO is teaching us from an editorial standpoint is that clarity, user understanding, and plain language are no longer "nice to haves" reserved for only the strictest government agencies. They are core attributes of what gets picked up by answer engines.

Will it eventually be gamed and adjusted? Of course: this is the internet. But, in the meantime, it's our opportunity to structure our content better, write for understanding, and make sure that every piece of content we put out is as clear as we can make it.

Anyway, that's your pep talk. Now, let's talk about the issues that we're seeing the most on GEO audits.

The site has no schema markup.

This is the most common finding by volume, and it's usually the first thing we check. Schema markup — structured data written in JSON-LD — is a way of explicitly telling search engines what your organization is, what your content covers, and who created it. Without it, search engines and AI tools have to figure all of that out on their own.

The reason it's usually missing is simple: a lot of sites have never implemented schema in the first place. Before the Great GEO Boom, page schema was concentrated in a few specific areas, such as location schema and job opening schema, where a standard had been put in place. Google business listings and maps helped promote the idea of location schema, while job search engines used job opening schema to help provide context around listings.

There's a bit of nuance to all of this: Google recently said that structured data isn't specifically required for its AI features. But Google's not the only game in town, and any added context can only help AI crawlers understand the purpose and message of a page on your site. Schema remains important for rich results eligibility in traditional search, and it may carry weight with other AI platforms like ChatGPT and Perplexity, which haven't published the same kind of guidance.

Beyond that, the process of implementing schema forces you to think clearly about what your content is and who it's for — and that clarity benefits everything. If your site has no schema at all, this is one of the highest-value items you can address.
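To make this concrete, here is a minimal sketch of what an Article schema might look like, built in Python and printed as the JSON-LD script tag that would sit in a page's HTML. The types and properties (Article, Person, Organization, headline, datePublished, author, publisher) come from schema.org. The function names, the author, the organization, and the URL are placeholders made up for illustration, and a real implementation would more likely live in your CMS templates.

```python
import json


def build_article_schema(headline, date_published, author_name, org_name, org_url):
    """Describe one article with schema.org vocabulary, as a plain dict."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "datePublished": date_published,  # ISO 8601 date string
        # Who created the content.
        "author": {"@type": "Person", "name": author_name},
        # Which organization stands behind it.
        "publisher": {"@type": "Organization", "name": org_name, "url": org_url},
    }


def to_json_ld_tag(schema):
    """Wrap the schema in the script tag used to embed JSON-LD in a page."""
    body = json.dumps(schema, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Every value below is a hypothetical placeholder.
article_schema = build_article_schema(
    headline="How a home equity loan works and what it costs",
    date_published="2025-01-15",
    author_name="Jordan Example",
    org_name="Example Credit Union",
    org_url="https://www.example.com",
)
print(to_json_ld_tag(article_schema))
```

Even a small block like this covers the three things described above: what the organization is, what the content covers, and who created it.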

The headings describe nothing.

When we review editorial headings, we typically find issues of clarity and uniqueness. For example, headings like "Overview," "Our Approach," "Learn More," and "Get Started" could probably appear on any page of any website.

A heading's job is to tell the reader what the section covers. But when a heading says "Our Approach," it tells the reader nothing (and it tells an AI tool even less). AI tools lean on a page's structure — including its headings — as one signal for working out what a section covers, so a heading like "How a home equity loan works and what it costs" gives them something to work with, while "Overview" doesn't.

What's more, page headings (especially the first-level H1 headings that usually map to the page title) require more than clarity and context. They also need to be distinct from one page to the next. While this work helps clarify the site for GEO, it's also useful for internal site search and general understanding of site structure.

One recent audit we performed flagged more than 40 pages with generic H1s, which means 40 pages were missing a unique and powerful point of differentiation. The main title of the page did not communicate what the page actually covered.

This is one of the easiest fixes on the list. It doesn't require new content or a technical overhaul — just a pass through your existing pages with a simple question: does this heading tell someone what they're about to read?
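If you'd rather not eyeball every page, a first pass can be automated. The sketch below uses Python's standard-library HTML parser to pull the headings out of a page and flag the ones that match a list of generic labels. The list here is just the four examples from this article, and the function and class names are our own invention; you would extend the list with whatever filler your own site tends to use.

```python
from html.parser import HTMLParser

# Generic labels to flag; an assumption based on the examples in this article.
GENERIC_HEADINGS = {"overview", "our approach", "learn more", "get started"}
HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingCollector(HTMLParser):
    """Collects (tag, text) pairs for every h1-h6 element in a page."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._current_tag = None
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._current_tag = tag
            self._buffer = []

    def handle_data(self, data):
        # Only keep text that appears inside an open heading.
        if self._current_tag:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current_tag:
            text = " ".join("".join(self._buffer).split())
            self.headings.append((tag, text))
            self._current_tag = None


def find_generic_headings(html):
    """Return the headings whose text matches a known generic label."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    return [(tag, text) for tag, text in collector.headings
            if text.lower() in GENERIC_HEADINGS]


sample_page = """
<h1>Overview</h1>
<p>Some introductory copy.</p>
<h2>How a home equity loan works and what it costs</h2>
<h2>Get Started</h2>
"""
flagged = find_generic_headings(sample_page)
print(flagged)  # [('h1', 'Overview'), ('h2', 'Get Started')]
```

A script like this only catches exact matches, so it won't replace the editorial question above. It just tells you where to start asking it.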

Nobody knows who wrote the articles.

If your hope is to be recognized as a thought leader in your industry, you have to establish trust. This means letting your audience know who wrote your articles and why they should trust that person. This is a constant issue, especially at organizations where content is written by committee, produced by an outside agency, or published under a generic brand voice with no individual attached.

This matters because AI search, at its core, tries to predict trustworthy answers to real questions. A page with a named author, a title, and a few lines establishing their experience on the topic carries a stronger credibility signal than the same content published anonymously. It's the difference between "here's what an expert at this organization thinks" and "here's what a website says."

The fix isn't complicated: put a name on your content and add a short bio. This may require updates to the content model itself (to add an author field, or to create a more complex content type that includes the full author bio) but in short, your editorial team simply needs to make it clear that a real person with relevant knowledge is responsible for what's on the page.
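As a rough sketch of that content model change, here is one way an author could be attached to an article, along with a check for pages that are still anonymous. Everything in it is hypothetical (the class names, the field names, the sample author and pages); the only outside vocabulary is schema.org's Person type and its name, jobTitle, and description properties.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Author:
    name: str
    job_title: str
    bio: str  # a few lines establishing experience on the topic


@dataclass
class ArticlePage:
    title: str
    body: str
    author: Optional[Author] = None  # None means published anonymously


def missing_authorship(pages):
    """Return titles of pages with no author, or an author with no bio."""
    return [page.title for page in pages
            if page.author is None or not page.author.bio.strip()]


def author_schema(author):
    """Describe the author as a schema.org Person, as a plain dict."""
    return {
        "@type": "Person",
        "name": author.name,
        "jobTitle": author.job_title,
        "description": author.bio,
    }


# Hypothetical sample content.
jordan = Author(
    name="Jordan Example",
    job_title="Senior Loan Officer",
    bio="Jordan has helped members compare home equity options for 12 years.",
)
pages = [
    ArticlePage("How a HELOC works", "Body copy.", author=jordan),
    ArticlePage("Our Approach", "Body copy."),
]
print(missing_authorship(pages))  # ['Our Approach']
```

The point of the optional field is that anonymous pages become something you can list and work through, not something you discover one at a time.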

This is especially important for topics where expertise matters — financial guidance, healthcare information, legal explanations, or anything where the reader needs a reason to trust the source.

The content doesn't answer anyone's actual question.

The old content strategist in me slowly pumps his fist with this one.

After years of pleading with the world as a whole to begin thinking about the why behind each and every page — to spend time critically assessing why a page exists and whether it's answering the questions a site user might be asking — GEO swings in and whispers into our ears: "I only want content that answers a real question."

It makes sense: answer engines aren't finding things like a traditional search engine might. They're answering questions.

A lot of the pages we audit are full of positioning language: "We're committed to providing innovative solutions for our members," or "Our team delivers best-in-class service." These sentences fill space; they are empty calories posing as LinkedIn posts. Someone searching for "how does a home equity line of credit work" doesn't need to know that you're committed to innovation. They need to know how a HELOC works, what it costs, and whether it's a good fit for their situation.

Google's GEO guidance highlights a concept they call "non-commodity content" — writing that provides real insight, specific information, and a point of view that couldn't come from just anyone. The generic "7 Tips for First-Time Homebuyers" article that exists on ten thousand websites is commodity content. A piece that walks through a specific decision your team helped a real borrower make, with real numbers and a real outcome, is not.

Frustratingly for us content strategists, a site's most important pages (the ones that describe mainline products or areas of expertise) are often the least specific. These are pages that passed through dozens of hands until every sharp edge was sanded into oblivion. The content has been sanitized to the point where it no longer answers any real questions.

This is an editorial concern, and it takes time. It means going back to your most important pages and figuring out the purpose of each section, each paragraph, and each definition. That's way harder than adding schema, fixing a heading, or slapping an FAQ on the bottom of the page, but it's also where the biggest gains are.

The site is accidentally blocking AI search crawlers.

When concerns about AI companies collecting content for training picked up over the past couple of years, a lot of organizations added blanket blocks in their robots.txt files to keep AI crawlers out.

This is okay! It made sense at the time — we really didn't know what to expect, and erred on the side of security and protecting our assets.

Fortunately, the crawlers that train AI models and the crawlers that power AI search are different bots — and the major platforms now document them separately. OpenAI runs three: GPTBot for model training, OAI-SearchBot for indexing the content that surfaces in ChatGPT's search results, and ChatGPT-User for fetching a page live when someone asks a question. Anthropic mirrors this with ClaudeBot for training, Claude-SearchBot for search indexing, and Claude-User for live, user-triggered fetches. Perplexity keeps it to two: PerplexityBot for indexing and Perplexity-User for live retrieval.

If you want to block the training crawlers, you can do that. But block the search crawlers along with them, and you've cut your content out of those platforms' AI answers entirely. The platform might still show a bare link or your page title, but the content itself won't be there to quote or summarize.

The fix is relatively easy: check your robots.txt file and make sure you're blocking only what you intend to block. For many organizations, this is the single highest-impact change they can make, because everything else on this list is irrelevant if the crawlers can't reach your pages in the first place.
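One quick way to run that check is with Python's built-in robots.txt parser. The sketch below tests each of the crawlers named above against a sample file and reports which ones are blocked. The sample robots.txt is invented for illustration: it blocks two training crawlers and, by accident, one search crawler. Note that Python's parser is only an approximation of how each platform reads the file, so treat the output as a prompt to look closer.

```python
from urllib.robotparser import RobotFileParser

# Crawler names as listed in this article, grouped by what they do.
AI_CRAWLERS = {
    "training": ["GPTBot", "ClaudeBot"],
    "search": ["OAI-SearchBot", "Claude-SearchBot", "PerplexityBot"],
    "user-triggered": ["ChatGPT-User", "Claude-User", "Perplexity-User"],
}

# Hypothetical robots.txt with an accidental search-crawler block.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: *
Allow: /
"""


def audit_robots(robots_txt, test_path="/"):
    """Map each crawler to (purpose, allowed) for the given path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    report = {}
    for purpose, agents in AI_CRAWLERS.items():
        for agent in agents:
            report[agent] = (purpose, parser.can_fetch(agent, test_path))
    return report


report = audit_robots(SAMPLE_ROBOTS_TXT)
for agent, (purpose, allowed) in report.items():
    print(f"{agent:18} {purpose:15} {'allowed' if allowed else 'BLOCKED'}")

# Anything blocked outside the training group is worth a second look.
blocked_search = [agent for agent, (purpose, allowed) in report.items()
                  if purpose != "training" and not allowed]
print("Non-training crawlers blocked:", blocked_search)
```

To check a live site instead of a pasted string, `RobotFileParser` can fetch the file itself with `set_url()` and `read()`.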

The common thread.

There's a common thread here, obviously: we are all looking for clarity from people we trust. Schema and headings and specificity and clear authorship are the basics of usable and trustable content. These things have always mattered for search, and they continue to matter in search's next phase. In fact, they probably matter even more.

We're no longer asking for a search engine to point us toward the cake mixes. We're asking a search engine to go grab the cake mix for us. Which means it's even more important to understand which cake mixes are on the shelf in the first place.

Putting that delicious metaphor aside, things are clearer now about what good content looks like: structured, authored, and clear.

If you'd like to run through a checklist yourself, you are in luck. We have put together two self-audit checklists — one for editorial teams and one for technical teams — that cover these findings and more. They're free, they're practical, and they'll give you a solid starting point for understanding where your site stands.

Good luck. It all boils down to one thing: start with content worth trusting, and the machines will trust it too.