Understanding Site Search — Part Two: What Content Will You Search?
You might want to search it all. Or, you might want to search a small section. In part two of our site search series, we look at how to choose what's indexed in your site search.
After discussing what really happens behind the scenes in our last post, it’s time to start making some decisions. What is the scope of your searchable content, and how do you plan to make that content searchable in a way that lets users find what they’re looking for, even if they don’t know exactly what that is?
In this post, we will focus on decisions around choosing the indexable pool of content for your site search.
This is part two of a four-part series on understanding how to plan for site search on your website:
- Introduction: Understanding Site Search
- Part One: How Does Search Work?
- Part Two: What Content Will You Search?
- Part Three: How Will You Organize Results?
- Part Four: Choosing the Right Solution
What content will you search?
The search engine will look at every page of the site and determine if your search query is present, then return results according to its algorithm — the most “important” pages at the top. For example, if you search for “biology,” the search results will include every page that includes the word “biology” somewhere.
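As a rough illustration of that idea, here is a minimal Python sketch. The page paths and text are made up, and counting how often the term appears is only a stand-in for "importance"; real search engines use an index and a far more involved ranking algorithm.

```python
# Hypothetical pages: URL path -> page text (illustrative data only).
pages = {
    "/programs/biology": "The biology major covers cell biology and ecology.",
    "/news/new-lab": "Our new biology lab opens this fall.",
    "/about/history": "The college was founded in 1890.",
}

def search(query, pages):
    """Return paths of pages that contain the query, most mentions first."""
    term = query.lower()
    hits = []
    for path, text in pages.items():
        count = text.lower().count(term)
        if count > 0:
            hits.append((count, path))
    # Stand-in for "importance": more mentions ranks higher, ties sort by path.
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return [path for _, path in hits]

results = search("biology", pages)
print(results)
```

Every page that mentions "biology" comes back, and the page that mentions it most is listed first. The history page never appears because the word is not on it.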
So the question becomes, what content do you want to search?
We talked through the spectrum of inclusion above, but to reiterate:
- Include every page — Simply put, every page is available in search.
- Include every page except a few — Perhaps you don’t want a certain set of content types — bio pages, for example — to show in search. You can remove these pages based on content type or based on section within the content tree.
- Include only a few — Perhaps you only want a specific section or a specific content type — academic programs, for example — to show in search. You can adjust your search to only include relevant sections/content types.
Realistically, you can get incredibly complex in what is allowed or not allowed within search. But remember: the more you limit your search pages, the more requirements you place on an individual page before it will show in search results.
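The three levels of inclusion above can be sketched as a single rule check that runs before a page is added to the index. This is only a sketch: the `Page` fields, the content type names, and the `is_indexable` function are hypothetical, and your CMS or search tool will have its own way of expressing these rules.

```python
from dataclasses import dataclass

@dataclass
class Page:
    path: str          # location in the content tree
    content_type: str  # e.g. "program", "bio", "article" (hypothetical types)

def is_indexable(page, excluded_types=frozenset(), allowed_types=None, allowed_sections=None):
    """Decide whether a page belongs in the search index.

    Each rule that is set is one more requirement the page has to meet.
    """
    if page.content_type in excluded_types:
        return False
    if allowed_types is not None and page.content_type not in allowed_types:
        return False
    if allowed_sections is not None and not page.path.startswith(tuple(allowed_sections)):
        return False
    return True

pages = [
    Page("/academics/biology", "program"),
    Page("/people/jane-doe", "bio"),
    Page("/news/new-lab", "article"),
]

# Include every page.
everything = [p.path for p in pages if is_indexable(p)]
# Include every page except a few (no bio pages).
all_but_bios = [p.path for p in pages if is_indexable(p, excluded_types={"bio"})]
# Include only a few (academic programs only).
programs_only = [p.path for p in pages if is_indexable(p, allowed_types={"program"})]
```

Note how each extra argument narrows the pool: a page has to pass every rule that is set before it can ever show up in results.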
What fields will you search?
Beyond just the pages themselves, you’ll also need to decide which fields you want to search. While full-text search simply scans the page for relevant terms, you can also index individual fields so that you can use them to influence your results later on.
We can assume you will be searching text within the main body field and the title. But do you also want to search:
- Preview, summary, or meta description text
- Categories or tags
- Hidden fields created primarily for search
The fields themselves don’t matter as much as the weight you give to them, which we will discuss a little later on.
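To make the idea of fields and weights concrete, here is a small Python sketch. The field names and the weight values are placeholders chosen for illustration, not recommendations, and real search tools calculate relevance in more sophisticated ways.

```python
# Hypothetical weights: a match in the title counts for more than one in the body.
FIELD_WEIGHTS = {
    "title": 3.0,
    "summary": 2.0,
    "tags": 2.0,
    "keywords": 2.0,  # hidden field created primarily for search
    "body": 1.0,
}

def score(document, query, weights=FIELD_WEIGHTS):
    """Add up the weight of every indexed field that contains the query."""
    term = query.lower()
    total = 0.0
    for field, weight in weights.items():
        value = document.get(field, "")  # a missing field simply does not match
        if term in value.lower():
            total += weight
    return total

aid_page = {
    "title": "Financial Aid",
    "body": "How to apply for scholarships and grants.",
    "tags": "tuition, scholarships",
}
news_page = {
    "title": "Campus News",
    "body": "A new scholarships fund was announced.",
}
bursar_page = {
    "title": "Bursar's Office",
    "body": "Pay your bill here.",
    "keywords": "tuition payment fees",
}

print(score(aid_page, "scholarships"))   # matches in tags and body
print(score(news_page, "scholarships"))  # matches in body only
print(score(bursar_page, "tuition"))     # matches only in the hidden keywords field
```

The bursar page never uses the word "tuition" in its visible text, but the hidden keywords field still lets it surface for that search.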
There’s also a concept in which you provide a unique search option only within a section of content. Think of a search box that only appears within the articles and news feed content types, and only searches articles. This would be considered “sectional” search, and we’ve seen it in a lot of cases:
- A university may have a unique sectional search for programs and majors.
- A hospital system may have a unique sectional search for providers or for locations.
- A commercial product site may have a unique sectional search for warranty and troubleshooting documentation.
These are technically different from your site-wide search in that they are, well, not site-wide. This leads to three questions commonly asked of sectional search:
- Will sectional search results use the same results page layout? Or will they have their own unique search results page?
- Will pages within the sectional search also show in site-wide search?
- Are the two applications using a similar search algorithm? If not, they may provide wildly different results from one search to another.
Answers to these are often determined during content strategy and design.
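One way to picture the relationship between the two is a single index that is filtered by section at query time. The sketch below assumes that setup; the index records, section names, and `search` function are hypothetical, and many sites instead run sectional search as a separate application.

```python
# Hypothetical index entries: each record notes the section it belongs to.
index = [
    {"path": "/providers/dr-lee", "section": "providers",
     "text": "Dr. Lee, cardiology, downtown clinic"},
    {"path": "/locations/downtown", "section": "locations",
     "text": "Downtown clinic with cardiology and imaging"},
    {"path": "/news/heart-month", "section": "news",
     "text": "Cardiology team marks heart month"},
]

def search(query, index, section=None):
    """Search the whole index, or only one section when `section` is given."""
    term = query.lower()
    return [
        entry["path"]
        for entry in index
        if term in entry["text"].lower()
        and (section is None or entry["section"] == section)
    ]

site_wide = search("cardiology", index)
providers_only = search("cardiology", index, section="providers")
```

Because both searches share one index and one matching rule in this sketch, provider pages also appear in site-wide results, and the two searches cannot disagree about what matches. Separate applications give you more freedom, at the cost of that consistency.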
As we’ve mentioned above, search engines do not actually understand the words we’re providing to them — they only work with the context we provide. Which means if we want a search engine to understand that two words are similar enough to return the same results — such as allowing a search for “fridge” to also include results for “refrigerator” — we have to tell the search engine that those words mean the same thing.
While this is a literal use of the concept of a synonym — two words that mean the same thing — we can also provide synonyms to link similar concepts.
For example, while our state might have a set of licensing centers that provide testing and updating of driver’s licenses, we also know that many people call these centers the “DMV.” DMV stands for the “Department of Motor Vehicles,” obviously, but it’s also been accepted as shorthand for what we call a licensing center. Adding this as a synonym for “licensing center” allows someone to search for “DMV” and receive both the pages that include “DMV” and those that include “licensing center.”
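Here is a minimal Python sketch of that behavior, using the two synonym pairs from this post. It expands the query into every term in its synonym group before matching; the group list, page data, and function names are hypothetical, and search tools differ on whether synonyms are applied at query time or at index time.

```python
# Hypothetical synonym groups; every term in a group is treated as equivalent.
SYNONYM_GROUPS = [
    {"fridge", "refrigerator"},
    {"dmv", "licensing center"},
]

def expand(query, groups=SYNONYM_GROUPS):
    """Return the query plus any synonyms we have told the search about."""
    term = query.lower()
    for group in groups:
        if term in group:
            return set(group)
    return {term}  # no synonyms known, so search for the term as typed

def search(query, pages):
    """Return paths of pages that contain the query or any of its synonyms."""
    terms = expand(query)
    return sorted(
        path
        for path, text in pages.items()
        if any(term in text.lower() for term in terms)
    )

pages = {
    "/licenses/renew": "Renew your license at any licensing center.",
    "/faq/dmv": "Looking for the DMV? Start here.",
    "/parks/hours": "Park hours and fees.",
}

results = search("DMV", pages)
print(results)
```

A search for "DMV" returns the FAQ page that uses the term and the renewal page that only says "licensing center." Without the synonym group, the renewal page would be missed.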
Application of synonyms can be complex, with large libraries of industry terms integrated into search, or simple, with someone on your team manually adding synonyms through your CMS to help influence results. This also comes with varying levels of editorial involvement: if you can use an existing library, you may depend on that; if your industry does not have that kind of library, or if your content is more specific in nature, you could spend a lot of time filling in gaps within that synonym library.
Regardless, synonyms are one of the first steps in moving beyond the “what” and into the “how.” We know what content we want to search; now we need to determine how it will be organized on the search results page.
Now it’s time to display results.
With all of this, we are closer to understanding the pool of content we’re working with. But once someone makes a query, how does that translate into a list of results? We’re going to focus on that in the next part of this series, on displaying and sorting results.