Understanding Site Search — Part One: How Search Works
Before we can make decisions about site search, we need to understand how it works. In part one of our site search series, we cover the fundamentals.
It’s nearly impossible to go a day without encountering a search box. It’s in our browsers, it’s on our phones, and it’s on most of the sites we browse day-to-day. But while we have a basic understanding of how search engines work (enter a term, get results, find pages), the reality of site search is that it isn’t Google.
You don’t have access to their proprietary algorithm. You don’t have thousands of engineers. Search on your site isn’t backed by a multinational corporation.
But that doesn’t mean it can’t be good. Site search takes the fundamentals of the larger search engines and boils them down into a localized process. All of the work that goes into Google and Bing — and the work that went into their predecessors — is used in some small way to bring you results on your site.
In this post, we’ll talk about the basics of site search, giving us a bit of context before we talk through how to plan for it.
This is part one of a four-part series on understanding how to plan for site search on your website:
- Introduction: Understanding Site Search
- Part One: How Does Search Work?
- Part Two: What Content Will You Search?
- Part Three: How Will You Organize Results?
- Part Four: Choosing the Right Solution
How does search work: site search in a nutshell.
A search engine requires, in essence, two things: a term to search for (the search query) and a pool of content to search. This pool is created as a search engine “indexes” the site: it reviews, logs, and understands the pool of content in order to be able to return results relevant to the search query.
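To make the query-and-index idea concrete, here is a minimal sketch of an inverted index, one common way to store that pool. The page URLs, page text, and function names are all made up for illustration, and a real search engine does far more than this.

```python
import re
from collections import defaultdict

# Hypothetical pages (URL -> text); none of this comes from a real site.
PAGES = {
    "/about": "We build websites and plan content strategy.",
    "/news/search-tips": "Tips for planning site search on your website.",
    "/contact": "Contact us about your next website project.",
}

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Indexing step: map each word to the set of URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Query step: return URLs that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)

index = build_index(PAGES)
print(search(index, "site search"))
```

The index is built once, ahead of time, so each query only has to look up words rather than reread every page.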
When we talk specifically about site search, we’re talking about an “internal” search that focuses on pages within a specific domain that you manage. What gets included with site search falls along a spectrum of possibilities:
- Section Search — This searches one section or one content type — such as “news articles”
- Full-site Search — This searches all pages on the current site.
- Federated Search — This searches all pages on the current site, plus content from beyond the current install, integrating both sets of pages into a single search result.
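As a rough illustration of that spectrum, the sketch below treats each option as a different pool of pages searched with the same matching rule. The page records, the site and section names, and the title-only matching are assumptions made for the example, not how any particular CMS does it.

```python
# Hypothetical content: each record notes the site and section it belongs to.
LOCAL_PAGES = [
    {"site": "main", "section": "news", "title": "Search launch announced"},
    {"site": "main", "section": "events", "title": "Search workshop"},
    {"site": "main", "section": "news", "title": "Office move"},
]
EXTERNAL_PAGES = [
    {"site": "docs", "section": "guides", "title": "Search configuration guide"},
]

def matches(page, term):
    """Very simple matching: is the term somewhere in the title?"""
    return term.lower() in page["title"].lower()

def section_search(term, section):
    """Search only one section of the current site."""
    return [p for p in LOCAL_PAGES if p["section"] == section and matches(p, term)]

def full_site_search(term):
    """Search every page on the current site."""
    return [p for p in LOCAL_PAGES if matches(p, term)]

def federated_search(term):
    """Search the current site plus external content, merged into one list."""
    return [p for p in LOCAL_PAGES + EXTERNAL_PAGES if matches(p, term)]

for label, results in [
    ("section", section_search("search", "news")),
    ("full site", full_site_search("search")),
    ("federated", federated_search("search")),
]:
    print(label, [p["title"] for p in results])
```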
There are also ways to allow users to manipulate the search results:
- Sorting — Allowing results to be organized by a specific characteristic
- Filtering — Narrowing results to only those that match a chosen attribute, such as a category or a specific content type
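Sorting and filtering both operate on a set of results the engine has already found. Here is a small sketch, assuming each result carries a content type and a publish date; the field names and sample data are hypothetical.

```python
from datetime import date

# Hypothetical results that already matched a query.
RESULTS = [
    {"title": "Search tips", "type": "article", "published": date(2023, 5, 1)},
    {"title": "Search workshop", "type": "event", "published": date(2024, 1, 15)},
    {"title": "Search roadmap", "type": "article", "published": date(2024, 3, 9)},
]

def filter_by_type(results, content_type):
    """Filtering: keep only results of one content type."""
    return [r for r in results if r["type"] == content_type]

def sort_by_date(results, newest_first=True):
    """Sorting: order results by publish date."""
    return sorted(results, key=lambda r: r["published"], reverse=newest_first)

articles = filter_by_type(RESULTS, "article")
for result in sort_by_date(articles):
    print(result["published"], result["title"])
```

Filtering changes which results are shown, while sorting only changes their order.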
Most content management systems come packaged with a base-level search, which provides the bare minimum for search and allows for some level of customization.
Base search: the very basics.
While it’s easy to think that our sites will be as intuitive as Google, using a complex algorithm like PageRank to help filter and sort pages according to perfect levels of relevance, in reality the search engines used within our sites are much less powerful. Part of this is simple economics: our site search will index hundreds or thousands of pages, not millions, which means it calls for a much more scaled-back option that requires less development, less server power, and much less human interaction.
An understanding of what is included with a base search solution helps frame expectations around what site search can accomplish. Typically, it’s good enough, but not perfect, and includes the following at minimum:
- An index of all site content at the page level
- Full-text search of the title and body fields (without weighting)
- A results page that includes the page title and pagination
In most situations, these are enough. This is the absolute base: you enter a term, the engine looks at all pages for that term in the title or body, and it gives you a list of every page that includes that term.
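That base behavior can be sketched in a few lines. This is an illustration only: the pages, the `base_search` function, and the page size of two are assumptions, and the plain substring matching is a simplification (it would also match "research" for the term "search").

```python
# Hypothetical pages with only the two fields a base search looks at.
PAGES = [
    {"title": "About us", "body": "We plan and build websites."},
    {"title": "Search basics", "body": "How indexing works."},
    {"title": "Contact", "body": "Ask us about site search."},
    {"title": "Careers", "body": "Join the search team."},
]

def base_search(pages, term, page=1, per_page=2):
    """Return one page of titles whose title or body contains the term.

    There is no weighting: a page either matches or it doesn't, and
    matches come back in the order the pages were stored.
    """
    term = term.lower()
    hits = [
        p["title"]
        for p in pages
        if term in p["title"].lower() or term in p["body"].lower()
    ]
    start = (page - 1) * per_page
    total_pages = -(-len(hits) // per_page)  # ceiling division
    return {
        "results": hits[start:start + per_page],
        "page": page,
        "total_pages": total_pages,
    }

print(base_search(PAGES, "search", page=1))
print(base_search(PAGES, "search", page=2))
```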
Everything beyond this base functionality requires a bit more decision-making. This is not to say your CMS doesn’t include functionality beyond these three — it’s just that there are choices we may want to make beyond the defaults.
Expanding past the base search.
Base search is fine, and it will serve in a pinch, but there’s going to come a time when you want it to be better. Maybe that time is a year or two after launch. Maybe it’s during development. This is when the bigger decisions begin: your strategic plan may ask for more pointed filtering, or better integration of external sources, or a way to facet products that drives engagement.
Almost anything can be done beyond base search, as long as you understand the following:
- Search needs context — You can think of a search engine as a robot that turns words into numbers. Each page’s words are scored according to an algorithm, and that algorithm delivers results based on the number of times the search term, or related words or fields, are encountered. But the search engine _doesn’t know what those words mean_ unless we assign them some kind of contextual value.
- Context takes resources — The more context you provide a search engine, the better your results will be. But that context requires resources. It requires someone to build that context into the base search engine. It requires an understanding of how each page is structured to provide that context. And it requires added editorial time or contextual tools to tweak results. You can’t just flip a switch and hope Google appears — you must adjust and tweak over time.
- Context requires patience — Additionally, providing context to the search engine isn’t a job that … ends. You may tweak the algorithm over time to deliver better results, or you may adjust your content to do the same. You can’t launch a search option and then forget it forever.
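One simple form of context is telling the engine that a match in the title matters more than a match in the body. The sketch below shows how that changes the ranking. The pages and the weight values are invented for the example, and the right numbers for a real site would come from testing and tweaking over time.

```python
import re

# Hypothetical pages; the first mentions "search" often but is not about search.
PAGES = [
    {"title": "Annual report",
     "body": "Search traffic grew. Search usage rose. Search matters."},
    {"title": "Search help",
     "body": "How to find pages on this site."},
]

def count(term, text):
    """Count how many times the term appears as a whole word."""
    return re.findall(r"[a-z0-9]+", text.lower()).count(term.lower())

def score(page, term, weights):
    """Turn words into a number: weighted term counts across fields."""
    return sum(weight * count(term, page[field])
               for field, weight in weights.items())

def rank(pages, term, weights):
    """Return matching titles, highest score first."""
    scored = [(score(p, term, weights), p["title"]) for p in pages]
    ordered = sorted(scored, key=lambda pair: -pair[0])
    return [title for points, title in ordered if points > 0]

NO_CONTEXT = {"title": 1, "body": 1}    # every match counts the same
WITH_CONTEXT = {"title": 5, "body": 1}  # assumed weights: title matches count more

print(rank(PAGES, "search", NO_CONTEXT))
print(rank(PAGES, "search", WITH_CONTEXT))
```

Without weights, the page that repeats the word most often wins. With them, the page that is actually about search moves to the top.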
With these things in mind, understand that a base level search may serve you just fine at the start. But also understand that your results are simply not going to be as refined as they will be after more involved search discussions and enhancements.
And with that, we can jump into our first set of requirements: how do we determine what we search?