Developing a plan for effective site search can seem overwhelming, especially because it relies on a set of terms not used much outside of the web design and development industry. This need for clarification around vocabulary — and our article on site search requirements — led us to create this site search glossary.
The following glossary is based on the terms we use at Blend to discuss and plan site search projects:
Algorithm
The set of formulas a search engine uses to determine how specific pieces of content rank. For a web search engine like Google, this is made up of hundreds of factors, each weighted differently. Your site search is less complicated: term frequency, the relative “weight” of specific fields or content types, and other boosting factors. Each of those factors still matters just as much.
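To make the idea concrete, here is a minimal Python sketch of the simplest possible ranking formula: score each page by how often the query’s terms appear in it. Production engines use more refined formulas (TF-IDF or BM25, for example) and layer field weights and boosts on top. The function names and sample pages are hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_score(query: str, document: str) -> int:
    """Count how many times the query's terms occur in the document."""
    counts = Counter(tokenize(document))
    return sum(counts[term] for term in tokenize(query))


def rank(query: str, documents: dict[str, str]) -> list[str]:
    """Return IDs of matching documents, highest score first (ties broken by ID)."""
    scored = [(tf_score(query, text), doc_id) for doc_id, text in documents.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in scored if score > 0]


# Hypothetical pages keyed by ID.
docs = {
    "lodging": "Hotels near campus. Book hotels early for graduation.",
    "parking": "Parking permits and visitor parking near campus.",
    "history": "A history of the university.",
}
print(rank("hotels campus", docs))
```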
Autocomplete
Word completion based on what you have already typed. Autocomplete is most often used as a search suggestion method in a search query field (the “search box”) and is displayed as a dropdown that updates as you type. It only completes words that already exist; typically, these are words present within the site taxonomy.
Also known as type-ahead.
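A minimal sketch of autocomplete, assuming the vocabulary is a hypothetical, pre-sorted list of taxonomy terms. Keeping the list sorted lets a binary search (Python’s `bisect` module) jump straight to the first candidate, and the loop stops as soon as a term no longer shares the typed prefix.

```python
from bisect import bisect_left

# Hypothetical taxonomy vocabulary, kept sorted so prefix lookups can use binary search.
TAXONOMY_TERMS = sorted(
    ["admissions", "advising", "alumni", "athletics", "biology", "business"]
)


def autocomplete(prefix: str, limit: int = 5) -> list[str]:
    """Return up to `limit` taxonomy terms that begin with the typed prefix."""
    prefix = prefix.lower()
    start = bisect_left(TAXONOMY_TERMS, prefix)  # index of the first term >= prefix
    matches = []
    for term in TAXONOMY_TERMS[start:]:
        # Sorted order means once a term stops matching, no later term can match.
        if len(matches) >= limit or not term.startswith(prefix):
            break
        matches.append(term)
    return matches


print(autocomplete("ad"))
```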
Autosuggest
Word completion based on what you have already typed, but with the addition of more contextual suggestions. Where autocomplete attempts to finish words based on its own taxonomic knowledge, autosuggest attempts to interpret those words to provide another layer of possibilities. For example, autosuggest might offer terms related to the current query, even if they don’t include the letters already typed.
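The difference from autocomplete can be sketched by adding a hypothetical map of related terms (in practice this might come from the taxonomy, from editors, or from search analytics). Suggestions like “lodging” appear even though they do not start with the letters typed.

```python
# Hypothetical taxonomy terms and related terms (which might come from editors or analytics).
TERMS = ["housing", "hotels", "history", "health services"]
RELATED = {
    "hotels": ["lodging", "where to stay"],
    "housing": ["residence halls"],
}


def autosuggest(typed: str, limit: int = 5) -> list[str]:
    """Complete the typed prefix, then add related terms for each completion."""
    typed = typed.lower()
    suggestions: list[str] = []
    for term in TERMS:
        if not term.startswith(typed):
            continue
        # Related terms need not share any letters with what was typed.
        for candidate in [term, *RELATED.get(term, [])]:
            if candidate not in suggestions:
                suggestions.append(candidate)
    return suggestions[:limit]


print(autosuggest("hot"))
```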
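Best Bets
Best Bets let editors pair specific search queries with a page so that the page is returned as a recommended result whenever those queries are searched. For example, if you have a specific page that provides information on lodging around your university, you may give it keywords such as “hotels” or “where to stay” to make sure those search queries return the Lodging page regardless of the algorithm.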
Boosting
Boosting in site search comes from assigning keywords or a Best Bets style “recommended” term to a page. It artificially raises that page’s ranking above what its on-page content alone would earn.
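Best Bets and keyword boosting can be sketched as a post-processing step over scores the algorithm has already produced. The data, the fixed `KEYWORD_BOOST` bonus, and the function names are all assumptions for illustration; real search engines expose their own boost and Best Bets settings.

```python
# Hypothetical editor-managed settings.
BEST_BETS = {"where to stay": ["lodging"]}  # query -> pages pinned to the top
PAGE_KEYWORDS = {"lodging": {"hotels", "where to stay"}}  # page -> assigned keywords
KEYWORD_BOOST = 10.0  # assumed fixed bonus added when the query matches a keyword


def apply_boosts(query: str, scores: dict[str, float]) -> list[str]:
    """Rank pages by algorithmic score plus keyword boosts, with Best Bets pinned first."""
    q = query.lower().strip()
    boosted = {
        page: score + (KEYWORD_BOOST if q in PAGE_KEYWORDS.get(page, set()) else 0.0)
        for page, score in scores.items()
    }
    organic = sorted(boosted, key=boosted.get, reverse=True)
    pinned = BEST_BETS.get(q, [])
    # Pinned pages appear even if the algorithm did not return them at all.
    return pinned + [page for page in organic if page not in pinned]


algorithm_scores = {"parking": 3.0, "history": 1.0, "lodging": 0.5}
print(apply_boosts("hotels", algorithm_scores))
```

Here the Lodging page has the lowest algorithmic score, but the keyword boost lifts it to the top for “hotels”, and the Best Bet returns it for “where to stay” even when the algorithm did not match it.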
Content Type
A content type is a unique template within the content management system. Content types can include page types (such as news articles or university programs) or blocks (such as an accordion block or call to action block). In site search, content types can sometimes be used to help filter or exclude content.
Crawl
A search engine crawls a site by moving from link to link in an attempt to index every page. This process uncovers all available pages (as long as they are linked from another page).
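A crawl is essentially a breadth-first traversal of the site’s link graph. This sketch uses a hypothetical in-memory map of links instead of fetching real URLs, and it shows why an unlinked page is never discovered.

```python
from collections import deque

# Hypothetical site: each URL maps to the URLs it links to.
SITE_LINKS = {
    "/": ["/about", "/programs"],
    "/about": ["/", "/contact"],
    "/programs": ["/programs/biology"],
    "/programs/biology": [],
    "/contact": [],
    "/orphan": [],  # no page links here, so the crawler never finds it
}


def crawl(start: str) -> list[str]:
    """Breadth-first crawl: visit each reachable page once, following its links."""
    seen = {start}
    queue = deque([start])
    visited = []
    while queue:
        url = queue.popleft()
        visited.append(url)
        for link in SITE_LINKS.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


print(crawl("/"))
```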
Episerver Find
Episerver Find is Episerver’s proprietary search solution. It is included with every Episerver DXP installation and can also be added to non-DXP installations.
Faceting
Faceting is a set of filters that can be combined to narrow results along multiple facets (hence the name). For example, a search results page may include facets for product type, product size, and product availability, and all three could be applied together to narrow down results.
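A minimal sketch of faceting over hypothetical product records: selected facets are combined with AND, and the per-value counts are the numbers typically shown beside each facet option.

```python
from collections import Counter

# Hypothetical product records with three facet fields.
PRODUCTS = [
    {"name": "Trail Tent", "type": "tent", "size": "large", "available": True},
    {"name": "Solo Tent", "type": "tent", "size": "small", "available": False},
    {"name": "Day Pack", "type": "backpack", "size": "small", "available": True},
    {"name": "Expedition Pack", "type": "backpack", "size": "large", "available": True},
]


def apply_facets(items: list[dict], selected: dict) -> list[dict]:
    """Keep only items matching every selected facet value (facets combine with AND)."""
    return [
        item for item in items
        if all(item[field] == value for field, value in selected.items())
    ]


def facet_counts(items: list[dict], field: str) -> Counter:
    """Count how many items carry each value of a facet, for display beside the filter."""
    return Counter(item[field] for item in items)


large_available = apply_facets(PRODUCTS, {"size": "large", "available": True})
print([p["name"] for p in large_available], facet_counts(large_available, "type"))
```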
Federated Search
Federated search pulls in search results from one or more external data sources. For example, a university might use a separate content management system to populate bio pages; federated search would combine the site content and the external bio content as if it were all part of the same pool of content.
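A simplified federated search, assuming two hypothetical sources that each return `(title, score)` pairs. It also assumes their scores are already on a comparable scale; in practice, scores from different engines rarely are, and normalizing them is one of the harder parts of federated search.

```python
# Hypothetical sources: the CMS index and an external faculty-bio system.
# Scores are assumed to be on a comparable 0..1 scale.
def search_cms(query: str) -> list[tuple[str, float]]:
    pages = {"Biology Program": 0.9, "Campus Map": 0.2}
    return [(title, score) for title, score in pages.items() if query.lower() in title.lower()]


def search_bios(query: str) -> list[tuple[str, float]]:
    bios = {"Dr. Lee, Biology Faculty": 0.7, "Dr. Patel, History Faculty": 0.6}
    return [(title, score) for title, score in bios.items() if query.lower() in title.lower()]


def federated_search(query: str) -> list[dict]:
    """Query every source, tag each hit with its origin, and merge into one ranked list."""
    sources = {"cms": search_cms, "bios": search_bios}
    hits = [
        {"title": title, "score": score, "source": name}
        for name, search in sources.items()
        for title, score in search(query)
    ]
    return sorted(hits, key=lambda hit: hit["score"], reverse=True)


print(federated_search("biology"))
```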
Filter
A filter in site search is a term selected from a vocabulary of existing terms to help exclude non-relevant pages from the search results. For example, selecting a filter of “articles” will remove any non-article pages from a search results page.
Index
As a search engine crawls a site, it indexes (stores, organizes, and makes available for search results) the pages it encounters. When a search engine indexes a site, it essentially makes a mental note that “these are pages I can return.”
There are two ways that a search engine creates an index:
- Inclusive Indexing: Every page the crawl encounters is indexed, and it is up to developers to decide what to exclude. In this case, the entire site is open for search results by default.
- Exclusive Indexing: Nothing is indexed until the development team determines what needs to be included. In this case, developers provide the search engine with a list of content types and site sections to include, and everything else is excluded by default.
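The two approaches can be contrasted in a short sketch, assuming each page carries a hypothetical content type field: inclusive indexing starts from everything and subtracts exclusions, while exclusive indexing starts from nothing and adds inclusions.

```python
# Hypothetical pages found by a crawl, each tagged with a content type.
PAGES = [
    {"url": "/news/open-house", "type": "article"},
    {"url": "/programs/biology", "type": "program"},
    {"url": "/blocks/footer-cta", "type": "call_to_action_block"},
]


def inclusive_index(pages: list[dict], excluded_types: set[str]) -> list[str]:
    """Index every page except content types developers explicitly exclude."""
    return [p["url"] for p in pages if p["type"] not in excluded_types]


def exclusive_index(pages: list[dict], included_types: set[str]) -> list[str]:
    """Index nothing except content types developers explicitly include."""
    return [p["url"] for p in pages if p["type"] in included_types]


print(inclusive_index(PAGES, {"call_to_action_block"}))
print(exclusive_index(PAGES, {"program"}))
```

With no configuration at all, `inclusive_index(PAGES, set())` returns every page, while `exclusive_index(PAGES, set())` returns none, which mirrors the two defaults described above.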
Keyword
Ideas, words, and phrases that drive content retrieval. Specific keywords can be added to a page in order to boost its search relevance, and these keywords are often found by studying site search data. For example, if users are constantly searching for “cms implementation,” that phrase becomes an important keyword for any page related to developing and customizing a content management system.
Some sites may employ a “keyword” field, often with some kind of field boosting, in order to influence search results. Most CMSs also include a meta keywords field for external web search, though Google and other search engines have stopped using this field due to abuse.
A keyword is not the same thing as a search query, though most “search queries” include keywords. A keyword is more like an ideal concept, while a search query is the actual terms used.
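PageRank
The link-analysis algorithm developed in 1996 by Google founders Larry Page and Sergey Brin, designed to rank pages by link popularity: how many pages link to a page, and how important those linking pages are. Google has adapted and adjusted its ranking non-stop since then, and its full search algorithm, of which PageRank is only one part, is seen as wildly sophisticated and fiercely protected.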
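For reference, the original PageRank idea can be sketched as a power iteration over a link graph. This is the simplified textbook version with the commonly cited damping factor of 0.85, not Google’s current ranking, which is not public. The three-page graph is hypothetical.

```python
def pagerank(
    links: dict[str, list[str]], damping: float = 0.85, iterations: int = 50
) -> dict[str, float]:
    """Simplified PageRank by power iteration; every link target must be a key in `links`."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share (the "random jump").
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank across all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


graph = {"home": ["about", "programs"], "about": ["home"], "programs": ["home"]}
print(pagerank(graph))
```

Because both other pages link to it, “home” ends up with the highest score; the scores always sum to 1.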
Pagination
The logical separation of a long list of results into equal-sized pages. Pagination is most often displayed at the bottom of a page as a set of page numbers, along with “Previous Page” and “Next Page” links.
Pagination is typically set at the template level, usually at 10 to 20 items per page.
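Pagination logic is mostly arithmetic on the page number and page size. A sketch, assuming a default of 10 items per page and a hypothetical `paginate` helper that clamps out-of-range page numbers:

```python
import math


def paginate(results: list, page: int, per_page: int = 10) -> dict:
    """Return one page of results plus the values needed to render page links."""
    total_pages = max(1, math.ceil(len(results) / per_page))
    page = min(max(page, 1), total_pages)  # clamp out-of-range page numbers
    start = (page - 1) * per_page
    return {
        "items": results[start:start + per_page],
        "page": page,
        "total_pages": total_pages,
        "has_previous": page > 1,  # controls the "Previous Page" link
        "has_next": page < total_pages,  # controls the "Next Page" link
    }


print(paginate(list(range(1, 26)), page=3))
```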
Predictive Search
Predictive search takes the concept of autosuggest one step further: instead of simply showing a list of related terms as the user types, the search engine begins to return actual pages based on the current query. These predictions can surface results faster, while the user is still typing, but at the cost of burying other related content that might otherwise surface.
Query
The actual word or string of words for the current search. The query may also include the associated filters.
See Best Bets.
Results Page
The results page is the list of results returned for a specific search query. It is sometimes known as a “Search Engine Results Page” or SERP.
Search Categories and Metadata
Pages can include fields (such as a “category” or any behind-the-scenes metadata) that add to the algorithmic score of a page beyond the actual text on the page. For example, a page about a medical degree might include a category for “Pre-Med,” or it may include metadata such as a keyword field.
In both cases, the user may not see this content at all.
Section Search
Search limited to a specific section of the site, such as a location search or an academic program search. Section search is typically performed by the same search engine as the overall site search, but with parameters that restrict results to specific content types.
Solr
An open-source, enterprise-level search engine from the Apache Software Foundation, written in Java and built on Apache Lucene. Solr is not tied to a specific content management system.
Sorting
Typically performed after the search engine has returned results, sorting organizes the results according to a specific field, such as publish date, price, or last name.
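Because sorting happens after retrieval, it can be sketched as reordering an already-returned list by one field instead of by relevance. The result records and field names are hypothetical.

```python
from datetime import date

# Hypothetical results already returned by the search engine, in relevance order.
RESULTS = [
    {"title": "Faculty Profile: Garcia", "published": date(2022, 9, 1), "last_name": "Garcia"},
    {"title": "Faculty Profile: Adams", "published": date(2023, 1, 15), "last_name": "Adams"},
    {"title": "Faculty Profile: Brown", "published": date(2021, 5, 20), "last_name": "Brown"},
]


def sort_results(results: list[dict], field: str, descending: bool = False) -> list[dict]:
    """Reorder returned results by one field; the original list is left unchanged."""
    return sorted(results, key=lambda result: result[field], reverse=descending)


newest_first = sort_results(RESULTS, "published", descending=True)
print([r["title"] for r in newest_first])
```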
Stemming
Stemming allows search engines to associate similar words with different suffixes, such as “working” and “worked,” allowing for a level of context that would otherwise be lost in the literal interpretation of a search query.
Stemming is typically handled by a stemming algorithm, such as one written in Snowball, a small string-processing language designed for creating stemmers.
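To show what a stemmer does, here is a deliberately naive suffix-stripping sketch. It is not the Snowball or Porter algorithm, which handle far more cases; a real site search would use a Snowball-generated stemmer (NLTK’s `SnowballStemmer` is one Python option). The suffix list and function names are hypothetical.

```python
# Deliberately naive suffix list; this is NOT the Snowball or Porter algorithm.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stems_match(a: str, b: str) -> bool:
    """Treat two words as related if they reduce to the same stem."""
    return naive_stem(a) == naive_stem(b)


print(stems_match("working", "worked"))
```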
Summary
A summary appears alongside a link on a search engine results page (SERP) and is either generated automatically from the content of the page or written by an editor. It is important for providing context around a link, giving users insight into the contents of the page before they click on a result.
Synonyms
Synonyms are similar words that are tied together as “equal” by the search engine, so that when a user types in a synonym such as “DMV,” the search engine understands that this also means “licensing center.”
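Synonym handling is often implemented as query expansion: the engine searches for every equivalent term, not just the one typed. A minimal sketch, assuming hypothetical synonym groups in which every term counts as equal:

```python
# Hypothetical synonym groups; every term in a group is treated as equal.
SYNONYM_GROUPS = [
    {"dmv", "licensing center"},
    {"dorm", "residence hall"},
]


def expand_query(query: str) -> set[str]:
    """Return the normalized query plus every synonym from any group that contains it."""
    q = query.lower().strip()
    expanded = {q}
    for group in SYNONYM_GROUPS:
        if q in group:
            expanded |= group
    return expanded


print(expand_query("DMV"))
```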
Taxonomy
Within site search, a taxonomy refers to a set of vocabulary groups that represent key concepts across the site. The most basic way to think about vocabulary groups is as categories: the terms you use to categorize, say, the departments within your organization all belong to an “Organization” vocabulary group within the larger taxonomy.
Weighting
Weighting within a search algorithm assigns additional relevance to certain fields or content types within indexed content. For example, most site search engines weight the title field more heavily than other fields, since the title is perceived as the most important place for a query’s terms to appear.
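Field weighting can be sketched by multiplying term matches in each field by that field’s weight. The weights, field names, and page data here are assumptions; with them, a single title match outranks two matches in body text.

```python
import re

# Assumed weights: a title match counts three times as much as a body match.
FIELD_WEIGHTS = {"title": 3.0, "summary": 2.0, "body": 1.0}


def weighted_score(query: str, page: dict[str, str]) -> float:
    """Sum term matches in each field, multiplied by that field's weight."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        words = re.findall(r"[a-z0-9]+", page.get(field, "").lower())
        score += weight * sum(words.count(term) for term in terms)
    return score


# Hypothetical pages.
page_a = {"title": "Lodging", "body": "Information about places to stay."}
page_b = {"title": "Campus Events", "body": "Lodging for visitors is listed on the lodging page."}
print(weighted_score("lodging", page_a), weighted_score("lodging", page_b))
```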