Find out how to crank your site search to 11 with Elasticsearch.
Website search bars have become ubiquitous. As website designers, we spend a great deal of time creating strategic, targeted menu structures to give users a clear and quick path to the information they need, but sometimes site visitors need to find what they’re looking for even faster. So we give them the search option. You’ve seen it – that little magnifying glass icon or the empty text field that just says “Search.” You’ve probably used it, but have you ever thought about how it works?
Generally speaking, most sites handle search in one of three ways: they directly query the website database, they integrate a third-party, web-based search engine like Google, or they use Elasticsearch.
A direct database query is the simplest and most easily integrated search functionality for a website. You type “retriever” into the search bar, and it returns every instance of the word “retriever” that it can find in the website database. The query is straightforward, clean, and easy. The main drawback is speed: if the query has a great deal of data to sift through, or you’re asking it to return only the results for “retriever” that appear in a specific category or section of the site, these searches can take a long time to perform. It’s also not a “smart” search. It generally can’t tell that “retriever,” “retrieving,” and “retrieve” are essentially the same as far as a user is concerned. It certainly won’t know that, if you’re searching for “retriever,” you may actually be interested in results for “labrador” or “dog.” It just returns matches for the literal word “retriever.”
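To make that limitation concrete, here is a minimal sketch of a direct database search using Python’s built-in sqlite3 module. The table name, column names, and sample pages are all made up for illustration; a real site’s schema and database engine will differ.

```python
import sqlite3

# Hypothetical sample content; a real site would have its own tables and data.
SAMPLE_PAGES = [
    (1, "Our golden retriever loves the park."),
    (2, "Tips for retrieving lost files."),
    (3, "Labrador puppies available this spring."),
]


def build_demo_db():
    """Create an in-memory database holding the sample pages."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany("INSERT INTO pages (id, body) VALUES (?, ?)", SAMPLE_PAGES)
    return conn


def direct_search(conn, term):
    """Return ids of pages whose body contains the literal search term."""
    rows = conn.execute(
        "SELECT id FROM pages WHERE body LIKE ? ORDER BY id",
        (f"%{term}%",),  # plain substring match, nothing smarter
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_demo_db()
    print(direct_search(conn, "retriever"))  # only the page with the exact word
    print(direct_search(conn, "dog"))        # no page mentions "dog" literally
```

A search for “retriever” finds only the first page. The “retrieving” page and the “Labrador” page are skipped, because a substring match knows nothing about word forms or related terms.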
“We’re off to see the wizard!”
Using an external search engine for your website search is another popular option. This allows you to use the vast searching power of Google on your own website. That sounds amazing on the face of it, and in many respects it IS amazing. Google is essentially the great and powerful Oz – they know everything, right? But there are a few drawbacks. First, you are at the mercy of Google’s indexing schedule. Any new content (new blogs, events, pages, etc.) that hasn’t been picked up and indexed by Google yet won’t show in your search results. This means that your search results and your actual website content might not match up if you’ve recently updated copy or images on a page. Second, you have little to no control over what the returned search results look like or how they’re prioritized. This may seem relatively small in the grand scheme of things, but it can be a big deal if you want the pages, products, or events that matter most to you at the top of the search results. There is also significantly less flexibility when it comes to styling the returned results, so they may not match your branding. The search will look like what it is: an outside search tool.
So, if you have complex searches that need to run fast and you don’t want to be tied into an outside source, you’re in luck. There is a third option that is available to you.
Elasticsearch is, simply put, extremely powerful, extremely fast, and extremely reliable. At its most basic, Elasticsearch uses distributed search “nodes” to run simultaneous searches on multiple sets of data in order to get you the results that you’ve asked for as fast as possible.
A discussion about Elasticsearch can get super technical, super fast if we’re not careful, so we’ll use an imperfect metaphor to demonstrate how it works.
Imagine that you are in charge of a research department for a major library. Your department has three teams (called “nodes”) of researchers. Each of those teams has the exact same set of books at their disposal – duplicate libraries, if you will. The first team has a third of those books open on tables in front of them, ready to dig through. The second team has a third of their books open too, but they are different books from the first team’s. And the third team has a different set of books open than either the first or the second team. When you ask a research question of your teams, every member on those teams grabs their open books and starts searching, and whoever gets the answer first returns it to you. Make sense?
Here’s where it gets neat.
Suppose someone else is asking questions of the teams at the same time as you. That’s ok, because there is more than one person on each team. If someone is busy researching an answer for someone else, another team member jumps in and searches the books for you. The result is still returned quickly and efficiently. This also works across teams. Suppose something happens and the entire staff of Team A is put out of commission – they all get the flu or something. Because every team has duplicate libraries, the remaining two teams basically split up the third of the books that Team A had open and open them on their own tables to be searched quickly, resulting in no downtime for the research department.
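For readers who like to see ideas in code, the research department can be sketched as a toy Python model. This models the metaphor only, not Elasticsearch’s actual internals: the class name, method names, and the rule for handing out a sick team’s books are all assumptions made for illustration, and the sketch simply gathers every team’s matches into one list.

```python
class ResearchDepartment:
    """Toy model: every team can reach every stack of books,
    but each stack is open on exactly one team's table at a time."""

    def __init__(self, teams, stacks):
        self.live_teams = list(teams)
        self.stacks = dict(stacks)  # stack id -> list of book texts
        self.open_on = {}           # stack id -> team that has it open
        # Deal the stacks out to the teams round-robin.
        for i, stack_id in enumerate(sorted(self.stacks)):
            self.open_on[stack_id] = self.live_teams[i % len(self.live_teams)]

    def _open_count(self, team):
        """How many stacks a team currently has open."""
        return sum(1 for owner in self.open_on.values() if owner == team)

    def fail(self, team):
        """Take a team out of commission and split its open stacks
        among the remaining teams, least busy team first."""
        self.live_teams.remove(team)
        orphaned = [s for s, owner in self.open_on.items() if owner == team]
        for stack_id in sorted(orphaned):
            self.open_on[stack_id] = min(self.live_teams, key=self._open_count)

    def search(self, term):
        """Ask every open stack for matches and gather the results."""
        hits = []
        for stack_id in sorted(self.stacks):
            for book in self.stacks[stack_id]:
                if term.lower() in book.lower():
                    hits.append(book)
        return hits


if __name__ == "__main__":
    dept = ResearchDepartment(
        teams=["A", "B", "C"],
        stacks={0: ["dog breeds"], 1: ["cat care"], 2: ["dog training"]},
    )
    print(dept.search("dog"), dept.open_on)
    dept.fail("A")  # Team A catches the flu
    print(dept.search("dog"), dept.open_on)
```

The search returns the same books before and after Team A drops out. The only thing that changes is which team has each stack open.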
Ok, metaphor aside, what does all of this mean for websites using Elasticsearch?
It means that Elasticsearch is fast because it doesn’t rely on a single search query to return every result – there’s a whole bunch of researchers doing the work at the same time.
It means that Elasticsearch is stable because everything is distributed between multiple groups (called “nodes”) of researchers who all have duplicate libraries. If a node gets taken out by a malicious attack, the search will still run and will still be fast because there are more research teams to take up the slack.
It means that Elasticsearch is scalable. If the amount of data to be searched or the amount of search traffic grows to the point where the existing teams are starting to get bogged down with requests, you can simply add whole new teams of researchers (add new nodes to the cluster) to help with the work.
There are other advantages to Elasticsearch as well. The nature of the search structure allows a near infinite amount of customization to how results are returned. Do you want searches for “colorized” to return results for “colorful”, “colored”, “color”, or “coloring”? Elasticsearch can do that. It can even be customized so that searches for “mahogany” return results for “hardwood” or “brown.” Do you want search results prioritized so that results that are physically closer to the user are displayed first? Elasticsearch can do that. Do you want to show only results that are within a 20-mile radius of the user? Elasticsearch can do that too. The possibilities are nearly endless.
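As a rough sketch of what that customization can look like, the Python below builds two JSON bodies of the kind Elasticsearch accepts: index settings with a synonym filter and an English stemmer, and a search that keeps only results within 20 miles and sorts them nearest first. The field names (body, location), the analyzer and filter names, and the helper function are hypothetical, and the layout assumes a recent (7.x or later) Elasticsearch. Nothing here talks to a server; it only prints the JSON.

```python
import json

# Hypothetical index definition: "body" is analyzed with stemming and synonyms,
# "location" stores a latitude/longitude pair for distance queries.
INDEX_DEFINITION = {
    "settings": {
        "analysis": {
            "filter": {
                "wood_synonyms": {
                    "type": "synonym",
                    "synonyms": ["mahogany, hardwood, brown"],
                },
                "english_stemmer": {"type": "stemmer", "language": "english"},
            },
            "analyzer": {
                "site_text": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "wood_synonyms", "english_stemmer"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "body": {"type": "text", "analyzer": "site_text"},
            "location": {"type": "geo_point"},
        }
    },
}


def build_nearby_query(text, lat, lon, radius="20mi"):
    """Build a search body: match the text, keep results inside the
    radius, and sort the closest results first."""
    point = {"lat": lat, "lon": lon}
    return {
        "query": {
            "bool": {
                "must": [{"match": {"body": text}}],
                "filter": [
                    {"geo_distance": {"distance": radius, "location": point}}
                ],
            }
        },
        "sort": [
            {"_geo_distance": {"location": point, "order": "asc", "unit": "mi"}}
        ],
    }


if __name__ == "__main__":
    print(json.dumps(INDEX_DEFINITION, indent=2))
    print(json.dumps(build_nearby_query("mahogany", 40.0, -75.0), indent=2))
```

In practice these bodies would be sent to a cluster through an Elasticsearch client or its REST API, and the synonym list and radius would come from your own content and business rules.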
“Phenomenal cosmic powers…itty-bitty living space.”
So where’s the catch? There’s always a trade-off, right? It can do anything we ask, it does it fast, and it never fails… so why isn’t everyone using it?
The short answer is that it’s expensive. The Elasticsearch tool itself is free, but the servers needed to host it are not. Not even close. Elasticsearch has historically required some pretty hefty machinery to run, and hosting on that kind of server environment is not cheap. Additionally, it takes time and expertise to set it up, get it integrated with a website, and make it do all of its fancy tricks. That time and skill can also cost a decent amount of money. There are some ways to reduce this cost (Elasticsearch has just released a new service called Found that could make using Elasticsearch more accessible to businesses on a tighter budget), but it is still more expensive than other options.
With the kind of investment in time and equipment that Elasticsearch requires, it’s not going to be a viable option for everyone. But, if your organization’s web platform fits any of the below scenarios, it might be worth exploring Elasticsearch:
- You are a resource site with a huge amount of data that needs to be indexed and searched.
- You have custom filtering needs – stacking filters, multiple keywords, categories, etc. The most common examples of this are eCommerce solutions and resource sites.
- You have lots of traffic on your site and simply cannot afford to go down or even slow down due to heavy traffic.
- You have an “App Site.” Your primary purpose is dynamically serving data in a customized way, not just standard information retrieval.
- You have multiple sets of data in a single site (for example, free resources or products vs. registered user only resources or products).
We get excited about Elasticsearch because it allows us to create extraordinarily powerful online tools and platforms for clients that are fast and efficient as well as beautiful. We’ve barely scratched the surface of what Elasticsearch can do in this blog, but we’re more than happy to answer any questions that you might have about it and we’d love a chance to get geeky with you on the details. Feel free to contact us any time to learn more!