Quick tips from the dance floor from a Java specialist, enterprise developer, and mobile technology enthusiast.
Understanding ElasticSearch analyzers
Sadly, lots of early Internet beer recipes aren’t in an easily digestible format; that is, these recipes are unstructured, intermixed lists of directions and ingredients, often originally composed in an email or forum post.
So while it’s hard to easily put these recipes into traditional data stores (ostensibly for easier searching), they’re perfect for ElasticSearch in their current form.
Accordingly, imagine an ElasticSearch index full of beer recipes, since…well…I enjoy making beer (and drinking it too).
First, I’ll add some beer recipes to ElasticSearch using Node’s ElasticSearch Client (note that the code is CoffeeScript, though). I’ll add these recipes to a beer_recipes index like so:
<code class='javascript'>beer_1 = {
  name: "Todd Enders' Witbier",
  style: "wit, Belgian ale, wheat beer",
  ingredients: "4.0 lbs Belgian pils malt, 4.0 lbs raw soft red winter wheat, 0.5 lbs rolled oats, 0.75 oz coriander, freshly ground Zest from two table oranges and two lemons, 1.0 oz 3.1% AA Saaz, 3/4 corn sugar for priming, Hoegaarden strain yeast"
}

client.index('beer_recipes', 'beer', beer_1).on('data', (data) ->
  console.log(data)
).exec()
</code>
Note how the interesting part of the recipe JSON document, dubbed beer_1, is found in the ingredients field. This field is basically a big string of valuable text (you can imagine how this string was essentially the body of an email). So while the ingredients field is unstructured, it’s clearly something that people will want to search on.
It’s a hot summer’s day and I’m thinking I’d like to make a beer with lemon as an ingredient (to be clear: I want to use lemon zest, which is obtained from a lemon peel). So naturally, I need to find (i.e. search for) a recipe with lemons in it.
Consequently, I’ll search my index for recipes that contain the word “lemon”.
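A minimal sketch of what such a search request might look like, written in plain JavaScript. The use of a term query and the variable name lemonQuery are assumptions made for illustration; the snippet only builds and prints the request body, so it runs without an ElasticSearch server.

```javascript
// Hypothetical request body (an assumption, not necessarily the exact query used):
// a term query looks up the exact token "lemon" in the ingredients field.
const lemonQuery = {
  query: {
    term: { ingredients: "lemon" },
  },
};

// This body would be sent to the beer_recipes index through the client's search call.
console.log(JSON.stringify(lemonQuery, null, 2));
```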
But nothing shows up – there are no results! Why is that?
If you look closely at the earlier code example (specifically, the beer_1 JSON document), you can see that the word “lemons” is in the text (i.e. “…two table oranges and two lemons…”). It turns out that, given the way ElasticSearch indexes values by default, “lemon” doesn’t necessarily match, though “lemons” does.
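To see why, here is a toy sketch in plain JavaScript of how an index built from whole-word tokens behaves. It is not ElasticSearch’s actual implementation: the analyze, buildIndex, and termSearch helpers are hypothetical, and the sketch assumes default analysis roughly amounts to lowercasing the text and splitting it into words, without reducing plurals to a root form.

```javascript
// Toy illustration only; NOT how ElasticSearch is implemented internally.
// Assumption: analysis lowercases text and splits it into word tokens,
// with no stemming (so "lemons" is never reduced to "lemon").
function analyze(text) {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0);
}

// Build an inverted index: token -> set of ids of documents containing it.
function buildIndex(docs) {
  const index = new Map();
  for (const [id, text] of Object.entries(docs)) {
    for (const token of analyze(text)) {
      if (!index.has(token)) index.set(token, new Set());
      index.get(token).add(id);
    }
  }
  return index;
}

// Exact-token lookup: only tokens stored in the index can match.
function termSearch(index, term) {
  return [...(index.get(term) || [])];
}

// An excerpt of the ingredients text from beer_1.
const docs = {
  beer_1: "freshly ground Zest from two table oranges and two lemons",
};
const index = buildIndex(docs);

console.log(termSearch(index, "lemon"));  // prints []
console.log(termSearch(index, "lemons")); // prints [ 'beer_1' ]
```

Because the stored token is “lemons”, a lookup for “lemon” finds nothing; matching the singular would require the indexed tokens, the search term, or both to be normalized to a common form.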