Getting started with Querqy

Querqy is a query rewriting framework for Java-based search engines. It is probably best known for its rule-based query rewriting, which applies synonyms and query-dependent filters, and boosts or demotes documents. This rule-based query rewriting is implemented in the ‘Common Rules Rewriter’, but Querqy’s capabilities go far beyond this rewriter.

You might want to …

Installation


Installation under Elasticsearch

  • Stop Elasticsearch if it is running.

  • Open a shell and cd into your Elasticsearch directory.

  • Run Elasticsearch’s plugin install script:

./bin/elasticsearch-plugin install <URL>

The <URL> depends on your Elasticsearch version. For example, the install command for the plugin build 1.5.es7172.0 is:

./bin/elasticsearch-plugin install \
   "https://repo1.maven.org/maven2/org/querqy/querqy-elasticsearch/1.5.es7172.0/querqy-elasticsearch-1.5.es7172.0.zip"

  • Answer yes to the security-related questions (Querqy needs special permissions to load query rewriters dynamically).

  • When you start Elasticsearch, you should see the following INFO log message: loaded plugin [querqy].

Installation under Solr

Warning

Querqy configuration changed in an incompatible way with the introduction of Querqy 5 for Solr. Make sure to follow the instructions for your Querqy version below. Detailed information about the changes and a migration guide to Querqy 5 for Solr are available in the Querqy documentation.

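The Querqy plugin is installed as a .jar file. Once the .jar file is available to Solr, register the Querqy components in your solrconfig.xml file, using the snippet below that matches your Querqy version.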

Querqy 5

<!--
   Add the Querqy request handler.
-->
<requestHandler name="/querqy/rewriter" class="querqy.solr.QuerqyRewriterRequestHandler" />

<!--
    Add the Querqy query parser.
-->
<queryParser name="querqy" class="querqy.solr.QuerqyDismaxQParserPlugin"/>

<!--
   Override the default QueryComponent.
-->
<searchComponent name="query" class="querqy.solr.QuerqyQueryComponent"/>

Querqy 4

<!--
    Add the Querqy query parser.
-->
<queryParser name="querqy" class="querqy.solr.DefaultQuerqyDismaxQParserPlugin"/>

<!--
   Override the default QueryComponent.
-->
<searchComponent name="query" class="querqy.solr.QuerqyQueryComponent"/>

Making queries using Querqy

Elasticsearch

Querqy defines its own query builder which can be executed with a rich set of parameters. We will walk through these parameters step by step, starting with a minimal query, which does not use any rewriter, then adding a ‘Common Rules Rewriter’ and finally explaining the full set of parameters, many of them not related to query rewriting but to search relevance tuning in general.

Minimal Query

POST /myindex/_search

 1 {
 2   "query": {
 3     "querqy": {
 4       "matching_query": {
 5         "query": "notebook"
 6       },
 7       "query_fields": ["title^3.0", "brand^2.1", "shortSummary"]
 8     }
 9   }
10 }

Querqy provides a new query builder, querqy (line #3), that can be used in a query just like any other Elasticsearch query type. The matching_query (#4) defines the query for which documents will be matched and retrieved.

The matching query is different from boosting queries, which only influence the ranking but not the matching. We will later see that Querqy allows you to specify boosting information outside the matching_query object, and that query rewriting can change the set of matching documents, for example, by adding synonyms or by deleting query tokens.

The query element (#5) contains the query string. In most cases this is just the query string as it was typed into the search box by the user.

The list of query_fields (#7) specifies in which fields to search. A field name can have an optional field weight. In the example, the field weight for title is 3.0. The default field weight is 1.0. Field weights must be positive. We will later see that the query_fields can be applied to parts of the querqy query other than the matching_query as well. That’s why the query_fields list is not a child element of the matching_query.
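
The field weight notation described above can be illustrated with a small Python sketch. The helper name parse_query_field is hypothetical and not part of Querqy. It only mirrors the rules stated in the text (an optional weight after ^, a default of 1.0, positive weights only) and makes no claim about how Querqy itself parses the list.

```python
from typing import Tuple

DEFAULT_WEIGHT = 1.0  # default field weight stated in the text


def parse_query_field(spec: str) -> Tuple[str, float]:
    """Split a 'field^weight' entry into (field name, weight).

    Hypothetical helper for illustration; not part of the Querqy API.
    """
    name, sep, raw_weight = spec.partition("^")
    if not name:
        raise ValueError(f"missing field name in {spec!r}")
    # Without a '^' separator the default weight applies.
    weight = float(raw_weight) if sep else DEFAULT_WEIGHT
    if weight <= 0:
        raise ValueError(f"field weight must be positive: {spec!r}")
    return name, weight


query_fields = ["title^3.0", "brand^2.1", "shortSummary"]
parsed = [parse_query_field(spec) for spec in query_fields]
print(parsed)
```

Validating the list on the client side like this can catch a zero or negative weight before the request is sent.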

The combination of a query string with a list of fields and field weights resembles Elasticsearch’s built-in multi_match query. We will later see that there are some differences in matching and scoring.

Querqy inside the known Elasticsearch Query DSL

The following example shows how easy it is to replace an Elasticsearch query type like multi_match with a Querqy matching_query, so that you can benefit from Querqy’s rewriters. Let’s say you have an index that contains forum posts and you want to find a certain post in the topic “hobby” that was made 10-12 days ago and was about “fishing”.

A simple Boolean query with a multi_match and a match query inside the must occurrence and a range query in the filter occurrence should do the trick.

POST /myindex/_search

{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "topic": "hobby"
          }
        },
        {
          "multi_match": {
            "query": "fishing",
            "fields": ["title", "content"]
          }
        }
      ],
      "filter": [
        {
          "range": {
            "dateField": {
              "gte": "now-12d",
              "lte": "now-10d"
            }
          }
        }
      ]
    }
  }
}

To use the matching_query from the querqy query builder, your request would look like this:

POST /myindex/_search

 1 {
 2   "query": {
 3     "bool": {
 4       "must": [
 5         {
 6           "match": {
 7             "topic": "hobby"
 8           }
 9         },
10         {
11           "querqy": {
12             "matching_query": {
13               "query": "fishing"
14             },
15             "query_fields": ["title", "content"],
16             "rewriters": ["my_replace_rewriter", "my_common_rules"]
17           }
18         }
19       ],
20       "filter": [
21         {
22           "range": {
23             "dateField": {
24               "gte": "now-12d",
25               "lte": "now-10d"
26             }
27           }
28         }
29       ]
30     }
31   }
32 }

As you can see, to use a matching_query instead of a multi_match you need to use querqy (line #11) as a “wrapper” for the matching_query.
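
The replacement described above can also be sketched in Python for clients that build their request bodies programmatically. The function multi_match_to_querqy is a hypothetical helper, not part of Querqy or of any Elasticsearch client. It assumes the multi_match clause carries only "query" and "fields", and the rewriter names are the ones used in the example request.

```python
import json
from typing import Any, Dict, List


def multi_match_to_querqy(clause: Dict[str, Any], rewriters: List[str]) -> Dict[str, Any]:
    """Return a querqy clause for a multi_match clause; pass other clauses through.

    Hypothetical helper for illustration. It copies only the "query" and
    "fields" entries and ignores any other multi_match options.
    """
    if "multi_match" not in clause:
        return clause
    multi_match = clause["multi_match"]
    return {
        "querqy": {
            "matching_query": {"query": multi_match["query"]},
            "query_fields": list(multi_match["fields"]),
            "rewriters": list(rewriters),
        }
    }


# The "must" clauses of the original Boolean query.
must_clauses = [
    {"match": {"topic": "hobby"}},
    {"multi_match": {"query": "fishing", "fields": ["title", "content"]}},
]

rewritten = [
    multi_match_to_querqy(clause, ["my_replace_rewriter", "my_common_rules"])
    for clause in must_clauses
]
print(json.dumps({"query": {"bool": {"must": rewritten}}}, indent=2))
```

The match clause passes through unchanged, and only the multi_match clause is wrapped in querqy. The filter part of the Boolean query is left out of the sketch because it does not change.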

Solr

If you followed the instructions for installing Querqy, you have configured a Querqy query parser in your solrconfig.xml file. This query parser can be used with a rich set of parameters. We will walk through these parameters step by step, starting with a minimal query, which does not use any rewriter, then adding a ‘Common Rules Rewriter’ and finally explaining the full set of parameters, many of them not related to query rewriting but to search relevance tuning in general.

For better readability, the URL parameters in the examples are not encoded.

Minimal Query

/solr/mycollection/select?q=notebook&defType=querqy&qf=title^3.0 brand^2.1 shortSummary

The Querqy query parser is enabled using the defType parameter.

As usual in Solr, the q parameter defines the query for which documents will be matched and retrieved. In most cases the value of the q parameter is just the query string as it was typed into the search box by the user. Querqy query rewriting can add boosting information outside that query or change the set of matching documents, for example, by adding synonyms or by deleting query tokens.

The qf parameter specifies in which fields to search. A field name can have an optional field weight. In the example, the field weight for title is 3.0. The default field weight is 1.0. Field weights must be positive.

The use of the q and qf parameters resembles Solr’s built-in dismax and edismax query parsers. We will later see that there are some differences in how scoring works.
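
Because the example URL is shown unencoded, a client would normally encode the parameters before sending the request. The following is a minimal sketch using only Python's standard library. The collection name mycollection and the parameter values are taken from the example above, and no request is actually sent.

```python
from urllib.parse import urlencode

# Parameters from the minimal query example above.
params = {
    "q": "notebook",
    "defType": "querqy",
    "qf": "title^3.0 brand^2.1 shortSummary",
}

# urlencode percent-encodes characters such as '^' and
# encodes the spaces between the qf entries as '+'.
query_string = urlencode(params)
url = "/solr/mycollection/select?" + query_string
print(url)
```

Most HTTP client libraries perform this encoding themselves when the parameters are passed as a mapping, so building the string by hand is mainly useful for logging or debugging.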

Where to go next