Rewriters

Rewriters manipulate the query that was entered by the user. They can change the result set by adding alternative tokens, by removing tokens or by adding filters. They can also influence the ranking by adding boosting information.

A single query can be rewritten by more than one rewriter. Together they form the rewrite chain.

Before you can apply a rewrite chain, you need to configure one or more rewriters.

Configuring and applying a rewriter

We will use a minimal example of the ‘Common Rules Rewriter’ - Querqy’s most popular rewriter - to demonstrate how a rewrite chain is configured in principle.

As search engines differ in how configurations are supplied to them, the instructions below apply to Elasticsearch and OpenSearch.

Querqy adds a REST endpoint to Elasticsearch/OpenSearch for managing rewriters at

/_querqy/rewriter

Creating/configuring a ‘Common Rules rewriter’:

PUT  /_querqy/rewriter/common_rules

1{
2    "class": "querqy.elasticsearch.rewriter.SimpleCommonRulesRewriterFactory",
3    "config": {
4        "rules" : "notebook =>\nSYNONYM: laptop"
5    }
6}

Note

OpenSearch users: Simply replace package name elasticsearch with opensearch in rewriter configurations.

Rewriter definitions are uploaded by sending a PUT request to the rewriter endpoint. The last part of the request URL path (common_rules) will become the name of the rewriter.

A rewriter definition must contain a class element (line #2). Its value references an implementation of querqy.elasticsearch.ESRewriterFactory, which provides the rewriter that we want to use.

The rewriter definition can also have a config object (#3) which contains the rewriter-specific configuration.

In the case of the SimpleCommonRulesRewriter, the configuration must contain the rewriting rules (#4). Because the rules are embedded in a JSON string, remember to escape line breaks (as \n), double quotes and backslashes when you include your rules in a JSON document.
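Escaping the rules by hand is error-prone, so it can help to let a JSON library do it. The following is a minimal sketch in Python, not part of Querqy itself: the function name common_rules_definition and its engine parameter are hypothetical, and the class name is simply the one from the example above, with the package name swapped as described in the note for OpenSearch users.

```python
import json


def common_rules_definition(rules: str, engine: str = "elasticsearch") -> str:
    """Build the JSON body for PUT /_querqy/rewriter/<name>.

    `engine` is a hypothetical convenience parameter: pass "opensearch" to
    swap the package name as described in the note for OpenSearch users.
    """
    definition = {
        "class": f"querqy.{engine}.rewriter.SimpleCommonRulesRewriterFactory",
        "config": {"rules": rules},
    }
    # json.dumps escapes line breaks, quotes and backslashes in the rules.
    return json.dumps(definition, indent=4)


# The rules as you would write them in a plain text file (real line break).
rules = "notebook =>\nSYNONYM: laptop"
body = common_rules_definition(rules)
print(body)
```

The printed document can be sent as the body of the PUT request shown above. Parsing it again with json.loads returns the original rules, including the real line break.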

We can now apply one or more rewriters to a query:

POST /myindex/_search

 1{
 2  "query": {
 3      "querqy": {
 4          "matching_query": {
 5              "query": "notebook"
 6          },
 7          "query_fields": [ "title^3.0", "brand^2.1", "shortSummary"],
 8          "rewriters": ["common_rules"]
 9      }
10  }
11}

The rewriters are added to the minimal query that we constructed earlier using a list of named rewriters (line #8). This list contains the rewrite chain - the list of rewriters in the order in which they will be applied and in which they will manipulate the query. The above example contains only a single rewriter.

Rewriters are referenced in the rewriters element either just by their name or by the name property of an object, which allows request parameters to be passed to the rewriter. The following example shows two rewriters, one of them with additional parameters:

POST /myindex/_search

 1{
 2  "query": {
 3      "querqy": {
 4          "matching_query": {
 5              "query": "notebook"
 6          },
 7          "query_fields": [ "title^3.0", "brand^2.1", "shortSummary"],
 8          "rewriters": [
 9              "word_break",
10              {
11                  "name": "common_rules",
12                  "params": {
13                      "criteria": {
14                          "filter": "$[?(!@.prio || @.prio == 1)]"
15                      }
16                  }
17              }
18          ]
19      }
20  }
21}

The first rewriter, word_break (line #9), is referenced just by its name (we will see a ‘word break rewriter’ configuration later). The second rewriter is referenced via a JSON object. Its name property references the rewriter definition by the rewriter name, ‘common_rules’ (#11). The params object (#12) is passed to the rewriter.

In the example, params contains a criteria object (#13). This parameter is specific to the Common Rules rewriter. The filter expression in the example ensures that only rules that either have a prio property set to 1 or that don’t have any prio property at all will be applied.
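To make the effect of the filter expression concrete, the sketch below restates the described selection logic in plain Python. It does not evaluate the filter expression and is not how Querqy implements it; the rule names and the properties dictionaries are made-up sample data.

```python
def rule_is_applied(properties: dict) -> bool:
    """Mirror the described selection: no prio property, or prio equal to 1."""
    return "prio" not in properties or properties["prio"] == 1


# Hypothetical rule properties, for illustration only.
sample_rules = {
    "rule_a": {"prio": 1},
    "rule_b": {"prio": 2},
    "rule_c": {},  # no prio property at all
}

applied = [name for name, props in sample_rules.items() if rule_is_applied(props)]
print(applied)  # ['rule_a', 'rule_c']
```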

In the above example rewrite chain, the word_break rewriter will be applied before the common_rules rewriter due to the order of the rewriters in the rewriters JSON list element.
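If you assemble search requests in application code, the rewrite chain can be built as an ordinary ordered list. The following Python sketch reproduces the request body from the example above; the helper names rewriter_ref and querqy_search_body are hypothetical and not part of any Querqy client library.

```python
import json
from typing import Any, Optional, Union


def rewriter_ref(name: str, params: Optional[dict] = None) -> Union[str, dict]:
    """Reference a rewriter by name only, or by an object that carries params."""
    return name if params is None else {"name": name, "params": params}


def querqy_search_body(query: str, query_fields: list, rewriters: list) -> dict:
    """Build the body for POST /<index>/_search with a querqy query."""
    return {
        "query": {
            "querqy": {
                "matching_query": {"query": query},
                "query_fields": query_fields,
                # List order is the order in which the rewriters are applied.
                "rewriters": rewriters,
            }
        }
    }


search_body: dict[str, Any] = querqy_search_body(
    "notebook",
    ["title^3.0", "brand^2.1", "shortSummary"],
    [
        rewriter_ref("word_break"),
        rewriter_ref(
            "common_rules",
            {"criteria": {"filter": "$[?(!@.prio || @.prio == 1)]"}},
        ),
    ],
)
print(json.dumps(search_body, indent=2))
```

Because Python lists keep their order when serialised to JSON, word_break stays ahead of common_rules in the resulting chain.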

Updating and deleting rewriters

To update a rewriter configuration, just send the updated configuration in a PUT request to the same rewriter URL again.

To delete a rewriter, send a request with HTTP method DELETE to the rewriter URL. For example,

DELETE  /_querqy/rewriter/common_rules

will delete your common_rules rewriter.
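The update and delete calls can be scripted with any HTTP client. The sketch below uses Python's standard urllib.request and only constructs the requests without sending them. The base URL http://localhost:9200 and the helper name rewriter_request are assumptions; adjust the host, port and authentication to your cluster.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:9200"  # assumption: a local, unsecured cluster


def rewriter_request(
    name: str, method: str, definition: Optional[dict] = None
) -> urllib.request.Request:
    """Prepare a request against the rewriter endpoint /_querqy/rewriter/<name>."""
    data = None if definition is None else json.dumps(definition).encode("utf-8")
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(
        f"{BASE_URL}/_querqy/rewriter/{name}",
        data=data,
        headers=headers,
        method=method,
    )


# Update: PUT the changed definition to the same rewriter URL again.
update = rewriter_request(
    "common_rules",
    "PUT",
    {
        "class": "querqy.elasticsearch.rewriter.SimpleCommonRulesRewriterFactory",
        "config": {"rules": "notebook =>\nSYNONYM: laptop"},
    },
)

# Delete: same URL, HTTP method DELETE, no body.
delete = rewriter_request("common_rules", "DELETE")

# To actually send a request: urllib.request.urlopen(update)
print(update.get_method(), update.full_url)
print(delete.get_method(), delete.full_url)
```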

List of available rewriters

The list below contains all rewriters that come with Querqy. See the documentation of each rewriter for details.

Common Rules Rewriter

Query-dependent rules for synonyms, result boosting (up/down) and filters; can also ‘decorate’ the result with additional information.

Replace Rewriter

Replaces query terms. Used as a query normalisation step, usually applied before the query is processed further, for example, before the Common Rules Rewriter is applied.

Word Break Rewriter

(De)compounds query tokens. Splits compound words or creates compounds from separate tokens.

Number-Unit Rewriter

Recognises numerical values and units of measurement in the query and matches them with indexed fields. Allows for range matches and boosting of the exactly matching value.

Shingle Rewriter

Creates shingles (compounds) from adjacent query tokens and adds them as synonyms.