Rewriters

Rewriters manipulate the query that was entered by the user. They can change the result set by adding alternative tokens, by removing tokens or by adding filters. They can also influence the ranking by adding boosting information.

A single query can be rewritten by more than one rewriter. Together they form the rewrite chain.

Before you can apply a rewrite chain, you need to configure one or more rewriters.

Configuring and applying a rewriter

We will use a minimal example of the ‘Common Rules Rewriter’ - Querqy’s most popular rewriter - to demonstrate how a rewrite chain is configured in principle.

As search engines differ in how configurations are supplied to them, the Elasticsearch/OpenSearch and Solr setups are described separately below.

Rewriters in Elasticsearch/OpenSearch

Querqy adds a REST endpoint to Elasticsearch/OpenSearch for managing rewriters at

/_querqy/rewriter

Creating/configuring a ‘Common Rules rewriter’:

PUT  /_querqy/rewriter/common_rules

1{
2    "class": "querqy.elasticsearch.rewriter.SimpleCommonRulesRewriterFactory",
3    "config": {
4        "rules" : "notebook =>\nSYNONYM: laptop"
5    }
6}

Note

OpenSearch users: Simply replace the package name elasticsearch with opensearch in rewriter configurations.

Rewriter definitions are uploaded by sending a PUT request to the rewriter endpoint. The last part of the request URL path (common_rules) will become the name of the rewriter.

A rewriter definition must contain a class element (line #2). Its value references an implementation of a querqy.elasticsearch.ESRewriterFactory which will provide the rewriter that we want to use.

The rewriter definition can also have a config object (#3) which contains the rewriter-specific configuration.

In the case of the SimpleCommonRulesRewriter, the configuration must contain the rewriting rules (#4). Remember to escape line breaks etc. when you include your rules in a JSON document.
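The escaping mentioned above does not have to be done by hand. The following is a minimal sketch in Python (the variable names are illustrative and not part of Querqy) that lets a JSON serializer produce the request body from a multi-line rules string:

```python
import json

# Rules text as it would appear in a plain rules file, with a real line break.
rules = """notebook =>
SYNONYM: laptop"""

# Rewriter definition as shown above; only the rules value is filled in from the string.
definition = {
    "class": "querqy.elasticsearch.rewriter.SimpleCommonRulesRewriterFactory",
    "config": {"rules": rules},
}

# json.dumps escapes the line break as \n, so the result is a valid JSON document
# that can be used as the body of the PUT request.
body = json.dumps(definition, indent=4)
print(body)
```

Parsing the body again with json.loads returns the original rules text, line break included.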

We can now apply one or more rewriters to a query:

POST /myindex/_search

 1{
 2  "query": {
 3     "querqy": {
 4         "matching_query": {
 5             "query": "notebook"
 6         },
 7         "query_fields": [ "title^3.0", "brand^2.1", "shortSummary"],
 8         "rewriters": ["common_rules"]
 9     }
10  }
11}

The rewriters are added to the minimal query that we constructed earlier using a list of named rewriters (line #8). This list contains the rewrite chain - the list of rewriters in the order in which they will be applied and in which they will manipulate the query. The above example contains only a single rewriter.

Rewriters are referenced in the rewriters element either just by their name or by the name property of an object, which allows passing request parameters to the rewriter. The following example shows two rewriters, one of them with additional parameters:

POST /myindex/_search

 1{
 2  "query": {
 3     "querqy": {
 4         "matching_query": {
 5             "query": "notebook"
 6         },
 7         "query_fields": [ "title^3.0", "brand^2.1", "shortSummary"],
 8         "rewriters": [
 9             "word_break",
10             {
11                 "name": "common_rules",
12                 "params": {
13                     "criteria": {
14                         "filter": "$[?(!@.prio || @.prio == 1)]"
15                     }
16                 }
17             }
18         ]
19     }
20   }
21}

The first rewriter, word_break (line #9), is referenced just by its name (we will see a ‘word break rewriter’ configuration later). The second rewriter is referenced by a JSON object. Its name property references the rewriter definition by the rewriter name, ‘common_rules’ (#11). The params object (#12) is passed to the rewriter.

In the example, params contains a criteria object (#13). This parameter is specific to the Common Rules rewriter. The filter expression in the example ensures that only rules that either have a prio property set to 1 or that don’t have any prio property at all will be applied.
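To make the effect of this filter concrete, here is a small Python sketch that restates the selection described above as a plain predicate. It is not how Querqy evaluates the expression; the rule property objects and their id values are hypothetical and only serve as an illustration.

```python
from typing import Any, Dict, List


def passes_filter(rule_properties: Dict[str, Any]) -> bool:
    """Keep a rule if it has no prio property at all or if its prio equals 1."""
    return "prio" not in rule_properties or rule_properties["prio"] == 1


# Hypothetical properties attached to three rules.
rules: List[Dict[str, Any]] = [
    {"id": "r1"},             # no prio property: applied
    {"id": "r2", "prio": 1},  # prio == 1: applied
    {"id": "r3", "prio": 2},  # any other prio: skipped
]

selected = [r["id"] for r in rules if passes_filter(r)]
print(selected)  # ['r1', 'r2']
```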

In the above example rewrite chain, the word_break rewriter will be applied before the common_rules rewriter due to the order of the rewriters in the rewriters JSON list element.
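If search requests are assembled in application code, the rewrite chain can be kept as an ordered list. The sketch below uses Python; the helper names rewriter_ref and querqy_search_body are made up for this example and are not part of any Querqy client. It builds the same request body as above and keeps the rewriters in the order in which they should be applied.

```python
import json
from typing import Any, Dict, List, Optional, Union

RewriterRef = Union[str, Dict[str, Any]]


def rewriter_ref(name: str, params: Optional[Dict[str, Any]] = None) -> RewriterRef:
    """Reference a rewriter by name only, or by an object when params are passed."""
    if params is None:
        return name
    return {"name": name, "params": params}


def querqy_search_body(query: str, query_fields: List[str],
                       rewriters: List[RewriterRef]) -> Dict[str, Any]:
    """Build the body of a search request; list order is the rewrite chain order."""
    return {
        "query": {
            "querqy": {
                "matching_query": {"query": query},
                "query_fields": list(query_fields),
                "rewriters": list(rewriters),
            }
        }
    }


chain = [
    rewriter_ref("word_break"),
    rewriter_ref("common_rules",
                 {"criteria": {"filter": "$[?(!@.prio || @.prio == 1)]"}}),
]
body = querqy_search_body("notebook",
                          ["title^3.0", "brand^2.1", "shortSummary"], chain)
print(json.dumps(body, indent=2))
```

Swapping the two entries of chain would apply common_rules before word_break.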

Updating and deleting rewriters

To update a rewriter configuration, just send the updated configuration in a PUT request to the same rewriter URL again.

To delete a rewriter, send a request with HTTP method DELETE to the rewriter URL. For example,

DELETE  /_querqy/rewriter/common_rules

will delete your common_rules rewriter.
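The same requests can be prepared from a script. The following Python sketch uses only the standard library and assumes that the cluster is reachable at http://localhost:9200; that address and the helper names are assumptions made for this example. The requests are only constructed here, not sent.

```python
import json
from typing import Any, Dict
from urllib.request import Request

ES_URL = "http://localhost:9200"  # assumption: adjust to your cluster


def rewriter_url(name: str) -> str:
    """URL of a named rewriter below the /_querqy/rewriter endpoint."""
    return f"{ES_URL}/_querqy/rewriter/{name}"


def put_rewriter(name: str, definition: Dict[str, Any]) -> Request:
    """PUT creates the rewriter, or updates it if the name already exists."""
    return Request(
        rewriter_url(name),
        data=json.dumps(definition).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def delete_rewriter(name: str) -> Request:
    """DELETE removes the rewriter with the given name."""
    return Request(rewriter_url(name), method="DELETE")


definition = {
    "class": "querqy.elasticsearch.rewriter.SimpleCommonRulesRewriterFactory",
    "config": {"rules": "notebook =>\nSYNONYM: laptop"},
}
update = put_rewriter("common_rules", definition)
removal = delete_rewriter("common_rules")
print(update.get_method(), update.full_url)
print(removal.get_method(), removal.full_url)
```

To actually send a request, pass it to urllib.request.urlopen and add whatever authentication your cluster requires.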

Rewriter configuration in Solr

Warning

Querqy configuration changed in an incompatible way with the introduction of Querqy v5 for Solr. Make sure to follow the documentation for your Querqy version below. See the migration guide for detailed information about the changes and how to migrate to Querqy 5 for Solr.

Querqy 5

Querqy adds a URL endpoint to Solr for managing rewriters. When you set up Querqy in solrconfig.xml, you added a request handler for this:

<requestHandler name="/querqy/rewriter" class="querqy.solr.QuerqyRewriterRequestHandler" />

You can then manage your rewriters at

http://<solr host>:<port>/solr/mycollection/querqy/rewriter

Creating/configuring a ‘Common Rules rewriter’:

POST /solr/mycollection/querqy/rewriter/common_rules?action=save
Content-Type: application/json

1{
2    "class": "querqy.solr.rewriter.commonrules.CommonRulesRewriterFactory",
3    "config": {
4        "rules" : "notebook =>\nSYNONYM: laptop"
5    }
6}

Rewriter definitions are uploaded by sending a POST request and appending the action=save parameter to the rewriter endpoint. The last part of the request URL path (common_rules) will become the name of the rewriter.

A rewriter definition must contain a class element (line #2). Its value references an implementation of a querqy.solr.SolrRewriterFactoryAdapter which will provide the rewriter that we want to use.

The rewriter definition can also have a config object (#3-5), which contains the rewriter-specific configuration. In the case of the CommonRulesRewriterFactory, the configuration must contain the rewriting rules (#4). Remember to escape line breaks etc. when you include your rules in a JSON document.

If you work with SolrJ, you can build your configuration request using the request builders that come with most of the Querqy-supplied rewriters. Look for the *ConfigRequestBuilder classes in the Java packages under querqy.solr.rewriter.

Once the rewriter is configured, we can apply one or more rewriters to a query:

GET /solr/mycollection/select?q=notebook&defType=querqy&querqy.rewriters=common_rules&qf=title^3.0...

The parameter defType=querqy enables the Querqy query parser. The optional parameter querqy.rewriters contains a list of comma-separated rewriter names. These rewriters form the rewrite chain and they are processed in their order of occurrence. In this specific example, we only used the rewriter that we defined in our POST request above and we reference it by its name common_rules. Had we configured another rewriter under /solr/mycollection/querqy/rewriter/replace, we could apply the ‘replace’ rewriter before the ‘common_rules’ rewriter using the URL parameter querqy.rewriters=replace,common_rules.
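When the query string is built in code, the rewrite chain can be kept as a list and joined with commas. This is a minimal Python sketch; the function name is illustrative, and joining several qf fields with spaces follows the usual Solr convention for that parameter rather than anything specific to this page.

```python
from typing import List
from urllib.parse import parse_qs, urlencode


def querqy_select_params(q: str, rewriters: List[str], qf: List[str]) -> str:
    """Build the query string for /select with the Querqy query parser enabled."""
    params = {"q": q, "defType": "querqy", "qf": " ".join(qf)}
    if rewriters:
        # querqy.rewriters is optional; the list order is the rewrite chain order.
        params["querqy.rewriters"] = ",".join(rewriters)
    return urlencode(params)


query_string = querqy_select_params(
    "notebook", ["replace", "common_rules"], ["title^3.0"]
)
print("/solr/mycollection/select?" + query_string)

# Decoding the query string shows the values as Solr will receive them.
decoded = parse_qs(query_string)
print(decoded["querqy.rewriters"])  # ['replace,common_rules']
```

urlencode percent-encodes characters such as ^ and the comma; Solr decodes them again when it parses the request.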

By default, Solr replies with a 400 Bad Request response if a rewriter passed in the ‘querqy.rewriters’ parameter does not exist. See the ‘Advanced Solr Plugin Configuration’ documentation for an option to ignore missing rewriters.

Updating and deleting rewriters (Querqy 5)

To update a rewriter configuration, just send the updated configuration in a POST request with action=save to the same rewriter URL again.

To delete a rewriter, send a POST request with action=delete to the rewriter URL. For example,

POST /solr/mycollection/querqy/rewriter/common_rules?action=delete

will delete your common_rules rewriter.

Getting rewriter information (Querqy 5)

You can get a list of configured rewriters at:

GET /solr/mycollection/querqy/rewriter

To retrieve the configuration of a specific rewriter, you can make a GET call against its endpoint. In the case of the common_rules rewriter above, the call would be:

GET /solr/mycollection/querqy/rewriter/common_rules
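The Querqy 5 management calls described above differ only in HTTP method, path and action parameter. The following Python sketch collects them in one place; the base URL and the helper name rewriter_endpoint are assumptions made for this example.

```python
from typing import Optional
from urllib.parse import quote

SOLR_URL = "http://localhost:8983"  # assumption: adjust to your installation


def rewriter_endpoint(collection: str, name: Optional[str] = None,
                      action: Optional[str] = None) -> str:
    """URL of the rewriter request handler, optionally for one rewriter and action."""
    url = f"{SOLR_URL}/solr/{quote(collection)}/querqy/rewriter"
    if name is not None:
        url += "/" + quote(name)
    if action is not None:
        url += "?action=" + quote(action)
    return url


# (HTTP method, URL) pairs for the operations described above.
calls = {
    "save": ("POST", rewriter_endpoint("mycollection", "common_rules", "save")),
    "delete": ("POST", rewriter_endpoint("mycollection", "common_rules", "delete")),
    "list": ("GET", rewriter_endpoint("mycollection")),
    "get": ("GET", rewriter_endpoint("mycollection", "common_rules")),
}
for operation, (method, url) in calls.items():
    print(f"{operation:7s} {method:4s} {url}")
```

Only the save call carries a request body, namely the JSON rewriter definition.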

Querqy 4

The rewrite chain is configured at the Querqy query parser in solrconfig.xml:

 1<!--
 2  Add the Querqy query parser.
 3-->
 4<queryParser name="querqy" class="querqy.solr.DefaultQuerqyDismaxQParserPlugin">
 5
 6  <lst name="rewriteChain">
 7
 8     <!--
 9       Common Rules Rewriter
10     -->
11     <lst name="rewriter">
12
13       <str name="id">commonRules</str>
14
15       <str name="class">querqy.solr.SimpleCommonRulesRewriterFactory</str>
16
17         <!--
18           The file that contains rules for synonyms, boosting etc.
19          -->
20       <str name="rules">rules.txt</str>
21
22     </lst>
23
24     <!--
25
26       You can add more rewriters here
27
28     <lst name="rewriter">
29       <str name="id">rewriter2</str>
30       <str name="class">...</str>
31        ....
32     </lst>
33
34     <lst name="rewriter">
35       <str name="class">...</str>
36        ....
37     </lst>
38
39     -->
40  </lst>
41
42</queryParser>

The lst element rewriteChain (line #6) serves as a container for the rewriters.

Each rewriter is defined in a rewriter lst element (#11).

All rewriters must have a class property (#15) that specifies a factory for creating the rewriter.

The id property (#13) is optional. In some cases the id is used to route request parameters to a specific rewriter.

The ‘id’ and ‘class’ properties are the only properties that are available for all rewriters. Rewriters can have additional properties that will only have a meaning for the specific rewriter implementation.

In the example, the rules property specifies the resource that contains rule definitions for the ‘Common Rules Rewriter’. Resources are files that are either kept in ZooKeeper as part of the configset (SolrCloud) or in the ‘conf’ folder of a Solr core in standalone or master-slave Solr. They can be gzipped, which will be auto-detected by Querqy, regardless of the file name. If you keep your files in ZooKeeper, remember the maximum file size in ZooKeeper (default: 1 MB).
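A rules file that is too large for ZooKeeper can be compressed before it is uploaded. The following Python sketch shows one way to prepare such a resource. The size threshold is a conservative stand-in for the 1 MB default mentioned above, and the magic-byte check illustrates a common way of recognising gzip data; neither is taken from Querqy's implementation.

```python
import gzip

# Assumption: a conservative threshold just below the 1 MB ZooKeeper default.
MAX_RESOURCE_BYTES = 1_000_000


def is_gzipped(data: bytes) -> bool:
    """Gzip streams start with the two magic bytes 0x1f 0x8b."""
    return data[:2] == b"\x1f\x8b"


def prepare_rules_resource(rules_text: str) -> bytes:
    """Return the rules as bytes, gzipped only if the plain text is too large."""
    raw = rules_text.encode("utf-8")
    if len(raw) <= MAX_RESOURCE_BYTES:
        return raw
    compressed = gzip.compress(raw)
    if len(compressed) > MAX_RESOURCE_BYTES:
        raise ValueError("rules are too large even after gzip compression")
    return compressed


small = prepare_rules_resource("notebook =>\nSYNONYM: laptop\n")
large = prepare_rules_resource("notebook =>\nSYNONYM: laptop\n\n" * 50_000)
print(is_gzipped(small), len(small))
print(is_gzipped(large), len(large))
```

Because the content is detected regardless of the file name, the compressed bytes can be stored under the same name, for example rules.txt.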

Example: Configuring rewriter via curl (Querqy 5)

Note

In these examples we use curl and jq to retrieve and edit rewriter configuration from a running Solr installation. We assume that the Solr instance is reachable at http://localhost:8983. Configure your Solr target using the environment variables below.

List configured rewriters

This will list all configured rewriters as a JSON response. Use a rewriter’s id to retrieve its details in the subsequent examples.

SOLR_URL="http://localhost:8983"
SOLR_COLLECTION="collection"
curl -s "${SOLR_URL}/solr/${SOLR_COLLECTION}/querqy/rewriter" \
    | jq '.response.rewriters'

Example output:

{
  "filter": {
    "id": "filter",
    "path": "/querqy/rewriter/filter"
  },
  "synonyms": {
    "id": "synonyms",
    "path": "/querqy/rewriter/synonyms"
  }
}

Get rules for a single rewriter

This example will return the Querqy rules configured for a single rewriter as raw output on the console.

SOLR_URL="http://localhost:8983"
SOLR_COLLECTION="collection"
QUERQY_REWRITER="synonyms"
curl -s "${SOLR_URL}/solr/${SOLR_COLLECTION}/querqy/rewriter/${QUERQY_REWRITER}" \
    | jq -r '.rewriter.definition.config.rules'

Edit rules for a single rewriter

This example downloads the Querqy rules for a single rewriter into a temporary file for editing.

SOLR_URL="http://localhost:8983"
SOLR_COLLECTION="collection"
QUERQY_REWRITER="synonyms"
curl -s "${SOLR_URL}/solr/${SOLR_COLLECTION}/querqy/rewriter/${QUERQY_REWRITER}" \
    | jq -r '.rewriter.definition.config.rules' \
    > "/tmp/${QUERQY_REWRITER}.txt"

Edit the Querqy rules in /tmp/${QUERQY_REWRITER}.txt. Afterwards upload them using the following curl call.

# Uses SOLR_URL, SOLR_COLLECTION and QUERQY_REWRITER as set in the previous step.
# Fetches the current definition, replaces config.rules with the edited file and saves it.
curl -s "${SOLR_URL}/solr/${SOLR_COLLECTION}/querqy/rewriter/${QUERQY_REWRITER}" \
    | jq -r --arg rules "$(cat "/tmp/${QUERQY_REWRITER}.txt")" \
        '.rewriter.definition | .config.rules |= $rules' \
    | curl -X POST -H "Content-Type: application/json" --data-binary @- \
        "${SOLR_URL}/solr/${SOLR_COLLECTION}/querqy/rewriter/${QUERQY_REWRITER}?action=save"
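The jq step in the upload call takes the stored definition and replaces only its rules. For readers who prefer to script this in another language, here is the same transformation as a Python sketch. The response layout (rewriter, definition, config, rules) is taken from the jq paths used above; the sample response itself is made up for illustration.

```python
import copy
import json
from typing import Any, Dict


def with_rules(rewriter_response: Dict[str, Any], new_rules: str) -> Dict[str, Any]:
    """Return a copy of the stored definition with config.rules replaced."""
    definition = copy.deepcopy(rewriter_response["rewriter"]["definition"])
    definition.setdefault("config", {})["rules"] = new_rules
    return definition


# Hypothetical response of GET .../querqy/rewriter/synonyms
response = {
    "rewriter": {
        "definition": {
            "class": "querqy.solr.rewriter.commonrules.CommonRulesRewriterFactory",
            "config": {"rules": "notebook =>\nSYNONYM: laptop"},
        }
    }
}

edited_rules = "notebook =>\nSYNONYM: laptop\nSYNONYM: ultrabook"
payload = json.dumps(with_rules(response, edited_rules))
print(payload)
```

The resulting payload is what would be sent in the POST request with action=save; all other parts of the definition stay unchanged.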

List of available rewriters

The list below contains all rewriters that come with Querqy. See the documentation of each rewriter for details.

Common Rules Rewriter

Query-dependent rules for synonyms, result boosting (up/down) and filters; can also ‘decorate’ results with additional information.

Replace Rewriter

Replaces query terms. Used as a query normalisation step, usually applied before the query is processed further, for example, before the Common Rules Rewriter is applied.

Word Break Rewriter

(De)compounds query tokens. Splits compound words or creates compounds from separate tokens.

Number-Unit Rewriter

Recognises numerical values and units of measurement in the query and matches them with indexed fields. Allows for range matches and boosting of the exactly matching value.

Shingle Rewriter

Creates shingles (compounds) from adjacent query tokens and adds them as synonyms.