Rewriters
Rewriters manipulate the query that was entered by the user. They can change the result set by adding alternative tokens, by removing tokens or by adding filters. They can also influence the ranking by adding boosting information.
A single query can be rewritten by more than one rewriter. Together they form the rewrite chain.
Before you can apply a rewrite chain, you need to configure one or more rewriters.
Configuring and applying a rewriter
We will use a minimal example of the ‘Common Rules Rewriter’ - Querqy’s most popular rewriter - to demonstrate how a rewrite chain is configured in principle.
As search engines differ in how configurations are supplied to them, select your search engine below.
Rewriters in Elasticsearch/OpenSearch
Querqy adds a REST endpoint to Elasticsearch/OpenSearch for managing rewriters at
/_querqy/rewriter
Creating/configuring a ‘Common Rules rewriter’:
PUT /_querqy/rewriter/common_rules
1{
2 "class": "querqy.elasticsearch.rewriter.SimpleCommonRulesRewriterFactory",
3 "config": {
4 "rules" : "notebook =>\nSYNONYM: laptop"
5 }
6}
Note
OpenSearch users: Simply replace the package name elasticsearch with opensearch in rewriter configurations.
Rewriter definitions are uploaded by sending a PUT request to the rewriter endpoint. The last part of the request URL path (common_rules) will become the name of the rewriter.
A rewriter definition must contain a class element (line #2). Its value references an implementation of a querqy.elasticsearch.ESRewriterFactory which will provide the rewriter that we want to use.
The rewriter definition can also have a config object (#3) which contains the rewriter-specific configuration.
In the case of the SimpleCommonRulesRewriter, the configuration must contain the rewriting rules (#4). Remember to escape line breaks etc. when you include your rules in a JSON document.
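As a minimal sketch of the escaping step, the following Python snippet serialises a rewriter definition with the standard json module, which turns the line break in the rules into \n. The helper name build_rewriter_definition is hypothetical; the class name and the rule are taken from the example above.

```python
import json


def build_rewriter_definition(rules: str) -> str:
    """Serialise a Common Rules rewriter definition as a JSON document.

    json.dumps escapes line breaks and quotes inside the rules string,
    so the rules can be written as ordinary multi-line text.
    """
    definition = {
        "class": "querqy.elasticsearch.rewriter.SimpleCommonRulesRewriterFactory",
        "config": {"rules": rules},
    }
    return json.dumps(definition)


# Rules written as plain multi-line text (same rule as in the example above).
rules_text = "notebook =>\nSYNONYM: laptop"
body = build_rewriter_definition(rules_text)
print(body)
```

The resulting string can be used as the body of the PUT request; parsing it again yields the original multi-line rules.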
We can now apply one or more rewriters to a query:
POST /myindex/_search
1{
2 "query": {
3 "querqy": {
4 "matching_query": {
5 "query": "notebook"
6 },
7 "query_fields": [ "title^3.0", "brand^2.1", "shortSummary"],
8 "rewriters": ["common_rules"]
9 }
10 }
11}
The rewriters are added to the minimal query that we constructed earlier using a list of named rewriters (line #8). This list contains the rewrite chain: the list of rewriters in the order in which they will be applied and in which they will manipulate the query. The above example contains only a single rewriter.
Rewriters are referenced in the rewriters element either just by their name or by the name property of an object, which makes it possible to pass request parameters to the rewriter. The following example shows two rewriters, one of them with additional parameters:
POST /myindex/_search
1{
2 "query": {
3 "querqy": {
4 "matching_query": {
5 "query": "notebook"
6 },
7 "query_fields": [ "title^3.0", "brand^2.1", "shortSummary"],
8 "rewriters": [
9 "word_break",
10 {
11 "name": "common_rules",
12 "params": {
13 "criteria": {
14 "filter": "$[?(!@.prio || @.prio == 1)]"
15 }
16 }
17 }
18 ]
19 }
20 }
21}
The first rewriter, word_break (line #9), is just referenced by its name (we will see a ‘word break rewriter’ configuration later). The second rewriter is called in a JSON object. Its name property references the rewriter definition by the rewriter name, ‘common_rules’ (#11). The params object (#12) is passed to the rewriter.
In the example, params contains a criteria object (#13). This parameter is specific to the Common Rules rewriter. The filter expression in the example ensures that only rules that either have a prio property set to 1 or that don’t have any prio property at all will be applied.
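To make the effect of the filter concrete, here is a plain-Python imitation of the same condition. It does not use Querqy or a JsonPath library; the function name passes_prio_filter, the label property and the three rule property sets are made up for illustration.

```python
def passes_prio_filter(rule_properties: dict) -> bool:
    """Imitate $[?(!@.prio || @.prio == 1)] for one rule's properties.

    A rule passes if it has no 'prio' property at all, or if its
    'prio' property equals 1.
    """
    return "prio" not in rule_properties or rule_properties["prio"] == 1


# Hypothetical properties of three rules.
rules = [
    {"label": "a", "prio": 1},
    {"label": "b", "prio": 2},
    {"label": "c"},
]

# Keep only the rules that the filter would let through.
applied = [r["label"] for r in rules if passes_prio_filter(r)]
print(applied)
```

Under these assumptions, rules a and c would be applied, while rule b (prio 2) would be skipped.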
In the above example rewrite chain, the word_break rewriter will be applied before the common_rules rewriter due to the order of the rewriters in the rewriters JSON list element.
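If you build such requests in client code, the two ways of referencing a rewriter can be sketched as follows. The helper querqy_search_body and its argument format (a name, or a tuple of name and params) are hypothetical conventions for this example; only the resulting JSON structure comes from the requests shown above.

```python
import json


def querqy_search_body(query, query_fields, rewrite_chain):
    """Build a search request body for the 'querqy' query.

    rewrite_chain is a list whose items are either a rewriter name (str)
    or a (name, params) tuple; list order is the order of application.
    """
    rewriters = []
    for item in rewrite_chain:
        if isinstance(item, str):
            # Reference by name only.
            rewriters.append(item)
        else:
            # Reference by object, passing request parameters along.
            name, params = item
            rewriters.append({"name": name, "params": params})
    return {
        "query": {
            "querqy": {
                "matching_query": {"query": query},
                "query_fields": query_fields,
                "rewriters": rewriters,
            }
        }
    }


search_body = querqy_search_body(
    "notebook",
    ["title^3.0", "brand^2.1", "shortSummary"],
    [
        "word_break",
        ("common_rules", {"criteria": {"filter": "$[?(!@.prio || @.prio == 1)]"}}),
    ],
)
print(json.dumps(search_body, indent=2))
```

Because the chain is an ordinary list, reordering the rewriters is just a matter of reordering the list passed to the helper.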
Updating and deleting rewriters
To update a rewriter configuration, just send the updated configuration in a PUT request to the same rewriter URL again.
To delete a rewriter, send a request with HTTP method DELETE to the rewriter URL. For example,
DELETE /_querqy/rewriter/common_rules
will delete your common_rules rewriter.
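The same two operations can be sketched with Python's standard urllib module. This is only an illustration: the address http://localhost:9200 and the helper name rewriter_request are assumptions, and the snippet builds the requests without sending them.

```python
import json
import urllib.request

# Assumption: Elasticsearch/OpenSearch is reachable at this address.
BASE_URL = "http://localhost:9200"


def rewriter_request(name, definition=None):
    """Return a PUT request (create/update) if a definition is given,
    otherwise a DELETE request for the named rewriter."""
    url = f"{BASE_URL}/_querqy/rewriter/{name}"
    if definition is None:
        return urllib.request.Request(url, method="DELETE")
    return urllib.request.Request(
        url,
        data=json.dumps(definition).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


definition = {
    "class": "querqy.elasticsearch.rewriter.SimpleCommonRulesRewriterFactory",
    "config": {"rules": "notebook =>\nSYNONYM: laptop"},
}
update = rewriter_request("common_rules", definition)
delete = rewriter_request("common_rules")
print(update.get_method(), update.full_url)
print(delete.get_method(), delete.full_url)
# To actually send a request, pass it to urllib.request.urlopen(...).
```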
Rewriter configuration in Solr
Warning
Querqy configuration has changed in an incompatible way with the introduction of Querqy v5 for Solr. Make sure to follow the documentation for your Querqy version below. See here for detailed information about changes and a migration guide to Querqy 5 for Solr.
Querqy 5
Querqy adds a URL endpoint to Solr for managing rewriters. When you set up Querqy in solrconfig.xml, you’ve added a request handler for this:
<requestHandler name="/querqy/rewriter" class="querqy.solr.QuerqyRewriterRequestHandler" />
You can then manage your rewriters at
http://<solr host>:<port>/solr/mycollection/querqy/rewriter
Creating/configuring a ‘Common Rules rewriter’:
POST /solr/mycollection/querqy/rewriter/common_rules?action=save
Content-Type: application/json
1{
2 "class": "querqy.solr.rewriter.commonrules.CommonRulesRewriterFactory",
3 "config": {
4 "rules" : "notebook =>\nSYNONYM: laptop"
5 }
6}
Rewriter definitions are uploaded by sending a POST request and appending the action=save parameter to the rewriter endpoint. The last part of the request URL path (common_rules) will become the name of the rewriter.
A rewriter definition must contain a class element (line #2). Its value references an implementation of a querqy.solr.SolrRewriterFactoryAdapter which will provide the rewriter that we want to use.
The rewriter definition can also have a config object (#3-5), which contains the rewriter-specific configuration. In the case of the CommonRulesRewriterFactory, the configuration must contain the rewriting rules (#4). Remember to escape line breaks etc. when you include your rules in a JSON document.
If you work with SolrJ, you can create your configuration request using a request builder that comes with most of the Querqy-supplied rewriters. Just look out for the *ConfigRequestBuilder classes in the Java packages under querqy.solr.rewriter.
Once we have uploaded our rewriter configuration, we can apply one or more rewriters to a query:
GET /solr/mycollection/select?q=notebook&defType=querqy&querqy.rewriters=common_rules&qf=title^3.0...
The parameter defType=querqy enables the Querqy query parser. The optional parameter querqy.rewriters contains a list of comma-separated rewriter names. These rewriters form the rewrite chain and they are processed in their order of occurrence. In this specific example, we only used the rewriter that we defined in our POST request above and we reference it by its name common_rules. Had we configured another rewriter under /solr/mycollection/querqy/rewriter/replace, we could apply the ‘replace’ rewriter before the ‘common_rules’ rewriter using the URL parameter querqy.rewriters=replace,common_rules.
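When the request URL is assembled in client code, the rewrite chain is simply the rewriter names joined by commas. The following sketch uses Python's standard urllib.parse; the helper name querqy_select_path and the second query field are assumptions for illustration.

```python
from urllib.parse import parse_qs, urlencode


def querqy_select_path(collection, user_query, rewriters, query_fields):
    """Build the path and query string of a Solr select request that
    uses the Querqy query parser and the given rewrite chain."""
    params = {
        "q": user_query,
        "defType": "querqy",
        # Comma-separated rewriter names, in order of application.
        "querqy.rewriters": ",".join(rewriters),
        # Query fields are separated by spaces in the qf parameter.
        "qf": " ".join(query_fields),
    }
    # urlencode percent-encodes special characters such as ',' and '^'.
    return f"/solr/{collection}/select?{urlencode(params)}"


path = querqy_select_path(
    "mycollection",
    "notebook",
    ["replace", "common_rules"],
    ["title^3.0", "brand^2.1"],
)
print(path)

# Decode the query string again to inspect the parameters.
decoded = parse_qs(path.split("?", 1)[1])
print(decoded["querqy.rewriters"])
```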
By default, Solr will reply with a 400 Bad Request response if a rewriter that was passed in the ‘querqy.rewriters’ parameter does not exist. Please see this section in the ‘Advanced Solr Plugin Configuration’ documentation for an option to ignore missing rewriters.
Updating and deleting rewriters (Querqy 5)
To update a rewriter configuration, just send the updated configuration in a POST request with action=save to the same rewriter URL again.
To delete a rewriter, send a POST request with action=delete to the rewriter URL. For example,
POST /solr/mycollection/querqy/rewriter/common_rules?action=delete
will delete your common_rules rewriter.
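As a rough sketch, both actions differ only in the action parameter and in whether a definition is sent as the request body. The snippet below builds such requests with Python's urllib without sending them; the Solr address, the collection name and the helper name solr_rewriter_request are assumptions.

```python
import json
import urllib.request

# Assumption: Solr is reachable at this address.
SOLR_URL = "http://localhost:8983"


def solr_rewriter_request(collection, name, action, definition=None):
    """Build a POST request for the rewriter endpoint.

    action is 'save' (with a definition) or 'delete' (without one).
    """
    url = f"{SOLR_URL}/solr/{collection}/querqy/rewriter/{name}?action={action}"
    if definition is None:
        return urllib.request.Request(url, method="POST")
    return urllib.request.Request(
        url,
        data=json.dumps(definition).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


definition = {
    "class": "querqy.solr.rewriter.commonrules.CommonRulesRewriterFactory",
    "config": {"rules": "notebook =>\nSYNONYM: laptop"},
}
save = solr_rewriter_request("mycollection", "common_rules", "save", definition)
delete = solr_rewriter_request("mycollection", "common_rules", "delete")
print(save.get_method(), save.full_url)
print(delete.get_method(), delete.full_url)
# To actually send a request, pass it to urllib.request.urlopen(...).
```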
Getting rewriter information (Querqy 5)
You can get a list of configured rewriters at:
GET /solr/mycollection/querqy/rewriter
To retrieve the configuration of a specific rewriter, you can make a GET call against its endpoint. In the case of the common_rules rewriter above, the call would be:
GET /solr/mycollection/querqy/rewriter/common_rules
Querqy 4
The rewrite chain is configured at the Querqy query parser in solrconfig.xml:
1<!--
2 Add the Querqy query parser.
3-->
4<queryParser name="querqy" class="querqy.solr.DefaultQuerqyDismaxQParserPlugin">
5
6 <lst name="rewriteChain">
7
8 <!--
9 Common Rules Rewriter
10 -->
11 <lst name="rewriter">
12
13 <str name="id">commonRules</str>
14
15 <str name="class">querqy.solr.SimpleCommonRulesRewriterFactory</str>
16
17 <!--
18 The file that contains rules for synonyms, boosting etc.
19 -->
20 <str name="rules">rules.txt</str>
21
22 </lst>
23
24 <!--
25
26 You can add more rewriters here
27
28 <lst name="rewriter">
29 <str name="id">rewriter2</str>
30 <str name="class">...</str>
31 ....
32 </lst>
33
34 <lst name="rewriter">
35 <str name="class">...</str>
36 ....
37 </lst>
38
39 -->
40 </lst>
41
42</queryParser>
The lst element rewriteChain (line #6) serves as a container for the rewriters.
Each rewriter is defined in a rewriter lst element (#11).
All rewriters must have a class property (#15) that specifies a factory for creating the rewriter.
The id property (#13) is optional. In some cases the id is used to route request parameters to a specific rewriter.
The ‘id’ and ‘class’ properties are the only properties that are available for all rewriters. Rewriters can have additional properties that will only have a meaning for the specific rewriter implementation.
In the example, the rules property specifies the resource that contains rule definitions for the ‘Common Rules Rewriter’. Resources are files that are either kept in ZooKeeper as part of the configset (SolrCloud) or in the ‘conf’ folder of a Solr core in standalone or master-slave Solr. They can be gzipped, which will be auto-detected by Querqy, regardless of the file name. If you keep your files in ZooKeeper, remember the maximum file size in ZooKeeper (default: 1 MB).
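The following sketch shows how a rules file could be compressed and checked against the size limit before it is uploaded. It relies only on Python's gzip module. The 1 MB constant approximates the default mentioned above, and checking the leading gzip magic bytes is just one way a reader can recognise gzipped content independently of the file name; whether Querqy detects it exactly this way is an assumption.

```python
import gzip

# Approximation of the default ZooKeeper limit mentioned above.
ZK_DEFAULT_MAX_BYTES = 1024 * 1024
# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def gzip_rules(rules_text: str) -> bytes:
    """Compress the rules text so that it takes less space in ZooKeeper."""
    return gzip.compress(rules_text.encode("utf-8"))


def looks_gzipped(data: bytes) -> bool:
    """Recognise gzipped content by its magic bytes, not by file name."""
    return data[:2] == GZIP_MAGIC


# A repetitive, made-up rules file used only to demonstrate the size effect.
rules_text = "notebook =>\n  SYNONYM: laptop\n\n" * 1000
raw = rules_text.encode("utf-8")
compressed = gzip_rules(rules_text)

print(len(raw), len(compressed))
print(looks_gzipped(compressed), len(compressed) < ZK_DEFAULT_MAX_BYTES)
```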
Example: Configuring rewriter via curl (Querqy 5)
Note
In these examples we use curl and jq to retrieve and edit rewriter configuration from a running Solr installation. We assume that the Solr instance is reachable at http://localhost:8983. Configure your Solr target using the environment variables below.
List configured rewriters
This will list all configured rewriters as a JSON response. Use a rewriter's id to retrieve its details in the subsequent examples.
SOLR_URL="http://localhost:8983"
SOLR_COLLECTION="collection"
curl -s "${SOLR_URL}/solr/${SOLR_COLLECTION}/querqy/rewriter" \
  | jq '.response.rewriters'
{
  "filter": {
    "id": "filter",
    "path": "/querqy/rewriter/filter"
  },
  "synonyms": {
    "id": "synonyms",
    "path": "/querqy/rewriter/synonyms"
  }
}
Get rules for a single rewriter
This example will return the Querqy rules configured for a single rewriter as raw output on the console.
SOLR_URL="http://localhost:8983"
SOLR_COLLECTION="collection"
QUERQY_REWRITER="synonyms"
curl -s "${SOLR_URL}/solr/${SOLR_COLLECTION}/querqy/rewriter/${QUERQY_REWRITER}" \
  | jq -r '.rewriter.definition.config.rules'
Edit rules for a single rewriter
Downloads the Querqy rules for a single rewriter into a temporary file to edit.
SOLR_URL="http://localhost:8983"
SOLR_COLLECTION="collection"
QUERQY_REWRITER="synonyms"
curl -s "${SOLR_URL}/solr/${SOLR_COLLECTION}/querqy/rewriter/${QUERQY_REWRITER}" \
  | jq -r '.rewriter.definition.config.rules' \
  > "/tmp/${QUERQY_REWRITER}.txt"
Edit the Querqy rules in /tmp/${QUERQY_REWRITER}.txt. Afterwards upload them using the following curl call.
curl -s "${SOLR_URL}/solr/${SOLR_COLLECTION}/querqy/rewriter/${QUERQY_REWRITER}" \
  | jq -r --arg rules "$(cat "/tmp/${QUERQY_REWRITER}.txt")" \
  '.rewriter.definition | .config.rules |= $rules' \
  | curl -X POST -H "Content-Type: application/json" --data-binary @- \
  "${SOLR_URL}/solr/${SOLR_COLLECTION}/querqy/rewriter/${QUERQY_REWRITER}?action=save"
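For readers less familiar with jq, the transformation in the upload step can be sketched in Python. The function below mirrors the filter '.rewriter.definition | .config.rules |= $rules': it takes the definition out of the response and replaces only its rules. The function name and the sample response are hypothetical; the sample contains only the fields that the jq path above refers to.

```python
import copy


def updated_definition(rewriter_response: dict, new_rules: str) -> dict:
    """Return the rewriter definition from a GET response with its
    rules replaced, leaving the original response untouched."""
    definition = copy.deepcopy(rewriter_response["rewriter"]["definition"])
    definition["config"]["rules"] = new_rules
    return definition


# Hypothetical, reduced response of GET .../querqy/rewriter/synonyms
sample_response = {
    "rewriter": {
        "definition": {
            "class": "querqy.solr.rewriter.commonrules.CommonRulesRewriterFactory",
            "config": {"rules": "notebook =>\nSYNONYM: laptop"},
        }
    }
}

edited_rules = "notebook =>\nSYNONYM: laptop\nSYNONYM: ultrabook"
payload = updated_definition(sample_response, edited_rules)
print(payload)
```

The returned dictionary corresponds to the JSON document that the final curl call posts with action=save.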
List of available rewriters
The list below contains all rewriters that come with Querqy. Click on the rewriter name to see the documentation.
- Common Rules Rewriter
Query-dependent rules for synonyms, result boosting (up/down), filters; ‘decorate’ result with additional information
- Replace Rewriter
Replace query terms. Used as a query normalisation step, usually applied before the query is processed further, for example, before the Common Rules Rewriter is applied
- Word Break Rewriter
(De)compounds query tokens. Splits compound words or creates compounds from separate tokens.
- Number-Unit Rewriter
Recognises numerical values and units of measurement in the query and matches them with indexed fields. Allows for range matches and boosting of the exactly matching value.
- Shingle Rewriter
Creates shingles (compounds) from adjacent query tokens and adds them as synonyms.