Querqy 4 for Solr

This page documents Querqy 4 for Solr. Querqy 4 has been superseded by the current version of Querqy for Solr, which manages rewriter configurations via a REST API rather than solrconfig.xml. If you are running Querqy 4 and considering an upgrade, see the migration guide.

Installation

Add the Querqy query parser and query component to your solrconfig.xml:

<!--
    Add the Querqy query parser.
-->
<queryParser name="querqy" class="querqy.solr.DefaultQuerqyDismaxQParserPlugin"/>

<!--
  Override the default QueryComponent.
-->
<searchComponent name="query" class="querqy.solr.QuerqyQueryComponent"/>

Making Queries

The Querqy query parser is enabled using the defType parameter:

/solr/mycollection/select?q=notebook&defType=querqy&qf=title^3.0 brand^2.1 shortSummary

The rewrite chain configured in solrconfig.xml is automatically applied to every request.
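If every request to a handler should use Querqy, defType can be set as a handler default instead of being passed with each request. The following is a minimal sketch, assuming a standard /select search handler in solrconfig.xml. The handler name is an assumption, and the qf value is copied from the request above:

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Use the parser registered as "querqy" unless the request sets defType -->
    <str name="defType">querqy</str>
    <!-- Query fields and boosts taken from the example request -->
    <str name="qf">title^3.0 brand^2.1 shortSummary</str>
  </lst>
</requestHandler>
```

Parameters sent with a request still override values listed under defaults.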

Configuring Rewriters

In Querqy 4, rewriters are configured as child elements of the query parser in solrconfig.xml. Together they form the rewrite chain, applied in the order in which they are defined:

 1<queryParser name="querqy" class="querqy.solr.DefaultQuerqyDismaxQParserPlugin">
 2
 3  <lst name="rewriteChain">
 4
 5      <!--
 6        Common Rules Rewriter
 7      -->
 8      <lst name="rewriter">
 9
10        <str name="id">commonRules</str>
11
12        <str name="class">querqy.solr.SimpleCommonRulesRewriterFactory</str>
13
14          <!--
15            The file that contains rules for synonyms, boosting etc.
16          -->
17        <str name="rules">rules.txt</str>
18
19      </lst>
20
21      <!--
22
23        You can add more rewriters here
24
25      <lst name="rewriter">
26        <str name="id">rewriter2</str>
27        <str name="class">...</str>
28        ....
29      </lst>
30
31      <lst name="rewriter">
32        <str name="class">...</str>
33        ....
34      </lst>
35
36      -->
37
38  </lst>
39</queryParser>

The lst element rewriteChain (line #3) serves as a container for the rewriters.

Each rewriter is defined in a rewriter lst element (#8).

All rewriters must have a class property (#12) that specifies a factory for creating the rewriter.

The id property (#10) is optional. In some cases the id is used to route request parameters to a specific rewriter.

The ‘id’ and ‘class’ properties are the only properties available for all rewriters. Rewriters can have additional properties that are meaningful only to the specific rewriter implementation.

In the example, the rules property specifies the resource that contains rule definitions for the ‘Common Rules Rewriter’. Resources are files that are either kept in ZooKeeper as part of the configset (SolrCloud) or in the ‘conf’ folder of a Solr core in standalone or master-slave Solr. They can be gzipped, which will be auto-detected by Querqy, regardless of the file name. If you keep your files in ZooKeeper, remember the maximum file size in ZooKeeper (default: 1 MB).
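As an illustration of chain order, the sketch below defines two rewriters using factory classes described later on this page. The id replace is a hypothetical name chosen for this example, and the Replace Rewriter properties are copied from its section below. Because the Replace Rewriter is listed first, it would be applied before the Common Rules Rewriter, so the rules in rules.txt would be matched against the query as modified by the replace rules:

```xml
<queryParser name="querqy" class="querqy.solr.DefaultQuerqyDismaxQParserPlugin">
  <lst name="rewriteChain">

    <!-- Applied first: its output is the input of the next rewriter -->
    <lst name="rewriter">
      <str name="id">replace</str>
      <str name="class">querqy.solr.contrib.ReplaceRewriterFactory</str>
      <str name="rules">replace-rules.txt</str>
      <str name="ignoreCase">true</str>
      <str name="inputDelimiter">;</str>
      <str name="querqyParser">querqy.rewrite.commonrules.WhiteSpaceQuerqyParserFactory</str>
    </lst>

    <!-- Applied second -->
    <lst name="rewriter">
      <str name="id">commonRules</str>
      <str name="class">querqy.solr.SimpleCommonRulesRewriterFactory</str>
      <str name="rules">rules.txt</str>
    </lst>

  </lst>
</queryParser>
```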

Rewriter Configurations

Common Rules Rewriter

The rules for the Common Rules Rewriter are maintained in the resource configured as the rules property of the SimpleCommonRulesRewriterFactory:

<queryParser name="querqy" class="querqy.solr.DefaultQuerqyDismaxQParserPlugin">
  <lst name="rewriteChain">
    <lst name="rewriter">
      <str name="class">querqy.solr.SimpleCommonRulesRewriterFactory</str>
      <str name="rules">rules.txt</str>
    </lst>
  </lst>
</queryParser>

The rules file must be UTF-8 encoded. Under SolrCloud, the maximum file size is 1 MB unless you have changed the maximum file size in ZooKeeper. The file can be gzipped; Querqy will automatically detect and decompress it.

Configuration reference:

<lst name="rewriter">
  <str name="class">querqy.solr.SimpleCommonRulesRewriterFactory</str>
  <str name="rules">rules.txt</str>
  <bool name="ignoreCase">true</bool>
  <bool name="buildTermCache">true</bool>
  <str name="boostMethod">MULTIPLICATIVE</str>
  <str name="querqyParser">querqy.rewrite.commonrules.WhiteSpaceQuerqyParserFactory</str>
</lst>

rules

The rule definitions file. The file is kept in the configset of the collection in ZooKeeper (SolrCloud) or in the ‘conf’ folder of the Solr core in standalone or master-slave Solr. Can be gzipped (max 1 MB in ZooKeeper).

Required.

ignoreCase

Whether to ignore case when matching rule inputs against the query.

Default: true

buildTermCache

Whether to build a term cache from matching terms. This is an optimisation that might not be feasible for very large rule lists.

Default: true

boostMethod

How to combine UP/DOWN boosts with the score of the main user query. Available methods are ADDITIVE and MULTIPLICATIVE.

Default: ADDITIVE

querqyParser

The querqy.rewrite.commonrules.QuerqyParserFactory to use for parsing strings from the right-hand side of rules into query objects.

Default: querqy.rewrite.commonrules.WhiteSpaceQuerqyParserFactory

Rule selection by property

To use rule selection (filtering rules by property), the rewriter must have an id configured in solrconfig.xml. This id is then referenced in request parameters:

<queryParser name="querqy" class="querqy.solr.DefaultQuerqyDismaxQParserPlugin">
  <lst name="rewriteChain">
    <lst name="rewriter">
      <!--
          Note the rewriter ID:
      -->
      <str name="id">common1</str>
      <str name="class">querqy.solr.SimpleCommonRulesRewriterFactory</str>
      <str name="rules">rules.txt</str>
      <!-- ... -->
    </lst>
  </lst>
</queryParser>

Rule selection request parameters:

querqy.common1.criteria.sort=priority desc
querqy.common1.criteria.limit=1

The parameters have the prefix querqy.<rewriterID>.criteria where the rewriter ID matches the id configured in solrconfig.xml.
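When the same selection criteria should apply to all requests, the parameters could be set as request handler defaults rather than sent by the client. This is a sketch only: it assumes a /select search handler and relies on Solr merging handler defaults into the request parameters. The id common1 is the one configured in the example above:

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">querqy</str>
    <!-- Prefix: querqy.<rewriterID>.criteria, with rewriterID = common1 -->
    <str name="querqy.common1.criteria.sort">priority desc</str>
    <str name="querqy.common1.criteria.limit">1</str>
  </lst>
</requestHandler>
```

A request can still pass its own querqy.common1.criteria parameters, which take precedence over these defaults.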

Replace Rewriter

<lst name="rewriter">
  <str name="class">querqy.solr.contrib.ReplaceRewriterFactory</str>
  <str name="rules">replace-rules.txt</str>
  <str name="ignoreCase">true</str>
  <str name="inputDelimiter">;</str>
  <str name="querqyParser">querqy.rewrite.commonrules.WhiteSpaceQuerqyParserFactory</str>
</lst>

The rules property references a file in ZooKeeper (SolrCloud) or in the conf directory (standalone) that contains the replace rules. The ignoreCase property controls whether query terms are matched case-insensitively (default: true). The inputDelimiter property allows multiple input definitions to be configured for the same output, separated by the configured delimiter (default: tab).

Word Break Rewriter

<lst name="rewriter">
  <str name="class">querqy.solr.contrib.WordBreakCompoundRewriterFactory</str>
  <str name="dictionaryField">f1</str>
  <bool name="lowerCaseInput">true</bool>
  <int name="decompound.maxExpansions">5</int>
  <bool name="decompound.verifyCollation">true</bool>
  <str name="morphology">GERMAN</str>
  <arr name="reverseCompoundTriggerWords">
    <str>for</str>
  </arr>
  <arr name="protectedWords">
    <str>slipper</str>
    <str>wissenschaft</str>
  </arr>
</lst>

Number-Unit Rewriter

<lst name="rewriter">
  <str name="class">querqy.solr.contrib.NumberUnitRewriterFactory</str>
  <str name="config">number-unit-config.json</str>
</lst>

The config property references a JSON configuration file in ZooKeeper (SolrCloud) or in the conf directory (standalone).

Shingle Rewriter

<lst name="rewriter">
  <str name="class">querqy.solr.contrib.ShingleRewriterFactory</str>
  <bool name="acceptGeneratedTerms">false</bool>
</lst>

acceptGeneratedTerms

If true, also create shingle tokens from terms that were created by other rewriters earlier in the rewrite chain.

Default: false
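Since acceptGeneratedTerms only matters when another rewriter runs earlier in the chain, the following sketch places the Shingle Rewriter after a Common Rules Rewriter. The rules file name is taken from the earlier examples on this page; the combination itself is an assumed setup for illustration:

```xml
<queryParser name="querqy" class="querqy.solr.DefaultQuerqyDismaxQParserPlugin">
  <lst name="rewriteChain">

    <!-- Runs first and may add terms to the query, e.g. synonyms -->
    <lst name="rewriter">
      <str name="class">querqy.solr.SimpleCommonRulesRewriterFactory</str>
      <str name="rules">rules.txt</str>
    </lst>

    <!-- Runs second; true lets it also shingle the terms generated above -->
    <lst name="rewriter">
      <str name="class">querqy.solr.contrib.ShingleRewriterFactory</str>
      <bool name="acceptGeneratedTerms">true</bool>
    </lst>

  </lst>
</queryParser>
```

With the default value of false, the Shingle Rewriter in this chain would ignore the terms generated by the first rewriter.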

Advanced Configuration

Term Query Cache

The term query cache avoids building Lucene queries for sub-queries that never match in specific fields.

Version-independent cache configuration (solrconfig.xml):

<query>
  <!-- Place a custom cache in the <query> section: -->
  <cache name="querqyTermQueryCache"
         class="solr.LFUCache"
         size="1024"
         initialSize="1024"
         autowarmCount="0"
         regenerator="solr.NoOpRegenerator"
  />

  <listener event="firstSearcher" class="querqy.solr.TermQueryCachePreloader">
    <str name="fields">f1 f2</str>
    <str name="qParserPlugin">querqy</str>
    <str name="cacheName">querqyTermQueryCache</str>
    <bool name="testForHits">true</bool>
  </listener>

  <listener event="newSearcher" class="querqy.solr.TermQueryCachePreloader">
    <str name="fields">f1 f2</str>
    <str name="qParserPlugin">querqy</str>
    <str name="cacheName">querqyTermQueryCache</str>
    <bool name="testForHits">true</bool>
  </listener>
</query>

Tell the Querqy query parser to use the custom cache:

<queryParser name="querqy" class="querqy.solr.DefaultQuerqyDismaxQParserPlugin">

   <str name="termQueryCache.name">querqyTermQueryCache</str>

   <bool name="termQueryCache.update">false</bool>

   <lst name="rewriteChain">
     <!-- ... -->
   </lst>

</queryParser>

The Query String Parser

The query string parser defines how the query string passed in the request parameter q is parsed. It can be set using the parser property of the query parser configuration:

<queryParser name="querqy" class="querqy.solr.DefaultQuerqyDismaxQParserPlugin">

  <str name="parser">querqy.parser.WhiteSpaceQuerqyParser</str>

  <!-- ... -->

</queryParser>

The default WhiteSpaceQuerqyParser is sufficient for most use cases.
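The settings described in this section are all children of the same queryParser element. As a sketch of how they could be combined, assuming the cache querqyTermQueryCache has been defined in the query section as shown under Term Query Cache and reusing the Common Rules Rewriter from earlier on this page:

```xml
<queryParser name="querqy" class="querqy.solr.DefaultQuerqyDismaxQParserPlugin">

  <!-- How the q parameter is parsed (this is the default parser) -->
  <str name="parser">querqy.parser.WhiteSpaceQuerqyParser</str>

  <!-- Must match the name of the cache defined in the <query> section -->
  <str name="termQueryCache.name">querqyTermQueryCache</str>
  <bool name="termQueryCache.update">false</bool>

  <lst name="rewriteChain">
    <lst name="rewriter">
      <str name="class">querqy.solr.SimpleCommonRulesRewriterFactory</str>
      <str name="rules">rules.txt</str>
    </lst>
  </lst>

</queryParser>
```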

Migrating to the Current Version

For detailed information about changes and a migration guide from Querqy 4 to the current version, see Migrating to Querqy 5 for Solr.