Samplestack Search Configuration
The search capabilities in Samplestack, as documented in Search Tips, are powered by the MarkLogic Search API. This page describes how the Search API is configured to support Samplestack's search scenarios, how Samplestack accesses the Search API, and how the build process configures MarkLogic's runtime search capabilities.
In short, the Java middle tier talks to an instance of the MarkLogic REST API (by default on localhost, port 8006). This REST API instance, accessed from Java, can be configured and extended as part of the build process. One such extension is to store a set of "Search Options" on the server that can be referenced by name in a runtime search. In the Java stack, these configurations and extensions are updated by the ./gradlew dbconfigure task.
Samplestack uses several such "Search Options" files, stored in /database/options. The main search configuration is in the file called questions.json. You don't have to do any application-level configuration to take advantage of MarkLogic Search capabilities; you just have to understand where to find your data and what kind of search you expect from it.
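To make the "accessed by name" idea concrete, here is a minimal sketch of how a search request against the REST API can name a stored options set. It only builds the request URI and does not send it. The host and port are the defaults mentioned above; the options name "questions" is an assumption based on the questions.json file name, and a real request would also need credentials.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class SearchUriSketch {
    // Defaults taken from the text above; adjust for your own setup.
    static final String HOST = "localhost";
    static final int PORT = 8006;

    /** Builds a /v1/search URI that refers to a stored options set by name. */
    static URI searchUri(String optionsName, String queryText) {
        String opts = URLEncoder.encode(optionsName, StandardCharsets.UTF_8);
        String q = URLEncoder.encode(queryText, StandardCharsets.UTF_8);
        return URI.create("http://" + HOST + ":" + PORT
                + "/v1/search?format=json&options=" + opts + "&q=" + q);
    }

    public static void main(String[] args) {
        // "questions" is an assumed options name, not confirmed by this page.
        System.out.println(searchUri("questions", "tag:java"));
        // prints http://localhost:8006/v1/search?format=json&options=questions&q=tag%3Ajava
    }
}
```

Samplestack itself goes through the MarkLogic Java Client API rather than building URIs by hand, as described further down this page.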
First, this file contains "constraints." The name of each constraint translates to a prefix you can use in the Search Bar in Samplestack.
- askedBy, answeredBy, commentedBy, id. Each of these constraints describes a "path-index", which limits the search to the values within a particular part of the JSON document. Each is backed by a range index, so Samplestack can find documents by these criteria very quickly and can also sort, compare, and facet on them as needed.
- user, userName, id, resolved. These constraints simply look for a JSON property called "displayName", "userName" or "id" and search for exact matches. This is a so-called value query, which matches the exact value of a particular property rather than, say, a word within that value. Value queries are supported by MarkLogic's universal index and are available out of the box on whatever you ingest.
- votes, answers. These constraints are backed by range indexes, and hence support the GT and LE operators and can be used in sorting. The sorting configuration is also in this file, but further down.
- tag. This is a constraint for searching on the "tag" JSON property, but it is also configured for facet resolution. This means that searches return the unique values of this property along with their frequencies.
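As a rough illustration of what "unique values and frequencies" means for the tag facet, the sketch below counts tag values over a small in-memory set of documents. This is only a conceptual model with made-up data: MarkLogic resolves facets from its indexes rather than by looping over documents, and the class and method names here are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

final class TagFacetSketch {
    /** Returns each distinct tag with the number of documents that carry it. */
    static Map<String, Integer> facetCounts(List<List<String>> tagsPerDocument) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> tags : tagsPerDocument) {
            // A TreeSet ensures a tag repeated within one document counts once.
            for (String tag : new TreeSet<>(tags)) {
                counts.merge(tag, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical tag arrays from three matching documents.
        List<List<String>> matches = List.of(
                List.of("java", "marklogic"),
                List.of("java"),
                List.of("marklogic", "json"));
        System.out.println(facetCounts(matches));
        // prints {java=2, json=1, marklogic=2}
    }
}
```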
The "operator" section of the Samplestack Search Options defines the sort configurations. Samplestack presents three sort states: by relevance, by vote count, and by 'lastActivityDate', which records the last activity on a QnADocument. Although Samplestack provides controls for these states, you can also activate them from the search box.
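In MarkLogic's default search grammar, an operator is typed as operator-name:state-name. The helper below is a minimal sketch of adding such a term to query text. The operator name "sort" and state name "votes" used in the example are assumptions for illustration; the actual names are whatever the "operator" section of questions.json defines.

```java
final class SortOperatorSketch {
    /** Appends an operator term such as "sort:votes" to the user's query text. */
    static String withOperator(String queryText, String operatorName, String stateName) {
        String operatorTerm = operatorName + ":" + stateName;
        return queryText.isBlank() ? operatorTerm : queryText.trim() + " " + operatorTerm;
    }

    public static void main(String[] args) {
        // "sort" and "votes" are hypothetical names; check questions.json for the real ones.
        System.out.println(withOperator("tag:java", "sort", "votes"));
        // prints tag:java sort:votes
    }
}
```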
The default search, where you don't use a prefix, is configured here. In Samplestack we use a so-called field, which is a slightly more complex object that combines and weights various parts of the document to influence relevance. This field is configured as part of the database setup, and as such is in /database/database-properties.json:
    {
      "field-name": "default-samplestack-search",
      "field-path": [
        {
          "path": "/title",
          "weight": 2
        },
        {
          "path": "/text",
          "weight": 2
        },
        {
          "path": "/answers/text",
          "weight": 1
        },
        {
          "path": "//comments/text",
          "weight": 0.5
        }
      ],
      "tokenizer-overrides": null
    }
This field configuration assigns different weights to text depending on whether it's in a question title, a question body, the text of an answer, or a comment.
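To get a feel for how these weights shift relevance, here is a deliberately simplified toy scorer that uses the same paths and weights as the field above. It is not MarkLogic's relevance algorithm, which is considerably more involved; it only shows that a match in a title or question body counts for more than a match in a comment. The class, method, and sample text are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class FieldWeightSketch {
    // Paths and weights copied from the field definition above.
    static final Map<String, Double> WEIGHTS = new LinkedHashMap<>();
    static {
        WEIGHTS.put("/title", 2.0);
        WEIGHTS.put("/text", 2.0);
        WEIGHTS.put("/answers/text", 1.0);
        WEIGHTS.put("//comments/text", 0.5);
    }

    /** Toy score: adds a path's weight when its text contains the term (case-insensitive). */
    static double toyScore(Map<String, String> textByPath, String term) {
        String needle = term.toLowerCase();
        double score = 0.0;
        for (Map.Entry<String, Double> entry : WEIGHTS.entrySet()) {
            String text = textByPath.getOrDefault(entry.getKey(), "");
            if (text.toLowerCase().contains(needle)) {
                score += entry.getValue();
            }
        }
        return score;
    }

    public static void main(String[] args) {
        Map<String, String> titleAndComment = Map.of(
                "/title", "Indexing JSON documents",
                "//comments/text", "json works fine here");
        Map<String, String> answerOnly = Map.of("/answers/text", "Use JSON for this");
        System.out.println(toyScore(titleAndComment, "json")); // prints 2.5
        System.out.println(toyScore(answerOnly, "json"));      // prints 1.0
    }
}
```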
The Search Options include configuration of the Search snippets. The "transform-results" section provides some ways to generate basic snippets out of the box. Customization is also an option.
As part of dbconfigure, gradle uploads a configuration for the database from /database/database-properties.json. This file defines the range indexes, which are created as soon as the file is processed by the Management API. Each range index mentioned in the section above has a corresponding entry in database-properties.json. The value query constraints like "userName" do not require an index, but range constraints such as "votes" do.
All of Samplestack's main search traffic goes through rawSearch():
    package com.marklogic.samplestack.dbclient;

    public class MarkLogicQnAService {
        ...
        public ObjectNode rawSearch(ClientRole role, ObjectNode structuredQuery,
                long start, DateTimeZone userTimeZone) {
            ...
        }
    }
This method is responsible for passing a structured query object, sent from the browser as a JSON object, to the MarkLogic Client API. (It also does some extra processing of dates and facets.)
Inside that method you'll see how a Java client constructs a query and sends it to MarkLogic:
- You need a QueryManager to construct a query, and a DocumentManager to retrieve the documents that match a query.

        QueryManager queryManager = clients.get(role).newQueryManager();
        JSONDocumentManager docMgr = clients.get(role).newJSONDocumentManager();

- Create a query definition, bound to query options stored on MarkLogic, from the structured query object.

        RawQueryDefinition qdef = queryManager.newRawStructuredQueryDefinition(
                new JacksonHandle(docNode), QUESTIONS_OPTIONS);

- Set a response transform (also stored on MarkLogic).

        ServerTransform responseTransform = new ServerTransform(SEARCH_RESPONSE_TRANSFORM);
        qdef.setResponseTransform(responseTransform);

- Create a handle to encapsulate a JSON response from the search, and perform the search.

        JacksonHandle responseHandle = new JacksonHandle();
        DocumentPage docPage = null;
        try {
            docPage = docMgr.search(qdef, start, responseHandle);
        } catch (com.marklogic.client.FailedRequestException ex) {
            throw new SamplestackSearchException(ex);
        }

- Get a JSON node view of the response and do things with it.

        ObjectNode responseNode = (ObjectNode) responseHandle.get();
This is the main interaction, and a non-trivial one, between a Java Client and MarkLogic search.
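The start argument that rawSearch passes to docMgr.search is the 1-based position of the first result to return. A caller that thinks in pages can derive it as sketched below; the helper name and the page length used in the example are assumptions for illustration, not values taken from Samplestack.

```java
final class PagingSketch {
    /** Converts a 1-based page number into the 1-based index of its first result. */
    static long startFor(long pageNumber, long pageLength) {
        if (pageNumber < 1 || pageLength < 1) {
            throw new IllegalArgumentException("pageNumber and pageLength must be >= 1");
        }
        return (pageNumber - 1) * pageLength + 1;
    }

    public static void main(String[] args) {
        // Assuming 10 results per page, page 3 begins at result 21.
        System.out.println(startFor(3, 10)); // prints 21
    }
}
```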