This repository has been archived by the owner on May 30, 2019. It is now read-only.

Samplestack Search Configuration

Charles Greer edited this page Jan 28, 2015 · 9 revisions

In Progress

The search in Samplestack, described on the Search Tips page, is powered by the MarkLogic Search API. This page describes how the Search API is configured to support this search scenario, how Samplestack accesses the Search API, and how the build process configures MarkLogic's runtime search capabilities.

In short, the Java middle tier talks to an instance of the MarkLogic REST API (by default on localhost, port 8006). This REST API instance, accessed from Java, can be configured and extended as part of the build process. One such extension is a set of "Search Options" stored on the server, which a search can reference by name at runtime. In the Java stack, these configurations and extensions are updated by the ./gradlew dbconfigure task.
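To make "accessing the REST API" concrete at the HTTP level, here is a minimal sketch that builds a request URL for the REST API's `/v1/search` endpoint. The query text goes in the `q` parameter and the name of a stored options set goes in `options`. It assumes the default host and port mentioned above and assumes the options from questions.json are stored under the name `questions`. A client library would normally assemble this request for you, so treat this as an illustration rather than how Samplestack's Java tier is written.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class SearchUrlSketch {
    // Assumption: the default REST instance described above (localhost, port 8006).
    static final String BASE = "http://localhost:8006";

    // Builds a GET URL for the REST API's /v1/search endpoint, naming a stored options set.
    static String searchUrl(String queryText, String optionsName) {
        return BASE + "/v1/search"
                + "?q=" + URLEncoder.encode(queryText, StandardCharsets.UTF_8)
                + "&options=" + URLEncoder.encode(optionsName, StandardCharsets.UTF_8)
                + "&format=json";
    }

    public static void main(String[] args) {
        // "questions" is assumed to be the name under which questions.json is stored.
        System.out.println(searchUrl("tag:java", "questions"));
    }
}
```

Because the options live on the server, the client only sends their name; changing questions.json and rerunning dbconfigure changes search behavior without touching the Java code.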

Options

Samplestack uses several such "Search Options" files, located in /database/options. The main search configuration is in questions.json. You don't have to do any application-level configuration to take advantage of MarkLogic Search capabilities; you just have to understand where to find your data and what kind of search you expect from it.
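As a rough picture of what "storing options on the server by name" involves, the sketch below maps each file in the options directory to the REST API endpoint that holds named query options, `/v1/config/query/{name}`, which is written with an HTTP PUT. The rule that each file is stored under its base file name (so questions.json becomes `questions`) is an assumption for illustration; the actual dbconfigure task may do this differently.

```java
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class OptionsUploadSketch {
    // Assumption: each options file is stored on the server under its base file name.
    static String optionsName(Path file) {
        String name = file.getFileName().toString();
        int dot = name.lastIndexOf('.');
        return dot > 0 ? name.substring(0, dot) : name;
    }

    // The REST API keeps named query options at /v1/config/query/{name} (installed with PUT).
    static Map<String, String> uploadTargets(String base, List<Path> files) {
        Map<String, String> targets = new TreeMap<>();
        for (Path file : files) {
            String name = optionsName(file);
            targets.put(name, base + "/v1/config/query/" + name);
        }
        return targets;
    }

    public static void main(String[] args) {
        List<Path> files = List.of(Path.of("database", "options", "questions.json"));
        uploadTargets("http://localhost:8006", files)
                .forEach((name, url) -> System.out.println("PUT " + url));
    }
}
```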

Constraints

First, this file contains "constraints." The name of each constraint becomes a prefix you can use in the Samplestack search bar.
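The idea of a constraint name acting as a prefix can be illustrated with a small, simplified tokenizer: a token such as `askedBy:joe` splits into a constraint name and a value, while a plain word stays ordinary search text. This is only a sketch; MarkLogic's own query grammar does the real parsing on the server. The constraint names are the ones listed on this page, and `Term` is a hypothetical helper type.

```java
import java.util.Optional;
import java.util.Set;

class ConstraintPrefixSketch {
    // Constraint names as listed on this page; questions.json is the authoritative list.
    static final Set<String> CONSTRAINTS = Set.of(
            "askedBy", "answeredBy", "commentedBy", "id",
            "user", "userName", "resolved", "votes", "answers", "tag");

    // Hypothetical holder for a parsed "constraint:value" token.
    record Term(String constraint, String value) {}

    // Splits "name:value" when name is a known constraint; anything else is plain text here.
    static Optional<Term> parse(String token) {
        int colon = token.indexOf(':');
        if (colon <= 0) {
            return Optional.empty();
        }
        String name = token.substring(0, colon);
        if (!CONSTRAINTS.contains(name)) {
            return Optional.empty();
        }
        return Optional.of(new Term(name, token.substring(colon + 1)));
    }

    public static void main(String[] args) {
        for (String token : new String[] {"askedBy:joe", "tag:java", "marklogic"}) {
            System.out.println(token + " -> " + parse(token));
        }
    }
}
```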

  • askedBy, answeredBy, commentedBy, id. These constraints all work the same way. Each one is defined on a path range index, which limits the search to the values within a particular part of the JSON document. Because they are backed by range indexes, Samplestack can find documents by these criteria very quickly, and could also use them for sorting, comparisons, and facets as needed.

  • user, userName, id, resolved. These constraints simply look for a JSON property such as "displayName", "userName" or "id" and match its value exactly. Each is a so-called value query, which matches the exact value of a particular property rather than, say, a word within that value. Value queries are supported by MarkLogic's universal index, so they are available out of the box on whatever you ingest.

  • votes, answers. These constraints are backed by range indexes, so they support comparison operators such as GT and LE and can be used for sorting. The sort configuration is also in this file, further down.

  • tag. This constraint searches on the "tag" JSON property, and it is also configured for facet resolution. This means that search results also report the unique tag values found across the matching documents, along with their frequencies.
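To show the shape of what facet resolution on tag reports, here is a client-side sketch that counts, for each tag value, how many matching documents carry it. On the server, MarkLogic computes facets from the range index rather than by reading documents, so this only mimics the result; counting each tag once per document is a choice made for this sketch.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TagFacetSketch {
    // Counts, for each tag value, how many of the matching documents carry it.
    static Map<String, Integer> tagFacet(List<List<String>> tagsPerDocument) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> tags : tagsPerDocument) {
            for (String tag : new HashSet<>(tags)) { // a repeated tag counts once per document
                counts.merge(tag, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<List<String>> matches = List.of(
                List.of("java", "marklogic"),
                List.of("java"),
                List.of("json", "java"));
        System.out.println(tagFacet(matches)); // {java=3, json=1, marklogic=1}
    }
}
```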

Operators

The "operator" section of the Search Options defines, in this application, the sort configurations. Although Samplestack provides UI controls for these states, you can also activate them by typing in the search box. The three sort states offered for Samplestack searches are 'relevance', 'vote count', and 'lastActivityDate', which records the last activity on a QnADocument.
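In MarkLogic query text, an operator state is selected with an `operatorName:stateName` term. The sketch below appends such a term to whatever the user typed. The operator name `sort` and the state names `relevance`, `votes` and `active` are hypothetical placeholders; the real names are whatever questions.json defines.

```java
class SortOperatorSketch {
    // Hypothetical operator ("sort") and state names; the real ones are defined in questions.json.
    enum SortState {
        RELEVANCE("relevance"), VOTES("votes"), LAST_ACTIVITY("active");

        final String stateName;

        SortState(String stateName) {
            this.stateName = stateName;
        }
    }

    // Appends an "operator:state" term to the query text typed by the user.
    static String withSort(String queryText, SortState state) {
        String term = "sort:" + state.stateName;
        return queryText.isBlank() ? term : queryText.trim() + " " + term;
    }

    public static void main(String[] args) {
        System.out.println(withSort("tag:java", SortState.VOTES)); // tag:java sort:votes
    }
}
```

Typing the same term by hand in the search box should have the same effect as using the sort controls, since both end up in the query text sent to the server.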

Term

The default search, used when you don't type a prefix, is configured here. In Samplestack we use a so-called field, a slightly more complex object that combines and weights various parts of the document to influence relevance. This field is configured as part of the database setup, so it lives in /database/database-properties.json:

{
    "field-name": "default-samplestack-search",
    "field-path": [
        {
            "path": "/title",
            "weight": 2
        },
        {
            "path": "/text",
            "weight": 2
        },
        {
            "path": "/answers/text",
            "weight": 1
        },
        {
            "path": "//comments/text",
            "weight": 0.5
        }
    ],
    "tokenizer-overrides": null
}

This field configuration assigns weights to text depending on where it appears: a question's title and body get weight 2, the text of an answer gets weight 1, and text in a comment gets weight 0.5.
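The effect of the weights can be approximated with a deliberately simplified model in which each hit contributes the weight of the path it was found in. MarkLogic's actual relevance scoring involves more than this (term frequency normalization and other factors), so read the sketch as intuition about relative importance only. The weights are copied from the field definition above.

```java
import java.util.Map;

class FieldWeightSketch {
    // Weights copied from the default-samplestack-search field definition above.
    static final Map<String, Double> WEIGHTS = Map.of(
            "/title", 2.0,
            "/text", 2.0,
            "/answers/text", 1.0,
            "//comments/text", 0.5);

    // Simplified model: each hit contributes its path's weight. Real MarkLogic scoring is more involved.
    static double weightedHits(Map<String, Integer> hitsByPath) {
        double total = 0.0;
        for (Map.Entry<String, Integer> e : hitsByPath.entrySet()) {
            total += WEIGHTS.getOrDefault(e.getKey(), 0.0) * e.getValue();
        }
        return total;
    }

    public static void main(String[] args) {
        // One hit in the title plus two hits in comments: 2.0 + 2 * 0.5
        System.out.println(weightedHits(Map.of("/title", 1, "//comments/text", 2))); // 3.0
    }
}
```

Under this model, a single hit in a question title counts as much as four hits in comments.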

Result Transform

The configuration of search snippets is also in the Search Options. The "transform-results" section provides some basic ways to generate snippets out of the box; custom snippet generation is also an option.
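For intuition about what a basic snippet is, here is a toy version: it takes a window of characters around the first case-insensitive match and marks truncation with "...". This is not MarkLogic's snippeting algorithm, which works on matched terms and highlights them; it assumes simple ASCII text so that lowercasing does not change string length.

```java
import java.util.Locale;

class SnippetSketch {
    // Returns up to `radius` characters either side of the first match, with "..." where text was cut.
    static String snippet(String text, String term, int radius) {
        int i = text.toLowerCase(Locale.ROOT).indexOf(term.toLowerCase(Locale.ROOT));
        if (i < 0) {
            return "";
        }
        int start = Math.max(0, i - radius);
        int end = Math.min(text.length(), i + term.length() + radius);
        return (start > 0 ? "..." : "") + text.substring(start, end) + (end < text.length() ? "..." : "");
    }

    public static void main(String[] args) {
        System.out.println(snippet("How do I index JSON properties in MarkLogic?", "json", 6));
        // ...index JSON prope...
    }
}
```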

Indexes

As part of dbconfigure, Gradle also uploads a database configuration from /database/database-properties.json. This file defines range indexes, which are created as soon as the Management API processes it. Each range index mentioned in the Constraints section above has a corresponding entry in database-properties.json. Value queries don't need one, but range queries certainly do.
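Why range queries need a range index can be seen in a toy model: if the values of a property such as votes are kept in sorted order, each mapped to the documents that hold them, then GT and LE become cheap ordered lookups instead of a scan over every document. The sketch uses a Java `TreeMap` for that sorted structure; MarkLogic's range indexes are implemented differently, and the document URIs in the usage are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

class RangeIndexSketch {
    // Toy model of a range index: values kept in sorted order, each mapped to document URIs.
    private final NavigableMap<Integer, List<String>> index = new TreeMap<>();

    void add(int votes, String uri) {
        index.computeIfAbsent(votes, k -> new ArrayList<>()).add(uri);
    }

    // votes GT n: every document whose value is strictly above n.
    List<String> greaterThan(int n) {
        List<String> out = new ArrayList<>();
        index.tailMap(n, false).values().forEach(out::addAll);
        return out;
    }

    // votes LE n: every document whose value is at or below n.
    List<String> lessOrEqual(int n) {
        List<String> out = new ArrayList<>();
        index.headMap(n, true).values().forEach(out::addAll);
        return out;
    }

    public static void main(String[] args) {
        RangeIndexSketch votes = new RangeIndexSketch();
        votes.add(1, "/questions/a.json");
        votes.add(5, "/questions/b.json");
        votes.add(3, "/questions/c.json");
        System.out.println(votes.greaterThan(3)); // [/questions/b.json]
    }
}
```

A value query, by contrast, only asks "does this exact value occur?", which the universal index can already answer without any ordering.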