Performance

This page has tips for getting better performance out of Nevermore.

Nevermore 12.0+ is generally faster for most queries, and with larger documents, uses much less memory. If you're on an older version, consider upgrading. See what's new in Nevermore 12.0+ for benchmarks.

How Nevermore works (performance)

Nevermore is optimized primarily for reading, with a focus on being fast (close to hand-coded SqlCommand speed) while making very few large memory allocations. This section explains a bit about how it all works.

Reading

When querying or loading documents, Nevermore uses a different "strategy" depending on the type being queried (primitive, value tuple, arbitrary class, or a mapped document). Each strategy behaves a little differently, but the core idea is the same.

Reader behavior

When Nevermore calls ExecuteReader, it specifies the SequentialAccess command behavior for the data reader, which allows it to read and process data as it streams in from SQL Server without buffering the entire row. This is especially important for large documents. Each Nevermore strategy ensures fields are read strictly in order (i.e., field 2 is read before field 3, and so on).
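
In plain ADO.NET terms, the behaviour described above looks roughly like the sketch below (illustrative only - it assumes an open SqlConnection called connection and a Contact table; it is not Nevermore's actual code):

using System.Data;
using Microsoft.Data.SqlClient;

using var command = new SqlCommand("SELECT [Id], [FirstName], [JSON] FROM dbo.Contact", connection);
using var reader = command.ExecuteReader(CommandBehavior.SequentialAccess);
while (reader.Read())
{
    var id = reader.GetString(0);             // ordinal 0 must be read before 1
    var firstName = reader.GetString(1);      // ...and 1 before 2
    using var json = reader.GetTextReader(2); // the JSON column can be streamed rather than buffered as one string
    // ... hand the TextReader to the deserializer ...
}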

Compilation

Most strategies keep a cache of queries they've seen before, and a compiled func that knows how to execute the reader for that particular query.

  • For primitives, it creates a func for each type you query (int, string)
  • For ValueTuples, it creates a func for each tuple combination you query (ValueTuple<int, int>, ValueTuple<string, int>)
  • For arbitrary classes (without a document map) and documents, it creates one func per combination of type and returned columns

The compiled func will look as much as possible like code written by hand. For example, when reading a document with two mapped columns, it will generate and compile something like this:

(DbDataReader reader, DocumentReaderContext context) =>
{
    var temp0 = reader.GetString(0);                                     // Id
    var temp1 = reader.GetString(1);                                     // FirstName
    var temp2 = reader.IsDBNull(2) ? (int?)null : reader.GetInt32(2);    // Age

    var result = context.DeserializeText<Contact>(reader, 7);

    result.Id = temp0;
    result.FirstName = temp1;
    result.Age = temp2;
    return result;
}

The above is overly simplified - there are extra null checks (e.g., for when a column value is DBNull), some bookkeeping to track which column is being read in case there's an error, and so on. But you get the picture. Note that each field is read into a temporary variable, because we can't deserialize until we reach the JSON column, and the JSON column is usually not at the start of the query.

If you're curious, look at DocumentReaderExpressionBuilder in the source code. Also, if your query causes an exception, the error message will include the source code for the generated reader in case it isn't what you expect.

The compiled func is fast to execute but slow to build, so it is cached and reused for future queries.

Deserialization

Nevermore can store JSON either as text, or in a compressed format.

For JSON as text, there are two strategies: one when we know there will always be small documents, and one if we think there could be large documents.

  • Small documents: Nevermore simply calls DbDataReader.GetString() and then passes it to JSON.NET for deserialization
  • Large documents: Nevermore uses a custom TextReader that calls through to DbDataReader.GetChars() which allows it to read and process the text directly into a buffer as needed, without loading the entire document into memory.

The benefit of the large-document approach is significant: Nevermore can avoid LOH allocations entirely and keep overall memory usage down. However, for small documents (1K or smaller) it's slightly slower, which is why there are two strategies.
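
The large-document path is conceptually similar to the following sketch (illustrative only; Nevermore's real reader does more around buffer pooling and error handling). It wraps the data reader in a TextReader so JSON.NET can pull characters on demand:

using System.Data.Common;
using System.IO;

// A TextReader that pulls characters from the data reader as needed, so the
// JSON text is never materialized as one large string. Only the buffered
// Read overload is shown, for brevity.
class DataReaderTextReader : TextReader
{
    readonly DbDataReader reader;
    readonly int ordinal;
    long dataOffset;

    public DataReaderTextReader(DbDataReader reader, int ordinal)
    {
        this.reader = reader;
        this.ordinal = ordinal;
    }

    public override int Read(char[] buffer, int index, int count)
    {
        var charsRead = (int)reader.GetChars(ordinal, dataOffset, buffer, index, count);
        dataOffset += charsRead;
        return charsRead;
    }
}

JSON.NET can then deserialize straight from it, e.g. serializer.Deserialize<Contact>(new JsonTextReader(new DataReaderTextReader(reader, jsonOrdinal))).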

How does Nevermore know which strategy to use? It looks at an ExpectLargeDocuments property on the document map, which defaults to false. If it's true, Nevermore always uses the large document strategy.

If you leave it as false, don't worry: when Nevermore reads a string larger than 1K while deserializing, it sets ExpectLargeDocuments to true and remembers it for next time (until the application restarts). In this way, Nevermore is self-tuning.

This means:

  • If you know for sure that the table will have at least a few large documents, set ExpectLargeDocuments to true.
  • If you are very confident it won't, or at least for some customers it won't, leave it as false.
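
For example, a map for a table you know will hold large documents could opt in up front (a minimal sketch - the map and document type names are made up, and only the relevant property is shown):

class LargeDocumentMap : DocumentMap<DeploymentProcess>
{
    public LargeDocumentMap()
    {
        // This table routinely holds documents well over 1K of JSON, so use
        // the streaming (large document) strategy from the first read.
        ExpectLargeDocuments = true;
    }
}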

In Octopus Deploy's case, roughly 80% of documents fit the small category (1K or under). Looking at large customer databases, only certain tables tend to have clusters of large documents.

For compressed JSON, Nevermore calls GetStream on the data reader, decompresses it through a GZipStream, and feeds the result to the JSON.NET deserializer.
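
In ADO.NET terms the compressed read path is roughly the sketch below (illustrative; reader, jsonOrdinal and serializer - a JSON.NET JsonSerializer - are assumed to exist):

using System.IO;
using System.IO.Compression;
using Newtonsoft.Json;

using var column = reader.GetStream(jsonOrdinal);                      // raw varbinary bytes from SQL Server
using var gzip = new GZipStream(column, CompressionMode.Decompress);   // decompress as the bytes stream in
using var text = new StreamReader(gzip);
using var json = new JsonTextReader(text);
var contact = serializer.Deserialize<Contact>(json);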

Nevermore also makes use of array pools when possible to reduce allocations, though all buffers used are quite small anyway.

Streaming results

Once the expression is compiled, the actual reading is simply done one row at a time by calling the compiled func.
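
Conceptually, that loop is no more complicated than this sketch (the real code also tracks column state and handles errors):

using System;
using System.Collections.Generic;
using System.Data.Common;

static IEnumerable<TDocument> StreamResults<TDocument>(
    DbDataReader reader,
    DocumentReaderContext context,
    Func<DbDataReader, DocumentReaderContext, TDocument> compiledFunc)
{
    while (reader.Read())
        yield return compiledFunc(reader, context);   // one document per row; the full result set is never buffered
}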

Writing

In general, most optimization work in Nevermore is focused on reading data. For writing, the focus is mostly on memory use, especially with large documents.

When inserting or updating documents, Nevermore generates a SQL statement with the fields you'd expect along with its command parameters, then executes it. The properties are read from the object into the SqlCommand using compiled funcs (one for each property, cached on the document map).

When serializing JSON (as text or compressed), Nevermore first serializes the document to a memory stream. For this, we use RecyclableMemoryStream, which in testing performs much better, especially when documents are large. The memory stream is then passed to the SqlCommand as a stream - when the command is executed, the stream is read. Again, this means there should be no LOH allocations even when dealing with very large documents.
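
A minimal sketch of that write path, assuming an open connection, a contact document and the column names shown (this is not Nevermore's actual code):

using System.Data;
using System.IO;
using Microsoft.Data.SqlClient;
using Microsoft.IO;
using Newtonsoft.Json;

var streamManager = new RecyclableMemoryStreamManager();
var serializer = JsonSerializer.CreateDefault();

using var jsonStream = streamManager.GetStream("insert-contact");   // pooled, avoids LOH allocations
using (var writer = new StreamWriter(jsonStream, leaveOpen: true))
{
    serializer.Serialize(writer, contact);
}
jsonStream.Position = 0;

using var command = new SqlCommand(
    "INSERT INTO dbo.Contact ([Id], [FirstName], [JSON]) VALUES (@id, @firstName, @json)", connection);
command.Parameters.AddWithValue("@id", contact.Id);
command.Parameters.AddWithValue("@firstName", contact.FirstName);
// Assigning a TextReader lets SqlClient stream the nvarchar(max) value to the
// server when the command executes, rather than materializing one large string.
command.Parameters.Add("@json", SqlDbType.NVarChar, -1).Value = new StreamReader(jsonStream);
command.ExecuteNonQuery();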

When JSON is compressed, Nevermore uses GZipStream with the compression level set to Optimal by default. This is because we assume the SQL Server database will be on another server, and the goal is to reduce network I/O on large documents. Documents are written infrequently but read often, so this seems like a good tradeoff. You can, however, change the compression level if you need to.
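
Continuing the sketch above, the compressed variant simply writes the JSON through a GZipStream first and sends the bytes as varbinary:

using System.IO.Compression;

using var compressed = streamManager.GetStream("insert-contact-compressed");
using (var gzip = new GZipStream(compressed, CompressionLevel.Optimal, leaveOpen: true))  // or Fastest, if CPU matters more than bandwidth
using (var writer = new StreamWriter(gzip))
{
    serializer.Serialize(writer, contact);
}
compressed.Position = 0;

command.Parameters.Add("@json", SqlDbType.VarBinary, -1).Value = compressed;   // a Stream value is also streamed to the server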

Other

Beyond reading and writing documents, Nevermore has a few other operations - ExecuteQuery, ExecuteReader and so on. These all call the corresponding method on SqlCommand so there isn't much to optimize.

Custom serializer?

Nevermore uses JSON.NET by default. You could instead use the System.Text.Json serializer, which seems to use less memory. You can do this by setting your own IDocumentSerializer on the store configuration. However, System.Text.Json assumes everything is UTF-8, which isn't how SQL Server stores text, so you may end up paying for encoding conversions anyway.
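
Wiring in a custom serializer is a configuration change along these lines (a sketch - MySystemTextJsonSerializer is a hypothetical class you would write to implement IDocumentSerializer, and the exact configuration type and property names may differ from what's shown):

var configuration = new RelationalStoreConfiguration(connectionString)
{
    // Hypothetical: your own IDocumentSerializer built on System.Text.Json.
    DocumentSerializer = new MySystemTextJsonSerializer()
};
var store = new RelationalStore(configuration);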

How to get the best performance from Nevermore

Consider enabling compression

If the time spent reading and writing JSON from the database and transferring it over the network between the application and SQL Server is a concern, consider enabling compression. This makes a very big difference in bandwidth-constrained environments, especially for larger documents.
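
Compression is typically enabled per document map. As a sketch (the storage-format member and enum names here are assumptions - check your Nevermore version for the exact API):

class DeploymentJournalMap : DocumentMap<DeploymentJournal>
{
    public DeploymentJournalMap()
    {
        // Assumption: compression is opted into per map via a storage format
        // setting; the member and enum names may differ in your version.
        JsonStorageFormat = JsonStorageFormat.CompressedOnly;
    }
}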

Performance benchmarks

TODO: Include

Pitfalls

Avoid Stream with projection

There is an overload of Stream that takes a callback, which is used when doing a projection - when you want to do a query with a join, then map each result to a pair (or three or four) of documents. It works well when the result is small (or when you query an arbitrary class), but not for large documents. This is because the data reader cannot use SequentialAccess (as your projection may read columns out of order), so the entire row must be read into memory.

Large strings/arrays

Nevermore works hard to avoid any LOH allocations (objects 85K or more) by streaming and buffering everything it can. However, if your document includes one huge allocation anyway - e.g., because it has an 85K HTML string embedded in it, or a giant array of ints - then those allocations will still be needed.

For this kind of design, you might want to consider a separate table. The main table could store the JSON document (which is always fully deserialized), while the large payload lives in a second table that you only fetch from when it's actually needed (of course, maybe this data makes more sense to store on disk anyway!).

If you call IRelationalTransaction.ExecuteReader you'll get raw access to a data reader - you could use this, for example, to read the text using GetChars in chunks and write it to the HTML response directly, without ever allocating a large string.
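
For example, something along these lines copies a large text column to a TextWriter (such as the HTTP response) in fixed-size chunks, so no single large string is ever allocated (a sketch, assuming the reader was opened with SequentialAccess):

using System.Data.Common;
using System.IO;

static void CopyTextColumn(DbDataReader reader, int ordinal, TextWriter output)
{
    var buffer = new char[8 * 1024];
    while (reader.Read())
    {
        long offset = 0;
        long charsRead;
        // Pull the column in 8K chunks and write each chunk straight to the output.
        while ((charsRead = reader.GetChars(ordinal, offset, buffer, 0, buffer.Length)) > 0)
        {
            output.Write(buffer, 0, (int)charsRead);
            offset += charsRead;
        }
    }
}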
