Usage sharing element

This Flow Element implements the usage-sharing feature.

Accepted Evidence

The accepted Evidence for share usage is unusual because, unlike that of most Flow Elements, it cannot simply be a list of accepted keys.

Instead, it is a filter function that will need to return false (i.e. not shared) for any key where:

  • Prefix = header. and field is in a user-configurable block list (by default, this block list SHOULD include the cookie HTTP header).
  • Prefix = cookie. and field does not start with 51D_
  • Prefix = query. and field does not start with 51D_ and is not in a user-configurable allow list of query string parameters that will be shared.

Anything else MUST return true (i.e. it will be shared).

See configuration options for more detail on the configuration parameters that can affect this filter function.
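The rules above can be expressed as a single predicate. The following is a minimal Python sketch, not the reference implementation. The function name `should_share`, its keyword arguments, and the case-insensitive comparison of field names are all assumptions made for illustration. The example XML in the processing section shows cookie names in lower case, which is why this sketch lowercases them. The `share_all_query` and `share_all_evidence` flags stand in for the Share all query string parameters and Share all Evidence options.

```python
from typing import Iterable


def should_share(key: str, *,
                 blocked_headers: Iterable[str] = ("cookie",),
                 allowed_query_params: Iterable[str] = (),
                 share_all_query: bool = False,
                 share_all_evidence: bool = False) -> bool:
    """Return True if the evidence entry with this key may be shared.

    Keys are assumed to have the form "prefix.field", e.g. "header.user-agent".
    Comparisons are case-insensitive (an assumption made for this sketch).
    """
    if share_all_evidence:
        return True
    prefix, _, field = key.lower().partition(".")
    if prefix == "header":
        # Headers are shared unless they appear in the block list.
        return field not in {h.lower() for h in blocked_headers}
    if prefix == "cookie":
        # Only 51Degrees cookies are shared.
        return field.startswith("51d_")
    if prefix == "query":
        # 51Degrees parameters plus any explicitly allowed ones.
        return (share_all_query
                or field.startswith("51d_")
                or field in {p.lower() for p in allowed_query_params})
    # Anything else is shared.
    return True
```

For example, `should_share("cookie.51D_ScreenPixelsHeight")` returns `True`, while `should_share("header.Cookie")` returns `False` with the default block list.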

Element Data

This Flow Element does not add an Element Data instance to Flow Data.

Start-up activity

A static XML snippet can be generated for values that will be invariant for the lifetime of the current Pipeline.

From the list of parameters in the processing section below, the invariant ones are:

  • Version
  • Product
  • Flow Elements
  • Language
  • LanguageVersion
  • ServerIP
  • Platform

Note that there is a proposal to avoid duplicating these values on every record.
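These values do not change, so the corresponding XML can be rendered once when the element starts. The Python sketch below illustrates this under some assumptions. The function name and its parameters are hypothetical. It returns two fragments so that the per-request <ClientIP> element can sit between <LanguageVersion> and <ServerIP>, matching the element order in the snippet in the processing section.

```python
import platform
from typing import Iterable, Tuple
from xml.sax.saxutils import escape


def build_static_xml(version: str, product: str, flow_elements: Iterable[str],
                     language: str, language_version: str,
                     server_ip: str, platform_desc: str) -> Tuple[str, str]:
    """Render the Pipeline-invariant <Device> children once, at start-up.

    Returns (head, tail): head holds Version..LanguageVersion and tail holds
    ServerIP and Platform, so the per-request <ClientIP> can go between them.
    """
    def el(tag: str, text: str) -> str:
        # escape() replaces &, < and > with XML entities.
        return f"<{tag}>{escape(text)}</{tag}>"

    head = [el("Version", version), el("Product", product)]
    head += [el("FlowElement", name) for name in flow_elements]
    head += [el("Language", language), el("LanguageVersion", language_version)]
    tail = [el("ServerIP", server_ip), el("Platform", platform_desc)]
    return "".join(head), "".join(tail)


if __name__ == "__main__":
    head, tail = build_static_xml(
        "4.0", "Pipeline", ["example.ElementA"],  # hypothetical element name
        "Python", platform.python_version(),
        "192.0.2.10",                             # hypothetical server address
        f"{platform.system()} {platform.release()}")
    print(head + "<ClientIP>203.0.113.5</ClientIP>" + tail)
```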

Processing

The process function SHOULD get the data it needs and complete as soon as possible. We recommend implementing a producer/consumer approach where the process function just adds the raw data to a shared queue.

A background thread can then be used to consume items from this queue, adding them to the XML payload and sending it once it contains the configured quantity of data.
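Here is a minimal Python sketch of this producer/consumer split, using the standard `queue` and `threading` modules. The function names are hypothetical. The defaults mirror the Add timeout milliseconds, Take timeout milliseconds and Minimum entries per message options. `send` is a placeholder for whatever builds and transmits the XML payload. The sketch does not flush a partial batch at shutdown; the Cleanup section describes what that step requires.

```python
import queue
import threading
from typing import Callable, Dict, List

Evidence = Dict[str, str]


def process(q: queue.Queue, evidence: Evidence, add_timeout_ms: int = 5) -> bool:
    """Request path: enqueue a copy of the raw evidence and return quickly.

    Returns False if the queue stayed full for the whole timeout, in which
    case the data is discarded.
    """
    try:
        q.put(dict(evidence), timeout=add_timeout_ms / 1000)
        return True
    except queue.Full:
        return False


def consume(q: queue.Queue, send: Callable[[List[Evidence]], None],
            stop: threading.Event, min_entries: int = 50,
            take_timeout_ms: int = 100) -> None:
    """Background thread: collect entries and send them in batches."""
    batch: List[Evidence] = []
    while not stop.is_set():
        try:
            item = q.get(timeout=take_timeout_ms / 1000)
        except queue.Empty:
            continue  # wake up periodically to check the stop flag
        batch.append(item)
        if len(batch) >= min_entries:
            send(batch)
            batch = []
        q.task_done()  # lets q.join() wait until queued items are consumed


if __name__ == "__main__":
    q: queue.Queue = queue.Queue(maxsize=1000)  # Maximum queue size
    stop = threading.Event()
    worker = threading.Thread(target=consume, args=(q, print, stop, 2), daemon=True)
    worker.start()
    for ua in ("UA-1", "UA-2"):
        process(q, {"header.user-agent": ua})
    q.join()
    stop.set()
    worker.join()
```

With this design, the request thread only pays for a dictionary copy and a bounded `put`. All XML work happens on the consumer thread.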

The complete XML document MUST contain a <Devices> root element, with each child <Device> element containing the details of a single request.

Multiple <Device> elements will be batched up as part of each message. The number of items per batch is configurable using the minimum entries per message setting.

In the snippet below:

  • Text in CAPS represents variable values.
  • Items in [square brackets] are optional.
```xml
<Device>
  <SessionId>SESSION_ID</SessionId>
  <Sequence>SEQUENCE</Sequence>
  <DateSent>DATE_SENT</DateSent>
  <Version>API_VERSION</Version>
  <Product>API_NAME</Product>
  <FlowElement>ELEMENT1</FlowElement>
  <FlowElement>ELEMENT2</FlowElement>
  <Language>LANG</Language>
  <LanguageVersion>LANG_VERSION</LanguageVersion>
  <ClientIP>IP_ADDRESS</ClientIP>
  <ServerIP>IP_ADDRESS</ServerIP>
  <Platform>PLATFORM PLATFORM_VER SERVICE_PACK</Platform>
  <!--Each evidence value is represented as an entry in the form:-->
  <PREFIX [escaped="true"] [truncated="true"] Name="FIELD">VALUE</PREFIX>
  <!--For example:-->
  <Header Name="user-agent">USER_AGENT</Header>
  <Header Name="host">HOST</Header>
  <Cookie Name="51d_screenpixelsheight">SCREEN_HEIGHT</Cookie>
</Device>
```
  • SessionId - A GUID generated by the Sequence Element. It identifies a set of Evidence originating from the same source when multiple requests are made to resolve the JSON payload for updated Evidence, e.g. client-side overrides or geolocation.
  • Sequence - An integer paired with the SessionId. It is incremented each time the sequence value is seen in Evidence, which identifies the number of requests made in the session. (This incrementing is handled by the Sequence Element.)
  • DateSent - The UTC date and time at which the XML was constructed, formatted as yyyy-mm-ddThh:nn:ss (where nn is minutes).
  • Version - The version of the Pipeline that is being used to generate the XML. E.g. “4.0”
  • Product - The name of the product that is generating the XML. For example, “Pipeline”
  • FlowElement - One element exists for each Flow Element in the Pipeline. Its text contains the fully qualified class name of that Flow Element.
  • Language - The name of the language/framework within which the Pipeline is running.
  • LanguageVersion - The version number of the language/framework within which the Pipeline is running.
  • ClientIP - The public IP address of the client that sent the request to this server.
  • ServerIP - The IP address of the server where the XML is being generated.
  • Platform - The platform name, version and service pack, if available.
  • Evidence - The Evidence entries that are shared are determined by the accepted Evidence filter.
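To show how these fields fit together, here is a Python sketch that uses the standard `xml.etree.ElementTree` module to assemble one <Device> element and the <Devices> document around a batch. The function names and the `static` dictionary keys are assumptions made for illustration. So is the mapping from an evidence key such as `header.user-agent` to `<Header Name="user-agent">`. ElementTree escapes `&`, `<` and `>` automatically, but it does not remove characters that XML forbids. This sketch therefore assumes that evidence values have already been sanitized, as the following paragraphs require.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


def build_device(session_id: str, sequence: int, client_ip: str,
                 evidence: Dict[str, str], static: Dict[str, Any],
                 now: Optional[datetime] = None) -> ET.Element:
    """Build one <Device> element in the order used by the snippet above."""
    device = ET.Element("Device")

    def add(tag: str, text: str) -> None:
        ET.SubElement(device, tag).text = text

    sent = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    add("SessionId", session_id)
    add("Sequence", str(sequence))
    add("DateSent", sent.strftime("%Y-%m-%dT%H:%M:%S"))
    add("Version", static["version"])
    add("Product", static["product"])
    for name in static["flow_elements"]:
        add("FlowElement", name)
    add("Language", static["language"])
    add("LanguageVersion", static["language_version"])
    add("ClientIP", client_ip)
    add("ServerIP", static["server_ip"])
    add("Platform", static["platform"])
    for key, value in evidence.items():
        # "header.user-agent" becomes <Header Name="user-agent">...</Header>.
        prefix, _, field = key.partition(".")
        ET.SubElement(device, prefix.capitalize(), Name=field).text = value
    return device


def build_payload(devices: List[ET.Element]) -> bytes:
    """Wrap a batch of <Device> elements in the <Devices> root element."""
    root = ET.Element("Devices")
    root.extend(devices)
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)
```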

Evidence values can be extremely long or contain characters that are not permitted in XML. To handle this, a mechanism MUST be implemented to sanitize the values before they are added to the XML.

Where a value has to be changed, attributes are added to the XML element for that Evidence entry to tell the reader that the value was modified.

Currently, the defined attributes are:

  • escaped - Indicates that the original evidence value included invalid characters, which have been replaced with escaped ones.
  • truncated - Indicates that the original evidence value was too long. The value in the XML does not include all characters from the original.

See the ReplacedString function in the Java implementation for an example of this.
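One possible sanitizer is sketched below in Python. It is not a port of ReplacedString. The 512-character maximum length and the `\uXXXX` replacement format are hypothetical choices. Each character is checked against the XML 1.0 `Char` production. Truncation stops before any replacement sequence that would not fit, so an escape is never cut in half. `add_evidence` then sets the escaped and truncated attributes listed above only when they apply.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def is_xml_char(ch: str) -> bool:
    """True if ch is allowed by the XML 1.0 Char production."""
    cp = ord(ch)
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def sanitize(value: str, max_length: int = 512) -> Tuple[str, bool, bool]:
    """Return (text, escaped, truncated) for an evidence value."""
    pieces: List[str] = []
    length = 0
    escaped = truncated = False
    for ch in value:
        invalid = not is_xml_char(ch)
        # Hypothetical escape format: the code point as \uXXXX.
        piece = f"\\u{ord(ch):04x}" if invalid else ch
        if length + len(piece) > max_length:
            truncated = True
            break
        pieces.append(piece)
        length += len(piece)
        escaped = escaped or invalid
    return "".join(pieces), escaped, truncated


def add_evidence(parent: ET.Element, tag: str, name: str, value: str,
                 max_length: int = 512) -> ET.Element:
    """Append <tag Name="name">value</tag>, flagging any modification."""
    text, escaped, truncated = sanitize(value, max_length)
    element = ET.SubElement(parent, tag, Name=name)
    if escaped:
        element.set("escaped", "true")
    if truncated:
        element.set("truncated", "true")
    element.text = text
    return element
```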

Request detail

When sending the data, use the following details:

  • URL configurable using the share usage URL option.
  • POST request
  • XML content written to the body of the request using gzip compression
  • HTTP Headers:
    • content-encoding = gzip
    • content-type = text/xml

A response with status 200 indicates success; anything else is a failure. When a failure occurs, the status code and content of the response MUST be logged to aid troubleshooting.

See the BuildAndSendXml method in the C# implementation for an example of this.
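Here is a Python sketch of the request that uses only the standard library (`gzip` and `urllib.request`). The function names, the logger name and the 10 second timeout are assumptions. The default URL is the Share usage URL default from the configuration table. `urlopen` returns normally for other 2xx statuses, which this sketch logs as failures. It raises `HTTPError` for error responses. Failures are therefore logged on both paths.

```python
import gzip
import logging
import urllib.error
import urllib.request

log = logging.getLogger("usage_sharing")  # hypothetical logger name
SHARE_USAGE_URL = "https://devices-v4.51degrees.com/new.ashx"


def build_request(xml: bytes, url: str = SHARE_USAGE_URL) -> urllib.request.Request:
    """POST request with a gzip-compressed XML body."""
    return urllib.request.Request(
        url,
        data=gzip.compress(xml),
        headers={"Content-Encoding": "gzip", "Content-Type": "text/xml"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout_s: float = 10.0) -> bool:
    """Send the request; log the status and body of any non-200 response."""
    try:
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            if response.status == 200:
                return True
            log.error("Usage sharing failed: HTTP %s, body %r",
                      response.status, response.read())
    except urllib.error.HTTPError as err:
        # Raised for error responses; the exception also carries the body.
        log.error("Usage sharing failed: HTTP %s, body %r", err.code, err.read())
    except OSError as err:
        # Connection problems and timeouts (URLError is an OSError subclass).
        log.error("Usage sharing failed: %s", err)
    return False
```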

Session tracking

Note that this section is not concerned with identifying the precise boundaries of user sessions, only with reducing the volume of unnecessary data that is shared.

When a user is browsing a website, they will typically make many HTTP requests to access different resources, visit different pages, etc.

These requests will usually all generate exactly the same usage data. We have no need for this duplicate data, so it MUST be discarded as early as possible.

This can be achieved by maintaining a data structure containing recently shared Evidence values. If the data entry being processed already has a matching entry in this data structure, it MUST NOT be shared. For example, the C# implementation generates a hash of the Evidence values using the same logic as the caching feature.

Each entry has a configurable lifetime within this data structure, which is reset each time Evidence with matching values is processed.

See configuration options (Repeat Evidence interval minutes) for more detail on customizing this lifetime.

Note that certain Evidence values, such as the sequence number generated by the Sequence Element, MUST NOT be included when generating this hash code, as they are designed to make each request unique.

Various configuration options also allow users to exclude any keys that are found to cause oversharing of data.
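Here is a minimal Python sketch of such a data structure. The C# implementation reuses the caching feature's hashing logic. This sketch swaps in a different technique for illustration: it hashes the sorted, JSON-encoded Evidence with SHA-256. The class name, the excluded keys `query.sequence` and `query.session-id`, and the `max_entries` pruning threshold are all hypothetical. The window defaults to 20 minutes, the Repeat Evidence interval minutes default, and slides forward every time matching Evidence is seen.

```python
import hashlib
import json
import threading
import time
from typing import Callable, Dict, Iterable


class RecentEvidence:
    """Sliding-window record of recently shared Evidence, used to drop repeats."""

    def __init__(self, interval_minutes: float = 20,
                 excluded_keys: Iterable[str] = ("query.sequence", "query.session-id"),
                 max_entries: int = 10_000,
                 clock: Callable[[], float] = time.monotonic):
        self._window = interval_minutes * 60
        self._excluded = frozenset(excluded_keys)  # hypothetical key names
        self._max_entries = max_entries
        self._clock = clock
        self._expiry: Dict[bytes, float] = {}
        self._lock = threading.Lock()

    def _fingerprint(self, evidence: Dict[str, str]) -> bytes:
        # Values that make each request unique are left out of the hash.
        items = sorted((k, str(v)) for k, v in evidence.items()
                       if k not in self._excluded)
        return hashlib.sha256(json.dumps(items).encode("utf-8")).digest()

    def should_share(self, evidence: Dict[str, str]) -> bool:
        """True if no matching Evidence was seen within the window."""
        key = self._fingerprint(evidence)
        with self._lock:
            now = self._clock()
            previous_expiry = self._expiry.get(key)
            # Every sighting, shared or not, pushes the expiry forward.
            self._expiry[key] = now + self._window
            if len(self._expiry) > self._max_entries:
                self._prune(now)
        return previous_expiry is None or previous_expiry <= now

    def _prune(self, now: float) -> None:
        """Drop expired entries; called with the lock held."""
        for key in [k for k, t in self._expiry.items() if t <= now]:
            del self._expiry[key]
```

Passing `clock` as a parameter makes the window easy to test without waiting in real time.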

Error handling

As a general principle, share usage can be considered an important but expendable activity. If errors occur, they MUST be handled and logged without disrupting anything else the Pipeline is doing.

For example, the queue of data to be sent has a limited size. If it becomes full, additional data can simply be discarded until there is room to add entries again. However, it is important that these discards are logged so that users can identify potential issues with their usage sharing.
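The sketch below applies this principle in Python. Adding to a full queue and sending a batch both fail softly: the problem is logged rather than raised. The function names and the logger name are hypothetical. The 5 ms default matches the Add timeout milliseconds option.

```python
import logging
import queue
from typing import Callable, List

log = logging.getLogger("usage_sharing")  # hypothetical logger name


def try_enqueue(q: queue.Queue, item: object, add_timeout_ms: int = 5) -> bool:
    """Add item to the queue, or discard it (and log) if the queue stays full."""
    try:
        q.put(item, timeout=add_timeout_ms / 1000)
        return True
    except queue.Full:
        log.warning("Usage sharing queue full (max %d entries); data discarded.",
                    q.maxsize)
        return False


def send_safely(send: Callable[[List[object]], None], batch: List[object]) -> bool:
    """Send a batch; any failure is logged and never propagated to the Pipeline."""
    try:
        send(batch)
        return True
    except Exception:
        log.exception("Failed to share usage data; %d entries discarded.", len(batch))
        return False
```

Under sustained load, a warning for every discarded entry could flood the log, so a real implementation might rate-limit or count these messages instead.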

Cleanup

As this element uses a producer/consumer queue and background processing, it will need to handle cleanup a little more carefully than other elements.

  1. If needed, block until the consumer has sent the data waiting in the queue.
  2. Send a final message with any data remaining in the queue, even if it contains fewer entries than the configured minimum entries per message.
  3. Free any resources associated with the queue and background processing.
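The three steps can be sketched in Python as a consumer that, once asked to close, keeps draining the queue, sends whatever partial batch remains, and then exits. The class and method names are hypothetical. The sketch assumes that no further items are added after `close()` is called.

```python
import queue
import threading
from typing import Callable, List


class ShareUsageWorker:
    """Background consumer that flushes remaining data when closed."""

    def __init__(self, send: Callable[[List[object]], None],
                 min_entries: int = 50, max_queue_size: int = 1000,
                 take_timeout_ms: int = 100):
        self.queue: queue.Queue = queue.Queue(maxsize=max_queue_size)
        self._send = send
        self._min_entries = min_entries
        self._take_timeout = take_timeout_ms / 1000
        self._closing = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        batch: List[object] = []
        while True:
            try:
                batch.append(self.queue.get(timeout=self._take_timeout))
            except queue.Empty:
                # Only stop once closing was requested AND nothing is left.
                if self._closing.is_set() and self.queue.empty():
                    break
                continue
            if len(batch) >= self._min_entries:
                self._send(batch)
                batch = []
        if batch:
            # Step 2: final message, even if below the minimum batch size.
            self._send(batch)

    def close(self, timeout_s: float = 10.0) -> bool:
        """Run the cleanup steps; returns False if the consumer did not finish."""
        self._closing.set()
        # Step 1: block while the consumer sends the data waiting in the queue.
        self._thread.join(timeout_s)
        # Step 3: the queue and thread are released by the garbage collector once
        # unreferenced; languages without one would free them explicitly here.
        return not self._thread.is_alive()
```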

Configuration options

| Parameter | User configurable | Optional | Default | Notes |
|---|---|---|---|---|
| Included query string parameters | yes | yes | All query string and HTTP form parameters starting with 51D_ are shared. | Allows the user to include specific query string or HTTP form parameters in the usage data that is shared. |
| Share all query string parameters | yes | yes | false | If this flag is set, all query string and HTTP form parameters will be shared. |
| Share all Evidence | yes | yes | false | If this flag is set, all Evidence values will be shared. |
| Blocked HTTP headers | yes | yes | All HTTP headers are shared except for cookies that do not start with 51D_. | Allows the user to exclude specific HTTP headers that they do not want to share. |
| Ignore Flow Data Evidence filter | yes | yes | not set | Allows the user to block a request from being shared if the specified criteria are met. |
| Share percentage | yes | yes | 1.0 | Sets the approximate proportion of requests that will be shared. A value of 1 means that 100% of requests are shared. |
| Minimum entries per message | yes | yes | 50 | The number of requests that will be added to a usage sharing payload before it is sent. |
| Maximum queue size | yes | yes | 1000 | The size of the queue that is used to buffer requests to be added to a usage sharing payload. |
| Add timeout milliseconds | yes | yes | 5 | The timeout to use when trying to add items to the usage sharing queue. If the request times out, the data will be discarded. |
| Take timeout milliseconds | yes | yes | 100 | The timeout to use when getting items from the usage sharing queue to add to the next payload. |
| Share usage URL | yes | yes | https://devices-v4.51degrees.com/new.ashx | The URL to send data to. |
| Repeat Evidence interval minutes | yes | yes | 20 | The size of the sliding window during which identical usage data will not be sent if it is seen a second or subsequent time. |
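In Python, these options could be grouped in a frozen dataclass like the one sketched below. The field names are hypothetical renderings of the parameter names. Two further details are assumptions: the type of the Ignore Flow Data Evidence filter (a predicate over the Evidence) and the range check on the share percentage. The blocked-header default follows the accepted Evidence section's recommendation that the cookie header be blocked. `sample()` shows one way Share percentage could be applied: each request is shared with that probability, so the proportion shared is approximate.

```python
import random
from dataclasses import dataclass
from typing import Callable, FrozenSet, Mapping, Optional


@dataclass(frozen=True)
class ShareUsageConfig:
    included_query_string_parameters: FrozenSet[str] = frozenset()
    share_all_query_string_parameters: bool = False
    share_all_evidence: bool = False
    blocked_http_headers: FrozenSet[str] = frozenset({"cookie"})
    # Hypothetical type: return True to stop a request from being shared.
    ignore_flow_data_evidence_filter: Optional[Callable[[Mapping[str, str]], bool]] = None
    share_percentage: float = 1.0
    minimum_entries_per_message: int = 50
    maximum_queue_size: int = 1000
    add_timeout_ms: int = 5
    take_timeout_ms: int = 100
    share_usage_url: str = "https://devices-v4.51degrees.com/new.ashx"
    repeat_evidence_interval_minutes: int = 20

    def __post_init__(self) -> None:
        if not 0.0 <= self.share_percentage <= 1.0:
            raise ValueError("share_percentage must be between 0 and 1")

    def sample(self, rng: Callable[[], float] = random.random) -> bool:
        """Decide whether this request falls within the shared proportion."""
        return rng() < self.share_percentage
```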