Mapping in Divolte Collector is the definition that determines how incoming events are translated into Avro records conforming to a schema. This definition is constructed using a Groovy-based DSL (Domain-Specific Language).
Most clickstream data collection services or solutions use a canonical data model that is specific to click events and related properties. Things such as location, referrer, remote IP address, path, etc. are all properties of a click event that come to mind. While Divolte Collector exposes all of these fields just as well, it is our vision that this is not enough to make it easy to build online and near real-time data driven products within specific domains and environments. For example, when working on a system for product recommendation, the notion of a URL or path for a specific page is completely in the wrong domain; what you care about in this case is likely a product ID and probably a type of interaction (e.g. product page view, large product photo view, add to basket, etc.). It is usually possible to extract these pieces of information from the clickstream representation, which means custom parsers have to be created to parse this information out of URLs, custom events from JavaScript and other sources. This means that whenever you work with the clickstream data, you have to run these custom parsers initially in order to get meaninful, domain specific information from the data. When building real-time systems, it normally means that this parser has to run in multiple locations: as part of the off line processing jobs and as part of the real-time processing.
With Divolte Collector, instead of writing parsers and working with the raw clickstream event data in your processing, you define mappings that allows Divolte Collector to do all the required parsing on the fly as events come in and subsequently produce structured records with a schema to use in further processing. This means that all data that comes in can already have the relevant domain specific fields populated. Whenever the need for a new extracted piece of information arises, you can update the mapping to include the new field in the newly produced data. The older data that lacks newly additional fields can co-exist with newer data that does have the additional fields through a process called schema evolution. This is supported by Avro’s ability to read data with a different schema from the one that the data was written with. (This is implemented at read-time using a process called schema resolution.)
The goal of the mapping is to get rid of log file or URL parsing on collected data after it is published. The event stream from Divolte Collector should have all the domain specific fields to support you use cases directly.
Before you dive in to creating your own mappings, it is important to understand a little bit about how a mapping is actually performed. The most notable thing to keep in mind is that a mapping script that you provide is not evaluated at request time for each event. Instead a mapping is evaluated only once during startup and declares how the actual mapping should take place.
Schema evolution is the process of changing the schema over time as requirements change. For example when a new feature is added to your website, you add additional fields to the schema that contain specific information about user interactions with this new feature. In this scenario, you would update the schema to have these additional fields, update the mapping and then run Divolte Collector with the new schema and mapping. This means that there will be a difference between data that was written prior to the update and data that is written after the update. Also, it means that after the update, there can still be consumers of the data that still use the old schema. In order to make sure that this isn’t a problem, the readers with the old schema need to be able to read data written with the new schema and readers with the new schema should also still work on data written with the old schema.
Luckily, Avro supports both of these cases. When reading newer data with an older schema, the fields that are not present in the old schema are simply ignored by the reader. The other way around is slightly trickier. When reading older data with a new schema, Avro will fill in the default values for fields that are present in the schema but not in the data. This is provided that there is a default value. Basically, this means that it is recommended to always provide a default value for all your fields in the schema. In case of nullable fields, the default value could just be null.
One other reason to always provide a default value is that Avro does not allow to create records with missing values if there are no default values. As a result of this, fields that have no default value always must be populated in the mapping, otherwise an error will occur. This is problematic if the mapping for some reason fails to set a field (e.g. because of a user typing in a non-conforming location in the browser).
In addition to introducing new fields with defaults, other forms of changes such as renaming and type changes can be permitted under some circumstances. For full details on the changes that are permitted and how the writing and reading schemas are reconciled refer to the Avro documentation on schema resolution.
Schema evolution is not automatically available when using event-based delivery via Kafka or Google Cloud Pub/Sub. The file implementation is defined by the Avro project, but this is not suitable for messaging systems and no alternative is defined. As a result there is no single standard and various strategies are in use.
For Kafka there are two methods:
naked
(and default) mode in which Kafka sinks operate. Consumers are expected to implicitly know the schema used to write the record. This makes changing the schema infeasible without also switching to a new Kafka topic; consumers generally have no way to know which messages were written with the old or new schema.confluent
mode.For Google Cloud Pub/Sub the message data is the Avro-encoded record, identical to the naked
mode for Kafka sinks. In addition to the data messages have a schemaFingerprint
attribute that refers to the schema that was used to encode the message. (See Google Cloud Pub/Sub Sinks for details.) Schema changes are possible so long as they remain backwardly compatible.
Mappings are specified by Groovy scripts that are compiled and run by Divolte Collector on startup. Each mapping script is written in the mapping DSL. The result of running this script is a mapping that Divolte Collector can use to map incoming events from its configured sources onto an Avro schema.
The values that are available on an event depend on the type of source that produced it. Currently browser events make more values available than events produced by JSON sources.
Divolte Collector comes with a built-in default schema and mapping. A mapping will use these if the mapping schema or script file are not specified. The default mapping is intended to map events from a browser source and will map most things that you would expect from a clickstream data collector. The Avro schema that is used can be found in the divolte-schema Github repository. The following mappings are present in the default mapping:
Mapped value | Avro schema field |
---|---|
duplicate | detectedDuplicate |
corrupt | detectedCorruption |
firstInSession | firstInSession |
timestamp | timestamp |
clientTimestamp | clientTimestamp |
remoteHost | remoteHost |
referer | referer |
location | location |
viewportPixelWidth | viewportPixelWidth |
viewportPixelHeight | viewportPixelHeight |
screenPixelWidth | screenPixelWidth |
screenPixelHeight | screenPixelHeight |
partyId | partyId |
sessionId | sessionId |
pageViewId | pageViewId |
eventType | eventType |
userAgentString | userAgentString |
User agent name | userAgentName |
User agent family | userAgentFamily |
User agent vendor | userAgentVendor |
User agent type | userAgentType |
User agent version | userAgentVersion |
User agent device category | userAgentDeviceCategory |
User agent OS family | userAgentOsFamily |
User agent OS version | userAgentOsVersion |
User agent OS vendor | userAgentOsVendor |
The default schema is not available as a mapping script. Instead, it is hard coded into Divolte Collector. This allows Divolte Collector to do something useful out-of-the-box without any complex configuration.
Mapping involves three main concepts: values, fields and mappings.
A value is something that is extracted from the incoming event (e.g. the location or a HTTP header value) or is derived from another value (e.g. a query parameter from the location URI). Values in the mapping are produced using calls to functions that are built into the mapping DSL. Below is the complete documentation for all values that can be produced. One example of such a function call would be calling location()
for the location value or referer()
for the referrer value of the event.
A field is a field in the Avro record that will be produced as a result of the mapping process. The type of a field is defined by the Avro schema that is used. Mapping is the process of mapping values extracted from the event onto fields in the Avro record.
A mapping is the piece that tells Divolte Collector which values need to be mapped onto which fields. The mapping DSL has a built in construct for this, explained below.
map
)¶The simplest possible mapping is mapping a simple value onto a schema field. The syntax is as follows:
map location() onto 'locationField'
Alternatively, the map
function takes a closure as first argument, which can come in handy when the value is the result of several operations or a more complex construct, such as this example where we take a query parameter from the location and parse it as an integer:
map {
def u = parse location() to uri // Parse the URI out of the location
parse u.query().value('n') to int32 // Take the n query parameter and try to parse an int out of it
} onto 'intField'
In Groovy the last statement in a closure becomes the return value for the closure. So in the closure above, the value returned by the parse
call is the result of the entire closure. This is in turn mapped onto the intField
field of the Avro record.
Apart from mapping values onto fields, it is also possible to map a literal onto a field:
map 'string literal' onto 'stringField'
map true onto 'booleanField'
This is most often used in combination with Conditional mapping (when) as in this example:
when referer().isAbsent() apply { // Only apply this mapping when a referer is absent
map true onto 'directTraffic'
}
Not all values are present in each event. For example, when using a custom cookie value there could be incoming events where the cookie is not sent by the client. In this case the cookie value is said to absent. Similarly, events from a JSON source do not have a location value; this is specific to events from a browser source.
Divolte Collector will never actively set an absent value. Instead for absent values it does nothing at all: the mapped field is not set on the Avro record. When values that are absent are used in subsequent expressions the derived values will also be absent. In the following example the intField
field will never be set because the incoming request has no referrer. This is not an error:
def u = parse referer() to uri // parse a URI out of the referer
def q = u.query() // parse the query string of the URI
def i = parse q.value('foo') to int32 // parse a int out of the query parameter 'foo'
map i onto 'intField' // map it onto the field 'intField'
Because absent values result in fields not being set your schema must have default values for all fields that are used for mappings where the value can be absent. In practice, it is recommended to always use default values for all fields in your schema.
Values in a mapping are typed and the value type must match the type of the Avro field that they are mapped onto. Divolte Collector checks for type compatibility during startup and will report an error if there is a mismatch. The type for a value can be found in the documentation below.
Below is a table of all types that can be produced in a mapping and the corresponding Avro types that match them:
Type | Avro type |
---|---|
String |
{ "name": "fieldName", "type": ["null","string"], "default": null }
|
Boolean |
{ "name": "fieldName", "type": ["null","boolean"], "default": null }
|
int |
{ "name": "fieldName", "type": ["null","int"], "default": null }
|
long |
{ "name": "fieldName", "type": ["null","long"], "default": null }
|
float |
{ "name": "fieldName", "type": ["null","float"], "default": null }
|
double |
{ "name": "fieldName", "type": ["null","double"], "default": null }
|
Map<String,List<String>> |
{
"name": "fieldName",
"type": [
"null",
{
"type": "map",
"values": {
"type": "array",
"items": "string"
}
}
],
"default": null
}
|
List<String> |
{
"name": "fieldName",
"type":
[
"null",
{
"type": "array",
"items": "int"
}
],
"default": null
}
|
ByteBuffer |
{ "name": "fieldName", "type": ["null","bytes"], "default": null }
|
JSON (JsonNode ) |
Must match the structure of the JSON fragment. See Mapping JSON (JsonNode) to Avro fields. |
Many of the simple values that can be extracted from an event are strings. Sometimes these values are not intended to be strings. Because type information about things like query parameters or path components is not present in a HTTP request, Divolte Collector can only treat these values as strings. It is, however, possible to parse a string to a primitive or other type in the mapping using this construct:
def i = parse stringValue to int32
In the example above, stringValue
is a string value and the result value, assigned to i
, will be of type int
.
Note
This is not casting, but string parsing. If the string value cannot be parsed to an integer (because it is not a number) the resulting value will be absent, but no error occurs.
A more complete example is this:
def u = parse referer() to uri // u is of type URI (which is not mappable)
def q = u.query() // q is of type map<string,list<string>>
def s = q.value('foo') // s is of type string if query parameter foo contained a integer number
def i = parse s to int32 // i is of type int
map i onto 'intField' // map it onto the field 'intField'
Because int
, long
, Boolean
, etc. are reserved words in Groovy, the mapping DSL uses aliases for parsing. The following table lists the types that can be used for parsing and the corresponding mapping types:
Parsing alias | Type |
---|---|
int32 |
int |
int64 |
long |
fp32 |
float |
fp64 |
double |
bool |
Boolean |
uri |
URI |
JsonNode
) to Avro fields¶Some expressions, for example, eventParameters()
(and its path()
method), produce a JsonNode
value that represents JSON supplied by a client. Because Avro doesn’t have a type for handling arbitrary JSON data, a compatible Avro type must be chosen to match the expected structure of the JSON from the client. The following table lists the rules for compatibility between JSON values and Avro types.
Avro type | JSON value |
---|---|
null |
JSON’s null value |
boolean |
A JSON boolean, or a string if it can be parsed as a boolean. |
int long |
A JSON number, or a string if it can be parsed as a number.
Fractional components are truncated for float and double . |
float double |
A JSON number, or a string if it can be parsed as a number. Note that full floating-point precision may not be preserved. |
bytes |
A JSON string, with BASE64 encoded binary data. |
string |
A JSON string, number or boolean value. |
enum |
A JSON string, so long as it’s identical to one of the enumeration’s
symbols. (If not, the value will be treated as null ). |
record |
A JSON object, with each property corresponding to a field in the record. (Extraneous properties are ignored.) The property values and field types must also be compatible. |
array |
A JSON array. Each element of the JSON array must be compatible with the type declared for the Avro array. |
map |
A JSON object, with each property being an entry in the map. Property names are used for keys, and the values must be compatible with the Avro type for the map values. |
union |
Only trivial unions are supported of null with another type. The
JSON value must either be null or compatible with the other union type. |
fixed |
The same as bytes , as above. Data beyond the declared length will
be truncated. |
In addition to these compatibility rules, trivial array wrapping and unwrapping will be performed if necessary:
For example, a shopping basket could be supplied as the following JSON:
{
"total_price": 184.91,
"items": [
{ "sku": "0886974140818", "count": 1, "price_per": 43.94 },
{ "sku": "0094638246817", "count": 1, "price_per": 22.99 },
{ "sku": "0093624979357", "count": 1, "price_per": 27.99 },
{ "sku": "8712837825207", "count": 1, "price_per": 89.99 }
]
}
This could be mapped using the following Avro schema:
{
"type": [
"null",
{
"name": "ShoppingBasket",
"type": "record",
"fields": [
{ "name": "total_price", "type": "float" },
{
"name": "items",
"type": {
"type": "array",
"items": {
"type": "record",
"name": "LineItem",
"fields": [
{ "name": "sku", "type": "string" },
{ "name": "count", "type": "int" },
{ "name": "price_per", "type": "double" }
]
}
}
}
]
}
],
"default": null
}
The Avro field will remain unchanged if mapping fails at runtime because the JSON value cannot be mapped onto the specified Avro type. (The complete record may subsequently be invalid if the field was mandatory.)
Note
Unlike most mappings, schema compatibility for JSON mappings cannot be checked on startup because compatibility depends on the JSON supplied with each individual event.
The digest()
method produces binary data via its result()
method. Although this can be mapped directly to Avro, often it’s convenient to map a human-readable format instead. The following conversions are supported:
Conversion | Description |
---|---|
.toHexLower() |
Hexadecimal format using lower-case letters (a –f ). |
.toHexUpper() |
Hexadecimal format using upper-case letters (A –F ). |
.toBase64() |
Base-64 format, without newlines. |
An example mapping onto an Avro field that is a string would be:
map digest('sha-256', 'pseudoSession')
.add(partyId())
.add('-')
.add(sessionId())
.result().toBase64() to 'pseudoSession'
when
)¶Not all incoming requests are the same and usually, different types of requests require different values to be extracted and different fields to be set. This can be achieved using conditional mapping. With conditional mapping any boolean value can be used to conditionally apply a part of the mapping script. This can be done using the following syntax:
when conditionBooleanValue apply {
// Conditional mapping go here
map 'value' onto 'fieldName'
}
A more concrete example of using this construct would be:
when referer().isAbsent() apply {
map true onto 'directTraffic'
}
Here we check whether the referer value is absent and if so, map a literal value onto a boolean field.
As an alternative syntax, it is possible to use a closure that produces the boolean value as well, just like in Mapping values onto fields (map). In this example we check if a query parameter called clientId
is present in the location and on that condition perform a mapping:
when {
def u = parse location() to uri
u.query().value('clientId').isPresent()
} apply {
map true onto 'signedInUser'
}
Any boolean value can be used as a condition. In order to be able to create flexible conditional mappings, the mapping DSL provides a number of methods on values that return booleans useful in conditional mappings, such as equality comparisons and boolean logic:
Condition | Description |
---|---|
value.isPresent() |
True if the value is present. See: Value presence |
value.isAbsent() |
True if the value is absent. See: Value presence |
value.equalTo(otherValue) |
True if both values are equal. Values must be of the same type. |
value.equalTo(‘literal’) |
True if the value is equal to the given literal. Non-string types are supported as well. |
booleanValue.and(otherBooleanValue) |
True if both booleans are true. |
booleanValue.or(otherBooleanValue) |
True if either or both of the boolean values are true. |
not booleanValue |
True if the boolean value is false. |
regexMatcherValue.matches() |
True if the regular expression matches the value. See: Regular expression matching. |
Sections are useful for grouping together parts of the mapping that form a logical subset of the entire mapping. In addition to grouping it is possible to conditionally stop processing a section prematurely. Sections are defined using the section
keyword followed by a closure that contains the section:
section {
// Section's mappings go here
map 'value' onto 'field'
}
exit()
¶The exit()
function will, at any point, break out of the enclosing section or, when no enclosing section can be found, break out of the entire mapping script. This can be used to conditionally break out of a section. For example to create a type of first-match-wins scenario:
section {
def u = parse location() to uri
when u.path().equalTo('/home.html') apply {
map 'homepage' onto 'pageType'
exit()
}
when u.path().equalTo('/contact.html') apply {
map 'contactpage' onto 'pageType'
exit()
}
map 'other' onto 'pageType'
}
// other mappings here
There is a optional shorthand syntax for conditionally exiting from a section which leaves out the apply
keyword and closure:
when referer().isAbsent() exit()
stop()
¶The stop()
function will, at any point, stop all further processing and break out of the entire mapping script. This is typically applied conditionally. Generally, it is safer to use sections and exit()
instead. Use with care. The stop()
function can also be used conditionally, just as anything else:
when referer().isAbsent() {
stop()
}
Or, using shorthand syntax:
when referer().isAbsent stop()
Groovy is a dynamic language for the JVM. This means, amongst other things, that you don’t have to specify the types of variables:
def i = 40
println i + 2
The above snippet will print out 42 as you would expect. Note two things: we never specified that variable i is an int and also, we are not using any parentheses in the println
function call. Groovy allows to leave out the parentheses in most function and method calls. The code above is equivalent to this snippet:
def i = 42
println(i + 2)
This in turn is equivalent to this:
def i = 42
println(i.plus(2))
This works well when chaining single argument methods. However, this can be more problematic with nested method calls. Suppose we have a function called increment(x)
which increments the x
argument by 1, so increment(10)
will return 11. The following will not compile:
println increment 10
However this will:
println(increment(10))
Yet this won’t:
println(increment 10)
In the Divolte Collector mapping DSL, it is sometimes required to chain method calls. For example when using the result of a casting operation in a mapping. We solve this by accepting a closure that produces a value as result:
map { parse cookie('customer_id') to int32 } onto 'customerId'
This way you don’t have to add parentheses to all intermediate method calls and we keep the syntax fluent. If you follow these general guidelines, you should be safe:
When calling methods that produce a value, always use parentheses. For example: location()
, referer()
, partyId()
When deriving a condition or other value from a method that produces a value, also use parentheses. For example:
when location().equalTo('http://www.example.com/') apply {
...
}
map cookie('example').isPresent() onto 'field'
map parsedUri.query().value('foo') onto 'field'
When parsing or matching on something, extract it to a variable before using it. This also improves readability:
def myUri = parse location() to uri
when myUri.query().value('foo').isPresent() apply { ... }
def myMatcher = match '^/foo/bar/([a-z]+)/' against myUri.path()
when myMatcher.matches() apply { ... }
When casting inline, use the closure syntax for mapping or conditionals:
map { parse cookie('example') to int32 } onto 'field'
Simple values are pieces of information that are directly extracted from the event without any processing. You can map simple values directly onto fields of the correct type or you can use them in further processing, such as matching against a regular expression or URI parsing.
location()
¶Usage: | map location() onto 'locationField'
|
---|---|
Sources: |
|
Description: | The location URL of the page where the event was triggered: the full address in the address bar of the user’s browser. This includes the fragment part if this is present (the part after the |
Type: |
|
referer()
¶Usage: | map referer() onto 'refererField'
|
---|---|
Sources: |
|
Description: | The referrer URL for the page-view that triggered the event. Unlike |
Type: |
|
firstInSession()
¶Usage: | map firstInSession() onto 'first'
|
---|---|
Sources: |
|
Description: | A boolean flag that is true if a new session ID was generated for this event and false otherwise. If true a new session has started. |
Type: |
|
corrupt()
¶Usage: | map corrupt() onto 'detectedCorruption'
|
---|---|
Sources: |
|
Description: | A boolean flag that is true if the source for the event detected corruption of the event data. Event corruption usually occurs when intermediate parties try to re-write HTTP requests or truncate long URLs. Real-world proxies and anti-virus software has been observed doing this. Note that although this field is available on events from all sources, only browser sources currently detect corruption and set this value accordingly. |
Type: |
|
duplicate()
¶Usage: | map duplicate() onto 'detectedDuplicate'
|
---|---|
Sources: |
|
Description: | A boolean flag that true when the event is believed to be a duplicate of an earlier one. Duplicate detection in Divolte Collector utilizes a probabilistic data structure that has a low false positive and false negative rate. Nonetheless classification mistakes can still occur. Duplicate events often arrive due to certain types of anti-virus software and certain proxies. Additionally, browsers sometimes go haywire and send the same request large numbers of times (in the tens of thousands). Duplicate detection can be used to mitigate the effects when this occurs. This is particularly handy in real-time processing where it is not practical to perform de-duplication of the data based on a full data scan. |
Type: |
|
timestamp()
¶Usage: | map timestamp() onto 'timeField'
|
---|---|
Sources: |
|
Description: | The timestamp of the time the the request was received by the server, in milliseconds since the UNIX epoch. |
Type: |
|
clientTimestamp()
¶Usage: | map clientTimestamp() onto 'timeField'
|
---|---|
Sources: |
|
Description: | The timestamp that was recorded on the client side immediately prior to sending the request, in milliseconds since the UNIX epoch. |
Type: |
|
remoteHost()
¶Usage: | map remoteHost() onto 'ipAddressField'
|
---|---|
Sources: |
|
Description: | The remote IP address of the request. Depending on configuration, Divolte Collector will use any X-Forwarded-For headers set by intermediate proxies or load balancers. |
Type: |
|
viewportPixelWidth()
¶Usage: | map viewportPixelWidth() onto 'widthField'
|
---|---|
Sources: |
|
Description: | The width of the client’s browser viewport in pixels. |
Type: |
|
viewportPixelHeight()
¶Usage: | map viewportPixelHeight() onto 'widthField'
|
---|---|
Sources: |
|
Description: | The height of the client’s browser viewport in pixels. |
Type: |
|
screenPixelWidth()
¶Usage: | map screenPixelWidth() onto 'widthField'
|
---|---|
Sources: |
|
Description: | The width of the client’s screen in pixels. |
Type: |
|
screenPixelHeight()
¶Usage: | map screenPixelHeight() onto 'widthField'
|
---|---|
Sources: |
|
Description: | The height of the client’s screen in pixels. |
Type: |
|
devicePixelRatio()
¶Usage: | map devicePixelRatio() onto 'ratioField'
|
---|---|
Sources: |
|
Description: | The ratio of physical pixels to logical pixels on the client’s device. Some devices use a scaled resolution, meaning that the resolution and the actual available pixels are different. This is common on retina-type displays, with very high pixel density. |
Type: |
|
partyId()
¶Usage: | map partyId() onto 'partyField'
|
---|---|
Sources: |
|
Description: | A long-lived unique identifier stored by a client that is associated with each event they send. All events from the same client should have the same party identifier. For browser sources this value is stored in a cookie. |
Type: |
|
sessionId()
¶Usage: | map sessionId() onto 'sessionField'
|
---|---|
Sources: |
|
Description: | A short-lived unique identifier stored by a client that is associated with each event from that source within a session of activity. All events from the same client within a session should have the same session identifier. For browser sources a session to expire when 30 minutes has elapsed without any events occurring. |
Type: |
|
pageViewId()
¶Usage: | map pageViewId() onto 'pageviewField'
|
---|---|
Sources: |
|
Description: | A unique identifier that is generated for each page-view. All events from a client within the same page-view will have the same page-view identifier. For browser sources a page-view starts when the user visits a page, and ends when the user navigates to a new page. Note that navigating within single-page web applications or links to anchors within the same page do not normally trigger a new page-view. |
Type: |
|
eventId()
¶Usage: | map eventId() onto 'eventField'
|
---|---|
Sources: |
|
Description: | A unique identifier that is associated with each event received from a source. (This identifier is assigned by the client, not by the server.) |
Type: |
|
userAgentString()
¶Usage: | map userAgentString() onto 'uaField'
|
---|---|
Sources: |
|
Description: | The full user agent identification string reported by the client HTTP headers when sending an event. See User agent parsing on how to extract more meaningful information from this string. |
Type: |
|
cookie(name)
¶Usage: | map cookie('cookie_name') onto 'customCookieField'
|
---|---|
Sources: |
|
Description: | The value of a cookie included in the client HTTP headers when sending an event. |
Type: |
|
eventType()
¶Usage: | map eventType() onto 'eventTypeField'
|
---|---|
Sources: |
|
Description: | The type of event being processed. The tracking tag used by sites integrating with browser sources automatically issue a |
Type: |
|
Complex values often return objects that allow for two different behaviours:
This differs from simple values which can only be mapped; no further derived values can be obtained.
eventParameters()
¶Usage: | When submitting custom events from a client:
In the mapping: map eventParameters() onto 'parametersField'
|
||||
---|---|---|---|---|---|
Sources: |
|
||||
Description: | A JSON object or array ( See Mapping JSON (JsonNode) to Avro fields for an example on how to map this to a field. |
||||
Type: |
|
eventParameters().value(name)
¶Usage: | When submitting custom events from a client:
In the mapping: map eventParameters().value('foo') onto 'fooField'
// Or with a cast:
map { parse eventParameters().value('bar') to int32 } onto 'barField'
|
||||
---|---|---|---|---|---|
Description: | The value for an event parameter that was sent as part of a custom event. Note that this is always a string, regardless of the type used on the client side. If you are certain a parameter has a specific format you can explicitly cast it as in the example above. |
||||
Type: |
|
eventParameters().path(expression)
¶Usage: | When submitting custom events from a client:
In the Avro schema: {
"name": "searchResults",
"type": [ "null", { "type": "array", "items": "string" } ],
"default": null
}
In the mapping: map eventParameters().path('$[*].sku') onto 'searchResults'
|
||||
---|---|---|---|---|---|
Description: | This can be used to extract parts of parameters supplied with the event using a JSON-path expression. (See http://goessner.net/articles/JsonPath/ for a description of JSON-path expressions.) If the expression does not match anything, the value is not considered to be present. (A See Mapping JSON (JsonNode) to Avro fields for an example on how to map JSON values to a field. Expressions can return more than one result; these are presented as a JSON array for subsequent mapping. |
||||
Type: |
|
uri
¶Usage: | def locationUri = parse location() to uri
|
---|---|
Description: | Attempts to parse a string as a URI. The most obvious candidates to use for this are the |
Type: |
|
URI.path()
¶Usage: | def locationUri = parse location() to uri
map locationUri.path() onto 'locationPathField'
|
---|---|
Description: | The path component of a URI. Any URL encoded values in the path will be decoded. Keep in mind that if the path contains a encoded |
Type: |
|
URI.rawPath()
¶Usage: | def locationUri = parse location() to uri
map locationUri.rawPath() onto 'locationPathField'
|
---|---|
Description: | The path component of a URI. This value is not decoded in any way. |
Type: |
|
URI.scheme()
¶Usage: | def locationUri = parse location() to uri
map locationUri.scheme() onto 'locationSchemeField'
// or check for HTTPS and map onto a boolean field
map locationUri.scheme().equalTo('https') onto 'isSecure'
|
---|---|
Description: | The scheme component of a URI. This is the protocol part, such as |
Type: |
|
URI.host()
¶Usage: | def locationUri = parse location() to uri
map locationUri.host() onto 'locationHostField'
|
---|---|
Description: | The host component of a URI. For |
Type: |
|
URI.port()
¶Usage: | def locationUri = parse location() to uri
map locationUri.port() onto 'locationPortField'
|
---|---|
Description: | The port component of a URI. For |
Type: |
|
URI.decodedQueryString()
¶Usage: | def locationUri = parse location() to uri
map locationUri.decodedQueryString() onto 'locationQS'
|
---|---|
Description: | The full, URL decoded query string of a URI. For |
Type: |
|
URI.rawQueryString()
¶Usage: | def locationUri = parse location() to uri
map locationUri.rawQueryString() onto 'locationQS'
|
---|---|
Description: | The full, query string of a URI without any decoding. For |
Type: |
|
URI.decodedFragment()
¶Usage: | def locationUri = parse location() to uri
map locationUri.decodedFragment() onto 'locationFragment'
|
---|---|
Description: | The full, URL decoded fragment of a URI. For |
Type: |
|
URI.rawFragment()
¶Usage: | def locationUri = parse location() to uri
map locationUri.rawFragment() onto 'locationFragment'
|
---|---|
Description: | The full, fragment of a URI without any decoding. For // If location() = 'http://www.example.com/foo/#/local/path/?q=hello+world'
// this would map '/local/path/' onto the field clientSidePath
def locationUri = parse location() to uri
def localUri = parse location().rawFragment() to uri
map localUri.path() onto 'clientSidePath'
|
Type: |
|
URI.query()
¶Usage: | def locationUri = parse location() to uri
def locationQuery = locationUri.query()
map locationQuery onto 'locationQueryParameters'
|
---|---|
Description: | The query string from a URI parsed into a map of value lists. In the resulting map, the keys are the parameter names of the query string and the values are lists of strings. Lists are required because a query parameter can have multiple values (by being present more than once). In order to map all the query parameters directly onto a Avro field, the field must be typed as a map of string lists, possibly a union with null, to have a sensible default when no query string is possible. In a Avro schema definition, the following field definition can be a target field for the query parameters: {
"name": "uriQuery",
"type": [
"null",
{
"type": "map",
"values": {
"type": "array",
"items": "string"
}
}
],
"default": null
}
|
Type: |
|
URI.query().value(name)
¶Usage: | def locationUri = parse location() to uri
def locationQuery = locationUri.query()
map locationQuery.value('foo') onto 'fooQueryParameter'
|
---|---|
Description: | The first value found for a query parameter. This value is URL decoded. |
Type: |
|
URI.query().valueList(name)
¶Usage: | def locationUri = parse location() to uri
def locationQuery = locationUri.query()
map locationQuery.valueList('foo') onto 'fooQueryParameterValues'
|
---|---|
Description: | A list of all values found for a query parameter name. These values are URL decoded. |
Type: |
|
match(regex).against(stringValue)
¶Usage: | def matcher = match '/foo/bar/([a-z]+).html$' against location()
|
---|---|
Description: | Matches a regular expression against a string value; the entire value must match. The result of this can not be directly mapped onto a Avro field, but can be used to extract capture groups or conditionally perform a mapping if the pattern is a match. Often it is required to perform non-trivial partial extractions against strings that are taken from the requests. One example would be matching the path of the location with a wild card. It is not recommended to match patterns against the def locationUri = parse location() to uri
def pathMatcher = match '^/foo/bar/([a-z]+).html$' against locationUri.path()
when pathMatcher.matches() apply {
map 'fooBarPage' onto 'pageTypeField'
map pathMatcher.group(1) onto 'pageNameField'
}
|
Type: |
|
Matcher.matches()
¶Usage: | def matcher = match '^/foo/bar/([a-z]+).html$' against location()
// use in conditional mapping
when matcher.matches() apply {
map 'fooBarPage' onto 'pageTypeField'
}
// or map directly onto a boolean field
map matcher.matches() onto 'isFooBarPage'
|
---|---|
Description: | True when the value is present and matches the regular expression or false otherwise. |
Type: |
|
Matcher.group(positionOrName)
¶Usage: | // Using group number
def matcher = match '/foo/bar/([a-z]+).html$' against location()
map matcher.group(1) onto 'pageName'
// Using named capture groups
def matcher = match '/foo/bar/(?<pageName>[a-z]+).html$' against location()
map matcher.group('pageName') onto 'pageName'
|
---|---|
Description: | The value from a capture group in a regular expression pattern if the pattern matches, absent otherwise. Groups can be identified by their group number, starting from 1 as the first group or using named capture groups. |
Type: |
|
header(name)
¶Usage: | map header('header-name') onto 'fieldName'
|
---|---|
Sources: |
|
Description: | The list of all values associated with the given HTTP header from the incoming request. A HTTP header can be present in a request multiple times, and each header can contain a comma-separated list of values. The list of values is normalized by splitting the comma-separated lists for each header and concatenating them together. Empty values are omitted. The Avro type of the target field for this mapping must be a list of string: {
"name": "headers",
"type":
[
"null",
{
"type": "array",
"items": ["string"]
}
],
"default": null
}
Note that the array field in Avro itself is nullable and has a default value of null, whereas the items in the array are not nullable. The latter is not required, because when the header is present the elements in the list are guaranteed to be non-null. |
Type: |
|
header(name).first()
¶Usage: | map header('header-name').first() onto 'fieldName'
|
---|---|
Description: | The first of the normalized values associated with the given HTTP header from the incoming request. |
Type: |
|
header(name).last()
¶Usage: | map header('header-name').last() onto 'fieldName'
|
---|---|
Description: | The last of the normalized values associated with the given HTTP header from the incoming request. |
Type: |
|
header(name).get(index)
¶Usage: | map header('header-name').get(2) onto 'fieldName'
|
---|---|
Description: | Retrieve the value at the specified position in the list of normalized values associated with the given HTTP header from the incoming request. The first element in the list has an index of 0, the second is 1 and so on. If the position requested is negative, the position is relative to the end of the list. The last element in the list has an index of -1, the second last is -2 and so on. No value is mapped if the specified position exceeds the bounds of the list of values. |
Type: |
|
header(name).commaSeparated()
¶Usage: | map header('header-name').commaSeparated() onto 'fieldName'
|
---|---|
Description: | The comma separated string of all values associated with the given HTTP header from the incoming request. If the HTTP header is present in the requested multiple times, the header values are joined using a comma as separator. |
Type: |
|
digest(algorithm[, seed])
¶Usage: | // A simple digester.
def digester = digest("sha-256")
// A seeded digester.
def digester = digest("sha-256","a seed value")
|
---|---|
Sources: |
|
Description: | Initializes a digester that can cryptographically hash data using the supplied algorithm to produce a _digest_. Valid algorithms include Warning It is best practice to:
The seed does not have to be kept secret. |
Type: |
|
digester(…).add(value)
¶Usage: | map digest('sha-256', 'pseudoSession')
.add(partyId())
.add('-')
.add(sessionId())
.result() onto 'fieldName'
|
---|---|
Description: | Appends some data for the digester to process, returning the digester so that chained additions are supported. The data to process be:
Internally the strings are converted for processing into bytes using the UTF-8 encoding. |
Type: |
|
digester(…).result()
¶Usage: | map digest('sha-256', 'aSeed')
.add('...')
.result() onto 'fieldName'
|
---|---|
Description: | Finalise the data to be digested so that it can be mapped to a field. The digest itself is binary data which can be mapped directly to an Avro field or formatted as a string first. (See Formatting binary data for more information.) |
Type: |
|
userAgent()
¶Usage: | def ua = userAgent()
|
---|---|
Sources: |
|
Description: | Attempts to parse a the result of userAgentString string into a user agent object. Note that this result is not directly mappable onto any Avro field. Instead, the subfields from this object, described below, can be mapped onto fields. When the parsing of the user agent string fails, either because the user agent is unknown or malformed, or because the user agent was not sent by the browser, this value and all subfield values are absent. |
Type: |
|
userAgent().name()
¶Usage: | map userAgent().name() onto 'uaNameField'
|
---|---|
Description: | The canonical name for the parsed user agent. E.g. ‘Chrome’ for Google Chrome browsers. |
Type: |
|
userAgent().family()
¶Usage: | map userAgent().family() onto 'uaFamilyField'
|
---|---|
Description: | The canonical name for the family of the parsed user agent. E.g. |
Type: |
|
userAgent().vendor()
¶Usage: | map userAgent().vendor() onto 'uaVendorField'
|
---|---|
Description: | The name of the company or organisation that produces the user agent software. E.g. |
Type: |
|
userAgent().type()
¶Usage: | map userAgent().type() onto 'uaTypeField'
|
---|---|
Description: | The type of user agent that was used. E.g. |
Type: |
|
userAgent().version()
¶Usage: | map userAgent().version() onto 'uaVersionField'
|
---|---|
Description: | The version string of the user agent software. E.g. |
Type: |
|
userAgent().deviceCategory()
¶Usage: | map userAgent().deviceCategory() onto 'uaDeviceCategoryField'
|
---|---|
Description: | The type of device that the user agent runs on. E.g. |
Type: |
|
userAgent().osFamily()
¶Usage: | map userAgent().osFamily() onto 'uaOSFamilyField'
|
---|---|
Description: | The operating system family that the user agent runs on. E.g. |
Type: |
|
Derived simple value:
userAgent().osVersion()
¶Usage: | map userAgent().osVersion() onto 'uaOSVersionField'
|
---|---|
Description: | The version string of the operating system that the user agent runs on. E.g. |
Type: |
|
userAgent().osVendor()
¶Usage: | map userAgent().osVendor() onto 'uaOSVendorField'
|
---|---|
Description: | The name of the company or organisation that produces the operating system that the user agent software runs on. E.g. |
Type: |
|
ip2geo({optionalIP})
¶Usage: | // uses the remoteHost as IP address to lookup
def ua = ip2geo()
// If a load balancer sets custom headers for IP addresses, use like this
def ip = header('X-Forwarded-For').first()
def myUa = ip2geo(ip)
Also other functions are available depending on the array of IP’s: // Returns the last item of the list
def ip = header('X-Forwarded-For').last()
// Returns the second last item
def ip = header('X-Forwarded-For').get(-2)
|
---|---|
Sources: |
|
Description: | Attempts to turn a IPv4 address into a geo location by performing a lookup into a configured MaxMind GeoIP City database. This database is not distributed with Divolte Collector, but must be provided separately. See the Configuration chapter for more details on this. Note that this result is not directly mappable onto any Avro field. Instead the subfields from this object, described below, can be mapped onto fields. When the lookup for a IP address fails or when the argument is not a IPv4 address, this value and all subfield values are absent. |
Type: |
|
ip2geo().cityId()
¶Usage: | map ip2geo().cityId() onto 'cityIdField'
|
---|---|
Description: | The GeoNames City ID for the geolocation. |
Type: |
|
ip2geo().cityName()
¶Usage: | map ip2geo().cityName() onto 'cityNameField'
|
---|---|
Description: | The city name for the geolocation in English. |
Type: |
|
ip2geo().continentCode()
¶Usage: | map ip2geo().continentCode() onto 'continentCodeField'
|
---|---|
Description: | The ISO continent code for the geolocation. |
Type: |
|
ip2geo().continentId()
¶Usage: | map ip2geo().continentId() onto 'continentIdField'
|
---|---|
Description: | The GeoNames Continent Id for the geolocation. |
Type: |
|
ip2geo().continentName()
¶Usage: | map ip2geo().continentName() onto 'continentNameField'
|
---|---|
Description: | The continent name for the geolocation in English. |
Type: |
|
ip2geo().countryCode()
¶Usage: | map ip2geo().countryCode() onto 'countryCodeField'
|
---|---|
Description: | The ISO country code for the geolocation. |
Type: |
|
ip2geo().countryId()
¶Usage: | map ip2geo().countryId() onto 'countryIdField'
|
---|---|
Description: | The GeoNames Country Id for the geolocation. |
Type: |
|
ip2geo().countryName()
¶Usage: | map ip2geo().countryName() onto 'countryNameField'
|
---|---|
Description: | The country name for the geolocation in English. |
Type: |
|
ip2geo().latitude()
¶Usage: | map ip2geo().latitude() onto 'latitudeField'
|
---|---|
Description: | The latitude for the geolocation. |
Type: |
|
ip2geo().longitude()
¶Usage: | map ip2geo().longitude() onto 'longitudeField'
|
---|---|
Description: | The longitude for the geolocation. |
Type: |
|
ip2geo().metroCode()
¶Usage: | map ip2geo().metroCode() onto 'metroCodeField'
|
---|---|
Description: | The Metro Code for the geolocation. |
Type: |
|
ip2geo().timeZone()
¶Usage: | map ip2geo().timeZone() onto 'timeZoneField'
|
---|---|
Description: | The name of the time zone for the geolocation as found in the IANA Time Zone Database. |
Type: |
|
ip2geo().mostSpecificSubdivisionCode()
¶Usage: | map ip2geo().mostSpecificSubdivisionCode() onto 'mostSpecificSubdivisionCodeField'
|
---|---|
Description: | The ISO code for the most specific subdivision known for the geolocation. |
Type: |
|
ip2geo().mostSpecificSubdivisionId()
¶Usage: | map ip2geo().mostSpecificSubdivisionId() onto 'mostSpecificSubdivisionIdField'
|
---|---|
Description: | The GeoNames ID for the most specific subdivision known for the geolocation. |
Type: |
|
ip2geo().mostSpecificSubdivisionName()
¶Usage: | map ip2geo().mostSpecificSubdivisionName() onto 'mostSpecificSubdivisionNameField'
|
---|---|
Description: | The name for the most specific subdivision known for the geolocation in English. |
Type: |
|
ip2geo().postalCode()
¶Usage: | map ip2geo().postalCode() onto 'postalCodeField'
|
---|---|
Description: | The postal code for the geolocation. |
Type: |
|
ip2geo().subdivisionCodes()
¶Usage: | map ip2geo().subdivisionCodes() onto 'subdivisionCodesField'
|
---|---|
Description: | The ISO codes for all subdivisions for the geolocation in order from least to most specific. |
Type: |
|
ip2geo().subdivisionIds()
¶Usage: | map ip2geo().subdivisionIds() onto 'subdivisionIdsFields'
|
---|---|
Description: | The GeoNames IDs for all subdivisions for the geolocation in order from least to most specific. |
Type: |
|
ip2geo().subdivisionNames()
¶Usage: | map ip2geo().subdivisionNames() onto 'subdivisionNames'
|
---|---|
Description: | The names in English for all subdivisions for the geolocation in order from least to most specific. |
Type: |
|