---
title: A File Format for the Discoverable Use of Analytics
abbrev: analytics.txt
docname: draft-offen-analyticstxt-latest
category: info
ipr: trust200902
area: General
workgroup: Network Working Group
keyword: Internet-Draft
stand_alone: yes
smart_quotes: no
pi: [toc, sortrefs, symrefs]
author:
  -
    ins: F. Ring
    name: Frederik Ring
    organization: Offen
    email: frederik.ring@gmail.com
  -
    ins: H. Niefeld
    name: Hendrik Niefeld
    organization: Offen
    email: hello@niefeld.com
normative:
informative:
  DNT:
    title: Tracking Preference Expression (DNT)
    target: https://www.w3.org/TR/tracking-dnt/
    author:
      -
        name: Roy T. Fielding
      -
        name: David Singer
  GPC:
    title: Global Privacy Control (GPC)
    target: https://globalprivacycontrol.github.io/gpc-spec/
    author:
      -
        name: Robin Berjon
      -
        name: Sebastian Zimmeck
      -
        name: Ashkan Soltani
      -
        name: David Harbage
      -
        name: Peter Snyder
--- abstract
Internet privacy has become an important feature for users of websites and services.
This document proposes a way for websites and services to declare and disclose their usage of analytics and tracking software to users, and make it discoverable for tooling.
analytics.txt aims to be a comprehensive standard that describes the characteristics of analytics and tracking software in an unbiased way that is understandable to a non-technical audience and also useful for consumption by tools and software.
--- middle
# Introduction
## Motivation
User tracking and the use of analytics software on websites have become routine, visibly and invisibly affecting the way the user-facing internet works and behaves.
Yet there is no standardized, well-defined way of accessing information about what software is being used and what data it is collecting.
Legislation can only ever cover a subset of the range of existing technological implementations, creating incentives for software to find workarounds, thus allowing them to hide their presence from users.
Automated audits are limited to aspects that are possible to detect in clients, but cannot disclose other important implementation details.
## Scope of this proposal
This document defines a way to specify the privacy-related characteristics of analytics and tracking software.
We aim for this information to be consumable by both humans and software.
For example, search engines or browser extensions could make use of this data and display information to users.
The file "analytics.txt" is not intended to replace the requirement for complying with existing regulations, but is meant to give insights beyond the scope of these regulations.
## Definition of the term "analytics" in the scope of this document
Analytics as referred to in this document involves the collection of usage statistics in order to generate reports that can help the providers of websites and services to better understand and optimize their services towards real world user behavior.
This can also include measuring different content against different groups of users.
Analytics or user tracking as referred to in this document does not refer to the identification of users in order to deliver customized advertising or content across websites of any kind.
# Conventions and Definitions
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
"SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in BCP 14 {{!RFC2119}} {{!RFC8174}}
when, and only when, they appear in all capitals, as shown here.
The term "implementors" refers to the providers of services and websites that wish to use an analytics.txt file.
# Specification
This document defines a text file format that can be used by implementors to signal information about their usage of analytics software to both users and software.
By convention, this file is called analytics.txt.
Its location and scope are described in {{location}}.
This text file contains multiple fields with different values.
A field contains a "name" which is the first part of a field all the way up to the colon (for example: "Author:") and follows the syntax
defined for "field-name" in section 3.6.8 of {{?RFC5322}}.
Field names are case-insensitive (as per section 2.3 of {{?RFC5234}}).
The "value" comes after the field name and follows the syntax defined for "unstructured" in section 3.2.5 of {{?RFC5322}}.
The file MAY also contain blank lines.
A field MUST always consist of a name and a value (for example: "Author: Jane Doe <jane.doe@example.com>").
Each field MUST appear on its own line.
Unless specified otherwise by the field definition, multiple values MUST be chained together for a single field (for example: "Compliance: gdpr, ccpa") using the "," character (%x2c).
Unless otherwise indicated in the definition of a particular field, a field MUST NOT appear multiple times.
Implementors SHOULD aim for creating an analytics.txt file that is easy to understand by non-technical audiences.
## Comments
Any line beginning with the "#" (%x23) symbol MUST be interpreted as a comment.
The content of the comment may contain any ASCII or Unicode characters in the %x21-7E and %x80-FFFFF ranges plus the tab (%x09) and space (%x20) characters.
Example:
~~~~~~~~~~
# A comment
~~~~~~~~~~
Implementors SHOULD make deliberate use of comments to make an analytics.txt file more accessible for non-technical audiences.
## Line Separators
Every line MUST end either with carriage return and line feed characters (CRLF / %x0D %x0A) or with just a line feed character (LF / %x0A).
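
The syntax rules above (fields, comments, blank lines and line separators) can be sketched as a small parser. This is an illustrative sketch only, not a reference implementation: the function name `parse_analytics_txt`, the returned dictionary shape and the simplified field-name pattern are assumptions made for this example. It also splits every value on commas, which would mis-handle an Author name that itself contains a comma.

```python
import re

# Simplified take on the RFC 5322 "field-name" rule: printable ASCII except ":".
FIELD_RE = re.compile(r"^([\x21-\x39\x3b-\x7e]+):[ \t]*(.*)$")


def parse_analytics_txt(text):
    """Parse analytics.txt content into {lower-cased field name: [values]}."""
    fields = {}
    # Lines end with either CRLF or LF.
    for number, line in enumerate(re.split(r"\r?\n", text), start=1):
        if not line.strip() or line.startswith("#"):
            continue  # blank lines and comments carry no field data
        match = FIELD_RE.match(line)
        if match is None:
            raise ValueError(f"line {number}: expected 'Name: value'")
        name = match.group(1).lower()  # field names are case-insensitive
        if name in fields:
            raise ValueError(f"line {number}: field {name!r} appears twice")
        # Multiple values are chained with the "," character.
        values = [value.strip() for value in match.group(2).split(",")]
        if not all(values):
            raise ValueError(f"line {number}: empty value")
        fields[name] = values
    return fields
```

Calling `parse_analytics_txt` on a file containing `Collects: url, device-type, referrer` yields a `collects` entry holding the three values as a list.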
## Extensibility
Like many other formats and protocols, this format may need to be extended over time to fit the ever-changing landscape of the Internet.
Special attention is required when defining the allowed values of enumerations to ensure that they are extensible and do not become obsolete too quickly.
## Field Definitions
Field names are case-insensitive, yet implementors SHOULD use the capitalized style used in this document for consistency.
Field values are case-insensitive.
Unless otherwise specified, implementors MUST use only the allowed values given by this specification.
### Author {#author-field}
This REQUIRED field holds an OPTIONAL author name and a REQUIRED email address providing information about a person or entity responsible for maintaining the contents of the file.
The field MUST contain a valid email address, which is to be used for inquiries about the correctness of, and additions to, the data provided in the file.
#### Example
~~~~~~~~~~
Author: Jane Doe <jane.doe@example.com>
~~~~~~~~~~
### Collects {#collects-field}
This REQUIRED multi-value field indicates which potentially privacy-relevant, user-specific data is being collected or used in session identification or other procedures.
These values MUST also be specified if a property is not persisted as-is, but stored or processed in a hashed and/or combined form.
#### Allowed values
##### none
No analytics data is collected at all. This value MUST NOT be used in conjunction with other values.
##### ip-address
The request IP address is being used.
##### geographic-location
Geographic location of users is determined and used.
This could for example be derived from the request IP, or from using browser APIs.
##### ua-string
Information about the utilized User Agent is being collected.
##### fingerprint
Browser Fingerprinting is used.
Such mechanisms usually try to compute a unique identifier from properties of the host Operating System, allowing them to re-identify users without having to persist an identifier.
##### device-type
The user's device type (e.g. mobile / tablet / desktop) is being determined and collected.
##### url
The URL of a visit, including its path, is collected and used.
This MUST also be specified in case URLs are stripped of certain parameters or pseudonymized before being stored.
##### referrer
The Referrer of a visit is collected and used. This MUST also be specified if the referrer value is stripped of potential path fragments.
##### visit-duration
The duration of a visit, either at page or at session level, is measured and used.
##### custom-events
Custom events like conversion goals are defined and used.
This MAY be left out in case the analytics software in use offers such functionality, but implementors chose not to use the feature.
##### session-recording
Detailed behavior like mouse movement and scrolling is recorded and can possibly be played back when analyzing the analytics data.
#### Example
~~~~~~~~~~
Collects: url, device-type, referrer
~~~~~~~~~~
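
As a rough illustration of how tooling might check a Collects field against the allowed values listed above, the following sketch reports unknown values and the misuse of `none`. The names `COLLECTS_VALUES` and `check_collects` are hypothetical and not defined by this document.

```python
# Allowed values for the Collects field, as listed in this section.
COLLECTS_VALUES = {
    "none", "ip-address", "geographic-location", "ua-string", "fingerprint",
    "device-type", "url", "referrer", "visit-duration", "custom-events",
    "session-recording",
}


def check_collects(values):
    """Return a list of problems found in the values of a Collects field."""
    # Field values are case-insensitive, so compare in lower case.
    normalized = [value.strip().lower() for value in values]
    problems = [f"unknown value: {value}" for value in normalized
                if value not in COLLECTS_VALUES]
    if not normalized:
        problems.append("at least one value is required")
    if "none" in normalized and len(normalized) > 1:
        problems.append("'none' must not be combined with other values")
    return problems
```

An empty result means no problem was found. The same pattern could be applied to the other enumerated fields by swapping in their allowed values.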
### Stores
This field is REQUIRED unless the only value of the Collects field as per {{collects-field}} is none.
The multi-value field indicates whether data is persisted on the client during the collection of analytics data and declares the browser features used for doing so.
In case no data is being persisted at all, the value none MUST be used.
#### Allowed values
##### none
No data is persisted on the client during the collection of usage data. This value MUST NOT be used in conjunction with other values.
##### first-party-cookies
First party cookies are in use.
There is no differentiation between session or persistent cookies, just like HTTP and JavaScript cookies are considered equal.
##### third-party-cookies
Third party cookies are in use.
There is no differentiation between session or persistent cookies, just like HTTP and JavaScript cookies are considered equal.
##### local-storage
Data is persisted on the client using non-cookie JavaScript APIs like `localStorage`, `sessionStorage`, `WebSQL` or `IndexedDB`.
##### cache
The analytics software leverages browser caches to store identifiers.
For example, ETag headers can be used to identify users based on their browser caches' contents.
This value is not required in case the analytics software sends static resources with cache headers, but does not make use of the request headers on subsequent requests.
#### Example
~~~~~~~~~~
Stores: first-party-cookies, local-storage
~~~~~~~~~~
### Uses
This field is REQUIRED unless the only value of the Collects field {{collects-field}} is none.
The multi-value field indicates the technical implementation details for how analytics data is being collected.
#### Allowed values
##### javascript
A client-side script is used to collect data.
##### pixel
A static resource - typically a pixel - transferred via HTTP is being used to collect data through the request parameters.
##### server-side
Collection of usage data is happening on the server side at the application layer.
##### logs
Usage data is being calculated from server log files.
##### other
Other techniques that are not described in this section are in use.
#### Example
~~~~~~~~~~
Uses: javascript
~~~~~~~~~~
### Allows
This field is REQUIRED unless the only value of the Collects field {{collects-field}} is none.
The multi-value field discloses information about whether user consent is being acquired before collecting analytics data, and if it is possible for users to opt out of the collection of usage data.
#### Allowed values
##### opt-in
No usage data is collected before users have given their consent.
##### opt-out
Users can opt out of collection of usage data using a dedicated feature tailored towards the user audience.
This value is only applicable in case no data at all is collected after having opted out.
##### none
The software does not define a way for users to opt in or opt out of the collection of usage data.
#### Example
~~~~~~~~~~
Allows: opt-out
~~~~~~~~~~
### Retains
This field is REQUIRED unless the only value of the Collects field {{collects-field}} is none.
The single-value field indicates the duration for which the analytics data is being stored before being deleted.
The value is either a duration as defined in {{!RFC3339}} or the token "perpetual" in case data is retained without expiring it at some point.
Implementors SHOULD add a comment providing a human readable value to this field.
#### Example
~~~~~~~~~~
# Data is retained for twelve months
Retains: P12M
~~~~~~~~~~
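
Tools that display a Retains value to users may want to turn it into readable text. The sketch below shows one possible approach. The pattern is a lenient approximation of the duration grammar in Appendix A of RFC 3339 (it accepts some combinations the grammar does not allow), and the names `DURATION_RE` and `describe_retention` are hypothetical.

```python
import re

# Lenient approximation of the RFC 3339 Appendix A duration grammar.
DURATION_RE = re.compile(
    r"^P(?:(?P<weeks>\d+)W|"
    r"(?:(?P<years>\d+)Y)?(?:(?P<months>\d+)M)?(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?)$"
)
UNITS = ("years", "months", "weeks", "days", "hours", "minutes", "seconds")


def describe_retention(value):
    """Turn a Retains value such as "P12M" into a human-readable phrase."""
    value = value.strip()
    if value.lower() == "perpetual":
        return "retained indefinitely"
    match = DURATION_RE.match(value.upper())  # field values are case-insensitive
    parts = []
    if match is not None:
        for unit in UNITS:
            amount = match.group(unit)
            if amount is not None:
                count = int(amount)
                # Drop the plural "s" for a count of exactly one.
                parts.append(f"{count} {unit if count != 1 else unit[:-1]}")
    if not parts:
        raise ValueError(f"not a duration or 'perpetual': {value!r}")
    return "retained for " + ", ".join(parts)
```

For the example above, `describe_retention("P12M")` produces `retained for 12 months`.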
### Honors
This OPTIONAL, RECOMMENDED multi-value field indicates which browser level privacy controls are being honored when collecting data.
#### Allowed values
##### none
Data is collected no matter the browser settings in use. This value MUST NOT be used in conjunction with other values.
##### do-not-track
User agents that have Do Not Track {{DNT}} enabled will be excluded from the collection of analytics data.
##### global-privacy-control
User agents that have Global Privacy Control {{GPC}} enabled will be excluded from the collection of analytics data.
#### Example
~~~~~~~~~~
Honors: do-not-track, global-privacy-control
~~~~~~~~~~
### Tracks
This OPTIONAL, RECOMMENDED multi-value field indicates the coverage in session and user lifecycle tracking.
#### Allowed values
##### anonymous
Each event that is collected is anonymous. There is no way to connect and group multiple pageviews by user or similar. This value MUST NOT be used in conjunction with other values.
##### sessions
Metrics that source from a single browser session can be grouped and distinguished as such.
##### users
Users can be identified across multiple browser sessions.
#### Example
~~~~~~~~~~
Tracks: sessions, users
~~~~~~~~~~
### Varies
This OPTIONAL, RECOMMENDED single-value field indicates the usage of content experiments like A/B testing.
It MUST contain a single value only.
#### Allowed values
##### none
All users are served the same content without any changes.
##### random
Content experiments are performed by grouping users randomly into buckets and serving them different content.
##### behavioral
Content experiments are performed by grouping users into buckets based on their behavior and serving them different content.
#### Example
~~~~~~~~~~
Varies: random
~~~~~~~~~~
### Shares
This OPTIONAL, RECOMMENDED multi-value field indicates whether data is shared with select users, the general public or third parties.
#### Allowed values
##### none
The data collected is not shared with any party unless affiliated with the implementor.
##### per-user
Users can access the usage data that is associated with them in a non-aggregated way, isolating all data that is specific to their current means of re-identification.
##### general-public
Usage statistics for the site or service are available to the general public.
##### third-party
Data is being shared non-publicly with third parties. This MUST also be specified when datasets are aggregated or pseudonymized beforehand.
#### Example
~~~~~~~~~~
Shares: general-public
~~~~~~~~~~
### Implements
This OPTIONAL field indicates conformance with certain regulations and legislation.
Example values are:
- gdpr
- hipaa
- ccpa
#### Example
~~~~~~~~~~
Implements: gdpr, ccpa
~~~~~~~~~~
### Deploys
This OPTIONAL field indicates which software is being used for collecting analytics.
Example values are:
- google-analytics
- plausible
- hotjar
- matomo
#### Example
~~~~~~~~~~
Deploys: google-analytics, hotjar
~~~~~~~~~~
## Examples of analytics.txt files
### A site using analytics
~~~~~~~~~~
# analytics.txt file for www.analyticstxt.org
Author: Frederik Ring <hioffen@posteo.de>
Collects: url, referrer, device-type
Stores: first-party-cookies, local-storage
# Usage data is encrypted end-to-end
Uses: javascript
# Users can also delete their usage data without opting out
Allows: opt-in, opt-out
# Data is retained for 6 months
Retains: P6M
# Optional fields
Honors: none
Tracks: sessions, users
Varies: none
Shares: per-user
Implements: gdpr
~~~~~~~~~~
### A site not using any analytics
~~~~~~~~~~
# analytics.txt file for www.frederikring.com
Author: Frederik Ring <frederik.ring@posteo.de>
Collects: none
~~~~~~~~~~
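
The two example files differ in which fields they have to carry: Author and Collects are always REQUIRED, while Stores, Uses, Allows and Retains are only REQUIRED when Collects holds something other than the single value none. A minimal sketch of that rule follows. It assumes the file has already been parsed into a mapping from lower-cased field names to lists of values, and the name `missing_required_fields` is hypothetical.

```python
# Fields that are REQUIRED unless the only value of Collects is "none".
CONDITIONALLY_REQUIRED = ("stores", "uses", "allows", "retains")


def missing_required_fields(fields):
    """Return the names of REQUIRED fields absent from a parsed file.

    `fields` maps lower-cased field names to lists of values.
    """
    missing = [name for name in ("author", "collects") if name not in fields]
    collects = [value.strip().lower() for value in fields.get("collects", [])]
    if collects != ["none"]:
        missing += [name for name in CONDITIONALLY_REQUIRED
                    if name not in fields]
    return missing
```

For the second example above, which only declares Author and `Collects: none`, the function returns an empty list.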
# Location of the analytics.txt file {#location}
By default, an analytics.txt file SHOULD be placed in the ".well-known" path of a domain name or IP address as per {{!RFC8615}}.
## Alternatives
In case implementors are unable to meet this requirement, other options are available.
### link Tag
Implementors MAY signal the location of an analytics.txt file in the context of an HTML document using a link element of rel "analytics".
Example:
~~~~~~~~~~
<link rel="analytics" href="https://example.com/resources/analytics.txt">
~~~~~~~~~~
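
A consumer could discover such a link element with a standard HTML parser. The following sketch uses `html.parser` from the Python standard library and returns the first matching `href`. The class name `AnalyticsLinkFinder` and the function name `find_analytics_link` are hypothetical, and treating `rel` as a space-separated, case-insensitive token list is an assumption borrowed from general HTML practice rather than something this document specifies.

```python
from html.parser import HTMLParser


class AnalyticsLinkFinder(HTMLParser):
    """Remember the href of the first <link rel="analytics"> element."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return
        attributes = dict(attrs)
        # rel may hold several space-separated tokens.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "analytics" in rel_tokens and attributes.get("href"):
            self.href = attributes["href"]


def find_analytics_link(html):
    """Return the analytics.txt location signalled in `html`, or None."""
    finder = AnalyticsLinkFinder()
    finder.feed(html)
    return finder.href
```

If the document contains no such element, `find_analytics_link` returns `None`, and the consumer would fall back to another signal.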
### HTTP Header
In addition, implementors MAY send an `X-Analytics-Txt` HTTP header with a response, containing the URI of the applicable file.
Example:
~~~~~~~~~~
X-Analytics-Txt: https://example.com/resources/analytics.txt
~~~~~~~~~~
## Precedence
If more than one of these signals is used, they take precedence in the following order:
1. X-Analytics-Txt Header
1. link element
1. ".well-known" location
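
The precedence order can be expressed as a short lookup. This is a sketch under assumptions: the function name `resolve_analytics_txt_url` and its parameters are hypothetical, the caller is expected to have extracted the header value and the link `href` already, and the ".well-known" path is built from the "analytics.txt" URI suffix.

```python
def resolve_analytics_txt_url(origin, header_value=None, link_href=None):
    """Pick the analytics.txt location by precedence.

    origin       -- scheme and host of the site, e.g. "https://example.com"
    header_value -- value of an X-Analytics-Txt response header, if any
    link_href    -- href of a <link rel="analytics"> element, if any
    """
    # 1. header, 2. link element: the first non-empty signal wins.
    for candidate in (header_value, link_href):
        if candidate and candidate.strip():
            return candidate.strip()
    # 3. fall back to the ".well-known" location
    return origin.rstrip("/") + "/.well-known/analytics.txt"
```

With no header and no link element, a call for `https://example.com` resolves to `https://example.com/.well-known/analytics.txt`.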
## Scope of a file
An analytics.txt file MUST only apply to the domain or IP address of the URI used to retrieve it, and SHALL NOT apply to any of its subdomains or parent domains.
If distributed in non-standard locations, an analytics.txt file MAY also apply to products and services provided by the organization publishing the file (e.g. desktop or mobile applications) and which cannot be mapped to a domain name or IP address.
In such cases, implementors MUST add sufficient commentary describing the applicable scope.
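
For files retrieved from a domain name or IP address, the scope rule amounts to an exact host comparison. The sketch below illustrates this with `urllib.parse` from the Python standard library. The function name `file_applies_to` is hypothetical, and the sketch deliberately ignores the non-standard distribution case described above, which relies on commentary in the file.

```python
from urllib.parse import urlsplit


def file_applies_to(file_url, page_url):
    """Return True if the analytics.txt at `file_url` covers `page_url`.

    Only an exact host match counts: subdomains and parent domains
    are out of scope.
    """
    file_host = urlsplit(file_url).hostname  # lower-cased by urlsplit
    page_host = urlsplit(page_url).hostname
    return file_host is not None and file_host == page_host
```

Under this rule, a file retrieved from `example.com` does not cover `www.example.com`, and vice versa.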
# Security Considerations
## Incorrect or stale information
If information given in an "analytics.txt" file is incorrect or not kept up to date, this can result in usage of services under wrong assumptions, thus exposing users to possibly unwanted data collection and handling.
Not having an "analytics.txt" file may be preferable to having incorrect or stale information in this file.
Implementors MUST use the "Author" field (see {{author-field}}) to allow inquiries about the correctness of the given information.
## Spam
Implementors should be aware that disclosing mandatory author information as per {{author-field}} in such a file exposes them to possible Spam schemes or spurious requests.
## Multi-user environments
In multi-user / multi-tenant environments, it may be possible for a single user to take over the location of the "/.well-known/analytics.txt" file, which would then also apply to others.
Organizations should ensure the ".well-known" location is properly protected. Implementors can instead use other locations as per {{location}} in such scenarios.
# IANA Considerations
## Well-Known URIs registry
The "Well-Known URIs" registry should be updated with the following additional values (using the template from {{!RFC8615}}):
URI suffix: analytics.txt
Specification document(s): this document
Status: permanent
--- back