This document proposes a way for websites and services to declare and disclose their usage of analytics and tracking software.
analytics.txt aims to be an elaborate file format that describes the privacy-related characteristics of analytics and tracking software in a non-biased way.
An analytics.txt file is meant to be understandable to a non-technical audience, while also being useful for automated consumption by tools and software.
User tracking and the use of analytics software on websites have become routine, visibly and invisibly affecting the way the user-facing internet works and behaves.
Yet, there is no standardized way of accessing information about what software is being used and what kind of data it is collecting.
Legislation can only ever cover a subset of the range of existing technological implementations, creating incentives for software to find workarounds that allow it to hide its presence from users.
Automated audits are limited to aspects that are possible to detect in clients, but cannot disclose other important implementation details.
The file "analytics.txt" is not intended to replace the requirement for complying with existing regulations, but is supposed to give insights beyond the scope of these regulations.
A fundamental design goal of the "analytics.txt" format is to make such a file human readable.
While the percentage of consumers that are actually human beings will likely be low (browser extensions or search engines would be good examples of possible consumers), this tenet steers the specification toward providing information that is useful for human beings, even when it is captured and processed further by other software.
Analytics as referred to in this document involves the collection of usage statistics in order to generate reports that can help the providers of websites and services to better understand and optimize their services towards real world user behavior.
"analytics.txt" is designed to provide insights beyond what is technically auditable from a client perspective.
While some characteristics could be determined automatically or manually at client level, others cannot, and consumers will have to rely on implementors providing correct information about what is happening at layers that are opaque to users.
This means consumers of an "analytics.txt" file will implicitly need to trust the implementor to provide correct information, which implies two design goals for the format (technical implications are discussed in {{incorrect-information}}).
### Non-biased
All of the given datapoints are purely informational. There is no right or wrong option to choose from, and the format will never provide guidelines on how to assess or rate an "analytics.txt" file.
Because of this, implementors have no strong incentive to provide incorrect information; they choose to publish a file because they wish to disclose information about their site that they otherwise could not.
An "analytics.txt" file should never be the canonical source of truth for making automated decisions or ratings about a site.
It is supposed to be one of multiple signals that can be used for assessing the behavior of a website, creating the possibility to connect and compare the provided data with what has been surveyed using other channels of information.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 {{!RFC2119}} {{!RFC8174}} when, and only when, they appear in all capitals, as shown here.
This document defines a text file format that can be used by implementors to signal information about their usage of analytics software to both users and software.
A field contains a "name" which is the first part of a field all the way up to the colon (for example: "Author:") and follows the syntax defined for "field-name" in section 3.6.8 of {{?RFC5322}}.
Unless specified otherwise by the field definition, multiple values MUST be chained together for a single field (for example: "Implements: gdpr, ccpa") using the "," character (%x2c).
Any line beginning with the "#" (%x23) symbol MUST be interpreted as a comment.
The content of the comment may contain any ASCII or Unicode characters in the %x21-7E and %x80-FFFFF ranges plus the tab (%x09) and space (%x20) characters.
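The field, multi-value, and comment rules above can be illustrated with a minimal parsing sketch. It is not a reference implementation: the function name `parse_analytics_txt` and the `single_value` parameter are hypothetical, treating "Author" as a single-value field is an assumption made for illustration, and no validation of field names against the "field-name" syntax is attempted.

```python
def parse_analytics_txt(text, single_value=("Author",)):
    """Parse field lines into a dict mapping field name -> list of values.

    `single_value` lists field names whose value is kept whole instead of
    being split on "," (an assumption of this sketch, not of the format).
    """
    fields = {}
    for line in text.splitlines():
        # Skip blank lines and comment lines (those beginning with "#").
        if not line.strip() or line.startswith("#"):
            continue
        # The field name is everything up to the first colon.
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a field line; this sketch silently ignores it
        name = name.strip()
        if name in single_value:
            values = [value.strip()]
        else:
            # Multiple values are chained with the "," character (%x2c).
            values = [v.strip() for v in value.split(",") if v.strip()]
        fields.setdefault(name, []).extend(values)
    return fields
```

For example, a file containing a comment line followed by "Implements: gdpr, ccpa" would yield a single "Implements" entry holding the two values "gdpr" and "ccpa".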
Like many other formats and protocols, this format may need to be extended over time to fit the ever-changing landscape of the Internet.
Special attention is required for defining the allowed values in enumerations to ensure they are a. extendable and b. do not become obsolete too quickly.
This REQUIRED field holds an OPTIONAL display name and a REQUIRED email address ("name-addr") as per section 3.4 of {{!RFC5322}} providing information about a person or entity responsible for maintaining the contents of the file.
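As a rough illustration of how a consumer might split such a value into its display name and email address, the following sketch uses `email.utils.parseaddr` from the Python standard library. The helper name `parse_author` is hypothetical, and the check for "@" is only a crude plausibility test, not full validation against {{!RFC5322}}. Note that `parseaddr` is more lenient than "name-addr" and also accepts a bare address without angle brackets.

```python
from email.utils import parseaddr


def parse_author(value):
    """Split an author value into (display_name, email_address).

    The display name may be empty, since it is OPTIONAL.
    """
    display_name, address = parseaddr(value)
    # Crude plausibility check only; an assumption of this sketch.
    if "@" not in address:
        raise ValueError(f"no email address found in {value!r}")
    return display_name, address
```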
This REQUIRED multi-value field indicates which potentially privacy-relevant, user-specific data is being collected or used in session identification or other procedures.
These values MUST also be specified if a property is not persisted as-is, but stored or processed in a hashed and/or combined form.
Such mechanisms usually try to compute a unique identifier from properties of the host Operating System, allowing them to re-identify users without having to persist an identifier.
The multi-value field indicates whether data is persisted on the client during the collection of analytics data and declares the browser features used for doing so.
This value is not required in case the analytics software sends static resources with cache headers, but does not make use of the request headers on subsequent requests for purposes other than managing caching of assets.
This field is REQUIRED unless the only value of the Collects field {{collects-field}} is none.
The multi-value field discloses information about whether user consent is being acquired before collecting analytics data, and if it is possible for users to opt out of the collection of usage data.
The single-value field indicates the duration for which the analytics data is being stored before being deleted. This duration MUST also cover periods where data might transition to be stored in aggregated form only.
The value is either a duration in days (including the days suffix), or the token "perpetual" in case data is retained without expiring it at some point.
A day is defined as 24 hours.
In case the retention period does not divide evenly into days, it MUST be rounded up to the next full day.
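The rounding rule can be sketched as follows. The helper name `retention_value` is hypothetical, and the exact serialization of the duration (a number, a space, then "days") is an assumption of this sketch based on the description above, not a confirmed syntax.

```python
import math


def retention_value(retention_hours):
    """Return the field value for a retention period given in hours.

    `None` stands for data that is retained without ever expiring.
    """
    if retention_hours is None:
        return "perpetual"
    if retention_hours <= 0:
        raise ValueError("retention period must be positive")
    # A day is 24 hours; partial days are rounded up.
    days = math.ceil(retention_hours / 24)
    return f"{days} days"
```

Under these assumptions, a retention period of 36 hours is reported as two days, because it does not divide evenly into 24-hour days.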
Users can access the usage data that is associated with them in a non-aggregated way, isolating all data that is specific to their current means of re-identification.
This OPTIONAL field indicates conformance with existing regulations and legislation. Values for this field SHOULD use all lowercase tokens with whitespace being replaced by the dash character (%x2d).
This OPTIONAL field indicates which software is being used for collecting analytics. Values for this field SHOULD use all lowercase tokens with whitespace being replaced by the dash character (%x2d).
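A possible way of deriving such tokens from free-form names is sketched below. The helper name `to_token` and the product name used in the usage note are hypothetical.

```python
import re


def to_token(label):
    """Lowercase a name and replace runs of whitespace with a dash (%x2d)."""
    return re.sub(r"\s+", "-", label.strip().lower())
```

With this helper, a name such as "Example Analytics Suite" becomes the token "example-analytics-suite".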
An analytics.txt file located in the ".well-known" location MUST only apply to the domain or IP address of the URI used to retrieve it, and SHALL NOT apply to any of its subdomains or parent domains.
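A consumer could check this scoping rule roughly as shown below. The function name `well_known_applies` is hypothetical, and comparing only the host component (ignoring scheme and port) is a simplification made for this sketch.

```python
from urllib.parse import urlsplit


def well_known_applies(file_uri, target_uri):
    """Return True if a file retrieved from `file_uri` applies to `target_uri`.

    Only an exact host match counts; subdomains and parent domains do not.
    """
    file_host = urlsplit(file_uri).hostname
    target_host = urlsplit(target_uri).hostname
    return file_host is not None and file_host == target_host
```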
If the location is signaled using the HTTP Header or in the document markup itself, its scope SHALL be limited to the requested resource only.
If distributed in non-standard locations, an analytics.txt file MAY also apply to products and services provided by the organization publishing the file (e.g. desktop or mobile applications) and which cannot be mapped to a domain name or IP address.
In such cases, implementors MUST add sufficient commentary describing the applicable scope.
If information given in an "analytics.txt" file is incorrect or not kept up to date, this can result in usage of services under wrong assumptions, thus exposing users to possibly unwanted data collection and handling.
Not having an "analytics.txt" file may be preferable to having incorrect or stale information in this file.
This guideline also applies at the field level: in case of ambiguities or uncertainties, it is recommended to omit a field or a value rather than providing incorrect information.
Implementors should be aware that disclosing mandatory author information as per {{author-field}} in such a file exposes them to possible spam schemes or spurious requests.
In multi-user / multi-tenant environments, it may be possible for a single user to take over the location of the "/.well-known/analytics.txt" file, which would then also apply to other users.
Organizations should ensure the ".well-known" location is properly protected. Implementors can instead use other locations as per {{location}} in such scenarios.
The authors would like to acknowledge the feedback and input provided during the creation of this document by Michiel Leenaars, Cyrill Krähenbühl, and Lasse Voss.