mirror of https://github.com/offen/website.git synced 2024-10-18 12:10:25 +02:00

update article

Hendrik Niefeld 2020-10-28 06:51:22 +01:00
parent 830dac0b2b
commit b5a4170f91

@@ -1,6 +1,6 @@
title: Opt in for quality insights
description: Collecting data only with user consent has a less obvious implication: the quality of insights from web analytics increases.
-date: 2020-10-27
+date: 2020-10-28
slug: opt-in-quality
sitemap_priority: 0.7
image_url: /theme/images/offen-blog-0120-opt-in-quality.jpg
@@ -13,7 +13,7 @@ bottom_cta: budget
A key feature of our fair and open web analytics tool [Offen](https://www.offen.dev/get-started/) is that data will only be collected after website users have opted in. This is absolutely necessary for a fair data transfer, but also comes with another, not so obvious implication.
-Collecting data only with user consent has a significant impact on the quality of analytics insights, especially for operators of smaller websites.
+Collecting data only with *user consent has a significant impact on the quality of analytics* insights, especially for operators of smaller websites.
### Analyzing our own turf
@@ -31,7 +31,7 @@ First let's take a look at some numbers provided by our web analytics tool. Thes
<img class="mt3 mb2" alt="Figure A" src="/theme/images/offen-blog-0120-opt-in-quality-A.svg"/>
-To get an overview of our total traffic in the same time frame we use [GoAccess](https://goaccess.io/) to analyze our server logs. Although "total traffic" is a rather symbolic term here, since the exact number of visitors can never be determined by any method. Even if we leave aside all non human traffic, a combination of adblockers, privacy tools and bugs reliably prevent an absolutely accurate measurement.
+To get an overview of our total traffic in the same time frame we use [GoAccess](https://goaccess.io/){: target="_blank"} to analyze our server logs. Although "total traffic" is a rather symbolic term here, since the *exact number of visitors can never be determined* by any method. Even if we leave aside all non human traffic, a combination of adblockers, privacy tools and bugs reliably prevent an absolutely accurate measurement.
<img class="mt3 mb2" alt="Figure A" src="/theme/images/offen-blog-0120-opt-in-quality-B.svg"/>
@@ -39,7 +39,7 @@ Not surprisingly, far more data is generated in our server logs than with our we
Visitors in the server logs are identified on the basis of a single day and could therefore have been counted several times during recurring visits. Also our logs count visitors and not unique users. This is because all non human traffic on our website is also covered. Which means that search engines indexing our website and all other page views generated by software agents are included.
-According to the [7th Annual Bad-Bot Report](https://www.imperva.com/resources/resource-library/reports/2020-bad-bot-report/) (Imperva Threat Research Lab, 2020), the average non human traffic on websites has now grown to more than 37%. Two thirds of this non human traffic accounts for so called bad bots. This software interacts with your website in the same way as a human user would do, which makes them more difficult to detect and block.
+According to the [7th Annual Bad Bot Report](https://www.imperva.com/resources/resource-library/reports/2020-bad-bot-report/){: target="_blank"} (Imperva Threat Research Lab, 2020), the average *non human traffic on websites has now grown to more than 37%.* Two thirds of this non human traffic accounts for so called bad bots. This software interacts with your website in the same way as a human user would do, which makes it more difficult to detect and block.
Let us therefore take a closer look at the quantity and quality of referrer domains collected by both methods.
@@ -49,7 +49,7 @@ Our server logs collected more than twice as much data over the period. Unfortun
We consider entries that originate from server networks without an useful domain name or any obvious marketing content as plain spam. Furthermore, all entries without an explicit link that do not come from a search engine are considered questionable.
-Perhaps these interferences have no relevance on websites with very high traffic. However, if your website never has more than a hundred unique users per day, the noise generated by spam will have a significant impact on your analytics results.
+Perhaps these interferences have no relevance on websites with very high traffic. However, if your website never has more than a hundred unique users per day, the noise generated by *spam will have a significant impact on your analytics results.*
Common web analytics tools try to solve this problem by blocking single traffic sources. But all the domains considered questionable and some of the spam related ones would certainly be included there. In any case, this approach leads to long lists of spam referrers in the respective code, which by definition are always out of date. An arms race that the developers of these tools inevitably lose. Is all this really necessary?
@@ -59,7 +59,7 @@ We don't think so. An "opt in only" policy for data collection, which is necessa
Talking about these real users brings us back to the question of whether it is important to know your exact opt in rate. For an answer, consider for which users you want to optimize your website and what kind of users you want to attract.
-Those who consent are most likely interested in your content. They support you with their usage data and may therefore be willing to support you in any other way. The exact share of these users is less interesting.
+*Those who consent are most likely interested in your content.* They support you with their usage data and may therefore be willing to support you in any other way. The exact share of these users is less interesting.
### Deeper insights for optimization
@@ -67,7 +67,7 @@ Nevertheless, common web analytics tools that collect data without user consent
Many users are recorded even though they have visited your website with very little or no interest. Some bounce off immediately and may just have been there by mistake. Still, all these data points are included in your analytics and will give you a distorted impression. The resulting false assumptions distract you from the important users and make it difficult for you to keep the necessary focus.
-This is why the use of all available data is not the way to do better web analytics. Only a careful selection of the data to be evaluated leads to deeper insights for optimization. All the better if this can be done in combination with a privacy friendly approach to data collection.
+This is why the use of all available data is not the way to do better web analytics. Only *a careful selection of the data to be evaluated leads to deeper insights* for optimization. All the better if this can be done in combination with a privacy friendly approach to data collection.
### Try Offen today