tracking on readthedocs.io not mentioned in "Ethical Ads" section of the documentation #3952

Closed
paulmueller opened this issue Apr 16, 2018 · 17 comments
Labels
Needed: documentation Documentation is required

Comments

@paulmueller

paulmueller commented Apr 16, 2018

Details

https://docs.readthedocs.io/en/latest/ethical-advertising.html

  • We don’t track you
  • We don’t sell your data
  • We host everything ourselves, no third-party scripts or images

Expected Result

According to this information, I would assume that there is no tracking on readthedocs.io.

Actual Result

readthedocs.io always connects to

In addition, readthedocs.io stores a cookie for each subdomain on my PC.

Thus, there seems to be tracking involved. If this is not tracking, it would be nice if you could update the linked article with an explanation or clarify that only clicks are tracked and how (e.g. as in the blog post: https://blog.readthedocs.com/ads-on-read-the-docs/).

@davidfischer
Contributor

Arguably we should be more specific in those docs. Specifically by "we don't track you", we mean:

  • we do not have a trove of user data we use to target advertising
  • we do not keep track of every ad a specific user has clicked on for targeting purposes. We do keep track of clicks for billing purposes although it is not tied to the user data for logged-in Read the Docs users.

If you want to read more about Read the Docs' current and evolving position on Google Analytics, you can do so in #3896. Tracking means different things to different people. What do you mean by "tracking"?

RackCDN is our static file server on Rackspace. I don't know what ID you are referring to specifically, but in the example image of https://216d72aca007988f34d7-7c4f8fef8f0aad53b9488757bc3dab78.ssl.cf5.rackcdn.com/triplebyte-matt.jpg the first part (216d72aca007988f34d7-7c4f8fef8f0aad53b9488757bc3dab78) is always the same for every user and is simply a Read the Docs static media server. This should be trivial to verify on your side. I don't believe this qualifies as tracking. Do you disagree?
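One way to run the check described above is to pull the hostname out of asset URLs captured in different browser sessions and compare the first DNS label. A minimal sketch, using only the URL quoted above; `container_label` is a hypothetical helper name, not something from Read the Docs.

```python
from urllib.parse import urlparse

# Example asset URL quoted in the comment above.
ASSET_URL = (
    "https://216d72aca007988f34d7-7c4f8fef8f0aad53b9488757bc3dab78"
    ".ssl.cf5.rackcdn.com/triplebyte-matt.jpg"
)


def container_label(url: str) -> str:
    """Return the first DNS label of the URL's hostname (hypothetical helper)."""
    hostname = urlparse(url).hostname or ""
    return hostname.split(".")[0]


label = container_label(ASSET_URL)
print(label)
```

If the printed label is identical for URLs collected from different browsers or machines, it identifies the storage container rather than the visitor.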

@davidfischer
Contributor

I see you edited the issue to mention a cookie.

In regards to the cookie, could you tell me what cookie is being stored? Other than the Google Analytics cookies, typically the only cookie being stored is a short lifespan CSRF token.

@paulmueller
Author

Thank you for this thorough response.

  • I can confirm that I see the same string `216d72aca007988f34d7-7c4f8fef8f0aad53b9488757bc3dab78` from rackcdn.com. It is shown by uBlock Origin on my PC. I could not see any difference between when it is blocked and when it is not. Therefore, I assumed it might be connected to tracking. No, I do not think this qualifies as tracking. What kind of static media is stored on rackcdn.com? All of the documentation seems to be served from readthedocs.io.
  • I can confirm that this is a short lifespan CSRF token. Sorry, I am not very familiar with such things.
  • I just saw that there is a 1x1px type Background image (data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAIAAACQd1PeAAAAGXRFWHRTb2Z0d2FyZQBBZG9iZSBJbWFnZVJlYWR5ccllPAAAAyRpVFh0WE1MOmNvbS5hZG9iZS54bXAAAAAAADw/eHBhY2tldCBiZWdpbj0i77u/IiBpZD0iVzVNME1wQ2VoaUh6cmVTek5UY3prYzlkIj8+IDx4OnhtcG1ldGEgeG1sbnM6eD0iYWRvYmU6bnM6bWV0YS8iIHg6eG1wdGs9IkFkb2JlIFhNUCBDb3JlIDUuMy1jMDExIDY2LjE0NTY2MSwgMjAxMi8wMi8wNi0xNDo1NjoyNyAgICAgICAgIj4gPHJkZjpSREYgeG1sbnM6cmRmPSJodHRwOi8vd3d3LnczLm9yZy8xOTk5LzAyLzIyLXJkZi1zeW50YXgtbnMjIj4gPHJkZjpEZXNjcmlwdGlvbiByZGY6YWJvdXQ9IiIgeG1sbnM6eG1wPSJodHRwOi8vbnMuYWRvYmUuY29tL3hhcC8xLjAvIiB4bWxuczp4bXBNTT0iaHR0cDovL25zLmFkb2JlLmNvbS94YXAvMS4wL21tLyIgeG1sbnM6c3RSZWY9Imh0dHA6Ly9ucy5hZG9iZS5jb20veGFwLzEuMC9zVHlwZS9SZXNvdXJjZVJlZiMiIHhtcDpDcmVhdG9yVG9vbD0iQWRvYmUgUGhvdG9zaG9wIENTNiAoTWFjaW50b3NoKSIgeG1wTU06SW5zdGFuY2VJRD0ieG1wLmlpZDoxOERBMTRGRDBFMUUxMUUzODUwMkJCOThDMEVFNURFMCIgeG1wTU06RG9jdW1lbnRJRD0ieG1wLmRpZDoxOERBMTRGRTBFMUUxMUUzODUwMkJCOThDMEVFNURFMCI+IDx4bXBNTTpEZXJpdmVkRnJvbSBzdFJlZjppbnN0YW5jZUlEPSJ4bXAuaWlkOjE4REExNEZCMEUxRTExRTM4NTAyQkI5OEMwRUU1REUwIiBzdFJlZjpkb2N1bWVudElEPSJ4bXAuZGlkOjE4REExNEZDMEUxRTExRTM4NTAyQkI5OEMwRUU1REUwIi8+IDwvcmRmOkRlc2NyaXB0aW9uPiA8L3JkZjpSREY+IDwveDp4bXBtZXRhPiA8P3hwYWNrZXQgZW5kPSJyIj8+EwrlwAAAAA5JREFUeNpiMDU0BAgwAAE2AJgB9BnaAAAAAElFTkSuQmCC).
    Does this serve a purpose?

Thanks again for answering my questions and sorry if they are a little naive.

@davidfischer
Contributor

All our static assets are currently stored on Rackspace and media.readthedocs.org resolves there. The RackCDN URL is separate because it stores some static files that are not built by Read the Docs' static asset pipeline (compiling JS/CSS, etc.). Mostly these are ad assets (static images), but some other company files, such as prospectuses, are there too.

A CSRF token is required to protect users against cross site request forgery attacks. There aren't typically any forms on *.readthedocs.io so arguably we don't need it. It is largely an artifact of the web framework we use.
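For readers unfamiliar with CSRF tokens, here is a minimal sketch of the general idea: the server hands out a random token in a cookie and only accepts a form post that echoes the same value back. This illustrates the generic double-submit pattern under that assumption; it is not the code of the web framework Read the Docs uses, and both function names are hypothetical.

```python
import hmac
import secrets


def issue_csrf_token() -> str:
    """Create a random token to store in a cookie and embed in forms."""
    return secrets.token_urlsafe(32)


def is_valid_submission(cookie_token: str, form_token: str) -> bool:
    """Accept a form post only if the submitted token matches the cookie."""
    if not cookie_token or not form_token:
        return False
    # Constant-time comparison; encode so non-ASCII input cannot raise.
    return hmac.compare_digest(cookie_token.encode(), form_token.encode())


token = issue_csrf_token()
print(is_valid_submission(token, token))      # same token in cookie and form
print(is_valid_submission(token, "forged"))   # attacker cannot guess the token
```

The token is random per session and carries no information about the user, which is why it is not useful for tracking.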

Regarding that image, I see it as well (on https://docs.readthedocs.io/en/latest/ for example) but I'm not 100% sure what inserts it. It is an inline base64 image though so it doesn't have anything to do with tracking.
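To see that the inline image carries its own bytes and needs no network request, one can decode it locally. A small sketch using only the first 44 characters of the base64 payload quoted earlier in the thread, which cover the PNG signature and the IHDR header chunk; the variable names are illustrative.

```python
import base64
import struct

# First 44 characters of the base64 payload quoted earlier in the thread.
# They decode to the 8-byte PNG signature plus the complete IHDR chunk.
PNG_PREFIX_B64 = "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAIAAACQd1Pe"

data = base64.b64decode(PNG_PREFIX_B64)

# Every PNG file starts with this fixed signature.
is_png = data[:8] == b"\x89PNG\r\n\x1a\n"

# IHDR stores width and height as big-endian 32-bit integers at bytes 16..23.
width, height = struct.unpack(">II", data[16:24])

print(is_png, width, height)
```

Because the pixel data is embedded in the stylesheet itself, the browser never contacts a server to fetch it.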

@davidfischer
Contributor

That 1x1 image is part of the Read the Docs Sphinx theme: https://media.readthedocs.org/css/sphinx_rtd_theme.css

@davidfischer
Contributor

Also, since you mentioned uBlock Origin, I'm just going to mention that we have a whitelist for advertising on Read the Docs if you are willing to whitelist us and/or the larger ad-supported open source community:
https://ads-for-open-source.readthedocs.io/en/latest/

@RichardLitt
Member

Might be worth adding some of this information to the docs page in question before closing this ticket. If we use Google Analytics, we should state why.

@paulmueller
Author

Thanks for the links. I am a strong supporter of whitelisting ethical ads.

Regarding Google Analytics, the thing is that you can never actually be sure that Google will anonymize IPs sent to them before they are stored (#3896). They are sent. I am not a lawyer, but especially in Germany, where you have to put your name and address on your site if you are even remotely connected to some kind of business, such things must be clear (you are already addressing this in #2602).

For me it would be convenient if I could pay for both, disabling ads and disabling "click-tracking" (either by GA or future RTD).

@RichardLitt added the "Needed: documentation" label (Documentation is required) on Apr 16, 2018
@davidfischer
Contributor

I'm actively working on a docs PR to address at least some of these considerations. I get this question often enough that it is worth documenting. Specifically it will:

  • Document some of the ad targeting details
  • Mention the use of GA

Currently, you can pay to disable ads on Read the Docs. This is what Read the Docs gold essentially does. We haven't marketed it very heavily that it removes ads (it does) but that is coming soon.

What do you mean specifically by 'disabling "click-tracking"'?

@paulmueller
Author

Yes, I am aware of RTD gold.

With paying to disable "click-tracking" I mean the option to disable GA on all project pages related to an RTD gold subscription. I.e. not only disabling ads, but also disabling sending any data to GA (which I know you need to improve RTD).

@davidfischer
Contributor

Got it. The term "click tracking" is just a little unclear because it can mean a few different things. I just wanted to make sure I got what you meant explicitly and I think I do now. For example, since we bill advertisers based on clicks, we always have to count clicks unless we change our advertising model.

Right now I don't have a way to explicitly disable GA other than by running an ad blocker. I should mention that whitelisting ads on Read the Docs does not whitelist GA on Read the Docs so you can continue to block GA with an ad blocker while allowing ads.

@davidfischer
Contributor

Per @RichardLitt's suggestion, I added docs detailing this in #3955

@paulmueller
Author

paulmueller commented Apr 17, 2018

Thanks a lot! I believe ethical-advertising.rst should also get an update in #3955:

  • We don't track you
  • We don't sell your data
  • We host everything ourselves, no third-party scripts or images

To clarify the GA point, maybe change this to something similar to:

  • We don't track you. Our current advertising model relies on anonymized Google Analytics data though. For more information, see :ref:`advertising-analytics`.
  • We don't sell your data
  • We host everything ourselves, no third-party scripts or images

@davidfischer
Contributor

@paulmueller, thanks for your comments. Can you put that comment into #3955 so your feedback on the PR is attached to the PR?

@davidfischer
Contributor

I do want to keep all the data on analytics together though so if and when we swap GA out I don't have to remember to update multiple places.

@paulmueller
Author

I think updating ethical-advertising.rst is important. If you worry about remembering this file, I would suggest a new issue "Swap out Google Analytics with own solution" with todo items as a checklist, including
- [ ] remove GA reference in ethical-advertising.rst.

@davidfischer
Contributor

Just to give a small update here, it is now possible to opt out of Google Analytics on all docs pages built after May 1, 2018 by enabling your browser's Do Not Track setting.

See: https://docs.readthedocs.io/en/latest/privacy-policy.html#do-not-track
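Browsers with Do Not Track enabled send a `DNT: 1` request header. The thread does not show how Read the Docs implements the opt-out, so the following is only a sketch of how a server could honor that header; `analytics_allowed` is a hypothetical function name.

```python
from typing import Mapping


def analytics_allowed(headers: Mapping[str, str]) -> bool:
    """Return False when the request carries `DNT: 1` (hypothetical helper)."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("dnt") != "1"


print(analytics_allowed({"DNT": "1"}))            # visitor opted out
print(analytics_allowed({"User-Agent": "demo"}))  # no preference expressed
```

A page template could then include the analytics snippet only when this check returns true.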

3 participants