Flickr pictures, web beacons, and a modest proposal
As I noted in the comments of the previous post, I don’t have ads on the site, but I do have flickr pictures directly linked from my flickr account.
It is conceivable to me that flickr pictures could qualify as “web beacons” under the Yahoo privacy policy, and thus be used for tracking purposes. Presumably this was not the original intention of the flickr developers, but it’s certainly a possibility now that flickr is owned by Yahoo. Are the access logs for the static flickr pictures available to Yahoo? Probably. Are they correlated with other sorts of usage information? It’s not clear. And flickr pictures are likely linked in places where standard Yahoo web beacons can’t go, because they’re not invited (like on this site, for example).
My conclusion, I think, is that this is probably not a problem, but maybe it is. This and other sorts of distributed 3rd party tracking all have one thing in common:
It’s called HTTP_REFERER.
Here’s how it works. When you request any old random web page that contains a 3rd party ad, an image, a javascript library, or whatever, your browser fetches the embedded piece of content from the 3rd party. As part of that request, it sends the URL of the page you visited, in a field called the referer header (yes, it’s misspelled).
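To make that concrete, here is a minimal sketch of roughly what the browser sends when it fetches an embedded image. The host, path, and page URL are placeholders, and a real browser adds more headers (User-Agent, Accept, any cookies it holds for that host); the point is just where the Referer line goes.

```python
def embedded_request(host, path, page_url, send_referer=True):
    """Build the raw text of a GET request for a piece of embedded content.

    page_url is the page that embeds the content; when send_referer is True
    it travels to the 3rd party in the Referer header.
    """
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    if send_referer:
        lines.append(f"Referer: {page_url}")
    # HTTP header lines end in CRLF, and a blank line ends the header block.
    return "\r\n".join(lines) + "\r\n\r\n"


if __name__ == "__main__":
    print(embedded_request("images.example.com", "/photo.jpg",
                           "http://www.example.org/blog/some-post"))
```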
So, every time you visit a web page:
- You send the URL to the owner of the page. So far so good.
- You send your IP address to the owner of the page. Not terrible in itself.
- You send the URL of the page you visited to the owner of the 3rd party content. And this is where it starts to degrade a little.
- You send your IP address to the owner of the 3rd party content. The owner of the 3rd party content may also be able to set a cookie identifying you. Some browsers refuse 3rd party cookies by default, and most can be set to. However, if that 3rd party has ever set a cookie on your browser before (say, if you visited their site directly), they can still read it. In any case, each request makes you a little more identifiable.
- The next time you visit another site with content from the same 3rd party, they can probably identify you again.
That referer URL is a significant key that ties a lot of browsing habits together.
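A sketch of what that key makes possible, assuming a single 3rd party keeps an access log of (client IP, Referer) pairs and that the visitor's IP address stays the same (the comments below explain why that assumption often fails). The log rows are made up, using documentation-only addresses and hosts.

```python
from collections import defaultdict

# Hypothetical log rows kept by one 3rd party content server:
# (client IP address, Referer header sent with the request).
LOG = [
    ("203.0.113.7", "http://blog-a.example/post-1"),
    ("198.51.100.4", "http://blog-b.example/about"),
    ("203.0.113.7", "http://forum-c.example/thread/42"),
    ("203.0.113.7", None),  # referer turned off: this hit reveals no page
    ("203.0.113.7", "http://blog-b.example/archive"),
]


def browsing_profiles(log):
    """Group referer URLs by client IP to rebuild each visitor's trail across sites."""
    profiles = defaultdict(list)
    for ip, referer in log:
        if referer:
            profiles[ip].append(referer)
    return dict(profiles)


if __name__ == "__main__":
    for ip, pages in browsing_profiles(LOG).items():
        print(ip, pages)
```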
There’s an important distinction to be made here. The referer header makes it possible for 3rd party sites to track what you browse, but it’s only one of many ways. Doing away with the referer header won’t prevent sites that run 3rd party tracking content from tracking you: the owner of the site can always send the URL you’re looking at to the 3rd party as part of the request, even if your browser doesn’t. What it does prevent is tracking without the consent of the owner of the site you’re looking at. Of all of the sites you’re looking at, actually. Judging from my admittedly limited conversations with site owners, there are a LOT of people out there who have no idea that their users can be tracked if they include 3rd party ads on their site, or flickr images, or whatever. (Again, that’s not to say that their users are being tracked, but the possibility is there.)
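For comparison, here is roughly how a site owner could hand the page URL to a 3rd party without any help from the Referer header, by putting it in the embed URL itself. The tracker address and the `page` parameter name are hypothetical.

```python
from urllib.parse import urlencode


def beacon_url(tracker_base, page_url):
    """Build an embed URL that carries the embedding page's URL as a query parameter."""
    # urlencode percent-escapes the page URL so it survives inside a query string.
    return tracker_base + "?" + urlencode({"page": page_url})


if __name__ == "__main__":
    print(beacon_url("http://tracker.example/pixel.gif",
                     "http://www.example.org/post?id=5"))
```

Because the site itself builds this URL, turning off the referer in the browser does nothing to stop it; that is the consent distinction drawn above.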
Again, the site that includes the ad or image or whatever isn’t sending that information; your browser is, and this is a legacy of the early days of the web. Some browsers allow you to turn it off and not send any referer information. I’d argue that this should be off by default, because its disadvantages outweigh its benefits. I’m told that legitimate advertisers don’t rely on the referer header anyway, because it can be unreliable. If that’s true, that’s even less reason to keep it around.
Suggestion number one was “Tracking information that’s linked to personally identifiable information should also be considered personally identifiable.”
Perhaps suggestion two is “Let’s do away with the Referer header”. (Of course, this comes on the heels of a Google-employed Firefox developer adding more tracking features instead of taking them away.)
Arguments for or against? Are there any good uses for this that are worth the potential for abuse?
Tags: IP address, privacy, tracking, logs, retention, personal information, referer
January 30th, 2006 at 1:59 pm
The referer URL is the easiest way to pretty much eliminate a whole category of cross-site scripting attacks.
Many cross-site scripting attacks can be defeated by checking the referer URL and accepting a post only if the referer URL is from the same site.
While developers should clean up their act and solve cross-site scripting problems the right way, no doubt a huge number of potential cross-site scripting attacks have been forestalled by the simple expedient of checking the referer. This usually works because so far (I believe) there has not been a single vulnerability on any browser or platform that tricks the browser into sending a bogus referer, and users very seldom turn the referer off (much more seldom than they turn off cookies, in my experience).
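A minimal sketch of the server-side check being described, assuming a Python handler that receives the Referer header as a string (or None when it is absent). It compares parsed host names rather than string prefixes, so a lookalike host such as `www.example.org.evil.example` does not pass; the function name and host are illustrative.

```python
from urllib.parse import urlparse


def post_allowed(referer, site_host):
    """Accept a form post only if its Referer names a page on our own host.

    A missing or empty Referer is rejected outright.
    """
    if not referer:
        return False
    parsed = urlparse(referer)
    return parsed.scheme in ("http", "https") and parsed.hostname == site_host


if __name__ == "__main__":
    print(post_allowed("http://www.example.org/form", "www.example.org"))
    print(post_allowed("http://evil.example/page", "www.example.org"))
```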
The ideal use of the referer URL would permit sending referer URLs *only within the same site*, or *only within the same site when doing a form POST*.
This would enable referer URL to continue to be used in the common manner it’s used today, but would have no privacy implications I can see.
So perhaps you should campaign for “referer header on form POST ONLY!”
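Here is one way the “same site, form POST only” rule might look from the browser's side, as a sketch. It treats “same site” as the same scheme, host, and port, which is an assumption; the function name is hypothetical.

```python
from urllib.parse import urlsplit


def referer_to_send(page_url, target_url, method):
    """Return the Referer to send under a 'same-site form POST only' policy, else None."""
    page, target = urlsplit(page_url), urlsplit(target_url)
    same_site = (page.scheme, page.hostname, page.port) == (
        target.scheme, target.hostname, target.port)
    if method.upper() == "POST" and same_site:
        return page_url
    return None  # embedded content, links, and cross-site posts get no Referer


if __name__ == "__main__":
    print(referer_to_send("http://www.example.org/form",
                          "http://www.example.org/submit", "POST"))
    print(referer_to_send("http://www.example.org/post",
                          "http://images.example.com/photo.jpg", "GET"))
```

Under this rule an embedded image or 3rd party ad never learns which page it appeared on, while a same-site form check still sees the referer it expects.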
January 30th, 2006 at 2:49 pm
Call me naive (you won’t be the first), but what are some examples of “the potential for abuse” that you hope to avoid?
January 30th, 2006 at 5:29 pm
Here’s one example. Say you have a flickr picture on your site, and I have a flickr picture on my site. Someone hits both pages. Now flickr has a log showing that person’s IP address requesting pictures from both sites. Flickr’s owned by Yahoo. Now that person logs into Yahoo, and Yahoo has a way to tie the IP address to their email address. Requests for flickr pictures don’t seem to carry cookies, but requests for other content types do, and that makes tracking easier too.
I’m not making any judgements here about why someone might want to track that that person visited my site and yours, or any other sites, or under what circumstances Yahoo would release the logs tying them together. But the fact is that these sorts of logs do enable tracking of this sort, in general, and that is what I see as the problem.
I don’t see any particular reason, other than what James mentioned above, why the browser should be enabling this sort of tracking by sending referers in the first place, either for embedded content or for linked content.
January 30th, 2006 at 10:52 pm
Hi. Interesting discussion I found via WWWAC.
I don’t know if it will assuage your concerns, but from the tracker’s point of view, an IP address is a pretty unreliable way of identifying an individual, for several reasons that I’ve discovered in my own work analyzing my sites’ log files. The most common reason is that IP addresses are assigned “dynamically” to many individuals who connect to the internet via an ISP, so each time an individual turns on the computer and connects to the internet, a new IP is temporarily assigned to them. Another reason is that individuals increasingly connect to the internet from multiple locations, so they are likely to present a variety of IP addresses in the course of normal internet use. A third reason is that some organizations “pool” users from throughout a network behind a single IP address for traffic to the web, so that everyone on the network presents the same IP address to the outside world, even though their actual computers have distinct IP addresses inside the network.
In your above example, Yahoo could tie the hits on Flickr together with the Yahoo login, but only if the IP address had not changed.
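A sketch of the correlation in the flickr/Yahoo example, with John's caveat built in: image hits are attributed to an account only when the same IP address logged in within a chosen time window, since a dynamically assigned address may belong to someone else later. The two-hour window, the record layout, and all names, addresses, and times are made up for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical image-server hits: (time, client IP, Referer).
IMAGE_HITS = [
    (datetime(2006, 1, 30, 9, 0), "203.0.113.7", "http://blog-a.example/post"),
    (datetime(2006, 1, 30, 9, 5), "203.0.113.7", "http://blog-b.example/post"),
    (datetime(2006, 1, 31, 20, 0), "203.0.113.7", "http://blog-c.example/post"),
]
# Hypothetical portal logins: (time, client IP, account).
LOGINS = [
    (datetime(2006, 1, 30, 9, 30), "203.0.113.7", "alice@example.com"),
]


def attribute_hits(hits, logins, window=timedelta(hours=2)):
    """Attribute each hit to an account if the same IP logged in within `window`."""
    attributed = []
    for hit_time, hit_ip, referer in hits:
        for login_time, login_ip, account in logins:
            if hit_ip == login_ip and abs(hit_time - login_time) <= window:
                attributed.append((account, referer))
                break
    return attributed


if __name__ == "__main__":
    for account, page in attribute_hits(IMAGE_HITS, LOGINS):
        print(account, page)
```

The next-day hit is left unattributed under the two-hour window; widening the window links more hits but makes each link less trustworthy.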
Cookies would be a more reliable means of tying the hits together and seeing what URLs a user goes to.
In general, though, I’d say you are correct to guess that featuring Flickr photos opens your site’s users up to examination in ways that most people don’t realize.
John
January 30th, 2006 at 11:11 pm
I have done some work in the past on user tracking based on correlating IP address with cookie values for registered users of a large e-commerce site, to figure out how to tie user sessions to a particular webserver based on IP address. Our finding was that somewhere between 15 and 20% of users, for some reason, had frequently changing IP addresses, from hopping around, dynamic reassignment, or IP pools.
Obviously, cookies are more reliable, if people are accepting cookies, but it’s not an either/or question. Frequently, cookies and a relatively static IP address are both present, but even if one or the other is missing, there are still varying levels of reliability with respect to tracking capability.
(See the previous post at http://www.aquick.org/blog/2006/01/29/whats-the-big-fuss-about-ip-addresses for more discussion about this.)
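A sketch of how a share like that might be measured from (cookie ID, IP address) records. The threshold for “changing” and the sample records are assumptions, and this is not necessarily the method used in the e-commerce study above.

```python
from collections import defaultdict


def share_with_changing_ip(records, min_distinct_ips=2):
    """Fraction of cookie IDs seen from at least `min_distinct_ips` different IPs."""
    ips_by_cookie = defaultdict(set)
    for cookie_id, ip in records:
        ips_by_cookie[cookie_id].add(ip)
    if not ips_by_cookie:
        return 0.0
    changing = sum(1 for ips in ips_by_cookie.values() if len(ips) >= min_distinct_ips)
    return changing / len(ips_by_cookie)


if __name__ == "__main__":
    # Made-up records: only cookie "c2" appears from more than one address.
    sample = [
        ("c1", "203.0.113.7"), ("c1", "203.0.113.7"),
        ("c2", "198.51.100.4"), ("c2", "198.51.100.9"),
        ("c3", "192.0.2.1"), ("c4", "192.0.2.2"),
    ]
    print(share_with_changing_ip(sample))
```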
January 31st, 2006 at 8:56 am
Adam (#3), so Yahoo now knows that user X has viewed a certain flickr image, and visited two websites.
Based on that information, I’ll repeat my question: what is the “potential for abuse”?
January 31st, 2006 at 10:19 am
Interesting ideas, and dialogue. It’s important that site operators remain transparent as to the data they are collecting on their visitors and how they use it. Third party services need to do this as well.
And as a side note, if you’ve logged into flickr, then you are sending your account id (and password) in the cookie on every other request to flickr, like when visiting this site. This is an example of how services sometimes take the low road on security best practices, which might lead folks to become even more paranoid!
January 31st, 2006 at 10:27 am
Not anymore. I’ve removed the flickr badge.
January 31st, 2006 at 11:38 am
I think there are various levels of “potential abuse” here, and any tracking this makes possible is simply another piece of the puzzle that results in more surveillance overall. Is there a specific danger that I can point to that may arise from flickr pictures being present on multiple sites? Probably not. I can’t say I’ve heard of one. That doesn’t mean it’s a good trend to encourage, and just because there hasn’t been a public case where this is a problem doesn’t mean there’s no problem brewing.
So, let’s talk about the picture of the current state of surveillance, which embedded third party content broadens in the general case, often without the knowledge of those participating. Suppose your government passes a law making something illegal that was legal at one point. Now suppose they want a list of all of the people who were looking at sites that were legal but aren’t anymore, and they get the usage records of those sites going back a few months or years, to compile a list of potential people who should be “monitored”. Do the records tying those IP addresses to real people still exist at that point? Maybe they do. Data retention regulations are starting to be adopted. Is it “okay” to monitor those people even though what they were doing was perfectly legal at the time? Now suppose it’s not something that’s been made illegal, but it’s just something that the government wants to suppress, like criticism of the government. Still okay? Now suppose it’s not the government, but a blackmailer who’s broken into the systems of the third party content provider and just wants some cash to not tell your boss you were looking at tentacle porn or whatever your thing is. Still okay?
Is your clickstream private? Is it public? Is it secret?
Is third party data aggregation of browsing habits any different from a company being hired to process library loans, and then doing whatever they want with the records of who’s reading what, possibly without the permission of the library?
January 31st, 2006 at 11:56 am
Yikes. If I lived in communist China, or the former U.S.S.R., I might have reason to worry about such repressive surveillance of otherwise law-abiding citizens. Thankfully, I don’t. But it makes you wonder what log data Google will willingly share with the red Chinese, as just another “cost of doing business”, similar to the way they agreed to censor search results on google.cn.
http://sayanythingblog.com/2006/01/26/what-oppression-looks-like/
January 31st, 2006 at 12:54 pm
Just something to mention in relation to this.
The referer header is completely unreliable. It’s trivial to spoof, and in fact there’s a very handy extension for Firefox (“refspoof”) that makes it super easy for even inexperienced folks to spoof referrers. It’s always been easy to do if you knew what you were doing at a command prompt.
That doesn’t really address the tracking of “normal people” issue, except to note that it would be stupid for sites to rely on that header actually containing real data. It does make it completely useless for web developers to try to control access to stuff with the referer header, and folks do try.
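To illustrate how little effort spoofing takes outside the browser, here is a sketch using Python's standard `urllib.request`: the Referer is just another header the client chooses. The URLs are placeholders, and nothing is sent until the request is passed to `urllib.request.urlopen`.

```python
import urllib.request


def forged_request(url, fake_referer):
    """Build a request whose Referer header says whatever the caller wants."""
    return urllib.request.Request(url, headers={"Referer": fake_referer})


if __name__ == "__main__":
    req = forged_request("http://www.example.org/protected/image.jpg",
                         "http://www.example.org/gallery")
    print(req.get_header("Referer"))
    # urllib.request.urlopen(req) would send the request with the forged value.
```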
January 31st, 2006 at 4:03 pm
Re: referer spoofing –
The referer header doesn’t have to be reliable to stop certain cross-site scripting attacks. It’s not intended to verify the user — *it’s intended to help a victimized user who doesn’t realize he’s being directed to a site as part of a cross-site scripting attack*.
This is a very interesting and particular case where it does some good.
The referer header stops certain cross-site scripting attacks by essentially asking a web user who has been tricked into submitting a malicious script: “Did you get here through the normal channels? If so, why is your referer wrong, empty, or spoofed?” If the user has spoofed the referer or turned it off, but did come through normal channels, she will be turned away until she enables the normal referer.
A malicious attack that succeeded despite a referer check would need to cause the browser to spoof the correct referer, in addition to pulling off the basic XSS attack. I am unaware of any such attack which can cause a browser to send a particular referer string. Note that this is *not* the same as the user being able to spoof the referer; the point is that the malicious page would need to somehow trick the user’s browser, unbeknownst to the user, into spoofing the referer that’s supposed to be safe. Again, I am unaware of any such attack.
Note that the website is not trusting the user to accurately report the referer. The website’s security doesn’t depend on the user being honest. The website is helping an unfortunate or clueless user who gets tricked by a cross-site scripting attack to notice that he or she did not mean to post a form with, say, a malicious script in it.
If the user is spoofing his or her own referer, the website will reject the post, which is good, possibly with a warning “You must accurately provide a referer in order to post to this site. If you did not intend to post to this site, you may have been directed here by a malicious page, email or other cross-site scripting attack.”
Now this is *not* sufficient to secure against all XSS attacks, but it is a common step, and in some cases it may be helping keep sites safer than they would otherwise be.
The referer header cannot be *relied* upon, but it can be used sort of like an air bag. The user can buy a car without it. The user can choose to remove the air bag. The user can decide to always wear a seat belt and drive with an air bag, or to use a seat belt but *not* to have an air bag. Certainly, the best alternative is not to crash the car, to always wear a seat belt, and to have an air bag. But if the user is not wearing a seat belt and a crash happens, the air bag still offers some protection. In this scenario, a referer check is like an air bag, and it is probably doing some good, some of the time.
Therefore, in the circumstances where it can help, I would recommend that the referer should continue to be supported.
January 31st, 2006 at 7:01 pm
Mr. Wetterau: You admit that the header is unreliable, but you’re still willing to use it to base security decisions on. Personally, I think that’s a bad idea (and I think most security professionals would agree).
In your airbag example, would you think much of that airbag being there if you knew it was so unreliable that it only offered about a 1% chance of doing any good at all? How about if your car had ten different safety devices — 8 airbags, a seatbelt, and a proximity sensor or something — but no single one of them had more than a 1-in-100 chance of working. How safe would you feel? Would you choose that scenario over having a single 99% reliable seatbelt?
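Sean's arithmetic can be checked directly, assuming the safeguards fail independently of one another (the comment doesn't say): the chance that at least one of n safeguards works, each with probability p, is 1 - (1 - p)^n.

```python
def chance_any_works(p_each, n):
    """Probability that at least one of n independent safeguards works."""
    return 1 - (1 - p_each) ** n


if __name__ == "__main__":
    print(f"ten 1% devices: {chance_any_works(0.01, 10):.1%}")  # about 9.6%
    print(f"one 99% seatbelt: {chance_any_works(0.99, 1):.1%}")
```

Even stacked ten deep, the 1% devices stay under 10%, far short of the single 99% seatbelt.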
If we can’t be certain the information in that header is real, what good does it do?
I’m dubious about your contention that there’s not a current way to get an XSS attack to spoof the referrer header, but I haven’t tried it so I’ll have to take your word on that.
January 31st, 2006 at 11:47 pm
Sean:
I did not say I was willing to use the referer. I said that the referer is so used by some people, and it would be good not to propose breaking that, because in some cases it’s doing some people some good.
As to the referer’s being unreliable — it is unreliable as a source of authentication, and I would *never*, ever use it for that purpose. It is not unreliable (as far as I know) in the case I mentioned. And there’s a very good reason for that, which you inquire about:
“If we can’t be certain the information in that header is real, what good does it do?”
That is a good question, for which I have a good answer. You see, it works like this:
Clueless user A gets an HTML email or sees a web form from malicious attacker B. This HTML email or web form purports to be a harmless link to something enticing, and the clueless user A violates the canons of good security and clicks on a button on the page.
The form has been set to post to a web site that makes some of user A’s sensitive account information available, such as possibly a credit card number. The site in question would permit a cross-site scripting attack, having been badly written, and the button the user clicked on will submit a form that will cause javascript to be posted and run and then expose the user’s cookie, thereby allowing for account stealing by malicious attacker B.
This *would* work, but the inept website managers have at least taken the precaution of validating that the referer URL for the form in question must be one particular other page on the site.
Now, the referer URL header may not be valid. User A’s browser may be set up to spoof it to be empty, or to be some random string.
In that case, the website will *infallibly* reject the submitted form. This is exactly the right thing to do!
If, on the other hand, the user has not set up his or her browser to do anything to the referer URL, the result will be that the referer URL will show that the user did not come from the correct page, and the form will be rejected; again, *infallibly* the right thing.
In fact, the only way the form will be accepted is if it comes from a static form elsewhere on the site, and the user has not munged the referer. This will, of course, require users of the site not to munge referers, but not munging is the norm; it covers the vast majority of cases. This only happens when user A does not follow a phishing link or inadvertently click on a malicious button on another site, but rather navigates the site in the expected way.
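The cases above can be sketched as a check that accepts the form only when the Referer is exactly the one expected page. The URL is hypothetical, and an exact string match is stricter than some real checks (it would reject the same page with a query string); it also says nothing about whether an attack could forge the header, the point Sean questions.

```python
# The one page allowed to submit this form (a hypothetical URL).
FORM_PAGE = "https://shop.example/account/edit"


def accept_submission(referer):
    """Accept the form only when the Referer is exactly the expected form page.

    A missing header (None), an empty string, a random string, or any other
    page all fail the comparison and are rejected.
    """
    return referer == FORM_PAGE


if __name__ == "__main__":
    scenarios = {
        "referer munged to empty": "",
        "referer munged to a random string": "xyzzy",
        "form posted from the attacker's page": "http://attacker.example/enticing.html",
        "normal navigation, referer untouched": FORM_PAGE,
    }
    for label, referer in scenarios.items():
        print(f"{label}: {'accepted' if accept_submission(referer) else 'rejected'}")
```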
In this case, the referer header would always block the XSS attack I was discussing (but possibly not other families of XSS attacks).
It will also not permit users who have munged their referers to use the form in the normal way. In that case, it will be a momentary annoyance for the tiny ultra-minority of users who do so, and who have to turn the referer defaults back on in order to use the form.
This technique never fails to prevent this particular category of attacks, and only inconveniences a tiny minority of users.
Now, is it worth it?
It really doesn’t matter what I think. The important fact is that these techniques have in fact been used by some sites. They shouldn’t rely on this as their solution to XSS, but in those cases it’s possible that they are actually preventing attacks, and accepting the trade-off of inconvenience for some of their users.
I do not want to encourage a practice that will result in making their sites less secure while they adapt, if they have knowingly and willingly accepted the inconvenience. If the referer were uniformly turned off, such sites would have to stop checking it, which in some cases would make the XSS attack feasible. Therefore, I do not want to advocate eliminating referers in all cases. I think restricting the referer to the case of submitting a form from one page on a site to another page on the same site is a sensible and pragmatic compromise that will allow these poorly audited sites to continue to benefit from their defense.
This defense has a 1 in 1 chance of working in the case it covers.
February 1st, 2006 at 1:04 pm
Thanks for the update James, I think I understand what you’re getting at now.
I finally found a document describing what I thought was possible: http://www.cgisecurity.com/lib/XmlHTTPRequest.shtml
“Note that Referer is considered (for some reason) to be a good way of validating that a browser-using non-malicious client is interacting with the site in the “expected” manner, i.e. not via CSRF or embedded frame. Referer validation is suggested in [3] to prevent CSRF, and in several other sources as a way to prevent leeching (linking to images in other sites). In this paper, I prove that given some conditions, the Referer can be completely spoofed at the client side, and that pages and images can be successfully pulled and displayed using a spoofed Referer (in some scenarios). As such, using the Referer can no longer be considered a security measure, at least not in HTTP requests (as opposed to HTTPS/SSL).”
Also note another issue: our inept website author is truly inept, and they try to “protect” themselves by detecting the referrer. Then they make a nice snazzy error page when the referer data is bad (so that folks know that they need to turn it on, for example). If they include the contents of the referer header itself in that error page, *that’s* a great avenue for an XSS attack itself, making the cure worse than the illness.
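If an error page does echo the referer, escaping it first closes that particular hole. A minimal sketch using `html.escape` from Python's standard library; the function name and message wording are illustrative.

```python
import html


def referer_error_page(referer):
    """Render a rejection page that echoes the Referer only after HTML-escaping it."""
    shown = html.escape(referer or "(none)")
    return (
        "<p>You must accurately provide a referer in order to post to this site.</p>"
        f"<p>Referer received: {shown}</p>"
    )


if __name__ == "__main__":
    print(referer_error_page('"><script>alert(1)</script>'))
```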
There is also a paper at http://www.cgisecurity.com/lib/XSS.pdf that shows how to use information gathered from the referer header via sending malicious e-mails to webmail accounts to attack the recipients.
This is probably getting a bit beyond what folks on this blog want to see, so if you want to talk more with me about it, you’ll find my e-mail address at my website linked above.
February 1st, 2006 at 1:59 pm
Please feel free to continue the discussion.
February 1st, 2006 at 5:24 pm
Sean: Thanks for pointing that out. At some point in the discussion I did wonder about AJAX based attacks, but I didn’t look into it.
One thing that’s not clear to me is whether such AJAX attacks would succeed in getting the user’s cookie. Here’s why: the traditional attack works by posting a form and relying on the browser to display the results, which may include malicious script. Since the result comes from a legitimate web site that’s trusted for the cookie, the malicious script can get the cookie value and cause it to be sent elsewhere.
But what about the AJAX case, where an object opens the connection? Would the cookie be accessible across that connection? Does the browser treat such connections as the same as a browsed URL for purposes of sending stored cookies?
If so, then I would agree that there’s absolutely no value left to checking the referer under any circumstances, and we might as well give up on it completely.
If not, then I think there is one very narrow and specific type of XSS attack that it helps forestall, and for the sake of those people who are using referer for that purpose, I would prefer not to give up on it completely.
As to the further ways that an inept site administrator can screw up with referer tags, such as by displaying them — yes, that certainly can happen.