Adam Fields (weblog)

This blog is largely deprecated, but is being preserved here for historical interest. Check out my index page at adamfields.com for more up-to-date info. My main trade is technology strategy, process/project management, and performance optimization consulting, with a focus on enterprise and open source CMS and related technologies. I write periodic long pieces here; shorter stuff goes on twitter or app.net.

3/16/2007

ISPs apparently sell your clickstream data

Apparently, “anonymized” clickstream data (the URLs of the websites you visited, and in what order) is available for sale directly from many ISPs. There is no way that this is sufficiently anonymized. It is readily obvious from reading my clickstream who I am – URLs for MANY online services contain usernames, and anyone who uses any sort of online service is almost certainly visiting their own presence far more than anything else. All it takes is one of those usernames to be tied to a real name, and your entire clickstream becomes un-anonymized, irreversibly and forever.
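As a rough illustration of how little work this could take, here is a minimal Python sketch. It assumes a clickstream is just a list of URLs with the subscriber's name stripped, and it uses a couple of made-up URL shapes that embed an account name. The patterns, the example.com hosts, and the `candidate_usernames` helper are all hypothetical, not anything an ISP or data buyer is known to use.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Hypothetical URL shapes that embed an account name; real sites vary widely.
USERNAME_PATTERNS = [
    re.compile(r"^/(?:users?|people|profile|photos)/([A-Za-z0-9_.-]+)"),
    re.compile(r"^/~([A-Za-z0-9_.-]+)"),
]


def candidate_usernames(urls):
    """Count how often each username-looking path segment appears."""
    counts = Counter()
    for url in urls:
        path = urlparse(url).path
        for pattern in USERNAME_PATTERNS:
            match = pattern.match(path)
            if match:
                counts[match.group(1).lower()] += 1
                break
    return counts


# An invented "anonymized" clickstream for one subscriber.
clickstream = [
    "http://photos.example.com/photos/jdoe42/",
    "http://news.example.org/story/1234",
    "http://photos.example.com/photos/jdoe42/sets/",
    "http://social.example.net/profile/jdoe42",
    "http://photos.example.com/photos/someone_else/",
]

# The most frequently visited "username" is a strong guess at the subscriber.
best_guess = candidate_usernames(clickstream).most_common(1)
```

The name that shows up most often is probably the person whose clickstream it is, since people visit their own pages far more than anyone else's.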

I’ve talked about the dangers of breaking anonymization with leaking keys before:

Short answer: It is not enough to say that a piece of data is not “personally identifiable” if it is unique and exists with a piece of personally identifiable data somewhere else. More importantly, it doesn’t even have to be unique or completely personally identifiable – whether or not you can guess who a person is from a piece of data is not a black and white distinction, and simply being able to guess who a person might be can leak some information that might confirm their identity when combined with something else.
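To make the "combined with something else" part concrete, here is a small Python sketch of joining an "anonymized" data set against a second data set that does carry names. The field names (`zip`, `birth_year`), the records, and the `link_records` helper are invented for the example; the point is only that a shared attribute does not need to be unique to narrow the field.

```python
def link_records(anonymized, identified, keys):
    """Map each anonymized record id to the names sharing its key values."""
    index = {}
    for person in identified:
        key = tuple(person[k] for k in keys)
        index.setdefault(key, []).append(person["name"])
    return {
        record["id"]: index.get(tuple(record[k] for k in keys), [])
        for record in anonymized
    }


# Invented data: the "anonymized" set has no names, only leaked attributes.
anonymized = [
    {"id": "u1", "zip": "10001", "birth_year": 1975},
    {"id": "u2", "zip": "10002", "birth_year": 1980},
]
# A second, unrelated data set that does include names.
identified = [
    {"name": "Alice Example", "zip": "10001", "birth_year": 1975},
    {"name": "Bob Example", "zip": "10002", "birth_year": 1980},
    {"name": "Carol Example", "zip": "10002", "birth_year": 1980},
]

matches = link_records(anonymized, identified, keys=("zip", "birth_year"))
```

Here `u1` resolves to a single name, while `u2` is only narrowed to two candidates. That second case still leaks information, because one more attribute from anywhere else would be enough to settle it.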

This is also completely setting aside the fact that you have very little direct control over much of your clickstream, since there are all sorts of ways for a site you visit to get your browser to load things – popups, javascript includes, and images being the most prevalent.
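For a sense of how this happens, the Python sketch below lists the URLs a browser would typically fetch on its own after loading a page, without the reader clicking anything. The page markup and hostnames are made up, and the `ResourceCollector` class is a simplification that only looks at `src` attributes on a few tags; real browsers load far more than this.

```python
from html.parser import HTMLParser


class ResourceCollector(HTMLParser):
    """Collect src URLs from tags that browsers usually load automatically."""

    AUTO_LOADED_TAGS = ("img", "script", "iframe")

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag in self.AUTO_LOADED_TAGS:
            src = dict(attrs).get("src")
            if src:
                self.urls.append(src)


# Invented page: the reader asked for one URL, but two more get requested.
page = (
    "<html><body>"
    '<img src="http://tracker.example.com/pixel.gif">'
    '<script src="http://ads.example.net/a.js"></script>'
    "<p>An ordinary article.</p>"
    "</body></html>"
)

collector = ResourceCollector()
collector.feed(page)
extra_requests = collector.urls
```

Every entry in `extra_requests` would show up in the clickstream alongside the page the reader actually chose to visit.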

Preserving anonymity is hard. This is an egregious breach of privacy. Expect lawsuits if this is true.

http://internet.seekingalpha.com/article/29449



8 Responses to “ISPs apparently sell your clickstream data”

  1. Aaron Hitchcock Says:

    Maybe this is due to web-naivete or possibly naivete in general, but I don’t understand why anonymity
    is so important. Are people afraid of a big-brother type figure seeing something subversive and limiting
    freedoms? Are people worried about employers checking on how much porn we look at at work and at home?
    Are people worried about marketing agencies targeting us due to our buying patterns? I agree, these are
    concerns. I just wonder how important they are.

    I ask with this kind of mindset: my father has gotten a cell phone. I don’t think
    we suddenly have more to say to each other; we just now have a more readily available conduit for communication.
    Sure we talk more, but what about? Why does he keep calling me? Was he this compelled to talk to me before?
    Which leads me to think why, with internet use, do I need to worry about anonymity? I didn’t before.
    Before the internet, or mass communication and the like, your business was up for everyone. My grandmother
    tells stories about party-line telephones, where anyone could pick up the phone and listen to whatever
    conversation was going on. Why are WE so concerned about anonymity?

    It’s something I’d really like to understand more. Maybe I’m misunderstanding the whole thing. Could you get
    back to me if you get a chance? Thanks.

  2. dph Says:

    Aaron: In my view, there are valid reasons for wanting to have some control over what you choose to reveal and what you do not. Privacy as a concept doesn’t mean that you necessarily hide things; it means that you have control over what you reveal to whom. The erosion of anonymity in your life (and a significant component of how many people live is conducted online, whether it’s where they order takeout from, where they go on vacation, or the consequences of potential medical conditions) also erodes your privacy.

    Tied to this notion of control is also the notion of context. Think of the things that you have clicked on or searched for that, if taken out of the context of the moment, could give someone a significantly skewed version of who you are or the life that you live. What happens if a potential employer finds that I have performed lots of searches on identity theft, insider threats to organizations, social engineering and the like? Am I a threat to that organization, or am I someone who is curious about this because I am a responsible member of society who wishes to contribute to the dialog involving these issues? Why ask for clarification when it is easier to just not bring me in for an interview? I get no opportunity to set the record straight or address the concerns.

    It would be similar to having all of your financial records available, with no way for you to know what was there and no way to correct an inaccuracy. If a mistake or a legitimate disagreement were to make it onto the record, you would have no dispute resolution process to turn to. You wouldn’t even know what happened; all you would know is that you can’t even get a credit card or a loan secured by collateral. Not only that, but ANYONE can get access to these records if they pay 29.95. I don’t know how you feel about the girl that you went out on a date with, or your neighbor, having access to all of your financial information, but that isn’t a particularly cheery thought to me.

    D

  3. adam Says:

    I’ve partially addressed this question in one of the previous threads:

    http://www.aquick.org/blog/2006/01/30/flickr-pictures-web-beacons-and-a-modest-proposal/#comment-667

    This is a pretty complicated question with many aspects. I think we’re more concerned with anonymity because there are more ways to breach it with machines, and they’re distributed unevenly. When your business was up for everyone, it probably wouldn’t travel very far even though it was pretty open. I think we’re just getting to the tip of the iceberg of being able to imagine what people can do with this data. It’s very important to note that once it’s out, there’s absolutely no way to contain it. It’s also very important to note that actions have been and will continue to be taken on the basis that this information represents a deliberate intent – that is, that because a site shows up in your clickstream, you chose to visit that site. In many cases, that’s a false assumption, and is asking for trouble. We’re just starting to see how this can play out – for an extreme example, look at the Julie Amero case:

    http://en.wikipedia.org/wiki/Julie_Amero

    In any event, it needn’t be that complicated. Who cares if it’s actually damaging? I have an expectation of privacy with respect to my online activities, and I have an expectation of a reasonable categorization of information as personally identifiable or not. My ISP’s privacy policy says that personally identifiable information that may be shared is limited to mailing address information. My point is that clickstream data (and IP address, and other) is often lumped in with non-personally identifiable information for the purposes of these agreements, even if that information can be used to identify you, or used as a significant piece of the puzzle in doing so. That is, in my opinion, a gross misrepresentation of the reality behind the intent of those agreements.

  4. macguy Says:

    Um, yeah, our clickstream data is being sold, collected and analyzed by marketers. Companies like Compete, Inc. buy terabytes of clickstream data directly from ISPs and analyze the data for shopping behavior (are more people shopping for Toyota Camrys this month?). I’m pretty sure that the information is held privately and is completely anonymized, but if you are active online, as others have mentioned, your personal data could appear, one would think.

  5. John Says:

    Aaron: somebody who was part of the Spanish Inquisition once said “give me six sentences spoken by an honest man, and I will give you a heretic”.

  6. Andrew Says:

    Hitwise has been buying data from ISPs for years, although they have very strict policies about privacy. They do not release user data, only aggregate data, as far as I’m aware. Nothing new.

  7. Blinky the Hitman Says:

    ROTFLMFAO! I hope to God they ARE selling mine, because I’ve got a few 4K-character Firefox bookmarklets I run for an hour or so every night, that sequentially go to hundreds of the most godawful places on the web. I also have one that runs hundreds of Google (read: Toadboys-of-the-MegaCorpCabalGov) searches for such phrases as “Pig-Tailed Hookers on Tricycles”, “Best Deal on Gro-Lights in Crawford, TX”, and “Blueprints for Homemade Flamethrower”.

    Screw Diebold, I vote with JavaScript. :-]

  8. A BoingBoing Reader Says:

    In the party line days, there was parity: the people who could listen in on your business were the same ones whose business you could listen in on.

    With this sort of data-mining, the information flow is all one-way: if my ISP collects and sells my clickstream, I don’t get to see personal data about the ISP owners or board of directors, or personal information about the company that buys the clickstream and mines it for information, which it then sells to another organization that is protective of its “trade secrets.”

    I hear “honest people have nothing to hide” a lot–often from people who would turn around and defend the government’s right to classify all sorts of information from the citizenry, and who aren’t about to publish their own phone numbers, let alone the complete list of who they spoke to in the last month, and what porn sites they themselves browsed.

    I’m giving my ISP money for a service, not the other way around.

    [Under the circumstances, it seems fitting not to put my actual contact info here.]
