Google’s Empire, Privacy, and Government Access to Personal Data

A New York Times editorial observes:

At a North Carolina strangulation-murder trial this month, prosecutors announced an unusual piece of evidence: Google searches allegedly done by the defendant that included the words “neck” and “snap.” The data were taken from the defendant’s computer, prosecutors say. But it might have come directly from Google, which – unbeknownst to many users – keeps records of every search on its site, in ways that can be traced back to individuals.

This is an interesting fact: Google keeps records of every search in a way that can be traced to individuals. The op-ed goes on to say:

Google has been aggressive about collecting information about its users’ activities online. It stores their search data, possibly forever, and puts “cookies” on their computers that make it possible to track those searches in a personally identifiable way – cookies that do not expire until 2038. Its e-mail system, Gmail, scans the content of e-mail messages so relevant ads can be posted. Google’s written privacy policy reserves the right to pool what it learns about users from their searches with what it learns from their e-mail messages, though Google says it won’t do so. . . .

The government can gain access to Google’s data storehouse simply by presenting a valid warrant or subpoena. . . .

This is an important point. No matter what Google’s privacy policy says, the fact that it maintains information about people’s search activity enables the government to gather that data, often with a mere subpoena, which provides virtually no protection to privacy — and sometimes without even a subpoena. In my book, The Digital Person, and in an earlier paper, Digital Dossiers and the Dissipation of Fourth Amendment Privacy, 75 S. Cal. L. Rev. 1083 (2002), I argued that today an increasing amount of detailed personal data is being maintained by various companies, merchants, and organizations. The Supreme Court has held that the Fourth Amendment does not protect against the government accessing records maintained by third parties. In United States v. Miller, 425 U.S. 435 (1976), for example, the Supreme Court held that people lack a reasonable expectation of privacy in their bank records because “[a]ll of the documents obtained, including financial statements and deposit slips, contain only information voluntarily conveyed to banks and exposed to their employees in the ordinary course of business.”

The New York Times op-ed goes on to criticize Google for not being a leader in protecting privacy:

It is hard to believe most Google users know they have a cookie that expires in 2038, or have thought much about the government’s ability to read their search history and stored e-mail messages without them knowing it. . . .

Google should develop an overarching privacy theory that is as bold as its mission to make the world’s information accessible – one that can become a model for the online world. Google is not necessarily worse than other Internet companies when it comes to privacy. But it should be doing better.

I agree with the op-ed, but I also think that businesses should use their power to push for greater legislative protections of personal information from government access. It is here where Google’s interests and the privacy interests of its users coincide. Right now, the government is inadequately regulated when it comes to accessing personal data maintained by third parties. If the businesses maintaining the data lobbied Congress for greater protections, this would help to address one of the major privacy threats that their maintaining the information poses. It wouldn’t solve all of the problems, but it would address a big one.

Thanks to Chris Hoofnagle and Steve Charnovitz for pointing me to this op-ed.

  1. E says:

    Actually, the scope of the information that Google collects goes much further than just one’s searches through Google and using other Google services such as GMail. The now ubiquitous Google Ads and the new Google Analytics service in use by a number of websites allow Google to track most of one’s visits, queries, and other activities throughout the Web. Additionally (though I am not aware of any such current activities), aggressive use of web browser technologies such as Javascript further enables fairly advanced analysis of what you are looking at, clicking, etc. to be relayed back home – all performed by your own web browser without your knowledge.

    Given the scope of this activity, it is pretty much impossible to “opt out” of Google’s data collection practices, or simply to surf anonymously. Instead, one has to take fairly aggressive actions not to participate. One might attempt various rudimentary actions, such as cookie blocking, to avoid such tracking, but it is fairly simple to correlate one’s IP address with one’s identity. Furthermore, more aggressive techniques such as anonymous proxies, URL blocking, etc. raise issues for websites dependent upon advertising. If I somehow manage to block Google Ads, is it fair to a website that counts on Google Ad revenue in order to operate? Additionally, one still has to be fairly vigilant to remain an anonymous surfer.

    This is not to say that Google is acting with any malicious purpose, or even is not making good use of the data it is collecting to provide better services. But the corpus of personal data amassed by Google, coupled with information found in various private databases, raises huge concerns. What one looks for and looks at on the web, activity which is only increasing over time, seems to already have become available for access for law enforcement and national security purposes. It seems that discovery in civil suits is not far behind. If I have a well-placed source working at Google, how much of this information might I be able to offer up for sale? Private investigators have done this for some time through the DMV, etc. What if someone manages to buy Google? This information is a significant asset for all kinds of purposes. What if Google decides to offer new services for employers to screen employees? Coupled with all kinds of other databases they could give you a complete picture of all aspects of a prospective employee’s life. There are a lot of “what ifs” about these huge databases that are not currently being addressed, and it is not even fully understood what it means when this data persists indefinitely.

  2. Bruce says:

    A couple of things:

    I think it’s important not to get carried away here, and instead to figure out what Google is actually doing and how that differs from what anyone else is doing.

    Google’s ability to identify your searches has nothing to do with cookies and is simply a function of their properly employing two very useful technologies: using GET for their search results and logging IP addresses. Using GET means that you can link directly to Google searches, which is very useful. Logging IP addresses of visitors, and what pages they view, is common practice on the web, for both security and web development reasons. Since Google pages contain the search terms right in the URL of the page, that means that if you know someone’s IP address, and you have access to Google’s server logs for the time period in question, then you can figure out what they searched for. So far, Google isn’t really any different from any other business (including your ISP) that has personal information about you.
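
The lookup described above can be sketched in a few lines. This is a minimal illustration, not Google's actual logging setup: the log layout (loosely modeled on a common access-log style), the sample IP addresses (taken from documentation-only ranges), and the function name `searches_by_ip` are all assumptions made for the example.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical access-log lines; the layout and all values are invented for illustration.
SAMPLE_LOG = [
    '203.0.113.7 - - [01/Dec/2005:10:15:32 +0000] "GET /search?q=privacy+law HTTP/1.1" 200',
    '198.51.100.4 - - [01/Dec/2005:10:16:01 +0000] "GET /search?q=weather HTTP/1.1" 200',
    '203.0.113.7 - - [01/Dec/2005:10:17:45 +0000] "GET /search?q=fourth+amendment HTTP/1.1" 200',
    '203.0.113.7 - - [01/Dec/2005:10:18:02 +0000] "GET /about.html HTTP/1.1" 200',
]


def searches_by_ip(log_lines, ip):
    """Return the search terms requested from a given IP address."""
    terms = []
    for line in log_lines:
        # The client IP is the first space-separated field.
        client_ip = line.split(" ", 1)[0]
        if client_ip != ip:
            continue
        # The request line sits between the first pair of double quotes.
        request = line.split('"')[1]
        method, url, _protocol = request.split(" ")
        if method != "GET":
            continue
        parts = urlsplit(url)
        if parts.path != "/search":
            continue
        # Because the search uses GET, the query terms are part of the logged URL.
        terms.extend(parse_qs(parts.query).get("q", []))
    return terms


print(searches_by_ip(SAMPLE_LOG, "203.0.113.7"))
```

Nothing in the sketch involves a cookie: an IP address plus the logged URL is enough, which is the point the comment makes.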

    The cookies are a different matter, but I think it’s important to keep in mind that cookies are not inherently nefarious and are used for all sorts of legitimate purposes. I checked my own cookies folder and there are 2 cookies written by Google on there — one is labelled PREF, which sounds like it might be used to track my personal preferences, e.g. for Google News. The other seems to have something to do with Blogspot — my guess would be it allows automatic logins for comments on blogs. Of course either cookie ID could be matched to my IP address and page searches, in which case Google would be able to match my preference selections to my IP address and page views — but so what? I don’t think that cookie can be matched to my name and address unless I tell them who I am at the same time they can read the cookie, and I haven’t done that to my knowledge. And the 2038 expiration just means that the cookie will remain until deleted or you buy a new computer, which for most people happens every few years or so. If it was shorter — say, one month — that means I would have to reset my Google News preferences once a month, which would be a pain. It doesn’t mean that Google will retain their server logs for that long — I seriously doubt they have the money for or the interest in that amount of storage space.
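
One plausible reading of the 2038 date, which the comment above does not state and which is an assumption here, is that it sits near the ceiling of timestamps stored as signed 32-bit Unix time. A short sketch of that arithmetic:

```python
from datetime import datetime, timezone

# Largest value a signed 32-bit integer can hold.
MAX_INT32 = 2**31 - 1

# Interpreted as seconds since 1970-01-01 UTC, that value falls in January 2038.
latest = datetime.fromtimestamp(MAX_INT32, tz=timezone.utc)
print(latest.isoformat())  # 2038-01-19T03:14:07+00:00
```

On that reading, a 2038 expiry is simply "as late as possible" rather than a deliberate 30-year retention plan, which fits the comment's point that the date says little about how long server logs are kept.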

  3. Mike says:

    Dan, is there anything to prevent the government from issuing a subpoena to Google requiring them to provide information on every person who, say, searches for [child pr0n]? I ask because after the Candyman case, it seems like the government would have a strong argument supporting probable cause to search the computers of everyone Googling such terms. So, essentially, the government would be saying the following: “Google has in its possession information regarding people whose computers are being used to commit a crime. Therefore, Google has to turn this information over to us.”


  4. Dave! says:

    Additionally, you can clear your cookies… and a cookie that doesn’t expire until 2038 is really just for convenience… what are the odds you’ll be using the same browser or the same computer until 2038??

    The logging of searches by IPs is also dubious. For example, 1) I don’t have a static IP at home, so there’s no way to prove 100% that any search linked to an IP was/is my search; 2) IPs can be shared/proxied so they are not 100% conclusive to link to a single person (or even entity, necessarily).

    Although I agree, the idea of the government being able to subpoena Google’s records to determine what you’ve been searching for is frightening.

    It leaves me with a neophyte (read: law student) question though… given that an IP != person, wouldn’t it be really difficult to meet the requirements for authentication to enter this kind of evidence? (Supposing the evidence resided on Google’s servers, not on your own personal computer.)

  5. E says:

    The big problem with the Google cookie is that it attaches a universal ID to anything done by your browser with any Google property – whether you choose to or not. Whatever the cookie may have been named is at best an indication of what it might have done at some point, and says nothing as to what additional uses the information it represents may currently serve or be put to later. Also, note that the cookie is for the entire domain, which, as I stated before, includes Google services that people do not choose to engage in, are very pervasive across the web, and end up recording many of the things that they do online. The only thing one receives in exchange is (possibly) better targeted ads.

    As far as Google’s data retention policies, they have already demonstrated their intention to retain the data they collect. One of their key technologies is aggregating inexpensive hardware to produce massive storage systems. As I understand it, they are currently putting about 3.5 petabytes of storage in a 40-foot container. At an estimate of 1 billion logged transactions a day (which may be low given 250M searches/day in early 2003), that represents about 2.5 years of logs in one trailer. Also, Google was the first to offer 1+ GB email, not simply out of hopes that people would not use the space, but because they simply could do it. In my experience, Google has taken the view that storage is cheap, getting cheaper by the week, and they might as well retain everything – probably an easy mindset to adopt once you’re doing things like caching significant amounts of the Internet’s content. We have already seen snapshots of what Google has retained thus far through last April’s introduction of their Personal History Feature. A number of people were surprised that suddenly they had access to a significant chunk of their past search history – they didn’t really consider that Google could & might choose to keep all of this information around. One article noted that the “service is designed to store years of each individual’s search activity.”
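
The storage estimate above can be checked with back-of-the-envelope arithmetic. The comment gives the container capacity, the daily transaction count, and the resulting 2.5 years, but not the size of a log record. The sketch below derives the record size those figures imply, assuming decimal petabytes and 365-day years.

```python
# Figures taken from the comment above.
CONTAINER_BYTES = 3.5e15        # 3.5 petabytes, assuming 1 PB = 10**15 bytes
TRANSACTIONS_PER_DAY = 1e9      # 1 billion logged transactions per day
YEARS_OF_LOGS = 2.5

# Derived: bytes each logged transaction would occupy for the figures to agree.
total_transactions = TRANSACTIONS_PER_DAY * 365 * YEARS_OF_LOGS
bytes_per_transaction = CONTAINER_BYTES / total_transactions

print(f"{bytes_per_transaction:.0f} bytes per logged transaction")
```

The result is a few kilobytes per record, so the 2.5-year figure implicitly assumes fairly verbose log entries; leaner records would stretch one container further.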

    As far as linking this cookie value to your actual identity, I would assert that in many cases doing so is not only possible, but simple. For those who have GMail accounts, the exercise is trivial. Accessing a GMail account from your laptop, desktop, and office connects them all. I am not suggesting that Google is doing so today with any nefarious purpose – what do they care about the details of Joe Schmoe #5334’s life? I am simply pointing out that it can be done by anyone with access to the pool of information assembled by Google.
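
The linkage described above can be illustrated with a toy join. Everything here is hypothetical: the record layout, cookie IDs, addresses, and account name are invented, and nothing in this sketch is meant to describe how Google actually stores or processes data.

```python
from collections import defaultdict

# Hypothetical log records: (cookie_id, ip_address, signed_in_account or None).
RECORDS = [
    ("cookie-A", "203.0.113.7", None),                # laptop, anonymous search
    ("cookie-A", "203.0.113.7", "joe@example.com"),   # laptop, signs in to mail
    ("cookie-B", "198.51.100.4", "joe@example.com"),  # office desktop, signed in
    ("cookie-B", "198.51.100.4", None),               # office desktop, anonymous search
    ("cookie-C", "192.0.2.55", None),                 # unrelated browser, never signs in
]


def link_cookies_to_accounts(records):
    """Map each account to every cookie ID seen alongside a sign-in to it."""
    cookies_by_account = defaultdict(set)
    for cookie_id, _ip, account in records:
        if account is not None:
            cookies_by_account[account].add(cookie_id)
    return dict(cookies_by_account)


def attributable_records(records, account):
    """Return every record, signed in or not, that shares a cookie with the account."""
    cookies = link_cookies_to_accounts(records).get(account, set())
    return [r for r in records if r[0] in cookies]


linked = attributable_records(RECORDS, "joe@example.com")
print(len(linked))
```

A single sign-in per machine is enough to tie that machine's otherwise anonymous records to the account, while the browser that never signs in stays unlinked.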

    You point out that some or most of these practices are common in the industry – many websites perform IP logging and use cookies. If nothing else, I would suggest that the sheer scale of Google’s activities invites closer scrutiny. This is beyond the idea that one theoretically could correlate a super shopper card database with banking records and credit card records to figure out pretty much everything you’ve purchased. Google’s retained information pretty much points to everywhere one has gone online. Recently there was concern over states tracking cell phones for the purpose of monitoring traffic congestion, and promises were made that the collected data would be aggregated and anonymized for privacy purposes. Why should people be any less concerned about the much more thorough tracking of their movements in cyberspace performed by Google? Shouldn’t we figure out just what will be done when everything is not only logged, but also sits around indefinitely? What if Google is already at that point – or will be in the very near future? Should a party to civil litigation be able to obtain this information through discovery? It would be very helpful to have this mountain of information to impugn someone’s character for places visited online.

    Such possibilities remind me of the old email mantra: never send anything by email that you wouldn’t want to see on the front page of the newspaper. Should it be the same for what we do online? As it stands now, Google sees a lot of what people do online – it’s just a matter of whether they choose to remember it.

  15. oldfox says:

    Does the NYTimes recommend that Google be categorized as a news outlet like itself so it can then raise the veil of “newsman privilege” before cooperating with law enforcement agencies?

    The NYTimes has no such privilege (nor any obligation to restrict its own information collection activities) and why should Google be any different?

    The Times has lost its power of reason in a muddle of partisanship. They really have.

