The Google Subpoena Case: A Google Victory

On Friday, Judge James Ware, a U.S. District Judge in San Jose, CA, issued a decision in Gonzales v. Google, Inc., No. CV 06-8006MISC JW (Mar. 17, 2006), the case involving a government subpoena for Google search queries. A few days before Judge Ware released his opinion, he stated that he would be ordering Google to turn over some information, though not everything that the government was demanding. Media reports indicated a victory for the government, as these headlines suggest: “Judge Siding With Feds Over Google Porn Subpoena” (AP) and “Google Faces Order to Give Up Records” (Boston Globe).

But Judge Ware’s written decision strikes me as much more of a victory for Google and privacy than for the government.

The subpoena was issued because the government wanted information for use in ACLU v. Gonzales, No. 98-CV-5591, pending in the Eastern District of Pennsylvania. That case involves a challenge by the ACLU to the Child Online Protection Act (COPA), 47 U.S.C. § 231. Google wasn’t even a party to that case, but the government subpoenaed from Google (1) URL samples: “[a]ll URL’s that are available to be located to a query on your company’s search engine as of July 31, 2005” and (2) search queries: “[a]ll queries that have been entered on your company’s search engine between June 1, 2005 and July 31, 2005 inclusive.” Subsequently, the government narrowed its URL sample demand to 50,000 URLs, and it narrowed its search query demand to all queries during a one-week period rather than the two-month period mentioned above. Google still raised a challenge, and the government again narrowed its search query request, this time to only 5000 entries from Google’s query log.

Under Federal Rule of Civil Procedure 26, a subpoena may be quashed if the “burden or expense of the proposed discovery outweighs its likely benefit.” The court (Judge Ware) began by analyzing the government’s request for a URL sample, pointing out the paucity of the government’s explanation for its need for the information. The court observed:

The Government’s disclosure of its plans for the sample of URLs is incomplete. The actual methodology disclosed in the Government’s papers as to the search index sample is, in its entirety, as follows: “A human being will browse a random sample of 5,000-10,000 URLs from Google’s index and categorize those sites by content” and from this information, the Government intends to “estimate . . . the aggregate properties of the websites that search engines have indexed.” The Government’s disclosure only describes its methodology for a study to categorize the URLs in Google’s search index, and does not disclose a study regarding the effectiveness of filtering software. Absent any explanation of how the “aggregate properties” of material on the Internet is germane to the underlying litigation, the Government’s disclosure as to its planned categorization study is not particularly helpful in determining whether the sample of Google’s search index sought is reasonably calculated to lead to admissible evidence in the underlying litigation.

One would think, after reading this paragraph, that the government has failed to establish a justification for the URLs. Nevertheless, the court attempted to “imagine[]” and “envision” a possible use for the information the government is seeking. The court then concluded that it would “give[] the Government the benefit of the doubt.”

This was the partial victory that the government won, and it wasn’t a very big victory. The second half of the opinion was all Google. This latter part of the opinion dealt with the government’s demand for search queries — the part of its demand that implicated privacy. The court rejected the government’s request for the search queries — even after the government had repeatedly backed away from its initial demands. The government had begun by demanding two months’ worth of search queries (constituting millions of queries); it then backed down and demanded queries for just a one-week period (a substantial number of queries); and it recently had further retreated to asking for just 5000 queries. This was a dramatic retreat, but the court still sent the government packing.

According to the government, it planned to use the search queries as follows: “A random sample of approximately 1,000 Google queries from a one-week period will be run through the Google search engine. A human being will browse the top URLs returned by each search and categorize the sites by content.” The court, without much analysis, concluded that “were the Government to run these URLs through the filtering software and analyze the results, the information sought would be reasonably calculated to lead to admissible evidence.” Although the court ultimately denied the government’s demand, I wonder whether the court should have so quickly conceded the government’s need for the information. Why couldn’t the government just create its own search queries and run them through Google’s search engine? Why did it need a sampling of people’s searches? It could certainly conduct a study of how various searches work with filtering software by using its own queries. Moreover, the fact that the government had begun with a wildly broad request and narrowed it significantly should at least spark some skepticism about whether the government was engaging in a fishing expedition.

The court then turned to the considerations on the other side — the costs and burdens of Google’s production of the information. Google argued that it would lose user trust if compelled to reveal the searches. The court began by using Google’s privacy policy against it, stating: “Google’s privacy policy does not represent to users that it keeps confidential any information other than ‘personal information.’” The court then noted:

However, even if an expectation by Google users that Google would prevent disclosure to the Government of its users’ search queries is not entirely reasonable, the statistic cited by Dr. Stark that over a quarter of all Internet searches are for pornography indicates that at least some of Google’s users expect some sort of privacy in their searches. The expectation of privacy by some Google users may not be reasonable, but may nonetheless have an appreciable impact on the way in which Google is perceived, and consequently the frequency with which users use Google.

The court concluded that the government did not need both the URL samples and the search queries, and it required only the disclosure of the URL samples but not the search queries. The court concluded that “the marginal burden of loss of trust by Google’s users based on Google’s disclosure of its users’ search queries to the Government outweighs the duplicative disclosure’s likely benefit to the Government’s study.”

Beyond Google’s argument about customer goodwill, the court also raised general privacy concerns as a public policy interest implicated by the subpoenas. In Rule 26, “considerations of the public interest, the need for confidentiality, and privacy interests are relevant factors to be balanced.” Gill v. Gulfstream Park Racing Association, 399 F.3d 391, 402 (1st Cir. 2005). The government argued that it was only requiring the text of the search queries entered, not the identities of the users who entered them, and that therefore there was no privacy interest. But the court concluded:

Although the Government has only requested the text strings entered, basic identifiable information may be found in the text strings when users search for personal information such as their social security numbers or credit card numbers through Google in order to determine whether such information is available on the Internet. The Court is also aware of so-called “vanity searches,” where a user queries his or her own name perhaps with other information. . . . Thus, while a user’s search query reading “[user name] stanford glee club” may not raise serious privacy concerns, a user’s search for “[user name] third trimester abortion san jose,” may raise certain privacy issues as of yet unaddressed by the parties’ papers. This concern, combined with the prevalence of Internet searches for sexually explicit material — generally not information that anyone wishes to reveal publicly — gives this Court pause as to whether the search queries themselves may constitute potentially sensitive information.

Moreover, there is the problem of what I’ll call the subpoena two-step. Step One is using a subpoena to get a bunch of de-identified search queries. Then, if the government discovers search queries it deems “suspicious,” it can use a subpoena (Step Two) to get any identifying information (e.g., an IP address, which can be linked to a user’s identity via ISP records, and even sometimes a user’s name if a user has an account with Google). I blogged about this possibility in an earlier post on this case. The court also appeared to recognize this problem:

Even though counsel for the Government has assured the Court that the information received will only be used for the present litigation, it is conceivable that the Government may have an obligation to pursue information received for unrelated litigation purposes under certain circumstances regardless of the restrictiveness of a protective order. The Court expressed this concern at oral argument as to queries such as “bomb placement white house,” but queries such as “communist berkeley parade route protest war” may also raise similar concern. In the end, the Court need not express an opinion on this issue because the Government’s motion is granted only as to the sample of URLs and not as to the log of search queries.

The court explicitly did not address the ECPA argument made by Google.

Overall, I view this opinion as a victory for information privacy. The government was not entitled to obtain information about people’s search queries — even under its much narrower request for a sample of 5000 of them.

UPDATE: Philipp Lenssen at Google Blogoscoped points out: “In retrospect, this decision also shows that MSN, Yahoo and others gave away their search logs even when they didn’t have to – even when they could’ve successfully and legally opposed to it.”

Related Posts:

1. Solove, Do No Evil and Perhaps Do Some Good: Google, Privacy, and Business Records

2. Solove, Government vs. Google

3. Solove, Google’s Empire, Privacy, and Government Access to Data

4. Solove, Google’s New Privacy Policy

17 Responses

  1. Dissent - there’s too much assuming of the conclusion. Part of it has to do with the fact that the reasoning is obscure. But that doesn’t make it sinister or pretextual. That part of the ruling is interesting:

    “Additionally, this is not a case where the Government does not have the benefit of any information with which to form some basic methodology –the Government has already been to the pond and fished, so to speak, with data from AOL, Yahoo, and Microsoft, and it would not have been unreasonable at this stage to have required the Government to assist the Court in its determination of relevance by providing the Court with more information on its plans for the information sought from Google”

    That is, if it was a fishing expedition, THEY’VE FISHED ALREADY!!! In multiple ponds. So that would seem to be ruled out.

    I think the Govt. just doesn’t want to tip its hand as to its legal strategy.

  2. David Zaring says:

    Uh, not to be critical of my generous hosts, but although the government did narrow its initial search request, as one does in these sorts of cases, I note that its narrow search request was granted. Frankly, as it should have been – we’re past the days of fishing-expedition claims in discovery – that’s what protective orders are for.

  3. David writes: “Uh, not to be critical of my generous hosts, but although the government did narrow its initial search request, as one does in these sorts of cases, I note that its narrow search request was granted.”

    No, that’s not quite correct. As I said in my post, there were two prongs to the government’s request: (1) 50,000 URLs and (2) 5000 search queries. Discovery for the URLs was granted but NOT discovery of the queries. The request for the queries was the part of the government’s request that raised privacy concerns and was the focus of most of the media attention when the government first made the request (in much broader form). And with regard to the 5000 search queries request, part of the government’s argument was that it would be protected by a protective order, but the court still didn’t grant the discovery request.

  4. But the Court didn’t grant the request because they called it “duplicative”, which seems a much smaller victory than one might get the impression from secondary sources:

    “From this Court’s interpretation of the Government’s general statements of purpose for the information requested, both the sample of URLs and the set of search queries are aimed at providing a list of URLs which will be categorized and run through the filtering software in an effort to determine the effectiveness of filtering software as to certain categories. Both sources of the URL “test set” list seem to be open to the same sorts of criticism by Plaintiffs in the underlying litigation. The content of these objections are not germane to the Court’s determination of whether the information sought is relevant under the broad dictates of Rule 26, but the actual similarity of the two categories of information sought in their presumed utility to the Government’s study indicates that it would be unreasonably cumulative and duplicative to compel Google to hand over both sets of proprietary information. To borrow the Government’s vivid analogy, in order to aid the Government in its study of the entire elephant, the Court may burden a non-party to require production of a picture of the elephant’s tail, but it is within this Court’s discretion to not require a non-party to produce another picture of the same tail. Faced with duplicative discovery, and with the Government not expressing a preference as to which source of the test set of URLs it prefers, this Court exercises its discretion pursuant to Rule 26(b)(2) and determines that the marginal burden of loss of trust by Google’s users based on Google’s disclosure of its users’ search queries to the Government outweighs the duplicative disclosure’s likely benefit to the Government’s study.”

  5. Seth — the search queries were the part of the request that posed the greatest concern for privacy. In this regard, there is a difference between the search queries and the URLs. For the purposes of the government’s study, there may not have been much difference, but from the standpoint of privacy, there is a significant one.

    I basically read the opinion as saying politely that the government’s request is unnecessary, but the court will throw the government a tiny bone and deny the demand for information that can implicate people’s privacy. Even before the opinion, the government had already practically backed down — from requiring two months’ worth of searches (millions, perhaps billions of searches) to requiring just 5000. I see this as the government saying, in a sense: “Please, please just give us something — we’re now only asking for just a minuscule fraction of what we wanted.” And even this small scaled-back request was still denied. But the court is being somewhat nice about it — it provides the government the 50,000 URLs, which do not strike me as all that contentious or even all that important — even though the court has all but held that the government really hasn’t made a good justification for the need. In other words, the court seems to be throwing the government a bone so it can save a tiny bit of face.

    But in the big picture, the court seems to be suggesting that this is as far as the government can reach, and it isn’t very far. The government had originally asked for all the URLs in Google’s database (as of July 2005) and all search queries in a 2-month span. The extent to which the government backed down is extraordinary. And then it still lost partially in the case. So comparing what the government had wanted to what it ultimately got, the government did not do well at all. And I read the court’s opinion to be only very reluctantly giving the government its bone — had the government not backed down and scaled back its requests, I bet that there’s a chance that the government might have lost entirely.

  6. Daniel – Interestingly, I see the decision almost entirely the reverse. I see the court as basically deferring to the government, but tossing *Google* a tiny bone on the issue of the *perception* of privacy, so that *Google* can save face.

    My view is that the first request for all the data was basically out of ignorance. They’re lawyers (no offense meant!); they had no idea it was so absurdly large. Previously, they’d gone down to one million URLs and “a random sampling of one million search queries submitted to on a given day”. Then they got into some posturing and went up on the demands. But from day one, this has only been about a statistics study, never, ever about particular users’ behavior. They’ve never wanted to invade anybody’s privacy, and have all sorts of protective orders for that purpose. Note in fact the government is now insulated from certain objections to the study, because they can say that they tried to do better in terms of sampling, but other constraints required them to modify the methodology, so they’ve produced the best evidence legally possible.

    “But in the big picture, the court seems to be suggesting that this is as far as the government can reach, and it isn’t very far.”

    Again, I view the court as saying the exact opposite:

    “Nothing in this Order is intended to indicate how the Court would rule on the original broad subpoena or on any follow-up subpoena. The Court’s decision on this Motion to Compel reflects the limited use to which the Government intends to put the information produced in response to the subpoena. In particular, this Order does not address the Plaintiffs’ concern articulated at the hearing about the appropriateness of the Government’s use of the Court’s subpoena power to gather and collect information about what individuals search for over the Internet.”

    I view that as saying the government might be able to get more if they made a better argument, and that the judge is specifically *NOT* making an overall privacy ruling, especially when combined with the “duplicative” reasoning.

    Thus regarding: “So comparing what the government had wanted to what it ultimately got, the government did not do well at all.”

    I very strongly don’t think the original request is a valid basis for comparison. It’s like the civil lawsuits where the original demand is a zillion dollars in damages – it’s just for show, and each side knows it’s almost certainly not going to be the ultimate outcome (and here, I think the government didn’t even know they really didn’t want it, and couldn’t handle it if they got it!).

  7. Well, Seth, I guess we’ll just have to disagree on this one. I read the court’s caveats over the reach of its decision as basically saying: “It’s a balancing, so this opinion is not a statement that privacy always wins.” So yes, one cannot use this opinion in cases where the government’s justification is different, because the test is a balancing one, and the court’s decision is based in part on the weakness of the government’s side of the balance.

    I don’t find the analogy to civil damage demands to be apt. This does not strike me as a case of aiming high to improve one’s negotiating posture. I don’t think that the government’s initial demands were just for show.

    And in the end, the government didn’t really get much in this case. I find it very hard to interpret the result as a victory for the government.

    I think the initial demands were more out of ignorance, but it’s *like* the civil damages posturing over amounts. That is, a comparison that they asked for 4? 8? billion URLs, got only 50,000 URLs, is like so-and-so asked for 4? 8? billion dollars in damages, got $50,000 in damages. Pushing this metaphor, one could analogize the user searches to requests for punitive damages. I’d argue the proof that the big demand was just their starting point is that even before this went to court, the government was willing to go down to one million URLs on the database, and down on the searches. It seems a lot more sensible to me that the first big request was because they didn’t really understand it, and maybe figured they could go down (as they did) much more easily than go up.

    The victory I see for the government is that the government was able to get Google to contribute to its study, with giving almost no justification. Remember, *critically*, the big war is about the Child Online Protection Act (COPA) law, NOT, NOT, NOT, about search engine user privacy (though I’d definitely count the outcome as a *PR* victory for Google).

  9. David Zaring says:

    Sorry – that’s correct about the queries, but I do think the government is considering the ruling a victory. Full disclosure: I used to work on this case for said government. But I’m with Seth on this one: despite the nice skeptical language in the opinion, the result was consistent with that turned over by the other search engines.

    But to make a larger point, privacy is a very flexible term to use to avoid discovery – look how much useful information it swallows up in the discovery-like FOIA. To tweak my very brilliant host some more: would it even be a good idea to have third party privacy protected in discovery?

  10. David Zaring says:

    That is, protecting third party privacy as a general matter in cases like this one, and through anything other than a protective order (which are controversial in their own right, see Arthur Miller on that, but strike me as less problematic than the denial of discovery of material that’s going to tell us something of interest about the internet).

  11. David — I can’t quite tell where your descriptive analysis ends and your normative analysis begins. Descriptively, the government got a ton less than it originally had wanted, even under the very permissive standards for subpoenas. The key part of the battle was the search queries, not the URLs, and the number of search queries turned over was zero.

    Regarding the normative issues, I do find third party privacy to be very important and worth protecting, especially when the entity seeking the information is the government. I disagree with Dionisio, the Supreme Court case holding that subpoenas are not searches under the Fourth Amendment. I also disagree with Miller, Smith, and the other third party doctrine cases. I won’t rehash my arguments here, but you can read them in my book, The Digital Person, or in this paper.

    What I find interesting about this case is that the standards for allowing discovery of information pursuant to a subpoena are remarkably permissive (far too permissive in my view) — and yet the court still denies the government’s search query request even after the government has whittled down its request to nearly nothing.

  12. David Zaring says:

    Fair enough – descriptively we disagree. Normatively we might as well. I’m not sure how relevant it would be to the constitutionality of COPA, but, as someone interested in the internet, I’d be interested in seeing the results of the study that these subpoenas will permit to happen. Let’s hope that I – and everyone with similar interests – gets to see the results.

  13. David,

    I’d be very interested in an explanation for why the government needed the information. If the rationale was to test filtering software, why couldn’t it have found URLs on its own and created its own search queries? I’m really straining to understand why the government even bothered to make the requests in the first place. And why did the government want such a large amount of information initially and then back down? This undercuts its credibility to some extent in its argument that it needs the information.

    In other words, why can’t the study be conducted without the information from Google? The government’s request makes little sense to me.

    Anyway, the reason why descriptively I believe that the case was a victory is because subpoenas are rarely quashed, and the subpoena standards are very pro-disclosure.

  14. David writes: “I’m not sure how relevant it would be to the constitutionality of COPA, but, as someone interested in the internet, I’d be interested in seeing the results of the study that these subpoenas will permit to happen. Let’s hope that I – and everyone with similar interests – gets to see the results.”

    That seems to me to be a problematic justification to use the subpoena power to gather information — to conduct a study that would be of interest to people. I’d sure like the power to commandeer Microsoft’s software team to make software I want, or to commandeer another company’s information to conduct empirical research for me. I can dream up a lot more cool research projects and studies . . . I wanna go work for the DOJ so I can find some way to force companies to help me out on them!

  15. Regarding: “The key part of the battle was the search queries”

    Actually, I’m not sure that’s true for the *government’s* *goals* *in* *COPA*. That is, it’s pretty interesting that they gave almost no *reason* why they needed it apart from sampling. It would be amazingly anti-climactic if it turned out that the search queries are overkill.

    On second thought, I should back off from that. It would be better to say there are two parts to what the government is trying to study, call them “existence” and “popularity”. And “popularity” (which needs the search queries) is probably the more important part of the two.

    I don’t think the judge really grasped what the study is trying to do, which is understandable given that the government was so vague, which is also understandable given that they don’t want to lay out their strategy.

    I suppose if it’s really important, we may see another round of this case later on.

  17. David Zaring says:

    Okay, I can’t resist one more comment. I agree with Seth’s comments above about the existence and popularity being a possible result of the study. Dan and I don’t completely disagree on the oddness of broad subpoena powers, but I think it could be less bad than requirements that subpoenas be targeted and their purpose be directly relevant to whatever (though that would certainly be good for the lawyers). And I don’t think the government is commandeering anything here – it’s analyzing information that Google wanted to keep private. Just because the government could have performed other kinds of analysis doesn’t mean that it shouldn’t have been allowed to perform this kind of analysis. And with that, I cede the floor.