Why I’m in Favor of the Google Book Search Settlement

When Google announced their book-scanning project five years ago, most academics I talked to about it were pretty happy. These days a lot of that enthusiasm seems, if not to have disappeared, then at least to have been tempered by serious doubts. I share some of these, but on the whole the settlement is a profoundly good thing. I support it, and I hope my colleagues will, too.

About the settlement

First, two notes: One on the underlying legal issue and one on what’s at stake. The publishers and authors (via the AAP and the Authors Guild, respectively) sued Google for alleged “massive copyright infringement” shortly after Google began scanning books from several prominent libraries. The theory is that because Google makes a copy of every book they scan, they require the rightsholders’ permission to assemble their book search database. Google says the process is covered by the fair use exception in copyright law and is no different from their Web search business, which also copies texts in order to index them. How this would be decided in court is unknown, mostly because the legal definition of fair use is extremely and deliberately vague.

But it’s clear that Google has a lot more to lose than do the publishers and authors if the case were to go against them. If the publishers were to lose, Google could index their stuff without further permission. But it’s hard to see how they’d be hurt by that, since it would only help people find books and wouldn’t change the strong basic copyright protections they already enjoy. Google still wouldn’t be able to sell or give away in-copyright books, for instance. Google, on the other hand, could be destroyed if they were to lose. They’d be on the hook for God knows how much in damages, of course (willful infringement of copyright carries maximum statutory damages of $150,000 per instance). But—and this is much more important—because there’s no fundamental difference between the copyright protections for Web pages and those for books, a decision in favor of the publishers would effectively outlaw search as it currently exists. Would a court dare do that? I have no idea, but Google obviously took the threat seriously enough to settle rather than to fight, especially since Web search is everything to them, whereas books are a comparative hobby. I wish Google had chosen to go to trial, because I think (and hope) they would have won, thereby clarifying and solidifying fair use rights in computational contexts, but it’s neither my money nor my business that’s at stake, and I understand why they chose to settle.

This dispute about fair use is interesting in its own right, but it’s not in itself the main objection to the settlement from most of my academic friends. (Most academics, though certainly not all, are in favor of more liberal fair use rights, and would therefore usually side with Google on copyright issues.) They’re concerned instead about a missed opportunity for real reform, and about the perceived market power the settlement would grant to Google and the rightsholders. How so? Not over works that are already in the public domain; these are free to copy and redistribute already, and there’s nothing in the settlement that would (or could) change that. Anyone else could create a competing database of public domain works (see the Open Content Alliance, for instance). And it’s not about current books, whether in or out of print, which the rightsholders are free to dispose of as they wish—they can be bought, sold, and licensed according to the whims of the publishers and authors. Again, nothing in the settlement could possibly change this, since to do so would involve rewriting American copyright law. The issue, then, is over so-called “orphan” works, books for which an appropriate rightsholder cannot be established or contacted.

Here’s how things stand now with respect to orphan works: They’re simply off limits for anything beyond ordinary fair use. They can’t be reissued, corrected, or adapted. You can’t assign them in a college course, because no one can produce a new edition and you can’t make copies of your own or your library’s (rare) copy. You can’t use an orphan sound or video clip in a new song or film. And, absent a real answer to the fair use question raised by Google’s scanning project, you can’t include them in a search tool, because you can’t get a rightsholder’s approval to do so. It would be an exaggeration to say orphan works may as well not exist—they still do sit in libraries and archives—but they’re a lot less useful than either public domain or current works.

The settlement would establish a “rights registry,” a clearinghouse tasked with identifying and tracking rightsholders (if any) and copyright status for all books. As a practical matter, “all” would mean “those scanned by Google,” at least at first. Google would pay $34.5 million to establish this registry, which would then operate as a non-profit and work on behalf of rightsholders, distributing whatever funds it collects to the appropriate parties. In exchange for setting up this registry and paying a chunk of cash ($125 million in all), the publishers and authors drop their copyright infringement claims (so Google can go on scanning). Maybe more importantly, as far as my uncomfortable academic friends are concerned, Google gets the right to scan, process, and sell orphan works, even though their proper rightsholders can’t be determined, and they get indemnity from lawsuits if they make honest mistakes about the copyright status of a work (and sell it or offer it for free when they shouldn’t, for instance). Rightsholders can opt out of this arrangement at any time, though of course they’ll then lose the benefits of being available through Google.

Some objections

This all looks pretty win-win. Google gets to do what they do, maybe opening up a big new market in the process, and they remove a significant legal cloud hanging over them. Publishers and authors get a pile of cash, a new outlet for their goods, and they get to sell a bunch of old stuff that’s currently out of print. Users win because they get a search and information resource that they wouldn’t otherwise have had.

The concern, though, is that Google is the only would-be scanner to benefit directly from the settlement. The settlement leaves unanswered the fair use question about book scanning. It leaves unchanged the status of orphan works, but allows Google alone (at least at first) to make use of them. And it gives two private, for-profit entities (the Authors Guild and the Association of American Publishers) control over the rights registry.

Wouldn’t it be better, these friends of mine say, to resolve these issues legislatively, so that the law would be clear and everyone would stand on level ground? Couldn’t we create a limited right to use orphan works, to store “non-consumptive” copies of texts for computational use, and set up a public rights registry? Wouldn’t that provide better and fairer competition in the marketplace? Absent those changes, don’t we risk creating a situation in which there are only two (cooperating) players (Google and the rightsholders) in the marketplace? Would any other company be able to negotiate an equivalent agreement with the rightsholders? Especially since those rightsholders wouldn’t have any incentive to help set up a competitive market for their products? Would any other company have the resources to scan millions of books, especially after Google has a head start on both the technical and the business sides? Isn’t this our one big chance to get scanning done right? Aren’t we missing a great opportunity to reform a badly out-of-whack U.S. copyright regime? And won’t libraries be almost required by their patrons to subscribe to Google’s digital products, available at only monopoly prices?

My answers

I share many of these concerns. But I still think we’ll be much, much better off with the settlement than without it. Here’s why:

Copyright reform

We do need copyright reform, including provisions for orphan works. But I don’t think we’ll ever get it, especially in the absence of the settlement. When has Congress ever scaled back any part of copyright protection? Is there any reason to think it will do so now or in the foreseeable future? Even if it were to, how long would we have to wait, given our current political priorities, while making no progress in the interim on things like book search and computational analysis rights?

Our current copyright regime—which allows for effectively endless copyright protection without any provision for an evolving public domain—is totally out of alignment with the social cost/benefit analysis that authorizes U.S. copyright law. I don’t think there’s any chance that’s going to change, but if the settlement is approved, there will at least be large, powerful, monied interests (cf. Microsoft, Amazon, and Yahoo, all of which recently [re-]joined the Open Content Alliance) lobbying to create specific provisions relaxing aspects of copyright control like those affecting orphan works and computational use. This differs from the current situation in which all the money and influence is on the other side. And they’ll have a legislator-friendly argument, namely that they’re just trying to compete in the marketplace on terms equal to Google’s. So far, they haven’t had to make this push, because no one has been making much money there. The settlement will change those incentives.

[Note in passing that the Berne Convention is always going to pose problems, since it’s built around absurdly strong European-style (“moral”) copyright provisions that prohibit things like registration requirements. The U.S. has never, of course, been especially keen on international agreements, but copyright protection is one of its long-standing hobby horses. It seems unlikely that the U.S. government would push for serious changes to Berne.]

An open market

There’s no reason to believe other entities won’t be able to enter the marketplace. The settlement provides only non-exclusive licenses to Google, and will serve as a ready-made template for a legal agreement between the rightsholders and any future scanners. Moreover, there would surely be serious antitrust scrutiny if the rightsholders were to withhold similar terms from others who wanted to enter the market. And why would they, really? More outlets means more differentiated products and more opportunities to sell their goods. Plus, with the registry already in place and both scanning and storage getting cheaper by the day, the barriers to entry are falling with time, not rising.

The status quo

What’s the alternative? If the settlement isn’t approved, no one can go ahead with any scanning projects. Not even those limited to the public domain (which, as noted, is less relevant by the day, because nothing new will ever fall into it); it would only take one mistaken scan of a protected work to expose a scanner to bankrupting litigation. Our current copyright system, written exclusively for content creators without even a nod to the public interest, will go on unchanged. And the public, academics and normal people alike, will have lost a terrifically promising resource, one assembled at significant cost and risk (if not with strictly altruistic motives) by a private company at almost no expense to us.

Library costs

Finally, libraries will, as always, have a choice to make about how they spend their subscription money, including whether or not to buy extended access to Google’s offerings. But they’ll already have free access (albeit at a single “terminal,” whatever that will mean in practice) to all of Google’s digital holdings. If prices are too high and they choose not to subscribe, they’ll still be better off than they were to begin with, since they’ll have one terminal with millions of in-copyright books, rather than none, as they do now. And how different is this situation from the one that holds with respect to commercial presses and journal publishers? Those publishers are already effective monopolies, and no one (alas!) seems to be suggesting legislation to change that fact. Do you think Google will be better or worse? How much do you pay for Google’s services now? Plus, if I’m right and other companies or not-for-profits enter the market, any monopoly concern disappears.


My argument here isn’t so different from the one progressives are now making about health care reform: The current situation is really, really bad. This plan makes things a lot better, with minimal downsides. I’d like real copyright reform as much as I’d like single-payer healthcare, but I think they’re about equally likely. So let’s not let the perfect be the enemy of the good.

Now, there’s a chance that a defeat for the settlement would be galvanizing in its own way, and that it would give rise to serious copyright reform. My own feeling is that if Eldred v. Ashcroft didn’t do it, nothing will. Maybe I’m wrong, but I’d much rather have Google Book Search and all it entails, plus the settlement-provided computational research corpus, a useful and well-funded rights registry (a significant public good), the plausible prospect of a thriving marketplace for digital texts and products based on them, and the first ever relaxation of at least a few copyright protections, than torpedo the settlement in hope of getting a marginally better legislative result that’s a huge longshot.

9 thoughts on “Why I’m in Favor of the Google Book Search Settlement”

  1. Thanks, Matthew. This is an interesting and new series of arguments. I’m so accustomed to negative reviews of the Google Book Search settlement, that your positive review comes as a refreshing surprise.

    There are many aspects of the settlement that worry me, but the one I really cannot figure out is the single terminal stipulation. How does limiting GBS to a single terminal in a library benefit Google? It certainly does not benefit library users.

    Again, thanks. I will be thinking over your lucid arguments.

    • Oh, I don’t think it benefits Google at all. I don’t think Google cares. That one *must* be the rightsholders’ demand.

      I think it’s worth bearing in mind that Google is the party on the defensive in the settlement; they’re being sued and are making concessions (to the rightsholders) in order to have the suit dropped.

      • I agree with Matt that the provision for the Google BS-devoted library terminal is no doubt a stipulation that stems from the rightsholders. Still, it is one of the prime issues that concern me about the settlement, along with the burden of reporting that it seems to place on libraries.

        As Matt details below, I too do not see much threat from the settlement in terms of privacy, competition, and “ownership” of facts.

    • The linked article from Eric Kansa is well worth a read. He raises three main objections to the GBS settlement:

      1.) Privacy
      2.) Competing products
      3.) Rights to extracted/discovered facts and ideas

      My thoughts:

      1.) The privacy issues don’t really bother me, though I can see how they make others uncomfortable. I haven’t seen anything to suggest Google has misused the vast piles of data they’ve collected in other contexts. Again, I can see how other people are put off by the very existence of all this information in the hands of a corporation, but it just isn’t much of an issue for me. If Google wants to serve me ads based on my use of GBS, well, (a.) that doesn’t bother me, and (b.) I assume Adblock Plus will continue to nuke them, just as it does now :)

      2.) Competing products. The settlement is clear that you can publish and redistribute the results of your research on the corpus, even in commercial contexts like for-profit journals. If, on the other hand, I want to set up my own book search service, I don’t expect Google to help me do that for free.

      The EFF argument linked from Eric’s comment makes the case that it’s a waste of resources for every new competitor in the market to rescan every book, and suggests establishing an escrow entity to hold the existing scans for fourteen years before releasing them to anyone who pays for a blanket license akin to the one Google’s getting through the settlement. I don’t see a lot of value in this; fourteen years is far too long to help establish a market now, and I doubt the current scans will look especially useful a decade and a half from now. But it couldn’t hurt, especially for individual researchers, though I imagine that most research work, especially in 2023, will be done with a derivative corpus, not the original scans.

      3.) I don’t see how anything in the settlement changes the ownership of facts and ideas extracted from the research corpus. The only provisions that could possibly be construed that way (to which Eric points on p. 82) concern take-downs of “competing services,” not research results. Maybe Eric and others fear that Google and/or the publishers will construe ordinary research results as “competing services,” but I think that’s pretty effectively covered in the settlement. As an i-school person, he’s maybe more likely than I am to butt up against “service” issues. But I still don’t really see the problem; the settlement says you’re not entitled to Google’s database for purposes other than research. That strikes me as fair.

      • I don’t think the distinction between research results and competing services is as clear as you make it out to be. A specialist academic publisher might very well want to create an “allusion index” to poetry archives and charge academic libraries for subscriptions to use the index. That kind of service seems very close to the kind of work many digital humanists are doing.

        More generally, Clancy’s insistence on a clear distinction between extracting “facts” from the book corpus vs. making “inferences” based on the book corpus, arguing that while the former might be restricted the latter would not, strikes me as untenable. Sure, we might be able to identify clear cases of fact-extraction or inference-making at the edges, but there is a large gray area in the middle. I imagine that the thought of copyright owners’ lawyers frolicking in that gray area might make researchers think twice before making too large of an investment in using the GB corpus.

        • Good points – I really wish I’d been at the symposium to hear more about this directly. My reading of the settlement is that there’s no provision for any kind of interactivity for third parties, i.e., that you couldn’t set up any kind of Web app, say, to run queries on behalf of other users. So that’s one whole class of potential uses that’s out (alas, though not surprisingly). Did Dan Clancy say anything about access to the public-domain part of the corpus? I’d imagine these might be more loosely held, but I haven’t heard anything specific about it.

          I’m not quite sure what an “allusion index” might be – are you thinking of it as analogous to the index at the back of a printed book, hence as something a researcher might generate once (for the whole corpus) and then make available as either a scholarly or a commercial product? I can see how that would be a gray area in terms of the settlement – is it clear that such a thing would be a piece of scholarship? Would it make a difference if you published illustrative parts of it rather than the whole thing? How much (and this is an honest, I-have-no-idea kind of question) would it impact the scholarly value of such an index if you couldn’t make the whole thing public?

          But (and this seems important to me), isn’t the research corpus projected to be only a subset of the full GBS corpus? If so, it seems that would diminish the competitive concerns of Google and the publishers over something like the hypothetical allusion index. If that index only covers a representative sample of all the texts, could anyone use it to compete with the core functions of GBS?

          Last (and again a serious question, not a rhetorical one), since facts can’t be copyrighted, how much control could Google or the publishers exert over facts derived from the research corpus?
