Followups on the GBS Settlement

There have been some very smart comments on (and around) my previous post on the Google Book Search settlement. If you’re interested, you might want to see the comments section of that post, plus two good posts by Eric Kansa, one before and one after the recent GBS conference at Berkeley.

Most of my thoughts on the points Eric and others raise appear in the comments section of my last post (linked above). But I think maybe the gut-level difference is related to this passage from Eric’s second post:

The Google Books corpus is unique and not likely to be replicated (especially because of the risk of future lawsuits around orphan-works). This gives Google exclusive control over an unrivaled information source that no competitor can ever approximate.

If Eric’s right about this, then it’s critical to get as much public access as possible built into the settlement now, because we won’t have another shot at it. For reasons laid out in my previous post, though, I’m less pessimistic about the prospects for future competition. I think the settlement will make it easier for others to enter this space by providing both a template for negotiations with the authors and publishers and a strong antitrust incentive for the rightsholders to grant equal access.

Scanning is a big-ish project, no doubt, but not prohibitively so (witness the Open Content Alliance, as well as Microsoft’s former efforts, stopped more by fear of legal action than by lack of funds). This is especially true if it turns out there’s significant money to be made by doing it (and the objection, after all, is that the scanned corpus is an immensely valuable resource on which Google will be sitting). Plus, scanning will only get cheaper with time.

My own ideal case would be a combination of meaningful copyright reform (to clarify that scanning for indexical use doesn’t require permission from a rightsholder) and something like Dan Cohen’s proposal for a government-funded (or Ivy League-funded) book-scanning “moon shot” to benefit society at large. Barring this (extremely unlikely, I think) outcome, by all means, let there be as many public-friendly provisions tacked onto the GBS settlement as possible. My point, though, is that even as it stands now, the settlement provides enough benefits to enough people that I’d rather have it go forward than not, and I’m optimistic that many of its shortcomings can and will be addressed (by competition, by legislation, by technological advances) in the short to medium term.

The alternative, just to be clear, is really bad: maybe no book search at all, from anyone (thanks to the unresolved legal questions), and certainly no search of anything outside the (fossilized) public domain. No research corpus. No free public terminals with millions of in-copyright books at libraries. And this situation would endure indefinitely, backed up by the very real example of a messy, expensive, status-quo-reinforcing failure.

Google and EPUBs

Google just announced that they’re making a million+ public-domain books downloadable in EPUB format. This is an improvement over the old situation, where you could download PDFs (sans OCRed text) of those books or read them in plain text online (one physical page at a time), but not download a small, well-OCRed text copy.

I’d be delighted if they went all the way to true plain text downloads. (And then let me download all the public-domain stuff in bulk. And gave me a pony.) But this is a nice improvement. In other news, I’d also be delighted if my Kindle supported EPUB natively.

Why I’m in Favor of the Google Book Search Settlement

When Google announced their book-scanning project five years ago, most academics I talked to about it were pretty happy. These days a lot of that enthusiasm seems, if not to have disappeared, then at least to have been tempered by serious doubts. I share some of these, but on the whole the settlement is a profoundly good thing. I support it, and I hope my colleagues will, too.

About the settlement

First, two notes: One on the underlying legal issue and one on what’s at stake. The publishers and authors (via the AAP and the Authors Guild, respectively) sued Google for alleged “massive copyright infringement” shortly after Google began scanning books from several prominent libraries. The theory is that because Google makes a copy of every book they scan, they require the rightsholders’ permission to assemble their book search database. Google says the process is covered by the fair use exception in copyright law and is no different from their Web search business, which also copies texts in order to index them. How this would be decided in court is unknown, mostly because the legal definition of fair use is extremely and deliberately vague.

But it’s clear that Google has a lot more to lose from an adverse ruling than the publishers and authors do. If the publishers were to lose, Google could index their stuff without further permission. But it’s hard to see how they’d be hurt by that, since it would only help people find books and wouldn’t change the strong basic copyright protections they already enjoy. Google still wouldn’t be able to sell or give away in-copyright books, for instance. Google, on the other hand, could be destroyed if they were to lose. They’d be on the hook for God knows how much in damages, of course (willful infringement of copyright carries maximum statutory damages of $150,000 per work infringed). But, and this is much more important, because there’s no fundamental difference between the copyright protections for Web pages and those for books, a decision in favor of the publishers would effectively outlaw search as it currently exists. Would a court dare do that? I have no idea, but Google obviously took the threat seriously enough to settle rather than to fight, especially since Web search is everything to them, whereas books are a comparative hobby. I wish Google had chosen to go to trial, because I think (and hope) they would have won, thereby clarifying and solidifying fair use rights in computational contexts, but it’s neither my money nor my business that’s at stake, and I understand why they chose to settle.

This dispute about fair use is interesting in its own right, but it’s not in itself the main objection to the settlement from most of my academic friends. (Most academics, though certainly not all, are in favor of more liberal fair use rights, and would therefore usually side with Google on copyright issues.) They’re concerned instead about a missed opportunity for real reform, and about the perceived market power the settlement would grant to Google and the rightsholders. How so? Not over works that are already in the public domain; these are free to copy and redistribute already, and there’s nothing in the settlement that would (or could) change that. Anyone else could create a competing database of public domain works (see the Open Content Alliance, for instance). And it’s not about current books, whether in or out of print, which the rightsholders are free to dispose of as they wish—they can be bought, sold, and licensed according to the whims of the publishers and authors. Again, nothing in the settlement could possibly change this, since to do so would involve rewriting American copyright law. The issue, then, is over so-called “orphan” works, books for which an appropriate rightsholder cannot be established or contacted.

Here’s how things stand now with respect to orphan works: They’re simply off limits for anything beyond ordinary fair use. They can’t be reissued, corrected, or adapted. You can’t assign them in a college course, because no one can produce a new edition and you can’t make copies of your own or your library’s (rare) copy. You can’t use an orphan sound or video clip in a new song or film. And, absent a real answer to the fair use question raised by Google’s scanning project, you can’t include them in a search tool, because you can’t get a rightsholder’s approval to do so. It would be an exaggeration to say orphan works may as well not exist—they still do sit in libraries and archives—but they’re a lot less useful than either public domain or current works.

The settlement would establish a “rights registry,” a clearinghouse tasked with identifying and tracking rightsholders (if any) and copyright status for all books. As a practical matter, “all” would mean “those scanned by Google,” at least at first. Google would pay $34.5 million to establish this registry, which would then operate as a non-profit and work on behalf of rightsholders, distributing whatever funds it collects to the appropriate parties. In exchange for setting up this registry and paying a chunk of cash ($125 million in all), the publishers and authors drop their copyright infringement claims (so Google can go on scanning). Maybe more importantly, as far as my uncomfortable academic friends are concerned, Google gets the right to scan, process, and sell orphan works, even though their proper rightsholders can’t be determined, and they get indemnity from lawsuits if they make honest mistakes about the copyright status of a work (and sell it or offer it for free when they shouldn’t, for instance). Rightsholders can opt out of this arrangement at any time, though of course they’ll then lose the benefits of being available through Google.

Some objections

This all looks pretty win-win. Google gets to do what they do, maybe opening up a big new market in the process, and they remove a significant legal cloud hanging over them. Publishers and authors get a pile of cash, a new outlet for their goods, and they get to sell a bunch of old stuff that’s currently out of print. Users win because they get a search and information resource that they wouldn’t otherwise have had.

The concern, though, is that Google is the only would-be scanner to benefit directly from the settlement. The settlement leaves unanswered the fair use question about book scanning. It leaves unchanged the status of orphan works, but allows Google alone (at least at first) to make use of them. And it gives two private, for-profit entities (the Authors Guild and the Association of American Publishers) control over the rights registry.

Wouldn’t it be better, these friends of mine say, to resolve these issues legislatively, so that the law would be clear and everyone would stand on level ground? Couldn’t we create a limited right to use orphan works, to store “non-consumptive” copies of texts for computational use, and set up a public rights registry? Wouldn’t that provide better and fairer competition in the marketplace? Absent those changes, don’t we risk creating a situation in which there are only two (cooperating) players (Google and the rightsholders) in the marketplace? Would any other company be able to negotiate an equivalent agreement with the rightsholders? Especially since those rightsholders wouldn’t have any incentive to help set up a competitive market for their products? Would any other company have the resources to scan millions of books, especially after Google has a head start on both the technical and the business sides? Isn’t this our one big chance to get scanning done right? Aren’t we missing a great opportunity to reform a badly out-of-whack U.S. copyright regime? And won’t libraries be almost required by their patrons to subscribe to Google’s digital products, available at only monopoly prices?

My answers

I share many of these concerns. But I still think we’ll be much, much better off with the settlement than without it. Here’s why:

Copyright reform

We do need copyright reform, including provisions for orphan works. But I don’t think we’ll ever get it, especially in the absence of the settlement. When has Congress ever scaled back any part of copyright protection? Is there any reason to think it will do so now or in the foreseeable future? Even if it did, how long would we have to wait, given our current political priorities, while making no progress on things like book search and computational analysis rights in the interim?

Our current copyright regime, which allows for effectively endless copyright protection without any provision for an evolving public domain, is totally out of alignment with the social cost/benefit analysis that authorizes U.S. copyright law. I don’t think there’s any chance that’s going to change, but if the settlement is approved, there will at least be large, powerful, monied interests (cf. Microsoft, Amazon, and Yahoo, all of which recently [re-]joined the Open Content Alliance) lobbying to create specific provisions relaxing aspects of copyright control like those affecting orphan works and computational use. This differs from the current situation, in which all the money and influence are on the other side. And they’ll have a legislator-friendly argument, namely that they’re just trying to compete in the marketplace on terms equal to Google’s. So far, they haven’t had to make this push, because no one has been making much money there. The settlement will change those incentives.

[Note in passing that the Berne Convention is always going to pose problems, since it’s built around absurdly strong European-style (“moral”) copyright provisions that prohibit things like registration requirements. The U.S. has never, of course, been especially keen on international agreements, but copyright protection is one of its long-standing hobby horses. It seems unlikely that the U.S. government would push for serious changes to Berne.]

An open market

There’s no reason to believe other entities won’t be able to enter the marketplace. The settlement provides only non-exclusive licenses to Google, and will serve as a ready-made template for a legal agreement between the rightsholders and any future scanners. Moreover, there would surely be serious antitrust scrutiny if the rightsholders were to withhold similar terms from others who wanted to enter the market. And why would they, really? More outlets means more differentiated products and more opportunities to sell their goods. Plus, with the registry already in place and both scanning and storage getting cheaper by the day, the barriers to entry are falling with time, not rising.

The status quo

What’s the alternative? If the settlement isn’t approved, no one can go ahead with any scanning projects. Not even those limited to the public domain (which, as noted, is less relevant by the day, because nothing new will ever fall into it); it would only take one mistaken scan of a protected work to expose a scanner to bankrupting litigation. Our current copyright system, written exclusively for content creators without even a nod to the public interest, will go on unchanged. And the public, academics and normal people alike, will have lost a terrifically promising resource, one assembled at significant cost and risk (if not with strictly altruistic motives) by a private company at almost no expense to us.

Library costs

Finally, libraries will, as always, have a choice to make about how they spend their subscription money, including whether or not to buy extended access to Google’s offerings. But they’ll already have free access (albeit at a single “terminal,” whatever that will mean in practice) to all of Google’s digital holdings. If prices are too high and they choose not to subscribe, they’ll still be better off than they were to begin with, since they’ll have one terminal with millions of in-copyright books, rather than none, as they do now. And how different is this situation from the one that holds with respect to commercial presses and journal publishers? Those publishers are already effective monopolies, and no one (alas!) seems to be suggesting legislation to change that fact. Do you think Google will be better or worse? How much do you pay for Google’s services now? Plus, if I’m right and other companies or not-for-profits enter the market, any monopoly concern disappears.


My argument here isn’t so different from the one progressives are now making about health care reform: The current situation is really, really bad. This plan makes things a lot better, with minimal downsides. I’d like real copyright reform as much as I’d like single-payer health care, but I think they’re about equally likely. So let’s not let the perfect be the enemy of the good.

Now, there’s a chance that a defeat for the settlement would be galvanizing in its own way, and that it would give rise to serious copyright reform. My own feeling is that if Eldred v. Ashcroft didn’t do it, nothing will. Maybe I’m wrong, but I’d much rather have Google Book Search and all it entails, plus the settlement-provided computational research corpus, a useful and well-funded rights registry (a significant public good), the plausible prospect of a thriving marketplace for digital texts and products based on them, and the first ever relaxation of at least a few copyright protections, than torpedo the settlement in hope of getting a marginally better legislative result that’s a huge longshot.