Scribd is working hard to be the text version of YouTube. Upload some text, tag it, and let the world discover it. It isn’t just unpublished novels – many copyrighted textbooks are already there via unauthorized uploads.

As on YouTube, users can upload anything, and the site isn’t under any legal obligation to screen for copyright protection. Copyright holders have to proactively scan and search for their content. Get it taken down today – it can be uploaded again tomorrow morning.

For example, Key Curriculum Press’s “Discovering Advanced Algebra” is available for anyone to download all 888 pages for free (and has been since last August when it was posted by “skihe63”). It has been accessed 8,898 times since it was loaded less than a year ago. It is an older copyright, but still – on Amazon it sells for $25. That is $222k in value.
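
The $222k figure is simple multiplication, sketched below as a quick check. It rests on the assumption that every access displaced one sale at the Amazon price, so it is better read as an upper bound than as an actual loss. The variable names are mine.

```python
# Back-of-the-envelope estimate using the figures quoted in the post.
accesses = 8_898        # times the document was accessed on Scribd
list_price_usd = 25     # Amazon price cited in the post

# Assumption: each access displaced exactly one sale at list price.
estimated_value_usd = accesses * list_price_usd

print(f"${estimated_value_usd:,}")  # $222,450
```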

This isn’t just Key Curriculum Press’s (fixed) problem – there are algebra texts from McGraw-Hill, University of Chicago Press, and many other publishers a click away.

"Gee – nice copyright you have there. Be a shame if anything happened to it."

What are Key Curriculum’s (fixed) options? They should submit a report to Scribd to remove the illegal copy (and hopefully they will – edit THEY DID). But that just solves it for this one instance – today.

The only way to get Scribd to protect your copyright is to cut them in on the action. Upload your content, and let users download the PDF for a fee which you share with Scribd. Then, and only then, will they screen for unauthorized copies.

So publishers are starting to upload their own content and cut Scribd in on the action – protection money, if you will.

While Scribd has the Digital Millennium Copyright Act on their side, it sure feels like they are doing something immoral. Once they are on notice about copyrighted material, they should be obligated to screen for it. Key Curriculum’s (fixed) unambiguous notice is right on page 3, and it explicitly prohibits reproducing it, storing it, or transmitting it without permission.

How it Should Work

Technically, scanning text for a copyright notice is a trivial exercise. If TripIt can decode 95% of the confirmation emails I get from a wide variety of travel providers this is child’s play.

You would think that if Scribd wants publishers to be allies, they would scan for copyright notices and require uploaders to explicitly acknowledge, by publisher name, that they have authorization (not some dense, generic, mealy-mouthed legalese that is deliberately designed to be ignored). A dialog like this would do: “We detected that Key Curriculum Press holds a 2004 copyright on these materials – click here to acknowledge that you have the publisher’s written permission to transmit these materials to our servers. Violations of copyright law can expose you to personal liability.”

Hell – they could go a step further and provide a link to request permission from the named publisher.
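
To show how little machinery this takes, here is a minimal sketch of scanning for a notice and building that dialog. Everything in it is my own illustration, not anything Scribd runs: the regular expression, the function names, and the sample notice text are all assumptions, and a real scanner would need to handle many more notice formats.

```python
import re
from typing import NamedTuple, Optional


class CopyrightNotice(NamedTuple):
    year: int
    holder: str


# Hypothetical pattern: "©", "(c)" or "Copyright", then a four-digit year,
# an optional "by", and the holder's name up to the next period or newline.
NOTICE_RE = re.compile(
    r"(?:©|\(c\)|copyright)\s*(?:©|\(c\))?\s*(\d{4})\s+(?:by\s+)?([^\n.]+)",
    re.IGNORECASE,
)


def find_copyright_notice(text: str) -> Optional[CopyrightNotice]:
    """Return the first copyright notice found in the text, or None."""
    match = NOTICE_RE.search(text)
    if match is None:
        return None
    return CopyrightNotice(int(match.group(1)), match.group(2).strip())


def acknowledgement_prompt(notice: CopyrightNotice) -> str:
    """Build an upload dialog that names the detected copyright holder."""
    return (
        f"We detected that {notice.holder} holds a {notice.year} copyright "
        "on these materials. Click here to acknowledge that you have the "
        "publisher's written permission to transmit these materials to our "
        "servers."
    )


# Made-up sample text standing in for the notice page of an uploaded book.
sample_page = "Copyright © 2004 by Key Curriculum Press. All rights reserved."
notice = find_copyright_notice(sample_page)
if notice is not None:
    print(acknowledgement_prompt(notice))
```

A pattern like this only catches notices that are actually present in the uploaded text, so it complements rather than replaces matching uploads against known works.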

But don’t hold your breath on Scribd doing the right thing on this score – there is money to be made in playing it fast and loose with other people’s property.

Conflicting Values

We know from market after market that increased exposure grows the market – but it often does so in ways that are highly disruptive to existing distribution mechanisms. The music business isn’t hurting right now – music publishers are. Movie attendance went UP after HBO went on-line in the early ’80s.

The trick for publishers is to figure out how to surf this transition and to see opportunities in challenges. If we reflexively try to fight folks who are playing too cute with our property but who have the law on their side we are going to lose.

As a netizen I find my value for free and open access to information is coming into direct conflict with my publishing hat. I want the hard work put in by the writers, editors, designers, instructional designers and others involved in the creation of our intellectual property to be able to continue providing high value.

There are no easy answers – we will have to hire people to monitor this new world for us and our web marketing teams need to figure out how to align with sites like Scribd rather than fight them.

Hat tip to Tim McHugh at Saddleback for sharing their approach and thoughts on this topic with me.

Note: I originally misattributed the Algebra title in this piece to Pearson’s Modern Curriculum Press – I knew better, and the good folks at Key Curriculum Press were good-natured about it. The references have been fixed.


  1. Doug Stein says:

    Minor correction – “Discovering Advanced Algebra” is a Key Curriculum Press product (as you mentioned indirectly later in the article).

    Other than that, the point is well-taken. I think the technical justification for Scribd choosing to proactively scan only when a publisher signs up is that their technology scans using a reference document. That is, instead of an open-ended scan of all documents, it’s more of a hash-search of documents against a set of official publisher documents. This allows them to discover not only book-level, but also chapter or lesson-level piracy.

    I do share your concern that their business model smacks of a shady protection racket.

    How would I handle this if I were Scribd? I’d offer:
    1) a free service that allows the publisher to upload an official copy that would then be used to enforce copyright. At this level, the copy is *not* available for sale.

    2) a flat-fee service that provides statistics on number of attempted uploads as well as any previous downloads (since the publisher might only engage with Scribd after infringement has started).

    3) a percentage-based revenue share for electronic copies sold via Scribd. (Their current model for working with publishers.)

    Does this appropriately disinfect Scribd’s business model from anything smelly? It’s to Scribd’s benefit to present themselves as a marketplace that provides tools for concerned copyright owners to protect their property. Safe marketplaces (cops on the beat) are better attended by honest buyers and sellers.

  2. Doug Stein says:

    An update on Scribd’s automated copyright protection:
    See this FAQ:
    All content removed from Scribd via a DMCA copyright infringement takedown notice are added to our text-matching copyright protection system. This system makes a “fingerprint” of the copyrighted work and stores it in a database that’s inaccessible to the public. After a new document is uploaded, it is checked against the “fingerprints” in our copyright database. If there’s a significant match, the content is removed from Scribd.

    The text-matching system is still in development, and the sheer number of uploads each day means there is a lag time between upload and detection. But so far it’s been highly effective at detecting and blocking hundreds of unauthorized uploads every day.

    It therefore seems that if they’re a protection racket, they are also offering 100% off coupons (just submit a takedown letter and they’ll protect your work for free). We can still fault them for not clearly informing publishers about this capability.


  3. Alexey Verkhovsky says:

    People complaining about this business model automatically assume that hosting providers rake in money hand over fist from the infringing content.

    A few questions to ponder: how much content on Scribd / YouTube / Flickr etc. is legitimate (i.e., not infringing)? How big a share of revenue comes from infringing content? How does it compare with compliance costs (back-office costs for processing DMCA notices, legal costs, maintenance of the automated copyright filtering systems, opportunity cost of management time, etc.)? Is it actually profitable?