ACSGate: has Atypon fallen into its own Publisher Spider Trap? and the ACS reply

<s>No word yet from ACS, so some of this is hypothetical - but they are communal hypotheses.</s>

It seems the spider trap is part of Atypon software (http://www.atypon.com). From their site:

Atypon delivers innovative solutions that revolutionize the way publishers and media organizations do business. Literatum, Atypon's flagship ePublishing platform, provides all of the functionality that publishers need to compete in the digital world, including advanced search and information discovery, access control, e-commerce, marketing and business intelligence. Literatum hosts more than 17 million journal articles ...

It's run by "Georgios Papadopoulos, Founder and Chief Executive Officer"

Its clients include ACS, Elsevier, Informa, NewEngJMed, OUP, Taylor and Francis and 20 others. Interestingly, some of these also appear to have the spider trap. From the info above it appears that the articles - including the OA ones - are hosted on Atypon. It's therefore believable that the spider trap link was added by Atypon. Whether the ACS knew about this we don't know; we are waiting to hear from Darla.

The following - incredible - comment from Georgios Papadopoulos (atypon.com) appeared on my blog. I don't believe it's a spoof.

This is really funny. Tom Demeranville described the trap very acurately.

These LINKS (they are not DOIs!) are not visble or clickable. Only a (dumb) spider follows them.
You created such a dumb spider and you were scraping the content. You were not reading it or clicking on anything.

You were caught, but perhaps the funniest part of that was that then you also came up and exposed yourself. We usually never identify the writers of such crawlers.

If genuine, this is one of the most breathtakingly self-destructive statements from a CEO since Gerald Ratner  described his products as "crap". GP boasts of his cleverness but has utterly missed the point and revealed himself as completely out of touch.

So what about the spider trap that he and his company built? Well, this afternoon the story of the Spider Trap hit Hacker News. I promise I didn't send it and I didn't urge others to. Hacker News (Hacker is a positive term based on MIT usage) has news about all things geeky. They know about the web. What did they think of the Spider Trap? See https://news.ycombinator.com/item?id=7530712 for the full thread. Here are some of the comments:

  •  It's so technologically simple as to be useless against anyone who could deploy a web scraper in the first place.
  •  This has massive potential for abuse.
  •  That's some level of incompetence - the trappers I mean. A half arsed solution because they couldn't think of a better one.
  •  I am furious because the malice was implemented in the stupidest, most useless, laziest manner possible. It's like keeping the neighborhood kids off your lawn by burying a pressure plate switch out there for the armed nuclear bomb in your garage. And then not telling anyone about it. And then inviting all the neighbors over for a croquet tournament.

The overwhelming consensus is that the spider trap was totally incompetent and highly dangerous. The URL could easily have been transformed and redistributed by software which simply edited HTML files, or hidden in emails. Even the existence of a simple URL that disables a whole university (yes - there are universities with only one IP) is unbelievable.

Here's Tom Demeranville again - who applauds what Ross did as the quickest and most effective way of lancing this ugly boil... (my emphases)

I’ve just identified another serious spider trap that would cut universities not just from one publisher but whole swathes of them. As much as I’d like to share the link around for the LOLz, I think I better contact the owner first :D Damn.

I don’t think you can shoot the messenger here. Ross made everyone well aware of the dangers these traps pose. If it wasn’t for him I’d have not found what I’ve found. Speaking from experience I’ll also add that polite emails to publishers regarding bugs in their websites take approximately eleventy billion years to be actioned. This way of maximum publicity is the quickest way to get it fixed.

I've shown what the world thinks of Atypon's spider trap. You can decide what you think of Atypon and its CEO. It's caused massive public furore - most of it against ACS, Ross and me (probably in that order). It's caused a huge waste of time and effort - I hear that JISC, Nature Publishing Group and CrossRef were all cut off from ACS. Indeed if Pandora, I and Ross had not exposed it we might have had regular ACS outages indefinitely.

ACS have to stand up and tell us what's going on.

If they don't, this sort of thing will continue. If a foolish implementation is allowed to persist who knows what may happen?

2014-04-05:07-12 UTC: ACS have now posted a reply. This is competent, if relatively uninformative on the details. Selected quotes:

ACS worked diligently to resolve the issue, and as of 4 PM EDT April 3, service was restored for all subscribers affected by this incident. Simultaneously, steps were taken to address the specific protocol that triggered this outage.

...

Employing the use of these types of tools is imperative to providing users with continued access to that trusted research. We will therefore continue to refine our security procedures to support evolving publishing access models while protecting both users and content from malicious activities.

The rest is either history I have provided or general stuff about ACS serving the community.

Are there more spider traps out there? Almost certainly. TomD has identified some. Will I go looking for them? Not as an activity in itself. Will others go looking for them? Judging by the tenor of Hacker News almost certainly. Will we hit further traps as we launch The Content Mine? Hopefully not, especially if they are labelled "Bomb". Unlike Georgios Papadopoulos' simplistic view we don't just throw wget at the Web - we work out the semantics.
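To make the difference between blindly following every href and looking at the markup first more concrete, here is a minimal sketch in Python. It assumes a trap link is an anchor hidden with an inline style or the HTML `hidden` attribute; the actual Atypon markup is not reproduced in this post, so the sample URLs, the `VisibleLinkParser` class and the `split_links` helper are all hypothetical. It is an illustration only, not a description of how The Content Mine works.

```python
from html.parser import HTMLParser

# Inline-style fragments that hide an element from human readers.
# ASSUMPTION: a trap link is hidden this way; real pages may instead use
# CSS classes or hidden parent elements, which this sketch does not detect.
HIDING_STYLES = ("display:none", "visibility:hidden")


class VisibleLinkParser(HTMLParser):
    """Collect hrefs from <a> tags, separating visible links from hidden ones."""

    def __init__(self):
        super().__init__()
        self.visible = []
        self.hidden = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        # Normalise the inline style so "display: none" matches "display:none".
        style = (attr.get("style") or "").replace(" ", "").lower()
        is_hidden = "hidden" in attr or any(s in style for s in HIDING_STYLES)
        (self.hidden if is_hidden else self.visible).append(href)


def split_links(html):
    """Return (visible_hrefs, hidden_hrefs) for the anchors in an HTML string."""
    parser = VisibleLinkParser()
    parser.feed(html)
    parser.close()
    return parser.visible, parser.hidden


# Hypothetical page fragment: one ordinary link and one hidden link.
sample = (
    '<a href="/doi/abs/10.0000/example">Abstract</a>'
    '<a href="/hypothetical-trap" style="display: none">x</a>'
)
visible, hidden = split_links(sample)
```

A crawler built on something like this would queue only the `visible` list and could log the `hidden` one for a human to inspect. It checks the anchor element only, so a link hidden by a stylesheet rule or an invisible ancestor would still slip through.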

What is clear is that machine reading of the literature is now a legitimate mainstream activity. We can and will do it without "publishers' API". We shall continue to expose incompetent and dangerous publishing.


8 Responses to ACSGate: has Atypon fallen into its own Publisher Spider Trap? and the ACS reply

  1. Peter, in case you missed it, here's the official ACS response posted Friday afternoon: http://pubs.acs.org/page/announcements/20140404_announcement.html

  2. pm286 says:

    Thanks Darla,
    I have now included this and commented.

  3. fluorogrol says:

    Wow. When I saw that comment from Papadopoulos, I thought, 'Huh, a reading-impaired troll'.

    The CEO of a company that provides publishing platforms? How can he be so clueless? Is this combination of ignorance and arrogance what marks him out as CEO material?

  4. Just a quick update - still no word from the other Spider Trap operator I identified, but I'm picking it up again to see if I can get some movement.

    If the comment from Papadopoulos had been posted fifteen years ago it would have not raised an eyebrow - it wasn't that unusual back when servers were as big as fridges and slower than your iPhone. But the world has moved on both technologically and licence-to-science-wise and it's no longer appropriate.

  5. Ryan B. says:

    This post is written in a very dramatic fashion: "highly dangerous"; "a simple URL that disables a whole university". We're talking about a URL hidden in source code, where nobody is supposed to be looking, that has the potential to temporarily disable the university's access to one publisher's content.

    What I don't understand is why researchers feel the need to go clicking around in the source code? And why should you be entitled to click anything you find hidden in source code and expect no consequences?

    I understand that researchers are frustrated by publishers that charge money for access to content. But given that reality, why is it such a shock and a horror that a publisher should want to protect their copyrighted content from screen scrapers and other malicious spiders?

  6. Ryan B. does not seem to be familiar with modern digital research techniques.

    May I suggest Ryan consult the 'Text and Data Mining report from the Expert Group' written for the European Commission: http://ec.europa.eu/research/innovation-union/pdf/TDM-report_from_the_expert_group-042014.pdf

    On page 11 you'll find this comment:
    "Scraping the World-wide web for data is today a familiar activity for the digitally literate researcher"

    It's a legitimate research activity that researchers need to do, in order to do rigorous research.
