<s>No word yet from ACS, so some of this is hypothetical – but these are communal hypotheses.</s>
It seems the spider trap is part of Atypon software (http://www.atypon.com). From their site:
Atypon delivers innovative solutions that revolutionize the way publishers and media organizations do business. Literatum, Atypon’s flagship ePublishing platform, provides all of the functionality that publishers need to compete in the digital world, including advanced search and information discovery, access control, e-commerce, marketing and business intelligence. Literatum hosts more than 17 million journal articles …
It’s run by “Georgios Papadopoulos, Founder and Chief Executive Officer”
Its clients include ACS, Elsevier, Informa, NewEngJMed, OUP, Taylor and Francis and 20 others. Interestingly, some of these also appear to have the spider trap. From the info above it appears that the articles – including the OA ones – are hosted on Atypon. It’s therefore believable that the spider trap link was added by Atypon; whether ACS knew about this we don’t know, and we await word from Darla.
The following – incredible – comment from Georgios Papadopoulos of atypon.com appeared on my blog. I don’t believe it’s a spoof.
This is really funny. Tom Demeranville described the trap very acurately.
These LINKS (they are not DOIs!) are not visble or clickable. Only a (dumb) spider follows them.
You created such a dumb spider and you were scraping the content. You were not reading it or clicking on anything.
You were caught, but perhaps the funniest part of that was that then you also came up and exposed yourself. We usually never identify the writers of such crawlers.
If genuine, this is one of the most breathtakingly self-destructive statements from a CEO since Gerald Ratner described his products as “crap”. GP boasts of his cleverness but has utterly missed the point and revealed himself as completely out of touch.
So what about the spider trap that he and his company built? Well, this afternoon the story of the Spider Trap hit Hacker News. I promise I didn’t send it, and I didn’t urge others to. Hacker News (“hacker” is a positive term, in the MIT usage) carries news about all things geeky. They know about the web. What did they think of the Spider Trap? See https://news.ycombinator.com/item?id=7530712 – here are some comments:
- It’s so technologically simple as to be useless against anyone who could deploy a web scraper in the first place.
- This has massive potential for abuse.
- That’s some level of incompetence – the trappers, I mean. A half-arsed solution because they couldn’t think of a better one.
- I am furious because the malice was implemented in the stupidest, most useless, laziest manner possible. It’s like keeping the neighborhood kids off your lawn by burying a pressure plate switch out there for the armed nuclear bomb in your garage. And then not telling anyone about it. And then inviting all the neighbors over for a croquet tournament.
The overwhelming consensus is that the spider trap was totally incompetent and highly dangerous. The URL could easily have been transformed and redistributed by software that simply edited HTML files, or hidden in emails. Even the existence of a simple URL that disables a whole university (yes – there are universities with only one external IP address) is unbelievable.
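To see why the Hacker News crowd found the design so trivial, here is a minimal sketch of the mechanism as described above: a link hidden from humans (here via inline `display:none` – an assumption; Atypon’s actual markup has not been published) that a naive crawler queues while anything marginally smarter skips it. The URLs and page fragment are invented for illustration.

```python
from html.parser import HTMLParser

# Hypothetical page fragment: the trap link is invisible to readers
# but sits in the markup where a naive crawler will find it.
PAGE = """
<html><body>
  <a href="/article/real-paper">Read the article</a>
  <a href="/trap/ban-my-ip" style="display:none">do not follow</a>
</body></html>
"""

class LinkExtractor(HTMLParser):
    """Collects hrefs; optionally skips links hidden with inline CSS."""
    def __init__(self, skip_hidden=False):
        super().__init__()
        self.skip_hidden = skip_hidden
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        hidden = "display:none" in attrs.get("style", "").replace(" ", "")
        if self.skip_hidden and hidden:
            return  # no human ever sees this link, so don't crawl it
        if "href" in attrs:
            self.links.append(attrs["href"])

naive = LinkExtractor(skip_hidden=False)
naive.feed(PAGE)
print(naive.links)    # the naive spider queues the trap URL too

careful = LinkExtractor(skip_hidden=True)
careful.feed(PAGE)
print(careful.links)  # the careful one never touches it
```

The one-line `skip_hidden` check is all it takes to defeat the trap, which is exactly the commenters’ point: anyone competent enough to deploy a scraper at scale can route around it, so the trap mainly punishes the naive and the unlucky.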
Here’s Tom Demeranville again – who applauds what Ross did as the quickest and most effective way of lancing this ugly boil… (my emphases)
I’ve just identified another serious spider trap that would cut universities off not just from one publisher but from whole swathes of them. As much as I’d like to share the link around for the LOLz, I think I’d better contact the owner first. Damn.
I don’t think you can shoot the messenger here. Ross made everyone well aware of the dangers these traps pose. If it wasn’t for him I’d have not found what I’ve found. Speaking from experience I’ll also add that polite emails to publishers regarding bugs in their websites take approximately eleventy billion years to be actioned. This way of maximum publicity is the quickest way to get it fixed.
I’ve shown what the world thinks of Atypon’s spider trap. You can decide what you think of Atypon and its CEO. It has caused a massive public furore – most of it against ACS, Ross and me (probably in that order). It has caused a huge waste of time and effort – I hear that JISC, Nature Publishing Group and CrossRef were all cut off from ACS. Indeed, if Pandora, Ross and I had not exposed it we might have had regular ACS outages indefinitely.
ACS have to stand up and tell us what’s going on.
If they don’t, this sort of thing will continue. If a foolish implementation is allowed to persist who knows what may happen?
2014-04-05 07:12 UTC: ACS have now posted a reply. It is competent, if relatively uninformative on details. Selected quotes:
ACS worked diligently to resolve the issue, and as of 4 PM EDT April 3, service was restored for all subscribers affected by this incident. Simultaneously, steps were taken to address the specific protocol that triggered this outage.
Employing the use of these types of tools is imperative to providing users with continued access to that trusted research. We will therefore continue to refine our security procedures to support evolving publishing access models while protecting both users and content from malicious activities.
The rest is either history I have provided or general stuff about ACS serving the community.
Are there more spider traps out there? Almost certainly. TomD has identified some. Will I go looking for them? Not as an activity in itself. Will others go looking for them? Judging by the tenor of Hacker News, almost certainly. Will we hit further traps as we launch The Content Mine? Hopefully not, especially if they are labelled “Bomb”. Contrary to Georgios Papadopoulos’s simplistic view, we don’t just throw wget at the Web – we work out the semantics.
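“Working out the semantics” rather than blindly throwing wget at a site means, at minimum, reading the publisher’s robots.txt and honouring its disallow rules and crawl delay. A minimal sketch using only the Python standard library – the robots.txt content, paths, and agent name are all invented for illustration, not taken from any real publisher:

```python
from urllib.robotparser import RobotFileParser

# Invented robots.txt for illustration only: a publisher that
# disallows its trap directory and asks crawlers to slow down.
ROBOTS_TXT = """\
User-agent: *
Disallow: /trap/
Crawl-delay: 2
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

def allowed(path, agent="contentmine-bot"):  # agent name is made up
    """True only if robots.txt permits this path for our agent."""
    return rp.can_fetch(agent, path)

print(allowed("/article/123"))            # normal content: allowed
print(allowed("/trap/abc"))               # the crawler refuses the trap
print(rp.crawl_delay("contentmine-bot"))  # seconds to wait between requests
```

A crawler built this way never follows a disallowed path and paces its requests, so a trap URL sitting behind a Disallow rule is simply never fetched – no heroics required, just the web’s own long-standing conventions.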
What is clear is that machine reading of the literature is now a legitimate mainstream activity. We can and will do it without “publishers’ APIs”. We shall continue to expose incompetent and dangerous publishing.