GigaScience tweeted that they were studying my suggested principles for data repositories, which I shall now amend to DataSharers. I’d heard vaguely about GigaScience on the blogosphere but had not paid much attention, as their datasets are large and I am more interested in the long tail. But as they are at least interested in me I will have a look at them. In what follows I may simply be ignorant, so corrections are welcome.
I am taking my information from: http://www.gigasciencejournal.com/about which seems to have some relation with Biomed Central, though nowhere is this very explicit. The tweetfeed comes from Shenzhen, China and the editorial board is from the BGI (http://en.wikipedia.org/wiki/Beijing_Genomics_Institute ). After some time I have now found a press release: http://www.eurekalert.org/pub_releases/2011-07/bc-fde070611.php which is clearer:
BioMed Central and BGI launch a new integrated database and journal, to meet the needs of a new generation of biological and biomedical research as it enters the era of “big-data.”
GigaScience, an innovative new journal and integrated database to be launched by BioMed Central in November 2011, has released their first datasets to be given a Digital Object Identifier (DOI). This enables a long-needed way to properly recognize the data producers who have provided an untold number of essential resources to the entire research community. This not only promotes very rapid data release, but also provides easy access, reuse, tracking, and most importantly permanency for such datasets. The journal is being launched by a collaboration between BGI, the world’s largest genomics institute, and open access publisher BioMed Central, a leader in scientific data sharing and open data.
Head of Public Relations, BioMed Central
PMR: Recommendation. If you are launching a new journal make it clear on the web page what the publishing organization is. It took me 15+ minutes of trawling the web to get these facts. The BGI seems to have commercial interests so I would want to know what the absolute policy on the journal is – who runs it, who runs the data?
So let’s see how the DataSharer principles match up. I’m still refining them. I am taking the view that DataSharers must be completely Open (OKD-compliant, libre). BMC and I share the same operational views on Openness. The final pointer to the BMC licence makes the general principles reasonably clear for articles (but not for data):
Authors of articles published in GigaScience retain the copyright of their articles and are free to reproduce and disseminate their work (for further details, see the BioMed Central copyright and license agreement)
An online open-access open-data journal, we publish ‘big-data’ studies from the entire spectrum of life and biomedical sciences. To achieve our goals, the journal has a novel publication format: one that links standard manuscript publication with an extensive database that hosts all associated data and provides data analysis tools and cloud-computing resources.
PMR: I will be interested to see the links.
GigaScience aims to increase transparency and reproducibility of research, emphasizing data quality and utility over subjective assessments of immediate impact. To enable future access and analyses, we require that all supporting data and source code be publically available and we provide an extensive database and cloud repository that can host associated data, supplementary information and tools.
PMR: This will be interesting. I question “publically available” as I’m not clear what this means in practice. (Note that it often isn’t easy to make all code available if parts of it have been licensed, e.g. from database vendors.)
A unique feature of our database is that important associated datasets can be given DOIs, providing both permanency and an additional citation. Thus GigaScience provides easier access to associated data as well as recognition for data producers.
PMR: Very important, but surely not “unique” – isn’t this what Datacite does?
All articles published by GigaScience are made freely and permanently accessible online immediately upon publication, without subscription charges or registration barriers. Further information about open access can be found here.
PMR: There is nothing about data – data are different from articles, so this should be addressed specifically.
Following publication in GigaScience, the full-text of each article is deposited immediately and permanently in repositories in e-Depot, the National Library of the Netherlands’ digital archive of electronic publications. GigaScience is included in all major bibliographic databases. A complete list of indexing web services that include BioMed Central’s journals can be found here.
BioMed Central is working closely with Thomson Reuters (ISI) to ensure that citation analysis of articles published in GigaScience will be available.
PMR: It is critical that this indexing metadata is made specifically Open, identified as such and made available to the community. Otherwise BMC is granting third parties ownership over citation data that they can control and resell to the scientific community (as happens with other citations). Make data set citation data OPEN.
Publication and peer-review process
Suitability of research for publication in GigaScience is dependent primarily on the data quality and utility, rather than a subjective assessment of immediate impact. To encourage transparent reporting of scientific research as well as enable future access and analyses, it is a requirement of submission that all supporting data and source code be made available.
PMR: Excellent requirement – it won’t be easy.
Data and materials release
Submission of a manuscript to GigaScience implies that readily reproducible materials described in the manuscript, including all relevant raw data, will be freely available to any scientist wishing to use them for non-commercial purposes [PMR emphasis]. Nucleic acid sequences, protein sequences, and atomic coordinates should be deposited in an appropriate database in time for the accession number to be included in the published article. In computational studies where the sequence information is unacceptable for inclusion in databases because of lack of experimental validation, the sequences must be published as an additional file with the article.
PMR: Why ever has the NC (non-commercial) restriction been included? It’s inconsistent with everything that has been said before. It’s unenforceable. It goes against all current BMC policies AFAIK. Please, please remove it asap. I cannot regard GigaScience as Open while it remains. See /pmr/2010/12/17/why-i-and-you-should-avoid-nc-licences/ .
Crystal structures of organic compounds can be deposited with the Cambridge Crystallographic Data Centre.
PMR: Structures in the CCDC are not Open. Their distribution is controlled by the CCDC and there is no right of re-use. Put them anywhere Open.
PMR: In general I get good vibes about GigaScience. I think they meet most, but not all, of my initial principles. However, I would like to see Data addressed specifically and consideration given to the Panton Principles for Open Data in Science http://pantonprinciples.org/ including clear labelling.
PMR: If you reply in comments these will be visible to everyone. I will treat them constructively.
UPDATE: Comment from GigaScience to this blog crossed this post.