GIFs and other horrors

The GIF and its extended family (PNG, JPEG, TIFF, BMP, etc.) are major destroyers of scientific data. This post shows why they should be avoided for much scientific data. (The GIF has additional infamy through the patent fiasco.) In this post I'll use "GIF" to refer to all bitmapped formats (as opposed to vector formats such as SVG).

All bitmapped images contain data captured as individual pixels. The resolution of the data cannot, therefore, be better than the separation between pixels. The problem occurs when a high-resolution object (such as a spectrum) is captured as a GIF. A spectrum in chemistry typically has a resolution along the x-axis of ca. 16,000 points, while a GIF may have 1000. Therefore about 94% of the data are lost in converting from spectrum to GIF. Sometimes the conversion involves dithering pixels so that the final image looks somewhat more beautiful, but this adds no information and usually destroys some. Anyway, here goes:
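
First, a quick check of the arithmetic. This is only a sketch: the 16,000 points and 1000 pixels are the figures quoted above, while the two peak positions and the "keep the tallest point per column" rule are assumptions made up for illustration, not how any particular converter works.

```python
# Figures quoted in the text; the peak positions below are invented.
N_POINTS = 16_000   # x-axis points in the spectrum
N_PIXELS = 1_000    # pixel columns in the image


def fraction_lost(n_points, n_pixels):
    """Fraction of x-axis points that get no pixel column of their own."""
    return max(0.0, 1.0 - n_pixels / n_points)


def to_pixel_columns(values, n_pixels):
    """Collapse each run of points into one column, keeping the tallest."""
    n = len(values)
    return [max(values[i * n // n_pixels:(i + 1) * n // n_pixels])
            for i in range(n_pixels)]


def count_peaks(values):
    """Count non-zero entries (each one stands for a sharp peak here)."""
    return sum(1 for v in values if v > 0)


spectrum = [0.0] * N_POINTS
spectrum[100] = 1.0   # two sharp peaks, 5 points apart
spectrum[105] = 0.8

image_row = to_pixel_columns(spectrum, N_PIXELS)

print(f"lost: {fraction_lost(N_POINTS, N_PIXELS):.2%}")
print("peaks in spectrum:", count_peaks(spectrum))
print("peaks in image:   ", count_peaks(image_row))
```

Each pixel column has to stand in for 16 spectrum points, so the two peaks land in the same column and the image shows only one. The lost fraction is 1 - 1000/16000, which is where the roughly 94% comes from.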

Text and chemicals:

bjoc-react.gif

- a bit difficult to read, so let's magnify it:

bjoc-react2.GIF

Bigger but not much better...

bjoc-react4a.GIF

... now we see the full horror - the dithering hasn't added information - it just hid the problem.

It isn't just that we have jaggies, but we can actually lose information in a seriously misleading manner. Here's a chemical reaction:

betalactam2.GIF

This looks very pretty. But suppose we have to shrink it just a little bit (say 10%). Now we get:
betalactam2a.GIF

What's happened? The lines used to be one pixel wide. When the picture was shrunk the converter had to decide whether each line still fell on a vertical column of pixels. This one just missed, so it hasn't been drawn. This corrupts, rather than destroys, the chemistry - it could be mistaken for a different molecule!
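
The effect can be reproduced in a few lines. This is a sketch under assumptions: it treats the shrink as nearest-neighbour sampling at pixel centres, and the widths (20 pixels down to 18, a 10% shrink) and line positions are invented. The converter used for the picture above may work differently.

```python
def sampled_columns(src_width, dst_width):
    """Source column read for each output column (nearest neighbour,
    sampling at pixel centres, integer arithmetic only)."""
    return [((2 * j + 1) * src_width) // (2 * dst_width)
            for j in range(dst_width)]


def shrink_row(row, dst_width):
    """Shrink one row of pixels by copying the sampled source columns."""
    return [row[c] for c in sampled_columns(len(row), dst_width)]


def row_with_line(width, column):
    """A row of background (0) with a one-pixel-wide line (1) at `column`."""
    return [1 if x == column else 0 for x in range(width)]


SRC, DST = 20, 18   # a 10% shrink

skipped = sorted(set(range(SRC)) - set(sampled_columns(SRC, DST)))
print("source columns never sampled:", skipped)

for column in (4, 5):
    shrunk = shrink_row(row_with_line(SRC, column), DST)
    state = "survives" if sum(shrunk) else "vanishes"
    print(f"line at column {column} {state}")
```

With these numbers two of the twenty source columns are never read. A one-pixel line that happens to sit on one of them disappears completely, while its neighbour one pixel over is untouched.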

In practice the greatest destruction is probably in the spectra. Remember they have a resolution far greater than the screen. But here are some pixel-based spectra from supplemental data. You can find these in all publishers' repositories...

acs.gif

The resolution of the spectrometer is probably 0.001 on the vertical axis - the GIF can only manage about 0.025.
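
To put numbers on that, here is a small sketch. The step sizes 0.001 and 0.025 are the estimates above; the four readings are invented, and intensities are held as whole thousandths to keep the arithmetic exact.

```python
STEP_SPECTROMETER = 1   # 0.001, expressed in thousandths
STEP_IMAGE = 25         # 0.025, expressed in thousandths


def pixel_row(intensity_thousandths, step=STEP_IMAGE):
    """Index of the pixel row that a given intensity falls into."""
    return intensity_thousandths // step


# Invented readings: 0.500, 0.510, 0.524 and 0.525
readings = [500, 510, 524, 525]
rows = [pixel_row(r) for r in readings]

print("spectrometer steps per pixel row:", STEP_IMAGE // STEP_SPECTROMETER)
print("pixel rows:", rows)
```

The first three readings are distinct to the spectrometer but share a pixel row, so the image cannot tell them apart.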

bjoc-spect.GIF

Here the spectrum has been dithered but that can't save it. Again the actual data resolution is probably 50 times what you can see.

rsc-suppdata.jpg

And this is one of the best. It was a proper digital spectrum. It's been printed out (losing some of the metadata such as frequency) and annotated with the compound on a Post-it (though we cannot make sense of what is attached - it seems to be related to a different spectrum). Then it has been photocopied - losing resolution again. We don't know how it got to the publisher, but here is their record of the scientific experiment.


5 Responses to GIFs and other horrors

  1. GIFs are a horror in other areas of science and math, too. Most programs that make LaTeX text available on the web use GIFs for all of the mathematical equations -- yuck! Sometimes there is an alt tag for the image in which some of the information is findable -- if you know to search for LaTeX markup. Most scientific papers and websites don't use MathML or any other searchable format. If you're a mathematician who knows the name or class of equations, you can search on that. If you need to find an equation of the form x, then you're out of luck.

  2. pm286 says:

    (1) Yes! It amazes me how little progress has been made towards universal semantic maths or even universal presentational maths. Back when the Web was bright and shiny and innocent (1994) there was a splendid tool - LaTeX2HTML (Nikos Drakos, Leeds, UK) which did exactly that. It produced GIFs for the maths and that was reasonable at the time. But now we need built-in support for MathML and SVG in browsers, and we need to educate the community about the pixel disaster.

    One area we are starting to make progress in is "plots" - graphs of y (y1...) against x. Most of these are trashed as PDF or GIF. But they can all be captured in STMML (a subset of CML) and SVG. We hope to make rapid progress here.

    "If you need to find an equation of the form x, then you're out of luck."
    I agree. I assume there are lots of people who would like to search for (say) first-order partial differential equations in (say) the atmospheric chemistry literature.

  3. Rich Apodaca says:

    Nice demonstrations of the "hamburger effect". Once data have been "flattened" as bitmaps (especially at low resolution) there isn't much that can be done to reconstruct the original.

    What's the answer, though? Even SVG has limitations. Currently browser support (and other software support) is limited. A lot of scanning software doesn't support it. NMR, IR, and chromatography software generally doesn't offer an "export as SVG" option either. Finally, awareness of the problem is so low that few see a need to change what they're doing. And many wouldn't change even if they knew a problem existed, due to lack of technical capacity and accepted standards.

    The successful solution to this problem is probably one in which scientists don't have to change their current practices. How that actually works technically is an interesting area of research...

  4. Pingback: Unilever Centre for Molecular Informatics, Cambridge - petermr’s blog » Blog Archive » Presentation to Open Scholarship 2006

  5. Great post Peter -- this is a great illustration of why not giving out (raw) data is such a problem.

    @Rich: I note that SVG support is getting pretty good: Chrome and Firefox have supported it for a long time and IE9 is scheduled to include it (and there are also now plenty of workarounds for IE's lack of support, including this wonderful JavaScript hack from Google: http://code.google.com/p/svgweb/)
