The Ridge of Refactoring

pinnacleridge.jpg
The Island of Skye has the most dramatic mountains in the UK, and Sgurr Nan Gillean is one of the most visible and well known. For me the Pinnacle Ridge (left) [1] epitomises the reality of computing.
There is an often-quoted Gartner's Hype Cycle for technology, with a graph like this (not Open). After the

  • Technology Trigger comes the
  • Peak of Inflated Expectations then
  • Trough of Disillusionment,
  • Slope of Enlightenment and
  • Plateau of Productivity.

Great fun for consultants. I sometimes even believe it. However, our software projects usually behave like this:

  • The idea. The lash-up demo. Everything works. Everything moves fast…
  • The first public wonder demo. People are wowed…
  • Everything breaks. Upgrades in the browser kill the demo. Everything takes 10 times longer than it used to. It really does.
  • The endless fractal Ridge of Refactoring…
  • …and occasionally a stunning view through the clouds

What is refactoring? A metaphor might be maintaining a garden (weeds, pruning, all the seeds of decay). Lots of work goes in but little seems to change. Sometimes (as when parts are ploughed up) it seems to go backwards. Refactoring is making your code better without having anything new to show people. It's time-consuming, often solitary, and filled with local troughs of disillusion. Frequently you go downhill, and when you have climbed the next peak you are only a little higher than before. But it is a little higher.
Here's a simple example. I want to print some lines to the output (don't worry about the Java, the idea should be clear):

String[] ss;
...
for (int i = 0; i < ss.length; i++)
    System.out.println(ss[i]);

This code goes through each string by incrementing the index i and printing it. I have written thousands of little loops like this. It's not wrong, but it's not as good as it could be. Why?

  • I haven't delimited the loop (without {} it is easy to break the loop structure when editing).
  • I might edit the code and change the value of i within the loop.
  • One of the entries in the array might be null.

and several other ways of introducing time-bombs.
So I edit it:

List<String> moleculeNameList;
...
for (String moleculeName : moleculeNameList) {
    CMLUtil.output(moleculeName);
}

I haven't changed the functionality of the code – my colleagues won't notice anything different. I might have had to make many hundreds of edits like this (“blood on the floor at midnight”). There will be many times when I am disillusioned – the code no longer works. Often I feel like a hermit crab which has discarded its safe shell and hasn't yet found another. I can't go to sleep until it's fixed. I don't know how long that will be. Programmers make mistakes at 0300. There is fog all around (this is very common in Skye, and in the Cuillins the compass points in the wrong direction).
Finally it all compiles. And the Unit tests (more later) run. I get the green bar. Now I can go to sleep (or at least to bed).
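For those who haven't met unit tests, here is a minimal JUnit-style sketch of the idea. The printAll helper is hypothetical – a stand-in for the refactored loop above, not the real CMLUtil call:

import static org.junit.Assert.assertEquals;

import java.util.Arrays;
import java.util.List;

import org.junit.Test;

public class PrintLinesTest {

    // hypothetical helper standing in for the refactored loop above
    private String printAll(List<String> moleculeNames) {
        StringBuilder sb = new StringBuilder();
        for (String moleculeName : moleculeNames) {
            sb.append(moleculeName).append('\n');
        }
        return sb.toString();
    }

    @Test
    public void refactoringPreservesOutput() {
        List<String> names = Arrays.asList("benzene", "pyridine");
        // the refactored code must produce exactly what the old loop did
        assertEquals("benzene\npyridine\n", printAll(names));
    }
}

If the assertion holds, the bar is green; if a refactoring changes the behaviour, it goes red immediately and tells you where.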
So what is better, and why did I refactor? Simply, the code is easier to read and less likely to break. I can output the result to somewhere other than the screen. If someone in the Blue Obelisk or elsewhere wants to re-use it, it's easier and quicker. It's close to a community component. And that is what we are striving for.
The good news is that Refactoring is very well appreciated in the community. Eclipse has special functions for refactoring. With a little investment in time it can go through all your code and make simple changes (rather like this one) on hundreds of instances without making a mistake. Now that’s productivity!
[1] See http://en.wikipedia.org/wiki/Image:Pinnacle_ridge%26gillean2.jpg for authorship and (Open) copyright of the image.

Posted in programming for scientists | Leave a comment

Knowledge-limited, not time-limited

In recent comments, JamesM raised the idea of "spike solutions".
I had never heard of these (nor had Wikipedia) so I asked and found they came from the XP school – here is a reasonable description. Particularly:

Spikes are good when you are knowledge-limited, not time-limited. — KentBeck

or to put it simply – when you haven’t a clear idea where you are going.
This happens (or should happen) a lot in research. Essentially we start out to see if a particular approach goes somewhere useful. You can’t write tests in advance because you don’t know what you are testing. Here are examples from my own work.

  • I would like to be able to interpret chemical diagrams as molecular connection tables (i.e. as atoms and bonds rather than pixels). I knew it had been done before by commercial companies but the code was unaffordable and, I suspect, of limited applicability. And I had what I thought was a smart idea – to look for common patterns, as the images within a paper would probably come from the same software. I had no idea whether I would use neural nets, Fourier transforms or brute force. So I downloaded a number of images which looked of good quality, and lashed up some code to try to isolate glyphs for machine learning (see the sketch after this list). In fact the glyphs turned out to be less isolated than I had thought – the images weren't monochrome but had a lot of antialiasing to soften them. This meant that some image processing was necessary – so I had to create some simple filters (and I sometimes don't reuse code because I like implementing algorithms to see how they work). I made it to the start of the next stage but there were no quick wins and not many wins at all. At that stage I stopped the activity and moved to using text parsing instead.
  • Frequently when you deal with external data sources – especially unstructured ones such as text – you get a Zipf's law distribution of problems. You get a good feel for how the bulk of the data behaves, and code for that. You might at this stage decide you wish to validate your code against tests. However with every new data item there is a chance that it uses something slightly different which might break the test. So you are actually testing the data against the tests, not the code. At some stage you need to draw a line and declare an arbitrary conformance spec. There are then at least two kinds of test: one for the code against a small “platonic” data set, which makes sure that you don't break core methods during refactoring, and one for the data, which tests conformance against this platonic spec (sketched at the end of this post).
  • I wish to use an external tool (database, library, etc.). I don't know how this works, or even whether it works, and if so, whether it will do what I want. So there is quite a lot of glue code that makes connections, creates sample objects, etc. We can create test objects but we can't test the tool until we know how it behaves!
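The glyph-isolating lash-up in the first bullet was of roughly this flavour – a minimal sketch, not the actual code; the class name and the threshold value are my own illustrations:

import java.awt.image.BufferedImage;

public class ThresholdFilter {

    // Crude binarisation: antialiased grey pixels are forced to black or
    // white so that glyphs separate from the background as connected blobs.
    public static BufferedImage binarise(BufferedImage in, int threshold) {
        BufferedImage out = new BufferedImage(
            in.getWidth(), in.getHeight(), BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < in.getHeight(); y++) {
            for (int x = 0; x < in.getWidth(); x++) {
                int rgb = in.getRGB(x, y);
                // average the red, green and blue channels to get a grey level
                int grey = (((rgb >> 16) & 0xff) + ((rgb >> 8) & 0xff) + (rgb & 0xff)) / 3;
                out.setRGB(x, y, grey < threshold ? 0xFF000000 : 0xFFFFFFFF);
            }
        }
        return out;
    }
}

Even something this crude shows quickly whether the glyphs will separate cleanly – which is exactly the knowledge a spike is meant to buy.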

Sometimes these things actually work! So at that stage the discipline has to be to modularise and refactor the experimental code. Obviously it is post facto, but at this stage it should be possible to write the tests – and so there will usually be some catch-up in testing. But do it as soon as possible because it will get worse if you leave it!
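To make the two kinds of test concrete, here is a minimal JUnit-style sketch; the isWellFormedName validator and the spec it encodes are hypothetical illustrations:

import static org.junit.Assert.assertTrue;

import java.util.Arrays;
import java.util.List;

import org.junit.Test;

public class ConformanceTest {

    // hypothetical validator standing in for real parsing code
    private boolean isWellFormedName(String name) {
        return name != null && name.matches("[A-Za-z0-9,()\\-\\s]+");
    }

    // "Platonic" test: a tiny frozen data set guarding core behaviour,
    // so that refactoring cannot silently break it.
    @Test
    public void coreMethodsSurviveRefactoring() {
        assertTrue(isWellFormedName("2-chloropropane"));
    }

    // Conformance test: harvested data checked against the arbitrary spec;
    // a failure here indicts the data, not the code.
    @Test
    public void bulkDataConformsToSpec() {
        List<String> harvested = Arrays.asList("benzene", "pyridin-2-ol");
        for (String name : harvested) {
            assertTrue(name + " violates the spec", isWellFormedName(name));
        }
    }
}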

Posted in programming for scientists | 6 Comments

More Mystery Molecules

Four more mystery molecules – not all from Pubchem. There is a stronger link between these than the last ones. The actual link requires some knowledge or some intuition into my thought processes. There is a purpose behind this – which will be revealed soonish!


mol1.png Pubchem CID 70723


mol23.png


Pubchem CID: 10074
mol3.png


mol4.png


Note: this post has been re-edited because it was slightly corrupted by WordPress.

Posted in "virtual communities", chemistry, data | 3 Comments

Totally Synthetic Useful Org Prep Daily

I am delighted at the increase in activity of blogs about synthetic organic chemistry. This is about how to make carbon compounds, many of which occur as natural products (produced by plants, bacteria, etc.) which often have valuable medicinal properties. They aren’t easy to make in the lab, and require complicated multistep processes.
The results of this work are reported in the peer-reviewed literature and you normally have to subscribe to the journals (or secondary abstracting services) to know what is going on. But there is now a chemical blogosphere in which practising (young?) chemists report on key papers and processes when they appear. The standard of reporting and the (voluminous) commentary is very high indeed – there is virtually no irrelevant trivia – and some of the comments are perceptive in that they praise or deprecate some or all of the primary work.
I'm quite sure that this blogosphere will develop to become a key part of informatics (publishing, retrieval, etc.) in this area. Obviously the blogs are openly accessible, and several (like mine) use Creative Commons or Science Commons licenses. They have immense potential to change the model of chemical information. Jean-Claude Bradley of Useful Chemistry, commenting in Org Prep Daily, says:

As you mention the desire to control format and quick indexing by Google are precisely why this bottom up approach to disseminating scientific information will become increasingly important over time. With Google as the UberDatabase, the usefulness of publication gatekeepers becomes more and more questionable. It is then up to the author to make it easy for their work to be found. For example, in organic chemistry, adding InChI strings of the molecules used and produced at the end of posts makes it that much easier to locate molecules on Google.

Indeed Google becomes the UberDatabase. Currently it deals primarily with text strings, although I am sure we can expect more magic soon for things like images, music, etc. Chemistry can be done with a clever trick – the InChI. This takes a complete chemical structure and turns it into a magic string – try our WWMM Web Services GoogleInChI server to get a feel for this. If the chemistry in the blogosphere is published as InChIs then Google acts as an UberChemicalDatabase.
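As an illustration of the trick (a sketch of my own, not the actual GoogleInChI code), take an InChI and hand it to Google as an exact-phrase query; the InChI here is the one for 1,1,1-trifluoro-3-chloropropane from an earlier post:

import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class GoogleInChI {
    public static void main(String[] args) throws UnsupportedEncodingException {
        // the InChI is the indexable "magic string" for the molecule
        String inchi = "InChI=1/C3H4ClF3/c4-2-1-3(5,6)7/h1-2H2";
        // quote it so Google treats it as a single exact phrase
        String url = "http://www.google.com/search?q="
                + URLEncoder.encode("\"" + inchi + "\"", "UTF-8");
        System.out.println(url);
    }
}

Anyone who blogs the same molecule with its InChI becomes findable by exactly this query – no central database required.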
The main challenge is to get InChI – which is only a year old – adopted as the main way of indexing molecules. That is where the blogosphere comes in. So we are starting to talk with the main chembloggers to see what tools are required and what type of social computing will work in this area.
Among the synthetic chemistry blogs are:

  • TotallySynthetic – a very detailed account of selected key papers from the recent literature. These syntheses may consist of many steps (A->B->…->Z)
  • Org Prep Daily – which highlights a typical and important step (C->D) of general applicability
  • Useful Chemistry – where J-C B reports the chemistry as he does it!

I also have to mourn the passing of Tenderbutton’s blog – he has laid down his “pen” to finish his thesis. Like The Chem Blog and In The Pipeline this was a more discursive blog (although it had frequent comments on organic synthesis).
As Org Prep says:

The Org Prep Daily access rate has been pretty high this week – clearly with the help of the plugs from Dylan, Kyle, Tot Synth and also Jean-Claude in their blogs. I guess there are few hundreds regular readers now, just few weeks after this site started. I am particularly pleased that Google search terms like “ninhydrin solution TLC”, “azide from mesylate”, “IBX Dess-Martin” and “TFFH” are getting through and directing people here. I plan on writing Org Prep Daily in the near future, at least until the end of this year, to see if this kind of interest would continue.
There has been a similar project, Synthetic Pages http://www.syntheticpages.org/browse.php going on for several years now. They have a seriously useful synthetic procedure collection there and I definitely recommend Synthetic Pages to everyone's attention. I wish them best luck with their effort. The reason why I did not send my procedures in there was that I wanted to start my own page with a more personal touch. A site that would have some day-to-day activity, where one could have a comment section after each procedure. I think the blog format may work for this purpose. The main inspiration came from Dylan's Tenderbutton but the decisive factor was the WordPress software and free hosting at wordpress.com. WordPress has made this a fairly effortless undertaking, even for a computer-naive person like me.
Two things about the near future of Org Prep Daily: 1) I have not taken any vacation since I started at Scripps Florida and I am taking some time off, one week from now (my dad is visiting) so there will be a brief hiatus on updates – for about one week – beginning from next Saturday.
2) Call for authors: It is clear that it wouldn't be possible for me to keep up with two-procedures-a-day updates if every procedure here was to be based on my experimental output (and I wouldn't want to go on by posting rubbish). When I started Org Prep Daily, I already had a collection of procedures from current and past projects. Since I am not in industry anymore, posting the building-block-and-reagent procedures was non-problematic for me; and I have made lots of these over the years. But my store of good procedures will eventually run out – maybe in one month's time. So I am going to invite other people to write for this page.

This is a vision for social computing in the chemblogosphere. There is tangible synergy between multiple efforts – they diversify and mutate and give each other support. I can see a future where enough chemists are excited by this that most things of note end up in the blogosphere. That’s where we need tools like InChI and others to help us – we are developing some exciting tools and there will be more posts on this subject quite soon.

Posted in "virtual communities", chemistry | 1 Comment

Mystery molecules

Here are some compounds (taken from PubChem). I give no more explanation. I will be pleasantly surprised if you can work out why I have posted them.


1,1,1-Trifluoro-3-chloropropane:
InChI=1/C3H4ClF3/c4-2-1-3(5,6)7/h1-2H2
Molecular Weight: 132.512 g/mol
Molecular Formula: C3H4ClF3
Coronene:
coronene.png
Molecular Weight: 300.352 g/mol
Molecular Formula: C24H12
Gestrinone:
gestrinone.png
Molecular Weight: 308.414 g/mol
Molecular Formula: C21H24O2


Quartz:
quartz.png
Molecular Weight: 60.0843 g/mol
Molecular Formula: O2Si


Posted in chemistry | 7 Comments

Extreme Programming for Small Scientists?

We have a new autumn intake of researchers into our Centre and are aware that there are constantly changing demands on the software and informatics skills needed. In Big Science projects there is provision for infrastructure and training and well-developed methodology for the creation of software. We’re working out what is appropriate for “smaller” sciences like chemistry.
“Small” is not inferior – in fact it can have advantages, allowing faster and more diverse activity. But there is less formal support and software usually has to be done in the margins of projects. Software per se has little positive formal reward in science as it is the research in citable papers that matters to the evaluators, regardless of the value to the community.
So how do we develop a good, modern, software environment and lead people to best practices? Today I’ll start with Extreme Programming (XP) which talks a lot of sense (I quote from Wikipedia):

Extreme Programming Explained describes Extreme Programming as being:

  • An attempt to reconcile humanity and productivity
  • A mechanism for social change
  • A path to improvement
  • A style of development
  • A software development discipline

and

… five values are:

  • Communication
  • Simplicity
  • Feedback
  • Courage
  • Respect

and summed up in 12 practices, grouped into four areas, derived from the best practices of software engineering:

Fine scale feedback

  • Pair Programming
  • Planning Game
  • Test Driven Development
  • Whole Team

Continuous process

  • Continuous Integration
  • Refactoring or Design Improvement
  • Small Releases

Shared understanding

  • Coding Standards
  • Collective Code Ownership
  • Simple Design
  • System Metaphor

Programmer welfare

  • Sustainable Pace

Now, XP is aimed at teams of developers in commercial organisations creating saleable products against whose success the team can be measured, whereas small scientists are often working singly on projects with no positive software metric. So can XP (which has many critics) be relevant?
I think some of it can. The five values are de facto attributes of a successful Open Source project, so by adopting Open Source (especially on a distributed global model) you have to adopt these. You cannot grow a successful group if they do not communicate, write simple code, give feedback (bugs), have extreme courage (more than XP demands), and have respect. So if we can translate the Open Source values into local practice then we have imported these values, regardless of the size of the projects. Of course there has to be some shared goal, but most research departments probably provide some of that (there will, of course, be some individuals working in such new areas that they have no natural companions).
Of the 12 practices some are only applicable to commercial and quasi-commercial organisations (perhaps in Big Science). So my list is something like:

  • Pair Programming. Where possible someone else should work alongside you some of the time (“mentoring” could be a better word). The second person need not be an expert programmer, but may be good at designing information, or may simply act as a rubber duck.
  • Test Driven Development. Absolutely essential. JUnit tests (more in later posts) have revolutionised my programming – I couldn't live without them. See the sketch after this list.
  • Whole Team. Not easy, as not everyone belongs to the same team, but valuable if possible.
  • Continuous Integration. Yes. Things change so fast that we cannot work with infrequent large releases. Working on Sourceforge we are used to nightly builds and welcome them. Of course the nightly builds have to pass the JUnit tests!
  • Small Releases. Again with the Sourceforge mentality this is standard practice. It does require careful attention to APIs – too many changes and the re-users get disillusioned. For example I decided to refactor the namespace for CML (there were just too many variants) to a single namespace for all time. One of my valued users told me that he would just about tolerate this, but any more and I was dead meat!
  • Coding Standards. Difficult to enforce socially, but happily the tools (at least in Java) implicitly set standards. Tools such as PMD are very useful, and we can hopefully standardise on a set of style guides which are not too picky.
  • Collective Code Ownership. Again any Sourceforger gets used to this. But it can be more difficult within a real-world group.
  • Simple Design. Fundamental, but not easy to learn or teach. Like architecture, you know it when you see it! So emulation is a good approach, as is constant code review by others. The balance between YAGNI and anticipation of requirements is difficult.
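To give a concrete flavour of Test Driven Development, here is a minimal JUnit sketch. The MolecularFormula class is hypothetical – invented for this illustration: the test is written before any parsing code exists, and the class is then coded until the bar goes green.

import static org.junit.Assert.assertEquals;

import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.junit.Test;

public class MolecularFormulaTest {

    // Hypothetical class: just enough code to make the test below pass.
    static class MolecularFormula {
        private final Map<String, Integer> counts = new HashMap<String, Integer>();

        MolecularFormula(String formula) {
            // an element symbol followed by an optional count, e.g. C2, H6, O
            Matcher m = Pattern.compile("([A-Z][a-z]?)(\\d*)").matcher(formula);
            while (m.find()) {
                int n = (m.group(2).length() == 0) ? 1 : Integer.parseInt(m.group(2));
                counts.put(m.group(1), n);
            }
        }

        int getCount(String element) {
            Integer n = counts.get(element);
            return (n == null) ? 0 : n;
        }
    }

    // This test was written first; the class above was written to satisfy it.
    @Test
    public void countsAtomsInEthanol() {
        MolecularFormula formula = new MolecularFormula("C2H6O");
        assertEquals(2, formula.getCount("C"));
        assertEquals(6, formula.getCount("H"));
        assertEquals(1, formula.getCount("O"));
    }
}

A real parser would need brackets and repeated elements; in TDD those refinements arrive by first adding more tests.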

I add to this things that seem obvious to the commercial developer but by no means so natural to the Small Scientist:

  • Use an integrated development environment (IDE). These are now Open and very impressive. We use Eclipse for Java and are going to recommend it to everyone in the Centre.
  • Standardise on libraries. Again that is difficult in some cases, but we can now do this with Blue Obelisk for chemical informatics. After all, some of us have spent enough time developing it!
  • Present software projects to the assembled group even if they are on different projects. Be honest – what went wrong is often more valuable than what went right.

So – if you have read this far – we would be very grateful for any feedback from other Small Scientists in similar positions.

Posted in programming for scientists | 23 Comments

My Data or Our Data?

In the Science Commons meeting Creating a Vision for Making Scientific Data Accessible Across Disciplines (see earlier post), Andrew Lawrence (Royal Observatory Edinburgh) illustrated the wide range of “ownership” of data even in a single discipline – physics. I hope my notes do it justice.
Distinguish “ownership” from re-use. I can continue to own data while allowing others to use it. Legal constraints (formal) vs. community practices (informal). Data (private until publication) vs. knowledge (public, universal). Technology and policy must address all of these.
He showed a knowledge chain:

  1. raw data (directly from instrument)
  2. calibrated data (skymaps, catalogues)
  3. physical properties (particular knowledge)
  4. understanding (properties in general)

Generally the data in 1/2 “belong” to the experimental team and are separated from 3/4 by the “public ownership line”.
In physics he described three communities:

  1. Condensed matter (solids, liquids, surfaces, etc.): generally done in small labs and small teams, sometimes needing experiments on a facility such as a reactor or synchrotron. Data are very sensitive until publication, after which they are thrown away. However there is political pressure (especially from the funders and facilities) to re-use the data.
  2. Particle Physics. The epitome of big science: big facilities (CERN, etc.), big teams, often with “Stalinist” control. Data belong to the project, with elaborate rules for access and re-use. Good data infrastructure (these people gave us the Grid). They assert that data re-use is pointless (who else could re-use it?) but offer the infrastructure for re-use.
  3. Astrophysics. (small) Big facilities (telescopes) but small teams – might get a few nights' use at a time. Analyze, publish, throw away. BUT the facilities archive all the data – they are private for a year and then anyone can access them. There are standards for archival, formats, access, analysis. The “Virtual Observatory” provides data that are “science ready” – i.e. a potential user should be able to understand the provenance and know how to use them. (big) Systematic surveys (e.g. the Hubble telescope) which produce “science-ready” archives. Everything is public from the start. “The archive becomes the sky”.

The Virtual Observatory has a small set of professional service centres and a large set of end-users. Andrew finished by making a case for global standards, well-funded data centres, infrastructure in software, and data servers.
So where are other subjects in this? I traditionally look to biosciences with envy for their open data and requirements to publish. But even here we are under threat. Tim Hubbard quoted Graham Cameron as saying that IPR restrictions would have made it impossible to build the European Bioinformatics Institute today. And universities continue to urge IPR protection, which is rapidly creating the anticommons. In crystallography there is an enviable requirement to publish data alongside full-text articles. Some publishers (rightly) regard these data as copyright-free, while others (to be named-and-shamed later) manufacture “creative works” by adding their copyright to experimental data.
And chemistry… … mainly publish and throw away… …most data is lost. Our SPECTRa project is looking at why this is so – so far our findings show that social factors (“ownership”) are the main cause. And re-use? We publish hamburgers, so there aren't many cows.

Posted in data, open issues | 1 Comment

Jmol – we love it

Bob Hanson has just released yet another stunning set of routines in Jmol.
Here is a snapshot (of a slice through a crystal), but the actual demo is, of course, interactive.
jmol1.gif
You don’t have to be a scientist or computer guru to appreciate these – any modern browser will bring molecules to life in front of your eyes.
And it's no coincidence that last week Jmol was #20 in the Sourceforge activity stats (and CDK was ca #30). That is amazing, as there are ca 1,000,000 projects (perhaps 100,000 active). The top slots are always filled with mainstream computing tools – editors, mailers, content managers, etc. But here are two Blue Obelisk projects right up there with them.
So the tide is turning towards OpenSource in molecular science. If anyone tells you that only commercial companies can write useful code, point them to the Jmol applet and site. And, although it’s less easy to demonstrate visually, to all the Blue Obelisk projects.
jmol2.png

Posted in "virtual communities", chemistry, open issues | 1 Comment

"Departmental anything, not just chemistry, may be dying"

(I think WordPress failed to publish this when I wrote it, so please excuse if it is a second posting).
Last week’s Nobel Prize for Chemistry has upset a number of the chemical bloggers, some of whom even posted odds for various chemists to win. It was not won by any of them – but by Roger Kornberg for what may be labelled “chemical biology”. There is a feeling of “unfairness” – the prize should be protected for “mainstream chemistry”. A useful summary of feeling in the ChemBlog is:

As for the Nobel… I’ve thought it out and I don’t really know what to say. Kornberg’s work is seminal and groundbreaking and had a Nobel coming to it; however, the endless frustration of seeing more synthetic or physical chemistry passed over for chemical biology is obnoxious. I want to dismiss it as a bandwagon trendy lots-a-hot-air chemistry, but it isn’t. If anything, its time has come and with so much money dumped into it there isn’t any rational reason why important discoveries shouldn’t be coming out of it. Traditional chemistry has built a foundation and upon that foundation we are finally able to tackle the more complex biological systems. That’s something all the self-assembly, macromolecular and synthetic folks can be proud of. That being said, we should also get used to this idea, and as sacrilegious as it is going to sound, it’s the fundamental truth:

Some more correspondents commented in that blog – and I assume that many are recent graduates:

…LOL. Too bad it’s turned into the Nobel Prize for Chemistry and Medicine. I’m going to re-read my molec & cell bio book before setting next year’s odds.

…The guy used x-ray crystallography as a tool to attain his goal… his prize wasn’t for his use of x-ray crystallography.
More importantly:
So far as I can tell, everyone is bitching that a biochemist won a nobel prize. Well, boo****in’hoo! It is biochemistry. Last I checked, just because you don’t use anhydrous THF at least four times a week, doesn’t mean you’re not a goddamn chemist.
…No, in that instance (and really, only in that instance) organic chemistry is a means unto itself. If your goal is to mass produce a marine natural product to fight cancer then synthetic organic chemistry is a tool and the nobel would rightly go to the guy that discovered the bioactivity and not the 20 groups that tried to make it.

…I mean… who honestly thought it was going to go to the molecular elevator guy or the yet-another-chiral-reduction-catalyst guy? If that deserves a nobel prize, I do.

and one of the most relevant:

Departmental anything, not just chemistry, may be dying, [my emphasis] but the real challenge is to find a way to give chemists both the special skills and insights that comprise chemistry at its best with the breadth and depth of knowledge of the complementary fields needed to understand where chemistry can make a contribution. The darwinian process of competitive evolution applied to science and academic recognition may not be the best way to either recognise or understand the major problems the science is now capable of solving. Do you get a Ph.D for finally synthesising something or for coming up with a question that is worth answering?

I think the quote is perceptive. I'm grateful to the chemical community for many things and proud to be part of it. But I hope I am more than a “chemist”. As I have posted earlier, we need multidisciplinary scientists and technologists who go beyond labels. The activity of merging the language and practice of chemistry with the Internet revolution is valued outside chemistry but not yet within it. There are many examples and I'll just briefly mention a few.
PubChem is probably the most prominent and valuable example of the knowledge revolution in chemical science, but it is largely unknown within mainstream chemistry. It has over 5 million molecules but the driving and funding force is biology, not chemistry. Similarly, I lamented that when I and others presented at a session on “Cyberchemistry” at the ACS last month, there was virtually no effective use of, or interest in, the Grid/cyberinfrastructure for chemistry beyond the “usual suspect” 3-4 groups. And in Open Access matters the publishers of chemistry have been among the last to explore this (and most haven't).
We see that in the UK, chemistry departments continue to close. Chemistry is a wonderful subject and it's given me a lot. But there is no divine right for chemistry to exist as a subject, and its current components may well be appropriated as need be by biology, nanotechnology, neuroscience, environmental science and other subjects that catch people's attention.
I use computers without being in a computing department – people are increasingly using chemistry without a “chemistry department”.

Posted in chemistry | 1 Comment

Open Source need not be shiny

A very interesting comment on the tragedy of the lurkers (my concern that Blue Obelisk software is heavily used by people who do not show up in the community)…

  1. daen Says:
    October 4th, 2006 at 9:50 am: I can think of several reasons for this, Peter. I downloaded OpenBabel some time ago, wrote an interface layer (which treats C++ object instances as handles), built a DLL from the source using MinGW and wrote a wrapper for Delphi around the whole thing. It was very much a rush job and I have never gone back to clean it up. It was a quick hack but it kind of worked enough for what we were trying to do at the time (doing SMILES-to-MOLFILE conversion driven from an Access database). In my opinion, there's an ethical issue in contributing code which you know to be sub-standard and have neither the time nor inclination to redact.

Rather than post a reply I’ll expand here.
Firstly, thanks very much. I understand and appreciate this attitude. However you have now already contributed, simply by announcing that you exist! That, in the first instance, is what we want. It has motivated me to make this post! Your contribution ipso facto enhances OpenBabel. It gives us moral support to know that what we are doing is useful. It shows that people want to interface to Access databases. Maybe if 3 other people had done the same, the OB project would seriously consider an interface to Access…
But by its nature Open Source does not regard what you have written as substandard. It doesn't have to be shiny. When I first saw Jmol it was a million miles from where it is now. If Dan Gezelter (hope that's right) had felt it wasn't worth exposing we wouldn't have it.
So we have mechanisms for taking code of all sorts. It might get worked on now, it might lie fallow for a year or more. Someone working on it later might even throw it all away. But that doesn't mean it's not useful.
No one takes contributions to an Open Source project and regards them as “substandard”. They are simply contributions of varying quality and use. They may be useful and buggy, or thoroughly tested and irrelevant (apparently), or even possibly both or neither. It's worth checking beforehand: “I have written some routines that do X… we are not likely to do any more work on them – would they be useful?” You should always get a courteous and considered reply from the guru.
Contributors are always honoured in an OS project, often in alphabetical order. This survives even if (or often when…) their code is refactored or removed so that not a word of the contribution remains lexically. But the contribution has still been made.
If contributors want to remain anonymous – and there is no shame in that – they can contact the guru privately and, if necessary, use an alias on SF. (My name on SF is petermr – hardly an encryption, admittedly – but I could have called myself zaphod237 or whatever).

Posted in "virtual communities", open issues | 2 Comments