By Neil Chue Hong.
Another social coding scandal broke loose on the Twitter-sphere - and with it the fragile trust between the programmer who open-sources their code and the public who comment on it was destroyed a little. So what's going wrong, and what should we do about it?
Here's the backstory. Someone writes a small script to help them with a workflow. They find it useful and think that maybe others will find it useful too. They publish it on GitHub, and others use it. So far so good.
Then the script gets lots of attention on Twitter, and not all of it the good kind. As the author, Heather Arthur, notes on her blog:
"At this point, I know is that by creating this project I’ve done something very wrong. It seemed liked I’d done something fundamentally wrong, so stupid that it flabbergasts someone. So wrong that it doesn’t even need to be explained. And my code is so bad it makes people’s eyes bleed. So of course I start sobbing.
Then I see these people’s follower count, and I sob harder. I can’t help but think of potential future employers that are no longer potential. My name and avatar are part of its identity, and it’s just one step for a slightly curious person to see the idiot behind this project."
Wait just a second.
Someone has gone to the effort of making their code open and accessible, and now the very community they thought was supportive of them has turned them into a crying wreck?
This chimes with comments in discussions about reasons for open-sourcing scientific software. There is a fair amount of give and take in research, but often your reputation is the most valuable asset you have.
For instance, Joon Ro writes:
This is pretty much the reason why I haven't released code for a project which has been done. I am not that worried about the ridicule, but rather I am worried that somebody else (who works in the same field) might find a bug in my code before me and it will be an embarrassment.
You are not your code
Here's the thing we need to realise - our reputation is not just built on the quality of our published code.
In reply to Joon, Randy Olson notes:
Better to be temporarily embarrassed than have a bug in your code that could cause confounding problems for you or others who use the code in the future.
This is particularly true in science, now that the emphasis on reproducible research is leading people to - shock horror - challenge published results, and where we have had examples of published work requiring retraction after bugs were found in code.
In fact, in many ways it's better for you to be embarrassed by a bug in your code (hey, a scientist can't program!) than by a bug in your science (hey, a scientist can't do science!). And as a fellow scientific computing person, I judge you on how quickly you respond to the bug being found, not on whether there is a bug (and did you put in a test to catch that bug in the future?).
The tricky one is where you're in that mixed-mode career path of being a "research software engineer", i.e. people are hiring you for your coding skills. There, reputational damage can hurt on both fronts. And yet I'd still say that it is positive overall to publish your code and have a bug found in it, because you enabled the bug to be found and therefore it can be fixed. If it's poorly written or poorly documented code it's a bit harder, because in theory you should consider anything you write (open or closed) to be a potential advert to a future employer. But then you'd never complete anything.
Jessica Kerr kindly pointed me in the direction of Scott Hanselman's comment on the subject, where he notes that the code in question was actually rather useful, didn't require idiomatic command line knowledge, and, most importantly, solved someone's problem. What I like best is his analogy that publishing personal pieces of code is like having a garage sale:
One person's junk is another person's treasure.
"I can't believe I found code that does exactly what I needed."
"Wow, I learned a lot from that algorithm."
In any case, what are the chances of your code being called out? Impact Story have been looking at this as part of their work to look at Alt-Metrics, and believe that getting a single tweet about your GitHub repository puts you in the top 15% of repositories readvertised to other people. Just three "stars" puts you in the top 20% of repositories recommended.
It seems that the maxim that "all news is good news" holds true - it's better to have people see and ridicule your code than for no one to see it at all. And in the case of Heather Arthur, the publicity has had a positive effect - it appears that many more people are forking her repository and creating pull requests to improve her code!
So, go ahead - release something out there today! If you feel a little ashamed of it, release it anyway, maybe under Matt Might's academic-strength CRAPL license. And if you see someone else's code and find an issue, keep that criticism constructive.
PS If you're a researcher who's still a bit embarrassed about releasing your code, perhaps participating in Software Carpentry might give you the skills and confidence to do so in the future.