Chemistry Blog

May 04

The Source Code Debate

by Kenneth Hanson | Categories: science policy

Few researchers were using computers 30 years ago. That quickly changed with the release of several commercially viable personal computers in the 1980s. Since then, processing power has increased and the cost of computers has decreased, both at an exponential rate (see Moore’s Law).

It’s no surprise that computers are now pivotal in chemistry research. We use them for a wide range of calculations, from determining the 40th decimal place of the absolute energy of He to modeling the release and distribution of toxic chemicals in river basins. The software used to address these complex problems is also becoming increasingly accessible and easy to use. There are already a variety of cell phone apps for chemistry-related problem solving.

Yet, while the prevalence of software and computer-based research continues to grow, the rules for publishing results and sharing software lag behind. The seemingly magical nature of black-box calculations is disconcerting to those who want to know how the answers were obtained (see Sidney Harris cartoon). A palpable concern is growing in the scientific community about the sharing of software, and of the underlying source code, needed to reproduce published results. Two recent opinion pieces, one in Science titled “Shining Light into Black Boxes” and the other in Nature titled “The case for open computer programs,” are trying to bring attention to this issue. The articles discuss the advantages of sharing and the apprehensions about it, and suggest possible changes. Below is a summary of the points raised by the authors of the two articles, as well as the thoughts of others (including myself).

Advantages to sharing software and source code:

  • Reproducibility: As Ince et al. put it, without the release of source code in its entirety, “The vagaries of hardware, software and natural-language will always ensure that exact reproducibility remains uncertain…”
  • Catching errors: A slip in converting units, missing values assigned as zero, accumulated rounding error, or a misplaced decimal point can wildly skew outcomes (see Office Space). Such errors can only be found and corrected if others can read the source code.
  • Facilitating progress: Journals already require that data, equations, materials, methods, and instrumentation be disclosed so that others can test and build on the results. We are all better served when source code is disseminated in the same way, so that programs can be studied and repurposed in future research.
  • Teaching tools: Real, applied examples – that are relevant to research – are useful for new students and researchers learning to program and develop code.
  • Openness: Despite the competition to acquire funding and to publish first, we are all joined in the endeavor of understanding the rules that govern the universe. The open sharing of information has been and will continue to be the foundation of scientific progress.
  • Relying on faith: No matter how prolific or respected you are as a researcher, the implicit assertion, “Trust me, the program works the way I say it does” is not an acceptable means of justifying your results. On a fundamental philosophical level, black box justifications like that should be socially unacceptable in the sciences.
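To make the “Catching errors” point concrete, here is a minimal Python sketch of one of those mistakes: silently treating missing values as zero. The measurement values are hypothetical, invented purely for illustration; they do not come from either article.

```python
from statistics import mean

# Hypothetical replicate measurements (illustrative only);
# None marks a run whose value was never recorded.
measurements = [2.31, 2.29, None, 2.35, None, 2.30]

# Correct handling: drop the missing runs before averaging.
valid = [x for x in measurements if x is not None]
mean_dropped = mean(valid)

# The bug: quietly substituting 0.0 for every missing run.
zero_filled = [0.0 if x is None else x for x in measurements]
mean_zero_filled = mean(zero_filled)

print(f"missing values dropped: {mean_dropped:.3f}")
print(f"missing values as zero: {mean_zero_filled:.3f}")
```

With two of the six runs missing, the zero-filled average comes out a third lower than the average of the four real measurements, and nothing in the printed result flags the problem. Only someone reading the code would see why.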

Apprehensions against sharing software and source code:

  • It’s not required: With the constant push to publish early and often, no one wants to spend unnecessary time and effort during the submission process.
  • Embarrassment: Many computer programmers take pride in how clean, efficient, and elegant they make their code. For researchers, on the other hand, programming is often a means to an end. The idea of someone else looking at your “messy” code with a critical eye could be intimidating or even embarrassing.
  • Citations: There is not a shared mechanism or expectation for citing the authors of source code. Without citations, sharing source code does not help your career and may even help your “competitors.”
  • Formatting: Currently, there is not a standard format for sharing code.
  • Intellectual Property: Obviously, if a program is commercially available or has potential to be, the release of source code would allow anyone to reproduce the program without purchasing it.
  • Source code in the wrong hands can be dangerous: This is a concern among many theoreticians. In a day and age where anyone can calculate energies, structure and spectra of molecules with prepackaged software (Gaussian, Spartan, GAMESS, etc.), the rationalization of results based on black box answers is common. Dissemination of more software and source code is likely to compound the problem.
  • I did the work, so should you: Recently, a colleague of mine contacted the author of a paper to request the source code behind results that could not be reproduced without it. She was basically told, “We used a Monte Carlo method. You can find something similar in Matlab.” My colleague was understandably disappointed by the answer. Writing your own program to double-check someone else’s results, even with Matlab as a starting point, is just not worth the time and effort when it might not reproduce those results anyway (see above).

With all of that said, the question remains: what should be required for publication? Currently, depending on the journal, the requirement can range from a general description of the nature of the program – so individuals can write their own code – to the full release of the source code.

I have been, and will continue to be, a strong advocate of sharing all results and information, no matter how trivial or insignificant they may seem (see Journal of Failed Chemistry). Any piece of information may be the key to a major research discovery that moves an entire field forward. That information could include a program or source code that solves a lingering problem or reveals a new way to think about a solution. For taxpayer-funded research, I agree with Ince et al. that “anything less than the release of the source programs is intolerable for results that depend on computation.” For those of us who want to see source code released, changes clearly need to be made.

Possible changes:

  • Institutionally: Publicly funded research institutions should create and implement a quick, standard, open-source licensing procedure. For most code, commercialization is not an issue. Usually, the question the code addresses is so specific that only a handful of people would be interested, which limits its potential profitability. Yet for source code that is potentially profitable, there needs to be a quick mechanism for protection.
  • Funding agencies: Taxpayer-funded agencies should clearly state their preference for the open dissemination of software and source code. This should include requiring a code dissemination plan in grant proposals.
  • Journals: Publishers could enact a policy requiring that all software and source code necessary to reproduce results be made available as a condition of publication. They also need to develop simple mechanisms for sharing source code.
  • Researchers: We should share our software/source code using any and every possible means. Sharing mechanisms can range from including it as a supplement to our publications to making it available for download online.
  • Reviewers: Before publication, demand the full source code for any software that is not commercially available. For programs that are, or potentially could be, commercially available, it may be necessary to arrange for an independent third-party tester who is given access to the code after signing a non-disclosure agreement.

This post provided an overview of the debate around the availability of non-commercial software/source code with research publications. In my next post I will discuss a specific example within the chemistry community that exemplifies how a commercially-available software package can further complicate the situation.


7 comments

  1. azmanam

    If it’s publicly funded research, I really don’t think there’s a question. It’s public domain. Maybe that doesn’t mean it MUST be in the supporting info, but maybe a line in the paper ‘source code available upon request’ or something.

  2. Sean

    I think one requirement that is not always enforced is publishing the version number of the program used. I sometimes come across researchers who cite only the publications describing the software, and those are not updated nearly as frequently as the software itself.

    1. Kenneth Hanson

      Good point! That has not even crossed my mind while reviewing papers. I will make sure to pay attention to that from now on.

  3. EquationForLife

    I just had a quick question…when you mention the lack of citation rules for shared software, does that mean right now you can use someone else’s source code/software for a calculation and not even cite their work?

    I apologize if the articles you posted answer the question; I don’t have journal access at the moment.

    1. Kenneth Hanson

      Example 1:
      Suppose person A published a theory 30 years ago as a revolutionary new way to solve a particular problem. Twenty years later, researcher B writes a program to do calculations based on this theory and cites A in their publication. Researcher C can contact B and get their source code. After using B’s code to get an answer, C can publish those results and cite only A’s original paper. If journals don’t require the sharing of source code, there is no way to know that C used B’s code. Even if sharing source code were required, it is very unlikely that the reviewer of C’s paper would know that the code came from B (unless the reviewer was B).

      Example 2: A researcher can find code in a publication from an entirely different area of research, repurpose it for their own work, and pass it off as their own. Since the original programmer will probably never review, let alone learn about, the paper, it is unlikely to be noticed.

      It is a dirty, but relatively safe way to increase your reputation and prestige by pretending you did more work than you actually did.

      More and more publishers are using comparative software to check for plagiarism in papers. However, since publishing source code is not mandatory it is much harder to catch these events.

      tl;dr Yes. You can use someone’s code and not cite them.

  4. Marcus D. Hanwell

    It is great to see this getting so much attention now, and I hope that we are able to change this as I found the situation very disappointing when I was getting into research. If you want to publish results, and especially when they are publicly funded results, there should be an obligation to publish everything necessary to reproduce those results. This used to mean equations and derivations but more and more that means source code in my honest opinion. If you don’t want to publish reproducible results then we should seriously consider the value such publications add to the field.

  5. Sherrill Zin

    When I was younger I wanted to become a computer programmer, because I wanted to develop some wonderful application programs and I also idolized Bill Gates.
