Guides

By Mike Jackson, Software Architect

As part of my open call consultancy for LUX-ZEPLIN (LZ), I looked at how LZ could migrate their data and computation from Excel to MongoDB and Python. There are many resources with valuable advice on cleaning up data held in Excel and getting it into a form suitable for analysis using Python, R or other data analysis packages. Unfortunately, how to handle formulae and cross-references is little discussed.

Based on my experiences, I have written a guide, “Tips for porting formulae from Excel into code”, which provides some (hopefully) helpful hints on how to identify and highlight formulae and cross-references, which can help when porting them to Python or R. It also covers how to restructure tables so that raw data is contiguous, making it easy to read into data analysis packages or to export to a database or files. Feedback, suggestions and additional advice are more than welcome.

Feel free to add these as comments!
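As a small illustration of the "identify and highlight" step, here is a minimal Python sketch that scans a sheet's cell contents for formulae and flags those that reference another sheet. It assumes you already have the cells as a list of rows of strings with the formula text preserved (for example, as read with the openpyxl library, which by default returns formula text such as `=B2*C2` rather than computed values). The function names and sample cells are hypothetical, and the regular expression only covers common reference forms such as `Sheet2!B3` and `'Raw data'!A1`.

```python
import re

# Hypothetical pattern for a reference to another sheet, e.g. Sheet2!B3 or 'Raw data'!$A$1
CROSS_SHEET_REF = re.compile(r"(?:'[^']+'|[A-Za-z_][\w.]*)!\$?[A-Z]{1,3}\$?\d+")


def column_letter(index):
    """Convert a 0-based column index to Excel column letters (0 -> A, 26 -> AA)."""
    letters = ""
    index += 1
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def find_formulae(grid):
    """Return (cell address, formula text, references another sheet?) for each formula cell.

    grid is a list of rows; each row is a list of cell contents, where formula
    cells are strings starting with "=".
    """
    found = []
    for row_number, row in enumerate(grid, start=1):  # spreadsheet rows start at 1
        for col_index, value in enumerate(row):
            if isinstance(value, str) and value.startswith("="):
                address = f"{column_letter(col_index)}{row_number}"
                crosses_sheets = bool(CROSS_SHEET_REF.search(value))
                found.append((address, value, crosses_sheets))
    return found
```

For example, on the two-row sheet `[["Mass", "Scaled"], ["1.5", "=A2*Constants!B1"]]`, `find_formulae` reports `[("B2", "=A2*Constants!B1", True)]`, marking B2 as a formula that depends on another sheet.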

 

By Will Usher, Senior Researcher: Infrastructure Systems Modeller, University of Oxford

Plagiarism is a serious issue, and we are all familiar with the horror stories of students unceremoniously ejected from courses for copying essays. Any undergraduate degree worth its salt teaches students how to cite work correctly, acceptable bounds on quotation and how to attribute ideas and concepts to their sources. But in the growing world of open-source research software, best practices have yet to be universally understood, as I recently found out.

During my PhD at University College London, I became involved in the heady enthusiasm of the Research Software Programming group, attending and then helping out at Software Carpentry workshops. As a consequence, I was keen to apply my new knowledge of Python, version control and software development to my research. As luck would have it, I discovered an existing Python library on GitHub which implemented several Global Sensitivity Analysis routines I could make use of. As I used the library, I started adding bits and pieces, and by the end of the PhD I had made a considerable contribution to the package.

It's probably safe to say that SALib (sensitivity analysis library) is the go-to Python library for the unfortunately still-far-too-niche use of global sensitivity analysis in modelling, and our…

By Aleksandra Nenadic, Training Lead

Say you've got a Google spreadsheet with a column for addresses. It could be street addresses or postcodes. You want to map this data and embed the map into a website. Maybe you also want the map to update dynamically as more rows are added to the spreadsheet. What are your options?

This guide goes through the different ways to do this. Before you can map the data, however, you’ll need to find its geocodes, i.e. the latitude and longitude coordinates of the addresses. For more general locations, such as “UK”, geocoding APIs usually return the coordinates of the area’s centroid (its centre point) or of its capital.
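To make the idea of a centroid concrete, here is a minimal Python sketch that computes the centroid of a polygon given as (longitude, latitude) vertices, using the standard shoelace-based formula. It treats the coordinates as flat x/y values, which is only a rough approximation for small areas away from the poles and the 180° meridian. It is not how any particular geocoding API computes its answer, and the function name `polygon_centroid` is hypothetical.

```python
def polygon_centroid(vertices):
    """Centroid of a simple (non-self-intersecting) polygon.

    vertices: list of (x, y) pairs, e.g. (longitude, latitude), listed in order
    around the boundary. Coordinates are treated as planar, which is only an
    approximation for geographic data.
    """
    if len(vertices) < 3:
        raise ValueError("A polygon needs at least three vertices")
    twice_area = 0.0  # signed area * 2 (negative for clockwise order)
    cx = 0.0
    cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the polygon
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("Degenerate polygon with zero area")
    # Centroid = sum / (6 * area) = sum / (3 * twice_area); signs cancel for either orientation
    return cx / (3 * twice_area), cy / (3 * twice_area)
```

For example, `polygon_centroid([(0, 0), (2, 0), (2, 2), (0, 2)])` returns `(1.0, 1.0)`, the middle of the square.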

Using Google My Maps

Google My Maps is a powerful tool designed to make it easy to create custom maps from your data and to share and publish them online. You don’t need to worry about geocodes: they are calculated for you from addresses and postcodes.

To use this tool, you’ll need a Google account. You can then either load data from a CSV, XLSX, KML or GPX file, or link your Google spreadsheet (making sure either that it is publicly available, via the "File" > "Publish to the web..." option in Google Spreadsheets, or that you have created a special sharing link for it).

By Neil Chue Hong, Director.

This guide explains how software fits with the EPSRC policy framework for research data.

Why write this guide?

From 1 May 2015, organisations that receive EPSRC funding, and their researchers, are expected to comply with the EPSRC policy framework on research data. This sets out EPSRC’s principles and expectations concerning the management and provision of access to EPSRC-funded research data, in particular the principle that “research data is a public good produced in the public interest and should be made freely and openly available with as few restrictions as possible in a timely and responsible manner” (from the EPSRC principles).

This guide has been written to clarify how these expectations relate to research software. It explains how access to research software may be provided in line with the policy, and provides examples of common situations and how they can be dealt with. If you have further questions about the EPSRC research data policy, please get in touch with your EPSRC contact or the person responsible for EPSRC matters in your institution's research support office.

What is research data?

Research data is

By Neil Chue Hong, Director of the Software Sustainability Institute.

Google announced today that their open source project hosting site, Google Code, is to close. The site has disabled the creation of new projects, will turn read-only on 24 August 2015, and will close on 25 January 2016.

In the announcement, Google's Director of Open Source, Chris DiBona, cited the move of projects away from Google Code to other services such as GitHub and Bitbucket; indeed, Google itself has moved thousands of its projects to GitHub.

The first thing to stress is: don't panic. Google has provided a long time for you to migrate your project, along with tooling to make the process easier.

For those who have projects hosted on Google Code and want to move to GitHub, a Google Code to GitHub Exporter Tool is available. As part of the export process, only public issues will be transferred across, and all repositories (SVN, Mercurial or Git) will be converted into Git repositories. More details are available in the FAQ.

The closure of Google Code leaves SourceForge as the last remaining large source code hosting site offering Subversion…

By Mike Jackson, Software Architect.

This is a guide on using Git and GitHub within a VMware virtual machine (VM) which, for whatever reason (e.g. organisational security policies), cannot be connected to a network.

Why write this guide?

This guide arose from our open call collaboration with the Distance project at the University of St. Andrews. They use Windows XP virtual machines for developing their Distance for Windows software. Their interface code, implemented in Visual Basic, is not held under revision control and institutional security policies mean that their XP virtual machines cannot be connected to the network. Please see the blog post on Building a bridge between a virtual machine and the outside world.

Assumptions

  • You have a VMware virtual machine that cannot be connected to the network directly.
  • The host machine that runs the virtual machine can be connected to the network.
  • You have a repository on GitHub.
  • You have Git Bash installed on both your host…

Much research software starts its life thanks to a research grant. But what happens when your code proves useful and you want to extend it or ruggedise it for release to the wider community? Research grants generally can't help because they focus on solving research problems, not improving code. Who should you turn to?

We're putting together a list of funders and funding calls that can help with the costs of improving code. This list is not comprehensive, but we'd like it to be. If you know of other sources for this type of funding, please let us know and we'll add them to the list.

In general, we recommend using the ResearchResearch.com service to identify and then filter the various opportunities for funding.

Specific calls

  • BBSRC TRDF2 (tools and resources development). A competitive call (<20% success rate), which takes place annually. The code must be within a BBSRC area. For more information, see the last call.
  • Digital Science Catalyst grants. Up to £15,000 for scientific software innovation, which can help attract additional investment. It's an ongoing call, as described on the Digital Science website.
  • EPSRC Software for the Future. 

By Mike Jackson, Software Architect

Working with researchers is something the Institute has been doing for many years now. So we thought it was about time to put together our top tips for software developers working with researchers to help foster productive, and enjoyable, collaborations.

1. Remember they are not software developers

You may know the difference between centralised and distributed revision control, classes and objects, pass-by-value and pass-by-reference, upcasting and downcasting, coupling and cohesion, processes and threads, or a stack overflow and StackOverflow, but your researcher may not. Knowing how to knock together a few dozen lines of code does not make someone a software developer, as writing code is just a fraction of what a software developer does.

Be aware of your researcher’s level of knowledge about software development and keep in mind that it can be hard for people to admit that they don’t know or don’t understand something.

It can also help to adopt a technique from storytelling—show, don’t tell. Don’t tell them to apply the factory pattern, a recommendation daunting in its abstract vagueness. Rather, provide an introduction to the factory pattern and show them an example of how the factory…
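In the spirit of "show, don’t tell", a first example of the factory pattern for a researcher can be as small as this Python sketch: a single function, `make_reader`, decides which class to instantiate from a format name, so the calling code never names the concrete classes. The reader classes and format names here are hypothetical, chosen only for illustration.

```python
import csv
import io
import json


class CsvReader:
    """Hypothetical reader that parses CSV text into a list of row dictionaries."""

    def read(self, text):
        return list(csv.DictReader(io.StringIO(text)))


class JsonReader:
    """Hypothetical reader that parses JSON text into Python objects."""

    def read(self, text):
        return json.loads(text)


# Registry mapping format names to the classes the factory can build
_READERS = {"csv": CsvReader, "json": JsonReader}


def make_reader(file_format):
    """Factory: return a new reader object for the given format name."""
    try:
        return _READERS[file_format.lower()]()
    except KeyError:
        raise ValueError(f"Unsupported format: {file_format}") from None
```

For example, `make_reader("csv").read("x,y\n1,2\n")` returns a list containing the single row `{'x': '1', 'y': '2'}`. Supporting a new format then means writing one new class and registering it in `_READERS`, without touching any code that calls `make_reader`.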

By Mike Jackson, Software Architect.

Your project has been developing software that is becoming ever more popular. You now find yourself struggling to find time to develop and support your software, keep your stakeholders happy, and do your research. One way to continue to satisfy demand is to recruit a dedicated software developer for your project. But how do you get the funding? This guide helps you to make the case for funding a software developer for your project. It helps you to define the activities that a software developer would do and the effort these will take. It also suggests how you can gather evidence of your software’s impact and popularity, so that funders can be reassured that the benefits of funding ongoing development of your software extend far beyond your project itself.

Why write this guide?

As part of our collaboration with BoneJ we helped to write a case for funding a full-time software developer to maintain and further develop the BoneJ open-source software. We felt that a guide based upon both this work and our experiences in developing, maintaining and supporting open source software would be useful to researchers seeking funding for developers on their projects.

Provide an overview of your software

What is the…

By Mike Jackson, Software Architect, the Software Sustainability Institute

Choosing an open-source licence can be a daunting and time-consuming prospect. A new online resource, tl;drLegal, should make life a little bit simpler!

tl;drLegal provides plain-English summaries of popular open-source licences, allows the licences to be searched according to features of interest (what users can, cannot and must do), allows the implications of combining licences to be explored and provides a simple tool to auto-generate attribution documents in HTML so you can give credit where it's due to the open-source software you use.

For more on choosing open-source licences, please also check out our guide on Choosing an open-source licence.

And, in case you were wondering what tl;dr stands for, it's “too long; didn't read”!
