Collaborations workshop 2014

An advice and best practice article based on CW14 is available

This year's Collaborations Workshop (CW14) brought together researchers, software developers, managers, funders and more to explore important ideas in software and research and to plant the seed of interdisciplinary collaborations.

The workshop took place on March 26-28th 2014 at the Oxford e-Research Centre, and was sponsored by Microsoft Research and GitHub. Its theme was the role software plays in reproducible research, which was reflected by keynote talks and collaboration sessions, as listed in the agenda.

The workshop concluded with the CW14 Hackday, which brought together developers and researchers to create solutions to some of the issues that were identified during the workshop.

Twitter

If you're interested in what happened at the workshop, you can see a summary via Storify. All announcements and news were published on Twitter with the hashtag #CW14.

Agenda

The …

Continue Reading

Discussion sessions are a fundamental part of the Collaborations Workshop and help people work on solving shared problems and learn about new ideas.

Before you take part in the discussion session, read these guidelines about how the session should run and what part you can play.

Visionary/Strategic

The following discussion topics focus on the grand challenges. The aim of these topics is to scope the problem space and suggest what needs to be done. When discussing these questions or linking to relevant materials, please use the #CW15strategy hashtag.

  • Should we consider Software Engineering as a discipline within interdisciplinary research?
  • What are the best ways of communicating knowledge between disciplines in interdisciplinary research? What venues for presenting interdisciplinary research (journals, conferences, etc) exist?
  • What would a good funding scheme for interdisciplinary research which uses software look like?
  • How would we create a better forum to find collaborators in different disciplines?
  • What is the purpose of a data management plan in helping to share data and results?
  • What new ways would you like to see give you credit for interdisciplinary working?
  • What new ways would you like to see give you credit for developing software?
  • What are the management challenges in interdisciplinary research, and how do we structure goals and rewards to involve the whole project?

Practitioner

The following discussion topics focus on identifying "good…

Continue Reading

It's generally quite easy to get people talking, but it's a lot more difficult to record what's been said and then share it with everyone.

You'd expect a technological solution from the Software Sustainability Institute. But what technology? We need something that's easy to use, available to everyone and free. Rather than choose something new and flashy, we're going to use email.

How does it work?

Everyone who attends the workshop will be signed up to the official CW15 mailing list. This means that anything sent to the mailing list (collabw15@googlegroups.com) will be received by all attendees.

In the spirit of openness, anyone will be allowed to view the email discussions, but only members can send emails.

To view the email discussions, visit the CW15 Google Group. You don't need a Google account to view the Google Group.

Reporting back on discussion sessions

During each discussion session, a Scribe will record the filled-in template in an email, and at the end of the discussion, the Scribe will send that email to the mailing list. This means that everyone will have a record of all of the discussions that take place.

We want the discussions to continue long after the original session, so if you want to comment on an issue raised in a session, simply reply to the relevant email.

For more information, see the reporting back section of the discussion sessions webpage.…

Continue Reading

by Shoaib Sufi, Community Leader.

This year’s Collaborations Workshop was a great success, with a rich variety of outcomes and a great many things discussed around software in research and, in particular, reproducible research. 

CW14 took place at the Oxford e-Research Centre in March, with three days of events including talks, workshops, discussion sessions and keynotes from Institute co-investigator Carole Goble, Microsoft’s Kenji Takeda and Github’s Arfon Smith.

The final day was dedicated to a special CW Hackday where competing teams tried to develop their own software in a short space of time, and ended with prizes being given to the best coders.

From the feedback it was clear that our attendees enjoyed the event and found it useful. Indeed, as our feedback survey showed, 86% of attendees found the event useful, while 90%…

Continue Reading

A full range of videos filmed at this year's Collaborations Workshop are now live on YouTube.

Available on the official SoftwareSaved video channel, under the Collaborations Workshop 2014 playlist, the videos feature opening remarks, a keynote by Institute co-investigator Carole Goble, guest contributions by Github's Arfon Smith and Microsoft Research's Kenji Takeda, and all the lightning talks and collaborative ideas sessions featured at the event.

These videos will also be featured on the main CW14 Agenda page, next to the events they feature. Please do leave comments and share the links too!

In the first of our exclusive Collaborations Workshop 2014 videos, one of the Institute's co-investigators, Professor Carole Goble CBE FREng, discusses reproducibility and why it is important for research. Reproducibility is the principle and practice of being able to repeat an experiment from scratch, which not only ensures high standards for researchers but also prevents errors and fraud in the worst cases.

In this talk, Carole details how software can help you keep track of your experiments and so ensure that they retain that all-important reproducibility. She also discusses the perils of the replication gap, and two key rules - record everything and automate everything.

If you would like a closer look at the slides Carole uses in her talk, they are available for download, and if you have any questions for Carole herself, please do leave a comment below and we will get back to you!

What is the best way to train producers of good software documentation, and why is this important?

What are the five most important things learnt during this discussion:

  • Documentation depends on how we learn – we need training in pedagogy for documentation courses (pictures/text?) – snapshots that the users see, youtube videos (channel)
  • Make no assumptions about the users, could be complete beginners (common problem list)
  • Documentation needs to be peer reviewed
  • Some people don’t read the documentation – have clear error codes. As a documenter – this is really important! (fast track manual, top tips)
  • Documentation writing course – specific course, perhaps with software carpentry course

What are the problems, and are there solutions?

  • Often the software developer is not the best person to produce the documentation. Lack of agreement, different priorities – how does it work? What does it do? How do I use it?  
  • Training – specific pedagogy courses, how do people learn? Visual! Videos and pictures. Keep it online and current!
  • Training - tied in with Software Carpentry workshops; need a recommended format for documentation (multiple levels – beginner, intermediate and advanced)
  • Often people don’t read documentation – perhaps structured, clear error messages might be better! Common problem list and small handbook. 
  • Specialist 3rd party training providers could implement the training in documentation. Can also out-source documentation writing
  • Convince developers that documentation is time well spent – not just good practice.
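The "clear error codes" and "common problem list" ideas above could be combined: each error code maps to a short summary and a suggested fix, so error messages double as a searchable fast-track manual. A minimal sketch (all codes and messages hypothetical):

```python
# Hypothetical common-problem list: error codes map to a summary and a
# fix, so the error message itself points users at the documentation.
COMMON_PROBLEMS = {
    "E101": ("Input file not found",
             "Check the path passed with --input; relative paths are "
             "resolved against the current directory."),
    "E102": ("Config is missing the required key 'threshold'",
             "Add 'threshold: <number>' to your config file."),
}

def report_error(code):
    """Format an error as code, summary and suggested fix."""
    summary, fix = COMMON_PROBLEMS[code]
    return f"{code}: {summary}\n  Fix: {fix}"

print(report_error("E101"))
```

Because every message carries a stable code, users who don't read the manual can still search the common-problem list for "E101".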

We need:

Continue Reading

What are the current must-use standards for publishing and reproducible research?

What are the five most important things learnt during this discussion:

(Apologies for taking a different approach here. I think the diversity of thoughts and ideas are exactly what made this discussion so fruitful, so I'm hesitant to condense it too much)

Development in 2 years

More cross-disciplinary and interdisciplinary publication opportunities. Lots of experimentation: metadata around authors and papers for journals, type of contribution. Would like to see more first authorships/ownerships for PhD students, e.g. in biology. Bundled publishing in projects, better impact metrics. Louder shout about problems, creating more noise and more bureaucracy. The more you log and measure, the more bureaucracy and policing emerges. CS community starting to realise drawbacks of combining travel (conference) with formal publication. Better curation of data and material associated with articles. Make it searchable and usable by anybody.

Development in 5 years

Funders may be requesting a change in behavior from researchers. Interdisciplinarity is not addressed well in academia and will have to improve. Some of the key funding programmes are just starting now; hopefully they will provide solutions. Impact = relevance: this should take relevance and reproducibility into account. Impact also needs to propagate faster. Also need improvements in accessibility. We need a policy group that guides the next REF and advises on how to judge _all_ academic outputs. Perhaps less change than expected on this time frame, same people likely still in charge. Full generation needed to…

Continue Reading

Use and utility of ELNs and other tools for reproducible science

Q: Are notebook/labbook-style tools, such as IPython Notebook or Labtrove, useful tools to improve reproducibility?

Q: How are tools such as GitHub, IPython Notebook and RStudio being used to practice reproducible science?

What are the five most important things learnt during this discussion:

  1. Certain tools intermingle code and data. Raises issue of implicit data contexts which inhibits reproducibility since the notebooks can't be rerun ... and may fail (cryptically).
  2. Interactive IPython-style notebooks could be used for future teaching as the code and context is inline and integrated. The story of the analysis done. A bridge between reading about what's happened and seeing it, and then experimenting with different parameters etc.
  3. Versioning of software can be a challenge as scientists may not be aware of the major impact on reproducibility this can have. And code authors may not be diligent in their versioning policies.
  4. Reproducing/replicating the original author's dodgy data counts as reproducibility.
  5. Reproducibility "in principle".

What are the problems, and are there solutions?

See above.

Risk of a false sense of security if using the right tools and processes but writing the wrong thing

Being open in processes, tools, versions, data at least provides transparency and the hope that someone will spot a flaw

What further work could be done, and who should do it (make a pledge)?

SSI - Hack-up example of using GitHub to store data…

Continue Reading

"This is where my time goes!"

Collaborative Idea team members

Robyn Grant, Fabian Renn, Ian Gent, Mike Jackson

Context

Any project involving software development.

Problem

It can be hard for non-coders e.g. PIs, to appreciate how challenging, and time-consuming, development tasks are.

It can be hard for coders to estimate how long they need for development tasks, when encountering new tasks or technologies. This especially applies to new coders. But they'll often be asked to estimate regardless by PIs/managers.

Solution

A web-based app that allows people to record time spent on development tasks e.g:

  • Install a product
  • Set up web server
  • Set up build system
  • Design, develop, test component
  • Fix bug
  • Write user guide
  • Answer support query
  • etc.

Users create an account and record this information and their perceived expertise level.

App can process data across many users anonymously to see average time to do certain tasks.

This would help your own personal estimation, help with justifying time estimates to PIs/managers, and help PIs/managers when seeking funding.

Policy drivers, e.g. the SSI, can use the data to push for recognition of the contribution of, and effort invested by, coders in research software.
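The core of the proposed app is a simple data model: per-user task records, aggregated anonymously into average durations per task. A minimal sketch of that aggregation step (all function and field names are hypothetical, not a real implementation of the idea):

```python
# Hypothetical data model for the time-recording app: users log time
# spent on development tasks; the app averages durations per task
# across all users, dropping user identities from the aggregate.
from collections import defaultdict
from statistics import mean

records = []  # each entry: (user_id, task, hours, expertise)

def record_time(user_id, task, hours, expertise):
    records.append((user_id, task, hours, expertise))

def average_hours_by_task():
    # Discard user_id and expertise so the aggregate is anonymous.
    by_task = defaultdict(list)
    for _user, task, hours, _level in records:
        by_task[task].append(hours)
    return {task: mean(hours) for task, hours in by_task.items()}

record_time("u1", "Set up build system", 6.0, "beginner")
record_time("u2", "Set up build system", 2.0, "expert")
record_time("u1", "Fix bug", 1.5, "beginner")

print(average_hours_by_task())
```

A real version would also slice the averages by the recorded expertise level, so a new coder could compare against estimates from coders at a similar level.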



Diagrams

Continue Reading
Subscribe to Collaborations workshop 2014