CW13 Five important things


Each of the break-out sessions reported back on the five important things they discovered during their discussion. We've listed these things below. 

 

Session 1

Computational techniques and scaling science; exploring the connection.

1. Collaboration is extremely important.
2. Early career researcher involvement.
3. Departmental culture and personal experiences can be very different.
4. Need for formal processes for match-making and awareness of opportunities (EPSRC network visualisation: "visualising our portfolio").
5. Legal issues in intellectual property, liability, non-disclosure agreements and the commercial split. Publishing vs confidentiality - misconceptions.

Status boards for labs.

1. Launch with credibility. Use things which are directly relevant to the lab (for example, Twitter, Facebook, Cluster status, where people are/status updates, Mendeley, CollabGraph).
2. Make it live and visible (constant updates).
3. Idea: make it something everybody can edit, add to and so on.
4. Need to find tools which are appropriate. Consider starting with a blank sheet (Google Doc) and then moving on to more specific tools. There is no silver bullet.
5. Somebody will have to do some development.

What is the best way to ensure all researchers can progress with their computing research?

1. We want to develop a social environment where people can get and give help on moving their project forward.
2. There are a number of key step changes: increasing automation; use of software version control; sharing software to increase collaboration.
3. Researchers need to consider their future scalability when they choose their tools. Licensing!
4. Researchers need encouraging to make their workflows more reproducible.

The role and career of the Research Software Developer. 

1. Role of researcher technologist: integral research collaborator with a broad set of skills and experience. Post-doc experience (discipline independent) is an advantage, arguably a requirement in most cases.
2. A good first step is to understand who belongs to the community of researcher technologists, how they are housed within HE institutions, what their pay scales are, how they are supported, what their job descriptions are, etc.
3. Models for institutional integration of researcher technologists: centralised services vs project-embedded. Different sets of issues emerge for each.
4. Need for research councils to consider how to support institutional change towards a suitable model.
5. Need for specialised funds available to non-academic departments, such as researcher technologist departments.

Computational techniques and scaling science; exploring the connection.

1. Types of code that have scaling issues that we are familiar with: code that operates apparatus; code that models science and the apparatus that probes it; code that does data analysis; code that manages storage of the data.
2. There are types of scaling problem for each of these, but the problems overlap across the types of code: increased image resolution means more to process; richer data brings complexity, with different types of domain data to analyse (for example, for images, MRI, etc.).
3. Richer data can lead to more research questions, which is a positive outcome of this kind of scaling. Richer data is a scaling problem for processing, but when you collect more of this data and make it available, it leads to new questions.
4. Resources are available for learning how to approach these problems. Researchers who start to encounter them should be aware of the options that can help (for example, the Numerical Algorithms Group (NAG) for consultancy or training, Doctoral Training Centres (DTCs), local courses at institutions, and the many teaching and tutorial materials on the web).
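
As a rough illustration of the image-resolution point above, the sketch below estimates how much more there is to process when resolution increases. The function name `relative_workload` is hypothetical, and the estimate assumes that processing cost grows in proportion to the number of pixels or voxels, which will not hold for every analysis.

```python
def relative_workload(scale, dimensions=2):
    """Factor by which the number of samples (pixels or voxels) grows
    when the resolution along every axis is multiplied by `scale`.

    Assumption: processing cost is proportional to the number of samples.
    """
    return scale ** dimensions


# Doubling the resolution of a 2D image gives four times as many pixels.
print(relative_workload(2))

# Doubling the resolution of a 3D volume (for example, an MRI scan)
# gives eight times as many voxels.
print(relative_workload(2, dimensions=3))
```

Even a modest increase in resolution can therefore move an analysis from a desktop problem to one that needs more serious computing resources.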

 

Session 2

How do the Research Councils fund software and what could they do in the future? 

1. Is the balance right between development stage and continued funding (stage-gated projects)?
2. Pathways to Impact funding is useful, e.g. for open source.
3. Should we try to encourage commercialisation of software by selling it to researchers?
4. Encouraging technologists to be part of peer review.
5. Are there good tools to cost software projects?

How can we make scientific software easier to find, re-use and sustain?

1. Code licensing can be problematic, particularly for legacy codes that are no longer supported (abandonware).
2. Software required for a particular field is often scattered across different repositories, webpages etc.
3. Documentation eases the evaluation of software and its development.
4. Community support is a major strength for a software project (e.g. CRAN).
5. Sustainability needs to be built in or added on.
 

Engagement of non-technical people in software behaviour design.

1. What is the non-technical user? Who are we aiming at?
2. What is behaviour design? How the user EXPERIENCES the software.
3. Prospect of creating user-developers (for example, scratch your own itch), but what tools/approaches are needed?
4. Iteration of designs, rapid cycles.
5. We should aim to figure out the ideal solution to the real problems; opening out the design space is critical.

Advanced use of social media for researchers.

1. Don't sweat the tumbleweeds - people will come when you have content. You can then use Twitter to recycle/refresh old content.
2. Great for feedback - put a pseudo-conference paper on your blog and then get comments on it.
3. Very useful for personal brand building - people might not recognise you but they might know your blog.
4. Know your audience - if they don't use Twitter, don't use Twitter.
5. "Starting a blog was the single best professional decision I took".

Using data/linked open data; it's published, but how do we make tools and applications to use it?

1. Big drive to publish open data...
2. ...but less of a drive to consume it in research.
3. Researchers aren't conditioned to go out and look for relevant existing data sets...
4. ...and it's technically challenging if they do.
5. Getting researchers to collaborate and share data in an ad-hoc way is more practical and achievable than first devising a perfect collaboration infrastructure to then be imposed upon all researchers (perfect is the enemy of the good).

 

Session 3

How and why to release your research tools. 

1. Researchers live in fear - fear of their code being stolen, their research being stolen, being mocked for the quality of their code, being blamed if someone uses their code and things go wrong.
2. Releasing is complex - when to release, how to release, where to release, who needs to be consulted in institutions before release, under what licence to release.
3. Releasing code ticks a lot of boxes - reproducible research, raise software quality, promote reuse, demonstrate impact - that PIs, funders and institutions want ticked.
4. Software Carpentry and other training can improve software development skills, but researchers then need education in releasing to get it out there!

How do we measure the impact of the work we do?

1. Impact is an effect on people - changing their behaviour (by giving them tools, training them).
2. Almost all measurement methods are flawed. Should we still use them and, if so, how?
3. Measurements may not be as comparable as they look (counting long vs short runs, for instance).
4. Hard to measure scientific impact; easier to measure things that indicate there will/may be some later (downloads for instance).
5. Impact is in different domains, on different timescales.


The importance of openness and freedom in research-led software development.

1. There are four main areas of interest when it comes to openness: licensing, overcoming the fear of scrutiny, community building and advocacy to the closed source people.
2. There are nuanced ways of viewing openness - we are not purists. You can provide premium software; charging doesn't mean that you reduce your community size.
3. Licensing is not just important for protecting your software, but also for making it clear what users can do with the software.
4. We're not sure whether people stealing code is a real problem, but it's certainly a real fear. Publishing software could combat the worry of someone stealing your code.
5. A general article that references software is useful for validating the data and interpretation of your research, and it can also validate the effort you've put into developing the software.

How to increase the bus factor beyond 1.

1. We need to help people know what skills they need and where to get help with a particular technology in a field.
2. We should encourage pair programming.
3. Researchers need to know how to assess which software is worthy of basing their research on, that is, how to know what the bus factor of an existing tool is, although low bus factors may not be a problem for well-documented, stable code.
4. If you acquire a bus factor 1 project, you should be prepared to take on that tool or use an alternative.
5. Organisational bus factors can be a problem too, that is, over-reliance on a single person (for example, the PI), with many postdocs and PhD students but not many staff at junior lecturer grade.
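
One rough way to estimate the bus factor of an existing tool, as point 3 suggests, is to look at who wrote its commits. The sketch below is only a heuristic and not an agreed definition: it assumes the bus factor can be approximated as the smallest number of contributors who together account for more than half of all commits. The function name `bus_factor`, the `threshold` parameter and the author lists are hypothetical.

```python
from collections import Counter


def bus_factor(commit_authors, threshold=0.5):
    """Smallest number of top contributors whose commits together
    exceed `threshold` of all commits (a rough heuristic only)."""
    counts = Counter(commit_authors)
    total = sum(counts.values())
    if total == 0:
        return 0  # no history to judge from
    covered = 0
    # Walk contributors from most to least active.
    for rank, (_, n_commits) in enumerate(counts.most_common(), start=1):
        covered += n_commits
        if covered / total > threshold:
            return rank
    return len(counts)


# Hypothetical commit histories: one author name per commit.
solo_project = ["alice"] * 9 + ["bob"]
shared_project = ["alice"] * 4 + ["bob"] * 3 + ["carol"] * 3

print(bus_factor(solo_project))    # one person wrote most of the code
print(bus_factor(shared_project))  # knowledge is spread more widely
```

Commit counts say nothing about documentation or code stability, so a low number from a sketch like this is a prompt to look more closely rather than a verdict.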

What role do researchers feel central facilities can play in assisting software sustainability?

1. Software tools in the open source world naturally have strong 'network effects' towards centralisation (GitHub, package management systems) because there are so many benefits (URIs, discoverability, quality, metrics) - with no central planning.
2. Hardware centralisation exists in academia, but often there is a lack of software centralisation or consensus in research - something is broken in the 'network effect'. This has really negative effects (no URIs for software so it's hard to track, lack of discoverability so people reinvent the wheel, lack of metrics for judging developers, etc.).
3. Why the difference? Researchers are frustrated by lack of formal guidance from research councils, lack of input from developers into funding review process, hence difficult to know what services they should use... many of the issues we have discussed over the last few days.
4. Specific steps that can be taken by everyone to support centralised services: use and promote where they do exist (and are appropriate), talk to users, more funding for promotion.
5. Basic need: people with software expertise at really senior levels in research councils to provide guidance, vet funding applications for appropriate use of facilities, sharing expertise across research councils, ensure the centralised facilities are appropriate.
 

Session 4

Maker digital culture as a route to STEM for school pupils.

1. Students don’t go into STEM because they don’t see that as the identity they want. The values they want to have (i.e. creativity, innovation, teamwork) do not appear to them to belong in science.
2. Maker culture is fundamentally concerned with people-centred, real-world design problems.
3. School systems are sufficiently broken. Exams and content dominate learning and education strategies.
4. Bringing in digital maker groups to model for and support teachers in changing school-based activities can infuse some maker culture into stale science teaching.
5. There is a role for researchers to bridge science and outreach in a way that engages students.
 

What are the pragmatic things which will help us achieve reproducibility and reusability of software, models and data?

1. Catalogues need good curation to avoid becoming a chocolate teapot.
2. Problems are not confined to any one domain.
3. Search criteria for catalogues are difficult to design in general.
4. Discoverability is the first hurdle.

Developing community grids / crowd-sourced HPC.

1. Considering user experience: security, intrusion into normal operation of computer, attractive interface.
2. Should be considered carefully: properly resourced; properly considered.
3. Good advertising/publicity is essential.
4. Working out the tip-over point. If we have x number of machines, is this worth my time?
5. Knowing how you will scale the software. What happens to my server if this becomes popular? Solution: use the cloud!

Models for credit used to acknowledge the value of software in research. How does one get credit for this work?

1. Researchers that use software and publish results based on that software do not always cite its use and/or its developers.
2. Funders and policy makers are not always aware of the importance of software development for research.
3. Developers are often stuck in the quandary of needing to concentrate on work (i.e. software) that is not valued by institutions and academia at large: not enough time to e.g. write journal articles that count towards career progression.
4. Digital outputs should be valued as highly as publications.
5. Different print-based outputs are regarded differently by different fields: highest regard for journal papers in science, conference papers in computer science, books/monographs in humanities. (Software often doesn't count!)

An impossible question: what's the best open-source software, and why? (The discussion in this group led to 19 rather than 5 important things!)

1. Linux as exemplar - release early and release often!
2. Julia - a completely open, inclusive development with plenty of user input!
3. Mendeley and the user feature request system.
4. Help and support - the problem of asking for help from people who are already doing it for free…
5. Android (while not entirely open) still has Stack Overflow. Qt - a widget library - allows for flexible UI.
6. Users shouldn't have to build their software - it installs in one command.
7. Administrator privileges as barrier to use - creating software that circumvents this.
8. Open source, and the need to cater for Windows (and its many, many users).
9. MongoDB, PuTTY and R - easy to deploy.
10. Download the executable file in as easy a fashion as possible.
11. ImageJ and plug-ins - being able to write them and install them without difficulty, as well as thorough instructions.
12. Using beta releases to selected users as a testing model. Apple's iOS often doesn't work because it doesn't do this, whereas the beta test is a sign of respect for one's users. This also allows small development teams to access thorough product testing.
13. This depends on the skill level of the testers too, however.
14. Package learning curves - do they matter? Is this a metric with which we can judge open source software?
15. Audacity - easy to use and reliable.
16. Notepad++ for Windows - takes care of line endings, feature rich and extensible.
17. Vim/Emacs - similar to Notepad++.
18. Open Standards - any good examples? Apache, Firefox (despite its memory hunger) etc.
19. GIMP - the poor man's Photoshop, which is often far more accessible for universities.