Assigning Fellowship programme 2018 applications to reviewers

By Raniere Silva, Community Officer

A few people have asked us how we run certain processes at the Institute. This time, we will look at how we assigned Fellowship programme 2018 applications to our reviewers.

Data repository

We used Google Forms to collect applications because, from experience in previous editions, we know that Google Sheets works well for reviewers: they are familiar with the platform and usually already have a Google account. Google Drive then lets us share the data the reviewers need and use spreadsheet formulas (the "Microsoft Excel programming language") to summarise the results of the reviews.

In the master spreadsheet, each reviewer has a sheet named after their initials, containing all the information needed to assess the candidates and columns in which to record their assessment. Data validation helps reviewers enter valid values.
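The summary itself is done with spreadsheet formulas, but for comparison the same aggregation can be sketched in pandas. This is a minimal sketch, assuming each reviewer's sheet has been exported as a table with hypothetical `application_id` and `score` columns; the reviewer initials and scores below are made up.

```python
import pandas as pd

# Hypothetical exports of two reviewers' sheets (dictionary key = reviewer initials)
sheets = {
    "AB": pd.DataFrame({"application_id": [1, 2], "score": [4, 3]}),
    "CD": pd.DataFrame({"application_id": [1, 2], "score": [5, 2]}),
}

# Stack all sheets into one long table, keeping the initials as a "reviewer" column
reviews = pd.concat(sheets, names=["reviewer", "row"]).reset_index(level="reviewer")

# Average score and number of reviews per application, best first
summary = (
    reviews.groupby("application_id")["score"]
    .agg(["mean", "count"])
    .sort_values("mean", ascending=False)
)
print(summary)
```

The `count` column is a quick way to confirm that every candidate received the same number of reviews, which is what makes comparing averages fair.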

Sheet generation

To generate each reviewer's sheet we use two Python libraries, pandas and PuLP. Pandas lets us work with the raw tabular data stored in the Google Sheet and create local CSV files. PuLP is a linear optimisation modeller that lets us describe the restrictions we want to impose when assigning reviews. PuLP also ships with CBC (COIN-OR Branch and Cut), a mixed-integer linear programming solver that can find an assignment satisfying those restrictions, when one exists.
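As a rough illustration of the CSV step, the sketch below takes an assignment (hard-coded here; in practice it would come from the solver) and writes one CSV file per reviewer with pandas. The column names, the `assignment` dictionary and the temporary output directory are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Hypothetical application data, as it might be read from the form responses
applications = pd.DataFrame({
    "application_id": [1, 2, 3, 4],
    "name": ["Ana", "Ben", "Chen", "Dara"],
    "career_stage": ["early", "senior", "early", "senior"],
})

# Hypothetical assignment: reviewer initials -> list of application IDs
assignment = {"AB": [1, 2], "CD": [3, 4], "EF": [1, 3]}

output_dir = tempfile.mkdtemp()
for initials, application_ids in assignment.items():
    # Keep only this reviewer's applications and add empty columns for their marks
    sheet = applications[applications["application_id"].isin(application_ids)].copy()
    sheet["score"] = ""
    sheet["comments"] = ""
    sheet.to_csv(os.path.join(output_dir, "{}.csv".format(initials)), index=False)

print(sorted(os.listdir(output_dir)))
```

Naming each file after the reviewer's initials mirrors the one-sheet-per-reviewer layout of the master spreadsheet, so each CSV maps directly onto one sheet when uploaded.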

Assignment restrictions

We want to impose some restrictions on the review distribution to make the process more balanced and ensure we select the best candidates. The restrictions that we used this year were:

  • each candidate must receive the same number of reviews, to make the average score fair;

  • no candidate should be reviewed by someone from the same institution, to avoid reviewers trying to inflate the number of Fellows at their institution;

  • each candidate must receive at least one review from someone working in a close domain, to prevent candidates being untruthful in their applications;

  • reviewers must review at least two candidates at the same career stage, to avoid a candidate being compared only with candidates at an earlier or more senior career stage; and

  • each reviewer must mark a similar number of applications, to avoid a small group of reviewers being responsible for selecting most of the applications.
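Whatever tool produces the assignment, it is cheap to double-check it against these restrictions before sharing the sheets. The function below is a minimal plain-Python sketch of such a check. The data layout is hypothetical, "close domain" is simplified to "same domain label", "similar number" is taken to mean workloads differ by at most one, and the career-stage rule is read as "every career stage present in a reviewer's batch appears at least twice".

```python
from collections import Counter


def check_assignment(assignment, applications, reviewers):
    """Return a list of human-readable violations (empty if none).

    assignment:   reviewer ID -> list of application IDs
    applications: application ID -> dict with "institution", "domain", "stage"
    reviewers:    reviewer ID -> dict with "institution", "domain"
    """
    problems = []
    reviews_per_application = Counter(
        a for batch in assignment.values() for a in batch
    )

    # Every candidate receives the same number of reviews
    if len({reviews_per_application[a] for a in applications}) > 1:
        problems.append("candidates receive different numbers of reviews")

    for reviewer, batch in assignment.items():
        # No reviewer from the candidate's institution
        for a in batch:
            if applications[a]["institution"] == reviewers[reviewer]["institution"]:
                problems.append("{} reviews {} from the same institution".format(reviewer, a))
        # Each career stage in a reviewer's batch appears at least twice
        stages = Counter(applications[a]["stage"] for a in batch)
        for stage, count in stages.items():
            if count < 2:
                problems.append("{} has only one {} candidate".format(reviewer, stage))

    # At least one review from someone in the candidate's domain
    for a, info in applications.items():
        if not any(a in batch and reviewers[r]["domain"] == info["domain"]
                   for r, batch in assignment.items()):
            problems.append("{} has no review from its domain".format(a))

    # Reviewer workloads differ by at most one
    loads = [len(batch) for batch in assignment.values()]
    if max(loads) - min(loads) > 1:
        problems.append("unbalanced reviewer workload")

    return problems
```

Returning a list of messages, rather than stopping at the first failure, makes it easier to see every problem with a candidate assignment at once.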

Restrictions that we plan to use in the future are:

  • each candidate must be reviewed by a male and a female reviewer, to avoid gender bias;

  • each reviewer must mark a similar number of male and female applications, for the same reason;

  • no candidate can receive all of their reviews from a single institution, to avoid problems with institutional rivalry;

  • no candidate can receive all their reviews from the same career stage, to ensure reviews are balanced; and

  • no candidate can receive all their reviews from the same domain, for the same reason.
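Several of these future restrictions share one linear pattern. If every candidate receives k reviews, "not all reviews from group G" becomes "at most k - 1 reviews from G", and "reviewed by someone in group G" becomes "at least one review from G". The sketch below applies both patterns with PuLP and its bundled CBC solver; the toy data, the gender and institution labels and k are all hypothetical.

```python
import pulp

# Hypothetical toy data
applications = ["A1", "A2", "A3", "A4"]
reviewers = {  # reviewer -> (gender, institution)
    "R1": ("female", "X"),
    "R2": ("male", "X"),
    "R3": ("female", "Y"),
    "R4": ("male", "Z"),
}
k = 2  # reviews per candidate

prob = pulp.LpProblem("future_restrictions", pulp.LpMinimize)
x = {(a, r): pulp.LpVariable("x_{}_{}".format(a, r), cat="Binary")
     for a in applications for r in reviewers}
prob += pulp.lpSum(x.values())  # dummy objective: any feasible assignment will do

genders = {g for g, _ in reviewers.values()}
institutions = {i for _, i in reviewers.values()}
for a in applications:
    prob += pulp.lpSum(x[(a, r)] for r in reviewers) == k
    # At least one reviewer of each gender
    for g in genders:
        prob += pulp.lpSum(x[(a, r)] for r in reviewers if reviewers[r][0] == g) >= 1
    # Not all k reviews from a single institution
    for i in institutions:
        prob += pulp.lpSum(x[(a, r)] for r in reviewers if reviewers[r][1] == i) <= k - 1

status = prob.solve(pulp.PULP_CBC_CMD(msg=False))
print(pulp.LpStatus[status])
for a in applications:
    print(a, [r for r in reviewers if x[(a, r)].value() > 0.5])
```

The same "at most k - 1" constraint works for career stages and domains by swapping the grouping label. If the data cannot satisfy a restriction (for example, if every female reviewer were at the candidate's institution), `pulp.LpStatus[status]` should report something other than "Optimal", typically "Infeasible", which is how one can tell that a desirable restriction makes the assignment impossible.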

We also have a list of restrictions that would be great to use but may make the assignment infeasible:

Restrictions and PuLP

If we did not impose any restrictions when assigning reviews, the assignment would reduce to a loop over the list of applications. Simple Python code to do this could be:

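# `applications` is the list of applications; `number_of_reviewers` is an int
reviewers = [[] for _ in range(number_of_reviewers)]
reviewer = 0
for application in applications:
    reviewers[reviewer].append(application)
    # Move on to the next reviewer, wrapping around after the last one
    reviewer = (reviewer + 1) % len(reviewers)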
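Note that this loop gives each application a single review. A minimal, self-contained variant that gives every application `k` reviews from distinct reviewers (still without any restrictions) could look like this; `round_robin`, `k` and the toy data are hypothetical.

```python
def round_robin(applications, number_of_reviewers, k):
    """Give each application k distinct reviewers, cycling through them in order."""
    if k > number_of_reviewers:
        raise ValueError("k cannot exceed the number of reviewers")
    reviewers = [[] for _ in range(number_of_reviewers)]
    reviewer = 0
    for application in applications:
        for _ in range(k):
            reviewers[reviewer].append(application)
            reviewer = (reviewer + 1) % number_of_reviewers
    return reviewers


print(round_robin(["A1", "A2", "A3"], number_of_reviewers=3, k=2))
# [['A1', 'A2'], ['A1', 'A3'], ['A2', 'A3']]
```

Because the k slots for one application are consecutive positions in the cycle and k is at most the number of reviewers, the same reviewer is never picked twice for one application, and workloads differ by at most one.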

As soon as we enforce even a simple restriction, such as the reviewer and the applicant not being from the same institution, the amount of Python code grows quickly. For example, we would need something like:

reviewers = [[] for _ in range(number_of_reviewers)]
reviewer = 0
for application in applications:
    application_institution = ...
    reviewer_institution = ...
    # Skip reviewers from the applicant's institution
    while application_institution == reviewer_institution:
        reviewer = (reviewer + 1) % len(reviewers)
        reviewer_institution = ...
    reviewers[reviewer].append(application)
    reviewer = (reviewer + 1) % len(reviewers)

However, the while loop is a point of failure: if every reviewer is from the applicant's institution, the program ends up in an infinite loop. This was our main motivation for using PuLP.
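The hang itself can be removed by trying each reviewer at most once and raising an error instead, as in the plain-Python sketch below, where `application_institution` and `reviewer_institution` are hypothetical dictionaries mapping IDs to institutions. This only turns a hang into an error message: once several restrictions interact (balanced workloads, domains, career stages), a greedy pass like this can paint itself into a corner even when a valid assignment exists.

```python
def assign_avoiding_institution(applications, reviewers,
                                application_institution, reviewer_institution):
    """Round-robin assignment that skips reviewers from the applicant's institution."""
    batches = {r: [] for r in reviewers}
    position = 0
    for application in applications:
        # Try each reviewer at most once instead of looping forever
        for offset in range(len(reviewers)):
            candidate = reviewers[(position + offset) % len(reviewers)]
            if reviewer_institution[candidate] != application_institution[application]:
                batches[candidate].append(application)
                position = (position + offset + 1) % len(reviewers)
                break
        else:
            # The inner loop finished without a break: nobody is eligible
            raise ValueError("no eligible reviewer for {}".format(application))
    return batches
```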

The assignment problem can be modelled using one binary variable for each application-reviewer pair, indicating whether application "A" should be marked by reviewer "R". Using PuLP, we can create these variables with:

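import pulp

# `applications` and `reviewers` are lists of IDs
prob = pulp.LpProblem("review_assignment", pulp.LpMinimize)
x = {}
for application in applications:
    for reviewer in reviewers:
        # 1 if `reviewer` marks `application`, 0 otherwise
        x[(application, reviewer)] = pulp.LpVariable(
            "x_{}_{}".format(application, reviewer), cat="Binary"
        )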

The restriction that the reviewer and the applicant cannot be from the same institution can be modelled by requiring that the sum of the variables linking application "A" to reviewers from the same institution as "A" is zero. The Python code for this restriction would be something like:

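for application in applications:
    # application_institution / reviewer_institution: dicts mapping IDs to institutions
    prob += pulp.lpSum(x[(application, reviewer)] for reviewer in reviewers
                       if application_institution[application] == reviewer_institution[reviewer]) == 0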

The code is shorter, easier to understand and cannot get stuck in an infinite loop: if no valid assignment exists, the solver reports that instead. This becomes even more appealing as we add the other constraints discussed in the previous section.
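Putting the pieces together, the sketch below is a small, self-contained model of most of this year's restrictions on hypothetical toy data: every candidate gets exactly k reviews, no reviewer shares the candidate's institution, at least one reviewer works in the candidate's domain (simplified to "same domain label"), and reviewer workloads differ by at most one. The career-stage restriction is left out for brevity; this is an illustration under these assumptions, not the notebook used for the programme.

```python
import math

import pulp

# Hypothetical toy data: ID -> (institution, domain)
applications = {
    "A1": ("X", "bio"), "A2": ("Y", "bio"), "A3": ("Z", "phys"),
    "A4": ("X", "phys"), "A5": ("Y", "cs"), "A6": ("X", "cs"),
}
reviewers = {
    "R1": ("X", "bio"), "R2": ("Y", "phys"), "R3": ("Z", "cs"), "R4": ("W", "bio"),
}
k = 2  # reviews per candidate

prob = pulp.LpProblem("fellowship_reviews", pulp.LpMinimize)
x = {(a, r): pulp.LpVariable("x_{}_{}".format(a, r), cat="Binary")
     for a in applications for r in reviewers}
prob += pulp.lpSum(x.values())  # dummy objective: the constraints fix the total anyway

for a, (a_inst, a_domain) in applications.items():
    # Each candidate receives exactly k reviews
    prob += pulp.lpSum(x[(a, r)] for r in reviewers) == k
    # No reviewer from the candidate's institution
    prob += pulp.lpSum(x[(a, r)] for r, (r_inst, _) in reviewers.items()
                       if r_inst == a_inst) == 0
    # At least one reviewer from the candidate's domain
    prob += pulp.lpSum(x[(a, r)] for r, (_, r_domain) in reviewers.items()
                       if r_domain == a_domain) >= 1

# Balanced workload: every reviewer marks floor(T/n) or ceil(T/n) applications
total = k * len(applications)
low = total // len(reviewers)
high = math.ceil(total / len(reviewers))
for r in reviewers:
    load = pulp.lpSum(x[(a, r)] for a in applications)
    prob += load >= low
    prob += load <= high

status = prob.solve(pulp.PULP_CBC_CMD(msg=False))
print(pulp.LpStatus[status])
for r in reviewers:
    print(r, [a for a in applications if x[(a, r)].value() > 0.5])
```

Values are compared with `> 0.5` rather than `== 1` because solvers can return binary values with tiny floating-point error. The model usually has several valid solutions, so the exact pairs printed may vary.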

Sheet upload

Pandas cannot upload the generated reviewers' sheets to Google's servers, so we used googlesheets, an R package for interacting with Google Sheets, to upload the generated CSV files. Python has gspread, a client library for the Google Sheets API, but its documentation described several configuration steps, including obtaining an access token, and we decided not to use it because these steps could become time-consuming in future.

Conclusions

PuLP saved us a lot of time and we are very happy with the result. In future editions we plan to add more restrictions, as mentioned earlier, to make the process fairer, and to handle conflicts of interest up front by incorporating them into the constraints. In this edition, we swapped reviews between reviewers whenever a conflict of interest was reported.
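Incorporating reported conflicts of interest into the model could be as simple as forcing the corresponding variables to zero. A minimal sketch, assuming conflicts are collected as hypothetical (application, reviewer) pairs and using the same kind of binary variables as above:

```python
import pulp

applications = ["A1", "A2", "A3"]
reviewers = ["R1", "R2", "R3"]
k = 2  # reviews per candidate
per_reviewer = len(applications) * k // len(reviewers)  # assumes this divides evenly
# Hypothetical conflicts of interest reported by reviewers: (application, reviewer)
conflicts = {("A1", "R2"), ("A3", "R1")}

prob = pulp.LpProblem("reviews_with_conflicts", pulp.LpMinimize)
x = {(a, r): pulp.LpVariable("x_{}_{}".format(a, r), cat="Binary")
     for a in applications for r in reviewers}
prob += pulp.lpSum(x.values())  # dummy objective

for a in applications:
    prob += pulp.lpSum(x[(a, r)] for r in reviewers) == k
for r in reviewers:
    prob += pulp.lpSum(x[(a, r)] for a in applications) == per_reviewer

# A conflicted pair can never be assigned
for a, r in conflicts:
    prob += x[(a, r)] == 0

status = prob.solve(pulp.PULP_CBC_CMD(msg=False))
print(pulp.LpStatus[status])
print(sorted(pair for pair, var in x.items() if var.value() > 0.5))
```

One design caveat: re-solving after a new conflict is reported may move other reviews as well, so in practice one might also add constraints that keep already-started reviews fixed.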

Acknowledgements

I am grateful to our Fellow Vincent Knight for recommending PuLP.

Support file

The documented Jupyter Notebook with the Python code we used, with the Google Sheets addresses removed, is available here.

Posted by s.aragon on 18 December 2017 - 9:50am