Why Numbers Are a No-Go in Hackathon Judging

Milla Lappalainen
Published in Junction
5 min read, Nov 6, 2017

Judging is a matter of honour for hackathons. The most esteemed hackathons in the world have many things in common, and successful judging is one of them. With a multitude of projects and judges, hackathon judging is a tough nut to crack. Add the limited time for giving the scores and interpreting them, and you have yourself a real challenge.

We had a chat with Juuso Lappalainen, who is part of the team developing Junction’s judging system.

“Judging is a tricky subject, not least because it’s always somewhat subjective. There’s probably not a single hackathon in the world that would have done it perfectly”, Juuso says.

“At Junction, we consider judging to be a huge part of what makes the hackathon experience enjoyable for all participants, and ultimately the sign of a quality hackathon. That’s why we’re working extra-hard to make sure we have the best possible judging.”

From pen and paper to swift digital judging

Last year, judging at Junction leapt into the digital age. The new judging system was no longer based on pen and paper, as before, but on a web app that judges could access on their phones. Most importantly, numerical grades were abandoned.

“If you think about judging in general, the most common way to evaluate a performance is to attribute a number to it, for example on a scale from 1 to 10”, Juuso starts.

“However, a numerical system has clear flaws. Let’s say a judge gives the first project full points, a ten. What if the next project is even better? A numerical scale doesn’t have the ability to stretch upwards, if needed.”

Even if a single judge manages to score every project consistently, there are bound to be differences between the judges. Some judges look at projects more positively than others. For some, a 10 is an unattainable extreme. For others, it’s what the best project deserves, and other projects are measured against it.

“This alone is not necessarily a problem, thanks to a phenomenon called the Wisdom of Crowds. There tend to be as many people who overestimate things as there are people who underestimate them. Therefore, over-positive evaluations counteract over-negative evaluations”, Juuso explains.

The Wisdom of Crowds is demonstrated in this astonishing video by professor Marcus du Sautoy. He fills a jar with some 4000 jellybeans and asks people to estimate how many there are. Nobody guesses correctly, but the average of all guesses is only 0.1 percent away from the correct amount. Taken together, the guesses carry information that no single guess does.

Professor Marcus du Sautoy showcases the curious magic of the Wisdom of Crowds.
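
To see why overestimates and underestimates cancel out, here is a minimal Python sketch. It is a toy simulation and not the experiment from the video: the crowd size of 1000 and the assumption that each guess is off by up to 50 percent in either direction are made up for illustration.

```python
import random
import statistics


def simulate_crowd(true_count, n_guessers, max_error=0.5, seed=0):
    """Return guesses spread evenly above and below the true count.

    Assumption: every guesser is off by a random share of up to
    max_error, and is equally likely to guess too high or too low.
    """
    rng = random.Random(seed)
    return [
        true_count * (1 + rng.uniform(-max_error, max_error))
        for _ in range(n_guessers)
    ]


true_count = 4000
guesses = simulate_crowd(true_count, n_guessers=1000)

# Error of the averaged guess versus the error of a typical single guess.
crowd_error = abs(statistics.mean(guesses) - true_count) / true_count
typical_error = statistics.mean(
    [abs(g - true_count) for g in guesses]
) / true_count

print(f"crowd average is off by {crowd_error:.1%}")
print(f"a typical single guess is off by {typical_error:.1%}")
```

The cancellation only works when errors are roughly balanced. If every guesser leaned in the same direction, the average would lean that way too.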

Developing a judging system without numbers

All the same, a number-based judging system is prone to errors. Dealing out numbers is not an easy task for a non-professional judge with no real frame of reference for what is a good hack.

A numerical system also requires the judge to remember the projects he or she has already judged in order to be fair. If not, the judging easily becomes wishy-washy.

The solution? Pairwise comparison.

“It’s impossible to ensure that everyone would judge the projects identically. However, almost anyone can pick the better one out of two projects”, Juuso says.

Instead of giving a project a numerical score, a Junction judge is asked to pick the better one of two projects. Winning such a comparison pushes the project up a notch and losing it pushes it down. The more comparisons are done, the more reliable the ranking becomes.
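
The "up a notch, down a notch" idea can be sketched in a few lines of Python. The sketch below uses an Elo-style update, the rule known from chess ratings, purely as an illustration. It is not the math Junction's system runs on, and the project names, starting rating and step size `k` are hypothetical.

```python
def record_comparison(ratings, winner, loser, k=32.0):
    """Elo-style update: the winner moves up, the loser moves down.

    An upset (a low-rated project beating a high-rated one) moves
    both ratings more than an expected result does.
    """
    expected_win = 1.0 / (
        1.0 + 10 ** ((ratings[loser] - ratings[winner]) / 400.0)
    )
    shift = k * (1.0 - expected_win)
    ratings[winner] += shift
    ratings[loser] -= shift


# Three hypothetical projects that all start from the same rating.
ratings = {"A": 1000.0, "B": 1000.0, "C": 1000.0}

# Each vote is (better project, other project) as picked by a judge.
votes = [("A", "B"), ("A", "C"), ("B", "C")]
for winner, loser in votes:
    record_comparison(ratings, winner, loser)

ranking = sorted(ratings, key=ratings.get, reverse=True)
print(ranking)  # ['A', 'B', 'C']
```

Note that no judge ever had to decide whether a project is a 7 or an 8. The ranking falls out of the picks alone.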

The idea of a judging system without numerical scores was introduced by Anish Athalye for HackMIT. He wanted to figure out a way to improve the quality of judging at hackathons and large-scale competitions in general. The math behind his system, originally developed by Chen et al., now runs behind Junction’s judging system, too.

“The magic of the comparative judging system is that every vote affects every other vote, too, and not just the pair in question. With 150 judges evaluating the projects for 3 hours we should get a very reliable result”, Juuso says.
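
How can a vote on one pair affect projects outside that pair? A rough way to see it is to fit one strength value per project to all votes at once. The Python sketch below uses the classic Bradley-Terry model for this. It is a simplified stand-in, not the method by Chen et al. and not Junction's actual code, and the projects and votes are invented. It also assumes every project has both won and lost at least once.

```python
from collections import defaultdict


def bradley_terry(votes, iterations=200):
    """Fit Bradley-Terry strengths to (winner, loser) votes.

    Uses a simple fixed-point iteration and rescales the strengths
    so that they sum to 1 after every pass.
    """
    wins = defaultdict(int)      # total wins per project
    meetings = defaultdict(int)  # comparisons per unordered pair
    for winner, loser in votes:
        wins[winner] += 1
        meetings[frozenset((winner, loser))] += 1

    projects = sorted({p for vote in votes for p in vote})
    strength = {p: 1.0 / len(projects) for p in projects}

    for _ in range(iterations):
        updated = {}
        for p in projects:
            # Sum over every opponent q that p has been compared with.
            denom = sum(
                count / (strength[p] + strength[q])
                for pair, count in meetings.items() if p in pair
                for q in pair - {p}
            )
            updated[p] = wins[p] / denom
        total = sum(updated.values())
        strength = {p: s / total for p, s in updated.items()}
    return strength


# A and C are never compared directly; B is their only common opponent.
votes = [
    ("A", "B"), ("A", "B"), ("B", "A"),
    ("B", "C"), ("B", "C"), ("C", "B"),
]
before = bradley_terry(votes)
after = bradley_terry(votes + [("C", "B")])  # one extra vote, A not involved

print(round(before["A"] / before["C"], 2))
print(round(after["A"] / after["C"], 2))
```

In this toy data, A starts out about four times as strong as C. One extra vote between B and C cuts that to about two times, even though A took no part in it.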

In 2016, the new judging system was tested with good promise. This year, team Junction is ironing out the kinks. The biggest change is that now every track has its own pool of judges. Looking at projects from the same track, for example Fintech, should make the job easier for judges and yield even more reliable results.

A callout for Community Judges at Junction 2017

Interested in becoming a Community Judge at Junction? Great, read on! We are looking for a diverse group of judges, so remember: you don’t need to be a programmer. The most important criterion for a hackathon judge is curiosity.

Junction is judged in two parts. Once time is up, Community Judges have three hours to make as many pairwise comparisons as they can. With the help of Athalye’s math, the winners of each track will be chosen. The best track winners will go on to the finals, where they get a chance to pitch their hack to a jury of professional judges. One of them will be crowned the winner of Junction 2017 and take home the grand prize of 20,000 euros.

Fill in the short application here to become a Community Judge!

A good Community Judge

  • has an interest in cool new tech
  • wants to be part of deciding who ultimately deserves to win the grand prize
  • has some amount of knowledge or experience related to the track(s) they want to judge.

A Community Judge must

  • be available in Espoo, Finland on Sunday, 26 November, from 10:00 to 15:00
  • have a smartphone.

Perks of being a Community Judge

  • Getting to experience Europe’s largest hackathon first-hand
  • Free snacks and drinks at the event to keep you energized
  • A Junction t-shirt to take home as a memory
