What an amazing world, the world of Kaggle competitions! This month I spent part of my free time on the ‘March Machine Learning Mania 2016’ competition, studying the subject and attending two meetups here in London. The objective of the Kaggle competition was to predict the outcome of the 2016 NCAA Basketball Tournament, known as March Madness. It was a very enjoyable experience.
You might think, what the heck has this to do with HR Analytics, the subject I am normally interested in? Quite a lot in fact! Predicting performance through machine learning algorithms is a crucial aspect of HR Analytics.
I learnt about a Bayesian skill rating system called TrueSkill, developed by Microsoft and used in large-scale commercial online gaming platforms such as Xbox Live. The purpose of such a ranking system is to identify and track the skills of gamers so that they can be paired in competitive matches. There is even an R package for TrueSkill.
As Wikipedia says under ranking systems, performance isn’t measured absolutely. It is inferred from wins, losses, and draws against other players. A player’s rating depends on the ratings of their opponents and on the results scored against them. The difference in rating between two players determines an estimate of the expected score between them.
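To make the idea of an expected score concrete, here is a minimal sketch in Python of the logistic formula commonly used by Elo-style rating systems. The function names, the scale of 400 and the K-factor of 32 are illustrative assumptions, not something taken from the competition material.

```python
def expected_score(rating_a, rating_b, scale=400.0):
    """Expected score of player A against player B (between 0 and 1).

    Uses the logistic curve common to Elo-style systems; the scale of
    400 is a conventional choice and an assumption here.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def updated_rating(rating, expected, actual, k=32.0):
    """Move a rating towards the observed result.

    actual is 1 for a win, 0.5 for a draw, 0 for a loss.
    k (assumed to be 32 here) controls how fast ratings react.
    """
    return rating + k * (actual - expected)


if __name__ == "__main__":
    e = expected_score(1600, 1400)
    print(round(e, 2))                       # expected score of the stronger player
    print(updated_rating(1600, e, actual=0)) # rating after an unexpected loss
```

With these assumptions, a 200-point advantage gives an expected score of roughly 0.76, and an unexpected loss costs the stronger player more points than an expected win would have earned.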
It is possible to infer entire time series of skills of players by smoothing through time. The skill of each participating player is represented by a latent skill variable which is affected by the relevant game outcomes that year, and coupled with the skill variables of the previous and subsequent year.
I learnt about the Elo rating system, successfully used by a variety of leagues organised around two-sided games, such as football and chess competitions. I reviewed the art of seeding, i.e. arranging matches at knockout tournaments so that strong teams meet their peers only towards the end of the tournament, and I read about upsets, i.e. the games whose results defy all predictions.
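The seeding idea can be sketched in a few lines of Python. This is only an illustration of one common bracket layout for a knockout draw whose size is a power of two; the function names are hypothetical and it is not the exact procedure used by the NCAA.

```python
def seed_order(n_teams):
    """Return seeds in bracket order for a knockout draw.

    n_teams must be a power of two. Adjacent entries play each other
    in the first round, and the two top seeds land in opposite halves,
    so they can only meet in the final.
    """
    if n_teams < 1 or n_teams & (n_teams - 1) != 0:
        raise ValueError("n_teams must be a power of two")
    order = [1]
    while len(order) < n_teams:
        size = 2 * len(order)
        # Pair every seed already placed with its mirror-image opponent.
        order = [s for seed in order for s in (seed, size + 1 - seed)]
    return order


def first_round(n_teams):
    """List the first-round pairings as (stronger seed, weaker seed)."""
    order = seed_order(n_teams)
    return [(order[i], order[i + 1]) for i in range(0, n_teams, 2)]


if __name__ == "__main__":
    print(first_round(8))
```

In an eight-team draw built this way, seed 1 opens against seed 8 and seed 2 against seed 7, and the two favourites sit in different halves of the bracket.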
A phase of entertaining play like this can only help to develop further HR Analytics skills, for which there is so much need in the HR function.
The two meetups allowed me to meet other like-minded data and computer enthusiasts and to discuss the latest tools and techniques available to the data science community. Meetups are an excellent alternative to the sometimes dry e-learning available online. The oddities of the R language go down much better with a pizza sponsored by a bank, I can assure you.
Kaggle is a community of data scientists who come to compete in machine learning competitions. The platform allows data scientists to share and collaborate.
The data sets provided by Kaggle for this ‘2016 NCAA Basketball Tournament’ were absolutely clean and easy to use. Only a bit of time was required to understand the whole subject, which was not too difficult for me, as I grew up in Varese, Italy, a town with a strong tradition in basketball.
Keywords: HR Analytics, People Analytics, Data Driven HR, Talent Analytics, learning R language, Kaggle competition, Machine learning, Basketball, March Madness.