TrueSkill, A Gamer's Guide - Part 1: What Is Skill Rating and Why Use It?

Uploaded by MrTimeItself on 12.11.2012

[intro music]
Hello, Time Itself here with part one of my gamer’s guide to TrueSkill. This guide is
aimed at giving gamers a better understanding of Microsoft’s TrueSkill system and how
it fits into their favorite games. No proficiency in calculus or advanced statistics required.
In part one we’ll talk about what a skill rating system is and some of the basics of
the TrueSkill system, including why everyone starts at zero.
TrueSkill was in beta for Halo 2 and went on to be used in Halo 3, as well as in many other
games such as Age of Empires 3, a personal favorite. Hopefully it will also make an appearance
in Halo 4 as 343 Industries continues post-launch support.
Skill ratings can help players find games against similarly skilled opponents, making
for closer, more competitive and more enjoyable games. This is the number one purpose of skill
ranking systems; it’s not to inflate egos. These skill levels aren’t like RPG levels.
Skill levels approximate how a player’s skill relates to everyone else who plays the
game, whereas RPG levels are a measure of accumulated experience. Skill ratings are always relative
and require an active community of players to have any context or serve any purpose.
Ranking is done solely based on who wins. That is the objective of the game, after all.
Other stats, like how lopsided or close the game was or individual performance on a team,
aren’t considered. Those things could allow for ratings manipulation, they’d make the
system even more complicated, and they would require significantly more effort to tune any additional
weighting parameters into ratings. For example, civilization strengths were given a small weighting
for ratings purposes in Age of Empires 3, but this required a lot of data after each balance
patch to tune, and even then the adjustment was quite minimal. We’ll talk more about tuning in
part two.
TrueSkill is a variation of the Glicko rating system. These systems keep track of two numbers
for every player: a skill rating and how accurate the system considers that rating to be.
After each match the system calculates new, updated values for each player based on the result.
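For the curious, here is a minimal sketch of what "calculating new values" can look like for a single two-player game. It assumes the two-player, no-draw update equations from the published TrueSkill paper and its default parameters (starting mean 25, starting uncertainty 25/3, performance noise half of that). It leaves out draws and the small per-game uncertainty increase real deployments add, and the names `update_1v1`, `alice` and `bob` are hypothetical.

```python
import math

# Assumed defaults from the published TrueSkill paper; real games tune these.
MU0 = 25.0          # starting mean skill
SIGMA0 = MU0 / 3    # starting uncertainty (standard deviation)
BETA = SIGMA0 / 2   # game-to-game performance noise


def pdf(x: float) -> float:
    """Standard normal probability density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def cdf(x: float) -> float:
    """Standard normal cumulative distribution."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def update_1v1(winner, loser, beta=BETA):
    """Return updated (mu, sigma) for the winner and loser of one game.

    Simplified sketch: two players only, no draws, no dynamics term.
    """
    mu_w, sigma_w = winner
    mu_l, sigma_l = loser
    # Total spread of the performance difference between the two players.
    c = math.sqrt(2 * beta**2 + sigma_w**2 + sigma_l**2)
    t = (mu_w - mu_l) / c
    v = pdf(t) / cdf(t)   # larger when the result was a surprise (t negative)
    w = v * (v + t)       # between 0 and 1: how much the uncertainty shrinks
    new_winner = (mu_w + sigma_w**2 / c * v,
                  sigma_w * math.sqrt(1 - sigma_w**2 / c**2 * w))
    new_loser = (mu_l - sigma_l**2 / c * v,
                 sigma_l * math.sqrt(1 - sigma_l**2 / c**2 * w))
    return new_winner, new_loser


# Two brand-new players; the first one wins.
alice, bob = update_1v1((MU0, SIGMA0), (MU0, SIGMA0))
print(alice, bob)  # alice's mean rises, bob's falls, both sigmas shrink
```

With these defaults the winner’s rating moves from 25 to roughly 29.2 and their uncertainty drops from about 8.33 to about 7.19, while the loser moves down by the same amount. An upset (lower-rated player wins) moves both ratings further than an expected result does.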
Here is the first trick of the TrueSkill rating system. Neither of the numbers I mentioned is
what players see. Instead, the system shows a conservative rating: if its assumptions about
the game community’s skill levels are accurate, this is a level the player’s true skill is
almost certainly above, with roughly 99.87% confidence. Just keep in mind
that the conservative ratings are what everyone sees. The starting rating and uncertainty
are set so that players see a conservative rating of 0 when they start, and then see it
increase as the uncertainty about their rating decreases, even if the ‘approximate
rating’ drops below the starting value.
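The conservative rating is usually written as the approximate rating minus three times the uncertainty (μ − 3σ). The sketch below assumes the starting values from the published TrueSkill paper (μ = 25, σ = 25/3), which is why a new player starts at exactly zero; individual games may rescale the number for display. The 99.87% figure is simply the chance that a normally distributed value lands above its mean minus three standard deviations.

```python
import math

MU0 = 25.0          # assumed default starting mean (published TrueSkill default)
SIGMA0 = MU0 / 3    # assumed default starting uncertainty


def conservative(mu: float, sigma: float, k: float = 3.0) -> float:
    """The rating players see: mean minus k standard deviations."""
    return mu - k * sigma


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


print(conservative(MU0, SIGMA0))  # a brand-new player: 0 (up to floating-point rounding)
print(normal_cdf(3.0))            # about 0.99865, i.e. roughly 99.87%
print(conservative(24.0, 7.0))    # 3.0: the mean fell below 25, but sigma shrank more
```

The last line shows the effect described above: the approximate rating went down, yet the visible rating went up, because the system became more certain about the player.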
One of the biggest challenges for a rating system is finding accurate ratings for players,
possibly hundreds of thousands of them for a popular game, in a minimal number of games. If you were
to assume that the higher-rated player always wins, the problem would turn into a classic sorting
problem, something computer scientists have done a lot of work on. We know the theoretical
minimum number of comparisons for sorting, but our ranking problem is more complicated. One part
of this challenge is that the better, more highly rated player will be more likely to win, but not
certain to win. We also only want to update ratings after each game and be done with it. We don’t
want to keep track of how other players do after a given match and then go back and revise
skill ratings later; it’s just too complicated that way. Team games present even more of
a challenge. And what do we make of games where the result is so expected
that it doesn’t give the system any new information? This isn’t an easy problem.
But how does TrueSkill do here? It depends on the game mode. Free-for-all games let the
system rate every player against every other player in the game, so accurate ratings can be
found very quickly. I’m guessing this is partly why free for all was the first choice offered
to players in Halo 2 and 3. But as you combine players onto teams, you get less information
from each game, since only the outcome (a win, a loss, or the possible tie) is considered, not
the placement of players within the teams. TrueSkill assumes that players’ skills on a team are
strictly additive, but that isn’t always the case, especially for objective game modes; again,
that is more complication than we care to take on.
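The additive assumption can be sketched as follows: a team’s performance is treated as the sum of its players’ performances, so the team means add and the variances add. The sketch below assumes the published default performance noise (β = 25/6), ignores draws, and uses a hypothetical function name. It estimates the chance one team beats another, and it also shows the limitation just mentioned: a 30 + 20 team and a 25 + 25 team look identical to the model.

```python
import math

BETA = 25.0 / 6  # assumed per-player performance noise (published default)


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def team_win_probability(team_a, team_b, beta=BETA):
    """P(team_a beats team_b) if team performance is the sum of player performances.

    Each team is a list of (mu, sigma) pairs. Draws are ignored.
    """
    mu_a = sum(mu for mu, _ in team_a)
    mu_b = sum(mu for mu, _ in team_b)
    # Variances add: each player's skill uncertainty plus their game-to-game noise.
    var = sum(sigma**2 + beta**2 for _, sigma in team_a + team_b)
    return normal_cdf((mu_a - mu_b) / math.sqrt(var))


uneven = [(30.0, 2.0), (20.0, 2.0)]
even = [(25.0, 2.0), (25.0, 2.0)]
print(team_win_probability(uneven, even))              # 0.5: same total skill
print(team_win_probability([(30.0, 2.0)], [(25.0, 2.0)]))  # roughly 0.78: likely, not certain
```

A team of one is just a one-on-one game, so the second call also illustrates the earlier point that the higher-rated player is more likely, but not certain, to win.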
Still, according to the information Microsoft has put out, the TrueSkill system
can accurately rate a player in as few as 3 games, in eight-player free for all, that
is. But that number gets a whole lot bigger as the teams get larger and the result of
the game can’t be attributed to any specific player’s skill: it takes 46 games in 4v4 and
a whopping 91 games in 8v8.
TrueSkill offers some impressive advantages over the common Elo systems, but it still has
a hard time with team games, especially objective team games, due to the complicated
interactions between the players. I don’t think any rating system out there is up to the
task right now. One of the issues with TrueSkill is that it requires some tuning to perform
as intended. These tuning parameters will be the topic of part two, when we dig into a little
more detail and see more about how TrueSkill works.
Thanks for joining me, and I’ll see you guys next time.
[game play audio]