AddThis Social Bookmark Button

Print

How to Optimize Rank Data in MySQL
Pages: 1, 2, 3, 4, 5, 6, 7

Typical Queries for a Leaderboard Page

Take the page showing gamer 1010's position in game 1 as an example. The simplest design for a leaderboard page is to find on which page gamer 1010 falls, then select 20 gamers in descending order of game 1 score, discarding the data for pages 1 through 19. This is design #1. In SQL:



-- Find gamer 1010's score in game 1. Result: 97101
select score from score where gamer = 1010 and game = 1;

-- At which place does gamer 1010 appear?  Result: 309.
-- Ties are resolved by gamer ID.
select count(*) from score
where score.game = 1 and score.score >= 97101;

-- Select the leaderboard page that shows gamer 1010.
select gamer.gamer, gamer.name, score.score
from gamer
   inner join score on gamer.gamer = score.gamer
where score.game = 1
order by score.score desc
limit 300, 20;

-- Find the rank for the first row on that page.  Result: 284.
select count(distinct score.score)
from score
where score.game = 1 and score.score >= 97173;

You need the last query because the others don't give the PHP code all the data it needs. The leaderboard query selects the rows, but not their ranks, and the leaderboard page needs to display ranks. It's up to the PHP code to add the rank; every time it sees a row with a different score than the last row, it has to increment the rank. Still, it needs to know the first row's rank, which is what that last query finds.

Two Better Designs

The previous queries scan and discard many rows just to retrieve 20. Is there a better way to get a row's rank than counting, on average, half the rows in the current game?

The queries need to find ranks and offsets; that's unavoidable. Unfortunately, InnoDB doesn't perform well for COUNT() queries.

The other problem is how to optimize a query that does a LIMIT with an offset. LIMIT 20 is not a problem, but LIMIT 2000, 20 is, because it generates and discards 99 percent of the rows. I need to get rid of the offset and make sure the ORDER BY uses an index.

To accomplish this, I'll denormalize slightly by adding indexed rank columns to the tables. Ignoring the maintenance cost for the moment, this is design #2:

-- Find score and rank as above.
select score, rank_in_game from score_ranked where gamer = 1010 and game = 1;

-- Find position in leaderboard again.  Result: 309
select count(*) from score_ranked
where game = 1 and score >= 97101;

select gamer.gamer, gamer.name, u.score, u.rank_in_game
from gamer
   inner join (
      (
         -- Fetch start (first 9 rows) of leaderboard.
         select gamer, score, rank_in_game
         from score_ranked
         where game = 1 and rank_in_game <= 290 and gamer <= 1010
         order by rank_in_game desc, gamer desc
         limit 9
      )
      union all
      (
         -- Fetch remaining 11 rows.
         select gamer, score, rank_in_game
         from score_ranked
         where game = 1
            and ((gamer > 1010 and rank_in_game = 290) or rank_in_game > 290)
         order by rank_in_game, gamer
         limit 11
      )
      order by rank_in_game, gamer
   ) as u on gamer.gamer = u.gamer;

That's actually an improvement, even though it looks complicated. I've eliminated one COUNT() query and gotten rid of the offsets on the limits. The queries now return all the necessary data for the leaderboard, including the rank.

There is still one COUNT() query to align the page perfectly modulo 20, but if I don't care about that, I can remove it too. This is design #3:

-- Find score and rank as above.
select score, rank_in_game from score_ranked where gamer = 1010 and game = 1;

-- Select the leaderboard page that shows gamer 1010.
select gamer.gamer, gamer.name, u.score, u.rank_in_game
from gamer
   inner join score_ranked as u on gamer.gamer = u.gamer
where game = 1 and rank_in_game >= 290 and u.gamer >= 1010
order by rank_in_game, gamer
limit 20;

This is even better. The query looks like it's only going to read a few rows. Other types of leaderboards (rank in country, rank overall, etc) can be similarly optimized.

If you must align the pagination along multiples of 20, you can do an extra query the first time a random record is accessed, use it to align the page, and then carry the positioning along to further requests.

Pages: 1, 2, 3, 4, 5, 6, 7

Next Pagearrow




-->