Typical Queries for a Leaderboard Page
Take the page showing gamer 1010's position in game 1 as an example. The simplest design for a leaderboard page is to find on which page gamer 1010 falls, then select 20 gamers in descending order of game 1 score, discarding the data for pages 1 through 19. This is design #1. In SQL:
-- Find gamer 1010's score in game 1. Result: 97101 select score from score where gamer = 1010 and game = 1; -- At which place does gamer 1010 appear? Result: 309. -- Ties are resolved by gamer ID. select count(*) from score where score.game = 1 and score.score >= 97101; -- Select the leaderboard page that shows gamer 1010. select gamer.gamer, gamer.name, score.score from gamer inner join score on gamer.gamer = score.gamer where score.game = 1 order by score.score desc limit 300, 20; -- Find the rank for the first row on that page. Result: 284. select count(distinct score.score) from score where score.game = 1 and score.score >= 97173;
You need the last query because the others don't give the PHP code all the data it needs. The leaderboard query selects the rows, but not their ranks, and the leaderboard page needs to display ranks. It's up to the PHP code to add the rank; every time it sees a row with a different score than the last row, it has to increment the rank. Still, it needs to know the first row's rank, which is what that last query finds.
Two Better Designs
The previous queries scan and discard many rows just to retrieve 20. Is there a better way to get a row's rank than counting, on average, half the rows in the current game?
The queries need to find ranks and offsets; that's unavoidable. Unfortunately, InnoDB doesn't perform well for
The other problem is how to optimize a query that does a LIMIT with an offset.
LIMIT 20 is not a problem, but
LIMIT 2000, 20 is, because it generates and discards 99 percent of the rows. I need to get rid of the offset and make sure the
ORDER BY uses an index.
To accomplish this, I'll denormalize slightly by adding indexed rank columns to the tables. Ignoring the maintenance cost for the moment, this is design #2:
-- Find score and rank as above. select score, rank_in_game from score_ranked where gamer = 1010 and game = 1; -- Find position in leaderboard again. Result: 309 select count(*) from score_ranked where game = 1 and score >= 97101; select gamer.gamer, gamer.name, u.score, u.rank_in_game from gamer inner join ( ( -- Fetch start (first 9 rows) of leaderboard. select gamer, score, rank_in_game from score_ranked where game = 1 and rank_in_game <= 290 and gamer <= 1010 order by rank_in_game desc, gamer desc limit 9 ) union all ( -- Fetch remaining 11 rows. select gamer, score, rank_in_game from score_ranked where game = 1 and ((gamer > 1010 and rank_in_game = 290) or rank_in_game > 290) order by rank_in_game, gamer limit 11 ) order by rank_in_game, gamer ) as u on gamer.gamer = u.gamer;
That's actually an improvement, even though it looks complicated. I've eliminated one
COUNT() query and gotten rid of the offsets on the limits. The queries now return all the necessary data for the leaderboard, including the rank.
There is still one
COUNT() query to align the page perfectly modulo 20, but if I don't care about that, I can remove it too. This is design #3:
-- Find score and rank as above. select score, rank_in_game from score_ranked where gamer = 1010 and game = 1; -- Select the leaderboard page that shows gamer 1010. select gamer.gamer, gamer.name, u.score, u.rank_in_game from gamer inner join score_ranked as u on gamer.gamer = u.gamer where game = 1 and rank_in_game >= 290 and u.gamer >= 1010 order by rank_in_game, gamer limit 20;
This is even better. The query looks like it's only going to read a few rows. Other types of leaderboards (rank in country, rank overall, etc) can be similarly optimized.
If you must align the pagination along multiples of 20, you can do an extra query the first time a random record is accessed, use it to align the page, and then carry the positioning along to further requests.