AddThis Social Bookmark Button

Print

How to Optimize Rank Data in MySQL
Pages: 1, 2, 3, 4, 5, 6, 7

How to add the rank column

The rank column is basically a cache. This seems like a better design than calculating rank with every query, except for one thing: maintaining the column. If there isn't a cheap way to do this, I've just made the table and indexes larger, and moved the work from the SELECT queries somewhere else. In some cases, this might be a big win for example, if the data changes infrequently. Yet as I told you, this site has frequent updates, so this design is only justifiable if I have a cheap way to maintain the rank column.



Fortunately, there is a way. It relies on two things. The first is simulating a write cursor with MySQL's user variables, so you can do one-pass rank calculations. The second is smart update batching; you can calculate how expensive an update will be, and apply it immediately only if it's cheap.

Here's how to add the rank_in_game column I used in the previous queries:

drop table if exists score_ranked;

create table score_ranked like score;

alter table score_ranked add rank_in_game int not null default 0,
   add key(game, rank_in_game);

I'm creating a new table, rather than modifying the existing one. I'm going to benchmark later, so I want to keep the original table as it is.

Now I'm ready to populate the table. This is where I have to get tricky with user variables. Here's the query to insert the existing data while simultaneously calculating rank:

set @rank := 0, @game := null, @score := null;

insert into score_ranked(gamer, game, score, rank_in_game)
select gamer, game, score,
   greatest(
      @rank := if(@game = game and @score = score, @rank, if(@game <> game, 1, @rank + 1)),
      least(0, @score := score),
      least(0, @game  := game)) as rank
from score order by game desc, score desc;

Don't worry, I'll explain how that works!

How the User-Variable Query Works

MySQL's user variables let you use and assign to them at the same time. This lets you do some really neat things with them. I've written several articles about advanced user variable techniques on my own website, but I'll give a summary here. First, analyze the query to see how it works:

select gamer, game, score,
   greatest(
      -- Calculate current row's rank
      @rank := if(@game = game and @score = score, @rank, if(@game <> game, 1, @rank + 1)),
      -- Save score and game for next row
      least(0, @score := score),
      least(0, @game  := game)) as rank
from score order by game desc, score desc;

Ignore the GREATEST() function for a moment, and read the query from the inside out. The first thing to know about a user-variable expression is that it executes for every row in the result set, when the row is generated. In other words, MySQL processes the rows iteratively, not all at once as a set.

The second is that the query reads every row from the score table in rank order. This query only works if the query reads the rows in the right order.

The query works because you can find a gamer's rank by starting at 1, and increasing the rank every time you find a gamer with a lower score. Row by row, the query looks to see if the current game and score are the same as the last values seen. If so, @rank remains unchanged; otherwise it resets to 1 when the game changes, and increments @rank if the game is the same.

After finding the current row's rank, the query needs to save the current game and score for the next row, but you can't put these assignments into a column in the SELECT list. There are no corresponding columns in the destination table to accept them, so it must be possible somehow to remove those spurious columns. That's why I wrapped them inside a LEAST(0, ...) function, and then wrapped the whole thing inside GREATEST(). This ensures that the @score and @rank assignments are executed but discarded.

The GREATEST() function returns the newly computed @rank value.

Pages: 1, 2, 3, 4, 5, 6, 7

Next Pagearrow




-->