How to Optimize Rank Data in MySQLby Baron Schwartz
Imagine a site that keeps track of gamers' scores in computer games and displays gamers in "leaderboards" ordered by decreasing score. The site is written in PHP and the backend is a MySQL 5 database server. Because the data changes frequently, the server uses the InnoDB storage engine.
This scenario occurs often in applications that need to rank and paginate data by some criterion such as popularity, score, or recency. Forums are a good example, as are hot lists on social bookmarking sites.
Because the example gaming site has millions of users, the leaderboards use pagination. This means they require a combination of ranking and limits with offets. This is a common requirement, and it's hard to optimize. The obvious queries will probably not perform well under heavy load or high concurrency, and don't scale with increasing data size. The ranking information will either be expensive to generate or expensive to maintain.
I've found three possible solutions. The obvious design performs badly, but there is a better way.
I think these designs are appropriate for a mid-sized site, which is big enough that you can't use a single MySQL server without optimization, but not so big that it requires massive clustering or horizontal scaling. My goal is to show how you can design ranking applications to get more out of your existing systems.
Before I start, I need to explain the site in more detail. It keeps scores for 10,000 gamers. The gamers collectively play 10 games in 5 countries. Pretend every gamer plays every game, so the site keeps 100,000 score records. I'm intentionally making this small so you can run the queries on an average computer.
The site's most important pages are the leaderboards. They display gamers in descending order of rank by various criteria such as country, game, and overall. For example, gamer 1010 lives in country 2 and plays game 1. She is ranked 51st in game 1 in her country, 290th in game 1 in the world, and 1,867th overall in the world.
Each leaderboard shows twenty gamers per page. Suppose you click on the link that leads from gamer 1010's profile to her leaderboard page in game 1. There are 50 gamers in her country with an equal or higher score in game 1, so you see page 3 in the leaderboard, and she is 11th on the page.
Here's a script that will create tables and populate them with quasi-random data for the rest of this article. I've given a seed to all calls to
RAND(), so if you run this script, you will get the same data I am using. This script will generate 100,000 rows and takes several minutes to run on my desktop machine.
set @num_gamers := 10000, @num_countries := 5, @num_games := 10; drop table if exists gamer; drop table if exists game; drop table if exists country; drop table if exists score; drop table if exists semaphore; create table semaphore(i int not null primary key); insert into semaphore(i) values (0); create table gamer( gamer int not null, country int not null, name varchar(20) not null, primary key(gamer) ) engine=InnoDB; create table game( game int not null, name varchar(20) not null, primary key(game) ) engine=InnoDB; create table score( gamer int not null, game int not null, score int not null, primary key(gamer, game), index(game, score), index(score) ) engine=InnoDB; create table country( country int not null, name varchar(20) not null, primary key(country) ) engine=InnoDB; -- I use the integers table to generate large result sets. drop table if exists integers; create table integers(i int not null primary key); insert into integers(i) values(0),(1),(2),(3),(4),(5),(6),(7),(8),(9); insert into country(country, name) select t.i * 10 + u.i, concat('country', t.i * 10 + u.i) from integers as u cross join integers as t where t.i * 10 + u.i < @num_countries; insert into game(game, name) select t.i * 10 + u.i, concat('game', t.i * 10 + u.i) from integers as u cross join integers as t where t.i * 10 + u.i < @num_games; insert into gamer(gamer, name, country) select th.i * 1000 + h.i * 100 + t.i * 10 + u.i, concat('gamer', th.i * 1000 + h.i * 100 + t.i * 10 + u.i), floor(rand(0) * @num_countries) from integers as u cross join integers as t cross join integers as h cross join integers as th; insert into score(gamer, game, score) select gamer.gamer, game.game, floor(rand(0) * @num_gamers * 10) from gamer cross join game;