A while back I was asking about how to look at server load issues. I wound up using collectd, which was pretty useful.

I identified a handful of disk I/O spikes, though unfortunately it’s hard to match them up with the delays and lag I’ve been noticing subjectively. Still, I looked into ways to improve I/O handling, and found information about caching and pdflush (the kernel threads that write dirty pages back to disk).

I’m currently experimenting with raising /proc/sys/vm/swappiness (to 90), which makes the kernel more willing to swap idle process memory out to disk - on the grounds that I do have an active server, and I’d rather the freed RAM went to the disk cache. I think. I’m a little unsure of my logic here, but so far this seems to have improved the situation a bit, so maybe I’m on the right lines.
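
For anyone wanting to try the same thing, here is a minimal sketch of reading and changing the value from Python. The helper names `read_vm_setting` and `write_vm_setting` are my own, purely for illustration. The only thing assumed about the system is the standard Linux /proc/sys/vm layout.

```python
from pathlib import Path

# Standard location of the VM tunables on Linux.
VM_DIR = Path("/proc/sys/vm")


def read_vm_setting(name, vm_dir=VM_DIR):
    """Return the integer value of a VM tunable such as 'swappiness'."""
    return int((Path(vm_dir) / name).read_text().strip())


def write_vm_setting(name, value, vm_dir=VM_DIR):
    """Write a new value for a VM tunable.

    On a real system this needs root, and the change is lost on reboot.
    """
    (Path(vm_dir) / name).write_text(f"{value}\n")


if __name__ == "__main__":
    print("swappiness:", read_vm_setting("swappiness"))
    # Uncomment to apply the value I'm experimenting with (requires root):
    # write_vm_setting("swappiness", 90)
```

A value set this way only lasts until the next reboot. To keep it, the usual route is a `vm.swappiness = 90` line in /etc/sysctl.conf.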