Article:
  Why Learning Assembly Language Is Still a Good Idea
Subject:   O(n^2)? & powerpc learning material?
Date:   2004-05-14 04:06:42
From:   CraigRinger
Response to: O(n^2)? & powerpc learning material?

http://www.nist.gov/dads/HTML/bigOnotation.html
and
http://en.wikipedia.org/wiki/Big_O_notation
may help.


It's really very simple once someone explains it. My understanding is that, in the context of computing algorithms, big-O notation is used to describe the "scaling" efficiency of an algorithm - how its cost grows as the input grows.


O(n^2) is _fine_ for small values of n (usually the data set size, i.e. the number of items/records), but quickly becomes a problem as n grows.
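As a rough illustration of what that scaling means, here's a made-up Python sketch (illustrative step counts only, not anything from the app I talk about below):

    # Step counts for a linear and a quadratic algorithm as n doubles.
    # Purely illustrative numbers, not measurements.
    for n in (1000, 2000, 4000, 8000):
        print(n, "rows -> linear steps:", n, " quadratic steps:", n ** 2)
    # Each doubling of n doubles the linear count but quadruples the
    # quadratic one - that's the difference the O() notation captures.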


Here's an example from an app I'm currently working on. There's a function that runs in O(n^2) time, and I need to use it with large data sets. As it's not a super-fast operation to begin with, the result is that it's just too slow for me to use.


I have two options for fixing it.


(a) Find a way to improve it by a constant factor, and improve it enough that the completion time is OK for my data sets. I then have to hope my data sets don't grow.


(b) Improve it so that its run time grows more slowly as n increases.


Currently:

  Num rows    Completion time
     1,000    3.7s
     2,000    11s
     3,000    25s
     4,000    46s
   100,000    ???


Not cool for web document delivery ;-) Note that the completion time is growing (roughly) in proportion to the square of the number of rows:


1000**2 / 3.7 == 270270
2000**2 / 11 == 363636
3000**2 / 25 == 360000
4000**2 / 46 == 347826
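If you want to check that fit yourself, and see why the 100,000-row case is marked "???", here's a quick back-of-envelope in Python (assuming the quadratic trend continues; the constants are just read off the measurements above):

    # Measured (rows, seconds) from the O(n^2) version above.
    measured = [(1000, 3.7), (2000, 11.0), (3000, 25.0), (4000, 46.0)]
    for n, t in measured:
        print(n, "rows: n^2/t =", round(n ** 2 / t))
    # If time is roughly n^2 / 350000 seconds, then 100,000 rows would need:
    print("estimated seconds for 100,000 rows:", round(100000 ** 2 / 350000))
    # ... about 28,500 seconds, i.e. most of eight hours.

That's the kind of number that makes option (b) the only realistic choice here.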


I've been able to make some changes that allow the function to complete in O(n) time. Now (yeah, I know the row counts I tested with are different):


    4,000 :   5s
    8,000 :  10s
   20,000 :  24s
  100,000 : 131s


or:
4000/5 == 800
8000/10 == 800
20000/24 == 833
100000/131 == 763


Make sense? The function used to process the data n times, meaning that when the data size doubled it was processing twice as much data twice as many times: O(n^2). The revised function only processes the data once, and so completes in O(n) time.
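The actual function isn't worth pasting here, but the shape of the change is the classic one. A made-up Python example (counting duplicate rows - not what my function actually does):

    # O(n^2): for every row, rescan the whole list.
    def count_duplicates_slow(rows):
        dupes = 0
        for row in rows:
            if rows.count(row) > 1:   # list.count() walks all of rows again
                dupes += 1
        return dupes

    # O(n): a single pass, tallying into a dict.
    # Assumes rows are hashable (strings, tuples, ...).
    def count_duplicates_fast(rows):
        seen = {}
        for row in rows:
            seen[row] = seen.get(row, 0) + 1
        return sum(count for count in seen.values() if count > 1)

Same answer either way, but the second version touches each row once instead of n times.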


The article comments on people being obsessed with improving algorithmic efficiency in the big-O sense - taking an O(n^2) algorithm to an O(n) one, for example. I agree there are times when you can say "instead, I'm better off getting a 100-fold linear improvement in speed on my O(n^2) algorithm; that'll be good enough for the data set sizes I expect to see." There are also times when you need to say "O(n^2) is not OK; we need large data sets, and I don't think we'll be getting the ten-million-fold linear speed improvement we'd need."
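To put rough numbers on that trade-off, using constants like the ones in my tables above (illustrative only):

    # Suppose the O(n^2) version takes about n^2/350000 seconds and we make it
    # 100x faster, versus the O(n) version at about n/800 seconds.
    for n in (10000, 100000, 1000000):
        quad_100x = n ** 2 / 350000.0 / 100
        linear = n / 800.0
        print(n, "rows: 100x-faster O(n^2):", round(quad_100x, 1), "s,",
              "O(n):", round(linear, 1), "s")
    # The constant speedup wins below roughly 40,000-50,000 rows; above that
    # the O(n) version pulls away, and by a million rows it's no contest.

Which answer is "right" depends entirely on how big n is ever going to get.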


Both have value.