Maybe that word processor doesn't need to be 10,000 times faster to be efficient, but in a multithreaded or multi-CPU system the other applications running in the background would benefit a lot if it were.
Just an example of poorly written code that could easily have been a lot better:
The drivers for my printer block my entire machine for as long as the print job lasts.
In your world that's perfectly OK: after all, the printer doesn't need the machine to be available for other things while printing, so why bother writing the driver so that it doesn't block the machine?
But maybe the user (me!) wants to do something else while his printer chugs away on that 100-page print job in the background. Why didn't you think of that?
And yes, I learned assembly at uni. And there too it was abandoned, because OO languages were all the rage at the time.
We used to learn Assembly and Pascal; then along came C++ and every single curriculum was changed to it in the blink of an eye.
I wish I had time to keep up my assembly skills, but I don't...
Instead I continue to keep the lessons learned in mind. Which operation will yield the smallest possible number of CPU cycles or OS calls, what memory structure is the most efficient, etc.
Every programmer should be forced to run his tests on the absolute minimal hardware the program is to run on (as determined by marketing, not himself).
Until he is no longer swearing under his breath because the application is unworkable, whether due to performance or user interface design (another area where major improvements can be made, and I don't mean skinnable applications, which are a cheap "solution" that solves nothing), he's not allowed to pass it on to the test team (you do have in-house testers, don't you?).
One of the things I was asked to do as a contractor was to improve the performance of a small (about 1MB source) C++ application.
It was beautifully architected, applying every design pattern known to Man and adhering strictly to the rule that everything is a class and every bit of code used more than once should be a method.
In fact, that beauty was its undoing. All the object creation, method call stacks, etc. etc. etc. caused so much performance overhead that the application needed 36 hours to run a relatively simple check over a logfile of approximately 1MB.
I'd have rewritten it from scratch in C, but that was not allowed, so I did the next best thing: optimise the heck out of the existing code with things like unrolling loops and inlining one-liner methods (did I tell you the author had deemed it prudent to create a method to add two ints, putting a whole method call behind "x = y + z"?).
In the end I was able to shave 20 hours (over 55%) off the runtime with such simple optimisations. Not a bad gain at all, given that the application was supposed to run once a day, which was now possible where in the past it had not been.
Had the author of the application (who had left the company to write theoretical books on C++ application design and UML) had a grain of realism, he'd have recognised the performance penalties in his design and applied some pragmatism, saving the company a week of my time (at 100 an hour in consultancy fees) and a lot of frustrated users.
And there's the other big problem these days with performance:
Many applications are designed by OO theorists who have no understanding of the implications of their theories for the performance of the code those designs dictate.
Not every performance problem is because of a poor implementation of an algorithm.
Many problems have to do with copious method calls, object creation and mandated error checking, enforced by an insistence on all kinds of frameworks, design patterns and best practices.
While these may make for fantastically maintainable code (I beg to differ: overengineered code is as hard as or harder to maintain than underengineered code), they also make for poor performance and code bloat.
At one project (on which the same designer named earlier had a big impact), a relatively simple automated mailer and customer management program (booking of incoming documents, automated generation of reminder letters, etc.) took up 100MB of source code, which yielded roughly 50MB of executables and DLLs.
A weekly print run of roughly 500 pages of form letters would take over 24 hours to complete, until we went in and created some hooks that allowed the printing system to be rewritten to bypass about 80% of the application framework (and thus 80% of the object creation and method calls generated by the overdesign).
After that, the same print run was limited in speed only by the speed of the printer and the underlying database calls...