I think the author is missing some important things. When he reasons that a programmer can gain efficiency by hand-rearranging machine instructions, it seems he has failed to keep pace with the scientific work done in compiler construction over the last 20 years or so.
Today's compilers (especially the mature C compilers) contain impressively sophisticated code-generating back-ends tuned for the target architecture. Using mechanisms such as BURS (bottom-up rewrite systems), together with an up-to-date table of the machine instructions of the CPU at hand (including their relative costs), these back-ends can find a provably optimal selection of instructions for a given expression tree and cost model.
Also important here is that most humans aren't fully aware of the specialized instruction sets of modern CPUs. Compilers are.
This may have been different 20, 30 or 40 years ago, but like the author's programming skills, compilers too have come of age.
The author must realize that more and more software is written in higher-level languages. Some of these don't even compile to native machine code (Java, for example, compiles to bytecode). When the author claims that one should think in assembly but write high-level code (choosing language constructs that translate to better machine code), it's unclear which instruction set to think in: that of the intermediate code, or that of the real hardware. Interpreted code, moreover, often runs on a multitude of platforms, each with its own peculiarities, so the quality of the interpreter (and any JIT stages) and the runtime system is what really matters.
Profound knowledge of assembly is definitely still crucial. However, that effort pays off far more when writing a compiler than when writing a program that will itself be fed to a compiler.
So the gradual shift to higher-level languages with highly optimized interpreters and compilers probably isn't as bad as the author suggests.