I am a professional assembler (formerly C/C++) programmer and an active member of, and contributor to, the assembler community.
I started learning assembler about 3 years ago in my spare time. My first step, after learning the basic instruction set, was to write a small project in both assembler and C/C++ side by side, using the disassemblies of my C/C++ program to help me write certain functions if and when I got stuck. My first program written in assembler was a chess game. To my enjoyment, I found that assembler helped me make more optimizations in my C/C++ code (mainly the AI) than my C/C++ code helped me learn assembler. Among other things, by reading the disassembled C/C++ (VC6) code I found that by simply moving several instructions around in my project, the compiler was able to generate smaller and faster output. At first, my assembler AI code was not performing as well as the C/C++ code, and I became aware of some nice optimizations made by the compiler. I implemented some of them in my assembler program. I continued to optimize and eventually reached the limits (or at least my limits) of optimization in my C/C++ code. So, as the reason for the project was to learn assembler, I continued with my assembler project, satisfied with the C/C++ project. As time went by and I learnt more assembler instructions (my assembler vocabulary grew), I began reaching, and then exceeding, the execution speed of my C/C++ project. At the end of the day (3 weeks later), I was happy that the project had allowed me such an in-depth look into what type of code the VC6 compiler produces.
During, but mainly after, this project, I disassembled many common Windows API and C/C++ library calls and tried my hand at improving their speed and efficiency. My first achievement, almost 3 years ago, was a 15% speed increase on the common itoa() function (exported by msvcrt??.dll), which converts an int to its ASCII string representation (the version of this function written by the assembler community is up to 20x faster). A while ago I needed to improve the speed of the API call GetPixel(); the result was a 25x improvement that no C/C++ compiler would beat! My most recent is an estimated 3x increase in the speed of the lstrlen() function (exported by kernel32.dll), which determines the length of an ASCII string. I have found that, generally, the longer and more complex the algorithm, the greater the increase in speed. For study purposes, I have read disassemblies of many common functions used by C/C++ and API programmers (as I formerly was). I can now comfortably say that a compiler is hard pressed to beat my standard, un-optimized assembler code. The worst case scenario is that they (my un-optimized code and C/C++ release code) perform equally well.
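For anyone who has not looked inside such a routine, here is a minimal C sketch of what an int-to-ASCII conversion has to do. It is not the msvcrt implementation and not the assembler community's version; the name sketch_itoa and the 12-byte buffer (which assumes a 32-bit int) are my own assumptions, purely for illustration. The divide-by-10 loop and the digit reversal are the parts a hand-tuned version would typically target.

```c
/* Hypothetical helper, not the msvcrt itoa(): writes the decimal form of
   value into buf and returns buf. buf must hold at least 12 bytes, which
   assumes a 32-bit int (sign + 10 digits + terminating NUL). */
char *sketch_itoa(int value, char *buf)
{
    char tmp[10];              /* digits, least significant first */
    int n = 0;
    /* Work in unsigned so that the most negative int negates safely. */
    unsigned int u = (value < 0) ? 0u - (unsigned int)value
                                 : (unsigned int)value;
    char *p = buf;

    do {
        tmp[n++] = (char)('0' + (u % 10u));  /* peel off the last digit */
        u /= 10u;
    } while (u != 0u);

    if (value < 0)
        *p++ = '-';
    while (n > 0)              /* copy the digits back in reverse order */
        *p++ = tmp[--n];
    *p = '\0';
    return buf;
}
```

Calling sketch_itoa(-12345, buf) with a char buf[12] leaves the string "-12345" in buf.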
About 1 year ago, I was commissioned to complete a database validation program, and I had 3 days to get a basic working model complete. 6 users would use my program, which would display an image, and they would have to validate OCR entries. They would be validating for 6 months. Due to the extremely tight deadline (1 weekend), I chose to develop in Borland C++ Builder 6 with its RAD interface (using VCL), taking only a few minutes to do the front-end design (I had no time to spare for a fancy front end). I used Borland's libraries for JPEG images and database connectivity. The users started using my application, and I continued to add features to improve productivity. The biggest complaint, especially on the slower (PII/PIII) PCs, was speed. I profiled and improved algorithms and response times, but still they were unhappy. I disassembled critical parts of the code, and moved and removed code and checks. It became apparent that we weren't going to meet our 6-month validation deadline at the current pace of validation. My boss had thoughts of increasing the user count. I decided to rewrite both critical and non-critical parts of my code in assembler. The non-critical parts included populating thousands of entries in a list box (which became much faster in assembler). As the user interface and the more critical sections became faster, our validation speed (along with user familiarity with the system) increased. Response times improved and users complained less. Critical image retrieval, viewing and manipulation became more responsive, database connectivity and retrieval became faster (I rewrote many ODBC functions/features in assembler), and the older PCs were as responsive as the PIVs we had.
If nothing else, I would honestly have to agree that knowledge of assembler has made me an all-round better C/C++ programmer. It took me about 1 year of assembler studies to reach 'compiler' level. After 2 years, I was thinking and coding like a compiler. After 3 years, compilers are left in the dust.
One thing I would like to make clear about this 'transition' I was able to make from C/C++ to assembler is that the abundance of C/C++ source code and resources out there is astronomical: libraries, functions, snippets, help files and documentation. I initially found these resources difficult to come by in the assembler community. When I wrote C/C++ programs, I had my 'frameworks', my templates. I could write a project in 3 days. I have CDs full of C/C++ source code, example code on everything imaginable. I have years of the stuff. But I had very little for assembler. What I could do, of course, was compile and disassemble. So that's what I did to get my assembler source code. But I'm happy with my assembler library now. I have assembler code to achieve anything and everything I want and need. If I don't have it, I'll disassemble it.
My Pro Assembler comments
"You don't know the power of the dark side": Darth Vader. You don't know the power of assembler till you learn it. Have respect for it. Have respect for those that code in it. All languages have pros and cons, assembler included. All languages have a purpose and a reason for being there. Assembler will never die. By its nature, it cannot die. Without assembler, there's no true performance. Sucking everything out of the CPU can only be done in assembler. When new CPUs and instructions come out, assembler is normally the first and only language to support them. C/C++ has no real native/built-in support for MMX/SSE/SSE2 instructions. How else can you utilize these instruction sets? Maybe inline assembler, but then you lose the 'benefit' of compiler optimizations.
I'm not the best assembler programmer, but I don't consider myself a newbie. I can say I consistently beat compiler output and my project deadlines are met, without 'buggy' code as some would argue and without complaint from users on performance. But I do consider myself lucky!
How much can this line in C/C++ be optimized?
for (int i = 0; i < count; i++)
<some code here>
In assembler you can optimize to your heart's content, for Intel/AMD CPUs for example.
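To give a rough idea of the kind of restructuring I mean, here is a small C sketch of two things an assembler programmer would typically do to that loop by hand: walk a pointer instead of indexing, and unroll the body by four while counting down to zero. The array-summing body and the names sum_up and sum_down are assumptions made up for this illustration (the original loop body is not shown), and a modern compiler may well apply the same transformation itself, so treat any speed difference as something to measure and not as a given.

```c
/* The loop as written in the post, with a hypothetical body:
   sum count ints from data. */
long sum_up(const int *data, int count)
{
    long total = 0;
    for (int i = 0; i < count; i++)
        total += data[i];
    return total;
}

/* The same work restructured by hand: pointer walk, count-down
   counter, body unrolled four times, then a tail loop for leftovers. */
long sum_down(const int *data, int count)
{
    long total = 0;
    const int *p = data;
    int n = count;

    while (n >= 4) {           /* four elements per iteration */
        total += p[0];
        total += p[1];
        total += p[2];
        total += p[3];
        p += 4;
        n -= 4;
    }
    while (n > 0) {            /* remaining 0 to 3 elements */
        total += *p++;
        n--;
    }
    return total;
}
```

Both functions return the same result for any count (including zero or negative, where both return 0); only the shape of the loop differs.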
Sorry for the long post guys.
For the love of programming, Assembler is king!
But it's a long dark road, with few to guide you!
Final words: YES, assembler made me a much better C/C++ programmer.