This document has evolved over time and contains a number of the best ways
to hand-optimise your C code. Compilers are good, but they can't do everything,
and here I hope to help you squeeze the best performance out of your code. This
is not intended for beginners, but for more experienced programmers.
Depending on your particular hardware and compiler, some of these techniques may actually slow down your code. Do some timings with and without them, as modern compilers may well be able to do things better at a low level. Improving the overall algorithm used will often produce better results than localised code tweaking. This document was originally written as a set of personal notes for myself - do not consider it to be an authoritative paper on the subject of optimisation. I may have made mistakes! If you have anything to add to this, or just have some constructive criticism (flames ignored), please contact me at the address below.
No error-checking is shown here as this article is only concerned with the fundamentals. By the time the application gets down to the low-level routines, you should have filtered out any bad data already.
switch ( queue )
{
    case 0 : letter = 'W'; break;
    case 1 : letter = 'S'; break;
    case 2 : letter = 'U'; break;
}
or maybe
if ( queue == 0 )
    letter = 'W';
else if ( queue == 1 )
    letter = 'S';
else
    letter = 'U';
A neater (and quicker) method is to simply use the value as an index into a character array, e.g.
static const char *classes = "WSU";
letter = classes[queue];
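As a minimal sketch, the lookup can be wrapped in a small function. The name queue_letter, the range check and the '?' fallback are assumptions added for illustration; the original relies on bad data having been filtered out already.

```c
/* Lookup table: the queue number is used directly as the index. */
static const char classes[] = "WSU";

/* Hypothetical helper: map a queue number (0..2) to its letter. */
char queue_letter(int queue)
{
    /* sizeof counts the terminating '\0', so subtract 1. */
    const int count = (int)(sizeof classes - 1);

    /* Range check added for illustration only. */
    if (queue < 0 || queue >= count)
        return '?';

    return classes[queue];
}
```

With this in place, letter = queue_letter(queue); replaces the whole switch or if chain.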
void func1( int *data )
{
    int i;

    for(i=0; i<10; i++)
    {
        somefunc2( *data, i);
    }
}
Even though "*data" may never change, the compiler does not know that somefunc2() did not alter it, and so the program must read it from memory each time it is used, because it may be an alias for some other variable that is altered elsewhere. If you know it won't be altered, you could code it like this instead:
void func1( int *data )
{
    int i;
    int localdata;

    localdata = *data;
    for(i=0; i<10; i++)
    {
        somefunc2( localdata, i);
    }
}
This gives the compiler a better opportunity for optimisation.
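The sketch below shows the same idea on a loop that writes to memory, where the aliasing risk is easier to see. The function names scale_reload and scale_local are hypothetical, chosen only for this illustration.

```c
/* Hypothetical example: multiply every element by *factor.
 * factor might point into values[], so it is re-read each time. */
void scale_reload(int *values, int n, const int *factor)
{
    int i;

    for (i = 0; i < n; i++)
        values[i] *= *factor;
}

/* Same job, but *factor is read once into a local copy.
 * Only equivalent if nothing changes *factor during the loop. */
void scale_local(int *values, int n, const int *factor)
{
    int i;
    const int f = *factor;

    for (i = 0; i < n; i++)
        values[i] *= f;
}
```

The two versions give different results only when factor points at an element of values, which is exactly the case the compiler has to allow for in the first version.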
register unsigned int var_name;
(Although it is not guaranteed that the compiler will take any notice of "register", and "unsigned" may make no difference to the processor.)
for(i=0; i<100; i++)
{
    stuff();
}

for(i=0; i<100; i++)
{
    morestuff();
}
It would be better to do:
for(i=0; i<100; i++)
{
    stuff();
    morestuff();
}
Note, however, that if you do a lot of work in the loop, it might not fit into your processor's instruction cache. In this case, two separate loops may actually be faster as each one can run completely in the cache.
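A small sketch of the merged-loop idea with real work in the body: one pass computes both the sum and the maximum of an array instead of walking it twice. The function name sum_and_max and its parameters are assumptions for illustration.

```c
/* Hypothetical example: one pass instead of two.
 * Assumes n >= 1 so that values[0] exists. */
void sum_and_max(const int *values, int n, long *sum, int *max)
{
    int i;
    long s = 0;
    int m = values[0];

    for (i = 0; i < n; i++)
    {
        s += values[i];          /* work of the first loop  */
        if (values[i] > m)       /* work of the second loop */
            m = values[i];
    }

    *sum = s;
    *max = m;
}
```

As with any of these techniques, time both versions; whether the single pass wins depends on the hardware and the compiler.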
for(i=0; i<3; i++)
{
    something(i);
}
is less efficient than
something(0);
something(1);
something(2);
because the code has to check and increment the value of i each time round the loop. Compilers will often unroll simple loops like this, where a fixed number of iterations is involved, but something like
for(i=0; i<limit; i++)
{
    ...
}
is unlikely to be unrolled, as we don't know how many iterations there will be. It is, however, possible to unroll this sort of loop and take advantage of the speed savings that can be gained. A good example of this was given in the "Graphics Gems" series of books, as a way of speeding up the display of pixels in a scanline during graphics rendering, but it can also be applied to any situation which involves the same operation being applied to a large amount of data.
#include <stdio.h>

#define BLOCKSIZE (8)

int main(void)
{
    int i = 0;
    int limit = 33;  /* could be anything */
    int blocklimit;

    /* The limit may not be divisible by BLOCKSIZE,
     * go as near as we can first, then tidy up.
     */
    blocklimit = (limit / BLOCKSIZE) * BLOCKSIZE;

    /* unroll the loop in blocks of 8 */
    while( i < blocklimit )
    {
        printf("process(%d)\n", i);
        printf("process(%d)\n", i+1);
        printf("process(%d)\n", i+2);
        printf("process(%d)\n", i+3);
        printf("process(%d)\n", i+4);
        printf("process(%d)\n", i+5);
        printf("process(%d)\n", i+6);
        printf("process(%d)\n", i+7);

        /* update the counter */
        i += 8;
    }

    /*
     * There may be some left to do.
     * This could be done as a simple for() loop,
     * but a switch is faster (and more interesting)
     */
    if( i < limit )
    {
        /* Jump into the case at the place that will allow
         * us to finish off the appropriate number of items.
         * Each case deliberately falls through to the next.
         */
        switch( limit - i )
        {
            case 7 : printf("process(%d)\n", i); i++;
            case 6 : printf("process(%d)\n", i); i++;
            case 5 : printf("process(%d)\n", i); i++;
            case 4 : printf("process(%d)\n", i); i++;
            case 3 : printf("process(%d)\n", i); i++;
            case 2 : printf("process(%d)\n", i); i++;
            case 1 : printf("process(%d)\n", i);
        }
    }

    return 0;
}
Another simple trick to use with loops is to count down, instead of up. Look at this code, which will go through the values of i=0 to i=9:
for(i=0; i<10; i++)
{
    /* do stuff... */
}
If the order in which the loop contents are executed does not matter, you can do this instead:
for( i=10; i--; )
which will step through i=9, down to i=0. It is important to use "i--" rather than "--i", otherwise the loop will terminate one iteration early and never process i=0.
for( i=0; i<10; i++)
{
    ...
}
i loops through the values 0,1,2,3,4,5,6,7,8,9
If you don't care about the order of the loop counter, you can do this instead:
for( i=10; i--; )
{
    ...
}
Using this code, i loops through the values 9,8,7,6,5,4,3,2,1,0, and the loop may be faster, since a test against zero is cheap on many processors.
The syntax is a little strange, but is perfectly legal. The third statement in the loop is optional (an infinite loop would be written as "for( ; ; )" ). The same effect could also be gained by coding:
for(i=10; i; i--)
{
}
or (to expand it further)
for(i=10; i!=0; i--)
{
}
The only things you have to be careful of are remembering that the loop stops at 0 (so if you wanted to loop from 50-80, this wouldn't work), and the loop counter goes backwards. It's easy to get caught out if your code relies on an ascending loop counter.
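To make the visiting order of the "for( i=n; i--; )" form concrete, here is a minimal sketch that records each value of i the loop body sees. The helper name countdown_order is hypothetical, and it assumes n is not negative and that out has room for n entries.

```c
/* Hypothetical helper: store the values of i seen by the loop body,
 * in the order they are visited, and return how many were stored.
 * Assumes n >= 0 and that out[] holds at least n ints. */
int countdown_order(int n, int *out)
{
    int visited = 0;
    int i;

    /* i-- yields the old value for the test, then decrements,
     * so the body runs with i = n-1, n-2, ..., 0. */
    for (i = n; i--; )
        out[visited++] = i;

    return visited;
}
```

Note that i is left at -1 when the loop ends, so this form needs a signed counter if i is used afterwards.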
if( val == 1)
    dostuff1();
else if (val == 2)
    dostuff2();
else if (val == 3)
    dostuff3();
it may be faster to use a switch:
switch( val )
{
    case 1: dostuff1(); break;
    case 2: dostuff2(); break;
    case 3: dostuff3(); break;
}
In the if() statement, if the last case is required, all the previous ones will be tested first. The switch lets us cut out this extra work. If you have to use a big if..else.. statement, test the most likely cases first.
void print_data( const bigstruct *data_pointer)
{
    /* ...printf contents of structure... */
}
This example tells the compiler that the function does not alter the contents of the external structure (the parameter is a pointer to a constant structure). That may let the compiler avoid re-reading the contents each time they are accessed, although const alone does not rule out the data changing through some other alias. It also ensures that the compiler will trap any accidental attempts by your code to write to the read-only structure.
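A minimal sketch of a read-only parameter follows. The layout given to bigstruct here and the function name sum_values are assumptions made up for the example; the original does not define the structure.

```c
/* Hypothetical structure standing in for "bigstruct". */
typedef struct
{
    int id;
    double values[4];
} bigstruct;

/* Pointer to const: the function promises not to modify *data. */
double sum_values(const bigstruct *data)
{
    double total = 0.0;
    int i;

    for (i = 0; i < 4; i++)
        total += data->values[i];

    /* data->id = 0;   <- uncommenting this is a compile-time error */

    return total;
}
```

Callers can still pass a pointer to an ordinary, non-const bigstruct; the const only restricts what the function itself may do through that pointer.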
found = FALSE;
for(i=0; i<10000; i++)
{
    if( list[i] == -99 )
    {
        found = TRUE;
    }
}
if( found ) printf("Yes, there is a -99. Hooray!\n");
This works well, but will process the entire array, no matter where
the search item occurs in it.
A better way is to abort the search as soon as you've found the desired
entry.
found = FALSE;
for(i=0; i<10000; i++)
{
    if( list[i] == -99 )
    {
        found = TRUE;
        break;
    }
}
if( found ) printf("Yes, there is a -99. Hooray!\n");
If the item is at, say, position 23, the loop will stop there and then, and skip the remaining 9977 iterations.
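The same early exit can be packaged as a function, where a return does the job of the break. This is only a sketch; the name find_first and the convention of returning -1 for "not found" are assumptions for illustration.

```c
/* Hypothetical helper: return the index of the first element equal
 * to target, or -1 if it does not occur in list[0..n-1]. */
int find_first(const int *list, int n, int target)
{
    int i;

    for (i = 0; i < n; i++)
    {
        if (list[i] == target)
            return i;    /* stop as soon as the item is found */
    }

    return -1;           /* searched everything, no match */
}
```

Returning the index rather than a TRUE/FALSE flag costs nothing extra and tells the caller where the item was found.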
If you find any of this helps to dramatically increase the performance of your software, please let me know.