LinuxQuestions.org
Latest LQ Deal: Latest LQ Deals
Go Back   LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
User Name
Password
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Notices


Reply
  Search this Thread
Old 08-25-2014, 01:44 AM   #1
Jerry Mcguire
Member
 
Registered: Jul 2009
Location: Hong Kong SAR
Distribution: RedHat, Fedora
Posts: 201

Rep: Reputation: 31
Performance of switch cases in C


Part 1) An interesting finding, not a question.

I did a test to compare the performance on the following statements in C:

Fall-Through
Code:
switch (loc) {
case 0: ...; /* fall-through */
case 1: ...; /* fall-through */
case ...
case 15: ...;
    break;
case 16: ...; /* fall-through */
case 17: ...; /* fall-through */
case ...
case 31: ...;
    break;
...
case 255: ...;
    break;
}
case-break-case-break
Code:
switch (loc) {
case 0: ...; ...; ...; ...; ...  ...; ...; break;
case 1: ...; ...; ...; ...; ...  ...; break;
case ... /* you get the idea */
case 15: ...; break;

case 16: ...; ...; ...; ...; ...  ...; ...; break;
case 17: ...; ...; ...; ...; ...  ...; break;
case ...
case 31: ...; break;

...
case 255: ...; break;
}
Result shows that the case-break-case-break statement is more efficient than the Fall-Through.

Part 2) The question

Now since I strongly believe in my finding in Part 1), I'm going to use this case-break-case-break approach for maximum efficiency. If there is no fall-through, then the case-groups can be re-arranged in any order I want. My question is, does the position of the case-group in the switch-statement affect performance when it is called very often?

Say the variable 'loc' is in normal distribution ranges from 0 to 255, then if should I put case 127: case 128: first and case 0: case 255: last for maximum efficiency?

fyi.
gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-52)
 
Old 08-25-2014, 02:04 AM   #2
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,687

Rep: Reputation: 7274Reputation: 7274Reputation: 7274Reputation: 7274Reputation: 7274Reputation: 7274Reputation: 7274Reputation: 7274Reputation: 7274Reputation: 7274Reputation: 7274
I think you will never get a "this or that is better" answer, because it always depend on (at least two different) things:
1. it depends on the functionality, what is implemented in those cases. Your second example needs usually more space, but
2. it also depends on the optimization, and that may completely reorganize the flow of the program.

My very personal opinion is the following: optimization will do a lot of things and probably knows some tricks better (especially on low level - assembly), but we can have a much better view on the code and functionality and therefore sometimes we can make much better high level optimizations.
 
1 members found this post helpful.
Old 08-25-2014, 02:12 AM   #3
psionl0
Member
 
Registered: Jan 2011
Distribution: slackware_64 14.1
Posts: 722
Blog Entries: 2

Rep: Reputation: 124Reputation: 124
Is this what you mean?
Quote:
Originally Posted by Jerry Mcguire View Post
Fall-Through
Code:
switch (loc) {
case 0: ...; /* fall-through */
case 1: ...; /* fall-through */
case ...
case-break-case-break
Code:
switch (loc) {
case 0: ...; < case 1 etc. actions .... >; break;
case 1: ...; < case 2 etc. actions .... >; break;
case ...
If this is the case then I don't see case-break-case-break doing anything other than adding more code.

Anything else and the actions of the two pieces of code won't be the same.

Last edited by psionl0; 08-25-2014 at 02:15 AM.
 
Old 08-25-2014, 02:42 AM   #4
NevemTeve
Senior Member
 
Registered: Oct 2011
Location: Budapest
Distribution: Debian/GNU/Linux, AIX
Posts: 4,851
Blog Entries: 1

Rep: Reputation: 1868Reputation: 1868Reputation: 1868Reputation: 1868Reputation: 1868Reputation: 1868Reputation: 1868Reputation: 1868Reputation: 1868Reputation: 1868Reputation: 1868
Duplicating code is bad. Very bad. Don't do it.
Even if you gain 0.01 percent in speed, it doesn't worth it.
 
Old 08-25-2014, 03:17 AM   #5
Jerry Mcguire
Member
 
Registered: Jul 2009
Location: Hong Kong SAR
Distribution: RedHat, Fedora
Posts: 201

Original Poster
Rep: Reputation: 31
Thanks for replying

The gain is about 3%. The repeated code is written in the preprocessor to keep it manageable.
I gained access to a large amount of processing power recently so I re-did the Eternity II search for fun, achieved over 1 peta-legitimate moves per day.

Any insight on Part 2 the question?

Last edited by Jerry Mcguire; 08-25-2014 at 03:20 AM. Reason: typo x2
 
Old 08-25-2014, 03:20 AM   #6
dugan
LQ Guru
 
Registered: Nov 2003
Location: Canada
Distribution: distro hopper
Posts: 11,198

Rep: Reputation: 5307Reputation: 5307Reputation: 5307Reputation: 5307Reputation: 5307Reputation: 5307Reputation: 5307Reputation: 5307Reputation: 5307Reputation: 5307Reputation: 5307
Have you tried examining the assembly code that's produced in each case?

My guess is that the first one checks the condition, then jumps to the code in the default block, while the second one checks the condition, then continues to the very next instruction that has been inlined after it. The second one would not only save a jump, but also cache much better.

Do check the assembler to confirm or refute that.

Last edited by dugan; 08-25-2014 at 03:26 AM.
 
Old 08-25-2014, 03:28 AM   #7
Guttorm
Senior Member
 
Registered: Dec 2003
Location: Trondheim, Norway
Distribution: Debian and Ubuntu
Posts: 1,450

Rep: Reputation: 446Reputation: 446Reputation: 446Reputation: 446Reputation: 446
Good read on the subject:

http://741mhz.com/switch/
 
2 members found this post helpful.
Old 08-25-2014, 03:34 AM   #8
a4z
Senior Member
 
Registered: Feb 2009
Posts: 1,727

Rep: Reputation: 742Reputation: 742Reputation: 742Reputation: 742Reputation: 742Reputation: 742Reputation: 742
a jump table , something like array[255] , is very fast
and something like this might be generated for the non fall through part
it helps the compiler if you cases are ordered,
the same is valid for if, else if, (I read this some time ago)

I once saw program, searching char,
also with a lot of fall through
the fastest was a char[255] and have the search values on the location
the case was then something like *(&array + val)
was faster than if and case,
but if there is code in the case it might be different, would be interesting to test, if you do it, please tell the results
 
Old 08-25-2014, 03:38 AM   #9
Jerry Mcguire
Member
 
Registered: Jul 2009
Location: Hong Kong SAR
Distribution: RedHat, Fedora
Posts: 201

Original Poster
Rep: Reputation: 31
I don't know a bit about assembly but instinct says the same as your view. I'm guessing each fall-through involves a jump of execution, so on average the 16 cases (15 fall-throughs) adds 7.5 jumps per iteration. The source code is just an example, not the one that saves 3%.

But again, I need opinion on the position of the case-group. Instinct also says the position doesn't matter, since it's 1 jump any way. But instinct doesn't tell about the efficiency of switch(loc) on 256 cases. I have no idea how this is implemented in the assembly code. If it is a if-else-if like expansion, I would say moving the case 127 to the front would likely help... I don't know.
 
Old 08-25-2014, 05:24 AM   #10
a4z
Senior Member
 
Registered: Feb 2009
Posts: 1,727

Rep: Reputation: 742Reputation: 742Reputation: 742Reputation: 742Reputation: 742Reputation: 742Reputation: 742
you should read the link Guttorm has proided, it contains all info you search and eve more
 
Old 08-25-2014, 05:32 AM   #11
psionl0
Member
 
Registered: Jan 2011
Distribution: slackware_64 14.1
Posts: 722
Blog Entries: 2

Rep: Reputation: 124Reputation: 124
Quote:
Originally Posted by Jerry Mcguire View Post
I'm guessing each fall-through involves a jump of execution, so on average the 16 cases (15 fall-throughs) adds 7.5 jumps per iteration.
I would have thought the opposite - mainly that a fall-through means "don't jump" - conditionally or not.
 
Old 08-25-2014, 12:14 PM   #12
dugan
LQ Guru
 
Registered: Nov 2003
Location: Canada
Distribution: distro hopper
Posts: 11,198

Rep: Reputation: 5307Reputation: 5307Reputation: 5307Reputation: 5307Reputation: 5307Reputation: 5307Reputation: 5307Reputation: 5307Reputation: 5307Reputation: 5307Reputation: 5307
What optimization options are you using? I'm surprised that gcc isn't inlining the first case, and if is then the two should actually be identical.

Anyway:

As mentioned, the cost of a jump is not the jump instruction itself (which is negligible), it's the potential for cache misses (which isn't). CPU's fetch data and instructions into the cache according to the principle of locality; when an instruction is executed, the next few instructions are fetched into the cache. If the instructions are not in the cache, then they must be fetched from main memory, which, according to the following graphic, is 100 times slower:

http://www.eecs.berkeley.edu/~rcs/re...e_latency.html

Let's say that fetching an instruction from the cache takes 1ns, and fetching a block of instructions from main memory takes 100ns.

The instructions are laid out consecutively when they're in main memory.

1. instruction 1
2. instruction 2
3. instruction 3
...
n. instruction n

And if the CPU instruction cache has room for eight instructions, then they're fetched in consecutive blocks of 8. With no jumps, it would be: instructions one through eight, then instructions 9 through 16, etc.

To start, the CPU fetches the first eight instructions from main memory into the cache. It fetches instruction 1 from the cache, executes it, fetches instruction 2 from the cache, executes it, and so on. Each instruction takes 1ns to fetch from the cache.

But what if there's a jump? The CPU fetches instruction 3 from the cache, and the instruction is JUMP TO INSTRUCTION 20. Instruction 20 is not in the cache, so the CPU first clears the cache, then fetches instructions #17 through #24 from main memory into the cache. That takes an extra 100ns! It then executes instruction 20. Because of this, a jump-plus-instruction pair is 100 times more expensive than just the instruction itself.

That's very oversimplified, of course (I am not a CPU engineer), but it should make it clear where the delay is probably coming from.

Quote:
Originally Posted by psionl0 View Post
I would have thought the opposite - mainly that a fall-through means "don't jump" - conditionally or not.
Fall-through should mean "unconditional jump", shouldn't it?

Last edited by dugan; 08-25-2014 at 12:28 PM.
 
Old 08-25-2014, 12:37 PM   #13
metaschima
Senior Member
 
Registered: Dec 2013
Distribution: Slackware
Posts: 1,982

Rep: Reputation: 492Reputation: 492Reputation: 492Reputation: 492Reputation: 492
If you are looking into optimization, the easiest way to optimize is trial and error. You can also mess with gcc compile options.

Switches tend to be easy for the compiler to optimize, and gcc will optimize them as best it can. If you want to see what it is doing you can use the '-save-temps' option and look at the assembly. You don't really have to know assembly to guess at what is going on. Lots of testing will still be required for maximum performance.

Keep it simple and logical. Compilers are written to be able to understand and optimize logical constructs as assembly instructions. The easier your code is to understand by the compiler, the faster it will be.
 
Old 08-25-2014, 12:44 PM   #14
dugan
LQ Guru
 
Registered: Nov 2003
Location: Canada
Distribution: distro hopper
Posts: 11,198

Rep: Reputation: 5307Reputation: 5307Reputation: 5307Reputation: 5307Reputation: 5307Reputation: 5307Reputation: 5307Reputation: 5307Reputation: 5307Reputation: 5307Reputation: 5307
This post was originally some examples of the assembler that would be produced in each case, but I've given up trying to come up with meaningful examples and compare them. I suggest doing the investigations yourself:

Using GCC to produce readable assembly?

You cannot determine why one is faster without actually looking at and understanding the generated assembly.

I recommend Programming from the Ground Up if you need an assembly primer.

Last edited by dugan; 08-25-2014 at 02:19 PM.
 
1 members found this post helpful.
Old 08-25-2014, 01:33 PM   #15
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 10,610
Blog Entries: 4

Rep: Reputation: 3905Reputation: 3905Reputation: 3905Reputation: 3905Reputation: 3905Reputation: 3905Reputation: 3905Reputation: 3905Reputation: 3905Reputation: 3905Reputation: 3905
Also:
Quote:
Outsmarting a production quality compiler these days is a nearly impossible task. A programmer should not even try to optimize the code that is not proven to be a bottleneck by carefully profiling the whole program. If a piece of code is proven to be slow and there is an obvious optimization that compiler has failed to perform, shop for a better compiler. Start by optimizing the logic and not the code — doing less steps where possible, avoiding chaotic dynamic memory manipulations, using well designed data structures — will pay off more than any micro-optimization.
Or, more simply:
Quote:
Originally Posted by The Elements of Programming Style (Kernighan & Plauger):
Don't "diddle" code to make it faster ... find a better algorithm.
 
  


Reply


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off



Similar Threads
Thread Thread Starter Forum Replies Last Post
LXer: SAN switch performance monitoring using IBM Network Advisor LXer Syndicated Linux News 0 05-29-2014 08:11 AM
[SOLVED] bash, switch case, none break between cases. ted_chou12 Linux - Newbie 2 01-13-2013 01:25 PM
Bad WAN port performance on WRT54GL when used as a switch. (DD-WRT) Synt4x_3rr0r Linux - Networking 0 08-23-2010 11:44 AM
Switch Hell...NFS and degrading performance with gigabit switched edman007 Linux - Networking 4 03-12-2006 06:49 PM
PC cases berrance General 6 01-14-2005 05:53 PM

LinuxQuestions.org > Forums > Non-*NIX Forums > Programming

All times are GMT -5. The time now is 09:36 AM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration