I came across a post on reddit earlier today that claimed to show the breakdown of instructions in x86 code generated by gcc. We found it rather curious that a quarter of the instructions were adds. That sounded fishy (how often have you looked at code where one instruction in four is an add?), so we looked at how the disassembly was done. The author read in the binary, ran it through UDis86 and printed the instruction counts. According to its documentation, UDis86 takes a binary stream and decodes it, which means it doesn't know which section a given byte is in: headers, data and padding all get decoded as if they were code. The tip-off was the very high number of invalid instructions. If you decode only the code in the text section, there should be no invalid instructions, unless the disassembler is so bad it isn't worth mentioning.

Another reason add may be so prominent is that the bytes '00 00' decode to add, and zero bytes are common outside code (padding, zero-filled data, header fields). Add also has a lot of encodings, although so do many other x86 instructions. In case you're interested, you can find the original blog here.
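You can check the '00 00' point yourself by handing objdump a few zero bytes as a flat binary, so it has no section information to go on and decodes every byte, much like a raw byte-stream decoder. This is a sketch assuming GNU binutils objdump on an x86 host; zeros.bin is a hypothetical scratch file name. -b binary treats the input as raw bytes, -m i386 selects 32-bit x86, and -z stops objdump from collapsing runs of zeros into "...", which it does by default.

```shell
#!/bin/sh
# Write 8 zero bytes to a scratch file (hypothetical name zeros.bin).
printf '\000\000\000\000\000\000\000\000' > zeros.bin

# Decode the raw bytes as 32-bit x86 with no section information.
# -z: do not skip blocks of zeros; -D: disassemble everything.
objdump -z -D -b binary -m i386 zeros.bin
```

Each pair of zero bytes comes out as add %al,(%eax), so eight zero bytes give four adds.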

But is there actually a problem with this? To find out, we disassembled /bin on a few different machines. We used objdump -d, which only disassembles the sections that contain code, so the counts cover instructions rather than data. First, let's look at Debian x86:


Debian x86

We can see from this that add is not the most used instruction. But how does this compare with something like Gentoo, where everything is compiled from source with optimizations and the appropriate -mcpu options?
Gentoo x86

Interesting: the top few instructions are almost the same. We were surprised by how close the ratios are to each other. Finally, we looked at an amd64 box running Debian.
Debian amd64

Compared with how close the two x86 graphs were, this one is very different. Given the reasons above and the data we collected, we think it's safe to say the original blog's counts were quite far off.
We generated these with nothing more than objdump --prefix-addresses -d /bin/*|grep "<"|cut --delimiter=" " -f 3|sort|uniq -c|sort -n, followed by a bit of postprocessing.
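A minimal sketch of one such postprocessing step, turning the raw counts into percentages, is below. It assumes awk and the "count mnemonic" lines that uniq -c prints; to_percent is a hypothetical helper name, and the counts in the last line are toy numbers, not our data. The commented usage line also shows GNU objdump's -j .text, which limits disassembly to the .text section alone (plain -d covers every section that holds code, such as .init and .plt as well).

```shell
#!/bin/sh
# Convert "count mnemonic" lines (the output of the pipeline above)
# into "percent mnemonic" lines, most frequent first.
to_percent() {
    awk '{ count[$2] += $1; total += $1 }   # tally per mnemonic and overall
         END {
             for (m in count)
                 printf "%6.2f%% %s\n", 100 * count[m] / total, m
         }' |
        sort -rn
}

# Real usage, assuming GNU objdump (-j .text restricts it to .text):
#   objdump --prefix-addresses -d -j .text /bin/* | grep '<' |
#       cut -d ' ' -f 3 | sort | uniq -c | to_percent

# Toy counts, only to show the format (not our measurements).
printf '%s\n' '   50 mov' '   30 push' '   20 add' | to_percent
```

Keeping the step as a shell function means it can be dropped onto the end of the existing pipeline unchanged.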

Here's the data:
Debian x86
Gentoo x86
Debian amd64


Note:
The shoddy graphs were generated with ploticus. If anyone knows a nicer way to generate them, we'd love to hear it: some of the labels overlap, which is rather unpleasant.

Another note:
It was pointed out that it may not be obvious what the striking differences between the x86 and amd64 charts are. We suspect that unless you stared at these charts while making them until you were thoroughly sick of them, the differences may not jump out at you.
So here's a list of the major differences: far fewer movs on amd64 (the extra registers surely helped here), a lot more xors, more pops than pushes on amd64 but more pushes than pops on x86 (the difference in calling convention probably accounts for this), and more adds. There are others, but these seemed the most interesting.
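Spotting these differences by eye is tedious, so one option is to rank them mechanically. Below is a minimal sketch, assuming both files hold "count mnemonic" lines as produced by the objdump pipeline; the function name diff_shares and the file names x86.txt and amd64.txt are hypothetical, and the counts are toy numbers, not our data. It converts each table to percentages and sorts mnemonics by how much their share changed.

```shell
#!/bin/sh
# Rank mnemonics by how much their share of all instructions changes
# between two "count mnemonic" tables (e.g. x86 vs amd64).
# Output: signed change in percentage points, then the mnemonic.
diff_shares() {
    awk 'FNR == NR { a[$2] = $1; ta += $1; seen[$2] = 1; next }  # first file
         { b[$2] = $1; tb += $1; seen[$2] = 1 }                  # second file
         END {
             for (m in seen) {
                 # Missing mnemonics count as 0 in the table that lacks them.
                 d = 100 * b[m] / tb - 100 * a[m] / ta
                 printf "%.2f %+.2f %s\n", (d < 0 ? -d : d), d, m
             }
         }' "$1" "$2" |
        sort -rn -k1,1 |     # largest absolute change first
        cut -d ' ' -f 2-     # drop the helper column
}

# Toy counts, only to show the format (not our measurements).
printf '%s\n' '   60 mov' '   30 push' '   10 xor' > x86.txt
printf '%s\n' '   45 mov' '   25 push' '   30 xor' > amd64.txt
diff_shares x86.txt amd64.txt
```

The FNR == NR idiom used to tell the two files apart assumes the first file is non-empty.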