[Simh] more info on gcc vs the VAX780...

Jason Stevens neozeed at gmail.com
Sun Nov 7 17:16:30 EST 2010


(Sorry this is long. The short version: for i386 use -O2 where possible; for
x86_64, -O1 beats -O2; and the x86_64 build is a good 30% faster.)

For all my testing I'm using my desktop, an Intel Q9300 @ 2.5 GHz (quad-core
CPU, 3MB cache) with 8 GB of RAM, running OS X 10.6.4.

I have my machine set up to use the full 64-bit kernel:

$ uname -a
Darwin Jason-Stevenss-Mac-Pro.local 10.4.0 Darwin Kernel Version 10.4.0: Fri
Apr 23 18:27:12 PDT 2010; root:xnu-1504.7.4~1/RELEASE_X86_64 x86_64

$ hostinfo
Mach kernel version:
 Darwin Kernel Version 10.4.0: Fri Apr 23 18:27:12 PDT 2010;
root:xnu-1504.7.4~1/RELEASE_X86_64
Kernel configured for up to 4 processors.
4 processors are physically available.
4 processors are logically available.
Processor type: i486 (Intel 80486)
Processors active: 0 1 2 3
Primary memory available: 8.00 gigabytes
Default processor set: 87 tasks, 393 threads, 4 processors
Load average: 0.03, Mach factor: 3.96


$ gcc -v
Using built-in specs.
Target: i686-apple-darwin10
Configured with: /var/tmp/gcc/gcc-5664~38/src/configure --disable-checking
--enable-werror --prefix=/usr --mandir=/share/man
--enable-languages=c,objc,c++,obj-c++
--program-transform-name=/^[cg][^.-]*$/s/$/-4.2/ --with-slibdir=/usr/lib
--build=i686-apple-darwin10 --program-prefix=i686-apple-darwin10-
--host=x86_64-apple-darwin10 --target=i686-apple-darwin10
--with-gxx-include-dir=/include/c++/4.2.1
Thread model: posix
gcc version 4.2.1 (Apple Inc. build 5664)


OK, very exciting, I know.  I'm testing with 4.3BSD UWisc [
http://sourceforge.net/projects/bsd42/files/4BSD%20under%20Windows/v0.4/4.3BSD-Uwisc-install-0.4.exe/download]
from the TUHS, along with gcc 2.7.2.2 in the VM, and the Dhrystone program
from http://www.superglobalmegacorp.com/index.php/Dhrystone.c


Every time I build vax780, the first flag listed is used for the whole
program, and the second for the isolated op_ldpctx and op_mtpr procedures.
I'm also listing the executable size for comparison.


When building for the i386 I get the following results:


-O2/-O1 533,924

Dhrystone(1.1) time for 500000 passes = 17

This machine benchmarks at 29411 dhrystones/second

Dhrystone(1.1) time for 500000 passes = 17

This machine benchmarks at 29411 dhrystones/second

Dhrystone(1.1) time for 500000 passes = 17

This machine benchmarks at 29411 dhrystones/second


-O1/-O1 513,448

Dhrystone(1.1) time for 500000 passes = 18

This machine benchmarks at 27777 dhrystones/second

Dhrystone(1.1) time for 500000 passes = 18

This machine benchmarks at 27777 dhrystones/second

Dhrystone(1.1) time for 500000 passes = 18

This machine benchmarks at 27777 dhrystones/second


-Os/-O1 513,396

Dhrystone(1.1) time for 500000 passes = 17

This machine benchmarks at 29411 dhrystones/second

Dhrystone(1.1) time for 500000 passes = 18

This machine benchmarks at 27777 dhrystones/second

Dhrystone(1.1) time for 500000 passes = 17

This machine benchmarks at 29411 dhrystones/second


As we can see, and as I'd have expected, the -O2/-O1 combination gave the
most consistent speed.
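
As a sanity check on the figures above: the printed rates are consistent with
the pass count divided by the elapsed whole seconds, truncated to an integer.
A minimal Python sketch that reproduces the i386 numbers (the function name
and the dictionary are mine for illustration, not taken from dhrystone.c):

```python
PASSES = 500000  # pass count used in every run above


def dhrystones_per_second(seconds, passes=PASSES):
    """Truncating division; this matches the rates printed in the runs."""
    return passes // seconds


# Elapsed seconds for the three i386 runs of each build, copied from above.
i386_runs = {
    "-O2/-O1": [17, 17, 17],
    "-O1/-O1": [18, 18, 18],
    "-Os/-O1": [17, 18, 17],
}

for flags, times in i386_runs.items():
    rates = [dhrystones_per_second(t) for t in times]
    print(flags, rates)
```

A 17 second run gives 29411 and an 18 second run gives 27777, so a one-second
difference in elapsed time is the whole gap between the builds here.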

Now on to the 64-bit stuff...


-O2/-O1 576,112

Dhrystone(1.1) time for 500000 passes = 14

This machine benchmarks at 35714 dhrystones/second

Dhrystone(1.1) time for 500000 passes = 13

This machine benchmarks at 38461 dhrystones/second

Dhrystone(1.1) time for 500000 passes = 12

This machine benchmarks at 41666 dhrystones/second



-O1/-O1 559,736

Dhrystone(1.1) time for 500000 passes = 12

This machine benchmarks at 41666 dhrystones/second

Dhrystone(1.1) time for 500000 passes = 13

This machine benchmarks at 38461 dhrystones/second

Dhrystone(1.1) time for 500000 passes = 13

This machine benchmarks at 38461 dhrystones/second


-O0/-O0 675,832

Dhrystone(1.1) time for 500000 passes = 19

This machine benchmarks at 26315 dhrystones/second

Dhrystone(1.1) time for 500000 passes = 19

This machine benchmarks at 26315 dhrystones/second

Dhrystone(1.1) time for 500000 passes = 19

This machine benchmarks at 26315 dhrystones/second


-O0/-O1 675,816

Dhrystone(1.1) time for 500000 passes = 19

This machine benchmarks at 26315 dhrystones/second

Dhrystone(1.1) time for 500000 passes = 19

This machine benchmarks at 26315 dhrystones/second

Dhrystone(1.1) time for 500000 passes = 17

This machine benchmarks at 29411 dhrystones/second


-Os/-O1 555,576

Dhrystone(1.1) time for 500000 passes = 13

This machine benchmarks at 38461 dhrystones/second

Dhrystone(1.1) time for 500000 passes = 12

This machine benchmarks at 41666 dhrystones/second

Dhrystone(1.1) time for 500000 passes = 14

This machine benchmarks at 35714 dhrystones/second


What is interesting to me is that -O2 wasn't as fast as -O1. I'll have to
try this on some other x86_64 platforms to see if it's consistent, but I
thought I'd pass along just how much faster SIMH is on 64-bit machines with
a 64-bit compiler, and that -O2 isn't necessarily the best fit.
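
To put a number on the 32-bit versus 64-bit difference, here is a rough
Python sketch that compares mean elapsed times from the runs in this post.
The helper name and the choice of i386 -O2/-O1 as the baseline are my own
assumptions for illustration:

```python
from statistics import mean

# Elapsed seconds for 500000 passes, copied from the runs in this post.
runs = {
    ("i386", "-O2/-O1"): [17, 17, 17],
    ("x86_64", "-O2/-O1"): [14, 13, 12],
    ("x86_64", "-O1/-O1"): [12, 13, 13],
}


def speedup_percent(baseline_times, times):
    """Throughput gain of `times` over `baseline_times`, in percent."""
    return (mean(baseline_times) / mean(times) - 1.0) * 100.0


baseline = runs[("i386", "-O2/-O1")]
for key in [("x86_64", "-O2/-O1"), ("x86_64", "-O1/-O1")]:
    print(key, round(speedup_percent(baseline, runs[key]), 1))
```

With these times the gain works out to roughly 31% for x86_64 -O2/-O1 and 34%
for x86_64 -O1/-O1 over the i386 -O2/-O1 build. Since the timer only reports
whole seconds and there are three runs per build, the gap between -O1 and -O2
is small compared with the 32-bit versus 64-bit gap.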