[Simh] more info on gcc vs the VAX780...
Jason Stevens
neozeed at gmail.com
Sun Nov 7 17:16:30 EST 2010
(Sorry this is long. The short version: for i386, use -O2 where possible; for
x86_64, -O1 beats -O2! And the x86_64 build is a good 30% faster.)
For all my testing, I'm using my desktop, an Intel Q9300 @ 2.5 GHz (quad-core
CPU, 3 MB cache) running OS X 10.6.4, with 8 GB of RAM.
I have my machine set up to use the full 64-bit kernel:
$ uname -a
Darwin Jason-Stevenss-Mac-Pro.local 10.4.0 Darwin Kernel Version 10.4.0: Fri
Apr 23 18:27:12 PDT 2010; root:xnu-1504.7.4~1/RELEASE_X86_64 x86_64
$ hostinfo
Mach kernel version:
Darwin Kernel Version 10.4.0: Fri Apr 23 18:27:12 PDT 2010;
root:xnu-1504.7.4~1/RELEASE_X86_64
Kernel configured for up to 4 processors.
4 processors are physically available.
4 processors are logically available.
Processor type: i486 (Intel 80486)
Processors active: 0 1 2 3
Primary memory available: 8.00 gigabytes
Default processor set: 87 tasks, 393 threads, 4 processors
Load average: 0.03, Mach factor: 3.96
$ gcc -v
Using built-in specs.
Target: i686-apple-darwin10
Configured with: /var/tmp/gcc/gcc-5664~38/src/configure --disable-checking
--enable-werror --prefix=/usr --mandir=/share/man
--enable-languages=c,objc,c++,obj-c++
--program-transform-name=/^[cg][^.-]*$/s/$/-4.2/ --with-slibdir=/usr/lib
--build=i686-apple-darwin10 --program-prefix=i686-apple-darwin10-
--host=x86_64-apple-darwin10 --target=i686-apple-darwin10
--with-gxx-include-dir=/include/c++/4.2.1
Thread model: posix
gcc version 4.2.1 (Apple Inc. build 5664)
OK, very exciting, I know. I'm testing with 4.3 BSD UWisc
(http://sourceforge.net/projects/bsd42/files/4BSD%20under%20Windows/v0.4/4.3BSD-Uwisc-install-0.4.exe/download)
from TUHS, along with gcc 2.7.2.2 inside the VM, and the Dhrystone program
from http://www.superglobalmegacorp.com/index.php/Dhrystone.c
Every time I build vax780, I use the first set of flags for the whole
program and the second set for the op_ldpctx and op_mtpr procedures, which
I've isolated so they can be compiled with different flags. I'm also listing
the executable size for comparison.
When building for i386, I get the following results (three runs per flag
combination):
-O2/-O1, executable size 533,924 bytes
Dhrystone(1.1) time for 500000 passes = 17
This machine benchmarks at 29411 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 17
This machine benchmarks at 29411 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 17
This machine benchmarks at 29411 dhrystones/second
-O1/-O1, executable size 513,448 bytes
Dhrystone(1.1) time for 500000 passes = 18
This machine benchmarks at 27777 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 18
This machine benchmarks at 27777 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 18
This machine benchmarks at 27777 dhrystones/second
-Os/-O1, executable size 513,396 bytes
Dhrystone(1.1) time for 500000 passes = 17
This machine benchmarks at 29411 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 18
This machine benchmarks at 27777 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 17
This machine benchmarks at 29411 dhrystones/second
As we can see, and as I'd have expected, the -O2/-O1 combination was the
fastest and gave the same time on every run.
Now on to the 64-bit build...
-O2/-O1, executable size 576,112 bytes
Dhrystone(1.1) time for 500000 passes = 14
This machine benchmarks at 35714 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 13
This machine benchmarks at 38461 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 12
This machine benchmarks at 41666 dhrystones/second
-O1/-O1, executable size 559,736 bytes
Dhrystone(1.1) time for 500000 passes = 12
This machine benchmarks at 41666 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 13
This machine benchmarks at 38461 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 13
This machine benchmarks at 38461 dhrystones/second
-O0/-O0, executable size 675,832 bytes
Dhrystone(1.1) time for 500000 passes = 19
This machine benchmarks at 26315 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 19
This machine benchmarks at 26315 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 19
This machine benchmarks at 26315 dhrystones/second
-O0/-O1, executable size 675,816 bytes
Dhrystone(1.1) time for 500000 passes = 19
This machine benchmarks at 26315 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 19
This machine benchmarks at 26315 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 17
This machine benchmarks at 29411 dhrystones/second
-Os/-O1, executable size 555,576 bytes
Dhrystone(1.1) time for 500000 passes = 13
This machine benchmarks at 38461 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 12
This machine benchmarks at 41666 dhrystones/second
Dhrystone(1.1) time for 500000 passes = 14
This machine benchmarks at 35714 dhrystones/second
What is interesting to me is that -O2 wasn't quite as fast as -O1. I'll have
to try this on some other x86_64 platforms to see if it's consistent, but I
thought I'd pass along just how much faster SIMH is on 64-bit machines with a
64-bit compiler, and that -O2 isn't necessarily the best fit...