[Simh] Simh execution speed improvement - about 45% improvement.

Tue Jan 22 12:07:40 EST 2008

Hi, Davis

Your results surprise me: they suggest that inlining the Read and Write
functions does not make access to the RAM any faster.
In your data, I see that the times for the not-inlined and the inlined
versions are the same (within one second), and that puzzles me.
Mark Pizzolato ran tests on various Intel machines with the VAX simulator
and got a 35 to 69% improvement in speed just by inlining various
function calls, while keeping gcc at the default optimisation level, -O2.
Mark also tested this with the Microsoft Visual C compiler and saw similar levels
of improvement.
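
To make the comparison concrete, here is a minimal sketch of the kind of
change I mean. All names here (M, MEM_WORDS, read_w_call, read_w_inline,
write_w_inline, sum_words) are made up for illustration and are not taken
from id32_cpu.c or from the VAX sources. The first reader is an ordinary
function; the second is the same body marked static inline so the compiler
can expand it at each call site.

```c
#include <stdint.h>

#define MEM_WORDS 1024u              /* hypothetical memory size, in 32-bit words */

static uint32_t M[MEM_WORDS];        /* hypothetical simulated main memory */

/* Ordinary function: each access may cost a real call and return. */
uint32_t read_w_call(uint32_t addr)
{
    return M[(addr >> 2) % MEM_WORDS];   /* byte address -> word index, wrapped */
}

/* Same body, declared static inline so it can be expanded in place. */
static inline uint32_t read_w_inline(uint32_t addr)
{
    return M[(addr >> 2) % MEM_WORDS];
}

/* Inline writer, same addressing rule as the readers. */
static inline void write_w_inline(uint32_t addr, uint32_t val)
{
    M[(addr >> 2) % MEM_WORDS] = val;
}

/* A hot loop of the kind where the per-access call overhead would add up. */
uint32_t sum_words(uint32_t first, uint32_t count)
{
    uint32_t total = 0;
    for (uint32_t i = 0; i < count; i++)
        total += read_w_inline(first + 4u * i);
    return total;
}
```

Note that an optimising compiler is free to inline a small function such as
read_w_call on its own when the definition is visible in the same source
file, which may be one reason an explicit inline shows little difference in
some builds.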

I can only guess at a few factors that might explain this:
1) Your benchmark is I/O bound, and the time lost in accessing the RAM is
very small compared to the time spent waiting for I/O.
2) The inline mods were not done correctly, which would surprise me coming from you.
If you are capable of developing an id32 CPU emulator, I would guess you know
how to inline a function call.

To validate the not-inlined / inlined results, I would suggest that you
run a program that only loops and increments a variable a large number
of times.  As this loop would be purely CPU bound, it would rule out the I/O-bound hypothesis 1).
As I don't work with the Interdata simulator, I am unable to create such a
program and have it compiled and run on the id32.
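
Only to show the shape of the test I have in mind, here is a rough sketch
in host C. It is not Interdata code; the real test would have to be written
for the id32 and run under the simulator. The names spin and time_spin are
invented for this example.

```c
#include <time.h>

/* Increment a counter the requested number of times.
 * volatile keeps the compiler from collapsing the loop into one store. */
unsigned long spin(unsigned long iterations)
{
    volatile unsigned long counter = 0;
    for (unsigned long i = 0; i < iterations; i++)
        counter = counter + 1;
    return counter;
}

/* Return the processor time, in seconds, spent in spin(). */
double time_spin(unsigned long iterations)
{
    clock_t start = clock();
    (void)spin(iterations);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

A loop like this does no I/O at all, so any difference between the
not-inlined and inlined builds would come from instruction execution and
memory access alone.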

>Impressive indeed... Impressive enough to get me to do a little 
>semi-scientific experiment. Kudos to François for doing the runtime 
>profiling. The results aren't surprising, but it is important to have 
>solid data rather than guesses.

>I normally play with the Interdata 32 bit emulator instead of the VAX. I 
>also normally compile with no optimization specified because it makes 
>debuggers easier to use. This was a big enough performance bump to get 
>my attention, however.

>I compiled id32 four different ways, to test two different variables. To 
>check the effect of optimization I compiled with default (no?) 
>optimization and with -O9. To check the effect of inlining I used my 
>current development version of id32_cpu.c and a version edited to inline 
>the memory access and relocation/protection functions. I used the GCC 
>compiler Sun shipped with Solaris 10 11/06. I did not have a formal 
>benchmark handy so I used a reassembly of the mag tape driver. Times 
>reported are minutes:seconds based on time of day as reported by OS/32, 
>which does well on overnight clock drift testing.

>             |    Optimization       |
>             | default  | -O9        |
>not-inlined  |  1:13    |  0:28      |
>inlined      |  1:12    |  0:28      |

>I'd be inclined not to tinker with the source based on these results, 
>but rather trust the compiler to do the right thing.

>I would be interested to see the same four numbers for the VAX, results 
>may be different.

Best regards,

Francois Boucher ing.
