[Simh] Simh execution speed improvement - about 45% improvement.

Tue Jan 22 20:21:00 EST 2008

Boucher wrote:

>Hi, Davis
>
>Your results are surprising to me, as if access to the RAM is not
>getting faster through the inlining of the Read and Write functions.
>In your data, I see that the time for the non-inlined and the inlined
>version is the same (to within one second), and that puzzles me.
>
>Mark Pizzolato did some tests on various Intel machines for the VAX simulator
>and he got some 35 to 69% improvement in speed just by inlining various
>function calls, while keeping the default gcc optimisation level at -O2.
>Mark also tested this with the Microsoft Visual C compiler, with similar levels
>of improvement.
>
>
My results were surprising to me, also. I did not expect much 
improvement from inlining alone. Function call overhead is relatively 
small, after all. I did expect it to be at least measurable. I expected 
inline to make more difference with the optimizer. The more you give an 
optimizer to chew on the better it does. Having more context for the 
operations within the inlined function gives the optimizer more insight 
into what can be improved.
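
To illustrate the point about context, here is a minimal C sketch. The names 
(mem, MEM_WORDS, read_word, write_word, sum_words) are hypothetical and are 
not taken from the simulator source. Because the accessor body is visible 
at the call site, an optimizer is free to fold the address mask and the 
array indexing into the loop rather than making a call for every word.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 4096u                 /* hypothetical size, a power of two */

static uint32_t mem[MEM_WORDS];         /* hypothetical simulated RAM */

/* Hypothetical accessor: wraps the address, then reads one word. */
static inline uint32_t read_word(uint32_t addr)
{
    return mem[addr & (MEM_WORDS - 1u)];
}

/* Hypothetical accessor: wraps the address, then stores one word. */
static inline void write_word(uint32_t addr, uint32_t val)
{
    mem[addr & (MEM_WORDS - 1u)] = val;
}

/* With read_word visible here, the compiler may merge the mask and the
   indexing into the loop body instead of calling out for each word. */
uint32_t sum_words(uint32_t base, uint32_t count)
{
    uint32_t total = 0;

    assert(count <= MEM_WORDS);
    for (uint32_t i = 0; i < count; i++)
        total += read_word(base + i);
    return total;
}
```

Whether a given compiler actually does this depends on the optimization 
level; comparing the generated assembly for the two builds is the only 
sure way to tell.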

>I can only guess at a few factors to explain this:
>
>1) Your benchmark is I/O bound, and the time lost in accessing the RAM is
>very small compared to the time spent waiting for I/O.
>
>
Possible. The file I/O in OS/32 is a DMA transfer from disk to "system 
space", and then copied into user space using the CPU. So this test is 
more CPU intensive than it sounds, but not as CPU intensive as it should 
be for this test.

>2) The inline mods are not correctly done, which would surprise me from you.
>If you are capable of developing an id32 CPU emulator, I would guess you know
>how to inline a function call.
>
>
Inlining function calls isn't something I do every day. My general 
philosophy is to design it so that it can be fast, build it so it is 
correct, and go back and make it fast if need be. I tend to avoid source 
tweaks like inlining. So I did need a number of compiles to get the 
prototypes matching the functions. The nature of the errors I got, and 
what I did to correct them, leads me to suspect I got it right in the 
end. When I did get a clean compile with no measurable improvement I 
suspected that I might have something wrong. I even moved the function 
source to above all the references in hopes that would help. Another 
possibility is that I missed inlining an important subfunction; I'll 
have to look. If you are interested I could send my source file.
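
For reference, the arrangement described above (prototypes that match the 
definitions, with the definitions placed ahead of every caller) looks 
roughly like the following C sketch. The names ram, MEM_BYTES, read_byte, 
write_byte and copy_byte are hypothetical stand-ins, not the actual 
simulator routines.

```c
#include <stdint.h>

#define MEM_BYTES 65536u                   /* hypothetical memory size */

static uint8_t ram[MEM_BYTES];             /* hypothetical simulated RAM */

/* Prototype and definition carry the same specifiers and types.
   A non-static prototype followed by a static definition, for
   example, is rejected by the compiler. */
static inline uint32_t read_byte(uint32_t addr);
static inline void write_byte(uint32_t addr, uint32_t val);

/* Definitions sit above every caller, so the body is already known
   when the compiler reaches the call sites. */
static inline uint32_t read_byte(uint32_t addr)
{
    return ram[addr % MEM_BYTES];
}

static inline void write_byte(uint32_t addr, uint32_t val)
{
    ram[addr % MEM_BYTES] = (uint8_t)val;
}

/* Hypothetical caller standing in for an instruction handler. */
uint32_t copy_byte(uint32_t src, uint32_t dst)
{
    uint32_t v = read_byte(src);

    write_byte(dst, v);
    return v;
}
```

Note that "inline" is only a request. A clean compile does not show that 
the calls were actually expanded in place.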

>What I would suggest to validate the non-inlined / inlined results would
>be to run a program that only loops and increments a variable a large number
>of times. As this loop would be CPU bound only, it would rule out the
>I/O-bound hypothesis 1).
>
>As I don't work with the Interdata simulator, I am unable to create such a
>program and have it compiled/run on the id32.
>
>
That is a good idea, and easy enough to do.
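
The shape of such a test, sketched in C purely for illustration (the real 
program would be written with whatever tools are available on the 
simulated machine, and the names spin and time_spin are hypothetical):

```c
#include <stdint.h>
#include <time.h>

/* Increment a counter 'iterations' times. 'volatile' keeps an optimizing
   compiler from collapsing the loop into a single assignment. */
uint32_t spin(uint32_t iterations)
{
    volatile uint32_t counter = 0;

    for (uint32_t i = 0; i < iterations; i++)
        counter++;
    return counter;
}

/* CPU seconds consumed by one run of spin(), measured with clock(). */
double time_spin(uint32_t iterations)
{
    clock_t start = clock();

    (void)spin(iterations);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Running the same iteration count under the non-inlined and the inlined 
build, with no file I/O involved, would show whether the memory access 
path matters.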
