<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

  <meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">

  <title></title>

</head>

<body bgcolor="#ffffff" text="#000000">

Jan-Benedict Glaw wrote:

<blockquote cite="mid20080123215814.GM4912@lug-owl.de" type="cite">

  <pre wrap="">On Tue, 2008-01-22 20:21:00 -0500, Davis Johnson <a class="moz-txt-link-rfc2396E" href="mailto:davis@frizzen.com">&lt;davis@frizzen.com&gt;</a> wrote:
  </pre>

  <blockquote type="cite">

    <pre wrap="">improvement from inlining alone. Function call overhead is relatively 
small, after all. I did expect it to be at least measurable. I expected 
    </pre>

  </blockquote>

  <pre wrap="">
Calling a function is quite something! I wouldn't call it `small'
after all!  RAM access is one of the most used operations, and as a
hot path, every single CPU instruction used to access it really counts
a lot...

MfG, JBG
  </pre>

</blockquote>

Function call overhead is relatively small. <br>

<br>

I ran my numbers on an UltraSPARC system, where, thanks to register
windows, there is generally no need to save any registers on entry or
restore them on exit. The call itself is little more expensive than a
regular branch, and the return is just another branch. Parameters are
generally passed in registers, so the only setup before the call is
making sure the right values are in the right registers. A similar
situation exists on other RISC architectures such as PPC and Alpha, and
to a lesser extent on MIPS.<br>

<br>

While every cycle does count, the handful of cycles spent on the call
itself is small compared to the cost of the body of all but the smallest
functions. A simulated memory access in SIMH typically has to deal with
one or more of memory relocation, protection (read, write, and execute
protection), access levels, possible page faults, dirty bits, byte-order
issues, and possibly other minutiae.<br>

<br>
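To make the checks listed above concrete, here is a rough C sketch of a simulated word read and write. It is not SIMH code: the toy MMU, its page size, pte_t, translate(), read_word() and write_word() are all hypothetical, invented for illustration, and execute protection is left out. Even this stripped-down version performs several loads, compares and branches per access, which is probably already more work than a call and return on a register-window machine like the SPARC.<br>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy MMU, not SIMH code: 16 pages of 512 bytes,
   big-endian 16-bit words. */
#define PAGE_BITS 9u
#define PAGE_SIZE (1u << PAGE_BITS)
#define NUM_PAGES 16u
#define MEM_BYTES (PAGE_SIZE * NUM_PAGES)

typedef struct {
    uint32_t frame;   /* physical frame number (relocation) */
    bool valid;       /* page present; otherwise a page fault */
    bool writable;    /* write protection */
    bool user_ok;     /* access level: may user mode touch it? */
    bool dirty;       /* set on the first successful write */
} pte_t;

typedef enum { ACC_OK, ACC_ALIGN_FAULT, ACC_PAGE_FAULT, ACC_PROT_FAULT } acc_t;

static uint8_t mem[MEM_BYTES];   /* simulated physical memory */
static pte_t ptab[NUM_PAGES];    /* simulated page table */

/* Map virtual address va to a physical address, applying each check. */
static acc_t translate(uint32_t va, bool user, bool write, uint32_t *pa)
{
    uint32_t page = va >> PAGE_BITS;

    if (va & 1u)                              /* words must be even-aligned */
        return ACC_ALIGN_FAULT;
    if (page >= NUM_PAGES || !ptab[page].valid)
        return ACC_PAGE_FAULT;                /* unmapped page */
    if (user && !ptab[page].user_ok)          /* access level */
        return ACC_PROT_FAULT;
    if (write && !ptab[page].writable)        /* write protect */
        return ACC_PROT_FAULT;
    assert(ptab[page].frame < NUM_PAGES);     /* page table maps real frames */
    if (write)
        ptab[page].dirty = true;              /* dirty bit */
    *pa = (ptab[page].frame << PAGE_BITS) | (va & (PAGE_SIZE - 1u));
    return ACC_OK;
}

static acc_t read_word(uint32_t va, bool user, uint16_t *out)
{
    uint32_t pa;
    acc_t st = translate(va, user, false, &pa);
    if (st != ACC_OK)
        return st;
    /* Byte order: assemble the big-endian word explicitly, so the
       host's own endianness does not matter. */
    *out = (uint16_t)((mem[pa] << 8) | mem[pa + 1u]);
    return ACC_OK;
}

static acc_t write_word(uint32_t va, bool user, uint16_t val)
{
    uint32_t pa;
    acc_t st = translate(va, user, true, &pa);
    if (st != ACC_OK)
        return st;
    mem[pa] = (uint8_t)(val >> 8);
    mem[pa + 1u] = (uint8_t)(val & 0xFFu);
    return ACC_OK;
}
```

A caller in the instruction loop would check the returned status and raise the matching simulated trap.<br>
<br>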

I'd say that function call overhead is <u>relatively</u> small, probably
even on Intel CPUs.<br>

<br>

Inlining functions is still a powerful idea, just not because it
eliminates function call overhead. It is powerful because it allows the
optimizer to apply information from the context of the call to the
optimization of the function body. The function body gets optimized in
the same register allocation context as the caller. Common
subexpressions that appear in both the caller and the callee can be
found. Conditional tests and branches can frequently be eliminated; for
instance, when the caller passes a constant argument, a test on that
argument inside the function body can be resolved at compile time. And
so on and so on. Almost every optimization trick works better.<br>

<br>
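As a small illustration of the branch-elimination point, here is a C sketch. Everything in it (access_byte(), copy_byte(), the 256-byte sim_mem array and the per-page dirty flags) is hypothetical and not taken from SIMH. Called out of line, access_byte() has to test its kind argument on every access; inlined into copy_byte(), where kind is a constant at each call site, an optimizing compiler can normally discard the untaken branch. Whether a particular compiler actually does so depends on the compiler and its flags, so it is worth checking the generated assembly (for example with gcc -O2 -S) rather than assuming it.<br>

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical access kinds for a toy simulator (not SIMH's API). */
enum { ACC_READ = 0, ACC_WRITE = 1 };

static uint8_t sim_mem[256];     /* simulated memory, indexed by a byte */
static uint8_t dirty[256 / 16];  /* one dirty flag per 16-byte page */

/* One generic accessor for reads and writes. Called out of line, it must
   test 'kind' at run time on every access. */
static inline uint8_t access_byte(uint8_t addr, int kind, uint8_t val)
{
    assert(kind == ACC_READ || kind == ACC_WRITE);
    if (kind == ACC_WRITE) {
        sim_mem[addr] = val;
        dirty[addr >> 4] = 1;
        return val;
    }
    return sim_mem[addr];
}

/* Once access_byte is inlined here, 'kind' is a known constant at each
   call site, so an optimizing compiler can drop the untaken branch (and
   the assert) and keep 'v' in a register between the two accesses. */
static uint8_t copy_byte(uint8_t src, uint8_t dst)
{
    uint8_t v = access_byte(src, ACC_READ, 0);
    return access_byte(dst, ACC_WRITE, v);
}
```

The program behaves identically whether or not the call is inlined; only the generated code differs.<br>
<br>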

</body>

</html>