[Simh] clang (was Re: XCode and LTO)
Michael Bloom
mabloom at dslextreme.com
Sat Apr 28 17:42:20 EDT 2012
I think I would have to agree with Peter. Or at least, half agree.
nanosleep() would give you the exact timing needed. I'm very much
against polling the time as a solution because it could use up a lot of
CPU cycles that other processes would love to have. System calls such
as gettimeofday() typically do nothing besides copy the internally
maintained time back to user space and, as such, do not need to
sleep and so never voluntarily give up the CPU. Some systems will
recognize when processes seem to exhibit "infinite loop" behavior and
"auto-nice" the process, potentially affecting the "brittleness" that
Peter refers to.
In any event, "spinning" when there are ways that are more accurate (not
to mention friendlier to other processes who'd like some CPU time) to
simulate timing, is not the best choice, in my opinion.
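
To make the alternative to spinning concrete, here is a minimal sketch
of a delay built on nanosleep(). It is an illustration only, not SIMH
code: the helper name delay_ns is hypothetical. Note that POSIX only
promises the sleep lasts at least the requested interval, so the actual
delay can run longer depending on timer granularity and scheduling.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Hypothetical helper (not from SIMH): block for at least `ns`
 * nanoseconds without spinning. The process gives up the CPU while it
 * waits. Returns 0 on success, -1 on any error other than EINTR. */
static int delay_ns(long ns)
{
    struct timespec req, rem;

    assert(ns >= 0);
    req.tv_sec  = ns / 1000000000L;
    req.tv_nsec = ns % 1000000000L;

    while (nanosleep(&req, &rem) == -1) {
        if (errno != EINTR)
            return -1;
        req = rem;   /* interrupted by a signal: sleep the remainder */
    }
    return 0;
}
```

A caller would write something like delay_ns(2000000L) for a 2 ms
pause; whether that granularity is fine enough for the ROM delay is a
separate question.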
-Michael
P.S. One of the best known "offensive spinners" is Google Chrome, which
after a short while can bring a system to its knees. Some of its
many processes are constantly spinning. Run strace on one that's racked
up a lot of time, and in ten seconds you'll see: about 167,000
identical getrlimit() calls, always with the same args and
always the same returned values; 55,000 clock_gettime() calls; 22,000
gettimeofday() calls (can't they make up their mind?); 1379 zero-timeout
poll calls which always check descriptors 14 and 15, and 49 of those
calls also check descriptor 5 ... all but two of those poll calls
immediately time out. The two that don't time out are for (you guessed
it) descriptor 5. Right after the poll calls have timed out (because
there is no I/O available on descriptors 14 and 15), the code then
issues 1379 read calls on each of those 2 descriptors. Of course,
because they didn't want those calls to block, fd's 14 and 15 were set
non-blocking. Thus they returned immediately with errno set to EAGAIN.
After the only two polls that did not time out, Chrome issued 2 reads on
fd 5. So how about the time spent? Looking at /proc/<pid>/stat on
Linux, I'm seeing up to a ten-to-one ratio of user to kernel time. So,
whatever the amount of CPU time it took to execute 250,000 system calls
in ten seconds, it spent up to ten times that much CPU time executing
in user space.
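
For contrast with the zero-timeout poll followed by a non-blocking read
that fails with EAGAIN, here is a minimal sketch of the blocking
pattern. It is not Chrome's code; the names wait_readable,
read_when_ready and READ_TIMED_OUT are hypothetical. With a non-zero
timeout the process sleeps in the kernel until data arrives, and read()
is only issued once poll() reports something to read.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

#define READ_TIMED_OUT (-2)   /* hypothetical sentinel for "no data yet" */

/* Returns 1 if fd has data to read, 0 on timeout, -1 on error (or on a
 * hangup/error condition with no data). If a signal interrupts poll(),
 * the wait is simply restarted with the full timeout. */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    assert(fd >= 0);
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    do {
        rc = poll(&pfd, 1, timeout_ms);   /* sleeps; does not spin */
    } while (rc == -1 && errno == EINTR);

    if (rc <= 0)
        return rc;
    return (pfd.revents & POLLIN) ? 1 : -1;
}

/* Returns the byte count from read() (0 means end of file), -1 on
 * error, or READ_TIMED_OUT if nothing arrived within timeout_ms. */
static ssize_t read_when_ready(int fd, void *buf, size_t len, int timeout_ms)
{
    int rc = wait_readable(fd, timeout_ms);

    if (rc == 0)
        return READ_TIMED_OUT;
    if (rc < 0)
        return -1;
    return read(fd, buf, len);
}
```

Passing -1 as timeout_ms makes poll() wait indefinitely, which is the
opposite extreme from the zero timeout seen in the strace output.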
On 04/27/2012 08:15 PM, Peter Svensson wrote:
> Hi,
>
> What is the rom code comparing against and why do we not do the delay
> compared to that?
>
> If it is against the real time clock, would not nanosleep() or just
> polling the time be more portable?
>
> Playing games with the C memory model to achieve a certain performance
> seems to me to always be brittle.
>
> Peter
>
> On Fri, 27 Apr 2012, Sergey Oboguev wrote:
>
>> Hi Mark,
>>
>> The goal is to prevent a smart compiler from collapsing the loops in
>> rom_read_delay, especially the bottom loop, by optimizing them.
>> Declaring "loopval" as volatile does just that, by effectively disabling
>> the compiler's ability to optimize accesses to it, and does it in a
>> portable way.
>>
>> Disabling inlining of rom_swapb, in fact, does not provide such a guarantee
>> in the long term.
>> It may shut off the compiler's optimizations today, but once the compiler (or
>> compilers) gets even smarter in the future, it may some day figure out that
>> the code "does not need" to call rom_swapb.
>> The compiler may leave the function un-inlined, but still figure out that it
>> does not need to be called and optimize the whole loop construct away.
>>
>> Therefore volatile is both the portable and, in the long term, the safer
>> approach.
>> The caveat is that compilers do have bugs and can sometimes disregard a
>> volatile declaration.
>> See e.g. "Volatiles Are Miscompiled, and What to Do about It":
>> http://dl.acm.org/citation.cfm?id=1450093
>> Note that older versions of LLVM used to be a particularly bad offender,
>> miscompiling (in LLVM-GCC version 2.2) 19% of volatile references; however,
>> it has gotten better since then.
>>
>> So when using volatile it's worth taking some extra steps that reduce the
>> probability of triggering a compiler bug, particularly avoiding declaring
>> the variable in question at local scope.
>> Or, perhaps even better, do what Eide & Regehr suggest in the mentioned
>> article: instead of accessing the variable directly, perform accesses via
>> per-type accessor routines. Or both.
>>
>> Thanks,
>> Sergey
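
The two suggestions quoted above (keep the volatile variable out of
local scope, and go through per-type accessor routines) might be
sketched as follows. This is an illustration under assumptions, not the
actual rom_read_delay code: the accessor names vol_read_uint and
vol_write_uint and the loop in spin_delay are hypothetical, and only
the name "loopval" comes from the thread.

```c
#include <assert.h>

/* File scope rather than local scope, per the suggestion above. */
static volatile unsigned int loopval;

/* Hypothetical per-type accessors: every volatile access goes through
 * one of these instead of touching the variable directly. */
static unsigned int vol_read_uint(const volatile unsigned int *p)
{
    return *p;
}

static void vol_write_uint(volatile unsigned int *p, unsigned int v)
{
    *p = v;
}

/* Hypothetical delay loop. Each iteration performs one volatile read
 * and one volatile write, which the compiler is not permitted to
 * remove or merge. Returns the final counter value. */
static unsigned int spin_delay(unsigned int iterations)
{
    unsigned int i;

    assert(iterations < 0xFFFFFFFFu);
    vol_write_uint(&loopval, 0);
    for (i = 0; i < iterations; i++)
        vol_write_uint(&loopval, vol_read_uint(&loopval) + 1);
    return vol_read_uint(&loopval);
}
```

This keeps the loop from being optimized away, but the loop still
burns CPU for its whole duration, which is the objection raised at the
top of this message.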