[Simh] clang (was Re: XCode and LTO)

Sat Apr 28 17:42:20 EDT 2012

I think I would have to agree with Peter. Or at least, half agree.  
nanosleep() would give you the exact timing needed.  I'm very much 
against polling the time as a solution because it could use up a lot of 
CPU cycles that other processes would love to have.  System calls such 
as gettimeofday() typically do nothing besides copy the internally 
maintained time back to user space, and as such do not need to 
sleep and so never voluntarily give up the CPU.  Some systems will 
recognize when processes seem to exhibit "infinite loop" behavior and 
"auto-nice" the process, potentially affecting the "brittleness" that 
Peter refers to.

In any event, "spinning" when there are more accurate ways to simulate 
timing (not to mention ways friendlier to other processes that would 
like some CPU time) is not the best choice, in my opinion.
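
To make the nanosleep() alternative concrete, here is a minimal sketch 
of a delay that sleeps in the kernel instead of polling the clock.  The 
helper name delay_ns is hypothetical and is not taken from the SIMH 
sources.  Note that the wake-up can come later than requested, 
depending on the scheduler and timer granularity, so this is a sketch 
under those assumptions rather than a drop-in replacement.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Hypothetical helper: sleep for at least `ns` nanoseconds without
 * spinning.  Returns 0 on success, -1 on an error other than EINTR. */
static int delay_ns(long ns)
{
    struct timespec req, rem;

    assert(ns >= 0);

    req.tv_sec  = ns / 1000000000L;
    req.tv_nsec = ns % 1000000000L;

    /* nanosleep() returns -1 with errno == EINTR if a signal arrives
     * early; `rem` then holds the unslept time, so retry with it. */
    while (nanosleep(&req, &rem) == -1) {
        if (errno != EINTR)
            return -1;
        req = rem;
    }
    return 0;
}
```

A caller would write something like delay_ns(2000000L) for a 2 ms 
pause; while it sleeps, the CPU is free for other processes.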

-Michael

P.S. One of the best known "offensive spinners" is Google Chrome, which 
after a short while can bring a system to its knees.  Some of its 
many processes are constantly spinning.  Run strace on one that's racked 
up a lot of time, and in ten seconds you'll see about 167,000 
identical getrlimit() calls, always with the same args and 
always the same returned values; 55,000 clock_gettime() calls; 22,000 
gettimeofday() calls (can't they make up their mind?); 1379 zero-timeout 
poll calls which always check descriptors 14 and 15 (49 of those 
calls also check descriptor 5).  All but two of those poll calls 
immediately time out.  The two that don't time out are for (you guessed 
it) descriptor 5.  Right after the poll calls have timed out (because 
there is no I/O available on descriptors 14 and 15), the code then 
issues 1379 read calls on each of those 2 descriptors.  Of course, 
because they didn't want those calls to block, fds 14 and 15 were set 
non-blocking, so the reads returned immediately with errno set to 
EAGAIN.  After the only two polls that did not time out, Chrome issued 
2 reads on fd 5.  So how about the time spent?  Looking at 
/proc/<pid>/stat on Linux, I'm seeing up to a ten-to-one ratio of user 
to kernel time.  So, whatever the amount of CPU time it took to execute 
250,000 system calls in ten seconds, it spent up to ten times that much 
CPU time executing in user space.
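
For contrast with the zero-timeout poll followed by reads that return 
EAGAIN, here is a rough sketch of the non-spinning pattern: give poll() 
a real timeout so the process sleeps in the kernel, and only read from a 
descriptor that poll() reported as ready.  The helper name 
wait_and_read and its parameters are hypothetical; this is not a claim 
about how Chrome is or should be structured.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: wait up to `timeout_ms` milliseconds (negative
 * means wait forever) for either descriptor to become readable, then
 * read from the first ready one.  Returns the byte count from read(),
 * 0 on timeout or end-of-file, -1 on error. */
static ssize_t wait_and_read(int fd_a, int fd_b, char *buf, size_t len,
                             int timeout_ms)
{
    struct pollfd fds[2] = {
        { .fd = fd_a, .events = POLLIN },
        { .fd = fd_b, .events = POLLIN },
    };
    int ready;

    assert(buf != NULL && len > 0);

    /* With a non-zero timeout the process sleeps in the kernel until
     * there is work to do.  A signal restarts the wait from scratch. */
    do {
        ready = poll(fds, 2, timeout_ms);
    } while (ready == -1 && errno == EINTR);

    if (ready <= 0)
        return ready;               /* 0 = timed out, -1 = error */

    /* Read only from a descriptor that poll() flagged, so there are
     * no speculative reads that fail with EAGAIN. */
    for (int i = 0; i < 2; i++) {
        if (fds[i].revents & (POLLIN | POLLHUP))
            return read(fds[i].fd, buf, len);
    }
    return -1;                      /* POLLERR or POLLNVAL */
}
```

With this shape an idle process makes one system call per wake-up 
rather than thousands of polls and reads per second.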

On 04/27/2012 08:15 PM, Peter Svensson wrote:
> Hi,
>
> What is the rom code comparing against and why do we not do the delay
> compared to that?
>
> If it is against the real time clock, would not nanosleep() or just
> polling the time be more portable?
>
> Playing games with the C memory model to achieve a certain performance
> seems to me to always be brittle.
>
> Peter
>
> On Fri, 27 Apr 2012, Sergey Oboguev wrote:
>
>> Hi Mark,
>>
>> The goal is to prevent a smart compiler from collapsing the loops in
>> rom_read_delay, especially the bottom loop, by optimizing them.
>> Declaring "loopval" as volatile does just that, by effectively disabling
>> the compiler's ability to optimize accesses to it, and does it in a
>> portable way.
>>
>> Disabling inlining of rom_swapb, in fact, does not provide such a
>> guarantee in the long term.
>> It may shut off the compiler's optimizations today, but once the compiler
>> (or compilers) gets even smarter in the future, it can some day figure out
>> that the code "does not need" to call rom_swapb.
>> The compiler may leave the function un-inlined, but still figure out that
>> it does not need to be called and optimize the whole loop construct away.
>>
>> Therefore volatile is both the portable and, long-term, the safer approach.
>> The caveat is that compilers do have bugs and can sometimes disregard a
>> volatile declaration.
>> See for example "Volatiles Are Miscompiled, and What to Do about It"
>> http://dl.acm.org/citation.cfm?id=1450093
>> Note that older versions of LLVM used to be a particularly bad offender,
>> miscompiling (in LLVM-GCC version 2.2) 19% of volatile references; however,
>> it has gotten better since then.
>>
>> So when using volatile it's worth taking some extra steps that reduce
>> the probability of triggering a compiler bug, particularly avoiding
>> declaring the variable in question at local scope.
>> Or, perhaps even better, do what Eide & Regehr suggest in the mentioned
>> article: instead of accessing the variable directly, perform accesses via
>> per-type accessor routines.  Or both.
>>
>> Thanks,
>> Sergey
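
For reference, the two precautions Sergey describes in the quoted 
message (a volatile variable at file scope rather than local scope, 
accessed only through per-type accessor routines) might look roughly 
like the sketch below.  The names delay_counter, vol_read_u32, 
vol_write_u32 and spin_delay are hypothetical; this is not the actual 
rom_read_delay code, only an illustration of the idea.

```c
#include <assert.h>
#include <stdint.h>

/* File scope rather than local scope, per the quoted suggestion.
 * `delay_counter` is a hypothetical stand-in for "loopval". */
static volatile uint32_t delay_counter;

/* Per-type accessor routines: every volatile access goes through
 * one of these instead of touching the variable directly. */
static uint32_t vol_read_u32(const volatile uint32_t *p)
{
    return *p;
}

static void vol_write_u32(volatile uint32_t *p, uint32_t v)
{
    *p = v;
}

/* Hypothetical busy-wait: performs `iterations` volatile updates and
 * returns the final counter value.  Because each access is volatile,
 * a conforming compiler may not collapse the loop. */
static uint32_t spin_delay(uint32_t iterations)
{
    vol_write_u32(&delay_counter, 0);
    while (vol_read_u32(&delay_counter) < iterations)
        vol_write_u32(&delay_counter, vol_read_u32(&delay_counter) + 1);

    /* Sanity check that the loop really ran to completion. */
    assert(vol_read_u32(&delay_counter) == iterations);
    return vol_read_u32(&delay_counter);
}
```

This still burns CPU for the whole delay, which is the objection raised 
at the top of this message; it only addresses the compiler side of the 
problem.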