[Simh] single cycle stepping for pdp11

Johnny Billquist bqt at softjar.se
Fri Sep 6 11:01:45 EDT 2013


On 2013-09-06 16:32, Timothe Litt wrote:
>> The number of cycles above is the *total*. Read the handbook.
> Not really.  You're confusing observed timing with which cycles happen.
>
> This is all a matter of how you account for the cycles.  The instruction
> must be fetched.  The source must be read.  And the destination must be
> written.  These all cause some sort of cycle - whether or not it appears
> on the external bus.  The handbook is talking not about how many cycles
> happen, but about the effect on instruction timing.  That is, accounting
> for their effect on observed timing.

Did you even read the book? They list both execution times and memory 
cycles.
Yes, the chapter is about execution times.
That does not change the fact that the number of memory cycles used is 
also shown, and that MOV (R0)+,(R1)+, as an example, takes only two 
memory cycles in total. And that total includes instruction fetch and 
execution, source processing and destination processing.
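
The two-versus-three cycle accounting can be put into a toy C sketch 
(purely illustrative: this is not SimH code, and the per-phase counts 
are assumptions made up for the example, not handbook figures):

    /* Toy bus-cycle accounting for MOV (R0)+,(R1)+ -- illustration only.
       One model assumes the destination write is hidden (no extra bus
       cycle); the other charges it as a third cycle, 11/34-style. */
    #include <stdio.h>

    struct model {
        const char *name;
        int dst_write_cycles;       /* 0 if the write is hidden, 1 if charged */
    };

    static int mov_autoinc_cycles(const struct model *m)
    {
        int fetch = 1;              /* instruction fetch */
        int src   = 1;              /* source read via (R0)+ */
        return fetch + src + m->dst_write_cycles;   /* destination via (R1)+ */
    }

    int main(void)
    {
        struct model hidden  = { "destination write hidden",  0 };
        struct model charged = { "destination write charged", 1 };
        printf("%s: %d memory cycles\n", hidden.name,
               mov_autoinc_cycles(&hidden));    /* 2 */
        printf("%s: %d memory cycles\n", charged.name,
               mov_autoinc_cycles(&charged));   /* 3 */
        return 0;
    }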

	Johnny

>
> The context of the OP's question was an assumption about which cycles
> *happen*, not their timing.  He wanted to observe/single step them, not
> measure how long instructions take.  I chose to reply mostly in the
> PDP-11 context, but intentionally included some general information
> since this question has come up with other architectures as well.
>
> From a timing point of view, the fetch can be hidden by an i-cache, by
> prefetch that overlaps execution, and other tricks. The source can come
> from cache, from a bypassed result of a previous instruction in the
> pipeline, or from memory.  The destination can go to memory, a write
> buffer, or a cache.  And it can be bypassed direct to the ALU.
>
> This all has an impact on timing.  And, depending on precisely how you
> define your observation point, on which cycles happen at all.  Inside a
> CPU, most of them happen - except for bypassed operands.  Past that
> point, things get wild depending on cache/write buffer effectiveness.
>
> Architecturally, all three cycles always happen sequentially.  In a
> given instance on a given implementation, which ones appear on any
> particular bus and the resulting instruction timing may vary greatly.
>
> [Bypass:  A machine may send a write to memory, but notice that a
> subsequent instruction needs the value before the value arrives in
> memory.  In this case, the value can be sent to the execution unit
> before it reaches memory - completely or partially bypassing the
> memory/cache unit, and eliminating a memory cycle.  Depending on the
> exact timing/microarchitecture, the value can come directly from the
> ALU, from pipeline stages in the CPU, from pipe stages in the memory
> unit, from a write queue, write buffer or cache.]
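
As a minimal sketch of the bypass idea in C (hypothetical structures, 
not modelled on any real machine and not SimH code): a load first checks 
the stores still on their way to memory, newest first, and on a match 
the value is forwarded and the memory read cycle never happens.

    /* Store-to-load forwarding sketch -- hypothetical, illustration only. */
    #include <stdint.h>
    #include <stdio.h>

    #define PENDING_MAX 4

    static uint16_t memory[65536];          /* toy word-addressed memory */

    struct pending_store {
        uint16_t addr, data;
        int      valid;
    };

    /* Kept in program order: index 0 is the oldest pending store. */
    static struct pending_store pending[PENDING_MAX];

    static uint16_t memory_read(uint16_t addr)   /* the "real" bus cycle */
    {
        return memory[addr];
    }

    static uint16_t load_with_bypass(uint16_t addr)
    {
        for (int i = PENDING_MAX - 1; i >= 0; i--)       /* newest first */
            if (pending[i].valid && pending[i].addr == addr)
                return pending[i].data;          /* bypassed: no bus cycle */
        return memory_read(addr);                /* miss: normal bus cycle */
    }

    int main(void)
    {
        pending[0] = (struct pending_store){ 0100, 0177777, 1 };
        printf("%o\n", (unsigned)load_with_bypass(0100));   /* forwarded */
        printf("%o\n", (unsigned)load_with_bypass(0200));   /* from memory */
        return 0;
    }
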
>
>> It's actually the destination processing that takes no memory cycles
>> (well, on the 11/34 it does take that cycle). I don't know exactly how
>> they pull that one off, but it's very clear from the book that it
>> really is that step which doesn't cost anything on some of the
>> architectures. Whether it is caching, or whether it is done in parallel
>> with the source read, with the next fetch, or whatnot, I don't know.
>>
>> My guess is that the CPU actually manages to squeeze it in with the
>> source read phase.
> You can't do this.  A write requires the source operand; you can't start
> it until you have the source data.  (A clever machine can overlap the
> access checks, and maybe the address phase of an a/d external bus if
> the source read doesn't require the bus.  But the 11s were not that smart.)
>
> You *can* disconnect the write from the next fetch with a write buffer
> (as I noted), but that doesn't eliminate the write, it just delays it
> until the write buffer is evicted.
>
> All of this amounts to accounting and buffering tricks.  With buffers
> (and I include caches in this), you eliminate some bus cycles.  But
> eventually things get evicted.  So the bus cycle happens later - and is
> charged to something else.  Maybe a subsequent instruction.  Maybe it is
> averaged into memory overhead.  With luck, there are multiple references
> to the same address, and there is a reduction in external bus cycles
> due to coalescing.  Statistically, that works.  Worst case, they all happen.
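
A coalescing write buffer can be sketched the same way (hypothetical 
structure, nothing to do with any real 11 or with SimH): a later write 
to a buffered address just replaces the value, and the bus cycle, and 
the charge for it, only happens when an entry is finally evicted.

    /* Coalescing write buffer sketch -- hypothetical, illustration only. */
    #include <stdint.h>
    #include <stdio.h>

    #define WB_SLOTS 2

    struct wb_entry { uint16_t addr, data; int valid; };

    static struct wb_entry wb[WB_SLOTS];
    static unsigned long   bus_write_cycles;    /* what finally hits the bus */

    static void bus_write(uint16_t addr, uint16_t data)
    {
        (void)addr; (void)data;
        bus_write_cycles++;                     /* the delayed, later-charged cycle */
    }

    static void buffered_write(uint16_t addr, uint16_t data)
    {
        for (int i = 0; i < WB_SLOTS; i++)      /* coalesce with an existing entry */
            if (wb[i].valid && wb[i].addr == addr) {
                wb[i].data = data;
                return;
            }
        for (int i = 0; i < WB_SLOTS; i++)      /* otherwise take a free slot */
            if (!wb[i].valid) {
                wb[i] = (struct wb_entry){ addr, data, 1 };
                return;
            }
        bus_write(wb[0].addr, wb[0].data);      /* full: evict, bus cycle happens now */
        wb[0] = (struct wb_entry){ addr, data, 1 };
    }

    int main(void)
    {
        buffered_write(010, 1);
        buffered_write(010, 2);                 /* coalesces, no extra cycle */
        buffered_write(020, 3);
        buffered_write(030, 4);                 /* evicts 010: one bus cycle */
        printf("bus write cycles: %lu\n", bus_write_cycles);   /* 1 */
        return 0;
    }
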
>
> In any case, the details are only of interest to hardware types - they
> don't and never will enter SimH.
>
>> They are not rolled back on a PDP-11.
> Not in hardware.  As you noted, the OS has to do it, which is even slower.
>
> However, the resulting memory/register fetches create bus traffic, which
> the OP would like to see.
>> Also, the 11/45/50/55 could have a separate Unibus for memory. You
>> could say that this would be a memory bus.
> Yes, I was thinking of those machines - the 11/45 counts as 'early'.
>> My whole point was that single stepping on the cycle level would not
>> be universally the same on all PDP-11s, so doing it in simh would mean
>> you'd have to do different stuff depending on specific model.
> We are in violent agreement on the first point.  On the second, I go
> further: the effort would be inconsistent with SimH's goals and design.
> It should not be attempted.
>> The PDP-11s that have cache all use a write through cache.
> Yes.  The KL10 was the first DEC machine to have a write-back cache.
> And it has some novel aspects for software.
>
> ['aspect', per the DEC jargon file: something that *just is*, vs. a
> *feature* or a *bug*]
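
The write-through/write-back difference, reduced to a toy direct-mapped 
cache in C (one word per line, purely illustrative; this is neither the 
11s' nor the KL10's actual organization):

    /* Write-through vs. write-back, toy sketch -- illustration only. */
    #include <stdint.h>

    #define LINES 256

    struct line { uint16_t tag, data; int valid, dirty; };

    static struct line cache[LINES];
    static uint16_t    memory[65536];

    /* Write-through: the store also goes to memory immediately, so the
       memory write cycle always happens with the instruction. */
    void store_write_through(uint16_t addr, uint16_t data)
    {
        cache[addr % LINES] = (struct line){ addr, data, 1, 0 };
        memory[addr] = data;                 /* bus cycle now */
    }

    /* Write-back: the store only dirties the line; the memory write is
       deferred until the line is replaced by a different address. */
    void store_write_back(uint16_t addr, uint16_t data)
    {
        struct line *l = &cache[addr % LINES];
        if (l->valid && l->dirty && l->tag != addr)
            memory[l->tag] = l->data;        /* deferred bus cycle, on eviction */
        *l = (struct line){ addr, data, 1, 1 };
    }

    int main(void)
    {
        store_write_through(0100, 1);        /* memory[0100] updated right away */
        store_write_back(0200, 2);           /* memory[0200] not updated yet */
        return 0;
    }
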
>
> This communication may not represent my employer's views,
> if any, on the matters discussed.
>



