[Simh] single cycle stepping for pdp11

Fri Sep 6 10:32:50 EDT 2013

> The number of cycles above is the *total*. Read the handbook.
Not really.  You're confusing observed timing with which cycles happen.

This is all a matter of how you account for the cycles.  The instruction 
must be fetched.  The source must be read.  And the destination must be 
written.  These all cause some sort of cycle - whether or not it appears 
on the external bus.  The handbook is talking not about how many cycles 
happen, but about how much each one adds to observed instruction timing.

The context of the OP's question was an assumption about which cycles 
*happen*, not their timing.  He wanted to observe/single step them, not 
measure how long instructions take.  I chose to reply mostly in the 
PDP-11 context, but intentionally included some general information 
since this question has come up with other architectures as well.

From a timing point of view, the fetch can be hidden by an i-cache, by 
prefetch that overlaps execution, and other tricks. The source can come 
from cache, from a bypassed result of a previous instruction in the 
pipeline, or from memory.  The destination can go to memory, a write 
buffer, or a cache.  And it can be bypassed direct to the ALU.

This all has an impact on timing and, depending on precisely how you 
define your observation point, on which cycles happen.  Inside a CPU, 
most of them happen - except for bypassed operands.  Past that point, 
things get wild depending on cache/write buffer effectiveness.

Architecturally, all three cycles always happen sequentially.  In a 
given instance on a given implementation, which ones appear on any 
particular bus and the resulting instruction timing may vary greatly.
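
[Sketch: a toy Python model of the accounting point above.  It is not 
SimH code and not a model of any real PDP-11.  The names (ToyMachine, 
execute) are made up for illustration, the cache is an unbounded set of 
addresses, writes are assumed to be write-through, and nothing here 
measures time.  It only counts cycles at two observation points.]

```python
from dataclasses import dataclass, field


@dataclass
class ToyMachine:
    """Hypothetical machine: each instruction does fetch, source read, dest write."""
    cached: set = field(default_factory=set)  # addresses currently held in the toy cache
    architectural: int = 0                    # cycles seen from inside the CPU
    on_bus: int = 0                           # cycles visible on the external bus

    def _read(self, addr: int) -> None:
        self.architectural += 1
        if addr not in self.cached:
            self.on_bus += 1                  # miss: the read goes out on the bus
            self.cached.add(addr)             # and fills the cache

    def _write(self, addr: int) -> None:
        self.architectural += 1
        self.on_bus += 1                      # assumed write-through: always on the bus
        self.cached.add(addr)

    def execute(self, pc: int, src: int, dst: int) -> None:
        self._read(pc)                        # instruction fetch
        self._read(src)                       # source operand read
        self._write(dst)                      # destination write


# Run the same (hypothetical) instruction twice.
m = ToyMachine()
m.execute(pc=0o1000, src=0o2000, dst=0o3000)
m.execute(pc=0o1000, src=0o2000, dst=0o3000)
print(m.architectural, m.on_bus)
```

In this toy the two executions account for 6 architectural cycles but 
only 4 bus cycles: the second fetch and source read hit the cache, so a 
bus-level observer never sees them.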

[Bypass:  A machine may send a write to memory, but notice that a 
subsequent instruction needs the value before the value arrives in 
memory.  In this case, the value can be sent to the execution unit 
before it reaches memory - completely or partially bypassing the 
memory/cache unit, and eliminating a memory cycle.  Depending on the 
exact timing/microarchitecture, the value can come directly from the 
ALU, from pipeline stages in the CPU, from pipe stages in the memory 
unit, from a write queue, write buffer or cache.]
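
[Sketch: a minimal Python illustration of the bypass idea, assuming a 
simple queue of pending stores that loads search before going to 
memory.  ForwardingMemory and its methods are hypothetical names; real 
hardware does this with comparators in the pipeline or memory unit, not 
with a loop, and no PDP-11 is being described here.]

```python
from collections import deque


class ForwardingMemory:
    """Toy memory unit with a queue of stores that have not reached memory yet."""

    def __init__(self):
        self.memory = {}          # addr -> value actually in memory
        self.pending = deque()    # (addr, value) stores still in flight, oldest first
        self.memory_reads = 0     # loads that needed a memory cycle
        self.bypassed_reads = 0   # loads satisfied from an in-flight store

    def store(self, addr, value):
        self.pending.append((addr, value))

    def load(self, addr):
        # Search youngest-first so the most recent store to addr wins.
        for pending_addr, value in reversed(self.pending):
            if pending_addr == addr:
                self.bypassed_reads += 1
                return value
        self.memory_reads += 1
        return self.memory.get(addr, 0)

    def drain(self):
        # Retire pending stores to memory in program order.
        while self.pending:
            addr, value = self.pending.popleft()
            self.memory[addr] = value


fm = ForwardingMemory()
fm.store(0o100, 5)
fm.store(0o100, 7)
first = fm.load(0o100)    # satisfied from the queue, no memory cycle
fm.drain()
second = fm.load(0o100)   # queue is empty now, so this one goes to memory
```

Both loads return the same value; only the second one costs a memory 
read in this model.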

> It's actually the destination processing that takes no memory cycles 
> (well, on the 11/34 it does take that cycle). I don't know exactly how 
> they pull that one off, but it's very clear from the book that is 
> really is that step which doesn't cost anything on some of the 
> architectures. If it is caching, or if it done in parallel with the 
> source read, with the next fetch, or whatnot, I don't know.
>
> My guess is that the CPU actually manage to squeeze it in with the 
> source read phase. 
You can't do this.  A write requires the source operand; you can't start 
it until you have the source data.  (A clever machine can overlap 
the access checks, and maybe the address phase of an a/d external bus if 
the source read doesn't require the bus.  But the 11s were not that smart.)

You *can* disconnect the write from the next fetch with a write buffer 
(as I noted), but that doesn't eliminate the write, it just delays it 
until the buffered entry is evicted.

All of this amounts to accounting and buffering tricks.  With buffers 
(and I include caches in this), you eliminate some bus cycles.  But 
eventually things get evicted.  So the bus cycle happens later - and is 
charged to something else.  Maybe a subsequent instruction.  Maybe 
averaged into memory overhead.  With luck, there are multiple references 
to the same address, and there is a reduction in external bus cycles 
due to coalescing.  Statistically, that works.  Worst case, they all happen.
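
[Sketch: a toy Python write buffer showing the coalescing arithmetic.  
The class and function names are hypothetical, eviction is assumed to 
be oldest-first, and the capacity of 2 is arbitrary.  It counts bus 
writes only; it says nothing about when they happen or which 
instruction gets charged for them.]

```python
from collections import OrderedDict


class CoalescingWriteBuffer:
    """Toy write buffer: one entry per address, oldest entry evicted first."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.entries = OrderedDict()  # addr -> latest value, oldest entry first
        self.cpu_writes = 0           # writes issued by the CPU
        self.bus_writes = 0           # writes that reached the external bus

    def write(self, addr, value):
        self.cpu_writes += 1
        if addr in self.entries:
            self.entries[addr] = value        # coalesce: overwrite in place
            return
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict the oldest entry...
            self.bus_writes += 1              # ...which costs a bus cycle now
        self.entries[addr] = value

    def flush(self):
        # Eventually everything still buffered is written out.
        self.bus_writes += len(self.entries)
        self.entries.clear()


def count_writes(addresses, capacity=2):
    """Return (cpu_writes, bus_writes) for a sequence of write addresses."""
    buf = CoalescingWriteBuffer(capacity)
    for addr in addresses:
        buf.write(addr, 0)
    buf.flush()
    return buf.cpu_writes, buf.bus_writes


lucky = count_writes([0o100, 0o100, 0o100, 0o100])
worst = count_writes([0o100, 0o102, 0o104, 0o106])
```

Four writes to one address collapse to a single bus write here; four 
writes to four different addresses still produce four bus writes, just 
later.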

In any case, the details are only of interest to hardware types - they 
don't and never will enter SimH.

> They are not rolled back on a PDP-11.
Not in hardware.  As you noted, the OS has to do it, which is even slower.

However, the resulting memory/register fetches create bus traffic, which 
the OP would like to see.

> Also, the 11/45/50/55 could have a separate Unibus for memory. You 
> could say that this would be a memory bus. 
Yes, I was thinking of those machines - the 11/45 counts as 'early'.

> My whole point was that single stepping on the cycle level would not 
> be universally the same on all PDP-11s, so doing it in simh would mean 
> you'd have to do different stuff depending on specific model. 
We are in violent agreement on the first point.  On the second, I go 
further: the effort would be inconsistent with SimH's goals and design.  
It should not be attempted.

> The PDP-11s that have cache all use a write through cache.
Yes.  The KL10 was the first DEC machine to have a write-back cache.  
And it has some novel aspects for software.

['aspect' DEC jargon file: something that *just is*, vs. a *feature* or 
a *bug*]

This communication may not represent my employer's views,
if any, on the matters discussed.

On 06-Sep-13 08:46, Johnny Billquist wrote:
> The number of cycles above is the *total*. Read the handbook.
> It's actually the destination processing that takes no memory cycles 
> (well, on the 11/34 it does take that cycle). I don't know exactly how 
> they pull that one off, but it's very clear from the book that is 
> really is that step which doesn't cost anything on some of the 
> architectures. If it is caching, or if it done in parallel with the 
> source read, with the next fetch, or whatnot, I don't know.
>
> My guess is that the CPU actually manage to squeeze it in with the 
> source read phase. 
