[Simh] single cycle stepping for pdp11
Timothe Litt
litt at ieee.org
Fri Sep 6 10:32:50 EDT 2013
> The number of cycles above is the *total*. Read the handbook.
Not really. You're confusing observed timing with which cycles happen.
This is all a matter of how you account for the cycles. The instruction
must be fetched. The source must be read. And the destination must be
written. These all cause some sort of cycle - whether or not it appears
on the external bus. The handbook is talking not about how many cycles
happen, but about their effect on instruction timing; that is, how they
are accounted for in observed timing.
The context of the OP's question was an assumption about which cycles
*happen*, not their timing. He wanted to observe/single step them, not
measure how long instructions take. I chose to reply mostly in the
PDP-11 context, but intentionally included some general information
since this question has come up with other architectures as well.
From a timing point of view, the fetch can be hidden by an i-cache, by
prefetch that overlaps execution, or by other tricks. The source can come
from cache, from a bypassed result of a previous instruction in the
pipeline, or from memory. The destination can go to memory, a write
buffer, or a cache. And it can be bypassed directly to the ALU.
This all has an impact on timing and, depending on precisely how you
define your observation point, on which cycles happen. Inside a CPU, most
of them happen - except for bypassed operands. Past that point, things
get wild depending on cache/write buffer effectiveness.
Architecturally, all three cycles always happen sequentially. In a
given instance on a given implementation, which ones appear on any
particular bus and the resulting instruction timing may vary greatly.
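To make the distinction between architectural cycles and bus-visible cycles concrete, here is a minimal sketch. The `Implementation` knobs and the `external_bus_cycles` helper are hypothetical and do not model any real PDP-11; the point is only that the three cycles always exist, while which of them shows up on the external bus depends on the implementation.

```python
from __future__ import annotations
from dataclasses import dataclass

# Architectural cycles of a two-operand memory-to-memory instruction such as
# MOV (R1),(R2): always performed, always in this order.
ARCH_CYCLES = ("fetch", "source_read", "dest_write")


@dataclass
class Implementation:
    """Hypothetical implementation knobs; not modelled on any real PDP-11."""
    icache_hit: bool = False      # fetch satisfied from an i-cache or prefetch
    dcache_hit: bool = False      # source operand satisfied from a cache
    write_buffered: bool = False  # destination write parked in a write buffer


def external_bus_cycles(impl: Implementation) -> list[str]:
    """Return the architectural cycles that appear on the external bus
    while this instruction executes."""
    hidden = {
        "fetch": impl.icache_hit,
        "source_read": impl.dcache_hit,
        # A buffered write is deferred, not eliminated: it reaches the bus later.
        "dest_write": impl.write_buffered,
    }
    return [cycle for cycle in ARCH_CYCLES if not hidden[cycle]]


print(external_bus_cycles(Implementation()))
# ['fetch', 'source_read', 'dest_write']
print(external_bus_cycles(Implementation(icache_hit=True, dcache_hit=True)))
# ['dest_write']
```

Every call walks the same `ARCH_CYCLES`; only the filter changes, which mirrors the argument that the cycles happen while their visibility varies.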
[Bypass: A machine may send a write to memory, but notice that a
subsequent instruction needs the value before the value arrives in
memory. In this case, the value can be sent to the execution unit
before it reaches memory - completely or partially bypassing the
memory/cache unit, and eliminating a memory cycle. Depending on the
exact timing/microarchitecture, the value can come directly from the
ALU, from pipeline stages in the CPU, from pipe stages in the memory
unit, from a write queue, write buffer or cache.]
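The bypass described in the note can be sketched as a write queue that forwards pending values to later reads. This is a minimal illustration under stated assumptions: `WriteQueue` is a hypothetical class, memory is a plain dict of address to word, and each `retire()` call stands for one memory write cycle. The forwarded read saves a memory read cycle; the write itself still happens.

```python
from __future__ import annotations
from collections import deque


class WriteQueue:
    """Hypothetical write queue with read bypass (store-to-load forwarding).

    Assumptions: memory is a dict of address -> word, and each retire()
    call models one memory write cycle. Not a model of any DEC machine.
    """

    def __init__(self, memory: dict[int, int]):
        self.memory = memory
        self.pending = deque()   # (address, value) pairs, oldest first
        self.memory_reads = 0
        self.memory_writes = 0

    def write(self, addr: int, value: int) -> None:
        self.pending.append((addr, value))   # queued; memory not yet updated

    def read(self, addr: int) -> int:
        # Check newest-to-oldest so the most recent pending write wins.
        for pending_addr, value in reversed(self.pending):
            if pending_addr == addr:
                return value                 # bypassed: no memory read cycle
        self.memory_reads += 1
        return self.memory.get(addr, 0)

    def retire(self) -> None:
        addr, value = self.pending.popleft()
        self.memory[addr] = value            # the write still happens, just later
        self.memory_writes += 1


mem = {0o1000: 5}
q = WriteQueue(mem)
q.write(0o1000, 7)
print(q.read(0o1000), mem[0o1000])   # 7 5  (forwarded; memory still stale)
q.retire()
print(mem[0o1000], q.memory_writes)  # 7 1
```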
> It's actually the destination processing that takes no memory cycles
> (well, on the 11/34 it does take that cycle). I don't know exactly how
> they pull that one off, but it's very clear from the book that it
> really is that step which doesn't cost anything on some of the
> architectures. If it is caching, or if it is done in parallel with the
> source read, with the next fetch, or whatnot, I don't know.
>
> My guess is that the CPU actually manages to squeeze it in with the
> source read phase.
You can't do this. A write requires the source operand; you can't start
it until you have the source data. (A clever machine can overlap the
access checks, and maybe the address phase of a multiplexed address/data
(a/d) external bus if the source read doesn't require the bus. But the
11s were not that smart.)
You *can* disconnect the write from the next fetch with a write buffer
(as I noted), but that doesn't eliminate the write; it just delays it
until the write buffer is evicted.
All of this amounts to accounting and buffering tricks. With buffers
(and I include caches in this), you eliminate some bus cycles. But
eventually things get evicted. So the bus cycle happens later - and is
charged to something else. Maybe a subsequent instruction. Maybe
averaged into memory overhead. With luck, there are multiple references
to the same address, and there is a reduction in external bus cycles
due to coalescing. Statistically, that works. Worst case, they all happen.
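The accounting argument about write buffers can be checked with a small sketch. `WriteBuffer` and `buffered_bus_writes` are hypothetical; the assumptions are a fixed capacity, oldest-first eviction, and one external bus write per evicted entry, and the coalescing shown ignores the ordering rules real hardware must respect. Without coalescing every store still costs a bus write, just later; coalescing only helps when stores hit the same address.

```python
from __future__ import annotations


class WriteBuffer:
    """Hypothetical write buffer: defers writes and can optionally coalesce them.

    Assumptions: `capacity` entries, oldest-first eviction, and one external
    bus write cycle per evicted entry. Coalescing here ignores the ordering
    rules a real implementation would have to respect.
    """

    def __init__(self, capacity: int, coalesce: bool = False):
        self.capacity = capacity
        self.coalesce = coalesce
        self.entries: list[list[int]] = []   # [addr, value], oldest first
        self.memory: dict[int, int] = {}
        self.bus_writes = 0

    def store(self, addr: int, value: int) -> None:
        if self.coalesce:
            for entry in self.entries:
                if entry[0] == addr:
                    entry[1] = value          # merged: saves a later bus cycle
                    return
        if len(self.entries) == self.capacity:
            self._evict()                     # full: pay for the oldest write now
        self.entries.append([addr, value])

    def _evict(self) -> None:
        addr, value = self.entries.pop(0)
        self.memory[addr] = value
        self.bus_writes += 1                  # the deferred cycle is charged here

    def drain(self) -> None:
        while self.entries:
            self._evict()


def buffered_bus_writes(stores, capacity: int = 4, coalesce: bool = False) -> int:
    """Count external bus writes for a sequence of (addr, value) stores."""
    buf = WriteBuffer(capacity, coalesce)
    for addr, value in stores:
        buf.store(addr, value)
    buf.drain()
    return buf.bus_writes


same_word = [(0o1000, i) for i in range(10)]
distinct = [(0o1000 + 2 * i, i) for i in range(10)]
print(buffered_bus_writes(same_word))                 # 10: delayed, not eliminated
print(buffered_bus_writes(same_word, coalesce=True))  # 1: coalesced
print(buffered_bus_writes(distinct, coalesce=True))   # 10: worst case, all happen
```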
In any case, the details are only of interest to hardware types - they
don't and never will enter SimH.
> They are not rolled back on a PDP-11.
Not in hardware. As you noted, the OS has to do it, which is even slower.
However, the resulting memory/register fetches create bus traffic, which
the OP would like to see.
> Also, the 11/45/50/55 could have a separate Unibus for memory. You
> could say that this would be a memory bus.
Yes, I was thinking of those machines - the 11/45 counts as 'early'.
> My whole point was that single stepping on the cycle level would not
> be universally the same on all PDP-11s, so doing it in simh would mean
> you'd have to do different stuff depending on specific model.
We are in violent agreement on the first point. On the second, I go
further: the effort would be inconsistent with SimH's goals and design.
It should not be attempted.
> The PDP-11s that have cache all use a write through cache.
Yes. The KL10 was the first DEC machine to have a write-back cache.
And it has some novel aspects for software.
['aspect' DEC jargon file: something that *just is*, vs. a *feature* or
a *bug*]
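The write-through vs. write-back difference comes down to when write traffic reaches the bus. The sketch below is illustrative only: `Cache` and `cache_bus_writes` are hypothetical, the cache is direct-mapped with one word per line, the full address serves as the tag, write-back uses write-allocate, write-through uses no-write-allocate, and data values are not modelled.

```python
from __future__ import annotations


class Cache:
    """Hypothetical direct-mapped cache that counts only external bus writes.

    Assumptions: one word per line, line index = (addr // 2) % lines for
    even PDP-11-style byte addresses, the full address is used as the tag,
    write-allocate when write-back, no-write-allocate when write-through.
    """

    def __init__(self, lines: int, write_back: bool):
        self.lines = lines
        self.write_back = write_back
        self.tags: list[int | None] = [None] * lines
        self.dirty = [False] * lines
        self.bus_writes = 0

    def store(self, addr: int) -> None:
        if not self.write_back:
            self.bus_writes += 1              # write-through: every store goes out
            return
        index = (addr // 2) % self.lines
        if self.tags[index] != addr:          # miss: make room for the new line
            self._evict(index)
            self.tags[index] = addr
        self.dirty[index] = True              # write-back: data stays in the cache

    def _evict(self, index: int) -> None:
        if self.dirty[index]:
            self.bus_writes += 1              # dirty data reaches memory only now
        self.dirty[index] = False

    def flush(self) -> None:
        for index in range(self.lines):
            self._evict(index)


def cache_bus_writes(addrs, write_back: bool, lines: int = 8) -> int:
    cache = Cache(lines, write_back)
    for addr in addrs:
        cache.store(addr)
    cache.flush()
    return cache.bus_writes


loop = [0o1000, 0o1002] * 50      # two words in different lines
conflict = [0o1000, 0o1020] * 50  # two words mapping to the same line
print(cache_bus_writes(loop, write_back=False))     # 100
print(cache_bus_writes(loop, write_back=True))      # 2
print(cache_bus_writes(conflict, write_back=True))  # 100
```

With a friendly access pattern, write-back collapses 100 stores into 2 bus writes; with a pathological one, it does no better than write-through.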
This communication may not represent my employer's views,
if any, on the matters discussed.
On 06-Sep-13 08:46, Johnny Billquist wrote: