<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <div class="moz-cite-prefix">

      <blockquote type="cite">The number of cycles above is the <b
          class="moz-txt-star"><span class="moz-txt-tag">*</span>total<span
            class="moz-txt-tag">*</span></b>. Read the handbook.

        <br>

      </blockquote>

      Not really.  You're confusing observed timing with which cycles

      happen.<br>

      <br>

      This is all a matter of how you account for the cycles.  The

      instruction must be fetched.  The source must be read.  And the

      destination must be written.  These all cause some sort of cycle -

      whether or not it appears on the external bus.  The handbook is
      talking not about how many cycles happen, but about how to account
      for their effect on observed instruction timing.  <br>

      <br>

      The context of the OP's question was an assumption about which

      cycles *happen*, not their timing.  He wanted to observe/single

      step them, not measure how long instructions take.  I chose to

      reply mostly in the PDP-11 context, but intentionally included

      some general information since this question has come up with

      other architectures as well.<br>

      <br>

      From a timing point of view, the fetch can be hidden by an

      i-cache, by prefetch that overlaps execution, and other tricks. 

      The source can come from cache, from a bypassed result of a

      previous instruction in the pipeline, or from memory.  The

      destination can go to memory, a write buffer, or a cache.  And it

      can be bypassed direct to the ALU.<br>

      <br>

      This all has an impact on timing and, depending on precisely how
      you define your observation point, on which cycles happen.  Inside a

      CPU, most of them happen - except for bypassed operands.  Past

      that point, things get wild depending on cache/write buffer

      effectiveness.<br>

      <br>

      Architecturally, all three cycles always happen sequentially.  In

      a given instance on a given implementation, which ones appear on

      any particular bus and the resulting instruction timing may vary

      greatly.<br>

      <br>
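      To make the accounting concrete, here is a minimal Python sketch.
      It is a toy model, not SimH code and not any particular 11: the
      Bus and ReadCache classes, the execute function and the octal
      addresses are all invented for illustration, and the instruction
      is assumed to be a single word with one memory source and one
      memory destination.  The cache is write-through, as on the 11s
      that have one.<br>
<pre>
```python
from dataclasses import dataclass, field


@dataclass
class Bus:
    """Hypothetical external bus that only logs the cycles it sees."""
    log: list = field(default_factory=list)

    def cycle(self, kind, addr):
        self.log.append((kind, addr))


@dataclass
class ReadCache:
    """Toy write-through cache sitting between the CPU and the bus."""
    bus: Bus
    lines: set = field(default_factory=set)

    def read(self, kind, addr):
        if addr not in self.lines:
            # Miss: the cycle reaches the external bus.
            self.bus.cycle(kind, addr)
            self.lines.add(addr)
        # Hit: the access still happened inside the CPU,
        # but the external bus never saw it.

    def write(self, addr):
        # Write-through: every write reaches the bus.
        self.bus.cycle("write", addr)


def execute(cache, pc, src, dst):
    """One memory-to-memory instruction: fetch, source read, destination write.

    Returns the accesses the instruction performs architecturally,
    regardless of which of them show up on the bus.
    """
    cache.read("fetch", pc)
    cache.read("read", src)
    cache.write(dst)
    return [("fetch", pc), ("read", src), ("write", dst)]
```
</pre>
      Run the same instruction twice against one cache: the first pass
      puts all three cycles on the bus, the second puts only the write
      there, yet the instruction performed three accesses both times.<br>
      <br>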

      [Bypass:  A machine may send a write to memory, but notice that a

      subsequent instruction needs the value before the value arrives in

      memory.  In this case, the value can be sent to the execution unit

      before it reaches memory - completely or partially bypassing the

      memory/cache unit, and eliminating a memory cycle.  Depending on

      the exact timing/microarchitecture, the value can come directly

      from the ALU, from pipeline stages in the CPU, from pipe stages in

      the memory unit, from a write queue, write buffer or cache.]<br>

      <br>
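      A bypass can be sketched the same way.  This is a deliberately
      tiny Python model under my own assumptions: a single in-flight
      store, and hypothetical Memory and Pipeline classes that
      correspond to no real machine.  It only shows that a load which
      matches a pending store never generates a memory read.<br>
<pre>
```python
class Memory:
    """Toy memory that counts the cycles it actually services."""

    def __init__(self):
        self.cells = {}
        self.reads = 0
        self.writes = 0

    def read(self, addr):
        self.reads += 1
        return self.cells.get(addr, 0)

    def write(self, addr, value):
        self.writes += 1
        self.cells[addr] = value


class Pipeline:
    """Toy CPU holding at most one store that has not reached memory."""

    def __init__(self, memory):
        self.memory = memory
        self.pending = None  # (addr, value) still in flight, or None

    def store(self, addr, value):
        self.retire()  # only one slot: push out any older store first
        self.pending = (addr, value)

    def load(self, addr):
        if self.pending is not None and self.pending[0] == addr:
            # Bypass: take the value from the in-flight store,
            # so no memory read cycle occurs.
            return self.pending[1]
        return self.memory.read(addr)

    def retire(self):
        """Let the pending store finally reach memory."""
        if self.pending is not None:
            self.memory.write(*self.pending)
            self.pending = None
```
</pre>
      A store followed at once by a load of the same address returns the
      stored value with the memory read count still at zero.  The write
      itself is not eliminated; it is counted when retire() runs.<br>
      <br>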

      <blockquote type="cite">It's actually the destination processing

        that takes no memory cycles (well, on the 11/34 it does take

        that cycle). I don't know exactly how they pull that one off,

        but it's very clear from the book that is really is that step

        which doesn't cost anything on some of the architectures. If it

        is caching, or if it done in parallel with the source read, with

        the next fetch, or whatnot, I don't know.

        <br>

        <br>

        My guess is that the CPU actually manage to squeeze it in with

        the source read phase.

      </blockquote>

      You can't do this.  A write requires the source operand; you can't

      start it until you have the source data.  (A clever machine

      can overlap the access checks, and maybe the address phase of an

      a/d external bus if the source read doesn't require the bus.  But

      the 11s were not that smart.)<br>

      <br>

      You *can* disconnect the write from the next fetch with a write

      buffer (as I noted), but that doesn't eliminate the write, it just

      delays it until the write buffer is evicted.<br>

      <br>

      All of this amounts to accounting and buffering tricks.  With

      buffers (and I include caches in this), you eliminate some bus

      cycles.  But eventually things get evicted.  So the bus cycle

      happens later - and is charged to something else.   Maybe a

      subsequent instruction.  Maybe averaged into memory overhead. 

      With luck, there are multiple references to the same address,

      and there is a reduction in external bus cycles due to

      coalescing.  Statistically, that works.  Worst case, they all

      happen.<br>

      <br>
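      The coalescing argument is easy to see in a small Python sketch.
      Treat it as an illustration under assumptions of my own: the
      WriteBuffer class, its capacity and its oldest-first eviction
      policy are hypothetical and are not taken from any DEC design.<br>
<pre>
```python
from collections import OrderedDict


class WriteBuffer:
    """Toy write buffer: coalesces writes to one address, evicts oldest first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # addr -> latest value, oldest entry first
        self.bus_writes = []          # writes that actually reached the bus

    def write(self, addr, value):
        if addr in self.entries:
            # Coalesce: overwrite the buffered value, no bus cycle at all.
            self.entries[addr] = value
            return
        if len(self.entries) == self.capacity:
            # Buffer full: the oldest write finally happens on the bus,
            # charged to whichever instruction triggered the eviction.
            self.evict()
        self.entries[addr] = value

    def evict(self):
        addr, value = self.entries.popitem(last=False)
        self.bus_writes.append((addr, value))

    def drain(self):
        """Flush everything, e.g. before the buffer's contents are needed."""
        while self.entries:
            self.evict()
```
</pre>
      Three writes to one address drain as a single bus write carrying
      the last value.  Three writes to three different addresses all
      reach the bus eventually, which is the worst case.<br>
      <br>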

      In any case, the details are only of interest to hardware types -

      they don't and never will enter SimH.  <br>

      <br>

      <blockquote type="cite">They are not rolled back on a PDP-11.</blockquote>

      Not in hardware.  As you noted, the OS has to do it, which is even

      slower.  <br>

      <br>

      However, the resulting memory/register fetches create bus traffic,

      which the OP would like to see.<br>

      <blockquote type="cite">Also, the 11/45/50/55 could have a

        separate Unibus for memory. You could say that this would be a

        memory bus.

      </blockquote>

      Yes, I was thinking of those machines - the 11/45 counts as

      'early'.<br>

      <blockquote type="cite">My whole point was that single stepping on

        the cycle level would not be universally the same on all

        PDP-11s, so doing it in simh would mean you'd have to do

        different stuff depending on specific model.

      </blockquote>

      We are in violent agreement on the first point.  On the second, I

      go further: the effort would be inconsistent with SimH's goals and

      design.  It should not be attempted.<br>

      <blockquote type="cite">The PDP-11s that have cache all use a

        write through cache.

        <br>

      </blockquote>

      Yes.  The KL10 was the first DEC machine to have a write-back

      cache.  And it has some novel aspects for software.<br>

      <br>

      ['aspect' DEC jargon file: something that *just is*, vs. a

      *feature* or a *bug*]<br>

      <br>

      <pre class="moz-signature" cols="72">This communication may not represent my employer's views,

if any, on the matters discussed. 

</pre>

      On 06-Sep-13 08:46, Johnny Billquist wrote:<br>

    </div>

    <blockquote cite="mid:5229CE88.2010108@softjar.se" type="cite">The

      number of cycles above is the <b class="moz-txt-star"><span
          class="moz-txt-tag">*</span>total<span class="moz-txt-tag">*</span></b>.

      Read the handbook.

      <br>

      It's actually the destination processing that takes no memory

      cycles (well, on the 11/34 it does take that cycle). I don't know

      exactly how they pull that one off, but it's very clear from the

      book that is really is that step which doesn't cost anything on

      some of the architectures. If it is caching, or if it done in

      parallel with the source read, with the next fetch, or whatnot, I

      don't know.

      <br>

      <br>

      My guess is that the CPU actually manage to squeeze it in with the

      source read phase.

    </blockquote>

    <br>

  </body>

</html>