<html>

  <head>

    <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <div class="moz-cite-prefix">On 25/05/2017 04:03, Sergey Oboguev

      wrote:<br>

    </div>

    <blockquote

      cite="mid:1131665231.1633296.1495681413047@mail.yahoo.com"

      type="cite">

      <div style="color:#000; background-color:#fff;

        font-family:Helvetica Neue, Helvetica, Arial, Lucida Grande,

        sans-serif;font-size:16px">Superficially looking at (AS)MP VMS

        code, it appears that the following should (hopefully) suffice

        for correct operation:<br

          id="yui_3_16_0_ym19_1_1495680785574_3703">

        <br id="yui_3_16_0_ym19_1_1495680785574_3704">

        1. BBSSI and BBCCI should acquire a lock when accessing the

        memory location. A simplistic implementation may use one lock

        for the whole memory (or the whole MA780 memory bank). A more

        sophisticated implementation may use a bucket of locks, with a

        particular physical address within an MA bank mapping to a

        corresponding lock in the bucket (with a lock being shared by a

        range of MA physical addresses) -- but that would probably be an

        overkill for 2-CPU config which is not particularly heavy on

        synchronization.<br id="yui_3_16_0_ym19_1_1495680785574_3705">

      </div>

    </blockquote>

    <br>

    My plan was to use just one global lock, which would be set on the
    interlocked read cycle and cleared on the matching write cycle.<br>

    <br>
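    As a rough illustration of the single global lock idea, here is a
    minimal C sketch. The names (interlock, read_interlock, write_unlock,
    bbssi, bbcci) are made up for the example and are not SIMH code. It
    assumes a pthread mutex is good enough to stand in for the interlock,
    and it ignores negative bit offsets.<br>

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical: one lock standing in for the whole-memory interlock. */
static pthread_mutex_t interlock = PTHREAD_MUTEX_INITIALIZER;

/* Interlocked read cycle: take the lock, then return the current byte. */
static uint8_t read_interlock(volatile uint8_t *addr)
{
    int rc = pthread_mutex_lock(&interlock);
    assert(rc == 0);
    (void)rc;
    return *addr;
}

/* Matching write cycle: store the byte, then release the lock. */
static void write_unlock(volatile uint8_t *addr, uint8_t val)
{
    *addr = val;
    pthread_mutex_unlock(&interlock);
}

/* BBSSI-style test and set: returns the previous state of the bit. */
static int bbssi(volatile uint8_t *base, unsigned bit)
{
    volatile uint8_t *p = base + (bit >> 3);
    uint8_t mask = (uint8_t)(1u << (bit & 7));
    uint8_t old = read_interlock(p);
    write_unlock(p, (uint8_t)(old | mask));
    return (old & mask) != 0;
}

/* BBCCI-style test and clear: returns the previous state of the bit. */
static int bbcci(volatile uint8_t *base, unsigned bit)
{
    volatile uint8_t *p = base + (bit >> 3);
    uint8_t mask = (uint8_t)(1u << (bit & 7));
    uint8_t old = read_interlock(p);
    write_unlock(p, (uint8_t)(old & ~mask));
    return (old & mask) != 0;
}
```

    Because the lock is held across the read/write pair, a second VCPU
    thread attempting an interlocked read simply blocks until the first
    one has written back.<br>
    <br>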

    <blockquote

      cite="mid:1131665231.1633296.1495681413047@mail.yahoo.com"

      type="cite">

      <div style="color:#000; background-color:#fff;

        font-family:Helvetica Neue, Helvetica, Arial, Lucida Grande,

        sans-serif;font-size:16px"><br

          id="yui_3_16_0_ym19_1_1495680785574_3706">

        1.2. VMS itself does not appear to use anything other than BBSSI

        and BBCCI in the ASMP code. However applications or libraries

        using the multiprocessing may, so for their sake the same

        applies to other interlocked instructions as well. Those

        applications or libraries might also conceivably use a higher

        rate of locking (justifying the bucketing of locks in this case)

        -- but do they even exist in the first place?<br

          id="yui_3_16_0_ym19_1_1495680785574_3707">

      </div>

    </blockquote>

    <br>

    Chapter 4 of the VAX-11/782 User's Guide recommends the interlocked
    instructions for user-written code, so they all need to be supported.
    We really need the MA780 technical description or the field
    maintenance print set to understand how it handles the
    read-interlocked SBI cycle.<br>

    <br>

    <blockquote

      cite="mid:1131665231.1633296.1495681413047@mail.yahoo.com"

      type="cite">

      <div style="color:#000; background-color:#fff;

        font-family:Helvetica Neue, Helvetica, Arial, Lucida Grande,

        sans-serif;font-size:16px"><br

          id="yui_3_16_0_ym19_1_1495680785574_3708">

        2. When sending out an IPI, the sending VCPU thread should

        execute a write memory barrier right before writing to the

        interrupt register.<br id="yui_3_16_0_ym19_1_1495680785574_3709">

        <br id="yui_3_16_0_ym19_1_1495680785574_3710">

        3. When receiving an IPI and before handling it, the receiving

        VCPU thread should execute a read memory barrier matching the

        barrier in (2). An obvious implementation would be for (2) and

        (3) to acquire a lock on the "interrupt pending" register of the

        CPU that is the target of the IPI.<br

          id="yui_3_16_0_ym19_1_1495680785574_3711">

      </div>

    </blockquote>

    <br>

    I probably should have researched memory barriers a bit more. I knew
    a little bit about them but wasn't sure if they were needed here.
    The same ordering problem may also exist for the rest of the shared
    memory.<br>

    <br>

    <blockquote

      cite="mid:1131665231.1633296.1495681413047@mail.yahoo.com"

      type="cite">

      <div style="color:#000; background-color:#fff;

        font-family:Helvetica Neue, Helvetica, Arial, Lucida Grande,

        sans-serif;font-size:16px"><br

          id="yui_3_16_0_ym19_1_1495680785574_3712">

        As is always with legacy MP code though, it is a bit of a

        gamble. Modern host processors have different cache coherency

        model than that of the 780 CPUs. Thus it is possible for some

        sequences that worked on the 11/78x multiprocessor to start

        failing when emulated on x86 or other contemporary host CPU.

        Only a detailed review of the code with respect to the cache

        coherency assumed by the code can tell.<br

          id="yui_3_16_0_ym19_1_1495680785574_3713">

        <br id="yui_3_16_0_ym19_1_1495680785574_3714">

        But do we even know how the 780 cache operates?<br

          id="yui_3_16_0_ym19_1_1495680785574_3715">

        Is it write-through or lazy writeback?<br

          id="yui_3_16_0_ym19_1_1495680785574_3716">

        Do interlocked instructions (such as BBSSI/BBCCI) invalidate the

        780 read cache?<br id="yui_3_16_0_ym19_1_1495680785574_3717">

        Do they commit pending writebacks from the cache to MA780/main

        memory (MS780) before the instruction completion?<br

          id="yui_3_16_0_ym19_1_1495680785574_3720">

      </div>

    </blockquote>

    <br>

    Here is an extract from the VAX-11/782 User's Guide that partly answers

    the question:<br>

    <br>

    "Each MA780 shared memory subsystem should have the cache

    invalidation map option. This option reduces traffic on the

    Synchronous Backplane Interconnect (SBI) by reducing the number of

    cache invalidate requests sent to each processor. By keeping track

    of which locations in MA780 memory have been placed in the cache of

    each processor, the option allows cache invalidate requests to be

    sent only to the processor(s) whose cache contains the location that

    has been invalidated."<br>

    <br>

    So it looks like the port invalidation control register contains a

    mask of the SBI nodes that need to have cache invalidate requests

    sent to them. The ASMP code sets the bit for nexus 0 (CPU) as part

    of the initialisation.<br>

    <br>

    Matt<br>

  </body>

</html>