<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">On 25/05/2017 04:03, Sergey Oboguev
wrote:<br>
</div>
<blockquote
cite="mid:1131665231.1633296.1495681413047@mail.yahoo.com"
type="cite">
<div style="color:#000; background-color:#fff;
font-family:Helvetica Neue, Helvetica, Arial, Lucida Grande,
sans-serif;font-size:16px">Superficially looking at (AS)MP VMS
code, it appears that the following should (hopefully) suffice
for correct operation:<br
id="yui_3_16_0_ym19_1_1495680785574_3703">
<br id="yui_3_16_0_ym19_1_1495680785574_3704">
1. BBSSI and BBCCI should acquire a lock when accessing the
memory location. A simplistic implementation may use one lock
for the whole memory (or the whole MA780 memory bank). A more
sophisticated implementation may use a bucket of locks, with a
particular physical address within an MA bank mapping to a
corresponding lock in the bucket (with a lock being shared by a
range of MA physical addresses) -- but that would probably be
overkill for a 2-CPU configuration, which is not particularly heavy on
synchronization.<br id="yui_3_16_0_ym19_1_1495680785574_3705">
</div>
</blockquote>
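<br>
The bucketed variant described above could be sketched in C roughly as follows. It is only an illustration under assumptions that do not come from the VMS code: the bucket count of 64, the longword granularity, and the names MA_LOCK_BUCKETS, ma_locks_init(), ma_lock_index() and ma_lock_for() are all hypothetical.<br>
<br>
```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical: number of locks in the bucket (must be a power of two). */
#define MA_LOCK_BUCKETS 64u

static pthread_mutex_t ma_locks[MA_LOCK_BUCKETS];

/* Call once at start-up, before any VCPU thread runs. */
void ma_locks_init(void)
{
    for (unsigned i = 0; i < MA_LOCK_BUCKETS; i++)
        pthread_mutex_init(&ma_locks[i], NULL);
}

/* Map an MA780 physical address to a bucket index. Dropping the low two
   bits makes all bytes of one longword share a lock. Masking wraps the
   longword number round the bucket, so each lock is shared by many
   addresses (every 64th longword). */
unsigned ma_lock_index(uint32_t pa)
{
    return (pa >> 2) & (MA_LOCK_BUCKETS - 1u);
}

/* Return the lock that guards the longword containing pa. */
pthread_mutex_t *ma_lock_for(uint32_t pa)
{
    return &ma_locks[ma_lock_index(pa)];
}
```
<br>
An interlocked instruction would then hold pthread_mutex_lock(ma_lock_for(pa)) for the duration of its read-modify-write. For correctness, every interlocked access to a given location must map to the same lock. That is easy for BBSSI/BBCCI but needs more care for instructions that touch several locations.<br>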
<br>
My plan was to use just one global lock which would be set on the
read cycle and cleared on the write cycle.<br>
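<br>
A minimal sketch of that plan in C, assuming a flat array stands in for MA780 memory and a pthread mutex is the global interlock. The names ma_read_interlock(), ma_write_unlock(), bbssi() and bbcci() are hypothetical. The bit position is treated as unsigned (on the VAX it is a signed longword), and there is no bounds checking.<br>
<br>
```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for MA780 shared memory (byte addressed). */
#define MA_SIZE 65536u
static uint8_t ma_mem[MA_SIZE];

/* The single global interlock. */
static pthread_mutex_t ma_interlock = PTHREAD_MUTEX_INITIALIZER;

/* Interlocked read cycle: set the lock, then read. */
static uint8_t ma_read_interlock(uint32_t pa)
{
    pthread_mutex_lock(&ma_interlock);
    return ma_mem[pa];
}

/* Write cycle: write, then clear the lock. */
static void ma_write_unlock(uint32_t pa, uint8_t value)
{
    ma_mem[pa] = value;
    pthread_mutex_unlock(&ma_interlock);
}

/* BBSSI: set the bit; return true if it was already set (branch taken). */
bool bbssi(uint32_t base, uint32_t pos)
{
    uint32_t pa = base + (pos >> 3);
    uint8_t mask = (uint8_t)(1u << (pos & 7u));
    uint8_t old = ma_read_interlock(pa);
    ma_write_unlock(pa, (uint8_t)(old | mask));
    return (old & mask) != 0;
}

/* BBCCI: clear the bit; return true if it was already clear (branch taken). */
bool bbcci(uint32_t base, uint32_t pos)
{
    uint32_t pa = base + (pos >> 3);
    uint8_t mask = (uint8_t)(1u << (pos & 7u));
    uint8_t old = ma_read_interlock(pa);
    ma_write_unlock(pa, (uint8_t)(old & ~mask));
    return (old & mask) == 0;
}
```
<br>
Holding an ordinary mutex from the read cycle to the write cycle keeps the other VCPU's interlocked accesses out. The unlock also acts as a release barrier for the preceding store.<br>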
<br>
<blockquote
cite="mid:1131665231.1633296.1495681413047@mail.yahoo.com"
type="cite">
<div style="color:#000; background-color:#fff;
font-family:Helvetica Neue, Helvetica, Arial, Lucida Grande,
sans-serif;font-size:16px"><br
id="yui_3_16_0_ym19_1_1495680785574_3706">
1.2. VMS itself does not appear to use anything other than BBSSI
and BBCCI in the ASMP code. However, applications or libraries
using multiprocessing may, so for their sake the same
applies to the other interlocked instructions as well. Those
applications or libraries might also conceivably use a higher
rate of locking (justifying the bucketing of locks in this case)
-- but do they even exist in the first place?<br
id="yui_3_16_0_ym19_1_1495680785574_3707">
</div>
</blockquote>
<br>
Chapter 4 of the VAX-11/782 User's Guide recommends the interlocked
instructions for user-written code, so they all need to be supported.
We really need the MA780 technical description or the field
maintenance print set to understand how it handles the
read-interlocked SBI cycle.<br>
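<br>
The remaining interlocked instructions could reuse the same global lock. As an illustration only, ADAWI (add aligned word interlocked) might be sketched like this. The memory array, the function name adawi() and its signature are hypothetical, and condition codes are ignored. A misaligned or out-of-range operand simply returns false, where the real CPU would take a reserved operand fault for misalignment.<br>
<br>
```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for MA780 shared memory (byte addressed). */
#define MA_SIZE 65536u
static uint8_t ma_mem[MA_SIZE];

/* The same single global interlock used for BBSSI/BBCCI. */
static pthread_mutex_t ma_interlock = PTHREAD_MUTEX_INITIALIZER;

/* ADAWI: add a 16-bit value to an aligned word in shared memory as one
   interlocked read-modify-write. Stores the new value in *result. */
bool adawi(uint32_t pa, uint16_t add, uint16_t *result)
{
    if ((pa & 1u) || pa >= MA_SIZE - 1u)
        return false;                 /* misaligned or out of range */
    pthread_mutex_lock(&ma_interlock);
    /* VAX is little-endian: low byte at the lower address. */
    uint16_t old = (uint16_t)(ma_mem[pa] | (ma_mem[pa + 1] << 8));
    uint16_t sum = (uint16_t)(old + add);
    ma_mem[pa] = (uint8_t)(sum & 0xFFu);
    ma_mem[pa + 1] = (uint8_t)(sum >> 8);
    pthread_mutex_unlock(&ma_interlock);
    *result = sum;
    return true;
}
```
<br>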
<br>
<blockquote
cite="mid:1131665231.1633296.1495681413047@mail.yahoo.com"
type="cite">
<div style="color:#000; background-color:#fff;
font-family:Helvetica Neue, Helvetica, Arial, Lucida Grande,
sans-serif;font-size:16px"><br
id="yui_3_16_0_ym19_1_1495680785574_3708">
2. When sending out an IPI, the sending VCPU thread should
execute a write memory barrier right before writing to the
interrupt register.<br id="yui_3_16_0_ym19_1_1495680785574_3709">
<br id="yui_3_16_0_ym19_1_1495680785574_3710">
3. When receiving an IPI and before handling it, the receiving
VCPU thread should execute a read memory barrier matching the
barrier in (2). An obvious implementation would be for (2) and
(3) to acquire a lock on the "interrupt pending" register of the
CPU that is the target of the IPI.<br
id="yui_3_16_0_ym19_1_1495680785574_3711">
</div>
</blockquote>
<br>
I probably should have researched memory barriers a bit more. I knew
a little about them but wasn't sure whether they were needed here.
The same problem may also exist for the rest of the shared memory.<br>
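<br>
Points (2) and (3) map fairly directly onto C11 fences. Below is a minimal sketch under some assumptions. A hypothetical per-CPU interrupt_pending bitmask stands in for the interrupt register. A plain shared_mailbox word stands in for whatever ordinary shared memory the sender filled in before the IPI. The names, and the assumption of one sender per target, are illustrative and not from the VMS code.<br>
<br>
```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical per-CPU state for a 2-CPU configuration. */
#define NUM_CPUS 2
static uint32_t shared_mailbox[NUM_CPUS];      /* ordinary shared data   */
static atomic_uint interrupt_pending[NUM_CPUS]; /* the interrupt register */

/* Sender: fill in the shared data, then (2) a write (release) barrier,
   then set the IPI bit in the target's interrupt register. */
void send_ipi(int target, uint32_t message, unsigned bit)
{
    shared_mailbox[target] = message;
    atomic_thread_fence(memory_order_release);
    atomic_fetch_or_explicit(&interrupt_pending[target], 1u << bit,
                             memory_order_relaxed);
}

/* Receiver: test-and-clear the IPI bit, then (3) a read (acquire) barrier
   before touching the data. Returns 1 if an IPI was pending, else 0. */
int receive_ipi(int self, unsigned bit, uint32_t *message)
{
    unsigned mask = 1u << bit;
    unsigned old = atomic_fetch_and_explicit(&interrupt_pending[self], ~mask,
                                             memory_order_relaxed);
    if (!(old & mask))
        return 0;
    atomic_thread_fence(memory_order_acquire);
    *message = shared_mailbox[self];
    return 1;
}
```
<br>
Taking a mutex on the target's interrupt register on both sides, as suggested in (3), would give the same ordering guarantee at the cost of a lock per IPI.<br>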
<br>
<blockquote
cite="mid:1131665231.1633296.1495681413047@mail.yahoo.com"
type="cite">
<div style="color:#000; background-color:#fff;
font-family:Helvetica Neue, Helvetica, Arial, Lucida Grande,
sans-serif;font-size:16px"><br
id="yui_3_16_0_ym19_1_1495680785574_3712">
As is always the case with legacy MP code, though, it is a bit of a
gamble. Modern host processors have a different cache coherency
model from that of the 780 CPUs. Thus it is possible for some
sequences that worked on the 11/78x multiprocessor to start
failing when emulated on x86 or another contemporary host CPU.
Only a detailed review of the code with respect to the cache
coherency assumed by the code can tell.<br
id="yui_3_16_0_ym19_1_1495680785574_3713">
<br id="yui_3_16_0_ym19_1_1495680785574_3714">
But do we even know how the 780 cache operates?<br
id="yui_3_16_0_ym19_1_1495680785574_3715">
Is it write-through or lazy writeback?<br
id="yui_3_16_0_ym19_1_1495680785574_3716">
Do interlocked instructions (such as BBSSI/BBCCI) invalidate the
780 read cache?<br id="yui_3_16_0_ym19_1_1495680785574_3717">
Do they commit pending writebacks from the cache to MA780/main
memory (MS780) before the instruction completes?<br
id="yui_3_16_0_ym19_1_1495680785574_3720">
</div>
</blockquote>
<br>
Here is an extract from the VAX-11/782 User's Guide that partly answers
the question:<br>
<br>
"Each MA780 shared memory subsystem should have the cache
invalidation map option. This option reduces traffic on the
Synchronous Backplane Interconnect (SBI) by reducing the number of
cache invalidate requests sent to each processor. By keeping track
of which locations in MA780 memory have been placed in the cache of
each processor, the option allows cache invalidate requests to be
sent only to the processor(s) whose cache contains the location that
has been invalidated."<br>
<br>
So it looks like the port invalidation control register contains a
mask of the SBI nodes that need to have cache invalidate requests
sent to them. The ASMP code sets the bit for nexus 0 (CPU) as part
of the initialisation.<br>
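<br>
If the emulator ever needs to model this, the behaviour described in the extract could be sketched as follows. Everything here is an assumption:

- the 8-byte block size and the 64 KB tracked range
- the names port_inv_ctl, inv_map, inv_note_read() and inv_note_write()
- the choice not to invalidate the writing node itself

The invalidate is only counted, not delivered to a modelled cache.<br>
<br>
```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical parameters, chosen for illustration only. */
#define INV_BLOCK_SHIFT 3u      /* assumed cache block size: 8 bytes */
#define INV_BLOCKS      8192u   /* assumed tracked range: 64 KB      */
#define MAX_NODES       16u     /* SBI nexus numbers 0..15           */

/* Port invalidation control register: bit n set means node n should be
   sent cache invalidates (the ASMP code sets bit 0 for the CPU). */
uint16_t port_inv_ctl;

/* Invalidation map: for each block, the nodes that may have cached it. */
static uint16_t inv_map[INV_BLOCKS];

/* Invalidates "delivered" to each node; a real model would instead drop
   the block from that node's cache. */
unsigned inv_sent[MAX_NODES];

/* Compute the block number; false if pa is outside the tracked range. */
static bool block_of(uint32_t pa, uint32_t *block)
{
    *block = pa >> INV_BLOCK_SHIFT;
    return *block < INV_BLOCKS;
}

/* Node 'node' has read pa into its cache: remember that, if the node is
   enabled in the port invalidation control register. */
void inv_note_read(unsigned node, uint32_t pa)
{
    uint32_t b;
    if (block_of(pa, &b) && (port_inv_ctl & (1u << node)))
        inv_map[b] |= (uint16_t)(1u << node);
}

/* Node 'writer' has written pa: send invalidates only to the other nodes
   that are enabled and may hold the block, then stop tracking them. */
void inv_note_write(unsigned writer, uint32_t pa)
{
    uint32_t b;
    if (!block_of(pa, &b))
        return;
    uint16_t targets = (uint16_t)(inv_map[b] & port_inv_ctl & ~(1u << writer));
    for (unsigned node = 0; node < MAX_NODES; node++)
        if (targets & (1u << node))
            inv_sent[node]++;
    inv_map[b] &= (uint16_t)~targets;
}
```
<br>
With only bit 0 set in port_inv_ctl, a write by another node to a location the CPU has read produces exactly one invalidate for nexus 0. A second write to the same block produces none until the CPU reads it again.<br>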
<br>
Matt<br>
</body>
</html>