[Simh] Simulating the PDP-15/76 Unichannel

Johnny Billquist bqt at softjar.se
Tue Mar 17 11:16:45 EDT 2015


On 2015-03-17 08:36, Sergey Oboguev wrote:
>> From: E. Groenenberg <quapla at xs4all.nl>
>>
>> Any current & reasonably seasoned Unix or derivative has shmat() & shmdt()
>> (or equivalent) calls for attaching & detaching a segment of a given size.
>>
>> Also MS$ has this
>> [...]
>> so a common API for SIMH would be fairly easy.
>>
>> [...]
>>
>> With this feature, it would probably also be easy to realize a PDP-11/74
>> with 4 instances of SIMH.
>
> Mapping shared memory segment into the address spaces of multiple SIMH
> instances is the straightforward part.

Actually, it might be less trivial than first thought...

There is a problem in that different emulators might use different 
layouts for memory, which you somehow need to overlap here. And those 
layouts can become rather weird, so it is not obvious how you actually 
figure the mapping out.

Remember, this thread started with the explicit wish to deal with the 
PDP-15 and PDP-11 sharing memory. The PDP-15 uses 18-bit memory, while 
the PDP-11 uses 16-bit memory. Now we need a way to map addresses 
between the two, while the size of each memory word differs.

In a way, you need to say that address X on the PDP-15 is address Y on 
the PDP-11. Address X+1 on the PDP-15 (18-bit words) then needs to map 
to address Y+2 on the PDP-11 (16 bits, but with 8-bit addressable 
units). And given that, what does Y+1 on the PDP-11 map to?
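
To make that concrete, here is a minimal sketch of one possible 
mapping, assuming the shared memory lives in a host buffer of 32-bit 
cells, one cell per 18-bit PDP-15 word, and assuming the PDP-11 side 
only ever sees the low 16 bits of each word. All the names here are 
made up for illustration; this is not how any SIMH code actually does 
it.

    #include <stdint.h>

    /* Hypothetical shared buffer: one 32-bit host cell per 18-bit
       PDP-15 word, mapped into both simulator instances with
       shmat() or similar. */
    static uint32_t *shm;

    #define W18_MASK 0777777u  /* 18 valid bits */

    /* PDP-15 side: word address x accesses one 18-bit word. */
    uint32_t pdp15_read(uint32_t x)              { return shm[x] & W18_MASK; }
    void     pdp15_write(uint32_t x, uint32_t v) { shm[x] = v & W18_MASK; }

    /* PDP-11 side: byte address y.  y/2 selects the PDP-15 word,
       y&1 selects the low or high byte of that word's low 16 bits,
       so Y+1 is the high byte of the same word X = Y/2, and bits
       16-17 of the PDP-15 word are invisible from the 11. */
    uint8_t pdp11_read_byte(uint32_t y)
    {
        uint32_t w = shm[y >> 1];
        return (y & 1) ? (w >> 8) & 0xff : w & 0xff;
    }

    void pdp11_write_byte(uint32_t y, uint8_t b)
    {
        uint32_t w = shm[y >> 1];   /* non-atomic read-modify-write; */
        if (y & 1)                  /* see the atomicity point below */
            w = (w & ~0xff00u) | ((uint32_t)b << 8);
        else
            w = (w & ~0xffu) | (uint32_t)b;
        shm[y >> 1] = w;
    }

With this layout, Y+1 becomes the high byte of the low 16 bits of word 
X, and bits 16-17 are simply unreachable from the 11. Whether that is 
the mapping the actual UC15 software expects is exactly the kind of 
thing that has to be dug out first.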

> More intricate parts are:
>
> 1) Mapping guest memory-consistency model to host memory-consistency model.
>
> Does legacy software expect to observe updates to the shared memory executed
> by simulated VCPU1 (or virtual IO processor) to become observable by VCPU2
> in the same order they were executed by VCPU1? If yes, this would not work
> (without special handling by the simulator) on modern machines with weaker
> memory-consistency model.

Correct. And yes, order will be important in most cases.

> What is the cache coherency protocol of the simulated system and how does it
> map to the host system?
>
> Does simulated system have any system events (such as inter-processor
> interrupts or IO completion) that affect cache synchronization between VCPUs
> or VCPUs/VIOPs, and if yes, how this is to be mapped to the simulator?
>
> Ultimately it all depends on exact actual synchronization patterns utilized
> by legacy software.

I'll just go for the easy case here to start with. You normally do not 
have cache coherency in these old machines. In most cases, you do not 
even have caches.
And in the cases where systems do have caches, they are mostly write 
through, not write back.
So memory always holds the "correct" data immediately. MP systems had 
to know when there was a potential for cache problems, and work around 
it explicitly.

If we start trying to emulate machines that do have caches, we'll have 
to choose between implementing our own caching layer, in order to 
emulate the cache correctly, and accepting the behavior of the host 
machine's cache. All modern machines implement cache coherency, so in 
essence you'll get the same effects as if you had no cache. In short, 
things work correctly: you never have broken cache coherency on modern 
machines, and you will not get any issues in normal simulation.

Write order might cause more problems, though: another CPU can observe 
data being updated in a different order than the one in which you wrote 
it. That might require that you start playing with memory barriers, 
which will hurt performance.
But I don't see any way around this if it becomes a problem.
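
As a rough illustration of what that playing looks like, here is a 
sketch using C11 atomics on the host. The flag/message pattern and the 
names are invented, not anything from SIMH.

    #include <stdatomic.h>
    #include <stdint.h>

    /* VCPU1 fills in a message, then raises a flag; VCPU2 polls the
       flag, then reads the message.  On x86 this tends to work by
       accident; on hosts with weaker memory models (ARM, POWER) the
       release/acquire pair is what guarantees that VCPU2 sees the
       message no later than it sees the flag. */
    static uint32_t    msg;
    static atomic_uint ready;

    void vcpu1_send(uint32_t m)
    {
        msg = m;                    /* plain write to shared data  */
        atomic_store_explicit(&ready, 1, memory_order_release);
    }

    uint32_t vcpu2_recv(void)
    {
        while (!atomic_load_explicit(&ready, memory_order_acquire))
            ;                       /* spin until the flag is seen */
        return msg;                 /* guaranteed to read m here   */
    }

The cost is that barriers like these would sit in the middle of the 
simulator's memory access path, which is where the performance hurt 
comes from.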

> 2) Mapping memory atomicity.
>
> Does host machine provide the same memory access atomicity and separability
> as the simulated machine? For instance, if a simulated machine provides a way
> to update a byte at address A without interfering with concurrent updates to
> bytes at A-1 and A+1 by other VCPUs, then this would take a special effort
> to be implemented on a host machine that has let us say a 4-byte word as
> the minimum separable unit. Ditto for atomic and separable 2-byte word
> accesses (atomicity would mean that concurrent writes to the word do not
> result in resultant bytes values coming from different writes,
> separability would mean that concurrent writes to neighbor words do not
> interfere with each other).

Good point. We first need to find out how the simulated machines 
actually behave on this point.
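
One concrete shape this takes in the layout sketched above: with 
18-bit words in 32-bit host cells, a PDP-11 byte write is a 
read-modify-write of the whole cell, so two instances updating 
different bytes of the same word can wipe out each other's writes. A 
compare-and-swap loop is one way to make the byte update separable. 
Again just a sketch, assuming C11 atomics:

    #include <stdatomic.h>
    #include <stdint.h>

    static _Atomic uint32_t cell;  /* one 18-bit word, 32-bit cell */

    void write_low_byte(uint8_t b)
    {
        uint32_t old = atomic_load(&cell);
        uint32_t nv;
        do {
            nv = (old & ~0xffu) | b;   /* replace the low byte only */
        } while (!atomic_compare_exchange_weak(&cell, &old, nv));
        /* on failure, old is reloaded with the current cell value
           and the loop retries against that */
    }

Whether that effort is needed at all depends on whether the software on 
the two machines ever does concurrent sub-word updates to the same 
word, which again has to be found out first.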

> 3) Does a simulated system have any synchronization facilities such as
> interlocked instructions or machine-specific registers that affect cache
> coherency?

The cache coherency is not something I would worry too much about 
here. The host machine will essentially give you an environment where 
you never have inconsistent caches at the host level, and therefore not 
at the memory level either.
If the simulated machine has special handling to deal with potential 
cache incoherencies in the original machine design, that handling will 
just become a no-op in the simulated environment, since the cache will 
always be coherent.
(Unless we start emulating the cache layer, at which point we can do 
whatever we want.)

> 4) Mapping execution model.
>
> What happens if a host system thread simulating the execution of VCPU1 (or
> virtual IO processor) gets preempted, while VCPU2 is waiting for a response
> from VCPU1? Does the simulated system (specifically, its legacy software)
> rely on finite timeouts?

I don't see a memory issue here, but there is a more general issue 
about response times between independent CPUs, which might become very 
hard to maintain correctly when they are all simulated in a system 
without realtime properties.
However, even if we have realtime properties, you can get problems. In 
some cases you will find spinlocks on these systems that time out based 
on loop counts. Run on *much* faster CPUs, those spinlocks might time 
out way too fast.
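
Picture a legacy-style wait loop like this invented sketch, where the 
timeout count was tuned for the speed of the original CPU:

    /* Spin a fixed number of iterations waiting for the other
       processor to respond.  SPIN_LIMIT was chosen to give, say, a
       few milliseconds on the original hardware; a simulator that
       executes guest instructions orders of magnitude faster burns
       through it in microseconds, possibly before the host thread
       simulating the other processor has even been scheduled. */
    #define SPIN_LIMIT 100000L

    int wait_for_peer(volatile int *flag)
    {
        for (long i = 0; i < SPIN_LIMIT; i++)
            if (*flag)
                return 0;           /* peer responded in time */
        return -1;                  /* report a device timeout */
    }

The loop itself is still correct; it is the calibration of the count 
that silently breaks.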

	Johnny


