<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"></head><body ><div>Handle the shared memory like device registers. Capture reads and writes to the shared memory and let code in simh handle the details of mapping between the different formats.</div><br><br>-------- Original message --------<br>From: Johnny Billquist &lt;bqt@softjar.se&gt; <br>Date: 03/17/2015  9:16 AM  (GMT-07:00) <br>To: simh@trailing-edge.com <br>Subject: Re: [Simh] Simulating the PDP-15/76 Unichannel <br><br>On 2015-03-17 08:36, Sergey Oboguev wrote:<br>>> From: E. Groenenberg &lt;quapla@xs4all.nl&gt;<br>>><br>>> Any current & reasonably seasoned Unix or derivative has shmat() & shmdt()<br>>> (or equivalent) calls for attaching & detaching a segment of a given size.<br>>><br>>> Also MS$ has this<br>>> [...]<br>>> so a common API for SIMH would be fairly easy.<br>>><br>>> [ ...]<br>>><br>>> With this feature, it would probably also be easy to realize a PDP-11/74<br>>> with 4 instances of SIMH.<br>><br>> Mapping a shared memory segment into the address spaces of multiple SIMH<br>> instances is the straightforward part.<br><br>Actually, it might be less trivial than first thought...<br><br>There is a problem in that different emulators might use a different <br>memory layout, which you somehow need to reconcile once the memories are shared. <br>And that layout can become rather weird, and it is not obvious how you <br>actually figure it out.<br><br>Remember, this thread started with the explicit wish to deal with the <br>PDP-15 and PDP-11 sharing memory. The PDP-15 uses 18-bit memory, while <br>the PDP-11 uses 16-bit memory. Now we need a way to map addresses <br>between the two, even though the size of a memory word differs.<br><br>In a way, you need to say that address X on the PDP-15 is address Y on <br>the PDP-11. 
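<br><br><div>A minimal sketch of that kind of translation layer, written in Python purely for illustration and following the suggestion at the top of this message to treat the shared memory like device registers. Everything in it is hypothetical: the class and method names are invented, the region is a plain list rather than a real shared segment (shmat() or the Windows equivalent), and it assumes the PDP-11 sees only the low 16 bits of each 18-bit PDP-15 word, with PDP-11 writes leaving the top two bits untouched. Under that assumption, PDP-15 word X+n appears at PDP-11 byte addresses Y+2n (low byte) and Y+2n+1 (high byte).</div><pre>
```python
# Hypothetical translation layer for a memory region shared by a PDP-15 and a
# PDP-11 simulator. ASSUMPTIONS (not verified against real Unichannel hardware):
#   * PDP-15 word (pdp15_base + n) appears at PDP-11 byte addresses
#     (pdp11_base + 2n) [low byte] and (pdp11_base + 2n + 1) [high byte];
#   * the PDP-11 sees only the low 16 bits of each 18-bit word, and its
#     writes leave the two high bits unchanged.
# A real implementation would keep `words` in a host shared-memory segment.

WORD18_MASK = 0o777777  # 18-bit PDP-15 word
LOW16_MASK = 0o177777   # the 16 bits visible to the PDP-11


class SharedMemory1511:
    """Shared region stored as 18-bit words; each accessor translates one
    guest access, much like a device-register read/write handler."""

    def __init__(self, nwords, pdp15_base, pdp11_base):
        self.words = [0] * nwords
        self.pdp15_base = pdp15_base
        self.pdp11_base = pdp11_base

    # ---- PDP-15 side: word addressed ----
    def _index15(self, addr):
        n = addr - self.pdp15_base
        if not 0 <= n < len(self.words):
            raise IndexError("PDP-15 address outside shared region")
        return n

    def pdp15_read(self, addr):
        return self.words[self._index15(addr)]

    def pdp15_write(self, addr, value):
        self.words[self._index15(addr)] = value & WORD18_MASK

    # ---- PDP-11 side: byte addressed ----
    def _index11(self, byte_addr):
        off = byte_addr - self.pdp11_base
        if not 0 <= off < 2 * len(self.words):
            raise IndexError("PDP-11 address outside shared region")
        return off >> 1, off & 1  # (word index, 0 = low byte / 1 = high byte)

    def pdp11_read_word(self, byte_addr):
        n, odd = self._index11(byte_addr)
        if odd:
            # most PDP-11 models take an odd-address trap on this
            raise ValueError("odd-address word access")
        return self.words[n] & LOW16_MASK

    def pdp11_write_word(self, byte_addr, value):
        n, odd = self._index11(byte_addr)
        if odd:
            raise ValueError("odd-address word access")
        high2 = self.words[n] & (WORD18_MASK & ~LOW16_MASK)  # keep bits 16-17
        self.words[n] = high2 | (value & LOW16_MASK)

    def pdp11_read_byte(self, byte_addr):
        n, odd = self._index11(byte_addr)
        return (self.words[n] >> (8 * odd)) & 0xFF

    def pdp11_write_byte(self, byte_addr, value):
        n, odd = self._index11(byte_addr)
        shift = 8 * odd
        # clear only the addressed byte, then insert the new value
        self.words[n] = (self.words[n] & ~(0xFF << shift)) | ((value & 0xFF) << shift)
```
</pre><div>For example, after <code>pdp15_write(0o1001, 0o654321)</code> on a region with PDP-15 base 0o1000 and PDP-11 base 0o160000, a PDP-11 word read at 0o160002 returns 0o054321: the two high bits are simply invisible to the PDP-11. Whether the real hardware drops, preserves, or otherwise handles those bits is exactly what would have to be established first.</div><br>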
Now, address X+1 on the PDP-15 (18 bits) needs to map to <br>address Y+2 on the PDP-11 (16 bits, but with 8-bit addressable units).<br>And in that scheme, what does Y+1 on the PDP-11 map to?<br><br>> More intricate parts are:<br>><br>> 1) Mapping guest memory-consistency model to host memory-consistency model.<br>><br>> Does legacy software expect to observe updates to the shared memory executed<br>> by simulated VCPU1 (or virtual IO processor) to become observable by VCPU2<br>> in the same order they were executed by VCPU1? If yes, this would not work<br>> (without special handling by the simulator) on modern machines with weaker<br>> memory-consistency models.<br><br>Correct. And yes, order will be important in most cases.<br><br>> What is the cache coherency protocol of the simulated system and how does it<br>> map to the host system?<br>><br>> Does the simulated system have any system events (such as inter-processor<br>> interrupts or IO completion) that affect cache synchronization between VCPUs<br>> or VCPUs/VIOPs, and if yes, how is this to be mapped to the simulator?<br>><br>> Ultimately it all depends on the exact synchronization patterns utilized<br>> by legacy software.<br><br>I'll just go for the easy case here to start with. You normally do not <br>have cache coherency in these old machines. In most cases, you do not <br>even have caches.<br>And where systems do have caches, they are mostly write-through, not <br>write-back.<br>So memory always holds the "correct" data immediately. MP systems had to <br>know when there was a potential for cache problems, and work around them <br>explicitly.<br><br>If we start trying to emulate machines that do have caches, we'll have <br>to choose to either implement our own caching layer, in order to emulate <br>the cache correctly, or else accept the behavior of the host machine's <br>cache. All modern machines implement cache coherency in hardware, so in <br>essence you'll get the same effects as if you had no cache. 
In short, <br>things work correctly: you never have broken cache coherency on modern <br>machines, so you will not get any issues in normal simulation.<br><br>The write order might cause more problems, though: another CPU may see <br>the data updated in a different order than you wrote it. That might require <br>that you start playing with memory barriers, which will hurt performance.<br>But I don't see any way around this, if it becomes a problem.<br><br>> 2) Mapping memory atomicity.<br>><br>> Does the host machine provide the same memory access atomicity and separability<br>> as the simulated machine? For instance, if a simulated machine provides a way<br>> to update a byte at address A without interfering with concurrent updates to<br>> bytes at A-1 and A+1 by other VCPUs, then this would take a special effort<br>> to be implemented on a host machine that has, let us say, a 4-byte word as<br>> the minimum separable unit. Ditto for atomic and separable 2-byte word<br>> accesses (atomicity would mean that concurrent writes to the word do not<br>> result in the resulting byte values coming from different writes;<br>> separability would mean that concurrent writes to neighbor words do not<br>> interfere with each other).<br><br>Good point. We first need to find out how the simulated machines <br>actually behave here.<br><br>> 3) Does a simulated system have any synchronization facilities such as<br>> interlocked instructions or machine-specific registers that affect cache<br>> coherency?<br><br>Cache coherency is not something I would worry too much about here. 
<br>The host machine will essentially give you an environment where you <br>never have inconsistent caches at the host level, and therefore not at <br>the memory level either.<br>If the simulated machine has special handling to deal with potential <br>cache incoherencies in the original machine design, that handling will <br>effectively become a NOOP in the simulated environment, since the cache will <br>always be coherent.<br>(Unless we start emulating the cache layer, at which point we can do <br>whatever we want.)<br><br>> 4) Mapping execution model.<br>><br>> What happens if a host system thread simulating the execution of VCPU1 (or<br>> virtual IO processor) gets preempted, while VCPU2 is waiting for a response<br>> from VCPU1? Does the simulated system (specifically, its legacy software)<br>> rely on finite timeouts?<br><br>I don't see a memory issue here, but there is a more general issue with <br>response times between independent CPUs, which might become very hard to <br>maintain correctly when they are all simulated on a system without real-time <br>properties.<br>However, even with real-time properties, you can get problems. In <br>some cases, you will find spinlocks that time out based <br>on loop counts. On *much* faster CPUs, those spinlocks might time <br>out far too quickly.<br><br>     Johnny<br><br>_______________________________________________<br>Simh mailing list<br>Simh@trailing-edge.com<br>http://mailman.trailing-edge.com/mailman/listinfo/simh</body>