[Simh] Cluster communications errors

Johnny Billquist bqt at softjar.se
Wed Jul 18 20:53:52 EDT 2018


On 2018-07-19 02:29, Paul Koning wrote:
> 
> 
>> On Jul 18, 2018, at 8:22 PM, Johnny Billquist <bqt at softjar.se> wrote:
>>
>> On 2018-07-19 02:07, Paul Koning wrote:
>>>> On Jul 18, 2018, at 7:18 PM, Johnny Billquist <bqt at softjar.se> wrote:
>>>>
>>>>> ...
>>>>
>>>> It's probably worth pointing out that the reason I implemented that was not because of hardware problems, but because of software problems. DECnet can degenerate pretty badly when packets are lost. And if you shove packets fast enough at the interface, the interface will (obviously) eventually run out of buffers, at which point packets will be dropped.
>>>> This is especially noticeable in DECnet/RSX at least. I think I know how to improve that software, but I have not had enough time to actually try fixing it. And it is especially noticeable when doing file transfers over DECnet.
>>> All ARQ protocols suffer dramatically with packet loss.  The other day I was reading a recent paper about high speed long distance TCP.  It showed a graph of throughput vs. packet loss rate.  I forgot the exact numbers, but it was something like 0.01% packet loss rate causes a 90% throughput drop.  Compare that with the old (1970s) ARPAnet rule of thumb that 1% packet loss means 90% loss of throughput.  Those both make sense; the old one was for "high speed" links running at 56 kbps, rather than the multi-Gbps of current links.
>>> The other thing with nontrivial packet loss is that any protocol with congestion control algorithms triggered by packet loss (such as recent versions of DECnet), the flow control machinery will severely throttle the link under such conditions.
>>> So yes, anything you can do in the infrastructure to keep the packet loss well under 1% is going to be very helpful indeed.
>>
>> Right. That said, TCP behaves extremely much better than DECnet here. At least if we talk about TCP with the ability to deal with out of order packets (which most should do) and DECnet under RSX. The problem with DECnet under RSX is that recovering from a lost packet because of congestion essentially guarantees that congestion will happen again, while TCP pretty quickly comes into a steady working state.
> 
> Out of order packet handling isn't involved in that.  Congestion doesn't reorder packets.  If you drop a packet, TCP and DECnet both force the retransmission of all packets starting with the dropped one.  (At least, I don't think selective ACK is used in TCP.)  DECnet described out of order packet caching for the same reason TCP does: to work efficiently in packet topologies that have multiple paths in which the routers do equal cost path splitting.  In DECnet, that support is optional; it's not in DECnet/E and I wouldn't expect it in other 16-bit platforms either.

This is maybe getting too technical, so let me know if we should take 
this off list.

Yes, congestion does not reorder packets. However, if you cannot handle 
out of order packets, you have to retransmit everything from the point 
where a packet was lost.
If you can deal with packets out of order, you can keep the packets you 
received, even though there is a hole, and once that hole is plugged, 
you can ACK everything. And this is pretty normal in TCP, even without 
selective ACK.

So, in TCP, what normally happens is that a node is spraying packets as 
fast as it can. Some packets are lost, but not all of them, which leaves 
holes in the sequence of received packets.
TCP will, after some time or based on other heuristics, start 
retransmitting from the point where packets were lost, and as soon as 
the receiving end has plugged the hole, it will jump forward with the 
ACKs, meaning the sender does not need to retransmit everything. Even 
better, if the sender does retransmit everything, losing some of those 
retransmitted packets will not matter, since the receiver already has 
them anyway. At some point you will get to a state where the receiver 
has no window open, so the transmitter is blocked, and every time the 
receiver opens up a window, which usually is just a packet or two in 
size, the transmitter can send that much data. But that much data is 
usually less than the number of buffers the hardware has, so there is 
no problem receiving those packets, and TCP settles into a steady state 
where the transmitter sends packets as fast as the receiver can consume 
them, and apart from a few lost packets in the early stages, no packets 
are lost.
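
To make that concrete, here is a rough sketch in Python (not code from 
any real TCP stack; the sequence numbers and the loss pattern below are 
invented) of a receiver that keeps out-of-order segments and lets the 
cumulative ACK jump forward once the hole is plugged:

  def receiver(arrivals):
      """Buffer out-of-order segments, return cumulative ACKs."""
      expected = 0     # next sequence number needed in order
      buffered = {}    # segments held on to while there is a hole
      acks = []
      for seq, data in arrivals:
          if seq == expected:
              expected += 1
              while expected in buffered:   # hole plugged: drain buffer
                  buffered.pop(expected)
                  expected += 1
          elif seq > expected:
              buffered[seq] = data          # keep it despite the hole
          # seq < expected: duplicate retransmission, already delivered
          acks.append(expected)     # "I have everything below this"
      return acks

  # Segments 2 and 3 are lost the first time and retransmitted later.
  print(receiver([(0, "a"), (1, "b"), (4, "e"), (5, "f"),
                  (2, "c"), (3, "d")]))
  # -> [1, 2, 2, 2, 3, 6]: the ACK jumps from 3 to 6 when the hole closes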

DECnet (at least in RSX) on the other hand will transmit a whole bunch 
of packets. The first few will get through, but at some point one or 
several are lost. After some time, DECnet decides that packets were 
lost, and will back up and start transmitting again from the point 
where the packets were lost. Once more it will soon blast out more 
packets than the receiver can process, and you will once more end up in 
a timeout situation. DECnet backs off on the timeouts every time this 
happens, and soon you are at a horrendous 127s timeout for pretty much 
every other packet sent, meaning in effect you are only managing to send 
one packet every 127s. This is worsened, I think, by something that 
looks like a bug in the NFT/FAL code in RSX, where the code assumes it 
is faster than the packet transfer rate and can manage to do a few 
things before two packets have been received. How much is to blame on 
DECnet in general, and how much on NFT/FAL, I'm not entirely sure. Like 
I said, I have not had time to really test this.
But it's very easy to demonstrate the problem. Just set up an old PDP-11 
and a simh (or similar) machine on the same DECnet, try to transfer a 
larger file to the real PDP-11, check the network counters, and observe 
how things immediately grind to a standstill.
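
As a rough illustration of how badly this degenerates, here is a 
simplified model in Python. It is not RSX code; the 5 second initial 
timeout and the "one packet per round" assumption are invented, and 
only the 127s ceiling comes from the behaviour described above:

  def transfer_time(total_packets, per_round=1, timeout=5, cap=127):
      """Time to move total_packets when every burst ends in a timeout."""
      elapsed, sent = 0.0, 0
      while sent < total_packets:
          sent += per_round        # the few packets that get through
          elapsed += timeout       # then a full timeout before go-back-N
          timeout = min(timeout * 2, cap)   # back off, up to the ceiling
      return elapsed

  print(transfer_time(100))   # 12220.0 -- over three hours for 100 packets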

Which is why I implemented the throttling in the bridge, which Mark 
mentioned.
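
For what it's worth, the kind of throttling meant here can be sketched 
as a simple pacing queue. This is a generic illustration in Python, not 
the actual bridge code, and the rate and queue size are invented 
numbers:

  import time
  from collections import deque

  class Throttle:
      def __init__(self, packets_per_second=100, max_queue=64):
          self.interval = 1.0 / packets_per_second
          self.queue = deque(maxlen=max_queue)  # overflow drops the oldest
          self.next_send = 0.0

      def enqueue(self, packet):
          self.queue.append(packet)

      def pump(self, send):
          """Call frequently; forwards at most one packet per interval."""
          now = time.monotonic()
          if self.queue and now >= self.next_send:
              send(self.queue.popleft())
              self.next_send = now + self.interval

The idea is simply to never hand the slow endpoint packets faster than 
it can empty its receive buffers, so excess packets queue (or drop) at 
the bridge instead of overrunning the interface on the far side.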

As far as path splitting goes, it is implemented in RSX-11M-PLUS, but 
disabled. I tried enabling it once, but the system crashed. The manuals 
have it documented, but I'm wondering if DEC never actually completed 
the work.

>> I have not analyzed other DECnet implementation enough to tell for sure if they also exhibit the same problem.
> 
> Another consideration is that TCP has seen another 20 years of work on congestion control since DECnet Phase IV.  But in any case, it may well be that VMS handles these things better.  It's also possible that DECnet/OSI does, since it is newer and was designed right around the time that DEC very seriously got into congestion control algorithm research.  Phase IV isn't so well developed; it largely predates that work.

Well, this isn't really about congestion control so much as just being 
able to handle out of order packets. Although congestion control could 
certainly also be applied to alleviate the problem.

I know that OSI originally stated the same basic assumption DECnet has 
- that links are 100% reliable and never drop or reorder packets.
That is a very bad assumption to build protocols on, and OSI eventually 
also defined links and operations for technologies where these 
assumptions do not hold. So I would hope/assume that DECnet/OSI 
eventually got better. But I strongly suspect that was not the case 
from the start.

   Johnny

-- 
Johnny Billquist                  || "I'm on a bus
                                   ||  on a psychedelic trip
email: bqt at softjar.se             ||  Reading murder books
pdp is alive!                     ||  tryin' to stay hip" - B. Idol

