[Simh] Cluster communications errors

Paul Koning paulkoning at comcast.net
Wed Jul 18 20:29:18 EDT 2018



> On Jul 18, 2018, at 8:22 PM, Johnny Billquist <bqt at softjar.se> wrote:
> 
> On 2018-07-19 02:07, Paul Koning wrote:
>>> On Jul 18, 2018, at 7:18 PM, Johnny Billquist <bqt at softjar.se> wrote:
>>> 
>>>> ...
>>> 
>>> It's probably worth pointing out that the reason I implemented that was not because of hardware problems, but because of software problems. DECnet can degenerate pretty badly when packets are lost. And if you shove packets fast enough at the interface, the interface will (obviously) eventually run out of buffers, at which point packets will be dropped.
>>> This is especially noticeable in DECnet/RSX at least. I think I know how to improve that software, but I have not had enough time to actually try fixing it. And it is especially noticeable when doing file transfers over DECnet.
>> All ARQ protocols suffer dramatically with packet loss.  The other day I was reading a recent paper about high speed long distance TCP.  It showed a graph of throughput vs. packet loss rate.  I forgot the exact numbers, but it was something like 0.01% packet loss rate causes a 90% throughput drop.  Compare that with the old (1970s) ARPAnet rule of thumb that 1% packet loss means 90% loss of throughput.  Those both make sense; the old one was for "high speed" links running at 56 kbps, rather than the multi-Gbps of current links.
>> The other thing with nontrivial packet loss is that in any protocol whose congestion control algorithms are triggered by packet loss (such as recent versions of DECnet), the flow control machinery will severely throttle the link under such conditions.
>> So yes, anything you can do in the infrastructure to keep the packet loss well under 1% is going to be very helpful indeed.
> 
> Right. That said, TCP behaves much better than DECnet here. At least if we talk about TCP with the ability to deal with out-of-order packets (which most should do) and DECnet under RSX. The problem with DECnet under RSX is that recovering from a packet lost to congestion essentially guarantees that congestion will happen again, while TCP pretty quickly comes into a steady working state.

Out-of-order packet handling isn't involved in that.  Congestion doesn't reorder packets.  If you drop a packet, TCP and DECnet both force the retransmission of all packets starting with the dropped one.  (At least, I don't think selective ACK is used in TCP.)  DECnet describes out-of-order packet caching for the same reason TCP does: to work efficiently in packet topologies that have multiple paths in which the routers do equal-cost path splitting.  In DECnet, that support is optional; it's not in DECnet/E and I wouldn't expect it in other 16-bit platforms either.
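
The cost of "retransmit everything starting with the dropped one" can be illustrated with a small counting sketch. This is a toy model under stated assumptions, not a model of DECnet NSP or of any real TCP stack: the sender transmits one full window per round, the receiver accepts only in-order packets, and losses are given up front as a set of (sequence number, attempt number) pairs. The function names and the drops format are hypothetical, chosen only for this illustration.

```python
def go_back_n_transmissions(num_packets, window, drops):
    """Count transmissions needed to deliver num_packets in order.

    drops is a set of (seq, attempt) pairs that the network loses,
    with attempt counted from 0 for each sequence number.
    Assumption: one full window is sent per round, and the first loss
    in a round forces everything from that packet onward to be resent.
    """
    base = 0                      # lowest sequence number not yet delivered
    sent = 0                      # total transmissions so far
    attempts = [0] * num_packets  # per-packet transmission count
    while base < num_packets:
        end = min(base + window, num_packets)
        first_lost = None
        for seq in range(base, end):
            sent += 1
            lost = (seq, attempts[seq]) in drops
            attempts[seq] += 1
            if lost and first_lost is None:
                first_lost = seq
        # Without out-of-order caching, packets sent after the first
        # loss in this round are discarded by the receiver.
        base = end if first_lost is None else first_lost
    return sent


def selective_repeat_transmissions(num_packets, drops):
    """Same loss model, but only the lost packets themselves are resent."""
    sent = 0
    for seq in range(num_packets):
        attempt = 0
        while (seq, attempt) in drops:
            attempt += 1
        sent += attempt + 1
    return sent


if __name__ == "__main__":
    one_loss = {(0, 0)}  # first transmission of packet 0 is dropped
    print(go_back_n_transmissions(16, 8, one_loss))
    print(selective_repeat_transmissions(16, one_loss))
```

In this model a single loss at the start of an 8-packet window costs a full extra window of transmissions, while a receiver that keeps the later packets needs only one extra transmission. A loss on the last packet of a window costs the same in both cases.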

> I have not analyzed other DECnet implementations enough to tell for sure if they also exhibit the same problem.

Another consideration is that TCP has seen another 20 years of work on congestion control since DECnet Phase IV.  But in any case, it may well be that VMS handles these things better.  It's also possible that DECnet/OSI does, since it is newer and was designed right around the time that DEC very seriously got into congestion control algorithm research.  Phase IV isn't so well developed; it largely predates that work.

	paul


