[Simh] Cluster communications errors
Mark Pizzolato
Mark at infocomm.com
Thu Jul 19 23:34:14 EDT 2018
On Thursday, July 19, 2018 at 8:18 PM, Hunter Goatley wrote:
> Another data point. After more playing around and several reboots,
> I can confirm that with tunneling using the host system's Ethernet
> device, communications with other cluster members only drops
> when DECnet is started.
> %%%%%%%%%%% OPCOM 19-JUL-2018 23:14:55.58 %%%%%%%%%%%
> Message from user DECNET on DARTH
> DECnet starting
>
> %CNXMAN, lost connection to system QUEST
> %CNXMAN, lost connection to system GALAXY
> %CNXMAN, re-established connection to system FASTER
> %CNXMAN, quorum lost, blocking activity
> %CNXMAN, re-established connection to system VADER
> %CNXMAN, re-established connection to system QUEST
> %CNXMAN, quorum regained, resuming activity
> That's not a full log, but as soon as I see the OPCOM message about
> DECnet starting, I get the "lost connection" messages, then the "re-established"
> messages, and then everything is fine afterward.
The improvement from setting the port speed to 10Mbit suggests
that packet loss/overruns are happening and that limiting the
wire speed reduces them.
If this weren't a cluster, I'd say that DECnet starting might have
caused the XQ device's MAC address to be changed around that
time to reflect the DECnet Phase IV address switch that is done,
which might then have some effect on the switch's learning
of MAC addresses... However, in a cluster this change is done
when the LAN device is first brought online, using the info in
the SYSGEN parameter SCS_SYSTEMID.
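For reference, here is a rough sketch of how a DECnet Phase IV address maps onto the MAC address that the LAN device is switched to. It assumes the usual Phase IV convention (the AA-00-04-00 prefix followed by the 16-bit value area*1024 + node, low byte first) and that SCS_SYSTEMID holds that same 16-bit value. The function name is made up for illustration and is not part of simh or VMS.

```python
def decnet_phase_iv_mac(area: int, node: int) -> str:
    """Return the Phase IV style MAC address for DECnet address area.node.

    Hypothetical helper for illustration only.
    """
    if not (1 <= area <= 63 and 1 <= node <= 1023):
        raise ValueError("area must be 1..63 and node must be 1..1023")
    # 16-bit DECnet address: 6 bits of area, 10 bits of node.
    # Assumption: SCS_SYSTEMID carries this same value on a cluster member.
    addr = area * 1024 + node
    low_byte = addr & 0xFF
    high_byte = addr >> 8
    # The address bytes are appended low byte first.
    return f"AA-00-04-00-{low_byte:02X}-{high_byte:02X}"


if __name__ == "__main__":
    print(decnet_phase_iv_mac(1, 1))
```

Comparing the result against the address the switch has learned for the simulated system is one way to check whether a MAC change is involved at all.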
The arrival of DECnet's traffic might be causing a burst of traffic
that still ends up overrunning another system's ability to receive
it. Do things change if you throttle the simh VAX down?
sim> SET CPU NOIDLE
sim> SET THROTTLE 25%
- Mark