[Simh] Cluster communications errors

Mark Pizzolato Mark at infocomm.com
Thu Jul 19 23:34:14 EDT 2018


On Thursday, July 19, 2018 at 8:18 PM, Hunter Goatley wrote:
> Another data point. After more playing around and several reboots, 
> I can confirm that with tunneling using the host system's Ethernet 
> device, communications with other cluster members only drops 
> when DECnet is started.
> %%%%%%%%%%%  OPCOM  19-JUL-2018 23:14:55.58  %%%%%%%%%%%
> Message from user DECNET on DARTH
> DECnet starting
>
> %CNXMAN,  lost connection to system QUEST
> %CNXMAN,  lost connection to system GALAXY
> %CNXMAN,  re-established connection to system FASTER
> %CNXMAN,  quorum lost, blocking activity
> %CNXMAN,  re-established connection to system VADER
> %CNXMAN,  re-established connection to system QUEST
> %CNXMAN,  quorum regained, resuming activity
> That's not a full log, but as soon as I see the OPCOM message about 
> DECnet starting, I get the "lost connection" messages, then the "re-established"
>  messages, and then everything is fine afterward.

The improvement from setting the port speed to 10Mbit suggests 
that packet loss/overruns are happening and that limiting the 
wire speed reduces them.

If this wasn't a cluster, I'd say that DECnet starting might have 
caused the XQ device's MAC address to be changed around that 
time to reflect the DECnet Phase IV address switch that is done, 
which might then have some effect on the switch's learning 
of MAC addresses...  However, in a cluster this change is done 
when the LAN device is first brought online, using the info in 
the SYSGEN parameter (SCS_SYSTEMID).
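
For reference, here is a minimal sketch of how a DECnet Phase IV 
address maps to the MAC address the LAN device ends up with. It 
assumes the usual Phase IV convention (16-bit address = area * 1024 
+ node, appended low byte first to the AA-00-04-00 prefix) and that 
the system ID in the SYSGEN parameter is that same 16-bit value. The 
function names are made up for illustration and are not part of simh 
or VMS.

```python
def phase4_address(area: int, node: int) -> int:
    """Return the 16-bit DECnet Phase IV address for area.node."""
    # Phase IV allows areas 1..63 and nodes 1..1023.
    if not (1 <= area <= 63 and 1 <= node <= 1023):
        raise ValueError("area must be 1..63 and node must be 1..1023")
    return area * 1024 + node


def phase4_mac(area: int, node: int) -> str:
    """Return the Ethernet MAC address derived from area.node."""
    addr = phase4_address(area, node)
    low, high = addr & 0xFF, addr >> 8
    # The 16-bit address follows the fixed prefix, low byte first.
    return f"AA-00-04-00-{low:02X}-{high:02X}"


if __name__ == "__main__":
    # Example node 1.1: address 1025, MAC AA-00-04-00-01-04
    print(phase4_address(1, 1), phase4_mac(1, 1))
```

Comparing the value computed this way against the MAC address the 
switch has learned for each cluster member is a quick way to rule 
an address change in or out.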

The arrival of DECnet's traffic might be causing a burst of traffic 
that still ends up overrunning another system's ability to receive 
it.  Do things change if you throttle the simh VAX down?

      sim> SET CPU NOIDLE
      sim> SET THROTTLE 25%

- Mark

