[Simh] Cluster communications errors

Hunter Goatley goathunter at goatley.com
Thu Jul 19 22:15:34 EDT 2018


Here's where we stand on our cluster communications errors: nothing we 
did worked. We tried different ports on the switch. We tried forcing 
1 Gbps. We tried forcing the port down to 10 Mbps. That actually seemed 
to help slightly, in that we only lost communications every 63 seconds 
or so, instead of every 15--60 seconds. But it would lose and 
re-establish connection to the cluster every 63 seconds.

So I decided to try setting up and using a TAP device, just to see what 
would happen.

Using the dedicated Ethernet card, it made no difference. It still lost 
communications every 63 seconds.

When I say dedicated Ethernet card, I probably should have stated 
earlier that it's a USB -> Ethernet device plugged into the system. I 
don't know what brand or model, but I can find out, if anyone wants to know.

So I decided to try tunneling through the "real" Ethernet port used by 
the Linux system. After figuring out what to do for the missing tunctl 
command under CentOS, I was able to set up a tunnel, and I did "attach 
xq tap:tap0". I then booted the system and wonder of wonders, miracle of 
miracles, it was seven minutes into the boot (yes, it takes a long time, 
mounting a slew of disks that needed to be rebuilt) before it lost 
communications. But it re-established them immediately, and as of my 
typing this, it has been twenty-nine minutes since that happened. No 
further drops. Normally, I wouldn't think twenty-nine minutes is enough 
to prove anything, but when it was dropping every 15--63 seconds for two 
solid days, this sounds like a fix to me.

So what does it mean? One thing it suggests is that the USB Ethernet 
device may be buggy or bad. I mean, it seems to work OK for TCP/IP 
communications, etc., but it sure sounds like it may be the part 
responsible for the problems. Especially since tunneling through the 
built-in Ethernet card seems to work and tunneling through the USB 
device did not.

These are the commands I used to set up the tap device for CentOS:

    brctl addbr br0
    ifconfig eno1 0.0.0.0          # eno1 is the host's Ethernet device
    ifconfig br0 XXX.XX.XX.XX up   # the IP address of the host system
    brctl addif br0 eno1
    brctl setfd br0 0
    #tunctl -t tap0
    ip tuntap add tap0 mode tap    # Replacement for tunctl on CentOS 7
    brctl addif br0 tap0
    ifconfig tap0 up
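
For anyone on a system that lacks brctl and ifconfig as well as tunctl, 
the same steps can probably be expressed with iproute2 alone. This is 
only a sketch, not something tested on the cluster host: the function 
name tap_bridge_cmds is made up, 192.0.2.10/24 is a placeholder 
documentation address (the real host address and prefix go there), and 
an older iproute2 may not accept the "forward_delay" bridge option. The 
function only prints the commands, so nothing is changed until the 
output is reviewed and run as root.

```shell
#!/bin/sh
# Print iproute2 equivalents of the brctl/ifconfig steps above.
# Hypothetical helper; arguments: host NIC, bridge name, tap name, host address/prefix.
tap_bridge_cmds() {
    nic=$1; br=$2; tap=$3; addr=$4
    cat <<EOF
ip link add name $br type bridge forward_delay 0
ip addr flush dev $nic
ip link set dev $nic master $br
ip addr add $addr dev $br
ip link set dev $br up
ip tuntap add dev $tap mode tap
ip link set dev $tap master $br
ip link set dev $tap up
EOF
}

# Show the commands for the device names used in this message.
# 192.0.2.10/24 is a placeholder, not the real host address.
tap_bridge_cmds eno1 br0 tap0 192.0.2.10/24
```

Piping the output to "sh -x" as root would apply the commands. As with 
the original sequence, moving the address from eno1 to br0 interrupts 
the host's own network connection briefly, so it is safer to do this 
from the console than over ssh.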

I then just did "attach xq tap:tap0" in the init file. I guess I should 
set up a special MAC address, but I haven't yet, and so far, nothing 
seems amiss.
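
If a fixed MAC address does turn out to be needed, one way to get one 
is to generate a locally administered address once and paste the 
resulting lines into the init file. The sketch below assumes the SIMH 
build in use accepts "set xq mac=" and that it should come before the 
attach; the helper name simh_xq_lines and the choice of 02 as the first 
octet (locally administered, unicast) are illustrative assumptions, not 
anything taken from the setup described here.

```shell
#!/bin/sh
# Print SIMH init-file lines that pin the XQ MAC address before attaching to a tap.
# Hypothetical helper; argument: tap device name.
simh_xq_lines() {
    tap=$1
    # Read five random octets and join them as lowercase hex pairs with dashes.
    tail=$(od -An -N5 -tx1 /dev/urandom | tr -s ' \n' ' ' | sed 's/^ //; s/ $//; s/ /-/g')
    # 02 as the first octet marks the address as locally administered unicast.
    printf 'set xq mac=02-%s\n' "$tail"
    printf 'attach xq tap:%s\n' "$tap"
}

# Prints a different address on every run, so run it once and keep the result.
simh_xq_lines tap0
```

Whether the cluster software later overrides this address (DECnet, for 
example, sets its own) is a separate question that this sketch does not 
address.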

While I thought having a dedicated Ethernet device would be the simplest 
thing, I can live with tunneling it through the shared Ethernet device, 
especially since it works and the former does not. ;-)

Thank you for all of your input over the past couple of days, and thank 
you for all of your work on SIMH!

Hunter
