[Simh] Cluster communications errors
Hunter Goatley
goathunter at goatley.com
Thu Jul 19 22:15:34 EDT 2018
Here's where we stand on our cluster communications errors: nothing we
did worked. We tried different ports on the switch. We tried forcing
1 Gbps. We tried forcing the port down to 10 Mbps. That actually seemed
to help slightly, in that we only lost communications every 63 seconds
or so, instead of every 15--60 seconds. But it still lost and
re-established its connection to the cluster every 63 seconds.
So I decided to try setting up and using a TAP device, just to see what
would happen.
With the TAP device on the dedicated Ethernet card, it made no
difference. It still lost communications every 63 seconds.
When I say dedicated Ethernet card, I probably should have stated
earlier that it's a USB -> Ethernet device plugged into the system. I
don't know what brand or model, but I can find out, if anyone wants to know.
So I decided to try tunneling through the "real" Ethernet port used by
the Linux system. After figuring out what to do for the missing tunctl
command under CentOS, I was able to set up a tunnel, and I did "attach
xq tap:tap0". I then booted the system and wonder of wonders, miracle of
miracles, it was seven minutes into the boot (yes, it takes a long time,
mounting a slew of disks that needed to be rebuilt) before it lost
communications. But it re-established them immediately, and as of my
typing this, it has been twenty-nine minutes since that happened. No
further drops. Normally, I wouldn't think twenty-nine minutes is enough
to prove anything, but when it was dropping every 15--63 seconds for two
solid days, this sounds like a fix to me.
So what does it mean? One thing it suggests is that the USB Ethernet
device may be buggy or bad. I mean, it seems to work OK for TCP/IP
communications, etc., but it sure sounds like it may be the part
responsible for the problems. Especially since tunneling through the
built-in Ethernet card seems to work and tunneling through the USB
device did not.
These are the commands I used to set up the tap device for CentOS:
brctl addbr br0
ifconfig eno1 0.0.0.0             # eno1 is the host's Ethernet device
ifconfig br0 XXX.XX.XX.XX up      # the IP address of the host system
brctl addif br0 eno1
brctl setfd br0 0
#tunctl -t tap0
ip tuntap add tap0 mode tap       # Replacement for tunctl on CentOS 7
brctl addif br0 tap0
ifconfig tap0 up
I then just did "attach xq tap:tap0" in the init file. I guess I should
set up a special MAC address, but I haven't yet, and so far, nothing
seems amiss.
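
For anyone on a system where brctl and ifconfig are not installed, the
bridge and tap steps above can probably be expressed with iproute2 alone.
What follows is only a sketch, not what was actually run here: the
function names run and setup_tap_bridge, the DRY_RUN variable, and the
192.0.2.10/24 address are all made up for illustration. By default it
only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: iproute2 equivalents of the brctl/ifconfig steps above.
# run, setup_tap_bridge and DRY_RUN are hypothetical names for this example.

# Print the command when DRY_RUN is 1 (the default); otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Arguments: host NIC, bridge name, tap name, host address in CIDR form.
setup_tap_bridge() {
    nic=$1; bridge=$2; tap=$3; addr=$4
    run ip link add name "$bridge" type bridge                # brctl addbr
    run ip addr flush dev "$nic"                              # ifconfig <nic> 0.0.0.0
    run ip addr add "$addr" dev "$bridge"                     # host IP moves to the bridge
    run ip link set dev "$bridge" up
    run ip link set dev "$nic" master "$bridge"               # brctl addif <bridge> <nic>
    run ip link set dev "$bridge" type bridge forward_delay 0 # brctl setfd <bridge> 0
    run ip tuntap add dev "$tap" mode tap                     # tunctl replacement
    run ip link set dev "$tap" master "$bridge"               # brctl addif <bridge> <tap>
    run ip link set dev "$tap" up
}
```

To review the commands, call "setup_tap_bridge eno1 br0 tap0 192.0.2.10/24".
To apply them, run the same call as root with DRY_RUN=0, ideally from the
console rather than over the network, since the host address is moved
from the Ethernet device to the bridge partway through.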
While I thought having a dedicated Ethernet device would be the simplest
thing, I can live with tunneling it through the shared Ethernet device,
especially since it works and the former does not. ;-)
Thank you for all of your input over the past couple of days, and thank
you for all of your work on SIMH!
Hunter