<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
Here's where we stand on our cluster communications errors: nothing
we did worked. We tried different ports on the switch. We tried
forcing 1 Gbps. We tried forcing the port down to 10 Mbps. That
actually seemed to help slightly, in that we lost communications
only every 63 seconds or so, instead of every 15--60 seconds. But
it still lost and re-established its connection to the cluster on
that 63-second cycle.<br>
<br>
So I decided to try setting up and using a TAP device, just to see
what would happen.<br>
<br>
Using the dedicated Ethernet card, it made no difference. It still
lost communications every 63 seconds.<br>
<br>
When I say dedicated Ethernet card, I probably should have stated
earlier that it's a USB -> Ethernet device plugged into the
system. I don't know what brand or model, but I can find out, if
anyone wants to know.<br>
<br>
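For anyone wanting to check which adapter and driver are involved, here
is a rough sketch of one way to look it up on Linux. The function name
"nic_driver" and the "SYSFS_ROOT" override are made up for this example,
and the interface name has to be filled in by hand.<br>
<blockquote>
<pre>
```shell
# Print the kernel driver bound to a network interface, or "unknown".
# SYSFS_ROOT is a hypothetical override (handy for testing); it
# defaults to the real /sys tree.
nic_driver() {
    iface=$1
    link="${SYSFS_ROOT:-/sys}/class/net/$iface/device/driver"
    if [ -L "$link" ]; then
        # The symlink target ends in the driver's name.
        basename "$(readlink "$link")"
    else
        echo unknown
    fi
}
```
</pre>
</blockquote>
Running "nic_driver" with the adapter's interface name, together with
the output of "lsusb", should be enough to identify the brand and
chipset.<br>
<br>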
So I decided to try tunneling through the "real" Ethernet port used
by the Linux system. After figuring out what to do for the missing
tunctl command under CentOS, I was able to set up a tunnel, and I
did "attach xq tap:tap0". I then booted the system and wonder of
wonders, miracle of miracles, it was seven minutes into the boot
(yes, it takes a long time, mounting a slew of disks that needed to
be rebuilt) before it lost communications. But it re-established
them immediately, and as of my typing this, it has been twenty-nine
minutes since that happened. No further drops. Normally, I wouldn't
think twenty-nine minutes is enough to prove anything, but when it
was dropping every 15--63 seconds for two solid days, this sounds
like a fix to me.<br>
<br>
So what does it mean? One thing it suggests is that the USB Ethernet
device may be buggy or bad. I mean, it seems to work OK for TCP/IP
communications, etc., but it sure sounds like it may be the part
responsible for the problems. Especially since tunneling through the
built-in Ethernet card seems to work and tunneling through the USB
device did not.<br>
<br>
These are the commands I used to set up the tap device for CentOS:<br>
<blockquote>
<pre>brctl addbr br0
ifconfig eno1 0.0.0.0           # eno1 is the host's Ethernet device
ifconfig br0 XXX.XX.XX.XX up    # the IP address of the host system
brctl addif br0 eno1
brctl setfd br0 0
#tunctl -t tap0
ip tuntap add tap0 mode tap     # Replacement for tunctl on CentOS 7
brctl addif br0 tap0
ifconfig tap0 up
</pre>
</blockquote>
I then just did "attach xq tap:tap0" in the init file. I guess I
should set up a special MAC address, but I haven't yet, and so far,
nothing seems amiss.<br>
<br>
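If a separate MAC address does turn out to be needed, one option is a
random, locally administered one. The sketch below is only an
illustration: "random_local_mac" is a made-up helper name, and whether a
random address suits a given cluster setup is an assumption to check.<br>
<blockquote>
<pre>
```shell
# Print a random, locally administered, unicast MAC address in
# dash-separated form (six two-digit hex octets).
random_local_mac() {
    # Read 5 random bytes as two-digit hex values.
    set -- $(od -An -N5 -tx1 /dev/urandom)
    # First octet 02: locally-administered bit set, multicast bit clear.
    printf '02-%s-%s-%s-%s-%s\n' "$@"
}
```
</pre>
</blockquote>
The result could then be handed to the simulator with a "set xq mac=..."
line ahead of the attach line in the init file.<br>
<br>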
While I thought having a dedicated Ethernet device would be the
simplest thing, I can live with tunneling it through the shared
Ethernet device, especially since it works and the former does not.
;-)<br>
<br>
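For anyone repeating the tap setup above, here is a rough sketch of how
the tap device could be removed again. It is not something from the
original setup: "run", "teardown_tap" and the "DRY_RUN" variable are
names invented for this example. It leaves br0 and eno1 alone so the
host keeps its network connection.<br>
<blockquote>
<pre>
```shell
# Run a command, or just print it when DRY_RUN=1 (DRY_RUN is a helper
# variable invented for this sketch).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Undo the tap part of the setup in reverse order. Needs root unless
# DRY_RUN=1. Defaults match the names used in the setup commands.
teardown_tap() {
    bridge=${1:-br0}
    tap=${2:-tap0}
    run ifconfig "$tap" down                 # take the tap interface down
    run brctl delif "$bridge" "$tap"         # detach it from the bridge
    run ip tuntap del dev "$tap" mode tap    # delete the tap device
}
```
</pre>
</blockquote>
Setting DRY_RUN=1 before calling "teardown_tap" prints the three
commands instead of running them, which is a safe way to check them
first.<br>
<br>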
Thank you for all of your input over the past couple of days, and
thank you for all of your work on SIMH!<br>
<br>
Hunter<br>
<br>
</body>
</html>