Archive for August, 2006

Update – Netboot on RHEL x86 (32bit) with Broadcom (tg3) – no network

Sunday, August 27th, 2006

In my post just below, I defined a set of tests to verify the possible cause of the tg3 problem. It turned out to have nothing to do with auto-negotiation, and it was fixed in RHEL 4 Update 4: that release's 32bit installer works correctly.

One last thing to test: rebuild the installer initrd and replace the tg3 module with one built from source for this kernel (for example, HP's tg3 driver from the ProLiant Support Pack). I wonder whether it will work.

Netboot on RHEL x86 (32bit) with Broadcom (tg3) – no network

Thursday, August 24th, 2006

I have a PXE installation server set up, and it usually works quite well. I tried to install a Tyan-based system using this setup, but this time with RHEL4 U3 X86 rather than the usual X86_64.

The RH installer starts by asking a few questions (language, keyboard, installation method) and then fails to obtain an IP address via DHCP. Even setting the IP address manually results in no communication.

A friend suggested an idea, which I will try today: since the 1Gb/s Broadcom card is connected to a 100Mb/s switch, I should disable auto-negotiation and set a fixed speed for the card. We'll see whether it works, and whether it lets the 32bit installer work. The 64bit installer works fine, by the way.
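On a running Linux system, the friend's suggestion corresponds to ethtool's `-s` option with `speed`, `duplex` and `autoneg off`. The Python sketch below only builds and prints that command unless told otherwise. The interface name `eth0`, full duplex, and the helper names are assumptions, and whether the installer environment ships ethtool at all is not something this post covers.

```python
import shlex
import subprocess


def ethtool_force_speed_cmd(iface: str, speed_mbps: int = 100, duplex: str = "full") -> list[str]:
    """Build an ethtool command that turns off auto-negotiation and forces speed/duplex."""
    if duplex not in ("full", "half"):
        raise ValueError("duplex must be 'full' or 'half'")
    if speed_mbps not in (10, 100, 1000):
        raise ValueError("speed must be 10, 100 or 1000 Mb/s")
    return ["ethtool", "-s", iface,
            "speed", str(speed_mbps),
            "duplex", duplex,
            "autoneg", "off"]


def apply(cmd: list[str], dry_run: bool = True) -> None:
    """Print the command (default) or actually run it (needs root)."""
    if dry_run:
        print(shlex.join(cmd))
    else:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # eth0 is an assumption: use whatever interface the system actually exposes.
    apply(ethtool_force_speed_cmd("eth0"))
```

Keeping the dry run as the default makes it safe to check the exact command before forcing the link settings, which must also match what the switch port expects.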

Tyan Thunder K8QE and Linux RHEL 4 Update 3

Sunday, August 13th, 2006

This is a tricky board. With 4GB of RAM or more, it behaves strangely under Linux; it appears that 32bit PCI mapping doesn't work correctly.

To let Linux run on this hardware without failures (such as a kernel crash during startup), follow these three simple guidelines:

1. Spread the memory equally across all CPUs. For example, if you have 4GB of RAM in the four-CPU version (8 cores, in my case), install 1GB next to each CPU.

2. Make sure the OS type in the BIOS is set to Linux. PCI mapping won't work otherwise.

3. Do not put 32bit PCI cards in the PCI-X slots; doing so renders the onboard network cards unusable.
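Guideline 1 is simple arithmetic, but it is easy to get wrong when reusing DIMMs from another machine. The Python sketch below splits a total amount of RAM evenly across CPU sockets and checks a planned per-CPU layout. The function names and the idea of describing the layout as a list of gigabytes per CPU are illustrative assumptions, not something the board or BIOS provides.

```python
def per_cpu_memory(total_gb: int, cpus: int) -> list[int]:
    """Split total RAM evenly across CPU sockets (guideline 1)."""
    if cpus <= 0:
        raise ValueError("need at least one CPU")
    if total_gb % cpus:
        raise ValueError(f"{total_gb}GB cannot be split evenly across {cpus} CPUs")
    return [total_gb // cpus] * cpus


def is_balanced(layout_gb: list[int]) -> bool:
    """True when every CPU has the same, non-empty amount of local memory."""
    return bool(layout_gb) and len(set(layout_gb)) == 1 and layout_gb[0] > 0


if __name__ == "__main__":
    # The example from the post: 4GB on the four-CPU board.
    print(per_cpu_memory(4, 4))
    # A hypothetical unbalanced layout: all memory on the first two CPUs.
    print(is_balanced([2, 2, 0, 0]))
```

For the post's example, 4GB over four CPUs gives 1GB per CPU; a layout such as 2GB, 2GB, 0, 0 is rejected.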

HP-UX – allowed shells, and connecting FC Multipath to NetApp

Thursday, August 10th, 2006

When adding a shell to an HP-UX system, for example /usr/bin/tcsh, users set to use this shell will not be able to FTP to the machine until the shell has an entry in /etc/shells. The trick is that if the file doesn't exist, you have to create it. By default, HP-UX allows only the /sbin/sh and /bin/sh shells, but once you set up this file, you can allow more shells. Note that you must include /sbin/sh and /bin/sh in /etc/shells; otherwise, other things might not work correctly. Taken from here.
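The rule above (create /etc/shells if it is missing, always keep /sbin/sh and /bin/sh, then add the new shell) can be scripted. The Python sketch below appends only the missing entries and leaves existing lines and comments untouched. The function name is hypothetical, and the demo writes to a scratch file; pointing it at the real /etc/shells requires root.

```python
import tempfile
from pathlib import Path

# HP-UX default shells: keep them listed so existing accounts keep working.
REQUIRED_SHELLS = ["/sbin/sh", "/bin/sh"]


def ensure_shells(path: Path, extra: list[str]) -> list[str]:
    """Create or extend an /etc/shells-style file. Returns the entries it added."""
    text = path.read_text() if path.exists() else ""
    present = {
        entry
        for entry in (line.strip() for line in text.splitlines())
        if entry and not entry.startswith("#")
    }
    # Required shells first, then the extra ones; drop duplicates, keep order.
    missing = list(dict.fromkeys(s for s in [*REQUIRED_SHELLS, *extra] if s not in present))
    if missing:
        # Don't glue a new entry onto a last line that lacks a newline.
        prefix = "" if text == "" or text.endswith("\n") else "\n"
        path.write_text(text + prefix + "".join(s + "\n" for s in missing))
    return missing


if __name__ == "__main__":
    # Demo on a scratch file; use Path("/etc/shells") as root for real use.
    demo = Path(tempfile.mkdtemp()) / "shells"
    print(ensure_shells(demo, ["/usr/bin/tcsh"]))
    print(demo.read_text(), end="")
```

Appending rather than rewriting the whole file keeps any local entries and comments an administrator already added.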

Connecting HP-UX to SAN storage is never entirely simple. The list of steps is:

1. Install HP-UX drivers for the FC adapter

2. Map the machine's PWWN (obtained by reading the sticker on the back of the machine, or by querying the storage/SAN switch) to the relevant LUNs.

3. Run “/usr/sbin/ioscan -fnC disk” and see that the new disk devices are detected.

4. Run “/usr/sbin/ioinit -i” to create the relevant device files.
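Steps 3 and 4 can be wrapped so that you see which disk device files appeared after mapping the LUNs. The Python sketch below runs `ioscan -fnC disk`, then `ioinit -i`, then `ioscan` again, and reports the `/dev/dsk/cXtYdZ` names that are new. It assumes the legacy HP-UX device naming (HP-UX 11i v3 agile names such as /dev/disk/diskN are not matched), and the helper names are hypothetical. Steps 1 and 2 remain manual.

```python
import re
import subprocess

IOSCAN = ["/usr/sbin/ioscan", "-fnC", "disk"]  # step 3: probe and list disks
IOINIT = ["/usr/sbin/ioinit", "-i"]            # step 4: create device files

# Legacy HP-UX block device names, e.g. /dev/dsk/c4t0d1 (raw /dev/rdsk/... is not matched).
DSK_RE = re.compile(r"/dev/dsk/c\d+t\d+d\d+")


def dsk_devices(ioscan_output: str) -> set[str]:
    """Extract block device file names from ioscan -fn output."""
    return set(DSK_RE.findall(ioscan_output))


def new_devices(before: str, after: str) -> list[str]:
    """Device files present after ioinit that were not listed before."""
    return sorted(dsk_devices(after) - dsk_devices(before))


def run(cmd: list[str]) -> str:
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def attach_new_luns() -> list[str]:
    """Run steps 3 and 4 on the HP-UX host (as root) and return the new disks."""
    before = run(IOSCAN)
    run(IOINIT)
    after = run(IOSCAN)
    return new_devices(before, after)
```

An empty result usually means the mapping from step 2 has not taken effect yet, which fits the reboot note that follows.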

A note: HP-UX might require a reboot after the initial connection. In several cases I've noticed that if the server had been running for a while with the fiber disconnected, only connecting it before startup resulted in a link and in SAN registration. Of course, the driver must already be installed by then.

If you are going to connect your HP-UX machine to a NetApp device, as we did, allow a day (or more) in advance and open a “now” account at http://now.netapp.com. There you can find documentation about HP-UX (including step-by-step instructions), the “SAN Attach Kit for HP-UX”, which will make your life easier, and a set of best-practice guides. Just follow these guides, and you will find it an easy and simple task.

Troubleshooting weird networking problem

Wednesday, August 9th, 2006

The problem was as follows: a Linux server is connected to a 1Gb/s LAN through a 1Gb/s interface.

I was told that SSH connections to the machine failed with a “socket error” when made from Windows/PuTTY. One test was done using a Linux ssh client, and it went fine. A switch was replaced, and other diagnostic methods showed weird results.

When I arrived on site, I started with the usual procedure: ifconfig (to verify there were no TX or RX errors), dmesg, /var/log/messages, and ethtool. All produced the results expected when everything is working fine. I even switched network interfaces (using the 2nd Ethernet port on the server), but to no avail.
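The TX/RX error check from ifconfig can also be read straight from /proc/net/dev on Linux, which is handy for comparing counters before and after a failed connection. The Python sketch below parses that file's standard layout (two header lines, then 8 receive and 8 transmit counters per interface) and returns the receive and transmit error counts. The function name is hypothetical.

```python
import os


def interface_errors(proc_net_dev: str) -> dict[str, tuple[int, int]]:
    """Return {interface: (rx_errs, tx_errs)} from /proc/net/dev contents."""
    result = {}
    for line in proc_net_dev.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        # Older kernels may omit the space after the colon, so split on ':' first.
        name, data = line.split(":", 1)
        fields = data.split()
        # Receive:  bytes packets errs drop fifo frame compressed multicast
        # Transmit: bytes packets errs drop fifo colls carrier compressed
        result[name.strip()] = (int(fields[2]), int(fields[10]))
    return result


if __name__ == "__main__":
    if os.path.exists("/proc/net/dev"):
        with open("/proc/net/dev") as f:
            for iface, (rx, tx) in interface_errors(f.read()).items():
                print(f"{iface}: rx_errs={rx} tx_errs={tx}")
```

As in the post, clean counters here do not rule out problems above the link layer, such as an address conflict.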

The actual symptoms were a bit different: clients were generally unable to connect to the server over SSH on the first attempt, but were able to connect on the next one. You can't run your Oracle server on such a setup…

I escalated my tests to tcpdump, which showed only part of the expected information and produced too much noise to extract anything useful.

Using remote desktop from another server to a client's desktop, we encountered the same problem: failure the first time, then success. And then it hit me! On another desktop (the third or fourth one), I looked at the output of “arp -a” (Windows desktop) right after the first failure, and saw that the MAC address mapped to the server's IP was the wrong one. Some other machine on the network had the same IP address. Changing the Linux server's IP address to a free one seems to have solved everything, resulting in a properly working server and some free time devoted to hunting down the renegade spoofing machine.
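The check that found the culprit, comparing the MAC address a Windows client has cached for the server's IP with the server's real MAC, can be automated. The Python sketch below parses `arp -a` output in the usual Windows layout (IP address, physical address with dashes, type) and flags a mismatch. The IP and MAC values in the usage lines are hypothetical, and the server's real MAC would come from ifconfig on the server.

```python
import re

# Matches Windows "arp -a" rows such as "  10.0.0.5   00-aa-bb-cc-dd-ee   dynamic".
ARP_LINE = re.compile(
    r"^\s*(\d{1,3}(?:\.\d{1,3}){3})\s+([0-9A-Fa-f]{2}(?:[-:][0-9A-Fa-f]{2}){5})(?:\s|$)"
)


def normalize_mac(mac: str) -> str:
    """Use one format for Windows (00-AA-...) and Linux (00:aa:...) MACs."""
    return mac.replace("-", ":").lower()


def arp_table(output: str) -> dict[str, str]:
    """Map IP -> MAC from arp -a output; headers like 'Interface: ...' are skipped."""
    table = {}
    for line in output.splitlines():
        m = ARP_LINE.match(line)
        if m:
            table[m.group(1)] = normalize_mac(m.group(2))
    return table


def ip_conflict_suspected(output: str, server_ip: str, server_mac: str) -> bool:
    """True if the client has cached a different MAC for the server's IP."""
    seen = arp_table(output).get(server_ip)
    return seen is not None and seen != normalize_mac(server_mac)


if __name__ == "__main__":
    # Hypothetical arp -a output captured right after a failed first connection.
    captured = "Interface: 10.0.0.20 --- 0x2\n  10.0.0.5   00-aa-bb-cc-dd-ee   dynamic\n"
    print(ip_conflict_suspected(captured, "10.0.0.5", "00:11:22:33:44:55"))
```

Because the wrong entry only shows up in the cache until the real server answers, the output has to be captured right after the failure, as in the story above.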