Packet drops can happen at two layers: at the NIC level or at the network stack level.
Check the 'ifconfig' output:
RX packets:297126179 errors:0 dropped:3981 overruns:0 frame:0
TX packets:233589229 errors:0 dropped:0 overruns:0 carrier:0
That means packets are being dropped at the NIC level. These drops are most likely caused by exhaustion of the RX ring buffer, so increase the size of the Ethernet device's ring buffer.
First inspect the output of "ethtool -g eth0". If the "Pre-set maximums" are higher than what's listed under the current hardware settings, it's recommended to increase this number. As an example:
# ethtool -g eth0
Ring parameters for eth0:
Pre-set maximums:
RX: 1020
RX Mini: 0
RX Jumbo: 16320
TX: 255
Current hardware settings:
RX: 255
RX Mini: 0
RX Jumbo: 0
TX: 255
To increase the RX ring buffer to its pre-set maximum of 1020, you would run "ethtool -G eth0 rx 1020".
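Whether the larger ring actually stops the drops can be verified from the per-interface counters under /sys. A minimal sketch, assuming the interface is eth0 and a 60-second observation window:
# record the drop counter, resize the ring, then check again after a minute
before=$(cat /sys/class/net/eth0/statistics/rx_dropped)
ethtool -G eth0 rx 1020
sleep 60
after=$(cat /sys/class/net/eth0/statistics/rx_dropped)
echo "packets dropped in the last minute: $((after - before))"
If the difference stays at zero under normal load, the ring buffer exhaustion has been addressed.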
Tuesday, June 29, 2010
Sunday, June 27, 2010
when we bring the node back, failback is not happening
When the poweroff command is given through the terminal, failover works, but when we bring the node back, why is failback not happening?
Sunday, June 20, 2010
kernel panic due to openafs filesystem
By the looks of the logs, the kernel panic occurred due to 'openafs' file system corruption.
Tuesday, June 15, 2010
configure guest virtual machines to use SAN storage using libvirt
Provision a new logical unit on iSCSI or fibre channel storage. Use virsh to trigger a scan for it, and confirm that it appears correctly.
To discover logical units on a particular HBA, create a pool for that HBA using:
virsh pool-create hbapool.xml
where hbapool.xml contains:
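A minimal example of a SCSI (HBA) pool definition, assuming the HBA is host6 so that the pool name matches the commands below, would be:
<pool type='scsi'>
  <name>host6</name>
  <source>
    <adapter name='host6'/>
  </source>
  <target>
    <path>/dev/disk/by-path</path>
  </target>
</pool>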
Confirm that all the appropriate logical units are visible as volumes with:
virsh vol-list host6
After creating the pool, add a new logical unit on a target that's visible on that host and refresh the pool with:
virsh pool-refresh host6
and confirm that the new storage is visible. Note that the refresh code only scans for new LUs on existing targets and does not issue a LIP to discover new targets as that would be disruptive to I/O.
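To hand the new logical unit to a guest, one option is to look up its stable device path and attach it as a disk. A rough sketch, where myguest and the volume name are placeholders for whatever 'virsh vol-list host6' reported:
# print the /dev/disk/by-path device backing the volume
virsh vol-path <volume-name> --pool host6
# attach that device to the running guest as vdb
virsh attach-disk myguest <device-path-from-above> vdb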
Thursday, June 10, 2010
backup and restore partition table
Backing up the partition table (the first 512-byte sector of the disk):
dd if=/dev/device of=/path/to/backup bs=512 count=1
Restore
dd if=/path/to/backup of=/dev/device bs=512 count=1
You can make sure the data is intact using 'hexdump':
hexdump /path/to/backup
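If the disk uses a conventional MBR, the last two bytes of that 512-byte backup should be the 55 aa boot signature; a quick way to spot-check just those bytes (510 is the decimal offset of the signature):
hexdump -C -s 510 /path/to/backup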
Tuesday, June 8, 2010
gfs_controld mount: not in default fence domain
The above log message indicates that either the fence daemon was not running, the node was not a full member of the cluster, or the node had joined the cluster in an unstable state. If any node fails to join the fence domain, shared file systems will hang.
To confirm the above, run the following commands:
# group_tool dump fence | grep members
and
# cman_tool services
type level name id state
fence 0 default 00000000 JOIN_START_WAIT
[1]
dlm 1 clvmd 00000000 none
If the state is "JOIN_START_WAIT", the above description of the problem is correct.
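If you want to script this check, a minimal sketch that simply greps the cman_tool output for the stuck state:
# warn if the fence group is stuck in JOIN_START_WAIT on this node
if cman_tool services | grep -q JOIN_START_WAIT; then
    echo "fence domain join is stuck; GFS mounts on this node may hang"
fi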