Linux Troubleshooting

Tuesday, October 26, 2010

SIOCADDRT: No such device or SIOCADDRT: Network is unreachable

1. Run 'ifconfig ethX' to check whether an IP address has been assigned to the interface. If not, assign an IP address before adding the route.

Wednesday, October 20, 2010

Write protecting the kernel read-only data

There was a bug in a previous version; please update to the latest version. That should get rid of these error messages.

Monday, October 11, 2010

upgrade from RHEL4 to RHEL5

Upgrading between major versions is not supported. Install the required version from scratch.

Sunday, September 5, 2010

linux bonding does not double bandwidth

For a 1 Gbit/s NIC, the theoretical maximum is 125 MB/s (1000 Mbit/s divided by 8).
In the real world you get considerably less once the overhead of scp, wget and other protocols is added.
If you are currently using mode 0 (balance-rr), packets can arrive out of order, which can reduce throughput.
There may be a slight improvement if you use mode 4 (802.3ad).
With mode 4 the distribution of incoming traffic is controlled by the switch, so there should be no packet re-ordering issues or packet drops.
Even with mode 4, you cannot reach 2 Gbit/s for a single connection. You only get the 2 Gbit/s aggregate bandwidth across multiple connections.

There is an assumption that bonding multiple network cards doubles the bandwidth. That is not true.
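The arithmetic behind the per-link ceiling can be sketched as follows. Network speeds are quoted in decimal bits, so 1 Gbit/s is 1000 Mbit/s, and dividing by 8 gives megabytes per second. The function name link_mbytes_per_sec is only an illustration, and the result ignores all protocol overhead.

```shell
# Convert a link speed in Mbit/s to decimal megabytes per second.
# This is the raw line rate; real transfers are slower.
link_mbytes_per_sec() {
    echo $(( $1 / 8 ))
}

link_mbytes_per_sec 1000   # one 1 Gbit/s NIC: ceiling for a single connection
link_mbytes_per_sec 2000   # two bonded NICs: aggregate over several connections
```

The second figure is only reachable when traffic is spread over more than one connection.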

Thursday, August 5, 2010

GFS2 lockdump analysis

G: s:EX n:2/1fda70 f:Dy t:EX d:UN/132055446000 l:0 a:0 r:5
 H: s:EX f:H e:0 p:1405 [httpd] gfs2_write_begin+0x61/0x33e [gfs2]
 H: s:EX f:W e:0 p:31819 [umount] gfs2_write_inode+0x57/0x152 [gfs2]

The file is a series of lines. Each line starting with G: represents one glock, and the following lines, indented by a single space, each represent an item of information relating to the glock immediately before them in the file.

Lines in the debugfs file starting with H: (holders) represent lock requests that are either granted or waiting to be granted.

The flags field (f:) on a holder line shows which: the 'W' flag marks a waiting request and the 'H' flag marks a granted request.

The glocks which have large numbers of waiting requests are likely to be those which are experiencing particular contention.
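One way to spot contended glocks is to count the waiting holders under each G: line. The sketch below assumes the dump layout shown above (an n: field on the G: line, an f: field on each H: line). The function name count_glock_waiters is made up for this example.

```shell
# Count holders flagged W (waiting) per glock, busiest glock first.
# Reads the files given as arguments, or standard input if there are none.
count_glock_waiters() {
    awk '
        # Remember the number of the glock we are currently inside.
        /^G:/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^n:/) glock = substr($i, 3)
        }
        # A holder line: count it if its flags field contains W.
        /^ *H:/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^f:/ && substr($i, 3) ~ /W/) waiters[glock]++
        }
        END { for (g in waiters) print waiters[g], g }
    ' "$@" | sort -rn
}

# Usage (the path is an assumption; it depends on where debugfs is mounted
# and on the filesystem name):
#   count_glock_waiters /sys/kernel/debug/gfs2/<fsname>/glocks | head
```

Each output line is a waiter count followed by the glock number in type/number form.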

Having identified a glock which is causing a problem, the next step is to find out which inode it relates to. The glock number (n: on the G: line) indicates this. It is of the form type/number and if type is 2, then the glock is an inode glock and the number is an inode number. To track down the inode, you can then run find -inum number where number is the inode number converted from the hex format in the glocks file into decimal.
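The hex to decimal conversion can be done by the shell itself. A small sketch, using the glock number from the dump above; the function name glock_to_inum and the mount point /mnt/gfs2 are assumptions for illustration only.

```shell
# Turn a glock number such as 2/1fda70 into a decimal inode number.
glock_to_inum() {
    # Drop everything up to the slash, then let printf read the rest as hex.
    printf '%d\n' "0x${1#*/}"
}

glock_to_inum 2/1fda70

# Then search the GFS2 mount for that inode (mount point is hypothetical):
#   find /mnt/gfs2 -xdev -inum "$(glock_to_inum 2/1fda70)"
```

This only makes sense for glocks of type 2, since other types do not carry an inode number.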

If the glock that was identified was of a different type, then it is most likely to be of type 3 (resource group). If you see significant numbers of processes waiting for other types of glock under normal loads, then please report this to Red Hat support.

If you do see a number of waiting requests queued on a resource group lock, there may be a number of reasons for this. One is that there are a large number of nodes compared to the number of resource groups in the filesystem. Another is that the filesystem may be very nearly full (requiring, on average, longer searches for free blocks). The situation in both cases can be improved by adding more storage and using the gfs2_grow command to expand the filesystem.

Abbreviations seen in the dump:
W flag: the holder is waiting for a glock
UN: unlocked state
SH: shared lock

Monday, August 2, 2010

du command usage

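1. du -hs * (shows the total size of each file and directory in the current directory, in human-readable units)
2. du * | sort -n (sorts the entries by size, so the largest ones end up at the bottom)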
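The two commands can be combined into a small helper that lists only the biggest entries of a directory. This is a sketch; the function name largest_entries is invented here, and sizes are printed in kilobytes so that sort can compare them numerically.

```shell
# Show the N largest files/directories directly under a directory.
# $1: directory to inspect, $2: number of entries to show (default 10)
largest_entries() {
    # Run in a subshell so the caller's working directory is not changed.
    ( cd "$1" && du -sk -- * | sort -rn | head -n "${2:-10}" )
}

# Usage:
#   largest_entries /var/log 5
```

Note that the * glob skips hidden entries (names starting with a dot).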

Friday, July 2, 2010

bash: fork: Resource temporarily unavailable

The error means the system could not create a new process. Typically the per-user process limit has been reached or the process table is full, and a common cause is defunct (zombie) processes piling up. The general life cycle of a process is as follows.

This is the normal life of a program.
fork()INIT->exec()->RUN->exit()->ZOMBIE->done

INIT (fork)
The program is started by a parent process, an action called fork().
The fork makes a copy (the child) of the calling process (the parent).

exec()
The child then issues an exec() system call, which replaces the new process with the intended executable file.

SRUN/URUN (system/user run space)
The new child program now runs. The parent either waits (in a SLEEP) for the child to finish, checks for the child's completion, or is notified by the system when the child exits.

exit()
The child exits and returns its resources (memory) to the system.

ZOMBIE
At this point the child has terminated and is a ZOMBIE. THIS IS NORMAL!! It stays in this state until the parent process acknowledges the child's exit or terminates.

If the parent process has died unexpectedly, process ID #1 (init) takes over as the child's parent and cleans up the zombie. If the parent is still alive but something prevents it from acknowledging the child, the zombie stays around until the parent exits or the server is rebooted.

So...
A zombie does not tie up memory but it still has a slot in the process table. I/O devices can get locked out.

You can't kill a ZOMBIE process because......IT'S ALREADY DEAD!!!

So finally, check for defunct processes on the server. You can list them with the ps -dfa command (they are marked <defunct>). Since the zombie itself cannot be killed, kill or restart its parent process so that the zombie gets cleaned up. If this works, you can simply log in to the shell again; if not, you will need to reboot the server to clear the process table.
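Zombies can also be picked out by their process state, which starts with Z in ps output. The sketch below prints each zombie's PID together with its parent's PID, since the parent is the process to deal with. The function names filter_zombies and list_zombies are made up for this example, and the ps options assume the Linux procps version of ps.

```shell
# Keep only lines whose third column (process state) starts with Z,
# and print the zombie's PID and its parent's PID.
filter_zombies() {
    awk '$3 ~ /^Z/ { print $1, $2 }'
}

# List zombies on the running system as "PID PPID" pairs.
list_zombies() {
    ps -eo pid=,ppid=,stat=,comm= | filter_zombies
}

# Usage:
#   list_zombies
#   kill <PPID>    # signal the parent, not the zombie
```

An empty result means there are no zombies, and the fork error has another cause, such as the user's process limit (see ulimit -u).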