Lychee system
ssh Hostbased Authentication
For the queue system to transfer data to and from the cluster nodes (mango*) smoothly, ssh host-based authentication must be set up correctly.
- /etc/ssh/sshd_config on servers (in practice every node and lychee) must have the following lines:
AllowUsers root *@mango* *@lychee*
HostbasedAuthentication yes
IgnoreUserKnownHosts yes
- /etc/ssh/ssh_config on clients (mango* & lychee) must have:
Host *
    HostbasedAuthentication yes
    EnableSSHKeysign yes
- /etc/ssh/ssh_known_hosts2 stores the protocol 2 ssh public host keys, which can be obtained with:
ssh-keyscan -vt rsa mango02 >> /etc/ssh/ssh_known_hosts2
Different entries can share the same key, as long as the host machines use the same ssh_host_rsa_key key pair (recommended).
- /etc/hosts.equiv lists every possible host name and address, one per line, for example:
mango01
192.168.0.3
mango02
192.168.0.4
....
lychee
lychee.md.smms.uq.edu.au
192.168.1.249
...
- restart sshd server and it should work.
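Before restarting sshd, the directives listed above can be checked with a small script. The following is only a sketch: the helper name has_directive is hypothetical, it compares only the first value on a line, and it ignores Host/Match blocks and the Key=Value spelling, so treat it as a quick sanity check rather than a full parser.

```shell
#!/bin/sh
# has_directive FILE KEY VALUE (hypothetical helper)
# Succeeds if some non-comment line of FILE has KEY as its first word
# (keywords are matched case-insensitively) and VALUE as its second word.
has_directive() {
    awk -v key="$2" -v val="$3" '
        /^[ \t]*#/ { next }                                  # skip comment lines
        tolower($1) == tolower(key) && $2 == val { found = 1 }
        END { exit !found }                                  # exit 0 only when found
    ' "$1"
}

# report FILE KEY VALUE: print one status line for a directive.
report() {
    if has_directive "$1" "$2" "$3"; then
        echo "ok:      $1: $2 $3"
    else
        echo "MISSING: $1: $2 $3"
    fi
}
```

On a node it could be used as: report /etc/ssh/sshd_config HostbasedAuthentication yes; report /etc/ssh/sshd_config IgnoreUserKnownHosts yes; report /etc/ssh/ssh_config EnableSSHKeysign yes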
see also:
http://www.snailbook.com/faq/trusted-host-howto.auto.html
https://www.cs.uwaterloo.ca/twiki/view/CF/SSHHostBasedAuthentication
http://docs.hp.com/en/5992-4213/ch04s06.html
Torque PBS qsub wrapper
A qsub wrapper (submit filter) is helpful when some rules or restrictions on jobs are difficult to add with qmgr.
To use a filter, add this line to /var/spool/PBS/torque.cfg:
SUBMITFILTER /path/to/your/wrapper
The wrapper reads the content of the job script line by line from STDIN, analyzes it, and writes the (possibly modified) version to STDOUT. Useful information can be shown to the user by writing to STDERR as well.
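A minimal sketch of such a wrapper is shown here. The rule it applies (warning when the job script contains no "#PBS ... walltime=" request) and the function name filter_job are illustrative assumptions, not the filter actually used on lychee. The job script is passed through unchanged.

```shell
#!/bin/sh
# filter_job (hypothetical name): copy the job script from STDIN to STDOUT
# unchanged and warn on STDERR if no walltime request was seen.
filter_job() {
    found=0
    # "|| [ -n "$line" ]" keeps a final line that lacks a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            "#PBS"*walltime=*) found=1 ;;    # a #PBS directive requesting walltime
        esac
        printf '%s\n' "$line"                # pass every line through
    done
    if [ "$found" -eq 0 ]; then
        echo "submit filter: no walltime requested in job script" >&2
    fi
}

# When this file is installed as the SUBMITFILTER, end it with a call:
# filter_job
```

It can be tried by hand before installing it, for example by piping a job script into filter_job and comparing the output with the input.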
LDAP server gidNumber index for group name searching
Quoted from Martin's email:
The fedora-ds install configuration builds indexes for most of the commonly searched attributes, but not for "gidNumber". The fedora-ds GUI console provides an "indexes" page, where this (and other attributes) may be added. Following any changes, the DS must be stopped and a db2index command run to recreate the indexes.
LDAP server open file descriptor problem
By default, fedora-ds can have only 1024 open file descriptors, which run out very quickly and cause every client machine/node to hang.
The number of open file descriptors is limited by the system. The current (soft) limits can be checked with
ulimit -a
and the hard limits with
ulimit -Ha
To change that value when the ldap server starts, add
ulimit -n 8192
to the /etc/init.d/fedora-ds script.
If the value cannot be changed to exceed 1024, check the following places:
- /etc/security/limits.conf, add the following line:
"* - nofile 8192"
- /etc/sysctl.conf, make sure fs.file-max is larger than the limit specified in /etc/security/limits.conf:
# ADDED FOR FDS
net.ipv4.tcp_keepalive_time = 300
net.ipv4.ip_local_port_range = 1024 65000
fs.file-max = 64000
# hostname/domainname
kernel.hostname = lychee.md.smms.uq.edu.au
kernel.domainname = md.smms.uq.edu.au
- Change the following option in /opt/fedora-ds/slapd-lychee/config/dse.ldif (this may be overwritten by fedora-ds during server restart):
nsslapd-maxdescriptors: 8192
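To see whether the system will actually accept a higher limit before editing the init script, the attempt can be made in a throwaway subshell. This is a sketch only: the function name try_nofile is hypothetical, and it tests the shell it runs in, not the running directory server.

```shell
#!/bin/sh
# try_nofile N (hypothetical helper): in a subshell, try to set the
# open-file limit to N and print the resulting soft limit.
# Returns non-zero if the system refuses; the calling shell is unaffected.
try_nofile() {
    ( ulimit -n "$1" 2>/dev/null && ulimit -n )
}

# Show the soft and hard open-file limits of the current shell.
echo "soft: $(ulimit -Sn)  hard: $(ulimit -Hn)"
```

For example, "try_nofile 8192 || echo refused" run as the user that starts the server prints 8192 when the limit can be raised, and "refused" when the limits.conf or sysctl settings above still need attention.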