Difference between revisions of "Lychee"

From MDWiki

Revision as of 08:22, 25 June 2009

Lychee

lychee is one of our servers. This document provides a full description of its services. If a service is moved to another server, please just move the corresponding description to its page.

Software

/marksw

/marksw/ contains a large selection of pre-compiled software for use by the group. Each package is found in /marksw/$NAME/$VERSION/. The source code and details of how the software was built are in /marksw/$NAME/$VERSION/SOURCE. The bash configuration file /marksw/BASHRC (for csh, /marksw/CSHRC) defines a simple shell function, module, which can be used in the following ways:

  1. module avail
  2. module $NAME/$VERSION ...

The first use lists the possible $NAME/$VERSION options for the second use. The second use, for each $NAME/$VERSION argument, prepends:

  • /marksw/$NAME/$VERSION/bin and /marksw/$NAME/$VERSION/sbin to the PATH environment variable
  • /marksw/$NAME/$VERSION/lib64 and /marksw/$NAME/$VERSION/lib to the LD_LIBRARY_PATH environment variable
  • /marksw/$NAME/$VERSION/man to the MANPATH environment variable

The module command is sourced in the global bash configuration file /etc/bashrc and in the global csh configuration file /etc/csh.local.
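
The behaviour described above can be sketched as a small bash function. This is not the actual code in /marksw/BASHRC, only an illustration of the same idea; the MARKSW_ROOT variable is an assumption introduced here so the sketch can be pointed at a test directory.

```shell
# Hypothetical re-creation of the "module" function; the real one lives in /marksw/BASHRC.
# MARKSW_ROOT is an assumed variable, added only for illustration and testing.
MARKSW_ROOT="${MARKSW_ROOT:-/marksw}"

module() {
    if [ "$1" = "avail" ]; then
        # List every NAME/VERSION directory under the software root.
        for dir in "$MARKSW_ROOT"/*/*/; do
            [ -d "$dir" ] || continue
            dir="${dir%/}"
            echo "${dir#"$MARKSW_ROOT"/}"
        done
        return 0
    fi
    # Otherwise treat each argument as NAME/VERSION and prepend its directories.
    for pkg in "$@"; do
        prefix="$MARKSW_ROOT/$pkg"
        PATH="$prefix/bin:$prefix/sbin:$PATH"
        LD_LIBRARY_PATH="$prefix/lib64:$prefix/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
        MANPATH="$prefix/man${MANPATH:+:$MANPATH}"
    done
    export PATH LD_LIBRARY_PATH MANPATH
}
```

Because the directories are prepended, a package loaded later takes precedence over one loaded earlier and over the system defaults.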

Yum

Yum Repositories

Repository Software Installed

zsh yum-utils OpenIPMI OpenIPMI-tools openbabel openbabel-devel

/opt

Rasmol

  • Download source from [1]
  • tar -xzvf RasMol_Latest.tar.gz
  • cd Rasmol_2.7.4.2_23Mar08/src/
  • Edit Imakefile:
  • LDLIBS = -L/usr/lib64 '...'
  • Make sure /usr/lib64/libforms.so exists (package xforms-devel doesn't actually create symbolic links correctly)
  • ./build_all.sh
  • Move rasmol_32BIT to /opt/bin/rasmol

Acrobat

VMD

  • Done by Mitch/AJ

Pymol

  • Done by Mitch/AJ

AMD Core Math Library (ACML)

  • Download from here
  • Extract & run ./install-acml-4-2-0-gfortran-64bit.sh
    • Licence? accept
    • Location (/opt/acml4.2.0)? [Enter]

TwistedConch (Python SSH library)

OpenBabel Python Bindings

Maui Resource Manager

Fairshare

To enable Fairshare, FSPOLICY must be set

   FSPOLICY              DEDICATEDPES
   FSDEPTH               10
   FSINTERVAL            24:00:00
   FSDECAY               0.80

With the settings above, each fairshare interval is 24 hours long, the 10 most recent intervals are included in the usage statistics, and older intervals are discounted at a decay rate of 0.8 per interval.
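
To make the effect of the decay concrete, the following sketch prints the relative weight of each interval. It assumes the usual Maui interpretation that interval i (0 being the current one) is weighted by FSDECAY raised to the power i; the function name fs_weights is made up for this example.

```shell
# Print the relative weight of each fairshare interval,
# assuming interval i (0 = current) is weighted by FSDECAY^i.
# fs_weights is a hypothetical helper, not part of Maui.
fs_weights() {
    awk -v depth="$1" -v decay="$2" 'BEGIN {
        w = 1
        for (i = 0; i < depth; i++) {
            printf "interval %d weight %.4f\n", i, w
            w *= decay
        }
    }'
}

# FSDEPTH=10, FSDECAY=0.80 as configured above
fs_weights 10 0.80
```

Under this assumption, usage from nine intervals ago counts for roughly 13% of what the same usage would count for today.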

   FSWEIGHT                1
   FSGROUPWEIGHT           50
   FSUSERWEIGHT            40
   FSQOSWEIGHT             10

The fairshare of each GROUP, USER and QOS must also be set.
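
As an illustration only, per-credential fairshare targets in maui.cfg are typically written with the FSTARGET attribute. The credential names and percentages below are hypothetical and are not lychee's actual settings.

```
# Hypothetical names and values, for illustration only
USERCFG[alice]      FSTARGET=25.0
GROUPCFG[mdgroup]   FSTARGET=50.0
QOSCFG[high]        FSTARGET=10.0
```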

To check the current fairshare priority of queued jobs, use:

   sudo diagnose -p

Install SystemImager Image Server (for Imaging Workstations)

The following relates to the one-off installation of SystemImager for serving images. See CentosWorkstation for how SystemImager is actually used to get images onto the image server for serving.

Backups

  • Root's crontab on lychee contains:
0   23  *   *   sun  /marksw/scripts/0.1/backup_dir /home/
0   22  *   *   sat  updatedb
    • The first entry runs every Sunday at 23:00 and makes a very basic rsync copy of /home/ to cirrus.hpcu.uq.edu.au
    • The second entry runs every Saturday at 22:00; updatedb updates the file database used by the locate command

Known Issues/Solving Problems

Gromacs Excessive Log Files

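A cron job runs every 20 minutes as root and kills jobs that have crashed and are writing excessive log files (any process holding open a file matching md[0-9]+.log that is larger than 21474836480 bytes, i.e. 20 GiB):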

0,20,40 * * * * /opt/c3-4/cexec '/usr/sbin/lsof | /bin/grep --line-buffered "md[0-9]\+.log" | /bin/awk "{if (\$7>21474836480) print \$2;}" | /usr/bin/xargs --verbose -r /bin/kill -9' >& /var/log/GromacsBigLogfileKill.log
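
The filtering step of that pipeline can be tried safely on its own, without killing anything. The sketch below reads lsof-style output on standard input and only prints the PIDs that the cron job would kill. The function name big_log_pids is made up for this example, and unlike the original pattern it escapes the dot before "log".

```shell
# Read lsof-style output on stdin and print the PID (field 2) of every
# process holding an md<N>.log file larger than the limit (field 7, bytes).
# big_log_pids is a hypothetical helper; it prints PIDs and kills nothing.
LIMIT_BYTES=21474836480   # 20 GiB, the threshold used in the cron job

big_log_pids() {
    grep 'md[0-9]\+\.log' | awk -v limit="$LIMIT_BYTES" '$7 > limit { print $2 }'
}
```

For a dry run on a node, something like /usr/sbin/lsof | big_log_pids shows which processes are currently over the limit.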

High Load/Monitoring Network Traffic

The command /usr/sbin/iftop may be run via sudo by anyone in the groups BMMG, MDGroup, SBeatson and KobeLab. It shows what is generating network traffic (NFS activity can cause a high load). This was enabled by adding the following lines to /etc/sudoers:

%BMMG     ALL=/usr/sbin/iftop
%MDGroup  ALL=/usr/sbin/iftop
%SBeatson ALL=/usr/sbin/iftop
%KobeLab  ALL=/usr/sbin/iftop

To read the network usage on the MD VLAN interface on lychee run:

sudo /usr/sbin/iftop -i eth0

To read the network usage on the cluster interface on lychee run:

sudo /usr/sbin/iftop -i eth2

Fedora-DS

Fedora-DS runs out of open file descriptors (default limit 1024) very quickly. The limit has been raised to 8192, which lets the system survive longer, but it still runs out after some months. When this happens, all workstations freeze. Restarting fedora-ds temporarily fixes the problem (run /etc/init.d/fedora-ds restart and wait a little while).

See here for reference.
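
To see how close a process is to its descriptor limit before the workstations freeze, the open descriptors can be counted through /proc. This is a minimal sketch for Linux; count_fds is a made-up helper name, and reading another user's /proc entry normally requires root.

```shell
# Count the open file descriptors of a process by listing /proc/<pid>/fd (Linux only).
# count_fds is a hypothetical helper; it prints 0 if the PID cannot be read.
count_fds() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# Example: report usage for the current shell.
echo "PID $$ has $(count_fds $$) open file descriptors"
```

For the directory server, the PID would have to be looked up first. The daemon is usually called ns-slapd, but that name is an assumption and should be checked on lychee.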

NFS

If lychee is rebooted, clients' NFS mounts of lychee's exports become stale. To fix this, try the following on the client:

mount -o remount -a

If this doesn't work, the client has to be rebooted. Lychee's NFS service can be restarted with /etc/init.d/nfs restart.
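
Before remounting or rebooting, it can help to confirm which mount points are actually unresponsive. The following is a rough sketch using GNU coreutils; check_mount is a made-up name, and the 5 second timeout is an arbitrary choice. Note that a path which simply does not exist is reported the same way as a stale one.

```shell
# Report whether a path answers a stat call within 5 seconds.
# A stale NFS mount typically errors out or hangs, so the timeout guards against hangs.
# check_mount is a hypothetical helper, not an existing tool.
check_mount() {
    if timeout 5 stat -t "$1" >/dev/null 2>&1; then
        echo "ok $1"
    else
        echo "stale or unreachable $1"
    fi
}

# Example: check the home directory mount.
check_mount /home
```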