Stella Computer Cluster Architecture

Hardware

Stella is composed of:

  • Four identical 4-way nodes (comp02-comp05), each with:
    • 4 dual-core Opterons, 2.4GHz
    • 16GB RAM
    • 73GB SCSI320 HD
    • InfiniPath HTX adapter [not delivered yet]
    • 1U chassis
  • One extensible 4-way node (comp00):
    • Same configuration as comp02-05 plus:
    • Extra 300GB SCSI320 HD
    • 4U chassis with available PCI-Express slots
  • One 2-way node (comp01):
    • 2 dual-core Opterons, 2.4GHz
    • 8GB RAM
    • 73GB SATA Raptor HD
    • InfiniPath PCI-Express adapter
    • 4U chassis with available HTX and PCI-Express slots
  • Front-end node and NFS file server (stella):
    • 2 single-core Opterons, 2.4GHz
    • 4GB RAM
    • 150GB RAID1
    • 2TB RAID5, accessible via NFS over Gigabit Ethernet
  • Cluster Management Appliance (CMA)
  • Gigabit Ethernet switch
  • SilverStorm InfiniBand switch
  • UPS
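
As a quick tally of the compute capacity listed above, the following sketch sums the cores and RAM of the compute nodes. The names `COMPUTE_NODES` and `totals` are illustrative only. The figures are taken from the hardware list, with comp00 assumed to match comp02-comp05 as stated, and the front-end node deliberately left out.

```python
# Compute-node inventory taken from the hardware list above.
# Each entry: (CPU sockets, cores per socket, RAM in GB).
# The front-end node (stella) is excluded: it does not run jobs.
COMPUTE_NODES = {
    "comp00": (4, 2, 16),
    "comp01": (2, 2, 8),
    "comp02": (4, 2, 16),
    "comp03": (4, 2, 16),
    "comp04": (4, 2, 16),
    "comp05": (4, 2, 16),
}


def totals(nodes):
    """Return (total cores, total RAM in GB) over all given nodes."""
    cores = sum(sockets * per_socket for sockets, per_socket, _ in nodes.values())
    ram_gb = sum(ram for _, _, ram in nodes.values())
    return cores, ram_gb


if __name__ == "__main__":
    cores, ram_gb = totals(COMPUTE_NODES)
    print(f"{len(COMPUTE_NODES)} compute nodes, {cores} cores, {ram_gb} GB RAM")
```

With these figures the compute nodes add up to 44 cores and 88 GB of RAM.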

Cluster interconnect

The following figure outlines the functional connectivity within the cluster. Users connect via the front-end node, which is a full Linux system with shells, compilers, editors, etc. Jobs are submitted to the front-end node's scheduler, which distributes them to the compute nodes over Gigabit Ethernet. The compute nodes also share a separate fast interconnect for high-bandwidth, low-latency communication.

Gigabit Ethernet

The Gigabit Ethernet switch connects all the components of the cluster together (front-end node, compute nodes and CMA). The Gigabit Ethernet LAN is mainly used for the following tasks:

  • Job control between front-end node and compute nodes
  • NFS file server transfers
  • Monitoring functions by the CMA
  • Serial-Over-Ethernet communications

MPI InfiniPath/InfiniBand interconnect

As of 2006, the InfiniPath/InfiniBand interconnect provides the lowest latency available for MPI communications. A further small fraction of latency was saved by selecting InfiniPath HTX adapters instead of PCI-Express ones.
The MPI libraries installed on the cluster exploit this low-latency interconnect.

Software

The front-end and compute nodes are based on SUSE 9.3. As such, all the usual Linux tools and libraries are available:

  • gcc
  • liblapack, libscalapack
  • etc.

In addition to these, we installed the following applications:

  • Java 1.5 and 1.6
  • Synopsys VCS with SystemC
  • Matlab with Distributed Toolkit

Scheduler: Sun N1 Grid Engine

Every task intended to run on the cluster's compute nodes must be submitted through Sun Grid Engine. Grid Engine keeps track of available resources, running jobs, user history, etc., and automatically decides where and when each job should run, so that the compute nodes are used to the best of their capabilities and resources are shared equitably between users.
TODO: Add a description of all those useful things in SGE. Add description of queues.
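
As an illustration of submitting a job through Grid Engine, here is a minimal sketch that builds a `qsub` command line from Python. It is not a description of Stella's actual configuration: the script name `run_sim.sh`, the job name `sim01` and the parallel environment name `mpi` are hypothetical, and the parallel environments really defined on the cluster may be named differently. Only the standard `qsub` options `-N` (job name), `-cwd` (run in the current directory) and `-pe` (parallel environment and slot count) are used.

```python
import shlex
import subprocess


def build_qsub_command(script, job_name, slots=1, parallel_env=None):
    """Return the qsub argument list for submitting a batch script."""
    cmd = ["qsub", "-N", job_name, "-cwd"]
    if parallel_env is not None:
        # -pe takes the parallel environment name and the number of slots.
        cmd += ["-pe", parallel_env, str(slots)]
    cmd.append(script)
    return cmd


def submit(cmd):
    """Run qsub and return its output; requires Grid Engine on this host."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    # Hypothetical job: an 8-slot run in a parallel environment called "mpi".
    cmd = build_qsub_command("run_sim.sh", "sim01", slots=8, parallel_env="mpi")
    # Print rather than submit, so the sketch is safe to run anywhere.
    print(shlex.join(cmd))
```

On the front-end node, passing the list to `submit` would hand the job to the scheduler. Everywhere else the sketch only prints the command it would run.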

Network File System

Two common ways of copying files across computers are scp and sharing them via NFS.
scp is available to and from Stella.
Regarding NFS, we decided not to mount any departmental NFS share on Stella for two reasons:

  • They are unreliable: machines mounting them usually hang when the department's network goes down.
  • They are slow: a file written to a departmental NFS file server travels through a series of 100Mbps switches.

Instead, Stella's NFS is configured to be mounted on Linux desktops.
Stella is linked to the APT group via a 1Gbps switch.
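
To give a feel for the difference between the two link speeds mentioned above, the following sketch computes an ideal lower bound on transfer time. The 1 GB file size and the function name `ideal_transfer_seconds` are assumptions chosen for illustration. Real NFS transfers will be slower because of protocol overhead, disk speed and competing traffic.

```python
def ideal_transfer_seconds(size_bytes, link_bits_per_second):
    """Lower bound on transfer time: payload bits divided by link rate."""
    return size_bytes * 8 / link_bits_per_second


if __name__ == "__main__":
    file_size = 10**9  # hypothetical 1 GB file
    links = (("100Mbps", 100 * 10**6), ("1Gbps", 10**9))
    for label, rate in links:
        seconds = ideal_transfer_seconds(file_size, rate)
        print(f"{label}: at least {seconds:.0f} s")
```

For this file size the bound is 80 seconds over a 100Mbps path and 8 seconds over the 1Gbps link.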