CHARM-Card: Hardware Based Cluster Control And Management System

Year
2009
Degree
PhD
Author
Panse, Ralf Erich
Institution
Heidelberg U.
Abstract

The selection and analysis of detector events in the heavy-ion collider experiment ALICE at CERN are performed by the so-called trigger levels. The High Level Trigger (HLT) is the last trigger level of this experiment. It currently consists of more than 120 computers, and the cluster is planned to be upgraded to up to 300 computers. However, the manual installation, configuration and maintenance of such a large computer farm require considerable administrative effort. This thesis describes the implementation and functionality of an autonomous control unit that was installed in every node of the HLT computing cluster. The main tasks of the control unit are the remote control of the cluster nodes and the automatic installation, monitoring and maintenance of the computers. Because the target cluster is heterogeneous, the control unit was designed to work independently of the computer model or operating system of the cluster node. This property enables remote control of cost-efficient COTS (commercial off-the-shelf) PCs, which, unlike expensive server boards, lack integrated remote control capabilities. The HLT computing cluster is already remotely controlled with the help of the control unit. Furthermore, the control unit was used for the automatic setup, testing and configuration of all cluster nodes.

Date of last update
2016-11-22