HPC Cluster Usage Analysis

Welcome to my internship

This blog accompanies my internship within the Master's programme “Artificial Intelligence” at the University of Maastricht. The project is called HPC Cluster Usage Analysis, and this internship covers its data preprocessing.

Background

RWTH Aachen University runs a large HPC cluster consisting of several sections, which provide computing resources for research as well as for learning and practice. Currently, users specify their requirements when they apply for computing time on those cluster sections. Examples of such requirements are the needed CPU hours, the degree of parallel processing, and the memory and storage needs. A detailed description of the process can be found here.

Previous observations showed that the stated requirements are often driven by implicit needs, e.g. the amount of space users request is derived from the number of cores they need. It is therefore likely that someone asks for a specific amount of space when they only need more cores than were previously granted. This results in reserved but unused space, which could be used by other users instead and thus reduce their waiting time.
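To make the idea of reserved but unused space concrete, here is a minimal sketch of how the unused share of an allocation could be computed. The `Allocation` record, its field names, and the numbers are made up for illustration; they are not taken from the cluster's actual logs.

```python
from dataclasses import dataclass


@dataclass
class Allocation:
    """Hypothetical record of one granted application (illustrative fields)."""
    project: str          # made-up project identifier
    requested_gb: float   # storage requested in the application
    used_gb: float        # peak storage actually used


def unused_share(alloc: Allocation) -> float:
    """Return the fraction of reserved storage that was never used (0.0 to 1.0)."""
    if alloc.requested_gb <= 0:
        return 0.0
    # Usage above the request counts as fully used, not as negative waste.
    unused = max(alloc.requested_gb - alloc.used_gb, 0.0)
    return unused / alloc.requested_gb


# Invented example values, only to show the calculation.
allocations = [
    Allocation("course_a", requested_gb=500.0, used_gb=125.0),
    Allocation("research_b", requested_gb=2000.0, used_gb=1900.0),
]
for alloc in allocations:
    print(f"{alloc.project}: {unused_share(alloc):.0%} of reserved storage unused")
```

A ratio like this is only one possible measure; the same comparison could be made for cores or memory once the corresponding values are available.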

Although the application process distinguishes between scientific and educational users, this distinction does not necessarily say anything about the type of usage on the cluster system, e.g. heavy or light, frequent or occasional, short-term or long-term users. Such a distinction could reveal correlations with external conditions like the academic year or specific research periods. A possible insight could be that recurring courses need approximately the same space every year. Additionally, more specific support could be provided if users were distinguished based on their requirements. This could take the form of instructions for first-time users, but also of preinstalled software for recurring users, based on their typical behaviour.

Several kinds of user data are already logged, e.g. the scripts that get executed, the programs that are added to the script, and how much space is actually used. Still, most of this data is not analysed to predict future use and is only consulted if a job goes wrong. Better support and preventive actions would be possible if the data were analysed earlier and more frequently. Additionally, predicting future user behaviour allows the IT Center to apply to the government for new equipment in a well-founded way and can give it an advantage over other providers in Germany.

Project description

The project, of which this internship is meant to be the first step, raises the following three questions:

  • Can future users be classified based on their behaviour?
  • Can behaviour-based clustering help to predict an individual user’s behaviour over time?
  • Can the needed amount of space and cores be predicted over time?

These problems are complex and thus need intensive preparation before actual clustering and prediction methods can be applied successfully. Therefore the goal of my internship is in-depth data preprocessing. The major goal is to build a database or data warehouse that automatically collects relevant information for later processing. In order to do so, I want to answer the following questions:

  • What cluster-related information already gets collected, and where is it stored?
  • Which information could be used as features for later prediction and clustering tasks?
  • Which data needs to be anonymised?
  • Where can the data be stored without violating security restrictions?
  • Which data schema efficiently represents the information?
  • How can the data schema be implemented?
  • How can the database automatically be kept up to date?
  • How can the results be adequately presented?
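As a first impression of what the questions about anonymisation and the data schema could lead to, here is a minimal sketch that stores job records under a keyed pseudonym instead of the user name. Everything in it is an assumption for illustration: the `job_usage` table, its columns, the `pseudonymise` and `store_job` helpers, and the use of SQLite and HMAC-SHA256. The actual schema and anonymisation method are still open.

```python
import hashlib
import hmac
import sqlite3

# Assumption: a secret key kept outside the database, so that pseudonyms
# cannot simply be recomputed from a list of known user names.
SECRET_KEY = b"replace-with-a-secret-kept-outside-the-database"

# Hypothetical table layout; the real schema is one of the open questions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS job_usage (
    job_id               INTEGER PRIMARY KEY,
    user_pseudonym       TEXT    NOT NULL,
    cores_requested      INTEGER NOT NULL,
    cores_used           INTEGER NOT NULL,
    storage_requested_gb REAL    NOT NULL,
    storage_used_gb      REAL    NOT NULL
);
"""


def pseudonymise(username: str, key: bytes = SECRET_KEY) -> str:
    """Map a user name to a stable pseudonym using HMAC-SHA256."""
    return hmac.new(key, username.encode("utf-8"), hashlib.sha256).hexdigest()


def store_job(conn, job_id, username, cores_requested, cores_used,
              storage_requested_gb, storage_used_gb):
    """Insert or update one job record; only the pseudonym is stored."""
    conn.execute(
        "INSERT OR REPLACE INTO job_usage VALUES (?, ?, ?, ?, ?, ?)",
        (job_id, pseudonymise(username), cores_requested, cores_used,
         storage_requested_gb, storage_used_gb),
    )
    conn.commit()


# In-memory database and invented values, only to show the flow.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
store_job(conn, 1, "alice", 48, 12, 500.0, 125.0)
row = conn.execute(
    "SELECT user_pseudonym, cores_used FROM job_usage WHERE job_id = 1"
).fetchone()
print(row)
```

Because the same user name always maps to the same pseudonym, behaviour can still be followed per user over time without storing the name itself. Whether this level of protection is sufficient depends on the security restrictions mentioned above.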