Methods for comparative analysis of supercomputer applications based on data mining techniques
Laboratory of Information Technologies
Joint Laboratory Seminar
Date and Time: Thursday, 6 February 2025, at 2:00 PM
Venue: Conference Hall, Meshcheryakov Laboratory of Information Technologies, and online via Webinar
Seminar topic: “Research and development of methods for comparative analysis of supercomputer applications based on data mining techniques”
Speaker: Denis Shaykhislamov (Moscow State University)
Modern supercomputers collect a wealth of information about the applications running on them. This includes each application's structure, its performance and communication profiles, and the names of the application software packages, libraries, and compilers it uses. It also includes detailed information on how each job was launched. The volume of this information keeps growing, and processing it manually is practically impossible. Developing data mining methods for this task is therefore becoming increasingly important. Such methods allow administrators to evaluate the operation of a supercomputer more comprehensively, accurately, and quickly. They also help administrators identify and eliminate the problems that reduce a supercomputer's efficiency.

One direction for such analysis is finding similar applications. Knowing which applications are similar makes it possible to study new jobs using results already obtained for similar, previously analyzed applications. It also makes it possible to group jobs and predict their behavior. This substantially simplifies the study of application efficiency for both users and administrators of supercomputers. This research presents two approaches to finding similar supercomputer applications. Building on these approaches, it proposes algorithms for analyzing the supercomputer job flow. These algorithms make it possible to detect which software packages are used, to cluster jobs, and to predict how efficiently jobs use supercomputer resources.