Personnel training and software for data storage and analysis for NICA discussed at MEPhI

News, 15 December 2023

On 12 December, a MEPhI-JINR workshop on computing for the NICA megascience project took place at the National Research Nuclear University MEPhI. The workshop focused on organizing the processing and analysis of experimental data, with special attention paid to building systems for geographically distributed data processing and storage. The agenda also covered the training of specialists for the NICA megaproject.

The workshop, held in a hybrid format, brought together over 50 participants. Among them were the spokespersons and the software development and computing coordinators of all three major experiments at NICA, namely BM@N, MPD, and SPD, as well as experts from MLIT, VBLHEP, and DLNP JINR, and MEPhI specialists who are engaged in these three projects and take part both in building the detector complexes and in developing software for data processing and physics analysis.

Photo: © https://mephi.ru/

NRNU MEPhI Vice-Rector Natalya Barbashina welcomed the audience, placing particular emphasis on the importance of cooperation between MEPhI and JINR. The meeting opened with a report by NRNU MEPhI Associate Professor Arkady Taranenko on the active participation of MEPhI specialists, postgraduates and students in all three experiments at NICA: BM@N, MPD and SPD. In the next talk, Director of the JINR Meshcheryakov Laboratory of Information Technologies Sergei Shmatov presented MLIT’s activities and the IT solutions for the NICA megaproject built on the resources of the JINR MLIT Multifunctional Information and Computing Complex.

Sergei Shmatov spoke about the major results of the meeting. Its main focus was how to organize the processing, analysis, storage, transfer and management of the data that will be acquired as the experiments at NICA carry out their physics research programs. “Already now, the BM@N experiment with a fixed target is receiving real data, the other experiments are working with Monte Carlo modeling data. These volumes are already quite significant, and the question arises how we can work with them,” Sergei Shmatov said. This question has three aspects. The first is where to physically store the data and how to ensure its security. The second is how access to the data will be organized and what software will be used to process it. The third is the development of tools that will be employed to extract physics information (programs for the reconstruction and identification of elementary particles, physics analysis, etc.).

The key topic of the workshop was distributed data processing and analysis. “The data volume is so large that, by analogy with experiments at the Large Hadron Collider, storing it within one local computer center, as well as organizing computing and the storage, processing and analysis of this data, is simply physically impossible,” Sergei Shmatov commented. He said that the Joint Institute, specifically the Meshcheryakov Laboratory of Information Technologies, would be able to cover 25% of the experiments’ needs. The rest will fall to partner organizations, which will create and maintain computing complexes that will be combined into a global data processing and analysis system.

“At present, work is underway in two directions: firstly, such systems and centers need to appear, and secondly, tools need to be developed that will make it possible to connect all these centers together. There is such experience, for example, within the WLCG (Worldwide LHC Computing Grid), a global project on computing for experiments at the LHC. We are going to repeat this for Russian megascience projects. Within our cloud infrastructure, we connect centers located both in the JINR Member States and in other countries, so that we can be linked into a unified system and work with this data regardless of where we are geographically located,” Sergei Shmatov noted. The volume of third-party computing is still small; however, it is already sufficient for training specialists and tuning software and hardware. As the data volume grows, it can be scaled up.

Data security is another reason why the distributed approach is required. If everything is concentrated in one place, data may be lost due to force majeure. “Applying the distributed approach, when we have data replicas in other centers, will make it possible to secure data and avoid losses in the case of unforeseen situations,” the MLIT Director explained.

In addition, the participants discussed the training of highly qualified specialists for the NICA experiments and the organization of a system for training personnel who will work in a unique field at the intersection of physics and IT. The specialized departments of NRNU MEPhI offer a master’s program that trains specialists in two specialties simultaneously, particle physics and information technology. Graduates will receive a diploma covering both specialties.

At the workshop, MLIT Scientific Leader Vladimir Korenkov delivered a talk on the history, development and current state of distributed computing in high-energy physics. The experience accumulated at MLIT in this area will certainly be in demand when building the computing for the experiments at NICA. Oleg Rogachevsky, Konstantin Gertsenberger and Alexey Zhemchugov, software coordinators of the MPD, BM@N and SPD experiments, respectively, gave an overview of the software systems and complexes for modeling, acquiring and processing experimental data at the NICA complex, and presented tasks in which MEPhI specialists, postgraduates and students could participate. The talk by MLIT Researcher Igor Pelevanyuk aroused great interest among the audience. It covered data processing and generation in a heterogeneous distributed computing environment managed by the DIRAC platform, which integrates the computing and data storage resources of MLIT, VBLHEP, MEPhI, and a number of other institutes participating in collaborations at NICA. At present, all three experiments at NICA use DIRAC to solve their tasks.

The final reports at the meeting were devoted to the training of specialists for the NICA megaproject. MLIT Researcher Oksana Streltsova shared the experience of training specialists in parallel programming, in creating machine and deep learning algorithms, and in developing IT services using the ML/DL/HPC ecosystem of the HybriLIT platform. In his report, MEPhI Associate Professor Evgeny Soldatov presented the new master’s program of NRNU MEPhI, “Software Engineering and Data Analysis for High Energy Physics”.

The talks were followed by a general discussion, which resulted in the decision to establish a permanent MEPhI-JINR Council. The Council will consolidate the efforts of the two organizations in addressing the problems facing the participants of the collaborations at the NICA complex and in resolving personnel training issues.

This is not the first time that an event devoted to the NICA megascience project has taken place at NRNU MEPhI. Last December, an international online seminar on methods for data processing and analysis in the experiments at the NICA accelerator complex was held.
