Component architecture of the software complex for intelligent analysis of scientific and technical data

Seminars

Laboratory of Information Technologies

Joint Laboratory Seminar

Date and Time: Wednesday, 12 February 2025, at 3:00 PM

Venue: Conference Hall, Meshcheryakov Laboratory of Information Technologies; online via Webinar

Seminar topic: “Component architecture of the software complex for intelligent analysis of scientific and technical data”

Speaker: Evgeny Antonov (Moscow Engineering Physics Institute)

Abstract:

Modern methods of intelligent data analysis (IDA), based on machine learning, natural language processing, and visualisation technologies, must be adapted to the specifics of scientific and technical information (STI), which comes in a variety of formats and is largely unstructured or semi-structured. This work presents the component architecture of a software complex for IDA of STI that scales horizontally to handle big data. The author presents specialised algorithms for data extraction and enrichment (saturation) that take into account the peculiarities of scientific publications: extraction of keywords, physical quantities and units of measurement, chemical elements, tables, and images; unification of affiliation and country names; and identification of intergovernmental associations. The software complex consists of four main blocks: a client-server module, a distributed workflow management module, a data processing and enrichment module, and a data warehouse. The architecture is flexible, so functionality can be extended by integrating modern technological solutions. The system works with diverse data sources, including PDF documents, web pages, and databases, and provides interactive analytical dashboards for visualising results. The theoretical and practical significance of the work lies in advancing existing approaches to IDA and in applying the developed solutions in real-world projects. In particular, the system has been used to create a database of the properties and structures of irradiated materials, to digitalise experimental data, and to build a repository of scientific publications from the Joint Institute for Nuclear Research.
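To make the extraction and saturation (enrichment) steps named in the abstract more concrete, the following is a minimal illustrative sketch in Python. It is not the speaker's implementation: the unit list `UNITS`, the alias table `COUNTRY_ALIASES`, the record layout, and the function names `extract_quantities`, `unify_country`, and `enrich` are all hypothetical, and a simple regular expression and lookup table stand in for the specialised algorithms mentioned in the abstract.

```python
import re
from dataclasses import dataclass

# Hypothetical unit list and alias table, chosen only for illustration.
UNITS = ("GeV", "MeV", "keV", "eV", "nm", "K", "dpa")
COUNTRY_ALIASES = {
    "russian federation": "Russia",
    "russia": "Russia",
    "usa": "United States",
    "united states of america": "United States",
}

# A number (optionally with a decimal part) followed by one of the known units.
# The lookbehind prevents a match from starting in the middle of a number.
_UNIT_PATTERN = "|".join(re.escape(u) for u in UNITS)
QUANTITY_RE = re.compile(rf"(?<![\w.])(\d+(?:\.\d+)?)\s*({_UNIT_PATTERN})\b")


@dataclass(frozen=True)
class Quantity:
    value: float
    unit: str


def extract_quantities(text: str) -> list[Quantity]:
    """Return every 'number + unit' pair found in the text, in order."""
    return [Quantity(float(v), u) for v, u in QUANTITY_RE.findall(text)]


def unify_country(name: str) -> str:
    """Map a country spelling to a canonical name; unknown names pass through."""
    key = " ".join(name.lower().split())  # lowercase and collapse whitespace
    return COUNTRY_ALIASES.get(key, name.strip())


def enrich(record: dict) -> dict:
    """Return a copy of the record with extracted quantities and a unified country."""
    enriched = dict(record)
    enriched["quantities"] = extract_quantities(record.get("text", ""))
    enriched["country"] = unify_country(record.get("country", ""))
    return enriched
```

In a setting like the one the abstract describes, a function such as `enrich` would presumably run inside the data processing module on each document delivered by the workflow manager, with the result written to the data warehouse; that wiring is an assumption and is not shown here.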

(Based on the materials of a Candidate’s dissertation.)