Text data mining methods and web tools for information processing and visualisation / Data mining of scientific and technical information on the example of patent documentation
Laboratory of Information Technologies
Joint Laboratory Seminar
Date and Time: Tuesday, 25 February 2025, at 11:00 AM
Venue: room 310, Meshcheryakov Laboratory of Information Technologies, and online on the Webinar platform
-
Seminar topic: “Text data mining methods and web tools for information processing and visualisation”
Speaker: Anna Ilina
Abstract:
The speaker will present the results of developing and applying advanced methods for text processing and data mining in scientific and technical domains, and will discuss the design of efficient web tools for data processing and visualisation. In particular, she will describe the results of research on semantic analysis and named entity recognition. The talk will also present solutions for monitoring resource utilisation within the DIRAC distributed infrastructure and for securing network connections to JINR's internal services.
-
Seminar topic: “Data mining of scientific and technical information on the example of patent documentation”
Speaker: Daria Zrelova
Abstract:
The presentation focuses on data mining of scientific and technical information, using patent documentation as an example. Patent data are a valuable source of information on long-term technological trends and on the practical implementation of innovations. Analysing patents is challenging, however, because their structure prioritises the legal protection of the invention over a detailed explanation of its essence. This work examines the specifics of patent information analysis, emphasising modern approaches and methods for extracting valuable insights from patent data in order to identify promising directions of technological development. In particular, the study examines in detail the data on classifications, dates, filing languages, authors, owners, and other relevant fields. For semantic analysis, a document corpus was built from patent abstracts and a corresponding vocabulary was defined; these resources were used to train a Word2Vec neural network language model. Semantic analysis is essential for analysing the textual content of patents: it extracts the key terms, phrases, and concepts that characterise each patent and supports patent classification.