Web of Proceedings - Francis Academic Press

Multi-Modal Perception and Understanding: Application of Large Model in Real-Time Video Analysis

DOI: 10.25236/icamfss.2024.008

Author(s)

Chen Xi, Ma Baijun, Li Haoxuan

Corresponding Author

Chen Xi

Abstract

As artificial intelligence technology enters a new era, the research and application of multi-modal perception and understanding have reached a stage of high-quality development. This paper focuses on multi-modal perception and understanding in real-time video analysis and puts forward scientific propositions for enhancing them in this context. Drawing on the dynamic evolution of multi-modal perception and understanding, it constructs a theoretical analysis framework for their development that follows the inherent logic of real-time video analysis. The framework explains the development mechanism of multi-modal perception and understanding, which arises jointly from the real-time video analysis mechanism and a cyclic mechanism linking multi-modal perception, understanding, and large models. The paper further explores the potential for advancing high-quality real-time video analysis from the perspectives of the technical challenges and practical implications of developing multi-modal perception and understanding. The purpose of this development is to deliver multi-modal perception and understanding that meet the expected standards of real-time video analysis, continuously improve the quality of that analysis, and enhance user satisfaction. To achieve high-quality development of real-time video analysis, the following measures should be implemented: strengthening data control based on the internal circulation of data quality; constructing a multi-modal perception and understanding model; establishing a mechanism for interaction and feedback on real-time video analysis quality perception; and setting up an evaluation system for the efficiency and accuracy of real-time video analysis. These measures will promote the application of multi-modal perception and understanding and effectively meet the requirements of real-time video analysis.

Keywords

Multimodal Perception; Real-Time Video Analysis; Large Model; Applications