矿山安全隐患多模态智能感知与识别方法研究综述

程德强; 王衍辰; 岳阳; 田亮; 翟杰; 寇旗旗

doi:10.13779/j.cnki.issn1001-0076.2026.02.001

A Survey on Multi-Modal Intelligent Perception and Hazard Recognition Techniques in Mine Safety

Abstract: As the "dual-carbon" strategy and intelligent mine construction continue to advance, safe mine production places increasingly high demands on highly reliable intelligent perception technologies. Under complex underground conditions such as low illumination, dust occlusion, water mist, and confined spaces, single-modal perception methods based on visible-light images, infrared images, depth images, or 3D point clouds are prone to imaging degradation and feature loss. Their recognition stability is therefore limited, making it difficult to accurately recognize personnel behaviors, equipment operating states, and precursors of environmental disasters in high-risk scenarios. Multimodal fusion perception exploits the complementarity of modalities in texture detail, thermal characteristics, and geometric relationships, and offers a new technical route for improving mine safety hazard recognition. This paper reviews the characteristics of the multimodal data commonly used in mines and analyzes the strengths and limitations of visible-light cameras, infrared cameras, LiDAR, and depth cameras in information acquisition and feature representation. It then summarizes three typical fusion strategies (data-level, feature-level, and decision-level fusion) together with their applicable scenarios. Recent research and application progress in multimodal fusion-based object detection and hazard recognition is reviewed along three multimodal collaboration paths: visible-infrared image fusion, visible image-3D point cloud fusion, and visible-depth image fusion. Taking coal mine scenarios as examples, typical applications illustrate the engineering value of these methods in personnel localization, equipment operation monitoring, and environment perception under complex working conditions. Finally, the challenges that multimodal perception faces in annotation cost, cross-modal alignment, robustness improvement, and lightweight deployment are discussed, and future directions in mine automation, unmanned operation, and safety risk prediction are outlined. This paper aims to provide a reference and technical guidance for both theoretical research and engineering application of multimodal intelligent perception and recognition of mine safety hazards.
