Advances in Multi-modal Fusion Techniques and Applications in Agricultural Field

LI Daoliang, ZHAO Ye, DU Zhuangzhuang

Abstract

Multi-modal fusion technology combines data from multiple sources to overcome the limitations of any single modality, and it has been widely applied in fields such as medicine, autonomous driving, and emotion recognition. In recent years, advances in sensor and remote sensing technologies have provided richer data sources for crop monitoring, including spectral, image, radar, and thermal infrared data. Computer vision and data analysis methods can extract information such as phenotypic parameters and physicochemical characteristics of crops from these data, helping to assess crop growth and guide agricultural production management. Most existing studies, however, rely on single-modal data: they use only one type of input, lack a view of the overall information, and are susceptible to noise in that modality. Although some studies have employed multi-modal fusion, they still do not fully account for the complex interactions between modalities. To analyze the potential of multi-modal fusion technology in crop monitoring, this paper first outlines the advanced multi-modal fusion technologies and methods used in the agricultural field, focusing on their application in crop identification, trait analysis, yield prediction, stress analysis, and pest and disease diagnosis. It then discusses the existing challenges and offers an outlook on future developments, with the aim of promoting precision agriculture management and improving production efficiency through multi-modal fusion methods.

 

Keywords: multi-modal fusion, sensors, remote sensing technology, crop monitoring, computer vision, precision agriculture management

 



