Multimodal Cognitive Fusion for Low-Resource Environments
Khushi Rathore, Harsha Narayan, Shashwat Singh, Sowmya Natarajan
Department of ECE, SRM Institute of Science and Technology, Kattankulathur, Chennai 603203, India
sowmyan1@srmist.edu.in
DOI: 10.46793/BISEC25.405R
ABSTRACT: The growing use of Artificial Intelligence in real-world systems has created a strong need for models that can understand and process different types of data at the same time. However, current multimodal fusion models rely heavily on large datasets and computing power, which makes them unsuitable for low-resource environments. This paper presents CAF-Net (Cognitive Adaptive Fusion Network), a lightweight and resource-efficient framework inspired by human thinking. CAF-Net uses specific encoders for each modality, a Cognitive Attention Fusion Layer (CAFL), and an Adaptive Decision Layer (ADL) to balance accuracy and efficiency based on system limits. The CAFL dynamically assigns attention weights to the most useful modalities, while the ADL adjusts inference by turning modalities on or off depending on real-time resource availability. Experimental results show that CAF-Net achieves accuracy similar to transformer-based models while significantly lowering computing demands, energy use, and latency. Its adaptability makes it ideal for edge AI, IoT healthcare, and mobile computing applications in settings with limited resources.
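The two mechanisms described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; the function names, the softmax weighting in the CAFL, and the greedy cost-based gating in the ADL are illustrative assumptions based only on the abstract's description (attention weights assigned to the most useful modalities, and modalities switched on or off under a resource budget).

```python
import math

def cognitive_attention_fusion(features, scores):
    """CAFL sketch: softmax per-modality relevance scores into
    attention weights, then form a weighted sum of the modality
    feature vectors (all vectors assumed to share one dimension)."""
    names = list(features)
    exp = {m: math.exp(scores[m]) for m in names}
    total = sum(exp.values())
    weights = {m: exp[m] / total for m in names}
    dim = len(features[names[0]])
    fused = [0.0] * dim
    for m in names:
        for i, v in enumerate(features[m]):
            fused[i] += weights[m] * v
    return fused, weights

def adaptive_decision_layer(modalities, costs, budget):
    """ADL sketch: greedily enable the cheapest modalities that fit
    within the current resource budget; the rest are switched off."""
    kept, spent = [], 0.0
    for name in sorted(modalities, key=lambda m: costs[m]):
        if spent + costs[name] <= budget:
            kept.append(name)
            spent += costs[name]
    return kept

# Hypothetical usage: two modalities, audio judged more relevant.
feats = {"audio": [1.0, 0.0], "image": [0.0, 1.0]}
fused, w = cognitive_attention_fusion(feats, {"audio": 2.0, "image": 0.0})

# Under a tight budget, only the cheaper modalities stay active.
active = adaptive_decision_layer(
    ["audio", "image", "imu"],
    {"audio": 1.0, "image": 3.0, "imu": 0.5},
    budget=2.0,
)
```

In a real system the relevance scores would be learned and the cost/budget terms measured at run time; the point of the sketch is only the division of labor between relevance-driven weighting (CAFL) and resource-driven gating (ADL).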
KEYWORDS: Multimodal Learning, Cognitive Fusion, Low-Resource AI, Edge Computing, TinyML.
ACKNOWLEDGMENTS: The authors would like to thank the Department of Electronics and Communication Engineering, SRM Institute of Science and Technology, for their guidance and support throughout this research. The authors also express their gratitude to faculty mentors and peers for their valuable suggestions and technical discussions that contributed to the development of this work.
REFERENCES:
[1] Baltrušaitis, T., Ahuja, C., Morency, L.P.: Multimodal Machine Learning: A Survey and Taxonomy. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 41(2), 423–443 (2019).
[2] Tsai, Y.H., Bai, S., Liang, P.P., Kolter, J.Z., Morency, L.P., Salakhutdinov, R.: Multimodal Transformer for Unaligned Multimodal Language Sequences. In: Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), pp. 6558–6569 (2019).
[3] Ma, S., Zhang, X., Lee, K.: TinyML: Enabling Deep Learning on Resource-Constrained Devices. IEEE Internet of Things Journal 9(15), 13432–13448 (2022).
[4] Zhou, L., Chen, X., Wang, J., Zhang, Y.: Efficient Multimodal Learning via Cross-Modal Knowledge Distillation. In: Advances in Neural Information Processing Systems (NeurIPS), pp. 22531–22542 (2021).
[5] Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: A Large-Scale Hierarchical Image Database. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 248–255 (2009).
[6] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention Is All You Need. In: Advances in Neural Information Processing Systems (NeurIPS), pp. 5998–6008 (2017).
[7] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv preprint arXiv:1704.04861 (2017).
[8] Wang, H., Ji, Z., Chen, W.: Adaptive Fusion for Multimodal Emotion Recognition. IEEE Transactions on Affective Computing 11(4), 602–614 (2020).
[9] Xu, R., Sun, S.: Cognitive-Inspired AI for Resource-Constrained Systems. IEEE Access 11, 12214–12229 (2023).
[10] Zong, Y., Mac Aodha, O., Hospedales, T.: Self-Supervised Multimodal Learning: A Survey. arXiv preprint arXiv:2303.14567 (2023).
[11] Li, H., Yang, Z., Wang, L.: A Survey of Multimodal Learning: Methods, Applications, and Future Directions. ACM Computing Surveys (2024).
[12] Wu, R., Wang, H., Chen, H.T., Carneiro, G.: Deep Multimodal Learning with Missing Modality: A Survey. arXiv preprint arXiv:2401.08244 (2024).
[13] Manzoor, M.A., Javed, A., Rahman, S., Khan, M.: Multimodality Representation Learning: A Survey on Evolution, Pretraining and Its Applications. IEEE Access 11, 45611–45629 (2023).
[14] Li, S., Tang, H.: Multimodal Alignment and Fusion: A Survey. IEEE Transactions on Multimedia (2024).
[15] Kulkarni, V.: TinyML Using Neural Networks for Resource-Constrained Devices. IEEE Embedded Systems Letters (2024).
[16] Immonen, R., Niemi, T., Tuovinen, P.: Tiny Machine Learning for Resource-Constrained Devices. IEEE Internet of Things Magazine 5(3), 60–68 (2022).
[17] Lê, M.T.: Efficient Neural Networks for Tiny Machine Learning. Sensors 23(8), 3921 (2023).
[18] Pietrołaj, M., Pławiak, P., Kawala-Sterniuk, A.: Resource-Constrained Neural Network Training. Applied Intelligence (2024).
[19] Chen, Z., Zhao, Y., Liu, X.: Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey. ACM Computing Surveys (2024).
[20] Rashid, H.A., Ovi, P.R., Busart, C., Gangopadhyay, A.: TinyVQA: Compact Multimodal Deep Neural Network for Visual Question Answering on Resource-Constrained Devices. IEEE Transactions on Multimedia (2024).
[21] Bhattacharya, S., Sahu, S.K., Reddy, P.: Edge-AI for Multimodal Systems: Challenges and Emerging Solutions. IEEE Internet of Things Journal 11(2), 1586–1601 (2024).
[22] Kim, J., Lee, D., Park, S.: Energy-Aware Deep Neural Networks for Edge Computing. IEEE Transactions on Neural Networks and Learning Systems 34(6), 2459–2473 (2023).
[23] Ahmed, M., Zhao, W., Chen, H.: Lightweight Multimodal Fusion for IoT-Based Environmental Monitoring. Sensors 23(12), 5657 (2023).
[24] Liu, Q., Zhang, F., Jiang, Y.: Cognitive-Inspired Attention Models for Multimodal Perception in Embedded Systems. Neurocomputing 565, 127054 (2024).
[25] Singh, A., Bansal, P., Srivastava, M.: Adaptive Knowledge Distillation for Tiny Multimodal Networks. In: Proceedings of the IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS), pp. 309–314 (2024).
SOURCE: Proceedings of the 16th International Conference on Business Information Security BISEC’2025