Speech Signal Enhancement with Integrated Weighted Filtering for PSNR Reduction in Multimedia Applications
DOI: https://doi.org/10.69996/jcai.2024011

Keywords: Speech Signal, Kalman Filter, Speech Enhancement, Classification, Multimedia

Abstract
This paper investigates the effectiveness of the Weighted Kalman Integrated Band Rejection (WKBR) method for enhancing speech signals in multimedia applications. Speech enhancement is crucial for improving the quality and intelligibility of audio in environments with varying noise types and levels. The WKBR method is evaluated across ten different noise scenarios, including white noise, babble noise, street noise, airplane cabin noise, and more. Performance metrics such as Peak Signal-to-Noise Ratio (PSNR), Mean Squared Error (MSE), and Short-Time Objective Intelligibility (STOI) are used to quantify the enhancement. The results show significant improvements, with PSNR increasing from an average of 12.8 dB before enhancement to 21.9 dB after enhancement, MSE reducing from an average of 0.0179 to 0.0053, and STOI scores improving from an average of 0.58 to 0.75. These findings highlight the potential of WKBR as a powerful tool for speech signal enhancement, making it a promising solution for real-world multimedia applications where clear and intelligible speech is essential.
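The WKBR filter itself is not specified on this page, but the two distortion metrics the abstract reports can be stated concretely. The sketch below is a minimal, hypothetical illustration (not the authors' code) of how MSE and PSNR are typically computed between a clean reference signal and an enhanced signal, assuming samples are normalized to [-1, 1] so the peak value is 1.0:

```python
import numpy as np

def mse(reference, estimate):
    """Mean squared error between the clean reference and the processed signal."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    return float(np.mean((reference - estimate) ** 2))

def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB: 10*log10(peak^2 / MSE).

    peak=1.0 assumes the speech samples are scaled to [-1, 1].
    """
    err = mse(reference, estimate)
    if err == 0.0:
        return float("inf")  # identical signals: infinite PSNR
    return float(10.0 * np.log10(peak ** 2 / err))
```

With these definitions, the reported averages are consistent with each other in order of magnitude (an MSE near 0.005 against a unit peak corresponds to a PSNR in the low-20 dB range). STOI, by contrast, is a perceptually motivated intelligibility measure computed over short-time spectral envelopes; an open-source reference implementation is available in the `pystoi` Python package.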
References
[1] V. K. Padarti, G. S. Polavarapu, M. Madiraju, V. V. Naga Sai Nuthalapati, V. B. Thota et al., “A Study on Effectiveness of Deep Neural Networks for Speech Signal Enhancement in Comparison with Wiener Filtering Technique,” In Advances in Speech and Music Technology: Computational Aspects and Applications, pp.121-135, 2022.
[2] V. Srinivasarao, “An efficient recurrent Rats function network (Rrfn) based speech enhancement through noise reduction,” Multimedia Tools and Applications, vol.81, no.21, pp.30599-30614, 2022.
[3] S. Bhukya, D. Srinivasarao and K. Saheb, “Environmental Monitoring with Wireless Sensor Network for Energy Aware Routing and Localization,” Journal of Sensors, IoT & Health Sciences, vol.1, no.1, pp.27-39, 2023.
[4] P. Brundavani, D. Vishnu Vardhan and B. Abdul Raheem, “Ffsgc-Based Classification of Environmental Factors in IoT Sports Education Data during the Covid-19 Pandemic,” Journal of Sensors, IoT & Health Sciences, vol.2, no.1, pp.28-54, 2024.
[5] B. K. Pandey, D. Pandey, S. Wairya, G. Agarwal, P. Dadeech et al., “Application of integrated steganography and image compressing techniques for confidential information transmission,” Cyber Security and Network Security, pp.169-191, 2022.
[6] I. Schiopu and A. Munteanu, “Deep Learning Post-Filtering Using Multi-Head Attention and Multiresolution Feature Fusion for Image and Intra-Video Quality Enhancement,” Sensors, vol.22, no.4, pp.1353, 2022.
[7] Y. Wang, S. Hu, S. Yin, Z. Deng and Y. H. Yang, “A multi-level wavelet-based underwater image enhancement network with color compensation prior,” Expert Systems with Applications, vol.242, pp.122710, 2024.
[8] Z. Q. Wang, G. Wichern, S. Watanabe and J. Le Roux, “STFT-domain neural speech enhancement with very low algorithmic latency,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.31, pp.397-410, 2022.
[9] A. B. Abdusalomov, F. Safarov, M. Rakhimov, B. Turaev and T. K. Whangbo, “Improved feature parameter extraction from speech signals using machine learning algorithm,” Sensors, vol.22, no.21, pp.8122, 2022.
[10] X. Bie, S. Leglaive, X. Alameda-Pineda and L. Girin, “Unsupervised speech enhancement using dynamical variational autoencoders,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.30, pp.2993-3007, 2022.
[11] K. Mannepalli, P. N. Sastry and M. Suman, “Emotion recognition in speech signals using optimization based multi-SVNN classifier,” Journal of King Saud University-Computer and Information Sciences, vol.34, no.2, pp.384-397, 2022.
[12] S. C. Venkateswarlu, N. U. Kumar, D. Veeraswamy and V. Vijay, “Speech intelligibility quality in telugu speech patterns using a wavelet-based hybrid threshold transform method,” In Intelligent systems and sustainable computing: proceedings of ICISSC 2021, pp. 449-462, 2022.
[13] S. Y. Chuang, H. M. Wang and Y. Tsao, “Improved lite audio-visual speech enhancement,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.30, pp.1345-1359, 2022.
[14] Z. Huang, S. Watanabe, S. W. Yang, P. García and S. Khudanpur, “Investigating self-supervised learning for speech enhancement and separation,” In ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6837-6841, 2022.
[15] A. A. Abdelhamid, E. S. M. El-Kenawy, B. Alotaibi, G. M. Amer, M. Y. Abdelkader et al., “Robust speech emotion recognition using CNN+ LSTM based on stochastic fractal search optimization algorithm,” IEEE Access, vol.10, pp.49265-49284, 2022.
[16] A. Jain and B. Saha, “Blockchain integration for secure payroll transactions in Oracle Cloud HCM,” International Journal of New Research and Development, vol.5, no.12, pp.71-81, 2020.
[17] M. A. Khan, S. Abbas, A. Raza, F. Khan and T. Whangbo, “Emotion Based Signal Enhancement Through Multisensory Integration Using Machine Learning,” Computers, Materials & Continua, vol.71, no.3, 2022.
[18] L. Kumar and A. Biswanath Saha, “Evaluating the impact of AI-driven project prioritization on program success in hybrid cloud environments,” International Journal of Research in All Subjects in Multi Languages (IJRSML), vol.7, no.1, pp.78-99, 2019.
License
Copyright (c) 2024 Journal of Computer Allied Intelligence(JCAI)

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Fringe Global Scientific Press publishes all papers under a Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license (https://creativecommons.org/licenses/by-nc/4.0/). Authors are free to reproduce and distribute their work, and may reuse all or part of it in compilations or other publications that include their own work. Please see the licensing terms for more information on reuse.