Speech Signal Enhancement with Integrated Weighted Filtering for PSNR Reduction in Multimedia Applications
DOI:
https://doi.org/10.69996/jcai.2024011
Keywords:
Speech Signal, Kalman Filter, Speech Enhancement, Classification, Multimedia
Abstract
This paper investigates the effectiveness of the Weighted Kalman Integrated Band Rejection (WKBR) method for enhancing speech signals in multimedia applications. Speech enhancement is crucial for improving the quality and intelligibility of audio in environments with varying noise types and levels. The WKBR method is evaluated across ten noise scenarios, including white noise, babble noise, street noise, and airplane cabin noise. Three performance metrics quantify the enhancement: Peak Signal-to-Noise Ratio (PSNR), Mean Squared Error (MSE), and Short-Time Objective Intelligibility (STOI). Averaged over the scenarios, PSNR rises from 12.8 dB before enhancement to 21.9 dB after, MSE falls from 0.0179 to 0.0053, and STOI improves from 0.58 to 0.75. These findings indicate that WKBR is a promising approach to speech signal enhancement in real-world multimedia applications where clear and intelligible speech is essential.
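As a minimal illustration of two of the metrics named in the abstract, the sketch below computes MSE and PSNR for a pair of sample sequences using the conventional definitions, MSE as the mean of squared sample differences and PSNR as 10·log10(peak² / MSE). It is not the paper's evaluation code: the function names `mse` and `psnr_db`, the `peak` parameter (assumed full-scale amplitude of 1.0), and the toy signals are all assumptions introduced here, and the abstract does not state which peak value or normalization the authors used. STOI is omitted because it requires a more involved short-time, band-wise procedure.

```python
import math


def mse(clean, estimate):
    """Mean squared error between two equal-length sample sequences."""
    if len(clean) != len(estimate):
        raise ValueError("signals must have the same length")
    if len(clean) == 0:
        raise ValueError("signals must be non-empty")
    return sum((c - e) ** 2 for c, e in zip(clean, estimate)) / len(clean)


def psnr_db(clean, estimate, peak=1.0):
    """PSNR in dB, 10 * log10(peak^2 / MSE).

    `peak` is an assumed full-scale amplitude (hypothetical default of 1.0);
    the source does not specify the value used in its experiments.
    """
    err = mse(clean, estimate)
    if err == 0:
        return math.inf  # identical signals: no error to measure
    return 10.0 * math.log10(peak ** 2 / err)


# Toy signals (made up for illustration): every sample is off by 0.1.
clean = [0.0, 0.5, -0.5, 1.0]
noisy = [0.1, 0.4, -0.6, 0.9]

print(f"MSE  = {mse(clean, noisy):.4f}")
print(f"PSNR = {psnr_db(clean, noisy):.2f} dB")
```

With these toy values the MSE is about 0.01 and the PSNR about 20 dB. Because PSNR depends on the logarithm of the MSE, a lower MSE gives a higher PSNR, which matches the direction of the improvements reported above. An average of per-scenario PSNR values is generally not equal to the PSNR computed from the average MSE, so the two averaged figures in the abstract need not convert into one another through this formula.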
License
Copyright (c) 2024 Journal of Computer Allied Intelligence (JCAI)
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Fringe Global Scientific Press publishes all the papers under a Creative Commons Attribution-Non-Commercial 4.0 International (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/) license. Authors have the liberty to replicate and distribute their work. Authors have the ability to use either the whole or a portion of their piece in compilations or other publications that include their own work. Please see the licensing terms for more information on reusing the work.