A Study of Speech Recognition with Deep Learning

¹Feng Li, ²Yiyang Wei

¹,²School of Management Science and Engineering, Anhui University of Finance and Economics, Bengbu 233030, China

DOI: https://doi.org/10.47191/ijmra/v5-i5-12

ABSTRACT:

Advances in deep learning and artificial intelligence have driven rapid progress in speech recognition, and end-to-end architectures have become a central part of speech recognition systems. This paper introduces two end-to-end speech recognition methods, the attention model and the CTC (connectionist temporal classification) loss function, describes the practical application of deep learning in speech recognition, and suggests improvements to both models. Finally, it demonstrates the practical usefulness of speech recognition by analyzing how trigger word detection and sentiment analysis are applied in artificial intelligence for teaching and learning.

KEYWORDS:

Speech Recognition; Deep Learning; CTC Loss Function; Sentiment Analysis.

International Journal of Multidisciplinary Research and Analysis, Volume 05 Issue 05, May 2022

This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International license (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/), which permits remixing, adapting and building upon the work for non-commercial use, provided the original work is properly cited.
