Hinton, G., Deng, L., Yu, D., Dahl, G., Mohamed, A., Jaitly, N., . . . Kingsbury, B. (2012). Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups. IEEE Signal Processing Magazine, 29(6), 82-97. doi:10.1109/msp.2012.2205597

Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups

The goal of speech recognition software is twofold: to interpret audio input with complete accuracy, and to translate that audio into equally accurate textual output. Most current automatic speech recognition systems use hidden Markov models (HMMs) combined with Gaussian mixture models (GMMs) to interpret acoustic input and determine the appropriate response; for the purposes of this article, this approach is referred to as the "traditional model". When speech is input, the software uses statistical analysis to interpret the spoken language and to predict the "real," or "actual," meaning the user intended to associate with that input. Every new input creates a new "occurrence". New occurrences are recorded and stored in the computer's memory, where their outcomes are aggregated and called upon to inform how the system should respond to the next occurrence. The software immediately compares the input against known language patterns and against information recorded from previous experience with that individual's speech, and uses a predictive model to forecast the next logical output, returning text tailored to each individual user.
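To make the "traditional model" concrete, the short Python sketch below (using NumPy and SciPy) shows the core scoring step of a GMM-HMM recogniser: each incoming acoustic frame is scored against the Gaussian mixture attached to each HMM state, and the HMM search then combines those scores over time. The numbers of states, mixture components, and features, and all parameter values, are invented toy figures, not anything taken from the paper.

# Toy sketch of the GMM scoring step in a GMM-HMM recogniser.
# All shapes and parameter values are illustrative, not from the paper.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)

n_states = 3      # toy number of HMM states (real systems use thousands)
n_mix = 2         # Gaussians per state
dim = 13          # e.g. 13 MFCC coefficients per 10 ms frame

# One Gaussian mixture per HMM state: weights, means, diagonal covariances.
weights = rng.dirichlet(np.ones(n_mix), size=n_states)          # (states, mix)
means = rng.normal(size=(n_states, n_mix, dim))                 # (states, mix, dim)
variances = rng.uniform(0.5, 1.5, size=(n_states, n_mix, dim))  # (states, mix, dim)

def gmm_log_likelihood(frame, state):
    """Log p(frame | state): how well this state's mixture explains the frame."""
    component_ll = [
        np.log(weights[state, m])
        + multivariate_normal.logpdf(frame, means[state, m], np.diag(variances[state, m]))
        for m in range(n_mix)
    ]
    return np.logaddexp.reduce(component_ll)

# Score one incoming acoustic frame against every state; the HMM's search
# would combine these scores with transition probabilities over time.
frame = rng.normal(size=dim)
scores = [gmm_log_likelihood(frame, s) for s in range(n_states)]
print("log-likelihood per HMM state:", np.round(scores, 2))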

Google Inc. is a significant contributor to several advanced projects on speech recognition, and researchers working for Google include some of the most published and most cited individuals currently in the field. In 2012, researcher George Dahl and ten co-authors, including lead author Geoffrey Hinton and members of an advanced research team at Google, published a paper in IEEE Signal Processing Magazine titled Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups.

Working collaboratively, the four teams, from the University of Toronto, Google Inc., Microsoft Research, and IBM Research, identified an alternative method for acoustic modeling in speech recognition that uses a feed-forward deep neural network (DNN) to interpret audio input and translate it into textual output more accurately than the traditional model. The authors claim that deep neural networks outperform speech recognition systems built on the "traditional model" on a variety of benchmarks, sometimes by a large margin.
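The sketch below shows, in miniature, what such a feed-forward acoustic model does: a stacked window of acoustic feature frames goes in, and a probability for every HMM state comes out. The layer sizes, the number of output states, and the random weights are simplifications invented for this example (the networks in the paper are far larger), so it should be read as an illustration rather than as the authors' implementation.

# Toy sketch of a feed-forward DNN acoustic model: a window of acoustic
# feature frames in, a posterior probability over HMM states out.
# All sizes and parameter values are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)

context = 11                      # e.g. 11 stacked frames (5 on each side of the centre frame)
dim = 40                          # features per frame (e.g. log filterbank energies)
hidden_sizes = [512, 512, 512]    # several hidden layers are what make the network "deep"
n_states = 1000                   # tied HMM states predicted by the softmax output layer

def init_layer(n_in, n_out):
    # small random weights, scaled by fan-in (see the initialisation point below)
    return rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out)), np.zeros(n_out)

sizes = [context * dim] + hidden_sizes + [n_states]
layers = [init_layer(a, b) for a, b in zip(sizes[:-1], sizes[1:])]

def forward(frame_window):
    """Map a window of frames to a posterior distribution over HMM states."""
    h = frame_window.reshape(-1)
    for W, b in layers[:-1]:
        h = 1.0 / (1.0 + np.exp(-(h @ W + b)))   # logistic hidden units
    W, b = layers[-1]
    logits = h @ W + b
    p = np.exp(logits - logits.max())            # numerically stable softmax
    return p / p.sum()

posterior = forward(rng.normal(size=(context, dim)))
print(posterior.shape, posterior.sum())          # (1000,) ~1.0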

The researchers argue that, despite all its advantages, the traditional model has a serious shortcoming: Gaussian mixture models are statistically inefficient for modeling data that lie on or near a nonlinear manifold in the data space. The system's complicated procedures also slow processing, which increases the probability of the software producing an inaccurate response. Deep neural networks address these inefficiencies by drawing on recent advances in both machine learning algorithms and computer hardware. The new algorithms and the increase in computational power led to modern training methods for deep neural networks, producing levels of system efficiency beyond what researchers had previously considered possible.

When deep neural networks were first used, they were trained purely discriminatively. It was only around the time of this publication that researchers showed significant gains could be achieved by adding an initial stage of generative pretraining. Pretraining reduces overfitting, and it also reduces the time required for discriminative fine-tuning with backpropagation, which had been one of the main impediments to using DNNs in place of the traditional model; the networks themselves can also exploit information in neighboring frames, or "occurrences," of the audio. The researchers further found that similar reductions in training time can be achieved by carefully adjusting the scale of the initial random weights in each layer.
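These two training ideas can be illustrated briefly. The sketch below shows (a) random initialisation whose scale depends on each layer's fan-in (the 1/sqrt(fan-in) recipe used here is one common choice for illustration; the paper only says the scales must be set carefully) and (b) one step of discriminative fine-tuning with backpropagation on a labelled frame. The generative pretraining stage itself is not shown, and all sizes, data, and learning-rate values are invented for the example.

# Toy sketch of (a) carefully scaled random initialisation and (b) one step of
# discriminative fine-tuning with backpropagation. Generative pretraining is
# not shown. All sizes and values are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
sizes = [440, 512, 512, 1000]     # input window, two hidden layers, HMM states

# (a) layer-by-layer random initialisation, scaled by each layer's fan-in
weights = [rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out))
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n_out) for n_out in sizes[1:]]

def forward(x):
    """Return the activations of every layer, ending in a softmax over states."""
    acts = [x]
    for W, b in zip(weights[:-1], biases[:-1]):
        acts.append(1.0 / (1.0 + np.exp(-(acts[-1] @ W + b))))   # logistic hidden units
    logits = acts[-1] @ weights[-1] + biases[-1]
    p = np.exp(logits - logits.max())
    acts.append(p / p.sum())                                     # softmax over HMM states
    return acts

# (b) one SGD step of discriminative fine-tuning on a single labelled frame
def finetune_step(x, target_state, lr=0.1):
    acts = forward(x)
    delta = acts[-1].copy()
    delta[target_state] -= 1.0                   # gradient of cross-entropy w.r.t. the logits
    for i in reversed(range(len(weights))):
        grad_W = np.outer(acts[i], delta)
        grad_b = delta
        if i > 0:                                # backpropagate through the logistic layer below
            delta = (delta @ weights[i].T) * acts[i] * (1.0 - acts[i])
        weights[i] -= lr * grad_W
        biases[i] -= lr * grad_b

frame_window, label = rng.normal(size=440), 42
before = forward(frame_window)[-1][label]
finetune_step(frame_window, label)
after = forward(frame_window)[-1][label]
print(f"p(correct state) before: {before:.4f}  after: {after:.4f}")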

Understanding AI systems, and acoustic modeling in speech recognition in particular, has significant implications for those studying mass communication and digital culture. As products integrate these technologies more seamlessly into daily life, consumers need to be aware of how reliance on such intelligent machines can change their behaviour.


Warren Buzanko - 5750021 Monday, March 5th, 2018
