Phonetic Search Methods for Large Speech Databases by Ami Moyal, Vered Aharonson, Ella Tetariy, Michal Gishri


“Phonetic Search Methods for Large Speech Databases” focuses on Keyword Spotting (KWS) within large speech databases. The brief begins by outlining the challenges associated with Keyword Spotting within large speech databases using dynamic keyword vocabularies. It then highlights the various market segments in need of KWS solutions, as well as the specific requirements of each market segment. The work also includes a detailed description of the complexity of the task and the different methods that are used, including the advantages and disadvantages of each method and an in-depth comparison. The main focus is on the Phonetic Search method and its efficient implementation. This includes a literature review of the various methods used for the efficient implementation of Phonetic Search Keyword Spotting, with an emphasis on the authors’ own research, which comprises a comparative analysis of the Phonetic Search method, including algorithmic details. This brief is useful for researchers and developers in academia and in the fields of speech processing and speech recognition, specifically Keyword Spotting.


Similar ai & machine learning books

Artificial Intelligence Through Prolog

Artificial Intelligence Through Prolog book

Language, Cohesion and Form (Studies in Natural Language Processing)

As a pioneer in computational linguistics, working in the earliest days of language processing by computer, Margaret Masterman believed that meaning, not grammar, was the key to understanding languages, and that machines could determine the meaning of sentences. This volume brings together Masterman's groundbreaking papers for the first time, demonstrating the importance of her work in the philosophy of science and the nature of iconic languages.

Handbook of Natural Language Processing

This study explores the design and application of natural language text-based processing systems, based on generative linguistics, empirical corpus analysis, and artificial neural networks. It emphasizes the practical tools to accommodate the selected system.

Additional info for Phonetic Search Methods for Large Speech Databases

Sample text

These probabilities can then be used to define a posterior quality measure per phoneme, which is a weighted sum of all the posterior probabilities.

[Fig. 10: Hypothesis generation using real-time dynamic anchors. Stages shown: off-line training (acoustic models, bi-phone LM), off-line or real-time recognition (input speech DB, phoneme decoder, textual phoneme sequence DB of length M), and on-line KWS (keyword list, real-time anchor computation based on QM estimation, Phonetic Search KWS engine, keyword hypotheses).]
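The weighted-sum form of the posterior quality measure described above can be illustrated with a minimal Python sketch. The excerpt does not give the actual weights or say which posteriors enter the sum, so the function name, the weights, and the numbers below are hypothetical and only show the shape of the computation.

```python
from typing import Sequence


def posterior_quality_measure(posteriors: Sequence[float],
                              weights: Sequence[float]) -> float:
    """Weighted sum of posterior probabilities for one phoneme.

    Both arguments are assumptions for illustration: the excerpt only
    states that the measure is a weighted sum of posterior probabilities.
    """
    if len(posteriors) != len(weights):
        raise ValueError("posteriors and weights must have the same length")
    return sum(w * p for w, p in zip(weights, posteriors))


# Hypothetical posteriors for one decoded phoneme and hypothetical weights.
qm = posterior_quality_measure([0.7, 0.2, 0.1], [1.0, 0.5, 0.0])
print(f"quality measure: {qm:.2f}")
```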

The expected computational complexity of the exhaustive search and the suggested algorithm using the parameters of the DBs is presented in Table 2. The tests were run on an input sequence of phonemes, but could also be run on a lattice. [...] 3 seconds. [...] 55 seconds. A reduction of almost 90% in processing time was achieved.

1 Exhaustive Search

In order to test the effectiveness of the overall mechanism, including the distance measure and the threshold-based decision, a synthetic experiment was first conducted as a benchmark.
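As a rough illustration of an exhaustive phonetic search with a distance measure and a threshold-based decision, the Python sketch below slides a keyword-length window over a decoded phoneme sequence and keeps every position whose normalized distance is at or below the threshold. The excerpt does not specify the authors' distance measure, so plain Levenshtein distance is swapped in here as an assumption; the function names, phoneme strings, and threshold are likewise hypothetical.

```python
from typing import List, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def exhaustive_search(phonemes: Sequence[str],
                      keyword: Sequence[str],
                      threshold: float) -> List[Tuple[int, float]]:
    """Score every window position; return (start, distance) for accepted ones.

    Simplification: windows have exactly the keyword's length, so decoder
    insertions or deletions are only partly absorbed by the edit distance.
    """
    if not keyword:
        raise ValueError("keyword must contain at least one phoneme")
    n = len(keyword)
    hits = []
    for start in range(len(phonemes) - n + 1):
        dist = edit_distance(phonemes[start:start + n], keyword) / n
        if dist <= threshold:  # threshold-based decision
            hits.append((start, dist))
    return hits


# Hypothetical decoder output and keyword pronunciation.
decoded = "dh ax k ae t s ae t aa n dh ax m ae t".split()
keyword = "k ae t".split()
for start, dist in exhaustive_search(decoded, keyword, threshold=0.34):
    print(start, round(dist, 2))
```

Because every position is scored, the cost grows with the product of the database length and the keyword length, which is the kind of cost that search-space reduction aims to cut.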

These characteristics lead to much poorer phoneme recognition results, making this seeming inconsistency less surprising. When a separate threshold was used for each keyword, the DR drastically improved, thus producing similar KWS performance for both spontaneous and read speech. Evidently, the distance threshold value has a major influence on the working point of the system, in the sense of the trade-off between the false alarm rate (FAR) and the detection rate (DR). Further analysis suggests that the dynamic range of the distance values differs per word, and thus a different threshold per word can dramatically improve performance.
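The effect of per-keyword thresholds can be sketched in Python as follows. This is not the authors' procedure: the selection rule (largest threshold that stays within a false-alarm budget on labelled development data), the function names, and the scores are all assumptions made for illustration. False alarms are counted rather than normalized into a rate, since the excerpt does not give the normalization.

```python
from typing import Dict, List, Tuple

# One scored hypothesis: (distance, is_true_occurrence).
Scored = List[Tuple[float, bool]]


def detection_and_false_alarms(scored: Scored,
                               threshold: float) -> Tuple[float, int]:
    """Return (detection rate, false-alarm count) at a given threshold."""
    accepted = [is_true for dist, is_true in scored if dist <= threshold]
    total_true = sum(1 for _, is_true in scored if is_true)
    detected = sum(accepted)
    false_alarms = len(accepted) - detected
    dr = detected / total_true if total_true else 0.0
    return dr, false_alarms


def pick_threshold(scored: Scored, max_false_alarms: int) -> float:
    """Largest observed distance whose false-alarm count is within budget."""
    best = float("-inf")  # accepts nothing if even the best score is a false alarm
    for candidate in sorted({dist for dist, _ in scored}):
        _, fa = detection_and_false_alarms(scored, candidate)
        if fa > max_false_alarms:
            break
        best = candidate
    return best


# Hypothetical development-set scores; the distance ranges differ per word.
dev: Dict[str, Scored] = {
    "cat": [(0.0, True), (0.2, True), (0.3, False), (0.5, False)],
    "information": [(0.3, True), (0.45, True), (0.6, False), (0.7, False)],
}

per_word = {word: pick_threshold(s, max_false_alarms=0) for word, s in dev.items()}
for word, scored in dev.items():
    print(word, per_word[word], detection_and_false_alarms(scored, per_word[word]))
```

With these made-up scores, a single global threshold of 0.2 detects none of the "information" occurrences, and a global threshold of 0.45 admits a false alarm for "cat", whereas the per-word thresholds detect every occurrence with no false alarms.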
