Abstract: There are multiple reasons to expect that detecting the
word order errors in a text will be a difficult problem, and detection
rates reported in the literature are in fact low. Although grammatical
rules constructed by computer linguists improve the performance of
grammar checker in word order diagnosis, the repairing task is still
very difficult. This paper presents an approach for repairing word
order errors in English text by reordering words in a sentence and
choosing the version that maximizes the number of trigram hits
according to a language model. The novelty of this method concerns
the use of an efficient confusion matrix technique for reordering the
words. The comparative advantage of this method is that works with
a large set of words, and avoids the laborious and costly process of
collecting word order errors for creating error patterns.
Abstract: Emotion in speech is an issue that has been attracting
the interest of the speech community for many years, both in the
context of speech synthesis as well as in automatic speech
recognition (ASR). In spite of the remarkable recent progress in
Large Vocabulary Recognition (LVR), it is still far behind the
ultimate goal of recognising free conversational speech uttered by
any speaker in any environment. Current experimental tests prove
that using state of the art large vocabulary recognition systems the
error rate increases substantially when applied to
spontaneous/emotional speech. This paper shows that recognition
rate for emotionally coloured speech can be improved by using a
language model based on increased representation of emotional
utterances.
Abstract: Parsing is important in Linguistics and Natural
Language Processing to understand the syntax and semantics of a
natural language grammar. Parsing natural language text is
challenging because of the problems like ambiguity and inefficiency.
Also the interpretation of natural language text depends on context
based techniques. A probabilistic component is essential to resolve
ambiguity in both syntax and semantics thereby increasing accuracy
and efficiency of the parser. Tamil language has some inherent
features which are more challenging. In order to obtain the solutions,
lexicalized and statistical approach is to be applied in the parsing
with the aid of a language model. Statistical models mainly focus on
semantics of the language which are suitable for large vocabulary
tasks where as structural methods focus on syntax which models
small vocabulary tasks. A statistical language model based on Trigram
for Tamil language with medium vocabulary of 5000 words has
been built. Though statistical parsing gives better performance
through tri-gram probabilities and large vocabulary size, it has some
disadvantages like focus on semantics rather than syntax, lack of
support in free ordering of words and long term relationship. To
overcome the disadvantages a structural component is to be
incorporated in statistical language models which leads to the
implementation of hybrid language models. This paper has attempted
to build phrase structured hybrid language model which resolves
above mentioned disadvantages. In the development of hybrid
language model, new part of speech tag set for Tamil language has
been developed with more than 500 tags which have the wider
coverage. A phrase structured Treebank has been developed with 326
Tamil sentences which covers more than 5000 words. A hybrid
language model has been trained with the phrase structured Treebank
using immediate head parsing technique. Lexicalized and statistical
parser which employs this hybrid language model and immediate
head parsing technique gives better results than pure grammar and
trigram based model.