Structural Parsing of Natural Language Text in Tamil Using Phrase Structure Hybrid Language Model

Parsing is important in Linguistics and Natural Language Processing to understand the syntax and semantics of a natural language grammar. Parsing natural language text is challenging because of the problems like ambiguity and inefficiency. Also the interpretation of natural language text depends on context based techniques. A probabilistic component is essential to resolve ambiguity in both syntax and semantics thereby increasing accuracy and efficiency of the parser. Tamil language has some inherent features which are more challenging. In order to obtain the solutions, lexicalized and statistical approach is to be applied in the parsing with the aid of a language model. Statistical models mainly focus on semantics of the language which are suitable for large vocabulary tasks where as structural methods focus on syntax which models small vocabulary tasks. A statistical language model based on Trigram for Tamil language with medium vocabulary of 5000 words has been built. Though statistical parsing gives better performance through tri-gram probabilities and large vocabulary size, it has some disadvantages like focus on semantics rather than syntax, lack of support in free ordering of words and long term relationship. To overcome the disadvantages a structural component is to be incorporated in statistical language models which leads to the implementation of hybrid language models. This paper has attempted to build phrase structured hybrid language model which resolves above mentioned disadvantages. In the development of hybrid language model, new part of speech tag set for Tamil language has been developed with more than 500 tags which have the wider coverage. A phrase structured Treebank has been developed with 326 Tamil sentences which covers more than 5000 words. A hybrid language model has been trained with the phrase structured Treebank using immediate head parsing technique. Lexicalized and statistical parser which employs this hybrid language model and immediate head parsing technique gives better results than pure grammar and trigram based model.

On-line Speech Enhancement by Time-Frequency Masking under Prior Knowledge of Source Location

This paper presents the source extraction system which can extract only target signals with constraints on source localization in on-line systems. The proposed system is a kind of methods for enhancing a target signal and suppressing other interference signals. But, the performance of proposed system is superior to any other methods and the extraction of target source is comparatively complete. The method has a beamforming concept and uses an improved time-frequency (TF) mask-based BSS algorithm to separate a target signal from multiple noise sources. The target sources are assumed to be in front and test data was recorded in a reverberant room. The experimental results of the proposed method was evaluated by the PESQ score of real-recording sentences and showed a noticeable speech enhancement.