Abstract: The internet is growing larger and becoming the most popular platform for people to share their opinions on different interests. We choose the education domain, specifically comparing several Malaysian universities against each other. The comparison produces a benchmark based on different criteria shared by online users across various online resources, including Twitter, Facebook, and web pages. It is accomplished using an opinion mining framework that extracts and processes the unstructured text and classifies the result as positive, negative, or neutral (polarity). Hence, we divide our framework into three main stages: opinion collection (extraction), unstructured text processing, and polarity classification. The extraction stage includes web crawling, HTML parsing, sentence segmentation for punctuation classification, and Part-of-Speech (POS) tagging; the second stage processes the unstructured text with stemming and stop-word removal and finally prepares the raw text for classification using Named Entity Recognition (NER). The last phase classifies the polarity and presents the overall result of the comparison among the Malaysian universities. The final result is useful for those interested in studying in Malaysia: our output declares clear winners based on public opinion from across the web.
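The polarity classification stage can be sketched with a toy lexicon-based classifier; the word lists below are illustrative placeholders, not the paper's actual method or lexicon.

```python
# Minimal lexicon-based polarity classifier: count positive and negative
# lexicon hits and report the sign of the balance. The tiny word lists
# are illustrative placeholders only.
POSITIVE = {"good", "great", "excellent", "friendly", "best"}
NEGATIVE = {"bad", "poor", "expensive", "worst", "slow"}

def classify_polarity(text):
    """Return 'positive', 'negative', or 'neutral' for a sentence."""
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

A real pipeline would apply this after the stemming, stop-word removal, and NER steps described above, so that opinions are attributed to the correct university.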
Abstract: Bone Anchored Hearing Implants (BAHI) are
routinely used in patients with conductive or mixed hearing loss, e.g.
if conventional air conduction hearing aids cannot be used. New
sound processors and new fitting software now allow the adjustment
of parameters such as loudness compression ratios or maximum
power output separately. It is currently unclear how the choice of these
parameters influences aided speech understanding in BAHI users.
In this prospective experimental study, the effects of varying the
compression ratio and of lowering the maximum power output of a
BAHI were investigated.
Twelve experienced adult subjects with a mixed hearing loss
participated in this study. Four different compression ratios (1.0; 1.3;
1.6; 2.0) were tested along with two different maximum power output
settings, resulting in a total of eight different programs. Each
participant tested each program for two weeks. A blinded Latin
square design was used to minimize bias.
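The study's exact randomization is not given; a Williams-type balanced Latin square, a common way to counterbalance order effects for an even number of conditions, can be generated as in this illustrative sketch (each program appears once per time slot, and each program follows every other program equally often):

```python
def balanced_latin_square(n):
    """Williams design for even n: row r is the condition order for
    participant r; each condition appears once per column, and each
    condition immediately follows every other equally often."""
    seq, left, right = [0], 1, n - 1   # zig-zag base order 0,1,n-1,2,...
    take_left = True
    while left <= right:
        if take_left:
            seq.append(left); left += 1
        else:
            seq.append(right); right -= 1
        take_left = not take_left
    # Each row shifts the base order by the row index, modulo n.
    return [[(r + s) % n for s in seq] for r in range(n)]

square = balanced_latin_square(8)  # 8 programs -> 8 counterbalanced orders
```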
For each of the eight programs, speech understanding in quiet and
in noise was assessed. For speech in quiet, the Freiburg number test
and the Freiburg monosyllabic word test at 50, 65, and 80 dB SPL
were used. For speech in noise, the Oldenburg sentence test was
administered.
Speech understanding in quiet and in noise improved
significantly in the aided condition with every program, when
compared to the unaided condition. However, no significant differences were
found between any of the eight programs. In contrast, on a subjective
level there was a significant preference for medium compression
ratios of 1.3 to 1.6 and higher maximum power output.
Abstract: This paper presents a comparative evaluation of feature extraction algorithms for a real-time isolated word recognition system based on an FPGA. The Mel-frequency cepstral, linear frequency cepstral, and linear predictive coefficients, together with their cepstral variants, were implemented in a hardware/software co-design. The proposed system was investigated in speaker-dependent mode for 100 different Lithuanian words. The robustness of the feature extraction algorithms was tested by recognizing speech recordings at different signal-to-noise ratios. Experiments on clean recordings show the highest accuracy for the Mel-frequency cepstral and linear frequency cepstral coefficients, while for recordings with a 15 dB signal-to-noise ratio the linear predictive cepstral coefficients give the best results. The hardware and software parts of the system are clocked at 50 MHz and 100 MHz, respectively. For classification, a pipelined dynamic time warping core was implemented. The proposed word recognition system satisfies the real-time requirements and is suitable for applications in embedded systems.
Abstract: Real time non-invasive Brain Computer Interfaces have a significant progressive role in restoring or maintaining a quality life for medically challenged people. This manuscript provides a comprehensive review of emerging research in the field of cognitive/affective computing in the context of human neural responses. The perspectives of different emotion assessment modalities such as facial expressions, speech, text, gestures, and human physiological responses are also discussed. Particular attention is paid to the ability of EEG (Electroencephalogram) signals to portray thoughts, feelings, and unspoken words. An automated workflow-based protocol to design an EEG-based real-time Brain Computer Interface system for analysis and classification of human emotions elicited by external audio/visual stimuli is proposed. The front-end hardware includes a cost-effective and portable Emotiv EEG Neuroheadset unit, a personal computer, and a set of external stimulators. Primary signal analysis and processing of EEG acquired in real time shall be performed using the MATLAB-based advanced brain mapping toolboxes EEGLab/BCILab. This shall be followed by the development of a MATLAB-based self-defined algorithm to capture and characterize temporal and spectral variations in EEG under emotional stimulation. The extracted hybrid feature set shall be used to classify emotional states using artificial intelligence tools such as Artificial Neural Networks. The final system would result in an inexpensive, portable and more intuitive Brain Computer Interface in a real-time scenario to control prosthetic devices by translating different brain states into operative control signals.
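As one concrete example of the spectral features such a pipeline might extract (the abstract does not enumerate its feature set, so the 128 Hz sampling rate and the alpha/theta band limits below are illustrative assumptions), per-band EEG power can be estimated directly from a naive DFT:

```python
import math

def band_power(signal, fs, f_lo, f_hi):
    """Power of `signal` in the band [f_lo, f_hi] Hz via a naive DFT.
    Illustrative only: EEGLab/BCILab use far more efficient and robust
    spectral estimators."""
    n = len(signal)
    power = 0.0
    for k in range(n // 2 + 1):
        freq = k * fs / n
        if f_lo <= freq <= f_hi:
            re = sum(signal[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
            im = -sum(signal[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
            power += (re * re + im * im) / (n * n)
    return power

# A pure 10 Hz tone should put its energy in the alpha band (8-13 Hz),
# not the theta band (4-8 Hz). 128 Hz is an assumed sampling rate.
fs = 128
x = [math.sin(2 * math.pi * 10 * t / fs) for t in range(fs)]
alpha = band_power(x, fs, 8, 13)
theta = band_power(x, fs, 4, 8)
```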
Abstract: This paper presents an algorithm for reconstructing the phase and magnitude responses of an impulse response when only the output data are available. The system is driven by a zero-mean, independent identically distributed (i.i.d.) non-Gaussian sequence that is not observed. The additive noise is assumed to be Gaussian. This is an important and essential problem in many practical applications across science and engineering, such as biomedical, seismic, and speech signal processing. The method is based on evaluating the bicepstrum of the third-order statistics of the observed output data. Simulation results are presented that demonstrate the performance of this method.
Abstract: Transforms, which are lossy algorithms, are mostly used
for speech data compression. Such algorithms are tolerable for
speech data compression since the loss in quality is not perceived
by the human ear. However, vector quantization (VQ) has the
potential to give more data compression while maintaining the same
quality. In this paper we propose a speech data compression
algorithm using vector quantization. We have used the VQ
algorithms LBG, KPE, and FCG. The results table shows the
computational complexity of these three algorithms. We also
introduce a new performance parameter, the Average Fractional
Change in Speech Sample (AFCSS). Our FCG algorithm gives far
better performance than the others in terms of mean absolute error,
AFCSS, and complexity.
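Of the three algorithms, LBG is the classical codebook trainer; KPE and FCG are the paper's variants and are not reproduced here. A scalar LBG sketch (illustrative, not the paper's implementation):

```python
def lbg_codebook(samples, size, eps=1e-3, iters=20):
    """Train a scalar codebook by LBG: start from the global centroid,
    repeatedly split every codeword, then refine with Lloyd iterations
    (nearest-codeword partition followed by centroid update)."""
    codebook = [sum(samples) / len(samples)]
    while len(codebook) < size:
        codebook = ([c * (1 + eps) for c in codebook] +
                    [c * (1 - eps) for c in codebook])   # split step
        for _ in range(iters):                           # Lloyd refinement
            cells = [[] for _ in codebook]
            for x in samples:
                j = min(range(len(codebook)),
                        key=lambda i: (x - codebook[i]) ** 2)
                cells[j].append(x)
            codebook = [sum(c) / len(c) if c else codebook[i]
                        for i, c in enumerate(cells)]
    return sorted(codebook)

# Two well-separated clusters should yield their two means as codewords.
cb = lbg_codebook([0.1, 0.11, 0.09, 0.9, 0.91, 0.89], 2)
```

Real speech VQ operates on multi-dimensional parameter vectors; scalars keep the sketch short.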
Abstract: A robust still image face localization algorithm
capable of operating in an unconstrained visual environment is
proposed. First, construction of a robust skin classifier within a
shifted HSV color space is described. Then various filtering
operations are performed to better isolate face candidates and
mitigate the effect of substantial non-skin regions. Finally, a novel
Bhattacharyya-based face detection algorithm is used to compare
candidate regions of interest with a unique illumination-dependent
face model probability distribution function approximation.
Experimental results show a 90% face detection success rate despite
the demands of the visually noisy environment.
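The Bhattacharyya-based comparison rests on the Bhattacharyya coefficient between a candidate region's distribution and the face model distribution; a minimal sketch with histograms reduced to short vectors (illustrative, not the paper's illumination-dependent model):

```python
import math

def bhattacharyya_coefficient(p, q):
    """Overlap between two histograms after normalization:
    1.0 means identical distributions, 0.0 means disjoint support."""
    sp, sq = sum(p), sum(q)
    return sum(math.sqrt((a / sp) * (b / sq)) for a, b in zip(p, q))

identical = bhattacharyya_coefficient([2, 4, 4], [1, 2, 2])  # same shape
disjoint = bhattacharyya_coefficient([1, 0, 0], [0, 0, 1])   # no overlap
```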
Abstract: Speech enhancement is the process of eliminating noise
and increasing the quality of a speech signal that is contaminated
with other kinds of distortions. This paper concerns the development
of an optimum cascaded system for speech enhancement. This aim
is attained without discarding any relevant speech information and
without much computational or time complexity. The LMS algorithm,
spectral subtraction, and the Kalman filter are deployed as the main
de-noising algorithms in this work. Since each of these algorithms
suffers from its own shortcomings, this work designs cascaded
systems in different combinations and evaluates the cascades by
qualitative (listening) and quantitative (SNR) tests.
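A minimal sketch of the adaptive noise cancellation setup underlying the LMS stage of such cascades (the filter order and step size below are illustrative, not the paper's settings):

```python
import random

def lms_cancel(primary, reference, order=4, mu=0.05):
    """LMS adaptive noise canceller: adapt an FIR filter so the filtered
    reference tracks the noise in the primary input; the error signal
    is the enhanced output."""
    w = [0.0] * order
    out = []
    for n in range(len(primary)):
        x = [reference[n - k] if n - k >= 0 else 0.0 for k in range(order)]
        y = sum(wi * xi for wi, xi in zip(w, x))        # noise estimate
        e = primary[n] - y                              # enhanced sample
        w = [wi + mu * e * xi for wi, xi in zip(w, x)]  # LMS weight update
        out.append(e)
    return out

# Toy check: the primary input is a scaled copy of the reference noise,
# so the residual should shrink as the filter adapts.
random.seed(1)
ref = [random.uniform(-1, 1) for _ in range(2000)]
primary = [0.5 * r for r in ref]
e = lms_cancel(primary, ref)
early = sum(abs(v) for v in e[:100]) / 100
late = sum(abs(v) for v in e[-100:]) / 100
```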
Abstract: People have a habitual pitch level that they generally use when speaking. However, this pitch changes irregularly in the presence of noise, so it is useful to estimate the SNR of a speech signal from its pitch. In this paper, we obtain the energy of the input speech signal and then detect a stationary region in the voiced speech. We obtain the pitch period by NAMDF for this stationary region, in which the pitch does not vary rapidly. After obtaining the pitch, each frame is divided by the pitch period and the likeness of adjacent pitch periods is estimated. We propose a new parameter, NLF, to estimate the SNR of the received speech signal. The NLF is derived from the correlation of neighboring pitch periods and is obtained for each stationary region in the voiced speech. Finally, we confirm good performance in estimating the SNR of the received input speech in the presence of noise.
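NAMDF is a normalized variant of the average magnitude difference function (AMDF); a plain-AMDF sketch of the pitch period search (the frame length and lag range below are illustrative assumptions) is:

```python
import math

def amdf_pitch_period(frame, min_lag=20, max_lag=160):
    """Estimate the pitch period in samples as the lag minimizing the
    average magnitude difference function, the quantity that NAMDF
    normalizes. Illustrative only; the paper's NLF parameter builds on
    correlations between neighboring pitch periods."""
    best_lag, best_val = min_lag, float("inf")
    for lag in range(min_lag, max_lag + 1):
        d = sum(abs(frame[n] - frame[n - lag])
                for n in range(lag, len(frame))) / (len(frame) - lag)
        if d < best_val:
            best_lag, best_val = lag, d
    return best_lag

# A 100 Hz tone sampled at 8 kHz has a pitch period of 80 samples.
fs, f0 = 8000, 100
frame = [math.sin(2 * math.pi * f0 * n / fs) for n in range(400)]
period = amdf_pitch_period(frame, max_lag=120)
```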
Abstract: In this paper, we use Radial Basis Function Networks
(RBFN) to solve the problem of environmental interference
cancellation in speech signals. We show that the Second Order
Thin-Plate Spline (SOTPS) kernel cancels the interferences
effectively. For comparison, we repeat our experiments with the two
most commonly used RBFN kernels: the Gaussian and the First
Order TPS (FOTPS) basis functions. The speech signals used here
were taken from the OGI Multi-Language Telephone Speech Corpus
and were corrupted with six types of environmental noise from the
NOISEX-92 database. Experimental results show that the SOTPS
kernel considerably outperforms the Gaussian and FOTPS functions
on the speech interference cancellation problem.
Abstract: The goal of speech parameterization is to extract from the audio signal the relevant information about what is being spoken. Mel-Frequency Cepstral Coefficients (MFCC) and Relative Spectral Mel-Frequency Cepstral Coefficients (RASTA-MFCC) are the two main techniques used in speech recognition systems. This paper presents some modifications to the original MFCC method. In our work, the effectiveness of the proposed changes to MFCC, called Modified Function Cepstral Coefficients (MODFCC), was tested and compared against the original MFCC and RASTA-MFCC features. Prosodic features such as jitter and shimmer are added to the baseline spectral features. The above-mentioned techniques were tested with impulsive signals under various noisy conditions within the AURORA databases.
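Jitter and shimmer have standard local definitions (mean absolute cycle-to-cycle change relative to the mean); the abstract does not state which variants are used, so the sketch below uses the common ones:

```python
def jitter_percent(periods):
    """Local jitter: mean absolute difference between consecutive pitch
    periods, relative to the mean period, in percent."""
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    return 100.0 * (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))

def shimmer_percent(amplitudes):
    """Local shimmer: the same measure applied to per-cycle peak
    amplitudes instead of periods."""
    diffs = [abs(a - b) for a, b in zip(amplitudes, amplitudes[1:])]
    return 100.0 * (sum(diffs) / len(diffs)) / (sum(amplitudes) / len(amplitudes))

j = jitter_percent([10.0, 10.2, 9.9, 10.1])  # period estimates in ms
```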
Abstract: Discrete Cosine Transform (DCT) based transform coding is very popular in image, video and speech compression due to its good energy compaction and decorrelating properties. However, at low bit rates, the reconstructed images generally suffer from visually annoying blocking artifacts as a result of coarse quantization. The lapped transform was proposed as an alternative to the DCT with reduced blocking artifacts and increased coding gain. Lapped transforms are popular for their good performance, robustness against oversmoothing and availability of fast implementation algorithms. However, there is no proper study reported in the literature regarding the statistical distributions of block Lapped Orthogonal Transform (LOT) and Lapped Biorthogonal Transform (LBT) coefficients. This study performs two goodness-of-fit tests, the Kolmogorov-Smirnov (KS) test and the χ² test, to determine the distribution that best fits the LOT and LBT coefficients. The experimental results show that the distribution of a majority of the significant AC coefficients can be modeled by the Generalized Gaussian distribution. The knowledge of the statistical distribution of transform coefficients greatly helps in the design of optimal quantizers that may lead to minimum distortion and hence achieve optimal coding efficiency.
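The KS test compares the empirical CDF of the coefficients against a hypothesized distribution. The sketch below uses the Laplacian, i.e. the Generalized Gaussian with shape parameter 1 and a classic model for AC transform coefficients, as the hypothesized model (illustrative; the study fits the full Generalized Gaussian family):

```python
import math
import random

def ks_statistic(samples, cdf):
    """One-sample Kolmogorov-Smirnov statistic: the largest vertical gap
    between the empirical CDF of the samples and a hypothesized CDF."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        F = cdf(x)
        d = max(d, abs(F - i / n), abs((i + 1) / n - F))
    return d

def laplace_cdf(x, b=1.0):
    """CDF of the Laplacian (Generalized Gaussian, shape 1)."""
    return 0.5 * math.exp(x / b) if x < 0 else 1.0 - 0.5 * math.exp(-x / b)

# Laplacian samples should fit their own CDF far better than uniform ones.
random.seed(0)
lap = [random.choice([-1, 1]) * random.expovariate(1.0) for _ in range(5000)]
uni = [random.uniform(-3, 3) for _ in range(5000)]
d_lap = ks_statistic(lap, laplace_cdf)
d_uni = ks_statistic(uni, laplace_cdf)
```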
Abstract: People usually have a telephone voice, which means
they adjust their speech to fit particular situations and to blend in with
other interlocutors. The question is: Do we speak differently to
different people? This possibility has been suggested by social
psychologists within Accommodation Theory [1]. Converging toward
the speech of another person can be regarded as a polite speech
strategy while choosing a language not used by the other interlocutor
can be considered as the clearest example of speech divergence [2].
The present study sets out to investigate such processes in the course
of everyday telephone conversations. Using Joos's [3] model of
formality in spoken English, the researchers try to explore
convergence to or divergence from the addressee. The results
propound the actuality that lexical choice, and subsequently, patterns
of style vary intriguingly in concordance with the person being
addressed.
Abstract: This paper describes a 3D modeling system in
Augmented Reality environment, named 3DARModeler. It can be
considered a simple version of 3D Studio Max with necessary
functions for a modeling system such as creating objects, applying
texture, adding animation, estimating real light sources and casting
shadows. The 3DARModeler introduces convenient and effective
human-computer interaction to build 3D models by combining both
the traditional input method (mouse/keyboard) and the tangible input
method (markers). It has the ability to align a new virtual object with
the existing parts of a model. The 3DARModeler targets nontechnical
users. As such, they do not need much knowledge of
computer graphics and modeling techniques. All they have to do is
select basic objects, customize their attributes, and put them together
to build a 3D model in a simple and intuitive way, as if they were
working in the real world. Using the hierarchical modeling technique,
the users are able to group several basic objects to manage them as a
unified, complex object. The system can also connect with other 3D
systems by importing and exporting VRML/3Ds Max files. A
module of speech recognition is included in the system to provide
flexible user interfaces.
Abstract: Acoustical properties of speech have been shown to be
related to the mental state of speakers with symptoms of depression
and remission. This paper describes a way to distinguish depressed
patients from remitted subjects based on measurable acoustic
changes in their spoken sound. The vocal-tract-related frequency
characteristics of speech samples from female remitted and
depressed patients were analyzed via speech processing techniques
and subsequently evaluated statistically by cross-validation with a
Support Vector Machine. Our results show the classifier's
performance, with an effectively correct separation of 93%
determined from testing with the subject-based feature model and
88% with the frame-based model, based on the same speech
samples collected from hospital visiting interview sessions between
patients and psychiatrists.
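Cross-validation partitions the data into folds that take turns as the test set; for the subject-based model the folds would be split along subject boundaries so frames from one subject never appear in both train and test. A plain k-fold index sketch (illustrative only, not the paper's protocol):

```python
def k_fold_indices(n_samples, k):
    """Partition sample indices into k disjoint folds; each fold serves
    once as the test set while the remaining folds train the classifier
    (an SVM in this setting)."""
    folds = [[] for _ in range(k)]
    for i in range(n_samples):
        folds[i % k].append(i)         # round-robin assignment
    splits = []
    for j in range(k):
        test = folds[j]
        train = [i for i in range(n_samples) if i % k != j]
        splits.append((train, test))
    return splits

splits = k_fold_indices(10, 5)
```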
Abstract: This paper presents a new strategy of identification
and classification of pathological voices using a hybrid method
based on wavelet transform and neural networks. After speech
acquisition from a patient, the speech signal is analysed in order to
extract acoustic parameters such as the pitch, the formants, jitter,
and shimmer. The obtained results are compared to normal
reference values by means of a programmable database. Sounds
are collected from normal subjects and from patients, and then
classified into two different categories. The speech database
consists of several pathological and normal voices collected from
the national hospital “Rabta-Tunis". The speech processing
algorithm is run in a supervised mode, first for discrimination
between normal and pathological voices and then for classification
between neural and vocal pathologies (Parkinson's, Alzheimer's,
laryngeal pathologies, dyslexia, ...). Several simulation results are
presented as a function of the disease and compared with the
clinical diagnosis in order to obtain an objective evaluation of the
developed tool.
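The wavelet family used is not specified in the abstract; a single Haar decomposition level, the simplest discrete wavelet transform, illustrates the analysis step that feeds the neural network:

```python
import math

def haar_dwt(signal):
    """One level of the Haar discrete wavelet transform: scaled pairwise
    sums give the approximation band, scaled pairwise differences give
    the detail band. Energy is preserved across the two bands."""
    s = math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal) - 1, 2)]
    return approx, detail

a, d = haar_dwt([4.0, 2.0, 5.0, 5.0])
```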
Abstract: Noise cancellation is one of the essential components of
many DSP applications. Changes in real-time signals are quite rapid
and swift. In noise cancellation, a reference signal that approximates
the noise signal (which corrupts the original information signal) is
obtained and then subtracted from the noise-bearing signal to obtain
a noise-free signal. This approximation of the noise signal is
obtained through adaptive filters, which are self-adjusting. As the
changes in real-time signals are abrupt, this requires an adaptive
algorithm that converges fast and is stable. Least mean square
(LMS) and normalized LMS (NLMS) are two widely used algorithms
because of their simplicity in calculation and implementation, but
their convergence rates are low. Adaptive averaging filters (AFA) are
also used because they converge quickly, but they are less stable.
This paper provides a comparative study of the LMS, NLMS, AFA,
and new enhanced average adaptive (Average NLMS, ANLMS)
filters for noise cancellation applied to speech signals.
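NLMS differs from LMS only in dividing the step size by the instantaneous input power, which keeps the effective step size stable when the signal level varies. A single-update sketch with a toy system identification loop (the two-tap target system is hypothetical):

```python
import random

def nlms_step(w, x, d, mu=0.5, eps=1e-8):
    """One normalized-LMS update for weight vector w given input
    vector x and desired sample d."""
    y = sum(wi * xi for wi, xi in zip(w, x))   # filter output
    e = d - y                                  # error
    norm = sum(xi * xi for xi in x) + eps      # input power + guard term
    w = [wi + (mu / norm) * e * xi for wi, xi in zip(w, x)]
    return w, e

# Toy identification of a hypothetical two-tap system [0.5, -0.25].
random.seed(2)
w = [0.0, 0.0]
for _ in range(500):
    x = [random.uniform(-1, 1), random.uniform(-1, 1)]
    d = 0.5 * x[0] - 0.25 * x[1]
    w, e = nlms_step(w, x, d)
```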
Abstract: Different pseudo-random or pseudo-noise (PN) as well as orthogonal sequences that can be used as spreading codes for code division multiple access (CDMA) cellular networks, or for encrypting speech signals to reduce the residual intelligibility, are investigated. We briefly review the theoretical background of direct-sequence CDMA systems and describe the main characteristics of the maximal-length, Gold, Barker, and Kasami sequences. We also discuss variable- and fixed-length orthogonal codes such as Walsh-Hadamard codes. The equivalence of PN and orthogonal codes is also derived. Finally, a new PN sequence is proposed and shown to have certain better properties than the existing codes.
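Maximal-length sequences are generated by a linear feedback shift register (LFSR) whose feedback taps correspond to a primitive polynomial; a Fibonacci-LFSR sketch (the degree-4 polynomial x⁴ + x + 1 is just an example) is:

```python
def m_sequence(taps, state):
    """Generate one period of an LFSR output sequence. `taps` are the
    0-indexed feedback positions; with a primitive polynomial the
    period is 2**len(state) - 1 and the sequence is maximal-length."""
    n = len(state)
    state = list(state)
    out = []
    for _ in range(2 ** n - 1):
        out.append(state[-1])          # output the last stage
        fb = 0
        for t in taps:
            fb ^= state[t]             # XOR the tapped stages
        state = [fb] + state[:-1]      # shift, feedback enters in front
    return out

# Degree-4 example (taps at stages 3 and 0): period 15, with the
# m-sequence balance property of 2**(n-1) = 8 ones per period.
seq = m_sequence((3, 0), [1, 0, 0, 0])
```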
Abstract: Investigating language acquisition is one of the most
challenging problems in the area of studying language. Syllable
learning as a level of language acquisition has a considerable
significance since it plays an important role in language acquisition.
Because language acquisition cannot be studied directly with
children, especially in its developmental phases, computer models
are useful for examining it. In this
paper a computer model of early language learning for syllable
learning is proposed. It is guided by a conceptual model of syllable
learning which is named Directions Into Velocities of Articulators
model (DIVA). The computer model uses simple associational and
reinforcement learning rules within a neural network architecture
inspired by neuroscience. Our simulation results verify the
ability of the proposed computer model to produce phonemes
during babbling and early speech. Also, it provides a framework for
examining the neural basis of language learning and communication
disorders.
Abstract: The traditional public relations manager is usually responsible for maintaining and enhancing the reputation of the organization among key publics. While the principal focus of this effort is on support publics, it is clearly recognized that an organization's image has important effects on its own employees, its donors and volunteers, and its clients. The aim of this paper is to define aspects of the application of public relations media and tools by nonprofit organizations in the Albanian context. Do nonprofit organizations actually use public relations media and tools, such as written material, audiovisual material, organizational identity media, news, interviews and speeches, events, and web sites, to attract donors? And if public relations media and tools are used, does a relation exist between public relations media and fundraising?