Information Filtering Using Index Word Selection Based on Topics

We have proposed an information filtering system that selects index words from a document set based on the topics the set contains. The method narrows the vocabulary down to the words that are particularly characteristic of the document set, where the topics are obtained by Sparse Non-negative Matrix Factorization. In information filtering, a document is often represented as a vector whose elements are the weights of the index words, and the dimension of this vector grows as the number of documents increases, so words that are useless as index words for filtering may be included. To address this problem, the dimension needs to be reduced. Our proposal reduces the dimension by selecting index words based on the topics included in the document set, which we obtain by applying Sparse Non-negative Matrix Factorization to the set. Filtering is carried out with respect to the centroid of the learning document set, which is regarded as the user's interest and is represented as a document vector whose elements are the weights of the selected index words. Using the English test collection MEDLINE, we confirm the effectiveness of our proposal: when an appropriate number of index words is selected, it improves recommendation accuracy over previous methods. We also examined the index words selected by our proposal and found that it is able to select index words that cover some of the minor topics included in the document set.
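
As a rough illustration of the pipeline this abstract describes, the sketch below selects index words from the top-weighted entries of each topic and filters new documents against the centroid of the training set. It is a minimal sketch only: scikit-learn's NMF with an l1 penalty stands in for the Sparse NMF variant used in the paper, and the corpus, number of topics, and top_k are illustrative.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

docs = [
    "sparse coding and dictionary learning",
    "dictionary learning for image denoising",
    "protein interaction network analysis",
    "network centrality of protein graphs",
]  # illustrative training documents

vec = TfidfVectorizer()
X = vec.fit_transform(docs)                       # documents x index words

# l1-penalized NMF as a stand-in for Sparse NMF: rows of H are sparse topics
nmf = NMF(n_components=2, init="nndsvda", alpha_H=0.1, l1_ratio=1.0,
          max_iter=500)
nmf.fit(X)
H = nmf.components_                               # topic x word weights

# keep the top words of every topic, so minor topics also contribute
top_k = 3
selected = np.unique(np.concatenate(
    [np.argsort(h)[::-1][:top_k] for h in H]))

# centroid of the training set over the selected words = user's interest
centroid = np.asarray(X[:, selected].mean(axis=0)).ravel()

def score(text):
    """Cosine similarity between a new document and the centroid."""
    v = vec.transform([text])[:, selected].toarray().ravel()
    denom = np.linalg.norm(v) * np.linalg.norm(centroid)
    return 0.0 if denom == 0 else float(v @ centroid) / denom

print(score("protein network centrality"))
```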

Application of the l1-Norm Minimization Technique to Image Retrieval

Image retrieval is a topic of high current scientific interest. The key steps in an image retrieval system are the extraction of discriminative features and a feasible similarity metric for retrieving the database images whose content is similar to that of the query image. Gabor filtering is a widely adopted technique for extracting features from texture images. The recently proposed sparsity-promoting l1-norm minimization technique finds the sparsest solution of an under-determined system of linear equations. In this paper, the l1-norm minimization technique is used as a similarity metric for image retrieval. Simulation results demonstrate that it provides a promising alternative to existing similarity metrics; in particular, the cases where it works better than the Euclidean distance metric are singled out.
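
For concreteness, here is one common way to turn l1-norm minimization into a retrieval score, expressing the query feature vector as a sparse combination of the database feature vectors and ranking database images by coefficient magnitude. This is a minimal sketch and the paper's exact formulation may differ; Gabor feature extraction is assumed to happen elsewhere, and A and b below are random stand-ins.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
A = rng.standard_normal((32, 100))             # 100 database images, 32-dim features
b = A[:, 7] + 0.01 * rng.standard_normal(32)   # query close to image 7

m, n = A.shape
# min ||x||_1  s.t.  A x = b, as an LP with x = u - v, u >= 0, v >= 0
c = np.ones(2 * n)
A_eq = np.hstack([A, -A])
res = linprog(c, A_eq=A_eq, b_eq=b, bounds=[(0, None)] * (2 * n),
              method="highs")
x = res.x[:n] - res.x[n:]

ranking = np.argsort(-np.abs(x))   # images with largest sparse weight first
print(ranking[:5])                 # image 7 should appear at the top
```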

Novel Adaptive Channel Equalization Algorithms by Statistical Sampling

In this paper, novel statistical-sampling-based equalization techniques and CNN-based detection are proposed to increase the spectral efficiency of multiuser communication systems over fading channels. Multiuser communication combined with selective fading can produce interference that severely degrades the quality of service in wireless data transmission (e.g. CDMA in mobile communication). The paper introduces new equalization methods that combat interference by minimizing the Bit Error Rate (BER) as a function of the equalizer coefficients, which yields higher performance than traditional Minimum Mean Square Error equalization. Since computing the BER as a function of the equalizer coefficients has exponential complexity, statistical sampling methods are proposed to approximate its gradient, giving fast equalization and performance superior to the traditional algorithms. Efficient estimation of the gradient is achieved by using stratified sampling and the Li-Silvester bounds, and a simple mechanism is derived to identify the dominant samples in real time. The equalizer weights are adapted recursively by minimizing the estimated BER, and the near-optimal performance of the new algorithms is demonstrated by extensive simulations. The paper also develops a Cellular Neural Network (CNN) based approach to detection, in which fast quadratic optimization is carried out by the CNN, while the task of the equalizer is to ensure the required template structure (sparseness) for the CNN. The performance of this method is likewise analyzed by simulations.
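
The sketch below shows the core idea of adapting equalizer weights against a sampled BER estimate. Plain Monte Carlo sampling and a finite-difference gradient stand in for the stratified sampling and Li-Silvester bounds of the paper; the channel h, equalizer length, noise level, and step sizes are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
h = np.array([1.0, 0.5, 0.2])        # toy multipath channel
L = 5                                # equalizer taps
d = L // 2 + len(h) - 1              # decision delay matching the centre tap

def ber_estimate(w, n=2000, sigma=0.4):
    """Monte Carlo estimate of the BER of equalizer w for BPSK over h."""
    s = rng.choice([-1.0, 1.0], size=(n, L + len(h) - 1))
    # received samples: r[:, j] depends on s[:, j : j + len(h)]
    r = np.stack([s[:, j:j + len(h)] @ h[::-1] for j in range(L)], axis=1)
    r += sigma * rng.standard_normal(r.shape)
    return float(np.mean(np.sign(r @ w) != s[:, d]))

# crude gradient descent on the sampled BER via finite differences
w = np.zeros(L)
w[L // 2] = 1.0                      # start from a centre-tap equalizer
eps, step = 0.05, 0.5
for _ in range(30):
    g = np.array([(ber_estimate(w + eps * e) - ber_estimate(w - eps * e))
                  / (2 * eps) for e in np.eye(L)])
    w -= step * g
print(ber_estimate(w))
```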

Partially Known Least Support Orthogonal Matching Pursuit (PKLS-OMP) for Signal Recovery

Given a large sparse signal, the goal is to reconstruct it precisely and accurately from as few measurements as possible. Although this is possible in theory, the difficulty lies in building an algorithm that performs the reconstruction accurately and efficiently. This paper proposes a new method for reconstructing sparse signals that merges Least Support Orthogonal Matching Pursuit (LS-OMP) with the theory of partially known support, yielding Partially Known Least Support Orthogonal Matching Pursuit (PKLS-OMP). These methods rely on a greedy algorithm whose cost of computing the support depends on the number of iterations; to speed this up, PKLS-OMP incorporates partial knowledge of the support into the algorithm. The method recovers the original signal simply, efficiently, and accurately provided the sampling matrix satisfies the Restricted Isometry Property (RIP). Simulation results also show that it outperforms many algorithms, especially for compressible signals.
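
The sketch below shows the structure of an OMP variant seeded with a partially known support, the central idea named in this abstract; it is not the authors' exact PKLS-OMP. The matrix, sparsity level, and known indices are illustrative.

```python
import numpy as np

def omp_partial_support(A, y, k, known):
    """Greedily recover a k-sparse x from y = A x, seeding the support
    with indices known a priori."""
    support = list(known)
    residual = y.copy()
    for _ in range(k - len(known)):
        if support:  # account for the atoms already in the support
            x_s, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
            residual = y - A[:, support] @ x_s
        corr = np.abs(A.T @ residual)
        corr[support] = -np.inf          # do not pick an atom twice
        support.append(int(np.argmax(corr)))
    x_s, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
    x = np.zeros(A.shape[1])
    x[support] = x_s
    return x

rng = np.random.default_rng(2)
A = rng.standard_normal((40, 120))
A /= np.linalg.norm(A, axis=0)           # unit-norm columns
x_true = np.zeros(120)
x_true[[3, 17, 50, 90]] = [1.5, -2.0, 1.0, 0.7]
y = A @ x_true
x_hat = omp_partial_support(A, y, k=4, known=[3, 17])  # half the support known
print(np.allclose(x_hat, x_true, atol=1e-8))           # typically True
```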

Feature Selection with Kohonen Self Organizing Classification Algorithm

In this paper a one-dimensional Self-Organizing Map (SOM) algorithm for feature selection is presented. The algorithm first classifies the input dataset in a similarity space; from this classification, a set of positive and negative features is computed for each class, and this set of features is the result of the procedure. The procedure is evaluated on an in-house dataset from a Knowledge Discovery from Text (KDT) application and on a set of publicly available datasets used in international feature selection competitions, drawn from KDT, drug discovery, and other applications. The correct classifications available for the training and validation datasets are used to optimize the parameters of the positive and negative feature extraction. The process is made feasible for large, sparse datasets, such as those obtained in KDT applications, by using compression techniques to store the similarity matrix and speed-up techniques for the Kohonen algorithm that take advantage of the sparsity of the input matrix. These improvements, combined with grid computing, make it feasible to apply the methodology to massive datasets.
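
A minimal sketch of the two stages described above: a one-dimensional SOM classifies the samples, and positive/negative features are then read off per class as those whose mean clearly differs inside versus outside the class. The margin threshold is a stand-in for the parameters the paper optimizes on training and validation data; node count and schedules are illustrative.

```python
import numpy as np

def train_som_1d(X, n_nodes=8, epochs=20, lr0=0.5, radius0=2.0, seed=3):
    """Train a 1-D SOM on the rows of a dense matrix X."""
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), n_nodes)].astype(float)   # init from data
    for t in range(epochs):
        lr = lr0 * (1 - t / epochs)
        radius = max(radius0 * (1 - t / epochs), 0.5)
        for x in X[rng.permutation(len(X))]:
            bmu = int(np.argmin(((W - x) ** 2).sum(axis=1)))
            d = np.abs(np.arange(n_nodes) - bmu)       # 1-D grid distance
            h = np.exp(-(d ** 2) / (2 * radius ** 2))  # neighbourhood
            W += lr * h[:, None] * (x - W)
    return W

def select_features(X, W, margin=0.05):
    """Positive/negative features per SOM class, pooled as the result."""
    bmu = np.argmin(((X[:, None, :] - W[None]) ** 2).sum(-1), axis=1)
    keep = set()
    for c in np.unique(bmu):
        others = X[bmu != c]
        if len(others) == 0:
            continue
        inside, outside = X[bmu == c].mean(0), others.mean(0)
        keep |= set(np.where(inside - outside > margin)[0])  # positive
        keep |= set(np.where(outside - inside > margin)[0])  # negative
    return sorted(keep)
```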

Some Computational Results on MPI Parallel Implementation of Dense Simplex Method

There are two major variants of the Simplex Algorithm: the revised method and the standard, or tableau, method. Today, all serious implementations are based on the revised method because it is more efficient for sparse linear programming problems. However, a number of applications lead to dense linear problems, so our aim in this paper is to present some computational results on a parallel implementation of the dense Simplex Method. Our implementation runs on an SMP cluster and is written in C using the Message Passing Interface (MPI). Preliminary computational results on randomly generated dense linear programs support the effectiveness of our approach.
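
The sketch below shows the communication pattern of one pivot of a column-distributed dense tableau simplex, written with mpi4py rather than the paper's C/MPI: one allreduce to pick the entering column, one broadcast of the pivot column, and purely local row operations otherwise. The data are random stand-ins just to exhibit the pattern; a real implementation starts from a feasible tableau and tracks the basis.

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

m, n_local = 50, 40                        # rows; columns held by this rank
rng = np.random.default_rng(rank)
cols = rng.standard_normal((m, n_local))   # local block of tableau columns
cost = rng.standard_normal(n_local)        # local part of reduced-cost row
b = np.abs(np.random.default_rng(99).standard_normal(m))  # replicated RHS

def pivot_step():
    global b, cols, cost
    # 1. entering column: globally most negative reduced cost (Dantzig rule)
    j = int(np.argmin(cost))
    val, owner = comm.allreduce((cost[j], rank), op=MPI.MINLOC)
    if val >= 0:
        return "optimal"
    # 2. owner broadcasts the entering column and its cost entry
    a, ce = comm.bcast((cols[:, j].copy(), cost[j]) if rank == owner else None,
                       root=owner)
    # 3. ratio test on the replicated RHS (same result on every rank)
    pos = a > 1e-12
    if not pos.any():
        return "unbounded"
    r = int(np.argmin(np.where(pos, b / np.where(pos, a, 1.0), np.inf)))
    # 4. purely local row operations on this rank's columns and cost row
    row_r = cols[r, :] / a[r]
    cols -= np.outer(a, row_r)
    cols[r, :] = row_r
    cost -= ce * row_r
    br = b[r] / a[r]
    b -= a * br
    b[r] = br
    return "pivoted"

print(rank, pivot_step())
```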

A Patricia-Tree Approach for Frequent Closed Itemsets

In this paper, we propose an adaptation of the Patricia-Tree to sparse datasets for generating non-redundant association rules. Using this adaptation, we can generate frequent closed itemsets, which are more compact than the frequent itemsets used in the Apriori approach. The adaptation has been evaluated experimentally on a set of benchmark datasets.
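
As a small worked example of what "frequent closed itemsets" means here, the sketch below mines them naively: level-wise candidate counting followed by a closedness filter (an itemset is closed when no frequent proper superset has the same support). The naive enumeration stands in for the paper's Patricia-Tree traversal and is only suitable for toy data.

```python
from itertools import combinations
from collections import defaultdict

def frequent_closed_itemsets(transactions, min_support):
    """Return {itemset: support} for the frequent closed itemsets."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    support = {}
    level = [frozenset([i]) for i in items]
    k = 1
    while level:
        counts = defaultdict(int)
        for t in transactions:
            for cand in level:
                if cand <= t:
                    counts[cand] += 1
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        support.update(frequent)
        # candidates for the next level from this level's frequent sets
        keys = sorted(frequent, key=sorted)
        level = list({a | b for a, b in combinations(keys, 2)
                      if len(a | b) == k + 1})
        k += 1
    # closed = no frequent proper superset with identical support
    return {s: n for s, n in support.items()
            if not any(s < t and n == m for t, m in support.items())}

transactions = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b", "c"}]
print(frequent_closed_itemsets(transactions, min_support=2))
```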

A Computer Aided Detection (CAD) System for Microcalcifications in Mammograms - MammoScan μCaD

Clusters of microcalcifications in mammograms are an important sign of breast cancer. This paper presents a complete Computer Aided Detection (CAD) scheme for automatic detection of clustered microcalcifications in digital mammograms. The proposed system, MammoScan μCaD, consists of three main steps. First, all potential microcalcifications are detected using a feature extraction method, VarMet, together with adaptive thresholding; this also produces a number of false detections. The goal of the second step, Classifier level 1, is to remove everything but microcalcifications. The last step, Classifier level 2, uses learned dictionaries and sparse representations as a texture classification technique to distinguish single, benign microcalcifications from clustered microcalcifications, and to remove some remaining false detections. The system is trained and tested on true digital data from Stavanger University Hospital, and the results are evaluated by radiologists. The overall results are promising, with a sensitivity above 90% and a low false detection rate (approximately 1 unwanted detection per image, or 0.3 false detections per image).
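
A minimal sketch of the kind of adaptive thresholding used in the first detection step: pixels are flagged when they exceed their local mean by a multiple of the local standard deviation. The VarMet feature extractor is not reproduced here; the window size and factor k are illustrative.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def detect_candidates(img, win=15, k=3.0):
    """Flag pixels brighter than local mean + k * local std."""
    img = img.astype(float)
    mean = uniform_filter(img, win)
    sq_mean = uniform_filter(img ** 2, win)
    std = np.sqrt(np.maximum(sq_mean - mean ** 2, 0.0))
    return img > mean + k * std   # boolean candidate mask
```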

Boundary-Element-Based Finite Element Methods for Helmholtz and Maxwell Equations on General Polyhedral Meshes

We present new finite element methods for Helmholtz and Maxwell equations on general three-dimensional polyhedral meshes, based on domain decomposition with boundary elements on the surfaces of the polyhedral volume elements. The methods use the lowest-order polynomial spaces and produce sparse, symmetric linear systems despite the use of boundary elements. Moreover, piecewise constant coefficients are admissible. The resulting approximation on the element surfaces can be extended throughout the domain via representation formulas. Numerical experiments confirm that the convergence behavior on tetrahedral meshes is comparable to that of standard finite element methods, and equally good performance is attained on more general meshes.

Weight Functions for Signal Reconstruction Based On Level Crossings

Although the level crossing concept has been the subject of intensive investigation over the last few years, certain problems of great interest remain unsolved; one of these is the distribution of threshold levels. This paper presents new threshold level allocation schemes for level-crossing-based nonuniform sampling. Intuitively, it is more reasonable to sample the information-rich regions of the signal finely and the regions with sparse information coarsely. To achieve this, we propose non-linear quantization functions that dynamically assign the number of quantization levels according to the importance of a given amplitude range. Two new approaches to determining the importance of a given amplitude segment, based on exponential and logarithmic functions respectively, are presented. Various aspects of the proposed techniques are discussed and experimentally validated, and their efficacy is investigated by comparison with uniform sampling.
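
As an illustration of nonuniform level allocation, the sketch below places crossing levels with a logarithmic (mu-law style) companding law, packing them densely near zero, and records the resulting level crossings. The paper's actual exponential and logarithmic weight functions may differ; mu and the number of levels are illustrative.

```python
import numpy as np

def log_levels(n_levels, a_max=1.0, mu=255.0):
    """Mu-law style spacing: fine near zero, coarse near full scale."""
    u = np.linspace(-1, 1, n_levels)
    return a_max * np.sign(u) * (np.power(1 + mu, np.abs(u)) - 1) / mu

def level_crossings(signal, t, levels):
    """Return (time, level) pairs where the signal crosses a level."""
    samples = []
    for lv in levels:
        s = np.sign(signal - lv)
        idx = np.where(np.diff(s) != 0)[0]
        samples.extend((t[i], lv) for i in idx)
    return sorted(samples)

t = np.linspace(0, 1, 1000)
x = 0.8 * np.sin(2 * np.pi * 3 * t) * np.exp(-2 * t)
print(len(level_crossings(x, t, log_levels(17))))
```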

Learning an Overcomplete Dictionary using a Cauchy Mixture Model for Sparse Decomposition

An algorithm for learning an overcomplete dictionary for sparse decomposition of an underdetermined mixing system, using a Cauchy mixture model, is introduced. The mixture density function is derived from ratio samples of the observed mixture signals, where 1) at least two (but not necessarily more) mixture signals are observed, 2) the source signals are statistically independent, and 3) the sources are sparse. The basis vectors of the dictionary are learned by optimizing the location parameters of the Cauchy mixture components, which is shown to be more accurate and robust than the conventional data mining methods usually employed for this task. Using a well-known sparse decomposition algorithm, we extract three speech signals from two mixtures based on the estimated dictionary. Further tests with additive Gaussian noise demonstrate the proposed algorithm's robustness to outliers.
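
The toy sketch below illustrates the ratio idea: with sparse sources, the ratio of two observed mixtures clusters around the column ratios of the mixing matrix, so the location parameters of a Cauchy mixture fitted to those ratios recover the dictionary directions. The fixed scale gamma, the simple reweighted fixed-point update, and all data are illustrative; the paper's estimator is more elaborate.

```python
import numpy as np

def cauchy_mixture_locations(r, n_src, gamma=0.05, iters=200, seed=0):
    """Estimate Cauchy mixture locations from ratio samples r; each
    location corresponds to one column ratio of the mixing matrix."""
    rng = np.random.default_rng(seed)
    theta = rng.choice(r, n_src)
    for _ in range(iters):
        # responsibilities under Cauchy densities (equal mixing weights)
        d = 1.0 / (gamma ** 2 + (r[:, None] - theta[None, :]) ** 2)
        resp = d / d.sum(axis=1, keepdims=True)
        w = resp * d                  # responsibility * robust Cauchy weight
        theta = (w * r[:, None]).sum(0) / w.sum(0)
    return np.sort(theta)

rng = np.random.default_rng(1)
S = rng.laplace(size=(3, 5000)) * (rng.random((3, 5000)) < 0.1)  # sparse sources
A = np.array([[1.0, 1.0, 1.0],
              [0.2, 1.0, 3.0]])      # true column ratios: 0.2, 1.0, 3.0
X = A @ S                            # two observed mixtures
mask = np.abs(X[0]) > 0.5            # keep informative samples only
print(cauchy_mixture_locations(X[1, mask] / X[0, mask], n_src=3))
```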

Sparse-Network-Based Speedup Technique for Protein Betweenness Centrality Computation

The study of proteomics has reached unexpected levels of interest as a direct consequence of its discovered influence over complex biological phenomena, including problematic diseases such as cancer. This paper presents the authors' latest achievements in the analysis of protein networks (interactome networks) through more efficient computation of the betweenness centrality measure. The paper introduces the concept of betweenness centrality and describes how its computation can help interactome network analysis. Current sequential implementations of betweenness computation do not perform satisfactorily in terms of execution time. The paper's main contribution is a speedup technique for betweenness computation based on modified shortest-path algorithms for sparse graphs. Three optimized generic algorithms for betweenness computation are described and implemented, and their performance is tested against real biological data from the IntAct dataset.
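
For reference, the sketch below is Brandes' algorithm for betweenness centrality on an unweighted graph stored as adjacency lists, the shortest-path-based formulation that sparse-graph speedup techniques such as the one above start from. The example graph is illustrative.

```python
from collections import deque

def betweenness(adj):
    """adj: dict mapping node -> list of neighbour nodes."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s, recording shortest-path counts and predecessors
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        q = deque([s])
        while q:
            v = q.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # back-propagation of pair dependencies
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc   # halve the values for undirected graphs

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(betweenness(adj))   # node 2 carries the most shortest paths
```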

Effects of Data Correlation in a Sparse-View Compressive Sensing Based Image Reconstruction

Computed tomography and laminography are heavily investigated within compressive sensing based image reconstruction frameworks, with the aim of reducing the dose to patients as well as to radiosensitive devices such as multilayer microelectronic circuit boards. Researchers are actively working on optimizing compressive sensing based iterative image reconstruction algorithms to obtain better quality images. However, the effects of the sampled data's properties on the reconstructed image's quality, particularly under insufficiently sampled data conditions, have not been explored in computed laminography. In this paper, we investigate the effects of two data properties, sampling density and data incoherence, on the image reconstructed by conventional computed laminography and by a recently proposed method called the spherical sinusoidal scanning scheme. We find that in a compressive sensing based image reconstruction framework, the image quality mainly depends on the data incoherence when the data is uniformly sampled.
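
One standard way to quantify the data-incoherence property discussed above is the mutual coherence of the system matrix, the largest normalized inner product between two distinct columns; lower values indicate sampling better suited to compressive sensing recovery. The random matrix below is a stand-in for a laminography system matrix.

```python
import numpy as np

def mutual_coherence(A):
    """Largest |<a_i, a_j>| over distinct normalized columns of A."""
    G = A / np.linalg.norm(A, axis=0)   # unit-norm columns
    gram = np.abs(G.T @ G)
    np.fill_diagonal(gram, 0.0)
    return gram.max()

A = np.random.default_rng(0).standard_normal((128, 256))
print(mutual_coherence(A))
```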

Applying Super-SVA to SAR Imaging with Both Aperture Gaps and Bandwidth Gaps

Synthetic aperture radar (SAR) imaging usually requires echo data collected continuously, pulse by pulse, over a certain bandwidth. In real situations, however, data collection or part of the signal spectrum can be interrupted for various reasons, leaving gaps in the spatial spectrum. In this case we need ways to fill the resulting gaps and obtain an image with the desired resolution. In this paper we show how to apply the iterative spatially variant apodization (Super-SVA) technique to extrapolate the spatial spectrum in both the azimuthal and range directions, so as to fill the gaps and obtain a correct radar image.
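
The sketch below shows only the iterate-and-constrain structure behind spectrum gap filling, in 1-D: a Papoulis-Gerchberg style loop (sparsity constraint in the image domain, measured samples re-imposed in the spectrum domain) stands in for the considerably more involved Super-SVA apodization step. Gap positions, the sparse scene, and the keep parameter are illustrative.

```python
import numpy as np

def fill_spectrum_gaps(spec, known_mask, n_iter=200, keep=3):
    """Extrapolate missing spectrum samples of a (near-)sparse scene."""
    est = spec * known_mask
    for _ in range(n_iter):
        img = np.fft.ifft(est)
        # crude image-domain constraint: keep only the strongest samples
        thr = np.sort(np.abs(img))[-keep]
        img[np.abs(img) < thr] = 0.0
        est = np.fft.fft(img)
        est[known_mask] = spec[known_mask]   # re-impose measured data
    return est

rng = np.random.default_rng(0)
img = np.zeros(256, complex)
img[[20, 75, 200]] = [1.0, 0.5, 0.8]         # toy sparse scene
spec = np.fft.fft(img)
mask = np.ones(256, bool)
mask[60:90] = False                          # spectrum gaps
mask[180:200] = False
rec = np.fft.ifft(fill_spectrum_gaps(spec, mask))
print(np.round(np.abs(rec[[20, 75, 200]]), 2))
```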

Culturally Enhanced Collaborative Filtering

We propose an enhanced collaborative filtering method that uses Hofstede's cultural dimensions, calculated for 111 countries. We employ four of these dimensions, those correlated with customers' buying behavior, to infer users' preferences for items. We also demonstrate several advantages of this method for data sparseness and cold-start users, which are important challenges in collaborative filtering. We present experiments using a real dataset, the Book-Crossing dataset; the results show that the proposed algorithm provides significant improvements in recommendation quality.
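
A minimal sketch of one way to blend rating-based similarity with similarity over Hofstede's cultural dimensions, so that cold-start users with few co-rated items can still be matched through their country's cultural profile. The four dimensions, the blend weight alpha, the fallback rule, and all data are illustrative; the paper's exact combination rule may differ.

```python
import numpy as np

def cultural_similarity(u, v, ratings, culture, alpha=0.5):
    """ratings: user x item matrix with np.nan for missing entries;
    culture: user x 4 matrix of Hofstede dimension scores."""
    both = ~np.isnan(ratings[u]) & ~np.isnan(ratings[v])
    if both.sum() >= 2:
        ru, rv = ratings[u, both], ratings[v, both]
        num = ((ru - ru.mean()) * (rv - rv.mean())).sum()
        den = np.sqrt(((ru - ru.mean()) ** 2).sum()
                      * ((rv - rv.mean()) ** 2).sum())
        r_sim = num / den if den > 0 else 0.0      # Pearson on co-ratings
    else:
        r_sim, alpha = 0.0, 1.0                    # cold start: culture only
    c_sim = culture[u] @ culture[v] / (
        np.linalg.norm(culture[u]) * np.linalg.norm(culture[v]))
    return alpha * c_sim + (1 - alpha) * r_sim

ratings = np.array([[5, 4, np.nan], [5, np.nan, 2], [1, 2, 5]], float)
culture = np.array([[90, 35, 60, 46], [80, 40, 55, 50], [20, 80, 30, 70]], float)
print(cultural_similarity(0, 1, ratings, culture))  # falls back to culture
```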

Symbolic Analysis of Large Circuits Using Discrete Wavelet Transform

Symbolic Circuit Analysis (SCA) is a technique used to generate the symbolic expression of a network and has become well established in circuit analysis and design. The symbolic expression of a network offers an excellent way to perform frequency response analysis, sensitivity computation, stability measurement, performance optimization, and fault diagnosis. Many approaches have been proposed for SCA, offering different features and capabilities; numerical interpolation methods, especially those using the Fast Fourier Transform (FFT), are very common in this context. The aim of this paper is to present a method for SCA that uses the Wavelet Transform (WT) as a mathematical tool to generate the symbolic expression for large circuits while minimizing the analysis time by reducing the number of computations.

Frame Texture Classification Method (FTCM) Applied on Mammograms for Detection of Abnormalities

Texture classification is an important image processing task with a broad range of applications. Many different techniques for texture classification have been explored. Using sparse approximation as a feature extraction method for texture classification is a relatively new approach, and Skretting et al. recently presented the Frame Texture Classification Method (FTCM), showing very good results on classical texture images. As an extension of that work, the FTCM is here tested on a real-world application: the detection of abnormalities in mammograms. Some extensions to the original FTCM that are useful in certain applications are implemented: two different smoothing techniques and a vector augmentation technique. Both the detection of microcalcifications (as a primary detection technique and as the last stage of a detection scheme) and of soft tissue lesions in mammograms are explored. All the results are interesting, and the results using FTCM on regions of interest as the last stage of a microcalcification detection scheme are especially promising.
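
The sketch below shows the core FTCM-style decision rule: a patch is assigned to the texture class whose learned frame (dictionary) gives the smallest sparse approximation error. Frame training is assumed to have happened elsewhere; the random dictionaries, patch size, and sparsity s are illustrative stand-ins.

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def classify_patch(patch, dictionaries, s=4):
    """dictionaries: list of (patch_dim x n_atoms) frames with unit-norm
    atoms, one per class. Returns the class with the smallest sparse
    approximation error."""
    errors = []
    for D in dictionaries:
        coef = orthogonal_mp(D, patch, n_nonzero_coefs=s)
        errors.append(np.linalg.norm(patch - D @ coef))
    return int(np.argmin(errors))

rng = np.random.default_rng(0)
dicts = [rng.standard_normal((64, 128)) for _ in range(2)]
dicts = [D / np.linalg.norm(D, axis=0) for D in dicts]   # unit-norm atoms
patch = dicts[1][:, :4] @ np.array([1.0, -0.5, 0.3, 0.8])
print(classify_patch(patch, dicts))                      # expect class 1
```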

Towards a Biometric Card in Romania: Person Identification by Face, Fingerprint and Voice Recognition

In this paper three different approaches to person verification and identification, by means of fingerprints, face, and voice recognition, are studied. Face recognition uses parts-based representation methods and a manifold learning approach, with recognition accuracy as the assessment criterion. The techniques under investigation are: a) Local Non-negative Matrix Factorization (LNMF); b) Independent Component Analysis (ICA); c) NMF with sparseness constraints (NMFsc); d) Locality Preserving Projections (Laplacianfaces). Fingerprint detection is approached by classical minutiae (small graphical pattern) matching through image segmentation, using a structural approach and a neural network as the decision block. For voice/speaker recognition, mel cepstral and delta-delta mel cepstral analysis are used as the main methods to construct a supervised, speaker-dependent voice recognition system. The final decision (e.g. "accept/reject" for a verification task) is taken by majority voting over the three biometrics. Preliminary results, obtained for medium-sized databases of fingerprints, faces, and voice recordings, indicate the feasibility of our study, with an overall recognition precision of about 92% permitting the use of our system in a future complex biometric card.
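
For concreteness, here is a minimal sketch of the majority-voting fusion of the three biometric decisions described above. The matcher functions are hypothetical placeholders for the actual face, fingerprint, and voice matchers.

```python
def verify(sample, claimed_id, matchers):
    """matchers: list of three functions, each returning True (accept)
    or False (reject) for (sample, claimed_id)."""
    votes = sum(1 for m in matchers if m(sample, claimed_id))
    return votes >= 2   # accept when at least two modalities agree
```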

A Kernel Classifier using Linearised Bregman Iteration

In this paper we introduce a novel kernel classifier based on an iterative shrinkage algorithm developed for compressive sensing. We adopt Bregman iteration with soft and hard shrinkage functions and a generalized hinge loss to solve the l1-norm minimization problem for classification. Our experiments on face recognition and digit classification, with SVM as the benchmark, show that our method achieves an error rate close to that of SVM but does not outperform it. We find that the soft shrinkage method gives higher accuracy and, in some situations, more sparseness than the hard shrinkage method.
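
The sketch below shows linearised Bregman iteration with soft shrinkage for the basic l1 problem min ||x||_1 s.t. Ax = b, the solver family the classifier above builds on (the classifier itself additionally swaps in a generalized hinge loss, which is not shown). Step sizes and the demo data are illustrative.

```python
import numpy as np

def soft_shrink(v, mu):
    return np.sign(v) * np.maximum(np.abs(v) - mu, 0.0)

def linearized_bregman(A, b, mu=5.0, delta=None, n_iter=2000):
    """Linearised Bregman: v accumulates residual correlations,
    x is a shrunk copy of v."""
    if delta is None:
        delta = 1.0 / np.linalg.norm(A, 2) ** 2   # step size bound
    v = np.zeros(A.shape[1])
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        v += A.T @ (b - A @ x)
        x = delta * soft_shrink(v, mu)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 200))
x_true = np.zeros(200)
x_true[[5, 90, 140]] = [1.0, -2.0, 1.5]
b = A @ x_true
x = linearized_bregman(A, b)
print(np.round(x[[5, 90, 140]], 2))   # expect roughly 1.0, -2.0, 1.5
```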

Review and Experiments on SDMSCue

In this work, I present a review of Sparse Distributed Memory for Small Cues (SDMSCue), a variant of Sparse Distributed Memory (SDM) capable of handling small cues, and then conduct cognitive experiments on SDMSCue to test its cognitive soundness compared to SDM. Small cues are input cues presented to memory for reading associations that have many missing parts or fields. The original SDM fails on such cues; SDMSCue overcomes this pitfall. The main idea in SDMSCue is the repeated projection of the semantic space onto smaller subspaces selected based on the input cue's length and pattern, which allows read/write operations with an input cue that is missing a large portion. SDMSCue is augmented with genetic algorithms for memory allocation and initialization. I claim that SDM's functionality is a subset of SDMSCue's functionality.
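
For background, the sketch below implements the basic SDM operations that SDMSCue generalizes: hard locations within a Hamming radius of the address are activated, writes add the bipolar data to their counters, and reads sum the counters and threshold. Sizes and radius are illustrative; the subspace-projection step of SDMSCue itself is not shown.

```python
import numpy as np

class SDM:
    def __init__(self, n_locations=2000, dim=256, radius=110, seed=0):
        rng = np.random.default_rng(seed)
        self.addresses = rng.integers(0, 2, (n_locations, dim))
        self.counters = np.zeros((n_locations, dim), dtype=int)
        self.radius = radius

    def _active(self, addr):
        """Hard locations within the Hamming radius of addr."""
        return (self.addresses != addr).sum(axis=1) <= self.radius

    def write(self, addr, data):
        self.counters[self._active(addr)] += 2 * data - 1   # bipolar update

    def read(self, addr):
        s = self.counters[self._active(addr)].sum(axis=0)
        return (s > 0).astype(int)

mem = SDM()
rng = np.random.default_rng(1)
addr = rng.integers(0, 2, 256)
data = rng.integers(0, 2, 256)
mem.write(addr, data)
print((mem.read(addr) == data).mean())   # close to 1.0
```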