Abstract: In order to accelerate the similarity search in highdimensional database, we propose a new hierarchical indexing method. It is composed of offline and online phases. Our contribution concerns both phases. In the offline phase, after gathering the whole of the data in clusters and constructing a hierarchical index, the main originality of our contribution consists to develop a method to construct bounding forms of clusters to avoid overlapping. For the online phase, our idea improves considerably performances of similarity search. However, for this second phase, we have also developed an adapted search algorithm. Our method baptized NOHIS (Non-Overlapping Hierarchical Index Structure) use the Principal Direction Divisive Partitioning (PDDP) as algorithm of clustering. The principle of the PDDP is to divide data recursively into two sub-clusters; division is done by using the hyper-plane orthogonal to the principal direction derived from the covariance matrix and passing through the centroid of the cluster to divide. Data of each two sub-clusters obtained are including by a minimum bounding rectangle (MBR). The two MBRs are directed according to the principal direction. Consequently, the nonoverlapping between the two forms is assured. Experiments use databases containing image descriptors. Results show that the proposed method outperforms sequential scan and SRtree in processing k-nearest neighbors.
Abstract: A number of automated shot-change detection
methods for indexing a video sequence to facilitate browsing and
retrieval have been proposed in recent years. This paper emphasizes
on the simulation of video shot boundary detection using one of the
methods of the color histogram wherein scaling of the histogram
metrics is an added feature. The difference between the histograms of
two consecutive frames is evaluated resulting in the metrics. Further
scaling of the metrics is performed to avoid ambiguity and to enable
the choice of apt threshold for any type of videos which involves
minor error due to flashlight, camera motion, etc. Two sample videos
are used here with resolution of 352 X 240 pixels using color
histogram approach in the uncompressed media. An attempt is made
for the retrieval of color video. The simulation is performed for the
abrupt change in video which yields 90% recall and precision value.
Abstract: Pattern recognition and image recognition methods are commonly developed and tested using testbeds, which contain known responses to a query set. Until now, testbeds available for image analysis and content-based image retrieval (CBIR) have been scarce and small-scale. Here we present the one million images CEA-List Image Collection (CLIC) testbed that we have produced, and report on our use of this testbed to evaluate image analysis merging techniques. This testbed will soon be made publicly available through the EU MUSCLE Network of Excellence.
Abstract: The paper describes a new approach for fingerprint
classification, based on the distribution of local features (minute
details or minutiae) of the fingerprints. The main advantage is that
fingerprint classification provides an indexing scheme to facilitate
efficient matching in a large fingerprint database. A set of rules based
on heuristic approach has been proposed. The area around the core
point is treated as the area of interest for extracting the minutiae
features as there are substantial variations around the core point as
compared to the areas away from the core point. The core point in a
fingerprint has been located at a point where there is maximum
curvature. The experimental results report an overall average
accuracy of 86.57 % in fingerprint classification.
Abstract: On-line (near infrared) spectroscopy is widely used to support the operation of complex process systems. Information extracted from spectral database can be used to estimate unmeasured product properties and monitor the operation of the process. These techniques are based on looking for similar spectra by nearest neighborhood algorithms and distance based searching methods. Search for nearest neighbors in the spectral space is an NP-hard problem, the computational complexity increases by the number of points in the discrete spectrum and the number of samples in the database. To reduce the calculation time some kind of indexing could be used. The main idea presented in this paper is to combine indexing and visualization techniques to reduce the computational requirement of estimation algorithms by providing a two dimensional indexing that can also be used to visualize the structure of the spectral database. This 2D visualization of spectral database does not only support application of distance and similarity based techniques but enables the utilization of advanced clustering and prediction algorithms based on the Delaunay tessellation of the mapped spectral space. This means the prediction has not to use the high dimension space but can be based on the mapped space too. The results illustrate that the proposed method is able to segment (cluster) spectral databases and detect outliers that are not suitable for instance based learning algorithms.
Abstract: This paper describes a novel and effective approach to content-based image retrieval (CBIR) that represents each image in the database by a vector of feature values called “Standard deviation of mean vectors of color distribution of rows and columns of images for CBIR". In many areas of commerce, government, academia, and hospitals, large collections of digital images are being created. This paper describes the approach that uses contents as feature vector for retrieval of similar images. There are several classes of features that are used to specify queries: colour, texture, shape, spatial layout. Colour features are often easily obtained directly from the pixel intensities. In this paper feature extraction is done for the texture descriptor that is 'variance' and 'Variance of Variances'. First standard deviation of each row and column mean is calculated for R, G, and B planes. These six values are obtained for one image which acts as a feature vector. Secondly we calculate variance of the row and column of R, G and B planes of an image. Then six standard deviations of these variance sequences are calculated to form a feature vector of dimension six. We applied our approach to a database of 300 BMP images. We have determined the capability of automatic indexing by analyzing image content: color and texture as features and by applying a similarity measure Euclidean distance.
Abstract: This paper proposes a new method for image searches and image indexing in databases with a color temperature histogram. The color temperature histogram can be used for performance improvement of content–based image retrieval by using a combination of color temperature and histogram. The color temperature histogram can be represented by a range of 46 colors. That is more than the color histogram and the dominant color temperature. Moreover, with our method the colors that have the same color temperature can be separated while the dominant color temperature can not. The results showed that the color temperature histogram retrieved an accurate image more often than the dominant color temperature method or color histogram method. This also took less time so the color temperature can be used for indexing and searching for images.
Abstract: Phrases has a long history in information retrieval, particularly in commercial systems. Implicit semantic relationship between words in a form of BaseNP have shown significant improvement in term of precision in many IR studies. Our research focuses on linguistic phrases which is language dependent. Our results show that using BaseNP can improve performance although above 62% of words formation in Malay Language based on derivational affixes and suffixes.
Abstract: In this paper we address the problem of musical style
classification, which has a number of applications like indexing in
musical databases or automatic composition systems. Starting from
MIDI files of real-world improvisations, we extract the melody track
and cut it into overlapping segments of equal length. From these
fragments, some numerical features are extracted as descriptors of
style samples. We show that a standard Bayesian classifier can be
conveniently employed to build an effective musical style classifier,
once this set of features has been extracted from musical data.
Preliminary experimental results show the effectiveness of the
developed classifier that represents the first component of a musical
audio retrieval system
Abstract: In this work we develop an object extraction method
and propose efficient algorithms for object motion characterization.
The set of proposed tools serves as a basis for development of objectbased
functionalities for manipulation of video content. The
estimators by different algorithms are compared in terms of quality
and performance and tested on real video sequences. The proposed
method will be useful for the latest standards of encoding and
description of multimedia content – MPEG4 and MPEG7.
Abstract: This paper introduces and studies new indexing techniques for content-based queries in images databases. Indexing is the key to providing sophisticated, accurate and fast searches for queries in image data. This research describes a new indexing approach, which depends on linear modeling of signals, using bases for modeling. A basis is a set of chosen images, and modeling an image is a least-squares approximation of the image as a linear combination of the basis images. The coefficients of the basis images are taken together to serve as index for that image. The paper describes the implementation of the indexing scheme, and presents the findings of our extensive evaluation that was conducted to optimize (1) the choice of the basis matrix (B), and (2) the size of the index A (N). Furthermore, we compare the performance of our indexing scheme with other schemes. Our results show that our scheme has significantly higher performance.
Abstract: In this paper, we present a new and effective image indexing technique that extracts features directly from DCT domain. Our proposed approach is an object-based image indexing. For each block of size 8*8 in DCT domain a feature vector is extracted. Then, feature vectors of all blocks of image using a k-means algorithm is clustered into groups. Each cluster represents a special object of the image. Then we select some clusters that have largest members after clustering. The centroids of the selected clusters are taken as image feature vectors and indexed into the database. Also, we propose an approach for using of proposed image indexing method in automatic image classification. Experimental results on a database of 800 images from 8 semantic groups in automatic image classification are reported.
Abstract: Content-based Image Retrieval (CBIR) aims at searching image databases for specific images that are similar to a given query image based on matching of features derived from the image content. This paper focuses on a low-dimensional color based indexing technique for achieving efficient and effective retrieval performance. In our approach, the color features are extracted using the mean shift algorithm, a robust clustering technique. Then the cluster (region) mode is used as representative of the image in 3-D color space. The feature descriptor consists of the representative color of a region and is indexed using a spatial indexing method that uses *R -tree thus avoiding the high-dimensional indexing problems associated with the traditional color histogram. Alternatively, the images in the database are clustered based on region feature similarity using Euclidian distance. Only representative (centroids) features of these clusters are indexed using *R -tree thus improving the efficiency. For similarity retrieval, each representative color in the query image or region is used independently to find regions containing that color. The results of these methods are compared. A JAVA based query engine supporting query-by- example is built to retrieve images by color.
Abstract: Apart from geometry, functionality is one of the most
significant hallmarks of a product. The functionality of a product can
be considered as the fundamental justification for a product
existence. Therefore a functional analysis including a complete and
reliable descriptor has a high potential to improve product
development process in various fields especially in knowledge-based
design. One of the important applications of the functional analysis
and indexing is in retrieval and design reuse concept. More than 75%
of design activity for a new product development contains reusing
earlier and existing design know-how. Thus, analysis and
categorization of product functions concluded by functional
indexing, influences directly in design optimization. This paper
elucidates and evaluates major classes for functional analysis by
discussing their major methods. Moreover it is finalized by
presenting a noble hybrid approach for functional analysis.
Abstract: Over the past few years, XML (eXtensible Mark-up
Language) has emerged as the standard for information
representation and data exchange over the Internet. This paper
provides a kick-start for new researches venturing in XML databases
field. We survey the storage representation for XML document,
review the XML query processing and optimization techniques with
respect to the particular storage instance. Various optimization
technologies have been developed to solve the query retrieval and
updating problems. Towards the later year, most researchers
proposed hybrid optimization techniques. Hybrid system opens the
possibility of covering each technology-s weakness by its strengths.
This paper reviews the advantages and limitations of optimization
techniques.
Abstract: This paper describes text mining technique for automatically extracting association rules from collections of textual documents. The technique called, Extracting Association Rules from Text (EART). It depends on keyword features for discover association rules amongst keywords labeling the documents. In this work, the EART system ignores the order in which the words occur, but instead focusing on the words and their statistical distributions in documents. The main contributions of the technique are that it integrates XML technology with Information Retrieval scheme (TFIDF) (for keyword/feature selection that automatically selects the most discriminative keywords for use in association rules generation) and use Data Mining technique for association rules discovery. It consists of three phases: Text Preprocessing phase (transformation, filtration, stemming and indexing of the documents), Association Rule Mining (ARM) phase (applying our designed algorithm for Generating Association Rules based on Weighting scheme GARW) and Visualization phase (visualization of results). Experiments applied on WebPages news documents related to the outbreak of the bird flu disease. The extracted association rules contain important features and describe the informative news included in the documents collection. The performance of the EART system compared with another system that uses the Apriori algorithm throughout the execution time and evaluating extracted association rules.
Abstract: The main goal in this paper is to quantify the quality of
different techniques for radiation treatment plans, a back-propagation
artificial neural network (ANN) combined with biomedicine theory
was used to model thirteen dosimetric parameters and to calculate
two dosimetric indices. The correlations between dosimetric indices
and quality of life were extracted as the features and used in the ANN
model to make decisions in the clinic. The simulation results show
that a trained multilayer back-propagation neural network model can
help a doctor accept or reject a plan efficiently. In addition, the
models are flexible and whenever a new treatment technique enters
the market, the feature variables simply need to be imported and the
model re-trained for it to be ready for use.
Abstract: The paper discusses the mathematics of pattern
indexing and its applications to recognition of visual patterns that are
found in video clips. It is shown that (a) pattern indexes can be
represented by collections of inverted patterns, (b) solutions to
pattern classification problems can be found as intersections and
histograms of inverted patterns and, thus, matching of original
patterns avoided.
Abstract: XML files contain data which is in well formatted manner. By studying the format or semantics of the grammar it will be helpful for fast retrieval of the data. There are many algorithms which describes about searching the data from XML files. There are no. of approaches which uses data structure or are related to the contents of the document. In these cases user must know about the structure of the document and information retrieval techniques using NLPs is related to content of the document. Hence the result may be irrelevant or not so successful and may take more time to search.. This paper presents fast XML retrieval techniques by using new indexing technique and the concept of RXML. When indexing an XML document, the system takes into account both the document content and the document structure and assigns the value to each tag from file. To query the system, a user is not constrained about fixed format of query.
Abstract: The main idea behind in network aggregation is that,
rather than sending individual data items from sensors to sinks,
multiple data items are aggregated as they are forwarded by the
sensor network. Existing sensor network data aggregation techniques
assume that the nodes are preprogrammed and send data to a central
sink for offline querying and analysis. This approach faces two major
drawbacks. First, the system behavior is preprogrammed and cannot
be modified on the fly. Second, the increased energy wastage due to
the communication overhead will result in decreasing the overall
system lifetime. Thus, energy conservation is of prime consideration
in sensor network protocols in order to maximize the network-s
operational lifetime. In this paper, we give an energy efficient
approach to query processing by implementing new optimization
techniques applied to in-network aggregation. We first discuss earlier
approaches in sensors data management and highlight their
disadvantages. We then present our approach “Energy Efficient
Indexed Aggregation" (EEIA) and evaluate it through several
simulations to prove its efficiency, competence and effectiveness.