Abstract: In this paper, an intelligent algorithm for optimal
document archiving is presented. It is known that electronic archives
are very important for information system management. Minimizing
the size of the data stored in an electronic archive is a key step in
reducing the physical storage required. Here, the effect of different
Arabic fonts on the size of electronic archives is discussed. Simulation
results show that PDF is the best file format for storing Arabic
documents in an electronic archive. Furthermore, fast information
detection in a given PDF file is introduced. This approach uses fast
neural networks (FNNs) implemented in the frequency domain. These
networks operate by performing cross-correlation in the frequency
domain rather than in the spatial domain. It is proved
mathematically and practically that the number of computation steps
required for the presented FNNs is less than that needed by
conventional neural networks (CNNs). Simulation results using
MATLAB confirm the theoretical computations.
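
The frequency-domain idea can be illustrated with a minimal sketch. This is not the paper's FNN implementation: it only shows the underlying correlation theorem in one dimension, where sliding a weight vector over an input (direct cross-correlation) gives the same values as multiplying the input's transform by the conjugate transform of the weights. The function names `fft`, `correlate_direct`, and `correlate_fft` are hypothetical, and the small radix-2 FFT is included only to keep the example self-contained.

```python
import cmath


def fft(values, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def correlate_direct(signal, weights):
    """Slide the weights over the signal and take a dot product at each offset."""
    span = len(signal) - len(weights) + 1
    return [sum(signal[i + j] * weights[j] for j in range(len(weights)))
            for i in range(span)]


def correlate_fft(signal, weights):
    """Same offsets via the correlation theorem: IFFT(FFT(x) * conj(FFT(w)))."""
    size = 1
    while size < len(signal):  # pad to a power of two for the radix-2 FFT
        size *= 2
    x = fft([complex(v) for v in signal] + [0j] * (size - len(signal)))
    w = fft([complex(v) for v in weights] + [0j] * (size - len(weights)))
    product = [a * b.conjugate() for a, b in zip(x, w)]
    full = fft(product, inverse=True)
    span = len(signal) - len(weights) + 1
    # Divide by size to normalize the inverse transform; keep the real part.
    return [full[i].real / size for i in range(span)]


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
weights = [1.0, 0.0, -1.0]
direct = correlate_direct(signal, weights)
fast = correlate_fft(signal, weights)
```

For an input of length N and a weight vector of length M, the direct form needs on the order of N·M multiplications, while the transform-based form needs on the order of N log N, which is where a saving in computation steps can come from when M is large.
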
Abstract: The emergence of the Internet has revolutionized
information storage and retrieval. As most of the data on the
web is unstructured and contains a mix of text, video, audio,
etc., there is a need to mine information that caters to the
specific needs of users without losing important hidden
information. Thus, developing user-friendly, automated tools
that provide relevant information quickly has become a major
challenge in web mining research. Most of
the existing web mining algorithms have concentrated on
finding frequent patterns while neglecting the less frequent
ones that are likely to contain outlying data such as noise
and irrelevant or redundant data. This paper focuses on a
Signed approach with full-word matching against an organized
domain dictionary for mining web content outliers. The
Signed approach identifies both relevant and outlying web
documents. Because the dictionary is organized by the number
of characters in each word, searching and retrieving
documents takes less time and less space.
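
A dictionary organized by word length can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the names `build_length_index`, `is_domain_word`, and `count_matches` are hypothetical, tokens are split on whitespace, and the way the Signed approach turns matches into a relevance or outlier decision is not specified in the abstract, so it is not modeled here.

```python
from collections import defaultdict


def build_length_index(domain_words):
    """Group dictionary words into buckets keyed by their character count."""
    index = defaultdict(set)
    for word in domain_words:
        word = word.lower()
        index[len(word)].add(word)
    return index


def is_domain_word(index, word):
    """Full-word match: only the bucket with the same length is searched."""
    word = word.lower()
    bucket = index.get(len(word))  # .get avoids creating empty buckets
    return bucket is not None and word in bucket


def count_matches(index, text):
    """Count whitespace-separated tokens that are exact dictionary words."""
    return sum(1 for token in text.split() if is_domain_word(index, token))


index = build_length_index(["mining", "web", "outlier"])
matches = count_matches(index, "web mining finds the outlier")
```

Because a lookup touches only the bucket whose length equals that of the query word, a token such as "webs" is never compared against "web", which is one way the length-based organization can narrow the search.
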