Abstract: The emergence of the Internet has brewed the
revolution of information storage and retrieval. As most of the
data in the web is unstructured, and contains a mix of text,
video, audio etc, there is a need to mine information to cater to
the specific needs of the users without loss of important
hidden information. Thus developing user friendly and
automated tools for providing relevant information quickly
becomes a major challenge in web mining research. Most of
the existing web mining algorithms have concentrated on
finding frequent patterns while neglecting the less frequent
ones that are likely to contain outlying data such as noise,
irrelevant and redundant data. This paper mainly focuses on
Signed approach and full word matching on the organized
domain dictionary for mining web content outliers. This
Signed approach gives the relevant web documents as well as
outlying web documents. As the dictionary is organized based
on the number of characters in a word, searching and retrieval
of documents takes less time and less space.
Abstract: With the enormous growth on the web, users get easily
lost in the rich hyper structure. Thus developing user friendly and
automated tools for providing relevant information without any
redundant links to the users to cater to their needs is the primary task
for the website owners. Most of the existing web mining algorithms
have concentrated on finding frequent patterns while neglecting the
less frequent one that are likely to contain the outlying data such as
noise, irrelevant and redundant data. This paper proposes new
algorithm for mining the web content by detecting the redundant
links from the web documents using set theoretical(classical
mathematics) such as subset, union, intersection etc,. Then the
redundant links is removed from the original web content to get the
required information by the user..