Abstract: Automatic transformation of paper documents into electronic documents requires document segmentation at the first stage. However, some parameters restrictions such as variations in character font sizes, different text line spacing, and also not uniform document layout structures altogether have made it difficult to design a general-purpose document layout analysis algorithm for many years. Thus in most previously reported methods it is inevitable to include these parameters. This problem becomes excessively acute and severe, especially in Persian/Arabic documents. Since the Persian/Arabic scripts differ considerably from the English scripts, most of the proposed methods for the English scripts do not render good results for the Persian scripts. In this paper, we present a novel parameter-free method for segmenting the Persian/Arabic document images which also works well for English scripts. This method segments the document image into maximal homogeneous regions and identifies them as texts and non-texts based on a pyramidal image structure. In other words the proposed method is capable of document segmentation without considering the character font sizes, text line spacing, and document layout structures. This algorithm is examined for 150 Arabic/Persian and English documents and document segmentation process are done successfully for 96 percent of documents.
Abstract: The paper shows how the CASMAS modeling language,
and its associated pervasive computing architecture, can be
used to facilitate continuity of care by providing members of patientcentered
communities of care with a support to cooperation and
knowledge sharing through the usage of electronic documents and
digital devices. We consider a scenario of clearly fragmented care to
show how proper mechanisms can be defined to facilitate a better
integration of practices and information across heterogeneous care
networks. The scenario is declined in terms of architectural components
and cooperation-oriented mechanisms that make the support
reactive to the evolution of the context where these communities
operate.
Abstract: Text categorization (the assignment of texts in natural language into predefined categories) is an important and extensively studied problem in Machine Learning. Currently, popular techniques developed to deal with this task include many preprocessing and learning algorithms, many of which in turn require tuning nontrivial internal parameters. Although partial studies are available, many authors fail to report values of the parameters they use in their experiments, or reasons why these values were used instead of others. The goal of this work then is to create a more thorough comparison of preprocessing parameters and their mutual influence, and report interesting observations and results.
Abstract: Advent enhancements in the field of computing have
increased massive use of web based electronic documents. Current
Copyright protection laws are inadequate to prove the ownership for
electronic documents and do not provide strong features against
copying and manipulating information from the web. This has
opened many channels for securing information and significant
evolutions have been made in the area of information security.
Digital Watermarking has developed into a very dynamic area of
research and has addressed challenging issues for digital content.
Watermarking can be visible (logos or signatures) and invisible
(encoding and decoding). Many visible watermarking techniques
have been studied for text documents but there are very few for web
based text. XML files are used to trade information on the internet
and contain important information. In this paper, two invisible
watermarking techniques using Synonyms and Acronyms are
proposed for XML files to prove the intellectual ownership and to
achieve the security. Analysis is made for different attacks and
amount of capacity to be embedded in the XML file is also noticed.
A comparative analysis for capacity is also made for both methods.
The system has been implemented using C# language and all tests are
made practically to get the results.
Abstract: Software testing is important stage of software development cycle. Current testing process involves tester and electronic documents with test case scenarios. In this paper we focus on new approach to testing process using automated test case generation and tester guidance through the system based on the model of the system. Test case generation and model-based testing is not possible without proper system model. We aim on providing better feedback from the testing process thus eliminating the unnecessary paper work.