Measuring the Structural Similarity of Web-based Documents: A Novel Approach

Most known methods for measuring the structural similarity of documents are based on, e.g., tag measures, path metrics and tree measures defined on their DOM trees. Other methods measure similarity within the framework of the well-known vector space model. In contrast, we present a new approach to measuring the structural similarity of web-based documents represented by so-called generalized trees, which are more general than DOM trees, since the latter represent only directed rooted trees. We design a new similarity measure for graphs representing web-based hypertext structures. Our similarity measure is mainly based on a novel representation of a graph as linear integer strings whose components represent structural properties of the graph. The similarity of two graphs is then defined via the optimal alignment of the underlying property strings. In this paper we apply the well-known technique of sequence alignment to solve a novel and challenging problem: measuring the structural similarity of generalized trees. More precisely, we first transform our graphs, considered as high-dimensional objects, into linear structures. Then we derive similarity values from the alignments of the property strings. Hence, we transform a graph similarity problem into a string similarity problem. We demonstrate that our similarity measure captures important structural information by applying it to two different test sets consisting of graphs representing web-based documents.
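The idea of reducing graph similarity to string alignment can be illustrated with a minimal sketch. The abstract does not specify which structural properties are encoded or how alignments are scored, so everything below is an assumption made for illustration: the property string is taken to be the out-degrees of the nodes in breadth-first order, the alignment is a plain Needleman-Wunsch global alignment, and the names `property_string`, `alignment_score` and `similarity` are hypothetical.

```python
from collections import deque


def property_string(adj, root):
    """Return the out-degrees of all reachable nodes in BFS order.

    This is an assumed, simplified stand-in for a 'property string';
    edges to already visited nodes (as in generalized trees) still
    count towards the out-degree.
    """
    seen = {root}
    queue = deque([root])
    degrees = []
    while queue:
        node = queue.popleft()
        successors = adj.get(node, [])
        degrees.append(len(successors))
        for nxt in successors:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return degrees


def alignment_score(a, b, gap=-1.0):
    """Needleman-Wunsch global alignment score of two integer strings.

    Aligning x with y scores 1 - |x - y| / max(x, y, 1), so equal
    values score 1 and very different values score close to 0.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            x, y = a[i - 1], b[j - 1]
            match = 1.0 - abs(x - y) / max(x, y, 1)
            score[i][j] = max(
                score[i - 1][j - 1] + match,  # align x with y
                score[i - 1][j] + gap,        # gap in b
                score[i][j - 1] + gap,        # gap in a
            )
    return score[-1][-1]


def similarity(a, b):
    """Normalise the alignment score to [0, 1]; identical strings give 1."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return max(0.0, alignment_score(a, b) / longest)


# Two small hypothetical hypertext graphs given as adjacency lists.
g1 = {"r": ["a", "b"], "a": ["c"], "b": [], "c": []}
g2 = {"r": ["a"], "a": ["b"], "b": []}
s1, s2 = property_string(g1, "r"), property_string(g2, "r")
```

Under these assumptions, `similarity(s1, s1)` is 1.0 and `similarity(s1, s2)` is strictly smaller, because the two strings differ in both length and content.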

NOHIS-Tree: High-Dimensional Index Structure for Similarity Search

In content-based image retrieval systems, an efficient indexing technique is needed to accelerate search in huge databases. The indexing technique should also support the high dimensionality of image features. In this paper we present the hierarchical index NOHIS-tree (Non-Overlapping Hierarchical Index Structure) for scaling up to very large databases. We also study the influence of clustering on search time. Performance test results show that NOHIS-tree performs better than SR-tree and that it maintains its performance in high-dimensional spaces. We also include performance tests that try to determine the number of clusters in NOHIS-tree that gives the best search time.

A Study of the Problems and Demands of Community Leaders' Training in the Upper Northeastern Region

This research aims to study the nature of the problems and demands of training for community leaders in the upper northeastern region of Thailand. The population comprises 360 community leaders in the region who had previously received training from Udonthani Rajabhat University; a stratified random sample of 186 participants was drawn. The research tool is a questionnaire. Frequency, percentage and standard deviation are employed in the data analysis. The findings indicate that most community leaders are male and senior adults. The problems in training are associated with the inconvenience of long-distance travel to training locations, the inadequacy of learning centers and training sites, and high training costs. The demand for training is mainly motivated by a desire for self-development in modern knowledge to keep up to date with the changing world, and by the need to apply technology to shorten the distance to training locations and limit training costs.

CART Method for Modeling the Output Power of Copper Bromide Laser

This paper examines the available experimental data for a copper bromide vapor laser (CuBr laser) emitting at two wavelengths, 510.6 nm and 578.2 nm. Laser output power is estimated from 10 independent input physical parameters. A classification and regression tree (CART) model is obtained which describes 97% of the data. The resulting binary CART tree specifies which input parameters considerably influence each of the classification groups. This allows a technical assessment of which parameters are the most significant for the manufacture and operation of this type of laser. Predicted values of the laser output power are also obtained for each classification group. This considerably aids the design and development processes.

Adding Edges between One Node and Every Other Node with the Same Depth in a Complete K-ary Tree

This paper proposes a model of adding relations between members of the same level in a pyramid organization structure, represented as a complete K-ary tree, such that the communication of information between all members of the organization becomes the most efficient. When edges between one node and every other node at the same depth N in a complete K-ary tree of height H are added, an optimal depth N* = H is obtained by minimizing the total path length, which is the sum of the lengths of the shortest paths between every pair of nodes.
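The stated result can be checked numerically on small trees. The following is a brute-force sketch, not the paper's analytical derivation: it builds a complete K-ary tree, connects the first node at a chosen depth to every other node at that depth, and sums all shortest-path lengths by breadth-first search. The function names and the node numbering (root 0, children of node i at K*i+1 to K*i+K) are assumptions introduced here.

```python
from collections import deque


def build_tree(k, height):
    """Adjacency lists and node depths of a complete k-ary tree (k >= 2)."""
    n = (k ** (height + 1) - 1) // (k - 1)
    adj = [[] for _ in range(n)]
    depth = [0] * n
    for child in range(1, n):
        parent = (child - 1) // k
        adj[parent].append(child)
        adj[child].append(parent)
        depth[child] = depth[parent] + 1
    return adj, depth


def total_path_length(k, height, level=None):
    """Sum of shortest-path lengths over all unordered node pairs.

    If level is given, one node at that depth is first joined by an
    edge to every other node at the same depth.
    """
    adj, depth = build_tree(k, height)
    if level is not None:
        same_depth = [v for v in range(len(adj)) if depth[v] == level]
        hub = same_depth[0]
        for v in same_depth[1:]:
            adj[hub].append(v)
            adj[v].append(hub)
    total = 0
    for source in range(len(adj)):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total // 2  # every pair was counted from both endpoints


def best_depth(k, height):
    """Depth N in 1..height that minimises the total path length."""
    return min(range(1, height + 1),
               key=lambda n: total_path_length(k, height, n))
```

For K = 2 and H = 2 this gives a total path length of 48 without extra edges, 39 for N = 1 and 34 for N = 2, so the minimum is reached at N = H, in line with the abstract.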

Enhance Performance of Secure Image Using Wavelet Compression

The increasing popularity of multimedia applications, especially in image processing, places a great demand on efficient data storage and transmission techniques. Network communication, such as over wireless networks, can easily be intercepted, causing confidential information to leak. Unfortunately, conventional compression and encryption methods are too slow to carry out real-time secure image processing. In this research, the Embedded Zerotree Wavelet (EZW) encoder, which is specially designed for wavelet compression, is examined. With this algorithm, three methods are proposed to reduce processing time and space while providing security protection sufficient to protect the data.

A Detailed Timber Harvest Simulator Coupled with 3-D Visualization

In today's world, the efficient utilization of wood resources is increasingly on the minds of forest owners. Ensuring an efficient harvest of wood resources is a very complex challenge. This is one of the areas that the project “Virtual Forest II” addresses. Its core is a database with data about forests containing approximately 260 million trees located in North Rhine-Westphalia (NRW). Based on this data, tree growth simulations and wood mobilization simulations can be conducted. This paper focuses on the latter. It describes a discrete-event simulation with an attached real-time 3-D visualization which simulates timber harvest using trees from the database with different crop resources. The simulation can be displayed in 3-D to show the progress of the harvest. All the data gathered during the simulation is presented afterwards as a detailed summary. This summary includes cost-benefit calculations and can be compared to those of previous runs in order to optimize the financial outcome of the timber harvest by exchanging crop resources or modifying their parameters.

Automata Theory Approach for Solving Frequent Pattern Discovery Problems

The various types of frequent pattern discovery problems, namely the frequent itemset, sequence and graph mining problems, are solved in different ways which are nevertheless similar in certain aspects. The main approaches to discovering such patterns fall into two classes: level-wise methods and database projection-based methods. Level-wise algorithms generally use clever indexing structures for discovering the patterns. In this paper a new approach based on the level-wise method is proposed for efficiently discovering frequent sequences and tree-like patterns. Because level-wise algorithms spend a lot of time on the subpattern testing problem, the new approach introduces the idea of using automata theory to solve this problem.
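One way an automaton can take over subpattern testing is sketched below. The abstract does not describe the actual construction, so this is only an assumed illustration: all candidate sequences of one level are merged into a prefix trie, and the trie is run as a nondeterministic automaton over each database sequence, so that the support of every candidate is counted in a single pass per sequence. The names `build_trie` and `count_support` are hypothetical.

```python
def build_trie(candidates):
    """Merge candidate sequences into a prefix trie.

    Returns (transitions, accepting): transitions[state] maps an item to
    the next state, accepting maps a state to the candidate it completes.
    """
    transitions = [{}]
    accepting = {}
    for candidate in candidates:
        state = 0
        for item in candidate:
            if item not in transitions[state]:
                transitions.append({})
                transitions[state][item] = len(transitions) - 1
            state = transitions[state][item]
        accepting[state] = tuple(candidate)
    return transitions, accepting


def count_support(candidates, database):
    """Count in how many database sequences each candidate is a subsequence."""
    transitions, accepting = build_trie(candidates)
    support = {tuple(c): 0 for c in candidates}
    for sequence in database:
        active = {0}  # trie states reachable so far
        for item in sequence:
            # Old states stay active because items may be skipped.
            advanced = {transitions[s][item] for s in active
                        if item in transitions[s]}
            active |= advanced
        for state in active:
            if state in accepting:
                support[accepting[state]] += 1
    return support


# Hypothetical toy database of item sequences and level-2 candidates.
database = ["abc", "acb", "bc"]
candidates = ["ab", "ac", "bc", "cb"]
support = count_support(candidates, database)
```

In a level-wise loop, the candidates whose support reaches the minimum threshold would be kept and extended to form the next level.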

Application of a Similarity Measure for Graphs to Web-based Document Structures

Due to the tremendous amount of information provided by the World Wide Web (WWW), developing methods for mining the structure of web-based documents is of considerable interest. In this paper we present a similarity measure for graphs representing web-based hypertext structures. Our similarity measure is mainly based on a novel representation of a graph as linear integer strings whose components represent structural properties of the graph. The similarity of two graphs is then defined via the optimal alignment of the underlying property strings. In this paper we apply the well-known technique of sequence alignment to solve a novel and challenging problem: measuring the structural similarity of generalized trees. In other words, we first transform our graphs, considered as high-dimensional objects, into linear structures. Then we derive similarity values from the alignments of the property strings. Hence, we transform a graph similarity problem into a string similarity problem in order to develop an efficient graph similarity measure. We demonstrate that our similarity measure captures important structural information by applying it to two different test sets consisting of graphs representing web-based document structures.

Novel Rao-Blackwellized Particle Filter for Mobile Robot SLAM Using Monocular Vision

This paper presents a novel Rao-Blackwellized particle filter (RBPF) for mobile robot simultaneous localization and mapping (SLAM) using monocular vision. The particle filter is combined with an unscented Kalman filter (UKF) to extend the path posterior by sampling new poses that integrate the current observation, which drastically reduces the uncertainty about the robot pose. Landmark position estimation and update are also implemented through the UKF. Furthermore, the number of resampling steps is determined adaptively, which substantially reduces the particle depletion problem, and evolution strategies (ES) are introduced to avoid particle impoverishment. The 3D natural point landmarks are constructed from matching Scale Invariant Feature Transform (SIFT) feature pairs. The matching of multi-dimensional SIFT features is implemented with a KD-tree at a time cost of O(log2 N). Experimental results on a real robot in our indoor environment show the advantages of our methods over previous approaches.

A Study of Social and Cultural Context for Tourism Management by Community Kamchanoad District, Amphoe Ban Dung, Udon Thani Province

This research studied the background and the social and cultural context of the Kamchanoad community for sustainable tourism management. All data were collected through in-depth interviews with village headmen, community committee members, teachers, monks, Kamchanoad forest field officers and respected senior citizens over 60 years old who have lived in the community for more than 40 years. Altogether there were 30 participants. Analysis of the interview and discussion content shows that Kamchanoad has both highland and lowland areas as well as swamps that are well suited to the conservation of freshwater animals. Kamchanoad is also good for agriculture and animal farming. 80% of Kamchanoad's land is forest, fresh water and rice farms. Kamchanoad was officially established as a community in 1994 as “Baan Nonmuang”. Inhabitants make a living by farming based on the sufficiency economy; they have rice, eucalyptus, cassava and rubber tree farms. Local people still believe in the myth of Srisutto Naga. They remain religious and like to preserve their traditional way of life. To understand how to create a successful tourism business in Kamchanoad, local culture and traditions must be studied closely. The outstanding event in Kamchanoad is the worship of Grand Srisutto, held on the full-moon day of the 6th month, or Visakhabucha Day. Other big events include the celebration at the end of Buddhist Lent, the Naga firework, the New Year celebration, Boon Mahachart, Songkran, Buddhist Lent, Boon Katin and Loy Kratong. Buddhism is the main religion in Kamchanoad. The promotion of tourism in Kamchanoad is expected to help bring more income to the region. More infrastructure will be provided for local people, as well as funding for youth support and community activities.

Unequal Error Protection for Region of Interest with Embedded Zerotree Wavelet

This paper describes a new method of unequal error protection (UEP) for a region of interest (ROI) with the embedded zerotree wavelet (EZW) algorithm. The ROI technique is important in applications where different parts of an image differ in importance. In ROI coding, a chosen ROI is encoded with higher quality than the background (BG). Unequal error protection of the image is provided by different coding techniques. In our proposed method, the image is divided into two parts (ROI, BG) that consist of more important bytes (MIB) and less important bytes (LIB). The experimental results verify the effectiveness of the design. They compare UEP of image transmission with a defined ROI against equal error protection (EEP) over multiple noisy channels.

Synthesis of Analogue to Camptothecine

Camptothecin (CPT) is a cytotoxic quinoline alkaloid which inhibits the DNA enzyme topoisomerase I (topo I). It was discovered in 1966 by M. E. Wall and M. C. Wani in a systematic screening of natural products for anticancer drugs. It was isolated from the bark and stem of Camptotheca acuminata (Camptotheca, Happy tree), a tree native to China. CPT showed remarkable anticancer activity in preliminary clinical trials but also low solubility and a high rate of adverse drug reactions. Because of these disadvantages, synthetic and medicinal chemists have developed numerous syntheses of camptothecin [1][2][3] and various derivatives to increase the benefits of the chemical, with good results. In our method, CPT analogues are obtained in six steps starting from the readily available material DL-malic acid.

Improved C-Fuzzy Decision Tree for Intrusion Detection

As the number of networked computers grows, intrusion detection is an essential component in keeping networks secure. Various approaches to intrusion detection are currently in use, each with its own merits and demerits. This paper presents our work to test and improve the performance of a new class of decision tree, the c-fuzzy decision tree, for detecting intrusions. The work also includes identifying the best candidate feature subset for building an efficient c-fuzzy decision tree based Intrusion Detection System (IDS). We investigated the usefulness of the c-fuzzy decision tree for developing an IDS with a data partition based on horizontal fragmentation. Empirical results indicate the usefulness of our approach in developing an efficient IDS.

A Thai to English Machine Translation System Using Thai LFG Tree Structure as Interlingua

Machine Translation (MT) between the Thai and English languages has been a challenging research topic in natural language processing. Most research has been done on English to Thai machine translation, but not the other way around. This paper presents a Thai to English Machine Translation System that translates a Thai sentence into an interlingua in the form of a Thai LFG tree, using an LFG grammar and a bottom-up parser. The Thai LFG tree is then transformed into the corresponding English LFG tree by pattern matching and node transformation. Finally, an equivalent English sentence is created using the structural information prescribed by the English LFG tree. Experiments designed to evaluate the performance of the proposed system indicate that it is effective in providing a useful translation from Thai to English.

Fiber Microstructure in Solanum Found in Thailand

The study aimed to investigate characteristics of vegetative tissue for taxonomic purposes and for the possible use of plant waste in industry. Stems and branches of 15 Solanum species found in Thailand were prepared for fiber analysis and examined by light microscopy. Microstructural characteristics of the fibers, i.e. fiber length and width, fiber lumen diameter and fiber cell wall thickness, were recorded. The longest average fiber cell lengths (>3.9 mm) were obtained in S. lycopersicum L. and S. tuberosum L. Fiber cells from S. lycopersicum also showed the widest average diameters of the whole cell and its lumen, at >45.5 μm and >29 μm respectively. However, the fiber cells with the thickest walls (>9.6 μm) belonged to the ornamental tree species S. wrightii Benth. The results showed that values of the slenderness ratio, Runkel ratio and flexibility coefficient potentially suitable for feedstock in the paper industry were found in 4 exotic species, i.e. Solanum americanum L., S. lycopersicum, S. seaforthianum Andr., and S. tuberosum L.
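The three derived indices follow from the recorded fiber dimensions by their commonly used definitions in pulp and paper studies: slenderness ratio = fiber length / fiber diameter, Runkel ratio = 2 × cell wall thickness / lumen diameter, and flexibility coefficient = 100 × lumen diameter / fiber diameter. The sketch below assumes these definitions, since the abstract does not spell them out; the function name and the input values are hypothetical and are not measurements from the study.

```python
def fiber_indices(length_um, diameter_um, lumen_um, wall_um):
    """Derived fiber indices from dimensions given in micrometres.

    Uses the conventional definitions; whether the study applied exactly
    these formulas is an assumption.
    """
    return {
        "slenderness_ratio": length_um / diameter_um,
        "runkel_ratio": 2 * wall_um / lumen_um,
        "flexibility_coefficient": 100 * lumen_um / diameter_um,
    }


# Hypothetical fiber: 1000 um long, 20 um wide, 10 um lumen, 5 um wall.
example = fiber_indices(length_um=1000, diameter_um=20, lumen_um=10, wall_um=5)
```

For this hypothetical fiber the slenderness ratio is 50, the Runkel ratio is 1 and the flexibility coefficient is 50.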

Taiwan Sugar Corporation's Participation in the Mechanism of Payment for Environmental Services (PES)

The Taiwan government has promoted the “Plain Landscape Afforestation and Greening Program” since 2002. A key task of the program was the payment for environmental services (PES), entitled the “Plain Landscape Afforestation Policy” (PLAP), which was approved by the Executive Yuan on August 31, 2001 and enacted on January 1, 2002. Under the policy, the total area of afforestation was estimated to reach 25,100 hectares by December 31, 2007. By the end of 2007, the policy had been in force for six years and the actual area of afforestation was 8,919.18 hectares. Of this, Taiwan Sugar Corporation (TSC) accounted for 7,960 hectares (with 2,450.83 hectares as public service area), or 86.22% of the total afforestation area; private farmland promoted by local governments accounted for 869.18 hectares, or 9.75% of the total afforestation area. We therefore observe that most of the afforestation under this policy is carried out by TSC, and that TSC's achievement ratio is better than that of others. This implies that the success of the PLAP depends heavily on its execution by TSC. The objective of this study is to analyze the policy planning relevant to TSC's participation in the PLAP, suggest complementary measures, and draw up effective adjustment mechanisms, so as to improve the effectiveness of executing the policy. Our main conclusions and suggestions are summarized as follows: 1. TSC's participation in the PLAP is mainly the result of passive cooperation with central government or company policy. Prior to its participation in the PLAP, TSC's land was mainly used for growing sugarcane. 2. TSC's selection of tree species is based mainly on the suitability of land and species. The largest proportion of tree species is allocated to economic forests, and the lack of technical instruction was the main problem during afforestation. Moreover, how to improve TSC's future development in leisure agriculture and landscape business has become a key topic. 3. TSC has developed short- and long-term plans for participating in the PLAP in the future. However, there is no great willingness or incentive to budget for such detailed planning. 4. Most TSC interviewees consider the requirements of the PLAP unreasonable; among these, an unreasonable requirement on the number of trees accounted for the greatest proportion. Furthermore, most interviewees suggested that the government should continue to provide incentives even after 20 years. 5. Since the government shares the same goals as TSC, there should be sufficient cooperation and communication to support technical instruction and reduce afforestation costs, which will also help to improve the effectiveness of the policy.

Clustering Multivariate Empiric Characteristic Functions for Multi-Class SVM Classification

A dissimilarity measure between the empiric characteristic functions of the subsamples associated with the different classes in a multivariate data set is proposed. This measure can be computed efficiently, and it depends on all the cases of each class. It may be used to find groups of similar classes, which could be joined for further analysis, or it could be employed to perform an agglomerative hierarchical cluster analysis of the set of classes. The final tree can serve to build a family of binary classification models, offering an alternative approach to the multi-class SVM problem. We have compared this dendrogram-based SVM approach with the one-against-one SVM approach on four publicly available data sets, three of them being microarray data. The two approaches were found to perform equivalently, but the dendrogram-based solution requires a smaller number of binary SVM models.
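To make the idea concrete, the sketch below computes one standard distance between empirical characteristic functions; the abstract does not give the exact measure, so this particular choice is an assumption. It uses the squared difference of the two empirical characteristic functions integrated against a standard normal weight, which has the closed form mean k(x, x') + mean k(y, y') - 2 mean k(x, y) with k(u, v) = exp(-||u - v||^2 / 2). Like the measure in the abstract, it depends on every case of each class. The function names are hypothetical.

```python
import math


def _mean_kernel(xs, ys):
    """Average of exp(-||x - y||^2 / 2) over all pairs (x, y)."""
    total = 0.0
    for x in xs:
        for y in ys:
            squared = sum((a - b) ** 2 for a, b in zip(x, y))
            total += math.exp(-0.5 * squared)
    return total / (len(xs) * len(ys))


def ecf_dissimilarity(xs, ys):
    """Gaussian-weighted squared L2 distance between two empirical
    characteristic functions (an assumed stand-in for the paper's measure).
    """
    return (_mean_kernel(xs, xs) + _mean_kernel(ys, ys)
            - 2.0 * _mean_kernel(xs, ys))


# Hypothetical two-dimensional samples from three classes.
class_a = [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0)]
class_b = [(0.1, 0.0), (0.0, 0.2), (-0.2, 0.1)]
class_c = [(3.0, 3.1), (2.9, 3.0), (3.2, 2.8)]
```

A matrix of such pairwise values between classes could then be passed to any agglomerative clustering routine to obtain the dendrogram; in this toy example `class_a` is much closer to `class_b` than to `class_c`.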

Data Mining in Oral Medicine Using Decision Trees

Data mining has been used very frequently to extract hidden information from large databases. This paper suggests the use of decision trees for continuously extracting the clinical reasoning, in the form of medical experts' actions, that is inherent in a large number of EMRs (electronic medical records). The extracted data could then be used to teach students of oral medicine a number of orderly processes for dealing with patients who present with different problems within the practice context over time.

Game-Tree Simplification by Pattern Matching and Its Acceleration Approach using an FPGA

In this paper, we propose a Connect6 solver which adopts a hybrid approach based on a tree-search algorithm and image processing techniques. The solver must deal with complicated computation and provide high performance in order to make real-time decisions. The proposed approach enables the solver to be implemented on a single Spartan-6 XC6SLX45 FPGA produced by Xilinx without using any external devices. The compact implementation is achieved by using image processing techniques to optimize a tree-search algorithm for the Connect6 game. Tree search is widely used in computer games, and an optimal search yields the best move in every turn of a game. Thus, many tree-search algorithms, such as the Minimax algorithm, and artificial intelligence approaches have been proposed in this field. However, there is one fundamental problem in this area: the computation time increases rapidly with the growth of the game tree. This means that the larger the game tree is, the bigger the circuit size is, because of the highly parallel nature of the computation. This paper aims to reduce the size of a Connect6 game tree using image processing techniques and the positional symmetry of the game. The proposed solver is composed of four computational modules: a two-dimensional checkmate strategy checker, a template matching module, a skilful-line predictor, and a next-move selector. These modules work well together in selecting next moves from a set of candidates, and the total size of their circuits is small. The details of the hardware design for an FPGA implementation are described, and the performance of this design is also shown.
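The use of positional symmetry to shrink a game tree can be sketched in software as follows. This is an illustrative assumption rather than the paper's FPGA design: a square board has eight symmetric images (four rotations, each optionally mirrored), and mapping every position to a single canonical representative lets a search treat symmetric positions as one node. The cell encoding (0 empty, 1 and 2 for the two players) and all function names are hypothetical.

```python
def rotate(board):
    """Rotate a square board (tuple of tuples) 90 degrees clockwise."""
    return tuple(zip(*board[::-1]))


def mirror(board):
    """Mirror a board left to right."""
    return tuple(row[::-1] for row in board)


def symmetries(board):
    """All eight rotated and mirrored images of a board."""
    images = []
    current = tuple(tuple(row) for row in board)
    for _ in range(4):
        images.append(current)
        images.append(mirror(current))
        current = rotate(current)
    return images


def canonical(board):
    """Smallest image under tuple ordering; equal for symmetric positions."""
    return min(symmetries(board))


# A tiny 3x3 example; a real Connect6 board is much larger.
position = ((1, 0, 0),
            (0, 2, 0),
            (0, 0, 0))
seen = {canonical(position)}
is_duplicate = canonical(rotate(position)) in seen
```

Here `is_duplicate` is true because the rotated position maps to the same canonical form, so a search that stores canonical forms would expand it only once.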