A Text Clustering System Based on k-means Type Subspace Clustering and Ontology

This paper presents a text clustering system built on a k-means type subspace clustering algorithm for clustering large, high-dimensional, and sparse text data. The algorithm adds a new step to the k-means clustering process that automatically calculates the weights of keywords in each cluster, so that the important words of a cluster can be identified by their weight values. To aid understanding and interpretation of the clustering results, a few keywords that best represent the semantic topic are extracted from each cluster. Two methods are used to extract these representative words: candidate words are first selected according to the weights calculated by the new algorithm, and the candidates are then fed to WordNet to identify the nouns among them and to consolidate synonyms and hyponyms. Experimental results show that the clustering algorithm is superior to other subspace clustering algorithms, such as PROCLUS and HARP, and to k-means type algorithms such as Bisecting-KMeans. Furthermore, the word extraction method is effective in selecting words that represent the topics of the clusters.
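
The per-cluster weighting step can be sketched compactly: within each cluster, terms whose values are tightly concentrated around the cluster centre receive large weights and thus surface as candidate keywords. The snippet below is a minimal NumPy sketch of one such entropy-style weighting (the function name and the gamma parameter are illustrative assumptions, not the paper's exact formulation).

```python
# Hedged sketch: entropy-style per-cluster feature weighting for k-means
# type subspace clustering; names and gamma are illustrative assumptions.
import numpy as np

def cluster_keyword_weights(X, labels, centers, gamma=1.0):
    """X: (n_docs, n_terms) tf-idf matrix; labels: cluster id per doc."""
    k, m = centers.shape
    W = np.zeros((k, m))
    for c in range(k):
        Xc = X[labels == c]
        # Dispersion of each term within cluster c.
        D = ((Xc - centers[c]) ** 2).sum(axis=0)
        # Smaller dispersion -> larger weight (softmax of -D / gamma).
        e = np.exp(-D / gamma)
        W[c] = e / e.sum()
    return W

# Top-weighted terms serve as candidate keywords for each cluster, e.g.:
# vocab = np.array(vectorizer.get_feature_names_out())
# top_terms = vocab[np.argsort(-W[c])[:10]]
```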

A 3D Approach for Extraction of the Coronary Artery and Quantification of the Stenosis

Segmentation and quantification of stenosis is an important task in assessing coronary artery disease. One of the main challenges is measuring the real diameter of curved vessels. Moreover, uncertainty in segmenting the different tissues of a narrow vessel is an important issue that affects accuracy. This paper proposes an algorithm to extract the coronary arteries and measure the degree of stenosis. A Markovian fuzzy clustering method is applied to model the uncertainty arising from the partial volume effect. The algorithm comprises four steps: segmentation, centreline extraction, estimation of the plane orthogonal to the centreline, and measurement of the degree of stenosis. To evaluate accuracy and reproducibility, the approach was applied to a vascular phantom and the results were compared with the real diameters. The results for 10 patient datasets were visually judged by a qualified radiologist. The results reveal the superiority of the proposed method over the conventional thresholding method (CTM) on both datasets.
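
As a small illustration of the final quantification step, the degree of stenosis is commonly reported as the relative diameter reduction against a healthy reference segment. The sketch below computes it from diameters measured in planes orthogonal to the centreline (the percentile-based reference is an illustrative choice, not necessarily the paper's).

```python
import numpy as np

def stenosis_percent(diameters):
    """diameters: vessel diameter (mm) sampled along the centreline,
    measured in planes orthogonal to it."""
    d_min = np.min(diameters)
    # Reference diameter from presumably healthy segments: the 90th
    # percentile is an illustrative robust-maximum choice.
    d_ref = np.percentile(diameters, 90)
    return 100.0 * (1.0 - d_min / d_ref)

print(stenosis_percent(np.array([3.1, 3.0, 2.9, 1.4, 2.8, 3.0])))  # ~54%
```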

Dichotomous Logistic Regression with Leave-One-Out Validation

This paper discusses the concepts of dichotomous logistic regression (DLR) with leave-one-out (L-O-O) validation. As an illustration, L-O-O was run to determine the importance of the simulation conditions for robust tests of spread procedures with good Type I error rates, and the resulting model was then evaluated. The discussion covers 1) assessment of the accuracy of the model and 2) the parameter estimates. These are presented and illustrated by modeling the relationship between the dichotomous dependent variable (Type I error rates) and a set of independent variables (the simulation conditions). The base SAS software, with PROC LOGISTIC and DATA step functions, can be used to carry out the DLR analysis.
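
The same analysis can also be reproduced outside SAS. The sketch below is a minimal scikit-learn equivalent (the synthetic data and variable names are assumptions): it fits a dichotomous logistic regression, scores it by leave-one-out cross-validation, and reports the parameter estimates.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))          # simulation conditions (illustrative)
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=60) > 0).astype(int)

model = LogisticRegression()
# Each L-O-O fold trains on n-1 observations and predicts the held-out one.
acc = cross_val_score(model, X, y, cv=LeaveOneOut()).mean()
print(f"L-O-O accuracy: {acc:.3f}")

model.fit(X, y)
print("parameter estimates (log-odds):", model.coef_, model.intercept_)
```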

Applications of Genetic Programming in Data Mining

This paper details the application of a genetic programming framework for the induction of useful classification rules from a database of income statements, balance sheets, and cash flow statements for North American public companies. Potentially interesting classification rules are discovered. Anomalies in the discovery process merit further investigation of the application of genetic programming to this dataset and problem domain.
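
To make the rule-induction idea concrete, the toy sketch below evolves threshold rules of the form "feature_i > t" by selection and mutation. It is a deliberately simplified stand-in for a full genetic programming framework (representation, operators, and fitness here are illustrative assumptions, not the paper's system).

```python
# Hedged sketch: toy evolutionary search for classification rules; a
# simplified stand-in for genetic programming, not the paper's framework.
import random

def accuracy(rule, X, y):
    i, t = rule
    return sum((x[i] > t) == label for x, label in zip(X, y)) / len(y)

def evolve(X, y, pop_size=50, generations=30):
    n_feat = len(X[0])
    pop = [(random.randrange(n_feat), random.uniform(-1, 1))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda r: accuracy(r, X, y), reverse=True)
        survivors = pop[: pop_size // 2]           # truncation selection
        children = [(f, t + random.gauss(0, 0.1))  # mutate the threshold
                    for f, t in random.choices(survivors, k=pop_size // 2)]
        pop = survivors + children
    return max(pop, key=lambda r: accuracy(r, X, y))

# Usage: best = evolve(X, y), with X rows of financial ratios and y labels.
```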

Automatic Visualization Pipeline Formation for Medical Datasets on Grid Computing Environment

Distance visualization of large datasets often takes the direction of remote viewing and zooming of stored static images. However, the continuous increase in the size of datasets and visualization operations causes insufficient performance on traditional desktop computers. Additionally, visualization techniques such as isosurface extraction depend on the available resources of the running machine and the size of the datasets. Moreover, the continuous demand for computing power and the continuous increase in dataset size result in an urgent need for a grid computing infrastructure. However, some issues arise in current grids, such as the resources available at client machines not being sufficient to process large datasets. On top of that, different output devices and different network bandwidths between the visualization pipeline components often produce output suitable for one machine but not for another. In this paper we investigate how grid services can be used to support remote visualization of large datasets and to break the constraint of physical co-location of resources by applying grid computing technologies. We present our grid-enabled architecture for remote interactive visualization of large medical datasets (circa 5 million polygons) on clients with modest resources.

Real-time Performance Study of EPA Periodic Data Transmission

EPA (Ethernet for Plant Automation) resolves the non-deterministic behavior of standard Ethernet and achieves real-time communication by means of a micro-segment topology and a deterministic scheduling mechanism. This paper studies the real-time performance of EPA periodic data transmission from theoretical and experimental perspectives. By analyzing the characteristics of information transmission and the EPA deterministic scheduling mechanism, five indicators that specify the real-time performance of EPA periodic data transmission are presented and investigated: delivery time, time synchronization accuracy, data-sending time offset accuracy, utilization percentage of the configured timeslice, and non-RTE bandwidth. On this basis, test principles and test methods for these indicators are studied, and formulas for the real-time performance of an EPA system are derived. Furthermore, an experimental platform is developed to test the indicators of EPA periodic data transmission in a micro-segment. Based on the analysis and the experiments, methods to improve the real-time performance of EPA periodic data transmission are proposed, including optimizing the network structure, studying self-adaptive adjustment of the timeslice, and providing data-sending time offset accuracy for configuration.
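
As a small illustration of one of the indicators, the utilization percentage of the configured timeslice is the ratio of the time a device actually spends sending periodic data to the timeslice allocated to it. The sketch below shows the arithmetic (parameter names are assumptions, not taken from the EPA specification).

```python
def timeslice_utilization(frame_bits, bitrate_bps, timeslice_us):
    """Share of a device's configured timeslice actually used for sending."""
    send_time_us = frame_bits / bitrate_bps * 1e6
    return 100.0 * send_time_us / timeslice_us

# e.g. a 1000-bit periodic frame at 100 Mbit/s in a 50 us timeslice:
print(timeslice_utilization(1000, 100e6, 50))  # 20.0 (%)
```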

EEG Spikes Detection, Sorting, and Localization

This study introduces a new method for detecting, sorting, and localizing spikes from multiunit EEG recordings. The method combines the wavelet transform, which localizes distinctive spike features, with the Super-Paramagnetic Clustering (SPC) algorithm, which allows automatic classification of the data without assumptions such as low variance or Gaussian distributions. Moreover, the method is capable of setting amplitude thresholds for spike detection. The method was applied to several real EEG data sets, from which the spikes were detected and clustered and their occurrence times determined.
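
A common way to set such an amplitude threshold automatically is from a robust estimate of the background noise, sigma = median(|x|) / 0.6745, with the threshold placed at a small multiple of sigma. The sketch below illustrates this (the factor of 4 and the refractory period are typical choices, not necessarily the paper's).

```python
import numpy as np

def detect_spikes(x, fs, k=4.0, refractory_ms=2.0):
    """Return sample indices where |x| crosses an automatic threshold."""
    sigma = np.median(np.abs(x)) / 0.6745     # robust noise estimate
    thr = k * sigma
    idx = np.flatnonzero(np.abs(x) > thr)
    # Enforce a refractory period so each spike is counted once.
    keep, last = [], -np.inf
    gap = int(refractory_ms * 1e-3 * fs)
    for i in idx:
        if i - last > gap:
            keep.append(i)
            last = i
    return np.array(keep), thr
```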

Application and Limitation of Parallel Modeling in Multidimensional Sequential Pattern

The goal of data mining algorithms is to discover useful information embedded in large databases. One of the most important data mining problems is the discovery of frequently occurring patterns in sequential data. In a multidimensional sequence, each event depends on more than one dimension, so the search space is quite large and serial algorithms do not scale to very large datasets. To address this, it is necessary to study scalable parallel implementations of sequence mining algorithms. In this paper, we present a model for multidimensional sequences and describe a parallel algorithm based on data parallelism. Simulation experiments show good load balancing and scalable, acceptable speedup over different numbers of processors and problem sizes, and demonstrate that our approach works efficiently in a real parallel computing environment.
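
In a data-parallel formulation, the sequence database is partitioned across workers, each worker counts candidate patterns in its partition, and the partial counts are merged. The sketch below illustrates this count-then-reduce scheme with Python's multiprocessing (counting ordered 2-subsequences only; a simplification, not the paper's algorithm).

```python
from collections import Counter
from itertools import combinations
from multiprocessing import Pool

def local_counts(sequences):
    """Count candidate 2-subsequences in one data partition."""
    c = Counter()
    for seq in sequences:
        for pair in set(combinations(seq, 2)):   # ordered 2-subsequences
            c[pair] += 1
    return c

def frequent_pairs(db, n_workers=4, min_support=2):
    chunks = [db[i::n_workers] for i in range(n_workers)]
    with Pool(n_workers) as pool:
        partials = pool.map(local_counts, chunks)
    total = sum(partials, Counter())             # reduce partial counts
    return {p: n for p, n in total.items() if n >= min_support}

if __name__ == "__main__":
    db = [list("abcd"), list("acd"), list("abd"), list("bcd")]
    print(frequent_pairs(db))
```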

Cloud Computing Databases: Latest Trends and Architectural Concepts

Economic factors are leading to the rise of infrastructures that provide software and computing facilities as a service, known as cloud services or cloud computing. Cloud services can provide efficiencies for application providers, both by limiting up-front capital expenses and by reducing the cost of ownership over time. Such services are made available in a data center, using shared commodity hardware for computation and storage. A varied set of cloud services is available today, including application services (salesforce.com), storage services (Amazon S3), compute services (Google App Engine, Amazon EC2) and data services (Amazon SimpleDB, Microsoft SQL Server Data Services, Google's Datastore). These services represent a variety of reformations of data management architectures, and more are on the horizon.

Parametric Optimization of Hospital Design

This paper presents a parametric performance-based design model for optimizing hospital design. The design model operates with geometric input parameters defining the functional requirements of the hospital, and with input parameters in terms of performance objectives defining the hospital's design requirements and preferences with respect to performance. The design model takes its point of departure in the hospital functionalities, expressed as a set of defined parameters and rules describing the design requirements and preferences.

Graphical Programming of Programmable Logic Controllers - Case Study for a Punching Machine

The Programmable Logic Controller (PLC) plays a vital role in automation and process control. Grafcet is used for representing the control logic, while traditional programming languages are used for describing the pure algorithms. Grafcet divides the process to be automated into elementary sequences that can be easily implemented. Each sequence represents a step with associated actions, programmed in textual or graphical languages as appropriate. The programming task is simplified by using a set of subroutines shared by several steps. The paper presents an example implementation for a punching machine for sheets and plates. For programming a complex sequential process, graphical languages are a natural solution, and the Grafcet state can be used for debugging and malfunction diagnosis. The method, combined with knowledge acquisition for the process application, reduces the downtime of the machine and improves productivity.
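
To make the Grafcet idea concrete, the sketch below models steps and transitions as a small state machine: each scan cycle executes the action of the active step and advances when the step's transition condition fires. The step names and conditions for the punching machine are illustrative assumptions, not the paper's program.

```python
# Hedged sketch: a Grafcet-like sequencer. Steps, actions, and transition
# conditions are illustrative, not the paper's punching-machine program.
steps = {
    "IDLE":    {"action": lambda io: None,
                "next": ("CLAMP", lambda io: io["start"])},
    "CLAMP":   {"action": lambda io: io.update(clamp=True),
                "next": ("PUNCH", lambda io: io["clamped"])},
    "PUNCH":   {"action": lambda io: io.update(punch=True),
                "next": ("RELEASE", lambda io: io["punch_done"])},
    "RELEASE": {"action": lambda io: io.update(clamp=False, punch=False),
                "next": ("IDLE", lambda io: True)},
}

def scan_cycle(state, io):
    """One PLC scan: execute the active step, then test its transition."""
    steps[state]["action"](io)
    target, cond = steps[state]["next"]
    return target if cond(io) else state

io = {"start": True, "clamped": True, "punch_done": True}
state = "IDLE"
for _ in range(4):
    state = scan_cycle(state, io)
    print(state)   # CLAMP, PUNCH, RELEASE, IDLE
```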

An Empirical Analysis of the Board Composition Concerning Logistics Competencies

Empirical insights into the implementation of logistics competencies at the top management level are scarce. This paper addresses this issue with an explorative approach based on a dataset of 872 observations from the years 2000, 2004, and 2008, obtained by quantitative content analysis of the annual reports of the 500 publicly listed firms with the highest global research and development expenditures according to the British Department for Business, Innovation and Skills. We find that logistics competencies are more pronounced in Asian companies than in their European or American counterparts, while at the industry level the results are mixed. Using partial point-biserial correlations, we show that logistics competencies are positively related to financial performance.
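
A point-biserial correlation is simply a Pearson correlation between a dichotomous and a continuous variable, and a partial version can be obtained by regressing control variables out of both sides first. The sketch below shows this residual-based construction (variable names and the synthetic data are assumptions).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
controls = rng.normal(size=(200, 2))                 # e.g. firm size, year
has_logistics_board = rng.integers(0, 2, size=200)   # dichotomous variable
roa = 0.3 * has_logistics_board + controls @ [0.2, -0.1] + rng.normal(size=200)

def residuals(y, Z):
    """Residuals of y after OLS on controls Z (plus intercept)."""
    Z1 = np.column_stack([np.ones(len(Z)), Z])
    beta, *_ = np.linalg.lstsq(Z1, y, rcond=None)
    return y - Z1 @ beta

# Partial point-biserial correlation = Pearson r of the two residual series.
r, p = stats.pearsonr(residuals(has_logistics_board.astype(float), controls),
                      residuals(roa, controls))
print(f"partial r = {r:.3f}, p = {p:.4f}")
```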

Online Signature Verification Using Angular Transformation for e-Commerce Services

The rapid growth of e-Commerce services has been clearly observed over the past decade. However, methods for verifying authenticated users still rely largely on numeric approaches, so the search for other verification methods suitable for online e-Commerce is an interesting issue. In this paper, a new online signature verification method using an angular transformation is presented. Delay shifts existing in online signatures are estimated by a method relying on the angle representation. In the proposed signature verification algorithm, all components of the input signature are extracted by considering the discontinuous break points in the stream of angular values. The estimated delay shift is then captured by comparison with the selected reference signature, and the matching error is computed as the main feature used in the verification process. The threshold offsets are calculated from the two error characteristics of the signature verification problem, the False Rejection Rate (FRR) and the False Acceptance Rate (FAR). The level of these two error rates depends on the chosen decision threshold, whose value is set so as to realize the Equal Error Rate (EER; FAR = FRR). Experimental results from a simple program deployed on the Internet to demonstrate e-Commerce services show that the proposed method achieves 95.39% correct verifications, 7% better than a DP-matching-based signature verification method. In addition, signature verification on the extracted components provides more reliable results than making a decision on the whole signature.
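
The angular transformation itself is straightforward: a sampled pen trajectory (x_t, y_t) is converted into the stream of successive segment angles, on which break points then mark component boundaries. The sketch below illustrates this (the unwrapping step and the jump threshold are illustrative assumptions, not the paper's parameters).

```python
import numpy as np

def angular_stream(x, y):
    """Angle of each successive pen-stroke segment, unwrapped so that
    discontinuous break points show up as large jumps."""
    theta = np.arctan2(np.diff(y), np.diff(x))
    return np.unwrap(theta)

def break_points(theta, jump=np.pi / 4):
    """Indices where the angular value jumps sharply (component borders)."""
    return np.flatnonzero(np.abs(np.diff(theta)) > jump) + 1

# A trajectory with a sharp corner between two straight strokes:
x = np.array([0., 1., 2., 2., 2.])
y = np.array([0., 0., 0., 1., 2.])
th = angular_stream(x, y)        # [0, 0, pi/2, pi/2]
print(break_points(th))          # [2]: the corner between the strokes
```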

Outer-Brace Stress Concentration Factors of Offshore Two-Planar Tubular DKT-Joints

In the present paper, a set of parametric FE stress analyses is carried out for two-planar welded tubular DKT-joints under two different axial load cases. The analysis results are used to present general remarks on the effect of the geometrical parameters on the stress concentration factors (SCFs) at the inner saddle, outer saddle, toe, and heel positions on the main (outer) brace. A new set of SCF parametric equations is then developed through nonlinear regression analysis for the fatigue design of two-planar DKT-joints. An assessment study of these equations is conducted against experimental data, and the satisfaction of the criteria for the acceptance of parametric equations is checked. Significant effort has been devoted by researchers to the study of SCFs in various uniplanar tubular connections; nevertheless, for the multi-planar joints that cover the majority of practical applications, very few investigations have been reported, due to the complexity and high cost involved.
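
SCF parametric equations of this kind are typically power-law forms in the non-dimensional joint parameters, fitted by nonlinear least squares. The sketch below shows the fitting step with scipy (the functional form and the data are illustrative placeholders, not the derived equations).

```python
import numpy as np
from scipy.optimize import curve_fit

def scf_model(params, c0, c1, c2, c3):
    """Illustrative power-law form: SCF = c0 * beta^c1 * gamma^c2 * tau^c3."""
    beta, gamma, tau = params
    return c0 * beta**c1 * gamma**c2 * tau**c3

# FE-derived samples would go here; these arrays are placeholders.
beta  = np.array([0.4, 0.5, 0.6, 0.7, 0.8])
gamma = np.array([12., 16., 20., 24., 28.])
tau   = np.array([0.5, 0.6, 0.7, 0.8, 0.9])
scf   = np.array([3.2, 4.1, 5.3, 6.0, 7.4])

coef, _ = curve_fit(scf_model, (beta, gamma, tau), scf, p0=[1, 1, 1, 1])
print("fitted coefficients:", coef)
```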

Implementation of Feed-in Tariffs into Multi-Energy Systems

This paper considers the influence of promotion instruments for renewable energy sources (RES) on a multi-energy modeling framework. In Europe, so-called Feed-in Tariffs are successfully used as incentive structures to increase the amount of energy produced by RES. Because of the stochastic nature of large-scale integration of distributed generation, many problems have occurred regarding the quality and stability of supply. Hence, a macroscopic model was developed to optimize the power supply of a local energy infrastructure that includes electricity, natural gas, fuel oil, and district heating as energy carriers. Unique features of the model are the integration of RES and the adoption of Feed-in Tariffs into a single optimization stage. Sensitivity studies are carried out to examine the system behavior under changing profits for the feed-in of RES. With a setup of three energy-exchanging regions and a multi-period optimization, the impact of costs and profits is determined.
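
At its core, such a model is a cost-minimizing dispatch in which the Feed-in Tariff enters the objective as negative cost for RES fed into the grid. The sketch below is a heavily simplified single-period linear program (carriers, prices, and the balance constraints are assumptions, not the paper's model).

```python
import numpy as np
from scipy.optimize import linprog

# Decision variables: [grid_electricity, gas_boiler_heat, res_feed_in] in MWh.
cost = np.array([60.0, 40.0, -80.0])   # -80 = Feed-in Tariff per MWh of RES

# Balances: electricity bought minus RES fed into the grid must cover
# 10 MWh of local load; gas boiler must cover 5 MWh of heat demand.
A_eq = np.array([[1.0, 0.0, -1.0],
                 [0.0, 1.0,  0.0]])
b_eq = np.array([10.0, 5.0])

bounds = [(0, None), (0, None), (0, 6.0)]   # at most 6 MWh RES available

res = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print(res.x, res.fun)   # feeds in all RES because tariff > electricity price
```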

A Novel Pilot Scheme for Frequency Offset and Channel Estimation in 2x2 MIMO-OFDM

Carrier Frequency Offset (CFO) due to the time-varying fading channel is the main cause of the loss of orthogonality among OFDM subcarriers, which leads to inter-carrier interference (ICI). Hence, it is necessary to precisely estimate and compensate the CFO. Especially in mobile broadband communications, the CFO and channel gains have to be estimated and tracked to maintain system performance, so synchronization pilots are embedded in every OFDM symbol to track the variations. In this paper, we present a pilot scheme for both channel and CFO estimation in which the channel estimation can be carried out with only one OFDM symbol. Additionally, the proposed pilot scheme provides better CFO estimation performance than the conventional orthogonal pilot scheme, owing to the increased signal-to-interference ratio.
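
A standard way to estimate a CFO from pilots is from the phase rotation between two received copies of the same pilot block separated by N samples (a Moose-type estimator). The NumPy sketch below illustrates that principle; it is a textbook baseline, not the proposed pilot scheme.

```python
import numpy as np

N = 64                                   # samples between pilot repetitions
eps_true = 0.08                          # CFO, in subcarrier spacings
rng = np.random.default_rng(0)

pilot = np.exp(1j * 2 * np.pi * rng.random(N))       # unit-modulus pilot
tx = np.tile(pilot, 2)                                # repeated pilot block
n = np.arange(2 * N)
rx = tx * np.exp(1j * 2 * np.pi * eps_true * n / N)   # apply the CFO
rx += 0.05 * (rng.normal(size=2 * N) + 1j * rng.normal(size=2 * N))

# The phase of the correlation between the two halves reveals the CFO
# (unambiguous for |eps| < 0.5 subcarrier spacings).
eps_hat = np.angle(np.sum(np.conj(rx[:N]) * rx[N:])) / (2 * np.pi)
print(eps_true, round(float(eps_hat), 4))
```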

Self-Efficacy, Anxiety, and Performance in the English Language among Middle-School Students in English Language Program in Satri Si Suriyothai School, Bangkok

This study investigated students' perception of self-efficacy and anxiety in acquiring the English language, and examined the relationships among the independent variables, the confounding variables, and students' performance in the English language. The researcher tested the research hypotheses on a sample of 318 respondents drawn from a population of 400 students. The results revealed a significant moderate negative relationship between English language anxiety and performance in English, but no significant relationship between self-efficacy and English language performance among the middle-school students. There was also a significant moderate negative relationship between English language anxiety and self-efficacy. General self-efficacy and English language anxiety were found to be a significantly more powerful set of predictors than the set of confounding variables. Thus, the study concluded that English language anxiety and general self-efficacy are significant predictors of English language performance among middle-school students in Satri Si Suriyothai School.

Performance Evaluation of Wavelet Based Coders on Brain MRI Volumetric Medical Datasets for Storage and Wireless Transmission

In this paper, we evaluate the performance of wavelet-based coding algorithms, namely 3D QT-L, 3D SPIHT, and JPEG2K. In the first step we perform an objective comparison of the three coders. For this purpose, eight MRI head-scan test sets of 256 x 256 x 124 voxels were used. The results show the superior performance of the 3D SPIHT algorithm, while 3D QT-L outperforms JPEG2K. The second step consists of evaluating the robustness of the 3D SPIHT and JPEG2K coding algorithms under wireless transmission, where the compressed datasets are transmitted over an AWGN or a Rayleigh wireless channel. The results show the superiority of JPEG2K under both channel models; in fact, JPEG2K proves more robust to coding errors. We therefore conclude that error-correcting codes are necessary to protect the transmitted medical information.
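
Objective comparisons of this kind are usually reported as PSNR between the original and the decoded volume. The sketch below shows the standard computation (8-bit data and the synthetic volume are assumptions).

```python
import numpy as np

def psnr(original, decoded, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-shape volumes."""
    diff = original.astype(np.float64) - decoded.astype(np.float64)
    mse = np.mean(diff ** 2)
    return np.inf if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

vol = np.random.default_rng(0).integers(0, 256, size=(256, 256, 124))
noisy = np.clip(vol + np.random.default_rng(1).normal(0, 2, vol.shape), 0, 255)
print(f"{psnr(vol, noisy):.2f} dB")
```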

Metal Streak Analysis with Different Acquisition Settings in Postoperative Spine Imaging: A Phantom Study

CT assessment of the postoperative spine is challenging in the presence of metal streak artifacts that can deteriorate the quality of CT images. In this paper, we study the influence of different acquisition parameters on the magnitude of metal streaking. A water-bath phantom was constructed with a metal insert to mimic postoperative spine assessment. The phantom was scanned with different acquisition settings, and the acquired data were reconstructed using various reconstruction settings. Standardized ROIs were defined within the streaking region for image analysis. The results show that increased kVp and mAs improved SNR values by reducing image noise. A sharper kernel enhanced image quality compared with a smooth kernel, but produced more noise and higher CT number fluctuation in the images; the noise of the two kernels differed significantly (P
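
The ROI figures of merit used here are simple: noise is the standard deviation of the CT numbers inside the ROI, and SNR is the ratio of the mean to that standard deviation. The sketch below shows the computation (the ROI coordinates and the synthetic slice are placeholders).

```python
import numpy as np

def roi_stats(image, y0, y1, x0, x1):
    """Mean HU, noise (SD), and SNR within a rectangular ROI."""
    roi = image[y0:y1, x0:x1].astype(np.float64)
    mean, sd = roi.mean(), roi.std(ddof=1)
    return mean, sd, mean / sd

slice_hu = np.random.default_rng(0).normal(40.0, 12.0, size=(512, 512))
mean, noise, snr = roi_stats(slice_hu, 200, 240, 200, 240)
print(f"mean={mean:.1f} HU, noise={noise:.1f} HU, SNR={snr:.2f}")
```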

Inverse Sets-based Recognition of Video Clips

The paper discusses the mathematics of pattern indexing and its application to the recognition of visual patterns found in video clips. It is shown that (a) pattern indexes can be represented by collections of inverted patterns, and (b) solutions to pattern classification problems can be found as intersections and histograms of the inverted patterns, thus avoiding matching of the original patterns.
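
The core of (a) and (b) can be illustrated with a feature-to-pattern inverted index: classification reduces to intersecting, or histogramming, the posting lists of a query's features rather than matching whole patterns. The sketch below is a minimal dictionary-based illustration, not the paper's formalism.

```python
from collections import Counter, defaultdict

# Inverted index: each feature points to the patterns that contain it.
patterns = {"clip_A": {3, 7, 19}, "clip_B": {7, 19, 42}, "clip_C": {3, 42}}
index = defaultdict(set)
for name, feats in patterns.items():
    for f in feats:
        index[f].add(name)

def classify(query_feats):
    """Histogram of pattern hits over the query's inverted lists."""
    votes = Counter()
    for f in query_feats:
        votes.update(index.get(f, ()))
    return votes.most_common()

print(classify({7, 19}))   # clip_A and clip_B each receive 2 votes
```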