Text Mining Technique for Data Mining Application

Text mining is the application of knowledge discovery techniques to unstructured text; it is also termed knowledge discovery in text (KDT) or text data mining. The decision tree approach is most useful for classification problems. With this technique, a tree is constructed to model the classification process. There are two basic steps: building the tree and applying it to the database. This paper describes a proposed C5.0 classifier that adds rulesets, cross-validation, and boosting to the original C5.0 in order to reduce the error rate. The feasibility and benefits of the proposed approach are demonstrated on a medical data set, hypothyroid. It is shown that the performance of a classifier on the training cases from which it was constructed gives a poor estimate of its accuracy; by sampling or by using a separate test file, the classifier is instead evaluated on cases that were not used to build it, which gives a reliable estimate when both sets are large. If the cases in hypothyroid.data and hypothyroid.test were shuffled and divided into a new 2772-case training set and a 1000-case test set, C5.0 might construct a different classifier with a lower or higher error rate on the test cases. An important feature of See5 is its ability to generate classifiers called rulesets; the ruleset has an error rate of 0.5% on the test cases. The standard errors of the means provide an estimate of the variability of results. One way to get a more reliable estimate of predictive accuracy is f-fold cross-validation: the error rate of a classifier produced from all the cases is estimated as the ratio of the total number of errors on the hold-out cases to the total number of cases. The Boost option with x trials instructs See5 to construct up to x classifiers in this manner. Trials over numerous datasets, large and small, show that on average 10-classifier boosting reduces the error rate for test cases by about 25%.
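The f-fold cross-validation error estimate described above can be sketched as follows. This is a minimal illustration, not C5.0 itself: the dataset (a 90/10 class split echoing the hypothyroid data's imbalance) and the majority-vote "classifier" are illustrative stand-ins.

```python
# Minimal sketch of f-fold cross-validation error estimation: the data are
# split into f blocks, each block is held out once, and the overall error
# rate is the ratio of total hold-out errors to total cases.
import random

def train_majority(cases):
    """Toy stand-in for C5.0: predict the most common training class."""
    labels = [label for _, label in cases]
    return max(set(labels), key=labels.count)

def cross_validate(cases, f=10, seed=0):
    rng = random.Random(seed)
    shuffled = cases[:]
    rng.shuffle(shuffled)
    errors = 0
    for i in range(f):
        holdout = shuffled[i::f]              # i-th of f interleaved blocks
        training = [c for j, c in enumerate(shuffled) if j % f != i]
        predicted = train_majority(training)
        errors += sum(1 for _, label in holdout if label != predicted)
    return errors / len(cases)                # overall error-rate estimate

# 90% of cases are "negative", 10% "hypothyroid" (illustrative proportions)
data = [(k, "negative") for k in range(90)] + [(k, "hypothyroid") for k in range(10)]
print(round(cross_validate(data, f=10), 2))   # → 0.1
```

Every case is held out exactly once, so the estimate uses all the data while never testing a classifier on its own training cases.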

Leaf Chlorophyll of Corn, Sweet basil and Borage under Intercropping System in Weed Interference

Intercropping is one of the factors of sustainable agriculture. Since the SPAD meter can be used to predict the nitrogen index reliably, it may also be a useful tool for assessing the relative impact of weeds on crops. In order to study the effect of weeds on the SPAD value of corn (Zea mays L.), sweet basil (Ocimum basilicum L.) and borage (Borago officinalis L.) in an intercropping system, a factorial experiment was conducted with three replications in 2011. Experimental factors included intercropping of corn with sweet basil and borage in different ratios (100:0, 75:25, 50:50, 25:75 and 0:100 corn : borage or sweet basil) and weed infestation (weed control and weed interference). The results showed that intercropping of corn with sweet basil and borage increased the SPAD value of corn compared to monoculture under weed interference. The sweet basil SPAD value in the weed control treatments (43.66) was higher than in the weed interference treatments (40.17). Corn increased the borage SPAD value compared to monoculture in the weed interference treatments.

Intelligent Heart Disease Prediction System Using CANFIS and Genetic Algorithm

Heart disease (HD) is a major cause of morbidity and mortality in modern society. Medical diagnosis is an important but complicated task that should be performed accurately and efficiently, and its automation would be very useful. Unfortunately, all doctors are not equally skilled in every subspecialty, and in many places they are a scarce resource. A system for automated medical diagnosis would enhance medical care and reduce costs. In this paper, a new approach based on the coactive neuro-fuzzy inference system (CANFIS) is presented for the prediction of heart disease. The proposed CANFIS model combines the adaptive capabilities of neural networks with the qualitative approach of fuzzy logic, and is then integrated with a genetic algorithm to diagnose the presence of the disease. The performance of the CANFIS model was evaluated in terms of training performance and classification accuracy, and the results showed that the proposed model has great potential in predicting heart disease.
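The genetic-algorithm component integrated with CANFIS above can be sketched in miniature. Everything here is an illustrative stand-in: the fitness function is a toy (in the paper, classification accuracy over the CANFIS parameters would take its place), and the population size, mutation rate, and real-valued encoding are assumed choices.

```python
# Minimal elitist genetic algorithm sketch: selection of the fittest half,
# one-point crossover, and occasional Gaussian mutation.
import random

def evolve(fitness, n_params, pop_size=20, generations=50, seed=1):
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(n_params)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)       # best individuals first
        parents = pop[: pop_size // 2]            # elitist selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_params) if n_params > 1 else 0
            child = a[:cut] + b[cut:]             # one-point crossover
            if rng.random() < 0.2:                # mutation
                i = rng.randrange(n_params)
                child[i] += rng.gauss(0, 0.1)
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

# Toy fitness: highest when both parameters are near 0.5
best = evolve(lambda p: -sum((x - 0.5) ** 2 for x in p), n_params=2)
print([round(x, 2) for x in best])
```

Because the fittest parents survive each generation, the best solution found never degrades, which mirrors how a GA can tune model parameters without losing its current best configuration.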

Technique for Processing and Preservation of Human Amniotic Membrane for Ocular Surface Reconstruction

Human amniotic membrane (HAM) is a useful biological material for the reconstruction of a damaged ocular surface. The processing and preservation of HAM are critical to protect patients undergoing amniotic membrane transplant (AMT) from cross infections. For HAM preparation, a human placenta is obtained after an elective cesarean delivery. Before collection, the donor is screened for seronegativity for HCV, HBsAg, HIV, and syphilis. After collection, the placenta is washed in balanced salt solution (BSS) in a sterile environment. The amniotic membrane is then separated from the placenta and the chorion while keeping the preparation in BSS. The HAM is scraped manually until all debris is removed and a clear, transparent membrane is obtained. Nitrocellulose membrane filters are then placed on the stromal side of the HAM and cut around the edges, with a little membrane folded towards the other side to make it easy to separate during surgery. The HAM is finally stored in a 1:1 solution of glycerine and Dulbecco's Modified Eagle Medium (DMEM) containing antibiotics. The capped Borosil vials containing the HAM are kept at -80°C until use. At the time of surgery, a vial is thawed to room temperature and opened under sterile operation theatre conditions.

Multi-Agent Simulation of Wayfinding for Rescue Operation during Building Fire

Recent research on human wayfinding has focused mainly on mental representations rather than on the processes of wayfinding. The objective of this paper is to demonstrate the rationale for applying the multi-agent simulation paradigm to the modeling of rescue-team wayfinding, in order to develop a computational theory of perceptual wayfinding in crisis situations using image schemata and affordances, which explains how people find a specific destination in an unfamiliar building such as a hospital. The hypothesis of this paper is that successful navigation is possible if the agents are able to make the correct decision through well-defined cues in critical cases; the design of the building signage is therefore evaluated through multi-agent-based simulation. In addition, a special case of wayfinding in a building, finding one's way through three hospitals, is used to demonstrate the model, and the total rescue time for a rescue operation during a building fire is computed. This paper discusses the computed rescue time for various signage localizations and provides experimental results for the optimization of building signage design; the most appropriate signage design is the one resulting in the shortest total rescue time across various situations.

Product Configuration Strategy Based on Product Family Similarity

To offer a large variety of products while maintaining low costs, high speed, and high quality in a mass-customization product development environment, platform-based product development offers significant benefits in many industrial fields. This paper proposes a product configuration strategy based on a similarity measure, incorporating knowledge engineering principles such as the product information model, ontology engineering, and formal concept analysis.
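As a minimal illustration of a similarity measure over product variants, a Jaccard index on attribute sets can be sketched. This is an assumption for illustration only: the attribute names are invented, and the paper's ontology- and FCA-based measure is considerably richer than plain Jaccard similarity.

```python
# Jaccard similarity between the attribute sets of two product variants:
# |intersection| / |union|, ranging from 0 (disjoint) to 1 (identical).
def jaccard(a, b):
    return len(a & b) / len(a | b)

# Hypothetical variants in a product family, described by attribute sets
base = {"frame_S", "motor_250W", "battery_36V"}
variant = {"frame_S", "motor_500W", "battery_36V"}
print(jaccard(base, variant))   # → 0.5
```

A configuration strategy can use such a score to pick the existing platform variant closest to a customer's requirements and then reconfigure only the differing attributes.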

Hybrid Honeypot System for Network Security

Nowadays we face network threats that cause enormous damage to the Internet community day by day. In this situation, more and more people try to protect their networks using traditional mechanisms such as firewalls, intrusion detection systems, etc. Among security tools, the honeypot is a versatile one: honeypots are tools that are meant to be attacked or interacted with in order to gather more information about attackers, their motives, and their tools. In this paper, we describe the usefulness of low-interaction and high-interaction honeypots and compare them. We then propose a hybrid honeypot architecture that combines low- and high-interaction honeypots to mitigate the drawbacks of each. In this architecture, the low-interaction honeypot is used as a traffic filter: activities like port scanning can be effectively detected by the low-interaction honeypot and stopped there. Traffic that cannot be handled by the low-interaction honeypot is handed over to the high-interaction honeypot. In this case, the low-interaction honeypot acts as a proxy, whereas the high-interaction honeypot offers the optimal level of realism. To prevent the high-interaction honeypot from infections, a containment environment (VMware) is used.
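The filtering role of the low-interaction tier can be sketched as a simple dispatcher. The event labels and the classification rule below are deliberately naive placeholders for the real traffic analysis a honeypot performs.

```python
# Sketch of the hybrid architecture's hand-off logic: simple, recognisable
# probes are answered and logged by the low-interaction honeypot, while
# anything it cannot emulate is handed over to the high-interaction one.
KNOWN_PROBES = {"SYN_SCAN", "PING", "BANNER_GRAB"}   # assumed probe labels

def dispatch(event):
    """Return which honeypot tier handles an incoming event."""
    if event["type"] in KNOWN_PROBES:
        return "low-interaction"      # emulated service answers and logs it
    return "high-interaction"         # real system inside the containment VM

events = [{"type": "SYN_SCAN"}, {"type": "SHELL_EXPLOIT"}, {"type": "PING"}]
print([dispatch(e) for e in events])
# → ['low-interaction', 'high-interaction', 'low-interaction']
```

The point of the split is economy: cheap emulation absorbs the high-volume noise (scans, pings), so the expensive, risky high-interaction system only sees traffic worth studying.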

Detecting Abnormal ECG Signals Utilising Wavelet Transform and Standard Deviation

The ECG contains very important clinical information about the cardiac activity of the heart. Often the ECG signal needs to be captured over a long period of time in order to identify abnormalities in certain situations. Such a signal, apart from its large volume, is often characterised by low quality due to noise and other influences. In order to extract features, the ECG signal with its time-varying characteristics first needs to be preprocessed with the best parameters. It is also useful to identify the specific parts of the long-lasting signal which exhibit abnormalities and to direct the practitioner to those parts. In this work we present a method based on the wavelet transform, standard deviation, and a variable threshold which achieves 100% accuracy in identifying the ECG signal peaks and heartbeats, as well as computing the standard deviation, providing a quick reference to abnormalities.
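The peak-detection step with a variable, data-driven threshold can be sketched as below. The wavelet preprocessing stage is omitted, the threshold rule mean + 2·std is an illustrative choice, and the "ECG" trace is synthetic.

```python
# Peak detection with a variable threshold derived from the signal's
# standard deviation: a peak is a local maximum exceeding mean + k*std.
import statistics

def detect_peaks(signal, k=2.0):
    mean = statistics.mean(signal)
    std = statistics.pstdev(signal)
    threshold = mean + k * std                 # variable, data-driven threshold
    return [i for i in range(1, len(signal) - 1)
            if signal[i] > threshold
            and signal[i] > signal[i - 1]
            and signal[i] >= signal[i + 1]]

# Synthetic "ECG": flat baseline with two sharp beats
ecg = [0, 0, 1, 9, 1, 0, 0, 0, 1, 10, 1, 0, 0]
print(detect_peaks(ecg))   # → [3, 9]
```

Because the threshold is recomputed from the statistics of each segment rather than fixed globally, it adapts to amplitude drift across a long recording, which is the practical reason a standard-deviation-based threshold suits long-duration ECG monitoring.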

Mining Correlated Bicluster from Web Usage Data Using Discrete Firefly Algorithm Based Biclustering Approach

Over the past decade, biclustering has become a popular data mining technique, not only in the field of biological data analysis but also in other applications with high-dimensional two-way datasets, such as text mining and market data analysis. Biclustering clusters both rows and columns of a dataset simultaneously, as opposed to traditional clustering, which clusters either rows or columns. It retrieves subgroups of objects that are similar in one subgroup of variables and different in the remaining variables. The Firefly Algorithm (FA) is a recently proposed metaheuristic inspired by the collective behavior of fireflies. This paper provides a preliminary assessment of a discrete version of FA (DFA) for the task of mining coherent, large-volume biclusters from web usage data. The experiments were conducted on two web usage datasets from a public dataset repository, and the performance of DFA was compared with that of another population-based metaheuristic, binary Particle Swarm Optimization (PSO). The results achieved demonstrate the usefulness of DFA in tackling the biclustering problem.
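A discrete firefly move in binary search space can be sketched as follows. Each firefly is a bit string (in the paper it would encode bicluster row/column membership); a dimmer firefly copies bits from a brighter one with a probability that decays with Hamming distance. The OneMax fitness used here is a toy stand-in for a bicluster coherence score, and the parameter values are illustrative assumptions.

```python
# Minimal discrete firefly algorithm: brighter fireflies attract dimmer
# ones, attractiveness decays with Hamming distance, and the brightest
# fireflies perform a random walk for exploration.
import math, random

def dfa(fitness, n_bits, n_fireflies=10, iterations=60, gamma=0.1, seed=3):
    rng = random.Random(seed)
    swarm = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_fireflies)]
    best = max(swarm, key=fitness)[:]
    for _ in range(iterations):
        for i in range(n_fireflies):
            moved = False
            for j in range(n_fireflies):
                if fitness(swarm[j]) > fitness(swarm[i]):      # j is brighter
                    r = sum(a != b for a, b in zip(swarm[i], swarm[j]))  # Hamming distance
                    beta = math.exp(-gamma * r)                # attractiveness
                    swarm[i] = [bj if rng.random() < beta else bi
                                for bi, bj in zip(swarm[i], swarm[j])]
                    moved = True
            if not moved:                                      # brightest: random walk
                swarm[i][rng.randrange(n_bits)] ^= 1
            if fitness(swarm[i]) > fitness(best):
                best = swarm[i][:]
    return best

best = dfa(fitness=sum, n_bits=16)   # OneMax: maximise the number of 1 bits
print(sum(best))
```

For biclustering, the bit string would be split into a row-membership part and a column-membership part, and the fitness would reward large submatrices with low mean squared residue.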

Cluster Algorithm for Genetic Diversity

With hardware technology advancing, the cost of storage is decreasing, and there is an urgent need for new techniques and tools that can intelligently and automatically assist us in transforming this data into useful knowledge. Different data mining techniques have been developed to handle these large databases [7]. Data mining is also finding its role in the field of biotechnology. Pedigree means the associated ancestry of a crop variety. Genetic diversity is the variation in the genetic composition of individuals within or among species, and it depends on the pedigree information of the varieties. Parents at lower hierarchic levels carry more weight for predicting genetic diversity than those at upper hierarchic levels; the weight decreases as the level increases. For crossbreeding, the two varieties should be as genetically diverse as possible, so as to incorporate the useful characters of both varieties in the newly developed variety. This paper discusses searching and analyzing the possible pairs of varieties, selected on the basis of morphological characters, climatic conditions, and nutrients, so as to obtain the most optimal pair for producing the required crossbred variety. An algorithm was developed to determine the genetic diversity between the selected wheat varieties, and cluster analysis is used for retrieving the results.
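The level-dependent weighting idea above can be sketched as a pedigree-based diversity score: ancestors at lower (closer) levels carry more weight, and the weight halves at each higher level, so two varieties are more diverse the less weighted ancestry they share. The toy pedigree and the halving rule are illustrative assumptions, not the paper's exact scheme.

```python
# Weighted pedigree diversity: shared ancestors reduce diversity, with
# closer ancestors (lower hierarchic level) counting more.
def ancestors_by_level(pedigree, variety, level=1):
    """Yield (ancestor, level) pairs walking up the pedigree."""
    for parent in pedigree.get(variety, []):
        yield parent, level
        yield from ancestors_by_level(pedigree, parent, level + 1)

def diversity(pedigree, a, b):
    """1.0 = no shared ancestry; lower = more shared (weighted) ancestry."""
    wa = {anc: 0.5 ** lvl for anc, lvl in ancestors_by_level(pedigree, a)}
    wb = {anc: 0.5 ** lvl for anc, lvl in ancestors_by_level(pedigree, b)}
    shared = sum(min(wa[x], wb[x]) for x in wa.keys() & wb.keys())
    return 1.0 - shared

# Toy wheat pedigree: W1 and W2 share a grandparent G; W3 is unrelated
pedigree = {"W1": ["P1", "P2"], "W2": ["P3", "P4"],
            "P1": ["G"], "P3": ["G"], "W3": ["P5"]}
print(diversity(pedigree, "W1", "W3"))   # → 1.0  (no common ancestors)
print(diversity(pedigree, "W1", "W2"))   # → 0.75 (shared grandparent G)
```

Scores like these can feed directly into the cluster analysis step: varieties with high pairwise diversity fall into different clusters, and crossbreeding candidates are drawn from distinct clusters.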

Joint Use of Factor Analysis (FA) and Data Envelopment Analysis (DEA) for Ranking of Decision Making Units

This article combines two techniques, data envelopment analysis (DEA) and factor analysis (FA), for data reduction in the ranking of decision making units (DMUs). DEA, a popular linear programming technique, is useful for comparatively rating the operational efficiency of DMUs based on their deterministic (not necessarily stochastic) input–output data. Factor analysis, a data reduction and classification technique, is applied within the DEA framework to reduce the input–output data. Numerical results reveal that the new approach shows good consistency in ranking with DEA.
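The DEA efficiency idea can be sketched for the single-input, single-output special case, where the CCR efficiency of each DMU reduces to its output/input ratio normalised by the best ratio in the set. The general multi-input/output model requires solving one linear program per DMU; the DMU data below are illustrative.

```python
# DEA efficiency in the single-input, single-output special case:
# efficiency(DMU) = (output/input) / max over all DMUs of (output/input).
def dea_efficiency(dmus):
    """dmus: {name: (input, output)} -> {name: efficiency in (0, 1]}."""
    ratios = {name: out / inp for name, (inp, out) in dmus.items()}
    best = max(ratios.values())
    return {name: r / best for name, r in ratios.items()}

dmus = {"A": (2, 4), "B": (3, 3), "C": (4, 8)}
print(dea_efficiency(dmus))   # → {'A': 1.0, 'B': 0.5, 'C': 1.0}
```

Units scoring 1.0 lie on the efficient frontier; the factor-analysis step in the paper matters precisely because, with many inputs and outputs, too many DMUs land on the frontier and reducing the variables restores discrimination.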

Satisfying and Frustrating Aspects of ICT Teaching: A Comparison Based on Self-Efficacy

The purpose of this study was to determine the most satisfying and frustrating aspects of ICT (Information and Communications Technologies) teaching in Turkish schools. Another aim was to compare these aspects based on ICT teachers' self-efficacy. Participants were 119 ICT teachers from different geographical areas of Turkey. Participants were asked to list the salient satisfying and frustrating aspects of ICT teaching and to fill out the Self-Efficacy Scale for ICT Teachers. Results showed that the high-self-efficacy teachers listed more positive and negative aspects of ICT teaching than did the low-self-efficacy teachers. The satisfying aspects of ICT teaching were the dynamic nature of the ICT subject, higher student interest, the opportunity to help other subject teachers, and lecturing in well-equipped labs, whereas the most frequently cited frustrating aspects were ICT-related extra work for schools and colleagues, shortages of hardware and technical problems, indifferent students, insufficient teaching time, and the status of the ICT subject in the school curriculum. This information could be useful in redesigning ICT teachers' roles and responsibilities as well as their job environment in schools.

Split-Pipe Design of Water Distribution Networks Using a Combination of Tabu Search and Genetic Algorithm

In this paper a combination of two heuristic-based algorithms, genetic algorithm and tabu search, is proposed. It has been developed to obtain the least cost based on the split-pipe design of looped water distribution networks. The proposed combination algorithm has been applied to three well-known water distribution networks taken from the literature. The combination of these two heuristic-based algorithms aims to enhance their strengths and compensate for their weaknesses: tabu search is rather systematic and deterministic, using adaptive memory in the search process, while the genetic algorithm is a probabilistic, stochastic optimization technique in which the solution space is explored by generating candidate solutions. Split-pipe design may not be realistic in practice, but for optimization purposes optimal solutions are always achieved with split-pipe design. The solutions obtained in this study show that the least-cost solutions obtained from the split-pipe design are always better than those obtained from the single-pipe design. The results of the combination approach show its ability and effectiveness in solving combinatorial optimization problems. The solutions obtained are very satisfactory and of high quality; the solutions for two of the networks are the lowest-cost solutions yet presented in the literature. The combination approach proposed in this study is expected to be useful in diverse problems.
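The tabu-search half of the combination, with its adaptive memory, can be sketched on a toy problem. The bit-flip neighbourhood, the tabu tenure of 3, and the cost function (distance to a target "design") are illustrative stand-ins for the split-pipe network cost model.

```python
# Simple tabu search: move to the best non-tabu neighbour each iteration;
# the adaptive memory (tabu list) forbids re-flipping recently changed
# bits for a fixed tenure, letting the search escape local optima.
from collections import deque

def tabu_search(cost, n_bits, iterations=50, tenure=3):
    current = [0] * n_bits
    best, best_cost = current[:], cost(current)
    tabu = deque(maxlen=tenure)              # recently flipped bit positions
    for _ in range(iterations):
        moves = [k for k in range(n_bits) if k not in tabu]
        k = min(moves, key=lambda m: cost(current[:m] + [1 - current[m]] + current[m+1:]))
        current[k] = 1 - current[k]          # take the best admissible move
        tabu.append(k)
        c = cost(current)
        if c < best_cost:
            best, best_cost = current[:], c
    return best, best_cost

# Toy cost: cheapest when the bit string matches a target "design"
target = [1, 0, 1, 1, 0, 1]
best, best_cost = tabu_search(lambda s: sum(a != b for a, b in zip(s, target)), 6)
print(best, best_cost)   # → [1, 0, 1, 1, 0, 1] 0
```

Note that the search accepts the best admissible move even when it worsens the current solution; the separately stored best-so-far is what makes this safe, and it is this deterministic, memory-guided intensification that complements the genetic algorithm's stochastic exploration.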

Design Method for Knowledge Base Systems in Education Using COKB-ONT

Nowadays e-Learning is increasingly popular, especially in Vietnam. In e-Learning, materials for studying are very important, and it is necessary to design knowledge base systems and expert systems that support searching, querying, and problem solving. The ontology, called Computational Object Knowledge Base Ontology (COKB-ONT), is a useful tool for designing knowledge base systems in practice. In this paper, a design method for knowledge base systems in education using COKB-ONT is presented. We also present the design of a knowledge base system that supports studying and solving problems in higher mathematics.

Dimension Reduction of Microarray Data Based on Local Principal Component

Analysis and visualization of microarray data is very helpful for biologists and clinicians in the diagnosis and treatment of patients. It allows clinicians to better understand the structure of microarray data and facilitates the understanding of gene expression in cells. However, a microarray dataset is complex, with thousands of features and a very small number of observations, and this very high-dimensional data often contains noise, non-useful information, and only a small number of features relevant to disease or genotype. This paper proposes a non-linear dimensionality reduction algorithm, Local Principal Component (LPC), which aims to map high-dimensional data to a lower-dimensional space. The reduced data represent the most important variables underlying the original data. Experimental results and comparisons are presented to show the quality of the proposed algorithm. Moreover, experiments show how this algorithm reduces high-dimensional data while preserving the neighbourhoods of the points in the low-dimensional space as in the high-dimensional space.
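The underlying principal-component projection can be sketched in the smallest case: projecting 2-D points onto their first principal component. The LPC algorithm in the paper works locally and non-linearly; this is only the global linear special case, with the 2×2 eigenproblem solved in closed form.

```python
# PCA to one dimension for 2-D data: centre the points, build the 2x2
# covariance matrix, and project onto the eigenvector of its largest
# eigenvalue (the principal direction, found here in closed form).
import math

def pca_1d(points):
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # 2x2 covariance matrix entries
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # angle of the major axis (largest-eigenvalue eigenvector)
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    ux, uy = math.cos(theta), math.sin(theta)
    return [(x - mx) * ux + (y - my) * uy for x, y in points]

# Points lying exactly on the line y = x: one component carries everything
scores = pca_1d([(0, 0), (1, 1), (2, 2), (3, 3)])
print([round(s, 3) for s in scores])   # → [-2.121, -0.707, 0.707, 2.121]
```

On microarray data the same idea is applied with thousands of dimensions per point; LPC's local variant fits such projections per neighbourhood so that non-linear structure and point neighbourhoods survive the reduction.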

High-Speed High-Gain CMOS OTA for SC Applications

A fast-settling multipath CMOS OTA for high-speed switched-capacitor applications is presented. With a basic topology similar to the folded cascode, the bandwidth and DC gain of the OTA are enhanced by adding extra signal paths from input to output. The designed circuit is simulated with HSPICE using level-49 parameters (BSIM3v3) in a 0.35 µm standard CMOS technology. The DC gain achieved is 56.7 dB and the unity-gain bandwidth (UGB) obtained is 1.15 GHz. These results confirm that adding extra signal paths can significantly improve the DC gain and UGB of the folded cascode.

The Relationship between Depression, Interpersonal Communication, and Media Use among International Students

Student mobility has been increasing in recent decades. International students can face various psychological and sociological problems in their adaptation process, and depression is one of the most important. This research aimed to reveal the level of foreign students' depression, their interpersonal communication networks (host/ethnic interpersonal communication), and their media usage (host/ethnic media usage). Additionally, the study aimed to examine the relationship between depression and communication (host/ethnic interpersonal communication and host/ethnic media usage) among foreign university students. A field study was performed among 283 foreign university students attending 8 different universities in Turkey; a purposeful sampling technique was used because of data collection facilities. Results indicated that 58.3% of the foreign students' depression was at an "intermediate" level, while for 33.2% it was "low". In addition, host interpersonal communication behaviors and Turkish website usage were negatively and significantly correlated with depression.

Issues and Problems of Sedimentation in Reservoirs: Siazakh Dam Case Study

Sedimentation in reservoirs lowers the quality of consumed water, reduces the volume of the reservoir, lowers the controllable amount of flood water, increases the risk of overflow during possible floods, and reduces the dam's useful life. Therefore, at all stages of dam establishment, such as cognitive studies, phase-1 design studies, control, construction, and maintenance, the problem of sedimentation in the reservoir should be considered. What engineers need to do is examine and develop methods to maintain the effective capacity of a reservoir; however, engineers should also consider the influence of these methods on flood disasters, the functioning of water-use facilities, and environmental issues. This article first examines sedimentation in reservoirs and how to control it, and then discusses the studies of the sediments in the Siazakh Dam.

Investigation of the Antimicrobial Effect of Ammonyx on Some Pathogenic Microbes Observed on Sport Sweatshirts

The main aim of this research is to investigate the antimicrobial effectiveness of ammonyx solutions applied to sport sweatshirts by the immersion finishing method. Sixty healthy male subjects (football players) participated in this study: they wore a sweatshirt for 14 days, and the microbes found on the garments were investigated. The antimicrobial effect of different ammonyx solutions (1/100, 1/500, 1/1000, and 1/2000 v/v) on the identified microbes was studied in vitro by the zone inhibition method. In the next step, the sweatshirts were treated with the same ammonyx solutions, the antimicrobial effectiveness was assessed by the colony count method at different times, and the results were compared with untreated garments. Some mechanical properties of the treated cotton/polyester yarn used in the sweatshirts were measured after 30 days and compared with the untreated yarn. Finally, after finishing, scanning electron microscopy (SEM) was used to compare the surfaces of the finished and unfinished specimens. The results showed the presence of five pathogenic microbes on the sweatshirts: Escherichia coli, Staphylococcus aureus, Aspergillus, Mucor, and Candida. The inhibition zones for the treated sweatshirts improved, the amount of colony growth on the treated clothes was reduced considerably, and the mechanical tests showed no significant deterioration of the studied properties in comparison to the untreated yarn. The visual examination of the SEM images indicated that the antimicrobial treatments were applied effectively to the fabrics.

Extraction of Phenol, o-Cresol, and p-Cresol from Coal Tar: Effect of Temperature and Mixing

Coal tar is a liquid by-product of coal gasification and carbonization. This liquid oil mixture contains various useful compounds such as phenol, o-cresol, and p-cresol, which are widely used as raw materials for insecticides, dyes, medicines, perfumes, coloring matters, and many other products. This research was needed to determine the optimum conditions for the separation of phenol, o-cresol, and p-cresol from coal tar by solvent extraction. The aim of the present work was to study the effect of two aqueous solvents, methanol and acetone solutions, as well as the effects of temperature (298, 306, and 313 K) and mixing speed (30, 35, and 40 rpm), on the separation of phenol, o-cresol, and p-cresol from coal tar by solvent extraction. Results indicated that phenol, o-cresol, and p-cresol in coal tar were selectively extracted into the solvent phase and that these components could be separated by solvent extraction. An aqueous methanol solution, a solvent-to-feed mass ratio of Eo/Ro = 1, an extraction temperature of 306 K, and mixing at 35 rpm were the most efficient conditions for the extraction of phenol, o-cresol, and p-cresol from coal tar.