Transcriptomics Analysis on Comparing Non-Small Cell Lung Cancer versus Normal Lung, and Early Stage Compared versus Late-Stages of Non-Small Cell Lung Cancer

Lung cancer is one of the most common malignancies and primary cause of death due to cancer worldwide. Non-small cell lung cancer (NSCLC) is the main subtype in which majority of patients present with advanced-stage disease. Herein, we analyzed differentially expressed genes to find potential biomarkers for lung cancer diagnosis as well as prognostic markers. We used transcriptome data from our 2 NSCLC patients and public data (GSE81089) composing of 8 NSCLC and 10 normal lung tissues. Differentially expressed genes (DEGs) between NSCLC and normal tissue and between early-stage and late-stage NSCLC were analyzed by the DESeq2. Pairwise correlation was used to find the DEGs with false discovery rate (FDR) adjusted p-value £ 0.05 and |log2 fold change| ³ 4 for NSCLC versus normal and FDR adjusted p-value £ 0.05 with |log2 fold change| ³ 2 for early versus late-stage NSCLC. Bioinformatic tools were used for functional and pathway analysis. Moreover, the top ten genes in each comparison group were verified the expression and survival analysis via GEPIA. We found 150 up-regulated and 45 down-regulated genes in NSCLC compared to normal tissues. Many immnunoglobulin-related genes e.g., IGHV4-4, IGHV5-10-1, IGHV4-31, IGHV4-61, and IGHV1-69D were significantly up-regulated. 22 genes were up-regulated, and five genes were down-regulated in late-stage compared to early-stage NSCLC. The top five DEGs genes were KRT6B, SPRR1A, KRT13, KRT6A and KRT5. Keratin 6B (KRT6B) was the most significantly increased gene in the late-stage NSCLC. From GEPIA analysis, we concluded that IGHV4-31 and IGKV1-9 might be used as diagnostic biomarkers, while KRT6B and KRT6A might be used as prognostic biomarkers. However, further clinical validation is needed.

Hybrid Methods for Optimisation of Weights in Spatial Multi-Criteria Evaluation Decision for Fire Risk and Hazard

The challenge for everyone involved in preserving the ecosystem is to find creative ways to protect and restore the remaining ecosystems while accommodating and enhancing the country social and economic well-being. Frequent fires of anthropogenic origin have been affecting the ecosystems in many countries adversely. Hence adopting ways of decision making such as Multicriteria Decision Making (MCDM) is appropriate since it will enhance the evaluation and analysis of fire risk and hazard of the ecosystem. In this paper, fire risk and hazard data from the West Gonja area of Ghana were used in some of the methods (Analytical Hierarchy Process, Compromise Programming, and Grey Relational Analysis (GRA) for MCDM evaluation and analysis to determine the optimal weight method for fire risk and hazard. Ranking of the land cover types was carried out using; Fire Hazard, Fire Fighting Capacity and Response Risk Criteria. Pairwise comparison under Analytic Hierarchy Process (AHP) was used to determine the weight of the various criteria. Weights for sub-criteria were also obtained by the pairwise comparison method. The results were optimised using GRA and Compromise Programming (CP). The results from each method, hybrid GRA and CP, were compared and it was established that all methods were satisfactory in terms of optimisation of weight. The most optimal method for spatial multicriteria evaluation was the hybrid GRA method. Thus, a hybrid AHP and GRA method is more effective method for ranking alternatives in MCDM than the hybrid AHP and CP method.

Utilizing the Analytic Hierarchy Process in Improving Performances of Blind Judo

Identifying, structuring, and racking the most important factors related to improving athletes’ performances could pave the way for improve training system. The purpose of this study was to identify the relative importance factors to improve performance of the of judo athletes with visual impairments, including blindness by using the Analytic Hierarchy Process (AHP). After reviewing the literature, the relative importance of factors affecting performance of the blind judo was selected. A group of expert reviewed the first draft of the questionnaires, and then finally selected performance factors were classified into the major categories of techniques, physical fitness, and psychological categories. Later, a pre-selected experts group was asked to review the final version of questionnaire and confirm the priories of performance factors. The order of priority was determined by performing pairwise comparisons using Expert Choice 2000. Results indicated that “grappling” (.303) and “throwing” (.234) were the most important lower hierarchy factors for blind judo skills. In addition, the most important physical factors affecting performance were “muscular strength and endurance” (.238). Further, among other psychological factors “competitive anxiety” (.393) was important factor that affects performance. It is important to offer psychological skills training to reduce anxiety of judo athletes with visual impairments and blindness, so they can compete in their optimal states. These findings offer insights into what should be considered when determining factors to improve performance of judo athletes with visual impairments and blindness.

Improving Similarity Search Using Clustered Data

This paper presents a method for improving object search accuracy using a deep learning model. A major limitation to provide accurate similarity with deep learning is the requirement of huge amount of data for training pairwise similarity scores (metrics), which is impractical to collect. Thus, similarity scores are usually trained with a relatively small dataset, which comes from a different domain, causing limited accuracy on measuring similarity. For this reason, this paper proposes a deep learning model that can be trained with a significantly small amount of data, a clustered data which of each cluster contains a set of visually similar images. In order to measure similarity distance with the proposed method, visual features of two images are extracted from intermediate layers of a convolutional neural network with various pooling methods, and the network is trained with pairwise similarity scores which is defined zero for images in identical cluster. The proposed method outperforms the state-of-the-art object similarity scoring techniques on evaluation for finding exact items. The proposed method achieves 86.5% of accuracy compared to the accuracy of the state-of-the-art technique, which is 59.9%. That is, an exact item can be found among four retrieved images with an accuracy of 86.5%, and the rest can possibly be similar products more than the accuracy. Therefore, the proposed method can greatly reduce the amount of training data with an order of magnitude as well as providing a reliable similarity metric.

A Construction Management Tool: Determining a Project Schedule Typical Behaviors Using Cluster Analysis

Delays in the construction industry are a global phenomenon. Many construction projects experience extensive delays exceeding the initially estimated completion time. The main purpose of this study is to identify construction projects typical behaviors in order to develop a prognosis and management tool. Being able to know a construction projects schedule tendency will enable evidence-based decision-making to allow resolutions to be made before delays occur. This study presents an innovative approach that uses Cluster Analysis Method to support predictions during Earned Value Analyses. A clustering analysis was used to predict future scheduling, Earned Value Management (EVM), and Earned Schedule (ES) principal Indexes behaviors in construction projects. The analysis was made using a database with 90 different construction projects. It was validated with additional data extracted from literature and with another 15 contrasting projects. For all projects, planned and executed schedules were collected and the EVM and ES principal indexes were calculated. A complete linkage classification method was used. In this way, the cluster analysis made considers that the distance (or similarity) between two clusters must be measured by its most disparate elements, i.e. that the distance is given by the maximum span among its components. Finally, through the use of EVM and ES Indexes and Tukey and Fisher Pairwise Comparisons, the statistical dissimilarity was verified and four clusters were obtained. It can be said that construction projects show an average delay of 35% of its planned completion time. Furthermore, four typical behaviors were found and for each of the obtained clusters, the interim milestones and the necessary rhythms of construction were identified. In general, detected typical behaviors are: (1) Projects that perform a 5% of work advance in the first two tenths and maintain a constant rhythm until completion (greater than 10% for each remaining tenth), being able to finish on the initially estimated time. (2) Projects that start with an adequate construction rate but suffer minor delays culminating with a total delay of almost 27% of the planned time. (3) Projects which start with a performance below the planned rate and end up with an average delay of 64%, and (4) projects that begin with a poor performance, suffer great delays and end up with an average delay of a 120% of the planned completion time. The obtained clusters compose a tool to identify the behavior of new construction projects by comparing their current work performance to the validated database, thus allowing the correction of initial estimations towards more accurate completion schedules.

Computing the Similarity and the Diversity in the Species Based on Cronobacter Genome

The purpose of computing the similarity and the diversity in the species is to trace the process of evolution and to find the relationship between the species and discover the unique, the special, the common and the universal proteins. The proteins of the whole genome of 40 species are compared with the cronobacter genome which is used as reference genome. More than 3 billion pairwise alignments are performed using blastp. Several findings are introduced in this study, for example, we found 172 proteins in cronobacter genome which have insignificant hits in other species, 116 significant proteins in the all tested species with very high score value and 129 common proteins in the plants but have insignificant hits in mammals, birds, fishes, and insects.

CompPSA: A Component-Based Pairwise RNA Secondary Structure Alignment Algorithm

The biological function of an RNA molecule depends on its structure. The objective of the alignment is finding the homology between two or more RNA secondary structures. Knowing the common functionalities between two RNA structures allows a better understanding and a discovery of other relationships between them. Besides, identifying non-coding RNAs -that is not translated into a protein- is a popular application in which RNA structural alignment is the first step A few methods for RNA structure-to-structure alignment have been developed. Most of these methods are partial structure-to-structure, sequence-to-structure, or structure-to-sequence alignment. Less attention is given in the literature to the use of efficient RNA structure representation and the structure-to-structure alignment methods are lacking. In this paper, we introduce an O(N2) Component-based Pairwise RNA Structure Alignment (CompPSA) algorithm, where structures are given as a component-based representation and where N is the maximum number of components in the two structures. The proposed algorithm compares the two RNA secondary structures based on their weighted component features rather than on their base-pair details. Extensive experiments are conducted illustrating the efficiency of the CompPSA algorithm when compared to other approaches and on different real and simulated datasets. The CompPSA algorithm shows an accurate similarity measure between components. The algorithm gives the flexibility for the user to align the two RNA structures based on their weighted features (position, full length, and/or stem length). Moreover, the algorithm proves scalability and efficiency in time and memory performance.

Genetic Diversity Based Population Study of Freshwater Mud Eel (Monopterus cuchia) in Bangladesh

As genetic diversity is most important for existing, breeding and production of any fish; this study was undertaken for investigating genetic diversity of freshwater mud eel, Monopterus cuchia at population level where three ecological populations such as flooded area of Sylhet (P1), open water of Moulvibazar (P2) and open water of Sunamganj (P3) districts of Bangladesh were considered. Four arbitrary RAPD primers (OPB-12, C0-4, B-03 and OPB-08) were screened and RAPD banding patterns were analyzed among the populations considering 15 individuals of each population. In total 174, 138 and 149 bands were detected in the populations of P1, P2 and P3 respectively; however, each primer revealed less number of bands in each population. 100% polymorphic loci were recorded in P2 and P3 whereas only one monomorphic locus was observed in P1, recorded 97.5% polymorphism. Different genetic parameters such as inter-individual pairwise similarity, genetic distance, Nei genetic similarity, linkage distances, cluster analysis and allelic information, etc. were considered for measuring genetic diversity. The average inter-individual pairwise similarity was recorded 2.98, 1.47 and 1.35 in P1, P2 and P3 respectively. Considering genetic distance analysis, the highest distance 1 was recorded in P2 and P3 and the lowest genetic distance 0.444 was found in P2. The average Nei genetic similarity was observed 0.19, 0.16 and 0.13 in P1, P2 and P3, respectively; however, the average linkage distance was recorded 24.92, 17.14 and 15.28 in P1, P3 and P2 respectively. Based on linkage distance, genetic clusters were generated in three populations where 6 clades and 7 clusters were found in P1, 3 clades and 5 clusters were observed in P2 and 4 clades and 7 clusters were detected in P3. In addition, allelic information was observed where the frequency of p and q alleles were observed 0.093 and 0.907 in P1, 0.076 and 0.924 in P2, 0.074 and 0.926 in P3 respectively. The average gene diversity was observed highest in P2 (0.132) followed by P3 (0.131) and P1 (0.121) respectively.

Establishing Pairwise Keys Using Key Predistribution Schemes for Sensor Networks

Designing cost-efficient, secure network protocols for Wireless Sensor Networks (WSNs) is a challenging problem because sensors are resource-limited wireless devices. Security services such as authentication and improved pairwise key establishment are critical to high efficient networks with sensor nodes. For sensor nodes to correspond securely with each other efficiently, usage of cryptographic techniques is necessary. In this paper, two key predistribution schemes that enable a mobile sink to establish a secure data-communication link, on the fly, with any sensor nodes. The intermediate nodes along the path to the sink are able to verify the authenticity and integrity of the incoming packets using a predicted value of the key generated by the sender’s essential power. The proposed schemes are based on the pairwise key with the mobile sink, our analytical results clearly show that our schemes perform better in terms of network resilience to node capture than existing schemes if used in wireless sensor networks with mobile sinks.

Performance Evaluation of Cooperative Diversity in Flat Fading Channel with Error Control Coding

Cooperative communication provides transmit diversity, even when, due to size constraints, mobile units cannot accommodate multiple antennas. A versatile cooperation method called coded cooperation has been developed, in which cooperation is implemented through channel coding with a view to controlling the errors inherent in wireless communication. In this work we evaluate the performance of coded cooperation in flat Rayleigh fading environment using a concept known as the pair wise error probability (PEP). We derive the PEP for a flat fading scenario in coded cooperation and then compare with the signal-to-noise ratio of the users in the network. Results show that an increase in the SNR leads to a decrease in the PEP. We also carried out simulations to validate the result.

Remote Sensing, GIS, and AHP for Assessing Physical Vulnerability to Tsunami Hazard

Remote sensing image processing, spatial data analysis through GIS approach, and analytical hierarchy process were introduced in this study for assessing the vulnerability area and inundation area due to tsunami hazard in the area of Rikuzentakata, Iwate Prefecture, Japan. Appropriate input parameters were derived from GSI DEM data, ALOS AVNIR-2, and field data. We used the parameters of elevation, slope, shoreline distance, and vegetation density. Five classes of vulnerability were defined and weighted via pairwise comparison matrix. The assessment results described that 14.35km2 of the study area was under tsunami vulnerability zone. Inundation areas are those of high and slightly high vulnerability. The farthest area reached by a tsunami was about 7.50km from the shoreline and shows that rivers act as flooding strips that transport tsunami waves into the hinterland. This study can be used for determining a priority for land-use planning in the scope of tsunami hazard risk management.

A Pairwise-Gaussian-Merging Approach: Towards Genome Segmentation for Copy Number Analysis

Segmentation, filtering out of measurement errors and identification of breakpoints are integral parts of any analysis of microarray data for the detection of copy number variation (CNV). Existing algorithms designed for these tasks have had some successes in the past, but they tend to be O(N2) in either computation time or memory requirement, or both, and the rapid advance of microarray resolution has practically rendered such algorithms useless. Here we propose an algorithm, SAD, that is much faster and much less thirsty for memory – O(N) in both computation time and memory requirement -- and offers higher accuracy. The two key ingredients of SAD are the fundamental assumption in statistics that measurement errors are normally distributed and the mathematical relation that the product of two Gaussians is another Gaussian (function). We have produced a computer program for analyzing CNV based on SAD. In addition to being fast and small it offers two important features: quantitative statistics for predictions and, with only two user-decided parameters, ease of use. Its speed shows little dependence on genomic profile. Running on an average modern computer, it completes CNV analyses for a 262 thousand-probe array in ~1 second and a 1.8 million-probe array in 9 seconds

Modeling and Optimization of Abrasive Waterjet Parameters using Regression Analysis

Abrasive waterjet is a novel machining process capable of processing wide range of hard-to-machine materials. This research addresses modeling and optimization of the process parameters for this machining technique. To model the process a set of experimental data has been used to evaluate the effects of various parameter settings in cutting 6063-T6 aluminum alloy. The process variables considered here include nozzle diameter, jet traverse rate, jet pressure and abrasive flow rate. Depth of cut, as one of the most important output characteristics, has been evaluated based on different parameter settings. The Taguchi method and regression modeling are used in order to establish the relationships between input and output parameters. The adequacy of the model is evaluated using analysis of variance (ANOVA) technique. The pairwise effects of process parameters settings on process response outputs are also shown graphically. The proposed model is then embedded into a Simulated Annealing algorithm to optimize the process parameters. The optimization is carried out for any desired values of depth of cut. The objective is to determine proper levels of process parameters in order to obtain a certain level of depth of cut. Computational results demonstrate that the proposed solution procedure is quite effective in solving such multi-variable problems.

Different Multimedia Presentation Types and Students' Interpretation Achievement

The main purpose of the study was to determine whether students- interpretation achievement differed with the use of various multimedia presentation types. Four groups of students, text only (T), audio only (A), text and audio (TA), text and image (TI), were arranged and they were presented the same story via different types of multimedia presentations. Inference achievement was measured by a critical thinking inference test. Higher mean scores for the TA group compared to the other three groups were found. Also when compared pairwise, interpretation achievement of the TA group differed significantly from scores of the T and TI groups. These differences were interpreted with the increased cognitive load. Increased cognitive load for the TA group may have invited students to put more effort into comprehending the text, thus resulting in better test scores. Findings of the study can be seen as a sign of the importance of learning situations and learning outcomes in multimedia-supported learning environments and may have practical benefits for instructional designers.

A Formulation of the Latent Class Vector Model for Pairwise Data

In this research, a latent class vector model for pairwise data is formulated. As compared to the basic vector model, this model yields consistent estimates of the parameters since the number of parameters to be estimated does not increase with the number of subjects. The result of the analysis reveals that the model was stable and could classify each subject to the latent classes representing the typical scales used by these subjects.

Optimal and Generalized Multiple Descriptions Image Coding Transform in the Wavelet Domain

In this paper we propose a Multiple Description Image Coding(MDIC) scheme to generate two compressed and balanced rates descriptions in the wavelet domain (Daubechies biorthogonal (9, 7) wavelet) using pairwise correlating transform optimal and application method for Generalized Multiple Description Coding (GMDC) to image coding in the wavelet domain. The GMDC produces statistically correlated streams such that lost streams can be estimated from the received data. Our performance test shown that the proposed method gives more improvement and good quality of the reconstructed image when the wavelet coefficients are normalized by Gaussian Scale Mixture (GSM) model then the Gaussian one ,.

Protein-Protein Interaction Detection Based on Substring Sensitivity Measure

Detecting protein-protein interactions is a central problem in computational biology and aberrant such interactions may have implicated in a number of neurological disorders. As a result, the prediction of protein-protein interactions has recently received considerable attention from biologist around the globe. Computational tools that are capable of effectively identifying protein-protein interactions are much needed. In this paper, we propose a method to detect protein-protein interaction based on substring similarity measure. Two protein sequences may interact by the mean of the similarities of the substrings they contain. When applied on the currently available protein-protein interaction data for the yeast Saccharomyces cerevisiae, the proposed method delivered reasonable improvement over the existing ones.

On Reversal and Transposition Medians

During the last years, the genomes of more and more species have been sequenced, providing data for phylogenetic recon- struction based on genome rearrangement measures. A main task in all phylogenetic reconstruction algorithms is to solve the median of three problem. Although this problem is NP-hard even for the sim- plest distance measures, there are exact algorithms for the breakpoint median and the reversal median that are fast enough for practical use. In this paper, this approach is extended to the transposition median as well as to the weighted reversal and transposition median. Although there is no exact polynomial algorithm known even for the pairwise distances, we will show that it is in most cases possible to solve these problems exactly within reasonable time by using a branch and bound algorithm.

Codebook Generation for Vector Quantization on Orthogonal Polynomials based Transform Coding

In this paper, a new algorithm for generating codebook is proposed for vector quantization (VQ) in image coding. The significant features of the training image vectors are extracted by using the proposed Orthogonal Polynomials based transformation. We propose to generate the codebook by partitioning these feature vectors into a binary tree. Each feature vector at a non-terminal node of the binary tree is directed to one of the two descendants by comparing a single feature associated with that node to a threshold. The binary tree codebook is used for encoding and decoding the feature vectors. In the decoding process the feature vectors are subjected to inverse transformation with the help of basis functions of the proposed Orthogonal Polynomials based transformation to get back the approximated input image training vectors. The results of the proposed coding are compared with the VQ using Discrete Cosine Transform (DCT) and Pairwise Nearest Neighbor (PNN) algorithm. The new algorithm results in a considerable reduction in computation time and provides better reconstructed picture quality.

Multiple Sequence Alignment Using Three- Dimensional Fragments

Background: Dialign is a DNA/Protein alignment tool for performing pairwise and multiple pairwise alignments through the comparison of gap-free segments (fragments) between sequence pairs. An alignment of two sequences is a chain of fragments, i.e local gap-free pairwise alignments, with the highest total score. METHOD: A new approach is defined in this article which relies on the concept of using three-dimensional fragments – i.e. local threeway alignments -- in the alignment process instead of twodimensional ones. These three-dimensional fragments are gap-free alignments constituting of equal-length segments belonging to three distinct sequences. RESULTS: The obtained results showed good improvments over the performance of DIALIGN.