WebAppShield: An Approach Exploiting Machine Learning to Detect SQLi Attacks in an Application Layer in Run-Time

In recent years, SQL injection (SQLi) attacks have become prevalent against web applications. They compromise network security and user data, leading to considerable losses of money and data every year. This paper uses machine learning classification algorithms to classify filtered login inputs as "SQLi" or "Non-SQLi", thus increasing the reliability and accuracy of deciding whether an operation is an attack or a valid operation. A web application is developed for auto-generated data replication to provide a twin of the targeted data structure. A shield against SQLi attacks (WebAppShield) has been developed that verifies all users and prevents attackers from entering or accessing the database, admitting only inputs that the machine learning module predicts as "Non-SQLi". A special login form has been developed with a dedicated instance of data validation; this verification process secures the web application from its early stages. The system has been tested and validated, and up to 99% of SQLi attacks have been prevented.

Fuzzy Uncertainty Theory for Stealth Fighter Aircraft Selection in Entropic Fuzzy TOPSIS Decision Analysis Process

The purpose of this paper is to present fuzzy TOPSIS in an entropic fuzzy environment. Due to the ambiguous concepts often represented in decision data, exact values are insufficient to model real-life situations. In this paper, the rating of each alternative is defined in fuzzy linguistic terms, which can be expressed with triangular fuzzy numbers. The weight of each criterion is then derived from the decision matrix using the entropy weighting method. Next, a vertex method is proposed to calculate the distance between two triangular fuzzy numbers. According to the TOPSIS concept, a closeness coefficient is defined to determine the ranking order of all alternatives by simultaneously calculating the distances to both the fuzzy positive-ideal solution (FPIS) and the fuzzy negative-ideal solution (FNIS). Finally, an illustrative example of selecting stealth fighter aircraft is shown at the end of this article to highlight the procedure of the proposed method. Correlation analysis and validation analysis using TOPSIS, WSM, and WPM methods were performed to compare the ranking order of the alternatives.
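As a minimal sketch of two quantities this abstract names, the vertex-method distance and the closeness coefficient, the Python below assumes triangular fuzzy numbers stored as (l, m, u) tuples and ratings that are already weighted and normalized. The example ratings and the choice of (1, 1, 1) and (0, 0, 0) as FPIS and FNIS are illustrative assumptions, not values taken from the paper.

```python
import math

# A triangular fuzzy number (TFN) is a tuple (l, m, u) with l <= m <= u.

def vertex_distance(a, b):
    """Vertex-method distance between two triangular fuzzy numbers."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / 3.0)

def closeness_coefficients(ratings, fpis, fnis):
    """Closeness coefficient of each alternative.

    ratings: one row per alternative, one weighted normalized TFN per criterion.
    fpis, fnis: one TFN per criterion (positive- and negative-ideal solutions).
    """
    scores = []
    for row in ratings:
        d_plus = sum(vertex_distance(r, p) for r, p in zip(row, fpis))
        d_minus = sum(vertex_distance(r, n) for r, n in zip(row, fnis))
        scores.append(d_minus / (d_plus + d_minus))
    return scores

# Hypothetical ratings for two alternatives over two criteria.
ratings = [
    [(0.7, 0.9, 1.0), (0.5, 0.7, 0.9)],
    [(0.3, 0.5, 0.7), (0.1, 0.3, 0.5)],
]
fpis = [(1.0, 1.0, 1.0)] * 2  # assumed ideal, not from the paper
fnis = [(0.0, 0.0, 0.0)] * 2  # assumed anti-ideal, not from the paper

cc = closeness_coefficients(ratings, fpis, fnis)
# Indices of alternatives, best first (larger coefficient is better).
ranking = sorted(range(len(cc)), key=lambda i: cc[i], reverse=True)
print(cc, ranking)
```

An alternative that is closer to the FPIS and farther from the FNIS receives a coefficient nearer to 1, which is what the ranking step relies on.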

Microservices-Based Provisioning and Control of Network Services for Heterogeneous Networks

Microservices architecture has been widely embraced for rapid, frequent, and reliable delivery of complex applications. It enables organizations to evolve their technology stack in various domains. Today, the networking domain is flooded with a plethora of devices and software solutions that address functionalities ranging from elementary operations, viz., switching, routing, and firewalling, to intelligent services based on complex analytics and insights. In this paper, we attempt to bring a microservices-based approach to the agile and adaptive delivery of network services for any underlying networking technology. We discuss the life cycle management of each individual microservice and a distributed control approach, with emphasis on dynamic provisioning, management, and orchestration in an automated fashion, which can provide seamless operations in large-scale networks. We have validated the system in a lab testbed comprising Traditional/Legacy and Software Defined Wireless Local Area Networks.

Additive Friction Stir Manufacturing Process: Interest in Understanding Thermal Phenomena and Numerical Modeling of the Temperature Rise Phase

Additive Friction Stir Manufacturing, or AFSM, is a new industrial process that follows the emergence of friction-based processes. The AFSM process is a solid-state additive process using the energy produced by the friction at the interface between a rotating non-consumable tool and a substrate. Friction depends on various parameters like axial force, rotation speed or friction coefficient. The feeder material is a metallic rod that flows through a hole in the tool. There is still a lack in understanding of the physical phenomena taking place during the process. This research aims at a better AFSM process understanding and implementation, thanks to numerical simulation and experimental validation performed on a prototype effector. Such an approach is considered a promising way for studying the influence of the process parameters and to finally identify a process window that seems relevant. The deposition of material through the AFSM process takes place in several phases. In chronological order these phases are the docking phase, the dwell time phase, the deposition phase, and the removal phase. The present work focuses on the dwell time phase that enables the temperature rise of the system due to pure friction. An analytic modeling of heat generation based on friction considers as main parameters the rotational speed and the contact pressure. Another parameter considered influential is the friction coefficient assumed to be variable, due to the self-lubrication of the system with the rise in temperature or the materials in contact roughness smoothing over time. This study proposes through a numerical modeling followed by an experimental validation to question the influence of the various input parameters on the dwell time phase. Rotation speed, temperature, spindle torque and axial force are the main monitored parameters during experimentations and serve as reference data for the calibration of the numerical model. 
This research shows that the geometry of the tool as well as fluctuations of the input parameters like axial force and rotational speed are very influential on the temperature reached and/or the time required to reach the targeted temperature. The main outcome is the prediction of a process window which is a key result for a more efficient process implementation.
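To make the dependence on rotational speed, contact pressure and friction coefficient concrete, the sketch below integrates Coulomb friction over an annular tool face. It assumes a constant friction coefficient and uniform contact pressure, which is a textbook simplification and not the paper's own model (the abstract treats the friction coefficient as variable). The function name and parameters are hypothetical.

```python
import math

def friction_heat_power(mu, pressure_pa, rpm, r_outer_m, r_inner_m=0.0):
    """Frictional heat rate in watts over an annular contact face.

    Integrates dQ = omega * r * mu * p * 2*pi*r dr from r_inner to r_outer,
    assuming constant mu and uniform pressure p (simplifying assumptions).
    """
    omega = 2.0 * math.pi * rpm / 60.0  # rotational speed in rad/s
    return (2.0 / 3.0) * math.pi * mu * pressure_pa * omega * (
        r_outer_m ** 3 - r_inner_m ** 3
    )

# Hypothetical operating point: the values are for illustration only.
q = friction_heat_power(mu=0.3, pressure_pa=1.0e6, rpm=60.0, r_outer_m=0.01)
print(f"heat rate: {q:.3f} W")
```

Under these assumptions the heat rate is linear in rotational speed, pressure and friction coefficient, and cubic in tool radius, which is consistent with the abstract's remark that tool geometry is very influential.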

Translation, Cultural Adaptation and Validation of the Hungarian Version of Self-Determination Scale

There is a scarcity of validated instruments in Hungarian for the assessment of self-determination related traits and behaviors. To fill this gap, the aim of this study was the translation, cultural adaptation and validation of the Self-Determination Scale (SDS) for the Hungarian population. A total of 4335 adults participated in the study. The mean age of the participants was 27.97 (SD = 9.60). The sample consisted mostly of females; less than 20% were males. Exploratory and Confirmatory Factor Analyses were performed to check and validate the factorial structure, and Cronbach’s alpha was used to examine the reliability of the factors. Our results revealed that the Hungarian version of the SDS has good psychometric properties and is a reliable tool for psychologists who would like to study or assess self-determination traits in their clients. The adapted and validated Hungarian version of the SDS is presented in this paper.
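The reliability statistic mentioned above can be computed directly from item scores. The sketch below uses the standard formula for Cronbach's alpha with sample variances; the response matrix is hypothetical and is not data from the study.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(item_scores[0])
    columns = list(zip(*item_scores))              # one tuple per item
    item_var_sum = sum(variance(col) for col in columns)
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)

# Hypothetical responses: 5 respondents, 3 items on a 1-5 scale.
responses = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
    [1, 2, 2],
]
print(round(cronbach_alpha(responses), 3))
```

Alpha approaches 1 when items vary together across respondents and drops when item scores are unrelated.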

Learning Objects Content Presentation Adaptation Model Considering Students' Learning Styles

Learning styles (LSs) correspond to the individual preferences of a person regarding the modes and forms in which he/she prefers to learn throughout the teaching/learning process. The content presentation of learning objects (LOs) using knowledge about the students’ LSs offers them digital educational resources tailored to their individual learning preferences. In this context, the most relevant characteristics of the LSs along with the most appropriate forms of LOs' content presentation were mapped and associated. Such was performed in order to define the composition of an adaptive model of LO's content presentation considering the LSs, which was called Adaptation of Content Presentation of Learning Objects Considering Learning Styles (ACPLOLS). LO prototypes were created with interfaces that were adapted to students' LSs. These prototypes were based on a model created for validation of the approaches that were used, which were established through experiments with the students. The results of subjective measures of students' emotional responses demonstrated that the ACPLOLS has reached the desired results in relation to the adequacy of the LOs interface, in accordance with the Felder-Silverman LSs Model.

IntelligentLogger: A Heavy-Duty Vehicles Fleet Management System Based on IoT and Smart Prediction Techniques

Both daily and long-term management of a fleet of heavy-duty vehicles and construction machinery is an extremely complicated and hard-to-solve problem. This is mainly due to the diversity of the fleet vehicles and machinery, which concerns not only the vehicle types but also their age and efficiency, as well as the fleet size, which is often of the order of hundreds or even thousands of vehicles and machines. In the present paper we present “InteligentLogger”, a holistic heavy-duty fleet management system covering a wide range of diverse fleet vehicles. It is based on specifically designed hardware and software for automated monitoring of vehicle health status and operational cost, enabling smart maintenance. InteligentLogger is characterized by high adaptability that permits it to be tailored to practically any heavy-duty vehicle or machine (of different technologies, modern or legacy, and of dissimilar uses). Contrary to conventional logistic systems, which are characterized by high operational costs and frequent errors, InteligentLogger provides a cost-effective and reliable integrated solution for the e-management and e-maintenance of the fleet members.
The InteligentLogger system offers the following unique features that support successful fleet management of heavy-duty vehicles and machinery: (a) Recording and storage of operating data of motorized construction machinery, in a reliable way and in real time, using specifically designed Internet of Things (IoT) sensor nodes that communicate through the available network infrastructures, e.g., 3G/LTE; (b) Use on any machine, regardless of its age, in a universal way; (c) Flexibility and complete customization in terms of data collection and integration with 3rd party systems, as well as in terms of processing and drawing conclusions; (d) Validation, error reporting and correction, as well as update of the system’s database; (e) Artificial intelligence (AI) software for processing information in real time, identifying out-of-normal behavior and generating alerts; (f) A MicroStrategy-based enterprise BI for modeling information and producing reports, dashboards, and alerts focusing on optimal usage of vehicles and machinery, as well as maintenance and scrapping policies; (g) Modular structure that allows low implementation costs in the basic fully functional version, but offers scalability without requiring a complete system upgrade.

Research Design for Developing and Validating Ice-Hockey Team Diagnostics Scale

In the modern world, ice-hockey (and in a broader sense, team sports) is becoming an increasingly popular field of entertainment. Although the main element is most likely perceived as the show itself, winning is an indispensable part of the successful operation of any sports team. In this paper, the author creates a research design for developing and validating an ice-hockey team-focused diagnostics scale, which enables researchers and practitioners to identify the problems associated with underperforming teams. The construction of the scale starts with personal interviews with experts in the field, carefully chosen from the Hungarian ice-hockey sector. Based on the interviews, the author is in a position to create the categories and the relevant items for the scale. Once the scale is constructed, the next step is the validation process on a Hungarian sample. Data for validation are acquired through the licensed database of the Hungarian Ice-Hockey Federation, which covers Hungarian ice-hockey coaches and players. The Ice-Hockey Team Diagnostics Scale is intended to guide practitioners in understanding both effective and underperforming teamwork.

Participatory Financial Inclusion Hypothesis: A Preliminary Empirical Validation Using Survey Design

In Nigeria, enormous efforts/resources had, over the years, been expended on promoting financial inclusion (FI); however, it is seemingly discouraging that many of its self-declared targets on FI remained unachieved, especially amongst the Rural Dwellers and Actors in the Informal Sectors (RDAIS). Expectedly, many reasons had been earmarked for these failures: low literacy level, huge informal/rural sectors etc. This study posits that in spite of these truly-debilitating factors, these FI policy failures could have been avoided or mitigated if the principles of active and better-managed citizens’ participation had been strictly followed in the (re)design/implementation of its FI policies. In other words, in a bid to mitigate the prevalent financial exclusion (FE) in Nigeria, this study hypothesizes the significant positive impact of involving the RDAIS in policy-wide decision making in the FI domain, backed by a preliminary empirical validation. Also, the study introduces the RDAIS-focused Participatory Financial Inclusion Policy (PFIP) as a major FI policy regeneration/improvement tool. The three categories of respondents that served as research subjects are FI experts in Nigeria (n = 72), RDAIS from the very rural/remote village of Unguwar Dogo in Northern Nigeria (n = 43) and RDAIS from another rural village of Sekere (n = 56) in the Southern region of Nigeria. Using survey design (5-point Likert scale questionnaires), random/stratified sampling, and descriptive/inferential statistics, the study often recorded independent consensus (amongst these three categories of respondents) that RDAIS’s active participation in iterative FI policy initiation, (re)design, implementation, (re)evaluation could indeed give improved FI outcomes. However, few questionnaire items also recorded divergent opinions and various statistically (in)significant differences on the mean scores of these three categories. 
The PFIP (or any customized version of it) should then be carefully integrated into the NFIS of Nigeria (and possibly in the NFIS of other developing countries) to truly/fully provide FI policy integration for these excluded RDAIS and arrest the prevalence of FE.

An Approach for Coagulant Dosage Optimization Using Soft Jar Test: A Case Study of Bangkhen Water Treatment Plant

Coagulation, which uses alum and poly aluminum chloride (PACL), is the most important process in a water treatment plant. Determining the dosage of alum and PACL is therefore the most important factor to be prescribed. This research applies an artificial neural network (ANN) trained with the Levenberg–Marquardt algorithm to create a mathematical model (Soft Jar Test) that predicts the dose of coagulation chemicals such as alum and PACL. The input data consist of the turbidity, pH, alkalinity, conductivity, and oxygen consumption (OC) recorded at the Bangkhen Water Treatment Plant (BKWTP), which operates under the Metropolitan Waterworks Authority of Thailand. The data were collected from 1 January 2019 to 31 December 2019 in order to cover the changing seasons of Thailand. The input data of the ANN are divided into three groups: training set, test set, and validation set. The coefficient of determination and the mean absolute error are 0.73 and 3.18 for the alum model and 0.59 and 3.21 for the PACL model, respectively.
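The two reported metrics can be computed as in the sketch below. The dose values are hypothetical, and the unit of the absolute error follows the dose unit, which this abstract does not state.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def mean_absolute_error(y_true, y_pred):
    """Average magnitude of the prediction error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical measured vs. predicted coagulant doses.
measured = [20.0, 25.0, 30.0, 35.0]
predicted = [22.0, 24.0, 29.0, 37.0]
print(r_squared(measured, predicted), mean_absolute_error(measured, predicted))
```

A model that always predicts the mean dose scores 0 on the first metric, so the reported 0.73 and 0.59 indicate how much of the dose variance each model explains beyond that baseline.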

Classification of Extreme Ground-Level Ozone Based on Generalized Extreme Value Model for Air Monitoring Station

Higher ground-level ozone (GLO) concentration adversely affects human health, vegetation, and activities in the ecosystem. In Malaysia, most analyses of GLO concentration are carried out using the average value of GLO concentration, which refers to the centre of the distribution, to make a prediction or estimation. However, analysis that focuses on the higher or extreme values of GLO concentration is rarely explored. Hence, the objective of this study is to classify the tail behaviour of GLO using the generalized extreme value (GEV) distribution and to estimate the return level using the corresponding model (Gumbel, Weibull, or Frechet) of the GEV distribution. The results show that the Weibull distribution, which is also known as a short tail distribution and is considered to have less extreme behaviour, is the best-fitted distribution for four selected air monitoring stations in Peninsular Malaysia, namely Larkin, Pelabuhan Kelang, Shah Alam, and Tanjung Malim, while the Gumbel distribution, which is considered a medium tail distribution, is the best-fitted distribution for Nilai station. The return level of GLO concentration at Shah Alam station is comparatively higher than at other stations. Overall, return levels increase with increasing return periods, but the increment depends on the type of tail of the GEV distribution. We conduct this study by using the maximum likelihood estimation (MLE) method to estimate the parameters at four selected stations in Peninsular Malaysia. Next, the fit of the block maxima series to the GEV distribution is validated using a probability plot, a quantile plot and a likelihood ratio test. A profile likelihood confidence interval is used to verify the type of GEV distribution. These results are important as a guide for early notification of future extreme ozone events.
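The return level and tail classification described above follow from the three GEV parameters. The sketch below assumes the common parameterization with location mu, scale sigma and shape xi, where xi > 0 is the Frechet type, xi < 0 the Weibull type and xi = 0 the Gumbel type; the parameter values are hypothetical and not estimates from the study.

```python
import math

def gev_return_level(mu, sigma, xi, return_period):
    """Level exceeded on average once every `return_period` blocks."""
    p = 1.0 / return_period
    y = -math.log(1.0 - p)
    if abs(xi) < 1e-12:                       # Gumbel limit of the GEV family
        return mu - sigma * math.log(y)
    return mu - (sigma / xi) * (1.0 - y ** (-xi))

def tail_type(xi, tol=1e-12):
    """Classify the tail from the sign of the shape parameter."""
    if xi > tol:
        return "Frechet"
    if xi < -tol:
        return "Weibull"
    return "Gumbel"

# Hypothetical fitted parameters for one station.
mu, sigma, xi = 0.08, 0.02, -0.2
for period in (10, 50, 100):
    print(period, tail_type(xi), gev_return_level(mu, sigma, xi, period))
```

With a negative shape parameter the return level is bounded above by mu - sigma / xi, which is why the increment with return period flattens for the short-tailed Weibull type.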

Data Analysis Techniques for Predictive Maintenance on Fleet of Heavy-Duty Vehicles

The present study proposes a methodology for the efficient daily management of fleet vehicles and construction machinery. The application covers the area of remote monitoring of heavy-duty vehicles operation parameters, where specific sensor data are stored and examined in order to provide information about the vehicle’s health. The vehicle diagnostics allow the user to inspect whether maintenance tasks need to be performed before a fault occurs. A properly designed machine learning model is proposed for the detection of two different types of faults through classification. Cross validation is used and the accuracy of the trained model is checked with the confusion matrix.
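As a small illustration of the final checking step, the sketch below builds a confusion matrix and derives accuracy from it. The label names and predictions are hypothetical; the abstract does not name the two fault types or the classifier.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]

def accuracy(matrix):
    """Share of samples on the diagonal of the confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Hypothetical held-out fold with two fault classes.
labels = ["fault_A", "fault_B"]
y_true = ["fault_A", "fault_A", "fault_B", "fault_B"]
y_pred = ["fault_A", "fault_B", "fault_B", "fault_B"]

cm = confusion_matrix(y_true, y_pred, labels)
print(cm, accuracy(cm))
```

In a cross-validation setting this computation would be repeated on each held-out fold, and the off-diagonal cells show which fault type is being confused with the other.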

Solid State Drive End to End Reliability Prediction, Characterization and Control

A flaw or drift from expected operational performance in one component (NAND, PMIC, controller, DRAM, etc.) may affect the reliability of the entire Solid State Drive (SSD) system. Therefore, it is important to ensure the required quality of each individual component through qualification testing specified by standards or user requirements. Qualification testing is time-consuming and comes at a substantial cost for product manufacturers. A highly technical team drawn from all the key stakeholders is embarking on reliability prediction from the beginning of new product development, identifying critical-to-reliability parameters, performing full-blown characterization to embed margin into product reliability, and establishing controls to ensure that product reliability is sustainable in mass production. The paper will discuss a comprehensive development framework that covers the SSD end to end, from design to assembly, in-line inspection and in-line testing, and that is able to predict and validate product reliability at an early stage of new product development. During the design stage, the SSD will go through an intense reliability margin investigation with a focus on assembly process attributes, process equipment control, and in-process metrology, while also taking the forward-looking product roadmap into account. Once these pillars are completed, the next step is to perform process characterization and build up reliability prediction modeling. Next, for the design validation process, reliability prediction, specifically a solder joint simulator, will be established. The SSD tests will be stratified into Non-Operating and Operating tests with a focus on solder joint reliability and connectivity/component latent failures, addressed by prevention through design intervention and containment through the Temperature Cycle Test (TCT). Some of the SSDs will be subjected to physical solder joint analysis, namely Dye and Pry (DP) and Cross Section analysis.
The results will be fed back to the simulation team for any corrective actions required to further improve the design. Once the SSD is validated and proven to work, it will enter the monitor phase, in which the Design for Assembly (DFA) rules will be updated. At this stage, the design changes and the process and equipment parameters are under control. Predictable product reliability early in product development will enable on-time delivery of qualification samples to the customer, will optimize product development validation and the use of development resources, and will avoid forced late investment to bandage end-of-life product failures. Understanding the critical-to-reliability parameters earlier will allow a focus on increasing the product margin, which will increase customer confidence in product reliability.

Methodology for the Multi-Objective Analysis of Data Sets in Freight Delivery

Data flows and the purposes of data reporting differ and depend on business needs. Different parameters are reported and transferred regularly during freight delivery. These business practices form the data set constructed for each time point, which contains all the information required for freight-moving decisions. As a significant amount of these data is used for various purposes, an integrating methodological approach must be developed to respond to the indicated problem. The proposed methodology contains several steps: (1) collecting context data sets and data validation; (2) multi-objective analysis for optimizing freight transfer services. For data validation, the study involves Grubbs outlier analysis, particularly for data cleaning and the identification of the statistical significance of data reporting event cases. The Grubbs test is often used as it examines one extreme value at a time that exceeds the boundaries of the standard normal distribution. In the study area, the test has not been widely applied by authors, except when the Grubbs test for outlier detection was used to identify outliers in fuel consumption data. In the study, the authors applied the method with a confidence level of 99%. For the multi-objective analysis, the authors select the forms of construction of genetic algorithms that have more possibilities to extract the best solution. For freight delivery management, the schemas of genetic algorithms' structure are used as a more effective technique. Accordingly, an adaptable genetic algorithm is applied to describe the process of choosing an effective transportation corridor. In this study, multi-objective genetic algorithm methods are used to optimize the data evaluation and select the appropriate transport corridor.
The authors suggest a methodology for the multi-objective analysis, which evaluates collected context data sets and uses this evaluation to determine a delivery corridor for freight transfer service in the multi-modal transportation network. In the multi-objective analysis, authors include safety components, the number of accidents a year, and freight delivery time in the multi-modal transportation network. The proposed methodology has practical value in the management of multi-modal transportation processes.
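The Grubbs check used for data validation in this abstract can be sketched as follows. The code computes the test statistic for the single most extreme value and the standard critical-value expression; because the Python standard library has no Student t quantile, the caller is assumed to supply `t_crit`, the upper critical value of the t distribution with n - 2 degrees of freedom at significance alpha / (2n), from a table or a statistics package. The fuel consumption readings are hypothetical.

```python
import math
from statistics import mean, stdev

def grubbs_statistic(values):
    """Return (G, index) for the value farthest from the sample mean."""
    m = mean(values)
    s = stdev(values)
    idx = max(range(len(values)), key=lambda i: abs(values[i] - m))
    if s == 0:
        return 0.0, idx  # all values identical: nothing to flag
    return abs(values[idx] - m) / s, idx

def grubbs_critical(n, t_crit):
    """Two-sided critical value; t_crit must be supplied by the caller."""
    return (n - 1) / math.sqrt(n) * math.sqrt(t_crit ** 2 / (n - 2 + t_crit ** 2))

# Hypothetical fuel consumption readings with one suspicious value.
readings = [31.2, 30.8, 31.5, 30.9, 31.1, 31.3, 30.7, 45.0]
g, suspect = grubbs_statistic(readings)
print(f"G = {g:.3f} for reading {readings[suspect]}")
# The value is flagged only if G exceeds grubbs_critical(len(readings), t_crit)
# for a t_crit looked up at the chosen confidence level (99% in the study).
```

Because the test examines one value at a time, it is typically repeated after removing a flagged value until no further outlier is detected.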

An Effort at Improving Reliability of Laboratory Data in Titrimetric Analysis for Zinc Sulphate Tablets Using Validated Spreadsheet Calculators

The requirement for maintaining data integrity in laboratory operations is critical for regulatory compliance. Automation of procedures reduces the incidence of human errors. Quality control laboratories located in low-income economies may face some barriers in attempts to automate their processes. Since data from quality control tests on pharmaceutical products are used in making regulatory decisions, it is important that laboratory reports are accurate and reliable. Zinc Sulphate (ZnSO4) tablets are used in the treatment of diarrhea in the pediatric population, and as an adjunct therapy in COVID-19 regimens. Unfortunately, zinc content in these formulations is determined titrimetrically, which is a manual analytical procedure. The assay for ZnSO4 tablets involves time-consuming steps that contain mathematical formulae prone to calculation errors. To achieve consistency, save costs, and improve data integrity, validated spreadsheets were developed to simplify the two critical steps in the analysis of ZnSO4 tablets: standardization of 0.1M Sodium Edetate (EDTA) solution, and the complexometric titration assay procedure. The assay method in the United States Pharmacopoeia was used to create a process flow for ZnSO4 tablets. For each step in the process, different formulae were input into two spreadsheets to automate calculations. Further checks were created within the automated system to ensure the validity of replicate analysis in titrimetric procedures. Validations were conducted using five data sets of manually computed assay results. The acceptance criteria set for the protocol were met. Significant p-values (p < 0.05, α = 0.05, at 95% Confidence Interval) were obtained from Student’s t-test evaluation of the mean values for manually calculated and spreadsheet results at all levels of the analysis flow. Right-first-time analysis and principles of data integrity were enhanced by use of the validated spreadsheet calculators in titrimetric evaluations of ZnSO4 tablets.
Human errors were minimized in calculations when procedures were automated in quality control laboratories. The assay procedure for the formulation was achieved in a time-efficient manner with greater level of accuracy. This project is expected to promote cost savings for laboratory business models.

Artificial Neural Network-Based Short-Term Load Forecasting for Mymensingh Area of Bangladesh

Electrical load forecasting is considered one of the most indispensable parts of a modern-day electrical power system. To ensure a reliable and efficient supply of electric energy, special emphasis should be put on the predictive feature of electricity supply. Artificial Neural Network-based approaches have emerged as a significant area of interest in electric load forecasting research. This paper proposes an Artificial Neural Network model based on the particle swarm optimization algorithm for improved electric load forecasting for Mymensingh, Bangladesh. The forecasting model is developed and simulated in the MATLAB environment with a large number of training datasets. The model is trained on eight input parameters, including historical load and weather data. The predicted load data are then compared with an available dataset for validation. The proposed neural network model proves to be more reliable in terms of day-wise load forecasting for Mymensingh, Bangladesh.

Early Melt Season Variability of Fast Ice Degradation Due to Small Arctic Riverine Heat Fluxes

In order to determine the importance of small-system riverine heat flux on regional landfast sea ice breakup, our study explores the annual spring freshet of the Sagavanirktok River from 2014-2019. Seasonal heat cycling ultimately serves as the driving mechanism behind the freshet; however, as an emerging area of study, the extent to which inland thermodynamics influence coastal tundra geomorphology and connected landfast sea ice has not been extensively investigated in relation to small-scale Arctic river systems. The Sagavanirktok River is a small-to-midsized river system that flows south-to-north on the Alaskan North Slope from the Brooks mountain range to the Beaufort Sea at Prudhoe Bay. Seasonal warming in the spring rapidly melts snow and ice in a northwards progression from the Brooks Range and transitional tundra highlands towards the coast and when coupled with seasonal precipitation, results in a pulsed freshet that propagates through the Sagavanirktok River. The concentrated presence of newly exposed vegetation in the transitional tundra region due to spring melting results in higher absorption of solar radiation due to a lower albedo relative to snow-covered tundra and/or landfast sea ice. This results in spring flood runoff that advances over impermeable early-season permafrost soils with elevated temperatures relative to landfast sea ice and sub-ice flow. We examine the extent to which interannual temporal variability influences the onset and magnitude of river discharge by analyzing field measurements from the United States Geological Survey (USGS) river and meteorological observation sites. Rapid influx of heat to the Arctic Ocean via riverine systems results in a noticeable decay of landfast sea ice independent of ice breakup seaward of the shear zone. 
Utilizing MODIS imagery from NASA’s Terra satellite, interannual variability of river discharge is visualized, allowing for optical validation that the discharge flow is interacting with landfast sea ice. Thermal erosion experienced by sediment fast ice at the arrival of warm overflow preconditions the ice regime for rapid thawing. We investigate the extent to which interannual heat flux from the Sagavanirktok River’s freshet significantly influences the onset of local landfast sea ice breakup. The early-season warming of atmospheric temperatures is evidenced by the presence of storms which introduce liquid, rather than frozen, precipitation into the system. The resultant decreased albedo of the transitional tundra supports the positive relationship between early-season precipitation events, inland thermodynamic cycling, and degradation of landfast sea ice. Early removal of landfast sea ice increases coastal erosion in these regions and has implications for coastline geomorphology which stress industrial, ecological, and humanitarian infrastructure.

Blockchain’s Feasibility in Military Data Networks

Communication security is of particular interest to military data networks. A relatively novel approach to network security is blockchain, a cryptographically secured distributed ledger with a decentralized consensus mechanism for data transaction processing. Recent advances in blockchain technology have introduced new techniques for both data validation and trust management, as well as different frameworks for managing dataflow. The purpose of this work is to test the feasibility of different blockchain architectures as applied to military command and control networks. Various architectures are tested through discrete-event simulation, and feasibility is determined based upon a blockchain design’s ability to maintain long-term stable performance at industry standards of throughput, network latency, and security. This work proposes a consortium blockchain architecture with a computationally inexpensive consensus mechanism, one that leverages a Proof-of-Identity (PoI) concept and a reputation management mechanism.

Feature Analysis of Predictive Maintenance Models

Research in predictive maintenance modeling has improved in recent years to predict failures and needed maintenance with high accuracy, saving cost and improving manufacturing efficiency. However, classic prediction models provide little insight into the most important features contributing to a failure. By analyzing and quantifying feature importance in predictive maintenance models, cost saving can be optimized based on business goals. First, multiple classifiers are evaluated with cross-validation to predict multiple classes of failures. Second, predictive performance with features provided by different feature selection algorithms is further analyzed. Third, features selected by different algorithms are ranked and combined based on their predictive power. Finally, the linear explainer of SHAP (SHapley Additive exPlanations) is applied to interpret classifier behavior and provide further insight into the specific roles of features in both local predictions and global model behavior. The results of the experiments suggest that certain features play dominant roles in predictive models while others have significantly less impact on the overall performance. Moreover, for multi-class prediction of machine failures, the most important features vary with the type of machine failure. The results may lead to improved productivity and cost saving by prioritizing sensor deployment, data collection, and data processing for more important features over less important ones.
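For a linear model whose features are treated as independent, the SHAP value of feature i reduces to w_i (x_i - E[x_i]). The sketch below assumes that closed form to show the local and global views mentioned in this abstract; the weights and sensor readings are hypothetical, and the code does not call the shap library or reproduce the paper's classifiers.

```python
def linear_shap(weights, x, feature_means):
    """Per-feature contributions for one sample of a linear model."""
    return [w * (xi - mu) for w, xi, mu in zip(weights, x, feature_means)]

def mean_abs_shap(weights, rows):
    """Global importance: mean absolute contribution of each feature."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    per_row = [linear_shap(weights, row, means) for row in rows]
    return [sum(abs(p[j]) for p in per_row) / n for j in range(len(weights))]

# Hypothetical model weights and sensor readings (two features).
weights = [2.0, -1.0]
rows = [[3.0, 4.0], [1.0, 1.0], [2.0, 1.0]]

means = [sum(col) / len(rows) for col in zip(*rows)]
local = linear_shap(weights, rows[0], means)   # explains one prediction
global_rank = mean_abs_shap(weights, rows)     # summarizes the whole data set
print(local, global_rank)
```

The local contributions of a sample sum to the difference between its prediction and the prediction at the feature means, which is the additivity property that makes the per-feature values interpretable.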

Transcriptomics Analysis on Comparing Non-Small Cell Lung Cancer versus Normal Lung, and Early Stage Compared versus Late-Stages of Non-Small Cell Lung Cancer

Lung cancer is one of the most common malignancies and the primary cause of death due to cancer worldwide. Non-small cell lung cancer (NSCLC) is the main subtype, in which the majority of patients present with advanced-stage disease. Herein, we analyzed differentially expressed genes to find potential biomarkers for lung cancer diagnosis as well as prognostic markers. We used transcriptome data from our 2 NSCLC patients and public data (GSE81089) comprising 8 NSCLC and 10 normal lung tissues. Differentially expressed genes (DEGs) between NSCLC and normal tissue and between early-stage and late-stage NSCLC were analyzed by DESeq2. Pairwise correlation was used to find the DEGs with false discovery rate (FDR) adjusted p-value ≤ 0.05 and |log2 fold change| ≥ 4 for NSCLC versus normal and FDR adjusted p-value ≤ 0.05 with |log2 fold change| ≥ 2 for early versus late-stage NSCLC. Bioinformatic tools were used for functional and pathway analysis. Moreover, the top ten genes in each comparison group were verified by expression and survival analysis via GEPIA. We found 150 up-regulated and 45 down-regulated genes in NSCLC compared to normal tissues. Many immunoglobulin-related genes, e.g., IGHV4-4, IGHV5-10-1, IGHV4-31, IGHV4-61, and IGHV1-69D, were significantly up-regulated. Twenty-two genes were up-regulated, and five genes were down-regulated in late-stage compared to early-stage NSCLC. The top five DEGs were KRT6B, SPRR1A, KRT13, KRT6A and KRT5. Keratin 6B (KRT6B) was the most significantly increased gene in late-stage NSCLC. From GEPIA analysis, we concluded that IGHV4-31 and IGKV1-9 might be used as diagnostic biomarkers, while KRT6B and KRT6A might be used as prognostic biomarkers. However, further clinical validation is needed.
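The DEG selection thresholds in this abstract can be illustrated with a short filter. The sketch assumes the FDR adjustment is the Benjamini-Hochberg procedure (the abstract does not name the method) and uses placeholder gene names and values; it is not the DESeq2 pipeline itself.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

def select_degs(genes, p_values, log2_fc, fdr=0.05, min_abs_lfc=4.0):
    """Keep genes with adjusted p <= fdr and |log2 fold change| >= min_abs_lfc."""
    padj = benjamini_hochberg(p_values)
    return [
        g for g, q, lfc in zip(genes, padj, log2_fc)
        if q <= fdr and abs(lfc) >= min_abs_lfc
    ]

# Placeholder genes and statistics, not results from the study.
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
p_values = [0.01, 0.04, 0.03, 0.5]
log2_fc = [5.0, 6.0, -4.5, 7.0]
print(select_degs(genes, p_values, log2_fc))
```

For the early versus late-stage comparison, the same function would be called with `min_abs_lfc=2.0`, matching the lower fold-change threshold stated above.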