Towards End-To-End Disease Prediction from Raw Metagenomic Data

Analysis of the human microbiome using metagenomic sequencing data has demonstrated high ability in discriminating various human diseases. Raw metagenomic sequencing data require multiple complex and computationally heavy bioinformatics steps prior to data analysis. Such data contain millions of short sequences read from the fragmented DNA sequences and stored as fastq files. Conventional processing pipelines consist in multiple steps including quality control, filtering, alignment of sequences against genomic catalogs (genes, species, taxonomic levels, functional pathways, etc.). These pipelines are complex to use, time consuming and rely on a large number of parameters that often provide variability and impact the estimation of the microbiome elements. Training Deep Neural Networks directly from raw sequencing data is a promising approach to bypass some of the challenges associated with mainstream bioinformatics pipelines. Most of these methods use the concept of word and sentence embeddings that create a meaningful and numerical representation of DNA sequences, while extracting features and reducing the dimensionality of the data. In this paper we present an end-to-end approach that classifies patients into disease groups directly from raw metagenomic reads: metagenome2vec. This approach is composed of four steps (i) generating a vocabulary of k-mers and learning their numerical embeddings; (ii) learning DNA sequence (read) embeddings; (iii) identifying the genome from which the sequence is most likely to come and (iv) training a multiple instance learning classifier which predicts the phenotype based on the vector representation of the raw data. An attention mechanism is applied in the network so that the model can be interpreted, assigning a weight to the influence of the prediction for each genome. Using two public real-life data-sets as well a simulated one, we demonstrated that this original approach reaches high performance, comparable with the state-of-the-art methods applied directly on processed data though mainstream bioinformatics workflows. These results are encouraging for this proof of concept work. We believe that with further dedication, the DNN models have the potential to surpass mainstream bioinformatics workflows in disease classification tasks.

Early Depression Detection for Young Adults with a Psychiatric and AI Interdisciplinary Multimodal Framework

During COVID-19, the depression rate has increased dramatically. Young adults are most vulnerable to the mental health effects of the pandemic. Lower-income families have a higher ratio to be diagnosed with depression than the general population, but less access to clinics. This research aims to achieve early depression detection at low cost, large scale, and high accuracy with an interdisciplinary approach by incorporating clinical practices defined by American Psychiatric Association (APA) as well as multimodal AI framework. The proposed approach detected the nine depression symptoms with Natural Language Processing sentiment analysis and a symptom-based Lexicon uniquely designed for young adults. The experiments were conducted on the multimedia survey results from adolescents and young adults and unbiased Twitter communications. The result was further aggregated with the facial emotional cues analyzed by the Convolutional Neural Network on the multimedia survey videos. Five experiments each conducted on 10k data entries reached consistent results with an average accuracy of 88.31%, higher than the existing natural language analysis models. This approach can reach 300+ million daily active Twitter users and is highly accessible by low-income populations to promote early depression detection to raise awareness in adolescents and young adults and reveal complementary cues to assist clinical depression diagnosis.

A Real-Time Bayesian Decision-Support System for Predicting Suspect Vehicle’s Intended Target Using a Sparse Camera Network

We present a decision-support tool to assist an operator in the detection and tracking of a suspect vehicle traveling to an unknown target destination. Multiple data sources, such as traffic cameras, traffic information, weather, etc., are integrated and processed in real-time to infer a suspect’s intended destination chosen from a list of pre-determined high-value targets. Previously, we presented our work in the detection and tracking of vehicles using traffic and airborne cameras. Here, we focus on the fusion and processing of that information to predict a suspect’s behavior. The network of cameras is represented by a directional graph, where the edges correspond to direct road connections between the nodes and the edge weights are proportional to the average time it takes to travel from one node to another. For our experiments, we construct our graph based on the greater Los Angeles subset of the Caltrans’s “Performance Measurement System” (PeMS) dataset. We propose a Bayesian approach where a posterior probability for each target is continuously updated based on detections of the suspect in the live video feeds. Additionally, we introduce the concept of ‘soft interventions’, inspired by the field of Causal Inference. Soft interventions are herein defined as interventions that do not immediately interfere with the suspect’s movements; rather, a soft intervention may induce the suspect into making a new decision, ultimately making their intent more transparent. For example, a soft intervention could be temporarily closing a road a few blocks from the suspect’s current location, which may require the suspect to change their current course. The objective of these interventions is to gain the maximum amount of information about the suspect’s intent in the shortest possible time. Our system currently operates in a human-on-the-loop mode where at each step, a set of recommendations are presented to the operator to aid in decision-making. In principle, the system could operate autonomously, only prompting the operator for critical decisions, allowing the system to significantly scale up to larger areas and multiple suspects. Once the intended target is identified with sufficient confidence, the vehicle is reported to the authorities to take further action. Other recommendations include a selection of road closures, i.e., soft interventions, or to continue monitoring. We evaluate the performance of the proposed system using simulated scenarios where the suspect, starting at random locations, takes a noisy shortest path to their intended target. In all scenarios, the suspect’s intended target is unknown to our system. The decision thresholds are selected to maximize the chances of determining the suspect’s intended target in the minimum amount of time and with the smallest number of interventions. We conclude by discussing the limitations of our current approach to motivate a machine learning approach, based on reinforcement learning in order to relax some of the current limiting assumptions.

Methodology for the Multi-Objective Analysis of Data Sets in Freight Delivery

Data flow and the purpose of reporting the data are different and dependent on business needs. Different parameters are reported and transferred regularly during freight delivery. This business practices form the dataset constructed for each time point and contain all required information for freight moving decisions. As a significant amount of these data is used for various purposes, an integrating methodological approach must be developed to respond to the indicated problem. The proposed methodology contains several steps: (1) collecting context data sets and data validation; (2) multi-objective analysis for optimizing freight transfer services. For data validation, the study involves Grubbs outliers analysis, particularly for data cleaning and the identification of statistical significance of data reporting event cases. The Grubbs test is often used as it measures one external value at a time exceeding the boundaries of standard normal distribution. In the study area, the test was not widely applied by authors, except when the Grubbs test for outlier detection was used to identify outsiders in fuel consumption data. In the study, the authors applied the method with a confidence level of 99%. For the multi-objective analysis, the authors would like to select the forms of construction of the genetic algorithms, which have more possibilities to extract the best solution. For freight delivery management, the schemas of genetic algorithms' structure are used as a more effective technique. Due to that, the adaptable genetic algorithm is applied for the description of choosing process of the effective transportation corridor. In this study, the multi-objective genetic algorithm methods are used to optimize the data evaluation and select the appropriate transport corridor. The authors suggest a methodology for the multi-objective analysis, which evaluates collected context data sets and uses this evaluation to determine a delivery corridor for freight transfer service in the multi-modal transportation network. In the multi-objective analysis, authors include safety components, the number of accidents a year, and freight delivery time in the multi-modal transportation network. The proposed methodology has practical value in the management of multi-modal transportation processes.

Time and Wavelength Division Multiplexing Passive Optical Network Comparative Analysis: Modulation Formats and Channel Spacings

In light of the substantial increase in end-user requirements and the incessant need of network operators to upgrade the capabilities of access networks, in this paper, the performance of the different modulation formats on eight-channels Time and Wavelength Division Multiplexing Passive Optical Network (TWDM-PON) transmission system has been examined and compared. Limitations and features of modulation formats have been determined to outline the most suitable design to enhance the data rate and transmission reach to obtain the best performance of the network. The considered modulation formats are On-Off Keying Non-Return-to-Zero (NRZ-OOK), Carrier Suppressed Return to Zero (CSRZ), Duo Binary (DB), Modified Duo Binary (MODB), Quadrature Phase Shift Keying (QPSK), and Differential Quadrature Phase Shift Keying (DQPSK). The performance has been analyzed by varying transmission distances and bit rates under different channel spacing. Furthermore, the system is evaluated in terms of minimum Bit Error Rate (BER) and Quality factor (Qf) without applying any dispersion compensation technique, or any optical amplifier. Optisystem software was used for simulation purposes.

Generative Adversarial Network Based Fingerprint Anti-Spoofing Limitations

Fingerprint Anti-Spoofing approaches have been actively developed and applied in real-world applications. One of the main problems for Fingerprint Anti-Spoofing is not robust to unseen samples, especially in real-world scenarios. A possible solution will be to generate artificial, but realistic fingerprint samples and use them for training in order to achieve good generalization. This paper contains experimental and comparative results with currently popular GAN based methods and uses realistic synthesis of fingerprints in training in order to increase the performance. Among various GAN models, the most popular StyleGAN is used for the experiments. The CNN models were first trained with the dataset that did not contain generated fake images and the accuracy along with the mean average error rate were recorded. Then, the fake generated images (fake images of live fingerprints and fake images of spoof fingerprints) were each combined with the original images (real images of live fingerprints and real images of spoof fingerprints), and various CNN models were trained. The best performances for each CNN model, trained with the dataset of generated fake images and each time the accuracy and the mean average error rate, were recorded. We observe that current GAN based approaches need significant improvements for the Anti-Spoofing performance, although the overall quality of the synthesized fingerprints seems to be reasonable. We include the analysis of this performance degradation, especially with a small number of samples. In addition, we suggest several approaches towards improved generalization with a small number of samples, by focusing on what GAN based approaches should learn and should not learn.

Loss Function Optimization for CNN-Based Fingerprint Anti-Spoofing

As biometric systems become widely deployed, the security of identification systems can be easily attacked by various spoof materials. This paper contributes to finding a reliable and practical anti-spoofing method using Convolutional Neural Networks (CNNs) based on the types of loss functions and optimizers. The types of CNNs used in this paper include AlexNet, VGGNet, and ResNet. By using various loss functions including Cross-Entropy, Center Loss, Cosine Proximity, and Hinge Loss, and various loss optimizers which include Adam, SGD, RMSProp, Adadelta, Adagrad, and Nadam, we obtained significant performance changes. We realize that choosing the correct loss function for each model is crucial since different loss functions lead to different errors on the same evaluation. By using a subset of the Livdet 2017 database, we validate our approach to compare the generalization power. It is important to note that we use a subset of LiveDet and the database is the same across all training and testing for each model. This way, we can compare the performance, in terms of generalization, for the unseen data across all different models. The best CNN (AlexNet) with the appropriate loss function and optimizers result in more than 3% of performance gain over the other CNN models with the default loss function and optimizer. In addition to the highest generalization performance, this paper also contains the models with high accuracy associated with parameters and mean average error rates to find the model that consumes the least memory and computation time for training and testing. Although AlexNet has less complexity over other CNN models, it is proven to be very efficient. For practical anti-spoofing systems, the deployed version should use a small amount of memory and should run very fast with high anti-spoofing performance. For our deployed version on smartphones, additional processing steps, such as quantization and pruning algorithms, have been applied in our final model.

Impact of Climate Change on Sea Level Rise along the Coastline of Mumbai City, India

Sea-level rise being one of the most important impacts of anthropogenic induced climate change resulting from global warming and melting of icebergs at Arctic and Antarctic, the investigations done by various researchers both on Indian Coast and elsewhere during the last decade has been reviewed in this paper. The paper aims to ascertain the propensity of consistency of different suggested methods to predict the near-accurate future sea level rise along the coast of Mumbai. Case studies at East Coast, Southern Tip and West and South West coast of India have been reviewed. Coastal Vulnerability Index of several important international places has been compared, which matched with Intergovernmental Panel on Climate Change forecasts. The application of Geographic Information System mapping, use of remote sensing technology, both Multi Spectral Scanner and Thematic Mapping data from Landsat classified through Iterative Self-Organizing Data Analysis Technique for arriving at high, moderate and low Coastal Vulnerability Index at various important coastal cities have been observed. Instead of data driven, hindcast based forecast for Significant Wave Height, additional impact of sea level rise has been suggested. Efficacy and limitations of numerical methods vis-à-vis Artificial Neural Network has been assessed, importance of Root Mean Square error on numerical results is mentioned. Comparing between various computerized methods on forecast results obtained from MIKE 21 has been opined to be more reliable than Delft 3D model.

Improving Subjective Bias Detection Using Bidirectional Encoder Representations from Transformers and Bidirectional Long Short-Term Memory

Detecting subjectively biased statements is a vital task. This is because this kind of bias, when present in the text or other forms of information dissemination media such as news, social media, scientific texts, and encyclopedias, can weaken trust in the information and stir conflicts amongst consumers. Subjective bias detection is also critical for many Natural Language Processing (NLP) tasks like sentiment analysis, opinion identification, and bias neutralization. Having a system that can adequately detect subjectivity in text will boost research in the above-mentioned areas significantly. It can also come in handy for platforms like Wikipedia, where the use of neutral language is of importance. The goal of this work is to identify the subjectively biased language in text on a sentence level. With machine learning, we can solve complex AI problems, making it a good fit for the problem of subjective bias detection. A key step in this approach is to train a classifier based on BERT (Bidirectional Encoder Representations from Transformers) as upstream model. BERT by itself can be used as a classifier; however, in this study, we use BERT as data preprocessor as well as an embedding generator for a Bi-LSTM (Bidirectional Long Short-Term Memory) network incorporated with attention mechanism. This approach produces a deeper and better classifier. We evaluate the effectiveness of our model using the Wiki Neutrality Corpus (WNC), which was compiled from Wikipedia edits that removed various biased instances from sentences as a benchmark dataset, with which we also compare our model to existing approaches. Experimental analysis indicates an improved performance, as our model achieved state-of-the-art accuracy in detecting subjective bias. This study focuses on the English language, but the model can be fine-tuned to accommodate other languages.

Transformation of Industrial Policy towards Industry 4.0 and Its Impact on Firms' Competition

Although Europe is on the threshold of a new industrial revolution called Industry 4.0, many believe that this will increase the flexibility of production, the mass adaptation of products to consumers and the speed of their service; it will also improve product quality and dramatically increase productivity. However, as expected, all the benefits of Industry 4.0 face many of the inevitable changes and challenges they pose. One of them is the inevitable transformation of current competition and business models. This article examines the possible results of competitive conversion from the classic Bertrand and Cournot models to qualitatively new competition based on innovation. Ability to deliver a new product quickly and the possibility to produce the individual design (through flexible and quickly configurable factories) by reducing equipment failures and increasing process automation and control is highly important. This study shows that the ongoing transformation of the competition model is changing the game. This, together with the creation of complex value networks, means huge investments that make it particularly difficult for small and medium-sized enterprises. In addition, the ongoing digitalization of data raises new concerns regarding legal obligations, intellectual property, and security.

An Exploratory Study on the Difference between Online and Offline Conformity Behavior among Chinese College Students

Conformity is defined as one in a social group changing his or her behavior to match the others’ behavior in the group. It is used to find that people show a higher level of online conformity behavior than offline. However, as anonymity can decrease the level of online conformity behavior, the difference between online and offline conformity behavior among Chinese college students still needs to be tested. In this study, college students (N = 60) have been randomly assigned into three groups: control group, offline experimental group, and online experimental group. Through comparing the results of offline experimental group and online experimental group with the Mann-Whitney U test, this study verified the results of Asch’s experiment, and found out that people show a lower level of online conformity behavior than offline, which contradicted the previous finding found in China. These results can be used to explain why some people make a lot of vicious remarks and radical ideas on the Internet but perform normally in their real life: the anonymity of the network makes the online group pressure less than offline, so people are less likely to conform to social norms and public opinions on the Internet. What is more, these results support the importance and relevance of online voting, because fewer online group pressures make it easier for people to expose their true ideas, thus gathering more comprehensive and truthful views and opinions.

Users’ Information Disclosure Determinants in Social Networking Sites: A Systematic Literature Review

The privacy paradox describes a phenomenon whereby there is no connection between stated privacy concerns and privacy behaviours. We need to understand the underlying reasons for this paradox if we are to help users to preserve their privacy more effectively. In particular, the Social Networking System (SNS) domain offers a rich area of investigation due to the risks of unwise information disclosure decisions. Our study thus aims to untangle the complicated nature and underlying mechanisms of online privacy-related decisions in SNSs. In this paper, we report on the findings of a Systematic Literature Review (SLR) that revealed a number of factors that are likely to influence online privacy decisions. Our deductive analysis approach was informed by Communicative Privacy Management (CPM) theory. We uncovered a lack of clarity around privacy attitudes and their link to behaviours, which makes it challenging to design privacy-protecting SNS platforms and to craft legislation to ensure that users’ privacy is preserved.

Identifying Network Subgraph-Associated Essential Genes in Molecular Networks

Essential genes play an important role in the survival of an organism. It has been shown that cancer-associated essential genes are genes necessary for cancer cell proliferation, where these genes are potential therapeutic targets. Also, it was demonstrated that mutations of the cancer-associated essential genes give rise to the resistance of immunotherapy for patients with tumors. In the present study, we focus on studying the biological effects of the essential genes from a network perspective. We hypothesize that one can analyze a biological molecular network by decomposing it into both three-node and four-node digraphs (subgraphs). These network subgraphs encode the regulatory interaction information among the network’s genetic elements. In this study, the frequency of occurrence of the subgraph-associated essential genes in a molecular network was quantified by using the statistical parameter, odds ratio. Biological effects of subgraph-associated essential genes are discussed. In summary, the subgraph approach provides a systematic method for analyzing molecular networks and it can capture useful biological information for biomedical research.

Neural Network Models for Actual Cost and Actual Duration Estimation in Construction Projects: Findings from Greece

Predicting the actual cost and duration in construction projects concern a continuous and existing problem for the construction sector. This paper addresses this problem with modern methods and data available from past public construction projects. 39 bridge projects, constructed in Greece, with a similar type of available data were examined. Considering each project’s attributes with the actual cost and the actual duration, correlation analysis is performed and the most appropriate predictive project variables are defined. Additionally, the most efficient subgroup of variables is selected with the use of the WEKA application, through its attribute selection function. The selected variables are used as input neurons for neural network models through correlation analysis. For constructing neural network models, the application FANN Tool is used. The optimum neural network model, for predicting the actual cost, produced a mean squared error with a value of 3.84886e-05 and it was based on the budgeted cost and the quantity of deck concrete. The optimum neural network model, for predicting the actual duration, produced a mean squared error with a value of 5.89463e-05 and it also was based on the budgeted cost and the amount of deck concrete.

Random Access in IoT Using Naïve Bayes Classification

This paper deals with the random access procedure in next-generation networks and presents the solution to reduce total service time (TST) which is one of the most important performance metrics in current and future internet of things (IoT) based networks. The proposed solution focuses on the calculation of optimal transmission probability which maximizes the success probability and reduces TST. It uses the information of several idle preambles in every time slot, and based on it, it estimates the number of backlogged IoT devices using Naïve Bayes estimation which is a type of supervised learning in the machine learning domain. The estimation of backlogged devices is necessary since optimal transmission probability depends on it and the eNodeB does not have information about it. The simulations are carried out in MATLAB which verify that the proposed solution gives excellent performance.

Handover for Dense Small Cells Heterogeneous Networks: A Power-Efficient Game Theoretical Approach

In this paper, a non-cooperative game method is formulated where all players compete to transmit at higher power. Every base station represents a player in the game. The game is solved by obtaining the Nash equilibrium (NE) where the game converges to optimality. The proposed method, named Power Efficient Handover Game Theoretic (PEHO-GT) approach, aims to control the handover in dense small cell networks. Players optimize their payoff by adjusting the transmission power to improve the performance in terms of throughput, handover, power consumption and load balancing. To select the desired transmission power for a player, the payoff function considers the gain of increasing the transmission power. Then, the cell selection takes place by deploying Technique for Order Preference by Similarity to an Ideal Solution (TOPSIS). A game theoretical method is implemented for heterogeneous networks to validate the improvement obtained. Results reveal that the proposed method gives a throughput improvement while reducing the power consumption and minimizing the frequent handover.

Integrated Grey Rational Analysis-Standard Deviation Method for Handover in Heterogeneous Networks

The dense deployment of small cells is a promising solution to enhance the coverage and capacity of the heterogeneous networks (HetNets). However, the unplanned deployment could bring new challenges to the network ranging from interference, unnecessary handovers and handover failures. This will cause a degradation in the quality of service (QoS) delivered to the end user. In this paper, we propose an integrated Grey Rational Analysis Standard Deviation based handover method (GRA-SD) for HetNet. The proposed method integrates the Standard Deviation (SD) technique to acquire the weight of the handover metrics and the GRA method to select the best handover base station. The performance of the GRA-SD method is evaluated and compared with the traditional Multiple Attribute Decision Making (MADM) methods including Simple Additive Weighting (SAW) and VIKOR methods. Results reveal that the proposed method has outperformed the other methods in terms of minimizing the number of frequent unnecessary handovers and handover failures, in addition to improving the energy efficiency.

Integrated Social Support through Social Networks to Enhance the Quality of Life of Metastatic Breast Cancer Patients

Being diagnosed with metastatic breast cancer, the patients as well as their caretakers are affected physically and mentally. Although the medical systems in Thailand have been attempting to improve the quality and effectiveness of the treatment of the disease in terms of physical illness, the success of the treatment also depends on the quality of mental health. Metastatic breast cancer patients have found that social support is a key factor that helps them through this difficult time. It is recognized that social support in different dimensions, including emotional support, social network support, informational support, instrumental support and appraisal support, are contributing factors that positively affect the quality of life of patients in general, and it is undeniable that social support in various forms is important in promoting the quality of life of metastatic breast patients. However, previous studies have not been dedicated to investigating their quality of life concerning affective, cognitive, and behavioral outcomes. Therefore, this study aims to develop integrated social support through social networks to improve the quality of life of metastatic breast cancer patients in Thailand.

Alternative Key Exchange Algorithm Based on Elliptic Curve Digital Signature Algorithm Certificate and Usage in Applications

The Elliptic Curve Digital Signature algorithm-based X509v3 certificates are becoming more popular due to their short public and private key sizes. Moreover, these certificates can be stored in Internet of Things (IoT) devices, with limited resources, using less memory and transmitted in network security protocols, such as Internet Key Exchange (IKE), Transport Layer Security (TLS) and Secure Shell (SSH) with less bandwidth. The proposed method gives another advantage, in that it increases the performance of the above-mentioned protocols in terms of key exchange by saving one scalar multiplication operation.

A Structural Support Vector Machine Approach for Biometric Recognition

Face is a non-intrusive strong biometrics for identification of original and dummy facial by different artificial means. Face recognition is extremely important in the contexts of computer vision, psychology, surveillance, pattern recognition, neural network, content based video processing. The availability of a widespread face database is crucial to test the performance of these face recognition algorithms. The openly available face databases include face images with a wide range of poses, illumination, gestures and face occlusions but there is no dummy face database accessible in public domain. This paper presents a face detection algorithm based on the image segmentation in terms of distance from a fixed point and template matching methods. This proposed work is having the most appropriate number of nodal points resulting in most appropriate outcomes in terms of face recognition and detection. The time taken to identify and extract distinctive facial features is improved in the range of 90 to 110 sec. with the increment of efficiency by 3%.