Towards End-To-End Disease Prediction from Raw Metagenomic Data

Analysis of the human microbiome using metagenomic sequencing data has demonstrated high ability in discriminating various human diseases. Raw metagenomic sequencing data require multiple complex and computationally heavy bioinformatics steps prior to data analysis. Such data contain millions of short sequences read from the fragmented DNA sequences and stored as fastq files. Conventional processing pipelines consist in multiple steps including quality control, filtering, alignment of sequences against genomic catalogs (genes, species, taxonomic levels, functional pathways, etc.). These pipelines are complex to use, time consuming and rely on a large number of parameters that often provide variability and impact the estimation of the microbiome elements. Training Deep Neural Networks directly from raw sequencing data is a promising approach to bypass some of the challenges associated with mainstream bioinformatics pipelines. Most of these methods use the concept of word and sentence embeddings that create a meaningful and numerical representation of DNA sequences, while extracting features and reducing the dimensionality of the data. In this paper we present an end-to-end approach that classifies patients into disease groups directly from raw metagenomic reads: metagenome2vec. This approach is composed of four steps (i) generating a vocabulary of k-mers and learning their numerical embeddings; (ii) learning DNA sequence (read) embeddings; (iii) identifying the genome from which the sequence is most likely to come and (iv) training a multiple instance learning classifier which predicts the phenotype based on the vector representation of the raw data. An attention mechanism is applied in the network so that the model can be interpreted, assigning a weight to the influence of the prediction for each genome. Using two public real-life data-sets as well a simulated one, we demonstrated that this original approach reaches high performance, comparable with the state-of-the-art methods applied directly on processed data though mainstream bioinformatics workflows. These results are encouraging for this proof of concept work. We believe that with further dedication, the DNN models have the potential to surpass mainstream bioinformatics workflows in disease classification tasks.

Solid State Drive End to End Reliability Prediction, Characterization and Control

A flaw or drift from expected operational performance in one component (NAND, PMIC, controller, DRAM, etc.) may affect the reliability of the entire Solid State Drive (SSD) system. Therefore, it is important to ensure the required quality of each individual component through qualification testing specified using standards or user requirements. Qualification testing is time-consuming and comes at a substantial cost for product manufacturers. A highly technical team, from all the eminent stakeholders is embarking on reliability prediction from beginning of new product development, identify critical to reliability parameters, perform full-blown characterization to embed margin into product reliability and establish control to ensure the product reliability is sustainable in the mass production. The paper will discuss a comprehensive development framework, comprehending SSD end to end from design to assembly, in-line inspection, in-line testing and will be able to predict and to validate the product reliability at the early stage of new product development. During the design stage, the SSD will go through intense reliability margin investigation with focus on assembly process attributes, process equipment control, in-process metrology and also comprehending forward looking product roadmap. Once these pillars are completed, the next step is to perform process characterization and build up reliability prediction modeling. Next, for the design validation process, the reliability prediction specifically solder joint simulator will be established. The SSD will be stratified into Non-Operating and Operating tests with focus on solder joint reliability and connectivity/component latent failures by prevention through design intervention and containment through Temperature Cycle Test (TCT). Some of the SSDs will be subjected to the physical solder joint analysis called Dye and Pry (DP) and Cross Section analysis. The result will be feedbacked to the simulation team for any corrective actions required to further improve the design. Once the SSD is validated and is proven working, it will be subjected to implementation of the monitor phase whereby Design for Assembly (DFA) rules will be updated. At this stage, the design change, process and equipment parameters are in control. Predictable product reliability at early product development will enable on-time sample qualification delivery to customer and will optimize product development validation, effective development resource and will avoid forced late investment to bandage the end-of-life product failures. Understanding the critical to reliability parameters earlier will allow focus on increasing the product margin that will increase customer confidence to product reliability.

Performance Prediction Methodology of Slow Aging Assets

Asset management of urban infrastructures faces a multitude of challenges that need to be overcome to obtain a reliable measurement of performances. Predicting the performance of slowly aging systems is one of those challenges, which helps the asset manager to investigate specific failure modes and to undertake the appropriate maintenance and rehabilitation interventions to avoid catastrophic failures as well as to optimize the maintenance costs. This article presents a methodology for modeling the deterioration of slowly degrading assets based on an operating history. It consists of extracting degradation profiles by grouping together assets that exhibit similar degradation sequences using an unsupervised classification technique derived from artificial intelligence. The obtained clusters are used to build the performance prediction models. This methodology is applied to a sample of a stormwater drainage culvert dataset.

Deorbiting Performance of Electrodynamic Tethers to Mitigate Space Debris

International guidelines recommend removing any artificial body in Low Earth Orbit (LEO) within 25 years from mission completion. Among disposal strategies, electrodynamic tethers appear to be a promising option for LEO, thanks to the limited storage mass and the minimum interface requirements to the host spacecraft. In particular, recent technological advances make it feasible to deorbit large objects with tether lengths of a few kilometers or less. To further investigate such an innovative passive system, the European Union is currently funding the project E.T.PACK – Electrodynamic Tether Technology for Passive Consumable-less Deorbit Kit in the framework of the H2020 Future Emerging Technologies (FET) Open program. The project focuses on the design of an end of life disposal kit for LEO satellites. This kit aims to deploy a taped tether that can be activated at the spacecraft end of life to perform autonomous deorbit within the international guidelines. In this paper, the orbital performance of the E.T.PACK deorbiting kit is compared to other disposal methods. Besides, the orbital decay prediction is parametrized as a function of spacecraft mass and tether system performance. Different values of length, width, and thickness of the tether will be evaluated for various scenarios (i.e., different initial orbital parameters). The results will be compared to other end-of-life disposal methods with similar allocated resources. The analysis of the more innovative system’s performance with the tape coated with a thermionic material, which has a low work-function (LWT), for which no active component for the cathode is required, will also be briefly discussed. The results show that the electrodynamic tether option can be a competitive and performant solution for satellite disposal compared to other deorbit technologies.

Lexicon-Based Sentiment Analysis for Stock Movement Prediction

Sentiment analysis is a broad and expanding field that aims to extract and classify opinions from textual data. Lexicon-based approaches are based on the use of a sentiment lexicon, i.e., a list of words each mapped to a sentiment score, to rate the sentiment of a text chunk. Our work focuses on predicting stock price change using a sentiment lexicon built from financial conference call logs. We present a method to generate a sentiment lexicon based upon an existing probabilistic approach. By using a domain-specific lexicon, we outperform traditional techniques and demonstrate that domain-specific sentiment lexicons provide higher accuracy than generic sentiment lexicons when predicting stock price change.

Feature Analysis of Predictive Maintenance Models

Research in predictive maintenance modeling has improved in the recent years to predict failures and needed maintenance with high accuracy, saving cost and improving manufacturing efficiency. However, classic prediction models provide little valuable insight towards the most important features contributing to the failure. By analyzing and quantifying feature importance in predictive maintenance models, cost saving can be optimized based on business goals. First, multiple classifiers are evaluated with cross-validation to predict the multi-class of failures. Second, predictive performance with features provided by different feature selection algorithms are further analyzed. Third, features selected by different algorithms are ranked and combined based on their predictive power. Finally, linear explainer SHAP (SHapley Additive exPlanations) is applied to interpret classifier behavior and provide further insight towards the specific roles of features in both local predictions and global model behavior. The results of the experiments suggest that certain features play dominant roles in predictive models while others have significantly less impact on the overall performance. Moreover, for multi-class prediction of machine failures, the most important features vary with type of machine failures. The results may lead to improved productivity and cost saving by prioritizing sensor deployment, data collection, and data processing of more important features over less importance features.

The Origin, Diffusion and a Comparison of Ordinary Differential Equations Numerical Solutions Used by SIR Model in Order to Predict SARS-CoV-2 in Nordic Countries

SARS-CoV-2 virus is currently one of the most infectious pathogens for humans. It started in China at the end of 2019 and now it is spread in all over the world. The origin and diffusion of the SARS-CoV-2 epidemic, is analysed based on the discussion of viral phylogeny theory. With the aim of understanding the spread of infection in the affected countries, it is crucial to modelize the spread of the virus and simulate its activity. In this paper, the prediction of coronavirus outbreak is done by using SIR model without vital dynamics, applying different numerical technique solving ordinary differential equations (ODEs). We find out that ABM and MRT methods perform better than other techniques and that the activity of the virus will decrease in April but it never cease (for some time the activity will remain low) and the next cycle will start in the middle July 2020 for Norway and Denmark, and October 2020 for Sweden, and September for Finland.

Performance Prediction of a SANDIA 17-m Vertical Axis Wind Turbine Using Improved Double Multiple Streamtube

Different approaches have been used to predict the performance of the vertical axis wind turbines (VAWT), such as experimental, computational fluid dynamics (CFD), and analytical methods. Analytical methods, such as momentum models that use streamtubes, have low computational cost and sufficient accuracy. The double multiple streamtube (DMST) is one of the most commonly used of momentum models, which divide the rotor plane of VAWT into upwind and downwind. In fact, results from the DMST method have shown some discrepancy compared with experiment results; that is because the Darrieus turbine is a complex and aerodynamically unsteady configuration. In this study, analytical-experimental-based corrections, including dynamic stall, streamtube expansion, and finite blade length correction are used to improve the DMST method. Results indicated that using these corrections for a SANDIA 17-m VAWT will lead to improving the results of DMST.

Enhancing Temporal Extrapolation of Wind Speed Using a Hybrid Technique: A Case Study in West Coast of Denmark

The demand for renewable energy is significantly increasing, major investments are being supplied to the wind power generation industry as a leading source of clean energy. The wind energy sector is entirely dependable and driven by the prediction of wind speed, which by the nature of wind is very stochastic and widely random. This s0tudy employs deep multi-fidelity Gaussian process regression, used to predict wind speeds for medium term time horizons. Data of the RUNE experiment in the west coast of Denmark were provided by the Technical University of Denmark, which represent the wind speed across the study area from the period between December 2015 and March 2016. The study aims to investigate the effect of pre-processing the data by denoising the signal using empirical wavelet transform (EWT) and engaging the vector components of wind speed to increase the number of input data layers for data fusion using deep multi-fidelity Gaussian process regression (GPR). The outcomes were compared using root mean square error (RMSE) and the results demonstrated a significant increase in the accuracy of predictions which demonstrated that using vector components of the wind speed as additional predictors exhibits more accurate predictions than strategies that ignore them, reflecting the importance of the inclusion of all sub data and pre-processing signals for wind speed forecasting models.

Air Handling Units Power Consumption Using Generalized Additive Model for Anomaly Detection: A Case Study in a Singapore Campus

The emergence of digital twin technology, a digital replica of physical world, has improved the real-time access to data from sensors about the performance of buildings. This digital transformation has opened up many opportunities to improve the management of the building by using the data collected to help monitor consumption patterns and energy leakages. One example is the integration of predictive models for anomaly detection. In this paper, we use the GAM (Generalised Additive Model) for the anomaly detection of Air Handling Units (AHU) power consumption pattern. There is ample research work on the use of GAM for the prediction of power consumption at the office building and nation-wide level. However, there is limited illustration of its anomaly detection capabilities, prescriptive analytics case study, and its integration with the latest development of digital twin technology. In this paper, we applied the general GAM modelling framework on the historical data of the AHU power consumption and cooling load of the building between Jan 2018 to Aug 2019 from an education campus in Singapore to train prediction models that, in turn, yield predicted values and ranges. The historical data are seamlessly extracted from the digital twin for modelling purposes. We enhanced the utility of the GAM model by using it to power a real-time anomaly detection system based on the forward predicted ranges. The magnitude of deviation from the upper and lower bounds of the uncertainty intervals is used to inform and identify anomalous data points, all based on historical data, without explicit intervention from domain experts. Notwithstanding, the domain expert fits in through an optional feedback loop through which iterative data cleansing is performed. After an anomalously high or low level of power consumption detected, a set of rule-based conditions are evaluated in real-time to help determine the next course of action for the facilities manager. The performance of GAM is then compared with other approaches to evaluate its effectiveness. Lastly, we discuss the successfully deployment of this approach for the detection of anomalous power consumption pattern and illustrated with real-world use cases.

Estimation of the Drought Index Based on the Climatic Projections of Precipitation of the Uruguay River Basin

The impact the climate change is not recent, the main variable in the hydrological cycle is the sequence and shortage of a drought, which has a significant impact on the socioeconomic, agricultural and environmental spheres. This study aims to characterize and quantify, based on precipitation climatic projections, the rainy and dry events in the region of the Uruguay River Basin, through the Standardized Precipitation Index (SPI). The database is the image that is part of the Intercomparison of Model Models, Phase 5 (CMIP5), which provides condition prediction models, organized according to the Representative Routes of Concentration (CPR). Compared to the normal set of climates in the Uruguay River Watershed through precipitation projections, seasonal precipitation increases for all proposed scenarios, with a low climate trend. From the data of this research, the idea is that this article can be used to support research and the responsible bodies can use it as a subsidy for mitigation measures in other hydrographic basins.

Development of Fuzzy Logic and Neuro-Fuzzy Surface Roughness Prediction Systems Coupled with Cutting Current in Milling Operation

Development of two real-time surface roughness (Ra) prediction systems for milling operations was attempted. The systems used not only cutting parameters, such as feed rate and spindle speed, but also the cutting current generated and corrected by a clamp type energy sensor. Two different approaches were developed. First, a fuzzy inference system (FIS), in which the fuzzy logic rules are generated by experts in the milling processes, was used to conduct prediction modeling using current cutting data. Second, a neuro-fuzzy system (ANFIS) was explored. Neuro-fuzzy systems are adaptive techniques in which data are collected on the network, processed, and rules are generated by the system. The inference system then uses these rules to predict Ra as the output. Experimental results showed that the parameters of spindle speed, feed rate, depth of cut, and input current variation could predict Ra. These two systems enable the prediction of Ra during the milling operation with an average of 91.83% and 94.48% accuracy by FIS and ANFIS systems, respectively. Statistically, the ANFIS system provided better prediction accuracy than that of the FIS system.

A Continuous Real-Time Analytic for Predicting Instability in Acute Care Rapid Response Team Activations

A reliable, real-time, and non-invasive system that can identify patients at risk for hemodynamic instability is needed to aid clinicians in their efforts to anticipate patient deterioration and initiate early interventions. The purpose of this pilot study was to explore the clinical capabilities of a real-time analytic from a single lead of an electrocardiograph to correctly distinguish between rapid response team (RRT) activations due to hemodynamic (H-RRT) and non-hemodynamic (NH-RRT) causes, as well as predict H-RRT cases with actionable lead times. The study consisted of a single center, retrospective cohort of 21 patients with RRT activations from step-down and telemetry units. Through electronic health record review and blinded to the analytic’s output, each patient was categorized by clinicians into H-RRT and NH-RRT cases. The analytic output and the categorization were compared. The prediction lead time prior to the RRT call was calculated. The analytic correctly distinguished between H-RRT and NH-RRT cases with 100% accuracy, demonstrating 100% positive and negative predictive values, and 100% sensitivity and specificity. In H-RRT cases, the analytic detected hemodynamic deterioration with a median lead time of 9.5 hours prior to the RRT call (range 14 minutes to 52 hours). The study demonstrates that an electrocardiogram (ECG) based analytic has the potential for providing clinical decision and monitoring support for caregivers to identify at risk patients within a clinically relevant timeframe allowing for increased vigilance and early interventional support to reduce the chances of continued patient deterioration.

Semi-Analytic Method in Fast Evaluation of Thermal Management Solution in Energy Storage System

This article presents the application of the semi-analytic method (SAM) in the thermal management solution (TMS) of the energy storage system (ESS). The TMS studied in this work is fluid cooling. In fluid cooling, both effective heat conduction and heat convection are indispensable due to the heat transfer from solid to fluid. Correspondingly, an efficient TMS requires a design investigation of the following parameters: fluid inlet temperature, ESS initial temperature, fluid flow rate, working c rate, continuous working time, and materials properties. Their variation induces a change of thermal performance in the battery module, which is usually evaluated by numerical simulation. Compared to complicated computation resources and long computation time in simulation, the SAM is developed in this article to predict the thermal influence within a few seconds. In SAM, a fast prediction model is reckoned by combining numerical simulation with theoretical/empirical equations. The SAM can explore the thermal effect of boundary parameters in both steady-state and transient heat transfer scenarios within a short time. Therefore, the SAM developed in this work can simplify the design cycle of TMS and inspire more possibilities in TMS design.

A Low-Cost Air Quality Monitoring Internet of Things Platform

In the present paper, a low cost, compact and modular Internet of Things (IoT) platform for air quality monitoring in urban areas is presented. This platform comprises of dedicated low cost, low power hardware and the associated embedded software that enable measurement of particles (PM2.5 and PM10), NO, CO, CO2 and O3 concentration in the air, along with relative temperature and humidity. This integrated platform acts as part of a greater air pollution data collecting wireless network that is able to monitor the air quality in various regions and neighborhoods of an urban area, by providing sensor measurements at a high rate that reaches up to one sample per second. It is therefore suitable for Big Data analysis applications such as air quality forecasts, weather forecasts and traffic prediction. The first real world test for the developed platform took place in Thessaloniki, Greece, where 16 devices were installed in various buildings in the city. In the near future, many more of these devices are going to be installed in the greater Thessaloniki area, giving a detailed air quality map of the city.

Churn Prediction for Telecommunication Industry Using Artificial Neural Networks

Telecommunication service providers demand accurate and precise prediction of customer churn probabilities to increase the effectiveness of their customer relation services. The large amount of customer data owned by the service providers is suitable for analysis by machine learning methods. In this study, expenditure data of customers are analyzed by using an artificial neural network (ANN). The ANN model is applied to the data of customers with different billing duration. The proposed model successfully predicts the churn probabilities at 83% accuracy for only three months expenditure data and the prediction accuracy increases up to 89% when the nine month data is used. The experiments also show that the accuracy of ANN model increases on an extended feature set with information of the changes on the bill amounts.

Probabilistic Approach of Dealing with Uncertainties in Distributed Constraint Optimization Problems and Situation Awareness for Multi-agent Systems

In this paper, we describe how Bayesian inferential reasoning will contributes in obtaining a well-satisfied prediction for Distributed Constraint Optimization Problems (DCOPs) with uncertainties. We also demonstrate how DCOPs could be merged to multi-agent knowledge understand and prediction (i.e. Situation Awareness). The DCOPs functions were merged with Bayesian Belief Network (BBN) in the form of situation, awareness, and utility nodes. We describe how the uncertainties can be represented to the BBN and make an effective prediction using the expectation-maximization algorithm or conjugate gradient descent algorithm. The idea of variable prediction using Bayesian inference may reduce the number of variables in agents’ sampling domain and also allow missing variables estimations. Experiment results proved that the BBN perform compelling predictions with samples containing uncertainties than the perfect samples. That is, Bayesian inference can help in handling uncertainties and dynamism of DCOPs, which is the current issue in the DCOPs community. We show how Bayesian inference could be formalized with Distributed Situation Awareness (DSA) using uncertain and missing agents’ data. The whole framework was tested on multi-UAV mission for forest fire searching. Future work focuses on augmenting existing architecture to deal with dynamic DCOPs algorithms and multi-agent information merging.

A Deep-Learning Based Prediction of Pancreatic Adenocarcinoma with Electronic Health Records from the State of Maine

Predicting the risk of Pancreatic Adenocarcinoma (PA) in advance can benefit the quality of care and potentially reduce population mortality and morbidity. The aim of this study was to develop and prospectively validate a risk prediction model to identify patients at risk of new incident PA as early as 3 months before the onset of PA in a statewide, general population in Maine. The PA prediction model was developed using Deep Neural Networks, a deep learning algorithm, with a 2-year electronic-health-record (EHR) cohort. Prospective results showed that our model identified 54.35% of all inpatient episodes of PA, and 91.20% of all PA that required subsequent chemoradiotherapy, with a lead-time of up to 3 months and a true alert of 67.62%. The risk assessment tool has attained an improved discriminative ability. It can be immediately deployed to the health system to provide automatic early warnings to adults at risk of PA. It has potential to identify personalized risk factors to facilitate customized PA interventions.

SNR Classification Using Multiple CNNs

Noise estimation is essential in today wireless systems for power control, adaptive modulation, interference suppression and quality of service. Deep learning (DL) has already been applied in the physical layer for modulation and signal classifications. Unacceptably low accuracy of less than 50% is found to undermine traditional application of DL classification for SNR prediction. In this paper, we use divide-and-conquer algorithm and classifier fusion method to simplify SNR classification and therefore enhances DL learning and prediction. Specifically, multiple CNNs are used for classification rather than a single CNN. Each CNN performs a binary classification of a single SNR with two labels: less than, greater than or equal. Together, multiple CNNs are combined to effectively classify over a range of SNR values from −20 ≤ SNR ≤ 32 dB.We use pre-trained CNNs to predict SNR over a wide range of joint channel parameters including multiple Doppler shifts (0, 60, 120 Hz), power-delay profiles, and signal-modulation types (QPSK,16QAM,64-QAM). The approach achieves individual SNR prediction accuracy of 92%, composite accuracy of 70% and prediction convergence one order of magnitude faster than that of traditional estimation.

Multi-Agent Searching Adaptation Using Levy Flight and Inferential Reasoning

In this paper, we describe how to achieve knowledge understanding and prediction (Situation Awareness (SA)) for multiple-agents conducting searching activity using Bayesian inferential reasoning and learning. Bayesian Belief Network was used to monitor agents' knowledge about their environment, and cases are recorded for the network training using expectation-maximisation or gradient descent algorithm. The well trained network will be used for decision making and environmental situation prediction. Forest fire searching by multiple UAVs was the use case. UAVs are tasked to explore a forest and find a fire for urgent actions by the fire wardens. The paper focused on two problems: (i) effective agents’ path planning strategy and (ii) knowledge understanding and prediction (SA). The path planning problem by inspiring animal mode of foraging using Lévy distribution augmented with Bayesian reasoning was fully described in this paper. Results proof that the Lévy flight strategy performs better than the previous fixed-pattern (e.g., parallel sweeps) approaches in terms of energy and time utilisation. We also introduced a waypoint assessment strategy called k-previous waypoints assessment. It improves the performance of the ordinary levy flight by saving agent’s resources and mission time through redundant search avoidance. The agents (UAVs) are to report their mission knowledge at the central server for interpretation and prediction purposes. Bayesian reasoning and learning were used for the SA and results proof effectiveness in different environments scenario in terms of prediction and effective knowledge representation. The prediction accuracy was measured using learning error rate, logarithm loss, and Brier score and the result proves that little agents mission that can be used for prediction within the same or different environment. Finally, we described a situation-based knowledge visualization and prediction technique for heterogeneous multi-UAV mission. While this paper proves linkage of Bayesian reasoning and learning with SA and effective searching strategy, future works is focusing on simplifying the architecture.