Abstract: Analysis of the human microbiome using metagenomic
sequencing data has demonstrated high ability in discriminating
various human diseases. Raw metagenomic sequencing data require
multiple complex and computationally heavy bioinformatics steps
prior to data analysis. Such data contain millions of short sequences
read from the fragmented DNA sequences and stored as fastq files.
Conventional processing pipelines consist in multiple steps including
quality control, filtering, alignment of sequences against genomic
catalogs (genes, species, taxonomic levels, functional pathways,
etc.). These pipelines are complex to use, time consuming and
rely on a large number of parameters that often provide variability
and impact the estimation of the microbiome elements. Training
Deep Neural Networks directly from raw sequencing data is a
promising approach to bypass some of the challenges associated with
mainstream bioinformatics pipelines. Most of these methods use the
concept of word and sentence embeddings that create a meaningful
and numerical representation of DNA sequences, while extracting
features and reducing the dimensionality of the data. In this paper
we present an end-to-end approach that classifies patients into disease
groups directly from raw metagenomic reads: metagenome2vec. This
approach is composed of four steps (i) generating a vocabulary of
k-mers and learning their numerical embeddings; (ii) learning DNA
sequence (read) embeddings; (iii) identifying the genome from which
the sequence is most likely to come and (iv) training a multiple
instance learning classifier which predicts the phenotype based on
the vector representation of the raw data. An attention mechanism
is applied in the network so that the model can be interpreted,
assigning a weight to the influence of the prediction for each genome.
Using two public real-life data-sets as well a simulated one, we
demonstrated that this original approach reaches high performance,
comparable with the state-of-the-art methods applied directly on
processed data though mainstream bioinformatics workflows. These
results are encouraging for this proof of concept work. We believe
that with further dedication, the DNN models have the potential to
surpass mainstream bioinformatics workflows in disease classification
tasks.
Abstract: A flaw or drift from expected operational performance in one component (NAND, PMIC, controller, DRAM, etc.) may affect the reliability of the entire Solid State Drive (SSD) system. Therefore, it is important to ensure the required quality of each individual component through qualification testing specified using standards or user requirements. Qualification testing is time-consuming and comes at a substantial cost for product manufacturers. A highly technical team, from all the eminent stakeholders is embarking on reliability prediction from beginning of new product development, identify critical to reliability parameters, perform full-blown characterization to embed margin into product reliability and establish control to ensure the product reliability is sustainable in the mass production. The paper will discuss a comprehensive development framework, comprehending SSD end to end from design to assembly, in-line inspection, in-line testing and will be able to predict and to validate the product reliability at the early stage of new product development. During the design stage, the SSD will go through intense reliability margin investigation with focus on assembly process attributes, process equipment control, in-process metrology and also comprehending forward looking product roadmap. Once these pillars are completed, the next step is to perform process characterization and build up reliability prediction modeling. Next, for the design validation process, the reliability prediction specifically solder joint simulator will be established. The SSD will be stratified into Non-Operating and Operating tests with focus on solder joint reliability and connectivity/component latent failures by prevention through design intervention and containment through Temperature Cycle Test (TCT). Some of the SSDs will be subjected to the physical solder joint analysis called Dye and Pry (DP) and Cross Section analysis. The result will be feedbacked to the simulation team for any corrective actions required to further improve the design. Once the SSD is validated and is proven working, it will be subjected to implementation of the monitor phase whereby Design for Assembly (DFA) rules will be updated. At this stage, the design change, process and equipment parameters are in control. Predictable product reliability at early product development will enable on-time sample qualification delivery to customer and will optimize product development validation, effective development resource and will avoid forced late investment to bandage the end-of-life product failures. Understanding the critical to reliability parameters earlier will allow focus on increasing the product margin that will increase customer confidence to product reliability.
Abstract: Asset management of urban infrastructures faces a multitude of challenges that need to be overcome to obtain a reliable measurement of performances. Predicting the performance of slowly aging systems is one of those challenges, which helps the asset manager to investigate specific failure modes and to undertake the appropriate maintenance and rehabilitation interventions to avoid catastrophic failures as well as to optimize the maintenance costs. This article presents a methodology for modeling the deterioration of slowly degrading assets based on an operating history. It consists of extracting degradation profiles by grouping together assets that exhibit similar degradation sequences using an unsupervised classification technique derived from artificial intelligence. The obtained clusters are used to build the performance prediction models. This methodology is applied to a sample of a stormwater drainage culvert dataset.
Abstract: International guidelines recommend removing any
artificial body in Low Earth Orbit (LEO) within 25 years from
mission completion. Among disposal strategies, electrodynamic
tethers appear to be a promising option for LEO, thanks to the
limited storage mass and the minimum interface requirements to the
host spacecraft. In particular, recent technological advances make it
feasible to deorbit large objects with tether lengths of a few kilometers
or less. To further investigate such an innovative passive system,
the European Union is currently funding the project E.T.PACK
– Electrodynamic Tether Technology for Passive Consumable-less
Deorbit Kit in the framework of the H2020 Future Emerging
Technologies (FET) Open program. The project focuses on the design
of an end of life disposal kit for LEO satellites. This kit aims to
deploy a taped tether that can be activated at the spacecraft end of life
to perform autonomous deorbit within the international guidelines.
In this paper, the orbital performance of the E.T.PACK deorbiting
kit is compared to other disposal methods. Besides, the orbital decay
prediction is parametrized as a function of spacecraft mass and tether
system performance. Different values of length, width, and thickness
of the tether will be evaluated for various scenarios (i.e., different
initial orbital parameters). The results will be compared to other
end-of-life disposal methods with similar allocated resources. The
analysis of the more innovative system’s performance with the tape
coated with a thermionic material, which has a low work-function
(LWT), for which no active component for the cathode is required,
will also be briefly discussed. The results show that the electrodynamic tether option can be a
competitive and performant solution for satellite disposal compared
to other deorbit technologies.
Abstract: Sentiment analysis is a broad and expanding field that aims to extract and classify opinions from textual data. Lexicon-based approaches are based on the use of a sentiment lexicon, i.e., a list of words each mapped to a sentiment score, to rate the sentiment of a text chunk. Our work focuses on predicting stock price change using a sentiment lexicon built from financial conference call logs. We present a method to generate a sentiment lexicon based upon an existing probabilistic approach. By using a domain-specific lexicon, we outperform traditional techniques and demonstrate that domain-specific sentiment lexicons provide higher accuracy than generic sentiment lexicons when predicting stock price change.
Abstract: Research in predictive maintenance modeling has improved in the recent years to predict failures and needed maintenance with high accuracy, saving cost and improving manufacturing efficiency. However, classic prediction models provide little valuable insight towards the most important features contributing to the failure. By analyzing and quantifying feature importance in predictive maintenance models, cost saving can be optimized based on business goals. First, multiple classifiers are evaluated with cross-validation to predict the multi-class of failures. Second, predictive performance with features provided by different feature selection algorithms are further analyzed. Third, features selected by different algorithms are ranked and combined based on their predictive power. Finally, linear explainer SHAP (SHapley Additive exPlanations) is applied to interpret classifier behavior and provide further insight towards the specific roles of features in both local predictions and global model behavior. The results of the experiments suggest that certain features play dominant roles in predictive models while others have significantly less impact on the overall performance. Moreover, for multi-class prediction of machine failures, the most important features vary with type of machine failures. The results may lead to improved productivity and cost saving by prioritizing sensor deployment, data collection, and data processing of more important features over less importance features.
Abstract: SARS-CoV-2 virus is currently one of the most
infectious pathogens for humans. It started in China at the end of
2019 and now it is spread in all over the world. The origin and
diffusion of the SARS-CoV-2 epidemic, is analysed based on the
discussion of viral phylogeny theory. With the aim of understanding
the spread of infection in the affected countries, it is crucial to
modelize the spread of the virus and simulate its activity. In this
paper, the prediction of coronavirus outbreak is done by using SIR
model without vital dynamics, applying different numerical technique
solving ordinary differential equations (ODEs). We find out that ABM
and MRT methods perform better than other techniques and that the
activity of the virus will decrease in April but it never cease (for
some time the activity will remain low) and the next cycle will start
in the middle July 2020 for Norway and Denmark, and October 2020
for Sweden, and September for Finland.
Abstract: Different approaches have been used to predict the performance of the vertical axis wind turbines (VAWT), such as experimental, computational fluid dynamics (CFD), and analytical methods. Analytical methods, such as momentum models that use streamtubes, have low computational cost and sufficient accuracy. The double multiple streamtube (DMST) is one of the most commonly used of momentum models, which divide the rotor plane of VAWT into upwind and downwind. In fact, results from the DMST method have shown some discrepancy compared with experiment results; that is because the Darrieus turbine is a complex and aerodynamically unsteady configuration. In this study, analytical-experimental-based corrections, including dynamic stall, streamtube expansion, and finite blade length correction are used to improve the DMST method. Results indicated that using these corrections for a SANDIA 17-m VAWT will lead to improving the results of DMST.
Abstract: The demand for renewable energy is significantly increasing, major investments are being supplied to the wind power generation industry as a leading source of clean energy. The wind energy sector is entirely dependable and driven by the prediction of wind speed, which by the nature of wind is very stochastic and widely random. This s0tudy employs deep multi-fidelity Gaussian process regression, used to predict wind speeds for medium term time horizons. Data of the RUNE experiment in the west coast of Denmark were provided by the Technical University of Denmark, which represent the wind speed across the study area from the period between December 2015 and March 2016. The study aims to investigate the effect of pre-processing the data by denoising the signal using empirical wavelet transform (EWT) and engaging the vector components of wind speed to increase the number of input data layers for data fusion using deep multi-fidelity Gaussian process regression (GPR). The outcomes were compared using root mean square error (RMSE) and the results demonstrated a significant increase in the accuracy of predictions which demonstrated that using vector components of the wind speed as additional predictors exhibits more accurate predictions than strategies that ignore them, reflecting the importance of the inclusion of all sub data and pre-processing signals for wind speed forecasting models.
Abstract: The emergence of digital twin technology, a digital replica of physical world, has improved the real-time access to data from sensors about the performance of buildings. This digital transformation has opened up many opportunities to improve the management of the building by using the data collected to help monitor consumption patterns and energy leakages. One example is the integration of predictive models for anomaly detection. In this paper, we use the GAM (Generalised Additive Model) for the anomaly detection of Air Handling Units (AHU) power consumption pattern. There is ample research work on the use of GAM for the prediction of power consumption at the office building and nation-wide level. However, there is limited illustration of its anomaly detection capabilities, prescriptive analytics case study, and its integration with the latest development of digital twin technology. In this paper, we applied the general GAM modelling framework on the historical data of the AHU power consumption and cooling load of the building between Jan 2018 to Aug 2019 from an education campus in Singapore to train prediction models that, in turn, yield predicted values and ranges. The historical data are seamlessly extracted from the digital twin for modelling purposes. We enhanced the utility of the GAM model by using it to power a real-time anomaly detection system based on the forward predicted ranges. The magnitude of deviation from the upper and lower bounds of the uncertainty intervals is used to inform and identify anomalous data points, all based on historical data, without explicit intervention from domain experts. Notwithstanding, the domain expert fits in through an optional feedback loop through which iterative data cleansing is performed. After an anomalously high or low level of power consumption detected, a set of rule-based conditions are evaluated in real-time to help determine the next course of action for the facilities manager. The performance of GAM is then compared with other approaches to evaluate its effectiveness. Lastly, we discuss the successfully deployment of this approach for the detection of anomalous power consumption pattern and illustrated with real-world use cases.
Abstract: The impact the climate change is not recent, the main variable in the hydrological cycle is the sequence and shortage of a drought, which has a significant impact on the socioeconomic, agricultural and environmental spheres. This study aims to characterize and quantify, based on precipitation climatic projections, the rainy and dry events in the region of the Uruguay River Basin, through the Standardized Precipitation Index (SPI). The database is the image that is part of the Intercomparison of Model Models, Phase 5 (CMIP5), which provides condition prediction models, organized according to the Representative Routes of Concentration (CPR). Compared to the normal set of climates in the Uruguay River Watershed through precipitation projections, seasonal precipitation increases for all proposed scenarios, with a low climate trend. From the data of this research, the idea is that this article can be used to support research and the responsible bodies can use it as a subsidy for mitigation measures in other hydrographic basins.
Abstract: Development of two real-time surface roughness (Ra) prediction systems for milling operations was attempted. The systems used not only cutting parameters, such as feed rate and spindle speed, but also the cutting current generated and corrected by a clamp type energy sensor. Two different approaches were developed. First, a fuzzy inference system (FIS), in which the fuzzy logic rules are generated by experts in the milling processes, was used to conduct prediction modeling using current cutting data. Second, a neuro-fuzzy system (ANFIS) was explored. Neuro-fuzzy systems are adaptive techniques in which data are collected on the network, processed, and rules are generated by the system. The inference system then uses these rules to predict Ra as the output. Experimental results showed that the parameters of spindle speed, feed rate, depth of cut, and input current variation could predict Ra. These two systems enable the prediction of Ra during the milling operation with an average of 91.83% and 94.48% accuracy by FIS and ANFIS systems, respectively. Statistically, the ANFIS system provided better prediction accuracy than that of the FIS system.
Abstract: A reliable, real-time, and non-invasive system that can identify patients at risk for hemodynamic instability is needed to aid clinicians in their efforts to anticipate patient deterioration and initiate early interventions. The purpose of this pilot study was to explore the clinical capabilities of a real-time analytic from a single lead of an electrocardiograph to correctly distinguish between rapid response team (RRT) activations due to hemodynamic (H-RRT) and non-hemodynamic (NH-RRT) causes, as well as predict H-RRT cases with actionable lead times. The study consisted of a single center, retrospective cohort of 21 patients with RRT activations from step-down and telemetry units. Through electronic health record review and blinded to the analytic’s output, each patient was categorized by clinicians into H-RRT and NH-RRT cases. The analytic output and the categorization were compared. The prediction lead time prior to the RRT call was calculated. The analytic correctly distinguished between H-RRT and NH-RRT cases with 100% accuracy, demonstrating 100% positive and negative predictive values, and 100% sensitivity and specificity. In H-RRT cases, the analytic detected hemodynamic deterioration with a median lead time of 9.5 hours prior to the RRT call (range 14 minutes to 52 hours). The study demonstrates that an electrocardiogram (ECG) based analytic has the potential for providing clinical decision and monitoring support for caregivers to identify at risk patients within a clinically relevant timeframe allowing for increased vigilance and early interventional support to reduce the chances of continued patient deterioration.
Abstract: This article presents the application of the semi-analytic method (SAM) in the thermal management solution (TMS) of the energy storage system (ESS). The TMS studied in this work is fluid cooling. In fluid cooling, both effective heat conduction and heat convection are indispensable due to the heat transfer from solid to fluid. Correspondingly, an efficient TMS requires a design investigation of the following parameters: fluid inlet temperature, ESS initial temperature, fluid flow rate, working c rate, continuous working time, and materials properties. Their variation induces a change of thermal performance in the battery module, which is usually evaluated by numerical simulation. Compared to complicated computation resources and long computation time in simulation, the SAM is developed in this article to predict the thermal influence within a few seconds. In SAM, a fast prediction model is reckoned by combining numerical simulation with theoretical/empirical equations. The SAM can explore the thermal effect of boundary parameters in both steady-state and transient heat transfer scenarios within a short time. Therefore, the SAM developed in this work can simplify the design cycle of TMS and inspire more possibilities in TMS design.
Abstract: In the present paper, a low cost, compact and modular Internet of Things (IoT) platform for air quality monitoring in urban areas is presented. This platform comprises of dedicated low cost, low power hardware and the associated embedded software that enable measurement of particles (PM2.5 and PM10), NO, CO, CO2 and O3 concentration in the air, along with relative temperature and humidity. This integrated platform acts as part of a greater air pollution data collecting wireless network that is able to monitor the air quality in various regions and neighborhoods of an urban area, by providing sensor measurements at a high rate that reaches up to one sample per second. It is therefore suitable for Big Data analysis applications such as air quality forecasts, weather forecasts and traffic prediction. The first real world test for the developed platform took place in Thessaloniki, Greece, where 16 devices were installed in various buildings in the city. In the near future, many more of these devices are going to be installed in the greater Thessaloniki area, giving a detailed air quality map of the city.
Abstract: Telecommunication service providers demand accurate
and precise prediction of customer churn probabilities to increase the
effectiveness of their customer relation services. The large amount of
customer data owned by the service providers is suitable for analysis
by machine learning methods. In this study, expenditure data of
customers are analyzed by using an artificial neural network (ANN).
The ANN model is applied to the data of customers with different
billing duration. The proposed model successfully predicts the churn
probabilities at 83% accuracy for only three months expenditure data
and the prediction accuracy increases up to 89% when the nine month
data is used. The experiments also show that the accuracy of ANN
model increases on an extended feature set with information of the
changes on the bill amounts.
Abstract: In this paper, we describe how Bayesian inferential reasoning will contributes in obtaining a well-satisfied prediction for Distributed Constraint Optimization Problems (DCOPs) with uncertainties. We also demonstrate how DCOPs could be merged to multi-agent knowledge understand and prediction (i.e. Situation Awareness). The DCOPs functions were merged with Bayesian Belief Network (BBN) in the form of situation, awareness, and utility nodes. We describe how the uncertainties can be represented to the BBN and make an effective prediction using the expectation-maximization algorithm or conjugate gradient descent algorithm. The idea of variable prediction using Bayesian inference may reduce the number of variables in agents’ sampling domain and also allow missing variables estimations. Experiment results proved that the BBN perform compelling predictions with samples containing uncertainties than the perfect samples. That is, Bayesian inference can help in handling uncertainties and dynamism of DCOPs, which is the current issue in the DCOPs community. We show how Bayesian inference could be formalized with Distributed Situation Awareness (DSA) using uncertain and missing agents’ data. The whole framework was tested on multi-UAV mission for forest fire searching. Future work focuses on augmenting existing architecture to deal with dynamic DCOPs algorithms and multi-agent information merging.
Abstract: Predicting the risk of Pancreatic Adenocarcinoma (PA) in advance can benefit the quality of care and potentially reduce population mortality and morbidity. The aim of this study was to develop and prospectively validate a risk prediction model to identify patients at risk of new incident PA as early as 3 months before the onset of PA in a statewide, general population in Maine. The PA prediction model was developed using Deep Neural Networks, a deep learning algorithm, with a 2-year electronic-health-record (EHR) cohort. Prospective results showed that our model identified 54.35% of all inpatient episodes of PA, and 91.20% of all PA that required subsequent chemoradiotherapy, with a lead-time of up to 3 months and a true alert of 67.62%. The risk assessment tool has attained an improved discriminative ability. It can be immediately deployed to the health system to provide automatic early warnings to adults at risk of PA. It has potential to identify personalized risk factors to facilitate customized PA interventions.
Abstract: Noise estimation is essential in today wireless systems
for power control, adaptive modulation, interference suppression and
quality of service. Deep learning (DL) has already been applied in the
physical layer for modulation and signal classifications. Unacceptably
low accuracy of less than 50% is found to undermine traditional
application of DL classification for SNR prediction. In this paper,
we use divide-and-conquer algorithm and classifier fusion method
to simplify SNR classification and therefore enhances DL learning
and prediction. Specifically, multiple CNNs are used for classification
rather than a single CNN. Each CNN performs a binary classification
of a single SNR with two labels: less than, greater than or equal.
Together, multiple CNNs are combined to effectively classify over a
range of SNR values from −20 ≤ SNR ≤ 32 dB.We use pre-trained
CNNs to predict SNR over a wide range of joint channel parameters
including multiple Doppler shifts (0, 60, 120 Hz), power-delay
profiles, and signal-modulation types (QPSK,16QAM,64-QAM). The
approach achieves individual SNR prediction accuracy of 92%,
composite accuracy of 70% and prediction convergence one order
of magnitude faster than that of traditional estimation.
Abstract: In this paper, we describe how to achieve knowledge understanding and prediction (Situation Awareness (SA)) for multiple-agents conducting searching activity using Bayesian inferential reasoning and learning. Bayesian Belief Network was used to monitor agents' knowledge about their environment, and cases are recorded for the network training using expectation-maximisation or gradient descent algorithm. The well trained network will be used for decision making and environmental situation prediction. Forest fire searching by multiple UAVs was the use case. UAVs are tasked to explore a forest and find a fire for urgent actions by the fire wardens. The paper focused on two problems: (i) effective agents’ path planning strategy and (ii) knowledge understanding and prediction (SA). The path planning problem by inspiring animal mode of foraging using Lévy distribution augmented with Bayesian reasoning was fully described in this paper. Results proof that the Lévy flight strategy performs better than the previous fixed-pattern (e.g., parallel sweeps) approaches in terms of energy and time utilisation. We also introduced a waypoint assessment strategy called k-previous waypoints assessment. It improves the performance of the ordinary levy flight by saving agent’s resources and mission time through redundant search avoidance. The agents (UAVs) are to report their mission knowledge at the central server for interpretation and prediction purposes. Bayesian reasoning and learning were used for the SA and results proof effectiveness in different environments scenario in terms of prediction and effective knowledge representation. The prediction accuracy was measured using learning error rate, logarithm loss, and Brier score and the result proves that little agents mission that can be used for prediction within the same or different environment. Finally, we described a situation-based knowledge visualization and prediction technique for heterogeneous multi-UAV mission. While this paper proves linkage of Bayesian reasoning and learning with SA and effective searching strategy, future works is focusing on simplifying the architecture.