Towards End-To-End Disease Prediction from Raw Metagenomic Data

Analysis of the human microbiome using metagenomic sequencing data has demonstrated high ability in discriminating various human diseases. Raw metagenomic sequencing data require multiple complex and computationally heavy bioinformatics steps prior to data analysis. Such data contain millions of short sequences read from the fragmented DNA sequences and stored as fastq files. Conventional processing pipelines consist in multiple steps including quality control, filtering, alignment of sequences against genomic catalogs (genes, species, taxonomic levels, functional pathways, etc.). These pipelines are complex to use, time consuming and rely on a large number of parameters that often provide variability and impact the estimation of the microbiome elements. Training Deep Neural Networks directly from raw sequencing data is a promising approach to bypass some of the challenges associated with mainstream bioinformatics pipelines. Most of these methods use the concept of word and sentence embeddings that create a meaningful and numerical representation of DNA sequences, while extracting features and reducing the dimensionality of the data. In this paper we present an end-to-end approach that classifies patients into disease groups directly from raw metagenomic reads: metagenome2vec. This approach is composed of four steps (i) generating a vocabulary of k-mers and learning their numerical embeddings; (ii) learning DNA sequence (read) embeddings; (iii) identifying the genome from which the sequence is most likely to come and (iv) training a multiple instance learning classifier which predicts the phenotype based on the vector representation of the raw data. An attention mechanism is applied in the network so that the model can be interpreted, assigning a weight to the influence of the prediction for each genome. Using two public real-life data-sets as well a simulated one, we demonstrated that this original approach reaches high performance, comparable with the state-of-the-art methods applied directly on processed data though mainstream bioinformatics workflows. These results are encouraging for this proof of concept work. We believe that with further dedication, the DNN models have the potential to surpass mainstream bioinformatics workflows in disease classification tasks.

Solid State Drive End to End Reliability Prediction, Characterization and Control

A flaw or drift from expected operational performance in one component (NAND, PMIC, controller, DRAM, etc.) may affect the reliability of the entire Solid State Drive (SSD) system. Therefore, it is important to ensure the required quality of each individual component through qualification testing specified using standards or user requirements. Qualification testing is time-consuming and comes at a substantial cost for product manufacturers. A highly technical team, from all the eminent stakeholders is embarking on reliability prediction from beginning of new product development, identify critical to reliability parameters, perform full-blown characterization to embed margin into product reliability and establish control to ensure the product reliability is sustainable in the mass production. The paper will discuss a comprehensive development framework, comprehending SSD end to end from design to assembly, in-line inspection, in-line testing and will be able to predict and to validate the product reliability at the early stage of new product development. During the design stage, the SSD will go through intense reliability margin investigation with focus on assembly process attributes, process equipment control, in-process metrology and also comprehending forward looking product roadmap. Once these pillars are completed, the next step is to perform process characterization and build up reliability prediction modeling. Next, for the design validation process, the reliability prediction specifically solder joint simulator will be established. The SSD will be stratified into Non-Operating and Operating tests with focus on solder joint reliability and connectivity/component latent failures by prevention through design intervention and containment through Temperature Cycle Test (TCT). Some of the SSDs will be subjected to the physical solder joint analysis called Dye and Pry (DP) and Cross Section analysis. The result will be feedbacked to the simulation team for any corrective actions required to further improve the design. Once the SSD is validated and is proven working, it will be subjected to implementation of the monitor phase whereby Design for Assembly (DFA) rules will be updated. At this stage, the design change, process and equipment parameters are in control. Predictable product reliability at early product development will enable on-time sample qualification delivery to customer and will optimize product development validation, effective development resource and will avoid forced late investment to bandage the end-of-life product failures. Understanding the critical to reliability parameters earlier will allow focus on increasing the product margin that will increase customer confidence to product reliability.

Electrical Effects during the Wetting-Drying Cycle of Porous Brickwork: Electrical Aspects of Rising Damp

Rising damp is an extremely complex phenomenon that is of great practical interest to the field of building conservation due to the irreversible damages it can make to old and historic structures. The electrical effects occurring in damp masonry have been scarcely researched and are a largely unknown aspect of rising damp. Present paper describes the typical electrical patterns occurring in porous brickwork during a wetting and drying cycle. It has been found that in contrast with dry masonry, where electrical phenomena are virtually non-existent, damp masonry exhibits a wide array of electrical effects. Long-term real-time measurements performed in the lab on small-scale brick structures, using an array of embedded micro-sensors, revealed significant voltage, current, capacitance and resistance variations which can be linked to the movement of moisture inside porous materials. The same measurements performed on actual old buildings revealed a similar behaviour, the electrical effects being more significant in areas of the brickwork affected by rising damp. Understanding these electrical phenomena contributes to a better understanding of the driving mechanisms of rising damp, potentially opening new avenues of dealing with it in a less invasive manner.

Improving Subjective Bias Detection Using Bidirectional Encoder Representations from Transformers and Bidirectional Long Short-Term Memory

Detecting subjectively biased statements is a vital task. This is because this kind of bias, when present in the text or other forms of information dissemination media such as news, social media, scientific texts, and encyclopedias, can weaken trust in the information and stir conflicts amongst consumers. Subjective bias detection is also critical for many Natural Language Processing (NLP) tasks like sentiment analysis, opinion identification, and bias neutralization. Having a system that can adequately detect subjectivity in text will boost research in the above-mentioned areas significantly. It can also come in handy for platforms like Wikipedia, where the use of neutral language is of importance. The goal of this work is to identify the subjectively biased language in text on a sentence level. With machine learning, we can solve complex AI problems, making it a good fit for the problem of subjective bias detection. A key step in this approach is to train a classifier based on BERT (Bidirectional Encoder Representations from Transformers) as upstream model. BERT by itself can be used as a classifier; however, in this study, we use BERT as data preprocessor as well as an embedding generator for a Bi-LSTM (Bidirectional Long Short-Term Memory) network incorporated with attention mechanism. This approach produces a deeper and better classifier. We evaluate the effectiveness of our model using the Wiki Neutrality Corpus (WNC), which was compiled from Wikipedia edits that removed various biased instances from sentences as a benchmark dataset, with which we also compare our model to existing approaches. Experimental analysis indicates an improved performance, as our model achieved state-of-the-art accuracy in detecting subjective bias. This study focuses on the English language, but the model can be fine-tuned to accommodate other languages.

Self-Organization-Based Approach for Embedded Real-Time System Design

This paper proposes a self-organization-based approach for real-time systems design. The addressed issue is the mapping of an application onto an architecture of heterogeneous processors while optimizing both makespan and reliability. Since this problem is NP-hard, a heuristic algorithm is used to obtain efficiently approximate solutions. The proposed approach takes into consideration the quality as well as the diversity of solutions. Indeed, an alternate treatment of the two objectives allows to produce solutions of good quality while a self-organization approach based on the neighborhood structure is used to reorganize solutions and consequently to enhance their diversity. Produced solutions make different compromises between the makespan and the reliability giving the user the possibility to select the solution suited to his (her) needs.

Destination Decision Model for Cruising Taxis Based on Embedding Model

In Japan, taxi is one of the popular transportations and taxi industry is one of the big businesses. However, in recent years, there has been a difficult problem of reducing the number of taxi drivers. In the taxi business, mainly three passenger catching methods are applied. One style is "cruising" that drivers catches passengers while driving on a road. Second is "waiting" that waits passengers near by the places with many requirements for taxies such as entrances of hospitals, train stations. The third one is "dispatching" that is allocated based on the contact from the taxi company. Above all, the cruising taxi drivers need the experience and intuition for finding passengers, and it is difficult to decide "the destination for cruising". The strong recommendation system for the cruising taxies supports the new drivers to find passengers, and it can be the solution for the decreasing the number of drivers in the taxi industry. In this research, we propose a method of recommending a destination for cruising taxi drivers. On the other hand, as a machine learning technique, the embedding models that embed the high dimensional data to a low dimensional space is widely used for the data analysis, in order to represent the relationship of the meaning between the data clearly. Taxi drivers have their favorite courses based on their experiences, and the courses are different for each driver. We assume that the course of cruising taxies has meaning such as the course for finding business man passengers (go around the business area of the city of go to main stations) and course for finding traveler passengers (go around the sightseeing places or big hotels), and extract the meaning of their destinations. We analyze the cruising history data of taxis based on the embedding model and propose the recommendation system for passengers. Finally, we demonstrate the recommendation of destinations for cruising taxi drivers based on the real-world data analysis using proposing method.

Evaluating the Impact of Replacement Policies on the Cache Performance and Energy Consumption in Different Multicore Embedded Systems

The cache has an important role in the reduction of access delay between a processor and memory in high-performance embedded systems. In these systems, the energy consumption is one of the most important concerns, and it will become more important with smaller processor feature sizes and higher frequencies. Meanwhile, the cache system dissipates a significant portion of energy compared to the other components of a processor. There are some elements that can affect the energy consumption of the cache such as replacement policy and degree of associativity. Due to these points, it can be inferred that selecting an appropriate configuration for the cache is a crucial part of designing a system. In this paper, we investigate the effect of different cache replacement policies on both cache’s performance and energy consumption. Furthermore, the impact of different Instruction Set Architectures (ISAs) on cache’s performance and energy consumption has been investigated.

Development of a Feedback Control System for a Lab-Scale Biomass Combustion System Using Programmable Logic Controller

The application of combustion technologies for thermal conversion of biomass and solid wastes to energy has been a major solution to the effective handling of wastes over a long period of time. Lab-scale biomass combustion systems have been observed to be economically viable and socially acceptable, but major concerns are the environmental impacts of the process and deviation of temperature distribution within the combustion chamber. Both high and low combustion chamber temperature may affect the overall combustion efficiency and gaseous emissions. Therefore, there is an urgent need to develop a control system which measures the deviations of chamber temperature from set target values, sends these deviations (which generates disturbances in the system) in the form of feedback signal (as input), and control operating conditions for correcting the errors. In this research study, major components of the feedback control system were determined, assembled, and tested. In addition, control algorithms were developed to actuate operating conditions (e.g., air velocity, fuel feeding rate) using ladder logic functions embedded in the Programmable Logic Controller (PLC). The developed control algorithm having chamber temperature as a feedback signal is integrated into the lab-scale swirling fluidized bed combustor (SFBC) to investigate the temperature distribution at different heights of the combustion chamber based on various operating conditions. The air blower rates and the fuel feeding rates obtained from automatic control operations were correlated with manual inputs. There was no observable difference in the correlated results, thus indicating that the written PLC program functions were adequate in designing the experimental study of the lab-scale SFBC. The experimental results were analyzed to study the effect of air velocity operating at 222-273 ft/min and fuel feeding rate of 60-90 rpm on the chamber temperature. The developed temperature-based feedback control system was shown to be adequate in controlling the airflow and the fuel feeding rate for the overall biomass combustion process as it helps to minimize the steady-state error.

Electronic Structure Calculation of AsSiTeB/SiAsBTe Nanostructures Using Density Functional Theory

The electronic structure calculation for the nanoclusters of AsSiTeB/SiAsBTe quaternary semiconductor alloy belonging to the III-V Group elements was performed. Motivation for this research work was to look for accurate electronic and geometric data of small nanoclusters of AsSiTeB/SiAsBTe in the gaseous form. The two clusters, one in the linear form and the other in the bent form, were studied under the framework of Density Functional Theory (DFT) using the B3LYP functional and LANL2DZ basis set with the software packaged Gaussian 16. We have discussed the Optimized Energy, Frontier Orbital Energy Gap in terms of HOMO-LUMO, Dipole Moment, Ionization Potential, Electron Affinity, Binding Energy, Embedding Energy, Density of States (DoS) spectrum for both structures. The important findings of the predicted nanostructures are that these structures have wide band gap energy, where linear structure has band gap energy (Eg) value is 2.375 eV and bent structure (Eg) value is 2.778 eV. Therefore, these structures can be utilized as wide band gap semiconductors. These structures have high electron affinity value of 4.259 eV for the linear structure and electron affinity value of 3.387 eV for the bent structure form. It shows that electron acceptor capability is high for both forms. The widely known application of these compounds is in the light emitting diodes due to their wide band gap nature.

Embedded Semantic Segmentation Network Optimized for Matrix Multiplication Accelerator

Autonomous driving systems require high reliability to provide people with a safe and comfortable driving experience. However, despite the development of a number of vehicle sensors, it is difficult to always provide high perceived performance in driving environments that vary from time to season. The image segmentation method using deep learning, which has recently evolved rapidly, provides high recognition performance in various road environments stably. However, since the system controls a vehicle in real time, a highly complex deep learning network cannot be used due to time and memory constraints. Moreover, efficient networks are optimized for GPU environments, which degrade performance in embedded processor environments equipped simple hardware accelerators. In this paper, a semantic segmentation network, matrix multiplication accelerator network (MMANet), optimized for matrix multiplication accelerator (MMA) on Texas instrument digital signal processors (TI DSP) is proposed to improve the recognition performance of autonomous driving system. The proposed method is designed to maximize the number of layers that can be performed in a limited time to provide reliable driving environment information in real time. First, the number of channels in the activation map is fixed to fit the structure of MMA. By increasing the number of parallel branches, the lack of information caused by fixing the number of channels is resolved. Second, an efficient convolution is selected depending on the size of the activation. Since MMA is a fixed, it may be more efficient for normal convolution than depthwise separable convolution depending on memory access overhead. Thus, a convolution type is decided according to output stride to increase network depth. In addition, memory access time is minimized by processing operations only in L3 cache. Lastly, reliable contexts are extracted using the extended atrous spatial pyramid pooling (ASPP). The suggested method gets stable features from an extended path by increasing the kernel size and accessing consecutive data. In addition, it consists of two ASPPs to obtain high quality contexts using the restored shape without global average pooling paths since the layer uses MMA as a simple adder. To verify the proposed method, an experiment is conducted using perfsim, a timing simulator, and the Cityscapes validation sets. The proposed network can process an image with 640 x 480 resolution for 6.67 ms, so six cameras can be used to identify the surroundings of the vehicle as 20 frame per second (FPS). In addition, it achieves 73.1% mean intersection over union (mIoU) which is the highest recognition rate among embedded networks on the Cityscapes validation set.

Security in Resource Constraints Network Light Weight Encryption for Z-MAC

Wireless sensor network was formed by a combination of nodes, systematically it transmitting the data to their base stations, this transmission data can be easily compromised if the limited processing power and the data consistency from these nodes are kept in mind; there is always a discussion to address the secure data transfer or transmission in actual time. This will present a mechanism to securely transmit the data over a chain of sensor nodes without compromising the throughput of the network by utilizing available battery resources available in the sensor node. Our methodology takes many different advantages of Z-MAC protocol for its efficiency, and it provides a unique key by sharing the mechanism using neighbor node MAC address. We present a light weighted data integrity layer which is embedded in the Z-MAC protocol to prove that our protocol performs well than Z-MAC when we introduce the different attack scenarios.

Application of UAS in Forest Firefighting for Detecting Ignitions and 3D Fuel Volume Estimation

The article presents results from the AF3 project “Advanced Forest Fire Fighting” focused on Unmanned Aircraft Systems (UAS)-based 3D surveillance and 3D area mapping using high-resolution photogrammetric methods from multispectral imaging, also taking advantage of the 3D scanning techniques from the SCAN4RECO project. We also present a proprietary embedded sensor system used for the detection of fire ignitions in the forest using near-infrared based scanner with weight and form factors allowing it to be easily deployed on standard commercial micro-UAVs, such as DJI Inspire or Mavic. Results from real-life pilot trials in Greece, Spain, and Israel demonstrated added-value in the use of UAS for precise and reliable detection of forest fires, as well as high-resolution 3D aerial modeling for accurate quantification of human resources and equipment required for firefighting.

Personal Information Classification Based on Deep Learning in Automatic Form Filling System

Recently, the rapid development of deep learning makes artificial intelligence (AI) penetrate into many fields, replacing manual work there. In particular, AI systems also become a research focus in the field of automatic office. To meet real needs in automatic officiating, in this paper we develop an automatic form filling system. Specifically, it uses two classical neural network models and several word embedding models to classify various relevant information elicited from the Internet. When training the neural network models, we use less noisy and balanced data for training. We conduct a series of experiments to test my systems and the results show that our system can achieve better classification results.

Hybrid Equity Warrants Pricing Formulation under Stochastic Dynamics

A warrant is a financial contract that confers the right but not the obligation, to buy or sell a security at a certain price before expiration. The standard procedure to value equity warrants using call option pricing models such as the Black–Scholes model had been proven to contain many flaws, such as the assumption of constant interest rate and constant volatility. In fact, existing alternative models were found focusing more on demonstrating techniques for pricing, rather than empirical testing. Therefore, a mathematical model for pricing and analyzing equity warrants which comprises stochastic interest rate and stochastic volatility is essential to incorporate the dynamic relationships between the identified variables and illustrate the real market. Here, the aim is to develop dynamic pricing formulations for hybrid equity warrants by incorporating stochastic interest rates from the Cox-Ingersoll-Ross (CIR) model, along with stochastic volatility from the Heston model. The development of the model involves the derivations of stochastic differential equations that govern the model dynamics. The resulting equations which involve Cauchy problem and heat equations are then solved using partial differential equation approaches. The analytical pricing formulas obtained in this study comply with the form of analytical expressions embedded in the Black-Scholes model and other existing pricing models for equity warrants. This facilitates the practicality of this proposed formula for comparison purposes and further empirical study.

A Low-Cost Air Quality Monitoring Internet of Things Platform

In the present paper, a low cost, compact and modular Internet of Things (IoT) platform for air quality monitoring in urban areas is presented. This platform comprises of dedicated low cost, low power hardware and the associated embedded software that enable measurement of particles (PM2.5 and PM10), NO, CO, CO2 and O3 concentration in the air, along with relative temperature and humidity. This integrated platform acts as part of a greater air pollution data collecting wireless network that is able to monitor the air quality in various regions and neighborhoods of an urban area, by providing sensor measurements at a high rate that reaches up to one sample per second. It is therefore suitable for Big Data analysis applications such as air quality forecasts, weather forecasts and traffic prediction. The first real world test for the developed platform took place in Thessaloniki, Greece, where 16 devices were installed in various buildings in the city. In the near future, many more of these devices are going to be installed in the greater Thessaloniki area, giving a detailed air quality map of the city.

Efficient HAAR Wavelet Transform with Embedded Zerotrees of Wavelet Compression for Color Images

This study is expected to compress true color image with compression algorithms in color spaces to provide high compression rates. The need of high compression ratio is to improve storage space. Alternative aim is to rank compression algorithms in a suitable color space. The dataset is sequence of true color images with size 128 x 128. HAAR Wavelet is one of the famous wavelet transforms, has great potential and maintains image quality of color images. HAAR wavelet Transform using Set Partitioning in Hierarchical Trees (SPIHT) algorithm with different color spaces framework is applied to compress sequence of images with angles. Embedded Zerotrees of Wavelet (EZW) is a powerful standard method to sequence data. Hence the proposed compression frame work of HAAR wavelet, xyz color space, morphological gradient and applied image with EZW compression, obtained improvement to other methods, in terms of Compression Ratio, Mean Square Error, Peak Signal Noise Ratio and Bits Per Pixel quality measures.

A Study of the Trade-off Energy Consumption-Performance-Schedulability for DVFS Multicore Systems

Dynamic Voltage and Frequency Scaling (DVFS) multicore platforms are promising execution platforms that enable high computational performance, less energy consumption and flexibility in scheduling the system processes. However, the resulting interleaving and memory interference together with per-core frequency tuning make real-time guarantees hard to be delivered. Besides, energy consumption represents a strong constraint for the deployment of such systems on energy-limited settings. Identifying the system configurations that would achieve a high performance and consume less energy while guaranteeing the system schedulability is a complex task in the design of modern embedded systems. This work studies the trade-off between energy consumption, cores utilization and memory bottleneck and their impact on the schedulability of DVFS multicore time-critical systems with a hierarchy of shared memories. We build a model-based framework using Parametrized Timed Automata of UPPAAL to analyze the mutual impact of performance, energy consumption and schedulability of DVFS multicore systems, and demonstrate the trade-off on an actual case study.

The Flashnews as a Commercial Session of Political Marketing: The Content Analysis of the Embedded Political Narratives in Non-Political Media Products

Political communication in Hungary has undergone a significant change in the 2010s. One element of the transformation is the Flashnews. This media product was launched in March 2015 and since then 40-50 blocks are broadcasted, daily, on 5 channels. Flashnews blocks are condensed news sessions, containing the summary of political narratives. It starts with the introduction of the narrator, then, usually four news topics are presented and, finally, the narrator concludes the block. The block lasts only one minute and, therefore, it provides a blink session into the main narratives of political communication at the time. Beyond its rapid pace, what makes its avoidance difficult is that these blocks are always in the first position in the commercial break of a non-political media product. Although it is only one minute long, its significance is high. The content of the Flashnews reflects the main governmental narratives and, therefore, the Flashnews is part of the agenda-setting capacity of political communication. It reaches media consumers who have limited knowledge and interest in politics, and their use of media products is not politically related. For this audience, the Flashnews pops up in the same way as commercials. Due to its structure and appearance, the impact of Flashnews seems to be similar to commercials, imbedded into the break of media products. It activates existing knowledge constructs, builds up associational links and maintains their presence in a way that the recipient is not aware of the phenomenon. The research aims to examine the extent to which the Flashnews and the main news narratives are identical in their content. This aim is realized with the content analysis of the two news products by examining the Flashnews and the evening news during main sport events from 2016 to 2018. The initial hypothesis of the research is that Flashnews is a contribution to the news management technique for an effective articulation of political narratives in public service media channels.

Deep Learning Based, End-to-End Metaphor Detection in Greek with Recurrent and Convolutional Neural Networks

This paper presents and benchmarks a number of end-to-end Deep Learning based models for metaphor detection in Greek. We combine Convolutional Neural Networks and Recurrent Neural Networks with representation learning to bear on the metaphor detection problem for the Greek language. The models presented achieve exceptional accuracy scores, significantly improving the previous state-of-the-art results, which had already achieved accuracy 0.82. Furthermore, no special preprocessing, feature engineering or linguistic knowledge is used in this work. The methods presented achieve accuracy of 0.92 and F-score 0.92 with Convolutional Neural Networks (CNNs) and bidirectional Long Short Term Memory networks (LSTMs). Comparable results of 0.91 accuracy and 0.91 F-score are also achieved with bidirectional Gated Recurrent Units (GRUs) and Convolutional Recurrent Neural Nets (CRNNs). The models are trained and evaluated only on the basis of training tuples, the related sentences and their labels. The outcome is a state-of-the-art collection of metaphor detection models, trained on limited labelled resources, which can be extended to other languages and similar tasks.

Discussing Embedded versus Central Machine Learning in Wireless Sensor Networks

Machine learning (ML) can be implemented in Wireless Sensor Networks (WSNs) as a central solution or distributed solution where the ML is embedded in the nodes. Embedding improves privacy and may reduce prediction delay. In addition, the number of transmissions is reduced. However, quality factors such as prediction accuracy, fault detection efficiency and coordinated control of the overall system suffer. Here, we discuss and highlight the trade-offs that should be considered when choosing between embedding and centralized ML, especially for multihop networks. In addition, we present estimations that demonstrate the energy trade-offs between embedded and centralized ML. Although the total network energy consumption is lower with central prediction, it makes the network more prone for partitioning due to the high forwarding load on the one-hop nodes. Moreover, the continuous improvements in the number of operations per joule for embedded devices will move the energy balance toward embedded prediction.