Abstract: Autonomous structural health monitoring (SHM) of many structures and bridges became a topic of paramount importance for maintenance purposes and safety reasons. This paper proposes a set of machine learning (ML) tools to perform automatic feature selection and detection of anomalies in a bridge from vibrational data and compare different feature extraction schemes to increase the accuracy and reduce the amount of data collected. As a case study, the Z-24 bridge is considered because of the extensive database of accelerometric data in both standard and damaged conditions. The proposed framework starts from the first four fundamental frequencies extracted through operational modal analysis (OMA) and clustering, followed by time-domain filtering (tracking). The fundamental frequencies extracted are then fed to a dimensionality reduction block implemented through two different approaches: feature selection (intelligent multiplexer) that tries to estimate the most reliable frequencies based on the evaluation of some statistical features (i.e., entropy, variance, kurtosis), and feature extraction (auto-associative neural network (ANN)) that combine the fundamental frequencies to extract new damage sensitive features in a low dimensional feature space. Finally, one-class classification (OCC) algorithms perform anomaly detection, trained with standard condition points, and tested with normal and anomaly ones. In particular, principal component analysis (PCA), kernel principal component analysis (KPCA), and autoassociative neural network (ANN) are presented and their performance are compared. It is also shown that, by evaluating the correct features, the anomaly can be detected with accuracy and an F1 score greater than 95%.
Abstract: Data in IT systems in enterprises have been growing at phenomenal pace. This has provided opportunities to run analytics to gather intelligence on key business parameters that enable them to provide better products and services to customers. While there are several Artificial Intelligence/Machine Learning (AI/ML) and Business Intelligence (BI) tools and technologies available in marketplace to run analytics, there is a need for an integrated view when developing intelligent solutions in enterprises. This paper progressively elaborates a reference model for enterprise solutions, builds an integrated view of data, information and intelligence components and presents a reference architecture for intelligent enterprise solutions. Finally, it applies the reference architecture to an insurance organization. The reference architecture is the outcome of experience and insights gathered from developing intelligent solutions for several organizations.
Abstract: The present study proposes a methodology for the efficient daily management of fleet vehicles and construction machinery. The application covers the area of remote monitoring of heavy-duty vehicles operation parameters, where specific sensor data are stored and examined in order to provide information about the vehicle’s health. The vehicle diagnostics allow the user to inspect whether maintenance tasks need to be performed before a fault occurs. A properly designed machine learning model is proposed for the detection of two different types of faults through classification. Cross validation is used and the accuracy of the trained model is checked with the confusion matrix.
Abstract: Swimming with the tide of deep learning, the field of music information retrieval (MIR) experiences parallel development and a sheer variety of feature-learning models has been applied to music classification and tagging tasks. Among those learning techniques, the deep convolutional neural networks (CNNs) have been widespreadly used with better performance than the traditional approach especially in music genre classification and prediction. However, regarding the music recommendation, there is a large semantic gap between the corresponding audio genres and the various aspects of a song that influence user preference. In our study, aiming to bridge the gap, we strive to construct an automatic music aesthetic annotation model with MIDI format for better comparison and measurement of the similarity between music pieces in the way of harmonic analysis. We use the matrix of qualification converted from MIDI files as input to train two different classifiers, support vector machine (SVM) and Decision Tree (DT). Experimental results in performance of a tag prediction task have shown that both learning algorithms are capable of extracting high-level properties in an end-to end manner from music information. The proposed model is helpful to learn the audience taste and then the resulting recommendations are likely to appeal to a niche consumer.
Abstract: We present a decision-support tool to assist an operator in the detection and tracking of a suspect vehicle traveling to an unknown target destination. Multiple data sources, such as traffic cameras, traffic information, weather, etc., are integrated and processed in real-time to infer a suspect’s intended destination chosen from a list of pre-determined high-value targets. Previously, we presented our work in the detection and tracking of vehicles using traffic and airborne cameras. Here, we focus on the fusion and processing of that information to predict a suspect’s behavior. The network of cameras is represented by a directional graph, where the edges correspond to direct road connections between the nodes and the edge weights are proportional to the average time it takes to travel from one node to another. For our experiments, we construct our graph based on the greater Los Angeles subset of the Caltrans’s “Performance Measurement System” (PeMS) dataset. We propose a Bayesian approach where a posterior probability for each target is continuously updated based on detections of the suspect in the live video feeds. Additionally, we introduce the concept of ‘soft interventions’, inspired by the field of Causal Inference. Soft interventions are herein defined as interventions that do not immediately interfere with the suspect’s movements; rather, a soft intervention may induce the suspect into making a new decision, ultimately making their intent more transparent. For example, a soft intervention could be temporarily closing a road a few blocks from the suspect’s current location, which may require the suspect to change their current course. The objective of these interventions is to gain the maximum amount of information about the suspect’s intent in the shortest possible time. Our system currently operates in a human-on-the-loop mode where at each step, a set of recommendations are presented to the operator to aid in decision-making. In principle, the system could operate autonomously, only prompting the operator for critical decisions, allowing the system to significantly scale up to larger areas and multiple suspects. Once the intended target is identified with sufficient confidence, the vehicle is reported to the authorities to take further action. Other recommendations include a selection of road closures, i.e., soft interventions, or to continue monitoring. We evaluate the performance of the proposed system using simulated scenarios where the suspect, starting at random locations, takes a noisy shortest path to their intended target. In all scenarios, the suspect’s intended target is unknown to our system. The decision thresholds are selected to maximize the chances of determining the suspect’s intended target in the minimum amount of time and with the smallest number of interventions. We conclude by discussing the limitations of our current approach to motivate a machine learning approach, based on reinforcement learning in order to relax some of the current limiting assumptions.
Abstract: Machine learning is a new and exciting area of
artificial intelligence nowadays. Machine learning is the most
valuable, time, supervised, and cost-effective approach. It is not a
narrow learning approach; it also includes a wide range of methods
and techniques that can be applied to a wide range of complex realworld
problems and time domains. Biological image classification,
adaptive testing, computer vision, natural language processing, object
detection, cancer detection, face recognition, handwriting
recognition, speech recognition, and many other applications of
machine learning are widely used in research, industry, and
government. Every day, more data are generated, and conventional
machine learning techniques are becoming obsolete as users move to
distributed and real-time operations. By providing fundamental
knowledge of machine learning tools and research opportunities in
the field, the aim of this article is to serve as both a comprehensive
overview and a guide. A diverse set of machine learning resources is
demonstrated and contrasted with the key features in this survey.
Abstract: Detecting subjectively biased statements is a vital task. This is because this kind of bias, when present in the text or other forms of information dissemination media such as news, social media, scientific texts, and encyclopedias, can weaken trust in the information and stir conflicts amongst consumers. Subjective bias detection is also critical for many Natural Language Processing (NLP) tasks like sentiment analysis, opinion identification, and bias neutralization. Having a system that can adequately detect subjectivity in text will boost research in the above-mentioned areas significantly. It can also come in handy for platforms like Wikipedia, where the use of neutral language is of importance. The goal of this work is to identify the subjectively biased language in text on a sentence level. With machine learning, we can solve complex AI problems, making it a good fit for the problem of subjective bias detection. A key step in this approach is to train a classifier based on BERT (Bidirectional Encoder Representations from Transformers) as upstream model. BERT by itself can be used as a classifier; however, in this study, we use BERT as data preprocessor as well as an embedding generator for a Bi-LSTM (Bidirectional Long Short-Term Memory) network incorporated with attention mechanism. This approach produces a deeper and better classifier. We evaluate the effectiveness of our model using the Wiki Neutrality Corpus (WNC), which was compiled from Wikipedia edits that removed various biased instances from sentences as a benchmark dataset, with which we also compare our model to existing approaches. Experimental analysis indicates an improved performance, as our model achieved state-of-the-art accuracy in detecting subjective bias. This study focuses on the English language, but the model can be fine-tuned to accommodate other languages.
Abstract: This paper deals with the random access procedure in next-generation networks and presents the solution to reduce total service time (TST) which is one of the most important performance metrics in current and future internet of things (IoT) based networks. The proposed solution focuses on the calculation of optimal transmission probability which maximizes the success probability and reduces TST. It uses the information of several idle preambles in every time slot, and based on it, it estimates the number of backlogged IoT devices using Naïve Bayes estimation which is a type of supervised learning in the machine learning domain. The estimation of backlogged devices is necessary since optimal transmission probability depends on it and the eNodeB does not have information about it. The simulations are carried out in MATLAB which verify that the proposed solution gives excellent performance.
Abstract: Promotion is a key element in the retail business. Thus, analysis of promotions to quantify their effectiveness in terms of Revenue and/or Margin is an essential activity in the retail industry. However, measuring the sales/revenue uplift is based on estimations, as the actual sales/revenue without the promotion is not present. Further, the presence of Halo and Cannibalization in a multiple parallel promotions’ scenario complicates the problem. Calculating Baseline by considering inter-brand/competitor items or using Halo and Cannibalization's impact on Revenue calculations by considering Baseline as an interpretation of items’ unit sales in neighboring nonpromotional weeks individually may not capture the overall Revenue uplift in the case of multiple parallel promotions. Hence, this paper proposes a Machine Learning based method for calculating the Revenue uplift by considering the Halo and Cannibalization impact on the Baseline and the Revenue. In the first section of the proposed methodology, Baseline of an item is calculated by incorporating the impact of the promotions on its related items. In the later section, the Revenue of an item is calculated by considering both Halo and Cannibalization impacts. Hence, this methodology enables correct calculation of the overall Revenue uplift due a given promotion.
Abstract: In Japan, taxi is one of the popular transportations and taxi industry is one of the big businesses. However, in recent years, there has been a difficult problem of reducing the number of taxi drivers. In the taxi business, mainly three passenger catching methods are applied. One style is "cruising" that drivers catches passengers while driving on a road. Second is "waiting" that waits passengers near by the places with many requirements for taxies such as entrances of hospitals, train stations. The third one is "dispatching" that is allocated based on the contact from the taxi company. Above all, the cruising taxi drivers need the experience and intuition for finding passengers, and it is difficult to decide "the destination for cruising". The strong recommendation system for the cruising taxies supports the new drivers to find passengers, and it can be the solution for the decreasing the number of drivers in the taxi industry. In this research, we propose a method of recommending a destination for cruising taxi drivers. On the other hand, as a machine learning technique, the embedding models that embed the high dimensional data to a low dimensional space is widely used for the data analysis, in order to represent the relationship of the meaning between the data clearly. Taxi drivers have their favorite courses based on their experiences, and the courses are different for each driver. We assume that the course of cruising taxies has meaning such as the course for finding business man passengers (go around the business area of the city of go to main stations) and course for finding traveler passengers (go around the sightseeing places or big hotels), and extract the meaning of their destinations. We analyze the cruising history data of taxis based on the embedding model and propose the recommendation system for passengers. Finally, we demonstrate the recommendation of destinations for cruising taxi drivers based on the real-world data analysis using proposing method.
Abstract: For security purposes, it is important to detect passwords entered by unauthorized users. With traditional alphanumeric passwords, if the content of a password is acquired and correctly entered by an intruder, it is impossible to differentiate the password entered by the intruder from those entered by the authorized user because the password entries contain precisely the same character set. However, no two entries for the gesture-based passwords, even those entered by the person who created the password, will be identical. There are always variations between entries, such as the shape and length of each stroke, the location of each stroke, and the speed of drawing. It is possible that passwords entered by the unauthorized user contain higher levels of variations when compared with those entered by the authorized user (the creator). The difference in the levels of variations may provide cues to detect unauthorized entries. To test this hypothesis, we designed an empirical study, collected and analyzed the data with the help of machine-learning algorithms. The results of the study are significant.
Abstract: Research in predictive maintenance modeling has improved in the recent years to predict failures and needed maintenance with high accuracy, saving cost and improving manufacturing efficiency. However, classic prediction models provide little valuable insight towards the most important features contributing to the failure. By analyzing and quantifying feature importance in predictive maintenance models, cost saving can be optimized based on business goals. First, multiple classifiers are evaluated with cross-validation to predict the multi-class of failures. Second, predictive performance with features provided by different feature selection algorithms are further analyzed. Third, features selected by different algorithms are ranked and combined based on their predictive power. Finally, linear explainer SHAP (SHapley Additive exPlanations) is applied to interpret classifier behavior and provide further insight towards the specific roles of features in both local predictions and global model behavior. The results of the experiments suggest that certain features play dominant roles in predictive models while others have significantly less impact on the overall performance. Moreover, for multi-class prediction of machine failures, the most important features vary with type of machine failures. The results may lead to improved productivity and cost saving by prioritizing sensor deployment, data collection, and data processing of more important features over less importance features.
Abstract: Network security is role of the ICT environment
because malicious users are continually growing that realm of
education, business, and then related with ICT. The network security
contravention is typically described and examined centrally based
on a security event management system. The firewalls, Intrusion
Detection System (IDS), and Intrusion Prevention System are
becoming essential to monitor or prevent of potential violations,
incidents attack, and imminent threats. In this system, the firewall
rules are set only for where the system policies are needed. Dataset
deployed in this system are derived from the testbed environment. The
traffic as in DoS and PortScan traffics are applied in the testbed with
firewall and IDS implementation. The network traffics are classified
as normal or attacks in the existing testbed environment based on
six machine learning classification methods applied in the system.
It is required to be tested to get datasets and applied for DoS and
PortScan. The dataset is based on CICIDS2017 and some features
have been added. This system tested 26 features from the applied
dataset. The system is to reduce false positive rates and to improve
accuracy in the implemented testbed design. The system also proves
good performance by selecting important features and comparing
existing a dataset by machine learning classifiers.
Abstract: In Abu Dhabi, there are many different education curriculums where sector of private schools and quality assurance is supervising many private schools in Abu Dhabi for many nationalities. As there are many different education curriculums in Abu Dhabi to meet expats’ needs, there are different requirements for registration and success. In addition, there are different age groups for starting education in each curriculum. In fact, each curriculum has a different number of years, assessment techniques, reassessment rules, and exam boards. Currently, students that transfer curriculums are not being placed in the right year group due to different start and end dates of each academic year and their date of birth for each year group is different for each curriculum and as a result, we find students that are either younger or older for that year group which therefore creates gaps in their learning and performance. In addition, there is not a way of storing student data throughout their academic journey so that schools can track the student learning process. In this paper, we propose to develop a computational framework applicable in multicultural countries such as UAE in which multi-education systems are implemented. The ultimate goal is to use cloud and fog computing technology integrated with Artificial Intelligence techniques of Machine Learning to aid in a smooth transition when assigning students to their year groups, and provide leveling and differentiation information of students who relocate from a particular education curriculum to another, whilst also having the ability to store and access student data from anywhere throughout their academic journey.
Abstract: Sentiment analysis is a very active research topic.
Every day, Facebook, Twitter, Weibo, and other social media,
as well as significant e-commerce websites, generate a massive
amount of comments, which can be used to analyse peoples
opinions or emotions. The existing methods for sentiment analysis
are based mainly on sentiment dictionaries, machine learning, and
deep learning. The first two kinds of methods rely on heavily
sentiment dictionaries or large amounts of labelled data. The third
one overcomes these two problems. So, in this paper, we focus
on the third one. Specifically, we survey various sentiment analysis
methods based on convolutional neural network, recurrent neural
network, long short-term memory, deep neural network, deep belief
network, and memory network. We compare their futures, advantages,
and disadvantages. Also, we point out the main problems of
these methods, which may be worthy of careful studies in the
future. Finally, we also examine the application of deep learning in
multimodal sentiment analysis and aspect-level sentiment analysis.
Abstract: Fake news detection research is still in the early stage as this is a relatively new phenomenon in the interest raised by society. Machine learning helps to solve complex problems and to build AI systems nowadays and especially in those cases where we have tacit knowledge or the knowledge that is not known. We used machine learning algorithms and for identification of fake news; we applied three classifiers; Passive Aggressive, Naïve Bayes, and Support Vector Machine. Simple classification is not completely correct in fake news detection because classification methods are not specialized for fake news. With the integration of machine learning and text-based processing, we can detect fake news and build classifiers that can classify the news data. Text classification mainly focuses on extracting various features of text and after that incorporating those features into classification. The big challenge in this area is the lack of an efficient way to differentiate between fake and non-fake due to the unavailability of corpora. We applied three different machine learning classifiers on two publicly available datasets. Experimental analysis based on the existing dataset indicates a very encouraging and improved performance.
Abstract: Telecommunication service providers demand accurate
and precise prediction of customer churn probabilities to increase the
effectiveness of their customer relation services. The large amount of
customer data owned by the service providers is suitable for analysis
by machine learning methods. In this study, expenditure data of
customers are analyzed by using an artificial neural network (ANN).
The ANN model is applied to the data of customers with different
billing duration. The proposed model successfully predicts the churn
probabilities at 83% accuracy for only three months expenditure data
and the prediction accuracy increases up to 89% when the nine month
data is used. The experiments also show that the accuracy of ANN
model increases on an extended feature set with information of the
changes on the bill amounts.
Abstract: The management of COVID-19 patients based on chest imaging is emerging as an essential tool for evaluating the spread of the pandemic which has gripped the global community. It has already been used to monitor the situation of COVID-19 patients who have issues in respiratory status. There has been increase to use chest imaging for medical triage of patients who are showing moderate-severe clinical COVID-19 features, this is due to the fast dispersal of the pandemic to all continents and communities. This article demonstrates the development of machine learning techniques for the test of COVID-19 patients using Chest X-Ray (CXR) images in nearly real-time, to distinguish the COVID-19 infection with a significantly high level of accuracy. The testing performance has covered a combination of different datasets of CXR images of positive COVID-19 patients, patients with viral and bacterial infections, also, people with a clear chest. The proposed AI scheme successfully distinguishes CXR scans of COVID-19 infected patients from CXR scans of viral and bacterial based pneumonia as well as normal cases with an average accuracy of 94.43%, sensitivity 95%, and specificity 93.86%. Predicted decisions would be supported by visual evidence to help clinicians speed up the initial assessment process of new suspected cases, especially in a resource-constrained environment.
Abstract: In an era where machines run and shape our world, the need for a stable, non-ending source of energy emerges. In this study, the focus was on the solar energy in Egypt as a renewable source, the most important factors that could affect the solar energy’s market share throughout its life cycle production were analyzed and filtered, the relationships between them were derived before structuring a Bayesian network. Also, forecasted models were built for multiple factors to predict the states in Egypt by 2035, based on historical data and patterns, to be used as the nodes’ states in the network. 37 factors were found to might have an impact on the use of solar energy and then were deducted to 12 factors that were chosen to be the most effective to the solar energy’s life cycle in Egypt, based on surveying experts and data analysis, some of the factors were found to be recurring in multiple stages. The presented Bayesian network could be used later for scenario and decision analysis of using solar energy in Egypt, as a stable renewable source for generating any type of energy needed.
Abstract: Machine learning (ML) can be implemented in Wireless Sensor Networks (WSNs) as a central solution or distributed solution where the ML is embedded in the nodes. Embedding improves privacy and may reduce prediction delay. In addition, the number of transmissions is reduced. However, quality factors such as prediction accuracy, fault detection efficiency and coordinated control of the overall system suffer. Here, we discuss and highlight the trade-offs that should be considered when choosing between embedding and centralized ML, especially for multihop networks. In addition, we present estimations that demonstrate the energy trade-offs between embedded and centralized ML. Although the total network energy consumption is lower with central prediction, it makes the network more prone for partitioning due to the high forwarding load on the one-hop nodes. Moreover, the continuous improvements in the number of operations per joule for embedded devices will move the energy balance toward embedded prediction.