Index t-SNE: Tracking Dynamics of High-Dimensional Datasets with Coherent Embeddings

t-SNE is an embedding method widely used by the data science community. It supports two main tasks: displaying results by coloring items according to their class or feature value, and forensic analysis, by giving a first overview of the dataset distribution. Two interesting characteristics of t-SNE are its structure preservation property and its answer to the crowding problem, in which not all neighbors in the high-dimensional space can be represented correctly in the low-dimensional space. t-SNE preserves local neighborhoods, and similar items are evenly spaced by adjusting to the local density. These two characteristics produce a meaningful representation, where the area of a cluster is proportional to the number of items it contains, and relationships between clusters are reflected by their closeness in the embedding. The algorithm is non-parametric: the transformation from the high-dimensional to the low-dimensional space is described but not learned, so two initializations of the algorithm lead to two different embeddings. In a forensic approach, analysts would like to compare two or more datasets using their embeddings. A naive approach would be to embed all datasets together. However, this process is costly, as the complexity of t-SNE is quadratic, and it becomes infeasible for too many datasets. Another approach would be to learn a parametric model over an embedding built with a subset of the data. While this approach is highly scalable, points could be mapped to exactly the same position, making them indistinguishable, and this type of model would be unable to adapt to new outliers or to concept drift. This paper presents a methodology that reuses an embedding to create a new one in which cluster positions are preserved. The optimization process minimizes two costs, one relative to the embedding shape and the second relative to the match with the support embedding. Embedding with a support can be repeated more than once, using the newly obtained embedding as the next support.
The successive embeddings can be used to study the impact of one variable on the dataset distribution or to monitor changes over time. This method has the same complexity as t-SNE per embedding, and memory requirements are only doubled. For a dataset of n elements sorted and split into k subsets, the total embedding complexity would be reduced from O(n^2) to O(n^2/k), and the memory requirement from n^2 to 2(n/k)^2, which enables computation on recent laptops. The method showed promising results on a real-world dataset, making it possible to observe the birth, evolution, and death of clusters. The proposed approach facilitates the identification of significant trends and changes, which supports monitoring the dynamics of high-dimensional datasets.
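
The cost figures quoted in this abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names and the example values of n and k are assumptions, and it counts pairwise terms rather than measuring any real t-SNE implementation.

```python
# Illustrative cost model only: counts pairwise terms, not real runtime.
# Function names and the example sizes below are hypothetical.

def full_cost(n):
    """Pairwise terms when all n points are embedded together: n^2."""
    return n * n

def split_cost(n, k):
    """Total pairwise terms for k successive embeddings of n/k points each."""
    m = n // k
    return k * m * m  # equals n^2 / k when k divides n

def full_memory(n):
    """Pairwise matrix entries held in memory for a single embedding of n points."""
    return n * n

def split_memory(n, k):
    """Entries for one subset embedding; doubled to account for the support embedding."""
    m = n // k
    return 2 * m * m

n, k = 100_000, 10
time_ratio = full_cost(n) / split_cost(n, k)        # k-fold fewer pairwise terms
memory_ratio = full_memory(n) / split_memory(n, k)  # k^2 / 2 fewer matrix entries
```

With these example values the pairwise work shrinks by a factor of k = 10 and the memory footprint by a factor of k^2/2 = 50.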

Simulation and Assessment of Carbon Dioxide Separation by Piperazine Blended Solutions Using E-NRTL and Peng-Robinson Models: A Study of Regeneration Heat Duty

High-pressure carbon dioxide (CO2) absorption from a specific off-gas in a conventional column was evaluated, motivated by environmental concerns, using the Aspen HYSYS simulator. A wide range of single absorbents and piperazine (PZ) blended solutions was simulated to estimate the outlet CO2 concentration, CO2 loading, reboiler power supply, and regeneration heat duty, in order to choose the most efficient solution in terms of CO2 removal and required heat duty. The property package, which is compatible with all solutions applied in this study, estimates properties based on the electrolyte non-random two-liquid (E-NRTL) model for electrolyte thermodynamics and the Peng-Robinson equation of state for vapor-phase and liquid-hydrocarbon-phase properties. The simulation results indicate that PZ, among the single amine solutions, and the mixture of PZ and monoethanolamine (MEA), among the blended solutions, demand the highest regeneration heat duty. To reduce the process cost, the blended amine solutions with the lowest PZ concentrations (5 wt% and 10 wt%) were considered and compared; among them, the blend of 10 wt% PZ + 35 wt% MDEA (methyldiethanolamine) was found to be the most appropriate solution in terms of CO2 content in the outlet gas, rich CO2 loading, and regeneration heat duty.

HaskellFL: A Tool for Detecting Logical Errors in Haskell

Understanding and using the functional paradigm is a challenge for many programmers. Looking for logical errors in code may take a lot of a developer’s time when a program grows in size. In order to facilitate both processes, this paper presents HaskellFL, a tool that uses fault localization techniques to locate a logical error in Haskell code. The Haskell subset used in this work is sufficiently expressive for those studying Functional Programming to get immediate help debugging their code and to answer questions about key concepts associated with the functional paradigm. HaskellFL was tested against Functional Programming assignments submitted by students enrolled in the Functional Programming class at the Federal University of Minas Gerais and against exercises from the Exercism Haskell track that are publicly available on GitHub. This work also evaluated the effectiveness of two fault localization techniques, Tarantula and Ochiai, in the Haskell context. Furthermore, the EXAM score was chosen to evaluate the tool’s effectiveness, and results showed that HaskellFL reduced the effort needed to locate an error for all tested scenarios. The results also showed that the Ochiai method was more effective than Tarantula.
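
For readers unfamiliar with the two techniques named above, the following sketch computes the standard Tarantula and Ochiai suspiciousness scores for a single program element from test coverage counts. It is written in Python for brevity and is not taken from HaskellFL; the function and parameter names, and the example counts, are assumptions made for illustration.

```python
import math

def tarantula(failed_cov, passed_cov, total_failed, total_passed):
    """Tarantula suspiciousness of one element.

    failed_cov / passed_cov: failing / passing tests that execute the element.
    total_failed / total_passed: failing / passing tests in the whole suite.
    """
    fail_ratio = failed_cov / total_failed if total_failed else 0.0
    pass_ratio = passed_cov / total_passed if total_passed else 0.0
    denom = fail_ratio + pass_ratio
    return fail_ratio / denom if denom else 0.0

def ochiai(failed_cov, passed_cov, total_failed):
    """Ochiai suspiciousness of one element."""
    denom = math.sqrt(total_failed * (failed_cov + passed_cov))
    return failed_cov / denom if denom else 0.0

# Hypothetical element: executed by both failing tests and by 2 of 8 passing tests.
t_score = tarantula(failed_cov=2, passed_cov=2, total_failed=2, total_passed=8)
o_score = ochiai(failed_cov=2, passed_cov=2, total_failed=2)
```

Elements are then ranked by decreasing score, and a metric such as EXAM measures how far down that ranking the faulty element appears.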

A Convolutional Deep Neural Network Approach for Skin Cancer Detection Using Skin Lesion Images

Malignant melanoma, known simply as melanoma, is a type of skin cancer that appears as a mole on the skin. It is critical to detect this cancer at an early stage because it can spread across the body and may lead to the patient’s death. When detected early, melanoma is curable. In this paper, we propose a deep learning model (a convolutional neural network) to automatically classify skin lesion images as malignant or benign. Images underwent pre-processing steps to diminish the effect of the normal skin region on the model. The proposed model showed a significant improvement over previous work, achieving an accuracy of 97%.

An Approach for Coagulant Dosage Optimization Using Soft Jar Test: A Case Study of Bangkhen Water Treatment Plant

Coagulation, which uses alum and poly aluminum chloride (PACL), is the most important process in a water treatment plant. Determining the dosage of alum and PACL is therefore the most important factor to be prescribed. This research applies an artificial neural network (ANN) trained with the Levenberg–Marquardt algorithm to create a mathematical model (Soft Jar Test) that predicts the dose of the coagulation chemicals alum and PACL. The input data consist of turbidity, pH, alkalinity, conductivity, and oxygen consumption (OC) measured at the Bangkhen Water Treatment Plant (BKWTP), under the authority of the Metropolitan Waterworks Authority of Thailand. The data were collected from 1 January 2019 to 31 December 2019 in order to cover the changing seasons of Thailand. The input data of the ANN are divided into three groups: training set, test set, and validation set. The coefficient of determination and the mean absolute error are 0.73 and 3.18 for the alum model, and 0.59 and 3.21 for the PACL model, respectively.
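
The two metrics reported above can be computed as in the sketch below. It assumes the common definition of the coefficient of determination as 1 - SS_res/SS_tot (the abstract does not say which variant was used), and the dose values are made-up numbers for illustration, not plant data.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and predicted doses."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Made-up doses purely for illustration (not data from the study).
measured = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

mae = mean_absolute_error(measured, predicted)
r2 = r_squared(measured, predicted)
```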

Catalytic Pyrolysis of Sewage Sludge for Upgrading Bio-Oil Quality Using Sludge-Based Activated Char as an Alternative to HZSM5

Due to concerns about the depletion of fossil fuel sources and the deteriorating environment, research into renewable energy production will play a crucial role in alleviating the dependency on mineral fuels. One particular area of interest is the generation of bio-oil through sewage sludge (SS) pyrolysis. SS is a promising candidate compared with other types of biomass due to its availability and low cost. However, the presence of high molecular weight hydrocarbons and oxygenated compounds in the SS bio-oil hinders some of its fuel applications. In this context, catalytic pyrolysis is an attainable route to upgrade bio-oil quality. Among the different catalysts (e.g., zeolites) studied for SS pyrolysis, activated chars (AC) are eco-friendly alternatives. The beneficial features of AC derived from SS include a comparatively large surface area, porosity, enriched surface functional groups, and a high content of metal species that can improve the catalytic activity. Hence, a sludge-based AC catalyst was fabricated in a single-step pyrolysis reaction with NaOH as the activation agent and was compared with HZSM5 zeolite in this study. The thermal decomposition and kinetics were investigated via thermogravimetric analysis (TGA) to guide and control pyrolysis and catalytic pyrolysis and to inform the design of the pyrolysis setup. The results indicated that pyrolysis and catalytic pyrolysis comprise four distinct stages and that the main decomposition reaction occurred in the range of 200-600 °C. The Coats-Redfern method was applied to the 2nd and 3rd devolatilization stages to estimate the reaction order and activation energy (E) from the mass loss data. The average activation energy (Em) values for the reaction orders n = 1, 2 and 3 were in the range of 6.67-20.37 kJ/mol for SS, 1.51-6.87 kJ/mol for HZSM5, and 2.29-9.17 kJ/mol for AC.
According to the results, AC and HZSM5 were both able to improve the reaction rate of SS pyrolysis by lowering the Em value. Moreover, to generate bio-oil and examine the effect of the catalysts on its quality, a fixed-bed pyrolysis system was designed and implemented. The composition of the produced bio-oil was analyzed via gas chromatography/mass spectrometry (GC/MS). The selected SS to catalyst ratios were 1:1, 2:1 and 4:1. The optimum ratio in terms of cracking the long-chain hydrocarbons and removing oxygen-containing compounds was 1:1 for both catalysts. The upgraded bio-oils with HZSM5 and AC were in the total range of C4-C17, with around 72% in the range of C4-C9. The bio-oil from pyrolysis of SS contained 49.27% oxygenated compounds, while the presence of HZSM5 and AC reduced this to 7.3% and 13.02%, respectively. Meanwhile, the generation of value-added chemicals such as light aromatic compounds was significantly improved in the catalytic process. Furthermore, the fabricated AC catalyst was characterized by BET, SEM-EDX, FT-IR and TGA techniques. Overall, this research demonstrated that AC is an efficient catalyst in the pyrolysis of SS and can be used as a cost-competitive alternative to HZSM5.
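
The Coats-Redfern step mentioned in this abstract is usually applied as a straight-line fit: ln(g(α)/T²) is plotted against 1/T and the slope equals -E/R, where g(α) = -ln(1 - α) for n = 1 and g(α) = (1 - (1 - α)^(1 - n))/(1 - n) otherwise. The sketch below shows that fit under these textbook assumptions. The function names are hypothetical, and the conversion data are generated synthetically from a known activation energy so that the fit can be checked; they are not measurements from the study.

```python
import math

R = 8.314  # gas constant, J/(mol K)

def g(alpha, n):
    """Integral conversion function for an n-th order reaction model."""
    if n == 1:
        return -math.log(1.0 - alpha)
    return (1.0 - (1.0 - alpha) ** (1 - n)) / (1 - n)

def coats_redfern_energy(temps_k, alphas, n=1):
    """Least-squares slope of ln(g(alpha)/T^2) versus 1/T, returned as E in J/mol."""
    xs = [1.0 / t for t in temps_k]
    ys = [math.log(g(a, n) / t ** 2) for t, a in zip(temps_k, alphas)]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return -slope * R

# Synthetic first-order data built from a known E, only to check the fit.
true_e = 20_000.0   # J/mol, arbitrary illustrative value
intercept = -10.0   # arbitrary illustrative value
temps = [500.0 + 50.0 * i for i in range(7)]  # 500 K to 800 K
alphas = [1.0 - math.exp(-(t ** 2) * math.exp(intercept - true_e / (R * t))) for t in temps]

fitted_e = coats_redfern_energy(temps, alphas, n=1)
```

Repeating the fit for n = 1, 2 and 3 and comparing the quality of the linear fits is one common way to select the reaction order.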

An Investigation into Libyan Teachers’ Views of Children’s Emotional and Behavioural Difficulties

A great number of children in mainstream schools across Libya are currently living with emotional and behavioural difficulties. This study aims to explore teachers’ perceptions of children’s emotional and behavioural difficulties (EBD) and their attributions of the causes of EBD. The relevance of this area of study to current educational practice is illustrated by the fact that primary school teachers in Libya find classroom behaviour problems to be one of the major difficulties they face. The information presented in this study was gathered from 182 teachers who responded to the survey, of whom 27 were later interviewed. In general, teachers’ perceptions of EBD reflect personal experience, training, and attitudes. The teachers in this study use words such as indifferent, frightened, withdrawn, aggressive, disobedient, hyperactive, less ambitious, lacking concentration, and academically weak to describe pupils with EBD. The implications of this study are envisaged as being extremely important for supporting teachers in addressing children’s EBD and for shedding light on the factors contributing to EBD, towards a successful teaching-learning process in Libyan primary schools.

The Application of Fuzzy Set Theory to Mobile Internet Advertisement Fraud Detection

This paper presents the application of fuzzy set theory to the implementation of mobile advertisement anti-fraud systems. Mobile anti-fraud aims to identify mobile advertisement fraudsters. One of its main problems is the lack of evidence to prove that a user is a fraudster. In this paper, we implement an application using fuzzy set theory to demonstrate how to detect cheaters. The advantage of our method is that it avoids the difficulty of detecting fraudsters in small data samples. We achieve this by giving each user a suspicious degree showing how likely the user is to be cheating, and then deciding whether a group of users (such as all users of a certain app) are together fraudsters according to their average suspicious degree. This makes the process more accurate, as the data of a single user is too small to support a reliable prediction.
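
The group-level decision described above can be sketched as follows. Everything specific here is an assumption made for illustration: the abstract does not say which features or membership function the authors use, so the click-rate feature, the linear ramp between `low` and `high`, the decision threshold, and all names are hypothetical.

```python
def suspicious_degree(clicks, impressions, low=0.05, high=0.5):
    """Fuzzy membership in the 'suspicious' set, in [0, 1].

    Hypothetical membership function: a linear ramp on the click rate,
    0 below `low`, 1 above `high`.
    """
    if impressions == 0:
        return 0.0
    rate = clicks / impressions
    if rate <= low:
        return 0.0
    if rate >= high:
        return 1.0
    return (rate - low) / (high - low)

def assess_group(users, threshold=0.6):
    """Average the per-user degrees and flag the group if the mean is high.

    `users` is a list of (clicks, impressions) pairs; `threshold` is an assumed cut-off.
    """
    degrees = [suspicious_degree(c, i) for c, i in users]
    average = sum(degrees) / len(degrees)
    return average, average >= threshold

# Three hypothetical users of the same app, each with very little data.
average_degree, flagged = assess_group([(5, 10), (3, 10), (4, 10)])
```

Averaging over the group is what lets many small, individually inconclusive samples add up to a usable signal.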

Hybrid Weighted Multiple Attribute Decision Making Handover Method for Heterogeneous Networks

Small cell deployment in 5G networks is a promising technology to enhance capacity and coverage. However, unplanned deployment may cause high interference levels and a high number of unnecessary handovers, which in turn increase the signalling overhead. To guarantee service continuity, minimize unnecessary handovers, and reduce signalling overhead in heterogeneous networks, it is essential to properly model the handover decision problem. In this paper, we model the handover decision problem using a Multiple Attribute Decision Making (MADM) method, specifically the Technique for Order Preference by Similarity to an Ideal Solution (TOPSIS), and propose a hybrid TOPSIS method to control handover in heterogeneous networks. The proposed method adopts a hybrid weighting policy, which is a combination of entropy and standard deviation weighting. A hybrid weighting control parameter is introduced to balance the impact of the standard deviation and entropy weighting on the network selection process and the overall performance. Our proposed method shows better performance, in terms of the number of frequent handovers and the mean user throughput, compared to existing methods.
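
A minimal sketch of TOPSIS with hybrid weights is given below. The TOPSIS steps and the entropy and standard deviation weighting schemes follow their usual textbook forms, but the way the two weight vectors are combined (a convex combination controlled by `lam`) is an assumption, since the abstract does not give the formula. The attribute set, the candidate values, and all function names are hypothetical.

```python
import math

def entropy_weights(matrix):
    """Entropy weighting: attributes whose values differ more across rows get more weight."""
    m, n = len(matrix), len(matrix[0])
    divergences = []
    for j in range(n):
        col = [row[j] for row in matrix]
        total = sum(col)
        probs = [x / total for x in col]
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(m)
        divergences.append(1.0 - entropy)
    s = sum(divergences)
    return [d / s for d in divergences]

def std_weights(matrix):
    """Standard deviation weighting on vector-normalized columns."""
    m, n = len(matrix), len(matrix[0])
    stds = []
    for j in range(n):
        col = [row[j] for row in matrix]
        norm = math.sqrt(sum(x * x for x in col))
        scaled = [x / norm for x in col]
        mean = sum(scaled) / m
        stds.append(math.sqrt(sum((x - mean) ** 2 for x in scaled) / m))
    s = sum(stds)
    return [x / s for x in stds]

def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hybrid_topsis(matrix, is_benefit, lam=0.5):
    """Return (weights, closeness scores); lam blends entropy and std weights (assumed form)."""
    n = len(matrix[0])
    weights = [lam * e + (1.0 - lam) * s
               for e, s in zip(entropy_weights(matrix), std_weights(matrix))]
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n)] for row in matrix]
    cols = list(zip(*v))
    best = [max(c) if is_benefit[j] else min(c) for j, c in enumerate(cols)]
    worst = [min(c) if is_benefit[j] else max(c) for j, c in enumerate(cols)]
    scores = []
    for row in v:
        d_best, d_worst = distance(row, best), distance(row, worst)
        scores.append(d_worst / (d_best + d_worst))
    return weights, scores

# Hypothetical candidate cells: [throughput (benefit), delay (cost), load (cost)].
candidates = [[50.0, 20.0, 0.3], [30.0, 40.0, 0.6], [40.0, 30.0, 0.5]]
weights, scores = hybrid_topsis(candidates, is_benefit=[True, False, False], lam=0.5)
best_candidate = scores.index(max(scores))
```

Setting `lam` to 1 or 0 recovers pure entropy or pure standard deviation weighting, which is one way to see what the control parameter trades off.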

Designing for Inclusion within the Learning Management System: Social Justice, Identities, and Online Design for Digital Spaces in Higher Education

The aim of this paper is to propose a pedagogical design for learning management systems (LMS) that offers greater inclusion for students, based on a number of theoretical perspectives and delineated through an example. Considering the impact of COVID-19, including on student mental health, and the research suggesting the importance of students’ sense of belonging for retention, success, and well-being, the author describes intentional LMS design incorporating theoretically based practices informed by critical theory, feminist theory, indigenous theory and practices, and new materiality. This article considers important aspects of these theories and practices which attend to inclusion, identities, and socially just learning environments. Additionally, ways of increasing student sense of belonging and mental health through LMS design influenced by adult learning theory and the community of inquiry model are described. Thinking through LMS pedagogical design with inclusion intentionally in mind affords the opportunity to take the LMS beyond its use as a repository of course documents, towards an intentional community of practice that facilitates belonging and connection, something much needed in our times. In virtual learning environments it has been harder to discern how students are doing, especially in feeling connected to their courses, their faculty, and their student peers. Increasingly at the forefront for public universities is addressing the needs of students with multiple and intersecting identities and the multiplicity of needs and accommodations. Education in 2020, and moving forward, calls for embedding critical theories and inclusive ideals and pedagogies into the ways instructors design and teach in online platforms. Through the use of critical theoretical frameworks and instructional practices, students may experience the LMS as a welcoming place with intentional plans for welcoming diversity in identities.

The Impact of ISO 9001 Certification on Brazilian Firms’ Performance: Insights from Multiple Case Studies

The evolution of quality management in companies has been strongly enabled by, among other factors, ISO 9001 certification, which several customers consider a crucial requirement. Likewise, performance measurement provides useful insights that help companies identify how their decision-making processes are reflected in their improvement. One of the most widely used performance measurement models is the balanced scorecard (BSC), which uses four perspectives to address a firm’s performance: financial, internal process, customer satisfaction, and learning and growth. Since ISO 9001 certified firms are likely to measure their performance through the BSC approach, it is important to verify whether certification influences firm performance. Therefore, this paper aims to verify the impact of ISO 9001:2015 on Brazilian firms’ performance based on the BSC perspectives. To this end, nine certified companies located in the Southeast region of Brazil were studied through a multiple case study approach. The study identified a positive impact of ISO 9001 on firms’ overall performance, and four Critical Success Factors (CSFs) were identified as relevant to the link between ISO 9001 and firms’ performance: employee involvement, top management, process management, and customer focus. Due to the COVID-19 pandemic, the interviews were limited to the quality manager specialist, and the sample was limited since several companies were closed during the period of the study. This study presents an in-depth analysis of the relationship between ISO 9001 certification and firms’ performance in a developing country.

Fatigue Failure Analysis in AISI 304 Stainless Wind Turbine Shafts

Wind turbines are of great importance for generating clean energy in countries and regions with abundant winds. However, the complex load fluctuations to which they are subjected can cause premature failure of this equipment through material fatigue. This work evaluates fatigue failures in small AISI 304 stainless steel turbine shafts. Fractographic analysis techniques, chemical analyses using energy dispersive spectrometry (EDS), and hardness tests were used to verify the origin of the failures and to characterize the properties of the components and the material. Crack nucleation on the shafts' surface was observed due to the combined effect of variable stresses, geometric stress concentrators, and surface wear, leading to crack propagation until catastrophic failure. Beach marks were identified in the macrographic examination, indicating that the failure was probably due to fatigue. The sensitization phenomenon was also observed.

Scientific Methods in Educational Management: The Metasystems Perspective

Although scientific methods have been the subject of a large number of papers, the term ‘scientific methods in educational management’ is still not well defined. In this paper, we adopt the metasystems perspective to define this term and to distinguish such methods from those used in the era of the scientific management and knowledge management paradigms. In our opinion, scientific methods in educational management rely on global phenomena, events, and processes and their influence on the educational organization. Currently, scientific methods in educational management are integrated with phenomena such as the globalization, cognitivisation, and openness of educational systems, and with global events like the COVID-19 pandemic. Concrete scientific methods are nested in a hierarchy of increasingly abstract models of educational management, which form the context of the global impact on education in general and on learning outcomes in particular. However, scientific methods can be assigned to a specific mission, strategy, or tactics of educational management of a concrete organization, whether through global management, local development of the school organization, and/or development of the life-long successful learner. By accepting this assignment, the scientific method becomes a personal goal of each individual within the educational organization or an option to develop the educational organization to global standards. In our opinion, in educational management, scientific methods need to confine their scope to the deep analysis of concrete tasks of the educational system (i.e., teaching, learning, assessment, development), which results in concrete strategies of organizational development. More important is seeking ways to achieve a dynamic equilibrium between the strategy and tactics of the planetary tasks in the field of global education, which results in a need for ecological methods of learning and communication.
In sum, the distinction between local and global scientific methods depends on the subjective conception of the task assignment, measurement, and appraisal. Finally, we conclude that scientific methods are not holistic scientific methods, but the strategy and tactics implemented in the global context by an effective educational/academic manager.

Lamb Wave Wireless Communication in Healthy Plates Using Coherent Demodulation

Guided ultrasonic waves are used in Non-Destructive Testing and Structural Health Monitoring for inspection and damage detection. Recently, wireless data transmission using ultrasonic waves in solid metallic channels has gained popularity in some industrial applications such as nuclear, aerospace and smart vehicles. The idea is to find a good substitute for electromagnetic waves, since they are highly attenuated near metallic components due to Faraday shielding. The proposed solution is to use ultrasonic guided waves such as Lamb waves as an information carrier because they can propagate over long distances. In addition, valuable information about the health of the structure could be extracted simultaneously. In this work, the reliable frequency bandwidth for communication is first extracted experimentally from dispersion curves. Then, an experimental platform for wireless communication using Lamb waves is described and built. After this, a coherent demodulation algorithm used in telecommunications is tested for the Amplitude Shift Keying, On-Off Keying, and Binary Phase Shift Keying modulation techniques. Signal processing parameters such as threshold choice, number of cycles per bit, and bit rate are optimized. Experimental results are compared based on the average bit error percentage. The results show high sensitivity to threshold selection for the Amplitude Shift Keying and On-Off Keying techniques, resulting in a bit rate decrease. The Binary Phase Shift Keying technique shows the highest stability and data rate among all tested modulation techniques.
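
Coherent demodulation of Binary Phase Shift Keying can be illustrated with an idealized, noise-free baseband sketch: each bit interval of the received signal is multiplied by a synchronized reference carrier and summed, and the sign of the result gives the bit. The sketch below assumes perfect synchronization and ignores Lamb wave dispersion and multi-path effects entirely; the function names and the sampling parameters are hypothetical.

```python
import math

def bpsk_modulate(bits, cycles_per_bit=4, samples_per_cycle=16):
    """Map each bit to a sine burst with phase 0 (bit 1) or pi (bit 0)."""
    n = cycles_per_bit * samples_per_cycle
    signal = []
    for bit in bits:
        sign = 1.0 if bit else -1.0
        for k in range(n):
            signal.append(sign * math.sin(2.0 * math.pi * k / samples_per_cycle))
    return signal

def bpsk_demodulate(signal, cycles_per_bit=4, samples_per_cycle=16):
    """Correlate each bit interval with the reference carrier and take the sign."""
    n = cycles_per_bit * samples_per_cycle
    bits = []
    for start in range(0, len(signal) - n + 1, n):
        corr = sum(signal[start + k] * math.sin(2.0 * math.pi * k / samples_per_cycle)
                   for k in range(n))
        bits.append(1 if corr > 0 else 0)
    return bits

message = [1, 0, 1, 1, 0]
received = [0.05 * s for s in bpsk_modulate(message)]  # strongly attenuated copy
decoded = bpsk_demodulate(received)
```

Because the decision threshold is zero regardless of amplitude, the attenuated signal still decodes correctly, which is consistent with the threshold sensitivity reported for the amplitude-based schemes but not for Binary Phase Shift Keying.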

Speedup Breadth-First Search by Graph Ordering

Breadth-First Search (BFS) is a core graph algorithm that is widely used for graph analysis. As it is frequently used in many graph applications, improving BFS performance is essential. In this paper, we present a graph ordering method that reorders the graph nodes to achieve better data locality, thus improving BFS performance. Our method is based on the observation that sibling relationships dominate the cache access pattern during the BFS traversal. Therefore, we propose a frequency-based model to construct the graph order. First, we optimize the graph order according to the nodes’ visit frequency, so that nodes with high visit frequency are processed with priority. Second, we try to maximize the overlap of child nodes layer by layer. As this problem is proved to be NP-hard, we propose a heuristic method that greatly reduces the preprocessing overhead. We conduct extensive experiments on 16 real-world datasets. The results show that our method achieves performance comparable to state-of-the-art methods while its graph ordering overhead is only about 1/15 of theirs.
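
To make the idea of graph reordering concrete, the sketch below relabels nodes in the order a BFS from a chosen source visits them, so that siblings receive consecutive ids. This is a deliberately simple stand-in and not the frequency-based method proposed in the paper; the function names and the toy graph are hypothetical.

```python
from collections import deque

def bfs_visit_order(adj, source):
    """Return all nodes in BFS visit order; unreachable nodes are appended at the end."""
    order = [source]
    seen = {source}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                order.append(v)
                queue.append(v)
    order.extend(u for u in adj if u not in seen)
    return order

def relabel(adj, order):
    """Rename nodes so that the i-th node in `order` gets id i."""
    new_id = {old: new for new, old in enumerate(order)}
    return {new_id[u]: [new_id[v] for v in nbrs] for u, nbrs in adj.items()}

# Toy graph given as an adjacency list; every node appears as a key.
graph = {0: [3, 5], 3: [1], 5: [2, 4], 1: [], 2: [], 4: []}
order = bfs_visit_order(graph, source=0)
reordered = relabel(graph, order)
```

After relabeling, the children of each node occupy a contiguous id range, which is the kind of locality a graph ordering method tries to create for per-node arrays.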

The Applicability of Distillation as an Alternative Nuclear Reprocessing Method

A customized two-stage model has been developed to simulate, analyse, and visualize the distillation of actinides as a useful alternative low-pressure separation method for nuclear recycling. The chloride systems investigated for the separation of such actinides, under the optimal conditions of idealized thermodynamic equilibrium stages and total reflux of the distillate, are (A) UCl4-CsCl-PuCl3 and (B) ThCl4-NaCl-PuCl3. In the simulations, uranium tetrachloride in case A is successfully separated by distillation in a six-stage column, and thorium tetrachloride in case B in an eight-stage column. For this, a permissible mole fraction of 1E-06 has been assumed for the residual impurity level. With a further separation effort of eleven to seventeen separation stages, the monochlorides of plutonium trichloride from both systems A and B are shown in simulation to be separated as high-purity distillation products.

Image Processing Approach for Detection of Three-Dimensional Tree-Rings from X-Ray Computed Tomography

Tree-ring analysis is an important part of the quality assessment and the dating of (archaeological) wood samples. It provides quantitative data about the whole anatomical ring structure, which can be used, for example, to measure the impact of the fluctuating environment on the tree growth, for the dendrochronological analysis of archaeological wooden artefacts and to estimate the wood mechanical properties. Despite advances in computer vision and edge recognition algorithms, detection and counting of annual rings are still limited to 2D datasets and performed in most cases manually, which is a time consuming, tedious task and depends strongly on the operator’s experience. This work presents an image processing approach to detect the whole 3D tree-ring structure directly from X-ray computed tomography imaging data. The approach relies on a modified Canny edge detection algorithm, which captures fully connected tree-ring edges throughout the measured image stack and is validated on X-ray computed tomography data taken from six wood species.

Platform-as-a-Service Sticky Policies for Privacy Classification in the Cloud

In this paper, we present a Platform-as-a-Service (PaaS) model for controlling the privacy enforcement mechanisms applied on user data when stored and processed in Cloud data centers. The proposed architecture consists of establishing user configurable ‘sticky’ policies on the Graphical User Interface (GUI) data-bound components during the application development phase to specify the details of privacy enforcement on the contents of these components. Various privacy classification classes on the data components are formally defined to give the user full control on the degree and scope of privacy enforcement including the type of execution containers to process the data in the Cloud. This not only enhances the privacy-awareness of the developed Cloud services, but also results in major savings in performance and energy efficiency due to the fact that the privacy mechanisms are solely applied on sensitive data units and not on all the user content. The proposed design is implemented in a real PaaS cloud computing environment on the Microsoft Azure platform.

Towards End-To-End Disease Prediction from Raw Metagenomic Data

Analysis of the human microbiome using metagenomic sequencing data has demonstrated high ability in discriminating various human diseases. Raw metagenomic sequencing data require multiple complex and computationally heavy bioinformatics steps prior to data analysis. Such data contain millions of short sequences read from the fragmented DNA sequences and stored as fastq files. Conventional processing pipelines consist of multiple steps, including quality control, filtering, and alignment of sequences against genomic catalogs (genes, species, taxonomic levels, functional pathways, etc.). These pipelines are complex to use, time consuming, and rely on a large number of parameters that often introduce variability and affect the estimation of the microbiome elements. Training Deep Neural Networks directly from raw sequencing data is a promising approach to bypass some of the challenges associated with mainstream bioinformatics pipelines. Most of these methods use the concept of word and sentence embeddings to create a meaningful numerical representation of DNA sequences, while extracting features and reducing the dimensionality of the data. In this paper we present metagenome2vec, an end-to-end approach that classifies patients into disease groups directly from raw metagenomic reads. This approach is composed of four steps: (i) generating a vocabulary of k-mers and learning their numerical embeddings; (ii) learning DNA sequence (read) embeddings; (iii) identifying the genome from which the sequence is most likely to come; and (iv) training a multiple instance learning classifier which predicts the phenotype based on the vector representation of the raw data. An attention mechanism is applied in the network so that the model can be interpreted, assigning a weight to the influence of each genome on the prediction.
Using two public real-life datasets as well as a simulated one, we demonstrated that this original approach reaches high performance, comparable with state-of-the-art methods applied directly to data processed through mainstream bioinformatics workflows. These results are encouraging for this proof-of-concept work. We believe that with further dedication, DNN models have the potential to surpass mainstream bioinformatics workflows in disease classification tasks.
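
The first half of step (i), generating a vocabulary of k-mers from reads, can be sketched in a few lines. This covers only the tokenization part, not the embedding learning, and it is not taken from metagenome2vec: the function names, the toy reads, and the choice of k = 3 are assumptions made for illustration.

```python
from collections import Counter

def kmers(read, k):
    """Split a read into its overlapping substrings of length k."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]

def build_vocabulary(reads, k):
    """Map each distinct k-mer to an integer index, most frequent first."""
    counts = Counter(kmer for read in reads for kmer in kmers(read, k))
    return {kmer: idx for idx, (kmer, _) in enumerate(counts.most_common())}

# Toy reads for illustration; real reads come from fastq files.
toy_reads = ["ACGTAC", "CGTACG"]
tokens = kmers(toy_reads[0], 3)
vocabulary = build_vocabulary(toy_reads, 3)
```

The integer indices are what a word-embedding style model would consume in place of words, with each read playing the role of a sentence.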

Early Depression Detection for Young Adults with a Psychiatric and AI Interdisciplinary Multimodal Framework

During COVID-19, the depression rate has increased dramatically. Young adults are most vulnerable to the mental health effects of the pandemic. Lower-income families are diagnosed with depression at a higher rate than the general population but have less access to clinics. This research aims to achieve early depression detection at low cost, large scale, and high accuracy through an interdisciplinary approach that combines clinical practices defined by the American Psychiatric Association (APA) with a multimodal AI framework. The proposed approach detected the nine depression symptoms using Natural Language Processing sentiment analysis and a symptom-based lexicon uniquely designed for young adults. The experiments were conducted on multimedia survey results from adolescents and young adults and on unbiased Twitter communications. The result was further aggregated with the facial emotional cues analyzed by a Convolutional Neural Network on the multimedia survey videos. Five experiments, each conducted on 10k data entries, reached consistent results with an average accuracy of 88.31%, higher than existing natural language analysis models. This approach can reach 300+ million daily active Twitter users and is highly accessible to low-income populations. It can promote early depression detection, raise awareness among adolescents and young adults, and reveal complementary cues to assist clinical depression diagnosis.