Index t-SNE: Tracking Dynamics of High-Dimensional Datasets with Coherent Embeddings

t-SNE is an embedding method that the data science community has widely used. It helps two main tasks: to display results by coloring items according to the item class or feature value; and for forensic, giving a first overview of the dataset distribution. Two interesting characteristics of t-SNE are the structure preservation property and the answer to the crowding problem, where all neighbors in high dimensional space cannot be represented correctly in low dimensional space. t-SNE preserves the local neighborhood, and similar items are nicely spaced by adjusting to the local density. These two characteristics produce a meaningful representation, where the cluster area is proportional to its size in number, and relationships between clusters are materialized by closeness on the embedding. This algorithm is non-parametric. The transformation from a high to low dimensional space is described but not learned. Two initializations of the algorithm would lead to two different embedding. In a forensic approach, analysts would like to compare two or more datasets using their embedding. A naive approach would be to embed all datasets together. However, this process is costly as the complexity of t-SNE is quadratic, and would be infeasible for too many datasets. Another approach would be to learn a parametric model over an embedding built with a subset of data. While this approach is highly scalable, points could be mapped at the same exact position, making them indistinguishable. This type of model would be unable to adapt to new outliers nor concept drift. This paper presents a methodology to reuse an embedding to create a new one, where cluster positions are preserved. The optimization process minimizes two costs, one relative to the embedding shape and the second relative to the support embedding’ match. The embedding with the support process can be repeated more than once, with the newly obtained embedding. The successive embedding can be used to study the impact of one variable over the dataset distribution or monitor changes over time. This method has the same complexity as t-SNE per embedding, and memory requirements are only doubled. For a dataset of n elements sorted and split into k subsets, the total embedding complexity would be reduced from O(n2) to O(n2/k), and the memory requirement from n2 to 2(n/k)2 which enables computation on recent laptops. The method showed promising results on a real-world dataset, allowing to observe the birth, evolution and death of clusters. The proposed approach facilitates identifying significant trends and changes, which empowers the monitoring high dimensional datasets’ dynamics.

6D Posture Estimation of Road Vehicles from Color Images

Currently, in the field of object posture estimation, there is research on estimating the position and angle of an object by storing a 3D model of the object to be estimated in advance in a computer and matching it with the model. However, in this research, we have succeeded in creating a module that is much simpler, smaller in scale, and faster in operation. Our 6D pose estimation model consists of two different networks – a classification network and a regression network. From a single RGB image, the trained model estimates the class of the object in the image, the coordinates of the object, and its rotation angle in 3D space. In addition, we compared the estimation accuracy of each camera position, i.e., the angle from which the object was captured. The highest accuracy was recorded when the camera position was 75°, the accuracy of the classification was about 87.3%, and that of regression was about 98.9%.

The Mediating Role of Level of Education and Income on the Relationship between Political Ideology and Attitude towards Immigration

This study is investigating the impact of ideological structures in terms of conservative and liberal on shaping immigration acceptance attitudes under the contribution of socio-economic status. According to motivated reasoning theory, political ideology is identified as a recurrent impact on the formation of attitude, while conservatives tend to express more hostility toward immigrants in comparison to liberals which are proposed to be more tolerant towards immigrants. Our finding suggests that political ideology will structure individual attitudes when citizens socio-economic vulnerability and level of education are low enough to consider immigrants as a threat. Therefore, economic vulnerability is proposed to weaken the ideological predispositions’ resistance. There has been some threats and factors such as level of education and economic condition proposed by group competition theory and labor market competition theory as fundamental factors which can strengthen or weaken the effects of political ideology on individuals’ attitudes towards immigration; those mechanisms for liberals and conservatives will be operated differently.

Participatory Financial Inclusion Hypothesis: A Preliminary Empirical Validation Using Survey Design

In Nigeria, enormous efforts/resources had, over the years, been expended on promoting financial inclusion (FI); however, it is seemingly discouraging that many of its self-declared targets on FI remained unachieved, especially amongst the Rural Dwellers and Actors in the Informal Sectors (RDAIS). Expectedly, many reasons had been earmarked for these failures: low literacy level, huge informal/rural sectors etc. This study posits that in spite of these truly-debilitating factors, these FI policy failures could have been avoided or mitigated if the principles of active and better-managed citizens’ participation had been strictly followed in the (re)design/implementation of its FI policies. In other words, in a bid to mitigate the prevalent financial exclusion (FE) in Nigeria, this study hypothesizes the significant positive impact of involving the RDAIS in policy-wide decision making in the FI domain, backed by a preliminary empirical validation. Also, the study introduces the RDAIS-focused Participatory Financial Inclusion Policy (PFIP) as a major FI policy regeneration/improvement tool. The three categories of respondents that served as research subjects are FI experts in Nigeria (n = 72), RDAIS from the very rural/remote village of Unguwar Dogo in Northern Nigeria (n = 43) and RDAIS from another rural village of Sekere (n = 56) in the Southern region of Nigeria. Using survey design (5-point Likert scale questionnaires), random/stratified sampling, and descriptive/inferential statistics, the study often recorded independent consensus (amongst these three categories of respondents) that RDAIS’s active participation in iterative FI policy initiation, (re)design, implementation, (re)evaluation could indeed give improved FI outcomes. However, few questionnaire items also recorded divergent opinions and various statistically (in)significant differences on the mean scores of these three categories. The PFIP (or any customized version of it) should then be carefully integrated into the NFIS of Nigeria (and possibly in the NFIS of other developing countries) to truly/fully provide FI policy integration for these excluded RDAIS and arrest the prevalence of FE.

HaskellFL: A Tool for Detecting Logical Errors in Haskell

Understanding and using the functional paradigm is a challenge for many programmers. Looking for logical errors in code may take a lot of a developer’s time when a program grows in size. In order to facilitate both processes, this paper presents HaskellFL, a tool that uses fault localization techniques to locate a logical error in Haskell code. The Haskell subset used in this work is sufficiently expressive for those studying Functional Programming to get immediate help debugging their code and to answer questions about key concepts associated with the functional paradigm. HaskellFL was tested against Functional Programming assignments submitted by students enrolled at the Functional Programming class at the Federal University of Minas Gerais and against exercises from the Exercism Haskell track that are publicly available in GitHub. This work also evaluated the effectiveness of two fault localization techniques, Tarantula and Ochiai, in the Haskell context. Furthermore, the EXAM score was chosen to evaluate the tool’s effectiveness, and results showed that HaskellFL reduced the effort needed to locate an error for all tested scenarios. The results also showed that the Ochiai method was more effective than Tarantula.

Battery Grading Algorithm in 2nd-Life Repurposing Li-ion Battery System

This article presents a methodology that improves reliability and cyclability of 2nd-life Li-ion battery system repurposed as energy storage system (ESS). Most of the 2nd-life retired battery systems in market have module/pack-level state of health (SOH) indicator, which is utilized for guiding appropriate depth of discharge (DOD) in the application of ESS. Due to the lack of cell-level SOH indication, the different degrading behaviors among various cells cannot be identified upon reaching retired status; in the end, considering end of life (EOL) loss and pack-level DOD, the repurposed ESS has to be oversized by > 1.5 times to complement the application requirement of reliability and cyclability. This proposed battery grading algorithm, using non-invasive methodology, is able to detect outlier cells based on historical voltage data and calculate cell-level historical maximum temperature data using semi-analytic methodology. In this way, the individual battery cell in the 2nd-life battery system can be graded in terms of SOH on basis of the historical voltage fluctuation and estimated historical maximum temperature variation. These grades will have corresponding DOD grades in the application of the repurposed ESS to enhance the system reliability and cyclability. In all, this introduced battery grading algorithm is non-invasive, compatible with all kinds of retired Li-ion battery systems which lack of cell-level SOH indication, as well as potentially being embedded into battery management software for preventive maintenance and real-time cyclability optimization.

Thin Bed Reservoir Delineation Using Spectral Decomposition and Instantaneous Seismic Attributes, Pohokura Field, Taranaki Basin, New Zealand

The thick bed hydrocarbon reservoirs are primarily interested because of the more prolific production. When the amount of petroleum in the thick bed starts decreasing, the thin bed reservoirs are the alternative targets to maintain the reserves. The conventional interpretation of seismic data cannot delineate the thin bed having thickness less than the vertical seismic resolution. Therefore, spectral decomposition and instantaneous seismic attributes were used to delineate the thin bed in this study. Short Window Discrete Fourier Transform (SWDFT) spectral decomposition and instantaneous frequency attributes were used to reveal the thin bed reservoir, while Continuous Wavelet Transform (CWT) spectral decomposition and envelope (instantaneous amplitude) attributes were used to indicate hydrocarbon bearing zone. The study area is located in the Pohokura Field, Taranaki Basin, New Zealand. The thin bed target is the uppermost part of Mangahewa Formation, the most productive in the gas-condensate production in the Pohokura Field. According to the time-frequency analysis, SWDFT spectral decomposition can reveal the thin bed using a 72 Hz SWDFT isofrequency section and map, and that is confirmed by the instantaneous frequency attribute. The envelope attribute showing the high anomaly indicates the hydrocarbon accumulation area at the thin bed target. Moreover, the CWT spectral decomposition shows the low-frequency shadow zone and abnormal seismic attenuation in the higher isofrequencies below the thin bed confirms that the thin bed can be a prospective hydrocarbon zone.

Catalytic Pyrolysis of Sewage Sludge for Upgrading Bio-Oil Quality Using Sludge-Based Activated Char as an Alternative to HZSM5

Due to the concerns about the depletion of fossil fuel sources and the deteriorating environment, the attempt to investigate the production of renewable energy will play a crucial role as a potential to alleviate the dependency on mineral fuels. One particular area of interest is generation of bio-oil through sewage sludge (SS) pyrolysis. SS can be a potential candidate in contrast to other types of biomasses due to its availability and low cost. However, the presence of high molecular weight hydrocarbons and oxygenated compounds in the SS bio-oil hinders some of its fuel applications. In this context, catalytic pyrolysis is another attainable route to upgrade bio-oil quality. Among different catalysts (i.e., zeolites) studied for SS pyrolysis, activated chars (AC) are eco-friendly alternatives. The beneficial features of AC derived from SS comprise the comparatively large surface area, porosity, enriched surface functional groups and presence of a high amount of metal species that can improve the catalytic activity. Hence, a sludge-based AC catalyst was fabricated in a single-step pyrolysis reaction with NaOH as the activation agent and was compared with HZSM5 zeolite in this study. The thermal decomposition and kinetics were invested via thermogravimetric analysis (TGA) for guidance and control of pyrolysis and catalytic pyrolysis and the design of the pyrolysis setup. The results indicated that the pyrolysis and catalytic pyrolysis contain four obvious stages and the main decomposition reaction occurred in the range of 200-600 °C. Coats-Redfern method was applied in the 2nd and 3rd devolatilization stages to estimate the reaction order and activation energy (E) from the mass loss data. The average activation energy (Em) values for the reaction orders n = 1, 2 and 3 were in the range of 6.67-20.37 kJ/mol for SS; 1.51-6.87 kJ/mol for HZSM5; and 2.29-9.17 kJ/mol for AC, respectively. According to the results, AC and HZSM5 both were able to improve the reaction rate of SS pyrolysis by abridging the Em value. Moreover, to generate and examine the effect of the catalysts on the quality of bio-oil, a fixed-bed pyrolysis system was designed and implemented. The composition analysis of the produced bio-oil was carried out via gas chromatography/mass spectrometry (GC/MS). The selected SS to catalyst ratios were 1:1, 2:1 and 4:1. The optimum ratio in terms of cracking the long-chain hydrocarbons and removing oxygen-containing compounds was 1:1 for both catalysts. The upgraded bio-oils with HZSM5 and AC were in the total range of C4-C17 with around 72% in the range of C4-C9. The bio-oil from pyrolysis of SS contained 49.27% oxygenated compounds while the presence of HZSM5 and AC dropped to 7.3% and 13.02%, respectively. Meanwhile, generation of value-added chemicals such as light aromatic compounds were significantly improved in the catalytic process. Furthermore, the fabricated AC catalyst was characterized by BET, SEM-EDX, FT-IR and TGA techniques. Overall, this research demonstrated that AC is an efficient catalyst in the pyrolysis of SS and can be used as a cost-competitive catalyst in contrast to HZSM5.

Dimensionality Reduction in Modal Analysis for Structural Health Monitoring

Autonomous structural health monitoring (SHM) of many structures and bridges became a topic of paramount importance for maintenance purposes and safety reasons. This paper proposes a set of machine learning (ML) tools to perform automatic feature selection and detection of anomalies in a bridge from vibrational data and compare different feature extraction schemes to increase the accuracy and reduce the amount of data collected. As a case study, the Z-24 bridge is considered because of the extensive database of accelerometric data in both standard and damaged conditions. The proposed framework starts from the first four fundamental frequencies extracted through operational modal analysis (OMA) and clustering, followed by time-domain filtering (tracking). The fundamental frequencies extracted are then fed to a dimensionality reduction block implemented through two different approaches: feature selection (intelligent multiplexer) that tries to estimate the most reliable frequencies based on the evaluation of some statistical features (i.e., entropy, variance, kurtosis), and feature extraction (auto-associative neural network (ANN)) that combine the fundamental frequencies to extract new damage sensitive features in a low dimensional feature space. Finally, one-class classification (OCC) algorithms perform anomaly detection, trained with standard condition points, and tested with normal and anomaly ones. In particular, principal component analysis (PCA), kernel principal component analysis (KPCA), and autoassociative neural network (ANN) are presented and their performance are compared. It is also shown that, by evaluating the correct features, the anomaly can be detected with accuracy and an F1 score greater than 95%.

Rolling Element Bearing Diagnosis by Improved Envelope Spectrum: Optimal Frequency Band Selection

The Rolling Element Bearing (REB) vibration diagnosis is worth of special interest by the variety of REB and the wide necessity of those elements in industrial applications. The presence of a localized fault in a REB gives rise to a vibrational response, characterized by the modulation of a carrier signal. Frequency content of carrier signal (Spectral Frequency –f) is mainly related to resonance frequencies of the REB. This carrier signal is modulated by another signal, governed by the periodicity of the fault impact (Cyclic Frequency –α). In this sense, REB fault vibration response gives rise to a second-order cyclostationary signal. Second order cyclostationary signals could be represented in a bi-spectral map, where Spectral Coherence –SCoh are plotted against f and α. The Improved Envelope Spectrum –IES, is a useful approach to execute REB fault diagnosis. IES could be applied by the integration of SCoh over a predefined bandwidth on the f axis. Approaches to select f-bandwidth have been recently exposed by the definition of a metric which intends to evaluate the magnitude of the IES at the fault characteristics frequencies. This metric is represented in a 1/3-binary tree as a function of the frequency bandwidth and centre. Based on this binary tree the optimal frequency band is selected. However, some advantages have been seen if the metric is changed, which in fact tends to dictate different optimal f-bandwidth and so improve the IES representation. This paper evaluates the behaviour of the IES from a different metric optimization. This metric is based on the sample correlation coefficient, detecting high peaks in the selected frequencies while penalizing high peaks in the neighbours of the selected frequencies. Prior results indicate an improvement on the signal-noise ratio (SNR) on around 86% of samples analysed, which belong to IMS database.

A Medical Vulnerability Scoring System Incorporating Health and Data Sensitivity Metrics

With the advent of complex software and increased connectivity, security of life-critical medical devices is becoming an increasing concern, particularly with their direct impact to human safety. Security is essential, but it is impossible to develop completely secure and impenetrable systems at design time. Therefore, it is important to assess the potential impact on security and safety of exploiting a vulnerability in such critical medical systems. The common vulnerability scoring system (CVSS) calculates the severity of exploitable vulnerabilities. However, for medical devices, it does not consider the unique challenges of impacts to human health and privacy. Thus, the scoring of a medical device on which a human life depends (e.g., pacemakers, insulin pumps) can score very low, while a system on which a human life does not depend (e.g., hospital archiving systems) might score very high. In this paper, we present a Medical Vulnerability Scoring System (MVSS) that extends CVSS to address the health and privacy concerns of medical devices. We propose incorporating two new parameters, namely health impact and sensitivity impact. Sensitivity refers to the type of information that can be stolen from the device, and health represents the impact to the safety of the patient if the vulnerability is exploited (e.g., potential harm, life threatening). We evaluate 15 different known vulnerabilities in medical devices and compare MVSS against two state-of-the-art medical device-oriented vulnerability scoring system and the foundational CVSS.

Robot-assisted Relaxation Training for Children with Autism Spectrum Disorders

Cognitive Behavioral Therapy (CBT) has been proven an effective tool to address anger and anxiety issues in children and adolescents with Autism Spectrum Disorders (ASD). Robot-enhanced therapy has been used in psychosocial and educational interventions for children with ASD with promising results. Whenever CBT-based techniques were incorporated in robot-based interventions, they were mainly performed in group sessions. Objectives: The study’s main objective was the implementation and evaluation of the effectiveness of a relaxation training intervention for children with ASD, delivered by the social robot NAO. Methods: 20 children (aged 7–12 years) were randomly assigned to 16 sessions of relaxation training implemented twice a week. Two groups were formed: the NAO group (children participated in individual sessions with the support of NAO) and the control group (children participated in individual sessions with the support of the therapist only). Participants received three different relaxation scenarios of increasing difficulty (a breathing scenario, a progressive muscle relaxation scenario and a body scan medication scenario), as well as related homework sheets for practicing. Pre- and post-intervention assessments were conducted using the Child Behavior Checklist (CBCL) and the Strengths and Difficulties Questionnaire for parents (SDQ-P). Participants were also asked to complete an open-ended questionnaire to evaluate the effectiveness of the training. Parents’ satisfaction was evaluated via a questionnaire and children satisfaction was assessed by a thermometer scale. Results: The study supports the use of relaxation training with the NAO robot as instructor for children with ASD. Parents of enrolled children reported high levels of satisfaction and provided positive ratings of the training acceptability. Children in the NAO group presented greater motivation to complete homework and adopt the learned techniques at home. Conclusions: Relaxation training could be effectively integrated in robot-assisted protocols to help children with ASD regulate emotions and develop self-control.

The Application of Fuzzy Set Theory to Mobile Internet Advertisement Fraud Detection

This paper presents the application of fuzzy set theory to implement of mobile advertisement anti-fraud systems. Mobile anti-fraud is a method aiming to identify mobile advertisement fraudsters. One of the main problems of mobile anti-fraud is the lack of evidence to prove a user to be a fraudster. In this paper, we implement an application by using fuzzy set theory to demonstrate how to detect cheaters. The advantage of our method is that the hardship in detecting fraudsters in small data samples has been avoided. We achieved this by giving each user a suspicious degree showing how likely the user is cheating and decide whether a group of users (like all users of a certain APP) together to be fraudsters according to the average suspicious degree. This makes the process more accurate as the data of a single user is too small to be predictable.

Classification of Extreme Ground-Level Ozone Based on Generalized Extreme Value Model for Air Monitoring Station

Higher ground-level ozone (GLO) concentration adversely affects human health, vegetations as well as activities in the ecosystem. In Malaysia, most of the analysis on GLO concentration are carried out using the average value of GLO concentration, which refers to the centre of distribution to make a prediction or estimation. However, analysis which focuses on the higher value or extreme value in GLO concentration is rarely explored. Hence, the objective of this study is to classify the tail behaviour of GLO using generalized extreme value (GEV) distribution estimation the return level using the corresponding modelling (Gumbel, Weibull, and Frechet) of GEV distribution. The results show that Weibull distribution which is also known as short tail distribution and considered as having less extreme behaviour is the best-fitted distribution for four selected air monitoring stations in Peninsular Malaysia, namely Larkin, Pelabuhan Kelang, Shah Alam, and Tanjung Malim; while Gumbel distribution which is considered as a medium tail distribution is the best-fitted distribution for Nilai station. The return level of GLO concentration in Shah Alam station is comparatively higher than other stations. Overall, return levels increase with increasing return periods but the increment depends on the type of the tail of GEV distribution’s tail. We conduct this study by using maximum likelihood estimation (MLE) method to estimate the parameters at four selected stations in Peninsular Malaysia. Next, the validation for the fitted block maxima series to GEV distribution is performed using probability plot, quantile plot and likelihood ratio test. Profile likelihood confidence interval is tested to verify the type of GEV distribution. These results are important as a guide for early notification on future extreme ozone events.

Designing for Inclusion within the Learning Management System: Social Justice, Identities, and Online Design for Digital Spaces in Higher Education

The aim of this paper is to propose pedagogical design for learning management systems (LMS) that offers greater inclusion for students based on a number of theoretical perspectives and delineated through an example. Considering the impact of COVID-19, including on student mental health, the research suggesting the importance of student sense of belonging on retention, success, and student well-being, the author describes intentional LMS design incorporating theoretically based practices informed by critical theory, feminist theory, indigenous theory and practices, and new materiality. This article considers important aspects of these theories and practices which attend to inclusion, identities, and socially just learning environments. Additionally, increasing student sense of belonging and mental health through LMS design influenced by adult learning theory and the community of inquiry model are described.  The process of thinking through LMS pedagogical design with inclusion intentionally in mind affords the opportunity to allow LMS to go beyond course use as a repository of documents, to an intentional community of practice that facilitates belonging and connection, something much needed in our times. In virtual learning environments it has been harder to discern how students are doing, especially in feeling connected to their courses, their faculty, and their student peers. Increasingly at the forefront of public universities is addressing the needs of students with multiple and intersecting identities and the multiplicity of needs and accommodations. Education in 2020, and moving forward, calls for embedding critical theories and inclusive ideals and pedagogies to the ways instructors design and teach in online platforms. Through utilization of critical theoretical frameworks and instructional practices, students may experience the LMS as a welcoming place with intentional plans for welcoming diversity in identities.

Investigation of Tbilisi City Atmospheric Air Pollution with PM in Usual and Emergency Situations Using the Observational and Numerical Modeling Data

Pollution of the Tbilisi atmospheric air with PM2.5 and PM10 in usual and pandemic situations by using the data of 5 stationary observation points is investigated. The values of the statistical characteristic parameters of PM in the atmosphere of Tbilisi are analyzed and trend graphs are constructed. By means of analysis of pollution levels in the quarantine and usual periods the proportion of vehicle traffic in pollution of city is estimated. Experimental measurements of PM2.5, PM10 in the atmosphere have been carried out in different districts of the city and map of the distribution of their concentrations were constructed. It is shown that maximum pollution values are recorded in the city center and along major motorways. It is shown that the average monthly concentrations vary in the range of 0.6-1.6 Maximum Permissible Concentration (MPC). Average daily values of concentration vary at 2-4 days intervals. The distribution of PM10 generated as a result of traffic is numerical modeled. The modeling results are compared with the observation data.

The Impact of ISO 9001 Certification on Brazilian Firms’ Performance: Insights from Multiple Case Studies

The evolution of quality management by companies was strongly enabled by, among others, ISO 9001 certification, which is considered a crucial requirement for several customers. Likewise, performance measurement provides useful insights for companies to identify the reflection of their decision-making process on their improvement. One of the most used performance measurement models is the balanced scorecard (BSC), which uses four perspectives to address a firm’s performance: financial, internal process, customer satisfaction, and learning and growth. Since ISO 9001 certified firms are likely to measure their performance through BSC approach, it is important to verify whether the certificate influences the firm performance or not. Therefore, this paper aims to verify the impact of ISO 9001:2015 on Brazilian firms’ performance based on the BSC perspective. Hence, nine certified companies located in the Southeast region of Brazil were studied through a multiple case study approach. Within this study, it was possible to identify the positive impact of ISO 9001 on firms’ overall performance, and four Critical Success Factors (CSFs) were identified as relevant on the linkage among ISO 9001 and firms’ performance: employee involvement, top management, process management, and customer focus. Due to the COVID-19 pandemic, the number of interviews was limited to the quality manager specialist, and the sample was limited since several companies were closed during the period of the study. This study presents an in-depth analysis of how the relationship between ISO 9001 certification and firms’ performance in a developing country is.

The Art of Leadership: Skills to Inspire the Team to Overcome Project Challenges and Achieve Their Goals

This paper highlights skills that a leader needs to acquire to lead a team successfully. With an appropriate vision and strategy, a team can be inspired, influenced and easily led. The importance of setting codes of conduct and establishing mutual agreements between the team members can help in minimizing issues and improving overall productivity. Leadership skills include the power of questioning (PoQ), effective communication, identification of team member responsibilities, and assessment of self and the team. This paper will highlight the impact of good leadership on work progress and overall team performance. The paper explains how leaders make correct decisions by avoiding hasty actions that could generate new errors, mistakes, and issues. The importance of positive expectations for the team is addressed in this paper that could result in efficient control of the work with better outcomes.

Speedup Breadth-First Search by Graph Ordering

Breadth-First Search (BFS) is a core graph algorithm that is widely used for graph analysis. As it is frequently used in many graph applications, improving the BFS performance is essential. In this paper, we present a graph ordering method that could reorder the graph nodes to achieve better data locality, thus, improving the BFS performance. Our method is based on an observation that the sibling relationships will dominate the cache access pattern during the BFS traversal. Therefore, we propose a frequency-based model to construct the graph order. First, we optimize the graph order according to the nodes’ visit frequency. Nodes with high visit frequency will be processed in priority. Second, we try to maximize the child nodes’ overlap layer by layer. As it is proved to be NP-hard, we propose a heuristic method that could greatly reduce the preprocessing overheads.We conduct extensive experiments on 16 real-world datasets. The result shows that our method could achieve comparable performance with the state-of-the-art methods while the graph ordering overheads are only about 1/15.

Machine Learning for Music Aesthetic Annotation Using MIDI Format: A Harmony-Based Classification Approach

Swimming with the tide of deep learning, the field of music information retrieval (MIR) experiences parallel development and a sheer variety of feature-learning models has been applied to music classification and tagging tasks. Among those learning techniques, the deep convolutional neural networks (CNNs) have been widespreadly used with better performance than the traditional approach especially in music genre classification and prediction. However, regarding the music recommendation, there is a large semantic gap between the corresponding audio genres and the various aspects of a song that influence user preference. In our study, aiming to bridge the gap, we strive to construct an automatic music aesthetic annotation model with MIDI format for better comparison and measurement of the similarity between music pieces in the way of harmonic analysis. We use the matrix of qualification converted from MIDI files as input to train two different classifiers, support vector machine (SVM) and Decision Tree (DT). Experimental results in performance of a tag prediction task have shown that both learning algorithms are capable of extracting high-level properties in an end-to end manner from music information. The proposed model is helpful to learn the audience taste and then the resulting recommendations are likely to appeal to a niche consumer.