Teaching Science Content Area Literacy to 21st Century Learners

Science teachers need to balance the use of new literacies so that they teach different forms of communication while also assessing content area proficiency. New literacies such as Twitter and Facebook need to be incorporated into science content area literacy studies alongside generally accepted forms of scientific presentation, including scientific papers and textbooks. The research question this literature review seeks to answer is “What are some ways in which new forms of literacy are better suited to teach scientific content area literacy to 21st century learners?” The review highlights methods currently being used to educate the next wave of learners in science content area literacy. Both temporal discourse analysis (TDA) and critical discourse analysis (CDA) were used to determine the need to use new literacies to teach science content area literacy. Increased use of digital technologies and a change in science content area pedagogy were explored.

The Use of Symbolic Signs in Modern Ukrainian Monumental Church Painting: Classification and Hidden Semantics

Monumental church paintings are often perceived either as the interior decoration of the temple or as the "Gospel for the illiterate," as the temple painting often contains scenes from Holy Scripture. In scholarship, Orthodox church painting is studied mainly by art critics; from the point of view of culturology and semiotics it remains insufficiently studied, and its symbolism is insufficiently revealed. The aim of this paper is to describe symbolic signs, to classify them, and to give examples of each type of sign from the paintings of modern temples of Eastern Ukraine, on the basis of semiotic analysis of the iconographic plots used in monumental church painting. We offer our own classification of the symbols of monumental church painting, using examples from the murals of modern Orthodox churches in Eastern Ukraine, mainly from the Donetsk region. In analyzing the semantics of symbolic signs, the following methods of the culturological approach were used: semiotic, iconological, iconographic, hermeneutic, culturological, descriptive, comparative-historical, and visual-analytical. In interpreting the meanings of symbolic signs, scientific, cultural, and theological literature was used. Photos taken by the author have been added to the article.

Controlled Vocabularies and Information Retrieval: 1918 Pandemic’s Scientific Literature as an Example

The role of controlled vocabularies in information retrieval is broadly recognized. There is also a standing demand that editors and databases introduce controlled vocabularies effectively into their procedures for indexing scientific literature. This is especially important because information retrieval is regarded as a key step in conducting systematic literature reviews. Hence, a first question emerges: are controlled vocabularies currently being taken into account? In addition, subject searching in catalogs is complex, mainly because of the dichotomy between author-supplied keywords and keywords based on controlled vocabularies. Finally, there is some demand to unify health-related terminology in order to facilitate the exploitation of, and research into, medical history. Considering these features, this paper focuses on controlled vocabularies related to the health field and their role in storing, classifying, and retrieving relevant literature. The objective is to determine what role health-related controlled vocabularies play in indexing and retrieving research literature in databases such as Web of Science (WoS) and Scopus. This exploratory research is therefore grounded in two research questions: 1) Which terms are included in specific controlled vocabularies of the health field? and 2) How are papers indexed in relevant databases so that they can be easily retrieved, considering author keywords versus health-specific controlled vocabularies? This research takes as its field of study the controlled vocabularies related to health and the scientific interest in the 1918 flu pandemic, also known, mistakenly, as the ‘Spanish flu’. This interest has been fostered by the emergence in the early 21st century of epidemics of pneumonic diseases caused by viruses. Searches about and with controlled vocabularies are conducted on the WoS and Scopus databases. The first results of this work in progress are surprising.
There are different controlled vocabularies for the health field, within which the collected and preferred terms related to the ‘1918 pandemic’ are identified. In summary, ‘Spanish influenza epidemic’ and ‘Spanish flu’ are recorded as non-preferred terms. The preferred terms are ‘influenza’ and ‘influenza pandemic, 1918-1919’. Although the controlled vocabularies are clear in their choice, most of the literature about the ‘1918 pandemic’ is retrievable by either ‘Spanish’ or ‘1918’ as separate search terms, and the dominant word for retrieving literature is ‘Spanish’ rather than ‘1918’. This is surprising considering the existence of suitable controlled vocabularies for health topics and the modern World Health Organization guidelines on the naming of diseases, which point to other preferred terms. A first conclusion is that controlled vocabularies are failing to be used in a field such as health, and consequently in WoS and Scopus. This research opens further questions about the role that controlled vocabularies play in the instructions that journals give to authors.

Optimizing Data Evaluation Metrics for Fraud Detection Using Machine Learning

The use of technology has benefited society in more ways than one ever thought possible. Unfortunately, as society’s knowledge of technology has advanced, so has its knowledge of ways to use technology to manipulate others. This has led to a simultaneous advancement in the world of fraud. Machine learning techniques offer a possible way to counter this trend. This research explores how various machine learning techniques can aid in detecting fraudulent activity across two different types of fraud datasets; accuracy, precision, recall, and F1 score were recorded for each method. Each machine learning model was also tested across five different training and testing splits in order to discover which split and technique would lead to the best results.
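The four evaluation metrics named above can be illustrated with a minimal sketch. The abstract does not describe the models, the datasets, or the tooling used, so the function name `classification_metrics` and the toy labels below are hypothetical and serve only to show how the metrics are derived from a binary confusion matrix (1 = fraudulent, 0 = legitimate).

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 for binary labels (1 = fraud)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fraud correctly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # legitimate correctly passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # legitimate wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fraud missed

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical toy labels, not data from the study.
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1, 0, 1, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Because fraud datasets are usually imbalanced, accuracy alone can look high even when most fraud is missed, which is one reason to report precision, recall, and F1 alongside it.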

The Use of Artificial Intelligence in Digital Forensics and Incident Response in a Constrained Environment

Digital investigators often have a hard time spotting evidence in digital information. It has become hard to determine which source of proof relates to a specific investigation. A growing concern is that the various processes, technology, and specific procedures used in digital investigation are not keeping up with criminal developments. Criminals are therefore taking advantage of these weaknesses to commit further crimes. In digital forensics investigations, artificial intelligence (AI) is invaluable in identifying crime. The goal of digital forensics and digital investigation is to provide objective data and conduct an assessment, which assists in developing a plausible theory that can be presented as evidence in court. This research paper aims at developing a multiagent framework for digital investigations using specific intelligent software agents (ISAs). The agents communicate to address particular tasks jointly and keep the same objectives in mind during each task. The rules and knowledge contained within each agent depend on the investigation type. A criminal investigation is classified quickly and efficiently using the case-based reasoning (CBR) technique. The proposed framework is implemented using the Java Agent Development Framework, Eclipse, a Postgres repository, and a rule engine for agent reasoning. The proposed framework was tested using the Lone Wolf image files and datasets. Experiments were conducted using various sets of ISAs and VMs. There was a significant reduction in the time taken for the Hash Set Agent to execute. As a result of loading the agents, 5% of the time was lost, as the File Path Agent prescribed deleting 1,510, while the Timeline Agent found multiple executable files. In comparison, the integrity check carried out on the Lone Wolf image file using a digital forensic tool kit took approximately 48 minutes (2,880 s), whereas the MADIK framework accomplished this in 16 minutes (960 s).
The framework is integrated with Python, allowing for further integration of other digital forensic tools, such as AccessData Forensic Toolkit (FTK), Wireshark, Volatility, and Scapy.

Atherosclerosis Prevalence within Populations of the Southeastern United States

A prevalence cohort study of atherosclerotic lesions in cadavers was performed to better understand and characterize the prevalence of atherosclerosis among Georgia residents in the Philadelphia College of Osteopathic Medicine (PCOM) - Georgia body donor program. We procured specimens from cadavers used for anatomical dissection by medical, physical therapy, and biomedical science students at PCOM - South Georgia and PCOM - Georgia. Tissues were prepared as histological slides using hematoxylin and eosin (H&E) stain by Colquitt Regional Medical Center Laboratory Services. After cadaveric dissection, one section was taken from each of the following arteries at the site of greatest grossly palpable calcification (if present): left anterior descending coronary artery, left internal carotid artery, abdominal aorta, splenic artery, and hepatic artery. All specimens were graded and categorized according to the American Heart Association’s Modified and Conventional Standards for Atherosclerotic Lesions using x4, x10, and x40 microscopic magnification. Our study cohort included 22 cadavers: 16 females and 6 males. The average age was 72.54 years and the median age was 72 years, with a range of 52 to 90 years. Vascular and/or cardiovascular causes of death were listed on 6 of the 22 death certificates. Of the 22 cadavers, 19 (86%) had at least one artery graded greater than 5. Of these 19 cadavers, only 5 (26%) had a vascular or cardiovascular cause of death reported. Malignancy was listed as a cause of death on 7 (32%) of the death certificates. The average atherosclerosis grades of the common hepatic, splenic, and left internal carotid arteries (2.15, 3.05, and 3.36, respectively) were lower than those of the left anterior descending artery and the abdominal aorta (5.16 and 5.86, respectively).
This prevalence study characterizes atherosclerosis found in five medium and large systemic arteries within cadavers from the state of Georgia.

Attitudes of Gratitude: An Analysis of 30 Cancer Narratives Published by Leading U.S. Cancer Care Centers

This study examines the ways in which cancer patient narratives are portrayed and framed on the websites of three leading U.S. cancer care centers – The University of Texas MD Anderson Cancer Center in Houston, Memorial Sloan Kettering Cancer Center in New York, and Seattle Cancer Care Alliance. Thirty patient stories, 10 from each cancer center website blog, were analyzed using qualitative and quantitative textual analysis of unstructured data, documenting common themes and other elements of story structure and content. Patient narratives were coded using grounded theory as the basis for conducting emergent qualitative research. As part of a systematic, inductive approach to collecting and analyzing data, recurrent and unique themes were examined and compared in terms of positive and negative framing, patient agency, and institutional praise. All three of these cancer care centers are teaching hospitals with university affiliations that emphasize an evidence-based scientific approach to treatment, drawing on the latest research and cutting-edge techniques and technology. The featured cancer stories suggest positive outcomes based on anecdotal narratives, as opposed to the science-based treatment models employed by the cancer centers. The analysis of the 30 sample stories found a skewed representation of the “cancer experience” that emphasizes positive outcomes while minimizing or excluding more negative realities of cancer diagnosis and treatment. The stories also deemphasize patient agency, instead focusing on deference and gratitude toward the cancer care centers, which are cast in the role of savior.

Knowledge, Attitude and Practice of Pregnant Women toward Antenatal Care at Public Hospitals in Sana'a City-Yemen

Background: Antenatal care can be defined as the care provided by skilled healthcare professionals to pregnant women and adolescent girls to ensure the best health conditions for both mother and baby during pregnancy. The components of antenatal care (ANC) include risk identification; prevention and management of pregnancy-related or concurrent diseases; and health education and health promotion. Aim: To assess the knowledge, attitude, and practice of pregnant women regarding ANC. Methodology: A descriptive knowledge, attitude, and practice (KAP) study was conducted in public hospitals in Sana'a City, Yemen. The study population included all pregnant women who attended the prenatal department and the outpatient clinic; the final sample size was 371 pregnant women. A self-administered questionnaire was used to collect the data, and the Statistical Package for the Social Sciences (SPSS) was used for data analysis. Results: Most (79%) of the pregnant women answered the knowledge items regarding ANC correctly, about two-thirds (67%) followed ANC practices, and about two-thirds (68%) had a positive attitude. Conclusions: More than three quarters of the pregnant women had a good knowledge level, most had a moderate practice level, and more than two-thirds had a positive attitude regarding antenatal care. There was a statistically significant association between overall knowledge and practice level toward ANC and the demographic characteristics of the pregnant women (P-value ≤ 0.05). Recommendations: We recommend more education and training courses, lectures, and education sessions in clinical facilities focused on ANC, which relies on evidence-based interventions provided to women during pregnancy by skilled healthcare providers such as midwives, doctors, and nurses.

Identifying Chaotic Architecture: Origins of Nonlinear Design Theory

With the emergence of modern architecture, an aggressive desire for new design theories appeared in the works of architects and critics. The discourse of complexity and volumetric composition became an important and controversial issue in the discipline of architecture, and it was discussed from a general point of view in Robert Venturi's 1966 book “Complexity and Contradiction in Architecture”. This paper attempts to identify chaos theory as a scientific model of complexity and its relation to architectural design theory by conducting a qualitative analysis and taking a multidisciplinary critical approach to resources from architecture and the basic sciences. Accordingly, we identify chaotic architecture as the correlation between chaos theory and the discipline of architecture, and as an independent nonlinear design theory with specific characteristics and properties.

Auditory Brainstem Response in Wave VI for the Detection of Learning Disabilities

Brainstem auditory evoked potentials (BAEP) are commonly used to study hearing function; by studying the behaviour of wave VI, they also offer a way to learn about the functioning of some of the brain's neuronal groups that intervene in the learning process. The latest advances in neuroscience have revealed the existence of different brain activity in the learning process, which can be highlighted through innocuous, low-cost, and easily accessible techniques such as BAEP, among others, and which can help to detect possible neurodevelopmental difficulties early for subsequent assessment and treatment. To date, and to the best of the authors' knowledge, only latency data from waves I to V, mainly in the left ear, have been taken into account. This work shows that it is essential to consider both ears; with these latest data, it has been possible to diagnose more precisely some cases that had previously been diagnosed as “normal” despite showing signs of some alteration that motivated a new consultation with the specialist.

The Latency-Amplitude Binomial of Waves Resulting from the Application of Evoked Potentials for the Diagnosis of Dyscalculia

Recent advances in cognitive neuroscience have allowed a step forward in understanding the processes involved in learning, from the point of view of acquiring new information or modifying existing mental content. The evoked potentials technique reveals how basic brain processes interact to achieve adequate and flexible behaviours. The objective of this work is to study, using evoked potentials, whether it is possible to determine if a patient suffers from a specific type of learning disorder in order to decide which therapies to follow. The methodology is to analyze the dynamics of different brain areas during a cognitive activity and to find the relationships between the areas analyzed, in order to better understand the functioning of neural networks. Also, the latest advances in neuroscience have revealed the existence of different brain activity in the learning process, which can be highlighted through non-invasive, innocuous, low-cost, and easily accessible techniques such as evoked potentials, among others, and which can help to detect possible neurodevelopmental difficulties early for subsequent assessment and therapy. From the study of the amplitudes and latencies of the evoked potentials, it is possible to detect brain alterations in the learning process, specifically in dyscalculia, and so to arrive at specific corrective measures for applying personalized psycho-pedagogical plans that support the optimal integral development of the affected people.

Engineering Topology of Photonic Systems for Sustainable Molecular Structure: Autopoiesis Systems

This paper introduces topological order in described social systems, starting with the original concept of autopoiesis developed by biologists and scientists and including the modification of general systems based on socialized medicine. Topological order is important in describing physical systems for exploiting optical systems and improving photonic devices. The states of topological order have interesting properties, such as topological degeneracy and fractional statistics, that reveal the entanglement origin of topological order. Topological ideas in photonics build on exciting developments in solid-state materials that are insulating in the bulk yet conduct electricity on their surface without dissipation or back-scattering, even in the presence of large impurities. A specific type of autopoiesis system is interrelated with the main categories among existing groups of ecological phenomena interacting with the social and medical sciences. The hypothesis, nevertheless, has a nonlinear interaction with its natural environment, an ‘interactional cycle’ for exchanging photon energy with molecules without changes in topology (i.e., chemical transformations into products do not propagate any changes or variation in the network topology of the physical configuration). The engineering topology of a biosensor is based on the excitation of surface electromagnetic waves at the boundary of photonic band gap multilayer films. The device operation is similar to that of surface plasmonic biosensors, except that a photonic band gap film replaces the metal film as the medium in which surface electromagnetic waves are excited. The use of photonic band gap film offers sharper surface wave resonance, leading to the potential for greatly enhanced sensitivity. The properties of the photonic band gap material are thus engineered to operate a sensor at any wavelength and conduct a surface wave resonance that ranges up to 470 nm, a wavelength that is not generally accessible with surface plasmon sensing.
Lastly, the photonic band gap films have robust mechanical functions that offer new substrates for surface chemistry to understand the molecular design structure, and to create sensing chip surfaces with different concentrations of DNA sequences in the solution in order to observe and track the surface mode resonance under the influence of processes that take place in the spectroscopic environment. These processes led to the development of several advanced analytical technologies, which are automated, real-time, reliable, reproducible, and cost-effective. This results in faster and more accurate monitoring and detection of biomolecules through refractive index sensing and antibody–antigen reactions with DNA or protein binding. Ultimately, the controversial aspects of molecular frictional properties are adjusted to each other in order to form the unique spatial structure and dynamics of biological molecules, providing the environment's mutual contribution to the investigation of changes due to the pathogenic archival architecture of cell clusters.

Review and Evaluation of Trending Canonical Correlation Analyses-Based Brain-Computer Interface Methods

The fast development of technology that has advanced neuroscience and human interaction with computers has enabled solutions to various problems of this new era. The Brain-Computer Interface (BCI) has opened the door to several new research areas and has provided solutions to critical issues such as helping a paralyzed patient interact with the outside world, controlling a robot arm, playing games in VR with the brain, and driving a wheelchair. This review presents the state-of-the-art methods and improvements of canonical correlation analysis (CCA), a BCI method based on steady-state visual evoked potentials (SSVEP). These methods are used to extract EEG signal features, that is, the features of interest sought in EEG analyses. Each method, from oldest to newest, is discussed and its advantages and disadvantages are compared. This provides context and helps researchers understand the state-of-the-art methods available in this field, their pros and cons, and their mathematical representations and usage. This work contributes to the field and differs from other similar recently published works by providing the following: (1) a hierarchical presentation of most of the main methods used in this field, (2) an explanation of the pros and cons of each method and its performance, and (3) a presentation, at the end of each method, of the existing gaps, which can improve understanding and open doors to new research or improvements.

Capacities of Early Childhood Education Professionals for the Prevention of Social Exclusion of Children

Both policymakers and researchers recognize that participating in early childhood education and care (ECEC) is useful for all children, especially for those who are exposed to a high risk of social exclusion. Social exclusion of children is understood as a multidimensional construct including economic, social, cultural, health, and other aspects of disadvantage and deprivation, which individually or combined can have an unfavorable effect on the current life and development of a child, as well as on the child’s life chances in adulthood. ECEC institutions should be able to promote educational approaches that reflect developmental, cultural, language, and other diversity among children. However, little is known about the ways in which Croatian ECEC institutions recognize and respect the diversity of children and their families and how they respond to their educational needs. This paper is therefore dedicated to analyzing the capacities of ECEC professionals to respond to the educational needs of this very diverse group of children and their families. It presents results obtained within the project “Models of response to educational needs of children at risk of social exclusion in ECEC institutions,” funded by the Croatian Science Foundation. The research methodology arises from explanations of educational processes and of the risks of social exclusion as a complex and heterogeneous phenomenon. Preliminary results are presented from the qualitative analysis of data on educational practices, focusing on capacities to identify and appropriately respond to the requirements of children at risk of social exclusion. The data were collected by interviewing educational staff in 10 Croatian ECEC institutions (n = 10). The interview questions related to various aspects of inclusive institutional policy, culture, and practices.
According to the analysis, it is possible to conclude that Croatian ECEC professionals still face great challenges in implementing inclusive policies, culture, and practices. This conclusion rests on several grounds: the interviewed educational professionals are not familiar enough with the full complexity and diversity of the needs of children at risk of social exclusion, and the ECEC institutions do not have enough resources to provide all the interventions that these children and their families need.

The Use of Knowledge Management Systems and ICT Service Desk Management to Minimize the Digital Divide Experienced in the Museum Sector

Since the introduction of ServiceNow, the ICT service desk portal of the UK’s Science Museum Group (SMG), there has been no analysis of the tools available to SMG staff for just-in-time knowledge acquisition (knowledge management systems) and for reporting ICT incidents that focuses on one aspect of professional identity, namely gender. It is therefore important for SMG to investigate any apparent disparities so that solutions can be derived to minimize this digital divide, if one exists. This study is conducted in the milieu of the UK museums, galleries, arts, academic, charitable, and cultural heritage sector. It is acknowledged at SMG that there are challenges in keeping up with an ever-changing digital landscape. Consequently, this entails rapidly upskilling staff and developing an infrastructure that supports just-in-time technological knowledge acquisition and the reporting of technology-related issues. This problem was addressed by analysing ServiceNow ICT incident reports and knowledge article reports from a six-month period, February to July. The study found a statistically significant relationship between gender and reporting an ICT incident. There is also a significant relationship between gender and the priority level of the ICT incident. Interestingly, there is no statistically significant relationship between gender and reading knowledge articles. Additionally, there is no statistically significant relationship between gender and reporting an ICT incident related to the knowledge article that was read by staff. The knowledge acquired from this study is useful to service desk management practice, as it will help to inform the creation of future knowledge articles and ICT incident reporting processes.

Incorporating Lexical-Semantic Knowledge into Convolutional Neural Network Framework for Pediatric Disease Diagnosis

The utilization of electronic medical record (EMR) data to build disease diagnosis models has become an important research topic in biomedical informatics. Deep learning can automatically extract features from massive data, which has brought breakthroughs in the study of EMR data. The challenge is that deep learning lacks semantic knowledge, which limits its practicality in medical science. This research proposes a method of incorporating lexical-semantic knowledge from abundant entities into a convolutional neural network (CNN) framework for pediatric disease diagnosis. Firstly, medical terms are vectorized into Lexical Semantic Vectors (LSV), which are concatenated with the embedded word vectors of word2vec to enrich the feature representation. Secondly, the semantic distribution of medical terms serves as a Semantic Decision Guide (SDG) for the optimization of deep learning models. The study evaluates the performance of the LSV-SDG-CNN model on four kinds of Chinese EMR datasets. Additionally, CNN, LSV-CNN, and SDG-CNN are designed as baseline models for comparison. The experimental results show that the LSV-SDG-CNN model outperforms the baseline models on the four kinds of Chinese EMR datasets. The best configuration of the model yielded an F1 score of 86.20%. The results demonstrate that the CNN has been effectively guided and optimized by lexical-semantic knowledge, and that the LSV-SDG-CNN model improves disease classification accuracy by a clear margin.

Modern Tragic Substance in O’Neill’s Desire under the Elms and Mourning Becomes Electra

The position Eugene O’Neill occupies in the history of American drama is indisputable. Critics have agreed that the American theatre was waiting for O’Neill to give it substance, character, and value. The American dramatist continues to be considered a major influence on the body of dramatic repertoire across the globe. The American theatre before O’Neill knew playwrights who were mostly viewed as entertainers. Serious drama had to wait until O’Neill started his career with expressionistic and social drama. His breakthrough, however, came in 1925 when he published Desire Under the Elms, described as the first important tragedy to be written in America. Mourning Becomes Electra, published in 1931, further reinforced the reputation of Eugene O’Neill and was described as his 'magnum opus'. Aspiring to portray the essence of life and man’s innermost conflicts, O’Neill turned to the classical model, rather than to social realistic drama, to create modern tragedies with the aid of the then-new science of psychology. The present paper undertakes an in-depth study of how overtones from the tragedies of the classical masters Aeschylus, Sophocles, and Euripides resonate through O’Neill’s two plays. The paper shows how leaning on classical themes and concepts, interpreted in terms of psychological forces, has added depth and tragic substance to a modern milieu and produced masterpieces of dramaturgy.

Aircraft Selection Process Using Preference Analysis for Reference Ideal Solution (PARIS)

Multiple criteria decision making analysis (MCDMA) methods are applied to many real-life problems in different fields of engineering science and technology. The "preference analysis for reference ideal solution (PARIS)" method is proposed for an efficient MCDMA evaluation of decision problems. The multiple criteria aircraft evaluation approach integrates the mean weight, entropy weight, PARIS, and TOPSIS methods, which eliminates the subjective importance weight assignment process. The evaluation criteria were identified from an extensive literature review of the aircraft selection process. The aim of this study is to propose an efficient methodology for handling the aircraft selection process, in which the proposed method effectively solves the MCDMA problem. A numerical example is presented to demonstrate the applicability and validity of the proposed MCDMA approach.
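Two of the building blocks named above, entropy weighting and TOPSIS, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's procedure: the PARIS step itself is not reproduced because the abstract does not define it, the function names are hypothetical, all criterion values are assumed positive with at least two alternatives, and the aircraft figures are invented toy data.

```python
import math


def entropy_weights(matrix):
    """Objective criterion weights from the Shannon entropy of each column."""
    m = len(matrix)  # number of alternatives (assumed >= 2)
    divergences = []
    for column in zip(*matrix):
        total = sum(column)
        shares = [value / total for value in column]
        entropy = -sum(p * math.log(p) for p in shares if p > 0) / math.log(m)
        divergences.append(1.0 - entropy)  # more spread in a column -> more weight
    spread = sum(divergences)
    if spread == 0:  # every criterion is constant: fall back to equal weights
        return [1.0 / len(divergences)] * len(divergences)
    return [d / spread for d in divergences]


def topsis_scores(matrix, weights, is_benefit):
    """Closeness of each alternative to the ideal solution (1 = best, 0 = worst)."""
    n = len(weights)
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n)] for row in matrix]
    columns = list(zip(*weighted))
    best = [max(c) if is_benefit[j] else min(c) for j, c in enumerate(columns)]
    worst = [min(c) if is_benefit[j] else max(c) for j, c in enumerate(columns)]
    scores = []
    for row in weighted:
        d_best, d_worst = math.dist(row, best), math.dist(row, worst)
        denom = d_best + d_worst
        scores.append(d_worst / denom if denom else 0.0)
    return scores


# Hypothetical aircraft: [price (cost), range (benefit), seats (benefit)].
aircraft = {"A": [100, 6000, 180], "B": [90, 7000, 200], "C": [110, 5500, 170]}
is_benefit = [False, True, True]
matrix = list(aircraft.values())
weights = entropy_weights(matrix)
scores = topsis_scores(matrix, weights, is_benefit)
ranking = sorted(zip(aircraft, scores), key=lambda pair: pair[1], reverse=True)
print(ranking)
```

In this toy example alternative B is best on every criterion and C is worst on every criterion, so they coincide with the ideal and anti-ideal points; real decision matrices rarely contain such a dominant option, which is where the weighting step matters.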

Scholar Index for Research Performance Evaluation Using Multiple Criteria Decision Making Analysis

This paper presents an objective quantitative methodology for evaluating an individual’s scholarly research output using multiple criteria decision analysis. A multiple criteria decision making analysis (MCDMA) methodological process is adopted to build a multiple criteria evaluation model. The scholar index is a cumulative research citation index that conveys significant information about a researcher's productivity and the scholarly impact of his or her publications in a single number (s is the number of publications with at least s citations each). It is included in citation databases to cover the multidimensional complexity of scholarly research performance and to support objective evaluations. The scholar index, one of the publication activity indexes, is analyzed as the most appropriate scientometric indicator, one that makes it possible to smooth over many drawbacks of assessing scholarly output by merely counting publications (quantity) and citations (quality). Hence, this study includes a set of scholar-index-based indicators to be used for evaluating scholarly researchers. The Google Scholar open science database was used to assess and discuss the scholarly productivity and impact of researchers. Based on an experiment computing the scholar index and its derivative indexes for a set of researchers on an open research database platform, quantitative methods of assessing scholarly research output were successfully applied to rank researchers. The proposed methodology covers the ranking, the selection of data on which a scholarly research performance evaluation is based, the analysis of the data, and the presentation of the multiple criteria analysis results.
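The definition of the scholar index given above can be made concrete with a short sketch. It assumes the usual reading of such a definition, namely that s is the largest number for which s publications have at least s citations each; the function name `scholar_index` and the citation counts are hypothetical.

```python
def scholar_index(citations):
    """Largest s such that s publications have at least s citations each."""
    ranked = sorted(citations, reverse=True)  # most-cited publication first
    s = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            s = rank  # the top `rank` papers all have >= rank citations
        else:
            break
    return s


# Hypothetical citation counts for two researchers.
researcher_a = [10, 8, 5, 4, 3]
researcher_b = [25, 8, 5, 3, 3]
print(scholar_index(researcher_a), scholar_index(researcher_b))
```

The second researcher has more total citations but a lower index, which illustrates how a single highly cited paper does not dominate this kind of indicator.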

Index t-SNE: Tracking Dynamics of High-Dimensional Datasets with Coherent Embeddings

t-SNE is an embedding method widely used by the data science community. It supports two main tasks: displaying results by coloring items according to their class or feature value, and, in forensic analysis, giving a first overview of the dataset distribution. Two interesting characteristics of t-SNE are its structure preservation property and its answer to the crowding problem, in which all neighbors in the high-dimensional space cannot be represented correctly in the low-dimensional space. t-SNE preserves the local neighborhood, and similar items are nicely spaced by adjusting to the local density. These two characteristics produce a meaningful representation, where the cluster area is proportional to the number of items it contains, and relationships between clusters are materialized by closeness in the embedding. The algorithm is non-parametric: the transformation from the high- to the low-dimensional space is described but not learned, so two initializations of the algorithm lead to two different embeddings. In a forensic approach, analysts would like to compare two or more datasets using their embeddings. A naive approach would be to embed all datasets together. However, this process is costly, as the complexity of t-SNE is quadratic, and it would be infeasible for too many datasets. Another approach would be to learn a parametric model over an embedding built with a subset of the data. While this approach is highly scalable, points could be mapped to exactly the same position, making them indistinguishable. This type of model would be unable to adapt to new outliers or concept drift. This paper presents a methodology to reuse an embedding to create a new one, where cluster positions are preserved. The optimization process minimizes two costs, one relative to the embedding shape and the second relative to the match with the support embedding. The embedding with the support process can be repeated more than once, with the newly obtained embedding.
The successive embeddings can be used to study the impact of one variable on the dataset distribution or to monitor changes over time. This method has the same complexity as t-SNE per embedding, and memory requirements are only doubled. For a dataset of n elements sorted and split into k subsets, the total embedding complexity would be reduced from O(n^2) to O(n^2/k), and the memory requirement from n^2 to 2(n/k)^2, which enables computation on recent laptops. The method showed promising results on a real-world dataset, making it possible to observe the birth, evolution, and death of clusters. The proposed approach facilitates identifying significant trends and changes, which empowers the monitoring of the dynamics of high-dimensional datasets.
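The complexity figures above follow from simple arithmetic: each of the k subsets holds n/k points, so one embedding costs on the order of (n/k)^2 pairwise terms, and k successive embeddings cost k(n/k)^2 = n^2/k. The sketch below only reproduces this counting, with the factor 2 for memory taken from the abstract. The function name and the example sizes are hypothetical, and constant factors are ignored.

```python
def embedding_costs(n, k):
    """Pairwise-term counts for one joint embedding versus k successive ones."""
    if n % k != 0:
        raise ValueError("this sketch assumes n is divisible by k")
    subset = n // k
    return {
        "joint_time": n ** 2,             # one t-SNE run over all n points
        "split_time": k * subset ** 2,    # k runs over n/k points each, equal to n^2/k
        "joint_memory": n ** 2,
        "split_memory": 2 * subset ** 2,  # current subset plus its support embedding
    }

# Hypothetical sizes: 100,000 points split into 10 subsets.
costs = embedding_costs(100_000, 10)
```

With these example sizes, the split approach needs a tenth of the pairwise terms in total and a fiftieth of the memory at any one time.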