Measuring Text-Based Semantics Relatedness Using WordNet

Measuring semantic similarity between texts means calculating the semantic relatedness between them using various techniques. Our web application, Measuring Relatedness of Concepts (MRC), allows the user to input two text corpora and obtain a semantic similarity percentage between them using WordNet. The application computes semantic relatedness in five stages: Preprocessing (extracting keywords from the content), Feature Extraction (classifying words into parts of speech), Synonym Extraction (retrieving synonyms for each keyword), Similarity Measurement (measuring similarity from the keywords and their synonyms), and Visualization (graphical representation of the similarity measure). Hence, the user can also measure similarity on the basis of features. The end result is a percentage score together with the word(s) that form the basis of the similarity between the two texts, obtained with different tools on the same platform. In future work, we plan to develop an application that uses the Web as a live corpus and provides a simpler, more user-friendly tool for comparing documents and extracting useful information.
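
The keyword, synonym, and percentage stages can be illustrated with a minimal sketch. It is not MRC's implementation: the stop-word list and the `SYNONYMS` table are hypothetical stand-ins for WordNet lookups (in a real system the synonym sets could come from NLTK's `wordnet.synsets()` and each synset's `lemma_names()`), and the scoring rule (share of keywords in both texts that match a word or synonym of the other text) is an assumption. Part-of-speech tagging and visualization are omitted.

```python
import re

# Hypothetical stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in", "on", "for"}

# Hypothetical stand-in for WordNet synonym lookups, hard-coded so the
# sketch runs without downloading a corpus.
SYNONYMS = {
    "car": {"automobile", "auto"},
    "automobile": {"car", "auto"},
    "fast": {"quick", "rapid"},
    "quick": {"fast", "rapid"},
}

def preprocess(text):
    """Preprocessing: lowercase, tokenize, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def expand(keywords):
    """Synonym extraction: map each keyword to itself plus its synonyms."""
    return {k: {k} | SYNONYMS.get(k, set()) for k in keywords}

def similarity(text_a, text_b):
    """Return (percentage, matched words): the share of distinct keywords in
    both texts that share a word or synonym with the other text."""
    exp_a = expand(preprocess(text_a))
    exp_b = expand(preprocess(text_b))
    if not exp_a or not exp_b:
        return 0.0, set()
    vocab_a = set().union(*exp_a.values())
    vocab_b = set().union(*exp_b.values())
    matched_a = {k for k, syns in exp_a.items() if syns & vocab_b}
    matched_b = {k for k, syns in exp_b.items() if syns & vocab_a}
    score = 100.0 * (len(matched_a) + len(matched_b)) / (len(exp_a) + len(exp_b))
    return score, matched_a | matched_b
```

For example, `similarity("The car is fast", "An automobile is quick")` scores 100% because every keyword is matched through a synonym, and the matched words are returned as the basis of the similarity.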

Modeling of Alpha-Particles’ Epigenetic Effects in Short-Term Test on Drosophila melanogaster

In recent years, interest in ecogenetic and biomedical problems related to the effects of radon and its daughter decay products on the population has increased significantly. Of particular interest is the assessment of the consequences of irradiation in radon-hazardous areas, which include the Almaty region because its many tectonic faults enhance radon emanation. Accordingly, the purpose of this work was to study the genetic effects of exposure to supernormal radon doses using an alpha-radiation model. Irradiation does not affect the growth of the cell but rather its ability to differentiate. In addition, irradiation can lead to somatic mutations, morphoses, and modifications. This damage most likely arises from changes in the composition of cellular substances. Such changes are epigenetic because they affect the regulatory processes of ontogenesis. Variability in the expression of regulatory genes corresponds to conditional mutations that modify the formation of traits of intraspecific similarity. These conditional mutations are characterized by a dominant type of manifestation, phenotypic asymmetry, and instability across generations. Currently, the terms “morphosis” and “modification” are used to describe epigenetic variability, which is maintained in Drosophila melanogaster cultures using attached (linked) X chromosomes, with the mutant X chromosome transmitted along the paternal line. In this paper, we investigated the epigenetic effects of alpha particles, whose main natural source is radon and its daughter decay products. In the experiment, the isotope plutonium-238 (Pu-238), which emits alpha particles with an energy of about 5500 keV (about 5.5 MeV), was used as the alpha-particle source. In the first generation (F1) of the experiment, deformities (morphoses) were found; these can be called "radiation syndromes" or mutations, and their manifestation resembles the pleiotropic action of genes.
The proportion of morphoses was 1.8% in the experiment and 0.4% in the control. In the flies of the first and second generations, the morphoses appeared as black spots or melanomas on different parts of the imago body; "generalized" melanomas; curled or curved wings; a shortened wing; a bubble on one wing; the absence of one wing; deformation of the thorax; interruption and disruption of tergite patterns; disrupted distribution of ocular facets and bristles; and absence of pigmentation of the second and third legs. Chi-square analysis showed that the difference between the experiment and the control was significant at P ≤ 0.01. On this basis, alpha particles, which in the environment are mainly generated by radon and its isotopes, can be considered to have a mutagenic effect that manifests itself mainly in the formation of morphoses, or deformities.
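
The abstract reports proportions (1.8% versus 0.4%) but not the numbers of flies scored, so the following sketch only shows how a Pearson chi-square test on a 2×2 table (experiment/control × with/without morphosis) is computed. It uses no Yates correction, and the counts in the usage note are invented for illustration; they are not the study's data.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no Yates correction) for the table
    [[a, b], [c, d]]: rows = experiment / control,
    columns = flies with / without morphoses."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n   # count expected under independence
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat

# Critical value of the chi-square distribution with 1 degree of freedom
# at significance level 0.01.
CRITICAL_1DF_001 = 6.635
```

With hypothetical counts of 18 morphoses in 1000 experimental flies and 4 in 1000 control flies, `chi_square_2x2(18, 982, 4, 996)` is about 9.01, which exceeds 6.635, so such a difference would be significant at P ≤ 0.01.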

Improving Similarity Search Using Clustered Data

This paper presents a method for improving object search accuracy using a deep learning model. A major limitation in obtaining accurate similarity with deep learning is that training pairwise similarity scores (metrics) requires a huge amount of data, which is impractical to collect. Similarity scores are therefore usually trained on a relatively small dataset from a different domain, which limits the accuracy of similarity measurement. For this reason, this paper proposes a deep learning model that can be trained with a significantly smaller amount of data: clustered data in which each cluster contains a set of visually similar images. To measure similarity distance with the proposed method, visual features of two images are extracted from intermediate layers of a convolutional neural network with various pooling methods, and the network is trained with pairwise similarity scores in which the distance is defined as zero for images in the same cluster. The proposed method outperforms state-of-the-art object similarity scoring techniques when evaluated on finding exact items, achieving an accuracy of 86.5% compared with 59.9% for the state-of-the-art technique. That is, an exact item is found among four retrieved images with an accuracy of 86.5%, and in the remaining cases the retrieved images may still be similar products. Therefore, the proposed method can reduce the amount of training data by an order of magnitude while providing a reliable similarity metric.
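
Two ingredients of the method can be sketched without a deep learning framework: pooling intermediate feature maps into a descriptor, and a pairwise loss whose target distance is zero for images from the same cluster. This is a hedged illustration, not the paper's network. Feature maps are plain nested lists (channels × spatial positions) standing in for CNN activations, and the loss is the standard contrastive loss, swapped in here as an assumption because the abstract does not give its exact training objective; `margin` is a hypothetical parameter.

```python
import math

def global_pool(feature_map, mode="avg"):
    """Pool a feature map (list of channels, each a list of activations)
    into one value per channel by average or max pooling."""
    if mode == "avg":
        return [sum(ch) / len(ch) for ch in feature_map]
    if mode == "max":
        return [max(ch) for ch in feature_map]
    raise ValueError(f"unknown pooling mode: {mode}")

def descriptor(feature_maps, modes=("avg", "max")):
    """Concatenate pooled features from several intermediate layers."""
    out = []
    for fmap in feature_maps:
        for mode in modes:
            out.extend(global_pool(fmap, mode))
    return out

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def contrastive_loss(u, v, same_cluster, margin=1.0):
    """Same-cluster pairs are pulled toward distance 0; pairs from
    different clusters are pushed at least `margin` apart."""
    d = euclidean(u, v)
    if same_cluster:
        return d ** 2
    return max(0.0, margin - d) ** 2
```

Because the cluster labels alone define which pairs should have zero distance, no per-pair human similarity scores are needed, which is the data saving the abstract describes.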

Quick Similarity Measurement of Binary Images via Probabilistic Pixel Mapping

In this paper we present a quick technique for measuring the similarity between binary images. The technique is based on a probabilistic mapping approach and is fast because only a minute percentage of the image pixels, rather than the whole image, needs to be compared to measure the similarity. We exploit the power of the Probabilistic Matching Model for Binary Images (PMMBI) to arrive at an estimate of the similarity. We show that the estimate is a good approximation of the actual value and that its quality can be improved further by increasing the number of image mappings. Furthermore, the technique is invariant to image size: the similarity between large images can be measured as fast as that between small images. Examples of trials conducted on real images are presented.
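
The idea of comparing only a small random sample of pixels can be sketched as a Monte Carlo estimate of the fraction of agreeing pixels. This is a generic random-sampling estimator, not PMMBI itself, whose model is not detailed here. Sampling normalized coordinates in [0, 1) and mapping them onto each image's own grid is an assumption that makes the cost depend on the number of samples rather than on image size.

```python
import random

def pixel_at(image, u, v):
    """Pixel at normalized coordinates (u, v) in [0, 1) of a binary image
    given as a list of rows; works for any image size."""
    rows, cols = len(image), len(image[0])
    return image[int(v * rows)][int(u * cols)]

def estimate_similarity(img_a, img_b, samples=2000, seed=0):
    """Estimate the fraction of matching pixels from `samples` random
    mappings instead of comparing every pixel."""
    rng = random.Random(seed)
    matches = 0
    for _ in range(samples):
        u, v = rng.random(), rng.random()
        if pixel_at(img_a, u, v) == pixel_at(img_b, u, v):
            matches += 1
    return matches / samples

def exact_similarity(img_a, img_b):
    """Reference value for same-size images: compare every pixel."""
    total = len(img_a) * len(img_a[0])
    same = sum(pa == pb for ra, rb in zip(img_a, img_b) for pa, pb in zip(ra, rb))
    return same / total
```

Raising `samples` reduces the variance of the estimate, which mirrors the abstract's point that more mappings improve its quality.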

Similarity Based Membership of Elements to Uncertain Concept in Information System

The degree of membership of an element in an uncertain concept has been determined in many ways in information systems, using equivalence and similarity relations. In the case of similarity, these methods did not take into account the degree of similarity between elements. In this paper, we use a new definition of membership based on the degree of similarity. We provide an example to clarify the suggested method and compare it with previous methods. This method opens the door to more accurate decisions in information systems.
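
As background, the classic rough membership of an object x in a concept X over a similarity class S(x) is |X ∩ S(x)| / |S(x)|. The sketch below computes it, plus a hypothetical variant that weights each neighbour by its degree of similarity. The weighted formula is our illustration of "taking the degree into account", not the paper's new definition, and the attribute-agreement similarity and `threshold` are assumptions.

```python
def attr_similarity(x, y):
    """Degree of similarity: share of attributes on which x and y agree."""
    return sum(a == b for a, b in zip(x, y)) / len(x)

def similarity_class(table, i, threshold):
    """Indices of objects at least `threshold` similar to object i
    (always contains i itself when threshold <= 1)."""
    return [j for j in range(len(table))
            if attr_similarity(table[i], table[j]) >= threshold]

def rough_membership(table, concept, i, threshold):
    """Classic rough membership |X ∩ S(x)| / |S(x)|."""
    cls = similarity_class(table, i, threshold)
    return sum(j in concept for j in cls) / len(cls)

def weighted_membership(table, concept, i, threshold):
    """Hypothetical variant: each neighbour counts with its similarity degree."""
    cls = similarity_class(table, i, threshold)
    weights = {j: attr_similarity(table[i], table[j]) for j in cls}
    return sum(w for j, w in weights.items() if j in concept) / sum(weights.values())
```

For a four-object table in which object 0 has two neighbours at similarity 2/3, one inside the concept and one outside, the classic membership is 2/3 while the weighted variant gives 5/7, because the object itself (similarity 1) counts more than its partial matches.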

The Effects of an Immigration Policy on the Economic Integration of Migrants and on Natives’ Attitudes: The Case of Syrian Refugees in Turkey

Turkey’s immigration policy is a controversial issue given its legal, economic, social, political, and human rights dimensions. The formulation of an immigration policy goes hand in hand with political processes, in which natives’ attitudes play a significant role. Moreover, as was the case in Turkey, radical changes in immigration policy, or policies lacking transparency, may provoke severe reactions from the host society. This discussion paper aims to analyze qualitatively the effects of the existing ‘open door’ immigration policy on the economic integration of Syrian refugees in Turkey and on the native population’s perception of refugees. For the analysis, semi-structured in-depth interviews and focus group interviews were conducted. After the introduction, a literature review is provided, followed by theoretical background on explanations of natives’ attitudes towards immigrants. The next section presents a qualitative analysis of natives’ attitudes towards Syrian refugees under the subtopics of (i) awareness, general opinions and expectations, (ii) the open-door policy and management of the migration process, (iii) perceptions of the positive and negative impacts of immigration, (iv) economic integration, and (v) cultural similarity. Results indicate that natives simultaneously have social, economic, and security concerns regarding refugees, with difficulties regarding security and the economic integration of refugees standing out. Socio-economic characteristics of the respondents, such as educational level and employment status, are not sufficient to explain overall attitudes towards refugees, although they can explain the respondents’ awareness and the priority of the concerns they feel.

The Simulation and Experimental Investigation to Study the Strain Distribution Pattern during the Closed Die Forging Process

Closed die forging is a very complex process, and measuring the actual forces for a real material is difficult and time consuming. The modelling technique therefore offers the advantage of carrying out experiments with a suitable model material that requires lower forces and relatively low temperatures. The results of experiments on the model material can then be correlated with the actual material using the theory of similarity. Several methods are available to deal with the complexity of the closed die forging process. The Finite Element Method (FEM) and the Finite Difference Method (FDM) are relatively difficult to apply compared with the slab method. The slab method is very popular and widely used by people working on the shop floor because it is relatively easy to apply and reasonably accurate for most common forging load computations.
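
To show why the slab method is easy to apply, here is a sketch of its simplest textbook case: plane-strain compression of a rectangular workpiece between flat dies with Coulomb friction. This is an assumed simplification, not the paper's closed-die analysis (which adds die cavities and flash). Force equilibrium on a thin slab, h dσx + 2μp dx = 0, combined with the yield condition p − σx = 2k and σx = 0 at the free edge, gives p(x) = 2k·exp(2μ(a − x)/h), where a is the half-width, h the height, k the shear yield stress, and μ the friction coefficient.

```python
import math

def slab_pressure(x, half_width, height, k, mu):
    """Die pressure at distance x from the centre (0 <= x <= a) for
    plane-strain compression with Coulomb friction:
    p(x) = 2k * exp(2*mu*(a - x)/h)."""
    return 2 * k * math.exp(2 * mu * (half_width - x) / height)

def slab_average_pressure(half_width, height, k, mu):
    """Average of p(x) over 0 <= x <= a, from the closed-form integral
    2k * (exp(z) - 1) / z with z = 2*mu*a/h."""
    if mu == 0:
        return 2 * k                      # frictionless case: uniform 2k
    z = 2 * mu * half_width / height
    return 2 * k * (math.exp(z) - 1) / z

def forging_load_per_length(half_width, height, k, mu):
    """Load per unit depth = average pressure x contact width (2a)."""
    return slab_average_pressure(half_width, height, k, mu) * 2 * half_width
```

For small μa/h the average pressure reduces to the familiar approximation 2k(1 + μa/h), which is why shop-floor estimates of this kind need only the flow stress, the friction coefficient, and the workpiece aspect ratio.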

Object Negotiation Mechanism for an Intelligent Environment Using Event Agents

With advancements in science and technology, the concept of the Internet of Things (IoT) has gradually developed. Intelligent environments use the IoT to add intelligence to objects in the living space. In a smart environment shared by multiple users, different users may have different service requirements, and the context-aware system then faces conflicts when deciding which services to provide. The purpose of establishing a communication and negotiation mechanism among objects in the intelligent environment is therefore to resolve such service conflicts among users. This study proposes a decision-making methodology with “Event Agents” at its core. When the sensor system receives information, it evaluates the user’s current events and conditions; analyses object, location, time, and environmental information; calculates the priority of each object; and provides the user with services based on the event. When an event does not occur alone but overlaps with another, conflicts arise. This study adopts the “Multiple Events Correlation Matrix” to calculate a degree value for each event and a support value for each object. The matrix uses these values as the basis for inferring system services and for determining appropriate services when a conflict occurs.
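
The abstract does not give the formulas behind the degree and support values, so the following is only a hypothetical sketch of how an events-by-objects correlation matrix could resolve a conflict. It assumes each event's degree is its user priority times its urgency, normalized to sum to 1, and each object's support is the degree-weighted sum of its correlations with the active events. The event names, objects, and weights are invented.

```python
def event_degrees(events):
    """Hypothetical degree value per event: priority x urgency, normalized."""
    raw = {name: priority * urgency for name, (priority, urgency) in events.items()}
    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}

def object_support(matrix, degrees):
    """Support of each object = sum over events of
    degree(event) x correlation(event, object)."""
    support = {}
    for event, row in matrix.items():
        for obj, corr in row.items():
            support[obj] = support.get(obj, 0.0) + degrees[event] * corr
    return support

def resolve(matrix, events):
    """Pick the object (service) with the highest support."""
    support = object_support(matrix, event_degrees(events))
    return max(support, key=support.get), support

# Invented example: one user watches TV while another wants to sleep.
events = {"watch_tv": (2, 1), "sleep": (3, 2)}
matrix = {
    "watch_tv": {"tv_on": 1.0, "lights_dim": 0.5},
    "sleep": {"tv_on": 0.0, "lights_dim": 1.0},
}
choice, support = resolve(matrix, events)
```

Here the sleep event carries degree 0.75, so dimming the lights (support 0.875) wins over turning the TV on (support 0.25).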

Approximately Similarity Measurement of Web Sites Using Genetic Algorithms and Binary Trees

In this paper, we determine the similarity of two HTML web applications. We use a genetic algorithm to determine the most significant web pages of each application, rather than using every web page of a site. Using these significant pages, we compute the similarity value between the two applications. The algorithm is efficient because it compares a reduced number of web pages, but as a consequence it returns only an approximate similarity value. Binary trees are used to store the tags from the significant pages. The algorithm was implemented in Java.
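
A minimal sketch of the pipeline follows, written in Python rather than the authors' Java and under several assumptions the abstract does not confirm. Each page is a list of tag names; the GA fitness of a set of k pages is the number of distinct tags it covers; tags from the selected pages are stored with counts in a binary search tree; and site similarity is the weighted Jaccard index of the two tag-count maps. All names and GA settings are hypothetical.

```python
import random

class TagNode:
    """Binary search tree node keyed by tag name, with an occurrence count."""
    def __init__(self, tag):
        self.tag, self.count = tag, 1
        self.left = self.right = None

def insert(root, tag):
    """Insert a tag (or increment its count) and return the subtree root."""
    if root is None:
        return TagNode(tag)
    if tag == root.tag:
        root.count += 1
    elif tag < root.tag:
        root.left = insert(root.left, tag)
    else:
        root.right = insert(root.right, tag)
    return root

def in_order(root, out):
    """Fill `out` with {tag: count} in sorted tag order."""
    if root is not None:
        in_order(root.left, out)
        out[root.tag] = root.count
        in_order(root.right, out)
    return out

def coverage(pages, chosen):
    """Hypothetical fitness: number of distinct tags in the chosen pages."""
    return len({tag for i in chosen for tag in pages[i]})

def select_pages(pages, k, generations=30, pop_size=20, seed=0):
    """Tiny genetic algorithm choosing k 'significant' page indices."""
    rng = random.Random(seed)
    n = len(pages)
    pop = [rng.sample(range(n), k) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda ind: coverage(pages, ind), reverse=True)
        parents = pop[: pop_size // 2]                      # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = rng.sample(sorted(set(a) | set(b)), k)  # crossover
            if rng.random() < 0.2:                          # mutation
                outside = [i for i in range(n) if i not in child]
                if outside:
                    child[rng.randrange(k)] = rng.choice(outside)
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda ind: coverage(pages, ind))

def site_similarity(pages_a, pages_b, k, seed=0):
    """Weighted Jaccard similarity of tag counts from the selected pages."""
    counts = []
    for pages in (pages_a, pages_b):
        root = None
        for i in select_pages(pages, k, seed=seed):
            for tag in pages[i]:
                root = insert(root, tag)
        counts.append(in_order(root, {}))
    a, b = counts
    tags = set(a) | set(b)
    inter = sum(min(a.get(t, 0), b.get(t, 0)) for t in tags)
    union = sum(max(a.get(t, 0), b.get(t, 0)) for t in tags)
    return inter / union if union else 1.0
```

Only k pages per site are parsed into the trees, which is the source of both the speed-up and the approximation mentioned above.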

Debts and Debt-Based Sukuk Related to Risk Shifting Behavior

This paper examines risk shifting in the debt financing system as the ultimate cause of the global financial crisis. In contrast, risk sharing in equity financing such as sukuk helps the economic system to be better sustained. Nevertheless, some types of sukuk are haunted by the issue of imitating bonds. Criticism of this imitation issue has raised doubts not only about the ability of sukuk to diminish risk shifting behavior but also about the ability of this Islamic financial instrument to ensure better financial stability in the future. Accordingly, this paper discusses the possibility that sukuk induce risk shifting and how equity financing may help sukuk to be free from risk shifting. This matters because sukuk attract significant demand from investors throughout the world. For this instrument to support future economic stability, the imitation issue needs to be identified and addressed. Furthermore, criticism should not focus only on debt and its ability to gauge financial flux, but should also extend to sukuk because of the similarity of their structures.

3D Objects Indexing with a Direct and Analytical Method for Calculating the Spherical Harmonics Coefficients

In this paper, we propose a new method for three-dimensional object indexing based on the D.A.M.C-S.H.C descriptor (Direct and Analytical Method for Calculating the Spherical Harmonics Coefficients). To this end, we propose a direct calculation of the spherical harmonics coefficients with perfect precision. The method aims to minimize both the processing time on the 3D object database and the time needed to search for objects similar to a query object. We first define the new descriptor using a new division of the 3D object within a sphere. We then define a new distance and test its efficiency in searching for similar objects in a database containing objects of widely varying and large sizes.
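
For reference, the standard background behind spherical-harmonics descriptors is summarized below; it is not the paper's D.A.M.C-S.H.C construction, its division of the object, or its new distance. Here f is assumed to be a function on the unit sphere describing the object (for example, its radial extent in direction (θ, φ)), the Y_l^m are orthonormal spherical harmonics, L is a truncation degree, and the per-degree energies ‖f_l‖ are the commonly used rotation-invariant descriptor, compared with a Euclidean distance d.

```latex
\begin{aligned}
f(\theta,\varphi) &\approx \sum_{l=0}^{L}\sum_{m=-l}^{l} a_{lm}\,Y_l^m(\theta,\varphi),\\
a_{lm} &= \int_{0}^{2\pi}\!\int_{0}^{\pi} f(\theta,\varphi)\,\overline{Y_l^m(\theta,\varphi)}\,\sin\theta\,d\theta\,d\varphi,\\
\|f_l\| &= \Big(\sum_{m=-l}^{l} |a_{lm}|^2\Big)^{1/2},\\
d(A,B) &= \Big(\sum_{l=0}^{L}\big(\|f_l^{A}\| - \|f_l^{B}\|\big)^2\Big)^{1/2}.
\end{aligned}
```

A rotation of the object mixes coefficients only within each degree l, so ‖f_l‖ is unchanged; computing the a_lm analytically rather than by numerical quadrature is where a direct method can save processing time.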

Distributed Manufacturing (DM) - Smart Units and Collaborative Processes

Applications of the Hausdorff space and its mappings into tangent spaces are outlined, including their fractal dimensions and self-similarities. The paper details this theoretical setup and further describes the virtualization and atomization of manufacturing processes. It demonstrates novel concurrency principles that will guide the configuration of manufacturing processes and resources. Moreover, varying levels of detail may be produced by folding up and breaking down newly introduced generic models. This choice of layered generic models for unit and system aspects, along with specific aspects, allows research to proceed in parallel with other disciplines that share the same focus on all levels of detail. Outside disciplines gain more credit and easier access for enriching the manufacturing field. Specific mappings and the layers point to opportunities for interdisciplinary outcomes and may highlight further details for interoperability standards, as is already being worked on at the international level. The new rules are described, which require additional properties of all involved entities for defining distributed decision cycles, again on the basis of self-similarity. All properties are further detailed and assigned to a maturity scale, eventually displaying the smartness maturity of a total shop floor or factory. The paper contributes to the intensive ongoing discussion in the field of intelligent distributed manufacturing and promotes solid concepts for implementing Cyber Physical Systems and the Internet of Things in manufacturing industry, such as Industry 4.0, as discussed in German-speaking countries.

Isolation and Identification of Diacylglycerol Acyltransferase Type-2 (DGAT2) Genes from Three Egyptian Olive Cultivars

The aim of this work was to study the genetic basis of oil accumulation in olive fruit by tracking the DGAT2 (diacylglycerol acyltransferase type-2) gene in three olive cultivars of Egyptian origin, namely Toffahi, Hamed and Maraki, using molecular marker techniques and bioinformatics tools. The results show, first, that a specific genomic band of the Maraki cultivar was identified as DGAT2 (diacylglycerol acyltransferase type-2) and was identical, with 100% similarity, to this gene in Olea europaea. Second, a differential genomic band of the Maraki cultivar produced by the RAPD fingerprinting technique yielded a distinct predicted sequence identified as DGAT2 in Fragaria vesca subsp. vesca, with 76% sequence similarity. Third, a specific genomic band of the Hamed cultivar was identified as two fragments: (1) Olea europaea cultivar Koroneiki diacylglycerol acyltransferase type 2 mRNA, complete cds, with two matching regions at 99% similarity, or (2) Predicted: Fragaria vesca subsp. vesca diacylglycerol O-acyltransferase 2-like (LOC101313050), mRNA, with 86% similarity.
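
Similarity percentages such as 100%, 76%, or 99% are reported by alignment tools, and definitions vary between tools (identity versus positives, and which gap columns are counted). As a hedged illustration only, the sketch below computes one common definition, percent identity of two sequences that have already been aligned to the same length with `-` for gaps. The example sequences are invented.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity of two pre-aligned sequences: identical non-gap
    positions divided by aligned columns, ignoring columns where both
    sequences have a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
               if not (a == gap and b == gap)]
    if not columns:
        return 0.0
    same = sum(a == b and a != gap for a, b in columns)
    return 100.0 * same / len(columns)
```

For instance, `percent_identity("AC-T", "ACGT")` is 75.0: three of the four aligned columns match, and the gap column counts as a mismatch under this definition.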

Time Series Regression with Meta-Clusters

This paper presents a preliminary attempt to use classification of time series by meta-clusters to improve the quality of regression models. Here, clustering was used to split data on the inflow into a wastewater treatment plant, which consist of several groups differing in mean value, into normally distributed subgroups. Two simple algorithms, K-means and EM, were chosen as clustering methods, and the Rand index was used to measure the similarity of the resulting partitions. After simple meta-clustering, a regression model was built for each subgroup, and the final model was the sum of the subgroup models. The quality of the resulting model was compared with that of a regression model built with the same explanatory variables but without clustering the data. The results were compared using the coefficient of determination (R2), the mean absolute percentage error (MAPE) as a measure of prediction accuracy, and a comparison on a line chart. Preliminary results suggest the potential of the presented technique.
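
The evaluation quantities named above can be written out directly. The sketch below implements the Rand index, MAPE, and R², plus a per-cluster regression in which each observation is predicted by the model of its own cluster, which is our reading of "the final model was the sum of the subgroup models". Cluster labels are taken as given (the K-means/EM step is not reproduced), and the single-variable least-squares model is an assumption.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of object pairs on which two clusterings agree
    (same cluster in both, or different clusters in both)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum((labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
                for i, j in pairs)
    return agree / len(pairs)

def mape(actual, predicted):
    """Mean absolute percentage error in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def r_squared(actual, predicted):
    """Coefficient of determination R2 = 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def fit_line(xs, ys):
    """Ordinary least squares y = b0 + b1 * x for one explanatory variable."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return my - b1 * mx, b1

def clustered_predictions(xs, ys, labels):
    """Fit one regression per cluster; predict each point with its cluster's model."""
    preds = [0.0] * len(ys)
    for c in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        b0, b1 = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        for i in idx:
            preds[i] = b0 + b1 * xs[i]
    return preds
```

On data made of two groups that share a slope but differ in mean, the per-cluster models fit each group exactly, while a single pooled regression leaves large residuals, so R² rises and MAPE falls, which is the effect the paper tests for.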

A Distance Function for Data with Missing Values and Its Application

Missing values in data are common in real-world applications. Since the performance of many data mining algorithms depends critically on having a good metric over the input space, in this paper we define a distance function for unlabeled datasets with missing values. We use the Bhattacharyya distance, which measures the similarity of two probability distributions, to define our new distance function. Under this distance, the distance between two points with no missing attribute values is simply the Mahalanobis distance. When, on the other hand, one of the coordinates is missing, the distance is computed according to the distribution of the missing coordinate. Our distance is general and can be used as part of any algorithm that computes the distance between data points. Because the performance of the k-nearest-neighbor (kNN) classifier depends strongly on the chosen distance measure, we used it to evaluate how accurately our distance reflects object similarity. We experimented on standard numerical datasets from different fields in the UCI repository. On these datasets we simulated missing values and compared the performance of the kNN classifier using our distance with three other basic methods. Our experiments show that kNN using our distance function outperforms kNN using the other methods. Moreover, the runtime of our method is only slightly higher than that of the other methods.
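
A simplified sketch of the idea, under assumptions that are ours rather than the paper's exact formula: features are treated as independent, so the Mahalanobis distance reduces to a variance-standardized Euclidean distance, and a missing coordinate (`None`) is handled by replacing the squared difference with its expectation under that feature's marginal distribution (mean μ, variance σ²). This gives (y − μ)² + σ² when one value is missing and 2σ² when both are. The distance is then plugged into a plain kNN classifier.

```python
import math
from collections import Counter

def feature_stats(data):
    """Per-feature (mean, variance), ignoring missing values (None)."""
    stats = []
    for j in range(len(data[0])):
        vals = [row[j] for row in data if row[j] is not None]
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        stats.append((mean, var if var > 0 else 1.0))
    return stats

def distance(x, y, stats):
    """Standardized distance; missing coordinates use expected squared differences."""
    total = 0.0
    for xj, yj, (mean, var) in zip(x, y, stats):
        if xj is not None and yj is not None:
            sq = (xj - yj) ** 2
        elif xj is None and yj is None:
            sq = 2 * var                      # E[(X - Y)^2] for i.i.d. X, Y
        else:
            known = xj if xj is not None else yj
            sq = (known - mean) ** 2 + var    # E[(X - known)^2]
        total += sq / var
    return math.sqrt(total)

def knn_predict(train_x, train_y, query, k, stats):
    """Majority label among the k nearest training points."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: distance(train_x[i], query, stats))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]
```

With no missing values this is exactly the diagonal-covariance Mahalanobis distance, matching the property the abstract states for complete points.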

Graph-based High Level Motion Segmentation using Normalized Cuts

Motion capture devices have been used to produce various content, such as movies and video games. However, because motion capture devices are expensive and inconvenient to use, motions segmented from captured data are recycled and synthesized for reuse in other content, but the segmentation has generally been done manually by content producers. Automatic motion segmentation has therefore recently received a lot of attention. Previous approaches are divided into on-line and off-line methods: on-line approaches segment motions based on similarities between neighboring frames, and off-line approaches segment motions by capturing global characteristics in feature space. In this paper, we propose a graph-based high-level motion segmentation method. Since high-level motions consist of several repeated frames within a temporal distance, we consider the similarities among all frames within that temporal distance. This is achieved by constructing a graph in which each vertex represents a frame and the edges between frames are weighted by their similarity. The normalized cuts algorithm is then used to partition the constructed graph into several sub-graphs by globally finding minimum cuts. In the experiments, the proposed method outperformed the PCA-based on-line method and the GMM-based off-line method, because it segments motions globally from a graph built on similarities between neighboring frames as well as similarities among all frames within the temporal distance.
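
The graph construction and the normalized cut criterion, Ncut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V), can be sketched as follows. Assumptions: each frame is a pose feature vector, edges connect frames at most `window` frames apart, and weights are a Gaussian of the feature distance with a hypothetical `sigma`. For clarity the minimum-Ncut bipartition is found by exhaustive search, which is feasible only for tiny graphs; the normalized cuts algorithm of Shi and Malik instead solves a spectral (generalized eigenvector) relaxation.

```python
import math
from itertools import combinations

def build_graph(frames, window, sigma=1.0):
    """Weight frames i, j with |i - j| <= window by a Gaussian of the
    distance between their feature vectors; other pairs get weight 0."""
    n = len(frames)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, min(n, i + window + 1)):
            d2 = sum((a - b) ** 2 for a, b in zip(frames[i], frames[j]))
            w[i][j] = w[j][i] = math.exp(-d2 / (2 * sigma ** 2))
    return w

def ncut(w, part_a):
    """Ncut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V)."""
    n = len(w)
    part_b = [i for i in range(n) if i not in part_a]
    cut = sum(w[i][j] for i in part_a for j in part_b)
    assoc_a = sum(w[i][j] for i in part_a for j in range(n))
    assoc_b = sum(w[i][j] for i in part_b for j in range(n))
    return cut / assoc_a + cut / assoc_b

def best_bipartition(w):
    """Exhaustive minimum-Ncut split (illustration only: exponential cost)."""
    n = len(w)
    best = None
    for size in range(1, n // 2 + 1):
        for part in combinations(range(n), size):
            value = ncut(w, set(part))
            if best is None or value < best[0]:
                best = (value, set(part))
    return best
```

Normalizing by assoc penalizes cutting off a single isolated frame, so the criterion prefers splitting the motion where the similarity between whole segments is weakest.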

Flocking Behaviors for Multiple Groups with Heterogeneous Agents

Most conventional flocking simulations focus on flocks of a single species. Although single-species flocking exists in nature, multi-species flocking is also frequently observed. This paper studies flocking simulation for heterogeneous agents. To simulate flocks of heterogeneous agents, the conventional method uses a flock identifier, whereas the proposed method defines a feature vector for each agent and measures the similarity between agents by comparing these feature vectors. Based on this similarity, the paper defines attractive and repulsive forces and runs the simulation by applying both forces. The simulation results show that flock formation with heterogeneous agents is very natural in both cases. They also show that, unlike the existing method, the proposed method can control the density of the flocks and allows two different groups of agents to flock close to each other when they are highly similar.
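
A hypothetical sketch of similarity-driven forces follows. The abstract does not give the force equations, so these are assumed: the attraction between two agents is proportional to the cosine similarity of their feature vectors, and the repulsion grows as (radius/distance)² once agents come inside `radius`. The radius is therefore the knob that sets flock density. `attract`, `repel`, `radius`, and `dt` are invented parameters.

```python
import math

def cosine_similarity(f1, f2):
    """Cosine similarity of two feature vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(f1, f2))
    norm = math.sqrt(sum(a * a for a in f1)) * math.sqrt(sum(b * b for b in f2))
    return dot / norm if norm else 0.0

def pair_force(pos_i, pos_j, feat_i, feat_j, attract=1.0, repel=1.0, radius=1.0):
    """Force on agent i from agent j: similarity-scaled attraction toward j,
    minus a repulsion that acts only inside `radius`."""
    dx = [b - a for a, b in zip(pos_i, pos_j)]
    dist = math.sqrt(sum(d * d for d in dx))
    if dist == 0:
        return [0.0] * len(dx)
    unit = [d / dist for d in dx]
    sim = cosine_similarity(feat_i, feat_j)
    push = repel * (radius / dist) ** 2 if dist < radius else 0.0
    magnitude = attract * sim - push
    return [magnitude * u for u in unit]

def step(positions, features, dt=0.1, **params):
    """Move every agent along the sum of the pairwise forces acting on it."""
    new = []
    for i, p in enumerate(positions):
        total = [0.0] * len(p)
        for j, q in enumerate(positions):
            if i != j:
                f = pair_force(p, q, features[i], features[j], **params)
                total = [t + fk for t, fk in zip(total, f)]
        new.append([pk + dt * tk for pk, tk in zip(p, total)])
    return new
```

Because attraction depends on feature similarity rather than on a flock identifier, two groups with similar features are drawn toward each other, while dissimilar agents feel little or no pull.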

OCIRS: An Ontology-based Chinese Idioms Retrieval System

Chinese idioms are a type of traditional Chinese idiomatic expression with specific meanings and a stereotyped structure; they are widely used in classical Chinese and are still common in vernacular written and spoken Chinese today. Currently, Chinese idioms are retrieved from glossaries by key characters or key words through morphology or pronunciation indexes, which cannot meet the need for semantic search. OCIRS is proposed to find a desired idiom when users know only its meaning, without any key character or key word. The user's request, given as a sentence or phrase, is first analyzed grammatically by word segmentation, keyword extraction, and semantic similarity computation, so that it can be mapped to the idiom domain ontology, which is constructed to provide ample semantic relations and to facilitate description-logics-based reasoning for idiom retrieval. The experimental evaluation shows that OCIRS realizes semantic idiom search and achieves preliminary results that meet users' requests.
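
The front end of such a pipeline can be sketched with forward maximum matching, a classic dictionary-based Chinese word segmentation method, followed by a keyword-overlap ranking. The ranking is a hypothetical stand-in for OCIRS's ontology mapping and description-logics reasoning, which are not reproduced; the dictionary and the idiom keyword sets are invented examples.

```python
def forward_max_match(sentence, dictionary, max_len=4):
    """Greedy forward maximum matching: at each position take the longest
    dictionary word (up to max_len characters), else a single character."""
    words, i = [], 0
    while i < len(sentence):
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words

def best_idiom(query_words, idiom_keywords):
    """Hypothetical stand-in for ontology mapping: rank idioms by the Jaccard
    overlap between query words and each idiom's meaning keywords."""
    q = set(query_words)

    def score(item):
        kws = item[1]
        union = q | kws
        return len(q & kws) / len(union) if union else 0.0

    return max(idiom_keywords.items(), key=score)[0]

# Invented example data.
DICTIONARY = {"我们", "坚持", "努力", "学习", "不懈"}
IDIOMS = {
    "锲而不舍": {"坚持", "不懈", "努力"},   # perseverance
    "画蛇添足": {"多余", "画蛇"},           # adding something superfluous
}
```

For the request "我们坚持努力学习", segmentation yields 我们 / 坚持 / 努力 / 学习, and the overlap with the meaning keywords of 锲而不舍 selects that idiom even though none of its characters appear in the request.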

Cumulative Learning based on Dynamic Clustering of Hierarchical Production Rules (HPRs)

An important structuring mechanism for knowledge bases is building clusters based on the content of their knowledge objects. The objects are clustered on the principle of maximizing intraclass similarity and minimizing interclass similarity. Clustering can also facilitate taxonomy formation, that is, the organization of observations into a hierarchy of classes that group similar events together. Hierarchical representation allows us to manage the complexity of knowledge easily, to view the knowledge at different levels of detail, and to focus our attention on the interesting aspects only. One such efficient and easy-to-understand system is the Hierarchical Production Rule (HPR) system. An HPR, a standard production rule augmented with generality and specificity information, has the form Decision If <condition> Generality <general information> Specificity <specific information>. HPR systems are capable of handling the taxonomical structures inherent in knowledge about the real world. In this paper, a set of related HPRs is called a cluster and is represented by an HPR-tree. This paper discusses an algorithm based on a cumulative learning scenario for the dynamic structuring of clusters. The proposed scheme incrementally incorporates new knowledge into the set of clusters from previous episodes and also maintains a summary of the clusters, called a Synopsis, for use in future episodes. Examples are given to demonstrate the behaviour of the proposed scheme. The suggested incremental structuring of clusters would be useful in mining data streams.
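
A minimal sketch of an HPR and of one cumulative step that places new rules into HPR-trees is given below. The placement rule is a hypothetical criterion, not the paper's algorithm: a rule becomes a child (specialization) of the most specific rule whose conditions it extends, replaces a cluster root whose conditions strictly contain its own, or otherwise starts a new cluster. The synopsis (root decision and rule count) is likewise an invented summary, and the rules in the usage note are examples.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)
class HPR:
    """Decision If <conditions>, Generality <parent>, Specificity <children>."""
    decision: str
    conditions: frozenset
    generality: object = field(default=None, repr=False)   # parent HPR
    specificity: list = field(default_factory=list)        # child HPRs

def attach(node, rule):
    """Place `rule` under the most specific rule in this subtree whose
    conditions it extends; return False if it does not fit here."""
    if not node.conditions <= rule.conditions:
        return False
    for child in node.specificity:
        if attach(child, rule):
            return True
    rule.generality = node
    node.specificity.append(rule)
    return True

def add_rule(clusters, rule):
    """One cumulative-learning step over a list of HPR-tree roots."""
    for k, root in enumerate(clusters):
        if rule.conditions < root.conditions:      # new rule generalizes the root
            root.generality = rule
            rule.specificity.append(root)
            clusters[k] = rule
            return clusters
        if attach(root, rule):
            return clusters
    clusters.append(rule)                          # start a new cluster
    return clusters

def synopsis(root):
    """Compact cluster summary kept for later episodes."""
    def size(node):
        return 1 + sum(size(c) for c in node.specificity)
    return {"root": root.decision, "rules": size(root)}
```

Adding "Bird If {has_feathers}", then "Penguin If {has_feathers, swims}", then "Fish If {has_fins}" yields two clusters: a Bird tree with Penguin as its specificity, and a separate Fish tree. Earlier clusters are extended in place rather than rebuilt, which is what makes the scheme incremental.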