Weighted-Distance Sliding Windows and Cooccurrence Graphs for Supporting Entity-Relationship Discovery in Unstructured Text

The problem of Entity relation discovery in structured data, a well covered topic in literature, consists in searching within unstructured sources (typically, text) in order to find connections among entities. These can be a whole dictionary, or a specific collection of named items. In many cases machine learning and/or text mining techniques are used for this goal. These approaches might be unfeasible in computationally challenging problems, such as processing massive data streams. A faster approach consists in collecting the cooccurrences of any two words (entities) in order to create a graph of relations - a cooccurrence graph. Indeed each cooccurrence highlights some grade of semantic correlation between the words because it is more common to have related words close each other than having them in the opposite sides of the text. Some authors have used sliding windows for such problem: they count all the occurrences within a sliding windows running over the whole text. In this paper we generalise such technique, coming up to a Weighted-Distance Sliding Window, where each occurrence of two named items within the window is accounted with a weight depending on the distance between items: a closer distance implies a stronger evidence of a relationship. We develop an experiment in order to support this intuition, by applying this technique to a data set consisting in the text of the Bible, split into verses.

A Neuron Model of Facial Recognition and Detection of an Authorized Entity Using Machine Learning System

This paper has critically examined the use of Machine Learning procedures in curbing unauthorized access into valuable areas of an organization. The use of passwords, pin codes, user’s identification in recent times has been partially successful in curbing crimes involving identities, hence the need for the design of a system which incorporates biometric characteristics such as DNA and pattern recognition of variations in facial expressions. The facial model used is the OpenCV library which is based on the use of certain physiological features, the Raspberry Pi 3 module is used to compile the OpenCV library, which extracts and stores the detected faces into the datasets directory through the use of camera. The model is trained with 50 epoch run in the database and recognized by the Local Binary Pattern Histogram (LBPH) recognizer contained in the OpenCV. The training algorithm used by the neural network is back propagation coded using python algorithmic language with 200 epoch runs to identify specific resemblance in the exclusive OR (XOR) output neurons. The research however confirmed that physiological parameters are better effective measures to curb crimes relating to identities.

Relation of Optimal Pilot Offsets in the Shifted Constellation-Based Method for the Detection of Pilot Contamination Attacks

One possible approach for maintaining the security of communication systems relies on Physical Layer Security mechanisms. However, in wireless time division duplex systems, where uplink and downlink channels are reciprocal, the channel estimate procedure is exposed to attacks known as pilot contamination, with the aim of having an enhanced data signal sent to the malicious user. The Shifted 2-N-PSK method involves two random legitimate pilots in the training phase, each of which belongs to a constellation, shifted from the original N-PSK symbols by certain degrees. In this paper, legitimate pilots’ offset values and their influence on the detection capabilities of the Shifted 2-N-PSK method are investigated. As the implementation of the technique depends on the relation between the shift angles rather than their specific values, the optimal interconnection between the two legitimate constellations is investigated. The results show that no regularity exists in the relation between the pilot contamination attacks (PCA) detection probability and the choice of offset values. Therefore, an adversary who aims to obtain the exact offset values can only employ a brute-force attack but the large number of possible combinations for the shifted constellations makes such a type of attack difficult to successfully mount. For this reason, the number of optimal shift value pairs is also studied for both 100% and 98% probabilities of detecting pilot contamination attacks. Although the Shifted 2-N-PSK method has been broadly studied in different signal-to-noise ratio scenarios, in multi-cell systems the interference from the signals in other cells should be also taken into account. Therefore, the inter-cell interference impact on the performance of the method is investigated by means of a large number of simulations. The results show that the detection probability of the Shifted 2-N-PSK decreases inversely to the signal-to-interference-plus-noise ratio.

Lexical Based Method for Opinion Detection on Tripadvisor Collection

The massive development of online social networks allows users to post and share their opinions on various topics. With this huge volume of opinion, it is interesting to extract and interpret these information for different domains, e.g., product and service benchmarking, politic, system of recommendation. This is why opinion detection is one of the most important research tasks. It consists on differentiating between opinion data and factual data. The difficulty of this task is to determine an approach which returns opinionated document. Generally, there are two approaches used for opinion detection i.e. Lexical based approaches and Machine Learning based approaches. In Lexical based approaches, a dictionary of sentimental words is used, words are associated with weights. The opinion score of document is derived by the occurrence of words from this dictionary. In Machine learning approaches, usually a classifier is trained using a set of annotated document containing sentiment, and features such as n-grams of words, part-of-speech tags, and logical forms. Majority of these works are based on documents text to determine opinion score but dont take into account if these texts are really correct. Thus, it is interesting to exploit other information to improve opinion detection. In our work, we will develop a new way to consider the opinion score. We introduce the notion of trust score. We determine opinionated documents but also if these opinions are really trustable information in relation with topics. For that we use lexical SentiWordNet to calculate opinion and trust scores, we compute different features about users like (numbers of their comments, numbers of their useful comments, Average useful review). After that, we combine opinion score and trust score to obtain a final score. We applied our method to detect trust opinions in TRIPADVISOR collection. Our experimental results report that the combination between opinion score and trust score improves opinion detection.

Understanding the Selectional Preferences of the Twitter Mentions Network

Users in social networks either unicast or broadcast their messages. At mention is the popular way of unicasting for Twitter whereas general tweeting could be considered as broadcasting method. Understanding the information flow and dynamics within a Social Network and modeling the same is a promising and an open research area called Information Diffusion. This paper seeks an answer to a fundamental question - understanding if the at-mention network or the unicasting pattern in social media is purely random in nature or is there any user specific selectional preference? To answer the question we present an empirical analysis to understand the sociological aspects of Twitter mentions network within a social network community. To understand the sociological behavior we analyze the values (Schwartz model: Achievement, Benevolence, Conformity, Hedonism, Power, Security, Self-Direction, Stimulation, Traditional and Universalism) of all the users. Empirical results suggest that values traits are indeed salient cue to understand how the mention-based communication network functions. For example, we notice that individuals possessing similar values unicast among themselves more often than with other value type people. We also observe that traditional and self-directed people do not maintain very close relationship in the network with the people of different values traits.

Cities Simulation and Representation in Locative Games from the Perspective of Cultural Studies

This work aims to analyze the locative structure used by the locative games of the company Niantic. To fulfill this objective, a literature review on the representation and simulation of cities was developed; interviews with Ingress players and playing Ingress. Relating these data, it was possible to deepen the relationship between the virtual and the real to create the simulation of cities and their cultural objects in locative games. Cities representation associates geo-location provided by the Global Positioning System (GPS), with augmented reality and digital image, and provides a new paradigm in the city interaction with its parts and real and virtual world elements, homeomorphic to real world. Bibliographic review of papers related to the representation and simulation study and their application in locative games was carried out and is presented in the present paper. The cities representation and simulation concepts in locative games, and how this setting enables the flow and immersion in urban space, are analyzed. Some examples of games are discussed for this new setting development, which is a mix of real and virtual world. Finally, it was proposed a Locative Structure for electronic games using the concepts of heterotrophic representations and isotropic representations conjoined with immediacy and hypermediacy.

The Effect of Computer-Mediated vs. Face-to-Face Instruction on L2 Pragmatics: A Meta-Analysis

This paper reports the results of a meta-analysis of studies on the effects of instruction mode on learning second language pragmatics during the last decade (from 2006 to 2016). After establishing related inclusion/ exclusion criteria, 39 published studies were retrieved and included in the present meta-analysis. Studies were later coded for face-to-face and computer-assisted mode of instruction. Statistical procedures were applied to obtain effect sizes. It was found that Computer-Assisted-Language-Learning studies generated larger effects than Face-to-Face instruction.

Machine Learning Methods for Network Intrusion Detection

Network security engineers work to keep services available all the time by handling intruder attacks. Intrusion Detection System (IDS) is one of the obtainable mechanisms that is used to sense and classify any abnormal actions. Therefore, the IDS must be always up to date with the latest intruder attacks signatures to preserve confidentiality, integrity, and availability of the services. The speed of the IDS is a very important issue as well learning the new attacks. This research work illustrates how the Knowledge Discovery and Data Mining (or Knowledge Discovery in Databases) KDD dataset is very handy for testing and evaluating different Machine Learning Techniques. It mainly focuses on the KDD preprocess part in order to prepare a decent and fair experimental data set. The J48, MLP, and Bayes Network classifiers have been chosen for this study. It has been proven that the J48 classifier has achieved the highest accuracy rate for detecting and classifying all KDD dataset attacks, which are of type DOS, R2L, U2R, and PROBE.

Efficient Antenna Array Beamforming with Robustness against Random Steering Mismatch

This paper deals with the problem of using antenna sensors for adaptive beamforming in the presence of random steering mismatch. We present an efficient adaptive array beamformer with robustness to deal with the considered problem. The robustness of the proposed beamformer comes from the efficient designation of the steering vector. Using the received array data vector, we construct an appropriate correlation matrix associated with the received array data vector and a correlation matrix associated with signal sources. Then, the eigenvector associated with the largest eigenvalue of the constructed signal correlation matrix is designated as an appropriate estimate of the steering vector. Finally, the adaptive weight vector required for adaptive beamforming is obtained by using the estimated steering vector and the constructed correlation matrix of the array data vector. Simulation results confirm the effectiveness of the proposed method.

Road Traffic Accidents Analysis in Mexico City through Crowdsourcing Data and Data Mining Techniques

Road traffic accidents are among the principal causes of traffic congestion, causing human losses, damages to health and the environment, economic losses and material damages. Studies about traditional road traffic accidents in urban zones represents very high inversion of time and money, additionally, the result are not current. However, nowadays in many countries, the crowdsourced GPS based traffic and navigation apps have emerged as an important source of information to low cost to studies of road traffic accidents and urban congestion caused by them. In this article we identified the zones, roads and specific time in the CDMX in which the largest number of road traffic accidents are concentrated during 2016. We built a database compiling information obtained from the social network known as Waze. The methodology employed was Discovery of knowledge in the database (KDD) for the discovery of patterns in the accidents reports. Furthermore, using data mining techniques with the help of Weka. The selected algorithms was the Maximization of Expectations (EM) to obtain the number ideal of clusters for the data and k-means as a grouping method. Finally, the results were visualized with the Geographic Information System QGIS.

Automated Heart Sound Classification from Unsegmented Phonocardiogram Signals Using Time Frequency Features

Cardiologists perform cardiac auscultation to detect abnormalities in heart sounds. Since accurate auscultation is a crucial first step in screening patients with heart diseases, there is a need to develop computer-aided detection/diagnosis (CAD) systems to assist cardiologists in interpreting heart sounds and provide second opinions. In this paper different algorithms are implemented for automated heart sound classification using unsegmented phonocardiogram (PCG) signals. Support vector machine (SVM), artificial neural network (ANN) and cartesian genetic programming evolved artificial neural network (CGPANN) without the application of any segmentation algorithm has been explored in this study. The signals are first pre-processed to remove any unwanted frequencies. Both time and frequency domain features are then extracted for training the different models. The different algorithms are tested in multiple scenarios and their strengths and weaknesses are discussed. Results indicate that SVM outperforms the rest with an accuracy of 73.64%.

Programming Language Extension Using Structured Query Language for Database Access

Relational databases constitute a very vital tool for the effective management and administration of both personal and organizational data. Data access ranges from a single user database management software to a more complex distributed server system. This paper intends to appraise the use a programming language extension like structured query language (SQL) to establish links to a relational database (Microsoft Access 2013) using Visual C++ 9 programming language environment. The methodology used involves the creation of tables to form a database using Microsoft Access 2013, which is Object Linking and Embedding (OLE) database compliant. The SQL command is used to query the tables in the database for easy extraction of expected records inside the visual C++ environment. The findings of this paper reveal that records can easily be accessed and manipulated to filter exactly what the user wants, such as retrieval of records with specified criteria, updating of records, and deletion of part or the whole records in a table.

Discovering the Dimension of Abstractness: Structure-Based Model that Learns New Categories and Categorizes on Different Levels of Abstraction

A structure-based model of category learning and categorization at different levels of abstraction is presented. The model compares different structures and expresses their similarity implicitly in the forms of mappings. Based on this similarity, the model can categorize different targets either as members of categories that it already has or creates new categories. The model is novel using two threshold parameters to evaluate the structural correspondence. If the similarity between two structures exceeds the higher threshold, a new sub-ordinate category is created. Vice versa, if the similarity does not exceed the higher threshold but does the lower one, the model creates a new category on higher level of abstraction.

Role-Governed Categorization and Category Learning as a Result from Structural Alignment: The RoleMap Model

The paper presents a symbolic model for category learning and categorization (called RoleMap). Unlike the other models which implement learning in a separate working mode, role-governed category learning and categorization emerge in RoleMap while it does its usual reasoning. The model is based on several basic mechanisms known as reflecting the sub-processes of analogy-making. It steps on the assumption that in their everyday life people constantly compare what they experience and what they know. Various commonalities between the incoming information (current experience) and the stored one (long-term memory) emerge from those comparisons. Some of those commonalities are considered to be highly important, and they are transformed into concepts for further use. This process denotes the category learning. When there is missing knowledge in the incoming information (i.e. the perceived object is still not recognized), the model makes anticipations about what is missing, based on the similar episodes from its long-term memory. Various such anticipations may emerge for different reasons. However, with time only one of them wins and is transformed into a category member. This process denotes the act of categorization.