Abstract: In recent years, the amount of document data has been
increasing with the spread of the Internet. Many methods have been
studied for extracting topics from large document data. We proposed
Independent Topic Analysis (ITA) to extract topics independent of
each other from large document data such as newspaper data. ITA is a
method for extracting the independent topics from the document data
by using Independent Component Analysis. Each topic extracted
by ITA is represented by a set of words. However, this set of words
often differs considerably from the topic a user has in mind. For example,
the top five words with the highest independence for one topic are
Topic 1 = {"scor", "game", "lead", "quarter", "rebound"}. Topic
1 appears to represent "SPORTS", but this topic name has to be
assigned by the user, because ITA cannot name topics.
Therefore, in this research, we propose a method that uses a web search
engine to turn the word sets produced by ITA into topics that are easy
for people to understand. Specifically, we submit a topic's set of words
as a search query and take the title of the resulting web page as the
topic name. We also apply the proposed method to several data sets and
verify its effectiveness.
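The naming step described above can be sketched as follows. This is a minimal sketch under stated assumptions. `search` is a hypothetical caller-supplied function that stands in for whatever web search engine is used. The query is taken to be the topic's top words joined by spaces, and the title of the first result is used as the name. The abstract does not specify the query format or which result is chosen.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical search interface: maps a query string to a ranked list of
# (page_title, url) pairs. Not a real search-engine API.
SearchFn = Callable[[str], List[Tuple[str, str]]]


def topic_query(words: Sequence[str], top_k: int = 5) -> str:
    """Join the top-k words of an ITA topic into one query string."""
    return " ".join(words[:top_k])


def name_topic(words: Sequence[str], search: SearchFn, top_k: int = 5) -> Optional[str]:
    """Return the title of the first search result for the topic's words, or None."""
    results = search(topic_query(words, top_k))
    if not results:
        return None
    title, _url = results[0]  # assumption: the top-ranked page names the topic
    return title.strip()
```

With a real engine, `search` would wrap its query endpoint. The mapping from word set to name is kept independent of any particular search service.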
Abstract: Claiming ownership of an executable program is a non-trivial task. An emerging direction is to add a watermark to the program such that the watermarked program preserves the original program's functionality, while removing the watermark would severely damage the functionality of the watermarked program. In this paper, we construct the first watermarking signature scheme in the symmetric-key setting in which both the watermark and the constraint function are hidden. The scheme uses well-known lattice techniques, namely lattice trapdoors and lattice evaluation. The watermarking signature scheme is unforgeable under the Short Integer Solution (SIS) assumption and satisfies other security requirements, such as unremovability.
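For reference, the SIS problem behind the unforgeability claim is usually stated as shown below. This is the standard textbook form. The concrete parameters n, m, q and β used by the scheme are not given in this abstract.

```latex
\mathrm{SIS}_{n,m,q,\beta}:\quad
\text{given a uniformly random } A \in \mathbb{Z}_q^{n \times m},\
\text{find } \mathbf{z} \in \mathbb{Z}^m \setminus \{\mathbf{0}\}
\ \text{such that}\ A\mathbf{z} \equiv \mathbf{0} \pmod{q}
\ \text{and}\ \lVert \mathbf{z} \rVert \le \beta .
```

The SIS assumption states that, for suitable parameters, no efficient algorithm solves this problem with non-negligible probability.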
Abstract: This paper compares the multipath mitigation
performance of code correlation reference waveform receivers
with variable and fixed window width, for binary offset carrier
and multiplexed binary offset carrier signals typically used in
global navigation satellite systems. In the variable window width
method, the width is iteratively reduced until the distortion on
the discriminator with multipath is eliminated. This distortion is
measured as the Euclidean distance between the actual discriminator
(obtained with the incoming signal), and the local discriminator
(generated with a local copy of the signal). The variable window
width method has shown better performance than the fixed window
width method. In particular, the former yields zero error for all delays for
the BOC and MBOC signals considered, while the latter gives
rather large nonzero errors for small delays in all cases. Due to
its computational simplicity, the variable window width method is
well suited for implementation in low-cost receivers.
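The variable-width loop described above can be sketched as follows. This is a minimal sketch that assumes the actual and local discriminators are available as callables. Each callable returns samples over a common grid of code delays for a given window width. The names `euclidean_distance` and `shrink_window`, the halving schedule, the tolerance and the minimum width are hypothetical choices, not taken from the paper.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical model: maps a window width (in chips) to the discriminator
# output sampled on a fixed grid of code delays.
Discriminator = Callable[[float], List[float]]


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Distortion measure: Euclidean distance between two sampled discriminators."""
    if len(a) != len(b):
        raise ValueError("discriminators must be sampled on the same delay grid")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def shrink_window(width: float, actual: Discriminator, local: Discriminator,
                  tol: float = 1e-9, shrink: float = 0.5,
                  min_width: float = 1e-3) -> Tuple[float, float]:
    """Reduce the window width until the multipath distortion is (numerically) gone.

    Returns the final width and the remaining distortion. Stops early if the
    next width would fall below min_width.
    """
    if not 0.0 < shrink < 1.0:
        raise ValueError("shrink must be in (0, 1)")
    while True:
        d = euclidean_distance(actual(width), local(width))
        if d <= tol or width * shrink < min_width:
            return width, d
        width *= shrink  # assumed geometric schedule; the paper's step rule may differ
```

In a receiver, `actual` would come from correlating the incoming signal and `local` from the locally generated replica. The returned distortion shows whether the loop stopped because the distortion vanished or because the width floor was reached.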
Abstract: This research presents what is, to the best of our knowledge,
the first constant approximation algorithm for the p-median network
design problem with multiple cable types. The problem has previously
been addressed with a single cable type, for which a bifactor
approximation algorithm exists. The addressed problem combines two
well-studied problems: the p-median problem and the network design
problem. The introduced algorithm is a constant-factor random sampling
approximation algorithm built on random sampling techniques from the
literature. It relies on a redistribution lemma from the literature and
solves a Steiner tree problem as a subproblem. The algorithm is simple
and rests on the notions of random sampling and probability. Unlike the
previously proposed algorithm, the approach achieves a single constant
approximation ratio without violating any of the constraints. In
particular, this paper provides a (21 + 2)-approximation algorithm for
the p-median network design problem with multiple cable types using
random sampling techniques.
Abstract: With recent advances in deep neural networks, new applications of NLP (natural language processing) and CV (computer vision) have emerged for processing business documents. However, building a real-world document processing system requires integrating several NLP and CV tasks rather than treating them separately. A unified approach is needed for processing documents that contain textual and graphical elements with rich formats, diverse layouts, and distinct semantics. In this paper, we present a framework that realizes this unified approach. The framework includes a representation model definition that holds the information generated by the various tasks, and specifications that define the coordination between these tasks. The framework serves as a blueprint for building systems that can process documents with rich formats, styles, and multiple types of elements. Its flexible and lightweight design can help build systems for diverse business scenarios, such as contract monitoring and reviewing.
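The two ingredients named above are a shared representation model and a specification of how tasks are coordinated. The small sketch below illustrates both. All names (`Element`, `Document`, `TaskSpec`, `run_pipeline`) and the dependency-ordered execution are hypothetical assumptions for illustration, not the paper's definitions. Real NLP and CV models would be plugged in as task bodies.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Tuple


@dataclass
class Element:
    """A textual or graphical element of a document (hypothetical schema)."""
    kind: str                                   # e.g. "paragraph", "table", "figure"
    bbox: Tuple[float, float, float, float]     # page coordinates (x0, y0, x1, y1)
    text: str = ""
    annotations: Dict[str, object] = field(default_factory=dict)  # task name -> output


@dataclass
class Document:
    """Shared representation model that every task reads from and writes to."""
    elements: List[Element] = field(default_factory=list)


@dataclass
class TaskSpec:
    """Coordination spec: a task, the tasks whose output it needs, and its body."""
    name: str
    depends_on: List[str]
    run: Callable[[Document], None]


def run_pipeline(doc: Document, tasks: List[TaskSpec]) -> List[str]:
    """Run tasks so that each executes after its dependencies; return the order used."""
    by_name = {t.name: t for t in tasks}
    missing = {d for t in tasks for d in t.depends_on} - set(by_name)
    if missing:
        raise ValueError(f"undeclared dependencies: {sorted(missing)}")
    graph = {t.name: set(t.depends_on) for t in tasks}
    order = list(TopologicalSorter(graph).static_order())  # raises CycleError on cycles
    for name in order:
        by_name[name].run(doc)
    return order
```

Keeping all task outputs in one `annotations` map per element lets a later task, such as a contract-review step, read what earlier tasks produced without the tasks calling each other directly.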
Abstract: The development of contemporary civilization calls for suitable models of its pervasive, large-scale phenomena. One such phenomenon is digital transformation, which has numerous disciplined, methodical interpretations that together form a diversified reflection. This reflection can be understood pragmatically as the current, temporal, and local differential state of knowledge. The model of the discursive space is proposed for analyzing and describing this knowledge. Discursive space is understood as an autonomous multidimensional space in which separate discourses traverse specific trajectories that can be presented in a multidimensional parallel coordinate system. A discursive space built on the world of facts preserves the complex character of that world. Digital transformation as a discursive space has a relativistic character: it is created by dynamic discourses, and at the same time these discourses are molded by the shape of the space.
Abstract: In sports, individuals and teams are typically interested
in final rankings. Final results, such as times or distances, dictate
these rankings, also known as places. Places can be further associated
with ordered random variables, commonly referred to as order
statistics. In this work, we introduce a simple yet accurate order
statistical ordinal regression function that predicts relay race places
from changeover-times. We call this function the Fenton-Wilkinson
Order Statistics model. The model is built on the following educated
assumption: individual leg-times follow log-normal distributions.
Our key idea is to use Fenton-Wilkinson approximations
of changeover-times together with an estimator for the total number
of teams, as in the well-known German tank problem. The resulting
place regression function is sigmoidal and thus correctly predicts
the existence of a small number of elite teams that significantly
outperform the rest of the teams. Our model also describes how place
increases linearly with changeover-time at the inflection point of the
log-normal distribution function. With real-world data from Jukola
2019, a massive orienteering relay race, the model is shown to be
highly accurate even when the size of the training set is only 5%
of the whole data set. Numerical results also show that our model
achieves smaller place-prediction root-mean-square errors than linear
regression, mord regression, and Gaussian process regression.
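The two approximations named above can be sketched as follows. This is a minimal sketch under several assumptions. Leg-times are treated as independent log-normals, and the Fenton-Wilkinson step matches the mean and variance of their sum (the changeover-time). The team count uses the standard minimum-variance unbiased German tank estimator. The expected place is taken as 1 + (N - 1)·F(t), where F is the fitted log-normal CDF. That place formula and all function names are hypothetical, and the paper's exact regression function may differ.

```python
import math
from typing import Sequence, Tuple


def fenton_wilkinson(mus: Sequence[float], sigmas: Sequence[float]) -> Tuple[float, float]:
    """Approximate the sum of independent log-normal leg-times exp(N(mu_i, sigma_i^2))
    by a single log-normal exp(N(mu_s, sigma_s^2)) with the same mean and variance."""
    if len(mus) != len(sigmas) or not mus:
        raise ValueError("need one (mu, sigma) pair per leg")
    mean = sum(math.exp(m + s * s / 2) for m, s in zip(mus, sigmas))
    var = sum((math.exp(s * s) - 1) * math.exp(2 * m + s * s) for m, s in zip(mus, sigmas))
    sigma2 = math.log(1 + var / mean ** 2)
    return math.log(mean) - sigma2 / 2, math.sqrt(sigma2)


def german_tank_estimate(max_seen: int, sample_size: int) -> float:
    """Minimum-variance unbiased estimate of a population size N from the largest
    of sample_size values drawn without replacement from 1..N."""
    return max_seen + max_seen / sample_size - 1


def lognormal_cdf(t: float, mu: float, sigma: float) -> float:
    """CDF of exp(N(mu, sigma^2)) evaluated at t."""
    if t <= 0:
        return 0.0
    return 0.5 * (1 + math.erf((math.log(t) - mu) / (sigma * math.sqrt(2))))


def predicted_place(t: float, mu: float, sigma: float, n_teams: float) -> float:
    """Hypothetical place model: 1 plus the expected number of the other
    n_teams - 1 teams whose changeover-time is below t."""
    return 1 + (n_teams - 1) * lognormal_cdf(t, mu, sigma)
```

Because F is a log-normal CDF, the predicted place is S-shaped in t, which is consistent with the sigmoidal behavior described in the abstract.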
Abstract: In linear estimation, the traditional Kalman filter uses the Kalman filter gain to produce estimates and predictions of the n-dimensional state vector from the m-dimensional measurement vector. Computing the Kalman filter gain requires inverting an m x m matrix at every iteration. In this paper, we propose a variation of the Kalman filter that eliminates the Kalman filter gain. In the time-varying case, eliminating the gain requires inverting an n x n matrix and an m x m matrix at every iteration; in the time-invariant case, it requires inverting only an n x n matrix at every iteration. Depending on the model dimensions, the proposed gain-elimination algorithm may be faster than the conventional Kalman filter.
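To make the inversion counts concrete, the sketch below contrasts two measurement updates. The conventional update forms the gain and inverts the m x m innovation covariance. The textbook information-form update never forms the gain; it inverts n x n matrices and uses R^-1, which can be computed once when R is constant. This illustrates the general idea under standard linear-Gaussian assumptions and is not the paper's algorithm. The information form shown uses two n x n inversions per step, whereas the abstract reports one. The function names are hypothetical.

```python
import numpy as np


def update_with_gain(x_pred, P_pred, H, R, z):
    """Conventional update: the gain K requires inverting the m x m matrix S."""
    S = H @ P_pred @ H.T + R                      # m x m innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)           # n x m Kalman filter gain
    x = x_pred + K @ (z - H @ x_pred)
    P = (np.eye(x_pred.shape[0]) - K @ H) @ P_pred
    return x, P


def update_without_gain(x_pred, P_pred, H, R_inv, z):
    """Information-form update: no gain, only n x n inversions.

    R_inv is passed in so that a constant R (time-invariant model) can be
    inverted once outside the filter loop; a time-varying R_k would need an
    m x m inversion at every step.
    """
    P_pred_inv = np.linalg.inv(P_pred)                 # n x n
    P = np.linalg.inv(P_pred_inv + H.T @ R_inv @ H)    # n x n
    x = P @ (P_pred_inv @ x_pred + H.T @ R_inv @ z)
    return x, P
```

Both functions give the same estimate and covariance by the matrix inversion lemma. Which one is cheaper depends on whether n or m is larger, in line with the abstract's remark about model dimensions.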