Abstract: PPX (Pretty Printer for XML) is a query language that offers a concise way to describe how XML data is formatted into HTML. In this paper, we propose a simple formatting specification that combines automatic layout operators with variables in the layout expression of the GENERATE clause of PPX. This method can automatically format irregular XML data contained in part of an XML document, using layout decision rules that refer to the DTD. In our experiment, a quick comparison shows that PPX requires far less description than XSLT or XQuery programs performing the same tasks.
Abstract: This conference paper discusses a risk allocation problem for subprime investing banks involving investment in subprime structured mortgage products (SMPs) and Treasuries. To solve this problem, we develop a Lévy process-based model of jump diffusion type for investment choice in subprime SMPs and Treasuries. This model incorporates subprime SMP losses for which credit default insurance in the form of credit default swaps (CDSs) can be purchased. In essence, we solve a mean swap-at-risk (SaR) optimization problem for investment, which determines the optimal allocation between SMPs and Treasuries subject to credit risk protection via CDSs. In this regard, SaR indicates how much protection investors must purchase from swap protection sellers in order to cover possible losses from SMP default. Here, SaR is defined in terms of value-at-risk (VaR). Finally, we provide an analysis of the aforementioned optimization problem and its connections with the subprime mortgage crisis (SMC).
Abstract: The availability of high-dimensional biological datasets, such as those from gene expression, proteomic, and metabolic experiments, can be leveraged for the diagnosis and prognosis of diseases. Many classification methods in this area have been studied to predict disease states and to distinguish between predefined classes, such as patients with a specific disease versus healthy controls. However, most of the existing research focuses only on a specific dataset. There is a lack of generic comparison between classifiers, which might provide a guideline for biologists or bioinformaticians to select the proper algorithm for new datasets. In this study, we compare the performance of popular classifiers, namely Support Vector Machine (SVM), Logistic Regression, k-Nearest Neighbor (k-NN), Naive Bayes, Decision Tree, and Random Forest, based on mock datasets. We mimic common biological scenarios by simulating various proportions of real discriminating biomarkers and different effect sizes thereof. The results show that SVM performs quite stably and reaches a higher AUC than the other methods. This may be explained by the ability of SVM to minimize the probability of error. Moreover, Decision Tree, with its good applicability for diagnosis and prognosis, shows good performance in our experimental setup. Logistic Regression and Random Forest, however, strongly depend on the ratio of discriminators and perform better with a higher number of discriminators.
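The simulation setup described in this abstract can be illustrated with a minimal sketch. It assumes Gaussian features, a mean shift of `effect_size` on the first `n_discriminators` features of the diseased class, and AUC computed as the probability that a random positive sample outscores a random negative one. The function names, the Gaussian model, and the shift-based definition of effect size are assumptions made for illustration; the study's actual data generator and classifiers are not specified here.

```python
import random


def make_mock_dataset(n_per_class, n_features, n_discriminators, effect_size, seed=0):
    """Build a two-class mock dataset (hypothetical generator, not the study's).

    Every feature is drawn from N(0, 1). For class 1, the first
    n_discriminators features are shifted by effect_size.
    """
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            if label == 1:
                for j in range(n_discriminators):
                    row[j] += effect_size
            X.append(row)
            y.append(label)
    return X, y


def auc(scores, labels):
    """AUC as P(score of a positive > score of a negative); ties count 1/2."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Any classifier that outputs a score per sample can be plugged in: generate `X, y`, score each row, and pass the scores to `auc`. Varying `n_discriminators` and `effect_size` then reproduces, in spirit, the kind of scenario grid the abstract describes.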
Abstract: The increasing interest in processing data created by
sensor networks has evolved into approaches to implement sensor
networks as databases. The aggregation operator, which calculates a
value from a large group of data, such as an average or a sum,
is an essential function that needs to be provided when
implementing such sensor network databases. This work proposes to
add the DURING clause into TinySQL to calculate values during a
specific long period and suggests a way to implement the aggregation
service in sensor networks by applying the materialized view and
incremental view maintenance techniques that are used in data
warehouses. In sensor networks, data values are passed from child
nodes to parent nodes and an aggregation value is computed at the root
node. Because such root nodes need to be memory-efficient and
low-powered, recomputing aggregate values from all past and current
data becomes a problem. Therefore, applying incremental view
maintenance techniques can reduce the memory consumption and
support fast computation of aggregate values.
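The idea of maintaining an aggregate incrementally, rather than recomputing it from all past readings, can be sketched as follows. The class `PartialAggregate` and its methods are hypothetical names chosen for illustration; the sketch assumes that an average is kept as a (count, sum) pair, which is the usual way to make AVG incrementally maintainable, and it does not reproduce the DURING clause or any TinySQL syntax.

```python
class PartialAggregate:
    """Constant-size state for SUM/COUNT/AVG (hypothetical helper).

    Only the running count and total are stored, so memory use does not
    grow with the number of readings seen so far.
    """

    def __init__(self, count=0, total=0.0):
        self.count = count
        self.total = total

    def add(self, value):
        """Fold one new sensor reading into the materialized state."""
        self.count += 1
        self.total += value

    def merge(self, other):
        """Fold in the partial state received from a child node."""
        self.count += other.count
        self.total += other.total

    def average(self):
        """Current average, or None if no reading has arrived yet."""
        if self.count == 0:
            return None
        return self.total / self.count
```

A parent node would call `merge` on the state sent by each child and `add` on its own readings; the root then answers an average query from two numbers instead of the full history.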
Abstract: Glaucoma diagnosis involves extracting three features
of the fundus image: optic cup, optic disc and vernacular. Present
manual diagnosis is expensive, tedious and time-consuming. A
number of studies have been conducted to automate this process.
However, the variability between the diagnostic capability of an
automated system and that of an ophthalmologist has yet to be
established. This paper discusses the efficiency and variability between
ophthalmologist opinion and a digital technique, thresholding. The
efficiency and variability measures are based on image quality
grading: poor, satisfactory or good. The images are separated into
four channels: gray, red, green and blue. A scientific investigation
was conducted with three ophthalmologists, who graded the images
based on image quality. The images are thresholded using
multi-thresholding and graded in the same way as by the
ophthalmologists. The grades from the ophthalmologists and from
thresholding are then compared. The results show that there is small
variability between the results of the ophthalmologists and digital
thresholding.
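As a rough illustration of multi-thresholding on a single channel, the sketch below assigns each pixel intensity to a class according to a sorted list of thresholds. The function name and the threshold values in the usage note are assumptions; the abstract does not state how the thresholds were chosen or how the segmented images were turned into quality grades.

```python
import bisect


def multi_threshold(channel, thresholds):
    """Label each pixel of a 2-D channel with a class index.

    thresholds must be sorted in ascending order. A pixel with intensity p
    gets the number of thresholds that are <= p, so k thresholds produce
    k + 1 classes (0 .. k).
    """
    return [[bisect.bisect_right(thresholds, p) for p in row] for row in channel]
```

For example, with the hypothetical thresholds `[85, 170]`, an 8-bit channel is split into dark, mid and bright regions, labelled 0, 1 and 2 respectively. The same function can be applied separately to the gray, red, green and blue channels.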
Abstract: Statistics from the last decade show that traumatic brain
injury (TBI) is a growing concern in our legal system. In an effort to
obtain data regarding the influence of neuropsychological expert
witness testimony in a criminal case, this study tested three
hypotheses. H1: The majority of jurors will vote not guilty, due to
mild head injury. H2: The jurors will give more credence to the
testimony of the neuropsychologist rather than the psychiatrist. H3:
The jurors will be more lenient in their sentencing, given the
neuropsychologist's testimony. The criteria for inclusion in the
study as a participant are identical to those used to determine
eligibility for jury duty in the United States. A chi-squared
test was performed to analyze the data for the three
hypotheses. The results supported all of the hypotheses; however,
statistical significance was seen for H1 and H2 only.
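The chi-squared analysis mentioned above can be sketched for a two-outcome vote such as the one in H1. The sketch assumes a goodness-of-fit test against an even split; the abstract does not say which variant of the test was used, and the counts in the usage note are invented purely for illustration and are not the study's data.

```python
# Critical value of the chi-squared distribution for 1 degree of freedom
# at the 0.05 significance level.
CRITICAL_DF1_ALPHA_05 = 3.841


def chi_squared_statistic(observed, expected):
    """Pearson chi-squared statistic: sum of (O - E)^2 / E over categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def is_significant_df1(observed, expected):
    """True if the statistic exceeds the df = 1, alpha = 0.05 critical value."""
    return chi_squared_statistic(observed, expected) > CRITICAL_DF1_ALPHA_05
```

With hypothetical counts of 30 "not guilty" and 10 "guilty" votes against an expected even split of 20 and 20, the statistic is (10² / 20) + (10² / 20) = 10.0, which exceeds 3.841 and would be reported as significant at the 0.05 level.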
Abstract: In this paper we present a generic approach to the problem of blind estimation of the parameters of linear and convolutional error-correcting codes. In a non-cooperative context, an adversary has access only to the noisy transmission he has intercepted. The interceptor has no knowledge of the parameters used by the legal users. So, before having access to the information, he first has to blindly estimate the parameters of the error-correcting code used in the communication. The main advantage of the presented approach is that the problem of reconstructing such codes can be expressed in a very simple way. This allows us to evaluate theoretical bounds on the complexity of the reconstruction process as well as bounds on the estimation rate. We show that some classical reconstruction techniques are optimal and also explain why some of them have theoretical complexities greater than those experimentally observed.
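One well-known way to estimate the length of a linear block code blindly is a rank test: the intercepted bits are arranged into rows of a candidate length, and the rank of the resulting matrix over GF(2) is computed. The sketch below illustrates that idea under strong assumptions: a noiseless, block-aligned stream and a small (7,4) code whose generator rows are chosen for the example. It is not claimed to be the approach of the paper, and all names are hypothetical.

```python
import random

# Generator rows of a small (7,4) linear code in systematic form,
# written as 7-bit integers (an assumption made for this example).
G = [0b1000110, 0b0100101, 0b0010011, 0b0001111]


def encode(message):
    """Encode a 4-bit message (an int in 0..15) as a 7-bit codeword."""
    word = 0
    for i, row in enumerate(G):
        if (message >> (3 - i)) & 1:
            word ^= row
    return word


def gf2_rank(rows):
    """Rank over GF(2) of a matrix whose rows are given as integers."""
    pivots = {}  # leading-bit position -> reduced row with that leading bit
    for row in rows:
        while row:
            top = row.bit_length() - 1
            if top not in pivots:
                pivots[top] = row
                break
            row ^= pivots[top]  # cancel the leading bit and keep reducing
    return len(pivots)


def rank_profile(bits, max_len):
    """Map each candidate length l to the GF(2) rank of the l-column matrix."""
    profile = {}
    for l in range(2, max_len + 1):
        n_rows = len(bits) // l
        rows = [
            int("".join(str(b) for b in bits[i * l:(i + 1) * l]), 2)
            for i in range(n_rows)
        ]
        profile[l] = gf2_rank(rows)
    return profile


def random_stream(n_words, seed=0):
    """Concatenate n_words random codewords into a flat list of bits."""
    rng = random.Random(seed)
    bits = []
    for _ in range(n_words):
        word = encode(rng.randrange(16))
        bits.extend((word >> (6 - j)) & 1 for j in range(7))
    return bits
```

For a stream built from this code, the rank at candidate length 7 can never exceed 4, and at length 14 it can never exceed 8, whereas other lengths typically give full or nearly full rank when enough rows are available. The lengths with a rank deficiency therefore point to the code length, and the rank itself suggests the dimension. With noise, the rank test has to be replaced by a tolerance-based criterion, which this sketch does not attempt.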