A Proposal of an Automatic Formatting Method for Transforming XML Data

PPX (Pretty Printer for XML) is a query language that offers a concise way to describe the formatting of XML data into HTML. In this paper, we propose a simple way to specify formatting by combining automatic layout operators with variables in the layout expression of the GENERATE clause of PPX. This method can automatically format irregular XML data contained in part of an XML document, using layout decision rules that refer to the DTD. In our experiment, a quick comparison shows that PPX requires far less description than XSLT or XQuery programs performing the same tasks.

Optimal Allocation Between Subprime Structured Mortgage Products and Treasuries

This conference paper discusses a risk allocation problem for subprime investing banks involving investment in subprime structured mortgage products (SMPs) and Treasuries. In order to solve this problem, we develop a Lévy process-based model of jump-diffusion type for investment choice in subprime SMPs and Treasuries. This model incorporates subprime SMP losses for which credit default insurance in the form of credit default swaps (CDSs) can be purchased. In essence, we solve a mean swap-at-risk (SaR) optimization problem for investment which determines optimal allocation between SMPs and Treasuries subject to credit risk protection via CDSs. In this regard, SaR is indicative of how much protection investors must purchase from swap protection sellers in order to cover possible losses from SMP default. Here, SaR is defined in terms of value-at-risk (VaR). Finally, we provide an analysis of the aforementioned optimization problem and its connections with the subprime mortgage crisis (SMC).
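
Since SaR is defined in terms of VaR, a minimal sketch of an empirical VaR computation may help fix ideas. The abstract does not give the exact SaR formula or the loss model, so the function name `value_at_risk` and the quantile convention used here (the smallest loss level whose empirical cumulative probability reaches the confidence level) are assumptions for illustration only.

```python
import math


def value_at_risk(losses, alpha=0.95):
    """Empirical VaR at confidence level alpha.

    Returns the smallest observed loss l such that the fraction of
    observations <= l is at least alpha. This is one common convention;
    it is an assumption here, not the paper's definition.
    """
    if not losses:
        raise ValueError("losses must be non-empty")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    ordered = sorted(losses)
    # 1-based index of the order statistic that reaches level alpha.
    k = math.ceil(alpha * len(ordered))
    return ordered[k - 1]
```

Under this convention, a larger VaR means that more protection would be needed to cover losses at the chosen confidence level.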

Evaluation of the Impact of Dataset Characteristics for Classification Problems in Biological Applications

The availability of high-dimensional biological datasets, such as those from gene expression, proteomic, and metabolic experiments, can be leveraged for the diagnosis and prognosis of diseases. Many classification methods in this area have been studied to predict disease states and to separate predefined classes, such as patients with a specific disease versus healthy controls. However, most of the existing research focuses only on a specific dataset. There is a lack of generic comparison between classifiers, which might provide a guideline for biologists or bioinformaticians to select the proper algorithm for new datasets. In this study, we compare the performance of popular classifiers, namely Support Vector Machine (SVM), Logistic Regression, k-Nearest Neighbor (k-NN), Naive Bayes, Decision Tree, and Random Forest, on mock datasets. We mimic common biological scenarios by simulating various proportions of real discriminating biomarkers and different effect sizes thereof. The results show that SVM performs quite stably and reaches a higher AUC than the other methods. This may be explained by the ability of SVM to minimize the probability of error. Moreover, Decision Tree, with its good applicability for diagnosis and prognosis, shows good performance in our experimental setup. Logistic Regression and Random Forest, however, strongly depend on the ratio of discriminators and perform better with a higher number of discriminators.
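
The experimental setup can be sketched roughly as follows. This is not the authors' simulation code: the Gaussian feature model, the function names `make_mock_dataset` and `auc`, and all parameter names are assumptions chosen for illustration. The sketch generates a two-class mock dataset in which a given fraction of features is shifted by an effect size in the disease class, and computes AUC by pairwise comparison of scores.

```python
import random


def make_mock_dataset(n_per_class, n_features, frac_discriminating,
                      effect_size, seed=0):
    """Two-class mock data; the first features discriminate (assumed model)."""
    rng = random.Random(seed)
    n_disc = round(n_features * frac_discriminating)
    X, y = [], []
    for label in (0, 1):  # 0 = healthy control, 1 = patient
        for _ in range(n_per_class):
            row = []
            for j in range(n_features):
                # Only discriminating features are shifted in the patient class.
                shift = effect_size if (label == 1 and j < n_disc) else 0.0
                row.append(rng.gauss(shift, 1.0))
            X.append(row)
            y.append(label)
    return X, y


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Example: 10 features, 20% of them discriminating, large effect size.
X, y = make_mock_dataset(n_per_class=50, n_features=10,
                         frac_discriminating=0.2, effect_size=10.0)
# Score each sample by its first (discriminating) feature alone.
auc_first_feature = auc(y, [row[0] for row in X])
```

In a full comparison, the scores passed to `auc` would come from each trained classifier rather than from a single raw feature.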

A Materialized View Approach to Support Aggregation Operations over Long Periods in Sensor Networks

The increasing interest in processing data created by sensor networks has led to approaches that implement sensor networks as databases. The aggregation operator, which calculates a single value, such as an average or a sum, from a large group of data, is an essential function that needs to be provided when implementing such sensor network databases. This work proposes adding the DURING clause to TinySQL to calculate values over a specific long period, and suggests a way to implement the aggregation service in sensor networks by applying the materialized view and incremental view maintenance techniques that are used in data warehouses. In sensor networks, data values are passed from child nodes to parent nodes and an aggregation value is computed at the root node. Because such root nodes need to be memory-efficient and low-powered, recomputing aggregate values from all past and current data is a problem. Applying incremental view maintenance techniques can therefore reduce memory consumption and support fast computation of aggregate values.
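
The idea of incremental view maintenance for an aggregate can be sketched as below. This is an illustrative sketch, not the paper's implementation or TinySQL syntax: the class name `AvgView` and the helper `partial_aggregate` are hypothetical. The root keeps only a running sum and count as the materialized state, and each new batch of readings is folded in as a delta, so past raw readings never need to be stored or rescanned.

```python
from dataclasses import dataclass


@dataclass
class AvgView:
    """Materialized AVG view stored as (sum, count), not as raw readings."""
    total: float = 0.0
    count: int = 0

    def apply(self, delta_sum: float, delta_count: int) -> None:
        # Incremental maintenance: fold the delta into the stored state.
        self.total += delta_sum
        self.count += delta_count

    def value(self) -> float:
        if self.count == 0:
            raise ValueError("no readings in the view yet")
        return self.total / self.count


def partial_aggregate(readings):
    """What a child node could send to its parent: a (sum, count) pair."""
    return sum(readings), len(readings)


# Three epochs of hypothetical readings arriving at the root.
view = AvgView()
for epoch_readings in ([20.0, 22.0], [24.0], [18.0, 26.0, 22.0]):
    view.apply(*partial_aggregate(epoch_readings))
```

The state is constant-size regardless of how long the aggregation period is, which is the property that matters for memory-constrained nodes.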

Bridging Quantitative and Qualitative of Glaucoma Detection

Glaucoma diagnosis involves extracting three features of the fundus image: optic cup, optic disc and vernacular. Present manual diagnosis is expensive, tedious and time-consuming. A number of studies have been conducted to automate this process. However, the variability between the diagnostic capability of an automated system and that of an ophthalmologist has yet to be established. This paper discusses the efficiency and variability between ophthalmologist opinion and a digital technique, thresholding. The efficiency and variability measures are based on image quality grading: poor, satisfactory or good. The images are separated into four channels: gray, red, green and blue. A scientific investigation was conducted with three ophthalmologists, who graded the images based on image quality. The images are thresholded using multithresholding and graded in the same way as by the ophthalmologists. The grades from the ophthalmologists and from thresholding are then compared. The results show that there is small variability between the results of the ophthalmologists and digital thresholding.
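
As a rough illustration of multithresholding on a single channel, the sketch below maps each pixel intensity to a class index according to a list of threshold values. The abstract does not say how the thresholds are chosen, so the function name `multithreshold`, the threshold values in the test data, and the convention that a pixel equal to a threshold falls in the upper class are all assumptions.

```python
from bisect import bisect_right


def multithreshold(pixels, thresholds):
    """Map each intensity in a 2D image to a class index.

    With thresholds t1 < t2 < ... < tk, a pixel p gets class i where i is
    the number of thresholds <= p, so classes range from 0 to k.
    (Assumed convention: a pixel equal to a threshold goes to the upper class.)
    """
    cuts = sorted(thresholds)
    return [[bisect_right(cuts, p) for p in row] for row in pixels]


def split_channels(rgb_image):
    """Split an image of (r, g, b) tuples into red, green and blue channels."""
    red = [[px[0] for px in row] for row in rgb_image]
    green = [[px[1] for px in row] for row in rgb_image]
    blue = [[px[2] for px in row] for row in rgb_image]
    return red, green, blue
```

Each channel returned by `split_channels` can be passed to `multithreshold` separately, which mirrors grading the channels one at a time.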

Perspectives on Neuropsychological Testimony

Statistics from the last decade show that traumatic brain injury (TBI) is a growing concern in our legal system. In an effort to obtain data regarding the influence of neuropsychological expert witness testimony in a criminal case, this study tested three hypotheses. H1: The majority of jurors will vote not guilty, due to mild head injury. H2: The jurors will give more credence to the testimony of the neuropsychologist than to that of the psychiatrist. H3: The jurors will be more lenient in their sentencing, given the neuropsychologist's testimony. The criteria for inclusion in the study as a participant are identical to those used to determine eligibility for jury duty in the United States. A chi-squared test was performed to analyze the data for the three hypotheses. The results supported all of the hypotheses; however, statistical significance was seen for H1 and H2 only.
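
For readers unfamiliar with the test, the chi-squared statistic for a hypothesis such as H1 can be sketched as follows. The counts below are hypothetical and are not data from the study; the function name `chi_square_statistic` and the choice of an even guilty/not-guilty split as the null expectation are likewise assumptions made only for illustration.

```python
def chi_square_statistic(observed, expected):
    """Pearson chi-squared statistic: sum of (O - E)^2 / E over categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected counts must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# HYPOTHETICAL counts (not from the study): 40 mock jurors,
# 30 voting not guilty and 10 voting guilty.
observed = [30, 10]
# Assumed null hypothesis: votes are split evenly between the two verdicts.
expected = [20, 20]

statistic = chi_square_statistic(observed, expected)

# Critical value of the chi-squared distribution with 1 degree of
# freedom at the 0.05 significance level.
CRITICAL_DF1_ALPHA_05 = 3.841
significant = statistic > CRITICAL_DF1_ALPHA_05
```

With two categories there is one degree of freedom, so the statistic is compared against the critical value for that distribution.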

Algebraic Approach for the Reconstruction of Linear and Convolutional Error Correcting Codes

In this paper we present a generic approach to the problem of blind estimation of the parameters of linear and convolutional error correcting codes. In a non-cooperative context, an adversary has access only to the noisy transmission he has intercepted. The interceptor has no knowledge of the parameters used by the legal users. So, before having access to the information, he first has to blindly estimate the parameters of the error correcting code used in the communication. The main advantage of the presented approach is that the problem of reconstructing such codes can be expressed in a very simple way. This allows us to evaluate theoretical bounds on the complexity of the reconstruction process, as well as bounds on the estimation rate. We show that some classical reconstruction techniques are optimal, and also explain why some of them have theoretical complexities greater than those experimentally observed.
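
To give a flavor of blind parameter estimation, the sketch below uses a well-known rank-based idea for linear block codes, which is not necessarily the algebraic approach of the paper. It assumes a noise-free, synchronized bit stream, which is a strong simplification of the noisy setting described above. The intercepted bits are arranged into rows of a candidate length n; when n matches the true code length, the rows are codewords and the matrix is rank-deficient over GF(2), with rank equal to the code dimension k. The Hamming(7,4) generator matrix, the function names and the parameter `max_n` are choices made for illustration.

```python
import random


def gf2_rank(rows):
    """Rank over GF(2); each row is an int whose bits are the entries."""
    basis = {}  # leading bit position -> reduced row
    for row in rows:
        while row:
            lead = row.bit_length() - 1
            if lead not in basis:
                basis[lead] = row
                break
            row ^= basis[lead]  # eliminate the leading bit
    return len(basis)


# Systematic generator matrix of a Hamming(7,4) code (illustrative choice).
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def encode(msg):
    """Multiply a 4-bit message by G over GF(2)."""
    return [sum(m * g for m, g in zip(msg, col)) % 2 for col in zip(*G)]


def estimate_block_code(bits, max_n=10):
    """Return (n, k) for the first candidate length with deficient rank."""
    for n in range(2, max_n + 1):
        rows = [
            int("".join(map(str, bits[i:i + n])), 2)
            for i in range(0, len(bits) - n + 1, n)
        ]
        rank = gf2_rank(rows)
        if rank < n:
            return n, rank
    return None


# Simulate an intercepted, noise-free stream of 200 codewords.
rng = random.Random(0)
stream = []
for _ in range(200):
    stream.extend(encode([rng.randint(0, 1) for _ in range(4)]))

estimate = estimate_block_code(stream)
```

With noise, the rank deficiency is no longer exact, which is one reason the noisy case needs the more careful treatment and complexity analysis that the paper addresses.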