Accelerating Sparse Matrix Vector Multiplication on Many-Core GPUs

Many-core GPUs provide high computing ability and substantial bandwidth; however, optimizing irregular applications like SpMV on GPUs becomes a difficult but meaningful task. In this paper, we propose a novel method to improve the performance of SpMV on GPUs. A new storage format called HYB-R is proposed to exploit GPU architecture more efficiently. The COO portion of the matrix is partitioned recursively into a ELL portion and a COO portion in the process of creating HYB-R format to ensure that there are as many non-zeros as possible in ELL format. The method of partitioning the matrix is an important problem for HYB-R kernel, so we also try to tune the parameters to partition the matrix for higher performance. Experimental results show that our method can get better performance than the fastest kernel (HYB) in NVIDIA-s SpMV library with as high as 17% speedup.

Iris Localization using Circle and Fuzzy Circle Detection Method

Iris localization is a very important approach in biometric identification systems. Identification process usually is implemented in three levels: iris localization, feature extraction, and pattern matching finally. Accuracy of iris localization as the first step affects all other levels and this shows the importance of iris localization in an iris based biometric system. In this paper, we consider Daugman iris localization method as a standard method, propose a new method in this field and then analyze and compare the results of them on a standard set of iris images. The proposed method is based on the detection of circular edge of iris, and improved by fuzzy circles and surface energy difference contexts. Implementation of this method is so easy and compared to the other methods, have a rather high accuracy and speed. Test results show that the accuracy of our proposed method is about Daugman method and computation speed of it is 10 times faster.

Concurrency without Locking in Parallel Hash Structures used for Data Processing

Various mechanisms providing mutual exclusion and thread synchronization can be used to support parallel processing within a single computer. Instead of using locks, semaphores, barriers or other traditional approaches in this paper we focus on alternative ways for making better use of modern multithreaded architectures and preparing hash tables for concurrent accesses. Hash structures will be used to demonstrate and compare two entirely different approaches (rule based cooperation and hardware synchronization support) to an efficient parallel implementation using traditional locks. Comparison includes implementation details, performance ranking and scalability issues. We aim at understanding the effects the parallelization schemes have on the execution environment with special focus on the memory system and memory access characteristics.

Modeling of Session Initiation Protocol Invite Transaction using Colored Petri Nets

Wireless mobile communications have experienced the phenomenal growth through last decades. The advances in wireless mobile technologies have brought about a demand for high quality multimedia applications and services. For such applications and services to work, signaling protocol is required for establishing, maintaining and tearing down multimedia sessions. The Session Initiation Protocol (SIP) is an application layer signaling protocols, based on request/response transaction model. This paper considers SIP INVITE transaction over an unreliable medium, since it has been recently modified in Request for Comments (RFC) 6026. In order to help in assuring that the functional correctness of this modification is achieved, the SIP INVITE transaction is modeled and analyzed using Colored Petri Nets (CPNs). Based on the model analysis, it is concluded that the SIP INVITE transaction is free of livelocks and dead codes, and in the same time it has both desirable and undesirable deadlocks. Therefore, SIP INVITE transaction should be subjected for additional updates in order to eliminate undesirable deadlocks. In order to reduce the cost of implementation and maintenance of SIP, additional remodeling of the SIP INVITE transaction is recommended.

Online Computing System for Cctuple-Precision Computation with Fortran

Computations with higher than the IEEE 754 standard double-precision (about 16 significant digits) are required recently. Although there are available software routines in Fortran and C for high-precision computation, users are required to implement such routines in their own computers with detailed knowledges about them. We have constructed an user-friendly online system for octupleprecision computation. In our Web system users with no knowledges about high-precision computation can easily perform octupleprecision computations, by choosing mathematical functions with argument(s) inputted, by writing simple mathematical expression(s) or by uploading C program(s). In this paper we enhance the Web system above by adding the facility of uploading Fortran programs, which have been widely used in scientific computing. To this end we construct converter routines in two stages.