Embedded Semantic Segmentation Network Optimized for Matrix Multiplication Accelerator

Autonomous driving systems require high reliability to provide people with a safe and comfortable driving experience. However, despite the development of many vehicle sensors, it is difficult to maintain high perception performance in driving environments that vary with time and season. Image segmentation using deep learning, which has evolved rapidly in recent years, provides stable, high recognition performance across various road environments. However, because the system controls a vehicle in real time, a highly complex deep learning network cannot be used due to time and memory constraints. Moreover, efficient networks are optimized for GPU environments, so their performance degrades in embedded processor environments equipped with simple hardware accelerators. In this paper, a semantic segmentation network, the matrix multiplication accelerator network (MMANet), optimized for the matrix multiplication accelerator (MMA) on Texas Instruments digital signal processors (TI DSP), is proposed to improve the recognition performance of autonomous driving systems. The proposed method is designed to maximize the number of layers that can be executed in a limited time so that reliable driving environment information is provided in real time. First, the number of channels in the activation map is fixed to fit the structure of the MMA, and the resulting lack of information is resolved by increasing the number of parallel branches. Second, an efficient convolution is selected depending on the size of the activation. Because the MMA has a fixed structure, normal convolution may be more efficient than depthwise separable convolution, depending on memory access overhead. Thus, the convolution type is chosen according to the output stride to increase network depth. In addition, memory access time is minimized by processing operations only in the L3 cache. Lastly, reliable contexts are extracted using the extended atrous spatial pyramid pooling (ASPP).
The proposed method obtains stable features from an extended path by increasing the kernel size and accessing consecutive data. In addition, it consists of two ASPPs that obtain high-quality contexts using the restored shape, without global average pooling paths, since that layer uses the MMA only as a simple adder. To verify the proposed method, experiments are conducted using perfsim, a timing simulator, and the Cityscapes validation set. The proposed network processes a 640 x 480 image in 6.67 ms, so six cameras can be used to perceive the surroundings of the vehicle at 20 frames per second (FPS). In addition, it achieves 73.1% mean intersection over union (mIoU), which is the highest recognition rate among embedded networks on the Cityscapes validation set.
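The trade-off between the two convolution types, and the six-camera timing claim, can be checked with a little arithmetic. The sketch below is only illustrative: the 60 x 80 activation size, the 64-channel width, and all function names are assumptions chosen for the example, not values from the paper. It counts multiply-accumulate operations (MACs) only and does not model the memory access overhead that the abstract identifies as the deciding factor on the MMA.

```python
def standard_conv_macs(h, w, k, c_in, c_out):
    """MACs of a normal k x k convolution on an h x w output map."""
    return h * w * k * k * c_in * c_out


def depthwise_separable_macs(h, w, k, c_in, c_out):
    """MACs of a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    depthwise = h * w * k * k * c_in
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise


def camera_round_ms(per_image_ms, cameras):
    """Time to process one image from every camera, one after another."""
    return per_image_ms * cameras


def fits_frame_rate(per_image_ms, cameras, fps):
    """True if one full camera round fits inside a single frame period."""
    return camera_round_ms(per_image_ms, cameras) <= 1000.0 / fps


if __name__ == "__main__":
    # Hypothetical layer: 640 x 480 input at output stride 8, 64 channels.
    normal = standard_conv_macs(60, 80, 3, 64, 64)
    separable = depthwise_separable_macs(60, 80, 3, 64, 64)
    print("normal / separable MAC ratio:", normal / separable)
    # Figures from the abstract: 6.67 ms per image, six cameras, 20 FPS.
    print("camera round (ms):", camera_round_ms(6.67, 6))
    print("fits 20 FPS:", fits_frame_rate(6.67, 6, 20))
```

For this hypothetical layer the depthwise separable form needs roughly 7.9 times fewer MACs, which is why the abstract's observation is notable: on fixed matrix hardware, fewer MACs do not necessarily mean less time. With the reported 6.67 ms per image, six cameras take about 40 ms per round, which fits inside the 50 ms period of 20 FPS.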

Image Rotation Using an Augmented 2-Step Shear Transform

Image rotation is one of the main pre-processing steps in image processing and image pattern recognition. It is conventionally implemented as a multiplication by a rotation matrix, which requires many floating-point arithmetic operations and trigonometric calculations and therefore takes a long time to execute. Hence, there has been a need for a high-speed image rotation algorithm that avoids these two time-consuming operations. However, an image rotated in this way has a drawback, namely distortion. We solve this problem using an augmented two-step shear transform. We compare the presented algorithm with conventional rotation on images of various sizes. Experimental results show that the presented algorithm is superior to conventional rotation.
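The abstract does not give the details of the augmented two-step shear transform, so the sketch below shows a different, well-known technique for context: the classic three-shear decomposition of a rotation, in which a rotation by theta equals an x-shear, a y-shear, and a second x-shear. It only verifies the matrix identity numerically; the function names are illustrative and no image resampling is performed.

```python
import math


def matmul2(a, b):
    """Product of two 2 x 2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def rotation_matrix(theta):
    """Conventional counter-clockwise rotation matrix."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def three_shear_rotation(theta):
    """Rotation built as x-shear * y-shear * x-shear (three-shear method)."""
    a = -math.tan(theta / 2.0)   # x-shear factor
    b = math.sin(theta)          # y-shear factor
    shear_x = [[1.0, a], [0.0, 1.0]]
    shear_y = [[1.0, 0.0], [b, 1.0]]
    return matmul2(shear_x, matmul2(shear_y, shear_x))


if __name__ == "__main__":
    theta = math.radians(30)
    direct = rotation_matrix(theta)
    sheared = three_shear_rotation(theta)
    # Largest element-wise difference; expected to be at rounding-error level.
    print(max(abs(direct[i][j] - sheared[i][j])
              for i in range(2) for j in range(2)))
```

Each shear moves pixels along a single axis only, which is what makes shear-based rotation attractive for fast implementations.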

An Effective Approach for Distribution System Power Flow Solution

An effective approach for unbalanced three-phase distribution power flow solutions is proposed in this paper. The special topological characteristics of distribution networks are fully utilized to make a direct solution possible. Two matrices, the bus-injection to branch-current matrix and the branch-current to bus-voltage matrix, and a simple matrix multiplication are used to obtain power flow solutions. Due to the distinctive solution techniques of the proposed method, the time-consuming LU decomposition and forward/backward substitution of the Jacobian matrix or admittance matrix required in traditional power flow methods are no longer necessary. Therefore, the proposed method is robust and time-efficient. Test results demonstrate the validity of the proposed method. The proposed method shows great potential for use in distribution automation applications.
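The role of the two matrices can be illustrated on a toy feeder. The sketch below is a simplified single-phase, per-unit version written under several assumptions: a radial network with constant-power loads, bus 0 as the source, and branch i feeding bus i + 1. The names BIBC and BCBV, the function names, and the example data are illustrative; the paper itself treats the unbalanced three-phase case, which is not reproduced here.

```python
def build_bibc_bcbv(parent, z):
    """Build the bus-injection to branch-current (BIBC) and the
    branch-current to bus-voltage (BCBV) matrices of a radial feeder.

    parent[i] is the upstream bus of bus i + 1 (bus 0 is the source),
    and z[i] is the impedance of the branch feeding bus i + 1.
    """
    n = len(parent)
    bibc = [[0] * n for _ in range(n)]
    for bus in range(1, n + 1):
        node = bus
        while node != 0:                    # walk up towards the source
            bibc[node - 1][bus - 1] = 1     # this branch carries the bus current
            node = parent[node - 1]
    # Voltage drop to a bus = sum of z * branch current along its path.
    bcbv = [[bibc[br][bus] * z[br] for br in range(n)] for bus in range(n)]
    return bibc, bcbv


def solve_radial_power_flow(parent, z, s_load, v0=1.0 + 0j,
                            tol=1e-10, max_iter=100):
    """Iterate V = V0 - (BCBV * BIBC) * I with I = conj(S / V)."""
    n = len(parent)
    bibc, bcbv = build_bibc_bcbv(parent, z)
    dlf = [[sum(bcbv[r][k] * bibc[k][c] for k in range(n)) for c in range(n)]
           for r in range(n)]
    v = [v0] * n
    for _ in range(max_iter):
        i_inj = [(s / vk).conjugate() for s, vk in zip(s_load, v)]
        v_new = [v0 - sum(dlf[r][c] * i_inj[c] for c in range(n))
                 for r in range(n)]
        if max(abs(a - b) for a, b in zip(v_new, v)) < tol:
            return v_new
        v = v_new
    raise RuntimeError("power flow did not converge")


if __name__ == "__main__":
    # Hypothetical feeder: source 0 - bus 1, with laterals 1 - 2 and 1 - 3.
    parent = [0, 1, 1]
    z = [0.01 + 0.02j] * 3
    s_load = [0.1 + 0.05j] * 3
    for bus, volt in enumerate(solve_radial_power_flow(parent, z, s_load), 1):
        print(bus, abs(volt))
```

No matrix factorization appears anywhere in the loop: each iteration is one current update and one matrix-vector product, which matches the motivation stated in the abstract.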

All-Pairs Shortest-Paths Problem for Unweighted Graphs in O(n^2 log n) Time

Given a simple connected unweighted undirected graph G = (V(G), E(G)) with |V(G)| = n and |E(G)| = m, we present a new algorithm for the all-pairs shortest-path (APSP) problem. The running time of our algorithm is O(n^2 log n). This bound is an improvement over the previous best known O(n^2.376) time bound of Raimund Seidel (1995) for general graphs. The algorithm presented does not rely on fast matrix multiplication. With slight modifications, our algorithm also solves the APSP problem for unweighted directed graphs in time O(n^2 log n), improving on the previous best known O(n^2.575) time bound of Uri Zwick (2002).
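The abstract does not describe the algorithm itself, so the sketch below is not the proposed method. It shows the textbook baseline for the same problem, a breadth-first search from every vertex, which runs in O(n(n + m)) time and likewise avoids fast matrix multiplication. The adjacency-list representation and function names are assumptions made for the example.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every vertex (-1 if unreachable)."""
    dist = [-1] * len(adj)
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] == -1:        # first visit gives the shortest distance
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def apsp_unweighted(adj):
    """All-pairs shortest paths by running one BFS per vertex."""
    return [bfs_distances(adj, s) for s in range(len(adj))]


if __name__ == "__main__":
    # Path graph 0 - 1 - 2 - 3 as adjacency lists.
    adj = [[1], [0, 2], [1, 3], [2]]
    for row in apsp_unweighted(adj):
        print(row)
```

On dense graphs, where m is on the order of n^2, this baseline costs O(n^3), which is the regime in which the bounds quoted in the abstract matter.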

Some Characteristics of Systolic Arrays

This paper investigates a possible optimization of some linear algebra problems that can be solved by parallel processing using special arrays called systolic arrays. Special types of transformations are used to design these arrays, and the characteristics of the resulting arrays are shown. The main focus is on the advantages of these arrays for parallel computation of the matrix product, with particular attention to the design of a systolic array for matrix multiplication. Multiplication of large matrices requires a lot of computational time, and its complexity is O(n^3). Many algorithms, both sequential and parallel, have been developed with the aim of minimizing the computation time. Systolic arrays are well suited for this purpose. In this paper we show that using an appropriate transformation leads to more efficient arrays for computations of this type.
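As a rough illustration of how a systolic array computes a matrix product, the sketch below simulates one common design, an output-stationary two-dimensional array in which processing element (i, j) accumulates C[i][j] while rows of A and columns of B arrive with a skew of one time step per position. This is an assumed textbook arrangement, not necessarily one of the arrays derived in the paper, and it models only the timing schedule rather than the physical data movement between neighbouring cells.

```python
def systolic_matmul(a, b):
    """Simulate an output-stationary systolic array computing a * b.

    a is n x m and b is m x p. At time step t, processing element (i, j)
    receives a[i][k] and b[k][j] with k = t - i - j, if that index exists,
    and performs at most one multiply-accumulate.
    Returns the product and the number of time steps used.
    """
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    steps = n + m + p - 2            # last operand pair reaches PE (n-1, p-1)
    for t in range(steps):
        for i in range(n):
            for j in range(p):
                k = t - i - j        # skewed arrival of operands
                if 0 <= k < m:
                    c[i][j] += a[i][k] * b[k][j]
    return c, steps


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    product, steps = systolic_matmul(a, b)
    print(product, steps)
```

For square n x n matrices the schedule finishes in 3n - 2 time steps using n^2 processing elements, compared with the O(n^3) sequential operation count mentioned in the abstract.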