Simulation of Obstacle Avoidance for Multiple Autonomous Vehicles in a Dynamic Environment Using Q-Learning

The availability of inexpensive yet capable hardware allows for an increased level of automation and self-optimization in the context of Industry 4.0. However, such agents require high-quality information about their surroundings along with a robust collision-avoidance strategy; otherwise, they may cause expensive damage to equipment or to other agents. Manually defining a strategy that covers all possibilities is both time-consuming and counter-productive given the capabilities of modern hardware. This paper explores a model-free, self-optimizing obstacle avoidance strategy for multiple autonomous agents in a simulated dynamic environment using the Q-learning algorithm.
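To make the Q-learning idea concrete, here is a minimal tabular sketch for a single agent on a small grid. The grid size, obstacle layout, reward values, and hyperparameters (`alpha`, `gamma`, `epsilon`) are all hypothetical. The obstacles are static and there is only one agent, so this illustrates the update rule, not the multi-agent dynamic setting studied in the paper.

```python
import random

# Hypothetical 5x5 grid: the agent starts at (0, 0) and must reach GOAL
# without entering an obstacle cell. Layout and rewards are illustrative only.
SIZE = 5
START = (0, 0)
GOAL = (4, 4)
OBSTACLES = {(1, 1), (2, 3), (3, 1)}
ACTIONS = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # right, down, left, up


def step(state, action):
    """Apply one move and return (next_state, reward, done)."""
    r, c = state[0] + action[0], state[1] + action[1]
    if not (0 <= r < SIZE and 0 <= c < SIZE):
        return state, -1.0, False        # hit the boundary: stay in place
    if (r, c) in OBSTACLES:
        return state, -10.0, False       # collision penalty: stay in place
    if (r, c) == GOAL:
        return (r, c), 10.0, True
    return (r, c), -0.1, False           # small cost per step favours short paths


def q_learning(episodes=2000, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = {}  # (state, action index) -> estimated return, default 0.0

    def greedy(s):
        return max(range(len(ACTIONS)), key=lambda a: q.get((s, a), 0.0))

    for _ in range(episodes):
        s = START
        for _ in range(100):  # cap the episode length
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = greedy(s)
            s_next, reward, done = step(s, ACTIONS[a])
            best_next = 0.0 if done else max(
                q.get((s_next, b), 0.0) for b in range(len(ACTIONS)))
            # Off-policy TD update toward r + gamma * max_b Q(s', b)
            old = q.get((s, a), 0.0)
            q[(s, a)] = old + alpha * (reward + gamma * best_next - old)
            s = s_next
            if done:
                break
    return q, greedy


def rollout(greedy, max_steps=50):
    """Follow the learned greedy policy from START and return the visited cells."""
    s, path = START, [START]
    for _ in range(max_steps):
        s, _, done = step(s, ACTIONS[greedy(s)])
        path.append(s)
        if done:
            break
    return path
```

Usage: `q, greedy = q_learning()` followed by `rollout(greedy)` returns a collision-free path from the start to the goal. Extending this to several agents would mean treating the other agents' positions as part of the state, which quickly enlarges the table.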

Design and Motion Control of a Two-Wheel Inverted Pendulum Robot

A two-wheel inverted pendulum robot (TWIPR) with two hub DC motors is designed for human riding and motion control evaluation. Accelerometer and gyroscope sensors are used to measure the tilt angle and angular velocity of the robot. The robot's position and velocity are estimated from the Hall sensors built into the DC motors. The control kernel of this electric mobile robot is an embedded Arduino Nano microprocessor, and a handlebar serves as the steering mechanism. The intelligent model-free fuzzy sliding mode control (FSMC) is employed as the main control algorithm for the robot's motion, with adjustments for different control purposes. Intelligent controllers are designed for balance control and moving-speed control under different operating conditions, and the control performance is evaluated based on experimental results.
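Estimating position and velocity from the motors' Hall sensors amounts to counting commutation pulses and differentiating over the sample period. The sketch below assumes a hypothetical pulse count per wheel revolution and a hypothetical wheel radius; the paper does not give these values or its exact estimator.

```python
import math

# Hypothetical hub-motor parameters (not from the paper).
PULSES_PER_REV = 90      # Hall-sensor transitions per wheel revolution
WHEEL_RADIUS_M = 0.10    # wheel radius in metres


def wheel_position(pulse_count: int) -> float:
    """Distance travelled (m) for a signed Hall-pulse count."""
    return 2.0 * math.pi * WHEEL_RADIUS_M * pulse_count / PULSES_PER_REV


def wheel_velocity(prev_count: int, count: int, dt: float) -> float:
    """Backward-difference velocity estimate (m/s) over one sample period dt (s)."""
    return (wheel_position(count) - wheel_position(prev_count)) / dt


def robot_state(left, right, prev_left, prev_right, dt):
    """Average both wheels to get the forward position and speed of the robot centre."""
    pos = 0.5 * (wheel_position(left) + wheel_position(right))
    vel = 0.5 * (wheel_velocity(prev_left, left, dt)
                 + wheel_velocity(prev_right, right, dt))
    return pos, vel
```

A plain backward difference is noisy at low speed because few pulses arrive per sample; low-pass filtering or timing the interval between pulses are common refinements.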

Stackelberg Security Game for Optimizing Security of Federated Internet of Things Platform Instances

This paper presents an approach for making optimal cyber security decisions to protect instances of a federated Internet of Things (IoT) platform in the cloud. The presented solution implements the repeated Stackelberg Security Game (SSG) and a model called Stochastic Human behaviour model with AttRactiveness and Probability weighting (SHARP). SHARP employs Subjective Utility Quantal Response (SUQR) to formulate a subjective utility function based on the evaluation of alternative solutions during decision-making. We augment the repeated SSG (including SHARP and SUQR) with a reinforcement learning algorithm called Naïve Q-Learning. Naïve Q-Learning belongs to the category of active, model-free machine learning (ML) techniques in which the agent (either the defender or the attacker) attempts to find an optimal security solution. In this way, we combine game theory (GT) and ML algorithms to discover optimal cyber security policies. The proposed security optimization components will be validated in a collaborative cloud platform based on the Industrial Internet Reference Architecture (IIRA) and its recently published security model.
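SUQR is commonly written as a softmax over a linear subjective utility of the defender's coverage probability x_i and the attacker's reward and penalty at each target. The sketch below assumes that standard form; the weights are illustrative placeholders rather than fitted values, and SHARP's attractiveness and probability-weighting extensions, as well as the Naïve Q-Learning layer, are not modeled.

```python
import math
from typing import List, Sequence

# Illustrative SUQR weights (w1 on coverage, w2 on attacker reward,
# w3 on attacker penalty). Real values are fitted to observed attacker data.
DEFAULT_WEIGHTS = (-9.0, 0.5, 0.3)


def suqr_attack_probabilities(coverage: Sequence[float],
                              att_reward: Sequence[float],
                              att_penalty: Sequence[float],
                              weights=DEFAULT_WEIGHTS) -> List[float]:
    """Probability that a boundedly rational attacker targets each item."""
    w1, w2, w3 = weights
    utilities = [w1 * x + w2 * r + w3 * p
                 for x, r, p in zip(coverage, att_reward, att_penalty)]
    m = max(utilities)                    # subtract max for numerical stability
    exps = [math.exp(u - m) for u in utilities]
    z = sum(exps)
    return [e / z for e in exps]


def defender_expected_utility(coverage: Sequence[float], q: Sequence[float],
                              def_reward: Sequence[float],
                              def_penalty: Sequence[float]) -> float:
    """Defender payoff when target i is attacked with probability q[i]."""
    return sum(qi * (x * r + (1.0 - x) * p)
               for qi, x, r, p in zip(q, coverage, def_reward, def_penalty))
```

A defender optimizing against this model would search over coverage vectors to maximize `defender_expected_utility`, with `q` recomputed from the coverage each time.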

Balancing and Synchronization Control of a Two Wheel Inverted Pendulum Vehicle

A two-wheel inverted pendulum (TWIP) vehicle is built with two hub DC motors for motion control evaluation. An Arduino Nano microprocessor is chosen as the control kernel of this electric test plant. Accelerometer and gyroscope sensors are built in to measure the tilt angle and angular velocity of the vehicle. Because the TWIP has a significant hub-motor dead zone and nonlinear system dynamics, it is difficult to control with traditional model-based controllers. The intelligent model-free fuzzy sliding mode controller (FSMC) is therefore employed as the main control algorithm, and intelligent controllers are designed for TWIP balance control and for synchronizing the two wheels.
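The abstract does not say how the accelerometer and gyroscope readings are fused into a tilt angle. A complementary filter is one common choice on inverted pendulum platforms, so the sketch below uses it as an assumed stand-in; the blend factor, sample period, and axis convention are hypothetical.

```python
import math

ALPHA = 0.98   # hypothetical blend: trust the gyro short-term, the accelerometer long-term
DT = 0.01      # hypothetical 100 Hz sample period (s)


def accel_tilt(ax: float, az: float) -> float:
    """Tilt (rad) from the measured gravity direction; assumes pitch about the y-axis."""
    return math.atan2(ax, az)


def complementary_filter(theta_prev: float, gyro_rate: float,
                         ax: float, az: float,
                         alpha: float = ALPHA, dt: float = DT) -> float:
    """One filter step: integrate the gyro rate, then pull toward the accelerometer angle."""
    gyro_estimate = theta_prev + gyro_rate * dt
    return alpha * gyro_estimate + (1.0 - alpha) * accel_tilt(ax, az)
```

The gyro term tracks fast motion without the accelerometer's vibration noise, while the small accelerometer weight removes the drift that pure gyro integration would accumulate.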

Optimizing Dialogue Strategy Learning Using Learning Automata

Modeling dialogue management with statistical methods in the design of spoken dialogue systems is a growing research area. This paper presents an adaptive learning approach for optimizing dialogue strategy. At the core of our system is a method that formalizes dialogue management as sequential decision-making under uncertainty, whose underlying probabilistic structure is a Markov chain. Researchers have mostly focused on model-free machine learning algorithms, such as reinforcement learning, for automating the design of dialogue management. However, model-free algorithms face a dilemma in balancing exploration against exploitation. Hence we present a model-based online policy learning algorithm that uses interconnected learning automata to optimize dialogue strategy. The proposed algorithm is capable of deriving an optimal policy that prescribes which action to take in each conversation state so as to maximize the expected total reward for attaining the goal, and it balances exploration and exploitation in its updates to improve the naturalness of human-computer interaction. We test the proposed approach with the PARADISE evaluation framework on a system for accessing railway information.
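A learning automaton keeps a probability vector over actions and shifts it according to reinforcement signals. The sketch below assumes the classic linear reward-inaction (L_R-I) scheme with one automaton per dialogue state, which is one simple way to interconnect them; the state and action names, the step size, and the episode-level success signal are hypothetical, and the paper's actual update rule may differ.

```python
import random
from typing import Dict, List, Tuple


class LearningAutomaton:
    """Variable-structure automaton with a linear reward-inaction (L_R-I) update."""

    def __init__(self, n_actions: int, step: float = 0.1):
        self.p: List[float] = [1.0 / n_actions] * n_actions
        self.step = step

    def choose(self, rng: random.Random) -> int:
        """Sample an action index according to the current probabilities."""
        return rng.choices(range(len(self.p)), weights=self.p)[0]

    def update(self, action: int, rewarded: bool) -> None:
        if not rewarded:
            return  # "inaction": probabilities are left unchanged on penalty
        for i in range(len(self.p)):
            if i == action:
                self.p[i] += self.step * (1.0 - self.p[i])
            else:
                self.p[i] *= 1.0 - self.step  # keeps the vector summing to 1


# Hypothetical dialogue states and system actions: one automaton per state.
STATES = ["greet", "ask_origin", "ask_destination", "confirm"]
SYSTEM_ACTIONS = ["open_question", "directed_question",
                  "explicit_confirm", "implicit_confirm"]
policy: Dict[str, LearningAutomaton] = {
    s: LearningAutomaton(len(SYSTEM_ACTIONS)) for s in STATES}


def reinforce_episode(trajectory: List[Tuple[str, int]], success: bool) -> None:
    """Propagate a dialogue-level success signal to every visited automaton."""
    for state, action in trajectory:
        policy[state].update(action, success)
```

Because each automaton only moves probability mass after a reward, exploration fades gradually as the policy becomes confident, rather than being set by a separate exploration parameter.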

A Model-Free Robust Control Approach for Robot Manipulator

A model-free robust control (MFRC) approach is proposed for position control of robot manipulators in the state space. The control approach is verified analytically to be robust to uncertainties including external disturbances, unmodeled dynamics, and parametric uncertainties, and it can guarantee the robustness of the control system. The approach is flexible enough to be applied to different systems, including their actuators. A PUMA 560 robot driven by geared permanent-magnet DC motors is simulated, and the simulation results show satisfactory control performance under the given technical specifications.
Keywords: Model-free, robust control, position control, PUMA 560.

Robotic End-Effector Impedance Control without Expensive Torque/Force Sensor

A novel low-cost impedance control structure is proposed for monitoring the contact force between the end-effector and the environment without installing an expensive force/torque sensor. Theoretically, the end-effector contact force can be estimated from the superposition of the individual joint control torques, since there is a nonlinear matrix mapping between the joint motor control inputs and the end-effector force/torque vector. The new force control structure is implemented based on this estimated mapping matrix. First, the robot end-effector is moved to specified positions; then the force controller is actuated based on the Hall-sensor current feedback of each joint motor. The model-free fuzzy sliding mode control (FSMC) strategy is employed to design both the position and the force controllers. All hardware circuits and software control programs are implemented on an Altera Nios II embedded development kit, forming an embedded system for a retrofitted Mitsubishi 5-DOF robot. Experimental results show that both PI and FSMC force control algorithms achieve reasonable contact force monitoring with this hardware control structure.
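In the static case, the configuration-dependent mapping between joint torques and the end-effector force is the Jacobian-transpose relation τ = Jᵀ F. The sketch below assumes a hypothetical two-link planar arm (the paper's robot has 5 DOF) and a hypothetical torque constant and gear ratio for converting motor current to torque. It also ignores gravity and friction, which a real estimator would have to compensate.

```python
import math
from typing import Tuple

LINK1, LINK2 = 0.30, 0.25   # hypothetical link lengths (m)


def jacobian(q1: float, q2: float, l1: float = LINK1, l2: float = LINK2):
    """Planar 2-link Jacobian mapping joint rates to tip velocity (x, y)."""
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    return ((-l1 * s1 - l2 * s12, -l2 * s12),
            (l1 * c1 + l2 * c12, l2 * c12))


def torque_from_current(current: float, kt: float = 0.05,
                        gear_ratio: float = 100.0) -> float:
    """Joint torque (N*m) from motor current (A); kt and gear_ratio are hypothetical."""
    return kt * gear_ratio * current


def estimate_tip_force(q1: float, q2: float, tau1: float, tau2: float,
                       l1: float = LINK1, l2: float = LINK2) -> Tuple[float, float]:
    """Solve tau = J^T F for F = (Fx, Fy) by Cramer's rule (static, no gravity/friction)."""
    (a, b), (c, d) = jacobian(q1, q2, l1, l2)
    # J^T = [[a, c], [b, d]]
    det = a * d - b * c
    if abs(det) < 1e-9:
        raise ValueError("singular configuration: tip force is not observable")
    fx = (tau1 * d - c * tau2) / det
    fy = (a * tau2 - b * tau1) / det
    return fx, fy
```

Near singular poses (here, when the elbow is fully stretched or folded) the determinant approaches zero and small torque errors produce large force errors, which is one practical limit of sensorless force estimation.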

Gain Tuning Fuzzy Controller for an Optical Disk Drive

As the speed and control accuracy of commercial optical disk drives increase significantly, an efficient controller is needed to govern the track seeking and track following operations of the servo system and achieve the desired data-reading response. The nonlinear behavior of the actuator and servo system of the optical disk drive affects laser spot positioning. Here, a model-free fuzzy control scheme is employed to design the track seeking servo controller for a DC-motor-driven optical disk drive. In addition, the sliding mode control strategy is introduced into the fuzzy control structure to construct a 1-D adaptive fuzzy rule intelligent controller, which simplifies implementation and improves control performance. Experimental results show that the steady-state track seeking error with this fuzzy controller remains within the track width (1.6 μm), so the controller can be used for both track seeking and track following servo control.
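The 1-D fuzzy sliding mode idea collapses the usual two-input (error, error rate) rule table onto the single sliding variable s = ė + λe, so only one row of rules is needed. The sketch below is a non-adaptive version with hypothetical gains, triangular membership functions, and rule consequents; the paper's controller additionally adapts its rules.

```python
# Hypothetical tuning values (not from the paper).
LAMBDA = 20.0   # sliding-surface slope
PHI = 1.0       # normalisation width for s (acts like a boundary layer)
U_MAX = 5.0     # actuator limit, e.g. volts

# Five fuzzy sets on normalised s: NB, NS, ZE, PS, PB.
CENTERS = [-1.0, -0.5, 0.0, 0.5, 1.0]
# Rule consequents (singletons): softer action near s = 0, full action at the ends.
OUTPUTS = [-1.0, -0.3, 0.0, 0.3, 1.0]


def tri(x: float, center: float, half_width: float = 0.5) -> float:
    """Triangular membership degree of x in the set centred at `center`."""
    return max(0.0, 1.0 - abs(x - center) / half_width)


def fuzzy_sliding_control(error: float, d_error: float) -> float:
    """Control effort from a 1-D rule base on s; error = reference - output."""
    s = d_error + LAMBDA * error
    s_n = max(-1.0, min(1.0, s / PHI))          # saturate to the rule universe
    weights = [tri(s_n, c) for c in CENTERS]
    total = sum(weights)                         # > 0: neighbouring sets overlap
    u_n = sum(w * o for w, o in zip(weights, OUTPUTS)) / total
    return U_MAX * u_n                           # weighted-average defuzzification
```

Reducing the rule base to one dimension is what keeps the implementation cheap: the number of rules grows linearly with the number of fuzzy sets instead of quadratically.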

Adaptive PID Controller based on Reinforcement Learning for Wind Turbine Control

A self-tuning PID control strategy using reinforcement learning is proposed in this paper for the control of wind energy conversion systems (WECS). Actor-Critic learning is used to tune the PID parameters adaptively, effectively exploiting the model-free and online learning properties of reinforcement learning. To reduce storage requirements and improve learning efficiency, a single RBF neural network is used to approximate both the policy function of the Actor and the value function of the Critic. The inputs of the RBF network are the system error and its first- and second-order differences. The Actor maps the system state to PID parameters, while the Critic evaluates the Actor's outputs and produces the temporal-difference (TD) error. Based on the TD error performance index and the gradient descent method, update rules for the RBF kernel functions and network weights are derived. Simulation results show that the proposed controller is effective for WECS, adapts well, and is robust, outperforming a conventional PID controller.
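Because the RBF inputs are e, Δe, and Δ²e, the tuned gains fit naturally into the incremental (velocity-form) PID law Δu = Kp·Δe + Ki·e + Kd·Δ²e. The sketch below assumes that PID form, a Gaussian RBF layer shared by the actor and critic, and a simple TD-driven update in which the caller perturbs the gains for exploration. The learning rates, kernel width, and base gains are hypothetical, and only the output weights are updated (the paper also adapts the RBF kernels).

```python
import math
from typing import List, Sequence


def rbf_features(x: Sequence[float], centers: Sequence[Sequence[float]],
                 width: float) -> List[float]:
    """Gaussian hidden layer shared by the actor and the critic."""
    return [math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (2.0 * width ** 2))
            for c in centers]


def pid_increment(gains: Sequence[float], e: float, e1: float, e2: float) -> float:
    """Velocity-form PID: du = Kp*(e - e1) + Ki*e + Kd*(e - 2*e1 + e2)."""
    kp, ki, kd = gains
    return kp * (e - e1) + ki * e + kd * (e - 2.0 * e1 + e2)


class RBFActorCritic:
    def __init__(self, centers, width, base_gains=(1.0, 0.1, 0.01),
                 alpha_actor=0.01, alpha_critic=0.05, gamma=0.9):
        self.centers, self.width = centers, width
        self.base = list(base_gains)                    # gains when weights are zero
        n = len(centers)
        self.actor = [[0.0] * n for _ in base_gains]    # one weight row per gain
        self.critic = [0.0] * n                         # value-function weights
        self.alpha_a, self.alpha_c, self.gamma = alpha_actor, alpha_critic, gamma

    def features(self, e: float, e1: float, e2: float) -> List[float]:
        # Network inputs: error, first difference, second difference.
        return rbf_features([e, e - e1, e - 2.0 * e1 + e2], self.centers, self.width)

    def gains(self, phi: Sequence[float]) -> List[float]:
        return [b + sum(w * f for w, f in zip(row, phi))
                for b, row in zip(self.base, self.actor)]

    def value(self, phi: Sequence[float]) -> float:
        return sum(w * f for w, f in zip(self.critic, phi))

    def update(self, phi, applied_gains, reward, phi_next) -> float:
        """One TD step; applied_gains are the (possibly perturbed) gains actually used."""
        delta = reward + self.gamma * self.value(phi_next) - self.value(phi)
        mean_gains = self.gains(phi)
        for m, f in enumerate(phi):
            self.critic[m] += self.alpha_c * delta * f
        for j, row in enumerate(self.actor):
            for m, f in enumerate(phi):
                # Move the actor toward perturbations that produced a positive TD error.
                row[m] += self.alpha_a * delta * (applied_gains[j] - mean_gains[j]) * f
        return delta
```

Sharing one hidden layer means a single feature vector per step serves both the gain mapping and the value estimate, which is the storage saving the abstract refers to.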

Model-free Prediction based on Tracking Theory and Newton Form of Polynomial

Most existing predictors for time series are model-dependent and therefore require prior knowledge of complex systems, usually obtained through system identification, extensive training, or online adaptation in the case of time-varying systems. Moreover, since time series are often generated by complex processes such as the stock market or other chaotic systems, identification, modeling, or online parameter updating can be problematic. In this paper, a model-free predictor (MFP) for a time series produced by an unknown nonlinear system or process is derived using tracking theory. An alternative derivation that yields the same MFP, based on a property of the Newton form of the interpolating polynomial, is also presented. The MFP accurately predicts future values of a time series, is stable, has few tuning parameters, and is well suited to engineering applications owing to its simplicity, fast prediction speed, and extremely low computational load. The performance of the proposed MFP is demonstrated by predicting the Dow Jones Industrial Average stock index.
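The Newton-form view can be illustrated by extrapolating the interpolating polynomial through the most recent samples one step ahead. For equally spaced data, evaluating the Newton backward-difference form one step past the newest sample makes every coefficient equal to 1, so the prediction is the sum of the newest backward differences. The sketch below assumes that reading, with the polynomial order as the tuning parameter; it is not necessarily the exact predictor derived in the paper.

```python
from typing import Sequence


def newton_extrapolate(history: Sequence[float], order: int) -> float:
    """Predict the next sample from the last order+1 equally spaced samples.

    Uses x[k+1] = sum_{i=0..order} (backward difference of degree i at x[k]),
    i.e. the degree-`order` interpolating polynomial evaluated one step ahead.
    """
    if order < 0 or len(history) < order + 1:
        raise ValueError("need at least order + 1 samples")
    diffs = list(history[-(order + 1):])
    prediction = 0.0
    for _ in range(order + 1):
        prediction += diffs[-1]                         # newest difference of this degree
        diffs = [b - a for a, b in zip(diffs, diffs[1:])]  # next-higher difference
    return prediction
```

For example, `order=1` gives the linear extrapolation 2·x[k] − x[k−1], and `order=2` gives 3·x[k] − 3·x[k−1] + x[k−2]; each prediction costs only a few additions, consistent with the low computational load claimed above.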