Proto-value function
Proto-value functions (PVFs) provide a novel framework for solving the credit assignment problem. The framework introduces a novel approach to solving Markov decision processes (MDPs) and reinforcement learning problems, using multiscale spectral and manifold learning methods. Proto-value functions were first introduced in the context of reinforcement learning by Sridhar Mahadevan in his paper, Proto-Value Functions: Developmental Reinforcement Learning, at ICML 2005.[1]

Value function approximation is a critical component of solving Markov decision processes (MDPs) defined over a continuous state space. A good function approximator allows a reinforcement learning (RL) agent to accurately represent the value of any state it has experienced, without explicitly storing its value. However, the parameters associated with these basis functions often require significant domain-specific hand-engineering.[2] Proto-value functions attempt to eliminate this hand-engineering by accounting for the underlying manifold structure of the problem domain. Previous approaches to this nonlinearity problem lacked a broad theoretical framework, and consequently had only been explored in the context of discrete MDPs.[3]

This approach constructs the basis functions by spectral analysis of the graph Laplacian, a self-adjoint (or symmetric) operator on the space of functions on the graph, closely related to the random walk operator. For the sake of simplicity, assume that the underlying state space can be represented as an undirected unweighted graph G = (V, E).[1] If A is the adjacency matrix of G and D is the diagonal degree matrix, the combinatorial graph Laplacian is L = D − A. The spectral analysis of the Laplace operator on a graph consists of finding the eigenvalues λ and eigenfunctions φ which solve the equation

Lφ = λφ,

where L is the combinatorial Laplacian and φ is an eigenfunction associated with the eigenvalue λ. Here the term "eigenfunction" is used to denote what is traditionally referred to as an eigenvector in linear algebra, because the Laplacian eigenvectors can naturally be viewed as functions that map each vertex to a real number.

For a discrete state space, the graph can be built directly from experience: an edge connects two states whenever the agent can transition between them in a single step, so that A(i, j) = 1 if states i and j are adjacent and 0 otherwise. It is important to note that this can only be done when the state space is finite and of reasonable
size. When the state space is continuous or very large, there are a few additional issues to consider, such as how to sample representative states from the underlying manifold and how to extend the resulting eigenfunctions to states that were not part of the graph.[4]

Once the PVFs are generated, they can be plugged into a traditional function approximation framework. Let Φ be the matrix whose columns are the K proto-value functions, with one row per state. Each entry of the Gram matrix is then

(Φ^T Φ)(i, j) = Σ_s φ_i(s) φ_j(s).

The coefficients w that minimize the least-squares error of the approximation V̂ = Φw are then described by the equation

w = (Φ^T Φ)^{−1} Φ^T V.

A nonlinear least-squares approach is possible by using only the k PVFs with the largest absolute coefficients to compute the approximation.
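The construction of proto-value functions from the graph Laplacian can be sketched in a short script. This is a minimal illustration, not code from the original paper: the 5×5 grid world and the function names are assumptions chosen for concreteness.

```python
import numpy as np

def grid_adjacency(rows, cols):
    """Adjacency matrix of an undirected, unweighted grid graph.

    Vertices are states; an edge connects two states the agent can
    move between in one step (4-neighbour moves on the grid).
    """
    n = rows * cols
    A = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if r + 1 < rows:                 # neighbour below
                A[i, i + cols] = A[i + cols, i] = 1
            if c + 1 < cols:                 # neighbour to the right
                A[i, i + 1] = A[i + 1, i] = 1
    return A

def proto_value_functions(A, k):
    """Return the k smoothest eigenvectors of the combinatorial
    Laplacian L = D - A, ordered by increasing eigenvalue."""
    D = np.diag(A.sum(axis=1))               # diagonal degree matrix
    L = D - A                                # combinatorial graph Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)     # L is symmetric, so eigh applies
    return eigvals[:k], eigvecs[:, :k]

A = grid_adjacency(5, 5)
lams, pvfs = proto_value_functions(A, 4)
# For a connected graph the smallest Laplacian eigenvalue is 0 and its
# eigenfunction is constant across all states.
print(pvfs.shape)   # (25, 4)
```

Each column of `pvfs` assigns a real number to every state, matching the view of Laplacian eigenvectors as functions on the vertices.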
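The least-squares step is ordinary linear regression onto the PVF basis. The sketch below is illustrative: the 20-state chain graph and the smooth cosine target are assumptions, not examples from the paper. It computes the Gram matrix, solves for the coefficients, and then applies the nonlinear variant that keeps only the k PVFs with the largest absolute coefficients.

```python
import numpy as np

n, K = 20, 5                               # number of states, number of PVFs

# Combinatorial Laplacian L = D - A of a simple chain of states.
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1
L = np.diag(A.sum(axis=1)) - A

# PVFs: the K smoothest Laplacian eigenvectors, as columns of Phi.
_, vecs = np.linalg.eigh(L)
Phi = vecs[:, :K]

# A smooth target "value function" to approximate (illustrative).
V = np.cos(np.linspace(0.0, np.pi, n))

# Gram matrix G[i, j] = sum over states s of phi_i(s) * phi_j(s),
# and least-squares coefficients w = G^{-1} Phi^T V.
G = Phi.T @ Phi
w = np.linalg.solve(G, Phi.T @ V)
V_hat = Phi @ w                            # linear least-squares approximation

# Nonlinear variant: keep only the k PVFs with largest |w|.
k = 3
keep = np.argsort(np.abs(w))[-k:]
V_hat_k = Phi[:, keep] @ w[keep]
```

Because the eigenvectors returned by `eigh` are orthonormal, the Gram matrix here is the identity and the solve reduces to a projection; the explicit inverse-Gram form matters when the basis functions are not orthonormal, for example when they are evaluated only on a sampled subset of states.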