A systematic approach for acceleration of matrix-vector operations in CGRA through algorithm-architecture co-design
Date Issued
2019-05-09
Author(s)
Merchant, Farhad
Vatwani, Tarun
Chattopadhyay, Anupam
Raha, Soumyendu
Nandy, S. K.
Narayan, Ranjani
Leupers, Rainer
DOI
10.1109/VLSID.2019.00030
Abstract
Matrix-vector operations play a pivotal role in engineering and scientific applications ranging from machine learning to computational finance. These operations have a time complexity of O(n²) and are challenging to accelerate because they are memory-bound: the ratio of arithmetic operations to data movement is O(1). In this paper, we present a systematic methodology of algorithm-architecture co-design to accelerate matrix-vector operations, with emphasis on matrix-vector multiplication (gemv) and vector transpose-matrix multiplication (vtm). In our methodology, we perform a detailed analysis of the directed acyclic graphs of the routines and identify macro operations that can be realized on a reconfigurable data-path tightly coupled to the pipeline of a processing element (PE). It is shown that the PE clearly outperforms state-of-the-art realizations of gemv and vtm, attaining a 135% performance improvement over multicore processors and 200% over general-purpose graphics processing units. In the parallel realization on the REDEFINE coarse-grained reconfigurable architecture, the solution is shown to be scalable.
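The abstract's point about memory-boundedness can be seen from a naive sketch of the two kernels (this is an illustrative example, not code from the paper): for an n×n matrix, both gemv and vtm perform O(n²) multiply-adds while touching O(n²) matrix elements, so the arithmetic-to-data-movement ratio is O(1).

```python
def gemv(A, x):
    """Matrix-vector multiplication: y = A @ x for an n x n matrix A.
    Performs n^2 multiply-adds over n^2 matrix elements (ratio O(1))."""
    n = len(A)
    return [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]

def vtm(x, A):
    """Vector transpose-matrix multiplication: y = x^T @ A.
    Same O(n^2) work and O(n^2) data traffic as gemv, but the matrix
    is traversed column-wise, stressing the memory system differently."""
    n = len(A)
    return [sum(x[i] * A[i][j] for i in range(n)) for j in range(n)]

A = [[1.0, 2.0],
     [3.0, 4.0]]
x = [1.0, 1.0]
print(gemv(A, x))  # [3.0, 7.0]
print(vtm(x, A))   # [4.0, 6.0]
```

Because each matrix element is used exactly once, neither kernel benefits from the data reuse that makes O(n³) operations such as matrix-matrix multiplication compute-bound, which is why the paper targets them with a reconfigurable data-path rather than caching alone.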