
A systematic approach for acceleration of matrix-vector operations in CGRA through algorithm-architecture co-design

Date Issued
2019-05-09
Author(s)
Merchant, Farhad
Vatwani, Tarun
Chattopadhyay, Anupam
Raha, Soumyendu
Nandy, S. K.
Narayan, Ranjani
Leupers, Rainer
DOI
10.1109/VLSID.2019.00030
Abstract
Matrix-vector operations play a pivotal role in engineering and scientific applications ranging from machine learning to computational finance. These operations have time complexity O(n²) and are challenging to accelerate because they are memory bound: the ratio of arithmetic operations to data movement is O(1). In this paper, we present a systematic algorithm-architecture co-design methodology to accelerate matrix-vector operations, focusing on matrix-vector multiplication (gemv) and vector transpose-matrix multiplication (vtm). In our methodology, we perform a detailed analysis of the directed acyclic graphs of the routines and identify macro operations that can be realized on a reconfigurable data-path tightly coupled to the pipeline of a processing element (PE). The PE is shown to clearly outperform state-of-the-art realizations of gemv and vtm, attaining a 135% performance improvement over multicore and 200% over general-purpose graphics processing units. In a parallel realization on the REDEFINE coarse-grained reconfigurable architecture, the solution is shown to be scalable.
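
To make the memory-bound argument concrete, the sketch below is a plain C reference implementation of gemv (y = A x) and vtm (yᵀ = xᵀ A), not the co-designed data-path described in the paper; the signatures and names are illustrative only. Each routine performs roughly 2n² floating-point operations while touching about n² + 2n words of data, so the ratio of arithmetic to data movement stays O(1).

#include <stddef.h>

/* Reference gemv: y = A * x, with A an n-by-n row-major matrix.
 * About 2*n*n flops against roughly n*n + 2*n words of data moved,
 * i.e. an arithmetic-to-traffic ratio of O(1) -- the memory-bound
 * behaviour the abstract refers to. */
void gemv(size_t n, const double *A, const double *x, double *y)
{
    for (size_t i = 0; i < n; ++i) {
        double acc = 0.0;
        for (size_t j = 0; j < n; ++j)
            acc += A[i * n + j] * x[j];  /* one multiply-add per element of A loaded */
        y[i] = acc;
    }
}

/* Reference vtm: y^T = x^T * A, i.e. y[j] = sum over i of x[i] * A[i][j].
 * Same O(n^2) flop count and O(1) arithmetic intensity as gemv,
 * but the matrix is traversed column-wise. */
void vtm(size_t n, const double *A, const double *x, double *y)
{
    for (size_t j = 0; j < n; ++j) {
        double acc = 0.0;
        for (size_t i = 0; i < n; ++i)
            acc += x[i] * A[i * n + j];
        y[j] = acc;
    }
}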
Subjects
  • dense linear algebra
  • Instruction level parallelism
  • Matrix-vector operations
  • Scalability