Sitemap
A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.
Pages
Posts
CompArch Research Group Database
Published:
This Notion database is a compilation of computer architecture research groups from around the world, sortable by university affiliation and research sub-domains. If you’d like to contribute towards additions, corrections, or updates, please feel free to reach out via email.
portfolio
Portfolio item number 1
Published:
Short description of portfolio item number 1
Portfolio item number 2
Published:
Short description of portfolio item number 2
publications
Optimizing RGB to Grayscale, Gaussian Blur and Sobel-Filter operations on FPGAs for reduced dynamic power consumption
Published in 3rd IEEE conference on AIIoT, 2024
Abstract: The conversion of pixels from their RGB to Grayscale formats is a crucial first step in numerous Image Pre-Processing, Computer Vision, and, as highlighted here, edge detection modules. This paper presents an implementation of the Shift-Add Multiplication algorithm for efficient constant multiplications of the NTSC formula weights for RGB to Grayscale conversion on FPGAs. The proposed module is designed to be reconfigurable to both fixed-point and floating-point formats, providing flexibility in precision and resource utilization based on application requirements. Additionally, a Python script was developed to automate the generation of Verilog code for fractional constant multiplications, as proposed in this study. Pipelined modules for Gaussian Blur and the Sobel-Filter were also designed to enable the development of a complete real-time edge detection system on FPGAs. The findings reveal that Shift-Add algorithm-based multipliers significantly reduce dynamic power consumption compared to the built-in DSP blocks on FPGA boards when performing constant multiplications for RGB to Grayscale conversion.
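The shift-add idea in the abstract above can be illustrated with a minimal Python sketch. It assumes the standard NTSC luma weights (0.299, 0.587, 0.114) and an 8-bit fractional fixed-point precision; the precision, the function names, and the structure are all hypothetical choices made for illustration and are not taken from the paper or its generated Verilog.

```python
# Standard NTSC luma weights. FRAC_BITS is an assumed precision, not from the paper.
NTSC_WEIGHTS = (0.299, 0.587, 0.114)
FRAC_BITS = 8


def set_bits(weight, frac_bits=FRAC_BITS):
    """Bit positions set in the fixed-point form of a fractional constant."""
    scaled = round(weight * (1 << frac_bits))
    return [b for b in range(scaled.bit_length()) if (scaled >> b) & 1]


def shift_add_product(value, bits):
    """Multiply value by the scaled constant using only shifts and adds."""
    acc = 0
    for b in bits:
        acc += value << b  # one shifted copy of the input per set bit
    return acc


def rgb_to_gray(r, g, b, frac_bits=FRAC_BITS):
    """Grayscale value from 8-bit RGB, truncating once after accumulation."""
    total = 0
    for channel, weight in zip((r, g, b), NTSC_WEIGHTS):
        total += shift_add_product(channel, set_bits(weight, frac_bits))
    return total >> frac_bits  # drop the fractional bits


if __name__ == "__main__":
    print(set_bits(0.299))
    print(rgb_to_gray(255, 255, 255))
```

With 8 fractional bits the three scaled weights are 77, 150, and 29, which sum to 256, so a pure white pixel maps back to 255. In hardware each set bit would correspond to one shifted operand feeding an adder, which is why fewer set bits generally mean fewer adders.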
A Configurable Mixed-Precision Fused Dot Product Unit for GPGPU Tensor Computation
Published in Vortex Workshop, MICRO '58, 2025
Abstract: There has been increasing interest in developing and accelerating mixed-precision Matrix-Multiply-Accumulate operations in GPGPUs for Deep Learning workloads. However, existing open-source RTL implementations of inner dot product units rely on discrete arithmetic units, leading to suboptimal throughput and poor resource utilization. To address these challenges, we propose a scalable mixed-precision dot product unit that integrates floating-point and integer arithmetic pipelines within a single fused architecture, implemented as part of the open-source RISC-V based Vortex GPGPU’s Tensor Core Unit extension. Our design supports low-precision multiplication in FP16/BF16/FP8/BF8/INT8/UINT4 formats and higher-precision accumulation in FP32/INT32, with an extensible framework for adding and evaluating other custom representations in the future. Experimental results demonstrate 4-cycle operation latency at 362.2 MHz clock frequency on the AMD Xilinx Alveo U55C FPGA, delivering an ideal filled-pipeline throughput of 5.795 GFLOPS in a 4-thread-per-warp configuration.
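As a quick sanity check on the figures quoted above, the short Python sketch below relates the reported clock frequency to the reported throughput. The only inputs are the two numbers from the abstract; the conclusion that they correspond to roughly 16 floating-point operations per cycle is an inference from the arithmetic, and how those operations are distributed across threads and lanes is not stated here.

```python
# Figures quoted in the abstract.
CLOCK_HZ = 362.2e6
REPORTED_GFLOPS = 5.795

# Implied operations per cycle when the pipeline is full (inferred, not stated).
flops_per_cycle = REPORTED_GFLOPS * 1e9 / CLOCK_HZ

# Peak throughput if exactly that many whole operations retire every cycle.
peak_gflops = round(flops_per_cycle) * CLOCK_HZ / 1e9

if __name__ == "__main__":
    print(f"implied FLOPs per cycle: {flops_per_cycle:.3f}")
    print(f"peak at whole ops/cycle: {peak_gflops:.4f} GFLOPS")
```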
talks
Talk 1 on Relevant Topic in Your Field
Published:
This is a description of your talk, which is a markdown file that can be all markdown-ified like any other post. Yay markdown!
Conference Proceeding talk 3 on Relevant Topic in Your Field
Published:
This is a description of your conference proceedings talk. Note the different field in type. You can put anything in this field.
teaching
Teaching experience 1
Undergraduate course, University 1, Department, 2014
This is a description of a teaching experience. You can use markdown like any other post.
Teaching experience 2
Workshop, University 1, Department, 2015
This is a description of a teaching experience. You can use markdown like any other post.