COE 586 Term Project: A COMPARATIVE STUDY OF DIFFERENT MULTIPLY ACCUMULATE ARCHITECTURE IMPLEMENTATIONS ON FPGA USING DISTRIBUTED ARITHMETIC AND RESIDUE ARITHMETIC
Abstract:
This work tests the theoretical benefits claimed for distributed arithmetic (DA) and the residue number system (RNS) in exploiting FPGA architecture for hardware multiply-accumulate (MAC) computation with a high degree of parallelism. The relative area and speed efficiencies of DA-based and RNS-based hardware MAC implementations on FPGA are analyzed. Although the FPGA supports efficient implementation of the components required by both architectures, the DA approach is found to be superior. DA can achieve close to the maximum clock rate possible with a given FPGA technology using only basic 4-input LUT blocks and the fast ripple-carry chains. In contrast, the multi-stage modulo adders required in the RNS implementation are slow even for small word lengths, so the accumulator stage becomes the performance bottleneck. In addition, the RNS architecture incurs a large area overhead in the reverse (residue-to-binary) converters when the Chinese Remainder Theorem (CRT) is implemented directly, and these converters are also a speed bottleneck because of the large modulo adders they require.
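For readers unfamiliar with the two schemes, the following is a minimal software sketch of the arithmetic each one performs. It is an illustration only, not the hardware design evaluated in this work. The function names `da_mac` and `rns_mac`, the moduli set (7, 8, 9), the 8-bit input width, and the restriction to unsigned operands are all assumptions made for the example. The DA version replaces multipliers with a precomputed table addressed by one bit from each input (four inputs give a 16-entry table, matching a 4-input LUT address) plus a shift-and-add accumulator. The RNS version runs an independent modular accumulator per modulus and then uses a CRT-based conversion to bring the residues back to binary.

```python
from math import prod


def da_mac(coeffs, xs, bits=8):
    """Inner product sum(c*x) computed bit-serially with a lookup table.

    Assumes unsigned inputs that fit in `bits` bits (illustrative only).
    """
    k = len(coeffs)
    # Table entry `addr` holds the sum of the coefficients whose index bit
    # is set in `addr`; with k = 4 this is a 16-entry table.
    lut = [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
           for addr in range(1 << k)]
    acc = 0
    for b in range(bits):
        # Build the table address from bit b of every input.
        addr = 0
        for i, x in enumerate(xs):
            addr |= ((x >> b) & 1) << i
        # Shift-and-add accumulation: weight the partial sum by 2**b.
        acc += lut[addr] << b
    return acc


def rns_mac(coeffs, xs, moduli=(7, 8, 9)):
    """Inner product computed in residue channels, then converted by CRT.

    Assumes pairwise coprime moduli and a non-negative result smaller
    than their product (illustrative only).
    """
    residues = []
    for m in moduli:
        acc = 0
        for c, x in zip(coeffs, xs):
            # Each channel needs a modulo multiply and a modulo add.
            acc = (acc + (c % m) * (x % m)) % m
        residues.append(acc)
    # Direct CRT: sum of r_i * M_i * (M_i^-1 mod m_i), reduced modulo M.
    big_m = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        m_i = big_m // m
        total = (total + r * m_i * pow(m_i, -1, m)) % big_m
    return total


if __name__ == "__main__":
    coeffs = [3, 1, 4, 2]
    xs = [5, 9, 2, 6]
    print(da_mac(coeffs, xs), rns_mac(coeffs, xs))
```

Note that the final CRT sum is reduced modulo the full dynamic range (504 here), which is the software counterpart of the large modulo adder referred to above, whereas the DA loop needs only table lookups and ordinary additions.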