A Power-Efficient Hardware Implementation of L-Mul

  • First FPGA-based FP8 approximate multiplier design.
  • Integration of the design into a CNN accelerator; experiments demonstrate that, among 8-bit designs, it achieves the highest accuracy and energy efficiency and the lowest latency.