A Power-Efficient Hardware Implementation of L-Mul
- First FPGA-based FP8 approximate multiplier design.
- Integrated the design into a CNN accelerator; experiments show that, among 8-bit designs, our design achieves the highest accuracy and energy efficiency and the lowest latency.
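
For readers unfamiliar with L-Mul, the following is a minimal software sketch of the approximation that the hardware targets, assuming the commonly cited formulation in which the mantissa product is replaced by a sum plus a constant offset: (1 + xm)(1 + ym) ≈ 1 + xm + ym + 2^(-l(m)). The function names (`l_offset`, `decompose`, `l_mul`), the 3-bit default mantissa (matching an E4M3-style FP8 layout), and the truncating quantization are illustrative assumptions, not the design described above. Exponent range limits, saturation, and special values are not modeled.

```python
import math


def l_offset(m_bits: int) -> int:
    """Offset exponent l(m), assumed here to be m for m <= 3, 3 for m == 4, else 4."""
    if m_bits <= 3:
        return m_bits
    if m_bits == 4:
        return 3
    return 4


def decompose(x: float, m_bits: int):
    """Split nonzero x into (sign, exponent, fraction) so that
    x ~= sign * (1 + fraction) * 2**exponent, with the fraction
    truncated to m_bits bits (an assumed rounding mode)."""
    sign = -1.0 if x < 0 else 1.0
    mant, exp = math.frexp(abs(x))   # abs(x) == mant * 2**exp, mant in [0.5, 1)
    frac = mant * 2.0 - 1.0          # renormalize to 1.f form, frac in [0, 1)
    exp -= 1
    scale = 1 << m_bits
    frac = math.floor(frac * scale) / scale
    return sign, exp, frac


def l_mul(x: float, y: float, m_bits: int = 3) -> float:
    """Approximate x * y using additions on the exponent and fraction fields."""
    if x == 0 or y == 0:
        return 0.0
    sx, ex, fx = decompose(x, m_bits)
    sy, ey, fy = decompose(y, m_bits)
    # The fraction product fx * fy is replaced by the constant 2**-l(m).
    mant = 1.0 + fx + fy + 2.0 ** -l_offset(m_bits)
    # Exponents add; a mantissa >= 2 is absorbed by ldexp as a carry.
    return sx * sy * math.ldexp(mant, ex + ey)


if __name__ == "__main__":
    print(l_mul(1.5, 1.25))   # exact product is 1.875
    print(l_mul(2.0, 3.0))    # exact product is 6.0
```

In this sketch the only arithmetic on the operand fields is addition, which is what makes the scheme attractive for a low-power multiplier; the approximation error comes entirely from substituting a constant for the fraction product.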