020 – RMS Metering with an FPGA

In this post we will go over the implementation of an RMS meter on an FPGA.

In our original LED Meter we measured the magnitude of our incoming audio signal by extracting the absolute value of each sample. When we expanded the LED Meter to support stereo inputs, this fundamental measurement strategy remained the same. However, this is not the only way to measure an audio signal. Today we will discuss another way to measure the magnitude of a signal: the Root Mean Square (RMS) measurement.

RMS Value in a Nutshell

The RMS value of a signal is calculated by averaging the squared values of its samples over time and then taking the square root of the result, as given by this formula:

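$$x_{\mathrm{RMS}} = \sqrt{\frac{1}{N} \sum_{n=1}^{N} x_n^2}$$

where N is the number of samples in the averaging window and x_n is the n-th sample.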
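As a quick software cross-check of the definition, here is a minimal Python sketch that evaluates it directly for a block of samples. The function name `rms` is only illustrative and is not part of the FPGA design.

```python
import math

def rms(samples):
    """Direct RMS: square every sample, average, then take the square root."""
    if not samples:
        raise ValueError("need at least one sample")
    mean_square = sum(x * x for x in samples) / len(samples)
    return math.sqrt(mean_square)

# A constant (DC) signal has an RMS value equal to its magnitude.
print(rms([0.5, 0.5, 0.5, 0.5]))    # 0.5
# A square wave swinging between -1 and +1 has an RMS value of 1.
print(rms([1.0, -1.0, 1.0, -1.0]))  # 1.0
```

Note that the sign of the samples does not matter, because every sample is squared before averaging.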

In electrical engineering, the RMS value of an AC signal indicates how much power it can deliver. More specifically, it is the value that a DC signal would need to have in order to dissipate the same average power in a resistor as the AC signal that we are measuring. Because it tracks signal power, the RMS value of an audio signal has historically been used as a proxy for its perceived loudness.
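As a worked example of this DC equivalence, consider a sine wave of peak amplitude A across a resistor R, averaged over one period T. The symbols A, R and T are introduced here purely for illustration.

```latex
\begin{aligned}
V_{\mathrm{RMS}} &= \sqrt{\frac{1}{T}\int_0^T A^2 \sin^2\!\left(\frac{2\pi t}{T}\right)\,dt}
                  = \sqrt{\frac{A^2}{2}}
                  = \frac{A}{\sqrt{2}} \approx 0.707\,A \\
P_{\mathrm{avg}} &= \frac{V_{\mathrm{RMS}}^2}{R} = \frac{A^2}{2R}
\end{aligned}
```

In other words, a DC voltage of about 0.707 A would dissipate the same average power in R as the sine wave does.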

By changing the length of the averaging window, we can adjust the sensitivity of our RMS Meter to sudden changes in the amplitude of the audio signal.
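To illustrate the effect of the window length, the following Python sketch feeds a step from silence to full scale through a short and a long averaging window. The helper `windowed_rms` and the tiny window sizes (2 and 8) are assumptions chosen to keep the numbers easy to follow. They are not the values used in the FPGA design.

```python
import math

def windowed_rms(samples, window):
    """RMS of the last `window` samples at each step (missing history counts as zero)."""
    out = []
    for i in range(len(samples)):
        recent = samples[max(0, i - window + 1): i + 1]
        out.append(math.sqrt(sum(x * x for x in recent) / window))
    return out

# A step from silence to full scale: 8 zeros followed by 8 ones.
step = [0.0] * 8 + [1.0] * 8

short = windowed_rms(step, 2)   # short window: reacts quickly
long_ = windowed_rms(step, 8)   # long window: reacts slowly

print(short[9])   # 1.0 -> on the second sample of the step, already at full scale
print(long_[9])   # 0.5 -> at the same instant, only 2 of the 8 window slots are non-zero
```

The long window needs all eight slots filled with the new level before it reports full scale, which is the lag that shows up in any RMS meter with a long averaging window.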

RMS Measurement in SystemVerilog

We will stick to floating-point processing for our RMS Meter so that we can use it as a building block for any algorithm that requires the magnitude of the signal as an input. We will use the same approach as in our mono and stereo delays: Xilinx’s Floating-Point Operator IP cores perform the arithmetic, and a custom FSM feeds their inputs and waits for their outputs. As in the delay logic, we also need a circular buffer (introduced in the mono delay post) to store the last N samples used to calculate the RMS value. We will store 1024 samples, but the exact number is not important for now, since it can easily be parameterized later.

So how do we go about implementing the RMS formula above? A naive approach would be to run the entire calculation with each new sample, but that would be extremely inefficient: we would have to perform 1024 multiplications, 1023 additions, one division and one square root operation for every incoming sample. Instead, we will use an accumulator that keeps the running sum of the squared samples in the window. For each new sample we will:

  1. Square the newest (incoming) sample, add it to the accumulator and store it in the circular buffer
  2. Fetch the oldest sample from the circular buffer (which is already squared) and subtract it from the accumulator
  3. Divide the accumulator by 1024 and store the result in a different signal (we need to keep the running sum in the accumulator!)
  4. Extract the square root from the result of the division
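The four steps above can be sketched as a small software model. This is a behavioral Python sketch only: the class name `SlidingRms` is hypothetical, it uses Python's double-precision floats rather than the single-precision IP cores, and it is not cycle-accurate with respect to the FSM. The clamp before the square root is a safeguard added in this sketch and is not taken from the hardware design.

```python
import math

class SlidingRms:
    """Running sum of squares over a circular buffer, following the four steps."""

    def __init__(self, size=1024):
        self.size = size
        self.buffer = [0.0] * size   # squared samples; starts out as silence
        self.index = 0               # slot holding the oldest squared sample
        self.accumulator = 0.0       # running sum of the squares in the buffer

    def push(self, sample):
        squared = sample * sample
        oldest = self.buffer[self.index]       # read the oldest entry before overwriting it
        # Step 1: add the newest square to the accumulator and store it.
        self.accumulator += squared
        self.buffer[self.index] = squared
        # Step 2: subtract the oldest (already squared) sample.
        self.accumulator -= oldest
        self.index = (self.index + 1) % self.size
        # Step 3: divide into a separate variable; the accumulator keeps the sum.
        mean_square = self.accumulator / self.size
        # Step 4: square root (clamped, since rounding could leave a tiny negative sum).
        return math.sqrt(max(mean_square, 0.0))

meter = SlidingRms(size=4)
outputs = [meter.push(x) for x in [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]]
print(outputs[0])  # 0.5 -> one of four slots filled
print(outputs[3])  # 1.0 -> window completely filled
print(outputs[7])  # 0.0 -> window completely drained
```

Each call costs one multiplication, one addition, one subtraction, one division and one square root, regardless of the window size.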

The figure below shows the state diagram for the FSM that performs the RMS calculation.


The source code for the FSM is shown below.

module rms_meter (
    input   logic i_clock,
    // Audio Input
    input   logic   [31 : 0]    i_data,
    input   logic               i_data_valid,
    // RMS Value Output
    output  logic   [31 : 0]    o_rms_value,
    output  logic               o_rms_value_valid
);

    timeunit 1ns;
    timeprecision 1ps;

    localparam logic [31 : 0] BUFFER_SIZE = 32'h44800000;    // 1024.0 as an IEEE 754 single-precision bit pattern

    // Buffer
    logic           buffer_wr_en = 1'b0;
    logic [9 : 0]   buffer_data_in_addr = 'b0;
    logic [31 : 0]  buffer_data_in = 'b0;
    logic [9 : 0]   buffer_data_out_addr = 'b0;
    logic [31 : 0]  buffer_data_out;
    rms_meter_buffer rms_meter_buffer_inst(
        .clka   (i_clock),
        .wea    (buffer_wr_en),
        .addra  (buffer_data_in_addr),
        .dina   (buffer_data_in),
        .clkb   (i_clock),
        .addrb  (buffer_data_out_addr),
        .doutb  (buffer_data_out)
    );

    // Floating-point Multiplier
    logic           fp_mult_valid_in;
    logic [31 : 0]  fp_mult_data_a;
    logic [31 : 0]  fp_mult_data_b;
    logic           fp_mult_valid_out;
    logic [31 : 0]  fp_mult_data_out;
    fp_multiplier fp_multiplier_inst(
        .aclk                   (i_clock),
        .s_axis_a_tvalid        (fp_mult_valid_in),
        .s_axis_a_tdata         (fp_mult_data_a),
        .s_axis_b_tvalid        (fp_mult_valid_in),
        .s_axis_b_tdata         (fp_mult_data_b),
        .m_axis_result_tvalid   (fp_mult_valid_out),
        .m_axis_result_tdata    (fp_mult_data_out)
    );

    // Floating-point Adder
    logic           fp_adder_valid_in;
    logic [31 : 0]  fp_adder_data_a;
    logic [31 : 0]  fp_adder_data_b;
    logic           fp_adder_valid_out;
    logic [31 : 0]  fp_adder_data_out;
    fp_adder fp_adder_inst(
        .aclk                   (i_clock),
        .s_axis_a_tvalid        (fp_adder_valid_in),
        .s_axis_a_tdata         (fp_adder_data_a),
        .s_axis_b_tvalid        (fp_adder_valid_in),
        .s_axis_b_tdata         (fp_adder_data_b),
        .m_axis_result_tvalid   (fp_adder_valid_out),
        .m_axis_result_tdata    (fp_adder_data_out)
    );

    // Floating-point Subtractor
    logic           fp_subtractor_valid_in;
    logic [31 : 0]  fp_subtractor_data_a;
    logic [31 : 0]  fp_subtractor_data_b;
    logic           fp_subtractor_valid_out;
    logic [31 : 0]  fp_subtractor_data_out;
    fp_subtractor fp_subtractor_inst(
        .aclk                   (i_clock),
        .s_axis_a_tvalid        (fp_subtractor_valid_in),
        .s_axis_a_tdata         (fp_subtractor_data_a),
        .s_axis_b_tvalid        (fp_subtractor_valid_in),
        .s_axis_b_tdata         (fp_subtractor_data_b),
        .m_axis_result_tvalid   (fp_subtractor_valid_out),
        .m_axis_result_tdata    (fp_subtractor_data_out)
    );

    // Floating-point Divider
    logic           fp_divider_valid_in;
    logic [31 : 0]  fp_divider_data_a;
    logic [31 : 0]  fp_divider_data_b;
    logic           fp_divider_valid_out;
    logic [31 : 0]  fp_divider_data_out;
    fp_divider fp_divider_inst(
        .aclk                   (i_clock),
        .s_axis_a_tvalid        (fp_divider_valid_in),
        .s_axis_a_tdata         (fp_divider_data_a),
        .s_axis_b_tvalid        (fp_divider_valid_in),
        .s_axis_b_tdata         (fp_divider_data_b),
        .m_axis_result_tvalid   (fp_divider_valid_out),
        .m_axis_result_tdata    (fp_divider_data_out)
    );

    // Floating-point Square Root
    logic           fp_square_root_valid_in;
    logic [31 : 0]  fp_square_root_data_in;
    logic           fp_square_root_valid_out;
    logic [31 : 0]  fp_square_root_data_out;
    fp_square_root fp_square_root_inst(
        .aclk                   (i_clock),
        .s_axis_a_tvalid        (fp_square_root_valid_in),
        .s_axis_a_tdata         (fp_square_root_data_in),
        .m_axis_result_tvalid   (fp_square_root_valid_out),
        .m_axis_result_tdata    (fp_square_root_data_out)
    );

    // Main FSM
    enum logic [2 : 0]  {IDLE,
                        SQUARE_NEWEST_SAMPLE,
                        ADD_NEWEST_SAMPLE,
                        REMOVE_OLDEST_SAMPLE,
                        CALCULATE_AVERAGE,
                        EXTRACT_SQUARE_ROOT} main_fsm_state = IDLE;

    logic [31 : 0] accumulator = 'b0;
    logic [31 : 0] aux = 'b0;

    always_ff @(posedge i_clock) begin : main_fsm
        buffer_data_out_addr <= buffer_data_in_addr + 1;
        case (main_fsm_state)
            IDLE : begin
                o_rms_value_valid <= 1'b0;
                buffer_wr_en <= 1'b0;
                fp_mult_valid_in <= 1'b0;
                fp_adder_valid_in <= 1'b0;
                fp_subtractor_valid_in <= 1'b0;
                fp_divider_valid_in <= 1'b0;
                fp_square_root_valid_in <= 1'b0;
                if (i_data_valid == 1'b1) begin
                    fp_mult_valid_in <= 1'b1;
                    fp_mult_data_a <= i_data;
                    fp_mult_data_b <= i_data;
                    buffer_data_in_addr <= buffer_data_in_addr + 1;
                    main_fsm_state <= SQUARE_NEWEST_SAMPLE;
                end
            end

            SQUARE_NEWEST_SAMPLE : begin
                fp_mult_valid_in <= 1'b0;
                if (fp_mult_valid_out == 1'b1) begin
                    buffer_wr_en <= 1'b1;
                    buffer_data_in <= fp_mult_data_out;
                    fp_adder_valid_in <= 1'b1;
                    fp_adder_data_a <= fp_mult_data_out;
                    fp_adder_data_b <= accumulator;
                    main_fsm_state <= ADD_NEWEST_SAMPLE;
                end
            end

            ADD_NEWEST_SAMPLE : begin
                buffer_wr_en <= 1'b0;
                fp_adder_valid_in <= 1'b0;
                if (fp_adder_valid_out == 1'b1) begin
                    fp_subtractor_valid_in <= 1'b1;
                    fp_subtractor_data_a <= fp_adder_data_out;
                    fp_subtractor_data_b <= buffer_data_out;
                    main_fsm_state <= REMOVE_OLDEST_SAMPLE;
                end
            end

            REMOVE_OLDEST_SAMPLE : begin
                fp_subtractor_valid_in <= 1'b0;
                if (fp_subtractor_valid_out == 1'b1) begin
                    accumulator <= fp_subtractor_data_out;
                    fp_divider_valid_in <= 1'b1;
                    fp_divider_data_a <= fp_subtractor_data_out;
                    fp_divider_data_b <= BUFFER_SIZE;
                    main_fsm_state <= CALCULATE_AVERAGE;
                end
            end

            CALCULATE_AVERAGE : begin
                fp_divider_valid_in <= 1'b0;
                if (fp_divider_valid_out == 1'b1) begin
                    fp_square_root_valid_in <= 1'b1;
                    fp_square_root_data_in <= fp_divider_data_out;
                    main_fsm_state <= EXTRACT_SQUARE_ROOT;
                end
            end

            EXTRACT_SQUARE_ROOT : begin
                fp_square_root_valid_in <= 1'b0;
                if (fp_square_root_valid_out == 1'b1) begin
                    o_rms_value <= fp_square_root_data_out;
                    o_rms_value_valid <= 1'b1;
                    main_fsm_state <= IDLE;
                end
            end

            default : begin
                main_fsm_state <= IDLE;
            end
        endcase
    end

endmodule

RMS Measurement Simulation in Vivado

We are now ready to simulate our design. We use our handy SystemVerilog WAVE file reader to load a snare hit into our testbench, which also includes the conversion from fixed- to floating-point representation. The results of the simulation are shown in the figure below.
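For readers who want to reproduce the fixed- to floating-point conversion in software, here is a minimal Python sketch. It assumes signed 16-bit PCM samples scaled to the range [-1.0, 1.0). The bit depth, the scaling and both function names are assumptions made for illustration, not details taken from the testbench.

```python
import struct

def float32_bits(value):
    """Return the IEEE 754 single-precision bit pattern of `value` as an unsigned int."""
    # '>f' packs as a big-endian float32; '>I' reads the same 4 bytes as an unsigned int.
    return struct.unpack(">I", struct.pack(">f", value))[0]

def pcm16_to_float32_bits(sample):
    """Scale a signed 16-bit PCM sample to [-1.0, 1.0) and return its float32 bit pattern."""
    if not -32768 <= sample <= 32767:
        raise ValueError("sample outside the signed 16-bit range")
    return float32_bits(sample / 32768.0)

print(hex(pcm16_to_float32_bits(16384)))   # 0x3f000000 ->  0.5
print(hex(pcm16_to_float32_bits(-32768)))  # 0xbf800000 -> -1.0
print(hex(float32_bits(1024.0)))           # 0x44800000 -> the BUFFER_SIZE constant
```

The last line shows where the `32'h44800000` constant in the module comes from: it is simply 1024.0 written as a single-precision bit pattern.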

Simulation of the RMS Meter Output of a Snare Hit

The upper analog waveform shows the incoming snare hit, while the lower one shows the calculated RMS value. We can see that the RMS measurement ‘lags’ behind the audio signal and reaches its maximum value much later. This lag can be controlled by adjusting the length of the averaging window: a longer window introduces more lag. At about 20 ms the RMS value levels off, then drops sharply for a short period before declining more slowly. This roughly coincides with the moment our logic starts removing non-zero samples from the accumulator, since up until that point the circular buffer only contained zeroes (the data output of the circular buffer is the signal directly below the RMS bus).
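The figure of roughly 20 ms can be sanity-checked by converting the window length into time. The sample rates below are assumptions, since the post does not state the rate of the WAVE file, and `window_duration_ms` is a hypothetical helper.

```python
def window_duration_ms(window_samples, sample_rate_hz):
    """Time span covered by the averaging window, in milliseconds."""
    return 1000.0 * window_samples / sample_rate_hz

print(round(window_duration_ms(1024, 48000), 2))  # 21.33
print(round(window_duration_ms(1024, 44100), 2))  # 23.22
```

At either of these common audio sample rates, a 1024-sample buffer fills up a little over 20 ms after the first sample arrives, which is consistent with the point where the oldest non-zero samples start leaving the window.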

In our next post we will introduce our first module for dynamic range processing of audio signals: the Clipper.


