# Field-Programmable Gate Array-Based Ultra-Low Power Discrete Fourier Transforms for Closed-Loop Neural Sensing

Richard Yang, Heather D. Orser Member, IEEE, Kip A. Ludwig, Brandon S. Coventry\* Member, IEEE

Abstract—Digital implementations of discrete Fourier transforms (DFT) are a mainstay in feature assessment of recorded biopotentials, particularly in the quantification of biomarkers of neurological disease state for adaptive deep brain stimulation. Fast Fourier transform (FFT) algorithms and architectures present a substantial power demand from onboard batteries in implantable medical devices, necessitating the development of ultra-low power Fourier transform methods in resource-constrained environments. Numerous FFT architectures aim to optimize power and resource demand through computational efficiency; however, prioritizing the reduction of logic complexity at the cost of additional computations can be equally or more effective. This paper introduces a minimal architecture single-delay feedback discrete Fourier transform (mSDF-DFT) for use in ultra-low-power field-programmable gate array applications and shows energy and power improvements over state-of-the-art FFT methods. We observe a 33% reduction in dynamic power and 4% reduction in resource utilization in a neural sensing application when compared to state-of-the-art FFT algorithms. While designed for use in closed-loop deep brain stimulation and medical device implementations, the mSDF-DFT is also easily extendable to any ultra-low power embedded application.

Index Terms—Deep Brain Stimulation, FFT, Fourier Transform, FPGA, Medical Device, Neurotechnology, Real-Time Signal Processing, Implantable Pulse Generator

### I. INTRODUCTION

THE discrete Fourier transform (DFT) is a fundamental tool in embedded digital signal processing, facilitating the decomposition of input signals into their constituent frequency components. Its utility spans a vast number of applications, making it indispensable for embedded data processing across numerous domains. Advancements in DFT applications have traditionally focused on reducing computation time and enhancing signal throughput through optimization of numerical algorithms and the

This work was supported in part by the United States National Institute of Neurological Disorders and Stroke (NINDS) grant #RF1-NS129955 and the Hilldale Undergraduate/Faculty Research Fellowship (University of Wisconsin-Madison).

R. Yang is with the Department of Biomedical Engineering, the Department of Computer Science, and the Wisconsin Institute for Translational Neuroengineering, University of Wisconsin-Madison, Madison WI 53701 USA. (e-mail: tyang296@wisc.edu).

H.D. Orser is with the Department of Electrical and Computer Engineering, University of St. Thomas, St. Paul MN 55105 (email: orser@stthomas.edu)

K.A. Ludwig is with the Department of Neurological Surgery, the Department of Surgery, and the Wisconsin Institute for Translational Neuroengineering, University of Wisconsin-Madison, Madison WI 53701 USA (e-mail: kip.ludwig@wisc.edu)

B.S. Coventry is with the Department of Neurological Surgery, the Department of Biomedical Engineering, and the Wisconsin Institute for Translational Neuroengineering, University of Wisconsin-Madison, Madison WI 53701 USA (e-mail: coventry@wisc.edu)

\*Author to whom correspondence should be addressed.

hardware on which DFTs are employed. The Fast Fourier Transform (FFT), rooted in the Cooley-Tukey algorithm[1], facilitates rapid computation of DFTs, reducing computational complexity from  $\mathcal{O}(N^2)$  to  $\mathcal{O}(N\log(N))$ . The FFT reduces time complexity by decomposing calculations into smaller, atomic microoperations that leverage the symmetry and periodicity properties of the DFT to reuse intermediate results and provide temporally efficient calculations[1]. These microoperations are implemented in hardware via complex multiply-and-accumulate, delay pipeline, and storage RAM blocks, commonly referred to as butterfly structures due to their characteristic connections and flow graph[1]-[3]. Extensions of the FFT algorithm have been further optimized to maximize throughput using hardware pipelining techniques or to minimize implementation size using time-multiplexing[4], [5].

Implantable medical devices are increasingly employing Fourier transform algorithms and processing cores for real-time analysis of biological signaling and control. For example, adaptive deep brain stimulation (DBS), a closed-loop extension of clinical gold standard DBS for the treatment of the motor symptoms of Parkinson's disease[6], uses measurements of extracellular oscillatory electric potential activity, called local field potentials (LFP), as readouts for motor symptomology and control signals for neural stimulation. In particular, the amplitude of  $\beta$ -band oscillations between 13-30 Hz in the subthalamic nucleus and globus pallidus interna is correlated with the magnitude of bradykinesia. Likewise,  $\gamma$ -band oscillations between 80-200 Hz are associated with dyskinesia in patients with Parkinson's disease[7]-[12]. As closed-loop neuromodulation matures, it is likely that more varied and complex LFP-based biomarkers will be employed for therapeutic sensing, further reinforcing real-time DFT use in implanted devices.

DBS and other electrical stimulation therapies are primarily deployed through implantable pulse generators (IPG)[13], devices that deliver constant-current electrical stimulation, perform biological sensing and signal processing, and provide battery management operations. IPGs are chronically implanted, operate from an internal battery, and are subject to strict power, hardware, and computational constraints. These constraints impose an inherent tradeoff between incorporating advanced features, such as real-time sensing and adaptive control, and maintaining device longevity and therapeutic efficacy[14]. Longevity of IPGs is a particularly salient design constraint, as battery replacement necessitates surgical intervention, adding potential medical risks, patient stress, and economic costs. Power constraints are even more pronounced in small

animal IPGs[15], [16] used in preclinical studies, where size and power usage are further constrained by the limited payload capacity of subjects. Efforts to minimize power consumption and optimize the use of limited hardware resources are critical for advancing IPG design. Such advancements could enable the development of smaller, minimally invasive, and potentially injectable devices, broadening the applications of neuromodulation therapies while reducing patient burden.

While the FFT has facilitated fast, high-throughput transforms, the increasingly parallelized architectures required for FFT computation introduce significant dynamic power consumption that directly competes with the power and efficiency requirements for chronic therapeutic stimulation in IPGs. These concerns in other resource-constrained settings have been partially addressed through single-delay feedback fast Fourier transform methods (SDF-FFT), a class of memory-efficient FFT implementations using a single multiply line with coefficients stored in feedback shift registers[17]-[20]. This class of architecture reduces memory loads relative to conventional highly parallelized FFT implementations at the cost of increased latency[19], [20]. While these approaches attempt to minimize RAM and transistor usage, they are still bound by static and dynamic power loads owing to the complex computational structure of the FFT. Additional approaches, such as architectures using approximate multiplication and addition operations[21], can improve energy efficiency but often degrade signal resolution, an unacceptable compromise for applications requiring high-resolution Fourier representations. Alternatively, reducing processor supply voltages has been explored as a method to improve Fourier transform energy efficiency[22]. However, lowered system voltages may conflict with the voltage and logic level requirements of peripheral components on IPG devices, requiring extra level-shifting circuitry and potentially increasing whole-system power consumption.

This work presents the design, implementation, and evaluation of a power-efficient minimal single-delay feedback discrete Fourier transform (mSDF-DFT) architecture for use in ultra-low power embedded applications that require online spectral estimation. The mSDF-DFT is a direct implementation of the DFT that computes in a time-multiplexed fashion. We show that simplified DFT architectures can be more power and resource efficient than the more computationally efficient FFT architectures by maximally reducing logic complexity while maintaining DFT accuracy and time complexity. The mSDF-DFT shows improvements in power and resource use while providing online performance comparable to embedded standard FFT implementations. Comparisons with the state-of-the-art (SOTA) Xilinx burst I/O FFT, the pipelined FFT, and the canonical Goertzel algorithm[23] are made to characterize and elucidate the advantages of the mSDF-DFT. Lastly, we validate the use of the mSDF-DFT in neural sensing applications by calculating LFP  $\beta$  and  $\gamma$  power bands for use in a closed-loop DBS application. The mSDF-DFT architecture is shown to be a meaningful solution that reduces power consumption and maintains

TABLE I SUMMARY OF ALGORITHM RUN TIME CONFIGURABLE FEATURES

| Algorithm       | RTC-exclusive Features          | General Features                             |
|-----------------|---------------------------------|----------------------------------------------|
| mSDF-DFT        | Point size, frequency bin index | Forward/backward transform, scaling schedule |
| FFT             | Point size                      | Forward/backward transform, scaling schedule |
| Goertzel Filter | Point size, frequency bin index | Scaling schedule                             |

transform accuracy at the cost of increased latency. While evaluated in the context of implantable neural stimulators, other applications where power usage is critical, such as satellite systems and internet-of-things devices, may benefit from use of mSDF-DFT architectures.

### II. ALGORITHM IMPLEMENTATION

The following sections describe the design and implementation of the mSDF-DFT and comparison with benchmark FFT methods commonly used in embedded systems. Hardware architectures were designed in SystemVerilog and evaluated on a Xilinx Spartan-7 FPGA (Boolean Board, Real Digital, Pullman WA) with synthesis and analysis performed in Vivado Design Suite (AMD). All hardware architectures included run time configurable (RTC) and non-run time configurable (Fixed) implementations. RTC variants allowed setting of DFT parameters, such as transform length and number of frequency bins, at runtime at the cost of more complex architecture structures, while fixed implementations set DFT parameters before hardware synthesis. Available RTC parameters for each DFT are given in Table I. Input data, phase factors, and all intermediate results were expressed in a 12-bit fixed-point data format with 4 fractional bits. Truncation was used for rounding and all DFT architectures were implemented with an 80 MHz system clock. mSDF-DFT architecture performance was compared against benchmark Goertzel filter, burst I/O FFT, and pipelined FFT architectures, which represent current standard performance in hardware resource use, run-time and throughput efficiency, and power use in embedded systems and are described in subsequent sections.
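The number format described above can be illustrated with a minimal Python sketch. It assumes a signed two's-complement word and saturation on overflow; the overflow behavior is not stated in the text and is an assumption here, as are the helper names `quantize_truncate` and `to_real`.

```python
import math

WORD_BITS = 12  # total word length stated in the text
FRAC_BITS = 4   # fractional bits stated in the text


def quantize_truncate(value):
    """Map a real value to a signed 12-bit integer code with 4 fractional bits.

    Truncation drops the bits below the LSB, which for two's complement
    is a floor toward negative infinity.
    """
    code = math.floor(value * (1 << FRAC_BITS))
    lo = -(1 << (WORD_BITS - 1))
    hi = (1 << (WORD_BITS - 1)) - 1
    # Saturation on overflow is an assumption; the hardware may wrap instead.
    return max(lo, min(hi, code))


def to_real(code):
    """Convert an integer code back to the real value it represents."""
    return code / (1 << FRAC_BITS)


if __name__ == "__main__":
    for v in (0.99, -0.01, 1000.0):
        code = quantize_truncate(v)
        print(v, code, to_real(code))
```

With 4 fractional bits the step size is 1/16, so a phase factor component such as 0.99 is stored as 0.9375 under truncation.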

### A. Minimal SDF-DFT Implementation Details

The DFT of an N-point discretized signal  $x_n$  at frequency index k is defined as:

$$X_k = \sum_{n=0}^{N-1} x_n W_N^{kn} \tag{1}$$

for  $k=0,1,\ldots,N-1$ . The  $W_N^{kn}$  term, abbreviated  $W_N^k$  where the sample index n is implied, is the periodic basis  $e^{-j2\pi\frac{kn}{N}}$ , called the phase or twiddle factor. The mSDF-DFT is implemented as a direct computation of the governing Fourier equation with a finite state machine (FSM) implementation given in Fig. 1A and Algorithm 1. In contrast to traditional FFT structures, the mSDF-DFT computes only a predefined number of frequency bins, giving a computational complexity of  $\mathcal{O}(N\times N_B)$ , where  $N_B$  is the number of frequency bins to

be processed, and N is the point size. The module consists of one multiplier, one adder/subtractor, and one read-only memory (ROM) containing the phase factors (Fig. 1C). The input is a stream of N complex values, each represented by a pair of 12-bit-wide two's-complement numbers. The real and imaginary components of each sample are processed separately with a time-multiplexed approach. The compile-time parameters of the module consist of  $N_B$  and maximum point size  $(N_{MAX})$ . As such, the mSDF-DFT trades time complexity for savings in dynamic power and FPGA resource utilization, which are paramount to ultra-low power embedded processing.

### Algorithm 1 Minimal SDF-DFT Finite State Machine.

WAIT: wait until new sample received

START(X)

FOR I = 1 TO K FREQUENCY BINS DO:

RETRIEVE PHASE FACTOR  $W_N^k$  FROM MEMORY

REAL COMPONENT:  $X_{Real}[n] \leftarrow X[n] * Real(W_N^k)$ ;  $X_{Real}[n] \rightarrow \text{ACCUMULATION REGISTER}$

IMAGINARY COMPONENT:  $X_{Imag}[n] \leftarrow X[n] * Imag(W_N^k)$ ;  $X_{Imag}[n] \rightarrow \text{ACCUMULATION REGISTER}$

CHECK: IF I = K, BREAK (ALL FREQUENCY BINS DONE); ELSE GO TO NEXT FREQUENCY BIN
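The per-sample loop of Algorithm 1 can be sketched in software as follows. This is a floating-point illustration of the control flow only, not the SystemVerilog design: the function name `msdf_dft_stream` is hypothetical, the phase factor is computed on the fly where the hardware reads it from ROM, and fixed-point effects are ignored.

```python
import cmath


def msdf_dft_stream(samples, bins):
    """Accumulate selected DFT bins one incoming sample at a time.

    samples: sequence of N real or complex values (the transform window)
    bins:    frequency bin indices k to compute (N_B = len(bins))
    """
    n_points = len(samples)
    accumulators = [0j for _ in bins]  # one accumulation register per bin
    for n, x in enumerate(samples):    # WAIT / START: a new sample arrives
        for i, k in enumerate(bins):   # FOR each frequency bin
            # Phase factor e^{-j 2 pi k n / N}; a ROM lookup in hardware.
            w = cmath.exp(-2j * cmath.pi * k * n / n_points)
            accumulators[i] += x * w   # multiply, then accumulate
    return accumulators


if __name__ == "__main__":
    print(msdf_dft_stream([1, 2, 3, 4], bins=[0, 1]))
```

The nested loops make the  $\mathcal{O}(N\times N_B)$  cost explicit: each of the N samples triggers one multiply-accumulate per requested bin, and no bins outside the requested set are ever computed.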

### B. Implementation of Goertzel filters

The Goertzel filter[23] is a commonly used modification of the DFT which utilizes the periodicity of the  $e^{-j2\pi\frac{kn}{N}}$  terms to reduce computational loads, using only real-valued coefficients and limited memory to facilitate efficient implementation in embedded applications. The Goertzel filter operates on the input x[n] in two stages. The first stage produces a real-valued intermediate sequence s[n]:

$$s[n] = x[n] + 2\cos(\omega)s[n-1] - s[n-2] \tag{2}$$

where  $\omega$  is given by  $\frac{2\pi k}{N}$ , k is the frequency bin index, and N is the width of the transform window. The second stage produces the complex output sequence y[n]:

$$y[n] = s[n] - e^{-j\omega}s[n-1] \tag{3}$$

representing a convolutional form of the DFT. Goertzel filter architectures were implemented as an FSM, with one multiplier, one adder and one ROM unit.
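Equations 2 and 3 can be checked numerically with a short Python sketch. It is a floating-point illustration rather than the FSM implementation, and the function name `goertzel_bin` is hypothetical. The DFT bin is read out as y[N], which requires one extra recursion step with zero input after the N samples of the window.

```python
import cmath
import math


def goertzel_bin(x, k):
    """Return DFT bin k of the window x using the two-stage Goertzel filter."""
    n_points = len(x)
    omega = 2 * math.pi * k / n_points
    coeff = 2 * math.cos(omega)  # the only coefficient used in stage one
    s_prev, s_prev2 = 0.0, 0.0   # s[n-1] and s[n-2]
    for sample in x:
        # Stage one (Eq. 2): s[n] = x[n] + 2cos(w) s[n-1] - s[n-2]
        s_now = sample + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s_now
    # One extra step with x[N] = 0 gives s[N].
    s_last = coeff * s_prev - s_prev2
    # Stage two (Eq. 3): y[N] = s[N] - e^{-jw} s[N-1]
    return s_last - cmath.exp(-1j * omega) * s_prev


if __name__ == "__main__":
    print(goertzel_bin([1, 2, 3, 4], k=1))
```

Only the stage-one recursion runs at the sample rate, and it needs a single real coefficient per bin, which is why the filter maps onto one multiplier, one adder, and a small ROM.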

### C. Implementation of pipelined FFT

Decimation-in-frequency pipelined FFTs were implemented using the Xilinx FFT IP Core (AMD). The pipelined FFT implementation decomposes the input signal into

Fig. 1. Schematic description of the mSDF-DFT finite state machine (panel titles: mSDF-DFT Finite State Machine; mSDF-DFT System Architecture). A. Inputs to the mSDF-DFT include clock, reset, write (wr), input time series (x), frequency bin index (k), and transform window (N). Outputs consist of the output FT representation (X) and binary done indicators. B. Flow diagram of the mSDF-DFT finite state machine. C. Hardware structure diagram of the mSDF-DFT.

parallel streams of add/multiply, twiddle storage ROM, and radix butterfly structures, allowing for temporally efficient FFT calculations at the potential cost of hardware complexity[24]. Pipelined architectures were implemented as radix-2 decomposition networks consisting of  $\log_2(N)$  stages, each with  $\frac{N}{2}$  radix-2 butterflies.
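The stage and butterfly counts quoted above can be seen in a software sketch of a radix-2 decimation-in-frequency FFT. This is a generic textbook formulation in Python, not the Xilinx IP core, and the function name `fft_dif_radix2` is hypothetical. Each pass of the outer loop corresponds to one stage and performs N/2 butterflies.

```python
import cmath


def fft_dif_radix2(x):
    """Radix-2 decimation-in-frequency FFT; returns (spectrum, stage count)."""
    n_points = len(x)
    if n_points == 0 or n_points & (n_points - 1):
        raise ValueError("length must be a power of two")
    data = [complex(v) for v in x]
    stages = 0
    span = n_points // 2
    while span >= 1:
        butterflies = 0
        for start in range(0, n_points, 2 * span):
            for j in range(span):
                w = cmath.exp(-2j * cmath.pi * j / (2 * span))  # twiddle
                a, b = data[start + j], data[start + j + span]
                data[start + j] = a + b                # butterfly sum
                data[start + j + span] = (a - b) * w   # butterfly difference
                butterflies += 1
        assert butterflies == n_points // 2  # N/2 butterflies per stage
        stages += 1
        span //= 2
    # DIF leaves results in bit-reversed order; undo the permutation.
    out = [0j] * n_points
    for i in range(n_points):
        rev = int(format(i, "b").zfill(stages)[::-1], 2) if stages else 0
        out[rev] = data[i]
    return out, stages


if __name__ == "__main__":
    print(fft_dif_radix2([1, 2, 3, 4]))
```

Unlike the bin-selective approaches, every stage touches all N points, so the full spectrum is always produced regardless of how many bins the application needs.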

### D. Implementation of burst I/O FFTs

Burst I/O FFT methods represent a tradeoff between low-resource Goertzel implementations and fully parallelized FFTs. Burst I/O architectures decouple DFT computation from data input and output operations, performing serial processing of blocks of input signals, as opposed to pipelined FFTs, which process signals on arrival to Fourier transform cores. Burst I/O FFTs were implemented as radix-2 decimation-in-time architectures built around a single radix-2 butterfly with one shared adder, providing minimal memory overhead at the expense of increased computational latency. Comparative FFTs were all implemented in Vivado Design Suite using the Xilinx Fast Fourier Transform 9.1 IP core (AMD).

### E. Performance Metrics

Performance evaluations are focused on dynamic power and physical resource utilization. Total system power was calculated using the Vivado power estimation toolset, with total power consumption defined as:

$$P_{Total} = P_{DS} + P_S + P_D \tag{4}$$

with  $P_{DS}$ ,  $P_S$ , and  $P_D$  denoting Spartan-7 static power consumption, design architecture static power consumption, and design architecture dynamic power consumption, respectively. Design architecture dynamic power consumption was estimated for forward Fourier transform operations for N-point transforms between 32 and 32,768 points. mSDF-DFT and Goertzel filter architectures take an additional parameter, the number of frequency bins ( $N_B$ ), which was tested between 4 and 64 for all N point sizes.

To evaluate the total number of physical resources used by each DFT architecture, a physical resource utilization metric was defined as:

$$\sqrt{LUT^2 + FF^2 + DSP^2 + BRAM^2} \tag{5}$$

where LUT, FF, DSP, and BRAM refer to percent utilization measured in Vivado Design Suite of available look up tables, flip flops, digital signal processing cores, and block random access memory respectively. Physical resource utilization was estimated for forward Fourier transform operations with N between 32 and 32,768 points and  $N_B$  between 4 and 64.

Comparisons between mSDF-DFT and state of the art architecture performance were estimated as:

$$\frac{y_m - y_{reference}}{y_{reference}} * 100 \tag{6}$$

where  $y_m$  is the value of physical resource utilization or dynamic power for the mSDF-DFT and  $y_{reference}$  is the physical resource utilization or dynamic power for the comparative benchmark architecture respectively. The pathological case of calculating DFTs with transform length of 64 with 32 bins (32,64) was excluded from analysis as this

TABLE II SPECIFICATION OF THE NEURAL SENSING SYSTEM

| Clock Frequency          | 50 MHz  |
|--------------------------|---------|
| DFT Point size           | 128     |
| DFT Frequency Resolution | 11.9 Hz |
| DFT $N_B$                | 7       |
| LFP sampling frequency   | 1529 Hz |
| SPI Clock Frequency      | 500 kHz |

set of parameters created an underconstrained, degenerate Fourier transform.
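The resource utilization metric (Equation 5) and the relative comparison (Equation 6) reduce to a few lines of arithmetic, sketched below in Python. The function names are hypothetical and the numbers in the usage example are made up for illustration; they are not measurements from this work.

```python
import math


def resource_utilization(lut_pct, ff_pct, dsp_pct, bram_pct):
    """Equation 5: Euclidean norm of the four percent-utilization figures."""
    return math.sqrt(lut_pct**2 + ff_pct**2 + dsp_pct**2 + bram_pct**2)


def percent_difference(y_m, y_reference):
    """Equation 6: signed percent difference relative to the benchmark.

    Negative values mean the mSDF-DFT used less than the reference.
    """
    return (y_m - y_reference) / y_reference * 100.0


if __name__ == "__main__":
    # Illustrative (made-up) utilization percentages for two designs.
    ours = resource_utilization(1.0, 0.5, 1.5, 1.0)
    ref = resource_utilization(2.0, 1.5, 2.5, 3.0)
    print(ours, ref, percent_difference(ours, ref))
```

Because each term is a percentage of what the device provides, the same design yields a different metric value on an FPGA with a different mix of LUTs, flip-flops, DSP cores, and BRAM.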

Architecture latencies were evaluated as the number of clock cycles elapsed between the loading of the initial input sample and the unloading of the first computed frequency bin. Latencies of the pipelined and burst I/O transforms implemented with Xilinx IP cores were obtained from Vivado simulations. Least-squares linear regression was performed to obtain a linear estimate of latency as a function of the Fourier transform point size.
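The regression step can be sketched with the closed-form ordinary least-squares solution for a single predictor. The function name `fit_line` is hypothetical, and the latency values in the example are synthetic placeholders rather than simulation results.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    point_sizes = [32, 64, 128, 256]
    # Synthetic latencies in clock cycles, for illustration only.
    latencies = [2 * n + 3 for n in point_sizes]
    print(fit_line(point_sizes, latencies))
```

The fitted slope is the per-point latency coefficient, which is the form in which FFT latencies are reported in the results.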

### III. EVALUATION OF MSDF-DFTS FOR NEURAL SENSING

To evaluate online neural sensing with the mSDF-DFT method, recordings from auditory cortex in response to medial geniculate body infrared neural stimulation recorded during a previous experiment[25] were used. Infrared neural stimulation (INS) is an optical neuromodulation technique which uses coherent infrared light to elicit spatially constrained excitatory neural responses in nerve and neuron[26]–[29] without electrical stimulation artifact.

To mimic common IPG closed-loop sensing, sample LFPs were loaded into the memory of an nRF5340 microprocessor (Nordic Semiconductor, Trondheim, Norway) and transmitted to a Xilinx Spartan-7 FPGA (AMD) through a custom serial peripheral interface (SPI) driver.  $\beta$  (13-30 Hz) and  $\gamma$  (30-100 Hz) band power were calculated with the mSDF-DFT and transmitted back to the microprocessor. Specifications of the neural sensing implementation are summarized in Table II. Ground truth spectral content of test LFPs was calculated offline in MATLAB (Mathworks, Natick MA) using the same number of bins and transform lengths as the online mSDF-DFTs. Error between online calculation and ground truth was calculated as:

$$e_{online}[i] = 10\log_{10}\left(\frac{|X_{online}[i]|}{|X_{offline}[i]|}\right) \tag{7}$$

where  $X_{online}[i]$  and  $X_{offline}[i]$  are the complex amplitude for frequency bin i for the online and Matlab calculations, respectively.
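The error metric of Equation 7 can be sketched in Python as below. The logarithm is taken as base 10, which is an assumption consistent with the decibel units reported later, and the function names and example amplitudes are hypothetical.

```python
import math


def error_db(x_online, x_offline):
    """Equation 7: log-ratio of online to offline bin magnitude, in dB."""
    return 10.0 * math.log10(abs(x_online) / abs(x_offline))


def mean_absolute_error_db(online_bins, offline_bins):
    """Mean of |error| over paired bins, as used for summary statistics."""
    errors = [abs(error_db(a, b)) for a, b in zip(online_bins, offline_bins)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Made-up complex amplitudes for two bins, for illustration only.
    online = [3 + 4j, 10 + 0j]
    offline = [5 + 0j, 1 + 0j]
    print([error_db(a, b) for a, b in zip(online, offline)])
    print(mean_absolute_error_db(online, offline))
```

The metric is zero when the two magnitudes agree and is undefined when either magnitude is zero, so empty bins would need separate handling in practice.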

### IV. RESULTS

Minimal SDF-DFT synthesis and implementation results in Vivado are summarized in Table III. The mSDF-DFT was implemented with the inverse DFT feature to provide appropriate comparisons against the Xilinx IP cores; thus, further reductions in power and resource consumption can be easily realized by removing this feature if only Fourier decomposition is needed for the application.

TABLE III SUMMARY OF SELECTED MSDF-DFT IMPLEMENTATIONS IN VIVADO

| Variant | $N_B$ | N   | LUT count | Flip-flop count | DSP count | BRAM count | Dynamic Power (mW) |
|---------|-------|-----|-----------|-----------------|-----------|------------|--------------------|
| RTC     | 8     | 512 | 360       | 353             | 2         | 1          | 9                  |
| RTC     | 16    | 512 | 477       | 546             | 2         | 1          | 10                 |
| RTC     | 32    | 512 | 954       | 936             | 2         | 1          | 15                 |
| Fixed   | 8     | 512 | 287       | 342             | 2         | 1          | 8                  |
| Fixed   | 16    | 512 | 363       | 535             | 2         | 1          | 9                  |
| Fixed   | 32    | 512 | 902       | 926             | 2         | 1          | 14                 |

### A. Performance Evaluation of mSDF-DFT

The mSDF-DFT was designed to provide ultra-low dynamic power consumption and minimal resource usage in exchange for a minimal penalty in time complexity. mSDF-DFT performance was benchmarked against Goertzel filter DFTs as well as parallelized burst I/O and pipelined FFT methods. We found that mSDF-DFT performance relative to the pipelined and burst I/O FFTs was consistent across the RTC and fixed variants. Comparisons were made in both non-RTC (Figure 2) and RTC (Figure 3) settings to account for differential memory usage for parameters set before versus after compilation. The magnitude of power reduction was greater than the savings in resource usage: the mean resource saving was 14.2% with a standard deviation of 198%, while the mean power saving was 53.3% with a standard deviation of 43.1%. The mSDF-DFT achieved significant reductions in both power and resource consumption relative to the pipelined FFT at all parameters, while it outperformed the burst I/O FFT at low  $N_B$  to N ratios (Fig 2A, 2C, 3A, 3C). More specifically, this threshold ratio was found to be approximately 0.5 at a point size of 32 and diminishes approximately as a power law in point size. This relationship can be modeled by:

$$\frac{N_B}{N} = 9N^{-0.9} \tag{8}$$
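Equation 8 can be turned into a quick design check, sketched below in Python. The reading taken here is that the modeled ratio marks the largest  $N_B$  to N ratio at which the mSDF-DFT still outperforms the burst I/O FFT; that interpretation and the function names are assumptions, and the fit is approximate.

```python
def threshold_ratio(n_points):
    """Equation 8: modeled N_B / N ratio as a function of point size."""
    return 9.0 * n_points ** -0.9


def threshold_bins(n_points):
    """Bin count implied by Equation 8 at a given point size."""
    return threshold_ratio(n_points) * n_points


if __name__ == "__main__":
    for n in (32, 128, 1024, 32768):
        print(n, threshold_ratio(n), threshold_bins(n))
```

Multiplying through by N shows that the implied bin count grows only as  $9N^{0.1}$ , so the number of bins at which the mSDF-DFT remains favorable changes slowly with transform length.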

Minimal SDF-DFT exhibits clear advantages in both resource use and power consumption over the FFTs at point size greater than 2048, where resource use reduction ranges from 57% to 85%, and power reduction ranges from 31% to 91% (Fig 2B, 2D, 3B, 3D). This significant reduction is mainly attributable to the fact that the mSDF-DFT only calculates 0.01% to 0.8% of the frequency components relative to the FFTs.

Minimal SDF-DFT architectures outperformed the Goertzel filter in both non-RTC (Figure 2A,B) and RTC (Figure 3A,B) conditions except at larger bin sizes and transform lengths, at which Goertzel begins to dominate. This is likely due to its efficient BRAM access patterns and minimal phase factor storage when the frequency bin indices are fixed. However, the mSDF-DFT showed clear advantages for small to moderate transform lengths and bin sizes relevant to low power and resource usage implementations. Taken together, these data suggest that  $N_B$  and transform sizes can be finely tuned to achieve minimal dynamic power and resource usage to facilitate transforms across a variety of application constraints. These data also suggest that the mSDF-DFT is particularly advantageous when desired frequency bands

are known *a priori*, such as in adaptive DBS where  $\beta$  (13-30 Hz) and  $\gamma$  (30-100 Hz) LFP activity provide controllable biomarkers for Parkinson's disease[30], [31].

### B. mSDF-DFT Latency Evaluation

A fundamental tradeoff between mSDF-DFT and canonical FFT implementations is time complexity versus dynamic power and resource loads. Latency calculations were thus performed to quantify time tradeoffs as a function of mSDF-DFT  $N_B$  against canonical FFT methods. Latency of the mSDF-DFT was found to be  $N \times (5 \times N_B + 1)$  and  $N \times (4 \times N_B + 1)$  clock cycles for the RTC and the fixed variants, respectively. For each sample, one clock cycle was required for initiation and five (RTC) or four (fixed) clock cycles were required to process each frequency bin. Similarly, the Goertzel filter's latency was found to be  $N \times (4 \times N_B + 1) + N_B \times 4$ . The burst I/O FFT featured an approximate latency of  $16.8 \times N$ , while both variants of the pipelined FFT featured an approximate latency of  $2 \times N$ . As expected, the mSDF-DFT displayed greater latency than the FFT methods (Fig 4). Minimal SDF-DFT latency displayed linear growth as a function of transform length across all  $N_B$  values. However, we observed that mSDF-DFTs can achieve time performance comparable to burst I/O FFTs with longer transform lengths and smaller  $N_B$ . Quantification of latency can therefore allow optimal choice of mSDF-DFT parameters to satisfy computational time costs across applications. While the mSDF-DFT exhibits a worse asymptotic runtime complexity than FFT methods, mSDF-DFT transform time is dependent on the total number of frequency bins used  $(\mathcal{O}(N \times N_B))$ . Therefore, *a priori* knowledge of the frequency band of interest can significantly reduce algorithm latency. This is often the case with neural sensing applications, where the frequencies of interest are well defined. For instance,  $\beta$  band signals have a period of around 50 ms. A response delivered within 5 ms of signal detection can be considered to be concurrent.
Assuming an FPGA clock speed of 50 MHz, a 1 kHz sampling rate ( $f_s$ ), and a frame size of 64, the mSDF-DFT incurs 0.0064 ms latency, while the burst I/O FFT and pipelined FFT yield 0.022 ms and 0.0027 ms, respectively. At a frame size of 1024, the mSDF-DFT incurs 0.68 ms latency, while the burst I/O FFT and pipelined FFT yield 0.34 ms and 0.04 ms, respectively. Crucially, fewer frequency bins need to be calculated at smaller frame sizes since the number of frequency bins that span a particular frequency band is:

$$N_B = \frac{Bandwidth \times N}{f_s} \tag{9}$$



Fig. 2. Comparison of non-runtime configurable mSDF-DFT and traditional FFT architectures. A,B. Heatmap of the resource-usage (A) and power (B) performance of the mSDF-DFT relative to traditional algorithms. At each combination of (the total number frequency bins  $(N_B)$ , point size(N)), the percent difference of the mSDF-DFT's dynamic power and resource consumption relative to the Goertzel, burst I/O FFT, and the pipelined FFT was calculated. The performance values of the FFTs are estimated with  $N_B$  equal to N. The combination  $(N_B = 64$ , Transform length = 32) was manually set to 0 as number of bins exceeds transform length. Blue shades indicate decreased resource and power consumption of mSDF-DFT relative to test algorithms, i.e. better performance. C,D. Line plots of performance for resource (C) and power (D) consumption. These plots represent the same data shown in A,B cast against mSDF-DFT  $N_B$  parameters of 4,8,16, and 32.



Fig. 3. Comparison of runtime configurable mSDF-DFT and traditional FFT architectures. A,B. Heatmap of the resource-usage (A) and power (B) performance of the mSDF-DFT relative to traditional algorithms. At each combination of (the total number frequency bins  $(N_B)$ , point size(N)), the percent difference of the mSDF-DFT's dynamic power and resource consumption relative to the Goertzel, burst I/O FFT, and the pipelined FFT was calculated. The performance values of the FFTs are estimated with  $N_B$  equal to N. The combination  $(N_B = 64$ , Transform length = 32) was manually set to 0 as number of bins exceeds transform length. Blue shades indicate decreased resource and power consumption of mSDF-DFT relative to test algorithms, i.e. better performance. C,D. Line plots of performance for resource (C) and power (D) consumption. These plots represent the same data shown in A,B cast against mSDF-DFT  $N_B$  parameters of 4,8,16, and 32.

Thus, assuming a constant bandwidth, the latency of the mSDF-DFT is much lower with smaller frame size.
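The latency expressions and Equation 9 can be combined into a small calculator, sketched below in Python. The function names are hypothetical, and the  $N_B$  values passed in the usage example are assumptions chosen for illustration; the text does not state which  $N_B$  was used for its quoted latencies.

```python
def msdf_latency_cycles(n_points, n_bins, rtc):
    """mSDF-DFT latency: N * (5 * N_B + 1) for RTC, N * (4 * N_B + 1) for fixed."""
    per_bin = 5 if rtc else 4
    return n_points * (per_bin * n_bins + 1)


def goertzel_latency_cycles(n_points, n_bins):
    """Goertzel filter latency: N * (4 * N_B + 1) + 4 * N_B."""
    return n_points * (4 * n_bins + 1) + 4 * n_bins


def burst_fft_latency_cycles(n_points):
    """Approximate burst I/O FFT latency from the linear fit."""
    return 16.8 * n_points


def pipelined_fft_latency_cycles(n_points):
    """Approximate pipelined FFT latency from the linear fit."""
    return 2 * n_points


def cycles_to_ms(cycles, clock_hz):
    """Convert a clock-cycle count to milliseconds."""
    return cycles / clock_hz * 1e3


def bins_for_band(bandwidth_hz, n_points, fs_hz):
    """Equation 9: number of bins spanning a band (may be fractional)."""
    return bandwidth_hz * n_points / fs_hz


if __name__ == "__main__":
    clock_hz = 50e6
    # Assumed N_B = 1 in the fixed variant at a frame size of 64.
    print(cycles_to_ms(msdf_latency_cycles(64, 1, rtc=False), clock_hz))
    print(cycles_to_ms(burst_fft_latency_cycles(64), clock_hz))
    print(cycles_to_ms(pipelined_fft_latency_cycles(64), clock_hz))
    # Bins spanning 13-100 Hz with a 128-point window sampled at 1529 Hz.
    print(bins_for_band(100 - 13, 128, 1529))
```

Under the assumed single bin, the fixed variant at a frame size of 64 takes 320 cycles, or 0.0064 ms at 50 MHz. With the 128-point window and 1529 Hz sampling rate of Table II, a band from 13 to 100 Hz spans roughly seven bins.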

### C. Evaluation of Online Neural Sensing

Finally, we evaluated the mSDF-DFT architecture performance and accuracy on LFPs recorded from an optical



Fig. 4. Characterization of latency between mSDF-DFT and FFT methods. Results show that mSDF-DFT throughput is less than FFT methods as expected. However, the choice of mSDF-DFT parameters can bring time performance to burst-I/O FFT levels.

thalamocortical deep brain stimulation application. Data were obtained from infrared stimulation of auditory thalamus with microwire recordings of auditory cortex in the chronically implanted rat (Figure 5A). Stimulation of the ventral division of auditory thalamus drives excitatory responses in layers III/IV of auditory cortex through a single synapse (Figure 5B). Local field potential responses through this circuit have been extensively characterized, creating an opportune circuit by which to assess mSDF-DFT performance. A characteristic LFP driven by 3 mJ per pulse INS is shown in Figure 5C. A total of ten 1-second LFP recordings were utilized. For each recording, eleven 128-point DFTs were performed with no overlap, resulting in a total of N=14,080 DFT calculations utilized for accuracy analyses. Characteristic mSDF-DFT and ground truth FFT spectra and their spectral differences are shown in Figure 5D. D'Agostino and Pearson tests for normality[32] show that errors did not fit normal distributions (p<0.05), necessitating the use of mean and median measurements of absolute error as optimal summary statistics for mSDF-DFT error analysis[33]. Error analysis results are summarized in Table IV. The mean absolute error between measured mSDF-DFT and benchmark offline FFT was 2.01 dB with a standard error of the mean of 0.03 dB. The median absolute error was found to be 1.11 dB, and the maximum absolute error was 22.2 dB. The relatively small mean absolute error suggests that the mSDF-DFT provides accurate, low-power, and low-resource estimation of online neural signals. Sources of error are likely due to propagation of memory limitations and round-off errors inherent to online systems, which can be mitigated by increasing word length[34].
Additionally, the mSDF-DFT architecture demonstrated superior resource and power efficiency when compared to benchmark architectures (Table V), consistent with our observed power and resource utilization findings (Figures 2, 3). A 33% reduction in dynamic power and a 4% reduction in resource utilization over the burst I/O FFT were observed.

TABLE IV
SUMMARY STATISTICS FOR LFP MSDF-DFT AND BENCHMARK FFT

| Total Comparisons (N)           | 14080 |
|---------------------------------|-------|
| Mean Absolute Error (dB)        | 2.01  |
| Median Absolute Error (dB)      | 1.11  |
| Variance Absolute Error (dB²)   | 14.7  |
| Standard Error of the Mean (dB) | 0.03  |
| Maximum Absolute Error (dB)     | 22.2  |
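The summary statistics in Table IV can be computed from a list of per-coefficient errors along the following lines. This is a minimal sketch rather than the authors' analysis code: the function name `summarize_abs_error` and the choice of the sample (n − 1) variance are assumptions made for illustration. The last lines check, using only the values reported in the table, that the reported standard error of the mean is consistent with the reported variance and N.

```python
import math
import statistics


def summarize_abs_error(errors_db):
    """Summary statistics of absolute error (in dB) for a list of errors.

    Assumption: the sample (n - 1) variance is used; the source does not
    state which variance estimator was applied.
    """
    abs_err = [abs(e) for e in errors_db]
    n = len(abs_err)
    variance = statistics.variance(abs_err)
    return {
        "n": n,
        "mean": statistics.fmean(abs_err),
        "median": statistics.median(abs_err),
        "variance": variance,
        "sem": math.sqrt(variance / n),  # standard error of the mean
        "max": max(abs_err),
    }


# Consistency check on the reported values only (no raw data needed).
n_comparisons = 10 * 11 * 128            # recordings x transforms x points
sem_from_table = math.sqrt(14.7 / n_comparisons)
print(n_comparisons, round(sem_from_table, 2))
```

With N = 14,080 and a variance of 14.7, the standard error of the mean evaluates to roughly 0.03 dB, in agreement with the tabulated value.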

### V. DISCUSSION

In this study, we describe the design and implementation of a power- and resource-efficient mSDF-DFT. The algorithm implements the DFT in a completely serialized fashion, is inverse-capable, and possesses all the features of the SOTA Xilinx FFT IP cores. It is important to note that our data show well-defined use cases for the mSDF-DFT. When minimization of power is the primary design constraint, the mSDF-DFT is superior in applications with moderate bin counts and transform lengths and completely outperforms pipelined FFTs in power performance. Similarly, at moderate $N_B$ to $N$ ratios given by Equation 7, the mSDF-DFT can outperform burst I/O FFT methods. Furthermore, the mSDF-DFT outperforms the Goertzel filter in dynamic power almost everywhere, except at extremely small $N_B$ to $N$ ratios, where the Goertzel filter's efficient memory access pattern is salient. Therefore, the region of best power performance can be characterized as moderate $N_B$ to $N$ ratios, which is precisely where neural sensing, among other medical device applications, operates. Thus, the mSDF-DFT offers high-resolution spectral decomposition with minimal power consumption in applications where power constraints are critical design parameters.

We also find use cases where the mSDF-DFT provides minimal resource usage. Although the resource usage metric (Equation 4) provides a holistic estimate of total system use, it is defined as percentage utilization and is therefore FPGA dependent. Specifically, the different component categories (DSP, LUT, etc.) are weighted inversely to their quantity present in the FPGA. Therefore, the resource consumption results presented here may not transfer to FPGA architectures with significantly different component composition.

Fig. 5. Application of the mSDF-DFT to closed-loop neural sensing. A,B. Recordings were obtained from a chronically implanted rat infrared neural stimulation (INS) preparation. Stimulating optical fibers were placed into auditory thalamus with recordings made from 16-channel microwire arrays implanted in auditory cortex. Thalamic afferents project across a single synapse to cortex and thus represent direct thalamocortical entrainment from INS. A portion of the figure was constructed in BioRender (www.biorender.com) software. C. Example LFP waveform and time-frequency decomposition showing $\beta$ and $\gamma$ power vs. time. D. Example histogram comparison of mSDF-DFT and offline FFT power calculations showing minimal error in the calculation of $\beta$ and $\gamma$ power.

While throughput for the mSDF-DFT is lower than that of massively parallelized methods such as pipelined FFT implementations, most power-constrained applications aim to minimize power consumption, as is the case with neural sensing and control systems. This is done by using sampling and throughput rates that adequately capture neural responses but are much lower than typical FPGA clock speeds[35]. In a system that involves real-time signal acquisition and concurrent transformation into the frequency domain, the maximum number of frequency bins that can be processed within one sampling period is given by:

$$N_{BMax} = \frac{f_{FPGA} \times \left(\frac{1}{f_s} - \frac{w}{f_{sclk}}\right) - 1}{4} \tag{10}$$

where $f_{FPGA}$ is the FPGA clock frequency, $f_s$ is the sampling frequency, $f_{sclk}$ is the SPI frequency, and $w$ is the bit depth of the SPI transmission. The term $\frac{w}{f_{sclk}}$ compensates for the SPI transmission time. Neural sensing typically involves low sampling rates, e.g., 1 kHz for LFP recording, 100 Hz for intracranial EEG systems, and a minimum of 7 kHz for spike detection[36], [37]. At these sampling rates, assuming a 50 MHz FPGA clock frequency and a 1 MHz SPI frequency, approximately 1,600 (at 7 kHz) to 124,000 (at 100 Hz) frequency bins can be processed with the proposed mSDF-DFT, which is significantly above the minimal requirements for online neural sensing and measurement applications.
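As a worked illustration of Equation 10, the short sketch below evaluates the bin budget at the three sampling rates mentioned above. The function name `max_frequency_bins` is hypothetical, and the SPI bit depth of 16 is an assumption, since the value of $w$ is not stated in this section; the result is rounded down to a whole number of bins.

```python
import math


def max_frequency_bins(f_fpga_hz, f_s_hz, f_sclk_hz, word_bits):
    """Evaluate Equation 10, rounded down to a whole number of bins."""
    # Time left in one sampling period after one SPI word has been clocked in.
    compute_time_s = 1.0 / f_s_hz - word_bits / f_sclk_hz
    if compute_time_s <= 0:
        return 0  # the SPI transfer alone exceeds the sampling period
    cycles = f_fpga_hz * compute_time_s
    return max(0, math.floor((cycles - 1) / 4))


F_FPGA_HZ = 50e6   # FPGA clock frequency from the text
F_SCLK_HZ = 1e6    # SPI frequency from the text
WORD_BITS = 16     # ASSUMPTION: bit depth w is not given in this section

for f_s in (100, 1_000, 7_000):
    bins = max_frequency_bins(F_FPGA_HZ, f_s, F_SCLK_HZ, WORD_BITS)
    print(f"f_s = {f_s} Hz -> {bins} bins")
```

Under these assumptions the budget is on the order of 1.6 thousand bins at 7 kHz and 125 thousand bins at 100 Hz, which matches the range quoted in the text; a different bit depth shifts these figures only slightly because the SPI transfer time is small relative to the sampling period.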

TABLE V
SUMMARY OF MSDF-DFT PERFORMANCE AGAINST BENCHMARK ARCHITECTURES IN NEURAL SENSING

|                      | mSDF-DFT | Burst I/O FFT | Goertzel Filter | Pipelined FFT |
|----------------------|----------|---------------|-----------------|---------------|
| LUT                  | 280      | 291           | 722             | 1169          |
| FF                   | 339      | 688           | 440             | 21941         |
| BRAM                 | 1        | 1             | 0               | 0             |
| DSP                  | 2        | 2             | 1               | 9             |
| Resource Utilization | 2.4      | 2.5           | 9.0             | 2.5           |
| Dynamic Power (mW)   | 8        | 12            | 13              | 52            |

As implemented, the mSDF-DFT provides an accurate, low-resource DFT implementation. However, we do show that a small approximation error exists in mSDF-DFT estimation, which may compound if bin size and transform lengths grow large. It is possible to further reduce round-off and propagation errors by incorporating fault tolerance methods, such as online memory[38] or algorithm-based[39] fault tolerance, into our mSDF-DFT architecture. However, fault correction methods will likely add to power and resource consumption, necessitating careful optimization of the accuracy-resource tradeoff. It was also observed that the mSDF-DFT is particularly well suited to applications with low to moderate transform lengths and bin sizes. We believe that further power and resource savings can be achieved by implementing the mSDF-DFT directly in application-specific integrated circuits (ASICs), which would allow full optimization of power consumption using only the necessary resources. Lastly, it is important to note that the advantages of the mSDF-DFT over the Goertzel filter are not limited to power and resource utilization. It is well known that Goertzel filters exhibit numerical instability for fixed-point arithmetic and long input sequences owing to their use of purely real digital filters for Fourier coefficient calculation[40]. This can be easily avoided by using true DFT estimation of Fourier coefficients. Thus, the mSDF-DFT represents a promising high-performance alternative to the Goertzel filter.
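To make the distinction between the two approaches concrete, the sketch below computes a single Fourier coefficient in two ways: by direct evaluation of the DFT sum, and by the Goertzel recurrence, which feeds every sample through a second-order filter with one real coefficient. It is a floating-point illustration only, not the mSDF-DFT architecture and not a fixed-point model; the function names and the test signal are assumptions chosen for the example.

```python
import cmath
import math


def dft_bin(x, k):
    """Direct DFT of bin k: sum over n of x[n] * exp(-j*2*pi*k*n/N)."""
    n_pts = len(x)
    return sum(
        x[n] * cmath.exp(-2j * math.pi * k * n / n_pts) for n in range(n_pts)
    )


def goertzel_bin(x, k):
    """Goertzel recurrence for bin k using a single real coefficient."""
    n_pts = len(x)
    omega = 2.0 * math.pi * k / n_pts
    coeff = 2.0 * math.cos(omega)
    s1 = 0.0  # s[n-1]
    s2 = 0.0  # s[n-2]
    for sample in x:
        s0 = sample + coeff * s1 - s2  # real-valued feedback state
        s2, s1 = s1, s0
    # Final complex step: X[k] = exp(j*omega) * s[N-1] - s[N-2]
    return cmath.exp(1j * omega) * s1 - s2


# Hypothetical 128-point test signal with energy in bins 5 and 12.
N_POINTS = 128
signal = [
    math.sin(2 * math.pi * 5 * n / N_POINTS)
    + 0.5 * math.cos(2 * math.pi * 12 * n / N_POINTS)
    for n in range(N_POINTS)
]

for k in (5, 12, 20):
    print(k, abs(dft_bin(signal, k)), abs(goertzel_bin(signal, k)))
```

In double precision the two results agree closely for this short sequence. The instability discussed above concerns fixed-point arithmetic and long inputs, where rounding in the recursive state of the Goertzel filter accumulates; reproducing that effect would require a quantized model, which this sketch does not attempt.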

While the proposed DFT is of interest for current clinical implementations of adaptive DBS, we envision that this architecture can facilitate the investigation of more complex control methods using online encoding and decoding[41], [42], support brain-machine interfaces utilizing spectral decomposition methods[43]–[45], or serve as a plug-in tool for other neural sensing and recording platforms requiring online spectral estimation and/or closed-loop control[46], [47]. This architecture also extends generally to non-medical applications such as space systems and satellite instrumentation[48]–[50], wireless communication systems[51], and Fourier transform-enabled deep learning accelerators[52], [53], or to any application where low power and resource usage is desired.

### VI. CONCLUSION

In this study, we show that an mSDF-DFT architecture can outperform benchmark FFTs and the Goertzel filter and has well-defined use cases in biopotential sensing. Specifically, we observe a 33% reduction in dynamic power and a 4% reduction in resource utilization in a neural sensing application when compared to the SOTA burst I/O FFT. The mSDF-DFT has greater latency than benchmark FFTs but achieves high-accuracy transforms at state-of-the-art low power consumption, making the mSDF-DFT a potent tool for neural sensing applications and beyond.

### CODE AND DATA AVAILABILITY

Raw and processed LFP data can be found at the following Open Science Framework data repository: https://osf.io/fb48z/.

### DISCLOSURES

R.Y., K.A.L., and B.S.C. hold a provisional patent on the technology described in this manuscript. K.A.L. is a co-founder and equity holder of Neuronoff, Inc. K.A.L. is also a co-founder and equity holder of NeuraWorx. K.A.L. is a scientific board member and has stock interests in NeuroOne Medical Inc. K.A.L. is also a paid member of the scientific advisory board of Abbott and Presidio Medical, and a paid consultant for HuMANNity, ONWARD, and Restora Medical. H.D.O. holds patents related to low power FFT implementations. H.D.O. is also a consultant for Inspire Medical Systems. B.S.C. is an unpaid scientific consultant for BECATech Inc.

### ACKNOWLEDGMENTS

The authors would like to thank the following for helpful feedback on this manuscript: James Trevathan PhD, Suyash Bhatt PhD, and Claudia Krogmeier PhD. This study was supported by grants from the National Institutes of Health (NINDS #RF1-NS129955, PI: K.A.L.) and the Hilldale Undergraduate/Faculty Research Fellowship (University of Wisconsin-Madison, R.Y.).

### REFERENCES

- [1] J. W. Cooley and J. W. Tukey, "An algorithm for the machine calculation of complex Fourier series," *Mathematics of Computation*, vol. 19, no. 90, pp. 297–301, 1965, ISSN: 0025-5718, 1088-6842. DOI: 10.1090/S0025-5718-1965-0178586-1. [Online]. Available: https://www.ams.org/mcom/1965-19-090/S0025-5718-1965-0178586-1/.
- [2] A. Oppenheim and C. Weinstein, "Effects of finite register length in digital filtering and the fast Fourier transform," *Proceedings of the IEEE*, vol. 60, no. 8, pp. 957–976, 1972, ISSN: 0018-9219. DOI: 10.1109/PROC.1972.8820. [Online]. Available: http://ieeexplore.ieee.org/document/1450750/.
- [3] C. Weinstein, "Quantization Effects in Digital Filters," Massachusetts Institute of Technology Lincoln Laboratory, Boston, MA, Tech. Rep. 468, 1969, p. 96. [Online]. Available: http://www.dtic.mil/get-tr-doc/pdf?AD=AD0706862.
- [4] H. Groginsky and G. Works, "A Pipeline Fast Fourier Transform," *IEEE Transactions on Computers*, vol. C-19, no. 11, pp. 1015–1019, Nov. 1970, ISSN: 0018-9340. DOI: 10.1109/T-C.1970.222826. [Online]. Available: http://ieeexplore.ieee.org/document/1671419/.
- [5] G. Bergland, "Fast Fourier transform hardware implementations—A survey," *IEEE Transactions on Audio and Electroacoustics*, vol. 17, no. 2, pp. 109–119, Jun. 1969, ISSN: 0018-9278. DOI: 10.1109/TAU.1969.1162048. [Online]. Available: http://ieeexplore.ieee.org/document/1162048/.
- [6] I. E. Harmsen, G. J. Elias, M. E. Beyn, *et al.*, "Clinical trials for deep brain stimulation: Current state of affairs," *Brain Stimulation*, vol. 13, no. 2, pp. 378–385, Mar. 2020, ISSN: 1935861X. DOI: 10.1016/j.brs.2019.11.008. [Online]. Available: https://linkinghub.elsevier.com/retrieve/pii/S1935861X19304668.

- [8] P. Brown, A. Oliviero, P. Mazzone, A. Insola, P. Tonali, and V. Di Lazzaro, "Dopamine Dependency of Oscillations between Subthalamic Nucleus and Pallidum in Parkinson's Disease," *The Journal of Neuroscience*, vol. 21, no. 3, pp. 1033–1038, Feb. 2001, ISSN: 0270-6474, 1529-2401. DOI: 10.1523/JNEUROSCI.21-03-01033.2001. [Online]. Available: https://www.jneurosci.org/lookup/doi/10.1523/JNEUROSCI.21-03-01033.2001.
- [9] A. A. Kuhn, F. Kempf, C. Brucke, *et al.*, "High-Frequency Stimulation of the Subthalamic Nucleus Suppresses Oscillatory Activity in Patients with Parkinson's Disease in Parallel with Improvement in Motor Performance," *Journal of Neuroscience*, vol. 28, no. 24, pp. 6165–6173, Jun. 2008, ISSN: 0270-6474, 1529-2401. DOI: 10.1523/JNEUROSCI.0282-08.2008. [Online]. Available: https://www.jneurosci.org/lookup/doi/10.1523/JNEUROSCI.0282-08.2008.
- [10] H. Bronte-Stewart, C. Barberini, M. M. Koop, B. C. Hill, J. M. Henderson, and B. Wingeier, "The STN beta-band profile in Parkinson's disease is stationary and shows prolonged attenuation after deep brain stimulation," *Experimental Neurology*, vol. 215, no. 1, pp. 20–28, Jan. 2009, ISSN: 00144886. DOI: 10.1016/j.expneurol.2008.09.008. [Online]. Available: https://linkinghub.elsevier.com/retrieve/pii/S0014488608003592.
- [11] C. Güttler, J. Altschüler, K. Tanev, *et al.*, "Levodopa-Induced Dyskinesia Are Mediated by Cortical Gamma Oscillations in Experimental Parkinsonism," *Movement Disorders*, vol. 36, no. 4, pp. 927–937, Apr. 2021, ISSN: 0885-3185, 1531-8257. DOI: 10.1002/mds.28403. [Online]. Available: https://movementdisorders.onlinelibrary.wiley.com/doi/10.1002/mds.28403.
- [12] N. C. Swann, C. De Hemptinne, S. Miocinovic, *et al.*, "Gamma Oscillations in the Hyperkinetic State Detected with Chronic Human Brain Recordings in Parkinson's Disease," *The Journal of Neuroscience*, vol. 36, no. 24, pp. 6445–6458, Jun. 2016, ISSN: 0270-6474, 1529-2401. DOI: 10.1523/JNEUROSCI.1128-16.2016. [Online]. Available: https://www.jneurosci.org/lookup/doi/10.1523/JNEUROSCI.1128-16.2016.
- [13] C. Sarica, C. Iorio-Morin, D. H. Aguirre-Padilla, *et al.*, "Implantable Pulse Generators for Deep Brain Stimulation: Challenges, Complications, and Strategies for Practicality and Longevity," *Frontiers in Human Neuroscience*, vol. 15, p. 708481, Aug. 2021, ISSN: 1662-5161. DOI: 10.3389/fnhum.2021.708481. [Online]. Available: https://www.frontiersin.org/articles/10.3389/fnhum.2021.708481/full.

- [14] M. Hamed, F. Ba, O. Suchowersky, and T. Sankar, "Depleting implanted pulse generator (IPG) battery voltage is associated with worsening clinical symptoms in movement disorder patients receiving Deep brain stimulation (DBS)," *Clinical Parkinsonism & Related Disorders*, vol. 1, pp. 98–99, Nov. 2019, ISSN: 2590-1125. DOI: 10.1016/j.prdoa.2019.11.001. [Online]. Available: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8288559/.
- [15] D. J. Pederson, C. J. Quinkert, M. A. Arafat, *et al.*, "The Bionode: A Closed-Loop Neuromodulation Implant," *ACM Transactions on Embedded Computing Systems*, vol. 18, no. 1, pp. 1–20, Feb. 2019, ISSN: 1539-9087. DOI: 10.1145/3301310. [Online]. Available: https://dl.acm.org/doi/10.1145/3301310.
- [16] J. P. Wright, I. T. Mughrabi, J. Wong, *et al.*, "A fully implantable wireless bidirectional neuromodulation system for mice," *Biosensors and Bioelectronics*, vol. 200, p. 113886, Mar. 2022, ISSN: 09565663. DOI: 10.1016/j.bios.2021.113886. [Online]. Available: https://linkinghub.elsevier.com/retrieve/pii/S0956566321009234.
- [17] J.-Y. Oh and M.-S. Lim, "Fast Fourier transform processor based on low-power and area-efficient algorithm," in *Proceedings of 2004 IEEE Asia-Pacific Conference on Advanced System Integrated Circuits*, Fukuoka, Japan: IEEE, 2004, pp. 198–201, ISBN: 978-0-7803-8637-2. DOI: 10.1109/APASIC.2004.1349448. [Online]. Available: http://ieeexplore.ieee.org/document/1349448/.
- [18] K. Hanumantha Rao and C. K. C. Paul, "Efficient Implementation of Radix-4 Single-path Delay Feedback (SDF) FFT Processors," *Indian Journal of Science and Technology*, vol. 9, no. 12, Mar. 2016, ISSN: 0974-5645, 0974-6846. DOI: 10.17485/ijst/2016/v9i12/73128. [Online]. Available: https://indjst.org/articles/efficient-implementation-of-radix-4-single-path-delay-feedback-sdf-fft-processors.
- [19] S. He and M. Torkelson, "Designing pipeline FFT processor for OFDM (de)modulation," in *1998 URSI International Symposium on Signals, Systems, and Electronics. Conference Proceedings (Cat. No.98EX167)*, Pisa, Italy: IEEE, 1998, pp. 257–262, ISBN: 978-0-7803-4900-1. DOI: 10.1109/ISSSE.1998.738077. [Online]. Available: http://ieeexplore.ieee.org/document/738077/.
- [20] S. He and M. Torkelson, "A new approach to pipeline FFT processor," in *Proceedings of International Conference on Parallel Processing*, Honolulu, HI, USA: IEEE Comput. Soc. Press, 1996, pp. 766–770, ISBN: 978-0-8186-7255-2. DOI: 10.1109/IPPS.1996.508145. [Online]. Available: http://ieeexplore.ieee.org/document/508145/.
- [21] P. T. L. Pereira, P. U. L. D. Costa, G. D. C. Ferreira, et al., "Energy-Quality Scalable Design Space Exploration of Approximate FFT Hardware Architectures," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 69, no. 11, pp. 4524–4534, Nov. 2022,

- [22] M. Safarpour and O. Silvén, "LoFFT: Low-Voltage FFT Using Lightweight Fault Detection for Energy Efficiency," *IEEE Embedded Systems Letters*, vol. 15, no. 3, pp. 125–128, Sep. 2023, ISSN: 1943-0663, 1943-0671. DOI: 10.1109/LES.2022.3212776. [Online]. Available: https://ieeexplore.ieee.org/document/9913652/.
- [23] G. Goertzel, "An Algorithm for the Evaluation of Finite Trigonometric Series," *The American Mathematical Monthly*, vol. 65, no. 1, pp. 34–35, 1958. [Online]. Available: http://www.jstor.org/stable/2310304.
- [24] M. Garrido, "A Survey on Pipelined FFT Hardware Architectures," *Journal of Signal Processing Systems*, vol. 94, no. 11, pp. 1345–1364, Nov. 2022, ISSN: 1939-8018, 1939-8115. DOI: 10.1007/s11265-021-01655-1. [Online]. Available: https://link.springer.com/10.1007/s11265-021-01655-1.
- [25] B. S. Coventry, G. L. Lawlor, C. B. Bagnati, C. Krogmeier, and E. L. Bartlett, "Characterization and closed-loop control of infrared thalamocortical stimulation produces spatially constrained single-unit responses," *PNAS Nexus*, vol. 3, no. 2, pgae082, Feb. 2024, ISSN: 2752-6542. DOI: 10.1093/pnasnexus/pgae082.
- [26] A. D. Izzo, C. P. Richter, E. D. Jansen, and J. T. Walsh, "Laser stimulation of the auditory nerve," *Lasers in Surgery and Medicine*, vol. 38, no. 8, pp. 745–753, 2006, ISSN: 0196-8092. DOI: 10.1002/lsm.20358.
- [27] J. Wells, C. Kao, K. Mariappan, *et al.*, "Optical stimulation of neural tissue in vivo," *Optics Letters*, vol. 30, no. 5, p. 504, Mar. 2005, ISSN: 0146-9592. DOI: 10.1364/OL.30.000504. [Online]. Available: http://www.opticsinfobase.org/abstract.cfm?URI=ol-30-5-504.
- [28] B. S. Coventry, J. T. Sick, T. M. Talavage, K. M. Stantz, and E. L. Bartlett, "Short-wave Infrared Neural Stimulation Drives Graded Sciatic Nerve Activation Across A Continuum of Wavelengths," in *2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC)*, IEEE, Jul. 2020, ISBN: 978-1-72811-990-8. DOI: 10.1109/EMBC44109.2020.9176177.
- [29] L. Pan, A. Ping, K. E. Schriver, A. W. Roe, J. Zhu, and K. Xu, "Infrared Neural Stimulation in Human Cerebral Cortex," *Brain Stimulation*, vol. 16, no. 2, pp. 418–430, 2023. DOI: 10.1016/j.brs.2023.01.1678.
- [30] S. He, F. Baig, A. Merla, *et al.*, "Beta-triggered adaptive deep brain stimulation during reaching movement in Parkinson's disease," *Brain*, vol. 146, no. 12, pp. 5015–5030, Dec. 2023, ISSN: 0006-8950, 1460-2156. DOI: 10.1093/brain/awad233. [Online]. Available: https://academic.oup.com/brain/article/146/12/5015/7222858.
- [31] S. L. Schmidt, A. H. Chowdhury, K. T. Mitchell, *et al.*, "At home adaptive dual target deep brain stimulation in Parkinson disease with proportional control," *Brain*, awad429, Dec. 2023, ISSN: 0006-8950, 1460-2156. DOI: 10.1093/brain/awad429. [Online]. Available: https://academic.oup.com/brain/advance-article/doi/10.1093/brain/awad429/7490826.
- [32] R. D'Agostino and E. S. Pearson, "Tests for Departure from Normality. Empirical Results for the Distributions of $b_2$ and $\sqrt{b_1}$," *Biometrika*, vol. 60, no. 3, p. 613, Dec. 1973, ISSN: 00063444. DOI: 10.2307/2335012. [Online]. Available: https://www.jstor.org/stable/2335012?origin=crossref.
- [33] C. Willmott and K. Matsuura, "Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance," *Climate Research*, vol. 30, pp. 79–82, 2005, ISSN: 0936-577X, 1616-1572. DOI: 10.3354/cr030079. [Online]. Available: http://www.int-res.com/abstracts/cr/v30/n1/p79-82/.
- [34] R. Alt, "Error propagation in Fourier transforms," *Mathematics and Computers in Simulation*, vol. 20, no. 1, pp. 37–43, Mar. 1978, ISSN: 03784754. DOI: 10.1016/0378-4754(78)90052-6. [Online]. Available: https://linkinghub.elsevier.com/retrieve/pii/0378475478900526.
- [35] J. Herron, S. Stanslaski, T. Chouinard, R. Corey, T. Denison, and H. Orser, "Bi-directional brain interfacing instrumentation," in *2018 IEEE International Instrumentation and Measurement Technology Conference (I2MTC)*, Houston, TX, USA: IEEE, May 2018, pp. 1–6, ISBN: 978-1-5386-2222-3. DOI: 10.1109/I2MTC.2018.8409795. [Online]. Available: https://ieeexplore.ieee.org/document/8409795/.
- [36] J. Navajas, D. Y. Barsakcioglu, A. Eftekhar, A. Jackson, T. G. Constandinou, and R. Quian Quiroga, "Minimum requirements for accurate and efficient real-time on-chip spike sorting," *Journal of Neuroscience Methods*, vol. 230, pp. 51–64, Jun. 2014, ISSN: 01650270. DOI: 10.1016/j.jneumeth.2014.04.018. [Online]. Available: https://linkinghub.elsevier.com/retrieve/pii/S0165027014001344.
- [37] K. A. Davis, S. P. Devries, A. Krieger, *et al.*, "The effect of increased intracranial EEG sampling rates in clinical practice," *Clinical Neurophysiology*, vol. 129, no. 2, pp. 360–367, Feb. 2018, ISSN: 13882457. DOI: 10.1016/j.clinph.2017.10.039. [Online]. Available: https://linkinghub.elsevier.com/retrieve/pii/S1388245717311471.
- [38] X. Liang, J. Chen, D. Tao, *et al.*, "Correcting soft errors online in fast Fourier transform," in *Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis*, Denver, Colorado: ACM, Nov. 2017, pp. 1–12, ISBN: 978-1-4503-5114-0. DOI: 10.1145/3126908.3126915. [Online]. Available: https://dl.acm.org/doi/10.1145/3126908.3126915.
- [39] S.-J. Wang and N. Jha, "Algorithm-based fault tolerance for FFT networks," *IEEE Transactions on Computers*, vol. 43, no. 7, pp. 849–854, Jul. 1994, ISSN: 00189340. DOI: 10.1109/12.293265. [Online]. Available: http://ieeexplore.ieee.org/document/293265/.
- [40] W. M. Gentleman, "An error analysis of Goertzel's (Watt's) method for computing Fourier coefficients," *The Computer Journal*, vol. 12, no. 2, pp. 160–164, Feb. 1969, ISSN: 0010-4620, 1460-2067. DOI: 10.1093/comjnl/12.2.160. [Online]. Available: https://academic.oup.com/comjnl/article-lookup/doi/10.1093/comjnl/12.2.160.
- [41] I. Basu, A. Yousefi, B. Crocker, *et al.*, "Closed-loop enhancement and neural decoding of cognitive control in humans," *Nature Biomedical Engineering*, vol. 7, no. 4, pp. 576–588, Nov. 2021, ISSN: 2157-846X. DOI: 10.1038/s41551-021-00804-y. [Online]. Available: https://www.nature.com/articles/s41551-021-00804-y.
- [42] B. S. Coventry and E. L. Bartlett, "Closed-Loop Reinforcement Learning Based Deep Brain Stimulation Using SpikerNet: A Computational Model," in *11th International IEEE EMBS Conference on Neural Engineering*, Baltimore, Maryland USA, Apr. 2023, pp. 1–4. DOI: 10.1109/NER52421.2023.10123797.
- [43] V. G. Von Groll, N. Leeuwis, S. Rimbert, *et al.*, "Large scale investigation of the effect of gender on mu rhythm suppression in motor imagery brain-computer interfaces," *Brain-Computer Interfaces*, pp. 1–11, May 2024, ISSN: 2326-263X, 2326-2621. DOI: 10.1080/2326263X.2024.2345449. [Online]. Available: https://www.tandfonline.com/doi/full/10.1080/2326263X.2024.2345449.
- [44] A. Kamble, P. Ghare, and V. Kumar, "Machine-learning-enabled adaptive signal decomposition for a brain-computer interface using EEG," *Biomedical Signal Processing and Control*, vol. 74, p. 103526, Apr. 2022, ISSN: 17468094. DOI: 10.1016/j.bspc.2022.103526. [Online]. Available: https://linkinghub.elsevier.com/retrieve/pii/S1746809422000489.
- [45] A. Palumbo, F. Amato, B. Calabrese, *et al.*, "An Embedded System for EEG Acquisition and Processing for Brain Computer Interface Applications," in *Wearable and Autonomous Biomedical Devices and Systems for Smart Environment*, A. Lay-Ekuakille and S. C. Mukhopadhyay, Eds., vol. 75, Lecture Notes in Electrical Engineering, Berlin, Heidelberg: Springer Berlin Heidelberg, 2010, pp. 137–154, ISBN: 978-3-642-15686-1 978-3-642-15687-8. DOI: 10.1007/978-3-642-15687-8\_7. [Online]. Available: http://link.springer.com/10.1007/978-3-642-15687-8\_7.
- [46] N. Lopresto, P. Cao, L. J. Koerner, and H. Orser, "Design of a Configurable 16-Electrode Sense and Stimulation Neuromodulation System," in *2023 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC)*, Sydney, Australia: IEEE, Jul. 2023, pp. 1–5, ISBN: 9798350324471. DOI: 10.1109/EMBC40787.2023.10340821. [Online]. Available: https://ieeexplore.ieee.org/document/10340821/.
- [47] A. E. Mendrela, J. Cho, J. Fredenberg, *et al.*, "A Bidirectional Neural Interface Circuit With Active Stimulation Artifact Cancellation and Cross-Channel Common-Mode Noise Suppression," *IEEE Journal of Solid-State Circuits*, vol. 51, no. 4, pp. 955–965, Apr. 2016, ISSN: 0018-9200, 1558-173X. DOI: 10.1109/JSSC.2015.2506651. [Online]. Available: http://ieeexplore.ieee.org/document/7370773/.
- [48] B. T. Fleming, K. C. France, J. Williams, *et al.*, "High-sensitivity far-ultraviolet imaging spectroscopy with the SPRITE Cubesat," in *UV, X-Ray, and Gamma-Ray Space Instrumentation for Astronomy XXI*, O. H. Siegmund, Ed., San Diego, United States: SPIE, Sep. 2019, p. 29, ISBN: 978-1-5106-2929-5 978-1-5106-2930-1. DOI: 10.1117/12.2529512. [Online]. Available: https://www.spiedigitallibrary.org/conference-proceedings-of-spie/11118/2529512/High-sensitivity-far-ultraviolet-imaging-spectroscopy-with-the-SPRITE-Cubesat/10.1117/12.2529512.full.
- [49] M. J. Persky, "A review of spaceborne infrared Fourier transform spectrometers for remote sensing," *Review of Scientific Instruments*, vol. 66, no. 10, pp. 4763–4797, Oct. 1995, ISSN: 0034-6748, 1089-7623. DOI: 10.1063/1.1146154. [Online]. Available: https://pubs.aip.org/rsi/article/66/10/4763/345985/A-review-of-spaceborne-infrared-Fourier-transform (visited on 02/10/2025).
- [50] F. Friedl-Vallon, T. Gulde, F. Hase, *et al.*, "Instrument concept of the imaging Fourier transform spectrometer GLORIA," *Atmospheric Measurement Techniques*, vol. 7, no. 10, pp. 3565–3577, Oct. 2014, ISSN: 1867-8548. DOI: 10.5194/amt-7-3565-2014. [Online]. Available: https://amt.copernicus.org/articles/7/3565/2014/.
- [51] C. Yu, M.-H. Yen, P.-A. Hsiung, and S.-J. Chen, "A low-power 64-point pipeline FFT/IFFT processor for OFDM applications," *IEEE Transactions on Consumer Electronics*, vol. 57, no. 1, pp. 40–40, Feb. 2011, ISSN: 0098-3063. DOI: 10.1109/TCE.2011.5735479. [Online]. Available: http://ieeexplore.ieee.org/document/5735479/.
- [52] Z. Hu, S. Li, R. L. T. Schwartz, *et al.*, "Batch processing and data streaming Fourier-based convolutional neural network accelerator," in *Emerging Topics in Artificial Intelligence (ETAI) 2022*, G. Volpe, J. B. Pereira, D. Brunner, and A. Ozcan, Eds., San Diego, United States: SPIE, Oct. 2022, p. 58, ISBN: 978-1-5106-5392-4 978-1-5106-5393-1. DOI: 10.1117/12.2633917. [Online]. Available: https://www.spiedigitallibrary.org/conference-proceedings-of-spie/12204/2633917/Batch-processing-and-data-streaming-Fourier-based-convolutional-neural-network/10.1117/12.2633917.full.
- [53] T. Abtahi, C. Shea, A. Kulkarni, and T. Mohsenin, "Accelerating Convolutional Neural Network With FFT on Embedded Hardware," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 26, no. 9, pp. 1737–1749, Sep. 2018, ISSN: 1063-8210, 1557-9999. DOI: 10.1109/TVLSI.2018.2825145. [Online]. Available: https://ieeexplore.ieee.org/document/8392465/.

**Richard Yang** is an undergraduate research assistant at the Wisconsin Institute for Translational Neuroengineering. He is currently pursuing a bachelor's degree in biomedical engineering and computer science at the University of Wisconsin-Madison.

**Heather D. Orser** received her BSEE from Minnesota State University, Mankato and her MSEE and PhD from the University of Minnesota. She is currently an assistant professor of Electrical and Computer Engineering at the University of St Thomas, St. Paul, MN.

Prior to her time at St Thomas, Heather worked in the development of implantable neuromodulation systems at both Inspire Medical and Medtronic where she led the development of a number of next-generation systems and successfully assessed the safety of implantable devices for patients undergoing MRIs. Her research focuses on the development of neuromodulation systems for use in research and the clinic.

**Kip A. Ludwig** received his bachelor's degree in Biomedical Engineering from Arizona State University, and his Masters and PhD in Biomedical Engineering from the University of Michigan. He is currently an Associate Professor in the Departments of Neurological Surgery and Surgery at the University of Wisconsin-Madison, and the Co-Director for the Wisconsin Institute for Translational Neuroengineering (WITNe).

Prior to his time at UW-Madison, Dr. Ludwig worked in both industry and government. While at CVRx® he led the development of the 'Neo' electrode to treat hypertension and heart failure, which has subsequently been FDA PMA approved. He also co-led the translational devices program at the National Institute for Neurological Disorders and Stroke, and led the trans-NIH translational neurotechnology programs under the NIH SPARC and BRAIN Initiatives. His research focuses on accelerating the path to translation for next-generation devices to hack the nervous system to treat a variety of diseases/disorders inadequately managed by drugs/biologics.

**Brandon S. Coventry** received a bachelor's degree in electrical engineering from Saint Louis University, a master's degree in electrical and computer engineering from Purdue University, and a PhD in biomedical engineering from Purdue University. He is currently a postdoctoral research associate at the University of Wisconsin-Madison, WI, USA, in the Department of Neurological Surgery and the Wisconsin Institute for Translational Neuroengineering. His research interests include the design of next-generation implantable pulse generators, mechanisms of deep brain stimulation, translation of neurotechnologies, and thalamic neuroscience.