New firmware for the Freak!
- Leonardo
- 18 hours ago
- 8 min read
Over the past few months, I’ve made progress developing a new testing platform to help verify the changes I made to the Freak firmware.
In this blog post, I’ll discuss the improvements in this release. Since most of the work focuses on code stability and robustness, I'll present these updates alongside the new testing platform I’ve been building.
Firmware versions 3.3 and 3.4 introduced bugs due to major changes I made. At my day job, we have a dedicated tester who thoroughly stresses every update, often finding edge cases developers might overlook. He’s also responsible for maintaining the testing infrastructure — a server room full of computers running tens of thousands of tests across all supported platforms. Thanks to this setup, we deliver high-quality products with relatively few bugs.
At Vult, I don’t have access to such large-scale resources. Initially, the project's size didn't seem to justify a big investment in testing. Manual testing was usually enough and could be done relatively quickly.
This approach worked fine for the Freak firmware — until I made a large number of changes. I replaced most of the Lua code that handled user interaction with Vult code. The motivation behind this change was that Lua consumed a significant amount of memory, while the Vult code is much faster and more efficient, freeing up resources for new features.
After the issues I described in earlier posts, several of you reached out with valuable advice, especially regarding testing strategies. Thanks to your feedback, I decided to start developing a proper testing platform for the Freak and any future projects.
The platform is split into two parts:
- One for testing code running on a computer.
- One for testing code running on the actual hardware module.
Testing code changes
Testing code on a computer is relatively straightforward. I can simulate the software side by providing input values, executing the code, and comparing the output to expected results. However, since we are dealing with audio, we need to run the code and capture a meaningful amount of data, usually as small WAV files. These recordings are then compared against reference files — if they differ beyond an acceptable margin, the test fails.
Comparing audio files has its challenges. A naive method would be to check each data point and flag any differences. But even small, acceptable changes, like a slight volume adjustment, would cause this method to fail. To avoid this, we can introduce a tolerance margin. For example, if two points differ by less than 1%, the test is considered successful.
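A minimal sketch of such a tolerance check might look like the following (the `samples_match` helper and its 1% default are illustrative; the actual comparisons are done in Mathematica):

```python
import numpy as np

def samples_match(reference, candidate, tolerance=0.01):
    """Point-by-point comparison with a tolerance margin.

    Passes when every pair of samples differs by less than `tolerance`
    (1% by default) of the reference signal's full scale.
    """
    reference = np.asarray(reference, dtype=np.float64)
    candidate = np.asarray(candidate, dtype=np.float64)
    if reference.shape != candidate.shape:
        return False
    scale = float(np.max(np.abs(reference))) or 1.0  # avoid dividing by zero on silence
    return bool(np.max(np.abs(reference - candidate)) / scale < tolerance)

# A small gain change is accepted; a sign flip is not.
t = np.linspace(0.0, 1.0, 1000)
sine = np.sin(2 * np.pi * 5.0 * t)
print(samples_match(sine, sine * 1.005))  # True
print(samples_match(sine, -sine))         # False
```

As the example shows, a 0.5% volume change stays within the margin, while a waveform inversion fails even though it sounds identical, which is exactly the weakness the next section addresses.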
Still, a simple point-by-point comparison doesn't handle differences like phase shifts or changes in harmonic content well. These variations can produce large changes in the waveform while making a negligible difference to the sound. To address this, I also compare the audio in the frequency domain using FFT analysis. This lets me focus on meaningful differences, like changes in audible harmonics, while ignoring irrelevant variations.
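The frequency-domain comparison can be sketched the same way (again a hypothetical Python helper, not the actual Mathematica analysis): comparing only the magnitude spectra makes the check insensitive to phase.

```python
import numpy as np

def harmonic_deviation(reference, candidate):
    """Largest difference between the magnitude spectra of two
    recordings, scaled so a full-scale sine peaks near 1.0.
    Phase shifts vanish because only magnitudes are compared."""
    n = min(len(reference), len(candidate))
    window = np.hanning(n)
    norm = np.sum(window) / 2.0  # a unit-amplitude sine then peaks near 1.0
    spec_ref = np.abs(np.fft.rfft(reference[:n] * window)) / norm
    spec_can = np.abs(np.fft.rfft(candidate[:n] * window)) / norm
    return float(np.max(np.abs(spec_ref - spec_can)))

t = np.arange(4096) / 48000.0
a = np.sin(2 * np.pi * 440.0 * t)
b = np.sin(2 * np.pi * 440.0 * t + 1.5)  # same tone, phase-shifted
print(harmonic_deviation(a, b) < 0.05)   # the phase shift stays under the threshold
```

A phase-shifted copy of the same tone would fail the point-by-point check badly, but here it produces almost no deviation, which matches what the ear hears.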
Each model in the Freak must be tested individually, across a range of parameter values. Thankfully, all models rely on just three main parameters, so testing 10 values for each parameter results in 1,000 combinations per model.
To create the reference data, I used version 3.2 of the firmware — the last "stable" version before the recent fixes. I wrapped each model in a script that automatically feeds inputs and varies parameters, generating about 600 MB of WAV files. Then, I repeated the process using the latest code to generate new output files.
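The sweep-and-render step is straightforward to sketch. In this hypothetical version, `render` is a stand-in that synthesizes a tone from the three parameters; the real scripts run the firmware code itself.

```python
import itertools
import wave as wavfile
import numpy as np

# Hypothetical stand-in for running one Freak model over a fixed input;
# the real scripts drive the actual firmware code.
def render(cutoff, resonance, drive, n=4800, rate=48000):
    t = np.arange(n) / rate
    tone = np.sin(2 * np.pi * (100.0 + cutoff * 900.0) * t)
    return np.tanh(1.0 + drive * 4.0) * tone * (1.0 - 0.5 * resonance)

steps = np.linspace(0.0, 1.0, 10)               # 10 values per parameter
combos = list(itertools.product(steps, repeat=3))
print(len(combos))                               # 10^3 = 1000 combinations

# Write the first combination as a 16-bit mono WAV reference file.
samples = render(*combos[0])
with wavfile.open("ref_0000.wav", "wb") as f:
    f.setnchannels(1)
    f.setsampwidth(2)
    f.setframerate(48000)
    f.writeframes((np.clip(samples, -1, 1) * 32767).astype("<i2").tobytes())
```

Running the same sweep against both the old and the new code yields two directly comparable sets of files.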
To process and compare the files, I used Mathematica, which is ideal for this type of task thanks to its powerful built-in functionality.
Generating a complete report with data analysis and images takes around one minute. For models where all tests match, the result is a very "boring" looking histogram — exactly what you want to see when everything is working correctly!

This means that across all 1000 test points, the maximum deviation for any harmonic is less than 0.05 in linear amplitude.
In one case, I had to modify Tangents-XX to fix an issue that caused the filter to stop producing sound. The fix required changes to the nonlinear solver, which introduced minor simulation differences under certain operating conditions.

We have around 300 test points where the difference exceeds 0.05. In these cases, I generate plots to better visualize the discrepancies.
For example, the plot below shows a case where the difference is visually small, but one harmonic still crosses the threshold. The graph displays the corresponding values of Cutoff, Resonance, and Drive at that point. By using these parameter values, I can replay the audio produced by both the old and the new code, listen carefully, and decide whether the variation is acceptable or if it indicates a deeper issue that needs further investigation.

The plot below highlights one of the larger differences caused by the changes in the solver. In this case, the new results are actually more accurate.

When comparing the audio from both versions, the differences were extremely subtle — barely noticeable to the ear. However, the new solution is mathematically better, and more importantly, the solver no longer crashes under these conditions.
This method is excellent for ensuring that the models remain consistent and for measuring the impact of changes in auxiliary libraries. However, it’s not very effective at detecting edge cases, since the input signal remains constant and we only check a set of fixed points.
For example, in the 1000 test points run on Tangents-XX, none of them triggered a crash, so this method didn’t help uncover the instability issue.
To catch these kinds of problems, I use a different testing approach based on randomness.
Random Testing
When I receive a bug report for the Freak module, I usually ask for as much information as possible — ideally including a video showing exactly how the problem occurred. With that information, I try to reproduce the error so I can identify the cause and fix it. However, sometimes I’m unable to trigger the exact conditions needed for the issue to appear.
To catch bugs more proactively, I’ve been using a different method inside VCV Rack. I created a simple module called Stress, whose sole purpose is to generate random signals in different ways — using noise, standard waveforms, signals with varying harmonic content, and stepped signals. When combined with the Stoermelder module Strip, it’s possible to perform a kind of “Monkey Testing,” where all the knobs on a module are randomly moved and all kinds of unpredictable signals are injected into every input.
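A toy version of such a signal generator might look like this (the mode names are illustrative, not the actual controls of the Stress module):

```python
import math
import random

def stress_signal(rng, mode, n=64):
    """One block of a random test signal, in the spirit of the Stress
    module: noise, a standard waveform, or a stepped (held) value."""
    if mode == "noise":
        return [rng.uniform(-1.0, 1.0) for _ in range(n)]
    if mode == "sine":
        freq = rng.uniform(0.001, 0.4)  # cycles per sample
        return [math.sin(2 * math.pi * freq * i) for i in range(n)]
    if mode == "steps":
        level = rng.uniform(-1.0, 1.0)
        return [level] * n
    raise ValueError(mode)

rng = random.Random(0)
mode = rng.choice(["noise", "sine", "steps"])
block = stress_signal(rng, mode)
print(mode, len(block))
```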
Here’s a short video demonstrating this approach:
I can leave this setup running for as long as needed and check afterward if the module experienced any failures. This method has already helped me discover issues in other modules. However, while it tells me if a module crashes, it doesn’t reveal how to reproduce the problem.
To improve on this, specifically for the Freak firmware, I wrapped the code inside a function that performs random testing while capturing more detailed information. Every 128 samples, the function takes a snapshot of the model’s internal state. If an error occurs, the test automatically stops and saves all the data needed to recreate the issue during a debug session.
For this method to work, both the Freak firmware and the test patterns must be fully deterministic — meaning they behave the same way each time — so that the exact steps leading to the failure can be repeated.
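The idea can be sketched as follows. Here `ToyModel` is a deliberately fragile stand-in for a real Freak model, and the snapshot interval of 128 samples comes straight from the description above; everything else is hypothetical.

```python
import random

BLOCK = 128  # take one snapshot of the internal state per block of samples

def stress_test(model, blocks=1000, seed=1234):
    """Deterministic random test: the fixed seed makes the whole input
    sequence reproducible, so a failure can be replayed exactly."""
    rng = random.Random(seed)
    snapshots = []
    for block in range(blocks):
        snapshots.append((block, model.state()))
        for _ in range(BLOCK):
            try:
                model.process(rng.uniform(-1.0, 1.0))
            except Exception as err:
                # Everything needed to recreate the failure in a debugger.
                return {"seed": seed, "block": block,
                        "last_state": snapshots[-1], "error": repr(err)}
    return None

class ToyModel:
    """Deliberately fragile stand-in for a Freak model."""
    def __init__(self):
        self.acc = 0.0
    def state(self):
        return self.acc
    def process(self, x):
        self.acc = 0.99 * self.acc + x
        if x > 0.999:  # rare input that triggers the "bug"
            raise RuntimeError("numerical blow-up")

report = stress_test(ToyModel())
if report is not None:
    print("failure at block", report["block"], "with seed", report["seed"])
```

Because the generator is seeded, running the test again reproduces the exact same failure, which is the whole point of keeping everything deterministic.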
Using this approach, I was able to fix an issue in the Nurage model — a problem that no one had reported yet and that I hadn’t been able to trigger through regular testing.
Testing Changes on the Hardware
So far, all testing has been done using the unmodified Freak code compiled for my daily-use computers — either an ARM-based Mac or an AMD64 Linux machine. However, the actual Freak module runs on an ARM Cortex-M4 processor. Even though both are ARM architectures, there are significant differences between them. Additionally, the compilers are different: GCC for the Cortex-M4 and Apple Clang for the Mac M1. Because of these differences, there’s no guarantee that the code will behave the same way on the actual hardware.
To gain more certainty, I need to run tests directly on the hardware itself. This is more challenging because the microcontroller has limitations — I can’t simply load all the testing code onto the module, run the tests, and transfer the results back.
The best approach is to create an automated system that can control a Freak module and perform tests. To achieve this, I designed a special board that acts like a "virtual user," capable of turning knobs, sending signals, and pressing buttons.
Here’s a photo of one of the early prototypes:

The board includes several digital potentiometers, electronically controlled switches, and both analog (low-frequency) and audio inputs and outputs. It connects to a special version of the Freak module that has been modified with external connectors. Using this setup, I can remotely adjust any setting on the Freak, send audio signals, and capture the output for analysis.
This part of the testing platform is still a work in progress. There are several features I still want to add, and I haven’t yet completed the result processing system. However, I expect to have everything ready in the coming months.
Once this new board is fully operational, I’ll be able to use it not only for testing Freak but also for any upcoming modules.
Performance Testing
As part of the testing platform, I also included performance measurements. During each test, I collect timing data for every model, allowing me to compare their performance against a reference version.
After processing the results, I generate plots like the one below, which show how the new code compares to the reference.

The issue with this graph is that the measurements aren’t completely reliable. There are many factors that can affect timing results: the operating system and processor may behave slightly differently depending on other tasks running in the background.
We can see an example of this in the measurements for “Debriatus_1,” which appears much slower than the reference — even though no changes were made to its code.
One way to improve reliability would be to run the performance tests many times and perform a statistical analysis on the results. However, instead of taking that approach, I plan to run the performance tests directly on the hardware module. Testing on the actual device will give me a much clearer and more accurate picture of how the code behaves in real-world conditions.
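A statistical version of the measurement could be as simple as taking the median over repeated runs, which discards outliers caused by background activity (`render_block` is a hypothetical stand-in for one model's processing):

```python
import statistics
import time

def benchmark(fn, runs=20):
    """Time `fn` repeatedly and keep the median, which is far less
    sensitive to background tasks than a single measurement."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return statistics.median(times)

# Hypothetical stand-in for rendering one block of audio with a model.
def render_block():
    acc = 0.0
    for i in range(10000):
        acc += i * 1e-9
    return acc

new = benchmark(render_block)
ref = benchmark(render_block)
print(f"relative speed: {new / ref:.2f}x")
```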
The New Firmware is Available
The latest version of the Freak firmware is now available for download. As mentioned earlier, this release focuses on improving stability and robustness. If you've experienced crashes with your module, this update is highly recommended.
This update is also essential for users with newer revisions of the Freak module, as some internal components have changed. If you've noticed any strange behavior with your module, please make sure to update.
One important improvement that may be a big deal for some users is the new bootloader.
Improvements to the Bootloader
If you're unfamiliar, the bootloader is a special program that runs when you update the Freak firmware. It handles receiving the audio file and overwriting the old firmware. The bootloader is designed to always be available — even if an update fails, you can still access it and try again.
Until now, the bootloader hadn't changed since the first Freak modules shipped with firmware version 1.0. Over time, I collected a few improvements I wanted to make. The most important one is that the new bootloader is more reliable when updating with lower audio signal levels. In fact, I can now update my module directly from my computer’s audio output without any issues.
Starting with this release, the bootloader will automatically update the next time you install new firmware. If you're unsure which bootloader you have, you can check during firmware update mode: the screen will display v2.0 for the new version.
Where to Get the New Firmware
You can find the latest firmware on the Freak product page: https://vult-dsp.com/vult
You can also find a collection of all previous firmware releases on the Freak GitHub page. If you have a GitHub account, you can subscribe to the repository to get notified whenever new updates are available.
What About Vraids?
It’s been a while since I last updated the Vraids firmware. I’m currently preparing a new release, mainly focused on making it compatible with the latest revisions of the Freak hardware. If you recently purchased a Freak module, please note that the old Vraids firmware might not work correctly until this update is available.