How much performance gain can we expect from WebAssembly? In this post I'll try to compare the performance of two implementations of the McLeod pitch tracking algorithm: one in JavaScript and one in Rust compiled to WASM.
I remember doing some research on real-time pitch detection on Android a few years ago, and back then Java was too slow for it. It had to be done natively in C or C++ and invoked from Java via JNI. As I remember, I tested some Java code I found on the internet, and it wasn't usable at all.
Seeing the TarsosDSP framework, I wondered how it would work on the web compiled to WebAssembly. That would also make an interesting exercise, since I had started learning Rust. So I rewrote it in Rust. After that, I wondered how it would work in plain JavaScript using a WebWorker. Are we going to gain much from WASM or not? Let's see.
Note that this implementation is not production ready. The Web Audio API can already provide frequency data using the browser's native FFT implementation, which should be faster than the autocorrelation this algorithm relies on. And for FFT in WebAssembly, FFTW seems like the best choice.
Implementing McLeod pitch detection algorithm in Rust with reference from TarsosDSP
The hardest part was getting started. How do you write a pitch tracking algorithm? I'm a web developer, and I don't have experience in DSP apart from some MATLAB work we did at university. This Stack Overflow answer helped me get started. The idea is to write a simple sine wave generator, get your pitch detector working with it, then add some harmonics and make sure the pitch detector still outputs the correct value.
I used the synthrs crate to generate sine wave samples and built a simple custom Rust module on top of it with my own logic for combining multiple sine waves as harmonics and changing the volume over time.
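The test-signal idea above can be sketched in plain Rust. This is not the synthrs API or the project's module: `harmonic_tone` is a hypothetical helper that sums the first `harmonics` partials of a fundamental, with an assumed amplitude of 1/k for the k-th harmonic, and fades the volume linearly towards zero to imitate a decaying note.

```rust
use std::f32::consts::PI;

/// Hypothetical test-signal generator (not the synthrs API).
/// Sums `harmonics` sine partials of `fundamental` Hz, where the k-th partial
/// has an assumed amplitude of 1/k, and applies a linear fade-out envelope.
fn harmonic_tone(fundamental: f32, harmonics: usize, sample_rate: f32, len: usize) -> Vec<f32> {
    (0..len)
        .map(|i| {
            let t = i as f32 / sample_rate;
            // Volume falls linearly from 1.0 at the first sample towards 0.0.
            let volume = 1.0 - i as f32 / len as f32;
            let sum: f32 = (1..=harmonics)
                .map(|k| {
                    let k = k as f32;
                    (2.0 * PI * fundamental * k * t).sin() / k
                })
                .sum();
            volume * sum
        })
        .collect()
}
```

With `harmonics` set to 1 it produces a plain sine wave, the first signal to get the detector working on; raising it adds the kind of overtones a real string produces.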
My experience with Rust has been very positive so far. I went through the first half of the official book, which is enough for this type of application.
After writing the synth module, I tried playing those samples, and they sounded similar to a string instrument. So the synth module works fine, and it's time to write some tests.
I wrote a few tests, but eventually removed all but one, which tests samples ranging from 100 Hz to 1 kHz in steps of 100 Hz. It checks that every output of the pitch detector is within ±0.3 Hz of the expected frequency. It might be better to use a percentage of the tested frequency as the threshold…
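A minimal sketch of that sweep and of the two threshold options, assuming the detector can be wrapped in a closure that maps a test frequency to a detected frequency. `failing_frequencies`, `within_abs` and `within_rel` are hypothetical names, not the project's test code. It also shows why the fixed threshold is uneven: ±0.3 Hz is 0.3% of 100 Hz but only 0.03% of 1 kHz.

```rust
/// The sweep described above: 100 Hz to 1 kHz in steps of 100 Hz.
fn sweep_frequencies() -> Vec<f32> {
    (1..=10).map(|k| k as f32 * 100.0).collect()
}

/// Fixed threshold, as in the test: |detected - expected| <= tol_hz.
fn within_abs(expected: f32, detected: f32, tol_hz: f32) -> bool {
    (detected - expected).abs() <= tol_hz
}

/// Percentage-style threshold: the allowed error scales with the expected frequency.
fn within_rel(expected: f32, detected: f32, tol_fraction: f32) -> bool {
    (detected - expected).abs() <= tol_fraction * expected
}

/// Runs `detect` for every sweep frequency and returns those failing `ok`.
fn failing_frequencies<D, C>(detect: D, ok: C) -> Vec<f32>
where
    D: Fn(f32) -> f32,
    C: Fn(f32, f32) -> bool,
{
    sweep_frequencies()
        .into_iter()
        .filter(|&f| !ok(f, detect(f)))
        .collect()
}
```

In a real test, `detect` would generate a buffer at the given frequency with the synth module and run the pitch detector on it; an empty result means every frequency passed.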
Once I had a working detector, it was time to test performance. Rust has a micro-benchmarking tool integrated into the Cargo package manager (cargo bench).
I added benchmarks for the NSD (normalized square difference) function, which is the slowest one here, with different buffer sizes: 1024, 2048, 4096 and 8192 samples. The function has quadratic complexity in the buffer size, so a 1024-sample buffer seems like the best choice, even though it will be less accurate for lower tones. A 512-sample buffer would make it impossible to detect the low E string of a guitar: at about 82.4 Hz and a 44.1 kHz sampling rate, one period is roughly 535 samples, longer than the whole buffer. For more information about the algorithm, check the paper A Smarter Way to Find Pitch by Philip McLeod and Geoff Wyvill.
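To make the complexity concrete, here is a direct, unoptimized sketch of the NSD written from the paper's definition (a sketch, not a copy of TarsosDSP or of the project's code). For each lag τ it sums over the W − τ overlapping samples, so the total work is W(W + 1)/2 inner iterations: doubling the buffer size roughly quadruples the running time, which the benchmarks across 1024 to 8192 samples should reflect.

```rust
/// Normalized square difference function (NSD) computed directly from its
/// definition in McLeod & Wyvill, for lags tau = 0..W-1:
///   r(tau) = sum_j x[j] * x[j + tau]
///   m(tau) = sum_j (x[j]^2 + x[j + tau]^2)
///   n(tau) = 2 * r(tau) / m(tau)        (always within [-1, 1])
/// where j runs over the W - tau overlapping samples.
/// The nested loops make this O(W^2) in the buffer size W.
fn nsd(x: &[f32]) -> Vec<f32> {
    let w = x.len();
    let mut out = vec![0.0; w];
    for tau in 0..w {
        let mut r = 0.0;
        let mut m = 0.0;
        for j in 0..(w - tau) {
            r += x[j] * x[j + tau];
            m += x[j] * x[j] + x[j + tau] * x[j + tau];
        }
        // Silent input gives m = 0; report 0 instead of dividing by zero.
        out[tau] = if m > 0.0 { 2.0 * r / m } else { 0.0 };
    }
    out
}
```

Lags where n(τ) peaks close to 1 are where the signal repeats; the algorithm picks one of those peaks as the period, and the pitch is the sample rate divided by that lag.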
Running my Rust program in web browser
There are two targets for compiling Rust programs to WebAssembly: wasm32-unknown-emscripten and wasm32-unknown-unknown. I tried both, but ended up with Emscripten because it's easier to use.
Following the instructions on the Hello Rust website didn't work for me: rustc didn't recognize my local crate, and I couldn't make it work. After a lot of frustration and web searching, I got it compiling with Cargo using the cargo build --target=wasm32-unknown-emscripten command.
The most difficult part was memory. I didn't know how to pass a JavaScript Float32Array to my WebAssembly function until I discovered the Module.HEAPF32 object. It's hard to find documentation about this because it's such a new technology.
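Sending a Float32Array to a Rust function looks something like this:

```javascript
const bufferSize = 1024;
const data = new Float32Array(bufferSize);
// Allocate bufferSize * 4 bytes on the Emscripten heap
const ptr = Module._malloc(bufferSize * data.BYTES_PER_ELEMENT);
// HEAPF32 is indexed in floats, so shift the byte address right by 2 (divide by 4)
Module.HEAPF32.set(data, ptr >> 2);
Module._my_rust_func(ptr, bufferSize);
// JavaScript allocated the buffer, so JavaScript frees it
Module._free(ptr);
```

And to read it in Rust:

```rust
#[no_mangle]
pub extern "C" fn my_rust_func(ptr: *const f32, buffer_len: usize) {
    // Borrow the floats JavaScript wrote to the heap. Unlike Vec::from_raw_parts,
    // a slice doesn't take ownership, so Rust won't free memory that JavaScript owns.
    let buffer: &[f32] = unsafe { std::slice::from_raw_parts(ptr, buffer_len) };
    // ... run the pitch detector on `buffer` ...
}
```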
Eventually I got it to compile, read the Web Audio data from my microphone as a JavaScript array, and output the right frequency, but it was very slow. My CPU went to 100% and stayed there the whole time the page was open. I couldn't believe how slow it was; my micro benchmarks had given me a completely different sense of its performance. I knew it wouldn't be as fast as the native benchmarks once compiled to WebAssembly, but I had hoped for something close. My benchmark for the NSD function with a 1024-sample buffer was around 1 ms, and since it's the slowest function here, I guessed the whole program should run in less than 3 ms. With a 44.1 kHz sampling rate, there are about 23 ms between two runs (1024 / 44100 ≈ 23.2 ms). So why was it hitting 100% CPU? Because it was running in debug mode. Adding the --release argument to the compile command fixed the issue, and now it runs at around 30-35% CPU in Chromium on my laptop.
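The budget estimate above is easy to check. A small sketch with hypothetical helper names: the time between two buffers is the buffer size divided by the sample rate, and the share of one core spent on pitch detection is the processing time divided by that interval. The model ignores audio capture, the Web Audio graph and rendering, so it only accounts for the pitch detector's part of the measured CPU usage.

```rust
/// Milliseconds between two consecutive buffers: buffer_size / sample_rate.
fn frame_budget_ms(buffer_size: usize, sample_rate: f64) -> f64 {
    buffer_size as f64 / sample_rate * 1000.0
}

/// Fraction of the budget spent processing one buffer. Values above 1.0 mean
/// the detector cannot keep up with incoming audio in real time.
fn budget_share(processing_ms: f64, buffer_size: usize, sample_rate: f64) -> f64 {
    processing_ms / frame_budget_ms(buffer_size, sample_rate)
}
```

For example, `frame_budget_ms(1024, 44_100.0)` is about 23.2 ms, and a 3 ms run uses about 13% of it.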
But how much faster is this WebAssembly pitch detector compared to the same algorithm implemented in a JavaScript WebWorker?
To find out, I rewrote the Rust implementation in JavaScript and created a simple web page with both implementations. You can have a look here.
In Chromium 63, WASM needs around 4.4 ms to process a 1024-sample buffer, while JavaScript needs around 35 ms for the same buffer, roughly an 8x improvement. WASM in Chrome for Android shows similar improvements.
In Firefox 58 beta 12, WASM takes around 3.8 ms and JavaScript around 90 ms, roughly a 24x improvement.
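The improvement factors are simply the ratio of the two timings. A small sketch that recomputes them from the measurements quoted above (the `Measurement` struct is only for illustration):

```rust
/// One row of the measurements above (time per 1024-sample buffer, in ms).
struct Measurement {
    browser: &'static str,
    wasm_ms: f64,
    js_ms: f64,
}

impl Measurement {
    /// How many times longer the JavaScript version takes than the WASM version.
    fn speedup(&self) -> f64 {
        self.js_ms / self.wasm_ms
    }
}

fn measurements() -> Vec<Measurement> {
    vec![
        Measurement { browser: "Chromium 63", wasm_ms: 4.4, js_ms: 35.0 },
        Measurement { browser: "Firefox 58 beta 12", wasm_ms: 3.8, js_ms: 90.0 },
    ]
}
```

This gives about 7.95 for Chromium and about 23.7 for Firefox.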
V8 runs this JavaScript implementation much faster than Firefox does, but it is still far slower than WASM.
I think that using C or C++ instead of Rust for this algorithm would result in similar performance. Performance wasn't why I picked Rust; I started this project to experiment with Rust, which I'm currently learning.
There was a similar comparison at the Google Chrome Dev Summit with the WebSight demo.
Conclusion
In this blog post I compared the performance of JavaScript and Rust compiled to WASM to find out how much improvement we can expect for computationally intensive algorithms running in WebAssembly. WASM is much faster: around 8 times faster than one of the fastest JavaScript engines out there (I'm referring to V8; correct me if I'm wrong).
But WebAssembly is still in an early phase. At the time of writing, it is supported by all major browsers, but it still lacks some very important features, like multithreading support and a garbage collector (which you don't need if you choose Rust, C, or C++). I'm not sure whether debugging in the browser is possible at all, but if it is, it must be very hard without source map support.
If you find a way to optimize the Rust or JavaScript implementation, let me know or send a pull request. The complete code for this project is available on my GitHub: https://github.com/bojan88/WASM-vs-JS-Pitch-detector
Thanks for reading. I hope this post will help someone make sense of what to expect from WebAssembly, or help solve similar problems I had.