Dissecting wsprd

July 9, 2023 29 minutes read

Purpose

The goal of this article is to pick apart exactly how wsprd works. There seems to be plenty of things written about the WSPR encoding process, less things written about the decoding process, and frankly not a whole lot about how the tool bundled with WSJT-X actually does it. So in the UTSL spirit, I present a dissection of wsprd.

I will point out that this is going to be a lengthy article, feel free to take it in chunks. I’ve included a Structure section to act as a table of contents if you want to jump around. It also helps to have the source for wsprd.c open on another screen to follow along.

Resources

Throughout the course of this exploration I found some great resources on how WSPR works. I’ve added a few of them here:

Reproducably Building WSJTX

If you grab the source code for WSJTX and read the INSTALL doc, you’ll quickly find that quite a few dependencies are required to build it:

INSTALL from WSJTX source

To  build WSJT-X  from sources  you need  some prerequisite  tools and
libraries.

On Linux:

   build-essential
   gcc-4.8.2 or clang-3.4 or newer
   g++-4.8.2 or clang-3.4 or newer
   gfortran-4.8.2 or newer
   CMake-2.8.9 or newer
   git
   asciidoc
   asciidoctor
   texinfo

   Also   qtmultimedia5-dev,    libqt5serialport5-dev,   qttools5-dev,
   qttools5-dev-tools,         libboost-all-dev,         libfftw3-dev,
   libreadline-dev,      libusb-1.0-0-dev,       libudev-dev,      and
   portaudio19-dev.   Note that  these are  Debian dpkg  style package
   names, other  distributions will  have different package  names and
   package contents.   For example  on RedHat RPM  style distributions
   the  packages   will  be  something   like  qt5-qtmultimedia-devel,
   qt5-qtserialpport-devel,  qt5-linguist,   boost-devel,  fftw-devel,
   readline-devel, systemd-devel, libusb-devel, and portaudio-devel.

It also doesn’t have all of the source code in that package you downloaded. By default the CMake build system clones a few repos to grab the actual sources that will be used including a modified version of HamLib and the actual WSJTX code.

The make things a little easier on myself, I started by making a Dockerfile and a few shell scripts that could easily build the wsprd component of WSJT. They can be found here. Now just by running init.sh I had an environment where I could modify wsprd.c locally, compile it with make.sh and test it on a sample WAV with run.sh.^[1]

readwavfile()

When you jump into the wsprd.c code the first thing you’ll see is that it can read input from a c2 file or a WAV file. In our case, we’re reading from a WAV file so it makes sense to start with the readwavefile() function. Fortunately, there’s been some excellent examination of this step that can be found in the README.md for the wspr-cui project.

I don’t want to repeat too much of what the wspr-cui documentation covers, so here is the executive summary of what this function does:

Read a 2 minute (actually 114 seconds) or 15 minute WAV file with a sample rate of 12000 Hz
Scale the signed short int values to a float between -1 and 1
Generate a real-to-complex FFT with 1474560 points (uses FFTW, 1474560 is 46080 x 32)
Pull out a 375 Hz subset of the FFT centered around 1500 Hz and copy it into an array of 46080 complex numbers. FFTW uses standard "in-order" output ordering and we must preserve that for the reverse FFT. The first half of this array is 1500 Hz to 1688 Hz (positive offset) and the second half is 1313 Hz to 1500 Hz (negative offset).
Perform a complex-to-complex, reverse FFT on the above array to get I/Q data
Scale down the I/Q data by 1000 as FFTW computes an unnormalized transform.

If you look at the wspr-cui documentation or read through the writec2file() you’ll see that a c2 file is just this I/Q data with a header. For the purposes of calculating doppler spread we could take this data, either in wsprd or by reading a c2 file with an external tool, take a FFT, identify the signals, and determine the spread. The problem I foresee is that we’d be replicating some of what wsprd already does but identifying the signals ourselves. Let’s dig deeper and see how wsprd actually finds the signals in this I/Q data.

Windowed FFTs

readwavfile() returned time domain data in the idat and qdat. To tease out WSPR signals we want to see how the frequencies of the signal are changing over time. To do that we can designate a time window over which we look at the frequency components of the signals.

From a discrete perspective this simply means taking a lot of small FFTs incrementally over the 46080 samples of data we have. wsprd uses 360 FFTs each with a width of 512 points to cover the entire spectrum. Each window has the ends tapered using a Hanning function (sine) to prevent strange results from the FFT due to discontinuities at the boundaries. Let’s take a look at the first window used:

You can see the index covers 0 to 512 and the real component of the time series is shown in purple. The windowing function, the first half of a sine wave, is shown in green and varies from 0 to 1 and back. Finally the two are multiplied, producing the blue series which tapers off at the edges.

An FFT of the windowed data is performed and the power is calculated as the real component squared plus the imaginary component squared. To account for the "in-order" output ordering the negative frequency offsets are done first, 256-511, followed by the positive frequencies offsets, 0-255. At this point we have 360 power spectrums that cover the I/Q data. Each one covers two symbols and there is a step of half a symbol between them.

Smoothing

At this point, wsprd does some processing to make it easier to pick candidates. First it goes through each power spectrum and sums the power in each index. This gives you a single array, psavg, that holds the total power for that set of frequencies (an index for an FFT bin). It is important to note that the FFTs were windowed across time, so this psavg represents the power of frequency components for the whole 2 minute transmission.

wsprd Average Spectrum Calculation

// Compute average spectrum (1)
for (i=0; i<512; i++) psavg[i]=0.0;
for (i=0; i<nffts; i++) {
    for (j=0; j<512; j++) {
	psavg[j]=psavg[j]+ps[j][i];
    }
}

1	While not technically the average, the author saves an expensive division operation knowing that this is really just being used to pick large signals. The relationship between this number for different frequencies matters more than whether or not it’s actually an average.

wsprd then takes the center 411 power numbers (+/- 150 Hz) and smooths them by iterating through the array and summing the power at i, the previous 3 powers, and the following 3. Let’s take a look at the effect of this windowing step:

The original sum of powers is shown in purple, with the smoothed sum of powers shown in green. As you can see, the large signals are amplified and the transient signals are minimized. Also note that the green plot does not cover as many frequencies.

Noise

Once the power spectrum is smoothed, each frequency bin is sorted from low to high and the 123rd power is pulled out to set the noise floor. As mentioned in the comments 122/410 corresponds to the 30th percentile.

SNR changes depending on the bandwidth over which is is measured and the author of wsprd notes that in the comments.

/* Renormalize spectrum so that (large) peaks represent an estimate of snr.
 * We know from experience that threshold snr is near -7dB in wspr bandwidth,
 * corresponding to -7-26.3=-33.3dB in 2500 Hz bandwidth.
 * The corresponding threshold is -42.3 dB in 2500 Hz bandwidth for WSPR-15. */

Whereas we are working within 300 Hz of, it is more customary in amateur radio to assess SNR in the 2500 Hz filter used for SSB. Pieter-Tjerk de Boer, PA3FWM, has an interesting article about this too. wsprd divides each power in the smoothed power spectrum by the noise floor, called noise_level, and subtracts one to convert it to SNR. If an SNR is lower than 1E-9, it is set at 1E-10.

At this point SNR is just a ratio, but eventually we will want it expressed in dB within 2500 Hz bandwidth. wspr sets snr_scaling_factor accordingly so we can use it once we have our candidates.

Candidates

Now that we’ve got got an array of SNR values corresponding to the frequencies in the WSPR signal we can identify candidates to attempt to decode. wsprd will try up to 200 candidates (by default) keeping track of freq, snr, shift, drift, and sync for each one:

struct cand { float freq; float snr; int shift; float drift; float sync; };
struct cand candidates[200];

wsprd performs a basic local maxima search that iterates through each value and checks if the previous value and the following value are each smaller. If they are, this value is a candidate. Candidates outside of +/- 110 Hz are thrown out (there is an option to widen this to +/- 150 Hz). Let’s take a look at some of the candidates that were found on the first pass:

I only labelled the first four candidates, but ten were found in total. wsprd bubble sorts the candidates based on SNR. Here they are in order:

Initial candidates:
  index=0 freq=2.929688 snr=-1.020897
  index=1 freq=-10.986328 snr=-5.870845
  index=2 freq=-53.466797 snr=-8.514095
  index=3 freq=86.425781 snr=-10.655240
  index=4 freq=-39.550781 snr=-14.824939
  index=5 freq=30.029297 snr=-18.309950
  index=6 freq=17.578125 snr=-20.642868
  index=7 freq=94.482422 snr=-25.315132
  index=8 freq=-99.609375 snr=-33.305218
  index=9 freq=-95.947266 snr=-33.418915

Coarse Estimates

One of the coolest features of wsprd is its ability to pull spots even if the timing or frequency is off. Likewise it has the ability to account for drift if the frequency is not stable. The segment of code we’re now going to analyze is the first part in that process. It’s important to note that we have lots of variables here, but ultimately all wsprd is doing is cycling through all possibilities to find the best. This is a good thing, because I don’t think I have multivariate regression in me today.

wsprd.c starting on line 1134

/* Make coarse estimates of shift (DT), freq, and drift

 * Look for time offsets up to +/- 8 symbols (about +/- 5.4 s) relative
 to nominal start time, which is 2 seconds into the file

 * Calculates shift relative to the beginning of the file

 * Negative shifts mean that signal started before start of file

 * The program prints DT = shift-2 s

 * Shifts that cause sync vector to fall off of either end of the data
 vector are accommodated by "partial decoding", such that missing
 symbols produce a soft-decision symbol value of 128

 * The frequency drift model is linear, deviation of +/- drift/2 over the
 span of 162 symbols, with deviation equal to 0 at the center of the
 signal vector.
 */

int idrift,ifr,if0,ifd,k0;
int kindex;
float smax,ss,pow,p0,p1,p2,p3;
for(j=0; j<npk; j++) {                              //For each candidate...
    smax=-1e30;
    if0=candidates[j].freq/df+256;
    for (ifr=if0-2; ifr<=if0+2; ifr++) {                      //Freq search
	for( k0=-10; k0<22; k0++) {                             //Time search
	    for (idrift=-maxdrift; idrift<=maxdrift; idrift++) {  //Drift search
		ss=0.0;
		pow=0.0;
		for (k=0; k<162; k++) {                             //Sum over symbols
		    ifd=ifr+((float)k-81.0)/81.0*( (float)idrift )/(2.0*df);
		    kindex=k0+2*k;
		    if( kindex < nffts ) {
			p0=ps[ifd-3][kindex];
			p1=ps[ifd-1][kindex];
			p2=ps[ifd+1][kindex];
			p3=ps[ifd+3][kindex];

			p0=sqrt(p0);
			p1=sqrt(p1);
			p2=sqrt(p2);
			p3=sqrt(p3);

			ss=ss+(2*pr3[k]-1)*((p1+p3)-(p0+p2));
			pow=pow+p0+p1+p2+p3;
		    }
		}
		sync1=ss/pow;
		if( sync1 > smax ) {                  //Save coarse parameters
		    smax=sync1;
		    candidates[j].shift=128*(k0+1);
		    candidates[j].drift=idrift;
		    candidates[j].freq=(ifr-256)*df;
		    candidates[j].sync=sync1;
		}
	    }
	}
    }
}

We’ll approach this snippet from the inside out.

Hector the Sync Vector Detector

At the core of these nested loops is this equation:

ss=ss+(2*pr3[k]-1)*((p1+p3)-(p0+p2));

This equation is so cool that I think it should have a name. I’ve chosen to call it Hector.

pr3 is the sync vector, a series of bits that all WSPR messages will have set. Since WSPR is FSK-4, it can represent up to two bits per symbol and uses four frequencies. The power level of each frequency is stored in p0, p1, p2, and p3. This equation is used to sum up how well each symbol matches the 162 bits of the sync vector. At this stage it is our primary metric for how well the time and frequency shifts match the signal.

To better understand this, let’s look at a simplified example of merging four bits with the sync vector. Let’s say that we want to encode 0 1 1 0. We have the first four bits of the WSPR sync vector, 1 1 0 0, which is an agreed upon value that everyone is using. We would go bit-by-bit shifting our bit one to the left and adding the sync vector bit to get our symbol:

0<<1 + 1 = 01
1<<1 + 1 = 11
1<<1 + 0 = 10
0<<1 + 0 = 00

Now let’s go symbol by symbol through a perfect transmission and see how our detector works. For each symbol this is what we would hope to receive:

01: p0 = 0, p1 = 1, p2 = 0, p3 = 0
11: p0 = 0, p1 = 0, p2 = 0, p3 = 1
10: p0 = 0, p1 = 0, p2 = 1, p3 = 0
00: p0 = 1, p1 = 0, p2 = 0, p3 = 0

Notice how each frequency shift is used to represent one symbol. We can’t have more than one frequency shift in a given time window. Also, I’m using zeros and ones to demonstrate but in actuality the values of p0, p1, p2, p3 would be power levels.

Let’s see what the wsprd detector would do with this:

First symbol: 01

p0 = 0, p1 = 1, p2 = 0, p3 = 0
sync_vector_bit = 1 (1)
(2 * sync_vector_bit - 1) * ((p1 + p3) - (p0 + p2))
(2 * 1 - 1) * ((1 + 0) - (0 + 0))
1 * 1
1

Second symbol: 11

p0 = 0, p1 = 0, p2 = 0, p3 = 1
sync_vector_bit = 1
(2 * sync_vector_bit - 1) * ((p1 + p3) - (p0 + p2))
(2 * 1 - 1) * ((0 + 1) - (0 + 0))
1 * 1
1

Third symbol: 10

p0 = 0, p1 = 0, p2 = 1, p3 = 0
sync_vector_bit = 0
(2 * sync_vector_bit - 1) * ((p1 + p3) - (p0 + p2))
(2 * 0 - 1) * ((0 + 0) - (0 + 1))
-1 * -1
1

Fourth symbol: 00

p0 = 1, p1 = 0, p2 = 0, p3 = 0
sync_vector_bit = 0
(2 * sync_vector_bit - 1) * ((p1 + p3) - (p0 + p2))
(2 * 0 - 1) * ((0 + 0) - (1 + 0))
-1 * -1
1

1	These values are looked up from the known sync vector: `pr3`

Not to belabor the point, but what would happen if we got the symbol wrong? Let’s take a look at the first symbol, but we’ll use the wrong p values. In the first scenario we’ll have multiple frequencies and in the second will have just one, but it’ll be incorrect:

First symbol: 01 (p0 erroneously set)

p0 = 1, p1 = 1, p2 = 0, p3 = 0
sync_vector_bit = 1
(2 * sync_vector_bit - 1) * ((p1 + p3) - (p0 + p2)
(2 * 1 - 1) * ((1 + 0) - (1 + 0))
1 * 0
0

First symbol: 01 (wrong freq entirely)

p0 = 0, p1 = 0, p2 = 1, p3 = 0
sync_vector_bit = 1
(2 * sync_vector_bit - 1) * ((p1 + p3) - (p0 + p2)
(2 * 1 - 1) * ((0 + 0) - (0 + 1))
1 * (-1)
-1

Drift Detection

So now that we know how the sync vector detector is working, lets take a look at the loops around it. The inner-most loop sums up the results of our detector over all 162 symbols while accounting for drift. The loop immediately outside of that one iterates over all of our drift values. For the first and second passes (yes there are loops around this whole thing outside of this one) utilize a maxdrift of 4. This means that the inner-most loop will try the drifting from 2 to -2, 1.5 to -1.5, 1 to -1, 0.5 to -0.5, 0 to 0, -0.5 to 0.5, -1 to 1, -1.5 to 1.5, and -2 to 2 (all measured in Hertz).

The drift loop passes that drift value into the inner-most loop which splits that drift linearly across all 162 samples that it’s trying. For context, if you had a transmitter that drifted up the dial by 2 Hz over the whole WSPR transmission, two minutes, this logic would find the highest "sync" at a drift of 2 and save it in the drift member of the candidate struct. To make things even clearer, here are the drift vs. sync readings for the first pass on the first candidate at the first estimate of its center frequency with no time adjustment:

This loud and clean signal peaks noticably at a drift of 0 Hz. It’s worth noting that I did try optimizing the drift loop by having it break as soon as it passed a peak rather than having it try all drifts in the range. This actually caused it to miss a decode as not all Sync Output vs. Drift graphs are as clean as this one.

Time Search

Recall that we have 360 FFTs covering 114 seconds of the transmission (including the first two seconds which should be empty) stored in a variable named ps. This gives us 114/360, or 0.32 seconds per FFT index, which means every symbol has 2 FFTs. The time search loop goes from a -10 FFT index (-3.2 s) to a +21 FFT index (+6.7 s) changing the time at which Hector the sync vector detector will start decoding.^[2] This allows WSPR beacons that don’t have their time set correctly and don’t start their transmissions at exactly two seconds after an even UTC minute can still be detected.

Frequency Search

Moving one more loop out, we have a frequency search that iterates from -1.46 Hz to 1.46 Hz centered around the frequency picked in the initial candidates section. Don’t let the addition of 256 when setting if0 throw you off, recall that our FFTs are stored with the positive offset frequencies first and the negative offset frequencies halfway through the array.

Results

Let’s take a look at the change for the first candidate:

	Before	After
Frequency (Hz)	2.93	2.93
SNR	-1.02	-1.02
Shift (time)	0	128
Drift (Hz)	0.0	0.0
Sync	0.0	0.42

As we’d expect for a strong signal the frequency doesn’t really change. Our estimate based on the smooth power spectrum was good enough. The SNR isn’t touched by this section of the code. The shift is interestingly set to 128*(k0+1). Since we had no time shift needed (k0 = 0) we get 128. Going forward, I’m curious as to how this is used/changes. No drift was needed to recognize the sync vector in this signal and lastly the value of that sync result is stored in Sync.

sync_and_demodulate()

For the next series of frequency, frequency drift, and time adjustments we’re going to need something a little more powerful than the SNR spectrum and windowed FFTs we’ve been using. This is where the sync_and_demodulate() function comes in. This function takes the I/Q data that we created way back in [readwavefile()] as well as information about our best guesses to make even better guesses. sync_and_demodulate() has three modes:

/***********************************************************************
* mode = 0: no frequency or drift search. find best time lag.          *
*        1: no time lag or drift search. find best frequency.          *
*        2: no frequency or time lag search. calculate soft-decision   *
*           symbols using passed frequency and shift.                  *
************************************************************************/

sync_and_demodulate() has a very similar structure to the Coarse Estimates loops. The outer loop will iterate the frequency around a central frequency. The next loop in will go through all possible time shifts. Finally the inner loop goes through all 162 symbols.

Use Fourier Analysis For Your Analysis

I’ll admit I was a little intimidated by the core section of this code when I first scrolled through the file, but a lot of reading, a little watching, and a lot of time spent looking at it and I think I can provide some insights.

The I/Q data we have is a time series. One of the first things this function does is generate DFT coefficients for the frequencies -2.20 Hz, -1.46, 1.46 Hz, and 2.20 Hz relative to the frequency being tested (2.929688 Hz for our first candidate).^[3] These frequencies correspond to the location of the frequency being used to represent a different symbol. Here’s the code in question:

wsprd Fourier Coefficient Calculation in sync_and_demodulate()

dphi0=twopidt*(fp-df15);
cdphi0=cos(dphi0); (1)
sdphi0=sin(dphi0);

dphi1=twopidt*(fp-df05);
cdphi1=cos(dphi1);
sdphi1=sin(dphi1);

dphi2=twopidt*(fp+df05);
cdphi2=cos(dphi2);
sdphi2=sin(dphi2);

dphi3=twopidt*(fp+df15);
cdphi3=cos(dphi3);
sdphi3=sin(dphi3);

c0[0]=1; s0[0]=0; (2)
c1[0]=1; s1[0]=0;
c2[0]=1; s2[0]=0;
c3[0]=1; s3[0]=0;

for (j=1; j<256; j++) {
    c0[j]=c0[j-1]*cdphi0 - s0[j-1]*sdphi0;
    s0[j]=c0[j-1]*sdphi0 + s0[j-1]*cdphi0;
    c1[j]=c1[j-1]*cdphi1 - s1[j-1]*sdphi1;
    s1[j]=c1[j-1]*sdphi1 + s1[j-1]*cdphi1;
    c2[j]=c2[j-1]*cdphi2 - s2[j-1]*sdphi2;
    s2[j]=c2[j-1]*sdphi2 + s2[j-1]*cdphi2;
    c3[j]=c3[j-1]*cdphi3 - s3[j-1]*sdphi3;
    s3[j]=c3[j-1]*sdphi3 + s3[j-1]*cdphi3;
}

1	The `cdphi` and `sdphi` variables are going to generate cosines and sines based on the previous result over a 256 sample window.^[4]
2	Fourier coefficients where the c (cosine) is the real part and the s (sine) is the imaginary part

Recall Euler’s formula (including the negative exponent formulation):

$$e^{ix} = \cos x + i \sin x$$ $$e^{-ix} = \cos x - i \ sin x$$

Taking a close look at the first set of coefficients (c0 and s0) we can see that the real part, c0, is equivalent to:

$$\cos (2 \pi \frac{1}{375} (f - 2.2)) - i \sin(2 \pi \frac{1}{375} (f - 2.2)) = e^{\frac{-i 2 \pi}{375} (f - 2.2)}$$

Likewise the imaginary part, s0, is equivalent to:

$$\cos (2 \pi \frac{1}{375} (f - 2.2)) + i \sin(2 \pi \frac{1}{375} (f - 2.2)) = e^{\frac{i 2 \pi}{375} (f - 2.2)}$$

Finally comparing this to the summation for a DFT:

$$X_k = \sum_{n=0}^{N-1} x_n e^{\frac{-i 2 \pi}{N} f n}$$

We can see that what we have in c0, s0, c1, s1, c2, s2, c3, s3 are the parts of the DFT we need to multiply and sum by the real and imaginary data (I/Q respectively) to get a representation of that frequency component’s contribution to the overall signal. We’re only investigating four frequencies and these coefficients are only calculated at the start (first symbol) of each decode attempt. The windowed FFTs from FFTW were a great start but now we need to do things by hand at a smaller scale to really narrow in on the frequencies and parameters we want.

Power, Sync, and Soft Symbols

With the coefficients in hand found, we perform the DFT as we travel through time for each of our symbols. Then taking the complex output we calculate the magnitude of the power of each frequency at that moment. We use good old Hector the Sync Vector Detector to get a measurement of sync and keep track of the parameters that yield the best result.

If we were asked to actually read the symbols, mode 2 for this function, we keep track of the power difference between two of the frequencies, depending on the sync vector bit:

if(pr3[i]==1) { (1)
    fsymb[i]=p3-p1;
} else { (2)
    fsymb[i]=p2-p0;
}

1	If the sync vector bit is set (it’s the lowest bit) this takes the difference between `p3` and `p1` (which should be set)
2	Otherwise it takes the difference between `p2` and `p0` (which should be set)

Finally, in modes 0 and 1 we simply return the best parameters. In mode 2 we divide each member of the fsymb array by their standard deviation to convert them to z-scores. We multiply by a scaling factor, symfac, which was passed to us and create an array of unsigned bytes. Each byte in the array represents the likelihood that, that bit was set.

Refined Estimates

Now that we know how sync_and_demodulate() works, let’s take a look at how it’s used. We start by taking shift in the candidates stuct for our lag estimate searching for the best lag +/- 128 samples around that number. The lag numbers used by sync_and_demodulate() are given in the amount of samples in the I/Q array, where 256 samples corresponds to one symbol. This is why we set shift to 128*(k0+1) earlier when we were using FFTs with half-symbol resolution. With these parameters we will call sync_and_demodulate() in mode 0 so there is no frequency or drift search.

Looking at the results, for the first candidate we actually get a better sync at a shift of 64. This is progress! Recall that our Coarse Estimates didn’t detect any time shift.

Next we do a frequency search from -2 Hz to 2 Hz around our coarse estimate in 0.25 Hz steps. This calls sync_and_demodulate() in mode 1, which is strictly a frequency search. Again, more progress! Our frequency estimate for the first candidate shifts down 0.25 Hz to 2.679688 Hz.

Now it’s important to note that sync_and_demodulate() does not actually have any logic within it to perform a drift search. It can take a drift parameter, but it’s up to the called to try sync_and_demodulate() with different drifts. There’s no internal loop (like we have for time lag or frequency) to do it for you.

That’s exactly what the next section of the code does. If you’re not on the final pass, it will try adjusting drift +/- 0.5 and refining the drift parameter if the sync is better. In the case of the first candidate, neither of those drifts yield a better sync.

Finally, if our estimate needs further refining we do more frequency and lag searches. The minimum sync for the first two passes needs to be 0.12 and for the last pass it needs to be 0.10. If it isn’t, we time lag search over +/- 32 samples with a step of 16 and then frequency search +/- 2 Hz with a step of 0.05 Hz.

In the case of candidate one on the first pass we actually do further refine our numbers and end with a shift of 80 samples and a frequency of 2.779687 Hz.

Dedupe and minsync2

At this point wsprd pulls out signals that may be duplicated in the candidate list or may not have enough sync for a good decode. It’s important to note that over though sync_and_demodulate() has a mode, 2, that demodulates the signal and gives us soft symbols we haven’t used it. This means that we can only really use frequency and shift to determine if two signals are the same. wsprd removes candidates within 0.05 Hz and a shift (time log) of 16. It also removes candidates that have a sync below minsync2 (0.12 for the first two passes and 0.10 for the last). Here’s the code that does it:

First Round of Dedupe in wsprd.c

int nwat=0;
int idupe;
for ( j=0; j<npk; j++) {
    idupe=0;
    for (k=0;k<nwat;k++) {
	if( fabsf(candidates[j].freq - candidates[k].freq) < 0.05  &&
	   abs(candidates[j].shift - candidates[k].shift) < 16 ) { // mention why shift matters
	    printf("First dedupe: %d is %d (%f %d %f %d)\n", j, k, candidates[j].freq, candidates[j].shift, candidates[k].freq, candidates[k].shift);
	    idupe=1;
	    break;
	}
    }
    if( idupe == 1 ) {
	if(candidates[j].sync > candidates[k].sync) candidates[k]=candidates[j];
    } else if ( candidates[j].sync > minsync2 ) {
	candidates[nwat]=candidates[j];
	nwat++;
    }
}

noncoherent_sequence_detection()

If you’ve made it through my incoherent ramblings so far, you’re already a noncoherent signal detection expert and this part should be no problem! In actuality this function is very similar (almost the same) as the first part of sync_and_demodulate(), we take many of the usual suspects as arguments: I/Q data, drift, shift, and lag. We calculate the DFT coefficients for the frequencies we’re interested in. In some parts we use arrays instead of individual variables names, but the DFT summation is the same.

This function has the code smell of an upgrade to the sync_and_demodulate() function. Mode 2 of sync_and_demodulate() (which is the demodulate part) isn’t called anywhere in the code currently, instead this function is used. This function doesn’t have modes, it demodulates each time. Finally the timing output refers to this function call as sync_and_demodulate(2). It’s also worth noting that both functions work on noncoherent frequency shifts, but this one gets noncoherent in its name.

The major difference you see is this code block:

Sequence Detection block from noncoherent_sequence_detection() in wsprd.c

for (i=0; i<162; i=i+nblock) {
    for (j=0;j<nseq;j++) {
        xi[j]=0.0; xq[j]=0.0;
        cm=1; sm=0;
        for (ib=0; ib<nblock; ib++) {
            b=(j&(1<<(nblock-1-ib)))>>(nblock-1-ib);
            itone=pr3[i+ib]+2*b;
            xi[j]=xi[j]+is[itone][i+ib]*cm + qs[itone][i+ib]*sm;
            xq[j]=xq[j]+qs[itone][i+ib]*cm - is[itone][i+ib]*sm;
            cmp=cf[itone][i+ib]*cm - sf[itone][i+ib]*sm;
            smp=sf[itone][i+ib]*cm + cf[itone][i+ib]*sm;
            cm=cmp; sm=smp;
        }
        p[j]=xi[j]*xi[j]+xq[j]*xq[j];
        p[j]=sqrt(p[j]);
    }
    for (ib=0; ib<nblock; ib++) {
        imask=1<<(nblock-1-ib);
        xm1=0.0; xm0=0.0;
        for (j=0; j<nseq; j++) {
            if((j & imask)!=0) {
                if(p[j] > xm1) xm1=p[j];
            }
            if((j & imask)==0) {
                if(p[j]>xm0) xm0=p[j];
            }
        }
        fsymb[i+ib]=xm1-xm0;
        if( bitbybit == 1 ) {
            fsymb[i+ib]=fsymb[i+ib]/(xm1 > xm0 ? xm1 : xm0);
        }
    }
}

This will go through all 162 symbols in blocks of nblock size. For each block it iterates through all combinations of bits and sums up the powers of the expected sync frequency components. I will readily admit that there is a scaling factor there, cm and sm adjusted by cmp and smp respectively, that I don’t fully understand. cm and sm start at one and zero respectively and seem to stay close to those values. They control how the real and imaginary parts of the data are summed. Once you have the powers it goes through each bit of the block and calculates the maximum power in all sequences when the bit is high, xm1, and when the bit is low xm0. This is used to create the soft symbol for that bit as high power minus low power.

The last part uses the same normalization routing (divide by standard deviation and multiply by 256) use in sync_and_demodulate().

Attempt to Decode

We’ve used windowed FFTs to get coarse estimates and refined those estimates with a custom DFT. At this point we should have a pretty good idea of the best frequency shift, frequency drift, and time lag for our candidates. So what are we going to do with our fancy noncoherent_sequence_detection() function? We’ll get the best set of soft-decision symbols we can and attempt to perform a full decode of course!

A parameter we have left to tweak is the blocksize used by our noncoherent_sequence_detection(). The outermost loop of this step will try blocksizes up to 3 bits and bit-by-bit where each soft-decision symbol is normalized by xm1 or xm0 depending on if it is closest to a one or zero respectively. Interestingly it will also tweak the frequency shift just slightly during these passes. The code refers to this process as shift jittering.

Finally within this loop we calculate the root mean squared of the soft symbols (centered at 128, recall they can be 0-255) and if it is above a certain value a decode is attempted. The symbols are uninterleaved (descrambled) and if an option is selected the jelenik decoder is used. If the options is not set the fano decoder is used. If these decoders fail and the pass conditions are correct an ordered statistical decoder (OSD) is used. As all of these decoders warrant analysis in their own right and exist outside of the wsprd.c file we’re looking at, I won’t be detailing their operations.

Sanity Checks

Now that we have symbols we can do some neat things most of which are detailed in this comment in wsprd.c:

// Unpack the decoded message, update the hashtable, apply
// sanity checks on grid and power, and return
// call_loc_pow string and also callsign (for de-duping).

unpk_() which can be found in wsprd_utils.c is called to unpack the message and check to make sure it makes sense (sanity checks). Finally we have one of the most interesting features of wsprd: subtracting signals.

Subtract Signals

You’ve already seen how the wsprd decoder exhaustively revises the parameters of the candidates to find the best fitting option. It also repeats this whole process three times, subtracting signals after it finds them. This allows small signals that are too close to other signals to be detected. Let’s take a look at the functions it uses to pull this off.

subtract_signal()

This function appears to be a holdover of a time period when a simpler method was used. It isn’t actually called anywhere in the code. That being said, it may make sense to look at it to get a basic idea of how subtraction works.

subtract_signal() takes 162 symbols as well as the time shift and frequency drift and generates the real and imaginary components of the signal at each point in time. Finally it subtracts this generated signal from the actual IQ data, which was passed as a pointers to float arrays.

subtract_signal2()

This updated function is significantly more complex than subtract_signal2(). It starts in a similar fashion by creating a reference signal including the time shift and frequency drift. It then sets up a low pass filter, running average in groups of 360 points, which I believe also performs some windowing so the initial and final values are weighted slighly less. The complex amplitude is estimated as the received signal times the complex conjugate of the reference signal. This should have the effect of removing the reference signal, but the comments mention a phase shift issue. The low pass filter is applied and finally the id and qd arrays are set to the received signal (their original value) minus the complex amplitude times the reference signal.

I will admit that I’m still slightly confused by the final steps of subtract_signal2() but it seems to perform much better than subtract_signal().

Dedupe, Decode (again), Perform More Passes, Print Results

The same callsign within 4 Hz of eachother in the candidate list is removed. The other candidates are decoded and subtracted and finally more passes are performed. If this is the first pass, the c2 file is written. If this is the last pass no more signals are subtracted.

Finally the list of decoded signals is sorted in order of frequency and printed out.

Summary

Hopefully this article has given you an idea as to just how exhaustive a search wsprd performs. It performs analysis in both the time and frequency domain. It accounts for frequency shift, frequency drift, and time shift. It subtracts large signals to make smaller signals more noticable and finally it performs multiple passes of all these techniques to get the most decodes possible.

From the appearance of the code it’s gone through several modifications over time to improve its decode rate. It’s also worth noting that there is also a significant quantity of code designed to collect performance metrics. It’s obvious that the authors have put in a lot of time to make this a great WSPR decoder and what is more they made the source code available so everyone can learn from it. I’d like to thank them for their hard work.

1. The Makefile for wsprd is actually pretty simple and it could probably be compiled out-of-tree with minimal effort. I wanted to make sure my development environment was exactly like the way WSJT-X specified just in case.

2. At the time this article being written, I reached out to the wsjtx-devel team regarding a negative index being used in this logic. The issue was quickly resolved by the team.

3. In Fourier analysis the term coefficients seems to be miserably overloaded. In this instance I’m using it to describe the value we have to multiply our time domain data by in the process of getting frequency domain data. It is not the frequency domain data yet.

4. Thanks Wolfenstein 3D for teaching me how to generate sine and cosines on the fly in a raycasting engine.