
Sunday, February 23, 2025

On quantization errors, and recovering resolution from dithering and filtering

Preface

Just a quick post about recovering a signal that has been corrupted by quantization error. This is something I discovered while working on my GPSDO project, but it has general applicability. It might all be common knowledge and trivial, but I enjoyed coming up with my own solution. I'm documenting it here so I don't forget about it.

Introduction

Consider a continuous observable that is sampled by a low-resolution process. By low resolution, I mean that there is only a limited number of steps that can be differentiated. In the noiseless ideal case the response would look something like shown in figure 1.

Figure 1. Ideal quantization of continuous observable.

However, consider then that the observable is corrupted by additive noise. This causes the sampled signal to also exhibit noise. The result might look as shown in figure 2.

Figure 2. Effect of additive noise on sampling.

This noise in the sampled signal can be used to increase resolution: several samples are averaged to produce a single sample. Noise is sometimes deliberately injected into measurement signals (a technique known as dithering) to take advantage of this. With enough samples averaged, not only is the additive noise reduced, so too is the quantization error. This is illustrated in figure 3.

Figure 3. Result of averaging noisy sampled data.

Now, the example above exhibits a large amount of additive noise: the noise standard deviation was taken as 40% of the quantization resolution. However, if the additive noise is small compared to the quantization resolution, this breaks down. See figure 4, where the noise standard deviation is just 15% of the quantization resolution.

Figure 4. Effect of small additive noise.  

Suddenly, the quantization error is no longer removed by averaging, and a step-like pattern emerges.
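The behavior in figures 2 through 4 is easy to reproduce numerically. Below is a minimal sketch, assuming round-to-nearest quantization and Gaussian noise as in the figures; the sample count and test grid are arbitrary choices of mine:

```python
import random
import statistics

def sample(c, sigma, rng):
    """One quantized reading: round the continuous value plus Gaussian noise."""
    return round(c + rng.gauss(0.0, sigma))

def averaged_estimate(c, sigma, n_samples, rng):
    """Average many noisy quantized readings of c."""
    return statistics.fmean(sample(c, sigma, rng) for _ in range(n_samples))

rng = random.Random(0)
test_points = [i / 10 for i in range(11)]

# Noise sigma at 40% of the quantization step: averaging recovers c well.
err_large = max(abs(averaged_estimate(c, 0.40, 20000, rng) - c)
                for c in test_points)

# Noise sigma at only 15% of the step: a residual staircase bias remains.
err_small = max(abs(averaged_estimate(c, 0.15, 20000, rng) - c)
                for c in test_points)
```

With the larger dither the worst-case error stays small, while with the smaller dither it remains a sizable fraction of a quantization step, matching figure 4.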

An important thing to note is that there are again some continuous values for which only a single sampled value occurs. It seems clear that, in general, no further resolution can be extracted for those values. For the following discussion, however, it's more interesting to observe the values that still exhibit two distinct sampled values. This is also the situation in my GPSDO project that motivated all this.

In the quantization transition regions averaging clearly gives more resolution, but the averaged values do not match the true values. The obvious question is whether this can be improved.

Bayes saves the day

Initially I took a Bayesian approach to the problem. When presented with the quantized value $q$, what can be inferred about the continuous value $c$? This is answered by Bayes' theorem: $$p(c|q) = \frac{p(q|c)\,p(c)}{p(q)}.$$

That is, the probability that the continuous variable has value $c$ after we've observed quantized value $q$ is proportional to the prior probability of $c$ times the probability of observing quantized value $q$ given that the continuous variable has value $c$, which is also called the likelihood. The probability of observing the quantized value $q$ in the denominator can be thought of as just a normalizing factor.

The formula expresses how we update our knowledge of the continuous variable, when presented with a measurement. The posterior distribution is proportional to the prior times the likelihood.

Before we can formulate the likelihood, we must properly define the additive noise and the quantization. To keep things simple, let's take almost the simplest quantization possible: the round function. Let's also take the noise as normally distributed with zero mean, $n \sim N(0, \sigma)$.

The likelihood is the probability of sampling a value, given that the continuous variable has some value. With our choice of quantization, we observe the quantized value $q$ if the continuous value plus noise, $c + n$, is in $[q - 0.5,\, q + 0.5)$. The probability can be evaluated as $$p(q|c) = p(q - 0.5 \le c + n < q + 0.5) = p(q - 0.5 - c \le n < q + 0.5 - c),$$ which, since the noise is normally distributed, can be further evaluated as $$p(q|c) = F_n(q + 0.5 - c) - F_n(q - 0.5 - c) = \frac{1}{2}\operatorname{erf}\!\left(\frac{q + 0.5 - c}{\sqrt{2}\,\sigma}\right) - \frac{1}{2}\operatorname{erf}\!\left(\frac{q - 0.5 - c}{\sqrt{2}\,\sigma}\right),$$ where $F_n$ is the cumulative distribution function of the noise.
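The likelihood above maps directly to code. A small sketch, assuming the round-to-nearest quantization and Gaussian noise defined here (the function name is my own):

```python
from math import erf, sqrt

def likelihood(q, c, sigma):
    """p(q | c): probability that round(c + n) == q with n ~ N(0, sigma)."""
    z = sqrt(2.0) * sigma
    return 0.5 * erf((q + 0.5 - c) / z) - 0.5 * erf((q - 0.5 - c) / z)
```

Summing the likelihood over all $q$ gives 1 for any $c$, and at a transition point such as $c = 0.5$ the two neighboring quantized values are equally likely.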

To gain more intuition, let's look at what the likelihoods look like. Let's take the noise standard deviation as σ=0.1. This is shown in figure 5.

Figure 5. Likelihood functions for observing quantized value 0 and 1.

Near the continuous values 0 and 1 there is an almost 100% probability of observing quantized values 0 and 1, which causes the stairs we saw in figure 4. The overlap region of the probabilities covers only a small fraction of the range, which leads to the average not matching the true value, as also seen in figure 4.

Then, say that we have observed the pattern 1, 0, 0, 0, 1, 0, 0, 0. What can we say about the continuous variable? Taking the prior distribution as uniform (i.e. no knowledge of the continuous variable), the Bayesian inference progresses as shown in figure 6.

Figure 6. Inference of pattern 1, 0, 0, 0, 1, 0, 0, 0.
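The inference of figure 6 can be sketched on a discrete grid of candidate values for the continuous variable. A minimal version, where the grid range and resolution are arbitrary choices of mine:

```python
from math import erf, sqrt

def likelihood(q, c, sigma):
    """p(q | c) for round-to-nearest quantization with Gaussian noise."""
    z = sqrt(2.0) * sigma
    return 0.5 * erf((q + 0.5 - c) / z) - 0.5 * erf((q - 0.5 - c) / z)

sigma = 0.1
observations = [1, 0, 0, 0, 1, 0, 0, 0]

# Uniform prior over a grid of candidate continuous values.
grid = [i / 1000.0 for i in range(-500, 1501)]
posterior = [1.0] * len(grid)

for q in observations:
    # Multiply the prior by the likelihood of this observation, renormalize.
    posterior = [p * likelihood(q, c, sigma) for p, c in zip(posterior, grid)]
    total = sum(posterior)
    posterior = [p / total for p in posterior]

# Posterior mean as a point estimate of the continuous variable.
estimate = sum(c * p for c, p in zip(grid, posterior))
```

After the eight updates the posterior concentrates in the transition region below 0.5, where observing a 1 about a quarter of the time is most probable.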

This approach is very powerful, but also computationally heavy. As an additional complexity, if the system is dynamic, we need to add uncertainty to the distribution between inference steps to allow the inferred continuous variable to evolve. For the usual applications, this likely adds too much complexity to be practical.

This exercise, however, gave me insight into why simple averages of the sampled values don't give the correct result, as well as into what the likelihoods look like.

A practical solution

In practice, the sampled values alternate between at most 2 distinct values. While it's not strictly impossible to observe 3 or more values, for the situations of interest here the probability is extremely low. This allows us to simplify things by allowing only two values. The likelihood of observing a zero then simplifies to $$p(q = 0|c) = F_n(0.5 - c) = \frac{1}{2} + \frac{1}{2}\operatorname{erf}\!\left(\frac{0.5 - c}{\sqrt{2}\,\sigma}\right),$$ while the likelihood of observing a 1 is the complement of this. These likelihood functions are visualized in figure 7.

Figure 7. Likelihoods of the two observable values.

Since the two cases are complements of each other, inferring the continuous variable becomes much easier: the probability of observing, say, the value 1 can be directly mapped to the value of the continuous variable. That probability, in turn, can be simply estimated as the average of the observed samples. Figure 8 shows the mapping from probability to value.

Figure 8. Continuous variable inference from probability of observed value.

While it appears in figure 8 that a probability of 0 gets mapped to the continuous variable value 0 and a probability of 1 gets mapped to the continuous variable value 1, this is not quite the case, because there are still some very tiny probabilities unaccounted for in figure 8. However, since these are practically impossible, we can round the remaining probability to zero.

Since estimating the probability of observing a 1 is the same thing as computing the average of the ones and zeros, we can extend the concept of averaging to the other values as well. It turns out that this way we can define a single curve that takes the average of the observations and outputs an estimate of the continuous variable. See figure 9.
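A sketch of this curve, assuming the same round-to-nearest quantization and Gaussian noise as above (the helper name and the clamping epsilon are my own choices), using the standard library's normal quantile function:

```python
from math import floor
from statistics import NormalDist

def estimate_from_average(avg, sigma, eps=1e-9):
    """Map an average of quantized observations to a continuous estimate.

    Assumes round-to-nearest quantization with Gaussian noise of standard
    deviation sigma, and that the observations alternate between the two
    integers bracketing the true value.
    """
    k = floor(avg)                              # lower of the two values
    p_high = min(max(avg - k, eps), 1.0 - eps)  # fraction of k + 1 readings
    # p_high = P(c + n >= k + 0.5); invert with the normal quantile function.
    return (k + 0.5) - NormalDist(0.0, sigma).inv_cdf(1.0 - p_high)
```

For example, with $\sigma = 0.1$ an average of 0.25 maps back to roughly 0.43 rather than 0.25, which is the kind of correction visible in figure 9. Near-integer averages fall on the plateaus, where no further resolution is recoverable.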

Figure 9. Continuous variable estimation from average of observations.

This approach is very lightweight to compute, and allows handling dynamic situations by choosing the length of time that the average is computed over.

Let's revisit the situation of figure 4 using this approach. The results are presented in figure 10.

Figure 10. Estimation method demonstration.

 

Monday, February 17, 2025

GPS disciplined oscillator - part 2

Introduction

After the good results I got from my prototype GPSDO, I wanted to go a bit further than the mess of wires that was the prototype. Besides achieving a better build quality, I needed more than just one 10 MHz output to support a few key instruments in my lab - and a PPS output would also be nice to have. I also wanted to improve the temperature dependency of the device by insulating it better, and by keeping the control voltage reference at constant temperature.

The device was built in April–June of 2024, but I only now got around to documenting it.

Overview

The device is composed of two parts: the main unit and the GNSS receiver unit. They are connected to each other through standard twisted-pair Ethernet cabling, which allows placing the receiver unit in a convenient location, possibly far away from the main unit.

GNSS receiver unit on the left, main unit on the right
Reverse sides

GNSS receiver unit

The GNSS receiver unit gets power through one of the four pairs of the cable. A suitable voltage for the receiver is then regulated locally from the provided voltage. The PPS output of the receiver is fed into an RS422 line driver and transmitted back to the main unit over another pair of the cable.

The receiver serial TX and RX are also connected to an RS422 transceiver to allow data communication with the main unit. Thus, all 4 pairs of the cable are used.

Cover dome, board carrier and pole mount are held together with 4 M3 screws




Close up on board carrier. RJ45 is on a separate board and connects via ribbon cable

RJ45 board is held in place with snap tabs. Receiver PCB is attached with 2 M2.5 screws and hold down tabs.
Reverse sides

Main unit

The main unit comprises several sub-units: the main board sub-unit, the OCXO board sub-unit and the LCD board sub-unit. Each sub-unit handles a separate part of the system, and they communicate with each other through I2C.

Main board sub-unit and LCD board sub-unit are attached to the cover plate. OCXO board sub-unit is inside block of foam insulation.
Mess of wires for the connectors. All wires connect to the main board. RJ45 connector is on a separate board, which is held in place by snap tabs.
LCD board sub-unit (top) and main board sub-unit (bottom)
Block of foam insulation with OCXO board sub unit inside removed from the main unit enclosure

LCD board sub-unit

The LCD board sub-unit handles displaying information to the user through an HD44780-type liquid crystal display. It acts as an I2C master and pulls the information to display from the main board sub-unit as well as from the OCXO board sub-unit. It is built around a CH32V003 MCU.

Close up on the LCD board sub-unit with main board sub-unit

OCXO board sub-unit

The OCXO board sub-unit handles the 10 MHz frequency generation and its control voltage generation, and thus carries the OSC5A2B02 OCXO itself. It is housed inside a thermally insulating foam block to reduce the temperature dependency of the 10 MHz output.

The OCXO adjustment sensitivity is about 1 ppb per millivolt, so the control voltage needs to be quite stable. This stability is provided by a TL431C voltage reference, which is in close thermal contact with the OCXO and thus also at a near-constant temperature. A 16-bit DAC is then implemented using a PWM signal from a CH32V003, followed by heavy filtering to keep the phase noise low.

The OCXO board sub-unit also features a DS18B20 temperature sensor in close thermal contact with the OCXO. For now this is only used for monitoring, but it could perhaps be used for some additional temperature compensation in the future.

The 10 MHz output from the OCXO is not used by any of the circuitry on the sub-unit itself, but is instead passed to the main board sub-unit.

The OCXO board sub-unit acts as an I2C slave and allows setting the OCXO control voltage based on a control word, which it receives from the main board sub-unit. Additionally, the LCD board sub-unit queries it for temperature information.

Block of foam insulation with OCXO board sub-unit inside
Foam cover removed
OCXO board sub-unit removed from block of insulation

Main board sub-unit

The main board sub-unit performs three functions: GNSS data interfacing, OCXO PLL control and clock buffering.

Main board
Reverse side

GNSS data interface

The GNSS data interface has an RS422 transceiver to allow serial data communication with the GNSS receiver. This is used to configure the receiver, as well as to receive a time stamp for each PPS pulse for absolute phase control. The GNSS data interface is built around a CH32V003 and acts as both an I2C master and an I2C slave. As a master, it actively pushes validity and time stamp information to the OCXO phase-locked loop system. As a slave, it allows the LCD board sub-unit to fetch validity, time stamp, positioning and other GNSS information.

PFD controller

The phase-frequency detector controller is the heart of the system and is built around a CH32V003. The MCU is clocked from the OCXO board sub-unit's 10 MHz output. This 10 MHz is internally clock-doubled for a 20 MHz cycle rate at the MCU core.

Via an RS422 receiver, the MCU gets the PPS signal from the GNSS unit. This allows the MCU to count the number of 10 MHz cycles between PPS pulses at half-cycle (i.e. 50 nanosecond) resolution. This, averaged over a long period, represents the true frequency of the OCXO output.

At each PPS, the accumulated phase count of the OCXO is compared to the ideal phase (i.e. the phase derived from the time stamp). The difference of these represents the phase error. The phase error is then propagated through the PLL software controller to compute an OCXO control word.

The OCXO PFD controller MCU acts both as an I2C master and as an I2C slave. As a master, it transmits the OCXO control word to the OCXO board sub-unit. As a slave it allows the LCD board sub-unit to fetch status information regarding the PLL lock.

Clock buffers

The clock buffers isolate the external users of the 10 MHz clock signal from the clock generation. Thus the PLL keeps running even if some clock outputs are e.g. accidentally shorted or otherwise abused. There are a total of four 10 MHz outputs and one PPS output.

Each clock buffer is made from a 74HC04 hex inverter. Five of the inverters are paralleled for the output stage. This is done to provide a strong drive for a 50 ohm output. The sixth inverter is used as a buffer between the OCXO signal and the five paralleled inverters. This reduces the capacitance which the OCXO needs to drive as it now only sees the capacitance of one CMOS input per clock buffer instead of six.

Characterization

I think I've now spent more time measuring and tuning the device than actually building and designing it. In fact, I'm not yet even using it as a time base for any of my lab instruments.

OCXO gain

The OCXO is modeled as having a linear response to its control voltage (see previous post), while the control voltage is assumed to be linear to the control word. The OCXO frequency change per control word change is here called the OCXO gain. In order to tune the controller for the best response, the OCXO gain needs to be measured at the operating point.

OCXO gain measurement

The control word given to the OCXO board sub-unit was alternated between two values, one 1000 counts above the operating point and the other 1000 counts below it. The frequency of the OCXO was then measured against the GNSS PPS signal. Note that since the PFD operates with a clock-doubled OCXO input, the gain is defined with respect to the clock-doubled 20 MHz frequency and not the 10 MHz output.

The clock-doubled OCXO output changed frequency by 0.458 Hz for a control word change of 2000 counts, giving an approximate OCXO gain of 229 micro-Hz per count.
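The arithmetic, as a quick check (values copied from the measurement above):

```python
# Measured: clock-doubled output moved 0.458 Hz for a 2000-count word change.
delta_f_hz = 0.458
delta_word = 2000
ocxo_gain = delta_f_hz / delta_word  # Hz per control word count, = 229 uHz
```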

OCXO PLL time-scale

A critical parameter to choose is the bandwidth of the PLL, because the GNSS PPS is noisy at short time scales, while the OCXO is unstable at long time scales. To gain understanding of the matter, I collected phase error data from the OCXO while it was free running.

For comparison points, I found some stability data published for the OSC5A2B02 as well as for bare GPS PPS signals. These came from the blogs PA1EJO and www.febo.com. Those sources used rubidium standards as their reference clocks, while I could only compare against the GNSS PPS.

Phase error of the OCXO against GNSS PPS
Low frequency phase noise on the 10 MHz output
Modified Allan deviations. My measurement and other published data.

Looking especially at the modified Allan deviation graphs, my data follows the [PA1EJO] GPS PPS data quite well at low frequencies. This indicates that the deviation at those frequencies comes from the PPS and not from the OCXO. At slightly higher frequencies, on the other hand, my data follows the [www.febo.com] OSC5A2B02 data, indicating that there the OCXO is causing the deviation. My OCXO in this measurement does appear to be more stable than the unit in the external data, which could be due to the additional thermal insulation my unit has.

The OCXO has an apparent linear frequency drift, which at such long time scales can easily be corrected by the PLL. To get still more understanding of the stability, I computed the Hadamard deviation as well. It is insensitive to linear frequency drift, and the [PA1EJO] GPS PPS data has this metric published too.

Hadamard deviation comparison

In the Hadamard sense, the OCXO is remarkably stable. There is a slight change in the slope at around 10 seconds time scale, but it isn't anything to really worry about.

It looks like it's only important to select a time scale short enough that the linear frequency drift is compensated, but still long enough that the phase error measurements (through low-pass filtering) become accurate. This is probably a pretty wide range. Mostly it's a matter of getting the scale as short as possible while keeping the control stable.

To explain that last part a bit: the hardware can directly observe the phase error at only 50 ns resolution. This by itself is not nearly good enough for proper control. However, due to jitter in the PPS, the phase error is actually observed alternating between two values. Low-pass filtering this alternating raw error gives much improved resolution. As an additional trick, the controller can provoke jitter in the measurement by deliberately steering the OCXO phase to lie at the midpoint between two values.

Conclusion

I'm still in the process of measuring the response to parameter choices, as well as making tweaks to the controller algorithm itself. There will be a part 3, in which I'll try to get some performance measurements. I'll also try to get the electronics and code published.

Tuesday, July 23, 2024

Single parameter controller for a GPS disciplined oscillator phase locked loop

Introduction

In a previous post I described some of my experiments with building a GPS disciplined oscillator. In my experiments, I lock a voltage controlled oscillator to the PPS signal of a GPS receiver. The locking is achieved by a control loop, which observes the phase error between GPS and the VCO and produces a control signal to adjust the frequency of the VCO.

The GPS PPS signal is very noisy at higher frequencies (above, say, 1 milli-Hz), so the controller must have a low bandwidth to reject the higher-frequency noise. Also, since I don't want to spend a lot of time tuning the controller, I want the bandwidth to be the only parameter.

VCO phase error model

The VCO has a control voltage input $u$, which controls its frequency $f_{vco}$. This relationship is well approximated as linear around an operating point. Thus near our target frequency $f_{target}$ we can model the VCO frequency as $$f_{vco}(t) = f_{target} + g\,(u(t) - u_o),$$ in which $g$ is the VCO gain and $u_o$ is the control value needed to attain the target frequency. The frequency error is thus given by $$\Delta f(t) = f_{target} - f_{vco}(t) = -g\,(u(t) - u_o).$$

As the goal is to achieve phase lock, the quantity of interest is actually the phase error. We take the unit of phase to be a full cycle. This makes the phase error $e$ simply the integral of the frequency error over time: $$\dot{e}(t) = \Delta f(t) = -g\,(u(t) - u_o).$$

As the controller will be implemented in the digital domain, $u$ will be piecewise constant. This allows us to discretize the model as the recurrence $$e_{n+1} = e_n - \Delta t\, g\,(u_n - u_o),$$ in which $\Delta t$ is the time between the discrete changes of $u$, $u_n$ is the value of $u$ for $t \in [n\Delta t,\, (n+1)\Delta t)$ and $e_n = e(n\Delta t)$.

Controller

The simplest form of controller that can drive the phase error to zero is a PI controller. Here the control is determined as a linear combination of the phase error $e$ and its integral $i$ as $$u_n = P e_n + I i_n,$$ in which $P$ and $I$ are tuning parameters of the controller.

Unfortunately for us, our phase error measurement is very noisy at short time scales. With the simple PI controller the noise would couple straight to the control output through the $P$ term. We thus need to consider something else. One could obviously use a longer interval between the measurements, but that introduces its own set of problems. Our solution is instead to apply a simple first-order low-pass filter to the phase error measurement and then combine that with a PI controller.

Since the filter and the controller are implemented in the digital domain, let's now transition to an entirely discretized domain and ignore the continuous domain altogether. In this context, we take our low-pass filter as $$\hat{e}_{n+1} = (1 - \alpha)\hat{e}_n + \alpha e_n,$$ in which $\hat{e}$ is the filtered phase error and $0 < \alpha < 1$ is a parameter defining the filter bandwidth.

The integral of the filtered phase error, similarly, is considered only in the discretized sense, and is given as $$\hat{i}_{n+1} = \hat{i}_n + \hat{e}_n.$$

The control law of the controller is then $$u_n = P\hat{e}_n + I\hat{i}_n,$$ in which $P$ and $I$ are tuning parameters that set the behavior of the controller.

Combining the VCO phase error model with the controller gives the model of the entire system as $$\begin{pmatrix} e_{n+1} \\ \hat{e}_{n+1} \\ \hat{i}_{n+1} \end{pmatrix} = \begin{pmatrix} 1 & -\Delta t\,g P & -\Delta t\,g I \\ \alpha & 1 - \alpha & 0 \\ 0 & 1 & 1 \end{pmatrix} \begin{pmatrix} e_n \\ \hat{e}_n \\ \hat{i}_n \end{pmatrix} + \begin{pmatrix} \Delta t\,g\,u_o \\ 0 \\ 0 \end{pmatrix}$$

For the controller to work, we want the errors to converge. The dynamics of the convergence are determined by the eigenvalues of the matrix. For simplicity, we'll choose all three eigenvalues as r, where 0<r<1.

The characteristic polynomial of the matrix is $$\lambda^3 + (\alpha - 3)\lambda^2 + (\Delta t\,g P\alpha - 2\alpha + 3)\lambda + \Delta t\,g I\alpha - \Delta t\,g P\alpha + \alpha - 1.$$ On the other hand, to have the desired eigenvalues, we want the characteristic polynomial to be $$\lambda^3 - 3r\lambda^2 + 3r^2\lambda - r^3.$$ Matching the coefficients, we get the equations $$\alpha - 3 = -3r$$ $$\Delta t\,g P\alpha - 2\alpha + 3 = 3r^2$$ $$\Delta t\,g I\alpha - \Delta t\,g P\alpha + \alpha - 1 = -r^3$$

Solving those equations yields $$\alpha = 3(1 - r) \qquad P = \frac{1 - r}{\Delta t\,g} \qquad I = \frac{(1 - r)^2}{3\,\Delta t\,g}$$
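The solution can be checked by substituting the gains back into the characteristic polynomial coefficients derived above. A small sketch in plain Python, where the $\Delta t$, $g$ and $r$ values are just example numbers of my own choosing:

```python
def pll_gains(r, dt, g):
    """Controller parameters placing all three eigenvalues at r.

    Note that 0 < alpha < 1 requires 2/3 < r < 1.
    """
    alpha = 3.0 * (1.0 - r)
    P = (1.0 - r) / (dt * g)
    I = (1.0 - r) ** 2 / (3.0 * dt * g)
    return alpha, P, I

def char_poly_coeffs(alpha, P, I, dt, g):
    """Coefficients (c2, c1, c0) of lambda^3 + c2*lambda^2 + c1*lambda + c0."""
    a = dt * g
    return (alpha - 3.0,
            3.0 - 2.0 * alpha + a * P * alpha,
            alpha - 1.0 - a * P * alpha + a * I * alpha)

# Example numbers: 1 s update interval, an assumed VCO gain, eigenvalues at r.
dt, g, r = 1.0, 229e-6, 0.999
alpha, P, I = pll_gains(r, dt, g)
c2, c1, c0 = char_poly_coeffs(alpha, P, I, dt, g)
# Target polynomial: (lambda - r)^3 = lambda^3 - 3r*lambda^2 + 3r^2*lambda - r^3
```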

The resulting matrix is unfortunately non-diagonalizable, making analysis of the resulting dynamics a bit tedious. The controller, however, appears to be well-behaved and produces convergence without too much overshoot.
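The convergence can also be seen in a direct simulation of the discretized system. A minimal sketch; the initial error, $\Delta t$, $g$, $u_o$ and $r$ are arbitrary example values of mine:

```python
def pll_gains(r, dt, g):
    """Gains placing all three closed-loop eigenvalues at r (2/3 < r < 1)."""
    alpha = 3.0 * (1.0 - r)
    P = (1.0 - r) / (dt * g)
    I = (1.0 - r) ** 2 / (3.0 * dt * g)
    return alpha, P, I

def simulate(r, dt, g, u_o, e0, steps):
    """Run the filtered-PI loop against the VCO phase error recurrence."""
    alpha, P, I = pll_gains(r, dt, g)
    e, e_hat, i_hat = e0, 0.0, 0.0
    for _ in range(steps):
        u = P * e_hat + I * i_hat                        # control law
        e_next = e - dt * g * (u - u_o)                  # phase recurrence
        e_hat_next = (1.0 - alpha) * e_hat + alpha * e   # low-pass filter
        i_hat_next = i_hat + e_hat                       # discrete integral
        e, e_hat, i_hat = e_next, e_hat_next, i_hat_next
    return e

# One cycle of initial phase error plus an unknown control offset u_o.
final_error = simulate(r=0.9, dt=1.0, g=229e-6, u_o=1000.0, e0=1.0, steps=500)
```

Despite the unknown offset $u_o$, the integral term drives the phase error to zero, as the triple eigenvalue at $r$ predicts.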

The following figures show simulated responses from a controller with r=0.001.

Impulse response of the controller against phase error change
Impulse response of the controller against control offset change