Wednesday, December 4, 2019

GameBoy to VGA

VGA mods for the GameBoy DMG-01 aren't anything particularly new. There are several such projects lying around the internet. BennVenn even used to sell a mod kit.

My main motivation, however, is not to just have a VGA out on my GameBoy, but to teach myself Verilog through a non-trivial project. For development, I started with a cheap Cyclone 1 (EP1C3) based FPGA board I had ordered previously.

Hardware

The board, labeled "FPGA NIOS KIT Board TodayStart", has two big bugs, and I can't really recommend it for this use. The 50 MHz oscillator is connected to CLK3, which is not connected to the PLL on the EP1C3. Also, the PLL supply pin is connected to 3.3V through a 100 ohm resistor, which prevents the PLL from working properly even if you manage to connect a clock to it.

I had to first fix the issues by soldering a wire from the oscillator to CLK0 and by replacing the 100 ohm resistor with a 1 ohm one.

There is some further discussion of the board (in German) on the mikrokontroller.net forums.

The dev board I started the project with.

The display resolution on the GameBoy is 160x144 pixels, with 4 shades of gray per pixel. The pixel aspect ratio is 1:1 and the display is updated at approximately 60 Hz. Although the refresh rate matches VGA, no VGA resolution has so few pixels. Some type of upscaling is required, and thus also some form of image buffering. The EP1C3 has 59904 bits of block memory, which is just enough for buffering one GB frame (160*144*2 bits = 46080 bits).

Scaling each dimension by 4 results in a scaled image of size 640x576. A common resolution close to this is Super VGA at 800x600. Using an 800x600 output and scaling the GB frame by 4x, we are left with 80 pixels of padding on the left and right sides of the image, but only 12 pixels on the top and bottom. Since the image aspect ratios do not match, we obviously can't get rid of the horizontal padding completely. With this choice, however, the vertical padding is about as small as it can get.
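To make the mapping concrete, here is a small sketch of how the SVGA pixel coordinates can be translated into a framebuffer address with the 4x scaling and the 80/12 pixel borders. It assumes horizontal and vertical pixel counters (hx, hy) like the ones described in the VGA output section below; the names are illustrative and the actual code is organized differently.

```verilog
// Illustrative sketch: map SVGA pixel coordinates to a framebuffer address
// for 4x scaling with 80-pixel horizontal and 12-line vertical borders.
module fb_addr_map (
    input  wire [10:0] hx,       // SVGA horizontal pixel counter, 0..1055
    input  wire [9:0]  hy,       // SVGA line counter, 0..627
    output wire        in_image, // high inside the 640x576 scaled GB image
    output wire [14:0] fb_addr   // 160*144 = 23040 two-bit words
);
    assign in_image = (hx >= 11'd80) && (hx < 11'd720) &&
                      (hy >= 10'd12) && (hy < 10'd588);

    // Only meaningful while in_image is high
    wire [7:0] gb_x = (hx - 11'd80) >> 2;  // 0..159
    wire [7:0] gb_y = (hy - 10'd12) >> 2;  // 0..143

    assign fb_addr = gb_y * 8'd160 + gb_x;
endmodule
```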

The hardware to connect the FPGA to the monitor couldn't be much simpler. Sync signals use TTL logic levels, so the 3.3V outputs from the FPGA are directly compatible. As for the video signal itself, I built a two-bit DAC out of two resistors for each color channel. Taking the resistor values of 390 and 820 ohms together with the 75 ohm input impedance of the RGB inputs gives possible output voltages of 0V, 0.24V, 0.49V and 0.73V. This matches the nominal 0-0.7V signal levels specified for the RGB inputs quite well.
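As a quick sanity check, the output level for a given two-bit code follows from the divider formed by the two source resistors and the 75 ohm termination (assuming the more significant bit b1 drives the 390 ohm resistor and the less significant bit b0 the 820 ohm one):

Vout = 3.3V * (b1/390 + b0/820) / (1/390 + 1/820 + 1/75)

which works out to the four levels listed above.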

The LCD drive signals from the GameBoy use 5V CMOS logic, which is incompatible with the FPGA inputs. For now I am using simple resistor divider level converters to bring the 5V down to 3.3V. The resistor values I use are 2.2k and 3.9k. This has a significant effect on the rise and fall times of the signals, which causes some noise issues. These issues are addressed using digital filtering in the Verilog code. I'll definitely be using real level shifter ICs if I make custom boards for the project. The issue could also be mitigated by lowering the resistance values, but that might load the GB outputs too much.
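To give an idea of the kind of filtering I mean, here is a minimal sketch: each GB line is first synchronized to the system clock and then run through a three-sample majority vote, which is enough to suppress glitches caused by the slow edges. The signal names are made up for the example, and the actual filtering in my code differs in the details.

```verilog
// Minimal input conditioning sketch: 2-FF synchronizer followed by a
// 3-sample majority vote to reject glitches on the slow divider edges.
module gb_input_filter (
    input  wire clk,       // 40 MHz system clock
    input  wire async_in,  // raw level-shifted GB signal
    output reg  filtered
);
    reg       sync0, sync1;  // 2-FF synchronizer
    reg [2:0] hist;          // last three synchronized samples

    always @(posedge clk) begin
        {sync1, sync0} <= {sync0, async_in};
        hist <= {hist[1:0], sync1};
        // Output follows the majority of the last three samples
        filtered <= (hist[0] & hist[1]) | (hist[1] & hist[2]) | (hist[0] & hist[2]);
    end
endmodule
```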

VGA output from framebuffer

I started with generating the SVGA output from the FPGA. It turns out that generating a VGA signal in Verilog is quite easy. You need two counters to go through the image and blanking area and then assert the sync signals at the proper ranges of the counter values. For more details, see this timetoexplore.net blog entry.

SVGA 800x600 @ 60 Hz operates with a pixel clock frequency of 40 MHz, so the system PLL was configured to generate that frequency.
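To give an idea of what this looks like, below is a stripped-down sync generator for the standard 800x600 @ 60 Hz VESA timing (1056 clocks per line, 628 lines per frame, both sync pulses active high). It is a simplified sketch rather than the exact code in my project; resets are omitted for brevity.

```verilog
// Simplified 800x600@60 sync generator (VESA timing, 40 MHz pixel clock).
module svga_sync (
    input  wire        clk40,   // 40 MHz pixel clock
    output reg  [10:0] hx,      // horizontal position, 0..1055
    output reg  [9:0]  hy,      // vertical position, 0..627
    output wire        hsync,
    output wire        vsync,
    output wire        visible  // high in the 800x600 active area
);
    always @(posedge clk40) begin
        if (hx == 11'd1055) begin
            hx <= 11'd0;
            hy <= (hy == 10'd627) ? 10'd0 : hy + 10'd1;
        end else begin
            hx <= hx + 11'd1;
        end
    end

    // 800 visible + 40 front porch, then a 128-clock hsync pulse;
    // 600 visible + 1 line front porch, then a 4-line vsync pulse.
    assign hsync   = (hx >= 11'd840) && (hx < 11'd968);
    assign vsync   = (hy >= 10'd601) && (hy < 10'd605);
    assign visible = (hx < 11'd800) && (hy < 10'd600);
endmodule
```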

On each positive clock edge:
  • The horizontal and vertical counters are incremented, or reset if they would overflow
  • A new framebuffer address corresponding to the new counter values is computed
  • A pixel value fetch from the previous framebuffer address is initiated
  • An output pixel value is asserted from the previous fetch
  • Hsync and Vsync signals corresponding with the active output pixel are asserted
The video signals on the output are two clock cycles behind the counters, as both the fetch address computation and the actual fetching of the data happen synchronously. The hsync and vsync signals go through an additional pipeline to keep them synchronized with the display data.
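As a sketch of that pipeline (again with illustrative names, assuming the counter and sync signals from the module above and a framebuffer with a registered read port):

```verilog
// Two-cycle pipeline: cycle 1 registers the read address, cycle 2 is the
// registered RAM read; the sync and blanking signals are delayed to match.
module vga_pipeline (
    input  wire        clk40,
    input  wire        hsync_in, vsync_in, visible_in,
    input  wire [14:0] fb_addr_in,  // combinational address from the counters
    input  wire [1:0]  fb_q,        // registered read data from the 2-port RAM
    output reg  [14:0] fb_addr_r,   // feeds the RAM read address port
    output wire        vga_hsync, vga_vsync,
    output wire [1:0]  pixel
);
    reg [1:0] hsync_d, vsync_d, visible_d;

    always @(posedge clk40) begin
        fb_addr_r <= fb_addr_in;                 // cycle 1: address register
        hsync_d   <= {hsync_d[0],   hsync_in};   // two-stage delay lines so the
        vsync_d   <= {vsync_d[0],   vsync_in};   // syncs line up with the data
        visible_d <= {visible_d[0], visible_in}; // coming out of the RAM
    end

    assign vga_hsync = hsync_d[1];
    assign vga_vsync = vsync_d[1];
    assign pixel     = visible_d[1] ? fb_q : 2'b00;  // blank outside the image
endmodule
```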

I would rather have used an asynchronous memory type for the framebuffer, but it seems there is no such thing as an asynchronous 2-port RAM. At least not within the Altera megafunctions available for this FPGA.

First image displayed by my setup. Framebuffer RAM contents are initialized using a .mif file. Only the red color channel was connected at this stage.

GameBoy signal decoding to framebuffer

The GB LCD board is connected to the main board with a ribbon cable. The image data is transferred using a seemingly simple protocol over five signal lines: vsync, hsync, clock, data0 and data1.

Looking at the signals with a logic analyzer reveals that the GB sets up the data on data0 and data1 on the rising edge of the clock, and the data is valid on the falling edge. However, it is not entirely clear from just a logic trace how hsync and vsync signals are to be understood.

Many projects online demonstrate the decoding of GB LCD signals into images. However, I've never seen anybody really document the mechanism for properly decoding said signals. I, at least, had quite a bit of trouble getting the leftmost column of the image to decode correctly under all circumstances (the GB uses slightly different signal timings depending on whether it is displaying sprite data or background tile data).

For the first testing I used a logic capture published by flashingleds.net in their nintendoscope project. This dataset had the problem that I couldn't compare my reconstruction with what the GB LCD was actually displaying. For this reason I later set up my own logic analyzer and captured data from a screen of the level "The Amazon" of the game "DuckTales". This scene has window tiles, background tiles and sprites. It also features a rendering bug in the final pixels of the window. If you're interested in my data, you can download it here.
Correctly decoded image for the data provided by flashingleds.net.

Correctly decoded image for the opening in "The Amazon" in "DuckTales".

A rising edge on vsync starts a new frame

After the rising edge, the vsync signal remains high for one scanline of data. There doesn't seem to be anything special in the timing of the falling edge. It is the rising edge that determines where the new frame starts.

This was determined by guessing the polarity and checking that rows were decoded in the correct order.
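In the Verilog this boils down to nothing more than a one-register edge detector on the (filtered) vsync line, something along these lines (a fragment with illustrative names):

```verilog
reg       gb_vsync_q;  // previous sample of the filtered vsync
reg [7:0] row;         // write-side row counter, 0..143

always @(posedge clk40) begin
    gb_vsync_q <= gb_vsync;
    if (gb_vsync && !gb_vsync_q)  // rising edge on vsync -> new frame
        row <= 8'd0;
end
```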

A rising edge on hsync starts a new scanline

The GB LCDC will start pushing out the data of a new scanline after hsync goes high. More importantly, however, the falling edge of hsync latches the data for the first visible pixel on a line.

Clock signal is not valid when hsync is high

There are 160 clock pulses between two hsync rising edges. There are 160 columns in the GB display. This seems like a simple one-to-one correspondence, but unfortunately it is not the case. One of the clock pulses occurs while the hsync signal is high. For this pulse, the data is incorrect on both edges. After some investigation and experimentation, it turned out that the correct data for the first pixel is present at the falling edge of hsync.

The data for all other pixels on a line is valid on the falling edge of the clock. The simple approach of sampling all pixels on the falling edge of the clock results in the following decoded images:
The image looks strange on the bottom left, but I couldn't be certain if this is how it was supposed to look.

The first column data decoded on the falling edge of the invalid clock pulse. Notice some black pixels on the left edge, which should not be there. They are a ghost of the treasure chest sprite at the right edge of the previous scanline.
The image from the flashingleds.net data looked weird, but I couldn't be sure if that's how it was supposed to look. However, after doing some preliminary tests with the decoding logic on the actual FPGA, it became evident that there was something wrong with the way I handled the first column. I had artifacts where sprites on the right side of the picture were generating ghosts in the leftmost column.

Turning to the DuckTales data, this can be seen in the leftmost column, where some black pixels appear in places that should be white background. These are a ghost image from the last pixel of the previous scanline (the treasure chest sprite seen on the right side of the image). The background and window tile data look correct here, but that is coincidental.

The figure below shows the beginning of the 120th line of the DuckTales capture. This is one of the lines that contain the treasure chest sprite ghost.
Logic trace of the beginning of the 120th scanline in the capture.
You can clearly see the invalid clock pulse while hsync is high. For the other clock pulses, data is set up on the rising edge and valid on the falling edge. For the invalid pulse, the data remains constant on the rising edge and changes on the falling edge.

The first pixel of this row should be white, which corresponds to 0 on both data lines. This is clearly not the case near the invalid clock pulse. Most of the time the value decodes as 1 on both lines, which results in a black pixel.

Although it is not quite clear in the figure above, there is a small amount of time between the falling hsync edge and the rising edge of the first valid clock pulse, where the data for the next pixel is set up. At the falling edge of hsync the data is 0, which is what we expect. Based on this observation on this single line, I went on to implement this in the decoding. I can't say with absolute certainty that this is the proper way to decode the signals, but at least I haven't noticed any artifacts pop up. Everything has looked the same as on the LCD.
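Putting these rules together, the decoder ends up looking roughly like the sketch below. This is a simplified version written for this post: the signal names are illustrative, the GB inputs are assumed to already be filtered into the 40 MHz domain, and the code in the repository is organized differently.

```verilog
// Simplified sketch of the GB LCD decoding rules described above.
module gb_decoder (
    input  wire        clk40,
    input  wire        gb_vsync, gb_hsync, gb_clk,  // filtered GB signals
    input  wire [1:0]  gb_data,                     // {data1, data0}
    output reg         wr_en,                       // framebuffer write strobe
    output reg  [14:0] wr_addr,
    output reg  [1:0]  wr_data
);
    reg       vsync_q, hsync_q, clk_q;  // previous samples for edge detection
    reg [7:0] col, row;                 // 0..159, 0..143

    wire vsync_rise = gb_vsync & ~vsync_q;
    wire hsync_rise = gb_hsync & ~hsync_q;
    wire hsync_fall = ~gb_hsync & hsync_q;
    wire clk_fall   = ~gb_clk   & clk_q;

    always @(posedge clk40) begin
        {vsync_q, hsync_q, clk_q} <= {gb_vsync, gb_hsync, gb_clk};
        wr_en <= 1'b0;

        // Pixel 0 is latched at the falling edge of hsync; pixels 1..159 are
        // latched at falling clock edges while hsync is low, which skips the
        // bogus clock pulse that occurs during hsync.
        if (hsync_fall || (clk_fall && !gb_hsync)) begin
            wr_en   <= 1'b1;
            wr_addr <= row * 8'd160 + col;
            wr_data <= gb_data;
            if (col == 8'd159) begin
                col <= 8'd0;
                row <= (row == 8'd143) ? 8'd0 : row + 1'b1;
            end else begin
                col <= col + 1'b1;
            end
        end

        if (hsync_rise) col <= 8'd0;  // rising edge on hsync -> new scanline
        if (vsync_rise) row <= 8'd0;  // rising edge on vsync -> new frame
    end
endmodule
```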

Tearing

The VGA output is not synchronized in any way to the GB input. Both are running at approximately 60 frames per second, but not exactly. There is necessarily some beat frequency between the two. I was expecting to see severe tearing artifacts at the places where the GB data input catches up with the VGA output, or where the VGA output overtakes the GB data input. I was pleasantly surprised not to notice any tearing in the output. Due to how the signal is processed, there necessarily is some tearing, but it is much less noticeable than I feared.

Media

My GameBoy attached to the FPGA board. Level shift resistors are inside the yellow heat shrink. VGA DAC resistors are inside the HD15 connector.
The whole setup with the VGA monitor I'm using.


A clip showing the whole setup with me playing Super Mario Land

Some more Super Mario Land gameplay, now zoomed in on the monitor

Code

The Verilog code can be grabbed here. It uses two Altera megafunction blocks: one ALTPLL block for controlling the PLL and one ALTSYNCRAM block configured as a 2-port RAM for the framebuffer.
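For readers without the Quartus megafunctions at hand, a behavioral 2-port RAM along these lines should behave equivalently for the framebuffer (this is just a sketch, not what the project instantiates):

```verilog
// Behavioral equivalent of the 2-port framebuffer: one write port for the
// GB decoder, one registered read port for the VGA output.
module framebuffer (
    input  wire        wr_clk, rd_clk,
    input  wire        wr_en,
    input  wire [14:0] wr_addr, rd_addr,
    input  wire [1:0]  wr_data,
    output reg  [1:0]  rd_data
);
    reg [1:0] mem [0:23039];  // 160*144 pixels, 2 bits each

    always @(posedge wr_clk)
        if (wr_en) mem[wr_addr] <= wr_data;

    always @(posedge rd_clk)
        rd_data <= mem[rd_addr];  // registered read, one cycle of latency
endmodule
```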

The whole thing is mostly written in one file and there is no test bench or any other testing. It is a work in progress and not complete in any way.