PDP-11 uWord Repair

In December of 2024, my PDP-11 developed a new fault in the uWord board while I was trying out the UniBone for the first time. The system went from mostly functional to almost completely dead!

Symptoms

Address Latch

The address latch appeared to be working correctly. I was able to change the address on the bus without any issue.

Examine

Pressing examine would result in one of two behaviours:
1. Gibberish would be loaded onto the data display.
2. The system would exit the halt state and start trying to run.

Deposit

Pressing the deposit button would scramble the upper bits and increment the LSB without depositing any valid value. Looking at the memory card, I could see that it had detected a bus error (red LED lit).

KM11

Only later in my testing did I find that I was unable to single-step through the microcode. The KM11 had no effect on the system once MCLK was enabled. Once MCLK was enabled, it was impossible to disable it and restore the system's RECLK.

Working Theory

For the last few days before the fault, I had been testing the instructions, and the system had been working well without any obvious malfunctions.

The next stage of the testing was to install my newly built UniBone. I installed the UniBone and powered up the system. I spent some time playing around with the UniBone, but I didn't make much progress. I then went over to use the console to test the processor and found that it was no longer working. This was not an immediate concern, as I had read that the UniBone could stop the system from working if not configured correctly. After removing the UniBone, I was dismayed to find that the processor was still not working correctly.

The most obvious potential cause was that the UniBone had damaged the bus in some way.

Debugging The UniBone

The bus of the PDP-11 is wired-OR; that is to say, the bus lines are pulled HIGH with pull-up resistors. When something on the bus wants to assert a value, it needs to pull the line down; this is done with an open-collector driver. The main feature of a wired-OR bus is that multiple devices can simultaneously assert the same line without causing any damage; I2C works in this way.
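The wired-OR behaviour can be sketched in a few lines of Python. This is only a logical toy model, not an electrical simulation; the function name `bus_line_level` and the three-device example are made up for illustration.

```python
def bus_line_level(drivers_asserting):
    """Return the level of one open-collector bus line.

    drivers_asserting: one boolean per device; True means that device has
    switched its open-collector output on and is pulling the line low.
    """
    # The pull-up resistor holds the line HIGH unless at least one device
    # pulls it down. Extra devices pulling down change nothing.
    return "LOW" if any(drivers_asserting) else "HIGH"


idle = bus_line_level([False, False, False])        # nobody asserting
one_device = bus_line_level([True, False, False])   # one device asserting
two_devices = bus_line_level([True, True, False])   # two at once, no conflict
```

Because no device ever drives the line high itself, two devices asserting at once gives the same result as one, which is why this arrangement is safe.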

For the UniBone to have damaged the PDP-11, it would have had to either feed a large voltage into the bus or pull the bus high with a very strong pull-up. If, for example, the bus was shorted to 5 V when the bus drivers attempted to pull the line low, the bus drivers would have to sink a huge amount of current and would fail.
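To put rough numbers on that, Ohm's law gives the current a driver must sink to hold a line low against its pull-up. The resistor values below are purely illustrative assumptions, not measurements from the UniBone or the PDP-11.

```python
SUPPLY_VOLTS = 5.0


def sink_current_ma(pullup_ohms, v_low=0.0):
    """Current in mA a driver must sink to hold the line at v_low volts."""
    return (SUPPLY_VOLTS - v_low) / pullup_ohms * 1000.0


# Hypothetical values chosen only to show the scale of the effect.
gentle = sink_current_ma(1000.0)  # 1 kilohm pull-up
strong = sink_current_ma(10.0)    # 10 ohm pull-up, close to a short to 5 V
```

The weaker the resistance between the line and 5 V, the more current the driver has to sink, and a direct short leaves only the driver's own resistance to limit it.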

To check for any issues with the UniBone, I powered it from my bench power supply and checked the output pins, looking for any anomalies... I soon found one!
Checking BG5, BG6, BG7 and NPG, I saw that they were being pulled up to 5 V! Ordinarily, those lines are pulled to about 3 V; they should not have been at 5 V. Looking into the cause, I found that I had used the wrong type of resistor pack, and the result was a moderately strong pull-up. There was the potential that this stronger pull-up could have caused some of the bus drivers to burn out, specifically NPG.

Debugging The Unibus

With my theory that one of the bus drivers had failed, I had a look at the schematics and ordered some replacement bus drivers. I also ordered a set of UniProbes so that I could examine the Unibus signals and hopefully identify any missing signals.

I spent a day studying the bus protocol for memory reads and writes in preparation. With my newly acquired UniProbes installed, I began to record some of the bus transactions. When I analysed the bus transactions, I was able to see that all the signals I expected to be there were present and looked to be working mostly correctly. The only issue I could see was that some of the bus transactions were repeated multiple times or got stuck in the wrong state but, in general, things were happening in the correct sequence. This put into doubt my working theory...

To gather bus transactions a little more accurately, I decided I would single-step the system. I installed the KM11, switched on MCLK and tried to single-step the system. To my surprise, it completely refused to do anything! Furthermore, I was unable to get the processor to resume once MCLK had been disabled.

Debugging The Single Step

The PDP-11 uses a signal called RECLK to trigger the next clock cycle; RECLK is generated at the end of the current clock cycle. When MCLK is enabled, the RECLK signal is gated, preventing the next clock cycle from running. When the user single-steps with the KM11, it manually injects RECLK to trigger another clock cycle.
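As a rough illustration of that mechanism, here is a toy Python model of the RECLK chain. The class and method names (`ClockChain`, `start`, `manual_step`) are hypothetical, and the model ignores all real timing; it only captures the idea that each cycle re-triggers the next unless MCLK gates RECLK off.

```python
class ClockChain:
    """Toy model: every cycle ends by generating RECLK for the next one."""

    def __init__(self, mclk_enabled=False):
        self.mclk_enabled = mclk_enabled  # KM11 maintenance clock switch
        self.cycles_run = 0

    def _cycle(self):
        """Run one clock cycle; return True if its RECLK gets through."""
        self.cycles_run += 1
        return not self.mclk_enabled  # MCLK gates the automatic RECLK

    def start(self, limit=5):
        """Free-run until RECLK is gated off or `limit` cycles have run."""
        reclk = True
        while reclk and self.cycles_run < limit:
            reclk = self._cycle()

    def manual_step(self):
        """KM11 injects one RECLK by hand, so exactly one cycle runs."""
        self._cycle()


free = ClockChain()                    # MCLK off: the clock keeps itself going
free.start()

stepped = ClockChain(mclk_enabled=True)
stepped.manual_step()                  # each manual pulse gives one cycle
stepped.manual_step()
```

In a healthy machine the stepped case advances one cycle per pulse; the fault described here is that the injected pulse never got that far.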

Following the clock pulse from the KM11 through the PDP-11 timing circuitry, I could see that it was not getting past a couple of NAND gates. The two gates in question are used to decide what type of clock pulse will be generated (CL1, CL2, CL3).

Examining the inputs to the gates of E63, I was able to see that pins 10 and 13 were both low; these carry the signals CLK1 (0)H and CLK1 (1)H. With a low input on both gates, there was no possibility of the clock pulse ever making it through.

The source of CLK1 (0)H and CLK1 (1)H is a 74175, which is essentially four flip-flops in a single chip. Looking closely, we can see that the signals we are interested in are Q and Q̅ (if Q is 1, Q̅ has to be 0). As both Q and Q̅ were 0, that strongly implies that one of the outputs had failed.
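The reasoning can be checked with a small logic sketch in Python. It assumes each of the two signals enables one 2-input NAND gate with the clock pulse on the other input, which is a simplification of the real schematic; the function names are made up for illustration.

```python
def nand(a, b):
    """2-input NAND: the output is low only when both inputs are high."""
    return int(not (a and b))


def outputs_are_complementary(q, q_bar):
    """A healthy flip-flop always drives Q and Q-bar to opposite levels."""
    return q != q_bar


def pulse_passes(enable, pulse_levels=(0, 1, 0)):
    """True if the NAND output changes as the pulse goes 0 -> 1 -> 0."""
    outputs = {nand(enable, level) for level in pulse_levels}
    return len(outputs) > 1


# Levels measured on the two signals in the text: both low.
measured_q, measured_q_bar = 0, 0
healthy = outputs_are_complementary(measured_q, measured_q_bar)
either_gate_open = pulse_passes(measured_q) or pulse_passes(measured_q_bar)
```

With both enables low, each NAND output stays high regardless of the pulse, so `either_gate_open` is false. A working flip-flop would always hold one of the two gates open.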

The Fix

Having identified E36 on the uWord board as bad, I swapped the board for another one, and the system sprang to life!

I have ordered a replacement 74175 and will install it when it arrives.

Conclusion

Ultimately, the UniBone was not in any way responsible for the failure. I did, however, identify a mistake that may have caused problems later on.

The failure was a bad flip-flop in the clock logic and not a bus issue as originally suspected. Even when not running in single-step mode, the output from the flip-flop was still bad, permanently stuck at 0. This must have caused the system to generate the wrong clock sequences, which in turn caused the computer to malfunction.