The amateur radio payload on SumbandilaSat continues
to woo radio amateurs around the world thanks to SunSpace
engineer Niki Steenkamp’s rescue operation. Unfortunately for
SA radio amateurs, SumbandilaSat is not available for transponder
operation while images are being downloaded.
In early June, for an unknown reason (but probably related to a major radiation event on 7 June 2011 – see www.space.com/11919-solar-flare-june-7-solar-weather-coronal-mass-ejections.html), the primary controller on the power distribution unit (PDU) powering the on-board computer (OBC) stopped responding to commands from the ground station. It seemed that the microcontroller code had been corrupted and was executing erratically. Thanks to the ingenuity of SunSpace engineers, however, the problem was resolved and SO67 is back in operation.
Niki Steenkamp of SunSpace, who worked on the problem, said that the power to each component on the satellite can be switched on or off through a set of four dual-redundant power distribution units (PDUs). Each PDU contains eight switches, which are controlled by a small microcontroller.
"Each switch has an electronic trip functionality to protect the switch (and the satellite) in the event of a load failure. The trip can be triggered by a number of sources, including over-current and feedback from the load. The feedback from the load allows the load to monitor itself in more detail and to initiate a trip when an anomaly arises. Each PDU has two separate and identical sets of switches and controller to be dual redundant. Each switch can be switched on in a number of modes (like auto reset after trip, watchdog timeout, trip override etc.).
"Normally we would just switch over to using the backup controller, but due to the PDU design and the specific failure mode of the primary controller, it caused the power switch on the backup controller to trip the moment it is switched on. We have determined that this trip was not a "legitimate" trip and was caused by the state of the primary controller. In other words, there was nothing wrong with the OBC. We could instruct the backup controller to switch on in "override" mode which would ignore all trips, and this worked, but it was not safe since in override mode the switch looses some functionality required for the OBC to operate reliably (the normal mode used for the OBC is called AutoWatchdog mode and includes auto reset after a trip and a watchdog mechanism to power cycle the OBC should it become unresponsive). What we needed was an AutoWatchdog mode, but with a hardware trip override active.
"There were two options: either do a full firmware update of the backup PDU controller from the ground station or hack the back-up PDU. The first option is tricky and to some extent a risky operation.
"The second option was to see of we could "hack" the backup PDU to behave like we wanted it to. After some investigation it was found that a hack may be possible. When switching on a channel in auto watchdog mode, the controller first sets the hardware override for a short duration to prevent the in-rush current from tripping the switch and then removes the override. Looking at the disassembled machine code, I found a sub-routine call instruction which cleared the override. All I needed to do was to overwrite the call with no-operation (NOP) instructions to prevent the override from being removed."
A low-level debugging command was used to selectively program the three bytes of the call instruction to 0x00 (the operation code of a NOP) in the program flash memory. This modified the code, so the cyclic redundancy check (CRC) performed at boot-up failed and the code would not boot.
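To make the side effect concrete, here is a minimal sketch of such a patch applied to a made-up 64-byte image. The image contents, the offset of the call and the choice of CRC are assumptions for illustration only: the article does not say which CRC the PDU uses, so Python's `binascii.crc_hqx` (a CRC-16 with the CCITT polynomial) merely stands in for it.

```python
import binascii

NOP = 0x00  # opcode value quoted in the article


def crc16(image: bytes) -> int:
    # Stand-in checksum (CRC-16, CCITT polynomial); the real boot CRC is not stated.
    return binascii.crc_hqx(image, 0xFFFF)


def patch_call_with_nops(image: bytes, call_offset: int, call_length: int = 3) -> bytes:
    """Return a copy of the image with the call instruction overwritten by NOPs."""
    patched = bytearray(image)
    patched[call_offset:call_offset + call_length] = bytes([NOP]) * call_length
    return bytes(patched)


# Made-up "firmware" and a hypothetical location for the three-byte call.
firmware = bytes(range(1, 65))
CALL_OFFSET = 16

patched_firmware = patch_call_with_nops(firmware, CALL_OFFSET)

# The boot-time check compares the image against the stored reference value.
boot_check_passes = crc16(patched_firmware) == crc16(firmware)
```

Here `boot_check_passes` ends up `False`: changing just three bytes is enough to alter the checksum, which is why the patched code would refuse to boot without a further correction.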
"I determined what values to write to an unused memory location within the code so that the CRC again matches the original. This was tested on the ground first and verified to be working. It was then done during an overpass and since then we could switch on the OBC in the required auto watchdog mode, with the hardware override permanently active. Work is continuing to try and recover the primary controller of the particular PDU, but for the time being, we can continue with the normal satellite operations.