Saturday, April 26, 2008

Self-healing computers for NASA spacecraft

A great usability feature ...

"As you can guess, hardwired computer systems are much faster than general-purpose ones because they are designed to do a single task. But when they fail, they need to be totally reconfigured. This may be only a costly problem in a lab on Earth, but it can be critical in space. This is why a University of Arizona (UA) team is working with NASA to design self-healing computer systems for spacecraft. The UA engineers are working on hybrid hardware/software systems using Field Programmable Gate Arrays (FPGAs) to develop these reconfigurable processing systems. As the lead researcher said, 'Our objective is to go beyond predicting a fault to using a self-healing system to fix the predicted fault before it occurs.'

This research work has been led by Ali Akoglu, an assistant professor in UA's Electrical and Computer Engineering Department, and his students in the Reconfigurable Computing Laboratory (RCL). Three of these students, Kevin Carr, Adarsha Sreeramareddy and Jeff Josiah, are pictured above with the FPGA circuits they're working on. (Credit: UA) You'll find more details about this project -- and a larger version of the above picture -- on this page describing the project dubbed SCARS (for "Self-Configurable Architecture for Reusable Space Systems"), which is being carried out in collaboration with NASA's Jet Propulsion Laboratory.

Now, what is the UA team working on? "Currently, they are testing five hardware units that are linked together wirelessly. The units could represent a combination of five landers and rovers on Mars, for instance. 'When we create a test malfunction, we try to recover in two ways,' explained Akoglu. 'First, the unit tries to heal itself at the node level by reprogramming the problem circuits.'"

But what happens if this doesn't work? "If that fails, the second step is for the unit to try to recover by employing redundant circuitry. But if the unit's onboard resources can’t fix the problem, the network-level intelligence is alerted. In this case, another unit takes over the functions that were carried out by the broken unit. 'The second unit reconfigures itself so it can carry out both its own tasks and the critical tasks from the broken unit,' Akoglu explained. If two units go down and can't fix themselves, the three remaining units split up the tasks. All of this is done autonomously without human aid."
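The escalation Akoglu describes (reprogram the circuits, then fall back to redundant circuitry, then hand the work to other units) can be sketched in a few lines of Python. This is only an illustration of the idea, not the UA team's code: the `Unit` class, the `reprogram_ok` and `spare_ok` flags, and the `recover` function are all hypothetical names, and giving each orphaned task to the least-loaded surviving unit is an assumption. The article only says that another unit takes over, or that the remaining units split up the tasks.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Unit:
    """One hardware unit (say, a lander or rover) in the network. Hypothetical model."""
    name: str
    tasks: List[str] = field(default_factory=list)
    healthy: bool = True
    # Stand-ins for the real FPGA operations (assumptions, not a real API):
    reprogram_ok: bool = False  # would reprogramming the problem circuits succeed?
    spare_ok: bool = False      # would switching to redundant circuitry succeed?


def recover(unit: Unit, network: List[Unit]) -> str:
    """Escalate through the three recovery levels and report which one applied."""
    # Level 1: node-level healing by reprogramming the problem circuits.
    if unit.reprogram_ok:
        return "reprogrammed"
    # Level 2: recover with redundant circuitry on the same unit.
    if unit.spare_ok:
        return "redundant"
    # Level 3: onboard resources failed, so the network takes over.
    unit.healthy = False
    survivors = [u for u in network if u.healthy]
    if not survivors:
        return "lost"
    orphaned, unit.tasks = unit.tasks, []
    for task in orphaned:
        # Assumed policy: give each task to the least-loaded healthy unit.
        target = min(survivors, key=lambda u: len(u.tasks))
        target.tasks.append(task)
    return "reassigned"
```

With five units carrying two tasks each, failing two of them in turn leaves all ten tasks spread across the three survivors, which matches the "three remaining units split up the tasks" scenario in the quote. A real system would also have to decide which tasks are critical enough to move; this sketch moves all of them.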

For more information, you can read a technical paper presented by the UA team on February 26, 2007 at the IEEE NASA/ESA Conference on Adaptive Hardware and Systems (AHS) held in Edinburgh, UK. The title of the paper is "Hierarchical Built-in Self-testing and FPGA Based Healing Methodology for System-on-a-Chip." Here is a link to the abstract.

Here is the end of the abstract. "We introduce a novel self-healing on the fly mechanism for system-on-chip (SoC) using field programmable gate array (FPGA) technology that localizes and isolates the faulty area and then replaces the functionality through partial configuration of the FPGA. Even though isolation mechanism requires additional control circuitry, overall area overhead is greatly reduced by eliminating the need for redundant components on the chip. In case of no fault, FPGA resources are available for additional functionality that might be required in time."

(Continued via Roland Piquepaille's Technology Trends) [Usability Resources]

