EMBEDDED SYSTEM DEBUGGING
Application Debugging: Simulators and emulators are two powerful debugging tools that allow developers to debug (and verify) their application code. These tools enable the programmer to perform functional tests and performance tests on the application code. A simulator is software that imitates a given processor or piece of hardware, based on a mathematical model of the processor. Generally, most functional errors in an application can be detected by running it on the simulator. Since the simulator is not the actual device, it may not be an exact replica of the target hardware. Hence, some errors can pass undetected through the simulator. Also, the performance of an application cannot be accurately measured using a simulator (it only provides a rough estimate). Most development tools come as an integrated environment, where the editor, compiler, archiver, linker and simulator are integrated together. An emulator (or hardware emulator) provides a way to run the application on the actual target hardware, but under the control of emulation software. Results are more accurate with emulation, as the application is actually running on the real hardware target.
Hardware Debugging: The developer of an embedded system often encounters problems that are related to the hardware. Hence it is desirable to gain familiarity with some hardware debugging (probing) tools. A digital voltmeter (DVM), an oscilloscope (DSO or CRO) and a logic analyzer (LA) are some of the common tools used in the day-to-day debugging process.
Memory Testing Tools: There are a number of commercially available tools that help programmers to find memory related problems in their code. Apart from memory leaks, these tools can catch other memory related errors - e.g. freeing previously allocated memory more than once, using uninitialized memory etc. Some memory testing tools are also freely (no cost) available.
Debugging an Embedded System
(a) Memory Faults
One of the major issues in embedded systems is memory faults. The following types of memory fault are possible in a system:
(i) Memory Device Failure: Sometimes the memory device may get damaged (some common causes are current transients and static discharge). If damaged, the memory device needs replacement. Such errors can occur at run time. However, such failures are very rare.
(ii) Address Line Failure: Improper functioning of address lines can lead to memory faults. This can happen if one or more address lines are shorted (to ground, to each other or to some other signal on the circuit board). Generally these errors occur during the production of the circuit board, and post-production testing can catch them. Sometimes the address line drivers might get damaged during run time (again due to current transients or static discharge). This can lead to address line faults during run time.
(iii) Data Line Failure: This can occur if one or more data lines are shorted (to ground, to each other or to some other signal). Such errors can be detected and rectified during post-production testing. Again, electric discharge and current transients can damage the data line drivers, which might cause memory failures during run time.
(iv) Corruption of a few memory blocks: Sometimes a few address locations in the memory can be permanently damaged (either stuck at low or stuck at high). Such errors are more common with hard disks (less common with RAMs). The test software (power on self test) can detect these errors and avoid using the affected memory sectors (rather than replacing the whole memory).
(v) Other Faults: Sometimes the memory device may be loosely inserted in the memory slot (or may be completely missing). There is also a possibility of faults in the control signals (similar to the address and data lines).
There are two types of sections in system memory - program (or code) sections, and data sections. Faults in program sections are more critical because the corruption of even a single location can cause the program to crash. Corruption of data memory can also lead to program crashes, but mostly it only causes erratic system behavior (from which the application can gracefully recover, provided that the software design takes care of error handling).
The following simple tests can detect memory faults:
(a) Write a known pattern "0xAAAA" (all odd data bits being "1" and all even bits being "0") in to the memory (across all address ranges) and read it back. Verify that the same value (0xAAAA) is read back. If any odd data line is shorted (to an even data line or to ground), this test will detect it. Now repeat the same test with the data pattern "0x5555". This test will detect any shorting of the even data lines (to ground or to an odd data line). Also, these two tests in combination can detect any bad memory sectors.
(b) Write a unique value in to each memory word (across the entire memory range). The easiest way to choose this unique value is to use the address of the given word as the value. Now read back these values and verify them. If the verification of the read back values fails (whereas test "a" passes), then there could be a fault in the address lines.
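Tests "a" and "b" can be sketched in C roughly as follows. This is a minimal illustration, assuming a 16-bit data bus and a memory region that is safe to overwrite; the function names (mem_test_pattern, mem_test_address, mem_test_all) and the result codes are hypothetical and not taken from any particular library. Since a 16-bit word cannot hold a full address, test "b" here stores the word index truncated to 16 bits, which is unique only for regions of up to 65536 words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes for this sketch. */
enum { MEM_OK = 0, MEM_DATA_FAULT = 1, MEM_ADDR_FAULT = 2 };

/* Test (a): fill the region with one pattern, then read it back.
 * Returns the index of the first mismatching word, or `words` if all match. */
size_t mem_test_pattern(volatile uint16_t *base, size_t words, uint16_t pattern)
{
    assert(base != NULL);
    for (size_t i = 0; i < words; i++)
        base[i] = pattern;
    for (size_t i = 0; i < words; i++)
        if (base[i] != pattern)
            return i;
    return words;
}

/* Test (b): store a value derived from each word's own position
 * (its index, truncated to 16 bits), then read everything back.
 * Returns the index of the first mismatching word, or `words` if all match. */
size_t mem_test_address(volatile uint16_t *base, size_t words)
{
    assert(base != NULL);
    for (size_t i = 0; i < words; i++)
        base[i] = (uint16_t)i;
    for (size_t i = 0; i < words; i++)
        if (base[i] != (uint16_t)i)
            return i;
    return words;
}

/* Runs test (a) with both patterns, then test (b).
 * A failure in (a) points at data lines or bad cells; a failure in (b)
 * after (a) has passed points at the address lines. */
int mem_test_all(volatile uint16_t *base, size_t words)
{
    if (mem_test_pattern(base, words, 0xAAAA) != words)
        return MEM_DATA_FAULT;
    if (mem_test_pattern(base, words, 0x5555) != words)
        return MEM_DATA_FAULT;
    if (mem_test_address(base, words) != words)
        return MEM_ADDR_FAULT;
    return MEM_OK;
}
```

On a target, base would point at the start of the RAM region under test (a board-specific address); on a desktop, an ordinary array can stand in for it when trying the code out.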
The tests "a" and "b" can easily be performed as part of the power on checks of the system. However, it is tricky to perform these tests during run time, because performing them means losing the existing contents of the memory. Nevertheless, certain systems do run such memory tests during run time (once every few days). In such scenarios, the tests should be performed on smaller memory sections at a time. The data of each memory section can be backed up before performing the test, and restored after the test completes. The tests can be run on the sections one by one (rather than running them on the entire memory at once).
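The back up, test and restore sequence for run time checks might look like the sketch below. It is only an illustration under stated assumptions: the section size SECTION_WORDS and the function names are hypothetical, the backup buffer lives on the stack (so the stack itself must not be part of the region under test), and on a real system interrupts or other tasks would have to be kept away from the section while it is being tested.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTION_WORDS 32u   /* hypothetical section size, in 16-bit words */

/* Tests one section without losing its contents.
 * Returns 1 if the section passes, 0 if any word reads back wrong. */
int mem_test_section(volatile uint16_t *section, size_t words)
{
    static const uint16_t patterns[] = { 0xAAAA, 0x5555 };
    uint16_t backup[SECTION_WORDS];
    int ok = 1;

    assert(section != NULL && words <= SECTION_WORDS);

    /* Back up the live data. */
    for (size_t i = 0; i < words; i++)
        backup[i] = section[i];

    /* Write and verify each pattern. */
    for (size_t p = 0; p < sizeof patterns / sizeof patterns[0]; p++) {
        for (size_t i = 0; i < words; i++)
            section[i] = patterns[p];
        for (size_t i = 0; i < words; i++)
            if (section[i] != patterns[p])
                ok = 0;
    }

    /* Restore the original contents, whether or not the test passed. */
    for (size_t i = 0; i < words; i++)
        section[i] = backup[i];

    return ok;
}

/* Walks the whole region one section at a time.
 * Returns 1 if every section passes, 0 at the first failing section. */
int mem_test_runtime(volatile uint16_t *base, size_t words)
{
    for (size_t off = 0; off < words; off += SECTION_WORDS) {
        size_t n = words - off;
        if (n > SECTION_WORDS)
            n = SECTION_WORDS;
        if (!mem_test_section(base + off, n))
            return 0;
    }
    return 1;
}
```

A small section size keeps the time during which the section is unavailable short, at the cost of more passes to cover the whole memory.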
(b) Hardware Vs Software Faults
In an embedded system, software is closely knit with the hardware. Hence, the line dividing hardware issues from software issues is very thin. At times, you may keep debugging your software, whereas the fault may lie somewhere in the hardware (and vice versa). The problem becomes more challenging when such faults occur at random and cannot be reproduced consistently. In order to dissect and debug such tricky issues, a step-wise approach needs to be followed.
* Prepare a test case: When you observe frequent application crashes caused by some unknown issue, you should plan to arrive at a simpler test case (rather than carrying on debugging with the entire application). There are two benefits to this approach. First, a simpler application takes less time to reproduce the error, and hence the total debugging time is reduced. Secondly, there are fewer unknown parameters (which might be causing the error) in a smaller application, and hence less debugging effort is needed. You can gradually strip down the application (such that the error is still reproducible) to get a stripped down version that can act as a preliminary version of the test case.
* Does the error change with operating conditions: Does the frequency or type of error change when the operating conditions change (e.g. board temperature, processor speed etc.)? If so, then your errors might be hardware related (critical timings or glitches on signals).
* Is the error reproducible: Arriving at a test case that can reliably reproduce the error greatly helps the debug process. A random error (not consistently reproducible) is hard to track down, because you can never be sure what is causing the system failure.
* Keep your Eyes Open: Always think laterally. Do not fixate on one possible source of the error. The error could be in the application software, in the driver software, in the processor hardware or in one of the interfaces. Sometimes there are multiple errors to make your life miserable - such errors can be caught only with stripped down test cases (each test case will catch a different error).
* Maintain Error Logs: While writing the application code, you should add a provision for a system log in the debug version (the log can be disabled for the release version). A log of the events just prior to a system crash can tell you a great deal about the possible causes of the failure.
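One possible shape for such a debug-only log is a small ring buffer that always holds the most recent events, so that the events just prior to a crash can be read out afterwards (for example through a debugger). The sketch below is an assumption-laden illustration: the names log_event, log_snapshot and LOG_EVENT, the entry layout, the buffer depth LOG_ENTRIES and the DEBUG_BUILD switch are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOG_ENTRIES 8u      /* hypothetical log depth */

typedef struct {
    uint32_t seq;           /* running event number */
    uint16_t event_id;      /* application-defined event code */
    uint32_t arg;           /* one word of event-specific data */
} log_entry_t;

static log_entry_t log_buf[LOG_ENTRIES];
static uint32_t log_count;  /* total events logged; wrap-around is ignored here */

/* Records one event, overwriting the oldest entry once the buffer is full. */
void log_event(uint16_t event_id, uint32_t arg)
{
    log_entry_t *e = &log_buf[log_count % LOG_ENTRIES];
    e->seq = log_count;
    e->event_id = event_id;
    e->arg = arg;
    log_count++;
}

/* Copies up to `max` of the most recent entries into `out`, oldest first.
 * Returns the number of entries copied. */
size_t log_snapshot(log_entry_t *out, size_t max)
{
    assert(out != NULL);
    size_t stored = log_count < LOG_ENTRIES ? log_count : LOG_ENTRIES;
    size_t n = stored < max ? stored : max;
    uint32_t first = log_count - (uint32_t)n;
    for (size_t i = 0; i < n; i++)
        out[i] = log_buf[(first + i) % LOG_ENTRIES];
    return n;
}

/* The application calls LOG_EVENT(); it compiles to nothing unless the
 * (hypothetical) DEBUG_BUILD switch is defined, so release builds carry
 * no logging overhead. */
#ifdef DEBUG_BUILD
#define LOG_EVENT(id, arg) log_event((id), (arg))
#else
#define LOG_EVENT(id, arg) ((void)0)
#endif
```

A fixed-size buffer keeps the memory cost bounded and needs no dynamic allocation, which suits small targets; the trade-off is that only the last LOG_ENTRIES events survive.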
Most systems perform a POST (Power On Self Test) on start up. The POST may run during the pre-boot phase or during the boot phase. It generally includes tests on the memory and peripherals of the system.