Summary of the content on page 1
Application Report
SPRAA56 – September 2004
DSP/BIOS Real-Time Analysis (RTA) and Debugging
Applied to a Video Application
Brian Jeff DSP Field Software Applications
Arnie Reynoso Software Development Systems
ABSTRACT
DSP/BIOS and the Reference Frameworks allow developers to non-intrusively instrument
real-time applications. The software provided with this application note applies real-time
analysis (RTA) services to a working application, an H.263 encode/decode loopback
example for the
Summary of the content on page 2
Figures
Figure 1. Basic Data Flow of the Video Application ........ 4
Figure 2. Detailed Application Data Flow Showing Memory Buffers ........ 8
Figure 3. Task Partitioning in the Modified Application ........ 9
Figure 4. CPU Load Measurement at Run-Time ........ 15
Fig
Summary of the content on page 3
Quantization is the process of dividing a continuous range of input values into a finite number of subranges. Each subrange is assigned a specific output value. The Q factor, or quantization factor, describes the level of quantization used to store the frequency domain representation of the encoded image. The Q factor often varies dynamically in an encoder when a constant bitrate is targeted, so it is useful to display the Q factor dynamically with the video stream. Frame type designat
Summary of the content on page 4
Figure 1 shows a simplified view of the sequential flow of capture, processing, and display tasks in the application.
Figure 1. Basic Data Flow of the Video Application
(Diagram labels: Camera; Device Driver; TSK tskInput; TSK tskVideoProcess; TSK tskOutput; Device Driver; SCOM)
Before video data reaches the first stage, it must be converted to digital data, a process that is managed by the input device driver. Analog video input is converted by an on-board NTSC decoder chip into a digital
Summary of the content on page 5
2.1 DSP/BIOS and RF5 Components Used
The base application leverages various DSP/BIOS real-time analysis components to support debugging capabilities that are not intrusive to the system performance. The following three modules are included with the core DSP/BIOS library and can be used in any application that uses DSP/BIOS, on any TI DSP supported by DSP/BIOS:
• LOG: Logging events
• STS: Statistics accumulators
• TRC: Control of real-time capture
In addition to these DS
Summary of the content on page 6
2.1.2 STS
An STS object accumulates the following statistical information about an arbitrary 32-bit-wide data series: count, total, and maximum. Statistics are accumulated in 32-bit variables on the target DSP and in 64-bit variables on the host PC. When the host polls the target for real-time statistics, it resets the variables on the target. This minimizes space requirements on the target, while allowing you to keep statistics for long test runs. As part of using the DSP/BIOS
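The split between 32-bit target accumulators and 64-bit host accumulators described above can be illustrated with a small host-side sketch. This is not the DSP/BIOS STS implementation; the type and function names (TargetStats, HostStats, target_add, host_poll) and the choice of INT32_MIN as the reset value for the maximum are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a target-side statistics object (32-bit fields). */
typedef struct {
    int32_t count;
    int32_t total;
    int32_t max;
} TargetStats;

/* Hypothetical host-side mirror that keeps long-run statistics (64-bit fields). */
typedef struct {
    int64_t count;
    int64_t total;
    int64_t max;
} HostStats;

/* Reset the target accumulators; INT32_MIN as the "no sample yet" maximum is an assumption. */
static void target_reset(TargetStats *t)
{
    assert(t != NULL);
    t->count = 0;
    t->total = 0;
    t->max = INT32_MIN;
}

/* Accumulate one 32-bit sample on the target: count, total, and maximum. */
static void target_add(TargetStats *t, int32_t value)
{
    assert(t != NULL);
    t->count += 1;
    t->total += value;
    if (value > t->max) {
        t->max = value;
    }
}

/* Initialize the host-side accumulators. */
static void host_init(HostStats *h)
{
    assert(h != NULL);
    h->count = 0;
    h->total = 0;
    h->max = INT64_MIN;
}

/* Host poll: fold the target values into the 64-bit totals, then reset the target. */
static void host_poll(HostStats *h, TargetStats *t)
{
    assert(h != NULL && t != NULL);
    h->count += t->count;
    h->total += t->total;
    if (t->count > 0 && (int64_t)t->max > h->max) {
        h->max = t->max;
    }
    target_reset(t);
}
```

Because each poll clears the target, the 32-bit fields only need to hold the samples gathered between two polls, while the host totals keep growing across a long test run.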
Summary of the content on page 7
2.2 Requirements for Viewing RTA Benchmarks
In order for any of the DSP/BIOS-based RTA tools to be visible, the DSP/BIOS components in Code Composer Studio version 2.30 or earlier and version 3.0 require that the application's .cdb configuration file be accessible and consistent with the executable .out file. This requirement is easily met during development. It can also be satisfied in demonstrations or delivered test examples. If you do not want to deliver source code with the ap
Summary of the content on page 8
(Figure residue: application data flow diagram showing memory buffers. Recoverable labels: 720x576; Device Driver; Buffer, 3 frames; Yuv 422to 420 (appears twice); H.263 enc; H.263 dec; bitBuf; y, Cb, Cr; YAfter420, CbAfter420, CrAfter420; CbArrau; Shared Scratch; scratch1; scratch2; Instance memory. Sizes shown: 414 KB, 414 KB, 512 KB, 207 KB, 207 KB, 6 KB, 92 KB, 1.5 KB, 14 KB = 20 lines, 14 KB. Key: Internal Memory; DMA Read/Write (background); External)
Summary of the content on page 9
    if (controlVideoProc.frameRateChanged) {
        txMsg.cmd = FRAMERATECHANGED;
        txMsg.arg1 = chanNum;
        txMsg.arg2 = controlVideoProc.frameRateTarget;
        controlVideoProc.frameRateChanged = FALSE;
        MBX_post( &mbxProcess, &txMsg, 0 );
    }
While implementing control via the host PC did not specifically require a separate task in the modified application, adding a discrete control task makes the application more scalable. For example, a user interface or communications link from a
Summary of the content on page 10
This call returns a status structure of type IH263ENC_Status that contains the number of bits sent to the encoder, the frame type, and other data. The features implemented in the control API can vary widely from one algorithm to another. The bitrate and frame type measured by this API may not be available with all third-party video algorithms unless specifically requested. Thus, it is important that the encoder and decoder algorithms used by your application have the necessary hook
Summary of the content on page 11
4 RTA Techniques for Performance Measurement
The RTA techniques described in this section are largely application-specific calls to DSP/BIOS RTA services via APIs in the run-time code. These API calls can be added to any application without modifying its logical structure. In the case of the video application, the performance overhead of the RTA tools is expected to be minimal because the calls are made at the frame rate of 30 or 25 Hz, or even in some cases every 30 or 25 frames, a v
Summary of the content on page 12
4.2 Measuring Task Scheduling Latencies
Scheduling latency is defined as the time between a wakeup signal (semaphore post) to a pending task and the actual start of that task's execution. DSP/BIOS provides a mechanism for measuring scheduling latency with the TSK_settime and TSK_deltatime APIs. These functions accumulate the difference in time from when a task is made ready to the time TSK_deltatime is called. The placement of the TSK_deltatime API therefore determines what is act
Summary of the content on page 13
The low-resolution CLK_getltime API is used instead of the high-resolution CLK_gethtime because the range of the latency is known to be on the order of one or more frame times, where a frame time is 33.33 ms in NTSC systems. The low-resolution timing measurement provided by CLK_getltime is more cycle-efficient and is in milliseconds. Since the data is displayed in milliseconds, the lower-resolution time base results in a faster measurement, with sufficient accuracy for the latency
Summary of the content on page 14
    last30frame.current = CLK_getltime();
    // check to see if we dropped any frames
    benchVid.framesDropped.current = last30frame.current - last30frame.previous;
    benchVid.framesDropped.current -= 1000*(frameCnt / DISPLAYRATE);
    benchVid.framesDropped.current /= DISPLAYRATE;
    last30frame.previous = last30frame.current;
    if (benchVid.framesDropped.current > 0 && frameRateTarget == DISPLAYRATE ) {
        LOG_error("Dropped %d frames", benchVid.framesDropped.current);
        UTL_logDebug2(
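The dropped-frame arithmetic in the listing above can be checked on a host PC with a small stand-alone sketch. It assumes DISPLAYRATE is 30 and that frameCnt is 30 when the check runs; neither value is confirmed by the excerpt. The function name frames_dropped is hypothetical. As read here, the code subtracts the expected elapsed time from the measured time, then divides the excess by DISPLAYRATE, which treats a frame as lasting about 30 ms rather than 33.3 ms.

```c
#include <assert.h>
#include <stdint.h>

#define DISPLAYRATE 30 /* assumed value: frames per second of the display */

/*
 * Hypothetical helper that mirrors the three arithmetic steps of the listing:
 *   1. elapsed milliseconds since the previous report,
 *   2. minus the milliseconds that frameCnt frames should have taken,
 *   3. divided by DISPLAYRATE to approximate a count of frames.
 */
static int32_t frames_dropped(int32_t now_ms, int32_t prev_ms, int32_t frameCnt)
{
    int32_t excess;

    assert(frameCnt >= 0);
    excess = now_ms - prev_ms;                 /* measured time for the window */
    excess -= 1000 * (frameCnt / DISPLAYRATE); /* expected time for the window */
    return excess / DISPLAYRATE;               /* excess ms -> approximate frames */
}
```

With these assumptions, a 30-frame window that takes exactly 1000 ms yields 0, and a window that takes 1100 ms yields 3, since 100 / 30 truncates to 3 in integer division.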
Summary of the content on page 15
(Diagram labels: 'minloop' (in units of ~cycles); 'count' is the number of hits of LOAD_idlefxn in the window from t0 to t1; Window = 500 ms (default); 100 - IDL load gives the application CPU load)
cpuload = (100 - ((100 * (count * minloop)) / total))
Figure 4. CPU Load Measurement at Run-Time
The LOAD module relies on an IDL thread to be inserted in an application to calibrate the amount of time needed to run a single iteration of the DSP/BIOS idle loop. It estimates the CPU load by dividing the idled tim
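The cpuload formula from Figure 4 can be tried out with a minimal sketch. It assumes that minloop and total are expressed in the same time units and that total is the length of the measurement window. The function name cpu_load_percent, the 64-bit intermediate, and the clamp are illustrative additions that are not part of the source formula.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper implementing
 *   cpuload = 100 - ((100 * (count * minloop)) / total)
 * count   : idle-loop iterations observed in the window
 * minloop : shortest observed time for one idle-loop iteration
 * total   : window length, in the same units as minloop (assumed)
 */
static uint32_t cpu_load_percent(uint32_t count, uint32_t minloop, uint32_t total)
{
    uint64_t idle;
    uint64_t idle_pct;

    assert(total > 0);
    idle = (uint64_t)count * minloop; /* time spent idling; 64-bit avoids overflow */
    if (idle > total) {
        idle = total;                 /* clamp: a guard added here, not in the source */
    }
    idle_pct = (100u * idle) / total; /* idle share of the window, in percent */
    return (uint32_t)(100u - idle_pct);
}
```

For example, 1000 idle iterations of 50 units each in a window of 100000 units give an idle share of 50 percent, so the estimated CPU load is 50.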
Summary of the content on page 16
In video applications that handle the full resolution of 720x480, each frame contains about 675 KB of data. Such applications must constantly move video frames from internal working memory buffers to external frame buffers and back. This often results in several MB of memory transfers through the external bus for each frame. At 30 frames per second, the memory transfer bandwidth requirement can be a significant CPU resource requirement. As resolutions increase to high-definition siz
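The 675 KB figure above is consistent with 2 bytes per pixel, as in a YUV 4:2:2 frame; that pixel format is an assumption here, not something the excerpt states. The sketch below reproduces the arithmetic and extends it to the bandwidth of a single pass over each frame at a given frame rate. Function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes in one frame; bytes_per_pixel = 2 is assumed for YUV 4:2:2. */
static uint32_t frame_bytes(uint32_t width, uint32_t height, uint32_t bytes_per_pixel)
{
    assert(width > 0 && height > 0 && bytes_per_pixel > 0);
    return width * height * bytes_per_pixel;
}

/* Bytes per second moved if every frame crosses the external bus exactly once. */
static uint64_t pass_bytes_per_second(uint32_t bytes_in_frame, uint32_t fps)
{
    return (uint64_t)bytes_in_frame * fps;
}
```

A 720x480 frame at 2 bytes per pixel is 691,200 bytes, or 675 KB. One pass over each frame at 30 frames per second is 20,736,000 bytes per second, and an application that moves each frame several times multiplies that figure accordingly.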
Summary of the content on page 17
These estimates are fairly accurate for the color conversion functions in the input and display tasks, but the estimates are less accurate for the encoder and decoder algorithms in the processing task. Ideally, the memory bus utilization should be available in the status structure or estimated on the data sheet of an algorithm. It is recommended that you request this information from third-party algorithm providers during application development, particularly for applications above
Summary of the content on page 18
Most current encoders use three primary frame types: Intracoded frames, Predicted frames, and Bidirectional predicted frames. These are referred to as I, P, and B frames. The H.263 encoder supplied with the example application encodes I and P frames only, but you can configure the ratio of I to P frames. Often this ratio is used in the quality vs. bitrate tradeoff. The H.263 encoder has hooks to allow for monitoring or selecting the frame type. This example application only monitor
Summary of the content on page 19
The benchmarking routines send out selected benchmark data at a prescribed interval: every 30th frame, every I (Intracoded) frame, or only on a dropped frame. The interval can be selected by controlling the .rtaMode variable within the control structure. Benchmark data is transmitted to CCStudio on the host PC via RTDX (Real-Time Data eXchange), which is used behind the scenes by the DSP/BIOS RTA tools. RTDX allows Code Composer Studio to read from or write to target buffers i
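The three reporting intervals described above could be selected with a check along the following lines. This is a sketch only: the enum values, the function name, and its parameters are hypothetical, and the actual encoding of the .rtaMode variable is not shown in this excerpt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding of the reporting mode; the real values are not given in the text. */
typedef enum {
    RTA_EVERY_30TH_FRAME,
    RTA_EVERY_I_FRAME,
    RTA_ON_DROPPED_FRAME
} RtaMode;

/*
 * Decide whether benchmark data should be sent for the current frame.
 * frameCnt      : 1-based count of frames processed so far (assumed)
 * isIFrame      : true if the encoder reported an Intracoded frame
 * framesDropped : dropped-frame estimate for the current window
 */
static bool should_send_benchmarks(RtaMode mode, uint32_t frameCnt,
                                   bool isIFrame, int32_t framesDropped)
{
    assert(mode == RTA_EVERY_30TH_FRAME || mode == RTA_EVERY_I_FRAME ||
           mode == RTA_ON_DROPPED_FRAME);
    switch (mode) {
    case RTA_EVERY_30TH_FRAME:
        return frameCnt != 0 && (frameCnt % 30u) == 0;
    case RTA_EVERY_I_FRAME:
        return isIFrame;
    case RTA_ON_DROPPED_FRAME:
        return framesDropped > 0;
    }
    return false; /* not reached for valid modes */
}
```

Keeping the decision in one small function means the reporting interval can be changed at run time by writing a single mode value, without touching the code that gathers the benchmarks.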
Summary of the content on page 20
The application supplied with this note references board support software and libraries installed with the DM642 EVM. The project options assume this software is installed in $TI_DIR$\boards\evmdm642. The project also references the H.263 encoder algorithm, which is provided as object code with the DM642 EVM's Board Support Package. Therefore, that package and all its associated components must be installed before running or building the supplied example as delivered. Tconf script