Design Analysis of Multi-core DSP Image Processing System

Pingping Jia
Weinan Normal University, Weinan, Shaanxi, 714099

Keywords: Multi-core DSP; Image Processing; C6678

Abstract: Digital signal processing technology is developing rapidly. In embedded systems, both communication rates and the complexity of processing algorithms keep rising, and many application fields place ever higher demands on the real-time performance and reliability of embedded data processing. Because embedded systems built around a traditional single processor can no longer meet these data processing requirements, multi-core DSP technology is being applied more and more widely in embedded system design. This study proposes an image processing system based on multi-core DSP technology.

1. The Advantages of Multi-Core DSP Image Processing System

The advantages of the multi-core DSP real-time image processing system mainly include the following. First, it is functionally rich. Image processing demands differ, and many types of image cards are available on the market, but such cards are relatively weak in versatility and processing performance and cannot meet the image processing requirements of different systems. The multi-core DSP real-time image processing system integrates image acquisition, image processing, and PC interaction, so it can meet diverse image processing requirements. Second, its computing power is strong. Image processing involves large volumes of data that must be processed efficiently, so a DSP-based image processing system must have powerful computing capability. Third, it applies advanced image processing theory. The DSP-based image processing system uses neural network technology, image edge detection theory, and similar methods, which greatly improves system performance. It also supports more image data formats and offers larger storage capacity, so it can be applied in a wider range of settings.

2. Analysis of Multi-Core DSP Structure

The performance of a parallel DSP interconnect system is generally evaluated on three aspects: the quality of the processor units, the soundness of the interconnect structure of the parallel processing system, and the task assignment and parallel algorithm. These three aspects interact with one another and together have a decisive influence on the performance of the whole system, so all three deserve attention when designing a multi-core DSP image processing system. The image processing system proposed in this study adopts the TMS320C6678 DSP, a fixed-point and floating-point SoC built on a 40nm process that contains eight C66x cores. Each core runs at 1-1.25GHz, and total power consumption is 10W. Each core has 64KB of Level 1 memory and 512KB of Level 2 memory. The chip includes 4MB of shared SRAM for data exchange between cores and provides an external DDR3 controller interface that can address 8GB. The C6678 has many on-chip and off-chip interfaces, including four SRIO high-speed serial ports, two PCIe interfaces, a 16-bit EMIF external memory interface, an SPI bus, an I2C bus, and interfaces such as a network port, UART, TSIP, and GPIO. These interfaces exchange data with each core over the on-chip high-speed interconnect bus. Each core can perform 32 fixed-point operations or 16 floating-point operations in a single instruction cycle, so the entire chip can provide 320GMAC of fixed-point or 160GFLOP of floating-point computing capability. The C6678 therefore offers high performance, high density, low power consumption, multiple cores, and easy interaction and expansion. It is not only suitable for the parallel computation of image detail enhancement but also meets the requirements of an embedded platform, and it can effectively address both the complexity of image processing algorithms and the real-time requirements of image processing, giving full play to the performance advantages of multiple cores.
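
The peak figures quoted above follow from simple arithmetic: number of cores times clock rate times operations per cycle per core. The sketch below reproduces that calculation; the function name peak_gops is hypothetical, and the result is a theoretical peak, not a measured throughput.

```c
#include <assert.h>

/* Theoretical peak throughput in giga-operations per second:
 * cores x clock (MHz) x operations per cycle per core / 1000.
 * peak_gops is a hypothetical helper name used only for illustration. */
unsigned peak_gops(unsigned cores, unsigned clock_mhz, unsigned ops_per_cycle)
{
    assert(cores > 0 && clock_mhz > 0);
    return cores * clock_mhz * ops_per_cycle / 1000u;
}
```

With eight cores at 1250MHz, peak_gops(8, 1250, 32) gives 320 and peak_gops(8, 1250, 16) gives 160, matching the fixed-point and floating-point figures in the text. At the lower 1GHz clock the same formula gives 256 and 128.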

3. The System Structure and Function

The main structure of the system includes the C6678 multi-core DSP processor, an FPGA processor, a DDR3 controller, an Ethernet interface, a level shifter, I2C and SPI bus interfaces, a RapidIO high-speed serial bus, memory, video input and output modules, and a DSP power module. The FPGA has an embedded Rocket I/O module that uses high-speed serial I/O technology; single-channel transmission speed ranges from 100Mb to 3.2Gb per second, and the module can implement a variety of high-speed serial communication protocols. RapidIO is a high-performance, low-pin-count, packet-switched crossbar interconnect technology that supports multiple operating frequencies such as 250MHz, 500MHz, 750MHz, and 1GHz, with transmission performance ranging from 1Gb to 60Gb per second. Its 1x/4x serial interface supports long-distance transmission, using clock data recovery for synchronization and 8B/10B encoding, at three baud rates: 1.25, 2.5, and 3.125Gbaud. In addition to these main modules, the system includes reset monitoring, a real-time clock, a PMBus debugging interface, power supplies, and so on. Among these functional modules, the main function of the FPGA is to preprocess the signal: after the external analog signal is received, its format is converted. The multi-core DSP is mainly responsible for the back-end processing of the data that the front end outputs in real time.
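
Because the serial link uses 8B/10B encoding, only 8 of every 10 line bits carry data, so the usable data rate is lower than the baud rate. The sketch below works this out for the lane rates and link widths named above. The function name payload_mbps is hypothetical, and packet header overhead is ignored, so the results are upper bounds.

```c
#include <assert.h>

/* Data rate in Mb/s of a serial link with 8B/10B line coding:
 * every 10 bits on the wire carry 8 bits of data.
 * payload_mbps is a hypothetical helper name; protocol overhead is ignored. */
unsigned payload_mbps(unsigned lane_mbaud, unsigned lanes)
{
    assert(lanes == 1 || lanes == 4);  /* 1x and 4x widths named in the text */
    return lane_mbaud * 8u / 10u * lanes;
}
```

For example, a 1x link at 1.25Gbaud carries payload_mbps(1250, 1) = 1000Mb/s, and one lane at 3.125Gbaud carries 2500Mb/s, so a 4x link at that rate carries 10000Mb/s before protocol overhead.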

4. The System Key Module Design

Because the C6678 runs at an aggregate clock rate of up to 10GHz across its eight cores (8 × 1.25GHz), an ordinary regulated power supply cannot be used. The DSP power supply in this design is fed from a 12V supply. One UCD9222 and one UCD7242 together generate the DSP core voltages, providing two voltage outputs. The UCD9222 is a digital PWM controller that can control two independent power outputs. The UCD7242 is a driver chip fully compatible with the UCD9222 and can generate two independent power outputs. During system operation, the pulse-width-modulated wave from the UCD9222 serves as the input of the UCD7242 and controls the MOSFET switches; varying the switch on-time yields different output voltages. The output voltage is returned to the UCD9222 as a feedback signal and compared with the chip's internal reference signal, and the power control chip responds to any output voltage offset according to the comparison result. Note that a voltage divider network should be added at the analog front-end input so that the feedback voltage does not exceed the maximum reference voltage.
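
Two relations underlie this power stage, sketched below under stated assumptions. The first is the textbook ideal step-down relation, duty cycle = Vout / Vin, which is assumed here and not taken from the UCD9222 or UCD7242 documentation. The second is the resistive divider that scales the output voltage before it reaches the feedback input. All function names and component values are hypothetical.

```c
#include <assert.h>

/* Ideal step-down duty cycle in parts per thousand: D = Vout / Vin.
 * Assumption: lossless converter in continuous conduction. */
unsigned buck_duty_permille(unsigned vin_mv, unsigned vout_mv)
{
    assert(vin_mv > 0 && vout_mv <= vin_mv);
    return vout_mv * 1000u / vin_mv;
}

/* Voltage seen at the feedback pin after a resistive divider:
 * Vfb = Vout * Rbottom / (Rtop + Rbottom), in millivolts. */
unsigned divider_out_mv(unsigned vout_mv, unsigned r_top_ohm, unsigned r_bottom_ohm)
{
    assert(r_top_ohm + r_bottom_ohm > 0);
    return (unsigned)((unsigned long long)vout_mv * r_bottom_ohm
                      / (r_top_ohm + r_bottom_ohm));
}

/* Nonzero when the divided feedback voltage stays at or below the
 * controller's maximum reference voltage (value supplied by the caller). */
int feedback_in_range(unsigned vout_mv, unsigned r_top_ohm,
                      unsigned r_bottom_ohm, unsigned vref_max_mv)
{
    return divider_out_mv(vout_mv, r_top_ohm, r_bottom_ohm) <= vref_max_mv;
}
```

As an illustration with made-up values, a 1.0V rail derived from 12V needs an ideal duty cycle of about 83 per thousand, and a 3.3V rail fed through two equal 10kΩ resistors presents 1.65V to the feedback input.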

In the whole image processing system, the FPGA is the core device that implements the system's processing algorithms and chip control functions. This system uses a Virtex-5 series chip, which has MGT high-speed interfaces and rich logic resources, to achieve real-time high-bandwidth video transmission and to implement complex image processing logic. Specifically, the main functions of the FPGA include: controlling the C6678 power-on reset timing, configuring the video codec chip, parsing and reconstructing the protocol of the input and output video, implementing the RapidIO controller, and carrying out high-speed real-time transmission of four video channels. During system operation, the FPGA first detects and parses the format of the input video, extracts the Y, Cb, and Cr components of the effective pixels, converts them into RGB images, and sends them to the DSP over RapidIO for subsequent processing. The FPGA also initiates the RapidIO read operation: it reads the processed effective pixel data from the DSP memory according to the video output timing, performs the reverse format conversion, and restores the original digital video timing for output.
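
The YCbCr-to-RGB step can be illustrated in software, although in this system it runs in FPGA logic. The sketch below assumes 8-bit studio-range samples and the widely used integer approximation of the ITU-R BT.601 matrix; the paper does not state which coefficients the FPGA uses, and the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint8_t r, g, b; } rgb_pixel;

/* Saturate an intermediate result to the 8-bit range. */
static uint8_t clamp_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Convert one studio-range YCbCr sample (Y in 16..235, Cb/Cr in 16..240)
 * to full-range RGB with BT.601 coefficients scaled by 256. */
rgb_pixel ycbcr_to_rgb(uint8_t y, uint8_t cb, uint8_t cr)
{
    int c = (int)y - 16, d = (int)cb - 128, e = (int)cr - 128;
    rgb_pixel p;
    p.r = clamp_u8((298 * c + 409 * e + 128) / 256);
    p.g = clamp_u8((298 * c - 100 * d - 208 * e + 128) / 256);
    p.b = clamp_u8((298 * c + 516 * d + 128) / 256);
    return p;
}

/* Convert n pixels whose Y, Cb and Cr samples are held in separate arrays.
 * Assumption: chroma has already been upsampled to one sample per pixel. */
void ycbcr_line_to_rgb(const uint8_t *y, const uint8_t *cb, const uint8_t *cr,
                       rgb_pixel *out, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        out[i] = ycbcr_to_rgb(y[i], cb[i], cr[i]);
}
```

Division is used in place of a right shift so that negative intermediate values behave the same on every compiler; they are clamped to zero either way.
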
This design uses a 4x RapidIO controller that conforms to the SRIO 1.3 specification. Its data transmission rate can reach 2.5Gb per second, which meets the bandwidth requirement for synchronous multi-channel video transmission. A four-buffer mechanism is used for parallel pipeline processing of the video stream so that PAL video can be read and processed in real time: four buffers are pre-allocated in DDR3 memory for each video channel, and each buffer stores one frame. During pipelined processing, the FPGA writes the acquired image data to buffer 0 while the DSP processes the image in buffer 3, the FPGA reads out and outputs the processed data in buffer 2, and buffer 1 waits. Once the current frame has been completely written to buffer 0, the DSP must be notified; two GPIO control signals are used to generate the edge trigger. When the next frame is input, its effective pixels are written to buffer 1, the pixel stream in buffer 3 is read out, and the DSP processes the image in buffer 0. In this way the image sequence is processed in a real-time pipeline, and the DSP has 40ms of processing time per frame.
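
The per-frame time budget and the link load can be checked with a short calculation. PAL video runs at 25 frames per second, which gives a frame period of 1000 / 25 = 40ms. The bandwidth estimate below additionally assumes a 720×576 active picture and 3 bytes per RGB pixel, neither of which is stated in the paper; the function names are hypothetical.

```c
#include <assert.h>

/* Frame period in milliseconds for a given frame rate. */
unsigned frame_period_ms(unsigned fps)
{
    assert(fps > 0);
    return 1000u / fps;
}

/* Raw video data rate in bits per second for one channel, one direction. */
unsigned long long video_bits_per_second(unsigned width, unsigned height,
                                         unsigned bytes_per_pixel, unsigned fps)
{
    return (unsigned long long)width * height * bytes_per_pixel * fps * 8ULL;
}
```

Under these assumptions one channel needs about 248.8Mb/s in each direction, so four channels written to the DSP and read back need about 1.99Gb/s in total, which is below the 2.5Gb/s quoted for the link.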

GPIO[13:12] controls the RapidIO read and write loop, and each edge triggers the next operation. The correspondence between the GPIO[13:12] state and the buffer read and write operations in this design is shown in Table 1 below:

Table 1 GPIO Pin Status Identification

<table>
<thead>
<tr>
<th>GPIO[13:12]</th>
<th>FPGA write</th>
<th>DSP processing</th>
<th>FPGA read</th>
<th>Wait</th>
</tr>
</thead>
<tbody>
<tr>
<td>00</td>
<td>Buffer 3</td>
<td>Buffer 2</td>
<td>Buffer 1</td>
<td>Buffer 0</td>
</tr>
<tr>
<td>01</td>
<td>Buffer 0</td>
<td>Buffer 3</td>
<td>Buffer 2</td>
<td>Buffer 1</td>
</tr>
<tr>
<td>10</td>
<td>Buffer 1</td>
<td>Buffer 0</td>
<td>Buffer 3</td>
<td>Buffer 2</td>
</tr>
<tr>
<td>11</td>
<td>Buffer 2</td>
<td>Buffer 1</td>
<td>Buffer 0</td>
<td>Buffer 3</td>
</tr>
</tbody>
</table>

After receiving the GPIO interrupt, the DSP services it and processes the buffer data. Combined with the read and write operations initiated by the FPGA, this realizes the circular pipeline operation of the four buffers.
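
The rotation in Table 1 follows a regular pattern: for GPIO state s, the waiting buffer is s, and the read, processing, and write buffers are s+1, s+2, and s+3 modulo 4. The sketch below encodes that mapping; the type and function names are hypothetical, and the code only mirrors the table, not the actual DSP interrupt handler.

```c
#include <assert.h>

/* Buffer index assigned to each pipeline role for one GPIO[13:12] state. */
typedef struct {
    unsigned fpga_write;
    unsigned dsp_process;
    unsigned fpga_read;
    unsigned wait;
} buffer_roles;

/* Map the 2-bit GPIO state (0..3) to the buffer roles listed in Table 1. */
buffer_roles roles_for_state(unsigned gpio_state)
{
    buffer_roles r;
    assert(gpio_state < 4);
    r.wait        = gpio_state;
    r.fpga_read   = (gpio_state + 1) % 4;
    r.dsp_process = (gpio_state + 2) % 4;
    r.fpga_write  = (gpio_state + 3) % 4;
    return r;
}
```

For state 01 this gives write 0, process 3, read 2, wait 1, the same as the second row of the table. Each buffer passes through the roles in the order write, wait, process, read as the state advances.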

OpenMP is a standard programming model for shared-memory architectures. It combines compiler directives with a function library, and it is simple, practical, and flexible, with a short design cycle. In the underlying shared-memory architecture, multiple processors access the same memory, so they can interact with each other through shared variables. The TMS320C6678 multi-core DSP, combined with the SYS/BIOS embedded operating system, can run the OpenMP parallel programming model. For the parallel algorithm, this study enhances the image with an iterative bilateral filter and proposes a parallel bilateral filtering scheme. The iterative bilateral filter is a separable filter: the image is filtered once along the rows and once along the columns. The image is first partitioned in the row direction and the parts are filtered by the cores in parallel. After the row-direction processing is complete, the image is partitioned in the column direction and sent to the cores for parallel processing, which yields the final filtered output.
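
A minimal sketch of one row-then-column pass is given below. It is not the authors' implementation: the window radius, the 8-bit grayscale layout, the clamped borders, and the caller-supplied weight tables are all assumptions, and every name is hypothetical. The OpenMP directive distributes rows (then columns) across the available cores, and the implicit barrier at the end of the first loop ensures the row pass finishes before the column pass starts.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define RADIUS 2  /* assumed half-width of the 1D window */

/* 1D bilateral pass over n pixels spaced 'stride' apart.
 * spatial_w has 2*RADIUS+1 entries; range_w has 256 entries indexed by the
 * absolute intensity difference. Both tables are supplied by the caller. */
static void bilateral_line(const uint8_t *src, uint8_t *dst, int n, int stride,
                           const float *spatial_w, const float *range_w)
{
    for (int i = 0; i < n; i++) {
        int center = src[i * stride];
        float acc = 0.0f, norm = 0.0f;
        for (int k = -RADIUS; k <= RADIUS; k++) {
            int j = i + k;
            if (j < 0) j = 0;            /* clamp at the borders */
            if (j >= n) j = n - 1;
            int v = src[j * stride];
            float w = spatial_w[k + RADIUS] * range_w[abs(v - center)];
            acc += w * (float)v;
            norm += w;
        }
        assert(norm > 0.0f);             /* the center weight must be positive */
        dst[i * stride] = (uint8_t)(acc / norm + 0.5f);
    }
}

/* One separable pass: rows in parallel, then columns in parallel. */
void bilateral_separable(const uint8_t *in, uint8_t *tmp, uint8_t *out,
                         int width, int height,
                         const float *spatial_w, const float *range_w)
{
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (int y = 0; y < height; y++)
        bilateral_line(in + y * width, tmp + y * width, width, 1,
                       spatial_w, range_w);

#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (int x = 0; x < width; x++)
        bilateral_line(tmp + x, out + x, height, width, spatial_w, range_w);
}
```

Each loop iteration writes only its own row or column, so the cores need no locking. An iterative filter would call bilateral_separable repeatedly, feeding each output back in as the next input. The code also compiles and runs serially when OpenMP is not enabled.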

5. Conclusion

In short, as digital signal processing technology continues to develop, the communication rates of embedded systems and the complexity of processing algorithms keep rising, and users in particular demand ever better real-time performance and reliability. With the introduction of high-performance DSP chips and corresponding application solutions, hardware design for embedded systems has become more widespread, and development cycles have shortened. This study proposes a real-time image processing system based on multi-core DSP, which greatly improves the efficiency of image acquisition and processing and improves the quality of image processing. This technology still requires more in-depth research in order to apply higher-performance multi-core DSP technology in embedded systems and to keep improving the quality and efficiency of image processing.
