# **Embedded-DSP SuperH Family and Its Applications**

Toru Baji, Hiroshi Takeyama, Tetsuya Nakagawa

ABSTRACT: Recently, sales have grown markedly in the field of mobile communication terminals such as global system for mobile communications (GSM), personal digital cellular telecommunication system (PDC), and personal handyphone system (PHS) terminals, and in digital consumer equipment such as car navigation systems and digital still cameras. These systems are configured with a CPU and a general-purpose digital signal processor (DSP). The CPU executes communication protocol control and system control; the general-purpose DSP performs signal processing such as speech compression and image processing. For the mobile communication systems and digital consumer products of the near future, integration of the CPU and DSP is required to achieve a low-price, small-size, low-power system. Hitachi, Ltd., has introduced a new-generation microcomputer, the SH-DSP, into this market. The SH-DSP consists of a SuperH RISC (reduced instruction set computer) microcomputer and a high-performance DSP engine. With the SH-DSP, Hitachi has enhanced its product line for a new wave of applications, such as digital still cameras, mobile communication terminals, and speech processing systems.

### INTRODUCTION

The SH-DSP incorporates a 32-bit RISC microcomputer, which is completely upward compatible with the SH microcomputers (SH-1 and SH-2), and includes high-performance DSP functions. Examples of target applications are mobile communication terminals, digital consumer equipment, and multimedia systems. One example of digital consumer equipment is the digital still camera. Hitachi has developed a digital camera system based on the SH-DSP (Fig. 1).

Fig. 1. SH-DSP Used in a Digital Still Camera System. The SH-DSP is a compact reduced instruction set computer (RISC) microcomputer with an embedded high-performance DSP engine. Digital still cameras using this processor are also able to perform speech compression and decompression efficiently.

CCD: charge coupled device; TG: timing generator; CDS: correlated double sampling; AGC: automatic gain control; ADC: analog-to-digital converter; DAC: digital-to-analog converter; MP-VRAM: multiport video RAM; ROM: read only memory; IrDA: Infrared Data Association; Y: luminance signal; C: chrominance signal



Fig. 2. SH-DSP Block Diagram and Features. The SH-DSP incorporates a high-performance DSP engine and employs a 3-bus configuration as the DSP enhancement of the conventional SuperH RISC microcomputer. Because memories and peripheral functions are shared, chip size and power consumption are considerably lower than when a discrete CPU and DSP are simply integrated on one chip.

In the SH-DSP, in addition to CPU operation, the RISC engine is also used to generate three addresses and control the DSP when executing DSP instructions. This mode of operation achieves high DSP performance with minimum increase in circuitry. A 3-bus configuration is employed that enables parallel execution of two data accesses and one program fetch for DSP operation. Fig. 2 shows the SH-DSP block diagram and features.

Besides 1-cycle multiply-accumulate capability, the DSP engine incorporates a total of 10 general-purpose data registers. To make the most of these hardware resources, 32-bit pseudo-VLIW (very long instruction word) instructions have been introduced.

Furthermore, functions such as zero-overhead repeat loops and modulo addressing, which are indispensable for DSPs, are also included. Memories and peripherals can be shared between the DSP and CPU. This makes the SuperH direct DRAM (dynamic random access memory) interface available to the DSP, a function not seen in other DSPs. This is especially effective for digital still cameras, which require large-capacity DRAM image memory. Sharing various circuits between the CPU and DSP also reduces the number of circuits on the LSI.
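To make the roles of multiply-accumulate, repeat loops, and modulo addressing concrete, the following is a minimal sketch of an FIR filter written in plain Python. It is not SH-DSP code: the function name `fir_circular` and its arguments are hypothetical, and the sketch only illustrates the general DSP pattern in which the inner loop is a chain of multiply-accumulates over a circular delay line whose index wraps instead of moving data.

```python
def fir_circular(samples, coeffs):
    """FIR filter over `samples` using a circular delay line.

    Illustrative only: `fir_circular` is a hypothetical helper, not an
    SH-DSP routine. It mimics what modulo addressing does in hardware.
    """
    n_taps = len(coeffs)
    delay = [0] * n_taps   # circular buffer holding the most recent inputs
    head = 0               # position where the next input sample is written
    out = []
    for x in samples:
        delay[head] = x
        acc = 0
        idx = head
        # Inner loop: one multiply-accumulate per tap. On a DSP, a
        # zero-overhead repeat would run this loop without branch cost.
        for c in coeffs:
            acc += c * delay[idx]
            # Modulo addressing: the index wraps around the buffer,
            # so no samples need to be shifted in memory.
            idx = (idx - 1) % n_taps
        out.append(acc)
        head = (head + 1) % n_taps
    return out
```

Feeding a unit impulse through the filter returns the coefficients themselves, which is a quick sanity check that the wrapped indexing visits the samples in the right order.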

### SH-DSP PERFORMANCE

Fig. 3 shows a comparison of CPU and DSP performance in each processor. SuperH has higher DSP performance than general RISC microcomputers because it has a 3-cycle multiply-accumulator.

A general-purpose DSP, on the other hand, has the highest DSP performance because of its 1-cycle multiply-accumulate and 3-bus configuration, but has lower CPU performance because it does not support 8-/16-/32-bit data access and lacks many of the versatile instructions of a CPU.

Unlike the SuperH alone or a general-purpose DSP, the SH-DSP has high performance for both DSP and CPU processing because it is fully upward compatible with the SH-1 and SH-2 and executes all of the functions of a general-purpose DSP. CPU performance of the SH-DSP is 60 million instructions per second (MIPS), which is the same as that of a 60-MHz SuperH. The DSP performance is 120 million operations per second (MOPS), which is almost the same as that of a general-purpose DSP. Actual benchmark tests show that SH-DSP performance is at the top level among general-purpose processors (1).
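The relation between the 60-MHz clock and the quoted 60 MIPS and 120 MOPS can be sketched as simple arithmetic. The text does not spell out how the figures were derived; the sketch below assumes one instruction issued per clock cycle and counts a 1-cycle multiply-accumulate as two operations (one multiply, one add). Both assumptions, and the helper names `mips` and `mops`, are illustrative.

```python
CLOCK_HZ = 60_000_000  # 60-MHz operating frequency, as stated in the text


def mips(clock_hz, instructions_per_cycle=1):
    """Peak instruction rate in MIPS (assumes one instruction per cycle)."""
    return clock_hz * instructions_per_cycle / 1e6


def mops(clock_hz, ops_per_cycle=2):
    """Peak operation rate in MOPS.

    Assumption: a 1-cycle multiply-accumulate counts as two operations.
    """
    return clock_hz * ops_per_cycle / 1e6


print(mips(CLOCK_HZ), mops(CLOCK_HZ))
```

Under these assumptions the peak rates come out to the 60 MIPS and 120 MOPS quoted above; sustained rates on real code would depend on the instruction mix.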

### SH-DSP DIGITAL STILL CAMERA SYSTEM

A digital still camera system based on the SH-DSP has been developed. Still image compression by Joint Photographic Experts Group (JPEG) processing was executed in just 0.55 second for a Video Graphics Array (VGA) full-size image ($640 \times 480$ pixels, Y:U:V = 4:2:2). For a quarter-VGA (QVGA) size image, four to seven frames can be processed in one second in parallel with speech compression/decompression processing.

Fig. 3. DSP and CPU Performance of SH-DSP. SuperH shows higher DSP performance than general RISC microcomputers due to a 3-cycle multiply-accumulator. The SH-DSP has 60-MIPS CPU performance as well as 120-MOPS DSP performance. MOPS (million operations per second) is a DSP performance measure, which shows the number of operations such as multiplications executed in one second. MIPS (million instructions per second) is a CPU performance measure, which shows the number of Dhrystone benchmark instructions executed in one second.

Fig. 4. Block Diagram of an SH-DSP Digital Still Camera System. This digital still camera system consists of the camera board (MV-DS10), which captures the still image and converts it to a digital signal, and the system controller board (DC-DS1), which performs both image and speech compression/decompression and system control. The entire camera system can be controlled by a single SH-DSP.

ATA: advanced technology attachment bus; ADC: analog-to-digital converter; Y: luminance signal; C: chrominance signal

#### Digital Still Camera System Concept

In a digital still camera system using the SH-DSP, functions are shared between the DSP engine and the compact RISC microcomputer. The DSP engine executes digital signal processing, that is, still image and speech compression; the compact RISC microcomputer executes disk operating system (DOS) file management, communications, and camera system control. Although a conventional digital still camera system also requires a dedicated LSI, a system using the SH-DSP requires only the SH-DSP and its software.

#### Digital Still Camera System

Digital still camera system features include: still image compression and decompression by JPEG; speech compression and decompression by adaptive differential pulse code modulation (ADPCM) G.721; data transfer and control with a personal computer via RS232C and Infrared Data Association (IrDA 1.0) interfaces; data recording on a compact flash memory card; and decompression of compact flash memory image data on a personal computer.

Fig. 4 shows the configuration of the digital still camera system that executes JPEG processing on the SH-DSP. This system is configured with a camera board (MV-DS10) that captures the still image and a system controller board (DC-DS1) that executes functions such as JPEG processing, speech compression, and system control. Details of the system controller board will be described next.

In this system, the SH-DSP and external devices operate at 60 MHz and 15 MHz, respectively. The programs are placed so that the tasks that determine total camera performance (image compression and speech compression) and the system control tasks are each allocated to the appropriate memory.

- (1) DSP tasks: located in on-chip ROM (read only memory)
  - (a) JPEG still image compression/decompression program
  - (b) Speech compression/decompression program
- (2) CPU tasks: located in external ROM
  - (a) HI-SH7 real-time operating system (OS)
  - (b) DOS-compatible file management (stores image and speech data on compact flash memory card)
  - (c) Communication (data transfer through RS232C and IrDA)
  - (d) Digital still camera system control

Fig. 5. JPEG Processing Performance and Code Sizes. The SH-DSP executes JPEG processing approximately 2.9 times faster than the SH-3. Full-size VGA image processing time is four times as long as QVGA. The JPEG code size is 15 kbyte, which can be allocated in the compact on-chip memory.

In the following section, the high-speed JPEG processing based on SH-DSP high-performance DSP instructions will be explained.

JPEG consists of the following four functions: discrete cosine transformation (DCT) for image compression and decompression; quantization and dequantization; Huffman coding and decoding; and image data input/output.

The following SH-DSP features have been applied to the four processes listed above in order to achieve high-speed operation: 1-cycle multiply-accumulate; up to four parallel instruction executions; zero-overhead repeat (used in shift amount calculation during normalization); a high-speed normalization instruction; and a barrel-shift instruction.
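As an illustration of where normalization and barrel shifting enter Huffman coding, the sketch below shows the two bit-level steps of baseline JPEG coefficient coding: finding the size category of a coefficient (the number of bits in its magnitude) and packing variable-length bit fields into a byte stream. This is generic JPEG logic in Python, not the authors' implementation. The names `magnitude_category`, `extra_bits`, and `BitWriter` are hypothetical, and the mapping of the bit-length step onto the normalization instruction is an assumption.

```python
def magnitude_category(value):
    """Number of bits needed for |value| (the JPEG size category).

    Assumption: on a DSP, a normalization instruction would supply the
    equivalent leading-bit count in a single step.
    """
    return abs(value).bit_length()


def extra_bits(value):
    """Return (bits, length) of the additional bits JPEG appends."""
    size = magnitude_category(value)
    if value >= 0:
        return value, size
    # Negative values are stored as value + 2**size - 1.
    return value + (1 << size) - 1, size


class BitWriter:
    """Packs variable-length codes into bytes using shifts.

    JPEG's 0xFF byte stuffing is omitted to keep the sketch short.
    """

    def __init__(self):
        self.acc = 0      # bits not yet emitted
        self.nbits = 0    # number of valid bits in acc
        self.out = bytearray()

    def put(self, bits, length):
        # Variable shift amounts like this are what a barrel shifter
        # performs in one cycle.
        self.acc = (self.acc << length) | (bits & ((1 << length) - 1))
        self.nbits += length
        while self.nbits >= 8:
            self.nbits -= 8
            self.out.append((self.acc >> self.nbits) & 0xFF)
        self.acc &= (1 << self.nbits) - 1
```

A full encoder would first emit the Huffman code for the size category and then the extra bits; only the bit manipulation is shown here.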

DCT and Huffman coding/decoding, which require most of the JPEG processing time, are implemented very efficiently using these feature functions. DCT processing converts the time domain data to frequency domain data. VGA images with  $640 \times 480$  pixels consist of 9,600 blocks, with  $8 \times 8$  pixels in each block.
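The figure of 9,600 blocks can be checked with a short calculation. The sketch below assumes that the count covers all three components of the Y:U:V = 4:2:2 image mentioned earlier, with each chrominance plane halved horizontally, and that the image dimensions are multiples of 8. The function names are hypothetical.

```python
def blocks_per_plane(width, height, block=8):
    """Number of block x block tiles in one image plane."""
    return (width // block) * (height // block)


def jpeg_block_count(width, height, h_subsample=2):
    """Total 8 x 8 blocks for one luminance and two chrominance planes.

    Assumption: 4:2:2 sampling, i.e. chrominance is halved horizontally
    and kept at full resolution vertically.
    """
    luma = blocks_per_plane(width, height)
    chroma = blocks_per_plane(width // h_subsample, height)
    return luma + 2 * chroma


print(jpeg_block_count(640, 480))  # VGA
print(jpeg_block_count(320, 240))  # QVGA
```

Under these assumptions a VGA image yields 4,800 luminance blocks plus 2,400 blocks for each chrominance plane, and a QVGA image yields one quarter as many blocks in total.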

Table 1. SH-DSP Speech Compression Middleware and Required MIPS

| Middleware                                       | Required MIPS              |
|--------------------------------------------------|----------------------------|
| Full-rate GSM speech codec                       | 3.1                        |
| Half-rate GSM speech codec                       | 23                         |
| ADPCM speech codec                               | 9.92                       |
| TV conference G.723 speech codec                 | 25.6 or 24.1<sup>\*1</sup> |
| G.729 speech codec                               | 25.4                       |
| TV conference speech echo canceller<sup>\*2</sup> | 9                          |

<sup>\*1</sup> 25.6 MIPS for high bit rate (6.3 kbps), and 24.1 MIPS for low bit rate (5.3 kbps)

DCT processing time is equal to the time for one block's discrete cosine transformation (a vector multiply-accumulate) multiplied by the number of blocks.

The important thing is to execute the vector multiply-accumulate as fast as possible. In the SH-DSP, multiply-accumulate speed was increased to approximately 2.5 times that of the conventional SuperH by the 1-cycle multiply-accumulate and four-way parallel instructions. Huffman coding and decoding speed was also increased approximately fourfold by the high-speed normalization instruction and the barrel-shift operation. Thus JPEG image compression speed has been increased by 2.9 times compared to the conventional 60-MHz SH microcomputer. Fig. 5 shows performance comparison results.
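The reason the DCT reduces to vector multiply-accumulates can be seen from a direct implementation. The sketch below computes an orthonormal 8 × 8 DCT-II in Python as a row pass followed by a column pass, where every output value is a dot product of length 8. It is a textbook formulation, not the SH-DSP routine; the source does not say whether the actual implementation uses a fast factorization, and all names here are hypothetical.

```python
import math

N = 8

# Orthonormal DCT-II basis: BASIS[k][n] is the weight of input sample n
# in output coefficient k.
BASIS = [
    [
        (math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
        * math.cos((2 * n + 1) * k * math.pi / (2 * N))
        for n in range(N)
    ]
    for k in range(N)
]

# Direct method: 2 passes x 8 vectors x 8 outputs x 8 multiply-accumulates.
MACS_PER_BLOCK = 2 * N * N * N


def dot(a, b):
    """Vector multiply-accumulate: the operation a DSP does in 1 cycle per term."""
    acc = 0.0
    for x, y in zip(a, b):
        acc += x * y
    return acc


def dct_1d(vec):
    """8-point DCT as eight dot products."""
    return [dot(row, vec) for row in BASIS]


def dct_2d(block):
    """Separable 2-D DCT: transform every row, then every column."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d([rows[i][j] for i in range(N)]) for j in range(N)]
    # cols[j][i] holds vertical frequency i at horizontal frequency j.
    return [[cols[j][i] for j in range(N)] for i in range(N)]
```

With this direct method each block costs 1,024 multiply-accumulates, so the per-block cost multiplied by the number of blocks dominates the DCT time, which is why a 1-cycle multiply-accumulate matters.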

### SH-DSP SPEECH COMPRESSION MIDDLEWARE

Speech compression/decompression is one of the main DSP applications. Table 1 shows speech processing middleware packages that were developed for the SH-DSP. GSM (global system for mobile communications) is a cellular standard that is used almost everywhere worldwide except for the USA and Japan. Two types of speech codecs have been developed for this standard.

ADPCM (adaptive differential pulse code modulation) is a toll-quality speech codec that is widely used in PHS and digital exchanges for subscriber telephone lines. G.723 is a TV-conference speech codec that is usually combined with an echo canceller. One SH-DSP chip can execute both of these program modules, which together require approximately 35 MIPS.
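The 35-MIPS figure follows from adding the Table 1 entries for the G.723 codec and the echo canceller. A minimal sketch of that budget check is shown below; the dictionary keys and function names are hypothetical, the MIPS values are those in Table 1, and the 60-MIPS budget is the CPU performance quoted earlier for the SH-DSP.

```python
# Required MIPS taken from Table 1; the key names are illustrative.
REQUIRED_MIPS = {
    "g723_high_rate": 25.6,   # 6.3 kbps
    "g723_low_rate": 24.1,    # 5.3 kbps
    "echo_canceller": 9.0,
}


def total_mips(modules):
    """Sum of the required MIPS for the selected middleware modules."""
    return sum(REQUIRED_MIPS[name] for name in modules)


def fits(modules, available_mips=60.0):
    """True if the modules fit within the processor's MIPS budget."""
    return total_mips(modules) <= available_mips


load = total_mips(["g723_high_rate", "echo_canceller"])
print(load, fits(["g723_high_rate", "echo_canceller"]))
```

The remaining headroom is what is left for CPU tasks such as system control, although the check ignores memory limits and real-time scheduling effects.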

G.729 achieves a compression ratio four times that of ADPCM, yet its speech quality is not degraded very much. Its processing delay is about 15 ms, which is relatively short. Thus G.729 is a candidate speech codec for future public land mobile telecommunication system (FPLMTS) cellular telephones and digital simultaneous voice and data (DSVD).

### SH-DSP APPLICATION IN MOBILE COMMUNICATIONS

Next, applications of the SH-DSP to mobile communication terminals such as PHS cordless telephones and cellular phones will be explained. Two types of processing are required in mobile communications. One is DSP processing such as speech compression and equalization; the other is CPU processing such as communication protocol control.

Speech compression applied to mobile phones requires a high-performance DSP. This speech compression would require approximately 100 MIPS if it were processed using a general-purpose RISC processor. Therefore, a DSP is employed to reduce the requirement to 20 to 30 MIPS. With a dedicated DSP, speech compression can be achieved at a lower clock frequency and lower operating voltage, and thus at lower power consumption.

Communication protocol control requires nearly 1 Mbyte of code, usually written in the C language. Since a DSP cannot support the C language efficiently, communication protocol control is handled by a general-purpose CPU.

A single SH-DSP can efficiently execute these contradictory tasks, which are conventionally performed by two chips, achieving lower cost and lower power consumption. Fig. 6 shows an example of a cellular terminal using the SH-DSP.

However, this type of one-engine system must be designed carefully. Critical real-time DSP tasks are executed together with complex asynchronous CPU tasks, and this scheme can cause problems. These problems can be solved by appropriately setting task and interrupt priorities.

One such example is the asynchronous key input interrupt by the user. This interrupt might be ignored while intensive processing by a DSP speech codec is performed. A solution follows:

Key input occurs at a very slow rate compared to the fast system clock; it may occur just once a second. Speech compression, though, is performed on a 20-to-30-ms time-frame basis. In other words, a key input task is executed just once for every 30 to 50 compression tasks. Moreover, the CPU tasks for key input are simply to fetch the character and determine its function, which does not take many processing cycles. Since speech compression takes at least 500,000 cycles, there is little effect even if one key input interrupt is inserted during speech compression. Therefore, key input interrupt loss can be avoided by setting the key input interrupt priority level higher than that of DSP processing such as speech compression.

Fig. 6. Example of a Cellular Terminal System Configuration Using the SH-DSP. One SH-DSP executes two different processing tasks more efficiently than two conventional processors (a CPU and a DSP), providing lower power consumption and lower system cost.

Fig. 7. Example of Task Scheduling. Loss of strict-real-time CPU processing can be avoided by executing such processing within the highest-priority CPU task every TDMA frame.

Rx: receive; Tx: transmit; Mo: monitor

<sup>\*2</sup> (Table 1) Tail length 96 ms (restricted due to on-chip RAM size), and continuous adaptation in the frequency domain

Similarly, there is a possibility that certain CPU tasks that require strict real-time operation, such as setting the TDMA (time division multiple access) timer or the synthesizer, will be lost while intensive processing by a DSP speech codec is performed. Although such CPU processing does not require many cycles, it must be executed at least once every TDMA frame.

Again, this problem can be solved by giving the highest priority level to one CPU task. The required processing can then be executed every TDMA frame under this highest-priority task. Fig. 7 shows an example.
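The effect of the priority setting can be illustrated with a small simulation of preemptive fixed-priority scheduling. This is a sketch in Python, not the HI-SH7 scheduler: the function `simulate`, the task names, and the 200-cycle cost of the key input handler are hypothetical, while the 500,000-cycle speech compression cost comes from the text. Job names are assumed to be unique.

```python
import heapq


def simulate(jobs, priority):
    """Preemptive fixed-priority scheduling on one processor.

    jobs: list of (arrival_cycle, name, cost_cycles), names unique.
    priority: dict name -> level; a lower number means higher priority.
    Returns a dict name -> completion cycle.
    """
    pending = sorted(jobs)   # ordered by arrival time
    ready = []               # heap of (priority, arrival, name, remaining)
    now, done, i = 0, {}, 0
    while i < len(pending) or ready:
        if not ready:
            now = max(now, pending[i][0])   # idle until the next arrival
        while i < len(pending) and pending[i][0] <= now:
            arrival, name, cost = pending[i]
            heapq.heappush(ready, (priority[name], arrival, name, cost))
            i += 1
        prio, arrival, name, remaining = heapq.heappop(ready)
        # Run until this job finishes or the next job arrives, so that a
        # higher-priority arrival can preempt the running job.
        if i < len(pending) and now + remaining > pending[i][0]:
            run = pending[i][0] - now
        else:
            run = remaining
        now += run
        remaining -= run
        if remaining:
            heapq.heappush(ready, (prio, arrival, name, remaining))
        else:
            done[name] = now
    return done


JOBS = [(0, "speech_codec", 500_000), (100_000, "key_input", 200)]

# Key input above the codec: served immediately, codec delayed by 200 cycles.
key_high = simulate(JOBS, {"key_input": 0, "speech_codec": 1})
# Key input below the codec: it waits until the whole frame is compressed.
key_low = simulate(JOBS, {"key_input": 2, "speech_codec": 1})
print(key_high, key_low)
```

With the key input at higher priority, the handler completes 200 cycles after the key press and the codec finishes only 200 cycles late; at lower priority, the key press waits for the rest of the compression frame. The same reasoning applies to the short TDMA timer task.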

As described above, the SH-DSP, which incorporates both DSP and CPU functions, can achieve a lower-cost, lower-power system.

### **CONCLUSIONS**

A new-generation microcontroller SH-DSP with enhanced DSP functions has been discussed, together with its applications in digital still cameras, speech codecs, and mobile communication systems.

Recently, mobile communication and digital consumer products have become very popular. Demands for lower prices, lower power consumption, and higher performance have become stronger and stronger, sometimes requiring higher-level technology than that used in high-end equipment. The SH-DSP is a solution to these demands.

With a minimum of additional circuitry, the SH-DSP can improve DSP performance to approximately three times that of conventional SuperH processors. In the future, Hitachi will continue to enhance DSP and CPU performance, reduce power consumption, and reduce the cost of the SuperH series.

### **REFERENCE**

(1) Berkeley Design Technology, Inc., "DSP on General-Purpose Processors," http://www.bdti.com (1997)

### **ABOUT THE AUTHORS**



#### Toru Baji

Joined Hitachi, Ltd. in 1977. Belongs to the 1st System LSI Development & Business Center, System LSI Business Operation, Semiconductor & Integrated Circuits Division. Currently working as the project manager of SH-DSP LSI developments. A member of the Institute of Electronics, Information and Communication Engineers of Japan and the Institute of Electrical and Electronics Engineers. E-mail: baji@cm.musashi.hitachi.co.jp



#### Hiroshi Takeyama

Joined Hitachi, Ltd. in 1969. Belongs to the Customer Marketing Dept., Electronic Devices Business Gr., Semiconductor & Integrated Circuits Division. Currently managing the development of SH-DSP image processing applications.

E-mail: takeyama@cm.musashi.hitachi.co.jp



#### Tetsuya Nakagawa

Joined Hitachi, Ltd. in 1983. Belongs to the Multi-Media LSI Development Dept., System LSI Business Operation, Semiconductor & Integrated Circuits Division. Currently working on the development of SH-DSP advanced applications.

E-mail: tetsuya@crl.hitachi.co.jp