## PSC1000™ Microprocessor Reference Manual

Patriot Scientific Corporation
10989 Via Frontera
San Diego, CA 92127
1 (619) 674-5000 voice
1 (619) 674-5005 fax
www.ptsc.com


## DISCLAIMER

Patriot Scientific Corporation (PSC) reserves the right to make changes to its products or specifications at any time, or to discontinue any product, without notice. PSC advises its customers to obtain the latest product information available before designing-in or purchasing its products. PSC assumes no responsibility for the use of any circuitry described other than the circuitry embodied in a PSC product. PSC makes no representations that the circuitry described herein is free from patent infringement or other rights of third parties which may result from its use. No license is granted by implication or otherwise under any patent, patent rights or other rights of PSC.

Information within this document is subject to change without notice, but was believed to be accurate at the time of publication. No warranty of any kind, including but not limited to implied warranties of merchantability or fitness for a particular application, is stated or implied. PSC and the author assume no responsibility for any errors or omissions, and disclaim responsibility for any consequences resulting from the use of the information included herein.


32-BIT RISC PROCESSOR

Copyright © 1995 George William Shaw, All Rights Reserved.
Copyright © 1995-1999 Patriot Scientific Corporation
Printed in the United States of America
Printing Date: 1999 March 25

Text, tables, and illustrations by George W. Shaw
Edited by Jeffrey Conroy
For company and product information, access www.ptsc.com. Patriot Scientific Corporation is publicly traded over the counter, symbol PTSC.

ShBoom and PSC1000 are trademarks of Patriot Scientific Corporation. Any other brands and products used within this document are trademarks or registered trademarks of their respective owners.

The technology discussed in this document may be covered by one or more of the following US patents: 5,440,749; 5,530,890; 5,604,915; 5,659,703; 5,784,584. Other US and foreign patents pending.

## LIFE SUPPORT POLICY

Patriot Scientific Corporation's (PSC) products are not authorized for use as critical components in life-support appliances, devices or systems. Such use requires a specific written agreement signed by the appropriate PSC officer. Life-support devices or systems are devices or systems which (a) are intended for surgical implant into the body or (b) support or sustain life and whose failure to perform, when properly used in accordance with instructions for use provided in the labeling, can be reasonably expected to result in significant injury to the user. A critical component is any component of a life-support device or system whose failure to perform can be reasonably expected to cause the failure of the life-support device or system, or to affect its safety or effectiveness. Use of PSC products in such applications is understood to be fully at the risk of the customer.

## PSC1000 Microprocessor


## Contents

DISCLAIMER ..... ii
LIFE SUPPORT POLICY ..... iv
Figures ..... xii
Tables ..... xv
Documentation Typography and Nomenclature ..... xvii
Features ..... 1
General Description ..... 1
Purpose ..... 9
Overview ..... 9
Central Processing Unit ..... 11
Resources ..... 11
Clock Speed ..... 12
Microprocessing Unit ..... 15
Address Space ..... 17
Registers and Stacks ..... 17
Programming Model ..... 18
Instruction Set Overview ..... 19
ALU Operations ..... 21
Branches, Skips, and Loops ..... 22
Literals ..... 23
Data Movement ..... 23
Loads and Stores ..... 24
Stack Data Management ..... 25
Stack Cache Management ..... 25
Byte Operations ..... 26
Floating-Point Math ..... 27
Debugging Features ..... 28
On-Chip Resources ..... 28
Miscellaneous ..... 28
Stacks and Stack Caches ..... 28
Stack-Page Exceptions ..... 29
Stack Initialization ..... 30
Stack Depth ..... 30
Stack Flush and Restore ..... 31
Floating-Point Math Support ..... 33
Data Formats ..... 33
Status and Control Bits ..... 33
GRS Extension Bits ..... 34
Rounding ..... 34
Exceptions ..... 35
Video RAM Support ..... 37
Register mode ..... 38
MPU Reset ..... 40
Interrupts ..... 40
Bit Inputs ..... 41
Instruction Pre-fetch ..... 41
Posted-Write ..... 41
On-Chip Resources ..... 41
Instruction Reference ..... 41
ANS Forth Word Equivalents ..... 42
Java Byte Code Equivalents ..... 42
add ..... 43
adda ..... 43
addc ..... 43
addexp ..... 44
and ..... 44
bkpt ..... 45
_b ..... 46
cache ..... 48
call ..... 49
cmp ..... 49
copyb ..... 49
dbr ..... 50
dec ..... 50
denorm ..... 51
depth ..... 51
di ..... 52
divu ..... 52
ei ..... 52
eqz ..... 53
expdif ..... 53
extexp ..... 53
extsig ..... 54
frame ..... 55
iand ..... 56
inc ..... 56
lcache ..... 56
ld ..... 57
ldo ..... 58
ldepth ..... 58
lframe ..... 58
mloop ..... 59
mulfs ..... 60
muls ..... 61
mulu ..... 61
mxm ..... 61
neg ..... 62
nop ..... 62
norml ..... 63
normr ..... 64
notc ..... 64
or ..... 65
pop ..... 66
push ..... 68
replb ..... 71
replexp ..... 71
ret ..... 71
rev ..... 72
rnd ..... 72
scache ..... 72
sdepth ..... 72
sexb ..... 73
shift ..... 74
shl ..... 75
shr ..... 76
skip ..... 77
split ..... 78
st ..... 79
step ..... 80
sto ..... 80
sub ..... 81
subb ..... 81
subexp ..... 82
testb ..... 82
testexp ..... 83
xcg ..... 83
xor ..... 83
Virtual Peripheral Unit ..... 89
Usage ..... 90
Resources ..... 91
Register Usage ..... 91
Instruction Set ..... 91
Instruction Formats ..... 91
Jumps ..... 91
Literals ..... 92
Others ..... 92
Execution Timing ..... 92
Techniques ..... 93
Address Space, Memory and Device Addressing ..... 94
Interrupts ..... 94
Bus Transactions ..... 94
Bit Inputs and Bit Outputs ..... 94
VPU Hardware and Software Reset ..... 94
Instruction Reference ..... 95
delay ..... 96
dskipz ..... 96
int ..... 96
jump ..... 97
ld ..... 97
mloop ..... 97
nop ..... 98
outf ..... 98
outt ..... 98
refresh ..... 99
tskipz ..... 99
xfer ..... 100
Direct Memory Access Controller ..... 103
Resources ..... 103
DMA Requests ..... 104
Prioritization ..... 104
Memory and Device Addressing ..... 105
Interrupts ..... 105
Bus Transaction Types ..... 105
Device Access Timing ..... 105
Maximum Bandwidth Transfers ..... 105
Terminating DMA I/O-Channel Transfers ..... 106
Other Capabilities ..... 106
Interrupt Controller ..... 107
Resources ..... 107
Operation ..... 108
Interrupt Request Servicing ..... 108
External Interrupts ..... 108
I/O-Channel Transfer Interrupts ..... 109
VPU int Interrupts ..... 109
ISR Processing ..... 109
Bit Inputs ..... 111
Resources ..... 111
Input Sources and Sampling ..... 111
DMA Usage ..... 112
Interrupt Usage ..... 112
General-Purpose Bits ..... 113
VPU Usage ..... 113
MPU Usage ..... 113
Bit Outputs ..... 115
Resources ..... 115
Usage ..... 115
Programmable Memory Interface ..... 117
Resources ..... 117
Memory System Architecture ..... 117
Memory Groups ..... 119
Memory Banks ..... 120
Device Requirements Programming ..... 120
Device Sizes ..... 120
Device Width ..... 121
Programmable Timing ..... 124
DRAM Refresh ..... 125
Video RAM Support ..... 125
System Requirements Programming ..... 126
RAS Cycle Generation ..... 126
Driver Current ..... 127
Memory Faults ..... 127
I/O-Channel Programming ..... 127
On-Chip Resource Registers ..... 129
Bus Operation ..... 157
Operation ..... 157
I/O Addressing ..... 158
Bus Transaction Types ..... 158
MPU and VPU (non-xfer) Memory Cycles ..... 158
Cell Memory Write from MPU ..... 159
Cell Memory Read to MPU/VPU ..... 159
Byte Memory Write from MPU ..... 159
Byte Memory Read to MPU/VPU ..... 159
I/O-Channel Transfers ..... 159
Cell Memory Write from Four-byte Byte-transfer Device ..... 159
Cell Memory Read to Four-byte Byte-transfer Device ..... 159
Byte Memory Write from Four-byte Byte-transfer Device ..... 159
Byte Memory Read to Four-byte Byte-transfer Device ..... 159
Cell Memory Write from One-byte Byte-transfer Device ..... 159
Cell Memory Read to One-byte Byte-transfer Device ..... 160
Byte Memory Write from One-byte Byte-transfer Device ..... 160
Byte Memory Read to One-byte Byte-transfer Device ..... 160
Cell Memory Write from One-cell Cell-transfer Device ..... 160
Cell Memory Read to One-cell Cell-transfer Device ..... 160
Byte Memory Write from One-cell Cell-transfer Device ..... 160
Byte Memory Read to One-cell Cell-transfer Device ..... 160
Bus Reset ..... 161
Video RAM Support ..... 161
Virtual-Memory Page Faults Input ..... 161
Alternate Inputs and Outputs ..... 161
Alternative Bit Inputs ..... 161
Alternative Bit Outputs ..... 161
Alternative Memory Fault Input ..... 162
Alternative Reset Input ..... 162
Processor Startup ..... 181
Power-on Reset ..... 181
Boot Memory ..... 181
Reset Process ..... 181
Bootstrap Programs ..... 181
Boot from Byte-Wide Boot-Only Memory and Copy the Application Program to Cell-Wide R/W Memory ..... 182
Boot from Cell-Wide Boot-Only Memory and Copy the Application Program to Cell-Wide R/W Memory ..... 183
Boot and Run from Byte-W ide Memory ..... 183
Boot and Run from Cell-Wide Memory ..... 184
Stack Initialization ..... 184
Example PSC1000 CPU Systems ..... 187
Example System 1 ..... 187
Example System 2 ..... 187
Example System 3 ..... 187
Electrical Characteristics ..... 199
Power and Grounding ..... 199
Power Decoupling ..... 199
Connection Recommendations ..... 199
Clock ..... 200
Absolute Maximum Ratings ..... 201
Operating Conditions ..... 202
DC Specifications ..... 203
AC Characteristics ..... 204
Mechanical Characteristics ..... 223
Revision History ..... 224
Distributors and Sales Offices ..... 225
Index ..... 227


## Figures

Figure 1. 100-Pin Thin Quad Flat Package (TQFP) ..... 5
Figure 2. CPU Block Diagram ..... 11
Figure 3. MPU Block Diagram ..... 14
Figure 4. MPU Registers ..... 15
Figure 5. CPU Memory Map ..... 16
Figure 6. Byte Order ..... 17
Figure 7. add Execution Example ..... 18
Figure 8. MPU Instruction Formats ..... 22
Figure 9. Stack Exception Regions ..... 28
Figure 10. Floating-Point Number Formats ..... 33
Figure 11. Register mode ..... 39
Figure 12. VPU Block Diagram ..... 89
Figure 13. VPU Register Usage ..... 91
Figure 14. VPU Instruction Formats ..... 92
Figure 15. DMAC Block Diagram ..... 103
Figure 16. I/O-Channel Transfer Data Format ..... 104
Figure 17. INTC Block Diagram ..... 107
Figure 18. Bit Input Block Diagram ..... 111
Figure 19. Bit Outputs Block Diagram ..... 115
Figure 20. Group-Select and Bank-Select Bit Locations ..... 118
Figure 21. SMB Memory Architecture ..... 118
Figure 22. MMB Memory Architecture ..... 119
Figure 23. Programmable Bus Timing Reference ..... 128
Figure 24. On-Chip Resource Registers ..... 129
Figure 25. Example On-Chip Register Diagram ..... 130
Figure 26. Bit Input Register ..... 131
Figure 27. Interrupt Pending Register ..... 132
Figure 28. Interrupt Under Service Register ..... 133
Figure 29. Bit Output Register ..... 134
Figure 30. Interrupt Enable Register ..... 135
Figure 31. DMA Enable Register ..... 136
Figure 32. VRAM Control Bit Register ..... 137
Figure 33. Miscellaneous A Register ..... 139
Figure 34. Miscellaneous B Register ..... 140
Figure 35. Memory Fault Address Register ..... 142
Figure 36. Memory Fault Data Register ..... 142
Figure 37. Memory System Group-Select Mask Register ..... 143
Figure 38. Memory Group Device Size Register ..... 144
Figure 39. Miscellaneous C Register ..... 145
Figure 40. Memory Group 0-3 Extended Bus Timing Registers ..... 146
Figure 41. Memory Group 0-3 CAS Bus Timing Registers ..... 147
Figure 42. Memory Group 0-3 RAS Bus Timing Registers ..... 149
Figure 43. I/O Channel 0-7 Extended Bus Timing Registers ..... 150
Figure 44. Memory System Refresh Address ..... 151
Figure 45. VPU Delay Counter Register ..... 151
Figure 46. I/O Device Transfer Types A Register ..... 152
Figure 47. I/O Device Transfer Types B Register ..... 152
Figure 48. Reserved Register Addresses ..... 153
Figure 49. DMA Enable Expiration Register ..... 153
Figure 50. Driver Current Register ..... 154
Figure 51. VPU Reset Register ..... 155
Figure 52. Virtual-Memory Page Mapping Logic ..... 161
Figure 53. Cell Memory Write from MPU ..... 164
Figure 54. Cell Memory Read to MPU/VPU ..... 165
Figure 55. Byte Memory Write from MPU ..... 166
Figure 56. Byte Memory Read to MPU/VPU ..... 167
Figure 57. Cell Memory Write from Four-byte Byte-transfer Device ..... 168
Figure 58. Cell Memory Read to Four-byte Byte-transfer Device ..... 169
Figure 59. Byte Memory Write from Four-byte Byte-transfer Device ..... 170
Figure 60. Byte Memory Read to Four-byte Byte-transfer Device ..... 171
Figure 61. Cell Memory Write from One-byte Byte-transfer Device ..... 172
Figure 62. Cell Memory Read to One-byte Byte-transfer Device ..... 173
Figure 63. Byte Memory Write from One-byte Byte-transfer Device ..... 174
Figure 64. Byte Memory Read to One-byte Byte-transfer Device ..... 175
Figure 65. Cell Memory Write from One-cell Cell-transfer Device ..... 176
Figure 66. Cell Memory Read to One-cell Cell-transfer Device ..... 177
Figure 67. Byte Memory Write from One-cell Cell-transfer Device ..... 178
Figure 68. Byte Memory Read to One-cell Cell-transfer Device ..... 179
Figure 69. Example Minimal System with 8-bit Memory ..... 188
Figure 70. Example Minimal System with 32-bit DRAM and I/O Decoding ..... 189
Figure 71. Example System with SRAM, DRAM and I/O Decode ..... 190
Figure 73. CPU Reset Timing ..... 205
Figure 74. Memory Read Timing ..... 208
Figure 75. Memory Write Timing ..... 209
Figure 76. Signal Coincidence Timing ..... 210
Figure 77. Memory Fault Timing ..... 212
Figure 78. Refresh Timing ..... 213
Figure 79. VRAM Timing ..... 215
Figure 80. DMA Request Timing ..... 217
Figure 81. I/O on Bus Timing ..... 218
Figure 82. Bit Input Sample Timing ..... 219
Figure 83. Bit Input from Bus Sample Timing ..... 221
Figure 84. 100-Pin TQFP Package Dimensions ..... 223

## Tables

Table 1. Signal Descriptions ..... 3
Table 1. Signal Descriptions (continued) ..... 4
Table 2. Pin Assignments, 100-Pin TQFP ..... 6
Table 3. PSC1000 Microprocessor Ordering Information ..... 7
Table 4. Instruction Bandwidth Comparison ..... 15
Table 5. MPU Instruction Set ..... 20
Table 6. ALU Instructions ..... 21
Table 7. Code Examples: Rotate ..... 21
Table 8. Branch, Loop and Skip Instructions ..... 22
Table 9. MPU Branch Ranges ..... 22
Table 10. Literal Instructions ..... 23
Table 11. Data Movement Instructions ..... 23
Table 12. Load and Store Instructions ..... 24
Table 13. Code Example: Complex Addressing Mode ..... 24
Table 14. Code Examples: Memory Move and Fill ..... 24
Table 15. Stack Data Management Instructions ..... 25
Table 16. Stack Cache Management Instructions ..... 25
Table 17. Byte Operation Instructions ..... 26
Table 18. Code Example: Byte Store ..... 26
Table 19. Code Example: Null Character Search ..... 26
Table 20. Code Example: Null-Terminated String Move ..... 26
Table 21. Code Example: Byte Search ..... 27
Table 22. Floating-Point Math Instructions ..... 27
Table 23. Debugging Instructions ..... 28
Table 24. On-Chip Resources Instructions ..... 28
Table 25. Miscellaneous Instructions ..... 28
Table 26. Code Example: Stack Initialization ..... 30
Table 27. Code Example: Stack Depth ..... 30
Table 28. Code Example: Save Context ..... 31
Table 29. Code Example: Restore Context ..... 31
Table 30. Traps Dependent on System State ..... 32
Table 31. Trap Priorities ..... 33
Table 32. Traps Independent of System State ..... 33
Table 33. GRS Extension Bit Manipulation Instructions ..... 34
Table 34. Rounding-Mode Actions ..... 34
Table 35. Code Example: Floating-Point Multiply ..... 35
Table 36. Code Example: Memory-Fault Service Routine ..... 37
Table 37. VRAM Commands ..... 37
Table 38. Instructions That Hold-off Pre-fetch ..... 41
Table 39. MPU Mnemonics and Opcodes (Mnemonic Order) ..... 84
Table 39. MPU Mnemonics and Opcodes (Mnemonic Order, continued) ..... 85
Table 40. MPU Mnemonics and Opcodes (Opcode Order) ..... 86
Table 40. MPU Mnemonics and Opcodes (Opcode Order, continued) ..... 87
Table 41. VPU Instructions ..... 91
Table 42. VPU Branch Ranges ..... 92
Table 43. Code Example: VPU DRAM Refresh ..... 93
Table 44. VPU Mnemonics and Opcodes (Mnemonic Order) ..... 101
Table 45. VPU Mnemonics and Opcodes (Opcode Order) ..... 102
Table 46. Sources of Interrupts ..... 107
Table 47. Code Example: ISR Vectors ..... 109
Table 48. Code Example: Bit Input Without Zero-Persistence ..... 113
Table 49. Code Example: MPU Usage of Bit Inputs ..... 113
Table 50. Code Example: MPU "Real-Time" Bit Input Read ..... 114
Table 51. RAS/CAS Address Line Configuration, Cell Memory ..... 122
Table 52. RAS/CAS Address Line Configuration, Byte Memory ..... 123
Table 53. Sources of RAS cycles ..... 126
Table 54. Bit Field to On-Chip Register Cross-Reference ..... 156
Table 55. Slot Check Computation ..... 157
Table 56. Bus Access Priorities ..... 157
Table 57. I/O-Channel Transfer Characteristics ..... 158
Table 58. RAS/CAS Bus Transactions ..... 163
Table 59. System Configuration after CPU Reset ..... 185
Table 60. Absolute Maximum Ratings ..... 201
Table 61. Operating Conditions ..... 202
Table 62. DC Specifications ..... 203
Table 63. Input Characteristics ..... 203
Table 64. CPU-Clock and 2X-CPU-Clock ..... 204
Table 65. CPU Reset Timing ..... 205
Table 66. Memory Read and Write Timing ..... 206
Table 66. Memory Read and Write Timing (continued) ..... 207
Table 67. Signal Coincidence Timing ..... 210
Table 68. Memory Fault Timing ..... 211
Table 69. Refresh Timing ..... 213
Table 70. VRAM Timing ..... 214
Table 71. DMA Request Timing ..... 216
Table 72. I/O on Bus Timing ..... 218
Table 73. Bit Input Sample Timing ..... 219
Table 74. Bit Input from Bus Sample Timing ..... 220
Table 75. 100-Pin TQFP Package Dimensions ..... 223
Table 76. 100-Pin TQFP Package Thermal Characteristics ..... 223

## Documentation Typography and Nomenclature

References to software commands, CPU instructions, registers, register fields, and package pins are in a different font than body text to minimize confusion and to distinguish them from the surrounding text. Specifically:

Processor instructions are in lowercase (e.g., "The mloop repeats refresh and delay, ...").
Registers or register fields are also in lowercase (e.g., "msra contains data used during..."). Contextually, use of a register or register field name can also imply its contents (e.g., "... must contain the sum of mgebtdobe and mgebtcase"). When referring to a register or register field whose function is identical among its variants, X is used to hold the place of the identifying alpha or numeric character within the name (e.g., ioxebt).

Package pins are in uppercase (e.g., "... the timing for the CAS inactive portion, also referred to as CAS precharge..."). When referring to a pin whose function is identical among its variants, $x$ is used to hold the place of the identifying alpha or numeric character within the name (e.g., $\overline{\mathrm{CASx}}$). An overbar or a "-" prefix on a signal name indicates that the signal is active in its low state; otherwise, signals are active high or the active state is not relevant (e.g., $\overline{\mathrm{RAS}}$ and -RAS refer to the same signal).

To avoid confusion regarding the width in bits of a "word", the term "cell" is used to denote the full processor data element size of 32 bits.

PRODUCT PREVIEW indicates that the product is in the conceptual or design phase of development, and that the document represents the design goals for the product, which may change without notice before the product goes into production.

ADVANCE INFORMATION indicates that the product is in the sampling or pre-production phase of development and that data and specifications are preliminary and subject to change without notice.

## Features

- Low-System-Cost 32-Bit RISC Microprocessor
- Runs Java™ at Native Speed
- Multiple Language Support
- Dual-Processor Architecture
  - Microprocessing Unit (MPU): high-performance zero-operand dual-stack architecture
  - Virtual Peripheral Unit (VPU): performs timing, time-synchronous data transfers, bit outputs, DRAM refresh; emulates peripherals
- 4-Gigabyte Physical Address Space
- Internal Clock Multiplier
  - 2X CPU clock, 4X bus timing
- 4-Group Memory/Bus Interface
  - Supports any combination of EPROM, SRAM, DRAM, VRAM
  - Programmable memory and I/O timing
- Virtual Memory Support
- 8-Level Interrupt Controller
- 8-Level Direct Memory Access Controller
- 16 I/O bits
- 52 General-Purpose 32-Bit Registers
- "Glueless" System Interface
- Big Endian Byte Ordering
- Small, Low-Cost, 100-Pin TQFP Package
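Big-endian byte ordering means that, within a 32-bit cell, the byte at the lowest address holds the most significant eight bits. The following Python sketch illustrates that convention for a cell held in a byte buffer. It is a host-side illustration rather than PSC1000 code, and the helper names `cell_from_bytes` and `cell_to_bytes` are hypothetical.

```python
def cell_from_bytes(b: bytes) -> int:
    """Assemble a 32-bit cell from four bytes in big-endian order.

    b[0] (the lowest address) supplies the most significant byte.
    """
    if len(b) != 4:
        raise ValueError("a cell is exactly four bytes")
    return (b[0] << 24) | (b[1] << 16) | (b[2] << 8) | b[3]


def cell_to_bytes(cell: int) -> bytes:
    """Split a 32-bit cell into four bytes, most significant byte first."""
    cell &= 0xFFFFFFFF
    return bytes([(cell >> 24) & 0xFF, (cell >> 16) & 0xFF,
                  (cell >> 8) & 0xFF, cell & 0xFF])


# Byte addresses n..n+3 holding 12 34 56 78 form the cell 0x12345678.
memory = bytes([0x12, 0x34, 0x56, 0x78])
print(hex(cell_from_bytes(memory)))      # 0x12345678
print(cell_to_bytes(0xCAFEBABE).hex())   # cafebabe
# Cross-check against Python's built-in big-endian conversion.
assert cell_from_bytes(memory) == int.from_bytes(memory, "big")
```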


## General Description

The PSC1000 microprocessor is a highly integrated 32-bit RISC processor that offers high performance and low power consumption at low system cost for a wide range of embedded applications. Its peak performance is one instruction per CPU-clock cycle. The 32-bit registers and data paths fully support 32-bit addresses and data types. The processor addresses up to four gigabytes of physical memory and supports virtual memory with the use of external mapping logic.

As an implementation of the ShBoom™ Microprocessor architecture, the PSC1000 CPU follows an architectural philosophy of simplification and efficiency of use. A zero-operand design eliminates most operand bits, along with the decoding time and instruction space they require. Instructions are shrunk to eight bits, significantly increasing instruction bandwidth and reducing program size. Because the design uses no pipelining or superscalar execution, the resulting control simplicity lets the CPU issue and complete an instruction in a single clock cycle, as often as every clock cycle, without a conventional instruction cache. To keep the chip low in cost, a data cache is also omitted in favor of efficient register caches.
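The following Python sketch illustrates why a zero-operand design needs no operand fields: arithmetic instructions take their inputs implicitly from the top of a stack, so each one fits in a single byte. The opcode values, the one-byte literal scheme, and the `run` function are hypothetical and are not PSC1000 encodings; the model also ignores the PSC1000's register caches and return stack.

```python
# Hypothetical 8-bit opcodes for illustration only (not PSC1000 encodings).
LIT = 0x01   # push the next program byte as a literal
ADD = 0x02   # pop two values, push their 32-bit sum
MUL = 0x03   # pop two values, push the low 32 bits of their product
DUP = 0x04   # duplicate the top of stack

MASK32 = 0xFFFFFFFF


def run(program: bytes) -> list:
    """Execute a byte program on a toy zero-operand stack machine.

    Arithmetic opcodes name no registers: their operands are always the
    top stack items, so each such instruction is a single byte.
    """
    stack = []
    pc = 0
    while pc < len(program):
        op = program[pc]
        pc += 1
        if op == LIT:
            stack.append(program[pc])
            pc += 1
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append((a + b) & MASK32)
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append((a * b) & MASK32)
        elif op == DUP:
            stack.append(stack[-1])
        else:
            raise ValueError(f"unknown opcode 0x{op:02x} at {pc - 1}")
    return stack


# (3 + 4) * (3 + 4) in seven bytes.
prog = bytes([LIT, 3, LIT, 4, ADD, DUP, MUL])
print(run(prog))   # [49]
```

Only the literal instruction carries an extra byte here; `ADD`, `MUL`, and `DUP` need no operand bits at all.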

The stack architectures of the PSC1000 microprocessor and the Java Virtual Machine are very similar. As a result, only a relatively simple byte code translator (20K) is required to produce executable native code from Java byte code, rather than a full Just-in-Time (JIT) compiler (200-400K). This gives much faster initial execution of Java programs and significantly smaller memory requirements. Further, most modern languages are implemented on a stack model, so the features that allow the PSC1000 to run Java efficiently apply similarly to other languages such as C, Forth and PostScript.
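Because both machines keep operands on a stack, many JVM byte codes correspond one-to-one with single native stack instructions, which is what keeps a byte code translator small compared with a JIT compiler. The Python sketch below illustrates only that idea for a handful of integer byte codes. The JVM opcode values come from the Java Virtual Machine Specification, but the native opcode values and the `translate` function are hypothetical; they are not the PSC1000 encodings or Patriot's translator.

```python
# JVM opcode values as defined in the Java Virtual Machine Specification.
JVM_ICONST_0 = 0x03   # iconst_0 .. iconst_5 occupy 0x03 .. 0x08
JVM_ICONST_5 = 0x08
JVM_BIPUSH = 0x10     # followed by one signed byte operand
JVM_DUP = 0x59
JVM_IADD = 0x60
JVM_IMUL = 0x68

# Hypothetical native 8-bit opcodes for a stack CPU (NOT PSC1000 encodings).
NATIVE_LIT = 0x01     # push the following byte as a literal
NATIVE_ADD = 0x02
NATIVE_MUL = 0x03
NATIVE_DUP = 0x04

# Stack operations that translate one-to-one, with no operand rewriting.
ONE_TO_ONE = {JVM_DUP: NATIVE_DUP, JVM_IADD: NATIVE_ADD, JVM_IMUL: NATIVE_MUL}


def translate(bytecode: bytes) -> bytes:
    """Translate a small subset of JVM byte codes into native stack opcodes."""
    out = bytearray()
    i = 0
    while i < len(bytecode):
        op = bytecode[i]
        i += 1
        if op in ONE_TO_ONE:
            out.append(ONE_TO_ONE[op])
        elif JVM_ICONST_0 <= op <= JVM_ICONST_5:
            out += bytes([NATIVE_LIT, op - JVM_ICONST_0])
        elif op == JVM_BIPUSH:
            value = bytecode[i]
            i += 1
            if value >= 0x80:
                raise NotImplementedError("negative bipush literals not handled")
            out += bytes([NATIVE_LIT, value])
        else:
            raise NotImplementedError(f"JVM opcode 0x{op:02x} not in this sketch")
    return bytes(out)


# Hand-written byte codes: iconst_3 iconst_4 iadd dup imul, i.e. (3 + 4) squared.
java = bytes([0x06, 0x07, JVM_IADD, JVM_DUP, JVM_IMUL])
print(translate(java).hex())   # 01030104020403
```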

The PSC1000 CPU supports up to four groups of programmable bus configurations, ranging from as fast as two CPU clocks to as slow as 82 CPU clocks, allowing any desired mix of high-speed and low-speed memory. Because slow, inexpensive memory can be used where speed is not critical, minimum system cost is reduced, and the system designer can trade system cost for performance as needed.
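As a worked example of the range above, a bus cycle lasts its length in CPU clocks multiplied by the CPU clock period. The Python sketch below assumes the 12.5-ns CPU clock (80 MHz) listed under the MPU features. The function names are hypothetical, and `min_bus_clocks` ignores the setup, hold, and precharge margins that real timing programming must include.

```python
import math

CPU_CLOCK_NS = 12.5   # assumed CPU clock period (80 MHz)
MIN_BUS_CLOCKS = 2    # fastest bus configuration quoted above
MAX_BUS_CLOCKS = 82   # slowest bus configuration quoted above


def bus_cycle_ns(cpu_clocks: int, clock_ns: float = CPU_CLOCK_NS) -> float:
    """Duration in ns of a bus cycle lasting `cpu_clocks` CPU clocks."""
    if not MIN_BUS_CLOCKS <= cpu_clocks <= MAX_BUS_CLOCKS:
        raise ValueError("bus configurations span 2 to 82 CPU clocks")
    return cpu_clocks * clock_ns


def min_bus_clocks(access_ns: float, clock_ns: float = CPU_CLOCK_NS) -> int:
    """Fewest whole CPU clocks covering a device access time (no margins)."""
    clocks = max(MIN_BUS_CLOCKS, math.ceil(access_ns / clock_ns))
    if clocks > MAX_BUS_CLOCKS:
        raise ValueError("device is slower than the slowest bus configuration")
    return clocks


print(bus_cycle_ns(2), bus_cycle_ns(82))   # 25.0 1025.0
print(min_bus_clocks(120))                 # 10 clocks for a 120-ns device
```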

By incorporating many on-chip system functions and a "glueless" bus interface, the PSC1000 eliminates support chips, further lowering system cost. The CPU includes an MPU, a Virtual Peripheral Unit, a DMA controller, an interrupt controller, bit inputs, bit outputs, and a programmable memory interface. It can operate with 32-bit-wide or 8-bit-wide memory and devices, and includes hardware debugging support. A minimum system consists of a PSC1000 CPU, an 8-bit-wide EPROM, an oscillator, and optionally one x8 or two x16 memories, for a total of 4 or 5 active components. The small die, which contains only 137,500 transistors, yields a high-performance, low-cost CPU, and the high level of integration yields a high-performance, low-cost system.

## FEATURES

## MICROPROCESSING UNIT (MPU)

- Zero-operand dual-stack architecture
- Very similar to Java Virtual Machine
- 12.5-ns instruction cycle
- 52 General-Purpose 32-Bit Registers
  - 16 global data registers (g0-g15)
  - 16 local registers (r0-r15) double as return stack cache
    - r0 is an index register with predecrement and postincrement
    - Automatic local-register stack spill and refill
  - 18 operand stack cache registers (s0-s17)
    - s0 is an address register
    - Automatic operand stack spill and refill
  - Index register (x) with predecrement and postincrement
  - Count register (ct)
- Stack paging traps
- Cache-management instructions
- MPU communicates with DMA and VPU via global registers
- Hardware single- and double-precision IEEE floating-point support
- Fast multiply
- Fast bit-shifter
- Hardware single-step and breakpoint
- Virtual-memory support
- Posted write
- Power-fail status bit
- Instruction-space-saving 8-bit opcodes

## DIRECT MEMORY ACCESS CONTROLLER (DMAC)

- Eight prioritized DMA channels
- Fixed or revolving DMA priorities
- Byte, four-byte or cell DMA devices
- Single or back-to-back DMA requests
- Transfer rates to 200 MB/second
- Programmable timing for each channel
- Interrupt MPU on transfer boundary/count reached
- Terminate DMA on transfer boundary/count reached
- Channels can be configured as event counters
- DMA communicates with MPU and VPU via global registers

## VIRTUAL PERIPHERAL UNIT (VPU)

- Executes instruction stream independent of MPU
- Deterministic execution
- Performs timing, time-synchronous data transfers, bit-output operations, DRAM refresh
- Emulates peripherals like serial I/O, A to D, D to A, PWM, timers
- Eight transfer channels
- Byte, four-byte or cell device transfers
- Programmable timing for each channel
- Interrupt MPU on transfer boundary/count reached
- Set/reset output bits
- Set MPU interrupt
- Test and branch on input bit
- Looping instructions
- Load transfer address, direction, interrupt on boundary
- VPU communicates with DMA and MPU via global registers or memory
- Channels can be configured as timers
- Instruction-space-saving 8-bit opcodes

## INPUT-OUTPUT/INTERRUPTS

- Eight bit inputs
  - Bits can be configured as zero-persistent
  - Register- and bit-addressable
- Eight bit outputs
  - Register- and bit-addressable
- I/O bits available on pins or multiplexed on bus
- Eight prioritized and vectored interrupts

## PROGRAMMABLE MEMORY INTERFACE (MIF)

- Programmable bus interface timing to 1/4 external clock
- Four independently configurable memory groups:
  - Any combination of 32-bit and 8-bit devices
  - Any combination of EPROM, SRAM, DRAM, VRAM
  - Almost any DRAM size/configuration
  - Fast-page mode access for each DRAM group
- Glueless support for one memory bank per group
- 1.25 gates per memory bank for decoding up to 16 memory banks (four per memory group)
- Virtual-memory support
- DRAM refresh support (via VPU)
- VRAM support includes DSF, $\overline{\mathrm{OE}}$, $\overline{\mathrm{WE}}$, and $\overline{\mathrm{CAS}}$-before-$\overline{\mathrm{RAS}}$ control

Table 1. Signal Descriptions

| SYMBOL | TYPE | DESCRIPTION |
| :---: | :---: | :--- |
| $\mathrm{cV}_{\mathrm{ss}}$ | PWR | Ground for core logic and all output driver pre-drivers. |
| $\mathrm{cV}_{\mathrm{cc}}$ | PWR | Power for core logic and all output driver pre-drivers. |
| $\mathrm{ctrlV}_{\mathrm{ss}}$ | PWR | Ground for control signal output drivers (DSF, OUT[7:0], all RASes, all CASes, $\overline{\mathrm{DOB}}$, $\overline{\mathrm{OE}}$, $\overline{\mathrm{xWE}}$). |
| $\mathrm{ctrlV}_{\mathrm{cc}}$ | PWR | Power for control signal output drivers (DSF, OUT[7:0], all RASes, all CASes, $\overline{\mathrm{DOB}}$, $\overline{\mathrm{OE}}$, $\overline{\mathrm{xWE}}$). |
| $\mathrm{adV}_{\mathrm{ss}}$ | PWR | Ground for AD[31:0] output drivers. |
| $\mathrm{adV}_{\mathrm{cc}}$ | PWR | Power for AD[31:0] output drivers. |
| CLK | I | EXTERNAL OSCILLATOR: The CPU operating frequency is twice the external oscillator frequency. |
| $\overline{\mathrm{RESET}}$ | I, A() | RESET: Asserting $\overline{\mathrm{RESET}}$ causes the entire CPU to be initialized and the MPU and VPU to begin execution at their hardware reset locations. If $\overline{\mathrm{RESET}}$ is not held low during power-up, the signal also is input on AD8 during $\overline{\mathrm{RAS}}$ active and $\overline{\mathrm{CAS}}$ inactive, and $\overline{\mathrm{RESET}}$ is ignored. |
| DSF | O, I(L) | DEVICE SPECIAL FUNCTION: Set on VRAM memory cycles during $\overline{\mathrm{RAS}}$ and $\overline{\mathrm{CAS}}$ accesses by the MPU to control VRAM function. |
| $\overline{\mathrm{MFLT}}$ | I, S(RAS) | MEMORY FAULT: Asserted by external memory-management hardware before $\overline{\mathrm{RAS}}$ active to invalidate the current MPU bus cycle and cause the MPU to trap, if the configuration bit pkgmflt is set. The signal alternatively is input on AD8 at $\overline{\mathrm{RAS}}$ fall during $\overline{\mathrm{CAS}}$ inactive, if the bit pkgmflt is clear. |
| $\overline{\mathrm{IN}}[7:0]$ | I, A() | INPUTS: Asserted by external hardware to request an interrupt or DMA, or to input a bit, when the configuration bit pkgio is set. The bits alternatively are input on AD[7:0] during $\overline{\mathrm{RAS}}$ active and $\overline{\mathrm{CAS}}$ inactive, if the bit pkgio is clear. |
| OUT[7:0] | O, I(H) | OUTPUTS: Bit outputs writable from the VPU or MPU. These bits are also available on AD[7:0] during $\overline{\mathrm{RAS}}$ inactive. |
| $\overline{\mathrm{RAS}}$ | O, I(L) | ROW ADDRESS STROBE: A control signal asserted to define row address valid and deasserted only when another row address cycle is required. |
| RAS | O, I(H) | Inverted $\overline{\mathrm{RAS}}$. |
| $\overline{\mathrm{CAS}}$ | O, I(H) | COLUMN ADDRESS STROBE: A control signal asserted to define column address valid and deasserted at the end of the current bus cycle. |
| CAS | O, I(L) | Inverted $\overline{\mathrm{CAS}}$. |

Table 1. Signal Descriptions (continued)

| SYMBOL | TYPE | DESCRIPTION |
| :---: | :---: | :---: |
| $\overline{\mathrm{MGS0}}$-$\overline{\mathrm{MGS3}}$/$\overline{\mathrm{RAS0}}$-$\overline{\mathrm{RAS3}}$ | O, I(L) | MEMORY GROUP SELECTS/ROW ADDRESS STROBES: In multiple memory bank (MMB) mode (configuration bit mmb is set), the strobes are active during all bus cycles for the entire bus cycle. In single memory bank (SMB) mode, they are similar to $\overline{\mathrm{RAS}}$. |
| $\overline{\mathrm{CAS0}}$-$\overline{\mathrm{CAS3}}$ | O, I(H) | COLUMN ADDRESS STROBES: Similar to $\overline{\mathrm{CAS}}$, to assert a column address cycle on the specified memory bank within the current memory group. |
| $\overline{\mathrm{OE}}$ | O, I(H) | OUTPUT ENABLE: Active when the current bus transaction is a read from memory. The configuration bit oed is set or cleared during the CPU reset startup process. |
| $\overline{\mathrm{EWE}}$ | O, I(H) | EARLY WRITE ENABLE: Active when the current bus transaction is a write to memory. Active time at either start of cycle or $\overline{\mathrm{CAS}}$ fall is programmable for each memory group. |
| $\overline{\mathrm{LWE}}$ | O, I(H) | LATE WRITE ENABLE: Active when the current bus transaction is a write to memory and for VRAM control. Active time either at or after $\overline{\mathrm{DOB}}$ active is programmable for each memory group. |
| AD[31:0] | I/O<br>S($\overline{\mathrm{DOB}}$)<br>S($\overline{\mathrm{RAS}}$)<br>A()<br>I(Z) | ADDRESS DATA BUS: Multiplexed address, data, I/O and control bus.<br>For data.<br>For alternate memory fault on AD8.<br>For alternate reset on AD8. See $\overline{\mathrm{RESET}}$. |

Notes:
- I = Input-Only Pins; O = Output-Only Pins; I/O = Bidirectional Pins; PWR = Power Pins.
- A() = Asynchronous inputs; S(sym) = Synchronous inputs, which must meet setup and hold requirements relative to sym.
- I(H) = high value on reset; I(L) = low value on reset; I(Z) = high impedance on reset.



Figure 1. 100-Pin Thin Quad Flat Package (TQFP)


Table 2. Pin Assignments, 100-Pin TQFP

| PIN NO. | PIN NAME | TYPE | PIN NO. | PIN NAME | TYPE | PIN NO. | PIN NAME | TYPE |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 1 | $\mathrm{ctrlV}_{\text{cc}}$ | PWR | 35 | AD17 | I/O | 69 | $\overline{\mathrm{CAS}}$ | O |
| 2 | OUT0 | O | 36 | AD16 | I/O | 70 | $\mathrm{ctrlV}_{\text{ss}}$ | PWR |
| 3 | OUT1 | O | 37 | $\mathrm{adV}_{\text{ss}}$ | PWR | 71 | $\mathrm{ctrlV}_{\text{cc}}$ | PWR |
| 4 | OUT2 | O | 38 | $\mathrm{adV}_{\text{cc}}$ | PWR | 72 | $\overline{\mathrm{DOB}}$ | O |
| 5 | OUT3 | O | 39 | $\mathrm{cV}_{\text{cc}}$ | PWR | 73 | DSF | O |
| 6 | OUT4 | O | 40 | $\mathrm{cV}_{\text{ss}}$ | PWR | 74 | $\overline{\mathrm{OE}}$ | O |
| 7 | OUT5 | O | 41 | AD15 | I/O | 75 | $\overline{\mathrm{LWE}}$ | O |
| 8 | OUT6 | O | 42 | AD14 | I/O | 76 | $\mathrm{ctrlV}_{\text{ss}}$ | PWR |
| 9 | OUT7 | O | 43 | AD13 | I/O | 77 | $\mathrm{ctrlV}_{\text{cc}}$ | PWR |
| 10 | $\overline{\mathrm{RESET}}$ | I | 44 | $\mathrm{adV}_{\text{ss}}$ | PWR | 78 | $\overline{\mathrm{MGS0}}/\overline{\mathrm{RAS0}}$ | O |
| 11 | AD31 | I/O | 45 | $\mathrm{adV}_{\text{cc}}$ | PWR | 79 | $\overline{\mathrm{MGS1}}/\overline{\mathrm{RAS1}}$ | O |
| 12 | $\mathrm{cV}_{\text{ss}}$ | PWR$^{1}$ | 46 | AD12 | I/O | 80 | $\overline{\mathrm{MGS2}}/\overline{\mathrm{RAS2}}$ | O |
| 13 | AD30 | I/O | 47 | AD11 | I/O | 81 | $\overline{\mathrm{MGS3}}/\overline{\mathrm{RAS3}}$ | O |
| 14 | $\mathrm{cV}_{\text{cc}}$ | PWR$^{1}$ | 48 | AD10 | I/O | 82 | $\overline{\mathrm{MFLT}}$ | I |
| 15 | $\mathrm{adV}_{\text{ss}}$ | PWR | 49 | AD9 | I/O | 83 | $\overline{\mathrm{IN0}}$ | I |
| 16 | $\mathrm{adV}_{\text{cc}}$ | PWR | 50 | $\mathrm{adV}_{\text{ss}}$ | PWR | 84 | $\overline{\mathrm{IN1}}$ | I |
| 17 | AD29 | I/O | 51 | $\mathrm{adV}_{\text{cc}}$ | PWR | 85 | $\overline{\mathrm{IN2}}$ | I |
| 18 | AD28 | I/O | 52 | AD8 | I/O | 86 | $\overline{\mathrm{IN3}}$ | I |
| 19 | AD27 | I/O | 53 | AD7 | I/O | 87 | $\overline{\mathrm{IN4}}$ | I |
| 20 | AD26 | I/O | 54 | AD6 | I/O | 88 | CLK | I |
| 21 | $\mathrm{adV}_{\text{ss}}$ | PWR | 55 | $\mathrm{adV}_{\text{ss}}$ | PWR | 89 | $\mathrm{cV}_{\text{ss}}$ | PWR$^{2}$ |
| 22 | $\mathrm{adV}_{\text{cc}}$ | PWR | 56 | $\mathrm{adV}_{\text{cc}}$ | PWR | 90 | $\mathrm{cV}_{\text{cc}}$ | PWR$^{2}$ |
| 23 | AD25 | I/O | 57 | AD5 | I/O | 91 | $\overline{\mathrm{IN5}}$ | I |
| 24 | AD24 | I/O | 58 | AD4 | I/O | 92 | $\overline{\mathrm{IN6}}$ | I |
| 25 | AD23 | I/O | 59 | AD3 | I/O | 93 | $\overline{\mathrm{IN7}}$ | I |
| 26 | $\mathrm{adV}_{\text{ss}}$ | PWR | 60 | AD2 | I/O | 94 | $\overline{\mathrm{CAS0}}$ | O |
| 27 | $\mathrm{adV}_{\text{cc}}$ | PWR | 61 | $\mathrm{adV}_{\text{ss}}$ | PWR | 95 | $\overline{\mathrm{CAS1}}$ | O |
| 28 | AD22 | I/O | 62 | $\mathrm{adV}_{\text{cc}}$ | PWR | 96 | $\overline{\mathrm{CAS2}}$ | O |
| 29 | AD21 | I/O | 63 | $\mathrm{cV}_{\text{cc}}$ | PWR$^{1}$ | 97 | $\overline{\mathrm{CAS3}}$ | O |
| 30 | AD20 | I/O | 64 | AD1 | I/O | 98 | RAS | O |
| 31 | AD19 | I/O | 65 | $\mathrm{cV}_{\text{ss}}$ | PWR$^{1}$ | 99 | CAS | O |
| 32 | $\mathrm{adV}_{\text{ss}}$ | PWR | 66 | AD0 | I/O | 100 | $\mathrm{ctrlV}_{\text{ss}}$ | PWR |
| 33 | $\mathrm{adV}_{\text{cc}}$ | PWR | 67 | $\overline{\mathrm{EWE}}$ | O |  |  |  |
| 34 | AD18 | I/O | 68 | $\overline{\mathrm{RAS}}$ | O |  |  |  |

## Notes:

1. PWR pin is near clock driver.
2. PWR pin is near PLL.

I = Input-Only Pin; O = Output-Only Pin; I/O = Bidirectional Pins; PWR = Power Pins



Table 3. PSC1000 Microprocessor Ordering Information

| Description | CPU Clock <br> Frequency (MHz) | Package Type | Stock Number |
| :---: | :---: | :---: | :---: |
| PSC1000-BAXTC | 80 | TQFP | 31-0100371 |


## Purpose

This reference manual describes the architecture, hardware interface, and programming of the PSC1000 Microprocessor. The Patriot PSC1000 microprocessor is one of a family of low-power, low-cost, stack-architecture processors targeted specifically for embedded applications. As stack-architecture processors, the PSC1000 family members are well suited to applications that must run Java™ at native speeds. These include laser printers, ignition controllers, network routers, personal digital assistants, set-top cable controllers, video games, pagers, cell phones, and many other applications. Because C++ is semantically similar to Java, the PSC1000 family also runs C and C++ efficiently, as well as stack-architecture languages such as Forth and Postscript™.

This data book provides the information required to design products that use the PSC1000 CPU, including functional capability, electrical characteristics and ratings, and package definitions, as well as the information required to program both the MPU and VPU.

## Overview

The PSC1000 Microprocessor is an implementation of the ShBoom™ Microprocessor architecture. It is a highly integrated 32-bit RISC processor that executes at a peak performance of one instruction per CPU-clock cycle. The CPU is designed specifically for embedded applications in which power consumption, MPU performance, and system cost are deciding selection factors.

The PSC1000 CPU instruction sets are hardwired, allowing most instructions to execute in a single cycle without the use of pipelines or a superscalar architecture. A "flow-through" design allows the next instruction to start before the prior instruction completes, thus increasing performance.

The PSC1000 MPU contains 52 general-purpose registers, including 16 global data registers, an index register, a count register, a 16-deep addressable register/return stack, and an 18-deep operand stack.

Both stacks contain an index register in the top element, are cached on chip, and, when required, automatically spill to and refill from external memory. The stacks minimize the data movement typical of register-based architectures, and also minimize memory accesses during procedure calls, parameter passing, and variable assignments. Additionally, the MPU contains a mode/status register, two stack pointers, and 41 locally addressed registers for I/O, control, configuration, and status.

## KEY FEATURES

Run Java at Native Speed: The stack architectures of the PSC1000 microprocessor and the Java Virtual Machine are very similar. As a result, only a relatively simple byte code translator (20K) is required to produce executable native code from Java byte code, rather than a full Just-in-Time (JIT) compiler (200-400K) as is required for common processor architectures. The result is much faster initial execution of Java programs and significantly smaller memory requirements, since hundreds of kilobytes of memory are saved by the reduced size of the translator itself.

Multiple Language Support: Most modern languages are implemented on a stack model. The features that allow the PSC1000 to run Java efficiently apply similarly to other languages such as C, C++, Forth and Postscript.

Dual-Processor Architecture: The CPU contains both a high-performance, zero-operand, dual-stack architecture microprocessing unit (MPU), and a virtual peripheral unit (VPU) that executes instructions to transfer data, measure time, test inputs, set outputs, and emulate peripherals such as serial ports and A to D or D to A converters.

Zero-Operand Architecture: Many RISC architectures waste valuable instruction space (often 15 bits or more per instruction) by specifying three possible operands for every instruction. Zero-operand (stack) architectures eliminate these operand bits, thus allowing much shorter instructions (typically one-fourth the size), and thus a higher instruction-execution bandwidth and smaller program size. Stacks also minimize register saves and loads within and across procedures, thus allowing shorter instruction sequences and faster-running code.

Fast, Simple Instructions: Instructions are simpler to decode and execute than those of conventional RISC processors, allowing the PSC1000 MPU and VPU to issue and complete instructions in a single clock cycle, as often as every CPU-clock cycle.

Four-Instruction Buffer: Using 8-bit opcodes, the CPU obtains up to four instructions from memory each time an instruction fetch or pre-fetch is performed. These instructions can be repeated without rereading them from memory. This maintains high performance when connected directly to DRAM, without the expense of a cache.
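As a rough illustration of how four 8-bit opcodes share one fetched cell, the following C sketch extracts the opcode in a given slot of a 32-bit instruction group. It assumes, for illustration only, that slot 0 is executed first and occupies the lowest byte address, which under the PSC1000's high-order-byte-first ordering is the most-significant byte of the cell. The function name `opcode_at` is hypothetical and not part of any PSC1000 toolchain.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of an instruction group: one 32-bit cell holds
   four 8-bit opcodes. Slot 0 is assumed to execute first and to sit at
   the lowest byte address, i.e. in the most-significant byte. */
static uint8_t opcode_at(uint32_t cell, unsigned slot)
{
    assert(slot < 4);
    return (uint8_t)(cell >> (8u * (3u - slot)));
}
```

For example, the cell 0x11223344 yields 0x11, 0x22, 0x33, and 0x44 for slots 0 through 3, so the whole group is available after a single memory access.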

Local and Global Registers: Local and global registers minimize the number of accesses to data memory. The local-register stack automatically caches up to sixteen registers, and the operand stack up to eighteen registers. Because both are stacks, any allocated data space efficiently nests and unnests across procedure calls. The sixteen global registers provide storage for shared data.

Posted Write: Decouples the processor from data writes to memory, allowing the processor to continue executing after a write is posted.

Programmable Memory/Bus Interface: Allows the use of lower-cost memory and system components in price-sensitive systems. The interface supports many types of EPROM/SRAM/DRAM/VRAM directly, including fast-page mode on up to four groups of DRAM devices. On-chip support of RAS cycle $\overline{\mathrm{OE}}$ and $\overline{\mathrm{WE}}$, CAS-before-RAS, and the DSF signal allows use of VRAM without additional external hardware. Programmable bus timing and driver power allow the designer a range of solutions to system design challenges in order to match the time, performance, and budget requirements for each project.

Clock Multiplier: Internally doubles and quadruples the external clock. An on-chip PLL circuit eliminates typical stringent oscillator specifications, thus allowing the use of lower-cost oscillators.

Fully Static Design: A fully static design allows running the clock from DC up to rated speed. Lower clock speeds can be used to drastically cut power consumption.

Hardware Debugging Support: Both breakpoint and single-step capability aid in debugging programs.

Virtual Memory: Supported through the use of external mapping SRAMs and support logic.

Floating-Point Support: Special instructions implement efficient single- and double-precision IEEE floating-point arithmetic.

Direct Memory Access Controller: Supports up to eight prioritized levels at data rates of up to the equivalent of one byte per CPU clock cycle.

Interrupt Controller: Supports up to eight prioritized levels with interrupt responses as fast as eight CPU clock cycles.

Eight Bit Inputs and Eight Bit Outputs: I/O bits are available for MPU and VPU application use, thus reducing the requirement for external hardware.


## Central Processing Unit

The PSC1000 CPU architectural philosophy is that of simplification and efficiency of use: implement the simplest solution that adequately solves the problem and provides the best utilization of existing resources. In hardware, this typically equates to using fewer transistors, and fewer transistors means a lower-cost, and often lower-power, CPU.

Early RISC processors reduced transistor counts compared to CISC processors, and gained their cost and performance improvements therein. Today, interconnections between transistors dominate the silicon of many CPUs. The PSC1000 MPU architectural philosophy results not only in fewer transistors but also in fewer interconnections compared to register-based MPUs.

## Resources

Figure 2. CPU Block Diagram

The PSC1000 CPU contains ten major functional areas: microprocessing unit (MPU), virtual peripheral unit (VPU), global registers, direct memory access controller (DMAC), interrupt controller (INTC), on-chip resources, bit inputs, bit outputs, programmable memory interface (MIF), and clock. In part, the PSC1000 CPU gains its capability and small silicon size from the resource sharing within and among these areas. See Figure 2. For example:

- The global registers are shared by the MPU, the VPU, and the transfer logic within the MIF. They are used by the MPU for data storage and control communication with the DMAC and the VPU; by the VPU for transfer information, loop counts, and delay counts; and by the DMAC for transfer information. Further, the transfer information is used by the transfer logic in the MIF, which is shared by the VPU and DMAC.
- The MIF is shared by the MPU, the VPU, the DMAC, the bit outputs, and the bit inputs for access to the system bus. Bus transaction requests are arbitrated and prioritized by the MIF to ensure temporally deterministic execution of the VPU.
- The bit inputs are made available to the system through the On-Chip Resource Registers. They are shared by the INTC and the DMAC for service requests, are available to the MPU and the VPU for programmed input, and are bit-addressable.
- The DMAC transfer-termination logic is significantly reduced by using specific termination conditions and close coupling with the MPU for intelligent termination action.
- The INTC is shared by the bit inputs, the VPU, and the DMAC (through the MIF transfer logic) for interrupt requests to the MPU.
- The bit outputs are made available to the system through the On-Chip Resource Registers. They are shared by the MPU and the VPU for programmed output, and are bit-addressable.

Although the maximum usage case (a complex VPU program, many interrupt sources, many input bits, many output bits, all available DMA channels, and maximum MPU computational ability) might leave a shortage of resources, such applications are not typical. The sharing of resources among functional units increases CPU capability and flexibility, and significantly reduces transistor count, package pin count, and thus silicon size and cost. The ability to select among available resources, compared to the fixed resource set of other CPUs, allows the PSC1000 CPU to be used for a wider range of applications.

## Clock Speed

The clock speed of a CPU is not by itself a predictor of its performance. For example, the PowerPC 604, running at about half the speed of the DEC Alpha 21064A, achieves about the same SPECint95 benchmark performance. In this respect, the PSC1000 CPU is more like the DEC Alpha than the PowerPC. However, the PSC1000 CPU is based on a significantly different design philosophy than either of these CPUs.

Most processors historically have forced the system designer to maintain a balanced triangle among CPU execution speed, memory bandwidth, and I/O bandwidth. However, as system clock rate increases, typically so do bus speed, cache memory speed, and system interface costs. Typically, so does CPU cost, as thousands of transistors are often added to maintain this balance.

The PSC1000 CPU lets the system designer select the performance level desired, while maintaining low system cost. This may tilt the triangle slightly, but cost is not part of the classical triangle-balancing equation. The PSC1000 CPU's programmable memory interface permits a wide range of memory speeds to be used, allowing systems to use slow or fast memory as required. Slow memory clearly degrades system performance, but the fast internal clock speed of the PSC1000 CPU causes internal operations to be completed quickly. Thus the multi-cycle multiply and divide instructions execute quickly, without the silicon expense of a single-cycle multiply unit. Although higher performance can sometimes be gained by dedicating large numbers of transistors to functions such as these, silicon cost also increases, and increased cost did not fit the design goals for this version of the PSC1000 CPU.

ADVANCE INFORMATION


Figure 3. MPU Block Diagram


## Microprocessing Unit

The MPU supports the ShBoom™ architectural philosophy of simplification and efficiency of use through its basic design in several interrelated ways.

Whereas most RISC processors use pipelines and superscalar execution to execute at high clock rates, the PSC1000 MPU uses neither. By having a simpler architecture, the PSC1000 MPU issues and completes most instructions in a single clock cycle. There are no pipelines to fill and none to flush during changes in program flow. Though more instructions are sometimes required to perform the same procedure in PSC1000 MPU code, the MPU operates at a higher clock frequency than other processors of similar silicon size and technology, thus giving comparable performance at significantly reduced cost.

A microprocessor's performance is often limited by how quickly it can be fed instructions from memory. The MPU reduces this bottleneck by using 8-bit instructions so that up to four instructions (an instruction group) can be obtained during each memory access. Each instruction typically takes one CPU-clock cycle to execute, thus requiring four CPU-clock cycles to execute the instruction group. Because a memory access can complete in four (or even fewer) CPU-clock cycles, the next instruction group can be available when execution of the previous group completes. This makes it possible to feed instructions to the processor at maximum instruction-execution bandwidth without the cost and complexity of an instruction cache.
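The timing argument above reduces to simple arithmetic: a group of four single-cycle instructions keeps the MPU busy for four CPU-clock cycles, so a fetch of the next group that completes within that window adds no stall. The sketch below models this with hypothetical parameters; it assumes the fetch fully overlaps execution and ignores branches, multi-cycle instructions, and bus contention from the VPU and DMAC.

```c
#include <assert.h>

/* Hypothetical model of the instruction-group fetch argument.
   The fetch of the next group is assumed to overlap execution of the
   current group. Returns the stall cycles per group (0 when memory
   keeps up with execution). */
static unsigned stall_cycles_per_group(unsigned instrs_per_group,
                                       unsigned cycles_per_instr,
                                       unsigned mem_cycles)
{
    unsigned exec_cycles = instrs_per_group * cycles_per_instr;
    return mem_cycles > exec_cycles ? mem_cycles - exec_cycles : 0u;
}
```

With four single-cycle instructions per group, a four-cycle memory access gives `stall_cycles_per_group(4, 1, 4) == 0`, while a six-cycle access would leave the MPU idle for two cycles per group under this model.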

Figure 4. MPU Registers

The zero-operand (stack) architecture makes 8-bit instructions possible. The stack architecture eliminates the requirement to specify source and destination operands in every instruction. By not using opcode bits on every instruction for operand specification, a much greater bandwidth of functional operations (up to four times as high) is possible. Table 4 depicts an example PSC1000 MPU instruction sequence that demonstrates twice the typical RISC MPU instruction bandwidth. The instruction sequence on the PSC1000 MPU requires one-half the instruction bits, and the uncached performance benefits from the resulting increase in instruction bandwidth.

Table 4. Instruction Bandwidth Comparison

$\mathrm{g5} = \mathrm{g1} - (\mathrm{g2} + 1) + \mathrm{g3} - (\mathrm{g4} \times 2)$

| Typical RISC MPU |  | PSC1000 MPU |  |
| :---: | :---: | :---: | :---: |
|  |  | push | g1 |
|  |  | push | g2 |
| add | #1,g2,g5 | inc | #1 |
| sub | g1,g5,g5 | sub |  |
|  |  | push | g3 |
| add | g5,g3,g5 | add |  |
|  |  | push | g4 |
| shl | g4,#1,temp | shl | #1 |
|  |  | sub |  |
| sub | g5,temp,g5 | pop | g5 |
| 20 bytes |  | 10 bytes |  |

Example of twice the instruction bandwidth available on the PSC1000 MPU
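To show how the PSC1000 column of Table 4 computes its result without naming any operands, the following C sketch replays that sequence on a minimal operand-stack model. The model is hypothetical: it keeps only the 18-cell on-chip depth, does not spill to memory, and uses unsigned 32-bit cells so arithmetic wraps as it would in a 32-bit register; the helper names (`os_push`, `op_sub`, and so on) are illustrative, not PSC1000 mnemonics.

```c
#include <assert.h>
#include <stdint.h>

#define OS_DEPTH 18  /* operand-stack cache depth; spilling not modeled */

/* Hypothetical operand-stack model with 32-bit wrapping cells. */
typedef struct {
    uint32_t cell[OS_DEPTH];
    int depth;
} OperandStack;

static void os_push(OperandStack *s, uint32_t v)
{
    assert(s->depth < OS_DEPTH);
    s->cell[s->depth++] = v;
}

static uint32_t os_pop(OperandStack *s)
{
    assert(s->depth > 0);
    return s->cell[--s->depth];
}

/* Two-operand ALU ops: pop two cells, push one result (net depth -1).
   The top of stack is the second (right-hand) operand. */
static void op_add(OperandStack *s) { uint32_t b = os_pop(s), a = os_pop(s); os_push(s, a + b); }
static void op_sub(OperandStack *s) { uint32_t b = os_pop(s), a = os_pop(s); os_push(s, a - b); }

/* One-operand ops with a small immediate: replace the top cell. */
static void op_inc(OperandStack *s, uint32_t n) { os_push(s, os_pop(s) + n); }
static void op_shl(OperandStack *s, unsigned n) { os_push(s, os_pop(s) << n); }

/* g5 = g1 - (g2 + 1) + g3 - (g4 * 2), step for step as in Table 4. */
static uint32_t table4_sequence(uint32_t g1, uint32_t g2, uint32_t g3, uint32_t g4)
{
    OperandStack s = { .depth = 0 };
    os_push(&s, g1);   /* push g1 */
    os_push(&s, g2);   /* push g2 */
    op_inc(&s, 1);     /* inc #1  */
    op_sub(&s);        /* sub     */
    os_push(&s, g3);   /* push g3 */
    op_add(&s);        /* add     */
    os_push(&s, g4);   /* push g4 */
    op_shl(&s, 1);     /* shl #1  */
    op_sub(&s);        /* sub     */
    return os_pop(&s); /* pop g5  */
}
```

For g1 = 10, g2 = 3, g3 = 5, and g4 = 2, the sequence returns 10 - 4 + 5 - 4 = 7. The stack never holds more than two cells, which is why intermediate results need no register allocation.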

Stack MPUs are thus simpler than register-based MPUs, and the PSC1000 MPU has two hardware stacks to take advantage of this: the operand stack and the local-register stack. The simplicity is widespread and is reflected in the efficient ways stacks are used during execution.

The ALU processes data primarily from one source of inputs: the top of the operand stack. The ALU is also used for branch address calculations. Data bussing is thus greatly reduced and simplified. Intermediate results typically "stack up" to unlimited depth and are used directly when needed, rather than requiring specific register allocations and management. The stacks are individually cached and spill and refill automatically, eliminating the software overhead for stack manipulation typical in other RISC processors. Function parameters are passed on, and consumed directly off of, the operand stack, eliminating the need for most stack frame management. When additional local storage is required, the local-register stack supplies registers that efficiently nest and unnest across functions. As stacks, the stack register spaces are allocated only for data actually stored, maximizing storage utilization and bus bandwidth when registers are spilled or refilled, unlike architectures using fixed-size register windows. Stacks speed context switches, such as interrupt servicing, because registers do not need to be explicitly saved before use; additional stack space is allocated as required. The stacks thus reduce the number of explicitly addressable registers otherwise required, and speed execution by reducing data location specification and movement. Stack storage is inherently local, so the global registers supply non-local register resources when required.
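The automatic spill and refill behavior can be sketched as a small ring buffer backed by memory. The C model below is a simplification under stated assumptions: it spills exactly one cell (the oldest cached one) when the 18-cell cache is full and refills exactly one cell when the cache is empty. The actual PSC1000 spill and refill policy (timing, thresholds, burst size) is not described here and may differ; the names `StackCache`, `sc_push`, and `sc_pop` and the external area size are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_CELLS 18   /* on-chip operand-stack cells, per the text */
#define MEM_CELLS   64   /* hypothetical size of the external stack area */

/* Hypothetical stack cache: the newest cells stay on chip in a ring
   buffer; when it is full, the oldest cached cell spills to memory,
   and when it is empty, the most recently spilled cell is refilled. */
typedef struct {
    uint32_t cache[CACHE_CELLS];
    int next;          /* ring index where the next push is stored */
    int cached;        /* valid cells currently on chip */
    uint32_t mem[MEM_CELLS];
    int spilled;       /* cells currently in external memory */
    int spills, refills;
} StackCache;

static void sc_push(StackCache *s, uint32_t v)
{
    if (s->cached == CACHE_CELLS) {
        /* Ring is full, so the oldest cell sits at the push position. */
        assert(s->spilled < MEM_CELLS);
        s->mem[s->spilled++] = s->cache[s->next];
        s->cached--;
        s->spills++;
    }
    s->cache[s->next] = v;
    s->next = (s->next + 1) % CACHE_CELLS;
    s->cached++;
}

static uint32_t sc_pop(StackCache *s)
{
    if (s->cached == 0) {
        /* Refill: the stack top is the most recently spilled cell. */
        assert(s->spilled > 0);
        s->cache[s->next] = s->mem[--s->spilled];
        s->next = (s->next + 1) % CACHE_CELLS;
        s->cached = 1;
        s->refills++;
    }
    s->next = (s->next + CACHE_CELLS - 1) % CACHE_CELLS;
    s->cached--;
    return s->cache[s->next];
}
```

In this model, pushing 20 cells causes two spills, and popping all 20 returns them in reverse order with two refills. The program sees one arbitrarily deep stack and never manages the memory traffic itself.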

Figure 5. CPU Memory Map

Eight-bit opcodes are too small to contain much associated data. Additional bytes are necessary for immediate values and branch offsets. However, variable-length instructions usually complicate decoding and complicate and lengthen the associated data access paths. To simplify the problem, byte literal data is taken only from the rightmost byte of the instruction group, regardless of the location of the byte literal opcode within the group. Similarly, branch offsets are taken as all bits to the right of the branch opcode, regardless of the opcode position. For 32-bit literal data, the data is taken from a subsequent memory cell. These design choices ensure that the required data is always right-justified for placement on the internal data busses, reducing interconnections and simplifying and speeding execution.
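The right-justification rule can be sketched in C as plain masking of the fetched cell. This sketch assumes, hypothetically, that slot 0 is the leftmost (most-significant) byte of the instruction group. It extracts only the raw offset field: how the hardware sign-extends or scales a branch offset is not covered by the paragraph above and is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Byte literal: always the rightmost (least-significant) byte of the
   instruction group, wherever the literal opcode sits. */
static uint8_t byte_literal(uint32_t cell)
{
    return (uint8_t)(cell & 0xFFu);
}

/* Number of bits to the right of a branch opcode in slot 0..3,
   with slot 0 assumed to be the leftmost (most-significant) byte. */
static unsigned branch_offset_bits(unsigned slot)
{
    assert(slot < 4);
    return 8u * (3u - slot);
}

/* Raw branch-offset field: all bits to the right of the opcode.
   Sign extension and scaling are deliberately not modeled. */
static uint32_t branch_offset_field(uint32_t cell, unsigned slot)
{
    unsigned bits = branch_offset_bits(slot);  /* at most 24 */
    return cell & ((1u << bits) - 1u);
}
```

Because both fields always end at bit 0, no shifter is needed to align them: a branch in slot 0 of 0xAABBCCDD carries the 24-bit field 0xBBCCDD, while a branch in slot 2 carries only 0xDD.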


Since most instructions decode and execute in a single clock cycle, the same ALU that is used for data operations is also available, and is used, for branch address calculations. This eliminates an entire ALU often required for branch offset calculations.

Rather than consume the chip area for a single-cycle multiply-accumulate unit, the MPU relies on its higher clock speed to reduce the execution time of conventional multicycle multiply and divide instructions. For efficiently multiplying by constants, a fast multiply instruction multiplies only by the specified number of bits.
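The benefit of multiplying by only a specified number of bits can be sketched with a classic shift-and-add loop that examines one multiplier bit per step. This C function is a hypothetical unsigned model: the name `mul_low_bits` and the one-step-per-bit cost are assumptions, not the documented behavior of the PSC1000 fast multiply instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Shift-and-add multiply that examines only the low `bits` bits of the
   multiplier, one bit per step. With a small constant multiplier,
   fewer steps are needed than for a full 32-step multiply.
   Unsigned only; signed handling is not modeled. */
static uint32_t mul_low_bits(uint32_t multiplicand, uint32_t multiplier,
                             unsigned bits)
{
    assert(bits <= 32);
    uint32_t product = 0;
    for (unsigned i = 0; i < bits; i++) {
        if ((multiplier >> i) & 1u)
            product += multiplicand << i;
    }
    return product;
}
```

For example, multiplying by the constant 10 (binary 1010) needs only four steps: `mul_low_bits(7, 10, 4)` gives 70, the same result as a full 32-step multiply.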

Rather than consume the chip area for a barrel shifter, the counted bit-shift operation is "smart": it first shifts by bytes, and then by bits, to minimize the cycles required. The shift operations can also shift double cells (64 bits), allowing bit-rotate instructions to be easily synthesized.
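Both ideas in the paragraph above can be sketched in C under stated assumptions. `smart_shift_steps` is a hypothetical cost model that counts one step per whole byte moved plus one per remaining bit; the real cycle counts are not given here. `rotl32_via_double_shift` shows how a rotate follows from a double-cell shift: copy a cell into both halves of a 64-bit value, shift left, and keep the high half.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical step count for a "smart" counted shift that moves by
   whole bytes first, then by single bits (one step each). */
static unsigned smart_shift_steps(unsigned count)
{
    return count / 8u + count % 8u;
}

/* 32-bit rotate-left synthesized from a 64-bit double-cell shift:
   place the cell in both halves, shift the double cell left, and keep
   the high cell, which now holds the rotated value. */
static uint32_t rotl32_via_double_shift(uint32_t x, unsigned n)
{
    uint64_t dbl = ((uint64_t)x << 32) | x;   /* x:x as a double cell */
    dbl <<= (n % 32u);
    return (uint32_t)(dbl >> 32);
}
```

Under this cost model, a 19-bit shift takes 2 byte steps plus 3 bit steps instead of 19 single-bit steps, and rotating 0x12345678 left by 8 yields 0x34567812.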

Although floating-point math is useful, and sometimes required, it is not heavily used in embedded applications. Rather than consume the chip area for a floating-point unit, the MPU supplies instructions that efficiently perform the most time-consuming aspects of basic IEEE floating-point math operations, in both single and double precision. The operations use the "smart" shifter to reduce the cycles required.

Byte read and write operations are available, but cycling through individual bytes is slow when scanning for byte values. These types of operations are made more efficient by instructions that operate on all of the bytes within a cell at once.
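This section does not list the specific PSC1000 instructions that operate on all bytes of a cell at once. The C sketch below only illustrates the general idea with a well-known portable technique (often called SWAR, "SIMD within a register"); it is not a model of the PSC1000 instructions themselves. A scan tests a whole cell for a byte value in a few operations, so individual bytes need to be examined only in the cell that matches. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Nonzero if any of the four bytes in v is 0x00. */
static int cell_has_zero_byte(uint32_t v)
{
    return ((v - 0x01010101u) & ~v & 0x80808080u) != 0;
}

/* Nonzero if any byte in v equals b: XOR turns matching bytes into 0. */
static int cell_has_byte(uint32_t v, uint8_t b)
{
    return cell_has_zero_byte(v ^ (0x01010101u * b));
}

/* Index of the first cell containing byte b, or n if none does. */
static size_t first_cell_with_byte(const uint32_t *cells, size_t n, uint8_t b)
{
    for (size_t i = 0; i < n; i++) {
        if (cell_has_byte(cells[i], b))
            return i;
    }
    return n;
}
```

Scanning a buffer this way costs one test per cell instead of four byte loads and compares, which is the kind of saving the paragraph above describes.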

## Address Space

The MPU fully supports a linear four-gigabyte address space for all program and data operations. I/O devices are selected by mapping them into memory addresses. By convention, the uppermost address bits select I/O device addresses decoded in external hardware. This convention leaves a contiguous linear program and data space of two gigabytes with a sparse address space above two gigabytes. It also allows simultaneous addressing of an I/O device and a memory address for I/O channel transfers. See Memory and Device Addressing, page 105.
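A minimal sketch of the addressing convention, assuming that bit 31 separates the two-gigabyte linear space (bit clear) from the sparse I/O region (bit set). Exactly which upper bits an external decoder uses for device selects, and how a channel transfer splits one address between device and memory, is system-specific and described under Memory and Device Addressing; the mask and function names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: addresses with bit 31 clear form the linear 2 GB
   program/data space; addresses with bit 31 set fall in the sparse
   region whose upper bits external hardware decodes as device selects. */
#define IO_REGION_BIT 0x80000000u

static int is_io_region(uint32_t addr)
{
    return (addr & IO_REGION_BIT) != 0;
}

/* Memory part of an address whose upper bits also select an I/O device
   (e.g. for an I/O channel transfer). The split point is an assumption. */
static uint32_t memory_part(uint32_t addr)
{
    return addr & ~IO_REGION_BIT;
}
```

Under these assumptions, 0x80001234 would select an I/O device through its upper bits while its lower bits still address memory location 0x1234.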


Figure 6. Byte Order
Several instructions or operations expect addresses aligned on four-byte (cell) boundaries. These addresses are referred to as cell-aligned. Only the upper 30 bits of the address are used to locate the data; the two least-significant address bits are ignored but appear externally. Within a cell, the high-order byte is located at the low byte address. The next lower-order byte is at the next higher address, and so on. For example, the value 0x12345678 would exist at byte addresses in memory, from low to high address, as 12 34 56 78. See Figure 6.
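The byte order and cell alignment rules above can be expressed in portable C, independent of the host machine's own endianness. The function names are illustrative; the code simply encodes the high-order-byte-at-low-address rule and the masking of the two least-significant address bits.

```c
#include <assert.h>
#include <stdint.h>

/* Store a cell so that the high-order byte lands at the lowest byte
   address, as described for the PSC1000. */
static void store_cell(uint8_t mem[4], uint32_t value)
{
    mem[0] = (uint8_t)(value >> 24);
    mem[1] = (uint8_t)(value >> 16);
    mem[2] = (uint8_t)(value >> 8);
    mem[3] = (uint8_t)value;
}

/* Load a cell stored in the same high-byte-first order. */
static uint32_t load_cell(const uint8_t mem[4])
{
    return ((uint32_t)mem[0] << 24) | ((uint32_t)mem[1] << 16) |
           ((uint32_t)mem[2] << 8) | (uint32_t)mem[3];
}

/* Cell-aligned accesses use only the upper 30 address bits. */
static uint32_t cell_address(uint32_t byte_addr)
{
    return byte_addr & ~3u;
}
```

Storing 0x12345678 places 0x12 at the lowest address and 0x78 at the highest, and any byte address from 0x1004 through 0x1007 selects the same cell at 0x1004.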

## Registers and Stacks

The register set contains 52 general-purpose registers, a mode/status register, two stack pointers, and 41 local address-mapped on-chip resource registers used for I/O, configuration, and status. See Figure 4, and Figure 24, page 129.

The operand stack contains eighteen registers and operates as a push-down stack, with direct access to the top three registers (s0-s2). These registers and the remaining registers (s3-s17) operate together as a stack cache. Arithmetic, logical, and data-movement operations, as well as intermediate result processing, are performed on the operand stack. Parameters are passed to procedures and results are returned from procedures on the stack, without the requirement of building a stack frame or necessarily moving data between other registers and the frame. As a true stack, registers are allocated only as required, resulting in efficient use of available storage. The external operand stack is addressed by register sa.

The local-register stack contains sixteen registers and operates as a push-down stack with direct access to the first fifteen registers (r0-r14). These registers and the remaining register (r15) operate together as a stack cache. As a stack, they are used to hold subroutine return addresses and automatically nest local-register data. The external local-register stack is addressed by register la.

Both cached stacks automatically spill to memory and refill from memory, and can be arbitrarily deep. Additionally, s0 and r0 can be used for memory access. See Stacks and Stack Caches on page 28.

The use of stack-cached operand and local registers improves performance by eliminating the overhead required to save and restore context (when compared to processors with only global registers available). This allows for very efficient interrupt and subroutine processing.

In addition to the stacks are sixteen global registers and three other registers. The global registers (g0-g15) are used for data storage, as operand storage for the MPU multiply and divide instructions (g0), and for the VPU. Since these registers are shared, the MPU and the VPU can also communicate through them. Remaining are mode, which contains mode and status bits; x, which is an index register (in addition to s0 and r0); and ct, which is a loop counter and also participates in floating-point operations.

## Programming Model

For those familiar with the Java Virtual Machine, American National Standard Forth (ANS Forth), Postscript, or Hewlett-Packard calculators that use postfix notation, commonly known as Reverse Polish Notation (RPN), programming the PSC1000 MPU is in many ways very familiar.

An MPU architecture can be classified by the number of operands specified within its instruction format. Typical 16-bit and 32-bit CISC and RISC MPUs are usually two- or three-operand architectures, whereas smaller microcontrollers are often one-operand architectures. In each instruction, two- and three-operand architectures specify a source and destination, or two sources and a destination, whereas one-operand architectures specify only one source and have an implicit destination, typically the accumulator. Architectures are also usually not pure. For example, one-operand architectures often have two-operand instructions to specify both a source and destination for data movement between registers.

The PSC1000 MPU is a zero-operand architecture, known as a stack computer. Operand sources and destinations are assumed to be on the top of the operand stack, which is also the accumulator. An operation such as add uses both source operands from the top of the operand stack, adds them, and returns the result to the top of the operand stack, thus causing a net reduction of one in the operand stack depth. See Figure 7.


Figure 7. add Execution Example
Most ALU operations behave similarly, using two source operands and returning one result operand to the operand stack. A few ALU operations use one source operand and return one result operand to the operand stack. Some ALU and other operations also require a non-stack register, and a very few do not use the operand stack at all.

Non-ALU operations are also similar. Loads (memory reads) use an address either on the operand stack or in a specified register, and place the retrieved data on the operand stack. Stores (memory writes) use an address either on the operand stack or in a register, and use data from the operand stack. Data movement operations push data from a register onto the operand stack, or pop data from the stack into a register.

Once data is on the operand stack, it can be used for any instruction that expects data there. The result of an add, for instance, can be left on the stack indefinitely, until used by a subsequent instruction. See Table 4. Instructions are also available to reorder the data in the top few cells of the operand stack so that prior results can be accessed when required. Data can also be removed from the operand stack and placed in local or global registers to minimize or eliminate later reordering of stack elements. Data can even be popped from the operand stack and restacked by pushing it onto the local-register stack.

Computations are usually most efficiently performed by executing the most deeply nested computations first, leaving the intermediate results on the operand stack, and then combining the intermediate results as the computation unnests. If the nesting of the computation is complex, or if the intermediate results are to be used some time later after other data will have been added to the operand stack, the intermediate results can be removed from the operand stack and stored in global or local registers.
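
The evaluation order described above can be sketched in C for (a + b) * (c - d), whose postfix form is a b + c d - *. This is a hypothetical model of the operand stack, not MPU code. It assumes the conventional postfix rule that the deeper cell is the minuend of a subtraction, because the operand order of sub is not given in this section.

```c
#include <stddef.h>

/* Evaluates (a + b) * (c - d) the way a zero-operand machine would:
   the innermost sum and difference are computed first and left on the
   stack, then combined as the computation unnests. */
long eval_nested(long a, long b, long c, long d)
{
    long stack[8];
    size_t sp = 0;                       /* number of cells on the stack */

    stack[sp++] = a;                     /* push a */
    stack[sp++] = b;                     /* push b */
    sp--; stack[sp - 1] += stack[sp];    /* add: a+b stays on the stack */

    stack[sp++] = c;                     /* push c */
    stack[sp++] = d;                     /* push d */
    sp--; stack[sp - 1] -= stack[sp];    /* sub: c-d sits above a+b */

    sp--; stack[sp - 1] *= stack[sp];    /* mul: combine intermediates */
    return stack[sp - 1];                /* exactly one cell remains */
}
```

The intermediate result a + b waits on the stack while c - d is computed, so no named temporary is needed.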

Global registers are used directly and maintain their data indefinitely. Local registers are registers within the local-register stack cache and, as a stack, must first be allocated. Allocation can be performed by popping data from the operand stack and pushing it onto the local-register stack one cell at a time. It can also be performed by allocating a block of uninitialized stack registers at one time; the uninitialized registers are then initialized by popping data, one cell at a time, into the registers in any order. The allocated local registers can be deallocated by popping data off the local-register stack and pushing it onto the operand stack one cell at a time, and then discarding from the operand stack the data that is not required. Alternatively, the allocated local registers can be deallocated by first saving any data required from the registers, and then deallocating a block of registers at one time. The method selected depends on the number of registers required and whether the data on the operand stack is in the required order.

Registers on both stacks are referenced relative to the tops of the stacks and are thus local in scope. For example, after one cell has been pushed onto the local-register stack, what was previously accessible as r0 is accessible as r1; the newly pushed value is accessible as r0.
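
The renaming effect of a push can be modeled in C as follows. This is a hypothetical sketch that ignores spilling, capacity limits, and lframe; reg_r(ls, n) stands in for reading register rn.

```c
#include <stdint.h>

/* Toy local-register stack: registers are always counted from the top,
   so one push renames every existing rn to r(n+1). No overflow checks. */
struct lstack {
    uint32_t cell[64];
    int top;                 /* number of cells currently on the stack */
};

void lpush(struct lstack *ls, uint32_t v) { ls->cell[ls->top++] = v; }

uint32_t lpop(struct lstack *ls) { return ls->cell[--ls->top]; }

/* Read register rn: r0 is the most recently pushed cell. */
uint32_t reg_r(const struct lstack *ls, int n)
{
    return ls->cell[ls->top - 1 - n];
}
```

After lpush(&ls, 111) and lpush(&ls, 222), reg_r(&ls, 0) is 222 and reg_r(&ls, 1) is 111. Popping restores 111 as r0.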

Parameters are passed to and returned from subroutines on the operand stack. An unlimited number of parameters can be passed and returned in this manner. An unlimited number of local-register allocations can also be made. Parameters and allocated local registers thus conveniently nest and unnest across subroutines and program basic blocks.

Subroutine return addresses are pushed onto the local-register stack and thus appear as r0 on entry to the subroutine, with the previous r0 accessible as r1, and so on. As data is pushed onto the stacks and the available register space fills, registers are spilled to memory when required. Similarly, as data is removed from the stacks and the register space empties, the registers are refilled from memory as required. Thus, from the program's perspective, the stack registers are always available.

## Instruction Set Overview

Table 5 lists the MPU instructions; Table 39 (pages 84-85) and Table 40 (pages 86-87) list the mnemonics and opcodes. All instructions consist of eight bits, except for those that require immediate data. This allows up to four instructions (an instruction group) to be obtained on each instruction fetch, thus reducing memory-bandwidth requirements compared to typical RISC machines with 32-bit instructions. This characteristic also allows looping on an instruction group (a micro-loop) without additional instruction fetches from memory, further increasing efficiency. Instruction formats are depicted in Figure 8.
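
To make the instruction-group idea concrete, this C sketch extracts the four 8-bit opcodes from one fetched 32-bit cell. The slot ordering is an assumption: slot 0 is taken to be the most-significant byte and the first to execute, which is consistent with byte literals and branch offsets occupying the bytes to the right of an opcode.

```c
#include <stdint.h>

/* Returns the 8-bit opcode in slot 0..3 of a 32-bit instruction group.
   Assumption: slot 0 (first to execute) is the most-significant byte. */
uint8_t group_slot(uint32_t group, unsigned slot)
{
    return (uint8_t)(group >> (24u - 8u * slot));
}
```

One 32-bit fetch therefore yields up to four instructions, whereas a machine with fixed 32-bit instructions would need four fetches for the same work.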

Table 5. MPU Instruction Set

| ARITHMETIC/SHIFT | CONTROL TRANSFER | LOGICAL |
| :---: | :---: | :---: |
| ADD | BRANCH | AND |
| ADD with carry | BRANCH ON ZERO | OR |
| ADD ADDRESS | BRANCH INDIRECT | XOR |
| SUBTRACT | CALL | NOT AND |
| SUBTRACT with borrow | CALL INDIRECT | TEST BYTES |
| INCREMENT | DECREMENT AND BRANCH | EQUAL ZERO |
| DECREMENT | SKIP |  |
| NEGATE | SKIP ON CONDITION | DEBUGGING |
| SIGN EXTEND BYTE | MICRO-LOOP | STEP |
| COMPARE | MICRO-LOOP ON CONDITION | BREAKPOINT |
| MAXIMUM | RETURN |  |
| MULTIPLY SIGNED | RETURN FROM INTERRUPT | DATA MANAGEMENT |
| MULTIPLY UNSIGNED |  | LOAD |
| FAST MULTIPLY SIGNED | FLOATING POINT | STORE |
| DIVIDE UNSIGNED | TEST EXPONENT | STORE INDIRECT, pre-dec/post-inc |
| SHIFT LEFT/RIGHT | EXTRACT EXPONENT | PUSH REGISTER/STACK |
| DOUBLE SHIFT LEFT/RIGHT | EXTRACT SIGNIFICAND | POP REGISTER/STACK |
| INVERT CARRY | REPLACE EXPONENT | EXCHANGE |
|  | DENORMALIZE | REVOLVE |
| MISCELLANEOUS | NORMALIZE RIGHT/LEFT | SPLIT |
| CACHE CONTROL | EXPONENT DIFFERENCE | REPLACE BYTE |
| FRAME CONTROL | ADD EXPONENTS | PUSH LITERAL |
| STACK DEPTH | SUBTRACT EXPONENTS | STORE ON-CHIP RESOURCE |
| NO OPERATION | ROUND | LOAD ON-CHIP RESOURCE |

Table 6. ALU Instructions

| add | add pc | adda | addc |
| :--- | :--- | :--- | :--- |
| and | cmp | dec \#1 | dec \#4 |
| dec ct, \#1 | divu | eqz | iand |
| inc \#1 | inc \#4 | mulfs | muls |
| mulu | mxm | neg | notc |
| or | sexb | shift | shiftd |
| shl \#1 | shl \#8 | shr \#1 | shr \#8 |
| shld \#1 | shrd \#1 | sub | subb |
| testb | xor |  |  |

Table 7. Code Examples: Rotate


## ALU Operations

Almost all ALU operations occur on the top of the operand stack in s0 and, if required, s1. A few operations also use g0, ct, or pc.

Only one ALU status bit, carry, is maintained, and it is stored in mode. Since there are no other ALU status bits, all other conditional operations are performed by testing s0 on the fly. eqz is used to reverse the zero/non-zero state of s0. Most arithmetic operations modify carry from the result produced out of bit 31 of s0. The instruction add pc is available to perform pc-relative data references. adda is available to perform address arithmetic without changing carry. Other operations modify carry as part of the result of the operation.
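
Because carry is the only status bit, multi-cell arithmetic chains through it. The C sketch below models this with a hypothetical alu struct: alu_add mimics add (setting carry from bit 31) and alu_addc mimics addc (adding the incoming carry). The PSC1000 operand order and other flag details are not modeled.

```c
#include <stdint.h>

/* Single-status-bit ALU model: carry is the bit out of bit 31. */
struct alu { uint32_t carry; };

/* Like add: returns a + b and records the carry out of bit 31. */
uint32_t alu_add(struct alu *m, uint32_t a, uint32_t b)
{
    uint32_t sum = a + b;
    m->carry = sum < a;              /* unsigned wraparound means carry out */
    return sum;
}

/* Like addc: returns a + b + carry and records the new carry. */
uint32_t alu_addc(struct alu *m, uint32_t a, uint32_t b)
{
    uint64_t wide = (uint64_t)a + b + m->carry;
    m->carry = (uint32_t)(wide >> 32);
    return (uint32_t)wide;
}

/* 64-bit addition from a low-cell add followed by a high-cell addc. */
uint64_t add64(uint64_t x, uint64_t y)
{
    struct alu m = { 0 };
    uint32_t lo = alu_add(&m, (uint32_t)x, (uint32_t)y);
    uint32_t hi = alu_addc(&m, (uint32_t)(x >> 32), (uint32_t)(y >> 32));
    return ((uint64_t)hi << 32) | lo;
}
```
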

s0 and s1 can be used together for double-cell shifts, with s0 containing the more-significant cell and s1 the less-significant cell of the 64-bit value. Both single-cell and double-cell shifts transfer a bit between carry and bit 31 of s0. Code depicting single-cell rotates constructed from the double-cell shift is given in Table 7.
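
As an illustration of the general technique (not a reproduction of Table 7), the following C sketch builds a single-cell rotate by placing the same value in both halves of a 64-bit pair, standing in for s0:s1, and double-shifting it. The function name is hypothetical, and the sketch ignores the transfer of a bit through carry that the real shifts perform.

```c
#include <stdint.h>

/* Rotate left via a double-cell shift: load v into both the high cell
   (s0) and the low cell (s1), shift the pair, keep the high cell.
   n is taken modulo 32. */
uint32_t rotl_via_double_shift(uint32_t v, unsigned n)
{
    uint64_t pair = ((uint64_t)v << 32) | v;   /* s0:s1 = v:v */
    pair <<= (n & 31u);                        /* double-cell shift left */
    return (uint32_t)(pair >> 32);             /* result is the high cell */
}
```
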

All ALU instruction opcodes are formatted as 8-bit values with no encoded fields.

Table 8. Branch, Loop, and Skip Instructions

| br | br [] | bz | call |
| :--- | :--- | :--- | :--- |
| call [] | dbr | mloop | mloopc |
| mloopn | mloopnc | mloopnn | mloopnz |
| mloopz | ret | reti | skip |
| skipc | skipn | skipnc | skipnn |
| skipnz | skipz |  |  |

Branches, Skips, and Loops
The instructions br, bz, call, and dbr are variable-length. The three least-significant bits in the opcode and all of the bits in the current instruction group to the right of the opcode are used for the relative branch offset. See Figure 8 and Table 9. Branch destination addresses are cell-aligned to maximize the range of the offset and the number of instructions that are executed at the destination. If an offset is not of sufficient size for the branch to reach the destination, the branch must be moved to an instruction group where more offset bits are available, or a register-indirect branch, br [] or call [], can be used. Register-indirect branches use an absolute byte-aligned address from s0. The instruction add pc can be used if a computed pc-relative branch is required.

The mloop_ instructions are referred to as micro-loops. If specified, a condition is tested, and then ct is decremented. If a termination condition is not met, execution continues at the beginning of the current instruction group. Micro-loops are used to re-execute short instruction sequences without re-fetching the instructions from memory. See Table 14.

Table 9. MPU Branch Ranges

| Offset Bits | Offset Range in Bytes |
| :---: | :--- |
| 3 | $-16 /+12$ |
| 11 | $-4096 /+4092$ |
| 19 | $-1048576 /+1048572$ |
| 27 | $-268435456 /+268435452$ |

## Note:

Encoded offset is in cells. Offset is added to the address of the beginning of the cell containing the branch to compute the destination address.
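
The offset arithmetic in Table 9 and the note above can be checked with a short C sketch. It assumes the encoded field is a two's-complement count of cells, which reproduces the ranges in Table 9 (for example, a 3-bit field spans -4 to +3 cells, or -16 to +12 bytes). The function names are hypothetical.

```c
#include <stdint.h>

/* Sign-extends an offset_bits-wide field (3, 11, 19 or 27 bits). */
int32_t sign_extend(uint32_t field, unsigned offset_bits)
{
    uint32_t sign = 1u << (offset_bits - 1);
    field &= (sign << 1) - 1;                 /* keep only the offset bits */
    return (int32_t)(field ^ sign) - (int32_t)sign;
}

/* Destination = start of the cell holding the branch + offset in cells. */
uint32_t branch_target(uint32_t branch_addr, uint32_t field, unsigned offset_bits)
{
    uint32_t cell_base = branch_addr & ~3u;
    return cell_base + (uint32_t)(sign_extend(field, offset_bits) * 4);
}
```
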

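Branches

| 1st byte | 2nd byte | 3rd byte | 4th byte | Offset field |
| :---: | :---: | :---: | :---: | :---: |
| opcode | opcode | opcode | branch | 3-bit offset |
| opcode | opcode | branch | offset | 11-bit offset |
| opcode | branch | offset | offset | 19-bit offset |
| branch | offset | offset | offset | 27-bit offset |

Literals

push.n data occupies the four least-significant bits of its opcode; push.b data occupies the right-most byte of the instruction group; push.l data occupies a cell following the instruction group.

All Others

| 1st byte | 2nd byte | 3rd byte | 4th byte |
| :---: | :---: | :---: | :---: |
| opcode | opcode | opcode | opcode |

Figure 8. MPU Instruction Formats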

Other than branching on zero with bz, conditional branching is performed with the skip_ instructions. They terminate execution of the current instruction group and continue execution at the beginning of the next instruction group. They can be combined with br, call, dbr, and ret (or other instructions) to create additional flow-of-control operations.

Table 10. Literal Instructions

| push.b | push.l | push.n |
| :--- | :--- | :--- |

## Literals

To maximize opcode bandwidth, three sizes of literals are available. The data for four-bit (nibble) literals, with a range of -7 to +8, is encoded in the four least-significant bits of the opcode; the numbers are encoded as two's-complement values, with the value 1000 binary decoded as +8. The data for eight-bit (byte) literals, with a range of 0 to 255, is located in the right-most byte of the instruction group, regardless of the position of the opcode within the instruction group. The data for 32-bit (long, or cell) literals is located in a cell following the instruction group in the instruction stream. Multiple push.l instructions in the same instruction group access consecutive cells immediately following the instruction group. See Figure 8.
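
The nibble-literal encoding can be sketched in C as follows. decode_nibble_literal follows the rule stated above (1000 binary reads as +8); smallest_literal is a hypothetical assembler helper, not part of any documented toolchain, that picks the shortest literal form for a value.

```c
#include <stdint.h>

/* Decodes the 4-bit push.n field: two's complement, except that 1000b
   reads as +8, giving the range -7..+8. */
int32_t decode_nibble_literal(uint8_t opcode)
{
    int32_t n = opcode & 0xF;
    if (n == 8)
        return 8;
    return n < 8 ? n : n - 16;
}

/* Picks the smallest literal form that can hold v:
   'n' (push.n, -7..+8), 'b' (push.b, 0..255) or 'l' (push.l, full cell). */
char smallest_literal(int32_t v)
{
    if (v >= -7 && v <= 8)
        return 'n';
    if (v >= 0 && v <= 255)
        return 'b';
    return 'l';
}
```

Under these rules, -8 fits neither push.n nor the push.b range, so the helper selects push.l for it.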

Table 11. Data Movement Instructions

| pop ct <br> push ct <br> push x | pop gi | pop ri gi | push ri |
| :--- | :--- | :--- | :--- |

## Data Movement

Register data is moved by first pushing the register onto the operand stack, and then popping it into the destination register. Memory data is moved similarly. See Loads and Stores.

The opcodes for the data-movement instructions that access gi and ri are 8-bit values with the register number encoded in the four least-significant bits. All other data-movement instruction opcodes are formatted as 8-bit values with no encoded fields.

Table 12. Load and Store Instructions

| ld [--r0] | ld [--x] | ld [r0++] | ld [r0] |
| :--- | :--- | :--- | :--- |
| ld [x++] | ld [x] | ld [] | ld.b [] |
| st [--r0] | st [--x] | st [r0++] | st [r0] |
| st [x++] | st [x] | st [] | replb |

## Loads and Stores

r0 and x support register-indirect addressing and also register-indirect addressing with predecrement by four or postincrement by four. These modes allow for efficient memory reference operations. Code depicting memory move and fill operations is given in Table 14.

Register-indirect addressing can also be performed with the address in s0. Other addressing modes can be implemented using adda. Table 13 depicts the code for a complex memory reference operation.

The memory accesses depicted in these examples are cell-aligned, with the two least-significant bits of the memory addresses ignored. Memory can also be read at byte addresses with ld.b [] and written at byte addresses using x and replb. See Byte Operations.

Table 13. Code Example: Complex Addressing Mode

```
; addc [g0+g2+20],#8,[g0-g3-4]
    push g0
    push g2
    adda
    push.b #20
    adda
    ld []
    push.n #8
    addc
    push g0
    push g3
    neg
    adda
    dec #4
    st []
; The carry into and out of addc is maintained.
```

Table 14. Code Examples: Memory Move and Fill

```
; Memory Move
; ( cell_source cell_dest cell_count -- )
move_cells::
    pop     ct              ; count
    pop     x               ; dest
    pop     lstack          ; source to r0
move_cell_loop::
    ld      [r0++]
    st      [x++]
    mloop   move_cell_loop
    push    lstack
    pop                     ; discard source

; Memory Fill
; ( cell_dest cell_count cell_value -- )
fill_cells::
    xcg
    pop     ct              ; count
    xcg
    pop     x               ; dest
fill_cells_loop::
    push                    ; keep fill value
    st      [x++]
    mloop   fill_cells_loop
    pop                     ; discard fill value
```

The MPU contains a one-level posted write. This allows the MPU to continue executing while the posted write is in progress and can significantly reduce execution time. Memory coherency is maintained by giving the posted write priority bus access over other MPU bus requests; thus, writes are not indefinitely deferred. In the code examples in Table 14, the loop execution overhead is zero when using posted writes. Posted writes are enabled by setting mspwe.

All load and store instruction opcodes are formatted as 8-bit values with no encoded fields.

Table 15. Stack Data Management Instructions

| lframe | pop | pop lstack | push |
| :--- | :--- | :--- | :--- |
| push lstack | rev | sframe | xcg |

Stack Data Management
Operand stack data is used from the top of the stack and is generally consumed when processed. This can require the use of instructions to duplicate, discard, or reorder the stack data. Data can also be moved to the local-register stack to place it temporarily out of the way, to reverse its stack access order, or to place it in a local register for direct access. See the code examples in Table 14.
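
The three most basic reorderings can be modeled in C on a toy operand stack. The mapping is an assumption drawn from the code examples: a bare push duplicates s0 ("keep fill value"), a bare pop discards s0, and xcg exchanges s0 and s1. Overflow and underflow are not checked, and the function names are hypothetical.

```c
#include <stdint.h>

/* Toy operand stack; sreg(st, 0) is s0 (the top). No bounds checks. */
struct ostack { uint32_t cell[32]; int depth; };

uint32_t sreg(const struct ostack *st, int n) { return st->cell[st->depth - 1 - n]; }

/* Push a value (stands in for a literal or register push). */
void lit(struct ostack *st, uint32_t v) { st->cell[st->depth++] = v; }

/* Bare push: duplicate s0. */
void dup_s0(struct ostack *st) { lit(st, sreg(st, 0)); }

/* Bare pop: discard s0. */
void drop_s0(struct ostack *st) { st->depth--; }

/* xcg: exchange s0 and s1. */
void xcg(struct ostack *st)
{
    uint32_t t = st->cell[st->depth - 1];
    st->cell[st->depth - 1] = st->cell[st->depth - 2];
    st->cell[st->depth - 2] = t;
}
```

The prologue of fill_cells in Table 14 follows this pattern: the first xcg brings the count to s0 so that pop ct can take it, and the second xcg brings the destination up for pop x.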

If more than a few stack data management instructions are required to access a given operand stack cell, performance usually improves by placing data in a local or global register. However, there is a finite supply of global registers, and local registers, at some point, spill to memory. Data should be maintained on the operand stack only while it is efficient to do so. In general, if the program requires frequent access to data in the operand stack deeper than s2, that data, or other more accessible data, should be placed in directly addressable registers to simplify access.

To use the local-register stack, data can be popped from the operand stack and pushed onto the local-register stack, or data can be popped from the local-register stack and pushed onto the operand stack. This mechanism is convenient for moving a few cells when the resulting operand stack order is acceptable. When moving more data, or when the data order on the operand stack is not as desired, lframe can be used to allocate or deallocate the required local registers, and then the registers can be written and read directly. Using lframe also has the advantage of making the required local-register stack space available by spilling the stack as a continuous sequence of bus transactions, which minimizes the number of RAS cycles required when writing to DRAM. The instruction sframe behaves similarly to lframe and is primarily used to discard a number of cells from the operand stack.

All stack data management instruction opcodes are formatted as 8-bit values with no encoded fields.

Table 16. Stack Cache Management Instructions

| lcache | ldepth | pop la | pop sa |
| :--- | :--- | :--- | :--- |
| push la | push sa | scache | sdepth |

Stack Cache Management
Other than initialization, and possibly monitoring of overflow and underflow via the related traps, the stack caches do not require active management. Several instructions exist to efficiently manipulate the caches for context switching, status checking, and spill and refill scheduling.

The _depth instructions can be used to determine the number of cells in the SRAM part of the stack caches. This value can be used to discard the values currently in the cache, to later restore the cache depth with _cache, or to compute the total on-chip and external stack depth.

The _cache instructions can be used to ensure either that data is in the cache or that space for data exists in the cache, so that spills and refills occur at preferred times. This allows more control over the caching process and thus a greater degree of determinism during program execution. Scheduling stack spills and refills in this way can also improve performance by minimizing the RAS cycles required due to stack memory accesses.

The _frame instructions can be used to allocate a block of uninitialized register space at the top of the SRAM part of a stack, or to discard such a block of register space when no longer required. They, like the _cache instructions, can be used to group stack spills and refills to improve performance by minimizing the RAS cycles required due to stack memory accesses.

See Stacks and Stack Caches on page 28 for more information.

All stack cache management instruction opcodes are formatted as 8-bit values with no encoded fields.

Table 17. Byte Operation Instructions

```
ld.b [] replb copyb shl #8
shr #8 testb
```

Byte Operations
Bytes can be addressed and read from memory directly and can be addressed and written to memory with the code depicted in Table 18.

Instructions are available for manipulating bytes within cells. A byte can be replicated across a cell, the bytes within a cell can be tested for zero, and a cell can be shifted left or right by one byte. Code examples depicting scanning for a specified byte, scanning for a null byte, and moving a null-terminated string in cell-sized units are given in Table 21, Table 19, and Table 20, respectively.
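
The byte-within-cell operations can be approximated in portable C. This is a sketch under assumptions: replicate_byte models copyb as copying the low byte into all four byte positions, and has_zero_byte models testb as a yes/no "any byte is zero" test using a well-known bit trick. How testb actually reports its result is not described in this section. cell_contains_byte shows how the two combine to scan a cell for a specified byte.

```c
#include <stdint.h>
#include <stdbool.h>

/* copyb-style replication: the low byte copied into all four bytes. */
uint32_t replicate_byte(uint32_t v)
{
    return (v & 0xFFu) * 0x01010101u;
}

/* testb-style check: true if any of the four bytes is zero.
   (x - 0x01010101) & ~x & 0x80808080 is non-zero exactly when some
   byte of x is zero. */
bool has_zero_byte(uint32_t x)
{
    return ((x - 0x01010101u) & ~x & 0x80808080u) != 0;
}

/* XOR with the replicated byte turns every matching byte into zero,
   so a zero-byte test finds the specified byte. */
bool cell_contains_byte(uint32_t cell, uint8_t b)
{
    return has_zero_byte(cell ^ replicate_byte(b));
}
```
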

All byte operation instruction opcodes are formatted as 8-bit values with no encoded fields.

Table 18. Code Example: Byte Store

```
; Byte store
; ( byte byte_addr -- )
byte_store::
    pop     x               ; address
    ld      [x]             ; get data
    replb                   ; insert byte
    st      [x]
```
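
The read-modify-write in Table 18 can be modeled in C on a word-addressed array. The byte numbering is an assumption (byte offset 0 within a cell is taken as its most-significant byte), and byte_store and its parameters are hypothetical names.

```c
#include <stdint.h>

/* Read the containing cell, replace one byte lane, write the cell back.
   Assumption: big-endian byte numbering within a cell. */
void byte_store(uint32_t *mem, uint32_t byte_addr, uint8_t value)
{
    uint32_t *cell = &mem[byte_addr >> 2];         /* cell-aligned access */
    unsigned shift = 24u - 8u * (byte_addr & 3u);  /* lane of this byte */
    uint32_t data = *cell;                         /* ld [x] */
    data = (data & ~(0xFFu << shift)) | ((uint32_t)value << shift);
    *cell = data;                                  /* st [x] */
}
```
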

Table 19. Code Example: Null Character Search

Table 20. Code Example: Null-Terminated String Move

```
; Move cell-aligned null-terminated string
; ( cell_source cell_dest -- )
null_move::
    pop     x               ; destination
    pop     lstack          ; source
    push.n  #0
    pop     ct              ; a very long loop
null_move_loop::
    ld      [r0++]
    testb                   ; check for zero
    st      [x++]
    mloopn  null_move_loop
    push    lstack
    pop
    ...
```

Table 21. Code Example: Byte Search


Table 22. Floating-Point Math Instructions

| addexp | denorm | expdif | extexp |
| :--- | :--- | :--- | :--- |
| extsig | norml | normr | replexp |
| rnd | subexp | testexp |  |

Floating-Point Math
The instructions above are used to implement efficient single- and double-precision IEEE floating-point software for basic math functions (+, -, *, /), and to aid in the development of floating-point library routines. The instructions primarily perform the normalization, denormalization, exponent arithmetic, rounding, and detection of exceptional numbers and conditions that are otherwise execution-time-intensive when programmed conventionally. See Floating-Point Math Support on page 33.

All floating-point math instruction opcodes are formatted as 8-bit values with no encoded fields.

Table 23. Debugging Instructions

| bkpt | step |
| :--- | :--- |

## Debugging Features

Each of these instructions signals an exception and traps to an application-supplied execution-monitoring program to assist in the debugging of programs. See Debugging Support on page 36.

Both debugging instruction opcodes are formatted as 8-bit values with no encoded fields.

Table 24. On-Chip Resources Instructions

Table 25. Miscellaneous Instructions

| di | ei | nop |
| :--- | :--- | :--- |
| push mode | split |  |

Miscellaneous
The disable- and enable-interrupt instructions (di and ei) are the only system control instructions; they are supplied to make interrupt processing more efficient. Other system control functions are performed by setting or clearing bits in mode or in an on-chip resource register. The instruction split separates a 32-bit value into two cells, each containing 16 bits of the original value.

All miscellaneous instruction opcodes are formatted as 8-bit values with no encoded fields.

## Stacks and Stack Caches

The stack caches optimize use of the stack register resources by minimizing the overhead required for the allocation and saving of registers during programmed or exceptional context switches (such as subroutine calls and trap or interrupt servicing).


Figure 9. Stack Exception Regions

The local-register stack consists of an on-chip SRAM array that is addressed to behave as a conventional last-in, first-out stack. Local registers r0-r15 are addressed internally relative to the current top of stack. The registers r0-r14 are individually addressable and are always contiguously allocated and filled. If a register is accessed that is not in the cache, all the lower-ordinal registers are read in to ensure a contiguous data set.

The operand stack is constructed similarly, with the addition of two registers in front of the SRAM stack cache array to supply inputs to the ALU. These registers are designated s0 and s1, and the SRAM array is designated s2-s17. Only registers s0, s1, and s2 are individually addressable, but otherwise the operand stack behaves similarly to the local-register stack. Whereas the SRAM array, s2-s17, can become "empty" (see below), s0 and s1 are always considered to contain data.

The stack caches are designed to always allow the current operation to execute to completion before an implicit stack memory operation is required to occur. No instruction explicitly pushes or explicitly pops more than one cell from either stack (except for stack management instructions). Thus, to allow execution to completion, the stack cache logic ensures that there are always one or more cells full and one or more cells empty in each stack cache (except immediately after reset; see below) before instruction execution. If, after the execution of an instruction, this is not the case on either stack, the corresponding stack cache is automatically spilled to memory or refilled from memory to reach this condition before the next instruction is allowed to execute. Similarly, the instructions _cache, _frame, pop sa, and pop la, which explicitly change the stack cache depth, execute to completion and then ensure the above conditions exist.

Thus r15 or s17 can be filled by the execution of an instruction, but they are spilled before the next instruction executes. Similarly, r0 and s2 can be emptied by the execution of an instruction, but they are filled before the next instruction executes.

The stacks can be arbitrarily deep. When a stack spills, data is written at the address in the stack pointer, and then the stack pointer is decremented by four (postdecremented stack pointer). Conversely, when a stack refills, the stack pointer is incremented by four, and then data is read from memory (preincremented stack pointer). The stack pointer thus points to the next location to write, and the stacks grow from higher to lower memory addresses. The stack pointer for the operand stack is sa, and the stack pointer for the local-register stack is la.
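
The spill and refill rules above can be modeled in C. This sketch uses a 16-cell cache (matching r0-r15 or s2-s17), shifts cells for clarity rather than using a circular buffer, and omits s0/s1 and the stack-page checks. settle() plays the role of the logic that runs between instructions to keep at least one cell full and one cell empty; all names are hypothetical.

```c
#include <stdint.h>

#define CACHE_CELLS 16   /* e.g. r0-r15, or s2-s17 */

/* cells[0] is the deepest cached cell; cells[count-1] is the top.
   mem is word-indexed by address / 4. */
struct stack_cache {
    uint32_t cells[CACHE_CELLS];
    int count;
    uint32_t sp;
    uint32_t *mem;
};

/* Spill the deepest cached cell: write at sp, then sp -= 4. */
static void spill_one(struct stack_cache *c)
{
    c->mem[c->sp / 4] = c->cells[0];
    c->sp -= 4;
    for (int i = 1; i < c->count; i++)
        c->cells[i - 1] = c->cells[i];
    c->count--;
}

/* Refill one cell below the deepest cached cell: sp += 4, then read. */
static void refill_one(struct stack_cache *c)
{
    c->sp += 4;
    for (int i = c->count; i > 0; i--)
        c->cells[i] = c->cells[i - 1];
    c->cells[0] = c->mem[c->sp / 4];
    c->count++;
}

/* Between instructions: restore "at least one full, at least one empty". */
void settle(struct stack_cache *c)
{
    while (c->count >= CACHE_CELLS)
        spill_one(c);
    while (c->count <= 0)
        refill_one(c);
}
```

The refill rule is consistent with the comments in Table 26: loading sa with os_base-8 leaves the cache needing a cell, so the refill increments the pointer to os_base-4 and reads that cell.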

Since the stacks are dynamically allocated memory areas, some amount of planning or management is required to ensure the memory areas do not overflow or underflow. The simplest is to allocate a sufficiently large memory area so that overflow conditions won't occur. In this case, a correctly written program does not produce underflow. Alternatively, stack memory can be dynamically allocated or monitored through the use of stack-page exceptions.

## Stack-Page Exceptions

Stack-page exceptions occur on any stack-cache memory access near the boundary of any 1024-byte memory page to allow overflow and underflow protection and stack memory management. To prevent thrashing stack-page exceptions near the margins of the page boundary areas, once a boundary area is accessed and the corresponding stack-page exception is signaled, the stack pointer must move to the middle region of the stack page before another stack-page exception can be signaled. See Figure 9.

Stack-page exceptions enable stack memory to be managed by allowing stack memory pages to be reallocated or relocated when the edges of the current stack page are approached. The boundary regions of the stack pages are located 32 cells from the ends of each page to allow even a _cache or _frame instruction to execute to completion and to allow the corresponding stack cache to be emptied to memory. Using the stack-page exceptions requires that only 2 KB of addressable memory be allotted to each stack at any given time: the current stack page and the page near the most recently encroached boundary.
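
A simplified C model of the region check and its hysteresis follows. The 1024-byte page and the 32-cell (128-byte) boundary regions come from the text; the names, and the choice to treat both boundary regions alike without distinguishing overflow from underflow, are simplifying assumptions.

```c
#include <stdint.h>
#include <stdbool.h>

#define PAGE_BYTES     1024u
#define BOUNDARY_BYTES (32u * 4u)    /* 32 cells from each end of a page */

enum region { REGION_LOW, REGION_MIDDLE, REGION_HIGH };

enum region page_region(uint32_t addr)
{
    uint32_t off = addr % PAGE_BYTES;
    if (off < BOUNDARY_BYTES)
        return REGION_LOW;
    if (off >= PAGE_BYTES - BOUNDARY_BYTES)
        return REGION_HIGH;
    return REGION_MIDDLE;
}

/* After an exception is signaled, another is allowed only once the
   stack pointer has returned to the middle region of a page. */
struct page_monitor { bool armed; };

/* Returns true if this stack-cache memory access signals an exception. */
bool stack_access(struct page_monitor *m, uint32_t addr)
{
    if (page_region(addr) == REGION_MIDDLE) {
        m->armed = true;
        return false;
    }
    if (m->armed) {
        m->armed = false;
        return true;
    }
    return false;
}
```
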

Each stack supports stack-page overflow and stack-page underflow exceptions. These exception conditions are tested against the memory address that is accessed when the corresponding stack spills or refills between the execution of instructions. mode contains bits that signal local-stack overflow, local-stack underflow, operand stack overflow, and operand stack underflow, as well as the corresponding trap enable bits.

The stack-page exceptions have the highest priority of all of the traps. As this implies, it is important to consider carefully the stack effects of the stack trap handler code so that stack-page boundaries are not violated during its execution.

Table 26. Code Example: Stack Initialization

```
init_stacks::
; Create a stack area below xx_base in
; memory. One cell is read in to initialize s2/r0.
    push.l  #os_base-8
    pop     sa              ; read os_base-4
    ; s0 and s1 are uninitialized
    push.l  #ls_base-8      ; allow dead zone
    pop     la              ; read ls_base-4
```


## Stack Initialization

After CPU reset, both of the MPU stacks should be considered uninitialized until the corresponding stack pointers are loaded, and this should be one of the first operations performed by the MPU.

After a reset, the stacks are abnormally empty. That is, r0 and s2 have not been allocated and are allocated on the first push operation to, or stack pointer initialization of, the corresponding stack. However, popping the pushed cell causes that stack to be empty and require a refill. The first pushed cell should therefore be left on that stack, or the corresponding stack pointer should be initialized, before the stack is used further. See Table 26.

Stack Depth
The total number of cells on each stack can readily be determined by adding the number of cells that have spilled to memory and the number of cells in the on-chip caches. See Table 27.
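
The computation can be sketched in C as below. It relies on the spill rule given earlier (each spill moves the stack pointer down by four), takes the on-chip count as an input (for example, a value obtained with sdepth or ldepth), and assumes the program recorded empty_sp, the pointer value at which nothing had spilled. Whether to also count s0, s1, or other always-resident registers is left to the caller. The names are hypothetical.

```c
#include <stdint.h>

/* Total stack depth in cells = cells spilled to memory + cells on chip.
   empty_sp: stack-pointer value when no cells had spilled (recorded by
   the program); sp: current sa or la; cached_cells: on-chip count. */
uint32_t total_depth(uint32_t empty_sp, uint32_t sp, uint32_t cached_cells)
{
    uint32_t spilled = (empty_sp - sp) / 4u;   /* 4 bytes per spilled cell */
    return spilled + cached_cells;
}
```
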

Table 27. Code Example: Stack Depth


Table 28. Code Example: Save Context

```
; Context switch: save context
; Save off any globals required and flush stacks
save_context::
; Save globals as required
    push    g15
    push    g14
    ...                     ; save a
; Flush stacks to memory
; add one cell to local-register stack so on-chip
; part can spill.
    push.b  #14             ; count for _cache
    pop     lstack
    push    r0              ; count for lcache
; ensure no interrupts between flush and la read
    .quad   2
    lcache                  ; write out spillable area
    push    la              ; save pointer
; add three cells to stack so on-chip part can spill
    push
    push
    push    r0              ; count for scache
; ensure no interrupts between flush and sa read
    .quad   2
    scache                  ; write out all of spillable area
    push    sa
    push.l  #sp_save_area
    st      []              ; save off stack pointer
; Now load new context and continue
```

Stack Flush and Restore
When performing a context switch, it is necessary to spill the data in the stack caches to memory so that the stack caches can be reloaded for the new context. Attention must be given to ensure that the parts of the stack caches that are always maintained on-chip, r0 and s0-s2, are forced into the spillable area of the stack caches so that they can be written to memory. Code examples for context switches that include flushing and restoring the caches are given in Table 28 and Table 29, respectively.

Table 29. Code Example: Restore Context


Table 30. Traps Dependent on System State

| Stack Depth Change |  | Traps |
| :---: | :---: | :---: |
| Operand Stack | Local-Register Stack |  |
| +n | 0 | Operand Stack Overflow |
| -n | 0 | Operand Stack Underflow |
| 0 | +1 | Local Stack Overflow |
| 0 | -1 | Local Stack Underflow |
| +1 | -n | Local Stack Underflow <br> Operand Stack Overflow <br> Local Stack Underflow and Operand Stack Overflow |
| -1 | +n | Local Stack Overflow <br> Operand Stack Underflow <br> Local Stack Overflow and Operand Stack Underflow |
| -1 | -n | Local Stack Underflow <br> Operand Stack Underflow <br> Local Stack Underflow and Operand Stack Underflow |
| Notes: <br> 1. $+\mathrm{n}>0,-\mathrm{n}<0$ <br> 2. If the instruction reads or writes memory or if a posted write is in progress, a memory fault can also occur. <br> 3. If the instruction is single-stepped, a single-step trap also occurs. <br> 4. If any trap occurs, a local-register stack overflow could also occur. |  |  |

## Exceptions and Trapping

Exception handling is precise and is managed by trapping to executable-code vectors in low memory. Each 32-bit vector location can contain up to four instructions. This allows servicing the trap within those four instructions or by branching to a longer trap routine. Traps are prioritized and nested to ensure proper handling. The trap names and executable vector locations are shown in Figure 5.

An exception is said to be signaled when the defined conditions exist to cause the exception. If the trap is enabled, the trap is then processed. Traps are processed by the trap logic, which causes a subroutine call to the associated executable-code-vector address. When multiple traps occur concurrently, the lowest-priority trap is processed first, but before its executable-code vector is executed, the next-higher-priority trap is processed, and so on, until the highest-priority trap is processed. The highest-priority trap's executable-code vector then executes. The nested executable-code-vector return addresses unnest as each trap handler executes ret, thus producing the prioritized trap executions.
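
The nesting can be modeled in C: the vectors are called from the lowest priority upward, and because the return addresses form a stack, the handlers run in the reverse order. Priorities use the numbering of Table 31 (1 highest, 9 lowest); handler_order is a hypothetical illustration only.

```c
#include <stddef.h>

/* Given the Table 31 priorities of concurrently signaled traps, writes
   the order in which their handlers actually execute into `order` and
   returns the number of entries (at most 16 in this sketch). */
size_t handler_order(const int *pending, size_t n, int *order)
{
    int calls[16];               /* vector calls, in the order they are made */
    size_t ncalls = 0;

    /* The trap logic calls from lowest priority (9) up to highest (1). */
    for (int prio = 9; prio >= 1; prio--)
        for (size_t i = 0; i < n; i++)
            if (pending[i] == prio && ncalls < 16)
                calls[ncalls++] = prio;

    /* The last call made executes first; each ret unnests to the previous
       call, so execution order is the reverse of call order. */
    for (size_t i = 0; i < ncalls; i++)
        order[i] = calls[ncalls - 1 - i];
    return ncalls;
}
```

For example, a concurrent memory fault (5), breakpoint (8), and local-register stack overflow (1) execute in the order 1, 5, 8.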

Interrupts are disabled during trap processing and nesting, until an instruction that begins in byte one of an instruction group is executed. Interrupts do not nest with the traps, since their request state is maintained in the INTC registers.

Table 31 lists the priorities of each trap. Traps that can occur explicitly due to the data processed or instruction executed are listed in Table 32. Traps that can occur due to the current state of the system, concurrently with the traps in Table 32, are listed in Table 30.

## Microprocessing U nit

## PSC1000 MICRO PROCESSOR

Table 31. Trap Priorities

| Priority | Traps |
| :---: | :--- |
| 1 (highest) | local-register stack overflow |
| 2 | operand stack overflow |
| 3 | local-register stack underflow |
| 4 | operand stack underflow |
| 5 | memory fault |
| 6 | floating-point exponent <br> floating-point underflow <br> floating-point overflow <br> floating-point round |
| 7 | floating-point normalize |
| 8 | breakpoint |
| 9 (lowest) | single step |

Table 32. Traps Independent of System State

| Instruction | Trap Combinations |
| :---: | :--- |
| addexp | Floating Point Underflow, <br> Floating Point Overflow |
| bkpt | Breakpoint |
| denorm | Floating Point Normalize |
| norml | Floating Point Underflow, <br> Floating Point Normalize, <br> Floating Point Underflow and <br> Floating Point Normalize |
| normr | Floating Point Overflow, <br> Floating Point Normalize, <br> Floating Point Overflow and <br> Floating Point Normalize |
| rnd | Floating Point Round |
| step | Single Step |
| subexp | Floating Point Underflow, <br> Floating Point Overflow |
| testexp | Floating Point Exponent |

## Floating-Point Math Support

The MPU supports software implementations of single-precision (32-bit) and double-precision (64-bit) IEEE floating-point math. Rather than a floating-point unit and the silicon area it would require, the MPU contains instructions that perform most of the time-consuming operations required when programming basic floating-point math operations. Existing integer math operations supply the core add, subtract, multiply, and divide functions, while special instructions efficiently manipulate the exponents and detect exception conditions. Additionally, a three-bit extension to the top one or two stack cells (depending on the precision) aids in rounding and supplies the required precision and exception-signaling operations.


Figure 10. Floating-Point Number Formats

## Data Formats

Though single- and double-precision IEEE formats are supported, from the perspective of the MPU only 32-bit values are manipulated at any one time (except for double shifting). See Figure 10. The MPU instructions directly support the normalized data formats depicted. The related denormalized formats are detected by testexp and are fully supportable in software.
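
The single-precision layout in Figure 10 is the standard IEEE 754 binary32 format: 1 sign bit, an 8-bit biased exponent, and a 23-bit fraction with an implied leading 1 for normalized values. The Python sketch below is a host-side illustration, not MPU code, and the function names are hypothetical. `is_special_exponent` mirrors the all-zeros/all-ones condition that testexp detects, with the field width as a parameter so the 11-bit double-precision exponent can be checked too.

```python
import struct

# Hypothetical host-side helpers illustrating the IEEE single-precision layout.


def fields_single(x):
    """Split x (rounded to IEEE single) into (sign, exponent, fraction) fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF      # 8-bit biased exponent
    fraction = bits & 0x7FFFFF          # 23 stored fraction bits
    return sign, exponent, fraction


def significand_with_hidden_bit(fraction):
    """Normalized values only: restore the implied leading 1 above bit 22."""
    return fraction | (1 << 23)


def is_special_exponent(exponent, width=8):
    """testexp-style check: exponent field all zeros or all ones."""
    return exponent == 0 or exponent == (1 << width) - 1


print(fields_single(-2.5))  # -> (1, 128, 2097152)
```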

Status and Control Bits
mode contains 13 bits that set floating-point precision, rounding mode, exception signals, and trap enables. See Figure 11, page 39.


Table 33. GRS Extension Bit Manipulation Instructions

| Action | Instructions |
| :--- | :--- |
| cleared by | testexp, replexp |
| shifted into by | denorm, normr, shift, shr \#1, shr \#8, shrd \#1 |

GRS Extension Bits
To maintain the precision required by the IEEE standard, more significand bits are required than are held in the IEEE format numbers. These extra bits are used to hold bits that have been shifted out of the right of the significand. They are used to maintain additional precision, to determine if any precision has been lost during processing, and to determine whether rounding should occur. The three bits appear in mode so they can be saved, restored and manipulated. Individually, the bits are named guard_bit, round_bit and sticky_bit. Several instructions manipulate or modify the bits. See Table 33.

When denorm and normr shift bits into the GRS extension, the source of the bits is always the least-significant bits of the significand. In single-precision mode the GRS extension bits are taken from s0, and in double-precision mode the bits are taken from s1. For conventional right shifts, the GRS extension bits always come from the least-significant bits of the shift (i.e., s0 for a single shift and s1 for a double shift). The instruction norml is the only instruction to shift bits out of the GRS extension; it shifts into s0 in single-precision mode and into s1 in double-precision mode. Conventional left shifts always shift in zeros and do not affect the GRS extension bits.
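
The shifting behavior described above, including the way sticky_bit accumulates, can be modeled bit by bit. This Python sketch is a hypothetical single-cell model (`shift_right_into_grs` is not a PSC1000 instruction) that packs the extension as guard = bit 2, round = bit 1, sticky = bit 0, and ignores which stack cell supplies the bits in each precision mode. Each step moves the bit shifted out of the significand into guard_bit, guard_bit into round_bit, and ORs round_bit into sticky_bit, so sticky_bit stays set once set.

```python
# Hypothetical model of right shifts through the 3-bit GRS extension.
# grs packs guard_bit (bit 2), round_bit (bit 1), sticky_bit (bit 0).


def shift_right_into_grs(significand, grs, count, width=32):
    mask = (1 << width) - 1
    for _ in range(count):
        out_bit = significand & 1                 # bit leaving the significand
        guard = (grs >> 2) & 1
        round_ = (grs >> 1) & 1
        sticky = (grs & 1) | round_               # sticky ORs in the round bit
        grs = (out_bit << 2) | (guard << 1) | sticky
        significand = (significand >> 1) & mask
    return significand, grs


print(shift_right_into_grs(0b1011, 0, 4))  # -> (0, 5), i.e. G=1, R=0, S=1
```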

Table 34. Rounding-Mode Actions

| Sign of ct | G | R | S | Action |
| :---: | :---: | :---: | :---: | :---: |
| Round to nearest or even |  |  |  |  |
| x | 0 | x | x | do nothing |
| x | 1 | 0 | 0 | increment s0, clear bit 0 of s0 |
| x | 1 | 1 | x | increment s0 |
| x | 1 | 0 | 1 | increment s0 |
| Round toward negative infinity |  |  |  |  |
| 0 | x | x | x | do nothing |
| 1 | 0 | 0 | 0 | do nothing |
| 1 | 1 | x | x | increment s0 |
| 1 | 0 | 1 | x | increment s0 |
| 1 | 0 | 0 | 1 | increment s0 |
| Round toward positive infinity |  |  |  |  |
| 0 | 0 | 0 | 0 | do nothing |
| 0 | 1 | x | x | increment s0 |
| 0 | 0 | 1 | x | increment s0 |
| 0 | 0 | 0 | 1 | increment s0 |
| 1 | x | x | x | do nothing |
| Round toward zero |  |  |  |  |
| x | x | x | x | do nothing |

x = don't care.

## Rounding

The GRS extension maintains three extra bits of precision while producing a floating-point result. These bits are used to decide how to round the result to fit the destination format. If one views the bits as if they were just to the right of the binary point, then guard_bit has a positional value of one-half, round_bit has a positional value of one-quarter, and sticky_bit has a positional value of one-eighth. The rounding operation selected by fp_round_mode uses the GRS extension bits and the sign bit of ct to determine how rounding occurs. If guard_bit is zero, the value of the GRS extension is below one-half. If guard_bit is one, the value of the GRS extension is one-half or greater. Since the GRS extension bits are not part of the destination format, they are discarded when the operation is complete. This information is the basis for the operation of the instruction rnd.


Most rounding adjustments by rnd involve doing nothing or incrementing s0. Whether this is rounding down or rounding up depends on the sign of the floating-point result that is in ct. If the GRS extension bits are non-zero, then doing nothing has the effect of "rounding down" if the result is positive, and "rounding up" if the result is negative. Similarly, incrementing the result has the effect of "rounding up" if the result is positive and "rounding down" if the result is negative. If the GRS extension bits are zero, then the result was exact and rounding is not required. See Table 34.

In practice, the significand (or the lower cell of a double-precision significand) is in s0, and the sign and exponent are in ct. carry is set if the increment from rnd carries out of bit 31 of s0; otherwise, carry is cleared. This allows carry to be propagated into the upper cell of a double-precision significand.
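
Table 34 can be expressed as a small decision function. The sketch below is a Python model for illustration only, not the silicon's logic. The mode names ("nearest", "neg_inf", "pos_inf", "zero") are hypothetical stand-ins because the fp_round_mode encodings are not given in this section, and `sign` is the sign bit of ct. It returns the new s0, the resulting carry, and whether bit 0 of s0 changed, which is the condition the Rounded exception detects.

```python
# Hypothetical model of rnd following Table 34 (mode names are assumptions).
MASK32 = 0xFFFFFFFF


def rnd_model(sign, s0, g, r, s, mode):
    """Return (new_s0, carry, bit0_changed)."""
    inexact = bool(g or r or s)
    clear_bit0 = False
    if mode == "nearest":
        increment = g == 1
        # Exact tie (G=1, R=S=0): increment, then clear bit 0 -> round to even.
        clear_bit0 = g == 1 and r == 0 and s == 0
    elif mode == "neg_inf":
        increment = sign == 1 and inexact
    elif mode == "pos_inf":
        increment = sign == 0 and inexact
    elif mode == "zero":
        increment = False
    else:
        raise ValueError(f"unknown rounding mode: {mode}")

    new_s0, carry = s0, 0          # carry is cleared unless the increment carries out
    if increment:
        total = s0 + 1
        carry = total >> 32        # carry out of bit 31
        new_s0 = total & MASK32
        if clear_bit0:
            new_s0 &= MASK32 & ~1
    bit0_changed = ((new_s0 ^ s0) & 1) == 1
    return new_s0, carry, bit0_changed


print(rnd_model(0, 3, 1, 0, 0, "nearest"))  # -> (4, 0, True)
```

The tie case shows why "increment, then clear bit 0" implements round-to-even: an even s0 returns to its original value, while an odd s0 rounds up to the next even value.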

## Exceptions

To speed processing, exception conditions detected by the floating-point instructions set exception signaling bits in mode and, if enabled, trap. The following traps are supported:

- Exponent signaled from testexp
- Underflow signaled from norml, addexp, subexp
- Overflow signaled from normr, addexp, subexp
- Normalize signaled from denorm, norml, normr
- Rounded signaled from rnd

Exceptions are prioritized when the instruction completes and are processed with any other system exceptions or traps that occur concurrently. See Exceptions and Trapping, page 32.

- Exponent Trap: Detects special-case exponents. If the tested exponent is all zeros or all ones, carry is set and the exception is signaled. Setting carry allows testing the result without processing a trap.
- Underflow Trap: Detects exponents that have become too small due to calculations or decrementing while shifting.
- Overflow Trap: Detects exponents that have become too large due to calculations or incrementing while shifting.

Table 35. Code Example: Floating-Point Multiply
| Instruction | Operand | Comment |
| :--- | :--- | :--- |
|  |  | ; Floating-Point Multiply |
|  |  | ; ( r1 r2 -- product ) |
| ... |  |  |
| testexp |  |  |
| addexp |  |  |
| pop | ct | ; save sign & exp sum |
|  |  | ; A 24-bit x 24-bit multiply makes a 47- to 48-bit product, leaving 16 bits in the high cell. If we multiply 32-bit x 24-bit we get a 56-bit product with 24 bits in the high part, which is what we want. |
|  |  | ; make into a 32-bit multiplier |
| shl | \#8 |  |
| pop | g0 |  |
| shl | \#1 |  |
| push.n | \# |  |
| mulu |  |  |
| xcg |  |  |
| pop |  | ; discard low part |
| normr |  |  |
| rnd |  |  |
| normr |  |  |
| push | ct |  |
| replexp |  |  |

- Normalize Exception: Detects bits lost due to shifting into the GRS extension. The exception condition is tested at the end of instruction execution and is signaled if any of the bits in the GRS extension are set. Testing at this time allows normal right shifts to be used to set the GRS extension bits for later floating-point instructions to test and signal.
- Rounded Exception: Detects a change in bit zero of s0 due to rounding.


## Hardware Debugging Support

The MPU contains both a breakpoint instruction, bkpt, and a single-step instruction, step. The instruction bkpt executes the breakpoint trap and supplies the address of the bkpt opcode to the trap handler. This allows execution at full processor speed up to the breakpoint, and then execution in a program-controlled manner following the breakpoint. step executes the instruction at the supplied address, and then executes the single-step trap. The single-step trap can efficiently monitor execution on an instruction-by-instruction basis.

## Breakpoint

The instruction bkpt performs an operation similar to a subroutine call to address 0x134, except that the return address is the address of the bkpt opcode. This behavior is required because, due to the instruction push.l, the address of a calling instruction cannot always be determined from its return address.

Commonly, bkpt is used to temporarily replace an instruction in an application at a point of interest for debugging. The trap handler for bkpt typically restores the original instruction, displays information for the user, and waits for a command. Alternatively, the trap handler could implement a conditional breakpoint that checks for a termination condition (such as a register value or the number of executions of this particular breakpoint), continuing execution of the application until the condition is met. The advantage of bkpt over step is that the application executes at full speed between breakpoints.
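
A debugger's patch-and-restore step can be sketched on a host-side model of code memory. This Python fragment is illustrative only: it assumes code memory is represented as a flat `bytearray` indexed by byte address, and `set_breakpoint`, `clear_breakpoint`, and the `saved` dictionary are hypothetical names. Only the opcode value 0x3C comes from the bkpt instruction entry.

```python
# Hypothetical host-side model of inserting and removing a software breakpoint.
BKPT_OPCODE = 0x3C  # bkpt opcode, per the instruction reference


def set_breakpoint(memory, addr, saved):
    """Remember the original opcode at addr and overwrite it with bkpt."""
    if addr in saved:
        return  # already set; keep the original opcode saved the first time
    saved[addr] = memory[addr]
    memory[addr] = BKPT_OPCODE


def clear_breakpoint(memory, addr, saved):
    """Restore the original opcode, as a bkpt trap handler typically would."""
    memory[addr] = saved.pop(addr)


code = bytearray([0x10, 0x20, 0x30, 0x40])
originals = {}
set_breakpoint(code, 2, originals)
clear_breakpoint(code, 2, originals)
```

Guarding against setting the same breakpoint twice keeps the saved opcode from being overwritten by 0x3C itself.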

## Single-Step

The instruction step is used to execute an application program one instruction at a time. It acts much like a return from subroutine, except that after executing one instruction at the return address, a trap to address 0x138 occurs. The return address from the trap is the address of the next instruction. The trap handler for step typically displays information for the user and waits for a command. Alternatively, the trap handler could check for a termination condition (such as a register value or the number of executions of this particular location), continuing execution of the application until the condition is met.

step is processed and prioritized similarly to the other exception traps. This means that all other traps execute before the step trap. As a result, step cannot directly single-step through the program code of other trap handlers. The instruction step is normally considered to be below the operating-system level, so operating-system functions such as stack-page traps must execute without its intervention.

Higher-priority trap handlers can be single-stepped by re-prioritizing them in software. Rather than directly executing a higher-priority trap handler from the corresponding executable trap vector, the vector would branch to code that rearranges the return addresses on the return stack to change the resulting execution sequence of the trap handlers. Various housekeeping tasks must also be performed, and the various handlers must ensure that the stack memory area boundaries are not violated by the re-prioritized handlers.

## Virtual-Memory Support

The MPU supports virtual memory through external mapping logic that translates logical to physical memory addresses. During MPU RAS memory cycles, the CPU-supplied logical row address is translated by an external SRAM into the physical row address and a memory page-fault bit. The page-fault bit is sampled during the memory cycle to determine whether the translated page in memory is valid. Sufficient time exists in the normal RAS precharge portion of DRAM memory cycles to map logical pages to physical pages with no memory-cycle-time overhead.
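
The translation path can be modeled as a table lookup on the logical row address. The Python sketch below is a simplified, hypothetical model: the mapping SRAM is a list indexed by logical row, each entry is a (physical_row, page_fault) pair, and raising `MemoryFault` stands in for signaling the memory-fault exception. Timing, the column address, and the trap handler's page-in work are not modeled.

```python
# Hypothetical model of the external row-mapping SRAM.


class MemoryFault(Exception):
    """Stands in for the MPU signaling the memory-fault exception."""


def translate_row(logical_row, mapping_sram):
    """mapping_sram[logical_row] holds (physical_row, page_fault_bit)."""
    physical_row, page_fault = mapping_sram[logical_row]
    if page_fault:
        raise MemoryFault(logical_row)   # page not resident: fault
    return physical_row


sram = [(5, False), (0, True), (2, False)]
print(translate_row(0, sram))  # -> 5
```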

An invalid memory page indication causes the memory-fault exception to be signaled and, if enabled, the trap to be executed to service the fault condition. Posted-write faults are completed in the trap routine; other types of faulting operations are completed by returning from the trap routine to re-execute them. Whether the fault is from a read or write operation is indicated by mflt_write. The fault address and data (if a write) are stored in mfltaddr and mfltdata. Memory-fault traps are enabled by mflt_trap_en. See the code example on page 37.


Table 36. Code Example: Memory-Fault Service Routine

```
; Now go and get the faulted page from disk
; into memory, update the mapping SRAM, etc.
; ( mode data addr -- mode data addr )
; If memory fault occurred while attempting a
; posted write, perform the write in the handler.

; check if fault was read or write
        push    s2                      ; duplicate mode
        push.l  #mflt_write
        bz      discard_location        ; write fault?
        push.l  #miscc
        ldo     []
        push.b  #mspwe
        and                             ; posted write?
        .quad   3
        skipz   stack,discard_location
        st      []                      ; complete it
        push                            ; maintain 2 items

discard_location::
        pop                             ; discard "address"
        pop                             ; discard "data"

; Reset exception-signal bit.
        push.l  #mflt_exc_sig
        iand
        pop     mode
; For non-posted-write faults, the load/store/pre-fetch
; retries on return.
        ret
```

Table 37. VRAM Commands

| Description | CAS (at RAS fall) | OE (at RAS fall) | WE (at RAS fall) | DSF (at RAS fall) | DSF (at CAS fall) |
| :--- | :---: | :---: | :---: | :---: | :---: |
| RAM read/write | H | H | H | L | L |
| color register set | H | H | H | H | - |
| masked write | H | H | L | L | L |
| flash write | H | H | L | H | - |
| read transfer | H | L | H | L | - |
| split read transfer | H | L | H | H | - |
| block write | H | H | H | L | H |
| masked block write | H | H | L | L | H |
| set bit-blt mode | L | - | L | - | - |

## Video RAM Support

Video RAMs (VRAMs) are DRAMs that have a second port that provides serial access to the DRAM array. This allows video data to be serially clocked out of the memory to the display while normal MPU accesses occur. To prevent DRAM array access contentions, the MPU periodically issues read transfer requests, which copy the selected DRAM row to the serial transfer buffer. To eliminate read transfer synchronization problems, many VRAMs have split transfer buffers, which allow greater timing flexibility for the MPU's read transfer operations. The MPU instructs the VRAM to perform a read transfer or a split read transfer by encoding the command on the state of the VRAM $\overline{\mathrm{OE}}$, $\overline{\mathrm{WE}}$, and DSF (device special function) signals at the time $\overline{\mathrm{RAS}}$ falls. These operations are encoded by writing vram and performing an appropriate read or write to the desired VRAM memory address. See Figure 32, page 137.

Some VRAMs have more advanced operations, such as line fills, block fills, and bit-blts, which are encoded with other combinations of $\overline{\mathrm{WE}}$, $\overline{\mathrm{OE}}$, DSF, $\overline{\mathrm{RAS}}$, and $\overline{\mathrm{CAS}}$. A basic set of operations and commands is common among manufacturers, but the commands for more advanced functions vary.
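
The command encodings in Table 37 amount to a pattern match on five sampled signal levels. The Python sketch below is illustrative only: `decode_vram_command` and the "H"/"L"/"-" string encoding are hypothetical, and the function returns the Table 37 row that matches, treating "-" as don't care. Actual VRAM command sets vary by manufacturer, as noted above.

```python
# Hypothetical decoder for the Table 37 command patterns.
# Order: CAS, OE, WE, DSF sampled at RAS fall; DSF sampled at CAS fall.
VRAM_COMMANDS = [
    ("RAM read/write",      ("H", "H", "H", "L", "L")),
    ("color register set",  ("H", "H", "H", "H", "-")),
    ("masked write",        ("H", "H", "L", "L", "L")),
    ("flash write",         ("H", "H", "L", "H", "-")),
    ("read transfer",       ("H", "L", "H", "L", "-")),
    ("split read transfer", ("H", "L", "H", "H", "-")),
    ("block write",         ("H", "H", "H", "L", "H")),
    ("masked block write",  ("H", "H", "L", "L", "H")),
    ("set bit-blt mode",    ("L", "-", "L", "-", "-")),
]


def decode_vram_command(cas, oe, we, dsf_at_ras, dsf_at_cas):
    """Return the matching command name, or None if no row matches."""
    sampled = (cas, oe, we, dsf_at_ras, dsf_at_cas)
    for name, pattern in VRAM_COMMANDS:
        if all(p == "-" or p == s for p, s in zip(pattern, sampled)):
            return name
    return None


print(decode_vram_command("H", "L", "H", "H", "L"))  # -> split read transfer
```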


## Register mode

mode contains a variety of bits that indicate the status and execution options of the MPU. Except as noted, all bits are writable. The register is shown in Figure 11.

## mflt_write

After a memory-fault exception is signaled, indicates that the fault occurred due to a memory write.

## guard_bit

The most-significant bit of a 3-bit extension below the least-significant bit of s0 (s1, if fp_precision is set) that is used to aid in rounding floating-point numbers.

## round_bit

The middle bit of a 3-bit extension below the least-significant bit of s0 (s1, if fp_precision is set) that is used to aid in rounding floating-point numbers.

## sticky_bit

The least-significant bit of a 3-bit extension below the least-significant bit of s0 (s1, if fp_precision is set) that is used to aid in rounding floating-point numbers. Once set due to shifting or writing the bit directly, the bit stays set even though zero bits are shifted right through it, until it is explicitly cleared or written to zero.
## mflt_trap_en

If set, enables memory-fault traps.

## mflt_exc_sig

Set if a memory fault is detected.

## ls_boundary

Set if ls_ovf_exc_sig or ls_unf_exc_sig becomes set as the result of a stack spill or refill. Cleared when the address in la, as the result of a stack spill or refill, has entered the middle region of a 1024-byte memory page, and when la is written. Used by the local-register stack trap logic to prevent unnecessary stack overflow and underflow traps when repeated local-register stack spills and refills occur near a 1024-byte memory page boundary. Not writable.

## ls_unf_trap_en

If set, enables a local-register stack underflow trap to occur after a local-register stack underflow exception is signaled.

## ls_unf_exc_sig

Set if a local-register stack refill occurs, ls_boundary is clear, and the accessed memory address is in the last thirty-two cells of a 1024-byte memory page.

## ls_ovf_trap_en

If set, enables a local-register stack overflow trap to occur after a local-register stack overflow exception is signaled.

## ls_ovf_exc_sig

Set if a local-register stack spill occurs, ls_boundary is clear, and the accessed memory address is in the first thirty-two cells of a 1024-byte memory page.

## os_boundary

Set if os_ovf_exc_sig or os_unf_exc_sig becomes set as the result of a stack spill or refill. Cleared when the address in sa, as the result of a stack spill or refill, has entered the middle region of a 1024-byte memory page, and when sa is written. Used by the operand stack trap logic to prevent unnecessary stack overflow and underflow traps when repeated operand stack spills and refills occur near a 1024-byte memory page boundary. Not writable.

## os_unf_trap_en

If set, enables an operand stack underflow trap to occur after an operand stack underflow exception is signaled.

## os_unf_exc_sig

Set if an operand stack refill occurs, os_boundary is clear, and the accessed memory address is in the last thirty-two cells of a 1024-byte memory page.

## os_ovf_trap_en

If set, enables an operand stack overflow trap to occur after an operand stack overflow exception is signaled.



Figure 11. Register mode

## os_ovf_exc_sig

Set if an operand stack spill occurs, os_boundary is clear, and the accessed memory address is in the first thirty-two cells of a 1024-byte memory page.

## carry

Contains the carry bit from the accumulator. Saving and restoring mode can be used to save and restore carry.

## power_fail

Set during power-up to indicate that a power failure has occurred. Cleared by any write to mode. Other-

## interrupt_en

If set, interrupts are globally enabled. Set by the instruction ei, cleared by di.
## fp_rnd_exc_sig

If set, a previous execution of rnd caused a change in the least-significant bit of s0 (s1, if fp_precision is set).

## fp_rnd_trap_en

If set, enables a floating-point round trap to occur after a floating-point round exception is signaled.

## fp_nrm_exc_sig

If set, one or more of guard_bit, round_bit, and sticky_bit were set after a previous execution of denorm, norml, or normr.

## fp_nrm_trap_en

If set, enables a floating-point normalize trap to occur after a floating-point normalize exception is signaled.

## fp_ovf_exc_sig

If set, a previous execution of normr, addexp, or subexp caused the exponent field to increase to or beyond all ones.

## fp_ovf_trap_en

If set, enables a floating-point overflow trap to occur after a floating-point overflow exception is signaled.

## fp_unf_exc_sig

If set, a previous execution of norml, addexp, or subexp caused the exponent field to decrease to or beyond all zeros.

## fp_unf_trap_en

If set, enables a floating-point underflow trap to occur after a floating-point underflow exception is signaled.

## fp_exp_exc_sig

If set, a previous execution of testexp detected an exponent field containing all ones or all zeros.

## fp_exp_trap_en

If set, enables a floating-point exponent trap to occur after a floating-point exponent exception is signaled.

## fp_round_mode

Contains the type of rounding to be performed by the MPU instruction rnd.

## fp_precision

If clear, the floating-point instructions operate on stack values in IEEE single-precision (32-bit) format. If set, the floating-point instructions operate on stack values in IEEE double-precision (64-bit) format.
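
The spill and refill conditions for the stack exception-signal bits can be summarized as a check on an address's offset within its 1024-byte page. The Python sketch below is a simplified model, not the trap logic itself. It assumes 4-byte cells, treats "first" as the lowest addresses of the page and "last" as the highest, takes the boundary bit as an input rather than modeling how it is set and cleared, and uses the hypothetical name `stack_exc_signals`. The same check would apply to la with the ls_ bits and to sa with the os_ bits.

```python
# Hypothetical model of when a stack spill/refill sets an exception signal.
PAGE_BYTES = 1024
CELL_BYTES = 4                 # assumption: 32-bit cells
EDGE_BYTES = 32 * CELL_BYTES   # thirty-two cells at each end of a page


def stack_exc_signals(addr, is_spill, boundary):
    """Return (ovf_exc_sig, unf_exc_sig) for one spill (True) or refill (False)."""
    if boundary:                                   # boundary bit set: no new signal
        return False, False
    offset = addr % PAGE_BYTES
    ovf = is_spill and offset < EDGE_BYTES                          # first 32 cells
    unf = (not is_spill) and offset >= PAGE_BYTES - EDGE_BYTES      # last 32 cells
    return ovf, unf


print(stack_exc_signals(0x47C, True, False))  # -> (True, False)
```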

## MPU Reset

After reset, the VPU begins executing at address 0x80000004, before the MPU begins execution. The VPU must be programmed to execute delay before the MPU can access the bus and begin execution. Once the VPU executes delay, the MPU begins executing at address 0x80000008. Various startup configurations are described in Processor Startup, page 181.

## Interrupts

The CPU contains an on-chip prioritized interrupt controller that supports up to eight different interrupt levels from twenty-four interrupt sources. Interrupts can be received through the bit inputs, from I/O-channel transfers, from the VPU, or can be forced in software by writing to ioin. For complete details of interrupts and their servicing, see Interrupt Controller, page 107.


## Bit Inputs

The CPU contains eight general-purpose bit inputs that are shared with the INTC and DMAC as requests for those services. The bits are taken from $\overline{\mathrm{IN}}[7:0]$ or, if so configured, are sampled from AD[7:0] on the bus. Sampling from the bus can allow the use of smaller, less-expensive packages for the CPU; it can also reduce PWB area requirements through reuse of the AD bus rather than routing a separate bit-input bus. See Bit Inputs, page 111.

## Bit Outputs

The CPU contains eight general-purpose bit outputs that can be written by the MPU or VPU. The bits are output on OUT[7:0] and are also available on AD[7:0] while RAS is inactive. Taking the bits from the bus can allow the use of smaller, less-expensive packages for the CPU; it can also reduce PWB area requirements through reuse of the AD bus rather than routing a separate bit-output bus. See Bit Outputs, page 115.

Table 38. Instructions That Hold-off Pre-fetch

| bkpt | br | bz | call | dbr | ld† |
| :--- | :--- | :--- | :--- | :--- | :--- |
| mloopx | push.l | ret | reti | st† | step |

† Holds off pre-fetch only when it is the first instruction in the executing instruction group.

## Instruction Pre-fetch

The MPU issues bus requests in an order that optimizes execution. To keep executing instructions as much as possible, the next group of instructions is fetched while the current group executes. This is referred to as instruction pre-fetch. Instruction pre-fetch begins as soon as an instruction group begins to execute unless it is held off. Pre-fetch is held off if the executing instruction group contains one of the instructions in Table 38. ld and st hold off pre-fetch only if they occur as the first instruction in the executing instruction group. Knowing which instructions hold off pre-fetch is useful when programming bus configuration information.
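
The hold-off rule can be restated as a predicate over one instruction group. This Python sketch is illustrative only. It represents a group as a list of bare mnemonics with operands omitted, assumes that the "mloopx" entry in Table 38 denotes the mloop family of mnemonics (so any mnemonic beginning with "mloop" is matched), and uses the hypothetical name `holds_off_prefetch`.

```python
# Hypothetical predicate restating Table 38 (mnemonics only, operands omitted).
HOLD_OFF_ANYWHERE = {"bkpt", "br", "bz", "call", "dbr", "push.l", "ret", "reti", "step"}
HOLD_OFF_IF_FIRST = {"ld", "st"}


def holds_off_prefetch(group):
    """True if executing this instruction group holds off instruction pre-fetch."""
    if group and group[0] in HOLD_OFF_IF_FIRST:
        return True
    # Assumption: "mloopx" stands for every mnemonic in the mloop family.
    return any(op in HOLD_OFF_ANYWHERE or op.startswith("mloop") for op in group)


print(holds_off_prefetch(["add", "ld"]))  # -> False (ld is not first)
```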

## Posted-Write

The MPU supports a one-level posted write. This allows MPU execution to continue unimpeded after the write is posted. To maintain memory coherency, posted writes have the highest priority of all MPU bus requests. This guarantees that memory reads following a posted write always retrieve the most up-to-date data.

## On-Chip Resources

The non-MPU hardware features of the CPU are generally accessed by the MPU through a set of 41 registers located in their own address space. Using a separate address space simplifies implementation, preserves opcodes, and prevents cluttering the normal memory address space with peripherals. Collectively known as the On-Chip Resources, these registers allow access to the bit inputs, bit outputs, INTC, DMAC, MIF, system configuration, and some functions of the VPU. These registers and their functions are referenced throughout this manual and are described in detail in On-Chip Resource Registers, page 129.

## Instruction Reference

As a stack-based MPU architecture, the PSC1000 MPU instructions have documentation requirements similar to other stack-based systems, such as the Java Virtual Machine (JVM) and American National Standard Forth (ANS Forth). Not surprisingly, many of the JVM and ANS Forth operations are instructions on the PSC1000 MPU. As a result, the JVM and ANS Forth stack notation used for language documentation is useful for describing PSC1000 MPU instructions. The basic notation adapted for the PSC1000 MPU is:
( input_operands -- output_operands )

( L: input_operands -- output_operands )

where "--" indicates the execution of the instruction. "Input_operands" and "output_operands" are lists of values on the operand stack (the default) or the local-register stack (preceded by "L:"). These are similar, though not always identical, to the source and destination operands that can be represented within instruction mnemonics. The value held in the top-of-stack register (s0 or r0) is always on the right of the operand list, with the values held in the higher-ordinal registers appearing to the left (e.g., s2 s1 s0). The only items in the operand lists are those that are pertinent to the instruction; other values may exist under these on the stacks. All of the input_operands are considered to be popped off the stack, the operation performed, and the output_operands pushed on the stack. For example, the notational expression

n1 n2 -- n3

represents two input operands, n1 and n2, and one output operand, n3. For the instruction add, n1 (taken from s1) is added to n2 (taken from s0), and the result is n3 (left in s0). If the name of a value on the left of either diagram is the same as the name of a value on the right, then the value was required but unchanged. The name represents the operand type. Numeric suffixes are added to indicate different or changed operands of the same type. The values may be bytes, integers, floating-point numbers, addresses, or any other type of value that can be placed in a single 32-bit cell.

| Operand | Meaning |
| :--- | :--- |
| addr | address |
| byte | character or byte (upper 24 bits zero) |
| n | integer or 32 arbitrary bits |
| any other name | integer or 32 arbitrary bits |
ANS Forth defines other operand types and operands that occupy more than one stack cell; those are not used here.
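
The stack notation maps directly onto a list whose right-hand end is the top of stack. The Python sketch below is a host-side illustration, and `apply_stack_effect` and `add_effect` are hypothetical helpers. It pops the listed inputs, applies an operation, and pushes its outputs, so that n1 n2 -- n3 for add takes n1 from s1 and n2 from s0.

```python
# Hypothetical model of the ( inputs -- outputs ) stack notation.
MASK32 = 0xFFFFFFFF


def apply_stack_effect(stack, n_inputs, operation):
    """stack is a list with s0 at the right-hand end, e.g. [s2, s1, s0].

    operation receives the inputs left to right and returns a list of outputs.
    """
    if n_inputs > len(stack):
        raise IndexError("not enough operands on the stack")
    split = len(stack) - n_inputs
    inputs = stack[split:]             # leftmost input is the deepest cell
    del stack[split:]                  # inputs are considered popped
    stack.extend(operation(*inputs))   # outputs pushed; the last ends in s0
    return stack


def add_effect(n1, n2):
    """( n1 n2 -- n3 ): n3 is the 32-bit sum."""
    return [(n1 + n2) & MASK32]


print(apply_stack_effect([9, 5, 7], 2, add_effect))  # -> [9, 12]
```

The value 9 below the two inputs is untouched, matching the rule that only pertinent items appear in the operand lists.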

Note that typically all stack action is described by the notation and is not explicitly described in the text. If there are multiple possible outcomes, the outcome options are on separate lines and are to be considered as individual cases. If other registers or memory variables are modified, that effect is documented in the text.

Also on the stack diagram line is an indication of the effect on carry, if any, as well as the opcode and execution time at the right margin.

A timing with an "M" indicates the specified number of bus requests and bus transactions (memory cycles) for the instruction to complete. Bus requests require two CPU-clock cycles, and bus transaction times are as programmed and described in Programmable Memory Interface, page 117, and Bus Operation, page 157. The value used for "M" includes both the bus request and bus transaction times.

Timings do not include implied memory cycles such as stack spills and refills required to maintain the state of the stack caches. Any operation that pushes or pops a stack, or references a local register, could cause a memory cycle. Operations that wait on the completion of instruction pre-fetch are labeled "M prefetch." These are distinct in that pre-fetch occurs in parallel with execution, so the wait time is probably not a full memory cycle.

## ANS Forth Word Equivalents

Those PSC1000 instructions that are exact equivalents of ANS Forth words are indicated in the body text for the instruction. Many additional ANS Forth words simply require a short instruction sequence, but these are not indicated.

## Java Byte Code Equivalents

Those PSC1000 instructions that are exact equivalents of Java byte codes are indicated in the body text for the PSC1000 instruction. Many additional Java byte codes simply require a short instruction sequence, though the most complex byte codes require a subroutine call. For detailed information contact Patriot Scientific.

MNEMONIC
STACKS ( input Sn/Rn...S0/R0 -- output Sm/Rm...S0/R0 )

CARRY?
OPCODE

## add

Add n1 and n2 giving the sum n3. carry is set if there is a carry out of bit 31 of the sum and cleared otherwise.

Equivalent to Java byte code iadd.
Equivalent to ANS Forth word +.
add pc ( n1 -- n2 ) carry ± 10111011 0xBB 1 CPU-clock

Add the value of pc (the byte-aligned address of the add pc opcode) to n1 giving the sum n2. carry is set if there is a carry out of bit 31 of the sum and cleared otherwise.

## adda

Add Address
adda ( n1 n2 -- n3 ) 11101000 0xE8 1 CPU-clock

Add n1 and n2 giving the sum n3. carry is unaffected.

## addc

Add with Carry
addc ( n1 n2 -- n3 ) carry 11000010 0xC2 1 CPU-clock

Add n1, n2, and carry giving the sum n3. carry is set if there is a carry out of bit 31 of the sum; otherwise, carry is cleared.
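
Because carry is set by add and consumed by addc, a 64-bit sum can be formed from two 32-bit additions. The Python model below is illustrative only. The function names are hypothetical, and `add64` splits 64-bit values into high and low cells itself. It mirrors the carry-out-of-bit-31 rule stated for add and addc; adda would correspond to the same sum with carry left untouched.

```python
# Hypothetical model of add/addc carry behavior on 32-bit cells.
MASK32 = 0xFFFFFFFF


def add(n1, n2):
    """( n1 n2 -- n3 ): 32-bit sum plus the carry out of bit 31."""
    total = n1 + n2
    return total & MASK32, total >> 32


def addc(n1, n2, carry):
    """( n1 n2 -- n3 ) with the incoming carry added in."""
    total = n1 + n2 + carry
    return total & MASK32, total >> 32


def add64(a, b):
    """Add the low cells with add, then the high cells with addc."""
    lo, carry = add(a & MASK32, b & MASK32)
    hi, carry = addc(a >> 32, b >> 32, carry)
    return (hi << 32) | lo, carry


print(hex(add64(0x1FFFFFFFF, 1)[0]))  # -> 0x200000000
```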


## addexp

Add Exponents


Perform the following:
$$
\begin{aligned}
\mathrm{Exponent\_Field}(n5) &= \mathrm{Exponent\_Field}(n1) - \mathrm{BIAS} + \mathrm{Exponent\_Field}(n2) \\
\mathrm{Sign\_Bit}(n5) &= \mathrm{Sign\_Bit}(n1) \oplus \mathrm{Sign\_Bit}(n2)
\end{aligned}
$$

BIAS is 127 (0x3F800000 in position) for single precision and 1023 (0x3FF00000 in position) for double precision, as selected by fp_precision.

Compute as described above. Clear the exponent-field bits and sign bit and set the hidden bit of n1 and n2, giving n3 and n4, respectively. n5 is the result of the computation. After completion, if the exponent-field calculation result equaled or exceeded the maximum value of the exponent field (exponent field result $\geq 255$ for single, exponent field result $\geq 2047$ for double), an overflow exception is signaled. If the exponent-field calculation result is less than or equal to zero, an underflow exception is signaled. When an exception is signaled, the exponent field of n5 contains as many bits of the computed exponent as it will hold.
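
The single-precision case of addexp can be modeled on raw 32-bit patterns. This Python sketch is an illustration under stated assumptions, not a specification. `addexp_single` is a hypothetical name, the model covers only fp_precision clear (8-bit exponent field, BIAS 127, that is 0x3F800000 in position), and it assumes n5 carries only the sign and exponent fields with a zero significand, which this section does not state. The exception conditions are returned as flags rather than signaled.

```python
# Hypothetical single-precision model of addexp (fp_precision clear).
BIAS_SINGLE = 127
EXP_MAX_SINGLE = 255


def addexp_single(n1, n2):
    """Return (n3, n4, n5, overflow, underflow)."""
    e1 = (n1 >> 23) & 0xFF
    e2 = (n2 >> 23) & 0xFF
    exp = e1 - BIAS_SINGLE + e2                 # biased exponent of the product
    sign = ((n1 ^ n2) >> 31) & 1                # Sign_Bit(n1) XOR Sign_Bit(n2)
    overflow = exp >= EXP_MAX_SINGLE
    underflow = exp <= 0
    # Assumption: n5 holds only sign and exponent; the field keeps the low 8 bits.
    n5 = (sign << 31) | ((exp & 0xFF) << 23)
    # n3/n4: sign and exponent cleared, hidden bit (bit 23) set.
    n3 = (n1 & 0x007FFFFF) | 0x00800000
    n4 = (n2 & 0x007FFFFF) | 0x00800000
    return n3, n4, n5, overflow, underflow


# 2.0 (0x40000000) times 3.0 (0x40400000): exponent field of n5 becomes 129.
print([hex(v) for v in addexp_single(0x40000000, 0x40400000)[:3]])
```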

## and

Bitwise AND
and
( n1 n2 -- n3 )

1 CPU-clock
Perform a bitwise AND of n1 and n2 giving the result n3.
Equivalent to Java byte code iand.
Equivalent to the ANS Forth word AND.


## bkpt

Breakpoint
bkpt ( -- ) 00111100
( L: -- addr ) 0x3C
1+M CPU-clocks
Perform a subroutine call to the breakpoint trap location, 0x134. addr is the address of the bkpt instruction. Typically the breakpoint service routine replaces the bkpt opcode at addr with the original opcode, performs whatever debugging function is desired, and executes ret to return to addr.

Equivalent to Java byte code breakpoint.


## br

Transfer execution to offset cells from the beginning of the current instruction group.
The instruction adds the two's-complement cell offset encoded within and following the br opcode to pc, and transfers execution to the resulting cell-aligned address.

Equivalent to Java byte codes goto, goto_w.
Equivalent to the run-time for the ANS Forth words AGAIN, AHEAD, ELSE.

```
br [] ( addr -- ) 0100 1011
Branch Indirect 0x4B
```

M CPU-clocks
Replace the value in pc with addr to transfer execution to addr. Note that addr is an absolute byte-aligned address and not an offset.

## bz

```
bz offset ( n -- ) 0001 0xxx
Branch if Zero
0x1?
```

M CPU-clocks
If n is zero, transfer execution to offset cells from the beginning of the current instruction group; otherwise, continue execution at the next instruction group.

If n is zero, the instruction adds the two's-complement cell offset encoded within and following the bz opcode to pc, and transfers execution to the resulting cell-aligned address. If n is non-zero, execution continues with the next instruction group.

Equivalent to Java byte codes ifeq, ifnull.
Equivalent to the run-time for the ANS Forth words IF, UNTIL, WHILE.

## dbr

Decrement CT and Branch
dbr offset ( -- ) 0001 1xxx
0x1?
M CPU-clocks

Decrement ct by one. If ct is non-zero, transfer execution to offset cells from the beginning of the current instruction group; otherwise, continue execution with the next instruction group.

The instruction decrements ct by one. If the resulting ct is non-zero, the instruction then adds the two's-complement cell offset encoded within and following the dbr opcode to pc, and transfers execution to the resulting cell-aligned address. If the resulting ct is zero, execution continues with the next instruction group.
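The control flow of a loop closed by dbr can be sketched as follows. This is a hypothetical Python helper that counts how often the loop body runs. It assumes ct is a 32-bit counter and that the body does not itself modify ct. Because ct is decremented before it is tested, a starting value of N runs the body N times.

```python
def dbr_loop_count(ct):
    """Count how often a loop body closed by dbr executes for a starting ct.

    Assumes a 32-bit ct and a body that leaves ct alone. A starting ct of
    zero would wrap to 0xFFFFFFFF and loop 2**32 times, so it is rejected.
    """
    if ct == 0:
        raise ValueError("ct = 0 would wrap and run 2**32 iterations")
    runs = 0
    while True:
        runs += 1                      # loop body executes
        ct = (ct - 1) & 0xFFFFFFFF     # dbr: decrement ct by one
        if ct == 0:                    # zero: continue with the next group
            return runs
        # non-zero: branch back to the start of the loop body
```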


## cache

Fill/Empty Stack Cache
The cache instructions are used to optimize program execution, or to make program execution more deterministic. Stack cache spills and refills can be made to occur at preferred times, and to occur in bursts to optimize memory access. Executing the instruction with both $n$ and $n-14$ ($n>0$), once with each value, ensures that an exact number of items are in the stack cache. Pushing dummy values onto the stack (one value for the local-register stack, three values for the operand stack) and then executing the instruction with $n=-14$ causes all previously held data to be spilled to memory.
lcache ( n -- )
01001101
0x4D
1 or (1M to 14M) CPU-clocks
If $n>0$, ensure that at least $n$ cells can be removed from the local-register stack without causing local-register stack cache refills. Cells are refilled from memory into the cache if required. ( $1 \leq n \leq 14$ ).

If $n<0$ (two's complement), ensure that at least $|n|$ cells can be added to the local-register stack without causing local-register stack cache spills. Cells are spilled from the stack cache to memory if required. ( $-14 \leq n \leq -1$ ).

If $n=0$ the local-register stack cache is unchanged.
scache ( n -- n ) 01000101
0x45
1 or (1M to 14M) CPU-clocks
If $n>0$, ensure that at least $n$ cells can be removed from the operand stack without causing operand stack cache refills. Cells are refilled from memory into the cache if required. ( $1 \leq n \leq 14$ ).

If $n<0$ (two's complement), ensure that at least $|n|$ cells can be added to the operand stack without causing operand stack cache spills. Cells are spilled from the stack cache to memory if required. ( $-14 \leq n \leq -1$ ).

If $n=0$ the operand stack cache is unchanged.


## call

Call Subroutine

```
call offset ( -- )
    ( L: -- addr ) 0x0?
Call Subroutine 1+M CPU-clocks
```

Transfer execution to offset cells from the beginning of the current instruction group. addr is the cell-aligned address of the next instruction group.

The instruction pushes addr on the local-register stack and then adds the two's-complement cell offset encoded within and following the call opcode to pc, and transfers execution to the resulting cell-aligned address. The offset is in the same form and follows the same rules as those for branches.

```
call [] ( addr1 -- ) 0100 1110
    ( L: -- addr2 ) 0x4E
Call Subroutine Indirect 1+M CPU-clocks
```

Replace the value in pc with addr1 to transfer execution there. addr2 is the byte-aligned address of the next instruction following call []. Note that addr1 is an absolute address and not an offset.

## cmp

Compare
cmp ( n1 n2 -- n1 n2 ) carry ± 11001011
0xCB
1 CPU-clock
Compare n2 and n1 as signed values. Set carry if n1 < n2, otherwise clear carry.

## copyb

Copy Byte Across Cell
copyb ( n1 -- n2 )
11010000
0xD0
1 CPU-clock
n2 is the result of copying the lowest byte of n1 into each of the higher byte positions. For example, 0x12345678 becomes 0x78787878.
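Arithmetically, copyb multiplies the low byte by 0x01010101. The following Python sketch models the result. The function name is hypothetical.

```python
def copyb(n1):
    """Model of copyb: replicate the lowest byte of n1 into all four byte lanes."""
    return (n1 & 0xFF) * 0x01010101
```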

## dec

Decrement
dec \#1 ( n1 -- n2 )

11001111
0xCF
1 CPU-clock
Subtract one from n1 leaving the result n2.
Equivalent to ANS Forth word 1-.
dec \#4 ( n1 -- n2 )

Subtract four from n1 leaving the result n2.
dec ct, \#1 ( -- )
11000001
0xC1
1 CPU-clock

Subtract one from ct.


## denorm

Denormalize
denorm ( n1 -- n2 ) if single precision 11000101
( n1 n2 -- n3 n4) if double precision 0xC5
1 to 13 CPU-clocks
( L: -- addr ) only when trap processed
$3+M$ to $15+M$ CPU-clocks
Shift n1 (or n2 n1 if double) right by the bit count in the exponent field of ct. Bits shift out of the right into the GRS extension. If any bit in the GRS extension is set, a normalize exception is signaled. The location of the exponent field depends on fp_precision. The exponent field of ct is decremented to zero.

Shifting is performed by bytes or bits to minimize the CPU-clock cycles required. If the count in the exponent bits of ct is larger than the width in bits of the significand field + 3 (for the guard_bit, round_bit and the hidden bit), the sticky_bit is set, the other bits are cleared, and execution requires one CPU-clock cycle.

## depth

Depth of Stack
ldepth
( -- n )
10011011
0x9B
1 CPU-clock
n is exactly the number of cells that can be removed from the local-register stack without causing a local-register stack cache refill. ( $0 \leq \mathrm{n} \leq 14$ ).
sdepth ( -- n )
10011111
0x9F
1 CPU-clock
n is exactly the number of cells, before n was pushed, that could be removed from the operand stack without causing an operand stack cache refill. $(0 \leq n \leq 14)$. If $n=14$, then an operand stack cache spill occurred when $n$ was pushed and only 13 cells remain, excluding $n$, that can be removed from the operand stack without causing an operand stack cache refill.

## di

Disable Interrupts

Globally disable interrupts, clearing interrupt_en. The ioie bits are not changed.

## divu

Divide Unsigned
divu ( n1 n2 -- n3 n4 )
11011110
0xDE
32 CPU-clocks
Divide the double value n2n1 by the value in g0, giving the quotient n3 and remainder n4. All values are unsigned. If n2 is greater than or equal to g0, the quotient will overflow. If g0 is zero, n3 equals n1 and n4 equals n2.
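The following Python model sketches divu under stated assumptions. The helper name is hypothetical. n2 is treated as the high cell, following the document's n2n1 notation and the overflow condition on n2. The text does not say which cell values remain after an overflow, so the sketch raises an exception in that case instead of guessing.

```python
MASK32 = 0xFFFFFFFF

def divu(n1, n2, g0):
    """Model of divu: divide the unsigned double value n2:n1 (n2 high) by g0.

    Returns (n3, n4) = (quotient, remainder).
    """
    n1 &= MASK32
    n2 &= MASK32
    g0 &= MASK32
    if g0 == 0:
        return n1, n2                 # documented: n3 = n1 and n4 = n2
    if n2 >= g0:
        # The quotient overflows; the resulting cells are not modeled here.
        raise OverflowError("quotient does not fit in one cell")
    dividend = (n2 << 32) | n1
    return dividend // g0, dividend % g0
```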

## ei

Enable Interrupts
ei ( -- )

10110110
0xB6
1 CPU-clock

Globally enable interrupts, setting interrupt_en. The ioie bits are not changed.


## eqz

Equal Zero
eqz ( n1 -- n2 )

11100101
0xE5
1 CPU-clock
n2 is the logical inverse of n1. If n1 is equal to zero, n2 is -1. If n1 is non-zero, n2 is zero.

Equivalent to ANS Forth word 0=.

## expdif

Exponent Difference
expdif ( n1 n2 -- n3 n4 )
11000100
0xC4
1 CPU-clock
Clear the upper half of ct. Subtract the exponent field of n2 from the exponent field of n1, placing the result in the exponent-field bits of ct. Clear the exponent-field bits and sign bit and set the hidden bit of n1 and n2, giving n3 and n4, respectively. The locations of the exponent field and hidden bit depend on fp_precision.

## extexp

Extract Exponent
extexp ( n1 -- n2 ) 11011011
0xDB
1 CPU-clock
Clear the significand bits of n1, leaving the exponent-field bits and sign bit unchanged, giving n2. The locations of the exponent field and significand field depend on fp_precision.

## extsig

Extract Significand
extsig ( n1 -- n2 ) 11011100
0xDC
1 CPU-clock
Clear the exponent and sign bits of n1, leaving the significand-field bits unchanged. Then set the hidden bit of n1, giving n2. The locations of the exponent field and significand field depend on fp_precision.
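For single precision, extexp and extsig reduce to masks over the sign bit (bit 31), exponent field (bits 30..23) and significand field (bits 22..0). This layout is consistent with the 0x3F800000 bias position given for addexp. A minimal Python sketch with hypothetical function names:

```python
SIGN = 0x80000000      # sign bit
EXP = 0x7F800000       # exponent field
SIG = 0x007FFFFF       # significand field
HIDDEN = 0x00800000    # hidden-bit position

def extexp(n1):
    """Keep the sign and exponent-field bits and clear the significand."""
    return n1 & (SIGN | EXP)

def extsig(n1):
    """Keep the significand bits, clear sign and exponent, and set the hidden bit."""
    return (n1 & SIG) | HIDDEN
```

For -3.14159 (0xC0490FDB), extexp leaves 0xC0000000 and extsig leaves 0x00C90FDB.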

## frame

Allocate On-Chip Stack Frame
lframe ( n -- )
10111110
( L: ... $j_2$ $j_1$ -- ... $j_2$ $j_1$ $x_n$ ... $x_1$ ) ( $n>0$ ) 0xBE
1 or (1M to 15M) CPU-clocks
( L: ... $j_{n+1}$ $j_n$ ... $j_1$ -- ... $j_{n+2}$ $j_{n+1}$ ) ( $n<0$ )
1 or (1 to 15) CPU-clocks
( L: -- )
( $n=0$ ) 1 CPU-clock
If $n>0$, allocate $n$ uninitialized cells, $x_{n} \ldots x_{1}$, at the top of the local-register stack cache. This causes r0 to move to rn, r1 to move to r(n+1), ri to move to r(n+i), etc. Those local registers for which (n+i) > 14 are written from the local-register stack cache to memory. ( $1 \leq n \leq 15$ ).

If $n<0$, discard $|n|$ cells, $j_{n} \ldots j_{1}$, from the top of the local-register stack cache. This causes r0 through r(|n|-1) to be discarded, r|n| to become r0, r(|n|+1) to become r1, etc. ( $-15 \leq n \leq -1$ ). Each cell discarded that is not in the stack cache requires one CPU-clock cycle.

If $\mathrm{n}=0$, no cells are allocated or discarded.

sframe ( ... $j_2$ $j_1$ m n -- ... $j_2$ $j_1$ $x_n$ ... $x_1$ m n ) ( $n>0$ )
1 or (1M to 15M) CPU-clocks
( ... $j_{n+1}$ $j_n$ ... $j_1$ m n -- ... $j_{n+2}$ $j_{n+1}$ m n ) ( $n<0$ )
1 or (1 to 15) CPU-clocks
( n -- n ) ( $n=0$ ) 1 CPU-clock

If $n>0$, allocate $n$ uninitialized cells, $x_{n} \ldots x_{1}$, in the operand stack cache after s0 and s1. This causes s2 to move to s(n+2), s3 to move to s(n+3), si to move to s(n+i), etc. Those stack cells for which (n+i) > 16 are written from the operand stack cache to memory. ( $1 \leq n \leq 15$ ).

If $n<0$, discard $|n|$ cells, $j_{n} \ldots j_{1}$, from within the operand stack cache after s0 and s1. This causes s2 through s(|n|+1) to be discarded, s(|n|+2) to become s2, s(|n|+3) to become s3, etc. ( $-15 \leq n \leq -1$ ). Each cell discarded that is not in the stack cache requires one CPU-clock cycle.

If $\mathrm{n}=0$, no cells are allocated or discarded.

## iand

Bitwise Invert then AND
iand ( n1 n2 -- n3 )
clear carry 11101001
0xE9
1 CPU-clock

Clear the bits in n1 that are set in n2, leaving the result n3.

## inc

Increment
inc \#1 ( n1 -- n2 )
11001110
0xCE
1 CPU-clock
Add one to n1 giving the sum n2.
Equivalent to ANS Forth word 1+.
inc \#4 ( n1 -- n2 )
11001100
0xCC
1 CPU-clock
Add four to n1 giving the sum n2.
lcache
See _cache.


## ld

Load Indirect from Memory
ld [--r0] ( -- n )
01000100
0x44
1+M CPU-clocks
Decrement the address in r0 by four. n is the value from the cell in memory at the new address in r0. The two least significant bits of the address are ignored and treated as zero.
ld [--x] ( -- n )
01001010
0x4A
1+M CPU-clocks
Decrement the address in x by four. n is the value from the cell in memory at the new address in x. The two least significant bits of the address are ignored and treated as zero.
ld [r0++] ( -- n )
01000110
0x46
M CPU-clocks
n is the value from the cell in memory at the address in r0. Increment r0 by four. The two least significant bits of the address are ignored and treated as zero.
ld [r0] ( -- n )
01000010
0x42
M CPU-clocks
n is the value from the cell in memory at the address in r0. The two least significant bits of the address are ignored and treated as zero.
ld [x++] ( -- n )
01001001
0x49
M CPU-clocks
n is the value from the cell in memory at the address in x. Increment x by four. The two least significant bits of the address are ignored and treated as zero.
ld [x] ( -- n )
01000001
0x41
M CPU-clocks
n is the value from the cell in memory at the address in x. The two least significant bits of the address are ignored and treated as zero.
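The pre-decrement, post-increment and plain indirect forms differ only in when the address register changes relative to the access. The Python sketch below assumes memory is a dictionary keyed by cell-aligned byte addresses and that register arithmetic wraps at 32 bits. The helper names are hypothetical, and each returns the loaded cell together with the new register value.

```python
MASK32 = 0xFFFFFFFF

def read_cell(memory, addr):
    """Read a cell; the two least-significant address bits are treated as zero."""
    return memory[addr & ~3 & MASK32]

def ld_indirect(memory, reg):
    """ld [r0] / ld [x]: read at reg; reg is unchanged."""
    return read_cell(memory, reg), reg

def ld_postinc(memory, reg):
    """ld [r0++] / ld [x++]: read at reg, then increment reg by four."""
    return read_cell(memory, reg), (reg + 4) & MASK32

def ld_predec(memory, reg):
    """ld [--r0] / ld [--x]: decrement reg by four, then read at the new address."""
    reg = (reg - 4) & MASK32
    return read_cell(memory, reg), reg
```

Walking forward through a buffer with the [x++] form and back with the [--x] form visits the same cells in opposite order.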

ld [] ( addr -- n )
01000000
0x40
M CPU-clocks
n is the value from the cell in memory at the address addr. The two least significant bits of the address are ignored and treated as zero.

Equivalent to ANS Forth words @, F@, SF@.
ld.b [] ( addr -- byte )
01001000
0x48
M CPU-clocks
byte is the value from the byte in memory at the address addr.
Equivalent to ANS Forth word C@.

## ldo

Load Indirect from On-Chip Resource
ldo [] ( addr -- n )
10010110
0x96
1 CPU-clock
n is the value from the on-chip resource at addr. For valid values of addr, see On-Chip Resource Registers, page 129.
ldo.i [] ( bit_addr -- n )
10010111
0x97
1 CPU-clock
n is all ones (-1) if the bit at the on-chip resource address bit_addr is one; otherwise, n is zero. For valid values of bit_addr, see On-Chip Resource Registers, page 129.

## ldepth

See _depth.

lframe
See _frame.



## mloop_

Micro Loop on Condition
An mloop re-executes the current instruction group, beginning with the first instruction in the group, up to the mloop_ instruction, until a specified condition is not met or until ct is decremented to zero. When either termination condition occurs, execution continues with the instruction following the mloop_ opcode.


1 CPU-clock
Decrement ct by one. If ct is non-zero, transfer execution to the beginning of the current instruction group. If ct is zero, continue execution with the instruction following mloop.

```
mloopc ( -- )
Micro Loop if Carry
```

1 CPU-clock
Decrement ct by one. If ct is non-zero and carry is set, transfer execution to the beginning of the current instruction group. If ct is zero or carry is clear, continue execution with the instruction following mloopc.
mloopn
mloopnp ( n -- n ) 00111010
Micro Loop if Negative/Not Positive 0x3A
Decrement ct by one. If ct is non-zero and n is negative (neither positive nor zero), transfer execution to the beginning of the current instruction group. If ct is zero or n is not negative (either positive or zero), continue execution with the instruction following mloopn or mloopnp.

```
mloopnc ( -- )
Micro Loop if Not Carry
```

00111101

Decrement ct by one. If ct is non-zero and carry is clear, transfer execution to the beginning of the current instruction group. If ct is zero or carry is set, continue execution with the instruction following mloopnc.
mloopnn
mloopp ( n -- n ) 00111110
Micro Loop if Not Negative/Positive 0x3E
1 CPU-clock
Decrement ct by one. If ct is non-zero and n is not negative (either positive or zero) transfer execution to the beginning of the current instruction group. If ct is zero or $n$ is negative (neither positive nor zero) continue execution with the instruction following mloopnn or mloopp.

mloopnz ( n -- n )
00111111
Micro Loop if Not Zero 0x3F
1 CPU-clock
Decrement ct by one. If ct is non-zero and $n$ is not zero transfer execution to the beginning of the current instruction group. If ct is zero or $n$ is zero continue execution with the instruction following mloopnz.

```
mloopz ( n -- n ) 0011 1011
Micro Loop if Zero 0x3B
```

1 CPU-clock
Decrement ct by one. If ct is non-zero and $n$ is zero transfer execution to the beginning of the current instruction group. If ct is zero or $n$ is not zero continue execution with the instruction following mloopz.
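All mloop_ variants share the same shape: run the group, decrement ct, and repeat while ct is non-zero and the variant's condition holds. The Python sketch below is hypothetical. `body` stands for the instructions before the mloop_ opcode, and `condition` stands for the variant's test (always true for plain mloop). Per-iteration clock costs are not modeled.

```python
MASK32 = 0xFFFFFFFF

def micro_loop(ct, body, condition=lambda: True):
    """Sketch of an mloop_ instruction closing an instruction group.

    Returns the value left in ct when execution falls through.
    """
    while True:
        body()                             # instructions before mloop_
        ct = (ct - 1) & MASK32             # decrement ct by one
        if ct == 0 or not condition():     # either termination condition
            return ct                      # continue after the mloop_ opcode
```

With an mloopnz-style test on a value that the body counts down from 3, the loop stops after three passes even though ct started at 10. In that case ct is left at 7.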

## mulfs

Multiply Fast Signed
mulfs ( n1 n2 -- n3 n4 )
11010110
0xD6
2 to 32 CPU-clocks
Multiply the bit-order-reversed value n1 by the value in g0, leaving the result n4. n2 is usually zero and n3 is garbage (see below). The number of significant bits in n1 is indicated by the value in ct. All values are single-cell size and signed. ct is decremented to zero.

The program must supply n1 in bit-order-reversed form (e.g., the binary value for decimal 13 is 01101 and bit-order reversed is 10110; note that the original high-order bit is zero as a sign bit and must be included). The program must also load ct with the bit count and push a zero for n2. For the example number above, the count would be 5. n3 is typically discarded.
n2 could be non-zero, but its use in this form is questionable. The effect of n2 on the result is that the value of n2 shifted left by the bit count value in ct is added to the result, n4. n3 contains the low cell of the value remaining after n2 n1 is shifted right by the number of bits in ct. Instruction execution time is limited to 65 CPU-clock cycles by the instruction expiration counter.
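The operand preparation that mulfs requires can be sketched in Python. These hypothetical helpers produce the bit-order-reversed n1 and the ct value for a signed multiplier, including its sign bit. Pushing the zero for n2, loading g0, and the multiply itself are not modeled.

```python
def reverse_bits(value, width):
    """Reverse the order of the low `width` bits of value."""
    value &= (1 << width) - 1          # two's-complement bits, sign included
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result

def prepare_mulfs(value, width):
    """Return (n1, ct) for mulfs: the bit-order-reversed operand and the count.

    width counts every significant bit, including the sign bit.
    """
    if not -(1 << (width - 1)) <= value < (1 << (width - 1)):
        raise ValueError("value does not fit in width signed bits")
    return reverse_bits(value, width), width
```

This reproduces the manual's example: 13 in five bits (01101) becomes 10110 with a count of 5.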


## muls

Multiply Signed
muls ( n1 n2 -- n3 n4 )
11010101
0xD5
32 CPU-clocks
Multiply $n 1$ by the value in $g 0$ and add $n 2$, leaving the double result $n 4 n 3$. All values are signed.

## mulu

Multiply Unsigned
mulu ( n1 n2 -- n3 n4 )

Multiply $n 1$ by the value in $g 0$ and add $n 2$, leaving the double result $n 4 n 3$. All values are unsigned.
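The arithmetic of muls and mulu can be modeled in Python as follows. The sketch assumes n4 is the high cell and n3 the low cell, following the document's "n4 n3" double-value notation (as with n2n1 in divu). The function names are hypothetical, and timing is not modeled.

```python
MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1

def to_signed32(v):
    """Interpret a 32-bit cell as a two's-complement integer."""
    v &= MASK32
    return v - (1 << 32) if v & 0x80000000 else v

def mulu(n1, n2, g0):
    """Model of mulu: n4:n3 = n1 * g0 + n2, all unsigned. Returns (n3, n4)."""
    full = (n1 & MASK32) * (g0 & MASK32) + (n2 & MASK32)
    return full & MASK32, (full >> 32) & MASK32

def muls(n1, n2, g0):
    """Model of muls: n4:n3 = n1 * g0 + n2, all signed. Returns (n3, n4)."""
    full = (to_signed32(n1) * to_signed32(g0) + to_signed32(n2)) & MASK64
    return full & MASK32, full >> 32
```

The unsigned worst case, 0xFFFFFFFF * 0xFFFFFFFF + 0xFFFFFFFF, still fits in two cells, so the added n2 never carries out of n4.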

## mxm

Maximum
mxm ( n1 n2 -- n1 n2 ) carry set or clear 11011111
0xDF
2 CPU-clocks
Compare n2 and n1 as signed values. Set carry if n1 < n2, otherwise clear carry. Bring the larger of n1 and n2 to the top of the stack. That is, if the resulting carry is set, then n2 is greater than n1 and n2 remains on top. If the resulting carry is clear, then n2 is less than or equal to n1 and n1 is exchanged with n2.
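The effect of mxm on the top two cells and on carry can be modeled in Python. In this hypothetical sketch, the input n2 is the top of the stack. The return value lists the second cell, the top cell, and the resulting carry.

```python
def to_signed32(v):
    """Interpret a 32-bit cell as a two's-complement integer."""
    v &= 0xFFFFFFFF
    return v - (1 << 32) if v & 0x80000000 else v

def mxm(n1, n2):
    """Model of mxm. Returns (second, top, carry); the larger value ends on top."""
    carry = to_signed32(n1) < to_signed32(n2)
    if carry:
        return n1, n2, True    # n2 is larger and stays on top
    return n2, n1, False       # n1 >= n2: exchanged, n1 now on top
```

Because the comparison is signed, 0xFFFFFFFF (-1) loses to 0.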

## neg

Two's-Complement Negation
neg ( n1 -- n2 )
0xC9
1 CPU-clock
n2 is the two's-complement negation of n1.
Equivalent to Java byte code ineg.
Equivalent to ANS Forth word NEGATE.

## nop

No Operation
nop ( -- ) 11101010
0xEA
1 CPU-clock
Do nothing.
Equivalent to Java byte code nop.


## norml

Normalize Left
norml

( n1 -- n2 ) if single precision 11000111
( n1 n2 -- n3 n4 ) if double precision 0xC7
1 to 13 CPU-clocks
( L: -- addr ) only when trap processed
3+M to 15+M CPU-clocks
( L: -- addr1 addr2 ) only when both traps processed
5+2M to 17+2M CPU-clocks

While the hidden bit and the seven bits to the right of it in n1 (n2 if double) are zero, repeat the following: Shift n1 (or n2 n1 if double) left by eight bits and decrement the exponent field in ct by eight.
Then, while the hidden bit of n1 (n2 if double) is zero, repeat the following:
Shift n1 (or n2 n1 if double) left by one bit and decrement the exponent field in ct by one.
In both steps, bits shifted into bit zero of n1 come from the GRS extension.
When the operation is complete, if shifting was required and the decremented field in ct reached or passed all zero bits during the processing, an underflow exception is signaled. If no shifting is required, an underflow exception is not signaled. Then, if any bit in the GRS extension is set, a normalize exception is signaled. The location of the exponent field depends on fp_precision. If both traps are processed, the underflow trap has higher priority. Instruction execution time is limited to 65 CPU-clock cycles by the instruction expiration counter.
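The two-phase shifting of norml (byte steps, then bit steps) can be sketched in Python for single precision. This is a simplified model under stated assumptions. The function name is hypothetical. The exponent is passed as a plain integer rather than as bits of ct. The GRS extension is assumed all zero, so zeros shift in and no normalize exception arises. The input is assumed to be a significand with nothing above the hidden bit. The instruction expiration counter is not modeled.

```python
HIDDEN = 0x00800000     # single-precision hidden-bit position
TOP_BYTE = 0x00FF0000   # hidden bit and the seven bits to its right

def norml_single(sig, exp):
    """Sketch of norml for single precision with an all-zero GRS extension.

    sig: significand cell with the hidden bit at bit 23 and nothing above it.
    exp: value of the exponent field held in ct.
    Returns (sig, exp, underflow).
    """
    if not 0 < sig < (1 << 24):
        raise ValueError("sig must be a non-zero 24-bit significand")
    start = exp
    while (sig & TOP_BYTE) == 0:     # byte steps
        sig <<= 8                    # zeros enter from the (empty) GRS extension
        exp -= 8
    while (sig & HIDDEN) == 0:       # bit steps
        sig <<= 1
        exp -= 1
    underflow = exp != start and exp <= 0
    return sig, exp, underflow
```

For example, a significand of 1 with an exponent of 100 takes two byte steps and seven bit steps, ending at 0x00800000 with an exponent of 77.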


## normr

Normalize Right
normr ( n1 -- n2 ) if single precision 11000110
( n1 n2 -- n3 n4 ) if double precision 0xC6
1 to 11 CPU-clocks
( L: -- addr ) only when trap processed
3+M to 13+M CPU-clocks
( L: -- addr1 addr2 ) only when both traps processed
5+2M to 15+2M CPU-clocks
While any bit except the first bit (the hidden bit) in the exponent field is non-zero, repeat the following:
Shift n1 (or n2 n1 if double) right by one bit and increment the exponent field in ct by one. Bits shifted out of bit zero of n1 shift into the GRS extension bits.

When the operation is complete, if shifting was required and the incremented field in ct reached or passed all one bits during the processing, an overflow exception is signaled. If no shifting is required, an overflow exception is not signaled. Then, if any bit in the GRS extension is set, a normalization exception is signaled. The locations of the exponent field and hidden bit depend on fp_precision. If both traps are processed, the overflow trap has higher priority. Instruction execution time is limited to 65 CPU-clock cycles by the instruction expiration counter.

## notc

Complement Carry
notc ( -- )
carry inverted 11011101
0xDD
1 CPU-clock
Invert the state of carry.


## or

Bitwise OR
or ( n1 n2 -- n3 )
carry clear 11100000
0xE0
1 CPU-clock
Perform a bitwise OR on n1 and n2 giving the result n3.
Equivalent to Java byte code ior.
Equivalent to ANS Forth word OR.



## pop

pop ( n -- )
10110011
0xB3
1 CPU-clock
Discard n.

Equivalent to Java byte codes pop, l2i.
Equivalent when executed twice to Java byte code pop2.
Equivalent to ANS Forth words D>S, DROP, FDROP.
Equivalent when executed twice to ANS Forth word 2DROP.
pop ct ( n -- )
10110100
0xB4
1 CPU-clock
Replace the value in ct with n.
pop gi ( n -- ) 0101 xxxx
0x5?
Replace the value in gi (global register i, i.e., g0-g15) with n. To eliminate contentions on registers g1-g15, if the DMAC or the VPU is using one of these global registers when the MPU attempts access, the MPU stalls until the registers are available. Contentions are not possible on g0.
pop la ( addr -- )
10111101
0xBD
1+M CPU-clocks
Replace the value in la with cell-aligned address addr. The contents of the local-register stack cache, ... $j_n$ ... $j_1$, are discarded. The two least-significant bits of la are cleared. The bit ls_boundary is cleared. A stack refill is performed at addr+4 to initialize r0.

```
pop lstack ( n -- )
    ( L: -- n )
```

10111010
0xBA
1 CPU-clock

Remove n from the operand stack and push it onto the local-register stack (into r0). The previous contents of r0 are placed in r1, the previous contents of r1 are placed in r2, and so on.

Equivalent to ANS Forth word >R.
Equivalent when executed twice to ANS Forth word 2>R.

pop mode ( n -- )
10111001
0xB9
1 CPU-clock
Replace the value in mode with n and clear power_fail. The mode bits power_fail, ls_boundary and os_boundary are not writeable.

```
pop ri ( n -- ) 1010 xxxx
    0xA?
1 CPU-clock
```

Replace the value in ri (local register i, i.e., r0-r14) with n.
If ri is in the local-register stack cache (i ≤ ldepth), the value in ri is replaced with n. If ri is not currently in the local-register stack cache (i > ldepth), cells starting at r(ldepth+1) are read from memory sequentially to fill the cache until ri is reached. ri is then replaced with the value n.

Equivalent to Java byte codes astore_0, astore_1, astore_2, astore_3, fstore_0, fstore_1, fstore_2, fstore_3, istore_0, istore_1, istore_2, istore_3.
Equivalent when executed twice to Java byte codes dstore_0, dstore_1, dstore_2, dstore_3, lstore_0, lstore_1, lstore_2, lstore_3.
Equivalent for indexes up to fourteen (almost all actual cases) to Java byte codes astore (vindex), fstore (vindex), istore (vindex).
Equivalent when executed twice for indexes up to thirteen (almost all actual cases) to Java byte codes dstore (vindex), lstore (vindex).
pop sa ( ... $j_n$ ... $j_1$ m1 m2 addr -- m1 m2 )
10111100
0xBC
1+M CPU-clocks
Replace the value in sa with cell-aligned address addr. The contents of the operand stack cache, ... $j_n$ ... $j_1$, are discarded. The two least-significant bits of sa are cleared. The bit os_boundary is cleared. A stack refill is performed at addr+4 to initialize s2.
pop x ( n -- )
10111000
0xB8
1 CPU-clock
Replace the value in x with n.


## push

push ( n -- n n )
10010010
0x92
1 CPU-clock
Duplicate n.
Equivalent to Java byte code dup.
push ct ( -- n )
10010100
0x94
1 CPU-clock
n is the value in ct.
push gi ( -- n )
0111 xxxx 0x7?
1 CPU-clock
n is the value in gi (global register i, i.e., g0-g15). To eliminate contentions on registers g1-g15, if the DMAC or the VPU is using one of these global registers when the MPU attempts access, the MPU stalls until the registers are available. Contentions are not possible on g0.
push la ( -- addr ) 10011101
0x9D
1 CPU-clock
addr is the value in la.
push lstack ( -- n )
10011010
0x9A
( L: n -- )
1 CPU-clock
Pop n from the local-register stack (from r0) and push it onto the operand stack. The previous contents of r1 are placed in r0, the previous contents of r2 are placed in r1, and so on.

Equivalent to ANS Forth word R>.
Equivalent when executed twice to ANS Forth word 2R>.
push mode ( -- n )

10010001
0x91
1 CPU-clock
n is the value in mode.

push ri ( -- n )

1000 xxxx
0x8?
1 CPU-clock
n is the value in ri (local register i, i.e., r0-r14).

If ri is in the local-register stack cache (i ≤ ldepth), the value in ri is pushed onto the operand stack. If ri is not currently in the local-register stack cache (i > ldepth), cells starting at r(ldepth+1) are read from memory sequentially until ri is reached. The value in ri is then pushed onto the operand stack.

Equivalent to Java byte codes aload_0, aload_1, aload_2, aload_3, fload_0, fload_1, fload_2, fload_3, iload_0, iload_1, iload_2, iload_3.
Equivalent when executed twice to Java byte codes lload_0, lload_1, lload_2, lload_3, dload_0, dload_1, dload_2, dload_3.
Equivalent for indexes up to fourteen (almost all actual cases) to Java byte codes aload (vindex), fload (vindex), iload (vindex).
Equivalent when executed twice for indexes up to thirteen (almost all actual cases) to Java byte codes dload (vindex), lload (vindex).

Equivalent to ANS Forth word R@.
Equivalent when executed twice to ANS Forth word 2R@.

```
push si ( -- n )
```

| Register | Opcode | Hex |
| :--- | :--- | :--- |
| s0 | 1001 0010 | 0x92 |
| s1 | 1001 0011 | 0x93 |
| s2 | 1001 1110 | 0x9E |

1 CPU-clock

n is the value in si (operand stack register i, i.e., s0, s1 or s2).

Equivalent to Java byte code dup.
Equivalent when executed twice to Java byte code dup2.
Equivalent to ANS Forth words 2DUP, DUP, FDUP, FOVER, OVER.

```
push sa ( -- addr )
10011100
0x9C
1 CPU-clock
```

addr is the value in sa.
push x ( -- n )
10011000
0x98
1 CPU-clock
n is the value in x.

push.b \#n ( -- n )
10010000
0x90
1 CPU-clock
n is an eight-bit literal value in the range 0-255. The byte literal is encoded as the last byte in the instruction group. This allows only one unique push.b \#value per instruction group. Multiple push.b \# opcodes in the same instruction group push the same value.

Equivalent for positive values to Java byte code bipush.
Equivalent for some values to Java byte code sipush.
push.l \#n ( -- n )
01001111
0x4F
M CPU-clocks
n is a 32-bit literal value. The value is compiled as a full cell following the instruction group. Multiple push.l \# opcodes in an instruction group are compiled with data in sequential cells following the instruction group in memory. As the push.l \# opcodes are executed, the internally maintained next pc is incremented to move past each cell as it is fetched and pushed on the stack. Note that skipping a push.l \# causes the MPU to execute the literal value, because the skipped push.l \# will not have incremented next pc to move past the value.

Equivalent to Java byte codes fconst_1, fconst_2, ldc, ldc_w, sipush.
Equivalent when executed twice to Java byte code ldc2_w.
push.n \#n ( -- n )
0010 xxxx
0x2?
1 CPU-clock
n is a literal value in the range -7 to 8. The four least-significant bits of the opcode encode the value for n. The value is encoded as a two's-complement representation of n, except that -8 (1000 binary) is decoded to be +8.

Equivalent to Java byte codes aconst_null, fconst_0, iconst_m1, iconst_0, iconst_1, iconst_2, iconst_3, iconst_4, iconst_5.
Equivalent for some values to Java byte code bipush.
Equivalent when executed twice to Java byte codes dconst_0, lconst_0, lconst_1.

Equivalent to ANS Forth words FALSE, TRUE.
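The push.n literal encoding, including the special case that maps the -8 pattern to +8, can be sketched with a pair of hypothetical Python helpers. The opcode layout 0010 xxxx is taken from the description above.

```python
def encode_push_n(n):
    """Build the push.n opcode (0010 xxxx) for a literal in -7..8."""
    if not -7 <= n <= 8:
        raise ValueError("push.n literal must be in the range -7 to 8")
    return 0x20 | (n & 0x0F)      # +8 lands on 1000, the pattern -8 would use

def decode_push_n(opcode):
    """Recover the literal from a push.n opcode."""
    if (opcode & 0xF0) != 0x20:
        raise ValueError("not a push.n opcode")
    field = opcode & 0x0F
    if field == 0b1000:
        return 8                  # 1000 decodes to +8, not -8
    return field - 16 if (field & 0b1000) else field
```

For example, -1 (Forth TRUE, Java iconst_m1) encodes as 0x2F, and +8 encodes as 0x28.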


## replb

Replace Byte
replb ( n1 n2 -- n3 )
11011010
0xDA
1 CPU-clock
Replace the target byte of n2 with the least-significant byte of n1, leaving the result n3. The target byte is selected by the two least-significant bits of x, as when accessing a byte in memory.

For example, if x = 0x121, n1 = 0xCCDDEEFF, and n2 = 0x12345678, then n3 = 0x12FF5678.
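In the example above, x & 3 = 1 selects the second-most-significant byte, which implies big-endian byte numbering within a cell. The Python sketch below assumes that numbering. The function name is hypothetical.

```python
def replb(n1, n2, x):
    """Model of replb: put the low byte of n1 into the byte of n2 chosen by x.

    Byte 0 is taken to be the most significant byte (big-endian numbering),
    as implied by the manual's example.
    """
    shift = (3 - (x & 3)) * 8
    mask = 0xFF << shift
    return (n2 & ~mask & 0xFFFFFFFF) | ((n1 & 0xFF) << shift)
```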

## replexp

Replace Exponent

```
replexp ( n1 n2 -- n3 )
```

10110101
0xB5
1 CPU-clock
Replace the exponent field and sign bits of $n 1$ with the corresponding bits of $n 2$. Clear the GRS extension. The location of the exponent field depends on fp_precision.

## ret

Return
ret ( -- )
( L: addr -- )
01101110
0x6E
Return from Subroutine
M CPU-clocks
Pop addr from the local-register stack into pc to transfer execution to addr.
Equivalent to ANS Forth word EXIT.

```
reti ( -- ) 0110 1111
    ( L: addr -- )
Return from Interrupt
M CPU-clocks
```

Pop addr from the local-register stack into pc to transfer execution to addr. Clear the current interrupt-under-service bit.


## rev

Revolve Operand Stack
rev ( n1 n2 n3 -- n2 n3 n1 )

11100100
0xE4
1 CPU-clock

Rotate the top three cells of the stack to bring n1 to the top.

Equivalent to the run-time for the ANS Forth words FROT, ROT.
## rnd

Round
rnd ( n1 -- n2 )
carry
11010001
0xD1
1 CPU-clock
( L: -- addr ) only when trap processed: 3+M CPU-clocks
Round n1 giving n2. Rounding is based on fp_round_mode, the sign of ct, and the GRS extension. See Rounding, page 34. If an increment carries out of bit 31, set carry; otherwise, clear carry.

If the value of n2 is different from n1, a rounded exception is signaled. The exception is detected as a change in the value of bit zero.

## scache

See _cache.

## sdepth

See _depth.


## sexb

Sign-extend byte
sexb ( n1 -- n2 )
11011000
0xD8
1 CPU-clock

Copy the value of bit seven of $n 1$ into bits eight to thirty-one, leaving $n 2$.

Equivalent to Java byte code i2b.


## shift

The number of CPU-clock cycles required to shift the specified number of bits depends on the number of bits requested. While the count is eight or more, the value (single or double) is shifted eight bits each CPU-clock cycle. When the count becomes less than eight, the shifting is finished at one bit per CPU-clock cycle. For instance, the worst-case useful shift is 31 bits (either left or right) and takes eleven CPU-clock cycles: three 8-bit shifts and seven 1-bit shifts, plus one CPU-clock cycle for setup. A 32-bit shift takes five CPU-clock cycles (four 8-bit shifts plus setup).

The counts are modulo 64 in sign-magnitude representation, using only the six least-significant bits for the magnitude and bit 31 for the sign. A zero in the six least-significant bits represents zero. (Sign-magnitude representation here is a positive integer count in the six least-significant bits, with the middle bits ignored, and bit 31 indicating the sign: zero is positive, one is negative.)

shift ( n1 n2 -- n3 )

carry ± (n2 > 0)
11101110
0xEE
1 to 11 CPU-clocks
Shift n1 by |n2| bits leaving the result n3. If n2 is positive, the shift is to the left: each bit is shifted out through carry, and zero is shifted into each bit on the right. If n2 is negative, the shift is to the right: each bit shifted out is shifted through the GRS extension, and carry is copied into each high-order bit of n1 vacated by the shift. See the text above regarding execution time and the format of negative counts.

Equivalent to ANS Forth word LSHIFT.
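The count format and timing rule above can be expressed as a small Python model. It is a sketch, not a cycle-accurate simulator: it shifts one bit at a time (the 8-bit steps affect timing, not the result), ignores the GRS extension, and uses hypothetical names. Note that a conventional two's-complement -4 (0xFFFFFFFC) is read as a right shift of 60, not 4.

```python
def shift_count(n2: int) -> tuple[int, bool]:
    """Decode a sign-magnitude shift count: (magnitude, is_right_shift)."""
    return n2 & 0x3F, bool(n2 & 0x80000000)   # bits 5-0 and bit 31; middle bits ignored


def shift_cycles(magnitude: int) -> int:
    """One setup clock, one clock per 8-bit step, one per remaining bit."""
    return 1 + magnitude // 8 + magnitude % 8


def shift(n1: int, n2: int, carry: int) -> tuple[int, int]:
    """Model of shift, returning (n3, carry); the GRS extension is not modelled."""
    magnitude, right = shift_count(n2)
    value = n1 & 0xFFFFFFFF
    for _ in range(magnitude):
        if right:
            value = (value >> 1) | (carry << 31)   # carry fills the vacated high bit
        else:
            carry = value >> 31                    # bit shifted out goes to carry
            value = (value << 1) & 0xFFFFFFFF      # zero enters on the right
    return value, carry


print(shift_cycles(31), shift_cycles(32))          # the 11 and 5 clocks quoted above
print([hex(v) for v in shift(0x00000010, 0x80000004, 1)])
```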
shiftd ( n1 n2 n3 -- n4 n5 )
Shift Double
carry ± (n3 > 0)
11101111
0xEF
1 to 15 CPU-clocks
Shift the cell pair n2n1 by |n3| bits leaving the resulting cell pair n5n4. If n3 is positive, the shift is to the left: each bit is shifted out of n2 through carry, and zero is shifted into each bit on the right of n1. If n3 is negative, the shift is to the right: each bit shifted out of n1 is shifted through the GRS extension, and carry is copied into each high-order bit of n2 vacated by the shift. See the text above regarding execution time and the format of negative counts.


## shl_

Shift Left
shl #1 ( n1 -- n2 )
Shift Left
carry ±
11100010
0xE2

1 CPU-clock
Shift $n 1$ one bit to the left leaving the result $n 2$. The high order bit of $n 1$ shifted out goes into carry. The vacated bit on the right of n 1 is filled with zero.

Equivalent to ANS Forth word $2 *$.

shl #8 ( n1 -- n2 )
Shift Left Byte
carry ±
11101100
0xEC

1 CPU-clock
Shift n1 eight bits (one byte) to the left leaving n2. The last bit shifted out goes into carry. The vacated eight bits on the right are filled with zeros.

shld #1 ( n1 n2 -- n3 n4 )
Shift Left Double
carry ±
11100110
0xE6
1 CPU-clock

Shift the cell pair n2n1 one bit to the left leaving the result n4n3. The high-order bit of n2 shifted out goes into carry. The vacated bit on the right of n1 is filled with zero.

Equivalent to ANS Forth word D2*.

## shr_

Shift Right
shr #1 ( n1 -- n2 )
Shift Right
11100011
0xE3
1 CPU-clock
Shift n1 one bit to the right leaving the result n2. The bit shifted out is shifted into the GRS extension. The vacated bit on the left is filled with carry.
shr #8 ( n1 -- n2 )
Shift Right Byte
11101101
0xED
1 CPU-clock
Shift n1 eight bits (one byte) to the right leaving the result n2. The bits shifted out are shifted into the GRS extension. The vacated eight bits on the left are filled with carry.

shrd #1 ( n1 n2 -- n3 n4 )
Shift Right Double
11100111
0xE7
1 CPU-clock
Shift the cell pair n2n1 one bit to the right leaving the result n4n3. The bit shifted out of n1 is shifted into the GRS extension. The vacated bit on the left of n2 is filled with carry.
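In the one-bit double shifts, the cell pair n2n1 has n2 as the high cell, so one bit crosses between the cells on every shift. A Python sketch of shld #1 and shrd #1 under that reading; the GRS extension is not modelled, inputs are assumed to be 32-bit cells, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def shld1(n1: int, n2: int) -> tuple[int, int, int]:
    """shld #1: shift n2:n1 left one bit. Returns (n3, n4, carry)."""
    carry = (n2 >> 31) & 1                        # high-order bit of n2 goes to carry
    n4 = ((n2 << 1) | (n1 >> 31)) & MASK32        # bit 31 of n1 crosses into n2
    n3 = (n1 << 1) & MASK32                       # zero fills bit 0 of n1
    return n3, n4, carry


def shrd1(n1: int, n2: int, carry: int) -> tuple[int, int]:
    """shrd #1: shift n2:n1 right one bit. Returns (n3, n4)."""
    n3 = ((n1 >> 1) | ((n2 & 1) << 31)) & MASK32  # bit 0 of n2 crosses into n1
    n4 = ((n2 >> 1) | (carry << 31)) & MASK32     # carry fills bit 31 of n2
    return n3, n4
```

For instance, shld1(0x80000000, 0x00000001) returns (0, 3, 0): the top bit of the low cell moves into the high cell.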


## skip_

Skip if Condition
The skip_ instructions conditionally or unconditionally skip execution of the remainder of the instruction group. If the condition is true, skip the remainder of the instruction group and continue execution with the following instruction group. If the condition is false, continue execution with the next instruction.

WARNING: Do not skip a push.l #. Because the MPU will not have executed the push.l # opcode, the corresponding literal cell is not skipped, and the MPU will then execute the literal cell.
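The conditions tested by the skip_ family can be summarized as a lookup keyed by opcode. A Python sketch using the opcodes from the entries below and Table 39 (0x34 is step, not a skip); n is the popped cell for the forms that take one, and `skip_taken` is a hypothetical name.

```python
def skip_taken(opcode: int, carry: int, n: int = 0) -> bool:
    """True if the skip_ opcode skips the rest of the instruction group."""
    negative = bool(n & 0x80000000)      # sign bit of the 32-bit cell
    zero = (n & 0xFFFFFFFF) == 0
    conditions = {
        0x30: True,                      # skip
        0x31: bool(carry),               # skipc
        0x32: negative,                  # skipn / skipnp
        0x33: zero,                      # skipz
        0x35: not carry,                 # skipnc
        0x36: not negative,              # skipnn / skipp
        0x37: not zero,                  # skipnz
    }
    return conditions[opcode]
```

A False result corresponds to the 1 CPU-clock timing in the entries below; a True result corresponds to the Mprefetch timing.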

skip ( -- )
Skip Unconditionally
00110000
0x30
Mprefetch CPU-clocks

Unconditionally skip the remainder of the instruction group.

skipc ( -- )
Skip if Carry
00110001
0x31
1 (no carry) Mprefetch (carry) CPU-clocks
If carry is set, skip the remainder of the instruction group and continue execution with the next instruction group; otherwise, continue execution with the next instruction.

skipn
skipnp ( n -- )
Skip if Negative/Not Positive
00110010
0x32
1 (not neg) Mprefetch (neg) CPU-clocks
If n is negative (neither positive nor zero), skip the remainder of the instruction group and continue execution with the next instruction group; otherwise, continue execution with the next instruction.

skipnc ( -- )
00110101
0x35
If carry is clear, skip the remainder of the instruction group and continue execution with the next instruction group; otherwise, continue execution with the next instruction.

skipnn

skipp ( n -- )
Skip if Not Negative/Positive
00110110
0x36
1 (neg) Mprefetch (not neg) CPU-clocks
If n is not negative (either positive or zero), skip the remainder of the instruction group and continue execution with the next instruction group; otherwise, continue execution with the next instruction.

skipnz ( n -- )
Skip if Not Zero
00110111
0x37
1 (zero) Mprefetch (non-zero) CPU-clocks
If n is not zero, skip the remainder of the instruction group and continue execution with the next instruction group; otherwise, continue execution with the next instruction.

skipz ( n -- )
Skip if Zero
00110011
0x33
1 (non-zero) Mprefetch (zero) CPU-clocks

If n is zero, skip the remainder of the instruction group and continue execution with the next instruction group; otherwise, continue execution with the next instruction.

\section*{split}

Split Cell
split ( n1 -- n2 n3 )
10011001
0x99
1 CPU-clock
Split n1 into two parts so that the lower half of n1 is in the lower half of n2 and the upper half of n1 is in the lower half of n3.

For example, if n1 = 0x12345678, then n2 = 0x5678 and n3 = 0x1234.

\section*{st}

Store Indirect to Memory

st [--r0] ( n -- )
01100100
0x64
Decrement r0 by four. Store the cell n into memory at the new address in r0. The two least-significant bits of the address are ignored and treated as zero.
st [--x] ( n -- )
01101000
0x68
1+M CPU-clocks
Decrement x by four. Store the cell n into memory at the new address in x. The two least-significant bits of the address are ignored and treated as zero.

st [r0++] ( n -- )
01100110
0x66
M CPU-clocks
Store the cell n into memory at the address in r0. Increment r0 by four. The two least-significant bits of the address are ignored and treated as zero.
st [r0] ( n -- )
01100010
0x62
M CPU-clocks
Store the cell n into memory at the address in r0. The two least-significant bits of the address are ignored and treated as zero.
st [x++] ( n -- )
01101001
0x69
M CPU-clocks
Store the cell n into memory at the address in x. Increment x by four. The two least-significant bits of the address are ignored and treated as zero.
st [x] ( n -- )
01100001
0x61
M CPU-clocks
Store the cell n into memory at the address in x. The two least-significant bits of the address are ignored and treated as zero.
st [] ( n addr -- n )
01100000
0x60
M CPU-clocks
Store the cell n into memory at address addr. The two least-significant bits of the address are ignored and treated as zero.

\section*{step}

Single-Step Processor
00110100
0x34

Pop addr1 from the local-register stack into pc and continue execution at addr1 for one instruction. Then call the single-step trap location, 0x138, as a subroutine. addr2 is the address of the next instruction following addr1.

\section*{sto}

Store Indirect to On-Chip Resource
sto [] ( n addr -- n )
10110000
0xB0
1 CPU-clock
Store n into the on-chip resource register at address addr. The programmer must ensure that sto [] is not executed to access (even without changing it) any configuration register containing information for a memory group with a bus transaction in process. For valid values of addr, see On-Chip Resource Registers, page 129.

sto.i [] ( n bit_addr -- n )

```

If n is non-zero, set the bit at the on-chip resource register bit address bit_addr; otherwise, clear the bit. For valid values of bit_addr, see On-Chip Resource Registers, page 129.


\section*{sub}

Subtract
sub ( n1 n2 -- n3 )
carry ±
11001000
0xC8
1 CPU-clock
Subtract n2 from n1 leaving the difference n3. If computing the difference required a borrow, carry is set; otherwise, carry is cleared.

Equivalent to Java byte code isub.

Equivalent to ANS Forth word -.

\section*{subb}

Subtract with Borrow
subb ( n1 n2 -- n3 )
carry
11001010
0xCA
1 CPU-clock
Subtract n2 and carry from n1 leaving the difference n3. If computing the difference required a borrow, carry is set; otherwise, carry is cleared.
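Because carry records a borrow here, sub followed by subb can chain into multi-cell subtraction, low cell first. A Python sketch of both carry rules, treating cells as unsigned 32-bit values; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sub(n1: int, n2: int) -> tuple[int, int]:
    """sub: return (n1 - n2, carry); carry is 1 when a borrow was required."""
    a, b = n1 & MASK32, n2 & MASK32
    return (a - b) & MASK32, int(a < b)


def subb(n1: int, n2: int, carry: int) -> tuple[int, int]:
    """subb: return (n1 - n2 - carry, carry); carry is 1 when a borrow was required."""
    a, b = n1 & MASK32, n2 & MASK32
    return (a - b - carry) & MASK32, int(a < b + carry)


# 0x1_00000000 - 1 as a two-cell subtraction, low cell first.
lo, c = sub(0x00000000, 0x00000001)
hi, c = subb(0x00000001, 0x00000000, c)
print(hex(hi), hex(lo), c)
```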


\section*{subexp}

Subtract Exponents
subexp ( n1 n2 -- n3 n4 n5 )
11010011
0xD3
Perform the following:

Exponent_Field(n5) = Exponent_Field(n1) - Exponent_Field(n2) + BIAS - 1

Sign_Bit(n5) = Sign_Bit(n1) XOR Sign_Bit(n2)

BIAS is 127 (0x3F800000 in bit position) for single precision and 1023 (0x3FF00000 in bit position) for double precision, as selected by fp_precision.

Compute as described above. Clear the exponent-field bits and sign bit and set the hidden bit of n1 and n2, giving n3 and n4, respectively. n5 is the result of the computation. After completion, if the exponent-field calculation result equaled or exceeded the maximum value of the exponent field (exponent result ≥ 255 for single, exponent result ≥ 2047 for double), an overflow exception is signaled. If the exponent-field calculation result is less than or equal to zero, an underflow exception is signaled. When an exception is signaled, the exponent field of n5 contains as many bits of the result as it will hold.
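The exponent arithmetic and exception checks can be modelled directly. A Python sketch, assuming the field positions implied by the BIAS constants (double precision operating on the high cell) and, since the text does not say, that n5's bits outside the sign and exponent fields are zero; the GRS extension and the exception-signalling mechanism are not modelled, and the names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
SIGN = 0x80000000
FORMATS = {                     # (exponent shift, exponent width, BIAS)
    "single": (23, 8, 127),     # exponent bits 30-23
    "double": (20, 11, 1023),   # exponent bits 30-20 of the high cell (assumed)
}


def subexp(n1: int, n2: int, fp_precision: str = "single"):
    """Model of subexp, returning (n3, n4, n5, overflow, underflow)."""
    shift, width, bias = FORMATS[fp_precision]
    field = ((1 << width) - 1) << shift
    hidden = 1 << shift                        # bit just above the fraction
    e5 = ((n1 & field) >> shift) - ((n2 & field) >> shift) + bias - 1
    n3 = (n1 & ~(field | SIGN) & MASK32) | hidden
    n4 = (n2 & ~(field | SIGN) & MASK32) | hidden
    # Assumption: other n5 bits are zero; the exponent keeps only the bits that fit.
    n5 = ((e5 << shift) & field) | ((n1 ^ n2) & SIGN)
    overflow = e5 >= (1 << width) - 1          # >= 255 single, >= 2047 double
    underflow = e5 <= 0
    return n3, n4, n5, overflow, underflow


# 6.0f and 2.0f: exponents 129 and 128 give 129 - 128 + 127 - 1 = 127.
print([hex(v) for v in subexp(0x40C00000, 0x40000000)[:3]])
```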

\section*{testb}

Test Bytes for Zero
testb ( n -- n )
carry ±
11011001
0xD9
1 CPU-clock
If any byte of n is zero, set carry; otherwise, clear carry.


\section*{testexp}

Test Exponent
carry ±
11010100
0xD4
Clear the GRS extension. If the exponent field in \(n 1\) or \(n 2\) is all zeros or all ones, an exponent exception is signaled and carry is set; otherwise, carry is cleared. The location of the exponent field depends on fp_precision.

\section*{xcg}

Exchange
xcg ( n1 n2 -- n2 n1 )

10110010
0xB2
1 CPU-clock
Exchange the top two operand stack cells.

Equivalent to Java byte code swap.

Equivalent to the ANS Forth words FSWAP, SWAP.

\section*{xor}

Bitwise Exclusive OR
xor ( n1 n2 -- n3 )
carry clear
11000011
0xC3
1 CPU-clock
Perform a bitwise EXCLUSIVE OR of n1 and n2 giving the result n3.

Equivalent to Java byte code ixor.
Equivalent to ANS Forth word XOR.


Table 39. MPU Mnemonics and Opcodes (Mnemonic Order)
\begin{tabular}{|c|c|c|c|c|c|c|c|c|c|c|c|}
\hline \multicolumn{2}{|l|}{Mnemonic} & Opcode & \multicolumn{2}{|l|}{Mnemonic} & Opcode & \multicolumn{2}{|l|}{Mnemonic} & Opcode & \multicolumn{2}{|l|}{Mnemonic} & Opcode \\
\hline add & & c0 & lframe & & be & pop & r0 & a0 & push & r3 & 83 \\
\hline add & pc & bb & mloop & & 38 & pop & r1 & a1 & push & r4 & 84 \\
\hline adda & & e8 & mloopc & & 39 & pop & r2 & a2 & push & r5 & 85 \\
\hline addc & & c2 & mloopn & & 3a & pop & r3 & a3 & push & r6 & 86 \\
\hline addexp & & d2 & mloopnc & & 3d & pop & r4 & a4 & push & r7 & 87 \\
\hline and & & e1 & mloopnn & & 3e & pop & r5 & a5 & push & r8 & 88 \\
\hline bkpt & & 3c & mloopnp & & 3a & pop & r6 & a6 & push & r9 & 89 \\
\hline br & offset & 00... & mloopnz & & 3f & pop & r7 & a7 & push & r10 & 8a \\
\hline br & [] & 4b & mloopp & & 3e & pop & r8 & a8 & push & r11 & 8b \\
\hline bz & offset & 10... & mloopz & & 3b & pop & r9 & a9 & push & r12 & 8c \\
\hline call & offset & 08... & mulfs & & d6 & pop & r10 & aa & push & r13 & 8d \\
\hline call & [] & 4e & muls & & d5 & pop & r11 & ab & push & r14 & 8e \\
\hline cmp & & cb & mulu & & d7 & pop & r12 & ac & push & s0 & 92 \\
\hline copyb & & d0 & mxm & & df & pop & r13 & ad & push & s1 & 93 \\
\hline dbr & offset & 18... & neg & & c9 & pop & r14 & ae & push & s2 & 9e \\
\hline dec & \#1 & cf & nop & & ea & pop & sa & bc & push & sa & 9c \\
\hline dec & \#4 & cd & norml & & c7 & pop & x & b8 & push & x & 98 \\
\hline dec & ct & c1 & normr & & c6 & push & & 92 & push.b & \#byte & 90 \\
\hline denorm & & c5 & notc & & dd & push & ct & 94 & push.l & \#cell & 4f \\
\hline di & & b7 & or & & e0 & push & g0 & 70 & push.n & \#-7 & 29 \\
\hline divu & & de & pop & & b3 & push & g1 & 71 & push.n & \#-6 & 2a \\
\hline ei & & b6 & pop & ct & b4 & push & g2 & 72 & push.n & \#-5 & 2b \\
\hline eqz & & e5 & pop & g0 & 50 & push & g3 & 73 & push.n & \#-4 & 2c \\
\hline expdif & & c4 & pop & g1 & 51 & push & g4 & 74 & push.n & \#-3 & 2d \\
\hline extexp & & db & pop & g2 & 52 & push & g5 & 75 & push.n & \#-2 & 2e \\
\hline extsig & & dc & pop & g3 & 53 & push & g6 & 76 & push.n & \#-1 & 2f \\
\hline iand & & e9 & pop & g4 & 54 & push & g7 & 77 & push.n & \#0 & 20 \\
\hline inc & \#1 & ce & pop & g5 & 55 & push & g8 & 78 & push.n & \#1 & 21 \\
\hline inc & \#4 & cc & pop & g6 & 56 & push & g9 & 79 & push.n & \#2 & 22 \\
\hline lcache & & 4d & pop & g7 & 57 & push & g10 & 7a & push.n & \#3 & 23 \\
\hline ld & [--r0] & 44 & pop & g8 & 58 & push & g11 & 7b & push.n & \#4 & 24 \\
\hline ld & [--x] & 4a & pop & g9 & 59 & push & g12 & 7c & push.n & \#5 & 25 \\
\hline ld & [r0++] & 46 & pop & g10 & 5a & push & g13 & 7d & push.n & \#6 & 26 \\
\hline ld & [r0] & 42 & pop & g11 & 5b & push & g14 & 7e & push.n & \#7 & 27 \\
\hline ld & [x++] & 49 & pop & g12 & 5c & push & g15 & 7f & push.n & \#8 & 28 \\
\hline ld & [x] & 41 & pop & g13 & 5d & push & la & 9d & replb & & da \\
\hline ld & [] & 40 & pop & g14 & 5e & push & lstack & 9a & replexp & & b5 \\
\hline ld.b & [] & 48 & pop & g15 & 5f & push & mode & 91 & ret & & 6e \\
\hline ldo & [] & 96 & pop & la & bd & push & r0 & 80 & reti & & 6f \\
\hline ldo.i & [] & 97 & pop & lstack & ba & push & r1 & 81 & rev & & e4 \\
\hline ldepth & & 9b & pop & mode & b9 & push & r2 & 82 & rnd & & d1 \\
\hline
\end{tabular}


Table 39. MPU Mnemonics and Opcodes (Mnemonic Order, continued)
\begin{tabular}{|c|c|c|c|c|c|c|c|c|c|}
\hline Mnemonic & Opcode & Mnemonic & Opcode & \multicolumn{2}{|l|}{Mnemonic} & Opcode & \multicolumn{2}{|l|}{Mnemonic} & Opcode \\
\hline scache & 45 & shr \#8 & ed & skipz & & 33 & sto & [] & b0 \\
\hline sdepth & 9f & shrd \#1 & e7 & split & & 99 & sto.i & [] & b1 \\
\hline sexb & d8 & skip & 30 & st & [--r0] & 64 & sub & & c8 \\
\hline sframe & bf & skipc & 31 & st & [--x] & 68 & subb & & ca \\
\hline shift & ee & skipn & 32 & st & [r0++] & 66 & subexp & & d3 \\
\hline shiftd & ef & skipnc & 35 & st & [r0] & 62 & testb & & d9 \\
\hline shl \#1 & e2 & skipnn & 36 & st & [x++] & 69 & testexp & & d4 \\
\hline shl \#8 & ec & skipnp & 32 & st & [x] & 61 & xcg & & b2 \\
\hline shld \#1 & e6 & skipnz & 37 & st & [] & 60 & xor & & c3 \\
\hline shr \#1 & e3 & skipp & 36 & step & & 34 & & & \\
\hline
\end{tabular}

Table 40. MPU Mnemonics and Opcodes (Opcode Order)



Table 40. MPU Mnemonics and Opcodes (Opcode Order, continued)



\section*{Virtual Peripheral Unit}


The Virtual Peripheral Unit (VPU) is a special-purpose processing unit that executes instructions to transfer data between devices and memory, refresh dynamic memory, measure time, manipulate bit inputs and bit outputs, and perform system timing functions. With these functions the VPU can be programmed to emulate serial ports, analog-to-digital converters, digital-to-analog converters, PWM outputs, timers, and other peripherals. VPU programs are usually written to be entirely temporally deterministic. Because it can be difficult or impossible to write programs whose conditional execution paths execute in an efficient, temporally deterministic manner, the VPU has no computational ability and only minimal decision-making ability. VPU programs are intended to be relatively simple, using interrupts to the MPU to perform computation or decision making.

To ensure temporally deterministic execution, the VPU exercises absolute priority over bus access. Bus timing must always be deterministic; "wait states" of fixed length are programmed in the MIF. Temporal determinism is achieved by counting VPU-execution and bus CPU-clock cycles between the timed VPU events. Bus access is granted to the VPU unless it is executing delay, which allows MPU and DMA requests access to the bus during a specified time. Thus, when a memory access is required, the VPU simply seizes the bus and performs the required operation at precisely the programmed instant.

The MIF ensures that the bus is available when the VPU requires it. The MPU and the DMAC request the bus from the MIF, which prioritizes the requests and grants the bus while the VPU is executing delay. The MIF ensures that any transactions are completed before the delay time of the VPU expires and the VPU next requires the bus.


Figure 12. VPU Block Diagram


When transferring data, the VPU does not modify any data that is transferred; it only causes the bus transaction to occur at the programmed time. It performs time-synchronous I/O-channel transfers, as opposed to the DMAC, which prioritizes and performs asynchronous I/O-channel transfers. Other than how they are initiated, the two types of transfers are identical.

\section*{Usage}

A VPU program can be used to eliminate external logic and simplify system designs. By using the VPU for timing-dependent system and application operations, timing constraints on the MPU program can often be eliminated or greatly relaxed. Additionally, the VPU with the assistance of the MPU can emulate a wide variety of system peripherals, including serial ports, analog-to-digital converters, digital-to-analog converters, PWM outputs, timers, and other peripherals.

For example, a VPU program of about 150 bytes supplies the data transfers and timing for a video display. The program produces vertical and horizontal sync, and transfers data from DRAM to a video shift register or palette. The VPU also supplies flexibility: video data from various areas of memory could be displayed without requiring that the data be moved to create a contiguous frame buffer. As new data areas are specified, the VPU instructions are rewritten by the MPU to change the program the VPU executes for the next video frame. While this is executing, the MPU still has access to the bus to execute instructions and process data, and the DMAC still has access to the bus to transfer data.

Many other applications are possible. The VPU is best used for applications that require data to be moved, or some other event to occur, at specific times. For example:
- sending digitized 16-bit data values to a pair of DACs to play CD-quality stereo sound,
- sampling data from input devices at specified time intervals for the MPU to process later,
- sending data and control signals to display images on an LCD display,
- transferring data packets for an intelligent network interface,
- transferring synchronous data blocks for an intelligent SCSI controller,
- sending multiple channels of data to DACs for a wave-table synthesizer,
- controlling video and I/O for serial and X-Windows video terminals or PC video accelerators,
- controlling timed events in process-control environments,
- controlling ignition and fuel for automotive engines,
- inputting and outputting serial data streams,
- producing PWM output directly or for integration by an external R-C network for a low-cost digital-to-analog converter, or
- combining several of the above to significantly reduce system cost.

The VPU is designed to dictate access to the bus (to ensure temporally deterministic execution), but to be a slave to the MPU. The VPU can communicate status to the MPU by:
- the status changing on a device the VPU has accessed,
- loading a value in a global register,
- setting a bit output, or
- consuming a bit input.

The MPU can control the VPU by:
- rewriting VPU instructions in memory,
- modifying the global registers the VPU is using,
- clearing a bit input, or
- resetting the VPU.
The events controlled are not required to occur at a persistent, constant rate. The VPU is appropriate for applications whose event rates must be consistently controlled, whether the rate is set once or changed many times. As an example of the former, the VPU can take audio data from memory and send it to a DAC to play the sound at a continuous rate for as long as the audio clip lasts. As an example of the latter, the MPU can synchronize the VPU to the rotation of an automotive engine so that the VPU times fuel injection and ignition, with the MPU constantly changing the synchronization (by changing global registers or rewriting the VPU program) as it monitors engine performance.


\section*{Resources}

The VPU consists of instruction decode and execution processes, and control paths to other CPU resources, as shown in Figure 12. The VPU and related registers include:
- Bit input register, ioin: bit inputs configured as DMA or interrupt requests, or general bit inputs. See Figure 26, page 131.
- Interrupt pending register, ioip: indicates which interrupts have been recognized but are waiting to be prioritized and serviced. See Figure 27, page 132.
- Bit output register, ioout: bits that were last written by either the MPU or the VPU. See Figure 29, page 134.
- VPU reset register, vpureset: writing any value causes the VPU to begin execution at the VPU software reset address. See Figure 51, page 155.
- Global registers g1 through g7: contain values used by delay.
- Global registers g8 through g15: contain loop counts or I/O-channel transfer specifications. Transfer specifications consist of device and memory transfer addresses and control bits. See Figure 16, page 104.


Figure 13. VPU Register Usage

Register Usage

The VPU shares global registers g1-g15 with the MPU, and uses them for loop counts, delay initialization counts, and transfer information. See Figure 13. Loop counts and delay counts are 32 bits. Transfer addresses in bits 31-2 typically address cells, but can also address bytes, depending on the I/O-channel configuration. Bit one determines whether the transfer is a memory write or a memory read, and bit zero enables interrupts on 1024-byte memory page boundary crossings (see Interrupts, below). See Figure 16, page 104.

The MPU can read or write any registers used by the VPU at any time. If there is a register-access contention between the MPU and the VPU, the MPU is held off until the VPU access is complete.
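The bit layout of a transfer specification held in one of g8-g15 can be unpacked as shown below. A Python sketch: the field names are hypothetical, the text above does not say which value of bit one means write, so the sketch reports the raw bit, and the page-crossing helper assumes cell-sized (four-byte) steps.

```python
PAGE_BYTES = 1024   # page size for the optional page-crossing interrupt


def decode_transfer_spec(spec: int) -> dict:
    """Split a transfer specification (as held in g8-g15) into its fields."""
    return {
        "address": spec & 0xFFFFFFFC,      # bits 31-2: transfer address
        "direction_bit": (spec >> 1) & 1,  # bit 1: memory write vs. read (polarity not given)
        "page_interrupt": spec & 1,        # bit 0: interrupt on 1024-byte page crossing
    }


def crosses_page(address: int, step: int = 4) -> bool:
    """True if advancing address by step enters a different 1024-byte page."""
    return address // PAGE_BYTES != (address + step) // PAGE_BYTES


print(decode_transfer_spec(0x000123FE))
print(crosses_page(0x000123FC))
```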

Table 41. VPU Instructions
\begin{tabular}{ll} 
DELAY & NO OPERATION \\
DECREMENT AND SKIP & OUTPUT TRUE \\
INTERRUPT MPU & OUTPUT FALSE \\
JUMP & REFRESH \\
LOAD REGISTER & TEST INPUT AND SKIP \\
MICRO-LOOP & TRANSFER
\end{tabular}

\section*{Instruction Set}

Table 41 lists the VPU instructions; Table 44 and Table 45, page 101, list the mnemonics and opcodes. Details of instruction execution are given in Instruction Reference, page 95.

\section*{Instruction Formats}

All instructions consist of eight bits except for ld, which requires 32-bit immediate data, and jump, which requires a page-relative destination address. The use of eight-bit instructions allows up to four instructions (referred to as an instruction group) to be obtained on each instruction fetch, thus reducing memory-bandwidth requirements compared to typical 32-bit processors. This characteristic also allows looping on the instruction group (a micro-loop) without additional instruction fetches, further increasing efficiency. Instruction formats are depicted in Figure 14.

Jumps
The instruction jump is variable-length. The jump opcode can occur in any position within the instruction group. The four least-significant bits in the opcode and all of the bits in the current instruction group to the right of the opcode are used for the page-relative destination address. See Figure 14 and Table 42. The size of the encoded page-relative destination address depends on the location of the opcode within the current instruction group. The bits are used to replace the same cell-address bits within the next VPU pc. These destination addresses are cell-aligned to maximize the range of the destination address bits and the number of instructions that are executed at the destination. The next VPU pc is the cell-aligned address following the current instruction group, advanced by one cell for each ld instruction that preceded the jump in the current instruction group. If the destination address bits are not of sufficient range for the jump to reach the destination, the jump must be moved to an instruction group where more destination address bits are available.

Table 42. VPU Branch Ranges
\begin{tabular}{|c|c|}
\hline Bits & Page-Relative Range \\
\hline 4 & 64 bytes \\
\hline 12 & 4096 bytes \\
\hline 20 & 1048576 bytes \\
\hline 28 & 268435456 bytes \\
\hline \multicolumn{2}{|l|}{Encoded bits replace the same number of bits from A2 upward in the VPU next PC; A1 and A0 are zero.} \\
\hline
\end{tabular}
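The destination rule can be written out in a few lines: compute the next VPU pc, then overwrite the bits from A2 upward. A Python sketch, assuming the encoded destination bits have already been gathered from the opcode and the following bytes into one integer (their bit order is not spelled out here); the names are hypothetical.

```python
def vpu_jump_target(group_addr: int, ld_count: int, dest: int, width: int) -> int:
    """Destination of a VPU jump whose encoded field is `width` bits wide.

    group_addr: cell-aligned address of the current instruction group
    ld_count:   ld instructions preceding the jump in that group
    dest:       the encoded page-relative destination (assumed already assembled)
    """
    # Next VPU pc: the cell after the group, plus one cell per preceding ld literal.
    next_pc = group_addr + 4 + 4 * ld_count
    field = ((1 << width) - 1) << 2            # the replaced bits start at A2
    return (next_pc & ~field & 0xFFFFFFFF) | ((dest << 2) & field)


print(hex(vpu_jump_target(0x1000, 0, 0x5, 4)))     # jump in the last slot
print(hex(vpu_jump_target(0x103C, 1, 0x123, 12)))  # third slot, after one ld
```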

Literals
The instruction ld requires a total of 40 bits: eight bits for the opcode in the current instruction group, and 32 bits following the current instruction group for the literal data. The ld opcode can occur in any position within the instruction group. The data for the first ld in an instruction group immediately follows the instruction group in memory; the data for each subsequent ld occupies successive locations. The four least-significant bits in the opcode contain the number of the global register that is the destination for the data. Global register zero (g0) is not allowed.

\section*{Others}

All other instructions require eight bits. Most have a register or bit number encoded in the three or four least-significant bits of the opcode. See Instruction Reference, page 95, for details on the other individual instructions.

\section*{Execution Timing}

Counting execution CPU-clock cycles is the key to programming the VPU. Each instruction requires execution time as described in Instruction Reference, page 95. In general, instructions execute in one CPU-clock cycle, and, if they require a bus transaction, the instruction execution overlaps the time for the bus transaction. A timing with an "M" indicates the specified number of bus requests and bus transactions (memory cycles) for the instruction to complete. Bus requests require two CPU-clock cycles, and bus transaction times are as programmed and described in Programmable Memory Interface, page 117, and Bus Operation, page 157. The value used for "M" includes both the bus request and bus transaction times.

Jumps
\begin{tabular}{|c|c|c|c|c|}
\hline opcode & opcode & opcode & jump & 4-bit destination \\
\hline opcode & opcode & jump & dest & 12-bit destination \\
\hline opcode & jump & \multicolumn{2}{|l|}{destination} & 20-bit destination \\
\hline jump & \multicolumn{3}{|c|}{destination} & 28-bit destination \\
\hline
\end{tabular}
Literals
\begin{tabular}{|c|c|c|c|c|}
\hline opcode & ld \#,gn & opcode & opcode & \multirow[t]{2}{*}{load register (any position)} \\
\hline \multicolumn{4}{|c|}{data for first ld \#,gn} & \\
\hline \multicolumn{4}{|l|}{data for second ld \#,gn (if present)} & \\
\hline \multicolumn{4}{|l|}{data for third ld \#,gn (if present)} & \\
\hline \multicolumn{4}{|l|}{data for fourth ld \#,gn (if present)} & \\
\hline opcode & opcode & opcode & opcode & \\
\hline
\end{tabular}

All Others
opcode opcode opcode opcode
opcode opcode opcode opcode

Figure 14. VPU Instruction Formats

Additionally, instruction fetch between the execution of instruction groups must be considered and requires "M" CPU-clock cycles. There is no instruction prefetch in the VPU, so timing computation is simplified. When execution of the instructions in an instruction group completes, instruction fetch begins during the next CPU-clock cycle.

To ensure deterministic timing, the programmer must keep track of the addresses being accessed and whether or not a RAS cycle or a CAS cycle will occur. This is fairly simple. There are only two cases in which RAS cycles occur in the VPU:

1. A RAS cycle is forced by the VPU on the first bus transaction to each memory group after the execution of delay or refresh. This guarantees that the bus timing is known (a RAS cycle), regardless of whether the current RAS page on a memory group is the target page.
2. A RAS cycle occurs when the memory page accessed is not the current RAS page on the target memory group. While this may seem unknowable, with case 1, above, and a little care, it is easy to know whether the target page is the current page. Case 1 eliminates all possibilities of the MPU or DMA making bus access timing non-deterministic, which limits RAS cycles to those caused by the VPU program. Here, again, other than at initialization, there are only two cases:
A. Locating the VPU program to fully reside within a single RAS page, or in SRAM, eliminates RAS cycles due to instruction-fetch page crossings. Alternatively, so long as the location of the page crossing is known, the RAS cycle can be included in the VPU program's execution timing.
B. Planning data transfers with the instruction xfer allows their timing to be known and considered. Placing data transfer buffers fully within a single RAS page, or planning the starting address to know when page crossings occur, allows deterministic timing.
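For case B, the transfers in a buffer that will start a new RAS page can be found ahead of time. This Python sketch assumes a RAS page size passed in as `page_bytes` (the actual size depends on the DRAM and its configuration and is not given in this section) and cell-wide, 4-byte transfers. It returns the indices of the transfers whose address falls in a different RAS page from the previous transfer, that is, those expected to incur a RAS cycle rather than a CAS cycle. It does not model the RAS cycle forced by case 1, and the function name is hypothetical.

```python
def ras_crossings(start, count, page_bytes, step=4):
    """Indices of transfers whose address lies in a different RAS page
    than the previous transfer's address (where a RAS cycle occurs).

    start: first byte address; count: number of transfers;
    page_bytes: assumed RAS page size in bytes;
    step: bytes per transfer (4 for a cell-wide transfer).
    """
    crossings = []
    prev_page = start // page_bytes
    for i in range(1, count):
        page = (start + i * step) // page_bytes
        if page != prev_page:
            crossings.append(i)
            prev_page = page
    return crossings
```

A buffer that returns an empty list lies entirely within one RAS page, which is the placement the text recommends.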

\section*{Techniques}

Creating correct timing in a VPU program is a matter of counting instruction executions and determining the type of memory accesses and the bus transaction times involved. Most simple programs, and many complex ones, execute an infinite loop. More complex programs execute continually changing program code.
- Straight in-line code is the easiest to program as there is only one path and no inner loops. Simply count the cycles through the path to determine the timing.
- mloops are also simple to program. The first access to the instruction group requires a bus transaction, but subsequent iterations execute the instruction group without refetching the instructions.
- Counted program loops (other than mloops) are a little more complex because the exit timing is one CPU-clock cycle shorter than the looping timing. They are programmed using:

```
backward_label::
        ...                             ; put loop body here
        dskipz  gx, forward_label
        jump    backward_label
forward_label::
```

- Maintaining consistent timing on an event that is repeated throughout a program containing loops is even more complex. A good example of such a requirement is video generation, where horizontal sync must be maintained throughout the main program loop. Nested loops are used to create the top and bottom margins and the data area of the screen, and must generate precisely timed horizontal sync throughout. Separate delay values are required for the transitions into, out of, and inside each loop. When programmed appropriately, timing reduces to loading, at each point, a delay count equal to the required fixed interval minus the execution time of the instructions within that interval.
- Alternatively, loops can be unrolled at the expense of additional memory. Timing then reduces to the straight in-line case.
- Timing is also simplified by arranging duplicated timing code so that it has the same timing at each occurrence. In the video example, the horizontal sync pulse (three instructions) is always kept within a single instruction group, thus creating a fixed timing element.
- Rearranging the sequence of instructions, where the sequence is not critical, can assist in creating the correct program timing. For instance, a register load for a loop, delay, or xfer value can occur anywhere preceding the instruction that uses it. Refresh instructions also can generally occur at any convenient location, so long as the overall refresh rate is maintained.
- Often, timing is not required to be absolutely precise, nor absolutely consistent, and such tolerances make coding easier. For instance, a 40 KHz audio stream could probably be played with consistent or random timing variations of plus or minus one microsecond without the variations being audible to the listener.

A code example of a typical refresh routine is given in Table 43, and example video code is included with the Patriot software development tools.

Table 43. Code Example: VPU DRAM Refresh

```
; VPU DRAM Refresh
; A typical 256K DRAM requires 512 refreshes every
; 8 ms. That means we require a refresh every
; 15.625 us, or a total loop time below of 31.250 us,
; since we do two refreshes per loop. Assuming a RAS
; cycle with the bus request takes 11 CPU-clock cycles,
; the loop below takes 11+11+(2+delay)+11, or 35+delay,
; CPU-clock cycles to execute.

External_clock  = 50000000                      ; Hz
CPU_clock       = (External_clock * 2) / 100000 ; in units of 100 KHz
HndrdKHz_per_ns = 10000                         ; scaling factor

VPU_start::
        ; Enter here from a VPU software reset.
        ; Total time to be taken by one loop iteration in
        ; nanoseconds.
Loop_ns         = 31250
        ; Number of CPU-clocks required by
        ; instructions except delay time.
Overhead_clocks = 35
        ; Instruction overhead in nanoseconds.
Overhead_ns     = ((Overhead_clocks * HndrdKHz_per_ns) / CPU_clock)
        ; CPU-clock delay value required to achieve
        ; Loop_ns above.
Refresh_delay   = (Loop_ns - Overhead_ns) /
                  (HndrdKHz_per_ns / CPU_clock)
        ld      #Refresh_delay,g7
                                        ; Inst. Fetch, 11
VPU_Refresh_Loop::
        refresh                         ; 11
        refresh                         ; 11
        delay   g7                      ; 2+delay
        jump    VPU_Refresh_Loop        ; 11 (instruction fetch)
```
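The assembler arithmetic in Table 43 can be checked outside the assembler. This Python sketch mirrors the table's symbols (renamed to Python style) and assumes the assembler uses truncating integer division, as `//` does for these non-negative values. It derives the 31.25-µs loop target from the DRAM figures in the table's comments and computes the resulting delay count for g7.

```python
# Names mirror the Table 43 assembler symbols; values are the table's.
REFRESH_PERIOD_NS = 8_000_000                      # 8 ms to refresh all rows
REFRESH_COUNT = 512                                # refreshes per period
LOOP_NS = 2 * REFRESH_PERIOD_NS // REFRESH_COUNT   # two refreshes per loop

EXTERNAL_CLOCK_HZ = 50_000_000                     # External_clock
CPU_CLOCK = (EXTERNAL_CLOCK_HZ * 2) // 100_000     # CPU_clock, in 100-KHz units
HNDRD_KHZ_PER_NS = 10_000                          # HndrdKHz_per_ns


def refresh_delay(loop_ns=LOOP_NS, overhead_clocks=35, cpu_clock=CPU_CLOCK):
    """Delay count for g7 so that one refresh-loop iteration takes loop_ns."""
    overhead_ns = (overhead_clocks * HNDRD_KHZ_PER_NS) // cpu_clock
    ns_per_clock = HNDRD_KHZ_PER_NS // cpu_clock
    return (loop_ns - overhead_ns) // ns_per_clock
```

With these defaults the delay is 3090 CPU-clock cycles; adding the 35-cycle overhead gives 3125 cycles of 10 ns each, the 31,250-ns loop time. Note that `ns_per_clock` is exact only when the CPU clock divides the scaling factor evenly, as it does here.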

\section*{Address Space, Memory and Device Addressing}

The VPU uses the same 32-bit address space as the MPU, but has its own program counter and executes independently and concurrently. I/O devices addressed during the execution of xfer are within the same address space. xfer bus transactions are identical to I/O-channel bus transactions except for how they are initiated. See Direct Memory Access Controller, page 103.

\section*{Interrupts}

The VPU can request any of the eight MPU interrupts by executing int. The VPU can also request an MPU interrupt by accessing the last location in a 1024-byte memory page during the execution of xfer. xfer transfer interrupts and I/O-channel transfer interrupts are identical. See Direct Memory Access Controller, page 103, for more information. The MPU can respond to interrupt requests when the VPU next executes delay, which gives the MPU access to the bus.

\section*{Bus Transactions}

VPU instruction-fetch bus transactions are identical to MPU memory-read bus transactions. xfer bus transactions are identical to DMA bus transactions except for how they are initiated. See Bus Operations, page 157.

\section*{Bit Inputs and Bit Outputs}

The bit inputs in ioin are accessed by the VPU with tskipz. This instruction tests an input bit and, if the bit is zero, consumes it and skips the remainder of the instruction group. This allows for polled device transfers or complex device-transfer sequences rather than the simple asynchronous transfers available with the DMAC. See Bit Inputs, page 111. Note that since tskipz causes conditional execution, care must be taken when designing program code that contains tskipz if deterministic execution is expected.

The bit outputs in ioout can be individually set or cleared by the VPU with outt and outf. They can be used to activate external events, generate synchronization pulses, etc. See Bit Outputs, page 115.

\section*{VPU Hardware and Software Reset}

After hardware reset, the VPU begins executing at address 0x80000004, before the MPU begins execution. The VPU can then perform the RAS cycles required to initialize DRAM and begin a program loop to maintain DRAM refresh, before executing delay to allow the MPU to configure the system.


Once the MPU has configured the system, the VPU typically is required to begin execution of its application program code. The VPU power-on-reset address selects the boot memory device, usually because A31 is set and the other high address bits are zero. To clear A31 and thus begin execution in non-boot memory, a software reset must be issued by the MPU. See Table 43, page 94. The software reset is the only way to clear A31. The software reset can also be used in other instances to cause the VPU to begin execution of a new program. See Processor Startup, page 181.

\section*{Instruction Reference}

The following text contains a description of each of the VPU instructions. In addition to a functional description, the instruction opcode and the number of CPU-clock cycles required to execute are given at the right margin. See Execution Timing, page 92.


\section*{delay}

delay gi
0101 0xxx
5i hex
2+gi CPU-clocks
Load vpudelay from gi (global register i, g1-g7) and wait the specified number of CPU-clock cycles, allowing bus access for DMA and the MPU. gi is unchanged. vpudelay counts down once each CPU-clock cycle. After vpudelay reaches zero, the VPU instruction after delay executes. Note that instruction decode and termination require two CPU-clock cycles, for a total execution time of 2+gi CPU-clock cycles. Within the opcode 0101 0xxx binary, xxx is the register number (1-7).

DMA and MPU bus transactions are granted bus access only when vpudelay indicates that sufficient time remains for the complete bus transaction to occur. The first VPU memory access to each memory group after delay executes is forced to be a RAS cycle so that VPU execution timing is deterministic. See Table 53, page 126.
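A sketch of the two rules above, useful when checking a timing plan: the total time of `delay gi`, and an assumed form of the bus-slot test that admits a DMA or MPU transaction only if it fits in the time remaining in vpudelay. The exact comparison the MIF performs (including when during the current transaction it samples vpudelay) is not specified here, so `fits_in_slot` is an assumption; both function names are hypothetical.

```python
def delay_cycles(gi):
    """Total CPU-clock cycles for `delay gi`: 2 for decode and
    termination plus gi countdown cycles."""
    if gi < 0:
        raise ValueError("delay count must be non-negative")
    return 2 + gi


def fits_in_slot(vpudelay_remaining, transaction_cycles):
    """Assumed grant rule: a DMA or MPU bus transaction is granted only
    if it can complete before vpudelay reaches zero."""
    return transaction_cycles <= vpudelay_remaining
```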

\section*{dskipz}

Decrement and Skip if Zero
dskipz gi
0110 1xxx
6i hex
(not zero) 1 CPU-clock
(zero) M CPU-clocks
Decrement gi (global register i, g8-g15). If gi is then zero, skip the remainder of the instruction group and continue execution with the next instruction group; otherwise, continue execution with the next instruction. Primarily used to create program loops by following dskipz with jump. Loops can be nested by using a different global register for each level of loop counter. Within the opcode 0110 xxxx binary, xxxx is the register number (8-15).
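The one-cycle difference between looping and exiting, noted for counted loops under Techniques, follows from the dskipz timings above. This Python sketch totals a counted loop built from dskipz and jump. It assumes the loop body, dskipz, and jump share one instruction group, that every fetch costs the same M (default 11, an assumed value with no unplanned RAS cycles), and that jump's M cycles are the refetch of the group. The function name is hypothetical.

```python
def counted_loop_cycles(n, body_cycles, m=11):
    """CPU-clock cycles for a dskipz/jump loop entered with the counter
    register holding n (n >= 1); the body therefore runs n times.

    Looping pass: body + dskipz (not zero, 1 cycle) + jump (m, refetch).
    Exit pass:    body + dskipz (zero, m cycles to fetch the next group).
    The initial fetch of the loop group (m cycles) is also counted.
    """
    if n < 1:
        raise ValueError("counter must start at 1 or more")
    entry_fetch = m
    looping_pass = body_cycles + 1 + m
    exit_pass = body_cycles + m
    return entry_fetch + (n - 1) * looping_pass + exit_pass
```

With n = 3 and a 5-cycle body, each looping pass costs 17 cycles and the exit pass 16, which is the one-cycle-shorter exit the text describes.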

\section*{int}

Set Interrupt
int n
1001 0xxx
9n hex
1 CPU-clock
Set bit n of ioip to request interrupt n. Used to notify the MPU that an event has occurred. Within the opcode 1001 0xxx binary, xxx is the interrupt number (0-7).


\section*{jump}
jump destination
0011 xxxx
3? hex
M CPU-clocks
Transfer execution to the page-relative, cell-aligned destination. The bits of destination replace the same cell-address bits within the current VPU pc. The number of bits within destination depends on the position of jump within the current instruction group. See page 92. Note that because of how jump functions, it cannot change A30 or A31. A VPU software reset from the MPU is used to clear A31 after power-on reset. See VPU Power-on and Software Reset, page 94.

\section*{ld}

Load Register
ld \#value, gi
0010 xxxx
2i hex
M CPU-clocks
Load gi (global register i, g1-g15) with the 32-bit constant value. Used to load values for xfer, mloop, dskipz, and delay, or to communicate with the MPU. Within the opcode 0010 xxxx binary, xxxx is the register number (1-15).

\section*{mloop}

Micro-Loop on Register
mloop gi
0111 1xxx
7i hex
1 CPU-clock
Decrement gi (global register i, g8-g15). If gi is non-zero, transfer execution to the beginning of the instruction group. If gi is zero, continue execution with the instruction following mloop. Used to loop on sequences of up to three other instructions without requiring the instructions to be refetched from memory. Within the opcode 0111 xxxx binary, xxxx is the register number (8-15).
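Because mloop repeats its group without refetching, its cost is one fetch plus the repeated passes. A Python sketch, assuming M defaults to 11 CPU-clock cycles (an assumed value) and that the repeated instructions precede mloop in the same group; the function name is hypothetical:

```python
def mloop_cycles(n, body, m=11):
    """CPU-clock cycles for an instruction group ending in `mloop gi`,
    entered with gi == n (n >= 1), so the group body runs n times.

    body: cycle counts of the (at most three) instructions before mloop.
    The group is fetched once (m cycles); each pass then costs the body
    plus 1 cycle for mloop. The fetch of the following group is excluded.
    """
    if n < 1:
        raise ValueError("gi must start at 1 or more")
    if len(body) > 3:
        raise ValueError("mloop repeats at most three other instructions")
    return m + n * (sum(body) + 1)
```

For example, four passes over two 1-cycle instructions cost 11 + 4 × 3 = 23 cycles, against roughly one extra fetch per pass if the same work were coded as a dskipz/jump loop.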


\section*{nop}

No Operation
nop
1111 0000
F0 hex
1 CPU-clock
Do nothing. Used to waste time or as a placeholder for an instruction to be placed later.

\section*{outf}

Set Bit Output False
outf n
1011 0xxx
Bn hex
1 CPU-clock
Clear bit output n. Within the opcode 1011 0xxx binary, xxx is the bit number (0-7).

\section*{outt}

Set Bit Output True
outt n
1010 0xxx
An hex

Set bit output n. Within the opcode 1010 0xxx binary, xxx is the bit number (0-7).


\section*{refresh}

refresh
0001 0000
10 hex
M CPU-clocks

Perform a RAS-only memory refresh cycle simultaneously on all memory groups so enabled. msrra, msrha, and msra31 are used as the RAS refresh address. msrra is incremented. msrtg specifies the memory group whose RAS cycle timing is used for the refresh cycle. See Figure 44, page 151. mgXrd enables or disables refresh on each memory group. See Figure 33, page 139. VPU program code must be written to include refresh at intervals adequate for any DRAM used. The first VPU memory access to each memory group after refresh executes is forced to be a RAS cycle so that VPU execution timing is deterministic. See Table 53, page 126.

\section*{tskipz}

Test Bit Input and Skip if Zero
tskipz n
1000 0xxx
8n hex
(not zero) 1 CPU-clock
(zero) M CPU-clocks
If bit input n is zero, consume the input, skip the remainder of the instruction group, and continue execution with the next instruction group; otherwise, continue execution with the next instruction. Used to cause the VPU code to operate conditionally on bit inputs. See Bit Inputs, page 111. Within the opcode 1000 0xxx binary, xxx is the input bit number (0-7).

\section*{xfer}

Transfer Data
xfer gi
0000 1xxx
0i hex
M CPU-clocks
Cause an I/O-channel transfer to occur immediately using gi (global register i, g8-g15). gi contains the device address, memory address, and control information. See Figure 16. If bit one of gi is zero, perform a write bus transaction; if it is one, perform a read bus transaction. Increment bits 2-15 of gi. If bits 2-9 of gi are then zero (the last location in a 1024-byte memory page was just transferred) and bit zero of gi is one, assert interrupt request i-8. Within the opcode 0000 xxxx binary, xxxx is the register number (8-15).

The type of bus transaction performed depends on whether the memory group involved is cell-wide or byte-wide (see Figure 34, page 140) and on the device transfer type (see Figure 46 and Figure 47, page 152). xfer bus transactions are identical to DMA bus transactions except for how they are initiated. See Direct Memory Access Controller, page 103.


Table 44. VPU Mnemonics and Opcodes (Mnemonic Order)
\begin{tabular}{|c|c|c|c|c|c|c|c|c|c|c|c|}
\hline \multicolumn{2}{|l|}{Mnemonic} & Opcode & \multicolumn{2}{|l|}{Mnemonic} & Opcode & \multicolumn{2}{|l|}{Mnemonic} & Opcode & \multicolumn{2}{|l|}{Mnemonic} & Opcode \\
\hline delay & g1 & 51 & int & 6 & 96 & mloop & g11 & 7b & outt & 7 & a7 \\
\hline delay & g2 & 52 & int & 7 & 97 & mloop & g12 & 7c & refresh & & 10 \\
\hline delay & g3 & 53 & jump & dest & 30... & mloop & g13 & 7d & tskipz & 0 & 80 \\
\hline delay & g4 & 54 & ld & \#, g1 & 21 & mloop & g14 & 7e & tskipz & 1 & 81 \\
\hline delay & g5 & 55 & ld & \#, g2 & 22 & mloop & g15 & 7f & tskipz & 2 & 82 \\
\hline delay & g6 & 56 & ld & \#, g3 & 23 & nop & & f0 & tskipz & 3 & 83 \\
\hline delay & g7 & 57 & ld & \#, g4 & 24 & outf & 0 & b0 & tskipz & 4 & 84 \\
\hline dskipz & g8 & 68 & ld & \#, g5 & 25 & outf & 1 & b1 & tskipz & 5 & 85 \\
\hline dskipz & g9 & 69 & ld & \#, g6 & 26 & outf & 2 & b2 & tskipz & 6 & 86 \\
\hline dskipz & g10 & 6a & ld & \#, g7 & 27 & outf & 3 & b3 & tskipz & 7 & 87 \\
\hline dskipz & g11 & 6b & ld & \#, g8 & 28 & outf & 4 & b4 & xfer & g8 & 08 \\
\hline dskipz & g12 & 6c & ld & \#, g9 & 29 & outf & 5 & b5 & xfer & g9 & 09 \\
\hline dskipz & g13 & 6d & ld & \#, g10 & 2a & outf & 6 & b6 & xfer & g10 & 0a \\
\hline dskipz & g14 & 6e & ld & \#, g11 & 2b & outf & 7 & b7 & xfer & g11 & 0b \\
\hline dskipz & g15 & 6f & ld & \#, g12 & 2c & outt & 0 & a0 & xfer & g12 & 0c \\
\hline int & 0 & 90 & ld & \#, g13 & 2d & outt & 1 & a1 & xfer & g13 & 0d \\
\hline int & 1 & 91 & ld & \#, g14 & 2e & outt & 2 & a2 & xfer & g14 & 0e \\
\hline int & 2 & 92 & ld & \#, g15 & 2f & outt & 3 & a3 & xfer & g15 & 0f \\
\hline int & 3 & 93 & mloop & g8 & 78 & outt & 4 & a4 & & & \\
\hline int & 4 & 94 & mloop & g9 & 79 & outt & 5 & a5 & & & \\
\hline int & 5 & 95 & mloop & g10 & 7a & outt & 6 & a6 & & & \\
\hline
\end{tabular}
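Because every VPU opcode in Table 44 is a fixed high-order bit pattern plus a register or bit number in the low-order bits, the table can be generated rather than transcribed. This Python sketch encodes and decodes the opcode byte for every entry except jump, whose destination field depends on its position in the instruction group; sorting its output by opcode gives the opcode ordering. For ld, only the opcode byte is modeled, not the placement of the 32-bit value. The function names are hypothetical.

```python
# Opcode byte bases taken from Table 44; the low bits hold the register
# or bit number. jump is omitted (its destination bits vary with its
# position in the instruction group).
_FORMS = {
    # mnemonic: (base opcode, lowest operand, highest operand)
    "delay": (0x50, 1, 7),     # g1-g7  -> 51-57
    "dskipz": (0x60, 8, 15),   # g8-g15 -> 68-6f
    "int": (0x90, 0, 7),       # 0-7    -> 90-97
    "ld": (0x20, 1, 15),       # g1-g15 -> 21-2f
    "mloop": (0x70, 8, 15),    # g8-g15 -> 78-7f
    "outf": (0xB0, 0, 7),      # 0-7    -> b0-b7
    "outt": (0xA0, 0, 7),      # 0-7    -> a0-a7
    "tskipz": (0x80, 0, 7),    # 0-7    -> 80-87
    "xfer": (0x00, 8, 15),     # g8-g15 -> 08-0f
}
_FIXED = {"nop": 0xF0, "refresh": 0x10}


def encode(mnemonic, operand=None):
    """Return the opcode byte for a mnemonic and its register/bit number."""
    if mnemonic in _FIXED:
        if operand is not None:
            raise ValueError(f"{mnemonic} takes no operand")
        return _FIXED[mnemonic]
    base, lo, hi = _FORMS[mnemonic]
    if operand is None or not lo <= operand <= hi:
        raise ValueError(f"{mnemonic} operand must be {lo}-{hi}")
    return base + operand


def decode(byte):
    """Return (mnemonic, operand) for an opcode byte listed in Table 44."""
    for name, code in _FIXED.items():
        if byte == code:
            return name, None
    for name, (base, lo, hi) in _FORMS.items():
        if lo <= byte - base <= hi:
            return name, byte - base
    raise ValueError(f"0x{byte:02x} is not modeled (jump is omitted)")


def opcode_order():
    """Every modeled Table 44 entry as (opcode, mnemonic, operand), by opcode."""
    rows = [(code, name, None) for name, code in _FIXED.items()]
    for name, (base, lo, hi) in _FORMS.items():
        rows += [(base + n, name, n) for n in range(lo, hi + 1)]
    return sorted(rows)
```

Usage: `for code, name, n in opcode_order(): print(f"{code:02x} {name} {'' if n is None else n}")` lists the 80 modeled entries in opcode order, from `08 xfer 8` to `f0 nop`.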


Table 45. VPU Mnemonics and Opcodes (Opcode Order)



\section*{Direct Memory Access Controller}

A Direct Memory Access Controller (DMAC) allows I/O devices to transfer data to and from system memory without the intervention of the MPU. The DMAC supports eight I/O channels prioritized from eight separate sources. Direct memory access (DMA) requests are received from the bit inputs through ioin. DMA and MPU bus request priorities are either fixed, which allows higher-priority requests to block lower-priority requests, or revolving, which prevents higher-priority requests that cannot be satisfied from blocking lower-priority requests.

DMA is supported for both cell-wide and byte-wide devices in both cell-wide and byte-wide memory. Each I/O channel can be individually configured as to the type of device and bus timing requirements. Byte-wide devices transfer data on AD[7:0] and can be configured as either one-byte byte-transfer or four-byte byte-transfer devices. Transfers are flybys or are buffered, as required for the I/O-channel bus transaction. See Table 57, page 158. DMAC and VPU xfer transfers are identical except for how they are initiated. DMAC transfers occur from asynchronous requests, whereas xfer transfers occur at their programmed time.

\section*{Resources}

The DMAC consists of several registers and associated control logic. DMA request zero, which corresponds to bit zero of the registers, has the highest priority; DMA request seven, which corresponds to bit seven of the registers, has the lowest priority. The DMAC and related registers include:
- Bit input register, ioin: bit inputs configured as DMA or interrupt requests, or general bit inputs. See Figure 26, page 131.
- Interrupt enable register, ioie: indicates which ioin bits are to be recognized as interrupt requests. See Figure 30, page 135.
- DMA enable register, iodmae: indicates which ioin bits are to be recognized as DMA requests. If DMA is enabled on an ioin bit, interrupt enable by ioie on that bit is ignored. See Figure 31, page 136.
- DMA enable expiration register, iodmaex: indicates which iodmae bits are cleared following a DMA transfer involving the last location in a 1024-byte memory page occurring on that channel. See Figure 49, page 153.
- Global registers g8 through g15: contain I/O-channel transfer specifications. Transfer specifications consist of device and memory transfer addresses and control bits. See Figure 16, page 104.
- Fixed DMA priorities bit, fdmap, in register miscellaneous B, miscb: prevents or allows lower-priority bus requests to contend for access to the bus if a higher-priority request cannot be satisfied (i.e., the available bus transaction slot is too small). See Figure 34, page 140.

Figure 15. DMAC Block Diagram

Figure 16. I/O-Channel Transfer Data Format

\section*{DMA Requests}

An ioin bit is configured as a DMA request source when the corresponding iodmae bit is set and the corresponding ioie bit is clear (though ioie is ignored when iodmae is set). Once a zero reaches ioin, it is available to request a DMA I/O-channel transfer. See DMA Usage, page 112. A DMA request is forced in software by clearing the corresponding ioin bit. Individually disabling DMA operations on an I/O channel by clearing its iodmae bit prevents a corresponding zero bit in ioin from being recognized as a DMA request, but does not affect the zero-persistence of the corresponding bit in ioin.

\section*{Prioritization}

A DMA request is prioritized with other pending DMA requests, and, if the request has the highest priority or is the next request in revolving-priority sequence (see below), its corresponding I/O channel is the next to request the bus. DMA request prioritization requires one CPU-clock cycle to complete. When the I/O channel bus request is made, the MIF waits until the current bus transaction, if any, is almost complete. It then checks vpudelay to determine if the available bus slot is large enough for the required I/O channel bus transaction. If the bus slot is large enough, the bus is granted to the I/O channel, and the bus transaction begins.

The VPU always seizes the bus when vpudelay decrements to zero. Otherwise, a DMA I/O channel bus request and an MPU bus request contend for the bus, with the DMA I/O channel bus request having higher priority.

If fdmap is set and the bus slot is too small, the DMA I/O channel does not get the bus. No bus transactions then occur until either a higher-priority DMA I/O channel request that fits the shrinking available bus slot is made, or the VPU seizes the bus. When the VPU next executes delay, the highest-priority DMA request, or the MPU if there are no DMA requests, repeats the bus request process.

If fdmap is clear and the bus slot is too small, the DMA I/O channel does not get the bus. The next lower-priority bus request is then allowed to request the bus, with the MPU as the lowest-priority request. The process repeats until the bus is granted or the VPU seizes the bus. When the VPU next executes delay, the highest-priority DMA request, or the MPU if there are no DMA requests, repeats the bus request process.
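The fdmap behavior described above can be summarized as a single arbitration pass. This Python sketch is a simplification: it assumes a request is granted when its transaction cycles do not exceed the cycles left before vpudelay reaches zero, models only the fall-through to lower priorities when fdmap is clear (not the full revolving-priority sequence or the one-cycle prioritization delay), and uses hypothetical names throughout.

```python
def grant_bus(dma_requests, mpu_cycles, slot_cycles, fdmap):
    """Pick which requester gets the bus in one arbitration pass.

    dma_requests: dict mapping I/O channel number (0 = highest priority)
        to the cycles its bus transaction needs.
    mpu_cycles: cycles the pending MPU transaction needs, or None.
    slot_cycles: cycles remaining before vpudelay reaches zero.
    fdmap: True for fixed priorities, False to let lower priorities try.
    Returns a channel number, "MPU", or None (no transaction this pass).
    """
    candidates = [(ch, dma_requests[ch]) for ch in sorted(dma_requests)]
    if mpu_cycles is not None:
        candidates.append(("MPU", mpu_cycles))   # MPU is lowest priority
    for who, cycles in candidates:
        if cycles <= slot_cycles:
            return who
        if fdmap:
            return None   # the blocked highest-priority request holds off the rest
    return None
```

With channel 2 needing 20 cycles, channel 5 needing 8, and a 10-cycle slot, the sketch grants nothing when fdmap is set but grants channel 5 when it is clear.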


\section*{Memory and Device Addressing}

Addresses used for I/O channel transfers contain both the I/O device address and the memory address. By convention, the uppermost address bits (when A31 is set) select I/O device addresses, while the lower address bits select the memory source/destination for the transfer. Multi-cycle transfer operations (e.g., transferring between a byte device and cell memory) assume A31 is part of the external I/O-device address decode and pass/clear A31 to select/deselect the I/O device as required during the bus transaction. See I/O Addressing, page 158, and I/O-Channel Transfers, page 159.

1024-byte memory page boundaries have special significance to I/O-channel transfers. When each I/O-channel bus transaction completes, bits 15-2 of the memory address in the global register are incremented. The new address is evaluated to determine if the last location in a 1024-byte memory page was just transferred (by detecting that bits 9-2 are now zero). When the last location in a 1024-byte memory page was just transferred, an MPU interrupt can be requested or DMA can be disabled. See Interrupts and Terminating DMA I/O-Channel Transfers, below.
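The address update and page-boundary test can be expressed with two masks. This Python sketch increments only bits 15-2 of a 32-bit global-register value and reports whether bits 9-2 have just become zero. Discarding a carry out of bit 15, and leaving every other bit unchanged, are assumptions; see Figure 16 for the actual register layout. The names are hypothetical.

```python
ADDR_FIELD = 0xFFFC   # bits 15-2: incremented after each transaction
PAGE_FIELD = 0x03FC   # bits 9-2: zero again after a 1024-byte page


def advance(reg):
    """Return (new_reg, page_done) after one I/O-channel transaction.

    Only bits 15-2 count; a carry out of bit 15 is assumed to be
    discarded, and all other bits of the 32-bit register are preserved.
    """
    field = ((reg & ADDR_FIELD) + 4) & ADDR_FIELD
    new_reg = (reg & ~ADDR_FIELD & 0xFFFFFFFF) | field
    page_done = (new_reg & PAGE_FIELD) == 0
    return new_reg, page_done
```

A transaction at the last cell of a page, such as address field 0x3FC, yields `page_done` true, which is the point at which an interrupt or DMA disable can occur.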

\section*{Interrupts}

An MPU interrupt can be requested after an I/O-channel transfer accesses the last location in a 1024-byte memory page. The interrupt requested is the same as the I/O-channel number, and occurs if interrupts are enabled on that channel (i.e., if bit zero of the corresponding global register is set). See Figure 16, and Interrupt Controller, page 107. This allows, for example, the MPU to be notified that a transfer has completed (by aligning the end of a transfer memory area with the end of a 1024-byte memory page), or to be informed of progress during long transfers.

Note that for the interrupt to be serviced, the MPU must obtain the bus for sufficient time to execute the ISR. If the VPU does not execute delay, or continuous DMA transfers occur, the MPU will be unable to get the bus.

\section*{Bus Transaction Types}

The type of bus transaction performed with an I/O device depends on whether the memory group involved is cell-wide or byte-wide and on whether the device is a one-byte byte-transfer, four-byte byte-transfer, or one-cell cell-transfer device. See I/O-Channel Transfers, page 159.

\section*{Device Access Timing}

Any I/O device accessed during an I/O-channel transfer must complete the transfer by the end of the programmed bus cycle. Wait states are not available. Since I/O devices generally have longer access times than memory, during an I/O-channel bus cycle the programmed bus timing for the accessed memory group is modified by substituting ioXebt for the corresponding value in mgXebt. Note that ioXebt must be adequate both for the I/O device and for any memory group involved in the transfer. See Programmable Memory Interface, page 117.

\section*{Maximum Bandwidth Transfers}

When the external input source for ioin is \(\overline{\mathrm{IN}}[7:0]\), maximum-bandwidth, back-to-back DMA transfers are possible. To achieve this, at the end of the DMA bus transaction an internal circuit bypasses the input sampling circuitry to check the DMA request bit directly on \(\overline{\mathrm{IN}}[7:0]\); if the signal is low and no higher-priority requests are pending, another DMA bus request occurs immediately without the usual sampling and prioritization delays. This requires that the external DMA hardware ensure the bit is valid at this time. See Figure 80, page 217. If the remaining bus slot is large enough, the DMA bus request is granted, and the transfer starts immediately. To terminate back-to-back DMA bus transactions, the DMA request input must go high before the end of the current DMA bus transaction, or the corresponding DMA enable bit must be cleared. See Terminating DMA I/O-Channel Transfers, below. The maximum possible transfer rate is four bytes every two CPU-clock cycles. For example, with a 50-MHz 1X clock, the maximum transfer rate is 200 MB/second.
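The quoted figure follows directly, assuming (as the clock arithmetic in Table 43 does) that the CPU clock runs at twice the 1X external clock:

\[
\frac{4\ \text{bytes}}{2\ \text{CPU-clock cycles}} \times \left(2 \times 50\ \text{MHz}\right) = 2\ \frac{\text{bytes}}{\text{cycle}} \times 100 \times 10^{6}\ \frac{\text{cycles}}{\text{s}} = 200 \times 10^{6}\ \text{bytes/s}
\]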


\section*{Terminating DMA I/O-Channel Transfers}

DMA I/O-channel bus transactions occur on an I/O channel while DMA remains enabled and DMA requests are received. To limit DMA transfers to a specified number of transactions:
- program the DMA transfer address so that the last data transfer desired occurs using the last location in a 1024-byte memory page, and
- set the corresponding iodmaex bit.

When the above transaction completes, the DMA enable bit in iodmae is cleared. If the transfer interrupt is enabled in the global register for the corresponding I/O channel, a corresponding MPU interrupt is also requested.
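To apply the first step above for a transfer that fits within one page, the start address can be computed backward from the end of the page. This Python sketch assumes cell-wide transactions, each advancing the address by 4 bytes as bits 15-2 are incremented, and a page-aligned base address; the names are hypothetical.

```python
PAGE_BYTES = 1024
CELL_BYTES = 4   # bits 15-2 advance by one cell per transaction (assumed)


def dma_start_address(page_base, transfers):
    """Start address so that the last of `transfers` transactions uses the
    last location of the 1024-byte page beginning at page_base."""
    if page_base % PAGE_BYTES:
        raise ValueError("page_base must be 1024-byte aligned")
    if not 1 <= transfers <= PAGE_BYTES // CELL_BYTES:
        raise ValueError("transfers must fit within one 1024-byte page")
    last = page_base + PAGE_BYTES - CELL_BYTES
    return last - (transfers - 1) * CELL_BYTES
```

For example, 16 transactions ending in the page at 0x2000 start at 0x23C0; with the iodmaex bit set, DMA on that channel is disabled after the sixteenth.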

If more than 1024 bytes are to be transferred, enable the transfer interrupt for the I/O channel in the corresponding global register. Program the interrupt service routine to check the global register for the next-to-last 1024-byte page and, at that time, set the corresponding iodmaex bit. When the last location in the next 1024-byte page is transferred, the corresponding bit in iodmae is cleared, disabling DMA on that channel. Note that this assumes the bus is available to the MPU to execute the ISR during the DMA transfers.

\section*{Other Capabilities}

The DMAC can also be used to count events and to interrupt the MPU when a given count is reached. To do this, events are designed to produce a normal DMA memory read request, and the resulting transfer cycle increments the "address" in the corresponding global register. This "address" becomes the event counter. The MPU can also examine the register at any time to determine how many events have occurred. To interrupt the MPU after a given event count, program the global register for a negative count value within bits 9-2, and enable the page-boundary interrupt. The MPU is interrupted when the counter reaches zero.
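Presetting bits 9-2 for a given count is modular arithmetic on an eight-bit field. This Python sketch assumes the count is between 1 and 256, leaves the device address and other control bits to the caller through a hypothetical `other_bits` argument, and sets bit 0 to enable the page-boundary interrupt.

```python
INT_ENABLE = 0x001    # bit 0: transfer (page-boundary) interrupt enable
COUNT_FIELD = 0x3FC   # bits 9-2: reach zero at a 1024-byte page boundary


def event_counter_setup(count, other_bits=0):
    """Global-register value that interrupts after `count` events.

    Bits 9-2 are preset to -count (modulo 256) so that `count`
    increments bring them to zero; bit 0 enables the interrupt.
    other_bits carries the remaining address and control bits.
    """
    if not 1 <= count <= 256:
        raise ValueError("count must be 1-256 (bits 9-2 hold eight bits)")
    preset = ((-count) % 256) << 2
    return (other_bits & ~COUNT_FIELD) | preset | INT_ENABLE
```

For example, `event_counter_setup(10)` presets bits 9-2 to 246, so the tenth event brings them to zero and requests the interrupt.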


\section*{Interrupt Controller}

An interrupt controller (INTC) allows multiple external or internal requests to gain, in an orderly and prioritized manner, the attention of the MPU. The INTC supports up to eight prioritized interrupt requests from twenty-four sources. Interrupts are received from the bit inputs through ioin, from I/O-channel transfers, or from the VPU interrupt instruction int.

\section*{Resources}

The INTC consists of several registers and associated control logic. Interrupt zero, which corresponds to bit zero of the registers, has the highest priority; interrupt seven, which corresponds to bit seven of the registers, has the lowest priority. The INTC and related registers include:
- Bit input register, ioin: bit inputs configured as DMA or interrupt requests, or general bit inputs. See Figure 26, page 131.
- Interrupt enable register, ioie: indicates which ioin bits are to be recognized as interrupt requests. See Figure 30, page 135.
- Interrupt pending register, ioip: indicates which interrupts have been recognized but are waiting to be prioritized and serviced. See Figure 27, page 132.
- Interrupt under service register, ioius: indicates which interrupts are currently being serviced. See Figure 28, page 133.
- Global registers g8 through g15: contain I/O-channel transfer specifications. Transfer specifications consist of device and memory transfer addresses and control bits. Bit zero enables interrupts during I/O-channel transfers on the corresponding channel. See Figure 16, page 104.
- DMA enable register, iodmae: indicates which ioin bits are to be recognized as DMA requests. If DMA is enabled on an ioin bit, interrupt enable by ioie on that bit is ignored. See Figure 31, page 136.

Table 46. Sources of Interrupts
\begin{tabular}{|c|l|}
\hline Interrupt \# & Interrupt Source \\
\hline \multirow{3}{*}{X} & ioin bit X \\
 & I/O channel X (register …) \\
 & VPU instruction int X \\
\hline
\end{tabular}


Figure 17. INTC Block Diagram

\section*{Operation}

Each interrupt request is shared by three sources. A request can arrive from a zero bit in ioin (typically from a low external input), from an I/O-channel transfer interrupt, or from the VPU instruction int. Interrupt request zero comes from ioin bit zero, I/O channel zero (using g8), or int 0; interrupt request one comes from ioin bit one, I/O channel one (using g9), or int 1; the other interrupt requests are similarly assigned. See Table 46. Application usage typically designates only one source for an interrupt request, though this is not required.

Associated with each of the eight interrupt requests is an interrupt service routine (ISR) executable-code vector located in external memory. See Figure 5, page 16. A single ISR executable-code vector for a given interrupt request is used for all requests on that interrupt. It is programmed to contain executable code, typically a branch to the ISR. When more than one source is possible, the current source might be determined by examining the associated bits in ioin, ioie, iodmae, and the global registers.

\section*{Interrupt Request Servicing}

When an interrupt request from any source occurs, the corresponding bit in ioip is set, and the interrupt request is now a pending interrupt. Pending interrupts are prioritized each CPU-clock cycle. The interrupt_en bit in mode holds the current global interrupt enable state. It can be set with the MPU enable-interrupt instruction, ei; cleared with the disable-interrupt instruction, di; or changed by modifying mode. Globally disabling interrupts allows all interrupt requests to reach ioip, but prevents the pending interrupts in ioip from being serviced.

When interrupts are enabled, interrupts are recognized by the MPU between instruction groups, just before the execution of the first instruction in the group. This allows short, atomic, uninterruptible instruction sequences to be written easily without having to save, restore, and manipulate the interrupt state. The stack architecture allows interrupt service routines to be executed without requiring registers to be explicitly saved, and the stack caches minimize the memory accesses required when making additional register resources available.

If interrupts are globally enabled and the highest-priority ioip bit has a higher priority than the highest-priority ioius bit, the highest-priority ioip bit is cleared, the corresponding ioius bit is set, and the MPU is interrupted just before the next execution of the first instruction in an instruction group. This nests the interrupt servicing, and the pending interrupt is now the current interrupt under service. The ioip bits are not considered for interrupt servicing while interrupts are globally disabled, or while none of the ioip bits has a higher priority than the highest-priority ioius bit.

Unless software modifies ioius, the current interrupt under service is represented by the highest-priority ioius bit currently set. reti is used at the end of ISRs to clear the highest-priority ioius bit that is set and to return to the interrupted program. If the interrupted program was a lower-priority interrupt service routine, this effectively "unnests" the interrupt servicing.
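The nesting rules in the two paragraphs above can be captured in a short behavioral model. This is a sketch, not a description of the actual INTC logic: it assumes that bit n of ioip and ioius corresponds to interrupt n and that bit 0 has the highest priority (as the vector comments in Table 47 indicate). The class and method names are hypothetical.

```python
from typing import Optional


def highest_priority(bits: int) -> Optional[int]:
    """Return the number of the highest-priority (lowest-numbered) set bit."""
    if bits == 0:
        return None
    return (bits & -bits).bit_length() - 1


class IntcModel:
    """Hypothetical model of the ioip/ioius nesting rules for 8 interrupts."""

    def __init__(self) -> None:
        self.ioip = 0             # pending interrupts
        self.ioius = 0            # interrupts under service
        self.interrupt_en = True  # global enable bit in mode

    def request(self, n: int) -> None:
        """Any source (pin, transfer, VPU int) simply sets the pending bit."""
        self.ioip |= 1 << n

    def try_service(self) -> Optional[int]:
        """Evaluate at byte zero of an instruction group; return the interrupt taken."""
        if not self.interrupt_en:
            return None
        pending = highest_priority(self.ioip)
        if pending is None:
            return None
        current = highest_priority(self.ioius)
        if current is not None and pending >= current:
            return None  # not higher priority than the interrupt under service
        self.ioip &= ~(1 << pending)
        self.ioius |= 1 << pending
        return pending

    def reti(self) -> None:
        """Clear the highest-priority ioius bit, unnesting one level."""
        current = highest_priority(self.ioius)
        if current is not None:
            self.ioius &= ~(1 << current)


intc = IntcModel()
intc.request(5)
first = intc.try_service()    # 5: nothing else is under service
intc.request(6)
blocked = intc.try_service()  # None: 6 is lower priority than 5
intc.request(2)
nested = intc.try_service()   # 2: nests over 5
intc.reti()                   # back to servicing 5; 6 is still pending
```

The model evaluates priority only when `try_service` is called, mirroring the fact that interrupts are recognized only just before byte zero of an instruction group.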

\section*{External Interrupts}

An ioin bit is configured as an "external" interrupt request source if the corresponding ioie bit is set and the corresponding iodmae bit is clear. Once a zero reaches ioin, it is available to request an interrupt. An interrupt request is forced in software by clearing the corresponding ioin bit or by setting the corresponding ioip bit. Individually disabling an interrupt request by clearing its ioie bit prevents a corresponding zero bit in ioin from being recognized as an external interrupt request, but does not affect a corresponding interrupt request from another source.

While an interrupt request is being processed, until its ISR terminates by executing reti, the corresponding ioin bit is not zero-persistent and follows the sampled level of the external input pin. Specifically, for a given interrupt request, while its ioie bit is set and its ioip bit or ioius bit is set, its ioin bit is not zero-persistent. This effect can be used to disable zero-persistent behavior on non-interrupting bits.
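The configuration rule for external interrupt sources and the condition under which an ioin bit loses zero-persistence both reduce to simple mask arithmetic on the 8-bit registers. A minimal sketch, assuming each register is represented as an integer whose bit n corresponds to ioin bit n; the function names are hypothetical.

```python
def external_interrupt_sources(ioie: int, iodmae: int) -> int:
    """ioin bits acting as external interrupt requests: ioie set and iodmae clear."""
    return ioie & ~iodmae & 0xFF


def non_persistent_bits(ioie: int, ioip: int, ioius: int) -> int:
    """ioin bits that follow the sampled pin level instead of holding zeros:
    ioie set and either ioip or ioius set."""
    return ioie & (ioip | ioius) & 0xFF


# Bits 0, 1 and 7 enabled in ioie, but bit 0 is claimed by DMA.
sources = external_interrupt_sources(0x83, 0x01)  # bits 1 and 7
# Bit 1 is pending and bit 7 is under service, so neither persists zeros.
followers = non_persistent_bits(0x83, 0x02, 0x80)
```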

For waveforms, see Figure 82, page 219, and Figure 83, page 221.

\section*{I/O-Channel Transfer Interrupts}

If an ioin bit is configured as a DMA request, or if that I/O channel is used by xfer, interrupt requests occur after a transfer involving the last location in a 1024-byte memory page, provided bit zero in the corresponding global register is set (i.e., transfer interrupts are enabled). The request occurs by the corresponding ioip bit being set, and is thus not disabled by clearing the corresponding ioie bit. See Direct Memory Access Controller, page 103, and Virtual Peripheral Unit, page 89.
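The transfer-interrupt condition can be sketched as a predicate. This assumes transfers are naturally aligned (so a one-byte or four-byte transfer never straddles a page) and reads "involving the last location in a 1024-byte memory page" as "touching byte offset 0x3FF of the page". Only bit zero of the channel's global register is examined; the function name is hypothetical.

```python
PAGE_SIZE = 1024  # bytes per memory page for transfer interrupts


def transfer_interrupt_due(addr: int, size: int, global_reg: int) -> bool:
    """True if a `size`-byte transfer at `addr` should set the channel's ioip bit."""
    if not (global_reg & 1):          # bit zero clear: transfer interrupts disabled
        return False
    last_byte = addr + size - 1       # highest byte address touched by the transfer
    return last_byte % PAGE_SIZE == PAGE_SIZE - 1
```

Because the request is made by setting ioip directly, the ioie bit plays no part in this predicate.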

\section*{VPU int Interrupts}

The VPU can also directly request any of the eight available interrupts by executing int. The request occurs by the corresponding ioip bit being set, and is thus not disabled by clearing the corresponding ioie bit. The MPU is able to respond to the interrupt request when the VPU next executes delay. VPU interrupts are disabled by modifying the VPU instructions in memory to remove the instruction int.

\section*{ISR Processing}

When an interrupt request is recognized by the MPU, a call to the corresponding ISR executable-code vector is performed, and interrupts are blocked until an instruction that begins in byte one of an instruction group is executed. To service an interrupt without being interrupted by a higher-priority interrupt:
- the ISR executable-code vector typically contains a four-byte branch, and
- the first instruction group of the interrupt service routine must globally disable interrupts.
See the code example in Table 47.

If interrupts are left globally enabled during ISR processing, a higher-priority interrupt can interrupt the MPU during processing of the current ISR. This allows devices with more immediate servicing requirements to be serviced promptly even when frequent interrupts at many priority levels are occurring.

Table 47. Code Example: ISR Vectors
\begin{tabular}{lll}
\multicolumn{3}{l}{; Interrupt Vectors} \\
.quad & 4 & \\
.text & vectors & ; org 0x100 set in linker \\
br & int_0_ISR & ; highest-priority ISR \\
br & int_1_ISR & \\
... & & \\
br & int_7_ISR & ; lowest-priority ISR \\
 & & \\
.text & ISRs & ; org set in linker file \\
\multicolumn{3}{l}{; This ISR can't be interrupted because int 0} \\
\multicolumn{3}{l}{; has the highest priority.} \\
\multicolumn{3}{l}{int_0_ISR::} \\
push & mode & ; save carry \\
... & & \\
pop & mode & ; restore carry \\
reti & & \\
 & & \\
\multicolumn{3}{l}{; This ISR can be interrupted by a higher-} \\
\multicolumn{3}{l}{; priority interrupt.} \\
\multicolumn{3}{l}{int_A_ISR::} \\
push & mode & ; save carry \\
... & & \\
pop & mode & \\
reti & & \\
 & & \\
\multicolumn{3}{l}{; Don't allow this ISR to be interrupted at all.} \\
\multicolumn{3}{l}{int_B_ISR::} \\
push & mode & ; save carry \& ei state \\
di & & \\
... & & \\
pop & mode & ; interrupts re-enabled before return \\
reti & & \\
 & & \\
\multicolumn{3}{l}{int_C_ISR::} \\
push & mode & ; save carry \& ei state \\
pop & lstack & ; place accessible \\
di & & \\
\multicolumn{3}{l}{; Don't allow this critical part of the ISR to be} \\
\multicolumn{3}{l}{; interrupted.} \\
... & & \\
push & r0 & \\
pop & mode & ; restore ei state \\
\multicolumn{3}{l}{; ISR can be interrupted by higher-priority} \\
\multicolumn{3}{l}{; interrupts now.} \\
... & & \\
push & lstack & \\
pop & mode & \\
reti & & ; restore carry \\
\end{tabular}

Note that there is a delay of one CPU-clock cycle between the execution of ei, di, or pop mode and the change in the global interrupt enable state taking effect. To ensure the global interrupt enable state change takes effect before byte zero of the next instruction group, the state-changing instruction must not be the last instruction in the current instruction group.

If the global interrupt enable state is to be changed by the ISR, the prior global interrupt enable state can be saved with push mode and restored with pop mode within the ISR. Usually a pop mode, reti sequence is placed in the same instruction group at the end of the ISR to ensure that reti is executed, and the local-register stack unnests, before another interrupt is serviced. Since the return address from an ISR is always to byte zero of an instruction group (because of the way interrupts are recognized), another interrupt can be serviced immediately after execution of reti. See the code example in Table 47.

As described above for processing ISR executable-code vectors, interrupt requests are similarly blocked during the execution of all traps. This allows software to prevent, for example, further data from being pushed on the local-register stack due to interrupts during the servicing of a local-register-stack overflow exception. When resolving concurrent trap and interrupt requests, interrupts have the lowest priority.

\section*{Bit Inputs}

Eight external bit inputs are available in bit input register ioin. They are shared for use as interrupt requests, as DMA requests, as input to the VPU instruction tskipz, and as bit inputs for general use by the MPU. They are sampled from one of two external sources, selected by the state of pkgio.

\section*{Resources}

The bit inputs consist of several registers, package pins, and associated input sampling circuitry. These resources include:
- Bit input register, ioin: bit inputs configured as DMA or interrupt requests, or general bit inputs. See Figure 26, page 131.
- Interrupt enable register, ioie: indicates which ioin bits are to be recognized as interrupt requests. See Figure 30, page 135.
- Interrupt pending register, ioip: indicates which interrupts have been recognized, but are waiting to be prioritized and serviced. See Figure 27, page 132.
- Interrupt under service register, ioius: indicates which interrupts are currently being serviced. See Figure 28, page 133.
- DMA enable register, iodmae: indicates which ioin bits are to be recognized as DMA requests for the corresponding I/O channels. If DMA is enabled on an ioin bit, interrupt enable by ioie on that bit is ignored. See Figure 31, page 136.
- Package I/O pins bit, pkgio, in register miscellaneous B, miscb: selects whether the bit inputs are sampled from the dedicated inputs \(\overline{\mathrm{IN}}[7:0]\) or multiplexed off AD[7:0]. See Figure 34, page 140.

\section*{Input Sources and Sampling}

Figure 18. Bit Input Block Diagram

If pkgio is clear, the bit inputs are sampled from AD[7:0] while \(\overline{\mathrm{RAS}}\) is low and \(\overline{\mathrm{CAS}}\) is high. External hardware must place the bit inputs on AD[7:0] and remove them at the appropriate time. Using AD[7:0] for bit inputs can reduce PWB area and cost compared with using \(\overline{\mathrm{IN}}[7:0]\). AD[7:0] are sampled for input:
- while \(\overline{\mathrm{CAS}}\) is high, four CPU-clock cycles after \(\overline{\mathrm{RAS}}\) transitions low,
- every four CPU-clock cycles while \(\overline{\mathrm{CAS}}\) remains high,
- immediately before \(\overline{\mathrm{CAS}}\) transitions low if at least four CPU-clock cycles have elapsed since the last sample, and
- four CPU-clock cycles after \(\overline{\mathrm{CAS}}\) transitions high, provided \(\overline{\mathrm{CAS}}\) is still high.
This ensures:
- time for external hardware to place data on the bus before sampling,
- continuous sampling while \(\overline{\mathrm{CAS}}\) is high, and
- at least one sample every \(\overline{\mathrm{CAS}}\) bus cycle when four CPU-clock cycles have elapsed since the last sample.
To ensure sampling in a given state, an input bit must be valid at the designated sample times or remain low for a worst-case sample interval, which, as described above, depends on the programmed bus timing and activity. See Figure 83, page 221, for waveforms.

If pkgio is set, the bit inputs are sampled from \(\overline{\mathrm{IN}}[7:0]\) every four CPU-clock cycles. To ensure sampling in a given state, a bit input must be valid for just more than four CPU-clock cycles. See Figure 82, page 219, for waveforms.

All asynchronously sampled signals are susceptible to metastable conditions. To reduce the possibility of metastable conditions resulting from the sampling of the bit inputs, the samples are held for four CPU-clock cycles to resolve to a valid logic level before being made available to ioin, and thus for use within the CPU. The worst-case sampling delay for bit inputs taken from AD[7:0] to reach ioin depends on the bus cycle times. The worst-case sampling delay for bit inputs from \(\overline{\mathrm{IN}}[7:0]\) to reach ioin is eight CPU-clock cycles. Because of this delay, bit-input consumers do not detect an external signal change until up to the worst-case sampling delay has elapsed.
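The eight-cycle worst case for \(\overline{\mathrm{IN}}[7:0]\) follows from one sampling period plus the four-cycle resolution hold. A small model, assuming samples are taken on cycles that are multiples of four and that a change coinciding with a sample edge just misses that sample; the names are illustrative only.

```python
SAMPLE_PERIOD = 4   # IN[7:0] sampled every four CPU-clock cycles
RESOLVE_CYCLES = 4  # sample held four cycles before it reaches ioin


def cycles_until_ioin(change_cycle: int) -> int:
    """Cycles from an input change at `change_cycle` until ioin can reflect it."""
    next_sample = (change_cycle // SAMPLE_PERIOD + 1) * SAMPLE_PERIOD
    return next_sample + RESOLVE_CYCLES - change_cycle


delays = [cycles_until_ioin(t) for t in range(SAMPLE_PERIOD)]  # [8, 7, 6, 5]
worst_case = max(delays)
```

Only the phase of the change relative to the sampling grid matters, so checking one sampling period covers every case.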

The bit inputs reaching ioin are normally zero-persistent. That is, once an ioin bit is zero, it stays zero regardless of the bit state at subsequent samplings until the bit is "consumed" and released, or is written with a one by the MPU. Zero-persistent bits have the advantages of both edge-sensitive and level-sensitive inputs, without the noise susceptibility and non-shareability of edge-sensitive inputs. Under certain conditions during DMA request servicing and ioin interrupt servicing, the ioin bits are not zero-persistent. See DMA Usage and Interrupt Usage below. An effect of the INTC can be used to disable zero-persistent behavior on the bits. See General-Purpose Bits below.
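A single zero-persistent ioin bit behaves like a latch that captures zeros and ignores ones until it is released. The sketch below is a behavioral model only; it assumes that consuming a bit and having the MPU write a one to it both release the bit to the most recently resolved sample. The class name is hypothetical.

```python
class ZeroPersistentBit:
    """Behavioral sketch of one zero-persistent ioin bit (inputs are active low)."""

    def __init__(self) -> None:
        self.latest = 1  # most recently resolved sample
        self.value = 1   # level visible in ioin

    def sample(self, level: int) -> None:
        """A new resolved sample: a zero sticks, a one cannot clear a held zero."""
        self.latest = level
        self.value &= level

    def release(self) -> None:
        """Consumption or an MPU write of one: accept the latest sample."""
        self.value = self.latest

    def write_zero(self) -> None:
        """MPU write of zero: acts like a low-going input, with no sampling delay."""
        self.value = 0


bit = ZeroPersistentBit()
bit.sample(0)   # brief low pulse on the pin
bit.sample(1)   # pin returns high, but ioin still shows 0
held = bit.value
bit.release()   # now follows the latest sample (1)
```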

\section*{DMA Usage}

An ioin bit is configured as a DMA request source when its corresponding iodmae bit is set. After the DMA bus transaction begins, the ioin bit is consumed.

When the external input source for ioin is \(\overline{\mathrm{IN}}[7:0]\), maximum-bandwidth back-to-back DMA transfers are possible. To achieve this, an internal circuit bypasses the sampling and zero-persistence circuitry to check the DMA request bit on \(\overline{\mathrm{IN}}[7:0]\) at the end of the DMA bus transaction without the usual sampling and prioritization delays. See Maximum Bandwidth Transfers, page 105.

\section*{Interrupt Usage}

An ioin bit is configured as an interrupt request source when the corresponding ioie bit is set and the corresponding iodmae bit is clear. While an interrupt request is being processed, until its ISR terminates by executing reti, the corresponding ioin bit is not zero-persistent and follows the sampled level of the external input. Specifically, for a given interrupt request, while its ioie bit is set and its ioip bit or ioius bit is set, its ioin bit is not zero-persistent. This effect can be used to disable zero-persistent behavior on non-interrupting bits (see below).


Table 48. Code Example: Bit Input Without Zero-Persistence


\section*{General-Purpose Bits}

If an ioin bit is configured neither for interrupt requests nor for DMA requests, then it is a zero-persistent general-purpose ioin bit. Alternatively, by using an effect of the INTC, general-purpose ioin bits can be configured without zero-persistence. They are configured by setting their ioie and ioius bits. The ioius bit prevents the ioin bit from zero-persisting and from being prioritized and causing an interrupt request. Any bits so configured should be the lowest-priority ioin bits, because a set ioius bit prevents every lower-priority interrupt from being serviced. See the code example in Table 48.
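Why these bits belong at the lowest priority follows from the servicing rule: only ioip bits with a higher priority than the highest-priority ioius bit are considered. A sketch, assuming bit 0 is the highest priority and using hypothetical function names; it models register values only, not the MPU instructions that write them.

```python
from typing import Tuple


def configure_gp_no_persist(ioie: int, ioius: int, gp_mask: int) -> Tuple[int, int]:
    """Set ioie and ioius for the general-purpose bits in gp_mask."""
    return ioie | gp_mask, ioius | gp_mask


def serviceable(ioip: int, ioius: int) -> int:
    """Pending bits that may still interrupt: those above the highest-priority ioius bit."""
    if ioius == 0:
        return ioip
    current = (ioius & -ioius).bit_length() - 1
    return ioip & ((1 << current) - 1)


ioie, ioius = configure_gp_no_persist(0x01, 0x00, 0x80)  # bit 7 as GP input
ok = serviceable(0x40, ioius)    # bit 6 can still interrupt
bad = serviceable(0x40, 0x04)    # GP bit 2 would block bit 6 forever
```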

\section*{VPU Usage}

The ioin bits are used as input to tskipz. This instruction reads, tests, and consumes the bit. The ioin bits cannot be written by the VPU. General-purpose ioin bits are typically used for tskipz, but there are no hardware restrictions on usage.

\section*{MPU Usage}

Bits in ioin are read and written by the MPU as a group with ldo [ioin] and sto [ioin], or are read and written individually with ldo.i [ioXin_i] and sto.i [ioXin_i]. Writing zero bits to ioin has the same effect as though the external bit inputs had transitioned low for one sampling cycle, except that there is no sampling delay. This allows software to simulate events such as external interrupt or DMA requests. Writing one bits to ioin releases any persisting zeros so that the bits accept the current sample; ones arriving from the external inputs do not do this while the bits are zero-persistent. The written data is available immediately after the write completes. The MPU can read ioin at any time, without regard to the designations of the ioin bits, and with no effect on the state of the bits. The MPU does not consume the state of ioin bits during reads. See the code examples in Table 49.

Table 49. Code Example: MPU Usage of Bit Inputs

\begin{tabular}{lll}
\multicolumn{3}{l}{; Read last sampled state of zero-persistent bit} \\
\multicolumn{3}{l}{; inputs. (Assumes all bits are configured as} \\
\multicolumn{3}{l}{; zero-persistent.)} \\
push.n & \#-1 & ; all ones for all bits \\
push.n & & \\
sto & [] & ; temporarily remove \\
 & & ; persistence, latest \\
 & & ; sample latches \\
pop & & ; discard -1 \\
push.n & & \\
ldo & [] & ; get last sample \\
... & & \\
\end{tabular}

To perform a "real-time" external-bit-input read on zero-persistent bits, ones are written to the bits of interest in ioin before reading ioin. This releases any persisting zeros, latches the most recently resolved sample, and reads that value. Bits that are not configured as zero-persistent do not require this write. Note that any value read can be as much as two worst-case sample delays old. Reading the values currently on the external inputs requires waiting two worst-case sample delays for the values to reach ioin. See the code example in Table 50.
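The "real-time" read can be modeled on all eight bits at once: writing ones to the bits of interest replaces their persisted value with the latest resolved sample, and the other bits keep what they held. This is a hypothetical register-level model, not driver code; the eight-cycle constant applies only to bits sampled from \(\overline{\mathrm{IN}}[7:0]\), since the delay for AD[7:0] depends on bus timing.

```python
WORST_CASE_SAMPLE_DELAY_IN = 8                   # CPU-clock cycles, IN[7:0] only
MAX_READ_AGE_IN = 2 * WORST_CASE_SAMPLE_DELAY_IN  # oldest a read value can be


def realtime_read(persisted: int, latest_sample: int, mask: int) -> int:
    """Model writing ones to `mask` bits of ioin, then reading ioin."""
    released = latest_sample & mask   # released bits take the latest sample
    kept = persisted & ~mask          # other bits keep any persisting zeros
    return (released | kept) & 0xFF


# All bits currently hold zeros; the pins now read 0xAA; refresh bits 0..3 only.
value = realtime_read(0x00, 0xAA, 0x0F)
```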

Table 50. Code Example: MPU "Real-Time" Bit Input Read


\section*{Bit Outputs}

Eight general-purpose bit outputs can be set high or low by either the MPU or the VPU. The bits are available in the bit output register, ioout.

\section*{Resources}

The bit outputs consist of a register, package pins, and associated circuitry. These resources include:
- Bit output register, ioout: bits that were last written by either the MPU or the VPU. See Figure 29, page 134.
- Outputs, OUT[7:0]: the dedicated output pins.
- Address/data bus, AD[7:0]: multiplexed bit outputs on these pins while \(\overline{\mathrm{RAS}}\) is high.
- Output pin driver current bits, outdrv, in driver current register, driver: sets the drive capability of OUT[7:0]. See Figure 50, page 154.

\section*{Usage}

The bits are read and written by the MPU as a group with ldo [ioout] and sto [ioout], or are read and written individually with ldo.i [ioXout_i] and sto.i [ioXout_i].

The bit outputs are written individually by the VPU with outt and outf. The bit outputs cannot be read by the VPU.

When written, the new values are available immediately after the write completes. Note that if both the MPU and VPU write the same bit during the same CPU-clock cycle, any one bit written prevails.

The bits are always available on OUT[7:0], and on AD[7:0] when \(\overline{\mathrm{RAS}}\) is high. When the bits are taken from AD[7:0], external hardware is required to latch them when \(\overline{\mathrm{RAS}}\) falls. Note that (by definition) these bits are only updated when a RAS cycle occurs. Using AD[7:0] for output can reduce PWB area and cost compared to using OUT[7:0]. See Figure 81, page 218, for waveforms.
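When AD[7:0] carry the bit outputs, the board needs a latch that captures AD[7:0] on the falling edge of \(\overline{\mathrm{RAS}}\). The following behavioral model of such external logic (the class name and the one-call-per-cycle interface are assumptions) shows why the latched outputs change only when a RAS cycle occurs.

```python
class AdOutputLatch:
    """Hypothetical external latch: captures AD[7:0] when RAS# goes high-to-low."""

    def __init__(self) -> None:
        self.q = 0           # latched bit outputs seen by the board
        self._ras_prev = 1   # RAS# is active low; start deasserted

    def clock(self, ras_n: int, ad_low_byte: int) -> None:
        """Advance one CPU-clock cycle with the current RAS# level and AD[7:0]."""
        if self._ras_prev == 1 and ras_n == 0:   # falling edge of RAS#
            self.q = ad_low_byte & 0xFF
        self._ras_prev = ras_n


latch = AdOutputLatch()
latch.clock(1, 0x5A)   # ioout on AD[7:0] while RAS# is high: not yet latched
latch.clock(0, 0x5A)   # RAS# falls: 0x5A captured
latch.clock(0, 0xFF)   # AD now carries address/data; latch unaffected
```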

The drive capability of OUT [7:0] can be programmed in driver.


Figure 19. Bit Outputs Block Diagram

\section*{Programmable Memory Interface}

The Programmable Memory Interface (MIF) allows the timing and behavior of the CPU bus interface to be adapted to the requirements of peripheral devices with minimal external logic, thus reducing system cost while maintaining performance. A variety of memory devices are supported, including EPROM, SRAM, DRAM and VRAM, as well as a variety of I/O devices. All operations on the bus are directed by the MIF. Most aspects of the bus interface are programmable, including address setup and hold times, data setup and hold times, output buffer enable and disable times, write enable activation times, memory cycle times, DRAM-type device address multiplexing, and when DRAM-type RAS cycles occur. Additional specifications are available for I/O devices, including data setup and hold times, output buffer enable and disable times, and device transfer type (one-byte, four-byte or one-cell).

\section*{Resources}

The MIF consists of several registers, package pins, and associated control logic. These resources include:
- VRAM control bit register, vram: controls \(\overline{\mathrm{OE}}\), \(\overline{\mathrm{LWE}}\), CASes, RASes, and DSF to initiate special VRAM operations. See Figure 32, page 137.
- Miscellaneous A register, misca: controls refresh and RAS-cycle generation. See Figure 33, page 139.
- Miscellaneous B register, miscb: selects each memory group data width (cell-wide or byte-wide), and the memory bank-select architecture. See Figure 34, page 140.
- Memory system group-select mask register, msgsm: indicates which address bits are decoded to select groups of memory devices. See Figure 37, page 143.
- Memory group device size register, mgds: indicates the size and configuration of memory devices for each memory group. See Figure 38, page 144.
- Miscellaneous C register, miscc: controls RAS-cycle generation and the location of bank-select address bits for SRAM memory groups. See Figure 39, page 145.
- Memory group X extended bus timing register, mgXebt: indicates memory-cycle expansion or extension values, which create longer data setup and hold times and output buffer enable and disable times for the memory devices in the corresponding memory group. See Figure 40, page 146.
- Memory group X CAS bus timing register, mgXcasbt: indicates the unexpanded and unextended address and data strobe activation times for the CAS portion of a bus cycle. See Figure 41, page 147.
- Memory group X RAS bus timing register, mgXrasbt: indicates the RAS precharge and address hold times to be prepended to the CAS part of a bus cycle to create a RAS cycle. See Figure 42, page 149.
- I/O channel X extended bus timing register, ioXebt: indicates memory cycle expansion or extension values, which create longer data setup and hold times and output buffer enable and disable times for the I/O device on the corresponding I/O channel. See Figure 43, page 150.
- Memory system refresh address, msra: indicates the row address to be used during the next DRAM refresh cycle. See Figure 44, page 151.
- I/O device transfer types A register, iodtta: indicates the type of transfer for each of I/O channels 0, 1, 2 and 3. See Figure 46, page 152.
- I/O device transfer types B register, iodttb: indicates the type of transfer for each of I/O channels 4, 5, 6 and 7. See Figure 47, page 152.
- Driver current register, driver: indicates the relative drive current of the various output drivers. See Figure 50, page 154.

\section*{Memory System Architecture}

The MIF supports direct connection to a variety of memory and peripheral devices. The primary requirement is that the device access time be deterministic; wait states are not available because they create nondeterministic timing for the VPU. The MIF directly supports a wide range of sizes for multiplexed-address devices (DRAM, VRAM, etc.) up to 128 MB, as well as sizes for demultiplexed-address devices (SRAM, EPROM, etc.) up to 1 MB. Fast-page mode access and RAS-only refresh to DRAM-type devices are supported. SRAM-type devices appear to the MIF as DRAM with no RAS address bits and a large number of CAS address bits. See Figure 38, page 144.



Figure 20. Group-Select and Bank-Select Bit Locations
Address bits are multiplexed out of the CPU on AD[31:9] to reduce package pin count. DRAM-type devices collect the entire memory address in two pieces, referred to as the row address (upper address bits) and column address (lower address bits). Their associated bus cycles are referred to as Row Address Strobe (RAS) cycles and Column Address Strobe (CAS) cycles. With the exception of memory faults, refresh, and CAS-before-RAS VRAM cycles, a RAS cycle contains, enclosed within the \(\overline{\mathrm{RAS}}\) active period, a CAS cycle. Thus, RAS cycles are longer than CAS cycles. While RAS cycles are not required for the operation of SRAM-type devices, RAS cycles can occur for several reasons which are discussed below.

Though I/O devices can be addressed like memory for access by the MPU, I/O-channel transfers require addressing an I/O device and a memory location simultaneously. This is achieved by splitting the available 32 address bits into two areas: the lower address bits, which address memory, and the higher address bits, which address I/O devices. The location of the split depends upon application requirements for the quantity of addressable memory and I/O devices installed. The areas can overlap, if required, with the side effect that an I/O device can only transfer data with a corresponding area of memory. These higher address bits are discussed below.

Figure 21. SMB Memory Architecture

\section*{Memory Groups}

The MIF operates up to four memory groups, maintaining for each the most recent RAS address bits and a unique configuration. Up to two address bits are decoded to determine the current group. The address bits for this function are set in the memory system group-select mask register, msgsm. Each memory group is programmed for device width, bus timing, and device size (which specifies how address bits are multiplexed onto AD[31:9]). Address bits below the group-select mask are typically used to address memory devices or portions of an I/O device, and bits above the group-select mask are typically used to address I/O devices.

Figure 22. MMB Memory Architecture
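Group selection amounts to extracting the address bits named by the group-select mask and packing them into a group number. The actual msgsm encoding is given in Figure 37 and is not reproduced here; this sketch assumes, purely for illustration, a plain 32-bit mask with at most two bits set, with the lower mask bit becoming bit 0 of the group number. The function name is hypothetical.

```python
def memory_group(addr: int, select_mask: int) -> int:
    """Pack the address bits selected by `select_mask` into a group number (0-3)."""
    if bin(select_mask).count("1") > 2:
        raise ValueError("at most two group-select bits are decoded")
    group = 0
    out_bit = 0
    for bit in range(32):
        if (select_mask >> bit) & 1:
            group |= ((addr >> bit) & 1) << out_bit
            out_bit += 1
    return group


MASK = 0x0300_0000                        # illustrative: A25:A24 select the group
g = memory_group(0x0200_0000, MASK)       # A25 = 1, A24 = 0
```

With this illustrative mask, bits below A24 would address devices within a group and bits above A25 would remain available for I/O device selection.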

\section*{Memory Banks}

Each memory group can have one or more memory banks, which are selected in a manner dependent upon the bus interface mode. All memory banks within a memory group share the configuration and most recent RAS address of that group. Two address bits are decoded to determine the current memory bank.

In Single Memory Bank (SMB) mode (mmb = 0), msgsm sets the group-select and bank-select bits to be the same bits. This allows up to four groups at one bank per group, totaling four banks: group 0, bank 0; group 1, bank 1; group 2, bank 2; and group 3, bank 3. \(\overline{\mathrm{MGSx}}/\overline{\mathrm{RASx}}\) output RASx signals for direct connection to memory devices. See Figure 21.

In Multiple Memory Bank (MMB) mode (mmb = 1), depending on whether msgsm overlaps the bank-select bits, one, two or four banks can be selected in each group. This allows up to sixteen banks for all groups combined; more banks can be decoded by defining additional bank-select bits with external logic. The address bits that select the current memory bank either are located immediately above the row-address bits for DRAM devices (mgXds values 0-0x0e), or are specified by the mssbs bits for all SRAM devices in the system (mgXds value 0x0f). The group-select bits determine the \(\overline{\mathrm{MGSx}}/\overline{\mathrm{RASx}}\) (which outputs the \(\overline{\mathrm{MGS}}\) signal), and the bank-select bits determine the \(\overline{\mathrm{CASx}}\) that activates in any given bus cycle. See Figure 20. Gating the four \(\overline{\mathrm{MGSx}}\) signals with the four \(\overline{\mathrm{CASx}}\) signals creates up to sixteen memory bank selects. See Figure 22.
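Because \(\overline{\mathrm{MGSx}}\) and \(\overline{\mathrm{CASx}}\) are active low, a bank select that is asserted only when both are asserted is the OR of the two signals. A sketch of the sixteen resulting selects, assuming exactly one group and one CAS strobe are active in a given bus cycle; the function name is hypothetical.

```python
from typing import List


def bank_select_lines(group: int, cas: int) -> List[int]:
    """Active-low bank selects: select[g*4 + c] = MGS#[g] OR CAS#[c]."""
    if not (0 <= group < 4 and 0 <= cas < 4):
        raise ValueError("group and cas must be in 0..3")
    mgs_n = [0 if g == group else 1 for g in range(4)]   # one MGS# asserted
    cas_n = [0 if c == cas else 1 for c in range(4)]     # one CAS# asserted
    return [mgs_n[g] | cas_n[c] for g in range(4) for c in range(4)]


lines = bank_select_lines(2, 1)   # group 2, bank-select 1
```

Exactly one of the sixteen outputs goes low, at index `group * 4 + cas`.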

A hybrid of the two modes can also be programmed by selecting MMB mode and placing the msgsm bits so that they overlap the bank-select bits. This allows using \(\overline{\mathrm{MGSx}}\) directly as a faster chip select for SRAM-type devices than \(\overline{\mathrm{CASx}}\) is in SMB mode. For DRAM-type devices, the \(\overline{\mathrm{CASx}}\) strobes can be connected directly to the memory device, and only one NOR gate per group is required to create the RAS for that group.

\section*{Device Requirements Programming}

Each memory group can be programmed with a unique configuration of device width, device size, and bus timing. After a CPU reset, the system operates in byte-wide mode, with the slowest possible bus timing, and executes from memory group zero, typically from an external PROM. See Processor Startup, page 181. Usually, the program code in the PROM initially executes code to determine and set the proper configurations for the memory groups, I/O devices, and other requirements of the system.

\section*{Device Sizes}

Memory device sizes are programmed to one of sixteen settings in mgds. Most currently available and soon-to-be-available DRAM-type device sizes can be selected, as well as an SRAM-type option. The selection of the device size and width determines the arrangement of the address bits on AD[31:9]. See Table 51, page 122, and Table 52, page 123.

For DRAM, during both RAS and CAS cycles, some or all of the high address bits are on AD above those AD used for the RAS and CAS address bits. These high address bits can be used by the application, e.g., for decoding by external hardware to select I/O devices. On high-performance systems with fast CAS cycles, RAS cycles are often required for I/O address decoding. If the external decoding hardware is sufficiently fast, however, CAS-cycle I/O is possible.

For SRAM, to allow addressing as much memory as possible with CAS cycles, the only high address bit that appears during CAS address time is A31. I/O devices can still be selected on CAS cycles by translating the device addressing bits in software to lower address bits, provided that these translated bits do not interfere with the desired SRAM memory addressing. The device addressing bits must be translated to those address bits that appear during SRAM access on the AD that are externally decoded for I/O addressing.

\section*{Device Width}
Memory device widths are either 8 bits (byte) or 32 bits (cell), and are programmed using mgXds in miscb.

As shown in Table 51, cell-wide memory groups do not use A1 or A0 to address the memory device. All accesses to cell-wide devices are cell-aligned and transfer the entire cell. Memory device address lines are attached to the CPU on AD [x:11] ( \(x\) is determined by the device size).

Accesses to a byte-wide memory group are also cell-aligned and transfer all four bytes within the cell, from most significant to least significant (i.e., 0, 1, 2, 3). The only exception is an I/O-channel transfer with a one-byte transfer-type device, in which case only one arbitrarily addressed byte is transferred. See Bus Operation, page 157.

As shown in Table 52, byte-wide memory devices require the use of A1 and A0. Since for DRAM the RAS and CAS memory device address bits must be on the same AD, the address lines (except A31) are internally rotated left two bits. This properly places A0 on AD11 for connection to DRAM. This also means, however, that the high address bits used for I/O address decoding appear on AD differently for a byte-wide memory group than for a cell-wide memory group. Since I/O device address decoding hardware is wired to fixed AD, the address bits used to access a device are different when transferring data with a byte-wide memory device than when transferring data with a cell-wide memory device.
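One reading of "rotated left two bits (except A31)" is a 31-bit rotation of A30..A0 with A31 held in place. Under that assumption, bit position 2 of the rotated address holds A0, and the cell-wide mapping in Table 51 places bit position 2 (A2) on AD11, which is consistent with the text. A sketch with a hypothetical function name:

```python
MASK31 = 0x7FFF_FFFF  # A30..A0 take part in the rotation; A31 stays put


def byte_wide_internal_address(addr: int) -> int:
    """Rotate A30..A0 left by two bit positions, leaving A31 unchanged."""
    low = addr & MASK31
    rotated = ((low << 2) | (low >> 29)) & MASK31   # A29, A30 wrap to bits 0, 1
    return (addr & 0x8000_0000) | rotated


a0_moved = byte_wide_internal_address(0x0000_0001)   # A0 now in bit position 2
```

Because A29 and A30 wrap to the bottom of the address, high address bits decoded by external I/O hardware land on different AD lines for byte-wide groups, which is the side effect the paragraph warns about.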

Table 51. RAS/CAS Address Line Configuration, Cell Memory
\begin{tabular}{|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|}
\hline Device Size & 0, 1 & 0 & 1 & 2, 3 & 2 & 3 & 4, 5, 6 & 4 & 5 & 6 & 7, 8, 9 & 7 & 8 & 9 & 10, 11, 12 & 10 & 11 & 12 & 13, 14 & 13 & 14 & \multicolumn{2}{c|}{15} \\
\hline & 64K, 128K & 64K & 128K & 256K, 512K & 256K & 512K & 1M, 2M, 4M & 1M & 2M & 4M & 4M, 8M, 16M & 4M & 8M & 16M & 16M, 32M, 64M & 16M & 32M & 64M & 64M, 128M & 64M & 128M & \multicolumn{2}{c|}{SRAM} \\
\hline & CAS & \multicolumn{2}{c|}{RAS} & CAS & \multicolumn{2}{c|}{RAS} & CAS & \multicolumn{3}{c|}{RAS} & CAS & \multicolumn{3}{c|}{RAS} & CAS & \multicolumn{3}{c|}{RAS} & CAS & \multicolumn{2}{c|}{RAS} & CAS & RAS \\
\hline \#BITS \({ }^{1}\) & 8 & 8 & 9 & 9 & 9 & 10 & 10 & 10 & 11 & 12 & 11 & 11 & 12 & 13 & 12 & 12 & 13 & 14 & 13 & 13 & 14 & n/a & n/a \\
\hline AD9 & A0 & A0 & A0 & A0 & A0 & A0 & A0 & A0 & A0 & A0 & A0 & A0 & A0 & A0 & A0 & A0 & A0 & A0 & A0 & A0 & A0 & A0 & A0 \\
\hline AD10 & A1 & A1 & A1 & A1 & A1 & A1 & A1 & A1 & A1 & A1 & A1 & A1 & A1 & A1 & A1 & A1 & A1 & A1 & A1 & A1 & A1 & A1 & A1 \\
\hline AD11 & A2 & A10 & A10 & A2 & A11 & A11 & A2 & A12 & A12 & A12 & A2 & A13 & A13 & A13 & A2 & A14 & A14 & A14 & A2 & A15 & A15 & A2 & A11 \\
\hline AD12 & A3 & A11 & A11 & A3 & A12 & A12 & A3 & A13 & A13 & A13 & A3 & A14 & A14 & A14 & A3 & A15 & A15 & A15 & A3 & A16 & A16 & A3 & A12 \\
\hline AD13 & A4 & A12 & A12 & A4 & A13 & A13 & A4 & A14 & A14 & A14 & A4 & A15 & A15 & A15 & A4 & A16 & A16 & A16 & A4 & A17 & A17 & A4 & A13 \\
\hline AD14 & A5 & A13 & A13 & A5 & A14 & A14 & A5 & A15 & A15 & A15 & A5 & A16 & A16 & A16 & A5 & A17 & A17 & A17 & A5 & A18 & A18 & A5 & A14 \\
\hline AD15 & A6 & A14 & A14 & A6 & A15 & A15 & A6 & A16 & A16 & A16 & A6 & A17 & A17 & A17 & A6 & A18 & A18 & A18 & A6 & A19 & A19 & A6 & A15 \\
\hline AD16 & A7 & A15 & A15 & A7 & A16 & A16 & A7 & A17 & A17 & A17 & A7 & A18 & A18 & A18 & A7 & A19 & A19 & A19 & A7 & A20 & A20 & A7 & A16 \\
\hline AD17 & A8 & A16 & A16 & A8 & A17 & A17 & A8 & A18 & A18 & A18 & A8 & A19 & A19 & A19 & A8 & A20 & A20 & A20 & A8 & A21 & A21 & A8 & A17 \\
\hline AD18 & A9 & A17 & A17 & A9 & A18 & A18 & A9 & A19 & A19 & A19 & A9 & A20 & A20 & A20 & A9 & A21 & A21 & A21 & A9 & A22 & A22 & A9 & A18 \\
\hline AD19 & A18 & A18 & A18 & A10 & A19 & A19 & A10 & A20 & A20 & A20 & A10 & A21 & A21 & A21 & A10 & A22 & A22 & A22 & A10 & A23 & A23 & A10 & A19 \\
\hline AD20 & A19 & A19 & A19 & A20 & A20 & A20 & A11 & A21 & A21 & A21 & A11 & A22 & A22 & A22 & A11 & A23 & A23 & A23 & A11 & A24 & A24 & A11 & A20 \\
\hline AD21 & A20 & A20 & A20 & A21 & A21 & A21 & A21 & A21 & A22 & A22 & A12 & A23 & A23 & A23 & A12 & A24 & A24 & A24 & A12 & A25 & A25 & A12 & A21 \\
\hline AD22 & A21 & A21 & A21 & A22 & A22 & A22 & A22 & A22 & A22 & A23 & A22 & A22 & A24 & A24 & A13 & A25 & A25 & A25 & A13 & A26 & A26 & A13 & A22 \\
\hline AD23 & A22 & A22 & A22 & A23 & A23 & A23 & A23 & A23 & A23 & A23 & A23 & A23 & A23 & A25 & A23 & A23 & A26 & A26 & A14 & A27 & A27 & A14 & A23 \\
\hline AD24 & A24 & A24 & A24 & A24 & A24 & A24 & A24 & A24 & A24 & A24 & A24 & A24 & A24 & A24 & A24 & A24 & A24 & A27 & A24 & A24 & A28 & A15 & A24 \\
\hline AD25 & A25 & A25 & A25 & A25 & A25 & A25 & A25 & A25 & A25 & A25 & A25 & A25 & A25 & A25 & A25 & A25 & A25 & A25 & A25 & A25 & A25 & A16 & A25 \\
\hline AD26 & A26 & A26 & A26 & A26 & A26 & A26 & A26 & A26 & A26 & A26 & A26 & A26 & A26 & A26 & A26 & A26 & A26 & A26 & A26 & A26 & A26 & A17 & A26 \\
\hline AD27 & A27 & A27 & A27 & A27 & A27 & A27 & A27 & A27 & A27 & A27 & A27 & A27 & A27 & A27 & A27 & A27 & A27 & A27 & A27 & A27 & A27 & A18 & A27 \\
\hline AD28 & A28 & A28 & A28 & A28 & A28 & A28 & A28 & A28 & A28 & A28 & A28 & A28 & A28 & A28 & A28 & A28 & A28 & A28 & A28 & A28 & A28 & A19 & A28 \\
\hline AD29 & A29 & A29 & A29 & A29 & A29 & A29 & A29 & A29 & A29 & A29 & A29 & A29 & A29 & A29 & A29 & A29 & A29 & A29 & A29 & A29 & A29 & A20 & A29 \\
\hline AD30 & A30 & A30 & A30 & A30 & A30 & A30 & A30 & A30 & A30 & A30 & A30 & A30 & A30 & A30 & A30 & A30 & A30 & A30 & A30 & A30 & A30 & A21 & A30 \\
\hline AD31 & A31 & A31 & A31 & A31 & A31 & A31 & A31 & A31 & A31 & A31 & A31 & A31 & A31 & A31 & A31 & A31 & A31 & A31 & A31 & A31 & A31 & A31 & A31 \\
\hline
\end{tabular}

Notes:
1. \#BITS is the number of CAS or RAS address bits for the specified device size.

Location of DRAM CAS or RAS address bits for the specified device size.
Location of bank-select bits in MMB mode for the specified device size.
Table 52. RAS/CAS Address Line Configuration, Byte Memory
\begin{tabular}{|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|}
\hline Device Size & 0,1 & 0 & 1 & 2,3 & 2 & 3 & 4,5,6 & 4 & 5 & 6 & 7,8,9 & 7 & 8 & 9 & 10,11,12 & 10 & 11 & 12 & 13,14 & 13 & 14 & \multicolumn{2}{|c|}{15} \\
\hline & 64, 128K & 64K & 128K & 256, 512K & 256K & 512K & 1, 2, 4M & 1M & 2M & 4M & 4, 8, 16M & 4M & 8M & 16M & 16, 32, 64M & 16M & 32M & 64M & 64, 128M & 64M & 128M & \multicolumn{2}{|c|}{SRAM} \\
\hline & CAS & \multicolumn{2}{|c|}{RAS} & CAS & \multicolumn{2}{|c|}{RAS} & CAS & \multicolumn{3}{|c|}{RAS} & CAS & \multicolumn{3}{|c|}{RAS} & CAS & \multicolumn{3}{|c|}{RAS} & CAS & \multicolumn{2}{|c|}{RAS} & CAS & RAS \\
\hline \#BITS & 8 & 8 & 9 & 9 & 9 & 10 & 10 & 10 & 11 & 12 & 11 & 11 & 12 & 13 & 12 & 12 & 13 & 14 & 13 & 13 & 14 & n/a & n/a \\
\hline AD9 & A29 & A29 & A29 & A29 & A29 & A29 & A29 & A29 & A29 & A29 & A29 & A29 & A29 & A29 & A29 & A29 & A29 & A29 & A29 & A29 & A29 & A29 & A29 \\
\hline AD10 & A30 & A30 & A30 & A30 & A30 & A30 & A30 & A30 & A30 & A30 & A30 & A30 & A30 & A30 & A30 & A30 & A30 & A30 & A30 & A30 & A30 & A30 & A30 \\
\hline AD11 & A0 & A8 & A8 & A0 & A9 & A9 & A0 & A10 & A10 & A10 & A0 & A11 & A11 & A11 & A0 & A12 & A12 & A12 & A0 & A13 & A13 & A0 & A9 \\
\hline AD12 & A1 & A9 & A9 & A1 & A10 & A10 & A1 & A11 & A11 & A11 & A1 & A12 & A12 & A12 & A1 & A13 & A13 & A13 & A1 & A14 & A14 & A1 & A10 \\
\hline AD13 & A2 & A10 & A10 & A2 & A11 & A11 & A2 & A12 & A12 & A12 & A2 & A13 & A13 & A13 & A2 & A14 & A14 & A14 & A2 & A15 & A15 & A2 & A11 \\
\hline AD14 & A3 & A11 & A11 & A3 & A12 & A12 & A3 & A13 & A13 & A13 & A3 & A14 & A14 & A14 & A3 & A15 & A15 & A15 & A3 & A16 & A16 & A3 & A12 \\
\hline AD15 & A4 & A12 & A12 & A4 & A13 & A13 & A4 & A14 & A14 & A14 & A4 & A15 & A15 & A15 & A4 & A16 & A16 & A16 & A4 & A17 & A17 & A4 & A13 \\
\hline AD16 & A5 & A13 & A13 & A5 & A14 & A14 & A5 & A15 & A15 & A15 & A5 & A16 & A16 & A16 & A5 & A17 & A17 & A17 & A5 & A18 & A18 & A5 & A14 \\
\hline AD17 & A6 & A14 & A14 & A6 & A15 & A15 & A6 & A16 & A16 & A16 & A6 & A17 & A17 & A17 & A6 & A18 & A18 & A18 & A6 & A19 & A19 & A6 & A15 \\
\hline AD18 & A7 & A15 & A15 & A7 & A16 & A16 & A7 & A17 & A17 & A17 & A7 & A18 & A18 & A18 & A7 & A19 & A19 & A19 & A7 & A20 & A20 & A7 & A16 \\
\hline AD19 & A16 & A16 & A16 & A8 & A17 & A17 & A8 & A18 & A18 & A18 & A8 & A19 & A19 & A19 & A8 & A20 & A20 & A20 & A8 & A21 & A21 & A8 & A17 \\
\hline AD20 & A17 & A17 & A17 & A18 & A18 & A18 & A9 & A19 & A19 & A19 & A9 & A20 & A20 & A20 & A9 & A21 & A21 & A21 & A9 & A22 & A22 & A9 & A18 \\
\hline AD21 & A18 & A18 & A18 & A19 & A19 & A19 & A19 & A19 & A20 & A20 & A10 & A21 & A21 & A21 & A10 & A22 & A22 & A22 & A10 & A23 & A23 & A10 & A19 \\
\hline AD22 & A19 & A19 & A19 & A20 & A20 & A20 & A20 & A20 & A20 & A21 & A20 & A20 & A22 & A22 & A11 & A23 & A23 & A23 & A11 & A24 & A24 & A11 & A20 \\
\hline AD23 & A20 & A20 & A20 & A21 & A21 & A21 & A21 & A21 & A21 & A21 & A21 & A21 & A21 & A23 & A21 & A21 & A24 & A24 & A12 & A25 & A25 & A12 & A21 \\
\hline AD24 & A21 & A21 & A21 & A22 & A22 & A22 & A22 & A22 & A22 & A22 & A22 & A22 & A22 & A22 & A22 & A22 & A22 & A25 & A13 & A22 & A26 & A13 & A22 \\
\hline AD25 & A22 & A22 & A22 & A23 & A23 & A23 & A23 & A23 & A23 & A23 & A23 & A23 & A23 & A23 & A23 & A23 & A23 & A23 & A14 & A23 & A23 & A14 & A23 \\
\hline AD26 & A24 & A24 & A24 & A24 & A24 & A24 & A24 & A24 & A24 & A24 & A24 & A24 & A24 & A24 & A24 & A24 & A24 & A24 & A24 & A24 & A24 & A15 & A24 \\
\hline AD27 & A25 & A25 & A25 & A25 & A25 & A25 & A25 & A25 & A25 & A25 & A25 & A25 & A25 & A25 & A25 & A25 & A25 & A25 & A25 & A25 & A25 & A16 & A25 \\
\hline AD28 & A26 & A26 & A26 & A26 & A26 & A26 & A26 & A26 & A26 & A26 & A26 & A26 & A26 & A26 & A26 & A26 & A26 & A26 & A26 & A26 & A26 & A17 & A26 \\
\hline AD29 & A27 & A27 & A27 & A27 & A27 & A27 & A27 & A27 & A27 & A27 & A27 & A27 & A27 & A27 & A27 & A27 & A27 & A27 & A27 & A27 & A27 & A18 & A27 \\
\hline AD30 & A28 & A28 & A28 & A28 & A28 & A28 & A28 & A28 & A28 & A28 & A28 & A28 & A28 & A28 & A28 & A28 & A28 & A28 & A28 & A28 & A28 & A19 & A28 \\
\hline AD31 & A31 & A31 & A31 & A31 & A31 & A31 & A31 & A31 & A31 & A31 & A31 & A31 & A31 & A31 & A31 & A31 & A31 & A31 & A31 & A31 & A31 & A31 & A31 \\
\hline
\end{tabular}

Notes: 1. \#BITS is the number of CAS or RAS address bits for the specified device size.

Location of DRAM CAS or RAS address bits for the specified device size.
Location of bank-select bits in MMB mode for the specified device size.

\section*{Programmable Timing}
The timing for RAS and CAS cycles on each memory group, as well as data setup and hold times for each I/O channel, is programmable. Depending on the parameter, timing granularity is in either CPU-clock cycles or 2X-CPU-clock cycles. In some cases, timing is specified in CPU-clock cycles with a modifier available to advance the event by one 2X-CPU-clock cycle.

In all cases, the hardware actually counts time in CPU-clock-cycle granules and then delays or advances the signal transition by any 2X-CPU-clock-granularity timing specified. If the rate of the external clock is changed during operation, no 2X-CPU-clock-granularity timing (generated by the 2X-CPU-clock PLL) may be in effect during the change, because the PLL cannot track the change and improper clock cycles will be generated. To avoid this, program the timing to measure an integral number of CPU-clock cycles. See 72, page 204.
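The granularity rule above can be checked mechanically. The following is a minimal C sketch, assuming each programmed value is represented as a decoded count of CPU-clock cycles plus an optional one-2X-CPU-clock advance. The type and function names are hypothetical and do not correspond to register fields. It places each edge on a 2X-CPU-clock time line and reports whether a set of values is integral, which is the condition the text requires before the external clock rate changes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one programmed timing value: a decoded count of
   CPU-clock cycles plus an optional modifier that advances the signal
   transition by one 2X-CPU-clock cycle. Field widths are not modeled. */
typedef struct {
    uint32_t cpu_cycles;
    bool advance_half;
} timing_t;

/* Position of the transition, in 2X-CPU-clock cycles (half CPU-clock
   cycles), relative to the point where counting started. */
uint32_t edge_time_2x(timing_t t)
{
    uint32_t half_cycles = 2u * t.cpu_cycles;
    if (t.advance_half && half_cycles > 0u)
        half_cycles -= 1u;
    return half_cycles;
}

/* True if the transition falls on a whole CPU-clock cycle, i.e. it does
   not depend on an edge generated by the 2X-CPU-clock PLL. */
bool timing_is_integral(timing_t t)
{
    return edge_time_2x(t) % 2u == 0u;
}

/* A set of timing values may stay in effect across an external clock-rate
   change only if every value is integral. */
bool safe_for_clock_change(const timing_t *values, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!timing_is_integral(values[i]))
            return false;
    return true;
}
```

For example, three CPU-clock cycles with the advance modifier lands at five 2X-CPU-clock cycles. Such a value would have to be reprogrammed before the clock rate is changed.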

Timing specification is broken into three pieces: RAS prefix, basic CAS cycle, and CAS extension/expansion timing. All CAS cycles consist of the basic CAS cycle timing and the appropriate CAS extension/expansion timing. This combination is referred to as the CAS part of the memory cycle. All RAS cycles consist of a RAS prefix plus a CAS part. Bus transactions of multiple bus cycles are simply the required sequence of RAS prefixes and CAS parts in immediate succession. Only discrete read cycles or write cycles are performed; read-modify-write cycles are not performed.

To gain access to the bus, the bus address must be transferred to the MIF and a check made to see if the bus is available for the time required to complete the bus transaction. This bus request process takes two CPU-clock cycles at the beginning of each bus transaction. Memory-reference MPU and VPU instructions always overlap one cycle of instruction execution with the bus request process. DMA operation can overlap both cycles of the bus request process with a preceding MPU bus transaction. Thus, except for DMA overlapped with an MPU bus transaction, there are two inactive CPU-clock cycles on the bus preceding each bus transaction. Instruction execution times listed herein include the bus access and programmed bus transaction time as part of the entire memory reference time.

\section*{RAS Prefix Timing}

This timing for a memory group is specified by programming the fields in the corresponding mgXrasbt. The RAS prefix of a RAS cycle consists of a leading CPU-clock cycle; the \(\overline{R A S}\) inactive portion, also referred to as RAS precharge (mgbtras); and the RAS address hold time (mgbtrhld). The last two are modified by the early RAS bit (mgbteras). For computation of the RAS-cycle duration, mgbtrast must contain the sum of mgbtras and mgbtrhld plus one. During this time the DRAM RAS address bits, high address bits, and bit outputs are on AD. See Figure 42, page 149.
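The mgbtrast consistency rule is simple arithmetic, but it is easy to forget when reprogramming mgbtras or mgbtrhld. This minimal C sketch assumes the decoded field values (not the ordinal field contents) are in CPU-clock cycles. It ignores the adjustment made by mgbteras. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded copy of the mgXrasbt fields used here.
   The early-RAS modifier (mgbteras) is not modeled. */
typedef struct {
    uint32_t mgbtras;   /* RAS precharge (RAS inactive portion) */
    uint32_t mgbtrhld;  /* RAS address hold time */
    uint32_t mgbtrast;  /* value used to compute the RAS-prefix duration */
} mg_rasbt_t;

/* Value mgbtrast must hold: precharge + hold + the leading CPU-clock cycle. */
uint32_t required_mgbtrast(uint32_t mgbtras, uint32_t mgbtrhld)
{
    return mgbtras + mgbtrhld + 1u;
}

/* Checks a group's programmed values before they are written. */
bool rasbt_consistent(const mg_rasbt_t *r)
{
    return r->mgbtrast == required_mgbtrast(r->mgbtras, r->mgbtrhld);
}
```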

\section*{CAS Part Timing}

This timing for a memory group is specified by programming the fields in mgXcasbt and mgXebt. The CAS part of the cycle begins with the timing for the \(\overline{\mathrm{CAS}}\) inactive portion, also referred to as CAS precharge (mgbtcas). Next is the CAS address hold time/beginning of data time (mgbtdob), when \(\overline{\mathrm{DOB}}\), and possibly \(\overline{\mathrm{OE}}\) or \(\overline{\mathrm{LWE}}\), go active. Then \(\overline{\mathrm{CAS}}\), \(\overline{\mathrm{DOB}}\), and either \(\overline{\mathrm{OE}}\) (if a memory read) or both \(\overline{\mathrm{EWE}}\) and \(\overline{\mathrm{LWE}}\) (if a memory write) go inactive again (mgbtcast). To accommodate longer data setup and buffer delay times, the CAS cycle can be expanded at \(\overline{\mathrm{DOB}}\) fall (mgebtdobe). To accommodate longer data hold and output buffer disable times, the CAS strobes can be extended following \(\overline{\mathrm{DOB}}\) inactive (mgebtcase). Memory write cycles can be programmed to have \(\overline{\mathrm{EWE}}\) go active either at the beginning of the CAS cycle (before \(\overline{\mathrm{RAS}}\) rise if a RAS cycle) or at \(\overline{\mathrm{CAS}}\) fall (mgbtewea). Similarly, \(\overline{\mathrm{LWE}}\) can be programmed to go active either at \(\overline{\mathrm{DOB}}\) fall plus expansion or at \(\overline{\mathrm{DOB}}\) fall plus expansion plus one 2X-CPU-clock cycle (mgbtlwea). \(\overline{\mathrm{EWE}}\) generally accommodates SRAM-type devices and \(\overline{\mathrm{LWE}}\) accommodates DRAM-type devices. Further, \(\overline{\mathrm{DOB}}\) going inactive tracks \(\overline{\mathrm{EWE}}/\overline{\mathrm{LWE}}\) or \(\overline{\mathrm{OE}}\), either of which can be made to go inactive earlier than the unextended CAS time by one 2X-CPU-clock cycle (mgbtewe and mgbteoe). For computation of CAS-cycle duration, mgbtcast is added to mgebtsum, the latter of which must contain the sum of mgebtdobe and mgebtcase. See Figure 41, page 147, and Figure 40, page 146.
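Combining the CAS-part rule with the earlier statement that every RAS cycle is a RAS prefix plus a CAS part gives a simple duration model. This minimal C sketch assumes decoded values in consistent units (CPU-clock cycles). The struct and function names are hypothetical. The same `ebt_t` shape is reused for an I/O channel's ioXebt, whose substitution is described in the next paragraph; its individual field names are not given here and are modeled generically.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded copy of the CAS extension/expansion fields. The same
   shape stands in for mgXebt and for an I/O channel's ioXebt. */
typedef struct {
    uint32_t dobe;     /* expansion at DOB fall (mgebtdobe) */
    uint32_t case_ext; /* CAS extension after DOB inactive (mgebtcase) */
    uint32_t sum;      /* mgebtsum: must equal dobe + case_ext */
} ebt_t;

/* Hypothetical decoded basic timing for one memory group. */
typedef struct {
    uint32_t mgbtcast; /* basic CAS-cycle time (from mgXcasbt) */
    uint32_t mgbtrast; /* RAS-prefix time (from mgXrasbt) */
} group_bt_t;

bool ebt_consistent(const ebt_t *e)
{
    return e->sum == e->dobe + e->case_ext;
}

/* I/O-channel bus transactions use the channel's ioXebt instead of mgXebt. */
const ebt_t *effective_ebt(const ebt_t *mg_ebt, const ebt_t *io_ebt, bool io_channel)
{
    return io_channel ? io_ebt : mg_ebt;
}

/* CAS part = mgbtcast + mgebtsum. */
uint32_t cas_part(const group_bt_t *bt, const ebt_t *ebt)
{
    return bt->mgbtcast + ebt->sum;
}

/* RAS cycle = RAS prefix + CAS part. */
uint32_t ras_cycle(const group_bt_t *bt, const ebt_t *ebt)
{
    return bt->mgbtrast + cas_part(bt, ebt);
}
```

Passing the effective extension timing as a separate argument makes the mgXebt/ioXebt substitution visible at the call site.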

When MPU bus transactions or VPU instruction-fetch bus transactions occur, the bus cycle timing for the memory group uses the values in mgXebt, as described above. When an I/O-channel bus transaction occurs, the values in ioXebt for the appropriate I/O channel are substituted for the mgXebt values. The ioXebt values must be programmed to accommodate any memory group that might be involved in the transfer, as well as the I/O device.

\section*{DRAM Refresh}

DRAM requires periodic accesses to each row within the memory device to maintain the memory contents. Most DRAM devices support several modes of refresh, including the RAS-only refresh mode supplied by the VPU instruction refresh. The VPU must be programmed to execute refresh at intervals short enough for the most restrictive DRAM in the system. The timing during the refresh cycle uses the RAS-cycle timing of the memory group indicated by msrtg, which must be long enough for the slowest DRAM refresh cycle in the system. Refresh on each memory group can be individually enabled or disabled. See Figure 33, page 139.

msra contains data used during each refresh cycle. After the refresh cycle completes, refresh increments the 14-bit row address in msrra. The address bits in msra31 and msrha are normally zero, but can be written if the zero values interfere with other system hardware during refresh cycles.
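Choosing "intervals short enough for the most restrictive DRAM" is a small calculation. This minimal C sketch assumes each device's requirement is taken from its datasheet as a row count and a refresh period in nanoseconds. The example figures in the usage note are illustrative, not PSC1000 values. The struct and function names are hypothetical. The sketch also models the 14-bit msrra row counter wrapping.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-device requirement from a DRAM datasheet: every one of
   `rows` rows must be refreshed within `period_ns` nanoseconds. */
typedef struct {
    uint32_t rows;
    uint64_t period_ns;
} dram_refresh_req_t;

/* Longest spacing, in CPU-clock cycles, between executions of the VPU
   refresh instruction that still satisfies every listed device. Rounds
   down so the result never exceeds any device's limit. Assumes
   period_ns * cpu_hz fits in 64 bits. */
uint64_t max_refresh_interval_cycles(const dram_refresh_req_t *devs, size_t n,
                                     uint64_t cpu_hz)
{
    uint64_t limit = UINT64_MAX;
    for (size_t i = 0; i < n; i++) {
        uint64_t cycles = devs[i].period_ns * cpu_hz /
                          (1000000000ull * devs[i].rows);
        if (cycles < limit)
            limit = cycles;
    }
    return limit;
}

/* Row address after a refresh cycle: msrra is 14 bits wide, so the
   counter wraps from 0x3FFF back to 0. */
uint32_t next_msrra(uint32_t msrra)
{
    return (msrra + 1u) & 0x3FFFu;
}
```

With a 100 MHz CPU clock, a device needing 4096 rows every 64 ms allows 1562 cycles between refreshes. Adding a device needing 8192 rows in the same period tightens the limit to 781.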

\section*{Video RAM Support}

Special VRAM operating modes are supported through the use of vram. See Figure 32, page 137, and Table 37, page 37. Many VRAM modes use a RAS cycle to set an operating state in the VRAM device that persists until the next RAS cycle occurs on that VRAM device. Unexpected RAS cycles can thus cause undesirable results.

Refresh cycles are one source of unexpected RAS cycles; these can be disabled on groups containing VRAM by setting the appropriate mgXrd bits. See Figure 33, page 139.

Changes in the high address bits are a second source of unexpected RAS cycles; these can be prevented from occurring on memory group msvgrp by setting msevhacr. The high address bits are typically used for I/O device addresses and, if mshacd is clear, require a RAS cycle when they change. An I/O-channel transfer immediately prior to a VRAM group access is an example of such an occurrence. The RAS cycle might be required for proper system operation, but the VRAM group can be prevented from receiving the RAS cycle by setting msevhacr. The RAS precharge portion of the cycle will occur on RAS and \(\overline{\text { RAS }}\), but not on the \(\overline{M G S x} / \overline{R A S x}\) of the VRAM group. Note that if more than one memory group is used for VRAM, this protection is not effective. See Figure 39, page 145.


\section*{System Requirements Programming}

\section*{RAS Cycle Generation}

RAS cycles are primarily required to bring new row addresses onto AD for DRAM-type devices. They are also required, in certain instances, to ensure temporally deterministic execution of the VPU, or to ensure correct operation after certain events. The MIF handles these cases automatically. RAS cycles can also be configured to occur in order to supply additional time for decoding I/O addresses, for example. Since RAS cycles generally take considerably longer than CAS cycles, it is desirable to minimize their use. The various sources of RAS cycles are listed in Table 53, page 126.

When the current and previous addresses are compared to determine if a RAS cycle is required, the MIF uses the following rules:
- The current DRAM RAS address bits are compared to those from the most recent RAS cycle on the current memory group. If the bits are different, a RAS cycle occurs.

Table 53. Sources of RAS Cycles
\begin{tabular}{|c|c|l|l|l|c|}
\hline Group & Access & \multicolumn{1}{|c|}{ Reason } & & Configuration & Requirement
\end{tabular} \(\left.\begin{array}{c}\text { MPU } \\
\text { DMA } \\
\text { VPU }\end{array}\right]\)

KEY:
all: any group or device with which the event might occur
pgm: any group programmed for the event to occur
any: any arbitrary access creating the specified condition
first: first access on each specified group after the specified event
S: might be required by system hardware
C: might be required for correct operation of devices
T: required for temporally deterministic VPU execution

- The middle address bits are not compared (see Figure 20, page 118). The middle address bits are: for DRAM, above the RAS address bits up to and including msgsm; for SRAM, from A22 up to and including msgsm. If msgsm is zero, there are no middle address bits in either case. If msgsm includes A31, A31 becomes part of the high address bits and is optionally compared.
- The current high address bits are compared to those from the most recent RAS cycle, depending on the configuration options discussed below. The location of the high address bits depends on msgsm. See Figure 37, page 143.

Three high-address-bit configuration options are available to minimize the occurrence of RAS cycles caused by high-address-bit comparisons.
- The high address bits are typically used for I/O device addresses, and thus when they change, a RAS cycle might be required for their proper decoding by external hardware. The high address bits can be excluded from RAS-cycle determination by setting the memory system high-address-bit compare disable (mshacd). See Figure 33, page 139.
- During bus transactions between four-byte byte-transfer devices and cell memory, or between one-cell cell-transfer devices and byte memory, A31 is passed (taken from the global register, usually set) or cleared (by the MIF) to select or deselect the I/O device when required. Decoding A31 externally for this purpose can be done more quickly than a full address decode, so this separate option is available. A31 can be included in or excluded from the high-address-bit compare (msexa31hac). See Figure 39, page 145.
- In systems that require a RAS cycle to decode I/O device addresses but not to decode changes in A31 (mshacd clear and msexa31hac set), it might be necessary for the memory address bits and I/O addressing bits to overlap if the system contains a large amount of memory and I/O devices. This can prevent a RAS cycle from occurring because some of the overlapped address bits do not cause a RAS (middle address bits), or do not require a RAS (DRAM RAS address bits), even though they changed from the last
system RAS cycle. In this case, a RAS can be forced to ensure that I/O device addresses are decoded by setting A31 (msras31d clear). This option can also be useful any other time forcing a RAS cycle is desirable.
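The comparison rules and configuration options above can be summarized as one decision function. This minimal C sketch has several assumptions. The caller has already derived the RAS-address and high-address masks for the group's device size and msgsm. The configuration struct and function name are hypothetical. The msras31d behavior (a set A31 forces a RAS cycle while msras31d is clear) is our reading of the last option. The event-driven sources in Table 53 (first access after an event, VPU determinism) are not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

#define ADDR_A31 0x80000000u

/* Hypothetical summary of the configuration the comparison depends on.
   Middle address bits appear in neither mask, so they are never compared. */
typedef struct {
    uint32_t ras_addr_mask;  /* DRAM RAS address bits of the group (0 for SRAM) */
    uint32_t high_addr_mask; /* high address bits, possibly including A31 */
    bool mshacd;             /* set: high address bits are not compared */
    bool msexa31hac;         /* set: A31 excluded from the high-bit compare */
    bool msras31d;           /* clear: a set A31 forces a RAS cycle (assumed) */
} ras_cfg_t;

/* addr:            address of the current access
   last_group_ras:  address of the most recent RAS cycle on this group
   last_system_ras: address of the most recent RAS cycle on any group */
bool needs_ras_cycle(const ras_cfg_t *c, uint32_t addr,
                     uint32_t last_group_ras, uint32_t last_system_ras)
{
    /* First rule: DRAM RAS address bits against this group's last RAS. */
    if ((addr ^ last_group_ras) & c->ras_addr_mask)
        return true;

    /* Third rule: high address bits against the most recent RAS cycle,
       unless disabled by mshacd; A31 optionally excluded. */
    if (!c->mshacd) {
        uint32_t mask = c->high_addr_mask;
        if (c->msexa31hac)
            mask &= ~ADDR_A31;
        if ((addr ^ last_system_ras) & mask)
            return true;
    }

    /* Forced RAS: A31 set while msras31d is clear. */
    if (!c->msras31d && (addr & ADDR_A31))
        return true;

    return false;
}
```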

\section*{Driver Current}

The drive capability of all the package output drivers is programmable. See Figure 50, page 154.

\section*{Memory Faults}

Virtual memory page-fault detection is enabled through mflt_enable in mode. The memory fault input can come either from AD8 or \(\overline{\mathrm{MFLT}}\), depending on the state of pkgmflt. See Figure 39, page 145.

\section*{I/O-Channel Programming}

As previously discussed, the normal memory-group bus timing is changed during an I/O-channel bus transaction by substituting the values in the corresponding ioXebt for the values in mgXebt for the memory group involved. This allows each I/O channel to be programmed to meet the requirements of the device. The ioXebt values must be adequate for the I/O device, as well as any memory group with which a data transfer might occur. See Figure 43, page 150.

In addition to timing, the type of transfer on each I/O channel can be specified in iodtta or iodttb. Transfers can be either one byte or four bytes per transaction for byte-wide devices, or one cell per transaction for cell-wide devices. Four-byte byte-transfer devices might contend for the bus less often than one-byte byte-transfer devices, and thus can transfer data more efficiently. Also, with cell-wide memory, four-byte byte transfers are cell-aligned and pack the data into the memory cells, whereas one-byte byte transfers place only one byte per memory cell. See Bus Operation, page 157.
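The efficiency difference between the transfer types comes down to a transaction count. This minimal C sketch assumes 4-byte cells. It uses hypothetical enum names for the three transfer types; the actual iodtta/iodttb encodings are not modeled. It counts bus transactions for a block of device data. With cell-wide memory, the same count is also the number of memory cells written, because a one-byte transfer places one byte per cell.

```c
#include <stdint.h>

/* Hypothetical names for the three transfer types selectable in
   iodtta/iodttb; field encodings are not modeled. Cells are 4 bytes. */
typedef enum {
    XFER_ONE_BYTE,  /* byte-wide device, one byte per transaction */
    XFER_FOUR_BYTE, /* byte-wide device, four bytes per transaction */
    XFER_ONE_CELL   /* cell-wide device, one cell per transaction */
} io_xfer_t;

/* Bus transactions (and, with cell-wide memory, cells written) needed to
   move `nbytes` bytes of device data. Partial final cells round up. */
uint32_t io_transactions(io_xfer_t t, uint32_t nbytes)
{
    return t == XFER_ONE_BYTE ? nbytes : (nbytes + 3u) / 4u;
}
```

Moving 10 bytes therefore takes 10 one-byte transactions but only 3 four-byte transactions.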

See Direct Memory Access Controller, page 103, for other I/O-channel transfer options.



Note: \(\overline{\mathrm{DOB}}\) rise tracks \(\overline{\mathrm{OE}}\) or \(\overline{\mathrm{EWE}}\) and \(\overline{\mathrm{LWE}}\) rise.
Figure 23. Programmable Bus Timing Reference


\section*{On-Chip Resource Registers}

The on-chip resource registers comprise portions of various functional areas on the CPU, including the MPU, VPU, DMAC, INTC, MIF, bit inputs, and bit outputs. The registers are addressed from the MPU in their own address space using the instructions ldo [] and sto [] at the register level, or ldo.i [] and sto.i [] at the bit level (for those registers that have bit addresses). On other processors, resources of this type are often either memory-mapped or opcode-mapped. By using a separate address space for these resources, the normal address space remains uncluttered, and opcodes are preserved. Except as noted, all registers are readable and writable. Areas marked "Reserved Zeros" contain no programmable bits and always return zero. Areas marked "Reserved" contain unused programmable bits. Both areas might contain functional programmable bits in the future.


Figure 24. On-Chip Resource Registers



Figure 25. Example On-Chip Register Diagram

The first several registers are bit addressable in addition to being register addressable. This allows the MPU to modify individual bits without corrupting other bits that might be changed concurrently by the VPU, DMAC, or INTC logic.

Bus activity must be prevented to avoid a possible invalid bus cycle when changing the value in any register that affects the bus configuration or timing of a bus cycle that might be in progress. Bus activity can be prevented by ensuring:
- no DMA requests are serviced,
- the VPU does not seize the bus (because vpudelay goes to zero),
- no writes are posted, and
- pre-fetch does not occur.

This is typically not a problem because most changes are made just after power-up, when no DMA or VPU activity of concern is occurring. A posted write can be ensured complete by making sure an MPU memory access (such as an instruction fetch) occurs after the write is posted.

The diagrams that follow use a \{\} notation that depicts the decoded set of values represented by the ordinal values within the corresponding bit field. The full range of values possible on a bit field is always depicted. Thus \(\{1,2,3,4\}\) is only possible on a two-bit-wide field. In this case, a zero in the field represents a one value, a one in the field represents a two value, and so on through the list. Note that not all sets are consecutive numbers, such as \(\{0,1,2,4\}\). Also note that references in the text to usage of a field imply the decoded value represented by the field, not the ordinal value; e.g., references to mgbtras in the example imply the decoded values 1-16 and not the ordinal values \(0-15\) programmed into the field.
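The ordinal-to-decoded mapping of the \{\} notation can be expressed as a lookup table indexed by the field contents. This minimal C sketch uses hypothetical helper names and takes field positions as parameters; it does not describe any particular register's layout. It extracts a field, decodes it through its set, and finds the ordinal to program for a desired decoded value.

```c
#include <stdint.h>

/* Ordinal contents of a `width`-bit field whose least-significant bit is
   `lsb`. Valid for width < 32. */
uint32_t field_ordinal(uint32_t reg, unsigned lsb, unsigned width)
{
    return (reg >> lsb) & ((1u << width) - 1u);
}

/* Decoded value. `set` lists the decoded values for ordinals 0, 1, 2, ...
   and must have (1 << width) entries, since the notation always depicts
   the full range of the field. */
uint32_t field_decode(uint32_t reg, unsigned lsb, unsigned width,
                      const uint32_t *set)
{
    return set[field_ordinal(reg, lsb, width)];
}

/* Ordinal to program for a desired decoded value, or -1 if the value is
   not representable in this field. */
int field_encode(uint32_t decoded, unsigned width, const uint32_t *set)
{
    for (uint32_t i = 0; i < (1u << width); i++)
        if (set[i] == decoded)
            return (int)i;
    return -1;
}
```

For a two-bit field with the set \(\{0,1,2,4\}\), ordinal 3 decodes to 4, and a decoded value of 3 cannot be programmed at all.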



Figure 26. Bit Input Register

Contains sampled data from \(\overline{\operatorname{IN}}[7: 0]\) or \(\operatorname{AD}[7: 0]\), depending on the value of pkgio. ioin is the source of inputs for all consumers of bit inputs. Bits are zero-persistent: once a bit is zero in ioin, it stays zero until consumed by the VPU, DMAC, or INTC, or written by the MPU with a one. Under certain conditions bits become not zero-persistent. See Bit Inputs, page 111.

The bits can be individually read, set and cleared to prevent race conditions between the MPU and other CPU logic.
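Zero-persistence behaves like a per-bit AND latch that is re-armed by consumption. This minimal C sketch is behavioral, not cycle accurate. The function names are hypothetical. `np_mask` is an assumed input standing for the bits that are currently not zero-persistent (for example, while the corresponding ioip or ioius bit and ioie bit are set).

```c
#include <stdint.h>

/* One sampling step. A zero already in ioin stays zero; bits listed in
   np_mask are not zero-persistent and simply follow the sampled input. */
uint8_t ioin_sample(uint8_t ioin, uint8_t sampled, uint8_t np_mask)
{
    uint8_t persistent = (uint8_t)(ioin & sampled);
    return (uint8_t)((persistent & ~np_mask) | (sampled & np_mask));
}

/* Consumption by the VPU, DMAC or INTC, or an MPU write of a one,
   sets the bit again so the next zero can be captured. */
uint8_t ioin_consume(uint8_t ioin, unsigned bit)
{
    return (uint8_t)(ioin | (1u << bit));
}
```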



Figure 27. Interrupt Pending Register

Contains interrupt requests that are waiting to be serviced. Interrupts are serviced in order of priority (0 = highest, 7 = lowest). An interrupt request from an I/O-channel transfer or from int occurs by the corresponding pending bit being set. Bits can be set or cleared to submit or withdraw interrupt requests. When an ioip bit and the corresponding ioie bit are set, the corresponding ioin bit is not zero-persistent. See Interrupt Controller, page 107.

The bits can be individually read, set and cleared to prevent race conditions between the MPU and INTC logic.



Figure 28. Interrupt Under Service Register

Contains the current interrupt service request and those that have been temporarily suspended to service a higher-priority request. When an ISR executable-code vector for an interrupt request is executed, the ioius bit for that interrupt request is set and the corresponding ioip bit is cleared. When an ISR executes reti, the highest-priority interrupt-under-service bit is cleared. The bits are used to prevent interrupts from interrupting higher-priority ISRs. When an ioius bit and the corresponding ioie bit are set, the corresponding ioin bit is not zero-persistent. See Interrupt Controller, page 107.

The bits can be individually read, set and cleared to prevent race conditions between the MPU and INTC logic.
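The interplay of ioip and ioius described above can be sketched as three operations: choosing the next request, dispatching it, and reti. This minimal C sketch uses hypothetical function names. It assumes a pending request may preempt only when its priority is strictly higher (numerically lower) than every interrupt already under service. Enables, masking, and vector fetch are not modeled.

```c
#include <stdint.h>

/* Index of the lowest set bit, i.e. the highest priority; 8 if none. */
static unsigned highest_priority(uint8_t bits)
{
    for (unsigned i = 0; i < 8u; i++)
        if (bits & (1u << i))
            return i;
    return 8u;
}

/* Interrupt to dispatch next, or -1 if no pending request outranks the
   interrupts already under service. */
int intc_next(uint8_t ioip, uint8_t ioius)
{
    unsigned p = highest_priority(ioip);
    unsigned u = highest_priority(ioius);
    return (p < u) ? (int)p : -1;
}

/* Executing the ISR vector: the ioius bit is set, the ioip bit cleared. */
void intc_dispatch(uint8_t *ioip, uint8_t *ioius, unsigned irq)
{
    *ioip &= (uint8_t)~(1u << irq);
    *ioius |= (uint8_t)(1u << irq);
}

/* reti: the highest-priority under-service bit is cleared. */
void intc_reti(uint8_t *ioius)
{
    unsigned u = highest_priority(*ioius);
    if (u < 8u)
        *ioius &= (uint8_t)~(1u << u);
}
```

With requests 2 and 4 pending, request 2 is dispatched first. Request 4 then waits until reti clears bit 2, while a new request 0 could still preempt.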



Figure 29. Bit Output Register

Contains the bits from MPU and VPU bit-output operations. Bits appear on OUT[7:0] immediately after writing and on \(A D[7: 0]\) while \(\overline{R A S}\) is inactive. See Bit Outputs, page 115.

The bits can be individually read, set and cleared to prevent race conditions between the MPU and VPU.



Figure 30. Interrupt Enable Register

If the corresponding iodmae bit is not set, allows a corresponding zero bit in ioin to request the corresponding interrupt service. When an enabled interrupt request is recognized, the corresponding ioip bit is set and the corresponding ioin bit is no longer zero-persistent. See Interrupt Controller, page 107.

The bits can be individually read, set and cleared. Bit addressability for this register is an artifact of its position in the address space and does not imply that race conditions can exist on this register.


\section*{AO iodmae DMA Enable Register}


Figure 31. DMA Enable Register

Allows a corresponding zero bit in ioin to request a DMA I/O-channel transfer for the corresponding I/O channel. When an enabled DMA request is recognized, the corresponding zero bit in ioin is set. If the corresponding iodmaex bit is set, the iodmae bit is cleared (to disable further DMA requests from that channel) when an I/O-channel transfer on that channel accesses the last location in a 1024-byte memory page. See Direct Memory Access Controller, page 103. When an iodmae bit is set, the corresponding ioie bit is ignored.
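The routing rules for a zero ioin bit and the end-of-page auto-disable can be sketched together. This minimal C sketch uses hypothetical names. It assumes naturally aligned transfers of `size` bytes, so "accesses the last location" is read as the transfer reaching offset 1023 of its 1024-byte page.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { ROUTE_NONE, ROUTE_DMA, ROUTE_INTERRUPT } route_t;

/* What a bit input requests on channel `ch`: only a zero bit is a request,
   and a set iodmae bit takes precedence (ioie is then ignored). */
route_t route_bit_input(uint8_t ioin, uint8_t iodmae, uint8_t ioie, unsigned ch)
{
    uint8_t m = (uint8_t)(1u << ch);
    if (ioin & m)
        return ROUTE_NONE;
    if (iodmae & m)
        return ROUTE_DMA;
    if (ioie & m)
        return ROUTE_INTERRUPT;
    return ROUTE_NONE;
}

/* True if an aligned transfer of `size` bytes at `addr` touches the last
   byte of its 1024-byte page. */
bool hits_page_end(uint32_t addr, uint32_t size)
{
    return (addr & 0x3FFu) + size == 0x400u;
}

/* iodmae after a transfer on channel `ch`: cleared at the page end when
   the channel's iodmaex bit is set. */
uint8_t iodmae_after_transfer(uint8_t iodmae, uint8_t iodmaex, unsigned ch,
                              uint32_t addr, uint32_t size)
{
    uint8_t m = (uint8_t)(1u << ch);
    if ((iodmaex & m) && hits_page_end(addr, size))
        iodmae = (uint8_t)(iodmae & ~m);
    return iodmae;
}
```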

\begin{tabular}{|c|c|c|c|c|c|c|c|c|}
\hline CO vram & \multicolumn{8}{|l|}{VRAM Control Bit Register} \\
\hline \multirow[t]{3}{*}{31} & & & 7 & 6 & 4 & 2 & 1 & 0 \\
\hline & \multicolumn{2}{|r|}{Reserved Zeros} & & & & & & \\
\hline & \begin{tabular}{l}
Mnemonic \\
msvgrp \\
dsfvcas \\
dsfvras \\
casbvras \\
wevras \\
oevras
\end{tabular} & \begin{tabular}{l}
Description \\
memory system VRAM group [3] \\
state of \(\overline{\text { DSF }}\) at VRAM \(\overline{\text { CAS }}\) fall [0] \\
state of \(\overline{\mathrm{DSF}}\) at next VRAM \(\overline{\text { RAS }}\) fall [0] \\
\(\overline{\text { CAS }}\) fall before \(\overline{\text { RAS }}\) at next VRAM \(\overline{\text { RAS }}\) [0] \\
\(\overline{L W E}\) low at next VRAM \(\overline{\text { RAS }}\) fall [0] \\
\(\overline{\mathrm{OE}}\) low at next VRAM \(\overline{\text { RAS }}\) fall [0]
\end{tabular} & & & & & & \\
\hline
\end{tabular}

Figure 32. VRAM Control Bit Register

These bits control the behavior of \(\overline{\mathrm{OE}}, \overline{\mathrm{LWE}}\), the CASes, and DSF at \(\overline{R A S}\) fall time; they also control the behavior of DSF at \(\overline{C A S}\) fall time. They can be used in any combination to activate the various modes on VRAMs.

The bits from vram move through a hidden register prior to controlling the memory strobes during a subsequent MPU memory cycle. The bits stored for msvgrp in the hidden register determine which memory group is the current VRAM memory group, whose strobes are affected by the accompanying data in the hidden register. The hidden register is locked once data has been transferred into it from vram until an MPU access to the VRAM memory group occurs, thus consuming the data in the hidden register.

When a sto [] to vram occurs and the hidden register is not currently locked, the data from vram is transferred into the hidden register immediately if a posted write (to any memory group) is not waiting or in process, or at the end of the posted write if a posted write is waiting or in process. When a sto [] to vram occurs and the hidden register is already locked, the data in vram is not transferred (and is replaceable) until after the next access to the VRAM memory group occurs. The next access to the VRAM memory group uses the data in the hidden register, and when the memory access is complete, the data in vram is transferred to the hidden register.

Only MPU memory accesses have an effect on vram or the hidden register. Immediately after vram is transferred to the hidden register, dsfvras, casbvras, wevras, and oevras in vram are cleared. After the VRAM group access, additional CAS or RAS cycles can occur on the VRAM memory group without rewriting the register, and they use the current (cleared) vram data. When writes to vram are paired with one or more accesses to the VRAM memory group of the required RAS or CAS type, the internal operations described above are transparent to the program. Note that RAS precharge must be at least three CPU-clock cycles in duration for proper VRAM operation. See Video RAM Support, pages 37, 125, and 161.
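The vram/hidden-register handshake described above is easier to follow as a small state machine. This minimal C sketch is behavioral, not cycle accurate. It has several assumptions. The field layout is abstracted into a struct (bit positions are not modeled). The delay of a transfer until a posted write finishes is not modeled. `vram_mpu_access` is called when an MPU access that used the hidden data completes. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view of the vram register fields. */
typedef struct {
    uint8_t msvgrp;
    bool dsfvcas, dsfvras, casbvras, wevras, oevras;
} vram_bits_t;

typedef struct {
    vram_bits_t vram;   /* programmer-visible register */
    vram_bits_t hidden; /* copy that steers the VRAM group's strobes */
    bool locked;        /* hidden register holds unconsumed data */
    bool pending;       /* vram was written while the hidden register was locked */
} vram_model_t;

/* Move vram into the hidden register and lock it. The one-shot bits are
   then cleared in vram; dsfvcas is persistent and is left alone. */
static void vram_transfer(vram_model_t *m)
{
    m->hidden = m->vram;
    m->locked = true;
    m->pending = false;
    m->vram.dsfvras = m->vram.casbvras = m->vram.wevras = m->vram.oevras = false;
}

/* sto [] to vram: transferred at once unless the hidden register is locked. */
void vram_store(vram_model_t *m, vram_bits_t v)
{
    m->vram = v;
    if (m->locked)
        m->pending = true;
    else
        vram_transfer(m);
}

/* Completion of an MPU access to memory group `group`. */
void vram_mpu_access(vram_model_t *m, uint8_t group)
{
    if (group != m->hidden.msvgrp)
        return; /* accesses to other groups do not consume the hidden data */
    if (m->pending) {
        vram_transfer(m); /* queued vram data moves in and locks again */
    } else {
        m->hidden = m->vram; /* later cycles use the current (cleared) data */
        m->locked = false;
    }
}
```

In this model, writing vram twice before touching the VRAM group leaves the second value queued. The second value reaches the hidden register only after the first VRAM-group access completes, matching the replaceable-until-consumed behavior described above.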

\section*{msvgrp}

Specifies the memory group containing the VRAM that is controlled by this register. VPU and MPU instructions must not be fetched from the memory group used for VRAM because the VRAM operations will likely occur on an instruction-fetch bus transaction rather than the intended VRAM transaction.

\section*{dsfvcas}

Contains the state applied to DSF at the start of the next CAS-part of a memory cycle on the VRAM memory group. The bit is persistent and is not automatically cleared after being transferred to the hidden register. DSF is low when not accessing the VRAM memory group.


\section*{dsfvras}

Contains the state applied to DSF two CPU-clock cycles after \(\overline{\text { RAS }}\) rises during the next RAS cycle on the VRAM memory group. DSF changes to the dsfvcas state at the expiration of the row-address hold time. The bit is automatically cleared after being transferred to the hidden register.

\section*{casbvras}

If set, during the next RAS cycle on the VRAM memory group all CAS signals are active two CPU-clock cycles after \(\overline{\text { RAS }}\) rises, and are inactive at the normal expiration time. \(\overline{O E}, \overline{E W E}\) and \(\overline{L W E}\) go inactive at the expiration of the row-address hold time. The next access to the memory group msvgrp is forced by internal logic to be a RAS cycle.

Note that since all read and write strobes are inactive throughout their normally active times during the bus cycle, no data I/O with memory can occur. The data associated with the ST or LD used to cause the cycle is lost or undefined. The casbvras bit is automatically cleared after being transferred to the hidden register.

\section*{wevras}

If set, \(\overline{\mathrm{LWE}}\) is low two CPU-clock cycles after \(\overline{\text { RAS }}\) rises during the next RAS cycle on the VRAM memory group, and is high at the expiration of the row-address hold time. Otherwise, \(\overline{\mathrm{LWE}}\) is high until the expiration of the row-address hold time during the next RAS cycle on the VRAM memory group. In either case, during the CAS portion of the cycle \(\overline{\mathrm{LWE}}\) behaves normally and the data transferred is part of the function performed. The bit is automatically cleared after being transferred to the hidden register.

\section*{oevras}

If set, \(\overline{O E}\) is low two CPU-clock cycles after \(\overline{R A S}\) rises during the next RAS cycle on the VRAM memory group, and is high at the expiration of the row-address hold time. Otherwise, \(\overline{\mathrm{OE}}\) is high until the expiration of the row-address hold time during the next RAS cycle on the VRAM memory group. In either case, during the CAS portion of the cycle \(\overline{\mathrm{OE}}\) behaves normally and the data transferred is part of the function performed. The bit is automatically cleared after being transferred to the hidden register.
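The clearing behavior described for vram can be sketched as a small C model. This is a hypothetical behavioral sketch, not a driver: the `vram_fields` struct and the `vram_transfer_to_hidden` function are illustrative names, and bit positions are not modeled because they are not given in this text.

```c
#include <stdbool.h>

/* Hypothetical software model of the vram register fields
 * (bit positions are not modeled). */
struct vram_fields {
    int  msvgrp;    /* memory group containing the VRAM (persistent) */
    bool dsfvcas;   /* persistent: DSF state during the CAS part */
    bool dsfvras;   /* one-shot: DSF state during the next RAS cycle */
    bool casbvras;  /* one-shot: all CAS strobes active in the next RAS cycle */
    bool wevras;    /* one-shot: LWE low during the next RAS cycle */
    bool oevras;    /* one-shot: OE low during the next RAS cycle */
};

/* Models an MPU access to the VRAM group: vram is copied to the hidden
 * register, then the one-shot RAS bits in vram are cleared. dsfvcas and
 * msvgrp are left unchanged. */
static void vram_transfer_to_hidden(struct vram_fields *vram,
                                    struct vram_fields *hidden)
{
    *hidden = *vram;
    vram->dsfvras  = false;
    vram->casbvras = false;
    vram->wevras   = false;
    vram->oevras   = false;
}
```

Calling the function a second time without rewriting vram copies the already-cleared one-shot bits, which mirrors the statement that later cycles on the VRAM group use the current (cleared) vram data.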

Figure 33. Miscellaneous A Register
\section*{mgXrd}

Allows (if clear) or prevents (if set) a refresh cycle on the corresponding memory group when refresh executes. Allowing refresh on some memory groups can be undesirable or inappropriate. For example, the primary side effect of refresh is that the current row address latched in the memory device is changed. This can be undesirable on VRAM devices when a RAS cycle sets persistent operational modes and addresses. Another side effect of refresh is that the next memory cycle to the memory group is a RAS cycle to reselect the operational memory row. This is usually undesirable in SRAM because SRAM does not require refresh; the refresh and RAS cycles only slow execution, or make otherwise predictable timing unpredictable.

\section*{msras31d}

If set, allows non-RAS cycles when A31 is a one. If clear, forces a RAS cycle on both one-bus-cycle transactions and the first cycle of four-bus-cycle byte transactions when A31 is a one. In large memory systems in which the I/O-device addressing bits overlap the group, bank, or DRAM RAS bits, this option forces a RAS cycle when one might not otherwise occur, because these bits either are excluded from the RAS comparison logic or could inadvertently match the I/O-device address bits. RAS cycles might be required by the system design to allow enough time for I/O decode and select. A31 is used in selecting I/O addresses.

\section*{mshacd}

If clear, enables the comparison of the high address bits to those of the most recent RAS cycle to determine whether a RAS cycle must occur. If set, disables this comparison. The high address bits are typically used for I/O addresses that require external decoding logic, which might need the additional time available in a RAS cycle. However, with high-speed logic it is often possible to decode the I/O address in the time available within a CAS cycle, thus speeding I/O access. A31 can be excluded from the high-address-bit compare by setting msexa31hac.

\section*{msrtg}

Contains the number of the memory group whose RAS cycle timing is used for refresh cycles produced by refresh. The memory group specified must be the group with the most restrictive (slowest) refresh timing.
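The effect of the mgXrd bits and the choice of msrtg can be sketched as follows. The packing of the four mgXrd bits into one value, the use of each group's mgbtrast value as the measure of "slowest" timing, and the function names are all assumptions for illustration.

```c
/* Hypothetical packing: bit g of mgxrd_bits holds mgXrd for memory group g
 * (1 = refresh prevented, 0 = refresh allowed). */

/* Mask of memory groups that receive a refresh cycle when refresh executes. */
static unsigned refreshed_groups(unsigned mgxrd_bits)
{
    return ~mgxrd_bits & 0xFu;
}

/* Index (0-3) of the refreshed group with the longest total RAS time
 * (its mgbtrast value, used here as a proxy for the slowest refresh
 * timing), or -1 if refresh is prevented on every group. */
static int choose_msrtg(const unsigned rast[4], unsigned mgxrd_bits)
{
    int best = -1;
    for (int g = 0; g < 4; g++) {
        if (!(refreshed_groups(mgxrd_bits) & (1u << g)))
            continue;                   /* mgXrd set: group not refreshed */
        if (best < 0 || rast[g] > rast[best])
            best = g;
    }
    return best;
}
```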
\section*{100 miscb Miscellaneous B Register}

\begin{tabular}{|l|l|c|}
\hline Mnemonic & Description & Reset \\
\hline & Reserved Zeros & \\
\hline mmb & multiple memory bank & [0] \\
\hline fdmap & fixed DMA priorities & [0] \\
\hline pkgio & package has I/O pins & [0] \\
\hline oed & \(\overline{\mathrm{OE}}\) disable & [1] \\
\hline mg3bw & memory group 3 byte wide & [1] \\
\hline mg2bw & memory group 2 byte wide & [1] \\
\hline mg1bw & memory group 1 byte wide & [1] \\
\hline mg0bw & memory group 0 byte wide & [1] \\
\hline
\end{tabular}

Figure 34. Miscellaneous B Register

\section*{mmb}

If clear, selects Single Memory Bank (SMB) mode for all memory groups. \(\overline{\text { RASx }}\) signals appear on the corresponding package pins. Bank-select bits correspond with the msgsm bits. Up to four memory banks (i.e., one memory bank per memory group) can be directly connected and accessed. See Figure 21, page 118.

If set, selects Multiple Memory Bank (MMB) mode for all memory groups. \(\overline{M G S x}\) signals appear on the corresponding package pins. Bank-select bits are located immediately above the DRAM RAS bits or, for SRAM, in the mssbs location. Up to sixteen memory banks (i.e., four banks per memory group) can be connected with 1.25 two-input gates per bank. With additional inputs per gate and additional decoding, an arbitrarily large number of memory banks can easily be connected. See Figure 22, page 119.

\section*{fdmap}

DMA requests contend for the bus; the highest-priority request gets the first chance at access. If vpudelay is large enough to allow bus access by the highest-priority request, the bus is granted to the device.

If fdmap is set and vpudelay is too small for the highest-priority DMA request, the DMA request does not get the bus. Unless a higher-priority DMA request occurs that fits the shrinking available bus slot, no bus transactions occur until the VPU seizes the bus. When the VPU next executes delay, the highest-priority DMA request (or the MPU if there are no DMA requests) repeats the bus request process.

If fdmap is clear and vpudelay is too small for the highest-priority DMA request, the request does not get the bus. The next lower-priority bus request is then allowed to request the bus, with the MPU as the lowest-priority request. The process repeats until the bus is granted or the VPU seizes the bus. When the VPU next executes delay, the highest-priority DMA request (or the MPU if there are no DMA requests) repeats the bus request process.
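The two arbitration policies selected by fdmap can be modeled as a single bus-request round. This is a simplified, hypothetical sketch: it takes the slot each pending requester needs (computed as in Table 55) in priority order, with the MPU last, and uses the strict "vpudelay larger than the slot" comparison described under Bus Operation. The function name and array layout are illustrative.

```c
#include <stdbool.h>

/* slot[] holds the slot (in CPU-clock cycles) each pending requester needs,
 * ordered from highest to lowest priority; the MPU, if requesting, is last.
 * Returns the index of the requester granted the bus, or -1 if none. */
static int arbitrate(const unsigned slot[], int n, unsigned vpudelay, bool fdmap)
{
    if (vpudelay == 0)
        return -1;                  /* the VPU currently has the bus */
    if (fdmap)                      /* fixed priorities: only the top request may try */
        return (n > 0 && vpudelay > slot[0]) ? 0 : -1;
    for (int i = 0; i < n; i++)     /* otherwise fall through to lower priorities */
        if (vpudelay > slot[i])
            return i;
    return -1;
}
```

With fdmap set, a large top-priority request blocks everything until the VPU next executes delay; with fdmap clear, a smaller lower-priority request can use the remaining slot.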

\section*{pkgio}

If set, inputs to ioin are taken from \(\overline{\mathrm{IN}}[7: 0]\). If clear, inputs are taken from \(A D[7: 0]\) when \(\overline{R A S}\) is low and \(\overline{\mathrm{CAS}}\) is high. See Bit Inputs, page 111.
\section*{oed}

If set, disables \(\overline{\mathrm{OE}}\) from going active during bus cycles. If clear, \(\overline{\mathrm{OE}}\) behaves normally. On CPU reset, the \(\overline{\mathrm{OE}}\) signal is disabled to prevent conventionally connected memory from responding; this allows booting from a device in I/O space. See Processor Startup, page 181.

\section*{mgXbw}

If clear, the corresponding memory group is cell-wide and is read and written 32 bits per bus cycle. If set, the corresponding memory group is byte-wide and is read and written in a single bus transaction of four bus cycles, one byte per cycle.


\section*{120 mfltaddr Memory Fault Address Register}

Bits 31-0: Memory Fault Address. The register is read-only. Reading mfltaddr after a memory fault releases the data lock on mfltaddr and mfltdata, allowing data to flow into the registers. [0]

Figure 35. Memory Fault Address Register

When a memory page-fault exception occurs during a memory read or write, mfltaddr contains the address of the faulting memory access. The contents of mfltaddr and mfltdata are latched until the first read of mfltaddr after the fault.

\section*{140 mfltdata Memory Fault Data Register}

Bits 31-0: Memory Fault Data. The register is read-only. Reading mfltaddr after a memory fault releases the data lock on mfltaddr and mfltdata, allowing data to flow into the registers. [0]

Figure 36. Memory Fault Data Register

When a memory page-fault exception occurs during a memory write, mfltdata contains the data to be stored at mfltaddr. The contents of mfltdata and mfltaddr are latched until the first read of mfltaddr after the fault. After reading mfltaddr, the data in mfltaddr and mfltdata are no longer valid.
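Because reading mfltaddr releases the lock on both registers, a write-fault handler should read mfltdata before mfltaddr. The C model below illustrates that ordering. The struct and accessor functions are hypothetical stand-ins, since the mechanism for accessing on-chip registers is not shown here, and the model does not simulate new data flowing into the registers after the lock is released.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of the mfltaddr/mfltdata latch. */
struct fault_regs {
    uint32_t mfltaddr;
    uint32_t mfltdata;
    bool     locked;    /* true from the fault until mfltaddr is read */
};

struct fault_info { uint32_t addr, data; };

/* Reading mfltdata does not release the lock. */
static uint32_t read_mfltdata(const struct fault_regs *r)
{
    return r->mfltdata;
}

/* Reading mfltaddr returns the latched address and releases the lock;
 * afterwards neither register holds valid fault information. */
static uint32_t read_mfltaddr(struct fault_regs *r)
{
    uint32_t a = r->mfltaddr;
    r->locked = false;
    return a;
}

/* Capture a write fault: mfltdata first, then mfltaddr. */
static struct fault_info capture_write_fault(struct fault_regs *r)
{
    struct fault_info f;
    f.data = read_mfltdata(r);
    f.addr = read_mfltaddr(r);
    return f;
}
```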

\begin{tabular}{|llll|}
\hline 160 msgsm & Memory System Group Select Mask Register \\
\hline 31 & \multicolumn{2}{c|}{16} & \\
\hline Reserved Zeros & Memory System Group-Select Mask \\
\hline
\end{tabular}

Contains zero, one, or two adjacent bits to determine which, if any, of the upper 16 address bits will be decoded to select memory groups. [0]

Figure 37. Memory System Group-Select Mask Register

Contains zero, one, or two adjacent bits that locate the memory group-select bits between A16 and A31.

When no bits are set, all memory accesses occur in memory group zero. The memory system high address bits occur in the address bits: for DRAM, above the memory group zero DRAM RAS address; for SRAM, above A21.

When one bit is set, it determines the address bit that selects accesses between memory group zero and memory group one. The memory system high address bits occur in the address bits higher than the bit selected, but always include A31.

When two adjacent bits are set, they are decoded to select which of the four memory groups is accessed. The memory system high address bits occur in the address bits higher than the bits selected, but always include A31.
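The group-select decoding can be sketched in C, assuming that bit n of msgsm corresponds directly to address bit An (so valid masks lie in bits 31-16) and that, with two mask bits set, the higher address bit is the more significant bit of the group number. Both assumptions and the function names are illustrative, not taken from the register definition.

```c
#include <stdbool.h>
#include <stdint.h>

/* Valid msgsm values: zero, one bit, or two adjacent bits, all within
 * bits 31-16 (assumed to line up with A31-A16). */
static bool msgsm_valid(uint32_t m)
{
    uint32_t lowbit = m & (~m + 1u);        /* isolate the lowest set bit */
    return (m & 0xFFFFu) == 0 && (m == lowbit || m == lowbit * 3u);
}

/* Memory group (0-3) selected by addr under mask msgsm. */
static int memory_group(uint32_t addr, uint32_t msgsm)
{
    if (msgsm == 0)
        return 0;                            /* no bits set: group zero */
    int low = 0;
    while (!(msgsm & (UINT32_C(1) << low)))  /* position of lowest mask bit */
        low++;
    return (int)((addr & msgsm) >> low);     /* 0-1 for one bit, 0-3 for two */
}
```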


Figure 38. Memory Group Device Size Register

Contains 4-bit codes that select the DRAM address-bit configuration, or SRAM, for each memory group. The code determines which bits are used during RAS and CAS addressing and which bits are compared to determine whether a RAS cycle is required (because the DRAM row address changed). See Table 51, page 122, and Table 52, page 123.


Figure 39. Miscellaneous C Register

\section*{pkgmflt}

If set, the memory-fault input is sampled from \(\overline{\mathrm{MFLT}}\). If clear, the memory-fault input is sampled from AD8 when \(\overline{\text { RAS }}\) falls. See Figure 77, page 212.
\section*{mspwe}

If set, enables a one-level MPU posted-write buffer, which allows the MPU to continue executing after a write to memory occurs. A posted write has precedence over subsequent MPU reads to maintain memory coherency. If clear, the MPU must wait for writes to complete before continuing.

\section*{msexvhacr}

If set, RAS cycles that would be caused by a high-address-bit comparison do not occur in memory group msvgrp. This prevents unexpected RAS cycles (typically caused by a DMA- or VPU-initiated bus transaction) from causing a VRAM operation.

\section*{msexa31hac}

If set, A31 is not included in the high-address-bit compare. If clear, A31 is included in the high-address-bit compare. See mshacd for more information. The high address bits are typically used for I/O addresses and require external decoding logic that might need the additional time available in a RAS cycle. Some bus transactions contain adjacent bus cycles whose high address bits differ only in the state of A31, and could thus require a RAS cycle due solely to the change in this bit. However, some system designs can decode the A31 change in the time available in a CAS cycle, thus speeding I/O access. If this bit is set, a RAS cycle does not occur if only address bit A31 changes.
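The combined effect of mshacd, msexa31hac, and msexvhacr on the high-address-bit comparison can be sketched as below. This hypothetical helper covers only that comparison; DRAM row-address changes (selected through the mgXds codes) and other causes of RAS cycles are not modeled, and `high_mask` is assumed to mark every address bit above the msgsm group-select bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if the high-address-bit comparison alone would force a
 * RAS cycle for addr, given the address of the most recent RAS cycle. */
static bool high_bits_force_ras(uint32_t prev_addr, uint32_t addr,
                                uint32_t high_mask, bool mshacd,
                                bool msexa31hac, bool in_vram_group,
                                bool msexvhacr)
{
    if (mshacd)
        return false;                        /* comparison disabled */
    if (in_vram_group && msexvhacr)
        return false;                        /* suppressed in group msvgrp */
    if (msexa31hac)
        high_mask &= ~UINT32_C(0x80000000);  /* exclude A31 from the compare */
    return ((prev_addr ^ addr) & high_mask) != 0;
}
```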

\section*{mssbs}

For multiple memory bank mode only, these bits contain the offset from A14 (A12 for a byte-mode group) to the two address bits used to select banks within any memory group containing SRAM devices. Typically set to place the bits immediately above the address bits of the SRAM devices used.
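The SRAM bank-select position can be computed directly from mssbs. A minimal sketch, assuming mssbs is read as a plain integer offset and that the two bank bits are the bit at that offset and the one above it; the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bank (0-3) within an SRAM memory group in multiple-memory-bank mode.
 * The two bank-select bits start mssbs bits above A14 (A12 for a
 * byte-wide group). */
static unsigned sram_bank(uint32_t addr, unsigned mssbs, bool byte_wide)
{
    unsigned base = byte_wide ? 12u : 14u;
    return (unsigned)((addr >> (base + mssbs)) & 3u);
}
```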



Figure 40. Memory Group 0-3 Extended Bus Timing Registers

These values compensate for propagation, turn-on, turn-off, and other delays in the memory system. They are specified separately for each memory group. When an I/O-channel bus transaction occurs, the I/O-channel extension, ioXebt, is substituted for the corresponding value. The I/O-channel extensions must be sufficient for any memory group into which that I/O channel might transfer.

\section*{mgebtsum}

Programmed to contain the sum of mgebtcase and mgebtdobe. This value is used only during the slot check to compute the total time required for the bus cycle.
\section*{mgebtdobe}

Expands the CAS cycle at \(\overline{\mathrm{DOB}}\) fall by the specified time. This parameter is used to compensate for memory group buffer delays, device access time, and other operational requirements. If the bus cycle is a memory read cycle, \(\overline{O E}\) is expanded. If the bus cycle is a memory write cycle, \(\overline{E W E}\) is expanded and \(\overline{L W E}\) fall is delayed the specified time.
\section*{mgebtcase}

Extends the CAS cycle by the specified amount after the unextended CAS time. \(\overline{\mathrm{DOB}}, \overline{\mathrm{OE}}, \overline{\mathrm{EWE}}\) and \(\overline{\mathrm{LWE}}\) rise unextended. This parameter is used to allow for data hold times or to allow devices to disable their output drivers. When used in combination with mgbtewe or mgbteoe, hold or disable times can be set in most increments of 2X-CPU-clock cycles.


Figure 41. Memory Group 0-3 CAS Bus Timing Registers

Defines the basic timing for CAS-only cycles and the CAS portion of RAS cycles. Timing is specified separately for each memory group. The values that refer to \(\overline{C A S}\) apply to CAS, \(\overline{\mathrm{CAS0}}, \overline{\mathrm{CAS1}}, \overline{\mathrm{CAS2}}\) and \(\overline{\mathrm{CAS3}}\), as appropriate. The basic CAS cycle timing is augmented by mgXebt and ioXebt values.

\section*{mgbtcas}

Specifies the CAS-cycle precharge time, the time from the start of the CAS-timed portion of the memory cycle until \(\overline{\mathrm{CAS}}\) goes low.
\section*{mgbtdob}

Specifies the end of address time (column address hold) and the beginning of data time on the bus relative to the start of the CAS portion of the memory cycle. This is the time the CPU places write data on the bus or begins accepting read data from the bus.

\section*{mgbtcast}

Specifies the total unexpanded and unextended time of a CAS cycle. \(\overline{\mathrm{DOB}}, \overline{\mathrm{OE}}, \overline{\mathrm{EWE}}\) and \(\overline{\mathrm{LWE}}\) rise at this time unless modified by mgbteoe or mgbtewe. This
time value is also used during the slot check to compute the total time required for the bus cycle.
\section*{mgbtewea}

In a system with fast SRAM, \(\overline{E W E}\) fall at cycle start is required to have an adequate write enable. Other devices require their addresses to be valid before write enable falls; in these cases \(\overline{\mathrm{CAS}}\) low is required.
\section*{mgbtlwea}

Specifies a delay of zero or one 2X-CPU-clock cycle after \(\overline{\mathrm{DOB}}\) fall plus expansion for \(\overline{\mathrm{LWE}}\) fall. Expansion refers to the value of mgebtdobe or ioebtdobe, as appropriate. Allows adjustment for system and device delays. For example, DRAM expects data valid at its write-enable fall. In small systems \(\overline{\mathrm{DOB}}\) plus one 2X-CPU-clock cycle (with an expansion of zero) might be appropriate. In a large system with a heavily loaded (or buffered) \(\overline{\mathrm{LWE}}\), \(\overline{\mathrm{DOB}}\) might be appropriate for the fastest memory cycle. If a larger delay is required, an expansion value can be set. Allows resolution of one 2X-CPU-clock cycle in expansion timing.

\section*{mgbteoe}

If set, \(\overline{O E}\) rises one 2X-CPU-clock cycle before the end of the unextended CAS cycle. If clear, \(\overline{\mathrm{OE}}\) rises with the end of the unextended CAS cycle. One 2X-CPU-clock cycle is sufficient output-driver disable time for some devices; if not, output-driver disable time can be created in most increments of 2X-CPU-clock cycles by combining mgebtcase and mgbteoe.

\section*{mgbtewe}

If set, \(\overline{E W E}\) and \(\overline{L W E}\) rise one 2X-CPU-clock cycle before the end of the unextended CAS cycle. If clear, \(\overline{E W E}\) and \(\overline{L W E}\) rise with the end of the unextended CAS cycle. One 2X-CPU-clock cycle is sufficient hold time for some devices; if not, hold time can be created in most increments of 2X-CPU-clock cycles by combining mgebtcase and mgbtewe.

Figure 42. Memory Group 0-3 RAS Bus Timing Registers

Defines the timing for the RAS-prefix portion of a RAS memory cycle. Timing is specified separately for each memory group. The values are selected as required for the memory devices used. Timing values that refer to \(\overline{\text { RAS }}\) apply to RAS, \(\overline{\text { RAS } 0}, \overline{\text { RAS } 1}, \overline{\text { RAS } 2}\) and \(\overline{\text { RAS } 3}\), as appropriate.
\section*{mgbtrast}

Programmed to contain the sum of the decoded numbers of CPU-clock cycles represented in mgbtras and mgbtrhld, plus one. At the end of this time the CAS portion of the memory cycle begins. This value is used only during the slot check to compute the total time required for the bus cycle.
\section*{mgbtras}

Specifies the RAS precharge time, the time \(\overline{\text { RAS }}\) is high at the beginning of a RAS cycle. The time can be shortened with mgbteras.

\section*{mgbtrhld}

Specifies the row-address hold time of a RAS cycle, immediately preceding the CAS timing portion of the cycle. The time can be lengthened with mgbteras. Immediately following this time the CAS address is placed on the bus, if appropriate.

\section*{mgbteras}

If set, reduces the RAS precharge time (specified by mgbtras) and extends the row-address hold time (specified by mgbtrhld) by one 2X-CPU-clock cycle.

\section*{ioXebt I/O Channel 0-7 Extended Bus Timing Registers}

Register addresses: 340 io0ebt, 360 io1ebt, 380 io2ebt, 3A0 io3ebt, 3C0 io4ebt, 3E0 io5ebt, 400 io6ebt, 420 io7ebt.

\begin{tabular}{|l|l|c|}
\hline Mnemonic & Description & Reset \\
\hline & Reserved Zeros & \\
\hline ioebtsum & I/O channel extended bus timing sum, \(\{0,1,2, \ldots, 31\}\) clock cycles & [1f] \\
\hline ioebtdobe & I/O channel extended bus timing \(\overline{\mathrm{DOB}}\) expansion, \(\{0,1,2, \ldots, 15\}\) clock cycles & [0f] \\
\hline ioebtcase & I/O channel extended bus timing CAS extension, \(\{0,1,2,4\}\) clock cycles & [3] \\
\hline
\end{tabular}

Figure 43. I/O Channel 0-7 Extended Bus Timing Registers

These values compensate for signal propagation, turn-on, turn-off, device, and other delays in the memory and I/O systems. They are substituted for the memory group values, mgXebt, during I/O-channel transfers and thus must be sufficient for the I/O device, as well as for any memory group with which the I/O device will transfer.

\section*{ioebtsum}

Programmed to contain the sum of ioebtcase and ioebtdobe. This value is used only during the slot check to compute the total time required for the bus cycle.

\section*{ioebtdobe}

Expands the CAS cycle at \(\overline{\text { DOB }}\) fall by the specified time. This parameter is used to compensate for memory group buffer delays, device access time, and other operational requirements. If the bus cycle is a memory read cycle, \(\overline{O E}\) is expanded. If the bus cycle is a memory write cycle, \(\overline{E W E}\) is expanded and \(\overline{L W E}\) fall is delayed the specified time.
\section*{ioebtcase}

Extends the CAS cycle by the specified amount after the unextended CAS time. \(\overline{\mathrm{DOB}}, \overline{\mathrm{OE}}, \overline{\mathrm{EWE}}\) and \(\overline{\mathrm{LWE}}\) rise unextended. This parameter is used to allow for data hold times or to allow for devices to disable their output drivers. When used in combination with mgbtewe or mgbteoe, hold or disable times can be set in most increments of 2X-CPU-clock cycles.

Figure 44. Memory System Refresh Address

Contains the next address used for memory-system refresh. The values are placed on the specified pins when refresh executes, and msrra is incremented by one. The timing for a refresh cycle is set by msrtg, and the memory groups that are refreshed are selected by mgXrd.

\section*{440 vpudelay VPU Delay Counter Register}

Read-only. VPU Delay Counter.

Figure 45. VPU Delay Counter Register

Contains the number of CPU-clock cycles until the VPU seizes the bus. The counter is decremented once each CPU-clock cycle. The counter can be used, for example, to determine whether a time-critical task can be completed before the VPU seizes the bus, or to measure time in CPU-clock increments.

\section*{460 iodtta I/O Device Transfer Types A Register}

Figure 46. I/O Device Transfer Types A Register

\section*{480 iodttb I/O Device Transfer Types B Register}

Figure 47. I/O Device Transfer Types B Register

Specifies one of three transfer types for the device attached to the corresponding I/O channel.

- Four-Byte Byte-Transfer Type: Transfers four bytes of data, one byte at a time, between the device and memory in a single bus transaction. The transaction consists of four bus cycles accessing the device, plus one additional bus cycle to access memory if the memory is cell-wide. All initial transfer addresses are to cell boundaries.
- One-Byte Byte-Transfer Type: Transfers one byte of data between the device and memory in a single bus transaction. The transaction consists of a single bus cycle. Transfers to cell-wide memory are to byte zero of the addressed cell, with the remaining 24 bits undefined. Transfers to byte-wide memory are to the specified byte.
- One-Cell Cell-Transfer Type: Transfers one cell of data between the device and memory in a single bus transaction. The transaction consists of one bus cycle to access the device, plus four additional bus cycles to access memory if the memory is byte-wide. All initial transfers are to cell boundaries.
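The bus-cycle counts for the three transfer types can be captured in a small lookup. The numeric codes follow the Device Transfer Type column of Table 57 (0, 1, and 2); the enum and function names are hypothetical.

```c
#include <stdbool.h>

/* Device transfer type codes as listed in Table 57. */
enum xfer_type {
    XFER_FOUR_BYTE = 0,  /* Four-Byte Byte-Transfer Type */
    XFER_ONE_BYTE  = 1,  /* One-Byte Byte-Transfer Type */
    XFER_ONE_CELL  = 2   /* One-Cell Cell-Transfer Type */
};

/* Bus cycles in one I/O-channel transaction for the given memory width. */
static int xfer_bus_cycles(enum xfer_type t, bool memory_byte_wide)
{
    switch (t) {
    case XFER_FOUR_BYTE: return memory_byte_wide ? 4 : 5; /* +1 cycle for cell-wide memory */
    case XFER_ONE_BYTE:  return 1;
    case XFER_ONE_CELL:  return memory_byte_wide ? 5 : 1; /* +4 cycles for byte-wide memory */
    }
    return -1;                                            /* unknown code */
}
```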


\section*{Reserved Register Addresses}

4A0-780

Figure 48. Reserved Register Addresses

These addresses are reserved.


Figure 49. DMA Enable Expiration Register

Clears the corresponding DMA enable bit in iodmae after a DMA I/O-channel transfer is made to the last location in a 1024-byte memory page. This allows DMA on the corresponding I/O channel to be disabled after transferring a predetermined number of bytes. See Direct Memory Access Controller, page 103.
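Because expiration is tied to the last location of a 1024-byte page, a predetermined transfer count can be set up by choosing the start address. This sketch assumes DMA addresses increase with each transfer, that the page base is 1024-byte aligned, and that the byte count is a multiple of the bytes moved per transaction; the helper names are hypothetical.

```c
#include <stdint.h>

#define DMA_PAGE_BYTES 1024u

/* Start address within the page at page_base such that the transfer to
 * the page's last location completes after exactly nbytes (1-1024). */
static uint32_t dma_start_for_count(uint32_t page_base, uint32_t nbytes)
{
    return page_base + DMA_PAGE_BYTES - nbytes;
}

/* Bytes that will be transferred from addr up to the expiration point. */
static uint32_t dma_bytes_to_expiry(uint32_t addr)
{
    return DMA_PAGE_BYTES - (addr % DMA_PAGE_BYTES);
}
```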


\section*{7C0 drivers Driver Current Register}

Bit-field boundaries: 31-29, 28-26, 25-23, 22-20, 19-18, 17-16, and 15-0.

\begin{tabular}{|c|l|c|l|}
\hline \multicolumn{2}{|c|}{3-Bit Field} & \multicolumn{2}{c|}{2-Bit Field} \\
\hline \(00n\) & 1 of 3 drivers & \(0n\) & 1 of 3 drivers \\
\hline \(01n\) & 2 of 3 drivers & \(1n\) & 2 of 3 drivers \\
\hline \(11n\) & 3 of 3 drivers & & \\
\hline
\end{tabular}

Allows programming the relative amount of current available to drive the various signals out of the package. The programmed driver current has several effects:

- The amount of current selected determines the rise and fall times of the signals into a given load. The rise and fall times, PWB wire lengths, and PWB construction determine whether the signals must be treated as transmission lines, and whether signal terminations are required.
- The rise and fall of signals affects bus cycle timing, since signal switching consumes time. Slower rise and fall times might require a slower bus cycle.
- Greater driver current increases di/dt, and thus increases package and system electrical noise. Though total power consumption does not change when driver current is changed (since the same load is charged, just slower or faster), less noise is produced when di/dt is decreased. Reducing output-driver pre-driver current also reduces package and system electrical noise, and can thus facilitate approval of electromagnetic compliance for products.

Programmable drivers allow the system designer to trade among system design complexity, system cost, and system performance.

Output drivers consist of a pre-driver and an output driver. The current-supply capability of each part of the output driver can be programmed separately. The low bit of each field selects full- or half-drive capability on the pre-drivers for that set of signals. The upper one or two bits select 1/3-, 2/3-, or full-drive capability.

The pre-drivers are supplied by the core logic power, and the noise generated by their operation can affect the performance of the CPU in systems with an inadequate power supply or decoupling. In such systems, lowering pre-driver current can possibly compensate for system design flaws.

The drivers are on two separate power buses: one for AD and one for control signals and all other output pins. As a result, inside the package, electrical noise caused by AD driver switching is prevented from corrupting the quality of the control signals. This separation, however, does not preclude noise coupling between the power pins outside the package. Depending on system loading, the output drivers account for \(50 \%\) to \(95 \%\) of the power consumed by the CPU, and thus are a potentially large noise source.
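The 3-bit and 2-bit field encodings for the drivers register can be decoded as follows. This is a hypothetical sketch: it returns how many of the three output-driver sections are enabled and leaves the polarity of the pre-driver bit n uninterpreted, since the text does not say which value selects full drive. The function names are illustrative.

```c
/* Number of output-driver thirds (1-3) enabled by a drivers bit field of
 * the given width (3, otherwise treated as 2 bits); 0 for the unlisted
 * 3-bit code 10n. */
static int drivers_enabled(unsigned field, int width)
{
    unsigned upper = field >> 1;          /* drop the pre-driver bit n */
    if (width == 3) {
        switch (upper & 3u) {
        case 0u: return 1;                /* 00n: 1 of 3 drivers */
        case 1u: return 2;                /* 01n: 2 of 3 drivers */
        case 3u: return 3;                /* 11n: 3 of 3 drivers */
        default: return 0;                /* 10n: not listed */
        }
    }
    return (upper & 1u) ? 2 : 1;          /* 2-bit field: 0n or 1n */
}

/* The pre-driver select bit n (full/half drive; polarity not modeled). */
static unsigned predriver_bit(unsigned field)
{
    return field & 1u;
}
```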

\section*{7E0 vpureset VPU Reset Register}

Write: any write resets the VPU. Read: 0ffffffff while waiting to reset, zero otherwise.

Figure 51. VPU Reset Register

Writing any value causes the VPU to begin executing at its software-reset executable-code vector (location 0x00000010) at the end of the current memory cycle. This is the mechanism used to clear bit 31 in the VPU PC after hardware reset, and to direct the VPU to execute a new procedure. The value of the register is -1 during the VPU reset process (i.e., from the time vpureset is written until the VPU begins execution of the software-reset executable-code vector); otherwise, its value is zero.

Table 54. Bit Field to On-Chip Register Cross-Reference
\begin{tabular}{|c|c|c|c|c|c|}
\hline Bit field & Register & Bit field & Register & Bit field & Register \\
\hline addrv & drivers & ioXout_i & ioout & mmb & miscb \\
\hline bankxdrv & drivers & mfltaddr & mfltaddr & msexa31hac & miscc \\
\hline casbvras & vram & mfltdata & mfltdata & msexvhacr & miscc \\
\hline ctrladrv & drivers & mgbtcas & mgXcasbt & msgsm & msgsm \\
\hline ctrlbdrv & drivers & mgbtcast & mgXcasbt & mshacd & misca \\
\hline dsfvcas & vram & mgbtdob & mgXcasbt & mspwe & miscc \\
\hline dsfvras & vram & mgbteoe & mgXcasbt & msra31 & msra \\
\hline fdmap & miscb & mgbteras & mgXrasbt & msras31d & misca \\
\hline ioebtcase & ioXebt & mgbtewe & mgXcasbt & msrha & msra \\
\hline ioebtdobe & ioXebt & mgbtewea & mgXcasbt & msrra & msra \\
\hline ioebtsum & ioXebt & mgbtlwea & mgXcasbt & msrtg & misca \\
\hline vpudelay & vpudelay & mgbtras & mgXrasbt & mssbs & miscc \\
\hline vpureset & vpureset & mgbtrast & mgXrasbt & msvgrp & vram \\
\hline ioXdmae_i & iodmae & mgbtrhld & mgXrasbt & oed & miscb \\
\hline ioXdmaex & iodmaex & mgebtcase & mgXebt & oevras & vram \\
\hline ioxdtt & iodtta/b & mgebtdobe & mgXebt & outdrv & drivers \\
\hline ioXie_i & ioie & mgebtsum & mgXebt & pkgio & miscb \\
\hline ioXin_i & ioin & mgXbw & miscb & pkgmflt & miscc \\
\hline ioXip_i & ioip & mgXds & mgds & rasbcasbdrv & drivers \\
\hline ioXius_i & ioius & mgXrd & misca & wevras & vram \\
\hline
\end{tabular}

\section*{Bus Operation}

The MIF handles requests from all sources for access to the system bus. Requests arrive from the VPU, DMAC, and MPU and are prioritized in that order. This order ensures that the VPU always has predictable memory timing, that DMA has bus availability (because the MPU can saturate the bus), and that memory coherency is maintained for the MPU.

\section*{Operation}

To gain access to the bus, the bus address must be transferred to the MIF and a check made to see whether the bus is available for the time required to complete the bus transaction. The available bus time is called the slot, and the checking process is called the slot check. This bus request process takes two CPU-clock cycles at the beginning of each bus transaction. Memory-reference MPU and VPU instructions always overlap one cycle of instruction execution with the bus access process. DMA operation can overlap both cycles of the bus request process with a preceding MPU bus transaction. Thus, except for DMA overlapped with an MPU bus transaction, there are two CPU-clock cycles of no activity on the bus preceding each bus transaction. Instruction execution times listed include the bus request and programmed bus transaction time as part of the entire memory reference time.

The MIF must always grant the bus to the VPU immediately when requested in order to guarantee temporally deterministic VPU execution. To allow this, the VPU has exclusive access to the bus except when it is executing delay. When a DMA or MPU bus request is made, the MIF prioritizes the request (see Table 56), determines the type of bus transaction, computes the slot required (see Table 55), and compares this to vpudelay, the amount of time before the VPU seizes the bus. If vpudelay is zero, the VPU currently has the bus. If vpudelay is larger than the value computed for the bus transaction, the bus is granted to the requestor. Otherwise, the bus remains idle until a bus request occurs that can be satisfied, or until the VPU seizes the bus. Once a bus request has passed the slot check, the bus transaction begins on the next CPU-clock cycle.

Table 56. Bus Access Priorities
```
(Highest)
VPU
DMA:
  I/O Channel 0
  I/O Channel 1
  I/O Channel 2
  I/O Channel 3
  I/O Channel 4
  I/O Channel 5
  I/O Channel 6
  I/O Channel 7
MPU:
  Posted write
  Instruction pre-fetch
  Local-register stack spill or refill
  Operand stack spill or refill
  ld/st
  Instruction fetch
(Lowest)
```

The slot-check computation is an estimate, because for I/O-channel bus transactions ioXebt is used for all parts of the computation even though a mix of ioXebt and mgXebt times might be used during the transaction. The effect of this simplified computation is that the slot requested might be larger than the bus time actually used. The bus becomes available for use immediately when the actual bus transaction completes.

Table 55. Slot Check Computation

For MPU bus transactions:
\[
(\text{number of RAS cycles} \cdot \mathrm{mgbtrast}) + (\text{number of bus cycles} \cdot ((\mathrm{mgbtcast} + 1) + \mathrm{mgebtsum}))
\]
For I/O-channel bus transactions\(^{\dagger}\):
\[
(\text{number of RAS cycles} \cdot \mathrm{mgbtrast}) + (\text{number of bus cycles} \cdot ((\mathrm{mgbtcast} + 1) + \mathrm{ioebtsum}))
\]
Memory values are for the accessed memory group, and I/O-channel values are for the accessed I/O channel.
\(\dagger\) To simplify calculation, this value is an estimate of the actual required slot. See text.
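The Table 55 estimate and the grant decision can be written out directly. This sketch assumes all inputs are already decoded into CPU-clock cycles; the function names and the sample values in the usage note are hypothetical.

```c
#include <stdbool.h>

/* Table 55 slot estimate. ebtsum is mgebtsum for MPU transactions or
 * ioebtsum for I/O-channel transactions; all values are assumed to be in
 * CPU-clock cycles. */
static unsigned slot_required(unsigned ras_cycles, unsigned bus_cycles,
                              unsigned mgbtrast, unsigned mgbtcast,
                              unsigned ebtsum)
{
    return ras_cycles * mgbtrast + bus_cycles * ((mgbtcast + 1u) + ebtsum);
}

/* Grant only if vpudelay is larger than the slot. vpudelay == 0 (the VPU
 * owns the bus) can never pass, because the slot is never negative. */
static bool slot_check_passes(unsigned vpudelay, unsigned slot)
{
    return vpudelay > slot;
}
```

For example, with one RAS cycle, four bus cycles (a byte-wide group), mgbtrast = 5, mgbtcast = 3, and mgebtsum = 1, the estimate is 5 + 4 × (4 + 1) = 25 cycles, so the request is granted only if vpudelay is at least 26.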

The address lines out of the CPU are multiplexed to reduce package pin count and provide an easy interface to DRAM. DRAMs have their addresses split into two pieces: the upper-address bits, or row address, and the lower-address bits, or column address. The two pieces of the address are clocked into the DRAM with two corresponding clock signals: \(\overline{\mathrm{RAS}}\) and \(\overline{\mathrm{CAS}}\). During RAS and CAS times, AD[31:0] also output address bits above the DRAM row and column addresses, and during the last portion of each bus cycle, while \(\overline{D O B}\) is active, they carry data input or output. Bit outputs and bit inputs are also available on AD[7:0].

\section*{I/O Addressing}

All the address bits above the msgsm bits are referred to as the high address bits. These bits are typically used to address I/O devices with external decoding
hardware. They can be configured to be included in RAS-cycle determination, or excluded for faster I/O cycles, to match the requirements of the external decoding hardware. See System Requirements Programming, page 126, for the available configuration options.

\section*{Bus Transaction Types}

The CPU supports cell-wide and byte-wide memory, cell-wide and byte-wide devices, and single- or multi-bus-cycle transactions. Various combinations of these are allowed; they require one, four, or five bus cycles to complete the bus transaction, which can include zero, one, or two RAS cycles. The underlying structure of all bus cycles is the same. Depending on the programmed system configuration, device-memory combination, and current system state, the RAS-prefix and CAS parts of bus cycles are combined to provide correct address generation and memory device operation. Table 58, page 163, lists the combinations of RAS and CAS cycles that are possible within a given bus transaction.

\section*{MPU and VPU (non-xfer) Memory Cycles}

The MPU and the VPU can read and execute programs stored in cell-wide or byte-wide memory. The MPU can also read data from and write data to cell-wide and byte-wide memory. All accesses to cell-wide or byte-wide memory involve an entire cell. Accesses to cell-wide memory thus require one bus cycle, while accesses to byte-wide memory require four bus cycles.

Table 57. I/O-Channel Transfer Characteristics
\begin{tabular}{|c|c|c|c|c|c|}
\hline Device Width & Device Transfer Type\(^{\mathbf{1}}\) & Memory Width & Flyby\(^{\mathbf{2}}\) (F) / Buffered\(^{\mathbf{3}}\) (B) & Bus Cycles\(^{\mathbf{4}}\) & Bits Moved \\
\hline byte & 0 & byte & F & 4 & 32 \\
\hline byte & 0 & cell & B & 5 & 32 \\
\hline byte & 1 & byte & F & 1 & 8 \\
\hline byte & 1 & cell & F & 1 & 8 \\
\hline cell & 2 & byte & B & 5 & 32 \\
\hline cell & 2 & cell & F & 1 & 32 \\
\hline
\end{tabular}
1. Refers to the device transfer type specified in iodtta or iodttb.
2. Data is transferred directly between device and memory.
3. Data is stored in the MIF during part of the transfer.
4. The entire sequence of cycles is an atomic bus transaction.
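Because each row of Table 57 is fixed by the device transfer type and the memory width, the table can be held as a small lookup structure, for example in a simulator or a driver that estimates transfer costs. The type and function names below are hypothetical; the row values are copied from Table 57.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { WIDTH_BYTE, WIDTH_CELL } bus_width;

typedef struct {
    bus_width device_width;
    int       transfer_type;  /* value from iodtta/iodttb (note 1)        */
    bus_width memory_width;
    bool      buffered;       /* true = B (via the MIF), false = F (flyby) */
    int       bus_cycles;     /* cycles in the atomic bus transaction      */
    int       bits_moved;
} io_transfer_row;

static const io_transfer_row table57[] = {
    { WIDTH_BYTE, 0, WIDTH_BYTE, false, 4, 32 },
    { WIDTH_BYTE, 0, WIDTH_CELL, true,  5, 32 },
    { WIDTH_BYTE, 1, WIDTH_BYTE, false, 1,  8 },
    { WIDTH_BYTE, 1, WIDTH_CELL, false, 1,  8 },
    { WIDTH_CELL, 2, WIDTH_BYTE, true,  5, 32 },
    { WIDTH_CELL, 2, WIDTH_CELL, false, 1, 32 },
};

/* Returns the matching row, or NULL for a combination the table omits. */
static const io_transfer_row *lookup_transfer(int transfer_type,
                                              bus_width memory_width)
{
    for (size_t i = 0; i < sizeof table57 / sizeof table57[0]; i++) {
        if (table57[i].transfer_type == transfer_type &&
            table57[i].memory_width == memory_width)
            return &table57[i];
    }
    return NULL;
}
```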

Cell Memory Write from MPU
Cell Memory Read to MPU/VPU
Table 58 and the referenced figures provide details regarding these bus transactions. These transactions require one bus cycle.

Byte Memory Write from MPU
Byte Memory Read to MPU/VPU
Table 58 and the referenced figures provide details regarding these bus transactions. These transactions require four bus cycles. Byte address bits A1 and A0 are incremented from 0 to 3 to address the most-significant through the least-significant byte of the accessed cell.
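The byte ordering above (A1:A0 = 0 selects the most-significant byte) can be expressed as two helpers that a host-side tool, such as a hypothetical EPROM image builder, might use. The helper names are illustrative only.

```c
#include <stdint.h>

/* Assemble a cell from the bytes read at A1:A0 = 0, 1, 2, 3 (MSB first). */
static uint32_t cell_from_bytes(const uint8_t b[4])
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* Split a cell into the bytes written at A1:A0 = 0, 1, 2, 3 (MSB first). */
static void bytes_from_cell(uint32_t cell, uint8_t b[4])
{
    for (int i = 0; i < 4; i++)
        b[i] = (uint8_t)(cell >> (24 - 8 * i));
}
```

With this ordering, the cell 0x000000a5 places 0xa5 at byte offset 3 of the cell.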

\section*{I/O-Channel Transfers}

Depending on the device transfer type and memory device width, a variety of bus cycle combinations occur between I/O devices and memory, as shown in Table 57. The starting address for the transaction comes from the global register that corresponds to the I/O channel involved (g8 corresponds to I/O channel 0, ..., and g15 corresponds to I/O channel 7). The direction of the transfer relative to memory is indicated by bit one of the same register; see Figure 16, page 104. The device transfer type for the transaction comes from the corresponding field in iodtta or iodttb. The bus transaction proceeds with the cycles and strobes listed in Table 58.
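The register-to-channel mapping and the fields taken from a channel's global register can be sketched as follows. The function names are hypothetical. Which value of bit one means "to memory" is defined in Figure 16 and is not assumed here. Clearing the low two bits to obtain the cell address follows the later statement that the lowest two bits of a global register are unavailable for addressing.

```c
#include <stdint.h>

/* g8 corresponds to I/O channel 0, ..., g15 to I/O channel 7.
 * Returns -1 for a register that is not an I/O-channel register. */
static int io_channel_for_global(unsigned g)
{
    return (g >= 8u && g <= 15u) ? (int)(g - 8u) : -1;
}

/* Bit one of the channel's global register gives the transfer direction
 * relative to memory (meaning of 0 and 1: see Figure 16). */
static unsigned io_direction_bit(uint32_t greg)
{
    return (unsigned)((greg >> 1) & 1u);
}

/* Cell address of the transfer: the lowest two bits are not address bits. */
static uint32_t io_cell_address(uint32_t greg)
{
    return greg & ~(uint32_t)3u;
}
```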

Cell Memory Write from Four-byte Byte-transfer Device
Table 58 and the referenced figure provide details regarding the bus transaction. The transaction requires five bus cycles. Data is collected from the device and stored in the MIF during the first four bus cycles, and is written to memory by the MIF during the fifth bus cycle. Any data written to memory while it is being collected from the device during the first four bus cycles is replaced during the fifth bus cycle. A31 is cleared to deselect the I/O device in order to prevent contention with the MIF during the fifth bus cycle. Byte address bits A1 and A0 are incremented from 0 to 3 to address the most-significant through the least-significant byte of the accessed cell while the data is being transferred from the device.
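The per-cycle address behavior of this five-cycle transaction can be written down as data, for instance to cross-check a logic-analyzer capture against Table 58. This is a hypothetical sketch that records only what the text states (A31 and A1:A0 for each cycle, and who supplies the data); strobe timing and the A1:A0 value during the fifth cycle are not modeled.

```c
#include <stdbool.h>

typedef struct {
    bool a31;          /* A31 as driven; forced to zero on the fifth cycle     */
    int  byte_index;   /* A1:A0 while bytes come from the device; -1 = unknown */
    bool mif_drives;   /* true when the MIF, not the device, supplies the data */
} bus_cycle_desc;

/* Cell Memory Write from Four-byte Byte-transfer Device.
 * a31_prog is the program-supplied A31 value (set selects the device). */
static void plan_cell_write_from_byte_device(bool a31_prog, bus_cycle_desc out[5])
{
    for (int i = 0; i < 4; i++) {   /* cycles 1-4: collect bytes, MSB first */
        out[i].a31 = a31_prog;
        out[i].byte_index = i;
        out[i].mif_drives = false;
    }
    out[4].a31 = false;             /* cycle 5: A31 cleared, device deselected */
    out[4].byte_index = -1;
    out[4].mif_drives = true;       /* MIF writes the assembled cell to memory */
}
```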

Cell Memory Read to Four-byte Byte-transfer Device
Table 58 and the referenced figure provide details regarding the bus transaction. The transaction requires five bus cycles. Data is collected from memory and stored in the MIF during the first bus cycle, and is written to the device by the MIF during the last four bus cycles. \(\overline{\mathrm{OE}}\) is suppressed during the last four bus cycles to prevent bus contention between memory and the MIF while the device is written. A31 is cleared to deselect the I/O device in order to prevent contention with memory during the first bus cycle. Byte address bits A1 and A0 are incremented from 0 to 3 to address the most-significant through the least-significant byte of the accessed cell while the data is being transferred to the device.

Byte Memory Write from Four-byte Byte-transfer Device
Table 58 and the referenced figure provide details regarding the bus transaction. The transaction requires four bus cycles. Byte address bits A1 and A0 are incremented from 0 to 3 to address the most-significant through the least-significant byte of the accessed cell on both the device and memory. The data is transferred on the bus directly from the device to memory without the intervention of the MIF.

Byte Memory Read to Four-byte Byte-transfer Device
Table 58 and the referenced figure provide details regarding the bus transaction. The transaction requires four bus cycles. Byte address bits A1 and A0 are incremented from 0 to 3 to address the most-significant through the least-significant byte of the accessed cell on both the device and memory. The data is transferred on the bus directly from memory to the device without the intervention of the MIF.

Cell Memory Write from One-byte Byte-transfer Device
Table 58 and the referenced figure provide details regarding the bus transaction. The transaction requires one bus cycle. Data is typically supplied by the device on AD[7:0] and is written to the corresponding bits in memory. AD[31:8] are also written to memory and, if not driven by an external device, still hold the CAS address bits.

Cell Memory Read to One-byte Byte-transfer Device
Table 58 and the referenced figure provide details regarding the bus transaction. The transaction requires one bus cycle. Data is typically taken by the device from AD[7:0], which come from the corresponding bits in memory. The other memory bits are driven by memory but are typically unused by the device.

Byte Memory Write from One-byte Byte-transfer Device
Table 58 and the referenced figure provide details regarding the bus transaction. The transaction requires one bus cycle. Addresses in the global registers normally address cells because the lowest two bits are unavailable for addressing. For this transaction, however, the address in the global register is a modified byte address: it is shifted left two bits (pre-shifted in software) to be correctly positioned for the byte-wide memory connected to AD, and it is not shifted again before reaching AD. A31 remains in place, A30 and A29 become unavailable, and the group bits lie two bits to the right of their normal position because of the pre-shifting in the supplied address. This transaction allows bytes to be transferred, one byte per bus transaction, and packed into byte-wide memory.

Byte Memory Read to One-byte Byte-transfer Device
Table 58 and the referenced figure provide details regarding the bus transaction. The transaction requires one bus cycle. Addresses in the global registers normally address cells because the lowest two bits are unavailable for addressing. For this transaction, however, the address in the global register is a modified byte address: it is shifted left two bits (pre-shifted in software) to be correctly positioned for the byte-wide memory connected to AD, and it is not shifted again before reaching AD. A31 remains in place, A30 and A29 become unavailable, and the group bits lie two bits to the right of their normal position because of the pre-shifting in the supplied address. This transaction allows bytes to be transferred, one byte per bus transaction, and unpacked from byte-wide memory to a device.

Cell Memory Write from One-cell Cell-transfer Device
Table 58 and the referenced figure provide details regarding the bus transaction. The transaction requires one bus cycle.

Cell Memory Read to One-cell Cell-transfer Device
Table 58 and the referenced figure provide details regarding the bus transaction. The transaction requires one bus cycle.

Byte Memory Write from One-cell Cell-transfer Device
Table 58 and the referenced figure provide details regarding the bus transaction. The transaction requires five bus cycles. Data is collected from the device and stored in the MIF during the first bus cycle, and is written to memory by the MIF during the last four bus cycles. Any data written to memory while it is being collected from the device during the first bus cycle is replaced during the second cycle. A31 is cleared to deselect the I/O device in order to prevent contention with the MIF during the last four bus cycles. Byte address bits A1 and A0 are incremented from 0 to 3 to address the most-significant through the least-significant byte of the accessed cell while the data is being transferred from the MIF to memory.

Byte Memory Read to One-cell Cell-transfer Device
Table 58 and the referenced figure provide details regarding the bus transaction. The transaction requires five bus cycles. Data is collected from memory and stored in the MIF during the first four bus cycles, and is written to the device by the MIF during the last bus cycle. \(\overline{\mathrm{OE}}\) is suppressed during the fifth bus cycle to prevent bus contention between memory and the MIF while the device is written. A31 is cleared to deselect the I/O device in order to prevent contention with memory during the first four bus cycles. Byte address bits A1 and A0 are incremented from 0 to 3 to address the most-significant through the least-significant byte of the accessed cell while the data is being transferred from memory to the MIF.


\section*{Bus Reset}

External hardware reset initializes the entire CPU to the power-on configuration, except for the power_fail bit in mode. While the reset is active (external or power-on self-reset), AD[31:0] go to a high-impedance state, OUT[7:0] go high, the \(\overline{\mathrm{RAS}}\) outputs go active, and all other outputs go inactive. See Figure 73, page 205, for waveforms.

\section*{Video RAM Support}

VRAMs increase the speed of graphics operations primarily by greatly reducing the system memory bandwidth required to display pixels on the video display. A VRAM command transfers an entire row of data from the DRAM array to an internal serial-access memory, from which it is clocked out to the video display. VRAMs also support other commands to enhance graphics operations. The VRAM operations are encoded by writing vram and performing an appropriate read or write to the desired VRAM memory address. Basic timing for VRAM bus cycles is the same as for any similar bus transaction in that memory group; see Figure 32, page 137. Refresh and RAS cycles might also affect VRAM operations; see Video RAM Support, page 125. Waveforms representing the effects of the various vram options are on page 215.

\section*{Virtual-Memory Page Faults Input}

The MIF detects memory page faults caused by MPU memory accesses by integrating fault detection with RAS cycles. The mapped page size is thus the size of the CAS page. The memory system RAS page address is mapped from a logical page address to a physical page address during RAS precharge through the use of an external SRAM. A memory fault signal supplied from the SRAM is sampled during \(\overline{\mathrm{RAS}}\) fall and, if low, indicates that a memory page fault has occurred. See Figure 52. The memory fault signal is input from \(\overline{\mathrm{MFLT}}\) or AD8. See Alternative Memory Fault Input, below.

When a memory fault is detected, the bus transaction completes without any of the signals that normally go active during the CAS part of the bus cycle. A memory fault exception is then signaled to the MPU, which executes a trap to service the fault condition. See Figure 77, page 212, for waveforms.
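A software model of the external mapping SRAM can help when testing a fault-handling trap in a simulator. The sketch below is hypothetical: the CAS-page size (`COL_BITS`), the number of mapped pages (`PAGE_COUNT`), and the table layout are assumptions, and an out-of-range page is treated as a fault by choice. It models only the outcome sampled at RAS fall: either a translated physical address, or a fault with the memory fault signal low.

```c
#include <stdbool.h>
#include <stdint.h>

#define COL_BITS   10u  /* hypothetical CAS-page size: 1024 locations */
#define PAGE_COUNT 16u  /* hypothetical number of mapped RAS pages    */

typedef struct {
    uint32_t phys_page;  /* physical RAS page supplied by the SRAM    */
    bool     mflt_n;     /* memory fault signal: low (false) = fault  */
} map_entry;

/* Translate a logical address; returns false on a page fault, in which
 * case the CAS part of the bus cycle would not occur. */
static bool map_address(const map_entry map[PAGE_COUNT], uint32_t logical,
                        uint32_t *physical)
{
    uint32_t page   = logical >> COL_BITS;
    uint32_t offset = logical & ((1u << COL_BITS) - 1u);
    if (page >= PAGE_COUNT || !map[page].mflt_n)
        return false;
    *physical = (map[page].phys_page << COL_BITS) | offset;
    return true;
}
```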


Figure 52. Virtual-Memory Page Mapping Logic

\section*{Alternate Inputs and Outputs}

The bit inputs, bit outputs, memory fault input, and reset input can be multiplexed on AD rather than using the dedicated pins. This feature can be used to reduce the number of tracks routed on the PWB (reducing PWB size and cost), and can allow the PSC1000 CPU to be supplied in smaller packages. See Figure 81, page 218, for waveforms.

Alternative Bit Inputs
The bit inputs can be sampled either from \(\overline{\mathrm{IN}}[7:0]\) or from AD[7:0] while \(\overline{\mathrm{RAS}}\) is low and \(\overline{\mathrm{CAS}}\) is high. The source is determined by pkgio. See Figure 34, page 140, and Bit Inputs, page 111.

Alternative Bit Outputs
The bit outputs appear both on OUT[7:0] and on AD[7:0] while \(\overline{\mathrm{RAS}}\) is high. Because they appear in both places, no selection bit is required. See Bit Outputs, page 115.

Alternative Memory Fault Input
The memory fault signal can be sampled either from \(\overline{\mathrm{MFLT}}\) or from AD8 during \(\overline{\mathrm{RAS}}\) fall. The source is determined by pkgmflt. See Figure 39, page 145.

Alternative Reset Input
External hardware reset can be taken either from \(\overline{\mathrm{RESET}}\) or from AD8; the determination is made at power-on. The power-on and reset sequence is described in detail in Processor Startup, page 181.


Table 58. RAS/CAS Bus Transactions


\section*{Notes:}
1. I/O-channel transfer type in iodtta and iodttb.
2. VPU does not write to memory.
3. Indicates on which bus cycle RAS or CAS cycles are possible. Presence of a RAS cycle depends on system conditions. \(R_{I}\) or \(C_{I}\) indicates that the bus cycle uses ioXebt timing values; \(R_{M}\) or \(C_{M}\) indicates that the bus cycle uses mgXebt timing values.
4. Active strobe during cycle (w is \(\overline{\mathrm{EWE}}/\mathrm{LWE}\), o is \(\overline{\mathrm{OE}}\), - is no active strobe).
5. A31 selects the I/O device when set, deselects the I/O device when clear (\(a=\) program-supplied value, \(0=\) forced to zero).
6. Data is collected from the device and stored in the MIF during the first four cycles, and is written to memory by the MIF during the fifth cycle. Data written during the first four cycles is replaced during the fifth cycle.
7. Data is collected from memory into the MIF during the first cycle and written to the device by the MIF during the last four cycles. \(\overline{\mathrm{OE}}\) is suppressed during the last four cycles to prevent memory from driving the bus.
8. Data is collected from the device and stored in the MIF during the first cycle, and is written to memory by the MIF during the last four cycles. Data written to memory during the first cycle is replaced during the second cycle.
9. Data is collected from memory into the MIF during the first four cycles, and is written to the device by the MIF during the last cycle. \(\overline{\mathrm{OE}}\) is suppressed on the fifth cycle to prevent memory from driving the bus.



Figure 53. Cell Memory Write from MPU



Figure 55. Byte Memory Write from MPU




Figure 57. Cell Memory Write from Four-byte Byte-transfer Device




Figure 59. Byte Memory Write from Four-byte Byte-transfer Device



Figure 61. Cell Memory Write from One-byte Byte-transfer Device



Figure 63. Byte Memory Write from One-byte Byte-transfer Device



Figure 65. Cell Memory Write from One-cell Cell-transfer Device



Figure 67. Byte Memory Write from One-cell Cell-transfer Device



Figure 68. Byte Memory Read to One-cell Cell-transfer Device


\section*{Processor Startup}

\section*{Power-on Reset}

The CPU self-resets on power-up (see Reset Process, below). The CPU contains an internal circuit that holds internal reset active and keeps the processor from running, regardless of the state of the external hardware reset, until the supply voltage reaches approximately 3 V. Once the supply reaches 3 V, \(\overline{\mathrm{RESET}}\) is sampled and, if active, is used as the source of external reset for the CPU. Otherwise, external reset is multiplexed on AD8. This determination applies until power is cycled again. If one of the resets is active, the CPU waits until that reset goes inactive before continuing. If neither reset source is active, the processor immediately begins the reset sequence. The clock input at CLK, therefore, must be stable before that time.
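The reset-source decision just described can be modeled as a short routine, for example in a board-level simulation. The sample arrays, enum, and function name are hypothetical; each array entry stands for one observation of a reset source, starting at the moment the supply first reaches about 3 V.

```c
#include <stdbool.h>

typedef enum { RESET_FROM_PIN, RESET_FROM_AD8 } reset_source;

/* pin_active[i] / ad8_active[i]: hypothetical samples of the two reset
 * sources (true = reset asserted). Returns the sample index at which the
 * reset sequence starts, or -1 if the selected source stays active. */
static int reset_sequence_start(const bool pin_active[], const bool ad8_active[],
                                int n, reset_source *src)
{
    if (n <= 0)
        return -1;
    /* RESET is sampled once at about 3 V; the choice holds until power cycles. */
    *src = pin_active[0] ? RESET_FROM_PIN : RESET_FROM_AD8;
    for (int i = 0; i < n; i++) {
        bool active = (*src == RESET_FROM_PIN) ? pin_active[i] : ad8_active[i];
        if (!active)
            return i;   /* selected reset is inactive: begin the sequence */
    }
    return -1;
}
```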

During the power-on-reset process, the mode bit power_fail is set to indicate that the power had previously failed. The bit is cleared by any write to mode.

\section*{Boot Memory}

The CPU supports booting from byte-wide memory that is configured as either an \(\overline{\mathrm{OE}}\)-activated or a boot-only memory device. The boot-only memory configuration is primarily used to keep the typically slow boot EPROMs out of the heavily used low-address memory pages.

Boot-only memory is distinct from \(\overline{\mathrm{OE}}\)-activated memory in that it is wired into the system to place data on the bus without the use of \(\overline{\mathrm{OE}}\) or memory bank- or group-specific (\(\overline{\mathrm{RASx}}\) or \(\overline{\mathrm{CASx}}\)) signals. OED is initially set during a CPU reset to disable \(\overline{\mathrm{OE}}\) during the boot-up process, which allows this operation. The boot-only memory select signal is externally decoded from the uppermost address bits, which hold the pattern 0x800...; the number of uppermost address bits used depends on the system's I/O device address decoding requirements. The lowest address bits are connected so as to address individual bytes and cells as they are for a normal memory. Thus the boot-only memory device can be selected regardless of which memory group is accessed.
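The select decode can be modeled as a comparison on the top `n` address bits, where `n` stands for the system-dependent number of uppermost bits mentioned above. The function name and the parameter are hypothetical; the sketch only checks that the top bits read 1 followed by zeros, so the lower bits (including memory-group bits) do not affect the result.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when the top n address bits (n in 1..32) match 0x800... */
static bool boot_only_selected(uint32_t addr, unsigned n)
{
    if (n == 0u || n > 32u)
        return false;                     /* outside the modeled range */
    uint32_t top = addr >> (32u - n);     /* keep only the uppermost n bits */
    return top == (1u << (n - 1u));       /* 1 followed by n-1 zeros */
}
```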

\section*{Reset Process}

When reset occurs, the CPU leaves on-chip RAM uninitialized and clears most registers to zero, except for strategically placed bits that assist in the reset sequence. Specifically, the CPU resets to the most conservative system configuration; see Table 59. The mode bit power_fail is set only by the power-on-reset process and can be checked to determine whether the reset was caused by a power failure or by external reset going active.

The first bus transaction after reset is a cell read of four bytes from byte-wide memory in memory group zero, memory bank zero, starting at address 0x80000000, with \(\overline{\mathrm{OE}}\) disabled, in SMB mode. This address consists of I/O device address 0x800... and memory device address 0x...0. Because \(\overline{\mathrm{OE}}\) is disabled, \(\overline{\mathrm{OE}}\)-activated memory does not respond, thus allowing a boot-only memory to respond.

The CPU tests the byte returned from address 0x80000003. If the byte is 0xa5, a boot-only memory responded, and execution continues with \(\overline{\mathrm{OE}}\) disabled. Otherwise, a boot-only memory did not respond, and the CPU assumes booting occurs from \(\overline{\mathrm{OE}}\)-activated memory. The CPU then clears OED to activate \(\overline{\mathrm{OE}}\) so that this memory responds on subsequent bus cycles.
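The CPU performs this check in hardware; the model below is only a host-side illustration, for example for a simulator or a boot-image test. The names are hypothetical, and what an undriven bus returns when no boot-only memory answers depends on the board, so the all-ones value used in testing is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

#define BOOT_SIGNATURE 0xA5u   /* expected at 0x80000003 from boot-only memory */

typedef enum { BOOT_FROM_BOOT_ONLY, BOOT_FROM_OE_ACTIVATED } boot_kind;

/* first_cell[0..3]: bytes read from 0x80000000..0x80000003 with OE disabled.
 * Sets *oed to the resulting OED state (true = OE stays disabled). */
static boot_kind detect_boot_kind(const uint8_t first_cell[4], bool *oed)
{
    if (first_cell[3] == BOOT_SIGNATURE) {
        *oed = true;                 /* boot-only memory answered */
        return BOOT_FROM_BOOT_ONLY;
    }
    *oed = false;                    /* clear OED so OE-activated memory responds */
    return BOOT_FROM_OE_ACTIVATED;
}
```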

\section*{Bootstrap Programs}

With either boot-only or \(\overline{\mathrm{OE}}\)-activated memory, bus accesses continue in SMB mode from the byte-wide memory device. The second bus transaction is to the hardware reset address for the VPU, 0x80000004. This location typically contains a jump to a small refresh/delay loop. The delay makes the bus available and allows the MPU to begin executing at its reset address, 0x80000008. The programmer must ensure that the delay value programmed in the VPU is long enough to let the MPU onto the bus, given the very slow byte-wide bus transactions that are the default after reset.

If the system is wired in MMB mode, booting is simpler from a boot-only memory. Booting from \(\overline{\mathrm{OE}}\)-activated memory is also possible, but requires external gating to prevent bank zero of memory groups one, two, and three from being selected when memory group zero is accessed.

Next, the MPU begins executing and typically is programmed to branch to the system bootstrap routine. The MPU bootstrap is programmed to:
- set the configuration registers required for the system hardware,
- set the software reset vector for the VPU,
- copy the initial MPU and VPU application programs from the boot device into memory (if required),
- branch to the application program for the MPU, and
- reset the VPU in software to begin VPU program execution (if required).

System startup is now complete.

The following pages describe several startup configurations. For actual code, see Example PSC1000 CPU System, page 187. The configurations described below are:
- Boot from byte-wide boot-only memory and copy the application program to cell-wide R/W memory.
- Boot from cell-wide boot-only memory and copy the application program to cell-wide R/W memory.
- Boot and run from byte-wide memory.
- Boot and run from cell-wide memory.

Boot from Byte-Wide Boot-Only Memory and Copy the Application Program to Cell-Wide R/W Memory
This process requires external decoding hardware to cause the boot-only memory to activate as previously described.

To indicate that boot-only memory is present, the memory must have 0xa5 at location 0x80000003 (typically 0x000000a5 in the cell at 0x80000000). This signature byte must be detected at startup to continue the boot process from a boot-only memory.

Construct the boot program execution sequence to be as follows:
1. The VPU executes JUMP from its power-on-reset location to code that performs eight RAS cycles on each memory group (by performing refresh cycles) to initialize system DRAM. It then enters a micro-loop that includes refresh for DRAM, and delay to allow the MPU to execute. The micro-loop repeats refresh and delay, and eliminates VPU bus accesses for further instructions during configuration. delay allows the MPU's bus accesses to begin configuring the system before more refresh cycles are required. The refresh cycles are not required if the system does not contain DRAM.
2. The MPU executes br from its reset location to the program code to configure the system. The br target address should contain bits that address memory group three. This later allows the configuration for memory group three to be used for boot-only device access timing while memory groups zero, one, and two are programmed for the system timing requirements. Although memory group one or two could be used instead of three in the manner described herein, only memory group three is discussed for simplicity.

The MPU configuration program code must be arranged to hold off instruction pre-fetch so that the configurations of the current memory group and the global memory system are not changed during a bus cycle. See the supplied example boot code on page 191.
3. When programming miscb, set mmb if required. In systems wired for MMB mode, this allows RAS-type cycles to occur properly on all memory groups.
4. Set msgsm to define four memory groups, even if the system ultimately does not have them. During the next instruction fetch the boot-only memory is again selected. However, the address bits for memory group three placed in the PC by br in step 2 cause the configuration for memory group three to be used.
5. Program the timing of memory group three to optimize access to the boot-only memory. Then program the remainder of the system configuration. During this process the VPU typically performs three or so sets of refresh cycles. Although the MPU might change pertinent configuration registers during the refresh cycles, this is very unlikely due to the long bus cycle times of the EPROMs typically used for boot-only memory. Further, the worst result is inappropriate timing on a single refresh cycle, which is of little actual consequence since there is no data yet in DRAM to be protected.

If memory group three is used by the application, it must be configured later from the loaded application code.
6. Read the final boot code (if any) and the application program from the boot-only memory and write them to the appropriate locations in R/W memory. The entire application program can be loaded into R/W RAM , except for that part, if any, that is destined for memory group three, where the boot-only memory is running. This must be copied by the application once it is running.
7. Lay out a single instruction group that contains programming to clear OED and to branch to the application program. Using br [] clears A31 so that the boot-only memory does not activate at the branch destination.
8. Now the application program is executing. Configure memory group three, if required. If loading memory group three from the boot-only memory is necessary, arrange the code in two instruction groups to first ensure pre-fetch is complete, then set OED, then execute a micro-loop to transfer the application to memory group three, and finally clear OED to re-enable \(\overline{\mathrm{OE}}\) when the micro-loop completes.
9. Reset the VPU in software to begin execution of its application program. A software reset of the VPU causes it to begin executing at 0x10 and, as a result, clears A31 from the VPU PC, so the boot-only memory is no longer selected.

The boot process is complete.
Boot from Cell-Wide Boot-Only Memory and Copy the Application Program to Cell-Wide R/W Memory
This process requires external decoding hardware to cause the boot-only memory to activate as previously described.

The CPU always initially boots from byte-wide memory, since this is the reset configuration. The CPU executes instructions from the low byte of each address until the configuration for the current memory group is programmed to be cell-wide. Up to this point, the upper 24 bits of the boot-device data are unused. The boot process is otherwise the same as booting from byte-wide boot-only memory, except that at step 3, when writing miscb, also set memory groups zero and three to be cell-wide. In the instruction group with the sto to miscb, place a br to the next instruction group. This holds off pre-fetch so that the next instruction fetch is cell-wide. Note that the boot program must be carefully constructed so that the instructions before the br are represented as byte-wide and those after the br are represented as cell-wide. The Patriot linker has a section directive, CELLBOOT, to create the appropriate initial section.

Boot and Run from Byte-Wide Memory
This process requires the boot/run memory device to be activated by MGS0/RAS0/CAS0. A31 is not used when selecting the boot/run memory.

To indicate that \(\overline{\mathrm{OE}}\)-activated memory is present, the memory must not respond with 0xa5 at location 0x80000003 when \(\overline{\mathrm{OE}}\) is not asserted. The lack of this signature byte is detected at startup to indicate that \(\overline{\mathrm{OE}}\) is required to continue the boot process. OED is set during a CPU reset to disable \(\overline{\mathrm{OE}}\) during the boot-up process, and is cleared when the signature byte 0xa5 is not detected, re-enabling \(\overline{\mathrm{OE}}\).

Construct the boot program execution sequence to be as follows:
1. The VPU executes JUMP from its power-on-reset location to code that performs eight RAS cycles on each memory group (by performing refresh cycles) to initialize system DRAM. It then enters a micro-loop that includes refresh for DRAM, and delay to allow the MPU to execute. The micro-loop repeats refresh and delay, and eliminates VPU bus accesses for further instructions during configuration. delay allows the MPU's bus accesses to begin configuring the system before more refresh cycles are required. The refresh cycles are not required if the system does not contain DRAM.

2. The MPU executes br from its reset location to the program code to configure the system.

The MPU configuration program code must be arranged to hold off instruction pre-fetch so that the configurations of the current memory group and the global memory system are not changed during a bus cycle. See the supplied example boot code on page 191.

3. When programming miscb, set mmb if required. In systems wired for MMB mode, this allows RAS-type cycles to occur properly on all memory groups.

4. Program the timing of memory group zero to optimize access to the memory. Then program the remainder of the system configuration. During this process the VPU typically performs three or so sets of refresh cycles. Although the MPU might change pertinent configuration registers during a refresh cycle, this is very unlikely due to the long bus cycle times of EPROMs. Further, the worst result is inappropriate timing on a single refresh cycle, which is of little actual consequence since there is no data yet in DRAM to be protected.

5. Reset the VPU in software to begin execution of its application program, if needed. A software reset of the VPU causes it to begin executing at 0x10 and, as a result, clears A31 from the VPU PC.

6. Begin execution of the application program.

The boot process is complete.

Boot and Run from Cell-Wide Memory
This process requires the boot/run memory device to be activated by MGS0/RAS0/CAS0. A31 is not used when selecting the boot/run memory.

The CPU always initially boots from byte-wide memory, since this is the reset configuration. The CPU executes instructions from the low byte of each address until the configuration for the current memory group is programmed to be cell-wide. Up to this point, the upper 24 bits of the boot-device data are unused. The boot process is otherwise the same as booting and running from byte-wide memory, except that at step 3, when writing miscb, also set memory group zero to be cell-wide. In the instruction group with the sto to miscb, place a br to the next instruction group. This holds off pre-fetch so that the next instruction fetch is cell-wide. Note that the boot program must be carefully constructed so that the instructions before the br are represented as byte-wide and those after the br are represented as cell-wide. The Patriot linker has a section directive, CELLBOOT, to create the appropriate initial section.

\section*{Stack Initialization}

After CPU reset, both MPU stacks are uninitialized until the corresponding stack pointers are loaded. Loading them should be one of the first operations the MPU performs.

After a reset, the operand stack is abnormally empty. That is, s2 has not been allocated; it is allocated on the first push operation. However, popping this item leaves the stack empty and requires a refill. The first pushed item should therefore be left on the stack, or sa should be initialized, before the operand stack is used further.


Table 59. System Configuration after CPU Reset



\section*{Example Systems}


\section*{Example PSC1000 CPU Systems}

\section*{Example System 1}

Figure 69 depicts a minimal system with an 8-bit-wide EPROM in memory group zero and 256K of 8-bit-wide DRAM in memory group one. Memory groups zero and one must be configured with timing appropriate for the devices used, and mg1ds must be set to 0x02 (256K DRAM). Otherwise, the default system configuration is suitable. The system can boot and run directly from the EPROM or, since EPROMs are generally slower than DRAM, can copy the EPROM contents into DRAM for faster code execution.

\section*{Example System 2}

Figure 70 depicts a minimal system with 32-bit-wide DRAM in memory group zero, an 8-bit-wide EPROM as a boot-only memory device, and an I/O address decoder. The I/O address decoding is performed by a 74HC137, a 3-to-8 decoder with latch. The decoder is wired to supply four device selects when A31 is set and another four when A31 is clear. Each set of four selects is latched during RAS precharge and enabled while \(\overline{\mathrm{CAS}}\) is active. The selects are decoded from A30 and A29 when a 32-bit-wide memory group is involved, and from A28 and A27 when an 8-bit-wide memory group is involved. The device select with A31 set and the other decoded address bits clear selects the EPROM as a boot-only memory device.
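As a rough illustration of the decode described above, the sketch below computes which of the eight decoder outputs an address would drive. It is a hypothetical model: the function name `io_select`, the assumption that A31 feeds the decoder's most-significant select input, and the treatment of `addr` as the value seen on the address lines (without modeling how the CPU places a program address on those lines for a byte-wide group) are illustrative assumptions, not details taken from Figure 70.

```python
# Hypothetical model of the Example System 2 I/O select decode.
EPROM_SELECT = 0b100  # A31 set, decoded address bits clear


def io_select(addr: int, byte_wide: bool) -> int:
    """Return the decoder output index (0-7) implied by the address bits.

    Assumes A31 drives the decoder's most-significant select input and the
    two decoded bits drive the lower inputs: A30:A29 for a 32-bit-wide
    memory group, A28:A27 for an 8-bit-wide memory group.
    """
    a31 = (addr >> 31) & 1
    shift = 27 if byte_wide else 29
    field = (addr >> shift) & 0b11
    return (a31 << 2) | field


# A31 set with the decoded bits clear selects the boot EPROM in either mode.
assert io_select(0x80000000, byte_wide=False) == EPROM_SELECT
assert io_select(0x80000000, byte_wide=True) == EPROM_SELECT
```

Because the decoded bit pair moves from A30:A29 to A28:A27 for a byte-wide group, the same address can map to different selects depending on the group width: under these assumptions 0xA0000000 gives output 5 for a cell-wide group but output 4 for a byte-wide one.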

The EPROM must be programmed with 0xa5 at location 0x80000003 (typically 0x000000a5 at location 0x80000000). Memory group zero must be configured with timing appropriate for the devices used, mg0bw set to zero (cell-wide), and mg0ds set to 0x02 (256K DRAM). Since RAS is used to latch the I/O address, msras31d, mshacd and msexa31hac must remain in their default (clear) configuration.
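The two forms of the boot flag given above agree only if the most-significant byte of a cell sits at the lowest byte address. The following sketch assumes that big-endian ordering, and also assumes that image offset 0 corresponds to address 0x80000000; the 16-byte `image` buffer and the function name are hypothetical. It shows how a tool generating the EPROM image might place the flag.

```python
import struct

BOOT_FLAG = 0xA5
CELL_BASE = 0x80000000  # boot-only EPROM cell address from the text


def boot_flag_cell() -> bytes:
    """Pack the cell 0x000000a5 most-significant byte first.

    Big-endian packing puts 0xa5 at byte offset 3, i.e. at 0x80000003,
    matching the two equivalent forms given in the text.
    """
    return struct.pack(">I", BOOT_FLAG)


# Hypothetical 16-byte image whose offset 0 maps to CELL_BASE.
image = bytearray(16)
image[0:4] = boot_flag_cell()
```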

\section*{Example System 3}

Figure 71 depicts a system with 32 KB of 32-bit-wide SRAM in memory group zero, 1 MB of 32-bit-wide DRAM in memory group one, an 8-bit-wide EPROM as a boot-only memory device, and an I/O address decoder. Latching of the CAS address for the SRAM is performed by two 74ACT841 transparent latches. The address inputs of the DRAM and EPROM are also connected to the latch outputs, though they could have been connected to the corresponding AD instead. The I/O address decoding is performed by a 74FCT138A, a 3-to-8 decoder, using the latched CAS address bits. The decoder is wired to supply eight device selects when A31 is set. The selects are enabled while \(\overline{\mathrm{CAS}}\) is active. They are decoded from A30 and A29 when the DRAM memory group is involved and from A20 and A21 when the SRAM memory group is involved. Since the EPROM is 8 bits wide, the selects are decoded from A18 and A19 when accessing the EPROM. The device select with A31 set and the other decoded address bits clear selects the EPROM as a boot-only memory device.

The EPROM must be programmed with 0xa5 at location 0x80000003 (typically 0x000000a5 at location 0x80000000). The memory groups must be configured with timing appropriate for the devices used, mg0bw and mg1bw set to zero (cell-wide), mg0ds set to 0x0f (SRAM), and mg1ds set to 0x02 (256K DRAM). Since RAS is not used to latch the I/O address, msras31d, mshacd and msexa31hac can be set to reduce the number of RAS cycles involved in I/O.



Figure 69. Example Minimal System with 8-bit Memory



Figure 70. Example Minimal System with 32-bit DRAM and I/O Decoding



Figure 71. Example System with SRAM, DRAM and I/O Decode








\title{
Electrical Characteristics
}


\section*{Electrical Characteristics}

\section*{Power and Grounding}

The PSC1000 CPU is implemented in CMOS for low average power requirements. However, the high clock-frequency capability of the CPU can require switching currents as large as eleven amperes, depending on the output loading. Thus, all \(\mathrm{V}_{\mathrm{CC}}\) and \(\mathrm{V}_{\mathrm{SS}}\) pins must be connected to planes within the PWB (printed wiring board) for adequate power distribution.

The switching current required by \(\mathrm{cV}_{\mathrm{CC}}\) and \(\mathrm{cV}_{\mathrm{SS}}\) is characterized by the internal clock and the output-driver pre-drivers. The internal clock requires approximately 500 mA, with significant 5-GHz frequency components, on every clock transition. The output-driver pre-drivers require as much as 3 A, with significant 1-GHz frequency components, on every output transition. The CPU has on-chip capacitance to supply the high-frequency components. The package diagrams indicate which \(\mathrm{cV}_{\mathrm{CC}}\) and \(\mathrm{cV}_{\mathrm{SS}}\) pins are closest to the internal clock drivers and PLL.

The switching current required by \(\mathrm{ctrlV}_{\mathrm{CC}}\) and \(\mathrm{ctrlV}_{\mathrm{SS}}\) is characterized by the output drivers supplied and the externally attached loads. Assuming a worst-case average load of 100 pF and 16 pins switching at once, these drivers require 2.67 A, with significant 300-MHz frequency components, on every output transition. Switching-current requirements decrease in a substantially linear manner as external loading is reduced.

The switching current required by \(\mathrm{adV}_{\mathrm{CC}}\) and \(\mathrm{adV}_{\mathrm{SS}}\) is characterized by the output drivers supplied and the externally attached loads. Assuming a worst-case average load of 100 pF and 32 pins switching at once, these drivers require 5.33 A, with significant 300-MHz frequency components, on every output transition. Switching-current requirements decrease in a substantially linear manner as external loading is reduced.
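The driver currents quoted above are consistent with the capacitor charging relation \(I = N C\,\Delta V/\Delta t\). The sketch below reproduces them under an assumed 5-V output swing completed in a 3-ns edge; those two values are assumptions chosen for illustration, not figures stated in this manual, and the function name is hypothetical.

```python
def switching_current(pins: int, load_f: float, swing_v: float, edge_s: float) -> float:
    """Charging current I = N * C * dV/dt for N pins switching together."""
    return pins * load_f * swing_v / edge_s


# Assumed (not stated in the manual): 5-V swing completed in a 3-ns edge.
SWING_V = 5.0
EDGE_S = 3e-9
LOAD_F = 100e-12  # worst-case average load per pin, from the text

ctrl_a = switching_current(16, LOAD_F, SWING_V, EDGE_S)  # control drivers
ad_a = switching_current(32, LOAD_F, SWING_V, EDGE_S)    # AD drivers
```

With these assumptions the function gives about 2.67 A and 5.33 A. Halving the load to 50 pF halves the current, which is the linear dependence on external loading described above.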

\section*{Power Decoupling}

Due to the switching characteristics discussed above, power decoupling at the CPU is typically required. Surface-mount capacitors with low ESR are preferred. Generally, physically smaller capacitors have better frequency characteristics (i.e., lower series inductance, resulting in a higher self-resonant frequency) than physically larger capacitors. PWB construction using FR-4 with power and ground layer spacing of 10 mils or less supplies the best high-frequency decoupling (typically about \(100 \mathrm{pF} / \mathrm{in}^{2}\)). Connections to the power and ground planes must be as short as possible. Proper power and ground plane connections and appropriate decoupling also reduce EMC problems.
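The figure of about 100 pF/in² follows from the parallel-plate capacitor formula \(C = \varepsilon_0 \varepsilon_r A / d\). The sketch below assumes a relative permittivity of 4.5 for FR-4 (actual values are roughly 4 to 5, depending on the laminate and frequency) and ignores fringing fields. It also shows why thinner power-to-ground spacing helps: capacitance scales inversely with spacing.

```python
EPSILON_0 = 8.854e-12  # vacuum permittivity, F/m
MIL_M = 25.4e-6        # one mil in metres
INCH_M = 25.4e-3       # one inch in metres


def plane_capacitance_per_sq_inch(spacing_mils: float, er: float = 4.5) -> float:
    """Parallel-plate estimate C = e0 * er * A / d for one square inch."""
    area_m2 = INCH_M ** 2
    spacing_m = spacing_mils * MIL_M
    return EPSILON_0 * er * area_m2 / spacing_m


c10 = plane_capacitance_per_sq_inch(10)  # roughly 1.0e-10 F, i.e. ~100 pF
```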

The capacitance required to supply the switching charge can be calculated from the relation \(C = I/(f\,\Delta V)\), where \(I\) is the current required, \(f\) is the frequency, and \(\Delta V\) is the allowed voltage drop, typically 0.1 V. Thus, \(\mathrm{cV}_{\mathrm{CC}}\) and \(\mathrm{cV}_{\mathrm{SS}}\) require 1000 pF for the internal clock and \(0.03~\mu\mathrm{F}\) for the output-driver pre-drivers, while \(\mathrm{ctrlV}_{\mathrm{CC}}\) and \(\mathrm{ctrlV}_{\mathrm{SS}}\) together with \(\mathrm{adV}_{\mathrm{CC}}\) and \(\mathrm{adV}_{\mathrm{SS}}\) require \(0.24~\mu\mathrm{F}\). These requirements can be met with four \(0.1~\mu\mathrm{F}\) X7R capacitors, one on each side of the CPU package, mounted on the same side of the PWB as the CPU and as close to the package as practical.
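The relation above can be checked numerically with the currents and frequencies given earlier in this section. The sketch below is illustrative only; the function and variable names are hypothetical.

```python
def decoupling_capacitance(current_a: float, freq_hz: float, droop_v: float = 0.1) -> float:
    """C = I / (f * dV), the relation given in the text."""
    return current_a / (freq_hz * droop_v)


clock_f = decoupling_capacitance(0.5, 5e9)              # internal clock
predriver_f = decoupling_capacitance(3.0, 1e9)          # output pre-drivers
outputs_f = decoupling_capacitance(2.67 + 5.33, 300e6)  # ctrl + AD drivers
total_f = clock_f + predriver_f + outputs_f
installed_f = 4 * 0.1e-6                                # four 0.1-uF X7R parts
```

Evaluated this way, the clock and pre-driver terms come out at 1000 pF and 0.03 µF, matching the text. The output-driver term comes out at about 0.27 µF rather than the 0.24 µF quoted. With either value, the total (roughly 0.3 µF) stays below the 0.4 µF provided by four 0.1-µF capacitors.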

Note that mounting capacitors on the same PWB surface as the PSC1000 CPU package can allow connecting traces of about 25 mils in length, while mounting capacitors on the opposite PWB surface requires traces of over 100 mils in length. At the switching frequencies listed, the difference in trace lengths creates significant differences in decoupling effectiveness. The package and capacitor power and ground connections would optimally be fabricated with VIP (via-in-pad), if possible, for the same reasons.

\section*{Connection Recommendations}

All output drivers are designed to drive the heavy capacitive loads of memory systems directly, thus minimizing the external components and propagation delays associated with buffering logic. However, increased loading brings increased power dissipation, and trade-offs must be made to ensure that the PSC1000 CPU operating temperature does not exceed operating limits. Systems with heavy CPU bus loads might require heatsinks or forced-air ventilation. Note that reducing output driver current does not reduce total power dissipation, because power consumption depends on output loading and not on signal transition edge rates. See Figure 50, page 154.
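The point that dissipation tracks loading rather than edge rate follows from the standard CMOS dynamic-power relation \(P = N C V_{CC}^{2} f\). The sketch below applies that textbook formula with illustrative parameters; the pin count, load, supply voltage, and 20-MHz toggle rate are assumptions, not PSC1000 data.

```python
def load_power_w(pins: int, load_f: float, vcc_v: float, toggle_hz: float) -> float:
    """Dynamic power P = N * C * Vcc**2 * f for capacitive loads.

    Each full charge/discharge cycle dissipates C * Vcc**2 in the driver,
    so the result depends on load, supply, and toggle rate but not on
    the signal edge rate.
    """
    return pins * load_f * vcc_v ** 2 * toggle_hz


# Illustrative numbers only: 32 AD pins, 100 pF each, 5 V, 20-MHz toggle rate.
p_ad = load_power_w(32, 100e-12, 5.0, 20e6)
```

With those assumed values the estimate is 1.6 W. Halving the load halves the dissipation, while changing the output drive strength does not appear in the formula at all.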

To reduce system cost, most inputs have internal circuitry to provide a stable input voltage if the input is unused. Thus, most unused inputs do not require pull-ups.

\section*{Clock}

The PSC1000 CPU requires an external CMOS oscillator at one-half the processor frequency. The oscillator frequency is doubled internally (CPU-clock cycle) to operate the MPU and the VPU, and doubled again to provide fine-granularity programmable bus timing (2X-CPU-clock cycle).

Inexpensive oscillators typically have guaranteed duty cycles of only 55/45 or 60/40. The narrower half of the clock cycle represents the clock period at which the CPU appears to be operating. An 80-MHz CPU is thus limited with a 60/40 oscillator to 64 MHz (32 MHz externally), because with a 64-MHz CPU clock the 40% portion of the 31.25-ns oscillator period is 12.5 ns, the minimum CPU-clock period. Thus, oscillator selection and qualification are important factors in processor performance.
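The arithmetic behind the 64-MHz figure can be sketched as follows, treating the 12.5-ns minimum clock period of Table 64 as the limit on the narrower oscillator phase, as the paragraph above describes. The function name and parameters are illustrative.

```python
MIN_CPU_PERIOD_NS = 12.5  # minimum clock period from Table 64


def max_cpu_clock_mhz(narrow_fraction: float, min_period_ns: float = MIN_CPU_PERIOD_NS) -> float:
    """Highest CPU-clock frequency for a given oscillator duty cycle.

    The narrower phase of the oscillator cycle must be at least
    min_period_ns, and the oscillator runs at one-half the CPU clock.
    """
    max_osc_mhz = narrow_fraction * 1000.0 / min_period_ns
    return 2.0 * max_osc_mhz
```

For a 60/40 part, `max_cpu_clock_mhz(0.40)` gives 64 MHz. A 55/45 part allows 72 MHz, and only an ideal 50/50 oscillator reaches the full 80 MHz.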

The CPU-clock frequency selected depends on application and system hardware requirements. A clock frequency might be selected for the VPU to produce appropriate application timing, or for the MIF to optimize bus timing. For instance, if the system requires a 40-ns bus cycle, it might be more efficient to operate at 75 MHz with a bus cycle three CPU-clock cycles long (40 ns) than to operate at 80 MHz with a bus cycle four CPU-clock cycles long (50 ns).
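The comparison above comes from rounding the required bus-cycle time up to a whole number of CPU-clock cycles. A minimal sketch of that calculation follows; the function name is hypothetical, and the model ignores the finer 2X-CPU-clock granularity of individual bus-timing fields.

```python
import math


def bus_cycle(required_ns: float, cpu_mhz: float) -> tuple[int, float]:
    """Return (CPU-clocks, ns) for the shortest bus cycle >= required_ns.

    Bus cycles are whole CPU-clock cycles, so the requirement is rounded up.
    A small tolerance keeps exact multiples (such as 40 ns at 75 MHz) from
    rounding up because of floating-point error.
    """
    period_ns = 1000.0 / cpu_mhz
    clocks = math.ceil(required_ns / period_ns - 1e-9)
    return clocks, clocks * period_ns
```

`bus_cycle(40, 75)` returns three clocks (40 ns), while `bus_cycle(40, 80)` needs four clocks (50 ns), reproducing the comparison in the text.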


\section*{Absolute Maximum Ratings}

Table 60. Absolute Maximum Ratings
\begin{tabular}{|c|c|c|c|c|c|}
\hline Characteristic & Symbol & Min & Max & Unit & Notes \\
\hline Core Logic Supply Voltage & \(\mathrm{cV}_{\text {cc }}\) & -0.5 & +7.0 & V & 1 \\
\hline Control Driver Supply Voltage & \(\mathrm{ctrlV}_{\text {cc }}\) & -0.5 & +7.0 & V & 1 \\
\hline AD Driver Supply Voltage & \(\mathrm{adV}_{\text {cc }}\) & -0.5 & +7.0 & V & 1 \\
\hline DC Input Voltage & \(\mathrm{V}_{\mathrm{I}}\) & -0.5 & +7.0 & V & \\
\hline \multirow[t]{2}{*}{DC Output Voltage} & \multirow[t]{2}{*}{\(\mathrm{V}_{\mathrm{O}}\)} & -0.5 & +7.0 & V & output Hi-Z \\
\hline & & -0.5 & \(\mathrm{V}_{\mathrm{cc}}+0.5\) & V & output driven \\
\hline DC Input Diode Current & \(\mathrm{I}_{\text {IK }}\) & & -50 & mA & \(\mathrm{V}_{\mathrm{I}}<\mathrm{V}_{\mathrm{SS}}\) \\
\hline \multirow[t]{2}{*}{DC Output Diode Current} & \multirow[t]{2}{*}{\(\mathrm{I}_{\mathrm{OK}}\)} & & -50 & mA & \\
\hline & & & +50 & mA & \\
\hline Storage Temperature & \(\mathrm{T}_{\text {STG }}\) & -65 & +150 & \({ }^{\circ} \mathrm{C}\) & \\
\hline Case Temperature Under Bias & \(\mathrm{T}_{\mathrm{C}}\) & -65 & +125 & \({ }^{\circ} \mathrm{C}\) & \\
\hline Operating Junction Temperature & \(\mathrm{T}_{\mathrm{J}}\) & -65 & +150 & \({ }^{\circ} \mathrm{C}\) & \\
\hline \multicolumn{6}{|l|}{\begin{tabular}{l}
Stressing the device beyond Absolute Maximum Ratings can cause the device to sustain permanent damage. Operating the device beyond Operating Conditions is not recommended and can reduce device reliability. Functional operation at Absolute Maximum Ratings is not guaranteed. \\
1. \(\mathrm{cV}_{\mathrm{SS}}\), \(\mathrm{ctrlV}_{\mathrm{SS}}\) and \(\mathrm{adV}_{\mathrm{SS}}\) are required to be at the same potential.
\end{tabular}} \\
\hline
\end{tabular}


\section*{Operating Conditions}

Table 61. Operating Conditions
\begin{tabular}{|c|c|c|c|c|c|}
\hline Characteristic & Symbol & Min & Max & Unit & Notes \\
\hline Core Logic Supply Voltage & \(\mathrm{cV}_{\mathrm{cc}}\) & 3.0 & 5.5 & V & \\
\hline Control Driver Supply Voltage & \(\mathrm{ctrlV}_{\text {cc }}\) & 3.0 & 5.5 & V & \\
\hline AD Driver Supply Voltage & \(\mathrm{adV}_{\mathrm{cc}}\) & 3.0 & 5.5 & V & \\
\hline Input Voltage & \(\mathrm{V}_{\mathrm{I}}\) & 0 & 5.5 & V & \\
\hline \multirow[t]{2}{*}{Output Voltage} & \multirow[t]{2}{*}{\(\mathrm{V}_{\mathrm{O}}\)} & 0 & 5.5 & V & output Hi-Z \\
\hline & & 0 & \(\mathrm{V}_{\mathrm{cc}}\) & V & output driven \\
\hline \multirow[t]{2}{*}{Output Current} & \(\mathrm{I}_{\mathrm{OH}}\) & & 180 & mA & 1 \\
\hline & \(\mathrm{I}_{\mathrm{OL}}\) & & 180 & mA & 1 \\
\hline Input Clock & \(\mathrm{f}_{\mathrm{c}}\) & & 80 & MHz & \\
\hline Case Temperature Under Bias & \(\mathrm{T}_{\mathrm{C}}\) & 0 & +85 & \({ }^{\circ} \mathrm{C}\) & \\
\hline Free-Air Operating Temperature & \(\mathrm{T}_{\mathrm{A}}\) & -40 & +85 & \({ }^{\circ} \mathrm{C}\) & \\
\hline Input Edge Rate & \(\Delta t / \Delta \mathrm{V}\) & 0 & 0.1 & ns/V & 2 \\
\hline \multicolumn{6}{|l|}{\begin{tabular}{l}
Notes: \\
1. Assumes the maximum of three driver sections enabled (at 60 mA each) during signal transitions only. \\
2. \(\Delta \mathrm{V}_{\text {IN }}=\mathrm{V}_{\text {IH-MIN }}-\mathrm{V}_{\text {IL-MAX }}\), monotonic.
\end{tabular}} \\
\hline
\end{tabular}

\section*{DC Specifications}
Table 62. DC Specifications
\begin{tabular}{|c|c|c|c|c|c|}
\hline Characteristic & Symbol & Min & Max & Unit & Notes \\
\hline \multirow[t]{2}{*}{Input Low Voltage} & \multirow[t]{2}{*}{\(\mathrm{V}_{\text {IL }}\)} & 0 & 0.8 & V & TTL \\
\hline & & 0 & 1.8 & V & CMOS \\
\hline \multirow[t]{2}{*}{Input High Voltage} & \multirow[t]{2}{*}{\(\mathrm{V}_{\text {IH }}\)} & 2.0 & \(\mathrm{cV}_{\mathrm{cc}}\) & V & TTL \\
\hline & & 3.0 & \(\mathrm{cV}_{\mathrm{cc}}\) & V & CMOS \\
\hline Output Low Voltage & \(\mathrm{V}_{\text {OL }}\) & & 0.4 & V & \(\mathrm{I}_{\mathrm{OL}}=12 \mathrm{~mA}\) \\
\hline \multirow{2}{*}{Output High Voltage} & \multirow{2}{*}{\(\mathrm{V}_{\text {OH }}\)} & 2.4 & & \multirow{2}{*}{V} & \(\mathrm{I}_{\mathrm{OH}}=45 \mathrm{~mA}\) \\
\hline & & \(\mathrm{V}_{\mathrm{CC}}-0.4\) & & & \(\mathrm{I}_{\mathrm{OH}}=12 \mathrm{~mA}\) \\
\hline Input Leakage Current & \(\mathrm{I}_{\mathrm{LI}}\) & & \(\pm 10\) & \(\mu \mathrm{A}\) & \(0<=\mathrm{V}_{\text {IN }}<=\mathrm{V}_{\text {cc }}\) \\
\hline Output Leakage Current & \(\mathrm{I}_{\mathrm{LO}}\) & & \(\pm 10\) & \(\mu \mathrm{A}\) & \(0.4<\mathrm{V}_{\text {OUT }}<\mathrm{V}_{\text {CC }}\) \\
\hline Power Supply Current & \(\mathrm{I}_{\mathrm{cc}}\) & & 100 & mA & 1 \\
\hline Input Capacitance & \(\mathrm{C}_{\text {IN }}\) & & 8 & pF & 2 \\
\hline I/O or Output Capacitance & \(\mathrm{C}_{\text {OUT }}\) & & 10 & pF & 2 \\
\hline
\end{tabular}

\section*{Notes:}
1. Under normal operation. Specially constructed programs can draw substantially more current.
2. \(f_{C}=1 \mathrm{MHz}\). Capacitance values are not tested.

Table 63. Input Characteristics
\begin{tabular}{|c|c|c|c|c|}
\hline \multirow{2}{*}{PIN} & \multirow{2}{*}{Input Level} & \multicolumn{2}{|l|}{Impedance, Ohms} & \multirow{2}{*}{Notes} \\
\hline & & Minimum & Maximum & \\
\hline \multirow[t]{2}{*}{AD[31:0]} & \multirow[t]{2}{*}{TTL} & 25K & 50K & repeater, \(\mathrm{V}_{\text {IH }}\) \\
\hline & & 15K & 30K & repeater, \(\mathrm{V}_{\text {IL }}\) \\
\hline CLK & CMOS & 1M & & must be driven \\
\hline \(\overline{\mathrm{IN}}\) [7:0] & TTL & 1M & & \\
\hline \(\overline{\mathrm{MFLT}}\) & TTL Schmitt trigger & 250K & 500K & pull-up \\
\hline \(\overline{\text { RESET }}\) & CMOS Schmitt trigger & 250K & 500K & pull-up \\
\hline
\end{tabular}


\section*{AC Characteristics}

Table 64. CPU-Clock and 2X-CPU-Clock
\begin{tabular}{|c|l|l|l|l|c|c|}
\hline No. & Characteristic & Symbol & Min & Max & Unit & Notes \\
\hline 1 & Clock period & & 12.5 & & ns & \\
\hline & Stabilization of PLL & & & 10,000 & CPU-Clocks & 3 \\
\hline
\end{tabular}

\section*{Notes:}
1. CPU-Clock generated from \(\overline{\mathrm{CLK}}\) edges.
2. 2X-CPU-Clock intermediate pulse generated with a PLL.
3. Required after \(\overline{\mathrm{RESET}}\) goes inactive, before depending on 2X-CPU-Clock timing.


Figure 72. CPU-Clock and 2X-CPU-Clock

Table 65. CPU Reset Timing
\begin{tabular}{|c|l|c|c|c|c|c|}
\hline No. & Characteristic & Symbol & Min & Max & Unit & Notes \\
\hline 1 & Reset active time, pin & & 2 & & CPU-clocks & \\
\hline 2 & Reset active to AD Hi-Z & & 3 & 4 & CPU-clocks & 2 \\
\hline & & & 10 & 11 & CPU-clocks & 3 \\
\hline 3 & \begin{tabular}{l} 
Reset inactive to AD active and start of RAS \\
prefix for first bus cycle
\end{tabular} & & 4 & 5 & CPU-clocks & \\
\hline 4 & Reset active to signals inactive & & 3 & 4 & CPU-clocks & 2 \\
\cline { 3 - 7 } & & & 10 & 11 & CPU-clocks & 3 \\
\hline
\end{tabular}

\section*{Notes:}
1. AD have bus repeaters that hold the last bus state when not driven by the CPU or an external device.
2. When reset is sampled from RESET.
3. When reset is sampled from AD8.
4. States occur from subsequent bus cycle and program execution.


Figure 73. CPU Reset Timing

Table 66. Memory Read and Write Timing
\begin{tabular}{|c|c|c|c|c|c|}
\hline No. & Characteristic & Min & Max & Unit & Notes \\
\hline 1 & RAS Prefix & \multicolumn{2}{|l|}{1 + mgbtras + mgbtrhld} & CPU-clocks & 5 \\
\hline 2 & \(\overline{\text { RAS }}\) inactive & \multicolumn{2}{|l|}{(mgbtras \(\cdot 2\) ) - mgbteras} & \[
\begin{aligned}
& \text { 2X-CPU- } \\
& \text { clocks }
\end{aligned}
\] & 5 \\
\hline 3 & RAS address hold & \multicolumn{2}{|l|}{(mgbtrhld \(\cdot 2\) ) + mgbteras} & \[
\begin{aligned}
& \text { 2X-CPU- } \\
& \text { clocks }
\end{aligned}
\] & 5 \\
\hline 4 & RAS prefix start to \(\overline{\text { RAS }}\) rise & \multicolumn{2}{|c|}{1} & CPU-clocks & \\
\hline 5 & End of bus cycle to start of next & 0 & & CPU-clocks & \\
\hline 6 & CAS part & \multicolumn{2}{|l|}{mgbtcast + mgebtdobe + mgebtcase} & CPU-clocks & 5 \\
\hline & & \multicolumn{2}{|l|}{mgbtcast + ioebtdobe + ioebtcase} & CPU-clocks & 5,9 \\
\hline 7 & CAS part start to \(\overline{C A S}\) fall & \multicolumn{2}{|c|}{mgbtcas} & \[
\begin{aligned}
& \text { 2X-CPU- } \\
& \text { clocks }
\end{aligned}
\] & 5 \\
\hline 8 & CAS part start to \(\overline{\mathrm{MGSx}}\) rise & \multicolumn{2}{|c|}{time 6} & CPU-clocks & \\
\hline 9 & CAS part start to \(\overline{\mathrm{MGSx}}\) fall & & 3.75 & ns & \\
\hline 10 & \(\overline{\mathrm{MGSx}}\) inactive pulse width, RAS cycle & 0 & & ns & 3 \\
\hline 11 & RAS cycle start to \(\overline{\text { MGSx }}\) fall & & 3.0 & ns & \\
\hline 12 & \(\overline{\mathrm{MGSx}}\) inactive pulse width, CAS cycle & 0 & & ns & 3 \\
\hline 13 & CAS part start to \(\overline{\mathrm{DOB}}\) rise, memory read & \multicolumn{2}{|l|}{(mgbtcast \(\cdot 2\)) - mgbteoe} & \[
\begin{aligned}
& \text { 2X-CPU- } \\
& \text { clocks }
\end{aligned}
\] & 5 \\
\hline 14 & CAS part start to \(\overline{\mathrm{DOB}}\) fall & \multicolumn{2}{|c|}{mgbtdob} & 2X-CPU-clocks & 5 \\
\hline 15 & CAS part start to CAS address valid & & 5.0 & ns & \\
\hline 16 & \(\overline{\mathrm{DOB}}\) fall to address invalid & 1.5 & & ns & \\
\hline 17 & RAS prefix to address valid & & 2.25 & ns & \\
\hline 18 & RAS prefix end to RAS address invalid & 2.0 & & ns & \\
\hline 19 & RAS prefix end to CAS address valid & & 5.75 & ns & \\
\hline 20 & Data setup before \(\overline{\mathrm{DOB}}\) rise & 16.0 & & ns & 4,6 \\
\hline 21 & Data hold after \(\overline{\mathrm{DOB}}\) rise & 0 & & ns & 4,6 \\
\hline 22 & CAS part start to \(\overline{\mathrm{OE}}\) rise & time 13 & & 2X-CPU-clocks & \\
\hline 23 & CAS part start to \(\overline{\mathrm{OE}}\) fall & time 14 & & \[
\begin{aligned}
& \text { 2X-CPU- } \\
& \text { clocks }
\end{aligned}
\] & \\
\hline 24 & Previous cycle end to \(\overline{\text { EWE }}\) rise & & 1.75 & ns & \\
\hline 25 & Previous cycle end to \(\overline{\text { LWE }}\) rise & & 1.75 & ns & \\
\hline
\end{tabular}

Table 66. Memory Read and Write Timing (continued)
\begin{tabular}{|c|c|c|c|c|c|}
\hline No. & Characteristic & Min & Max & Unit & Notes \\
\hline 26 & CAS part start to \(\overline{\mathrm{DOB}}\) rise, memory write & \multicolumn{2}{|l|}{(mgbtcast \(\cdot 2\)) - mgbtewe} & \[
\begin{aligned}
& \text { 2X-CPU- } \\
& \text { clocks }
\end{aligned}
\] & 5 \\
\hline 27 & \(\overline{\mathrm{DOB}}\) fall to data valid & & 3.25 & ns & 4 \\
\hline 28 & \(\overline{\mathrm{DOB}}\) rise to data not driven & 1.0 & & ns & 4 \\
\hline 29 & CAS part start to \(\overline{\mathrm{EWE}}\) rise & time 26 & & CPU-clocks & \\
\hline 30 & CAS part start to \(\overline{\mathrm{EWE}}\) fall & & 8.5 & ns & \\
\hline 31 & \(\overline{\mathrm{EWE}}\) inactive pulse width, RAS & 2.5 & & ns & 3 \\
\hline 32 & RAS prefix start to \(\overline{\text { EWE }}\) fall & & 6.0 & ns & \\
\hline 33 & \(\overline{\text { EWE }}\) inactive pulse width, CAS & 2.5 & & ns & 3 \\
\hline 34 & CAS part start to \(\overline{\mathrm{EWE}}\) fall & \multicolumn{2}{|c|}{mgbtcas} & \[
\begin{aligned}
& \text { 2X-CPU- } \\
& \text { clocks }
\end{aligned}
\] & 5 \\
\hline 35 & CAS part start to \(\overline{\text { LWE }}\) rise & \multicolumn{2}{|c|}{time 26} & \[
\begin{aligned}
& \text { 2X-CPU- } \\
& \text { clocks }
\end{aligned}
\] & \\
\hline 36 & CAS part start to \(\overline{\mathrm{LWE}}\) fall & \multicolumn{2}{|l|}{mgbtdob + mgbtlwea + (mgebtdobe \(\cdot 2\))} & \[
\begin{aligned}
& \text { 2X-CPU- } \\
& \text { clocks }
\end{aligned}
\] & 5 \\
\hline & & \multicolumn{2}{|l|}{mgbtdob + mgbtlwea + (ioebtdobe \(\cdot 2\) )} & \[
\begin{aligned}
& \text { 2X-CPU- } \\
& \text { clocks }
\end{aligned}
\] & 5,9 \\
\hline 37 & Previous cycle end to \(\overline{\mathrm{OE}}\) rise & & 2.25 & ns & \\
\hline
\end{tabular}

Notes:
1. AD have bus repeaters that hold the last bus state when not driven by the CPU or an external device.
2. Does not apply to byte-wide data transfers. See note 1.
3. Minimum applies when time 5 is minimum.
4. Time applies only to data transfers to the CPU.
5. Use decoded value of register fields for calculations.
6. If mgbteoe is set, data must be held until specified time relative to the next CPU-clock timing boundary. See Note 1.
7. \(\overline{\mathrm{MGSx}}\) applies when mmb is set. \(\overline{\mathrm{RASx}}\) applies when mmb is clear.
8. All CASes and RASes move appropriately.
9. Applies to bus cycles of I/O-channel bus transactions that involve the I/O device.
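The register-field formulas in Table 66 can be evaluated mechanically once the decoded field values are known (note 5). The sketch below is hypothetical: the class name, the example field values, and the unit assigned to each field (inferred from the unit column of the formulas that use it) are assumptions. It also checks one property of the formulas as written: time 4 (one CPU-clock) plus times 2 and 3 (in 2X-CPU-clocks) always adds up to the RAS prefix of time 1, which is consistent with reading the RAS prefix as those three intervals in sequence. The RAS-cycle length uses the sum given in Table 69, note 5 (RAS prefix plus CAS part).

```python
from dataclasses import dataclass


@dataclass
class GroupTiming:
    """Decoded register-field values for one memory group (hypothetical)."""
    mgbtras: int    # CPU-clocks, as implied by the RAS-prefix formula
    mgbtrhld: int   # CPU-clocks
    mgbteras: int   # 2X-CPU-clocks, as implied by times 2 and 3
    mgbtcast: int   # CPU-clocks
    mgebtdobe: int  # CPU-clocks
    mgebtcase: int  # CPU-clocks

    def ras_prefix(self) -> int:            # time 1, CPU-clocks
        return 1 + self.mgbtras + self.mgbtrhld

    def ras_inactive_2x(self) -> int:       # time 2, 2X-CPU-clocks
        return self.mgbtras * 2 - self.mgbteras

    def ras_address_hold_2x(self) -> int:   # time 3, 2X-CPU-clocks
        return self.mgbtrhld * 2 + self.mgbteras

    def cas_part(self) -> int:              # time 6, memory (not I/O) form
        return self.mgbtcast + self.mgebtdobe + self.mgebtcase

    def ras_cycle_ns(self, cpu_mhz: float) -> float:
        """RAS prefix plus CAS part, converted to nanoseconds."""
        return (self.ras_prefix() + self.cas_part()) * 1000.0 / cpu_mhz


# Example field values chosen only for illustration.
t = GroupTiming(mgbtras=2, mgbtrhld=1, mgbteras=1,
                mgbtcast=2, mgebtdobe=0, mgebtcase=0)
```

With these example values the RAS prefix is four CPU-clocks, the CAS part is two, and a RAS cycle at an 80-MHz CPU clock lasts 75 ns.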


Figure 74. Memory Read Timing



Figure 75. Memory Write Timing

Table 67. Signal Coincidence Timing


Figure 76. Signal Coincidence Timing

Table 68. Memory Fault Timing
\begin{tabular}{|c|c|c|c|c|c|c|}
\hline No. & Characteristic & Symbol & Min & Max & Unit & Notes \\
\hline 1 & \(\overline{\mathrm{MFLT}}\) setup & & 4.5 & & ns & 7 \\
\hline 2 & \(\overline{\mathrm{MFLT}}\) hold & & 0 & & ns & 7 \\
\hline 3 & Fault request setup & & 9.0 & & ns & 7 \\
\hline 4 & Fault request hold & & 0 & & ns & 7 \\
\hline 5 & \(\overline{\text { EWE }}\) rise after \(\overline{\text { RAS }}\) fall & & \multicolumn{2}{|l|}{(mgbtrhld \(\cdot 2\)) + mgbteras} & \[
\begin{aligned}
& \text { 2X-CPU- } \\
& \text { clocks }
\end{aligned}
\] & 8, 9 \\
\hline
\end{tabular}

\section*{Notes:}
1. \(\overline{\mathrm{MGSx}}\) applies when mmb is set.
2. \(\overline{\mathrm{RASx}}\) applies when mmb is clear.
3. \(\overline{\text { MFLT }}\) is used for memory fault requests when pkgmflt is set.
4. AD8 is used for memory fault requests when pkgmflt is clear.
5. Appropriate timing references of \(\overline{\text { RAS }}\) apply to RAS.
6. Conditions exist for time equivalent to the entire bus transaction.
7. Applies as if RAS had fallen at the next CPU-clock timing boundary.
8. Applies only to memory write cycles.
9. Use decoded value of register fields for calculation.


Figure 77. Memory Fault Timing

Table 69. Refresh Timing
\begin{tabular}{|c|l|c|c|c|c|c|}
\hline No. & Characteristic & Symbol & Min & Max & Unit & Notes \\
\hline 1 & Refresh cycle length & & \multicolumn{2}{|c|}{1 + mgbtras + mgbtrhld + mgbtcast + mgebtdobe + mgbtcase} & CPU-clocks & 4,5 \\
\hline 2 & RAS cycle precharge & & \multicolumn{2}{|c|}{(mgbtras \(\cdot 2\)) - mgbteras} & 2X-CPU-clocks & 4,5 \\
\hline
\end{tabular}

\section*{Notes:}
1. \(\overline{\mathrm{MGSx}}\) applies when mmb is set.
2. \(\overline{\mathrm{RASx}}\) applies when mmb is clear.
3. Appropriate timing references of \(\overline{\text { RAS }}\) apply to RAS.
4. Timing is for memory group msrtg.
5. Use decoded values of register fields for calculation. Sum is the same as for a RAS cycle.


Figure 78. Refresh Timing

Table 70. VRAM Timing
\begin{tabular}{|c|c|c|c|c|c|c|}
\hline No. & Characteristic & Symbol & Min & Max & Unit & Notes \\
\hline 1 & \(\overline{\text { RAS }}\) rise to DSF in dsfvras state & & 0 & 2 & CPU-clocks & 9 \\
\hline 2 & \(\overline{\text { RAS }}\) fall to DSF changing to dsfvcas state & & \multicolumn{2}{|l|}{\[
\begin{gathered}
\text { (mgbtrhld•2) } \\
+ \text { mgbteras }
\end{gathered}
\]} & \[
\begin{aligned}
& \text { 2X-CPU- } \\
& \text { clocks }
\end{aligned}
\] & \\
\hline 3 & DSF changing to dsfvcas state before \(\overline{\mathrm{CAS}}\) fall & & \multicolumn{2}{|l|}{mgbtcas + 1} & \[
\begin{aligned}
& \text { 2X-CPU- } \\
& \text { clocks }
\end{aligned}
\] & 10 \\
\hline 4 & DSF in dsfvcas state after \(\overline{\text { CAS }}\) rise & & 0 & 1 & CPU-clocks & 11 \\
\hline 5 & \(\overline{\text { RAS }}\) rise to signal active & & \multicolumn{2}{|c|}{2} & CPU-clocks & \\
\hline 6 & \(\overline{\text { RAS }}\) fall to signal inactive & & \multicolumn{2}{|c|}{time 2} & 2X-CPU-clocks & \\
\hline \multicolumn{7}{|l|}{Notes:} \\
\hline \multicolumn{7}{|l|}{\begin{tabular}{l}
1. During an access to the VRAM memory group when casbvras is clear. \\
2. During an access to the VRAM memory group when casbvras is set.
\end{tabular}} \\
\hline \multicolumn{7}{|l|}{3. During an access to the VRAM memory group when oevras is set.} \\
\hline \multicolumn{7}{|l|}{4. During an access to the VRAM memory group when wevras is set.} \\
\hline \multicolumn{7}{|l|}{5. Active during a memory read.} \\
\hline \multicolumn{7}{|l|}{6. Active during a memory write.} \\
\hline \multicolumn{7}{|l|}{7. DSF is low during non-VRAM memory group accesses.} \\
\hline \multicolumn{7}{|l|}{8. All CASes move appropriately.} \\
\hline \multicolumn{7}{|l|}{9. If the previous memory cycle was to the VRAM memory group then DSF might not go low between memory cycles.} \\
\hline \multicolumn{7}{|l|}{10. Applies to RAS cycles and CAS cycles.} \\
\hline \multicolumn{7}{|l|}{11. If the next memory cycle is to the VRAM memory group then DSF might not go low between memory cycles.} \\
\hline
\end{tabular}



Figure 79. VRAM Timing

Table 71. DMA Request Timing
\begin{tabular}{|c|l|c|c|c|c|}
\hline No. & Characteristic & Min & Max & Unit & Notes \\
\hline 1 & Initial DMA request & \(>4\) & & CPU-clocks & \\
\hline 2 & \begin{tabular}{l} 
Initial DMA request to first DMA I/O-channel bus \\
cycle start
\end{tabular} & \begin{tabular}{c}
\(>3.25 n s+5\) \\
CPU-cycles
\end{tabular} & \(\infty\) & & 4 \\
\hline 3 & \begin{tabular}{l} 
DMA request setup before end of DMA I/O-channel \\
bus transaction
\end{tabular} & \begin{tabular}{c}
\(6.75 n s+2\) \\
CPU-clocks
\end{tabular} & & & 2 \\
\hline 4 & DMA request hold after end of DMA I/O-channel bus transaction & 0 & & ns & 2 \\
\hline 5 & \begin{tabular}{l} 
DMA request high setup before end of DMA I/O- \\
channel bus cycle
\end{tabular} & \begin{tabular}{c}
\(6.75 n s+2\) \\
CPU-clocks
\end{tabular} & 0 & 2 \\
\hline 6 & DMA request high hold after end of DMA I/O-channel bus cycle & 2 & & CPU-clocks & 2,5 \\
\hline 7 & \begin{tabular}{l} 
End of DMA bus cycle to start of next DMA I/O- \\
channel bus cycle
\end{tabular} & 2 & & CPU-clocks & 2,5 \\
\hline 8 & \begin{tabular}{l} 
End of DMA bus cycle to start of next non-DMA I/O \\
channel bus cycle
\end{tabular} & & \\
\hline
\end{tabular}

\section*{Notes:}

Timings assume pkgio is set. When pkgio is clear, bus sampling timings predominate.
1. Bus transaction start can be for a RAS or CAS cycle and occurs after bus request overhead.
2. Timings are relevant only on the last bus cycle of a DMA bus transaction. Noted areas can contain 0, 3, or 4 bus cycles to complete the bus transaction. Some cycles might be RAS cycles.
3. Bus cycle could be either RAS or CAS.
4. The max condition occurs if the VPU never executes delay or if there are continuous DMA bus transactions from higher priority devices.
5. Value represents bus request overhead.
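Several entries in Table 71, and in the input-sampling tables that follow, mix a fixed delay in nanoseconds with a count of CPU-clocks. A small helper such as the hypothetical one below converts such a limit to nanoseconds for a chosen CPU-clock frequency. It treats the "CPU-cycles" of time 2 as CPU-clock cycles, which is an assumption.

```python
def mixed_timing_ns(fixed_ns: float, cpu_clocks: float, cpu_mhz: float) -> float:
    """Convert a limit of the form 'fixed_ns + N CPU-clocks' to nanoseconds."""
    return fixed_ns + cpu_clocks * 1000.0 / cpu_mhz


# At an 80-MHz CPU clock (12.5-ns period):
first_cycle_ns = mixed_timing_ns(3.25, 5, 80)  # time 2 lower bound
setup_ns = mixed_timing_ns(6.75, 2, 80)        # times 3 and 5 minimum
```

At 80 MHz these give 65.75 ns and 31.75 ns; at lower clock rates the CPU-clock terms grow and dominate the fixed delays.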



Figure 80. DMA Request Timing

Table 72. I/O on Bus Timing


Figure 81. I/O on Bus Timing

Table 73. Bit Input Sample Timing
\begin{tabular}{|c|l|c|c|c|c|c|}
\hline No. & Characteristic & Symbol & Min & \multicolumn{1}{l|}{ Max } & Unit & Notes \\
\hline 1 & Sample clock period & & \multicolumn{2}{l|}{4} & CPU-clocks & 1 \\
\hline 2 & \(\overline{\text { INx }}\) to sample delay & & .75 & \begin{tabular}{c}
\(1.5 \mathrm{~ns}+4\) \\
CPU-clocks
\end{tabular} & ns & 1 \\
\hline 3 & Low data sampled to ioXin delay & & & 4 & CPU-clocks & \(1,2,5\) \\
\hline 4 & High data sampled to ioXin delay & & 4 & & CPU-clocks & \(1,2,4,5\) \\
\hline 5 & \(\overline{\text { INx }}\) to ioXin delay & & & 1.5 & ns & 1,3 \\
\hline
\end{tabular}

\section*{Notes:}
1. \(\overline{\mathrm{IN}}[7:0]\) are used for inputs when pkgio is set.
2. Allows data sampled in a metastable state to resolve to stated level.
3. Only during a DMA bus transaction on the corresponding I/O channel.
4. Minimum is exceeded when ioin is a persisting zero.
5. Except during a DMA bus transaction on the corresponding I/O channel.


Figure 82. Bit Input Sample Timing

Table 74. Bit Input from Bus Sample Timing
\begin{tabular}{|c|c|c|c|c|c|c|}
\hline No. & Characteristic & Symbol & Min & Max & Unit & Notes \\
\hline 1 & \(\overline{\text { RAS }}\) fall to first sample & & \multicolumn{2}{|c|}{2} & CPU-clocks & 1 \\
\hline 2 & Continued sample clock while \(\overline{\mathrm{CAS}}\) remains high & & \multicolumn{2}{|c|}{4} & CPU-clocks & \\
\hline 3 & Sample clock to \(\overline{\text { CAS }}\) fall & & & 5.0 & ns & 2 \\
\hline 4 & \(\overline{\mathrm{CAS}}\) rise to first sample & & \multicolumn{2}{|c|}{4} & CPU-clocks & \\
\hline 5 & \(\overline{\mathrm{CAS}}\) inactive & & 4 & & CPU-clocks & \\
\hline 6 & \(\overline{\mathrm{CAS}}\) inactive & & & <4 & CPU-clocks & \\
\hline 7 & External input change to AD change & & & 50.5 & CPU-clocks & 3 \\
\hline 8 & AD to sample delay & & & 4 & CPU-clocks & 4 \\
\hline 9 & Low data sampled to ioin delay & & \multicolumn{2}{|c|}{4} & CPU-clocks & 5 \\
\hline 10 & High data sampled to ioin delay & & 4 & note 5 & CPU-clocks & 5,6 \\
\hline \multicolumn{7}{|l|}{\begin{tabular}{l}
Notes: \\
1. If \(\overline{\mathrm{RAS}}\) fall to \(\overline{\mathrm{CAS}}\) fall is less than the maximum, timing 3 applies. \\
2. Applies only when four or more CPU-clock cycles have elapsed since the last sample. \\
3. Does not include external buffer delay. \\
4. Minimum is specified only to allow meeting specific sampling events. \\
5. Allows data sampled in a metastable state to resolve. \\
6. Minimum is exceeded when ioin is a persisting zero.
\end{tabular}} \\
\hline
\end{tabular}



Figure 83. Bit Input from Bus Sample Timing


\section*{Mechanical Characteristics}


Figure 84. 100-Pin TQFP Package Dimensions

Table 75. 100-Pin TQFP Package Dimensions
\begin{tabular}{|c|c|c|c|}
\hline \multirow{2}{*}{ Symbol } & \multicolumn{3}{|c|}{ Millimeters } \\
\cline { 2 - 4 } & Min. & Nom. & Max. \\
\hline A & - & - & 1.60 \\
\hline \(\mathrm{A}_{1}\) & .05 & - & .15 \\
\hline B & .17 & .20 & .27 \\
\hline C & - & - & .17 \\
\hline D & \multicolumn{3}{|c|}{16.00 BSC} \\
\hline \(\mathrm{D}_{1}\) & \multicolumn{3}{|c|}{14.00 BSC} \\
\hline E & \multicolumn{3}{|c|}{16.00 BSC} \\
\hline \(\mathrm{E}_{1}\) & \multicolumn{3}{|c|}{14.00 BSC} \\
\hline L & .45 & .60 & .75 \\
\hline N & \multicolumn{3}{|c|}{100} \\
\hline e & \multicolumn{3}{|c|}{.50 BSC} \\
\hline Coplanarity & - & - & .08 \\
\hline \(\theta\) & \(0^{\circ}\) & \(3.5^{\circ}\) & \(7.0^{\circ}\) \\
\hline
\end{tabular}

Note: JEDEC specification MS-026.
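The basic dimensions in Table 75 can be cross-checked with simple arithmetic. The Python sketch below assumes, as is usual for a square TQFP (\(\mathrm{D}_{1} = \mathrm{E}_{1}\)) but not stated in the table, that the 100 leads are divided evenly among the four sides; the helper names are hypothetical. With a 0.50 mm pitch, the centers of the 25 leads on one side span 24 × 0.50 = 12.00 mm, leaving 1.00 mm to each end of the 14.00 mm body side.

```python
# Assumption: the leads are split evenly over the four sides of the square
# body (D1 = E1), as is usual for a square TQFP. Helper names are hypothetical.
N_LEADS = 100     # Table 75, N
PITCH_MM = 0.50   # Table 75, e (basic)
BODY_MM = 14.00   # Table 75, D1 = E1 (basic)


def leads_per_side(n_leads: int, sides: int = 4) -> int:
    """Number of leads on each side, assuming an even split."""
    if n_leads % sides:
        raise ValueError("lead count does not divide evenly among the sides")
    return n_leads // sides


def lead_row_span_mm(n_per_side: int, pitch_mm: float) -> float:
    """Distance between the centers of the first and last lead on one side."""
    return (n_per_side - 1) * pitch_mm


def side_end_margin_mm(body_mm: float, span_mm: float) -> float:
    """Distance from each outermost lead center to the end of the body side."""
    return (body_mm - span_mm) / 2


if __name__ == "__main__":
    per_side = leads_per_side(N_LEADS)
    span = lead_row_span_mm(per_side, PITCH_MM)
    print(per_side, span, side_end_margin_mm(BODY_MM, span))  # 25 12.0 1.0
```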

Table 76. 100-Pin TQFP Package Thermal Characteristics
\begin{tabular}{|c|c|c|c|c|c|c|c|}
\hline \multirow[t]{2}{*}{Characteristic} & \multirow[t]{2}{*}{Symbol} & \multicolumn{4}{|c|}{Value @ Airflow (LFM)} & \multirow[t]{2}{*}{Unit} & \multirow[t]{2}{*}{Notes} \\
\cline{3-6} & & 0 & 225 & 500 & 1000 & & \\
\hline Thermal Resistance, Junction to Ambient & \(\theta_{\mathrm{JA}}\) & 42 & 37 & 32 & 28 & \({ }^{\circ} \mathrm{C} / \mathrm{W}\) & \\
\hline Thermal Resistance, Junction to Case & \(\theta_{\mathrm{JC}}\) & \multicolumn{4}{|c|}{10} & \({ }^{\circ} \mathrm{C} / \mathrm{W}\) & \\
\hline
\end{tabular}
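Table 76 can be used to estimate die temperature with the usual relations \(T_J = T_A + P_D\,\theta_{JA}\) and \(T_J = T_C + P_D\,\theta_{JC}\). The Python sketch below is a minimal illustration: the function names, the 0.5 W dissipation, and the 70 °C ambient in the example are assumptions, and this section does not state the device's power dissipation or maximum junction temperature. Only the tabulated airflow values are accepted, since interpolating between them goes beyond what the table specifies.

```python
# Table 76 values in degC/W. THETA_JA keys are airflow in linear feet per minute.
THETA_JA = {0: 42.0, 225: 37.0, 500: 32.0, 1000: 28.0}  # junction to ambient
THETA_JC = 10.0                                          # junction to case


def junction_temp_from_ambient(t_ambient_c: float, power_w: float,
                               airflow_lfm: int) -> float:
    """Estimate T_J = T_A + P_D * theta_JA for a tabulated airflow."""
    if airflow_lfm not in THETA_JA:
        raise ValueError(f"no tabulated theta_JA for {airflow_lfm} LFM")
    return t_ambient_c + power_w * THETA_JA[airflow_lfm]


def junction_temp_from_case(t_case_c: float, power_w: float) -> float:
    """Estimate T_J = T_C + P_D * theta_JC from a measured case temperature."""
    return t_case_c + power_w * THETA_JC


if __name__ == "__main__":
    p_d = 0.5   # assumed power dissipation in watts (hypothetical)
    t_a = 70.0  # assumed ambient temperature in degC (hypothetical)
    for lfm in sorted(THETA_JA):
        t_j = junction_temp_from_ambient(t_a, p_d, lfm)
        print(f"{lfm:5d} LFM: T_J = {t_j:.1f} degC")
```

With these assumed values, still air gives an estimated junction temperature of 91.0 °C, and 1000 LFM gives 84.0 °C.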


Revision History


\section*{Distributors and Sales Offices}


\section*{Asia}

\section*{JAPAN}

RealVision
3-1-1 Shin-Yokohama
Kouhoku-Ku, Yokohama
2220033 JAPAN
Mac Sano
Tel: 81 (45)473-7331
Fax: 81 (45)473-7330
e-mail: sano@realvision.co.jp

\section*{KOREA}

Acetronix
5th Fl., Namhan Bldg.
76-42 Hannam-Dong
Yongsan-Ku Seoul 140-210, Korea
Tel: +822-796-4561
Fax: +822-796-4563
Shane Rhee
e-mail: ace@ace-tronix.co.kr

\section*{SINGAPORE, MALAYSIA, THAILAND, PHILIPPINES, INDONESIA}

Microtronics Associates PTE LTD
8 Lorong Baker Bantu, \#30-01
Kolam Ayer Industrial Park,
Singapore 348743
Tel: 65-748-1835
Fax: 65-743-3065
Samuel Tan
e-mail: microapl@pacific.net.sg
web site: www.microtronics-associate.com

\section*{TAIWAN}

Pantek Technology Corp.
11F, No. 156, Sec. 5, Nan-King E. Rd.
Taipei, Taiwan R.O.C.
Tel: +886-225-2749-5909
Fax: +886-225-2749-4053
Victor Shen
e-mail: pantek@gen.net.tw

\section*{Europe}

\section*{FINLAND}

Integrated Electronics Oy Ab
Laurinmaenkuja 3 A, 00440 Helsinki
PL31 00441 Helsinki, Finland
Tel: 90-2535-4400
Fax: 90-2535-4450
Ilpo Hamunen
e-mail: ilpo@ieoy.fi
web site: www.ieoy.fi

\section*{GERMANY \& AUSTRIA}

Ineltek GmbH
Hauptstr. 45
D-85922 Heidenheim, Germany
Tel: 49-7321-9385-0
Fax: 49-7321-9385-95
Roland Becker
e-mail: becker@ineltek.com
web site: www.ineltek.com

\section*{Middle East}

\section*{ISRAEL}

Iridium Data Ltd.
1 Shwartz St. Eliave Center
P.O. Box 677
Ra'anana 43000
Tel: +972-9-74505555
Fax: +972-9-7451515
Yossi Gabbay
e-mail: iridium@netvision.net.il
web site: www.iridium.co.il

\section*{USA}

Patriot Scientific Corporation
10989 Via Frontera
San Diego, CA 92127
1 (619) 6745000 (voice)
1 (619) 6745005 (fax)
www.ptsc.com

Index

```

