This is the semipublic section of Andy Glew's comp-arch.net wiki. Readable by the world, but writable only by certain people (like Andy Glew).
Please add comments on the corresponding public shadow part of the wiki. See the sidebar for a link to the corresponding page.
See comp-arch.net Overview and Administrivia for an overview of why this site exists.
See Topics for CompArch wiki for something like a technical table of contents. Be warned, though, that many topic pages may not be linked into this hierarchical structure. Such pages may be
- linked to only from some other page
- created from a wish or to-do list, and not yet linked into a logical structure
- found only in Special:AllPages
Including Topics for CompArch wiki here, since that is what most people want to go to:
There are currently 289 articles in this wiki. (Some may be support, infrastructure, or administrivia.)
What should be on this wiki
This is not expected to be a full or final list. Rather, it is hoped to be just a start - topics that I want to write essays on.
WikiWish: I wish that this was a clickable, folding/unfolding, list. (TBD: choose one of the many, many, examples of such code.) Currently this is an odd mix of sections and lists. I really want it to be a TopicTree.
Categories for CompArch wiki
The mediawiki categories are NOT a substitute for a Table of contents (TOC) or TopicTree. They are just convenient, an add-on, more easily maintained in some ways. They are not a complete tree: they are only provided for topics that experience has shown have many subtopic pages.
The categories are arranged:
- Topic Related
New categories not listed above are created often enough - see Special:Categories.
Topic Related Categories
- Category:non-Computer Architecture - some non-computer architecture topics will inevitably arise
- Category:Computer Architecture - the main topic of this wiki
- Category:Macroarchitecture - that which is externally visible
- Category:Instruction set architecture (ISA) - the contract between programmer and machine
- other aspects of macroarchitecture include Pin Architecture, Interconnect Architecture
Corresponding topic pages
- Computer Architecture - the main topic of this wiki
- Macroarchitecture - that which is externally visible
- non-Computer Architecture - some non-computer architecture topics will inevitably arise
Topics that I have not yet organized into the tree. Typically these are topics that I have sketched out in my head, possibly even completely written, but which I have not yet had the time to enter.
See also Vocabulary to-do list.
- Message Passing Instruction Set Extensions
- TLB topics
- Why saying "You need N-way associativity in the TLB to guarantee forward progress" is bogus and reflective of an in-order mindset
- Large pages versus large page tables
- Multilevel TLBs
- Page table topics
- Unusual Operations
- delt - Sun SX pixel processor. (1995). "Add four delta values cumulatively giving 4n-long vector."
- I think this amounts to r0=s0+d0, r1=s1+d0+d1, r2=s2+d0+d1+d2, r3=s3+d0+d1+d2+d3
- I.e. a sum prefix of the delta vector, followed by a vector-vector add: vadd(s, sumprefix(d))
- Q: what is it used for?
- Or, sumprefix(s+d)
- plot - "Bresenham interpolation to vector"
- Intel genX plane, line, blend
- Unified versus Split or Typed Register Files
- Overlapped Registers
- Pipestage Registers
- Number of Operands in an Instruction
- Special Instructions vs. Synthesized
- Error Reporting
- Absolute Jump
- Conditional Relative Branches
- Indirect Jump
- Call/Return Instructions
- Return and adjust stack pointer (MIPS16, microMIPS; fits two register read, one register write constraint when Instruction Pointer/Program Counter is not a GPR, suited to the common case of releasing a small constant-sized stack frame when a frame pointer is not used)
- Where to put the return address
- Save the return address at a fixed offset from the stack pointer, possibly incrementing the stack pointer (x86)
- Save the return address to a special link register (e.g., Power)
- Save the return address to a dedicated GPR (MIPS, SPARC, ARM)
- Save the return address to a specified GPR (Alpha)
- Save the return address onto a separate return address stack (stack processors: Chuck Moore's c18 and Bernd Paysan's b16)
- RISCifying CALL/RETurn
- Jump and Link
- Background Call/Return Stuff
- Position Independent Code
- Call/Return Trivia
- Advanced Call/Return Stuff
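The two guesses at the semantics of delt listed under Unusual Operations above can be written out as a small Python sketch. This is only my reading of the one-line description, not the documented Sun SX behavior; the function names `prefix_sum`, `delt_add_prefix`, and `delt_prefix_of_sum` are made up for illustration.

```python
from itertools import accumulate

def prefix_sum(values):
    """Inclusive prefix sum: [v0, v0+v1, v0+v1+v2, ...]."""
    return list(accumulate(values))

def delt_add_prefix(s, d):
    """Reading 1 (assumed): r[i] = s[i] + (d[0] + ... + d[i])."""
    return [si + pi for si, pi in zip(s, prefix_sum(d))]

def delt_prefix_of_sum(s, d):
    """Reading 2 (assumed): r[i] = (s[0]+d[0]) + ... + (s[i]+d[i])."""
    return prefix_sum([si + di for si, di in zip(s, d)])

s = [10, 20, 30, 40]
d = [1, 2, 3, 4]
print(delt_add_prefix(s, d))     # [11, 23, 36, 50]
print(delt_prefix_of_sum(s, d))  # [11, 33, 66, 110]
```

The two readings agree only in element 0, so a single worked example from the original documentation would be enough to tell them apart.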
Funky control-flow stuff:
- Integer, fixed point
- Floating point (FP) - 32bit, 64bit
- Addresses / Pointers
- strings - usually strings are arrays of small byte-sized integers, counted or terminated. Possibly lists.
- FP80, FP128
- Complex Numbers - usually FP, sometimes integers
- Shift Instructions
- Special Integer Instructions
- Divide and Square Root Step
- Transcendental Instructions
- Special Floating Point
- System Call Instructions
- Control and Status Registers
- Timer Architecture
- Addressing modes
- PC-relative addressing modes and PIC (Position Independent Code)
- Pre/Post Increment/Decrement Addressing Modes
- Fancy Addressing Modes
- SIMD and Vector ISAs
- Hint instructions and/or hint flavors of non-hint instructions
- ABS absolute value, and related instructions such as SAD, sum of absolute difference
- Memory and Cache Prefetch and Control
- Find special byte in register
- Block Copy Instructions
- BitBlt and Compositing
- Compress and Crypto
- XML instructions - just say no!!
Extending an instruction set architecture is one of the fundamental tests of an ISA.
- DSP Instructions
- Video Instructions
- Vector Instructions
- Typical Embedded Instructions
- Parallel Processing Instructions
- Cryptographic Instructions
- Coprocessor Instructions
- Message Passing Instructions
- Cache Control Instructions
- Cache Prefetch Instructions
- Parallel Programming and Synchronization Instructions
- Reconfigurable Logic Instructions
- Logarithmic Number System Instructions
- Fixed Point Arithmetic Instructions
- Block Floating Point Instructions
- Micro-optimization primitive instructions
- Hint instructions and/or hint flavors of non-hint instructions
This section presents recent significant instruction set extensions in an approximately historical order and context for major current companies.
TBD: extend far enough back into the past to see the wheel of reincarnation.
TBD: make predictions about instruction set futures.
- Fixed Length Instructions
- Standard RISC 32 or 64 bit instructions
- 16 bit RISC
- Long instructions: 128 bit
- Funky Fixed Width Instructions like 24, 40, or 42 bits
- Variable Length Instructions
- Why x86 is easier to decode than VAX or Motorola 68K
- ARM Thumb and other RISC compression
- Generic Instruction Set Compression
- Immediates as a primary motivation for variable length instructions
- Heidi Pan's Heads and Tails (fixed width packet with fixed width instruction portions packed left to right and variable length additions packed right to left)
- fixed width packet filled with instructions of two lengths (CDC 6600, M32R)
- VLIW: multiple operations per instruction packet
- Non-adjacent Instruction Formats
- e.g. ATI R700 - control flow program controlling separate, non-adjacent, program "clauses" for exec, data fetch, etc.
- Predicate field in instruction
- SKIP instructions
Modern Microarchitecture Examples
Let us walk through a modern microarchitecture. This provides us a place to hang certain topics.
- Classic RISC Five Stage Pipeline
- Glew Opinion: the classic 5-stage pipeline should have 7 stages
- Typical OOO Pipeline
- or M - Map
- or ROB - ReOrder Buffer
Hanging Topics Off the Pipeline
Now that we have generic names for such pipestages, we can hang some topics, ranging from simple to advanced, off them.
TBD: many, many, topics need to be hung up above.
TBD: what I really need is one such tree object for the pipeline, with multiple views - one view with little detail, one view with more. I like such "show the overview", "show the detail", "now zoom in". Automatically maintained to prevent inconsistencies. Unfortunately, this is yet another WikiWish, something I hope to add to this wiki in my copious spare time. Might be able to approximate by using one of Mediawiki's automatically generated TOCs.
BP - Branch Prediction
- Branch direction predictor (BDP), branch identifier (BID)
- Branch target buffer (BTB), Branch target predictor, Indirect Branch Predictor
- Return address stack (RAS)
- Multilevel Branch Predictors
- Trace BTB, Unrolled BTB
- Branch Predictors for Trace Caches, trace predictors
- taken/not-taken branch prediction history (TNT), stew
- Managing branch prediction history: copying short history versus pointing to large history
- Local versus global branch history
- TNT versus path history
IF - Instruction Fetch
ID - Instruction Decode
RR - Register Rename or M - Map
- Register Renamer Port Reduction
- Register Renamer Dependency Comparison Circuitry
- Multithreaded Register Renaming
RD - Register Read
- Register File Port Reduction
- Read after schedule versus read before schedule
- Register file cache
- Incomplete Bypass Network
S - Schedule
EX - Execute
MEM - Memory Access, usually Data Cache Access
- Sum Addressed Memory
- Specialized Caches
- Todd Austin's Knapsack
- Stack caches
- Memory registerification
- It is surprisingly challenging to maintain the ability to retire (commit) 1 store per cycle - let alone more. You cannot fall into the trap of initiating the store at retirement, and then waiting until it is a confirmed hit (in a writeback cache - let alone in store-through) before initiating the next.
- In-order Pipelined
- Pipeline Stalls - Pipeline Bubbles
- Conventional MP
- SMT, SoEMT, and other hardware threading
- Thread Forking
- GPUs, SIMT, Coherent Threading
Two standard techniques in computer architecture are predictors and caches. By this point in time these can be considered well known and obvious.
See the topic page for a design tree by me, preserved by Mark Smotherman.
See also Taxonomy of Patterns for Prefetchers
- I'm not sure that I like the term "RAM", Random Access Memory. Something like Fixed Address Memory might be more descriptive, since over the years such memories have become less and less randomly accessible
- There are two forks in the discussion of CAM arrays.
- True CAMs - all of the flavors of CAMs
- Separate Tag and Data Arrays - or Combined
- Direct mapped
- Fully Associative - comparisons done at each entry
- N-way Associative
- Tag-sequential - read N tags, compare, and select data way to read
- In array tag comparison
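The organizations listed above can be illustrated with a toy model of a tag-sequential N-way lookup with separate tag and data arrays. This is a minimal sketch, not any particular machine: the sizes `LINE_BYTES`, `NUM_SETS`, and `NUM_WAYS`, the class name, and the explicit `way` argument to `fill` are assumptions chosen for illustration, and replacement policy is left out.

```python
LINE_BYTES = 64  # assumed cache line size
NUM_SETS = 4     # assumed number of sets
NUM_WAYS = 2     # assumed associativity

def split_address(addr):
    """Split an address into (tag, set index, byte offset)."""
    offset = addr % LINE_BYTES
    index = (addr // LINE_BYTES) % NUM_SETS
    tag = addr // (LINE_BYTES * NUM_SETS)
    return tag, index, offset

class TagSequentialCache:
    def __init__(self):
        # Separate tag and data arrays, both indexed by [set][way].
        self.tags = [[None] * NUM_WAYS for _ in range(NUM_SETS)]
        self.data = [[None] * NUM_WAYS for _ in range(NUM_SETS)]

    def fill(self, addr, line, way):
        tag, index, _ = split_address(addr)
        self.tags[index][way] = tag
        self.data[index][way] = line

    def lookup(self, addr):
        tag, index, _ = split_address(addr)
        # Step 1: read all N tags of the selected set and compare.
        hits = [w for w, t in enumerate(self.tags[index]) if t == tag]
        if not hits:
            return None  # miss
        # Step 2: read only the matching way of the data array.
        return self.data[index][hits[0]]

cache = TagSequentialCache()
cache.fill(0x1000, "line A", way=0)
cache.fill(0x2000, "line B", way=1)
print(cache.lookup(0x1004))  # line A
print(cache.lookup(0x3000))  # None
```

In this model, setting `NUM_WAYS = 1` gives a direct mapped cache, and setting `NUM_SETS = 1` gives a fully associative one, where the comparison is done against every entry.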
One of the most common hardware functions is scheduling, picking, or prioritization. Not just in OOO schedulers. The scoreboards of in-order processors are really schedulers. But also in memory schedulers, selecting what memory access to handle next, etc.
- Scheduler Elements and Toolkits
- Multilevel and shared caches, also multilevel cache hierarchy
- Funky Cache Topics
- Strong Ordering or Sequential Consistency
- Weak Consistency
- Intermediate Consistency
See also Memory coalescing
- Page table Structure
- Conventional Tree Structured Page Tables
- Hashed Page Tables
- Folded Virtual Memory
- Special Virtual Memory Issues
- Why I prefer to say Speed rather than High Frequency
- Increasing Speed of Transistors
- Increasing Speed of Logic Design
- Microarchitecture for High Speed
- Replay Pipelines
- Carry-free Arithmetic or Redundant Arithmetic
- Reducing RF and other Array Ports for High Speed
- Staggered ALUs or Width-pipelined ALUs
- Cascaded ALUs
- Collapsing Dependent Operations
- Arguments Against High Speed Design
- DRAM Alignment
- Burst / Block Transfers and Caches
- Aligned Cache Accesses
- Alignment Network Complexity
- Why Vectors Increase Frequency of Misalignment
- Instruction Set Support for Misalignment
- Scatter/Gather Support in the Memory Subsystem
- Alignment and Memory Ordering
- Privilege Levels
- Multiple Privilege Levels
- Virtual Memory
- Interrupts and Exceptions
- System Calls
- Non-OS Privilege Levels
- Virtual Machine Architecture
- Controlling Bugs That Lead to Security Vulnerabilities
- Security Architectures
- The Ubiquitous User/Kernel
- Why More Rings Have Failed - And Keep Getting Reinvented
- ACLs vs Capabilities
- Secure Up-calls
- Memory Protection Unit (MPU) - Memory protection without address translation
- Page groups - PA-RISC and Itanium; can support distinct memory protection within the same address space
- Stack Protection
- Since stack clobbering, buffer overflows on mixed data/return address stacks, is a common source of security problems, there have been many band-aid proposals that fix just this aspect of computer (in)security, including:
as well as some that particularly help with this, although they are of more general use
- Coprocessor Location in the Pipeline
- Coprocessors and Virtual Memory
- EMON Event Monitoring
- Non-EMON perfmon
- Security Issues with Performance Monitoring - Covert Channels
- Reliability for Memory Arrays: ECC and Redundancy
- Reliability for Communication Channels: CRC
- Reliability for Combinatorial and Irregular Logic
Power and Energy
- Spatial Aspects of Power Management: Domains
- this naturally causes me to wonder if there is any use to Temporal Aspects of Power Management: Modes. Hmm, seems that there is, and, in fact, power management modes came first.
- integration is obvious:
- integration is obvious (except when it is not).
- Levels of integration
- Small scale integration (SSI)
- Medium scale integration (MSI)
- Large scale integration (LSI)
- Very large scale integration (VLSI)
- Ultra large scale integration (ULSI)
- Chip-level integration
- Wafer scale integration
- Chip stacking or 3D integration
- Package level integration
- Chip carriers such as IBM's ceramic thermal modules
- Board level integration
- miniature boards carrying several chips
- Blade level integration
- Mesa-level integration
- Integrating logic and memory
"Conventional" Alternative or Advanced Computer Architecture
Topics listed here are "conventional", in that many people have proposed them, often time and again.
- LIM, PIM (LnM, PnM)
- The Need for New Software to Support Parallel Hardware
- Interval arithmetic
The following are similarly often proposed topics, but these are topics I am more sympathetic to.
- Complex arithmetic
- Matrix-matrix and matrix-vector operations
- LNS (logarithmic number systems)
- Generic multi-component arithmetic:
Less often proposed
Hey, this is my wiki - why can't I indulge my pet topics?
- Speculative Multithreading (SpMT)
- Log-based computer architecture
- Multilevel Branch Predictors
- Dynamic Instruction Rewriting
- Symbolic Memory Disambiguation
- Bit Munging Instructions
- MLP Memory Level Parallelism
- Capability Architecture
- Large Instruction Windows
- Coherent Threading
- Eliminating Unnecessary Read for Ownerships
- Block Memory Operation Agenda
- Lightweight Threading
- Poor Man's ECC
- Multicluster Multithreading (MCMT)
- Alphabet Soup: a Collection of Microarchitectures
- Take every optimization in the Dragon compiler book and consider it in hardware
- Dynamic dead code elimination versus Hardware futures
- Hardware strength reduction
- Dynamic constant propagation
- Dynamic loop detection
- Dynamic common subexpression elimination
- Dynamic loop invariant detection and hoisting
- Hardware unreachable code elimination
- Hardware expression simplification
BS: Bluesky, Brainstorming, Bullshit
The boundary between an advanced topic and BS is variable.
- BS: A Scatter/Gather based Memory System and Interconnection Fabric
- BS: Bitmask Coherency for Writeback Caches
Tools in the Toolkit
- Queues and Buffers
- Bloom Filters
Tricks and hacks:
Topics for the Professional Computer Architect - and Wannabe
Many more people are wannabe computer architects than get paid to do it. Heck: I was a wannabe before it became my job, and I still am!
Many people have job descriptions that say computer architect, but are not.
I am often asked for a recommended reading list. Might as well record it.
- Common Terminology Confusions in Computer Architecture