RM7965A-900UI 900 MHz 64-bit Microprocessor Data Sheet
4.4 Delay Slots
The intrinsic branch and load delays each increase by one slot in the E9000 because of its longer
pipeline.
4.4.1 Branch Delay
The branch delay increases from one slot to two. Branch prediction, which simulation shows to be
correct about 95% of the time, keeps the effective branch delay close to one slot.
The second, additional branch delay slot is hidden from software. When the branch prediction
misses, this slot is taken as a one-cycle stall. When the branch prediction hits, the slot is
filled by the first instruction of the branch target code.
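The "about one" figure follows from simple arithmetic: the visible slot is always paid, and the hidden slot costs a cycle only on a misprediction. The sketch below works that out. The function name and parameters are illustrative and do not come from the data sheet; the 95% hit rate is the simulated figure quoted above.

```python
def effective_branch_delay(base_slots: float = 1.0,
                           extra_slots: float = 1.0,
                           hit_rate: float = 0.95) -> float:
    """Average branch delay seen by code, in slots.

    base_slots  : the architecturally visible delay slot (always paid)
    extra_slots : the hidden slot, paid as a stall only on a misprediction
    hit_rate    : fraction of branches predicted correctly (about 0.95 per the text)
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    miss_rate = 1.0 - hit_rate
    return base_slots + miss_rate * extra_slots
```

With the default values this gives roughly 1.05 slots, compared with 2 slots if every branch were mispredicted.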
4.4.2 Load Delay
In the E9000, the load delay increases from one slot to two. Compilers optimized for the
E9000 can fill the extra delay slot with instructions that do not depend on the loaded data.
Even code that has not been recompiled, however, will perform nearly optimally on the E9000 core.
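The effect of filling the load delay slots can be illustrated with a toy model. The sketch below assumes a single-issue, in-order pipeline in which hardware interlocks stall a dependent instruction until the loaded value is available. Those assumptions, the instruction tuples, and the function name are all illustrative; the data sheet does not describe the interlock behavior here, and the real E9000 issues to two pipes.

```python
LOAD_DELAY_SLOTS = 2  # from the text: the E9000 load delay is two slots


def count_load_stalls(program, load_delay=LOAD_DELAY_SLOTS):
    """Count stall cycles in a single-issue, in-order, interlocked toy model.

    program: list of (op, dest, sources) tuples; op == "load" marks a load.
    """
    stalls = 0
    cycle = 0
    ready_at = {}  # register -> first cycle at which a loaded value is usable
    for op, dest, sources in program:
        # Wait until every source written by an earlier load is ready.
        earliest = max([ready_at.get(reg, 0) for reg in sources] + [cycle])
        stalls += earliest - cycle
        cycle = earliest
        if op == "load":
            # The value is usable once the delay slots have passed.
            ready_at[dest] = cycle + 1 + load_delay
        elif dest is not None:
            # A non-load result is treated as immediately available.
            ready_at.pop(dest, None)
        cycle += 1
    return stalls
```

In this model, a consumer placed directly after its load stalls for two cycles, one independent instruction in between reduces that to one cycle, and two independent instructions remove the stall entirely.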
4.5 Branch Prediction
The E9000 has an 8K-entry branch prediction table and uses a correlative branch prediction
algorithm, which raises prediction accuracy above 95%. The correlative
algorithm hashes the lower address bits with bits of dynamic prediction from all branches to
derive the index for the branch entry. With this approach, a given branch instruction can have
one predictor for its “inner” loop and a separate predictor for its “outer” loop.
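The indexing idea can be sketched in a few lines. The example below is a generic gshare-style model, not the E9000 implementation: the XOR hash, the 13-bit global history, and the 2-bit saturating counters are all assumptions, since the data sheet gives only the table size and the general hashing approach. The class and function names are hypothetical.

```python
class CorrelatingPredictor:
    """Toy correlating predictor: table index = hash(branch address, global history)."""

    TABLE_ENTRIES = 8 * 1024  # 8K entries, per the text

    def __init__(self, history_bits=13):
        self.index_mask = self.TABLE_ENTRIES - 1
        self.history_mask = (1 << history_bits) - 1
        self.history = 0  # outcomes of recent branches, newest in bit 0
        # Assumed 2-bit saturating counters, initialized to weakly not-taken.
        self.counters = [1] * self.TABLE_ENTRIES

    def _index(self, pc):
        # Instructions are 4-byte aligned, so drop the two low address bits,
        # then XOR with the global history (assumed hash function).
        return ((pc >> 2) ^ self.history) & self.index_mask

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        self.history = ((self.history << 1) | int(taken)) & self.history_mask


def count_mispredictions(predictor, pc, outcomes):
    """Feed a sequence of outcomes for one branch and count wrong predictions."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses
```

Because the history is part of the index, the same branch address maps to different table entries depending on what happened just before it. In this model, a loop branch that is taken three times and then falls through is predicted without misses once the counters have warmed up, including the loop exit that a per-address counter alone would get wrong.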
4.6 Integer Unit
The E9000 implements the MIPS64 Instruction Set Architecture, including five
implementation-specific instructions that are not found in the baseline MIPS IV ISA but are
useful for embedded applications. These instructions are integer multiply-add (MAD),
multiply-add unsigned (MADU), multiply-subtract (MSUB), multiply-subtract unsigned (MSUBU),
and three-operand integer multiply (MUL).
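As a rough behavioral illustration of these five instructions, the sketch below models them in the style of the corresponding MIPS32/MIPS64 multiply-accumulate instructions: 32-bit operands, with HI and LO together holding a 64-bit accumulator. That operand width, the accumulator layout, and all names in the code are assumptions; this section of the data sheet does not define the exact semantics, so the instruction-set documentation remains the authority.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed32(x):
    """Interpret the low 32 bits of x as a two's-complement signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


class HiLo:
    """Assumed model: HI holds the upper 32 bits of the accumulator, LO the lower 32."""

    def __init__(self):
        self.acc = 0

    @property
    def hi(self):
        return (self.acc >> 32) & MASK32

    @property
    def lo(self):
        return self.acc & MASK32

    def mad(self, rs, rt):    # (HI,LO) += rs * rt, signed
        self.acc = (self.acc + to_signed32(rs) * to_signed32(rt)) & MASK64

    def madu(self, rs, rt):   # (HI,LO) += rs * rt, unsigned
        self.acc = (self.acc + (rs & MASK32) * (rt & MASK32)) & MASK64

    def msub(self, rs, rt):   # (HI,LO) -= rs * rt, signed
        self.acc = (self.acc - to_signed32(rs) * to_signed32(rt)) & MASK64

    def msubu(self, rs, rt):  # (HI,LO) -= rs * rt, unsigned
        self.acc = (self.acc - (rs & MASK32) * (rt & MASK32)) & MASK64


def mul(rs, rt):
    """Three-operand multiply: low 32 bits of the product, returned for a general register."""
    return (to_signed32(rs) * to_signed32(rt)) & MASK32
```

A typical embedded use is a dot product: calling `mad` once per element pair accumulates the sum in HI/LO without a separate add instruction per term.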
Another instruction new to the E9000 is the Superscalar No-Operation (SSNOP) instruction.
This instruction issues a NOP instruction to each integer unit.
The E9000 integer unit includes 32 general-purpose 64-bit registers, the HI/LO result registers
for two-operand integer multiply/divide operations, and the program counter (PC). There are
two separate execution units: one that can execute function (F) pipe instructions and one that
can execute memory (M) pipe instructions. Refer to Table 4 for the instruction issue rules.
Proprietary and Confidential to PMC-Sierra, Inc., and for its customers’ internal use.
Document No.: PMC-2100294, Issue 2