RM7965A-900UI 900 MHz 64-bit Microprocessor Data Sheet
5 Cache Architecture
The E9000 cache architecture is similar to that of the RM7000. Each core contains 16 KB
of instruction cache, 16 KB of data cache, and 256 KB of unified secondary cache. The
instruction cache, data cache and secondary cache are all four-way set associative. Cache
locking is supported for all of the caches, and the caches can be locked with line granularity.
This is very useful for keeping frequently called routines in the cache, along with frequently
accessed data structures such as look-up tables for routing and other data communications
applications. The E9000 data cache is non-blocking: the pipeline does not stall until a third
cache miss or a data dependency is encountered.
Each primary cache has a 64-bit read path and a 128-bit write path. Both caches can be accessed
simultaneously. The primary caches provide the integer and floating-point units with an
aggregate bandwidth of 14.4 GB/s at the 900 MHz internal clock frequency. During
an instruction or data primary cache refill, the secondary cache can provide a 64-bit datum
every cycle following an initial five-cycle latency, for a peak bandwidth of 7.2 GB/s.
5.1 Instruction Cache
The integrated 16 KB, four-way set associative instruction cache in the E9000 is virtually
indexed and physically tagged. Because each 4 KB way is no larger than the minimum page size,
the index bits fall entirely within the untranslated page offset; the index is therefore
effectively physical, which eliminates the potential for virtual aliases in the cache.
The data array portion of the instruction cache is 64 bits wide and protected by word parity,
while the tag array holds a 24-bit physical address, 14 control bits, a valid bit, and a single
parity bit.
By accessing 64 bits per cycle, the instruction cache is able to supply two instructions per cycle
to the superscalar dispatch unit. For signal processing, graphics, and other numerical code
sequences where a floating-point load or store and a floating-point computation instruction are
being issued together in a loop, the entire bandwidth available from the instruction cache is
consumed by instruction issue. For typical integer code mixes, where instruction dependencies
and other resource constraints restrict the level of parallelism that can be achieved, the extra
instruction cache bandwidth is used to fetch both the taken and not-taken branch paths to
minimize the overall penalty for branches.
A 32-byte (8 instruction) line size is used to maximize the communication efficiency between
the instruction cache and the secondary cache, tertiary cache, or memory system.
The E9000 supports cache locking on a per-line basis. The contents of each line of the cache can
be locked by setting a bit in the Tag RAM. Locking the line prevents its contents from being
overwritten by a subsequent cache miss. Refills occur only into unlocked cache lines. This
mechanism allows the programmer to lock critical code into the cache, thereby guaranteeing
deterministic behavior for the locked code sequence.
Proprietary and Confidential to PMC-Sierra, Inc., and for its customers’ internal use.
Document No.: PMC-2100294, Issue 2