TMS320C6672
Multicore Fixed and Floating-Point Digital Signal Processor
SPRS708C—February 2012
www.ti.com
1.1 KeyStone Architecture
TI’s KeyStone Multicore Architecture provides a high performance structure for integrating RISC and DSP cores
with application specific coprocessors and I/O. KeyStone is the first of its kind that provides adequate internal
bandwidth for nonblocking access to all processing cores, peripherals, coprocessors, and I/O. This is achieved with
four main hardware elements: Multicore Navigator, TeraNet, Multicore Shared Memory Controller, and
HyperLink.
Multicore Navigator is an innovative packet-based manager that controls 8192 queues. When tasks are allocated to
the queues, Multicore Navigator provides hardware-accelerated dispatch that directs tasks to the appropriate
available hardware. The packet-based system on a chip (SoC) uses the two Tbps capacity of the TeraNet switched
central resource to move packets. The Multicore Shared Memory Controller enables processing cores to access
shared memory directly without drawing from TeraNet’s capacity, so packet movement cannot be blocked by
memory access.
HyperLink provides a 50-Gbaud chip-level interconnect that allows SoCs to work in tandem. Its low-protocol
overhead and high throughput make HyperLink an ideal interface for chip-to-chip interconnections. Working with
Multicore Navigator, HyperLink dispatches tasks to tandem devices transparently and executes tasks as if they are
running on local resources.
1.2 Device Description
The TMS320C6672 DSP is a highest-performance fixed/floating-point DSP that is based on TI's KeyStone multicore
architecture. Incorporating the new and innovative C66x DSP core, this device can run at a core speed of up to 1.5
GHz. For developers of a broad range of applications, such as mission critical, medical imaging, test and automation,
and other applications requiring high performance, TI's TMS320C6672 DSP offers 3 GHz cumulative DSP and
enables a platform that is power-efficient and easy to use. In addition, it is fully backward compatible with all existing
C6000 family of fixed and floating point DSPs.
TI's KeyStone architecture provides a programmable platform integrating various subsystems (C66x cores, memory
subsystem, peripherals, and accelerators) and uses several innovative components and techniques to maximize
intra-device and inter-device communication that allows the various DSP resources to operate efficiently and
seamlessly. Central to this architecture are key components such as Multicore Navigator that allows for efficient data
management between the various device components. The TeraNet is a non-blocking switch fabric enabling fast and
contention-free internal data movement. The multicore shared memory controller allows access to shared and
external memory directly without drawing from switch fabric capacity.
For fixed-point use, the C66x core has 4× the multiply accumulate (MAC) capability of C64x+ cores. In addition,
the C66x core integrates floating point capability and the per core raw computational performance is an
industry-leading 32 MACS/cycle and 16 flops/cycle. It can execute 8 single precision floating point MAC operations
per cycle and can perform double- and mixed-precision operations and is IEEE754 compliant. The C66x core
incorporates 90 new instructions (compared to the C64x+ core) targeted for floating point and vector math oriented
processing. These enhancements yield sizeable performance improvements in popular DSP kernels used in signal
processing, mathematical, and image acquisition functions. The C66x core is backwards code compatible with TI's
previous generation C6000 fixed and floating point DSP cores, ensuring software portability and shortened software
development cycles for applications migrating to faster hardware.
The C6672 DSP integrates a large amount of on-chip memory. In addition to 32KB of L1 program and data cache,
there is 512KB of dedicated memory per core that can be configured as mapped RAM or cache. The device also
integrates 4096KB of Multicore Shared Memory that can be used as a shared L2 SRAM and/or shared L3 SRAM. All
L2 memories incorporate error detection and error correction. For fast access to external memory, this device
includes a 64-bit DDR-3 external memory interface (EMIF) running at 1600 MHz and has ECC DRAM support.
14
Features
Copyright 2012 Texas Instruments Incorporated