TMS320C6678
Multicore Fixed and Floating-Point Digital Signal Processor
SPRS691D—April 2013
www.ti.com
1.1 KeyStone Architecture
TI’s KeyStone Multicore Architecture provides a high performance structure for integrating RISC and DSP cores
with application specific coprocessors and I/O. KeyStone is the first of its kind that provides adequate internal
bandwidth for nonblocking access to all processing cores, peripherals, coprocessors, and I/O. This is achieved with
four main hardware elements: Multicore Navigator, TeraNet, Multicore Shared Memory Controller, and
HyperLink.
Multicore Navigator is an innovative packet-based manager that controls 8192 queues. When tasks are allocated to
the queues, Multicore Navigator provides hardware-accelerated dispatch that directs tasks to the appropriate
available hardware. The packet-based system on a chip (SoC) uses the two Tbps capacity of the TeraNet switched
central resource to move packets. The Multicore Shared Memory Controller enables processing cores to access
shared memory directly without drawing from TeraNet’s capacity, so packet movement cannot be blocked by
memory access.
HyperLink provides a 50-Gbaud chip-level interconnect that allows SoCs to work in tandem. Its low-protocol
overhead and high throughput make HyperLink an ideal interface for chip-to-chip interconnections. Working with
Multicore Navigator, HyperLink dispatches tasks to tandem devices transparently and executes tasks as if they are
running on local resources.
1.2 Device Description
The TMS320C6678 DSP is a highest-performance fixed/floating-point DSP that is based on TI's KeyStone multicore
architecture. Incorporating the new and innovative C66x DSP core, this device can run at a core speed of up to
1.25 GHz. For developers of a broad range of applications, such as mission critical, medical imaging, test and
automation, and other applications requiring high performance, TI's TMS320C6678 DSP offers 10 GHz cumulative
DSP and enables a platform that is power-efficient and easy to use. In addition, it is fully backward compatible with
all existing C6000 family of fixed and floating point DSPs.
TI's KeyStone architecture provides a programmable platform integrating various subsystems (C66x cores, memory
subsystem, peripherals, and accelerators) and uses several innovative components and techniques to maximize
intra-device and inter-device communication that allows the various DSP resources to operate efficiently and
seamlessly. Central to this architecture are key components such as Multicore Navigator that allows for efficient data
management between the various device components. The TeraNet is a non-blocking switch fabric enabling fast and
contention-free internal data movement. The multicore shared memory controller allows access to shared and
external memory directly without drawing from switch fabric capacity.
For fixed-point use, the C66x core has 4× the multiply accumulate (MAC) capability of C64x+ cores. In addition,
the C66x core integrates floating point capability and the per core raw computational performance is an
industry-leading 40 GMACS/core and 20GFLOPS/core (@1.25 GHz operating frequency). It can execute 8 single
precision floating point MAC operations per cycle and can perform double- and mixed-precision operations and is
IEEE754 compliant. The C66x core incorporates 90 new instructions (compared to the C64x+ core) targeted for
floating point and vector math oriented processing. These enhancements yield sizeable performance improvements
in popular DSP kernels used in signal processing, mathematical, and image acquisition functions. The C66x core is
backwards code compatible with TI's previous generation C6000 fixed and floating point DSP cores, ensuring
software portability and shortened software development cycles for applications migrating to faster hardware.
The C6678 DSP integrates a large amount of on-chip memory. In addition to 32KB of L1 program and data cache,
there is 512KB of dedicated memory per core that can be configured as mapped RAM or cache. The device also
integrates 4096KB of Multicore Shared Memory that can be used as a shared L2 SRAM and/or shared L3 SRAM. All
L2 memories incorporate error detection and error correction. For fast access to external memory, this device
includes a 64-bit DDR-3 external memory interface (EMIF) running at 1600 MHz and has ECC DRAM support.
14
Features
Copyright 2013 Texas Instruments Incorporated