Performance Metrics
PLX Technology, Inc.
8.5 Latency
8.5.1 Queuing Effect
In switches with large internal buffers, latency increases once internal queues build up. The
packet at the end of an egress VC&T queue is not transmitted until all packets ahead of it have been
transmitted. For example, if the egress RAM of a x4 port is filled with packets of the same VC&T,
draining the entire egress RAM takes 2,560 clocks (512 beats × 20 bytes per beat ÷ 4 bytes per clock).
Worst-case packet latency can therefore be as long as approximately 10 µs.
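The drain-time arithmetic can be reproduced with a short calculation. The following is a minimal sketch, not part of the PEX 8532 register or software interface. The function names are hypothetical, and the 4 ns clock period is an assumption, chosen because it is consistent with the quoted figures of 2,560 clocks and roughly 10 µs.

```python
def drain_clocks(beats=512, bytes_per_beat=20, bytes_per_clock=4):
    """Clocks needed to drain a full egress RAM through one port."""
    total_bytes = beats * bytes_per_beat
    # Round up: a partial final transfer still occupies a whole clock.
    return -(-total_bytes // bytes_per_clock)


def drain_time_us(clocks, clock_period_ns=4.0):
    """Convert clocks to microseconds (the clock period is an assumption)."""
    return clocks * clock_period_ns / 1000.0


clocks = drain_clocks()
print(clocks, drain_time_us(clocks))
```

If the bytes-per-clock figure scales with link width, as the x4 example suggests, a wider port drains the same RAM proportionally faster; for instance, drain_clocks(bytes_per_clock=16) yields 640 clocks under that assumption.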
To mitigate the queuing effect, consider the following measures:
• Avoid creating hot spots. In particular, ensure that the upstream port width in a host-centric
application matches the combined width of all active downstream ports. Otherwise, the upstream
port can easily become a hot spot when all downstream ports attempt to transmit packets to it.
• Program small Egress queue packet upper and lower limits, to avoid packet accumulation in an
egress port. Section 8.3.2.4 describes how to program these thresholds. Lower latency is achieved
at the cost of reducing the PEX 8532’s capability to buffer transient congestion.
• Reduce traffic load. Lighter traffic is less likely to experience congestion and drains faster,
because the egress links can drain at the full link rate.
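The hot-spot guideline in the first bullet can be checked at design time by comparing lane counts. The sketch below is illustrative only: the function names are hypothetical, and it assumes that all listed downstream ports are active and direct their traffic at the upstream port simultaneously, which is the worst case described above.

```python
def upstream_oversubscription(upstream_width, downstream_widths):
    """Ratio of combined downstream lane count to upstream lane count."""
    if upstream_width <= 0:
        raise ValueError("upstream_width must be positive")
    return sum(downstream_widths) / upstream_width


def is_potential_hot_spot(upstream_width, downstream_widths):
    """True when the active downstream ports can outpace the upstream port."""
    return upstream_oversubscription(upstream_width, downstream_widths) > 1.0


# A x8 upstream port fed by four x4 downstream ports is oversubscribed 2:1.
print(is_potential_hot_spot(8, [4, 4, 4, 4]))
# A x16 upstream port matches the combined width of four x4 ports.
print(is_potential_hot_spot(16, [4, 4, 4, 4]))
```

A ratio above 1.0 does not guarantee congestion; it only indicates that the upstream port cannot absorb the traffic if every downstream port transmits toward it at full rate.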
8.5.2 Time Division Multiplex Effect
As previously illustrated, the PEX 8532 source station employs port-to-station aggregation, and the
destination station employs station-to-port de-aggregation. Time Division Multiplexing (TDM) controls
both aggregation and de-aggregation. Waiting for the proper TDM slot to process a packet coming from
or going to a particular port usually increases latency. The wider the port, the more TDM slots that
port owns; therefore, the less latency TDM contributes.
Within a station, only a subset of the 16 lanes is connected to SerDes. One approach to reducing latency
is to strap the port as a wider port and allow it to negotiate down to the expected link width. For example,
if a station owns only a single x2 port, the port can be strapped as x16 and allowed to become x2
later through the normal link training process. As a result, the x2 port acquires all TDM slots in this
station. The worst-case TDM effect for a x1 or x2 port is 14 symbol times = 57 ns.
ExpressLane PEX 8532AA/BA/BB/BC 8-Port/32-Lane Versatile PCI Express Switch Data Book
Copyright © 2007 by PLX Technology, Inc. All Rights Reserved – Version 1.6