February, 2007
Egress Schedulers Select Frequency
To simplify the egress scheduler design, the scheduler can schedule out at most one packet every other
clock cycle. For port widths narrower than x16, this restriction does not degrade performance. For x16
port widths, however, full throughput cannot be sustained for header-only packets (16 bytes or less).
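The restriction can be illustrated with a simple model. The sketch below is not taken from the PEX 8532 design. It assumes the data path moves one byte per lane per clock cycle and ignores framing overhead, and the function names are hypothetical. Under those assumptions, a packet is limited by the scheduler only when it occupies the port for fewer than two clock cycles.

```python
import math

# From the text: the egress scheduler can select one packet every other clock cycle.
SCHEDULER_INTERVAL_CYCLES = 2


def cycles_on_port(packet_bytes: int, lanes: int) -> int:
    """Clock cycles a packet occupies the port.

    Assumption (not from the data book): one byte per lane per clock cycle,
    with no framing overhead counted.
    """
    return math.ceil(packet_bytes / lanes)


def port_utilization(packet_bytes: int, lanes: int) -> float:
    """Fraction of cycles the port is busy for back-to-back packets of one size."""
    busy = cycles_on_port(packet_bytes, lanes)
    # The port cannot start a new packet sooner than the scheduler interval.
    return busy / max(busy, SCHEDULER_INTERVAL_CYCLES)


for lanes in (4, 8, 16):
    print(f"x{lanes}: 16-byte packet utilization = {port_utilization(16, lanes):.2f}")
```

In this model, a 16-byte packet keeps x4 and x8 ports fully busy, while an x16 port sits idle every other cycle, which is consistent with the behavior described above.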
x8 and x16 Port PAD Slots
When a PEX 8532 port is configured as x8 or x16, only bytes belonging to one TLP are transmitted in
a single symbol time. The remaining lanes are filled with PAD symbols.
Also, for x8 or x16 ports, the PEX 8532 does not attempt to optimize throughput by placing a partial
TLP and a DLLP in the same symbol time.
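As a rough illustration of the cost of PAD filling, the following sketch counts the symbol times and PAD symbols needed for a single TLP. It assumes the TLP starts on lane 0 and that each lane carries one symbol per symbol time. The function name and the example TLP size are hypothetical.

```python
import math
from typing import Tuple


def pad_symbols(tlp_symbols: int, lanes: int) -> Tuple[int, int]:
    """Return (symbol_times, pad_count) for one TLP sent alone on the link.

    Assumption (not from the data book): the TLP starts on lane 0, so only
    the final symbol time can contain PAD.
    """
    symbol_times = math.ceil(tlp_symbols / lanes)
    pad_count = symbol_times * lanes - tlp_symbols
    return symbol_times, pad_count


# Example: a TLP occupying 20 symbols on the wire (hypothetical size).
for lanes in (8, 16):
    times, pads = pad_symbols(20, lanes)
    print(f"x{lanes}: {times} symbol times, {pads} PAD symbols")
```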
8.4.3
Multiple Stream Throughput
8.4.3.1
Enable PLX-Specific Relaxed Ordering
The PEX 8532 does not support the optional Relaxed Ordering bits in the TLP header, as specified in the
PCI Express Base r1.0a, Table 2-23. By default, all packets entering from a specific port are dispatched
to their respective destinations in strict order.
However, as described in Section 8.3.2.1, “Source Scheduler,” the PEX 8532 provides its own Relaxed
Ordering to overcome the packet-to-packet dependency in a burst of posted traffic that enters from the
same ingress port but targets different egress ports.
8.4.3.2
Avoid Hot Spots
A hot spot forms when multiple ingress ports attempt to transmit packets to the same egress port, and the
aggregate incoming bandwidth exceeds the outgoing bandwidth of that port. If the hot spot is not
transient, the hot spot port throughput can appear high. Eventually, however, the Egress queues fill and
backpressure the Ingress queues. When the Ingress queues fill, ingress traffic is backpressured, which
can impact traffic flows that do not target the congested egress port. As a result, the overall average
throughput of the switch is dramatically reduced. PCI Express does not provide a mechanism to recognize
and avoid hot spots. It is therefore left to system designers to understand and avoid this pitfall.
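Because the switch cannot detect hot spots itself, a system designer might check a planned traffic pattern offline. The sketch below is one possible way to do so, not a PLX tool. The function name, port numbers, and load figures are hypothetical, and loads and capacities can be in any consistent bandwidth unit.

```python
from collections import defaultdict


def find_hot_spots(flows, egress_capacity):
    """Return {egress_port: oversubscription_ratio} for oversubscribed ports.

    flows: iterable of (ingress_port, egress_port, offered_load) tuples.
    egress_capacity: dict mapping egress_port to its outgoing bandwidth,
    in the same unit as offered_load.
    """
    offered = defaultdict(float)
    for _ingress, egress, load in flows:
        offered[egress] += load  # aggregate load converging on each egress port
    return {
        port: total / egress_capacity[port]
        for port, total in offered.items()
        if total > egress_capacity[port]
    }


# Hypothetical pattern: ports 0 and 1 both send heavily to port 2.
flows = [(0, 2, 6.0), (1, 2, 6.0), (3, 4, 2.0)]
capacity = {2: 8.0, 4: 8.0}
print(find_hot_spots(flows, capacity))
```

A ratio above 1.0 that persists over time indicates a sustained hot spot. Short bursts above 1.0 may be absorbed by the queues.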
8.4.4
Throughput and Packet Size Relationship
In general, sustained throughput increases as Payload Size increases due to the increased PCI Express
protocol efficiency. However, the following secondary effects can also affect throughput:
• In peer-to-peer applications, longer packets can result in less interleaved or randomized egress
port distribution compared to shorter packets. This increases the chance of building up transient
congestion in egress ports, and can negatively impact overall throughput.
• Longer packets require fewer header credits per unit time, and are therefore less likely to idle the
link while waiting for additional header credit.
• Longer packets consume payload credits faster and can delay DLLPs queued behind a long TLP,
potentially causing credit starvation. If link credit is insufficient (sufficient being 3 TLPs' worth
or more), shorter packets may provide better throughput.
• Posted packets block younger packets of other types (Non-Posted and Completions). In a
system with minimal credits, Posted packets should receive the strongest consideration when
allocating credits.
Carefully weigh the benefits and drawbacks of using longer packets.
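The first-order effects above can be estimated with a short calculation. The sketch below assumes the per-TLP overhead and credit units defined by the PCI Express Base specification for 2.5 GT/s links (8 bytes of framing, sequence number, and LCRC, one header credit per TLP, and one data credit per 16 bytes of payload). It ignores ECRC, DLLP traffic, and 8b/10b encoding, and the function names are hypothetical.

```python
import math

FRAMING_BYTES = 8       # STP (1) + sequence number (2) + LCRC (4) + END (1)
DATA_CREDIT_BYTES = 16  # one flow-control data credit covers 4 DWORDs


def protocol_efficiency(payload_bytes: int, header_bytes: int = 12) -> float:
    """Payload bytes as a fraction of all bytes the TLP puts on the link."""
    return payload_bytes / (payload_bytes + header_bytes + FRAMING_BYTES)


def credits_for_transfer(total_bytes: int, payload_bytes: int):
    """Return (header_credits, data_credits) to move total_bytes.

    Simplification: total_bytes is treated as a whole number of
    full-size TLPs of payload_bytes each.
    """
    tlps = math.ceil(total_bytes / payload_bytes)
    data_per_tlp = math.ceil(payload_bytes / DATA_CREDIT_BYTES)
    return tlps, tlps * data_per_tlp


for payload in (64, 128, 256):
    eff = protocol_efficiency(payload)
    hdr, data = credits_for_transfer(4096, payload)
    print(f"{payload:4d} B payload: efficiency {eff:.3f}, "
          f"{hdr} header credits, {data} data credits per 4 KB")
```

In this model, the data credits needed for a fixed transfer do not depend on the payload size, while the header credits fall as the payload grows. The model does not capture the congestion and credit-starvation effects listed above, which can outweigh the efficiency gain.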
ExpressLane PEX 8532AA/BA/BB/BC 8-Port/32-Lane Versatile PCI Express Switch Data Book
Copyright © 2007 by PLX Technology, Inc. All Rights Reserved – Version 1.6