Architecting the Metro Edge for Accurate One-Way Frame Delay Measurements | Ethernet Academy Articles

Precise one-way frame delay measurements to UNIs at the Metro Edge are critical for service providers offering mobile backhaul. Accurate one-way frame delay measurement requires very close agreement of the Time of Day (ToD) at the UNIs at both ends of the service. In the past GPS was used for this purpose, but today IEEE 1588v2 Precision Time Protocol (PTP) [1] provides accurate synchronization without the complexity of GPS.

Choosing Metro Edge equipment with IEEE 1588v2 support is important, but it is not enough. The Metro Edge must be properly architected to support 1588v2, and doing so requires an understanding of how measurement accuracy degrades when the backhaul network exhibits different delays in each direction.

This paper shows how asymmetric delays degrade one-way frame delay measurement accuracy and explains how service providers can architect their mobile backhaul networks to provide highly accurate customer SLA reports and reduce the costs of synchronizing their network.

For more information on Carrier Ethernet and measuring one-way frame delay for Mobile Backhaul, please see the description of PM-1 and PM-2 in section 8 of MEF 35 [2].

RTD Measurements Need Precision, but not Synchronization

Before we dig into One-Way Frame Delay, let’s take a look at its predecessor, round trip delay (RTD). RTD measurement is simple and popular because it requires no synchronization between the endpoints, as noted in section 8.2 of Y.1731 [3]:

However, one-way frame delay measurement requires that the clocks at the transmitting MEP and the receiving MEPs are synchronized. For the purposes of frame delay variation measurement, which is based on the difference between subsequent frame delay measurements, the requirement for the clock synchronizations can be relaxed since the out-of-phase period can be eliminated in the difference of subsequent frame delay measurements.

If it is not practical for the clocks to be synchronized, which is expected to be the most common scenario, the frame delay measurement can be made only for two-way measurements, where the MEP transmits a frame with ETH-DM request information with the TxTimeStampf, and the receiving MEP responds with a frame with ETH-DM reply information with TxTimeStampf copied from the ETH-DM request information. The MEP receiving the frame with ETH-DM reply information compares the TxTimeStampf with the RxTimeb, which is the time at the reception of frame with ETH-DM reply information and calculates the two-way frame delay as:

Frame delay = RxTimeb – TxTimeStampf

The MEP can also make two-way frame delay variation measurements based on its ability to calculate the difference between two subsequent two-way frame delay measurements.
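To make the quoted calculation concrete, here is a minimal Python sketch. It assumes both timestamps are read from the initiating MEP's own clock and expressed in milliseconds; the function names and sample timestamps are hypothetical and are not defined by Y.1731.

```python
def two_way_frame_delay(tx_time_stamp_f, rx_time_b):
    """Two-way frame delay = RxTimeb - TxTimeStampf (both on the initiating MEP's clock)."""
    return rx_time_b - tx_time_stamp_f


def frame_delay_variation(delays):
    """Differences between subsequent two-way frame delay measurements."""
    return [later - earlier for earlier, later in zip(delays, delays[1:])]


# Hypothetical (TxTimeStampf, RxTimeb) pairs in ms.
samples = [(0.0, 8.0), (100.0, 108.5), (200.0, 208.25)]
delays = [two_way_frame_delay(tx, rx) for tx, rx in samples]
print(delays)                         # [8.0, 8.5, 8.25]
print(frame_delay_variation(delays))  # [0.5, -0.25]
```

Because both timestamps come from the same clock, any ToD difference between the two MEPs never enters the result, which is why no synchronization is needed.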

As described above, a timer is started when a packet is sent from one end, and the elapsed time is measured when a response is received. An early example of this method is the ICMP Echo/Echo Reply exchange defined in RFC 792 (commonly used by the “ping” utility). The accuracy of an RTD measurement can be improved by adding timestamps at the responder so that its processing time can be removed from the measurement, as done by the DMM/DMR messages defined in Y.1731 [3] and in IEEE 1588v2 [1]. Figure 1 below (reproduced from Figure 12 in IEEE 1588v2 [1]) shows how the 4-timestamp method works. As shown, the processing time at the responder may be calculated as t₃-t₂.

After the Delay_Req message is received at the master and the Delay_Resp message is received at the slave, the slave has all four timestamps: t₁, t₂, t₃, and t₄ (t₁ is carried in the Sync or Follow_Up message and t₄ in the Delay_Resp message). The slave can then calculate the RTD as:

RTD = (elapsed time at master) – (processing time at slave) = (t₄-t₁) – (t₃-t₂)

Note that the absolute values of the timestamps at the master (t₄ and t₁) do not need to be correlated to the absolute values of the timestamps at the slave (t₃ and t₂). The accuracy of an RTD measurement will be limited by how close the master and slave clocks are in frequency, but it is usually straightforward and economical to achieve the needed accuracy.
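The short Python sketch below illustrates this point under a simple assumed model: a hypothetical `exchange()` helper generates t₁ through t₄ for a path with 4 ms of delay in each direction and 2 ms of responder processing time, while the slave clock is given several arbitrary offsets from the master. The RTD comes out the same each time because the offset cancels in (t₄-t₁) - (t₃-t₂).

```python
def exchange(down, up, slave_offset, processing, t1=0.0):
    """Hypothetical timestamp model for one exchange (all times in ms).

    slave_offset is slave clock minus master clock; neither end knows it.
    """
    t2 = t1 + down + slave_offset   # arrival at slave, read from slave clock
    t3 = t2 + processing            # reply leaves slave, read from slave clock
    t4 = t3 - slave_offset + up     # arrival at master, read from master clock
    return t1, t2, t3, t4


def rtd(t1, t2, t3, t4):
    """RTD = (elapsed time at master) - (processing time at slave)."""
    return (t4 - t1) - (t3 - t2)


# Same 4 ms + 4 ms path, 2 ms processing, arbitrary slave clock offsets.
for offset in (0.0, 50.0, -1234.0):
    print(rtd(*exchange(4.0, 4.0, offset, 2.0)))   # 8.0 each time
```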

The Key to Measuring One Way Frame Delay is Time of Day (ToD)

In contrast to an RTD measurement, a measurement of One Way Frame Delay requires that the two endpoints be synchronized with respect to ToD. There are two basic ways to achieve ToD synchronization:

An out-of-band method that is delivered independently of the data, e.g., GPS.
An inband method that is delivered along with the data. Examples include IEEE 1588v2 Precision Time Protocol (PTP) [1] and Network Time Protocol (NTP) [4].

The advantage of an out-of-band method is that it decouples the time reference from the data path being measured, which improves accuracy and resiliency. However, an out-of-band method adds cost and complexity.

The inband methods (NTP and PTP) are popular because they add minimal cost to the end node and the system as a whole. However, there are some drawbacks.

The first is the intrinsic limit on achievable accuracy. This issue is being addressed through better network equipment and the use of transparent clocks to compensate for delay variation inside each node.
A second potential issue arises because the timing synchronization messages share a physical path with the data. The timing algorithms assume symmetry, so physical asymmetry along this shared path cannot be detected. As a result, the calculated ToD will be incorrect, offset by half the difference between the two one-way delays. This limitation can be addressed by using symmetric technologies such as G.SHDSL for copper, Ethernet over TDM/SONET for existing facilities, and direct Ethernet over fiber where available. Asymmetric technologies such as VDSL2 and GPON introduce timing asymmetry that cannot be detected by One Way Frame Delay measurements based on ToD delivered by packet timing protocols. Ethernet rings can also introduce asymmetry if the upstream and downstream flows take different directions around the ring.

NTP and PTP Assume Symmetry in the Physical Data Path

The limitation of NTP and PTP in detecting asymmetry is due to the way that they determine the offset from their master. When the slave receives a timestamp from the master, that timestamp is already stale by the time it took the message to travel from the master to the slave. The slave can estimate this one-way propagation delay by measuring the RTD as described above and dividing it by 2.

Please refer to Figure 1 above to help understand this discussion. After the Delay_Resp message is received at the slave, the slave has 4 timestamps: t₁, t₂, t₃, and t₄. The slave can measure the offset of its clock from the master clock as:

Offset = [ ( t₂ - t₁ ) – ( t₄ - t₃ ) ] / 2

Again, the division by 2 reflects the assumption of symmetry. The slave uses the offset to correct its clock so that it agrees with the timestamps sent by the master.
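As a minimal sketch of the two calculations the slave makes from the Figure 1 timestamps, the Python below assumes times in milliseconds and a slave clock that (hypothetically) starts 50 ms ahead of the master, with 4 ms of delay in each direction. The function names are illustrative only.

```python
def measured_offset(t1, t2, t3, t4):
    """Slave offset from master: [(t2 - t1) - (t4 - t3)] / 2 (assumes symmetry)."""
    return ((t2 - t1) - (t4 - t3)) / 2


def one_way_estimate(t1, t2, t3, t4):
    """One-way delay estimate: RTD / 2 (also assumes symmetry)."""
    return ((t4 - t1) - (t3 - t2)) / 2


# Hypothetical timestamps (ms): slave clock 50 ms ahead, 4 ms each way,
# 2 ms between Sync arrival and Delay_Req departure.
t1, t2, t3, t4 = 0.0, 54.0, 56.0, 10.0
print(measured_offset(t1, t2, t3, t4))    # 50.0
print(one_way_estimate(t1, t2, t3, t4))   # 4.0
```

Both results depend on the division by 2, so both are only as good as the symmetry assumption.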

When the link delays are not symmetric, the asymmetry introduces a bias:
“The NTP synchronization is correct when both the incoming and outgoing routes between the client and the server have symmetrical nominal delay. If the routes do not have a common nominal delay, the synchronization has a systematic bias of half the difference between the forward and backward travel times.” [4]
“Like all message-based time transfer protocols, PTP time accuracy is degraded by asymmetry in the paths taken by event messages … Specifically the time offset error is 1/2 of the asymmetry.” [1]
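The half-asymmetry bias follows directly from the offset formula. In the derivation below, d_ms is the master-to-slave (downstream) delay, d_sm is the slave-to-master (upstream) delay, and θ is the true offset of the slave clock from the master (slave minus master); these symbols are chosen here for illustration.

```latex
\begin{aligned}
t_2 &= t_1 + d_{ms} + \theta \\
t_4 &= t_3 - \theta + d_{sm} \\
\text{Offset} &= \frac{(t_2 - t_1) - (t_4 - t_3)}{2}
              = \frac{(d_{ms} + \theta) - (d_{sm} - \theta)}{2}
              = \theta + \frac{d_{ms} - d_{sm}}{2}
\end{aligned}
```

The error term (d_ms - d_sm)/2 vanishes only when the two one-way delays are equal, and the four timestamps give the slave no way to separate it from θ.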

For those of you who want to understand the details of why asymmetry causes an offset, I have included a section below called “Example Calculations” containing details for the symmetric and asymmetric cases, along with the results of a real-world test.

The Utility of 1DM

We now see that One Way Frame Delay cannot detect asymmetry in the physical data path when a packet timing protocol is used and the timing packets share that path. So, what is the value of One Way Frame Delay?

The answer lies in the difference in the treatment of the timing packets versus the 1DM measurement packets.

The timing packets are forwarded at a high priority, and their timestamps are adjusted by boundary clock devices to minimize jitter. They should experience minimal delay variation due to queuing.
In contrast, the One Way Frame Delay packets are given the same CoS marking and treatment as the data stream that they are measuring. They will experience the same delay variation and loss as the user packets they simulate. As such, they can provide an accurate measurement of the delay and delay variation experienced by the user traffic of interest.

Figure 2 below shows how One Way Frame Delay can indicate the type of treatment that user packets are receiving. Note that the slave timing function would typically be implemented in a Network Interface Device (NID), as would the delay measurement function.

The upper red path shows the path taken by timing packets. Queuing delays are minimized for these packets, and known internal delays are corrected using transparent clock functionality.

In contrast, the lower green path shows the path taken by 1DM packets for measuring One Way Frame Delay. They experience the same queuing as other packets in the same CoS, giving an accurate indication of the delay seen by the user of the service.

Key Takeaways for Using One Way Frame Delay and Packet Timing in the Metro Edge

Takeaway #1: Asymmetry in the Physical Data Path Can’t Be Detected Using One Way Frame Delay

As described above, packet-based timing protocols assume symmetry in the physical data path. Because they assume symmetry, they can’t detect asymmetry, and any asymmetry in the physical data path will introduce a bias error in the slave clock. This is inherent in packet timing protocols and can’t be overcome by any particular equipment or timing implementation. This limitation applies to the part of the physical data path shared by clock messages and measurement messages. Figure 3 below shows where One Way Frame Delay provides useful information on physical asymmetry and where it simply reports RTD/2.

As shown, end-to-end One Way Frame Delay measurements made on EVC12 between UNI1 and UNI2 will show any differences in the physical data path between the two masters, because the timing packets do not share the same path as the data. However, One Way Frame Delay measurements made on EVC13 between UNI1 and UNI3 will show RTD/2 when there is no congestion, regardless of any physical asymmetry. This is because the timing packets and the measurement packets travel along the same path.

Figure 4 below is adapted from Figure 21 in MEF 22.1 [6] and shows an example of One Way Frame Delay in a Mobile Backhaul application. As mentioned above, there would likely be a NID located near the UNI-C to perform the delay measurement function.

As shown, the timing packets and the measurement packets share a common physical path through part of the MEN nearest to the UNI at the tower. Physical asymmetry in this part of the network is not detectable using One Way Frame Delay due to the limitations of packet-based timing.

Takeaway #2: Symmetry in the Physical Data Path Must be Designed For and Built In

If you need symmetry in the physical data path, it must be designed in. Above all, this means using media with symmetric behavior, such as the following:

TDM/PDH: T1/E1, DS3/E3, Linear SONET/SDH
EoC: G.SHDSL
Fiber: 100BASE-FX, 1000BASE-SX, 1000BASE-LX, etc.

Media with asymmetric properties will introduce ToD bias that is not detectable using One Way Frame Delay. Example 6 below shows how a typical VDSL2 deployment can introduce a ToD bias of 9 us. This bias is much larger than the vestigial asymmetry typical in symmetric media. While the bias may not be material in all cases, it should be considered during network design. Other examples of asymmetric media include:

ADSL
VDSL/VDSL2
GPON

The asymmetry can be verified using an external test set with an external timing source such as GPS, but not by equipment that relies on timing packets for synchronization.

Takeaway #3: The Clock Master Should be as Close as Possible to the UNI

The issues identified in the paper affect the part of the path shared by clock messages and measurement messages. This is why we don’t use a single timing source in the network and distribute the timing packets from there. As noted in [5],

Since centrally served NTP timing packets experience the same network delays as the payload data, they are fundamentally unsuitable as a one-way delay measurement reference.
…
The solution is to distribute accurate sources of time throughout the network and get it as close to the client as possible.

By pushing the clock source as far out as possible, you can minimize the part of the network to which these limitations apply.

Example Calculations

Here is an example of calculating offset and One Way Frame Delay when the assumption of physical symmetry is valid (4 ms in each direction) as well as when the physical delays are asymmetric (1 ms downstream, 7 ms upstream). The values that differ in the asymmetric case are highlighted. Please refer back to Figure 1 above to see the various times graphically.

Note that in the examples below I am using the same messages for synchronization and One Way Frame Delay. Typically, different message types are used for these two purposes. However, this simplification has no bearing on the results. Using different message types will yield the same measurements.

Example 1: Measure RTD and Estimate ToD Offset

This example shows the initial measurement of the offset of the slave clock from that of the master. It is assumed that the timing packets are given priority through the system, and that a number of calculations are performed to come up with a measurement of the offset.

Table 1 below shows two examples with the same RTD of 8 ms: one with symmetric delay (4 ms in each direction), and one with asymmetric delay (1 ms downstream and 7 ms upstream). They are shown side-by-side to highlight the impact of asymmetry.

In the symmetric case, the clock at the slave is adjusted by the measured offset of 50 ms to achieve synchronization with the master.

In the asymmetric case, the measured offset at the slave is off by 3 ms, which is half the difference in the two one-way frame delays ( = (7 ms - 1 ms) / 2 ).
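The following Python sketch reproduces the Example 1 arithmetic under an assumed model: the slave clock starts 50 ms ahead of the master (the text gives the 50 ms magnitude; the sign is an assumption), the turnaround at the slave is an arbitrary 2 ms, and `exchange()` is a hypothetical helper rather than part of any PTP implementation.

```python
def exchange(down, up, slave_offset, t1=0.0, turnaround=2.0):
    """Hypothetical Sync/Delay_Req timestamps (ms); slave_offset = slave - master."""
    t2 = t1 + down + slave_offset   # slave clock reading when Sync arrives
    t3 = t2 + turnaround            # slave clock reading when Delay_Req leaves
    t4 = t3 - slave_offset + up     # master clock reading when Delay_Req arrives
    return t1, t2, t3, t4


def measured_offset(t1, t2, t3, t4):
    return ((t2 - t1) - (t4 - t3)) / 2


TRUE_OFFSET = 50.0   # assumed: slave clock starts 50 ms ahead of the master
for name, down, up in (("symmetric", 4.0, 4.0), ("asymmetric", 1.0, 7.0)):
    est = measured_offset(*exchange(down, up, TRUE_OFFSET))
    residual = TRUE_OFFSET - est    # bias left after the slave corrects by est
    print(f"{name}: measured offset {est} ms, residual bias {residual} ms")
# symmetric: measured offset 50.0 ms, residual bias 0.0 ms
# asymmetric: measured offset 47.0 ms, residual bias 3.0 ms
```

If the slave instead started 50 ms behind, the asymmetric estimate would be -53 ms; in both cases the residual bias is 3 ms.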

Example 2: Measure RTD and Verify ToD Offset

This example uses the same networks shown in Example 1 above. Note that the calculations in this example are repeated periodically to verify the clock alignment. Again, the timing packets are assumed to be given priority treatment through the system.

The measured offset of 0 means that the clocks at the master and slave still appear to be in sync. However, there is an undetected bias of 3 ms at the slave. It cannot be detected because the 4 timestamps (t₁, t₂, t₃, and t₄) are the same in both cases!
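Continuing with the same assumed timestamp model, the Python sketch below runs the periodic verification exchange after the Example 1 correction, when the residual slave offset is 0 ms in the symmetric case and 3 ms in the asymmetric case. The helper names and the 2 ms turnaround are illustrative assumptions.

```python
def exchange(down, up, slave_offset, t1=0.0, turnaround=2.0):
    """Hypothetical Sync/Delay_Req timestamps (ms); slave_offset = slave - master."""
    t2 = t1 + down + slave_offset
    t3 = t2 + turnaround
    t4 = t3 - slave_offset + up
    return t1, t2, t3, t4


def measured_offset(t1, t2, t3, t4):
    return ((t2 - t1) - (t4 - t3)) / 2


symmetric = exchange(down=4.0, up=4.0, slave_offset=0.0)    # truly in sync
asymmetric = exchange(down=1.0, up=7.0, slave_offset=3.0)   # 3 ms hidden bias
print(symmetric)    # (0.0, 4.0, 6.0, 10.0)
print(asymmetric)   # (0.0, 4.0, 6.0, 10.0)
print(measured_offset(*symmetric), measured_offset(*asymmetric))   # 0.0 0.0
```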

Example 3: Measure One Way Frame Delay – Physical Data Path

Once the master and slave clocks are synchronized with respect to ToD, it is possible to use these clocks to measure One Way Frame Delay in each direction. However, in the asymmetric case the clocks only appear to be synchronized. Table 3 below contrasts measurements using 1DM frames as defined in Y.1731 [3] in two situations with the same RTD (8 ms): a symmetric case (4 ms in each direction) and an asymmetric case (1 ms downstream and 7 ms upstream). As shown in Table 3, the asymmetric case will have an error of 3 ms in the One Way Frame Delay measurements.

Normally, One Way Frame Delay packets would be given the same priority treatment as the user traffic. In this case we assume they are given the same high-priority treatment as the timing packets. This assumption is not realistic, but it is made to illustrate that One Way Frame Delay packets cannot detect physical asymmetry.

As with examples 1 and 2, the 4 timestamps (t₁, t₂, t₃, and t₄) are the same in both the symmetric and asymmetric cases. While the 1DM measurement is accurate in the symmetric case, it is in error by 3 ms in each direction for the asymmetric case.
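A minimal Python sketch of the Example 3 measurements follows, assuming (as the example does) that the 1DM frames receive the same priority treatment as the timing packets, so they see only the physical delay. The residual slave offsets (0 ms and 3 ms) come from Example 1; the helper name and transmit timestamps are hypothetical.

```python
def one_way_delays(down, up, slave_offset, tx_master=0.0, tx_slave=100.0):
    """Measured 1DM delays (ms) given the slave clock's residual offset."""
    # Downstream 1DM: master stamps tx on its clock, slave stamps rx on its clock.
    rx_slave = tx_master + down + slave_offset
    # Upstream 1DM: slave stamps tx on its clock, master stamps rx on its clock.
    rx_master = (tx_slave - slave_offset) + up
    return rx_slave - tx_master, rx_master - tx_slave


cases = {
    "symmetric": (4.0, 4.0, 0.0),    # residual offset 0 ms
    "asymmetric": (1.0, 7.0, 3.0),   # residual offset 3 ms from Example 1
}
for name, (down, up, offset) in cases.items():
    print(name, (down, up), "->", one_way_delays(down, up, offset))
# symmetric (4.0, 4.0) -> (4.0, 4.0)
# asymmetric (1.0, 7.0) -> (4.0, 4.0)
```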

Example 4: Measure One Way Frame Delay – Logical Path

This example illustrates the true utility of One Way Frame Delay – measurement of delay variation due to congestion (1 ms downstream, 2 ms upstream) with the physical data path being symmetric (4 ms each way). In this case we look at the difference between the timing packets and the measurement packets due to their different treatment. See Figure 2 for an example application.

Because the physical path is symmetric, both ends are synchronized with respect to ToD. Unlike the timing packets, however, the One Way Frame Delay packets are given the same priority treatment as other packets in their class, so any incremental queuing delay will be indicated by the One Way Frame Delay packets.
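The Python sketch below extends the same hypothetical 1DM helper with per-direction queuing delay that applies only to the 1DM frames. It assumes, as the example states, that the timing packets bypass the congestion, so the slave is correctly synchronized (offset 0).

```python
def one_way_delays(down, up, slave_offset, queue_down=0.0, queue_up=0.0,
                   tx_master=0.0, tx_slave=100.0):
    """Measured 1DM delays (ms). Queuing applies only to the 1DM frames,
    which share the CoS of the user traffic."""
    rx_slave = tx_master + down + queue_down + slave_offset
    rx_master = (tx_slave - slave_offset) + up + queue_up
    return rx_slave - tx_master, rx_master - tx_slave


# Symmetric 4 ms path (slave synchronized); congestion adds 1 ms down, 2 ms up.
print(one_way_delays(4.0, 4.0, 0.0, queue_down=1.0, queue_up=2.0))   # (5.0, 6.0)
```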

Example 5: Measure One Way Frame Delay – Logical Path – Impact of Asymmetry

This example illustrates the impact of an asymmetric physical data path on One Way Frame Delay. In this case, there is the same congestion as in the previous example (1 ms downstream, 2 ms upstream). This congestion affects only the data and measurement packets, not the timing packets. The physical path is symmetric in one case (4 ms in each direction) and asymmetric in the other (3 ms downstream, 5 ms upstream). The physical asymmetry affects the timing packets, so the slave in the asymmetric case will have a 1 ms offset in its ToD value. See Figure 2 for an example application.

As shown, the physical asymmetry introduces a 1 ms error in each One Way Frame Delay measurement. This is due to the offset of the clock at the slave.
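This Python sketch combines both effects for Example 5 under the same assumed model. The slave's residual bias after synchronizing is taken as (upstream - downstream)/2, which follows from the offset formula, and the congestion (1 ms downstream, 2 ms upstream) is added only to the 1DM frames. The helper names are hypothetical.

```python
def residual_offset(down, up):
    """Slave bias (ms) after syncing with the symmetric offset formula."""
    return (up - down) / 2


def one_way_delays(down, up, slave_offset, queue_down, queue_up,
                   tx_master=0.0, tx_slave=100.0):
    """Measured 1DM delays (ms); queuing affects only the 1DM frames."""
    rx_slave = tx_master + down + queue_down + slave_offset
    rx_master = (tx_slave - slave_offset) + up + queue_up
    return rx_slave - tx_master, rx_master - tx_slave


for name, down, up in (("symmetric", 4.0, 4.0), ("asymmetric", 3.0, 5.0)):
    bias = residual_offset(down, up)
    measured = one_way_delays(down, up, bias, queue_down=1.0, queue_up=2.0)
    actual = (down + 1.0, up + 2.0)
    print(f"{name}: bias {bias} ms, measured {measured}, actual {actual}")
# symmetric: bias 0.0 ms, measured (5.0, 6.0), actual (5.0, 6.0)
# asymmetric: bias 1.0 ms, measured (5.0, 6.0), actual (4.0, 7.0)
```

The measured values are identical in both cases, so the 1DM results alone cannot reveal the asymmetric path.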

Example 6: Impact of Asymmetric Bandwidth

The previous examples have shown the impact of physical asymmetry on One Way Frame Delay results. In this section we look at the impact of asymmetric bandwidth on latency.

The introduced asymmetry is due to the difference in the downstream and upstream transmit times:

Error in ToD = ½ * [ upstream latency – downstream latency ]
= ½ * [ (packet size / upstream rate) – ( packet size / downstream rate ) ]

We can relate the upstream and downstream rates with a ratio:

Error in ToD = ½ * [ (down/up ratio ) (packet size / downstream rate ) – ( packet size / downstream rate ) ]
= ½ * [(down/up ratio – 1) * packet size] / downstream rate

For this example we will look at the impact of a VDSL2 link with 40 Mbps downstream and 20 Mbps upstream on a 90 byte / 720 bit SYNC packet.

Error in ToD = ½ * [(down/up ratio – 1) * packet size ]/ downstream rate
= ½ * [( 2 -1 ) * 720 bits] / 40 Mbps
= 9 us
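The Example 6 arithmetic can be checked with a few lines of Python. The function names are hypothetical; rates are in bits per second and packet size in bits, so results are in seconds.

```python
def tod_error_s(packet_bits, down_bps, up_bps):
    """1/2 * (packet size / upstream rate - packet size / downstream rate)."""
    return 0.5 * (packet_bits / up_bps - packet_bits / down_bps)


def tod_error_ratio_s(packet_bits, down_bps, down_up_ratio):
    """Equivalent form: 1/2 * (down/up ratio - 1) * packet size / downstream rate."""
    return 0.5 * (down_up_ratio - 1) * packet_bits / down_bps


# VDSL2 example: 40 Mbps down, 20 Mbps up, 90-byte (720-bit) SYNC packet.
err = tod_error_s(720, 40e6, 20e6)
print(f"{err * 1e6:.1f} us")   # 9.0 us
```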

Many applications call for a ToD accuracy of 1 us or better, and the ITU is currently discussing a +/- 50 ns requirement. In the example above the VDSL2 media imposes asymmetry of 9 us that does not meet these requirements. Table 6 below shows the impact of various rates and ratios on ToD error. Even at Gigabit rates the errors are still too large (200 ns) to meet the ITU target of 50 ns.

It is possible for a 1588v2 boundary clock node to correct for known delays. If the delay asymmetry were calculated each time the upstream or downstream bandwidth changed, then a correction could be applied. However, this correction is only valid for packets that are the same size as SYNC packets. Other sizes of packets would experience an asymmetric delay that would not be corrected. The bottom line is that physical asymmetry introduces differences in packet delay that can’t be detected and which can affect ToD and/or One Way Frame Delay.
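To illustrate why a fixed correction suits only one packet size, the Python sketch below assumes a hypothetical boundary clock that subtracts the asymmetry computed for the 90-byte SYNC packet on the 40/20 Mbps VDSL2 link from Example 6, then evaluates the residual error for other frame sizes (64 and 1518 bytes are chosen here only as common Ethernet frame sizes).

```python
def asymmetry_error_s(packet_bits, down_bps, up_bps):
    """Half the difference in serialization time (seconds), as in Example 6."""
    return 0.5 * (packet_bits / up_bps - packet_bits / down_bps)


DOWN, UP = 40e6, 20e6     # VDSL2 rates from Example 6
SYNC_BITS = 720           # 90-byte SYNC packet
correction = asymmetry_error_s(SYNC_BITS, DOWN, UP)   # hypothetical fixed correction

for size_bytes in (64, 90, 1518):
    residual = asymmetry_error_s(size_bytes * 8, DOWN, UP) - correction
    print(f"{size_bytes} bytes: residual {residual * 1e6:.1f} us")
# 64 bytes: residual -2.6 us
# 90 bytes: residual 0.0 us
# 1518 bytes: residual 142.8 us
```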

Real World Measurement

We set up two of our 1GigE Ethernet Access Devices (EADs) back to back with a delay generator between them. They were using NTP for timing with one being the master and the other being the slave. Synchronization using 1588v2 will yield the same results.

First we injected 0.5 ms of delay in each direction. The units achieved sync and each measured an RTD of 1 ms and One Way Frame Delay of 0.5 ms in each direction.

We then made the physical delay asymmetric, with 1 ms in one direction and 0 ms in the other, and restarted the EADs. Both units still reported an RTD of 1 ms and One Way Frame Delay of 0.5 ms in each direction.

Note that if you start the units with a symmetric physical delay and then inject asymmetric delay (which might be the case during a topology change such as a ring protection switch), the EADs will report different One Way Frame Delay measurements in each direction for a period of time. This is because the slave was able to achieve accurate synchronization in a symmetric situation. After the asymmetric delay is injected, the slave is still using its old offset, so it can accurately measure One Way Frame Delay. Over a period of several hours the slave will adjust to the new asymmetric delay and introduce a timing bias.
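The two end states of this experiment can be reproduced with the relationships from the examples above, where a slave that synchronizes over an asymmetric path ends up offset by half the asymmetry. In this Python sketch, assigning the 1 ms delay to the downstream direction is an assumption (the test description does not say which direction), the helper names are hypothetical, and the hours-long convergence of the slave is not modeled; only the states before and after it are.

```python
def residual_offset(down, up):
    """Slave bias (ms) after syncing with the symmetric formula: (up - down) / 2."""
    return (up - down) / 2


def one_way_delays(down, up, slave_offset):
    """Measured 1DM (downstream, upstream) in ms given the slave's clock offset."""
    return down + slave_offset, up - slave_offset


# Units sync while the delay generator is symmetric (0.5 ms each way).
offset = residual_offset(0.5, 0.5)              # 0.0: accurate sync
# Path changes to 1 ms one way, 0 ms the other (assumed 1 ms downstream).
print(one_way_delays(1.0, 0.0, offset))         # (1.0, 0.0): still accurate
# After the slave re-converges on the asymmetric path:
offset = residual_offset(1.0, 0.0)              # -0.5 ms bias
print(one_way_delays(1.0, 0.0, offset))         # (0.5, 0.5): asymmetry hidden
```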

References

[1] IEEE 1588v2, “IEEE Standard for a Precision Clock Synchronization Protocol for Networked Measurement and Control Systems,” July 2008.
[2] MEF, “MEF 35: Service OAM Performance Monitoring Implementation Agreement,” April 2012.
[3] ITU-T, “Y.1731: OAM functions and mechanisms for Ethernet based networks,” February 2008.
[4] Wikipedia, “Network Time Protocol.”
[5] Symmetricom, “Accurately Measuring One-Way Delay in Packet-Based Networks.”
[6] MEF, “MEF 22.1: Mobile Backhaul Phase 2 Implementation Agreement,” January 2012.

Comments

#1 Prayson Pate 2013-09-17 11:45

Here is a posting that says that a poll of LTE operators indicates support for Takeaway #3 above, pushing the grandmaster timing source out to the first aggregation point.

http://www.rad.com/19/LTE-Operators-Support-Distributed-Grandmaster-Strategy/29199/?goback=.gde_2996485_member_272573873#
