Network
    Reporter


 



End-to-End VoIP Product Comparison Testing 

Overview
Summary
How We Tested

Table 1: Voice Source, Cisco-7905-to-Cisco-7905
Table 2: Voice Source, Cisco-7905-to-Cisco-7960
Table 3: Voice Source, Cisco-7960-to-Cisco-7935-ConfPhone
Table 4: Voice Source, Cisco-7960-to-Cisco-ATA186
Table 5: 1KHz Test Tone Source, Cisco 7960 to Cisco 7905
Table 6: 1KHz Test Tone Source, Cisco 7960-to-Cisco 7935 -ConfPhone
Table 7 - Cisco ATA186 POTS-Adapter
Table 8 - Pingtel xpressa PX-1
Table 9 - Pingtel xpressa Software Phones
Appendix A - Notes

Overview

This report summarizes the results of comparison testing between six Voice-over-IP products: 

Pingtel Xpressa PX-1 (a "hard" phone)

Pingtel Instant Xpress (a software phone on a Dell PC)

Cisco 7960 (a full-featured VoIP phone)

Cisco 7905 (a streamlined-functionality phone)

Cisco 7935 "ConfPhone" (a conferencing phone)

Cisco ATA186 "POTS Adapter" (a device that allows a plain ordinary telephone to be connected to the Internet for VoIP use). 

All of these phones are real products on the market today. All were tested under a variety of network conditions closely resembling those they might encounter on the open Internet. Although network infrastructure can be upgraded to provide QoS guarantees, this will not help in most cases: in the general case of "anyone-talking-to-anyone" communications, product performance can be affected by network disturbances, including jitter, packet drop, reordering, duplication, and others. The report includes sound clips so you can hear and judge for yourself the voice quality under representative conditions. We used our Maxwell[tm] (see sidebar) to impose these real-world network conditions. The test methodology is also described.

Summary:

Maxwell makes it very easy to perform side-by-side product, product-version, regression, and interoperability comparisons[1] under controlled and realistic network conditions.

Some VoIP phone manufacturers' documentation recommends that the user's network be designed to stay within certain quality-of-service limits, such as jitter not exceeding 30ms, so we used those limits as a starting point for these measurements and picked 25ms and 30ms. The Pingtels performed well at jitter levels far in excess of these, as you can hear from the sound clips.

The sound clips also demonstrate how combinations of network disturbances or impairments affect the phones. An individual impairment may not affect voice quality on its own; in combination with other impairments, however, it can degrade voice quality. For example, at an average jitter of 25ms, we found no audible distortion in the Cisco 7960 phones unless we also added reordering.

The sound clips are in WAV format, which most desktop computers can play. All are digitized at 8 kHz, 16-bit resolution, monophonic. For reference, CD-quality sound is 44.1 kHz, 16-bit resolution, stereo. You can listen to the recordings and judge for yourself. Reference recordings are also included for "best case" network conditions. You will need a PC with a sound card. It is best to listen with good headphones, rather than typical desktop PC speakers. You will judge sound quality effects more accurately if the sound comes from a source near your ears, just as it does with a regular phone. Good headphones also block ambient noise, allowing you to hear just the recording.
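To put the two formats mentioned above side by side, here is a small arithmetic sketch of their uncompressed PCM data rates. The function name is our own, and the figures describe only the recorded WAV files, not the codec or bit rate the phones used on the network, which this report does not state.

```python
def pcm_bytes_per_second(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed PCM data rate in bytes per second."""
    return sample_rate_hz * (bits_per_sample // 8) * channels


clip_rate = pcm_bytes_per_second(8_000, 16, 1)   # format of the sound clips
cd_rate = pcm_bytes_per_second(44_100, 16, 2)    # CD-quality audio

print(clip_rate)            # 16000
print(cd_rate)              # 176400
print(cd_rate / clip_rate)  # 11.025
```

In other words, a minute of clip audio is a little under 1 MB, roughly one eleventh the size of the same minute at CD quality.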

The following tables  show network conditions at which the phones were tested.  Only three of the many kinds of possible network impairments were tested.  These three are:

Jitter: a uniformly distributed random amount of delay is added to each voice data packet. Maxwell keeps track of the arrival time and the exit time of each packet, automatically calculating an average delay, which is displayed and updated in real time by the graphical user interface and is also shown in the tables below.

Drops: voice data packets are randomly selected to be dropped. The distribution function is uniform. The mean is given; e.g., at 1% drop, 1 out of 100 packets is dropped. This number applies to both directions, which means that the effective packet loss in each direction is about half that number (e.g., when the drop rate was set to 3%, each direction showed a drop rate of 1.5%). The table column for drops has been adjusted for this fact.

Reorder:  the order in which packets arrive can be changed.  The higher the number, the more reordering takes place.  In real networks, packet-reordering can take place occasionally when routes are adjusted, and consistently over tandem links (a commonplace solution when a quick bandwidth-upgrade is needed).  You can think of the reorder-number as being the number of extra data links: e.g., reorder 0 -> one data link, reorder 1-> two tandem data links, reorder 2 -> three tandem data links, etc.

We did not duplicate, modify or corrupt packets, though Maxwell can do those things too.
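The three impairments can be illustrated with a toy model. This is only a sketch under stated assumptions and is not Maxwell's implementation: the function `impair`, the 20 ms packet spacing, the 0-to-maximum jitter range, and the "tandem link" skew of 30 ms per extra link are all hypothetical choices made for illustration. Under the uniform assumption, the mean added delay is half the maximum, so a 0 to 50 ms range would yield an average of about 25 ms.

```python
import random


def impair(packets, max_jitter_ms, drop_pct, reorder, link_skew_ms=30.0, seed=1):
    """Apply drop, jitter and reordering to (seq, arrival_ms) packets.

    Returns (exit_order, mean_delay_ms): the sequence numbers in the order
    they leave the impairment stage, and the average delay that was added.
    All modelling choices here are assumptions, not Maxwell's behaviour.
    """
    rng = random.Random(seed)
    leaving = []
    for seq, arrival_ms in packets:
        # Drop: each packet is discarded with probability drop_pct / 100.
        if rng.random() * 100.0 < drop_pct:
            continue
        # Jitter: uniformly distributed extra delay in [0, max_jitter_ms].
        delay = rng.uniform(0.0, max_jitter_ms)
        # Reorder: pick one of (reorder + 1) hypothetical tandem links,
        # where link k is link_skew_ms * k slower than link 0.
        delay += rng.randint(0, reorder) * link_skew_ms
        leaving.append((arrival_ms + delay, seq, delay))
    leaving.sort()  # packets exit in order of their exit time
    exit_order = [seq for _, seq, _ in leaving]
    mean_delay = sum(d for _, _, d in leaving) / len(leaving) if leaving else 0.0
    return exit_order, mean_delay


# 1000 packets spaced 20 ms apart (the spacing is an assumption).
stream = [(i, i * 20.0) for i in range(1000)]
order, mean_delay = impair(stream, max_jitter_ms=50.0, drop_pct=1.0, reorder=1)
out_of_order = sum(1 for a, b in zip(order, order[1:]) if b < a)
print(len(order), round(mean_delay, 1), out_of_order)
```

Note that in this toy model, jitter larger than the packet spacing also produces out-of-order exits on its own; the report does not say whether Maxwell preserves packet order when only jitter is applied.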

The tables below provide a sound clip for each phone under each condition, along with a text notation of the voice quality. Click on a sound clip to hear for yourself exactly what the indicated network conditions do to the tested equipment (i.e., "what that sounds like"). These clips were digitized at 8 kHz, 16-bit monophonic.

How We Tested

Other than the effects introduced by the Maxwell, the network was a quiet internal 10/100 switched LAN, i.e., almost perfect. 

Two kinds of audio source material were used: a snippet from a local radio station's news reporting[2], and a 1000 Hz test tone. Both were recorded onto CD-R media and played using an RCA portable CD player. The headphone jack was connected via an adapter to the RJ11 connector on the phone. The CD player's volume control was adjusted so that, with no impairments from the Maxwell, the signal was loud, clear and undistorted.
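For readers who want a comparable test tone of their own, the sketch below writes a 1000 Hz sine wave to a WAV file using only the Python standard library. It is not how the CD-R source used in these tests was produced; the function name, file name, duration and amplitude are our own choices, and the 8 kHz, 16-bit mono format simply matches the sound clips in this report.

```python
import math
import struct
import wave


def write_test_tone(path, freq_hz=1000.0, seconds=1.0, rate_hz=8000, amplitude=0.5):
    """Write a mono 16-bit PCM sine tone to `path`; return the frame count."""
    n_frames = int(seconds * rate_hz)
    frames = bytearray()
    for i in range(n_frames):
        sample = amplitude * math.sin(2.0 * math.pi * freq_hz * i / rate_hz)
        # Scale to the signed 16-bit range and pack as little-endian.
        frames += struct.pack("<h", int(round(sample * 32767)))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)       # monophonic
        wav.setsampwidth(2)       # 16-bit samples
        wav.setframerate(rate_hz)
        wav.writeframes(bytes(frames))
    return n_frames


if __name__ == "__main__":
    write_test_tone("tone_1khz.wav", seconds=5.0)
```

An amplitude of 0.5 leaves headroom so that the tone does not clip when played back at a high volume setting.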

For all measurements except the ones to the Cisco 7935 ConfPhone (which has no handset), the receiving phone's handset cord was connected through an RJ11 adapter to a mini-stereo plug, and fed directly into a PC sound card. The purpose in doing so was to avoid speaker-to-microphone distortion and background noise pickup. The recording volume control was adjusted for maximum clarity and volume without distortion when the Maxwell was set to no impairments.

Since the Cisco 7935 ConfPhone has no handset and no way to directly record the output signal, for these measurements, an AudioTechnica ATR20 cardioid low-impedance microphone was suspended one inch above the Cisco 7935 ConfPhone speaker.  These measurements were taken in a separate and quiet (though not anechoic) room, away from our lab's equipment, RF emissions, and fan noises.

For each set of tests, a reference recording was made.  Listen [3] to the reference recording to hear what "best case" sounds like. 

For reference recordings, Maxwell was set to 0 ms jitter, 0% drop, no reordering. In other words, Maxwell did not impair any of the traffic. These reference files contain some noise picked up by the sound card and cabling; it was not introduced by the Maxwell or by its effect on VoIP traffic. It is recognizable as 60-Hz "hum" and also hiss. You will hear it in all samples. Network effects on VoIP tend to be heard as gaps and dropouts, or in some cases as if the person were gargling or talking underwater. For the test-tone measurements, instead of a steady tone, it sounds more like you're listening to Morse Code.
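The "Morse Code" effect on a test tone can also be located numerically. The sketch below is our own illustration, not part of the test procedure used for this report: it splits a list of samples into short frames, computes each frame's RMS level, and reports the spans where the level falls well below the loudest frame. The function name, the 10 ms frame size and the 10% threshold are assumptions; with real recordings, the hum and hiss described above mean the threshold would need tuning.

```python
import math


def find_dropouts(samples, rate_hz=8000, frame_ms=10, threshold_ratio=0.1):
    """Return (start_s, end_s) spans whose frame RMS is below the threshold."""
    frame_len = max(1, rate_hz * frame_ms // 1000)
    rms = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms.append(math.sqrt(sum(s * s for s in frame) / frame_len))
    if not rms:
        return []
    threshold = threshold_ratio * max(rms)
    gaps = []
    gap_start = None
    for i, level in enumerate(rms):
        if level < threshold and gap_start is None:
            gap_start = i                      # a quiet span begins
        elif level >= threshold and gap_start is not None:
            gaps.append((gap_start * frame_len / rate_hz, i * frame_len / rate_hz))
            gap_start = None                   # the quiet span has ended
    if gap_start is not None:                  # recording ended during a gap
        gaps.append((gap_start * frame_len / rate_hz, len(rms) * frame_len / rate_hz))
    return gaps


# Synthetic check: one second of 1 kHz tone with 100 ms silenced.
tone = [math.sin(2.0 * math.pi * 1000.0 * i / 8000) for i in range(8000)]
for i in range(2000, 2800):
    tone[i] = 0.0
print(find_dropouts(tone))  # [(0.25, 0.35)]
```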

Tables 1 through 4 below show results when the audio sources are human voices, a woman's and a man's, speaking clearly.

Tables 5 and 6 show results for a 1000-Hz sine wave test tone. 

Table 7 shows the POTS Adapter.

Table 8 shows the Pingtel "hard" phone.

Table 9 shows the Pingtel "software" phone.

Table 1: Voice Source, Cisco-7905-to-Cisco-7905
Jitter (in ms) Drop (in %) Reorder Recording Comments
0 0 0 Reference file
25 0 0 clear
25 1 0 clear
25 1 1 Gargling/"underwater sound".  Annoying.
25 2 0 Slight distortion
25 2 1 Gargling/"underwater sound".  Max Headroom Sound.  Annoying.
25 3 0 Slight echo sound
25 3 1 Gargling, echoing, extreme distortion.  Very annoying, intelligible only for slowly speaking talkers
25 4 0 A little bit echo-sounding
25 4 1 Very distorted.  Unacceptable.
25 5 0 Distorted but intelligible.
25 5 1 Distorted and barely intelligible.  Unacceptable.
25 5 2 Distorted and barely intelligible.  Unacceptable.
30 0 0 Clear
30 1 0 Slightly noticeable distortion but clear enough to understand
30 1 1 Very distorted
30 2 0 Slightly noticeable distortion but clear enough to understand
30 2 1 Gargling/"underwater sound"
30 3 0 Gargling/"underwater sound"
30 3 1 Distorted
30 4 0 Distorted
30 4 1 Very distorted
30 5 0 Distorted
30 5 1 Very distorted
30 5 2 Very distorted



Table 2: Voice Source, Cisco-7905-to-Cisco-7960

Jitter Drop Reorder Recording Comments
0 0 0 Reference file
25 1 1 Slightly distorted
25 2 0 Very slight distortion
25 2 1 Distorted but understandable
37 3 0 Distorted but understandable
37 3 1 Distorted but understandable
37 5 0 Distorted, noise pops
37 7 1 Very distorted



Table 3: Voice Source, Cisco-7960-to-Cisco-7935-ConfPhone

Jitter Drop Reorder Recording Comments Notes
0 0 0 Reference These measurements were taken using a microphone and thus subject to some speaker-to-microphone distortion
30 0 0 clear  
30 1 0 Some distortion  
30 1 1 Some distortion  
30 2 0 Distortion  
30 2 1 Distortion  
30 3 0 Distortion  
30 3 1 Distortion  
30 4 0 Very distorted  
30 4 1 Very distorted  
30 5 0 Very distorted  
30 5 1 Very distorted  

 

Table 4: Voice Source, Cisco-7960-to-Cisco-ATA186
Jitter Drop Reorder Recording Comments Notes
0 0 0 Reference file  
30 0 0    
30 2 1 Slightly-noticeable gargling sound  
30 4 0 Clear  
30 4 1 Very distorted  

 
 

Table 5: 1KHz Test Tone Source, Cisco 7960 to Cisco 7905
Jitter (in ms) Drop (in %) Reorder Recording Comments
0 0 0 Reference
25 0 0 clear
25 1 0 clear
25 1 1 Sounds like Morse Code
25 2 0 Sounds like Morse Code
25 3 0 Sounds like Morse Code
25 3 1 Sounds like Morse Code
25 4 0 Sounds like Morse Code
25 4 1 Sounds like Morse Code
25 5 0 Sounds like Morse Code
25 5 1 Sounds like Morse Code
25 6 0 Sounds like Morse Code
25 6 1 Sounds like Morse Code
25 7 1 Sounds like Morse Code
25 7 2 Sounds like Morse Code


Table 6: 1KHz Test Tone Source, Cisco 7960-to-Cisco 7935 -ConfPhone
Jitter (in ms) Drop (in %) Reorder Recording Comments
0 0 0 Reference
25 0 0 Clear
25 2 0 clear
25 2 1 Morse code, lots of dropouts
25 2 0 Morse code
30 0 0 Clear
30 2 0 Morse code
30 2 1 Morse code
30 5 0 Some dropouts can be heard
30 5 1 Morse code; dropouts

Tests started from no impairments, then increased the drop percentage at one-percentage-point intervals. At each interval, jitter started at 0 ms (the "set-point") and then increased, and likewise for reordering. This order may be significant, depending on how the receiving units compensate for these impairments.
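One way to read the ordering described above is as a set of nested loops, with drop percentage as the outermost variable. The sketch below is only our interpretation: the function name and the example levels are hypothetical, and the nesting of reordering inside jitter is an assumption, since the description does not say how those two were interleaved.

```python
def sweep(drop_levels, jitter_levels, reorder_levels):
    """Yield (jitter_ms, drop_pct, reorder) tuples in the assumed test order."""
    for drop in drop_levels:                # outermost: drop, one point at a time
        for jitter in jitter_levels:        # then jitter, starting from 0 ms
            for reorder in reorder_levels:  # then reordering, starting from 0
                yield jitter, drop, reorder


conditions = list(sweep(range(0, 3), (0, 25, 30), (0, 1)))
print(len(conditions))   # 18
print(conditions[:4])    # [(0, 0, 0), (0, 0, 1), (25, 0, 0), (25, 0, 1)]
```

The tuples use the same column order as the tables (jitter, drop, reorder), so each one corresponds to one table row.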


Table 7 - Cisco ATA186 POTS-Adapter
Jitter (in ms) Drop (in %) Reorder Recording Comments
0 0 0 Reference file, no impairments
30 0 0  
30 2 1  
30 4 0  
30 4 1  

 
 

Table 8 - Pingtel xpressa PX-1
Jitter (in ms) Drop (in %) Reorder Recording Comments[4]
0 0 0 Reference, no impairments
155 5 0  Clear
155 5 1  Clear
155 5 1 Toggles between "no impairments" and 155ms/10%/reorder1. You can hear some slight distortion in the transitions, but the compensation adapts quickly and you don't hear this in the steady-state condition. Ignore the buzz that you hear right after the phrase "six years". This is created by the CD player when it goes back to repeat the track; it is not caused by the phone.
155 10 0  Some slight distortion
155 12.5 0  Slight distortion
155 12.5 1  Some warbling
155 15 0  Some distortion
155 15 1  Warbling
200 5 0  Clear
200 5 0 Toggles between these settings and no impairments. Some slightly noticeable distortion at the transitions.
200 5 1  A little warbling
200 5 1 Toggles between these settings and no impairments. A little warbling and some echo at the transitions.
200 5 2  Clear
200 5 2 Toggles between these settings and no impairments. Some distortion, echo and warbling audible at the transitions. Very garbled in spots.
247 0 0  Clear
247 2.5 0  Clear, though slight warbling is audible
247 2.5 0 Toggles between these settings and no impairments. Slight gargling effect at the transitions.
247 2.5 1  Clear
247 2.5 1 Toggles between these settings and no impairments. A little gargling effect audible at the transitions.
247 2.5 2  Definite warbling, unacceptable quality


 
 

Table 9 - Pingtel xpressa Software Phones
Jitter (in ms) Drop (in %) Reorder Recording Comments[4]
0 0 0 Reference, no impairments
200 0 0  Clear
200 0 0   Toggles between no impairments and 200ms jitter. Some warbling audible at the transitions.
200 5 0  Clear
200 5 0  Toggles between these settings and no impairments. No discernible changes at the transitions.
200 10 0  Clear
200 10 0  Toggles between these settings and no impairments. Some echo audible at the transitions.
200 10 1  Very bad, unacceptable. Warbling.
300 0 0 Slight warbling, but generally clear
300 1 0  A little warbling
300 1 1 A little warbling, some pops
300 1 1  Toggles between these settings and no impairments. Noticeable warbling at the transitions.
 

Appendix A - Notes


The Cisco 7905-to-Cisco 7960 jitter25msdrop3pctreorder1 measurement had to be redone.  The phone had dropped its connection and no audio was present in the recorded file.  Cause unknown.

When the Cisco 7935 ConfPhone is taken off HOLD (un-muting the speaker), even with no impairments set by the Maxwell, the resulting voice quality is sporadic for a period of time (about 30 seconds).

During most of the Cisco VoIP phone tests, the phone's connection to its call set-up director software would be lost, but audio data continued to be sent and the source audio was still audible. Hanging up the phone in some cases did not restore this connection. Reducing the impairment parameter "reorder" back down was not enough to make the phones work: I could dial, but the call would not complete when the called phone was picked up. The called phone kept ringing even after its handset was picked up.

Why this matters

This would be a security vulnerability, at the very least in the sense of a denial-of-service attack: by manufacturing "bad" network conditions, it would be possible to prevent the phones from switching between calls or placing another call. This occurs under conditions where the voice quality is bad but still intelligible. In principle, the network could be installed so that the phones were on a separate physical network from PCs, which as we know tend to be vulnerable to email-borne and other forms of virus. However, since the Cisco 7960s have a PC LAN connector, in practice this might not be so easy to enforce.

References:

RTP: RFC1889


Footnotes

[1] Within the limits of what firmware versions are allowed to be simultaneously extant

 

[2] In the real world, one is much more likely to hear and converse daily with many different people with many different speaking rates and accents.  We kept these measurements relatively simple, but bear in mind when you listen to the recordings that the effects of small distortions are magnified greatly when the speaker is speaking rapidly using a thick accent.

 

[3] With all these recordings, it is best to use good headphones.  You'll hear the recording better, hear what's distorted about it, and because good headphones block ambient noise, you'll be less distracted by other noises.

 

[4] The rows containing a comment indicating "toggling" mean that the recording contains a few seconds (about four) where the network conditions are as indicated for that row, followed by a few seconds (also about four) in which all impairments were set to zero. This repeats several times. By switching between "perfect" network conditions and impaired conditions, you can hear how the tested equipment's compensation algorithms adapt to sudden changes in conditions. Some compensation algorithms adapt to poor conditions gradually, and can continue to adjust as conditions degrade, but sudden changes from "good" to "very bad" can cause momentary service disruptions as they adapt.
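The toggling pattern described in footnote [4] can be written down as a simple schedule. This is a sketch of the idea only, not the procedure or tooling actually used: the function name is hypothetical, and the four-second period is the approximate figure given in the footnote.

```python
def toggle_schedule(impaired, total_s, period_s=4.0):
    """Yield (start_s, settings), alternating impaired and clean periods.

    `impaired` is a (jitter_ms, drop_pct, reorder) tuple; the clean
    setting is (0, 0, 0). The impaired period comes first.
    """
    clean = (0, 0, 0)
    start_s = 0.0
    use_impaired = True
    while start_s < total_s:
        yield (start_s, impaired if use_impaired else clean)
        use_impaired = not use_impaired
        start_s += period_s


for start_s, settings in toggle_schedule((200, 5, 1), total_s=16.0):
    print(start_s, settings)
# 0.0 (200, 5, 1)
# 4.0 (0, 0, 0)
# 8.0 (200, 5, 1)
# 12.0 (0, 0, 0)
```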