
QoS: Essentials, Part III

In the previous installment of this series (QoS: Essentials, Part II), we discussed types of marking, NBAR, and congestion management/queuing techniques. In Part III, I intend to discuss traffic shaping and policing, congestion avoidance, and link efficiency mechanisms. Because of the sheer amount of information in QoS, I cannot cover the whole QoS spectrum, but I hope to instill the foundational information that QoS is built upon. As usual, without further ado, let’s get to it!


Traffic Shaping

Let me paint a picture for you. Here in Florida, we have a lot of toll plazas. You know, you drive up, pay some absurd amount, then continue on your journey. Now let’s picture that there is a toll plaza, and 2 miles down the road there has been an accident. Thankfully, the accident did not block all 4 lanes of traffic, but instead only 3. Traffic is moving along but very slowly. As a result, there is heavy congestion at the scene of the accident. The city, having some foresight, doesn’t want the congestion at the scene to get any worse…and decides to only send one car through the toll plaza every 30 seconds. Since they are slowing the rate of traffic at the booth, it allows things to clear up a bit at the scene, and traffic to flow smoothly again, although at a slower rate.

So how does that apply to shaping in a network? Well, in a nutshell, shaping will enqueue excess packets (above the configured rate), and ‘release’ them onto the wire at the configured shaping rate. As a result, you can slow your transmit rate without just cutting off any traffic above a certain rate. Let’s say you administer a spoke that heads into a Frame Relay cloud. You negotiate a CIR (committed information rate, the rate you wish to send under stable conditions) of 64k. Let’s assume your access rate, or AR, is 192k. You want to configure shaping at 64k so that you don’t send more data than the distant end can handle, which may result in delay, jitter, or even loss of packets due to policing. Speaking of policing…
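To make the idea concrete, here is a minimal Python sketch, assuming a very simplified shaper that lets at most Bc bits out per time interval and holds the rest in a queue. The function name shape and its parameters are made up for illustration, and this is not how a router implements shaping internally.

```python
def shape(arrivals_bits, bc_bits):
    """Simplified shaper (illustrative only).

    arrivals_bits[i] is the traffic, in bits, offered during interval i.
    At most bc_bits leave per interval; the excess waits in a queue
    instead of being dropped.
    """
    backlog = 0   # bits currently held in the shaping queue
    sent = []     # bits actually released in each interval
    for offered in arrivals_bits:
        backlog += offered
        out = min(backlog, bc_bits)
        backlog -= out
        sent.append(out)
    # Keep draining the queue after new arrivals stop.
    while backlog > 0:
        out = min(backlog, bc_bits)
        backlog -= out
        sent.append(out)
    return sent

# Two 125 ms intervals at the 192k access rate (24000 bits each), then silence.
# Shaping to 64k means 8000 bits per 125 ms interval.
sent = shape([24000, 24000, 0, 0], bc_bits=8000)
```

In this run, all 48000 bits eventually get through, but spread over six intervals at 8000 bits each. Nothing is dropped; it is only delayed.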

Policing (“Oh S!)%!! Not another ticket..”

How many of you reading this have seen speed traps set up by the local law enforcement officers? I’m sure anyone who has driven a car once or twice in their lifetime has. Let’s imagine we have a rookie right out of the academy, a little bit hot headed and power hungry. For the sake of this article (and to plug a hilarious movie), we’ll call the rookie “Farva”. Farva decides he wants to hit the roads and set up a speed trap. Since the speed limit is 55 Mph, he decides to not only ticket, but arrest anybody going over 55 Mph. This means 56 Mph gets you a cell next to a fellow named “Tiny” with a propensity for middle-aged caucasian males.

Once again, how does this apply to our network? Well, remember how we shaped the traffic leaving the spoke towards the Frame Relay network to 64k? Let’s imagine that we hadn’t shaped it at all. The service provider said “you get 64k to us, that’s it”. Well, in the event we start sending at our actual AR (access rate, remember? the line rate..), 192k in this case, the service provider is going to implement policing and drop any traffic above the configured CIR. Obviously they could drop traffic at whatever rate they chose to, but in our example, it’s anything above the agreed upon CIR. In summation, policing enforces the CIR, making sure that nothing extra gets sent through. It is worth a brief mention that policing can instead be configured to mark down a packet’s IP Precedence or DSCP value, so that the packet gets through now but stands a better chance of being dropped later than other packets. In times of network congestion, this ensures the marked-down packets are dropped first, but still allows them through if traffic is not heavy.
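For contrast with shaping, here is a rough Python sketch of a policer, assuming a simplified per-interval model where anything above Bc in an interval is either dropped or marked down. The police function and its exceed_action values are hypothetical names for illustration; real policers use token buckets rather than fixed intervals.

```python
def police(arrivals_bits, bc_bits, exceed_action="drop"):
    """Simplified policer (illustrative only).

    Per interval, up to bc_bits conform. Excess traffic is either
    dropped, or transmitted anyway but counted as marked down.
    Unlike shaping, nothing is queued for later.
    """
    results = []
    for offered in arrivals_bits:
        conform = min(offered, bc_bits)
        excess = offered - conform
        if exceed_action == "drop":
            results.append({"sent": conform, "dropped": excess, "marked_down": 0})
        else:
            # "mark-down": excess still goes through with a lower marking
            results.append({"sent": offered, "dropped": 0, "marked_down": excess})
    return results

# Sending at the 192k access rate (24000 bits per 125 ms) against a 64k CIR.
dropped_case = police([24000, 4000], bc_bits=8000)
marked_case = police([24000], bc_bits=8000, exceed_action="mark-down")
```

In the drop case, 16000 of the 24000 bits in the first interval are simply gone, which is exactly what shaping at the spoke is meant to avoid.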

Bc…just a little info..

This is generally the place where I would discuss different shaping terminology, but I already described some of it here: FRTS, so check that out if you feel the need. I will, however, give you this:

There are a couple of ways to calculate Bc. For the sake of the following conversation, let’s say our CIR is 64 Kbps, and we are using the default Tc value of 125 ms (8 intervals per second). What’s our Bc?

Bc = Tc (0.125 s) x CIR (64000 bps) = 8000 bits per interval

OR

Bc = CIR (64000 bps) / 8 (number of intervals per second) = 8000 bits per interval

As you can see, both are correct. Whichever one you use is really up to you. Either way, our Bc will be the amount of data sent per time interval in order to conform to our shaping rate.
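If you want to check the arithmetic yourself, here is a small Python sketch of both calculations. The function names are just illustrative helpers, not anything from IOS.

```python
def bc_from_tc(cir_bps, tc_seconds):
    """Bc = CIR x Tc, with CIR in bits per second and Tc in seconds."""
    return cir_bps * tc_seconds

def bc_from_intervals(cir_bps, intervals_per_second):
    """Bc = CIR / number of Tc intervals in one second."""
    return cir_bps / intervals_per_second

cir = 64000                         # 64 Kbps
bc_a = bc_from_tc(cir, 0.125)       # Tc of 125 ms
bc_b = bc_from_intervals(cir, 8)    # 8 intervals per second
print(bc_a, bc_b)                   # both come out to 8000 bits per interval
```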


Congestion Avoidance

When the queues on an interface fill, by default, the next packets that try to join the queue will be tail dropped. To address tail drop, you can either configure the queues to be larger or use congestion avoidance. Here’s the idea: when tail drop is employed, all packets are treated as equals..not good! This means that delay-sensitive traffic such as VoIP/video is treated no differently than, say, Limewire or HTTP traffic. It also means that several TCP segments can be dropped at once, causing those hosts to reduce their send windows and then raise their transmit rates at the same time, resulting in bandwidth utilization that looks like a sharp wave alternating between very high and very low utilization. Congestion avoidance techniques such as WRED will help us avoid these issues. We’ll discuss those later, but first, let’s go over exactly why the default behavior of tail drop is bad.


Tail drop, and why it’s bad

Tail drop has several downfalls. The first one that comes to mind is that tail drop treats all packets as equals (as mentioned above). The second is that with tail drop you are open to TCP synchronization and not efficiently using your links. TCP synchronization is caused by the natural behavior of TCP senders. A TCP sender will begin opening its window, gradually increasing its transmit rate, until it loses segments, then reduce the window by 50%. The sender will then slowly build its transmit rate (open its window) again and, upon hitting loss once more, repeat the process. Now once you throw in multiple TCP sessions that all lose segments in the same tail drop event, you encounter TCP synchronization. The downfall to this is that when all of the sessions cut their window by 50% at once, you have a period of relative quiet in the network where very little traffic is being sent, followed by bursts of TCP traffic. The final issue with tail drop is that the more aggressive traffic (say, HTTP or Limewire traffic, as mentioned above) will fill the queues quickly, leaving the less aggressive flows to be tail dropped.
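Here is a toy Python simulation of that sawtooth, assuming a heavily simplified model: every flow grows its window by one each round, and when the combined windows exceed the link capacity, every flow loses segments at once and halves its window. The function name and the numbers are made up for illustration, and real TCP behavior (slow start, timers, and so on) is more involved.

```python
def simulate_sync(flows=3, capacity=30, start=4, rounds=12):
    """Toy model of TCP synchronization under tail drop (illustrative only).

    Returns the total offered load (sum of all windows) per round.
    """
    windows = [start] * flows
    totals = []
    for _ in range(rounds):
        total = sum(windows)
        totals.append(total)
        if total > capacity:
            # Tail drop hits every flow together, so all halve at once.
            windows = [max(1, w // 2) for w in windows]
        else:
            # No loss this round: every flow opens its window a little more.
            windows = [w + 1 for w in windows]
    return totals

totals = simulate_sync()
print(totals)  # [12, 15, 18, 21, 24, 27, 30, 33, 15, 18, 21, 24]
```

The load climbs to 33, overshoots the capacity of 30, then collapses to 15 because all three flows back off in the same round. That collapse is the period of relative quiet described above.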


Weighted Random Early Detection (RED/WRED)

WRED is based on RED. The basic idea behind RED is this: as the router’s queues fill, RED randomly selects packets to be dropped before the queue is actually full, so that TCP senders back off at different times rather than all at once. This prevents synchronization as described above, and helps avoid congestion and eventually tail drop. What WRED does differently than RED, however, is allow you to be more precise when dropping traffic. WRED joins with the powers of IP Precedence/DSCP to allow you to drop the lower precedence packets first, while allowing the higher precedence traffic to pass. In essence, WRED ‘predicts’ congestion, and gives you some say in what can be dropped if the network is experiencing congestion. Statistically, WRED will drop traffic more often from a high volume sender than from a low volume one. What this means is, the more ‘offending’ TCP hosts will have their traffic lost as they cross the threshold (more on that in a minute), as opposed to the low volume sender. Now, it is important to remember that WRED relies on TCP’s reaction to loss. UDP senders do not slow down when packets are dropped, so if the bulk of the traffic is UDP, WRED will not be effective.

Now, it’s important to note that the ‘core’ of WRED is no different than RED. The only difference is that WRED is more selective about what is dropped, not how. Here’s a rough framework of how these techniques operate:

  • Packet is received, and the average queue depth is checked. If it’s below the minimum threshold, the packet is queued and sent out the proper interface. If it’s between the minimum and maximum thresholds, the packet is either queued or dropped, with a drop probability that depends on the MPD (mark probability denominator). If it’s above the maximum threshold, the packet is dropped. The MPD, simply put, sets the maximum percentage of packets that WRED will discard before the maximum threshold is reached. If MPD is 16, using the formula of 1/MPD (1/16 in our example), the max discard rate would be 6.25%. If we made the MPD 10, it would be 10% (1 divided by 10).
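The framework above can be sketched in a few lines of Python. This is a minimal illustration, assuming the drop probability rises in a straight line from 0 at the minimum threshold to 1/MPD at the maximum threshold; the function name and the threshold values in the example are my own, not IOS defaults.

```python
def drop_probability(avg_depth, min_th, max_th, mpd):
    """Drop probability for a given average queue depth (illustrative only)."""
    if avg_depth < min_th:
        return 0.0                      # below min threshold: always queue
    if avg_depth > max_th:
        return 1.0                      # above max threshold: drop everything
    # Between the thresholds: linear ramp from 0 up to 1/MPD.
    fraction = (avg_depth - min_th) / (max_th - min_th)
    return fraction * (1.0 / mpd)

# Hypothetical thresholds of 20 and 40 packets with an MPD of 10.
for depth in (10, 30, 40, 41):
    print(depth, drop_probability(depth, min_th=20, max_th=40, mpd=10))
```

Halfway between the thresholds (a depth of 30 here), the drop probability is half of 1/MPD, or about 5%. Right at the max threshold it reaches the full 10%, and anything beyond that is dropped outright.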

Here is a collection of random notes I have thrown together as it relates to RED/WRED:

  • RED/WRED use an exponential weighting constant to determine the average queue depth. The lower this is, the more quickly the average queue depth will change; raising it makes the average react more slowly. By default it is 9, and can be changed with the random-detect exponential-weighting-constant X command
  • RED differs from tail drop in the sense that tail drop occurs only when the queue is full. RED may begin dropping all incoming packets even if the queue is not full: once the average queue depth passes the max threshold, everything is discarded, so if you set the max threshold low, this happens well before the queue fills.
  • Between the min and max thresholds, RED’s drop probability rises at a linear rate up to a maximum based on the MPD (1/MPD is the formula, so the default MPD of 10 means 1/10, or a 10% max discard rate)
  • WRED cannot operate with other queuing techniques at the physical interface. If you configure CBWFQ/LLQ, you must configure WRED within each individual class. When WRED is enabled on the physical interface, only FIFO queuing is used. This can be seen with a show queueing interface s1/0
  • WRED weights packets on the following: Average queue depth (found using the exponential weighting constant, default of 9), Min & Max threshold (dependent upon the DSCP/Precedence value), MPD (see two bullets above)
  • Enabling WRED (random-detect) disables WFQ
  • WRED defaults to being IP Precedence based, but you can specify it to work on DSCP instead with random-detect dscp-based
  • If using DSCP based WRED, you use the following command to alter the thresholds per DSCP value: random-detect dscp af21 40 50. This command would set the DSCP value AF21’s minimum threshold to 40, and its maximum to 50. You can see the effect of this command by doing a show queueing interface s1/0 again.
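To see why a lower exponential weighting constant reacts faster, here is a small Python sketch. It assumes the commonly documented weighted-average formula, new average = old average x (1 - 1/2^n) + current depth x (1/2^n), where n is the exponential weighting constant. Treat the function name and the sample numbers as illustrative.

```python
def update_average(old_avg, current_depth, n=9):
    """One update of the average queue depth (illustrative only).

    n is the exponential weighting constant; 9 is the default
    mentioned in the notes above.
    """
    weight = 2 ** -n
    return old_avg * (1 - weight) + current_depth * weight

# The queue suddenly jumps from empty to 40 packets deep.
slow = update_average(0, 40, n=9)   # default constant: average barely moves
fast = update_average(0, 40, n=1)   # low constant: average jumps halfway
print(slow, fast)                   # 0.078125 20.0
```

With the default of 9, a single sample moves the average by only 1/512 of the difference, so short bursts are smoothed out. With a constant of 1, the average chases the instantaneous depth closely.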

Below you’ll find a graph that I created to demonstrate RED’s behavior. You can see that when the average queue depth crosses the minimum threshold, RED begins dropping traffic at a linearly increasing rate, up to the maximum discard rate (1/MPD), which is by default 10%. After that, it will cross the max threshold, and drop all traffic.

[Figure: graph of RED drop behavior (wred)]



Link Efficiency Mechanisms

  • Multilink PPP (MLP)
  • Frame Relay Fragmentation (End to End FRF.12)
  • Header Compression (RTP Header compression, TCP header compression)


Link Efficiency

Link efficiency may not strike you as an important feature of QoS, however it is when you are the one paying for the bandwidth! In these days of the rough economy and financial uncertainty, getting the most out of our money is key..especially when it comes to business. Bandwidth costs money, after all. Link efficiency can be broken down into a couple of categories: compression, and link fragmentation/interleaving tools. Compression is the act of compressing the packet (reducing the number of bytes in the packet), so there are fewer bytes to transmit across the link. Fragmentation is essentially chopping up the larger packets into smaller ones. To understand interleaving, let’s say we have a large packet waiting to transmit, with a small packet that is delay-sensitive (such as voice) waiting behind it. If the small packet waits for the large packet to serialize (be put onto the wire), it may wait too long and exceed the acceptable delay/jitter. By fragmenting and interleaving, we chop the large packet into smaller pieces and send the voice packet in between those pieces. Let’s first discuss compression..
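To put numbers on the serialization problem, here is a quick Python sketch. The 1500 byte packet, the 64k link, and the 10 ms delay target are assumptions I picked for the example (10 ms is a common rule of thumb for voice, not a value from this article), and the function names are illustrative.

```python
def serialization_delay_ms(packet_bytes, link_bps):
    """Time to clock a packet onto the wire, in milliseconds."""
    return packet_bytes * 8 / link_bps * 1000

def fragment_size_bytes(link_bps, target_delay_ms):
    """Largest fragment that serializes within the target delay."""
    return int(link_bps * target_delay_ms / 1000 / 8)

delay = serialization_delay_ms(1500, 64000)   # big data packet on a 64k link
frag = fragment_size_bytes(64000, 10)         # fragment size for a 10 ms target
print(delay, frag)                            # 187.5 80
```

A voice packet stuck behind that 1500 byte packet waits 187.5 ms just for serialization. Chopping data into 80 byte fragments caps that wait at roughly 10 ms per fragment.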

Compression is not difficult to understand..well, the concept at least- agreed? There are a couple types of compression I’d like to discuss, Payload compression, and Header compression. Here’s a quick rundown:

Payload compression: Compresses the headers and user data. Uses more CPU cycles. Here I am mostly referring to Layer 2 compression such as ‘Stacker’ or ‘Predictor’. Stacker is more CPU intensive but uses less memory. Use “compress stac” at the interface to use Stacker. Predictor is more memory intensive. Use “compress predictor” to use Predictor. It is worth mentioning that Predictor only supports PPP and LAPB, whereas Stacker supports most point-to-point layer 2 protocols.

Header Compression: If you were to examine packets in a flow, you’d see that the headers are very similar from one packet to the next..header compression is based on this..and as a result uses very little CPU. The two common types of header compression are TCP and RTP header compression. TCP header compression is best used with relatively small TCP packets, since it reduces the header from 40 bytes to anywhere from 3 to 5 bytes. The best way to see why TCP header compression is not so great with larger packets is to consider that we saved about 35 bytes in our header compression of, say, a 56 byte packet (40 bytes being in the header)..but what if our packet was 1300 bytes? We’d save about the same number of bytes, and it would be a relatively small savings percentage-wise..almost not worth it. RTP header compression is best with voice traffic, and will generally compress the headers a little bit more than TCP (2-4 bytes down from 40). An interesting note: if fast switching or CEF is not enabled on an interface, and then you enable RTP header compression, the interface will process-switch the traffic. NOT good!
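The small-packet versus large-packet point is easy to check with a few lines of Python. This sketch assumes the 40 byte header shrinks to 5 bytes, the upper end of the range quoted above; the function name is just for illustration.

```python
def savings_percent(packet_bytes, header_bytes=40, compressed_header_bytes=5):
    """Percentage of the packet saved by header compression."""
    saved = header_bytes - compressed_header_bytes
    return saved / packet_bytes * 100

small = savings_percent(56)     # 56 byte packet, 40 of which are header
large = savings_percent(1300)   # 1300 byte packet, same 40 byte header
print(small, round(large, 1))   # 62.5 2.7
```

The same 35 bytes saved is well over half of the small packet but under 3% of the large one.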

Multilink PPP

Out of the few link efficiency mechanisms listed above, this is the one many people have heard of, usually before the others. Let’s say you have two slow serial links, both running PPP to the same location..Multilink PPP allows you to bundle them and treat them as one link. This provides layer 2 redundancy, as one of those links can drop and traffic will still flow, although at the much slower speed of the single link. There are several other benefits that multilink provides, which we’ll go over..

Multilink interleaving- Interleaving simply takes two separate streams of data, and ‘interleaves’ them, sending (in our case) delay-sensitive traffic in between the large datagrams. As you may have guessed, this is especially helpful with delay sensitive applications.

Multilink fragmentation- Multilink Fragmentation is ‘chopping up’ large datagrams into several smaller ones, but using multilink headers on each of the smaller datagrams.
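Here is a toy Python sketch of fragmentation and interleaving working together. It only tracks packet sizes, ignores the multilink header overhead, and uses a simple "one data fragment, then any waiting voice packet" rule that I made up for illustration; it is not the actual MLP scheduling logic.

```python
def fragment(packet_bytes, frag_bytes):
    """Split one large datagram into fragment sizes (headers ignored)."""
    sizes = []
    remaining = packet_bytes
    while remaining > 0:
        size = min(remaining, frag_bytes)
        sizes.append(size)
        remaining -= size
    return sizes

def interleave(data_fragments, voice_packets):
    """Send one data fragment, then one waiting voice packet if there is one."""
    out = []
    voice = list(voice_packets)
    for frag in data_fragments:
        out.append(("data", frag))
        if voice:
            out.append(("voice", voice.pop(0)))
    # Any voice packets still waiting go out after the last fragment.
    out.extend(("voice", v) for v in voice)
    return out

frags = fragment(1500, 640)          # [640, 640, 220]
wire = interleave(frags, [60])       # 60 byte voice packet arrives mid-transfer
print(wire)
```

The voice packet goes out right after the first 640 byte fragment instead of waiting for all 1500 bytes of the large datagram.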

As a side note, being a Frame Relay nut, I love things like MLP over Frame Relay (MLPoFR). If you’re interested in that stuff too, check out what cisco.com has to say. I could probably dedicate a whole article to MLP, so look for that in the future..

Designing and Deploying Multilink PPP over Frame Relay and ATM

That is all for now, folks. I was going to go into FRF.12 and MLP LFI in detail, but I ran out of steam, to be quite honest. More to come at another time, most likely! Enjoy…