QoS: Essentials, Part II

In QoS: Essentials, Part I, we discussed what QoS is, classifying/marking traffic, and trust boundaries. In Part II, we will get into the actual types of marking, do an overview of NBAR, and finally get into Congestion management/Queuing. Ready?

Types of Marking:

There are several different ways to mark, and each one is suited to a particular situation. For example, if you’re running QoS over Frame Relay, you’d use the Frame Relay DE (Discard Eligibility) bit, whereas if you’re using ATM, you’d opt for the CLP (Cell Loss Priority) bit, etc. For now, we are simply going to discuss the different types. It is important to note ahead of time that some of these markings are used at Layer 2, and some at Layer 3. Here are the breakdowns:


[Figure: markings]


  • CoS (Class of Service): CoS is very common in a LAN environment, as it is marked at Layer 2. In the graphic below, we have an 802.1Q frame, with the PRI field (used for CoS) inside a 4-byte tag. The tag starts with the Tag Protocol Identifier (TPID), which will always be 0x8100 in order to identify the frame as an IEEE 802.1Q frame. Next we have the PRI field, which is 3 bits long (8 total values, 0-7), then the CFI bit, and the 12-bit VLAN ID. The key part here is to remember the PRI field is 3 bits, so our CoS values can be anywhere from zero up to seven. To give you some perspective on this, most Cisco IP phones will tag their traffic with a CoS of 5 by default, putting it into the critical category. This makes sense, since VoIP traffic is very sensitive to delay/jitter.

[Figure: 802.1Q frame showing the PRI (CoS) field]

  • DE bit: The Discard Eligibility bit is used in Frame Relay environments. Here’s the concept. The USPS mail guy does his usual mail run, and heads back to the office to pick up more mail. Upon arriving, he realizes he has entirely too much to take, so he takes only the unmarked pieces of mail, and leaves the ones with a red marking behind (or drops them). Essentially, all that happens with the DE bit is that you are telling nodes along the path that this frame *can* be dropped before others in times of network congestion, or when the router cannot handle all of the traffic. A node does not have to act on this bit at all. If it does choose to, the frames with the DE bit set will be dropped before those without it.

  • CLP (Cell Loss Priority): The CLP bit works the same as the DE bit in concept, except that it is used in ATM cells.

  • IP Precedence: Now we’re talking about Layer 3. The ToS byte, which dates back to 1981, was used to set a certain level of service for a packet. Inside the byte was IP Precedence (3 bits, the same as CoS), a ToS field (yes, a ToS field within a ToS byte, which was 4 bits), and one remaining bit that was unused. IP Precedence is fine, but DiffServ is quickly becoming standard, with engineers opting for DSCP marking, as it can be more granular. Instead of IP Precedence levels 0-7, you have 0-63 with DSCP. That being said, DSCP is backward compatible with IP Precedence: there are 8 DSCP values that map to IP Precedence values. If a network running IP Precedence receives a packet marked with DSCP, it will simply read the first 3 bits of the DSCP, which it thinks is just a regular IP Precedence mark. That’s another time and place, however!

  • DSCP (Differentiated Services Code Point): The ToS byte has been redefined as the DS field, with the 6 most significant bits making up the DSCP value, and the last two bits being the ECN, or Explicit Congestion Notification, bits. As I said, DSCP is backward compatible with IP Precedence, so if a precedence-only system receives an IP packet with a DSCP value, remember that it will only read the most significant 3 bits, and treat them as IP Precedence. With DSCP, you set the DSCP value, which in turn causes a DiffServ node to act in a certain way towards that packet. This is called Per-Hop Behavior. In a nutshell, the node reads the DSCP, realizes the packet is part of a group (or behavior aggregate, BA for short), and treats it the same way as the rest of the packets belonging to that BA.

  • MPLS EXP: OK, this one is kind of odd. Without diving into MPLS too deep, here is a breakdown. An MPLS packet can be thought of as a regular IP packet with a 4-byte (or more, if labels are stacked) MPLS header placed in front of it. The labeled packet is then encapsulated in a Layer 2 protocol, such as Ethernet, and sent. Because the MPLS header sits between the Layer 2 header and the Layer 3 packet, it can almost be considered Layer 2 1/2. The MPLS header consists of only 4 fields: the label (which is basically like a color that is marked on the packet), the EXP bits (3 bits to be exact), the S bit (bottom-of-stack), and the TTL. The EXP bits take the same range of values (0-7) as CoS or IP Precedence.
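To make the bit layouts above a little more concrete, here is a minimal Python sketch that pulls the 3-bit and 6-bit markings out of raw header values. The function names and the sample values are my own for illustration (they are not from any Cisco tool), and the sketch assumes the standard field widths: a 16-bit 802.1Q field after the TPID, an 8-bit ToS/DS byte, and a 32-bit MPLS header with a 20-bit label and 8-bit TTL.

```python
def cos_from_tci(tci):
    """Pull the 3-bit PRI (CoS) value out of the 16-bit field that follows the TPID."""
    return (tci >> 13) & 0b111


def split_tos_byte(tos):
    """Split the 8-bit ToS/DS byte into its DSCP, ECN, and legacy IP Precedence views."""
    return {
        "dscp": (tos >> 2) & 0b111111,     # 6 most significant bits
        "ecn": tos & 0b11,                 # last 2 bits
        "precedence": (tos >> 5) & 0b111,  # what a precedence-only device reads
    }


def exp_from_mpls_header(header):
    """Pull the 3 EXP bits out of a 32-bit MPLS header (label, EXP, S, TTL)."""
    return (header >> 9) & 0b111


# Hypothetical sample values: CoS 5 on VLAN 100, DSCP 46, and label 16 with EXP 5.
tci = (5 << 13) | 100
tos = 46 << 2
mpls = (16 << 12) | (5 << 9) | (1 << 8) | 64

print(cos_from_tci(tci))
print(split_tos_byte(tos))
print(exp_from_mpls_header(mpls))
```

Notice that a DSCP of 46 reads as IP Precedence 5 when only the top 3 bits are examined, which is the backward compatibility described above.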

NBAR: Digging deep…

Prepare to be amazed! NBAR, also known as Network Based Application Recognition, is incredible. NBAR is a feature found in Cisco IOS that can discover the protocols in use on your network, show you traffic statistics, and classify your traffic…for you! Let’s say you decide you want to implement QoS on your network. The first step is to identify traffic and requirements, right? Well, with NBAR, you can simply issue the following on the interface you wish to monitor:

SGTccie(config-if)#ip nbar protocol-discovery

In order to actually see the traffic statistics, we’d then issue the following command from enable mode:


SGTccie#show ip nbar protocol-discovery



It is worth mentioning that CEF is required to run NBAR. Also, the “show ip nbar protocol-discovery” command will show you all interfaces unless you add “interface X” after it. NBAR can also save you a lot of time, as you will see once we get to QoS configuration. The old way of doing things was to configure extended ACLs listing port numbers, IP addresses, and so on. Instead of “access-list 101 permit tcp any host 192.168.1.1 eq www”, we now use “match protocol http”. Nice!


Congestion Management/Queuing…waiting in line…

Ahh, congestion management. Running fiber everywhere along with 1 Gbps Ethernet everywhere is great, but congestion still happens. Why? Many reasons, really: poor QoS implementation (or none!), poorly designed networks, outdated equipment, etc. The list goes on and on. Generally, however, the point of congestion is almost always where traffic from multiple sources aggregates onto a single link. Picture 10 access-layer switches connecting to one distribution-layer switch, which only has a 100 Mbps link to the core. You could easily have 400 users’ traffic flowing to the core on that one link. Another scenario would be where you have a slow WAN link (pretty common!). Another way you could think of it is: congestion occurs when the rate of incoming traffic exceeds the rate of output. In English? When going from high-speed interfaces down to low-speed interfaces, you are prone to congestion. It’s no different than a theatre filled with people trying to get out of two doors at once. They can only move so fast!

Queuing is a temporary form of congestion management. It will ease some issues with congestion, but the long-term fix is fairly obvious: getting more capacity. This is not always feasible, unfortunately. So what can we do? We can alter the order in which traffic leaves the node, so that when something has to wait or be dropped, it is the low-priority traffic and not the high-priority (VoIP, critical applications, etc.) traffic. By default, however, you will experience FIFO (First In, First Out) on interfaces that are faster than 2.048 Mbps. Weighted Fair Queuing is used by default on interfaces slower than 2.048 Mbps, but we’ll get into that in a bit. Depicted below is the way FIFO software queuing works. It is key to mention that there is only one hardware queue, and it uses FIFO. When we discuss creating new queues, and assigning traffic to certain queues, we are discussing the software queue only. As you can see below, FIFO treats all traffic equally, meaning the sensitive VoIP traffic will have to wait in line behind the web traffic. Not ideal!


[Figure: FIFO queue]
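To put a rough number on "waiting in line", here is a small Python sketch of how long a packet sits in a FIFO queue while everything ahead of it is transmitted. The function name, packet sizes, and link speed are hypothetical values picked for illustration, and the calculation only counts serialization time on the link.

```python
def fifo_wait_ms(bytes_ahead, link_bps):
    """Time a packet waits while everything queued ahead of it is transmitted."""
    bits_ahead = sum(bytes_ahead) * 8
    return bits_ahead / link_bps * 1000


# Hypothetical case: one VoIP packet stuck behind three 1500-byte web packets
# on a 512 kbps WAN link.
print(fifo_wait_ms([1500, 1500, 1500], 512_000))  # 70.3125
```

Roughly 70 ms of extra delay from just three web packets is a lot for voice, which is why we want something smarter than FIFO on slow links.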


Priority Queuing

Priority Queuing, or PQ, consists of four queues: high, medium, normal, and low. By default, all packets will be assigned to the normal queue when using PQ. PQ is a pretty harsh queuing method, which generally leaves lower-priority queues starved. PQ works by always giving the high-priority queue the right of way, so to speak. If there is something in the high queue, it is sent before any other traffic. If the high queue is empty, it will check the medium queue, send one packet from there, then go back and check the high queue again. The normal queue is only serviced when both high and medium are empty, and the low queue only when everything above it is empty. What you get is the possibility of the queues below high not getting enough bandwidth, since the high queue is taking it all. The idea is almost right (treating the high-priority traffic as such), but the implementation is a little off. Let’s look at some better options.
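Here is a minimal Python sketch of the strict-priority behavior described above. It is a toy model, not how IOS implements PQ internally, and the queue contents are made-up packet names.

```python
from collections import deque

PRIORITY_ORDER = ["high", "medium", "normal", "low"]


def pq_dequeue(queues):
    """Return the next packet under strict priority, or None if every queue is empty."""
    # Always start from the top: a lower queue is only reached when
    # every queue above it is empty.
    for name in PRIORITY_ORDER:
        if queues[name]:
            return queues[name].popleft()
    return None


queues = {name: deque() for name in PRIORITY_ORDER}
queues["low"].extend(["ftp-1", "ftp-2"])
queues["normal"].append("web-1")
queues["high"].extend(["voip-1", "voip-2"])

order = []
while True:
    pkt = pq_dequeue(queues)
    if pkt is None:
        break
    order.append(pkt)

print(order)  # ['voip-1', 'voip-2', 'web-1', 'ftp-1', 'ftp-2']
```

The FTP packets arrived first but leave last. If the high queue kept refilling, they would never leave at all, which is the starvation problem.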

Round Robin (RR)

Round robin is in heavy contrast to Priority Queuing. The Round Robin process passes one packet from one queue at a time, dividing the bandwidth almost equally. This is assuming the packets are almost the same size, however. If one queue consistently has packets much larger than the rest, it will take more bandwidth than the rest. RR does a good job of dealing with queue starvation, but does not prioritize at all. It can also be somewhat unpredictable as to actual queue usage.
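The packet-size caveat is easy to see in a quick Python sketch. The queue names and packet sizes below are hypothetical; the function simply takes one packet from each non-empty queue in turn and tallies the bytes each queue got to send.

```python
from collections import deque


def round_robin(queues):
    """Send one packet from each non-empty queue in turn; return bytes sent per queue."""
    bytes_sent = {name: 0 for name in queues}
    pending = {name: deque(sizes) for name, sizes in queues.items()}
    while any(pending.values()):
        for name, q in pending.items():
            if q:
                bytes_sent[name] += q.popleft()
    return bytes_sent


# Both queues get the same number of turns, but not the same bandwidth.
shares = round_robin({"bulk": [1500] * 4, "voice": [200] * 4})
print(shares)  # {'bulk': 6000, 'voice': 800}
```

Each queue sent four packets, yet the queue with the large packets used about 88% of the bytes on the wire.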

Weighted Round Robin (WRR)

WRR is a modification of RR, where each queue receives a weight, and as a result of the weight, receives that portion of the bandwidth. WRR allows you to prioritize to some degree, but can also be somewhat unpredictable, as some queues may use more bandwidth than planned.
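As a rough sketch, weighting can be modeled by letting each queue send up to its weight in packets per round. This is one simple packet-count interpretation of WRR, with made-up queue names and weights, and it assumes every weight is a positive integer.

```python
from collections import deque


def weighted_round_robin(queues, weights):
    """Each round, send up to `weight` packets from every queue; return the send order."""
    pending = {name: deque(pkts) for name, pkts in queues.items()}
    order = []
    while any(pending.values()):
        for name, q in pending.items():
            # Weights are assumed to be positive integers.
            for _ in range(weights[name]):
                if not q:
                    break
                order.append(q.popleft())
    return order


order = weighted_round_robin(
    {"voice": ["v1", "v2", "v3", "v4"], "data": ["d1", "d2"]},
    {"voice": 2, "data": 1},
)
print(order)  # ['v1', 'v2', 'd1', 'v3', 'v4', 'd2']
```

Because the weights count packets rather than bytes, a queue full of large packets can still end up with more bandwidth than its weight suggests.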

Weighted Fair Queuing (WFQ)

As I mentioned before, WFQ is the default queuing method used on interfaces that are slower than 2.048 Mbps. WFQ is important to know because, as we’ll find later, it is implemented in both LLQ and CBWFQ, which are popular methods of queuing these days. WFQ is flow-based, meaning that each packet it receives is assigned to a FIFO queue for its flow. A flow consists of packets that have the same source IP, destination IP, Layer 4 protocol (TCP/UDP), IP Precedence, and TCP/UDP source and destination ports. WFQ creates queues on the fly for each flow, so the number of queues can vary greatly.
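The flow-classification half of WFQ can be sketched in a few lines of Python. The packet dictionaries and field names here are hypothetical, and the sketch deliberately leaves out how WFQ weights and schedules the queues; it only shows queues being created on the fly, one per flow.

```python
from collections import defaultdict, deque


def flow_key(pkt):
    """The fields that decide which flow a packet belongs to."""
    return (pkt["src"], pkt["dst"], pkt["proto"], pkt["prec"], pkt["sport"], pkt["dport"])


def classify(packets):
    """Place each packet in a FIFO queue for its flow, creating queues as needed."""
    queues = defaultdict(deque)
    for pkt in packets:
        queues[flow_key(pkt)].append(pkt["id"])
    return queues


voice = {"src": "10.0.0.1", "dst": "10.0.1.1", "proto": "udp", "prec": 5,
         "sport": 16384, "dport": 16384}
web = {"src": "10.0.0.2", "dst": "10.0.1.9", "proto": "tcp", "prec": 0,
       "sport": 51000, "dport": 80}
packets = [{"id": "a1", **voice}, {"id": "b1", **web}, {"id": "a2", **voice}]

queues = classify(packets)
for key, q in queues.items():
    print(key, list(q))
```

Three packets produce two queues here, because a1 and a2 share every field in the flow key.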

Class-Based Weighted Fair Queueing (CBWFQ)

CBWFQ divides traffic into classes (that are configured by the user), each of which is assigned its own queue. Although each queue can use more bandwidth than it is configured for, it can have a minimum bandwidth guarantee, so that even in times of congestion, it will get that amount of bandwidth. CBWFQ can create up to 64 queues, with each one being a FIFO queue. It is worth noting that you can configure the class-default queue to use WFQ. The class-default queue is used for all undefined traffic. Bear in mind that while CBWFQ functions with WFQ as a whole, once the traffic has been divided up into its respective queues, they are FIFO. Think of it like this: you are sending traffic into separate lines based on preference, but once in that line, all packets are considered equal. CBWFQ is a big improvement over previous queuing methods; however, it still falls short as it relates to voice or video applications. You’ll note that CBWFQ provides no method of identifying a priority queue, and this can hurt applications sensitive to delay. To solve these issues we move on to LLQ!
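The "separate lines" idea can be sketched in Python as follows. The class names and the application-to-class mapping are hypothetical, and the sketch only covers sorting packets into per-class FIFO queues (with class-default catching everything else), not the bandwidth guarantees.

```python
from collections import deque

# Hypothetical user-configured classes: application name -> class name.
CLASS_OF = {"appx": "business", "backup": "bulk"}


def enqueue_by_class(packets):
    """Sort packets into one FIFO queue per class; unmatched traffic goes to class-default."""
    queues = {}
    for pkt in packets:
        name = CLASS_OF.get(pkt["app"], "class-default")
        queues.setdefault(name, deque()).append(pkt["id"])
    return queues


packets = [
    {"id": 1, "app": "appx"},
    {"id": 2, "app": "p2p"},
    {"id": 3, "app": "backup"},
    {"id": 4, "app": "appx"},
]
queues = enqueue_by_class(packets)
print({name: list(q) for name, q in queues.items()})
# {'business': [1, 4], 'class-default': [2], 'bulk': [3]}
```

Within the business queue, packet 1 still leaves before packet 4, since each class queue is plain FIFO.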

Low Latency Queuing (LLQ)

At this point we can agree what we need is a queuing method that will give priority to delay-sensitive traffic, but at the same time not leave all other queues starved for bandwidth. Do you remember the issue with priority queuing? PQ gives priority to one queue- which is great, but leaves the other queues starved in times of congestion. WFQ is good, as it doesn’t leave flows starved, but it also provides no guarantee to any particular queues. LLQ solves these issues. LLQ is essentially a CBWFQ with at least one strict-priority queue. What does this mean? It means one queue receives priority, however that queue is policed, meaning in times of congestion it cannot use more bandwidth than is configured.
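A very rough Python sketch of one LLQ scheduling round might look like the following. It is a simplification under stated assumptions: the link is congested (so the policer is active), the priority queue gets a fixed byte budget per round, and the remaining class queues are served one packet each as a stand-in for CBWFQ. The function name, budget, and packets are all made up for illustration.

```python
from collections import deque


def llq_round(priority_q, class_qs, priority_budget_bytes):
    """One round: serve the priority queue first, up to its budget, then the class queues."""
    sent, dropped = [], []
    budget = priority_budget_bytes
    while priority_q:
        name, size = priority_q.popleft()
        if size <= budget:
            budget -= size
            sent.append(name)
        else:
            # Policed: over the configured priority bandwidth during congestion.
            dropped.append(name)
    # Stand-in for CBWFQ: one packet from each non-empty class queue.
    for q in class_qs:
        if q:
            sent.append(q.popleft()[0])
    return sent, dropped


priority_q = deque([("v1", 200), ("v2", 200), ("v3", 200)])
class_qs = [deque([("web1", 1500)]), deque([("ftp1", 1500)])]

sent, dropped = llq_round(priority_q, class_qs, priority_budget_bytes=400)
print(sent)     # ['v1', 'v2', 'web1', 'ftp1']
print(dropped)  # ['v3']
```

Voice goes first, but because it is capped at its budget, the web and FTP queues still get their turn instead of being starved the way they would be under PQ.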


Ahhh..sigh of relief!

Here we are, at the end of Part II! As you have noticed by now, QoS can definitely be daunting, but if you take the time to tackle the theory behind it, it really isn’t that difficult. The difficulty (for me at least), has always been in the theory as opposed to the implementation! In Part III we will discuss Traffic shaping/policing, link efficiency, and congestion avoidance. Look forward to seeing you!


QoS: Essentials, Part I

Many of you reading this have been mystified by terms like WFQ, WRED, Jitter, or DiffServ. My aim in Part I of QoS: Essentials is to take away some of the mystique surrounding Quality of Service. Let’s get to it!

What is QoS?
While in Iraq, we stayed in tiny trailers with 2 soldiers sharing a room. It wasn’t horrible, but let’s just say you got to know your roommate a little bit more than you wanted to due to the close proximity. Two people was OK, compared to what some people had to do! Our commander was given his own trailer, because, well, he was more important than the “foot soldiers” or “average joes”. If something happened to most people in my unit, the position could be filled. If something happened to the commander, it was not quite as easy. Moving on: the commander received his own room, thus putting a soldier out, and forcing him to move in with two guys already occupying a trailer. Their beds were nearly touching…for 15 months. What happened? QoS…sort of. Here’s what Cisco has to say about QoS:

“QoS is the ability of the network to provide better or special service to a set of users or applications or both to the detriment of other users or applications or both.”

Based on that statement, we can agree that Quality of Service is essentially improving service for some users/applications, while limiting the service of others. Think about it: you run a network with 1,000+ systems, and run a special application we’ll call AppX that the company’s employees practically live on. You also have employees who are running P2P file transfer programs such as Gnutella, Kazaa, or Limewire. Without a QoS policy in place, heavy P2P use will prove to be detrimental to AppX, and thus decrease company productivity, resulting in lower earnings, and ultimately hurting everyone! In the above scenario, using QoS, it would be completely possible to limit the P2P users to only using a percentage of the available bandwidth, and simultaneously guarantee a percentage of bandwidth to AppX, even in times of network congestion. Pretty outstanding!

Before we get into the options QoS provides you with, we must first understand the basic QoS models available to you as the network engineer.

  • Best Effort: No QoS policy is implemented. All packets receive the same level of service.
  • Integrated Services Model (IntServ): The first real end-to-end QoS solution. IntServ works on a per-flow basis, where a “flow” is defined as a stream of packets with a common source, destination, and port number. It does not scale well, as each router using IntServ is required to maintain per-flow state information.
  • Differentiated Services (DiffServ): Not a guaranteed service model. Flows are combined into “classes”, and are treated on a per-class basis. Packet classification, marking, and conditioning are done at the edge, while the core handles QoS with a Per-Hop Behavior (PHB) based on the packet’s class. DiffServ is highly scalable and flexible.

QoS: Steps to implementation

There are three broad steps required to implement QoS. While the methods of implementation vary, the idea behind each is the same. They are as follows:

  1. Identify traffic types and requirements

The first step consists of evaluating business requirements and the applications/services currently in use on the network, then determining the requirements of each one. The idea behind this step is to later place apps/services with similar requirements into the same class, and then apply policies to each class. For example, if you have VoIP traffic, and have several other important, but not critical, applications, you would give priority to the VoIP traffic (therefore giving it its own class), and give the lower-priority traffic its own class.

  2. Classify traffic based on the requirements

Classifying traffic is essentially taking a group of applications and placing them into several classes. Marking is also usually done after classifying. Why should you mark? If you don’t, each device in your network that handles QoS will have to perform a deep-packet inspection along each hop. If you mark at the edge, each device that sees that marking (or “color”) after that will know what treatment it should receive already. Classification can be based on the incoming interface, Class of Service (CoS, Layer 2 marking) value, source or destination IP address, IP Precedence/DSCP value, MPLS EXP, or by the application type.

  3. Define policies for each class

Now that we’ve identified business/network requirements, and classified and marked our traffic (you did mark your traffic, didn’t you?), we must do something with all of it! This is where the action happens. This is where you can set a maximum/minimum bandwidth for a class to use, define a priority, or apply congestion management/avoidance (in other words, how to act when there is congestion present, which traffic should be dropped first, etc.).
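The three steps can be tied together in a small Python sketch: classify and mark once at the edge, then let the core pick a treatment from the marking alone. Everything named here is hypothetical (the application names, the class names, and the policy labels), and the DSCP values are just common illustrative choices, not a recommendation.

```python
# Hypothetical mappings for illustration only.
CLASS_BY_APP = {"voip": "voice", "appx": "business", "p2p": "scavenger"}
DSCP_BY_CLASS = {"voice": 46, "business": 26, "scavenger": 8, "default": 0}
POLICY_BY_DSCP = {
    46: "priority",
    26: "guaranteed bandwidth",
    8: "limited bandwidth",
    0: "best effort",
}


def mark_at_edge(pkt):
    """Classify the packet by application and stamp it with its class's DSCP."""
    cls = CLASS_BY_APP.get(pkt["app"], "default")
    return {**pkt, "dscp": DSCP_BY_CLASS[cls]}


def treatment_in_core(pkt):
    """Core devices read only the marking; no deep packet inspection needed."""
    return POLICY_BY_DSCP[pkt["dscp"]]


for app in ["voip", "appx", "p2p", "email"]:
    marked = mark_at_edge({"app": app})
    print(app, marked["dscp"], treatment_in_core(marked))
```

The expensive lookup (which application is this?) happens once, and every later hop only needs a cheap lookup on the DSCP value.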

Trust Boundaries

So you know the steps to implementing QoS, and classifying/marking, but where do we start? The point at which traffic is marked in our network is defined as the “trust boundary”, or where the QoS markings are “trusted”. You should always try to mark closest to the source if possible. A common scenario I hear about is network admins installing Cisco VoIP phones, with a PC connected to the 3-port switch on the phone. Most Cisco phones will provide a CoS (Layer 2 marking)/IP Precedence (Layer 3 marking) of 5 by default. If you “trust” the incoming values at the access switch, your trust boundary is at the IP phone/access switch. This is ideal. This type of configuration ensures that all of your core/distribution nodes do nothing more than quickly read the markings, and act on the necessary policy for that class instead of performing deep packet inspection. In the diagram below, we see three different possibilities for trust boundaries:

[Figure: trust boundary options A, B, and C]

A) In this scenario, the IP phone marks its own traffic. This is ideal; however, not all IP phones can mark.

B) This option is still good, and is a pretty common place to mark: at the access layer. This is generally where you would mark if you just had a regular PC attached, or a phone not capable of marking its own traffic.

C) This one is OK. Generally the congestion in networks occurs at the WAN links, so as long as you mark before traffic hits the WAN link, you should still be OK. Even so, you generally want to mark as close to the source as possible.
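The effect of trusting (or not trusting) a port can be boiled down to a tiny Python sketch. The function name and the default of 0 for untrusted ports are assumptions made for illustration; real switches have more options than a simple trusted/untrusted flag.

```python
def cos_after_access_port(incoming_cos, port_trusted, default_cos=0):
    """CoS carried into the network after the access switch applies its trust setting."""
    # Trusted port: keep the marking. Untrusted port: overwrite it.
    return incoming_cos if port_trusted else default_cos


print(cos_after_access_port(5, port_trusted=True))   # IP phone on a trusted port -> 5
print(cos_after_access_port(5, port_trusted=False))  # PC claiming CoS 5, untrusted -> 0
```

This is why the boundary matters: without it, any PC could mark its own traffic as CoS 5 and jump the line.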

Although this has not been a complete overview of QoS, I hope it’s cleared some things up for those new to QoS. In Part II we’ll discuss QoS policy, Congestion avoidance/management, and Queueing.