			ALTQ Tips

		last update: $Date: 2006/09/28 03:00:40 $

1. General Issues
	1.1 Queueing Disciplines
	1.2 Kernel Configuration Options
		1.2.1 KLD
	1.3 Test Environments
	1.4 Traffic Generation
	1.5 Network Cards
	1.6 PC Hardware
	1.7 Token Bucket Regulator
		1.7.1 Interface Rate-Limiting
		1.7.2 Tuning a Token-Bucket Regulator to Minimize Delay
	1.8 ATM Driver
	1.9 Tun Driver
2. CBQ
	2.1 CBQ Configuration File
	2.2 Filter-Matching Rule
	2.3 Sample Setting
	2.4 CBQ Setting for Slow Interfaces
	2.5 Shaping the Total Traffic of an Interface
	2.6 CBQ over 100baseT
	2.7 CBQ Monitoring
	2.8 Limitations of Borrowing
3. HFSC
	3.1 HFSC Basics
	3.2 Notes on HFSC
	3.3 Sample Configuration
4. RSVP
5. RED (Random Early Detection)
6. ECN (Explicit Congestion Notification)
7. Diffserv
8. WFQ
9. FIFOQ
10. JoBS
11. IPv6
12. Troubleshooting
13. Coverage


1. General Issues

1.1 Queueing Disciplines

A queueing discipline controls outgoing traffic by packet scheduling
and/or queue buffer management.  (Yes, it controls only outgoing
traffic.)  ALTQ supports many queueing disciplines, but mainly for
research purposes.  Most likely you will use only CBQ, HFSC and/or
RED.  CBQ is the most well-engineered of the implemented disciplines.
HFSC has nicer theoretical properties than CBQ at the cost of
slightly higher overhead.

1.2 Kernel Configuration Options

The options defined in "i386/conf/ALTQ" will be fine for most users.

When you use CBQ (especially on FastEthernet), it is recommended to
use a fine-grained kernel timer, since CBQ needs the timer to shape
the traffic.  The following option changes the timer from 100Hz to
1KHz.

	options HZ=1000

note: OpenBSD (and possibly NetBSD) doesn't support changing HZ.

The kernel configuration options of ALTQ have dependencies.

	ALTQ:		always required

    options for CBQ
	ALTQ_CBQ:	required
	ALTQ_RED:	to use RED on CBQ classes
	ALTQ_RIO:	to use RIO on CBQ classes

    options for HFSC
	ALTQ_HFSC:	required
	ALTQ_RED:	to use RED on HFSC classes
	ALTQ_RIO:	to use RIO on HFSC classes

    options for PRIQ
	ALTQ_PRIQ:	required
	ALTQ_RED:	to use RED on PRIQ classes
	ALTQ_RIO:	to use RIO on PRIQ classes

    options for RED
	ALTQ_RED:	required
	ALTQ_FLOWVALVE:	RED penalty-box

    options for RIO
	ALTQ_RIO:	required

    options for CDNR
	ALTQ_CDNR:	required

    options for BLUE
	ALTQ_BLUE:	required

    options for WFQ
	ALTQ_WFQ:	required

    options for FIFOQ
	ALTQ_FIFOQ:	required

    options for JoBS
	ALTQ_JOBS:	required

    options for AFMAP
	ALTQ_AFMAP:	an undocumented feature (used to map an IP flow
			to an ATM VC)

    options for LOCALQ (a placeholder for any local use)
	ALTQ_LOCALQ:	required

    option to support IPSEC in IPv4 (IPSEC is always supported in IPv6)
	ALTQ_IPSEC:

    option to disable use of the processor cycle counter
	ALTQ_NOPCC:	HFSC, CDNR, and token-bucket regulators use the
			processor cycle counter (Pentium TSC on i386 and
			PCC on alpha) for measuring time, but it should
			be disabled in the following cases:
			 - 386/486 (non-Pentium) CPUs don't have a TSC
			 - in SMP, per-CPU counters are not in sync
			 - power management might affect the processor
			   cycle counter
			 - architectures other than i386 and alpha

    option for debugging ALTQ (verbose output and extra checking)
	ALTQ_DEBUG:

ALTQ is a global option (visible from all the kernel files).  The
other options are local to ALTQ and are put in "opt_altq.h", which is
created by config(8) under the kernel build directory.
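For example, a router that only needs CBQ with RED on its classes
could add just the following to its kernel config file (a minimal
sketch based on the option list above; add the other ALTQ_* options
if your altq.conf uses the corresponding disciplines):

	options ALTQ		# base ALTQ support (always required)
	options ALTQ_CBQ	# CBQ discipline
	options ALTQ_RED	# RED on CBQ classes
	options HZ=1000		# fine-grained timer, recommended for CBQ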
1.2.1 KLD (only in FreeBSD-3.x or later)

KLD is dynamic kernel module support in FreeBSD.  Each ALTQ
discipline can be loaded/unloaded at run time using the KLD
mechanism.

Note
 - the altq cdev is not a KLD module since the ALTQ support code is
   scattered in the kernel.
 - we do not use the module mechanism for built-in disciplines in
   order to be compatible with 2.x or other BSDs.
 - afmap can't be a module since it's part of the atm driver.

KLD modules are built and installed as part of the kernel
build/install.  If you want to manually install the KLD modules:

	# cd /usr/src/sys-altq/modules/altq
	# make
	# make install

The modules are installed in the "/modules" directory.  The ALTQ
modules have names starting with "altq_" (e.g., altq_cbq.ko).

altqd(8) tries to load required modules automatically.  In case you
want to manipulate the modules by hand:

    To load a module:
	# kldload altq_cbq
    To check the loaded modules:
	# kldstat -v
    To unload a module:
	# kldunload altq_cbq

1.3 Test Environments

Creating a Bottleneck:

Queueing is effective at the entrance of a bottleneck link, where
many packets are stored in the queue, and thus a better queueing
discipline has a chance to do something intelligent.  If you don't
have a bottleneck, there isn't much the router can do.  An example is
shown in the figure below.  In this case, the bottleneck is the
interface at the source rather than at the router.

	src ----> router ----> sink
	    10Mbps       10Mbps

On the other hand, if you are trying to test ALTQ, you have to create
a bottleneck.  There are two approaches you can take.

(1) Fast-link to Slow-link:

	src ----> router ----> sink
	   100Mbps       10Mbps
	                 ^bottleneck

(2) Many-to-One Connection:

	       10Mbps
	src1 ----> router ----> sink
	src2 ---->        10Mbps
	       10Mbps     ^bottleneck

Probably, method (1) is easier to handle, but it depends on what you
are trying to achieve.

1.4 Traffic Generation

Even when you have a bottleneck, it is not a simple task to control
the queue length.  I recommend using TCP; a UDP stream just overflows
the queue and eats up the CPU power and the link bandwidth.  But a
single TCP will not grow the queue length; TCP is clever enough!  You
have to run multiple TCP streams to observe some interesting traffic
dynamics.  The following simple script works for me.

	#!/bin/sh
	PATH=/bin:/usr/bin:/usr/local/bin
	export PATH

	dest=dest-host-name
	sec=20
	win=48K
	size=8K

	netperf -H $dest -l $sec -- -m $size -s $win -S $win &
	sleep 3
	netperf -H $dest -l $sec -- -m $size -s $win -S $win &
	sleep 3
	netperf -H $dest -l $sec -- -m $size -s $win -S $win &
	sleep 3
	netperf -H $dest -l $sec -- -m $size -s $win -S $win

1.5 Network Cards

Ethernet/FastEthernet:
	Most PCI based Ethernet drivers support PCI busmastering DMA.
	The fxp driver is the most popular in the FreeBSD community.
	Be aware that Ethernet is a shared medium; other traffic (and
	even your own traffic in the reverse direction) will affect
	the performance.  To reduce the risk of possible interference,
	you can connect 2 machines with a cross-cable and set the NICs
	to "full-duplex" mode (see ifconfig(8)).

Synchronous Serial:
	I have been using RISCom/N2 cards (sr driver).  There are
	third party drivers for the Cronyx Tau/E1 and Cyclades-PC300.

ATM:
	Efficient Networks, Inc. ENI-155 and Adaptec ANA-59x0.  (Both
	Efficient and Adaptec have already dropped ATM NICs from their
	product lines.)  ATM is quite nice for performance tests:
	 - the interface speed can be set by a hardware shaper
	 - full-duplex
	 - negligible delay
	 - many commercial tools (and even hardware delayers)

Slip:
	It turns out that slip is nice for building a test environment
	with NotePCs!
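As a concrete example of the full-duplex setting mentioned in the
Ethernet note above for a cross-cabled pair (a sketch only; the exact
media keywords depend on the driver, see ifconfig(8)):

	# ifconfig fxp0 media 100baseTX mediaopt full-duplex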
1.6 PC Hardware

The IO throughput of a 200MHz Pentium PC is about 100Mbps (the
bottleneck is memory access).  It should be enough to handle 10Mbps
NICs.  PCs with a PentiumPro or PentiumII have much better IO
performance (not just because of CPU power but because of their
chipsets).  Here are TCP throughputs measured over lo0 (local
loopback).  They show relative differences in performance.

	MMX Pentium 200MHz	 196Mbps
	PentiumPro  200MHz	 366Mbps
	Pentium-II  300MHz	 420Mbps
	Pentium-II  400MHz	 541Mbps
	Pentium-III 700MHz	1524Mbps

1.7 Token Bucket Regulator

Starting from altq-3.0, a token-bucket regulator is used to control
the behavior of a network device driver.  Most ALTQ users do not need
to tune the token-bucket regulator, but if you want to
	(1) rate-limit the interface, or
	(2) minimize the delay
here is how to tune it.  In order to separate the effect of a
token-bucket regulator from that of a queueing discipline, it is
recommended to tune the token-bucket regulator first with FIFOQ, and
then use the resulting setting for other queueing disciplines.

There is a trade-off in setting the transmission buffer size of a
network card.  If the buffer size is too small, there is a risk of
buffer under-run that leaves the link under-utilized even when
packets are backlogged.  Another concern is the overhead of interrupt
processing.  A larger buffer helps to reduce the number of interrupts
for those network cards which interrupt only when all transmission is
completed.  On the other hand, if the buffer is too large, it has
negative effects on packet scheduling.

Many modern network cards support chained DMA, typically up to 128 or
256 entries.  Most network drivers are written to buffer as many
packets as possible in order not to under-utilize the link and to
reduce the number of interrupts.  However, this creates a long
waiting queue after packets are scheduled by the packet scheduler,
and large buffers in network cards adversely affect packet
scheduling.  The device buffer has the effect of inserting another
FIFO queue beneath a queueing discipline.

An obvious problem is delay caused by a large buffer.  Even if the
packet scheduler tries to minimize the delay for a certain packet,
the packet needs to wait in the device buffer for hundreds of packets
to be drained.  Thus, delay cannot be controlled if there is a large
buffer in the network card.

Another less obvious but more serious problem is bursty dequeues.
When the device buffer is large, packets are moved from the queue to
the device buffer in a very bursty manner.  If the queue gets emptied
when a large chunk of packets is dequeued at once, the packet
scheduler loses control.  A packet scheduler is effective only when
there are backlogged packets in the queue.

These problems are invisible under FIFO, and thus most drivers are
not written to limit the number of packets in the transmission
buffer.  However, the problem becomes apparent when preferential
scheduling is used.

The transmission buffer size should be set to the minimum amount that
is required to fill up the link.  Although it is not easy to
automatically detect the appropriate buffer size, the number of
packets allowed in the device buffer should be limited to a small
number.  Many drivers, however, set an excessive buffer size.  Hence,
it is necessary to have a way to limit the number of packets (or
bytes) that are buffered in the card.

The purpose of a token bucket regulator is to limit the amount of
packets that a driver can dequeue.  A token bucket has a ``token
rate'' and a ``bucket size''.  Tokens accumulate in a bucket at the
average ``token rate'', up to the ``bucket size''.
A driver can dequeue a packet as long as there are positive tokens,
and after a packet is dequeued, the size of the packet is subtracted
from the tokens.  Note that this implementation allows the tokens to
go negative as a deficit in order to make a decision without prior
knowledge of the packet size.  It differs from a typical token bucket
that compares the packet size with the remaining tokens beforehand.

The bucket size controls the amount of burst that can be dequeued at
a time, and reins in a greedy device that tries to dequeue as many
packets as possible.  This is the primary purpose of the token bucket
regulator, and thus the token rate should be set to the actual
maximum transmission rate of the interface.  On the other hand, if
the rate is set to a smaller value than the actual transmission rate,
the token bucket regulator becomes a shaper that limits the long-term
output rate.

Another important point is that, when the rate is set to the actual
transmission rate or higher, transmission complete interrupts can
trigger the next dequeue.  However, if the token rate is smaller than
the actual transmission rate, the rate limit would still be in effect
at the time of a transmission complete interrupt, and the rate
limiting falls back to the kernel timer to trigger the next dequeue.
In order to achieve the target rate under timer-driven rate limiting,
the bucket size should be increased to fill the timer interval.

1.7.1 Interface Rate-Limiting

If you want to limit the outgoing bandwidth of an interface but you
don't need a queueing discipline, you can set up a token-bucket
regulator without any queueing discipline by tbrconfig(8).

To limit the outgoing traffic of fxp0 to 30Mbps:

	# tbrconfig fxp0 30M auto
	fxp0: tokenrate 30.00M(bps)  bucketsize 36.62K(bytes)

The "auto" keyword is used to automatically calculate the required
bucket size.  In the above example, 36.62KB is selected.  The
following formula is used to compute the bucket size:

	bucket_size = desired_rate(in bps) / 8 / kernel_timer_frequency

For example, 30Mbps with a 100Hz kernel timer gives
30,000,000 / 8 / 100 = 37,500 bytes, which is the 36.62KB shown
above.  The computed bucket size is conservative in the sense that it
is large enough to satisfy the specified rate by the kernel timer
events alone.  In many cases, half of the computed size is still able
to achieve the rate.

To remove the installed token-bucket regulator:

	# tbrconfig -d fxp0
	deleted token bucket regulator on fxp0

By default, altqd selects a small bucket size for non-rate-limiting
operation.  If you want to use a queueing discipline with interface
rate-limiting, you need to explicitly specify the bucket size by
"tbrsize" in the interface command of altq.conf.  "bandwidth"
specifies the token rate, and "tbrsize" specifies the bucket size.
The following FIFOQ setting has an effect similar to the previous
example for tbrconfig(8).

	[altq.conf]
	interface fxp0 bandwidth 30M tbrsize 36K fifoq

It is recommended to tune the bucket size with FIFOQ, and then use
the resulting size for other queueing disciplines.  Note that, if a
token-bucket regulator is already installed on the interface when
altqd is started, altqd does not install a new token-bucket
regulator.  That is, the existing setting is respected.
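For example (a hypothetical sketch, not taken from the distribution),
the same 30Mbps limit can be combined with CBQ instead of FIFOQ by
keeping the explicit "tbrsize" on the interface line and defining the
classes as usual:

	[altq.conf]
	# rate-limit fxp0 to 30Mbps and run CBQ on top of it
	interface fxp0 bandwidth 30M tbrsize 36K cbq
	class cbq fxp0 root_class NULL pbandwidth 100
	class cbq fxp0 def_class root_class borrow pbandwidth 95 default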
1.7.2 Tuning a Token-Bucket Regulator to Minimize Delay

If you are serious about minimizing the delay, you need to tune the
token rate and the bucket size.  The point here is to set the token
rate (bandwidth) to match the actual maximum transmission rate.  If
the token rate is higher than the transmission rate, packets can
accumulate in the device buffer, which increases delay for high
priority packets.

(1) measure the actual maximum throughput

Let's start with a simple FIFOQ config:

	[altq.conf]
	interface fxp0 bandwidth 100M fifoq

Run your favorite benchmark, and observe the throughput with
altqstat(1).  I use netperf with the following parameters:

	netperf -H <dest_host> -t TCP_STREAM -l 20 -- -s 56K -S 56K -m 8K

	[altqstat output]
	 q_len:17 q_limit:50 period:5158
	 xmit:17557 pkts (26571598 bytes) drop:0 pkts (0 bytes)
	 throughput: 66.23Mbps

	 q_len:25 q_limit:50 period:8374
	 xmit:28704 pkts (43448156 bytes) drop:0 pkts (0 bytes)
	 throughput: 67.17Mbps

The throughput is about 67Mbps.  Note that the benchmarking software
could report a smaller throughput since it reports the application
level throughput.  The throughput reported by altqstat(1) includes
TCP/IP/MAC headers, and this value should be used for the token rate.

(2) set the measured token rate

	[altq.conf]
	interface fxp0 bandwidth 67M fifoq
	                         ^^^

Repeat the measurement.

	[altqstat output]
	 q_len:34 q_limit:50 period:222
	 xmit:48625 pkts (73608550 bytes) drop:0 pkts (0 bytes)
	 throughput: 64.80Mbps

	 q_len:22 q_limit:50 period:287
	 xmit:59437 pkts (89977918 bytes) drop:0 pkts (0 bytes)
	 throughput: 65.15Mbps

This time, focus on the period counter.  The period counter of FIFOQ
is incremented every time the queue becomes empty.  Therefore, if the
period counter increases, the queue is not constantly backlogged.  It
suggests the bucket size is too large (provided that the TCP window
size is big enough).

(3) decrease the bucket size

	[altq.conf]
	interface fxp0 bandwidth 67M tbrsize 4K fifoq
	                             ^^^^^^^^^^

Repeat the measurement.

	[altqstat output]
	 q_len:34 q_limit:50 period:22
	 xmit:36567 pkts (55352738 bytes) drop:0 pkts (0 bytes)
	 throughput: 62.49Mbps

	 q_len:31 q_limit:50 period:22
	 xmit:46982 pkts (71121048 bytes) drop:0 pkts (0 bytes)
	 throughput: 62.76Mbps

As you can see, the period counter now stays at 22 (in exchange for
slightly lower throughput).  If "tbrsize" becomes too small, the
throughput will sharply degrade.

(4) use the parameters for other queueing disciplines

The following altq.conf sets up PRIQ (priority queueing) to give high
priority to ICMP.

	[altq.conf]
	interface fxp0 bandwidth 67M tbrsize 4K priq
	class priq fxp0 high_class NULL priority 1
	class priq fxp0 def_class NULL priority 0 default
	filter fxp0 high_class 0 0 0 0 1

Run ping(8) and the TCP benchmark at the same time, and see the delay
experienced by ping(8).  If you comment out the filter line, both
flows are put into the same default class.

1.8 ATM Driver

The ATM driver is based on bsdatm1.4 written by Chuck Cranor of
Washington University.  The ALTQ release includes enhancements to the
ATM driver (especially, pvc interface support).

1.9 Tun Driver

Although altq-3.0 supports the tun driver, it is a bit tricky to
control the tun device.  When ppp transmits packets, the tun
interface is not the bottleneck; the serial port is.  As a result,
packets are queued in the output queue of ppp and in the buffer in
the kernel, as shown in the following figure.  (A queued packet is
shown as X.)

	ppp app ---+              +-->[ XXX]--+
	           |              |           |     user
	           |              |           |
	 ==========|==============|===========|=====
	           |              |           |     kernel
	           |   +------+   |           |
	           +-->|      |---+           +->[ XX]--> sio
	               +------+                  line
	                tun0                      discipline

To control traffic at the tun interface, rate-limit the tun interface
with a token-bucket regulator to shift the queueing point to the tun
device, as shown in the following figure.

	ppp app ---+              +-->[    ]--+
	           |              |           |     user
	           |              |           |
	 ==========|==============|===========|=====
	           |              |           |     kernel
	           |   +------+   |           |
	           +-->| XXXXX|---+           +->[   ]--> sio
	               +------+^tbr              line
	                tun0                      discipline

Note that ppp usually compresses packets, so the throughput at the
tun interface will be much higher than the line rate.
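For example, a dial-up style link could be limited with tbrconfig(8)
(the 128K rate here is purely an illustration; pick something close
to your actual line rate, keeping the compression note above in
mind):

	# tbrconfig tun0 128K auto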
2. CBQ

2.1 CBQ Configuration File

Keep your altq.conf simple!  Most of the CBQ parameters are
automatically set by the system unless they are explicitly specified
in the configuration file.

Basic commands

Though there are many commands and options, all you need to use will
be the following commands and their options.

	interface if_name [bandwidth bps] cbq
	    (e.g., "interface fxp0 bandwidth 10M cbq")

	class cbq if_name class_name parent [borrow] [pbandwidth percent]
	      [red] [default|control]
	    (e.g., "class cbq fxp0 my_ftp tcp_class borrow pbandwidth 30 red")

	filter if_name class_name dst_addr dst_port src_addr src_port proto
	    (e.g., "filter fxp0 my_ftp 133.138.1.83 0 0 20 6")

The "interface" command sets up the interface.  Specify the interface
bandwidth in bits-per-second.

The "class" command creates a class.  Set the bandwidth of the class
with "pbandwidth" in percent of the interface bandwidth.  Set
"borrow" when the class can borrow bandwidth from its parent class.
Set "red" if you use the RED dropper (good for TCP).

The "filter" command sets a packet filter for a class.  A basic
filter uses a 5-tuple of dst_addr, dst_port, src_addr, src_port and
proto.  NOTE: dst comes first.  Set "0" for a field you don't care
about.

2.2 Filter-Matching Rule

The CBQ (and HFSC/PRIQ) classifier performs filter-matching for every
packet.  The classifier goes through the filters starting from the
last entry in the config file, which means you have to list a more
generic filter first in the config file.  For example, two filters,
one for all TCP and the other for HTTP, should be listed in the
following order.

	filter fxp0 TCP_class 0 0 0 0 6
	filter fxp0 HTTP_class 0 0 0 80 6

If the order is reversed, all HTTP packets match TCP_class first.  In
other words, the HTTP filter is a "subset" of the TCP filter: all
packets matched by the HTTP filter are matched by the TCP filter.

On the other hand, if two filters have different values in the same
field, no packet can match both filters.  Two such filters are
"disjoint".  For example, a packet has a single source port number
and never matches both of the following filters.

	filter fxp0 TELNET_class 0 0 0 23 6
	filter fxp0 HTTP_class 0 0 0 80 6

Another filter relation is "intersect".  If two filters have a shared
region (intersection) but neither is a subset of the other, the order
of applying the filters is very important.  An example is a filter by
destination address and a filter by source port; HTTP packets to
133.138.1.83 match both of the following filters.

	filter fxp0 my_class 133.138.1.83 0 0 0 6
	filter fxp0 HTTP_class 0 0 0 80 6

It is recommended to avoid the use of "intersecting" filters.

The last filter relation is a special case of "intersection", called
"port intersection", when two filters have the following relation:
	- the intersection is only in the port numbers
	- one specifies the src port and the other specifies the dst port
The well-known ports are used by system processes, and there should
be no packet with well-known port numbers in both the src and dst
ports, so we allow this special "intersection" and handle it
differently.  For example,

	filter fxp0 TELNET_class 0 23 0 0 6
	filter fxp0 HTTP_class 0 0 0 80 6

The altqd config file parser will
	- print an error message and exit when a "subset" filter is in
	  the wrong order.
	- print a warning message when an "intersecting" filter is
	  detected.  (This can be suppressed with the keyword
	  "dontwarn".)
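Putting the rules above together, a safe ordering for the example
filters from this section looks like the following sketch (the
generic TCP filter is listed first so that it is evaluated last; the
two port-based filters are disjoint, so their relative order does not
matter):

	filter fxp0 TCP_class    0 0 0 0  6
	filter fxp0 HTTP_class   0 0 0 80 6
	filter fxp0 TELNET_class 0 0 0 23 6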
2.3 Sample Setting

The following graph shows a sample class hierarchy in which traffic
is divided into 3 meta classes (bulk, interactive, misc).  The meta
classes are defined in order to control the hierarchical distribution
of the available bandwidth under congestion.  Filters are set only on
the leaf classes.

	                       root
	                        | (100%)
	          +-------------+-----------------+
	          |                               |
	      def_class                           |
	          | (95%)                         |
	   +------+-------+---------------+       |
	   |              |               |       |
	  bulk           misc            intr     |
	   | (30%)        | (30%)         | (30%) |
	  +-+---+----+    |           +---+---+   |
	  |     |    |    |           |       |   |
	 tcp   ftp  http  udp        dns   telnet ctl_class
	(10%) (10%)(10%) (10%)      (10%)  (10%)   (4%)

The corresponding altq.conf is listed below.

line 4:	the interface is sr0, the bandwidth is 1Mbps, use CBQ.
line 5:	create the root class.  Set "NULL" as the parent and "100"%
	as pbandwidth (bandwidth in percent).
line 9:	create the "control class" using the keyword "control".  The
	system uses the control class to send control packets (RSVP,
	ICMP, IGMP).  This rule is built-in and provided for backward
	compatibility; the feature will be removed in the future.  If
	a control class is not defined by the time the default class
	is created, the system will automatically create one with 2%
	bandwidth.  The bandwidth is taken out of the default class.
line 10: create the "default class".  If a packet doesn't match any
	filter, the packet is put into the default class.
line 12-14: create 3 meta classes as children of the default class.
	They can borrow bandwidth from the default class.
line 23: create a class for TCP as a child class of the "bulk"
	class.  Bandwidth can be borrowed from the parent.  Also,
	this class uses the RED dropper.
line 24: add a filter to the tcp class.  This filter matches all TCP
	packets (proto=6), and thus should be listed earlier than the
	other filters for packets using TCP.

     1	#
     2	# sample configuration file for 1Mbps link
     3	#
     4	interface sr0 bandwidth 1M cbq
     5	class cbq sr0 root NULL pbandwidth 100
     6	#
     7	# meta classes
     8	#
     9	class cbq sr0 ctl_class root pbandwidth 4 control
    10	class cbq sr0 def_class root borrow pbandwidth 95 default
    11	#
    12	class cbq sr0 bulk def_class borrow pbandwidth 30
    13	class cbq sr0 misc def_class borrow pbandwidth 30
    14	class cbq sr0 intr def_class borrow pbandwidth 30
    15	
    16	#
    17	# leaf classes
    18	#
    19	
    20	#
    21	# bulk data classes
    22	#
    23	class cbq sr0 tcp bulk borrow pbandwidth 10 red
    24	filter sr0 tcp 0 0 0 0 6		# other tcp
    25	class cbq sr0 ftp bulk borrow pbandwidth 10 red
    26	filter sr0 ftp 0 0 0 20 6		# ftp-data
    27	filter sr0 ftp 0 20 0 0 6		# ftp-data
    28	class cbq sr0 http bulk borrow pbandwidth 10 red
    29	filter sr0 http 0 0 0 80 6		# http
    30	filter sr0 http 0 80 0 0 6		# http
    31	#
    32	# misc (udp) classes
    33	#
    34	class cbq sr0 udp misc borrow pbandwidth 10 red
    35	filter sr0 udp 0 0 0 0 17		# other udp
    36	#
    37	# interactive classes
    38	#
    39	class cbq sr0 dns intr borrow pbandwidth 10 red
    40	filter sr0 dns 0 0 0 53 17
    41	filter sr0 dns 0 0 0 53 6
    42	class cbq sr0 telnet intr borrow pbandwidth 10 red
    43	filter sr0 telnet 0 0 0 23 6		# telnet
    44	filter sr0 telnet 0 23 0 0 6		# telnet
    45	filter sr0 telnet 0 0 0 513 6		# rlogin
    46	filter sr0 telnet 0 513 0 0 6		# rlogin

Some TIPS for CBQ settings:

I have to admit that it is tricky to set up CBQ parameters correctly.

 - Don't try to borrow too much; there are some technical
   difficulties.  For example, with parent: 90% and child: 2%, the
   child should be able to use up to 90%, but it may not work as you
   expect depending on other conditions involved.
 - Keep the depth of the leaf classes equal from the root class.

 - Setting a high priority on a class won't help much, and high
   priority has some side effects on the borrowing mechanism.  Don't
   use "priority" unless the link is slower than 1Mbps.

 - Don't expect accurate rate control.  CBQ has error margins of
   several percent against the REAL interface speed.

 - Use the "altqstat" tool to see the various statistics of a class.

In particular, I recommend using a 1000Hz timer for CBQ tests.
Although CBQ should work with a 100Hz timer, it is not easy to tune
CBQ for a wide range of CPUs and networks (speed, MTU, etc).

2.4 CBQ Setting for Slow Interfaces

There have been some difficulties in setting the right parameters
when the link is slow (say, less than 512Kbps).  If the default
doesn't work well, try "maxburst 2" or "maxburst 1".  Also, I
recommend assigning more than 10% of the link bandwidth to each
class.  Setting a high priority on an interactive class could improve
the response time.  See also "Token Bucket Regulator".

2.5 Shaping the Total Traffic of an Interface

CBQ is not designed to shape the total traffic.  (The original CBQ
design assumes that the bandwidth of the root class is bound by the
link bandwidth.)  Therefore, I recommend using link-layer technology
(e.g., the serial-line speed) to reduce the total traffic.

Having said that, if you still want to shape the total traffic, there
are two ways.  Don't expect accurate rate control; CBQ has error
margins of several percent against the REAL interface speed.

(1) Use a Class as a Shaper

Create a common ancestor class which limits the total bandwidth.
Limitations:
	- the minimum unit is 1% of the link bandwidth

	interface fxp0 bandwidth 10M cbq
	class cbq fxp0 root_class NULL pbandwidth 100
	# use def_class as a 1Mbps shaper.  don't borrow from root
	class cbq fxp0 def_class root_class pbandwidth 10 default
	class cbq fxp0 tcp_class def_class pbandwidth 5

(2) Set a Low Value for the Interface Bandwidth

Limitations:
	- real traffic becomes bursty.  (try "maxburst 2" or "maxburst 1")
	- borrowing may not work so well.

	# set 512Kbps for the 10Mbps interface
	interface fxp0 bandwidth 512K cbq
	class cbq fxp0 root_class NULL pbandwidth 100
	# don't borrow from the root class
	class cbq fxp0 def_class root_class pbandwidth 95 default
	class cbq fxp0 tcp_class def_class pbandwidth 50

See also "Token Bucket Regulator".

2.6 CBQ over 100baseT

CBQ shapes the outgoing traffic using the kernel timer.  A 1500-byte
MTU over 100Mbps is too much for the default 10msec kernel timer.
You need to use a 1msec timer instead.  See "Kernel Configuration
Options" for how to do it.

2.7 CBQ Monitoring

The altqstat program reports the statistics and the internal state of
the classes.  The output for a class looks like:

	Class 0 on Interface fxp0: priority: 1 depth: 0
	offtime: 8664 [us] wrr_allot: 1016 bytes
	nsPerByte: 2666 (3.00 Mbps),  Measured: 8.22 [Mbps]
	pkts: 9735,  bytes: 13381264
	overs: 9662, overactions: 520
	borrows: 9142, delays: 520
	drops: 2, drop_bytes: 3008
	QCount: 1, (qmax: 30)
	AvgIdle: -125 [us], (maxidle: 1843 minidle: -125 [us])

How to read it (you have to understand the mechanism of CBQ):

	priority: 1 depth: 0 offtime: 8664 [us] wrr_allot: 1016 bytes
  -->	the priority is 1, the depth is 0 (it's a leaf node), the
	offtime is 8664 usec, and the allotment of weighted round-robin
	is 1016 bytes.
	nsPerByte: 2666 (3.00 Mbps),  Measured: 8.22 [Mbps]
  -->	the bandwidth of the class is 3.00Mbps, and currently 8.22Mbps
	is being used.

	pkts: 9735,  bytes: 13381264
  -->	9735 packets were transmitted (13381264 bytes in total).

	overs: 9662, overactions: 520
  -->	the class exceeded the assigned bandwidth 9662 times;
	overlimit actions were called 520 times.

	borrows: 9142, delays: 520
  -->	the class borrowed bandwidth from its ancestors 9142 times,
	and was suspended for rate-control 520 times.

	drops: 2, drop_bytes: 3008
  -->	2 packets were dropped (3008 bytes in total).

	QCount: 1, (qmax: 30)
  -->	the current queue length is 1 (the limit is 30).

	AvgIdle: -125 [us], (maxidle: 1843 minidle: -125 [us])
  -->	the current average idle is -125 usec (maxidle is 1843 usec
	and minidle is -125 usec).

2.8 Limitations of Borrowing

The borrowing mechanism is nice, but it has limitations:

1. A small class cannot borrow the entire bandwidth of its parent.
   A class gets suspended when it is overlimit, and a smaller class
   has a longer suspension period (offtime).  When borrowing is
   enabled, a child also borrows the offtime of the parent, but when
   the parent also gets overlimit, the child has to use its own
   offtime to avoid overloading the system.  (Otherwise, all the
   classes would use the minimum offtime even under a heavy load.)
   As a result, a small class is not able to make full use of the
   bandwidth of the parent.

2. Competing TCPs equally share the bandwidth even when their
   bandwidth allocations are not equal.  When borrowing is enabled,
   the bandwidth allocation is enforced only when the queues have
   enough backlog, but TCPs can reach equilibrium without creating
   backlog in the queues.  In this case, the bandwidth share is set
   by the TCP mechanism, not by CBQ.  If there are many TCP flows,
   TCP will not be able to reach equilibrium and the allocation will
   be done by CBQ.

3. UDP beats TCP when both are set to borrow from the same parent.
   The situation is similar to case 2: TCP backs off before the
   allocation is enforced by CBQ.  Again, if there are many flows,
   the situation will be improved.

4. UDP is very bursty when borrowing is enabled.  As explained in
   case 1, a child has to be suspended longer when the parent gets
   overlimit.  This leads to the bursty behavior of UDP.  On the
   other hand, TCP adapts better to avoid overloading the parent.

3. HFSC

3.1 HFSC Basics

The following paper is a good reference, but it is a bit too
theoretical, so the HFSC basics are summarized here.

	"A Hierarchical Fair Service Curve Algorithm for Link-Sharing,
	Real-Time and Priority Service"
	Ion Stoica, Hui Zhang, and T. S. Eugene Ng.  SIGCOMM'97.

More information is available from
http://www.cs.cmu.edu/~hzhang/HFSC/main.html

Service Curve:

HFSC maintains 2 service curves; one for the real-time criteria and
the other for the link-sharing criteria.

A service curve of HFSC consists of 2 segments.  "m1" and "m2" are
the slopes of the 2 segments, and "d" is the x-projection of their
intersection, which specifies the length of the 1st segment.
Intuitively, "m2" specifies the long-term throughput guaranteed to a
flow, while "m1" specifies the rate at which a burst is served.  When
the slope of the 1st segment is larger than that of the 2nd segment,
the curve is called "concave".  A service curve is either convex or
concave.

	               m2
	         ________--------                     ________--------
	        /                                    /       m2
	       /                                    /
	      /  m1                                /
	     /                                    /    m1 = 0
	    /                          __________/
	    <-d->                       <-- d --->

	      concave                      convex

A concave service curve provides a bounded burst, similar to a token
bucket.  The triangular area made by the 1st segment roughly
corresponds to the depth of a token bucket, and the slope bounds the
peak rate.
The difference is that the peak rate of a token bucket is an upper
bound on the sending rate and is often set to the wire speed.  On the
other hand, HFSC guarantees the rate defined by the 1st segment, and
thus it cannot be the wire speed.

A convex service curve, on the other hand, suppresses the initial
traffic volume.  "m1" of a convex curve must be 0 in the current
implementation.

A linear service curve is a special case of a convex curve with a
NULL 1st segment.  A linear service curve corresponds to the
traditional virtual clock model and is a good starting point for
novice users.

Virtual Time:

Each class keeps the total byte count it has already sent.  When a
class is backlogged, a virtual time (vt) is calculated for the packet
at the head of the class queue.  vt is the x-projection of the
service curve corresponding to (total + packet_len).  As a result,
the vt of a class monotonically increases.  By scheduling the packet
with the smallest vt, the bandwidth allocation becomes proportional
to the service curve slope of each class.

	  bytes |                         /
	        |                        /  service curve
	        |                       /
	 next -->+     +---------------+
	 packet  |     |              /|
	 length  |     |             / |
	         |     |            /  |
	 total -->+    +-----------+   |
	 bytes   |    /|           |   |
	 already |   / |           |   |
	 sent    |  /  |           |   |
	         | /   |           |   |
	 --------+-----+-----------+---+--------------> time
	                           ^   ^
	                     vt for    vt for
	                     previous  next
	                     packet    packet

A service curve is updated every time a class becomes backlogged.
The update operation takes the minimum of (1) the service curve used
in the previous backlogged period and (2) the original service curve
starting at (current_time, total_bytes).  When a class has been idle
long enough, the updated curve is equal to (2).  On the other hand,
when the class has been using much more bandwidth than its share, the
updated curve is equal to (1).  (1) and (2) can intersect when the
class has been using a little less bandwidth than its share; in this
case, the updated curve could have a different value of "d".  The
operation is illustrated in the following figures.  It might be
easier to see it as a half-filled token bucket.

	            ________                       ________--------
	           /        ______                /
	          /               ________---+----
	         /                           /
	        /                 total ->  / + new coordinate
	       /                           /
	      service curve of             + current time
	      previous period

	                    Update Operation

	                     ______
	                    /      ________--------
	                +----
	               /
	   total ->   + new coordinate
	              |
	              + current time

	                    New Service Curve

HFSC Scheduling:

HFSC has 2 independent scheduling mechanisms.  Real-time scheduling
is used to guarantee the delay and the bandwidth allocation at the
same time.  Hierarchical link-sharing is used to distribute the
available excess bandwidth.  When dequeueing a packet, HFSC always
tries real-time scheduling first.  If no packet is eligible for
real-time scheduling, link-sharing scheduling is performed.  HFSC
does not use the class hierarchy for real-time scheduling.

Hierarchical Link-sharing:

In HFSC, only leaf classes have real packets, but the vt of an
intermediate class is also maintained by summing up the total byte
count used by its descendants.  When dequeueing a packet, HFSC's
hierarchical scheduler walks through the class hierarchy from the
root to a leaf class.  At each level of the class hierarchy, the
scheduler selects the class with the smallest vt among its child
classes.  When the scheduler reaches a leaf class, that leaf class is
scheduled.  Note that the scheduler looks at only the direct children
at each level.  Thus, the bandwidth allocation is proportional to the
service curve slopes among sibling classes, but is not proportional
among classes with different parents.
For example, the 4 leaf classes in the following figure will have the
same bandwidth allocation although A, B and C, D have different
slopes.  In other words, the ratio among siblings controls the
bandwidth allocation, and the absolute slope values do not matter.
(Similarly, vt values need to be consistent only among sibling
classes.)

	            root (100Mbps)
	                 |
	         +-------+-------+
	         |               |
	     E (20Mbps)      F (20Mbps)
	         |               |
	     +---+---+       +---+---+
	     |       |       |       |
	     A       B       C       D
	 (10Mbps) (10Mbps) (1Mbps) (1Mbps)

Real-time Scheduling:

As opposed to link-sharing scheduling, a single consistent time is
used for real-time scheduling.  Each class keeps a cumulative byte
count that is similar to the total byte count but counts only the
packets scheduled by real-time scheduling.  HFSC computes the
eligible time and the deadline for each class; they are the
x-projections of the head and tail of the next packet.  A class
becomes eligible for real-time scheduling when the current time
becomes greater than the eligible time of the class.  The real-time
scheduler selects the class with the smallest deadline among the
eligible classes.

	  bytes |                          /
	        |                         /  service curve
	        |                        /
	  next -->+     +---------------+
	 packet   |     |              /|
	 length   |     |             / |
	          |     |            /  |
	 cumulative ->+ +-----------+   |
	 bytes    |    /|           |   |
	 already  |   / |           |   |
	 sent     |  /  |           |   |
	          | /   |           |   |
	 ---------+-----+-----------+---+--------------> time
	                            ^   ^
	                     eligible   deadline
	                       time

In the original HFSC paper, a single service curve is used for both
real-time scheduling and link-sharing scheduling.  We have extended
HFSC to have independent service curves for real-time and
link-sharing.  Decoupling the service curves allows the guaranteed
rate and the distribution of excess bandwidth to be controlled
independently.  For example, it is possible to guarantee a minimum
bandwidth of 2Mbps to 2 classes while distributing the excess
bandwidth with a different ratio.

It is also possible to set either of the service curves to 0.  When
the real-time service curve is 0, a class receives only excess
bandwidth.  When the link-sharing service curve is 0, a class cannot
receive excess bandwidth.  Note that 0 link-sharing makes the class
non-work conserving.

Note that link-sharing scheduling alone can guarantee the assigned
bandwidth as long as the real-time service curve is equal to or
smaller than the link-sharing service curve for all classes.  But if
the link-sharing service curve is smaller, the assigned link-sharing
bandwidth may not be provided.

3.2 Notes on HFSC

Root class
	The root class is automatically created by the interface
	command.  Both service curves are initialized with a linear
	curve of the interface speed.

Default class
	One default (leaf) class is required for an interface.  Only
	leaf classes can have filters because, due to the hierarchical
	link-sharing algorithm, only leaf classes can have packets.
	Thus, you will need to explicitly create a leaf class to
	represent an intermediate class.  For example, in order to
	distribute the bandwidth of CMU to its departments, create a
	leaf class "other" and attach a filter for CMU to this leaf
	class.

	           |
	          CMU
	           |
	   +-------+-------+
	   |       |       |
	   CS      EE    other

Admission Control
	The sum of the service curves of the children should be less
	than the service curve of the parent.  You especially have to
	be careful when assigning concave service curves, since the
	sum of the peak rates could be large.

Reserved real-time bandwidth
	Many network cards are not able to saturate the wire, and if
	we allocate more real-time traffic than the actual maximum
	transmission rate, all classes become eligible and HFSC is no
	longer able to meet the delay-bound requirements.
	Thus, 20% of the real-time bandwidth is reserved for safety.
	Link-sharing does not have reserved bandwidth.

Specifying service curves in altq.conf
	To specify the same service curve for both real-time and
	link-sharing, use
		[sc <m1> <d> <m2>]
	To specify only the real-time service curve, use
		[rt <m1> <d> <m2>]
	To specify only the link-sharing service curve, use
		[ls <m1> <d> <m2>]

	The keywords "pshare" and "grate" are shorthand expressions
	for specifying a linear service curve.  "pshare" specifies a
	linear link-sharing service curve by percentage of the
	interface bandwidth; "pshare <percent>" is equivalent to
	"[ls 0 0 m2]" where
		m2 = interface_bandwidth * percent / 100
	"grate" specifies a linear real-time service curve;
	"grate <rate>" is equivalent to "[rt 0 0 m2]".

Shaping by HFSC
	A NULL link-sharing service curve can be used to limit the
	bandwidth of a class.  When the link-sharing service curve is
	zero, packets beyond the assigned real-time rate remain in the
	queue until the class becomes eligible again.  Shaping by HFSC
	is more accurate than by CBQ because HFSC does not use a
	suspension period (called offtime in CBQ) and the packet
	length is taken into account.  Note that the delay requirement
	cannot be guaranteed in shaping mode since, in the worst case,
	an eligible packet could be held until the next timer tick.
	Also note that, if a NULL link-sharing service curve is
	assigned to a parent class, its children cannot have
	link-sharing either.

3.3 Sample Configuration

	#
	# hfsc configuration for hierarchical sharing
	#
	interface pvc0 bandwidth 45M hfsc
	#
	# (10% of the bandwidth share goes to the default class)
	class hfsc pvc0 def_class root pshare 10 default
	#
	#	bandwidth share	guaranteed rate
	# CMU:	45%		15Mbps
	# PITT:	45%		15Mbps
	#
	class hfsc pvc0 cmu  root pshare 45 grate 15M
	class hfsc pvc0 pitt root pshare 45 grate 15M
	#
	# CMU	bandwidth share	guaranteed rate
	#  CS:	20%		10Mbps
	#  other: 20%		5Mbps
	#
	class hfsc pvc0 cmu_other cmu pshare 20 grate 10M
	filter pvc0 cmu_other 0 0 128.2.0.0 netmask 0xffff0000 0 0
	class hfsc pvc0 cmu_cs cmu pshare 20 grate 5M
	filter pvc0 cmu_cs 0 0 128.2.242.0 netmask 0xffffff00 0 0
	#
	# PITT	bandwidth share	guaranteed rate
	#  CS:	20%		10Mbps
	#  other: 20%		5Mbps
	#
	class hfsc pvc0 pitt_other pitt pshare 20 grate 10M
	filter pvc0 pitt_other 0 0 136.142.0.0 netmask 0xffff0000 0 0
	class hfsc pvc0 pitt_cs pitt pshare 20 grate 5M
	filter pvc0 pitt_cs 0 0 136.142.79.0 netmask 0xffffff00 0 0
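The sample above uses only the "pshare"/"grate" shorthand.  As an
illustration of the explicit two-segment syntax described in 3.2 (a
hypothetical class; the numbers are made up, and "d" is assumed here
to be given in milliseconds), a class that may burst at 10Mbps for
20ms but is guaranteed 5Mbps long-term, with a 3Mbps linear
link-sharing curve, could be written as:

	class hfsc pvc0 video_class root [rt 10M 20 5M] [ls 0 0 3M]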
4. RSVP

Too complex to list here.  rsvpd works only on FreeBSD, which
implements a special hook for RSVP.  RSVP routers need to intercept
RSVP signaling packets, and the normal IP stack does not have a way
to intercept packets not destined to itself.  (The router alert IP
option was introduced later for this purpose, but it is not
implemented in the BSD stack.)

But here is a simple test scenario using rtap.

	sender: 172.16.3.17 port 9001
	destination: 224.100.100.5 port 9000
	rate 4000B/s (~32Kbps)

	sender ------------ router ------- receiver
	(172.16.3.17)

    sender                                  receiver
    -------------------------------------------------------------
    # dest udp 224.100.100.5/9000           # dest udp 224.100.100.5/9000
    # sender 9001 [t 4000 1000 5000 100]
        (src-port r b p m)
    **** start sending path message ****
                                            # receive
                                              (join the multicast group)
                                            **** start receiving path message ****
                                            # reserve ff 172.16.3.17/9001 \
                                                  [cl 4000 1000 100 5000]
                                              (reserve by fixed-filter)
                                            (or)
                                              reserve wf \
                                                  [cl 4000 1000 100 5000]
                                              (reserve by wildcard-filter)
                                            **** start sending resv message ****
    **** start receiving resv message ****
                                            # close
                                            **** ResvTear message ****
    **** ResvTear message ****
    # close
    **** PathTear message ****
                                            **** PathTear message ****

5. RED (Random Early Detection)

To enable simple RED at a router, specify "red" in altq.conf(5).

	interface fxp0 bandwidth 10M red

RED parameters can be specified as follows:

	interface fxp0 bandwidth 10M red thmin 10 thmax 20 invpmax 15 qlimit 80

Note that RED never shapes the traffic by itself.

To test RED, run multiple TCP streams from src to sink.

	% altqstat

will show the statistics (among other things, the average queue
length).

Experimental ECN (explicit congestion notification) support for IPv4
has been included since altq-0.4.3.
 - To enable ECN with RED, add "ecn" to the "interface" command line.

RED can also be enabled on a CBQ/HFSC/PRIQ class.

6. ECN (Explicit Congestion Notification)

ECN needs support in both routers and end-hosts.  ECN support in
routers is a straightforward modification to RED: set a "Congestion
Experienced" bit in the IP header instead of dropping a packet.  ECN
support in end-hosts needs modifications to TCP.  See README.ecn for
more information.

7. Diffserv

ALTQ supports

1. traffic conditioning at ingress interfaces
	CDNR (conditioner) supports
	 - token-bucket meter
	 - two-rate three color marker
	 - time-sliding window three color marker
	(ALTQ no longer supports a meter/tagger at an output interface.)

2. preferential scheduling at egress interfaces
	RIO has been extended to support 3 drop precedence values.
	Combine RIO and CBQ/HFSC/PRIQ to support multiple classes.

You can build AF (Assured Forwarding) and EF (Expedited Forwarding)
services using ALTQ.  The following figure shows what ALTQ can do for
diffserv.

	                    diffserv network
	             +------------------------------+
	             |                              |
	src ----> ingress --------> core --------> egress ----> sink
	         diffedge                         diffedge

	     - MF classifier    - BA classifier   - BA classifier
	     - meter/marker     - meter/marker    - meter/marker
	     - AF/EF PHB        - AF/EF PHB       - AF/EF PHB
	                                          - clear dscp

Sample configuration files can be found in
"altqd/altq.conf.samples/".  Also see altq.conf(5).

Note that the performance of the AF and EF services depends on how
those services are provisioned and how the network is configured.
ALTQ just provides mechanisms with which to build services.
References

	RFC 2474  Definition of the Differentiated Services Field
	          (DS Field) in the IPv4 and IPv6 Headers
	RFC 2475  An Architecture for Differentiated Services
	RFC 2597  Assured Forwarding PHB Group
	RFC 2598  An Expedited Forwarding PHB
	RFC 2697  A Single Rate Three Color Marker
	RFC 2698  A Two Rate Three Color Marker
	draft-ietf-diffserv-model-00.txt
	          A Conceptual Model for Diffserv Routers

Sample configuration for a traffic conditioner

	#
	# null interface command
	#
	interface pvc1
	#
	# simple dropper
	#
	conditioner pvc1 dropper
	filter pvc1 dropper 0 0 172.16.4.173 0 0
	#
	# simple marker to clear dscp
	#
	conditioner pvc1 clear_marker
	filter pvc1 clear_marker 0 0 172.16.4.174 0 0
	#
	# EF style conditioner (a simple token bucket)
	#
	conditioner pvc1 ef_cdnr >
	filter pvc1 ef_cdnr 0 0 172.16.4.176 0 0
	#
	# AF style conditioners (trTCM)
	#
	conditioner pvc1 af1x_cdnr \
		colorblind>
	filter pvc1 af1x_cdnr 0 0 172.16.4.177 0 0
	#
	# color-blind trTCM is equivalent to a dual token-bucket meter
	#
	conditioner pvc1 dual_tb \
		>>
	filter pvc1 dual_tb 0 0 172.16.4.178 0 0

Sample queueing configuration using HFSC

	#
	# output interface
	#
	interface pvc0 bandwidth 45M hfsc
	class hfsc pvc0 def_class root pshare 10 default
	#
	# EF class
	#	real-time:    6Mbps
	#	link-sharing: 0%
	#
	class hfsc pvc0 ef_class root grate 6M
	filter pvc0 ef_class 0 0 0 0 0 tos 0xb8 tosmask 0xfc
	#
	# AF classes
	#	real-time:    3Mbps
	#	link-sharing: 10% (4.5Mbps)
	#
	# rio threshold values
	#	rio 40 50 10  20 30 10  5 15 10
	#
	class hfsc pvc0 af1x_class root grate 3M pshare 10 rio
	class hfsc pvc0 af2x_class root grate 3M pshare 10 rio
	class hfsc pvc0 af3x_class root grate 3M pshare 10 rio cleardscp
	class hfsc pvc0 af4x_class root grate 3M pshare 10 rio
	filter pvc0 af1x_class 0 0 0 0 0 tos 0x20 tosmask 0xe4
	filter pvc0 af2x_class 0 0 0 0 0 tos 0x40 tosmask 0xe4
	filter pvc0 af3x_class 0 0 0 0 0 tos 0x60 tosmask 0xe4
	filter pvc0 af4x_class 0 0 0 0 0 tos 0x80 tosmask 0xe4

Similar queueing configuration using CBQ

	#
	# output interface
	#
	interface pvc0 bandwidth 45M cbq
	class cbq pvc0 root_class NULL pbandwidth 100
	class cbq pvc0 def_class root_class borrow pbandwidth 86 default
	#
	# EF class
	#
	class cbq pvc0 ef_class root_class pbandwidth 14 priority 5
	filter pvc0 ef_class 0 0 0 0 0 tos 0xb8 tosmask 0xfc
	#
	# AF classes
	#
	# rio threshold values
	#	rio 40 50 10  20 30 10  5 15 10
	#
	class cbq pvc0 af1x_class def_class borrow pbandwidth 20 rio
	class cbq pvc0 af2x_class def_class borrow pbandwidth 20 rio
	class cbq pvc0 af3x_class def_class borrow pbandwidth 20 rio cleardscp
	class cbq pvc0 af4x_class def_class borrow pbandwidth 20 rio
	filter pvc0 af1x_class 0 0 0 0 0 tos 0x20 tosmask 0xe4
	filter pvc0 af2x_class 0 0 0 0 0 tos 0x40 tosmask 0xe4
	filter pvc0 af3x_class 0 0 0 0 0 tos 0x60 tosmask 0xe4
	filter pvc0 af4x_class 0 0 0 0 0 tos 0x80 tosmask 0xe4

8. WFQ

WFQ (weighted fair queueing) is implemented as a sample
implementation.  WFQ is easy to use since it requires no
configuration for the default setting.  By default, WFQ allocates 256
queues, and packets are mapped into one of the queues by hashing the
destination address.  So, packets for the same host will be put in
the same queue.

To enable WFQ on interfaces "vx0" and "vx1", add the following lines
to your altq.conf(5).

	interface vx0 bandwidth 10M wfq
	interface vx1 bandwidth 10M wfq

The following command can be used to monitor the WFQ statistics.

	% altqstat -i vx1

9. FIFOQ

FIFOQ (first-in first-out queueing) is implemented as a template for
those who want to write their own queueing schemes on the ALTQ
framework.
So, there would be no reason to use FIFOQ unless you want to modify
the FIFOQ implementation.

Using FIFOQ

To enable FIFOQ on interface "vx0", add the following line to your
altq.conf(5).

	interface vx0 bandwidth 6M fifoq

Alternatively, you can use the daemon process "fifoqd" in
"legacy-tools".

	# fifoqd -d vx0

10. JoBS (Joint Buffer Management and Scheduling)

The JoBS queueing scheme was contributed by Nicolas Christin
(nicolas@cs.virginia.edu).  This implementation is currently
considered EXPERIMENTAL.

10.1 JoBS Basics

The following two papers are good references, but may be a bit
theoretical, so the basics of JoBS are summarized here.

	"JoBS: Joint Buffer Management and Scheduling for
	Differentiated Services"
	Jorg Liebeherr and Nicolas Christin.  IWQoS'01.

	"A Quantitative Assured Forwarding Service"
	Nicolas Christin, Jorg Liebeherr and Tarek F. Abdelzaher.
	UVA-CS Technical Report CS-2001-21.  Short version to appear
	in Infocom 2002.

More information is available from http://qosbox.cs.virginia.edu

Overview:

As its name indicates, JoBS is a joint buffer management and
scheduling algorithm.  It provides, on a per-hop basis, absolute and
proportional service guarantees to traffic aggregates (henceforth
referred to as "classes" of traffic).  The following types of
guarantees are supported:

 - absolute throughput guarantees (ARC)
	e.g., Class-1 throughput >= 5 Mbps
 - absolute delay guarantees (ADC)
	e.g., Class-2 delay <= 3 ms
 - absolute loss guarantees (ALC)
	e.g., Class-1 loss rate <= 0.5 %
 - proportional delay guarantees (RDC)
	e.g., Class-3 delay/Class-2 delay is roughly equal to 2
 - proportional loss guarantees (RLC)
	e.g., Class-4 loss rate/Class-3 loss rate is roughly equal to 2

The acronyms differ from the names of the guarantees for historical
reasons.  Any mix of service guarantees can be enforced by JoBS.
Service guarantees are offered to backlogged classes, and are valid
over the current busy period.  The beginning of the current busy
period is defined as the last time the output queue of the interface
was empty.

Mechanisms:

JoBS uses the following mechanisms: a service rate is allocated to
each class of traffic.  Upon each packet arrival, the service rate is
adjusted to meet the delay and throughput constraints.  If no
feasible rate allocation satisfies the delay and throughput
constraints, traffic is dropped according to the loss guarantees
specified.

JoBS does not perform admission control or traffic policing.
Instead, if the set of service guarantees becomes infeasible (which
may be the case when some absolute guarantees are offered), some
service guarantees are relaxed.  In this prototype implementation,
the following order of relaxation is observed:

	1. Relax RLC and/or RDC.
	2. Relax ARC.
	3. Relax ADC.
	4. Relax ALC.

An algorithm similar to Deficit Round-Robin is used to convert the
rate allocations into packet scheduling decisions.

Remarks:

JoBS is a work-conserving scheduler.  In other words, if the output
link is idle and a packet is backlogged, the backlogged packet is
transmitted at once, REGARDLESS of the service guarantees specified.
Hence, at low loads, the work-conserving property may result in
improper proportional delay differentiation.  On the other hand, at
low loads, all classes get very low delays, and thus a high-grade
service.  Arguably, proportional delay differentiation is only needed
at times of overload.

By design, JoBS attempts to minimize the number of packets dropped.
This ALTQ implementation of JoBS offers two modes of operation:

 - shared buffer: all classes are backlogged in the same queue.  If
   the queue length exceeds a given threshold, or if no feasible
   service rate allocation can satisfy the delay/throughput
   guarantees, packets are dropped.  The shared buffer mode is the
   default, and is required to provide loss differentiation.

 - per-class buffers: a separate buffer is associated with each
   class.  Per-class buffers are useful when ONLY throughput and
   delay guarantees are desired.  Per-class buffers cannot be used
   to provide loss differentiation.

Note that JoBS does NOT check whether the set of service guarantees
offered is feasible.  While some examples are trivial (e.g.,
guaranteeing a throughput exceeding the output link capacity), some
other cases may be trickier.  For instance, giving an ARC to all
classes, with a shared buffer and no loss guarantees, will
essentially result in FIFO queueing, and the service guarantees
offered will NOT be respected.

This can be explained as follows.  Assume we have two classes, an
output link capacity of 10 Mbps, and we want to give 7 Mbps to Class
1 and 3 Mbps to Class 2.  After a short amount of time, Class 2
packets will end up filling the buffer.  Incoming Class 1 packets
will thus end up being dropped, and the input rate (i.e., arrival
rate - drop rate) of Class 1 will be limited by the "slow" Class 2
packets still in the buffer.  This problem is a cousin of the
traditional "Head-Of-Line" blocking problem.  Using TCP sources makes
it even more obvious, since TCP sources reduce their sending rates
when detecting a packet drop.  Thus, if only rate guarantees are to
be supported, you need to use SEPARATE buffers instead of the default
shared buffer.

10.2 Sample Configuration Examples

Note: If you want to try JoBS over the loopback interface, please be
aware that, due to the complexity of the queueing scheme, you may not
get the expected results if you are also using the machine to
generate traffic.  These examples produce the desired results when
used on a Pentium III 1 GHz, but we cannot make any promises for
slower CPUs.  (As a matter of fact, testing JoBS on a Pentium II 450
MHz showed that the machine had trouble generating enough traffic to
saturate a 100 Mbps virtual link on the loopback when JoBS was
running as a queueing discipline.)  Thus, if you are using a slow
processor, we recommend that you try with a 10 Mbps token bucket
limiter, and modify Example 1 accordingly.

Example 1:

	# Configuration for a 100 Mbps output link (fxp1),
	# Separate buffers with a limit of 50 packets each,
	# throughput guarantees for all classes,
	# no delay or loss guarantees.
	#
	interface fxp1 bandwidth 100M qlimit 50 separate jobs
	#
	class jobs fxp1 high_class NULL priority 0 adc -1 rdc -1 alc -1 rlc -1 arc 39M
	class jobs fxp1 med2_class NULL priority 1 adc -1 rdc -1 alc -1 rlc -1 arc 29M
	class jobs fxp1 med1_class NULL priority 2 adc -1 rdc -1 alc -1 rlc -1 arc 19M
	class jobs fxp1 low_class NULL priority 3 default adc -1 rdc -1 alc -1 rlc -1 arc 9M

	filter fxp1 high_class 10.0.4.2 0 0 0 0
	filter fxp1 med2_class 10.0.5.2 0 0 0 0
	filter fxp1 med1_class 10.0.6.2 0 0 0 0
	filter fxp1 low_class 10.0.7.2 0 0 0 0

Example 2:

	# Configuration for a 100 Mbps output link (fxp1),
	# Shared buffer with a limit of 200 packets,
	# Delay bound of 2000 microseconds on Class 0,
	# Loss rate bound of 0.5 % on Class 0,
	# Proportional differentiation as follows:
	#	Class 3-Delay = 2 * Class 2-Delay
	#	Class 4-Delay = 2 * Class 3-Delay
	# and
	#	Class 3-Loss Rate = 2 * Class 2-Loss Rate
	#	Class 4-Loss Rate = 2 * Class 3-Loss Rate
	#
	interface fxp1 bandwidth 100M qlimit 200 jobs
	#
	class jobs fxp1 high_class NULL priority 0 adc 2000 rdc -1 alc 0.005 rlc -1 arc -1
	class jobs fxp1 med2_class NULL priority 1 adc -1 rdc 2 alc -1 rlc 2 arc -1
	class jobs fxp1 med1_class NULL priority 2 adc -1 rdc 2 alc -1 rlc 2 arc -1
	class jobs fxp1 low_class NULL priority 3 default adc -1 rdc 2 alc -1 rlc 2 arc -1

	filter fxp1 high_class 10.0.4.2 0 0 0 0
	filter fxp1 med2_class 10.0.5.2 0 0 0 0
	filter fxp1 med1_class 10.0.6.2 0 0 0 0
	filter fxp1 low_class 10.0.7.2 0 0 0 0

10.3 JoBS altqstat Module

The JoBS altqstat module reports the following output.  The first
column is the class index.  The second column is the average queueing
delay (in microseconds) experienced over the last five seconds, and
the third column is the ratio of the class-(i+1) delay to the class-i
delay.  The fourth column is the percentage of packets that missed
their deadline (due to constraint relaxation) since the beginning of
time.  The fifth column is the loss rate (in percent) of the class.
The sixth column is the throughput obtained by each class in Mbps.
The seventh column is the arrival rate (offered load) of each class
in Mbps.

The eighth column is the best-case number of cycles consumed by the
enqueue() function, the ninth column is the average number of cycles
consumed by the enqueue() function, the tenth column is the standard
deviation of the number of cycles consumed by the enqueue() function,
and the eleventh column is the worst-case number of cycles consumed
by the enqueue() function.  Columns 12 to 15 give the same
information for the number of cycles consumed by the dequeue()
function.  The sixteenth and seventeenth columns are the total number
of packets enqueued and dequeued since the beginning of time,
respectively.  A value of -1 means "Not Applicable".
For example, one can get:

 fxp1:
 pri del   rdc   viol  p_i   rlc   thru   off_ld bc_e avg_e stdev_e wc_e  bc_d avg_d stdev_d wc_d  nr_en  nr_de
 3   51968 1.98  0.000 34.60 2.01  12.344 18.39  2970 15618 3398    47674 1400 3728  981     32245 199920 176436
 2   26218 1.89  0.000 17.25 2.00  14.923 18.77  2970 15618 3398    47674 1400 3728  981     32245 199920 176436
 1   13851 13.61 0.000 8.64  17.32 35.035 37.08  2970 15618 3398    47674 1400 3728  981     32245 199920 176436
 0   1018  -1.00 0.897 0.50  -1.00 35.530 35.70  2970 15618 3398    47674 1400 3728  981     32245 199920 176436

which shows, among other things, that over the last 5 seconds, class
1 packets were queued for 1018 microseconds on average, that the
ratio of the class-3 delay to the class-2 delay was 1.98, and that
the average number of cycles consumed by the enqueue operation was
15618 cycles.  In this example the loss rate of class 3 is extremely
high (34%).  This is due to the fact that we brutally overloaded the
link with UDP traffic.

11. IPv6

ALTQ supports IPv6 and has been used over IPv6 since mid 1998.

12. Troubleshooting

1. Kernel Configuration:

Q. The "opt_altq.h" file is missing.
A. Somehow, you messed up the kernel source tree.  Start with a fresh
   source tree and use the config file named "ALTQ".  See "Kernel
   Configuration Options".

2. CBQ/HFSC/PRIQ:

Q. altqd doesn't start.
	altqd: can't open altq device: No such file or directory
A. /dev/altq/cbq is missing.  Use MAKEDEV.altq to create the ALTQ
   devices.

Q. altqd doesn't start.
	syscall error: can't add cbq on interface 'lo0': Device not configured
A. ALTQ doesn't support this driver.  You'd better find a different
   interface card type.

Q. altqd doesn't start.
	CBQ open: Operation not supported by device
A. Your kernel doesn't have the CBQ module, or altqd failed to load
   the KLD module.  The kernel patch might have failed.  Don't forget

	% find altq_kernel_src_path -name "*.rej" -print

   when you apply the altq patch.  An ALTQ-ready kernel shows the
   following line at boot:
	"altq: major number is 96"
   "strings /kernel | grep altq" should print the following line:
	"altq: major number is %d"

Q. altqd doesn't start.
	syscall error: can't add cbq on interface 'fxp0': Device busy
A. Another altqd or another ALTQ related daemon is already running on
   this interface, and you started another altqd.

Q. CBQ reports
	warning: filter for "foo_class" at line 58 could override
	filter for "bar_class" at line 27
A. The two CBQ filters have an "intersection".  See "Filter-Matching
   Rule".

Q. CBQ reports
	filters for "foo_class" at line 58 and for "bar_class" at
	line 57 has order problem!
A. The order in which you put the filters is wrong.  See
   "Filter-Matching Rule".

Q. CBQ reports
	warning: class is too slow!!
A. It's a warning message saying that the allocated bandwidth is
   below the precision of the internal calculation and the value is
   rounded up to the minimum value.  CBQ has a limitation on the
   bandwidth assigned to a class: the minimum bandwidth is 6Kbps, to
   avoid 32-bit integer overflow in the internal calculation.  See
   "CBQ Setting for Slow Interfaces".

Q. CBQ doesn't work as I expected.
A. It is not easy to track down problems.  My rules of thumb to track
   down problems:
	- watch out for possible interference: the CPU or the link
	  could get saturated before queueing takes place.
	- start with a simple setting, and add complexity step by
	  step.
	- use "altqstat" to get the statistics and the internal state
	  of CBQ.
	- try a kernel with a fine-grained timer value.  If the
	  problem is gone, there must be some granularity mismatch.
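When hunting down the problems above, it often helps to start with a
few sanity checks using the commands referenced in this document (a
sketch; paths assume FreeBSD with the ALTQ distribution installed as
described earlier):

	# ls /dev/altq			# ALTQ devices created by MAKEDEV.altq
	# kldstat -v | grep altq	# discipline modules currently loaded
	# strings /kernel | grep altq	# "altq: major number is %d" on an ALTQ-ready kernel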
13. Coverage (incomplete)

diffserv
	RFC 2474 (default PHB and Class Selector PHBs)
	RFC 2597 (Assured Forwarding PHB Group)
	RFC 2598 (Expedited Forwarding PHB)
	RFC 2697 (A Single Rate Three Color Marker)
	RFC 2698 (A Two Rate Three Color Marker)

RED
	RFC 2309

ECN
	RFC 3168