			ALTQ Tips

		last update: $Date: 2006/09/28 03:00:40 $

1. General Issues
	1.1 Queueing Disciplines
	1.2 Kernel Configuration Options
		1.2.1 KLD
	1.3 Test Environments
	1.4 Traffic Generation
	1.5 Network Cards
	1.6 PC Hardware
	1.7 Token Bucket Regulator
		1.7.1 Interface Rate-Limiting
		1.7.2 Tuning a Token-Bucket Regulator to Minimize Delay
	1.8 ATM Driver
	1.9 Tun Driver
2. CBQ
	2.1 CBQ Configuration File
	2.2 Filter-Matching Rule
	2.3 Sample Setting
	2.4 CBQ Setting for Slow Interfaces
	2.5 Shaping the Total Traffic of an Interface
	2.6 CBQ over 100baseT
	2.7 CBQ Monitoring
	2.8 Limitations of Borrowing
3. HFSC
	3.1 HFSC Basics
	3.2 Notes on HFSC
	3.3 Sample Configuration
4. RSVP
5. RED (Random Early Detection)
6. ECN (Explicit Congestion Notification)
7. Diffserv
8. WFQ
9. FIFOQ
10. JoBS
11. IPv6
12. Troubleshooting
13. Coverage


1. General Issues

1.1 Queueing Disciplines

A queueing discipline controls outgoing traffic by packet scheduling
and/or queue buffer management.  (Yes, it controls only outgoing
traffic.)  ALTQ supports many queueing disciplines, but mainly for
research purposes.  Most likely you will use only CBQ, HFSC and/or
RED.  CBQ is the most well-engineered of the implemented disciplines.
HFSC has nicer theoretical properties than CBQ at the cost of
slightly higher overhead.

1.2 Kernel Configuration Options

The options defined in "i386/conf/ALTQ" will be fine for most users.

When you use CBQ (especially on FastEthernet), it is recommended to
use a fine-grained kernel timer, since CBQ needs the timer to shape
the traffic.  The following option changes the timer from 100Hz to
1KHz.

	options HZ=1000

note: OpenBSD (and possibly NetBSD) doesn't support changing HZ.

The kernel configuration options of ALTQ have dependencies.

	ALTQ:		always required

    options for CBQ
	ALTQ_CBQ:	required
	ALTQ_RED:	to use RED on CBQ classes
	ALTQ_RIO:	to use RIO on CBQ classes

    options for HFSC
	ALTQ_HFSC:	required
	ALTQ_RED:	to use RED on HFSC classes
	ALTQ_RIO:	to use RIO on HFSC classes

    options for PRIQ
	ALTQ_PRIQ:	required
	ALTQ_RED:	to use RED on PRIQ classes
	ALTQ_RIO:	to use RIO on PRIQ classes

    options for RED
	ALTQ_RED:	required
	ALTQ_FLOWVALVE:	RED penalty-box

    options for RIO
	ALTQ_RIO:	required

    options for CDNR
	ALTQ_CDNR:	required

    options for BLUE
	ALTQ_BLUE:	required

    options for WFQ
	ALTQ_WFQ:	required

    options for FIFOQ
	ALTQ_FIFOQ:	required

    options for JoBS
	ALTQ_JOBS:	required

    options for AFMAP
	ALTQ_AFMAP:	an undocumented feature (used to map an IP flow
			to an ATM VC)

    options for LOCALQ (a placeholder for any local use)
	ALTQ_LOCALQ:	required

    option to support IPSEC in IPv4 (IPSEC is always supported in IPv6)
	ALTQ_IPSEC:

    option to disable use of the processor cycle counter
	ALTQ_NOPCC:	HFSC, CDNR, and token-bucket regulators use the
			processor cycle counter (Pentium TSC on i386 and
			PCC on alpha) for measuring time, but it should
			be disabled in the following cases:
			 - 386/486 (non-Pentium) CPUs don't have a TSC
			 - in SMP, per-CPU counters are not in sync
			 - power management might affect the processor
			   cycle counter
			 - architectures other than i386 and alpha

    option for debugging ALTQ (verbose output and extra checking)
	ALTQ_DEBUG:

ALTQ is a global option (visible from all the kernel files).  The
other options are local to ALTQ and are put in "opt_altq.h", which is
created by config(8) under the kernel build directory.
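For example, a router that only needs CBQ with RED on its classes
could add just the following to its kernel config file (a minimal
sketch based on the option list above; add the other ALTQ_* options
if your altq.conf uses the corresponding disciplines):

	options ALTQ		# base ALTQ support (always required)
	options ALTQ_CBQ	# CBQ discipline
	options ALTQ_RED	# RED on CBQ classes
	options HZ=1000		# fine-grained timer, recommended for CBQ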
1.2.1 KLD (only in FreeBSD-3.x or later)

KLD is dynamic kernel module support in FreeBSD.  Each ALTQ
discipline can be loaded/unloaded at run time using the KLD
mechanism.

Note
 - the altq cdev is not a KLD module since the ALTQ support code is
   scattered in the kernel.
 - we do not use the module mechanism for built-in disciplines in
   order to be compatible with 2.x or other BSDs.
 - afmap can't be a module since it's part of the atm driver.

KLD modules are built and installed as part of the kernel
build/install.  If you want to manually install the KLD modules:

	# cd /usr/src/sys-altq/modules/altq
	# make
	# make install

The modules are installed in the "/modules" directory.  The ALTQ
modules have names starting with "altq_" (e.g., altq_cbq.ko).

altqd(8) tries to load required modules automatically.  In case you
want to manipulate the modules by hand:

    To load a module:
	# kldload altq_cbq
    To check the loaded modules:
	# kldstat -v
    To unload a module:
	# kldunload altq_cbq

1.3 Test Environments

Creating a Bottleneck:

Queueing is effective at the entrance of a bottleneck link, where
many packets are stored in the queue, and thus a better queueing
discipline has a chance to do something intelligent.  If you don't
have a bottleneck, there isn't much the router can do.  An example is
shown in the figure below.  In this case, the bottleneck is the
interface at the source rather than at the router.

	src ----> router ----> sink
	    10Mbps       10Mbps

On the other hand, if you are trying to test ALTQ, you have to create
a bottleneck.  There are two approaches you can take.

(1) Fast-link to Slow-link:

	src ----> router ----> sink
	   100Mbps       10Mbps
	                 ^bottleneck

(2) Many-to-One Connection:

	       10Mbps
	src1 ----> router ----> sink
	src2 ---->        10Mbps
	       10Mbps     ^bottleneck

Probably, method (1) is easier to handle, but it depends on what you
are trying to achieve.

1.4 Traffic Generation

Even when you have a bottleneck, it is not a simple task to control
the queue length.  I recommend using TCP; a UDP stream just overflows
the queue and eats up the CPU power and the link bandwidth.  But a
single TCP will not grow the queue length; TCP is clever enough!  You
have to run multiple TCP streams to observe some interesting traffic
dynamics.  The following simple script works for me.

	#!/bin/sh
	PATH=/bin:/usr/bin:/usr/local/bin
	export PATH

	dest=dest-host-name
	sec=20
	win=48K
	size=8K

	netperf -H $dest -l $sec -- -m $size -s $win -S $win &
	sleep 3
	netperf -H $dest -l $sec -- -m $size -s $win -S $win &
	sleep 3
	netperf -H $dest -l $sec -- -m $size -s $win -S $win &
	sleep 3
	netperf -H $dest -l $sec -- -m $size -s $win -S $win

1.5 Network Cards

Ethernet/FastEthernet:
	Most PCI based Ethernet drivers support PCI busmastering DMA.
	The fxp driver is the most popular in the FreeBSD community.
	Be aware that Ethernet is a shared medium; other traffic (and
	even your own traffic in the reverse direction) will affect
	the performance.  To reduce the risk of possible interference,
	you can connect 2 machines with a cross-cable and set the NICs
	to "full-duplex" mode (see ifconfig(8)).

Synchronous Serial:
	I have been using RISCom/N2 cards (sr driver).  There are
	third party drivers for the Cronyx Tau/E1 and Cyclades-PC300.

ATM:
	Efficient Networks, Inc. ENI-155 and Adaptec ANA-59x0.  (Both
	Efficient and Adaptec have already dropped ATM NICs from their
	product lines.)  ATM is quite nice for performance tests:
	 - the interface speed can be set by a hardware shaper
	 - full-duplex
	 - negligible delay
	 - many commercial tools (and even hardware delayers)

Slip:
	It turns out that slip is nice for building a test environment
	with NotePCs!
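As a concrete example of the full-duplex setting mentioned in the
Ethernet note above for a cross-cabled pair (a sketch only; the exact
media keywords depend on the driver, see ifconfig(8)):

	# ifconfig fxp0 media 100baseTX mediaopt full-duplex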
1.6 PC Hardware

The IO throughput of a 200MHz Pentium PC is about 100Mbps (the
bottleneck is memory access).  It should be enough to handle 10Mbps
NICs.  PCs with a PentiumPro or PentiumII have much better IO
performance (not just because of CPU power but because of their
chipsets).  Here are TCP throughputs measured over lo0 (local
loopback).  They show relative differences in performance.

	MMX Pentium 200MHz	 196Mbps
	PentiumPro  200MHz	 366Mbps
	Pentium-II  300MHz	 420Mbps
	Pentium-II  400MHz	 541Mbps
	Pentium-III 700MHz	1524Mbps

1.7 Token Bucket Regulator

Starting from altq-3.0, a token-bucket regulator is used to control
the behavior of a network device driver.  Most ALTQ users do not need
to tune the token-bucket regulator, but if you want to
	(1) rate-limit the interface, or
	(2) minimize the delay
here is how to tune it.  In order to separate the effect of a
token-bucket regulator from that of a queueing discipline, it is
recommended to tune the token-bucket regulator first with FIFOQ, and
then use the resulting setting for other queueing disciplines.

There is a trade-off in setting the transmission buffer size of a
network card.  If the buffer size is too small, there is a risk of
buffer under-run that leaves the link under-utilized even when
packets are backlogged.  Another concern is the overhead of interrupt
processing.  A larger buffer helps to reduce the number of interrupts
for those network cards which interrupt only when all transmission is
completed.  On the other hand, if the buffer is too large, it has
negative effects on packet scheduling.

Many modern network cards support chained DMA, typically up to 128 or
256 entries.  Most network drivers are written to buffer as many
packets as possible in order not to under-utilize the link and to
reduce the number of interrupts.  However, this creates a long
waiting queue after packets are scheduled by the packet scheduler,
and large buffers in network cards adversely affect packet
scheduling.  The device buffer has the effect of inserting another
FIFO queue beneath a queueing discipline.

An obvious problem is delay caused by a large buffer.  Even if the
packet scheduler tries to minimize the delay for a certain packet,
the packet needs to wait in the device buffer for hundreds of packets
to be drained.  Thus, delay cannot be controlled if there is a large
buffer in the network card.

Another less obvious but more serious problem is bursty dequeues.
When the device buffer is large, packets are moved from the queue to
the device buffer in a very bursty manner.  If the queue gets emptied
when a large chunk of packets is dequeued at once, the packet
scheduler loses control.  A packet scheduler is effective only when
there are backlogged packets in the queue.

These problems are invisible under FIFO, and thus most drivers are
not written to limit the number of packets in the transmission
buffer.  However, the problem becomes apparent when preferential
scheduling is used.

The transmission buffer size should be set to the minimum amount that
is required to fill up the link.  Although it is not easy to
automatically detect the appropriate buffer size, the number of
packets allowed in the device buffer should be limited to a small
number.  Many drivers, however, set an excessive buffer size.  Hence,
it is necessary to have a way to limit the number of packets (or
bytes) that are buffered in the card.

The purpose of a token bucket regulator is to limit the amount of
packets that a driver can dequeue.  A token bucket has a ``token
rate'' and a ``bucket size''.  Tokens accumulate in a bucket at the
average ``token rate'', up to the ``bucket size''.
A driver can dequeue a packet as long as there are positive tokens,
and after a packet is dequeued, the size of the packet is subtracted
from the tokens.  Note that this implementation allows the tokens to
go negative as a deficit in order to make a decision without prior
knowledge of the packet size.  It differs from a typical token bucket
that compares the packet size with the remaining tokens beforehand.

The bucket size controls the amount of burst that can be dequeued at
a time, and reins in a greedy device that tries to dequeue as many
packets as possible.  This is the primary purpose of the token bucket
regulator, and thus the token rate should be set to the actual
maximum transmission rate of the interface.  On the other hand, if
the rate is set to a smaller value than the actual transmission rate,
the token bucket regulator becomes a shaper that limits the long-term
output rate.

Another important point is that, when the rate is set to the actual
transmission rate or higher, transmission complete interrupts can
trigger the next dequeue.  However, if the token rate is smaller than
the actual transmission rate, the rate limit would still be in effect
at the time of a transmission complete interrupt, and the rate
limiting falls back to the kernel timer to trigger the next dequeue.
In order to achieve the target rate under timer-driven rate limiting,
the bucket size should be increased to fill the timer interval.

1.7.1 Interface Rate-Limiting

If you want to limit the outgoing bandwidth of an interface but you
don't need a queueing discipline, you can set up a token-bucket
regulator without any queueing discipline by tbrconfig(8).

To limit the outgoing traffic of fxp0 to 30Mbps:

	# tbrconfig fxp0 30M auto
	fxp0: tokenrate 30.00M(bps)  bucketsize 36.62K(bytes)

The "auto" keyword is used to automatically calculate the required
bucket size.  In the above example, 36.62KB is selected.  The
following formula is used to compute the bucket size:

	bucket_size = desired_rate(in bps) / 8 / kernel_timer_frequency

For example, 30Mbps with a 100Hz kernel timer gives
30,000,000 / 8 / 100 = 37,500 bytes, which is the 36.62KB shown
above.  The computed bucket size is conservative in the sense that it
is large enough to satisfy the specified rate by the kernel timer
events alone.  In many cases, half of the computed size is still able
to achieve the rate.

To remove the installed token-bucket regulator:

	# tbrconfig -d fxp0
	deleted token bucket regulator on fxp0

By default, altqd selects a small bucket size for non-rate-limiting
operation.  If you want to use a queueing discipline with interface
rate-limiting, you need to explicitly specify the bucket size by
"tbrsize" in the interface command of altq.conf.  "bandwidth"
specifies the token rate, and "tbrsize" specifies the bucket size.
The following FIFOQ setting has an effect similar to the previous
example for tbrconfig(8).

	[altq.conf]
	interface fxp0 bandwidth 30M tbrsize 36K fifoq

It is recommended to tune the bucket size with FIFOQ, and then use
the resulting size for other queueing disciplines.  Note that, if a
token-bucket regulator is already installed on the interface when
altqd is started, altqd does not install a new token-bucket
regulator.  That is, the existing setting is respected.
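For example (a hypothetical sketch, not taken from the distribution),
the same 30Mbps limit can be combined with CBQ instead of FIFOQ by
keeping the explicit "tbrsize" on the interface line and defining the
classes as usual:

	[altq.conf]
	# rate-limit fxp0 to 30Mbps and run CBQ on top of it
	interface fxp0 bandwidth 30M tbrsize 36K cbq
	class cbq fxp0 root_class NULL pbandwidth 100
	class cbq fxp0 def_class root_class borrow pbandwidth 95 default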
1.7.2 Tuning a Token-Bucket Regulator to Minimize Delay

If you are serious about minimizing the delay, you need to tune the
token rate and the bucket size.  The point here is to set the token
rate (bandwidth) to match the actual maximum transmission rate.  If
the token rate is higher than the transmission rate, packets can
accumulate in the device buffer, which increases delay for high
priority packets.

(1) measure the actual maximum throughput

Let's start with a simple FIFOQ config:

	[altq.conf]
	interface fxp0 bandwidth 100M fifoq

Run your favorite benchmark, and observe the throughput with
altqstat(1).  I use netperf with the following parameters:

	netperf -H <dest_host> -t TCP_STREAM -l 20 -- -s 56K -S 56K -m 8K

	[altqstat output]
	 q_len:17 q_limit:50 period:5158
	 xmit:17557 pkts (26571598 bytes) drop:0 pkts (0 bytes)
	 throughput: 66.23Mbps

	 q_len:25 q_limit:50 period:8374
	 xmit:28704 pkts (43448156 bytes) drop:0 pkts (0 bytes)
	 throughput: 67.17Mbps

The throughput is about 67Mbps.  Note that the benchmarking software
could report a smaller throughput since it reports the application
level throughput.  The throughput reported by altqstat(1) includes
TCP/IP/MAC headers, and this value should be used for the token rate.

(2) set the measured token rate

	[altq.conf]
	interface fxp0 bandwidth 67M fifoq
	                         ^^^

Repeat the measurement.

	[altqstat output]
	 q_len:34 q_limit:50 period:222
	 xmit:48625 pkts (73608550 bytes) drop:0 pkts (0 bytes)
	 throughput: 64.80Mbps

	 q_len:22 q_limit:50 period:287
	 xmit:59437 pkts (89977918 bytes) drop:0 pkts (0 bytes)
	 throughput: 65.15Mbps

This time, focus on the period counter.  The period counter of FIFOQ
is incremented every time the queue becomes empty.  Therefore, if the
period counter increases, the queue is not constantly backlogged.  It
suggests the bucket size is too large (provided that the TCP window
size is big enough).

(3) decrease the bucket size

	[altq.conf]
	interface fxp0 bandwidth 67M tbrsize 4K fifoq
	                             ^^^^^^^^^^

Repeat the measurement.

	[altqstat output]
	 q_len:34 q_limit:50 period:22
	 xmit:36567 pkts (55352738 bytes) drop:0 pkts (0 bytes)
	 throughput: 62.49Mbps

	 q_len:31 q_limit:50 period:22
	 xmit:46982 pkts (71121048 bytes) drop:0 pkts (0 bytes)
	 throughput: 62.76Mbps

As you can see, the period counter now stays at 22 (in exchange for
slightly lower throughput).  If "tbrsize" becomes too small, the
throughput will sharply degrade.

(4) use the parameters for other queueing disciplines

The following altq.conf sets up PRIQ (priority queueing) to give high
priority to ICMP.

	[altq.conf]
	interface fxp0 bandwidth 67M tbrsize 4K priq
	class priq fxp0 high_class NULL priority 1
	class priq fxp0 def_class NULL priority 0 default
	filter fxp0 high_class 0 0 0 0 1

Run ping(8) and the TCP benchmark at the same time, and see the delay
experienced by ping(8).  If you comment out the filter line, both
flows are put into the same default class.

1.8 ATM Driver

The ATM driver is based on bsdatm1.4 written by Chuck Cranor of
Washington University.  The ALTQ release includes enhancements to the
ATM driver (especially, pvc interface support).

1.9 Tun Driver

Although altq-3.0 supports the tun driver, it is a bit tricky to
control the tun device.  When ppp transmits packets, the tun
interface is not the bottleneck; the serial port is.  As a result,
packets are queued in the output queue of ppp and in the buffer in
the kernel, as shown in the following figure.  (A queued packet is
shown as X.)

	ppp app ---+              +-->[ XXX]--+
	           |              |           |     user
	           |              |           |
	 ==========|==============|===========|=====
	           |              |           |     kernel
	           |   +------+   |           |
	           +-->|      |---+           +->[ XX]--> sio
	               +------+                  line
	                tun0                      discipline

To control traffic at the tun interface, rate-limit the tun interface
with a token-bucket regulator to shift the queueing point to the tun
device, as shown in the following figure.

	ppp app ---+              +-->[    ]--+
	           |              |           |     user
	           |              |           |
	 ==========|==============|===========|=====
	           |              |           |     kernel
	           |   +------+   |           |
	           +-->| XXXXX|---+           +->[   ]--> sio
	               +------+^tbr              line
	                tun0                      discipline

Note that ppp usually compresses packets, so the throughput at the
tun interface will be much higher than the line rate.
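For example, a dial-up style link could be limited with tbrconfig(8)
(the 128K rate here is purely an illustration; pick something close
to your actual line rate, keeping the compression note above in
mind):

	# tbrconfig tun0 128K auto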
2. CBQ

2.1 CBQ Configuration File

Keep your altq.conf simple!  Most of the CBQ parameters are
automatically set by the system unless they are explicitly specified
in the configuration file.

Basic commands

Though there are many commands and options, all you need to use will
be the following commands and their options.

	interface if_name [bandwidth bps] cbq
	    (e.g., "interface fxp0 bandwidth 10M cbq")

	class cbq if_name class_name parent [borrow] [pbandwidth percent]
	      [red] [default|control]
	    (e.g., "class cbq fxp0 my_ftp tcp_class borrow pbandwidth 30 red")

	filter if_name class_name dst_addr dst_port src_addr src_port proto
	    (e.g., "filter fxp0 my_ftp 133.138.1.83 0 0 20 6")

The "interface" command sets up the interface.  Specify the interface
bandwidth in bits-per-second.

The "class" command creates a class.  Set the bandwidth of the class
with "pbandwidth" in percent of the interface bandwidth.  Set
"borrow" when the class can borrow bandwidth from its parent class.
Set "red" if you use the RED dropper (good for TCP).

The "filter" command sets a packet filter for a class.  A basic
filter uses a 5-tuple of dst_addr, dst_port, src_addr, src_port and
proto.  NOTE: dst comes first.  Set "0" for a field you don't care
about.

2.2 Filter-Matching Rule

The CBQ (and HFSC/PRIQ) classifier performs filter-matching for every
packet.  The classifier goes through the filters starting from the
last entry in the config file, which means you have to list a more
generic filter first in the config file.  For example, two filters,
one for all TCP and the other for HTTP, should be listed in the
following order.

	filter fxp0 TCP_class 0 0 0 0 6
	filter fxp0 HTTP_class 0 0 0 80 6

If the order is reversed, all HTTP packets match TCP_class first.  In
other words, the HTTP filter is a "subset" of the TCP filter: all
packets matched by the HTTP filter are matched by the TCP filter.

On the other hand, if two filters have different values in the same
field, no packet can match both filters.  Two such filters are
"disjoint".  For example, a packet has a single source port number
and never matches both of the following filters.

	filter fxp0 TELNET_class 0 0 0 23 6
	filter fxp0 HTTP_class 0 0 0 80 6

Another filter relation is "intersect".  If two filters have a shared
region (intersection) but neither is a subset of the other, the order
of applying the filters is very important.  An example is a filter by
destination address and a filter by source port; HTTP packets to
133.138.1.83 match both of the following filters.

	filter fxp0 my_class 133.138.1.83 0 0 0 6
	filter fxp0 HTTP_class 0 0 0 80 6

It is recommended to avoid the use of "intersecting" filters.

The last filter relation is a special case of "intersection", called
"port intersection", when two filters have the following relation:
	- the intersection is only in the port numbers
	- one specifies the src port and the other specifies the dst port
The well-known ports are used by system processes, and there should
be no packet with well-known port numbers in both the src and dst
ports, so we allow this special "intersection" and handle it
differently.  For example,

	filter fxp0 TELNET_class 0 23 0 0 6
	filter fxp0 HTTP_class 0 0 0 80 6

The altqd config file parser will
	- print an error message and exit when a "subset" filter is in
	  the wrong order.
	- print a warning message when an "intersecting" filter is
	  detected.  (This can be suppressed with the keyword
	  "dontwarn".)
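Putting the rules above together, a safe ordering for the example
filters from this section looks like the following sketch (the
generic TCP filter is listed first so that it is evaluated last; the
two port-based filters are disjoint, so their relative order does not
matter):

	filter fxp0 TCP_class    0 0 0 0  6
	filter fxp0 HTTP_class   0 0 0 80 6
	filter fxp0 TELNET_class 0 0 0 23 6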
2.3 Sample Setting

The following graph shows a sample class hierarchy in which traffic
is divided into 3 meta classes (bulk, interactive, misc).  The meta
classes are defined in order to control the hierarchical distribution
of the available bandwidth under congestion.  Filters are set only on
the leaf classes.

	                       root
	                        | (100%)
	          +-------------+-----------------+
	          |                               |
	      def_class                           |
	          | (95%)                         |
	   +------+-------+---------------+       |
	   |              |               |       |
	  bulk           misc            intr     |
	   | (30%)        | (30%)         | (30%) |
	  +-+---+----+    |           +---+---+   |
	  |     |    |    |           |       |   |
	 tcp   ftp  http  udp        dns   telnet ctl_class
	(10%) (10%)(10%) (10%)      (10%)  (10%)   (4%)

The corresponding altq.conf is listed below.

line 4:	the interface is sr0, the bandwidth is 1Mbps, use CBQ.
line 5:	create the root class.  Set "NULL" as the parent and "100"%
	as pbandwidth (bandwidth in percent).
line 9:	create the "control class" using the keyword "control".  The
	system uses the control class to send control packets (RSVP,
	ICMP, IGMP).  This rule is built-in and provided for backward
	compatibility; the feature will be removed in the future.  If
	a control class is not defined by the time the default class
	is created, the system will automatically create one with 2%
	bandwidth.  The bandwidth is taken out of the default class.
line 10: create the "default class".  If a packet doesn't match any
	filter, the packet is put into the default class.
line 12-14: create 3 meta classes as children of the default class.
	They can borrow bandwidth from the default class.
line 23: create a class for TCP as a child class of the "bulk"
	class.  Bandwidth can be borrowed from the parent.  Also,
	this class uses the RED dropper.
line 24: add a filter to the tcp class.  This filter matches all TCP
	packets (proto=6), and thus should be listed earlier than the
	other filters for packets using TCP.

     1	#
     2	# sample configuration file for 1Mbps link
     3	#
     4	interface sr0 bandwidth 1M cbq
     5	class cbq sr0 root NULL pbandwidth 100
     6	#
     7	# meta classes
     8	#
     9	class cbq sr0 ctl_class root pbandwidth 4 control
    10	class cbq sr0 def_class root borrow pbandwidth 95 default
    11	#
    12	class cbq sr0 bulk def_class borrow pbandwidth 30
    13	class cbq sr0 misc def_class borrow pbandwidth 30
    14	class cbq sr0 intr def_class borrow pbandwidth 30
    15	
    16	#
    17	# leaf classes
    18	#
    19	
    20	#
    21	# bulk data classes
    22	#
    23	class cbq sr0 tcp bulk borrow pbandwidth 10 red
    24	filter sr0 tcp 0 0 0 0 6		# other tcp
    25	class cbq sr0 ftp bulk borrow pbandwidth 10 red
    26	filter sr0 ftp 0 0 0 20 6		# ftp-data
    27	filter sr0 ftp 0 20 0 0 6		# ftp-data
    28	class cbq sr0 http bulk borrow pbandwidth 10 red
    29	filter sr0 http 0 0 0 80 6		# http
    30	filter sr0 http 0 80 0 0 6		# http
    31	#
    32	# misc (udp) classes
    33	#
    34	class cbq sr0 udp misc borrow pbandwidth 10 red
    35	filter sr0 udp 0 0 0 0 17		# other udp
    36	#
    37	# interactive classes
    38	#
    39	class cbq sr0 dns intr borrow pbandwidth 10 red
    40	filter sr0 dns 0 0 0 53 17
    41	filter sr0 dns 0 0 0 53 6
    42	class cbq sr0 telnet intr borrow pbandwidth 10 red
    43	filter sr0 telnet 0 0 0 23 6		# telnet
    44	filter sr0 telnet 0 23 0 0 6		# telnet
    45	filter sr0 telnet 0 0 0 513 6		# rlogin
    46	filter sr0 telnet 0 513 0 0 6		# rlogin

Some TIPS for CBQ settings:

I have to admit that it is tricky to set up CBQ parameters correctly.

 - Don't try to borrow too much; there are some technical
   difficulties.  For example, with parent: 90% and child: 2%, the
   child should be able to use up to 90%, but it may not work as you
   expect depending on other conditions involved.
 - Keep the depth of the leaf classes equal from the root class.

 - Setting a high priority on a class won't help much, and high
   priority has some side effects on the borrowing mechanism.  Don't
   use "priority" unless the link is slower than 1Mbps.

 - Don't expect accurate rate control.  CBQ has error margins of
   several percent against the REAL interface speed.

 - Use the "altqstat" tool to see the various statistics of a class.

In particular, I recommend using a 1000Hz timer for CBQ tests.
Although CBQ should work with a 100Hz timer, it is not easy to tune
CBQ for a wide range of CPUs and networks (speed, MTU, etc).

2.4 CBQ Setting for Slow Interfaces

There have been some difficulties in setting the right parameters
when the link is slow (say, less than 512Kbps).  If the default
doesn't work well, try "maxburst 2" or "maxburst 1".  Also, I
recommend assigning more than 10% of the link bandwidth to each
class.  Setting a high priority on an interactive class could improve
the response time.  See also "Token Bucket Regulator".

2.5 Shaping the Total Traffic of an Interface

CBQ is not designed to shape the total traffic.  (The original CBQ
design assumes that the bandwidth of the root class is bound by the
link bandwidth.)  Therefore, I recommend using link-layer technology
(e.g., the serial-line speed) to reduce the total traffic.

Having said that, if you still want to shape the total traffic, there
are two ways.  Don't expect accurate rate control; CBQ has error
margins of several percent against the REAL interface speed.

(1) Use a Class as a Shaper

Create a common ancestor class which limits the total bandwidth.
Limitations:
	- the minimum unit is 1% of the link bandwidth

	interface fxp0 bandwidth 10M cbq
	class cbq fxp0 root_class NULL pbandwidth 100
	# use def_class as a 1Mbps shaper.  don't borrow from root
	class cbq fxp0 def_class root_class pbandwidth 10 default
	class cbq fxp0 tcp_class def_class pbandwidth 5

(2) Set a Low Value for the Interface Bandwidth

Limitations:
	- real traffic becomes bursty.  (try "maxburst 2" or "maxburst 1")
	- borrowing may not work so well.

	# set 512Kbps for the 10Mbps interface
	interface fxp0 bandwidth 512K cbq
	class cbq fxp0 root_class NULL pbandwidth 100
	# don't borrow from the root class
	class cbq fxp0 def_class root_class pbandwidth 95 default
	class cbq fxp0 tcp_class def_class pbandwidth 50

See also "Token Bucket Regulator".

2.6 CBQ over 100baseT

CBQ shapes the outgoing traffic using the kernel timer.  A 1500-byte
MTU over 100Mbps is too much for the default 10msec kernel timer.
You need to use a 1msec timer instead.  See "Kernel Configuration
Options" for how to do it.

2.7 CBQ Monitoring

The altqstat program reports the statistics and the internal state of
the classes.  The output for a class looks like:

	Class 0 on Interface fxp0: priority: 1 depth: 0
	offtime: 8664 [us] wrr_allot: 1016 bytes
	nsPerByte: 2666 (3.00 Mbps),  Measured: 8.22 [Mbps]
	pkts: 9735,  bytes: 13381264
	overs: 9662, overactions: 520
	borrows: 9142, delays: 520
	drops: 2, drop_bytes: 3008
	QCount: 1, (qmax: 30)
	AvgIdle: -125 [us], (maxidle: 1843 minidle: -125 [us])

How to read it (you have to understand the mechanism of CBQ):

	priority: 1 depth: 0 offtime: 8664 [us] wrr_allot: 1016 bytes
  -->	the priority is 1, the depth is 0 (it's a leaf node), the
	offtime is 8664 usec, and the allotment of weighted round-robin
	is 1016 bytes.
	nsPerByte: 2666 (3.00 Mbps),  Measured: 8.22 [Mbps]
  -->	the bandwidth of the class is 3.00Mbps, and currently 8.22Mbps
	is being used.

	pkts: 9735,  bytes: 13381264
  -->	9735 packets were transmitted (13381264 bytes in total).

	overs: 9662, overactions: 520
  -->	the class exceeded the assigned bandwidth 9662 times;
	overlimit actions were called 520 times.

	borrows: 9142, delays: 520
  -->	the class borrowed bandwidth from its ancestors 9142 times,
	and was suspended for rate-control 520 times.

	drops: 2, drop_bytes: 3008
  -->	2 packets were dropped (3008 bytes in total).

	QCount: 1, (qmax: 30)
  -->	the current queue length is 1 (the limit is 30).

	AvgIdle: -125 [us], (maxidle: 1843 minidle: -125 [us])
  -->	the current average idle is -125 usec (maxidle is 1843 usec
	and minidle is -125 usec).

2.8 Limitations of Borrowing

The borrowing mechanism is nice, but it has limitations:

1. A small class cannot borrow the entire bandwidth of its parent.
   A class gets suspended when it is overlimit, and a smaller class
   has a longer suspension period (offtime).  When borrowing is
   enabled, a child also borrows the offtime of the parent, but when
   the parent also gets overlimit, the child has to use its own
   offtime to avoid overloading the system.  (Otherwise, all the
   classes would use the minimum offtime even under a heavy load.)
   As a result, a small class is not able to make full use of the
   bandwidth of the parent.

2. Competing TCPs equally share the bandwidth even when their
   bandwidth allocations are not equal.  When borrowing is enabled,
   the bandwidth allocation is enforced only when the queues have
   enough backlog, but TCPs can reach equilibrium without creating
   backlog in the queues.  In this case, the bandwidth share is set
   by the TCP mechanism, not by CBQ.  If there are many TCP flows,
   TCP will not be able to reach equilibrium and the allocation will
   be done by CBQ.

3. UDP beats TCP when both are set to borrow from the same parent.
   The situation is similar to case 2: TCP backs off before the
   allocation is enforced by CBQ.  Again, if there are many flows,
   the situation will be improved.

4. UDP is very bursty when borrowing is enabled.  As explained in
   case 1, a child has to be suspended longer when the parent gets
   overlimit.  This leads to the bursty behavior of UDP.  On the
   other hand, TCP adapts better to avoid overloading the parent.

3. HFSC

3.1 HFSC Basics

The following paper is a good reference, but it is a bit too
theoretical, so the HFSC basics are summarized here.

	"A Hierarchical Fair Service Curve Algorithm for Link-Sharing,
	Real-Time and Priority Service"
	Ion Stoica, Hui Zhang, and T. S. Eugene Ng.  SIGCOMM'97.

More information is available from
http://www.cs.cmu.edu/~hzhang/HFSC/main.html

Service Curve:

HFSC maintains 2 service curves; one for the real-time criteria and
the other for the link-sharing criteria.

A service curve of HFSC consists of 2 segments.  "m1" and "m2" are
the slopes of the 2 segments, and "d" is the x-projection of their
intersection, which specifies the length of the 1st segment.
Intuitively, "m2" specifies the long-term throughput guaranteed to a
flow, while "m1" specifies the rate at which a burst is served.  When
the slope of the 1st segment is larger than that of the 2nd segment,
the curve is called "concave".  A service curve is either convex or
concave.

	               m2
	         ________--------                     ________--------
	        /                                    /       m2
	       /                                    /
	      /  m1                                /
	     /                                    /    m1 = 0
	    /                          __________/
	    <-d->                       <-- d --->

	      concave                      convex

A concave service curve provides a bounded burst, similar to a token
bucket.  The triangular area made by the 1st segment roughly
corresponds to the depth of a token bucket, and the slope bounds the
peak rate.
The difference is that the peak rate of a token bucket is an upper
bound on the sending rate and is often set to the wire speed.  On the
other hand, HFSC guarantees the rate defined by the 1st segment, and
thus it cannot be the wire speed.

A convex service curve, on the other hand, suppresses the initial
traffic volume.  "m1" of a convex curve must be 0 in the current
implementation.

A linear service curve is a special case of a convex curve with a
NULL 1st segment.  A linear service curve corresponds to the
traditional virtual clock model and is a good starting point for
novice users.

Virtual Time:

Each class keeps the total byte count it has already sent.  When a
class is backlogged, a virtual time (vt) is calculated for the packet
at the head of the class queue.  vt is the x-projection of the
service curve corresponding to (total + packet_len).  As a result,
the vt of a class monotonically increases.  By scheduling the packet
with the smallest vt, the bandwidth allocation becomes proportional
to the service curve slope of each class.

	  bytes |                         /
	        |                        /  service curve
	        |                       /
	 next -->+     +---------------+
	 packet  |     |              /|
	 length  |     |             / |
	         |     |            /  |
	 total -->+    +-----------+   |
	 bytes   |    /|           |   |
	 already |   / |           |   |
	 sent    |  /  |           |   |
	         | /   |           |   |
	 --------+-----+-----------+---+--------------> time
	                           ^   ^
	                     vt for    vt for
	                     previous  next
	                     packet    packet

A service curve is updated every time a class becomes backlogged.
The update operation takes the minimum of (1) the service curve used
in the previous backlogged period and (2) the original service curve
starting at (current_time, total_bytes).  When a class has been idle
long enough, the updated curve is equal to (2).  On the other hand,
when the class has been using much more bandwidth than its share, the
updated curve is equal to (1).  (1) and (2) can intersect when the
class has been using a little less bandwidth than its share; in this
case, the updated curve could have a different value of "d".  The
operation is illustrated in the following figures.  It might be
easier to see it as a half-filled token bucket.

	            ________                       ________--------
	           /        ______                /
	          /               ________---+----
	         /                           /
	        /                 total ->  / + new coordinate
	       /                           /
	      service curve of             + current time
	      previous period

	                    Update Operation

	                     ______
	                    /      ________--------
	                +----
	               /
	   total ->   + new coordinate
	              |
	              + current time

	                    New Service Curve

HFSC Scheduling:

HFSC has 2 independent scheduling mechanisms.  Real-time scheduling
is used to guarantee the delay and the bandwidth allocation at the
same time.  Hierarchical link-sharing is used to distribute the
available excess bandwidth.  When dequeueing a packet, HFSC always
tries real-time scheduling first.  If no packet is eligible for
real-time scheduling, link-sharing scheduling is performed.  HFSC
does not use the class hierarchy for real-time scheduling.

Hierarchical Link-sharing:

In HFSC, only leaf classes have real packets, but the vt of an
intermediate class is also maintained by summing up the total byte
count used by its descendants.  When dequeueing a packet, HFSC's
hierarchical scheduler walks through the class hierarchy from the
root to a leaf class.  At each level of the class hierarchy, the
scheduler selects the class with the smallest vt among its child
classes.  When the scheduler reaches a leaf class, that leaf class is
scheduled.  Note that the scheduler looks at only the direct children
at each level.  Thus, the bandwidth allocation is proportional to the
service curve slopes among sibling classes, but is not proportional
among classes with different parents.
For example, the 4 leaf classes in the following figure will have the
same bandwidth allocation although A, B and C, D have different
slopes.  In other words, the ratio among siblings controls the
bandwidth allocation, and the absolute slope values do not matter.
(Similarly, vt values need to be consistent only among sibling
classes.)

	            root (100Mbps)
	                 |
	         +-------+-------+
	         |               |
	     E (20Mbps)      F (20Mbps)
	         |               |
	     +---+---+       +---+---+
	     |       |       |       |
	     A       B       C       D
	 (10Mbps) (10Mbps) (1Mbps) (1Mbps)

Real-time Scheduling:

As opposed to link-sharing scheduling, a single consistent time is
used for real-time scheduling.  Each class keeps a cumulative byte
count that is similar to the total byte count but counts only the
packets scheduled by real-time scheduling.  HFSC computes the
eligible time and the deadline for each class; they are the
x-projections of the head and tail of the next packet.  A class
becomes eligible for real-time scheduling when the current time
becomes greater than the eligible time of the class.  The real-time
scheduler selects the class with the smallest deadline among the
eligible classes.

	  bytes |                          /
	        |                         /  service curve
	        |                        /
	  next -->+     +---------------+
	 packet   |     |              /|
	 length   |     |             / |
	          |     |            /  |
	 cumulative ->+ +-----------+   |
	 bytes    |    /|           |   |
	 already  |   / |           |   |
	 sent     |  /  |           |   |
	          | /   |           |   |
	 ---------+-----+-----------+---+--------------> time
	                            ^   ^
	                     eligible   deadline
	                       time

In the original HFSC paper, a single service curve is used for both
real-time scheduling and link-sharing scheduling.  We have extended
HFSC to have independent service curves for real-time and
link-sharing.  Decoupling the service curves allows the guaranteed
rate and the distribution of excess bandwidth to be controlled
independently.  For example, it is possible to guarantee a minimum
bandwidth of 2Mbps to 2 classes while distributing the excess
bandwidth with a different ratio.

It is also possible to set either of the service curves to 0.  When
the real-time service curve is 0, a class receives only excess
bandwidth.  When the link-sharing service curve is 0, a class cannot
receive excess bandwidth.  Note that 0 link-sharing makes the class
non-work conserving.

Note that link-sharing scheduling alone can guarantee the assigned
bandwidth as long as the real-time service curve is equal to or
smaller than the link-sharing service curve for all classes.  But if
the link-sharing service curve is smaller, the assigned link-sharing
bandwidth may not be provided.

3.2 Notes on HFSC

Root class
	The root class is automatically created by the interface
	command.  Both service curves are initialized with a linear
	curve of the interface speed.

Default class
	One default (leaf) class is required for an interface.  Only
	leaf classes can have filters because, due to the hierarchical
	link-sharing algorithm, only leaf classes can have packets.
	Thus, you will need to explicitly create a leaf class to
	represent an intermediate class.  For example, in order to
	distribute the bandwidth of CMU to its departments, create a
	leaf class "other" and attach a filter for CMU to this leaf
	class.

	           |
	          CMU
	           |
	   +-------+-------+
	   |       |       |
	   CS      EE    other

Admission Control
	The sum of the service curves of the children should be less
	than the service curve of the parent.  You especially have to
	be careful when assigning concave service curves, since the
	sum of the peak rates could be large.

Reserved real-time bandwidth
	Many network cards are not able to saturate the wire, and if
	we allocate more real-time traffic than the actual maximum
	transmission rate, all classes become eligible and HFSC is no
	longer able to meet the delay-bound requirements.
	Thus, 20% of the real-time bandwidth is reserved for safety.
	Link-sharing does not have reserved bandwidth.

Specifying service curves in altq.conf
	To specify the same service curve for both real-time and
	link-sharing, use
		[sc <m1> <d> <m2>]
	To specify only the real-time service curve, use
		[rt <m1> <d> <m2>]
	To specify only the link-sharing service curve, use
		[ls <m1> <d> <m2>]

	The keywords "pshare" and "grate" are shorthand expressions
	for specifying a linear service curve.  "pshare" specifies a
	linear link-sharing service curve by percentage of the
	interface bandwidth; "pshare <percent>" is equivalent to
	"[ls 0 0 m2]" where
		m2 = interface_bandwidth * percent / 100
	"grate" specifies a linear real-time service curve;
	"grate <rate>" is equivalent to "[rt 0 0 m2]".

Shaping by HFSC
	A NULL link-sharing service curve can be used to limit the
	bandwidth of a class.  When the link-sharing service curve is
	zero, packets beyond the assigned real-time rate remain in the
	queue until the class becomes eligible again.  Shaping by HFSC
	is more accurate than by CBQ because HFSC does not use a
	suspension period (called offtime in CBQ) and the packet
	length is taken into account.  Note that the delay requirement
	cannot be guaranteed in shaping mode since, in the worst case,
	an eligible packet could be held until the next timer tick.
	Also note that, if a NULL link-sharing service curve is
	assigned to a parent class, its children cannot have
	link-sharing either.

3.3 Sample Configuration

	#
	# hfsc configuration for hierarchical sharing
	#
	interface pvc0 bandwidth 45M hfsc
	#
	# (10% of the bandwidth share goes to the default class)
	class hfsc pvc0 def_class root pshare 10 default
	#
	#	bandwidth share	guaranteed rate
	# CMU:	45%		15Mbps
	# PITT:	45%		15Mbps
	#
	class hfsc pvc0 cmu  root pshare 45 grate 15M
	class hfsc pvc0 pitt root pshare 45 grate 15M
	#
	# CMU	bandwidth share	guaranteed rate
	#  CS:	20%		10Mbps
	#  other: 20%		5Mbps
	#
	class hfsc pvc0 cmu_other cmu pshare 20 grate 10M
	filter pvc0 cmu_other 0 0 128.2.0.0 netmask 0xffff0000 0 0
	class hfsc pvc0 cmu_cs cmu pshare 20 grate 5M
	filter pvc0 cmu_cs 0 0 128.2.242.0 netmask 0xffffff00 0 0
	#
	# PITT	bandwidth share	guaranteed rate
	#  CS:	20%		10Mbps
	#  other: 20%		5Mbps
	#
	class hfsc pvc0 pitt_other pitt pshare 20 grate 10M
	filter pvc0 pitt_other 0 0 136.142.0.0 netmask 0xffff0000 0 0
	class hfsc pvc0 pitt_cs pitt pshare 20 grate 5M
	filter pvc0 pitt_cs 0 0 136.142.79.0 netmask 0xffffff00 0 0
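The sample above uses only the "pshare"/"grate" shorthand.  As an
illustration of the explicit two-segment syntax described in 3.2 (a
hypothetical class; the numbers are made up, and "d" is assumed here
to be given in milliseconds), a class that may burst at 10Mbps for
20ms but is guaranteed 5Mbps long-term, with a 3Mbps linear
link-sharing curve, could be written as:

	class hfsc pvc0 video_class root [rt 10M 20 5M] [ls 0 0 3M]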
4. RSVP

Too complex to list here.  rsvpd works only on FreeBSD, which
implements a special hook for RSVP.  RSVP routers need to intercept
RSVP signaling packets, and the normal IP stack does not have a way
to intercept packets not destined to itself.  (The router alert IP
option was introduced later for this purpose, but it is not
implemented in the BSD stack.)

But here is a simple test scenario using rtap.

	sender: 172.16.3.17 port 9001
	destination: 224.100.100.5 port 9000
	rate 4000B/s (~32Kbps)

	sender ------------ router ------- receiver
	(172.16.3.17)

    sender                                  receiver
    -------------------------------------------------------------
    # dest udp 224.100.100.5/9000           # dest udp 224.100.100.5/9000
    # sender 9001 [t 4000 1000 5000 100]
        (src-port r b p m)
    **** start sending path message ****
                                            # receive
                                              (join the multicast group)
                                            **** start receiving path message ****
                                            # reserve ff 172.16.3.17/9001 \
                                                  [cl 4000 1000 100 5000]
                                              (reserve by fixed-filter)
                                            (or)
                                              reserve wf \
                                                  [cl 4000 1000 100 5000]
                                              (reserve by wildcard-filter)
                                            **** start sending resv message ****
    **** start receiving resv message ****
                                            # close
                                            **** ResvTear message ****
    **** ResvTear message ****
    # close
    **** PathTear message ****
                                            **** PathTear message ****

5. RED (Random Early Detection)

To enable simple RED at a router, specify "red" in altq.conf(5).

	interface fxp0 bandwidth 10M red

RED parameters can be specified as follows:

	interface fxp0 bandwidth 10M red thmin 10 thmax 20 invpmax 15 qlimit 80

Note that RED never shapes the traffic by itself.

To test RED, run multiple TCP streams from src to sink.

	% altqstat

will show the statistics (among other things, the average queue
length).

Experimental ECN (explicit congestion notification) support for IPv4
has been included since altq-0.4.3.
 - To enable ECN with RED, add "ecn" to the "interface" command line.

RED can also be enabled on a CBQ/HFSC/PRIQ class.

6. ECN (Explicit Congestion Notification)

ECN needs support in both routers and end-hosts.  ECN support in
routers is a straightforward modification to RED: set a "Congestion
Experienced" bit in the IP header instead of dropping a packet.  ECN
support in end-hosts needs modifications to TCP.  See README.ecn for
more information.

7. Diffserv

ALTQ supports

1. traffic conditioning at ingress interfaces
	CDNR (conditioner) supports
	 - token-bucket meter
	 - two-rate three color marker
	 - time-sliding window three color marker
	(ALTQ no longer supports a meter/tagger at an output interface.)

2. preferential scheduling at egress interfaces
	RIO has been extended to support 3 drop precedence values.
	Combine RIO and CBQ/HFSC/PRIQ to support multiple classes.

You can build AF (Assured Forwarding) and EF (Expedited Forwarding)
services using ALTQ.  The following figure shows what ALTQ can do for
diffserv.

	                    diffserv network
	             +------------------------------+
	             |                              |
	src ----> ingress --------> core --------> egress ----> sink
	         diffedge                         diffedge

	     - MF classifier    - BA classifier   - BA classifier
	     - meter/marker     - meter/marker    - meter/marker
	     - AF/EF PHB        - AF/EF PHB       - AF/EF PHB
	                                          - clear dscp

Sample configuration files can be found in
"altqd/altq.conf.samples/".  Also see altq.conf(5).

Note that the performance of the AF and EF services depends on how
those services are provisioned and how the network is configured.
ALTQ just provides mechanisms with which to build services.
References

	RFC 2474  Definition of the Differentiated Services Field
	          (DS Field) in the IPv4 and IPv6 Headers
	RFC 2475  An Architecture for Differentiated Services
	RFC 2597  Assured Forwarding PHB Group
	RFC 2598  An Expedited Forwarding PHB
	RFC 2697  A Single Rate Three Color Marker
	RFC 2698  A Two Rate Three Color Marker
	draft-ietf-diffserv-model-00.txt
	          A Conceptual Model for Diffserv Routers

Sample configuration for a traffic conditioner

	#
	# null interface command
	#
	interface pvc1
	#
	# simple dropper
	#
	conditioner pvc1 dropper
	filter pvc1 dropper 0 0 172.16.4.173 0 0
	#
	# simple marker to clear dscp
	#
	conditioner pvc1 clear_marker
	filter pvc1 clear_marker 0 0 172.16.4.174 0 0
	#
	# EF style conditioner (a simple token bucket)
	#
	conditioner pvc1 ef_cdnr >
	filter pvc1 ef_cdnr 0 0 172.16.4.176 0 0
	#
	# AF style conditioners (trTCM)
	#
	conditioner pvc1 af1x_cdnr \
		colorblind>
	filter pvc1 af1x_cdnr 0 0 172.16.4.177 0 0
	#
	# color-blind trTCM is equivalent to a dual token-bucket meter
	#
	conditioner pvc1 dual_tb \
		>>
	filter pvc1 dual_tb 0 0 172.16.4.178 0 0

Sample queueing configuration using HFSC

	#
	# output interface
	#
	interface pvc0 bandwidth 45M hfsc
	class hfsc pvc0 def_class root pshare 10 default
	#
	# EF class
	#	real-time:    6Mbps
	#	link-sharing: 0%
	#
	class hfsc pvc0 ef_class root grate 6M
	filter pvc0 ef_class 0 0 0 0 0 tos 0xb8 tosmask 0xfc
	#
	# AF classes
	#	real-time:    3Mbps
	#	link-sharing: 10% (4.5Mbps)
	#
	# rio threshold values
	#	rio 40 50 10  20 30 10  5 15 10
	#
	class hfsc pvc0 af1x_class root grate 3M pshare 10 rio
	class hfsc pvc0 af2x_class root grate 3M pshare 10 rio
	class hfsc pvc0 af3x_class root grate 3M pshare 10 rio cleardscp
	class hfsc pvc0 af4x_class root grate 3M pshare 10 rio
	filter pvc0 af1x_class 0 0 0 0 0 tos 0x20 tosmask 0xe4
	filter pvc0 af2x_class 0 0 0 0 0 tos 0x40 tosmask 0xe4
	filter pvc0 af3x_class 0 0 0 0 0 tos 0x60 tosmask 0xe4
	filter pvc0 af4x_class 0 0 0 0 0 tos 0x80 tosmask 0xe4

Similar queueing configuration using CBQ

	#
	# output interface
	#
	interface pvc0 bandwidth 45M cbq
	class cbq pvc0 root_class NULL pbandwidth 100
	class cbq pvc0 def_class root_class borrow pbandwidth 86 default
	#
	# EF class
	#
	class cbq pvc0 ef_class root_class pbandwidth 14 priority 5
	filter pvc0 ef_class 0 0 0 0 0 tos 0xb8 tosmask 0xfc
	#
	# AF classes
	#
	# rio threshold values
	#	rio 40 50 10  20 30 10  5 15 10
	#
	class cbq pvc0 af1x_class def_class borrow pbandwidth 20 rio
	class cbq pvc0 af2x_class def_class borrow pbandwidth 20 rio
	class cbq pvc0 af3x_class def_class borrow pbandwidth 20 rio cleardscp
	class cbq pvc0 af4x_class def_class borrow pbandwidth 20 rio
	filter pvc0 af1x_class 0 0 0 0 0 tos 0x20 tosmask 0xe4
	filter pvc0 af2x_class 0 0 0 0 0 tos 0x40 tosmask 0xe4
	filter pvc0 af3x_class 0 0 0 0 0 tos 0x60 tosmask 0xe4
	filter pvc0 af4x_class 0 0 0 0 0 tos 0x80 tosmask 0xe4

8. WFQ

WFQ (weighted fair queueing) is implemented as a sample
implementation.  WFQ is easy to use since it requires no
configuration for the default setting.  By default, WFQ allocates 256
queues, and packets are mapped into one of the queues by hashing the
destination address.  So, packets for the same host will be put in
the same queue.

To enable WFQ on interfaces "vx0" and "vx1", add the following lines
to your altq.conf(5).

	interface vx0 bandwidth 10M wfq
	interface vx1 bandwidth 10M wfq

The following command can be used to monitor the WFQ statistics.

	% altqstat -i vx1

9. FIFOQ

FIFOQ (first-in first-out queueing) is implemented as a template for
those who want to write their own queueing schemes on the ALTQ
framework.
So, there would be no reason to use FIFOQ unless you want to modify
the FIFOQ implementation.

Using FIFOQ

To enable FIFOQ on interface "vx0", add the following line to your
altq.conf(5).

	interface vx0 bandwidth 6M fifoq

Alternatively, you can use the daemon process "fifoqd" in
"legacy-tools".

	# fifoqd -d vx0

10. JoBS (Joint Buffer Management and Scheduling)

The JoBS queueing scheme was contributed by Nicolas Christin
(nicolas@cs.virginia.edu).  This implementation is currently
considered EXPERIMENTAL.

10.1 JoBS Basics

The following two papers are good references, but may be a bit
theoretical, so the basics of JoBS are summarized here.

	"JoBS: Joint Buffer Management and Scheduling for
	Differentiated Services"
	Jorg Liebeherr and Nicolas Christin.  IWQoS'01.

	"A Quantitative Assured Forwarding Service"
	Nicolas Christin, Jorg Liebeherr and Tarek F. Abdelzaher.
	UVA-CS Technical Report CS-2001-21.  Short version to appear
	in Infocom 2002.

More information is available from http://qosbox.cs.virginia.edu

Overview:

As its name indicates, JoBS is a joint buffer management and
scheduling algorithm.  It provides, on a per-hop basis, absolute and
proportional service guarantees to traffic aggregates (henceforth
referred to as "classes" of traffic).  The following types of
guarantees are supported:

 - absolute throughput guarantees (ARC)
	e.g., Class-1 throughput >= 5 Mbps
 - absolute delay guarantees (ADC)
	e.g., Class-2 delay <= 3 ms
 - absolute loss guarantees (ALC)
	e.g., Class-1 loss rate <= 0.5 %
 - proportional delay guarantees (RDC)
	e.g., Class-3 delay/Class-2 delay is roughly equal to 2
 - proportional loss guarantees (RLC)
	e.g., Class-4 loss rate/Class-3 loss rate is roughly equal to 2

The acronyms differ from the names of the guarantees for historical
reasons.  Any mix of service guarantees can be enforced by JoBS.
Service guarantees are offered to backlogged classes, and are valid
over the current busy period.  The beginning of the current busy
period is defined as the last time the output queue of the interface
was empty.

Mechanisms:

JoBS uses the following mechanisms: a service rate is allocated to
each class of traffic.  Upon each packet arrival, the service rate is
adjusted to meet the delay and throughput constraints.  If no
feasible rate allocation satisfies the delay and throughput
constraints, traffic is dropped according to the loss guarantees
specified.

JoBS does not perform admission control or traffic policing.
Instead, if the set of service guarantees becomes infeasible (which
may be the case when some absolute guarantees are offered), some
service guarantees are relaxed.  In this prototype implementation,
the following order of relaxation is observed:

	1. Relax RLC and/or RDC.
	2. Relax ARC.
	3. Relax ADC.
	4. Relax ALC.

An algorithm similar to Deficit Round-Robin is used to convert the
rate allocations into packet scheduling decisions.

Remarks:

JoBS is a work-conserving scheduler.  In other words, if the output
link is idle and a packet is backlogged, the backlogged packet is
transmitted at once, REGARDLESS of the service guarantees specified.
Hence, at low loads, the work-conserving property may result in
improper proportional delay differentiation.  On the other hand, at
low loads, all classes get very low delays, and thus a high-grade
service.  Arguably, proportional delay differentiation is only needed
at times of overload.

By design, JoBS attempts to minimize the number of packets dropped.
This ALTQ implementation of JoBS offers two modes of operation:

 - shared buffer: all classes are backlogged in the same queue.  If
   the queue length exceeds a given threshold, or if no feasible
   service rate allocation can satisfy the delay/throughput
   guarantees, packets are dropped.  The shared buffer mode is the
   default, and is required to provide loss differentiation.

 - per-class buffers: a separate buffer is associated with each
   class.  Per-class buffers are useful when ONLY throughput and
   delay guarantees are desired.  Per-class buffers cannot be used
   to provide loss differentiation.

Note that JoBS does NOT check whether the set of service guarantees
offered is feasible.  While some examples are trivial (e.g.,
guaranteeing a throughput exceeding the output link capacity), some
other cases may be trickier.  For instance, giving an ARC to all
classes, with a shared buffer and no loss guarantees, will
essentially result in FIFO queueing, and the service guarantees
offered will NOT be respected.

This can be explained as follows.  Assume we have two classes, an
output link capacity of 10 Mbps, and we want to give 7 Mbps to Class
1 and 3 Mbps to Class 2.  After a short amount of time, Class 2
packets will end up filling the buffer.  Incoming Class 1 packets
will thus end up being dropped, and the input rate (i.e., arrival
rate - drop rate) of Class 1 will be limited by the "slow" Class 2
packets still in the buffer.  This problem is a cousin of the
traditional "Head-Of-Line" blocking problem.  Using TCP sources makes
it even more obvious, since TCP sources reduce their sending rates
when detecting a packet drop.  Thus, if only rate guarantees are to
be supported, you need to use SEPARATE buffers instead of the default
shared buffer.

10.2 Sample Configuration Examples

Note: If you want to try JoBS over the loopback interface, please be
aware that, due to the complexity of the queueing scheme, you may not
get the expected results if you are also using the machine to
generate traffic.  These examples produce the desired results when
used on a Pentium III 1 GHz, but we cannot make any promises for
slower CPUs.  (As a matter of fact, testing JoBS on a Pentium II 450
MHz showed that the machine had trouble generating enough traffic to
saturate a 100 Mbps virtual link on the loopback when JoBS was
running as a queueing discipline.)  Thus, if you are using a slow
processor, we recommend that you try with a 10 Mbps token bucket
limiter, and modify Example 1 accordingly.

Example 1:

	# Configuration for a 100 Mbps output link (fxp1),
	# Separate buffers with a limit of 50 packets each,
	# throughput guarantees for all classes,
	# no delay or loss guarantees.
	#
	interface fxp1 bandwidth 100M qlimit 50 separate jobs
	#
	class jobs fxp1 high_class NULL priority 0 adc -1 rdc -1 alc -1 rlc -1 arc 39M
	class jobs fxp1 med2_class NULL priority 1 adc -1 rdc -1 alc -1 rlc -1 arc 29M
	class jobs fxp1 med1_class NULL priority 2 adc -1 rdc -1 alc -1 rlc -1 arc 19M
	class jobs fxp1 low_class NULL priority 3 default adc -1 rdc -1 alc -1 rlc -1 arc 9M

	filter fxp1 high_class 10.0.4.2 0 0 0 0
	filter fxp1 med2_class 10.0.5.2 0 0 0 0
	filter fxp1 med1_class 10.0.6.2 0 0 0 0
	filter fxp1 low_class 10.0.7.2 0 0 0 0

Example 2:

	# Configuration for a 100 Mbps output link (fxp1),
	# Shared buffer with a limit of 200 packets,
	# Delay bound of 2000 microseconds on Class 0,
	# Loss rate bound of 0.5 % on Class 0,
	# Proportional differentiation as follows:
	#	Class 3-Delay = 2 * Class 2-Delay
	#	Class 4-Delay = 2 * Class 3-Delay
	# and
	#	Class 3-Loss Rate = 2 * Class 2-Loss Rate
	#	Class 4-Loss Rate = 2 * Class 3-Loss Rate
	#
	interface fxp1 bandwidth 100M qlimit 200 jobs
	#
	class jobs fxp1 high_class NULL priority 0 adc 2000 rdc -1 alc 0.005 rlc -1 arc -1
	class jobs fxp1 med2_class NULL priority 1 adc -1 rdc 2 alc -1 rlc 2 arc -1
	class jobs fxp1 med1_class NULL priority 2 adc -1 rdc 2 alc -1 rlc 2 arc -1
	class jobs fxp1 low_class NULL priority 3 default adc -1 rdc 2 alc -1 rlc 2 arc -1

	filter fxp1 high_class 10.0.4.2 0 0 0 0
	filter fxp1 med2_class 10.0.5.2 0 0 0 0
	filter fxp1 med1_class 10.0.6.2 0 0 0 0
	filter fxp1 low_class 10.0.7.2 0 0 0 0

10.3 JoBS altqstat Module

The JoBS altqstat module reports the following output.  The first
column is the class index.  The second column is the average queueing
delay (in microseconds) experienced over the last five seconds, and
the third column is the ratio of the class-(i+1) delay to the class-i
delay.  The fourth column is the percentage of packets that missed
their deadline (due to constraint relaxation) since the beginning of
time.  The fifth column is the loss rate (in percent) of the class.
The sixth column is the throughput obtained by each class in Mbps.
The seventh column is the arrival rate (offered load) of each class
in Mbps.

The eighth column is the best-case number of cycles consumed by the
enqueue() function, the ninth column is the average number of cycles
consumed by the enqueue() function, the tenth column is the standard
deviation of the number of cycles consumed by the enqueue() function,
and the eleventh column is the worst-case number of cycles consumed
by the enqueue() function.  Columns 12 to 15 give the same
information for the number of cycles consumed by the dequeue()
function.  The sixteenth and seventeenth columns are the total number
of packets enqueued and dequeued since the beginning of time,
respectively.  A value of -1 means "Not Applicable".
For example, one can get:

 fxp1:
 pri del   rdc   viol  p_i   rlc   thru   off_ld bc_e avg_e stdev_e wc_e  bc_d avg_d stdev_d wc_d  nr_en  nr_de
 3   51968 1.98  0.000 34.60 2.01  12.344 18.39  2970 15618 3398    47674 1400 3728  981     32245 199920 176436
 2   26218 1.89  0.000 17.25 2.00  14.923 18.77  2970 15618 3398    47674 1400 3728  981     32245 199920 176436
 1   13851 13.61 0.000 8.64  17.32 35.035 37.08  2970 15618 3398    47674 1400 3728  981     32245 199920 176436
 0   1018  -1.00 0.897 0.50  -1.00 35.530 35.70  2970 15618 3398    47674 1400 3728  981     32245 199920 176436

which shows, among other things, that over the last 5 seconds, class
1 packets were queued for 1018 microseconds on average, that the
ratio of the class-3 delay to the class-2 delay was 1.98, and that
the average number of cycles consumed by the enqueue operation was
15618 cycles.  In this example the loss rate of class 3 is extremely
high (34%).  This is due to the fact that we brutally overloaded the
link with UDP traffic.

11. IPv6

ALTQ supports IPv6 and has been used over IPv6 since mid 1998.

12. Troubleshooting

1. Kernel Configuration:

Q. The "opt_altq.h" file is missing.
A. Somehow, you messed up the kernel source tree.  Start with a fresh
   source tree and use the config file named "ALTQ".  See "Kernel
   Configuration Options".

2. CBQ/HFSC/PRIQ:

Q. altqd doesn't start.
	altqd: can't open altq device: No such file or directory
A. /dev/altq/cbq is missing.  Use MAKEDEV.altq to create the ALTQ
   devices.

Q. altqd doesn't start.
	syscall error: can't add cbq on interface 'lo0': Device not configured
A. ALTQ doesn't support this driver.  You'd better find a different
   interface card type.

Q. altqd doesn't start.
	CBQ open: Operation not supported by device
A. Your kernel doesn't have the CBQ module, or altqd failed to load
   the KLD module.  The kernel patch might have failed.  Don't forget

	% find altq_kernel_src_path -name "*.rej" -print

   when you apply the altq patch.  An ALTQ-ready kernel shows the
   following line at boot:
	"altq: major number is 96"
   "strings /kernel | grep altq" should print the following line:
	"altq: major number is %d"

Q. altqd doesn't start.
	syscall error: can't add cbq on interface 'fxp0': Device busy
A. Another altqd or another ALTQ related daemon is already running on
   this interface, and you started another altqd.

Q. CBQ reports
	warning: filter for "foo_class" at line 58 could override
	filter for "bar_class" at line 27
A. The two CBQ filters have an "intersection".  See "Filter-Matching
   Rule".

Q. CBQ reports
	filters for "foo_class" at line 58 and for "bar_class" at
	line 57 has order problem!
A. The order in which you put the filters is wrong.  See
   "Filter-Matching Rule".

Q. CBQ reports
	warning: class is too slow!!
A. It's a warning message saying that the allocated bandwidth is
   below the precision of the internal calculation and the value is
   rounded up to the minimum value.  CBQ has a limitation on the
   bandwidth assigned to a class: the minimum bandwidth is 6Kbps, to
   avoid 32-bit integer overflow in the internal calculation.  See
   "CBQ Setting for Slow Interfaces".

Q. CBQ doesn't work as I expected.
A. It is not easy to track down problems.  My rules of thumb to track
   down problems:
	- watch out for possible interference: the CPU or the link
	  could get saturated before queueing takes place.
	- start with a simple setting, and add complexity step by
	  step.
	- use "altqstat" to get the statistics and the internal state
	  of CBQ.
	- try a kernel with a fine-grained timer value.  If the
	  problem is gone, there must be some granularity mismatch.
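When hunting down the problems above, it often helps to start with a
few sanity checks using the commands referenced in this document (a
sketch; paths assume FreeBSD with the ALTQ distribution installed as
described earlier):

	# ls /dev/altq			# ALTQ devices created by MAKEDEV.altq
	# kldstat -v | grep altq	# discipline modules currently loaded
	# strings /kernel | grep altq	# "altq: major number is %d" on an ALTQ-ready kernel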
13. Coverage (incomplete)

diffserv
	RFC 2474 (default PHB and Class Selector PHBs)
	RFC 2597 (Assured Forwarding PHB Group)
	RFC 2598 (Expedited Forwarding PHB)
	RFC 2697 (A Single Rate Three Color Marker)
	RFC 2698 (A Two Rate Three Color Marker)

RED
	RFC 2309

ECN
	RFC 3168