Notes on the new ALTQ implementation

				Kenjiro Cho
				July 2000
				last update: $Date: 2006/09/28 03:00:40 $

    The BSD systems need better output queueing abstraction to support
packet scheduling (e.g., CBQ) or active queue management (e.g., RED).
To introduce a new model, we need to convert the existing code to be
conformant to the new model.  But the problem is that there are too
many drivers to convert all at once.

    This is a proposal that allows incremental transition to the
new model.  (If we are going to modify the existing drivers, we need
to get it right.)
The model is designed for ALTQ but it is general enough for other
implementations so that we can make the driver conversion once and
for all.

The new model removes direct references to the fields
within ifp->if_snd, and defines the following macros to manipulate
ifp->if_snd:
	IFQ_ENQUEUE(ifq, m, err)
	IFQ_DEQUEUE(ifq, m)
	IFQ_POLL(ifq, m)
	IFQ_PURGE(ifq)
	IFQ_IS_EMPTY(ifq)
The new model also enforces some rules regarding how to use these
macros.

Another requirement for a driver is to work under rate-limiting.
 - IFQ_DEQUEUE() could return NULL even when IFQ_IS_EMPTY() is FALSE
   under rate-limiting.  a driver should always check if (m == NULL).
 - a driver is supposed to call if_start from the tx complete interrupt
   under rate-limiting (in order to trigger the next dequeue).

For most drivers, it is a simple task of replacing old-style lines by
the corresponding new-style lines, and usually just a few lines need
to be modified.  But some drivers need more than that.
The old-style drivers still work with the original FIFO queue but
they cannot take advantage of new queueing disciplines.

For locking an output queue to support SMP, ALTQ uses the same model
as in FreeBSD-5.0.  One restriction is that, if a driver uses
poll-and-dequeue, the driver needs to explicitly lock the queue
between the poll operation and the dequeue operation.

Contents
	1 Problems in the current code
	  1.1 Lack of enough abstraction in queue operations
	  1.2 Negative effects of long DMA chains
	  1.3 Summary
	2 Design overview
	  2.1 System model
	  2.2 output queue structure
	  2.3 queue operations
	  2.4 classifier
	  2.5 tokenbucket regulator
	3 Details
	  3.1 compatibility with the existing code
	  3.2 struct ifaltq
	  3.3 if_output
	4 How to convert the existing drivers
	5 Queueing Disciplines
	6 SMP fine-grained locking support
	7 Summary

1 Problems in the current code

1.1 Lack of enough abstraction in queue operations

(1) Drop-Tail assumption

The existing code assumes the Drop-Tail policy, that is, the arriving
packet is dropped when the queue is full.  But the decision to drop and
the selection of a victim packet should be made by a queueing discipline.

For non-Drop-Tail policies, it is difficult to separate the drop
operation from the enqueue operation without knowledge of the
structure of the queueing discipline.
Thus, the drop operation needs to be part of the enqueue operation.

(2) Lack of poll operation

Some drivers poll at the head of the queue to see if the driver has enough
resources (e.g., buffer space and/or DMA descriptors) for the next packet.  
A typical peek operation found in drivers looks like:

    while (ifp->if_snd.ifq_head != NULL) {
        /*
         * get resources to send a packet
         */

        IF_DEQUEUE(&ifp->if_snd, m);

        /*
         * DMA setup and kick the device
         */
    }

Those drivers directly access if_snd.ifq_head using different methods
since no standard procedure is defined for a poll operation.
A queueing discipline could have multiple queues, or could be about to 
dequeue a packet other than the one at the head of the queue.
Therefore, a poll operation that returns the next packet without
dequeueing it should be part of the generic queueing operations.

(3) Prepend operation assuming FIFO

On the other hand, IF_PREPEND() is currently defined to add a packet
at the head of the queue, but the prepend operation is intended for a
FIFO queue and should not be included in the generic queueing operations.

There are drivers which use IF_PREPEND() to put back a dequeued packet
when something goes wrong.  However, for some queueing disciplines, it
is not so simple to cancel a dequeue operation once the internal state
is updated.  Such a driver should be modified to use a
poll-and-dequeue method instead of a dequeue-and-prepend method.

(4) Lack of purge operation

A dequeue loop is often used in drivers to empty the queue.
However, a non-work conserving queue cannot be emptied by this method
since a packet is not dequeued until its departure time.
Therefore, a purge operation should be defined and drivers
should be modified to use the defined purge operation to empty the
queue.
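
The difference can be seen in a small user-space model.  Everything
below (toy_pkt, toy_queue, the per-packet departure time) is
hypothetical and only mimics a non-work conserving discipline; it is
not ALTQ code.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_QCAP	8

struct toy_pkt {
	long	depart;			/* earliest departure time (ticks) */
};

struct toy_queue {
	struct	toy_pkt *slot[TOY_QCAP];
	int	head;			/* index of the oldest packet */
	int	len;			/* number of packets stored */
};

static int
toy_enqueue(struct toy_queue *q, struct toy_pkt *p)
{
	assert(p != NULL);
	if (q->len == TOY_QCAP)
		return (-1);
	q->slot[(q->head + q->len) % TOY_QCAP] = p;
	q->len++;
	return (0);
}

/* returns NULL while the head packet has not reached its departure time */
static struct toy_pkt *
toy_dequeue(struct toy_queue *q, long now)
{
	struct toy_pkt *p;

	if (q->len == 0)
		return (NULL);
	p = q->slot[q->head];
	if (p->depart > now)
		return (NULL);
	q->head = (q->head + 1) % TOY_QCAP;
	q->len--;
	return (p);
}

/* old-style "empty the queue" loop; returns the number of packets removed */
static int
toy_drain_by_dequeue(struct toy_queue *q, long now)
{
	int n = 0;

	while (toy_dequeue(q, now) != NULL)
		n++;
	return (n);
}

/* purge ignores departure times; a real purge would also free the mbufs */
static int
toy_purge(struct toy_queue *q)
{
	int n = q->len;

	q->head = 0;
	q->len = 0;
	return (n);
}
```

In this model a dequeue loop run before the first departure time
removes nothing, while toy_purge() always leaves the queue empty.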

1.2 Negative effects of long DMA chains

Many modern network cards support chained DMA, typically, up to 128 or
256 entries.
Most network drivers are written to buffer as many packets as possible
in order not to under-utilize the link and to reduce the number of
interrupts.
However, it creates a long waiting queue after packets are scheduled
by the packet scheduler.
The device buffer has an effect of inserting another FIFO queue
beneath a queueing discipline.

(1) Delay caused by a large buffer

An obvious problem is that, even if the packet scheduler tries to
minimize the delay for a certain packet, the packet needs to wait in
the device buffer for hundreds of packets to be drained.
Thus, delay cannot be controlled if there is a large buffer in the
network card.
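
To get a feel for the numbers, the following user-space sketch computes
how long a newly scheduled packet waits behind a full device buffer.
The function name and the sample figures (256 descriptors, 1500-byte
packets, a 10Mbps link) are illustrative assumptions and are not taken
from any particular driver.

```c
#include <assert.h>

/*
 * Time (in microseconds) needed to drain `npkts` packets of `pktlen`
 * bytes from the device buffer on a link of `bps` bits per second.
 * A packet handed to the device after them waits at least this long.
 */
static unsigned long
txbuf_delay_usec(unsigned long npkts, unsigned long pktlen,
    unsigned long bps)
{
	unsigned long long bits;

	assert(bps > 0);
	/* 64-bit arithmetic so that the multiplication does not overflow */
	bits = (unsigned long long)npkts * pktlen * 8ULL;
	return ((unsigned long)(bits * 1000000ULL / bps));
}
```

With these assumed values, txbuf_delay_usec(256, 1500, 10000000)
yields 307200 usec, i.e., roughly 0.3 seconds spent below the packet
scheduler.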

(2) Bursty dequeues

Another less obvious but more serious problem is that packets are
moved from the queue to the device buffer in a very bursty manner.
If the queue gets emptied when a large chunk of packets are dequeued
at a time, the packet scheduler loses control.  
A packet scheduler is effective when there are backlogged packets in
the queue.

These problems are invisible under FIFO, and thus, most drivers are
not written to limit the number of packets in the transmission buffer.
However, the problem becomes apparent when preferential scheduling is
used.

1.3 Summary

In summary,
 (1) we need to introduce a new abstraction of an output queue
 (2) we need a way to limit the burst size that a driver can dequeue at
     a time
We cannot avoid introducing some changes to the existing drivers.

2 Design overview

The new model provides an abstraction of an output queue, and defines
operations to manipulate queues.
It also provides a device-independent mechanism to limit the size of
dequeue burst.
(In the previous ALTQ, there was no clear separation between
abstraction and implementation.  Also, drivers were modified
independently to limit the burst size.  The new model tries to solve
these issues.)

In order to allow incremental support of drivers, the current
structure (struct ifqueue), and macros (IF_ENQUEUE(), IF_DEQUEUE())
are not changed.
(struct ifqueue is used in many other places)

In the new model, a new output queue structure, struct ifaltq, is
defined.  Macros for the new queue have a prefix "IFQ_".

2.1 System model

The new queue model consists of 3 independent components.
    Classifier: classifies a packet to a scheduling class based on
	predefined rules.
    Queueing Discipline: implements packet scheduling and buffer
	management algorithms.
    Queue Regulator: limits the amount that a driver can dequeue at a
	time.

The following figure illustrates the relation of these components.
3 operations (classify, enqueue, dequeue) are used to let the
components work on a packet.

   ---> classifier ------> queueing ------>  queue    ------> (driver)
             |          |  discipline        regulator |
             |          |                              |
            CLASSIFY   ENQUEUE                        DEQUEUE

In the BSD protocol stack, classify and enqueue are called from
if_output, and dequeue is called from if_start.

        ip_output
            |
	if_output
		- classify   <-----> classifier structure
		- enqueue    ---+    (the classifier result
                                |     is passed to enqueue)
                                |
                                +--> queue structure
                                          |
	if_start                          |
		- dequeue    <------ regulator structure

2.2 output queue structure

The new output queue structure, struct ifaltq, can be divided into 5
groups.
 (1) fields compatible with struct ifqueue
 (2) alternate queueing discipline related fields
 (3) classifier fields
 (4) token bucket regulator fields
 (5) input traffic conditioner fields (doesn't belong to the output queue...)

2.3 queue operations

The following output queue operations are defined.

	IFQ_ENQUEUE(ifq, m, err)
	  - enqueues a packet (m) to the queue (ifq).
	    if it fails, ENOBUFS is set to (err).
	IFQ_DEQUEUE(ifq, m)
	  - dequeues a packet (m) from the queue (ifq).
	IFQ_POLL(ifq, m)
	  - returns the next packet (m) without removing it from the queue
	IFQ_IS_EMPTY(ifq)
	  - TRUE if the queue is empty
	IFQ_PURGE(ifq)
	  - discards all the packets in the queue
	IFQ_CLASSIFY(ifq, m, af, pktattr)
	  - classify a packet to a scheduling class, and set the
	    result to pktattr.

	other minor macros:
	IFQ_SET_MAXLEN(ifq, len)
	 - sets len to (ifq)->ifq_maxlen.
	IFQ_INC_LEN(ifq)
	IFQ_DEC_LEN(ifq)
	 - increments or decrements (ifq)->ifq_len.
	IFQ_INC_DROPS(ifq)
	 - same as IF_DROP(ifq).
	    (I really want to move this field into struct if_data...)
	IFQ_SET_READY(ifq)
	 - set a flag to indicate this driver is converted to the new model. 

2.4 classifier

A classifier maps a packet to a scheduling class.
In the current design, a classifier is called in if_output, and the
result (scheduling class) is passed to the enqueue operation.
Packet classification needs to be done before prepending link-level
headers.  (variable length link-level headers make it hard to look
into the IP header.)

It could be possible in the future to integrate the classifier into a
part of the existing IP packet filter, and tag the result to the mbuf
in order to pass the result to the queueing discipline.  
But I still don't have a clear idea how to modify the packet filter
and the mbuf structure to do that...

2.5 tokenbucket regulator

The purpose of the token bucket regulator is to limit the amount of
packets that a driver can dequeue.
A token bucket has "rate" and "size".  Tokens accumulate in a bucket
at the average "rate", up to the bucket "size".
A driver can dequeue a packet as long as there are positive tokens,
and after a packet is dequeued, the size of the packet is subtracted
from the tokens.
(note that this implementation allows the token to be negative as a
deficit, and differs from a typical token bucket that compares the
packet size with the remaining tokens beforehand.)

The token bucket regulator is implemented as a wrapper function of the
dequeue operation.  A simplified version of tbr_dequeue looks like:

struct mbuf *
tbr_dequeue(ifq)
	struct ifaltq *ifq;
{
	struct tb_regulator *tbr = ifq->altq_tbr;
	struct mbuf *m;

	update_token(tbr);
	if (tbr->tbr_token <= 0)
		return (NULL);
	if (ALTQ_IS_ENABLED(ifq))
		ALTQ_DEQUEUE(ifq, m);
	else
		IF_DEQUEUE(ifq, m);
	if (m)
		tbr->tbr_token -= m->m_pkthdr.len;
	return (m);
}
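
update_token() is not shown above.  One possible shape, following the
description of "rate" and "size", is modeled below in user space.  The
struct and function names are made up for illustration, time is counted
in abstract ticks, and the real implementation may differ.

```c
#include <assert.h>

struct toy_tbr {
	long	rate;		/* bytes added per tick */
	long	depth;		/* bucket size in bytes */
	long	token;		/* current tokens; may go negative */
	long	last;		/* tick of the last update */
};

/* add the tokens earned since the last update, capped at the bucket size */
static void
toy_update_token(struct toy_tbr *t, long now)
{
	assert(now >= t->last);
	t->token += (now - t->last) * t->rate;
	if (t->token > t->depth)
		t->token = t->depth;
	t->last = now;
}

/*
 * Returns 1 if a packet of pktlen bytes may be dequeued at time `now`.
 * The packet size is not compared with the tokens beforehand; it is
 * simply subtracted afterwards, so the tokens can become a deficit.
 */
static int
toy_tbr_take(struct toy_tbr *t, long now, long pktlen)
{
	toy_update_token(t, now);
	if (t->token <= 0)
		return (0);
	t->token -= pktlen;
	return (1);
}
```

Starting from a full 1000-byte bucket at 100 bytes per tick, a
1500-byte packet is allowed at tick 0 and leaves a deficit of 500
bytes; the next packet has to wait until the deficit is paid back.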

It is important to understand the roles of "rate" and "size".
The bucket size controls the amount of burst that can be dequeued at a
time, and controls a greedy device trying to dequeue as many packets as
possible.  This is the primary purpose of the token bucket regulator
in ALTQ.  Thus, the rate should be set to the wire speed.  (even if
the rate is set to a larger value, it does not matter much since our
focus is excessive bursts.)

On the other hand, if the rate is set to a smaller value than the wire
speed, the token bucket regulator becomes a shaper that limits the
long-term output rate.
Another important point is that, when the rate is set to more than the
actual transfer speed, tx complete interrupts can trigger the next
dequeue.  However, if the rate is smaller, the rate limit would be
still in effect at the tx complete interrupt, and the rate limiting
falls back to the kernel timer to trigger the next dequeue.  In order
to achieve the target rate under timer-driven rate limiting, the
bucket size should be increased to fill the timer interval.
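
As a rough rule of thumb under this reading, the bucket has to hold at
least the bytes that the target rate would send during one timer
interval.  The helper below is an illustrative sketch; its name and the
sample timer frequencies are assumptions, and the real code may use a
different formula.

```c
#include <assert.h>

/*
 * Smallest bucket size (in bytes) that covers one timer interval:
 *	rate [bits/sec] / 8 [bits/byte] / hz [ticks/sec]
 */
static unsigned long
toy_tbr_min_depth(unsigned long rate_bps, unsigned long hz)
{
	assert(hz > 0);
	return (rate_bps / 8 / hz);
}
```

For example, shaping to 10Mbps with a 100Hz timer gives 12500 bytes,
a little more than eight 1500-byte packets per tick.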

I have had difficulties in debugging ALTQ to distinguish problems in
drivers from problems in disciplines since they interfere with each
other.  The token bucket regulator allows us to tackle the
problems independently.

(A token bucket regulator uses a high resolution clock on i386 (TSC)
and alpha (PCC) but uses microtime() on other platforms or on SMP.)

3 Details

3.1 compatibility with the existing code

In order to keep compatibility with the existing code, the new
output queue structure (ifaltq) has the same fields not to break
IF_XXX macros and direct references to the fields within if_snd.
(Once we finish conversions of all the drivers, we no longer need
these fields.)

            ##old-style##                           ##new-style##
                                       |
 struct ifqueue {                      | struct ifaltq {
    struct mbuf *ifq_head;             |    struct mbuf *ifq_head;
    struct mbuf *ifq_tail;	       |    struct mbuf *ifq_tail;
    int          ifq_len;              |    int          ifq_len;
    int          ifq_maxlen;           |    int          ifq_maxlen;
    int          ifq_drops;            |    int          ifq_drops;
 };                                    |    /* altq related fields */
                                       |    ......
                                       | };
                                       |

The new structure replaces struct ifqueue in struct ifnet.

            ##old-style##                           ##new-style##
                                       |
 struct ifnet {                        | struct ifnet {
     ....                              |     ....
                                       |
     struct ifqueue if_snd;            |     struct ifaltq if_snd;
                                       |
     ....                              |     ....
 };                                    | };
                                       |

The (simplified) new IFQ_XXX macros look like:

	#ifdef ALTQ
	#define IFQ_DEQUEUE(ifq, m)			\
		if (ALTQ_IS_ENABLED((ifq)))		\
			ALTQ_DEQUEUE((ifq), (m));	\
		else					\
			IF_DEQUEUE((ifq), (m));
	#else
	#define IFQ_DEQUEUE(ifq, m)	IF_DEQUEUE((ifq), (m));
	#endif

3.2 struct ifaltq

Here is the complete ifaltq structure.

/*
 * Structure defining a queue for a network interface.
 */
struct	ifaltq {
	/* fields compatible with struct ifqueue */
	struct	mbuf *ifq_head;
	struct	mbuf *ifq_tail;
	int	ifq_len;
	int	ifq_maxlen;
	int	ifq_drops;

	/* alternate queueing related fields */
	int	altq_type;		/* discipline type */
	int	altq_flags;		/* flags (e.g. ready, in-use) */
	void	*altq_disc;		/* for discipline-specific use */
	struct	ifnet *altq_ifp;	/* back pointer to interface */

	int	(*altq_enqueue) __P((struct ifaltq *ifq, struct mbuf *m,
				     struct altq_pktattr *));
	struct	mbuf *(*altq_dequeue) __P((struct ifaltq *ifq, int remove));
	int	(*altq_request) __P((struct ifaltq *ifq, int req, void *arg));

	/* classifier fields */
	void	*altq_clfier;		/* classifier-specific use */
	void	*(*altq_classify) __P((void *, struct mbuf *, int));

	/* token bucket regulator */
	struct	tb_regulator *altq_tbr;

	/* input traffic conditioner (doesn't belong to the output queue...) */
	struct top_cdnr *altq_cdnr;
};

3.3 if_output

(1) enqueue operation

The semantics of the enqueue operation is changed.  In the new style,
the enqueue and packet drop are combined since they cannot be easily
separated in many queueing disciplines.
The new enqueue operation corresponds to the following macro written
with the old macros.

#define	IFQ_ENQUEUE(ifq, m, err)					\
do {									\
	if (IF_QFULL((ifq))) {						\
		m_freem((m));						\
		(err) = ENOBUFS;					\
		IF_DROP(ifq);						\
	} else {							\
		IF_ENQUEUE((ifq), (m));					\
		(err) = 0;						\
	}								\
} while (0)


IFQ_ENQUEUE() does the following:
	1. queues a packet
	2. drops (and m_freem()s) the packet if the enqueue fails

	when the enqueue fails,
		IFQ_ENQUEUE sets error (ENOBUFS)
		mbuf is freed <-- differs from the current IF_ENQUEUE
	DO NOT TOUCH mbuf after IFQ_ENQUEUE().
		need to store m->m_pkthdr.len or m->m_flags beforehand.
	DO NOT use senderr(ENOBUFS) since mbuf was already freed.

The new style if_output looks as follows:

            ##old-style##                           ##new-style##
                                       |
 int                                   | int 
 ether_output(ifp, m0, dst, rt0)       | ether_output(ifp, m0, dst, rt0)
 {                                     | {
     ......                            |     ......
                                       |
                                       |     mflags = m->m_flags;
                                       |     len = m->m_pkthdr.len;
     s = splimp();                     |     s = splimp();
     if (IF_QFULL(&ifp->if_snd)) {     |     IFQ_ENQUEUE(&ifp->if_snd, m,
                                       |                 error);
         IF_DROP(&ifp->if_snd);        |     if (error != 0) {
         splx(s);                      |         splx(s);
     senderr(ENOBUFS);             |         return (error);
     }                                 |     }
     IF_ENQUEUE(&ifp->if_snd, m);      |
     ifp->if_obytes +=                 |     ifp->if_obytes += len;
                    m->m_pkthdr.len;   |
     if (m->m_flags & M_MCAST)         |     if (mflags & M_MCAST)
         ifp->if_omcasts++;            |         ifp->if_omcasts++;
                                       |
     if ((ifp->if_flags & IFF_OACTIVE) |     if ((ifp->if_flags & IFF_OACTIVE)
         == 0)                         |         == 0)
         (*ifp->if_start)(ifp);        |         (*ifp->if_start)(ifp);
     splx(s);                          |     splx(s);
     return (error);                   |     return (error);
                                       |
 bad:                                  | bad:
     if (m)                            |     if (m)
         m_freem(m);                   |         m_freem(m);
     return (error);                   |     return (error);
 }                                     | }
                                       |

(2) classifier

The classifier mechanism is currently implemented in if_output().
struct altq_pktattr is used to store the classifier result, and it is
passed to the enqueue function.
Because the classifier part is still not in the final form, the
classifier related codes are enclosed by "#ifdef ALTQ".
(changing the classifier implementation does not need to touch the
drivers.)

int
ether_output(ifp, m0, dst, rt0)
{
	......
	struct altq_pktattr pktattr;

	......

	/* classify the packet before prepending link-headers */
	IFQ_CLASSIFY(&ifp->if_snd, m, dst->sa_family, &pktattr);

	/* prepend link-level headers */
	......

	IFQ_ENQUEUE(&ifp->if_snd, m, &pktattr, error);

	......
}

4 How to convert the existing drivers

First, make sure the corresponding if_output is already converted to
the new style.

Look for "if_snd" in the driver.  Probably, you need to make changes
to the lines that include "if_snd".

(1) empty check

If the code checks "ifq_head" to see whether the queue is empty or not,
use IFQ_IS_EMPTY().

            ##old-style##                           ##new-style##
                                       |
 if (ifp->if_snd.ifq_head != NULL)     | if (!IFQ_IS_EMPTY(&ifp->if_snd))
                                       |

Note that IFQ_POLL() can be used for the same purpose, but IFQ_POLL()
could be costly for a complex scheduling algorithm since the
IFQ_POLL() needs to run the scheduling algorithm to select the next
packet.
On the other hand, IFQ_IS_EMPTY() checks only if there is any packet
stored in the queue.

(2) dequeue operation

Replace IF_DEQUEUE() by IFQ_DEQUEUE().
 - ALWAYS CHECK whether the dequeued mbuf is NULL or not.
   Note that even when IFQ_IS_EMPTY() is FALSE, IFQ_DEQUEUE() could
   return NULL due to the rate limit.

            ##old-style##                           ##new-style##
                                       |
 IF_DEQUEUE(&ifp->if_snd, m);          | IFQ_DEQUEUE(&ifp->if_snd, m);
                                       | if (m == NULL)
                                       |     return;
                                       |

 - a driver is supposed to call if_start from the tx complete interrupt
   under rate-limiting (in order to trigger the next dequeue).
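
The two points above can be modeled in user space as follows.  All
names (toy_ifq, toy_start, toy_txintr) are hypothetical, and the
`refill` argument merely stands in for the tokens that the real
regulator would accumulate with time.

```c
#include <assert.h>
#include <stddef.h>

struct toy_pkt {
	long	len;			/* packet length in bytes */
};

struct toy_ifq {
	struct	toy_pkt *pkts;		/* array of queued packets */
	int	head;			/* index of the next packet */
	int	len;			/* packets still queued */
	long	token;			/* regulator tokens (bytes) */
};

#define	TOY_IS_EMPTY(q)	((q)->len == 0)

/* dequeue gated by the regulator; NULL when out of tokens or empty */
static struct toy_pkt *
toy_dequeue(struct toy_ifq *q)
{
	struct toy_pkt *m;

	if (q->token <= 0 || q->len == 0)
		return (NULL);
	m = &q->pkts[q->head++];
	q->len--;
	q->token -= m->len;
	return (m);
}

/* model of if_start; returns the number of packets handed to hardware */
static int
toy_start(struct toy_ifq *q)
{
	struct toy_pkt *m;
	int sent = 0;

	while (!TOY_IS_EMPTY(q)) {
		m = toy_dequeue(q);
		if (m == NULL)		/* rate limited: stop, do not spin */
			break;
		sent++;			/* "kick the hardware" */
	}
	return (sent);
}

/* model of the tx complete interrupt: call start again */
static int
toy_txintr(struct toy_ifq *q, long refill)
{
	assert(refill >= 0);
	q->token += refill;
	return (toy_start(q));
}
```

Without the NULL check, the loop in toy_start() would never terminate
once the tokens run out while packets are still queued.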

(3) poll-and-dequeue

If the code polls the packet at the top of the queue and actually uses
it before dequeueing it, use IFQ_POLL() and IFQ_DEQUEUE().

            ##old-style##                           ##new-style##
                                       |
 m = ifp->if_snd.ifq_head;             | IFQ_POLL(&ifp->if_snd, m);
 if (m != NULL) {                      | if (m != NULL) {
                                       |
     /* use m to get resources */      |     /* use m to get resources */
     if (something goes wrong)         |     if (something goes wrong)
         return;                       |         return;
                                       |
     IF_DEQUEUE(&ifp->if_snd, m);      |     IFQ_DEQUEUE(&ifp->if_snd, m);
                                       |
     /* kick the hardware */           |     /* kick the hardware */
 }                                     | }
                                       |

It is guaranteed that IFQ_DEQUEUE() immediately after IFQ_POLL()
returns the same packet.  (They need to be guarded by splimp() if
called from other than if_start.)

Do NOT FREE mbuf before it is dequeued.  (some drivers try to
m_copydata() and m_freem() to fit a packet into a single mbuf.)

(4) eliminating IF_PREPEND

If the code uses IF_PREPEND(), you have to eliminate it.
A common use of IF_PREPEND() is to cancel the previous dequeue
operation.  You have to convert the logic into poll-and-dequeue.

            ##old-style##                           ##new-style##
                                       |
 IF_DEQUEUE(&ifp->if_snd, m);          | IFQ_POLL(&ifp->if_snd, m);
 if (m != NULL) {                      | if (m != NULL) {
                                       |
     if (something_goes_wrong) {       |     if (something_goes_wrong) {
         IF_PREPEND(&ifp->if_snd, m);  |
         return;                       |         return;
     }                                 |     }
                                       |
                                       |     /* at this point, the driver
                                       |      * is committed to send this
                                       |      * packet.
                                       |      */
                                       |     IFQ_DEQUEUE(&ifp->if_snd, m);
                                       |
     /* kick the hardware */           |     /* kick the hardware */
 }                                     | }
                                       |

(5) purge operation

Use IFQ_PURGE() to empty the queue.
Note that a non-work conserving queue cannot be emptied by a dequeue
loop.

            ##old-style##                           ##new-style##
                                       |
 while (ifp->if_snd.ifq_head != NULL) {|  IFQ_PURGE(&ifp->if_snd);
     IF_DEQUEUE(&ifp->if_snd, m);      |
     m_freem(m);                       |
 }                                     |
                                       |

(6) attach routine

Use IFQ_SET_MAXLEN() to set a value to "ifq_maxlen".
Add IFQ_SET_READY() to show this driver is converted to the new style.
(this is used to distinguish new-style drivers.)

            ##old-style##                           ##new-style##
                                       |
                                       | IFQ_SET_READY(&ifp->if_snd);
 if_attach(ifp);                       | if_attach(ifp);
                                       |


(7) other issues

            ##old-style##                           ##new-style##
                                       |
 ifp->if_snd.ifq_maxlen = qsize;       | IFQ_SET_MAXLEN(&ifp->if_snd, qsize);
                                       |
 IF_DROP(&ifp->if_snd);                | IFQ_INC_DROPS(&ifp->if_snd);
                                       |
 ifp->if_snd.ifq_len++;                | IFQ_INC_LEN(&ifp->if_snd);
                                       |
 ifp->if_snd.ifq_len--;                | IFQ_DEC_LEN(&ifp->if_snd);
                                       |

 - the fxp driver instructs the hardware to invoke tx complete
   interrupts only when it thinks necessary.
   the tokenbucket regulator breaks its assumption.
 - the de driver has tulip_ifstart_one() that dequeues only one
   packet.  the tokenbucket regulator breaks its assumption when to
   switch to this code.

(8) How to convert drivers using multiple ifqueues

Some (pseudo) devices (such as slip) have another ifqueue to
prioritize packets.  It is possible to eliminate the second queue
since ALTQ provides a more flexible mechanism, but the following shows
how to keep the original behavior.

struct sl_softc {
	struct	ifnet sc_if;		/* network-visible interface */
	...
	struct	ifqueue sc_fastq;	/* interactive output queue */
	...
};

The driver doesn't compile in the new model since it has the following
line (if_snd is no longer of type struct ifqueue).

	struct ifqueue *ifq = &ifp->if_snd;

A simple way is to use the original IF_XXX macros for "sc_fastq" and
use the new IFQ_XXX macros for "if_snd".
The enqueue operation looks like:

            ##old-style##                           ##new-style##
                                       |
 struct ifqueue *ifq = &ifp->if_snd;   | struct ifqueue *ifq = NULL;
                                       |
 if (ip->ip_tos & IPTOS_LOWDELAY)      | if ((ip->ip_tos & IPTOS_LOWDELAY) &&
     ifq = &sc->sc_fastq;              | !ALTQ_IS_ENABLED(&sc->sc_if.if_snd)) {
                                       |     ifq = &sc->sc_fastq;
 if (IF_QFULL(ifq)) {                  |     if (IF_QFULL(ifq)) {
     IF_DROP(ifq);                     |         IF_DROP(ifq);
     m_freem(m);                       |         m_freem(m);
     splx(s);                          |         error = ENOBUFS;
     sc->sc_if.if_oerrors++;           |     } else {
     return (ENOBUFS);                 |         IF_ENQUEUE(ifq, m);
 }                                     |         error = 0;
 IF_ENQUEUE(ifq, m);                   |     }
                                       | } else {
                                       |     IFQ_ENQUEUE(&sc->sc_if.if_snd, 
                                       |                 m, error);
                                       | }
                                       | if (error) {
                                       |     splx(s);
                                       |     sc->sc_if.if_oerrors++;
                                       |     return (error);
                                       | }
 if ((sc->sc_oqlen =                   | if ((sc->sc_oqlen =
      sc->sc_ttyp->t_outq.c_cc) == 0)  |      sc->sc_ttyp->t_outq.c_cc) == 0)
     slstart(sc->sc_ttyp);             |     slstart(sc->sc_ttyp);
 splx(s);                              | splx(s);
                                       |

The dequeue operation looks like:

            ##old-style##                           ##new-style##
                                       |
 s = splimp();                         | s = splimp();
 IF_DEQUEUE(&sc->sc_fastq, m);         | IF_DEQUEUE(&sc->sc_fastq, m);
 if (m == NULL)                        | if (m == NULL)
     IF_DEQUEUE(&sc->sc_if.if_snd, m); |     IFQ_DEQUEUE(&sc->sc_if.if_snd, m);
 splx(s);                              | splx(s);
                                       |

5 Queueing Disciplines

Queueing disciplines
 - need to maintain "ifq_len".  (used by IFQ_IS_EMPTY())
 - need to guarantee the same mbuf is returned if IFQ_DEQUEUE()
   is called immediately after IFQ_POLL().
 - can rely on the token bucket regulator to call if_start
   appropriately.
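
One way a discipline could meet the first two requirements is to cache
the result of the poll.  The two-class priority discipline below is a
hypothetical user-space model (none of the names exist in ALTQ); it
keeps a total length like "ifq_len" and remembers which class was
polled so that the following dequeue returns the same packet.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_NCLASS	2		/* class 0 has higher priority */
#define TOY_QCAP	4

struct toy_pkt {
	int	id;
};

struct toy_prio {
	struct	toy_pkt *q[TOY_NCLASS][TOY_QCAP];
	int	head[TOY_NCLASS];
	int	cnt[TOY_NCLASS];
	int	len;			/* total packets, like ifq_len */
	struct	toy_pkt *polled;	/* packet returned by the last poll */
	int	polled_class;
};

static void
toy_init(struct toy_prio *d)
{
	memset(d, 0, sizeof(*d));
}

static int
toy_enqueue(struct toy_prio *d, int cls, struct toy_pkt *p)
{
	assert(cls >= 0 && cls < TOY_NCLASS);
	if (d->cnt[cls] == TOY_QCAP)
		return (-1);
	d->q[cls][(d->head[cls] + d->cnt[cls]) % TOY_QCAP] = p;
	d->cnt[cls]++;
	d->len++;
	return (0);
}

/* the "scheduling algorithm": first non-empty class, or -1 */
static int
toy_select(const struct toy_prio *d)
{
	int c;

	for (c = 0; c < TOY_NCLASS; c++)
		if (d->cnt[c] > 0)
			return (c);
	return (-1);
}

static struct toy_pkt *
toy_poll(struct toy_prio *d)
{
	int c = toy_select(d);

	if (c < 0)
		return (NULL);
	d->polled = d->q[c][d->head[c]];
	d->polled_class = c;
	return (d->polled);
}

static struct toy_pkt *
toy_dequeue(struct toy_prio *d)
{
	struct toy_pkt *p;
	int c;

	/* honor a pending poll instead of re-running the scheduler */
	c = (d->polled != NULL) ? d->polled_class : toy_select(d);
	if (c < 0)
		return (NULL);
	p = d->q[c][d->head[c]];
	d->head[c] = (d->head[c] + 1) % TOY_QCAP;
	d->cnt[c]--;
	d->len--;
	d->polled = NULL;
	return (p);
}
```

The cache makes this model slightly stronger than required: the polled
packet is returned even if a higher priority packet is enqueued
between the poll and the dequeue.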

6 SMP fine-grained locking support

FreeBSD-5.0 has introduced SMP fine-grained locking.
A mutex is added to struct ifqueue, and IF_ENQUEUE() and IF_DEQUEUE()
lock the queue while manipulating the queue.

ALTQ follows this convention, that is,  IFQ_ENQUEUE() and IFQ_DEQUEUE()
lock the queue in the same way.  The poll-and-dequeue operations,
however, need to lock the queue between the two operations.  In this
case, a driver needs to explicitly lock and unlock the queue.  A
driver should avoid poll-and-dequeue if possible.

The following macros are used to lock and unlock a queue.
	IFQ_LOCK(ifq)
	IFQ_UNLOCK(ifq)

The queue operations without locking are used with explicit locking.
	IFQ_ENQUEUE_NOLOCK(ifq, m, pattr, err)
	IFQ_DEQUEUE_NOLOCK(ifq, m)
	IFQ_POLL_NOLOCK(ifq, m)
	IFQ_PURGE_NOLOCK(ifq)

            ##new-style##                           ##SMP-version##
                                       |
                                       | IFQ_LOCK(&ifp->if_snd);
 IFQ_POLL(&ifp->if_snd, m);            | IFQ_POLL_NOLOCK(&ifp->if_snd, m);
 if (m != NULL) {                      | if (m != NULL) {
                                       |
     /* use m to get resources */      |     /* use m to get resources */
     if (something goes wrong)         |     if (something goes wrong) {
         return;                       |         IFQ_UNLOCK(&ifp->if_snd);
                                       |         return;
                                       |     }
     IFQ_DEQUEUE(&ifp->if_snd, m);     |     IFQ_DEQUEUE_NOLOCK(&ifp->if_snd,
                                       |                        m);
                                       |
     /* kick the hardware */           |     /* kick the hardware */
 }                                     | }
                                       | IFQ_UNLOCK(&ifp->if_snd);
                                       |

In addition, IF_HANDOFF(ifq, m, ifp) is changed.
IF_HANDOFF() uses an inline function, if_handoff(), internally but it
does not work with both struct ifqueue and struct ifaltq since the
queue argument is prototyped.  To work with different types of queue
structures, an inline function cannot be used.
Another problem is that IF_HANDOFF() returns an error but a macro of
statements cannot be an expression in ANSI C.
Thus, a new macro, IFQ_HANDOFF() is defined for output queues, and its
syntax is different from IF_HANDOFF().

       IFQ_HANDOFF(ifp, m, pattr, err)

In the original net/if_ethersubr.c:

    if (!IF_HANDOFF(&ifp->if_snd, m, ifp))
       return (ENOBUFS);

This is changed to

    IFQ_HANDOFF(ifp, m, &pktattr, error);
    return (error);
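
The reason a statement macro fits here can be illustrated in user
space.  The two structures and TOY_HANDOFF() below are hypothetical
stand-ins, not the real definitions: because the macro only touches
fields by name, it accepts either queue type, and it reports the result
through the err argument instead of a return value.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_ifqueue {
	int	len, maxlen, drops;
};

struct toy_ifaltq {
	int	len, maxlen, drops;
	int	altq_flags;		/* extra field; layout differs */
};

/* a macro of statements: usable as a statement, not as an expression */
#define	TOY_HANDOFF(q, err)					\
do {								\
	if ((q)->len >= (q)->maxlen) {				\
		(q)->drops++;					\
		(err) = ENOBUFS;				\
	} else {						\
		(q)->len++;					\
		(err) = 0;					\
	}							\
} while (0)

static int
toy_send_old(struct toy_ifqueue *q)
{
	int error;

	assert(q != NULL);
	TOY_HANDOFF(q, error);
	return (error);
}

static int
toy_send_new(struct toy_ifaltq *q)
{
	int error;

	assert(q != NULL);
	TOY_HANDOFF(q, error);
	return (error);
}
```

An inline function with a prototyped queue argument could serve only
one of the two callers above, whereas the macro serves both.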

7 Summary

The new output queueing model presented in this note makes it possible
to implement flexible queue management algorithms and to convert the
existing code to the new model incrementally.
Although the transition to the new model requires examining and
cleaning up the existing drivers, I think the effort will pay off in
the long run.

An implementation of the new model for FreeBSD/NetBSD/OpenBSD can
be found in altq-3.x or KAME snap kits.
http://www.kame.net/