<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include/uapi/linux/pkt_sched.h, branch v4.4.27</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.4.27</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.4.27'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2015-05-12T22:22:49Z</updated>
<entry>
<title>net_sched: gred: add TCA_GRED_LIMIT attribute</title>
<updated>2015-05-12T22:22:49Z</updated>
<author>
<name>David Ward</name>
<email>david.ward@ll.mit.edu</email>
</author>
<published>2015-05-10T02:01:46Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a3eb95f891d6130b1fc03dd07a8b54cf0a5c8ab8'/>
<id>urn:sha1:a3eb95f891d6130b1fc03dd07a8b54cf0a5c8ab8</id>
<content type='text'>
In a GRED qdisc, if the default "virtual queue" (VQ) does not have drop
parameters configured, then packets for the default VQ are not subjected
to RED and are only dropped if the queue is larger than the net_device's
tx_queue_len. This behavior is useful for WRED mode, since these packets
will still influence the calculated average queue length and (therefore)
the drop probability for all of the other VQs. However, for some drivers
tx_queue_len is zero. In other cases the user may wish to make the limit
the same for all VQs (including the default VQ with no drop parameters).

This change adds a TCA_GRED_LIMIT attribute to set the GRED queue limit,
in bytes, during qdisc setup. (This limit is in bytes to be consistent
with the drop parameters.) The default limit is the same as for a bfifo
queue (tx_queue_len * psched_mtu). If the drop parameters of any VQ are
configured with a smaller limit than the GRED queue limit, that VQ will
still observe the smaller limit instead.

Signed-off-by: David Ward &lt;david.ward@ll.mit.edu&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>codel: add ce_threshold attribute</title>
<updated>2015-05-10T23:50:20Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2015-05-08T22:05:12Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=80ba92fa1a92dea128283f69f55b02242e213650'/>
<id>urn:sha1:80ba92fa1a92dea128283f69f55b02242e213650</id>
<content type='text'>
For DCTCP or similar ECN based deployments on fabrics with shallow
buffers, hosts are responsible for a good part of the buffering.

This patch adds an optional ce_threshold to codel &amp; fq_codel qdiscs,
so that DCTCP can have feedback from queuing in the host.

A DCTCP enabled egress port simply have a queue occupancy threshold
above which ECT packets get CE mark.

In codel language this translates to a sojourn time, so that one doesn't
have to worry about bytes or bandwidth but delays.

This makes the host an active participant in the health of the whole
network.

This also helps experimenting DCTCP in a setup without DCTCP compliant
fabric.

On following example, ce_threshold is set to 1ms, and we can see from
'ldelay xxx us' that TCP is not trying to go around the 5ms codel
target.

Queue has more capacity to absorb inelastic bursts (say from UDP
traffic), as queues are maintained to an optimal level.

lpaa23:~# ./tc -s -d qd sh dev eth1
qdisc mq 1: dev eth1 root
 Sent 87910654696 bytes 58065331 pkt (dropped 0, overlimits 0 requeues 42961)
 backlog 3108242b 364p requeues 42961
qdisc codel 8063: dev eth1 parent 1:1 limit 1000p target 5.0ms ce_threshold 1.0ms interval 100.0ms
 Sent 7363778701 bytes 4863809 pkt (dropped 0, overlimits 0 requeues 5503)
 rate 2348Mbit 193919pps backlog 255866b 46p requeues 5503
  count 0 lastcount 0 ldelay 1.0ms drop_next 0us
  maxpacket 68130 ecn_mark 0 drop_overlimit 0 ce_mark 72384
qdisc codel 8064: dev eth1 parent 1:2 limit 1000p target 5.0ms ce_threshold 1.0ms interval 100.0ms
 Sent 7636486190 bytes 5043942 pkt (dropped 0, overlimits 0 requeues 5186)
 rate 2319Mbit 191538pps backlog 207418b 64p requeues 5186
  count 0 lastcount 0 ldelay 694us drop_next 0us
  maxpacket 68130 ecn_mark 0 drop_overlimit 0 ce_mark 69873
qdisc codel 8065: dev eth1 parent 1:3 limit 1000p target 5.0ms ce_threshold 1.0ms interval 100.0ms
 Sent 11569360142 bytes 7641602 pkt (dropped 0, overlimits 0 requeues 5554)
 rate 3041Mbit 251096pps backlog 210446b 59p requeues 5554
  count 0 lastcount 0 ldelay 889us drop_next 0us
  maxpacket 68130 ecn_mark 0 drop_overlimit 0 ce_mark 37780
...

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Florian Westphal &lt;fw@strlen.de&gt;
Cc: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Cc: Glenn Judd &lt;glenn.judd@morganstanley.com&gt;
Cc: Nandita Dukkipati &lt;nanditad@google.com&gt;
Cc: Neal Cardwell &lt;ncardwell@google.com&gt;
Cc: Yuchung Cheng &lt;ycheng@google.com&gt;
Acked-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>pkt_sched: fq: better control of DDOS traffic</title>
<updated>2015-02-05T06:15:45Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2015-02-05T05:30:40Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=06eb395fa9856b5a87cf7d80baee2a0ed3cdb9d7'/>
<id>urn:sha1:06eb395fa9856b5a87cf7d80baee2a0ed3cdb9d7</id>
<content type='text'>
FQ has a fast path for skb attached to a socket, as it does not
have to compute a flow hash. But for other packets, FQ being non
stochastic means that hosts exposed to random Internet traffic
can allocate million of flows structure (104 bytes each) pretty
easily. Not only host can OOM, but lookup in RB trees can take
too much cpu and memory resources.

This patch adds a new attribute, orphan_mask, that is adding
possibility of having a stochastic hash for orphaned skb.

Its default value is 1024 slots, to mimic SFQ behavior.

Note: This does not apply to locally generated TCP traffic,
and no locally generated traffic will share a flow structure
with another perfect or stochastic flow.

This patch also handles the specific case of SYNACK messages:

They are attached to the listener socket, and therefore all map
to a single hash bucket. If listener have set SO_MAX_PACING_RATE,
hoping to have new accepted socket inherit this rate, SYNACK
might be paced and even dropped.

This is very similar to an internal patch Google have used more
than one year.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: pkt_sched: PIE AQM scheme</title>
<updated>2014-01-06T20:13:01Z</updated>
<author>
<name>Vijay Subramanian</name>
<email>vijaynsu@cisco.com</email>
</author>
<published>2014-01-05T01:33:55Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d4b36210c2e6ecef0ce52fb6c18c51144f5c2d88'/>
<id>urn:sha1:d4b36210c2e6ecef0ce52fb6c18c51144f5c2d88</id>
<content type='text'>
Proportional Integral controller Enhanced (PIE) is a scheduler to address the
bufferbloat problem.

&gt;From the IETF draft below:
" Bufferbloat is a phenomenon where excess buffers in the network cause high
latency and jitter. As more and more interactive applications (e.g. voice over
IP, real time video streaming and financial transactions) run in the Internet,
high latency and jitter degrade application performance. There is a pressing
need to design intelligent queue management schemes that can control latency and
jitter; and hence provide desirable quality of service to users.

We present here a lightweight design, PIE(Proportional Integral controller
Enhanced) that can effectively control the average queueing latency to a target
value. Simulation results, theoretical analysis and Linux testbed results have
shown that PIE can ensure low latency and achieve high link utilization under
various congestion situations. The design does not require per-packet
timestamp, so it incurs very small overhead and is simple enough to implement
in both hardware and software.  "

Many thanks to Dave Taht for extensive feedback, reviews, testing and
suggestions. Thanks also to Stephen Hemminger and Eric Dumazet for reviews and
suggestions.  Naeem Khademi and Dave Taht independently contributed to ECN
support.

For more information, please see technical paper about PIE in the IEEE
Conference on High Performance Switching and Routing 2013. A copy of the paper
can be found at ftp://ftpeng.cisco.com/pie/.

Please also refer to the IETF draft submission at
http://tools.ietf.org/html/draft-pan-tsvwg-pie-00

All relevant code, documents and test scripts and results can be found at
ftp://ftpeng.cisco.com/pie/.

For problems with the iproute2/tc or Linux kernel code, please contact Vijay
Subramanian (vijaynsu@cisco.com or subramanian.vijay@gmail.com) Mythili Prabhu
(mysuryan@cisco.com)

Signed-off-by: Vijay Subramanian &lt;subramanian.vijay@gmail.com&gt;
Signed-off-by: Mythili Prabhu &lt;mysuryan@cisco.com&gt;
CC: Dave Taht &lt;dave.taht@bufferbloat.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>sch_netem: support of 64bit rates</title>
<updated>2013-12-31T19:31:44Z</updated>
<author>
<name>Yang Yingliang</name>
<email>yangyingliang@huawei.com</email>
</author>
<published>2013-12-25T09:35:15Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=6a031f67c83aa175aedd10d4ae64750415ab57b0'/>
<id>urn:sha1:6a031f67c83aa175aedd10d4ae64750415ab57b0</id>
<content type='text'>
Add a new attribute to support 64bit rates so that
tc can use them to break the 32bit limit.

Signed-off-by: Yang Yingliang &lt;yangyingliang@huawei.com&gt;
Acked-by: Stephen Hemminger &lt;stephen@networkplumber.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>sch_tbf: add TBF_BURST/TBF_PBURST attribute</title>
<updated>2013-12-26T18:54:22Z</updated>
<author>
<name>Yang Yingliang</name>
<email>yangyingliang@huawei.com</email>
</author>
<published>2013-12-20T01:24:47Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=2e04ad424b03661ec8239acd52146497eb33be1c'/>
<id>urn:sha1:2e04ad424b03661ec8239acd52146497eb33be1c</id>
<content type='text'>
When we set burst to 1514 with low rate in userspace,
the kernel get a value of burst that less than 1514,
which doesn't work.

Because it may make some loss when transform burst
to buffer in userspace. This makes burst lose some
bytes, when the kernel transform the buffer back to
burst.

This patch adds two new attributes to support sending
burst/mtu to kernel directly to avoid the loss.

Signed-off-by: Yang Yingliang &lt;yangyingliang@huawei.com&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net-qdisc-hhf: Heavy-Hitter Filter (HHF) qdisc</title>
<updated>2013-12-19T19:48:42Z</updated>
<author>
<name>Terry Lam</name>
<email>vtlam@google.com</email>
</author>
<published>2013-12-15T08:30:21Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=10239edf86f137ce4c39b62ea9575e8053c549a0'/>
<id>urn:sha1:10239edf86f137ce4c39b62ea9575e8053c549a0</id>
<content type='text'>
This patch implements the first size-based qdisc that attempts to
differentiate between small flows and heavy-hitters.  The goal is to
catch the heavy-hitters and move them to a separate queue with less
priority so that bulk traffic does not affect the latency of critical
traffic.  Currently "less priority" means less weight (2:1 in
particular) in a Weighted Deficit Round Robin (WDRR) scheduler.

In essence, this patch addresses the "delay-bloat" problem due to
bloated buffers. In some systems, large queues may be necessary for
obtaining CPU efficiency, or due to the presence of unresponsive
traffic like UDP, or just a large number of connections with each
having a small amount of outstanding traffic. In these circumstances,
HHF aims to reduce the HoL blocking for latency sensitive traffic,
while not impacting the queues built up by bulk traffic.  HHF can also
be used in conjunction with other AQM mechanisms such as CoDel.

To capture heavy-hitters, we implement the "multi-stage filter" design
in the following paper:
C. Estan and G. Varghese, "New Directions in Traffic Measurement and
Accounting", in ACM SIGCOMM, 2002.

Some configurable qdisc settings through 'tc':
- hhf_reset_timeout: period to reset counter values in the multi-stage
                     filter (default 40ms)
- hhf_admit_bytes:   threshold to classify heavy-hitters
                     (default 128KB)
- hhf_evict_timeout: threshold to evict idle heavy-hitters
                     (default 1s)
- hhf_non_hh_weight: Weighted Deficit Round Robin (WDRR) weight for
                     non-heavy-hitters (default 2)
- hh_flows_limit:    max number of heavy-hitter flow entries
                     (default 2048)

Note that the ratio between hhf_admit_bytes and hhf_reset_timeout
reflects the bandwidth of heavy-hitters that we attempt to capture
(25Mbps with the above default settings).

The false negative rate (heavy-hitter flows getting away unclassified)
is zero by the design of the multi-stage filter algorithm.
With 100 heavy-hitter flows, using four hashes and 4000 counters yields
a false positive rate (non-heavy-hitters mistakenly classified as
heavy-hitters) of less than 1e-4.

Signed-off-by: Terry Lam &lt;vtlam@google.com&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>pkt_sched: fq: fix pacing for small frames</title>
<updated>2013-11-16T02:01:52Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2013-11-15T16:58:14Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f52ed89971adbe79b6438c459814034707b8ab91'/>
<id>urn:sha1:f52ed89971adbe79b6438c459814034707b8ab91</id>
<content type='text'>
For performance reasons, sch_fq tried hard to not setup timers for every
sent packet, using a quantum based heuristic : A delay is setup only if
the flow exhausted its credit.

Problem is that application limited flows can refill their credit
for every queued packet, and they can evade pacing.

This problem can also be triggered when TCP flows use small MSS values,
as TSO auto sizing builds packets that are smaller than the default fq
quantum (3028 bytes)

This patch adds a 40 ms delay to guard flow credit refill.

Fixes: afe4fd062416 ("pkt_sched: fq: Fair Queue packet scheduler")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Maciej Żenczykowski &lt;maze@google.com&gt;
Cc: Willem de Bruijn &lt;willemb@google.com&gt;
Cc: Yuchung Cheng &lt;ycheng@google.com&gt;
Cc: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>pkt_sched: fq: warn users using defrate</title>
<updated>2013-11-16T02:01:52Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2013-11-15T16:57:26Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=65c5189a2b57b9aa1d89e4b79da39928257c9505'/>
<id>urn:sha1:65c5189a2b57b9aa1d89e4b79da39928257c9505</id>
<content type='text'>
Commit 7eec4174ff29 ("pkt_sched: fq: fix non TCP flows pacing")
obsoleted TCA_FQ_FLOW_DEFAULT_RATE without notice for the users.

Suggested by David Miller

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net_sched: tbf: support of 64bit rates</title>
<updated>2013-11-09T19:53:37Z</updated>
<author>
<name>Yang Yingliang</name>
<email>yangyingliang@huawei.com</email>
</author>
<published>2013-11-08T02:23:34Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a33c4a2663c19ac01e557d6b78806271eec2a150'/>
<id>urn:sha1:a33c4a2663c19ac01e557d6b78806271eec2a150</id>
<content type='text'>
With psched_ratecfg_precompute(), tbf can deal with 64bit rates.
Add two new attributes so that tc can use them to break the 32bit
limit.

Signed-off-by: Yang Yingliang &lt;yangyingliang@huawei.com&gt;
Suggested-by: Sergei Shtylyov &lt;sergei.shtylyov@cogentembedded.com&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
