Table of Contents
Cover image
Title page
Copyright
Dedication
Preface to the second edition
Preface
Audience
What this book is about
Organization of the book
Features
Usage
Why this book was written
Acknowledgments
15 principles used to overcome network bottlenecks
Part 1: The rules of the game
Introduction
Chapter 1: Introducing network algorithmics
Abstract
1.1. The problem: network bottlenecks
1.2. The techniques: network algorithmics
1.3. Exercise
Chapter 2: Network implementation models
Abstract
2.1. Protocols
2.2. Hardware
2.3. Network device architectures
2.4. Operating systems
2.5. Summary
2.6. Exercises
References
Chapter 3: Fifteen implementation principles
Abstract
3.1. Motivating the use of principles—updating ternary content-addressable memories
3.2. Algorithms versus algorithmics
3.3. Fifteen implementation principles—categorization and description
3.4. Design versus implementation principles
3.5. Caveats
3.6. Summary
3.7. Exercises
References
Chapter 4: Principles in action
Abstract
4.1. Buffer validation of application device channels
4.2. Scheduler for asynchronous transfer mode flow control
4.3. Route computation using Dijkstra's algorithm
4.4. Ethernet monitor using bridge hardware
4.5. Demultiplexing in the x-kernel
4.6. Tries with node compression
4.7. Packet filtering in routers
4.8. Avoiding fragmentation of LSPs
4.9. Policing traffic patterns
4.10. Identifying a resource hog
4.11. Getting rid of the TCP open connection list
4.12. Acknowledgment withholding
4.13. Incrementally reading a large database
4.14. Binary search of long identifiers
4.15. Video conferencing via asynchronous transfer mode
References
Part 2: Playing with endnodes
Introduction
Chapter 5: Copying data
Abstract
5.1. Why data copies
5.2. Reducing copying via local restructuring
5.3. Avoiding copying using remote DMA
5.4. Broadening to file systems
5.5. Broadening beyond copies
5.6. Broadening beyond data manipulations
5.7. Conclusions
5.8. Exercises
References
Chapter 6: Transferring control
Abstract
6.1. Why control overhead?
6.2. Avoiding scheduling overhead in networking code
6.3. Avoiding context-switching overhead in applications
6.4. Scalable I/O Notification
6.5. Avoiding system calls or Kernel Bypass
6.6. Radical Restructuring of Operating Systems
6.7. Reducing interrupts
6.8. Conclusions
6.9. Exercises
References
Chapter 7: Maintaining timers
Abstract
7.1. Why timers?
7.2. Model and performance measures
7.3. Simplest timer schemes
7.4. Timing wheels
7.5. Hashed wheels
7.6. Hierarchical wheels
7.7. BSD implementation
7.8. Google Carousel implementation
7.9. Obtaining finer granularity timers
7.10. Conclusions
7.11. Exercises
References
Chapter 8: Demultiplexing
Abstract
8.1. Opportunities and challenges of early demultiplexing
8.2. Goals
8.3. CMU/Stanford packet filter: pioneering packet filters
8.4. Berkeley packet filter: enabling high-performance monitoring
8.5. Pathfinder: factoring out common checks
8.6. Dynamic packet filter: compilers to the rescue
8.7. Conclusions
8.8. Exercises
References
Chapter 9: Protocol processing
Abstract
9.1. Buffer management
9.2. Cyclic redundancy checks and checksums
9.3. Generic protocol processing
9.4. Reassembly
9.5. Conclusions
9.6. Exercises
References
Part 3: Playing with routers
Introduction
Chapter 10: Exact-match lookups
Abstract
10.1. Challenge 1: Ethernet under fire
10.2. Challenge 2: wire speed forwarding
10.3. Challenge 3: scaling lookups to higher speeds
10.4. Summary
10.5. Exercise
References
Chapter 11: Prefix-match lookups
Abstract
11.1. Introduction to prefix lookups
11.2. Finessing lookups
11.3. Non-algorithmic techniques for prefix matching
11.4. Unibit tries
11.5. Multibit tries
11.6. Level-compressed (LC) tries
11.7. Lulea-compressed tries
11.8. Tree bitmap
11.9. Binary search on ranges
11.10. Binary search on ranges with Initial Lookup Table
11.11. Binary search on prefix lengths
11.12. Linear search on prefix lengths with hardware assist
11.13. Memory allocation in compressed schemes
11.14. Fixed Function Lookup-chip models
11.15. Programmable Lookup Chips and P4
11.16. Conclusions
11.17. Exercises
References
Chapter 12: Packet classification
Abstract
12.1. Why packet classification?
12.2. Packet-classification problem
12.3. Requirements and metrics
12.4. Simple solutions
12.5. Two-dimensional schemes
12.6. Approaches to general rule sets
12.7. Extending two-dimensional schemes
12.8. Using divide-and-conquer
12.9. Bit vector linear search
12.10. Cross-producting
12.11. Equivalenced cross-producting
12.12. Decision tree approaches
12.13. Hybrid algorithms
12.14. Conclusions
12.15. Exercises
References
Chapter 13: Switching
Abstract
13.1. Router versus telephone switches
13.2. Shared-memory switches
13.3. Router history: from buses to crossbars
13.4. The take-a-ticket crossbar scheduler
13.5. Head-of-line blocking
13.6. Avoiding HOL blocking via output queuing
13.7. Avoiding HOL blocking via virtual output queuing
13.8. Input-queued switching as a bipartite matching problem
13.9. Parallel iterative matching (PIM)
13.10. Avoiding randomization with iSLIP
13.11. Computing near-optimal matchings via learning
13.12. Sample-and-compare: a stunningly simple adaptive algorithm
13.13. SERENA: an improved adaptive algorithm
13.14. The queue-proportional sampling strategy
13.15. QPS implementation
13.16. Small-batch QPS and sliding-window QPS
13.17. Combined input and output queueing
13.18. Scaling to larger and faster switches
13.19. Scaling to faster link speeds
13.20. Conclusions
13.21. Exercises
References
Chapter 14: Scheduling packets
Abstract
14.1. Motivation for quality of service
14.2. Random early detection
14.3. Approximate fair dropping
14.4. Token bucket policing
14.5. Multiple outbound queues and priority
14.6. A quick detour into reservation protocols
14.7. Providing bandwidth guarantees
14.8. Schedulers that provide delay guarantees
14.9. Generalized processor sharing
14.10. Weighted fair queueing
14.11. Worst-case fair weighted fair queueing
14.12. The data structure and algorithm for efficient GPS clock tracking
14.13. Implementing WFQ and WF2Q
14.14. Quick fair queueing (QFQ)
14.15. Towards programmable packet scheduling
14.16. Scalable fair queuing
14.17. Summary
14.18. Exercises
References
Chapter 15: Routers as distributed systems
Abstract
15.1. Internal flow control
15.2. Internal Link Striping
15.3. Distributed Memory
15.4. Asynchronous updates
15.5. Conclusions
15.6. Exercises
References
Part 4: Endgame
Introduction
Chapter 16: Measuring network traffic
Abstract
16.1. Why measurement is hard
16.2. Reducing SRAM width using DRAM backing store
16.3. A randomized counter scheme
16.4. Maintaining active counters using BRICK
16.5. Extending BRICK for maintaining associated states
16.6. Reducing counter width using approximate counting
16.7. Reducing counters using threshold aggregation
16.8. Reducing counters using flow counting
16.9. Reducing processing using sampled NetFlow
16.10. Reducing reporting using sampled charging
16.11. Correlating measurements using trajectory sampling
16.12. A concerted approach to accounting
16.13. Computing traffic matrices
16.14. Sting as an example of passive measurement
16.15. Generating better traffic logs via data streaming
16.16. Counting the number of distinct flows
16.17. Detection of heavy hitters
16.18. Estimation of flow-size distribution
16.19. The Tug-of-War algorithm for estimating F2
16.20. Conclusion
16.21. Exercises
References
Chapter 17: Network security
Abstract
17.1. Searching for multiple strings in packet payloads
17.2. Approximate string matching
17.3. IP traceback via probabilistic marking
17.4. IP traceback via logging
17.5. Detecting worms
17.6. EarlyBird system for worm detection
17.7. Carousel: scalable logging for intrusion prevention systems
17.8. Conclusion
17.9. Exercises
References
Chapter 18: Conclusions
Abstract
18.1. What this book has been about
18.2. What network algorithmics is about
18.3. Network algorithmics and real products
18.4. Network algorithmics: back to the future
18.5. The inner life of a networking device
References
Appendix A: Detailed models
A.1. TCP and IP
A.2. Hardware models
A.3. Switching theory
A.4. The interconnection network Zoo
References
References
Index
Copyright
Morgan Kaufmann is an imprint of Elsevier
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States
Copyright © 2022 Elsevier Inc. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher's permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.
This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).
Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.
Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described
herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.
ISBN: 978-0-12-809927-8
For information on all Morgan Kaufmann and Elsevier publications visit our website at https://www.elsevier.com/books-and-journals
Publisher: Mara Conner
Editorial Project Manager: Lindsay C. Lawrence
Production Project Manager: Manchu Mohan
Cover Designer: Matthew Limbert
Typeset by VTeX
Preface to the second edition
Unless otherwise stated, this preface is written by author Xu, who is referred to as “I” in the following. Author Varghese is referred to as “George” in the following.
When George invited me in 2015 to write the second edition and become a co-author of this legendary book, I felt both humbled and honored. I also felt a deep sense of mission, as the bulk of my research and teaching has been on network algorithmics since the mid-1990s and still is.
When I signed the new edition contract with the publisher, I originally committed to significantly revising only three chapters: Chapter 13 (Switching), Chapter 14 (Scheduling packets), and Chapter 16 (Measuring network traffic). In August 2020, at the suggestion of George, I agreed to add two sections, on EarlyBird (Section 17.6) and Carousel (Section 17.7) respectively, to Chapter 17 (Network security), and one section, on the d-Left approach (Section 10.3.3), to Chapter 10 (Exact-match lookups).
I have made the following major revisions to Chapter 13 (Switching). First, I have made the use of virtual output queueing (VOQ) in switching a separate section (Section 13.7) instead of an integral part of the parallel iterative matching (PIM) algorithm (Section 13.7 in the first edition), since the concept of VOQ was introduced a few years earlier than PIM. Sections 13.11 through 13.16 are entirely new; they introduce a few more single-crossbar switching algorithms, including Sample-and-Compare, SERENA, QPS, SB-QPS, and SW-QPS. This addition is critical, because PIM–iSLIP was the only single-crossbar switching algorithm “series” described in the first edition, so readers were not shown any alternative way of designing such switching algorithms. These newly added algorithms provide such an alternative viewpoint. Each of them was selected (to be included in this book) for its conceptual simplicity, low computational and communication complexity, and elegance. In Section 13.17, I have described the combined input and output queueing (CIOQ) proposal that advocates combining switching with packet scheduling to provide QoS guarantees. Finally, I have added a short section (13.18.5) on load-balanced switching, a research topic that had just emerged when the first edition went to print. A few other sections have been updated with “modern materials.” For example, in Section 13.18.3 (Clos networks for medium-size routers), I have added a few paragraphs describing how a 3-stage Clos network can be used for data center networking and switching.
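For readers who would like a concrete first taste of this material, the following is a small, hypothetical C++ sketch that combines the two ideas just mentioned: virtual output queues (a separate queue at each input for each output) and the sample-and-compare strategy as it is usually summarized in the literature (draw a random matching every slot and keep it only if it is heavier, measured by matched backlog, than the matching currently in use). The port count, the weight function, and all identifiers are illustrative assumptions, not code from Chapter 13; Sections 13.7 and 13.12 give the real treatment.

// Hypothetical sketch: an N x N matrix of VOQ backlogs, plus the
// sample-and-compare idea (keep a random matching only if it is heavier
// than the current one). Everything here is illustrative.
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

constexpr int N = 8;                  // number of ports (illustrative)
using Matching = std::vector<int>;    // matching[input] = output it is paired with

struct VoqSwitch {
  uint64_t voq[N][N] = {};            // voq[i][j] = packets queued at input i for output j
  Matching current = Matching(N);     // matching used in the previous slot
  std::mt19937 rng{12345};

  VoqSwitch() { std::iota(current.begin(), current.end(), 0); }

  // Arrival fast path: a packet from input i to output j joins VOQ (i, j),
  // so it cannot block packets behind it that are headed to other outputs.
  void arrive(int input, int output) { ++voq[input][output]; }

  // Weight of a matching = total backlog in the VOQs it connects.
  uint64_t weight(const Matching& m) const {
    uint64_t w = 0;
    for (int i = 0; i < N; ++i) w += voq[i][m[i]];
    return w;
  }

  // One slot: sample a uniformly random matching, compare, keep the heavier,
  // then serve one packet from each matched, nonempty VOQ.
  void schedule_slot() {
    Matching sample(N);
    std::iota(sample.begin(), sample.end(), 0);
    std::shuffle(sample.begin(), sample.end(), rng);
    if (weight(sample) > weight(current)) current = sample;
    for (int i = 0; i < N; ++i)
      if (voq[i][current[i]] > 0) --voq[i][current[i]];
  }
};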
I have made the following major revisions to Chapter 14 (Scheduling packets). In Section 14.3, I have added approximate fair dropping (AFD), a low-complexity technique for fair bandwidth allocation. In Sections 14.8 through 14.14, I have added GPS, WFQ, WF2Q, QFQ, and an efficient (O(log n) time complexity per packet) algorithm, called the shape data structure (published in 2004), for tracking the GPS clock, which makes WFQ and WF2Q efficiently implementable (also with O(log n) time complexity per packet). The fact that WFQ, an algorithm that is known to provide very strong fairness guarantees, is efficiently implementable using the shape data structure is extremely important, since WFQ has been widely believed to be not efficiently implementable (with a time complexity of O(n) per packet), even to this day. In Section 14.15, I have described two research proposals towards making packet scheduling reprogrammable in switches and routers.
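To make the division of labor concrete, here is a minimal, hypothetical C++ sketch of the classical WFQ bookkeeping, which assumes the GPS virtual clock is supplied by some external tracking mechanism (the hard part, addressed by the data structure of Section 14.12). Under that assumption, each packet costs one heap insertion and one heap removal, i.e., O(log n) time; all names and types below are illustrative, not the book's code.

// Hypothetical WFQ sketch: each packet gets a virtual finish time computed
// from the GPS clock and its flow's weight, and packets are transmitted in
// increasing order of finish time using a min-heap.
#include <algorithm>
#include <cstdint>
#include <functional>
#include <queue>
#include <vector>

struct Packet { int flow; uint32_t bytes; };

struct WfqScheduler {
  std::vector<double> weight;        // per-flow weight
  std::vector<double> last_finish;   // virtual finish time of each flow's last queued packet

  struct Entry {
    double finish;
    Packet pkt;
    bool operator>(const Entry& o) const { return finish > o.finish; }
  };
  // Min-heap keyed on virtual finish time: O(log n) push/pop per packet.
  std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;

  explicit WfqScheduler(const std::vector<double>& w)
      : weight(w), last_finish(w.size(), 0.0) {}

  // gps_virtual_time is assumed to come from a GPS clock tracker (Section 14.12).
  void enqueue(const Packet& p, double gps_virtual_time) {
    // Start time = max(GPS virtual time now, finish time of the flow's previous packet).
    double start = std::max(gps_virtual_time, last_finish[p.flow]);
    double finish = start + p.bytes / weight[p.flow];
    last_finish[p.flow] = finish;
    heap.push(Entry{finish, p});
  }

  // Transmit the queued packet with the smallest virtual finish time.
  Packet dequeue() {
    Packet p = heap.top().pkt;
    heap.pop();
    return p;
  }
};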
I have made the following major revisions to Chapter 16 (Measuring network traffic). I have added, in Section 16.3, a third SRAM/DRAM hybrid counter array scheme to the two such schemes described in Section 16.2. The former is very different from the latter two, as the former is randomized, whereas the latter two are both deterministic. However, all three counter schemes are passive in the sense that they allow fast increments but do not allow fast reads. Hence, in Sections 16.4 and 16.5, I introduce an active counter array scheme called BRICK and a flow-state lookup scheme (which by definition has to be active) called RIH. Finally, in Sections 16.15 through 16.19, I provide a crash course on network data streaming and sketching (DSS). DSS, which originated in the area of databases, has evolved over the past two decades into a booming research subtopic of network measurement and monitoring. For example, in the past decade or so, SIGCOMM and NSDI together have accepted several network DSS papers almost every year.
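To illustrate what “passive” means here, the following is a purely conceptual C++ sketch of a generic SRAM/DRAM hybrid counter array, not any of the specific schemes of Sections 16.2 and 16.3: narrow counters absorb per-packet increments and are flushed to wide counters before they overflow, so increments are cheap, but an accurate read must merge both levels (and, in a real design, wait for pending flushes). The counter widths, the flush policy, and all names are illustrative assumptions.

// Conceptual sketch of a passive hybrid counter array: fast narrow counters
// in "SRAM" backed by wide counters in "DRAM". Increments touch only the
// narrow counter; reads must combine both, which is why reads are slow.
#include <cstddef>
#include <cstdint>
#include <vector>

class HybridCounters {
  std::vector<uint8_t>  sram;   // narrow, fast counters (one per flow)
  std::vector<uint64_t> dram;   // wide, slow counters (one per flow)
 public:
  explicit HybridCounters(size_t flows) : sram(flows, 0), dram(flows, 0) {}

  // Fast path: executed once per packet.
  void increment(size_t flow) {
    if (++sram[flow] == 0xFF) {   // about to overflow: flush to the wide counter
      dram[flow] += 0xFF;         // (a real design queues this write to DRAM)
      sram[flow] = 0;
    }
  }

  // Slow path: a full read must merge both levels of the counter.
  uint64_t read(size_t flow) const { return dram[flow] + sram[flow]; }
};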
George has made significant updates to Chapters 5 (Copying data); 6 (Transferring control); 7 (Maintaining timers); 11 (Prefix-match lookups); 12 (Packet classification); 15 (Routers as distributed systems); and 18 (Conclusions).
My work in writing this edition has been supported in part by the US National Science Foundation through grants NeTS-1423182, CNS-1909048, and CNS-2007006. I have reported my effort and progress every year in the annual or final project reports.
A special thanks to my current and former editors, Lindsay Lawrence and Brian Romer and Todd Green; to my co-author George, who came up with the ingenious term “network algorithmics” that defined the bulk of my research in the past 25 years; to my Ph.D. advisor, Mukesh Singhal, who taught me how to write research papers; to all my collaborators on various network algorithmics topics, especially to Bill Lin; to many colleagues at Georgia Tech, especially to Mostafa Ammar and Ellen Zegura; to former and current School Chairs Lance Fortnow and Vivek Sarkar who gave me a reduced service load for this book-writing effort; to former and current Ph.D. students who adventured in the field of network algorithmics with me and who helped me with drawing figures and tables, proofreading, and fixing 100+ “book bugs” in the first edition collected by George; to anonymous reviewers of this book; to my parents and my brother; to my wife Linda; and to my daughter Ellie.