An efficient pipeline processing scheme for programming Protocol-independent Packet Processors

https://doi.org/10.1016/j.jnca.2020.102806Get rights and content

Abstract

OpenFlow is unable to provide customized flow tables, resulting in memory explosions and high switch retirement rates. This is the bottleneck for the development of SDN. Recently, P4 (Programming Protocol-independent Packet Processors) attracts much attentions from both academia and industry. It provides customized networking services by offering flow-level control. P4 can “produce” various forwarding tables according to packets. P4 increases the speed of custom ASICs. However, with the prevalence of P4, the multiple forwarding tables could explode when used in large scale networks. The explosion problem can slow down the lookup speed, which causes congestions and packet losses. In addition, the pipelined structure of forwarding tables brings additional processing delay. In this study, we will improve the lookup performance by optimizing the forwarding tables of P4. Intuitively, we will install the rules according to their popularity, i.e., the popular rules will appear earlier than others. Thus, the packets can hit the matched rule sooner. In this paper, we formalize the optimization problem, and prove that the problem is NP-hard. To solve the problem, we propose a heuristic algorithm called EPSP (Efficient Pipeline Processing Scheme for P4), which can largely reduce the lookup time while keeping the forwarding actions the same. Because running the optimization algorithm frequently brings additional processing burdens, wedesign an incremental update algorithm to alleviate this problem. To evaluate the proposed algorithms, we set up the simulation environments based on ns-3. The simulation results show that the algorithm greatly reduces both the lookup time and the number of memory accesses. The incremental algorithm largely reduces the processing burdens while the lookup time remains almost the same with the non-incremental algorithm. We also implemented a prototype using floodlight and mininet. The results show that our algorithm brings acceptable burder, and performs better than traditional algorithm.

Introduction

Researchers have made tremendous effort to improve the “stiff” network architecture, and have achieved a better network programming framework (Li et al., 2018; Gao et al., 2018). Recently, P4 (Programming Protocol-independent Packet Processors) (Bosshart et al., 2014; Wang et al., 2019) has attracted wide attentions from academia and industry, due to its characteristics of flexibility and protocol-independence. The P4 paper (Bosshart et al., 2014) published by Nick McKeown et al. has attracted wide attentions. McKeown et al. subsequently published the documents such as The P4 Language Specification, Barefoot White Paper, etc. ONF has also established an open source project —— PIF(the Protocol Independent Forwarding). The results of the PIF project will be used to design the next generation of OpenFlow protocol. Many studies (Shah et al., 2018; Demirci and Sagiroglu; Bradai et al., 2015) on protocol-independent forwarding devices have been proposed. P4 is now widely adopted in networks. Using P4, developers can easily develop new applications in SDN (Software-Defined Networking) controllers, install and query rules in the target SDN switch.

As described in OpenFlow protocol, each OpenFlow logical switch contains multiple flow tables in the pipeline. Each flow table stores one or more flow rules/entries. An OpenFlow switch must have at least one flow table, and can optionally have more flow tables. We assume that each switch has more than one flow table. The flow tables are numbered in sequence starting from 0, as shown in Fig. 1. OpenFlow provides a programmable control plane and the data plane can rely on the control plane entirely. Multiple flow tables are stored sequentially on the forwarding device, forming a pipeline. While P4 gives us a programmable data plane. P4 changes the way the packets are handled with OpenFlow. Traditionally, a front-end compiler will parse the P4 program and compose a set of “match ​+ ​action tables”, which equivalent to the pipeline in the OpenFlow Standard (Openflow switch specifica, 2012). Besides, according to OpenFlow, the flow table can have customized matching fields. Multiple flow tables introduce complexity and overhead (The benefits of multiple, 2015). However, the number of flow rules a commodity OpenFlow switch can support is far below what is needed (Stephens et al., 2012). P4 solves this problem with allowing custom headers. With P4, the compiler finds matching fields that the packets will use. P4 improves the flexibility of configured switches through custom forwarding tables. Reduces memory requirements and the elimination rate of hardware. This flexibility enables terbit-level rates on a configured ASIC (Bosshart et al., 2014).

In OpenFlow, Pipeline is composed of flow tables, and in P4, Pipeline is composed of “match ​+ ​action tables”. In the following paper, when we focus on the technical aspect, we use “forwarding table(s)" uniformly for the purpose of simplicity.

Although P4 brings flexibility, it needs multiple forwarding tables. The forwarding tables may exponentially increase the number of rules, and cause explosion in forwarding table. The explosion can further degrade the performance of switches/routers, including increasing access frequencies in memory and slowing down the lookup speed.

In this paper, we will add a module in the P4 compiler, which can optimize the forwarding table when parsing the P4 program. The greatest challenge is how to optimize the forwarding tables by properly placing flow rules across a pipeline of flow tables. A logical OpenFlow switch contains a number of sequential flow tables. When packets arrive at a switch, the lookup process starts from the first table in the pipeline and goes forward until matching some rule in a table, or reach the end of the pipeline. If the matching rules are installed at the end of the pipeline, the lookup performance is significantly reduced. The performance would be even worse if such a rule is frequently matched. Unfortunately, designing an optimized rule placement algorithm is not straightforward, because we need to maintain the semantics of the forwarding table. In other words, the forwarding actions should remain the same after optimization.

Many recent works have studied the problem of rule placement for OpenFlow (Curtis et al., 2011; Kanizo et al., 2013; Kang et al., 2013; Nguyen et al., 2014; Yu et al., 2010). Some of them proposed solutions for better flow management (Curtis et al., 2011), which focuses on achieving high-level traffic management goals (e.g., access control, load balancing) with minimum OpenFlow rules required. Some other works proposed caching popular rules (Katta et al., 2014; Liu et al., 2013; Huang and Guo, 2015; Zadnik and Canini, 2011) to accelerate lookup speeds without upgrading the forwarding table. In this paper, we will provide fine granularity control over rule placement to optimize lookup speeds.

The new scheme we propose is called EPSP (Efficient Pipeline Processing Scheme for P4), which can efficiently parse P4 programs and configure the OpenFlow pipeline forwarding tables. In particular, rules are reordered by migration to improve efficiency. Popular rules are placed in front of the pipeline and can be matched earlier to accelerate the lookup process.

However, as rules may overlap with each other, thus there exist complex dependencies among them. Moving an individual rule directly will break the semantics of forwarding tables, and thus we need to carefully migrate rules within different tables in the pipeline. In EPSP, interdependent rules are grouped and migrated as a whole. To extract rule dependencies, we use a directed acyclic graph (DAG) to denote the dependencies between rules, and design algorithms for migrating rules to optimize the lookup performance.

However, updating a single rule in EPSP will incur re-optimization and bring additional processing burdens, while updates may occur frequently. Thus, updates will become resource-intensive and occupy most of the CPU resources, and further slow down lookup speeds. We need to develop incremental update algorithms to minimize such burdens. In this paper, we develop an incremental algorithm In-EPSP, where a rule update only causes minimum computations, and a global optimization only happens after a long time periodically.

We implemented the algorithm on two platforms, ns-3 and floodlight. We conduct comprehensive simulations based on ns-3 network simulator. The simulation results show that EPSP can greatly reduce the number of lookups in forwarding tables and increase lookup speeds. Averagely, EPSP can reduce the lookup times to half of the traditional solutions. With the incremental algorithms, the improvement is almost the same, while the update burden is greatly reduced. The deployment result on the floodlight also proves that the algorithm optimizes the CPU load.

The remainder of this paper is organized as follows. Section II describes the background and basic pipeline of P4 and OpenFlow. Section III introduces some related works about P4 and improvement based on OpenFlow. Section IV formulates the problem and section V designs the algorithm of EPSP, where data structure and optimization process are elaborated. Section VI presents the incremental update algorithm. Section VII evaluates EPSP, and The final section concludes the paper and discusses about the future work.

Section snippets

The framework of EPSP

This section introduces the framework of EPSP, but we review the OpenFlow pipeline and the workflow of P4 firstly.

P4, SDN and OpenFlow

SDN (Xia et al., 2015; Masoudi and Ghaffari, 2016; Wang et al., 2017) separates the control and forwarding plane, and gives network operators programmatic control the data networks. Within SDN, one control plane can control multiple forwarding devices. Each device is equipped with a common and open interface such as OpenFlow (McKeown et al., 2008; Openflow switch specifica, 2011), which is a fact in the industry. To achieve flexible programming over networks, P4 (Bosshart et al., 2014; Shah

Problem formalization

Using protocol-independent switch to process packets, we first need to define the parser to identify the packets’ IP address, TTL, protocol type, and other matching fields of the packets. Then define the matching fields in “match ​+ ​action tables”, such as EtherType, MacSa, Protocol, Vlan, etc. We use IPv4 packets as an example, so we can use IP addresses as matching field. The front-end compiler translates the P4 program to an IR (Intermediate Representation). What we focused on is the IR

EPSP: An heuristic Approach

Within the forwarding graph, EPSP merges and migrates rules. EPSP is designed to reduce the lookup time. The heuristic approach and related algorithms are presented in this section.

Problems in update

After using the EPSP, the forwarding tables execute lookup, and find the rule R that matches the packet best (e.g. longest match principle), and increase the weight of the rule R. If the weight exceeds the threshold, it is selected into the Hot-Rule set. When a new rule is inserted into the Hot-Rule set, the Hot-Rule set attempts to merge the rule. If there are more than one rules that can be merged, the switch performs the merge operation and update all tables. If there are no rule that can be

Performance evaluation and Analysis

This section evaluates the performance of our model.

Conclusions

The process of looking up a matched rule can be improved if more popular rules appear earlier in P4. Unfortunately, a simple sorting algorithm is not feasible since the relative order of certain rules must be kept. In this paper, the problem of determining actions on packets with a minimal number of packet lookups has been formalized and proven to be NP-hard. We have proposed EPSP, migrating flow rules between tables to minimize the average number of lookups to determine the actions on a

CRediT authorship contribution statement

Shu Yang: Conceptualization, Validation, Writing - review & editing. Lu Bai: Software, Data curation, Writing - original draft. Laizhong Cui: Methodology, Supervision. Zhongxing Ming: Formal analysis, Software. Yulei Wu: Investigation, Writing - review & editing. Shui Yu: Visualization, Software, Supervision. Hongfei Shen: Project administration, Methodology. Yi Pan: Writing - review & editing, Data curation.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

This work was supported in part by the National Key Research and Development Plan of China under Grants 2018YFB1800302 and No.2018YFB1800805, National Natural Science Foundation of China under Grants 61772345 and 61902258, Major Fundamental Research Project in the Science and Technology Plan of Shenzhen under Grants JCYJ20190808142207420 and GJHZ20190822095416463, and the Pearl River Young Scholars funding of Shenzhen University

Shu Yang received his B·Sc. degree from Beijing University of Posts and Telecommunications and Ph.D. degree from Tsinghua University. He is currently an associate researcher in the College of Computer Science and Software Engineering at Shenzhen University, China. His research interest includes network architecture and high performance router.

References (41)

  • Z. Chen et al.

    A new lookup model for multiple flow tables of open flow with implementation and optimization considerations

  • A.R. Curtis et al.

    Devoflow: scaling flow management for high-performance networks

    Proc. The 2011 Conference of the ACM Special Interest Group on Data Communication, SIGCOMM '11

    (August. 2011)
  • S. Demirci, S. Sagiroglu, Optimal placement of virtual network functions in software defined networks: a survey, J....
  • O.E. Ferkouss et al.

    A 100gig network processor platform for openflow

    (October. 2011)
  • K. Gao et al.

    Trident: toward a unified sdn programming framework with automatic updates

    Proc. The 2018 Conference of the ACM Special Interest Group on Data Communication, SIGCOMM'18

    (August. 2018)
  • J. Ge et al.

    H-soft:A heuristic storage space optimisation algorithm for flow table of openflow

    Journal of Concurrency and Computation: Practice and Experience

    (2014)
  • H. Huang et al.

    Multi-flow oriented packets scheduling in openflow enabled networks

  • N. Kang et al.

    Optimizing the one big switch abstraction in software- defined networks

    Proc. The 9th ACM Conference on Emerging Networking Experiments and Technologies, CoNEXT'13

    (2013)
  • Y. Kanizo et al.

    Palette: distributing tables in software-defined networks

    Proc. The 2013 IEEE International Conference on Computer Communicatio (INFOCOM), INFOCOMM'13

    (July. 2013)
  • N. Katta et al.

    Infinite cacheflow in software-defined networks

    Proc. The 3rd Workshop on Hot Topics in Software Defined Networking, HotSDN '14

    (August. 2014)
  • Cited by (6)

    • A Defense Strategy Against LDDoS Attack Aggregation in DCN

      2023, IEEE International Conference on Communications
    • An Enhanced Data Plane for Network Event Processing in Software Defined Networking

      2020, Proceedings - 2020 IEEE 22nd International Conference on High Performance Computing and Communications, IEEE 18th International Conference on Smart City and IEEE 6th International Conference on Data Science and Systems, HPCC-SmartCity-DSS 2020

    Shu Yang received his B·Sc. degree from Beijing University of Posts and Telecommunications and Ph.D. degree from Tsinghua University. He is currently an associate researcher in the College of Computer Science and Software Engineering at Shenzhen University, China. His research interest includes network architecture and high performance router.

    Lu Bai is a graduate student in the College of Computer Science and Software Engineering at Shenzhen University, China. She will get her master degree in 2021. Her research interest includes software-defined network and blockchain.

    Laizhong Cui(corresponding author) received the B.S. degree from Jilin University, Changchun, China, in 2007 and Ph.D. degree in computer science and technology from Tsinghua University, Beijing, China, in 2012. He is currently an associate professor in the College of Computer Science and Software Engineering at Shenzhen University, China. He led the projects of the National Key Research and Development Program of China and the National Natural Science Foundation, and several projects of Guangdong Province and Shenzhen City. His research interests include future Internet architecture, edge computing, big data, IoT, computational intelligence, software-defined network and machine learning.

    Zhongxing Ming is currently an associate research fellow in the College of Computer Science and Software Engineering, Shenzhen University. He received his Ph.D. degree from the Department of Computer Science and Technology, Tsinghua University, in 2015. He obtained his B.Eng. from the College of Software Engineering, Jilin University, in 2009. His research interests include future internet architecture, internet of things, blockchain and edge computing.

    Yulei Wu received the Ph.D. degree in Computing and Mathematics and the B·Sc. degree (1st Class Hons.) in Computer Science from the University of Bradford, United Kingdom, in 2010 and 2006, respectively. He is a Senior Lecturer in the Department of Computer Science with the University of Exeter, United Kingdom. His expertise is on networking and his main research interests include intelligent networking technologies, network slicing and softwarization, etc. He is an Editor of IEEE Transactions on Network and Service Management, Elsevier Computer Networks and IEEE Access. He contributes to major conferences on networking as various roles including a Steering Committee Chair, a General Chair, a Program Chair, and a Technical Program Committee Member. He is a Senior Member of the IEEE, and a Fellow of the HEA (Higher Education Academy.

    Shui Yu is a Professor of School of Computer Science, University of Technology Sydney, Australia. Dr Yu's research interest includes Security and Privacy, Networking, Big Data, and Mathematical Modelling. He is a Senior Member of IEEE, a member of AAAS and ACM, and a Distinguished Lecturer of IEEE Communication Society.

    Hongfei Shen received the B.S. degree from Southeast University, China in 2002. He received the M.S. degree from Nanjing University of Posts and Telecommunications in 2010. He is currently a manager in China Telecom Co. Ltd., Shenzhen Branch.

    Yi Pan is currently a Regents' Professor and Chair of Computer Science at Georgia State University, USA. He has served as an Associate Dean and Chair of Biology Department during 2013–2017 and Chair of Computer Science during 2006–2013. Dr. Pan received his B.E. and M.E. degrees in computer engineering from Tsinghua University, China, in 1982 and 1984, respectively, and his Ph.D. degree in computer science from the University of Pittsburgh, USA, in 1991. His profile has been featured as a distinguished alumnus in both Tsinghua Alumni Newsletter and University of Pittsburgh CS Alumni Newsletter.Dr.Pan'sresearchinterestsincludeparallelandcloudcomputing, big data, and bioinformatics. Dr. Pan has published more than 250 journal papers with over 90 papers published in various IEEE journals. In addition, he has published 194 papers in refereed conferences. He has also coauthored/co-edited 44 books. His work has been cited more than 10,000 times in Google Scholar and his current H-index is 54. Dr. Pan has served as an editor-in-chief or editorial board member for 20 journals including 7 IEEE Transactions. He is the recipient of many awards including an IEEE Transactions Best Paper Award, IEEE conference best paper awards, andseveral other conference and journal best paper awards, 4 IBM Faculty Awards, 2 JSPS Senior Invitation Fellowships, IEEE Outstanding Leadership Award, IEEE BIBE Outstanding Achievement Award, NSF Research Opportunity Award, and AFOSR Summer Faculty Research Fellowship. He has organized many international conferences and delivered keynote speeches at over 60 international conferences around the world.

    This paper is an extendable version of the IEEE LCN 2016 paper (Wu et al., 2016). We made some additional contributions, including: propose an incremental update algorithm, add some experiments, use P4 switch.

    View full text