SmartNICs are Complicated, So Intel Pitches NICs That are Smart Enough
Ethernet 800 Series NICs can be tweaked to adjust to workload requirements, and it doesn’t take a PhD.
April 2, 2019
The networks powering hyperscale cloud platforms are changing, and enterprise data center networks are going through a similar transformation, albeit at a much slower pace. A solution Intel announced Tuesday is aimed at this trend.
The ever-growing volume of data has become not only something to be stored but something to be analyzed in many different ways, forcing networks to get both faster and smarter. Microservices and containers, AI workloads, and video are all pushing up the amount of east-west traffic inside and between data centers to levels that overwhelm even 100Gbps network fabrics like Facebook’s.
Hyperscale clouds have been dealing with this for some time. But with more mainstream data centers starting to face the same issues, more mainstream approaches to making networks more intelligent are now emerging.
The Ethernet 800 Series NIC Intel is launching to accompany its new second-generation Xeon Scalable CPUs (also announced Tuesday) has multiple programmable features for customization. You can add new protocols to the ASIC and create dedicated queues into which key application traffic is filtered to improve throughput and latency.
It might sound like a SmartNIC, but it isn’t. The Ethernet 810 Network Adapter is very different from the FPGA-based SmartNICs Microsoft uses inside Azure, or even the multi-core Arm-based SmartNICs Mellanox offers. The complexity of using technologies like DPDK and eBPF to reprogram network cards for your specific workloads puts a solution like Microsoft’s out of reach for many organizations. While the 810 isn’t as flexible, it can be customized, and it’s not as hard to do.
Its Dynamic Device Personalization feature lets you add new profiles for network protocols and tunneling options as they become useful, so the NIC can intelligently filter and route traffic directly to the right virtual machine or into a dedicated application queue. Installing a new profile (or more likely a package of profiles that fits your industry) as a standard device driver is much simpler than programming a SmartNIC and fits better with existing management tools, Intel solution architect Brian Johnson said.
He compared it to routing packages that don’t have complete address labels and must be sent to the mailroom rather than going straight to the right department. “That’s a bottleneck,” he said. “The more we get into the header and the actual payload, the better. If we don’t understand the protocol, all we can do is mark it as a layer-two or layer-three payload. With DDP, instead of the application or the operating system interrogating the packets and separating them out, we’re teaching the NIC, and it just puts them [in the right queue].”
Adding the Point-to-Point Protocol over Ethernet (PPPoE) profile to the network card in a remote-access server could deliver significantly more upstream processing performance, taking full advantage of the 100Gb ports.
Doing this in the NIC is more efficient than handling it in software. “If I want all my MPLS traffic to go on these queues, I don't have to do that in software. Allowing the NIC to do some of the parsing, filtering, and routing is easier than trying to build software to look at this packet and say that goes over here,” Johnson said.
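To make that concrete, here is a rough software analogue, in C, of the kind of per-packet classification a DDP profile moves into the NIC. The function and queue numbers are purely illustrative, not Intel’s driver API; without a matching profile, anything unrecognized falls back to the generic layer-two/layer-three path Johnson describes as the bottleneck.

    /* Illustrative sketch only: what the host would otherwise do in software.
     * A DDP profile teaches the NIC this kind of parsing so packets land in
     * the right queue without the CPU touching them. Untagged Ethernet frames
     * are assumed; the queue numbers are hypothetical. */
    #include <stdint.h>
    #include <string.h>
    #include <arpa/inet.h>

    #define ETHERTYPE_MPLS_UC 0x8847  /* MPLS unicast */
    #define ETHERTYPE_PPPOE_S 0x8864  /* PPPoE session stage */

    enum { QUEUE_DEFAULT = 0, QUEUE_MPLS = 8, QUEUE_PPPOE = 16 };

    /* Pick a receive queue from the EtherType in the frame header. */
    static int classify_frame(const uint8_t *frame)
    {
        uint16_t ethertype;
        memcpy(&ethertype, frame + 12, sizeof(ethertype)); /* after dst+src MAC */

        switch (ntohs(ethertype)) {
        case ETHERTYPE_MPLS_UC:
            return QUEUE_MPLS;    /* MPLS traffic goes to its dedicated queues */
        case ETHERTYPE_PPPOE_S:
            return QUEUE_PPPOE;   /* PPPoE session traffic likewise */
        default:
            return QUEUE_DEFAULT; /* unknown protocol: generic L2/L3 handling */
        }
    }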
There are too many different protocols for security, network edge, and network virtualization to program them all into a single network card; not every network will need them all, and edge protocols are still in flux, with more to be developed in the future. Having a programmable pipeline lets you pick the protocols you need and update them as they evolve.
The GTPv1 and PPPoE profiles will be available when the adapters ship in Q3 of 2019, with others like QUIC, IPsec, VXLAN-GPE, and different MPLS profiles following.
The queues that a DDP profile routes data into might be the new Application Device Queues (ADQ), which group traffic for specific applications into dedicated queues to improve throughput, latency, and the variability of response times (which becomes an issue as compute scales out). Anil Vasudevan, senior principal engineer at Intel’s data center engineering and architecture group, compared ADQ to express lanes on highways.
“By reducing jitter you can scale applications and parallelize tasks with more servers or support more users with existing hardware,” he explained. As well as connecting application execution threads to specific data queues dedicated to the particular app, ADQ lets you partition bandwidth between those queues. On a 100Gb card, for example, you might want to prioritize traffic by dedicating 60Gb to one application and just 20Gb to another.
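The exact ADQ recipe is Intel’s to document, but the Linux plumbing it builds on is ordinary traffic-control machinery: the administrator carves the NIC into traffic-class queue groups, and the application tags its traffic so it can be mapped to the right group. A minimal sketch of that application side in C, assuming the queue groups already exist and using the standard SO_PRIORITY socket option with an arbitrary priority value rather than any Intel-defined constant:

    /* Sketch only: tag a socket's traffic with a priority so Linux's mqprio
     * queueing discipline (which ADQ's queue groups plug into) can map it to
     * a dedicated traffic class. Assumes the administrator has already set up
     * the traffic classes on the NIC; the priority value 3 is arbitrary. */
    #include <stdio.h>
    #include <sys/socket.h>

    int open_tagged_socket(void)
    {
        int sock = socket(AF_INET, SOCK_STREAM, 0);
        if (sock < 0) {
            perror("socket");
            return -1;
        }

        int prio = 3; /* must match a class in the admin's priority-to-TC map */
        if (setsockopt(sock, SOL_SOCKET, SO_PRIORITY, &prio, sizeof(prio)) < 0)
            perror("setsockopt(SO_PRIORITY)");

        /* Connect/bind and use the socket as usual; its transmits now follow
         * the application's dedicated queue set instead of the shared default
         * queues. */
        return sock;
    }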
Intel has contributed ADQ drivers to the Linux 4.18 network stack and is working with Microsoft to get support into Windows Server. The company also plans to use ADQ with AF_XDP (the Address Family eXpress Data Path interface, a kernel-bypass technology it’s contributing to Linux networking) to extend virtual network functions like load balancing and intrusion detection to cloud services, where hardware abstraction makes it difficult to give VMs the exclusive hardware access needed for higher performance. This is again simpler than DPDK: packet steering takes only about seven lines of code, where it would take a thousand lines with DPDK.
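The packet steering AF_XDP relies on is done by a small XDP program, a piece of eBPF that runs in the NIC driver and redirects packets straight into AF_XDP sockets. A sketch of what such a program looks like, written in C, compiled with clang, and loaded via libbpf; the map name and size are illustrative, and the fallback behavior of the flags argument assumes a reasonably recent kernel:

    /* Minimal XDP packet-steering sketch. Packets arriving on a receive queue
     * that has an AF_XDP socket registered in xsks_map are redirected to that
     * socket; everything else continues up the normal kernel stack. Using
     * XDP_PASS as the fallback via the flags argument requires kernel 5.3+. */
    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    struct {
        __uint(type, BPF_MAP_TYPE_XSKMAP);
        __uint(max_entries, 64);
        __uint(key_size, sizeof(int));
        __uint(value_size, sizeof(int));
    } xsks_map SEC(".maps");

    SEC("xdp")
    int xdp_sock_steer(struct xdp_md *ctx)
    {
        /* Redirect to the AF_XDP socket bound to this queue, or pass the
         * packet on to the kernel if no socket is registered. */
        return bpf_redirect_map(&xsks_map, ctx->rx_queue_index, XDP_PASS);
    }

    char _license[] SEC("license") = "GPL";

User space then creates the AF_XDP sockets and registers them in the map, which is where the heavier lifting (and the libbpf boilerplate) lives.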
Network cards that are both smart enough and simple enough could help more organizations solve their network traffic and workload scaling issues. The ability to prioritize specific applications and give them dedicated bandwidth means that you can treat this as a business rather than a technology issue. You can specify the optimal level of latency, throughput, and predictability each application requires.
The same principles of customization and personalization appear in the second-generation Xeon Scalable processors, whose performance-management features give you per-core quality-of-service control, so you can set the same CPU to be optimized either for VM density or for workload performance, and use an orchestrator like Kubernetes to manage that. An Optane-based server can be optimized for a memory-hungry workload like SAP HANA, for example, but if that workload isn’t in demand at a given time, you can retune the Optane memory for other workloads.
“For me, this is part of making the infrastructure more consumable as code in new developments, so they can be secure and match the users’ requirements,” Ovum analyst Roy Illsley told Data Center Knowledge. But, he noted, this technology is still in the very early stages, “so, to succeed it has to be simple.”