How DevOps Gained a Foothold in the Data Center Network
Expert silos and manual methods are on their way out, as scale and software-defined networking demand ever more team integration and automation.
September 10, 2018
Adopting Agile development processes for IT operations and infrastructure management, making development and deployment faster and more consistent through automation (an approach known as DevOps), is now widespread. But what about automating the data center network underneath? The most popular Linux-based automation and orchestration tools, like Ansible, Chef, and Puppet, have started to gain a foothold in network management, especially with the arrival of software-defined and intent-based networking and the rise of open networking based on options like Cumulus Linux.
Deepak Giridharagopal, CTO of Puppet, shared observations along those lines in an interview with Data Center Knowledge. It is rare these days to talk with someone who manages, operates, or buys data centers and isn't thinking about better automation for all aspects of configuring and maintaining the network infrastructure, he said.
The primary drivers of greater automation are cost pressures and IT's desire for higher levels of abstraction for networks. “There's a greater desire to have IT move forward more quickly and respond to change more quickly, but also have a higher degree of reliability, and to do that at lower cost,” Giridharagopal explained.
Chef’s customers who manage data centers view everything through the lens of automation and tools, Corey Scobie, senior VP for product and engineering at the company, told us. “Their core criteria [are] automatability and self-service. Our customers are investing in their data center with the idea that they're going to operate it very much like an internal cloud service for their own customers.”
A recent survey by F5 Networks and Red Hat Software shows broad adoption of DevOps-style practices for “NetOps.” Lori MacVittie, principal technical evangelist at the F5 office of the CTO, found the adoption surprisingly broad:
48 percent of the organizations are using automation for network configuration management
29 percent use it for compliance checks
26 percent for upgrades
22 percent for testing before or after making changes
24 percent use automation to monitor service availability
These developments are recent, MacVittie told us. “Networking is almost tied with security as being the last holdout where things have not been getting automated,” she said.
Those practices aren’t ubiquitous, even in the teams that say they’re using DevOps principles. More than 75 percent of organizations in the study are still using manual methods to configure VLANs, routers, network access controllers, firewalls, and security services. “They’re embracing it where it makes the most sense to achieve efficiencies but not every project is automated,” MacVittie said.
Projects for applications built through a DevOps pipeline, especially with microservices, are more likely to be automated, she suggested. “Developers are saying we have to move faster, so you have to give me the ability to provision some of these things and move at my own speed. My app needs to be updated every three weeks, and you’re not managing that schedule. Cloud would give it to us, containers enables us to do it, but you’re blocking us.”
Productivity and Uptime at Scale
Automation has obvious productivity advantages, freeing up engineers to design networks for the new application architectures, such as container-based ones. Network admins also get a lot of benefits from reusable, scalable, software-defined automation, testing, and deployment of network resources. Things traditionally configured manually are automated using software, the same way developers and operations teams do it – the concept known as “infrastructure as code.”
And it’s not just about speed, convenience, or support for new architectures. Application uptime is at stake here as well. “If you want to do something at scale, if you have a lot of networking gear, your topology is complex and distributed,” Giridharagopal said. “If you do things by hand, the likelihood [that] you're going to make an error is extremely high.”
According to Uptime Institute, 70 percent of data center outages are directly attributable to human error. Network configuration errors are often to blame.
In addition to automated scripts, tools like Puppet enable engineers to preview changes in advance. “If you model your network as code we can tell you what is that code going to do before you make it go live,” Giridharagopal said. “We can test it; we can simulate what would happen, and you can't do that if you're cut-and-pasting from Notepad, or if you're transcribing things from a wiki page.”
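The idea of previewing a change before it goes live can be illustrated with a minimal sketch. This is not how Puppet implements it; it only assumes that the running and intended configurations are available as plain text, and it uses Python's standard `difflib` to show what would change. The function name `preview_change` and the sample interface settings are hypothetical.

```python
import difflib


def preview_change(running: str, intended: str) -> list[str]:
    """Return a unified diff of the running config against the intended one.

    An empty list means applying the intended config would change nothing.
    """
    return list(
        difflib.unified_diff(
            running.splitlines(),
            intended.splitlines(),
            fromfile="running",
            tofile="intended",
            lineterm="",
        )
    )


# Hypothetical configuration snippets, for illustration only.
RUNNING = """interface eth1
 description uplink
 mtu 1500
"""

INTENDED = """interface eth1
 description uplink
 mtu 9000
"""

if __name__ == "__main__":
    # Print the pending change so it can be reviewed before anything is pushed.
    for line in preview_change(RUNNING, INTENDED):
        print(line)
```

Because the configuration lives in version-controlled text rather than in a terminal session, a diff like this can be reviewed or tested before any device is touched.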
He said he still meets network admins who update core switches and routers by creating a configuration file in a text editor or a spreadsheet, opening up a telnet window or an SSH connection into the router, and copying and pasting. “That’s not maintainable, and it makes it easy to make a mistake. At this point, any new networking gear that comes out has at least a passable management API or service area in front of it. It might be primitive, it might be low-level, but at least it's at a higher, more useful level than just passing a bunch of commands into a terminal session when you're logged into a single switch.”
Orchestration helps with hardware failures too. If a fan or a power supply fails in a switch, monitoring tools can alert you to the problem, and you can run an Ansible playbook, a Puppet manifest, or a Chef cookbook to take the faulty device out of the routing fabric so it can be repaired or replaced. If traffic levels surge from a DDoS attack, monitoring and automation can push an iptables rule to the network hardware to block specific network sources or the services the attackers are targeting.
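As a rough sketch of the DDoS case, the snippet below turns a sample of observed source addresses into iptables DROP rules for any source that exceeds a threshold. The function name `drop_rules`, the threshold, and the sample addresses (taken from documentation ranges) are assumptions for illustration; the code only builds the command strings and leaves pushing them to devices to whatever automation tool is in use.

```python
import ipaddress
from collections import Counter


def drop_rules(sources: list[str], threshold: int) -> list[str]:
    """Build iptables DROP rules for sources seen more than `threshold` times.

    `sources` is a list of source IP strings, one per observed packet or flow.
    The rules are returned as strings and are not executed here.
    """
    counts = Counter(sources)
    rules = []
    for src, hits in sorted(counts.items()):
        if hits > threshold:
            # Validate the address so malformed input never reaches a command.
            addr = ipaddress.ip_address(src)
            rules.append(f"iptables -A INPUT -s {addr} -j DROP")
    return rules


if __name__ == "__main__":
    # Hypothetical traffic sample: one noisy source, one quiet one.
    sample = ["203.0.113.7"] * 5 + ["198.51.100.2"] * 2
    for rule in drop_rules(sample, threshold=3):
        print(rule)
```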
More and more data center operators are willing to replace old processes and technology so they can automate better. Automating legacy networks is often too complicated.
“The labor to automate an old-world network is somewhere between enormous and impossible,” Chef’s Scobie said. “Automation is about writing some code that will do change management across a broad scope of hardware in a ubiquitous way; if there is no part of the network that looks ubiquitous or symmetrical, that makes it incredibly hard to do.”
A common solution has been to leave a traditionally-run segment of a data center in place, while building out a new segment that’s automation-ready.
Land of Python and Ruby
Despite rising interest in automation tools, F5’s survey shows that Python scripts are still the most common tool for network operations, followed by Ansible playbooks, which run from any Python-enabled system and have the widest range of modules to support different network hardware. “Ansible is a great fit for network operations because it’s agentless, it’s easy to get automating quickly, and it reuses existing network CLI commands and other knowledge directly in Ansible Playbooks,” Andrius Benokraitis, product manager for network automation at Red Hat Ansible, told us.
Even network devices that weren’t designed to have software agents installed can be managed via SSH or similar API connections, so Ansible can connect to almost all popular networking hardware and SDN (software-defined network) controllers. “The same manual CLI commands NetOps know and love can be used directly inside Ansible Playbooks with little friction,” Benokraitis said.
Chef and Puppet are the next most popular tools, although Scobie admits there’s a learning curve. “Our foundational technology is based on Ruby, so to create a cookbook or write a recipe, you have to write some Ruby code, and that’s probably a pretty long way away from the typical network operator today,” he said. The company is working on tools like Chef Workstation to simplify that by letting admins pick and choose from community-created content for their mix of network devices and platforms.
The last 10 percent of networking tools in the report is listed only as “other” but likely includes options like PowerShell (which runs on both Windows and Linux), especially for hyperconverged infrastructure. For example, Cisco UCS Manager supports PowerShell and Desired State Configuration as well as Python.
Whatever tools they pick for network automation and orchestration, network admins will need both a broader skillset and a different approach, Giridharagopal suggested. “As time goes on, managing a complex network looks less like managing devices and hardware assets and starts to look more like managing software, which is very different from how we’ve managed infrastructure for a long time.”
This goes beyond switching from interactive commands to declarative scripts. A lot of it is thinking about scale and patterns. “If I'm doing this across 5,000 devices at the same time, what does error handling look like? When do I push this out? Can it be triggered automatically or by some event?”
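One way to think about error handling at that scale is a batched rollout that stops once too many devices have failed. The sketch below is an assumption-laden illustration, not any vendor's method: `rollout`, `fake_apply`, the batch size, and the failure limit are all hypothetical, and `apply` stands in for whatever call actually pushes a change to a device.

```python
from typing import Callable


def rollout(
    devices: list[str],
    apply: Callable[[str], None],
    batch_size: int,
    max_failures: int,
) -> dict:
    """Apply a change batch by batch, halting once failures exceed a limit.

    Returns the devices that succeeded, those that failed (with the error
    message), and those skipped because the rollout was halted.
    """
    done: list[str] = []
    failed: list[tuple[str, str]] = []
    for start in range(0, len(devices), batch_size):
        for device in devices[start:start + batch_size]:
            try:
                apply(device)
                done.append(device)
            except Exception as exc:  # record the error and keep going
                failed.append((device, str(exc)))
        # Check after each batch so a bad change does not reach every device.
        if len(failed) > max_failures:
            skipped = devices[start + batch_size:]
            return {"done": done, "failed": failed, "skipped": skipped}
    return {"done": done, "failed": failed, "skipped": []}


def fake_apply(device: str) -> None:
    """Hypothetical stand-in for a real push; fails on two devices."""
    if device in {"sw2", "sw3"}:
        raise RuntimeError(f"{device}: connection refused")


if __name__ == "__main__":
    names = [f"sw{i}" for i in range(10)]
    result = rollout(names, fake_apply, batch_size=4, max_failures=1)
    print(result)
```

In this run the first batch of four produces two failures, which exceeds the limit of one, so the remaining six devices are left untouched for an operator to investigate.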
The biggest challenge is that the current ecosystem of automation and orchestration tools is still geared toward developers and operating application infrastructure, F5’s MacVittie noted. Declarative models used by tools like Ansible remove a lot of complexity, but they don’t remove the need for network expertise, she said. “Right now, we’re at a halfway point.” A network operator still relies on IP addresses, still has to understand how things travel on the network, and what services to define. You can’t do any network automation without understanding those things.
A new type of hybrid team that combines networking, operations, and security expertise has now emerged. “But they’re under the same umbrella, as opposed to being three teams with different command-and-control structures,” Scobie said. That means taking the core principles of DevOps, which are less about specific tools and more about a culture of collaboration between developers and operations teams, and applying them throughout the business, including to the way the data center network is managed.