Learning ML

Or, in other words, learning how to learn how machines are learning.

I’ve spent the last two years on a self-guided learning journey to better understand AI/ML and how it can be creatively applied to the networking field. There are some great resources out there, but there are also a LOT of resources out there, so this blog post is an attempt to share what has worked well for me and what I would change if I had to start the journey all over again.

Why machine learning?

Let’s start with the Gartner definition of AIOps.

AIOps combines big data and machine learning to automate IT operations processes, including event correlation, anomaly detection and causality determination.

Next-generation network hardware generates a lot of telemetry data, and the network is more critical than ever. The network team needs to be able to rapidly sift through mountains of data to troubleshoot effectively. This is not an easy task if you’re going about things manually.

Enter AIOps.

With supervised learning, we train the model on data that already has the answers (the labels), and we use this information to uncover rules that are unique to our environment. For example, you could estimate the CPU utilization of one of your routers given table state, throughput, and other stats. Or you could determine how likely it is that you are dealing with a faulty cable when an AP only comes up at 100Mbps. Or you could determine how much latency will increase for all users on a single 5GHz radio running at 55% RF utilization when you place a 4K video stream onto that same AP. Rules like these help determine the health of your network and can trigger a rapid response if something starts to go awry.
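To make the router CPU example a bit more concrete, here is a minimal sketch of supervised learning using ordinary least squares in NumPy. Everything about the data is an assumption for illustration: the feature names, the value ranges, and the "true" relationship between throughput, table size, and CPU are all made up, and a real router will behave differently.

```python
import numpy as np

# Synthetic, made-up training data (an assumption for illustration only).
rng = np.random.default_rng(seed=0)
n_samples = 200
throughput_gbps = rng.uniform(0.1, 10.0, n_samples)
table_entries_k = rng.uniform(10.0, 500.0, n_samples)  # thousands of table entries

# Hypothetical relationship plus noise. These are the "answers" (labels)
# that the model gets to see during training.
cpu_pct = (
    5.0
    + 4.0 * throughput_gbps
    + 0.05 * table_entries_k
    + rng.normal(0.0, 1.0, n_samples)
)

# Design matrix: a column of ones for the intercept, then one column per feature.
X = np.column_stack([np.ones(n_samples), throughput_gbps, table_entries_k])

# Ordinary least squares: find the weights that minimize squared prediction error.
weights, *_ = np.linalg.lstsq(X, cpu_pct, rcond=None)


def predict_cpu(throughput, table_entries):
    """Estimate CPU % for a throughput (Gbps) and table size (thousands of entries)."""
    return float(weights @ np.array([1.0, throughput, table_entries]))


estimate = predict_cpu(6.0, 250.0)
print("learned weights (intercept, throughput, table):", np.round(weights, 3))
print(f"estimated CPU at 6 Gbps with 250k entries: {estimate:.1f}%")
```

The learned weights are the "rule" for this (fake) environment. In practice you would train on telemetry exported from your own devices and hold some of it back to check how well the model generalizes.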

In short, ML/AI techniques enable the next level of network automation.

Build the foundation first:

If you are interested in learning more about how machines are learning, it’s critical to start with a strong foundation. Many of the more advanced ML/AI courses recommend having experience with Python, linear algebra, calculus, and statistics. Unfortunately… and this was very humbling on my personal journey… sometimes they are not kidding about those prerequisites. You can grasp the intuition of how a model works without the more arcane layers of math but it can really help you lock things in if you understand what the mathematical formula means.

The importance of prerequisites is particularly true with Python. Many curriculums use Python-based Jupyter notebooks. If you are new to Python you run the risk of struggling with the syntax when you should be struggling with the theory behind the models. I highly recommend this course on Udemy to brush up on Python:

https://www.udemy.com/course/100-days-of-code/?course_id=2776760

I’d also avoid starting with deep learning techniques. Everyone is excited about the new cutting-edge systems but having ML fundamentals in place first is a big help.

Exploratory Data Analysis is important:

“Even in such technical lines as engineering, about 15% of one’s financial success is due to one’s technical knowledge and about 85% is due to skill in human engineering, to personality and the ability to lead people.” -Dale Carnegie

I’d argue that the foundational 15% is extremely important, but I believe that the underlying point here is solid – if you are not able to communicate your ideas effectively you will be at a severe disadvantage.

One of the first (and more involved) stages when building out a new model is the Exploratory Data Analysis (EDA) phase. This involves sifting through the dataset, identifying valid features, feature engineering, looking for initial correlations and trends, and other similar tasks. There are common software packages that come in handy, like Pandas/NumPy for working with datasets and Seaborn/Matplotlib/GGPlot for visualizing the resulting data. Learning EDA techniques has made a huge difference in my personal troubleshooting approach.
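As a rough illustration of what a first EDA pass can look like, here is a small Pandas sketch. The access point names, column names, and values are all hypothetical and invented for this example. In practice the DataFrame would be loaded from your own telemetry export.

```python
import pandas as pd

# Made-up per-AP telemetry (hypothetical column names and values).
df = pd.DataFrame({
    "ap_name": ["ap-01", "ap-02", "ap-03", "ap-04", "ap-05", "ap-06"],
    "link_speed_mbps": [1000, 1000, 100, 1000, 100, 1000],
    "client_count": [12, 30, 8, 45, 10, 22],
    "rf_utilization_pct": [18.0, 41.0, 15.0, 63.0, None, 35.0],
    "latency_ms": [4.1, 9.8, 3.9, 17.5, 4.4, 8.0],
})

# 1. Data quality: count missing values per column.
missing = df.isna().sum()

# 2. Summary statistics for the numeric columns.
summary = df.describe()

# 3. Simple feature engineering: flag APs that negotiated below gigabit,
#    which may be worth a cabling check.
df["suspect_cabling"] = df["link_speed_mbps"] < 1000

# 4. Initial correlations between a few numeric features.
corr = df[["client_count", "rf_utilization_pct", "latency_ms"]].corr()

print(missing)
print(summary.loc[["mean", "min", "max"]])
print(df.loc[df["suspect_cabling"], "ap_name"].tolist())
print(corr["latency_ms"].round(2))
```

From here, a Seaborn or Matplotlib plot of the strongest correlations is usually the next step, since a chart is often easier to share with the rest of the team than a table of numbers.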

Often in networking we have to give the answer “It Depends.” I don’t think it’s possible to get away from that answer entirely as there are a lot of variables. But by harnessing the power of APIs to programmatically pull data and then leveraging popular data visualization techniques you can cleanly describe network state and make data-driven decisions.

I’ve published a complementary blogpost to demonstrate how you can use Jupyter notebooks to programmatically troubleshoot a large-scale network.

DON’T FORGET ABOUT THE INFRASTRUCTURE:

This is a recreation of an illuminating graph that was presented by Andrej Karpathy, director of AI at Tesla:

Many courses and materials are focused on the theory behind the model. But when implementing all of this into a production environment the model itself is only one small piece of the puzzle – most of the work involved is curating the data itself. It can be very useful to understand the data formats, data storage options, data pipelines, training schedules, and other similar infrastructure-side systems.

I personally enjoyed the AWS machine learning curriculum. If you want to learn about the differences between data lakes and data warehouses check this course out:

https://www.udemy.com/course/aws-certified-machine-learning-engineer-associate-mla-c01/?course_id=6125705

General Recommendations:

I have a multi-threaded learning style – I like to pull from multiple sources at once and tie concepts together as I go. I did not follow this exact list going cover-to-cover. But I highly recommend these resources and the order is somewhat intentionally tailored.

Becoming a Data Head – Alex J. Gutman and Jordan Goldmeier

This is a great overview of the field that focuses on how to think, speak, and understand data science, statistics, and machine learning. In other words – this should give you a good idea if data science is something that you want to pursue further.

Stanford | DeepLearning.AI Machine Learning Specialization

I wish I had started with this course. Andrew Ng has a great presentation style and this curriculum does a great job of laying the foundations. This is a self-paced, three-course series that will likely take 2-3 months to complete.

https://www.coursera.org/specializations/machine-learning-introduction/

Machine Learning for Network and Cloud Engineers – Javier Antich

This is a uniquely valuable perspective that highlights how network engineers can take advantage of AI/ML concepts. Other books will focus on more general data science projects like accurately estimating home prices – but Javier gives examples of how you can use clustering techniques to identify issues across your SD-WAN deployment.

Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow – Aurélien Géron

This is a great reference book that shows hands-on examples of topics ranging from decision trees to advanced generative adversarial networks. The second chapter alone is worth the price of admission as it shows an end-to-end machine learning project that you can spin up in your own environment.

Designing Machine Learning Systems – Chip Huyen

Finally, this book does a great job of highlighting what it takes to actually stand up AI/ML projects at scale. It is easy to read, with lots of great examples from production projects.

IN CLOSING:

It’s a really fun time to be in the networking space. Hopefully this turns out to be a helpful list for those of you who are interested in brushing up on AI/ML techniques and learning how they can help optimize your network.

As a quick disclaimer… I am not getting any kind of referral bonuses or similar perks from these recommendations and these recommendations are my own; they are not a reflection on my employer. I am still continuing this journey myself as well so if you find something else out there that you think is valuable – please let me know!
