This is part 1 of a 5-part tour of machine learning in Python. In this part I give a high-level overview of optimization methods for machine learning, and in particular for training models. Most importantly, we see the motivation for gradient descent and some justification for this hugely popular approach, which is the foundation of many other model-training methods. As much as possible, I take a "from scratch" approach, avoiding high-level libraries like scikit-learn.
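To give a flavor of what "from scratch" gradient descent looks like, here is a minimal sketch. The objective function, starting point, learning rate, and step count are all illustrative choices of mine, not taken from the series: we minimize f(x) = (x - 3)^2, whose derivative is f'(x) = 2(x - 3), by repeatedly stepping against the gradient.

```python
def grad(x):
    # Derivative of the toy objective f(x) = (x - 3)^2
    return 2.0 * (x - 3.0)

x = 0.0    # starting point (arbitrary)
lr = 0.1   # learning rate, i.e. step size
for _ in range(100):
    x -= lr * grad(x)  # move opposite the gradient to decrease f

# x ends up very close to the true minimizer, x = 3
```

Each iteration shrinks the distance to the minimizer by a constant factor here, which is why even this tiny loop converges quickly; real models replace the scalar x with a parameter vector and f with a loss over the training data.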