Introduction to Machine Learning with scikit-learn

Learn the fundamentals of Machine Learning in Python with this FREE 4-hour course!

Who should take this course?

This is the perfect course for you if:

  • You're new to Machine Learning

  • You have Machine Learning experience, but you're new to scikit-learn

  • You've used scikit-learn, but you don't know if you're doing things the "right" way

What topics are covered in the course?

  • What is Machine Learning?
  • Why use scikit-learn?
  • Installing scikit-learn & Jupyter notebook
  • Jupyter notebook basics
  • Machine Learning terminology
  • Machine Learning workflow
  • Loading a dataset using pandas
  • Preprocessing categorical features
  • Model training & prediction
  • Regression with Linear Regression
  • Classification with K-nearest neighbors
  • Classification with Logistic Regression
  • Model evaluation with train/test split
  • Model evaluation with cross-validation
  • Metrics for regression & classification
  • Hyperparameter tuning with grid search & randomized search

The course is AMAZING! Well-structured and very informative. I like the level of depth and the length of each module. I also love the additional resources you put together, and your brief introduction to these resources at the end of each module.

I can really apply what I learned to my job. You found a good balance between practical and theoretical. I've followed your YouTube channel and will continue learning.

- W.Z.

Why should you learn scikit-learn?

If you want to solve any Machine Learning problem in Python, then I always recommend using scikit-learn. Here's why:

  • It provides a consistent interface to a huge number of Machine Learning models

  • It offers many options and tuning parameters (but uses sensible defaults)

  • It includes a rich set of functionality to support the entire Machine Learning workflow

  • It has exceptional documentation

  • There is an active community of researchers and developers who continue to improve and support the library

In fact, more than 80% of data scientists use scikit-learn, according to Kaggle's recent "State of Machine Learning" report.
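To give a rough sense of what a "consistent interface" means in practice, here is a minimal sketch (not taken from the course notebooks). It trains two different classifiers with exactly the same `fit` and `predict` calls. The iris dataset, the two models, and the parameter values (`n_neighbors=5`, `max_iter=1000`) are assumptions chosen purely for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

# Load a small built-in dataset as a feature matrix X and a target vector y
X, y = load_iris(return_X_y=True)

# Two very different models (illustrative choices, not the course's exact code)
models = [
    KNeighborsClassifier(n_neighbors=5),
    LogisticRegression(max_iter=1000),
]

predictions = {}
for model in models:
    # The same two method calls work for every scikit-learn classifier
    model.fit(X, y)
    predictions[type(model).__name__] = model.predict(X[:3])

for name, predicted in predictions.items():
    print(name, predicted)
```

Swapping in another scikit-learn classifier should only require changing the entries in `models`; the loop itself stays the same.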

Aren't these videos already on YouTube?

I uploaded this series to YouTube in 2015, and it has since gotten more than 2 million views. Even today, I believe these videos are the single best way to learn the fundamentals of Machine Learning and scikit-learn.

Here's why you'll have a better learning experience by taking the course here:

  1. You can watch the videos without ads

  2. You can save your progress at any time, and return later to the exact same spot

  3. You can download the course notebooks, which I've updated to use Python 3.9 and scikit-learn 0.23

  4. Below each video, you can read my detailed notes about what has changed since the recording

  5. After each video, you can take an interactive quiz to check your understanding (50+ questions total)

  6. You can access my recommended resources for deepening your understanding (80+ links)

  7. You can post your own questions, and I'll do my best to respond

  8. After completing the course, you'll receive a certificate of completion

Join 10,000+ happy students...

Diogo G.

A M A Z I N G ! In one day I've learned what I need to get into Machine Learning in Python and scikit-learn.

Mo Daghlas

The way you break down steps and deliver new information is fantastic! It doesn’t feel rushed at all, and you take the time to explain all the new terminology, steps and methodology concisely.

Bob H.

Your new ML videos are fantastic. They assume nothing and explain everything.

Rafael K.

This is the best introduction to Machine Learning I have *EVER* seen. Thank you for fueling my confidence that I can master this subject!

B. F.

This is the first time I've studied Machine Learning. You are doing an outstanding job of transforming it from a science fiction term into a tangible subject.

Florine N.

This video will get me a promotion at work <3 Thank you so much!

Kashyap M.

I've always wanted to learn about Machine Learning, but any videos I found were either too complicated or too long. I found your videos and it is like a golden oasis.

Guillaume B.

Your videos are absolutely incredible. I have just completed the course on Machine Learning with Python and I can say I understood every single thing thanks to your excellent teaching style and skills.

Robin B.

This was a FANTASTIC video series. You are very easy to follow and this was the first resource I found that really walked through the Python language basics in terms of Machine Learning. Also this really helped me understand the documentation on scikit-learn so that I can apply it to more complicated models.

Navin K.

I have seen a lot of stuff on Machine Learning but couldn't get it. Your work is amazing. You let me know that I can learn ML too.

FAQs

What do I need to know before the course?

  • You don't need to have any experience with Machine Learning.
  • You do need to know how to write basic Python code. If you're new to Python, I recommend enrolling in Python Essentials for Data Scientists first.

What software do I need to install?

  • If you want to code along with me, you'll need to install the following Python libraries: scikit-learn, pandas, matplotlib, and seaborn.
  • You can use any code editor that you like, though I'll be using the Jupyter notebook.
  • The easiest way to install these libraries (plus the Jupyter notebook) is by downloading the free Anaconda distribution, which I discuss in the course.

What software versions do I need?

  • I updated the course notebooks using Python 3.9 and scikit-learn 0.23, but they should work perfectly with any version of Python 3 and any recent version of scikit-learn (0.20 or later).
  • I recorded the videos using Python 2.7 and scikit-learn 0.16, so there will be some differences between the notebooks and the videos. However, below each video I explain what changes I made to the code and why.
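If you're unsure which versions you have installed, a quick check along these lines may help. This is a minimal sketch, assuming scikit-learn is already installed; the helper name `major_minor` is hypothetical, and the thresholds simply mirror the versions mentioned above (Python 3, scikit-learn 0.20 or later).

```python
import re
import sys

import sklearn


def major_minor(version):
    """Return the leading (major, minor) numbers of a version string."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


# Thresholds taken from the FAQ above: Python 3 and scikit-learn 0.20 or later
python_ok = sys.version_info.major == 3
sklearn_ok = major_minor(sklearn.__version__) >= (0, 20)

print("Python", sys.version.split()[0], "-", "OK" if python_ok else "Python 3 needed")
print("scikit-learn", sklearn.__version__, "-", "OK" if sklearn_ok else "0.20+ needed")
```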

What if I need help during the course?

You can post a question below any video, and I'll do my best to respond!

How do I earn a certificate of completion?

Once you have watched all of the lessons and attempted all of the quizzes, you can request a certificate of completion.

How long will I have access to the course?

You will have lifetime access to the videos, quizzes, and notebooks.

What course should I take after this one?

You should take my follow-up course, Master Machine Learning with scikit-learn.

Should I learn deep learning instead of scikit-learn?

For many Machine Learning problems, scikit-learn will be the only library you need. However, there are some specialized problems for which a deep learning library can generate superior results.

That being said, deep learning does have some significant disadvantages:

  • Deep learning requires more computational resources
  • Deep learning libraries have a steeper learning curve
  • Deep learning models are less interpretable

Thus, I only recommend using deep learning if you already know that you need it to solve your particular problem. But for the majority of Machine Learning problems, you are likely to get similar results using scikit-learn, and you will get those results much faster and more easily.

Course Outline

Introduction

Welcome to the course!
Download the course notebooks

1. What is Machine Learning, and how does it work?

  • What is Machine Learning?
  • What are the two main categories of Machine Learning?
  • What are some examples of Machine Learning?
  • How does Machine Learning "work"?
Lesson 1
Quiz 1

2. Setting up Python for Machine Learning: scikit-learn and Jupyter Notebook

  • What are the benefits and drawbacks of scikit-learn?
  • How do I install scikit-learn?
  • How do I use the Jupyter Notebook?
  • What are some good resources for learning Python?
Lesson 2
Quiz 2

3. Getting started in scikit-learn with the famous iris dataset

  • What is the famous iris dataset, and how does it relate to Machine Learning?
  • How do we load the iris dataset into scikit-learn?
  • How do we describe a dataset using Machine Learning terminology?
  • What are scikit-learn's four key requirements for working with data?
Lesson 3
Quiz 3

4. Training a Machine Learning model with scikit-learn

  • What is the K-nearest neighbors classification model?
  • What are the four steps for model training and prediction in scikit-learn?
  • How can I apply this pattern to other Machine Learning models?
Lesson 4
Quiz 4

5. Comparing Machine Learning models in scikit-learn

  • How do I choose which model to use for my supervised learning task?
  • How do I choose the best tuning parameters for that model?
  • How do I estimate the likely performance of my model on out-of-sample data?
Lesson 5
Quiz 5

Intermission

Can I ask you a quick favor?

6. Data science pipeline: pandas, seaborn, scikit-learn

  • How do I use the pandas library to read data into Python?
  • How do I use the seaborn library to visualize data?
  • What is linear regression, and how does it work?
  • How do I train and interpret a linear regression model in scikit-learn?
  • What are some evaluation metrics for regression problems?
  • How do I choose which features to include in my model?
Lesson 6
Quiz 6

7. Cross-validation for parameter tuning, model selection, and feature selection

  • What is the drawback of using the train/test split procedure for model evaluation?
  • How does K-fold cross-validation overcome this limitation?
  • How can cross-validation be used for selecting tuning parameters, choosing between models, and selecting features?
  • What are some possible improvements to cross-validation?
Lesson 7
Quiz 7

8. Efficiently searching for optimal tuning parameters

  • How can K-fold cross-validation be used to search for an optimal tuning parameter?
  • How can this process be made more efficient?
  • How do you search for multiple tuning parameters at once?
  • What do you do with those tuning parameters before making real predictions?
  • How can the computational expense of this process be reduced?
Lesson 8
Quiz 8

9. Evaluating a classification model

  • What is the purpose of model evaluation, and what are some common evaluation procedures?
  • How is classification accuracy used, and what are its limitations?
  • How does a confusion matrix describe the performance of a classifier?
  • What metrics can be computed from a confusion matrix?
  • How can you adjust classifier performance by changing the classification threshold?
  • What is the purpose of an ROC curve?
  • How does Area Under the Curve (AUC) differ from classification accuracy?
Lesson 9
Quiz 9

10. Building a Machine Learning workflow

  • Why should you use a Pipeline?
  • How do you encode categorical features with OneHotEncoder?
  • How do you apply OneHotEncoder to selected columns with ColumnTransformer?
  • How do you build and cross-validate a Pipeline?
  • How do you make predictions on new data using a Pipeline?
  • Why should you use scikit-learn (rather than pandas) for preprocessing?
Lesson 10
Quiz 10

Conclusion

Can I ask you a quick favor?
Request your certificate of completion
Take another course from Data School!
Earn money by promoting Data School's courses!

👋 Welcome to Data School!

My name is Kevin, and I've taught Data Science in Python to over a million students.

My courses explain data science topics in a clear, thorough, and step-by-step manner.

I'd love to teach you, regardless of your educational background or professional experience.

Thanks for joining me! 🙌