Introduction to Machine Learning with scikit-learn

Learn the fundamentals of Machine Learning in Python with this free 4-hour course!
Introduction
Welcome to the course!
Download the course notebooks
1. What is Machine Learning, and how does it work?
  • What is Machine Learning?
  • What are the two main categories of Machine Learning?
  • What are some examples of Machine Learning?
  • How does Machine Learning "work"?
Lesson 1
11 mins
Quiz 1
2. Setting up Python for Machine Learning: scikit-learn and Jupyter Notebook
  • What are the benefits and drawbacks of scikit-learn?
  • How do I install scikit-learn?
  • How do I use the Jupyter Notebook?
  • What are some good resources for learning Python?
Lesson 2
15 mins
Quiz 2
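As a preview of Lesson 2's setup steps, here is a minimal check (not part of the course materials) that scikit-learn is installed and importable, assuming it was installed with a command such as "pip install scikit-learn" or "conda install scikit-learn":

```python
# Verify that scikit-learn is installed and importable
import sklearn

print(sklearn.__version__)  # prints the installed scikit-learn version
```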
3. Getting started in scikit-learn with the famous iris dataset
  • What is the famous iris dataset, and how does it relate to Machine Learning?
  • How do we load the iris dataset into scikit-learn?
  • How do we describe a dataset using Machine Learning terminology?
  • What are scikit-learn's four key requirements for working with data?
Lesson 3
16 mins
Quiz 3
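To preview Lesson 3's topic, this is a minimal sketch of loading the iris dataset and inspecting it in Machine Learning terms (feature matrix and response vector):

```python
# Load the famous iris dataset, which is bundled with scikit-learn
from sklearn.datasets import load_iris

iris = load_iris()

# X is the feature matrix (150 samples x 4 features), y is the response vector
X = iris.data
y = iris.target

print(X.shape)              # (150, 4)
print(y.shape)              # (150,)
print(iris.feature_names)   # sepal/petal length and width (in cm)
print(iris.target_names)    # setosa, versicolor, virginica
```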
4. Training a Machine Learning model with scikit-learn
  • What is the K-nearest neighbors classification model?
  • What are the four steps for model training and prediction in scikit-learn?
  • How can I apply this pattern to other Machine Learning models?
Lesson 4
20 mins
Quiz 4
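As a preview of Lesson 4, this sketch shows one common framing of the model training and prediction pattern in scikit-learn, using K-nearest neighbors; the same pattern applies to other models:

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data, iris.target

# Step 1: import the model class (above)
# Step 2: instantiate the estimator with any tuning parameters
knn = KNeighborsClassifier(n_neighbors=5)

# Step 3: fit the model to the data
knn.fit(X, y)

# Step 4: predict the response for new observations
print(knn.predict([[5.1, 3.5, 1.4, 0.2], [6.3, 2.5, 5.0, 1.9]]))
```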
5. Comparing Machine Learning models in scikit-learn
  • How do I choose which model to use for my supervised learning task?
  • How do I choose the best tuning parameters for that model?
  • How do I estimate the likely performance of my model on out-of-sample data?
Lesson 5
27 mins
Quiz 5
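To preview Lesson 5, this sketch compares two classifiers using a train/test split, one simple way to estimate out-of-sample performance:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=4
)

for model in [LogisticRegression(max_iter=1000), KNeighborsClassifier(n_neighbors=5)]:
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # testing accuracy estimates how the model will perform on unseen data
    print(type(model).__name__, accuracy_score(y_test, y_pred))
```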
6. Data science pipeline: pandas, seaborn, scikit-learn
  • How do I use the pandas library to read data into Python?
  • How do I use the seaborn library to visualize data?
  • What is linear regression, and how does it work?
  • How do I train and interpret a linear regression model in scikit-learn?
  • What are some evaluation metrics for regression problems?
  • How do I choose which features to include in my model?
Lesson 6
35 mins
Quiz 6
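As a preview of Lesson 6's pipeline of pandas, seaborn, and scikit-learn, here is a sketch of a simple regression workflow; the file "advertising.csv" and its column names (TV, radio, newspaper, sales) are illustrative placeholders, not files shipped with the course:

```python
import pandas as pd
import seaborn as sns
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

data = pd.read_csv("advertising.csv")       # read data into a DataFrame
sns.pairplot(data, x_vars=["TV", "radio", "newspaper"], y_vars="sales")  # visualize

X = data[["TV", "radio", "newspaper"]]      # feature matrix
y = data["sales"]                           # response vector
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

linreg = LinearRegression().fit(X_train, y_train)
print(linreg.intercept_, linreg.coef_)      # interpret the fitted coefficients

y_pred = linreg.predict(X_test)
print(mean_squared_error(y_test, y_pred) ** 0.5)  # RMSE as an evaluation metric
```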
7. Cross-validation for parameter tuning, model selection, and feature selection
  • What is the drawback of using the train/test split procedure for model evaluation?
  • How does K-fold cross-validation overcome this limitation?
  • How can cross-validation be used for selecting tuning parameters, choosing between models, and selecting features?
  • What are some possible improvements to cross-validation?
Lesson 7
36 mins
Quiz 7
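To preview Lesson 7, this sketch uses 10-fold cross-validation to evaluate a model and to compare candidate tuning parameters:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Each value of n_neighbors gets 10 accuracy scores; their mean is a more
# reliable performance estimate than a single train/test split.
for k in [1, 5, 15, 25]:
    knn = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(knn, X, y, cv=10, scoring="accuracy")
    print(k, scores.mean())
```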
8. Efficiently searching for optimal tuning parameters
  • How can K-fold cross-validation be used to search for an optimal tuning parameter?
  • How can this process be made more efficient?
  • How do you search for multiple tuning parameters at once?
  • What do you do with those tuning parameters before making real predictions?
  • How can the computational expense of this process be reduced?
Lesson 8
28 mins
Quiz 8
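As a preview of Lesson 8, this sketch searches multiple tuning parameters at once with GridSearchCV; RandomizedSearchCV (not shown) samples a fixed number of parameter combinations to reduce computational expense:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

param_grid = {
    "n_neighbors": list(range(1, 31)),
    "weights": ["uniform", "distance"],
}
grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=10, scoring="accuracy")
grid.fit(X, y)

print(grid.best_params_, grid.best_score_)

# By default (refit=True), the best parameters are refit on all of the data,
# so the resulting estimator can be used directly for real predictions.
print(grid.best_estimator_.predict([[5.1, 3.5, 1.4, 0.2]]))
```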
9. Evaluating a classification model
  • What is the purpose of model evaluation, and what are some common evaluation procedures?
  • How is classification accuracy used, and what are its limitations?
  • How does a confusion matrix describe the performance of a classifier?
  • What metrics can be computed from a confusion matrix?
  • How can you adjust classifier performance by changing the classification threshold?
  • What is the purpose of an ROC curve?
  • How does Area Under the Curve (AUC) differ from classification accuracy?
Lesson 9
55 mins
Quiz 9
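To preview Lesson 9, this sketch computes a confusion matrix and related metrics, adjusts the classification threshold, and reports AUC on a binary problem; the breast cancer dataset is used here purely for illustration:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score, recall_score, roc_auc_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

logreg = LogisticRegression(max_iter=5000).fit(X_train, y_train)

y_pred = logreg.predict(X_test)                 # default 0.5 threshold
print(confusion_matrix(y_test, y_pred))
print(accuracy_score(y_test, y_pred), recall_score(y_test, y_pred))

# Lowering the threshold increases sensitivity (recall) at the cost of specificity
y_prob = logreg.predict_proba(X_test)[:, 1]     # predicted probability of class 1
y_pred_low = (y_prob > 0.3).astype(int)
print(recall_score(y_test, y_pred_low))

# AUC summarizes classifier performance across all possible thresholds
print(roc_auc_score(y_test, y_prob))
```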
10. Building a Machine Learning workflow
  • Why should you use a Pipeline?
  • How do you encode categorical features with OneHotEncoder?
  • How do you apply OneHotEncoder to selected columns with ColumnTransformer?
  • How do you build and cross-validate a Pipeline?
  • How do you make predictions on new data using a Pipeline?
  • Why should you use scikit-learn (rather than pandas) for preprocessing?
Lesson 10
28 mins
Quiz 10
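As a preview of Lesson 10, this sketch builds and cross-validates a Pipeline that one-hot encodes selected columns with ColumnTransformer; the tiny DataFrame and its column names are made up for illustration and are not the course's actual dataset:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "female", "male"],
    "Embarked": ["S", "C", "S", "Q", "S", "C"],
    "Fare": [7.25, 71.28, 8.05, 8.46, 53.1, 8.05],
    "Survived": [0, 1, 1, 0, 1, 0],
})
X = df[["Sex", "Embarked", "Fare"]]
y = df["Survived"]

# One-hot encode the categorical columns; pass the numeric column through unchanged
ct = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), ["Sex", "Embarked"])],
    remainder="passthrough",
)

pipe = Pipeline([("preprocess", ct), ("model", LogisticRegression())])

# Cross-validate the entire workflow, then refit and predict on new data
print(cross_val_score(pipe, X, y, cv=3, scoring="accuracy").mean())
pipe.fit(X, y)
new = pd.DataFrame({"Sex": ["female"], "Embarked": ["S"], "Fare": [10.5]})
print(pipe.predict(new))
```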
Conclusion
Request your certificate of completion
Take another course from Data School!