Building an Effective Machine Learning Workflow with scikit-learn

  • Closed

Note: This course is no longer available for enrollment, and has been replaced by Master Machine Learning with scikit-learn.

What will I learn in this course?

In this 8-hour course, you'll learn:

  • How to prepare complex datasets for Machine Learning using scikit-learn

  • How to handle common scenarios such as missing values, text data, and categorical data

  • How to build a reusable and efficient workflow that starts with a pandas DataFrame and ends with a trained scikit-learn model

  • How to integrate feature engineering, selection, and standardization into your workflow

  • How to avoid data leakage so that you can correctly estimate model performance

  • How to tune your entire workflow for maximum performance

By the end of the course, you'll be more confident when tackling new Machine Learning problems because you'll understand what steps you need to take, why you need to take them, and how to correctly execute those steps using scikit-learn.

And because you're learning a better way to work in scikit-learn, your code will be easier to write and to read, and you'll get better Machine Learning results faster than before!

Who should take this course?

This is the perfect course for you if:

  • You've taken my introductory course and you're ready to go deeper into scikit-learn

  • You want to write efficient, readable, and reusable scikit-learn code that integrates well with pandas

  • You want to properly handle common data issues such as missing values, text data, and categorical data

  • You want to tune your entire workflow for maximum performance

  • You want to take advantage of the latest scikit-learn features

What's included in the course?

  • Video recordings of the core lessons (4.5 hours)
  • Video recordings of the office hours (3.5 hours), during which I answered 45 student questions in detail
  • Jupyter notebooks for the core lessons and office hours, including my detailed lesson notes (9,000+ words) for easy reference
  • Certificate of completion at the end of the course
  • Lifetime access to all course materials

Join 450+ happy students...

Khaled Jafar (Director of Analytics)

This was one of the best data science classes I have ever taken... I was impressed with Kevin's easy-to-understand teaching style where he clearly explains the 'what' and 'why' of each principle... I highly recommend this course.

Abla Elsergany (MS in Advanced Analytics)

This course takes you through some of the challenges we face with real data, which is not always the case in other courses... If you are familiar with Machine Learning but need to know how to apply it using scikit-learn, then this course is definitely for you!

Les Guessing (Creative Director at Creative Algorithm)

Learning Machine Learning is a bit of a zig zag process. A little from here. A little from there. Kevin Markham is BETTER THAN ANYBODY at pulling all those pieces together so you can use them and understand what you're doing.

Mike F. (Data Scientist)

This class will not only save me a lot of time in the future, but will also ensure that my models will be robust to data leakage... The explanations and demonstrations are worth the price of admission.

Pranjal Chaubey (AI Mentor at Udacity)

Kevin is a master at explaining difficult and confusing concepts with ease, and I was amazed at the sheer amount of information he was able to pack in a rather short span of time... I learned more about scikit-learn from this course than from months of watching YouTube tutorials and taking MOOCs.

João Vítor Franco (Data Scientist at 99)

I've already used the learnings from the course in a Machine Learning competition and got impressive results, while keeping the code clean and easy to understand. Also, I'm much more confident at tackling Machine Learning problems and I'm sure this will contribute a lot to my career.

You don't have to struggle like I did...

Five years ago, I remember sitting at my computer for hours, struggling to figure out how to use "Pipeline" and "FeatureUnion" together. It felt like my head might explode at any moment 🤯

Although I had been teaching scikit-learn for a year, I still couldn't figure out how to turn the concepts in my head into the code I needed to write.

Why was it so hard?

With the benefit of hindsight, I can see two reasons why I was struggling:

First, I didn't have a trusted guide who could show me the easiest way to solve my scikit-learn problems.

Second, I didn't have a complete mental model of how the scikit-learn pieces fit together. (When do you use "fit" vs "transform"? What objects are output by each step of a Pipeline? What happens when the test set differs from the training set? etc.)

Fast forward to today, and scikit-learn comes much easier to me:

  • I know how to find exactly what I need in the documentation

  • I understand nearly all of the terminology

  • I've built a clear mental model of how things work in scikit-learn

  • I know what functions and classes are available, and how to use them for maximum efficiency

These days, working in scikit-learn is a JOY. I know what code I need to write, and I can execute my Machine Learning projects much more quickly!

But it took me FIVE YEARS to get here.

Do you want to struggle for five years? Or do you want to dramatically improve your scikit-learn skills TODAY?

Let me be your guide, so that you can finally work with ease in scikit-learn and get better Machine Learning results faster than before!

FAQs

How do I know if I'm ready for the course?

You're ready for this course if you can use scikit-learn to solve simple classification or regression problems, including loading a dataset, defining the features and target, training and evaluating a model, and making predictions with new data.

You'll also need to know how to perform a few basic pandas operations, including reading a CSV file and selecting columns from a DataFrame.

If you're not yet ready, I recommend enrolling in my free introductory ML course and completing lessons 1 through 7, after which you'll be ready for this course!

What topics are covered in the course?

  • Review of the basic Machine Learning workflow
  • Encoding categorical features
  • Encoding text data
  • Handling missing values
  • Creating an efficient workflow for preprocessing and model building
  • Tuning your workflow for maximum performance
  • Avoiding data leakage
  • Proper model evaluation
  • Model persistence
  • Feature selection
  • Feature standardization
  • Feature engineering using custom transformers

Which scikit-learn functions and classes will I learn how to use?

  • Workflow composition: Pipeline, ColumnTransformer, FeatureUnion
  • Categorical encoding: OneHotEncoder, OrdinalEncoder
  • Text encoding: CountVectorizer
  • Missing value imputation: SimpleImputer, KNNImputer, IterativeImputer, MissingIndicator
  • Model building: LogisticRegression
  • Hyperparameter tuning: GridSearchCV, RandomizedSearchCV
  • Feature selection: SelectPercentile, SelectFromModel
  • Feature standardization: StandardScaler, MaxAbsScaler
  • Feature engineering: FunctionTransformer
  • Model persistence: pickle, joblib (neither is part of scikit-learn: pickle is in the Python standard library, and joblib is a separate package)
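
As a rough illustration of how several of the classes above can fit together, here is a minimal sketch of a workflow that starts with a pandas DataFrame and ends with a trained model. It is not taken from the course notebooks: the toy DataFrame, its column names (city, review, age, liked), and the choice of preprocessing for each column are all assumptions made up for this example.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data, invented purely for illustration
nan = float("nan")
df = pd.DataFrame({
    "city": ["Paris", "Lima", "Paris", "Oslo", "Lima", "Oslo"],
    "review": ["great stay", "bad room", "great view",
               "bad food", "great food", "bad stay"],
    "age": [34.0, nan, 51.0, 27.0, nan, 45.0],
    "liked": [1, 0, 1, 0, 1, 0],
})
X = df[["city", "review", "age"]]
y = df["liked"]

# Apply a different preprocessing step to each column
preprocessor = make_column_transformer(
    # categorical data: one-hot encode, ignoring categories unseen in training
    (OneHotEncoder(handle_unknown="ignore"), ["city"]),
    # text data: a bare string (not a list) passes the column as 1-D
    (CountVectorizer(), "review"),
    # missing values: fill with the training-set median
    (SimpleImputer(strategy="median"), ["age"]),
)

# Chain preprocessing and the model so they are always fit together
pipe = make_pipeline(preprocessor, LogisticRegression())
pipe.fit(X, y)

# New data goes through the same fitted preprocessing before prediction
X_new = pd.DataFrame({"city": ["Rome"], "review": ["great room"], "age": [nan]})
pred = pipe.predict(X_new)
print(pred)
```

Because the preprocessing lives inside the pipeline, anything learned from the data (categories, vocabulary, the median used for imputation) comes only from the rows passed to fit, which is one common way to reduce the risk of data leakage.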

Why does the course focus on the Machine Learning workflow rather than specific algorithms?

I've found that the workflow will have a far greater impact on your Machine Learning results than your ability to pick between algorithms. In fact, once you've mastered the workflow, you can iterate through different algorithms quickly even if you don't deeply understand them.

Understanding algorithms is still useful, but it's hard to know in advance which algorithm will work best for a particular problem. That's why it's so important to build a flexible workflow that enables you to easily experiment with different algorithms.
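
To make the idea of a flexible workflow concrete, here is a small sketch, under assumptions of my own choosing, of swapping the final step of a Pipeline to compare algorithms. The synthetic dataset, the step names ("scale", "model"), and the three candidate models are illustrative picks, not a list taken from the course.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, generated locally so the example is self-contained
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# The preprocessing stays fixed; only the final "model" step is swapped
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression()),
])

# Hypothetical set of candidates, chosen only for illustration
candidates = {
    "logreg": LogisticRegression(),
    "knn": KNeighborsClassifier(),
    "forest": RandomForestClassifier(random_state=0),
}

scores = {}
for name, estimator in candidates.items():
    # Replace the step named "model" with the next candidate
    pipe.set_params(model=estimator)
    # Cross-validate the whole pipeline, so scaling is refit on each training fold
    scores[name] = cross_val_score(pipe, X, y, cv=5).mean()

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

The same comparison could also be expressed as a search over the "model" step with GridSearchCV; the loop is used here only because it is easier to read.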

Is the course material up-to-date?

Yes! I created this course in 2020 using scikit-learn 0.22.1. Since then, very little that is relevant to the course has changed in the library.

Which Python libraries do I need to install?

The only libraries you'll need to install are scikit-learn (version 0.20.2 or later) and pandas (any version). To check your scikit-learn version, just open your Python editor and run these two lines of code:

    import sklearn
    sklearn.__version__

(Those are double underscores.)

If your scikit-learn version is 0.20.1 or earlier, then it's important that you upgrade it using pip or conda.

Which Python editor should I use?

I'll be writing code using the Jupyter notebook, though you can use any Python editor you like. If you'd like to install the Jupyter notebook, I recommend downloading the free Anaconda distribution, which also includes scikit-learn and pandas.

Alternatively, you could participate in the course using Google Colab. Colab is free and runs entirely in your browser, and it provides you with an interface similar to the Jupyter notebook.

What if I need help during the course?

You can post a question below any video, and I'll do my best to respond!

How do I earn a certificate of completion?

Once you have watched all of the videos, you can request a certificate of completion.

How long will I have access to the course?

You will have lifetime access to the videos and notebooks.

Do you offer any discounts?

Yes! I offer Purchasing Power Parity discounts (also known as location-based discounts) for all of my paid courses. If you're located in one of the 160+ qualifying countries, you should automatically see a discount code at the top of this page.

I also offer student discounts and hardship-based discounts, regardless of where you live. Please email me at kevin@dataschool.io and I'd be happy to send you the appropriate discount code.

What's your refund policy?

If you decide that the course isn't a good fit for you, I'd be happy to give you a full refund within 30 days of purchase.

I have another question...

Please email me at kevin@dataschool.io and I'd be happy to answer your question!

Course Outline

Introduction

Welcome to the course!
Download the course notebooks

Course videos

Session 1
Office Hours 1
Session 2
Office Hours 2

Conclusion

Request your certificate of completion
Take another course from Data School!
Earn money by promoting Data School's courses!

👋 Welcome to Data School!

My name is Kevin, and I've taught Data Science in Python to over a million students.

My courses explain data science topics in a clear, thorough, and step-by-step manner.

I'd love to teach you, regardless of your educational background or professional experience.

Thanks for joining me! 🙌