  • $299

Master Machine Learning with scikit-learn

  • 192 Lessons

Get ready for your dream job in Machine Learning!

Your dream job feels so close, yet so far away

Imagine yourself at your next job: You're using Machine Learning to solve exciting problems that matter to you, whether that's helping to cure a disease, detect fraud, or predict what will happen next in the stock market.

Machine Learning is one of the most in-demand skills in the job market today, and you know that finally mastering Machine Learning will help you to build a more fulfilling career at the company of your choice.

But making progress with Machine Learning is hard. You've built a lot of models using scikit-learn, but mostly with clean datasets. When you try to apply your knowledge to more complex scenarios, you get lost. Your code doesn't always work, and when it does, you're not sure that you're actually doing things the right way. The courses and blog posts and Stack Overflow answers that got you to this point aren't helping anymore.

It's SO frustrating because you can't find any resources that help get you to the next level. You're motivated to keep learning, but you keep running into roadblocks like documentation you can't understand, papers with too many formulas, and courses that skip over details that you can't figure out on your own.

It feels like your dream of a Machine Learning job is so close, yet so far away.

I took a few Machine Learning courses during my Master's degree (Business Analytics), which gave me some basic ML knowledge.

Your courses and videos helped me a lot to further understand ML, which I believe is the reason I landed my dream job.

- Maggie Tang (Machine Learning Engineer)

I can help you get to the next level

I'm here to help. My name is Kevin Markham, and I'm the founder of Data School. I've taught data science in Python to over a million students. My courses explain data science topics in a clear, thorough, and step-by-step manner that you can understand regardless of your educational background or professional experience.

I know EXACTLY what you're going through because eight years ago, I was there too. I had been teaching scikit-learn for a year, and yet I still couldn't figure out how to turn the concepts in my head into the code I needed to write.

Fast forward to now, and scikit-learn comes SO much more easily to me. I know what code I need to write in order to do proper Machine Learning, and I get better Machine Learning results faster than ever.

So how did I get here? I've spent the last eight years researching best practices in Machine Learning, digging through the scikit-learn documentation, practicing what I've learned, sharing my knowledge with the community, getting feedback from experts, and even contributing to the scikit-learn library. It was an undeniably effective process, but the journey was long and challenging.

I want to help you avoid that struggle so that you can get to where I am in a fraction of the time!

Machine Learning is huge, and there are lots of people teaching it out there. But few are as clear and practical as Kevin.

Yes, he teaches you the syntax you need to know. But more importantly, he teaches you the ideas behind the syntax, and shows you why and where you'll want to use various techniques.

If you think that Machine Learning is too complex for you to learn, I cannot recommend Kevin's courses enough. He'll give you the confidence you need, along with the knowledge you want.

- Reuven Lerner (Python trainer)

Set yourself apart from the competition

Imagine what it will be like once you finally master Machine Learning with scikit-learn:

You'll be more confident when tackling new Machine Learning problems because you'll understand what steps you need to take, why you need to take them, and how to execute those steps using scikit-learn.

You'll know what problems you might run into, and you'll know exactly how to solve them.

Your code will be easier to write and read, and you'll get better Machine Learning results faster than before.

All of these skills, and the confidence you'll have when applying them, will set you apart from the competition when you're looking for your next Machine Learning job.

I need to thank you for your videos that let me find my dream job. Thank you so much!!

- Arrigo Coen Coria (Data Scientist)

Start building these skills TODAY

So how can you build these skills?

Well, you could follow the same path that I did... if you can afford to spend ten years of your life and a lot of time banging your head against the wall!

Or you could spend two years and $50,000 getting a Master's degree in Machine Learning, but even then you might end up with a lot of theoretical rather than practical knowledge.

Or you could drop everything, move to a new city, and spend three months and $20,000 at a bootcamp, though you'd better hope that the teaching staff is good.

Or you could stay right where you are and start building these skills TODAY with my new course!

Master Machine Learning with scikit-learn will teach you how to solve almost any supervised Machine Learning problem using the latest scikit-learn techniques.

It's a distillation of everything I've learned about Machine Learning over the past ten years, packaged into clear, step-by-step lessons that are easy to understand and easy to reference.

You don't have to take my word for it...

Previously, I tested a shorter version of this course with 200 students. Here's what a few of them had to say:

Khaled Jafar (Director of Analytics)

This was one of the best data science classes I have ever taken... I was impressed with Kevin's easy-to-understand teaching style where he clearly explains the 'what' and 'why' of each principle... I highly recommend this course.

Abla Elsergany (MS in Advanced Analytics)

This course takes you through some of the challenges we face with real data, which is not always the case in other courses... If you're familiar with Machine Learning but need to know how to apply it using scikit-learn, then this course is definitely for you!

Les Guessing (Creative Director at Creative Algorithm)

Learning Machine Learning is a bit of a zig zag process. A little from here. A little from there. Kevin Markham is BETTER THAN ANYBODY at pulling all those pieces together so you can use them and understand what you're doing.

Mike F. (Data Scientist)

This class will not only save me a lot of time in the future, but will also ensure that my models will be robust to data leakage... The explanations and demonstrations are worth the price of admission.

Pranjal Chaubey (AI Mentor at Udacity)

Kevin is a master at explaining difficult and confusing concepts with ease, and I was amazed at the sheer amount of information he was able to pack in a rather short span of time... I learned more about scikit-learn from this course than from months of watching YouTube tutorials and taking MOOCs.

João Vítor Franco (Data Scientist at 99)

I've already used the learnings from the course in a Machine Learning competition and got impressive results, while keeping the code clean and easy to understand. Also, I'm much more confident at tackling Machine Learning problems and I'm sure this will contribute a lot to my career.

After the test run, I spent 1,000+ hours refining and expanding the course to make it the clearest and most comprehensive scikit-learn course available today.

For a tiny fraction of the cost of a Master's degree or a bootcamp, you can massively improve your skills and get ready for your dream job in Machine Learning!

This is unlike ANY other Machine Learning course

Most Machine Learning courses suffer from a host of problems: They're poorly taught, lack the necessary depth, and include unexplained or broken code. They don't teach you how to apply what you're learning, and they don't show you how to handle all of the problems you'll ACTUALLY face in real-world Machine Learning.

But in this course, we'll focus on application from the very beginning. We'll spend most of our time writing scikit-learn code, and you'll understand how every single line relates to the problem we're solving.

You'll learn the best practices for proper Machine Learning and you'll learn how to apply those practices to your own Machine Learning problems.

We'll also cover topics that are critical to effective Machine Learning but are rarely covered by other courses, such as:

  • Cost-sensitive learning

  • Class imbalance

  • Data leakage

  • Regularization

  • Multivariate imputation

  • High-cardinality categorical features

  • Custom transformers for feature engineering

  • Multiclass problems

  • Ensembling

  • Non-linear models

  • Binning numeric features

  • ROC and precision-recall curves

  • Calculating rates from a confusion matrix

  • How to read the scikit-learn documentation

Your courses are THE most to-the-point (yet) comprehensive tutorials I have come across on ML.

I have reviewed many ML courses out there, and none are as terse and yet as useful as your video materials especially when it comes to applied ML.

- Neil Dias (ML Engineer)

Here are a few of the things you'll learn:

  • Why scikit-learn is the single most popular Machine Learning framework today (and is usually a better choice than deep learning!)

  • Why your workflow matters WAY more than which algorithm you choose

  • Five key factors to consider when deciding whether to impute missing values

  • How a "missing indicator" can transform your missing values into an asset

  • How to choose between ordinal and one-hot encoding for your categorical features

  • How to create meaningful features from unstructured text data

  • Seven ways to select columns in a ColumnTransformer (this is a huge timesaver!)

  • Three automated feature selection methods that will improve your model's performance

  • How to know whether your numerical features should be standardized

  • How to calculate the confidence level of your predictions (and when NOT to do this)

  • My five-step process for properly handling class imbalance

  • Why you need to tune your transformers and your model at the same time

  • How to speed up your grid search (and when to use a randomized search instead)

  • Two easy methods for ensembling multiple models

  • Why using pandas for transformations can lead to "data leakage" (this is critical to avoid!)

  • How to do ALL of your feature engineering in scikit-learn using custom transformers

  • How to examine every step of your Pipeline (this is great for troubleshooting!)

  • How to save your best Pipeline for future predictions

  • And much, much more!
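
To make one item from the list above concrete, here is a minimal sketch of tuning a transformer and a model in the same grid search. It is not taken from the course: the tiny text dataset, the labels, and the parameter values are hypothetical, chosen only so the example runs quickly.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline

# Hypothetical toy documents: 1 = positive review, 0 = negative review
docs = [
    "great course clear lessons", "clear and practical teaching",
    "great teaching great examples", "practical examples helped me",
    "lessons were clear and great", "helpful practical great course",
    "confusing course broken code", "broken examples and poor lessons",
    "poor teaching confusing examples", "code was broken and confusing",
    "poor course unexplained code", "unexplained confusing poor lessons",
]
labels = [1] * 6 + [0] * 6

# The transformer and the model live in one Pipeline
pipe = make_pipeline(
    CountVectorizer(),
    LogisticRegression(solver="liblinear", random_state=1),
)

# Parameter names are "<step name>__<parameter>", so a single grid can
# search over the vectorizer and the model together
params = {
    "countvectorizer__ngram_range": [(1, 1), (1, 2)],
    "logisticregression__C": [0.1, 1, 10],
}

grid = GridSearchCV(pipe, params, cv=3, scoring="accuracy")
grid.fit(docs, labels)

best_params = grid.best_params_
best_score = grid.best_score_
print(best_params, best_score)
```

Because the vectorizer is refit inside every cross-validation split, each combination of transformer and model settings is scored as a unit rather than one after the other.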

You're one of the best teachers out there! Thank you very much for making Data Science and Machine Learning so intuitive and interesting.

- Sohum Rajdev (Master's Student)

Here's what you'll get when you enroll:

  • 149 video lessons (7.5 hours) with transcripts for easy reference

  • 126 quiz questions to check your understanding

  • 900+ lines of code you can adapt for your own projects

  • Jupyter notebooks with all of the code and lecture notes

  • Downloadable datasets so you can follow along at home

  • Certificate of completion at the end of the course

  • Lifetime access to everything

  • Free access to future course updates

Choosing Data School's courses is the best decision I have ever made when embarking on my data science journey.

- Duc Nguyen Huu (Data Science Intern)

30-day money-back guarantee

I'm confident that this course will help you massively improve your Machine Learning skills and move you closer to your dream job in Machine Learning.

But if you're not satisfied with the course, just let me know within 30 days of purchase and I'm happy to give you 100% of your money back, no questions asked!

Begin the course today!

If you're ready to invest in your Machine Learning career, click the button below for immediate access to the entire course.

I'm so excited for the journey you're about to take, and I can't wait to hear how this course helps you to build a more fulfilling career!

- Kevin Markham (Founder of Data School)

FAQs

How do I know if I'm ready for the course?

You're ready for this course if you can use scikit-learn to solve simple classification or regression problems, including loading a dataset, defining the features and target, training and evaluating a model, and making predictions with new data. You'll also need to know how to perform a few basic pandas operations, including reading a CSV file and selecting columns from a DataFrame.
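
As a rough self-check, the prerequisite workflow described above might look like the sketch below. The column names and values are made up for illustration, and an in-memory DataFrame stands in for a CSV file that you would normally load with pd.read_csv.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical toy data standing in for a CSV file
df = pd.DataFrame({
    "age": [22, 38, 26, 35, 54, 2, 27, 14, 58, 20],
    "fare": [7.3, 71.3, 7.9, 53.1, 51.9, 21.1, 11.1, 30.1, 26.6, 8.1],
    "survived": [0, 1, 1, 1, 0, 0, 1, 0, 1, 0],
})

# Define the features (a 2D DataFrame) and the target (a 1D Series)
X = df[["age", "fare"]]
y = df["survived"]

# Evaluate the model with 5-fold cross-validation
model = LogisticRegression(solver="liblinear", random_state=1)
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
mean_accuracy = scores.mean()

# Train on all of the data, then predict for new samples
model.fit(X, y)
X_new = pd.DataFrame({"age": [30, 5], "fare": [10.0, 40.0]})
predictions = model.predict(X_new)
print(mean_accuracy, predictions)
```

If each of these steps looks familiar, you likely meet the prerequisites.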

If you're not yet ready, I recommend enrolling in my free introductory ML course and completing lessons 1 through 7, after which you'll be ready for this course!

Do I need to have a background in advanced math in order to take this course?

Not at all! I purposefully teach in a way that is accessible to a wide variety of educational backgrounds, even when covering more advanced topics like regularization, multivariate imputation, cost-sensitive learning, and so on.

What topics are covered in the course?

Here's a brief summary of the topics covered in the course:

  • Review of the basic Machine Learning workflow

  • Encoding categorical features

  • Encoding text data

  • Handling missing values

  • Preparing complex datasets

  • Creating an efficient workflow for preprocessing and model building

  • Tuning your workflow for maximum performance

  • Avoiding data leakage

  • Proper model evaluation

  • Automatic feature selection

  • Feature standardization

  • Feature engineering using custom transformers

  • Linear and non-linear models

  • Model ensembling

  • Model persistence

  • Handling high-cardinality categorical features

  • Handling class imbalance

You can scroll down to the Course Outline to see the detailed list of all 149 lessons.

Which scikit-learn functions and classes will I learn how to use?

  • Workflow composition: Pipeline, ColumnTransformer, make_pipeline, make_column_transformer, make_column_selector, make_union

  • Categorical encoding: OneHotEncoder, OrdinalEncoder

  • Numerical encoding: KBinsDiscretizer

  • Text encoding: CountVectorizer

  • Missing value imputation: SimpleImputer, KNNImputer, IterativeImputer, MissingIndicator

  • Model building: LogisticRegression, RandomForestClassifier, ExtraTreesClassifier

  • Model ensembling: VotingClassifier

  • Model selection: StratifiedKFold, cross_val_score, train_test_split

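  • Model evaluation: accuracy_score, classification_report, confusion_matrix, roc_auc_score, average_precision_score, plot_confusion_matrix, plot_roc_curve, plot_precision_recall_curve (in scikit-learn 1.2 and later, these three plotting functions are replaced by ConfusionMatrixDisplay, RocCurveDisplay, and PrecisionRecallDisplay)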

  • Hyperparameter tuning: GridSearchCV, RandomizedSearchCV

  • Feature selection: RFE, SelectPercentile, SelectFromModel, chi2

  • Feature standardization: StandardScaler, MaxAbsScaler

  • Feature engineering: FunctionTransformer, PolynomialFeatures

  • Configuration: set_config

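  • Model persistence: joblib, pickle, cloudpickle (these are separate from scikit-learn: pickle is part of the Python standard library, while joblib and cloudpickle are third-party libraries)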
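
To give a feel for how several of these classes fit together, here is a minimal sketch that combines make_column_transformer, OneHotEncoder, SimpleImputer, make_pipeline, LogisticRegression, and cross_val_score. It is an illustration under assumed data, not the course's own code: the DataFrame and its column names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data with categorical columns and missing ages
df = pd.DataFrame({
    "embarked": ["S", "C", "S", "S", "Q", "S", "C", "Q", "S", "C"],
    "sex": ["male", "female", "female", "female", "male",
            "male", "female", "male", "female", "male"],
    "age": [22, 38, np.nan, 35, 54, 2, np.nan, 14, 58, 20],
    "survived": [0, 1, 1, 1, 0, 0, 1, 0, 1, 0],
})
X = df[["embarked", "sex", "age"]]
y = df["survived"]

# One-hot encode the categorical columns; impute the numeric column
# and add a "missing indicator" column alongside it
ct = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"), ["embarked", "sex"]),
    (SimpleImputer(strategy="mean", add_indicator=True), ["age"]),
)

# Chain preprocessing and the model so both are refit on each CV split
pipe = make_pipeline(ct, LogisticRegression(solver="liblinear", random_state=1))
scores = cross_val_score(pipe, X, y, cv=5, scoring="accuracy")

# Fit on all of the data, then predict for a sample that contains an
# unseen category and a missing value
pipe.fit(X, y)
X_new = pd.DataFrame({"embarked": ["Z"], "sex": ["female"], "age": [np.nan]})
pred = pipe.predict(X_new)

# 3 embarked + 2 sex + 1 imputed age + 1 missing indicator = 7 columns
n_features = pipe.named_steps["columntransformer"].transform(X).shape[1]
print(scores.mean(), pred, n_features)
```

Keeping the preprocessing inside the Pipeline means the encoder and imputer only ever learn from the training portion of each split, which is one way to guard against data leakage.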

Can't I learn all of this on my own?

Yes! If you have unlimited time and a lot of patience, you can learn everything I cover in this course by reading countless books, research papers, articles, documentation pages, GitHub pull requests, and so on. (Make sure to ignore all of the erroneous and outdated information!)

Alternatively, you can save yourself a lot of time and frustration by making this small, one-time investment in yourself and your career!

Will taking this course guarantee that I can get a Machine Learning job?

No single course (or degree) can guarantee you a job in Machine Learning. Every job requires a combination of skills, experience, and domain knowledge that are specific to the employer and position.

However, I can guarantee that if you complete this course and commit to applying what you've learned, you will achieve a far greater fluency with Machine Learning and scikit-learn that will set you apart from the competition when applying for your next job!

I'm already comfortable with scikit-learn. Will I still learn a lot in this course?

Definitely! If you browse through the Course Outline below, I think you'll find that there are a ton of topics I cover in the course that are critical to effective Machine Learning but are rarely covered by other courses (or even the scikit-learn documentation!).

Why does the course focus on the Machine Learning workflow rather than specific algorithms?

I've found that the workflow will have a far greater impact on your Machine Learning results than your ability to pick between algorithms. In fact, once you've mastered the workflow, you can iterate through different algorithms quickly even if you don't deeply understand them.

Understanding algorithms is still useful, but it's hard to know in advance which algorithm will work best for a particular problem. That's why it's so important to build a flexible workflow that enables you to easily experiment with different algorithms.
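
A flexible workflow of this kind could be sketched as follows. This is only an illustration: the synthetic dataset from make_classification and the two candidate models are assumptions chosen for brevity, not a recommendation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data so the example is self-contained
X, y = make_classification(n_samples=200, n_features=8, random_state=1)

# Candidate algorithms to try within the same workflow
candidates = {
    "logreg": LogisticRegression(solver="liblinear", random_state=1),
    "forest": RandomForestClassifier(n_estimators=50, random_state=1),
}

# The preprocessing and evaluation stay fixed; only the final step changes
results = {}
for name, model in candidates.items():
    pipe = make_pipeline(StandardScaler(), model)
    results[name] = cross_val_score(pipe, X, y, cv=5, scoring="accuracy").mean()

best_name = max(results, key=results.get)
print(results, best_name)
```

Because the rest of the workflow stays the same, adding another algorithm to the comparison is a one-line change to the candidates dictionary.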

What's the difference between "Master Machine Learning with scikit-learn" and "Building an Effective Machine Learning Workflow with scikit-learn"?

Master Machine Learning with scikit-learn is an updated and significantly expanded version of Building an Effective Machine Learning Workflow with scikit-learn. I spent 1,000+ hours revising every existing lesson and adding countless new lessons in order to make Master Machine Learning the clearest and most comprehensive scikit-learn course available today.

Is the course material up-to-date?

Yes! I created this course using scikit-learn 0.23.2. Since then, very little has changed in the library that affects the course, and when there has been a relevant change, I note that within the course.

Which Python libraries do I need to install?

The only libraries you'll need to install are scikit-learn (version 0.20.2 or later), pandas (any version), and matplotlib (any version). To check your scikit-learn version, just open your Python editor and run these two lines of code:

import sklearn
print(sklearn.__version__)

If your scikit-learn version is 0.20.1 or earlier, then it's important that you upgrade it using pip or conda.

Which Python editor should I use?

I'll be writing code using the Jupyter notebook, though you can use any Python editor you like. If you'd like to install the Jupyter notebook, I recommend downloading the free Anaconda distribution, which also includes scikit-learn, pandas, and matplotlib.

Alternatively, you could participate in the course using Google Colab. Colab is free and runs entirely in your browser, and it provides you with an interface similar to the Jupyter notebook.

What if I need help during the course?

Most chapters include a substantial section of Q&A lessons, which answer all of the common questions that students have asked me about that topic. In addition, you can post a question below any video, and I'll do my best to respond!

How do I earn a certificate of completion?

Once you have watched all of the videos and attempted all of the quizzes, you can request a certificate of completion.

How long will I have access to the course?

You will have lifetime access to the course so that you can work through lessons at your own pace and reference them later. I expect that it will be useful to you for years to come!

Do you offer any discounts?

Yes! I offer Purchasing Power Parity discounts (also known as location-based discounts) for all of my paid courses. If you're located in one of the 160+ qualifying countries, you should automatically see a discount code at the top of this page.

I also offer student discounts and hardship-based discounts, regardless of where you live. Please email me at kevin@dataschool.io and I'd be happy to send you the appropriate discount code.

What's your refund policy?

If you decide that the course isn't a good fit for you, I'd be happy to give you a full refund within 30 days of purchase. Simply email me at kevin@dataschool.io and I'll promptly process your refund.

I have another question...

Please email me at kevin@dataschool.io and I'd be happy to answer your question!

Course Outline

You'll notice that the chapters are divided into small, digestible video lessons so that you can work through the course as you have time and easily reference the material later.

In chapter 1, I'll give you an overview of the course and help you to get set up. Then in chapter 2, we'll move on to a review of the Machine Learning workflow in order to establish a foundation for the rest of the course.

In chapters 3 through 9, we'll explore how to handle common issues such as categorical features, text data, and missing values, and also how to integrate those steps into an efficient workflow. Then in chapter 10, we'll cover how to properly evaluate and tune your entire workflow for maximum performance.

In chapters 11 through 16, we'll walk through a variety of advanced techniques that can help to further improve your model's performance, including ensembling, feature selection, feature standardization, and feature engineering. In chapters 17 through 19, we'll dive deep into two common issues you'll run into during real-world Machine Learning, namely high-cardinality categorical features and class imbalance.

Finally, in chapter 20, I'll end the course with my advice for how you can continue to make progress with your Machine Learning education and skill development!

For more details, you can browse through the complete list of lessons below.

Chapter 1: Introduction

16 minutes
1.1 Course overview
1.2 scikit-learn vs Deep Learning
1.3 Prerequisite skills
1.4 Course setup and software versions
1.5 Course outline
1.6 Course datasets
1.7 Meet your instructor
Download the course files
List of all lessons

Chapter 2: Review of the Machine Learning workflow

31 minutes
2.1 Loading and exploring a dataset
2.2 Building and evaluating a model
2.3 Using the model to make predictions
2.4 Q&A: How do I adapt this workflow to a regression problem?
2.5 Q&A: How do I adapt this workflow to a multiclass problem?
2.6 Q&A: Why should I select a Series for the target?
2.7 Q&A: How do I add the model's predictions to a DataFrame?
2.8 Q&A: How do I determine the confidence level of each prediction?
2.9 Q&A: How do I check the accuracy of the model's predictions?
2.10 Q&A: What do the "solver" and "random_state" parameters do?
2.11 Q&A: How do I show all of the model parameters?
2.12 Q&A: Should I shuffle the samples when using cross-validation?
Chapter 2 Quiz
Chapter 2 Quiz Discussion

Chapter 3: Encoding categorical features

35 minutes
3.1 Introduction to one-hot encoding
3.2 Transformer methods: fit, transform, fit_transform
3.3 One-hot encoding of multiple features
3.4 Q&A: When should I use transform instead of fit_transform?
3.5 Q&A: What happens if the testing data includes a new category?
3.6 Q&A: Should I drop one of the one-hot encoded categories?
3.7 Q&A: How do I encode an ordinal feature?
3.8 Q&A: What's the difference between OrdinalEncoder and LabelEncoder?
3.9 Q&A: Should I encode numeric features as ordinal features?
Chapter 3 Quiz
Chapter 3 Quiz Discussion

Chapter 4: Improving your workflow with ColumnTransformer and Pipeline

32 minutes
4.1 Preprocessing features with ColumnTransformer
4.2 Chaining steps with Pipeline
4.3 Using the Pipeline to make predictions
4.4 Q&A: How do I drop some columns and passthrough others?
4.5 Q&A: How do I transform the unspecified columns?
4.6 Q&A: How do I select columns from a NumPy array?
4.7 Q&A: How do I select columns by data type?
4.8 Q&A: How do I select columns by column name pattern?
4.9 Q&A: Should I use ColumnTransformer or make_column_transformer?
4.10 Q&A: Should I use Pipeline or make_pipeline?
4.11 Q&A: How do I examine the steps of a Pipeline?
Chapter 4 Quiz
Chapter 4 Quiz Discussion

Chapter 5: Workflow review #1

7 minutes
5.1 Recap of our workflow
5.2 Comparing ColumnTransformer and Pipeline
5.3 Creating a Pipeline diagram
Chapter 5 Quiz
Chapter 5 Quiz Discussion

Chapter 6: Encoding text data

18 minutes
6.1 Vectorizing text
6.2 Including text data in the model
6.3 Q&A: Why is the document-term matrix stored as a sparse matrix?
6.4 Q&A: What happens if the testing data includes new words?
6.5 Q&A: How do I vectorize multiple columns of text?
6.6 Q&A: Should I one-hot encode or vectorize categorical features?
Chapter 6 Quiz
Chapter 6 Quiz Discussion

Chapter 7: Handling missing values

28 minutes
7.1 Introduction to missing values
7.2 Three ways to handle missing values
7.3 Missing value imputation
7.4 Using "missingness" as a feature
7.5 Q&A: How do I perform multivariate imputation?
7.6 Q&A: What are the best practices for missing value imputation?
7.7 Q&A: What's the difference between ColumnTransformer and FeatureUnion?
Chapter 7 Quiz
Chapter 7 Quiz Discussion

Chapter 8: Fixing common workflow problems

23 minutes
8.1 Two new problems
8.2 Problem 1: Missing values in a categorical feature
8.3 Problem 2: Missing values in the new data
8.4 Q&A: How do I see the feature names output by the ColumnTransformer?
8.5 Q&A: Why did we create a Pipeline inside of the ColumnTransformer?
8.6 Q&A: Which imputation strategy should I use with categorical features?
8.7 Q&A: Should I impute missing values before all other transformations?
8.8 Q&A: What methods can I use with a Pipeline?
Chapter 8 Quiz
Chapter 8 Quiz Discussion

Chapter 9: Workflow review #2

12 minutes
9.1 Recap of our workflow
9.2 Comparing ColumnTransformer and Pipeline
9.3 Why not use pandas for transformations?
9.4 Preventing data leakage
Chapter 9 Quiz
Chapter 9 Quiz Discussion

Intermission

Can I ask you a quick favor?

Chapter 10: Evaluating and tuning a Pipeline

52 minutes

10.1 Evaluating a Pipeline with cross-validation
10.2 Tuning a Pipeline with grid search
10.3 Tuning the model
10.4 Tuning the transformers
10.5 Using the best Pipeline to make predictions
10.6 Q&A: How do I save the best Pipeline for future use?
10.7 Q&A: How do I speed up a grid search?
10.8 Q&A: How do I tune a Pipeline with randomized search?
10.9 Q&A: What's the target accuracy we are trying to achieve?
10.10 Q&A: Is it okay that our model includes thousands of features?
10.11 Q&A: How do I examine the coefficients of a Pipeline?
10.12 Q&A: Should I split the dataset before tuning the Pipeline?
10.13 Q&A: What is regularization?
Chapter 10 Quiz
Chapter 10 Quiz Discussion

Chapter 11: Comparing linear and non-linear models

19 minutes

11.1 Trying a random forest model
11.2 Tuning random forests with randomized search
11.3 Further tuning with grid search
11.4 Q&A: How do I tune two models with a single grid search?
11.5 Q&A: How do I tune two models with a single randomized search?
Chapter 11 Quiz
Chapter 11 Quiz Discussion

Chapter 12: Ensembling multiple models

13 minutes

12.1 Introduction to ensembling
12.2 Ensembling logistic regression and random forests
12.3 Combining predicted probabilities
12.4 Combining class predictions
12.5 Choosing a voting strategy
12.6 Tuning an ensemble with grid search
12.7 Q&A: When should I use ensembling?
12.8 Q&A: How do I apply different weights to the models in an ensemble?
Chapter 12 Quiz
Chapter 12 Quiz Discussion

Chapter 13: Feature selection

30 minutes

13.1 Introduction to feature selection
13.2 Intrinsic methods: L1 regularization
13.3 Filter methods: Statistical test-based scoring
13.4 Filter methods: Model-based scoring
13.5 Filter methods: Summary
13.6 Wrapper methods: Recursive feature elimination
13.7 Q&A: How do I see which features were selected?
13.8 Q&A: Are the selected features the "most important" features?
13.9 Q&A: Is it okay for feature selection to remove one-hot encoded categories?
Chapter 13 Quiz
Chapter 13 Quiz Discussion

Chapter 14: Feature standardization

8 minutes

14.1 Standardizing numerical features
14.2 Standardizing all features
14.3 Q&A: How do I see what scaling was applied to each feature?
14.4 Q&A: How do I turn off feature standardization within a grid search?
14.5 Q&A: Which models benefit from standardization?
Chapter 14 Quiz
Chapter 14 Quiz Discussion

Chapter 15: Feature engineering with custom transformers

32 minutes

15.1 Why not use pandas for feature engineering?
15.2 Transformer 1: Rounding numerical values
15.3 Transformer 2: Clipping numerical values
15.4 Transformer 3: Extracting string values
15.5 Rules for transformer functions
15.6 Transformer 4: Combining two features
15.7 Revising the transformers
15.8 Q&A: How do I fix incorrect data types within a Pipeline?
15.9 Q&A: How do I create features from datetime data?
15.10 Q&A: How do I create feature interactions?
15.11 Q&A: How do I save a Pipeline with custom transformers?
15.12 Q&A: Can FunctionTransformer be used with any transformation?
Chapter 15 Quiz
Chapter 15 Quiz Discussion

Chapter 16: Workflow review #3

4 minutes

16.1 Recap of our workflow
16.2 What's the role of pandas?

Chapter 17: High-cardinality categorical features

15 minutes

17.1 Recap of nominal and ordinal features
17.2 Preparing the census dataset
17.3 Setting up the encoders
17.4 Encoding nominal features for a linear model
17.5 Encoding nominal features for a non-linear model
17.6 Combining the encodings
17.7 Best practices for encoding
Chapter 17 Quiz
Chapter 17 Quiz Discussion

Chapter 18: Class imbalance

30 minutes

18.1 Introduction to class imbalance
18.2 Preparing the mammography dataset
18.3 Evaluating a model with train/test split
18.4 Exploring the results with a confusion matrix
18.5 Calculating rates from a confusion matrix
18.6 Using AUC as the evaluation metric
18.7 Cost-sensitive learning
18.8 Tuning the decision threshold
Chapter 18 Quiz
Chapter 18 Quiz Discussion

Chapter 19: Class imbalance walkthrough

25 minutes

19.1 Best practices for class imbalance
19.2 Step 1: Splitting the dataset
19.3 Step 2: Optimizing the model on the training set
19.4 Step 3: Evaluating the model on the testing set
19.5 Step 4: Tuning the decision threshold
19.6 Step 5: Retraining the model and making predictions
19.7 Q&A: Should I use an ROC curve or a precision-recall curve?
19.8 Q&A: Can I use a different metric such as F1 score?
19.9 Q&A: Should I use resampling to fix class imbalance?
Chapter 19 Quiz
Chapter 19 Quiz Discussion

Chapter 20: Going further

12 minutes

20.1 Q&A: How do I read the scikit-learn documentation?
20.2 Q&A: How do I stay up-to-date with new scikit-learn features?
20.3 Q&A: How do I improve my Machine Learning skills?
20.4 Q&A: How do I learn Deep Learning?
Chapter 20 Quiz
Chapter 20 Quiz Discussion

Conclusion

Can I ask you a quick favor?
Request your certificate of completion
Take another course from Data School!
Earn money by promoting Data School's courses!

It's time to make a choice...

Whether or not you enroll, you're still going to want a dream job in Machine Learning.

Sure, you can choose NOT to enroll, and maybe you'll make the time to learn it all on your own. Or, you can enroll in my course and accelerate your Machine Learning skills TODAY!

Just picture yourself in a month:

  • You'll be more confident when tackling new Machine Learning problems

  • You'll understand how to write proper, efficient, high-performing scikit-learn code

  • You'll solve problems at work more quickly and easily

  • You'll be more ready than ever for your dream job in Machine Learning!

I know that in these uncertain times, it can be hard to invest in yourself.

But think about it: With the dramatic rise in AI technologies, there's no better time to invest in a career in Machine Learning!

There's zero risk, because I offer a 30-day money-back guarantee. What have you got to lose?

I'll see you in the course! 🎓

- Kevin

P.S. Not quite ready? Enroll today to get lifetime access to the course, and then start the course once you're ready!
