1. Use ColumnTransformer to apply different preprocessing to different columns
50 scikit-learn tips
Introduction
Welcome to the course!
Download the course notebooks
Data Preprocessing
1. Use ColumnTransformer to apply different preprocessing to different columns
2. Seven ways to select columns using ColumnTransformer
3. What is the difference between "fit" and "transform"?
4. Use "fit_transform" on training data, but "transform" (only) on testing/new data
38. Get the feature names output by a ColumnTransformer
42. Passthrough some columns and drop others in a ColumnTransformer
Using pandas
5. Four reasons to use scikit-learn (not pandas) for ML preprocessing
35. Don't use .values when passing a pandas object to scikit-learn
39. Load a toy dataset into a DataFrame
Categorical Features
6. Encode categorical features using OneHotEncoder or OrdinalEncoder
7. Handle unknown categories with OneHotEncoder by encoding them as zeros
15. Three reasons not to use drop='first' with OneHotEncoder
41. Drop the first category from binary features (only) with OneHotEncoder
43. Use OrdinalEncoder instead of OneHotEncoder with tree-based models
Missing Values
9. Add a missing indicator to encode "missingness" as a feature
11. Impute missing values using KNNImputer or IterativeImputer
14. HistGradientBoostingClassifier natively supports missing values
27. Two ways to impute missing values for a categorical feature
Pipelines
8. Use Pipeline to chain together multiple steps
12. What is the difference between Pipeline and make_pipeline?
13. Examine the intermediate steps in a Pipeline
22. Use the correct methods for each type of Pipeline
28. Save a model or Pipeline using joblib
30. Four ways to examine the steps of a Pipeline
34. Add feature selection to a Pipeline
37. Create an interactive diagram of a Pipeline in Jupyter
48. Access part of a Pipeline using slicing
50. Adapt this pattern to solve many Machine Learning problems
Intermission
Can I ask you a quick favor?
Parameter Tuning
16. Use cross_val_score and GridSearchCV on a Pipeline
17. Try RandomizedSearchCV if GridSearchCV is taking too long
18. Display GridSearchCV or RandomizedSearchCV results in a DataFrame
19. Important tuning parameters for LogisticRegression
25. Prune a decision tree to avoid overfitting
40. Estimators only print parameters that have been changed
44. Speed up GridSearchCV using parallel processing
49. Tune multiple models simultaneously with GridSearchCV
Model Evaluation
20. Plot a confusion matrix
21. Compare multiple ROC curves in a single plot
26. Use stratified sampling with train_test_split
31. Shuffle your dataset when using cross_val_score
32. Use AUC to evaluate multiclass problems
Model Inspection
23. Display the intercept and coefficients for a linear model
24. Visualize a decision tree two different ways
Model Ensembling
46. Ensemble multiple models using VotingClassifier or VotingRegressor
47. Tune the parameters of a VotingClassifier or VotingRegressor
Feature Engineering
29. Vectorize two text columns in a ColumnTransformer
33. Use FunctionTransformer to convert functions into transformers
45. Create feature interactions using PolynomialFeatures
Coding Practices
10. Set a "random_state" to make your code reproducible
36. Most parameters should be passed as keyword arguments
Conclusion
Can I ask you a quick favor?
Request your certificate of completion
Take another course from Data School!
Earn money by promoting Data School's courses!