1. Use ColumnTransformer to apply different preprocessing to different columns
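This minimal sketch illustrates the idea in the title. It one-hot encodes a categorical column and mean-imputes a numeric column in a single step. The DataFrame, its column names (Embarked, Age, Name) and its values are hypothetical. Pairing OneHotEncoder and SimpleImputer is also an illustrative choice for this example, not necessarily what the lesson itself uses.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: a categorical column, a numeric column with one
# missing value, and a text column that we do not want to transform.
X = pd.DataFrame({
    "Embarked": ["S", "C", "S", "Q"],
    "Age": [22.0, None, 26.0, 35.0],
    "Name": ["a", "b", "c", "d"],
})

# Each tuple is (name, transformer, columns it applies to).
ct = ColumnTransformer(
    transformers=[
        ("ohe", OneHotEncoder(), ["Embarked"]),          # one-hot encode the category
        ("imp", SimpleImputer(strategy="mean"), ["Age"]),  # fill NaN with the column mean
    ],
    remainder="drop",  # columns not listed above (here, Name) are discarded
)

# Output columns follow the order of the transformers list:
# three one-hot columns first, then the imputed Age column.
Xt = ct.fit_transform(X)
print(Xt)
```

Any column not listed in a transformer is handled by the `remainder` parameter. The default, `remainder="drop"`, discards it. `remainder="passthrough"` keeps it unchanged. With this toy data the result has four columns: the one-hot columns for C, Q and S (categories are sorted), followed by Age, where the missing value becomes the mean of 22, 26 and 35.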

50 scikit-learn tips

Introduction

Welcome to the course!

Download the course notebooks

Data Preprocessing

1. Use ColumnTransformer to apply different preprocessing to different columns

2. Seven ways to select columns using ColumnTransformer

3. What is the difference between "fit" and "transform"?

4. Use "fit_transform" on training data, but "transform" (only) on testing/new data

38. Get the feature names output by a ColumnTransformer

42. Passthrough some columns and drop others in a ColumnTransformer

Using pandas

5. Four reasons to use scikit-learn (not pandas) for ML preprocessing

35. Don't use .values when passing a pandas object to scikit-learn

39. Load a toy dataset into a DataFrame

Categorical Features

6. Encode categorical features using OneHotEncoder or OrdinalEncoder

7. Handle unknown categories with OneHotEncoder by encoding them as zeros

15. Three reasons not to use drop='first' with OneHotEncoder

41. Drop the first category from binary features (only) with OneHotEncoder

43. Use OrdinalEncoder instead of OneHotEncoder with tree-based models

Missing Values

9. Add a missing indicator to encode "missingness" as a feature

11. Impute missing values using KNNImputer or IterativeImputer

14. HistGradientBoostingClassifier natively supports missing values

27. Two ways to impute missing values for a categorical feature

Pipelines

8. Use Pipeline to chain together multiple steps

12. What is the difference between Pipeline and make_pipeline?

13. Examine the intermediate steps in a Pipeline

22. Use the correct methods for each type of Pipeline

28. Save a model or Pipeline using joblib

30. Four ways to examine the steps of a Pipeline

34. Add feature selection to a Pipeline

37. Create an interactive diagram of a Pipeline in Jupyter

48. Access part of a Pipeline using slicing

50. Adapt this pattern to solve many Machine Learning problems

Intermission

Can I ask you a quick favor?

Parameter Tuning

16. Use cross_val_score and GridSearchCV on a Pipeline

17. Try RandomizedSearchCV if GridSearchCV is taking too long

18. Display GridSearchCV or RandomizedSearchCV results in a DataFrame

19. Important tuning parameters for LogisticRegression

25. Prune a decision tree to avoid overfitting

40. Estimators only print parameters that have been changed

44. Speed up GridSearchCV using parallel processing

49. Tune multiple models simultaneously with GridSearchCV

Model Evaluation

20. Plot a confusion matrix

21. Compare multiple ROC curves in a single plot

26. Use stratified sampling with train_test_split

31. Shuffle your dataset when using cross_val_score

32. Use AUC to evaluate multiclass problems

Model Inspection

23. Display the intercept and coefficients for a linear model

24. Visualize a decision tree two different ways

Model Ensembling

46. Ensemble multiple models using VotingClassifier or VotingRegressor

47. Tune the parameters of a VotingClassifier or VotingRegressor

Feature Engineering

29. Vectorize two text columns in a ColumnTransformer

33. Use FunctionTransformer to convert functions into transformers

45. Create feature interactions using PolynomialFeatures

Coding Practices

10. Set a "random_state" to make your code reproducible

36. Most parameters should be passed as keyword arguments

Conclusion

Can I ask you a quick favor?

Request your certificate of completion

Take another course from Data School!

Earn money by promoting Data School's courses!