1. Use ColumnTransformer to apply different preprocessing to different columns

50 scikit-learn tips

Introduction

  • Welcome to the course!
  • Download the course notebooks

Data Preprocessing

  • 1. Use ColumnTransformer to apply different preprocessing to different columns
  • 2. Seven ways to select columns using ColumnTransformer
  • 3. What is the difference between "fit" and "transform"?
  • 4. Use "fit_transform" on training data, but "transform" (only) on testing/new data
  • 38. Get the feature names output by a ColumnTransformer
  • 42. Passthrough some columns and drop others in a ColumnTransformer

Using pandas

  • 5. Four reasons to use scikit-learn (not pandas) for ML preprocessing
  • 35. Don't use .values when passing a pandas object to scikit-learn
  • 39. Load a toy dataset into a DataFrame

Categorical Features

  • 6. Encode categorical features using OneHotEncoder or OrdinalEncoder
  • 7. Handle unknown categories with OneHotEncoder by encoding them as zeros
  • 15. Three reasons not to use drop='first' with OneHotEncoder
  • 41. Drop the first category from binary features (only) with OneHotEncoder
  • 43. Use OrdinalEncoder instead of OneHotEncoder with tree-based models

Missing Values

  • 9. Add a missing indicator to encode "missingness" as a feature
  • 11. Impute missing values using KNNImputer or IterativeImputer
  • 14. HistGradientBoostingClassifier natively supports missing values
  • 27. Two ways to impute missing values for a categorical feature

Pipelines

  • 8. Use Pipeline to chain together multiple steps
  • 12. What is the difference between Pipeline and make_pipeline?
  • 13. Examine the intermediate steps in a Pipeline
  • 22. Use the correct methods for each type of Pipeline
  • 28. Save a model or Pipeline using joblib
  • 30. Four ways to examine the steps of a Pipeline
  • 34. Add feature selection to a Pipeline
  • 37. Create an interactive diagram of a Pipeline in Jupyter
  • 48. Access part of a Pipeline using slicing
  • 50. Adapt this pattern to solve many Machine Learning problems

Intermission

  • Can I ask you a quick favor?

Parameter Tuning

  • 16. Use cross_val_score and GridSearchCV on a Pipeline
  • 17. Try RandomizedSearchCV if GridSearchCV is taking too long
  • 18. Display GridSearchCV or RandomizedSearchCV results in a DataFrame
  • 19. Important tuning parameters for LogisticRegression
  • 25. Prune a decision tree to avoid overfitting
  • 40. Estimators only print parameters that have been changed
  • 44. Speed up GridSearchCV using parallel processing
  • 49. Tune multiple models simultaneously with GridSearchCV

Model Evaluation

  • 20. Plot a confusion matrix
  • 21. Compare multiple ROC curves in a single plot
  • 26. Use stratified sampling with train_test_split
  • 31. Shuffle your dataset when using cross_val_score
  • 32. Use AUC to evaluate multiclass problems

Model Inspection

  • 23. Display the intercept and coefficients for a linear model
  • 24. Visualize a decision tree two different ways

Model Ensembling

  • 46. Ensemble multiple models using VotingClassifier or VotingRegressor
  • 47. Tune the parameters of a VotingClassifier or VotingRegressor

Feature Engineering

  • 29. Vectorize two text columns in a ColumnTransformer
  • 33. Use FunctionTransformer to convert functions into transformers
  • 45. Create feature interactions using PolynomialFeatures

Coding Practices

  • 10. Set a "random_state" to make your code reproducible
  • 36. Most parameters should be passed as keyword arguments

Conclusion

  • Can I ask you a quick favor?
  • Request your certificate of completion
  • Take another course from Data School!
  • Earn money by promoting Data School's courses!