14. HistGradientBoostingClassifier natively supports missing values

50 scikit-learn tips

Introduction

  • Welcome to the course!
  • Download the course notebooks

Data Preprocessing

  • 1. Use ColumnTransformer to apply different preprocessing to different columns
  • 2. Seven ways to select columns using ColumnTransformer
  • 3. What is the difference between "fit" and "transform"?
  • 4. Use "fit_transform" on training data, but "transform" (only) on testing/new data
  • 38. Get the feature names output by a ColumnTransformer
  • 42. Passthrough some columns and drop others in a ColumnTransformer

Using pandas

  • 5. Four reasons to use scikit-learn (not pandas) for ML preprocessing
  • 35. Don't use .values when passing a pandas object to scikit-learn
  • 39. Load a toy dataset into a DataFrame

Categorical Features

  • 6. Encode categorical features using OneHotEncoder or OrdinalEncoder
  • 7. Handle unknown categories with OneHotEncoder by encoding them as zeros
  • 15. Three reasons not to use drop='first' with OneHotEncoder
  • 41. Drop the first category from binary features (only) with OneHotEncoder
  • 43. Use OrdinalEncoder instead of OneHotEncoder with tree-based models

Missing Values

  • 9. Add a missing indicator to encode "missingness" as a feature
  • 11. Impute missing values using KNNImputer or IterativeImputer
  • 14. HistGradientBoostingClassifier natively supports missing values
  • 27. Two ways to impute missing values for a categorical feature

Pipelines

  • 8. Use Pipeline to chain together multiple steps
  • 12. What is the difference between Pipeline and make_pipeline?
  • 13. Examine the intermediate steps in a Pipeline
  • 22. Use the correct methods for each type of Pipeline
  • 28. Save a model or Pipeline using joblib
  • 30. Four ways to examine the steps of a Pipeline
  • 34. Add feature selection to a Pipeline
  • 37. Create an interactive diagram of a Pipeline in Jupyter
  • 48. Access part of a Pipeline using slicing
  • 50. Adapt this pattern to solve many Machine Learning problems

Intermission

  • Can I ask you a quick favor?

Parameter Tuning

  • 16. Use cross_val_score and GridSearchCV on a Pipeline
  • 17. Try RandomizedSearchCV if GridSearchCV is taking too long
  • 18. Display GridSearchCV or RandomizedSearchCV results in a DataFrame
  • 19. Important tuning parameters for LogisticRegression
  • 25. Prune a decision tree to avoid overfitting
  • 40. Estimators only print parameters that have been changed
  • 44. Speed up GridSearchCV using parallel processing
  • 49. Tune multiple models simultaneously with GridSearchCV

Model Evaluation

  • 20. Plot a confusion matrix
  • 21. Compare multiple ROC curves in a single plot
  • 26. Use stratified sampling with train_test_split
  • 31. Shuffle your dataset when using cross_val_score
  • 32. Use AUC to evaluate multiclass problems

Model Inspection

  • 23. Display the intercept and coefficients for a linear model
  • 24. Visualize a decision tree two different ways

Model Ensembling

  • 46. Ensemble multiple models using VotingClassifier or VotingRegressor
  • 47. Tune the parameters of a VotingClassifier or VotingRegressor

Feature Engineering

  • 29. Vectorize two text columns in a ColumnTransformer
  • 33. Use FunctionTransformer to convert functions into transformers
  • 45. Create feature interactions using PolynomialFeatures

Coding Practices

  • 10. Set a "random_state" to make your code reproducible
  • 36. Most parameters should be passed as keyword arguments

Conclusion

  • Can I ask you a quick favor?
  • Request your certificate of completion
  • Take another course from Data School!
  • Earn money by promoting Data School's courses!