You won't find a better course to learn about NLP and Machine Learning in Python anywhere else! Kevin has a way of making difficult topics very accessible and understandable. I was able to quickly apply much of the theory and code regarding NLP and Machine Learning from this course to my own job.
- Cliff Baker, Statistician
Are you trying to master Machine Learning in Python, but tired of wasting your time on courses that don't move you towards your goal? Do you recognize the enormous value of text-based data, but don't know how to apply the right Machine Learning and Natural Language Processing techniques to extract that value?
In this Data School course, you'll gain hands-on experience using Machine Learning and Natural Language Processing to solve text-based data science problems. By the end of the course, you'll be able to confidently apply these techniques to your own data science problems.
Most data science courses suffer from a host of problems: They're poorly taught, lack the necessary depth, and include unexplained or broken code. They don't teach you how to apply what you're learning, and when you do apply it, there's no way to know how well you're doing.
But in this course, we'll go deep into Machine Learning with text, focusing on application from day one. We'll spend most of our time writing Python code, and you'll understand how every single line relates to the problem we're solving. You'll practice what you're learning through carefully crafted lessons and assignments.
At the end of this course, you'll leave with valuable Machine Learning experience, high-quality code that you can reuse to solve future text-based problems, and a wealth of curated resources to help you deepen your understanding of each course topic.
The course was a perfect introduction to Machine Learning with text, and I was able to apply topics covered during the first week to my work. Kevin does a great job of breaking down complex topics and providing a practical, real-world context for them.
- Ryan Cranfill, Data Scientist
In this self-paced course, you'll learn how to build effective Machine Learning models using text-based data to solve your own data science problems. The course includes:
14 hours of high-quality instructional videos
Well-commented lesson notebooks in Jupyter format (also available as Python scripts)
Substantial homework assignments (with provided solutions) to help you practice everything you're learning
A list of readings and videos to help prepare you for each class
Links to 100+ carefully selected resources to deepen your understanding of course topics
Lifetime access to all course materials
Money-back guarantee (within 30 days of purchase)
I used to work as a software developer and your course helped me to move on. I now have a job in the NLP/Machine Learning field which I am more passionate about.
- Jose Navarro, Machine Learning Engineer
Each module includes 2 to 4 hours of instructional videos, 1 lesson notebook, 1 to 2 homework assignments, and 15 to 20 supplementary resources.
Module 1: Working with Text Data in scikit-learn
By the end of this module, you'll be able to confidently perform the basic workflow for Machine Learning with text: creating a dataset, extracting features from unstructured text, building and evaluating models, and inspecting models for further insight. You'll also gain an understanding of Unicode, enabling you to troubleshoot encoding-based errors.
Extracting features from unstructured text using CountVectorizer
Building a MultinomialNB model for text classification
Examining a model for further insight
Model evaluation: accuracy_score, confusion_matrix, roc_auc_score
Comparing MultinomialNB with LogisticRegression
Building a new dataset from individual text files using pandas
Unicode basics
Handling Unicode errors
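To give a feel for the basic workflow in this module, here is a minimal sketch of feature extraction, model building, and evaluation. The tiny spam-style dataset and all variable names are made up for illustration and are not taken from the course materials.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data, invented for this sketch: 1 = spam, 0 = not spam
train_text = [
    "win a free prize now",
    "free cash offer, claim now",
    "are we still meeting for lunch",
    "see you at the meeting tomorrow",
]
train_labels = [1, 1, 0, 0]
test_text = ["claim your free prize", "lunch meeting tomorrow"]
test_labels = [1, 0]

# Learn the vocabulary from the training text only, then reuse it for the test text
vect = CountVectorizer()
X_train = vect.fit_transform(train_text)  # sparse document-term matrix
X_test = vect.transform(test_text)        # same columns as X_train

# Fit a Naive Bayes classifier on the word counts and predict the test labels
nb = MultinomialNB()
nb.fit(X_train, train_labels)
pred = nb.predict(X_test)

# Evaluate the predictions against the known test labels
print(accuracy_score(test_labels, pred))
print(confusion_matrix(test_labels, pred))
```

Calling `transform` (rather than `fit_transform`) on the test text keeps the test matrix aligned with the training vocabulary, so words the model never saw are simply ignored.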
Module 2: Applying Natural Language Processing Techniques to Machine Learning
By the end of this module, you'll be able to apply a handful of Natural Language Processing techniques to Machine Learning problems in order to improve the effectiveness of your models. You'll also learn how to perform sentiment analysis and build a simple document summarization tool for your own corpus of text.
What is Natural Language Processing (NLP)?
NLP terminology and examples
Tuning CountVectorizer for better model performance: n-grams, stop words, corpus-specific stop words, minimum document frequency
Term Frequency-Inverse Document Frequency (TF-IDF) using TfidfVectorizer
Text summarization
Sentiment analysis using TextBlob
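As a rough illustration of the vectorizer tuning topics listed above, the sketch below sets a few CountVectorizer options and then builds a TF-IDF matrix. The three review-style documents are invented for this example, and the parameter values are arbitrary choices rather than recommendations from the course.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Hypothetical toy corpus, invented for this sketch
docs = [
    "the service was great",
    "the food was not great",
    "great food and great service",
]

# ngram_range=(1, 2): use single words and two-word phrases as features
# stop_words="english": drop words from scikit-learn's built-in English stop word list
# min_df=2: keep only terms that appear in at least two documents
vect = CountVectorizer(ngram_range=(1, 2), stop_words="english", min_df=2)
dtm = vect.fit_transform(docs)
print(sorted(vect.vocabulary_))  # the terms that survived the filtering
print(dtm.toarray())

# TfidfVectorizer down-weights terms that occur in many documents;
# by default each row is scaled to unit (L2) length
tfidf = TfidfVectorizer()
X_tfidf = tfidf.fit_transform(docs)
print(X_tfidf.shape)
```

Each of these options changes which columns end up in the document-term matrix, which is why they are usually treated as tuning parameters and compared through model evaluation.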
Module 3: Parsing Text Data Using Regular Expressions
By the end of this module, you'll be able to extract text features from messy data sources using regular expressions. You'll learn the basic rules and syntax that can be applied across programming languages, and you'll master the most important Python functions and options for working with regular expressions.
Basic rules and principles
Searching with re.search
Metacharacters
Greedy and lazy quantifiers
Match groups
Character classes
Alternatives
Substitution with re.sub
Anchors
Option flags
Efficiently searching for multiple matches with re.findall
Improving performance with re.compile
Writing readable regular expressions with re.VERBOSE
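Here is a short sketch that touches several of the topics above using Python's built-in `re` module. The sample strings are made up for illustration.

```python
import re

# Hypothetical sample text, invented for this sketch
text = "Order #123 shipped 2024-01-15; order #456 shipped 2024-02-03."

# re.search returns the first match (or None); parentheses define a match group
first = re.search(r"#(\d+)", text)
print(first.group(0), first.group(1))  # whole match, then group 1

# re.findall returns every match; with a single group it returns the group text
order_ids = re.findall(r"#(\d+)", text)  # ['123', '456']

# Option flag: ignore case when matching
orders = re.findall(r"order", text, re.IGNORECASE)  # ['Order', 'order']

# Greedy quantifiers match as much as possible, lazy ones as little as possible
html = "<b>bold</b> and <i>italic</i>"
greedy = re.findall(r"<.+>", html)   # one long match spanning the whole string
lazy = re.findall(r"<.+?>", html)    # each tag matched separately

# re.sub replaces every match with the given string
masked = re.sub(r"\d{4}-\d{2}-\d{2}", "<DATE>", text)

# re.compile stores a pattern for reuse; re.VERBOSE allows whitespace and comments
date_pattern = re.compile(r"""
    (\d{4})   # year
    -
    (\d{2})   # month
    -
    (\d{2})   # day
""", re.VERBOSE)
dates = date_pattern.findall(text)  # one (year, month, day) tuple per date
print(order_ids, orders, greedy, lazy, masked, dates)
```

Note that inside a `re.VERBOSE` pattern, a literal `#` or space has to be escaped (or placed in a character class), because both are otherwise treated as layout.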
Module 4: Workflow for a Text-Based Data Science Problem
By the end of this module, you'll be able to create an end-to-end workflow for solving a text-based data science problem using scikit-learn and pandas. You'll gain experience with data exploration, feature engineering, proper model evaluation, model tuning, and generating predictions for new observations.
Data exploration and visualization
Feature engineering using pandas
Custom tokenization using regular expressions
Multi-class classification
Model evaluation: train_test_split, cross_val_score, DummyClassifier
Searching for optimal tuning parameters using GridSearchCV
Chaining steps into a Pipeline
Making predictions for out-of-sample data
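The sketch below shows how several of these pieces might fit together: a baseline from DummyClassifier, a Pipeline, a small grid search, and a prediction for new text. The two-topic dataset, the parameter grid, and the variable names are all assumptions made for this example, not the course's own code.

```python
from sklearn.dummy import DummyClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data, invented for this sketch
text = [
    "the team won the match", "a late goal decided the game",
    "the coach praised the players", "fans cheered as the team scored",
    "the striker missed a penalty", "the match ended in a draw",
    "simmer the sauce for ten minutes", "chop the onions and garlic",
    "bake the bread until golden", "stir the soup and add salt",
    "whisk the eggs with sugar", "season the sauce with pepper",
]
labels = ["sports"] * 6 + ["cooking"] * 6

# Baseline: always predict the most frequent class, ignoring the features
dummy = make_pipeline(CountVectorizer(), DummyClassifier(strategy="most_frequent"))
baseline = cross_val_score(dummy, text, labels, cv=3, scoring="accuracy").mean()
print(baseline)

# Chain vectorization and modeling so both are refit within each fold
pipe = make_pipeline(CountVectorizer(), MultinomialNB())

# make_pipeline names each step after its lowercased class name
param_grid = {
    "countvectorizer__ngram_range": [(1, 1), (1, 2)],
    "multinomialnb__alpha": [0.1, 1.0],
}
grid = GridSearchCV(pipe, param_grid, cv=3, scoring="accuracy")
grid.fit(text, labels)
print(grid.best_params_, grid.best_score_)

# The best pipeline is refit on all of the data and can predict for new text
new_text = ["the goalkeeper saved the penalty"]
print(grid.predict(new_text))
```

Comparing the tuned score against the baseline is a quick sanity check: a model that cannot beat "always guess the majority class" has not learned anything useful.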
Module 5: Advanced Machine Learning Techniques
By the end of this module, you'll be able to apply advanced Machine Learning techniques to improve the accuracy of your models and the efficiency of your workflow. You'll learn how to build and tune a multi-step, multi-layer Machine Learning pipeline, as well as how to ensemble and stack your models.
Using a Pipeline for proper cross-validation
Tuning a Pipeline with GridSearchCV
Efficiently searching for tuning parameters using RandomizedSearchCV
Stacking sparse and dense feature matrices using SciPy
Combining the results of multiple feature extraction processes using FeatureUnion
Building multi-level pipelines and feature unions
Building custom transformers using FunctionTransformer
Improving classifier performance through ensembling
Unsupervised document clustering using cosine similarity
Basic strategies for model stacking
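To illustrate two of the techniques above, this sketch combines a sparse document-term matrix with a dense hand-built feature, first by stacking manually with SciPy and then with a FeatureUnion. The `text_length` helper and the sample documents are hypothetical and exist only for this example.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_union
from sklearn.preprocessing import FunctionTransformer


def text_length(docs):
    """Hypothetical helper: return an (n_docs, 1) array of character counts."""
    return np.array([len(doc) for doc in docs]).reshape(-1, 1)


# Hypothetical toy corpus, invented for this sketch
docs = [
    "short note",
    "a somewhat longer message about dinner plans",
    "call me",
    "the quarterly report is attached for your review",
]

# Option 1: build each feature matrix separately and stack them side by side
dtm = CountVectorizer().fit_transform(docs)   # sparse word counts
lengths = text_length(docs)                   # dense numeric feature
X_manual = sp.hstack([dtm, sp.csr_matrix(lengths)]).tocsr()

# Option 2: let a FeatureUnion run both extraction steps and stack the results
union = make_union(
    CountVectorizer(),
    FunctionTransformer(text_length, validate=False),
)
X_union = union.fit_transform(docs)

print(X_manual.shape, X_union.shape)
```

The FeatureUnion version has the advantage that it can be placed inside a Pipeline, so the combined features are rebuilt correctly within each cross-validation fold.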
Practical and easy-to-follow course on advanced topics in Machine Learning. Videos are incredible, full of tips and resources. Outstanding teaching skills by Kevin and his team.
- Miguel Angel Regalado, Digital Analytics Consultant
To check whether you are ready for this course, review the content from my free scikit-learn course and my free pandas course. If you are comfortable with most of that content, you are ready!
If you are new to Python, I recommend first enrolling in Python Essentials for Data Scientists. If you are unsure whether you meet the course requirements, please email me at kevin@dataschool.io.
The course is application-focused, providing you with skills that you can immediately apply to your own data science problems.
The course is taught by an experienced data science instructor.
The lesson notebooks are carefully crafted and will serve as excellent reference materials for years to come.
All of the code is thoroughly explained, well-written, and compatible with both Python 2 and 3.
The homework assignments enable you to immediately practice what you have learned, and the included solutions are fully commented.
The 100+ post-class resources build directly on the course material, and will help you to explore each topic in more depth.
You will have lifetime access to all course materials.
I offer Purchasing Power Parity discounts (also known as location-based discounts) for all of my paid courses. If you're located in one of the 160+ qualifying countries, you should automatically see a discount code at the top of this page.
I also offer student discounts and hardship-based discounts, regardless of where you live. Please email me at kevin@dataschool.io and I'd be happy to send you the appropriate discount code.
Shortly after enrolling in the course, you will be given access to all course materials. You can work through the course at your own pace.
You'll leave the course with valuable Machine Learning experience, high-quality code that you can reuse to solve future text-based problems, and a wealth of curated resources to help you deepen your understanding of each course topic.
You can post a question below any lesson, and I'll do my best to respond!
My name is Kevin, and I've taught Data Science in Python to over a million students.
My courses explain data science topics in a clear, thorough, and step-by-step manner.
I'd love to teach you, regardless of your educational background or professional experience.
Thanks for joining me! 🙌