Learning Data Science
What I would tell myself at the start. Books I return to, courses worth finishing, people whose writing makes you better at the craft, and tools that show up in real work.
Books
Read these in roughly this order if you are starting from scratch.
- Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow
Best practical starting point. Covers the full ML stack from regression to deep learning with real code.
- An Introduction to Statistical Learning (ISLR)
The cleanest conceptual foundation for supervised learning. Free PDF at statlearning.com.
- Deep Learning
Rigorous theory behind neural networks. Dense, but worth it once you have the basics.
- Designing Machine Learning Systems
How ML actually works in production. Data pipelines, feature stores, monitoring, and drift. Essential for practitioners.
- The Elements of Statistical Learning (ESL)
The graduate-level companion to ISLR. Go here when ISLR feels too light.
- Python for Data Analysis
Definitive reference for pandas from its creator. Still the fastest way to get fluent with data wrangling.
Courses
Structured learning with feedback loops. Pick one and finish it before starting another.
-
Top-down, code-first. You build a working model in the first lesson. Jeremy Howard is one of the best teachers in ML.
-
The classic introduction. Clear intuition-building for gradient descent, regularisation, and model selection.
-
Covers CNNs, RNNs, optimisation, and the nuts and bolts of training deep networks.
-
MLOps, deployment, and the production side of ML. Free lectures on YouTube.
-
Most underrated skill in data science. Master SQL before anything else.
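A rough idea of the SQL worth being fluent in: grouping, aggregating, and filtering on the aggregate. This is a minimal sketch using Python's built-in sqlite3 module so it runs without a database server. The `orders` table and its rows are made up for illustration.

```python
import sqlite3

# In-memory database; the table and rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ana", 20.0), ("ben", 5.0), ("ana", 35.0), ("cy", 12.5), ("ben", 3.0)],
)

# Per-customer order count and total spend, keeping only customers
# whose total exceeds 10, largest total first.
query = """
SELECT customer, COUNT(*) AS n_orders, SUM(amount) AS total
FROM orders
GROUP BY customer
HAVING SUM(amount) > 10
ORDER BY total DESC, customer
"""
rows = conn.execute(query).fetchall()

for customer, n_orders, total in rows:
    print(customer, n_orders, total)

conn.close()
```

WHERE filters rows before grouping, while HAVING filters the groups afterwards. Here "ben" is dropped because his total is 8.0.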
People to Follow
Reading good practitioners as they think out loud is an underrated way to learn.
- Lilian Weng
OpenAI safety researcher. Writes the clearest long-form explanations of complex ML papers anywhere.
- Andrej Karpathy
Former Tesla AI director. His "Neural Networks: Zero to Hero" YouTube series is exceptional.
- Eugene Yan
Applied ML at Amazon / Humans of AI. Writes deeply about production ML systems and the craft of data science.
- Sebastian Raschka
Author of Python Machine Learning. Writes detailed, well-cited posts on everything from LLMs to vision transformers.
- Chip Huyen
MLOps and ML systems. Her writing is the standard reference for production machine learning.
Tools and Libraries
What actually shows up in production projects, not just tutorials.
- scikit-learn
Where you learn ML algorithms. Clean API, great docs, battle-tested.
- pandas + polars
pandas for exploration, polars when you need speed at scale.
- PyTorch
The research standard. Intuitive, Pythonic, and transferable to any deep learning job.
- Weights and Biases
Experiment tracking done right. Free tier is generous enough for personal projects.
- Gradio
Fastest way to put an ML demo in front of a non-technical stakeholder.
- DVC
Git for datasets and models. Essential once you have more than one version of anything.
- Great Expectations
Data validation and quality checks. Catches data drift before your model silently breaks.
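To give a feel for what "clean API" means for scikit-learn, here is a minimal sketch of its fit / predict / score pattern. The dataset (the bundled iris data), the model, and the split settings are my own picks for illustration, not a recommended setup.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Small dataset that ships with scikit-learn (illustrative choice).
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the rows for evaluation; fixed seed for repeatability.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Preprocessing and model chained into one estimator with the same interface.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

predictions = model.predict(X_test)
accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Swapping in a different estimator usually means changing one line, since most scikit-learn models expose the same fit and predict methods.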