# A collection of machine learning resources that I've found helpful (I only post what I've read!)

Wednesday, Nov 18, 2020

## GitHub source: https://github.com/bradleyboehmke/data-science-learning-resources/

# Data Science Learning Resources

## Programming

### General

- The Pragmatic Programmer (Book)
- Clean Code (Book)

### Python

- A Whirlwind Tour of Python (Book)
- Python Data Science Handbook
- Python Tricks (Book)
- Learning Python (Book)
- Effective Python (Book)

### R

- R for Data Science (Book)
- Advanced R (Book)
- R Markdown: The Definitive Guide (Book)
- bookdown: Authoring Books and Technical Documents with R Markdown (Book)
- Data Science in R: A Case Studies Approach to Computational Reasoning and Problem Solving (Book)
- Automated Data Collection with R (Book)
- Introduction to Data Science (Book)

### Spark

- Spark: The Definitive Guide: Big Data Processing Made Simple (Book)
- Learning Spark: Lightning-Fast Big Data Analysis (Book)
- Mastering Spark with R: The Complete Guide to Large-Scale Analysis and Modeling (Book)

### Command Line

- The Missing Semester of Your CS Education (Online course)
- Learning the bash Shell (Book)
- The Art of the Command Line (GitHub resources)
- explainshell.com (Online help)

### Containers

- Docker tips & tricks or just useful commands (Online article)
- Rocker: R configurations for Docker (GitHub resources)
- Docker and Python: making them play nicely and securely for Data Science and ML (PyCon Talk)

### Functional Programming

- An Introduction to the Basic Principles of Functional Programming (Online article)
- R for Data Science, Ch. 21 (Book)
- Advanced R, Ch. 9 (Book)
- Jenny Bryan’s purrr tutorials (Online tutorial)
- Foundations of Functional Programming with purrr (DataCamp)
- Intermediate Functional Programming with purrr (DataCamp)

### Version Control

- Excuse me, do you have a moment to talk about version control? (Paper)
- Happy Git and GitHub for the useR (Book)
- Learn Git (Online tutorial)
- Git Commit Message Style Guide (Online guide)

### Code Packaging

### Style Guide, Readability, Best Practices

- The Art of Readable Code (Book)
- The Tidyverse Style Guide (Online book)
- PEP 8 – Style Guide for Python Code (Online guide)
- Guidelines for code reviews (README)
- Code Review Best Practices (Blog post)

### Testing

- Testing R Code (Book)
- Python Testing with pytest (Book)
- Multiply your Testing Effectiveness with Parameterized Testing (PyCon Talk)
- Test-Driven Development (Book)

## Machine Learning

### General

- Introduction to Statistical Learning (Book)
- Applied Predictive Modeling (Book)
- Elements of Statistical Learning (Book)
- Computer Age Statistical Inference (Book)
- Statistical Modeling: The Two Cultures (Paper)
- Deep Learning (Book)
- Hands-On Machine Learning with Scikit-Learn & TensorFlow (Book | GitHub)
- Hands-On Machine Learning with R (Book)
- Google’s Machine Learning Crash Course (MOOC)

### Unsupervised Modeling

- ISLR: Ch. 10.3 Clustering Methods (Book chapter)
- A K-Means Clustering Algorithm (Paper)
- Generalized Low Rank Models (Paper)
- Deep Learning Ch. 15 Autoencoders (Book chapter)
- Hands-On Machine Learning with Scikit-Learn Ch. 15 Autoencoders (Book chapter | GitHub resource)
- Sparse autoencoder (Andrew Ng CS294A lecture notes)

### A/B Testing

- Lessons from Running Thousands of A/B Tests (Online presentation with many references)
- Online Controlled Experiments at Large Scale (Paper)
- Peeking at A/B Tests (Paper)
- Multi-armed Bandit (Online tutorial)
- A Modern Bayesian Look at the Multi-armed Bandit (Paper behind above online tutorial)
- Predicting Search Satisfaction Metrics with Interleaved Comparisons (Paper)
- Evaluating Retrieval Performance using Clickthrough Data (Paper)

### Multivariate Adaptive Regression Splines

- Multivariate Adaptive Regression Splines (Friedman’s original paper)
- APM: Ch. 7.2 Multivariate Adaptive Regression Splines (Book chapter)
- ESL: Ch. 9.4 Multivariate Adaptive Regression Splines (Book chapter)
- Notes on the **earth** package (Paper)

### K-Nearest Neighbor

- k-Nearest neighbour classifiers (Paper)
- APM: Ch. 7.4 & 13.5 K-Nearest Neighbors (Book chapter)
- ESL: Ch. 13.3 k-Nearest-Neighbor Classifiers (Book chapter)

### Random Forests

- An Introduction to Recursive Partitioning Using the RPART Routines (Paper)
- Random Forests - Leo Breiman’s original research paper (Paper)

### Gradient Boosting Machines

- How to explain gradient boosting (Online tutorial)
- Trevor Hastie - Gradient Boosting & Random Forests at H2O World 2014 (YouTube)
- Trevor Hastie - Data Science of GBM (2013) (slides)
- Mark Landry - Gradient Boosting Method and Random Forest at H2O World 2015 (YouTube)
- Peter Prettenhofer - Gradient Boosted Regression Trees in scikit-learn at PyData London 2014 (YouTube)
- Alexey Natekin and Alois Knoll - Gradient boosting machines, a tutorial (Paper)

### Deep Learning

- Deep Learning with R (Book)
- Deep Learning with Python (Book)
- Deep Learning Specialization (MOOC)
- keras.rstudio.com (Online articles & tutorials)
- blogs.rstudio.com/tensorflow (Online articles & tutorials)
- Illustrated Guide to Recurrent Neural Networks (Blog)
- Illustrated Guide on Vanishing Gradients (Blog)
- Illustrated Guide to LSTMs and GRUs (Blog)
- Understanding LSTMs (Blog)
- Rohan & Lenny: Recurrent Neural Networks & LSTMs (Blog)
- The Unreasonable Effectiveness of Recurrent Neural Networks (Blog)
- Revisiting Small Batch Training for Deep Neural Networks (Paper)
- On Loss Functions for Deep Neural Networks in Classification (Paper)
- Practical Recommendations for Gradient-Based Training of Deep Architectures (Paper)
- Efficient BackProp (Paper)
- Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification (Paper)
- Cyclical Learning Rates for Training Neural Networks (Paper)
- A Disciplined Approach to Neural Network Hyperparameters: Part 1 – Learning Rate, Batch Size, Momentum, and Weight Decay (Paper)
- Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks (Paper)

### Ensembles / Model Stacking / Super Learners

- Ensemble Methods in Machine Learning (Paper)
- Stacked Regressions (Paper)
- Super Learner (Paper)

### Natural Language Processing / Text Mining

- Text Mining with R (Book)
- Probabilistic Topic Models (Paper)
- The Illustrated Word2vec (Online tutorial)
- Sebastian Ruder’s series on Word Embeddings (Online articles & tutorials)
- Neural Models for Information Retrieval (Paper)
- Why do we use word embeddings in NLP? (Blog)

### Tuning

- Hyperparameters and Tuning Strategies for Random Forest (Paper)
- Tunability: Importance of Hyperparameters of Machine Learning Algorithms (Paper)
- Machine Learning Benchmarks and Random Forest Regression (Paper)
- Random Search for Hyperparameter Optimization (Paper)

### Feature Engineering

- Feature Engineering for Machine Learning (Book)
- Feature Engineering and Selection: A Practical Approach for Predictive Models (Book)

### Feature Selection

- Feature Selection with the Boruta Package (Paper)
- APM: Ch. 19 An Introduction to Feature Selection (Book chapter)

### Machine Learning Interpretability

- Scott Lundberg’s presentation on SHAP
- H2O.ai Machine Learning Interpretability Resources (GitHub resources)
- Patrick Hall’s Awesome Machine Learning Interpretability Resources (GitHub resources)
- Interpretable Machine Learning (Book)
- Visualizing the Feature Importance for Black Box Models (Paper)
- A Simple and Effective Model-Based Variable Importance Measure (Paper)
- Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation (Paper)
- pdp: An R Package for Constructing Partial Dependence Plots (Paper)
- “Why Should I Trust You?”: Explaining the Predictions of Any Classifier (Paper)
- A Unified Approach to Interpreting Model Predictions (Paper)
- Consistent Individualized Feature Attribution for Tree Ensembles (Paper)
- On the Art and Science of Machine Learning Explanations (Paper)
- Explanation in artificial intelligence: Insights from the social sciences (Paper)
- Please Stop Permuting Features: An Explanation and Alternatives (Paper)
- A Stratification Approach to Partial Dependence for Codependent Variables (Paper)
- Explaining Machine Learning Classifiers through Diverse Counterfactual Examples (Paper)

### Auto ML

- A Review of Automatic Selection Methods for Machine Learning Algorithms and Hyperparameter Values (Paper)
- Learning Multiple Defaults for Machine Learning Algorithms (Paper)

### Benchmarking

- The Design and Analysis of Benchmark Experiments (Paper)
- Szilard Pafka’s ML Benchmarking Research (GitHub resources)
- Data-driven advice for applying machine learning to bioinformatics problems (Paper)

### Resampling Procedures

- Futility Analysis in the Cross-Validation of Machine Learning Models (Paper)
- Estimating Classification Error Rate: Repeated Cross-validation, Repeated Hold-out, and Bootstrap (Paper)

### Productionalization

- 150 Successful Machine Learning Models: 6 Lessons Learned at Booking.com (Paper)
- Hidden Technical Debt in Machine Learning Systems (Paper)
- Deep Learning in Production (GitHub resources)

## Leadership & Strategy
