Gradient Boost Part 2 (of 4): Regression Details

StatQuest with Josh Starmer
Gradient Boost is one of the most popular Machine Learning algorithms in use. And get this, it's not that complicated! This video is the second part in a series that walks through it one step at a time. This video focuses on the original Gradient Boost algorithm used to predict a continuous value, like someone's weight. We call this "using Gradient Boost for Regression". In part 3, we'll walk through how Gradient Boost classifies samples into two different categories, and in part 4, we'll go through the math again, this time focusing on classification.

This StatQuest assumes that you have already watched Part 1:
Gradient Boost Part 1 (of 4): Regress...

...it also assumes that you know about Regression Trees:
Regression Trees, Clearly Explained!!!

...and, while it is not required, it might be useful if you understand Gradient Descent:
Gradient Descent, Step-by-Step

For a complete index of all the StatQuest videos, check out:
https://statquest.org/video-index/

This StatQuest is based on the following sources:

A 1999 manuscript by Jerome Friedman that introduced Stochastic Gradient Boosting: https://jerryfriedman.su.domains/ftp/...

The Wikipedia article on Gradient Boosting: https://en.wikipedia.org/wiki/Gradien...
NOTE: The key to understanding how the Wikipedia article relates to this video is to keep reading past the pseudocode section. The very next section in the article, called "Gradient Tree Boosting", shows how the algorithm works for trees (pretty much the only weak learner people ever use for gradient boost, which is why I focus on it in the video). In that section, you see how the equation is modified so that each leaf of a tree can have its own output value, rather than the entire "weak learner" having a single output value - and this is the exact same equation that I use in the video.
Later in the article, in the section called "Shrinkage", they show how the learning rate can be included. Since this is also pretty much always used with gradient boost, I simply included it in the base algorithm that I describe.
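
For reference, here is that per-leaf update with the learning rate written out (a sketch in LaTeX notation; the symbols follow the video and the Wikipedia article):

    F_m(x) = F_{m-1}(x) + \nu \sum_{j=1}^{J_m} \gamma_{jm} \, \mathbf{1}(x \in R_{jm})

Here \nu is the learning rate, R_{jm} is the j-th leaf of the m-th tree, and \gamma_{jm} is that leaf's output value; for the squared-error loss used in this video, \gamma_{jm} works out to the average of the residuals that end up in the leaf.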

The scikit-learn implementation of Gradient Boosting: https://scikit-learn.org/stable/modul...
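
If you'd like to try it in code, here is a minimal scikit-learn sketch (my own example, not from the video; the toy height/weight numbers are made up):

    from sklearn.ensemble import GradientBoostingRegressor

    # Hypothetical toy data: predict Weight (kg) from Height (m).
    X = [[1.6], [1.6], [1.5], [1.8], [1.5], [1.4]]
    y = [88, 76, 56, 73, 77, 57]

    # learning_rate is the shrinkage (nu) described above,
    # n_estimators is M (the number of trees), and
    # loss="squared_error" matches the loss function used in the video.
    model = GradientBoostingRegressor(
        loss="squared_error",
        learning_rate=0.1,
        n_estimators=100,
        max_depth=3,
    )
    model.fit(X, y)
    print(model.predict([[1.7]]))  # predicted weight for a new sample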

If you'd like to support StatQuest, please consider...

Buying The StatQuest Illustrated Guide to Machine Learning!!!
PDF - https://statquest.gumroad.com/l/wvtmc
Paperback - https://www.amazon.com/dp/B09ZCKR4H6
Kindle eBook - https://www.amazon.com/dp/B09ZG79HXC

Patreon: statquest
...or...
YouTube Membership: @statquest

...a cool StatQuest t-shirt or sweatshirt:
https://shop.spreadshirt.com/statques...

...buying one or two of my songs (or go large and get a whole album!)
https://joshuastarmer.bandcamp.com/

...or just donating to StatQuest!
https://www.paypal.me/statquest

Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on Twitter:
Twitter: joshuastarmer

0:00 Awesome song and introduction
0:00 Step 0: The data and the loss function
6:30 Step 1: Initialize the model with a constant value
9:10 Step 2: Build M trees
10:01 Step 2.A: Calculate residuals
12:47 Step 2.B: Fit a regression tree to the residuals
14:50 Step 2.C: Optimize leaf output values
20:38 Step 2.D: Update predictions with the new tree
23:19 Step 2: Summary of step 2
24:59 Step 3: Output the final prediction
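
If you prefer code to timestamps, here is a bare-bones sketch of steps 1-3 for squared-error loss (my own illustration, not official StatQuest code; it borrows scikit-learn's DecisionTreeRegressor as the weak learner):

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def gradient_boost_regression(X, y, M=100, learning_rate=0.1, max_depth=3):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)

        # Step 1: Initialize the model with a constant value.
        # For squared-error loss, the optimal constant is the mean of y.
        f0 = y.mean()
        prediction = np.full(len(y), f0)
        trees = []

        # Step 2: Build M trees.
        for m in range(M):
            # Step 2.A: Calculate residuals (the negative gradients
            # of the squared-error loss).
            residuals = y - prediction
            # Step 2.B: Fit a regression tree to the residuals.
            tree = DecisionTreeRegressor(max_depth=max_depth)
            tree.fit(X, residuals)
            # Step 2.C: For squared-error loss, the optimal leaf output
            # is the mean residual in the leaf, which is exactly the
            # value the fitted tree already stores in each leaf.
            # Step 2.D: Update predictions, scaled by the learning rate.
            prediction = prediction + learning_rate * tree.predict(X)
            trees.append(tree)

        # Step 3: The final prediction is the initial constant plus the
        # scaled contributions of all M trees.
        def predict(X_new):
            out = np.full(len(X_new), f0)
            for tree in trees:
                out = out + learning_rate * tree.predict(X_new)
            return out

        return predict

    # Usage with hypothetical toy data:
    # predict = gradient_boost_regression([[1.6], [1.5], [1.8]], [76, 56, 73])
    # print(predict([[1.7]]))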

Corrections:
4:27 The entire sum on the left-hand side should be in parentheses to make it clear that the whole sum is multiplied by 1/2, not just the first term.
15:47 It should be R_jm, not R_ij.
16:18 The leaf in the script is R_1,2 and it should be R_2,1.
21:08 With regression trees, the sample will only go to a single leaf, and this summation simply isolates the one output value of interest from all of the others. However, when I first made this video I was thinking that, because Gradient Boost is supposed to work with any "weak learner", not just small regression trees, this summation was a way to add flexibility to the algorithm.
24:15 The header for the residual column should be r_i,2.

#statquest #gradientboost