Overfitting and Underfitting
Geoff Hulten

Don't expect your favorite learner to always be best: try different approaches and compare.

Bias and Variance
Bias: error caused because the model cannot represent the concept.
Variance: error caused because the learning algorithm overreacts to small changes (noise) in the training data.
TotalLoss = Bias + Variance (+ noise)

Visualizing Bias
Goal: produce a model that matches the true concept. Fit a linear model to training data sampled from the concept. Wherever the model predicts + and the concept is - (or vice versa), it makes bias mistakes: the linear model simply cannot represent the concept.

Visualizing Variance
Goal is the same, but now draw new training data and fit a new model: the bias mistakes land in different places. Draw data again, fit again, and the mistakes vary once more. Variance is this sensitivity to changes and noise in the training data.

Another Way to Think About Bias & Variance
Consider a more powerful model: it can represent a complex concept, making no mistakes on the training data. But gather more data and the picture changes: the predictions are not good, because the powerful model fit the noise in the original sample.

Overfitting vs Underfitting
Overfitting is fitting the data too well; underfitting is learning too little of the true concept.
Causes of overfitting:
- Features are noisy / uncorrelated with the concept
- The modeling process is very sensitive (powerful)
- Too much search
Causes of underfitting:
- Features don't capture the concept
- Too much bias in the model
- Too little search to fit the model
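The bias and variance mistakes described above can be made concrete with a small simulation (a sketch; the sin(2.5x) concept, the noise level, and polynomial fitting are illustrative assumptions, not from the slides): fit many models, each on a fresh noisy training set, then measure how far the average model is from the truth (bias) and how much individual models disagree (variance).

```python
import numpy as np

rng = np.random.default_rng(0)

def true_concept(x):
    # Illustrative stand-in for the "true concept" (an assumption).
    return np.sin(2.5 * x)

def fit_many(degree, x_test, n_train=20, trials=300):
    """Fit `trials` polynomial models, each on a fresh noisy training
    set, and return every model's predictions at x_test."""
    preds = np.empty((trials, len(x_test)))
    for t in range(trials):
        x = rng.uniform(-1, 1, n_train)
        y = true_concept(x) + rng.normal(0, 0.2, n_train)
        preds[t] = np.polyval(np.polyfit(x, y, degree), x_test)
    return preds

x_test = np.linspace(-0.8, 0.8, 50)
results = {}
for degree in (1, 5):  # a weak model vs. a more powerful one
    preds = fit_many(degree, x_test)
    avg = preds.mean(axis=0)  # the "average model" over many training sets
    bias2 = float(np.mean((avg - true_concept(x_test)) ** 2))  # systematic error
    variance = float(np.mean(preds.var(axis=0)))  # model-to-model disagreement
    results[degree] = (bias2, variance)
```

The linear model misses the curve in the same way on every training set (high bias, low variance); the degree-5 model reverses the tradeoff.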
The Effect of Noise
[Figure: a model fit to noisy training data; both axes run from 0 to 1.]

The Effect of Features
- Features with not much information: the model won't learn well, and a powerful model will have high variance; throw them out.
- Features that capture the concept: even a simple model can achieve low bias.
- More training data (larger N): even a powerful model can achieve low variance.

The Power of a Model Building Process
Weaker modeling process (higher bias):
- Simple model (e.g. a linear model)
- Fixed-size model (e.g. a fixed number of weights)
- Small feature set (e.g. top 10 tokens)
- Constrained search (e.g. few iterations of gradient descent)
More powerful modeling process (higher variance):
- Complex model (e.g. a high-order polynomial)
- Scalable model (e.g. a decision tree)
- Large feature set (e.g. every token in the data)
- Unconstrained search (e.g. exhaustive search)

Example of Under/Over-fitting

Ways to Control Decision Tree Learning
- Increase minToSplit
- Increase minGainToSplit
- Limit the total number of nodes
- Penalize complexity, e.g. loss(tree) = trainLoss(tree, data) + λ · numNodes(tree)
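A minimal sketch of how the first two controls might look inside a tree learner (the function, its defaults, and the binary-split shape are hypothetical; minToSplit and minGainToSplit follow the slide's names): refuse to split nodes that are too small, and require the best split to earn a minimum information gain.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def should_split(parent_labels, left_labels, right_labels,
                 min_to_split=10, min_gain_to_split=0.01):
    """Decide whether a decision-tree node may split, applying two
    of the controls above (hypothetical names and defaults)."""
    n = len(parent_labels)
    if n < min_to_split:  # node too small to split reliably
        return False
    gain = entropy(parent_labels) - (
        len(left_labels) / n * entropy(left_labels)
        + len(right_labels) / n * entropy(right_labels))
    return gain > min_gain_to_split  # require a meaningful gain
```

Raising either threshold makes the process weaker (more bias, less variance); lowering them makes it more powerful.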
Ways to Control Logistic Regression
- Adjust the step size
- Adjust the iterations / stopping criteria of gradient descent
- Regularization, e.g. loss(w) = trainLoss(w, data) + λ · Σᵢ wᵢ² (an L2 penalty; an L1 penalty, λ · Σᵢ |wᵢ|, is also common)
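All three knobs can be seen in one place in a small gradient-descent sketch (hypothetical code, not from the slides; the `lam * w` term is the gradient of an L2 penalty, one common choice of regularizer):

```python
import numpy as np

def train_logistic(X, y, step_size=0.1, iterations=1000, lam=0.0):
    """Logistic regression via gradient descent.
    step_size, iterations, and lam are the three controls above."""
    w = np.zeros(X.shape[1])
    for _ in range(iterations):
        p = 1.0 / (1.0 + np.exp(-X @ w))         # predicted probabilities
        grad = X.T @ (p - y) / len(y) + lam * w  # data gradient + L2 penalty
        w -= step_size * grad
    return w

# Larger lam shrinks the weights, constraining the model (less variance).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
w_free = train_logistic(X, y, lam=0.0)
w_reg = train_logistic(X, y, lam=1.0)
```

On separable data the unregularized weights keep growing with more iterations, while the penalty holds the regularized weights small.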
Modeling to Balance Under- & Overfitting
- Data
- Learning algorithms
- Feature sets
- Complexity of the concept
- Search and computation
Parameter sweeps!

Parameter Sweep
# optimize the first parameter
for p in [ setting_certain_to_underfit, ..., setting_certain_to_overfit ]:
    pass  # do cross validation to estimate accuracy
# find the setting that balances overfitting & underfitting
# then optimize the second parameter, etc.
# finally, examine the parameters that seem best and adjust whatever you can
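The pseudocode above might look like this as a runnable sketch (the polynomial-degree parameter, the cubic toy concept, and the fold count are illustrative assumptions): sweep from a setting certain to underfit toward one likely to overfit, scoring each setting by cross-validated error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: noisy samples of a cubic concept.
x = rng.uniform(-1, 1, 200)
y = x**3 - x + rng.normal(0, 0.1, size=x.shape)

def cross_val_mse(x, y, degree, folds=5):
    """Cross-validated mean squared error of a polynomial fit."""
    idx = rng.permutation(len(x))
    errs = []
    for f in range(folds):
        val = idx[f::folds]  # every folds-th example held out
        train = np.setdiff1d(idx, val)
        coeffs = np.polyfit(x[train], y[train], degree)
        errs.append(np.mean((np.polyval(coeffs, x[val]) - y[val]) ** 2))
    return float(np.mean(errs))

# Sweep the parameter from certain-to-underfit (degree 0) toward
# likely-to-overfit (degree 10); keep the best validation score.
scores = {d: cross_val_mse(x, y, d) for d in range(11)}
best = min(scores, key=scores.get)
```

The validation error typically falls as the model gains enough power to capture the concept, then creeps back up as extra power starts fitting noise; the sweep picks the setting at the bottom of that curve.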
Types of Parameter Sweeps
- Optimize one parameter at a time: optimize one, update it, move on; iterate a few times.
- Gradient descent on metaparameters: start somewhere reasonable, then computationally calculate the gradient with respect to changes in the parameters.
- Grid: try every combination of every parameter.

Quick vs Full Runs
- Do the expensive parameter sweep (e.g. a grid) on a small sample of the data.
- Then do a bit of iteration on the full data to refine.

Intuition & Experience
- Learn your tools.
- Learn your problem.

Summary of Overfitting and Underfitting
- The bias/variance tradeoff is a primary challenge in machine learning.
- Internalize: more powerful modeling is not always better.
- Learn to identify overfitting and underfitting.
- Tuning parameters and interpreting output correctly is key.