Quantitative Methods
Analyzing dichotomous dummy variables

Logistic Regression Analysis

Like ordinary regression and ANOVA, logistic regression is part of a category of models called generalized linear models. Generalized linear models were developed to unify various statistical models (linear regression, logistic regression, Poisson regression). We can think of maximum likelihood as a general method for estimating all of these models.

Logistic Regression Analysis: GLM

In a GLM, each outcome of the dependent variable (that is, each Y) is assumed to be generated from a particular distribution function in the exponential family (normal, binomial, Poisson, etc.).

Logistic Regression Analysis (a diversion into probability distributions)

Normal distribution: a family of distributions, each member of which can be defined by its mean and variance. Many physical phenomena can be approximated well by the normal distribution.

Binomial distribution: the probability distribution of the number of successes in a sequence of Bernoulli trials (where outcomes fall into one of two categories, i.e., occurred and did not occur). Note that in large samples, if the dependent variable is not too skewed, the normal distribution approximates the binomial distribution.

Poisson distribution: expresses the probability of a number of events occurring in a fixed period of time, if the events occur with a known average rate and independently of the time since the last event. (Note that the negative binomial distribution is used to model event counts that are skewed. One can also think about the Polya distribution, which can be used to model occurrences of contagious discrete events such as tornado outbreaks.)
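As a rough illustration of the distributions above, the following sketch computes binomial and Poisson probabilities from their textbook formulas and compares one binomial probability with its normal approximation. The function names and the numbers used (100 trials, success probability .5, an average rate of 3 events) are illustrative assumptions, not part of the original notes.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n Bernoulli trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, rate):
    """Probability of exactly k events when events occur at a known average rate."""
    return math.exp(-rate) * rate**k / math.factorial(k)

def normal_pdf(x, mean, variance):
    """Density of the normal distribution defined by a mean and a variance."""
    return math.exp(-((x - mean) ** 2) / (2 * variance)) / math.sqrt(2 * math.pi * variance)

# Hypothetical large-sample case: 100 trials, not skewed (p = .5).
n, p = 100, 0.5
mean, variance = n * p, n * p * (1 - p)
exact = binomial_pmf(50, n, p)
approx = normal_pdf(50, mean, variance)
print(f"binomial P(50 successes) = {exact:.4f}; normal approximation = {approx:.4f}")

# Hypothetical count process: on average 3 events per period.
print(f"Poisson P(2 events) = {poisson_pmf(2, 3.0):.4f}")
```

With a large, unskewed sample the two numbers on the first printed line should be close, which is the sense in which the normal distribution approximates the binomial.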

Logistic Regression: when?

Logistic regression models are appropriate for dependent variables coded 0/1. We only observe 0 and 1 for the dependent variable, but we think of the dependent variable conceptually as the probability that 1 will occur.

Logistic Regression: examples

Some examples:
Vote for Obama (yes, no)
Turned out to vote (yes, no)
Sought medical assistance in last year (yes, no)

Logistic Regression: why not OLS?

Why can't we use OLS? After all, linear regression is so straightforward, and (unlike other models) actually has a closed form solution for the estimates.

There are three problems with using OLS.

First, what is our dependent variable, conceptually? It is the probability of y=1. But we only observe y=0 and y=1. If we use OLS, we'll get predicted values that fall between 0 and 1, which is what we want, but we'll also get predicted values that are greater than 1 or less than 0. That makes no sense.

Second, there is heteroskedasticity in the model. Think about the meaning of residual. The residual is the difference between the observed and the predicted Y. By definition, what will that residual look like at the center of the distribution?

By definition, what will that residual look like at the tails of the distribution? Because the observed Y can only be 0 or 1, the variance of the residual is p(1-p), where p is the predicted probability: it is largest when p is near .5 and shrinks toward zero as p approaches 0 or 1. The error variance is therefore not constant.

Third, there is a substantive problem. The reality is that many choice functions can be modeled by an S-shaped curve. Therefore (much as when we discussed transformations of the X variable), it makes sense to model a non-linear relationship.

Logistic Regression: but similar to OLS

So. We actually could correct for the heteroskedasticity, and we could transform the equation so that it captured the non-linear relationship, and then use linear regression. But what we usually do is use logistic regression to predict the probability of the occurrence of an event.

Logistic Regression: S-shaped curve and Bernoulli variables

Note that the observed dependent variable is a Bernoulli (or binary) variable. But what we are really interested in is predicting the probability that an event occurs (i.e., the probability that y=1).
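The three problems with OLS can be seen in a small numerical sketch. The data below are made up for illustration (x is a hypothetical 1 to 10 score, y a 0/1 outcome), and the helper names `ols_fit` and `s_curve` are assumptions introduced here. The code fits a straight line with the closed-form OLS formulas, lists the predictions that fall outside 0 and 1, shows how the variance p(1-p) changes with p, and shows that an S-shaped (logistic) curve stays between 0 and 1.

```python
import math

def ols_fit(xs, ys):
    """Closed-form OLS for one predictor: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def s_curve(z):
    """Logistic (S-shaped) curve; always strictly between 0 and 1."""
    return 1 / (1 + math.exp(-z))

# Hypothetical data, not taken from the notes.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

# Problem 1: a straight line produces "probabilities" below 0 and above 1.
intercept, slope = ols_fit(xs, ys)
ols_predictions = [intercept + slope * x for x in xs]
out_of_range = [round(p, 3) for p in ols_predictions if p < 0 or p > 1]
print("OLS predictions outside [0, 1]:", out_of_range)

# Problem 2: for a 0/1 outcome the variance is p(1-p), so it is not constant.
for p in (0.1, 0.5, 0.9):
    print(f"p = {p}: variance = {p * (1 - p):.2f}")

# Problem 3: an S-shaped curve flattens out instead of leaving the 0-1 range.
for z in (-10, -2, 0, 2, 10):
    print(f"z = {z:+d}: S-curve value = {s_curve(z):.4f}")
```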

Logistic Regression: advantage

Logistic regression is particularly handy because (unlike, say, discriminant analysis) it makes no assumptions about how the independent variables are distributed. They don't have to be normally distributed, and they can be continuous or categorical; they can take any form.

Logistic Regression: exponential values and natural logs

Note: exp is the exponential function, and ln is the natural log. These are inverses of each other. When we take the exponential function of any number, we raise e (approximately 2.718) to the power of that number. So exp(3) = e * e * e, which is approximately 20.09. If we take ln(20.09), we get approximately 3.
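The relationship between exp and ln can be checked directly. This is a minimal sketch using Python's standard math module; the variable names are illustrative.

```python
import math

x = 3
e_to_the_x = math.exp(x)            # e raised to the power x
back_again = math.log(e_to_the_x)   # the natural log undoes exp

print(f"e is about {math.e:.3f}")
print(f"exp({x}) is about {e_to_the_x:.2f}")
print(f"ln(exp({x})) is about {back_again:.2f}")
```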

Logistic Regression: transformation

Note that you can think of logistic regression in terms of transforming the dependent variable so that it fits an S-shaped curve. Note that the odds are the probability that a case will be a 1 divided by the probability that it will not be a 1. (Strictly speaking, an odds ratio is a ratio of two such odds.) The natural log of the odds is the logit, and it is a linear function of the x's (that is, of the right hand side of the model).

Note that you can equivalently talk about modelling the probability that y=1 (theta). In standard form, the two expressions below say the same thing mathematically:

ln(theta / (1 - theta)) = a + b1x1 + b2x2 + ...
theta = exp(a + b1x1 + b2x2 + ...) / (1 + exp(a + b1x1 + b2x2 + ...))

Note that the independent variables are not linearly related to the probability that y=1. However, the independent variables are linearly related to the logit of the dependent variable.

Logistic Regression: recap

Logistic regression analysis, in other words, is very similar to OLS regression, just with a transformation of the regression formula. We also use binomial theory to conduct the tests.
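A short sketch makes the recap concrete: equal steps in x produce equal steps in the logit but unequal steps in the probability. The intercept and slope below (a = -4, b = 1) are hypothetical values chosen only for illustration, and the function names are assumptions.

```python
import math

def logit(p):
    """Natural log of the odds p / (1 - p)."""
    return math.log(p / (1 - p))

def inverse_logit(z):
    """Turn a logit back into a probability."""
    return 1 / (1 + math.exp(-z))

a, b = -4.0, 1.0  # hypothetical intercept and slope

for x in range(0, 9, 2):
    z = a + b * x              # linear in x
    p = inverse_logit(z)       # S-shaped in x
    print(f"x = {x}: logit = {z:+.1f}, probability = {p:.3f}")
```

Each 2-unit step in x adds exactly 2 to the logit, while the change in probability is large near the middle of the curve and small in the tails.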

Logistic Regression: model fit

Recall that in OLS, we minimized the squared residuals in order to find the line that best fit the data. In logistic regression analysis, we use a calculus-based method called maximum likelihood estimation (MLE).

Logistic Regression: MLE

Through an iterative process, it finds the coefficient values that maximize our ability to predict the probability of y based on what we know about x. In other words, ML will find the best values for the estimated effects of party, ideology, sex, race, etc. to predict the likelihood that someone will vote for Obama.

Logistic Regression Analysis: iteration

In other words, MLE starts with an initial (arbitrary) guesstimate of what the coefficients will be, and then determines the direction and size of the change that will increase the log likelihood (goodness of fit, that is, how likely it is that the observed values of the dependent variable can be predicted from the observed values of the independent variables).
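The iterative search can be sketched with Newton-Raphson updates, one common way of choosing the direction and size of each change; actual statistical packages may differ in their details. The data are hypothetical (one predictor x, a 0/1 outcome y), and the names `log_likelihood` and `fit_logit` are assumptions introduced for this illustration. The loop stops once the log likelihood no longer changes by more than a small tolerance.

```python
import math

def log_likelihood(a, b, xs, ys):
    """Log likelihood of the 0/1 outcomes given intercept a and slope b."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = 1 / (1 + math.exp(-(a + b * x)))
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

def fit_logit(xs, ys, tolerance=1e-8, max_iterations=25):
    """Newton-Raphson for a one-predictor logit; returns (a, b, history)."""
    a, b = 0.0, 0.0  # initial (arbitrary) guess
    history = [log_likelihood(a, b, xs, ys)]
    for _ in range(max_iterations):
        # Gradient of the log likelihood and the information matrix.
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for x, y in zip(xs, ys):
            p = 1 / (1 + math.exp(-(a + b * x)))
            w = p * (1 - p)
            g_a += y - p
            g_b += (y - p) * x
            h_aa += w
            h_ab += w * x
            h_bb += w * x * x
        # Solve the 2x2 system (information matrix) * step = gradient.
        det = h_aa * h_bb - h_ab * h_ab
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
        history.append(log_likelihood(a, b, xs, ys))
        # Convergence: the log likelihood has stopped improving.
        if abs(history[-1] - history[-2]) < tolerance:
            break
    return a, b, history

# Hypothetical data, not taken from the notes.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

a, b, history = fit_logit(xs, ys)
print(f"intercept = {a:.3f}, slope = {b:.3f}, iterations = {len(history) - 1}")
print("log likelihood by iteration:", [round(value, 4) for value in history])
```

Printing the history shows the log likelihood rising from its starting value and then levelling off, which is what convergence means here.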

After estimating an initial function, the program continues estimating with new estimates to reach an improved function, until convergence is reached (that is, until the log likelihood, or the goodness of fit, does not change significantly).

Logistic Regression: tests

There are two main forms of the likelihood ratio test for goodness of fit.

1. Test of the overall model (model chi-square test). Compares the researcher's model to a reduced model (the baseline model with the constant only). A well fitting model is significant at the .05 level or better (p < .05); that is, a well fitting model is one that fits the data better than a model with only the constant. A finding of significance means that one can reject the null hypothesis that all of the predictor effects are zero (this is equivalent to an F test in OLS).

2. Test of individual model parameters. (Note that the Wald statistic has a chi-squared distribution, but other than that, it is just the same as the t that we use in OLS.) You can also calculate a likelihood ratio statistic. Essentially, one is comparing the goodness of fit for the overall model with the goodness of fit of a nested model which drops an independent variable. (This is generally considered preferable to the Wald statistic if the coefficient values are very high.)

Logistic Regression: interpretation

Most commonly, with all other variables held constant, there is a constant increase of b1 in logit(p) for every 1-unit increase in x1. But remember that even though the right hand side of the model is linearly related to the logit (that is, to the natural log of the odds), what does it mean for the actual probability that y=1?

It's fairly straightforward: the effect is multiplicative in the odds. If b1 takes the value of 2.3 (and we know that exp(2.3) is approximately 10), then if x1 increases by 1, the odds that the dependent variable takes the value of 1 increase tenfold.

Logistic Regression: presentation

Likewise, it's difficult to explain to the reader what the parameter estimates mean, because they reflect changes in the logit (the natural log of the odds) for each one-unit change in x. But what you want to tell your readers is how much the probability that y=1 changes (given a 1-unit change in x).

Logistic Regression: transform back

So, you need to transform into predicted probabilities.
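As a sketch of why the transformation matters, the code below uses the coefficient from the example (b1 = 2.3) and three hypothetical starting logits. The odds are multiplied by the same factor each time, but the change in the probability depends on where the case starts. The function names and the starting values are assumptions made for illustration.

```python
import math

def inverse_logit(z):
    """Turn a logit into a predicted probability."""
    return 1 / (1 + math.exp(-z))

def one_unit_change(starting_logit, b):
    """Return (p_before, p_after, odds_multiplier) for a 1-unit increase in x."""
    p_before = inverse_logit(starting_logit)
    p_after = inverse_logit(starting_logit + b)
    odds_before = p_before / (1 - p_before)
    odds_after = p_after / (1 - p_after)
    return p_before, p_after, odds_after / odds_before

b1 = 2.3  # coefficient from the example; exp(2.3) is roughly 10

for starting_logit in (-4.0, -1.0, 2.0):  # hypothetical starting points
    p_before, p_after, multiplier = one_unit_change(starting_logit, b1)
    print(
        f"start logit {starting_logit:+.1f}: probability {p_before:.3f} -> {p_after:.3f} "
        f"(change {p_after - p_before:.3f}), odds multiplied by {multiplier:.2f}"
    )
```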

Create predicted y's (just as you would in OLS: predicted y = a + b1x1 + b2x2 + ...). And then transform: exp(predicted y) / (1 + exp(predicted y)) = predicted probability. (Many software packages will do this for you. See Gary King. Or, if you are fond of rotary dial phones, create your own Excel file to do this, which has the advantage of flexibility.)

Logistic Regression: logit v. probit

What's the difference? Well, MLE requires assumptions about the probability distribution of the errors. Logistic regression uses the standard logistic probability distribution, whereas probit uses the standard normal distribution.

Logit is more common. And note that logit and probit often give the same results. But note that there can be differences between the two link functions; see the paper by Hahn and Soyer.

Logistic Regression: ordered logit

Ordered models assume there's some underlying, unobservable true outcome variable, occurring on an interval scale.

We don't observe that interval-level information about the outcome, but only whether that unobserved value crosses some threshold(s) that put the outcome into a lower or a higher category. The categories are ranked, revealing ordinal but not interval-level information.

If you are using ordered logit, you will get results that include cut points (intercepts) and coefficients. Ordered logistic regression (OLR) essentially runs multiple equations: one less than the number of options on one's scale.

For example, assume that you have a 4 point scale: 1=not at all optimistic, 2=not very optimistic, 3=somewhat optimistic, and 4=very optimistic.

The first equation compares the likelihood that y=1 to the likelihood that y does not equal 1 (that is, y=2, 3, or 4). The second equation compares the likelihood that y=1 or 2 to the likelihood that y=3 or 4. The third equation compares the likelihood that y=1, 2, or 3 to the likelihood that y=4.

Note that OLR only reports one parameter estimate for each independent variable. That is, it constrains the parameter estimates to be constant across categories.
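To make the cumulative comparisons concrete, the sketch below turns three cut points and a single coefficient into probabilities for the four categories of the optimism scale. It assumes the convention logit[P(y <= j)] = cut_j - b*x; as noted elsewhere in these notes, packages differ in sign conventions, so this is an assumption rather than a universal rule. The cut points, the coefficient, and the function names are all hypothetical.

```python
import math

def cumulative_probability(cut_point, linear_predictor):
    """P(y <= category), assuming logit[P(y <= j)] = cut_j - linear predictor."""
    return 1 / (1 + math.exp(-(cut_point - linear_predictor)))

def category_probabilities(cut_points, linear_predictor):
    """Probability of each ordered category (one more category than cut points)."""
    cumulative = [cumulative_probability(c, linear_predictor) for c in cut_points]
    cumulative.append(1.0)  # P(y <= highest category) is 1
    probabilities = []
    previous = 0.0
    for value in cumulative:
        probabilities.append(value - previous)
        previous = value
    return probabilities

cut_points = [-1.0, 0.5, 2.0]  # hypothetical: three cut points for a 4 point scale
b = 0.8                        # hypothetical: one coefficient shared by all equations

for x in (0, 1, 2):
    probs = category_probabilities(cut_points, b * x)
    print(f"x = {x}:", [round(p, 3) for p in probs])
```

Because the same b enters every cumulative equation, raising x shifts probability toward the higher categories across the whole scale at once.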

Ordered logit assumes that the coefficients for the variables would not vary if one actually estimated the different equations separately. (Note that in Stata one can actually test whether this assumption is true, without running the separate models. There's some parallel here to the non-linearity issue we discussed last week, where OLS is assuming that your independent variable is linearly related to the dependent variable, but you can actually break apart the independent variable to test whether that is true.)

The results also give you intercepts. Check to see how these are coded: they generally mean the same thing, but the directions of the parameters are different in SAS versus Stata (just as an example). (SAS also models y=0 by default in a regular logistic regression, so you need to flip the signs to get the more intuitive results.)

Multinomial Analyses

Multinomial logit can be used when categories of the dependent variable cannot be ordered in a meaningful way.

One category is chosen as the comparison category, and the beta coefficient (b) represents the change in the log odds of being in a given category of the dependent variable relative to the comparison category (for a one-unit change in the right-hand side variable).

The model, in standard form: for each category j other than the comparison category, ln[P(y=j) / P(y=comparison)] = a_j + b_j x.

Multinomial logit is simple to estimate, and is often used.

However, it is appropriate only if the introduction or removal of a choice has no effect on the relative probabilities of choosing each of the others (the independence of irrelevant alternatives assumption). For example: Perot versus Clinton versus Bush, 1992. Does removing Perot from the equation mean that the probability of choosing Clinton relative to the probability of choosing Bush changes? If so, multinomial logit is inappropriate.

Multinomial probit does not require the assumption that choices are independent across alternatives. And, though it demands a great deal of computing resources, recent advances mean that it is increasingly practical to use.

So, often multinomial probit is recommended.
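The independence assumption built into multinomial logit can be illustrated numerically. In the sketch below, the utilities assigned to the three candidates are made-up numbers for a single hypothetical voter, and `choice_probabilities` is an illustrative helper, not a library function. Under the multinomial logit formula, dropping Perot changes each remaining probability but leaves the Clinton to Bush ratio untouched, which is exactly the restriction described above.

```python
import math

def choice_probabilities(utilities):
    """Multinomial logit probabilities from a dict of systematic utilities."""
    denominator = sum(math.exp(u) for u in utilities.values())
    return {name: math.exp(u) / denominator for name, u in utilities.items()}

# Hypothetical utilities for one voter; the numbers are invented.
three_way = {"Clinton": 0.6, "Bush": 0.4, "Perot": -0.2}
two_way = {name: u for name, u in three_way.items() if name != "Perot"}

p3 = choice_probabilities(three_way)
p2 = choice_probabilities(two_way)

ratio_with_perot = p3["Clinton"] / p3["Bush"]
ratio_without_perot = p2["Clinton"] / p2["Bush"]

print("with Perot:   ", {name: round(p, 3) for name, p in p3.items()})
print("without Perot:", {name: round(p, 3) for name, p in p2.items()})
print(f"Clinton/Bush ratio: {ratio_with_perot:.4f} vs {ratio_without_perot:.4f}")
```

If Perot's supporters would in reality have gone disproportionately to one of the other two candidates, the true ratio would change, and this built-in constancy is what makes multinomial logit inappropriate in that case.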

Dow and Endersby (2004) point out, however, that the choice of a model really depends on how you see the underlying choice process that generated the observed data. In reality, neither model (MNP or MNL) will be clearly advantageous.

Dow and Endersby also argue that MNP sometimes fails to converge at a global optimum. Put simply, they argue that MNP often comes up with imprecise estimates; that is, there are multiple sets of estimates that fit the data equally well.

Two studies that compare the MNP and MNL models: Alvarez and Nagler (2001) and Quinn et al. (1999). Alvarez and Nagler argue for MNP; Quinn et al. are more agnostic.

Multinomial logit

Also, conditional logit: conditional logit only includes variables that are related to the options being chosen for the dependent variable.