ECON 6002 Econometrics, Memorial University of Newfoundland
Qualitative and Limited Dependent Variable Models, Chapter 16
Adapted from Vera Tabakova's notes

Chapter 16: Qualitative and Limited Dependent Variable Models
Nested Logit
Mixed Logit AKA Random Parameters Logit
Generalized Multinomial Logit
Principles of Econometrics, 3rd Edition Slide 16-2

IIA assumption
Logit models carry an implicit assumption that the odds between any pair of alternatives are independent of irrelevant alternatives (IIA).
One way to state the assumption: if choice A is preferred to choice B out of the choice set {A,B}, then introducing a third alternative X, which expands the choice set to {A,B,X}, must not make B preferable to A. Taken on its own, this seems reasonable.

In the case of the multinomial logit model, IIA implies that adding another alternative, or changing the characteristics of a third alternative, must not affect the relative odds between the two alternatives being compared.

IIA assumption
This is not realistic for many real-life applications that involve similar (substitute) alternatives. Examples:
Beethoven/Debussy versus another of Beethoven's symphonies (Debreu 1960; Tversky 1972)
Bicycle/Pony (Luce and Suppes 1965)
Red Bus/Blue Bus (McFadden 1974)
Black slacks, jeans, shorts versus blue slacks (Hoffman, 2004)
Etc.
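The invariance described above can be checked numerically. Below is a minimal Python sketch in which the utilities V_A, V_B and V_X are hypothetical numbers, not estimates. In the multinomial logit, P_A/P_B = exp(V_A - V_B). As a result, the odds of A versus B stay the same when X is added or made more or less attractive. Only the shares of A and B change, and they change in the same proportion.

```python
import math

def mnl_probs(utilities):
    """Multinomial logit probabilities: P_j = exp(V_j) / sum_k exp(V_k)."""
    m = max(utilities)  # subtract the max for numerical stability
    expv = [math.exp(v - m) for v in utilities]
    total = sum(expv)
    return [e / total for e in expv]

# Hypothetical systematic utilities for alternatives A and B
v_a, v_b = 1.0, 0.5

odds = []
# First without X, then with an X of increasing attractiveness
for v_x in (None, -1.0, 0.0, 2.0):
    utils = [v_a, v_b] if v_x is None else [v_a, v_b, v_x]
    p = mnl_probs(utils)
    odds.append(p[0] / p[1])
    print(f"V_X = {v_x}: P(A) = {p[0]:.3f}, P(B) = {p[1]:.3f}, odds A:B = {p[0] / p[1]:.4f}")
# The odds A:B stay at exp(v_a - v_b) = exp(0.5) in every case
```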

IIA assumption
Red Bus/Blue Bus (McFadden 1974).
Imagine commuters who first face a decision between two modes of transportation: car and red bus.
Suppose that a consumer chooses between these two options with equal probability, 0.5, so that the odds ratio equals 1.
Now add a third mode, blue bus. If bus commuters do not care about the color of the bus (the two buses are perfect substitutes), we would expect consumers to keep choosing between bus and car with equal probability. The probability of car would then remain 0.5, and the probability of each of the two bus types should fall to 0.25.
However, this violates IIA: for the odds ratio between car and red bus to be preserved, the new probabilities must be car 0.33, red bus 0.33, blue bus 0.33.
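The arithmetic of the red bus/blue bus example can be reproduced with a plain multinomial logit. The sketch below assumes equal systematic utilities, normalized to 0, for car, red bus and blue bus. It shows that the MNL is forced to give each mode 1/3, while the intuitive answer is 0.5, 0.25, 0.25.

```python
import math

def mnl_probs(utilities):
    """Multinomial logit probabilities for a list of systematic utilities."""
    expv = [math.exp(v) for v in utilities]
    total = sum(expv)
    return [e / total for e in expv]

# Car and red bus equally attractive: equal utilities (normalized to 0)
before = mnl_probs([0.0, 0.0])        # [car, red bus]
# Blue bus is identical to red bus, so it gets the same utility
after = mnl_probs([0.0, 0.0, 0.0])    # [car, red bus, blue bus]
intuitive = [0.5, 0.25, 0.25]         # what perfect substitutability suggests

print("MNL before:", [round(p, 3) for p in before])
print("MNL after: ", [round(p, 3) for p in after])
print("Intuitive: ", intuitive)
# The car:red-bus odds ratio is 1 both before and after, as IIA requires
```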

The IIA axiom does not mix well with perfect substitutes.

IIA assumption
We can test this assumption with a Hausman-McFadden test. It compares a logit model estimated with all the choices to one estimated on a restricted choice set. In Stata, use mlogtest, hausman base, and also check the detail option: mlogtest, hausman detail. However, see Cheng and Long (2007) for a critical assessment of these tests.
Another test is Small and Hsiao's (1985). Stata's command is mlogtest, smhsiao. Be careful: the sample is randomly split every time, so you must set the seed if you want to replicate your results.
See Long and Freese's book for details and worked examples.

IIA assumption
Extensions have arisen to deal with this issue.
The multinomial probit and the mixed logit are alternative models for nominal outcomes. They relax IIA by allowing correlation among the errors, which reflects similarity among options. However, these models often have issues and assumptions of their own.
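The following is a hedged sketch of what the Hausman-McFadden statistic computes. Let b_F and V_F be the coefficients and covariance matrix from the model with all alternatives. Let b_R and V_R be those from the model re-estimated after dropping one alternative, restricted to the coefficients present in both. The statistic is H = (b_R - b_F)'(V_R - V_F)^(-1)(b_R - b_F), which is compared with a chi-squared distribution with as many degrees of freedom as coefficients compared. The numbers below are made up for illustration; mlogtest does all of this for you. Since V_R - V_F need not be positive definite, H can come out negative in practice.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def hausman_mcfadden(b_r, b_f, v_r, v_f):
    """H = (b_R - b_F)' (V_R - V_F)^{-1} (b_R - b_F) for the common coefficients."""
    d = [br - bf for br, bf in zip(b_r, b_f)]
    dv = [[vr - vf for vr, vf in zip(row_r, row_f)] for row_r, row_f in zip(v_r, v_f)]
    x = solve(dv, d)
    return sum(di * xi for di, xi in zip(d, x))

# Hypothetical estimates: restricted (one alternative dropped) vs full choice set
stat = hausman_mcfadden(
    b_r=[0.5, -0.3], b_f=[0.3, -0.2],
    v_r=[[0.03, 0.0], [0.0, 0.06]], v_f=[[0.02, 0.0], [0.0, 0.02]],
)
print(f"H = {stat:.3f}, compare with chi2(2)")
```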

IIA can also be relaxed by specifying a hierarchical model that ranks the choice alternatives. The most popular of these is McFadden's nested logit model, which allows correlation among some errors, but not all (e.g. Heiss 2002).
Generalized extreme value and multinomial probit models share another property, the Invariant Proportion of Substitution (Steenburgh 2008). This property also implies counterintuitive individual choice behavior.
The multinomial probit also has serious computational disadvantages, since it involves evaluating a multivariate normal integral whose dimension is one less than the number of categories. Integration by simulation is now easing this problem.

IIA assumption
The nested logit is a partial relaxation of the IID and IIA assumptions of the MNL model.
It is relatively straightforward to estimate.
It also has a closed-form expression for the choice probabilities.

IIA assumption
Most NL models have only two hierarchical levels. Very few NL models are estimated with three levels, and even fewer with four.
Note that the tree structure does not have an actual sequential interpretation of any sort. It is only there to allow for differences in the degree of correlation within and between nests.

Nested Logit
By default, Stata's nlogit now uses a parameterization that is consistent with RUM (random utility maximization).
Before Stata 10, Stata (and other packages) used a nonnormalized version of the nested logit model, and you will see some papers pointing that out.
You can still request it by specifying the nonnormalized option: "nonnormalized requests a nonnormalized parameterization of the model that does not scale the inclusive values by the degree of dissimilarity of the alternatives within each nest. Use this option to replicate results from older versions of Stata" (Stata help).
Both versions are valid, but only the RUM-consistent version is based on a sound model of consumer behavior. The normalization scales the coefficients in the second-level choice by dividing them by the dissimilarity parameters, so that the utilities can be meaningfully compared; see Heiss (2002) for details.

Nested Logit
Adapted from Stata's help file, let us consider a model of restaurant choice:
. use http://www.stata-press.com/data/r13/restaurant
Or look it up (it is one of Stata's example datasets), then run describe.

Nested Logit
Fake data on 300 families and their choice among seven local restaurants:
Freebirds and Mama's Pizza sell fast food.
Cafe Eccell, Los Nortenos, and Wings 'N More are family restaurants.
Christopher's and Mad Cows are fancy restaurants.

Nested Logit
Model the decision of where to eat as a function of:
household income
number of kids
rating of the restaurant (coded 0 to 5)
average meal cost per person
distance between the household and the restaurant

Nested Logit
Note that:
income and kids are attributes of the family;
rating is an attribute of the alternative (the restaurant);
cost and distance are attributes of the alternative as perceived by the families, that is, each family has its own cost and distance for each restaurant.

Nested Logit
Thus:
income and kids are case-specific;
rating is alternative-specific;
cost and distance are both.
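The data are in long form, with one row per family-restaurant pair, so the estimation sample has 300 x 7 = 2100 rows. The minimal Python sketch below mimics that layout. All values are hypothetical, including the ratings; only the variable names follow the Stata dataset. The sketch checks which variables vary within a family. Income and kids do not, which is why a conditional logit cannot estimate a single coefficient on them.

```python
import random
from collections import defaultdict

restaurants = ["Freebirds", "MamasPizza", "CafeEccell", "LosNortenos",
               "WingsNmore", "Christophers", "MadCows"]
# Hypothetical ratings: one value per restaurant, the same for every family
rating = dict(zip(restaurants, [2, 1, 3, 4, 2, 5, 4]))

random.seed(12345)
rows = []
for family_id in range(1, 301):
    income = random.randint(20, 100)     # hypothetical, fixed within the family
    kids = random.randint(0, 4)          # hypothetical, fixed within the family
    chosen = random.choice(restaurants)  # hypothetical choice
    for r in restaurants:
        rows.append({
            "family_id": family_id, "restaurant": r,
            "chosen": int(r == chosen),
            "income": income, "kids": kids, "rating": rating[r],
            "cost": round(random.uniform(5, 40), 2),        # family-by-restaurant
            "distance": round(random.uniform(0.5, 10), 2),  # family-by-restaurant
        })

def varies_within_case(rows, var):
    """True if `var` takes more than one value inside at least one family."""
    seen = defaultdict(set)
    for row in rows:
        seen[row["family_id"]].add(row[var])
    return any(len(values) > 1 for values in seen.values())

print(len(rows))  # 300 families x 7 restaurants = 2100 rows
for var in ("income", "kids", "rating", "cost", "distance"):
    print(var, varies_within_case(rows, var))
```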

Nested Logit
Why not only 300 obs.?

Nested Logit
You could fit a conditional logit model to these data as arranged.
Since income and kids are case-specific, you would use asclogit instead of clogit.*
*asclogit is great. In the old days you would need to work a bit harder, with dummies and interactions, to be able to run a mixed model with the old clogit command. This is still a good exercise, though.

However, the conditional logit may be inappropriate. It assumes that the random errors are independent, and as a result it forces the odds ratio of any two alternatives to be independent of the other alternatives: the IIA!

Nested Logit
. clogit chosen kids income cost rating distance, group(family_id)
note: kids omitted because of no within-group variance.
note: income omitted because of no within-group variance.
Iteration 0:   log likelihood = -538.52688
Iteration 1:   log likelihood = -537.06762
Iteration 2:   log likelihood = -537.06643
Iteration 3:   log likelihood = -537.06643

Conditional (fixed-effects) logistic regression   Number of obs   =      2100
                                                  LR chi2(3)      =     93.41
                                                  Prob > chi2     =    0.0000
Log likelihood = -537.06643                       Pseudo R2       =    0.0800

      chosen |     Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        kids |         0  (omitted)
      income |         0  (omitted)
        cost | -.1543089   .0173858   -8.88   0.000    -.1883845   -.1202333
      rating |  .8669793   .0981221    8.84   0.000     .6746635    1.059295
    distance |  -.085346   .0437514   -1.95   0.051    -.1710973    .0004052
------------------------------------------------------------------------------

This is the pure conditional logit!
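The following minimal Python sketch uses the clogit point estimates above to compute conditional logit probabilities, P_j = exp(x_j'b) / sum_k exp(x_k'b), for one family. The restaurant attributes below are hypothetical. The sketch also adds the same case-specific term (a made-up coefficient times a made-up income) to every alternative's utility, and the probabilities do not change. This is why clogit reports income and kids as omitted: a family-level variable with a single coefficient cancels out of the ratio.

```python
import math

# Point estimates from the clogit output (cost, rating, distance)
beta = {"cost": -0.1543089, "rating": 0.8669793, "distance": -0.085346}

# Hypothetical attributes of the seven restaurants as seen by ONE family
alts = {
    "Freebirds":    {"cost": 8.0,  "rating": 2, "distance": 1.0},
    "MamasPizza":   {"cost": 7.0,  "rating": 1, "distance": 2.5},
    "CafeEccell":   {"cost": 12.0, "rating": 3, "distance": 4.0},
    "LosNortenos":  {"cost": 13.0, "rating": 4, "distance": 3.0},
    "WingsNmore":   {"cost": 11.0, "rating": 2, "distance": 5.0},
    "Christophers": {"cost": 30.0, "rating": 5, "distance": 6.0},
    "MadCows":      {"cost": 28.0, "rating": 4, "distance": 7.5},
}

def clogit_probs(alts, beta, case_term=0.0):
    """Conditional logit probabilities; case_term is added to EVERY alternative's utility."""
    v = {j: case_term + sum(beta[k] * x[k] for k in beta) for j, x in alts.items()}
    vmax = max(v.values())  # subtract the max for numerical stability
    ex = {j: math.exp(u - vmax) for j, u in v.items()}
    total = sum(ex.values())
    return {j: e / total for j, e in ex.items()}

p = clogit_probs(alts, beta)
p_shifted = clogit_probs(alts, beta, case_term=0.05 * 60)  # hypothetical coef x income
for j in alts:
    print(f"{j:13s} {p[j]:.3f} {p_shifted[j]:.3f}")
```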

Nested Logit
. asclogit chosen cost dist, case(family_id) alternatives(restaurant) casevars(income kids)
Iteration 0:   log likelihood = -487.34059
Iteration 1:   log likelihood = -483.15644
Iteration 2:   log likelihood = -482.21859
Iteration 3:   log likelihood = -482.21731
Iteration 4:   log likelihood = -482.21731

Alternative-specific conditional logit         Number of obs      =      2100
Case variable: family_id                       Number of cases    =       300
Alternative variable: restaurant               Alts per case: min =         7
                                                              avg =       7.0
                                                              max =         7
                                               Wald chi2(14)      =     60.28
Log likelihood = -482.21731                    Prob > chi2        =    0.0000

      chosen |     Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
restaurant   |
        cost | -.1330786   .0675342   -1.97   0.049    -.2654432   -.0007141
    distance | -.2127489   .0482651   -4.41   0.000    -.3073468   -.1181511
-------------+----------------------------------------------------------------
Freebirds    |  (base alternative)
-------------+----------------------------------------------------------------
MamasPizza   |
      income |  .0239198   .0232255    1.03   0.303    -.0216013     .069441
        kids |  .3248002    .274624    1.18   0.237    -.2134529    .8630533
       _cons | -1.228673   1.108607   -1.11   0.268    -3.401503     .944157
-------------+----------------------------------------------------------------
CafeEccell   |
      income |  .0426504   .0193985    2.20   0.028     .0046301    .0806708
        kids |  .3486398    .226436    1.54   0.124    -.0951667    .7924463
       _cons |  .3677203   .9075505    0.41   0.685    -1.411046    2.146486
-------------+----------------------------------------------------------------
LosNortenos  |
      income |  .0325719   .0193533    1.68   0.092    -.0053599    .0705037
        kids |  .1761988   .2257644    0.78   0.435    -.2662912    .6186889
       _cons |  1.483996   .9318191    1.59   0.111    -.3423353    3.310328
-------------+----------------------------------------------------------------
WingsNmore   |
      income |  .0714976    .021214    3.37   0.001      .029919    .1130763
        kids | -.0374476   .2535199   -0.15   0.883    -.5343376    .4594424
       _cons |  1.626207   1.672155    0.97   0.331    -1.651157    4.903571
-------------+----------------------------------------------------------------
Christophers |
      income |  .0963617   .0221248    4.36   0.000      .052998    .1397254
        kids | -.2062462   .2713623   -0.76   0.447    -.7381066    .3256142
       _cons |  .8012166   1.821835    0.44   0.660    -2.769514    4.371948
-------------+----------------------------------------------------------------
MadCows      |
      income |  .0428583   .0194924    2.20   0.028     .0046538    .0810627
        kids |  .2442661   .2268803    1.08   0.282    -.2004111    .6889434
       _cons |  .6935955   .9268626    0.75   0.454    -1.123022    2.510213
------------------------------------------------------------------------------

I could not estimate rating at the same time: it did not converge.
Could you run a plain MNL with this dataset?

Nested Logit
Here we suspect that restaurants should be grouped by type (fast, family, or fancy). Why?

Nested Logit
Assuming that unobserved factors affecting the decision about one alternative have no effect on the choice of the other alternatives may seem innocuous, but this assumption is often too restrictive.
Example: when a family was deciding which restaurant to visit, they were pressed for time because of plans to attend a movie later.

Nested Logit
The unobserved shock (being in a hurry) would raise the likelihood of going to either fast food restaurant (Freebirds or Mama's Pizza).

Another family might be choosing a restaurant to celebrate a birthday and would therefore be inclined to go to a fancy restaurant (Christopher's or Mad Cows).

Nested Logit
With the nested logit, we are not assuming that families first choose whether to attend a fast, family, or fancy restaurant and then choose the particular restaurant.
We merely assume that they choose one of the seven restaurants.

Nested Logit
We first need to create a variable that defines the structure of our decision tree:
. nlogitgen type = restaurant(fast: Freebirds | MamasPizza, family: CafeEccell | LosNortenos | WingsNmore, fancy: Christophers | MadCows)

Nested Logit
Our new type variable defines the three types of restaurants. We can now see how the alternative-specific attributes (cost, rating, and distance) apply to the bottom alternative set (the seven restaurants). The family-specific attributes (income and kids) apply to the alternative set at the first decision level (the three types of restaurants).

Nested Logit
. nlogit chosen cost rating distance || type: income kids, base(family) || restaurant:, noconstant case(family_id)

RUM-consistent nested logit regression         Number of obs      =      2100
Case variable: family_id                       Number of cases    =       300
Alternative variable: restaurant               Alts per case: min =         7
                                                              avg =       7.0
                                                              max =         7
                                               Wald chi2(7)       =     46.71
Log likelihood = -485.47331                    Prob > chi2        =    0.0000

      chosen |     Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
restaurant   |
        cost | -.1843847   .0933975   -1.97   0.048    -.3674404   -.0013289
      rating |   .463694   .3264935    1.42   0.156    -.1762215     1.10361
    distance | -.3797474   .1003828   -3.78   0.000    -.5764941   -.1830007
-------------+----------------------------------------------------------------
type equations
-------------+----------------------------------------------------------------
fast         |
      income | -.0266038   .0117306   -2.27   0.023    -.0495952   -.0036123
        kids | -.0872584   .1385026   -0.63   0.529    -.3587184    .1842016
-------------+----------------------------------------------------------------
family       |
      income |         0  (base)
        kids |         0  (base)
-------------+----------------------------------------------------------------
fancy        |
      income |  .0461827   .0090936    5.08   0.000     .0283595    .0640059
        kids | -.3959413   .1220356   -3.24   0.001    -.6351267   -.1567559
-------------+----------------------------------------------------------------
dissimilarity parameters
-------------+----------------------------------------------------------------
type         |
   /fast_tau |  1.712878    1.48685                     -1.201295    4.627051
 /family_tau |  2.505113   .9646351                      .614463    4.395763
  /fancy_tau |  4.099844   2.810123                     -1.407896    9.607583
------------------------------------------------------------------------------
LR test for IIA (tau = 1):        chi2(3) = 6.87          Prob > chi2 = 0.0762
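The last line of the output is a likelihood-ratio test of the three restrictions tau_fast = tau_family = tau_fancy = 1, that is, of the conditional logit against the nested logit. As a quick check, the p-value can be recomputed from the reported statistic alone. The minimal Python sketch below uses the closed-form chi-squared survival function for 3 degrees of freedom. Up to rounding, it should reproduce the reported Prob > chi2 = 0.0762.

```python
import math

def chi2_sf_3df(x):
    """P(chi2 with 3 df > x), using the closed form for 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)

lr_stat = 6.87  # "LR test for IIA (tau = 1): chi2(3) = 6.87" from the nlogit output
p_value = chi2_sf_3df(lr_stat)
print(f"p-value = {p_value:.4f}")
```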

. nlogit chosen cost rating distance || type: income kids, base(family) || restaurant:, noconstant case(family_id)
The noconstant option suppresses the constant terms for the bottom-level alternatives.
It is needed for convergence in this example, unless you simplify things a little:
. nlogit chosen distance || type: income, base(family) || restaurant:, case(family_id)

Nested Logit
The error correlation parameters are re-expressed as dissimilarity parameters.
In Stata notation, nlogit estimates a tau. In this example it estimates one for each type (upper-level branch) with subcategories (lower-level branches, twigs, etc.).
In Cameron and Trivedi's (MMA) notation, the dissimilarity parameters are rhos and are called scale parameters.

Nested Logit
In the normalized version of the nested logit model, the dissimilarity parameters are used to scale the logsums, or inclusive values.
In Cameron and Trivedi's (MMA) notation, the inclusive value (logsum) for the mth nest is the expected value of the maximum utility that individual i can obtain from choosing an alternative within nest m.

Nested Logit
nlogit estimates a tau (the dissimilarity parameter, which is the coefficient of the inclusive value/logsum) for each type (upper-level branch) with subcategories (lower-level branches, twigs, etc.).
Dissimilarity parameters (inversely) measure the degree of correlation of random shocks within each of the three types of restaurants. The within-nest correlation is 1 - τ², or 1 - ρ_m² for the mth nest in MMA notation.
If a dissimilarity parameter is greater than one, the model is inconsistent with RUM.

Nested Logit
Dissimilarity parameters must fall between 0 and 1.
Suppose one of them (say the one for fast food) were less than zero. Then something that increased the likelihood of choosing Freebirds would decrease the likelihood of choosing a fast food restaurant, which simply does not make sense.
If the dissimilarity parameter is zero, changes in the restaurant probabilities do not affect the choice of restaurant type, and the correct model is recursive (separated).
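For reference, a standard way to write the RUM-consistent two-level nested logit probabilities is shown below. It uses Stata's τ notation; MMA writes ρ_m for τ_m. The symbols B_m (the set of alternatives in nest m), x_ij (alternative-level attributes, here cost, rating, distance), z_i (family-level attributes, here income and kids, with nest-specific coefficients α_m) and IV_im are our own labels. This is a sketch rather than a transcription of the MMA formula. The utilities inside each nest are divided by τ_m, and the inclusive value enters the upper level multiplied by τ_m. Setting every τ_m = 1 collapses the expressions to the conditional logit.

```latex
\begin{aligned}
P_{ij} &= P_{ij \mid m}\, P_{im}, \qquad j \in B_m, \\
P_{ij \mid m} &= \frac{\exp\!\left(\mathbf{x}_{ij}'\boldsymbol\beta / \tau_m\right)}
                      {\sum_{k \in B_m} \exp\!\left(\mathbf{x}_{ik}'\boldsymbol\beta / \tau_m\right)}, \\
IV_{im} &= \ln \sum_{k \in B_m} \exp\!\left(\mathbf{x}_{ik}'\boldsymbol\beta / \tau_m\right), \\
P_{im} &= \frac{\exp\!\left(\mathbf{z}_i'\boldsymbol\alpha_m + \tau_m IV_{im}\right)}
               {\sum_{l=1}^{M} \exp\!\left(\mathbf{z}_i'\boldsymbol\alpha_l + \tau_l IV_{il}\right)}.
\end{aligned}
```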

Nested Logit
The conditional logit model is a special case of nested logit in which all the dissimilarity parameters equal one.
IIA holds if and only if all dissimilarity parameters are equal to one.
Our likelihood-ratio test of this hypothesis gives only mixed evidence against the null that all the dissimilarity parameters equal one (chi2(3) = 6.87, Prob > chi2 = 0.0762). We reject at the 10% level but not at the 5% level.

Nested Logit
In LIML (two-step or sequential) estimation, it was often assumed for convenience that all of the dissimilarity parameters were equal.
You can impose this restriction in our Stata code too.
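The special-case claim can be illustrated with a minimal Python sketch of the RUM-consistent two-level nested logit probabilities (P(j) = P(m) x P(j | m), with utilities scaled by 1/tau_m inside each nest). This is our own illustration, not Stata's implementation, and the utilities are hypothetical. With every tau equal to 1 the probabilities coincide with the conditional logit. Shrinking tau for the bus nest in the red bus/blue bus example moves P(car) from 1/3 toward the intuitive 0.5.

```python
import math

def nested_logit_probs(v, nests, tau, w=None):
    """RUM-consistent two-level nested logit probabilities (illustrative sketch).

    v     : dict alternative -> systematic utility x_j'b
    nests : dict nest -> list of alternatives in that nest
    tau   : dict nest -> dissimilarity parameter tau_m
    w     : dict nest -> nest-level utility z'a_m (defaults to 0)
    """
    w = w or {m: 0.0 for m in nests}
    # Inclusive value (logsum) of each nest, with utilities scaled by 1/tau_m
    iv = {m: math.log(sum(math.exp(v[j] / tau[m]) for j in alts))
          for m, alts in nests.items()}
    denom = sum(math.exp(w[m] + tau[m] * iv[m]) for m in nests)
    probs = {}
    for m, alts in nests.items():
        p_nest = math.exp(w[m] + tau[m] * iv[m]) / denom  # P(m)
        for j in alts:
            probs[j] = p_nest * math.exp(v[j] / tau[m] - iv[m])  # P(m) * P(j | m)
    return probs

# Red bus / blue bus: car alone in one nest, both buses in another, equal utilities
v = {"car": 0.0, "red_bus": 0.0, "blue_bus": 0.0}
nests = {"car": ["car"], "bus": ["red_bus", "blue_bus"]}

for tau_bus in (1.0, 0.5, 0.1):
    p = nested_logit_probs(v, nests, {"car": 1.0, "bus": tau_bus})
    print(tau_bus, {j: round(pj, 3) for j, pj in p.items()})
```

No overflow guard is included, so very small tau values combined with large utilities would need rescaling.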

Nested Logit
You could estimate the nested logit in two steps using LIML, but you would need some complex corrections to the standard errors in the second step.
Nowadays we have toys powerful enough to run the nested logit all in one step using FIML.
FIML is preferable (nlogit in Stata uses it), since it is more efficient.
Sequential LIML estimation might still help to provide starting values, as the FIML log-likelihood is not globally concave.

Nested Logit
Try:
. nlogit chosen rating distance || type: income kids, base(family) || restaurant: cost, noconstant case(family_id)
. nlogit chosen rating distance cost || type: kids, base(family) || restaurant: income, noconstant case(family_id)

Nested Logit
Issues: you can build your tree in different ways, and some will work better than others.
These choices will in general yield different results.
There is no test to choose among trees.

Mixed Logit AKA Random Parameters Logit
Multinomial logit models with unobserved heterogeneity.
They allow the parameters to vary randomly across individuals.
See the mixlogit command (Hole, A. R. (2007). Fitting mixed logit models by using maximum simulated likelihood. Stata Journal, 7, 388-401) (find C:/ traindata.dta).
The RPL allows for correlation across alternatives through an individual-specific random effect.
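The idea behind maximum simulated likelihood can be sketched in a few lines of Python. With a random coefficient, the choice probability is the logit probability averaged over the distribution of that coefficient, and it is approximated by averaging over draws. The sketch below assumes one normally distributed coefficient on one hypothetical attribute. It is not mixlogit's implementation: it uses plain pseudo-random draws, whereas quasi-random sequences such as Halton draws are commonly used in practice. With a zero standard deviation it reduces to the ordinary logit. With a positive one, the odds between alternatives 0 and 1 generally change when an identical copy of alternative 1 is added, so IIA no longer holds.

```python
import math
import random

def logit_probs(utils):
    """Standard logit probabilities for one set of utilities."""
    m = max(utils)
    ex = [math.exp(u - m) for u in utils]
    total = sum(ex)
    return [e / total for e in ex]

def mixed_logit_probs(x, b_mean, b_sd, draws=2000, seed=42):
    """Simulated mixed logit probabilities with one normally distributed coefficient.

    x      : one attribute value per alternative (hypothetical)
    b_mean : mean of the random coefficient
    b_sd   : its standard deviation (b_sd = 0 gives the ordinary logit)
    """
    rng = random.Random(seed)
    avg = [0.0] * len(x)
    for _ in range(draws):
        beta = rng.gauss(b_mean, b_sd)            # one draw of the taste parameter
        p = logit_probs([beta * xj for xj in x])  # logit probabilities given that draw
        avg = [a + pj / draws for a, pj in zip(avg, p)]
    return avg

x_two = [1.0, 2.0]         # hypothetical attribute values for alternatives 0 and 1
x_three = [1.0, 2.0, 2.0]  # add a third alternative identical to alternative 1
for sd in (0.0, 1.5):
    p2 = mixed_logit_probs(x_two, 0.5, sd)
    p3 = mixed_logit_probs(x_three, 0.5, sd)
    print(f"sd={sd}: odds 0:1 with two alts = {p2[0] / p2[1]:.4f}, "
          f"with three alts = {p3[0] / p3[1]:.4f}")
```

In estimation, the log of these simulated probabilities for the chosen alternatives is summed over individuals and maximized over b_mean and b_sd.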

Keywords
binary choice models, censored data, conditional logit, count data models, feasible generalized least squares, Heckit, identification problem, independence of irrelevant alternatives (IIA), index models, individual and alternative specific variables, individual specific variables, latent variables, likelihood function, limited dependent variables, linear probability model, logistic random variable, logit, log-likelihood function, marginal effect, maximum likelihood estimation, multinomial choice models, multinomial logit, odds ratio, ordered choice models, ordered probit, ordinal variables, Poisson random variable, Poisson regression model, probit, selection bias, tobit model, truncated data

References
Cameron and Trivedi's MMA (Microeconometrics: Methods and Applications) and MUS (Microeconometrics Using Stata)
Hensher, Rose, and Greene (2005) Applied Choice Analysis: A Primer, available (electronically too) at the QEII

Next: Ordered Choice, Count data