LIN 3098 Corpus Linguistics
Albert Gatt

In this lecture
We proceed with our discussion of how corpus-based studies influence the study of grammar.
Focus: lexico-grammar

Uses of corpora in grammar studies
The use of corpora to study grammar is relatively recent.
With corpora, the unit of analysis tends to be the word (tokens/types).
Studies of lexis therefore a natural application.
The study of grammar has in fact emphasised the role of lexis.
Also aided by recent developments in automatic POS tagging and parsing.
Additional grammatical information enables search and analysis of complex structures.

Part 1
The relationship between grammar and lexis

Degrees of abstraction

We have already looked at the use of corpora in studying collocations.
Given sufficient grammatical annotation, we can look at collocational patterns at different degrees of abstraction.

Degrees of abstraction
Example: all preceding collocates of the noun time in the BNC.
Not all collocates are equally interesting.
Lots of noise when searching for a single word!

word    frequency
the     266
first   104
this    96
of      72
same    67
a       65

Practical task 1
Let's try to make our search more interesting, by focusing on a combination of lexical and grammatical material.
Conduct a search for:
Any adjective followed by the noun time
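The contrast between an unrestricted search and a grammatically informed one can be sketched in a few lines of Python. This is only an illustration: the tiny tagged corpus, the simplified tag names (ADJ, N, DET, ORD, ...) and the function name `left_collocates` are all assumptions made for the example, not BNC data or the interface of any corpus tool.

```python
from collections import Counter

# Toy POS-tagged corpus as (word, tag) pairs. Both the text and the
# simplified tags are invented for illustration; they are not BNC data.
TOY_CORPUS = [
    ("it", "PRON"), ("took", "V"), ("a", "DET"), ("long", "ADJ"), ("time", "N"),
    ("for", "PREP"), ("the", "DET"), ("first", "ORD"), ("time", "N"),
    ("in", "PREP"), ("my", "DET"), ("spare", "ADJ"), ("time", "N"),
    ("at", "PREP"), ("the", "DET"), ("time", "N"),
    ("a", "DET"), ("long", "ADJ"), ("time", "N"), ("ago", "ADV"),
]


def left_collocates(tagged, node, tag=None):
    """Count the words immediately preceding `node`.

    If `tag` is given, only preceding words carrying that POS tag are counted.
    """
    counts = Counter()
    for (prev_word, prev_tag), (word, _) in zip(tagged, tagged[1:]):
        if word == node and (tag is None or prev_tag == tag):
            counts[prev_word] += 1
    return counts


# Unrestricted search: every word that precedes "time" (includes noise like "the").
all_left = left_collocates(TOY_CORPUS, "time")

# Grammatically informed search: only adjectives that precede "time".
adj_left = left_collocates(TOY_CORPUS, "time", tag="ADJ")
```

Comparing `all_left.most_common()` with `adj_left.most_common()` shows how the POS restriction removes function words from the list.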

Degrees of abstraction
Example: only adjectival collocates of the noun time in the BNC.
Can make grammatically informed queries: [ADJ + time]
Allows focus on what is truly of interest.

word     frequency
long     38
good     11
spare    7
little   6
present  6
whole    5

Practical task 2
We can go further in abstracting away from specific lexical material.
Conduct a search for:
Any adjective followed by any noun

Degrees of abstraction
Suppose we were interested in all adjective-noun combinations: [ADJ + N]
Given a query language of the right complexity (such as CQL), we can extract grammatically interesting collocations.

ADJ+N              Freq.
prime minister     102
other hand         65
local authorities  44
long time          42
soviet union       41
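A fully abstract query such as adjective followed by noun amounts to matching a sequence of POS tags and counting the word sequences that fill it. The sketch below assumes a toy tagged corpus with simplified tags; the data and the function name `tag_pattern_ngrams` are hypothetical, and a real query language such as CQL offers far more than this exact-tag matching.

```python
from collections import Counter

# Toy POS-tagged text; words and simplified tags are invented for illustration.
TAGGED = [
    ("the", "DET"), ("prime", "ADJ"), ("minister", "N"), ("met", "V"),
    ("local", "ADJ"), ("authorities", "N"), ("and", "CONJ"),
    ("the", "DET"), ("prime", "ADJ"), ("minister", "N"), ("waited", "V"),
    ("a", "DET"), ("long", "ADJ"), ("time", "N"),
]


def tag_pattern_ngrams(tagged, pattern):
    """Count word sequences whose tags match `pattern` exactly, position by position."""
    n = len(pattern)
    counts = Counter()
    for i in range(len(tagged) - n + 1):
        window = tagged[i:i + n]
        if all(tag == wanted for (_, tag), wanted in zip(window, pattern)):
            counts[" ".join(word for word, _ in window)] += 1
    return counts


# All adjective-noun combinations, most frequent first.
adj_noun = tag_pattern_ngrams(TAGGED, ("ADJ", "N"))

# The same function handles longer tag sequences.
det_adj_noun = tag_pattern_ngrams(TAGGED, ("DET", "ADJ", "N"))
```

Because the pattern is just a tuple of tags, moving from [ADJ + N] to a longer sequence needs no change to the function, which mirrors the idea of abstracting away from specific lexical material.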

Limitations of these approaches
What we've done still retains a focus on the word.
The main purpose is to improve lexical research by incorporating a limited amount of grammatical info (usually POS).
Can we go further and really investigate grammar?

Part 2
Collocational Frameworks

Does this sound familiar?

Colourless green ideas sleep furiously
Chomsky's example illustrates an approach to syntax where:
  the primary focus is on syntactic rules
  rules manipulate lexical items of the right categories
  "grammatical" or "legal" is distinct from "sensible" or "meaningful"
  syntactic rules operate (semi-)independently of lexical items: if X is of the right category, then X can be slotted into a syntactic position

Chicken and egg questions
When we formulate an utterance, which comes first?
  syntax?
  lexical items?
  both in parallel?
Do particular syntactic constructions have a meaning (or communicative function)?
E.g. what is the meaning of:
  the appositive that-construction: The reason that he gave was
  the extraposed it-construction: It is possible to hire a car if you want one.

Lexical approaches to grammar

Assumptions:
  syntactic structures are highly sensitive to the lexical items that they can select
  structures also may have specific communicative functions or meanings
  speakers/authors convey meaning, and syntax is used as a resource to convey it
  ideally, grammar + lexis should be viewed as part and parcel of the same process
  phraseology and co-selection play an important role: in particular constructions, we find that particular words tend to co-occur with great regularity

The idiom principle
Sinclair (1991): "a language user has available to him or her a large number of semi-preconstructed phrases that constitute single choices, even though they might appear to be analyzable into segments"

Implications
The idiom principle suggests that speakers/writers:
  don't just apply abstract rules to build structures;
  re-use bits of structure.
It also implies that bits of structure are themselves meaningful.

The idiom principle vs open choice
This principle contrasts with the open-choice principle.
Open choice predicts that:
  syntactic rules operate independently of lexical items;
  structures are constructed by applying rules and plugging in lexemes.

Putting the idiom principle to work
Renouf and Sinclair (1991) introduced collocational frameworks.
Intended as a practical way to investigate the use and meaning of grammatical constructions.
A collocational framework consists of a pattern involving 3 items:
  a function word
  a content word (specified via POS)
  another function word
Example: [a + Noun + of]

Collocational frameworks
Is a pattern like [a + Noun + of] a linguistic unit?
If it is, we would expect that:
  the grammatical context (a, of) places restrictions on the semantics of the Noun in the middle (not just any noun can be used)

Practical task 3
Conduct a search for:
The collocational framework [a + Noun + of]
In looking at the nouns that occur here, can you spot any semantic commonalities?
What does this tell you about the way the structure itself is used, and what it usually means?
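The search in Practical task 3 can be approximated by fixing the two function words and counting whatever noun fills the slot between them. The following is a minimal sketch on invented data: the toy corpus, the simplified tags and the function name `framework_fillers` are assumptions for illustration only.

```python
from collections import Counter

# Toy POS-tagged text; words and simplified tags are invented for illustration.
TAGGED = [
    ("a", "DET"), ("lot", "N"), ("of", "PREP"), ("people", "N"), ("saw", "V"),
    ("a", "DET"), ("number", "N"), ("of", "PREP"), ("cases", "N"), (";", "PUNC"),
    ("a", "DET"), ("dog", "N"), ("in", "PREP"), ("the", "DET"), ("garden", "N"),
    ("ate", "V"), ("a", "DET"), ("piece", "N"), ("of", "PREP"), ("cake", "N"),
    ("after", "PREP"), ("a", "DET"), ("lot", "N"), ("of", "PREP"), ("time", "N"),
]


def framework_fillers(tagged, left, slot_tag, right):
    """Count the words with tag `slot_tag` that occur between `left` and `right`."""
    counts = Counter()
    for (w1, _), (w2, t2), (w3, _) in zip(tagged, tagged[1:], tagged[2:]):
        if w1.lower() == left and t2 == slot_tag and w3.lower() == right:
            counts[w2.lower()] += 1
    return counts


# Nouns filling the framework a + Noun + of.
fillers = framework_fillers(TAGGED, "a", "N", "of")
```

Inspecting `fillers.most_common()` on a real corpus is the step at which semantic commonalities among the slot fillers would become visible; the toy data here only demonstrates the mechanics.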

[a + Noun + of]
Nouns in this construction are often quantities:
  a lot of
  a number of
  ...
This suggests that this construction itself places a restriction on the semantics of the content words used in it.

Collocational frameworks: final remarks

Renouf and Sinclair did not suggest that any string of words or pattern counts as a collocational framework.
Crucially, there has to be evidence for semantic restrictions on content words.
E.g. [Verb + in + NP] doesn't count as a good pattern, because practically any verb can occur in the first position.

Part 3
Colligates

Colligations

Roughly, a collocation at the level of part of speech.
An idea due to Firth.
The main question is: What are the grammatical environments in which a particular word occurs?
One way of answering this question is to look for a word, and then look at the POSs to the left and right.

Practical task 4
Conduct a search for the word consequence, specifying any word to the right and any word to the left.
Make a frequency count of node tags.
What do you observe?

Some data (Gries 2009)
Left context of consequence:
  Article
  Adjective
  ...
Right context:
  of
  Preposition
  ...
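The tag count asked for in Practical task 4 can be sketched as follows: find each occurrence of the node word and tally the POS tags immediately to its left and right. The toy corpus, the simplified tags and the function name `colligates` are assumptions for illustration; the counts say nothing about real corpus frequencies, and the sketch ignores sentence boundaries.

```python
from collections import Counter

# Toy POS-tagged text; words and simplified tags are invented for illustration.
TAGGED = [
    ("as", "PREP"), ("a", "DET"), ("consequence", "N"), ("of", "PREP"),
    ("the", "DET"), ("war", "N"), (",", "PUNC"),
    ("the", "DET"), ("direct", "ADJ"), ("consequence", "N"), ("of", "PREP"),
    ("this", "DET"), ("was", "V"),
    ("a", "DET"), ("serious", "ADJ"), ("consequence", "N"), ("for", "PREP"),
    ("him", "PRON"), (".", "PUNC"),
    ("the", "DET"), ("consequence", "N"), ("is", "V"), ("clear", "ADJ"),
]


def colligates(tagged, node):
    """Return (left, right) Counters of the POS tags adjacent to `node`."""
    left, right = Counter(), Counter()
    for i, (word, _) in enumerate(tagged):
        if word.lower() != node:
            continue
        if i > 0:
            left[tagged[i - 1][1]] += 1   # tag of the word to the left
        if i + 1 < len(tagged):
            right[tagged[i + 1][1]] += 1  # tag of the word to the right
    return left, right


left_tags, right_tags = colligates(TAGGED, "consequence")
```

Counting tags rather than words is what separates this from the collocation searches earlier: the result describes the grammatical environment of the node, not its lexical neighbours.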

Observations
This operationalisation of the concept of colligation is closely related to the collocational framework of Renouf/Sinclair.
It's primarily intended to give an idea of the grammatical environment in which a word occurs.

Limitations
Both collocational frameworks and colligations have some drawbacks:
  They're still highly word-based.
  They focus only on POS (not full syntax).
  Their view of grammatical structure is purely linear.

Part 4
Some case studies

Example 1: It as object
Components:
  non-referential use of it
  object of a verb
  followed by an NP or AdjP

Examples (from the BNC):
  Many people who use drugs regularly find it difficult to exist in a drug-free world.
  You can also find it hard to remember things in court
  unless they agree to do so, making it difficult for detainees to challenge the validity

Example 1 continued
Typical analysis: this construction involves extraposition:
  People who use drugs find existing in a drug-free world difficult.
  People who use drugs find it difficult to exist in a drug-free world.
Some empirical observations on lexis (Francis 1993):
  98% of cases involve find and make
  some other verbs like think, consider, see to
Possible meaning/function of the structure:
  a stereotyped way of presenting a situation in terms of how it is evaluated
  evaluation is placed after the verb

Example 2: appositive clauses
Apposition: a relation between an NP and another phrase which refers to the same thing (Leech and Svartvik, 1975)

Examples:
  your daughter, the lawyer, is here
In English, can also occur with that-clauses and to-clauses:
  the news that your daughter was here
  the plot to assassinate the president

Example 2: appositive clauses
Distinguished from restrictive relative clauses:
  the dog that I saw yesterday
  restricts the reference of the head noun

Appositive clause:
  the fact that I came
  does not restrict the reference of the head noun
  amplifies or qualifies the head noun

Example 2: Appositives
Appositive that-clauses (BNC):
  The fining of airlines plus the fact that the nationals of many refugee-producing countries
  as firm as the Emperor Augustus about the principle that a ruler's actual appearance matters less
Traditional grammars (Leech and Svartvik 1975): head noun must be an abstract noun
Question:
  what are the lexical restrictions here?
  do they have implications for the function of this syntactic structure?

Levels of stereotypicality in syntax
Phraseological constraints: the co-selection of particular lexical items within a particular syntactic structure

These seem to range on a continuum.
At one extreme: fixed, unchanging constructions (behave like multi-word lexical items)
At the other: complete freedom in lexical selection.

Phraseology
Completely fixed idioms:

Less fixed idioms:
  put on a brave face
  putting a brave face on
  put a good face on
Some room for lexical manoeuvre

Semi-prepackaged phrases which allow for variation:
  it never rains but it pours
  I haven't the faintest/foggiest/remotest idea/notion
Highly nebulous lexico-syntactic dependencies:
  be a case of X
  a case of déjà vu
  a case of take the money and run

Syntactic fixedness
Given the cline from fixed to flexible, some linguists (e.g. Francis 1993) suggest that the distinction between lexicon and syntax is arbitrary.
This argument is based on phraseological constraints observable only in very large corpora.

This is not too far from recent positions in Generative Grammar:
  Jackendoff's (2002) parallel architecture;
  Construction Grammar (e.g. Goldberg, 1995)

The item and the environment
Francis proposes that the distinction between lexical item and syntactic environment only be used for convenience.
Proposed method:
  look at a syntactic environment
  discover lexical regularities
  focus on a subset of the lexical items
  discover further generalisations about the grammar of those items

Case study: Extraposed it-clauses
One of the most frequent adjectives is possible:
  it is possible to hire a car
  it is possible that it will rain

Proposed interpretations:
  that-clause is used for possibility
  to-clause is used to express ability
This suggests that possible might have (at least) two different meanings.

The grammar of possible
Further patterns involving possible:
  article + superl. adj. + possible + noun: the best possible start
  as ... as possible
Main idea:
  specifications of possible grammatical environments of the item can help specify its range of meanings.
  these examples seem to confirm the ability/probability use of possible
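Specifying the grammatical environments of an item can be made concrete by sorting occurrences of possible into the patterns listed above. The sketch below uses regular expressions over raw text; the example sentences, the labels and the function name `classify` are invented for illustration, and the superlative test (any word ending in -est) is a crude assumption that a POS-tagged corpus would handle better.

```python
import re
from collections import Counter

# Each environment is a (label, compiled pattern) pair; the first match wins.
ENVIRONMENTS = [
    ("it + be + possible + to-clause",
     re.compile(r"\bit (?:is|was) possible to\b", re.IGNORECASE)),
    ("it + be + possible + that-clause",
     re.compile(r"\bit (?:is|was) possible that\b", re.IGNORECASE)),
    # Crude superlative test: any word ending in "est" before "possible".
    ("article + superlative + possible + noun",
     re.compile(r"\b(?:the|a|an) \w+est possible \w+", re.IGNORECASE)),
    ("as ... as possible",
     re.compile(r"\bas \w+ as possible\b", re.IGNORECASE)),
]


def classify(sentence):
    """Return the label of the first environment matching `sentence`, else 'other'."""
    for label, pattern in ENVIRONMENTS:
        if pattern.search(sentence):
            return label
    return "other"


# Invented example sentences, not corpus data.
SENTENCES = [
    "It is possible to hire a car.",
    "It is possible that it will rain.",
    "She got the best possible start.",
    "Come as soon as possible.",
    "It was possible to leave early.",
    "Anything is possible.",
]

environment_counts = Counter(classify(s) for s in SENTENCES)
```

On real data, the relative sizes of the to-clause and that-clause groups would be the starting point for asking whether the two environments correspond to different meanings of possible.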

