Corpora are really great for checking collocations: words that are typically used together. Collocation's a really important aspect of language and a vital part of language teaching if we want to help students avoid 'doing' obvious mistakes. As expert speakers, we by and large have a feel for an individual word's most typical collocates, but when you're writing materials, it's easy to get a particular combination stuck in your head or to start doubting your intuitions - do we say get a bus or take a bus? The more you say it to yourself, the sillier each one starts to sound. A bit of outside evidence can be really helpful.
If you want to use a corpus to check out collocations though, it's important to understand a few basics about the statistics behind what the corpus tools are showing you and what type of collocations might be appropriate for the materials you're writing.
Frequent vs Typical
The most important distinction to get to grips with is the difference between frequent collocations and typical (or significant, or strong) collocations. Most corpus tools will show you which words most commonly co-occur just based on raw frequency, but some tools will also have an option to rank collocates by strength of attraction, shown as a score. That is, the software will take into account not just how often two words occur together, but how likely that combination is based on the relative frequency of the two items. The chances of two very frequent words occurring together are quite high, so those combinations are often fairly predictable and uninteresting. If you look, for example, at the raw frequencies for words which modify the noun car, you'll come across a whole load of very common adjectives - new car, old car, small car, first car, other cars, etc. That doesn't really tell you an awful lot about language. Most students could probably guess these combinations. But if you rearrange the collocates by significance, combinations like electric car, sports car, rental car and police car start rising to the top, along with some cars that aren't even cars, like cable car. They're clearly much more interesting from a linguistic perspective, much less predictable and much more what we think of when we talk about teaching collocation. See this Sketch Engine blog post for more about this and more examples (although, I kind of disagree with its conclusions re. language teaching!).
[Image] Ranked by frequency (the underlined number). Source: Sketch Engine, English Web 2013 corpus
[Image] Ranked by score (the number on the right). Source: Sketch Engine, English Web 2013 corpus
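To make the two rankings concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the counts are invented (they are not taken from Sketch Engine or any real corpus), and the score is logDice, one widely used association measure, which may not be the score your own corpus tool reports. The function names are hypothetical too.

```python
import math

# Invented counts, for illustration only (not real corpus figures).
NODE_FREQ = 500_000  # hypothetical overall frequency of the noun "car"

# collocate: (hypothetical overall frequency of the collocate,
#             hypothetical co-occurrence frequency with "car")
COLLOCATES = {
    "new": (4_000_000, 40_000),
    "old": (2_000_000, 15_000),
    "small": (1_500_000, 10_000),
    "electric": (150_000, 9_000),
    "sports": (300_000, 8_000),
    "rental": (40_000, 6_000),
}


def log_dice(pair_freq, node_freq, collocate_freq):
    """logDice association score: 14 + log2(2 * f_xy / (f_x + f_y))."""
    return 14 + math.log2(2 * pair_freq / (node_freq + collocate_freq))


def rank_by_frequency(collocates):
    """Order collocates by raw co-occurrence count, highest first."""
    return sorted(collocates, key=lambda w: collocates[w][1], reverse=True)


def rank_by_score(collocates, node_freq):
    """Order collocates by logDice score, highest first."""
    return sorted(
        collocates,
        key=lambda w: log_dice(collocates[w][1], node_freq, collocates[w][0]),
        reverse=True,
    )


print("By frequency:", rank_by_frequency(COLLOCATES))
print("By score:    ", rank_by_score(COLLOCATES, NODE_FREQ))
```

With these made-up numbers, new tops the frequency list, but once the collocate's own overall frequency is taken into account, electric, rental and sports all overtake it. That is the same kind of reshuffle described above, although real corpus figures will of course differ.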
When you want typical
I started off using corpora as a lexicographer working on learner's dictionaries. In a dictionary, you want to show the range of a word and its usage, so looking at typical collocates is a great starting point for getting a feel for a word. It helps you to tease out different senses - like the AmE sense of car meaning carriage, as in rail car, train car, freight car, etc. - to identify possible compounds, phrases and idioms - car park, car pool, get car sick - and to pick out some of the most significant collocates you might want to exemplify and perhaps highlight.
The less obvious but typical collocations are important in teaching materials too, especially when an unpredictable collocation is also very frequent, like catch a bus or board a plane, which score highly on both types of measure. The typical collocations aren't, however, always what we want to focus on.
When you want vanilla
Many dictionary entries, especially for more frequent words, will start with what's known as a 'vanilla' example. That is a simple example that illustrates the basic meaning of the word in a context that's authentic but doesn't contain other elements that distract from the word being exemplified. Information about less obvious collocations, phrases or colligational patterns will come later. So the Cambridge Dictionaries entry for car has the following example sentences:
They don't have a car. (the 'vanilla' example - 'have' is actually one of the top collocating verbs by raw frequency, but it's unremarkable)
Where did you park the car? ('park' is a more interesting collocate)
It's quicker by car.
a car chase/accident/factory
The same principle holds for many other teaching contexts.
When you're introducing potentially new vocabulary items, you want students to focus on those new words. Of course, you want to present them in a realistic context with appropriate collocates, but you don't want to overwhelm the student with extra information, and especially not with collocates that are well above the level of the main target word. So if I was, say, teaching car for the first time, I probably wouldn't throw in sports car or rental car, but it might be appropriate to add a bit of variety to the material with simple combinations like new car or small car. Only later, when car was a familiar vocabulary item, might I want to extend students' range to talk about other types of cars as appropriate contexts cropped up.
When frequent isn't necessarily obvious
A particularly tricky case in English is the set of 'delexical' verbs (make, do, take, get, have, put, give, etc.), which are all incredibly frequent but, for a learner of English, not at all obvious in terms of which to choose. If we go back to what we do with buses, by far the most frequent collocating verb is take. If you look at collocates by frequency, it's right at the top for most corpora. If you switch to ordering collocates by significance though, because it's a very common verb, it drops way down the order to be replaced by board, ride, catch, park and drive. Obviously, that doesn't mean that we don't need to teach take the bus because it'll be obvious to our students … because it won't!
Weighing up the numbers
So what does all this mean? Which statistics should we be looking at? Well, the answer is probably both. When I'm researching the collocates of a word, I'll flick between both types of ranking to get an overall picture of how the word works, then make my choices based on the teaching context.
- If I'm looking for a natural example for a new vocab item, I'll probably look at raw frequencies to find a collocate that's common but not distracting.
- If a collocate - like catch a bus - is high on both scores, it's probably worth teaching, and perhaps highlighting, early on.
- If I'm looking to extend students' range and get them to use familiar words in more varied ways, then I'll investigate the more interesting collocates that come up when ranked by score.
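The three rules of thumb above could be sketched roughly as follows. This is just one way of writing them down, not a tested method: the function name, the cut-off (top_n) and the labels are all hypothetical, and the two verb rankings for bus are invented, only loosely following the ordering described earlier.

```python
def classify(collocate, by_frequency, by_score, top_n=3):
    """Return a rough teaching suggestion for one collocate.

    by_frequency and by_score are lists of collocates, best first.
    top_n is an arbitrary cut-off for counting as 'high' on a list.
    """
    frequent = collocate in by_frequency[:top_n]
    typical = collocate in by_score[:top_n]
    if frequent and typical:
        return "teach early, perhaps highlight"
    if frequent:
        return "common: good for a natural, undistracting example"
    if typical:
        return "more interesting: use later to extend range"
    return "lower priority"


# Invented rankings of verbs used with "bus" (illustration only).
by_frequency = ["take", "catch", "get", "board", "ride", "drive"]
by_score = ["board", "ride", "catch", "park", "drive", "take"]

for verb in ["catch", "take", "board"]:
    print(verb, "->", classify(verb, by_frequency, by_score))
```

A cut-off like this is crude, and a frequent delexical verb such as take still needs teaching even though it scores low on typicality, so the labels are prompts for a decision rather than the decision itself.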
A note about data
Finally, as ever with corpora, it's also important to know what data you're looking at. As I mentioned in my last corpus insider post, most corpora are made up of predominantly written data and, of course, that's going to affect the type of results you get back. So, going back to my question at the start of this post about get the bus vs. take the bus, most of the corpora I looked at listed take as a top collocate by frequency, but get, which felt more natural to me, was much further down the lists (both by score and raw frequency). When I looked at the Spoken BNC2014 (a corpus of contemporary spoken British English) though, all of a sudden get the bus rocketed to the top, suggesting it's something we say, but perhaps write slightly less often.

