Making online help more helpful
Investigating an algorithm from an online dictionary
A jocular post by Andrew (link) sent me on an unlikely excursion.
In the post, Andrew mused about finding quotes of himself in the online Merriam-Webster dictionary. The first example he cited was this quote:
My quick answer, was, no, the persistence method would not have worked.
This perked me up because the sentence is so silly as a lesson in how to use the word "persistence".
The same sentence can be written for any number of nouns. The regression method, the factor method, the omission method, the research method, the drinking method, the kissing method, ....
Visiting the M-W page for persistence, I found a total of 27 example sentences with attribution, in addition to three generic examples without attribution.
The first attributed quotation is:
Nothing in the world can take the place of persistence.
This has the same nature as Andrew's sentence. One can substitute many a noun for "persistence" without destroying the sentence. This implies that the sentence cannot explain how to use the specific word "persistence".
Not all examples are useless. Each of the following contains enough context to learn the meaning of "persistence":
These steps aren't easy, and can take some time and persistence.
Hall, who lives in Granbury, returned to the lake this winter and his persistence paid off on the last day of his trip.
The finish offers notes of black and brown spice notes and there is good persistence.
It would be more helpful if Merriam-Webster grouped the examples by word sense. The third sentence shown above is distinct in using persistence to refer to lingering sensation. (The original article is found here.)
The usage by Andrew is even stranger. In the Wired feature, the "persistence method" is defined in the sentence immediately before the one cited by Merriam-Webster. Andrew mentioned a climate scientist who used "persistence" to describe "the assumption [used in climate models] that conditions remain unchanged from one year to the next." This word sense maps to Merriam-Webster's second definition of "persist" (i.e., "to remain unchanged or fixed in a specified character, condition, or position"), which its editors have tagged as "obsolete."
In short, a reader can't figure out the meaning of persistence from reading Andrew's quotation.
If the Merriam-Webster examples are representative, they suggest that "persistence" is most often used in the sense of a human trait, and when used in this way, authors like to pair it up with related traits and concepts, such as "patience and persistence", "vision, persistence, and sweat", "hunger and persistence", "time and persistence", and "hard work and dogged persistence". The bounty of these specimens feels redundant.
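For the curious, here is a rough sketch of how one might tally these pairings across a list of example sentences. The regular expression and the helper name `paired_words` are my own assumptions for illustration. It only catches the simple "X and [adjective] persistence" shape, so a list like "vision, persistence, and sweat" would slip through.

```python
import re
from collections import Counter

# Assumed pattern: a word, then "and", then an optional single modifier,
# then "persistence". This is an illustrative heuristic, not a parser.
PAIR_RE = re.compile(r"\b(\w+)\s+and\s+(?:\w+\s+)?persistence\b", re.IGNORECASE)


def paired_words(sentences):
    """Count the words that appear as X in 'X and persistence'."""
    counts = Counter()
    for sentence in sentences:
        for match in PAIR_RE.finditer(sentence):
            counts[match.group(1).lower()] += 1
    return counts


if __name__ == "__main__":
    examples = [
        "These steps aren't easy, and can take some time and persistence.",
        "Coupled with persistence, passion lit a path in the sky for the WASP.",
    ]
    print(paired_words(examples))
```

A tally like this would make it easy to spot when a page of examples is dominated by one pattern.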
How does Merriam-Webster select these quotations? This is what they disclose:

I assume they use an algorithm. I kept digging.
I investigated a quotation attributed to a Forbes feature:
Coupled with persistence, passion lit a path in the sky for the WASP.
This is a variation of the pattern "X and persistence" where X = "passion". But what is the casual reader supposed to make of "lit a path in the sky"? What is "WASP"?
If you know WASP, it's not what you're thinking. That meaning has little to do with paths in the sky. Read the Forbes article, and you'll learn that WASP stands for Women Airforce Service Pilots.
The word "persistence" appears in that article three other times.
At the root of all dreams lies persistence.
The Power of Persistence and Honoring a Legacy
So one of the lessons I’ve learned from doing this project has definitely been persistence. I mean, they kept fighting for military status until 1977 under President Carter, and that was the first time that that happened.
The algorithm evidently picked the most abstruse sentence, and also the one appearing furthest down the page. I'd have selected the third of this set. I cheated by including two sentences in the quote, but without the second sentence, the meaning is elusive.
How should we make an ideal section for word usage in sentences? I'd want fewer but sharper sentences; self-contained sentences, or including surrounding sentences that provide the context for comprehension; and sentences grouped by word sense.
Implementing this type of algorithm takes a lot of work. You have to deploy a "spider" or some way of compiling a collection of text from which to extract sentences. You need a search engine to find keywords. You hope your text extraction process successfully pulled down author, date and document source (not standardized across different websites). You have to design a scoring rubric to select which sentences to show.
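To make the last step concrete, here is a minimal sketch of what a scoring rubric could look like. Everything in it is my own assumption: the function names `score_sentence` and `pick_examples`, the stopword list, the weights, and the thresholds are all hypothetical, and nothing here reflects how Merriam-Webster actually selects its quotations. It rewards sentences that carry more surrounding context, and penalizes unexplained acronyms and openers that lean on a previous sentence.

```python
import re

# All lists, weights and thresholds below are illustrative assumptions.
STOPWORDS = {
    "a", "an", "the", "of", "in", "on", "at", "to", "for", "and", "or", "but",
    "with", "his", "her", "their", "its", "this", "that", "it", "is", "was",
    "are", "can", "who", "all", "off", "some",
}
VAGUE_OPENERS = {"this", "that", "it", "they", "he", "she", "so"}


def tokenize(sentence):
    """Split a sentence into alphabetic tokens (apostrophes kept)."""
    return re.findall(r"[A-Za-z']+", sentence)


def score_sentence(sentence, target):
    """Return a heuristic usefulness score; -inf if the target is absent."""
    words = tokenize(sentence)
    lowered = [w.lower() for w in words]
    if target.lower() not in lowered:
        return float("-inf")
    # Context: count non-stopword tokens other than the target, capped at 10.
    content = [w for w in lowered if w not in STOPWORDS and w != target.lower()]
    score = 0.2 * min(len(content), 10)
    # Prefer sentences of moderate length.
    if 8 <= len(words) <= 30:
        score += 1.0
    # Penalize all-caps tokens of 3+ letters, treated as unexplained acronyms.
    score -= 1.0 * sum(1 for w in words if len(w) >= 3 and w.isupper())
    # Penalize openers that depend on an earlier sentence for meaning.
    if lowered[0] in VAGUE_OPENERS:
        score -= 0.5
    return score


def pick_examples(sentences, target, k=3):
    """Return up to k sentences containing the target, best score first."""
    scored = [(score_sentence(s, target), s) for s in sentences]
    scored = [pair for pair in scored if pair[0] != float("-inf")]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:k]]
```

A rubric this crude cannot tell word senses apart, and it cannot tell whether the target word is interchangeable with any other noun. Both would need something more than token counting, which is part of why this takes a lot of work.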