Readme en CA
(Spell Checker Oriented Word Lists) wordlists available at Kevin's Word Lists
Page (http://wordlist.sourceforge.net). Lists with the suffixes 10, 20, 35,
50 and 65 were used. Lists with the suffixes 70, 80 and 95 were excluded.
Copyright information for SCOWL and the wordlists used in creating it is
reproduced below.
The affix file is identical to the MySpell English (United States) affix file.
It is a heavily modified version of the original english.aff file which was
released as part of Geoff Kuenning's Ispell and as such is covered by his BSD
license.
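
The word-list selection described above (suffixes 10 through 65 kept, 70
through 95 excluded) can be sketched roughly as follows. This is only an
illustration, not the script actually used to build the dictionary: the
helper names and the example file names are hypothetical, and it assumes
each list file name ends in a dot followed by its numeric level.

```python
# Hypothetical sketch: choose word-list files by their numeric suffix.
# The level sets mirror the README text; everything else is assumed.
INCLUDED_LEVELS = {10, 20, 35, 50, 65}  # suffixes the README says were used
EXCLUDED_LEVELS = {70, 80, 95}          # suffixes the README says were excluded


def level_of(filename):
    """Return the numeric suffix after the last dot, or None if there is none."""
    _, dot, suffix = filename.rpartition(".")
    if not dot or not suffix.isdigit():
        return None
    return int(suffix)


def select_lists(filenames):
    """Keep only the files whose suffix is one of the included levels."""
    return [name for name in filenames if level_of(name) in INCLUDED_LEVELS]


if __name__ == "__main__":
    # Made-up file names, used purely for illustration.
    example = ["english-words.10", "english-words.70", "canadian-words.35", "README"]
    print(select_lists(example))
```

Files with no numeric suffix are skipped here rather than treated as an
error, which seems the safer default for a directory that also holds
documentation.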
---
The 10 level includes the 1000 most common English words (according to
the Moby (TM) Words II [MWords] package), a subset of the 1000 most
common words on the Internet (again, according to Moby Words II), and
frequency class 16 from Brian Kelk's "UK English Wordlist
with Frequency Classification".
Grady Ward
3449 Martha Ct.
Arcata, CA 95521-4884
grady@netcom.com
grady@northcoast.com
> I was wondering what the copyright status of your "UK English
> Wordlist With Frequency Classification" word list as it seems to
> be lacking any copyright notice.
There were many many sources in total, but any text marked
"copyright" was avoided. Locally-written documentation was one
source. An earlier version of the list resided in a filespace called
PUBLIC on the University mainframe, because it was considered public
domain.
> So are you saying your word list is also in the public domain?
The 20 level includes frequency classes 7-15 from Brian's word list.
The name files from the Census report are a government document which I
don't think can be copyrighted.
The name list from Alan Beale is also derived from the linux words
list, which is derived from the DEC list. He also added a bunch of
miscellaneous names to the list, which he released to the Public Domain.
The DEC Word list doesn't have a formal name. It is labeled as "FILE:
english.words; VERSION: DEC-SRC-92-04-05" and was put together by Jorge
Stolfi <stolfi@src.dec.com> DEC Systems Research Center. The DEC Word
list has the following copyright statement:
(NON-)COPYRIGHT STATUS
(NO-)WARRANTY DISCLAIMER
These files, like the original wordlists on which they are based,
are still very incomplete, uneven, and inconsistent, and probably
contain many errors. They are offered "as is" without any warranty
of correctness or fitness for any particular purpose. Neither I nor
my employer can be held responsible for any losses or damages that
may result from their use.
However, since this word list is used in the linux.words package, which
the author claims is free of any copyright, I assume it is OK to use
for most purposes. If you want to use this in a commercial project
and this concerns you, the information from the DEC word list can
easily be removed without much sacrifice in quality, as only the name
lists were used.
The 65 level includes words found in the Ispell "medium" word list.
The Ispell word lists are under the same copyright as Ispell itself,
which is:
The 80 level includes the ENABLE word list, all the lists in the
ENABLE supplement package (except for ABLE), the "UK Advanced Cryptics
Dictionary" (UKACD), the list of signature words from the YAWL package,
and the 10,196 places list from the MWords package.
The 95 level includes the 354,984 single words and 256,772 compound
words from the MWords package, ABLE.LST from the ENABLE Supplement,
and some additional words found in my part-of-speech database that
were not found anywhere else.
The variant word lists were created from a list of variants found in
the 12dicts supplement package as well as a list of variants I created
myself.