Acamedic
Academic Hunspell Dictionary
Install / Use
/learn @emareg/AcamedicREADME
Acamedic – The Academic Dictionary
A small Hunspell dictionary for professional, scientific writing.
- High Quality: based on SCOWL en-US dictionary and thoroughly tested.
- High Sensitivity:
- removed words such as
thee,posses,fatuous, orjerk. - less missed errors (but probably more false-positives)
- removed words such as
- Academic Language:
- added words such as
overapproximation,whitepaper, andbitmask. - select the scientific domains you need during build
- added words such as
- Easy Install: available as extension for LibreOffice and Firefox/Thunderbird.
Getting Started
Download this repository and then decide for which applications you want to install the dictionary:
Install System-Wide for Hunspell<br>
- copy
en-Academic.dicanden-Academic.affto/usr/share/hunspell
Install LibreOffice Extension<br>
- Automatic: Download and open
acamedic-libreoffice.oxtfile in theaddonsfolder. - Manual:
- Start LibreOffice and select
Tools → Extension Manager... → Add. - Open
acamedic-libreoffice.oxtfrom theaddonsfolder.
- Start LibreOffice and select
Install Thunderbird Extension<br>
- Automatic: Download and open
acamedic-mozilla.xpi. - Manual:
- Start Thunderbird and select
Tools → Add-ons → ⚙ → Install Add-on from file. - Open
acamedic-mozilla.xpifrom theaddonsfolder.
- Start Thunderbird and select
Install for Sublime-Text<br>
Linux: Copy en-Academic.dic and en-Academic.aff to ~/.config/sublime-text-3/Packages/Language - English/
Windows: Copy en-Academic.dic and en-Academic.aff to C:\Users\YOUR_USER_NAME\AppData\Roaming\Sublime Text 3\Packages\Language - English while replacing YOUR_USER_NAME with your username.
Install for Visual Studio Code<br>
- Install the extension
denisgerguri.hunspell-spellchecker(Ctrl+Shift+P, typeExt install, typehunspell) - copy
en-Academic.dicanden-Academic.affto~/.vscode/extensions/denisgerguri.hunspell-spellchecker-1.0.1/languages/ - Follow instructions at https://marketplace.visualstudio.com/items?itemName=denisgerguri.hunspell-spellchecker#adding-new-language
Install for TeXstudio<br>
- Start TeXstudio and select 'Options → configure Texstudio ... → Language ...'
- Under the spell check sub-group, check the path of the spelling dictionary. In Windows OS, the path is: 'C:\Program Files (x86)\texstudio\dictionaries'
- Copy
en-Academic.dicanden-Academic.affto the path from step 2. - From the same configuration window of step 2, you can choose
en-Academicfrom the drop menu of the default language. - Restart TeXstudio
Install for Texmaker<br>
- Start Texmaker and select 'Options → Configure Texmaker → Editor'
- At spelling dictionary, enter the path of the downloaded repository or click the browse button.
- Select
en-Academic.dic. - Click OK to close.
Contributing Words
The project is in an early stage and you might find many words in your domain that are missing. Please collect them over time and create an issue with your list of words. Again, the idea is to include only words that you use often or rare words that are very distinct and cannot be confused with other terms.
Building and Testing
The dictionary is constructed from several individual dictionary files in the src folder.
/base/contains common words.en_US_basic.dicbasic and simple words, e.g.booken_US_extra.dicgeneric, formal and abstract, e.g.interchangeableen_US_special.dicspecial, more domain specific names that are hard to confuse.
/academic/contains special mathematical, technical, chemical, etc. terms./names/contains special names starting with capital letter./codes/contains keywords of programming languages.
Automated Testing
We perform several types of automated tests to ensure a high quality of the dictionary:
- Correctness: check if all words in the dictionary are spelled correctly. This is done by looking up all words in a larger dictionary. Some words are open for debate, e.g. whether
testcaseis correct ortest caseshould be the only correct spelling. - Duplicates: check if words are encoded twice. This is not a problem for correctness but might indicate some kind of unwanted addition.
- Coverage: check which percentage of words of reference articles (e.g. Wikipedia) are included in the dictionary. We aim for high coverage but also intentionally exclude words to improve spell checking of similar words.
- Consistency: check if there are inconsistencies with certain grammar rules. E.g. the ending
iclyshould in general beically
Motivation for this Dictionary
For spell checking of academic documents, it is not useful if dictionaries include words such as thee or wee. They will most likely mask a spelling error of the words the or we. Since they are archaic or words, probably no one is going to write them but rather read them in some historic text fragments. And even if you write them, a spelling error of these words will most likely be masked by other words because instead of wee there is we, see, or weed.
Furthermore, the standard dictionaries include a lot of problematic words, such as wit, dome, or wont.
Even text analysis tool make mistakes.
Grammarly, for example, can also analyze context and grammar but still does not spot the following spelling error:
Which refers to the state of keeping some resource secrete from unauthorized parties.
Creation Process
I have created this dictionary using the following process:
- take a very small base dictionary: SCOWL-20
- manually go through it an remove non-scientific terms
- use it to check reference papers/articles
- add newly found terms that are in SCOWL-60
- manually check and add further unrecognized terms
- use the tests to double check all words
Terms I have removed
- archaic terms such as
brethren,cobbler,sod,thee,thou,unto,wive - inappropriate words such as
cum,slut,gnome,sexy,slave - narrative adjectives such as
cunning,fatuous,fierce,ghastly,hitherto,pompous,sheer,absurd - uncommon words with common alternatives, such as
envisage(useenvision),futile(useuseless), orhorrific(usehorrible). - rare words that are very close to common terms, such as
fist(too close tofirst) - colloquial words, such as
eh,gig,hey,lad,lousy,oh,bugger - words that are very far away from technical contexts, such as
horde,hail,hog,hut,lark,mummy,bigot - religious terms:
heresy,sermon,sinful - words often used to insult:
hypocrite,illiterate,inane,jerk,moron,snobbery - British / Scottish words that sneaked into the US dictionary, such as
lorry,nay,dole,duff,dustbin - potential mistakes found in SCOWL-20:
cs,alias's(should bealias'), alsospecies's,die's,elect's,feel's,want's
Testing
I do automatic testing on the resulting dictionary file:
- check that there no duplicates or that duplicates are justified, e.g.
cleancan be a verbclean/SDGor adjectiveclean/YRT - run on some reference papers and check that not many words are missing
- check that all words are spelled correctly:
- using SCOWL-60 and SCOWL-95 dictionary
- manually double-check all remaining words with
Lessons Learned
- the suffixes
-icand-icalare mostly interchangeable - words with
-icalor-ualsuffix should (normally) have the/Yadverb extension. - words with
-tivesuffix often have/SYP - words should either have
-icityor-nesssuffices and these should not have plural forms. E.g. usesimplificationsinstead ofsimplicities. - words ending in
-edvery rarely can be negated by prefixingin-/im-/ir-/il-
Future Plan: Part-of-Speech Encoding
The Hunspell compression could be used to indicate the POS type of a word. For example /SDG means that either s, ed or ing can be appended to the word, indicating a regular verb. However, the compression is only concerned with the size and therefore it needs to be checked for confusing suffixes.
Misleading expansion examples:
we/DGTexpands towe,wed,wing, andwestbe/DTexpands tobe,bed, andbestshould/RZexpands toshouldandshoulder/S
As a result, I started to encode regular verbs with /SDG and nouns with /MS, adjectives with /RT or /Y for the adjective/adverb combination.
Related Skills
YC-Killer
2.7kA library of enterprise-grade AI agents designed to democratize artificial intelligence and provide free, open-source alternatives to overvalued Y Combinator startups. If you are excited about democratizing AI access & AI agents, please star ⭐️ this repository and use the link in the readme to join our open source AI research team.
best-practices-researcher
The most comprehensive Claude Code skills registry | Web Search: https://skills-registry-web.vercel.app
groundhog
398Groundhog's primary purpose is to teach people how Cursor and all these other coding agents work under the hood. If you understand how these coding assistants work from first principles, then you can drive these tools harder (or perhaps make your own!).
isf-agent
a repo for an agent that helps researchers apply for isf funding
