Domain Madness
OK, I admit it, I may have too much time on my hands,
but I was looking for a new DNS domain name, and decided to shove all of
the 234,937 words in /usr/share/dict/words through Whois, collecting those
not having a .com entry. The script is attached below. Currently the words
file lists the words in Webster’s Second International, whose 1934 copyright
has expired.
Oddly enough, the result is that 63% of the words are NOT
registered names!! That’s right, 147,886 words are not taken. That’s the
good news. The bad news is that many of them are pretty weird. I’ve stashed
these guys, both compressed and clear, on
http://backspaces.net/files/NonDNSWords
http://backspaces.net/files/NonDNSWords.gz
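The 63% figure can be rechecked from the two counts quoted above. This is just a quick sketch; `pct_free` is a made-up helper name, not part of the original scripts.

```sh
#!/bin/sh
# pct_free (hypothetical name): percentage of words with no .com entry.
#   $1 = number of unregistered words, $2 = total number of words
pct_free() {
    awk -v free="$1" -v total="$2" 'BEGIN { printf "%.1f\n", 100 * free / total }'
}

# Counts taken from the text above.
pct_free 147886 234937
```

Rounded to a whole number, the result agrees with the 63% quoted.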
For example, here’s the list of all 43 4-letter words not taken:
grep ^....$ NonDNSWords
bikh fowk hawm koae odso shlu waup yeuk yirn
dird frib hewt kuar oime suld wusp yigh yirr
dowf gawm jaob mowt paut syrt wype yilt yuft
dowp ghuz jewy munj phoh uily yalb yirk
emyd gype jhow niog rynt wauf ycie yirm
I can’t see one that calls out to me, really. A lot of these do not appear in my dictionary, but the Second International was known for stretching!
Here’s a random sampling of 100 of the 6 letter critters:
grep ^......$ NonDNSWords | ran 100 | column -x
diaene evener tummer burdie palpon coccid taxwax chanst madefy buntal
haggly masted untone dutied unmiry cynips psetta otitic gawcie beflag
midpit orgyia tutory amylic begnaw punlet adigei scrank bedrop lusory
dorize repale unmold snurly scotic unsing uplead hemine unnose stibic
funori cobcab yengee cahita rutuli menkib uptend sassak beflap crants
ocyroe rugose avowry mogdad coecal elleck ptotic kommos amusgo lemosi
avitic amorua cacara ideist reswim napaea reshut egeran lechea emboly
korait uplick baeria kurvey ureido tuchit beroll adroop degged twisel
kechel solate unbare hardim upwaft sullan tineal uramil ovinia pappox
forrad jacami unlean byrlaw thymyl scrobe lyncid crenic bepity anoine
..where “ran” is a simple awk script, below, to randomly select n lines
from a file. It’s kinda spooky doing all this on Mac OS X .. it really IS
Unix.
Again, not a lot of love. By the way, there were 5,166 6-letter
words, so I likely have not shown some real winners in this sampling. Let
me know if you find some real winners.
This got me a bit curious .. how do the words work out by size?
I.e. how many words are 6 letters long etc? Time for another script, also
attached below:
/usr/share/dict/words     NonDNSWords
len  count  percent       len  count  percent
  1     52   0.0221         .
  2    155   0.0660         .
  3   1351   0.5750         .
  4   5110   2.1751         4     43   0.0291
  5   9987   4.2509         5   1219   0.8243
  6  17477   7.4390         6   5166   3.4932
  7  23734  10.1023         7  10725   7.2522
  8  29926  12.7379         8  16593  11.2201
  9  32380  13.7824         9  20861  14.1061
 10  30867  13.1384        10  22254  15.0481
 11  26011  11.0715        11  20415  13.8046
 12  20460   8.7087        12  17065  11.5393
 13  14937   6.3579        13  12935   8.7466
 14   9763   4.1556        14   8811   5.9580
 15   5924   2.5215        15   5433   3.6738
 16   3377   1.4374        16   3146   2.1273
 17   1813   0.7717        17   1681   1.1367
 18    842   0.3584        18    804   0.5437
 19    428   0.1822        19    408   0.2759
 20    198   0.0843        20    189   0.1278
 21     82   0.0349        21     79   0.0534
 22     41   0.0175        22     38   0.0257
 23     17   0.0072        23     16   0.0108
 24      5   0.0021        24      5   0.0034
Well, the most populous word length in NonDNSWords is 10 letters; here’s a sample:
dramseller floriation tractional clanswoman periphrase
cyrtometer symphytize convolvuli mucigenous clamminess
hyperacute myrtlelike unharbored ergonovine undertided
digressory preclosure parnassism habilatory boycottism
nilometric paralgesic trimacular annelidian breeziness
prelegatee admiringly scatophagy bonebinder morphinism
endosteoma ranivorous undistinct solenodont scathingly
unfreckled unpanelled impalpably unemphatic staverwort
gradientia cystospasm xenocratic cogredient rubescence
neurolytic unrebutted saponacity brachyoura depatriate
OK, I know you want to know the 5 24-letter words, so here they are:
formaldehydesulphoxylate
pathologicopsychological
scientificophilosophical
tetraiodophenolphthalein
thyroparathyroidectomize
..and, yup, antidisestablishmentarianism wasn’t there.
This did make it easy to search for substrings of interest. For example, I wanted to
find all the words with “plex” in them. There were 54:
grep plex NonDNSWords | column -x
amplexation      amplexicaudate   amplexicaul      amplexicauline
amplexifoliate   autocomplexes    cerviciplex      complexedness
complexionably   complexional     complexionally   complexioned
complexionless   complexively     complexly        decemplex
diaplexal        diaplexus        epiplexis        euplexoptera
ganglioplexus    holoplexia       intercomplexity  kataplexy
myelapoplexy     nulliplex        overcomplex      overcomplexity
perplexable      perplexedly      perplexedness    perplexingly
perplexment      phantoplex       plexicose        pleximeter
pleximetric      plexodont        plexometer       plexure
pseudoapoplexy   reperplex        retroplexed      semiamplexicaul
semiduplex       sextuplex        simplexed        supercomplex
triplexity       ultracomplex     unimultiplex     unperplexed
unperplexing     veniplex
This is a bit more interesting: holoplexia.com sounds nifty, as does nulliplex.com.
So, I guess you’re wondering which one I took, right? Well, sadly,
none of them. While groveling around, I thought of a two-word critter I kinda
like: ComplexityWorks.com, so hmm..all this was a waste? I think not, but…
Scripts:
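Check a list of words w/ whois.
#!/bin/sh
# Usage: script [pattern] [start] [file]
pat=${1:-"^...*"}                  # grep pattern the words must match
start=${2:-a}                      # begin at the first word starting with this
file=${3:-/usr/share/dict/words}
# $(...) rather than backticks, so that \$p reaches sed as a literal "$p"
words=$(sed -n "/^$start/,\$p" "$file" | grep -i "$pat")
for w in $words ; do
    # Keep only the "No match" replies, cut them down to the bare name, lowercase it
    whois $w.com | sed -n '/No match for/{s:.*for .::;s:......$::p;}' | tr A-Z a-z
done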
For choosing N random samples from a stream:
#!/bin/sh
samples=$1
awk -v samples=$samples '
{a[NR]=$0} # Read in file
END {
    for ( len=NR; samples > 0 && len > 0; samples--) {
        i=int(rand()*len)+1   # Index in 1..len
        print a[i]
        a[i]=a[len]           # Move the last line into the hole so no line repeats
        delete a[len]
        len--
    }
}'
For counting the words in a stream by length:
#!/bin/sh
awk '
{a[length]++}
END {
    for (i in a)
        printf "%2i %10i %10.4f\n", i, a[i], 100*a[i]/NR
}' | sort -n
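The sed filter in the whois script can be tried without touching the network. The sketch below assumes the registry's reply has the form `No match for "NAME.COM".`; that wording is inferred from the substitutions themselves, not quoted from whois output. The helper name `extract_free` is made up for illustration.

```sh
#!/bin/sh
# extract_free (hypothetical name): apply the same sed/tr filter as the whois
# script to text on stdin, so it can be tested on canned input.
extract_free() {
    # 1st s: drop everything through the opening quote after "for "
    # 2nd s: drop the six trailing characters  .COM".  and print the rest
    sed -n '/No match for/{s:.*for .::;s:......$::p;}' | tr A-Z a-z
}

# Canned sample lines; the exact reply wording is an assumption.
printf '%s\n' 'No match for "HOLOPLEXIA.COM".' 'Domain Name: EXAMPLE.COM' | extract_free
```

Only lines containing "No match for" get through, so replies for registered names produce no output. A registry that words its reply differently would need a different filter.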
I’m curious: How did you pick *your* domain?!


the Dr. Seuss technique
I read a similar article several years ago that painted a bleaker picture because it followed trends to predict that all words in the English language would be used up by some relatively recent sounding year. I tried a few favorite obscure words (I only found out later that Eric Raymond took "thyrsus"), but no luck. Rather than frustrate myself with more whois searches to get something ultimately unsatisfying, I decided to make up a word myself.
The result: mentata.com. I'll bet Tim O'Reilly can tell you where it comes from, but even beyond the concepts, it's short, easy to remember, and has kind of a catchy ring to it.
Don't feel guilty, this is important work. Thanks for the list.
How disappointing...
www.antidisestablishmentarianism.com is a placeholder for a domain registration company.
I'd be curious to discover how many of those domains that are registered with whois are actually being held for sale...
Random script didn't work on RedHat 8
To keep the random script from returning the same result every time, a call to srand() needs to be included to seed the random number generator:
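#!/bin/sh
samples=$1
awk -v samples=$samples '
{a[NR]=$0} # Read in file
END {
    srand()                   # Seed from the time of day
    for ( len=NR; samples > 0 && len > 0; samples--) {
        i=int(rand()*len)+1   # Index in 1..len
        print a[i]
        a[i]=a[len]           # Move the last line into the hole so no line repeats
        delete a[len]
        len--
    }
}'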
Random script didn't work on RedHat 8
Thanks! I thought a bit about whether or not I wanted it to be "reproducible" .. i.e. repeat each run. I think I like your approach better than mine, 'cause I can do the same probe several times with different results.
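Both behaviors discussed here, repeatable runs and fresh results each time, can live in one script if the seed is optional. This is only a sketch: the second argument and the name `ran_seeded` are assumptions, not part of either posted script.

```sh
#!/bin/sh
# ran_seeded (hypothetical name): print up to N random lines from stdin.
# Usage: ran_seeded N [seed]
ran_seeded() {
    awk -v samples="$1" -v seed="${2:-}" '
    { a[NR] = $0 }                       # read every line into memory
    END {
        if (seed != "") srand(seed)      # fixed seed: same picks on every run
        else srand()                     # no seed: seeded from the time of day
        for (len = NR; samples > 0 && len > 0; samples--) {
            i = int(rand() * len) + 1    # index in 1..len
            print a[i]
            a[i] = a[len]                # fill the hole with the last line
            delete a[len]
            len--
        }
    }'
}

# Repeatable: the same seed picks the same two lines each time.
printf '%s\n' alpha beta gamma delta | ran_seeded 2 42
```

Leaving off the seed gives a different sample per run, though two runs within the same second may still agree, since srand() with no argument uses the clock.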
word crisis
It could be more serious than you think. I wondered how you got 52 1-letter words out of a 26-letter alphabet... the word list includes the cap and lowercase for each letter as a 'word' - further scrutiny revealed there are separate entries for words that can be capitalized or not, e.g., Bill, bill, Mark, mark, Will, will. No telling how soon the pool will dry up now!
Of course, a 1934 dictionary is going to be shy a few words that have surfaced in the latter part of the last century.
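One way to see how much the capitalized duplicates inflate the count is to fold case before counting. A small sketch, using the example words from this comment; `count_folded` is a made-up helper name.

```sh
#!/bin/sh
# count_folded (hypothetical name): count distinct words on stdin, ignoring case.
count_folded() {
    tr 'A-Z' 'a-z' | sort -u | wc -l | tr -d ' '
}

# The six entries mentioned above collapse to three distinct words.
printf '%s\n' Bill bill Mark mark Will will | count_folded

# To try it on the real list (path as used in the post):
#   count_folded < /usr/share/dict/words
```

Domain names are case-insensitive anyway, so the folded count is the more meaningful pool size.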
Do mine maddness
Would you be so kind as to tell me if prenaptualagreementalisticallyminded.com is taken?
Do mine maddness
yes, but antiprenaptualagreementalisticallyminded.com is still available