Domain Madness

OK, I admit it, I may have too much time on my hands,
but I was looking for a new DNS domain name, and decided to shove all of
the 234,937 words in /usr/share/dict/words through Whois, collecting those
not having a .com entry. The script is attached below. Currently the words
file lists the words in Webster’s Second International, whose 1934 copyright
has expired.

Oddly enough, the result is that 63% of the words are NOT
registered names!! That’s right, 147,886 words are not taken. That’s the
good news. The bad news is that many of them are pretty weird. I’ve stashed
these guys, both clear and compressed, at

http://backspaces.net/files/NonDNSWords
http://backspaces.net/files/NonDNSWords.gz

For example, here’s the list of all 43 4-letter words not taken:

grep ^....$ NonDNSWords
bikh    fowk    hawm    koae    odso    shlu    waup    yeuk    yirn
dird    frib    hewt    kuar    oime    suld    wusp    yigh    yirr
dowf    gawm    jaob    mowt    paut    syrt    wype    yilt    yuft
dowp    ghuz    jewy    munj    phoh    uily    yalb    yirk
emyd    gype    jhow    niog    rynt    wauf    ycie    yirm

I can’t see one that calls out to me, really. A lot of these do not appear in my dictionary, but the Second International was known for stretching!
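Counting dots in the grep pattern gets fiddly once the words get longer. Here is a minimal sketch of a length filter that takes the length as a number instead. The name words_of_length is made up for illustration and is not one of the scripts used for this post; it assumes a grep that supports -E and -x, which both the GNU and BSD versions do.

```sh
# words_of_length N [FILE...]: print the lines that are exactly N characters long.
# Hypothetical helper, not part of the original scripts.
words_of_length() {
    n=$1
    shift
    # -x matches whole lines only; ".{N}" means exactly N characters.
    # With no FILE arguments, grep reads standard input.
    grep -Ex ".{$n}" "$@"
}
```

With that defined, `words_of_length 4 NonDNSWords` should select the same lines as the `grep ^....$` command above.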

Here’s a random sampling of 100 of the 6 letter critters:

grep ^......$ NonDNSWords | ran 100 | column -x
diaene evener tummer burdie palpon coccid taxwax chanst madefy buntal
haggly masted untone dutied unmiry cynips psetta otitic gawcie beflag
midpit orgyia tutory amylic begnaw punlet adigei scrank bedrop lusory
dorize repale unmold snurly scotic unsing uplead hemine unnose stibic
funori cobcab yengee cahita rutuli menkib uptend sassak beflap crants
ocyroe rugose avowry mogdad coecal elleck ptotic kommos amusgo lemosi
avitic amorua cacara ideist reswim napaea reshut egeran lechea emboly
korait uplick baeria kurvey ureido tuchit beroll adroop degged twisel
kechel solate unbare hardim upwaft sullan tineal uramil ovinia pappox
forrad jacami unlean byrlaw thymyl scrobe lyncid crenic bepity anoine

..where “ran” is a simple awk script, below, to randomly select n lines
from a file. It’s kinda spooky doing all this on Mac OS X .. it really IS
Unix.

Again, not a lot of love. By the way, there were 5,166 6-letter
words, so this sampling has likely missed some real winners. Let
me know if you find any.

This got me a bit curious .. how do the words work out by size?
I.e. how many words are 6 letters long etc? Time for another script, also
attached below:

/usr/share/dict/words           NonDNSWords
 n      count    percent         n      count    percent
 1         52     0.0221         .
 2        155     0.0660         .
 3       1351     0.5750         .
 4       5110     2.1751         4         43     0.0291
 5       9987     4.2509         5       1219     0.8243
 6      17477     7.4390         6       5166     3.4932
 7      23734    10.1023         7      10725     7.2522
 8      29926    12.7379         8      16593    11.2201
 9      32380    13.7824         9      20861    14.1061
10      30867    13.1384        10      22254    15.0481
11      26011    11.0715        11      20415    13.8046
12      20460     8.7087        12      17065    11.5393
13      14937     6.3579        13      12935     8.7466
14       9763     4.1556        14       8811     5.9580
15       5924     2.5215        15       5433     3.6738
16       3377     1.4374        16       3146     2.1273
17       1813     0.7717        17       1681     1.1367
18        842     0.3584        18        804     0.5437
19        428     0.1822        19        408     0.2759
20        198     0.0843        20        189     0.1278
21         82     0.0349        21         79     0.0534
22         41     0.0175        22         38     0.0257
23         17     0.0072        23         16     0.0108
24          5     0.0021        24          5     0.0034
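Each percentage in the table is just a count divided by the total number of lines in its own file (234,937 for the words file, 147,886 for NonDNSWords). A quick sketch for re-checking any entry by hand follows; pct is a made-up helper name, not one of the scripts used for this post.

```sh
# pct COUNT TOTAL: print COUNT as a percentage of TOTAL, to four decimals.
# Hypothetical helper for spot-checking the table; not an original script.
pct() {
    awk -v c="$1" -v t="$2" 'BEGIN { printf "%.4f\n", 100 * c / t }'
}

pct 5166 147886      # share of NonDNSWords that is 6 letters long: 3.4932
pct 147886 234937    # unregistered share of the whole words file, about 63%
```

The first call reproduces the 6-letter entry in the NonDNSWords column, and the second is where the 63% figure comes from.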

Well, the most populous length in NonDNSWords is 10 letters; here’s a sample:

dramseller floriation tractional clanswoman periphrase
cyrtometer symphytize convolvuli mucigenous clamminess
hyperacute myrtlelike unharbored ergonovine undertided
digressory preclosure parnassism habilatory boycottism
nilometric paralgesic trimacular annelidian breeziness
prelegatee admiringly scatophagy bonebinder morphinism
endosteoma ranivorous undistinct solenodont scathingly
unfreckled unpanelled impalpably unemphatic staverwort
gradientia cystospasm xenocratic cogredient rubescence
neurolytic unrebutted saponacity brachyoura depatriate

OK, I know you want to know the 5 24-letter words, so here they are:

formaldehydesulphoxylate
pathologicopsychological
scientificophilosophical
tetraiodophenolphthalein
thyroparathyroidectomize

..and, yup, antidisestablishmentarianism wasn’t there.

This did make it easy to search for substrings of interest. For example, I wanted to
find all the words with “plex” in them. There were 54:

grep plex NonDNSWords | column -x
amplexation     amplexicaudate  amplexicaul     amplexicauline  amplexifoliate
autocomplexes   cerviciplex     complexedness   complexionably  complexional
complexionally  complexioned    complexionless  complexively    complexly
decemplex       diaplexal       diaplexus       epiplexis       euplexoptera
ganglioplexus   holoplexia      intercomplexity kataplexy       myelapoplexy
nulliplex       overcomplex     overcomplexity  perplexable     perplexedly
perplexedness   perplexingly    perplexment     phantoplex      plexicose
pleximeter      pleximetric     plexodont       plexometer      plexure
pseudoapoplexy  reperplex       retroplexed     semiamplexicaul semiduplex
sextuplex       simplexed       supercomplex    triplexity      ultracomplex
unimultiplex    unperplexed     unperplexing    veniplex

This is a bit more interesting: holoplexia.com sounds nifty, as does nulliplex.com.
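If you have several substrings in mind, it can help to see how many candidates each one has before scrolling through the lists. Here is a small sketch along those lines; count_substrings is a name invented for this example, and it relies only on grep -c, which counts matching lines.

```sh
# count_substrings FILE SUBSTRING...: for each substring, print it followed by
# the number of lines in FILE that contain it.
# Hypothetical helper, not one of the original scripts.
count_substrings() {
    file=$1
    shift
    for s in "$@"; do
        printf '%s %s\n' "$s" "$(grep -c -- "$s" "$file")"
    done
}
```

For instance, `count_substrings NonDNSWords plex` should report the same 54 as the listing above.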

So, I guess you’re wondering which one I took, right? Well, sadly,
none of them. While groveling around, I thought of a two-word critter I kinda
like: ComplexityWorks.com, so hmm..all this was a waste? I think not, but…

Scripts:

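Check a list of words with whois:

#!/bin/sh
# Arguments: [pattern] [word to start from] [word file]
pat=${1:-"^...*"}
start=${2:-a}
file=${3:-/usr/share/dict/words}
# Take the file from the first line matching $start onward, filtered by $pat.
# $(...) is used so that \$p reaches sed as a literal "$p".
words=$(sed -n "/^$start/,\$p" "$file" | grep -i "$pat")
for w in $words ; do
	# Keep only the "No match for" lines; strip everything through "for"
	# plus the next character, then the last six characters.
	whois $w.com |
	sed -n '/No match for/{s:.*for .::;s:......$::p;}' |
	tr A-Z a-z
done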

For choosing N random samples from a stream:

#!/bin/sh
samples=$1
awk -v samples="$samples" '
BEGIN {srand()}   # Seed from the clock so each run differs
{a[NR]=$0}        # Read in file
END {
        for ( len=NR; samples > 0 && len > 0; samples--) {
                i=int(rand()*len)+1   # Random index in 1..len
                print a[i]
                a[i]=a[len]           # Move the last line into the hole
                delete a[len]
                len--
        }
}'

For counting the lines of a stream by length:

#!/bin/sh
awk '
{a[length]++}
END {
	for (i in a) printf "%2i %10i %10.4f\n", i, a[i], 100*a[i]/NR
}' | sort -n
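The sampling script above holds the whole input in an awk array, which is fine for a dictionary file. For much bigger streams, a different technique, reservoir sampling, keeps only N lines in memory at a time. What follows is a sketch of that alternative, not the script used for this post; the function name reservoir is made up for illustration.

```sh
# reservoir N: read lines on standard input and print a uniform random
# sample of N of them (or all of them, if there are fewer than N).
# Hypothetical alternative to the "ran" script, using reservoir sampling.
reservoir() {
    awk -v k="$1" '
    BEGIN { srand() }                   # seed from the clock
    NR <= k { r[NR] = $0; next }        # the first k lines fill the reservoir
    {
        j = int(rand() * NR) + 1        # uniform integer in 1..NR
        if (j <= k) r[j] = $0           # keep this line with probability k/NR
    }
    END {
        n = (NR < k) ? NR : k
        for (i = 1; i <= n; i++) print r[i]
    }'
}
```

It would slot into the same pipelines, e.g. `grep ^......$ NonDNSWords | reservoir 100 | column -x`. One caveat: srand() seeds from the time of day, so two runs within the same second can return the same sample.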

I’m curious: How did you pick *your* domain?!