
Electronic Archaeology

Glimpse

Glimpse is a high-speed text-indexing system. It scans all your files and builds a database of words and their locations. Then, when you ask where a word occurs, it only has to look the word up in the database and display the results.



The advantage is speed: even searches through huge volumes of text are quick and easy.

But there are some disadvantages. First, if you change a file, the change is not reflected in the database until you rebuild the index, so the information in the database may be out of date.

Second, building the database takes time.

Finally, Glimpse is distributed under a restrictive license, which may prevent you from using it in certain circumstances.

Glimpse in Action

Let's see how to use Glimpse on a large set of source files. For this example, we've downloaded the source to OpenOffice.org, the largest single GPL project we know of.

First, we run the glimpseindex command to build the index.

find oo_1.0.1_src \( -name '*.h' -o -name '*.c*' \) -print | \
    glimpseindex -H /home/sdo/muck/tools -F

-H
Specify directory for the database files.

-F
Read the list of files to index from standard input.

To perform the search we use the glimpse command:

glimpse -H /home/sdo/muck/tools -n linux

-H
Specify the location of the database.

-n
Print the line number where the match occurs.

The following is a sample glimpseindex run:

find oo_1.0.1_src \( -name '*.h' -o -name '*.c*' \) -print | \
    glimpseindex -H /home/sdo/muck/tools -F
This is glimpseindex version 4.16.2, 2002.
Indexing "oo_1.0.1_src/common/english_us/custom.css" ...
Indexing "oo_1.0.1_src/sbasic/english_us/sbasic.cfg" ...
Indexing "oo_1.0.1_src/parser_i/tokens/tkpcont2.cxx" ...
...
Indexing "oo_1.0.1_src/parser_i/tokens/tkpstam2.cxx" ...
Indexing "oo_1.0.1_src/autodoc/source/tools/tkpchars.cxx" ...
Size of files being indexed = 137736911 B, Total #of files = 7600
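For scale, the totals on the last line of that run work out to roughly 131 MiB of text, averaging about 18 KB per file:

```latex
\frac{137\,736\,911\ \text{B}}{2^{20}\ \text{B/MiB}} \approx 131.4\ \text{MiB},
\qquad
\frac{137\,736\,911\ \text{B}}{7600\ \text{files}} \approx 18\,123\ \text{B per file} \approx 17.7\ \text{KiB per file}
```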

Now that we have the index, we can search it. Here's an example:

glimpse -H /home/sdo/muck/tools -n linux
Your query may search about 24% of the total space! 

Continue?  (y/n) y
oo_1.0.1_src/dbaccess/source/ui/browser/dsbrowserDnD.cxx: 1279:  
 *  #65293# linux ambiguity
oo_1.0.1_src/dbaccess/source/ui/dlg/dsselect.cxx: 223:  
 *  #65293# cant compile for linux
oo_1.0.1_src/dbaccess/source/ui/dlg/indexdialog.cxx: 913:  
 *  Some error checked for linux
oo_1.0.1_src/dbaccess/source/ui/dlg/odbcconfig.cxx: 352:  
 *  #65293# cant compile for linux
oo_1.0.1_src/dbaccess/source/ui/dlg/sqlmessage.cxx: 603: 
 *  Syntax error with linux compiler
oo_1.0.1_src/svx/source/fmcomp/fmgridif.cxx: 1642: 	
 // the same props as in addColumnListeners ... linux has 
    problems with global static UStrings, so

Indent

As code evolves, different people work on it. If the people in charge don't enforce a standard indentation style (and most don't), different programmers will use different indentation techniques. This makes the code difficult to read and understand.

By running the code through indent, however, you can standardize the style. The result is code that is easier to understand.

Note that indent doesn't work on every program; some syntax can still fool it.

The indent program came in handy once when I had to deal with "Jeff" code. Jeff was an unusual programmer who believed it was best to put the maximum amount of code on the screen at one time. As a result, he used no more whitespace than he had to: his code started in the first column and ran all the way to the right margin. He also didn't believe in comments. After all, the code was what counted, and he got as much of it on the screen as possible.

The result was something unreadable, except to Jeff:

BOOL SbiParser::Parse() { if(bAbort) return FALSE; EnableErrors(); 
Peek(); if(IsEof()) { if( bNewGblDefs&&nGblChain==0) 
nGblChain=aGen.Gen( _JUMP, 0); return FALSE; } if(IsEoln(eCurTok)) 
{ Next(); return TRUE; } if (!bSingleLineIf&& MayBeLabel(TRUE))
{ if(!pProc) Error( SbERR_NOT_IN_MAIN,aSym); else pProc-> 
GetLabels().Define( aSym); Next(); Peek(); if( IsEoln(eCurTok)) 
{ Next(); return TRUE; }} if( eCurTok==eEndTok) { Next(); if(eCurTok!=NIL)
aGen.Statement(); return FALSE; } if (eCurTok== REM) { Next(); return TRUE; }
if (eCurTok==SYMBOL ||eCurTok==DOT) { if (!pProc) Error( SbERR_EXPECTED,SUB );
else { Next(); Push( eCurTok); aGen.Statement(); Symbol(); }}}

When Jeff left the company, I inherited his code. The first thing I did was run it through indent. The result was something I could deal with:

BOOL SbiParser::Parse ()
{
    if (bAbort)
	return FALSE;

    EnableErrors ();
    Peek ();

    if (IsEof ())
    {
	if (bNewGblDefs && nGblChain == 0)
	    nGblChain = aGen.Gen (_JUMP, 0);
	return FALSE;
    }
    if (IsEoln (eCurTok))
    {
	Next ();
	return TRUE;
    }
    if (!bSingleLineIf && MayBeLabel (TRUE))
    {
	if (!pProc)
	    Error (SbERR_NOT_IN_MAIN, aSym);
	else
	    pProc->GetLabels().Define(aSym);

	Next();
	Peek();
	if (IsEoln(eCurTok))
	{
	    Next();
	    return TRUE;
	}
    }
    if (eCurTok == eEndTok)
    {
	Next();
	if (eCurTok != NIL)
	    aGen.Statement ();
	return FALSE;
    }
    if (eCurTok == REM)
    {
	Next();
	return TRUE;
    }
    if (eCurTok == SYMBOL || eCurTok == DOT)
    {
	if (!pProc)
	    Error (SbERR_EXPECTED, SUB);
	else
	{
	    Next();
	    Push (eCurTok);
	    aGen.Statement();
	    Symbol();
	}
    }
}

As you can see, indent turned "Jeff" code into something that can actually be maintained.
