March 2005 Archives

Jonathan Gennick


PyCon 2005. I’ve just returned to the office and recovered from a fantastic trip to Washington D.C. last week for PyCon, the annual Python Conference. While I enjoyed the conference, my wife took the kids around to a few of the many museums and monuments that are to be found in our nation’s capital. I didn’t get in on that action, except to manage a late evening subway excursion with my daughter to see the Capitol Building, the Supreme Court, and Union Station. I love old-style train stations, and the lighted Capitol dome is gorgeous at 8:00pm on a dark evening.

The venue. I can’t say enough about how very much I like the venue for PyCon. The George Washington University area is bursting with life, so unlike many conference venues. There are shops, restaurants, and monuments, people live in the area, and things are happening. Even the food court known as J-Street on floor one of the Marvin Center (where the conference was held) was fun. My daughter and I found a great source of vegetarian sandwiches down there. Jenny’s a strict vegetarian, so this was no small discovery! And the University bookstore was in the building. How can you not like a bookstore that sells cool medical equipment like stethoscopes and reflex hammers (I forget the proper name) and what not? I came this close to buying my nine-year-old his own stethoscope (uh, Jeff, I hope you’re not reading this ‘blog). What I did find to buy was a copy of Timothy Gowers’s Mathematics: A Very Short Introduction.

Python Books. Alex Martelli and Anna Martelli Ravenscroft had only just finished revising the Python Cookbook. We managed to ship several dozen copies of the Second Edition direct from the printer to the show bookstore. If you were at the conference, you are among the first to see the book. I greatly enjoyed working with Alex and Anna on this second edition. They are excellent writers, passionate about their topic, and knowledgeable. For me, the high point of the project was the day that Alex added me to the credit list for the recipe on "Finding Last Friday". While editing the chapter on time and money, I’d read the first draft of the recipe, and was hit with an idea for improving it. I worked out some details of modular arithmetic with a friend while hiking in the Pictured Rocks National Lakeshore. Then I emailed Alex my new algorithm and put the matter out of my mind. I about fell out of my chair when he sent the chapter back with my friend’s and my name in the credit list for that recipe. It just goes to show that no matter what your Python expertise (and mine is very little indeed), you can still contribute to the cookbook. Alex, I’m honored.

For any who are curious, my friend’s and my solution is the one on page 119.
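For readers without the book at hand, here is a minimal sketch of the modular-arithmetic idea. It is not the recipe as printed on page 119; the function name last_friday and the choice to return today's date when today is itself a Friday are assumptions made for illustration.

```python
import datetime

FRIDAY = 4  # date.weekday() numbers Monday as 0, so Friday is 4


def last_friday(today):
    """Return the most recent Friday on or before `today`.

    Illustrative sketch only: (weekday - FRIDAY) % 7 is the number of
    days that have passed since the latest Friday (0 on a Friday).
    """
    days_back = (today.weekday() - FRIDAY) % 7
    return today - datetime.timedelta(days=days_back)


if __name__ == "__main__":
    print(last_friday(datetime.date.today()))
```

The modulo is what removes the need for any if/else on the day of the week: a Saturday gives 1, a Thursday gives 6, and a Friday gives 0.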

Favorite Session. There were many good sessions at the conference. Perhaps my favorites were the two back-to-back sessions on the new decimal module by Michael Chermside and Facundo Batista. Perhaps it’s because of my background in COBOL supporting a payroll system, but I’ve always thought it important to have support for true-decimal arithmetic, and so often languages seem not to provide that support. It’s nice to see it coming to Python. Later that same day, I sat in on an "open space" session led by Facundo, in which Facundo, Alex, Anna, a few others, and I discussed the possibility of creating a specific datatype for money. In the past, I’ve been skeptical of the benefit from such a type, but now I’m rethinking the idea. I’ve seen money types that are nothing more than fixed, two-digit decimal types. Those probably don’t add much value. But what if you could create a money type that combined both an amount and a currency type, so that you could store a value of USD 100 in one variable of the type and CAD 100 in another variable of the type, and the currency unit would be part of each value? What if you could somehow automate comparison of values across currency units? Well, that last is certainly an interesting challenge, isn’t it?
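To make the idea concrete, here is a rough sketch of a value that carries its currency along with a decimal amount. The Money class and its behavior are hypothetical, not anything proposed in the open-space session, and the sketch dodges the cross-currency comparison question by refusing to order or add values in different currencies.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Money:
    """Hypothetical money value: a decimal amount plus a currency code."""
    amount: Decimal
    currency: str

    def _require_same_currency(self, other):
        if self.currency != other.currency:
            raise ValueError(
                "cannot combine %s with %s without an exchange rate"
                % (self.currency, other.currency))

    def __add__(self, other):
        self._require_same_currency(other)
        return Money(self.amount + other.amount, self.currency)

    def __lt__(self, other):
        self._require_same_currency(other)
        return self.amount < other.amount


usd = Money(Decimal("100.00"), "USD")
cad = Money(Decimal("100.00"), "CAD")

print(usd == cad)                            # same amount, different currency
print(usd + Money(Decimal("0.10"), "USD"))   # exact decimal addition
```

Automating comparison across currencies would need an exchange-rate source and a date for the rate, which is exactly where the challenge lies.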

On the subject of time, Anna Martelli Ravenscroft gave an excellent presentation on The Time of Day, tackling topics such as time zone support and Coordinated Universal Time (UTC). Anna also pointed me towards what appears to be a very comprehensive resource on Timezone Information. A few years back, I did a fair bit of research into time and time zones while revising the datetime chapter in Steven Feuerstein’s Oracle PL/SQL Programming to cover the then-new time zone, timestamp, and interval support in Oracle9i Database. Time and calendars, these things are not so simple. I have a lot yet to learn, and time is a fascinating area to explore. I never knew, for example, that Detroit, the city I grew up in, once had its own time zone.

In the favorite quote department, I had to laugh out loud when Greg Lindstrom of Novasys Health commented that the largest obstacle to corporate adoption of Python is that "Python is too easy." A close runner-up was Guido’s comment during his keynote that "Perl isn’t all bad."

I met many authors at the conference whom I don’t get a chance to see often. Alex and Anna I’ve mentioned already; there were also David Ascher, Mark Lutz, Ray Lischner (of C++ In A Nutshell fame), and Abe Fettig (upcoming book on Twisted). There were many other good sessions, on Scripting the Mac with Python, on PythonCard, on Design Patterns, and many more.

I thoroughly enjoyed my two days at the conference. It was great meeting people in person whom I usually can only trade emails with. The venue was great. My wife says it’s the best vacation (for her and the kids, anyway) that I’ve put together in a long time. I can’t wait to see what next year brings.

John Adams


Here’s a set of questions of interest to very few:

What major proprietary UNIX variants would you expect to see in a medium-to-large data center? How many? Two? One? All of them? How many minor UNIX variants? How many open UNIX variants?

Should I leave the polls to someone who knows how next time?

John Adams


That’s what’s missing in the brouhaha about college applicants who took advantage of poor security to peek at confidential information.

In one corner, we have overwrought commentary, like this gem from Patricia Keefe, editor of Information Week:

Hacking isn’t just wrong, it’s a crime. As noted by MIT dean Richard Schmalensee, the students who peeked made a conscious decision to do so and invested the necessary time. Their self-interest trumped their personal ethics. And that’s what this incident really turns on. The last thing we need in this country is more unethical people coming out of business schools. Haven’t we learned anything from the last two years of corporate debauchery and scandal?…

If these schools don’t take a stand now, to what standard will they later hold these students? If these schools really believe ethics is a serious matter, then they need to reject the students who hacked.

If what those students unwisely did was criminal, then the universities should be prosecuting them. They aren’t.

It’s even a stretch to call what the students did hacking, but that’s to be expected from a business publication. Most corporations are actively distrustful of, if not hostile toward, their IT departments. It’s a not entirely rational idea which, for instance, drives much of the fervor for outsourcing. The business computing press, which should know better, expresses this point of corporate ideology by confusing cracking with hacking. Post-dot-com-boom, management believes that hackers in the original sense of the word are bad, so why not conflate them with crackers? They’re bad, too.

The off-with-their-heads brigade is balanced, if that’s the word, by the unlocked-doors-are-an-invitation-to-enter crowd. Here’s brian d foy, writing here in his weblog:

…They weren’t being sneaky or trying to get information on anyone else other than themselves.

The information each student needed to get to the application status was gladly given to them by the web pages they were already allowed to view. I don’t see any “hacking” here.

Harvard Business School calls this “unethical”. Most businesses would call it “resourceful”, but that’s just another way schools and reality diverge…

How can you say someone isn’t being sneaky who is trying to get information before it’s been officially released? Who is using a hack (not much of one, granted) to peek at information they aren’t supposed to have?

The anthropomorphism of “gladly given to them by the web pages” (web pages aren’t glad–that’s human) hides the underlying issue that the people in charge of admissions information–which is information about both the student and the university, so the students were not just looking for information about themselves–intended for the students not to have that information at that time. The university personnel involved weren’t a bit glad.

As for businesses calling this “resourceful”, I’m thinking about what would happen at, say, a telecom company where a “resourceful” employee took deliberately separated data and reporting about, say, local service and long distance service, and then aggregated them to get sales leads. That would be resourceful as long as no one knew about it, but once the FCC realized that information which, by law, is not supposed to be aggregated had been, the consequences could be substantial. We’re talking millions of dollars in penalties here.

So, back to that sense of proportion. What these applicants did was wrong. It’s just not so wrong as to be a disqualification.

What they did wasn’t that different from what I do when I get a malformed URL to a news site–if I feel it’s justified, I poke around by altering the URL and seeing whether I can find what I’m looking for. What’s accessible on a public server is probably intended for public viewing, and trying to find that isn’t unreasonable–I’d even call it resourceful. In this case, though, the applicants who peeked were consciously trying to find out information they knew (or should have known) was intended not to be public.

What would be proportionate?

Well, what are the universities doing internally to the people responsible for the information leak? Are they firing directors of admission? Are they terminating contracts with ApplyYourself, or suing them for exposing private information? If so, then perhaps rejecting otherwise qualified applicants is fair. Are they doing so? If they are, I haven’t heard about it.

Are there “lessons learned” sessions for university employees who contributed to this screwup? There should be–and perhaps the applicants who peeked should be a part of those sessions. Maybe they should have to show up for school a few days early and spend some time living in the real world (ha!) of meetings and get their head cheese processed. That’s more reasonable, more fair than outright rejection.

The admissions departments might learn something about proportion from this process, as well. At prestigious schools, the admissions process has been turned into a circus. (Again, this comes down to corporate ideology, this time intruding itself into academia.) The process of admissions is deliberately and unnecessarily mystified, and some brave university that hasn’t yet been stampeded into Fudd-like “Kill the wabbit hacker student!” reaction should take this as a wake-up call to make admissions more transparent.

If Empire State decides in January that it might be best not to admit both Reed Richards and Victor von Doom, and that, as von Doom is a legacy student, Richards needs to make do with MIT, then what is the point of making Richards wait until April to hear about it? Mystique, hoopla, and branding–that’s all. There’s no educational purpose served by stretching things out–it’s inter-university corporate gamesmanship, the educational equivalent of what I saw succinctly described on Slashdot as “marketecture”.

Universities should also examine whether the corporate ideology that drives much outsourcing in business is affecting their decisions about outsourcing, say, parts of the admissions process. Is it really necessary to have a company handle your admissions for you? Is it an appropriate way to deal with sensitive information? Mightn’t that be better handled in-house? Or through a cooperative effort among universities? Perhaps an open-source system for handling admissions, peer-reviewed with security and privacy in mind, might be in the interest of both the universities and the applicants.

What the applicants who peeked did was wrong–no security model doesn’t mean no obligation to act ethically–but the greater wrong was committed and the greater harm done by those who allowed confidential information to be exposed, and there’s where the primary obligation to act, to repent, to reform lies.

Did you peek at my draft of this weblog before it was published? If you could have, would you have done so? If you had, should I have been offended?

Jonathan Gennick


Day 3 of The Hotsos Symposium.
The last day. I began it with Lex de Haan’s session
on “Null Values: Nothing to Worry About”. A nice title, but it turns out that
nulls are something to worry about, and they’re not going away, so we
need to be cognizant of the sort of trouble they can cause. Lex pointed out
many cases in which the possibility of nulls can lead to subtle issues that
you must consider when writing a query. For example, suppose you wish to find
all employees in the scott/tiger emp table who have no subordinates. The
following query, using NOT IN, returns no rows:

select e1.*
from   emp e1
where  e1.empno NOT IN
      (select e2.mgr
       from   emp e2);

But use NOT EXISTS and you do get rows back:

select e1.*
from   emp e1
where  NOT EXISTS
      (select 'x'
       from   emp e2
       where  e2.mgr = e1.empno);

Which query is correct? The NOT IN query fails to return rows because there
is one (and only one) employee in the emp table with a null in the mgr column.
Who does that employee report to? It could be anybody, and, thus, you can’t
really know for sure whether any other given employee has subordinates. Which
query is correct comes down to whether you consider null to mean “has no
manager” or “we don’t know the manager”. And if you use null for both those
cases, well, then you tell me which of the above queries gives the “correct”
results.
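The effect is easy to reproduce outside Oracle. The sketch below uses Python's built-in sqlite3 module as a stand-in database and a tiny, made-up emp table (four rows, not the real scott/tiger data); SQLite applies the same three-valued logic to NOT IN, so the two queries diverge in the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER PRIMARY KEY, mgr INTEGER)")
# Employee 1 is the top of the tree and has a null mgr.
conn.executemany(
    "INSERT INTO emp (empno, mgr) VALUES (?, ?)",
    [(1, None), (2, 1), (3, 1), (4, 2)],
)

# empno NOT IN (..., NULL) can never be true: at best it is unknown.
not_in_rows = conn.execute(
    """
    SELECT e1.empno
    FROM   emp e1
    WHERE  e1.empno NOT IN (SELECT e2.mgr FROM emp e2)
    """
).fetchall()

# NOT EXISTS only asks whether a matching row was found.
not_exists_rows = conn.execute(
    """
    SELECT e1.empno
    FROM   emp e1
    WHERE  NOT EXISTS (SELECT 'x' FROM emp e2 WHERE e2.mgr = e1.empno)
    ORDER  BY e1.empno
    """
).fetchall()

print("NOT IN:    ", not_in_rows)
print("NOT EXISTS:", not_exists_rows)
```

In this toy data, employees 3 and 4 manage nobody, yet only the NOT EXISTS form reports them.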

Lex also spoke later in the day about the ISO SQL standard in a
presentation titled “Writing Portable SQL”. One fascinating tidbit Lex pointed
out was that if you set event 10407, you get access to a TIME datatype. I’ve
long surmised the existence of such a datatype, as Oracle’s support (beginning
in Oracle 9i Database) of ISO TIME literals practically demands that
such a datatype exist. Else how would the Oracle kernel be able to evaluate
expressions involving such literals? Please don’t use the TIME datatype in any
production code though. It’s not supported. Maybe someday.

Tom Kyte gave an excellent presentation on the importance of using
bind variables in OLTP applications. His demonstrations of their importance to
scalability were most convincing. In one example, 10 sessions using a single
SQL statement with bind variables were able to insert 25,000 records each into
a table in about the same time that a single session was able to insert 25,000
records without using bind variables. Tom then demonstrated the
non-linear negative impact on scalability when many sessions at once
are not using bind variables. He also pointed out that bind variables sidestep
many SQL injection problems, because you are not stringing together
user-supplied values in order to build up SQL statements, and thus users
cannot slip in their own SQL text. Yet bind variables aren’t always the right
answer. For data warehousing queries you are often best off not using bind
variables. In the end, Tom suggested the following rule-of-thumb: “seconds per
query: don’t bind, queries per second: do bind”.
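The SQL injection point can be shown in a few lines. This is only a sketch using Python's sqlite3 module as a stand-in for Oracle, with a hypothetical accounts table; it says nothing about Tom's timing results, which depend on Oracle's shared pool and parsing behavior.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (owner TEXT, balance INTEGER)")
conn.executemany(
    "INSERT INTO accounts (owner, balance) VALUES (?, ?)",
    [("alice", 100), ("bob", 200)],
)

# A value a malicious user might type into a form field.
user_input = "nobody' OR '1'='1"

# Stringing the value into the statement lets it rewrite the WHERE clause.
unsafe_sql = "SELECT owner FROM accounts WHERE owner = '" + user_input + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# With a bind variable (the ? placeholder) the value is only ever data.
safe_rows = conn.execute(
    "SELECT owner FROM accounts WHERE owner = ?", (user_input,)
).fetchall()

print("concatenated:", len(unsafe_rows), "rows")
print("bound:       ", len(safe_rows), "rows")
```

The bound form also keeps the statement text identical from one execution to the next, which is the property that lets a database reuse a parsed statement.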

Cary Millsap’s presentation on “How to Make an Application Easy to
Diagnose” was my last for the day, and my last for the Symposium. I should
have taken better notes for this one. Cary gave his thoughts on instrumenting
applications that you write, so that they can be traced. In his opinion:

  • Trace files should be designed to economize on space. Rather than:

    date=10-Mar-2005, time=10:00:00.00am, ela=.05, cpu=.01

    Cary prefers a format that does not repeat the labels:

    10-Mar-2005, 10:00:00.00am, .05, .01

  • Trace files should all begin with some sort of “key line” that describes
    the format of the data that follows. For example:

    date, time, ela, cpu
    10-Mar-2005, 10:00:00.00am, .05, .01
    10-Mar-2005, 10:00:00.05am, .04, .02
    ...

    This enables the trace file format to change while making it possible to
    recognize which format is used in a given file.

  • Users must have the option to initiate and stop tracing, to facilitate
    capturing trace data of the correct scope.
  • The option must exist to write trace files in an unbuffered fashion, so
    that trace data can be viewed in real-time.
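
The four guidelines above can be sketched in a few lines of Python. The TraceWriter class below is hypothetical, written for illustration rather than taken from Cary's talk: it writes one key line, then label-free rows, can be started and stopped by the caller, and flushes after every line so the output can be watched as it grows.

```python
import io


class TraceWriter:
    """Hypothetical tracer: one key line, then compact unlabeled rows."""

    def __init__(self, stream, fields):
        self.stream = stream
        self.fields = list(fields)
        self.enabled = False
        self._key_written = False

    def start(self):
        """Turn tracing on; the key line is written once per stream."""
        self.enabled = True
        if not self._key_written:
            self._emit(self.fields)
            self._key_written = True

    def stop(self):
        """Turn tracing off; later record() calls are ignored."""
        self.enabled = False

    def record(self, *values):
        if not self.enabled:
            return
        if len(values) != len(self.fields):
            raise ValueError("expected %d values" % len(self.fields))
        self._emit(values)

    def _emit(self, values):
        self.stream.write(", ".join(str(v) for v in values) + "\n")
        self.stream.flush()  # push each line out immediately


buf = io.StringIO()
trace = TraceWriter(buf, ["date", "time", "ela", "cpu"])
trace.record("10-Mar-2005", "09:59:59.00am", ".03", ".01")  # ignored: not started
trace.start()
trace.record("10-Mar-2005", "10:00:00.00am", ".05", ".01")
trace.stop()
trace.record("10-Mar-2005", "10:00:00.05am", ".04", ".02")  # ignored: stopped
print(buf.getvalue(), end="")
```

Passing a real file opened for writing in place of the StringIO buffer gives the same format on disk.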

Cary also showed how DBMS_MONITOR can be used in conjunction with
DBMS_APPLICATION_INFO to trace applications that you write. Use DBMS_MONITOR
to start and stop tracing. Invoke DBMS_APPLICATION_INFO from within your
application to keep track of where you are in the application.
DBMS_APPLICATION_INFO calls get logged to the trace file.

And that’s it! As I write this, I hear the hotel’s housekeeping staff
closing in on my room. I’ve got to get out of here. I’ve got a plane to catch.
I can’t wait for next year’s symposium.

Jonathan Gennick


For me, day 2 of The Hotsos Symposium
(http://hotsos.com/events/SYM05.php?event_id=36) began with the incongruous
sight of a man wearing a jester hat, with blinking lights no less, speaking in
detail about the practice of performance profiling to an audience of expert
DBAs who were all paying rapt attention. It was all in good fun though, and
helped to raise several hundred dollars for Donnie Nelson’s Assist Youth
Foundation
(http://www.nba.com/mavericks/community/Donnie_Nelsons_Assist_Youth_Foundation.html).


Cary Millsap sacrifices a bit of dignity to raise
funds for Donnie Nelson’s Assist Youth Foundation.

(Photo courtesy of Carel-Jan Engel)

Steve Adams gave two of his usual high-quality,
technically deep presentations. The first covered the
internals of hash-join processing. His second was on single-table
hash-clusters, which provide the most efficient possible Oracle data access
path. You can create such a table, designate the primary key column as the
hash key, and Oracle can then translate a primary key value directly to a hash
value that leads directly to the slot (block and location within the block)
for the row in question. For all this to work, it’s important to have:

  • A single row per hash key.
  • One row slot per key.

Without the above, you take a hit on performance as Oracle will need to scan
all slots in the given block. Thus, if you cannot achieve a one-to-one
correspondence between hash keys and rows, you should specify the minimum
possible block size of 2K, to minimize the size of the blocks to be scanned.

Karen Morton gave an excellent presentation on setting up an effective
Oracle test environment. She spoke on the need to record test runs, and to
also record before and after values of key statistics for each run. She
demonstrated scripts to do these things. Her scripts also recorded trace data
for each run by using the external table interface to load trace files into
LOB columns. Karen also spoke on the need to eliminate, in the test
environment, the need for DBAs to intervene. For example, it’s best if
developers can get their own trace files. The easier it is for developers to
use the test environment, the more benefit will accrue from it. Karen also
talked about things to look for when testing the performance of SQL
statements. For example, higher latch usage translates into lower
scalability. Towards the end, Karen walked through an actual case study in
which high redo generation from a statement ultimately led to the discovery of
a data skew problem that was adversely affecting performance.

And speaking of data skew, that was the subject of Dominic Delmolino’s
presentation, in which he recounted problems encountered by his company in
implementing a billing software package that had originally been designed for
a different type of industry. It turned out that the design of the software
itself was biased for the type of data skew typically encountered in telephone
billing. Since he wasn’t doing telephone billing, his data was skewed the
wrong way, the application assumptions did not apply, and significant
performance problems were the result.

Bruce McCartney showed off an
innovative technique (IMHO) by which he can store ALTER SESSION statements in
a table, link those statements to certain users and session characteristics,
and then those statements are executed by a logon trigger when a match occurs.
Getting back to skew again (must be a common problem this year!), Bruce talked
about how he used the technique to enable stored outlines only for those
users of a packaged application who happened to be dealing with data skewed
differently from what the rest of the users were dealing with. Bruce talked
about a number of other ways to do session-level tracing and tuning, showing how to
write data to the database alert log, how to write data into trace files, how
to use DBMS_APPLICATION_INFO to log application modules and actions, and
more.

Day 2 was also the day for the Oracle-L list dinner. Alas, I didn’t go. I had
eaten too much lunch and was tired, and I had phone calls to return, so I
mostly relaxed in the hotel room, in between those pesky phone calls.

Speaking of food though, let me just end today’s report by complimenting the
Hotsos food selection person. Meals at this conference have been very
satisfactory. Everything is well-chosen and well-prepared too. Several whom
I’ve sat with at lunch have commented on how happy they are with the
meals.

Ok. That’s it. It’s morning of Day 3 as I write this, and I’m off to
breakfast and another full day of great sessions…

Jonathan Gennick


I’ve just finished day one of my hands-down favorite Oracle conference: The
Hotsos Symposium (http://hotsos.com/events/SYM05.php?event_id=36).
Focused on issues of performance optimization, the Symposium is held yearly near
Dallas, Texas. Speakers are top-notch and include leading lights in the Oracle
performance space: Dan Tow, Tanel Põder, Cary Millsap
(http://hotsos.com/e-library/oop.html), Wolfgang Breitling,
Lex de Haan, Tom Kyte (http://asktom.oracle.com/), and many more. It’s not a large
conference, so there’s plenty of opportunity to mingle and talk one-on-one
with the different speakers.

Tom Kyte gave a fascinating talk entitled “SQL Techniques”. SQL is
one of my favorite tech topics. During his talk, Tom showed a technique for
writing a row-generator, a problem I’ve written about once or twice myself
(http://five.pairlist.net/pipermail/oracle-article/2004/000008.html). The
technique Tom showed came from an “Ask Tom” reader named
Mikito Harakiri and is the result of some far-out-of-the-box thinking about
CONNECT BY. For example, here’s a query that uses CONNECT BY to generate a
list of days in the current year:

-- ADD_MONTHS finds next January 1 even when the query runs on December 31
SELECT TRUNC(SYSDATE,'YEAR') + LEVEL - 1
FROM DUAL
CONNECT BY 1 = 1
AND LEVEL < ADD_MONTHS(TRUNC(SYSDATE,'YEAR'),12) - TRUNC(SYSDATE,'YEAR') + 1;

I've collected several row-generator techniques over the years, and I've
seen one or two interesting applications of CONNECT BY, but this particular
application of CONNECT BY might be the most creative solution to the
row-generation problem that I've yet seen.
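
CONNECT BY is Oracle-specific, so for anyone who wants to experiment without an Oracle instance, here is the same row-generation problem solved a different way: a recursive common table expression, run through Python's sqlite3 module. This is a swapped-in technique, not the one Tom showed, and the days_in_year function name is made up for the sketch.

```python
import sqlite3


def days_in_year(any_day):
    """Return every date (as ISO text) in the year containing any_day.

    any_day is an ISO date string such as '2005-03-10'. The rows are
    generated by a recursive CTE in SQLite, not by Oracle's CONNECT BY.
    """
    conn = sqlite3.connect(":memory:")
    sql = """
        WITH RECURSIVE days(d) AS (
            SELECT date(:day, 'start of year')
            UNION ALL
            SELECT date(d, '+1 day')
            FROM   days
            WHERE  d < date(:day, 'start of year', '+1 year', '-1 day')
        )
        SELECT d FROM days
    """
    rows = conn.execute(sql, {"day": any_day}).fetchall()
    conn.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    days = days_in_year("2005-03-10")
    print(len(days), days[0], days[-1])
```

As with the CONNECT BY version, no real table is read; the rows come entirely from the recursion.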

Next up was Tanel Põder who spoke on "Advanced Research Techniques in
Oracle". Tanel demoed a technique that uses a Unix pipe and some external
programs to cause trace results to display automatically after executing a
query in SQL*Plus. The technique works for any sort of trace, including 10046
traces. Tanel further could specify criteria restricting the trace results
displayed to only those lines he was interested in. Tanel also demoed a way
to invoke an external debugger in response to an event.

Guđmundur Jósepsson offered the following advice to database
administrators and developers on resolving performance problems:

  • Work together
  • Know what an application is doing
  • Collect facts
  • Don't fight symptoms, solve the problem!

Guđmundur also presented a case study involving the use of the Hotsos Profiler (http://hotsos.com/products/profiler.html) to
identify the problem portions of a complex and business-critical query
involving eight views nested seven levels deep that originally scanned over
80,000 rows for each row returned. The DBAs and developers worked together to
rewrite the query in a way that avoided using such complex and generic views.
I wish I had written down the magnitude of the improvement; it was
significant.

Lex de Haan gave a detailed review of all the various Flashback
features in Oracle Database 10g. These include flashback queries,
version queries, and the ability to "rewind" the changes on a specific table
or on the database as a whole. Flashback can be very useful in certain types
of situations that might otherwise require more complicated, point-in-time
recovery. Interestingly, when you flash back a table, if you record the
original system-change-number, you can flash back your flashback, effectively
flashing forward to the table's original state.

The last presentation of the day was from Dan Tow, author of the
O'Reilly SQL Tuning book. Dan contrasted Oracle's cost-based optimizer (CBO)
with human optimization, enumerating ways in which humans can tune that the
CBO cannot approach. For example, the CBO is not able to make any optimization
that might
change the semantics of a query. However, a human can look at the larger
picture, and, knowing the characteristics of the data, might be able to spot
corner-cases that can be safely changed. For example, consider the following
query similar to one of Dan's examples:

SELECT *
FROM a, b
WHERE a.primary_key = b.foreign_key
AND a.primary_key = :x;

The issue in this case is that a.primary_key is a numeric column while
b.foreign_key is a character column. This is the sort of suboptimal design
that you sometimes just have to deal with. Oracle will implicitly convert as
follows:

WHERE a.primary_key = TO_NUMBER(b.foreign_key)

However, converting in the other direction is, in the case that Dan was
discussing, far more efficient:

WHERE TO_CHAR(a.primary_key) = b.foreign_key

The semantics of these two predicates are not the same! Consider that
b.foreign_key might contain a value such as “14.0”. However, in the specific
application that Dan was tuning, such values represented a corner case that
never occurred, that could be safely ignored, and that he could protect
against with a constraint. Thus, he was able to achieve a significant
performance gain that the optimizer could not have achieved, because
Oracle’s cost-based optimizer, unlike humans, is not allowed to change the
meaning of a query, not even in the corner cases.
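
The “14.0” corner case is easy to see in miniature. The sketch below imitates the two predicates in plain Python, with Decimal standing in for Oracle's TO_NUMBER and str standing in for TO_CHAR; the function names are invented, and Oracle's actual conversion rules have more wrinkles than this.

```python
from decimal import Decimal


def matches_as_numbers(primary_key, foreign_key):
    """Mimics a.primary_key = TO_NUMBER(b.foreign_key)."""
    return Decimal(primary_key) == Decimal(foreign_key)


def matches_as_text(primary_key, foreign_key):
    """Mimics TO_CHAR(a.primary_key) = b.foreign_key."""
    return str(primary_key) == foreign_key


for fk in ("14", "14.0"):
    print(fk, matches_as_numbers(14, fk), matches_as_text(14, fk))
```

For “14” both comparisons agree; for “14.0” the numeric comparison matches and the text comparison does not, which is why the rewrite is only safe when such values are known never to occur.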

The day ended with an excellent dinner and a Mardi Gras themed party during
which Mogens Nørgaard gave an
absolutely hilarious bit of impromptu, stand-up comedy. I love this
conference.