a couple of weeks ago, a friend of mine here at BEA from the NetUI team (Eddie O'Neil) sent me email saying he'd been playing around with MySQL and WLS, and that he thought it'd be interesting to post that work so that people could have an easy path to follow when trying to get the two working together. i agreed of course, so i told him to send me what he had and i'd take it, wrap some words around it, and put it online. if life were only that simple <g>
that was over 3 weeks ago. he sent me something that worked, i spent 5 minutes reducing it to a smoking mess and the rest of the 3 weeks getting everything going again. what took so long? some of it is just me, in that i've got a particular way i want things to work so that what seems simple winds up being quite hard (and i can be a tiny bit stubborn, so when i hit a rough patch i tend to keep going forward, when all sense would dictate that i go back and try something else), and some of it is that there are a number of subtleties that make it hard all by itself. we'll wind up covering both here, and see which is which
the first thing i always want to do when trying to show how something works is to use it to do something. somehow i just find i don't understand something, and hence can't explain it, until i've done something "real" with it. but what to do, what to do?
TinyURL
i don't know about you, but i've been seeing people using tinyURLs just about
everywhere lately. if you've never heard of them, they perform a very simple
but useful task; they turn big URLs into small ones. tiny ones, in fact <g>.
the big problem this solves is the line-breaking-in-email problem. i'm sure
you've all experienced a situation where you've gotten a piece of email
containing a "long" URL (like a Yahoo map or a Google query) where somewhere
along the way the URL got split onto multiple lines, and part of the link gets
"broken off" so when you click on the link, it doesn't take you where it
should. you wind up having to paste twice or some such manual hackery. and if
you sent the ling to someone who's a bit clueless, you'll get a "your link
didn't work" reply back
TinyURL takes care of this problem by allowing you to submit your "long" URL and they return a "tiny" one. this can be used pretty much anywhere the long one could have been, and since it's really short, it pretty much never gets split or broken. it has the added benefit that you can use it places that a long one wouldn't fit, like in your IM status
what do TinyURLs have in common with MySQL and WLS? nothing in particular, but i thought it would make a cool, simple example of a data based web-app that did something useful, all by itself, in not too many lines of code, so that's what i decided to build
how does TinyURL work?
before i could write a line of code, i had to figure out how TinyURL worked.
so the first thing i did was submit the following URL,
http://www.oreillynet.com/pub/au/1303,
and they returned
http://tinyurl.com/sr66. not a huge
reduction in size, but enough for you to see what they do. and when i chase the
URL they returned, i wind up at the page i submited (which happens to be my
bio on ORN)
so the next thing i did was use wget to see what was actually going back and forth on the wire to make this dance work. here's what i found
$ wget http://tinyurl.com/sr66
--14:29:19-- http://tinyurl.com/sr66
=> `sr66'
Resolving tinyurl.com... done.
Connecting to tinyurl.com[216.234.186.14]:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://www.oreillynet.com/pub/au/1303 [following]
--14:29:19-- http://www.oreillynet.com/pub/au/1303
=> `1303'
Resolving www.oreillynet.com... done.
Connecting to www.oreillynet.com[208.201.239.37]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
[ <=> ] 19,325 36.79K/s
14:29:20 (36.79 KB/s) - `1303' saved [19325]
so looking closely at what happened, wget chased
http://tinyurl.com/sr66, only to have the server return a 302 and
another URL, http://www.oreillynet.com/pub/au/1303. a
302
response code means, in essence, that the server understands what you're
asking for, but wants you to know it's temporarily someplace else, and
that place is at this URL i'm handing you. so the user-agent (read: browser, or
more properly, http-client) should now rerequest the document from this new
location. which of course is what wget does, returning the final target
document. nice
MySQL
now that we know how TinyURL works from an HTTP protocol point-of-view, we need
to start thinking about what kind of storage needs we're gonna have. first,
we're going to need to accept a URL (let's call this the userUrl) and
return a URL (let's call this the tinyUrl). sounds pretty simple. we
create a table with two columns (tinyUrl, userUrl), uniquely indexed on
tinyUrl. ok, that seems simple, just create a table like that, both columns as
TEXT, and then... wait a minute. er, exactly how did that tinyUrl get created?
i mean, how do we create a unique tinyUrl for one userUrl, and a different
tinyUrl for a different userUrl? and also make sure that it works properly in
the face of simultaneous requests, which compounds the problem of trying to
create something unique by introducing synchronization issues into things. i
mean, what good would it be to make sure we didn't duplicate any "already
created" tinyUrls, if we didn't also make sure that we didn't accidentlly
create duplicates because the tinyUrl generation code was run twice at the same
time?
fortunately for us, databases turn out to be great at helping to solve these kinds of problems. first off, they're built to handle simultaneous requests and enforce all sorts of interesting rules about how to deal with concurrency. but they also understand how to play a role in creating uniqueness, like customer-id used as the primary-key in a database. i'm not going to try to cover these concepts in a general way, as there are (quite large) books dedicated to these subjects <g>. but i'll drill into the parts we need to make our service work
AUTO_INCREMENT
MySQL has this cool feature where you can declare an integer column as
AUTO_INCREMENT. when you insert a new row into a table with a column like this,
MySQL determines what the current MAX value in the column is, adds 1 to it, and
uses that as the value for that column in the new row. the great thing about
using this feature is that MySQL is taking care of both problems we were
dealing with, namely how to come up with the "next" tinyUrl to use, and also
how to do it in such a way that simultaneous requests don't result in the same
tinyUrl being created. MySQL, like almost all database products in existence,
handle concurrency as a matter of course
of course, this changes the code we're going to need to write somewhat. instead of storing tinyUrl as a string it'll be an integer in one of these AUTO_INCREMENTing columns. so where does the URL come from? it's interesting to notice that most of what's returned to the user to use as a tinyUrl is always the same. the only part that's really unique is that little bit at the end. in the above example where we used the real TinyURL, that was just sr66
given our new desire to store this little unique bit as an integer, it would make things nice and simple if we just turned it into a string, so that the integer 1 would turn into the string "1". and so our URLs would look like .../1, with the ellipsis (...) being the "static" part of the URL, the part that never changes. hmmmmm, that is simple, why doesn't TinyURL do that?
i have no idea how TinyURL is actually written, although some friends and i have speculated on this a few times. but i think it's safe to assume they are doing something similar to what i'm describing. but as we already know, the unique part of their URL doesn't look like an integer. or does it?
most of the developers reading this will be familiar with hexadecimal notation, or HEX for short. HEX is a base-16 number format, which means there are 16 different values possible in any given "place" in a number. as opposed to the base-10 system we (or must of us) grew up with where the values are 0 through 9 (0, 1, 2, 3, 4, 5, 6, 7, 8, 9), in HEX the values are 0 through f (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f). but things don't really have to stop there...
sexatrigesimal
welcome to sexatrigesimal <g>. that's the correct name (i think) for
a base-36 number format. googling around, i found a page on the
"Names of bases for
number systems". and while the page does not provide a name for base-36
directly, i think i understand the rules being used and base-36 would be named
sexatrigesimal, or what i'm going to call STG for short
in STG the values are 0 through z. in other words, instead of stopping at f the way HEX does, it keeps going until z. of course the next question would be, why bother? why not just use decimal or HEX?
the answer would be, to keep things tiny! taking the case above where TinyURL returned sr66, and assuming i'm doing the math correctly, and also assuming that i'm correct about how this string is encoded, i'd say that that value represents 1386134. or a shortening of almost 50% over the decimal notation (from 7 characters to 4). and it should only get better over time, as the number of pages just keeps getting larger
one other question some of you may be asking is, "ok, but why not keep going with this? why not split the alphabet into upper and lowercase letters to yield another 26 values to our number format (that'd get us base-62), and there are lots of other characters that could be used in the path of a URL, like ('.', '/', '&', etc), why not use them too? we could get up to base-70 or maybe 90? the only answer i can think about that makes sense is that in addition to making URLs a lot smaller, this is also supposed to make things easier (which is a reason near and dear to my heart). and it'd be a real pain to type or copy (or speak aloud?) "http://tinyurl.com/s.S&", don't you think? and just to double-check and see what TinyURL actually does, i hit them with the URL i'd gotten back from them earlier, but with the sr66 up cased to SR66, and they still bounce me to the same page. so it at least appears like they're using something like STG as their format
whew, what a rathole, eh? <g> ok, now let's see where we are. we know how TinyURL does their tricks with the browser, how they encode their URLs (or at least how we're going to do it), how we're going to interact with MySQL to help us generate unique URLs and deal with concurrency, is there anything else, or can we just start doing stuff yet? i think we're ready to go now, let's build it
MySQL install and startup
first, we need to get MySQL installed and ready to run. since i'm doing this on
RH9 it's drop-dead-simple, since MySQL comes as part of the RH distribution. so
as root, run the following commands to get the RPMs installed:
$ sudo rpm -i mysql-3.23.54a-11.i386.rpm perl-CGI-2.81-88.i386.rpm \
perl-DBI-1.32-5.i386.rpm perl-DBD-MySQL-2.1021-3.i386.rpm
$ sudo rpm -i mysql-server-3.23.54a-11.i386.rpm
the other nice thing about using the MySQL that comes with RH9 is that it's been setup to run as a service in the *nix style. so once installed, again as root, all you need to do to get it running is to run the following command:
$ sudo /sbin/service mysqld startyou can stop it the same way (but replace the word start with stop). and if you want MySQL to start at boot time, then the following command will make sure that happens too:
$ sudo /sbin/chkconfig mysqld onnow that MySQL is installed and running, we can install and run the 'tiny' service i've written
download and install the 'tiny' app
if you want to install the service yourself, or just use the scripts and code
for your own purposes, it's all packaged into a zip that you can
download. i'm going to
walk through the steps to install it, and after that walk through in detail how
it works
$ sudo /sbin/service wls stopnote that the rest of these instructions are all relative to that install. if you've got WLS someplace else, translate accordingly
$ sudo unzip mysql-connector-java-3.0.9-stable.zip \
-d /bea/weblogic81/common/eval/mysql
there's a file, startscript.xml in the directory /bea/user_projects/domains/wls/_cfgwiz_donotdelete. the config-wizard will honor and merge any changes it needs to make with any changes you need to make. so edit this file and add the bold line you see below to the same spot in the file as it appears here (we've put this bit in a separate window because of the long code lines).
$ unzip tinyapp.zip $ cd tiny
$ sudo ./tinySetup
$ sudo /sbin/service wls start
it's time now to dive into how all the pieces fit together. here we go...
tinySetup
let's take a look at the source to tinySetup and see what it does
#!/bin/bash # make weblogic/weblogic a (super)user in the mysql system mysql --user=root < userSetup.sql # add the database and tables for the tiny service mysql --user=weblogic --password=weblogic < dataSetup.sql # add the connection-pool and data-source to the server for the tiny service /bea/weblogic81/common/bin/config.sh -mode=silent -silent_script=$PWD/silentCTiny.txt # copy the tiny web-app over so the server will deploy it on startup cp -r tiny /bea/user_projects/domains/wls/applications/tinyfirst, it runs mysql and sets up the "user" account that the app will use to access the database. then it runs mysql again, logging in as the newly created user, and creates the database and table that the app will use. next it runs the WLS config-wizard, creating both the connection-pool and the data-source. it finishes by coping the "tiny" webapp into the "wls" domain so it will be automatically deployed when the server is restarted. let's take a quick look at each of these steps. after that, we'll check out the code to see how the story ends <g>
userSetup.sql - MySQL user creation
here are all the parts of the "userSetup.sql" file, used at the start of the
tinySetup script
# make the MYSQL database the default for the following statements USE mysql;here we see the USE command used to set the default database for all subsequent commands to operate on
# create a superuser that can connect from on this machine from the command line
GRANT ALL PRIVILEGES ON *.* TO weblogic@localhost
IDENTIFIED BY 'weblogic' WITH GRANT OPTION;
now we're creating the user "weblogic@localhost" using the
GRANT command. this names
the user, defines his rights and sets the password. in this case, the user
can access anything. one thing to notice is the way the user name is seen by
MySQL. it's user-name@location. in this case, since the database and app
are running on the same machine, it's the user "weblogic" on this machine (aka
"localhost").
# create a superuser that can connect from on this machine from WLS
GRANT ALL PRIVILEGES ON *.* TO weblogic@localhost.localdomain
IDENTIFIED BY 'weblogic' WITH GRANT OPTION;
now this is weird. this shouldn't actually be needed, but it turns out
there's a RH9/MySQL interaction problem that causes "local" user names being
presented to MySQL as "user-name@localhost.localdomain". according to
the MySQL README that comes with the JDBC driver it's a RH9 bug. and i know
that you don't need to do this on RH8. in any case, this GRANT is only there to
work around this issue. and this sure was a fun one to figure out <g>
one final note: if you're going to deploy this service other then in a demo manner, make sure you change the password created in userSetup.sql and used in tinySetup and TinyServlet.java
dataSetup.sql - MySQL database and table creation
the "dataSetup.sql" file has all the commands to create a database and a table
for use by our app
# create the database TINY to be used by the tiny service DROP DATABASE IF EXISTS tiny; CREATE DATABASE tiny;the DROP DATABASE and CREATE DATABASE commands are used so that we wind up with a database named "tiny". this is the database where our apps data will live. btw, the reason it leads off with a DROP DATABASE command is so that you can run the dataSetup.sql script multiple times if needed. i had to when writing this, and it's a useful habit to get into. the "IF EXISTS" clause is there for the same reason
# make the TINY database the default for the following statements USE tiny;of course we now make the "tiny" database the default for the next statements
# create the table REDIRECT for use by the tiny service
DROP TABLE IF EXISTS redirect;
CREATE TABLE redirect
(tinyVal INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
userUrl TEXT);
and we wrap up our MySQL work by using
DROP TABLE to remove
any old "redirect" tables we might have had, and then using
CREATE TABLE to add
a table named "redirect". look carefully at the column named "tinyVal". this is
the column will hold the integer we're going to lookup as the unique part of
the tinyUrl. that column has "NOT NULL AUTO_INCREMENT PRIMARY KEY" as modifiers
on it. the "NOT NULL" is not particularly useful, as the database will always
be setting it, but the "AUTO_INCREMENT" is critical for the reasons discussed
above and the "PRIMARY KEY" was added so that MySQL would know that this would
be the main access mechanism for queries on this table, and it could build
indices appropriately to speed things up
silentCTiny.txt - connection-pool and data-source creation and deployment
i'm not going to go through the entire silent-configuration script
line-by-line. you can read about how all the options work
online.
but i am going to point out a few choice parts that caused me problems so that
you'll be successful if you customize these scripts yourself
// open domain read domain from "/bea/user_projects/domains/wls/";when writing a silent-config script, the first thing you have to do is figure out if you're creating a new domain, or modifying an existing one. here i'm modifying an existing one, so i've got to tell the config-wizard to read it as it currently exists
// define the tinyPool connection-pool create JDBCConnectionPool "tinyPool" as tinypool; . . . set tinypool.TestConnectionsOnReserve "true"; set tinypool.TestTableName "SQL SELECT * from redirect";here a connection-pool is being created, along with the settings it needs in order to run. here you need to be careful of the TestConnectionsOnReserve and TestTableName settings. this is quite confusing (IMO) if you're going through the WebLogic console to create a connection-pool. here's what's going on
at a high level, a WLS connection-pool is a cache of connections. whenever an app running on that server needs one, it can get one from the pool, use it as needed, and then return the connection to the pool where it'll sit until some app needs it again. the reason that modern app-servers have connection-pools is because creating a connection, with the associated user-authentication that goes along with it, can be quite expensive. connection-pools make this a (mostly) one-time expense. this can be a huge performance win for functionality just like that in this app, where a request is going to result in exactly one query (the lookup from the tinyUrl to the userUrl). on a lightly loaded server you might not notice it, but on a even moderately loaded system, things would get bogged down pretty quick if each external request had to pay the connection creation and teardown cost. so life is good
er, life is good as long as the connections in the pool are valid. but what can happen (quite easily it turns out) is that the connection sitting in the pool waiting to be reused can go bad. network problems, timeouts, there's a number of things that can render a connection useless. all of which wouldn't be so bad but unfortunately there doesn't seem to be anything WLS can do to detect this situation. so the app asks for a connection out of the pool, tries to use it, and as Steve Matchett would say, "ka-BLAMMO"
the solution to this is setting both the TestConnectionsOnReserve and TestTableName fields. what you can have WLS do for you is to test each connection before it's pulled out of the pool. the first tells WLS you want this done, and the second provides a query to use as the test. it'd be great if WLS could just know what query to run based on the database type, but this really isn't possible because in real production environments application connections aren't privileged to access system tables and such that are a built-in part of the database. so you have to provide a query that you know can run successfully (assuming the connection is valid). then WLS will try to run it before handing the connection out and if it fails they'll throw it away and create another one
// deploy the tinyPool connection-pool assign JDBCConnectionPool "tinyPool" to target "myserver";here you can see the connection-pool being deployed on the server "myserver"
// define the tinyDS data-source create JDBCTxDataSource "tinyDS" as tinyds; set tinyds.JNDIName "tinyDS"; set tinyds.PoolName "tinyPool"; // deploy the tinyDS data-source assign JDBCTxDataSource "tinyDS" to target "myserver";then we create the data-source, which references the connection-pool, and deploy it
// update the domain add template "/bea/weblogic81/common/templates/applications/DefaultWebApp.jar"; update domain; close domain;finally we update the domain and close it. one thing to watch out for is that even though you're simply modifying an already existing domain, you still need to refer to a "template" to make it work. so i refer to one that ships with WLS. if you don't do this the script will fail at the end, which is annoying but harmless as until the "update domain" succeeds, nothing has really happened
TinyServlet.java
most of this class is just plain vanilla servlet code, which i won't go into.
but there are a few interesting points to cover. if you look at the code, the
real work is done in the two methods, getTinyVal() and
getUserUrl(). we'll look at each one in turn
private String getTinyVal(String userUrl)
throws Exception
{
.
.
.
Context ctx = new InitialContext();
DataSource ds = (DataSource) ctx.lookup("tinyDS");
String q = "insert into redirect (userUrl) values (?)";
conn = ds.getConnection();
ps = conn.prepareStatement(q);
ps.setString(1, userUrl);
ps.executeUpdate();
// now, get the auto-generated key that got created
rs = ps.getGeneratedKeys();
if (rs.next())
val = rs.getInt(1);
.
.
.
return Integer.toString(val, 36);
}
so the first thing we do is perform a JNDI lookup on the data-source,
tinyDS. then we get a connection out of the connection-pool from the
data-source. next we prepare the query. what this means is the database,
depending on the vendor, does work (like parsing the SQL statement) up front
and remembers what that work is, so that when you actually execute the
statement, it can do the minimal work necessary. now normally, this "prep work"
would get tossed out when the connection was closed, since we're using WLS
connection-pooling, the connection is just returned to the pool on close, and
this becomes another case where connection-pooling lowers per-request latency.
let's keep going. after preparing the statement, we've got to provide the userUrl as the value to be used in the INSERT statement. that's what PreparedStatment.setString(...) does. that call is saying to use userUrl as the value for the first (1) parameter in the query. look closely at the query and you'll see a '?'. that represents a value to be replaced at a later time. since we prepare the statement and the prep work is cached, we've got to supply the actually value to use each time we want to execute. which is what happens next
at this point, the table has been changed and there's a new row in the table. but we've still got one thing to do, which is to figure out what value MySQL decided to use for the AUTO_INCREMENT column tinyVal. since this functionality is fairly common, JDBC supports it natively. the pattern is to get a ResultSet from PreparedStatement.getGeneratedKeys() and then get the value, as an integer, of the first row/column. that's the new value. and it turns out there are other options too, depending on what version of everything you're using. you can read more about it in this article up on the MySQL site
we wrap this method up by taking the integer and turning it into a string. for this we call Integer.toString(...). here's where we're actually using sexatrigesimal as the number format, because in addition to handing in the new value the MySQL produced, we're handing in 36 for the radix (fancy word for base). so when the format takes place, it's none as described above (clearly we're not the first people to have needs like this)
one last thing i should also mention about getTinyVal is that it fails to duplicate one part of the real service, which is that TinyURL will actually return the same tinyURL for a given userUrl. to do this they must be running a query first, before they run the insert, and they return the same tinyURL they returned last time. but i wanted to keep things simple, so i didn't do that part. it should be very easy for you to do, based on the code already here
now let's take a look at getUserUrl
private String getUserUrl(String tinyVal)
throws Exception
{
// transform tinyval into an int
int val = Integer.parseInt(tinyVal, 36);
.
.
.
Context ctx = new InitialContext();
DataSource ds = (DataSource) ctx.lookup("tinyDS");
String q = "select userUrl from redirect " +
"where tinyVal = ?";
conn = ds.getConnection();
ps = conn.prepareStatement(q);
ps.setInt(1, val);
rs = ps.executeQuery();
if (rs.next())
userUrl = rs.getString(1);
.
.
.
return userUrl;
}
this method does the heavy-lifting when a request comes in where a user is
actually trying to chase a "tiny" URL, and needs to be redirected. the first
thing we need to do is translate the tinyVal string into an integer, base-36
the rest is pretty plain-vanilla JDBC and allows us to run a SELECT query against the redirect table in order to get the userUrl back. that's it
that's the whole thing. a "tiny" URL service, built on WLS and MySQL. i hope this was as useful and interesting to you as it was to me. and if you have any problems running or using this, please let me know
Rod Chavez has been building systems software for more then 15 years. He spent the first half of the 90's working for Borland, and the last half for Microsoft, before leaving gainful employment to start Crossgain (co-founded with Adam Bosworth) in 2000, which was then purchased by BEA Systems in in the summer of 2001.
oreillynet.com Copyright © 2006 O'Reilly Media, Inc.