June 2009 Archives

JavaDocs are great. They're the standard for API documentation, but as good as they are, they have one big problem: all classes and all methods are created equal. That hurts in a language as verbose as Java. Sometimes one class or method is a much better choice than another. Sometimes you get overwhelmed by the options and have no idea where to even start looking.

Someone has come up with a solution and it's pretty good! It's called Jadeite and it was written by four smart guys at CMU. They've even run it on two common API docs already.

It introduces a few useful features.

  • It weights classes based on usage. java.util.Date is huge, java.sql.Date is not as big. This is a huge help when dealing with two similar-looking classes that have critical differences.
  • Examples (only for constructors so far) that have been scavenged off of the Internet.
  • Users can annotate the documentation as they use it with placeholders. This is a combination wishlist / alternative finder.

It's a huge time saver already. The features have added the best part of php.net's comments without all the noise. If you're doing Java or Groovy development, go check it out.

Grails is amazing at many things, but as of v1.1.1, configuring logging and documenting that configuration are sadly not among them. The logger has unclear default behavior that must be overridden, and there are several redundant, complex syntaxes for configuring it. To make matters worse, the syntax changed pretty recently and the changes were not backward compatible. This means that Google brings up a lot of outdated, useless information.

I had simple logging requirements. For production troubleshooting I want a daily log of everything (even framework code) at the INFO level. The solution was simple, but it took me lots of blind hacking to figure out. Here's what I ended up doing:

First, replace the existing log4j block in grails-app/conf/Config.groovy with this:

log4j = {
    appenders {
        appender new org.apache.log4j.DailyRollingFileAppender(name:"file", fileName:"my_grails_app.log",
                datePattern: '\'_\'yyyy-MM-dd', layout:pattern(conversionPattern: '%d{ISO8601}\t%p\t%c:%L\t%m%n'))
        console name:'stdout', layout:pattern(conversionPattern: '%d{ISO8601}\t%p\t%c:%L\t%m%n')
    }
    root {
        info 'stdout', 'file'
        additivity = true
    }
}

Next, if you don't want lots of logging when you run grails test-app, change your environment configuration like this to override the default logging we set up in the previous step:

environments {
    development { ... }
    test {
        ...
        log4j = {
            root {
                error 'stdout'
                additivity = true
            }
        }
        ...
    }
    production { ... }
}

That's really it. You'll now have a daily rolling log where stacktrace.log used to be created and logging will be suppressed in test.
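One thing to keep in mind with this setup: log4j's DailyRollingFileAppender renames the previous day's file by appending the formatted datePattern, so rolled files here end up named like my_grails_app.log_2009-06-15, and the appender never deletes them. Here's a minimal sketch for finding the old ones. The function name, the directory argument and the 30-day default are my own assumptions, not anything Grails or log4j provides.

```shell
#!/bin/sh
# List rolled log files older than a given number of days.
# Rolled files are named <base>_<yyyy-MM-dd> because of the
# datePattern '_'yyyy-MM-dd used in the appender above.
# Function name, arguments and defaults are illustrative.
old_rolled_logs() {
    log_dir="${1:-.}"
    base_name="${2:-my_grails_app.log}"
    max_age_days="${3:-30}"
    find "$log_dir" -type f -name "${base_name}_*" -mtime +"$max_age_days"
}
```

Run it as old_rolled_logs /path/to/app my_grails_app.log 30 and look over the list before piping it to anything destructive.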


I'm using Nagios to monitor some services on my Solaris 10 systems hosted at Joyent. Until now I've just been using check_http to monitor everything that I cared about. Times change, though, and now I need to monitor disk space, free memory, and CPU load on many systems. I like to keep things simple, so I decided that it's time to install NRPE.

Building Nagios 3 and the other plugins was a breeze so I figured that this would be no problem. I downloaded NRPE and did the typical install steps. This is what I saw:

$  ./configure
... lots of configure output ...
$  gmake all
cd ./src/; gmake ; cd ..
gmake[1]: Entering directory `/home/eng/nrpe-2.12/src'
gcc -g -O2 -I/usr/local/include/openssl -I/usr/local/include -DHAVE_CONFIG_H -o nrpe nrpe.c utils.c -L/usr/local/lib  -lssl -lcrypto -lnsl -lsocket  ./snprintf.o 
nrpe.c: In function `get_log_facility':
nrpe.c:617: error: `LOG_AUTHPRIV' undeclared (first use in this function)
nrpe.c:617: error: (Each undeclared identifier is reported only once
nrpe.c:617: error: for each function it appears in.)
nrpe.c:619: error: `LOG_FTP' undeclared (first use in this function)
gmake[1]: *** [nrpe] Error 1
gmake[1]: Leaving directory `/home/eng/nrpe-2.12/src'

*** Compile finished ***

If the NRPE daemon and client compiled without any errors, you
can continue with the installation or upgrade process.

Read the PDF documentation (NRPE.pdf) for information on the next
steps you should take to complete the installation or upgrade.

Eeek! That sure is an ugly error. At first I assumed that this was a configuration issue, but that should have come up during the ./configure. I ended up doing what you're never supposed to do: I hacked the code. The rest of the installation went by the book.

Just go into src/nrpe.c and delete the only two lines that reference LOG_AUTHPRIV and LOG_FTP. In v2.12 I found them in the middle of an if-else series.
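Before editing, it helps to see exactly which lines are involved. This is a small sketch, assuming the NRPE source tree is unpacked in the current directory and that your grep understands -E (on Solaris 10 that means /usr/xpg4/bin/grep rather than /usr/bin/grep). The function name is made up for illustration.

```shell
#!/bin/sh
# Print line-number:text for every line in a C source file that
# mentions the two syslog facilities the compiler reported as
# undeclared. The function name and default path are illustrative.
find_undeclared_facilities() {
    src_file="${1:-src/nrpe.c}"
    grep -n -E 'LOG_AUTHPRIV|LOG_FTP' "$src_file"
}
```

Calling find_undeclared_facilities src/nrpe.c lists the lines to look at. Keeping a copy of the original file (cp src/nrpe.c src/nrpe.c.orig) makes the hack easy to undo.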


I encountered a curious bug today. A form that lets the user edit an object and use check boxes to specify relationships to child objects stopped saving the state when some of them were checked. I assumed it was a data issue since only some of the child objects were affected. As it turns out, using fieldValue in the definition of IDs for the check boxes was to blame. Specifically, its habit of formatting integers was to blame.

Here are the details on my situation. I had an edit page for objects of type Car and an array of check boxes to associate the Car object with its equipped accessories which were of type Accessory. The accessory I most recently added, 'Dashboard Jesus' would never stay checked.

After some fumbling around I noticed that the list of check boxes I built from a collection of Accessory objects, which used the object IDs for the check box names, had commas in it! My most recent ID had broken the 1000 barrier on my ID sequence. This caused the ID for Dashboard Jesus, and hence the HTML element ID, to be rendered as 1,000.

Lessons learned:

  • fieldValue may format your data (including adding commas to an integer), so never, ever use it for anything other than the contents of a field.
  • Grails may format your IDs if you're not careful. To catch such bugs early, bump your ID sequence to a number greater than 1000. Doing this from within Grails may be possible, but it seems complicated. I solved the problem simply by using a database tool to bump up my sequence value after Grails set up the database.
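A rough way to hunt for other places where this could bite is to search the views for fieldValue used inside id or name attributes. This is only a heuristic sketch: the function name is made up, the default path assumes the standard grails-app/views layout, and the pattern will miss attributes that span several lines or use single quotes.

```shell
#!/bin/sh
# Print file:line:text for view lines where fieldValue appears inside
# an id="..." or name="..." attribute. Heuristic only: it assumes the
# attribute value is double-quoted and sits on a single line.
find_fieldvalue_in_ids() {
    views_dir="${1:-grails-app/views}"
    grep -rn -E '(id|name)="[^"]*fieldValue' "$views_dir"
}
```

Any hit is worth a second look. Uses of fieldValue inside value="..." are not reported, since that is the one place it belongs.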

I was making a release today to one of my servers at Joyent. As part of the release I ran a short script written in Java. Java complained that it could not allocate memory to create a JVM! This is a bad sign on a production system. After some poking using top and the much more useful (on Solaris) prstat, I discovered that /lib/svc/bin/svc.configd was taking up 95% of my memory! It appears to have a memory leak.

I checked out the brief man page. It seemed pretty important so I was afraid to kill the process. Some googling around for a restart solution proved my fears baseless. It's OK to kill this process. It will restart by itself.

I killed svc.configd and it came back right away without incident. My memory was freed up.

Time to start monitoring memory usage on my OpenSolaris zones.
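Here's a first sketch of what such a check could look like, written as a Nagios-style plugin since that's what I'm already running. Nagios treats exit code 0 as OK, 1 as WARNING and 2 as CRITICAL. The function names and thresholds are placeholders I made up, and the sketch assumes a POSIX shell and a ps that understands -o rss= -o comm=.

```shell
#!/bin/sh
# Largest resident set size (KB) among processes whose command name
# ends with the given string; prints 0 when nothing matches.
largest_rss_kb() {
    wanted=$1
    ps -e -o rss= -o comm= | {
        max=0
        while read -r rss comm; do
            case "$comm" in
                *"$wanted") if [ "$rss" -gt "$max" ]; then max=$rss; fi ;;
            esac
        done
        echo "$max"
    }
}

# Map an RSS value (KB) to a Nagios plugin result:
# exit status 0 = OK, 1 = WARNING, 2 = CRITICAL.
classify_rss() {
    rss_kb=$1
    warn_kb=$2
    crit_kb=$3
    if [ "$rss_kb" -ge "$crit_kb" ]; then
        echo "CRITICAL - RSS ${rss_kb} KB"
        return 2
    elif [ "$rss_kb" -ge "$warn_kb" ]; then
        echo "WARNING - RSS ${rss_kb} KB"
        return 1
    fi
    echo "OK - RSS ${rss_kb} KB"
    return 0
}

# Combine the two: check one process against two thresholds.
check_process_memory() {
    classify_rss "$(largest_rss_kb "$1")" "$2" "$3"
}
```

For example, check_process_memory svc.configd 262144 524288 would warn at 256 MB and go critical at 512 MB. Those numbers are guesses to be tuned per zone.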

Most of the time we want web servers to run as fast as they can. There is one time we don't: when there are stability issues on a back-end service. For example, we're using a web service or database that crashes under load, and we can neither throttle our use of this service nor improve its ability to accept load. It's better to be up and slow than up and down. To solve this problem you can detune the connector configuration in Tomcat to reduce the number of threads processing requests.
  1. Edit tomcat/conf/server.xml
  2. Find the configuration for your connector. It will be in an element named Connector
  3. Set the maxThreads attribute to a smaller value. I used 50. This limits the number of requests that will be processed simultaneously and in turn reduces downstream load.
  4. Add an attribute for acceptCount if it is not already defined. I set this to 1000. This is the number of requests that we will allow to wait for a thread to free up. Tomcat refuses new connections once this queue is full.
My Connector configuration looked like this in the end:
<Connector port="8080" maxHttpHeaderSize="8192"
               maxThreads="50" minSpareThreads="20" maxSpareThreads="40"
               enableLookups="false" redirectPort="8443" acceptCount="1000"
               connectionTimeout="20000" disableUploadTimeout="true" 
               compression="on" compressionMinSize="2048" noCompressionUserAgents="gozilla, traviata"
               compressableMimeType="text/html,text/xml,text/plain,text/css,text/javascript,application/x-javascript,application/javascript"/>
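To get a feel for what these two numbers mean together, here is a back-of-the-envelope sketch. It assumes every request takes roughly the same time to process, which is a simplification, and the 500 ms figure is invented for illustration. With maxThreads at 50 and acceptCount at 1000, a request that arrives when the queue is nearly full waits behind about 1000 / 50 = 20 rounds of processing.

```shell
#!/bin/sh
# Rough worst-case time (ms) a request spends queued before a thread
# picks it up: queued requests are drained max_threads at a time.
# Assumes a uniform per-request processing time (a simplification).
worst_case_queue_wait_ms() {
    accept_count=$1
    max_threads=$2
    avg_request_ms=$3
    echo $(( accept_count * avg_request_ms / max_threads ))
}

worst_case_queue_wait_ms 1000 50 500   # prints 10000
```

If that number is longer than your clients are willing to wait, a smaller acceptCount makes Tomcat fail fast instead of queueing requests nobody is waiting for any more.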

I've been doing a lot of experimentation with PostgreSQL lately. It's a great database, but since it's not as popular as MySQL, I end up doing a little more experimentation with the configuration.

I was poking postgresql.conf quite a bit and restarting the server to apply the changes each time. I have actual production data now, though, so I can't do this any more. Luckily, for all configuration parameters not marked '(change requires restart)' in the comments, you can apply the config changes live. Just run this command as postgres:

pg_ctl reload  -D /var/log/postgres
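If you'd rather stay inside psql, the same thing can be done with SQL. The sketch below wraps three queries in small functions: one lists the settings whose context in pg_settings is 'postmaster' (those only change on a restart), one calls pg_reload_conf(), and one shows a live value so you can confirm the change. The function names and the PSQL default are my own assumptions about how you connect, and pg_reload_conf() needs a superuser.

```shell
#!/bin/sh
# PSQL is an assumption about how you connect; adjust user/host as needed.
PSQL="${PSQL:-psql -U postgres}"

# List settings that only take effect after a full server restart.
restart_only_settings() {
    $PSQL -At -c "SELECT name FROM pg_settings WHERE context = 'postmaster' ORDER BY name;"
}

# Ask the running server to re-read postgresql.conf.
reload_config() {
    $PSQL -At -c "SELECT pg_reload_conf();"
}

# Show the live value of one setting, e.g. to confirm a reload worked.
current_setting_value() {
    $PSQL -At -c "SHOW $1;"
}
```

A typical round trip would be: edit postgresql.conf, run reload_config, then run current_setting_value with the name of the parameter you changed.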


As I mentioned in an earlier entry, I'm using PostgreSQL 8.3.3 at Joyent. It works great, but the documentation is not as good as that of the ubiquitous MySQL.

I required high availability but not real-time failover, so I opted for warm standby, a pseudo-built-in feature of postgres as of version 8.3 that provides easy failover that can be automated if necessary. It's pseudo-built-in because no coding is required, but you need to get your hands pretty dirty to get it working.

To get it working you'll need to do some work on the primary server so that it spits out incremental write ahead log (WAL) files as well as a bit more work on the warm standby server(s) to consume these logs.

You can follow these steps for both a brand new postgres installation as well as an existing heavy traffic installation. Furthermore, no downtime is required to set this up.

Assumptions

  • You're running PostgreSQL 8.3.3 (the version that came with my Joyent node)
  • You have at least two identical nodes for this purpose at Joyent
  • You have NFS space for your write ahead log files
  • You've at least scanned the official docs on this subject

Primary Setup

All we need to do on the primary server is set it up to push archived WAL files to our NFS mount. It will push WAL files out in 16 MB chunks. Since 16 MB is a lot of data to lose on a low-traffic DB like mine, I'll be forcing it to flush every hour. This makes it a space hog, but disk space is cheap (at least at Joyent).
  1. Modify the /var/pgsql/data/postgresql.conf. Set the following parameters
    archive_mode = on
    archive_command = 'cp -i %p /shared/psql_wal/%f </dev/null'
    archive_timeout = 3600
  2. Restart postgres to apply the changes
    sudo -u postgres pg_ctl restart -D /var/pgsql/data
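The space cost is easy to estimate. As far as I can tell from the 8.3 docs, a segment that is switched early because of archive_timeout is still archived as a full-size 16 MB file, so the hourly flush costs at least 24 files a day even on an idle database. A quick sketch of the arithmetic (the function name is made up):

```shell
#!/bin/sh
# Minimum archive growth in MB per day when a WAL segment is forced
# out every archive_timeout seconds. A busy database will produce
# more; this is only the floor set by the timeout.
wal_archive_mb_per_day() {
    archive_timeout_s=$1
    segment_mb="${2:-16}"
    echo $(( 86400 / archive_timeout_s * segment_mb ))
}

wal_archive_mb_per_day 3600   # prints 384
```

So with archive_timeout = 3600 the NFS mount grows by roughly 384 MB a day at minimum. Once the primary has restarted, a quick ls -lt on /shared/psql_wal should show new files arriving at least hourly.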

Standby Setup

I only set up one standby node, but you can have as many as you like as long as they all have access to the same NFS mount.

This is where the configuration gets a bit hairy. We'll need to build a small, but very useful, C utility from source called pg_standby. We'll also be doing quite a bit of configuration.


  1. Before we configure stuff we'll need to build pg_standby from source. It's not provided with the default Joyent postgres installation.

    1. Download the PostgreSQL source code to your standby server. I could not find the source for v8.3.3, so I grabbed v8.3.4. wget works great for this.

    2. Decompress the tar ball and run configure and gmake
      cd postgresql
      ./configure
      ... lots of output from configure ...
      gmake all
      ... lots of output from gmake ...

    3. Next build the contents of contrib. This is the folder where all the cool semi-supported utilities (including pg_standby) live.
      cd contrib
      gmake all
      ... lots of output from gmake ...

    4. It's built. Time to install it.
      cd pg_standby
      sudo cp ./pg_standby /opt/local/bin/pg_standby
      sudo chmod 755 /opt/local/bin/pg_standby

  2. We now have pg_standby. You can verify that it works by running which pg_standby. Next we'll stop the standby server to prepare for a checkpoint backup from our primary system. Run this command on the standby server
    sudo -u postgres pg_ctl stop -D /var/pgsql/data

  3. With the standby server down we'll log into the primary server and start the hot backup to our NFS mount.
    echo "SELECT pg_start_backup('mybackup');" | psql -U postgres
    sudo tar -cvf /shared/mybackup.tar /var/pgsql/data

  4. While the backup flag is still set, log into the standby system and restore this backup.
    sudo rm -rf /var/pgsql/data
    sudo cp /shared/mybackup.tar /var/pgsql
    cd /var/pgsql
    sudo tar -xvf mybackup.tar
    sudo chown -R postgres:postgres data

  5. With the backup completed, log back into the primary system and clear the backup flag.
    echo "SELECT pg_stop_backup();" | psql -U postgres

  6. Time to configure. On the standby server create a recovery.conf file in /var/pgsql/data/ with the following contents. This tells postgres to slurp up the log files.
    restore_command = 'pg_standby -l -d -s 2 -t /tmp/pgsql.trigger.5432 /shared/psql_wal/ %f %p %r 2>>standby.log'

  7. Start up postgres on the standby server. It will start restoring data right away (since our live DB has already produced some data for it to consume)
    sudo -u postgres pg_ctl start -D /var/pgsql/data

  8. Take a peek at the restore to make sure there aren't too many errors. It may complain about missing files. Ignore these warnings
    sudo -u postgres tail -f /var/pgsql/data/standby.log

  9. Verify that it's still in standby mode. It should complain with this message: 'psql: FATAL: the database system is starting up'
    psql

Doing the Failover

Your primary server is busy serving up requests and your warm standbys are slurping up logs every hour. This is great and all, but what do we do when the primary fails? Here's my process for flipping over to a standby.
  1. Log into the primary server and make sure that postgres is all the way down.
  2. If the primary server is not totally shut down, take it down the rest of the way. kill -9 the primary postgres process if necessary.
    sudo kill -9 `sudo cat /var/pgsql/data/postmaster.pid | head -n 1`
  3. Log into the warm standby.
  4. Switch to the postgres user.
    sudo -u postgres bash
  5. Verify that the server is still in standby mode. It should report 'psql: FATAL: the database system is starting up'.
    psql
  6. Assuming that it is not up (that is, still in standby mode), create the trigger file. This tells our standby server that it's time to become a primary node. It's very important that only one server is primary at a time, so I hope you followed steps 1 and 2.
    touch /tmp/pgsql.trigger.5432
  7. Verify that the standby server is now running and accepting connections.
    psql
  8. Get your clients pointing to the new server (change DNS, change IPs in configuration files, etc.)
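Steps 5 and 6 are the ones I'd least like to fumble at 3 AM, so here is a sketch that scripts them. It is meant to run as the postgres user on the standby, and only after the primary is confirmed down. The function names are mine, PSQL and TRIGGER_FILE are overridable assumptions, and TRIGGER_FILE has to match the -t argument in recovery.conf.

```shell
#!/bin/sh
# Sketch of the "verify standby, then create the trigger" steps above.
TRIGGER_FILE="${TRIGGER_FILE:-/tmp/pgsql.trigger.5432}"
PSQL="${PSQL:-psql}"

# Succeeds while the server still refuses connections because it is
# replaying WAL files (the "starting up" message from psql).
standby_is_recovering() {
    out=$($PSQL -c 'SELECT 1;' 2>&1) || :
    case "$out" in
        *"the database system is starting up"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Create the trigger file only if the server is still in standby mode.
promote_standby() {
    if standby_is_recovering; then
        touch "$TRIGGER_FILE"
        echo "trigger file created: $TRIGGER_FILE"
    else
        echo "server is not in standby mode; nothing done" >&2
        return 1
    fi
}
```

After promote_standby succeeds, keep running psql until it connects normally, then repoint the clients. The script deliberately does not check the primary for you; that part still needs a human.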

Credits

This tutorial was largely adapted from Ichsan's Using pg_standby for high availability of Postgresql

It's time for another cryptic Grails v1.1.1 stack trace.

I ran into this one when committing a big change, which made the root cause difficult to track down. After much fist shaking and rage, I discovered the issue in my grails-app/conf/Config.groovy file. I had a configuration parameter containing a / character that I had forgotten to quote. Groovy read the unquoted / as the division operator, which is why the trace complains about ConfigObject.div().

I fixed the syntax error in Config.groovy and the problem went away.

Here's the stack trace to help confirm you've run into the same issue.

groovy.lang.MissingMethodException: No signature of method: groovy.util.ConfigObject.div() is applicable for argument types: (groovy.util.ConfigObject) values: [[:]]
    at Config.run(Config.groovy:41)
    at _GrailsPackage_groovy$_run_closure1.doCall(_GrailsPackage_groovy:45)
    at _GrailsPackage_groovy$_run_closure2_closure10.doCall(_GrailsPackage_groovy:87)
    at _GrailsPackage_groovy$_run_closure2_closure10.doCall(_GrailsPackage_groovy)
    at _GrailsSettings_groovy$_run_closure10.doCall(_GrailsSettings_groovy:274)
    at _GrailsSettings_groovy$_run_closure10.call(_GrailsSettings_groovy)
    at _GrailsPackage_groovy$_run_closure2.doCall(_GrailsPackage_groovy:86)
    at _GrailsBootstrap_groovy$_run_closure7.doCall(_GrailsBootstrap_groovy:140)
    at _GrailsTest_groovy$_run_closure7.doCall(_GrailsTest_groovy:249)
    at _GrailsTest_groovy$_run_closure7.doCall(_GrailsTest_groovy)
    at _GrailsTest_groovy$_run_closure1_closure19.doCall(_GrailsTest_groovy:110)
    at _GrailsTest_groovy$_run_closure1.doCall(_GrailsTest_groovy:96)
    at TestApp$_run_closure1.doCall(TestApp.groovy:66)
    at gant.Gant$_dispatch_closure4.doCall(Gant.groovy:324)
    at gant.Gant$_dispatch_closure6.doCall(Gant.groovy:334)
    at gant.Gant$_dispatch_closure6.doCall(Gant.groovy)
    at gant.Gant.withBuildListeners(Gant.groovy:344)
    at gant.Gant.this$2$withBuildListeners(Gant.groovy)
    at gant.Gant$this$2$withBuildListeners.callCurrent(Unknown Source)
    at gant.Gant.dispatch(Gant.groovy:334)
    at gant.Gant.this$2$dispatch(Gant.groovy)
    at gant.Gant.invokeMethod(Gant.groovy)
    at gant.Gant.processTargets(Gant.groovy:495)
    at gant.Gant.processTargets(Gant.groovy:480)
Failed to compile configuration file: No signature of method: groovy.util.ConfigObject.div() is applicable for argument types: (groovy.util.ConfigObject) values: [[:]]
