Recently in Joyent Category

Joyent uses OpenSolaris zones for its accelerators. At some point I needed to verify the physical memory size of one of these zones but was unable to use the webmin tool that Joyent provides. This seemingly simple operation was actually pretty tricky to figure out. Here are the steps I followed:

 
$ sudo rcapadm -E
                                      state: enabled
           memory cap enforcement threshold: 0%
                    process scan rate (sec): 15
                 reconfiguration rate (sec): 60
                          report rate (sec): 5
                    RSS sampling rate (sec): 5
$ rcapstat -z 1 1
    id zone            nproc    vm   rss   cap    at avgat    pg avgpg
    46 foo            -    0K    0K 2048M    0K    0K    0K    0K
$ sudo rcapadm -D
                                      state: disabled
           memory cap enforcement threshold: 0%
                    process scan rate (sec): 15
                 reconfiguration rate (sec): 60
                          report rate (sec): 5
                    RSS sampling rate (sec): 5
$ 

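rcapadm administers the resource capping daemon (rcapd): -E enables it and -D disables it. With the daemon enabled, the cap column of rcapstat -z shows the zone's physical memory cap, 2048M in this case. I’m not sure whether leaving the daemon enabled has any impact, so I disabled it again just in case.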
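If you only want the cap itself, the rcapstat output can be filtered. This is a minimal sketch that assumes the column order shown in the transcript above (id, zone, nproc, vm, rss, cap, ...); the function name zone_caps is made up for illustration.

```shell
# Read `rcapstat -z` output on stdin and print "zone cap" for each data row.
# Assumption: columns are ordered as in the transcript above:
#   id zone nproc vm rss cap at avgat pg avgpg
zone_caps() {
    # Skip the header row (first field is "id") and any short or blank lines.
    awk '$1 != "id" && NF >= 6 { print $2, $6 }'
}

# Intended use on the zone, while rcapd is enabled:
#   rcapstat -z 1 1 | zone_caps
```

Fed the transcript above, this prints a single line containing the zone name and 2048M.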


While upgrading one of my Zeus ZXTM traffic managers from v5.1r2 to v6.0r4, it crashed on startup. This was pretty surprising because the upgrade itself appeared to go through without a hitch. Here’s the error I saw in the logs, and again when I tried to start the zxtm program over SSH:

$ ./start-zeus
Initializing Zeus Application Framework. (C) 1995 - 2010 Zeus Technology Limited
Zeus Administration Server already running: 235
Zeus Traffic Manager - (C) 1995 - 2010 Zeus Technology Limited
Version 6.0r4, Build date: Feb 10 2010 08:32:37
Process permissions set to zeus:zeus
 INFO   Zeus Traffic Manager starting
 INFO   Version 6.0r4, Build date: Feb 10 2010 08:32:37
 FATAL Parent 1234 hit FATAL at Cannot fork:Not enough space
[0x6ac417] function __1cOcommkeyChanged6FpknNConfigSection_rknKStringBase_pknKConfigFile_p6_nIRetValue__ + 0x417
[0x8d4376] function __1cFFATAL6Fpkc1i_v_ + 0x66
[0x6ad2b3] function __1cUreally_nice_shutdown6F_v_ + 0x883
[0x6b3a3b] function __1cKParentBoot6Fpkc_v_ + 0xa8b
[0x5e83c1] function main + 0x571
[0x5c532c] function _start + 0x6c
[0x0] function ?? + 0xffffffffffa3ad40
$

Not enough space? Something was seriously amiss. The solution turned out to be pretty simple, but was not something I could find in the manual.

In v6.0, a new configuration parameter was added: sharedpoolsize. It was not set because I was upgrading from v5.1, so the ZXTM made its best guess. On my virtualized environment it guessed wrong and picked a value that exceeded the memory available to my zone. The fix was simple: set sharedpoolsize in $ZEUSHOME/zxtm-6.0r4/conf/settings.cfg to a size small enough to fit into my available memory. Since the setting did not exist yet, I had to add it at the bottom of the file.
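Adding the setting by hand works fine, but the step can also be scripted. The sketch below appends a key only when the file does not already contain it. The helper name ensure_setting, the example value 10MB, and the whitespace-separated "key value" layout are all assumptions, so copy the format of an existing line in your own settings.cfg and pick a size that fits your zone. Back up the file first.

```shell
# Append "key value" to a config file unless a line already starts with that key.
# Assumptions: one setting per line, key and value separated by whitespace.
ensure_setting() {
    file=$1
    key=$2
    value=$3
    if grep "^${key}[[:space:]]" "$file" >/dev/null 2>&1; then
        echo "$key is already set in $file"
    else
        printf '%s %s\n' "$key" "$value" >> "$file"
        echo "added $key to $file"
    fi
}

# Intended use (the value is illustrative only):
#   ensure_setting "$ZEUSHOME/zxtm-6.0r4/conf/settings.cfg" sharedpoolsize 10MB
```

Running it a second time leaves the file untouched, so it is safe to keep in an upgrade script.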


I ran into this very cryptic error while setting up Nagios at Joyent. I had copied my plugins from one NRPE client to a new server. Three of my checks used check_procs, and all of them failed with a message like this:

check_procs
System call sent warnings to stderr: pst3: This program can only be run by the root user!

To make this even more annoying, sudo did not fix it. The same error message was displayed. What was the problem? File ownership! Copying the plugins had changed the owner of pst3, so the error message should really say “This program must be owned by the root user!” The fix:

sudo chown root:root pst3
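
Note that chown, not chmod, is the command that takes a user:group argument. To confirm that the ownership change took effect, you can read the owner back from ls. A small sketch, with owner_of as a made-up helper name:

```shell
# Print "user:group" for a file, taken from columns 3 and 4 of `ls -ld`.
owner_of() {
    ls -ld "$1" | awk '{ print $3 ":" $4 }'
}

# Intended use, from the plugin directory:
#   owner_of pst3
```

Once the ownership is fixed, this should report root:root for pst3.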

I'm using Nagios to monitor some services on my Solaris 10 systems hosted at Joyent. Until now I've just been using check_http to monitor everything that I cared about. Times change, though, and now I need to monitor disk space, free memory, and CPU load on many systems. I like to keep things simple, so I decided that it's time to install NRPE.

Building Nagios 3 and the other plugins was a breeze so I figured that this would be no problem. I downloaded NRPE and did the typical install steps. This is what I saw:

$  ./configure
... lots of configure output ...
$  gmake all
cd ./src/; gmake ; cd ..
gmake[1]: Entering directory `/home/eng/nrpe-2.12/src'
gcc -g -O2 -I/usr/local/include/openssl -I/usr/local/include -DHAVE_CONFIG_H -o nrpe nrpe.c utils.c -L/usr/local/lib  -lssl -lcrypto -lnsl -lsocket  ./snprintf.o 
nrpe.c: In function `get_log_facility':
nrpe.c:617: error: `LOG_AUTHPRIV' undeclared (first use in this function)
nrpe.c:617: error: (Each undeclared identifier is reported only once
nrpe.c:617: error: for each function it appears in.)
nrpe.c:619: error: `LOG_FTP' undeclared (first use in this function)
gmake[1]: *** [nrpe] Error 1
gmake[1]: Leaving directory `/home/eng/nrpe-2.12/src'

*** Compile finished ***

If the NRPE daemon and client compiled without any errors, you
can continue with the installation or upgrade process.

Read the PDF documentation (NRPE.pdf) for information on the next
steps you should take to complete the installation or upgrade.

Eeek! That sure is an ugly error. At first I assumed that this was a configuration issue, but that should have come up during ./configure. I ended up doing what you're never supposed to do: I hacked the code. The rest of the installation went by the book.

Just go into src/nrpe.c and delete the only two lines that reference LOG_AUTHPRIV and LOG_FTP. In v2.12 I found them in the middle of an if-else series.
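
To find the exact lines before editing, grep can list them with line numbers. This is a sketch with a made-up function name, and it assumes a grep that understands -E (GNU grep, or /usr/xpg4/bin/grep on Solaris). If each reference turns out to be the body of its own else-if branch, the condition line directly above it has to be removed as well, otherwise the file will not compile.

```shell
# List every line of a C source file that mentions LOG_AUTHPRIV or LOG_FTP,
# prefixed with its line number.
find_missing_facilities() {
    grep -n -E 'LOG_(AUTHPRIV|FTP)' "$1"
}

# Intended use, from the top of the NRPE source tree:
#   find_missing_facilities src/nrpe.c
```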


I was making a release today to one of my servers at Joyent. As part of the release I ran a short script written in Java. Java complained that it could not allocate memory to create a JVM! This is a bad sign on a production system. After some poking using top and the much more useful (on Solaris) prstat, I discovered that /lib/svc/bin/svc.configd was taking up 95% of my memory! It appears to have a memory leak.

I checked out the brief man page. It seemed pretty important so I was afraid to kill the process. Some googling around for a restart solution proved my fears baseless. It's OK to kill this process. It will restart by itself.

I killed svc.configd and it came back right away without incident. My memory was freed up.

Time to start monitoring memory usage on my OpenSolaris zones.
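
As a starting point for that monitoring, here is a minimal sketch that ranks processes by resident memory. It is not what I used at the time (that was prstat); the helper name top_rss is made up, and it assumes ps accepts the rss and comm output fields.

```shell
# Read "pid rss command" lines on stdin and print the N largest by RSS
# (column 2). N defaults to 5.
top_rss() {
    sort -k2,2nr | head -n "${1:-5}"
}

# Intended use (empty headers keep the ps output to data rows only):
#   ps -e -o pid= -o rss= -o comm= | top_rss 5
```

A runaway process like svc.configd would show up at the top of this list, which makes it easy to wire into a cron job or a Nagios check.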
