FAQ

Frequently Asked Questions

General

What is the difference between RHQ, Jopr and JON?

RHQ is an extensible management platform. It is the core engine of Jopr, and the vast majority of the code within Jopr originates from it. RHQ is the main upstream project for both JON and Jopr, and it is licensed as a fully open-source project. Some of the base plugins deployed within Jopr also come from RHQ (e.g. the JMX plugin, the RHQ Agent plugin, the platform plugin, etc.).

Jopr is the open-source project that JON is based on (i.e. Jopr is upstream of JON). Jopr contains JBoss middleware-specific plugins, such as the JBossAS plugin, the Tomcat plugin, etc. Jopr uses the same license as the RHQ project.

JON (aka "JBoss Operations Network" or "JBoss ON") is a commercial product offered to Red Hat customers. It is a fully tested, QA'ed and certified distribution of the open-source Jopr project. Of the three projects mentioned here (RHQ, Jopr, JON), JON is the only one officially supported by Red Hat. Note that an older version of JON (version 1.x) exists, but it is not based on RHQ/Jopr; JON 1.x was closed-source only.

What documentation is available?

RHQ user documentation can be found here.
RHQ developer documentation can be found here.

Note: documentation for the commercial JBoss ON product is available at http://www.redhat.com/docs/en-US/JBoss_ON/.

Is there a publicly available issue tracker system to search for bugs and submit enhancement requests?

Yes. To search for a bug, report a bug, or submit an enhancement request for one of the core subsystems (i.e. the Server, the Agent or one of the core RHQ plugins), use the RHQ JIRA at http://jira.rhq-project.org/browse/RHQ. If your request concerns one of the JBoss middleware plugins (or, more generally, a plugin provided by the Jopr project), use the Jopr JIRA at https://jira.jboss.org/jira/browse/JOPR. Please do not submit duplicate issues in both JIRAs; it won't make us fix them any faster :).

Is XXX database supported?

No. PostgreSQL and Oracle are the only databases supported for production use. RHQ 2.3 and later also support an embedded H2 database, but only for demo and development use.

What is the syntax for regular expressions used within RHQ?

RHQ uses regular expressions in several places. Whenever a user interface field, configuration file, etc. asks you to enter a regular expression, use Java's regular expression syntax, which is documented in the Javadoc for java.util.regex.Pattern. Date/time patterns likewise follow Java's syntax, documented in the Javadoc for java.text.SimpleDateFormat. The Java 5 and Java 6 Javadocs for these classes are the references to consult.
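A quick way to check a pattern before typing it into RHQ is to try it against the JDK classes directly. The small Java sketch below does that; the sample patterns and input strings are made up for illustration. Whether a particular RHQ field requires the expression to match the whole value or only part of it is not stated here, so the sketch shows both `Matcher.matches()` (whole string) and `Matcher.find()` (any substring).

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;
import java.util.regex.Pattern;

class JavaSyntaxDemo {
    // True only if the regex matches the WHOLE input (Matcher.matches()).
    static boolean matchesWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // True if the regex matches ANY substring of the input (Matcher.find()).
    static boolean matchesPart(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // Formats a date with a SimpleDateFormat pattern, in UTC so the output is predictable.
    static String formatUtc(Date date, String pattern) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.US);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        return format.format(date);
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("JBoss.*", "JBossAS Server"));   // true
        System.out.println(matchesWhole("JBoss", "JBossAS Server"));     // false: not the whole string
        System.out.println(matchesPart("JBoss", "JBossAS Server"));      // true
        System.out.println(matchesWhole("my\\.host", "myXhost"));        // false: \. is a literal dot
        System.out.println(formatUtc(new Date(0L), "yyyy-MM-dd HH:mm")); // 1970-01-01 00:00
    }
}
```

Note the doubled backslash: it is only needed inside Java source code. When you type the expression into an RHQ field, you would enter my\.host.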


Data Model

What is a "Measurement Definition" versus a "Measurement Schedule"?

To understand what a schedule and a definition represent, you need to understand the data model underlying the measurement data.

In RHQ, there are "resource types" and "resources". A "resource type" represents a kind of resource (like "JBossAS server" or "Apache Web Server"). A "resource" is an instance of a resource type (like "My JBossAS App Server" or "hostname foo Apache").

RHQ has analogous entities in the measurement subsystem. For each resource type there are "measurement definitions", sometimes called "metric definitions" (because they correspond to the <metric> definitions in the XML plugin descriptors); these represent a "kind" of measurement. For each resource, there is an instance of each metric definition, called a "measurement schedule". For example, the "Linux Platform" resource type has a "Free Memory" metric definition, so each of your Linux boxes has its own "Free Memory" measurement schedule. Each schedule has its own collection interval, which is why you can collect Free Memory for platform A every 30 minutes but for platform B every 15 minutes.

So, it is like this:

  • Resource Types have Metric Definitions.
  • Resources have Measurement Schedules.
  • A Resource is an instance of a Resource Type.
  • A Measurement Schedule is an instance of a Metric Definition.

Therefore, "measurement definition" refers to the kind of metric being collected for that resource's type (e.g. it refers to the "Free Memory" metric defined on the "Linux platform" resource type - it does not refer to any specific resource or any specific piece of data, rather it identifies the "kind" of metric). "Measurement schedule" refers to the specific measurement data that was collected for a specific resource (e.g. it refers to the Free Memory measurement data for the specific Linux platform resource named "myhost").
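The four relationships above can be sketched as plain Java classes. These are hypothetical, heavily simplified classes written only to illustrate the model; they are not RHQ's actual domain classes, and the field names are made up. The key point is that creating a Resource gives it its own schedule for every definition on its type, so two platforms share one "Free Memory" definition but can have different collection intervals.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified classes; not RHQ's real domain model.
class MetricDefinition {                 // the "kind" of metric, defined per resource type
    final String name;
    final long defaultIntervalMinutes;
    MetricDefinition(String name, long defaultIntervalMinutes) {
        this.name = name;
        this.defaultIntervalMinutes = defaultIntervalMinutes;
    }
}

class ResourceType {                     // e.g. "Linux Platform"
    final String name;
    final List<MetricDefinition> metricDefinitions = new ArrayList<>();
    ResourceType(String name) { this.name = name; }
}

class MeasurementSchedule {              // one metric definition applied to one resource
    final MetricDefinition definition;
    final Resource resource;
    long intervalMinutes;                // per schedule, so each resource can differ
    MeasurementSchedule(MetricDefinition definition, Resource resource) {
        this.definition = definition;
        this.resource = resource;
        this.intervalMinutes = definition.defaultIntervalMinutes;
    }
}

class Resource {                         // e.g. the platform named "myhost"
    final String name;
    final ResourceType type;
    final List<MeasurementSchedule> schedules = new ArrayList<>();
    Resource(String name, ResourceType type) {
        this.name = name;
        this.type = type;
        // Each resource gets its own schedule for every definition on its type.
        for (MetricDefinition def : type.metricDefinitions) {
            schedules.add(new MeasurementSchedule(def, this));
        }
    }
    MeasurementSchedule schedule(String metricName) {
        for (MeasurementSchedule s : schedules) {
            if (s.definition.name.equals(metricName)) return s;
        }
        throw new IllegalArgumentException("no schedule for metric " + metricName);
    }
}
```

For example, after adding a "Free Memory" definition with a 30-minute default to a "Linux Platform" type and creating platforms A and B, setting B's schedule to 15 minutes leaves A's schedule at 30 minutes, while both schedules still point at the same definition.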


User Interface

How can I ignore an autodiscovered resource?

If your agent discovered a new platform and found a few resources that you do not want to take into inventory, you have to tell the RHQ Server to ignore those resources.

The first option is to select only the resources you want to import in the auto-discovery portlet and leave the unwanted ones unchecked. Resources that remain in the portlet are not imported. The disadvantage is that they keep showing up in the portlet, which can be confusing.

The other option is to select the resource you do not want to import and click "Ignore", so that it no longer shows up in the portlet. However, if you try this on a resource of a freshly discovered platform, it will fail. This is because the inventory is organized as a tree with the platform as its root: when a server or service is taken into the system (whether imported or ignored), it is attached below that root. If the platform has not yet been imported into the inventory, there is no root to attach the ignored resource to.

So, to ignore a server on a new platform, first import the platform while leaving that server unchecked. Once the platform has been imported successfully, select the server and click "Ignore".

From the above explanation you can see that it is not possible to ignore just a platform. If you want to ignore a platform, just do not run an agent on it.
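The tree rule explained above can be modeled in a few lines of Java. This is a hypothetical sketch of the behavior as described, not RHQ's actual inventory code; the class, enum and method names are invented. Both importing and ignoring require a committed parent, and a platform has no parent at all, so it can never be ignored.

```java
// Hypothetical model of the inventory tree described above; not RHQ's actual code.
class InventoryTree {
    enum Status { NEW, COMMITTED, IGNORED }

    static class Node {
        final String name;
        final Node parent;               // null for a platform (the tree root)
        Status status = Status.NEW;      // NEW = still shown in the auto-discovery portlet
        Node(String name, Node parent) {
            this.name = name;
            this.parent = parent;
        }
    }

    // A server or service can only be attached (imported OR ignored) below a committed parent.
    private static void requireCommittedParent(Node node) {
        if (node.parent != null && node.parent.status != Status.COMMITTED) {
            throw new IllegalStateException(
                node.name + ": import platform " + node.parent.name + " first");
        }
    }

    static void importResource(Node node) {
        requireCommittedParent(node);
        node.status = Status.COMMITTED;
    }

    static void ignore(Node node) {
        if (node.parent == null) {
            // There is no root above a platform to attach it to.
            throw new IllegalStateException(node.name + ": a platform cannot be ignored");
        }
        requireCommittedParent(node);
        node.status = Status.IGNORED;
    }
}
```

In this model, ignoring a server on a freshly discovered platform throws, while importing the platform first and then ignoring the server succeeds, which is the same two-step procedure the answer recommends.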


Server

How do I get debug messages from the RHQ Server?

You can edit the <server-install-dir>/jbossas/server/default/conf/jboss-log4j.xml configuration file to enable debug messages. Generally, you will want to just uncomment the org.rhq category, which sets its priority to DEBUG and emits debug messages for all RHQ subsystems to the log file. To emit debug messages for only a subset of the RHQ Server internals, uncomment only the categories you are interested in, or add your own. jboss-log4j.xml contains several commented-out categories, each with a comment briefly explaining what kind of debug messages to expect from it. You can also emit debug messages from third-party subsystems such as JBoss/Remoting and Hibernate; some of these categories are also already present, commented out, in jboss-log4j.xml.

After you make your changes to the jboss-log4j.xml file, save the file and the RHQ server will hot-deploy the changes within a few seconds. The debug messages can be found by examining the file <server-install-dir>/logs/rhq-server-log4j.log.

Note that by default the console window will not show the debug messages. This is because the log4j CONSOLE appender has a threshold at INFO. If you want your debug messages to also go to the console, you must change the CONSOLE appender's Threshold setting to DEBUG.

In some cases, you will want to get debug messages from the RHQ Server launcher scripts. To do this, you need to set the environment variable RHQ_SERVER_DEBUG to "true". Now when you start, the launcher scripts will output debug messages.

Most log files emitted by the RHQ Server are found in <server-install-dir>/logs.

How does RHQ integrate with external LDAP user repositories?

RHQ uses passwords to authenticate users. Authentication information, comprising user names and passwords, can be stored in an internal database (the default) or in an external LDAP repository. It is important to note that support for LDAP currently does not include storing attributes other than user names and passwords. In particular, authorization information such as roles used to control access to RHQ resources is persisted in the internal database.

  • Configuring RHQ to use LDAP for authentication
    1. In order to configure RHQ to use LDAP for authentication, navigate to the Server Configuration page (Administration>System Configuration>Settings). The following configuration parameters can be specified:
      1. URL of the LDAP server: This defaults to ldap://localhost on port 389 (or port 636 if the SSL option is selected). Do not use an "ldaps:" URL; select the "Use SSL" option instead.
      2. Username/Password: The username and password to connect to the LDAP server. The username is typically the full LDAP distinguished name (DN) of a manager user, e.g. "cn=Manager,o=JBoss,c=US".
      3. Search Base: Base of the directory tree to search for usernames and passwords while authenticating users, e.g., o=JBoss, c=US
      4. Search Filter: Any additional filters to apply when doing the LDAP search. This is useful if the population to authenticate can be identified via a given LDAP property, e.g., RHQUser=true
      5. Login Property: The LDAP property that contains the user name. Defaults to cn. If multiple matches are found, the first entry found is used.
      6. Use SSL: Provides the option to use SSL while communicating with the LDAP server.
    2. The configuration settings are captured and stored in the internal database. No attempt is made to validate the information at this point: any misconfiguration would be detected when a user attempts to log in to the GUI console.
  • Authenticating users via LDAP
    1. Once the RHQ Server has been configured to use LDAP for authentication, subsequent attempts to login to the GUI console result in requests to the LDAP server to validate users' credentials. Communication with the LDAP server is handled by a class that implements a JAAS Login Module for LDAP. The login module first searches the set of base directories for a matching username applying any search filters. If a matching name is found, a bind request specifying both the username and password is sent to LDAP to validate the credentials. Authentication is deemed successful if the bind request returns normally.
    2. Irrespective of whether LDAP is selected for authentication, the credentials of the root user rhqadmin are stored in the internal database. Stacking login modules makes for seamless authentication: when the LDAP option is selected, normal users are authenticated in LDAP and the root user rhqadmin is authenticated in the database.
  • Impact of user administration on the LDAP repository
    1. As stated earlier, RHQ uses LDAP only to perform credential validation. Auxiliary information about a user such as first/last name, phone number, email address, and roles is stored in the RHQ internal database. Furthermore, user administration actions performed in RHQ do not impact the LDAP repository. For instance, the LDAP repository is not populated with the username and password when a user is registered in RHQ. The user must be defined in the LDAP repository independently of RHQ administration, and is assumed to have credentials populated in the LDAP repository when he/she is ready to access the GUI console. In other words, RHQ uses the LDAP repository in read-only mode.
    2. One of the interesting features of LDAP integration in the product is the support for self-registration in RHQ available to those who are identified as potential RHQ users in the LDAP repository. One way of identifying RHQ users in the LDAP repository is to define attributes that can be specified in a search filter in the RHQ configuration, e.g., RHQUser=true. When such a user accesses the GUI console for the first time, he/she is first authenticated in the LDAP repository, and then redirected to the RHQ registration page to capture auxiliary information such as first/last name and email address. This alleviates the task of user registration for RHQ administrators, and reduces the likelihood of errors as information is entered directly by the registrants.
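The search-then-bind flow described above can be sketched with the JDK's JNDI LDAP provider. This is an illustrative Java sketch, not RHQ's login module: the class and method names are hypothetical, SSL is left out, and combining the Login Property clause with the Search Filter as `(&(cn=jdoe)(RHQUser=true))` is an assumption about how the two settings fit together. As the text describes, it binds as the manager user, searches under the Search Base, takes the first match, and then re-binds with that entry's DN and the user's password.

```java
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.NamingException;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;
import javax.naming.directory.SearchControls;
import javax.naming.directory.SearchResult;

class LdapSearchThenBind {
    // Escapes special characters in a filter value (RFC 4515).
    static String escape(String value) {
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\5c"); break;
                case '*':  sb.append("\\2a"); break;
                case '(':  sb.append("\\28"); break;
                case ')':  sb.append("\\29"); break;
                case '\0': sb.append("\\00"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Assumed combination of "Login Property" and "Search Filter".
    static String buildFilter(String loginProperty, String username, String extraFilter) {
        String userClause = "(" + loginProperty + "=" + escape(username) + ")";
        if (extraFilter == null || extraFilter.isEmpty()) return userClause;
        String extra = extraFilter.startsWith("(") ? extraFilter : "(" + extraFilter + ")";
        return "(&" + userClause + extra + ")";
    }

    static Hashtable<String, String> env(String url, String principal, String credentials) {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, url);
        env.put(Context.SECURITY_AUTHENTICATION, "simple");
        env.put(Context.SECURITY_PRINCIPAL, principal);
        env.put(Context.SECURITY_CREDENTIALS, credentials);
        return env;
    }

    static boolean authenticate(String url, String managerDn, String managerPassword, String searchBase,
                                String loginProperty, String extraFilter, String username, String password) {
        // Many servers treat an empty-password simple bind as an anonymous success, so refuse it.
        if (password == null || password.isEmpty()) return false;
        try {
            DirContext ctx = new InitialDirContext(env(url, managerDn, managerPassword));
            String userDn;
            try {
                SearchControls controls = new SearchControls();
                controls.setSearchScope(SearchControls.SUBTREE_SCOPE);
                NamingEnumeration<SearchResult> results =
                    ctx.search(searchBase, buildFilter(loginProperty, username, extraFilter), controls);
                if (!results.hasMore()) return false;
                userDn = results.next().getNameInNamespace();   // first match wins
            } finally {
                ctx.close();
            }
            // Authentication succeeds if binding as the user returns normally.
            new InitialDirContext(env(url, userDn, password)).close();
            return true;
        } catch (NamingException e) {
            return false;
        }
    }
}
```

With the example settings above, buildFilter("cn", "jdoe", "RHQUser=true") produces (&(cn=jdoe)(RHQUser=true)).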

RHQ does not currently check server certificates for LDAP over SSL, nor can it provide client side certificates to the LDAP server. However, developers should be able to customize RHQ to perform these tasks - please see http://jira.rhq-project.org/browse/RHQ-2064 for more information.

How can I specify command-line options for the Server JVM?

On UNIX

If you want to override the default max heap and permgen sizes, set them via the RHQ_SERVER_JAVA_OPTS environment variable, e.g.:

RHQ_SERVER_JAVA_OPTS="-Dapp.name=rhq-server -Xms256M -Xmx1024M
-XX:PermSize=128M -XX:MaxPermSize=256M
-Djava.net.preferIPv4Stack=true"
export RHQ_SERVER_JAVA_OPTS

Set all other JVM options via the RHQ_SERVER_ADDITIONAL_JAVA_OPTS environment variable, e.g.:

RHQ_SERVER_ADDITIONAL_JAVA_OPTS="-Dfoo=true"
export RHQ_SERVER_ADDITIONAL_JAVA_OPTS

On Windows

To pass additional JVM options, add "wrapper.java.additional.n" lines to <server-install-dir>\bin\wrapper\rhq-server-wrapper.inc (creating that file if necessary), e.g.:

wrapper.java.additional.12=-Xloggc:gc-log.txt
wrapper.java.additional.13=-XX:+HeapDumpOnOutOfMemoryError
wrapper.java.additional.14=-XX:HeapDumpPath=heap-dump.txt

How can I confirm my server's email/SMTP settings are correct?

Each server is configured to talk to a particular SMTP server. This configuration is found in the rhq-server.properties file:

# Email
rhq.server.email.smtp-host=localhost
rhq.server.email.smtp-port=25
rhq.server.email.from-address=rhqadmin@localhost

If you want to confirm that these settings are correct and the server can actually send emails successfully, log into the GUI as the "rhqadmin" user and go to the "test email" page located at http://<your-server>:7080/admin/test/email.jsp.
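If the test email page reports a failure, a lower-level check can tell you whether the configured SMTP host and port are reachable at all. The Java sketch below is a hypothetical helper, not part of RHQ: it reads the two rhq.server.email.* properties shown above and returns the first line the SMTP server sends (a working server normally greets with a line starting with "220"). It does not test relaying or authentication, so the test email page remains the end-to-end check.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class SmtpSettingsCheck {
    // Parses rhq-server.properties-style text (passed as a string for simplicity).
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Connects to the configured SMTP host/port and returns the server's first line.
    static String readGreeting(Properties props, int timeoutMillis) throws IOException {
        String host = props.getProperty("rhq.server.email.smtp-host", "localhost").trim();
        int port = Integer.parseInt(props.getProperty("rhq.server.email.smtp-port", "25").trim());
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            socket.setSoTimeout(timeoutMillis);
            BufferedReader in = new BufferedReader(
                new InputStreamReader(socket.getInputStream(), StandardCharsets.US_ASCII));
            return in.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        Properties props = parse("rhq.server.email.smtp-host=localhost\n"
                + "rhq.server.email.smtp-port=25\n");
        System.out.println("SMTP greeting: " + readGreeting(props, 5000));
    }
}
```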

When do Baselines auto-calculate?

Go to the Administration>SystemConfiguration>Settings page of the RHQ GUI, where you will see the Automatic Baseline Configuration Properties. Baseline Frequency determines how often baselines are calculated. The default is 3 days, which means a new set of baselines is calculated every 3 days (except for baselines set manually by the user, which remain pinned to the user's values). Baseline Dataset determines the minimum span of data that must have been collected for a measurement before its baseline is calculated; the default is 7 days. For example, when baselines are due to be calculated (every 3rd day by default), only measurements with data that is 7 days old or older get a baseline calculated; measurements that do not yet have data from 7 days ago are skipped. This ensures that each calculated baseline is based on a representative set of data (by default, 7 days' worth).
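The two settings combine as two separate checks: one decides when a round of calculations runs, the other decides which measurements take part in that round. The Java below is a hypothetical sketch of the rules as described above, not RHQ's actual baseline code; whether RHQ treats the boundaries as inclusive is an assumption.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch of the baseline rules described above; not RHQ's actual code.
class BaselinePolicy {
    final Duration frequency;   // "Baseline Frequency" (default 3 days)
    final Duration dataset;     // "Baseline Dataset" (default 7 days)

    BaselinePolicy(Duration frequency, Duration dataset) {
        this.frequency = frequency;
        this.dataset = dataset;
    }

    static BaselinePolicy defaults() {
        return new BaselinePolicy(Duration.ofDays(3), Duration.ofDays(7));
    }

    // Is it time for a new round of baseline calculations?
    boolean isRoundDue(Instant lastRound, Instant now) {
        return !now.isBefore(lastRound.plus(frequency));
    }

    // Within a round: skip user-pinned baselines and measurements whose oldest
    // collected data point is not yet at least `dataset` old.
    boolean shouldCalculate(Instant oldestData, boolean pinnedByUser, Instant now) {
        if (pinnedByUser) return false;
        return !oldestData.isAfter(now.minus(dataset));
    }
}
```

With the defaults, a round last run 3 days ago is due again, and within that round a measurement whose oldest data is 8 days old gets a baseline while one with only 5 days of data is skipped.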

I deleted a Platform from inventory. How do I get it to be rediscovered so I can re-import it?

Just force an Agent discovery by issuing the following command at the Agent command prompt:

> discovery -f

Alternatively, restart the agent on the machine corresponding to the Platform you deleted, specifying the -l option. The agent registers itself again and the Platform is rediscovered. That is, first quit the agent (using the 'quit' command), then run:

<agent-install-dir>/bin/rhq-agent.sh -l

My server machine does not have a writable directory called /var/run. How can I get my rhq-server.sh script to successfully write out its pidfile?

Set the environment variable RHQ_SERVER_PIDFILE_DIR to a full path of the directory where you want the pidfile to get stored. When you run the script, that variable's value will override the default location. If you have an older script (2.1 or older), directly edit rhq-server.sh and change /var/run to the directory that you want.

In 2.2 and later, the default location of this pidfile has changed: it is now written to the bin/ directory under the server install directory.

When I try to start the server, I get an exception whose cause says "Exception creating identity" and the server fails to start. How can I fix this?

The message you are probably referring to looks something like this:

Caused by: java.lang.RuntimeException: Exception creating identity: my.host.name.com: my.host.name.com
        at org.jboss.remoting.ident.Identity.get(Identity.java:211)

This is not RHQ-specific; it is JBoss/Remoting failing. See https://jira.jboss.org/jira/browse/JBREM-769. The underlying cause (which is hidden from you because JBoss/Remoting does not bubble up the real error message, as described in that JIRA) is typically that your hostname is not resolvable. Make sure the hostname reported in the exception message (e.g. "my.host.name.com") is valid and that your machine can resolve it (i.e. is it in /etc/hosts? Can you get an IP address for it via nslookup?).
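Besides nslookup, you can ask the JVM itself whether it can resolve the name, which also takes /etc/hosts into account. The tiny Java program below is a hypothetical diagnostic helper, not something that ships with RHQ, and it is not the code path JBoss/Remoting uses internally; it simply calls InetAddress.getByName on the name you pass in.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class HostnameCheck {
    // Returns the IP address the name resolves to, or null if it cannot be resolved.
    static String resolve(String hostName) {
        try {
            return InetAddress.getByName(hostName).getHostAddress();
        } catch (UnknownHostException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Pass the hostname from the exception message, e.g. my.host.name.com.
        if (args.length == 0) {
            System.err.println("usage: java HostnameCheck <hostname>");
            return;
        }
        String ip = resolve(args[0]);
        System.out.println(ip != null
            ? args[0] + " resolves to " + ip
            : args[0] + " does NOT resolve; fix /etc/hosts or DNS");
    }
}
```

Run it with the same JVM and on the same machine as the Server, so that it sees the same name resolution configuration.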

My server logs are showing the message "Have not heard from agent ... Will be backfilled since we suspect it is down". What does that mean?

When you see

[org.rhq.enterprise.server.core.AgentManagerBean] Have not heard from agent [<some agent name>]
since [<some date/time>]. Will be backfilled since we suspect it is down

it means that the agent did not send its availability report within the required amount of time (called the "agent quiet time", which is 15 minutes by default and configurable on the Administration>SystemConfiguration>Settings page). When this happens, the server suspects that the agent is down and "backfills" the availability of ALL resources managed by that agent to DOWN (you'll see the availabilities turn RED).

This can happen for a number of reasons:

  1. the agent really did shut down or crash
  2. the machine the agent is running on completely shut down or crashed
  3. the network between the agent and server went down, preventing the agent from connecting to the server and sending the availability report
  4. the machine the agent is running on is bogged down, slowing the agent so much that it cannot send its reports fast enough
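Whatever the reason, the server-side rule itself is simple: compare each agent's last availability report against the quiet time. The Java below is a hypothetical sketch of that rule, not RHQ's server code; the map of last-report times is an invented input, and whether the comparison is strict is an assumption.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the quiet-time rule described above; not RHQ's server code.
class AgentQuietTimeCheck {
    static final Duration DEFAULT_QUIET_TIME = Duration.ofMinutes(15);

    // Returns (in name order) the agents whose last availability report is older than
    // the quiet time; the server would backfill all of their resources to DOWN.
    static List<String> suspectedDown(Map<String, Instant> lastReport, Instant now, Duration quietTime) {
        List<String> suspects = new ArrayList<>();
        for (Map.Entry<String, Instant> entry : new TreeMap<>(lastReport).entrySet()) {
            Duration silence = Duration.between(entry.getValue(), now);
            if (silence.compareTo(quietTime) > 0) {
                suspects.add(entry.getKey());
            }
        }
        return suspects;
    }
}
```

An agent last heard from 20 minutes ago is reported as suspected down under the 15-minute default, while one heard from 5 minutes ago is not.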

What ports do I have to be concerned about when setting up a firewall between servers and agents?

(note: the following refers to out-of-box defaults. If you configured your RHQ Servers or RHQ Agents to use different ports than their out-of-box defaults, you'll obviously want to use your custom port numbers.)

The agent is configured to talk to one or more RHQ Server endpoints, and each endpoint includes a port. This port is typically 7080 (non-secure) or 7443 (SSL-secured). When the agent starts up, it tries to communicate with an RHQ Server over one of those two ports (an agent only needs one of them to talk to a given RHQ Server; which one depends on the transport used). If your server expects the "servlet" transport, communication is unsecured over port 7080 by default; if it uses the "sslservlet" transport, communication is SSL-secured over port 7443 by default.

That's the only port from agent to server.

When the server needs to talk to the agent, it connects to the bind port the agent was configured with. Out of the box, that is 16163, regardless of whether the server talks to the agent over the socket or sslsocket transport (in other words, whether or not the connection is secured, the default port is 16163).

That's the only port from server to agent.

Of course, the server needs to talk to its database too - so make sure any ports that your database requires to be open are actually open to the server - otherwise, the server will not be able to talk to the database.

Note: servers do not talk to one another directly - there are no ports required for server-to-server links because there is no communication like that going on.

So in summary:

server -> agent : port 16163
agent -> server : port 7080 (unsecure) OR 7443 (secure)
server -> database : specific to your database and its configuration
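To verify a firewall rule from the right side, you can attempt a plain TCP connection to each port in the summary. The Java program below is a hypothetical helper, not an RHQ tool: run it on the agent machine against the server (7080 or 7443) and on the server machine against the agent (16163). A successful connection only shows that the port is open; the agent's port accepts connections only while the agent is running.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class FirewallPortCheck {
    // True if a TCP connection to host:port can be opened within the timeout.
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {   // refused, timed out, unresolvable host, ...
            return false;
        }
    }

    public static void main(String[] args) {
        // Usage: java FirewallPortCheck <host> <port> [<port> ...]
        if (args.length < 2) {
            System.err.println("usage: java FirewallPortCheck <host> <port>...");
            return;
        }
        for (int i = 1; i < args.length; i++) {
            int port = Integer.parseInt(args[i]);
            System.out.println(args[0] + ":" + port + " -> "
                + (canConnect(args[0], port, 3000) ? "open" : "NOT reachable"));
        }
    }
}
```

For example, on an agent machine, java FirewallPortCheck <your-server> 7080 7443 shows which of the two server ports the firewall lets through.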

I installed the Server as a Windows Service, but it is failing to start with no error messages. How can I start the Server as a Windows Service?

You probably installed the Server to run as the "Local System Account", and that account may not have the permissions needed to run the Server. Perhaps your machine has been locked down for security reasons, and the Local System Account cannot access the network, run Java, or perform some other required task. To solve this, create a user on your Windows box that can run the Server properly (to test this, log in as that user and execute "rhq-server.bat console" to confirm the user can run the Server). Then install the Server as a Windows Service with the RHQ_SERVER_RUN_AS_ME environment variable set to "true":

rhq-server.bat remove
set RHQ_SERVER_RUN_AS_ME=true
rhq-server.bat install

For more information on installing the Server as a Windows Service, see Installing and Running as a Windows Service.


Agent

How do I get debug messages from the RHQ Agent?

The easiest and quickest way to make your agent log debug messages is to set the environment variable RHQ_AGENT_DEBUG to "true" before starting the RHQ Agent. When you then start the agent, both the launcher scripts and the agent itself output debug messages. With this environment variable set, the agent uses an internal log4j configuration file called "log4j-debug.xml", located in the agent's main jar file.

If you want more fine-grained control over which log4j categories have DEBUG priority, you can directly edit the conf/log4j.xml file (the agent must be restarted to pick up changes to this file). Do not set RHQ_AGENT_DEBUG if you want the agent to use this log4j.xml file: setting that environment variable causes the agent to override log4j.xml with the internal log4j-debug.xml file, which enables all categories at the DEBUG level.

The log messages can be found in the log files located in the <agent-install-dir>/logs directory. If you are launching the RHQ Agent on Windows using the service wrapper, you must set RHQ_AGENT_DEBUG and then install the service via rhq-agent-wrapper.bat install.

If you want to enable or disable debug messages while the agent is still running, you can use the "debug" prompt command (type "help debug" at the agent prompt for more info).

You can write your own log4j.xml files, put them in conf/ and activate them via the debug -f command, for example debug -f custom-log4j.xml. This means you can switch between log4j.xml files while the agent is running by simply passing the file you want to debug -f. The agent jar also ships with log4j-warn.xml, which you can use if you want the agent to be especially quiet (only WARN and above messages are logged; INFO and below are not).

For example, at runtime you can invoke debug -f log4j-debug.xml to "turn on debugging" while the agent is still running. When you are done debugging, invoke debug -f log4j.xml to switch the agent back to the default log4j configuration without having to shut down and restart the agent. You can get fancy with your own log4j XML files: if, for example, you only want to enable debug messages for your own plugin, write your own log4j.xml, put it in conf/ and switch between that configuration and the default one, all without having to recycle the agent.

How do I start the RHQ Agent fresh, as if newly installed?

If you want the agent to clean out all previous inventory and force itself to re-register with the server, shut down the agent and restart it with the --cleanconfig command line option. When you do this, you may also pass the --config argument to have it start up with a configuration file you specify (otherwise, the default conf/agent-configuration.xml is used). -l is an alias for --cleanconfig and -c is an alias for --config, so your command line can look like the examples below (on Windows, replace .sh with .bat):

rhq-agent.sh -l

or

rhq-agent.sh -l -c my-agent-configuration.xml

where both will clean the old configuration, but the first loads the default agent-configuration.xml and the second loads your custom my-agent-configuration.xml (it will look in the conf/ directory, unless you specify a full path to a location other than conf/).

My resources went "red" after starting the agent with -u / --purgedata or -l / --cleanconfig

If you purge the persisted data that the Agent maintains, you must also reset the "connection properties" for each resource that Agent is managing. If your resource had manually overridden connection properties (ones that you used the web console to set), then you will need to set those again. To ease the burden of doing this, consider creating compatible groups for these resources; this will enable you to set the connection properties across all members in the group at the same time.

In RHQ 2.1 and later, this is no longer an issue, since connection properties are synchronized from the Server after an Agent's data files are purged.

How can I update the plugins on all my agents?

When you add a new plugin to your system, or you upgrade an existing plugin, you normally want to tell all of your agents to update their existing plugins with the new plugin versions. You can individually do this by executing the prompt command "plugins update" at any agent prompt. Or you can individually execute the operation "Update All Plugins" from the UI's Operation Tab for each "RHQ Agent" resource. If you want to update all of your agents so they all download the latest plugins, you can use the DynaGroup feature along with the Group Operation feature to do this. First, create a DynaGroup with the expression:

resource.type.plugin = RHQAgent
resource.type.name = RHQ Agent

This creates a compatible group that dynamically adds all RHQ Agents as members to that group. Note that if you already have a compatible group with your agents as members, you can skip this group creation step.

Next, navigate to the compatible group that contains all your agents. You should see an Operations tab; from there, invoke the "Update All Plugins" operation on the group. This tells every agent in the group to update its plugins. Once the group operation completes, all of your agents will have the latest versions of all plugins.

How can I change the agent name after it has already been registered?

When you start the agent for the first time, the first setup question asked is for the "agent name". This is a name that must be unique across all agents in your environment. Once registered you cannot change this name. Anytime you attempt to re-register this agent, you must re-register it with the same name that it was registered under before.

Note that this "agent name" is not the same as the "RHQ Agent resource name" that you see in the UI. If you import an RHQ Agent resource into inventory, that resource's name will be something like "agentname RHQ Agent", where "agentname" is the agent name you provided at agent setup time. This RHQ Agent resource name can be changed by editing its value within the Inventory tab. Changing it does not change the name the agent is registered under; your agent is still registered under its original agent name.

I want to run agents on all my machines, but only one starts OK - the rest fail due to binding to a wrong address

If you want to run multiple agents, but many fail to start with this error:

FATAL [main] (org.jboss.on.agent.AgentMain)-
{AgentMain.startup-error}The agent encountered an error during startup
and must abort java.net.BindException: Cannot assign requested address

then there are a couple of things you need to consider.

First, if you changed your agent-configuration.xml manually (say, to change IP addresses), did you do that after you initially set up the agent? The agent's configuration XML file is not referenced after the agent is set up, because the configuration is persisted using Java Preferences (this lets the agent survive agent updates or re-installs without losing its configuration). If you want changes to the agent's configuration file to be picked up, restart the agent and pass it the --config command line option (or -c, its shorthand). This tells the agent to re-read the configuration file and make it the agent's configuration, overriding any configuration it persisted before.

The other question to ask is: is your home directory stored on NFS? If so, you are probably picking up the same Java Preferences on all your machines (see $HOME/.java, the default location where Java stores Java Preferences on UNIX; on Microsoft Windows, preferences are stored in the registry, so this might not be relevant if you are on Windows). If you run the agents as the same user and that user's home directory is shared (via NFS or some other sharing technology), one solution is to have your agents use different Java Preferences names.

Each time you start your agents, you need to tell them where to find their preferences. You pass the agent its preference node name via --pref (or its shorthand, -p). Each agent must have its own preference node name; on UNIX, for example, you could use `hostname` as its value.

Read the comments at the top of agent-configuration.xml; they contain relevant information. You can also read the usage help: rhq-agent.sh --help.

If you are using RHQ v1.1 agents, you must edit your agents' agent-configuration.xml files and change their Java preferences node names from "default" to something else that makes them unique across all agents. For example, change:
<node name="default">

to

<node name="another-agent-default">

Since you changed the configuration file, don't forget to also pass -c the first time you restart the agent:

rhq-agent.sh -p another-agent-default -c agent-configuration.xml

Thereafter, you need only pass -p each time you restart the agent (the new configuration is persisted for you).

If you do not want to be forced to edit your configuration files or pass the -p option, the other alternative is that you can define the system property java.util.prefs.userRoot to point to some other, unique, location (e.g. /etc/rhq-agent-prefs). When the agent starts, Java will use the value of that system property as the location where it will store its Java Preferences. You set this system property on the agent via the environment variable RHQ_AGENT_ADDITIONAL_JAVA_OPTS. When you set that environment variable, rhq-agent.sh will add its value to the default set of Java options when passing in options to the agent's Java VM:

export RHQ_AGENT_ADDITIONAL_JAVA_OPTS="-Djava.util.prefs.userRoot=/etc/rhq-agent-prefs"
rhq-agent.sh
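Why the userRoot property helps can be sketched in Java. Assumption: on UNIX, the JDK's default file-based Preferences implementation keeps user preferences in a .java/.userPrefs directory under the path named by java.util.prefs.userRoot, falling back to user.home when that property is not set. This mirrors the JDK's behavior as we understand it; it is not RHQ code, and the class name is hypothetical. Agents that share a home directory therefore share that directory unless userRoot (or the preference node name) differs.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class UserPrefsLocation {
    // Assumed rule: java.util.prefs.userRoot if set, else user.home, plus ".java/.userPrefs".
    static Path userPrefsDir(String userRootProperty, String userHome) {
        String base = (userRootProperty != null) ? userRootProperty : userHome;
        return Paths.get(base, ".java", ".userPrefs");
    }

    public static void main(String[] args) {
        // Prints the directory this JVM would use under the assumed rule.
        System.out.println(userPrefsDir(
            System.getProperty("java.util.prefs.userRoot"),
            System.getProperty("user.home")));
    }
}
```

With -Djava.util.prefs.userRoot=/etc/rhq-agent-prefs, the rule above yields /etc/rhq-agent-prefs/.java/.userPrefs instead of a directory under the shared home.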

When starting the Agent via a Windows service, the Agent fails to start, and I see the error "java.lang.IllegalStateException: The name of this agent is not defined - you cannot start the agent until you give it a valid name" in the Agent wrapper log file. What does this mean?

The Agent cannot ask for its initial setup configuration when installing as a Windows service (because there is no console for the user to see and answer the prompts). This means that you need to either preconfigure the agent or run the agent in standard (non-service) mode once as the user that should run the service in order to answer the setup questions and configure it before installing it as a service.

My Agent setup is correct but my Agent is getting "Cause: org.jboss.remoting.CannotConnectException: Can not connect http client invoker."

Starting in RHQ 1.1, the Server information defined in your Agent setup is used only for initial contact with an RHQ Server (i.e. the server hostname/IP address you provide at the agent setup prompts is used only when initially registering with the server).

Since RHQ 1.1 supports a multi-Server "High Availability Cloud", the Agent may be serviced by any Server in your RHQ Server network. The Agent will try to connect to any Server in the cloud, and it does so via that Server's endpoint as defined at Server install time or on the RHQ GUI's server details pages (Administration>HighAvailability>Servers).

This error is typically seen when the Server's endpoint address is not set to something that can be resolved by the Agent. The Public Endpoint Address set for each Server must be resolvable by every RHQ Agent.

Check your Server endpoint information via the GUI's HA Administration page and update if necessary. After the update, restart your Agent.
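Because every Agent must be able to resolve each Server's Public Endpoint Address, a quick check is to resolve those addresses from the Agent machine itself. This sketch only uses java.net.InetAddress; the class name is hypothetical, and a successful lookup only shows that the name resolves, not that the Server port is reachable through any firewalls.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical diagnostic: checks that Server endpoint addresses resolve from this machine. */
public final class EndpointResolutionCheck {

    /** Returns true if the host name or IP address resolves; blank input counts as unresolvable. */
    public static boolean resolves(String endpoint) {
        // InetAddress.getByName("") would return the loopback address, which is not a real endpoint.
        if (endpoint == null || endpoint.trim().isEmpty()) {
            return false;
        }
        try {
            InetAddress address = InetAddress.getByName(endpoint.trim());
            return address != null;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Pass each Server's Public Endpoint Address as an argument; run this on the Agent machine.
        for (String endpoint : args) {
            String status = resolves(endpoint) ? "resolvable" : "NOT resolvable";
            System.out.println(endpoint + ": " + status);
        }
    }
}
```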

My agent machine does not have a writable directory called /var/run. How can I get my rhq-agent-wrapper.sh script to successfully write out its pidfile?

Set the environment variable RHQ_AGENT_PIDFILE_DIR to the full path of the directory where you want the pidfile stored. When you run the script, that variable's value overrides the default location. If you have an older script (2.1 or older), edit rhq-agent-wrapper.sh directly and change /var/run to the directory you want.

Explain how the agent scans for resources

When the agent performs discovery, it does so using two different types of "scans" to try to find resources.

A "server scan" detects top-level servers that run on your platform - things like JBossAS servers, Postgres servers and the like. These scans run by default every 15 minutes. The setting that controls this is "rhq.agent.plugins.server-discovery.period-secs".

A "service scan" detects lower-level, more fine-grained services running in top-level servers that have already been detected and imported: things like EJBs running in JBossAS, tables in Postgres databases or VHosts in Apache. These scans run by default every 24 hours (i.e. 1 day). You must have already imported the servers into inventory before their services can be discovered! These scans are normally very "expensive" to perform since they probe inside the managed resource, so we don't run them often (which is why the default is 24 hours). The setting that controls this is "rhq.agent.plugins.service-discovery.period-secs".

The above two types of scans are "discovery" scans: they attempt to discover new resources that the agent does not yet have in its inventory but that you might want to manage. There is also a third type of scan, the "availability scan". An availability scan is not a discovery scan, but it is very important to understand what it does. When the agent performs an availability scan, it tries to determine the availability of resources that are already discovered and committed to inventory (i.e. resources previously found by one of the two discovery scans mentioned above). These availability scans run by default every 5 minutes; the setting that controls this is "rhq.agent.plugins.availability-scan.period-secs". After an availability scan completes, the agent has an up-to-date status of which resources are UP or DOWN, and it sends an "availability report" to the server. This is how the server knows which resources should currently be displayed as UP or DOWN (aka "green" or "red").

Note that this availability report serves a second purpose: it informs the server of the agent's own availability. In other words, when the server receives an availability report from agent A, not only does the server now know the UP or DOWN status of that agent's managed resources, but it also implicitly knows that agent A itself is UP. Receiving the report thus resets the clock on that agent's "quiet time", which the server uses to determine when it should suspect that the agent is DOWN. For example, if the "max agent quiet time" server setting is 10 minutes and the server hasn't received an availability report from agent A in over 10 minutes, the server will suspect that agent A is DOWN (which has the side effect of causing the server to "backfill" all of agent A's managed resources to the availability status of DOWN).
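The three scan types, their controlling preferences and the default periods given above can be summarized in a small table-like enum. This is a hypothetical sketch for reference only: the enum is not part of RHQ, and the defaults are the ones stated in this FAQ, so check your own agent's configuration if your version differs.

```java
import java.time.Duration;

/** Hypothetical summary of the three agent scan types and their default periods as given in this FAQ. */
public enum AgentScanType {
    SERVER_DISCOVERY("rhq.agent.plugins.server-discovery.period-secs", Duration.ofMinutes(15), true),
    SERVICE_DISCOVERY("rhq.agent.plugins.service-discovery.period-secs", Duration.ofHours(24), true),
    AVAILABILITY("rhq.agent.plugins.availability-scan.period-secs", Duration.ofMinutes(5), false);

    private final String preferenceName;
    private final Duration defaultPeriod;
    private final boolean discovery;

    AgentScanType(String preferenceName, Duration defaultPeriod, boolean discovery) {
        this.preferenceName = preferenceName;
        this.defaultPeriod = defaultPeriod;
        this.discovery = discovery;
    }

    /** The agent preference that controls this scan's period. */
    public String preferenceName() {
        return preferenceName;
    }

    /** The default period in seconds, i.e. the unit the "*.period-secs" preferences use. */
    public long defaultPeriodSecs() {
        return defaultPeriod.getSeconds();
    }

    /** Discovery scans look for new resources; the availability scan only checks inventoried ones. */
    public boolean isDiscovery() {
        return discovery;
    }

    public static void main(String[] args) {
        for (AgentScanType type : values()) {
            System.out.printf("%-18s %-50s %6d s%n", type, type.preferenceName(), type.defaultPeriodSecs());
        }
    }
}
```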

How can I see the agent persisted configuration?

The agent's configuration is initially read from agent-configuration.xml and overlaid with the values you enter at the setup prompts at startup. After the agent is initially configured, it persists that configuration and never looks at agent-configuration.xml again (unless you clear the configuration). The actual location on the file system where the configuration is persisted is platform dependent; for example, on UNIX it's typically "$HOME/.java" (see the Java Preferences API documentation for more information on how and where Java persists preferences). For more details, read the comments at the top of the agent-configuration.xml file. The Configure the RHQ Agent and Preconfiguring the Agent pages also have more information on this.

There are several ways in which you can view the agent's persisted configuration.

  1. If the agent is in your RHQ inventory, simply go to your agent's Config tab to view its live configuration (this is the same configuration that is persisted).
  2. If the agent is currently running in non-daemon mode (i.e. you have the agent prompt on your console), you can use the "getconfig" or "config" prompt commands to view the live configuration. Type "help getconfig" or "help config" for more information.
  3. If the agent is in your RHQ inventory, you can execute the "Execute Prompt Command" operation and invoke the "getconfig" prompt command to view one or more preferences.
  4. Because the agent configuration is stored in the standard Java Preferences API backing store, you can use any tool that can examine Java Preferences. One such tool is the Java Preferences Tool, a GUI tool that gives you a "file system" like view into your Java Preferences. The agent preferences are stored in the "User" preferences node under the node name "rhq-agent". Depending on the -p option passed to the agent when it is started, the actual configuration settings are found in a sub-node under "rhq-agent". The default preferences node is called "default", so your agent's persisted configuration is typically found in the user preferences under "rhq-agent/default". WARNING! Do not attempt to change the values of the preferences using third-party tools like this unless you know what you are doing; you could render the agent useless if you change the wrong preference to the wrong value. Use this mechanism only to view your agent's configuration.
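Option 4 can also be scripted with the standard java.util.prefs API instead of a GUI tool. The sketch below prints every key under "rhq-agent/<name>" (the -p value, or "default" if none was given). The class name is hypothetical, it assumes you run it as the same OS user as the agent, and, in keeping with the warning above, it never writes a preference.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.prefs.BackingStoreException;
import java.util.prefs.Preferences;

/** Hypothetical read-only dump of an agent's persisted configuration from Java user preferences. */
public final class DumpAgentPrefs {

    /** Returns all key/value pairs under "rhq-agent/<nodeName>", sorted by key; empty if the node is absent. */
    public static Map<String, String> read(String nodeName) {
        String path = "rhq-agent/" + nodeName; // relative to the user root
        Map<String, String> result = new TreeMap<>();
        try {
            Preferences root = Preferences.userRoot();
            if (!root.nodeExists(path)) {
                return result; // do not create the node by accident
            }
            Preferences node = root.node(path);
            for (String key : node.keys()) {
                result.put(key, node.get(key, ""));
            }
        } catch (BackingStoreException e) {
            throw new IllegalStateException("Cannot read Java Preferences backing store", e);
        }
        return result;
    }

    public static void main(String[] args) {
        // Same value you pass with -p; "default" when the agent is started without -p.
        String nodeName = args.length > 0 ? args[0] : "default";
        read(nodeName).forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```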

How can I get a dump of inventory information from an agent running on another machine?

The use-case here is that someone (call him "the customer") is running an agent in their environment and is having problems. You suspect the customer's agent inventory is corrupted somehow. As a developer, you would like to know exactly what the agent thinks is in its inventory so you can debug the problem.

To get this information, you must get the customer's agent "data/inventory.dat" file. Copy that file to your local machine (it doesn't matter what directory you put it in). Now, run your own agent on your own local machine - make sure you run that agent with the same plugins that the customer was running with. The agent doesn't necessarily have to be connected to a server, but the plugin container must be started (that means the agent has to have been registered). Now, execute this agent prompt command:

inventory --xml --export=/customer-inventory.xml /the/customer/inventory.dat

where /the/customer/inventory.dat is the full path to where you copied the customer's inventory.dat file. If you do not specify the --export option, the XML is simply dumped to the stdout console window; otherwise, the XML is stored at the full path you specify.
Now you have an XML file that describes what the customer's agent thinks is its inventory.

I need to change the IP Address of my agent machine - how do I keep my server and agent up to date with that change?

The agent has a configuration preference named "rhq.communications.connector.bind-address" whose value is that of the IP address the agent binds to when it starts its server socket (the thing it listens to for incoming messages from the server).

If you change the agent's IP address (and invalidate the old agent IP address), you have to do a couple of things:

  1. You have to change the agent's configuration so that preference value is the same as the new IP address. You can do this by issuing a setconfig command at the agent prompt: setconfig rhq.communications.connector.bind-address=<the new IP address>. (NOTE: editing agent-configuration.xml will not make this change take effect; please read and understand the comments at the top of agent-configuration.xml before you change that configuration file.) If your agent is running in the background as a daemon process, you'll have to shut it down via rhq-agent-wrapper.sh/bat stop and restart it via "rhq-agent.sh" so that you have an agent prompt.
  2. Restart the agent once you change the IP address preference value.

Once the agent is restarted, it will use that new IP address.
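Before setting rhq.communications.connector.bind-address, it can help to confirm that the new IP address is actually configured on the agent machine. This sketch lists the addresses of the network interfaces that are up, using java.net.NetworkInterface; the class name is hypothetical, it is only a sanity check and not part of RHQ, and it compares addresses as exact strings.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

/** Hypothetical helper: lists the IP addresses bound to interfaces that are currently up. */
public final class LocalAddresses {

    /** Returns the textual addresses of all network interfaces that are up. */
    public static List<String> hostAddresses() {
        List<String> result = new ArrayList<>();
        try {
            Enumeration<NetworkInterface> interfaces = NetworkInterface.getNetworkInterfaces();
            if (interfaces == null) { // older JDKs may return null when no interfaces are found
                return result;
            }
            for (NetworkInterface nif : Collections.list(interfaces)) {
                if (!nif.isUp()) {
                    continue;
                }
                for (InetAddress address : Collections.list(nif.getInetAddresses())) {
                    result.add(address.getHostAddress());
                }
            }
        } catch (SocketException e) {
            throw new IllegalStateException("Cannot enumerate network interfaces", e);
        }
        return result;
    }

    /** True if the given IP (as you would type it for bind-address) is bound to an interface that is up. */
    public static boolean isLocalAddress(String ip) {
        return hostAddresses().contains(ip);
    }

    public static void main(String[] args) {
        if (args.length == 1) {
            String verdict = isLocalAddress(args[0]) ? " is" : " is NOT";
            System.out.println(args[0] + verdict + " configured on this machine");
        } else {
            hostAddresses().forEach(System.out::println);
        }
    }
}
```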

When I shutdown the agent, the RHQ Server takes more than 14 minutes to detect the agent was down. Can I configure it to not take so long?

When you shut down the agent entirely, it stops reporting any availability data at all to the server.

To handle cases like this (where the agent is completely down or unresponsive), the server periodically checks which agents it hasn't heard from in a long time and then determines which of these "suspect" agents are really down.

Read this for background on this issue: https://bugzilla.redhat.com/show_bug.cgi?id=RHQ-1098. That issue tells you why we increased the default time.

Read this for more information - it talks about the new default time: https://bugzilla.redhat.com/show_bug.cgi?id=RHQ-2349. It states, in part, "We have a quiet time of 15m right now (recently changed to that)."

What does this mean? It means, by default, if we have not heard from an agent in 15 minutes (what we call the agent's "quiet time"), only then do we mark that agent and all of its resources down. This is why it takes more than 14 minutes to detect your agent was down.

If you do not like that and want the agent reported "down" faster, you can change this; it's configurable in the GUI. Go to the main menu "Administration>SystemConfiguration>Settings" and change the setting "Agent Max Quiet Time Allowed" to something shorter. Note: the shorter your allowed quiet time interval is, the greater the possibility of a "false negative". For example, if you set quiet time to 5 minutes and your server can't process all your agents' availability reports fast enough, it may think it hasn't heard from an agent when in fact it just hasn't had time to process the latest availability report. When an agent is determined to be down, the server has to "backfill" it, marking all of its resources down, and this is expensive, so you don't want this to happen often.
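The quiet-time rule described above comes down to comparing the time since the agent's last availability report with the configured maximum. This is a minimal sketch of that comparison only, assuming a strict "more than" check; it is not the server's actual implementation, which also runs its check on its own periodic schedule.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical illustration of the agent "quiet time" rule described in this FAQ. */
public final class QuietTimeCheck {

    /** True if the server would suspect the agent is DOWN at time {@code now}. */
    public static boolean isSuspectedDown(Instant lastReport, Instant now, Duration maxQuietTime) {
        Duration quiet = Duration.between(lastReport, now);
        return quiet.compareTo(maxQuietTime) > 0;
    }

    public static void main(String[] args) {
        Duration maxQuiet = Duration.ofMinutes(15); // the default quiet time mentioned above
        Instant lastReport = Instant.parse("2009-06-01T12:00:00Z");
        for (int minutes : new int[] {5, 14, 16}) {
            Instant now = lastReport.plus(Duration.ofMinutes(minutes));
            System.out.println(minutes + " min quiet -> suspected down: "
                    + isSuspectedDown(lastReport, now, maxQuiet));
        }
    }
}
```

Lowering maxQuietTime toward the availability scan period leaves less slack for a report that is merely delayed, which is the false-negative risk described above.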


Log messages

What are "Command failed to be authenticated" messages?

Agents are assigned security tokens when they first register with the server. The token is one way an agent identifies itself with the server. If an agent does not identify itself with any token, or if it identifies itself with a wrong token, the server will deny access to that agent - in other words, the server will reject commands that come from that agent until that agent has properly registered. If an agent is continually causing "failed to be authenticated" errors on the server similar to this:

02:31:33,095 WARN [CommandProcessor] {CommandProcessor.failed-authentication}
Command failed to be authenticated! This command will be ignored and not processed:
Command: type=[identify]; cmd-in-response=[false]; config=[{}]; params=[null]

then it usually means the agent has been misconfigured, or it is an unknown agent attempting to identify itself as another agent. Restart your agent with the "--cleanconfig" command line option to clean out its configuration and re-register.

PLEASE NOTE: Do not rely on the security token mechanism as a way to protect your RHQ environment from intrusion. If you require secure communications between servers and agents, see the Securing Communications section to learn how to set up SSL for authentication and encryption.

What are "fail-safe cleanup" messages?

You'll often see messages in your logs that look like:

13:43:10,781 WARN [LoadContexts] fail-safe cleanup (collections) :
org.hibernate.engine.loading.CollectionLoadContext@103583b
<rs=org.postgresql.jdbc3.Jdbc3ResultSet@d16f5b>

Please ignore these messages as they are normal and expected. The messages deal with the underlying ORM technology used (Hibernate) and how it automatically cleans up after itself to prevent memory leaks.


Plugins

Platform Plugin

How can I collect syslog messages as RHQ Events?

The Linux platform plugin can collect syslog messages and emit them as RHQ events. The plugin can collect syslog messages either by reading syslog message files or by receiving them over a socket listener.

In either case, syslog messages must be in a format that RHQ can parse. You can either tell RHQ (in the platform's plugin configuration, aka connection properties) which regular expressions can parse your syslog messages, or you can configure syslog itself (e.g. in /etc/rsyslog.conf) to format messages in a way RHQ understands out of the box. In the latter case, if you define the syslog message format as below, the Linux platform plugin can parse it:

$template RHQfmt,"%timegenerated:::date-rfc3339%,%syslogpriority-text%,%syslogfacility-text%:%msg%\n"

If you then use "RHQfmt" in your syslog configuration so it writes messages out in that format, you'll be able to have RHQ understand the log messages fully. For example:

$template RHQfmt,"%timegenerated:::date-rfc3339%,%syslogpriority-text%,%syslogfacility-text%:%msg%\n"
*.* /var/log/messages-for-rhq;RHQfmt
*.* @@127.0.0.1:5514;RHQfmt

That will both write syslog messages to /var/log/messages-for-rhq and will send the messages over TCP to a listener on port 5514 (you would configure the platform's connection properties to listen to this port).
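To see how a line written with the RHQfmt template breaks down into fields, here is a small hypothetical Java parser. It assumes the RFC 3339 timestamp contains no commas and the facility name contains no colon, so the first colon after the facility starts the message; it is an illustration of the format, not the platform plugin's own parsing code.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeParseException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for lines written with the RHQfmt rsyslog template shown above. */
public final class RhqSyslogLine {

    // timestamp,priority,facility:message  (mirrors the RHQfmt template's field order)
    private static final Pattern RHQ_FMT = Pattern.compile("^([^,]+),([^,]+),([^:]+):(.*)$");

    public final OffsetDateTime timestamp;
    public final String priority;
    public final String facility;
    public final String message;

    private RhqSyslogLine(OffsetDateTime timestamp, String priority, String facility, String message) {
        this.timestamp = timestamp;
        this.priority = priority;
        this.facility = facility;
        this.message = message;
    }

    /** Returns the parsed line, or null if it does not match the RHQfmt layout. */
    public static RhqSyslogLine parse(String line) {
        Matcher m = RHQ_FMT.matcher(line);
        if (!m.matches()) {
            return null;
        }
        try {
            OffsetDateTime ts = OffsetDateTime.parse(m.group(1)); // RFC 3339 is ISO-8601 compatible
            // rsyslog's %msg% often starts with a space, so trim it.
            return new RhqSyslogLine(ts, m.group(2), m.group(3), m.group(4).trim());
        } catch (DateTimeParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        RhqSyslogLine l = parse("2009-04-15T13:43:10.781+02:00,err,daemon: disk full on /var");
        System.out.println(l.priority + " / " + l.facility + " / " + l.message + " @ " + l.timestamp);
    }
}
```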

JBossAS Plugin

Why does only 1 JBossAS server show "green" availability and all the rest show "red" even though I made sure all of my JNP credentials are configured properly in my resources' connection properties?

There is a problem in the way the JBossAS JNP client works. See RHQ-1030 for the full description of the problem, but in short, if you are managing multiple JBossAS servers on a single box, all of your security credentials for those servers must be the same (i.e. the JNP username and password must be the same).

Postgres Plugin

Why is the agent showing an error in my postgres discovery about authentication failed for user "postgres"?

The Postgres plugin attempts to log into the database server using the username and password of "postgres". In many installations, this is a default superuser and will work. However, it is also possible that this login could fail for a number of reasons:

  • The "postgres" user has been deleted.
  • The password for the "postgres" user has been changed.
  • On Linux, the administrative login has been set to "ident sameuser".

In many cases, this can be alleviated as follows:

  • Inventory the discovered Postgres resource. Its availability will show as down and it will not find any child resources.
  • Navigate to the inventory tab for the Postgres resource.
  • Under Connection Properties, click the Edit button.
  • Change the "role name" and "role password" fields to reflect a valid super user account on the Postgres instance.

Additionally, on Linux systems the Postgres configuration may need to be changed to allow password-based logins (i.e. "md5" vs. "ident sameuser" settings in the pg_hba.conf file). Consult the Postgres documentation for more details.
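Before entering a role in the Connection Properties, you can confirm that it can actually log in over JDBC. This is a minimal sketch, assuming the PostgreSQL JDBC driver jar is on the classpath (it is not bundled with this code) and that the server accepts TCP connections; the class name is hypothetical. If a known-good password still fails here, the pg_hba.conf settings mentioned above are a likely place to look.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

/** Hypothetical check that a Postgres role can log in before you enter it in Connection Properties. */
public final class PostgresLoginCheck {

    /** Builds a standard PostgreSQL JDBC URL. */
    public static String jdbcUrl(String host, int port, String database) {
        return "jdbc:postgresql://" + host + ":" + port + "/" + database;
    }

    /** Returns true if the role can open a working connection; prints the failure reason otherwise. */
    public static boolean canLogin(String url, String roleName, String rolePassword) {
        try (Connection connection = DriverManager.getConnection(url, roleName, rolePassword)) {
            return connection.isValid(5); // timeout in seconds
        } catch (SQLException e) {
            // E.g. a password or ident authentication failure, or "No suitable driver" if the jar is missing.
            System.err.println("Login failed: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length != 5) {
            System.err.println("Usage: java PostgresLoginCheck <host> <port> <database> <role> <password>");
            return;
        }
        String url = jdbcUrl(args[0], Integer.parseInt(args[1]), args[2]);
        System.out.println(canLogin(url, args[3], args[4]) ? "login OK" : "login FAILED");
    }
}
```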

Why are most of the metrics for my Postgres resource showing up as NaN?

In many installations, Postgres will not start its statistics collector by default. To enable statistics collection, add (or change) the following line in the postgresql.conf file:
stats_start_collector = on

How many database connections are necessary to monitor a Postgres database?

Each Postgres database inventoried in RHQ requires 1 connection.

Why can't I drop my database that is inventoried in RHQ?

Because it monitors availability and statistics frequently, the Postgres plugin keeps an open connection to the database. As a result, when you attempt to drop a database that is currently inventoried in RHQ, an error is thrown saying that the database is in use. To drop the database, either shut down the RHQ Agent monitoring the database or remove the database resource from RHQ. This closes the Postgres plugin's connection to the Postgres server and allows you to drop the database.

Apache Plugin

Where can I get the connectors?

The Apache plugin monitors an Apache Web Server via custom modules like the SNMP connector. You can download the open-source versions of these connectors and install them in your Apache Web Server.

Augeas-based Plugins

What is this augeas plugin?

The augeas plugin is an "abstract" plugin that exists solely as an extension point for other plugins to extend. The augeas plugin provides the Java JNI classes necessary for other dependent plugins to use to access the Augeas native library. For example, the opensshd plugin depends on the augeas plugin because it uses the Augeas library to access the OpenSSH daemon configuration. The other RHQ plugins known to use this augeas plugin are: hosts, grub and apt.

Why does my agent log have this in it: "java.lang.UnsatisfiedLinkError: Unable to load library 'augeas': libaugeas.so: cannot open shared object file: No such file or directory"

This occurs when you have deployed one or more augeas-based plugins but your Linux machine does not have the augeas native library installed. See http://augeas.net for more information on Augeas and how you can install it on your machine.


Troubleshooting

Installer fails on PostgreSQL with "Relation RHQ_Principal does not exist"

First make sure that the RHQ server / installer is allowed to connect to PostgreSQL. You should look at the PostgreSQL configuration file pg_hba.conf where the permissions are configured. If this is OK, and the installer is able to connect to the database, please check the PostgreSQL page for a workaround.

RHQ 1.0 has trouble starting on Java 6

Java 6 is not supported on RHQ versions earlier than 1.1; please use Java 5. RHQ 1.1 and up support Java 6.

The execution of a Script-resource fails on Unix

When I invoke the "Execute" operation on a Script resource, it immediately fails and I get an error saying that the script can not be executed.

Make sure that the execute bit is set on the script file. You can set it via chmod +x scriptname.
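If you prefer to check or fix the permission from Java (for example in a deployment tool), java.nio can set the POSIX execute bits. This is a hypothetical sketch for POSIX file systems only; it adds execute permission for owner, group and others, which is roughly what chmod +x does under the common 022 umask.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFilePermission;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper: makes script files executable on POSIX file systems. */
public final class MakeScriptExecutable {

    /** Adds the owner, group and others execute bits, keeping all existing permissions. */
    public static void addExecute(Path script) {
        try {
            Set<PosixFilePermission> perms = new HashSet<>(Files.getPosixFilePermissions(script));
            perms.add(PosixFilePermission.OWNER_EXECUTE);
            perms.add(PosixFilePermission.GROUP_EXECUTE);
            perms.add(PosixFilePermission.OTHERS_EXECUTE);
            Files.setPosixFilePermissions(script, perms);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (String arg : args) {
            Path script = Paths.get(arg);
            if (!Files.isExecutable(script)) {
                addExecute(script);
            }
            System.out.println(script + " executable: " + Files.isExecutable(script));
        }
    }
}
```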

Install fails on Oracle with ORA-01843

This issue happens when Oracle runs in a locale where the abbreviation for April is not 'APR' (as it is in the EN or DE locales). There are currently two workarounds:

  • put Oracle in a different locale (most of the time not wanted)
  • Edit one of the server distribution files before running the installer
    • remove the old server directory and unzip the install package again
    • go to $SERVER/jbossas/server/default/deploy/rhq-installer.war/WEB-INF/classes
    • edit db-data-combined.xml and change the few dates of the form 01-APR-08 so that the month abbreviation matches your locale
    • save the file
    • re-run the installer - choose to overwrite the database
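To locate the date literals you need to edit, you can scan the file for DD-MON-YY values. This hypothetical sketch only reports them (line number and literal) and leaves the editing to you, because the correct month abbreviation depends on your Oracle locale settings; the regular expression is an assumption based on the 01-APR-08 example above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: finds DD-MON-YY date literals such as 01-APR-08 in a text file. */
public final class DateLiteralFinder {

    // Two digits, three upper-case letters, two digits, as whole "words".
    private static final Pattern DD_MON_YY = Pattern.compile("\\b\\d{2}-[A-Z]{3}-\\d{2}\\b");

    /** Returns every DD-MON-YY literal found in the given text, in order. */
    public static List<String> find(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = DD_MON_YY.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "db-data-combined.xml";
        List<String> lines = Files.readAllLines(Paths.get(path), StandardCharsets.UTF_8);
        for (int i = 0; i < lines.size(); i++) {
            for (String literal : find(lines.get(i))) {
                System.out.println((i + 1) + ": " + literal);
            }
        }
    }
}
```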

When trying to monitor a JBoss EAP instance, I get the error "Connection failure Failed to authenticate principal=null, securityDomain=jmx-console"

As explained in the JBoss EAP documentation, the jmx-console is secured by default. Follow the instructions in the EAP Installation Guide to define a username and password. Then, in the RHQ GUI, go to the Inventory > Connection tab of the JBoss EAP Resource and set the username and password properties to the same values.

Also note that when starting a JBoss EAP instance without specifying a configuration parameter (-c), it will be started with the "production" configuration, as described in JBPAPP-198.

Why does my Apache SNMP module fail to start with the error ...?

"Syntax error on line 1376 of /etc/httpd/conf/httpd.conf: Unable to write to SNMPvar directory" (on stderr)

Please ensure the directory specified via the "SNMPVar" directive exists and is writable by the user that owns the Apache process.

"init_master_agent: Invalid local port (Permission denied)" (in the error_log file)

See if your Apache error_log contains a log message similar to "[notice] SELinux policy enabled; httpd running as context user_u:system_r:httpd_t:s0". If so, the SELinux (Security-Enhanced Linux) policy is preventing the httpd process from binding to the SNMP agent port (1610 by default). The easiest solution is to put SELinux in permissive mode by running the command "/usr/sbin/setenforce 0" as root and then restarting Apache. You should then see a message similar to "[notice] SELinux policy enabled; httpd running as context user_u:system_r:unconfined_t" in your error_log; note the "unconfined_t" portion, which indicates SELinux is no longer restricting the process.

When monitoring a JBAS instance, I'm not seeing any JVM resources beneath it?

In order for RHQ to discover JVM resources for a JBAS resource, the corresponding JBAS instance needs to be running on Java 5 or later, and it must have been started with the jboss.platform.mbeanserver system property set. For example, in UNIX-type environments, you can specify the following in the ${JBOSS_HOME}/bin/run.conf file:

JAVA_OPTS="$JAVA_OPTS -Djboss.platform.mbeanserver"

Note: With RHQ 1.0 and 1.0.1, if the system property com.sun.management.jmxremote is also specified, RHQ will not discover the JVM resources; removing this property allows those resources to be found. In later RHQ versions this restriction is lifted, and even if the system property com.sun.management.jmxremote is specified, JVM resources should still be added to the RHQ inventory.

How can I debug JDBC access and trace SQL?

Use log4jdbc.
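log4jdbc works as a wrapper around the real JDBC driver. Assuming the original log4jdbc project, you load its driver class net.sf.log4jdbc.DriverSpy and prefix the real JDBC URL with "jdbc:log4jdbc:"; SQL is then logged through SLF4J loggers such as jdbc.sqlonly and jdbc.sqltiming. The hypothetical helper below only performs the URL rewrite; where the database URL is configured in your RHQ version is not covered here.

```java
/** Hypothetical helper that rewrites a JDBC URL so that it goes through log4jdbc. */
public final class Log4jdbcUrl {

    /** Driver class of the original log4jdbc project (assumption: not the log4jdbc-log4j2 fork). */
    public static final String DRIVER_SPY_CLASS = "net.sf.log4jdbc.DriverSpy";

    private static final String JDBC_PREFIX = "jdbc:";
    private static final String LOG4JDBC_PREFIX = "jdbc:log4jdbc:";

    /** Turns "jdbc:<rest>" into "jdbc:log4jdbc:<rest>"; URLs that are already wrapped are returned as-is. */
    public static String wrap(String jdbcUrl) {
        if (jdbcUrl.startsWith(LOG4JDBC_PREFIX)) {
            return jdbcUrl;
        }
        if (!jdbcUrl.startsWith(JDBC_PREFIX)) {
            throw new IllegalArgumentException("Not a JDBC URL: " + jdbcUrl);
        }
        return LOG4JDBC_PREFIX + jdbcUrl.substring(JDBC_PREFIX.length());
    }

    public static void main(String[] args) {
        String url = args.length > 0 ? args[0] : "jdbc:postgresql://localhost:5432/rhq";
        System.out.println("driver: " + DRIVER_SPY_CLASS);
        System.out.println("url:    " + wrap(url));
    }
}
```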

How can I stop my agent from thinking the server keeps going up and down when the server has remained running the whole time?

If you see information like this in your agent logs:

INFO (org.rhq.enterprise.agent.AgentAutoDiscoveryListener)- {AgentAutoDiscoveryListener.server-offline}
The Agent has auto-detected the Server going offline [InvokerLocator
[servlet://server:7080/jboss-remoting-servlet-invoker
/ServerInvokerServlet?rhq.communications.connector.rhqtype=server]] -
the agent will stop sending new messages 
...
INFO (org.rhq.enterprise.agent.AgentAutoDiscoveryListener)- {AgentAutoDiscoveryListener.server-online}
The Agent has auto-detected the Server coming online [InvokerLocator
[servlet://server:7080/jboss-remoting-servlet-invoker
/ServerInvokerServlet?rhq.communications.connector.rhqtype=server]] -
the agent will be able to start sending messages now

it means the agent has auto-detected its server going down and back up. This auto-detection was done through the multicast detector (which is different from detection via polling, the second way the agent attempts to detect the server's status).

If you think the agent is erroneously detecting the server going up or down, it is possible your network does not support multicast traffic or the multicast network is acting abnormally. In either case, you should disable the agent multicast detector and just have the agent rely on polling to detect changes in the server status. To turn off the multicast detection, set the following agent preferences to false:

rhq.agent.server-auto-detection
rhq.communications.multicast-detector.enabled

Those are the actual Java Preference names; you may often see these in the user interface as the following:

Auto-Detect RHQ Server?
Multicast Detector Enabled?

Since you are disabling multicast detection, make sure you keep the polling detection feature enabled (i.e. rhq.agent.client.server-polling-interval-msecs should be larger than 0, typically 60000), otherwise, the agent will never be able to know when the server goes down.

Once you reconfigure the agent, you need to restart it so the communications subsystem can pick up the changes.
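The rule above (at least one detection mechanism must remain active) can be written as a simple check. This is a hypothetical sketch, assuming multicast detection is active only when both preferences are true and polling is active when the interval is greater than 0; it neither reads nor changes the agent's real configuration.

```java
/** Hypothetical check that an agent configuration still has a way to detect server status changes. */
public final class DetectionConfigCheck {

    /**
     * @param serverAutoDetection      value of rhq.agent.server-auto-detection
     * @param multicastDetectorEnabled value of rhq.communications.multicast-detector.enabled
     * @param pollingIntervalMsecs     value of rhq.agent.client.server-polling-interval-msecs
     */
    public static boolean canDetectServerStatus(boolean serverAutoDetection,
                                                boolean multicastDetectorEnabled,
                                                long pollingIntervalMsecs) {
        boolean multicast = serverAutoDetection && multicastDetectorEnabled; // assumption: both must be on
        boolean polling = pollingIntervalMsecs > 0;
        return multicast || polling;
    }

    public static void main(String[] args) {
        // Multicast turned off as recommended above, polling left at a typical 60000 ms.
        System.out.println(canDetectServerStatus(false, false, 60000L));
        // Both mechanisms off: the agent could never notice the server going down.
        System.out.println(canDetectServerStatus(false, false, 0L));
    }
}
```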

My Agent fails to start with "[: 207: ==: unexpected operator".

This is a known bug in RHQ 1.2/Jopr 2.2. There is a syntax error in rhq-agent.sh that causes the script to fail when executed by non-bash shells (e.g. /bin/sh on Solaris, HP-UX, or AIX). To fix the issue, edit rhq-agent.sh and change the "==" on line 207 to "=".

Why are the graphs and charts on the Monitor tab in the GUI not displayed?

If you see errors in the RHQ Server log such as:

 
java.lang.NoClassDefFoundError: Could not initialize class org.rhq.enterprise.gui.image.chart.ColumnChart

it is probably because you are missing some system fonts needed by Java to generate the text in the graphs/charts. If you are on Linux, make sure you have the urw-fonts package installed. On Fedora or RHEL, use:

yum install urw-fonts

If you are on another OS, make sure you have all the default fonts installed.
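A quick way to see whether Java can find any fonts at all on the server machine is to list the available font families in headless mode. This hypothetical sketch does not exercise RHQ's chart code; also note that Java typically lists its logical families (Dialog, DialogInput, Monospaced, SansSerif, Serif) even when few physical fonts are installed, so look for physical family names in the output.

```java
import java.awt.GraphicsEnvironment;
import java.util.Arrays;

/** Hypothetical diagnostic: lists the font families visible to Java, or reports that fonts failed to load. */
public final class FontCheck {

    /** Returns the font family names Java can see, or null if font initialization fails. */
    public static String[] fontFamilies() {
        // Servers usually have no display; this only takes effect if AWT has not been initialized yet.
        System.setProperty("java.awt.headless", "true");
        try {
            return GraphicsEnvironment.getLocalGraphicsEnvironment().getAvailableFontFamilyNames();
        } catch (RuntimeException | Error e) {
            // Missing system fonts can surface as errors during font initialization.
            System.err.println("Font initialization failed: " + e);
            return null;
        }
    }

    public static void main(String[] args) {
        String[] families = fontFamilies();
        if (families == null) {
            System.out.println("Java could not initialize fonts; install the default system fonts.");
        } else {
            System.out.println(families.length + " font families: " + Arrays.toString(families));
        }
    }
}
```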
