Analyzing Raw Apache Visitor Log Files for Analytics and Security

February 8th, 2010

Log files provide important data on how users interact with your server, which is valuable for both security and analytics purposes. The two main types of logs provided by your server are error logs and access logs. By understanding the information within these files, you can optimize your site accordingly.

Understanding the Apache Error Log

Apache is set up to provide real-time diagnostics and error notifications to server administrators through the error_log file. These errors can be analyzed manually or sent to the system administrator as notifications based upon their severity (or on a daily or hourly basis through a scheduled job). The format of the error file is relatively uniform:

[Day Month Date Time Year] [error level] [client IP] description of error: /file-path

From the second field you can set automatic notifications based upon the severity of the error level, so you are notified when a server error exceeds your comfort threshold. Errors include user authentication problems, bugs in CGI scripts and failed attempts to run basic scripts such as automated email. Additionally, you can find basic information such as 404 (not found) errors when users request a non-existent file. From a shell you can follow the log with the command tail -f error_log to troubleshoot particular problems.
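As a rough illustration of the severity threshold idea, the following Python sketch parses error log lines and keeps only those at or above a chosen level. It assumes the classic bracketed layout described above (newer Apache versions add more fields to each line), and the function names and sample lines are invented for the example rather than taken from a real server.

```python
import re

# Severity names used by Apache's LogLevel directive, most severe first.
LEVELS = ["emerg", "alert", "crit", "error", "warn", "notice", "info", "debug"]

# Assumed layout: [timestamp] [level] [client IP] message
# The [client ...] part is optional because server-level messages omit it.
LINE_RE = re.compile(
    r"^\[(?P<time>[^\]]+)\] \[(?P<level>\w+)\] "
    r"(?:\[client (?P<client>[^\]]+)\] )?(?P<message>.*)$"
)

def parse_error_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None

def at_or_above(entries, threshold="error"):
    """Keep entries whose level is at least as severe as the threshold."""
    limit = LEVELS.index(threshold)
    return [
        e for e in entries
        if e["level"] in LEVELS and LEVELS.index(e["level"]) <= limit
    ]

# Made-up sample lines, for illustration only.
sample = [
    "[Mon Feb 08 10:15:02 2010] [error] [client 192.0.2.10] "
    "File does not exist: /var/www/html/missing.html",
    "[Mon Feb 08 10:15:09 2010] [notice] caught SIGTERM, shutting down",
    "[Mon Feb 08 10:16:41 2010] [crit] [client 192.0.2.44] "
    "(13)Permission denied: /var/www/html/.htaccess",
]

entries = [e for e in map(parse_error_line, sample) if e]
for entry in at_or_above(entries, "error"):
    print(entry["level"], entry["client"], entry["message"])
```

A script like this could be run from a scheduled job, with the filtered entries passed to whatever notification method you prefer.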

Analyzing the Apache Access Log

Access logs record all requests received by Apache, which can provide important information about operations and form the basis for common web analytics functions. You can customize what is logged through the mod_log_config module, which controls the format and destination of the access log.

You can set up logs to record custom strings of data for your own review, based upon your needs. For example, you can set up a custom “mylog” format as follows:

LogFormat "%t %h %u \"%r\" %>s" mylog
CustomLog logs/access_log mylog

Let’s walk through what each of these pieces of data in mylog tells you about your users for further analysis:

T-Variable Data

    This provides the (T)ime at which the server received the request, in a [Day/Month/Year:Hour:Minute:Second Zone] format. The zone is the server’s offset from GMT, not the visitor’s, so to estimate a user’s point of origin you should rely on the IP data in the log instead.

H-Variable Data

    This refers to the (H)ost of the user, that is, the IP address (or hostname) of the client making the request. Keep in mind users can connect through a proxy server, so this IP-level data may not be strictly accurate.

U-Variable Data

    This refers to the (U)ser if they are authenticated (logged in); otherwise this field contains a “-” placeholder.

R-Variable Data

    This refers to the (R)equest a user makes: the first line of the request, containing the method, the path with any query string, and the protocol used to access information on the site.

S-Variable Data

    This refers to the (S)tatus code the server returned for the request, ranging from successful responses to errors.
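To make the five fields concrete, here is a minimal Python sketch that parses lines written in the “mylog” format and tallies status codes and hosts. The regular expression, the function name and the sample lines are assumptions made for illustration; real logs may contain malformed requests that need extra handling.

```python
import re
from collections import Counter
from datetime import datetime

# Mirrors LogFormat "%t %h %u \"%r\" %>s": time, host, user, request, status.
MYLOG_RE = re.compile(
    r'^\[(?P<time>[^\]]+)\] (?P<host>\S+) (?P<user>\S+) '
    r'"(?P<request>.*)" (?P<status>\d{3})$'
)

def parse_mylog_line(line):
    """Return a dict of parsed fields, or None if the line does not match."""
    match = MYLOG_RE.match(line.strip())
    if not match:
        return None
    fields = match.groupdict()
    # %t looks like 08/Feb/2010:10:15:02 -0500 (server time and GMT offset).
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    fields["status"] = int(fields["status"])
    # A "-" user means the request was not authenticated.
    if fields["user"] == "-":
        fields["user"] = None
    # A well-formed request line is "METHOD path PROTOCOL".
    parts = fields["request"].split(" ")
    fields["path"] = parts[1] if len(parts) == 3 else None
    return fields

# Made-up sample lines, for illustration only.
sample = [
    '[08/Feb/2010:10:15:02 -0500] 192.0.2.10 - "GET /index.html HTTP/1.1" 200',
    '[08/Feb/2010:10:15:07 -0500] 192.0.2.10 - "GET /missing.html HTTP/1.1" 404',
    '[08/Feb/2010:10:16:30 -0500] 192.0.2.44 alice "POST /admin/login HTTP/1.1" 403',
]

records = [r for r in map(parse_mylog_line, sample) if r]
status_counts = Counter(r["status"] for r in records)
host_counts = Counter(r["host"] for r in records)

print(status_counts.most_common())
print(host_counts.most_common())
```

Counting by status highlights broken links and denied requests, while counting by host shows which clients generate the most traffic, which is useful for both analytics and security review.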
