Apache 2

How To Configure

It is important to read the documentation distributed together with the Apache server. These documents are usually kept in directory "$APACHE_HOME\manual" or "$APACHE_HOME\htdocs\manual" (where $APACHE_HOME denotes your Apache HTTP Server's installed directory - I shall assume that Apache HTTP server is installed in d:\myproject\apache). The entry page could be "index.html", "index.html.html", or "index.html.en". Read the tutorials and How-To's.

Basic Configuration

Apache is configured by placing configuration directives, such as Listen and ServerName, into a configuration file, which the Apache executable reads at startup. The default configuration file is called "httpd.conf" and is kept in the directory "$APACHE_HOME\conf". Browse through this configuration file.
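As a rough illustration only, a handful of core directives in "httpd.conf" might look like the sketch below. The paths assume the installation directory d:\myproject\apache used throughout this article, and the port and hostname are placeholder values rather than recommended settings.

```apache
# Top of the directory tree under which the server's files are kept
ServerRoot "d:/myproject/apache"

# Port (and optionally IP address) on which to accept requests
Listen 80

# Hostname and port that the server uses to identify itself
ServerName localhost:80

# Directory from which documents are served
DocumentRoot "d:/myproject/apache/htdocs"
```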

At a minimum, you need to check the following directives:

Access Control in Apache HTTP Server

Access control deals with controlling access to a resource, which could be a set of directories, files or locations. Access control can be based on the client's identity, which is called authentication (discussed in "HTTP Authentication"). Access control can also be based on other criteria, such as the network address, the time of day, the browser that the client is using, the request method, etc.

Directory Access Control

This section deals with access control to directories. The following sections will deal with access control to files and locations.

Directive <Directory>...</Directory>: can be used to apply access control to a set of directories. The syntax is:

<Directory directories>
# access control directives for the matching directory(ies).
......
......
</Directory>

The <Directory> block directive encloses a set of access-control directives, which are applied to the matched directories and their sub-directories. The directories argument specifies the directories to which the block applies. Wildcards can be used in matching: "?" matches exactly one character; "*" matches zero or more characters; [...] specifies a range of characters, e.g., [c-f]. An extended regular expression (regex) can also be used, introduced by a "~".
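The two matching styles can be sketched as follows. The directory paths are hypothetical and chosen only to show the syntax. Note that the wildcards do not match a "/" character, so each "*" stays within one path segment.

```apache
# Wildcard form: matches /home/alice/public_html, /home/bob/public_html, ...
<Directory /home/*/public_html>
    Options Indexes
</Directory>

# Regular-expression form, introduced by "~":
# matches directories under /www whose name begins with c, d, e or f
<Directory ~ "^/www/[c-f]">
    Options -Indexes
</Directory>
```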

Directive Options: controls what kinds of actions are permitted for the set of resources under control.

Options [+|-]option-1 [+|-]option-2 ...

The available options are:

If no Options directive is used, the effect is All except MultiViews. However, if an Options directive is used without +/-, e.g., "Options Indexes", only the Indexes option is enabled and all other options are turned off. If +/- is used, only that particular option is changed; the remaining options are inherited from the setting at the higher level.

Example 1:

<Directory /www>
   Options Indexes ExecCGI
</Directory>
   
<Directory /www/sales>
   Options Indexes
</Directory>
   
<Directory /www/support>
   Options -Indexes
</Directory>

Since the <Directory> matching applies to sub-directories, "/www" has options Indexes and ExecCGI, "/www/sales" has option Indexes only (the setting in the parent directory is ignored), and "/www/support" has option ExecCGI (inherited from its parent directory, with Indexes removed).

Directive Order: specifies the order in which Allow and Deny directives are evaluated.

Order Deny,Allow | Allow,Deny

Example 2:

Order Deny,Allow
Deny from all
Allow from test101.com
  1. access is allowed by default;
  2. all hosts are denied;
  3. those in the domain "*.test101.com" are allowed.

Consequently, only hosts in "*.test101.com" are allowed.

Example 3:

Order Allow,Deny
Allow from test101.com
Deny from sales.test101.com
  1. access is denied by default;
  2. all hosts in the "*.test101.com" domain are allowed, and;
  3. hosts in the "*.sales.test101.com" sub-domain are denied.

Consequently, all hosts in the "*.test101.com" domain except "*.sales.test101.com" are allowed.

On the other hand, if the Order is changed to Deny,Allow, all hosts will be allowed access (by default). This happens because, regardless of the actual ordering of the directives in the configuration file, the Allow from test101.com will be evaluated last and will override the Deny from sales.test101.com. Any other hosts are allowed access by default.

Example 4:

<Directory /home>
  Order Allow,Deny
</Directory>

Access is denied to all hosts to directory "/home", based on the default setting.

Example 5:

<Directory /home>
  Order Deny,Allow
  Deny from all
</Directory>

Access is denied to all hosts to directory "/home". Although the access is allowed by default, Deny from all prohibits all hosts.

Directive Allow: specifies which hosts can access a set of resources. Access can be controlled by hostname, IP Address, IP Address range, or environment variables.

Allow from all|host|env=env-variable

If Allow from all is specified, then all hosts are allowed access, subject to the configuration of the Deny and Order directives. To allow only particular hosts or groups of hosts to access the server, the host can be specified in any of the following formats:

If Allow from env=env-variable is specified, then the request is granted if the environment variable env-variable exists. This directive can be used to allow access based on such factors as the client's User-Agent (browser type), Referer, request method, or other HTTP request headers.

Example 6:

SetEnvIf User-Agent ^Mozilla/4.0 Mozilla4_browser
   
<Directory /docroot>
    Order Deny,Allow
    Deny from all
    Allow from env=Mozilla4_browser
</Directory>

In this example, browsers with a User-Agent string beginning with Mozilla/4.0 will be allowed access. All other browsers will be denied.

Directive Deny: restricts access based on hostname, IP address, or environment variables.

Deny from all|host|env=env-variable

The arguments for the Deny directive are identical to the arguments for the Allow directive.
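The host argument accepted by both Allow and Deny can take several forms. The sketch below shows the common ones side by side. The directory "/www/internal" and the addresses are made-up values for illustration.

```apache
<Directory /www/internal>
    Order Deny,Allow
    Deny from all

    # A (partial) domain name
    Allow from test101.com

    # A full IP address
    Allow from 192.123.10.1

    # A partial IP address (the first 1 to 3 bytes)
    Allow from 192.123

    # A network/netmask pair, and the equivalent CIDR form
    Allow from 10.1.0.0/255.255.0.0
    Allow from 10.1.0.0/16
</Directory>
```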

File .htaccess

In each directory, you can create a file called ".htaccess" to control access to that particular directory, provided that AllowOverride is turned on. The directives inside the .htaccess file override those in the <Directory> block. The relevant directives to enable .htaccess in "httpd.conf" are:

# AccessFileName specifies the name of the file to look for
# in each directory for access control information.
AccessFileName .htaccess
    
# AllowOverride controls which options the .htaccess files in
# directories can override. Can be "None", "All", or any
# combination of "Options", "FileInfo", "AuthConfig", and "Limit".
AllowOverride None|All|Options|FileInfo|AuthConfig|Limit
   
# Prevent files beginning with ".ht" (such as .htaccess and .htpasswd)
# from being viewed by clients, for security reasons,
# since these files often contain authorization information.
<Files ~ "^\.ht">
    Order allow,deny
    Deny from all
    Satisfy All
</Files>
# To protect only .htaccess
<Files .htaccess>
    Order allow,deny
    Deny from all
</Files>

Using .htaccess avoids frequent restarts of the server. The configuration directives in "httpd.conf" are read only at startup, so any change there requires a restart. A .htaccess file, in contrast, is checked at each access, so a change takes effect for subsequent requests. The disadvantage is a degradation in performance, as the .htaccess file has to be checked on every access to the directory.
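A minimal sketch of the two pieces working together is shown below. The "private" sub-directory is a hypothetical name. The override classes are chosen so that the .htaccess file may change Options and the host-based access directives (Order, Allow and Deny fall under Limit).

```apache
# In httpd.conf: permit .htaccess files under this tree to override
# Options and the Order/Allow/Deny directives
<Directory "d:/myproject/apache/htdocs/private">
    AllowOverride Options Limit
</Directory>

# In d:/myproject/apache/htdocs/private/.htaccess:
# no directory listing, and access from the local machine only
Options -Indexes
Order Deny,Allow
Deny from all
Allow from 127.0.0.1
```

With AllowOverride None, the same .htaccess file would simply be ignored.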

Directives <Limit methods> & <LimitExcept methods>:

<Limit request-method-1 request-method-2>
...directives...
</Limit>

Access controls are normally effective for all request methods (such as GET, POST, HEAD, PUT, DELETE). <Limit> and <LimitExcept> blocks can be used to restrict the enclosed access controls to particular HTTP request methods. This is useful if, for example, you have enabled PUT and wish to restrict PUT requests but not GET requests, or if you want to allow GET/HEAD but restrict PUT/DELETE.

For <Limit>, access control is applied to the methods listed; all other methods are unrestricted. For example,

<Limit POST PUT DELETE>
   Order deny,allow
   Deny from all
</Limit>

Access control is applied to the methods POST, PUT, and DELETE; all other methods are unrestricted.

The method names listed can be one or more of: GET, POST, PUT, DELETE, CONNECT, OPTIONS, PATCH, PROPFIND, PROPPATCH, MKCOL, COPY, MOVE, LOCK, and UNLOCK. If GET is used it will also restrict HEAD requests. The TRACE method cannot be limited.

<LimitExcept> is used to enclose a group of access control directives which will be applied to any HTTP access method NOT listed; i.e., it is the opposite of a <Limit> block and can be used to control both standard and nonstandard/unrecognized methods. A <LimitExcept> block should be used in preference to a <Limit> block when restricting access, since a <LimitExcept> block provides protection against arbitrary methods. For example,

<LimitExcept GET POST>
   Order deny,allow
   Deny from all
</LimitExcept>

Request methods other than GET and POST, such as PUT and DELETE, will not be permitted.

Example 7:

<Directory "d:/myproject/apache/users">
    AllowOverride FileInfo AuthConfig Limit
    Options MultiViews Indexes SymLinksIfOwnerMatch IncludesNoExec
    <Limit GET POST OPTIONS PROPFIND>
        # Default access is deny
        Order allow,deny
        # But allow access from all
        Allow from all
    </Limit>
    <LimitExcept GET POST OPTIONS PROPFIND>
        # Default access is allow
        Order deny,allow
        # But deny access from all
        Deny from all
    </LimitExcept>
</Directory>

File Access Control

<Files file-name>
......
</Files>

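Unlike <Directory>, which takes a file-system path, file-name is matched against the base name (the last path component) of the requested file, in whichever directory the file is found.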

(Under construction)(Give some examples)
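As a tentative example, a <Files> block can stand alone or be nested inside a <Directory> block to narrow its scope. The file names and the "/www" path below are hypothetical.

```apache
# Deny access to a file named "secret.html", but only below /www
<Directory /www>
    <Files secret.html>
        Order Allow,Deny
        Deny from all
    </Files>
</Directory>

# Wildcards are accepted too: hide backup files everywhere
<Files "*.bak">
    Order Allow,Deny
    Deny from all
</Files>
```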

Location Access Control

<Location URL>
......
</Location>

The <Location> block limits the scope of the enclosed directives to requests whose URL matches the given URL.

(Under construction)(Give some examples)
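One common use, sketched here under the assumption that the mod_status module is loaded, is to expose the server status page at a URL path and restrict it to the local machine. The URL "/server-status" does not correspond to any directory on disk, which is why <Location> rather than <Directory> is used.

```apache
<Location /server-status>
    # Served by the mod_status handler, not from the file system
    SetHandler server-status

    Order Deny,Allow
    Deny from all
    Allow from 127.0.0.1
</Location>
```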

Virtual Hosts

Very often, your web server has to support several hostnames (e.g., www.test101.com, www.test102.com, etc.), several IP addresses, or listen on several ports. It is rather unusual and messy to run one server for each of the hostnames, IP addresses, or ports. It is better to run many "virtual hosts" within a single physical web server.

HTTP/1.1 introduces a feature called "virtual host", which allows you to run multiple hostnames on the same physical server/machine. An HTTP/1.1-compliant server can support many hostnames, IP addresses and ports within a single server, whereas an HTTP/1.0 server supports only one TCP address and one hostname. In HTTP/1.1, the "Host" request header is mandatory and is used to select one of the virtual hosts.

Read "Virtual Host - How-to" in "htdocs\manual\programs\vhosts\index.html.html"

Apache supports (a) name-based virtual hosts, (b) IP-based virtual hosts, and (c) port-based virtual hosts.

Name-based Virtual Hosts

Name-based virtual hosting is usually simpler, since you only need to configure your DNS server to map each hostname to the same IP address and then configure the Apache HTTP Server to recognize the different hostnames. Name-based virtual hosting also eases the demand for scarce IP addresses. Name-based virtual hosting should be used unless there is a specific reason to choose IP-based virtual hosting.

To use name-based virtual hosting, you must designate the IP address (and possibly port) on the server that will be accepting requests for the hosts. This is configured using the NameVirtualHost directive. In the normal case where any and all IP addresses on the server should be used, you can use * as the argument to NameVirtualHost.

The next step is to create a <VirtualHost> block for each different host that you would like to serve. The argument to the <VirtualHost> directive should be the same as the argument to the NameVirtualHost directive (i.e., an IP address, or * for all addresses). Inside each <VirtualHost> block, you will need at minimum a ServerName directive to designate which host is served and a DocumentRoot directive to show where in the file system the content for that host lives.

If you are adding virtual hosts to an existing web server, you must also create a <VirtualHost> block for the existing host. The ServerName and DocumentRoot included in this virtual host should be the same as the global ServerName and DocumentRoot. List this virtual host first in the configuration file so that it will act as the default host.

For example, suppose that you are serving the domain www.test101.com and you wish to add the virtual host www.test102.com, which resolves to the same IP address. Then you simply add the following to "httpd.conf":

# Name-based virtual hosts on all IP addresses
NameVirtualHost *
   
# First virtual host shall be the main server, the default host.
<VirtualHost *>
ServerName www.test101.com
DocumentRoot /www101
</VirtualHost>
   
<VirtualHost *>
ServerName www.test102.com
DocumentRoot /www102
</VirtualHost>

You can alternatively specify an explicit IP address in place of the * in both the NameVirtualHost and <VirtualHost> directives, if your server accepts multiple IP addresses.
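With an explicit address, the same pair of hosts might be written as below. The address 192.123.10.1 and port 80 are assumed values. What matters is that the argument of each <VirtualHost> matches the argument of NameVirtualHost exactly.

```apache
NameVirtualHost 192.123.10.1:80

<VirtualHost 192.123.10.1:80>
    ServerName www.test101.com
    DocumentRoot /www101
</VirtualHost>

<VirtualHost 192.123.10.1:80>
    ServerName www.test102.com
    DocumentRoot /www102
</VirtualHost>
```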

Many servers want to be accessible by more than one name. This is possible with the ServerAlias directive, placed inside the <VirtualHost> section. For example if you add this to the first <VirtualHost> block above

ServerAlias test101.com *.test101.com

then requests for all hosts in the test101.com domain will be served by the www.test101.com virtual host. The wildcard characters "*" and "?" can be used to match names. Of course, you can't just make up names and place them in ServerName or ServerAlias. You must first have your DNS server properly configured to map those names to an IP address associated with your server.

Now when a request arrives, the server will first check if it is using an IP address that matches the NameVirtualHost. If it is, then it will look at each <VirtualHost> section with a matching IP address and try to find one where the ServerName or ServerAlias matches the requested hostname. If it finds one, then it uses the configuration for that server. If no matching virtual host is found, then the first listed virtual host that matches the IP address will be used.

As a consequence, the first listed virtual host is the default virtual host. The DocumentRoot from the main server will never be used when an IP address matches the NameVirtualHost directive. If you would like to have a special configuration for requests that do not match any particular virtual host, simply put that configuration in a <VirtualHost> container and list it first in the configuration file.

IP-based virtual hosts use the IP address of the connection to determine the correct virtual host to serve. Therefore you need to have a separate IP address for each host. With name-based virtual hosting, the server relies on the client to report the hostname as part of the HTTP headers. Using this technique, many different hosts can share the same IP address.

To test virtual hosts without access to a DNS server, you can map a few hostnames to your own IP address (or localhost) in the local lookup table "hosts". For example:

192.123.123.1    www.yellow.com
192.123.123.1    www.sales.yellow.com
192.123.123.1    www.orange.com
127.0.0.1        localhost
127.0.0.1        apple88
127.0.0.1        orange99

In Windows, the local lookup table is the file "%SystemRoot%\system32\drivers\etc\hosts".

IP-based Virtual Hosts

As the term IP-based indicates, the server must have a different IP address for each IP-based virtual host. This can be achieved by the machine having several physical network connections, or by use of virtual interfaces, which are supported by most modern operating systems (see your system documentation for details; these are frequently called "ip aliases", and the "ifconfig" command is most commonly used to set them up).

For example:

<VirtualHost 192.123.10.1>
  DocumentRoot /www201
  ServerName www.test201.com
</VirtualHost>
   
<VirtualHost 192.123.10.2>
  DocumentRoot /www202
  ServerName www.test202.com
</VirtualHost>

The address can also be _default_, in which case the block matches any address that no other <VirtualHost> matches.
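A catch-all block of this kind could look like the sketch below. The document root "/www/default" is a hypothetical path, and "*" as the port means any port.

```apache
# Handles requests arriving on any address and port that is not
# claimed by another <VirtualHost> block
<VirtualHost _default_:*>
    DocumentRoot /www/default
</VirtualHost>
```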

Port-based Virtual Hosts

Port-based virtual hosts use a different port number for each virtual host. The advantage is that you do not need many domain names or IP addresses. However, clients may not be familiar with the URL format for accessing an HTTP server on a non-default port.

An example is as follows:

Listen 80
Listen 8080
   
<VirtualHost 192.123.10.1:80>
  ServerName    www.test101.com
  DocumentRoot  /www101
</VirtualHost>
   
<VirtualHost 192.123.10.1:8080>
  ServerName    www.test102.com
  DocumentRoot  /www102
</VirtualHost>

The Listen directive tells Apache which port to listen on. Apache can listen on more than one port by using multiple Listen directives.

Miscellaneous Configurations

Log Files

Apache produces two log files: the error log and the access log. The default configuration puts the error log in "$APACHE_HOME\logs\error.log" and the access log in "$APACHE_HOME\logs\access.log". Take a quick glance at these log files.

Error Log:

The configuration directives related to error logging are ErrorLog and LogLevel:
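A typical pair of settings, given here as a sketch rather than as the shipped defaults, is:

```apache
# Location of the error log; a non-absolute path is taken
# relative to ServerRoot
ErrorLog logs/error.log

# Minimum severity to record, one of:
# emerg, alert, crit, error, warn, notice, info, debug
LogLevel warn
```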

Sample entries in the error log are as follows:

[Sun Oct 18 16:53:40 xxxx] [error] [client 127.0.0.1] Invalid method in request get /index.html HTTP/1.0
[Sun Oct 18 18:36:20 xxxx] [error] [client 127.0.0.1] File does not exist: d:/myproject/apache/htdocs/t.html
[Sun Oct 18 19:58:41 xxxx] [error] [client 127.0.0.1] client denied by server configuration: d:/myproject/apache/htdocs/forbidden/index.html

Access Log:

The configuration directives related to access logging are CustomLog and LogFormat:
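The "common" format is conventionally defined and attached to a log file as sketched below. The log file path is an assumption that follows the default location mentioned earlier.

```apache
# %h   remote host        %l  remote logname (usually "-")
# %u   remote user        %t  time the request was received
# %r   first line of the request
# %>s  final status code  %b  response size in bytes
LogFormat "%h %l %u %t \"%r\" %>s %b" common

# Write the access log using the "common" format defined above
CustomLog logs/access.log common
```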

Some sample entries in the "common" access log are as shown:

127.0.0.1 - - [18/Oct/2009:15:41:30 +0800] "GET / HTTP/1.1" 200 44
127.0.0.1 - - [18/Oct/2009:18:36:20 +0800] "GET /t.html HTTP/1.0" 404 204
127.0.0.1 - - [18/Oct/2009:18:32:05 +0800] "get /index.html HTTP/1.0" 501 215

Error Response

The main role of Apache is to deliver documents. When Apache encounters a problem and cannot meet a client's request, it generates an error code and returns an error message to explain the error. Apache provides a default set of error messages. Nonetheless, you can customize your own error responses using the directive ErrorDocument as follows:
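The directive takes an HTTP status code followed by either a local URL path, a quoted text message, or an external URL. The paths and messages in this sketch are invented for illustration.

```apache
# Serve a local page (path relative to DocumentRoot)
ErrorDocument 404 /errors/not_found.html

# Return a short text message
ErrorDocument 403 "Sorry, access to this resource is not allowed."

# Redirect the client to an external URL
ErrorDocument 500 http://www.test101.com/server_error.html
```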

Directory Indexing & Listing

If a client issues a URL selecting a directory, Apache returns a listing of that directory if Options Indexes is on; otherwise it returns the error "403 Forbidden". However, if the directory contains a file called "index.html", Apache returns this "index.html" instead. You can use the directive DirectoryIndex to specify the name of the index file. For example,

<IfModule mod_dir.c>
    # The first item takes precedence if many exist
    DirectoryIndex index.html myindex.html
</IfModule>

You can control the appearance (e.g., fancy indexing) of the directory listing using directive IndexOptions (of module mod_autoindex). See Apache documentation for more details.
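For instance, a listing with icons and details, folders shown first, and name columns sized to fit could be requested as follows. The directory "/www/downloads" is a hypothetical path, and mod_autoindex is assumed to be loaded.

```apache
<Directory /www/downloads>
    Options +Indexes

    # Fancy listing, directories before files, auto-sized name column
    IndexOptions FancyIndexing FoldersFirst NameWidth=*

    # Leave hidden and backup files out of the listing
    IndexIgnore .??* *~ *#
</Directory>
```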

To turn off automatic indexing for a directory, you can use the directive "Options -Indexes". Apache will then return the error "403 Forbidden" if a directory request is made. For example:

<Directory dir-path>
   Options -Indexes
</Directory>

Server-side Include (SSI): TODO (check Apache documentation)

Apache HTTP Server is highly configurable through the directives in the configuration file "httpd.conf"; a full treatment is outside the scope of this article. However, do browse through the configuration file to get a good feel for Apache.

Note (for Windows Vista): On Vista, you should remove the program "Apache Monitor" from the "Startup" list to prevent an error during Windows boot if you are not logged in as an administrator.

 


Latest version tested: Apache 2.2.14
Last modified: October 21, 2009