
version 2.

by
Marliza Ramly
Zurina Saaya
Wahidah Md Shah
Mohammad Radzi Motsidi
Haniza Nahar
Faculty of Information and Communication Technology
Universiti Teknikal Malaysia Melaka (UTeM)
May 2007

Copyright 2007 Fakulti Teknologi Maklumat dan Komunikasi, UTeM

TABLE OF CONTENTS

1. PROXY SERVERS
   1.2 Key Features of Proxy Servers
       Proxy Servers and Caching

2. INTERNET CACHING
   2.1 Hierarchical Caching
   2.2 Terminology for Hierarchical Caching
   2.3 Internet Cache Protocol
   2.4 Basic Neighbour Selection Process

3. INTRODUCTION TO SQUID
   3.1 Hardware and Software Requirement
   3.2 Directory Structure
   3.3 Getting and Installing Squid
       Custom Configuration for Network
   3.4 Install Squid
   3.5 Basic Squid Configuration
       Configure SQUID
       Basic Configuration
       Starting Squid Daemon
   3.6 Basic Client Software Configuration
       Configuring Internet Browser
       Using proxy.pac File

4. ACL CONFIGURATION
   4.1 Access Controls
       List of ACL type
       src
       srcdomain
       dst
       dstdomain
       srcdom_regex
       dstdom_regex
       time
       url_regex
       urlpath_regex
       port
       proto
       method
       browser
       proxy_auth
       maxconn
       Create custom error page
   4.2 Exercises

5. CACHING
   5.1 Concepts
   5.2 Configuring a Cache for Proxy Server

6. SQUID AND WEBMIN
   6.1 About Webmin
   6.2 Obtaining and Installing Webmin
       Installing from a tar.gz
       Installing from an RPM
       After Installation
   6.3 Using Squid in Webmin
   6.4 Ports and Networking
       Proxy port
       ICP port
       Incoming TCP address
       Outgoing TCP address
       Incoming UDP address
       Outgoing UDP address
       Multicast groups
       TCP receive buffer
   6.5 Other Caches
       Internet Cache Protocol
       Parent and Sibling Relationships
       When to Use ICP?
   6.6 Other Proxy Cache Servers
       Edit Cache Host
       Hostname
       Type
       Proxy port
       ICP port
       Proxy only?
       Send ICP queries?
       Default cache
       Round-robin cache?
       ICP time-to-live
       Cache weighting
       Closest only
       No digest?
       No delay?
       Login to proxy
       Multicast responder
       Query host for domains, Don't query for domains
       Cache Selection Options
       Directly fetch URLs containing
       ICP query timeout
       Multicast ICP timeout
       Dead peer timeout
       Memory Usage
       Memory usage limit
       Maximum cached object size
   6.7 Logging
       Cache metadata file
       Use HTTPD log format
       Log full hostnames
       Logging netmask
   6.8 Cache Options
   6.9 Access Control
       Access Control Lists
       Edit an ACL
       Creating new ACL
       Available ACL Types
   6.10 Administrative Options

7. ANALYZER
   7.1 Structure of Log File
       Access log
       Cache log
       Store log
   7.2 Methods
       Log Analysis Using Grep Command
       Log Analysis Using Sarg-2.2.3.1
   7.3 Setup Sarg-2.2.3.1
   7.4 Report Management Using Webmin
   7.5 Log Analysis and Statistic

ABBREVIATIONS

Abbreviation   Details
ACL            Access Control List
CARP           Cache Array Routing Protocol
CD             Compact Disc
DNS            Domain Name System
FTP            File Transfer Protocol
GB             Gigabyte
HTCP           Hyper Text Caching Protocol
HTTP           Hypertext Transfer Protocol
I/O            Input/Output
ICP            Internet Cache Protocol
IP             Internet Protocol
LAN            Local Area Network
MAC            Media Access Control
MB             Megabyte
RAM            Random Access Memory
RPM            Red Hat Package Manager
RTT            Round Trip Time
SNMP           Simple Network Management Protocol
SSL            Secure Socket Layer
UDP            User Datagram Protocol
URL            Uniform Resource Locator
UTeM           Universiti Teknikal Malaysia Melaka
WCCP           Web Cache Coordination Protocol

Chapter

1. Proxy Servers
A proxy server is an intermediary server between the Internet browser
and the remote server. It acts as a "middleman" between the two
ends of the client/server network connection and works with browsers,
servers and other applications by supporting underlying network
protocols such as HTTP. Furthermore, it downloads documents and stores
them in its local cache, so that later requests for the same document
can be served faster from the local server than from the Internet.
For example, imagine that a user wants to download a document by
entering a specific URL such as http://www.yahoo.com in the Internet
browser, so that the document is transferred to the user's workstation
(e.g. through UTeM to a local workstation). In that situation, the
Internet browser communicates directly with the UTeM proxy server to
get the document.
In addition, combining a cache with the proxy server makes transfers
quicker. The Internet browser no longer contacts the remote server
directly; instead, it requests the document from the proxy server.


1.2 Key features of proxy servers


The four main functions provided are:
- Firewalling and filtering (security)
- Connection sharing
- Administrative control
- Caching service

Proxy Servers and Caching


A proxy server that caches Web pages can improve the quality of
service (QoS) of a network, as in Figure 1-1. It does so in three ways:
- Caching conserves bandwidth on the network and improves
  scalability.
- Caching improves response time (e.g. an HTTP proxy cache can load
  Web pages more quickly into the browser).
- Caching increases availability: Web pages or other files in the
  cache remain accessible even if the original source or an
  intermediate network link goes offline.

Figure 1-1: Generic Diagram for Proxy Server
(diagram labels: five clients, Proxy Server, Internet)

Chapter

2. Internet Caching
2.1 Hierarchical Caching
Cache hierarchies are a logical extension of the caching concept. A
group of Web caches and a group of Web clients can benefit from sharing
cached objects with each other. Figure 2-1 shows the basic caching
process. However, there are some disadvantages as well. Whether the
advantages outweigh the disadvantages depends on the specific
situation, as discussed below.

Figure 2-1: Proxy Server Caching Process
(flowchart: the client browser initiates a request to the proxy server
for the URL. Is the requested page in the proxy server cache? If yes,
the proxy server returns the requested page to the client. If no, the
proxy server requests the page from the web server over the Internet,
the web server returns the requested URL to the proxy server, and the
proxy server caches the returned page and returns it to the client.)
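The flow in Figure 2-1 can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not how Squid is implemented: the class name CachingProxy and the fetch_from_web_server callback are hypothetical, the cache is a plain dictionary, and cached pages never expire.

```python
from typing import Callable, Dict


class CachingProxy:
    """Toy model of the flow in Figure 2-1 (not Squid's real implementation)."""

    def __init__(self, fetch_from_web_server: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_web_server   # hypothetical origin fetcher
        self._cache: Dict[str, bytes] = {}    # URL -> cached page
        self.hits = 0
        self.misses = 0

    def request(self, url: str) -> bytes:
        # Is the requested page in the proxy server cache?
        if url in self._cache:
            self.hits += 1
            return self._cache[url]           # yes: return it to the client
        # No: request the page from the web server ...
        self.misses += 1
        page = self._fetch(url)
        # ... cache the returned page, then return it to the client.
        self._cache[url] = page
        return page


if __name__ == "__main__":
    origin_calls = []

    def fake_web_server(url: str) -> bytes:
        origin_calls.append(url)
        return ("<html>" + url + "</html>").encode()

    proxy = CachingProxy(fake_web_server)
    proxy.request("http://www.yahoo.com/")
    proxy.request("http://www.yahoo.com/")
    print(proxy.hits, proxy.misses, len(origin_calls))
```

Requesting the same URL twice contacts the (fake) web server only once; the second request is answered from the cache.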


The major advantages are:
- Additional cache hits. Some requests that miss in the local cache
  can be expected to hit in a neighbour cache.
- Request routing. HTTP traffic can be directed along a certain path
  by routing requests to specific caches. For example, if there are
  two paths to the Internet, one cheap and the other expensive,
  request routing can be used to send HTTP traffic over the cheaper
  link.

The disadvantages are:
- Configuration hassles. Configuring neighbour caches requires
  coordination from both parties, and changes in membership add to
  the burden.
- Additional delay for cache misses. The size of the delay depends on
  several factors, such as the delay between peers, link congestion,
  and whether or not ICP is used.

2.2 Terminology for Hierarchical Caching


Cache
An HTTP proxy that stores the responses to some requests.
Objects
A generic term for any document, image, or other type of data
available on the Web. A Uniform Resource Locator (URL) can identify
objects such as images, audio, video and binary files, not only
documents or pages, and the data can be served by HTTP, FTP, Gopher
and other types of servers.
Hits and misses
A cache hit occurs when the requested object exists in the cache and
the copy is valid.
If the object does not exist in the cache or is no longer valid, the
request is a cache miss. In that situation, the cache must forward the
request toward the origin server.
Origin Server
The authoritative source for an object. For example, the origin server
is the host named in the URL.
Hierarchy vs. Mesh
The caches are arranged hierarchically when the topology is like a tree
structure, or in a mesh when the structure is flat. In either case
these terms simply refer to the fact that caches can be "connected" to
each other. In Squid it can be seen in the cache directory after
creating it.
Neighbours, Peers, Parents, Siblings
In general, the terms neighbour and peer mean the same thing for caches
in a hierarchy or mesh, while parent and sibling refer to the
relationship between a pair of caches.
Fresh, Stale, Refresh
The status of a cached object can be:
- Fresh: the object can be returned as a cache hit.
- Stale: the object is no longer considered valid. Squid refreshes it
  by including an If-Modified-Since (IMS) request header and
  forwarding the request on toward the origin server.
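As a rough illustration of the fresh/stale distinction, the sketch below decides whether a cached object can be served as a hit or must be refreshed with an If-Modified-Since header. The max_age_seconds rule and all function names are assumptions made for this example; Squid's real freshness rules are more involved and are not described in this chapter.

```python
from email.utils import formatdate


def is_fresh(age_seconds: float, max_age_seconds: float) -> bool:
    """Hypothetical freshness rule: fresh while younger than max_age."""
    return age_seconds < max_age_seconds


def revalidation_headers(last_modified_epoch: float) -> dict:
    """Headers for an If-Modified-Since (IMS) request about a stale object."""
    # HTTP dates are expressed in GMT.
    return {"If-Modified-Since": formatdate(last_modified_epoch, usegmt=True)}


def handle(age_seconds, max_age_seconds, last_modified_epoch):
    """Return ("HIT", None) for a fresh object, else ("REFRESH", headers)."""
    if is_fresh(age_seconds, max_age_seconds):
        return "HIT", None
    return "REFRESH", revalidation_headers(last_modified_epoch)
```

A fresh object is returned directly, while a stale one produces the IMS header that would accompany the request forwarded toward the origin server.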


2.3 Internet Cache Protocol


The Internet Cache Protocol (ICP) is a quick and efficient method of
inter-cache communication, and it offers a mechanism for establishing
complex cache hierarchies. The advantages of using it are:
- ICP can be used by Squid to provide an indication of network
  conditions.
- ICP messages are transmitted as UDP packets. This makes ICP easy to
  implement, because each cache needs to maintain only a single UDP
  socket.

ICP has some disadvantages as well. One weakness is that when links
are highly congested, ICP becomes useless exactly where caching is
needed most. Furthermore, the transmission time of the UDP packets
adds an extra delay to the processing of each request. As a result,
ICP is not appropriate in situations where this delay is unacceptable.

2.4 Basic Neighbour Selection Process


Before describing Squid's features for hierarchical caching, let us
briefly explain the neighbour selection process. When Squid is unable
to satisfy a request from its cache, it must decide where to forward
the request. There are basically three choices:
- parent cache
- sibling cache
- origin server

How does ICP help Squid make this decision?
- For parent and sibling caches, Squid sends an ICP query message
  containing the requested URL to its neighbours. The queries are
  usually sent as UDP packets, and Squid remembers how many queries it
  sends for a given request.
- On receiving an ICP query, each neighbour searches for the URL in
  its own cache. If a valid copy of the URL exists, the neighbour
  sends ICP_HIT; otherwise it sends an ICP_MISS message.
- The querying cache now collects the ICP replies from its peers:
  - If the cache receives an ICP_HIT reply from a peer, it
    immediately forwards the HTTP request to that peer.
  - If the cache does not receive an ICP_HIT reply, then all replies
    will be ICP_MISS.
  - Squid waits until it receives all replies, up to two seconds.
  - If one of the ICP_MISS replies comes from a parent, Squid
    forwards the request to the parent whose reply was the first to
    arrive. We call this reply the FIRST_PARENT_MISS. If there is no
    ICP_MISS from a parent cache, Squid forwards the request to
    the origin server.

We have described the basic algorithm, to which Squid offers
numerous possible modifications, including mechanisms to:
- Send ICP queries to some neighbours and not to others.
- Include the origin server in the ICP pinging, so that if the origin
  server reply arrives before any ICP_HITs, the request is
  forwarded there directly.
- Disallow or require the use of some peers for certain requests.
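The basic selection algorithm described in this section can be sketched as follows. This is a simplified model for illustration only: the IcpReply record, the peer names and the select_next_hop function are hypothetical, replies are given as a ready-made list with arrival times, and none of the optional modifications listed above are modelled.

```python
from typing import List, NamedTuple


class IcpReply(NamedTuple):
    peer: str          # hypothetical peer name
    peer_type: str     # "parent" or "sibling"
    opcode: str        # "ICP_HIT" or "ICP_MISS"
    arrival_ms: float  # time after the query was sent


def select_next_hop(replies: List[IcpReply], timeout_ms: float = 2000.0) -> str:
    """Return the peer to forward to, or "ORIGIN" for the origin server."""
    # Replies arriving after the two-second timeout are ignored.
    in_time = sorted(
        (r for r in replies if r.arrival_ms <= timeout_ms),
        key=lambda r: r.arrival_ms,
    )
    # 1. The earliest ICP_HIT wins, whether from a parent or a sibling.
    for reply in in_time:
        if reply.opcode == "ICP_HIT":
            return reply.peer
    # 2. Otherwise use the parent whose ICP_MISS arrived first
    #    (the FIRST_PARENT_MISS described above).
    for reply in in_time:
        if reply.opcode == "ICP_MISS" and reply.peer_type == "parent":
            return reply.peer
    # 3. No hit and no parent miss: go to the origin server.
    return "ORIGIN"
```

Note that a sibling that replies ICP_MISS is never chosen: only a parent is asked to fetch an object it does not already hold.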

Chapter

3. Introduction to Squid
Squid is a high-performance proxy caching server for Web clients,
supporting FTP, gopher, and HTTP data objects. It has two basic
purposes:
- to provide proxy service for machines that must pass Internet
  traffic through some form of masquerading firewall
- caching

Unlike traditional caching software, Squid handles all requests in a
single, non-blocking, I/O-driven process.
Squid keeps metadata and especially hot objects cached in RAM,
caches DNS lookups, supports non-blocking DNS lookups, and
implements negative caching of failed requests.

Squid consists of a main server program, a Domain Name System
lookup program (dnsserver), a program for retrieving FTP data (ftpget)
and some management and client tools.
In other words, Squid is:
1. a full-featured Web proxy cache
2. free, open-source software
3. the result of many contributions by unpaid (and paid) volunteers


Squid supports:
- proxying and caching of Hypertext Transfer Protocol (HTTP), File
  Transfer Protocol (FTP), and other Uniform Resource Locators (URLs)
- proxying for Secure Socket Layer (SSL)
- cache hierarchies
- Internet Cache Protocol (ICP), Hyper Text Caching Protocol
  (HTCP), Cache Array Routing Protocol (CARP), Cache Digests
- transparent caching
- Web Cache Coordination Protocol (WCCP) (Squid v2.3 and above)
- extensive access controls
- HTTP server acceleration
- Simple Network Management Protocol (SNMP)
- caching of DNS lookups

3.1 Hardware and Software Requirement

RAM
Minimum recommended RAM = 128 MB (scales with user count and
size of disk cache)

Disk
Small user count = 512 MB to 1 GB
Large user count = 16 GB to 24 GB

Squid runs on most versions of UNIX. It also works on AIX, Digital
UNIX, FreeBSD, HP-UX, IRIX, Linux, NetBSD, NextStep, SCO, Solaris and
SunOS.


3.2 Directory Structure


Squid normally creates a few directories, shown in Table 3-1.

Directory     Explanation
/var/cache    Stores the actual cached data
/etc/squid    Contains squid.conf, the only Squid configuration file
/var/log      Stores the logs of each connection (this directory can
              grow large)
Table 3-1: Squid Directory

3.3 Getting and installing squid


Custom Configuration for Network
There are three ways to configure a proxy server in the network. The
configuration to use depends on the requirements of your network. They
are transparent proxy, reverse proxy and web cache proxy.


Configuring squid for transparency

Figure 3-1: Transparent Proxy
(diagram labels: Internet, Transparent Proxy Server, 10.1.1.1, port 80,
clients on two LANs)

A transparent proxy (Figure 3-1) is configured when you want to
grab a certain type of traffic at your gateway or router and send it
through a proxy without the knowledge of the user or client. In other
words, the router forwards all traffic destined for port 80 to the
proxy machine using a route policy.
Using Squid as a transparent proxy involves two parts:
1. Squid needs to be configured properly to accept non-proxy
   requests
2. web traffic gets redirected to the Squid port


A transparent proxy is suitable for:
- Intercepting network traffic transparently to the browser
- Simplified administration: the browser does not need to be
  configured to talk to a cache
- Central control: the user cannot change the browser to bypass
  the cache

The disadvantages of using this type of proxy are:
- Browser dependency: transparent proxying does not work very
  well with certain web browsers
- User control: transparent caching takes control away from the
  user, and some users will change ISPs to either avoid it or get it

Configuring squid for reverse proxy

Figure 3-2: Reverse Proxy
(diagram labels: Internet, client, Reverse Proxy Server, Web Server
Cluster)

A reverse proxy (also known as web server acceleration) (Figure 3-2)
is a method of reducing the load on a busy web server by placing a
web cache between the server and the Internet.
In this case, when a client browser makes a request, DNS routes the
request to the reverse proxy (not the actual web server). The reverse
proxy then checks its cache to find out whether the requested content
is available to fulfil the client request. If not, it contacts the
real web server and downloads the requested content to its disk cache.
The benefits that can be gained are:
1. improved security
2. improved scalability without increasing the complexity of
   maintenance too much
3. a lighter burden on a web server that provides both static and
   dynamic content. The static content can be cached on the
   reverse proxy, while the web server is freed up to better
   handle the dynamic content.
To run Squid as an accelerator, you probably want it to listen on
port 80. You also have to define the machine you are accelerating for
(not covered in this chapter).


Configuring squid for Web Cache proxy


Figure 3-3: Web Cache Proxy
(diagram labels: Internet, Router, Web Cache Proxy Server, Router,
five clients)

By default, Squid is configured as a direct proxy (Figure 3-3). In
order to cache web traffic with Squid, the browser must be configured
to use the Squid proxy. This needs the following information:
- the proxy server's IP address
- the port number on which the proxy server accepts connections


3.4 Install squid


The Squid proxy caching server software package comes with Fedora
Core 6. Therefore, we do not have to install it; we only need to edit
the configuration file to make it work.
If Squid is not installed on your server, you can install it from the
Squid RPM file. To do so, download the RPM file from the Internet or
copy it from the installation CD. Then run this command:
# rpm -i squid-2.6.STABLE4-1.fc6.i386.rpm
NOTE: The RPM file name may differ depending on the version of
Squid you have downloaded.

Alternatively, you can install it from the Squid installation script,
which can be downloaded from the official Squid web site,
http://www.squid-cache.org. To do so, copy the installation folder to
your local drive and run the following commands:
# ./configure
# make
# make install

NOTE: Make sure all the dependencies are already installed on
your machine before starting to install Squid.


3.5 Basic Squid Configuration


Configure SQUID
All Squid configuration files are kept in the directory /etc/squid.

The rest of this section works through the options that may need to
be changed to get Squid to run. Most people will not need to change
all of these settings. At least one part of the configuration file
usually needs to change, though: the default access rules in
squid.conf, which deny access to browsers. If you do not change this,
Squid will not be very useful.

Basic Configuration
All of Squid's configuration goes in one file, squid.conf. This
section details the configuration of Squid as a caching proxy only,
not as an HTTP accelerator.
Some basic configuration needs to be done first. Uncomment and edit
the lines described below in the configuration file, which is found
by default at /etc/squid/squid.conf.
To configure the Squid server, do the following tasks:
1. log in as root to the machine
2. type the following command
# vi /etc/squid/squid.conf
The above command opens the Squid configuration file for editing.


Then, set the port on which Squid listens. Normally, Squid listens
on port 3128. While it may be convenient to listen on this port,
network administrators often configure the proxy to listen on port
8080 as well. These are not well-known ports (ports below 1024 are
well-known ports and are restricted from being used by ordinary user
processes), and are therefore not going to conflict with other ports
such as 80, 443, 22, 23, etc. Squid need not be restricted to one
port. It can easily be started on two or more ports.
In the squid.conf file, find the following directive and change it,
or leave it as the default if the port is 3128.
http_port
Check
http_port 3128 (the default)
or, for multiple ports, one line per port:
http_port 8080
http_port 3128


Additionally, if you have multiple network cards in your proxy server
and would like the proxy to listen on port 8080 on the first network
card and port 3128 on the second network card, you can add the
following lines:
http_port 10.1.5.49:8080
http_port 10.0.5.50:3128

http_access
By default, http_access is denied. The Access Control List (ACL) rules
should be modified to allow access only to trusted clients. This is
important because it prevents people from stealing your network
resources.
ACLs are discussed in Chapter 4.
cache_dir
This directive specifies the cache directory storage format and its
size, as given below.
cache_dir ufs /var/spool/squid 100 16 256
The value 100 denotes a cache size of 100 MB. This can be adjusted to
the required size. (Caching is discussed later in Chapter 5.)
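The text above explains only the value 100. In Squid's documented syntax for ufs storage, cache_dir ufs Directory-Name Mbytes L1 L2, the last two numbers are the counts of first-level and second-level subdirectories created under the cache directory. The short Python sketch below (the CacheDir record and parse_cache_dir helper are hypothetical names used only for this example) pulls the fields out of the example line.

```python
from typing import NamedTuple


class CacheDir(NamedTuple):
    storage_type: str   # e.g. "ufs"
    path: str           # top-level cache directory
    size_mb: int        # disk space to use, in megabytes
    l1_dirs: int        # number of first-level subdirectories
    l2_dirs: int        # second-level subdirectories per first-level one


def parse_cache_dir(line: str) -> CacheDir:
    """Parse a 'cache_dir ufs <path> <Mbytes> <L1> <L2>' line."""
    fields = line.split()
    if len(fields) != 6 or fields[0] != "cache_dir":
        raise ValueError("not a 6-field cache_dir line: %r" % line)
    _, storage_type, path, size_mb, l1, l2 = fields
    return CacheDir(storage_type, path, int(size_mb), int(l1), int(l2))


entry = parse_cache_dir("cache_dir ufs /var/spool/squid 100 16 256")
total_subdirs = entry.l1_dirs * entry.l2_dirs
print(entry.size_mb, "MB spread over", total_subdirs, "second-level directories")
```

With the values from the example line, the 100 MB of cache space is spread over 16 x 256 second-level directories.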
cache_effective_user
cache_effective_group

NOTE: You can edit the squid.conf file using gedit instead of the
command line.


Starting Squid Daemon


In this section, we will learn how to start Squid. Make sure you have
finished editing the configuration file. Then you can start Squid for
the first time.
First, check the configuration file for errors. Type this command at
your terminal:

# squid -k parse
If an error is detected, for example
# squid -k parse
FATAL: could not determine fully qualified hostname, Please set visible hostname
Squid Cache (version 2.6.STABLE4): Terminated abnormally.
CPU Usage: 0.0004 seconds = 0.0004 user + 0.000 sys
Maximum Resident Size: 0 KB
Page faults with physical i/o: 0
Aborted.
Solution: Add the following line to the squid.conf file
visible_hostname localhost
If no error is detected, continue with the following command to start
Squid. (This is a temporary step: it starts Squid for the current
session only.)

# service squid start

If everything is working fine, your console displays:
Starting squid: .                                          [OK]
If you want to stop the service:
# service squid stop
Then your console will display:
Stopping squid: .                                          [OK]

You must be a privileged user to start or stop Squid.


To make this permanent, so that Squid starts at boot, use these
commands:
# chkconfig --list
# chkconfig --level 5 squid on
You can restart the Squid service by typing
# /etc/init.d/squid restart
While the daemon is running, you can run the squid command with the
following options to change how the daemon works:
# squid -k reconfigure
- causes Squid to read its configuration file again

# squid -k shutdown
- causes Squid to exit after waiting briefly for current connections
  to finish
# squid -k interrupt
- shuts down Squid immediately, without waiting for connections to
  close

# squid -k kill
- kills Squid immediately, without closing connections or log files
  (use this option only if other methods don't work)


3.6 Basic Client Software Configuration


Basic Configuration
To configure any browser, you need at least two pieces of information:
- the proxy server's IP address
- the port number on which the proxy server accepts requests

Configuring Internet Browser

The following section explains the steps to configure a proxy server
in Internet Explorer, Mozilla Firefox and Opera.
Internet Explorer 7.0
1. Select the Tools menu option
2. Select Internet Options
3. Click on the Connections tab
4. Select LAN settings
5. Under Proxy server, check the box to use a proxy server
6. Type the proxy IP address in the Address field, and the port
   number in the Port field.
Example:

Address : 10.0.5.10 Port : 3128

Mozilla Firefox
1. Click Tools > Options > Advanced
2. Click Network, then go to Connection > Settings
3. Go to "Configure Proxies to Access the Internet"
4. Choose Manual proxy configuration
5. At HTTP Proxy, enter 10.0.5.10 and at Port, enter 3128
6. Check the box to use the proxy server for all protocols
7. Then click OK
8. Now the client can access the Internet.
Opera 9.1
1. Click Tools > Preferences > Advanced
2. Choose Network
3. Click Proxy Servers, then check and fill in:
   HTTP    : 10.0.5.10   Port : 3128
   HTTPS   : 10.0.5.10   Port : 3128
   FTP     : 10.0.5.10   Port : 3128
   Gopher  : 10.0.5.10   Port : 3128
4. Then click OK

Using proxy.pac File


This setting is for clients whose browsers should pick up the proxy
setting automatically. The browser can be configured with a simple
proxy.pac file, as shown in the example below:

function FindProxyForURL(url, host)


{
if (isInNet(myIpAddress(), "10.0.5.0", "255.255.255.0"))
return "PROXY 10.0.5.10:3128";
else
return "DIRECT";
}


proxy.pac needs to be installed on a web server such as Apache, and
the client can then configure the proxy server using the automatic
configuration script. This script is useful when there is a possibility
that the proxy server will change its IP address. To access the script,
the client needs to add the URL of proxy.pac as its automatic proxy
configuration script (Figure 3-4).

Figure 3-4: Using automatic configuration script
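
The decision made by the proxy.pac example can be traced outside a browser. The following is a minimal Python sketch, not part of Squid or of any browser: the function name find_proxy_for_client is hypothetical, and the standard ipaddress module stands in for the isInNet() test. The subnet and proxy address are the ones used in the proxy.pac example.

```python
import ipaddress

# Values taken from the proxy.pac example in this section.
PROXY_NET = ipaddress.ip_network("10.0.5.0/255.255.255.0")
PROXY = "PROXY 10.0.5.10:3128"


def find_proxy_for_client(client_ip: str) -> str:
    """Mimic the proxy.pac rule: clients inside 10.0.5.0/24 use the proxy."""
    if ipaddress.ip_address(client_ip) in PROXY_NET:
        return PROXY
    return "DIRECT"


if __name__ == "__main__":
    print(find_proxy_for_client("10.0.5.77"))    # a client inside the subnet
    print(find_proxy_for_client("192.168.1.5"))  # a client outside the subnet
```

A client at 10.0.5.77 is sent to the proxy, while a client on any other subnet connects directly.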


Chapter

4. ACL Configuration
4.1 Access controls
Access control lists (ACLs) are the most important part of configuring
Squid. The main use of an ACL is to implement simple access control,
restricting people from using the cache infrastructure without
permission. Rules can be written for almost any type of requirement,
from very complex rules for large organisations to a simple
configuration for home users.
An ACL is written in the squid.conf file using the following format:
acl name type (string|"filename") [string2] ["filename2"]
name is chosen by the user and should be descriptive, while type is
one of the ACL types described in the next section.


There are two elements in access control: classes and operators.
Classes are defined by acl lines, while the names of the operators vary.
The most common operators are http_access and icp_access. The
actions for these operators are allow and deny: allow permits the
requests matched by the ACL, while deny refuses them.
General format for an operator:
http_access allow|deny [!]aclname [[!]aclname2 ...]
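
How Squid combines several http_access lines is easy to get wrong, so the following minimal Python sketch models it. It is an illustration only, not Squid code: the names check_access, ACLS and RULES are hypothetical, and each ACL is reduced to a Python function. It assumes the documented Squid behaviour that ACL names on one line are combined with AND, that lines are tried from top to bottom with the first match winning, and that when no line matches the result is the opposite of the last line.

```python
from typing import Callable, Dict, List, Tuple

Request = Dict[str, str]
Acl = Callable[[Request], bool]
Rule = Tuple[str, List[str]]  # ("allow" or "deny", ["aclname", "!aclname2", ...])


def acl_term_matches(term: str, acls: Dict[str, Acl], request: Request) -> bool:
    """Evaluate one ACL name from a rule line; a leading "!" negates it."""
    if term.startswith("!"):
        return not acls[term[1:]](request)
    return acls[term](request)


def check_access(rules: List[Rule], acls: Dict[str, Acl], request: Request) -> str:
    """Return the action of the first rule whose ACLs all match the request."""
    for action, terms in rules:
        # ACL names on the same line are combined with AND.
        if all(acl_term_matches(t, acls, request) for t in terms):
            return action  # lines are tried top to bottom; the first match wins
    if not rules:
        return "deny"  # assumption of this sketch for an empty rule list
    # No line matched: the result is the opposite of the last line's action.
    return "deny" if rules[-1][0] == "allow" else "allow"


# Hypothetical ACLs and rules, written as Python instead of squid.conf lines.
ACLS: Dict[str, Acl] = {
    "all": lambda r: True,
    "office": lambda r: r["src"].startswith("192.123.56."),
    "banned_domain": lambda r: r["host"] == "www.terrorist.com",
}
RULES: List[Rule] = [
    ("deny", ["banned_domain"]),
    ("allow", ["office"]),
    ("deny", ["all"]),
]
```

With these rules, an office client asking for www.terrorist.com is denied by the first line before the allow line is ever reached, which is why the order of http_access lines matters.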

List of ACL type


ACL Type        Details
src             Client IP address
srcdomain       Client domain name
dst             Destination IP address
dstdomain       Destination domain name
srcdom_regex    Regular expression describing the client domain name
dstdom_regex    Regular expression describing the destination domain name
time            Specify the time
url_regex       Regular expression describing the whole URL of the destination (web server)
urlpath_regex   Regular expression describing the path of the destination (not including its domain name)
port            Specify the port number
proto           Specify the protocol
method          Specify the method
browser         Specify the browser
proxy_auth      User authentication via external processes
maxconn         Specify the number of connections


src
Description
This ACL allows the server to recognize a client (the computer that
uses the server as a proxy to access the Internet) by its IP address.
The IP addresses can be given as a single IP address, as a range of IP
addresses, or as a list of IP addresses in an external file.
Syntax
acl aclname src ip-address/netmask ..      (client's IP address)
acl aclname src addr1-addr2/netmask ..     (range of addresses)
acl aclname src "filename" ..              (client IP addresses in an external file)

Example 1
acl fullaccess src "/etc/squid/fullaccess.txt"
http_access allow fullaccess
This ACL uses an external file named fullaccess.txt, which contains the
list of client IP addresses, one per line. The file name is enclosed in
double quotes, as in the general acl format.
Example of fullaccess.txt
198.123.56.12
198.123.56.13
198.123.56.34
Example 2
acl office.net src 192.123.56.0/255.255.255.0
http_access allow office.net
This ACL sets the source addresses for office.net to the range
192.123.56.x, and the http_access allow operator lets these clients
access the Internet.


srcdomain
Description
This ACL allows the server to recognize a client by the client's
computer name. To do so, Squid needs to do a reverse DNS lookup (from
the client IP address to the client domain name) before this ACL is
interpreted, which can cause processing delays.
Syntax
acl aclname srcdomain domain-name ..       (reverse lookup of the client IP)

Example 1
acl staff.net srcdomain staff20 staff21
http_access allow staff.net
This ACL is for clients with the computer names staff20 and staff21. The
operator http_access allows the ACL named staff.net to access
the Internet. This option is not very efficient, since Squid must
do a reverse name lookup to determine the source name.

NOTE: Please ensure that the DNS server is running in order to use
the DNS lookup service.


dst
Description
This is the same as src, except that it refers to the server's IP
address (destination). Squid first does a DNS lookup to get the IP
address for the domain name in the request header, and then interprets
the ACL.
Syntax
acl aclname dst ip_address/netmask ..      (IP address of the URL's host, i.e. the site)
Example 1
acl tunnel dst 209.8.233.0/24
http_access deny tunnel
This ACL denies access to any destination with an IP address in
209.8.233.x.
Example 2
acl allow_ip dst 209.8.233.0-209.8.233.100/255.255.255.255
http_access allow allow_ip
This ACL allows destinations in the IP address range from
209.8.233.0 to 209.8.233.100. The netmask 255.255.255.255 keeps both
ends of the range exactly as written.

dstdomain
Description
This ACL recognizes a destination by its domain name. This is an
effective method to control access to a specific domain.
Syntax
acl aclname dstdomain domain.com ..        (domain name from the site's URL)


Example 1
acl banned_domain dstdomain www.terrorist.com
http_access deny banned_domain
This ACL denies access to the destination with the domain www.terrorist.com.
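
A dstdomain value matches differently depending on whether it starts with a dot. The following minimal Python sketch illustrates the rule as Squid documents it: a value such as .terrorist.com matches the domain and all of its subdomains, while www.terrorist.com matches that one host name only. The function name dstdomain_matches is hypothetical and the code is not taken from Squid.

```python
def dstdomain_matches(pattern: str, host: str) -> bool:
    """Return True if host is matched by a single dstdomain value."""
    pattern = pattern.lower()
    host = host.lower()
    if pattern.startswith("."):
        # ".terrorist.com" matches terrorist.com itself and any subdomain.
        return host == pattern[1:] or host.endswith(pattern)
    # Without the leading dot only the exact host name matches.
    return host == pattern


if __name__ == "__main__":
    for value in ("www.terrorist.com", ".terrorist.com"):
        for name in ("www.terrorist.com", "mail.terrorist.com"):
            print(value, name, dstdomain_matches(value, name))
```

To block a whole domain together with its subdomains, the value in the acl line therefore needs the leading dot.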

srcdom_regex
Description
This ACL is similar to srcdomain: the server needs to do a reverse DNS
lookup (from the client IP address to the client domain name)
before this ACL is interpreted. The difference is that this ACL allows
regular expressions to be used in defining the client's domain.
Syntax
acl aclname srcdom_regex [-i] source_domain_regex ..

Example 1
acl staff.net srcdom_regex -i staff
http_access allow staff.net
This ACL allows all nodes whose domain name contains the word staff to
access the Internet. The -i option makes the expression case-insensitive.

dstdom_regex
Description
This ACL allows the server to recognize a destination by matching its
domain name against a regular expression.
Syntax
acl aclname dstdom_regex [-i] dst_domain_regex ..

Example 1
acl banned_domain dstdom_regex -i terror porn
http_access deny banned_domain
This ACL denies clients access to destinations that contain the word
terror or porn in their domain name. For example, access to the
domains www.terrorist.com and www.pornography.net will be denied by
the proxy server.

time
Description
This ACL allows the server to control the service by time. Access to
the network can be scheduled according to the time set in the ACL.
Syntax
acl aclname time [day-abbrevs] [h1:m1-h2:m2]
where h1:m1 must be less than h2:m2 and each day is represented by
its abbreviation in Table 4-1.

Day         Abbreviation
Sunday      S
Monday      M
Tuesday     T
Wednesday   W
Thursday    H
Friday      F
Saturday    A

Table 4-1 Abbreviation for Day


Example 1
acl SABTU time A 9:00-17:00
ACL SABTU refers to Saturday from 9:00 to 17:00.
Example 2
acl pagi time 9:00-11:00
acl office.net src 10.2.3.0/24
http_access deny pagi office.net
pagi refers to the time from 9:00 to 11:00, while office.net refers to
the clients' IP addresses. This combination of ACLs denies access for
office.net if the time is between 9:00 am and 11:00 am.
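
The way a time ACL is evaluated can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Squid code: the function name time_acl_matches is hypothetical, the day letters are the Squid abbreviations (S, M, T, W, H, F, A), a specification without day letters is taken to apply to every day, both ends of the time range are treated as inclusive, and a specification with day letters but no time range is not handled.

```python
from datetime import datetime, time

# Squid day abbreviations mapped to Python's weekday() numbers (Monday = 0).
DAY_CODES = {"M": 0, "T": 1, "W": 2, "H": 3, "F": 4, "A": 5, "S": 6}


def parse_clock(text: str) -> time:
    """Turn "9:00" into a datetime.time value."""
    hours, minutes = text.split(":")
    return time(int(hours), int(minutes))


def time_acl_matches(spec: str, now: datetime) -> bool:
    """Check a spec such as "A 9:00-17:00" or "9:00-11:00" against now."""
    parts = spec.split()
    # No day letters given: assume the range applies on every day.
    days = parts[0] if len(parts) == 2 else "SMTWHFA"
    start_text, end_text = parts[-1].split("-")
    start, end = parse_clock(start_text), parse_clock(end_text)
    day_ok = now.weekday() in {DAY_CODES[d] for d in days}
    return day_ok and start <= now.time() <= end


if __name__ == "__main__":
    saturday_morning = datetime(2007, 5, 5, 10, 0)  # 5 May 2007 was a Saturday
    print(time_acl_matches("A 9:00-17:00", saturday_morning))
    print(time_acl_matches("9:00-11:00", saturday_morning))
```

Both calls match for a Saturday at 10:00; the same time on a Monday would match only the second specification.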

url_regex
Description
The url_regex type searches the entire URL for the regular
expression you specify. Note that these regular expressions are
case-sensitive. To make them case-insensitive, use the -i option.
Syntax
acl aclname url_regex [-i] url_regex ..

Example 1
acl banned_url url_regex -i terror porn
http_access deny banned_url
This ACL denies any URL that contains the word terror or porn.
For example, the following destinations will be denied by the proxy
server:
http://www.google.com/pornography
http://www.news.com/terrorist.html
http://www.terror.com/


urlpath_regex
Description
The urlpath_regex type matches a regular expression against the URL
path only, excluding the protocol and hostname.
If the URL is http://www.free.com/latest/games/tetris.exe, then this
ACL type only looks at the part after http://www.free.com, that is
/latest/games/tetris.exe. It leaves out the http protocol and the
www.free.com hostname.
Syntax
acl aclname urlpath_regex pattern ..

Example 1
acl blocked_free urlpath_regex free
http_access deny blocked_free
This ACL blocks any URL whose path contains "free" (but not "Free"),
without looking at the protocol and hostname.
These regular expressions are case-sensitive. To make them
case-insensitive, add the -i option.
Example 2
acl blocked_games urlpath_regex -i games
http_access deny blocked_games
blocked_games refers to URLs whose path contains the word games, whether
it is spelled in upper or lower case.
Example 3
To block several URLs:
acl block_site urlpath_regex -i "/etc/squid/acl/block_site"
http_access deny block_site


To block several URLs, it is recommended to put the list in one file. As
in Example 3, the whole block_site list is in the
/etc/squid/acl/block_site file.
The block_site file may contain, for example:
\.exe$
\.mp3$
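
The difference between matching the whole URL and matching only the path can be checked with a short Python sketch. The function name urlpath_regex_matches is hypothetical, and Python's re module is used as a stand-in for the regular expression library that Squid links against, so corner cases of the pattern syntax may differ.

```python
import re
from typing import Iterable
from urllib.parse import urlsplit


def urlpath_regex_matches(patterns: Iterable[str], url: str,
                          ignore_case: bool = False) -> bool:
    """Return True if any pattern is found in the path part of url."""
    flags = re.IGNORECASE if ignore_case else 0  # ignore_case plays the role of -i
    parts = urlsplit(url)
    # Keep only what follows the host name: the path and any query string.
    path = parts.path + ("?" + parts.query if parts.query else "")
    return any(re.search(p, path, flags) is not None for p in patterns)


if __name__ == "__main__":
    url = "http://www.free.com/latest/games/tetris.exe"
    print(urlpath_regex_matches([r"\.exe$", r"\.mp3$"], url))  # path ends in .exe
    print(urlpath_regex_matches(["free"], url))                # "free" is only in the host
```

The second call does not match, because the word free appears only in the hostname, which urlpath_regex never sees.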

port
Description
Access can be controlled by the destination (server) port number.
Syntax
acl aclname port port-number ..

Example 1
Deny requests to unknown ports
acl Safe_ports port 80          # http
acl Safe_ports port 21          # ftp
acl Safe_ports port 443 563     # https, snews
http_access deny !Safe_ports

Example 2
Deny requests to several untrusted ports
acl safeport port "/etc/squid/acl/safeport"
http_access deny safeport


proto
Description
This specifies the transfer protocol.
Syntax
acl aclname proto protocol ..

Example 1
acl protocol proto HTTP FTP
This refers to the protocols HTTP and FTP.
Example 2
acl manager proto cache_object
http_access allow manager localhost
http_access deny manager
This allows cache manager (cachemgr) access only from localhost.
Example 3
acl ftp proto FTP
http_access deny ftp
http_access allow all
These rules block every FTP request and allow everything else.


method
Description
This specifies the HTTP method of the request.
Syntax
acl aclname method method-type ..

Example 1
acl connect method CONNECT
http_access allow localhost
http_access allow allowed_clients
http_access deny connect
These rules deny the CONNECT method to everyone except localhost and
allowed_clients, which prevents outside people from trying to connect
through the proxy server.

browser
Description
Regular expression pattern matching on the request's User-Agent
header. To log the User-Agent header information, add this line to
squid.conf:
useragent_log /var/log/squid/useragent.log
Then, try to run the Mozilla browser. The User-Agent header for Mozilla
should look like the pattern in the example.
Syntax
acl aclname browser pattern ..

Example 1
acl mozilla browser ^Mozilla/5\.0
http_access deny mozilla
This rule denies Mozilla browsers and any other browser whose
User-Agent header starts with Mozilla/5.0.

proxy_auth
Description
User authentication via external processes. proxy_auth requires an
EXTERNAL authentication program to check username/password
combinations. In this configuration, we use the NCSA authentication
method because it is the easiest method to implement.
Syntax
acl aclname proxy_auth username ..

Example 1
To validate a list of users, follow these steps.
Creating the passwd file
# touch /etc/squid/passwd
# chown root.squid /etc/squid/passwd
# chmod 640 /etc/squid/passwd

Adding users
# htpasswd /etc/squid/passwd shah

You will be prompted to enter a password for that user. In this
example, it is the password for user shah.
Setting rules
auth_param basic program /usr/lib/squid/ncsa_auth /etc/squid/passwd
auth_param basic children 5
auth_param basic realm Squid proxy-caching web-server
auth_param basic credentialsttl 2 hours
These lines are already in the configuration file but need to be
adjusted to suit your environment.


Authentication configuration
acl LOGIN proxy_auth REQUIRED
http_access allow LOGIN
These rules only allow users that have been authenticated to access
the network.
CAUTION !! proxy_auth can't be used in a transparent proxy.

maxconn
Description
A limit on the maximum number of connections from a single client IP
address. This ACL is true if the client has more than maxconn
connections open.
Syntax
acl aclname maxconn number_of_connections

Example 1
acl someuser src 10.0.5.0/24
acl 5conn maxconn 5
http_access deny someuser 5conn
These rules restrict users in the 10.0.5.0/24 subnet to a maximum of
five (5) connections at once. If the limit is exceeded, the error page
appears. Users outside this subnet are not restricted by this rule.

CAUTION !! The maxconn ACL requires the client_db feature. If
client_db is disabled (for example with client_db off) then maxconn
ACLs will not work.


Create custom error page


# vi /etc/squid/error/ERROR_MESSAGE
Append the following
<HTML>
<HEAD>
<TITLE> ERROR : ACCESS DENIED FROM PROXY SERVER </TITLE>
</HEAD>
<BODY>
<H1> The site is blocked due to IT policy</H1>
<p> Please contact helpdesk for more information: </p>
Phone: 06-2333333 (ext 33) <br>
Email: helpdesk@utem.edu.my <br>

CAUTION !!
Do not include HTML close tags </HTML></BODY>

Displaying custom error message


acl blocked_port port 80
deny_info ERROR_MESSAGE blocked_port
http_access deny blocked_port


4.2 Exercises
1. Why can the users still download files with the following
configuration?
acl download urlpath_regex -i \.exe$
acl office_hours time 09:00-17:00
acl GET method GET
acl it_user1 src 192.168.1.88
acl it_user2 src 192.168.1.89
acl nodownload1 src 192.168.1.10
acl nodownload2 src 192.168.1.11
http_access allow it_user1
http_access allow it_user2
http_access allow nodownload1
http_access allow nodownload2
http_access deny GET office_hours nodownload1 nodownload2
http_access deny all
The configuration should deny nodownload1 and nodownload2. The
allow lines should be deleted.


2. Why does this configuration still let requests to game.free.com
through?
acl ban dstdomain free.com
http_access deny ban

3. The following access control configuration will never work. Why?
acl ME src 10.0.0.1
acl YOU src 10.0.0.2
http_access allow ME YOU


Chapter

5. Caching
5.1 Concepts

- Caching (a.k.a. proxy server) is the process of storing data on an
intermediate system between the Web server and the client.
- The proxy server can simply send the content requested by the
client from its copy in the cache.
- The assumption is that later requests for the same data can be
serviced more quickly by not having to go all the way back to the
original server.
- Caching can also reduce demands on network resources and on
the information servers.

5.2 Configuring a cache for proxy server


There are a lot of parameters related to caching in Squid and these
parameters can be divided into three main groups as below:
A. Cache size
B. Cache directories and log file path name
C. Peer cache servers and Squid hierarchy


However, in the following subsection, only the first two groups will be
covered.
A. Cache Size
The following are the common parameters used in cache size.
i. cache_mem
Syntax
cache_mem size(MB)
This parameter specifies the amount of cache memory (RAM)
used to store in-transit objects (ones that are currently being
used), hot objects (ones that are used often) and negative-cached
objects (recent failed requests). The default value is 8 MB.
Example:
cache_mem 16 MB

ii. maximum_object_size
Syntax
maximum_object_size size(MB)
This parameter is used if you do not want to cache files that are
larger than the size set. The default value is 4 MB.
Example:
maximum_object_size 8 MB


iii. ipcache_size
Syntax
ipcache_size number_of_entries
This parameter sets how many IP address resolution entries Squid
stores. The value is a number of entries, not a size in megabytes; the
default is 1024.
Example:
ipcache_size 2048
iv. ipcache_high
Syntax
ipcache_high percentage
This parameter specifies the percentage at which Squid starts
clearing out the least-used IP address resolutions. The default value
(95) is usually kept.
Example:
ipcache_high 95
v. ipcache_low
Syntax
ipcache_low percentage
This parameter specifies the percentage at which Squid stops
clearing out the least-used IP address resolutions. The default value
(90) is usually kept.


Example:
ipcache_low 90
B. Cache Directories
i. cache_dir
Syntax
cache_dir type dir size(MB) L1 L2
This parameter specifies the directory or directories in which cache
swap files are stored. The default dir is /var/spool/squid. We can
specify how much disk space to use for the cache in megabytes (100 is
the default). The default number of first-level directories (L1) and
second-level directories (L2) is 16 and 256 respectively.
Example:
cache_dir aufs /var/cache01 7000 16 256

NOTE: /var/cache01 is a partition that was created during the Linux
Fedora installation.

Formula to calculate the first-level directories (L1):


Given:
x = Size of cache dir in KB (e.g., 6 GB = 6 * 1024 * 1024 KB)
y = Average object size (e.g., 13 KB)
z = Objects per L2 directory (assuming 256)
calculate:
L1 = number of L1 directories
L2 = number of L2 directories
such that:
L1 x L2 = x / y / z

Example:
x = 6 GB
  = 6 * 1024 * 1024 = 6291456 KB
so:
x / y / z = 6291456 / 13 / 256
          = 1890
and, taking L2 = 256:
L1 x L2  = x / y / z
L1 x 256 = 1890
L1       = 7 (1890 / 256, rounded)
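
The same arithmetic can be checked with a small Python sketch. The function name l1_directories and its parameter names are hypothetical; the default values are the ones assumed in the formula above (13 KB average object size, 256 objects per L2 directory, and L2 = 256).

```python
def l1_directories(cache_kb: int, avg_object_kb: int = 13,
                   objects_per_l2: int = 256, l2: int = 256) -> float:
    """Return the unrounded number of first-level directories (L1)."""
    # x / y / z: total number of second-level directories needed.
    total_l2_dirs = cache_kb / avg_object_kb / objects_per_l2
    # L1 x L2 = x / y / z, so L1 = (x / y / z) / L2.
    return total_l2_dirs / l2


if __name__ == "__main__":
    cache_kb = 6 * 1024 * 1024  # 6 GB expressed in KB
    print(round(l1_directories(cache_kb), 2))
```

For a 6 GB cache the unrounded result is about 7.38, which the worked example rounds to 7. Rounding up to 8 instead would be a more cautious choice if the cache is expected to fill completely.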

ii. access_log
Syntax
access_log dir
This parameter specifies the file in which the HTTP and ICP
accesses are logged. The default, /var/log/squid/access.log, is
usually kept.
Example:
access_log /var/log/squid/access.log


Chapter

6. SQUID and Webmin


6.1 About Webmin
Webmin is a graphical user interface for Unix system administration.
It is web-based and can be installed on most Unix systems. Webmin is
free software and the installation package can be downloaded from the
Net. Webmin is largely based on Perl, and it runs as its own process
and web server. It usually uses TCP port 10000 for communication, and
can be configured to use SSL if OpenSSL is installed.

6.2 Obtaining and Installing Webmin

The Webmin installation package is available at the official Webmin site
http://www.webmin.com/download.html.
You can download the latest package and save it on the local machine.


Installation of Webmin differs slightly depending on which type of
package you choose to install. Note that Webmin requires a relatively
recent Perl for any of these installation methods to work. Nearly all, if
not all, modern UNIX and UNIX-like OS variants now include Perl as a
standard component of the OS, so this should not be an issue.

Installing from a tar.gz

First you must untar and unzip the archive in the directory where you
would like Webmin to be installed. The most common location for
installation from tarballs is /usr/local. Some sites prefer /opt. If
you're using GNU tar, you can do this all on one command line:
# tar zxvf webmin-1.340.tar.gz
If you have a less capable version of tar, you must unzip the file first
and then untar it:
# gunzip webmin-1.340.tar.gz
# tar xvf webmin-1.340.tar
Next, you need to change to the directory that was created when you
untarred the archive, and execute the setup.sh script, as shown in
the following example. The script will ask several questions about your
system and your preferences for the installation. Generally, accepting
the default values will work. The command for installation is as follows:
# ./setup.sh

Installing from an RPM


Installing from an RPM is even easier. You only need to run one
command:
# rpm -Uvh webmin-1.340-1.noarch.rpm


This will copy all of the Webmin files to the appropriate locations and
run the install script with appropriate default values. For example, the
Webmin perl files will be installed in /usr/libexec/webmin while the
configuration files will end up in /etc/webmin. Webmin will then be
started on port 10000. You may log in using root as the login name
and your system root password as the password. It's unlikely you will
need to change any of these items from the command line, because
they can all be modified using Webmin. If you do need to make any
changes, you can do so in miniserv.conf in /etc/webmin.

After Installation
After installation, your Webmin installation will behave nearly
identically, regardless of operating system vendor or version, location
of installation, or method of installation. The only apparent differences
between systems will be that some have more or fewer modules
because some are specific to one OS. Others will feature slightly
different versions of modules to take into account different functioning
of the underlying system. For example, the package manager module
may behave differently, or be missing from the available options
entirely, depending on your OS.

6.3 Using Squid in Webmin

To launch Webmin, open a web browser, such as Netscape or Mozilla
Firefox, on any machine that has network access to the server on
which you wish to log in. Browse to port 10000 on the IP or host name
of the server using http://computername:10000/. Go to the Squid
Proxy Server menu (in the Servers submenu) to open the main panel
(Figure 6-1).


Figure 6-1: Squid Proxy Main Page

6.4 Ports and Networking


The Ports and Networking page provides you with the ability to
configure most of the network level options of Squid. Squid has a
number of options to define what ports Squid operates on, what IP
addresses it uses for client traffic and intercache traffic, and multicast
options. Usually, on dedicated caching systems these options will not
be useful. But in some cases you may need to adjust these to prevent
the Squid daemon from interfering with other services on the system
or on your network.


Proxy port
Sets the network port on which Squid operates. This option is usually
3128 by default and can almost always be left on this address, except
when multiple Squids are running on the same system, which is
usually ill-advised. This option corresponds to the http_port option in
squid.conf.

ICP port
This is the port on which Squid listens for Internet Cache Protocol, or
ICP, messages. ICP is a protocol used by web caches to communicate
and share data. Using ICP it is possible for multiple web caches to
share cached entries so that if any one local cache has an object, the
distant origin server will not have to be queried for the object. Further,
cache hierarchies can be constructed of multiple caches at multiple
privately interconnected sites to provide improved hit rates and
higher-quality web response for all sites. More on this in later
sections. This option correlates to the icp_port directive.

Incoming TCP address


The address on which Squid opens an HTTP socket that listens for
client connections and connections from other caches. By default Squid
does not bind to any particular address and will answer on any address
that is active on the system. This option is not usually used, but can
provide some additional level of security, if you wish to disallow any
outside network users from proxying through your web cache. This
option correlates to the tcp_incoming_address directive.


Outgoing TCP address


Defines the address on which Squid sends out packets via HTTP to
clients and other caches. Again, this option is rarely used. It refers to
the tcp_outgoing_address directive.

Incoming UDP address


Sets the address on which Squid will listen for ICP packets from other
web caches. This option allows you to restrict which subnets will be
allowed to connect to your cache on a multi-homed (containing
multiple subnets) Squid host. This option correlates to the
udp_incoming_address directive.

Outgoing UDP address


The address on which Squid will send out ICP packets to other web
caches. This option correlates to the udp_outgoing_address directive.

Multicast groups
The multicast groups that Squid will join to receive multicast ICP
requests. This option should be used with great care, as it is used to
configure your Squid to listen for multicast ICP queries. Clearly if your
server is not on the MBone, this option is useless. And even if it is, this
may not be an ideal choice.


TCP receive buffer


The size of the buffer used for TCP packets being received. By default
Squid uses whatever the default buffer size for your operating system
is. This should probably not be changed unless you know what you're
doing, and there is little to be gained by changing it in most cases.
This correlates to the tcp_recv_bufsize directive.

6.5 Other Caches


The Other Caches page provides an interface to one of Squid's most
interesting, but also widely misunderstood, features. Squid is the
reference implementation of ICP, a simple but effective means for
multiple caches to communicate with each other regarding the content
that is available on each. This opens the door for many interesting
possibilities when one is designing a caching infrastructure.

Internet Cache Protocol


It is probably useful to discuss how ICP works and some common
usages for ICP within Squid, in order to quickly make it clear what it is
good for, and perhaps even more importantly, what it is not good for.
The most popular uses for ICP are discussed, and more good ideas will
probably arise in the future as the Internet becomes even more global
in scope and the web-caching infrastructure must grow with it.


Parent and Sibling Relationships


The ICP protocol specifies that a web cache can act as either a parent
or a sibling. A parent cache is simply an ICP capable cache that will
answer both hits and misses for child caches, while a sibling will only
answer hits for other siblings. This subtle distinction means simply that
a parent cache can proxy for caches that have no direct route to the
Internet. A sibling cache, on the other hand, cannot be relied upon to
answer all requests, and your cache must have another method to
retrieve requests that cannot come from the sibling. This usually
means that in sibling relationships, your cache will also have a direct
connection to the Internet or a parent proxy that can retrieve misses
from the origin servers. ICP is a somewhat chatty protocol, in that an
ICP request will be sent to every neighbor cache each time a cache
miss occurs. By default, whichever cache replies with an ICP hit first
will be the cache used to request the object.
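
The selection described above can be sketched as a small Python model. It is a simplification under stated assumptions, not the Squid implementation: the Neighbor class and the choose_neighbor function are hypothetical, every neighbor is assumed to reply, and when all replies are misses the sketch picks the parent that replied first.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Neighbor:
    name: str
    kind: str          # "parent" or "sibling"
    has_object: bool   # True if this cache would answer with an ICP hit
    reply_ms: float    # time its ICP reply takes to arrive


def choose_neighbor(neighbors: List[Neighbor]) -> Optional[str]:
    """Pick the neighbor to fetch from after querying all of them via ICP."""
    # The query goes to every neighbor; the first hit to arrive wins.
    hits = [n for n in neighbors if n.has_object]
    if hits:
        return min(hits, key=lambda n: n.reply_ms).name
    # All misses: only a parent will fetch the object on our behalf.
    parents = [n for n in neighbors if n.kind == "parent"]
    if parents:
        return min(parents, key=lambda n: n.reply_ms).name
    return None  # no usable neighbor: go to the origin server directly


if __name__ == "__main__":
    mesh = [
        Neighbor("sibling-a", "sibling", has_object=True, reply_ms=12.0),
        Neighbor("parent-a", "parent", has_object=False, reply_ms=5.0),
    ]
    print(choose_neighbor(mesh))
```

In this example the sibling holds the object and is chosen even though the parent replies faster, because a miss from the parent only matters when no neighbor reports a hit.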

When to Use ICP?


ICP is often used in situations wherein one has multiple Internet
connections, or several types of paths to Internet content. Finally, it is
possible, though usually not recommended, to implement a
rudimentary form of load balancing through the use of multiple parents
and multiple child web caches.
One of the common uses of ICP is cache meshes. A cache mesh is, in
short, a number of web caches at remote sites interconnected using
ICP. The web caches could be in different cities, or they could be in
different buildings of the same university or different floors in the same
office building. This type of hierarchy allows a large number of caches
to benefit from a larger client population than is directly available to it.


All other things being equal, a cache that is not overloaded will
perform better (with regard to hit ratio) with a larger number of
clients. Simply put, a larger client population leads to a higher quality
of cache content, which in turn leads to higher hit ratios and improved
bandwidth savings. So, whenever it is possible to increase the client
population without overloading the cache, such as in the case of a
cache mesh, it may be worth considering. Again, this type of hierarchy
can be improved upon by the use of Cache Digests, but ICP is usually
simpler to implement and is a widely supported standard, even on
non-Squid caches.
Finally, ICP is also sometimes used for load balancing multiple caches
at the same site. ICP, or even Cache Digests for that matter, are
almost never the best way to implement load balancing. Using ICP for
load balancing can be achieved in a few ways.

- By having several local siblings, which can each provide hits
to the others' clients, while the client load is evenly divided
across the caches.
- By using a fast but low-capacity web cache in front of two or more
lower-cost, but higher-capacity, parent web caches. The parents
will then serve the requests in roughly equal amounts.

6.6 Other Proxy Cache Servers


This section of the Other Caches page provides a list of currently
configured sibling and parent caches, and also allows one to add more
neighbor caches. Clicking the name of a neighbor cache will allow you
to edit it. This section also provides the vital information about the
neighbor caches, such as the type (parent, sibling, multicast), the
proxy or HTTP port, and the ICP or UDP port of the caches. Note that
Proxy port is the port where the neighbor cache normally listens for
client traffic, which defaults to 3128.

Edit Cache Host


Clicking a cache peer name or clicking Add another cache on the
primary Other Caches page brings you to this page, which allows you
to edit most of the relevant details about neighbor caches (Figure 6-2)

Figure 6-2: Create cache Host page

Hostname
The name or IP address of the neighbor cache you want your cache to
communicate with. Note that this will be one-way traffic. Access
Control Lists, or ACLs, are used to allow ICP requests from other
caches. ACLs are covered later. This option plus most of the rest of the
options on this page correspond to cache_peer lines in squid.conf.


Type
The type of relationship you want your cache to have with the neighbor
cache. If the cache is upstream, and you have no control over it, you
will need to consult with the administrator to find out what kind of
relationship you should set up. If it is configured wrong, cache misses
will likely result in errors for your users. The options here are sibling,
parent, and multicast.

Proxy port
The port on which the neighbor cache is listening for standard HTTP
requests. Even though the caches transmit availability data via ICP,
actual web objects are still transmitted via HTTP on the port usually
used for standard client traffic. If your neighbor cache is a Squid-based
cache, then it is likely to be listening on the default port of 3128. Other
common ports used by cache servers include 8000, 8888, 8080, and
even 80 in some circumstances.

ICP port
The port on which the neighbor cache is configured to listen for ICP
traffic. If your neighbor cache is a Squid-based proxy, this value can
be found by checking the icp_port directive in the squid.conf file on
the neighbor cache. Generally, however, the neighbor cache will listen
on the default port 3130.


Proxy only?
A simple yes or no question to tell whether objects fetched from the
neighbor cache should be cached locally. This can be used when all
caches are operating well below their client capacity, but disk space is
at a premium or hit ratio is of prime importance.

Send ICP queries?


Tells your cache whether or not to send ICP queries to a neighbor. The
default is Yes, and it should probably stay that way. ICP queries are the
method by which Squid knows which caches are responding and which
caches are closest or best able to quickly answer a request.

Default cache
This is switched to Yes if this neighbor cache is to be the last-resort
parent cache to be used in the event that no other neighbor cache is
present as determined by ICP queries. Note that this does not prevent
it from being used normally while other caches are responding as
expected. Also, if this neighbor is the sole parent proxy, and no other
route to the Internet exists, this should be enabled.

Round-robin cache?
Choose whether to use round-robin scheduling between multiple
parent caches in the absence of ICP queries. This should be set on all
parents that you would like to schedule in this way.


ICP time-to-live
Defines the multicast TTL for ICP packets. When using multicast ICP, it
is usually wise for security and bandwidth reasons to use the minimum
TTL suitable for your network.

Cache weighting
Sets the weight for a parent cache. When using this option it is
possible to set higher numbers for preferred caches. The default value
is 1, and if left unset for all parent caches, whichever cache responds
positively first to an ICP query will be sent a request to fetch that
object.

Closest only
Allows you to specify that your cache wants only
CLOSEST_PARENT_MISS replies from parent caches. This allows your
cache to then request the object from the parent cache closest to the
origin server.

No digest?
Chooses whether this neighbor cache should send cache digests.

No NetDB exchange
When using ICP, it is possible for Squid to keep a
database of network information about the neighbor caches, including
availability and RTT, or Round Trip Time, information. This usually
allows Squid to choose more wisely which caches to make requests to
when multiple caches have the requested object.


No delay?
Prevents accesses to this neighbor cache from affecting delay pools.
Delay pools, discussed in more detail later, are a means by which
Squid can regulate bandwidth usage. If a neighbor cache is on the
local network, and bandwidth usage between the caches does not need
to be restricted, then this option can be used.

Login to proxy
Select this if you need to send authentication information when
challenged by the neighbor cache. On local networks, this type of
security is unlikely to be necessary.

Multicast responder
Allows Squid to know where to accept multicast ICP replies. Because
multicast is fed on a single IP to many caches, Squid must have some
way of determining which caches to listen to and what options apply to
that particular cache. Selecting Yes here configures Squid to listen for
multicast replies from the IP of this neighbor cache.

Query host for domains, Don't query for domains
These two options are the only options on this page to configure a
directive other than cache_peer in Squid. In this case they set the
cache_peer_domain directive. This allows you to configure which
domains this neighbor cache may be queried for and which it should
not. It is often used to configure caches not to query other caches for
content within the local domain. Another common usage, such as in
the national web hierarchies discussed above, is to define which web
cache is used for requests destined for different TLDs. So, for example,
if one has a low-cost satellite link to the U.S. backbone from another
country that is preferred for web traffic over the much more expensive
land line, one can configure the satellite-connected cache as the cache
to query for all .com, .edu, .org, .net, .us, and .gov addresses.
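The neighbor cache fields described above end up as one cache_peer line, plus an optional cache_peer_domain line, in squid.conf. The following Python sketch shows roughly how such lines are composed. The function names, the host names and the mapping from Webmin fields to option keywords (proxy-only, no-query, default, round-robin, weight=, closest-only) are assumptions made for illustration, not Webmin's actual code.

```python
def cache_peer_line(host, peer_type, http_port=3128, icp_port=3130, *,
                    proxy_only=False, send_icp=True, default=False,
                    round_robin=False, weight=None, closest_only=False):
    """Compose a cache_peer directive from Webmin-style field values."""
    options = []
    if proxy_only:
        options.append("proxy-only")      # Proxy only? = Yes
    if not send_icp:
        options.append("no-query")        # Send ICP queries? = No
    if default:
        options.append("default")         # Default cache = Yes
    if round_robin:
        options.append("round-robin")     # Round-robin cache? = Yes
    if weight is not None:
        options.append(f"weight={weight}")  # Cache weighting
    if closest_only:
        options.append("closest-only")    # Closest only = Yes
    fields = ["cache_peer", host, peer_type, str(http_port), str(icp_port)]
    return " ".join(fields + options)


def cache_peer_domain_line(host, query=(), dont_query=()):
    """Compose a cache_peer_domain directive; '!' marks excluded domains."""
    domains = list(query) + ["!" + domain for domain in dont_query]
    return " ".join(["cache_peer_domain", host] + domains)


# Hypothetical hosts, used only to show the shape of the output.
peer = cache_peer_line("parent.example.com", "parent",
                       proxy_only=True, default=True)
domains = cache_peer_domain_line("sat.example.com", query=[".com", ".edu"])
print(peer)
print(domains)
```

With these illustrative values the sketch prints a parent entry listening on the default ports 3128 and 3130, followed by a domain restriction for the satellite-connected cache.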

Cache Selection Options


This section provides configuration options for general ICP
configuration (Figure 6-3). These options affect all of the other
neighbor caches that you define.

Figure 6-3: Global ICP options

Directly fetch URLs containing


Allows you to configure a match list of items to always fetch directly
rather than query a neighbor cache. The default here is cgi-bin ? and
it should continue to be included unless you know what you're doing.
This helps prevent wasting intercache bandwidth on lots of requests
that are usually never considered cacheable, and so will never return
hits from your neighbor caches. This option sets the
hierarchy_stoplist directive.

ICP query timeout


The time in milliseconds that Squid will wait before timing out ICP
requests. The default allows Squid to calculate an optimum value
based on average RTT of the neighbor caches. Usually, it is wise to
leave this unchanged. However, for reference, the default value in the
distant past was 2000, or 2 seconds. This option edits the
icp_query_timeout directive.

Multicast ICP timeout


Timeout in milliseconds for multicast probes, which are sent out to
discover the number of active multicast peers listening on a given
multicast address. This configures the mcast_icp_query_timeout
directive and defaults to 2000 ms, or 2 seconds.

Dead peer timeout


Controls how long Squid waits to declare a peer cache dead. If there
are no ICP replies received in this amount of time, Squid will declare
the peer dead and will not expect to receive any further ICP replies.
However, it continues to send ICP queries for the peer and will mark it
active again on receipt of a reply. This timeout also affects when Squid
expects to receive ICP replies from peers. If more than this number of
seconds has passed since the last ICP reply was received, Squid will
not expect to receive an ICP reply on the next query. Thus, if your
time between requests is greater than this timeout, your cache will
send more requests DIRECT rather than through the neighbor caches.


Memory Usage
This page provides access to most of the options available for
configuring the way Squid uses memory and disks (Figure 6-4). Most
values on this page can remain unchanged, except in very high load or
low resource environments, where tuning can make a measurable
difference in how well Squid performs.

Figure 6-4: Memory and disk usage

Memory usage limit


The limit on how much memory Squid will use for some parts of its
core data. Note that this does not restrict or limit Squid's total process
size. What it does do is set aside a portion of RAM for use in storing
in-transit and hot objects, as well as negative cached objects. Generally,
the default value of 8MB is suitable for most situations, though it is
safe to lower it to 4 or 2MB in extremely low load situations. It can
also be raised significantly on high-memory systems to increase
performance by a small margin. Keep in mind that large cache
directories increase the memory usage of Squid by a large amount,
and even a machine with a lot of memory can run out of memory and
go into swap if cache memory and disk size are not appropriately
balanced. This option edits the cache_mem directive. See the section on
cache directories for a more complete discussion of balancing memory
and storage.

Maximum cached object size


The size of the largest object that Squid will attempt to cache. Objects
larger than this will never be written to disk for later use. Refers to the
maximum_object_size directive.

IP address cache size, IP cache high-water mark, IP address low-water mark
The size of the cache used for IP addresses and the high and low water
marks for the cache, respectively. These options configure the
ipcache_size, ipcache_high, and ipcache_low directives, which default
to 1024 entries, 95%, and 90%.

6.7 Logging
Squid provides a number of logs that can be used when debugging
problems, measuring the effectiveness of the cache, and identifying
users and the sites they visit (Figure 6-5). Because Squid can be used to
snoop on users' browsing habits, you should carefully consider the
privacy laws in your region and, more importantly, be considerate to
your users. That being said, logs can be very valuable tools in ensuring
that your users get the best service possible from your cache.


Figure 6-5: Logging configuration

Cache metadata file


Filename used in each store directory to store the Web cache
metadata, which is a sort of index for the Web cache object store. This
is not a human readable log, and it is strongly recommended that you
leave it in its default location on each store directory, unless you really
know what you're doing. This option correlates to the cache_swap_log
directive.

Use HTTPD log format


Allows you to specify that Squid should write its access.log in HTTPD
common log file format, such as that used by Apache and many other
Web servers. This allows you to parse the log and generate reports
using a wider array of tools. However, this format does not provide
several types of information specific to caches, and is generally less
useful when tracking cache usage and solving problems. Because there
are several effective tools for parsing and generating reports from the
Squid standard access logs, it is usually preferable to leave this at its
default of being off. This option configures the emulate_httpd_log
directive. The Calamaris cache access log analyzer does not work if
this option is enabled.

Log full hostnames


Configures whether Squid will attempt to resolve the client's host name,
so that the fully qualified domain name can be logged. This can, in some
cases, increase latency of requests. This option correlates to the
log_fqdn directive.

Logging netmask
Defines what portion of the requesting client IP is logged in the
access.log. For privacy reasons it is often preferred to only log the
network or subnet IP of the client. For example, a netmask of
255.255.255.0 will log the first three octets of the IP, and fill the last
octet with a zero. This option configures the client_netmask directive.
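The effect of the logging netmask can be reproduced with a bitwise AND between the client address and the mask. The short Python sketch below illustrates that arithmetic only; the function name is hypothetical and the client address is the one that appears in the access.log example later in this module.

```python
import ipaddress


def mask_client_ip(client_ip, netmask):
    """Return the address that would be logged after applying the netmask."""
    ip_bits = int(ipaddress.IPv4Address(client_ip))
    mask_bits = int(ipaddress.IPv4Address(netmask))
    # Bits cleared in the mask are zeroed in the logged address.
    return str(ipaddress.IPv4Address(ip_bits & mask_bits))


print(mask_client_ip("10.0.5.10", "255.255.255.0"))    # 10.0.5.0
print(mask_client_ip("10.0.5.10", "255.255.255.255"))  # 10.0.5.10
```

A mask of 255.255.255.255 leaves the address untouched, so every client remains individually identifiable in the log.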


6.8 Cache Options


The Cache Options page provides access to some important parts of
the Squid configuration file. This is where the cache directories are
configured as well as several timeouts and object size options (Figure
6-6).

Figure 6-6: Configuring Squid's Cache Directories

The directive is cache_dir while the options are the type of filesystem,
the path to the cache directory, the size allotted to Squid, the number
of top level directories, and finally the number of second level
directories. In the example, I've chosen the filesystem type ufs, which
is a name for all standard UNIX filesystems. This type includes the
standard Linux ext2 filesystem as well. Other possibilities for this
option include aufs and diskd.

The next field is simply the amount of disk space, in megabytes, that you
want to allow Squid to use. Finally, the directory fields define the
number of upper and lower level directories for Squid to use.
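To make the order of the cache_dir fields concrete, the Python sketch below assembles one such line and counts the directories it implies. The helper name and the values (a 100 MB ufs store under /var/spool/squid with 16 first level and 256 second level directories) are illustrative assumptions, not values read from the figure.

```python
def cache_dir_line(fs_type, path, size_mb, level1_dirs, level2_dirs):
    """Compose a cache_dir directive in the field order described above."""
    return f"cache_dir {fs_type} {path} {size_mb} {level1_dirs} {level2_dirs}"


# Illustrative values only.
line = cache_dir_line("ufs", "/var/spool/squid", 100, 16, 256)
# Each first level directory holds level2_dirs subdirectories.
total_second_level = 16 * 256

print(line)                # cache_dir ufs /var/spool/squid 100 16 256
print(total_second_level)  # 4096
```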


6.9 Access Control


There are three types of options for configuring access control.
These three types of definition are separated in the Webmin panel into
three sections. The first is labeled Access control lists, which lists
existing ACLs and provides a simple interface for generating and
editing lists of match criteria (Figure 6-7). The second is labeled Proxy
restrictions and lists the current restrictions in place and the ACLs they
affect. Finally, the ICP restrictions section lists the existing access rules
regarding ICP messages from other Web caches.

Figure 6-7: Access Control Lists


Access Control Lists


The first field in the table represents the name of the ACL, which is
simply an assigned name that can be just about anything the user
chooses. The second field is the type of the ACL, which can be one of a
number of choices that indicate to Squid what part of a request
should be matched against for this ACL. The possible types include the
requesting client's address, the Web server address or host name, a
regular expression matching the URL, and many more. The final field is
the actual string to match. Depending on what the ACL type is, this
may be an IP address, a series of IP addresses, a URL, a host name,
etc.

Edit an ACL
To edit an existing ACL, simply click on the highlighted name. You will
then be presented with a screen containing all relevant information
about the ACL. Depending on the type of the ACL, you will be shown
different data entry fields. The operation of each type is very similar,
so for this example, you'll step through editing the localhost ACL.
Clicking the localhost button presents the page shown in Figure
6-8.

Figure 6-8: Edit an ACL


The title of the table is Client Address ACL which means the ACL is of
the Client Address type, and tells Squid to compare the incoming IP
address with the IP address in the ACL. It is possible to select an IP
based on the originating IP or the destination IP. The netmask can also
be used to indicate whether the ACL matches a whole network of
addresses, or only a single IP. It is possible to include a number of
addresses, or ranges of addresses in these fields. Finally, the Failure
URL is the address to send clients to if they have been denied access
due to matching this particular ACL. Note that the ACL by itself does
nothing; a proxy restriction or ICP restriction rule must also
use the ACL before Squid will apply it.

Creating new ACL


Creating a new ACL is equally simple (Figure 6-9). From the ACL page,
in the Access control lists section, select the type of ACL you'd like to
create. Then click Create new ACL. From there, as shown, you can
enter any number of ACLs for the list.

Figure 6-9: Creating an ACL


Available ACL Types


Browser Regexp
A regular expression that matches the client's browser type based on
the user agent header. This allows for ACLs that operate based on the
browser type in use. For example, using this ACL type, one could
create an ACL for Netscape users and another for Internet Explorer
users. This could then be used to redirect Netscape users to a
Navigator enhanced page, and IE users to an Explorer enhanced page.
Probably not the wisest use of an administrator's time, but it does
indicate the unmatched flexibility of Squid. This ACL type correlates to
the browser ACL type.

Client IP Address
The IP address of the requesting client. This option refers to the src
ACL in the Squid configuration file. An IP address and netmask are
expected. Address ranges are also accepted.

Client Hostname
Matches against the client domain name. This option correlates to the
srcdomain ACL, and can be either a single domain name, a list of
domain names, or the path to a file that contains a list of domain
names. If a path to a file, it must be surrounded by double quotation
marks. This ACL type can increase latency and decrease throughput
significantly on a loaded cache, as it must perform an address-to-name
lookup for each request, so it is usually preferable to use the Client IP
Address type.


Client Hostname Regexp


Matches against the client domain name using a regular expression. This
option correlates to the srcdom_regex ACL, and can be either a single
entry, a list of entries, or a path to a file that contains a list of
entries. If a path to a file, it must be surrounded by double quotation
marks.

Date and Time
This type is just what it sounds like, providing a means to create ACLs
that are active during certain times of the day or certain days of the
week. This feature is often used to block some types of content or
some sections of the Internet during business or class hours. Many
companies block pornography, entertainment, sports, and other clearly
non-work related sites during business hours, but then unblock them
after hours. This might improve workplace efficiency in some situations
(or it might just offend the employees). This ACL type allows you to
enter days of the week and a time range, or select all hours of the
selected days. This ACL type is the same as the time ACL type.

Ethernet Address
The ethernet or MAC address of the requesting client. This option only
works for clients on the same local subnet, and only for certain
platforms. Linux, Solaris, and some BSD variants are the supported
operating systems for this type of ACL. This ACL can provide a
somewhat secure method of access control, because MAC addresses
are usually harder to spoof than IP addresses, and you can guarantee
that your clients are on the local network (otherwise no ARP resolution
can take place).


External Auth
This ACL type calls an external authenticator process to decide whether
the request will be allowed. Note that authentication cannot work on a
transparent proxy or HTTP accelerator. The HTTP protocol does not
provide for two authentication stages (one local and one on remote
Web sites). So in order to use an authenticator, your proxy must
operate as a traditional proxy, where a client will respond appropriately
to a proxy authentication request as well as external Web server
authentication requests. This correlates to the proxy_auth directive.

External Auth Regex


As above, this ACL calls an external authenticator process, but allows
regex pattern or case insensitive matches. This option correlates to the
proxy_auth_regex directive.

Proxy IP Address
The local IP address on which the client connection exists. This allows
ACLs to be constructed that only match one physical network, if
multiple interfaces are present on the proxy, among other things. This
option configures the myip directive.

Request Method
This ACL type matches on the HTTP method in the request headers.
This includes the methods GET, PUT, etc. This corresponds to the
method ACL type directive.


URL Path Regex


This ACL matches on the URL path minus any protocol, port, and host
name information. It does not include, for example, the
"http://www.swelltech.com" portion of a request, leaving only the
actual path to the object. This option correlates to the urlpath_regex
directive.
URL Port
This ACL matches on the destination port for the request, and
configures the port ACL directive.
URL Protocol
This ACL matches on the protocol of the request, such as FTP, HTTP,
ICP, etc.
URL Regexp
Matches using a regular expression on the complete URL. This ACL can
be used to provide access control based on parts of the URL or a case
insensitive match of the URL, and much more. This option is equivalent
to the url_regex ACL type directive.

Web Server Address


This ACL matches based on the destination Web server's IP address.
Squid accepts a single IP, a network IP with netmask, as well as a range
of addresses in the form "192.168.1.1-192.168.1.25". This option
correlates to the dst ACL type directive.

Web Server Hostname


This ACL matches on the host name of the destination Web server.


Web Server Regexp


Matches using a regular expression on the host name of the
destination Web server.
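The difference between the URL Regexp and URL Path Regex types described above is easiest to see side by side. The Python sketch below is only an approximation: Squid uses its own regular expression library rather than Python's re module, the function names are hypothetical, and treating the query string as part of the path is an assumption.

```python
import re
from urllib.parse import urlsplit


def matches_url_regex(pattern, url):
    """url_regex style: search the complete URL."""
    return re.search(pattern, url) is not None


def matches_urlpath_regex(pattern, url):
    """urlpath_regex style: search only what follows protocol, host and port."""
    parts = urlsplit(url)
    path = parts.path + ("?" + parts.query if parts.query else "")
    return re.search(pattern, path) is not None


url = "http://www.swelltech.com/support/webminguide/"

print(matches_url_regex("swelltech", url))        # True: host name is searched
print(matches_urlpath_regex("swelltech", url))    # False: host name is excluded
print(matches_urlpath_regex("webminguide", url))  # True: part of the path
```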

6.10 Administrative Options

Administrative Options provides access to several of the behind the
scenes options of Squid. This page allows you to configure a diverse
set of options, including the user ID and group ID of the Squid process,
cache hierarchy announce settings, and the authentication realm
(Figure 6-10).

Figure 6-10: Administrative Options

Run as Unix user and group


The user name and group name Squid will operate as. Squid is
designed to start as root but very soon after drop to the user/group
specified here. This allows you to restrict, for security reasons, the
permissions that Squid will have when operating. By default, Squid will
operate as the nobody user and the nogroup group or, in the case of
some Squids installed from RPM, as the squid user and group. These
options correlate to the cache_effective_user and
cache_effective_group directives.

Proxy authentication realm


The realm that will be reported to clients when performing
authentication. This option usually defaults to Squid proxy-caching web
server, and correlates to the proxy_auth_realm directive. This name
will likely appear in the browser pop-up window when the client is
asked for authentication information.

Cache manager email address
The email address of the administrator of this cache. This option
corresponds to the cache_mgr directive and defaults to either
webmaster or root on RPM based systems. This address will be added
to any error pages that are displayed to clients.

Visible hostname
The host name that Squid will advertise itself as. This affects the host
name that Squid uses when serving error messages. This option may
need to be configured in cache clusters if you receive IP-Forwarding
errors. This option configures the visible_hostname directive.

Unique hostname
Configures the unique_hostname directive, and sets a unique host
name for Squid to report in cache clusters in order to allow detection of
forwarding loops. Use this if you have multiple machines in a cluster
with the same Visible Hostname.

Cache announce host, port and file
The host address and port that Squid will use to announce its
availability to participate in a cache hierarchy. The cache announce file
is simply a file containing a message to be sent with announcements.
These options correspond to the announce_host, announce_port, and
announce_file directives.

Announcement period
Configures the announce_period directive, and refers to the frequency
at which Squid will send announcement messages to the announce
host.

Most of the content in Chapter 6 is taken from Unix System Administration with
Webmin by Joe Cooper (2002) available online at
http://www.swelltech.com/support/webminguide/

Chapter 7. Analyzer
7.1 Structure of log file
In Fedora, the Squid log files are stored in the /var/log/squid directory
by default. Squid writes three log files:

Access log

Cache log

Store log

This section discusses the attributes and content of each log, as well
as how these logs can help an administrator debug potential problems.

Access log
Location : /var/log/squid/access.log

Description

It contains an entry for each time the cache has been hit or missed
when a client requests HTTP content.

It records the identity of the host making the request (IP address)
and the content it is requesting.

It also shows when content was served from the cache and when the
remote server had to be accessed to obtain the content.

It contains the HTTP transactions made by the users.

Format
Option 1 : This format is used if the emulate_httpd_log directive is
off.
Native format (emulate_httpd_log off)
Timestamp Elapsed Client Action/Code Size Method URI Ident Hierarchy/From Content

Option 2 : This format is used if the emulate_httpd_log directive is
on.
Common format (emulate_httpd_log on)
Client Ident - [Timestamp1] "Method URI" Type Size

With:
Timestamp
The time when the request is completed (socket closed). The format is
"Unix time" (seconds since Jan 1, 1970) with millisecond resolution.
Timestamp1
When the request is completed
(Day/Month/CenturyYear:Hour:Minute:Second GMT-Offset)
Elapsed
The elapsed time of the request, in milliseconds. This is the time
between the accept() and close() of the client socket.


Client
The IP address of the connecting client, or the FQDN if the 'log_fqdn'
option is enabled in the config file.
Action
The Action describes how the request was treated locally (hit, miss,
etc).
Code
The HTTP reply code taken from the first line of the HTTP reply header.
For ICP requests this is always "000." If the reply code was not given,
it will be logged as "555."
Size
For TCP requests, the amount of data written to the client. For UDP
requests, the size of the request. (in bytes)
Method
The HTTP request method (GET, POST, etc), or ICP_QUERY for ICP
requests.
URI
The requested URI.
Ident
The result of the RFC931/ident lookup of the client username. If
RFC931/ident lookup is disabled (default: 'ident_lookup off'), it is
logged as "-".
Hierarchy
A description of how and where the requested object was fetched.


From
Hostname of the machine where we got the object
Content
Content-type of the object (from the HTTP reply header).

Figure 7-1 shows an example of an access.log file.

Figure 7-1 Access.log

Figure 7-1 shows that the native format has been used. To understand
how each format field maps onto the contents of the access.log file,
take the first line, which breaks down as shown in Table 7-1.

Format      Value
Timestamp   1173680297.727
Elapsed     450
Client      10.0.5.10
Action      TCP_MISS
Code        302
Size        786
Method      GET
URI         http://www.google.com/search?
Ident       -
Hierarchy   DIRECT
From        64.233.189.104
Content     text/html
Table 7-1 The format and its value
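The same breakdown can be done programmatically. The Python sketch below splits one native format line into the fields listed above. The sample line is reassembled from the values in Table 7-1 (with "-" assumed for the Ident field), and the class and function names are hypothetical.

```python
from typing import NamedTuple


class AccessEntry(NamedTuple):
    timestamp: str
    elapsed_ms: int
    client: str
    action: str
    code: int
    size: int
    method: str
    uri: str
    ident: str
    hierarchy: str
    origin: str        # the "From" field
    content_type: str


def parse_native_line(line):
    """Split one native format access.log line into its named fields."""
    (timestamp, elapsed, client, action_code, size,
     method, uri, ident, hier_from, content_type) = line.split()
    # Action/Code and Hierarchy/From are each logged as one slash-joined field.
    action, code = action_code.split("/", 1)
    hierarchy, origin = hier_from.split("/", 1)
    return AccessEntry(timestamp, int(elapsed), client, action, int(code),
                       int(size), method, uri, ident, hierarchy, origin,
                       content_type)


SAMPLE = ("1173680297.727 450 10.0.5.10 TCP_MISS/302 786 GET "
          "http://www.google.com/search? - DIRECT/64.233.189.104 text/html")

entry = parse_native_line(SAMPLE)
print(entry.action, entry.code, entry.hierarchy, entry.origin)
```

Splitting on whitespace also copes with the padding that is used to align the Elapsed column.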


Some of these fields need further elaboration:

Timestamp

The timestamp is expressed in UNIX time with millisecond
resolution. However, it can be converted into a more readable form
by using this short Perl script:
#!/usr/bin/perl -p
s/^\d+\.\d+/localtime $&/e;
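For readers who prefer Python, the same conversion can be sketched as follows. Unlike the Perl one-liner, which uses the local time zone, this version reports UTC so that the result does not depend on the machine it runs on. The function name is hypothetical and the timestamp is the one from Table 7-1.

```python
from datetime import datetime, timedelta, timezone


def squid_timestamp_to_utc(stamp):
    """Convert a native access.log timestamp such as '1173680297.727' to UTC."""
    seconds, _, millis = stamp.partition(".")
    moment = datetime.fromtimestamp(int(seconds), tz=timezone.utc)
    # Pad or trim the fraction to exactly three digits before adding it.
    return moment + timedelta(milliseconds=int((millis + "000")[:3]))


readable = squid_timestamp_to_utc("1173680297.727")
print(readable.isoformat(timespec="milliseconds"))
# 2007-03-12T06:18:17.727+00:00
```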

Action

The TCP_ codes (Table 7-2) refer to requests on the HTTP port
(usually 3128). Meanwhile the UDP_ codes refer to requests on
the ICP port (usually 3130).

Codes                    Explanation
TCP_HIT                  A valid copy of the requested object was in the
                         cache
TCP_MISS                 The requested object was not in the cache
TCP_REFRESH_HIT          The requested object was cached but STALE. The
                         IMS query for the object resulted in "304 not
                         modified"
TCP_REF_FAIL_HIT         The requested object was cached but STALE. The
                         IMS query failed and the stale object was
                         delivered
TCP_REFRESH_MISS         The requested object was cached but STALE. The
                         IMS query returned the new content
TCP_CLIENT_REFRESH_MISS  The client issued a "no-cache" pragma, or some
                         analogous cache control command along with the
                         request. Thus, the cache has to re-fetch the
                         object
TCP_IMS_HIT              The client issued an IMS request for an object
                         which was in the cache and fresh
TCP_SWAPFAIL_MISS        The object was believed to be in the cache, but
                         could not be accessed
TCP_NEGATIVE_HIT         Request for a negatively cached object, e.g.
                         "404 not found", for which the cache believes to
                         know that it is inaccessible. Also refer to the
                         explanations for negative_ttl in your squid.conf
                         file
TCP_MEM_HIT              A valid copy of the requested object was in the
                         cache and it was in memory, thus avoiding disk
                         accesses
TCP_DENIED               Access was denied for this request
TCP_OFFLINE_HIT          The requested object was retrieved from the
                         cache during offline mode. The offline mode
                         never validates any object, see offline_mode in
                         squid.conf file.
UDP_HIT                  A valid copy of the requested object was in the
                         cache
UDP_MISS                 The requested object is not in this cache
UDP_DENIED               Access was denied for this request
UDP_INVALID              An invalid request was received
UDP_MISS_NOFETCH         During "-Y" startup, or during frequent
                         failures, a cache in hit only mode will return
                         either UDP_HIT or this code. Neighbours will
                         thus only fetch hits
NONE                     Seen with errors and cachemgr requests
Table 7-2 TCP codes and Explanation
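Once the Action field has been extracted from each log line, the codes in Table 7-2 can be used to estimate how well the cache is performing. The Python sketch below computes a simple request hit ratio. Counting every TCP_ code that ends in _HIT as a hit, and ignoring UDP_ and NONE entries, are assumptions of this sketch rather than an official Squid definition; the list of actions is invented sample data.

```python
from collections import Counter


def tcp_hit_ratio(actions):
    """Return the fraction of TCP_ requests whose action code ends in _HIT."""
    counts = Counter(a for a in actions if a.startswith("TCP_"))
    total = sum(counts.values())
    hits = sum(n for action, n in counts.items() if action.endswith("_HIT"))
    return hits / total if total else 0.0


# Invented sample data: five HTTP requests and one ICP query.
sample_actions = ["TCP_HIT", "TCP_MISS", "TCP_MEM_HIT",
                  "TCP_MISS", "UDP_HIT", "TCP_DENIED"]

ratio = tcp_hit_ratio(sample_actions)
print(ratio)  # 0.4
```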

Code

These codes are taken from RFC 2616 and verified for Squid.
Squid-2 uses almost all codes except 307 (Temporary Redirect),
416 (Request Range Not Satisfiable) and 417 (Expectation
Failed), which are shown in brackets below.

Code    Explanation
000     Used mostly with UDP traffic
100     Continue
101     Switching Protocols
102     Processing
200     OK
201     Created
202     Accepted
203     Non-Authoritative Information
204     No Content
205     Reset Content
206     Partial Content
207     Multi Status
300     Multiple Choices
301     Moved Permanently
302     Moved Temporarily
303     See Other
304     Not Modified
305     Use Proxy
[307    Temporary Redirect]
400     Bad Request
401     Unauthorized
402     Payment Required
403     Forbidden
404     Not Found
405     Method Not Allowed
406     Not Acceptable
407     Proxy Authentication Required
408     Request Timeout
409     Conflict
410     Gone
411     Length Required
412     Precondition Failed
413     Request Entity Too Large
414     Request URI Too Large
415     Unsupported Media Type
[416    Request Range Not Satisfiable]
[417    Expectation Failed]
*422    Unprocessable Entity
*423    Locked
*424    Failed Dependency
500     Internal Server Error
501     Not Implemented
502     Bad Gateway
503     Service Unavailable
504     Gateway Timeout
505     HTTP Version Not Supported
*507    Insufficient Storage
600     Squid header parsing error
Method

Squid recognizes several request methods as defined in RFC
2616. Newer versions of Squid (2.2.STABLE5 and above) also
recognize the RFC 2518 "HTTP Extensions for Distributed Authoring
-- WEBDAV" extensions (Table 7-3).

method      defined      cachabil.    meaning
GET         HTTP/0.9     possibly     object retrieval and simple searches
HEAD        HTTP/1.0     possibly     metadata retrieval
POST        HTTP/1.0     CC or Exp.   submit data (to a program)
PUT         HTTP/1.1     never        upload data (e.g. to a file)
DELETE      HTTP/1.1     never        remove resource (e.g. file)
TRACE       HTTP/1.1     never        appl. layer trace of request route
OPTIONS     HTTP/1.1     never        request available comm. options
CONNECT     HTTP/1.1r3   never        tunnel SSL connection
ICP_QUERY   Squid        never        used for ICP based exchanges
PURGE       Squid        never        remove object from cache
PROPFIND    rfc2518                   retrieve properties of an object
PROPPATCH   rfc2518                   change properties of an object
MKCOL       rfc2518      never        create a new collection
COPY        rfc2518      never        create a duplicate of src in dst
MOVE        rfc2518      never        atomically move src to dst
LOCK        rfc2518      never        lock an object against modifications
UNLOCK      rfc2518      never        unlock an object

Table 7-3 List of Methods


Hierarchy
The following hierarchy codes are used in Squid-2 (Table 7-4):

Codes                   Explanation
NONE                    For TCP HIT, TCP failures, cachemgr requests and
                        all UDP requests, there is no hierarchy
                        information.
DIRECT                  The object was fetched from the origin server.
SIBLING_HIT             The object was fetched from a sibling cache
                        which replied with UDP_HIT.
PARENT_HIT              The object was requested from a parent cache
                        which replied with UDP_HIT.
DEFAULT_PARENT          No ICP queries were sent. This parent was chosen
                        because it was marked "default" in the config
                        file.
SINGLE_PARENT           The object was requested from the only parent
                        appropriate for the given URL.
FIRST_UP_PARENT         The object was fetched from the first parent in
                        the list of parents.
NO_PARENT_DIRECT        The object was fetched from the origin server,
                        because no parents existed for the given URL.
FIRST_PARENT_MISS       The object was fetched from the parent with the
                        fastest (possibly weighted) round trip time.
CLOSEST_PARENT_MISS     This parent was chosen, because it included the
                        lowest RTT measurement to the origin server. See
                        also the closest-only peer configuration option.
CLOSEST_PARENT          The parent selection was based on our own RTT
                        measurements.
CLOSEST_DIRECT          Our own RTT measurements returned a shorter time
                        than any parent.
NO_DIRECT_FAIL          The object could not be requested because of a
                        firewall configuration, see also never_direct
                        and related material, and no parents were
                        available.
SOURCE_FASTEST          The origin site was chosen, because the source
                        ping arrived fastest.
ROUNDROBIN_PARENT       No ICP replies were received from any parent.
                        The parent was chosen, because it was marked for
                        round robin in the config file and had the
                        lowest usage count.
CACHE_DIGEST_HIT        The peer was chosen, because the cache digest
                        predicted a hit. This option was later replaced
                        in order to distinguish between parents and
                        siblings.
CD_PARENT_HIT           The parent was chosen, because the cache digest
                        predicted a hit.
CD_SIBLING_HIT          The sibling was chosen, because the cache digest
                        predicted a hit.
NO_CACHE_DIGEST_DIRECT  This output seems to be unused?
CARP                    The peer was selected by CARP.
ANY_PARENT              part of src/peer_select.c:hier_strings[].
INVALID CODE            part of src/peer_select.c:hier_strings[].

Table 7-4 Hierarchy Codes in Squid-2

Cache log
Location : /var/log/squid/cache.log

Description

It contains various messages such as information about Squid
configuration, warnings about possible performance problems
and serious errors.

It contains error and debugging messages of particular Squid modules.

Format
[Timestamp1]| Message
With
Timestamp1
When the event occurred (Year/Month/Day Hour:Minute:Second)

90

Analyzer

Message
Errors

Description of the event

ERR_READ_TIMEOUT
The remote site or network is unreachable - may be down.

ERR_LIFETIME_EXP
The remote site or network may be too slow or down.

ERR_NO_CLIENTS_BIG_OBJ
All clients went away before transmission completed and the object
is too big to cache.

ERR_READ_ERROR
The remote site or network may be down.

ERR_CLIENT_ABORT
Client dropped connection before transmission completed. Squid
fetches the object according to its settings for `quick_abort'.

ERR_CONNECT_FAIL
The remote site or server may be down.

ERR_INVALID_REQ
Invalid HTTP request

ERR_UNSUP_REQ
Unsupported request

ERR_INVALID_URL
Invalid URL syntax

ERR_NO_FDS
Out of file descriptors

ERR_DNS_FAIL
DNS name lookup failure


ERR_NOT_IMPLEMENTED
Protocol Not Supported

ERR_CANNOT_FETCH
The requested URL cannot currently be retrieved.

ERR_NO_RELAY
There is no WAIS relay host defined for this cache.

ERR_DISK_IO
The system disk is out of space or failing.

ERR_ZERO_SIZE_OBJECT
The remote server closed the connection before sending any data.

ERR_FTP_DISABLED
This cache is configured to NOT retrieve FTP objects.

ERR_PROXY_DENIED
Access Denied. The user must authenticate himself before accessing
this cache.

Table 7-5 List of Error Messages


An example of the cache.log file is shown in Figure 7-2.

Figure 7-2 Cache.log
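
The cache.log format described above can also be read by a short program. The following Python sketch splits each line into its timestamp and message, and picks out the messages that contain a keyword. It assumes the Year/Month/Day Hour:Minute:Second timestamp followed by "|" as given in the Format. The function names and the sample lines are invented for the example and are not real Squid output.

```python
from datetime import datetime


def parse_cache_log_line(line):
    """Split one cache.log line into (timestamp, message); None if it has no timestamp."""
    stamp, separator, message = line.rstrip("\n").partition("|")
    if not separator:
        return None
    try:
        when = datetime.strptime(stamp.strip(), "%Y/%m/%d %H:%M:%S")
    except ValueError:
        return None  # e.g. a line that merely contains "|" somewhere
    return when, message.strip()


def find_messages(lines, keyword):
    """Return the parsed entries whose message contains the keyword."""
    entries = (parse_cache_log_line(line) for line in lines)
    return [entry for entry in entries if entry and keyword in entry[1]]


# Made-up sample lines, for illustration only.
sample = [
    "2007/03/12 14:18:17| Starting Squid Cache\n",
    "2007/03/12 14:18:20| WARNING: example warning text\n",
    "a line without a timestamp\n",
]

if __name__ == "__main__":
    for when, message in find_messages(sample, "WARNING"):
        print(when.isoformat(), message)
```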

Store log
Location : /var/log/squid/store.log

Description

It contains information about, and the status of, objects that are
kept in the cache and objects that have been removed from it.

Format
Timestamp Tag Code Date LM Expires Content Expect/Length Method Key

With:
Timestamp
The time the entry was logged, in seconds since 00:00:00 UTC,
January 1, 1970, with millisecond resolution.
Tag
SWAPIN (swapped into memory from disk), SWAPOUT (saved to disk)
or RELEASE (removed from cache).
Code
The HTTP reply code, when available. For ICP requests this is always
"0". If the reply code was not given, it is logged as "555".


The following three fields are timestamps parsed from the HTTP reply
headers. All are expressed in Unix time (i.e. seconds since 00:00:00
UTC, January 1, 1970). A missing header is represented with -2 and an
unparsable header is represented as -1.
Date
The value of the HTTP Date: reply header. If the Date header is
missing or invalid, the time of the request is used instead.
LM
The value of the HTTP Last-Modified: reply header.
Expires
The value of the HTTP Expires: reply header.
Content
The value of the HTTP Content-Type: reply header.
Expect
The value of the HTTP Content-Length: reply header. Zero is logged if
the Content-Length header was missing.
/Length
The number of bytes of content actually read. If Expect is non-zero
and not equal to Length, the object is released from the cache.
Method
The request method (GET, POST, etc).


Key
The cache key. Often this is simply the URL. Cache objects which never
become public have cache keys that include a unique integer sequence
number, the request method and then the URL
( /[post|put|head|connect]/URI ).

An example of the store.log file is shown in Figure 7-3.

Figure 7-3 Store.log

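Taking the second line of the store.log file in Figure 7-3 as an
example, the fields can be read as follows (Table 7-6). The logged
line carries three values that the Format above does not list: a
cache directory number, a file number and the hash of the cache key.
They appear between the Tag and the Code.

Format               Value
Timestamp            1173680297.727
Tag                  RELEASE
(directory number)   -1
(file number)        FFFFFFFF
(cache key hash)     7832CBDDD1604B89D0F75A2437F37AD7
Code                 302
Date                 1173680306
LM                   -1
Expires              -1
Content              text/html
Expect               -1
/Length              /278
Method               GET
Key                  http://www.google.com/search?

Table 7-6 Format in Store.log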
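
The field breakdown can also be done programmatically. The Python sketch below parses the sample line used for Table 7-6. It assumes the 13-column layout seen in that line, where a cache directory number, a file number and the cache key hash sit between the Tag and the Code. The StoreEntry type and the function names are hypothetical, and other Squid versions may log a different number of columns.

```python
from collections import namedtuple
from datetime import datetime, timezone

StoreEntry = namedtuple(
    "StoreEntry",
    "timestamp tag dir_number file_number hash_key code date lm expires "
    "content expect length method key",
)


def parse_store_log_line(line):
    """Parse one 13-column store.log line; return None if the layout differs."""
    fields = line.split()
    if len(fields) != 13 or "/" not in fields[10]:
        return None
    # Column 11 holds Expect and Length joined by "/", e.g. "-1/278".
    expect, length = fields[10].split("/", 1)
    return StoreEntry(
        timestamp=float(fields[0]),
        tag=fields[1],
        dir_number=int(fields[2]),
        file_number=fields[3],
        hash_key=fields[4],
        code=int(fields[5]),
        date=int(fields[6]),
        lm=int(fields[7]),
        expires=int(fields[8]),
        content=fields[9],
        expect=int(expect),
        length=int(length),
        method=fields[11],
        key=fields[12],
    )


def unix_to_utc(seconds):
    """Convert a Unix time value to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# The sample line discussed with Table 7-6.
line = (
    "1173680297.727 RELEASE -1 FFFFFFFF 7832CBDDD1604B89D0F75A2437F37AD7 "
    "302 1173680306 -1 -1 text/html -1/278 GET http://www.google.com/search?"
)

if __name__ == "__main__":
    entry = parse_store_log_line(line)
    print(entry.tag, entry.code, entry.content, entry.length, entry.method)
    print(unix_to_utc(entry.date).isoformat())
```

Returning None for any line that does not have exactly 13 columns is a deliberate choice in this sketch, so that a layout change is noticed rather than silently misread.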


7.2 Methods
Log Analysis Using Grep Command
The log files can also be analysed with a Linux or UNIX command such
as grep, which filters the required information out of any log file.
Open a terminal and run a command like the following to start
analysing the related log file.
For example:
# cat /var/log/squid/access.log | grep www.google.com
Figure 7-4 shows the result of the grep command for the access.log
file. The same technique can be applied to the cache.log and store.log
files.

Figure 7-4 Analysing the access.log file using the grep command
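
When a simple filter is not enough, the same kind of search can be combined with a small summary. The Python sketch below keeps the access.log lines that contain a given text, much as grep does, and then adds up the requests and bytes per client address. It assumes the native access.log format, with the client address in the third column and the byte count in the fifth. The function name and the sample entries are made up for the example.

```python
from collections import defaultdict


def summarise_matches(lines, needle):
    """Per client address, count requests and bytes for lines containing needle."""
    summary = defaultdict(lambda: [0, 0])  # client -> [requests, bytes]
    for line in lines:
        if needle not in line:
            continue  # plain substring match, similar to "grep -F needle"
        fields = line.split()
        if len(fields) < 7 or not fields[4].isdigit():
            continue  # not a native-format entry
        client, size = fields[2], int(fields[4])
        summary[client][0] += 1
        summary[client][1] += size
    return {client: tuple(totals) for client, totals in summary.items()}


# Made-up sample entries, for illustration only.
sample = [
    "1173680297.727 120 192.168.1.10 TCP_MISS/200 4512 GET http://www.google.com/ - DIRECT/64.233.189.104 text/html",
    "1173680298.101 35 192.168.1.11 TCP_MISS/200 900 GET http://www.example.com/ - DIRECT/192.0.2.7 text/html",
    "1173680299.450 80 192.168.1.10 TCP_HIT/200 300 GET http://www.google.com/logo.gif - NONE/- image/gif",
]

if __name__ == "__main__":
    for client, (requests, size) in summarise_matches(sample, "www.google.com").items():
        print(client, requests, size)
```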

Log Analysis Using Sarg-2.2.3.1


The preferred log file for analysis is the access.log file in the
native format. We choose Squid Analysis Report Generator (Sarg) as
the tool. It analyses the users' Internet surfing patterns and
generates reports in html with many fields, such as users, IP
addresses, bytes, sites and times.
This tool can be downloaded from:
http://linux.softpedia.com/get/Internet/Log-Analyzers/sarg-102.shtml

7.3 Setup Sarg-2.2.3.1


Step:
Download the software package sarg-2.2.3.1.tar.gz for the Linux and
Unix environment.
Make a new directory called installer in the root path.
# mkdir /installer
Copy the downloaded file into the installer directory.
# cp sarg-2.2.3.1.tar.gz /installer
Then, go into the installer directory and extract the file
sarg-2.2.3.1.tar.gz using the following command.
# tar zxvf sarg-2.2.3.1.tar.gz
After the file has been extracted successfully, go into the
sarg-2.2.3.1 directory and configure, build and install it. Follow
these commands:
# cd /installer/sarg-2.2.3.1
# ./configure
# make
# make install

NOTE: Make sure that Squid has already been started before running
the following script.

Go into the sarg-2.2.3.1 directory and run the sarg script.
# ./sarg
The generated reports are kept in /var/www/html/squid-reports. It is
recommended to view them in a GUI environment.


7.4 Report Management Using Webmin


To manage the reports, we choose Webmin, which is a web-based
interface for Unix system administration. In our case, it helps the
admin to set information such as the location of the log source and
the report destination, the format of the generated report, the size
of the report and the schedule for automatic report generation.
Step:
1. Make sure that Webmin is already set up on the server. Then, open
the browser and type http://127.0.0.1:10000/ to reach Webmin. After
that, log in to Webmin.

Figure 7-5 Login

2. Choose the Server tab, and then click on Squid Analysis Report
Generator. Four (4) modules are offered: Log Source and Report
Destination, Report Option, Report Style and Scheduled Report
Generation.


Figure 7-6 Sarg Main Modules in Webmin

3. Click on the Log Source and Report Destination icon. In this
module, the admin can set the source of the log file and define the
destination of the generated report. For report maintenance, the
admin can also set the number of reports to keep in a certain
location, and a notification can be sent to the admin's e-mail.
Note: Please check the sarg.conf file, which is located in
/usr/local/sarg/sarg.conf, to ensure that the path to the source log
files is correct.

Figure 7-7 Setting on Source and Destination Report


After making the changes, click on the Save button.

4. Click on the Report Option icon. In this module, the admin can
manage the pattern of the generated report, including data ordering,
the size of the data displayed, the data format and log file rotation.
Several types of report can be generated, depending on the access
control list (ACL) that has been set up before.
Log file rotation is important to ensure that there is enough disk
space for log file storage, especially for long-term evaluations.
This is covered further in Scheduled Report Generation.


Figure 7-8 Setting on Report Content and Generation Option



5. Click on the Report Style icon. Here, the admin can adjust the
appearance of the generated report in terms of language, title and
other common style settings.

Figure 7-9 Setting on HTML Report Style and Colour Option

6. Click on the Scheduled Report Generation icon. In this module, the
admin can define how often the report is generated by enabling the
selected or default schedule.
With regard to the rotate feature in Squid, it is recommended to apply
a simple schedule. During an idle period, the log files are safely
transferred to the report destination in one burst. Before the
transfer, the log files can be compressed during off-peak time. At
the destination, the log files are concatenated into one file, which
yields one file for the selected hour. However, how the reports are
generated depends on the company's requirements.
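
The concatenation step at the destination can be sketched in a few lines of Python. The example below appends a list of log files, plain or compressed with gzip, to one target file in the order given. It assumes that gzip was used for compression and that compressed files end in .gz. The function name and the file names in the usage note are hypothetical.

```python
import gzip
import shutil


def concatenate_logs(parts, target):
    """Append each part (plain or .gz) to target, in the order given."""
    with open(target, "wb") as out:
        for part in parts:
            # Pick the reader by file extension; .gz files are decompressed on the fly.
            opener = gzip.open if str(part).endswith(".gz") else open
            with opener(part, "rb") as source:
                shutil.copyfileobj(source, out)
```

A call such as concatenate_logs(["access.log.1.gz", "access.log.0"], "access-combined.log") would then write the older file first and the newer one after it. The caller decides the order, so the file names must be listed from oldest to newest.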


Figure 7-10 Setting on Scheduled Reporting Options

7. After the settings in Scheduled Report Generation have been made,
the following statement is displayed on the main page.

Figure 7-11 Generate Report Setting


There are some considerations to be taken:

1. Never delete access.log, store.log or cache.log while Squid is
running. A deleted log file cannot be recovered.

2. In the squid.conf file, the following statements can be applied if
the admin wants to disable a certain log file. For example:
To disable access.log:
cache_access_log /dev/null
To disable store.log:
cache_store_log none
To disable cache.log:
cache_log /dev/null
However, the cache.log file should not be disabled because it
contains many important status and debugging messages.

7.5 Log Analysis and Statistic


After running the Sarg analyser, the reports are generated for
access.log. They can be found in /var/www/html/squid-reports.

Figure 7-12 Collection of Squid Report for Access.log

Figure 7-12 shows that three (3) reports have been generated in this
example. The latest version has no number at the end of its filename.
Each time the access log file is analysed again, the existing report
is renamed and an incremental number is placed automatically at the
end of its filename.
For example, 2007Mar22-2007Mar22.2 was generated before
2007Mar22-2007Mar22, which is the latest version of the report.
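
The naming rule can be checked with a short Python sketch that separates the report period from the trailing number and keeps only the latest versions, i.e. the names without a number. It assumes folder names of the form seen in this example, such as 2007Mar22-2007Mar22 with an optional .N suffix. The function names are hypothetical, and the name ending in .1 in the usage note is assumed rather than taken from the figure.

```python
import re

# Period such as "2007Mar22-2007Mar22", optionally followed by ".N".
REPORT_NAME = re.compile(
    r"^(?P<period>\d{4}[A-Za-z]{3}\d{2}-\d{4}[A-Za-z]{3}\d{2})(?:\.(?P<number>\d+))?$"
)


def split_report_name(name):
    """Return (period, number) for a report folder name; number is None for the latest."""
    match = REPORT_NAME.match(name)
    if not match:
        return None  # not a report folder, e.g. index.html
    number = match.group("number")
    return match.group("period"), (int(number) if number is not None else None)


def latest_reports(names):
    """Keep only the folder names that carry no trailing number."""
    latest = []
    for name in names:
        parts = split_report_name(name)
        if parts is not None and parts[1] is None:
            latest.append(name)
    return latest
```

For a directory listing such as ["2007Mar22-2007Mar22", "2007Mar22-2007Mar22.1", "2007Mar22-2007Mar22.2", "index.html"], latest_reports keeps only the unnumbered folder.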


Based on Figure 7-13, the index.html file lists the reports that have
been generated by Sarg. To get more detailed information on a
specific report, click on the selected file name.

Figure 7-13 Summary of Squid reports

For example, a folder named 2007Mar22-2007Mar22 has been selected
and opened. As Figure 7-14 shows, several standard files can be found
in all Squid reports. Briefly, five (5) html reports show statistical
information: index, denied, download, siteuser and topsites. The
folder also holds a collection of reports for specific users,
identified by their IP addresses.


Figure 7-14 Contents of 2007Mar22-2007Mar22 as example

The following figures show the html reports:

1. Index html

Figure 7-15 Index html

2. Denied html

Figure 7-16 Denied html

3. Download html

Figure 7-17 Download html

4. Sites and Users

Figure 7-18 Siteuser html

5. Top 100 Sites

Figure 7-19 Topsites html

If we click on a specific IP address, all of the information for that
user is shown, as in Figure 7-20.

Figure 7-20 Reports generated for specific user (IP Address)
