GeodSoft How-To: Homegrown Intrusion Detection


Homegrown Intrusion Detection

Summary
File Change Tracking
Process Tracking
The Monitored System
Regular Expressions
Track More Than Processes
The Analysis Script
Alerts
Monitoring Your Systems
Conclusion

This section deals with two specific approaches to host based intrusion detection: tracking changes to executable and system configuration files, and actively monitoring all executing processes. Today much of the emphasis in intrusion detection is on network based intrusion detection. Network intrusion detection watches for hostile patterns in network traffic and typically operates in real time. It typically protects an entire network, or at least all the hosts on a LAN segment, and will often detect an attack while it is in progress. After a firewall, which should be regarded as the first line of defense in most environments, network based intrusion detection is often regarded as the second line of defense. There are numerous commercial products that perform network intrusion detection. Snort is probably the best known open source network intrusion detection product.

As important as network intrusion detection is, host based intrusion detection should not be ignored. Security threats are as likely to be internal as external. Some network intrusion detection products monitor internal activity, but others ignore this area. Regardless of how comprehensive network intrusion detection attempts to be, no network based monitoring is likely to detect unauthorized local activities by users or administrators who are authorized to use the systems being monitored.

Traditionally, host based intrusion detection focuses primarily on changes made to the system being monitored. Specifically, it attempts to identify changes to key files, primarily system configuration files and selected executables. It is thus reactive. In addition to the traditional monitoring of file changes, I'll show how to actively track processes that are executing on a monitored machine and display a warning when an unfamiliar process executes or a standard process stops executing. The process monitoring is still reactive in that it can only detect processes that are already in an "abnormal" state, but it can be run as frequently as desired, so it can approach real time.

When I was implementing intrusion detection on my systems, the best known product for host based intrusion detection was (and still is) Tripwire. The commercial product was simply too expensive for my site and the open source version lacked the ability to automatically store and analyze the database of the tracked machine on a remote machine. Tripwire primarily analyzes files on disk and to the best of my knowledge does not track processes in memory.

It's been my experience over the years that if you possess good analytic skills and proficiency with a development language or tool suited to the task, you can often build a new tool from scratch faster than you can figure out and adapt someone else's code. This depends on the intrinsic complexity of the task and on how many unneeded features have complicated the code. A simple task that a product has made complex with numerous configuration options, to adapt it to diverse environments, might be better done as custom code written for your environment. Host based intrusion detection is such a task.

The core function of Tripwire is to build a database of checksums of files on the machine being monitored for later comparison to detect changes in the files being monitored. Since building such a database borders on the trivial and the versions of Tripwire available to me lacked a capability I regarded as essential, I chose to write my own intrusion detection using shell scripts and Perl, from scratch, without looking at how Tripwire or any competing products functioned. The result is just over 500 lines of new shell and Perl scripts and configuration files (including comments and blank lines) that monitor both files and processes on the monitored systems. Databases or logs are saved and analyzed on machines other than the monitored machines, and alerts for conditions that should be investigated are displayed automatically.

Part of the point of this page is to show that the basics of host based intrusion detection are really quite simple. The details of my particular implementation are not particularly important other than as a working example. Much of the complexity of a product like Tripwire comes from the need to provide numerous configuration options required to adapt to diverse environments. Hard coding numerous choices to meet the needs of a single environment greatly simplifies the problem. I would not consider building network intrusion detection from scratch because it seems so much more complex, but this may simply be due to my lack of familiarity with the necessary network programming techniques. On the other hand, I see no reason why any experienced system administrator with good scripting skills should not consider doing their own host based intrusion detection if no product available at an acceptable price meets their requirements. The scripts or concepts presented here may be a sufficient starting point.

The techniques used here are applicable to a hardened server such as those described in How to Harden an OpenBSD Server. Such a server will have a stable, simple configuration with a small and infrequently changing user population and a fairly constant set of processes that may or may not be executing at any particular moment. These techniques would not work well on a general purpose multi user host where there will be almost daily system changes (including user names and passwords), frequently changing scripts and diverse user initiated processes. The techniques could be extended to a general purpose host but this would require a significant increase in complexity. Primarily, system components that change regularly would need to be filtered out from those that are monitored so routine changes would not trigger alarms. A product like Tripwire might come with sufficient flexibility to adapt to such systems.

In the form presented here, the alarm mechanism is an interactive prompt displayed to a user, so an operator or administrator who is logged in 24 hours a day is assumed. The scripts could be adapted to other alarm mechanisms for different environments. The monitoring is currently performed on a Windows workstation. The monitoring scripts should run on any system with a scheduling facility, Perl and a GUI, but the details of the prompts will vary with different systems. The monitored systems are limited to UNIX servers or systems with UNIX-like file attributes, directory structure and process tracking. Specifically, these techniques cannot be used to monitor Windows NT or 2000 servers. Windows lacks an easy and reliable method of identifying executable files, a highly organized directory tree and the ability to track the command line that started each process, all of which these scripts depend on. Also, the generation of the content that is analyzed is so simple that it's done with shell scripts; if Windows didn't lack the necessary capabilities at the operating system level, the shell scripts could easily be converted to Perl.

It is highly desirable, though not essential, that the monitored computers be time synchronized with the computers performing the monitoring. If they are, the monitoring computer can analyze data from the monitored computers within a few seconds of its creation. This allows more frequent monitoring, so the process monitoring can approach real time. If the computers are not time synchronized, lags equal to or greater than the largest amount by which the computer clocks can drift apart must be built into the monitoring system.

Summary

The host based intrusion detection described here consists of two completely independent functions. The simpler part is the file change tracking. This requires a cryptographically secure method for creating check sums of files on disk; md5 on OpenBSD and md5sum on Linux were used. OpenBSD includes other methods that could be substituted, but the old UNIX standby, cksum, is not adequate because it computes a CRC, which is not cryptographically secure. A simple shell script creates a text file of check sums that is ftp'd to another system, where the most recent file is compared with the previous one. If there are any differences, the diff result is displayed; if there are none, nothing is displayed. If check sum results are saved on non-modifiable media, check sum comparisons over longer intervals can be made manually if suspicious activity is ever detected.

The process monitoring uses the command "ps -ax", run on a periodic schedule via cron. This command shows every process currently executing, the command line that invoked it, the terminal it's running on and its status. The output of the command is redirected to a file, which is transferred to the monitoring system. There the file's contents are matched against a database of regular expressions that identify all the processes that are normal for the system being monitored. Some processes that run continuously and are necessary for the proper functioning of the system are identified as required. If any process is found that is not identified as normal, or any required process is missing, an operator prompt is displayed identifying the unknown or missing process or processes.

File Change Tracking

Tracking changes to files begins with a simple shell script that creates a text file containing check sums of the files being monitored. The script begins by changing to a local directory where the check sum files are created and saved. This directory is hard coded. This is the kind of configuration choice that requires a configurable option in a product that runs on diverse systems. Hopefully, at a single site, there is enough standardization of locally created directory structures that a hard coded script can be used on different systems without change.

All the output from the script is redirected to a file named in the form ckyymmdd.log, where yymmdd is the two digit year, month and day. Because the name contains only the date, the script can be run at most once a day without overwriting its earlier output. A script that runs two or more times a day would need to adjust the file naming convention accordingly.

The first file checked is the kernel loaded at boot time. In this example this is the only file checked by name. Instead of hard coding their names, specific files to be checked could be read from a file or database. This would increase complexity and potentially provide an intruder with information that might be used to evade detection or even alter the behaviour of the monitoring system. Locating the configuration file or database at a secure remote location would add further complexity.

The other files are selected with find, using specific criteria. In this script four groups of files are checked. First, every file in and under the /etc directory is checked. Then every file on the system that is either SUID or SGID is checked. Then every executable file on the system is checked, regardless of where it resides. The fourth group, the web configuration directory, is described below. There is considerable redundancy between these selections: any executable in /etc and any SUID or SGID file will appear twice in any listing, as well as twice in the diff comparisons. Sorting the output and filtering it through uniq could remove the duplicate entries. Because these files are likely to be more important, I chose not to go to the extra effort to remove the duplication.
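The find-based selection just described might be sketched as the shell function below. This is a minimal sketch, not the author's script: the function name make_cksums, its three arguments and the md5sum/md5 fallback are assumptions, and the web configuration directory is left out. Taking the root directory as an argument makes it easy to try on a scratch directory first.

```shell
#!/bin/sh
# Sketch of the daily checksum listing: the kernel, everything under the
# configuration directory, SUID/SGID files and executables.
# make_cksums and its arguments are hypothetical names.

# md5sum on Linux, md5 on OpenBSD.
if command -v md5sum >/dev/null 2>&1; then
    MD5=md5sum
else
    MD5=md5
fi

# make_cksums ROOT ETCDIR KERNEL -- print one checksum line per file.
make_cksums() {
    root=$1 etcdir=$2 kernel=$3
    # the boot kernel, the only file named explicitly
    $MD5 "$kernel"
    # every file in and under the configuration directory
    find "$etcdir" -type f -exec $MD5 {} \;
    # every SUID or SGID file
    find "$root" -type f \( -perm -4000 -o -perm -2000 \) -exec $MD5 {} \;
    # every file executable by owner, group or other
    find "$root" -type f \( -perm -100 -o -perm -010 -o -perm -001 \) -exec $MD5 {} \;
}
```

On a monitored OpenBSD host it might be run from the hard coded output directory as make_cksums / /etc /bsd > ck$(date +%y%m%d).log. As the text notes, SUID/SGID executables and executables under /etc will be listed more than once.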

The use of find and very simple, broad criteria makes it very difficult for an intruder to alter the behaviour of the system even if they understand it. The breadth of the criteria makes it almost impossible to make meaningful system changes without triggering an alert. This also means that a typical general purpose host or server with frequent changes cannot be monitored in this fashion without alerts becoming a routine occurrence, which negates the value of the system. The simplest way for an intruder to defeat the system is to modify the script to keep sending the same check sum file day after day. Thus those responsible for monitoring must investigate if system changes are made and no alert is triggered.

This sample script is from a web server. In addition to the directories and file types that I would monitor on any system, all files in the directory that contains web configuration data are also monitored. Each different server is likely to have some specific files or directories that should be monitored in addition to those that are regarded as standard. If there are more than a few servers, it would be better to move system configuration specific choices out of the script and into an external configuration file or database.

The last few lines of the script use ftp to transfer the output file to a remote system. The "-n" option of ftp suppresses the automatic login with its user name and password prompts; instead ftp gets these from the user command, which the shell script redirects into ftp via standard input. Since a username and password will be lying around in a plain text script file, this user should have as few privileges as possible on the receiving system. Also, since these scripts are most likely to be run as root, they should be readable only by root.

After logging into the remote FTP server, the script changes to a directory, part of which is based on the host name from which the script is running. Once again a choice that could have been a configurable option has been hard coded to refer to a standard directory location.
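A hedged sketch of the transfer step follows. It only builds the command stream that "ftp -n" reads on standard input, which makes it easy to check without a server. The function name ftp_cmds, the FTPUSER and FTPPASS variables and the placeholder password are hypothetical; the alert/<host>/cksums layout mirrors the directory shown in the sample alert further down this page.

```shell
#!/bin/sh
# Sketch: emit the commands that "ftp -n" reads on standard input to upload
# one checksum file into alert/<host>/cksums on the monitoring system.
# FTPUSER, FTPPASS and ftp_cmds are hypothetical; keep any script holding
# a real password owned by root with mode 700.
FTPUSER=ftpuser1
FTPPASS=changeme

ftp_cmds() {  # ftp_cmds LOCALFILE HOST
    cat <<EOT
user $FTPUSER $FTPPASS
ascii
cd alert/$2/cksums
put $1
bye
EOT
}
```

It might be used as ftp_cmds ck$(date +%y%m%d).log $(hostname -s) | ftp -n 192.168.28.86, so the host name, not a hard coded string, picks the remote directory.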

A fairly simple Perl script is used to compare successive check sum files. The script opens with a series of function calls that are executed if the script can successfully change into a series of hard coded directories which correspond to hosts being monitored. The bulk of the logic in this script is to determine the file names of the files to be compared. Once again the script logic is kept simple by avoiding unnecessary options. This script can only compare files created on a daily basis. Here the current day and the preceding day in yymmdd format are determined. Month and year rollovers are accounted for but leap years are not.
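The date logic can be sketched in portable shell arithmetic. Unlike the script described here, this hypothetical prev_yymmdd function also handles leap years, under the stated assumption that two-digit years mean 2000 to 2099, where a year divisible by four is always a leap year.

```shell
#!/bin/sh
# prev_yymmdd YYMMDD -- print the preceding day, also in yymmdd form.
# Assumes two-digit years mean 2000-2099, where "divisible by 4" is
# exactly the Gregorian leap-year rule (2000 is a leap year).
prev_yymmdd() {
    yy=${1%????}
    mmdd=${1#??}
    mm=${mmdd%??}
    dd=${mmdd#??}
    # Prefix a 1 and subtract 100 so "08" is not read as invalid octal.
    y=$((1$yy - 100))
    m=$((1$mm - 100))
    d=$((1$dd - 100 - 1))
    if [ "$d" -eq 0 ]; then
        # Month rollover, and year rollover in January.
        m=$((m - 1))
        if [ "$m" -eq 0 ]; then
            m=12
            y=$(( (y + 99) % 100 ))
        fi
        case $m in
            4|6|9|11) d=30 ;;
            2) if [ $((y % 4)) -eq 0 ]; then d=29; else d=28; fi ;;
            *) d=31 ;;
        esac
    fi
    printf '%02d%02d%02d\n' "$y" "$m" "$d"
}
```

For example prev_yymmdd 010301 gives 010228, while prev_yymmdd 000301 gives 000229.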

If the current day's file is missing, the workstation alarm is sounded. This is very simple but not very robust. If the operator or administrator is not present when the script executes, they will not know a necessary file was not available. There will be no practical difference between a comparison that can't be made and a comparison that was made with no discrepancies between the compared files. A prompt that was displayed until it was acknowledged by an operator would be better. The script could be executed more than once to increase the likelihood that a missing file will be noticed by an operator but this would also interrupt the operator multiple times on any day when there were tracked changes on any system being monitored.

The final piece uses diff to compare two successive files and to capture the result of the comparison. If the captured result is not zero length, the workstation alarm is sounded and the result written to a file and displayed on the workstation monitor. The current directory is included at the top of the saved output to help identify the host that has had changes to tracked files.
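The missing-file check and the diff step might look like the following shell sketch. The function name check_host and its arguments are assumptions, and instead of sounding the workstation alarm it prints the alert text and reports the outcome through its exit status, leaving the bell and the prompt to the caller.

```shell
#!/bin/sh
# check_host DIR TODAY YESTERDAY -- compare DIR/ckTODAY.log with
# DIR/ckYESTERDAY.log. Prints nothing and returns 0 when they match;
# prints the directory followed by the diff and returns 1 when they
# differ; reports a missing file and returns 2.
check_host() {
    dir=$1
    new=$dir/ck$2.log
    old=$dir/ck$3.log
    for f in "$new" "$old"; do
        if [ ! -f "$f" ]; then
            echo "$dir: missing $(basename "$f")"
            return 2
        fi
    done
    result=$(diff "$old" "$new")
    if [ -n "$result" ]; then
        # Directory first, so the alert identifies the host.
        printf '%s\n\n%s\n' "$dir" "$result"
        return 1
    fi
    return 0
}
```

A caller could then do something like out=$(check_host /alert/bsd/cksums 010515 010514) || { printf '\a'; printf '%s\n' "$out" > alert.txt; }, which also treats a missing file as alert-worthy rather than silent.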

When a tracked file has changed, the alert shows which file changed but nothing about what changed in it. A sample alert:

/alert/bsd/cksums

446c446
< MD5 (/usr/local/bin/checkxyz) = d904fe11f370c549ffe5805269d4e439
---
> MD5 (/usr/local/bin/checkxyz) = 3e01891c914d6b28138f098dde16f022

Understanding this output requires familiarity with diff output. Here the output indicates that the script /usr/local/bin/checkxyz has changed. More precisely, line 446 has changed from the first listed line to the second; that is, the check sum for the file /usr/local/bin/checkxyz has changed from d904fe11f370c549ffe5805269d4e439 to 3e01891c914d6b28138f098dde16f022. The actual values are irrelevant. All that matters is that they differ, indicating that checkxyz has changed. Paired lines starting with "<" and ">" indicate a changed file. A new file is indicated by a single line starting with ">" and a deleted file by a single line starting with "<".
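Reading raw diff output takes some practice. A small filter like the hypothetical summarize_diff below, which is not part of the original system, could restate it as changed, added and removed files. It assumes the BSD md5 format shown in the sample; md5sum output on Linux ("checksum  path") would need a different extraction.

```shell
#!/bin/sh
# summarize_diff -- read diff output of two checksum files in BSD md5
# format ("MD5 (path) = sum") on standard input and print one line per
# file: "changed", "added" or "removed" followed by the path.
summarize_diff() {
    awk '
    /^[<>] MD5 \(/ {
        path = $0
        sub(/^[<>] MD5 \(/, "", path)   # drop the marker and "MD5 ("
        sub(/\) = .*$/, "", path)      # drop ") = checksum"
        if (substr($0, 1, 1) == "<") was[path] = 1
        else now[path] = 1
    }
    END {
        for (p in now) {
            status = (p in was) ? "changed" : "added"
            print status, p
        }
        for (p in was)
            if (!(p in now)) print "removed", p
    }' | sort -k2
}
```

Piping the sample alert through it would print a single line, changed /usr/local/bin/checkxyz.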

Actually identifying the content that has changed would require comparing the current file to a backup or archived copy of the file. This intrusion detection system is simply intended to alert the person seeing the prompts to which files have been changed, added or removed on the monitored system. That person needs to be familiar with what is changing on the monitored systems. For example, if a new user is added to a system, one would expect /etc/passwd and all related files to change. On a BSD system one would not expect system configuration files such as /etc/rc, /etc/rc.conf and /etc/inetd.conf to change except very infrequently, when a configuration change was made to the system. The binary executable files in /bin, /sbin, /usr/bin, /usr/sbin, etc. should never change unless the OS is upgraded or patched. As long as the monitoring is performed by persons familiar with the administration of the monitored systems, unexpected changes to any of the tracked files are a strong indicator of an intrusion. At a minimum, unexpected changes need to be investigated and understood.

Process Tracking

Process tracking is somewhat more complicated than tracking file changes. The analysis is done by the most complex script that is examined here. In addition, the script reads a structured text file containing regular expressions and control information to determine what processes warrant an alert.

The script consists of four major pieces. The first 100 or so lines read the file of regular expressions into hashes for use throughout the rest of the script's execution. The next 70 or so lines are an endless loop that runs until the script is terminated externally. On each pass, all files from the remote monitored systems are compared against the regular expressions saved in the hashes. If an unknown process is found or a required process is missing, an error message is built and passed to the fourth piece, which is responsible for displaying it. The third piece is the time_wait function, which is called once before the first pass of the main loop and then from the bottom of the loop. It determines the next time the monitored systems should create files listing their executing processes, waits until those files should be available on the monitoring system, and returns the mmddHHMM part of the filenames that need to be read and analyzed.

The Monitored System

The piece that runs on the monitored systems is extremely simple. The following script is all that's needed:

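#!/bin/sh
# Capture the process list and send it to the monitoring host.
cd /var/local/logs/ps || exit 1
# Compute the name once so ps and put always use the same minute.
f=`date +%m%d%H%M`.log
ps -ax > $f
ftp -n 192.168.28.86 <<EOT
user ftpuser1 cog9Fow
ascii
cd alert/bsd/ps
put $f
bye
EOT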

The output of a "ps -ax" command is redirected to a file that is transferred to the monitoring system. The output filename is mmddHHMM.log where mmddHHMM is month, day, hour and minute.

Regular Expressions

The structured text file contains three kinds of non-comment lines. Lines that start with "HOSTS" identify the names of hosts that are to be monitored. Lines that start with "HOST" have the name of a specific host and the process IDs of processes that are either allowed or required on the named host. The HOSTS and HOST information may be spread across multiple lines. Lines that start with a numeric sequence contain one regular expression that is used to match lines in the file being analyzed, which for the most part is output of the "ps -ax" command; the numeric sequence is a process ID and must be unique. These process IDs are in no way related to host process numbers. Each process line must contain a complete regular expression. All other lines are treated as comments and ignored. The order of lines in the file does not matter.
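Because the process IDs must be unique and nothing else enforces that, a quick check is worth running after editing the file. The hypothetical dup_ids function below prints any ID defined on more than one process line; it relies only on the line format described above.

```shell
#!/bin/sh
# dup_ids FILE -- print each process ID that appears at the start of more
# than one "digits~regex" line. HOSTS, HOST and comment lines are ignored.
dup_ids() {
    awk '/^[0-9]+~/ {
        id = substr($0, 1, index($0, "~") - 1)
        # report an ID once, on its second occurrence
        if (seen[id]++ == 1) print id
    }' "$1"
}
```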

In the sample file the line "HOSTS=bsd-req,anotherhost-req,host3-opt,four-req" lists 4 hosts named bsd, anotherhost, host3 and four. Three of the hosts are expected to be online continuously and an alarm will be triggered if the expected output of the three hosts tagged with "-req" is not available when expected. Host3, tagged with "-opt", is optional; it's a test system that may or may not be online at any particular time. If the expected output is not available, no alarm is triggered. If the output is available, it is analyzed in the same manner as that from any other host. In these examples four is a Red Hat 6.2 Linux system and the other systems are OpenBSD 2.7 systems.

The lines that start with "HOST" have the host name separated from the rest of the line with equal signs and following the second equal sign is a comma separated list of process IDs. Each process ID is followed by a dash and an "a" or "r". Processes with an "a" are allowed or recognized and may or may not be present. Processes with an "r" are required and expected to be present as long as the host is up. In the sample line "HOST=four=401-r,402-r,403-a,404-r,405-a" processes 401, 402 and 404 are required on host four while 403 and 405 are allowed.
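The HOSTS and HOST lines can be decoded with a few lines of awk. The function names host_list and host_procs are hypothetical, and host_list assumes every HOSTS tag is exactly "req" or "opt", as in the sample.

```shell
#!/bin/sh
# host_list FILE -- print "host req" or "host opt" for every entry on the
# HOSTS lines, e.g. HOSTS=bsd-req,host3-opt.
host_list() {
    awk -F= '$1 == "HOSTS" {
        n = split($2, hosts, ",")
        for (i = 1; i <= n; i++) {
            h = hosts[i]
            # the tag is the three letters after the final dash
            print substr(h, 1, length(h) - 4), substr(h, length(h) - 2)
        }
    }' "$1"
}

# host_procs FILE HOST -- print "id a" or "id r" for every process listed
# on the HOST=HOST=... lines, e.g. HOST=four=401-r,403-a.
host_procs() {
    awk -F= -v h="$2" '$1 == "HOST" && $2 == h {
        n = split($3, items, ",")
        for (i = 1; i <= n; i++) {
            split(items[i], p, "-")
            print p[1], p[2]
        }
    }' "$1"
}
```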

Following the opening HOSTS line, the sample file continues with the following lines:

HOST=bsd=6-r
HOST=anotherhost=6-r
HOST=host3=6-r
HOST=four=6-r
6~^  PID TTY?\s+STAT\s+TIME COMMAND$

Lines that begin with one or more digits contain a process ID separated from a regular expression by a tilde. Everything that appears after the tilde including any leading, embedded or trailing spaces is part of the regular expression. The line ending newline is not part of the regular expression. In the sample line, process 6 really isn't a process. (The odd first process ID will be explained later.) It's the header line output by the "ps -ax" command. This regular expression will match the output of "ps -ax" on Red Hat 6.2 Linux and OpenBSD 2.7. It may need adjustment for other operating systems. This is a very simple regular expression consisting mostly of literal text. On Linux, the terminal column is indicated by TTY and on OpenBSD it's just TT so the Y is optional (question mark). The amount of white space ("\s+") between TT or TTY and STAT and between STAT and TIME is variable on the two systems. On both Linux and OpenBSD the line starts with two spaces and ends with the "D" of COMMAND with no trailing white space.
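Splitting a process line correctly matters because everything after the tilde, spaces included, is part of the expression. The sketch below reads one line with "IFS= read -r" so leading spaces and backslashes survive, splits it at the first tilde, and tests the header expression against both header formats. The file uses Perl-style \s, which POSIX grep -E does not define, so the hypothetical to_ere helper rewrites \s and \S as bracket expressions; it ignores the corner case of an escaped backslash.

```shell
#!/bin/sh
# Sketch: split an "id~regex" line at the first tilde and test the regex.

# to_ere REGEX: rewrite Perl-style \s and \S as POSIX bracket expressions.
to_ere() {
    printf '%s\n' "$1" | sed -e 's/\\s/[[:space:]]/g' -e 's/\\S/[^[:space:]]/g'
}

# matches REGEX TEXT: exit status 0 when TEXT matches REGEX.
matches() {
    printf '%s\n' "$2" | grep -E -q -e "$(to_ere "$1")"
}

# IFS= keeps leading and trailing spaces; -r keeps backslashes such as \s.
IFS= read -r line <<'EOF'
6~^  PID TTY?\s+STAT\s+TIME COMMAND$
EOF

id=${line%%\~*}      # everything before the first tilde
regex=${line#*\~}    # everything after it, spaces included
```

With this split, the Red Hat header "  PID TTY      STAT   TIME COMMAND" and the OpenBSD header "  PID TT  STAT      TIME COMMAND" both match, while a header with a trailing space does not.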

The header line of "ps -ax" output is not a process at all, but it is marked as required on all four monitored systems. Something would be odd if the output of "ps -ax" changed, and that should be investigated. If the line were merely allowed, a change in its format would still trigger an alarm, but if the line were omitted there would be no alarm. Marking the line required ensures that an alarm will be triggered if the format changes or the line disappears. If the format changed, for example if a trailing space sometimes appeared at the end of the line, the regular expression would be adjusted to allow either variation. Generally, I try to make the regular expressions as specific as possible, loosening them only when "normal" conditions start triggering alarms. Since the format allows multiple HOST lines, for clarity I group the lines specifying allowed and required processes for each host near the matching regular expression lines. The four HOST lines before "process" 6 indicate that it is required on each system.
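The allowed/required logic for one host can also be sketched in shell. This is not chkproc.pl: check_ps and regex_for are hypothetical names, the Perl hashes are replaced by a temporary pattern file for grep -f, and \s and \S are rewritten for POSIX grep -E. Lines matching none of the host's expressions are reported as unknown; required expressions matching no line are reported as missing.

```shell
#!/bin/sh
# Sketch of the per-host check: report "unknown: <line>" for each PSFILE
# line that matches none of HOST's allowed or required expressions, and
# "missing: <id>" for each required expression that matches no line.

# regex_for CONFIG ID: text after "ID~" at the start of a line, with the
# Perl-style \s and \S rewritten as POSIX bracket expressions.
regex_for() {
    awk -v id="$2" 'index($0, id "~") == 1 { print substr($0, length(id) + 2) }' "$1" |
        sed -e 's/\\s/[[:space:]]/g' -e 's/\\S/[^[:space:]]/g'
}

# check_ps CONFIG HOST PSFILE: prints nothing when everything is normal.
check_ps() {
    conf=$1 host=$2 psfile=$3
    # "id flag" pairs from HOST=host=id-flag,... lines, e.g. "6 r".
    pairs=$(awk -F= -v h="$host" '$1 == "HOST" && $2 == h {
        n = split($3, items, ",")
        for (i = 1; i <= n; i++) { split(items[i], p, "-"); print p[1], p[2] }
    }' "$conf")
    pats=$(mktemp) || return 2
    # One pattern per line for grep -f.
    printf '%s\n' "$pairs" | while read -r id flag; do
        [ -n "$id" ] && regex_for "$conf" "$id"
    done > "$pats"
    # Unknown: lines that match none of the host's expressions.
    grep -v -E -f "$pats" "$psfile" | sed 's/^/unknown: /'
    # Missing: required expressions that match no line at all.
    printf '%s\n' "$pairs" | while read -r id flag; do
        [ "$flag" = r ] || continue
        grep -q -E -e "$(regex_for "$conf" "$id")" "$psfile" || echo "missing: $id"
    done
    rm -f "$pats"
}
```

Because the header expression is required, a changed or vanished header line shows up either as an unknown line or as a missing ID, never silently.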

The next two groups of lines identify kernel processes on OpenBSD 2.7 and 2.8 and on Red Hat 6.2 Linux. These processes appear to have predictable process numbers on both systems, so for both Linux and OpenBSD the leading spaces and the specific process numbers are listed. On Linux, background processes or daemons show a single question mark for the terminal; on OpenBSD they show a double question mark. After white space, the process status is listed. This varies from process to process, and the status of kernel processes appears to vary more on OpenBSD than on Linux. The statuses shown here may not be complete. Because the monitored systems are rebooted infrequently, and the changing statuses that follow a reboot may not be captured by the monitoring process, there is no way to predict how long it will take to identify all valid statuses for each process.

Under OpenBSD 2.7 there were 5 kernel processes that always appeared. In OpenBSD 2.8 four of these no longer normally appear. They are the swapper, pagedaemon, update and apm0 processes. Process 1, /sbin/init, is always present in both 2.7 and 2.8.

It would be simple to put "\S+" in the regular expression where the status is listed. This would match any possible status (one or more contiguous non-white-space characters). As I've already said, I prefer to keep my regular expressions as specific as practical. When a new status appears on an allowed process, I want to know about it. I will adjust the regular expression if the status looks like a reasonable variation on an existing process, and not a status on a new process with a name similar to an already tracked process. For example "+" (foreground) and "X" (traced or debugged) would be curious statuses for a kernel process on a production system. I'd also want to know about any process that had a "Z" (zombie) status. By carefully building the regular expressions used to match the output of ps (or other commands), a lot can be learned about what is happening on a monitored system, well beyond simply identifying unusual processes or missing standard processes.

Process times on OpenBSD typically have one or two digits followed by a colon, two digits, a period and two more digits, as indicated by the "[0-9]?[0-9]:[0-9]{2}\.[0-9]{2}" piece of all the OpenBSD specific regular expressions. Red Hat 6.2 Linux process times lack the period and final two digits, as indicated by "[0-9]?[0-9]:[0-9]{2}" in the Linux specific regular expressions.

The next sets of lines show some typical OpenBSD and Linux daemons started at boot time. Following the tilde that separates the process ID from the regular expression, all these lines start with "^ {0,4}[0-9]{1,5} ", indicating that the lines begin with zero to four spaces followed by one to five digits followed by a space. Both groups of regular expressions include web server processes, which are required on bsd and four (the web servers) and are not allowed on the other machines.
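The pieces from the last three paragraphs can be combined into a single expression. In the sketch below the status list "(I|S)s?" and the cron and crond lines are illustrative assumptions, not lines from the author's file. The point is that a specific status list rejects a zombie ("Z"), the OpenBSD time format rejects a Linux-style time, and the PID prefix rejects a six digit PID.

```shell
#!/bin/sh
# Sketch: assemble daemon expressions from the pieces described above.
pid='^ {0,4}[0-9]{1,5} '                      # 0-4 spaces, 1-5 digits, a space
obsd_tty='\?\? +'                             # OpenBSD daemon: "??"
linux_tty='\? +'                              # Linux daemon: "?"
status='(I|S)s? +'                            # only statuses seen so far
obsd_time='[0-9]?[0-9]:[0-9]{2}\.[0-9]{2}'    # OpenBSD: m:ss.hh
linux_time='[0-9]?[0-9]:[0-9]{2}'             # Red Hat 6.2: m:ss

cron_re="${pid}${obsd_tty}${status}${obsd_time} cron\$"
crond_re="${pid}${linux_tty}${status}${linux_time} crond\$"

# matches REGEX TEXT: exit status 0 when TEXT matches REGEX.
matches() { printf '%s\n' "$2" | grep -E -q -e "$1"; }
```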

Next are some cron jobs for both OpenBSD and Linux systems. Since cron jobs tend to be both site and machine specific, nearly all those from my systems have been deleted. Any machine using the process tracking described here will have the monitoring process running. Unlike other cron jobs which may or may not be running when the monitoring process runs, the monitoring process will always see itself in the "ps -ax" output. Any components that will be active when "ps -ax" is executed should be either allowed or required processes or there will always be alerts; it makes no practical difference whether these are allowed or required. Here the /usr/local/bin/wps is the script that starts the monitoring. (The wps name will be explained later.) On OpenBSD systems, an intermediate shell is spawned in addition to the script that runs the monitoring process and the "ps -ax" command. All three pieces need to be allowed or required to prevent constant alerts.

The only console-only processes shown are those that control the virtual consoles on the OpenBSD and Linux machines. If the monitored systems had processes that, for either policy or technical reasons, should only be run from the local console, this would be a good place to add them. The "tty[1-6]\s+" on Linux and "C[0-5]" on OpenBSD replace the single and double question marks that indicate the terminal for background processes. On Linux the six virtual consoles are indicated by tty and a digit one through six followed by a variable amount of white space. On OpenBSD, the six virtual consoles are indicated by an upper case "C" and the digits zero through five.

The regular expressions for several interactive OpenBSD processes are shown. On Linux, network logins or pseudo terminal sessions are indicated by pts for the terminal, as in the regular expression "(pts\/[1-4]|tty[1-6])", which here allows up to four network sessions. On a busier system the expression would need to be adjusted to allow a reasonable number of simultaneous network logins; for example "[1-3]?[0-9]" would match any terminal number from 0 to 39. On OpenBSD a "p" indicates a network login; up to five sessions are allowed before an alarm is triggered. In both cases, the "( | )" notation ORs the expressions that are inside the parentheses and separated by the pipe.
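The terminal alternatives can be tried on their own. The sketch anchors each expression so the whole terminal field must match; in grep the slash needs no escape, so the "\/" from the original is written as a plain "/". The OpenBSD expression, combining network sessions p0 to p4 with consoles C0 to C5, is an assumption built from the rules in the text.

```shell
#!/bin/sh
# Sketch: terminal-field alternatives, anchored to the whole field.
narrow='^(pts/[1-4]|tty[1-6])$'          # as in the text
wide='^(pts/[1-3]?[0-9]|tty[1-6])$'      # pts/0 through pts/39
obsd='^(p[0-4]|C[0-5])$'                 # hypothetical OpenBSD combination

# matches REGEX TEXT: exit status 0 when TEXT matches REGEX.
matches() { printf '%s\n' "$2" | grep -E -q -e "$1"; }
```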

Most of these are different forms of shells that are commonly present. One matches less in any form. I spend more time in less than in any other program, as nearly everything I look at is viewed via less. If this were not suppressed, alerts would be popping up most of the time that I am doing anything on a monitored system. Note that vi is not included even though I spend a moderate amount of time in it. I want alerts to pop up if anyone, including myself, is editing one of the system configuration files or a script in /usr/local/bin. If there were any frequently edited files that I did not want alerts for, I would construct a regular expression specific to those files.

Note that one of the processes relates to man but that the command "man commandname" or "man \S+" is not present. I want to be alerted when man is used on the monitored systems. Remember these are hardened servers with no ordinary users, just administrators. Use of man suggests someone doing something new or unfamiliar so an alert is appropriate. On OpenBSD systems two additional sub shells are created whenever "man commandname" is used. Alerts for these are suppressed with the processes identified as 713 and either 712 or 714 (I forget which). This simplifies the alert listing helping to emphasize what command man is being used for.

Track More Than Processes

The last group of lines, with process IDs 1 - 7, actually has nothing to do with "ps -ax". These lines match output from the "w" command, which shows system uptime and who is logged into the system. They match the standard content of w output and allow a very limited number of users to log in from very limited locations. Any other user logged in from any other location would trigger an alert. Because the output of two commands, w and ps, is included in the same file, the script that creates it is called wps; in my real environment these are the first lines in the chkproc.txt file and thus the process IDs start at 1.
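A wps-style capture might look like the sketch below: w output followed by ps -ax output in one file named mmddHHMM.log. The function name capture_wps and its directory argument are hypothetical; the transfer would follow as in the ps script shown earlier.

```shell
#!/bin/sh
# capture_wps DIR -- write w output followed by "ps -ax" output to
# DIR/mmddHHMM.log and print the file name that was written.
capture_wps() {
    f=$(date +%m%d%H%M).log
    { w; ps -ax; } > "$1/$f"
    echo "$f"
}
```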

It's useful to know who is logged into a system, but the main reason for including these lines is to show that the script can be used to analyze output from commands other than "ps -ax". What is matched simply depends on the regular expressions that are created. For example, it should not be difficult to write regular expressions that trigger an alert if the 15 minute load average on a monitored system exceeds 2.99 or the 5 minute load average exceeds 3.99. The w line would show as an "unknown" command, but as long as the person viewing the alert knew not to take this literally, it should not be a problem.
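As an illustration of that idea, the expression below matches the load-average part of w's first line only while the 5 minute average is at most 3.99 and the 15 minute average at most 2.99. The sample lines are invented; "averages?" allows for OpenBSD's "load averages:" as well as Linux's "load average:".

```shell
#!/bin/sh
# Sketch: w's three load averages are the 1, 5 and 15 minute values.
# Any 1 minute value is accepted; the 5 minute value must be 0.00-3.99
# and the 15 minute value 0.00-2.99, otherwise the line goes unmatched.
load_ok='load averages?: [0-9]+\.[0-9]{2}, [0-3]\.[0-9]{2}, [0-2]\.[0-9]{2}$'

# matches REGEX TEXT: exit status 0 when TEXT matches REGEX.
matches() { printf '%s\n' "$2" | grep -E -q -e "$1"; }
```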

This approach can be used to trigger an alert for almost any unusual condition on a monitored system. Include the output of the command that is used to measure or display the conditions that you are interested in tracking, in the output that is monitored by chkproc.pl and write the regular expression so that "normal" or acceptable conditions are matched. When the values go outside the range that matches the regular expression, an alert will be triggered.

The Analysis Script

Since the script is well commented, I'm not going to discuss much of the coding detail but rather some of the choices and reasoning behind it. The initialization section and the main loop contain the logic to read the regular expressions stored in the structured text file into memory and compare them to the files being analyzed. In keeping with the preference for simplicity over flexibility shown in all these scripts, the main loop assumes that analyzed files reside in subdirectories named after the hosts being processed and that each host's files will be found in a constant subdirectory.
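A minimal Python sketch of the comparison the main loop performs for one host might look like the following. The data layout is a hypothetical stand-in for the structured chkproc.txt file, not its actual format: a dictionary mapping process ID to regular expression, plus sets of allowed and required IDs for the host. Every output line must match some allowed or required pattern, and every required pattern must match at least one line. Anything else is alert material.

```python
import re


def check_host_output(lines, patterns, allowed, required):
    """Compare one host's saved ps/w output against its patterns.

    patterns: dict mapping process ID -> anchored regular expression
    allowed, required: sets of process IDs defined for this host
    Returns (unknown_lines, missing_required_ids).
    """
    compiled = {pid: re.compile(rx) for pid, rx in patterns.items()}
    usable = allowed | required
    seen = set()
    unknown = []
    for line in lines:
        line = line.rstrip("\n")
        hits = {pid for pid in usable if compiled[pid].search(line)}
        if hits:
            seen |= hits
        else:
            unknown.append(line)  # nothing recognizes this line
    missing = sorted(required - seen)  # required process not running
    return unknown, missing


# Hypothetical patterns for two OpenBSD-style processes.
patterns = {
    1: r"^\s*1 \?\?\s+Is\s+\S+ \/sbin\/init$",
    2: r"^\s*\d+ \?\?\s+Is\s+\S+ cron$",
}
output = [
    "    1 ??  Is      0:00.01 /sbin/init\n",
    "12345 ??  Ss      0:00.02 nc -l 4444\n",
]
print(check_host_output(output, patterns, allowed=set(), required={1, 2}))
```

Here the nc line is reported as unknown and process 2 (cron) as missing, which are the two kinds of alert discussed later.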

The first interesting option doesn't come until the time_wait function. Here a runtimes array is initialized with a hard coded list of minutes that match the times the process script executes on the monitored systems. It's assumed that process monitoring will occur more than once an hour, but it makes no sense to check every minute for the first ten minutes of an hour and then not check for the rest of the hour. This array could easily be changed to every sixth minute to provide consistent monitoring throughout the hour. It could be changed to every five minutes by adding two array elements; in the following for loop, "$i<10" would then need to be changed to "$i<12". Any convenient interval could be used by adjusting the elements in the runtimes array. Monitor times do not need to fall on multiples of the interval, so a six minute monitor interval could just as easily run at 1, 7, 13 . . . or 4, 10, 16 . . .
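The scheduling arithmetic can be sketched in Python as below. every_n_minutes and seconds_until_next_run are hypothetical helpers, not functions from chkproc.pl. The first builds a runtimes list for any interval and offset. The second computes how long to wait until the next listed minute, wrapping to the next hour after the last one.

```python
import datetime


def every_n_minutes(n, offset=0):
    """Runtimes list: every n minutes past the hour, starting at offset."""
    return list(range(offset % n, 60, n))


def seconds_until_next_run(now, runtimes):
    """Seconds from `now` until the next minute listed in `runtimes`."""
    base = now.replace(second=0, microsecond=0)
    for minute in sorted(runtimes):
        candidate = base.replace(minute=minute)
        if candidate > now:
            return (candidate - now).total_seconds()
    # Past the last run this hour: wait for the first run of the next hour.
    first = base.replace(minute=min(runtimes)) + datetime.timedelta(hours=1)
    return (first - now).total_seconds()


runtimes = every_n_minutes(6, offset=1)  # 1, 7, 13, ... 55
print(runtimes)
print(seconds_until_next_run(datetime.datetime(2024, 1, 1, 10, 8, 30), runtimes))
```

Every five minutes would be every_n_minutes(5), giving twelve entries, which matches the "$i<12" change described above.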

The more frequent the monitoring interval, the closer this system is to real time monitoring. The problem is that this is a simple system that will trigger an alert every time an alert worthy condition is found. If monitoring occurs every minute and an alert worthy condition persists for 6 minutes, five or six separate alerts would be triggered. This could be very annoying for an operator or administrator trying to deal with the situation, and the longer the condition persisted, the more annoying the alerts would become. A mechanism for mitigating this by suppressing alerts is described later.

On the other hand, having only a few monitor times per hour greatly increases the likelihood that significant events will be missed. If the monitor interval were more than a few minutes and an intruder knew the interval and times, they could perform their work so as to evade detection.

For the purposes of intrusion detection, grouping processes into five categories should help decide how frequently the monitoring should run and which processes should trigger alarms. First are the required or continuous processes, which should never trigger alarms. These will be identified almost immediately when defining the matching regular expressions. The more frequently the monitoring is performed, the more quickly the less frequent statuses will be detected and added to the matching regular expressions. Second are the routine or periodic processes. Typically these will include all cron jobs and completely ordinary user processes such as checking mail when first logging on. Maintaining web content and reviewing web logs on a web server might also be examples; depending on the number of users and their roles, these might be routine for some users and unauthorized for others. Routine processes should not trigger alerts. Even with very frequent monitoring, it may take a long time before all status and command line variations of these processes are discovered.

Third and fourth are infrequent and sensitive processes. Infrequent processes are legitimate and harmless commands that probably should not trigger alerts but which occur so infrequently and with such variability that it is not practical to anticipate them and build regular expressions to suppress the alerts. There will be a continuum from routine to infrequent processes. Whenever a new process is seen that is not sensitive and is likely to recur from time to time, a regular expression should probably be built to match it. This keeps alerts as infrequent as possible, so they are not dismissed without action by those who receive them.

Sensitive processes should occur infrequently enough that they are not routine. They will be perfectly legitimate at some times and under some circumstances, but are important enough from a security perspective that they should trigger an alert. Examples might be execution of user maintenance programs or editing of the system configuration files. If sensitive processes occur so frequently that they become routine, there is probably something wrong with the system administration policies and procedures. Even if the activity behind an alert for a sensitive process is entirely legitimate, it should trigger a thought process in whoever receives the alert. That person or persons should question who is doing what on the system, and if the alert cannot be matched to expected system activity, it needs to be investigated. Typically this would begin with a review of the full "ps -ax" output to determine who or what caused the alert. Then, if it was interactive, is it appropriate for that user to perform the action from the terminal from which it was performed? This could lead to checking with other administrative personnel or comparing file contents with backup or archived versions.

Fifth are unauthorized processes, which should always trigger an alert. There is, however, no direct way to test for unauthorized processes, since these could be virtually anything. It could be a routine process performed by an unauthorized user or from an unauthorized location, or it could be the execution of a new command illicitly brought onto the monitored system. A detected unauthorized process will, by definition, never be a required or allowed process. The best way to ensure that unauthorized processes are detected is to make the regular expressions defining recognized processes as specific as possible so that unauthorized commands do not accidentally match them. For example, "(\/usr\/bin\/)?vi \S+" would be a very poor choice for an allowed command, as it would allow vi to be used on any file without detection.
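To make the vi example concrete, the Python snippet below compares the broad pattern from the text with a narrower one. The narrower policy (vi allowed only on .html files in a single web directory) is a hypothetical choice for illustration. The point is that editing /etc/master.passwd slips through the broad pattern but not the specific one.

```python
import re

# Too broad: any vi session on any file counts as "allowed".
BROAD = re.compile(r"(\/usr\/bin\/)?vi \S+")
# Narrower (hypothetical policy): vi only on .html files in one web directory.
SPECIFIC = re.compile(r"^(\/usr\/bin\/)?vi \/var\/www\/htdocs\/[\w.-]+\.html$")

for cmd in ["vi /var/www/htdocs/index.html", "/usr/bin/vi /etc/master.passwd"]:
    print(f"{cmd!r}: broad={bool(BROAD.search(cmd))} specific={bool(SPECIFIC.search(cmd))}")
```

Because the file name part excludes "/", a path such as "htdocs/../../etc/passwd.html" also fails to match the specific pattern.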

Like most intrusion detection systems, getting this system right is a matter of finding a balance. Ideally, every suspicious activity will trigger an alert and every alert will trigger an operator response, even if that response is no more than a mental review of who is doing what on the monitored system. If either the system is made to ignore potentially significant events to minimize alerts or alerts become so frequent that operators ignore them, the system will likely not accomplish its purpose.

Alerts

The process_errors function in chkproc.pl is responsible for creating alerts. In this case, the script pops up a notepad window on a Windows workstation containing an error log. This is the most system specific part of the script. Both the form of the alert and the manner in which it is presented to an operator or administrator can and should be changed to meet specific site needs. On most systems it will be normal to write the condition causing an alert to some kind of log, but beyond this little will be standard. The alert could also be a dialog box rather than a notepad window, a network message causing dialog boxes on multiple remote machines, an e-mail message, an audible tone on the workstation, dialing a beeper, creating or modifying a web page, updating a database that feeds a paper or web reporting system, or a combination of these or other actions.

Other alert forms will require some additional work besides coding a delivery mechanism. The logic here provides for a single display of the error log, which is left on the screen until cleared by the operator. If an additional alert is to be displayed, the previous one is cleared before the new one is displayed. In the displayed log file, the same alert condition may be repeated numerous times. If delivery were via e-mail or pager, it would be very annoying if the same condition triggered additional e-mails or pages, possibly every minute for an extended period of time, and this would render the system unusable. If an alternative alert mechanism is used, a method will need to be built to determine whether the current alert is essentially the same as the preceding alert and suppress it if it is.
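One possible duplicate check for e-mail or pager delivery is sketched below in Python. It is a hypothetical approach, not part of chkproc.pl. It keeps a digest of the last alert sent in a state file and suppresses an alert whose conditions are identical. The caller is assumed to pass only the conditions themselves (unknown lines, missing process IDs) with volatile fields such as timestamps or the ps TIME column removed. Otherwise, every repeat would look new.

```python
import hashlib
import tempfile
from pathlib import Path


def alert_is_new(conditions, state_file):
    """Return True if this set of alert conditions differs from the last one.

    conditions: list of strings describing what triggered the alert.
    state_file: hypothetical file holding a digest of the last alert sent.
    """
    text = "\n".join(sorted(conditions))  # order of lines should not matter
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    state_file = Path(state_file)
    if state_file.exists() and state_file.read_text().strip() == digest:
        return False  # same as the previous alert: suppress it
    state_file.write_text(digest)
    return True


state = Path(tempfile.mkdtemp()) / "last_alert.txt"
print(alert_is_new(["unknown: nc -l 4444"], state))  # first occurrence
print(alert_is_new(["unknown: nc -l 4444"], state))  # identical repeat
```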

In the current script, a single window is maintained by checking whether notepad is already displaying the error log. This is done via the NT Resource Kit program tlist, which is like a simplified ps. Tlist doesn't report the command line that started the listed programs. However, various Windows programs such as word processors, browsers and notepad appear to report to Windows the primary file they are acting on. If you keep such a program open but change the file being viewed or edited, the tlist output changes accordingly.

Even though the script cannot determine if it launched a particular copy of notepad, by capturing tlist output and looking for notepad and the error log name it can determine if there is a copy of notepad it may have started. If so, it extracts the process number from the tlist output and uses the Resource Kit command, kill, to kill the process. If the workstation has been unattended for an extended period of time during which many alerts were displayed, this has the effect of conserving system resources and keeping the number of windows that an operator needs to deal with to one.
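The search of the tlist output can be sketched in Python as below. The line layout assumed here is roughly "PID image-name window-title", as in "1234 notepad.exe  alert.log - Notepad". That layout, and the log name alert.log, are assumptions. Check them against real tlist output on your workstation. The sketch only finds the PIDs; passing them to the Resource Kit kill command is left out.

```python
import re


def find_alert_notepads(tlist_output, log_name):
    """Return PIDs of notepad windows whose title mentions log_name.

    Assumes each tlist line is roughly "PID image-name window-title".
    """
    pids = []
    for line in tlist_output.splitlines():
        m = re.match(r"\s*(\d+)\s+notepad\.exe\s+(.*)$", line, re.IGNORECASE)
        if m and log_name.lower() in m.group(2).lower():
            pids.append(int(m.group(1)))
    return pids


sample = """   0 System Process
1234 notepad.exe       alert.log - Notepad
2345 NOTEPAD.EXE       notes.txt - Notepad"""
print(find_alert_notepads(sample, "alert.log"))
```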

Because each new alert is popped into the foreground, killing the previous alert does not prevent a logged in and active operator from being interrupted each time a new alert is displayed. There will be times when the person receiving the alerts will not want to be notified every time a new alert is generated. Examples might include an investigation that cannot be completed before the same alert is displayed again and again, or extended system administration that might generate a series of alerts.

The script includes a mechanism to allow the operator to suppress a series of potentially distracting alerts. If a nowarn.txt file exists, it is read and its contents are assumed to be a number of seconds. If the nowarn.txt file, judged by its modification time, is older than that number of seconds, the file is erased and the alert displayed. If the file is newer, the alert is not displayed, though the error message is still appended to the error log. The operator can create the nowarn.txt file with a simple echo command when it's needed. The script includes a fail safe: if the operator enters an unreasonably large value or an invalid format, the nowarn.txt file will be erased once it is older than four hours and alerts will resume.
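The nowarn.txt rule can be sketched in Python as follows. This is an illustration of the logic described above, not a translation of chkproc.pl. The function name alerts_suppressed is hypothetical. Invalid contents fall back to the four hour fail safe, and values above four hours are capped at it.

```python
import os
import tempfile
import time

FAILSAFE_SECONDS = 4 * 60 * 60  # never hold alerts back for more than four hours


def alerts_suppressed(nowarn_path, now=None):
    """Return True if a current nowarn file says to hold back alert display."""
    if not os.path.exists(nowarn_path):
        return False
    now = time.time() if now is None else now
    age = now - os.path.getmtime(nowarn_path)
    with open(nowarn_path) as f:
        text = f.read().strip()
    try:
        limit = int(text)
    except ValueError:
        limit = FAILSAFE_SECONDS  # invalid format: fall back to the fail safe
    limit = min(limit, FAILSAFE_SECONDS)  # unreasonably large values are capped
    if age > limit:
        os.remove(nowarn_path)  # expired: erase it so alerts resume
        return False
    return True


# Example: the operator created nowarn.txt containing 1800 ten minutes ago.
path = os.path.join(tempfile.mkdtemp(), "nowarn.txt")
with open(path, "w") as f:
    f.write("1800\n")
now = time.time()
os.utime(path, (now - 600, now - 600))
print(alerts_suppressed(path, now))
```

Either way, the caller would still append the error text to the log; only the pop-up is skipped.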

The script includes no mechanism to truncate or limit the growth of the log file. To a point this is useful. If the operator has been away for a while, he or she can easily see how long an alert condition has persisted plus any processes that executed only for a limited time period and generated an alert since the last time the error log file was viewed.

After the operator has dealt with any alert condition, old alerts become a distraction. Since the script cannot easily determine when an alert has been seen, this is dealt with outside the main script by a small maintenance script. This script simply creates a file name composed of "err" plus the two digit year and month and ".log". The current contents of the alert error log are appended to this long term error log, and the alert error log is then erased. The long term error log serves no practical function in the alert system; it's simply a historical record of what alerts were generated. The maintenance script is run from a Windows desktop or start menu shortcut.
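A Python sketch of that maintenance step is below. The alert log name alert.log and the function name archive_alert_log are hypothetical. The monthly file name follows the "err" + two digit year + two digit month + ".log" pattern described above.

```python
import datetime
import os
import tempfile


def archive_alert_log(alert_log, archive_dir, today=None):
    """Append the alert log to errYYMM.log, then erase the alert log."""
    today = today or datetime.date.today()
    archive = os.path.join(archive_dir, "err" + today.strftime("%y%m") + ".log")
    if os.path.exists(alert_log):
        with open(alert_log) as src, open(archive, "a") as dst:
            dst.write(src.read())
        os.remove(alert_log)  # the next alert starts a fresh log
    return archive


work = tempfile.mkdtemp()
log = os.path.join(work, "alert.log")  # hypothetical alert log name
with open(log, "w") as f:
    f.write("unknown process: nc -l 4444\n")
print(os.path.basename(archive_alert_log(log, work, datetime.date(2001, 3, 5))))
```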

Monitoring Your Systems

The regular expressions provided here are specific to recent versions of OpenBSD and Linux and cover only a subset of the processes likely to be found on these systems. Creating regular expressions specific to your systems and what is and is not routine on them is key to the success of this method of process monitoring. To start, the small script for the monitored systems should be copied and modified. It should be run at least as frequently as it will be in production. Running it more frequently will help to identify state changes and other variations on the same process. For example there are a number of processes on Linux that sometimes are surrounded by square brackets and at other times are not; these are definitely the same process as the process number doesn't change, just the ps display.

"ps -ax" may not be the exact command on your system. You want the version of ps that shows all processes for all users including all non terminal processes. You also want a version that shows more of the command line rather than memory use or some other information that ps can display. Once the correct form of ps has been put into a script and run from cron, save the output for a day or so. View the files in time (filename) order. Every time the size of a file changes, a process has exited or a new one begun or a state has changed. There is little point in looking at successive files of identical size.
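Picking out which saved files are worth viewing can be automated. The Python sketch below lists files in filename order and keeps only those whose size differs from the file before them. It assumes the output files are named so that alphabetical order is time order (for example, a day-hour-minute stamp). The three-file demo directory is purely illustrative.

```python
import os
import tempfile


def files_worth_viewing(directory):
    """Saved ps output files, in filename order, whose size differs from the
    previous file's. Equal sizes usually, though not always, mean no change."""
    keep, last_size = [], None
    for name in sorted(os.listdir(directory)):
        size = os.path.getsize(os.path.join(directory, name))
        if size != last_size:
            keep.append(name)
        last_size = size
    return keep


demo = tempfile.mkdtemp()
for name, text in [("1001.txt", "aaa"), ("1007.txt", "bbb"), ("1013.txt", "cccc")]:
    with open(os.path.join(demo, name), "w") as f:
        f.write(text)
print(files_worth_viewing(demo))
```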

The best time to do this is on a newly installed system before it's been connected to a network and certainly before it's connected to the Internet. Once a system has been connected to the Internet or even been up for some time on local networks, it may be compromised. If you are adding intrusion detection to older systems, then install a new system with the same OS and run your monitoring script on it to get a baseline of standard processes. On the older systems, you need to figure out what every process is that's not part of the baseline. You should also use a network scanning tool to verify that the monitored systems are not exposing any ports that they should not be. If they are, the process responsible needs to be identified and disabled.

When you have a general idea of what's running, and should be running, on the system to be monitored, take any typical ps output file as a starting point. Unless the process is a kernel process with a constant process number, replace the process number with a regular expression that will match the range of process numbers that may be assigned. There is a reasonable chance the sample regular expressions above will work. Determine whether the process is a background only process, a local console process, a network terminal process, or may be any of these. Replace the specific terminal identifier with a regular expression that matches the terminal locations the process should run from.
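As a sketch, here is how the PID and terminal columns of one interactive shell line might be generalized, written in Python. The terminal names are assumptions for illustration: C0 for an OpenBSD console and p0 to p3 for pseudo-terminals. Use whatever identifiers your own ps output shows for the locations you permit. Status and time are left loose here.

```python
import re

# Hypothetical building blocks for one interactive shell line.
PID = r"^\s*\d{1,5}"        # any assigned PID, not the literal one in the sample
TTY = r"(C0|p[0-3])"        # assumed: console or the first few pseudo-terminals
REST = r"\s+\S+\s+\S+ -ksh \(ksh\)$"  # status and time left loose

shell = re.compile(PID + " " + TTY + REST)

print(bool(shell.search("12345 p1  Ss      0:00.02 -ksh (ksh)")))
print(bool(shell.search("12345 p9  Ss      0:00.02 -ksh (ksh)")))
```

A shell on any other terminal, such as p9 in the second line, fails to match and would be reported as unknown.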

Next create a regular expression for the status. For some processes a literal will suffice, but most processes show multiple statuses over time. Use grep on the directory of saved ps output to review many instances of the command and see what statuses show, building an appropriate regular expression. Remember that square brackets enclose a list of single characters, any one of which will match. Parentheses group expressions longer than one character, and pipes OR expressions together. A question mark indicates zero or one of the preceding character or parenthetical group. A plus sign indicates one or more of the preceding character or parenthetical group. Curly braces indicate a specific number (or a range, with a comma) of the preceding character or parenthetical group. There are a significant number of expressions for status in the examples.
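The pieces of regular expression syntax just described can be tried out on sample status values. The Python snippet below uses two hypothetical status patterns of the kind you might build. The first uses brackets, a question mark, and a parenthesized alternation. The second uses curly braces to allow up to two modifier characters.

```python
import re

# [ISR]   one character from the set: I, S or R
# s?      zero or one "s"
# (\+|<)? optionally either "+" or "<" (parentheses group, pipe is OR)
status = re.compile(r"^[ISR]s?(\+|<)?$")

# [s<+]{0,2}  zero to two characters, each one of s, < or +
status2 = re.compile(r"^[ISR][s<+]{0,2}$")

results = {s: bool(status.match(s)) for s in ["I", "Is", "Ss+", "R<", "Z", "Ss<+"]}
print(results)
```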

Determine whether columns are separated by a fixed or a variable number of spaces. Use literal spaces if the number is constant, or "\s+" for one or more. Determine the time format for your system's version of ps; one of the above patterns may match. Here is an opportunity to do something extra besides simple process tracking. The example regular expressions for time are the same for all processes (on each type of system) and will match almost any time ps might report. If you have processes that may consume excessive resources and you want to monitor them, more specific time expressions can be built so that if a specific process exceeds some predetermined amount of time, an alarm will be triggered.
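Here is a Python sketch of a time limit built into the pattern. It assumes a TIME column of the form minutes:seconds, optionally followed by hundredths as OpenBSD shows them. Allowing only a single minutes digit means the line stops matching, and therefore alerts, once the process reaches ten CPU minutes. The httpd line is hypothetical.

```python
import re

ANY_TIME = r"\d+:\d\d(\.\d\d)?"       # almost any CPU time ps reports
UNDER_10_MIN = r"\d:\d\d(\.\d\d)?"    # single minutes digit: below 10 minutes

PREFIX = r"^\s*\d+\s+\?\?\s+\S+\s+"   # PID, no terminal, any status
SUFFIX = r" \/usr\/sbin\/httpd$"      # hypothetical web server command

loose = re.compile(PREFIX + ANY_TIME + SUFFIX)
limited = re.compile(PREFIX + UNDER_10_MIN + SUFFIX)

busy = "  812 ??  S      12:07.88 /usr/sbin/httpd"
print(bool(loose.search(busy)), bool(limited.search(busy)))
```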

The last part of the ps line is the command that started a process. Here, for starters, you'll be using mostly literal text, but precede forward slashes (and regular expression metacharacters such as periods, parentheses and plus signs) with backslashes. Repeat the above process for each different command in the sample file you started with. Delete duplicate lines for processes that run as multiple instances. Then add in process IDs and tildes. You can number the process IDs sequentially, in sequences leaving gaps, or try to break them into related groups as I have. Add a HOSTS line with just one host and one or more HOST lines defining which processes are allowed and which are required.
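Escaping a literal command by hand is error prone. The hypothetical Python helper below does it mechanically. re.escape backslashes the metacharacters, and forward slashes are backslashed as well to follow the slash style of the document's Perl patterns; Python accepts the extra backslash. Escaping the period in std.9600 keeps it from matching any character.

```python
import re


def command_to_regex(command):
    """Turn a literal ps command column into a regular expression fragment.

    Each slash-separated piece is passed through re.escape(), and the pieces
    are rejoined with an escaped slash.
    """
    return r"\/".join(re.escape(part) for part in command.split("/"))


cmd = "/usr/libexec/getty std.9600 ttyC1"
rx = command_to_regex(cmd)
print(rx)
print(bool(re.fullmatch(rx, cmd)))
print(bool(re.fullmatch(rx, "/usr/libexec/getty stdX9600 ttyC1")))
```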

Allowed processes may have more than one line that matches if the statuses or command parts vary too much to easily create a single regular expression that matches all variations. For a required process to work properly without triggering unwanted alerts, every possible variation of process number, terminal, status, time and command must be accounted for with a single regular expression. If you don't know regular expressions well enough to do this, make the command allowed and build two or more regular expressions. Resist the temptation to make a general regular expression that might match unintended commands. For example "^.*$" will match any possible line. Making such an expression required for a host would prevent any alert for that host from ever being displayed.

Don't worry about getting your first chkproc.txt file right or complete for even one host. You won't, I promise. If you're not already ftping the ps output from the monitored systems to the monitoring system, get that working, adjust chkproc.pl to match your directories, and make the runtimes match the actual cron schedule used on the monitored systems. All systems need to use the same schedule, or chkproc.pl will need significant adjustment. Make any other system specific changes that are appropriate. For testing, chkproc.pl could be modified to read all the ps output files in a directory rather than running on a schedule. This will expedite testing the regular expressions to verify that they match expected variations in the ps output.

Whether or not you test your regular expressions against a directory of ps output, at some point you will need to start running chkproc.pl against live output from the monitored systems. I think it's best to start with a single system. Unless you've done an extraordinary job with your regular expressions, you will see both missing required processes and unknown processes. If you see no alerts, the other possibility is that you have an excessively broad regular expression for a required process. If you see missing processes but no unknown processes, an overly broad regular expression for an allowed process could be the cause.

Since alerts are supposed to be an infrequent occurrence, you need to study the alerts and compare them to your regular expressions. Keep refining your regular expressions until all the alerts are suppressed. But also keep the expressions as specific as possible. Also, I'm assuming that no one is on the system being set up for monitoring and making the kinds of changes that should be triggering alerts. Depending on the system being monitored and your skills with regular expressions, creating the first chkproc.txt file is likely to take two to several hours and refining it, half an hour to several more hours.

When you don't get new alerts each time a new ps output file is checked, it's time to move on to another system. If the operating system is different, you may want to repeat the entire process, building a new chkproc.txt file and appending it to the first one. Be sure to keep process IDs unique. If the systems are very similar, adding new host definitions to chkproc.txt may be all that's required. Either way, expect each additional system monitored to trigger some new alerts.

Once monitoring of one or more systems is in place, expect to see alerts from time to time. Initially they will be much more frequent and may occasionally come in bunches. Different process statuses, command formats, and jobs that run periodically rather than continuously will all contribute to these. When you next perform some administrative tasks for the first time since monitoring was put in place, you'll see more. Early on, many of the alerts will result in adjustments to regular expressions or additions of new processes. You're building a database of all the normal process states on the monitored systems. It may never be 100% complete.

Over time you'll see more alerts that you don't want to suppress: alerts for the processes described above as sensitive. You may even see alerts triggered by unauthorized activity; that, after all, is the reason for building the system in the first place. As you refine the database of normal activity, don't get complacent and just add everything you see into it. If you don't understand a process or status that triggered an alert, investigate it. This monitoring system is also a learning tool for the system administrators using it. By the time the system is running smoothly, the administrators using it should be able to look at ps listings and have a clear idea what every process displayed is doing and what started it.

Every process on a system is started by a user or by cron, or is spawned directly or indirectly from an already running process. If a process is triggering an alert and you don't know why, don't eliminate the alert by making it an allowed process. If necessary, use nowarn.txt to temporarily stop seeing the alerts. Keep watching until a pattern develops. If you stick with it, eventually you'll figure it out. It took me about four weeks to figure out what was starting sendmail in the background. I knew I could make the alerts go away in a minute or two but wanted to know the cause and correctly believed I'd learn something about the system if I stuck with it. Once I understood the cause, I was easily able to fix the problem and stop sendmail rather than accept a process I did not want to run.

Conclusion

Though these scripts will require modification for use on other systems, they are provided as a starting point for simple host based intrusion detection. The only restriction on their use is that the copyright notices be respected. Hopefully they have shown that simple scripts can build a fully automated notification system alerting selected personnel of changes to key files or of questionable processes that may need to be investigated on monitored systems.

Though the two pieces, file change and process monitoring, are logically independent, they form a much stronger intrusion detection system when used together than either component used alone. Anyone who knows UNIX knows that text files that are not marked as executable can still be interpreted by a shell. Thus, while it may be difficult to make permanent changes to a system without triggering a file change alert, it's not at all difficult to be on a system with file change monitoring and execute untracked files without changing any tracked files. This would evade Tripwire or any system based on tracking file changes via checksums, in the same manner that it could evade the scripts presented here. Only a file change monitoring system that tracked every file on the system would pick up such suspicious changes. The effectiveness of a system that tracked every file would depend on operators or administrators who knew every file on the system likely to change and who had the time to investigate all but the most routine changes.

On a system that implemented process tracking as described here, it would be difficult for an intruder to operate long without being detected if the process monitoring was reasonably frequent. If suspicious activity was not investigated promptly and the system did not also implement file change tracking, it might be possible for an intruder to alter ps, or the script that invokes ps on the monitored system, so that process alerts were not triggered. Simply sending the old output from a previous execution of ps again and again would normally have this effect. Thus, even if authorized changes are made to the process monitoring script, its content should be rechecked when the file change tracking system reports the change. Unauthorized changes to the process monitoring script or the ps executable are, by definition, unauthorized activity, whether by an intruder or an authorized user using the system in an unauthorized manner.

With both file change tracking and process monitoring as described here and used to check each other, it's difficult to see how either an intrusion or unauthorized activity by authorized users could go on for long, undetected.

Copyright © 2000 - 2014 by George Shaffer. This material may be distributed only subject to the terms and conditions set forth in http://GeodSoft.com/terms.htm (or http://GeodSoft.com/cgi-bin/terms.pl). These terms are subject to change. Distribution is subject to the current terms, or at the choice of the distributor, those in an earlier, digitally signed electronic copy of http://GeodSoft.com/terms.htm (or cgi-bin/terms.pl) from the time of the distribution. Distribution of substantively modified versions of GeodSoft content is prohibited without the explicit written permission of George Shaffer. Distribution of the work or derivatives of the work, in whole or in part, for commercial purposes is prohibited unless prior written permission is obtained from George Shaffer. Distribution in accordance with these terms, for unrestricted and uncompensated public access, non profit, or internal company use is allowed.
