
Perl and CGI, NT vs. Linux vs. BSD - 5/06/00

Differences implementing Perl CGI and site maintenance scripts on Linux, Windows NT and OpenBSD systems. Like most of the pages in this section, it has not been updated since it was written and does not include things I've learned since. It is not a systematic comparison of the operating systems or of the use of Perl on them, but a narrative of specific problems encountered.

Update 5/08/00

In developing a script to aid in maintaining multiple versions of the same web site on different platforms, it's clear that some cross-platform issues need to be worked out. A script that chooses different graphics depending on where in a virtual web site the page being updated is located has to handle directory and path names in a way that works across platforms. While Perl mitigates some of the differences, it's not safe to assume that logic that works where directories are separated by backslashes will necessarily work where they are separated by forward slashes, and vice versa. Before too much actual development is done, some basic testing needs to be performed to be sure that certain key elements will work on all systems.
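As a sketch of what I mean, Perl's standard File::Spec and File::Basename modules know the local path conventions, so separators never have to be hard coded; a minimal, untested example (the path components are just placeholders):

    #!/usr/bin/perl -w
    use strict;
    use File::Spec;
    use File::Basename;

    # Build a path with the local separator instead of hard coding / or \.
    my $path = File::Spec->catfile('geodsoft', 'cgi-bin', 'cgienv.pl');

    # Take it apart the same way; both functions honor local conventions.
    print "path: $path\n";
    print "dir:  ", dirname($path), "\n";
    print "file: ", basename($path), "\n";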

In addition to static pages, GeodSoft.com will at some point have at least some dynamically generated pages. These need to have the same design and navigation aids as the static pages. In the past I have used a single function to generate standard page tops and bottoms, used both by the standardization / maintenance script and all CGI programs. This site will be much more complex because some components will change with where in the site the page is located (physically for static pages and logically for dynamic pages). Other components will vary with the web server and OS platform.
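To make the idea concrete, the shared function amounts to something like the following sketch; the module and function names, and the section-graphic detail, are hypothetical, not the actual GeodSoft code:

    # PageParts.pm - hypothetical layout module shared by the
    # standardization / maintenance script and the CGI programs.
    package PageParts;
    use strict;

    sub page_top {
        my ($title, $section) = @_;   # $section selects per-area graphics
        return "<html>\n<head><title>$title</title></head>\n<body>\n"
             . "<img src=\"/images/$section.gif\" alt=\"$section\">\n";
    }

    sub page_bottom {
        return "</body>\n</html>\n";
    }

    1;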

While I may use other tools such as PHP or Zope for one or more of the sites, so far most of my dynamic content creation has been done with Perl, and regardless of what other tools may be used, it's bound to be useful to be able to put up simple Perl CGI pages. My development methodology is pretty emphatic about not hard coding any site or directory location information that can be obtained by other means. Specifically, I've tried, and succeeded, in building web sites so they could be moved not only from one machine to another but to different directory trees on the same machine. This allows one machine to host live, development and experimental versions of the same site, without having to make source code changes as pieces are moved from one site to the other.

For this to work, the scripts must be able to determine where they are physically located and what the site root directory is. The key piece is the PATH_TRANSLATED CGI/environmental variable. I decided to set up CGI directories on all three servers and see what environmental variables are available on each server. For this purpose I have simple Perl scripts that output minimal HTML pages listing environmental variables and their values.
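The script is nothing more elaborate than this reconstruction (the original cgienv scripts aren't reproduced here, and for brevity the values aren't HTML escaped):

    #!/usr/bin/perl -w
    use strict;

    print "Content-type: text/html\n\n";
    print "<html><head><title>CGI Environment</title></head><body>\n<pre>\n";
    foreach my $var (sort keys %ENV) {
        print "$var=$ENV{$var}\n";
    }
    print "</pre>\n</body></html>\n";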

Since I've installed NT web servers on six different machines and done twice as many installs and upgrades, I started with NT. Here it was a simple matter of creating a cgi-bin directory and copying an existing script. I needed to check that script execution was enabled in the new directory and, with IIS 4, explicitly make an association between the .pl extension and the Perl executable. This took a few minutes and worked the first time.
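For reference, the association is made in the IIS console under the site's Home Directory, Configuration, App Mappings; the mapping ActiveState's documentation suggested was along these lines (the drive and path are whatever Perl was installed to):

    .pl  ->  d:\perl\bin\perl.exe "%s" %s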

This time it was Apache, BSD and Linux that all managed to surprise me. I created cgi-bin directories under the site root directories, copied the script and made it executable. I set up what I thought were the necessary Apache Directory and Options directives to make the new directories CGI executable. On both systems Apache said the requested URL could not be found. Further, on both systems, even though "perl -cw cgienv3.pl" returns the "cgienv3.pl syntax OK" message, the script blows up when executed, at which point Perl reports a syntax error after executing several lines.

While I have about two more years of NT web experience, I still have about four more years of UNIX experience and three more years of Perl on UNIX than on NT. I know that on both NT and UNIX systems, default installs typically put the script directory outside the document root directory. I suppose someday someone will explain to me when and why this makes sense, but it's not very useful for the sites I've worked on. I feel very strongly about having a production site that is separate from the test, development or staging site, and nowhere is this more important than in the script directories, which are the ones that require the most testing. Actually every web site needs at least three sites, because sooner or later nearly every site goes through an overhaul where major changes are made to site structure, navigation aids, styles, etc. This process takes too long to stop work on the production site, so you need a third site during this period.

Whenever you're upgrading system or development software or installing major new development products, it's very advantageous to have separate machines to do the testing on, but at other times the live and development systems can coexist on the same machine, as the development site will rarely put a significant load on a server. Even when they are always on separate machines, there still comes the time for site overhauls, and I am sure many companies, actually the large majority, since there are many more little companies than big ones, can't really afford a third web server machine.

Further, dynamic content is about custom content, so where does it make better sense to have multiple versions of directories than the script directories? It seems clear to me that though it may not be necessary, it is advantageous for the script directories to go with the site and not the server. The only place a common script directory makes sense is at an ISP where the customers don't have direct access to modify the scripts but can only run certain standard scripts configured with data specific to each virtual site that uses them.

I expect that I will work through the Apache documentation and figure out what needs to be done to put a cgi-bin directory in each virtual site. Frankly it seems almost bizarre that you have to go to more work to make a directory that is physically inside a site work than one that is outside. Likewise, I'll eventually figure out what's causing the Perl problems, and while this is most likely a Perl and not an OS issue, this round clearly goes to NT.

The Perl problem was trivial. Since the script was copied from an NT environment, it was missing the "#!/usr/bin/perl" first line. It didn't take very long to find the AddHandler directive in the Apache documentation. That looked like it should be the answer but it wasn't. I've also tried specifying the directory by its full path name and relative to the virtual root, with and without a leading slash, none of which seems to work.

I went to the Apache FAQ and, after a long detour looking at search products pointed to by the FAQ, returned to the CGI problem. The "What to do in case of problems" entry mentioned the error log, and a quick look there confirmed my suspicions. Apache is trying to run CGI scripts out of the default cgi-bin directory and not the new site-specific one I'm trying to create in my virtual site. I've looked at the ScriptAlias settings and documentation in httpd.conf, and it's pretty clear how I could change where Apache looks for CGI scripts, but that's not what I want to do. I want to leave the cgi-bin directory of the default site alone and have a separate cgi-bin directory that goes with the virtual site. This is trivial with IIS and should be with Apache, but I can't find anything in the documentation to suggest how to go about it.

There doesn't seem to be any good concept or overview type documentation that comes with Apache at all. This is hardly unusual anymore; most documentation seems aimed at step-by-step mechanics. As irritating as they are at so many things, Microsoft does have good conceptual technical documentation, at least if you're willing to pay for the Resource Kits. Of course you are paying infinitely more just for that documentation than you pay for the whole Apache package. I'll try the Apache site, and if I don't find anything then I guess I'll try one of the newsgroups, as suggested.

In the preceding paragraph, I forgot the obvious. There is plenty of third party documentation available for Apache (as well as NT). I just ordered three books, O'Reilly's Apache: The Definitive Guide and Writing Apache Modules with Perl and C as well as Wrox's Professional Apache, for about the street price of a typical Microsoft Resource Kit. BTW, I used Bookpool.com, which pretty consistently has the lowest prices on technical books on the Internet. I've used them for several years. Their web site can be a bit of a pain at times and their selection is not as big as some of the better known sites, but they have really cheap prices on what they carry and have always provided excellent service in my experience.

I ended up deciding to try disabling the ScriptAlias directory completely. I always use .pl extensions on my Perl scripts regardless of the platform and sometimes have data or other non-script files in my cgi-bin directory, so I wasn't particularly thrilled with the descriptions of what ScriptAlias accomplishes, i.e., making every file in the directory executable. The interesting question will be whether I can make CGI scripts work in the original cgi-bin directory at the same time as they are working in the virtual directory for GeodSoft. For the moment, I'll leave things as they are.
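For anyone attempting the same thing, the direction I'm headed can be sketched with Apache 1.3 directives like these; the physical path is hypothetical, and this isn't a configuration I've verified yet:

    # Per-site cgi-bin enabled by extension rather than by ScriptAlias,
    # so non-script files in the directory are left alone.
    <Directory "/home/httpd/geodsoft/cgi-bin">
        Options +ExecCGI
        AddHandler cgi-script .pl
    </Directory>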

I did just learn one more relevant item on the NT server. Even though the site is public, it's prompting for passwords on the CGI script. I hadn't noticed this before because NT authentication had been turned on and basic authentication turned off, the IIS default, so my NT workstation was automatically authenticating me. Now that I've disabled both basic and NT authentication, allowing only anonymous access, I am getting prompted for a username and password from all machines. This is especially weird since there's no security on the NT machine. By this I mean that since it's still on a private network not connected to anything, I haven't gotten around to changing the default NT "security," which is for Everyone to have full access to everything, which is for all intents and purposes no security. I knew I'd have to change this before getting my SDSL connection, but I'm very surprised at these password prompts now. I even added the anonymous user with explicit read and execute access and am still being prompted.

Talk about weird. Without changing a thing on the NT server, it's now no longer prompting for passwords. While I wrote the preceding paragraph, the password dialog box was on the screen. When I finished writing the paragraph, I cancelled the dialog and tried the URL again, and it came up with no prompt. I haven't even switched over to the NT server and am not running any management software on this workstation. This reminds me of situations I constantly ran into when first working with CGI on NT in early 1997. There were a variety of unpredictable time delays before some changes went into effect, making it almost impossible to test certain situations because you could not be sure whether the change you just made or one you made a few minutes before fixed the problem. This is the most frustrating thing about NT; I've never seen another OS that is so unpredictable.

Even though I spent a lot more time on the UNIX-like systems, I now have to call this one a draw. If I hadn't stopped to write about what was going on, there's no telling how long I might have messed around with NT trying to figure out what was happening.

This is unbelievable, or would be if it weren't Microsoft. I just entered a second script name and was again prompted for a password, and when I cancelled the dialog box got a "401.3 Unauthorized: Unauthorized due to ACL on resource" error message. I have two similar scripts, one which loops through all environmental variables and one which only looks for specific CGI environmental variables. The NT security settings on both are identical. Now NT is prompting for passwords on both. If I hadn't been writing this just as it happened, I'd have to question my memory or sanity.

Even worse, it won't take my valid Administrator password, a simple password I type a dozen or more times a day. It only took a few moments to find the answer when I went back to the NT machine. I had started to put access controls on some directories, including the Perl directories, which were not available to the anonymous user. Now that they are, I'm not getting password prompts on either the NT workstation or the Linux machine. This is good but leaves no explanation for why one script came up once during this process.

Now that my script is working on two systems, the NT system is returning PATH_TRANSLATED but not DOCUMENT_ROOT, and the Linux system is returning DOCUMENT_ROOT but not PATH_TRANSLATED.
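That suggests the portable approach is a fallback: use DOCUMENT_ROOT when it's there, and otherwise derive the root by peeling SCRIPT_NAME off the end of PATH_TRANSLATED. A hedged sketch, assuming IIS sets PATH_TRANSLATED to the script's physical path when there is no extra path info:

    # Return the physical site root directory, whichever variables exist.
    sub site_root {
        return $ENV{DOCUMENT_ROOT} if $ENV{DOCUMENT_ROOT};
        my $phys = $ENV{PATH_TRANSLATED} or return undef;
        return $phys unless $ENV{SCRIPT_NAME};
        # Turn /cgi-bin/cgienv.pl into a pattern tolerating / or \.
        my $pat = join '[/\\\\]',
                  map { quotemeta } grep { length } split m{/}, $ENV{SCRIPT_NAME};
        $phys =~ s/[\/\\]$pat$//i;    # strip the script path from the end
        return $phys;
    }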

After making changes to Apache on the OpenBSD server similar to those I'd made on Linux, I got a "You don't have permission to access /cgi-bin/cgienv.pl on this server." error message. This looks like the difference in security orientation between OpenBSD and Red Hat Linux, exactly the kind of difference one should expect between these two systems.

5/08/2000

Returning to NT, I added some use statements to one of my simple test scripts. I hadn't run any CGI scripts on my workstation under Personal Web Server (PWS) and thought I'd give it a try. It didn't work. No surprise. After checking all the obvious settings and seeing no reason for it not to work, I moved it to the NT server and got a new error message: "CGI Error - The specified CGI application misbehaved by not returning a complete set of HTTP headers. The headers it did return are: Can't locate File/Basename.pm in @INC (@INC contains: .) at d:\geodsoft\cgi-bin\ftest3.pl line 2. BEGIN failed--compilation aborted at d:\geodsoft\cgi-bin\ftest3.pl line 2."
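The error says @INC contains only the current directory, which is the first thing to verify; a one-liner shows what a given perl actually has in @INC (cmd.exe quoting shown; use single quotes on UNIX):

    perl -e "print join(qq{\n}, @INC), qq{\n}"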

One thing I do like about IIS is that it has good error messages. In particular, in addition to returning the server error message, it typically returns the output from a script even when the script is not outputting valid HTML, which can really help with debugging. Perhaps some will think this is a security weakness, but if you do your site right, you'll have fixed these before the public ever has a chance to see them.

This problem seems straightforward enough, except that it's been so long since I had to configure this that I can't remember where it's done, and I don't have any other properly configured NT servers available to look at. It only takes a minute or two to find the right section in the ActiveState documentation on installing Perl on Win32. Unfortunately, this section is as close to pure gibberish as any I've seen in a long time. I understand where something is supposed to go in the Windows Registry, but I haven't a clue what they're talking about as far as what is supposed to go into the Registry.

I've actually gotten pretty good with the NT Registry, which I think is the single stupidest computer technical "innovation" of the past decade. If I hadn't dealt with this creation almost every professional day for the past four years, I wouldn't believe such a thing actually existed. It's a huge binary object whose actual structure is completely undocumented. It's absolutely crucial to almost everything that goes on on an NT machine, but the only officially supported interfaces to it are the GUI tools provided by Microsoft. Everyone who works seriously with NT knows that many things simply cannot be done without directly editing the registry, yet if you make such edits, Microsoft disavows any responsibility for any results that might ensue.

I am able to deduce from the ActiveState documentation that @INC is at least in part derived from the environmental variables PERLLIB or PERL5LIB. I just don't understand what is supposed to go into \\HKLM\Software\Perl\lib or \\HKLM\Software\Perl\sitelib to create or affect these environmental variables. Now, system environmental variables are one thing I do work with often enough that I know exactly where in the registry to create, change or delete them. Microsoft has provided a simple GUI interface to control user-specific environmental variables, but for some reason doesn't think that system administrators need to add or remove these at the system level; you can change existing values but not add or remove variables. At least in four years I haven't found any Microsoft documentation or GUI tool, except Regedit and Regedt32, the unsupported registry editors, that lets me change these values.

To make such a change, inside the registry key \\HKLM\System\CurrentControlSet\Control\Session Manager\Environment you need to add a value with the name of the environmental variable you want, its content being the value of the environmental variable; i.e., the value's name is PERLLIB and its string contents are D:\Perl\lib, or at least that's what I thought the value would be. Besides all this, to be sure the new value is visible to all software, you have to reboot the machine. If a major vendor had not actually built a system that requires rebooting to change a system environmental variable, would anyone have believed they would do so? I'd like someone to explain to me how this system is supposed to be easy to use compared to one where an administrator edits a text file in /etc and from then on processes that start or users that log in have a new or changed environmental variable.
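In .reg file form, what I'm describing looks like this hypothetical fragment (note the doubled backslashes the .reg format requires, and a reboot is still needed afterward):

    REGEDIT4

    [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Environment]
    "PERLLIB"="D:\\Perl\\lib"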

Adding a trailing backslash didn't solve the problem. Of course I had to reboot again to determine that; stopping and restarting the web server wasn't enough to make the change visible to it. "File" is definitely a subdirectory of d:\Perl\lib\ and Basename.pm is in it, so why doesn't Perl see it? And why does the same script run fine from the command line of my NT workstation when I don't have any PERLLIB variable set? It also runs fine from both the command line and via the Apache server on Linux. Finally I FTP'd the script to a remote NT server I have access to, and it ran fine both as a CGI script and from the command line. I also telnetted to that machine, and there is no PERLLIB environmental variable nor any \\HKLM\Software\Perl registry key, let alone lib or sitelib subkeys. So much for the incomprehensible ActiveState documentation. The script even runs from the command line of the NT server where it won't run as a CGI script.

It's this last item that finally pointed me to the solution. When I gave the anonymous user read and execute rights on d:\perl\bin, I did not think to also include d:\perl\lib and its subdirectories. The script worked as soon as the anonymous user got access to the lib directories. I should know enough to always think about what security context a process that's not working is running in. Still, the error message did say "locate" and not "access," and the ActiveState documentation was apparently completely irrelevant to the problem.
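For the record, the fix amounts to a command along these lines, granting the anonymous web user recursive read access to the Perl library tree (the IUSR account name includes the machine name, so yours will differ):

    cacls d:\perl\lib /t /e /g IUSR_MACHINENAME:R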

This was really my mistake, but it did give me a chance to vent on some of my Windows pet peeves. All of the problems I mentioned above are real, so I'm not going to retract what I said even though this specific instance was a wild goose chase. Four years ago I firmly believed that NT was the only operating system that most small and medium size organizations would someday be able to standardize all their computers on. After four years of dealing with quirky, unreliable systems, I've come to believe that NT Server is exactly what O'Reilly proved it to be: the same system as NT Workstation. In other words, NT Server is a glorified desktop OS, advertised, priced, sold and licensed as a server system. Microsoft, NT and now 2000 are going to have their place in the computing world for some time to come. But frankly, today anyone not at least seriously evaluating UNIX and open source systems for their server needs has blinders on. For those of you who say I'm not looking at Microsoft's current products, there's an old saying that goes something like: burn me once, shame on you; burn me twice, shame on me. I'm tired of throwing good money after bad and intend to see if I can kick the Microsoft habit.


Copyright © 2000-2014, George Shaffer. Terms and Conditions of Use.