GeodSoft

Linux, OpenBSD, Windows Server Comparison: Stability and Reliability Summary

Miscellaneous Reliability Issues

Occasionally on the open source systems, I've encountered problems that are at first mysterious. These have always been traced to a system change, where the relationship is hard to see at first, but obvious once it's finally figured out. The UNIX problems have always been the result of something I've done.

I've never seen anything on any UNIX like system that resembles some of the bizarre NT changes that take hours or days to solve. These include Index Server failing when a virtual web site is added (never solved -- see also), MMC losing IIS configuration data (quick fix but no solution), two days down waiting for hardware repair because good backups and similar hardware are not sufficient (NT requires identical hardware for a successful restore), and core system functions lost after a UPS-initiated orderly shutdown (fixed, but no NT error messages or log entries).

Last but not least, my NT server experienced a complete system failure following the installation of a security patch with no user choices or options. I subsequently determined that a bad memory chip, installed at the same time as the patch, was almost certainly the actual trigger. Still, NT self-destructed in the presence of defective hardware, where Linux only behaved unreliably and OpenBSD displayed useful error messages that immediately led to identifying and correcting the underlying problem. The NT changes have been spontaneous, or an unexpected byproduct of an install or change that should in no way have been related to the problem.

Compare the handful of possible system crashes or networking irregularities on several UNIX like systems, over a year and a half, to far more frequent Windows system crashes and lockups on two machines. I was just learning the UNIX like systems but had three years' experience with NT. The Windows problems are not frequent the way that pre NT Windows desktop failures may be, but they are not so infrequent that there is any surprise when they occur. Once placed into a "production" setting, the UNIX like systems exhibited almost total reliability. The NT server had nearly 70 reboots during the time the Linux server went 336 days without one. While I had it, I spent far more time troubleshooting the NT server than all my UNIX like systems combined. Following the last NT failure, and knowing that I would not be migrating to the next Windows server version, the system simply wasn't worth the effort to restore again.
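As a rough illustration of the reboot figures above, the counts can be turned into an average interval between NT reboots. This is only a sketch: it treats "nearly 70" as exactly 70 and assumes the reboots were spread evenly over the 336 days, neither of which the figures establish. The function and variable names are made up for the example.

```python
# Rough arithmetic on the reboot figures quoted above.
# Assumptions: "nearly 70" is taken as exactly 70, and the reboots are
# treated as evenly spaced over the period.
LINUX_UPTIME_DAYS = 336
NT_REBOOTS = 70


def mean_days_between_reboots(days: float, reboots: int) -> float:
    """Average number of days between reboots over the period."""
    if reboots <= 0:
        raise ValueError("reboots must be positive")
    return days / reboots


nt_interval = mean_days_between_reboots(LINUX_UPTIME_DAYS, NT_REBOOTS)
print(f"NT averaged one reboot every {nt_interval:.1f} days")
```

Not every reboot was necessarily a crash, so this figure describes reboot frequency only, not failure frequency.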

Stability and Reliability Summary

Though my personal experience might suggest otherwise, I believe that overall, OpenBSD is somewhat more stable than Linux, but not by such an amount that I can put any specific figure on it. Comparing OpenBSD and Linux with NT, I'd rank the UNIX like systems together and rate NT a very distant third. My gut feeling is that the difference is something like two orders of magnitude when looking at the core of the system and system crashes, with a plausible range somewhere between one and three orders of magnitude. In other words, looking at resistance to kernel crashes, I'd think NT is ten to one thousand times less stable. NT simply isn't very good at isolating itself from either system or user processes that are buggy. Moving to more peripheral parts of the operating systems, I'd expect the differences to be somewhat less.
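For readers less used to the phrase, "orders of magnitude" simply means powers of ten. The short sketch below spells out the estimate above as multipliers. The exponents 1, 2 and 3 come from the text, the function name is invented for the example, and the numbers remain a gut feeling rather than a measurement.

```python
# "Orders of magnitude" expressed as factors of ten.
# The exponents 1, 2 and 3 come from the estimate in the text; the names
# used here are illustrative only.
def stability_factor(orders_of_magnitude: int) -> int:
    """Convert a number of orders of magnitude into a plain multiplier."""
    return 10 ** orders_of_magnitude


low, best_guess, high = (stability_factor(n) for n in (1, 2, 3))
print(f"plausible range: {low}x to {high}x, gut feeling: {best_guess}x")
```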

Nearly everyone seems to agree that Windows 2000 is more stable than NT and that NT is a lot more stable than Windows 98. Microsoft's own figures put clear limits on how much more stable Windows 2000 is than NT. If NT were only marginally more stable than 98, the upper limit would be 13 times. If these figures have any real meaning, I'd guess the actual difference is something between a third more stable and three times more stable. From an NT user's perspective, Windows 2000 may look significantly more stable than NT, but from any UNIX user's perspective, both Windows systems look very unstable.

I say "if these figures have any real meaning" because if you have enough resources to really test both hardware and software compatibility, and to keep reasonably simple and unchanging server configurations, NT can be made quite stable, though still not a match for UNIX systems. Windows 2000 should be better. If you don't have those kinds of resources and conditions, then luck seems to play a large role in how stable any particular Windows system is, and this is meant to apply to the entire Windows family.

Generally, the more complex the system configuration, the more active processes running, and the higher the disk, CPU, and memory use, the more likely the system is to experience crashes or bizarre behavior. To some extent this is true of any system, but it seems much worse with Windows systems. UNIX systems seem to run naturally with 100% CPU use for extended periods of time. With NT servers I've been responsible for, CPU use reaching and staying at 100% was always cause for concern. If significant paging activity were also present, I'd be mentally, if not literally, "crossing my fingers."

With the Windows family, unstable behavior is most likely to occur following any system change, including installs or seemingly trivial configuration changes. There does not seem to be any way to predict or anticipate unusual behavior, and the changes made are likely to have no discernible relationship to the observed problems.

Microsoft has acknowledged the reliability problems inherent in Windows, with a "System Restore" feature that debuted in ME and is now included in XP. An eWeek review (no longer available) of the final XP release said "System Restore enables users or help desk personnel to return systems to a workable state in the wake of a harmful application installation or other destabilizing event." I could not imagine using a phrase like "harmful application installation or other destabilizing event" in conjunction with any UNIX system.

Why did Microsoft deliberately choose to build a system with these reliability issues when competing systems did not have similar issues? The answer is spelled "M O N O P O L Y". If you don't understand this answer, then read the legal opinions in United States v. Microsoft. In particular, the "Findings of Fact" under the "District Court Filings," lower on the page, is relevant. If a Microsoft technical decision that otherwise makes no sense makes it harder to port a Windows application to a different operating system, you may begin to understand Microsoft's motivation. Not surprisingly, Microsoft's approach to the problem is not to fix the underlying cause (the registry), but to add a new feature, which presumably tracks system changes and allows the system to be rolled back to a previous date. Do you like complexity?


Copyright © 2000 - 2014 by George Shaffer. This material may be distributed only subject to the terms and conditions set forth in http://GeodSoft.com/terms.htm (or http://GeodSoft.com/cgi-bin/terms.pl). These terms are subject to change. Distribution is subject to the current terms, or at the choice of the distributor, those in an earlier, digitally signed electronic copy of http://GeodSoft.com/terms.htm (or cgi-bin/terms.pl) from the time of the distribution. Distribution of substantively modified versions of GeodSoft content is prohibited without the explicit written permission of George Shaffer. Distribution of the work or derivatives of the work, in whole or in part, for commercial purposes is prohibited unless prior written permission is obtained from George Shaffer. Distribution in accordance with these terms, for unrestricted and uncompensated public access, non profit, or internal company use is allowed.

 

