
Dirty Networks NFT Errors: A Technical Summary

by Ron Herardian
©1995 Global System Services Corporation (GSS)

DIRTY NETWORKS:

Contrary to popular belief, data corruptions on networks are commonplace. Further, network hardware and protocol stacks do not necessarily detect errors.

Novell's IPX protocol, for example, contains no validation mechanism by default. CRC checking can be enforced by the server, but in practice no one uses it because it noticeably slows down the server.

This means that hardware is the only thing between the cc:Mail user agent and the post office. CRC errors, when detected, are typically caused by a bad network card (when they have a single point of origin) or by a cabling problem (when they are distributed over the network).

Of course, we like to think that, because error detection mechanisms are in place, undetected corruptions of data transmitted across a network cannot occur. Unfortunately, this is just not true. I suspect that an investigation into something as simple as CRC checking in ISA network cards would yield alarming results, especially when running in PCs with non-standard bus clocks (common in clone PCs) and on cabling systems of varying quality.

FAULT TOLERANT APPLICATIONS:

For most applications, minor data corruptions that occur on the wire can go unnoticed indefinitely. So long as the data on the server are intact, we are content to dismiss everyday errors as attributable to "cosmic rays." For a cc:Mail database, however, even a minor corruption can have disastrous consequences.

NFT checking is basically page-level CRC checking in the NFTCHECK, CLANDATA and USR files. This is a high-level detection mechanism that reveals lower-level problems. A CRC is written to each page of the cc:Mail database. There are four possible warnings or errors: an error prior to login; a warning or error reading; a warning or error writing; and a warning or error verifying. When a warning or error occurs, an entry is made in a file named NFTERROR.LOG stored in the post office data directory.
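
The idea of a CRC embedded in each page can be illustrated with a minimal Python sketch. This is not the cc:Mail page format, which is not described here: the page size, the position of the CRC, and the use of CRC-32 are all assumptions made for illustration, and the names seal_page and page_is_intact are hypothetical.

```python
import struct
import zlib

# Hypothetical layout: 508 bytes of payload followed by a 4-byte CRC.
# The real cc:Mail page size and CRC algorithm are not documented here.
PAGE_DATA_SIZE = 508


def seal_page(payload: bytes) -> bytes:
    """Return the payload followed by its CRC-32 (big-endian, 4 bytes)."""
    if len(payload) != PAGE_DATA_SIZE:
        raise ValueError("payload must be exactly PAGE_DATA_SIZE bytes")
    return payload + struct.pack(">I", zlib.crc32(payload))


def page_is_intact(page: bytes) -> bool:
    """Recompute the CRC over the payload and compare it to the stored one."""
    payload, stored = page[:-4], page[-4:]
    return struct.pack(">I", zlib.crc32(payload)) == stored


page = seal_page(bytes(PAGE_DATA_SIZE))

# Flip a single bit, as a faulty card or cable might do in transit.
damaged = bytearray(page)
damaged[10] ^= 0x01
```

Here page_is_intact(page) holds for the sealed page and fails for the damaged copy, which is the sense in which a high-level check can reveal a lower-level problem.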

ERRORS PRIOR TO LOGIN:

An error prior to login means that test data written to and read back from the NFTCHECK file failed a CRC check. A cc:Mail user agent detecting this error would not continue to access the post office database because a corruption would likely result.
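
A pre-login check of this kind might look roughly like the following Python sketch. It is an illustration under assumptions, not the cc:Mail implementation: the size and content of the test data are invented, the function name prelogin_check is hypothetical, and on a local disk the read-back is likely to be served from the operating system's cache, so the sketch shows only the shape of the test.

```python
import os
import tempfile
import zlib


def prelogin_check(path: str, size: int = 512) -> bool:
    """Write test data to `path`, read it back, and compare CRCs.

    Returns True when the data read back has the same CRC-32 as the
    data written. The 512-byte random pattern is an assumption.
    """
    pattern = os.urandom(size)
    expected = zlib.crc32(pattern)
    with open(path, "wb") as f:
        f.write(pattern)
        f.flush()
        os.fsync(f.fileno())  # push the data out of local buffers
    with open(path, "rb") as f:
        echoed = f.read()
    return zlib.crc32(echoed) == expected


# Usage: a scratch file stands in for the NFTCHECK file on the server.
with tempfile.TemporaryDirectory() as scratch:
    ok = prelogin_check(os.path.join(scratch, "NFTCHECK"))
```

A client that gets False from such a check would refuse to go on to touch the post office database.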

ERRORS READING:

An NFT warning reading means that a CRC embedded in a given page of the cc:Mail database did not match the data received, but that the CRC did match after between 1 and 20 retries; in that case a warning is logged. If, after 20 attempts to read the data and check the embedded CRC, the CRC still does not match, then an error reading is reported. The point, of course, is that if the CRC is correct for a given page within the database file itself, it should still be correct when the data arrive at the workstation.
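
The retry rule just described can be sketched in Python as follows. This is a sketch under stated assumptions: the limit is taken to be 20 attempts in total, CRC-32 stands in for whatever CRC cc:Mail actually uses, and read_page and its fetch callback are hypothetical names.

```python
import zlib
from typing import Callable, Tuple

MAX_ATTEMPTS = 20  # limit cited in the text, assumed to count all attempts


def read_page(fetch: Callable[[], Tuple[bytes, int]]) -> Tuple[str, int]:
    """Read a page, retrying while the embedded CRC does not match.

    `fetch` returns (payload, embedded_crc) as received at the workstation.
    Returns (outcome, attempts_used).
    """
    for attempt in range(1, MAX_ATTEMPTS + 1):
        payload, embedded_crc = fetch()
        if zlib.crc32(payload) == embedded_crc:
            # A match on a retry means the page on disk is good but at
            # least one transfer was damaged: log a warning, not an error.
            return ("ok" if attempt == 1 else "warning reading", attempt)
    return ("error reading", MAX_ATTEMPTS)


good = b"mail page"
crc = zlib.crc32(good)

# Two damaged transfers followed by a clean one.
transfers = iter([b"mail pagf", b"mail pbge", good])
flaky_result = read_page(lambda: (next(transfers), crc))

# Data that never matches, as when the page itself is corrupted.
stuck_result = read_page(lambda: (b"mail pagf", crc))
```

With this input flaky_result is ("warning reading", 3) and stuck_result is ("error reading", 20). The second case cannot distinguish a persistently bad network path from a page that is corrupted in the database file.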

The NFT error reading is often misunderstood because when a cc:Mail database file is corrupted, user mail applications report errors reading. This is because one or more CRCs do not match and will never match so long as the corruption persists.

cc:Mail Support lore has it that NFT errors reading are commonly associated with bad network cards and cabling problems. In practice, the NFTERROR.LOG can be used to identify specifically which cards are bad. It's not unusual for a cc:Mail database repair technician to recommend changing a network card indicated by a particular user repeatedly logging NFT errors.

In other cases, it may be observed that all of the users appearing in the NFTERROR.LOG file are users whose workstations are attached to a particular LAN segment.
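
The two patterns above (one user logging errors repeatedly, or many users on one segment) can be tallied with a short Python sketch. The layout of NFTERROR.LOG is not described here, so the sketch assumes the entries have already been parsed into records, and that the administrator has supplied the LAN segment for each user. The NftEvent record, the find_suspects function, and the threshold of 3 are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, NamedTuple, Set, Tuple


class NftEvent(NamedTuple):
    user: str     # user name taken from a log entry
    segment: str  # LAN segment, supplied by the administrator (assumed)
    kind: str     # e.g. "warning reading" or "error reading"


def find_suspects(events: List[NftEvent],
                  threshold: int = 3) -> Tuple[List[str], List[str]]:
    """Return (users, segments) worth investigating.

    A user with `threshold` or more events suggests a bad network card.
    A segment with `threshold` or more distinct affected users suggests
    a cabling problem on that segment.
    """
    per_user = Counter(e.user for e in events)
    users_on_segment: Dict[str, Set[str]] = {}
    for e in events:
        users_on_segment.setdefault(e.segment, set()).add(e.user)
    users = sorted(u for u, n in per_user.items() if n >= threshold)
    segments = sorted(s for s, who in users_on_segment.items()
                      if len(who) >= threshold)
    return users, segments


events = [
    NftEvent("alice", "seg-A", "error reading"),
    NftEvent("alice", "seg-A", "warning reading"),
    NftEvent("alice", "seg-A", "error reading"),
    NftEvent("bob", "seg-B", "warning reading"),
    NftEvent("carol", "seg-B", "warning reading"),
    NftEvent("dave", "seg-B", "error reading"),
]
users, segments = find_suspects(events)
```

For this made-up data, users is ["alice"] and segments is ["seg-B"]: one workstation whose card deserves a look, and one segment whose cabling does.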

NFT ERRORS WRITING:

An NFT error writing involves a read-back-and-compare operation for data written to certain database files. If the compare fails, then the data written to the database do not match the data in memory at the workstation. This could be because a corruption has occurred or because the data, although not corrupted within the database file, are corrupted dynamically on each attempt to read back and compare, up to the point where the cc:Mail application ceases to retry (historically 20 attempts).

An error writing often indicates that the cc:Mail database has been corrupted. However, it is not impossible that data in the workstation's memory are corrupted, or that the data, though written faithfully to the cc:Mail database on the server, are corrupted dynamically on the network during all of the attempts to read them back and compare them, or at least the CRC, to the data in memory.

NFT ERRORS VERIFYING:

If, after a write operation, the subsequent read performed to compare the data written to the cc:Mail database with the data in memory (actually a read-back operation) itself fails, then, after some number of attempts, this constitutes an error verifying.
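
The difference between an error writing and an error verifying can be sketched in Python. This is an illustration under assumptions rather than the cc:Mail logic: the limit of 20 attempts is borrowed from the write case and assumed for verification too, warnings are left out for brevity, a failed read is modelled as an OSError, and write_page and its callbacks are hypothetical names.

```python
from typing import Callable

MAX_ATTEMPTS = 20  # historical retry limit for writes, assumed here for both


def write_page(data: bytes,
               write: Callable[[bytes], None],
               read_back: Callable[[], bytes]) -> str:
    """Write `data`, then read it back and compare it to memory.

    Returns "ok", "error writing" (the data read back never matched), or
    "error verifying" (the read-back itself failed on every attempt).
    """
    write(data)
    read_failures = 0
    for _ in range(MAX_ATTEMPTS):
        try:
            echoed = read_back()
        except OSError:
            read_failures += 1  # the read-back operation itself failed
            continue
        if echoed == data:
            return "ok"
    if read_failures == MAX_ATTEMPTS:
        return "error verifying"
    return "error writing"


store = {}


def failing_read() -> bytes:
    raise OSError("simulated read failure")


clean = write_page(b"abc", lambda d: store.update(page=d),
                   lambda: store["page"])
mangled = write_page(b"abc", lambda d: store.update(page=b"abd"),
                     lambda: store["page"])
unreadable = write_page(b"abc", lambda d: None, failing_read)
```

The three simulated cases give "ok", "error writing" and "error verifying" respectively. As the text notes, the compare alone cannot tell whether a mismatch arose in the file, in workstation memory, or on the wire.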

According to cc:Mail Support lore, NFT errors verifying are often associated with bad disk media at the server, and verify errors of increasing frequency may indicate an impending server hard drive failure.

According to cc:Mail, NFT errors are caused exclusively by lower-level network errors and the cc:Mail applications are merely detecting these errors. However, the cc:Mail applications are sometimes the only indicator that a problem exists. Verification of these errors depends upon independent confirmation through network analysis tools, such as Network General Sniffer and Novell LANalyzer, that there are in fact network errors.

IDENTIFYING THE REAL PROBLEM:

Often, because cc:Mail software is the only indicator that there is a problem, it is believed that cc:Mail software is actually the cause of the problem, or even the cause of other network errors that correlate positively with the incidence of NFT errors. Proving that cc:Mail is not the cause can be difficult in practice. Knowing how NFT checking works, however, suggests that pointing to cc:Mail when there are network problems is a questionable policy at best.

UNDERLYING NETWORK AND CC:MAIL SYSTEM DESIGN PROBLEMS:

When network bandwidth utilization is excessively high, the probability of NFT errors is increased. Ethernet networks are susceptible to this, because of the performance degradation characteristics of CSMA/CD technology, when utilization regularly runs over 30%. This problem can become critical for cc:Mail when there are cc:Mail system design problems or when there are underlying network design, capacity, or routing problems that interact with the cc:Mail system.
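
As a rough worked example of the 30% figure, utilization can be computed from a byte count over a sampling interval. The Python sketch below assumes a 10 Mbit/s Ethernet segment and a byte count taken from a network analyzer; the function name and the sample numbers are invented for illustration.

```python
ETHERNET_BPS = 10_000_000  # assumed link speed: 10 Mbit/s Ethernet
THRESHOLD = 0.30           # utilization level cited in the text


def utilization(bytes_observed: int, interval_s: float,
                link_bps: float = ETHERNET_BPS) -> float:
    """Fraction of link capacity used during the sampling interval."""
    return (bytes_observed * 8) / (interval_s * link_bps)


# 450,000 bytes seen in one second is 3.6 Mbit/s, or 36% of 10 Mbit/s.
busy = utilization(450_000, 1.0)
busy_is_over = busy > THRESHOLD

# 125,000 bytes in one second is 1 Mbit/s, or 10%.
quiet = utilization(125_000, 1.0)
quiet_is_over = quiet > THRESHOLD
```

A segment that sits above the threshold for sample after sample, rather than in occasional bursts, is the case the text describes as "regularly" over 30%.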

Common cc:Mail system design issues have to do with the message routing topology, the configuration of cc:Mail Routers, the deployment of post offices in a given LAN or WAN environment, and so forth. Common network problems can involve bottlenecks caused by network design itself, the deployment of servers in a given LAN or WAN environment, the hardware and software configurations of network routers, and so forth.

LONG-TERM SOLUTIONS:

Anyone who's been through a serious cc:Mail database corruption wants a guarantee that it won't happen again. Identifying and correcting the underlying problem, however, may require a review of the cc:Mail system design and of the network design.

Long-term solutions might range from isolating building or campus networks from segments that act as WAN backbones and upgrading network routers, to moving post offices or adding new post offices or file servers.

In well-designed network environments, NFT errors are extremely rare or simply nonexistent. Experience indicates that the more NFT errors there are, the more likely post office corruptions and other cc:Mail problems become.

 



 


©1995-2005 by Global System Services Corporation (GSS). Portions of this material are copyright ©1995-1999 by Ron Herardian