Maintaining Data Integrity

The list of files stored on S1 makes the role of the primary server clear. All machines, servers and clients alike, maintain local copies of the account list and initialization file data in memory throughout their execution lifetimes, but (under ideal conditions) only S1 retains master copies of the data. The master copies serve as the official records for all PCs on the network.

The primary server is therefore responsible for:

•      Receiving new data from other PCs,

•      Forwarding new data to all other PCs, in order to keep all local copies of the data synchronized (identical), and

•      Storing the data on a physical medium to create/maintain the master copies.

Meanwhile, the other machines are responsible for:

•      Forwarding new locally-generated data to the server,

•      Receiving (new) data from the server, and

•      Storing the data on a physical medium whenever new data arrives, in order to keep all replicated data files synchronized with the master copies.
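The two sets of responsibilities above can be pictured with a minimal Python sketch. It is an illustration only, not the Security Manager's actual code: the class names, the method names, and the use of dictionaries in place of the data files on disk are all assumptions. The sketch also assumes that the primary server sends each update back to the PC that submitted it, so that every PC stores the data when it arrives from the server.

```python
class Machine:
    """Any PC other than the primary server (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.memory = {}  # local in-memory copy
        self.disk = {}    # stands in for the replicated data files

    def submit(self, primary, key, value):
        # Forward new locally generated data to the server.
        primary.receive(key, value)

    def on_update(self, key, value):
        # Receive data from the server and store it as soon as it arrives.
        self.memory[key] = value
        self.disk[key] = value


class PrimaryServer:
    """The primary server, which owns the master copies (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.memory = {}
        self.master = {}  # stands in for the master copies on disk
        self.peers = []   # every other PC on the network

    def receive(self, key, value):
        # Receive the new data and update the master copy...
        self.memory[key] = value
        self.master[key] = value
        # ...then forward it to every other PC to keep them synchronized.
        for peer in self.peers:
            peer.on_update(key, value)


s1 = PrimaryServer("S1")
c1, s2 = Machine("C1"), Machine("S2")
s1.peers = [c1, s2]
c1.submit(s1, "alice", "administrator")
```

After the call to submit, the in-memory copies and the stored copies on C1 and S2 match the master copy on S1.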

In this way, if a network-wide failure occurs, such as a power outage, any online modifications will be preserved. But what happens if a single machine fails? In such a case, several scenarios are possible:

Scenario 1: A client fails [tested]:

If C1 fails, it does not affect the local copies on any of the other PCs, and does not affect the master copies on S1.

Scenario 2: A server other than the primary server fails [tested]:

If S2 fails, it still does not affect the local copies on any of the other PCs, and still does not affect the master copies on S1. Non-primary (backup) servers behave as clients while the primary server is active. In our example, S2 and S3 act as clients for as long as S1 is active.

Scenario 3: The primary server fails [tested]:

If S1 fails (for instance, because it loses its network connection), no new data can be received, forwarded, or stored. The master copies will not be updated, and the local copies on the remaining PCs will eventually fall out of synchronization. As a result, the Security Manager for Vapor will not behave consistently across all remaining PCs.

To prevent this from occurring, backup servers constantly monitor their status as clients. If the primary server fails, a backup server is chosen to replace it, and its status is promoted from client to server. In our example, if S1 loses its network connection, the status of S2 is promoted to server.

The newly-active server S2 will then assume that its local copies of the data are the most recent. It will forward and store its local copies, update its replicated data files to create new working master copies, and assume the responsibilities of the failed primary server. In this way, the Security Manager minimizes loss of its data due to failure.
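The promotion step can be sketched as follows. This is a hedged illustration under stated assumptions, not the product's implementation: the Node class, the promote function, and the dictionaries standing in for memory and disk are hypothetical names introduced here.

```python
class Node:
    """A PC with an in-memory copy and an on-disk copy (hypothetical model)."""

    def __init__(self, name, data):
        self.name = name
        self.role = "client"
        self.memory = dict(data)
        self.disk = dict(data)


def promote(backup, remaining):
    """Promote a backup server and push its local copies to the remaining PCs."""
    backup.role = "server"
    # The new server treats its own local copies as the most recent.
    backup.disk = dict(backup.memory)  # new working master copies
    for node in remaining:
        node.memory = dict(backup.memory)
        node.disk = dict(backup.memory)


s2 = Node("S2", {"alice": "administrator", "bob": "operator"})
c1 = Node("C1", {"alice": "administrator"})  # missed the latest update
promote(s2, [c1])
```

In this sketch C1 ends up with the same data as S2, including the entry it had missed before S1 failed.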

Scenario 4: The primary server recovers [tested]:

The order in which replacements are chosen is given in the App\SecurityManager.ini file that is stored on each PC. This order forms a chain of responsibility (i.e. the server listed last becomes responsible for maintaining data integrity when all other servers have failed).
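The selection rule implied by this chain can be sketched in a few lines of Python. The function name, the server names in the list, and the idea of passing in a set of reachable servers are assumptions made for illustration; the sketch does not read App\SecurityManager.ini and makes no claim about that file's layout.

```python
def active_server(order, reachable):
    """Return the first server in the configured order that is reachable."""
    for name in order:
        if name in reachable:
            return name
    return None  # every server in the chain has failed


order = ["S1", "S2", "S3"]  # hypothetical order, as it might be configured

all_up = active_server(order, {"S1", "S2", "S3"})
s1_down = active_server(order, {"S2", "S3"})
only_last = active_server(order, {"S3"})
```

Here all_up is "S1", s1_down is "S2", and only_last is "S3". Because the rule always prefers the earliest reachable entry, a recovered S1 becomes the active server again as soon as it is reachable.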

This chain of responsibility also applies when the primary server (or any other failed server that appears earlier in the order) recovers. If S1's network connection is restored, it once again becomes the active server. The status of S2 is demoted to client, and it relinquishes its data synchronization duties.

When S1 resumes its responsibilities, it asserts that its local copies of the data are the most recent. As S2 completes its transition from server to client, it requests the data from S1. S2's local copies can then be updated.
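The handover described in this scenario can be sketched as follows. The Node class and the hand_back function are hypothetical names, and dictionaries stand in for the data held in memory; the sketch only mirrors the steps stated above and is not the Security Manager's actual code.

```python
class Node:
    """A PC with a role and an in-memory copy of the data (hypothetical model)."""

    def __init__(self, name, role, data):
        self.name = name
        self.role = role
        self.memory = dict(data)


def hand_back(primary, stand_in):
    """Return control to the recovered primary and resynchronize the stand-in."""
    primary.role = "server"    # the primary treats its own copies as the most recent
    stand_in.role = "client"   # the stand-in gives up its synchronization duties
    # The stand-in requests the data from the primary and updates its local copies.
    stand_in.memory = dict(primary.memory)


s1 = Node("S1", "client", {"alice": "administrator"})
s2 = Node("S2", "server", {"alice": "operator"})
hand_back(s1, s2)
```

Under these assumptions S2's local copies are replaced by S1's, so after the handover both machines hold the data that S1 had when it recovered.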

Note: Other scenarios that involve data loss are described in the section that follows.