High availability (failure of the inner server)

The DMZ server is able to accept incoming data even if the inner server cannot be reached. Possible causes are a disturbance in the network connection between the inner Integration Server and the DMZ server, network maintenance, or a restart of the inner server. By default, it is assumed that the maintenance window is no longer than two hours, but this setting can be changed in the configuration (see parameter "lifeTime" in section “Availability of CommunicationLogService (offline mode)”).

A DMZ cluster of at least two DMZ servers can provide uninterrupted availability of the DMZ side.

The availability strategies of the AuthenticationService and the CommunicationLogService differ because their goals differ. The AuthenticationService uses a caching strategy, whereas the CommunicationLogService falls back to persistent messages between the DMZ server and the inner server if synchronous messages fail.

Availability of MessageAuthenticationService through caching

The MessageAuthenticationService (configuration file ./etc/auth_dmz.xml, or as entered in ./etc/factory_dmz.xml) in a DMZ scenario responds to requests it receives from communication services by forwarding them in a synchronous message to the AuthenticationService of the inner server. If the inner server is not accessible, the MessageAuthenticationService must still be able to process requests. For this, it accesses a local copy of the database tables of the AuthenticationService. This local database copy is called the cache database, or simply the cache. To minimise the risk of using outdated data from the cache, the following caching rules are implemented.

Caching rules

  1. If the inner server responds, the response is used (online mode).

  2. If the inner server does not respond, the cache is used (offline mode).

  3. The cache is filled when the DMZ server is restarted: all partners, partner channels and certificates are stored completely in the cache. This process is called initial update. It corresponds to a complete (full) update, see rule 7.

  4. If an initial update fails, it is repeated every 2 minutes until a configurable number of attempts (maxInitAttempts, default: 12) is reached.

  5. During operation, the cache is refreshed periodically (updatePeriod, default: 900000 ms = 15 minutes) by a partial update. In the process, only the records that changed are updated. Records marked as 'deleted' are also marked as 'deleted' in the cache. However, records that have been physically deleted remain in the cache. For this reason, a complete update is required after some time to 'clean' the database. Changes to records are immediately pushed by the inner Integration Server to all DMZ MessageAuthenticationServices.

  6. Whenever the MessageAuthenticationService of the DMZ server successfully accesses the inner AuthenticationService by message, the response message carries a “syncRequired” flag if the inner AuthenticationService holds modified data that the DMZ server has not replicated yet. The DMZ server then starts an immediate replication, after which the update period is reset. This feature can be switched off using the configuration parameter ignoreNotification.

  7. In a full update (see below), a check is executed first to see if the inner server is reachable. If not, the cache data remains untouched. If the inner server is reachable, the cache is purged completely and all partner data, partner channels and certificates are stored again. If restricted caching by channel types (cachedChannelTypes) is configured, only allowed types are cached. Partners that do not have channels of an allowed type are not replicated.

    During the time between deleting the old data and completely storing the new data, the cache is invalid. Requests of the communication services during this phase that cannot immediately be handled online by the inner server are temporarily postponed. After a configurable timeout (cacheUpdateTimeout, default: 8000 ms), those requests fail with an exception if the update has not finished in the meantime. This situation can only occur if the network fails during a full update and a request to the communication layer happens simultaneously.

    Requests in online mode (inner server is reachable) are always handled by the inner server, even if a full update is executed.

    The full update is not executed in offline mode, so requests can be responded to from the unchanged cache. Only the situation of a collapsed network during a full update is critical. However, failures shorter than the cacheUpdateTimeout will be handled.

  8. The following cases will trigger a full update:

    1. If the DMZ server was started (initial update, i.e. full update with retry).

    2. If the inner AuthenticationService was restarted.

    3. If the fullUpdatePeriod has passed.

    4. The first update of a day is always a full update.

  9. If the next update (because of one of the rules in item 8) needs to be a full one, but the inner server was not reachable, no immediate retry is executed (unless it is an initial update). Instead, the update is cancelled. The next update will then be scheduled as a full update.

  10. If the inner server (or its AuthenticationService) is restarted in a DMZ cluster, the next update is scheduled as a full update.

  11. If in a DMZ cluster one of the DMZ servers is restarted, only its cache is completely updated. The other DMZ servers have their own rules.
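The tunable values named in the caching rules are set in ./etc/auth_dmz.xml. The following fragment is a minimal sketch using the Set syntax described in the section “Summary and parameter list”. The values are illustrative assumptions, not recommendations.

```xml
<Configure class="com.ebd.hub.services.auth.MessageAuthenticationService">
  <!-- Rule 4: allow up to 20 attempts for the initial update (default: 12) -->
  <Set name="maxInitAttempts">20</Set>
  <!-- Rule 7: postponed requests wait up to 12 s for a valid cache (default: 8000 ms) -->
  <Set name="cacheUpdateTimeout">12000</Set>
  <!-- Rule 6: keep the immediate replication after "syncRequired" enabled (default: false) -->
  <Set name="ignoreNotification">false</Set>
</Configure>
```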

Periodic cache replication

Data stored in the cache can be updated periodically. The period is set using parameter updatePeriod. After each period, every DMZ server asks its inner server if there have been updates. Typically, a partial update is then started. If more time than the configured fullUpdatePeriod has passed, a full update is executed. If a restart of the inner server or its AuthenticationService is detected at the start of a partial update, a full update is executed instead. In a partial update, version numbers govern which data needs to be updated. For this purpose, it is first determined which partner IDs are affected and whether the change affects the record of the partner, the channels or the certificates. Therefore, the amount of data is minimal and the update takes little time. This means that even periods of about one minute are possible. A negative value or a positive value below three seconds is ignored and the default value is used instead.

The message log (./logs/services/message.log) stores the time a partial update occurred.

... SYSTEM:AUTH:MSGAUTHSERVICE: starting partial update of local cache database for partnerIDs: []

If the ID list is empty, no data was copied and only version numbers have been updated.

If updatePeriod is 0, no update is executed during runtime: no full update after the fullUpdatePeriod, none after a restart of the inner server, and none when the current date changes. The immediate update after a 'sync required' message from the inner server is not executed either. The DMZ cache is not altered after the initial update on DMZ start.

If updatePeriod is set to a very large value, that period will never be reached in practice, but the immediate update after “sync required” is still executed unless ignoreNotification is set to "true". In this case, an update is executed only if the DMZ server sends requests to the inner server and relevant changes exist, possibly filtered by channel types according to section “Restricted caching by channel types (cachedChannelTypes)”. If an update is started in this manner, a full update is executed if, in the meantime, the current date has changed, the inner server has been restarted or the fullUpdatePeriod has been reached.

If fullUpdatePeriod is set to 0 (default value), full updates are only executed if updates are scheduled at all (see above) and if either the current date changed or the inner server has been restarted.

If fullUpdatePeriod has a very small positive value, e.g. 1, all updates are executed as a full update. This is not recommended. When those updates occur is governed by the rules described above, not by the setting of this parameter.
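As an illustration of the two period parameters, the following sketch for ./etc/auth_dmz.xml requests a partial update every minute and a full update once six hours have passed. Both values are assumptions chosen for the example; the arithmetic is 60 × 1000 ms and 6 × 3600 × 1000 ms.

```xml
<Configure class="com.ebd.hub.services.auth.MessageAuthenticationService">
  <!-- Partial update every 60000 ms = 1 minute (default: 900000 ms = 15 minutes) -->
  <Set name="updatePeriod">60000</Set>
  <!-- Full update after 21600000 ms = 6 hours (default: 0) -->
  <Set name="fullUpdatePeriod">21600000</Set>
</Configure>
```

With these values, the daily full update and the full update after a restart of the inner server still apply as described above.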

Configuration of parameter subID

The inner server sends the flag “sync required” to a DMZ server only until that DMZ server has refreshed its cache. In a DMZ cluster, the inner server therefore needs to distinguish between the different DMZ servers. To achieve this, the subID is used. During the DMZ server start, a value for the subID is computed from the host name and the IP addresses of the server. The value is in the range 0...31, so it is unlikely but possible that two DMZ servers receive an identical subID. In that case, an immediate update would reach only one of the servers (which one is random), and a unique subID for each DMZ server needs to be guaranteed through explicit configuration.
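An explicit subID could be configured as in the following sketch for ./etc/auth_dmz.xml. The concrete value is an assumption; what matters is that each DMZ server in the cluster uses a different value from the range 0 to 31.

```xml
<Configure class="com.ebd.hub.services.auth.MessageAuthenticationService">
  <!-- Example for one DMZ server; assign a different value (0 to 31) on every other DMZ server -->
  <Set name="subID">1</Set>
</Configure>
```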

Shortening access time in offline mode (maxWait)

The DMZ server can only detect that the inner server is unavailable by not receiving a response to a request within a timeout. The timeout is set using the parameter maxWait; the default value is 8000 ms (= 8 seconds). If no response arrives within that time, the request is answered from the cache. For subsequent requests, the waiting period for synchronous messages is shortened to (maxWait/10) + 200, which is 1000 ms for the default maxWait. As soon as a request to the inner server succeeds again, the waiting period is set back to maxWait.
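The following sketch for ./etc/auth_dmz.xml lowers maxWait; the value is an assumption for illustration. With maxWait set to 4000 ms, the formula above yields a shortened waiting period of (4000/10) + 200 = 600 ms while the inner server is unreachable.

```xml
<Configure class="com.ebd.hub.services.auth.MessageAuthenticationService">
  <!-- Wait at most 4000 ms for the response to a synchronous message (default: 8000 ms) -->
  <Set name="maxWait">4000</Set>
</Configure>
```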

Persistent or temporary cache (localDbFile)

By default, the local cache database of the DMZ server is temporary and held in main memory only. In this case, the cache is lost when the DMZ server is restarted. If the DMZ server is restarted in this scenario and the inner server is unreachable, the DMZ server cannot accept requests independently.

However, the local cache database can also be stored in the file system of the DMZ server persistently. This can be achieved by using parameter localDbFile, which holds a relative or absolute path to a table space file. The file is created at the specified location. If the folder does not exist, it is created automatically.

If using a persistent cache, the DMZ server can access a valid cache after being restarted. In this situation, an immediate attempt is made to refresh the cache with an initial update, but if the inner server is not accessible, the old cache content remains valid, even if the update attempts are abandoned after reaching the maximum number.

Caution: If the DMZ server accepts requests independently, they count as received, but they are not processed. It is the responsibility of the user to ensure that the inner server becomes operational again in due course. The received data is stored as a persistent message until the inner server can receive it. Once the lifeTime of the persistent message has expired, the received data is lost. The ability of the DMZ server to accept data independently is therefore only suited to bridging short network failures or maintenance windows.

Caution: If a persistent cache is used, the communication parameters are stored in the file system of the DMZ server. Using a temporary cache, therefore, protects you much better against unauthorised access of this data. However, the passwords are stored encrypted in any case.
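A persistent cache could be enabled as in the following sketch for ./etc/auth_dmz.xml. The path is hypothetical and only serves as an example of a relative path; any relative or absolute path to the table space file can be used.

```xml
<Configure class="com.ebd.hub.services.auth.MessageAuthenticationService">
  <!-- Hypothetical path; missing folders are created automatically -->
  <Set name="localDbFile">./data/auth_cache/cache.db</Set>
</Configure>
```

Leaving the parameter undefined keeps the cache in main memory, which is the safer choice with regard to the second caution above.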

Restricted caching by channel types (cachedChannelTypes)

If the DMZ server only supports individual communication protocols, you can restrict the caching function to the corresponding types of partner channels. In order to do so, use parameter cachedChannelTypes, which defines a list of those types. The list values are delimited by a comma, space or tab. Each channel type is specified by a numerical value. If the list is empty (default), all types are copied to the cache database. A restriction applies to all cache update types (initial, complete, partial).

The numerical values have the following meaning.

| Value | Channel type |
| --- | --- |
| 1 | TYPE_COMM_AS2 |
| 2 | TYPE_COMM_FTP |
| 3 | TYPE_COMM_OFTP |
| 4 | TYPE_COMM_HTTP |
| 5 | TYPE_COMM_MAIL |
| 6 | TYPE_COMM_SSH |
| 7 | TYPE_COMM_X400 |
| 8 | TYPE_COMM_FAX |
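For example, a DMZ server that only handles AS2 and HTTP could restrict its cache to the values 1 and 4 listed above. The fragment is a sketch for ./etc/auth_dmz.xml, and the choice of protocols is an assumption.

```xml
<Configure class="com.ebd.hub.services.auth.MessageAuthenticationService">
  <!-- Cache only AS2 (1) and HTTP (4) partner channels -->
  <Set name="cachedChannelTypes">1,4</Set>
</Configure>
```

Partners without a channel of one of these two types would then not be replicated to this DMZ server.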

Deactivate caching (noCaching)

If the configuration parameter noCaching has the value "true" when the MessageAuthenticationService is started, the local cache database is not created and no updates are executed. Therefore, all requests need to be directed to the inner server. If the inner server is not available, the request fails with an exception.
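Caching could be switched off as in the following sketch for ./etc/auth_dmz.xml.

```xml
<Configure class="com.ebd.hub.services.auth.MessageAuthenticationService">
  <!-- No local cache database is created; every request goes to the inner server -->
  <Set name="noCaching">true</Set>
</Configure>
```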

Summary and parameter list

The configuration file ./etc/auth_dmz.xml contains the most important configuration parameters for the service as comment blocks (see table below). All parameters are located within the Configure element.

<Configure class="com.ebd.hub.services.auth.MessageAuthenticationService">
...
</Configure>

Each parameter is defined as follows (shown schematically).

<Set name="parameterName">parameterValue</Set>   

Parameters

| Parameter | Function | Default value |
| --- | --- | --- |
| cachedChannelTypes | Comma-delimited list of numeric channel types that should be stored in the cache. If undefined, all types (1,2,3,4,5,6,7,8) are cached, see section “Restricted caching by channel types (cachedChannelTypes)”. | All types. |
| cacheUpdateTimeout | Timeout [ms] to wait for a valid cache before a request fails. | 8000 (8 seconds) |
| defaultTarget | <host>:<port> of the inner MessageService. | Default port: 8020 (if only the host has been specified) |
| defTargetService | Name of the inner AuthenticationService. | Authentication Service |
| fullUpdatePeriod | Period [ms] after which the next update is a complete one. Value 0 means no complete updates except for once a day and after a restart. | 0 (no complete update) |
| ignoreNotification | If true, the immediate cache replication is switched off. | false |
| localDbFile | Path to the table space of a persistent cache database. If defined, the file including all folders is created. If undefined, the cache is stored in the main memory. Note: See also section XML Configuration (DatabaseService) (→ addRolloverHandler). | Not persistent. |
| maxInitAttempts | Number of attempts for an initial update. | 12 |
| maxWait | Waiting time [ms] for the response of a synchronous message. | 8000 (8 seconds) |
| messageContext | Context of the consumer queue of the inner AuthenticationService. | System |
| messageQueue | Name of the consumer queue used for requests to the inner AuthenticationService. It must match the name of the queue the inner AuthenticationService is registered to. | AuthCall |
| noCaching | true disables caching in general. | false |
| subID | Integer between 0 and 31 to identify a DMZ server in a cluster, see section “Configuration of parameter subID”. | Calculated automatically. |
| updatePeriod | Period [ms] for cache updates. Value 0 means no update. Negative values or values between 1 and 3000 are ignored. | 900000 (15 minutes) |

Availability of CommunicationLogService (offline mode)

The CommunicationLogService records communication events in the Communication Log, for example the upload of a file via FTP, including timestamp, job number, etc. The database table is maintained by the inner server. On the DMZ server, the CommunicationLogService is provided by class com.ebd.hub.services.commlog.MessageCommunicationLogService. The file ./etc/commlog_dmz.xml is used to configure the service, see table below.

In online mode (regular case), accesses to communication services on the DMZ server are immediately forwarded to the inner server. The MessageCommunicationLogService uses synchronous messages to the inner server. If a synchronous message cannot be delivered, for example because the inner server is down, or if the response is not received within the maximum waiting time maxWait for any other reason, a persistent message is generated (offline mode).

In offline mode, the MessageService independently tries to forward this persistent message. As soon as the inner server can be reached again, the communication event is logged retroactively. However, if the lifeTime of this message is exceeded, the log entry is lost. Note: The CommunicationLogService cannot log a job number in offline mode because the data has not been processed yet. These entries therefore have no job number.

The MessageCommunicationLogService accesses its local MessageService using a service name that can be configured in parameter messageServiceName. Additionally, it needs access to its local AuthenticationService through the service name, which can be configured using parameter authenticationServiceName. Valid defaults are preconfigured. An explicit configuration is only necessary in special cases.

Analogous to the MessageAuthenticationService, the Message Queue, which is used for communication with the inner server, can be configured using parameters messageContext, messageQueue and defTargetService. Valid default values are preconfigured. An explicit configuration is only necessary in special cases and only in combination with the inner server.

The defaultTarget defines the message route to the inner server and does not need to be configured here if it has already been configured for the MessageAuthenticationService.
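If the route does have to be set for this service, it could look like the following sketch for ./etc/commlog_dmz.xml. The address is the example address used in the parameter table of this file and stands in for the actual inner server.

```xml
<Configure class="com.ebd.hub.services.commlog.MessageCommunicationLogService">
  <!-- <host>:<port> of the inner MessageService; the port defaults to 8020 if omitted -->
  <Set name="defaultTarget">192.168.254.12:8020</Set>
</Configure>
```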

The parameter maxWait defines the maximum time in milliseconds to wait for a response to a synchronous message. The default value of 8000 ms should be sufficient for all use cases. The value should be large enough that responses arriving after that time do not occur in practice.

The parameter lifeTime should be configured explicitly if you need to bridge gaps that could take longer than the default value of two hours. If the network connection is lost for a longer period than configured in lifeTime, logs will be lost.
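To bridge a longer maintenance window, lifeTime could be raised as in the following sketch for ./etc/commlog_dmz.xml. The eight-hour window is an assumed example; the value is 8 × 3600 × 1000 ms.

```xml
<Configure class="com.ebd.hub.services.commlog.MessageCommunicationLogService">
  <!-- Keep persistent messages for 28800000 ms = 8 hours (default: 7200000 ms = 2 hours) -->
  <Set name="lifeTime">28800000</Set>
  <!-- Timeout for synchronous messages, left at the default of 8000 ms -->
  <Set name="maxWait">8000</Set>
</Configure>
```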

The file ./etc/commlog_dmz.xml contains comment blocks that hold the most important configuration parameters for the service. All parameters are located in the Configure element.

<Configure class="com.ebd.hub.services.commlog.MessageCommunicationLogService">
...
</Configure>

Each parameter is defined as follows (shown schematically).

<Set name="parameter_name">parameter_value</Set>    

Parameters in file ./etc/commlog_dmz.xml:

| Parameter | Function | Default value |
| --- | --- | --- |
| authenticationServiceName | Registered name of the AuthenticationService on this DMZ server. | AuthenticationService |
| defaultTarget | <host>:<port>, where the remote interface of the inner MessageService is reachable. For example 192.168.254.12:8020 | 8020 (default value for the port if only the host and no port is specified) |
| defTargetService | The name under which the inner CommunicationLogService is registered. | CommunicationLogService |
| lifeTime | Lifetime for persistent messages in ms. | 7200000 (2 hours) |
| maxWait | Timeout using synchronous messages in ms. | 8000 (8 seconds) |
| messageContext | Context of the consumer queue of the inner CommunicationLogService. | System |
| messageQueue | Name of the consumer queue that is used for requests to the inner CommunicationLogService. It needs to match the name of the queue that the inner CommunicationLogService is registered to. | CommlogCall |
| messageServiceName | Registered name of the MessageService on this DMZ server. | MessageService |