Path to operational excellence

Management Focus

Management focus is important in ensuring efficient and effective operations. Managers should prioritize and dedicate their attention, resources, and effort towards improving integration operations. They should identify and set priorities, establish strategies and goals, and monitor progress to ensure that the company’s objectives are met efficiently and effectively. Management should also focus on continuous improvement to sustain operational excellence.

Improve Monitoring

To improve monitoring, it is recommended to monitor all servers and processes manually at fixed intervals, such as every 4 hours, and to manually check infrastructure usage such as CPU and memory. Monitoring can be improved further through tools such as Splunk and Qlik, and the alerting mechanism can be enhanced by implementing automatic alerts/notifications upon process failure.

Resourcing and collaboration

Resourcing is crucial in ensuring efficient operation, and it is important to onboard resources with the right skill sets. These resources should first contribute to the development team and be trained frequently on security issues to prevent security incidents. They should also be encouraged to share knowledge across the team after issues are identified and resolved.

Coordination between the development and operations teams is essential, and the DevOps methodology should be followed. The team lead should ensure that operations-friendly code is developed. The development team should follow design principles and best practices, knowledge transfer should be completed between the operations and development teams, and the operations team should provide continuous feedback to the development team.

Analysis and reporting of issues

Analyzing failures is crucial: events generated by different processes should be monitored, failures should be analyzed to find the root cause, and corrective and preventive measures should be taken accordingly. Issues should be anticipated beforehand, knowledge articles should be created about known issues, and an active to-do list should be maintained. If external issues cause processes to stop, the processes should be restarted after those issues are resolved.

Status reporting to management is essential, and the operations team should give management insight into the operations. They should present a dashboard of incidents, service requests, failures, and improvements made, and highlight major issues such as failures, providing reasons for them. The operations team should be transparent with management, show the corrective measures taken, and ask for management support if needed.

Compliance and Security

To ensure compliance and security, internal audits should be performed at regular intervals to check process compliance. These audits should cover all aspects of operational governance to ensure the team is fulfilling its duties. User access to the application should be checked and verified periodically, and the access of users who have left the team should be removed. Patching in HA environments should be done in sequence so that it does not impact services; after patching is completed, server status should be verified manually. Pen-test and security findings should be mitigated with priority, and vulnerabilities should be managed through patch updates or other measures. It is recommended to proactively track security certificate renewals before expiry. Processes intended for DR testing should be deployed in production beforehand, but the actual processes running in production should not be used for DR testing, as that might interrupt current business operations.

Synchronous web service calls overhead

 

Implementing and managing a synchronous design pattern is easy in middleware, because most of the retry and error handling is taken care of by the source application itself. But it also has drawbacks. One drawback is that a thread is blocked until it gets a response back from the ESB. Another is that a web service invocation may result in a timeout when the ESB layer takes too long to reply to the source application, which impacts performance. It is sometimes hidden from view how many applications participate in this chain of invocations to complete the web service call. The following is a typical scenario.

The source application invokes a web service on the ESB and waits for the ESB to complete the process. The ESB in turn calls the target application and waits for a response. The target waits for a response from the database. The database waits to sync with the DR site before responding.

In the above example the entire chain spans five applications, so the source application waits until all five complete their tasks. That can take a considerable amount of time and may result in timeouts, duplicate-message issues, and so on. In this context, asynchronous invocation is a better option.
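As an illustration, here is a minimal Java sketch (the endpoint URL is hypothetical) contrasting the two styles with the JDK's built-in HTTP client: the synchronous call parks a thread for the whole chain, while the asynchronous call frees the thread and handles the response in a callback.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.time.Duration;
    import java.util.concurrent.CompletableFuture;

    public class InvocationStyles {
        static final HttpClient client = HttpClient.newHttpClient();
        static final HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://esb.example.com/orders"))     // hypothetical ESB endpoint
                .timeout(Duration.ofSeconds(30))                       // guard against a slow chain
                .POST(HttpRequest.BodyPublishers.ofString("<order/>"))
                .build();

        public static void main(String[] args) throws Exception {
            // Synchronous: this thread blocks until the whole source -> ESB -> target
            // -> database -> DR chain completes, or the 30-second timeout fires.
            HttpResponse<String> sync = client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println("sync status: " + sync.statusCode());

            // Asynchronous: the thread is released immediately; the response (or
            // failure) is handled later in a callback.
            CompletableFuture<HttpResponse<String>> async =
                    client.sendAsync(request, HttpResponse.BodyHandlers.ofString());
            async.thenAccept(r -> System.out.println("async status: " + r.statusCode()))
                 .exceptionally(e -> { System.err.println("failed: " + e); return null; })
                 .join(); // only so this demo JVM waits; a real caller would not block here
        }
    }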

Test Cases

The test case document is one of the artifacts that the ESB team needs to prepare.

Normally the ESB team should do the testing in the Dev and QA environments before processes are handed over to the program or project team for UAT.

We should capture test cases covering each function mentioned explicitly or implicitly in the FS. You should ask the source system team to provide sample payloads; you can use them in testing tools (SoapUI, Postman, etc.) to trigger the processes.

You should involve the source and target application teams in testing the process in the development environment if possible, rather than involving them only in higher environments for UAT. The sooner you involve them, the better. They should start with sanity testing and a dry run afterwards. You should detect as many bugs as possible in the development environment itself; this saves effort, as detecting and fixing bugs in higher environments is costlier.

The test cases I have mentioned below should be prepared and executed by the middleware team.

The UAT test plan should be prepared and executed by the respective application teams intended to receive or send data, because most of the time the middleware team does not have access to the source and target applications.

You can use the same document to capture test results; that is why fields related to test results are included.

The test case document should have the following contents.

List of processes to be tested.

 

Sr No | Process Name | Successfully Tested (Y/N) | Date of Testing | Comments

(You can have a separate test case document for each process if the processes are complex and there is a large number of test cases for each process.)

Process Name: (Mention the name of the process to be tested)

Testing scenarios:

(You should capture high-level test scenarios here. Normally you should do negative testing first. I have given sample test cases as per the sample FS, which transfers a web service payload to the destination after a database lookup. You should order the test case scenarios according to the functionality executed by the process.)

 

Test Case No | FS-ID | Description | Accepted (Yes/No) | Reject Reason | Comments
TC-ERR-01 | FS-ERR-1 | Check handling of invalid messages sent from the source | | |
TC-ERR-02 | FS-ERR-2 | Check connection failure to the database | | |
TC-ERR-03 | FS-ERR-3 | Check connection failure to the target system | | |
TC-GEN-04 | FS-GEN-1, FS-GEN-2, FS-GEN-3, FS-GEN-4, FS-ERR-3 | Check all the processing steps of a successful message transfer | | |

 

Test case: TC-ERR-01 (for each test case, you should define detailed steps)

Description: Check handling of invalid messages sent from the source

Steps

Step No | Step Description | Test Data | Expected Results | Actual Results | Date of Execution | Execution Status (Passed/Failed) | Comments
1.1 | Trigger the process with an invalid payload (from any SOAP/REST API testing tool such as Postman or SoapUI) | Payload = invalid payload | 1. An error mail notification with an invalid-payload message should be sent to the support group. 2. An audit entry should be inserted in the audit table. | | | |

 

 

Test case: TC-ERR-02

Description: Check connection failure to the database system

(Mention the actual test data in the Test Data column.)

Steps

Step No | Step Description | Test Data | Expected Results | Actual Results | Date of Execution | Execution Status (Passed/Failed) | Comments
2.1 | 1. Change the database connection to a wrong JDBC URL. 2. Trigger the process (from any SOAP/REST API testing tool such as Postman or SoapUI). | JDBC URL = wrong URL; Payload = valid payload | 1. An error mail notification with a database connection failure message should be sent to the support group. 2. An audit entry should be inserted in the audit table. | | | |
2.2 | 1. Set the connection string to the correct JDBC URL. 2. Set the user id/password to invalid credentials. 3. Trigger the process. | JDBC URL = correct URL; User id/password = invalid; Payload = valid | 1. An error mail notification with an invalid-credentials message should be sent to the support group. 2. An audit entry should be inserted in the audit table. | | | |

 

Test case: TC-ERR-03

Description: Check connection failure to the target system

Steps

Step No | Step Description | Test Data | Expected Results | Actual Results | Date of Execution | Execution Status (Passed/Failed) | Comments
3.1 | 1. Change the target connection URL to a wrong URL as per the test data. 2. Set the database connection to a valid URL with valid credentials. 3. Trigger the process. | Target URL = invalid URL; JDBC = correct URL; Payload = valid | 1. An error mail notification with a connection failure message should be sent to the support group. 2. An audit entry should be inserted in the audit table. | | | |
3.2 | 1. Set the connection string to the correct URL. 2. Set the user id/password to invalid credentials. 3. Trigger the process. | Connection URL = valid URL; User id/password = invalid; Payload = valid | 1. An error mail notification with an invalid-credentials message should be sent to the support group. 2. An audit entry should be inserted in the audit table. | | | |

 

Test case: TC-GEN-04

Description: Check successful message delivery

Steps

Step No | Step Description | Test Data | Expected Results | Actual Results | Date of Execution | Execution Status (Passed/Failed) | Comments
4.1 | 1. Set the connection credentials to correct values. 2. Trigger the process with a valid payload. | Connection URL = valid; JDBC = correct URL; Credentials = correct; Payload = valid | 1. A success return code is sent from the target. 2. An audit entry should be inserted in the audit table. | | | |

Functional Specification

The FS is important for any project. Normally you should get it from the customer project team; if not, you have to prepare one.

An integration FS should have the following contents. I have included some sample diagrams.

Overview

Application context diagram

Interface list

(put message type, source, and destination)

Business process diagram for each use case

(A sample flowchart is given below)

 

S. No | Steps | Descriptions
1 | Boomi receives a request from the source | The source invokes the Boomi web service and passes the payload
.. | .. | ..

 

Sequence diagram for each use case.

(A sample sequence diagram is given below)

Functional requirement

(Here you should describe the detailed functional requirements, as below. You can have a separate section for other non-functional requirements, or you can put them all in a single tabular format.)

 

 

FS-ID | Interface/module | Description | Reference to URS
FS-GEN-1 | | The source should send the messages to the ESB process by invoking a REST endpoint | URS-GEN-1
FS-GEN-2 | | The ESB process should validate the messages | URS-GEN-2
FS-GEN-3 | | The ESB process should do a database lookup to enrich the messages | URS-GEN-3
FS-GEN-4 | | The ESB process should send the messages to the target after validation and enrichment | URS-GEN-4
.. | .. | .. | ..

 

 

Security requirement

A sample is given below:

  1. The source will use basic authentication to connect to the ESB process
  2. The ESB will connect to the database through JDBC with basic authentication
  3. The ESB will connect to the target with two-way SSL

Error handling requirement

 

FS-ID | Interface/module | Description | Reference to URS
FS-ERR-1 | | The ESB should check the validity of the payload and send an email notification accordingly | URS-ERR-1
FS-ERR-2 | | A connection failure to an external system should generate an email notification | URS-ERR-2
FS-ERR-3 | | An audit entry should be generated for successful and failed messages | URS-ERR-3
.. | .. | .. | ..

 

Audit requirement

Compliance requirement

Non Functional requirement

SLA requirements

Mapping and transformation requirement.

(This section is very important)

 

Message Name | Source field | Transformation | Target field

 

Volumetrics

Sample Messages

Sample XSD.

OAuth 2.0 Authorization Framework

OAuth 2.0 is an authorization framework that enables a client to access protected resources on behalf of a resource owner.

Its predecessor, OAuth 1.0, is now obsolete and was mainly specific to web clients. The OAuth 2.0 framework covers web as well as non-web clients, and it also simplified the interaction between the OAuth roles.

The following roles are defined in OAuth 2.0:

1. Resource owner

2. Client

3. Authorization server

4. Resource server

Refer to the sequence diagram below to see how the roles interact.

OAuth Roles Interaction

For accessing resources from an ESB process, we mainly need three roles, because here the client itself is the resource owner:

1. Client

2. Authorization server

3. Resource server

Refer to the sequence diagram below.

The client needs to pass an authorization grant to the authorization server to get an access token.

An authorization grant is a credential representing the resource owner's authorization, used to obtain an access token.

There are four Authorization grant types: Authorization Code, Implicit, Resource Owner Password Credentials, and Client Credentials.

Normally an ESB will use the Client Credentials grant to get an access token.

There are two types of clients: confidential and public.

A confidential client needs to authenticate to the authorization server with its client credentials. The authorization server provides the client id and client secret as credentials after the client is registered.

The authorization server returns an access token, and optionally its expiry time, after client authentication is successful.

Flow Chart.

The logic can be as follows.

  1. Get an access token (and optional expiry time) from the authorization server by presenting the client credentials (client id + client secret).
  2. Get the protected resources by presenting the access token.
  3. If the token expires, get a new access token by presenting the client credentials again.

If the ESB process accesses the protected resources infrequently, there is no need to store the access token in a cache; you can get a fresh access token for every new request. But when the process accesses a protected resource very frequently, it is better to cache the access token and reuse it. An access token normally expires after a certain duration, so storing and reusing it within that window gives better performance, as requesting a new access token for every call to a protected resource is time consuming. You can also use a refresh token, if one is available, to get a new access token.
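A minimal Java sketch of this logic, assuming the Client Credentials grant and a token endpoint that returns access_token and expires_in in a JSON body; the URL and credentials are hypothetical placeholders, and the JSON parsing is deliberately naive to keep the sketch dependency-free.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.nio.charset.StandardCharsets;
    import java.time.Instant;
    import java.util.Base64;

    public class TokenCache {
        private static final HttpClient http = HttpClient.newHttpClient();
        private static final URI TOKEN_URL = URI.create("https://auth.example.com/oauth2/token"); // hypothetical
        private static final String CLIENT_ID = "esb-client";     // issued at client registration
        private static final String CLIENT_SECRET = "change-me";  // issued at client registration

        private String accessToken;
        private Instant expiresAt = Instant.EPOCH;

        public synchronized String getToken() throws Exception {
            if (accessToken != null && Instant.now().isBefore(expiresAt)) {
                return accessToken;                     // reuse the cached token until it expires
            }
            String basic = Base64.getEncoder().encodeToString(
                    (CLIENT_ID + ":" + CLIENT_SECRET).getBytes(StandardCharsets.UTF_8));
            HttpRequest req = HttpRequest.newBuilder(TOKEN_URL)
                    .header("Authorization", "Basic " + basic)
                    .header("Content-Type", "application/x-www-form-urlencoded")
                    .POST(HttpRequest.BodyPublishers.ofString("grant_type=client_credentials"))
                    .build();
            String body = http.send(req, HttpResponse.BodyHandlers.ofString()).body();
            accessToken = field(body, "access_token");
            long expiresIn = Long.parseLong(field(body, "expires_in"));
            expiresAt = Instant.now().plusSeconds(expiresIn - 60);  // renew a minute early
            return accessToken;
        }

        // Naive field extraction so the sketch stays dependency-free; use a JSON library in practice.
        private static String field(String json, String name) {
            int start = json.indexOf(':', json.indexOf("\"" + name + "\"")) + 1;
            while (json.charAt(start) == ' ' || json.charAt(start) == '"') start++;
            int end = start;
            while ("\",}".indexOf(json.charAt(end)) < 0) end++;
            return json.substring(start, end);
        }
    }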

You may encrypt the stored access token as required for better security. Some ESB tools come with a built-in adapter for OAuth 2.0; you may also use a native HTTP client adapter for this purpose. With an HTTP adapter, you may need to make two HTTP calls: the first to get the access token and the second to access the protected resources.

Processing JMS messages in sequence


Processing messages in sequence is a challenge in an ESB.

We have a notion that a message queue serves messages in FIFO order, so the sequence will be preserved. But that is not correct: in various situations the message order will be violated.

Message Sequence Violation: Single Server with multiple threads

 

 

Message sequence may not be retained even though the JMS listener process is running on a single server.

When the subscriber process runs in a single-server environment, it may instantiate multiple subscriber instances when multiple messages are pushed by the message broker. Each instance runs in parallel, independent of the others, so message sequencing may be violated even though the message broker sends the messages in sequence.

Workaround: run the process in a single thread, and configure the message broker connection parameters such that the broker delivers a single message at a time.
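A sketch of this workaround, assuming an ActiveMQ client (other brokers expose similar settings); the broker URL and queue name are hypothetical, and consumer.prefetchSize=1 is the ActiveMQ destination option that makes the broker hand over one message at a time.

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.Message;
    import javax.jms.MessageConsumer;
    import javax.jms.Queue;
    import javax.jms.Session;
    import javax.jms.TextMessage;
    import org.apache.activemq.ActiveMQConnectionFactory;

    public class OrderedSubscriber {
        public static void main(String[] args) throws Exception {
            ConnectionFactory factory = new ActiveMQConnectionFactory("tcp://broker:61616");
            Connection connection = factory.createConnection();
            connection.start();
            // CLIENT_ACKNOWLEDGE: a message is acknowledged only after delivery to the
            // target succeeds, so a crash cannot silently skip a message in the sequence.
            Session session = connection.createSession(false, Session.CLIENT_ACKNOWLEDGE);
            // ActiveMQ destination option: the broker hands over one message at a time.
            Queue queue = session.createQueue("ORDERS.IN?consumer.prefetchSize=1");
            MessageConsumer consumer = session.createConsumer(queue);

            while (true) {
                Message message = consumer.receive();             // blocks; strictly one at a time
                String body = ((TextMessage) message).getText();  // assumes text messages
                sendToTarget(body);                               // hypothetical delivery step
                message.acknowledge();                            // ack only after the target accepted it
            }
        }

        private static void sendToTarget(String body) { /* call the target system here */ }
    }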

Message Sequence Violation: in a server cluster environment, message sequence may not be retained even when the subscriber process runs in a single thread.

 

In a cluster environment, connections are made from multiple servers to the message broker.

The message broker normally sends the messages to the multiple subscribers in round-robin fashion. So in this case, even though the process runs in a single thread on each server, the FIFO sequence may be violated.

Workaround:

In this case the broker connection should be configured such that the broker sends messages to only a single subscriber at a time. In a cluster, when connections are made from both servers, the broker should ensure that messages are delivered to a single subscriber only: it sticks to the subscriber that connected first, regardless of how many subscribers are connected.
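With ActiveMQ, for example, this is a one-line change to the previous sketch: the consumer.exclusive destination option tells the broker to deliver to the first connected subscriber only, with the others acting as hot standbys. Other brokers offer comparable exclusive or single-active-consumer settings.

    // Same subscriber as above; only the destination option changes (ActiveMQ syntax).
    Queue queue = session.createQueue("ORDERS.IN?consumer.exclusive=true");
    MessageConsumer consumer = session.createConsumer(queue); // standby consumers stay idle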

Message Sequence Violation: messages could not be delivered due to connection failure, server issue, or business error.

 

Suppose your process is normally delivering messages in sequence (applying the workarounds described in the two cases above), and a message cannot be delivered to the target due to a connection issue, a server issue, or a business error.

In this case we have to decide whether we should keep trying to send the message to the target to maintain sequencing. Retrying forever may not be a good idea: sometimes the message can never be accepted by the target due to a business error, and if you keep trying to send such a message, the other messages will also be stuck. It is also sometimes difficult to know whether a rejection happened due to a business error or a system error.

Workaround:

The best solution for such cases is a hybrid one, using a message queue and a database in combination.

If a message cannot be sent to the target, send that message and all subsequent messages to the database. Write a batch job that reads the records from the database (sorted by timestamp) and tries to post them to the target.

A good approach is, after a couple of retries, to send a notification to the business users and let them take care of the messages: they can decide whether the messages should be retried or can be fixed.

The message subscriber process should do a database lookup and check whether there are any records in the error table. If there are, all subsequent messages should be sent to the database instead of the target.

The batch process that picks up the messages from the database should run in a single thread.
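The following Java sketch shows the shape of this hybrid check; the ERROR_MESSAGES table and its columns are hypothetical stand-ins for the actual error-table layout.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class SequencedForwarder {
        private final Connection db;   // JDBC connection, injected by the caller

        SequencedForwarder(Connection db) { this.db = db; }

        // Called for each message taken off the queue.
        void onMessage(String messageId, String payload) throws Exception {
            // If anything is already parked, or this delivery fails, park the message
            // so that order is preserved behind the failed one.
            if (hasParkedMessages() || !trySendToTarget(payload)) {
                park(messageId, payload);
            }
        }

        private boolean hasParkedMessages() throws Exception {
            try (PreparedStatement ps = db.prepareStatement(
                    "SELECT COUNT(*) FROM ERROR_MESSAGES WHERE STATUS = 'PENDING'");
                 ResultSet rs = ps.executeQuery()) {
                rs.next();
                return rs.getInt(1) > 0;
            }
        }

        private void park(String messageId, String payload) throws Exception {
            try (PreparedStatement ps = db.prepareStatement(
                    "INSERT INTO ERROR_MESSAGES (MESSAGE_ID, PAYLOAD, STATUS, CREATED_AT) "
                  + "VALUES (?, ?, 'PENDING', CURRENT_TIMESTAMP)")) {
                ps.setString(1, messageId);
                ps.setString(2, payload);
                ps.executeUpdate();
            }
        }

        private boolean trySendToTarget(String payload) {
            return true; // call the target; return false on connection or business error
        }
    }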

Handled this way, though, sequencing is inefficient: one bad message parks every message behind it. We should follow the message-grouping approach described in the section below.

Workaround: message groups, or sets of records

 

For bad-message handling, sending that particular message and all subsequent messages to the database to maintain sequencing might be a bad idea. Instead we should use the concept of a message group (a set of records) to manage sequencing. For example, we can group messages on the basis of customer id: a set of messages belongs to a particular customer, so if a bad message belongs to that customer, the other messages of that customer, along with the bad message, are sent to the error table, while other customers' messages are not impacted. In the diagram above, message 2 of customer 2 has an issue, so message 2 along with message 3 is sent to the error table, even though message 3 is a good message.

When the ESB gets a bad message, only that particular message group is impacted. This is a better solution.

Processing groups of messages in parallel

With message grouping, not only can you handle error situations better, you can also process multiple groups of messages in parallel, improving overall efficiency. You can send each message group to a separate server: configure the message broker so that it sends messages of the same group to the same server. In this way you achieve parallel processing while maintaining message sequence.
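On the publishing side this can be as simple as stamping a group key on each message. The sketch below assumes a broker that supports message groups via the JMSXGroupID property (ActiveMQ does); the grouping key here is the customer id.

    import javax.jms.JMSException;
    import javax.jms.MessageProducer;
    import javax.jms.Session;
    import javax.jms.TextMessage;

    public class GroupedPublisher {
        // Stamp each message with its group key (here, the customer id). The broker
        // then pins a group to one consumer, so groups are processed in parallel
        // while the messages of each group stay in order.
        void publish(Session session, MessageProducer producer, String customerId, String body)
                throws JMSException {
            TextMessage message = session.createTextMessage(body);
            message.setStringProperty("JMSXGroupID", customerId);
            producer.send(message);
        }
    }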

Conclusion:

A queue alone is not enough to manage message sequence in real-life scenarios. The best way is to use a hybrid model (message queue + database) and build a custom solution.

When a message cannot be sent to the destination due to any issue, the message should be sent to the error table.

For each fresh message coming off the queue, a database lookup should be done to check whether previous records already exist in the error table. If they do, the current record/message should not be sent to the destination and should be inserted into the database instead.

A batch job should run periodically to read the messages from the database, sorted by timestamp.

You also need to follow the message-grouping rules for parallel and efficient processing of messages.

Pagination


This is a type of bulk processing: the ESB pulls the data from the source system page by page, due to a restriction imposed at the firewall. Each page can contain multiple records. Suppose the firewall enforces a restriction that no more than 6 MB of data can be sent per web service call, and the source application needs to send 40 MB of data to the target. The ESB must then make 7 web service calls to pull the entire 40 MB; for each call, the source application sends only 6 MB of data. Each such chunk is called a page.

The source system should support pagination. The page number should be passed as a parameter to the source; the page number can be mapped to a row number as well.

Sequential pagination.

 

Flowchart.

The ESB gets the data from the source one page after another, in sequence, and passes each page to the target in sequence.
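A minimal sequential-pagination loop in Java, assuming a hypothetical source endpoint that takes a page parameter and signals the end of data with an empty response:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class SequentialPaginator {
        private static final HttpClient http = HttpClient.newHttpClient();

        public static void main(String[] args) throws Exception {
            // e.g. 40 MB total at 6 MB per call => ceil(40 / 6) = 7 pages
            for (int page = 1; ; page++) {
                HttpRequest req = HttpRequest.newBuilder(
                        URI.create("https://source.example.com/records?page=" + page)) // hypothetical
                        .GET().build();
                HttpResponse<String> resp = http.send(req, HttpResponse.BodyHandlers.ofString());
                if (resp.statusCode() == 204 || resp.body().isEmpty()) break; // no more pages
                sendToTarget(resp.body());   // forward this page before asking for the next
            }
        }

        private static void sendToTarget(String page) { /* deliver the page to the target */ }
    }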

Usage:

Use this pattern when records are to be processed sequentially. This pattern is slow.

Parallel pagination.

The ESB gets pages from the source in parallel and sends them to the destination in parallel.

Usage: Use this pattern when faster processing is required and sequential processing of records is not important.

Combining pages in ESB layer

Here the ESB layer gets the pages from the source, combines them, and sends them to the target as one document.

Usage: Use this pattern when target needs all the records in one document.

Bulk Processing


Bulk processing means that multiple messages, documents, records, etc. from the source get processed by the ESB and sent to the target in a single execution.

The source application sends a large number of messages in one go to the ESB layer. The ESB layer enriches and transforms each message separately and sends it to the destination.

The source application can push the messages to the ESB, or the ESB can pull the messages from the source application via a file, database, web service, etc. provided by the source.

Messages can be processed by the middleware in different ways.

 Parallel Processing

(Diagram: bulk processing, parallel)

The source application sends a number of messages in one go to the ESB, and the ESB enriches the messages and passes them to the destination in parallel.

The source application invokes a web service provided by the ESB and pushes a large number of messages to it. The ESB splits the batch of messages into individual messages and processes each message in parallel.
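A sketch of the splitting step in Java, using a fixed thread pool; the enrichment and delivery steps are stubs, and the pool size is illustrative:

    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class ParallelBulkProcessor {
        // Size the pool by available CPU/memory and the target's connection limits.
        private final ExecutorService pool = Executors.newFixedThreadPool(8);

        public void process(List<String> batch) {
            for (String message : batch) {                            // split into single messages
                pool.submit(() -> sendToTarget(enrich(message)));     // each on its own worker
            }
        }

        private String enrich(String message) { return message; /* lookups, transformation */ }
        private void sendToTarget(String message) { /* deliver to the target */ }
    }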

Usage:

Use this pattern for faster processing when the target supports parallel connections and the number of messages is small. Use it only when the server has enough available memory and CPU, as this pattern consumes a lot of server resources.

Sequential Processing

The source application sends a large number of messages in one go, and the ESB processes the messages sequentially.

Usage: Use this pattern when the target does not support parallel connections.

Hybrid Processing

This is a combination of parallel and sequential processing.

Here a large batch of messages can be split into smaller sub-batches, with each sub-batch processed in parallel and the messages within each sub-batch processed in sequence. The reverse can also be done: process each sub-batch in sequence and the messages within it in parallel. The first variant is sketched below.
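A minimal Java sketch of the first variant (sub-batches in parallel, messages within each sub-batch in order); pool size and stubs are illustrative:

    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class HybridBulkProcessor {
        private final ExecutorService pool = Executors.newFixedThreadPool(4);

        public void process(List<List<String>> subBatches) {
            for (List<String> subBatch : subBatches) {   // sub-batches run in parallel
                pool.submit(() -> {
                    for (String message : subBatch) {    // in order within the sub-batch
                        sendToTarget(message);
                    }
                });
            }
        }

        private void sendToTarget(String message) { /* deliver to the target */ }
    }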

Usage

Parallel processing is faster but needs a lot of server resources, and the target application must support parallel connections. On the other hand, sequential processing takes fewer resources but can be slow. Use the hybrid pattern to strike a balance between these two methods.

Error Handling

Messages in a batch that cannot be posted to the destination should be retried later. They can be stored in the error table for retry (the layout of the table is given in another post). A lookup should be done against the error table so that new messages are not overwritten with old ones during retry.

Scheduled Bulk Process

Here the ESB process pulls the data from the source system; it may be data from a file or database, or messages returned by a web service. This is normally a scheduled process. Sometimes the firewall does not allow push connections from external systems to internal targets; when push is not allowed, the pull method can be used. The pull process can be scheduled to periodically pull the data from the source and send it to the target: after pulling the messages from the source, it enriches and transforms them and sends them on. It can use any of the design patterns described above (parallel, sequential, or hybrid processing).
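A minimal scheduled-pull skeleton in Java; the 15-minute interval and the pull/enrich/transform stubs are placeholders:

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class ScheduledPuller {
        public static void main(String[] args) {
            ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
            scheduler.scheduleAtFixedRate(() -> {
                try {
                    String data = pullFromSource();           // file, database, or web service
                    sendToTarget(transform(enrich(data)));
                } catch (Exception e) {
                    // log and wait for the next tick instead of killing the scheduler thread
                    System.err.println("pull failed: " + e);
                }
            }, 0, 15, TimeUnit.MINUTES);                      // interval is illustrative
        }

        private static String pullFromSource() { return ""; }
        private static String enrich(String data) { return data; }
        private static String transform(String data) { return data; }
        private static void sendToTarget(String data) { }
    }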

Multiple Files to Multiple Targets

Sometimes we have multiple files in the same source folder. In that case we should use a file transfer tracking table to transfer and archive the files: we can skip files that have already been processed according to the tracking table, and this way avoid sending duplicate files to the target in case of failure. We can have a single process to transfer all the files; we may pick up all the files in one go, but we need to process each file separately if we use the file transfer tracking table. A sketch of the tracking-table check follows.
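A Java sketch of that check; the FILE_TRANSFER_TRACKING table and its columns are hypothetical, and the transfer/archive steps are stubs:

    import java.io.File;
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class TrackedFileTransfer {
        private final Connection db;   // JDBC connection to the tracking database

        TrackedFileTransfer(Connection db) { this.db = db; }

        void processFolder(File folder) throws Exception {
            File[] files = folder.listFiles();
            if (files == null) return;                            // folder missing or unreadable
            for (File file : files) {
                if (alreadyProcessed(file.getName())) continue;   // skip duplicates on re-run
                transferToTarget(file);
                markProcessed(file.getName());
                archive(file);
            }
        }

        private boolean alreadyProcessed(String name) throws Exception {
            try (PreparedStatement ps = db.prepareStatement(
                    "SELECT 1 FROM FILE_TRANSFER_TRACKING WHERE FILE_NAME = ?")) {
                ps.setString(1, name);
                try (ResultSet rs = ps.executeQuery()) { return rs.next(); }
            }
        }

        private void markProcessed(String name) throws Exception {
            try (PreparedStatement ps = db.prepareStatement(
                    "INSERT INTO FILE_TRANSFER_TRACKING (FILE_NAME, PROCESSED_AT) "
                  + "VALUES (?, CURRENT_TIMESTAMP)")) {
                ps.setString(1, name);
                ps.executeUpdate();
            }
        }

        private void transferToTarget(File file) { /* send the file content */ }
        private void archive(File file) { /* move the file to an archive folder */ }
    }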

It is a best practice to use a file transfer tracking table when sending files to the target.

For multiple sources to multiple targets, we should simplify the architecture by adding a separate set of processes for each source and following the single-source-to-multiple-targets design pattern.

Single File to Multiple Targets- approach 4

Transferring files using a message queue.

In this approach we transfer the files to different targets using a message queue.

Here we treat the file content as a message. A publisher process reads the file from the source and publishes the content to the queue; this process also archives and deletes the file from the source after successful publishing. A subscriber process listens to the queue and sends the file to the destination.

We can use a topic to publish the messages if more than one subscriber needs to send the file to several targets.

Use Case

Use this pattern when you can leverage a messaging system and your file size is small. You can also consider this approach when more subscribers are expected to be added at a later stage.

Benefits: A flexible design pattern; a new consumer process can be added for each new target without impacting existing processes. Maintainability is high.

Disadvantages: You need a messaging system to implement this, and you may not be able to transfer big files.