
Synchronous web service call overhead

 

Implementing and managing the synchronous design pattern is easy in middleware, because most of the retry and error handling is taken care of by the source application itself. The pattern has benefits as well as drawbacks. One drawback is that a thread is blocked until it gets the response back from the ESB. Another is that the web service invocation may result in a timeout when the ESB layer takes too long to reply to the source application, which hurts performance. It is also easy to overlook how many applications participate in the chain of invocations needed to complete the web service call. The following is a typical scenario.

The source application invokes a web service on the ESB and waits for the ESB to complete the process. The ESB in turn calls the target application and waits for its response. The target waits for a response from the database. The database waits for synchronization with the DR site before responding.

In the above example the chain spans five applications (source, ESB, target, database, and DR site), so the source application waits until all five complete their tasks. This takes a considerable amount of time and may result in timeouts, duplicate messages, and similar issues. In this context, asynchronous invocation is a better option.
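To make the blocking cost concrete, here is a minimal sketch using the JDK's standard HttpClient (Java 11+); the ESB endpoint URL and payload are made up for illustration. The synchronous send() blocks the caller's thread for the whole chain and can only protect itself with a timeout, while sendAsync() returns immediately and handles the reply on another thread.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class EsbCallStyles {
    public static void main(String[] args) {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(5))
                .build();

        // Hypothetical ESB endpoint and payload, for illustration only.
        HttpRequest request = HttpRequest.newBuilder(URI.create("http://esb.example.com/orders"))
                .timeout(Duration.ofSeconds(30))                      // caller is stuck for up to 30 s
                .POST(HttpRequest.BodyPublishers.ofString("<order/>"))
                .build();

        try {
            // Synchronous: the source application's thread blocks here until the
            // whole ESB -> target -> database -> DR chain has finished (or timed out).
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println("sync status: " + response.statusCode());
        } catch (Exception e) {
            System.out.println("sync call failed or timed out: " + e.getMessage());
        }

        // Asynchronous: the call returns at once; the reply (or failure) is handled
        // later on another thread, so the caller is free to do other work.
        client.sendAsync(request, HttpResponse.BodyHandlers.ofString())
              .thenAccept(r -> System.out.println("async status: " + r.statusCode()))
              .join();                                                // demo only: keep the JVM alive
    }
}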

Processing JMS messages in sequence


Processing messages in sequence is a challenge in an ESB.

We assume that a message queue serves messages in FIFO order, so the sequence will be preserved. That is not always correct; in various situations the message order will be violated.

Message Sequence Violation: Single Server with multiple threads

 

 

Message sequence may not be retained even though the JMS listener process is running on a single server.

When the subscriber process runs in a single-server environment, it may instantiate multiple subscriber instances when multiple messages are pushed by the message broker. Each instance runs in parallel, independent of the others. So the message sequence may be violated even though the message broker sends the messages in order.

Workaround: Run the process in a single thread. The message broker connection should be configured so that the broker delivers a single message at a time.
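A minimal sketch of this workaround, assuming an ActiveMQ broker (the broker URL and queue name are placeholders): the subscriber runs in one thread and the ActiveMQ-specific jms.prefetchPolicy.queuePrefetch=1 option asks the broker to hand over only one message at a time. Other brokers expose an equivalent prefetch or read-ahead setting.

import javax.jms.Connection;
import javax.jms.Message;
import javax.jms.MessageConsumer;
import javax.jms.Session;
import org.apache.activemq.ActiveMQConnectionFactory;

public class SingleThreadedSubscriber {
    public static void main(String[] args) throws Exception {
        // ActiveMQ-specific URL option: deliver at most one unacknowledged message at a time.
        ActiveMQConnectionFactory factory = new ActiveMQConnectionFactory(
                "tcp://broker.example.com:61616?jms.prefetchPolicy.queuePrefetch=1");

        Connection connection = factory.createConnection();
        connection.start();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        MessageConsumer consumer = session.createConsumer(session.createQueue("ORDERS.IN"));

        // One thread, one message at a time: the sequence is preserved on this server.
        while (true) {
            Message message = consumer.receive(30_000);   // blocking receive, 30 s timeout
            if (message != null) {
                process(message);                         // placeholder for the real handling
            }
        }
    }

    private static void process(Message message) {
        // enrichment / transformation / delivery to the target goes here
    }
}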

Message Sequence Violation: In a server cluster environment, message sequence may not be retained even when the subscriber process runs in a single thread.

 

In a clustered environment, connections are made from multiple servers to the message broker.

The message broker normally sends messages to the multiple subscribers in round-robin fashion, so even though each server runs the process in a single thread, the FIFO sequence may be violated.

Workaround:

In this case the broker connection should be configured so that it sends messages to only a single subscriber at a time, or so that it sends the messages to the subscriber that made the first connection.

In a clustered environment, when connections are made from both servers, the broker should ensure that messages are delivered to a single subscriber only. It sticks to the subscriber that connected first, regardless of how many subscribers are connected.
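How this "stick to the first subscriber" behaviour is switched on is broker-specific. As one example, ActiveMQ supports it through the consumer.exclusive destination option (queue name hypothetical, session reused from the sketch above); other brokers offer similar exclusive-consumer features.

// ActiveMQ-specific destination option: the broker delivers messages only to the
// consumer that connected first; the other clustered consumers stand by and take
// over automatically if that consumer disconnects.
javax.jms.Queue exclusiveQueue = session.createQueue("ORDERS.IN?consumer.exclusive=true");
javax.jms.MessageConsumer consumer = session.createConsumer(exclusiveQueue);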

Message Sequence Violation: Messages cannot be delivered due to connection failure, server issue, or business error.

 

Suppose your process is normally delivering messages in sequence (using the workarounds described in the two cases above), and then a message cannot be delivered to the target because of a connection issue, a server issue, or a business error.

In this case we have to decide whether to keep trying to send the message to the target in order to maintain sequencing. Retrying indefinitely may not be a good idea; sometimes the message can never be accepted by the target because of a business error. If you keep trying to send such a message, the other messages will also be stuck. It is also sometimes difficult to know whether the rejection happened because of a business error or a system error.

Workaround:

The best way to handle such cases is a hybrid solution that uses a message queue and a database in combination.

If a message cannot be sent to the target, send that message and all subsequent messages to the database. Write a batch job that reads the records from the database (sorted by timestamp) and tries to post them to the target.

A better approach is to send a notification to the business users after a couple of retries and let them take care of the messages; they can decide whether the messages need to be retried or can be fixed.

The message subscriber process should do a database lookup to check whether there are any records in the error table. If there are, all subsequent messages should be sent to the database instead of to the target.

The batch process that picks up the messages from the database should run in a single thread.
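A rough sketch of that routing logic, assuming a JDBC connection and a hypothetical MESSAGE_ERROR table with GROUP_KEY, PAYLOAD and CREATED_TS columns (none of these names come from the post; the TargetClient interface stands in for the real delivery call):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;
import java.time.Instant;

public class SequencedDelivery {

    private final Connection db;          // JDBC connection, opened elsewhere
    private final TargetClient target;    // hypothetical client for the target system

    public SequencedDelivery(Connection db, TargetClient target) {
        this.db = db;
        this.target = target;
    }

    /** Called by the JMS subscriber for every incoming message. */
    public void onMessage(String groupKey, String payload) throws Exception {
        if (hasPendingErrors(groupKey)) {
            park(groupKey, payload);      // an earlier message is stuck: keep the sequence, go to the DB
            return;
        }
        try {
            target.send(payload);         // normal path: deliver straight to the target
        } catch (Exception e) {
            park(groupKey, payload);      // delivery failed: start parking this group in the DB
        }
    }

    private boolean hasPendingErrors(String groupKey) throws Exception {
        try (PreparedStatement ps = db.prepareStatement(
                "SELECT 1 FROM MESSAGE_ERROR WHERE GROUP_KEY = ?")) {
            ps.setString(1, groupKey);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next();
            }
        }
    }

    private void park(String groupKey, String payload) throws Exception {
        try (PreparedStatement ps = db.prepareStatement(
                "INSERT INTO MESSAGE_ERROR (GROUP_KEY, PAYLOAD, CREATED_TS) VALUES (?, ?, ?)")) {
            ps.setString(1, groupKey);
            ps.setString(2, payload);
            ps.setTimestamp(3, Timestamp.from(Instant.now()));
            ps.executeUpdate();
        }
    }

    interface TargetClient {
        void send(String payload) throws Exception;
    }
}

The single-threaded batch job then reads the parked rows ordered by CREATED_TS and retries them in the same way.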

This is an inefficient way of handling sequencing. We should instead follow the message grouping approach described in the next section.

Workaround:  message group or set of records

 

For bad-message handling, sending that particular message and all subsequent messages to the database to maintain sequencing might be a bad idea. Instead we should use the concept of a message group (a set of related records) to manage sequencing. For example, we can group messages by customer id. Suppose a set of messages belongs to a particular customer; if a bad message belongs to that customer, the other messages of that customer, along with the bad message, are sent to the error table. Other customers' messages are not impacted. In the above diagram, message 2 of customer 2 has an issue, so message 2 along with message 3 is sent to the error table even though message 3 is a good message.

When the ESB gets a bad message, only that particular message group is impacted. This is a better solution.

Processing groups of messages in parallel

With message grouping, not only can you handle error situations better, you can also process multiple groups of messages in parallel, improving overall efficiency. You can send separate message groups to separate servers, and you can configure the message broker so that it sends messages of the same group to the same server. In this way you can achieve parallel processing while maintaining message sequence.
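In JMS terms this grouping is usually expressed with the JMSXGroupID message property: brokers that implement message groups (ActiveMQ and Artemis, for example) deliver all messages carrying the same group id to the same consumer, in order, while different groups can be spread across consumers. A minimal producer-side sketch, with the queue name and customer id made up:

import javax.jms.MessageProducer;
import javax.jms.Session;
import javax.jms.TextMessage;

public class GroupedPublisher {

    /** Publish one message so that all messages of the same customer stay in sequence. */
    public static void publish(Session session, String customerId, String payload) throws Exception {
        MessageProducer producer = session.createProducer(session.createQueue("CUSTOMER.EVENTS"));
        TextMessage message = session.createTextMessage(payload);
        // Everything with the same JMSXGroupID goes to one consumer, in order;
        // other customers' groups can still be processed in parallel elsewhere.
        message.setStringProperty("JMSXGroupID", "CUST-" + customerId);
        producer.send(message);
        producer.close();
    }
}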

Conclusion:

A queue alone is not enough to manage message sequence in real-life scenarios. The best way is to use a hybrid model (message queue + database) and build a custom solution.

When a message cannot be sent to the destination for any reason, it should be written to the error table.

For each fresh message arriving on the queue, a database lookup should be done to check whether earlier records already exist in the error table. If they do, the current message should not be sent to the destination and should be inserted into the database instead.

A batch job should run periodically to read the messages from the database, sorted by timestamp.

You also need to follow the message grouping rules for parallel and efficient processing of messages.

Pagination


This is a type of bulk processing. The ESB pulls data from the source system page by page because of a restriction placed in the firewall. Each page can contain multiple records. Suppose the firewall enforces a restriction that no more than 6 MB of data can be sent per web service call, and the source application needs to send 40 MB of data to the target. The ESB then has to make 7 web service calls to pull the entire 40 MB; for each call the source application sends at most 6 MB of data. Each such chunk is called a page.

The source system should support pagination. The page number should be passed as a parameter to the source; the page number can be mapped to row numbers as well.

Sequential pagination.

 


The ESB gets the data from the source one page after another, in sequence, and passes each page to the target in the same order.

Usage:

Use this pattern when records have to be processed sequentially. This pattern is slow.
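A minimal sequential sketch, assuming a hypothetical SourceClient that takes a page number and returns at most one 6 MB page (an empty list once the data is exhausted) and a TargetClient that posts a page to the destination:

import java.util.List;

public class SequentialPagination {

    /** Hypothetical clients for the source and target systems. */
    interface SourceClient { List<String> fetchPage(int pageNumber); }   // at most ~6 MB per page
    interface TargetClient { void send(List<String> records); }

    public static void copyAllPages(SourceClient source, TargetClient target) {
        int pageNumber = 1;
        while (true) {
            List<String> page = source.fetchPage(pageNumber);   // one web service call per page
            if (page.isEmpty()) {
                break;                                          // no more data at the source
            }
            target.send(page);                                  // delivered in the same order as pulled
            pageNumber++;
        }
    }
}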

Parallel pagination.

The ESB gets pages from the source in parallel and sends them to the destination in parallel.

Usage: Use this pattern when faster processing is required and sequential processing of records is not important.
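A parallel variant can reuse the same hypothetical clients and pull a known number of pages (7 in the 40 MB example above) on a small thread pool; note that the arrival order at the target is no longer guaranteed:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelPagination {

    interface SourceClient { List<String> fetchPage(int pageNumber); }
    interface TargetClient { void send(List<String> records); }

    public static void copyAllPages(SourceClient source, TargetClient target, int totalPages)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);   // pool size is a tuning choice
        List<Future<?>> results = new ArrayList<>();
        for (int page = 1; page <= totalPages; page++) {
            final int pageNumber = page;
            // Each page is pulled from the source and sent to the target on its own worker thread.
            results.add(pool.submit(() -> target.send(source.fetchPage(pageNumber))));
        }
        for (Future<?> result : results) {
            result.get();                                         // propagate any worker failure
        }
        pool.shutdown();
    }
}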

Combining pages in ESB layer

Here the ESB layer gets the pages from the source, combines them, and sends them to the target as one document.

Usage: Use this pattern when the target needs all the records in one document.

Bulk Processing


Bulk processing means that a large number of messages, documents, or records from the source are processed by the ESB and sent to the target in a single execution.

The source application sends a large number of messages in one go to the ESB layer. The ESB layer enriches and transforms each message separately and sends it to the destination.

The source application can push the messages to the ESB, or the ESB can pull them from the source application via a file, database, web service, etc.

Messages can be processed by the middleware in different ways.

 Parallel Processing


The source application sends a number of messages in one go to the ESB; the ESB enriches the messages and passes them to the destination in parallel.

The source application invokes a web service provided by the ESB and pushes a large number of messages to it. The ESB splits the batch into individual messages and processes each message in parallel.

Usage:

Use this pattern for faster processing when the target supports parallel connections and the number of messages is small. Use it only when the server has enough available memory and CPU, because this pattern consumes a lot of server resources.

Sequential Processing

The source application sends a large number of messages in one go, and the ESB processes the messages sequentially.

Usage: Use this pattern when the target does not support parallel connections.

Hybrid Processing

This is a combination of parallel and sequential processing.

Here a large batch of messages is split into smaller sub-batches, each sub-batch is processed in parallel, and the messages within each sub-batch are processed in sequence. The reverse can also be done: processing the sub-batches in sequence and the messages within each sub-batch in parallel.

Usage

Parallel processing is faster but needs a lot of server resources, and the target application must support parallel processing. Sequential processing, on the other hand, needs fewer resources but can be slow. Use the hybrid pattern to strike a balance between the two.
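A hedged sketch of the first hybrid variant (sub-batches processed in parallel, messages inside a sub-batch kept in order), using only the JDK; the message type and handler are placeholders:

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class HybridBulkProcessor {

    /** Split the batch into sub-batches, run the sub-batches in parallel,
     *  but keep the messages inside each sub-batch strictly in sequence. */
    public static <M> void process(List<M> batch, int subBatchSize, Consumer<M> handler)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int start = 0; start < batch.size(); start += subBatchSize) {
            List<M> subBatch = batch.subList(start, Math.min(start + subBatchSize, batch.size()));
            pool.submit(() -> subBatch.forEach(handler));   // sequential within the sub-batch
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);           // wait for the whole batch to finish
    }
}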

Error Handling.

Messages in a batch that cannot be posted to the destination should be retried later. They can be stored in the error table for retry; the layout of the table is given in another post. A lookup should be done against the error table so that newer messages are not overwritten by older ones during retry.

Scheduled Bulk Process

Here the ESB process pulls the data from the source system; it may be data from a file, a database, or messages returned by a web service. This is normally a scheduled process. Sometimes the firewall does not allow push connections from external systems to internal targets; when push is not allowed, the pull method can be used. The pull process can be scheduled to periodically pull the data from the source and send it to the target. After pulling the messages from the source, it enriches and transforms them and sends them to the target. It can use any of the design patterns described above: parallel, sequential, or hybrid processing.

Multiple Files to Multiple Targets

Sometimes we have multiple files in the same source folder. In that case we should use a file transfer tracking table to transfer and archive the files. We can skip files that have already been processed according to the tracking table, and thus avoid sending duplicate files to the target in case of failure. We can have a single process to transfer all the files; we may pick up all the files in one go, but we need to process each file separately if we use the file transfer tracking table.

It is a best practice to use a file transfer tracking table when sending files to the target.

For multiple sources to multiple targets, we should simplify the architecture by adding a separate set of processes for each source and following the single-source-to-multiple-targets design pattern.

Single File to Multiple Targets – approach 4

Transferring files using a message queue.

In this approach we transfer the file to the different targets using a message queue.

Here we treat the file content as a message. A publisher process reads the file from the source and publishes the content to the queue; this process also archives and deletes the file from the source after successful publishing. A subscriber process listens to the queue and sends the file to the destination.

We can use a topic to publish the messages if more than one subscriber needs to send the file to several targets.
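A rough publisher sketch for this approach (the paths, topic name, and JMS session setup are illustrative, and error handling is omitted): the file content becomes a JMS BytesMessage, and the file is archived only after the publish succeeds.

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import javax.jms.BytesMessage;
import javax.jms.MessageProducer;
import javax.jms.Session;

public class FilePublisher {

    /** Publish one small file as a message, then archive it away from the source folder. */
    public static void publish(Session session, Path sourceFile, Path archiveDir) throws Exception {
        byte[] content = Files.readAllBytes(sourceFile);      // only suitable for small files

        BytesMessage message = session.createBytesMessage();
        message.setStringProperty("fileName", sourceFile.getFileName().toString());
        message.writeBytes(content);

        MessageProducer producer = session.createProducer(session.createTopic("FILE.TRANSFER"));
        producer.send(message);                               // each subscriber writes the file to its target
        producer.close();

        // Archiving moves the file out of the source folder only after a successful publish.
        Files.move(sourceFile, archiveDir.resolve(sourceFile.getFileName()),
                   StandardCopyOption.REPLACE_EXISTING);
    }
}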

Use Case

Use this pattern when you can leverage a messaging system and your file size is small. You can also consider this approach when more subscribers are expected to be added at a later stage.

Benefits: A flexible design pattern. A new consumer process can be added for each new target without impacting existing processes. Maintainability is high.

Disadvantage: You need a messaging system to implement this, and you may not be able to transfer large files.

Single File to Multiple Targets – approach 3

In this approach, we copy the file to temporary storage and then archive and delete the file from the source folder. A single file is replicated into a separate folder destined for each target.

We have a separate file transfer process for each target to send the file from source to target. Each process reads the files from its corresponding folder and sends them to its destination in parallel. After a successful transfer, each process deletes the file from its folder. If a particular process fails, only that process has to be restarted.

Use Case: Use this approach when the file size is small. The temporary storage can be provided by the source system or can be within the ESB server.

Benefits: No file transfer tracking table may be required. The process for each target can run in parallel. Each process is simpler, giving us higher maintainability. A new process for a new target can be added without impacting existing processes.

Disadvantage: Overhead of replicating files for each target.

Single File to Multiple Targets – approach 2

The first approach, as we discussed, has one process transferring the file to multiple destinations. But, as mentioned, that process is complex, and maintainability and scale-up are expensive. So it is better to have a separate process to transfer the file to each target; we can also run the processes in parallel.

We have a separate archival process that runs last, after the file has been transferred to every destination.

We can use the file transfer tracking table to keep track of whether the file was transferred successfully or not. Each process inserts a record into the tracking table after successful completion.

The archival process needs to check whether the file has been transferred successfully to all the targets. We can do this by counting the number of processes that have completed successfully in the file transfer tracking table. This job should be scheduled after the main file transfer processes. Alternatively, each process can trigger the archival process after completion, in which case we do not need to schedule it separately.

Benefits: The processes can run in parallel. Each process is very simple, so maintainability is high. We can add a new process to transfer the file to a new target without changing the existing processes.

Single File to Multiple Targets – approach 1

In this requirement, we need to send a single file to multiple targets.

In this design pattern a single ESB process handles the file transfer to multiple destinations. The flowchart is given above. We have to use a file transfer tracking table to track the status of the file transfer for each target. The process should check the table and not send the file to a target to which it has already been sent; this is required when we rerun the process after an exception occurs.

The file transfer tracking table can have the following fields; some of them are optional. This generic table can be reused by all the ESB file transfer processes.

Program name: Name of the program. This is applicable for big initiatives where there are multiple projects under one program.

Project name: Name of the project under the program.

Event name: Event name, message name, entity name, etc. can be stored here.

Process name: Name of the process transferring the file.

Filename: The unique file name. The source generates a unique file name, normally by suffixing a timestamp or by some other method.

Messageid: Uniquely identifies the contents of a file. With this field we can check whether a file's content is a duplicate; if we have already processed a file with that message id, we can skip processing it again. This can be the primary key of the table.

File Source: Source system from which the file is sent.

File Target: Destination system to which the file is to be sent.

File Sent date: Timestamp of when the file was sent to the target.

Status: Delivery status of the file. You can store a value such as "delivered", as well as intermediate statuses such as "picked up", "archived", etc.

FileTransferid: This may be the primary key of the table. Alternatively, the messageid field can be used as the primary key, in which case this attribute is not needed.

This table can be used in combination with an audit table.
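As a hedged example, the tracking table could be created roughly like this (the column names and types are one possible reading of the fields above, not a prescribed schema, and the JDBC URL is a placeholder):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CreateFileTransferTrackingTable {
    public static void main(String[] args) throws Exception {
        String ddl =
            "CREATE TABLE FILE_TRANSFER_TRACKING (" +
            "  FILE_TRANSFER_ID  VARCHAR(64) PRIMARY KEY," +    // or make MESSAGE_ID the primary key
            "  MESSAGE_ID        VARCHAR(64)," +
            "  PROGRAM_NAME      VARCHAR(100)," +
            "  PROJECT_NAME      VARCHAR(100)," +
            "  EVENT_NAME        VARCHAR(100)," +
            "  PROCESS_NAME      VARCHAR(100)," +
            "  FILE_NAME         VARCHAR(255)," +
            "  FILE_SOURCE       VARCHAR(100)," +
            "  FILE_TARGET       VARCHAR(100)," +
            "  FILE_SENT_DATE    TIMESTAMP," +
            "  STATUS            VARCHAR(20)" +                 // e.g. PICKED_UP, DELIVERED, ARCHIVED
            ")";
        try (Connection conn = DriverManager.getConnection("jdbc:h2:mem:esb");   // placeholder database
             Statement stmt = conn.createStatement()) {
            stmt.execute(ddl);
        }
    }
}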

Benefit: A single process can handle the file transfer to multiple targets.

Disadvantage: We can use this pattern only if the number of targets is small (say 2 to 3). More targets means a more complex process. If new targets are added, or the logic has to change for a particular target, the existing process has to change, which requires regression testing. Because the process is complex, maintenance will be expensive.

Standard File Transfer Process

We can use the capabilities of standard ESB tools to transfer files from source to target, or we can use standard MFT tools available in the market.

While designing a file transfer process, we need to consider the following:

  1. Successful transfer of the file to the destination.
  2. Archiving and deleting the file from the source after a successful transfer.
  3. Resubmission of the process after failures.
  4. Not sending duplicate files to the target.

Single file to single target

Single file to single target is the easiest file transfer pattern.

We need a single process to implement this pattern. If the file is transferred successfully, we archive it and delete it from the source. If the file transfer is not successful, we need to rerun the process.

If the process fails in the archive-and-delete stage after a successful file transfer, you should not rerun the whole process; in that case you need to archive and delete manually.

But if you want to handle this automatically by resubmitting the process, you need to use the file transfer tracking table. With the tracking table you can skip the steps that were already executed. The contents of the tracking table are described in the multiple-target file transfer section.
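A hedged sketch of that resubmission logic, assuming the tracking table described earlier with a MESSAGE_ID column and a STATUS column holding values such as DELIVERED and ARCHIVED (the helper methods stand in for the real transfer and archive steps, and the initial INSERT of the tracking record is omitted):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class SingleFileTransfer {

    /** Rerunnable transfer: each step is skipped if the tracking table says it already happened. */
    public static void run(Connection db, String messageId) throws Exception {
        String status = currentStatus(db, messageId);         // null when no record exists yet

        if (!"DELIVERED".equals(status) && !"ARCHIVED".equals(status)) {
            sendToTarget(messageId);                          // step 1: transfer the file
            updateStatus(db, messageId, "DELIVERED");
        }
        if (!"ARCHIVED".equals(status)) {
            archiveAndDelete(messageId);                      // step 2: archive and delete at the source
            updateStatus(db, messageId, "ARCHIVED");
        }
    }

    private static String currentStatus(Connection db, String messageId) throws Exception {
        try (PreparedStatement ps = db.prepareStatement(
                "SELECT STATUS FROM FILE_TRANSFER_TRACKING WHERE MESSAGE_ID = ?")) {
            ps.setString(1, messageId);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString(1) : null;
            }
        }
    }

    private static void updateStatus(Connection db, String messageId, String status) throws Exception {
        try (PreparedStatement ps = db.prepareStatement(
                "UPDATE FILE_TRANSFER_TRACKING SET STATUS = ? WHERE MESSAGE_ID = ?")) {
            ps.setString(1, status);
            ps.setString(2, messageId);
            ps.executeUpdate();
        }
    }

    private static void sendToTarget(String messageId)     { /* placeholder: the real file transfer */ }
    private static void archiveAndDelete(String messageId) { /* placeholder: archive and delete at source */ }
}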