
Red Hat JBoss Fuse - When size and time do matter

In the integration space, we very often need to deal with large amounts of data. When designing an integration solution, we really need to stop and take a good look at how to handle this data.
You may find yourself having to handle large data in the following situations:

  • Incoming data
  • Processing data
  • Providing output

In my experience, a large amount of incoming data means data arriving at a very high frequency as well as a high volume of messages, so the risk of content flooding the application is high. My approach is to restrict the data coming into the application, so that it runs at maximum capacity while avoiding jamming the system. In this case I will:

  • Try to use a Polling Consumer if possible. Many components in JBoss Fuse support polling, such as File, FTP, JPA, and Quartz2, and they let you configure how frequently polling should happen.

  • If no polling consumer is available, or the data is simply coming in too fast, try the Throttler EIP. It lets you set both the length of the time period and how many messages may go through in each period.
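To make the Throttler idea concrete, here is a minimal fixed-window sketch in plain Java. It is not Camel's Throttler implementation; the class `SimpleThrottler` and its `tryAcquire` method are hypothetical names chosen for illustration. It admits at most `maxPerPeriod` messages per `periodMillis` window and rejects the rest until the next window opens. (In a Camel 2.x Java DSL route, the equivalent is usually written along the lines of `throttle(3).timePeriodMillis(1000)`; check the exact syntax against your Camel version.)

```java
/**
 * Minimal fixed-window throttler sketch (hypothetical, not Camel's Throttler):
 * admits at most maxPerPeriod messages in each window of periodMillis.
 */
class SimpleThrottler {
    private final int maxPerPeriod;
    private final long periodMillis;
    private boolean windowOpen = false; // has the first window started yet?
    private long windowStart;           // start time of the current window
    private int countInWindow;          // messages admitted in the current window

    SimpleThrottler(int maxPerPeriod, long periodMillis) {
        if (maxPerPeriod <= 0 || periodMillis <= 0) {
            throw new IllegalArgumentException("limits must be positive");
        }
        this.maxPerPeriod = maxPerPeriod;
        this.periodMillis = periodMillis;
    }

    /** Returns true if a message arriving at nowMillis may pass in the current window. */
    synchronized boolean tryAcquire(long nowMillis) {
        // Open a new window on the first message or once the current window has expired.
        if (!windowOpen || nowMillis - windowStart >= periodMillis) {
            windowOpen = true;
            windowStart = nowMillis;
            countInWindow = 0;
        }
        if (countInWindow < maxPerPeriod) {
            countInWindow++;
            return true;
        }
        return false; // over the limit: the caller should hold the message until the next window
    }

    public static void main(String[] args) {
        SimpleThrottler throttler = new SimpleThrottler(3, 1000);
        // Five messages arrive at t = 0 ms; only the first three pass.
        for (int i = 1; i <= 5; i++) {
            System.out.println("t=0 message " + i + " passed: " + throttler.tryAcquire(0));
        }
        // A new one-second window opens at t = 1000 ms.
        System.out.println("t=1000 message 6 passed: " + throttler.tryAcquire(1000));
    }
}
```

Passing the current time in as a parameter keeps the sketch deterministic and easy to test. A real throttler would read the clock itself and, rather than dropping rejected messages, typically delay them until the next window.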

When applying the above patterns, make sure the place receiving the input is large enough to temporarily hold the incoming data. Try to extend the capacity of that service, either by adding more service instances to share the load or simply by allocating more hardware resources.
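The advice to make the receiving side "large enough to temporarily hold the incoming data" boils down to a bounded buffer between the receiver and the processing logic. A minimal sketch, assuming an in-memory `ArrayBlockingQueue` stands in for whatever actually buffers the input in your deployment (for example a JMS queue on A-MQ); `IncomingBuffer` is a hypothetical name:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical bounded buffer between the receiving endpoint and the processing logic. */
class IncomingBuffer {
    private final BlockingQueue<String> queue;

    IncomingBuffer(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity); // fixed capacity = how much input can be held
    }

    /** Buffers a message, waiting up to waitMillis for free space; false means the buffer stayed full. */
    boolean accept(String message, long waitMillis) {
        try {
            return queue.offer(message, waitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt status for the caller
            return false;
        }
    }

    /** Takes the oldest buffered message, or null if none arrives within waitMillis. */
    String next(long waitMillis) {
        try {
            return queue.poll(waitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    int size() {
        return queue.size();
    }

    public static void main(String[] args) {
        IncomingBuffer buffer = new IncomingBuffer(2);
        System.out.println(buffer.accept("a", 0)); // true
        System.out.println(buffer.accept("b", 0)); // true
        System.out.println(buffer.accept("c", 0)); // false: capacity reached
        System.out.println(buffer.next(0));        // a
    }
}
```

The capacity is the knob the paragraph above talks about: when it is too small, `accept` starts returning `false` and the sender has to slow down or fail; raising it, or adding more consumers that drain it, is the scaling step.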

Another situation involving large amounts of data is handling a single large message. Because of the way JBoss Fuse processes messages, it loads each message into memory, so when messages get too big, you will soon run into out-of-memory problems. When dealing with one large chunk of data, my approach is:

  • Split your content. When dealing with large files, it is always best to split them into smaller chunks if possible, for several reasons: it avoids consuming a large amount of memory at once, and the smaller chunks can even be processed in parallel instead of one after another. (For a large XML file, using the XML tokenizer significantly reduces memory usage.)

  • Enable stream caching. When it is turned on, instead of holding the entire message in memory, large streams are spooled to a file. You can also configure the StreamCachingStrategy to customize the spool location, threshold, buffer size, memory limits, etc.

  • Filter the content. With large data, it is often the case that not every part of the file is needed for further processing, yet the original message is still needed later. I would then use the Claim Check pattern: first filter the data to send on, and then retrieve the original data when it is needed.
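Splitting only saves memory if the input is read incrementally rather than loaded whole. The following plain-Java sketch shows the idea behind a streaming split: it reads its input line by line and hands off groups of `chunkSize` lines, so at most one chunk is held at a time. It illustrates the technique and is not Camel's Splitter; `StreamingLineSplitter` and `split` are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical streaming splitter: never holds more than one chunk of lines in memory. */
class StreamingLineSplitter {

    /** Reads input line by line, passes chunks of up to chunkSize lines to handler, returns the chunk count. */
    static int split(Reader input, int chunkSize, Consumer<List<String>> handler) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int chunks = 0;
        try (BufferedReader reader = new BufferedReader(input)) {
            List<String> chunk = new ArrayList<>(chunkSize);
            String line;
            while ((line = reader.readLine()) != null) {
                chunk.add(line);
                if (chunk.size() == chunkSize) {
                    handler.accept(chunk);
                    chunks++;
                    chunk = new ArrayList<>(chunkSize); // fresh list; the previous one now belongs to the handler
                }
            }
            if (!chunk.isEmpty()) { // trailing partial chunk
                handler.accept(chunk);
                chunks++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return chunks;
    }

    public static void main(String[] args) {
        // A StringReader stands in for a large file; a FileReader would be used the same way.
        String data = "r1\nr2\nr3\nr4\nr5";
        int n = split(new StringReader(data), 2, chunk -> System.out.println("chunk: " + chunk));
        System.out.println(n + " chunks");
    }
}
```

Because each chunk is handed off independently, the `handler` could submit chunks to an `ExecutorService` to process them in parallel, which is the second benefit of splitting mentioned above.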

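The Claim Check pattern can be sketched as a store that keeps the full payload and hands back a key. The message that travels on carries only the filtered data plus that key, and a later step uses the key to retrieve the original. `ClaimCheckStore` is a hypothetical name, and the in-memory `ConcurrentHashMap` is an assumption for illustration; in practice the store would more likely be a database, cache, or file system that survives restarts.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical claim-check store: keeps large payloads aside and hands out keys for them. */
class ClaimCheckStore {
    private final Map<String, String> store = new ConcurrentHashMap<>();

    /** Checks in the full payload and returns the claim-check key. */
    String checkIn(String payload) {
        String key = UUID.randomUUID().toString();
        store.put(key, payload);
        return key;
    }

    /** Checks out (retrieves and removes) the payload for a key; returns null if the key is unknown. */
    String checkOut(String key) {
        return store.remove(key);
    }

    public static void main(String[] args) {
        ClaimCheckStore claims = new ClaimCheckStore();
        String original = "<order id=\"42\"><items>(large content)</items></order>";
        String key = claims.checkIn(original);
        // The message sent on carries only the small part needed downstream plus the key.
        String filtered = "orderId=42;claim=" + key;
        System.out.println("sent: " + filtered);
        // Later, the original content is retrieved with the key.
        String restored = claims.checkOut(key);
        System.out.println("original restored: " + original.equals(restored));
    }
}
```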
Last but not least, there is providing large amounts of data to a broad audience of clients. We want to send the data as few times as possible, so the most sensible approach is to place a buffer between the clients and the output producer.

  • Publish and subscribe (Topic) in messaging. This is probably the first option that comes to mind: every subscriber gets the message, while the producer only has to write it once.

  • A caching medium. When clients need to read the content repeatedly, placing it in a caching medium, such as a database or an even faster in-memory cache, is better than messaging, because messages are gone as soon as all the recipients have received them.
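A minimal in-memory sketch of the publish-subscribe idea: the producer calls `publish` once and every current subscriber receives the message. `InMemoryTopic` is a hypothetical class for illustration, not a JMS or A-MQ API, and it has no persistence or durable subscriptions.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

/** Hypothetical in-memory topic: one publish, fan-out to every current subscriber. */
class InMemoryTopic<T> {
    // Copy-on-write so subscribers can be added while a publish is iterating.
    private final List<Consumer<T>> subscribers = new CopyOnWriteArrayList<>();

    void subscribe(Consumer<T> subscriber) {
        subscribers.add(subscriber);
    }

    /** The producer writes once; returns how many subscribers received the message. */
    int publish(T message) {
        int delivered = 0;
        for (Consumer<T> subscriber : subscribers) {
            subscriber.accept(message);
            delivered++;
        }
        return delivered;
    }

    public static void main(String[] args) {
        InMemoryTopic<String> topic = new InMemoryTopic<>();
        topic.subscribe(m -> System.out.println("subscriber A got " + m));
        topic.subscribe(m -> System.out.println("subscriber B got " + m));
        int delivered = topic.publish("price-update-1"); // written once, delivered to both
        System.out.println("deliveries: " + delivered);
        // A subscriber that joins later does not see the earlier message.
        topic.subscribe(m -> System.out.println("subscriber C got " + m));
    }
}
```

A subscriber registered after `publish` has been called never sees that message, which is why content that clients need to read repeatedly fits a cache better than a topic.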

As you can see, there are many possible ways to handle large data in JBoss Fuse. Thanks to its flexibility, you have the freedom to choose the right strategy for each situation, and there are many more options and combinations available. What is your approach when dealing with large data? I am curious about all the clever ways people solve this problem, so let me know!


Popular posts from this blog

JBoss EAP 6 - Performance Tuning (Part 1): DataSource Connection Pool

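There is no real best practice for performance; the settings you can tune are the same few anyway. Usually, about 70-80% of an application's performance actually comes down to how the code is written.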

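My beloved cat Puji recently passed away during surgery for bladder stones. I'm feeling down, so please forgive my tone; I haven't been in the mood to write the blog either.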

Puji R.I.P.



Areas to tune in JBoss: the subsystems (Datasource, Web, Web Service, EJB, Hibernate, JMS, JCA), JVM tuning, and the OS (operating system).

First, let's look at the DataSource subsystem. DataSource tuning is mainly about the connection pool.

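Applications usually need to talk to a database. Computation on the local machine, especially in memory, is very fast, but connecting to an external resource is very expensive. That is why application servers keep a pool of pre-opened database connections: when the application needs one, it can be handed out immediately, without spending extra resources to connect to the database.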

This is why the connection pool is worth tuning.

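The parameters discussed below are the ones most closely related to performance. A datasource has many other parameters, such as those that check whether a connection is valid, which I will not cover. If you are after the fastest possible performance, I suggest not adding any validation at all. Of course, this brings risk to the applications running on the server, so you have to make a trade-off between performance on one side and correctness and safety on the other. (As a friend of mine puts it, you can't expect the horse to run well without letting it eat.)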

The most important tuning parameter is the size of the connection pool, that is, how many connections the pool holds. This value is different for every application.


The minimum number of connections the connection pool keeps.


The maximum number of connections the connection pool can open.


Creates min-pool-size connections in the connection pool in advance.

My suggestion is to observe how many connections the application normally uses and set min-pool-size to that number.
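To show how the minimum pool size, the maximum pool size, and pre-creating connections interact, here is a minimal generic pool sketch in Java. It is not the JBoss EAP datasource pool; the class `SimplePool`, its `prefill` flag, and the `Supplier` standing in for "open a database connection" are assumptions for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

/** Hypothetical connection pool sketch illustrating min size, max size and prefill. */
class SimplePool<C> {
    private final Deque<C> idle = new ArrayDeque<>();
    private final Supplier<C> factory; // stands in for "open a new database connection" (expensive)
    private final int maxPoolSize;
    private int created = 0;           // total connections opened so far

    SimplePool(Supplier<C> factory, int minPoolSize, int maxPoolSize, boolean prefill) {
        if (maxPoolSize < 1 || minPoolSize < 0 || minPoolSize > maxPoolSize) {
            throw new IllegalArgumentException("require 0 <= min <= max and max >= 1");
        }
        this.factory = factory;
        this.maxPoolSize = maxPoolSize;
        if (prefill) {
            // Pay the connection cost up front so the first requests are served immediately.
            for (int i = 0; i < minPoolSize; i++) {
                idle.push(factory.get());
                created++;
            }
        }
    }

    /** Hands out an idle connection, opens a new one while under max, or returns null when exhausted. */
    synchronized C borrow() {
        C connection = idle.poll();
        if (connection != null) {
            return connection;
        }
        if (created < maxPoolSize) {
            created++;
            return factory.get();
        }
        return null; // a real pool would make the caller wait up to a timeout here
    }

    /** Returns a connection to the pool for reuse instead of closing it. */
    synchronized void release(C connection) {
        idle.push(connection);
    }

    synchronized int createdCount() {
        return created;
    }

    public static void main(String[] args) {
        int[] opened = {0};
        // Each "connection" is just a number; opening one increments the counter.
        SimplePool<Integer> pool = new SimplePool<>(() -> ++opened[0], 2, 3, true);
        System.out.println("opened at startup: " + pool.createdCount()); // 2 because of prefill
        Integer a = pool.borrow();
        Integer b = pool.borrow();
        Integer c = pool.borrow(); // pool opens a third connection on demand
        System.out.println("opened after three borrows: " + pool.createdCount()); // 3
        System.out.println("fourth borrow: " + pool.borrow()); // null: maximum reached
        pool.release(a);
        System.out.println("borrow after release: " + pool.borrow());
    }
}
```

For simplicity the sketch returns `null` when the pool is exhausted and never closes idle connections; a real pool would block the caller up to a timeout and manage idle connections over time.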

Red Hat JBoss Fuse/A-MQ - Fuse and A-MQ Version 6.3 GA is released!

Fuse and A-MQ 6.3 GA has just been released. You might think this is only a minor version release, so why should you care? Hold that thought! They have made a lot of improvements and added many new features in this release.

Besides various bug fixes and making Fuse Fabric much more stable, there are two major changes in this version:

  • New tooling in JBoss Developer Studio (JBDS) 9.1 GA
  • A newer Apache Camel version: Camel v2.17

I was really impressed by the work put into making Camel application development much simpler. First is the installation of the tooling itself: it now has an all-in-one installer, so you don't need to worry about which plugins to check. See the videos below for the new "Getting Started" experience in Fuse 6.3.

If you noticed in the video above, the presentation of Camel routes in JBDS has also been updated. It fixes some misrepresentations of the route logic and makes routes easier to read.

Old Camel Route
New Camel Route
On …

Red Hat JBoss Fuse - Getting Started with Fuse Integration Service 2.0 Tech preview

I just realized that I never wrote a getting-started post for the Fuse Integration Service 2.0 Tech Preview before doing the pipeline demo. Thanks to those of you who reminded me! :)

To get started with FIS 2.0, for people who are just getting to know the technology, here is how I interpret it. Basically, it is divided into two aspects:

1. Integration development: FIS uses Apache Camel as the core technology to create, orchestrate, and compose microservices into a super-lightweight, thin integration layer that becomes the API provider and service orchestrator by exposing RESTful or messaging service endpoints. You can choose to package and run it with either Spring Boot or Karaf.

2. Application deployment and management: FIS takes advantage of the OpenShift platform, allowing you to deploy micro-integration services separately across a distributed environment, while it takes care of failover, high availability, load balancing, and service lookup for you.

So, now we know …