2014-12-04 21:02:12

Yesterday I ran into a strange bug: programmatically generated tar archives of a Lucene index were being corrupted during creation. The process finished normally, but occasionally the resulting archive was broken. It turned out that someone had (probably with good reason, once upon a time) written our own tar packaging module in Java, based on the Apache Ant tar task.

The problem with the original source code is that the Ant tar task is limited to an individual file size of 8 GByte, although the resulting archive may be far larger. This was fixed some time ago in Apache Commons Compress 1.4, but you have to use one of the GNU tar formats that support unlimited file sizes.
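To make the 8 GByte figure concrete, here is a minimal sketch of where it comes from, assuming the classic tar header layout in which the file size is stored as 11 octal digits. The class and method names are made up for illustration and are not taken from our module or from Ant.

```java
// Hypothetical helper, for illustration only: shows why a classic tar header
// cannot describe a file of 8 GiB or more.
final class UstarSizeLimit {

    // Assumption: the size field holds 11 octal digits (3 bits each)
    // followed by a terminator byte.
    static final int SIZE_FIELD_DIGITS = 11;

    // Largest representable size: 8^11 - 1 bytes, one byte short of 8 GiB.
    static final long MAX_USTAR_FILE_SIZE = (1L << (3 * SIZE_FIELD_DIGITS)) - 1;

    // True if a file of this size can be described by a classic tar header.
    static boolean fitsInUstarHeader(long sizeInBytes) {
        return sizeInBytes >= 0 && sizeInBytes <= MAX_USTAR_FILE_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(MAX_USTAR_FILE_SIZE);                     // 8589934591
        System.out.println(Long.toOctalString(MAX_USTAR_FILE_SIZE)); // 77777777777

        // A 25 GiB segment file, like the ones we saw, does not fit.
        long segmentSize = 25L * 1024 * 1024 * 1024;
        System.out.println(fitsInUstarHeader(segmentSize));          // false
    }
}
```

A check like this, run before a file is added to the archive, would at least turn the silent corruption into an explicit error.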

The workaround for now seems to be to use the default Lucene indexRamBufferSize setting of 16 MByte, so that the segment files stay below the 8 GByte limit instead of growing to 25 GByte. A change of the compression module (preferably to standard open source components instead of a homebrew version) is already planned for the near future.