A semi-common data processing task is to take the result of a query and write it to a file. People usually do this in ColdFusion something like this:
This works fine for small result sets. In my case, however, I Cartesian-joined the Northwind "Products" table to itself a few times, so that this query returns 35 million rows; the page above runs for ~3 minutes on my laptop and then dies with a java.lang.OutOfMemoryError.
I'd previously written a set of ColdFusion tags that could access JDBC directly from ColdFusion and process the resultset one row at a time, thereby avoiding out-of-memory errors:
This ran for 20mins (until it hit the timeout),writing about 37MB to the output file. Still... 20 minutes? I think we can do better.
So, I wrote a Java class to do this in a very small loop, to see the difference in speed. The Java source file ("FastResultsetToFile.java") is this:
To compile this, save the FastResultsetToFile.java to disk, then type "javac FastResultsetToFile.java". This will produce a file called FastResultsetToFile.class. Depending on how you have ColdFusion installed, the file should be placed in either:
\coldfusionmx\wwwroot\WEB-INF\classes
or
\JRun4\servers\cfusion\cfusion-ear\cfusion-war\WEB-INF\classes
The ColdFusion page looks like this:
This completes, writing all 35,153,041 rows into a 1.3 GB file in about 5 minutes. Not bad! If we'd let cf_jdbcQueryLoop complete, it'd have taken nearly 12 hours (based on time and file size ratio).
<cfsetting requesttimeout="1200" enablecfoutputonly="yes"> <cfset tab = chr(9)> <cfquery datasource="Northwind" name="qProducts"> SELECT p1.ProductID, p1.ProductName, case when p2.productname> 'B' then p2.productName else null end productName2 FROM Products p1 CROSS JOIN Products p2 CROSS JOIN Products p3 CROSS JOIN Products p4 WHERE p1.ProductName <> 'foo' ORDER BY p1.ProductID </cfquery> <cfloop query="qProducts"> <cfset line = ProductID & tab & ProductName & tab & ProductName2> <cffile action="append" file="/out.txt" output="#line#" addNewLine="yes" > </cfloop> <cfsetting enablecfoutputonly="no">
This works fine for small result sets. In my case, however, I Cartesian-joined the Northwind "Products" table to itself a few times, so that this query returns 35 million rows; the page above runs for ~3 minutes on my laptop and then dies with a java.lang.OutOfMemoryError.
I'd previously written a set of ColdFusion tags that could access JDBC directly from ColdFusion and process the resultset one row at a time, thereby avoiding out-of-memory errors:
<cfsetting requesttimeout="1200" enablecfoutputonly="yes"> <cfset tab = chr(9)> <!--- cf_jdbcQuery will return a java ResultSet object instead of a ColdFusion Query object ---> <cf_jdbcquery datasource="Northwind" name="qProducts"><cfoutput> SELECT p1.ProductID, p1.ProductName, case when p2.productname> 'B' then p2.productName else null end productName2 FROM Products p1 CROSS JOIN Products p2 CROSS JOIN Products p3 CROSS JOIN Products p4 WHERE p1.ProductName <> 'foo' ORDER BY p1.ProductID </cfoutput></cf_jdbcquery> <cf_jdbcqueryloop query="qProducts"> <cfset line = qProducts.ProductID & tab & qProducts.ProductName & tab & qProducts.ProductName2> <cffile action="append" file="/out.txt" output="#line#" addNewLine="yes" > </cf_jdbcqueryloop> <cfsetting enablecfoutputonly="no">
This ran for 20mins (until it hit the timeout),writing about 37MB to the output file. Still... 20 minutes? I think we can do better.
So, I wrote a Java class to do this in a very small loop, to see the difference in speed. The Java source file ("FastResultsetToFile.java") is this:
import java.io.BufferedOutputStream; import java.io.File; import java.io.FileOutputStream; import java.io.IOException; import java.sql.ResultSet; import java.sql.ResultSetMetaData; import java.sql.SQLException; /** * @author Daryl Banttari * Released to the public domain. Do with it what you will! */ public class FastResultsetToFile { public static final byte[] CRLF = { 13, 10 }; /** * Very quickly sends data from an any size query to an output file. * * @throws IOException * @throws SQLException */ public static int exportResultsetToFile(ResultSet rs, String fileName, String delim, String charset) throws IOException, SQLException { int rowcount = 0; File file = new File(fileName); byte[] delimBytes = delim.getBytes(charset); byte[] CRLF = "\r\n".getBytes(charset); BufferedOutputStream out = new BufferedOutputStream(new FileOutputStream(file)); try { ResultSetMetaData rsmd = rs.getMetaData(); int numCols = rsmd.getColumnCount(); // iterate over resultset while (rs.next()) { // write each column for (int i = 1; i <= numCols; ++i) { String s = rs.getString(i); if (s != null) { out.write(s.getBytes(charset)); } if (i == numCols) { // end of row, write CRLF out.write(CRLF); } else { // write delimiter between data out.write(delimBytes); } } rowcount++; } } finally { if (out != null) { out.flush(); out.close(); } if (rs != null) { rs.close(); } } return rowcount; } }
To compile this, save the FastResultsetToFile.java to disk, then type "javac FastResultsetToFile.java". This will produce a file called FastResultsetToFile.class. Depending on how you have ColdFusion installed, the file should be placed in either:
\coldfusionmx\wwwroot\WEB-INF\classes
or
\JRun4\servers\cfusion\cfusion-ear\cfusion-war\WEB-INF\classes
The ColdFusion page looks like this:
<!--- cf_jdbcQuery will return a java ResultSet object instead of a ColdFusion Query object ---> <cf_jdbcQuery datasource="Northwind" name="qProducts"> SELECT p1.ProductID, p1.ProductName, case when p2.productname> 'B' then p2.productName else null end productName2 FROM Products p1 CROSS JOIN Products p2 CROSS JOIN Products p3 CROSS JOIN Products p4 WHERE p1.ProductName <> 'foo' ORDER BY p1.ProductID </cf_jdbcQuery> <cfset application.fastFileWriter = createObject("java", "FastResultsetToFile")> <cfset application.fastFileWriter.exportResultsetToFile(qProducts, "/out.txt", chr(9), "UTF-8")> Done.
This completes, writing all 35,153,041 rows into a 1.3 GB file in about 5 minutes. Not bad! If we'd let cf_jdbcQueryLoop complete, it'd have taken nearly 12 hours (based on time and file size ratio).
While you can do a lot right inside ColdFusion, it's important to remember that it's not the right tool for every job.
nice blog...
ReplyDeletevisit also coldfusion example
Crazy question,
ReplyDeleteWhy are you writing to the disk after every loop?
Great tidbit post here Daryl. Would it be faster if you used a Java StringBuffer to build a few hundred rows before writing to disk? I think it would be faster to do that versus writing to disk on each column value. It would be an interesting follow up to this blog post to explore that.
ReplyDeleteI'm using a BufferedOutputStream which serves the same purpose as your StringBuffer suggestion would, but at a lower layer.
ReplyDelete@Hem,
The Java example is not writing to disk after every loop because it's using the BufferedOutputStream; the ColdFusion example is writing on every iteration for the sake of code clarity. If I were buffering output, it'd be noticeably faster, but still not nearly as fast as the Java example.
Daryl, I should have noticed that you used the BufferedOutputStream -- I skipped right over it. Do you think it would be more efficient if the buffer size was set higher than 512 bytes which is the default if you don't pass one in the constructor? I know that the flush() method can be a little expensive and I assume that is being called each time internally when the buffer hits the threshold.
ReplyDeleteI was playing with your jdbcQuery tags, and I noticed in the jdbcQueryTest.cfm file you sleep for 4 seconds, and I was wondering why you were doing that.
ReplyDeleteOne other thing for you...I'm having an issue with connections not closing.
ReplyDeleteI see that you're calling cnx.close() in a cfcatch, but I don't see it being called anywhere else. Just thought I'd let you know.
This comment has been removed by a blog administrator.
ReplyDeleteDaryl, this is awesome! and so handy since the built in CF options are so slow... I've been using this for a while now! thanks for putting this together. I'm not a java programmer, but I add section to put in the column names. here's the code, thanks again.
ReplyDeleteimport java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
/**
* @author Daryl Banttari
* Released to the public domain. Do with it what you will!
*/
public class cd\
{
public static final byte[] CRLF = { 13, 10 };
/**
* Very quickly sends data from an any size query to an output file.
*
* @throws IOException
* @throws SQLException
*/
public static int exportResultsetToFile(ResultSet rs, String fileName, String delim, String charset) throws IOException, SQLException {
int rowcount = 0;
File file = new File(fileName);
byte[] delimBytes = delim.getBytes(charset);
byte[] CRLF = "\r\n".getBytes(charset);
BufferedOutputStream out = new BufferedOutputStream(new FileOutputStream(file));
try {
ResultSetMetaData rsmd = rs.getMetaData();
int numCols = rsmd.getColumnCount();
// iterate over resultset
while (rs.next()) {
//write the column labels
if (rowcount == 0){
for (int i = 1; i <= numCols; i++ ) {
String col = rsmd.getColumnName(i);
if (col != null) {
out.write(col.getBytes(charset));
}
if (i == numCols) {
// end of row, write CRLF
out.write(CRLF);
}
else {
// write delimiter between data
out.write(delimBytes);
}
}
}
// write each column
for (int i = 1; i <= numCols; ++i) {
String s = rs.getString(i);
if (s != null) {
out.write(s.getBytes(charset));
}
if (i == numCols) {
// end of row, write CRLF
out.write(CRLF);
}
else {
// write delimiter between data
out.write(delimBytes);
}
}
rowcount++;
}
}
finally {
if (out != null) {
out.flush();
out.close();
}
if (rs != null) {
rs.close();
}
}
return rowcount;
}
}
This used to work fine but now that I upgraded to ColdFusion 2016 is breaking. I have no idea why. Any help would be really appreciated!!
ReplyDelete@Arjan: Are you getting an error? What's happening when it doesn't work?
ReplyDelete