Attempt to write an excel file with Apache POI causing OutOfMemoryError

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • blazedaces
    Contributor
    • May 2007
    • 284

    Attempt to write an excel file with Apache POI causing OutOfMemoryError

    So I have a program below which writes an excel file with multiple sheets based on inputs of sheet names, data, cell types, etc. It uses Apache POI, which is currently the only thing I found (correct me if I'm wrong about this) that can write excel 2007 files. The program works. I've tested it out on very small files (I will include the commented out main method that you can use to test this out yourself if you'd like, that is assuming you download apache poi and all it's required .jar files too, mostly bean stuff).

    Now I'm trying to implement it, but obviously with a bit more data than when I tested it out. It runs out of memory it seems. Here's the stack trace below:

    [code=java]Exception in thread "main" java.lang.OutOf MemoryError: Java heap space
    at org.apache.xmlb eans.impl.store .Xobj$ElementXo bj.newNode(Xobj .java:2710)
    at org.apache.xmlb eans.impl.store .Xobj.copyNode( Xobj.java:1099)
    at org.apache.xmlb eans.impl.store .Xobj.copy_cont ents_from(Xobj. java:2319)
    at org.apache.xmlb eans.impl.value s.XmlObjectBase .setterHelper(X mlObjectBase.ja va:1907)
    at org.apache.xmlb eans.impl.value s.XmlObjectBase .set(XmlObjectB ase.java:1954)
    at org.apache.xmlb eans.impl.value s.XmlComplexCon tentImpl.arrayS etterHelper(Xml ComplexContentI mpl.java:1162)
    at org.openxmlform ats.schemas.spr eadsheetml.x200 6.main.impl.CTR owImpl.setCArra y(Unknown Source)
    at org.apache.poi. xssf.usermodel. XSSFRow.onDocum entWrite(XSSFRo w.java:392)
    at org.apache.poi. xssf.usermodel. XSSFSheet.write (XSSFSheet.java :1736)
    at org.apache.poi. xssf.usermodel. XSSFSheet.commi t(XSSFSheet.jav a:1701)
    at org.apache.poi. POIXMLDocumentP art.onSave(POIX MLDocumentPart. java:171)
    at org.apache.poi. POIXMLDocumentP art.onSave(POIX MLDocumentPart. java:173)
    at org.apache.poi. POIXMLDocument. write(POIXMLDoc ument.java:199)
    at Utils.Utilities .createExcelFil e(Utilities.jav a:660)
    at ReadAndSpitOutS pecificExcelDat a.makeExcelShee tWithAllTheCont rolGroupData(Re adAndSpitOutSpe cificExcelData. java:400)
    at ReadAndSpitOutS pecificExcelDat a.main(ReadAndS pitOutSpecificE xcelData.java:1 50)[/code]

    The error occurs (I'll point out where the error occurs when I post the code further below) at the line where I write "workbook.write (fileOut)" where fileOut is a FileOutputStrea m. This seems so odd to me. That implies that there's obviously enough memory for ALL of the java objects to store the excel file, but for some reason as it's writing to the hard disk it must grab a whole lot more memory.

    Just to tell you, I have attempted to increase the java heap size up until 1 gig (by adding -Xms128m -Xmx1024m) but that still doesn't seem to do the trick. And the thing is, this file when it's finished will probably barely be 1 mb, maybe 2... So I just don't understand what's going on here.

    Now, my solution to this problem unless someone can help me figure it out, will be to just write separate excel files and then combine the sheets manually. I assume that'll work, though I could be mistaken.

    Anyway, here's the code below:

    [code=java]
    //The following method makes an excel file with a bunch of sheets, each sheet containing some data, sort of all like the
    //method above but for multiple sheets
    public static void createExcelFile (ArrayList<Arra yList<ArrayList >> allSheetsData, ArrayList<Array List<String>> allSheetsHeader s, int[][] allSheetsCellTy pes, ArrayList<Strin g> sheetNames,Stri ng filePath) throws IndexOutOfBound sException {
    if(allSheetsDat a.size() != allSheetsHeader s.size() && allSheetsHeader s.size() != allSheetsCellTy pes.length && allSheetsCellTy pes.length != sheetNames.size ())
    throw new IndexOutOfBound sException("The number of sheets does not match the number of sheet names?");

    ArrayList<Array List> data;
    ArrayList<Strin g> headers;
    int[] cellTypes;
    String sheetName;

    File file = new File(filePath);
    XSSFWorkbook workbook = new XSSFWorkbook();
    XSSFSheet sheet;
    XSSFRow currentRow = null;
    XSSFCell currentCell = null;

    try {
    FileOutputStrea m fileOut = new FileOutputStrea m(file);
    try {
    for (int currentSheetNum = 0; currentSheetNum < allSheetsData.s ize(); currentSheetNum ++) {
    data = allSheetsData.g et(currentSheet Num);
    headers = allSheetsHeader s.get(currentSh eetNum);
    cellTypes = allSheetsCellTy pes[currentSheetNum];
    sheetName = sheetNames.get( currentSheetNum );

    if(data.size() != headers.size() && headers.size() != cellTypes.lengt h)
    throw new IndexOutOfBound sException("The number of data ArrayLists is not equal to the number of headers in sheet " + sheetName);

    sheet = workbook.create Sheet(sheetName );

    currentRow = sheet.createRow (0);
    for(int cellNum = 0; cellNum < headers.size(); cellNum++) {
    currentCell = currentRow.crea teCell(cellNum, XSSFCell.CELL_T YPE_STRING);
    currentCell.set CellValue(heade rs.get(cellNum) );
    }

    for(int rowNum = 0; rowNum < getSizeOfLonges tArrayList(data ); rowNum++) {
    currentRow = sheet.createRow (rowNum + 1);

    for(int cellNum = 0; cellNum < headers.size(); cellNum++) {
    if (rowNum < data.get(cellNu m).size()) {
    if(cellTypes[cellNum] == XSSFCell.CELL_T YPE_NUMERIC) {
    currentCell = currentRow.crea teCell(cellNum, cellTypes[cellNum]);
    currentCell.set CellValue(((Dou ble) data.get(cellNu m).get(rowNum)) .doubleValue()) ;
    } else if (cellTypes[cellNum] ==XSSFCell.CELL _TYPE_STRING) {
    currentCell = currentRow.crea teCell(cellNum, cellTypes[cellNum]);
    currentCell.set CellValue((Stri ng) data.get(cellNu m).get(rowNum)) ;
    }
    }
    }
    }
    }

    workbook.write( fileOut); //ERROR HAPPENS HERE!
    } finally {
    if (fileOut != null)
    fileOut.close() ;
    }
    } catch (FileNotFoundEx ception fnfex) {
    fnfex.printStac kTrace();
    } catch (IOException ioex) {
    ioex.printStack Trace();
    }
    }
    [/code]

    And here's the main method I used to test it out:

    [code=java]
    // public static void main(String args[]) {
    // ArrayList<Array List<ArrayList> > twoSheetsData = new ArrayList<Array List<ArrayList> >(2);
    // ArrayList<Array List<String>> allSheetsHeader s = new ArrayList<Array List<String>>(2 );
    // int[][] allSheetsCellTy pes = new int[2][];
    // ArrayList<Strin g> sheetNames = new ArrayList<Strin g>(2);
    // sheetNames.add( "Sheet One");
    // sheetNames.add( "Sheet Two");
    // String filePath = "C:/Documents and Settings/Asaf/Desktop/testExcelFile.x lsx";
    //
    // ArrayList<Array List> testListForShee t1 = new ArrayList<Array List>(3);
    //
    // ArrayList<Strin g> stringList1ForS heet1 = new ArrayList<Strin g>(4);
    // stringList1ForS heet1.add("one" );
    // stringList1ForS heet1.add("two" );
    // stringList1ForS heet1.add("thre e");
    // stringList1ForS heet1.add("four ");
    //
    // ArrayList<Doubl e> doubleListForSh eet1 = new ArrayList<Doubl e>(3);
    // doubleListForSh eet1.add(new Double(1));
    // doubleListForSh eet1.add(new Double(2));
    // doubleListForSh eet1.add(new Double(3));
    //
    // ArrayList<Strin g> stringList2ForS heet1 = new ArrayList<Strin g>(5);
    // stringList2ForS heet1.add("seco nd one");
    // stringList2ForS heet1.add("seco nd two");
    // stringList2ForS heet1.add("seco nd three");
    // stringList2ForS heet1.add("seco nd four");
    // stringList2ForS heet1.add("seco nd five");
    //
    // ArrayList<Strin g> headerListForSh eet1 = new ArrayList<Strin g>(3);
    // headerListForSh eet1.add("First strings");
    // headerListForSh eet1.add("Doubl es");
    // headerListForSh eet1.add("Secon d strings");
    //
    // testListForShee t1.add(stringLi st1ForSheet1);
    // testListForShee t1.add(doubleLi stForSheet1);
    // testListForShee t1.add(stringLi st2ForSheet1);
    //
    // int[] cellTypesForShe et1 = { XSSFCell.CELL_T YPE_STRING, XSSFCell.CELL_T YPE_NUMERIC, XSSFCell.CELL_T YPE_STRING };
    //
    // ArrayList<Array List> testListForShee t2 = new ArrayList<Array List>(3);
    //
    // ArrayList<Strin g> stringList1ForS heet2 = new ArrayList<Strin g>(4);
    // stringList1ForS heet2.add("five ");
    // stringList1ForS heet2.add("six" );
    // stringList1ForS heet2.add("seve n");
    // stringList1ForS heet2.add("eigh t");
    //
    // ArrayList<Doubl e> doubleListForSh eet2 = new ArrayList<Doubl e>(3);
    // doubleListForSh eet2.add(new Double(4));
    // doubleListForSh eet2.add(new Double(5));
    // doubleListForSh eet2.add(new Double(6));
    //
    // ArrayList<Strin g> stringList2ForS heet2 = new ArrayList<Strin g>(5);
    // stringList2ForS heet2.add("seco nd six");
    // stringList2ForS heet2.add("seco nd seven");
    // stringList2ForS heet2.add("seco nd eight");
    // stringList2ForS heet2.add("seco nd nine");
    // stringList2ForS heet2.add("seco nd ten");
    //
    // ArrayList<Strin g> headerListForSh eet2 = new ArrayList<Strin g>(3);
    // headerListForSh eet2.add("First strings");
    // headerListForSh eet2.add("Doubl es");
    // headerListForSh eet2.add("Secon d strings");
    //
    // testListForShee t2.add(stringLi st1ForSheet2);
    // testListForShee t2.add(doubleLi stForSheet2);
    // testListForShee t2.add(stringLi st2ForSheet2);
    //
    // int[] cellTypesForShe et2 = { XSSFCell.CELL_T YPE_STRING, XSSFCell.CELL_T YPE_NUMERIC, XSSFCell.CELL_T YPE_STRING };
    //
    // twoSheetsData.a dd(testListForS heet1);
    // twoSheetsData.a dd(testListForS heet2);
    //
    // allSheetsHeader s.add(headerLis tForSheet1);
    // allSheetsHeader s.add(headerLis tForSheet2);
    //
    // allSheetsCellTy pes[0] = cellTypesForShe et1;
    // allSheetsCellTy pes[1] = cellTypesForShe et2;
    //
    // createExcelFile (twoSheetsData, allSheetsHeader s, allSheetsCellTy pes, sheetNames, filePath);
    // }
    [/code]

    Anyway, if you can help me out with this and explain to me either what I'm doing wrong, or what is going on, I would appreciate it. Thank you very much for your time and help.

    -blazed
  • pronerd
    Recognized Expert Contributor
    • Nov 2006
    • 392

    #2
    Originally posted by blazedaces
    uses Apache POI, which is currently the only thing I found (correct me if I'm wrong about this) that can write excel 2007 files.
    That is the only option I know of for Java to create the acutal files. Some other options are to export the data in CSV format and import it to Excel, or for Office 2003 or newer you can save the in an XHTML table and Excel can open that just like a normal Excel file. For this option to work the XHTML has to be well formed, and there is some meta data tweaking that has to be done. To see an example just save a normal Excel file as HTML and look at the file it generates.




    Originally posted by blazedaces
    This seems so odd to me. That implies that there's obviously enough memory for ALL of the java objects to store the excel file, but for some reason as it's writing to the hard disk it must grab a whole lot more memory.
    I would bet that it is holding your data in collection API objects, and does not start to add all of the Excel overhead until it starts to write out the file. So at that point it is holding all of the data, and then has to open a buffer to hold all of the data with the file overhead basically more than doubling the RAM being used.




    Originally posted by blazedaces
    I have attempted to increase the java heap size up until 1 gig (by adding -Xms128m -Xmx1024m) but that still doesn't seem to do the trick.
    Are you completely sure this setting is actually being made? I have seen it happen several times where scripts are bing used to start and stop the app server that do not pass the attributes correctly so the extra config options (like Xmx) are never being set in the JVM. You might want to use JStat to verify the memory settings are actually being made.

    Also how many rows and data cells are you adding to these files? 128MB should be enough unless there is a massive amount of data... i.e.m More than you can really put in a spreadsheet. Just a note Office 2003 could only handle 65k rows, and earlier version could only do 16k. So if you are going over that the files created will be un-readable any way.

    The only other idea that comes to mind would be to try using something like a BufferedOutputS tream or BufferedWriter which might use memory more effectively when creating the file.

    Comment

    • pronerd
      Recognized Expert Contributor
      • Nov 2006
      • 392

      #3
      P.S. I would also try adding the line below just before the write() method call to make sure your heap size settings are actually being made. You might also try using a capital 'M' instead of lower case for setting the Xmx value. I see to remember seeing it that way in examples some where.

      Comment

      • blazedaces
        Contributor
        • May 2007
        • 284

        #4
        I ended up doing what I suggested earlier, which is instead of creating one file with multiple sheets just create a few files for each sheet separately. This was successful. Each file was approximately 16kb in size.

        The largest file (which would be the largest sheet) was 20kb, used 745 rows, and up to the S column. That's what's so odd. If I was creating an excel sheet with a ridiculous amount of data that would be one thing, but that's simply not the case.

        I will take your tip and run this thing with JStat to check that the xmx option is functioning as it should. You're correct in that I frustratingly realized I didn't really know if this option was doing anything.

        I would run it right now, but like I said I went ahead without it and I'm in the middle of changing a lot of code to include more data (now that I can create the excel sheets correctly). I'll post as soon as I can again.

        Thank you so much for your help,
        -blazed

        Comment

        • N002213F
          New Member
          • Sep 2007
          • 38

          #5
          I have an app that opens an excelsheet about 4MB and does some nice business logic and save into another excel sheet. Checking from file sizes you mention, creating individual files is simply running away from the actual problem. I think you code needs major reworking to improve on efficient use of memory.

          Comment

          • blazedaces
            Contributor
            • May 2007
            • 284

            #6
            Originally posted by N002213F
            I have an app that opens an excelsheet about 4MB and does some nice business logic and save into another excel sheet. Checking from file sizes you mention, creating individual files is simply running away from the actual problem. I think you code needs major reworking to improve on efficient use of memory.
            I'm not saying the problem isn't in my code... but I'm showing you my code. Where is the problem?

            -blazed

            Comment

            • r035198x
              MVP
              • Sep 2006
              • 13225

              #7
              Shouldn't you be using HSSFWorkbook instead?

              Comment

              • blazedaces
                Contributor
                • May 2007
                • 284

                #8
                Originally posted by r035198x
                Shouldn't you be using HSSFWorkbook instead?
                HSSFWorkbook is for excel files with the .xls extension (Excel 97-2003 documents). XSSFWorkbook supposedly acts the same way for Excel 2007 documents that use the .xlsx extension.

                No, I don't know enough about it to explain it thoroughly as the current examples on XSSFWorkbooks is minor at best. I had to basically go back and forth looking at examples of work done using HSSFWorkbook and HSSFRows, cells, etc. and looking at the XSSFWorkbook and other XSSF API to figure out how to write my code... And some trial and error as well.

                Let me tell you though that obtaining data from excel was the harder part of this "project" if you would call it that. I can give you more details if you want, but I don't think it's very relevant.

                Again, thank you guys for your help.

                -blazed

                Comment

                • daltondecamargo
                  New Member
                  • Aug 2010
                  • 1

                  #9
                  Hello guys, I have registered at this forum only for post this link that I've found.




                  Regards,

                  Dalton Camargo

                  Comment

                  Working...