java - POI XSSF / XLSX hashing indeterminism with MessageDigest SHA-256


There seems to be a problem getting deterministic hash values for the POI XLSX format with the MessageDigest SHA-256 implementation, even for empty workbooks written to ByteArray streams. It happens randomly, after several hundred or several thousand iterations.

The relevant code snippets used to reproduce the problem:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.security.NoSuchAlgorithmException;

import org.testng.Assert;
import org.testng.annotations.Test;

// TestNG FileTest:
@Test(enabled = true) // indeterminism at random iterations, such as 400 or 1290
public void emptyXlsxTest() throws IOException, NoSuchAlgorithmException {
    final Hasher hasher = new HasherImpl();
    boolean differentSha256Hash = false;
    for (int i = 0; i < 10000; i++) {
        final ByteArrayOutputStream excelAdHoc1 = BusinessPlanInMemory.getEmptyExcel("xlsx");
        final ByteArrayOutputStream excelAdHoc2 = BusinessPlanInMemory.getEmptyExcel("xlsx");

        // sha256 takes an InputStream, so wrap the byte arrays
        byte[] expectedByteArray = excelAdHoc1.toByteArray();
        String expectedSha256 = hasher.sha256(new ByteArrayInputStream(expectedByteArray));
        byte[] actualByteArray = excelAdHoc2.toByteArray();
        String actualSha256 = hasher.sha256(new ByteArrayInputStream(actualByteArray));

        if (!expectedSha256.equals(actualSha256)) {
            differentSha256Hash = true;
            System.out.println("iteration: " + i);
            System.out.println("expected hash: " + expectedSha256);
            System.out.println("actual hash: " + actualSha256);
            break;
        }
    }
    Assert.assertTrue(differentSha256Hash, "indeterminism did not occur");
}
```

The referenced Hasher and POI code:

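```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// HasherImpl class:
public String sha256(final InputStream stream) throws IOException, NoSuchAlgorithmException {
    final MessageDigest digest = MessageDigest.getInstance("SHA-256");
    final byte[] bytesBuffer = new byte[300000];

    int bytesRead = -1;
    // Feed the stream into the digest chunk by chunk
    while ((bytesRead = stream.read(bytesBuffer)) != -1) {
        digest.update(bytesBuffer, 0, bytesRead);
    }
    final byte[] hashedBytes = digest.digest();
    return bytesToHex(hashedBytes); // hex-encoding helper, not shown
}
```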
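The `bytesToHex` helper is not shown in the question. A self-contained sketch of the same streaming hash, assuming `bytesToHex` is a plain lowercase hex encoder (the class name `Sha256Sketch` and the 8 KB buffer are illustrative choices, not from the original), can be used to confirm that hashing identical bytes is itself deterministic:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Sha256Sketch {

    // Streams the input through a SHA-256 MessageDigest in fixed-size chunks.
    public static String sha256(final InputStream stream) throws IOException, NoSuchAlgorithmException {
        final MessageDigest digest = MessageDigest.getInstance("SHA-256");
        final byte[] buffer = new byte[8192];
        int bytesRead;
        while ((bytesRead = stream.read(buffer)) != -1) {
            digest.update(buffer, 0, bytesRead);
        }
        return bytesToHex(digest.digest());
    }

    // Assumed behaviour of the unshown helper: lowercase hex, two characters per byte.
    public static String bytesToHex(final byte[] bytes) {
        final StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (final byte b : bytes) {
            sb.append(String.format("%02x", b)); // Formatter treats negative bytes as unsigned
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        final byte[] data = "abc".getBytes(StandardCharsets.US_ASCII);
        // Prints ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
        System.out.println(sha256(new ByteArrayInputStream(data)));
    }
}
```

Because the digest depends only on the input bytes, any difference between the two hashes in the test has to come from the bytes POI writes, not from `MessageDigest`.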

I tried to eliminate indeterminism caused by the metadata creation time, to no avail:

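```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.poi.POIXMLProperties; // POI 3.x; org.apache.poi.ooxml.POIXMLProperties in POI 4.0+
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

// POI BusinessPlanInMemory helper class:
public static ByteArrayOutputStream getEmptyExcel(final String fileExtension) throws IOException {
    Workbook wb;

    if (fileExtension.equals("xls")) {
        wb = new HSSFWorkbook();
    }
    else {
        wb = new XSSFWorkbook();
        // Try to pin down the core properties that would otherwise carry the creation time
        final POIXMLProperties props = ((XSSFWorkbook) wb).getProperties();
        final POIXMLProperties.CoreProperties coreProp = props.getCoreProperties();
        coreProp.setCreated("");
        coreProp.setIdentifier("1");
        coreProp.setModified("");
    }

    wb.createSheet();

    final ByteArrayOutputStream excelStream = new ByteArrayOutputStream();
    wb.write(excelStream);
    wb.close();
    return excelStream;
}
```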

The HSSF / XLS format does not seem to be affected by the problem described. Does anyone have a clue what is causing this, if it is not a bug in POI itself? Basically, the code above follows the BusinessPlan example at https://poi.apache.org/spreadsheet/examples.html.

Thanks for any input!

This is not a definitive answer, but here is my suspicion of what happens:

The DOCX and XLSX file formats are basically a bunch of zipped-up XML files. This can be seen by renaming them to .zip and opening them with your favorite zip tool.

When examining a file created by Word, I noticed that the modification timestamp of the files contained in the archive is 1980-01-01 00:00:00, while the files in an archive created by POI show the actual time at which the file was created.
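Instead of a zip tool, the entry timestamps can also be read programmatically. A minimal sketch, assuming the XLSX is available as a byte array (as in the question's test); the class name `ZipEntryTimes` is hypothetical. The demo archive in `main` is a stand-in, not POI output; it relies on the documented behaviour of `ZipOutputStream.putNextEntry`, which uses the current time for an entry that has no modification time set:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipEntryTimes {

    // Reads an in-memory ZIP archive (an .xlsx or .docx is one) and maps each
    // entry name to its last-modified time in epoch milliseconds, in archive order.
    public static Map<String, Long> entryTimes(final byte[] zipBytes) throws IOException {
        final Map<String, Long> times = new LinkedHashMap<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                times.put(entry.getName(), entry.getTime());
            }
        }
        return times;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in archive: no time is set on the entry, so ZipOutputStream uses "now".
        final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            out.putNextEntry(new ZipEntry("[Content_Types].xml"));
            out.write("<Types/>".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
        entryTimes(bytes.toByteArray()).forEach((name, time) ->
                System.out.println(name + " -> " + new Date(time)));
    }
}
```

Calling `entryTimes(excelAdHoc1.toByteArray())` on two outputs that hash differently would show whether their entry times actually differ.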

So I suspect the problem occurs when there is a timestamp difference between one or more of the files in `excelAdHoc1` and `excelAdHoc2`. This might happen when the clock switches to the next second while one or the other file is being created.
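To check that a timestamp difference alone is enough to change the hash, here is a small stand-alone sketch (class and method names are hypothetical) that writes the same one-entry archive with `java.util.zip.ZipOutputStream`, once twice with the same entry time and once with times 10 s apart, well beyond the two-second granularity of DOS-style ZIP timestamps:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class TimestampHashDemo {

    // Builds a one-entry ZIP whose entry has fixed content and the given modification time.
    public static byte[] zipWithTime(final long entryTime) throws IOException {
        final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            final ZipEntry entry = new ZipEntry("xl/workbook.xml");
            entry.setTime(entryTime);
            out.putNextEntry(entry);
            out.write("<workbook/>".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
        return bytes.toByteArray();
    }

    // SHA-256 of a byte array as lowercase hex.
    public static String sha256Hex(final byte[] data) throws NoSuchAlgorithmException {
        final StringBuilder sb = new StringBuilder();
        for (final byte b : MessageDigest.getInstance("SHA-256").digest(data)) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        final long t = 1_500_000_000_000L; // arbitrary fixed instant
        // Same content, same timestamp: identical archives, identical hashes.
        System.out.println(sha256Hex(zipWithTime(t)).equals(sha256Hex(zipWithTime(t))));           // true
        // Same content, timestamps 10 s apart: the archives and their hashes differ.
        System.out.println(sha256Hex(zipWithTime(t)).equals(sha256Hex(zipWithTime(t + 10_000L)))); // false
    }
}
```

This only demonstrates the mechanism with plain `java.util.zip`; it does not prove that this is what happens inside POI.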

This does not affect XLS files, since the HSSF format is not of the "zipped XML" type and does not contain nested files that might have different timestamps.

To change the timestamps after writing the file, you could try using the `java.util.zip` package. I haven't tested it, but this should do the trick:

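```java
try (ZipFile file = new ZipFile(pathToFile);
     ZipOutputStream out = new ZipOutputStream(new FileOutputStream(pathToNewFile))) {
    Enumeration<? extends ZipEntry> e = file.entries();
    while (e.hasMoreElements()) {
        ZipEntry entry = e.nextElement();
        ZipEntry copy = new ZipEntry(entry.getName()); // ZipFile entries are read-only, so write copies
        copy.setTime(0L);                               // same fixed timestamp for every entry
        out.putNextEntry(copy);
        file.getInputStream(entry).transferTo(out);     // InputStream.transferTo needs Java 9+
        out.closeEntry();
    }
}
```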
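Entries read from an existing archive cannot be modified in place, so the archive has to be rewritten. Since the test already works on byte arrays, the same idea can be applied in memory before hashing. A minimal sketch under these assumptions: `ZipTimestampNormalizer`, `FIXED_TIME` and the demo helper `sampleZip` are hypothetical names, the fixed instant is arbitrary, and all entries are re-deflated with default settings:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipTimestampNormalizer {

    // Arbitrary fixed instant; any constant works as long as both archives use the same one.
    private static final long FIXED_TIME = 1_500_000_000_000L;

    // Rewrites an in-memory ZIP (e.g. an XLSX) so that every entry carries FIXED_TIME.
    // Entry names, order and uncompressed contents are kept; extra fields are dropped.
    public static byte[] normalize(final byte[] zipBytes) throws IOException {
        final ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zipBytes));
             ZipOutputStream out = new ZipOutputStream(result)) {
            final byte[] buffer = new byte[8192];
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                final ZipEntry copy = new ZipEntry(entry.getName());
                copy.setTime(FIXED_TIME);
                out.putNextEntry(copy);
                int n;
                while ((n = in.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                out.closeEntry();
            }
        }
        return result.toByteArray(); // read after the try block, once the ZIP is finished
    }

    // Demo helper: fixed content, caller-chosen entry time.
    public static byte[] sampleZip(final long entryTime) throws IOException {
        final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            final ZipEntry entry = new ZipEntry("xl/workbook.xml");
            entry.setTime(entryTime);
            out.putNextEntry(entry);
            out.write("<workbook/>".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        final byte[] first = sampleZip(1_600_000_000_000L);
        final byte[] second = sampleZip(1_600_000_010_000L); // 10 s later
        System.out.println(Arrays.equals(first, second));                       // false
        System.out.println(Arrays.equals(normalize(first), normalize(second))); // true
    }
}
```

In the question's test, hashing `normalize(excelAdHoc1.toByteArray())` and `normalize(excelAdHoc2.toByteArray())` instead of the raw bytes should remove entry timestamps as a source of differences, if the suspicion above is right; differences inside the XML parts themselves (for example in docProps/core.xml) would still show up.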
