There seems to be a problem getting deterministic hash values for the POI XLSX format with the MessageDigest SHA-256 implementation, even for empty ByteArray streams. It happens randomly, after several hundreds or even thousands of iterations.
The relevant code snippets used to reproduce the problem:

    // TestNG FileTest:
    @Test(enabled = true) // Indeterminism at random iterations, such as 400 or 1290
    public void emptyXlsxTest() throws IOException, NoSuchAlgorithmException {
        final Hasher hasher = new HasherImpl();
        boolean differentSha256Hash = false;
        for (int i = 0; i < 10000; i++) {
            final ByteArrayOutputStream excelAdHoc1 = BusinessPlanInMemory.getEmptyExcel("xlsx");
            final ByteArrayOutputStream excelAdHoc2 = BusinessPlanInMemory.getEmptyExcel("xlsx");

            byte[] expectedByteArray = excelAdHoc1.toByteArray();
            String expectedSha256 = hasher.sha256(expectedByteArray);
            byte[] actualByteArray = excelAdHoc2.toByteArray();
            String actualSha256 = hasher.sha256(actualByteArray);

            if (!expectedSha256.equals(actualSha256)) {
                differentSha256Hash = true;
                System.out.println("Iteration: " + i);
                System.out.println("Expected hash: " + expectedSha256);
                System.out.println("Actual hash: " + actualSha256);
                break;
            }
        }
        Assert.assertTrue(differentSha256Hash, "Indeterminism did not occur");
    }

The referenced Hasher and POI code:

    // HasherImpl class:
    public String sha256(final InputStream stream) throws IOException, NoSuchAlgorithmException {
        final MessageDigest digest = MessageDigest.getInstance("SHA-256");
        final byte[] bytesBuffer = new byte[300000];
        int bytesRead = -1;
        while ((bytesRead = stream.read(bytesBuffer)) != -1) {
            digest.update(bytesBuffer, 0, bytesRead);
        }
        final byte[] hashedBytes = digest.digest();
        return bytesToHex(hashedBytes);
    }

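The test calls `sha256` with a `byte[]`, while the method shown takes an `InputStream`, and `bytesToHex` is not shown at all. A minimal sketch of what those two pieces might look like follows; the class name `HasherSketch`, the `byte[]` overload and the body of `bytesToHex` are assumptions for illustration, not the actual `HasherImpl` code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical stand-in for HasherImpl; only the InputStream method mirrors the posted code.
public class HasherSketch {

    // Assumed overload: wraps the byte array so both entry points share one code path.
    public static String sha256(final byte[] bytes) throws IOException, NoSuchAlgorithmException {
        return sha256(new ByteArrayInputStream(bytes));
    }

    public static String sha256(final InputStream stream) throws IOException, NoSuchAlgorithmException {
        final MessageDigest digest = MessageDigest.getInstance("SHA-256");
        final byte[] bytesBuffer = new byte[8192];
        int bytesRead;
        while ((bytesRead = stream.read(bytesBuffer)) != -1) {
            digest.update(bytesBuffer, 0, bytesRead);
        }
        return bytesToHex(digest.digest());
    }

    // Assumed helper: lowercase hex, two characters per byte.
    static String bytesToHex(final byte[] bytes) {
        final StringBuilder hex = new StringBuilder(bytes.length * 2);
        for (final byte b : bytes) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }
}
```

Hashing like this is deterministic for identical input bytes, so a differing hash means the two byte arrays themselves differ.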
I tried to eliminate indeterminism due to metadata such as the creation time, but to no avail:

    // POI BusinessPlanInMemory helper class:
    public static ByteArrayOutputStream getEmptyExcel(final String fileExtension) throws IOException {
        Workbook wb;
        if (fileExtension.equals("xls")) {
            wb = new HSSFWorkbook();
        } else {
            wb = new XSSFWorkbook();
            final POIXMLProperties props = ((XSSFWorkbook) wb).getProperties();
            final POIXMLProperties.CoreProperties coreProp = props.getCoreProperties();
            coreProp.setCreated("");
            coreProp.setIdentifier("1");
            coreProp.setModified("");
        }
        wb.createSheet();
        final ByteArrayOutputStream excelStream = new ByteArrayOutputStream();
        wb.write(excelStream);
        wb.close();
        return excelStream;
    }

The HSSF / XLS format does not seem to be affected by the problem described. Does anyone have a clue what could be causing this, if it is not a bug in POI itself? Basically, the code above refers to the BusinessPlan example at https://poi.apache.org/spreadsheet/examples.html
Thanks for your input!
This is not a definitive answer, but my suspicion of what happens:
The docx and xlsx file formats are basically a bunch of zipped-up XML files. This can easily be seen by renaming them to .zip and opening them with your favorite zip tool.
When examining a file created by Word, I noticed that the change timestamps of all files contained in the archive are always 1980-01-01 00:00:00, while in those created with POI they show the actual timestamp at which the file was created.
So I suspect that your problem occurs when there is a timestamp difference between one or more of the files in `excelAdHoc1` and `excelAdHoc2`. This might happen when the clock switches to the next second while creating one or the other file.
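One way to check this suspicion would be to list the entry timestamps of both byte arrays at the iteration where the hashes differ. The following is only a sketch; the class name `ZipEntryTimes` and the method `entryTimes` are made up for illustration and rely on nothing but `java.util.zip`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical diagnostic helper, not part of POI.
public class ZipEntryTimes {

    /** Maps each entry name to its modification time in epoch milliseconds, in archive order. */
    public static Map<String, Long> entryTimes(final byte[] zipBytes) throws IOException {
        final Map<String, Long> times = new LinkedHashMap<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                times.put(entry.getName(), entry.getTime());
            }
        }
        return times;
    }
}
```

Comparing `entryTimes(expectedByteArray)` with `entryTimes(actualByteArray)` in the failing iteration should show whether any entry differs in its timestamp.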
This would not affect xls files, since the HSSF format is not of the "zipped XML" type and thus does not contain any nested files that might have different timestamps.
To change the timestamps after writing the file, you could try using the `java.util.zip` package. I haven't tested it, but this should do the trick:
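
    // Note: setTime only changes the in-memory ZipEntry objects;
    // the archive has to be written out again for the change to reach the file.
    try (ZipFile file = new ZipFile(pathToFile)) {
        Enumeration<? extends ZipEntry> e = file.entries();
        while (e.hasMoreElements()) {
            ZipEntry entry = e.nextElement();
            entry.setTime(0L);
        }
    }
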
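Since the bytes in the question never touch the disk, the same idea can also be applied in memory by copying every entry into a new archive with a fixed timestamp. This is only a sketch under that assumption; the class name `ZipTimestampNormalizer`, the method `normalize` and the choice of `0L` as the fixed time are illustrative, and the copy recompresses the entries, so the result is a different but stable byte sequence.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of POI.
public class ZipTimestampNormalizer {

    // Assumption: any constant works, as long as every entry in every run gets the same one.
    private static final long FIXED_TIME_MILLIS = 0L;

    /** Returns a copy of the archive in which every entry carries the same timestamp. */
    public static byte[] normalize(final byte[] zipBytes) throws IOException {
        final ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zipBytes));
             ZipOutputStream out = new ZipOutputStream(result)) {
            final byte[] buffer = new byte[8192];
            ZipEntry source;
            while ((source = in.getNextEntry()) != null) {
                // A fresh entry keeps only the name, so no other metadata is carried over.
                final ZipEntry target = new ZipEntry(source.getName());
                target.setTime(FIXED_TIME_MILLIS);
                out.putNextEntry(target);
                int read;
                while ((read = in.read(buffer)) != -1) {
                    out.write(buffer, 0, read);
                }
                out.closeEntry();
            }
        }
        // The ZipOutputStream is closed here, so the central directory has been written.
        return result.toByteArray();
    }
}
```

In the test from the question this would mean hashing `normalize(excelAdHoc1.toByteArray())` and `normalize(excelAdHoc2.toByteArray())` instead of the raw arrays. If the hashes still differ after that, the timestamps were not the (only) cause.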