
A Radical Change in the Office File Format Brings Better Integration Between Java and Office

Author: IT168 极地圣火  Editor: 李宁  2007-06-26 14:09

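We save this document as test.docx. Note that it should not be saved in the backward-compatible Word document format, nor in the WordML format used by Office 2003 and later. The resulting file is a compressed zip archive: if you rename test.docx to test.zip and extract it, you get the directory structure shown in Figure 1: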

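The extracted files make the layout of test.docx easy to understand. In Java we can use the java.util.zip package to unpack test.docx. From the directory structure it is easy to guess that the main content of the document is stored in document.xml, while the other XML files hold other kinds of information. For example, font information is stored in fontTable.xml, and the Office theme is stored in theme.xml and theme1.xml.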
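To make the archive layout concrete, the following is a minimal sketch that prints the full path of every entry in a .docx file using only java.util.zip. The class name DocxEntryLister and the method listEntries are hypothetical names chosen for this illustration; the file name test.docx is the one used in the article and is assumed to be in the working directory.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper for illustration; not part of the original article.
public class DocxEntryLister {

    /** Returns the full in-archive path of every entry in the given zip or docx file. */
    public static List<String> listEntries(File archive) throws IOException {
        List<String> names = new ArrayList<>();
        // try-with-resources closes the ZipFile even if reading fails.
        try (ZipFile zip = new ZipFile(archive)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: test.docx is in the working directory unless a path is given.
        File docx = new File(args.length > 0 ? args[0] : "test.docx");
        if (!docx.exists()) {
            System.out.println("File not found: " + docx.getPath());
            return;
        }
        for (String name : listEntries(docx)) {
            System.out.println(name);
        }
    }
}
```

Run against a Word 2007 document, the output should include paths such as word/document.xml; the exact set of entries depends on the document.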

Now we start working with this file from Java. First we use JUnit 4 to confirm that test.docx exists and that it can be read and written. The code is as follows:

@Test public void verifyFile()
{
    File docx = new File("test.docx");
    // The document must exist and be both readable and writable.
    assertTrue(docx.exists());
    assertTrue(docx.canRead());
    assertTrue(docx.canWrite());
}

The following code simply verifies that the java.util.zip.ZipFile class can open test.docx.

@Test public void openFile() throws IOException, ZipException
{
    ZipFile docxFile = new ZipFile(new File("test.docx"));
    // JUnit takes the expected value first, then the actual value.
    assertEquals("test.docx", docxFile.getName());
    docxFile.close();
}

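The test shows that ZipFile can work with test.docx without any trouble. Many readers are probably impatient by now, so let us start reading data from test.docx. The first step is to locate the document.xml file. The code is as follows: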

@Test public void listContents() throws IOException, ZipException
{
    boolean documentFound = false;
    ZipFile docxFile = new ZipFile(new File("test.docx"));
    // entries() is typed, so nextElement() needs no cast.
    Enumeration<? extends ZipEntry> entriesIter = docxFile.entries();
    while (entriesIter.hasMoreElements())
    {
        ZipEntry entry = entriesIter.nextElement();
        if (entry.getName().equals("document.xml"))
            documentFound = true;
    }
    docxFile.close();
    assertTrue(documentFound);
}

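Running this code fails, however: the final assertion throws, which seems to suggest that document.xml does not exist. That is not the case. The ZipFile API identifies each entry by its full path inside the archive, including the directory, so the name in the code above must be changed to word/document.xml.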
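The difference between the bare file name and the full entry path can be sketched outside JUnit as follows. The class DocxEntryFinder and the method containsEntry are hypothetical names introduced for this illustration; only the java.util.zip calls come from the standard library.

```java
import java.io.File;
import java.io.IOException;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper for illustration; not part of the original article.
public class DocxEntryFinder {

    /** Returns true if the archive has an entry whose full path equals fullPath. */
    public static boolean containsEntry(File archive, String fullPath) throws IOException {
        try (ZipFile zip = new ZipFile(archive)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                // getName() returns the path inside the archive, e.g. "word/document.xml".
                if (entries.nextElement().getName().equals(fullPath)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: test.docx is in the working directory, as in the article.
        File docx = new File("test.docx");
        if (!docx.exists()) {
            System.out.println("File not found: " + docx.getPath());
            return;
        }
        System.out.println("document.xml found: " + containsEntry(docx, "document.xml"));
        System.out.println("word/document.xml found: " + containsEntry(docx, "word/document.xml"));
    }
}
```

For a document laid out as described above, the first lookup should report false and the second true, because the name comparison is made against the complete path.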

Next we obtain a ZipEntry object from the ZipFile, which we will use to look at what the XML contains. The code is as follows:

@Test public void getDocument() throws IOException, ZipException
{
    ZipFile docxFile = new ZipFile(new File("test.docx"));
    // getEntry() returns null when no entry has this exact path.
    ZipEntry documentXML = docxFile.getEntry("word/document.xml");
    assertNotNull(documentXML);
    docxFile.close();
}
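Once the entry has been obtained, its bytes can be read through ZipFile.getInputStream. The sketch below shows one way to load an entry into a String. The class DocxEntryReader and the method readEntry are hypothetical names, and decoding as UTF-8 is an assumption; a real document.xml declares its encoding in the XML prolog, so an XML parser would be the safer choice for actual processing.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper for illustration; not part of the original article.
public class DocxEntryReader {

    /** Reads the named entry from the archive and decodes it as UTF-8 (an assumption). */
    public static String readEntry(File archive, String entryPath) throws IOException {
        try (ZipFile zip = new ZipFile(archive)) {
            ZipEntry entry = zip.getEntry(entryPath);
            if (entry == null) {
                throw new IOException("No such entry: " + entryPath);
            }
            try (InputStream in = zip.getInputStream(entry);
                 ByteArrayOutputStream buffer = new ByteArrayOutputStream()) {
                byte[] chunk = new byte[4096];
                int n;
                // Copy the decompressed bytes until the end of the entry.
                while ((n = in.read(chunk)) != -1) {
                    buffer.write(chunk, 0, n);
                }
                return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: test.docx is in the working directory, as in the article.
        File docx = new File("test.docx");
        if (!docx.exists()) {
            System.out.println("File not found: " + docx.getPath());
            return;
        }
        System.out.println(readEntry(docx, "word/document.xml"));
    }
}
```

The stream must be consumed before the ZipFile is closed, which is why the read happens inside the outer try block.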
