Wednesday, February 27, 2013

Convert Word document (.docx) to PDF

This post describes how to convert a Word document (.docx) to PDF using Java.

There are several approaches to converting a document to PDF, but in this post I am using docx4j. It is a good API for conversions such as Word document to PDF (docx4j renders the PDF by way of XSL-FO).

The conversion can be done with a simple Java program.

Steps to follow:

Step 1: Open Eclipse and create a new Java project (give it any name you like), then add the docx4j JAR and its dependency JARs to the project's build path.

Step 2: Create a new Java class with any name you like (e.g. ConvertDocToPDF).

Step 3: Paste the following code into the class you created (the conversion code goes inside the main method):

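import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.List;

import org.docx4j.convert.out.pdf.PdfConversion;
import org.docx4j.convert.out.pdf.viaXSLFO.PdfSettings;
import org.docx4j.fonts.IdentityPlusMapper;
import org.docx4j.fonts.Mapper;
import org.docx4j.fonts.PhysicalFont;
import org.docx4j.fonts.PhysicalFonts;
import org.docx4j.model.structure.SectionWrapper;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;

public class ConvertDocToPDF {

    public static void main(String[] args) {
        try {
            long start = System.currentTimeMillis();

            // 1) Load DOCX into WordprocessingMLPackage
            InputStream is = new FileInputStream(new File("test.docx"));
            WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(is);
            is.close();

            // If your header and body overlap in the PDF, enlarge each section's header extent.
            // Assumption: docx4j 2.8.x PageDimensions.setHeaderExtent; 3000 is only an example value.
            List<SectionWrapper> sections = wordMLPackage.getDocumentModel().getSections();
            System.out.println("Number of sections: " + sections.size());
            for (SectionWrapper section : sections) {
                section.getPageDimensions().setHeaderExtent(3000);
            }

            // If you want to use a particular physical font, map it here:
            // text in "Algerian" is rendered with "Comic Sans MS" when that font is installed.
            Mapper fontMapper = new IdentityPlusMapper();
            PhysicalFont font = PhysicalFonts.getPhysicalFonts().get("Comic Sans MS");
            if (font != null) {
                fontMapper.getFontMappings().put("Algerian", font);
            }
            wordMLPackage.setFontMapper(fontMapper);

            // 2) Prepare PDF settings
            PdfSettings pdfSettings = new PdfSettings();

            // 3) Convert WordprocessingMLPackage to PDF and write it to test.pdf
            OutputStream out = new FileOutputStream(new File("test.pdf"));
            PdfConversion conversion = new org.docx4j.convert.out.pdf.viaXSLFO.Conversion(wordMLPackage);
            conversion.output(out, pdfSettings);
            out.close();

            System.err.println("Time taken to generate PDF: " + (System.currentTimeMillis() - start) + " ms");
        } catch (Throwable e) {
            e.printStackTrace();
        }
    }
}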
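If WordprocessingMLPackage.load fails, as in the log posted in the comments below (docx4j found an <html> element where it expected a Word package), it is worth checking that test.docx really is a .docx file. A .docx file is a ZIP package, so it begins with the bytes "PK\3\4". Below is a minimal plain-Java sketch of such a pre-check; the DocxCheck class and its method names are hypothetical helpers, not part of docx4j, and passing the check only shows the file is a ZIP, not that it is a valid Word document.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;

// Hypothetical helper, not part of docx4j: checks whether a file starts with the
// ZIP local-file-header signature "PK\3\4" that a .docx package begins with.
// Passing this check does not prove the file is a valid Word document.
public class DocxCheck {

    private static final byte[] ZIP_SIGNATURE = {0x50, 0x4B, 0x03, 0x04};

    // Returns true if 'header' begins with the four ZIP signature bytes.
    public static boolean hasZipSignature(byte[] header) {
        if (header == null || header.length < ZIP_SIGNATURE.length) {
            return false;
        }
        for (int i = 0; i < ZIP_SIGNATURE.length; i++) {
            if (header[i] != ZIP_SIGNATURE[i]) {
                return false;
            }
        }
        return true;
    }

    // Reads the first four bytes of the file at 'path' and checks them.
    public static boolean looksLikeDocx(String path) throws IOException {
        byte[] header = new byte[ZIP_SIGNATURE.length];
        try (DataInputStream in = new DataInputStream(new FileInputStream(path))) {
            in.readFully(header); // throws EOFException if the file is shorter
        } catch (EOFException e) {
            return false;
        }
        return hasZipSignature(header);
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "test.docx";
        if (looksLikeDocx(path)) {
            System.out.println(path + " starts with a ZIP signature; docx4j can try to load it");
        } else {
            System.out.println(path + " is not a ZIP package, so it is not a real .docx file");
        }
    }
}
```

Running it with the path of your input file before the conversion tells you quickly whether a load error comes from the file itself (for example, an HTML page saved with a .docx name) rather than from your code or JARs.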


Step 4: Now run the Java program; a PDF will be generated from your Word document.
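After running the program, one quick way to confirm that the output file was actually written (and not left empty because the conversion failed partway) is to check that it starts with "%PDF-", the header a PDF file begins with. This is a minimal sketch; the PdfCheck class is a hypothetical helper, not part of docx4j, and the default file name test.pdf is taken from the code in Step 3.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of docx4j: checks that a generated file begins
// with the "%PDF-" header that a PDF file starts with.
public class PdfCheck {

    private static final byte[] PDF_HEADER = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Returns true if 'data' begins with "%PDF-".
    public static boolean hasPdfHeader(byte[] data) {
        return data != null
                && data.length >= PDF_HEADER.length
                && Arrays.equals(Arrays.copyOf(data, PDF_HEADER.length), PDF_HEADER);
    }

    // Reads the first five bytes of the file at 'path' and checks them.
    public static boolean looksLikePdf(String path) throws IOException {
        byte[] header = new byte[PDF_HEADER.length];
        try (DataInputStream in = new DataInputStream(new FileInputStream(path))) {
            in.readFully(header); // throws EOFException if the file is shorter
        } catch (EOFException e) {
            return false; // empty or truncated output
        }
        return hasPdfHeader(header);
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "test.pdf";
        System.out.println(path + (looksLikePdf(path) ? " starts with a PDF header" : " does not look like a PDF"));
    }
}
```

A header check does not validate the whole document, but it catches the common case of a zero-length or truncated output file.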


Scott Fratto said...

Isn't it much easier to use a web-based app for the conversion process? I have been using GroupDocs Conversion for some time now; it is quite simple and provides embed code to use within your web page.

Lakshmi Prasad said...

I think you forgot to mention the required JAR files.

wiez said...

You should also try the Aspose.Words for Java API for converting Word documents to PDF and to many other formats.

Priyatham said...

I tried the provided code for converting Word to PDF, including all the required JARs, but got some exceptions and errors (see my next comment for the errors). Please help me with this.

Priyatham said...

log4j:WARN No appenders could be found for logger (org.docx4j.utils.ResourceUtils).
log4j:WARN Please initialize the log4j system properly.
18 [main] INFO org.docx4j.utils.Log4jConfigurator - Since your log4j configuration (if any) was not found, docx4j has configured log4j automatically.
37 [main] WARN org.docx4j.XmlUtils - Using default SAXParserFactory: null
294 [main] INFO org.docx4j.jaxb.Context - JAXB: RI not present. Trying Java 6 implementation.
295 [main] INFO org.docx4j.jaxb.Context - JAXB: Using Java 6 implementation.
295 [main] INFO org.docx4j.jaxb.Context - loading Context jc
4160 [main] INFO org.docx4j.jaxb.Context - loaded com.sun.xml.internal.bind.v2.runtime.JAXBContextImpl .. loading others ..
4294 [main] INFO org.docx4j.jaxb.Context - .. others loaded ..
4303 [main] WARN org.docx4j.jaxb.JaxbValidationEventHandler - [(non)FATAL_ERROR] : unexpected element (uri:"", local:"html"). Expected elements are <{http://schemas.microsoft.com/office/2006/xmlPackage}p
4303 [main] INFO org.docx4j.jaxb.JaxbValidationEventHandler - continuing (with possible element/attribute loss)
4303 [main] ERROR org.docx4j.openpackaging.packages.OpcPackage - javax.xml.bind.UnmarshalException: unexpected element (uri:"", local:"html"). Expected elements are <{http://schemas.microsoft.com/office/2006/xmlPackage}package>,<{http://schemas.microsoft.com/office/2006/xmlPackage}xmlData>
org.docx4j.openpackaging.exceptions.Docx4JException: Couldn't load xml from stream
at org.docx4j.openpackaging.packages.OpcPackage.load(OpcPackage.java:238)
at org.docx4j.openpackaging.packages.OpcPackage.load(OpcPackage.java:210)
at org.docx4j.openpackaging.packages.WordprocessingMLPackage.load(WordprocessingMLPackage.java:184)
at asd.main(asd.java:25)
Caused by: javax.xml.bind.UnmarshalException: unexpected element (uri:"", local:"html"). Expected elements are <{http://schemas.microsoft.com/office/2006/xmlPackage}package>,<{http://schemas.microsoft.com/office/2006/xmlPackage}xmlData>
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallingContext.handleEvent(Unknown Source)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.Loader.reportError(Unknown Source)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.Loader.reportError(Unknown Source)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.Loader.reportUnexpectedChildElement(Unknown Source)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallingContext$DefaultRootLoader.childElement(Unknown Source)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallingContext._startElement(Unknown Source)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallingContext.startElement(Unknown Source)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.SAXConnector.startElement(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.startElement(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl$NSContentDriver.scanRootElementHook(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)

Shahna said...

I got an exception java.lang.NoClassDefFoundError:

sivaraju said...

May I know where you are getting the NoClassDefFoundError exception?

punitpannu said...

I got the following error when I tried it with JBoss 6.1. The same code works fine with JBoss 4.0.
13:04:34,423 ERROR [org.docx4j.utils.ResourceUtils] Couldn't get resource: docx4j.properties
13:04:34,438 ERROR [org.docx4j.Docx4jProperties] Error reading docx4j.properties: java.lang.NullPointerException
at org.docx4j.utils.ResourceUtils.getResource(ResourceUtils.java:45) [docx4j-2.7.1.jar:]

I tried the latest docx4j JARs (i.e. 3.1 and 3.2), but it didn't work for me.
