How to convert docx to PDF without using Microsoft Word?
If your docx is mainly text, tables and images, docx4j.NET may work well for you. docx4j.NET is open source (Apache software license v2), identical to the Java version, but made into a DLL using IKVM. Currently we’re at v3.2.0, released last week.
It is easy to test; you can upload your docx to the docx4j demo webapp.
Or, with very little effort, you can run it from a sample project in Visual Studio. It’s very easy, because docx4j.NET is in the NuGet.org repository.
To create your sample project:
- make sure you have NuGet Package Manager installed
  - for VS 2012 and later, it’s installed by default
  - for VS 2010, NuGet is available through the Visual Studio Extension Manager; see the above link.
- create a new project in Visual Studio (File > New > Project). A Console Application is fine. I chose that from the .NET 3.5 list.
- from the Tools menu, choose NuGet Package Manager > Package Manager Console
- type Install-Package docx4j.NET
You should see something like:
And then, your project/solution will be populated to look like:
We’re nearly there! Notice the file src/samples/c_sharp/Docx4NET/DocxToPDF.cs
Click on your project in Solution Explorer, then right click (or hit Alt+Enter) to get the properties pane:
Then set the “startup object” as shown in the above image.
Now you can hit Ctrl+F5 (“Start without Debugging”) – you don’t want to debug, since that’s really slow.
You should see some logging in the console window, culminating in “done! Press any key to continue..”
What just happened? All being well, the sample docx “src\samples\resources\sample-docx.docx” was saved as a PDF “OUT_sample-docx.pdf” in your project directory.
You can modify src/samples/c_sharp/Docx4NET/DocxToPDF.cs to read your own test docx.
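As a rough guide to what the core of such a modification might look like, here is a minimal C# sketch of the load-and-convert step. It assumes the IKVM-compiled DLL exposes the Java API under its original package names, so `WordprocessingMLPackage.load` and `Docx4J.toPDF` are called just as they would be in Java. The class name `MyDocxToPdf` and the file paths are placeholders, not part of the package. The shipped sample also does some setup before converting (logging, for instance), which is left out here, so the safest route is probably to keep that code and change only the input path.

```csharp
using System;
using org.docx4j;
using org.docx4j.openpackaging.packages;

class MyDocxToPdf   // hypothetical class name, for illustration only
{
    static void Main(string[] args)
    {
        // Placeholder paths: point these at your own files.
        string fileIn = args.Length > 0 ? args[0] : @"C:\temp\my-test.docx";
        string fileOut = fileIn + ".pdf";

        // Load the docx into docx4j's in-memory package representation.
        WordprocessingMLPackage pkg =
            WordprocessingMLPackage.load(new java.io.File(fileIn));

        // docx4j writes to a Java OutputStream; IKVM exposes java.io as .NET types.
        java.io.FileOutputStream fos = new java.io.FileOutputStream(fileOut);
        try
        {
            Docx4J.toPDF(pkg, fos);
        }
        finally
        {
            fos.close();
        }

        Console.WriteLine("Saved " + fileOut);
    }
}
```

If you add this as a second class with its own `Main`, remember that the project's “startup object” decides which one runs.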
A few comments.
XSL FO; Apache FOP. docx4j creates PDF via XSL FO. It generates XSL FO, then uses Apache FOP (v1.1) to convert the XSL FO to PDF. FOP also supports other output formats (the subject of another blog post).
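To see the two stages for yourself, something like the following sketch may help. It asks docx4j to keep a copy of the intermediate XSL FO while still producing the PDF. It assumes the Java `FOSettings` API (`Docx4J.createFOSettings`, `setWmlPackage`, `setFoDumpFile`, `Docx4J.toFO`) is reachable unchanged through the IKVM DLL; the class name and paths are placeholders.

```csharp
using org.docx4j;
using org.docx4j.openpackaging.packages;

class FoDumpSketch   // hypothetical class name, for illustration only
{
    static void Main()
    {
        // Placeholder input path.
        WordprocessingMLPackage pkg =
            WordprocessingMLPackage.load(new java.io.File(@"C:\temp\my-test.docx"));

        // FOSettings carries the options for the docx -> XSL FO -> FOP pipeline.
        var foSettings = Docx4J.createFOSettings();
        foSettings.setWmlPackage(pkg);

        // Keep the intermediate XSL FO so it can be inspected afterwards.
        foSettings.setFoDumpFile(new java.io.File(@"C:\temp\my-test.fo"));

        java.io.FileOutputStream fos =
            new java.io.FileOutputStream(@"C:\temp\my-test.pdf");
        try
        {
            // Generate the FO via XSLT, then let FOP render it to the stream.
            Docx4J.toFO(foSettings, fos, Docx4J.FLAG_EXPORT_PREFER_XSL);
        }
        finally
        {
            fos.close();
        }
    }
}
```

Opening the `.fo` file in a text editor is a handy way to tell whether a rendering problem comes from the generated XSL FO or from FOP itself.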
Logging, Commons Logging. Logging is via Commons Logging. In the demo, it is configured programmatically (i.e. in DocxToPDF.cs). Alternatively, you could do it in app.config.
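For orientation, programmatic configuration might look roughly like the sketch below. It assumes the logging library in play is the .NET Common.Logging package, and that its console adapter accepts a `System.Collections.Specialized.NameValueCollection`, as the 2.x releases do; other versions may differ. The property values are examples only, and the demo's actual settings may not match them.

```csharp
using System.Collections.Specialized;
using Common.Logging;
using Common.Logging.Simple;

class LoggingSetupSketch   // hypothetical class name, for illustration only
{
    static void Main()
    {
        // Example settings; adjust the level to taste (e.g. DEBUG, WARN).
        NameValueCollection props = new NameValueCollection();
        props["level"] = "INFO";
        props["showDateTime"] = "false";

        // Route all Common.Logging output to the console window.
        LogManager.Adapter = new ConsoleOutLoggerFactoryAdapter(props);

        ILog log = LogManager.GetCurrentClassLogger();
        log.Info("Logging configured");
    }
}
```

Doing this before the first docx4j call means the conversion's own log messages use the same settings.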
OpenXML SDK interop: src/main/c_sharp/Plutext/Docx4NET contains code for converting between a docx4j representation of a docx package, and the Open XML SDK’s representation.
Improving PDF support. To improve the quality of the PDF output, typically you’d make the improvement to docx4j first (i.e. the Java version), then create a new DLL using the ant build target dist.NET. docx4j is on GitHub, and is most easily set up using Maven (see earlier blog post).
Help/support/discussion. You can post in the docx4j PDF output forum, or on StackOverflow (be sure to use tag docx4j, plus some/all of c#, docx, pdf, fop, xslfo as you think appropriate). Please don’t cross-post to both!