June 28, 2009

Web Service for Document Conversion – an Odyssey

Filed under: .Net,PHP,Programming,T3city — pj @ 2:02 pm

A couple of years ago, I needed a way to convert Microsoft Word documents to Pdf from a C# program. The application I was working on processed hundreds of documents and was run by the system scheduler every day at around 3am, so manual conversion was not an option. I wasn’t in control of the source documents, so I had to accept the documents the way they were given to me. I needed to do additional processing on the documents, so I wanted to convert them into a universal format. I already had a good library for reading Pdf files. After researching my options, I settled on using OpenOffice to do the conversion. OpenOffice has a pretty good Word filter, the ability to create Pdfs and an automation interface accessible to all .Net languages, including C#, so it was a good fit. I know there are commercial solutions and ways to automate Microsoft Office, but the OpenOffice solution was free and fairly easy to use.

Recently, I upgraded my development system from OpenOffice 2.x to OpenOffice 3.1. I can’t remember now the main reason I upgraded, but I was looking forward to being able to add the ability to convert docx to pdf (Office Open XML support was added in OpenOffice 3.x). I figured the upgrade might require some minor changes to my document conversion code, but it turned out not to be so simple.
(more…)

Powered by Teztech