Office Format Converters

Open Document Format (Open Office)

Open Office uses a relatively simple file format: a ZIP archive of XML files. Unfortunately, Open Office is not used in most German companies, and in none of our pilot users' companies.

  • Wiki to ODT is already implemented in plugin:odt; the reverse direction is still missing.
  • ODT to Wiki should be straightforward to implement with PHP's XML parser once the archive is unpacked.
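To make the ODT to Wiki direction more concrete, here is a minimal sketch. It is written in Python purely for illustration (the real filter would be PHP), and it assumes that only headings and plain paragraphs need to be converted. The function name `odt_to_wiki` is hypothetical; lists, tables, images and inline formatting are ignored.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the ODF text elements (text:h, text:p).
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def odt_to_wiki(odt_file):
    """Convert headings and paragraphs of an ODT file to DokuWiki markup.

    Illustrative sketch only: every text:p is emitted as a flat paragraph,
    even if it sits inside a list or a table.
    """
    # An ODT file is a ZIP archive; the document body lives in content.xml.
    with zipfile.ZipFile(odt_file) as archive:
        root = ET.fromstring(archive.read("content.xml"))

    lines = []
    for element in root.iter():
        text = "".join(element.itertext()).strip()
        if not text:
            continue
        if element.tag == f"{{{TEXT_NS}}}h":
            # DokuWiki marks level 1 with six '=' and level 5 with two.
            level = int(element.get(f"{{{TEXT_NS}}}outline-level", "1"))
            marks = "=" * max(2, 7 - level)
            lines.append(f"{marks} {text} {marks}")
        elif element.tag == f"{{{TEXT_NS}}}p":
            lines.append(text)
    return "\n\n".join(lines)
```

The point of the sketch is that the structure is carried by distinct elements (`text:h` versus `text:p`), so a simple walk over the XML tree may already be enough for a first version.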

Open XML (MS Office)

Open XML is Microsoft's answer to the Open Document format. The format seems to be very similar: it is also a ZIP archive of XML files.

The filter layer could be implemented by copying the ODT solution and adapting it to the specifics of Open XML. An alternative could be the OpenXML PHP API.

  • The OpenXML API does not support parsing documents, only creating them.

Using the same mechanisms for ODT and OpenXML would be preferable IMHO.

A Word-generated .docx is not easily parsable, even when document templates were used to create it. There seem to be no semantics in the structure: everything is expressed through the same elements with different styles attached!
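The following sketch illustrates the problem. It is Python for illustration only, and the function name `docx_paragraph_styles` is hypothetical. It lists every paragraph of a .docx together with the style ID attached to it. Headings and body text both arrive as `w:p` elements, so a converter would have to map style IDs to wiki syntax, and those IDs depend on the template that was used.

```python
import zipfile
import xml.etree.ElementTree as ET

# Main WordprocessingML namespace used in word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraph_styles(docx_file):
    """Return a list of (style_id, text) pairs, one per paragraph.

    style_id is None when the paragraph has no explicit style reference.
    """
    # A .docx file is a ZIP archive; the body lives in word/document.xml.
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    result = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # The style is only a reference inside the paragraph properties.
        style = para.find(f"{{{W_NS}}}pPr/{{{W_NS}}}pStyle")
        style_id = style.get(f"{{{W_NS}}}val") if style is not None else None
        # The visible text is split across runs (w:r) into w:t elements.
        text = "".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t"))
        result.append((style_id, text))
    return result
```

A possible next step, under the same assumptions, would be a lookup table from style IDs to wiki heading levels that is maintained per document template.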

To check:

  • Which Office versions do our project users use?

MS Word .doc

The .doc format is probably the most widely used format, but it is a proprietary binary blob.

  • Implementing Wiki to Doc is probably impossible
  • Doc to Wiki might be possible in a limited way by
    • relying on a third-party tool (e.g. antiword)
    • analyzing a third-party tool and reimplementing it in PHP

3rd party tools:

  • antiword
  • catdoc
  • wv
  • wv2
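The "rely on a third-party tool" option could look roughly like the sketch below. It is Python for illustration only; the function name `doc_to_text` is hypothetical, and it assumes that antiword or catdoc is installed and on the PATH. Both tools accept the .doc file as an argument and write plain text to standard output. The output encoding depends on the tool's configuration, so the UTF-8 decoding here is an assumption.

```python
import shutil
import subprocess

# Tools that take the .doc path as their only argument and print text to stdout.
SUPPORTED_TOOLS = ("antiword", "catdoc")


def doc_to_text(doc_path, tools=SUPPORTED_TOOLS):
    """Return the plain text of a .doc file, or None if no tool is installed.

    Raises subprocess.CalledProcessError if the tool exits with an error.
    """
    for tool in tools:
        executable = shutil.which(tool)
        if executable is None:
            continue  # tool not installed, try the next one
        completed = subprocess.run(
            [executable, doc_path], capture_output=True, check=True
        )
        # Assumption: output is UTF-8; undecodable bytes are replaced.
        return completed.stdout.decode("utf-8", errors="replace")
    return None
```

This only yields plain text, so headings and other structure would still have to be guessed afterwards, which matches the "limited way" caveat above.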
lab/officeformats.txt · Last modified: 2009/08/07 10:51 by Detlef Hüttemann