Ready for a new day (continued)

As I mentioned yesterday, my expectations regarding Office 2007 were fairly low. I am not a big fan of office application suites, and for most of my editing I use either on-line tools (like Google Docs or Writeboard) or OpenOffice. True, neither of them has the smoothness and fine polish of Microsoft Office 2003, and neither has all of its features, but - let's face it - I use maybe 10-20% of the capabilities of Word or Excel anyway. For that reason, I am not willing to pay a premium just for improved user comfort.

Most people see the new, good-looking Ribbon UI and the reworked user interaction as the most important improvements in the Office package. They certainly are improvements, but what pleasantly surprised me was a look under the hood. It was no secret that the new file format is XML-based. That alone does not say much, because there are many ways to store anything in XML, including a great many very bad ones. What Microsoft did, however, is really nice: every file in the new format is a ZIP archive (with an extension such as .docx instead of .zip). Inside the archive is a hierarchy of XML files and subfolders. Very clean and very efficient. If you add an image to a Word document, for example, rather than mangling the JPG and merging it into a tasteless binary blob together with the even uglier binary blob of the rest of the document (as Word 2003/XP/2000/97 would do), Word stores it as exactly the same standard JPG file inside the subfolder 'media' and references it from the XML data.
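To see this structure for yourself, you only need a ZIP library. Below is a minimal Python sketch using just the standard library; the function name `inspect_package` is my own invention, and it assumes the usual layout where images sit under a folder such as word/media/. It also accepts an open file-like object instead of a path, since `zipfile.ZipFile` supports both.

```python
import zipfile

def inspect_package(source):
    """Return (xml_parts, media) for an Office Open XML package such as a .docx.

    `source` may be a path or a binary file-like object.
    `media` maps every file stored under a media/ folder to its raw bytes,
    i.e. the original image file, not a re-encoded copy.
    """
    with zipfile.ZipFile(source) as package:
        names = package.namelist()
        # Markup parts: document body, styles, relationships and so on.
        xml_parts = [n for n in names if n.endswith((".xml", ".rels"))]
        # Embedded images, e.g. word/media/image1.jpeg
        media = {n: package.read(n) for n in names if "/media/" in n}
    return xml_parts, media
```

Calling `inspect_package("report.docx")` on a document with one picture should list word/document.xml among the XML parts and return the picture's bytes unchanged, ready to be written back out as a normal .jpeg file.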

This structure saves space, because XML compresses very, very well, and it allows any language on any platform to access the document's data, as long as it can parse XML. Hallelujah! Finally there is an easy and elegant way to access Office documents from Java or Python, without going through libraries such as POI (a Java library for accessing binary OLE2-based documents), which are built on reverse engineering of undocumented formats. Btw, POI stands for Poor Obfuscation Implementation :-).
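As an illustration of "any language that can parse XML", here is a small Python sketch that pulls the plain text of each paragraph out of a .docx with nothing but `zipfile` and `xml.etree.ElementTree`. It assumes the main body lives in word/document.xml and uses the WordprocessingML namespace, where text sits in `w:t` elements inside runs (`w:r`) inside paragraphs (`w:p`). The function name is hypothetical, and the sketch ignores details such as text boxes, fields and tracked changes.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the WordprocessingML main document part.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def paragraphs(source):
    """Return the plain text of each paragraph in a .docx as a list of strings."""
    with zipfile.ZipFile(source) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    result = []
    for p in root.iter(W + "p"):
        # A paragraph's text is often split across several runs,
        # so join all w:t elements found under the paragraph.
        result.append("".join(t.text or "" for t in p.iter(W + "t")))
    return result
```

No Office installation, no COM and no reverse-engineered binary parser is involved; the same approach translates directly to Java's built-in ZIP and DOM APIs.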

Not only is it now much easier to generate and parse Office documents on the server side on non-Windows platforms (e.g. Java on Solaris), it actually makes the job easier on Windows servers as well. If you ever tried to do anything with Office documents on the server side, you must have found out that a Windows server does not help you much: you cannot use COM wrappers around an Office installation on the server anyway. Not only because on the server side there would be nobody to listen to the poor paperclip ... but because the multithreading capabilities of the Office libraries and their COM wrappers are fairly limited for any number of threads larger than two ...
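To make the server-side point concrete, here is a sketch of generating a Word document with no Office at all, just the Python standard library. It writes what I believe is the smallest package Word accepts: a [Content_Types].xml, a root relationship in _rels/.rels pointing at word/document.xml, and the document body itself. The function name `write_docx` is hypothetical, and a real report generator would add styles, headers and so on.

```python
import zipfile
from xml.sax.saxutils import escape

CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" '
    'ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    '</Types>'
)

# Root relationship telling consumers where the main document part is.
ROOT_RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" '
    'Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" '
    'Target="word/document.xml"/>'
    '</Relationships>'
)

def write_docx(target, lines):
    """Write a minimal .docx to `target` (path or binary file) with one paragraph per string."""
    body = "".join(
        # escape() protects &, < and > in user-supplied text.
        '<w:p><w:r><w:t xml:space="preserve">%s</w:t></w:r></w:p>' % escape(line)
        for line in lines
    )
    document = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
        '<w:body>%s</w:body></w:document>' % body
    )
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", CONTENT_TYPES)
        package.writestr("_rels/.rels", ROOT_RELS)
        package.writestr("word/document.xml", document)
```

Because this is plain file I/O, each request thread can build its own document independently, which sidesteps the threading limits of automating Office through COM.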

Another very nice feature is the separate storage of the document itself and the document's form data. If you have a form in a Word document with fields to be filled in by the user, you can lock the form so that only the contents of the fields can be edited. When the user saves such a document after filling out the form, the entered data is actually stored in a separate XML file. Very useful: this is a very nice alternative to user-editable PDF forms. I can imagine lots of applications in the public sector.
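If the form's fields are bound to such a separate data part, a back-office system can read the submitted values without touching the document layout at all. The sketch below assumes Word's usual naming, where custom data parts are stored as customXml/item1.xml, customXml/item2.xml and so on (the itemProps*.xml files next to them hold only metadata). The function name is hypothetical, and the element names in the usage note depend entirely on the schema of the particular form.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# Matches customXml/item1.xml, customXml/item2.xml, ... but not itemProps*.xml or _rels.
DATA_PART = re.compile(r"customXml/item\d+\.xml")

def custom_xml_parts(source):
    """Return {part_name: parsed root element} for the custom XML data parts of a package."""
    with zipfile.ZipFile(source) as package:
        return {
            name: ET.fromstring(package.read(name))
            for name in package.namelist()
            if DATA_PART.fullmatch(name)
        }
```

For a hypothetical form whose data part looks like `<form><name>...</name></form>`, `custom_xml_parts(f)["customXml/item1.xml"].findtext("name")` would return what the user typed into the name field.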

Macros and active content (scripts) are also much more cleanly separated now. Unfortunately, VBA is still alive ... but at least it is now visible, not hidden. This may be important for identifying and controlling VBA viruses and other malware.
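"Visible" here means something quite practical: in macro-enabled files (.docm, .xlsm, .pptm) the VBA project is stored as its own part, typically word/vbaProject.bin, xl/vbaProject.bin or ppt/vbaProject.bin, while a plain .docx is not supposed to contain one. A mail gateway could therefore flag macro-carrying documents with a check as simple as the sketch below. Matching on the part name is my own simplification; a stricter filter would also inspect [Content_Types].xml and the relationships.

```python
import zipfile

def vba_parts(source):
    """List the VBA project parts found in an Office Open XML package (empty if none)."""
    with zipfile.ZipFile(source) as package:
        # Heuristic: macro projects are conventionally named vbaProject.bin.
        return [n for n in package.namelist() if n.lower().endswith("vbaproject.bin")]
```

A non-empty result means the file carries macros, whatever its extension claims.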

Thanks to these changes, Office is suddenly an interesting platform for business solutions. Both programming "from within Office" (modifying the Office UI with a script inside the document) and "from outside Office" (using Visual Studio Tools for Office) are now easier and cleaner. In addition, you can now work "without Office" by accessing the data directly: on a machine without Office installed using the excellent .NET XML capabilities, or even without Windows, going with Java, Python, Perl, Ruby, JavaScript - you name it.

One final observation from the developer sessions (while talking about RSS): it is great that there is improved support for RSS in .NET 3.0, and we all liked it a lot. What I personally liked much less was a small remark in the presentation:

... we at Microsoft have implemented and enhanced the RSS standard.

Let's get one thing straight: the only thing that can be done with a standard is to follow it, implement it and keep it unmodified. Enhancing a standard means breaking it, guys! If you really want a standard of your own, create one from scratch and convince the world to follow you. Taking something that is out there and turning it into something new through proprietary enhancements is not what innovation is supposed to be. And it is certainly not an improvement.
