1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
|
-------------------------------------------------------------------------------
- -
- KOffice Storage Format Specification - Version 2.3 -
- -
- by Werner, last changed: 20020306 by Werner Trobin -
- -
- History : -
- Version 1.0 : binary store -
- Version 2.0 : tar.gz store -
- Version 2.1 : cleaned up -
- version 2.2 : shaheed Put each part into its own directory to allow -
- one filter to easily embed the results of another -
- and also to have its own documentinfo etc. -
- Added description of naming convention. -
- Version 2.3 : werner Allow the usage of relative links. It is now -
- possible to refer to any "embedded" image or part -
- via a plain relative URL as you all know it. -
- -
-------------------------------------------------------------------------------
The purpose of this document is to define a common KOffice Storage Structure.
Torben, Reggie, and all the others agreed on storing embedded KOffice Parts
and binary data (e.g. pictures, movies, sounds) via a simple tar.gz-structure.
The support class for the tar format is kdelibs/kio/ktar.*, written by Torben
and finished by David.
The obvious benefits of this type of storage are:
- It's 100% non- proprietary as it uses only the already available formats
(XML, pictures, tar.gz, ...) and tools (tar, gzip).
- It enables anybody to edit the document directly; for instance, to update
an image (faster than launching the application), or to write scripts
that generate KOffice documents ! :)
- It is also easy to write an import filter for any other office-suite
application out there by reading the tar.gz file and extracting the XML out
of it (at the worst, the user can extract the XML file by himself, but then
the import loses embedded Parts and pictures).
The tar.gz format also generates much smaller files than the old binary
store, since everything's gzipped.
Name of the KOffice Files
-------------------------
As some people suggested, using a "tgz"-ending is confusing; it's been dropped.
Instead, we use the "normal" endings like ".kwd", ".ksp", ".kpr", etc. To recognize
KOffice documents without a proper extension David Faure <faure@kde.org>
added some magic numbers to the gzip header (to see what I'm talking about
please use the "file" command on a KOffice document or see
http://lists.kde.org/?l=koffice-devel&m=98609092618214&w=2);
External Structure
------------------
Here is a simple example to demonstrate the structure of a KOffice document.
Assume you have to write a lab-report. You surely will have some text, the
readings, some formulas and a few pictures (e.g. circuit diagram,...).
The main document will be a KWord-file. In this file you embed some KSpread-
tables, some KChart-diagramms, the KFormulas, and some picture-frames. You save
the document as "lab-report.kwd". Here is what the contents of the
tar.gz file will look like :
lab-report.kwd:
---------------
maindoc.xml -- The main XML file containing the KWord document.
documentinfo.xml -- Author and other "metadata" for KWord document.
pictures/ -- Pictures embedded in the main KWord document.
pictures/picture0.jpg
pictures/picture1.bmp
cliparts/ -- Cliparts embedded in the main KWord document.
cliparts/clipart0.wmf
part0/maindoc.xml -- for instance a KSpread embedded table.
part0/documentinfo.xml -- Author and other "metadata" for KSpread table.
part0/part1/maindoc.xml -- say a KChart diagram within the KSpread table.
part1/maindoc.xml -- say a KChart diagram.
part2/maindoc.xml -- why not a KIllustrator drawing.
part2/pictures/ -- Pictures embedded in the KIllustrator document.
part2/pictures/picture0.jpg
part2/pictures/picture1.bmp
part2/cliparts/ -- Cliparts embedded in the KIllustrator document.
part2/cliparts/clipart0.wmf
...
Internal Name
-------------
- Absolute references:
The API provided by this specification does not require application writers
or filter writers to know the details of the external structure:
tar:/documentinfo.xml is saved as documentinfo.xml
tar:/0 is saved as part0/maindoc.xml
tar:/0/documentinfo.xml is saved as part0/documentinfo.xml
tar:/0/1 is saved as part0/part1/maindoc.xml
tar:/0/1/pictures/picture0.png
is saved as part0/part1/pictures/picture0.png
tar:/Table1/0 is saved as Table1/part0/maindoc.xml
Note that this is the structure as of version 2.2 of this specification.
The only other format shipped with KDE2.0 is converted (on reading) to look
like this through the services of the "Internal Name".
If the document does not contain any other Parts or pictures, then the
maindoc.xml and documentinfo.xml files are tarred and gzipped alone,
and saved with the proper extension (.kwd for KWord, .ksp for KSpread,
etc.).
The plan is to use relative paths everywhere, so please try not to use
absolute paths unless neccessary.
- Relative references:
To allow parts to be self-contained, and to ease the work of filter
developers version 2.3 features relative links within the storage.
This means that the KoStore now has a "state" as in "there is something
like a current directory". You can specify a link like
"pictures/picture0.png" and depending on the current directory this will
be mapped to some absolute path. The surrounding code has to ensure that
the current path is maintained correctly, but due to that we can get rid
of the ugly prefix thingy.
Thank you for your attention,
Werner <trobin@kde.org> and David <faure@kde.org>
(edited by Chris Lee <lee@azsites.com> for grammer, spelling, and formatting)
|