This document gives a short overview of the internal design of the khtml library. I've written it because the library has grown quite big, and it is hard at first to find your way around the source code. This doesn't mean that you'll understand khtml after reading this document, but it will hopefully make it easier for you to read the source code.
The library is built up out of several different parts. Basically, when you use the library, you create an instance of a KHTMLPart and feed data to it. That's more or less all you need to know if you want to use khtml in another application. If you want to start hacking on khtml, here's a sketch of the objects that get constructed when, for example, running testkhtml with a URL argument.
In the following I'll assume that you're familiar with all the buzzwords used in current web technology. In case you aren't, here's a more or less complete list of references:
Document Object Model (DOM):
DOM Level1 and 2
We support DOM Level2 except for the events model at the moment.
HTML:
HTML4 specs
xhtml specs
We support almost all of HTML4 and xhtml.
Cascading style sheets (CSS):
CSS2 specs
We support almost all of CSS1, and most parts of CSS2.
Javascript:
Microsoft javascript bindings
Netscape javascript reference
Netscape's javascript bindings are outdated. We shouldn't follow them. Let's focus on getting the bindings compatible with IE.
Mozilla JS/DOM reference
KHTMLPart creates one instance of a KHTMLView (derived from TQScrollView), the widget showing the whole thing. At the same time a DOM tree is built up from the HTML or XML found in the specified file.
Let me describe this with an example.
khtml makes use of the document object model (DOM) for storing the document in a tree-like structure. Imagine some HTML like:
<html>
  <head>
    <style>
      h1 { color: red; }
    </style>
  </head>
  <body>
    <H1>
      some red text
    </h1>
    more text
    <p>
      a paragraph with an <img src="foo.png"> embedded image.
    </p>
  </body>
</html>
In the following I'll show how this input is processed step by step to generate the visible output you finally see on your screen. I'm describing things as if they happen one after the other, to make the principle clearer. In reality, to get visible output on the screen as soon as possible, all these things (from tokenization to the build-up and layout of the rendering tree) happen more or less in parallel.
The first thing that happens when you start parsing a new document is that a DocumentImpl* (for XML documents) or an HTMLDocumentImpl* object gets created by the Part (in khtml_part.cpp::begin()). A Tokenizer* object (either an XMLTokenizer or an HTMLTokenizer) is created as soon as DocumentImpl::open() is called by the part, also in begin().
The XMLTokenizer uses the QXML classes in Qt to parse the document, and their SAX interface to build the result into khtml's DOM.
For HTML, the tokenizer is located in khtmltokenizer.cpp. The tokenizer takes the contents of an HTML file as input and breaks these contents up into a linked list of tokens. The tokenizer recognizes HTML entities and HTML tags. Text between begin and end tags is handled differently for several tags. The differences lie in how spaces, linefeeds, HTML entities and other tags are handled.
The tokenizer is completely state-driven on a character by character basis. All text passed over to the tokenizer is directly tokenized. A complete HTML-file can be passed to the tokenizer as a whole, character by character (not very efficient) or in blocks of any (variable) size.
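To make the "blocks of any size" point more concrete, here is a minimal sketch of a character-driven tokenizer that keeps its state between calls. This is not khtml's HTMLTokenizer: the names SketchTokenizer, Token and tokenizeInBlocks are made up for this illustration, and entities, attributes, comments and the per-tag special cases mentioned above are all left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token type for this sketch; khtml's real token class differs.
struct Token {
    enum Kind { Tag, Text } kind;
    std::string value;
};

// A tiny state machine that consumes one character at a time. The state and
// the partial token survive between write() calls, so the input may arrive
// in blocks of any size.
class SketchTokenizer {
public:
    // Feed the next block of input; every character is tokenized at once.
    void write(const std::string &block) {
        for (char c : block) {
            if (state_ == InText) {
                if (c == '<') { flush(Token::Text); state_ = InTag; }
                else buffer_ += c;
            } else { // InTag
                if (c == '>') { flush(Token::Tag); state_ = InText; }
                else buffer_ += c;
            }
        }
    }

    // Signal the end of the input and emit any pending text.
    void finish() { if (state_ == InText) flush(Token::Text); }

    const std::vector<Token> &tokens() const { return tokens_; }

private:
    enum State { InText, InTag };

    // Emit the buffered characters as one token (empty buffers are skipped).
    void flush(Token::Kind kind) {
        if (!buffer_.empty()) tokens_.push_back(Token{kind, buffer_});
        buffer_.clear();
    }

    State state_ = InText;
    std::string buffer_;
    std::vector<Token> tokens_;
};

// Helper for experiments: tokenize input fed in blocks of blockSize chars.
std::vector<Token> tokenizeInBlocks(const std::string &input, std::size_t blockSize) {
    assert(blockSize > 0);
    SketchTokenizer tokenizer;
    for (std::size_t i = 0; i < input.size(); i += blockSize)
        tokenizer.write(input.substr(i, blockSize));
    tokenizer.finish();
    return tokenizer.tokens();
}
```

Feeding the same input with a block size of 1 or with one large block yields the same token list, which is the property the paragraph above describes.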
The HTMLTokenizer creates an HTMLParser which interprets the stream of tokens provided by the tokenizer and constructs the tree of Nodes representing the document according to the Document Object Model.
Parsing the document given above gives the following DOM tree:
HTMLDocumentElement
  |--> HTMLHeadElement
  |      \--> HTMLStyleElement
  |             \--> CSSStyleSheet
  \--> HTMLBodyElement
         |--> HTMLHeadingElement
         |      \--> Text
         |--> Text
         \--> HTMLParagraphElement
                |--> Text
                |--> HTMLImageElement
                \--> Text
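How a stream of tokens can turn into such a tree may be easier to see in code. The following is only a sketch under simplifying assumptions: SketchNode, SketchToken and buildTree are invented names, every opened element is expected to be closed explicitly, and none of the error recovery a real HTML parser needs is attempted.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified node type; not khtml's node classes.
struct SketchNode {
    std::string name;   // tag name, "#text" or "#document"
    std::string text;   // only used by text nodes
    std::vector<std::unique_ptr<SketchNode>> children;
};

// Hypothetical token type: an opening tag, a closing tag or a run of text.
struct SketchToken {
    enum Kind { Open, Close, Text } kind;
    std::string value;
};

// Build a tree from a token stream by keeping a stack of open elements.
// New nodes are always appended to the innermost open element.
std::unique_ptr<SketchNode> buildTree(const std::vector<SketchToken> &tokens) {
    std::unique_ptr<SketchNode> root(new SketchNode);
    root->name = "#document";
    std::vector<SketchNode *> open;
    open.push_back(root.get());

    for (const SketchToken &token : tokens) {
        assert(!open.empty());
        if (token.kind == SketchToken::Close) {
            // Ignore stray close tags and never pop the document node.
            if (open.size() > 1 && open.back()->name == token.value)
                open.pop_back();
            continue;
        }
        std::unique_ptr<SketchNode> node(new SketchNode);
        if (token.kind == SketchToken::Text) {
            node->name = "#text";
            node->text = token.value;
        } else {
            node->name = token.value;
        }
        SketchNode *raw = node.get();
        open.back()->children.push_back(std::move(node));
        if (token.kind == SketchToken::Open)
            open.push_back(raw);   // later tokens become its children
    }
    return root;
}
```

Elements without an end tag, such as the image in the example document, would need extra handling that this sketch does not attempt.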
Actually, the classes mentioned above are the interfaces for accessing the DOM. The actual data is stored in *Impl classes, providing the implementation for all of the above mentioned elements. So internally we have a tree looking like:
HTMLDocumentElementImpl*
  |--> HTMLHeadElementImpl*
  |      \--> HTMLStyleElementImpl*
  |             \--> CSSStyleSheetImpl*
  \--> HTMLBodyElementImpl*
         |--> HTMLHeadingElementImpl*
         |      \--> TextImpl*
         |--> TextImpl*
         \--> HTMLParagraphElementImpl*
                |--> TextImpl*
                |--> HTMLImageElementImpl*
                \--> TextImpl*
We use a refcounting scheme to ensure that all the objects get deleted when the root element gets deleted (as long as there's no interface class holding a pointer to the implementation).
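The idea behind such a scheme can be sketched in a few lines. This is an illustration only, not khtml's actual code: CountedNode and its live-node counter are hypothetical, and the counter exists purely so that the effect can be observed.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch of an intrusively reference-counted tree node.
class CountedNode {
public:
    // liveCounter tracks how many nodes exist; for demonstration only.
    explicit CountedNode(int *liveCounter) : live_(liveCounter) {
        assert(liveCounter != nullptr);
        ++*live_;
    }

    // Take one reference on this node.
    void ref() { ++refCount_; }

    // Drop one reference; the node deletes itself when none remain.
    void deref() {
        assert(refCount_ > 0);
        if (--refCount_ == 0) delete this;
    }

    // The parent holds exactly one reference on each of its children.
    void appendChild(CountedNode *child) {
        assert(child != nullptr);
        child->ref();
        children_.push_back(child);
    }

private:
    // Private destructor: nodes may only go away through deref().
    ~CountedNode() {
        for (CountedNode *child : children_) child->deref();
        --*live_;
    }

    int refCount_ = 0;
    int *live_;
    std::vector<CountedNode *> children_;
};
```

When the last reference on the root is dropped, the destructor releases the children, which in turn release theirs. A subtree survives only if something outside the tree (in khtml's case, an interface object) still holds a reference to it.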
The interface classes (the ones without the Impl) are defined in the dom/ subdirectory, and are not used by khtml itself at all. The only place they are used is in the javascript bindings, which use them to access the DOM tree. The big advantage of having this separation between interface classes and implementation classes is that we can have several interface objects pointing to the same implementation. This implements the explicit sharing required by the DOM specs.
Another advantage is that, as the implementation classes are not exported, we have a lot more freedom to make changes in the implementation without breaking binary compatibility.
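A small sketch may help to picture what explicit sharing means here. Note that I use std::shared_ptr in place of khtml's own refcounting to keep the example short, and that SketchElement and SketchElementImpl are invented names, not classes from the dom/ subdirectory.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical implementation class: holds the actual data and would stay
// private to the library.
class SketchElementImpl {
public:
    explicit SketchElementImpl(const std::string &tag) : tagName(tag) {}
    std::string tagName;
    std::string id;
};

// Hypothetical interface class: a thin, copyable handle. Copies share one
// implementation object instead of duplicating it.
class SketchElement {
public:
    explicit SketchElement(const std::string &tag)
        : impl_(std::make_shared<SketchElementImpl>(tag)) {}

    std::string tagName() const { assert(impl_ != nullptr); return impl_->tagName; }
    std::string id() const { assert(impl_ != nullptr); return impl_->id; }

    // Changes go to the shared implementation, so every handle sees them.
    void setId(const std::string &value) {
        assert(impl_ != nullptr);
        impl_->id = value;
    }

    // True if both handles point to the same implementation object.
    bool sameImpl(const SketchElement &other) const { return impl_ == other.impl_; }

private:
    std::shared_ptr<SketchElementImpl> impl_;
};
```

Because the handle only stores a pointer, the layout of the implementation class can change without affecting code compiled against the handle, which is the binary compatibility argument made above.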
You will find an almost one-to-one correspondence between the interface classes and the implementation classes. In the implementation classes we have added a few more intermediate classes that cannot be seen from the outside, for various reasons (to make the implementation of shared features easier or to reduce memory consumption).
In C++, you can access the whole DOM tree from outside KHTML by using the interface classes.
For a description see the introduction to khtml on developer.kde.org.
One thing that has been omitted in the discussion above is the style sheet defined inside the <style> element (as an example of a style sheet) and the image element (as an example of an external resource that needs to be loaded). This will be done in the following two sections.
The contents of the <style> element (in this case the h1 { color: red; } rule) will get passed to the HTMLStyleElementImpl object. This object creates a CSSStyleSheetImpl object and passes the data to it. The CSS parser will take the data and parse it into a DOM structure for CSS (similar to the one for HTML, see also the DOM level 2 specs). This will later be used to define the look of the HTML elements in the DOM tree.
Actually, "later on" is relative: as we will see, this happens partly in parallel with the build-up of the DOM tree.
Some HTML elements (such as <img>, <link>, <object>, etc.) contain references to external objects that have to be loaded. This is done by the Loader and related classes (misc/loader.*). Objects that might need to load external objects inherit from CachedObjectClient, and can ask the loader (which also acts as a memory cache) to download the object they need from the web.
Once the loader has the requested object ready, it will notify the CachedObjectClient of this, and the client can then process the received data.
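The interplay between loader, cache and client can be sketched as follows. Everything here is simplified and hypothetical: SketchLoader, SketchClient, SketchImage and the notifyFinished signature are made up for the illustration, the "download" is a plain function passed in by the caller, and the notification happens synchronously, whereas real loading is asynchronous.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical client interface, modelled on the role described above.
class SketchClient {
public:
    virtual ~SketchClient() {}
    // Called by the loader once the requested object is available.
    virtual void notifyFinished(const std::string &url, const std::string &data) = 0;
};

// Hypothetical loader that doubles as a memory cache.
class SketchLoader {
public:
    // fetch stands in for the network access in this sketch.
    explicit SketchLoader(std::function<std::string(const std::string &)> fetch)
        : fetch_(std::move(fetch)) {}

    // Deliver the object for url to client, downloading it only if it is
    // not in the cache yet.
    void request(const std::string &url, SketchClient *client) {
        assert(client != nullptr);
        auto it = cache_.find(url);
        if (it == cache_.end()) {
            ++fetchCount_;
            it = cache_.emplace(url, fetch_(url)).first;
        }
        client->notifyFinished(url, it->second);
    }

    // Number of real downloads so far (cache hits are not counted).
    int fetchCount() const { return fetchCount_; }

private:
    std::function<std::string(const std::string &)> fetch_;
    std::map<std::string, std::string> cache_;
    int fetchCount_ = 0;
};

// Example client: an image element that stores the received bytes.
class SketchImage : public SketchClient {
public:
    void notifyFinished(const std::string &, const std::string &data) override {
        bytes = data;
    }
    std::string bytes;
};
```

Two image elements that reference the same URL are both notified, but the resource is only fetched once.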
To bring the document to the screen, we have a rendering engine that is completely based on CSS. The first thing that is done is to collect all style sheets that apply to the document and create a list of style rules that need to be applied to the elements. This is done in the CSSStyleSelector class. It takes the default HTML style sheet (defined in css/html4.css), an optional user-defined style sheet, and all style sheets from the document, and combines them into a list of parsed style rules (optimised for fast lookup). The exact rules for how these style sheets should get applied to HTML or XML documents can be found in the CSS2 specs.
Once we have this list, we can get a RenderStyle object for every DOM element from the CSSStyleSelector by calling "styleForElement(DOM::ElementImpl *)". The style object describes in a compact form all the CSS properties that should get applied to the Node.
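As a rough illustration of combining the three kinds of style sheets, here is a sketch. It is far simpler than the real CSSStyleSelector: SketchStyleSelector, SketchRule and the map-based style are invented for this example, selectors are plain tag names, and specificity and !important are ignored, so later sheets simply override earlier ones.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, heavily simplified rule: a tag-name selector plus its
// property/value declarations.
struct SketchRule {
    std::string tag;
    std::map<std::string, std::string> declarations;
};

typedef std::vector<SketchRule> SketchSheet;
typedef std::map<std::string, std::string> SketchStyle;

class SketchStyleSelector {
public:
    // Collect the rules in the order default, user, document and index them
    // by tag name for fast lookup.
    SketchStyleSelector(const SketchSheet &defaults, const SketchSheet &user,
                        const SketchSheet &document) {
        addSheet(defaults);
        addSheet(user);
        addSheet(document);
    }

    // Compute the properties that apply to an element with the given tag.
    // Rules are applied in collection order, so later ones win.
    SketchStyle styleForElement(const std::string &tag) const {
        assert(!tag.empty());
        SketchStyle style;
        auto it = rulesByTag_.find(tag);
        if (it == rulesByTag_.end()) return style;
        for (const SketchRule &rule : it->second)
            for (const auto &declaration : rule.declarations)
                style[declaration.first] = declaration.second;
        return style;
    }

private:
    void addSheet(const SketchSheet &sheet) {
        for (const SketchRule &rule : sheet)
            rulesByTag_[rule.tag].push_back(rule);
    }

    std::map<std::string, std::vector<SketchRule>> rulesByTag_;
};
```

With a default rule that sets the color of h1 to black and a document rule that sets it to red, the resulting style for h1 contains red, while properties set only in the default sheet are kept.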
After that, a rendering tree gets built up. Using the style object, the DOM Node creates an appropriate render object (all these are defined in the rendering subdirectory) and adds it to the rendering tree. This gives another tree-like structure that resembles the DOM tree in its general structure, but might have some significant differences too. First of all, so-called anonymous boxes (see the CSS specs), which have no DOM counterpart, might get inserted into the rendering tree to satisfy CSS requirements. Second, the display property of the style affects which type of rendering object is chosen to represent the current DOM object.
In the above example we would get the following rendering tree:
RenderRoot*
  \--> RenderBody*
         |--> RenderFlow* (<H1>)
         |      \--> RenderText* ("some red text")
         |--> RenderFlow* (anonymous box)
         |      \--> RenderText* ("more text")
         \--> RenderFlow* (<P>)
                |--> RenderText* ("a paragraph with an")
                |--> RenderImage*
                \--> RenderText* ("embedded image.")
A call to tqlayout() on the RenderRoot object (the root of the rendering tree) causes the rendering tree to lay itself out in the available space (width) given by the KHTMLView. After that, the drawContents() method of KHTMLView can call RenderRoot->print() with appropriate parameters to actually paint the document. This is not 100% correct when parsing incrementally, but it is exactly what happens when you resize the document. As you can see, the conversion to the rendering tree removed the head part of the HTML code and inserted an anonymous render object around the string "more text". For an explanation of why this is done, see the CSS specs.
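The anonymous box around "more text" can be illustrated with a small sketch. The assumption here, taken from the CSS box model rather than from khtml's code, is that a block with block-level children should contain only block-level children, so runs of inline content get wrapped. SketchBox and wrapInlineRuns are hypothetical names and have nothing to do with the real render classes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical render object for this sketch.
struct SketchBox {
    std::string label;     // for example "<H1>" or the text itself
    bool isBlock;          // block-level (true) or inline (false)
    bool isAnonymous;      // true for boxes without a DOM counterpart
    std::vector<SketchBox> children;
};

// If the children mix block and inline boxes, wrap every run of adjacent
// inline boxes in one anonymous block, so that all resulting children are
// block-level. Purely inline content is returned unchanged.
std::vector<SketchBox> wrapInlineRuns(const std::vector<SketchBox> &children) {
    bool hasBlock = false;
    for (const SketchBox &child : children)
        hasBlock = hasBlock || child.isBlock;
    if (!hasBlock) return children;

    std::vector<SketchBox> result;
    for (const SketchBox &child : children) {
        if (child.isBlock) {
            result.push_back(child);
            continue;
        }
        // Start a new anonymous block unless the previous box is one.
        if (result.empty() || !result.back().isAnonymous)
            result.push_back(SketchBox{"anonymous box", true, true, {}});
        assert(result.back().isAnonymous);
        result.back().children.push_back(child);
    }
    return result;
}
```

Applied to the children of the body in the example (a heading, a run of text and a paragraph), this leaves the heading and the paragraph alone and puts the text into an anonymous block between them.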
All of the above is meant to give you a quick introduction to the way khtml brings an HTML/XML file to the screen. It is by no means complete or even 100% correct. I left out many problems, which I will perhaps add either on request or when I find some time to do so. Let me name some of the missing things:
Now, before I finish, let me add a small warning and some advice for all of you who plan to hack on khtml yourselves:
khtml is by now quite a big library, and it takes some time to understand how it works. Don't get frustrated if you don't understand it immediately. On the other hand, of the libraries that get used a lot, it is by now probably the one with the biggest number of remaining bugs (even though it's sometimes hard to know whether some behavior is really a bug).
Some parts of its code are, however, extremely touchy (especially the layout algorithms), and making changes there (that might fix a bug on one web page) might introduce severe bugs. All the people developing khtml have already spent huge amounts of time searching for such bugs, which only showed up on some web pages and thus were found only a week after the change that introduced them was made. This can be very frustrating for us, and we'd appreciate it if people who are not completely familiar with khtml post changes touching these critical regions to kfm-devel for review before applying them.