SuikaWiki Markup Language (SWML)

Working Draft 6 March 2010

This Version
<http://suika.fam.cx/www/markup/suikawiki/spec/swml-work>
Latest Version
<http://suika.fam.cx/www/markup/suikawiki/spec/swml-work>
Latest Working Draft
<http://suika.fam.cx/www/markup/suikawiki/spec/swml-work>
Version History
<http://suika.fam.cx/gate/cvs/markup/suikawiki/spec/swml-work-src.en.html.u8>
Author
<>

Abstract

...

Status of this document

This section describes the status of this document at the time of its publication. Other documents might supersede this document.

This document is a working draft, produced as part of the SuikaWiki project. It might be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

Comments on this document are welcome and may be sent to the author.

There is a list of possible new features that might be introduced in a future revision of this specification.

Translations of this document might be available. The English version of the document is the only normative version.

Table of contents

  1. 1 Introduction
  2. 2 Terminology
    1. 2.1 Namespaces
    2. 2.2 Definitions
  3. 3 The SWML text serialization
    1. 3.1 Writing documents in the SWML text serialization
      1. 3.1.1 Document structure and header
      2. 3.1.2 Body part blocks
      3. 3.1.3 Inline contents
      4. 3.1.4 Images
      5. 3.1.5 Lexical structures
    2. 3.2 Parsing documents in the SWML text serialization
      1. 3.2.1 Tokenization of lines
        1. 3.2.1.1 The "initial" mode
        2. 3.2.1.2 The "body" mode
        3. 3.2.1.3 The "preformatted" mode
        4. 3.2.1.4 The "preformatted block" mode
        5. 3.2.1.5 The "image data" mode
      2. 3.2.2 Tokenization of a table row
      3. 3.2.3 Tokenization of a text
      4. 3.2.4 Parsing a magic line
      5. 3.2.5 Tree construction
        1. 3.2.5.1 The "in section" insertion mode
        2. 3.2.5.2 The "in table row" insertion mode
        3. 3.2.5.3 The "in paragraph" insertion mode
    3. 3.3 Serializing SWML text serialization documents
    4. 3.4 The text/x-suikawiki and text/x.suikawiki.image Internet Media Types
  4. 4 The SWML XML serialization
    1. 4.1 ... xml media type
  5. 5 Semantics of Elements and Attributes
    1. 5.1 Document structures
      1. 5.1.1 The document element in the SuikaWiki/0.9 namespace
      2. 5.1.2 The Name attribute in the SuikaWiki/0.9 namespace
      3. 5.1.3 The Version attribute in the SuikaWiki/0.9 namespace
      4. 5.1.4 The parameter element in the SuikaWiki/0.9 namespace
      5. 5.1.5 The value element in the SuikaWiki/0.9 namespace
      6. 5.1.6 The class attribute
      7. 5.1.7 The id attribute
    2. 5.2 Lexical structures
      1. 5.2.1 The replace element in the SuikaWiki/0.9 namespace
      2. 5.2.2 The text element in the SuikaWiki/0.9 namespace
    3. 5.3 Blocks
      1. 5.3.1 The insert element in the SuikaWiki/0.9 namespace
      2. 5.3.2 The delete element in the SuikaWiki/0.9 namespace
      3. 5.3.3 The dr element in the SuikaWiki/0.9 namespace
      4. 5.3.4 The comment-p element in the SuikaWiki/0.10 namespace
      5. 5.3.5 The ed element in the SuikaWiki/0.10 namespace
    4. 5.4 Hyperlinks
      1. 5.4.1 The anchor element in the SuikaWiki/0.9 namespace
      2. 5.4.2 The anchor-internal element in the SuikaWiki/0.9 namespace
      3. 5.4.3 The anchor-end element in the SuikaWiki/0.9 namespace
      4. 5.4.4 The anchor attribute in the SuikaWiki/0.9 namespace
      5. 5.4.5 The anchor-external element in the SuikaWiki/0.9 namespace
      6. 5.4.6 The resScheme attribute in the SuikaWiki/0.9 namespace
      7. 5.4.7 The resParameter attribute in the SuikaWiki/0.9 namespace
    5. 5.5 Embedded objects
      1. 5.5.1 The form element in the SuikaWiki/0.9 namespace
      2. 5.5.2 The image element in the SuikaWiki/0.9 namespace
      3. 5.5.3 The aa element in the AA namespace
    6. 5.6 Citations
      1. 5.6.1 The csection element in the SuikaWiki/0.10 namespace
      2. 5.6.2 The src element in the SuikaWiki/0.10 namespace
    7. 5.7 Qualified names
      1. 5.7.1 The qn element in the SuikaWiki/0.10 namespace
      2. 5.7.2 The qname element in the SuikaWiki/0.10 namespace
      3. 5.7.3 The nsuri element in the SuikaWiki/0.10 namespace
    8. 5.8 Inline annotations
      1. 5.8.1 The rubyb element in the SuikaWiki/0.9 namespace
      2. 5.8.2 The weak element in the SuikaWiki/0.9 namespace
      3. 5.8.3 The title element in the SuikaWiki/0.10 namespace
    9. 5.9 Key names
      1. 5.9.1 The key element in the SuikaWiki/0.10 namespace
    10. 5.10 Fallback elements
      1. 5.10.1 The attrvalue element in the SuikaWiki/0.10 namespace
      2. 5.10.2 Uppercase elements in the SuikaWiki/0.10 namespace
  6. References
    1. Normative references
    2. Non‐normative references

1 Introduction

This section is non‐normative.

This specification defines SuikaWiki Markup Language (SWML). SWML is the markup language developed and implemented for the SuikaWiki hypertext system.

...

2 Terminology

All diagrams, examples, and notes in this specification are non-normative, as are all sections explicitly marked non-normative. Everything else in this specification is normative.

The key words “MUST”, “MUST NOT”, “SHOULD”, and “MAY” in the normative parts of this document are to be interpreted as described in RFC 2119 [RFC2119].

Requirements phrased in the imperative as part of algorithms (such as “strip any leading space characters” or “return false and abort these steps”) are to be interpreted with the meaning of the key word (e.g. “MUST”) used in introducing the algorithm.

Conformance requirements phrased as algorithms or specific steps MAY be implemented in any manner, so long as the end result is equivalent. (In particular, the algorithms defined in this specification are intended to be easy to follow, and not intended to be performant.)

When it is stated that some element or attribute is ignored, or treated as some other value, or handled as if it was something else, this refers only to the processing of the node after it is in the DOM. A user agent MUST NOT mutate the DOM in such situations.

The language in this specification assumes that the user agent expands all entity references, and therefore does not include entity reference nodes in the DOM. If user agents do include entity reference nodes in the DOM, then user agents MUST handle them as if they were fully expanded when implementing this specification. Entity references to unknown entities MUST be treated as if they contained just an empty text node for the purposes of the algorithms defined in this specification.

The Web Applications 1.0 specification [WA1] defines the terms inter-element whitespace, content attribute, and IDL attribute.

2.1 Namespaces

For historical reasons, elements and attributes defined or used in this specification belong to various namespaces.

The AA namespace is http://pc5.2ch.net/test/read.cgi/hp/1096723178/aavocab#. The preferred prefix is aa.

The HTML namespace is http://www.w3.org/1999/xhtml. The preferred prefix is html.

The SuikaWiki/0.9 namespace is urn:x-suika-fam-cx:markup:suikawiki:0:9:. The preferred prefix is sw.

The SuikaWiki/0.10 namespace is urn:x-suika-fam-cx:markup:suikawiki:0:10:. The preferred prefix is sw10.

The XHTML2 namespace is http://www.w3.org/2002/06/xhtml2/.

The XML namespace is http://www.w3.org/XML/1998/namespace. The preferred prefix is xml.

2.2 Definitions

White space characters are U+0009 CHARACTER TABULATION and U+0020 SPACE.

Digits are characters in the range U+0030 DIGIT ZERO .. U+0039 DIGIT NINE.

Uppercase letters are characters in the range U+0041 LATIN CAPITAL LETTER A .. U+005A LATIN CAPITAL LETTER Z.

Lowercase letters are characters in the range U+0061 LATIN SMALL LETTER A .. U+007A LATIN SMALL LETTER Z.

Language tag characters are digits, uppercase letters, lowercase letters, and U+002D HYPHEN-MINUS.

Scheme characters are digits, uppercase letters, lowercase letters, U+0025 PERCENT SIGN, U+002B PLUS SIGN, U+002D HYPHEN-MINUS, U+002E FULL STOP and U+005F LOW LINE.

A language specification is a string consisting of a @ character followed by zero or more language tag characters. The body of a language specification is the substring of the language specification after the first @ character. It might be the empty string.

Semantically, the body of a language specification represents a language tag, similar to the xml:lang attribute @@ ref.
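
The definitions above can be illustrated by a short Python sketch. This is a non-normative illustration only; the names LANGUAGE_SPEC and language_spec_body are hypothetical and are not defined by this specification.

```python
import re

# Character classes from section 2.2, written as regex character sets.
DIGITS = "0-9"
UPPER = "A-Z"
LOWER = "a-z"
# Language tag characters: digits, letters, and U+002D HYPHEN-MINUS.
LANGUAGE_TAG_CHARS = f"{DIGITS}{UPPER}{LOWER}\\-"

# A language specification: "@" followed by zero or more language tag characters.
LANGUAGE_SPEC = re.compile(f"@([{LANGUAGE_TAG_CHARS}]*)")


def language_spec_body(s):
    """Return the body of a language specification, or None if s is not one."""
    m = LANGUAGE_SPEC.fullmatch(s)
    return m.group(1) if m else None
```

For example, language_spec_body("@en-US") gives the body "en-US", and language_spec_body("@") gives the empty string, which is a permitted body.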

3 The SWML text serialization

...

3.1 Writing documents in the SWML text serialization

This section is non‐normative.

Obviously, this section is incomplete; some prose definitions are not yet available; some cross-references do not work yet. It should be specified why this is non-normative. The ABNF definition and charset considerations need to be addressed.

3.1.1 Document structure and header

A document in the SWML text serialization consists of three parts: header part, body part, and optional image.

Several constructs in a document refer to pages. A page is a unit of data in a hypertext database. The name of a page is sometimes referred to as a WikiName. A page sometimes represents or is associated with an image. How to implement these concepts, including how to resolve WikiNames, is not defined in this specification.

document
= header-part body-part [obs-image]

The header part has to be empty. In previous versions of SWML, a magic line could be contained, and in fact was required in some versions, in the header part of a document.

A magic line has to contain the string #?, followed by the format name, followed by a / character, followed by the format version. Together they identify the version of the markup language in which the document is written. Historically, only the two combinations of format name and format version shown in the table below were defined, used, and implemented:

Format name Format version Description
SuikaWiki 0.9 The SuikaWiki/0.9 markup language.
SuikaWikiImage 0.9 The SuikaWikiImage/0.9 markup language.

A magic line can contain zero or more parameters after the format version. A parameter consists of one or more white space characters, followed by the name, followed by a = character, followed by a quoted string whose value represents zero or more values separated by a , character. A parameter value consists of zero or more characters except for the separator character ,. Historically, the following combinations of parameter names and values were defined and used:

Name Values Description
default-name Zero or more characters except for , The value represents the default user name for WikiForm input fields. Exactly one value can be specified. The default when this parameter is not specified is implementation dependent.
import Zero or more characters except for , A value represents the WikiName by which definitions for entity references are imported. When this parameter is not specified, no definition is imported.
interactive yes or no Value yes means that the document contains interactive content such as a WikiForm. Value no, the default value used when the parameter is not specified, means the document does not contain such content. It was intended to be used for the convenience of cache control mechanisms.
obsolete yes or no Value yes means the content of the document is obsolete, and value no, the default value used when the parameter is not specified, means the content is not obsolete.

The parameter name obsolete was defined in the SuikaWiki/0.9 specification, but the parameter name that had been actually implemented in SuikaWiki2 and used was the parameter name obsoleted.

obsoleted
page-icon Zero or more characters except for , The value represents the WikiName by which the page icon is imported. The page icon can be used as favicon @@ [ref], for example. Exactly one value can be specified. The default when this parameter is not specified is implementation dependent.
image-alt Zero or more characters except for , The value represents the alternative text for the image embedded in the document. Exactly one value can be specified. The default when this parameter is not specified is the empty string.
image-type An Internet Media Type with no parameter, white spaces, comments The value represents the type of the image embedded in the document. Exactly one value can be specified. This parameter has to be specified when the document contains an image.

The order in which parameters are specified is not significant. The parameter name of a parameter has to be different from the parameter name of any other parameter.

A magic line has to be terminated by zero or more white space characters followed by a newline.

header-part
= [obs-magic-line]
obs-magic-line
= "#?" format-name "/" format-version *(1*white-space parameter) *white-space newline
format-name
= identifier
format-version
= identifier
parameter
= parameter-name "=" quoted-string
parameter-name
= identifier
parameter-value-list
= [parameter-value *("," parameter-value)]
parameter-value
= *(char − ",")
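
The magic line syntax above can be illustrated by the following non-normative Python sketch. The function name parse_magic_line is hypothetical. The sketch also assumes that a "." character is accepted in the format version, since the historical version 0.9 would otherwise not match the identifier production; this is an assumption of the sketch, not a statement of this specification.

```python
import re

# identifier = 1*(ALPHA / DIGIT / "-" / non-ascii)
IDENT = r'(?:[A-Za-z0-9-]|[^\x00-\x7f])+'
# Assumption of this sketch: "." is also allowed so that "0.9" matches.
VERSION = r'(?:[A-Za-z0-9.-]|[^\x00-\x7f])+'
# Body of a quoted string: ordinary characters or quoted pairs.
QUOTED_BODY = r'(?:[^"\\\x00-\x1f\x7f]|\\[^\x00-\x1f\x7f])*'

MAGIC = re.compile(
    rf'#\?({IDENT})/({VERSION})((?:[ \t]+{IDENT}="{QUOTED_BODY}")*)[ \t]*'
)
PARAM = re.compile(rf'[ \t]+({IDENT})="({QUOTED_BODY})"')


def parse_magic_line(line):
    """Parse a magic line (without its newline).

    Returns (format_name, format_version, parameters) or None on failure.
    """
    m = MAGIC.fullmatch(line)
    if m is None:
        return None
    params = {}
    for name, raw in PARAM.findall(m.group(3)):
        if name in params:
            return None  # parameter names have to be distinct
        value = re.sub(r'\\(.)', r'\1', raw)  # drop the "\" of quoted pairs
        params[name] = value.split(",") if value else []
    return m.group(1), m.group(2), params
```

For example, the line #?SuikaWiki/0.9 import="A,B" yields the format name SuikaWiki, the format version 0.9, and the parameter import with the two values A and B.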

3.1.2 Body part blocks

The body part of a document consists of zero or more blocks.

There are several kinds of blocks: paragraphs, headings, lists, labeled lists, quotations, preformatted paragraphs, edited sections, tables, editorial notes, comment paragraphs, and empty blocks. In addition, forms and entity references can also be used as blocks.

An empty block, which is represented by an empty line, can be inserted between any two blocks. It is sometimes necessary in order to prevent a block from being interpreted as a part of the previous block.

For example, consider the following fragment:

- List item.
This line is part of the list item.

The second line is part of the list, by definition. If it is not desired, an empty block can be inserted between two lines as:

- List item.

This line is not part of the list item.
... such that the third line represents a paragraph.
body-part
= *block
block
= paragraph / heading / list / labeled-list / quotation / preformatted-paragraph / edited-section / table / editorial-note / comment-paragraph / empty-block / form / obs-entity-reference
empty-block
= newline

A paragraph represents a unit of the text, similar to HTML's p element. It consists of an optional destination anchor number, followed by a line contents, followed by a newline, followed by zero or more block children.

A paragraph cannot begin with a form or entity reference, since it is treated as a block when it appears at the beginning of a line. A paragraph cannot begin with a white space character, since it is then treated as a preformatted paragraph.

A block child is one of an optional destination anchor number followed by line contents followed by a newline, a list, a labeled list, a preformatted paragraph, an edited section, a table, an editorial note, or a comment paragraph.

@@ definition for ;; and @@

paragraph
= [destination-anchor-number] line-contents newline *block-child
comment-paragraph
= ";;" *white-space [destination-anchor-number] [line-contents] newline *block-child
editorial-note
= ";;" *white-space [destination-anchor-number] [line-contents] newline *block-child
block-child
= [destination-anchor-number] line-contents newline / list / labeled-list / preformatted-paragraph / edited-section / table / editorial-note / comment-paragraph

A heading introduces a section. It is represented by one or more * characters, followed by zero or more white space characters, optionally followed by a destination anchor number, optionally followed by line contents, followed by a newline. The number of * characters represents the depth of the section. A heading with only one * character begins a larger section than a heading with more than one * character. The line contents represent the name or caption of the section.

heading
= 1*"*" *white-space [destination-anchor-number] [line-contents] newline
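
As a non-normative illustration, the heading production can be matched with a regular expression as in the Python sketch below. The name parse_heading is hypothetical, and the line contents are returned as an uninterpreted string rather than being parsed further.

```python
import re

# 1*"*" *white-space [destination-anchor-number] [line-contents]
HEADING = re.compile(r'(\*+)[ \t]*(?:\[([0-9]+)\])?(.*)')


def parse_heading(line):
    """Split a heading line (without its newline) into its parts.

    Returns (depth, anchor_number, contents) or None if line is not a heading.
    """
    m = HEADING.fullmatch(line)
    if m is None:
        return None
    stars, anchor, contents = m.groups()
    depth = len(stars)  # the number of "*" characters is the section depth
    number = int(anchor) if anchor is not None else None
    return depth, number, contents
```

For example, parse_heading("* Introduction") returns (1, None, "Introduction"). Note that in "** [3] Syntax" the space after the destination anchor number is part of the line contents, since the production allows white space only before the anchor number.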

There are three kinds of lists: ordered lists, unordered lists, and labeled lists. Ordered lists and unordered lists are called lists in this specification.

A list consists of zero or more items. An item in the list is represented by one or more - or = characters, followed by zero or more white space characters, optionally followed by a destination anchor number, optionally followed by line contents, followed by a newline, followed by zero or more block children. The number of - or = characters at the beginning of the item represents the depth of the list. In a list, the depth of all items has to be the same. If there is another list among the block children, its items' depth has to be greater than the depth of the parent item. The last character that represents the depth of an item indicates the type of the list: - indicates an unordered list while = indicates an ordered list. In a list, all items have to be of the same type.

A labeled list consists of one or more labeled list items. A labeled list item is represented by a : character, followed by zero or more white space characters, optionally followed by a destination anchor number, optionally followed by line contents, followed by zero or more white space characters, followed by a : character, followed by a destination anchor number, followed by zero or more white space characters, optionally followed by line contents, followed by a newline, followed by zero or more block children. The former line contents, if any, represent the label. Block children cannot contain a labeled list.

list
= 1*list-item
list-item
= 1*("-" / "=") *white-space [destination-anchor-number line-contents] newline *block-child
labeled-list
= 1*labeled-list-item
labeled-list-item
= ":" *white-space [destination-anchor-number] [line-contents] *white-space [destination-anchor-number] [line-contents] newline *block-child
The following example contains no quotation:
>>1 This is a reference, not a quote.
quotation
= 1*quoted-block
quoted-block
= 1*">" *white-space (paragraph / editorial-note / comment-paragraph / newline)
preformatted-paragraph
= preformatted-paragraph-block / obs-preformatted-paragraph
preformatted-paragraph-block
= '[PRE[' [class-specification] "[" *white-space newline *([destination-anchor-number] [line-contents] newline) ']PRE]' *white-space
obs-preformatted-paragraph
= white-space [line-contents] newline *([destination-anchor-number] [line-contents] newline)
edited-section
= inserted-section / deleted-section
inserted-section
= '[INS' [class-specification] "[" *white-space newline body-part ']INS]' *white-space newline
deleted-section
= '[DEL' [class-specification] "[" *white-space newline body-part ']DEL]' *white-space newline

A table represents two-dimensional tabular data. It is similar to the HTML table element, but what can be represented is even narrower than the HTML table model. A table consists of one or more table rows. A table row consists of one or more table cells. Syntactically, a table row is followed by a newline.

There are two kinds of table cells: data cells and colspan cells. The first cell in a row has to be a data cell. Syntactically a cell is preceded by a , character followed by zero or more white space characters, and is followed by zero or more white space characters.

A data cell represents a cell that contains data, like HTML td element. The cell consists of an optional destination anchor number, optionally followed by line contents. Syntactically, the data cell can be provided as a quoted string, in which case its value is interpreted as an optional destination anchor number, optionally followed by line contents. In fact the data cell has to be represented as a quoted string if it contains a , character, a leading " character, or leading or trailing white space characters.

A colspan cell represents that the cell that would be placed there forms an integrated part of the cell just before that cell. The cell just before that cell might also be a colspan cell.

table
= 1*table-row
table-row
= "," data-cell *("," cell) newline
cell
= data-cell / colspan-cell
data-cell
= *white-space ([cstartchar *cchar] / quoted-string) *white-space
cstartchar
= char − ("," / %x22 / white-space)
cchar
= char − ","
colspan-cell
= "=="
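
The table row syntax can be illustrated by the following non-normative Python sketch, which splits one row into data cells and colspan cells. The name parse_table_row and the ("data", text) / ("colspan", None) result shape are hypothetical. The sketch assumes that an unquoted cell consisting of exactly == is a colspan cell unless it is the first cell of the row, and it returns cell contents as uninterpreted strings.

```python
def parse_table_row(line):
    """Split a table row (without its newline) into a list of cells.

    Returns None if the line is not a well-formed table row.
    """
    if not line.startswith(","):
        return None
    cells = []
    pos, n = 0, len(line)
    while pos < n:
        pos += 1  # skip the "," that introduces the cell
        while pos < n and line[pos] in " \t":
            pos += 1  # leading white space is not part of the cell
        if pos < n and line[pos] == '"':
            # Quoted data cell: read up to the closing quote.
            pos += 1
            buf = []
            while pos < n and line[pos] != '"':
                if line[pos] == "\\" and pos + 1 < n:
                    pos += 1  # quoted pair: keep only the escaped character
                buf.append(line[pos])
                pos += 1
            if pos >= n:
                return None  # unterminated quoted string
            pos += 1  # closing quote
            while pos < n and line[pos] in " \t":
                pos += 1
            if pos < n and line[pos] != ",":
                return None  # junk after the quoted string
            cells.append(("data", "".join(buf)))
        else:
            end = line.find(",", pos)
            if end == -1:
                end = n
            text = line[pos:end].rstrip(" \t")
            pos = end
            if text == "==" and cells:
                cells.append(("colspan", None))
            else:
                cells.append(("data", text))
    return cells
```

For example, the row ,a, "b,c" ,== contains the data cells a and b,c followed by one colspan cell.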

3.1.3 Inline contents

line-contents
= 1*(text / anchor-internal / anchor-external / anchor / tagged-inline-element / form / strong / emphasis / obs-entity-reference)
text
= 1*char
External reference scheme Syntax of external reference parameter Semantics
IW (identifier / quoted-string) ":" (identifier / quoted-string) InterWiki reference (An InterWikiName followed by a parameter)
MAIL RFC 2822 addr-spec but not RFC 2822 obs-addr-spec; no leading or trailing RFC 2822 FWS; no control characters (%x00-1f / %x7f) E-mail address
URI RFC 3986 URI reference URL
URL RFC 3986 URI reference URL

Maybe these schemes should reference Web Applications 1.0's URL and mail address syntax.

InterWiki is a hyperlinking mechanism in which the combination of an InterWikiName and a parameter identifies the destination of the link. The interpretation of an InterWiki link is implementation dependent.

External reference schemes URI and URL ought not to be used.

destination-anchor-number
= "[" 1*DIGIT "]"
anchor-internal
= ">>" 1*DIGIT
anchor-external
= "<" external-reference ">"
external-reference
= URL / external-reference-scheme ":" external-reference-parameter
URL
= 1*uschar ":" external-reference-parameter
external-reference-scheme
= 1*xschar
external-reference-parameter
= *(char − ("<" / ">" / %x22) / quoted-string)
uschar
= char − (":" / UALPHA)
xschar
= char − (":" / LALPHA)
anchor
= "[[" [line-contents] inline-end-tag
Tag name Number of middle tags Internal reference source anchor External reference source anchor Semantics
AA 0 Not allowed Not allowed Character art (so-called ASCII-art, aa element)
ABBR 0 or 1 Not allowed Not allowed Abbreviation (HTML abbr element)
CITE 0 Not allowed Not allowed Title of a work (HTML cite element)
CODE 0 Not allowed Not allowed Code (HTML code element)
CSECTION 0 Not allowed Not allowed Title of a section in a work (csection element)
DEL 0 Allowed Allowed Removal (HTML del element)
DFN 0 or 1 Not allowed Not allowed Defined term (HTML dfn element)
INS 0 Allowed Allowed Insertion (HTML ins element)
KBD 0 Not allowed Not allowed User input (HTML kbd element)
KEY 0 Not allowed Not allowed Keyboard's key (key element)
Q 0 Allowed Allowed Quotation (HTML q element)
QN 0 or 1 Not allowed Not allowed Qualified name (qn element)
RUBY 1 or 2 Not allowed Not allowed Ruby annotation (HTML ruby element)
RUBYB 1 Not allowed Not allowed Secondary ruby annotation (rubyb element)
SAMP 0 Not allowed Not allowed Sample (HTML samp element)
SPAN 0 or 1 Not allowed Not allowed Span of text (HTML span element)
SRC 0 Not allowed Not allowed Short annotation for citation (src element)
SUP 0 Not allowed Not allowed Superscript (HTML sup element)
SUB 0 Not allowed Not allowed Subscript (HTML sub element)
TIME 0 or 1 Not allowed Not allowed Date or time (HTML time element)
VAR 0 Not allowed Not allowed Variable (HTML var element)
WEAK 0 Not allowed Not allowed Small print (weak element)

A future version of this specification might define more tag names.

An inline start tag whose tag name is INS or DEL might not be placed at the beginning of a line contents construct, since it could be interpreted as a block start tag.

A class specification represents class names unless otherwise specified. The class specification syntactically consists of a ( character followed by the body of the class specification followed by a ) character. The body of a class specification consists of zero or more characters excluding (, ), and \. The body of the class specification has similar semantics to, and is processed similarly to, the HTML class attribute.

tagged-inline-element
= inline-start-tag [line-contents] *(inline-middle-tag [line-contents]) inline-end-tag
inline-start-tag
= "[" tag-name [class-specification] [language-specification] "["
tag-name
= 1*LALPHA
class-specification
= "(" *clchar ")"
clchar
= char − ("(" / ")" / "\")
language-specification
= "@" *ltchar
ltchar
= ALPHA / DIGIT / "-"
inline-middle-tag
= "]" *white-space [language-specification] "["
inline-end-tag
= "]" [anchor-internal / anchor-external] "]"
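
As a non-normative illustration, an inline start tag can be recognized as in the Python sketch below. The name match_inline_start_tag is hypothetical. The sketch assumes that tag names are written in uppercase ASCII letters, as in the table of tag names above, and that the body of a class specification is split on white space like an HTML class attribute.

```python
import re

# "[" tag-name [class-specification] [language-specification] "["
START_TAG = re.compile(
    r'\[([A-Z]+)'                        # tag name (assumed uppercase)
    r'(?:\(([^()\\\x00-\x1f\x7f]*)\))?'  # optional class specification
    r'(?:@([A-Za-z0-9-]*))?'             # optional language specification
    r'\['
)


def match_inline_start_tag(text, pos=0):
    """Try to match an inline start tag in text at index pos.

    Returns a dict describing the tag, or None if there is no tag at pos.
    """
    m = START_TAG.match(text, pos)
    if m is None:
        return None
    name, classes, lang = m.groups()
    return {
        "tag": name,
        "classes": classes.split() if classes is not None else None,
        "lang": lang,
        "end": m.end(),  # index just after the start tag
    }
```

For example, matching [CODE(perl example)@en[print]] yields the tag name CODE, the classes perl and example, and the language specification body en; the remaining text starts at the returned end index.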

The form name specification, if any, defines the name of the form. It has to be different from any other form name defined in the document. A form name specification is syntactically class specification and the body of it is the form name. A form name cannot contain white space characters.

Specific form name Syntax of specific form parameters Semantics
comment Empty Comment input form.
embed ['IMG:'] identifier Embedding another page. The parameter specifies the WikiName of the page embedded. If the parameter begins with a string IMG:, the page is embedded as an image and the string does not form the part of the WikiName.
form N/A Reserved.
rcomment Empty Comment input form; a new comment is inserted after the form.
searched identifier Insert a search result for the parameter.

The form is an extension mechanism for the SWML text serialization. ...

The generic form can be used to embed a WikiForm specification. WikiForm provides a generic framework for describing user input forms and templates used for processing form inputs.

The three form fields in a form represent the input template, the output template, and the options. Interpretation and processing of these fields are implementation dependent.

The name form cannot be used.

Names embed, rcomment, and searched are obsolete and cannot be used.

form
= generic-form / specific-form
generic-form
= "[[#" 'form' [form-name-specification] ":" form-field ":" form-field [":" form-field] "]]"
form-name-specification
= class-specification
form-field
= "'" *(char − ("'" / "\") / quoted-pair) "'"
specific-form
= "[[#" specific-form-name [":" specific-form-parameters] "]]"
specific-form-name
= 1*(LALPHA / "-")
specific-form-parameters
= identifier *(":" identifier)
strong
= "'''" [line-contents] "'''"
emphasis
= "''" [line-contents] "''"

3.1.4 Images

A document can contain an image by including, at the end of the document, the string __IMAGE__ followed by a newline followed by Base64-encoded (RFC 2045) image data. The image-type and image-alt parameters provide metadata for the image.

obs-image
= '__IMAGE__' *char
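
A non-normative Python sketch of separating the image from the rest of the document follows. The name split_image is hypothetical, and the sketch simply splits at the first occurrence of the marker string, which is a simplification rather than the processing defined by this specification.

```python
import base64

MARKER = "__IMAGE__"


def split_image(document):
    """Split a document into (text_before_marker, image_bytes_or_None)."""
    head, sep, tail = document.partition(MARKER)
    if not sep:
        return document, None  # no image in this document
    # Drop newlines and other white space before decoding the Base64 data.
    data = "".join(tail.split())
    return head, base64.b64decode(data)
```

The decoded bytes would then be interpreted according to the image-type parameter of the magic line.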

3.1.5 Lexical structures

An entity reference is a part of document that is expected to be replaced by a fragment imported from another document. It is no longer supported.

obs-entity-reference
= "__&&" 1*char "&&__"

A character is a character from the coded character set used to encode the document. Unless otherwise specified, for the purpose of this specification, control characters (characters in the range U+0000 .. U+001F and U+007F) are not characters.

A newline can be represented in any of three common conventions: CR (U+000D), LF (U+000A), or CR followed by LF.

A quoted string is zero or more characters enclosed by " characters. In a quoted string, character \ can only be used as part of quoted pair. A quoted pair is \ followed by a character. The value of a quoted string is the string obtained by removing " characters enclosing the quoted string and removing \ characters at the beginning of the quoted pairs.

identifier
= 1*(ALPHA / DIGIT / "-" / non-ascii)
non-ascii
= char − %x00-7f
char
= <Any character> − (%x00-1f / %x7f)
quoted-string
= %x22 *(char − ("\" / %x22) / quoted-pair) %x22
quoted-pair
= "\" char
newline
= %x0d %x0a / %x0d / %x0a
white-space
= %x09 / %x20
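
The value of a quoted string, as defined above, can be computed as in the following non-normative Python sketch. The name quoted_string_value is hypothetical.

```python
def quoted_string_value(s):
    """Return the value of the quoted string s, or None if s is malformed."""
    if len(s) < 2 or s[0] != '"' or s[-1] != '"':
        return None
    out = []
    i, end = 1, len(s) - 1  # scan between the enclosing quotes
    while i < end:
        c = s[i]
        if c == '"':
            return None  # an unescaped quote cannot appear inside
        if c == "\\":
            i += 1
            if i >= end:
                return None  # "\" has to be followed by a character
            c = s[i]  # quoted pair: keep only the escaped character
        if ord(c) < 0x20 or ord(c) == 0x7F:
            return None  # control characters are not characters
        out.append(c)
        i += 1
    return "".join(out)
```

For example, the quoted string "a\"b" (six characters including the enclosing quotes) has the three-character value a"b.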

3.2 Parsing documents in the SWML text serialization

This section specifies how to convert a string of characters into a DOM [DOM] tree, assuming the string is written in the SWML text serialization. This process is referred to as parsing and an implementation that performs this process is referred to as a parser.

How to convert a string of bytes into a string of characters is outside of the scope of this specification.

The parsing process is defined in terms of DOM and relies on HTML5 ... and manakai's extensions to DOM .... However, a conforming parser does not have to implement them, as long as the end result is equivalent.

The parsing process is divided into two stages: tokenization and tree construction. The tokenization stage emits a sequence of tokens, which are used as inputs for the tree construction stage. The tree construction stage constructs a DOM tree. Some steps invoked in the tokenization stage might also construct a part of the DOM tree. During the parsing, mutation events MUST NOT be invoked.

Before the actual parsing starts, a new Document object MUST be created. It represents the DOM tree constructed as a result of the parsing. The innerHTML IDL attribute of the Document object MUST be initially set to <html xmlns="http://www.w3.org/1999/xhtml"><head></head><body></body></html>. The document element is what the documentElement IDL attribute of the Document returns. The head element is what the firstChild IDL attribute of the document element returns at the time immediately after the innerHTML is set. The body element is what the lastChild IDL attribute of the document element returns at the time immediately after the innerHTML is set. The image element is initially null.
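
The initial state described above can be approximated by the following non-normative Python sketch. It uses the standard xml.dom.minidom module and parses the markup as XML instead of setting an innerHTML IDL attribute, which is an assumption made for illustration only; the name create_initial_document is hypothetical.

```python
from xml.dom import minidom

INITIAL_MARKUP = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<head></head><body></body></html>'
)


def create_initial_document():
    """Create the Document and the element references used by the parser."""
    doc = minidom.parseString(INITIAL_MARKUP)
    document_element = doc.documentElement
    head = document_element.firstChild  # the head element
    body = document_element.lastChild   # the body element
    image = None                        # the image element is initially null
    return doc, document_element, head, body, image
```

Because the markup contains no white space between elements, firstChild and lastChild of the document element are the head and body elements respectively.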


When the parser appends a character char to node node, the manakai_append_text method ... MUST be invoked on node with the argument char.

When an element is created, its prefix IDL attribute MUST be set to null.

When an attribute is created, unless otherwise specified, its prefix and namespaceURI IDL attributes MUST both be set to null.

When an attribute in a namespace is created, its prefix IDL attribute MUST be set to the preferred prefix for the namespace.


A class specification is a string consisting of a ( character, followed by zero or more characters that are not one of ), ), or white space characters, and finally followed by a ) character. The body of a class specification is the substring of the class specification between the parentheses (exclusive). It might be the empty string.

3.2.1 Tokenization of lines

When a string of characters is tokenized, the string s MUST be processed as follows:

  1. Let pos be zero (0). It represents the index in s. The index of the first character in s is zero (0).
  2. If pos is greater than or equal to the length of s, then emit an end-of-file token and abort these steps.
  3. Let line be the empty string.
  4. If the posth character of s is U+000D CARRIAGE RETURN, process line. Set line to the empty string. If the (pos + 1)th character of s is U+000A LINE FEED, increment pos by one (1).
  5. Otherwise, if the posth character of s is U+000A LINE FEED, process line. Set line to the empty string.
  6. Otherwise, append the posth character of s to line.
  7. Increase pos by one (1).
  8. Go back to the fourth step of these steps.
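
The steps above can be sketched in Python as follows. This is a non-normative illustration; the names tokenize_lines and process_line are hypothetical. The sketch assumes that the end-of-file check is repeated on every iteration and that a final line without a terminating newline is processed before the end-of-file token is emitted; neither assumption is stated by the steps themselves.

```python
def tokenize_lines(s, process_line):
    """Split s into lines and hand each line to process_line.

    CR, LF, and CR followed by LF are all treated as one newline.
    Returns the string "end-of-file" in place of an end-of-file token.
    """
    pos = 0
    line = ""
    while pos < len(s):
        ch = s[pos]
        if ch == "\r":
            process_line(line)
            line = ""
            # A LINE FEED right after a CARRIAGE RETURN is part of the same newline.
            if pos + 1 < len(s) and s[pos + 1] == "\n":
                pos += 1
        elif ch == "\n":
            process_line(line)
            line = ""
        else:
            line += ch
        pos += 1
    if line:
        process_line(line)  # assumption: flush an unterminated final line
    return "end-of-file"
```

For example, calling tokenize_lines with the input "a\r\nb\rc\n\nd" and a callback that collects its argument produces the lines a, b, c, the empty line, and d, in that order.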

The steps above emit a sequence of one or more tokens, which are the inputs to the tree construction stage. A token can have zero or more properties, depending on the kind of the token. There are several kinds of tokens and properties as follows:

Block start tag token
Classes and tag name properties.
Block end tag token
Tag name property.
Character token
Data property.
Comment paragraph start token
No property.
Editorial note start token
No property.
Element token
Local name, namespace, anchor attribute, by attribute, resScheme attribute, resParameter attribute, and content attribute. The default for these properties is null.
Emphasis token
No property.
Empty line token
No property.
End-of-file token
No property.
Form token
Name, id, and parameters properties.
Heading start token
Depth property.
Heading end token
No property.
Inline start tag token
Tag name, classes, and language properties. Default for these properties is null.
Inline middle tag token
language property, whose default is null.
Inline end tag token
Anchor attribute, resScheme attribute, and resParameter attribute properties. Default for these properties is null.
Labeled list start token
No property.
Labeled list middle token
No property.
List start token
Depth property.
Preformatted start token
No property.
Preformatted end token
No property.
Quotation start token
Depth property.
Strong token
No property.
Table row start token
No property.
Table row end token
No property.
Table cell start token
No property.
Table cell end token
No property.
Table colspan cell token
No property.

Mode is a state of the tokenizer and is one of "initial" (the initial value used when the tokenization starts), "body", "preformatted", "preformatted block", and "image data".

The continuous line flag is another state of the tokenizer, representing whether a new line character should be appended to the data. It takes either true or false. This flag is mainly used in the "body" mode.

When a line is processed, the rules specified in the following subsections are used according to the current mode. The rules below sometimes require that the line be reprocessed. In such cases, the rules for the then-current mode MUST be followed with the same line.

3.2.1.1 The "initial" mode

In the "initial" mode, line MUST be processed as follows:

If line starts with #?
Parse a magic line with line.
Otherwise
  1. Set the continuous line flag to false.
  2. Switch to the "body" mode and reprocess line.
3.2.1.2 The "body" mode

In the "body" mode, line MUST be processed as follows:

If line is empty
  1. Set the continuous line flag to false.
  2. Emit an empty line token.
If line starts with a white space character
  1. Emit a preformatted start token.
  2. Run the algorithm to tokenize a text with line.
  3. Switch to the "preformatted" mode.
If line starts with *
  1. Let data be line.
  2. Let depth be zero (0).
  3. While the first character of data, if any, is *, run the following substeps:
    1. Increase depth by one (1).
    2. Remove the first character of data. (The removed character will be *.)
  4. Remove white space characters at the beginning of data, if any.
  5. Emit a heading start token whose depth is set to depth.
  6. Run the algorithm to tokenize a text with data.
  7. Emit a heading end token.
  8. Finally, set the continuous line flag to false.
If line starts with - or =
  1. Let data be line.
  2. Let depth be the empty string.
  3. While the first character of data, if any, is - or =, run the following substeps:
    1. Append the first character of data to depth.
    2. Remove the first character of data.
  4. Remove white space characters at the beginning of data, if any.
  5. Emit a list start token whose depth is set to depth.
  6. Run the algorithm to tokenize a text with data.
  7. Finally, set the continuous line flag to true.
If line starts with :
  1. Let name be the empty string.
  2. Let data be line.
  3. Remove the first character of data. (The removed character will be :.)
  4. While data is not empty and the first character of data is not :, run the following substeps:
    1. Append the first character of data to name.
    2. Remove the first character of data.
  5. If data is the empty string (that is, no second : character was found), run the following substeps:
    1. Emit a character token whose data is a : character.
    2. Run the algorithm to tokenize a text with name.

    In this case, line does not represent a description list.

  6. Otherwise, run the following substeps:
    1. Remove white space characters at the beginning of name, if any.
    2. Remove white space characters at the end of name, if any.
    3. Emit a labeled list start token.
    4. Run the algorithm to tokenize a text with name.
    5. Remove the first character of data. (The removed character will be :.)
    6. Remove white space characters at the beginning of data, if any.
    7. Emit a labeled list middle token.
    8. Run the algorithm to tokenize a text with data.
  7. Finally, set the continuous line flag to true.
If line starts with >
  1. Let data be line.
  2. Let depth be zero (0).
  3. While the first character of data, if any, is >, run the following substeps:
    1. Increase depth by one (1).
    2. Remove the first character of data. (The removed character will be >.)
  4. If depth is two (2), data is not empty, and the first character of data is one of digits, run the following substeps:
    1. Prepend two > characters to data.
    2. If the continuous line flag is true, prepend a U+000A LINE FEED character to data.
    3. Run the algorithm to tokenize a text with data.
    4. Set the continuous line flag to true.
  5. Otherwise, run the following substeps:
    1. Emit a quotation start token whose depth is set to depth.
    2. Remove white space characters at the beginning of data, if any.
    3. If the length of data is greater than one (1) and the first two characters of data are @@, run the following substeps:
      1. Remove the first two characters of data. (The removed characters will be @@.)
      2. Emit an editorial note start token.
      3. Remove white space characters at the beginning of data, if any.
      4. Set the continuous line flag to true.
    4. Otherwise, if the length of data is greater than one (1) and the first two characters of data are ;;, run the following substeps:
      1. Remove the first two characters of data. (The removed characters will be ;;.)
      2. Emit a comment paragraph start token.
      3. Remove white space characters at the beginning of data, if any.
      4. Set the continuous line flag to true.
    5. Otherwise, if data is not empty, set the continuous line flag to true.
    6. Otherwise, set the continuous line flag to false.
    7. In any case, run the algorithm to tokenize a text with data.
If line is a string consisting of a [ character, followed by one of INS, DEL, or PRE, optionally followed by class specification, followed by a [ character, followed by zero or more white space characters
  1. Emit a block start tag token whose tag name is whichever of INS, DEL, or PRE appears in line and whose classes is the body of the class specification, if any, or null otherwise.
  2. Set the continuous line flag to false.
  3. If the tag name is PRE, switch to the "preformatted block" mode.
If line starts with @@
  1. Let data be line.
  2. Remove the first two characters of data. (The removed characters will be @@.)
  3. Remove white space characters at the beginning of data, if any.
  4. Emit an editorial note start token.
  5. Run the algorithm to tokenize a text with data.
  6. Set the continuous line flag to true.
If line starts with ;;
  1. Let data be line.
  2. Remove the first two characters of data. (The removed characters will be ;;.)
  3. Remove white space characters at the beginning of data, if any.
  4. Emit a comment paragraph start token.
  5. Run the algorithm to tokenize a text with data.
  6. Set the continuous line flag to true.
If line is a string consisting of a ] character, followed by one of INS or DEL, followed by a ] character, followed by zero or more white space characters
  1. Emit a block end tag token whose tag name is whichever of INS or DEL appears in line.
  2. Set the continuous line flag to false.
If line starts with ,
  1. Run the algorithm to tokenize a table row with line.
  2. Set the continuous line flag to false.
If line is __IMAGE__
Switch to the "image data" mode.
Otherwise
  1. If the continuous line flag is true, emit a character token whose data is a U+000A LINE FEED character.
  2. Run the algorithm to tokenize a text with line.
  3. Set the continuous line flag to true.
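This example is non-normative. The prefix handling for heading, list, and quotation lines in the "body" mode can be sketched as follows. The function names are hypothetical, and WHITE_SPACE is an assumption standing in for this specification's white space characters (taken here to be space and tab). The sketch only computes the depth and the remaining data; it does not emit tokens or update the continuous line flag.

```python
from typing import Optional

WHITE_SPACE = " \t"  # assumption: stand-in for the white space characters
DIGITS = "0123456789"


def heading_prefix(line: str) -> tuple[int, str]:
    """Count the leading * characters and strip them and the white space after them."""
    depth = len(line) - len(line.lstrip("*"))
    return depth, line[depth:].lstrip(WHITE_SPACE)


def list_prefix(line: str) -> tuple[str, str]:
    """Collect the leading run of - and = characters as the depth string."""
    rest = line.lstrip("-=")
    depth = line[: len(line) - len(rest)]
    return depth, rest.lstrip(WHITE_SPACE)


def quotation_prefix(line: str) -> Optional[tuple[int, str]]:
    """Return (depth, data), or None when the line is text such as ">>1"."""
    rest = line.lstrip(">")
    depth = len(line) - len(rest)
    if depth == 2 and rest and rest[0] in DIGITS:
        return None  # ">>" followed by a digit is tokenized as text, not a quotation
    return depth, rest.lstrip(WHITE_SPACE)
```

Note that the list depth is a string rather than a number, because the tree construction stage later inspects each character to choose between ul and ol.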
3.2.1.3 The "preformatted" mode

In the "preformatted" mode, line MUST be processed as follows:

If line is the empty string
  1. Emit a preformatted end token.
  2. Switch to the "body" mode and reprocess line.
If line is a string consisting of a ] character, followed by one of INS or DEL, followed by a ] character, followed by zero or more white space characters
  1. Emit a preformatted end token.
  2. Emit a block end tag token whose tag name is whichever of INS or DEL appears in line.
  3. Set the continuous line flag to false.
  4. Switch to the "body" mode.
Otherwise
  1. Emit a character token whose data is a U+000A LINE FEED character.
  2. Run the algorithm to tokenize a text with line.
3.2.1.4 The "preformatted block" mode

In the "preformatted block" mode, line MUST be processed as follows:

If line is a string consisting of ]PRE] followed by zero or more white space characters
  1. Emit a block end tag token whose tag name is PRE.
  2. Set the continuous line flag to false.
  3. Switch to the "body" mode.
Otherwise
  1. If the continuous line flag is true, emit a character token whose data is a U+000A LINE FEED character.
  2. Run the algorithm to tokenize a text with line.
  3. Set the continuous line flag to true.
3.2.1.5 The "image data" mode

In the "image data" mode, line MUST be processed as follows:

  1. If the image element is null, then create an image element in the SuikaWiki/0.9 namespace and set the image element to that element. Append the image element to the document element.
  2. Otherwise, append a U+000A LINE FEED character to the image element.
  3. Then, append each character in line in the same order to the image element.

3.2.2 Tokenization of a table row

The algorithm to tokenize a table row data is as follows:

  1. Let pos be zero (0). It represents the index in data. The index of the first character in data is zero (0).
  2. Emit a table row start token.
  3. LOOP: If pos is greater than or equal to the length of data, emit a table row end token and abort this algorithm.
  4. Increase pos by one (1).
  5. Let cell be the empty string.
  6. Let cell quoted be null.
  7. If pos is greater than or equal to the length of data, emit a table row end token and abort this algorithm.
  8. If the posth character in data is a white space character, increase pos by one (1) and go back to the previous step.
  9. If the posth character in data is ", set cell quoted to the empty string and follow the substeps below:
    1. Increase pos by one (1).
    2. If pos is greater than or equal to the length of data, abort these substeps.
    3. Otherwise, if the posth character in data is ", abort these substeps.
    4. Otherwise, if the posth character in data is \, follow the substeps below:
      1. Increase pos by one (1).
      2. If pos is greater than or equal to the length of data, abort these substeps.
      3. Otherwise, append the posth character in data to cell quoted.
    5. Otherwise, append the posth character in data to cell quoted.
    6. Go back to the first substep in these substeps.
  10. While pos is less than the length of data, run the following substeps:
    1. If the posth character in data is ,, abort these substeps.
    2. Append the posth character in data to cell.
    3. Increase pos by one (1).
  11. Remove white space characters at the end of cell, if any.
  12. If cell quoted is null and cell is equal to ==, emit a table colspan cell token and go back to the step labeled LOOP.
  13. Emit a table cell start token.
  14. If cell quoted is not null, run the algorithm to tokenize a text with cell quoted.
  15. Run the algorithm to tokenize a text with cell.
  16. Emit a table cell end token.
  17. Go back to the step labeled LOOP.
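This example is non-normative. The table row algorithm above can be sketched as follows. The function name split_table_row and the returned representation are hypothetical. The sketch makes three assumptions that are interpretations rather than statements of the algorithm: WHITE_SPACE stands in for the white space characters, the closing quotation mark of a quoted cell is skipped rather than copied into cell, and trailing white space is trimmed from cell before the comparison with ==. Instead of emitting tokens, it returns one entry per cell: the string "colspan" or a (cell quoted, cell) pair.

```python
WHITE_SPACE = " \t"  # assumption: stand-in for the white space characters


def split_table_row(data: str) -> list:
    """Split a line starting with "," into cells (illustrative sketch)."""
    cells = []
    pos = 0
    n = len(data)
    while pos < n:
        pos += 1  # skip the "," that starts this cell
        while pos < n and data[pos] in WHITE_SPACE:
            pos += 1
        if pos >= n:
            break  # end of the row
        quoted = None
        if data[pos] == '"':
            quoted = ""
            pos += 1
            while pos < n and data[pos] != '"':
                if data[pos] == "\\":
                    pos += 1  # a backslash makes the next character literal
                    if pos >= n:
                        break
                quoted += data[pos]
                pos += 1
            pos += 1  # assumption: skip the closing quotation mark
        start = pos
        while pos < n and data[pos] != ",":
            pos += 1
        cell = data[start:pos].rstrip(WHITE_SPACE)  # assumption: trim cell
        if quoted is None and cell == "==":
            cells.append("colspan")
        else:
            cells.append((quoted, cell))
    return cells
```

Quoting a cell is what allows it to contain a , character, since the unquoted part of a cell ends at the next comma.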

3.2.3 Tokenization of a text

The algorithm to tokenize a text data is as follows:

  1. Let nest level be zero (0).
  2. If data begins with [ followed by one or more digits followed by ], run the following steps:
    1. Let number be the digits in the matched substring.
    2. Remove the matched substring from data.
    3. Emit an element token whose local name is anchor-end, namespace is the SuikaWiki/0.9 namespace, anchor attribute is number, and content is [ followed by number followed by ].
  3. While the length of data is not zero (0), run the appropriate steps:
    If data begins with [[#, followed by one or more lowercase letters or U+002D HYPHEN-MINUS
    1. Let name be the lowercase letters and U+002D HYPHEN-MINUS in the matched substring.
    2. Remove the matched substring from data.
    3. Let id be null.
    4. Let parameters be an empty list.
    5. If data begins with a class specification, run the following substeps:
      1. Set the id to the body of the class specification.
      2. Remove the class specification from data.
    6. While the first character of data is :, run the following substeps:
      1. Remove the first character of data.
      2. If the length of data is greater than one (1) and the first two characters of data are ]], abort these substeps.
      3. Let parameter be the empty string.
      4. If data is empty, append parameter to parameters and abort these substeps.
      5. If the first character of data is ', run the following steps:
        1. Remove the first character of data.
        2. If data is empty, abort these substeps.
        3. If the first character of data is ', abort these substeps.
        4. If the first character of data is \, run the following substeps:
          1. Remove the first character of data.
          2. If data is empty, abort these substeps.
          3. Append the first character of data to parameter.
        5. Otherwise, append the first character of data to parameter.
        6. Go back to the first substep in these substeps.
      6. Otherwise, run the following steps:
        1. If data is empty, or if the first character of data is :, abort these substeps.
        2. Append the first character of data to parameter.
        3. Remove the first character of data.
        4. Go back to the first substep of these substeps.
      7. Append parameter to parameters.
    7. If the length of data is greater than one (1) and the first two characters of data are ]], remove these characters from data.
    8. Emit a form token whose name is name, id is id, and parameters is parameters.
    Otherwise, if data begins with [[
    1. Remove the matched substring from data.
    2. Emit an inline start tag token.
    3. Increase nest level by one (1).
    If data begins with [, followed by one or more uppercase letters, optionally followed by a class specification, optionally followed by a language specification, followed by [
    1. Let tag name be the uppercase letters in the matched substring of data.
    2. Let classes be the body of the class specification in the matched substring of data, if any, or null, otherwise.
    3. Let language be the body of the language specification in the matched substring of data, if any, or null, otherwise.
    4. Remove the matched substring from data.
    5. Emit an inline start tag token whose tag name is tag name, classes is classes, and language is language.
    6. Increase nest level by one (1).
    If data begins with ]]
    1. Remove the matched substring from data.
    2. Emit an inline end tag token.
    3. If nest level is greater than zero (0), decrease nest level by one (1).
    If data begins with ]<, followed by one or more scheme characters, followed by :
    1. Remove the matched substring from data and then act as if the first two characters of the original data before the removal were < instead of ]<, except that the emitted token is an inline end tag token instead of an element token. The resScheme attribute of the token MUST be the resScheme attribute of the token that would be emitted if the first two characters were <. The resParameter attribute of the token MUST be the resParameter attribute of the token that would be emitted if the first two characters were <.
    2. If data begins with ], remove the character from data.
    3. If nest level is greater than zero (0), decrease nest level by one (1).
    If data begins with ]>> followed by one or more digits, followed by ]
    1. Let number be the digits in the matched substring.
    2. Remove the matched substring from data.
    3. Emit an inline end tag token whose anchor is number.
    4. If nest level is greater than zero (0), decrease nest level by one (1).
    If nest level is greater than zero (0) and data begins with ] followed by zero or more white space characters followed by [
    If nest level is greater than zero (0) and data begins with ] followed by zero or more white space characters followed by a language specification followed by [
    1. Let lang be the body of the language specification in the matched substring of data, if any, or null, otherwise.
    2. Remove the matched substring from data.
    3. Emit an inline middle tag token whose language is lang.
    If data begins with <, followed by one or more scheme characters, followed by :
    1. Let scheme be the scheme characters part of the matched substring.
    2. Remove the matched substring from data.
    3. Let value be the empty string.
    4. Run the following steps:
      1. If data is empty, abort these steps.
      2. If the first character of data is >, remove the first character of data and abort these steps.
      3. If the first character of data is ", append " to value and run the following substeps:
        1. Remove the first character of data.
        2. If data is empty, abort these steps.
        3. If the first character of data is ", append " to value, remove the first character of data, and abort these substeps.
        4. If the first character of data is \, run the following substeps:
          1. Append \ to value.
          2. Remove the first character of data.
          3. If data is empty, abort these steps.
          4. Append the first character of data to value.
        5. Otherwise, append the first character of data to value.
        6. Go back to the first substep of these substeps.
      4. Otherwise, run the following substeps:
        1. Append the first character of data to value.
        2. Remove the first character of data.
      5. Go back to the first step of these steps.
    5. Let content be scheme followed by : followed by value.
    6. If scheme does not contain any uppercase letters, set value to content and set scheme to URI.
    7. Emit an element token whose local name is anchor-external, namespace is the SuikaWiki/0.9 namespace, resScheme attribute is scheme, resParameter attribute is value, and content is content.
    If data begins with '''
    1. Remove the matched substring from data.
    2. Emit a strong token.
    Otherwise, if data begins with ''
    1. Remove the matched substring from data.
    2. Emit an emphasis token.
    If data begins with >> followed by one or more digits
    1. Emit an element token whose local name is anchor-internal, namespace is the SuikaWiki/0.9 namespace, anchor attribute is the digits part of the matched substring, and content is the matched substring.
    2. Remove the matched substring from data.
    If data begins with __&&
    1. Remove the matched substring from data.
    2. If data begins with &&__, or if data does not contain &&__ as a substring, emit four character tokens whose data are _, _, &, and & respectively and remove the first four characters of data and abort these steps.
    3. Let name be the substring of data between the beginning of the string and the first occurrence of &&__ (exclusive).
    4. Remove the first occurrence of &&__ and any character before it from data.
    5. Emit an element token whose local name is replace, namespace is the SuikaWiki/0.9 namespace, by attribute is name.
    Otherwise
    1. Emit a character token whose data is set to the first character of data.
    2. Remove the first character of data.
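This example is non-normative. A small subset of the text tokenization loop above can be sketched as follows, covering only the strong, emphasis, anchor-internal, and character cases. The function name and the tuple representation of tokens are hypothetical, and digits are assumed to be the ASCII digits. The sketch shows why the ''' case has to be tested before the '' case.

```python
import re


def tokenize_simple_text(data: str) -> list[tuple]:
    """Tokenize a subset of inline syntax (illustrative sketch)."""
    tokens = []
    while data:
        if data.startswith("'''"):
            # Checked first, otherwise ''' would be read as '' followed by '.
            tokens.append(("strong",))
            data = data[3:]
        elif data.startswith("''"):
            tokens.append(("emphasis",))
            data = data[2:]
        elif (m := re.match(r">>([0-9]+)", data)) is not None:
            # Element token: anchor attribute is the digits, content is the match.
            tokens.append(("anchor-internal", m.group(1), m.group(0)))
            data = data[m.end():]
        else:
            tokens.append(("char", data[0]))
            data = data[1:]
    return tokens
```

The remaining cases (forms, inline tags, external anchors, and replace elements) follow the same pattern of matching a prefix, removing it from data, and emitting a token.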

3.2.4 Parsing a magic line

To parse a magic line data, the following steps MUST be used:

  1. Remove the first two characters of data. (They will be #?.)
  2. If there are one or more characters that are not white space characters at the beginning of data, run the following substeps:
    1. Let name be those characters.
    2. Let version be null.
    3. Remove those characters from data.
    4. If name contains /, set version to the substring after the first occurrence of that character. Note that version might become the empty string. Remove the / character and the substring after the character from name.
    5. Set the Name content attribute of the document element in the SuikaWiki/0.9 namespace to name.
    6. If version is not null, set the Version content attribute of the document element in the SuikaWiki/0.9 namespace to version.
  3. Run the following substeps:
    1. If data is empty, abort these substeps.
    2. If the first character of data is a white space character, remove the character from data and go back to the first substep of these substeps.
    3. Let name be the empty string.
    4. If data begins with one or more characters that are not =, set name to those characters and remove those characters from data.
    5. Let parameter be a newly created parameter element in the SuikaWiki/0.9 namespace and set the name content attribute of parameter to name.
    6. Remove the first character of data. (It will be =.)
    7. If the first character of data, if any, is ", remove that character from data.
    8. Run the following substeps:
      1. Let value be the empty string.
      2. If data is empty, or if the first character of data is ", create a value element in the SuikaWiki/0.9 namespace, set the textContent IDL attribute of the node to value, and append the node to parameter.
      3. Otherwise, if the first character of data, if any, is \, run the following substeps:
        1. Remove the first character of data. (The removed character will be \.)
        2. If the first character of data, if any, is ,, abort these substeps.
        3. Otherwise, append the first character of data, if any, to value.
      4. In any case, if the first character of data is ,, create a value element in the SuikaWiki/0.9 namespace, set the textContent IDL attribute of the node to value, append the node to parameter, and go back to the first substep of these substeps.
      5. Otherwise, append the first character of data to value.
      6. Go back to the second substep of these substeps.
    9. If the first character of data, if any, is ", remove that character from data.
    10. Append parameter to the head element.
    11. Go back to the first substep of these substeps.
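This example is non-normative. The first two steps of magic line parsing, which extract the name and the optional version, can be sketched as follows. The function name parse_magic_name is hypothetical, the white space characters are assumed to be space and tab, and the input strings in the usage note are illustrative only. The sketch returns the values instead of setting content attributes, and it leaves the parameter parsing of the third step to the caller.

```python
import re
from typing import Optional


def parse_magic_name(data: str) -> tuple[Optional[str], Optional[str], str]:
    """Return (name, version, remaining data) for a line starting with #?."""
    data = data[2:]  # remove the leading "#?"
    m = re.match(r"[^ \t]+", data)  # assumption: white space is space or tab
    if m is None:
        return None, None, data
    name = m.group(0)
    rest = data[m.end():]
    version = None
    if "/" in name:
        # Everything after the first "/" is the version; it may be empty.
        name, version = name.split("/", 1)
    return name, version, rest
```

For a hypothetical line #?SuikaWiki/0.9 key=value, the sketch returns the name SuikaWiki, the version 0.9, and the remaining data " key=value".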

3.2.5 Tree construction

The tree construction stage constructs a DOM tree from a series of tokens emitted by the tokenization stage. The tree construction stage has two state variables: insertion mode and stack of open elements.

The insertion mode is one of "in section", "in table row", or "in paragraph". The default that MUST be used when the tree construction begins is the "in section" insertion mode. The rules for these insertion modes are described in the subsections below.

When the algorithm below says that the parser is to do something “using the rules for the m insertion mode”, the parser MUST use the rules described under the m insertion mode's section, but MUST leave the insertion mode unchanged.

The stack of open elements contains tuples of (element node, section depth, quotation depth, list depth). This stack grows downwards; the topmost entry on the stack is the first one added to the stack, and the bottommost entry of the stack is the most recently added entry in the stack. It initially contains a single tuple: (the body element, 0, 0, 0). When an entry is pushed to the stack of open elements, the items of the new tuple are set to the same values as the bottommost tuple unless otherwise specified.

The current element is the element node of the bottommost entry in the stack of open elements.
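This example is non-normative. The stack of open elements and its inheritance rule can be sketched as follows. The class names Entry and OpenElements are hypothetical, and element nodes are represented by their local names only. The end of the Python list plays the role of the bottommost entry.

```python
from dataclasses import dataclass, replace


@dataclass
class Entry:
    element: str  # local name standing in for the element node
    section_depth: int = 0
    quotation_depth: int = 0
    list_depth: int = 0


class OpenElements:
    def __init__(self) -> None:
        # Initially the stack contains only (the body element, 0, 0, 0).
        self.entries = [Entry("body")]

    def push(self, element: str, **overrides: int) -> None:
        """Push an entry that copies the bottommost depths unless overridden."""
        bottom = self.entries[-1]
        self.entries.append(replace(bottom, element=element, **overrides))

    def pop(self) -> Entry:
        return self.entries.pop()

    @property
    def current(self) -> str:
        """The current element is the element of the bottommost entry."""
        return self.entries[-1].element
```

Pushing a section with section_depth=1 and then a p element leaves the p entry with section depth 1, because the unspecified items are copied from the entry below which it is added.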

Structural elements are elements whose local name is one of body, section, insert, delete, blockquote, h1, ul, ol, dl, li, dt, dd, table, tbody, tr, td, p, comment-p, ed, or pre.

3.2.5.1 The "in section" insertion mode

In the "in section" insertion mode, a token MUST be processed as follows:

A heading start token
  1. If the local name of the current element is not one of body, section, insert, or delete, pop the element off the stack of open elements and follow this substep again.
  2. Let current depth be the section depth of the bottommost entry in the stack of open elements.
  3. If depth of the token is less than or equal to the current depth, pop the element off the stack of open elements and go back to the first substep of these substeps.
  4. Otherwise, if depth of the token is greater than current depth + 1, create a section element in the HTML namespace, append the element created to the current element, push the element created to the stack of open elements with section depth set to current depth + 1, quotation depth set to zero (0), and list depth set to zero (0), and go back to the first substep of these substeps.
  5. Create a section element in the HTML namespace.
  6. Append the element created to the current element.
  7. Push the element created to the stack of open elements with section depth set to depth, quotation depth set to zero (0), and list depth set to zero (0).
  8. Create an h1 element in the HTML namespace.
  9. Append the element created to the current element.
  10. Push the element created to the stack of open elements.
  11. Switch to the "in paragraph" insertion mode.
A block start tag token whose tag name is INS
  1. Create an insert element in the SuikaWiki/0.9 namespace.
  2. Append the element created to the current element.
  3. Push the element created to the stack of open elements with section depth set to zero (0), quotation depth set to zero (0), and list depth set to zero (0).
  4. If the token's classes is not null, set the class content attribute of the element created to classes.
A block start tag token whose tag name is DEL
  1. Create a delete element in the SuikaWiki/0.9 namespace.
  2. Append the element created to the current element.
  3. Push the element created to the stack of open elements with section depth set to zero (0), quotation depth set to zero (0), and list depth set to zero (0).
  4. If the token's classes is not null, set the class content attribute of the element created to classes.
A quotation start token
  1. If the local name of the current element is not one of blockquote, body, section, insert, or delete, pop the element off the stack of open elements and follow this substep again.
  2. Let current depth be the quotation depth of the bottommost entry in the stack of open elements.
  3. If depth of the token is less than the current depth, pop the element off the stack of open elements and go back to the first substep of these substeps.
  4. Otherwise, if depth of the token is greater than current depth, create a blockquote element in the HTML namespace, append the element created to the current element, push the element created to the stack of open elements with section depth set to zero (0), quotation depth set to current depth + 1, and list depth set to zero (0), and go back to the first substep of these substeps.
A list start token
  1. Let current depth be the list depth of the current element.
  2. Let inserted depth be the length of depth of the token.
  3. Let local name be ul, if the last character in depth is -, or ol, otherwise.
  4. If current depth is greater than inserted depth, pop the current element off the stack of open elements and go back to the first substep of these substeps.
  5. If the list depth of the current element is equal to inserted depth and the local name of the current element is not local name, pop the current element off the stack of open elements and go back to the first substep of these substeps.
  6. If current depth is less than inserted depth, run the following substeps:
    1. Let type be the character at the index equal to current depth in depth of the token, where the index of the first character in depth is zero (0).
    2. If type is -, create a ul element in the HTML namespace.
    3. Otherwise, create an ol element in the HTML namespace.
    4. Append the element created to the current element.
    5. Push the element created to the stack of open elements, with list depth set to current depth + 1.
    6. If current depth + 1 is less than inserted depth, run the following substeps:
      1. Create an li element in the HTML namespace.
      2. Append the element created to the current element.
      3. Push the element created to the stack of open elements.
    7. Go back to the first substep for the list start token.
  7. Create an li element in the HTML namespace.
  8. Append the element created to the current element.
  9. Push the element created to the stack of open elements.
  10. Switch to the "in paragraph" insertion mode.
A labeled list start token
  1. If the local name of the current element is dd, pop the element off the stack of open elements.
  2. If the local name of the current element is not dl, create a dl element in the HTML namespace, append the element created to the current element, and push the element created to the stack of open elements.
  3. Create a dt element in the HTML namespace.
  4. Append the element created to the current element.
  5. Push the element created to the stack of open elements.
  6. Switch to the "in paragraph" insertion mode.
A table row start token
  1. Create a table element in the HTML namespace.
  2. Append the element created to the current element.
  3. Push the element created to the stack of open elements.
  4. Create a tbody element in the HTML namespace.
  5. Append the element created to the current element.
  6. Push the element created to the stack of open elements.
  7. Create a tr element in the HTML namespace.
  8. Append the element created to the current element.
  9. Push the element created to the stack of open elements.
  10. Switch to the "in table row" insertion mode.
A block start tag token whose tag name is PRE
  1. Create a pre element in the HTML namespace.
  2. Append the element created to the current element.
  3. Push the element created to the stack of open elements.
  4. If the token's classes is not null, set the class content attribute of the element created to classes.
  5. Switch to the "in paragraph" insertion mode.
A preformatted start token
  1. Create a pre element in the HTML namespace.
  2. Append the element created to the current element.
  3. Push the element created to the stack of open elements.
  4. Switch to the "in paragraph" insertion mode.
A comment paragraph start token
  1. Create a comment-p element in the SuikaWiki/0.10 namespace.
  2. Append the element created to the current element.
  3. Push the element created to the stack of open elements.
  4. Switch to the "in paragraph" insertion mode.
An editorial note start token
  1. Create an ed element in the SuikaWiki/0.10 namespace.
  2. Append the element created to the current element.
  3. Push the element created to the stack of open elements.
  4. Switch to the "in paragraph" insertion mode.
An empty line token
  1. If the local name of the current element is not one of body, section, insert, or delete, pop the element off the stack of open elements and follow this substep again.
A block end tag token whose tag name is INS
  1. If the stack of open elements contains an element whose local name is insert, pop the current element off the stack of open elements until such an element whose local name is insert has been popped from the stack of open elements.
  2. Set the continuous line flag to false.
A block end tag token whose tag name is DEL
  1. If the stack of open elements contains an element whose local name is delete, pop the current element off the stack of open elements until such an element whose local name is delete has been popped from the stack of open elements.
  2. Set the continuous line flag to false.
A form token
An element token whose local name is replace
Process the token using the rules for the "in paragraph" insertion mode.
An end-of-file token
Now the Document has been constructed. Abort the parser.
Any other block start tag token
A labeled list middle token, heading end token, preformatted end token, table row end token, table cell start token, table cell end token, or table colspan cell token
Ignore the token.
Anything else
  1. If the local name of the current element is not one of li, dd, comment-p, or ed, run the following substeps:
    1. Create a p element in the HTML namespace.
    2. Append the element created to the current element.
    3. Push the element created to the stack of open elements.
  2. Switch to the "in paragraph" insertion mode and reprocess the token.
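
The stack handling for the empty line token and the "Anything else" case above can be sketched as follows. This is a simplified illustration, not a conforming parser: each open element is represented only by its local name, tree insertion is omitted, and the function names are hypothetical.

```python
# Assumption: the stack of open elements is modelled as a list of local
# names, with the current element last. Tree insertion is left out.
SECTION_LEVEL = {"body", "section", "insert", "delete"}
NO_IMPLICIT_P = {"li", "dd", "comment-p", "ed"}


def handle_empty_line(stack):
    """Pop until the current element is body, section, insert, or delete."""
    # The emptiness guard is an addition for safety; the rules above
    # assume that a body element is always on the stack.
    while stack and stack[-1] not in SECTION_LEVEL:
        stack.pop()
    return stack


def open_implicit_paragraph(stack):
    """'Anything else': open a p element unless inside li, dd, comment-p, or ed."""
    if stack[-1] not in NO_IMPLICIT_P:
        stack.append("p")
    # The caller then reprocesses the same token in this insertion mode.
    return "in paragraph"
```

For example, calling handle_empty_line on a stack of body, section, ul, li leaves only body and section open.
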
3.2.5.2 The "in table row" insertion mode

In the "in table row" insertion mode, a token MUST be processed as follows:

A table cell start token
  1. Create a td element in the HTML namespace.
  2. Append the element created to the current element.
  3. Push the element created to the stack of open elements.
  4. Switch to the "in paragraph" insertion mode.
A table colspan cell token
  1. If the current element has a last child (the node returned by its lastChild IDL attribute) and the local name of that node is td, increase the value of the colSpan IDL attribute of the node by one (1) and abort these substeps.
  2. Create a td element in the HTML namespace.
  3. Append the element created to the current element.
A table row end token
If the local name of the current element is tr, pop the element off the stack of open elements.
A table row start token
  1. Create a tr element in the HTML namespace.
  2. Append the element created to the current element.
  3. Push the element created to the stack of open elements.
Anything else
Switch to the "in section" insertion mode and reprocess the token.
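
The table colspan cell rule above can be sketched with Python's xml.dom.minidom. Since minidom has no HTML IDL attributes, this sketch assumes the colspan content attribute stands in for the IDL attribute, with a missing attribute treated as 1. The function name is hypothetical.

```python
from xml.dom import minidom

HTML_NS = "http://www.w3.org/1999/xhtml"


def handle_colspan_cell(doc, row):
    """Widen the last td of the row, or append an empty td if there is none."""
    last = row.lastChild
    if (last is not None
            and last.nodeType == last.ELEMENT_NODE
            and last.localName == "td"):
        # Assumption: an absent colspan content attribute means 1.
        current = int(last.getAttribute("colspan") or "1")
        last.setAttribute("colspan", str(current + 1))
        return last
    # No preceding cell: create a td and append it. As in the rule above,
    # the new element is not pushed to the stack of open elements.
    cell = doc.createElementNS(HTML_NS, "td")
    row.appendChild(cell)
    return cell
```

Note that a colspan cell token at the start of a row produces an empty td rather than widening anything.
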
3.2.5.3 The "in paragraph" insertion mode

In the "in paragraph" insertion mode, a token MUST be processed as follows:

A character token
Append the character in data of the token to the current element.
An inline start tag token whose tag name is null
  1. Create an anchor element in the SuikaWiki/0.9 namespace.
  2. Append the element created to the current element.
  3. Push the element created to the stack of open elements.
Any other inline start tag token
  1. Create an element. The namespace and local name of the element are determined according to the tag name of the inline start tag token as shown in the following table:
    Tag name       Namespace                     Local name
    AA             The AA namespace              aa
    ABBR           The HTML namespace            abbr
    CITE           The HTML namespace            cite
    CODE           The HTML namespace            code
    CSECTION       The SuikaWiki/0.10 namespace  csection
    DEL            The HTML namespace            del
    DFN            The HTML namespace            dfn
    INS            The HTML namespace            ins
    KBD            The HTML namespace            kbd
    KEY            The SuikaWiki/0.10 namespace  key
    Q              The HTML namespace            q
    QN             The SuikaWiki/0.10 namespace  qn
    RUBY           The HTML namespace            ruby
    RUBYB          The SuikaWiki/0.9 namespace   rubyb
    SAMP           The HTML namespace            samp
    SPAN           The HTML namespace            span
    SRC            The SuikaWiki/0.10 namespace  src
    SUB            The HTML namespace            sub
    SUP            The HTML namespace            sup
    TIME           The HTML namespace            time
    VAR            The HTML namespace            var
    WEAK           The SuikaWiki/0.9 namespace   weak
    Anything else  The SuikaWiki/0.10 namespace  Same as tag name
  2. If the token's classes is not null, set the class content attribute of the element created to classes.
  3. If the token's language is not null, set the lang content attribute in the XML namespace of the element created to language.
  4. Append the element created to the current element.
  5. Push the element created to the stack of open elements.
An inline middle tag token
  1. Let local name be title.
  2. Let namespace be the SuikaWiki/0.10 namespace.
  3. If the local name of the current element is rt, set local name to rt, set namespace to the HTML namespace, and pop the current element off the stack of open elements.
  4. Otherwise, if the local name of the current element is title or nsuri, set local name to attrvalue and pop the current element off the stack of open elements.
  5. Otherwise, if the local name of the current element is qn, set local name to nsuri.
  6. Otherwise, if the local name of the current element is ruby or rubyb, set local name to rt and set namespace to the HTML namespace.
  7. Create an element whose local name is local name, in the namespace namespace.
  8. If the token's language is not null, set the lang content attribute in the XML namespace of the element created to language.
  9. Append the element created to the current element.
  10. Push the element created to the stack of open elements.
An inline end tag token
  1. If the local name of the current element is one of rt, title, nsuri, or attrvalue, pop the element off the stack of open elements.
  2. If the current element is one of the structural elements, or if the local name of the current element is strong or em, run the following substeps:
    1. If both the resScheme attribute and the anchor attribute of the token are null, append the characters ]] to the current element, push the current element to the stack of open elements, and abort these substeps.

      As a result, the bottommost and second bottommost entries become equal, but one of them is popped from the stack of open elements shortly afterwards.

    2. If the resScheme attribute of the token is not null, create an anchor-external element in the SuikaWiki/0.9 namespace.
    3. Otherwise, create an anchor-internal element in the SuikaWiki/0.9 namespace.
    4. Append the element created to the current element.
    5. Set the textContent IDL attribute of the element created to ]].
    6. Push the element created to the stack of open elements.
  3. If anchor attribute of the token is not null, set the anchor content attribute in the SuikaWiki/0.9 namespace of the current element to anchor attribute of the token.
  4. If resScheme attribute of the token is not null, set the resScheme content attribute in the SuikaWiki/0.9 namespace of the current element to resScheme attribute of the token.
  5. If resParameter attribute of the token is not null, set the resParameter content attribute in the SuikaWiki/0.9 namespace of the current element to resParameter attribute of the token.
  6. Pop the current element off the stack of open elements.
A strong token
  1. If the local name of the current element is strong, pop the element off the stack of open elements and abort these substeps.
  2. Create a strong element in the HTML namespace.
  3. Append the element created to the current element.
  4. Push the element created to the stack of open elements.
An emphasis token
  1. If the local name of the current element is em, pop the element off the stack of open elements and abort these substeps.
  2. Create an em element in the HTML namespace.
  3. Append the element created to the current element.
  4. Push the element created to the stack of open elements.
A form token whose name is form
  1. Create a form element in the SuikaWiki/0.9 namespace.
  2. If id of the form token is not null, set the id content attribute of the element created to id of the form token.
  3. Set the input content attribute of the element created to the first item in parameters of the form token, if any, or the empty string otherwise.
  4. Set the template content attribute of the element created to the second item in parameters of the form token, if any, or the empty string otherwise.
  5. Set the option content attribute of the element created to the third item in parameters of the form token, if any, or the empty string otherwise.
  6. If the parameters contains four or more items, set the parameter content attribute of the element created to the concatenation of items in parameters, separated by a : character, in the same order.
  7. Append the element created to the current element.
Any other form token
  1. Create a form element in the SuikaWiki/0.9 namespace.
  2. Set the ref content attribute of the element created to name of the form token.
  3. If id of the form token is not null, set the id content attribute of the element created to id of the form token.
  4. If parameters of the form token is not empty, set the parameter content attribute of the element created to the concatenation of items in parameters, separated by a : character, in the same order. The resulting value might be the empty string.
  5. Append the element created to the current element.
An element token
  1. Create an element whose local name is local name of the element token and namespace is namespace of the element token.
  2. If anchor attribute of the element token is not null, set the anchor content attribute in the SuikaWiki/0.9 namespace of the element created to anchor attribute of the element token.
  3. If by attribute of the element token is not null, set the by content attribute of the element created to by attribute of the element token.
  4. If resScheme attribute of the element token is not null, set the resScheme content attribute in the SuikaWiki/0.9 namespace of the element created to resScheme attribute of the element token.
  5. If resParameter attribute of the element token is not null, set the resParameter content attribute in the SuikaWiki/0.9 namespace of the element created to resParameter attribute of the element token.
  6. If content of the element token is not null, set the textContent IDL attribute of the element created to content of the element token.
  7. Append the element created to the current element.
A labeled list middle token
  1. If the current element is not one of the structural elements, pop the element off the stack of open elements and follow this substep again.
  2. If the local name of the current element is dt, pop the element off the stack of open elements.
  3. Create a dd element in the HTML namespace.
  4. Append the element created to the current element.
  5. Push the element created to the stack of open elements.
A heading end token
  1. If the current element is not one of the structural elements, pop the element off the stack of open elements and follow this substep again.
  2. If the local name of the current element is h1, pop the element off the stack of open elements.
  3. Switch to the "in section" insertion mode.
A table cell end token
  1. If the current element is not one of the structural elements, pop the element off the stack of open elements and follow this substep again.
  2. If the local name of the current element is td, pop the element off the stack of open elements.
  3. Switch to the "in table row" insertion mode.
A block end tag token whose tag name is PRE
A preformatted end token
  1. If the current element is not one of the structural elements, pop the element off the stack of open elements and follow this substep again.
  2. If the local name of the current element is pre, pop the element off the stack of open elements.
  3. Switch to the "in section" insertion mode.
Anything else
  1. If the current element is not one of the structural elements, pop the element off the stack of open elements and follow this substep again.
  2. Switch to the "in section" insertion mode and reprocess the token.
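
The rules for the inline middle tag token in this insertion mode choose the element to create from the current element alone. The following sketch illustrates that selection under simplifying assumptions: the stack of open elements holds only local names, namespaces are represented by the symbolic labels "HTML" and "SW10" rather than by namespace URIs, and the language attribute and tree insertion are omitted. The function name is hypothetical.

```python
def resolve_inline_middle(stack):
    """Return (namespace label, local name) for an inline middle tag token.

    `stack` is a list of local names with the current element last; it is
    updated in place as the rules pop and push elements.
    """
    local_name, namespace = "title", "SW10"  # defaults from steps 1 and 2
    current = stack[-1]
    if current == "rt":
        # A second annotation: close the open rt and start another one.
        local_name, namespace = "rt", "HTML"
        stack.pop()
    elif current in ("title", "nsuri"):
        local_name = "attrvalue"
        stack.pop()
    elif current == "qn":
        local_name = "nsuri"
    elif current in ("ruby", "rubyb"):
        local_name, namespace = "rt", "HTML"
    stack.append(local_name)  # the push in the last step
    return namespace, local_name
```

With a stack ending in qn, the first middle tag opens an nsuri element and a second middle tag replaces it on the stack with an attrvalue element.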

3.3 Serializing SWML text serialization documents

...

3.4 The text/x-suikawiki and text/x.suikawiki.image Internet Media Types

The SWML text serialization can be identified by Internet Media Type text/x-suikawiki.

An entity labeled as text/x-suikawiki MUST be an SWML text serialization and MUST be processed as an SWML text serialization.

Additionally, for historical reasons, an entity labeled as text/x.suikawiki.image MUST be processed as an SWML text serialization. This Internet Media Type MUST NOT be used for a new entity.

It was originally intended that a document with format name equal to SuikaWiki is labeled as text/x-suikawiki while a document with format name equal to SuikaWikiImage is labeled as text/x.suikawiki.image.

The charset parameter of these Internet Media Types represents the character encoding used for the entity. It has the same requirements as the charset parameter for the text/html Internet Media Type @@ todo: ref.

The version parameter MAY have the value 0.9 or 0.10 but SHOULD NOT be used. The parameter MUST be ignored.

This parameter was originally used to encode the format version in place of the magic line.
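
As an illustration of the requirements above, a consumer might classify a Content-Type value as follows. This is a minimal sketch with a deliberately naive parameter parser (quoted values containing a ; character are not handled); the function name and the keys of the returned dictionary are hypothetical.

```python
SWML_TYPES = {"text/x-suikawiki", "text/x.suikawiki.image"}


def classify_media_type(content_type):
    """Classify a Content-Type header value for SWML processing."""
    parts = [part.strip() for part in content_type.split(";")]
    essence = parts[0].lower()
    params = {}
    for part in parts[1:]:
        name, sep, value = part.partition("=")
        if sep:
            params[name.strip().lower()] = value.strip().strip('"')
    # The version parameter MUST be ignored, so it is simply dropped.
    params.pop("version", None)
    return {
        # Both types are processed as the SWML text serialization.
        "is_swml": essence in SWML_TYPES,
        # The legacy type is accepted but not used for new entities.
        "legacy": essence == "text/x.suikawiki.image",
        "charset": params.get("charset"),
    }
```

Under this sketch, text/x-suikawiki; charset=utf-8; version=0.9 is treated exactly like the same value without the version parameter.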

... IMT template; fragment identifier

4 The SWML XML serialization

...

4.1 ... xml media type

5 Semantics of Elements and Attributes

This specification is the specification for the SuikaWiki/0.9 namespace and the SuikaWiki/0.10 namespace. Anything belonging to those namespaces is defined in this specification.

Elements and attributes in the SuikaWiki/0.9 namespace and in the SuikaWiki/0.10 namespace, as well as attributes in no namespace for elements in the SuikaWiki/0.9 namespace and in the SuikaWiki/0.10 namespace, MUST NOT be used in contexts where they are not explicitly allowed.

A namespaced attribute allowed in another specification can be used on elements in the SuikaWiki/0.9 namespace and in the SuikaWiki/0.10 namespace. For example, a lang attribute in the XML namespace is allowed to be specified for an XML element, as defined in the XML specification [XML]. Note that the allowed attributes entry in the following subsections lists only the attributes defined in this specification.

Elements in the SuikaWiki/0.9 namespace and in the SuikaWiki/0.10 namespace defined in this specification MUST conform to their content model.

Inter-element whitespace, comment nodes, and processing instruction nodes MUST be ignored when establishing whether an element matches its content model or not.
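
A content model checker could apply this rule by filtering the child nodes before matching. The sketch below uses Python's xml.dom.minidom and assumes that inter-element whitespace means a text node that is empty or consists only of HTML5 space characters (space, tab, LF, FF, CR). The function name is hypothetical.

```python
from xml.dom import Node, minidom

# Assumption: the HTML5 set of space characters.
SPACE_CHARACTERS = " \t\n\f\r"


def significant_children(element):
    """Return the child nodes that count when matching a content model."""
    ignored_types = (Node.COMMENT_NODE, Node.PROCESSING_INSTRUCTION_NODE)
    result = []
    for child in element.childNodes:
        if child.nodeType in ignored_types:
            continue
        if (child.nodeType == Node.TEXT_NODE
                and child.data.strip(SPACE_CHARACTERS) == ""):
            continue  # inter-element whitespace
        result.append(child)
    return result
```

Only the nodes returned by such a filter would then be compared against the content model of the element.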

Elements in the SuikaWiki/0.9 namespace and in the SuikaWiki/0.10 namespace MAY be orphan nodes (i.e. without a parent node).

In the following subsections, attributes listed in the allowed attributes entry MAY be specified to an element described in that subsection.

Some elements belong to categories. Categories flow content and phrasing content are defined in the HTML5 specification ....

Terms and algorithms valid integer, rules for parsing integers are defined in the HTML5 specification ....

An attribute is said to be specified to an element if the hasAttributeNS method invoked on the element with appropriate arguments would return true.

That is, the term specified is unrelated to the specified IDL attribute.
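
The definition can be illustrated with Python's xml.dom.minidom, which provides hasAttributeNS. The namespace URI below is a placeholder invented for this example, not the real SuikaWiki/0.9 namespace URI, and the helper name is hypothetical.

```python
from xml.dom import minidom

# Placeholder only: NOT the actual SuikaWiki/0.9 namespace URI.
SW09_NS = "urn:example:suikawiki-0.9"


def is_specified(element, namespace, local_name):
    """An attribute is specified if hasAttributeNS would return true."""
    return element.hasAttributeNS(namespace, local_name)


doc = minidom.getDOMImplementation().createDocument(SW09_NS, "sw:anchor-end", None)
element = doc.documentElement
before = is_specified(element, SW09_NS, "anchor")

# Even an empty value makes the attribute "specified".
element.setAttributeNS(SW09_NS, "sw:anchor", "")
after = is_specified(element, SW09_NS, "anchor")
```

Here before is false and after is true: whether an attribute is specified depends only on its presence, not on its value.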

5.1 Document structures

5.1.1 The document element in the SuikaWiki/0.9 namespace

Category
None.
Content model
A head element in the XHTML2 namespace, followed by a body element in the XHTML2 namespace, optionally followed by an image element in the SuikaWiki/0.9 namespace.
Allowed attributes
None.

This element MUST NOT be used.

...

5.1.2 The Name attribute in the SuikaWiki/0.9 namespace

This attribute MUST NOT be used.

...

5.1.3 The Version attribute in the SuikaWiki/0.9 namespace

This attribute MUST NOT be used.

...

5.1.4 The parameter element in the SuikaWiki/0.9 namespace

Category
None.
Content model
Zero or more value elements in the SuikaWiki/0.9 namespace.
Allowed attributes
name

This element MUST NOT be used.

... name

5.1.5 The value element in the SuikaWiki/0.9 namespace

Category
None.
Content model
Text.
Allowed attributes
None.

This element MUST NOT be used.

...

5.1.6 The class attribute

All elements in the HTML namespace have a class attribute ....

The class attribute of an element in the SuikaWiki/0.9 namespace and SuikaWiki/0.10 namespace has the same semantics and requirements as the HTML class attribute.

The class attribute of an element in the AA namespace SHOULD be considered as having the same semantics and requirements as the HTML class attribute.

5.1.7 The id attribute

The id attribute of an element in the SuikaWiki/0.9 namespace and SuikaWiki/0.10 namespace has the same semantics and requirements as the id attribute of HTML5 ....

5.2 Lexical structures

5.2.1 The replace element in the SuikaWiki/0.9 namespace

Category
None.
Content model
Empty.
Allowed attributes
by

This element MUST NOT be used.

... by

5.2.2 The text element in the SuikaWiki/0.9 namespace

Category
None.
Content model
Text.
Allowed attributes
None.

This element MUST NOT be used.

...

5.3 Blocks

5.3.3 The dr element in the SuikaWiki/0.9 namespace

Category
None.
Content model
A dt element in the XHTML2 namespace, followed by a dd element in the XHTML2 namespace.
Allowed attributes
None.

This element MUST NOT be used.

...

5.4.4 The anchor attribute in the SuikaWiki/0.9 namespace

The anchor attribute in the SuikaWiki/0.9 namespace, when specified to an anchor-end element, defines an anchor number for the parent element of the anchor-end element, if any.

The attribute MUST be specified and its value MUST be a valid integer. The integer MUST differ from the value of any other anchor attribute in the SuikaWiki/0.9 namespace specified to an anchor-end element in the SuikaWiki/0.9 namespace that belongs to the same tree as the first attribute.
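
A conformance checker might verify these two requirements as sketched below. The sketch assumes the HTML5 definition of a valid integer (one or more ASCII digits, optionally preceded by a - character) and compares anchor numbers by integer value, so that 2 and 02 conflict. The function name is hypothetical, and its input is simply the attribute values collected from the anchor-end elements of one tree.

```python
import re

# Assumption: the HTML5 "valid integer" syntax.
VALID_INTEGER = re.compile(r"-?[0-9]+")


def check_anchor_numbers(values):
    """Return (invalid values, duplicate values) for a list of anchor values."""
    invalid, duplicates, seen = [], [], set()
    for value in values:
        if VALID_INTEGER.fullmatch(value) is None:
            invalid.append(value)
            continue
        number = int(value)
        if number in seen:
            duplicates.append(value)
        seen.add(number)
    return invalid, duplicates
```

A tree conforms to the requirements above only if both returned lists are empty.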


The anchor attribute in the SuikaWiki/0.9 namespace MAY be specified to q elements in the HTML namespace and in the XHTML2 namespace, as well as ins and del elements in the HTML namespace. The attribute can also be specified to anchor and anchor-internal elements in the SuikaWiki/0.9 namespace.

In these cases, the attribute represents the anchor number of the element referenced. If the element on which the attribute is specified is an anchor element, the element referenced might be found in the document referenced by the element. Otherwise, the element is in the tree the element belongs to.

If the element on which the attribute is specified is not an anchor or anchor-internal element, the attribute has similar semantics to that of the cite attribute on the element. In such cases, the anchor attribute in the SuikaWiki/0.9 namespace MUST NOT be specified when the cite attribute is specified. A user agent MUST ignore the anchor attribute in the SuikaWiki/0.9 namespace if the cite attribute is specified.

The attribute value MUST be a valid integer. Unless the element is anchor, the integer MUST be equal to the integer represented by an anchor attribute in the SuikaWiki/0.9 namespace specified to an anchor-internal element in the SuikaWiki/0.9 namespace that belongs to the same tree.

5.4.6 The resScheme attribute in the SuikaWiki/0.9 namespace

The resScheme attribute in the SuikaWiki/0.9 namespace MAY be specified to q elements in the HTML namespace and in the XHTML2 namespace, as well as ins and del elements in the HTML namespace. The attribute can also be specified to an anchor-external element in the SuikaWiki/0.9 namespace.

...

5.4.7 The resParameter attribute in the SuikaWiki/0.9 namespace

The resParameter attribute in the SuikaWiki/0.9 namespace MAY be specified to q elements in the HTML namespace and in the XHTML2 namespace, as well as ins and del elements in the HTML namespace. The attribute can also be specified to an anchor-external element in the SuikaWiki/0.9 namespace.

...

5.5 Embedded objects

5.5.1 The form element in the SuikaWiki/0.9 namespace

Category
Phrasing content.
Content model
Empty.
Allowed attributes
id
input
option
parameter
ref
template

... ref, parameter.


... input, template, option

5.5.2 The image element in the SuikaWiki/0.9 namespace

Category
None.
Content model
Text.
Allowed attributes
None.

This element MUST NOT be used.

...

5.5.3 The aa element in the AA namespace

The aa element in the AA namespace ... falls into the phrasing content category for the purpose of the content models in this specification.

The content model of this element SHOULD be considered as phrasing content.

5.6 Citations

5.7 Qualified names

5.7.2 The qname element in the SuikaWiki/0.10 namespace

Category
None.
Content model
Phrasing content.
Allowed attributes
None.

This element MUST NOT be used.

...

5.7.3 The nsuri element in the SuikaWiki/0.10 namespace

Category
None.
Content model
Phrasing content.
Allowed attributes
None.

...

5.8 Inline annotations

5.8.1 The rubyb element in the SuikaWiki/0.9 namespace

Category
Phrasing content.
Content model
Phrasing content, followed by a rt element in the HTML namespace.
Allowed attributes
class

...

5.8.3 The title element in the SuikaWiki/0.10 namespace

Category
None.
Content model
Phrasing content.
Allowed attributes
None.

This element MAY be inserted as the last child of abbr, dfn, span, or time element in the HTML namespace when the title attribute of that element is not specified.

If the parent element of the element has a title attribute specified, or the element is not the last child, the element MUST be ignored.

Inter-element whitespace, comments, and processing instructions can be inserted after this element.

...

5.9 Key names

5.10 Fallback elements

5.10.1 The attrvalue element in the SuikaWiki/0.10 namespace

Category
None.
Content model
Phrasing content.
Allowed attributes
None.

This element MUST NOT be used.

...

5.10.2 Uppercase elements in the SuikaWiki/0.10 namespace

Category
None.
Content model
Phrasing content.
Allowed attributes
class

Uppercase elements are elements in the SuikaWiki/0.10 namespace whose local name consists of one or more uppercase letters.

These elements MUST NOT be used.

These elements might be inserted into the DOM by a parser when an inline start tag with unknown tag name is found.
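
This fallback corresponds to the "Anything else" row of the tag name table in the "in paragraph" insertion mode. The following sketch shows the lookup and a test for uppercase elements. Namespaces are represented by the symbolic labels "AA", "HTML", "SW09", and "SW10" rather than by namespace URIs, "uppercase letters" is assumed to mean ASCII A to Z, and the function names are hypothetical.

```python
import re

# Tag name -> (namespace label, local name), from the table in the
# "in paragraph" insertion mode.
KNOWN_INLINE_TAGS = {
    "AA": ("AA", "aa"), "ABBR": ("HTML", "abbr"), "CITE": ("HTML", "cite"),
    "CODE": ("HTML", "code"), "CSECTION": ("SW10", "csection"),
    "DEL": ("HTML", "del"), "DFN": ("HTML", "dfn"), "INS": ("HTML", "ins"),
    "KBD": ("HTML", "kbd"), "KEY": ("SW10", "key"), "Q": ("HTML", "q"),
    "QN": ("SW10", "qn"), "RUBY": ("HTML", "ruby"),
    "RUBYB": ("SW09", "rubyb"), "SAMP": ("HTML", "samp"),
    "SPAN": ("HTML", "span"), "SRC": ("SW10", "src"), "SUB": ("HTML", "sub"),
    "SUP": ("HTML", "sup"), "TIME": ("HTML", "time"), "VAR": ("HTML", "var"),
    "WEAK": ("SW09", "weak"),
}


def element_name_for_tag(tag_name):
    """Map an inline start tag name to (namespace label, local name)."""
    # Unknown tag names fall back to SuikaWiki/0.10 with the name unchanged.
    return KNOWN_INLINE_TAGS.get(tag_name, ("SW10", tag_name))


def is_uppercase_element(namespace, local_name):
    """True for SuikaWiki/0.10 elements named with uppercase letters only."""
    # Assumption: "uppercase letters" means ASCII A-Z.
    return namespace == "SW10" and re.fullmatch(r"[A-Z]+", local_name) is not None
```

For a hypothetical unknown tag name such as BLINK, the lookup yields a BLINK element in the SuikaWiki/0.10 namespace, which is an uppercase element and therefore non-conforming.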

References

Normative references

AAVOCAB
...
DOM
Web DOM Core? DOM3 Core?
MANAKAI
manakai's DOM extensions.
RFC2119
Key words for use in RFCs to Indicate Requirement Levels, Scott Bradner, IETF BCP 14, March 1997.
WA1
Web Applications 1.0, Ian Hickson, WHATWG Draft Standard.
XHTML2
...
XML
...

Non‐normative references

SW09
SuikaWiki/0.9 Document Markup Format: Syntax Specification, Wakaba, , updated .
SW10
SuikaWiki/0.10, .
SuikaWiki/0.10 — Inline Element Type Additions, .
SuikaWiki/0.10 — Language Tags, .
SuikaWiki/0.10 — Block‐level Additional Vocabulary, .