GoFiler Legato Script Reference
Legato v 1.5b Application v 5.24b
Chapter Eleven — SGML Functions
11.1 Introduction to SGML Support
Legato provides a couple of levels of Standard Generalized Markup Language (SGML) support. The SGML umbrella covers HTML, XML, XBRL and a number of other SGML-style formats.
SGML support is broken into an SGML Object for reading and writing SGML data, an HTML Writer Object for easily generating HTML code, an HTML Table Object and an HTML Outline Object.
Within the world of SGML, HTML, XML and Legato, there are a number of common terms to be familiar with:
tag | A collection of an element and zero or more attribute-value pairs surrounded by ‘<’ and ‘>’ angle brackets. For example:
<P>
Specifies the start of a paragraph block in HTML.
element | The element is the text that defines the tag. In the above example the element is ‘P’. In HTML, elements are not case-sensitive; in XML they are. Elements can be referenced by string or token.
attribute | Attributes specify zero or more data parameters associated with an element. For example:
<P ALIGN="CENTER">
Specifies an alignment of center. SGML attributes are always in the form of name=data except for certain boolean attributes such as NOWRAP which can have no value or be explicitly NOWRAP=NOWRAP. Attributes are separated from the leading element and other attributes by white space.
Values are normally quoted with single or double straight quotes. If the data contains spaces it must be quoted. By default, the SGML functions will always write data with quotes.
When an attribute is referenced, it is known as a parameter. Within this documentation attributes are always upper case.
property | Because HTML has CSS interlaced, the SGML functions process CSS properties in the same manner as attributes. An alternative to the above example:
<P STYLE="text-align: center">
In this case, the STYLE attribute contains one or more CSS property: value pairs. Property-value pairs are separated by semicolons. Data can be further quoted as required within the CSS value. Property names are not case-sensitive.
When a CSS property is referenced, it is also known as a parameter. Within this documentation properties are always lower case.
Tokens are generally used to reference parameters, with attributes and properties being of different token classes to avoid confusion. If referencing by string name, conflicts can arise. For example, HTML has an attribute known as ‘COLOR’ and CSS has a property known as ‘color’.
PCDATA | Parsed Character Data (PCDATA) is generally the information between the tags. Certain characters must be represented as character entities. For example, the chevron characters < and > must be represented as the &lt; and &gt; character entities when they are used as text within the document, so that they are differentiated from tags. Similarly, the & character must be represented as &amp; to avoid being confused with a badly formatted character entity.
field | A field is a proprietary special HTML comment structured as high-level control data for publishing and document control. Fields are structured like a combined HTML/CSS element. Like elements, fields have structure and can be nested.
Review the W3C and other sources for more information and terms.
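The terms above can be illustrated with a short sketch. It is written in Python using only the standard library, not Legato, and the class name TermDemo is hypothetical. It separates a start tag into its element, attributes and CSS properties, and shows PCDATA being read from and written as character entities. The STYLE split is deliberately naive and assumes no semicolons inside quoted CSS values.

```python
from html import escape
from html.parser import HTMLParser


class TermDemo(HTMLParser):
    """Collects the element, attributes and CSS properties of each start tag."""

    def __init__(self):
        super().__init__()
        self.tags = []   # one (element, attributes, properties) tuple per start tag
        self.text = []   # PCDATA with character entities already decoded

    def handle_starttag(self, tag, attrs):
        # Attributes are shown upper case, matching this documentation's convention.
        attributes = {name.upper(): value for name, value in attrs}
        properties = {}
        # Naive split of STYLE into "property: value" pairs (lower-case names).
        for pair in (attributes.get("STYLE") or "").split(";"):
            if ":" in pair:
                name, value = pair.split(":", 1)
                properties[name.strip().lower()] = value.strip()
        self.tags.append((tag.upper(), attributes, properties))

    def handle_data(self, data):
        self.text.append(data)


demo = TermDemo()
demo.feed('<P ALIGN="CENTER" STYLE="text-align: center; color: red">a &lt; b</P>')
demo.close()

element, attributes, properties = demo.tags[0]
print(element)              # the element
print(attributes["ALIGN"])  # an attribute value
print(properties)           # CSS properties taken from STYLE
print("".join(demo.text))   # PCDATA after entity decoding

# Writing PCDATA: '<', '>' and '&' must go out as character entities.
print(escape("a < b & c"))
```

Note that the attribute ‘COLOR’ and the property ‘color’ would land in different dictionaries here, which mirrors the separate token classes described above.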
Another aspect of XML and XHTML is the use of namespaces. Namespaces effectively classify element and attribute names, allowing for segmentation and grouping. While the underlying SGML class has namespace support, it is not exposed in version 1 of Legato. For text and HTML processing, namespace processing is not required.
A central focus of HTML and SGML functionality and support is the SGML Object. It is an underlying class used throughout the supporting application to process HTML and XML data.
The SGML Object and related objects and functions use two predefined SDK data types to aid documentation and programming clarity: PVALUE and TOKEN, for parameter value and token value respectively. Each is defined as a dword (a 32-bit unsigned value). When specified as a data type they appear in italics; when used in general discussion they appear in plain text.
The SGML Object and element class are programmed to support HTML and CSS data types. Therefore, if an attribute is equal to ‘12%’ or a property is set to ‘0.125in’, the string value is translated automatically into a more useful pvalue format. Values can be stored as simple strings, but this significantly reduces the utility of the data. The design is meant to represent data in standard HTML and CSS units with a practical level of precision. For most values, this is 100ths of a unit. For one unit type, inches, the resolution is 10000ths. For document publishing, 0.01mm is a very fine resolution, while 10000ths are needed for inches because 1/8 of an inch (0.125in) needs three decimal places and cannot be represented in 100ths.
The PVALUE type is a useful structure for quickly and easily representing complex HTML and SGML data types. A pvalue is a 32-bit value structured in a manner for easy storage and transport. The keyword PVALUE is defined as a dword data type.
The data type considers CSS measurements, HTML and CSS keywords, strings (CDATA) and other information. Internally, pvalues are a bitwise arrangement of a Parameter Type (PT_) and parameter data. The top 5 bits (PT_MASK or 0xF8000000) specify the type of data being represented, such as “inches” or “degrees”. The lower 27 bits (PT_VALUE_MASK or 0x07FFFFFF) hold the data, whose structure depends on the type.
Six classes of data are represented: strings, arrays, errors, measurements, colors and keywords. The top 5 bits of the dword (0xF8000000 or PT_MASK) determine the type of data contained in the pvalue. The remaining 27 bits hold a value that can also contain flags and other information. For simple measurements, the lower 27-bit portion contains a signed integer in 100ths or 10000ths.
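The type and data split can be sketched in a few lines. This is Python for illustration only, not Legato; the constants are the SDK values listed in this section, and the helper name split_pvalue is hypothetical.

```python
PT_MASK = 0xF8000000        # top 5 bits: parameter type
PT_VALUE_MASK = 0x07FFFFFF  # lower 27 bits: parameter data
PT_PERCENT = 0x18000000     # percentage type from the SDK table


def split_pvalue(pvalue):
    """Return the (type bits, data bits) of a 32-bit pvalue."""
    return pvalue & PT_MASK, pvalue & PT_VALUE_MASK


# "12%" held in 100ths: the data portion is 1200.
pvalue = PT_PERCENT | 1200
ptype, data = split_pvalue(pvalue)
print(hex(ptype), data)
```

The two masks are complementary, so together they cover all 32 bits with no overlap.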
When used in the SGML Object, strings, arrays and error data are stored on the element class heap. In such a case, the lower portion is a heap offset. The script does not have direct access to the heap but rather uses the high-level SGMLGetParameter and SGMLSetParameter style functions to access the data. The text of errors and strings is managed in the same manner, with a heap offset to a zero-terminated text string. However, errors also contain Simple Error Type codes as a way of representing an error with a simple classification.
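As a rough sketch of how an error pvalue breaks down, the following Python (not Legato) uses the SDK constants listed in this section. The function name describe_error and the sample heap offset 0x0040 are hypothetical. Scripts would normally go through the SGMLGetParameter style functions rather than decode the bits themselves.

```python
PT_MASK = 0xF8000000
PT_ERROR = 0xD8000000            # error data on heap
PT_ERROR_MASK = 0x07FF0000       # simple error type code
PT_HEAP_MASK = 0x0000FFFF        # heap offset of the detail string
PT_ERROR_NO_DETAIL = 0x0000FFFF  # offset value meaning "no detail string"

# A few of the simple error type codes from the SDK table.
ERROR_NAMES = {
    0x00010000: "PT_ERROR_SYNTAX",
    0x00020000: "PT_ERROR_QUOTE",
    0x00030000: "PT_ERROR_UNITS",
    0x00040000: "PT_ERROR_RANGE",
}


def describe_error(pvalue):
    """Return (error name, detail offset or None), or None if not an error."""
    if (pvalue & PT_MASK) != PT_ERROR:
        return None
    code = pvalue & PT_ERROR_MASK
    offset = pvalue & PT_HEAP_MASK
    detail = None if offset == PT_ERROR_NO_DETAIL else offset
    return ERROR_NAMES.get(code, hex(code)), detail


print(describe_error(PT_ERROR | 0x00030000 | 0x0040))              # units error with detail
print(describe_error(PT_ERROR | 0x00010000 | PT_ERROR_NO_DETAIL))  # syntax error, no detail
print(describe_error(0x18000000 | 1200))                           # a percentage, not an error
```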
For most measurement types, the lower data is an integer with the decimal place shifted 2 positions to allow for a resolution of 100ths. For inches, the resolution is set to 10000ths with the decimal shifted 4 places. (There is also a version of inches in 100ths, but a measurement such as 1/8 of an inch cannot be represented in 100ths.)
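The decimal shift can be shown with a small encoding sketch in Python (not Legato). The type constants are SDK values from this section; the function name encode_measurement is hypothetical. Two points are assumptions rather than documented behavior: negative values are stored in two's complement within the lower 27 bits (which the sign-extend constant suggests), and values that do not fit the resolution are rejected here rather than rounded.

```python
from decimal import Decimal

PT_VALUE_MASK = 0x07FFFFFF
PT_MM = 0x10000000      # millimeters, 100ths
PT_IN_100 = 0x30000000  # inches, 100ths
PT_IN = 0x68000000      # inches, 10000ths


def encode_measurement(number_text, ptype):
    """Pack a decimal string such as '0.125' into a pvalue of the given type."""
    scale = 10000 if ptype == PT_IN else 100
    scaled = Decimal(number_text) * scale
    if scaled != scaled.to_integral_value():
        # Assumption: reject rather than round when precision would be lost.
        raise ValueError(f"{number_text} is not representable at 1/{scale}")
    value = int(scaled)
    if not -(1 << 26) <= value < (1 << 26):
        raise ValueError("value does not fit in 27 signed bits")
    # Assumption: two's complement within the lower 27 bits.
    return ptype | (value & PT_VALUE_MASK)


print(hex(encode_measurement("0.125", PT_IN)))  # 1/8 inch fits at 10000ths
print(hex(encode_measurement("12.22", PT_MM)))
print(hex(encode_measurement("-2.5", PT_MM)))   # negative value sets the sign bit
try:
    encode_measurement("0.125", PT_IN_100)
except ValueError as error:
    print(error)                                # 1/8 inch does not fit at 100ths
```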
The SDK values are defined as follows:
Defined Data | Value | Description
Parameter Control
Masks
PT_MASK | 0xF8000000 | Parameter Type Mask: bitwise AND with this value reveals the underlying parameter type.
PT_VALUE_MASK | 0x07FFFFFF | Value Mask: bitwise AND with this value reveals the associated data.
PT_HEAP_MASK | 0x0000FFFF | Mask for Data on Heap: for values that are strings or arrays, the bitwise AND reveals the heap offset.
PT_KEYWORD_MASK | 0x0000FFFF | Ordinal Value Mask for Keyword
Signed Numbers
PT_SIGN_BIT | 0x04000000 | Sign, Sign Extend Bit, Data Type: if set, the value is a negative number.
PT_SIGN_EXTEND | 0xF8000000 | OR to Extend Data Sign
Non-Value Conditions
PT_IMPLIED | 0xFFFFFFFF | Value is Implied (default)
PT_MIXED | 0xFFFFFFFE | Mixed Condition (multiple items)
PT_UNTRANSLATED | 0xFFFFFFFD | Value Expected to be Translated: the value requires action or failed on translation. This value can be returned by failed math operations.
PT_STRING | 0xF8000000 | Offset to String on Heap
PT_STRING_SIZE | 0x07FF0000 | Size of Item on Heap (must be shifted)
PT_ARRAY | 0xE8000000 | Offset to Array Data on Heap
PT_ARRAY_COMMA | 0x02000000 | If Set, Array Entries Comma Delimited
PT_ARRAY_COUNT | 0x01FF0000 | Mask to Count of PT_ on Heap
Errors (on heap)
Error Control
PT_ERROR | 0xD8000000 | Error Data on Heap (Error : String)
PT_ERROR_MASK | 0x07FF0000 | Mask for Error Type
PT_ERROR_NO_DETAIL | 0x0000FFFF | No Offset for Detail Error String
Simple Error Type Codes
PT_ERROR_NONE | 0x00000000 | No Error in Value
PT_ERROR_SYNTAX | 0x00010000 | Item Fails on Syntax
PT_ERROR_QUOTE | 0x00020000 | Failure to Close Quote
PT_ERROR_UNITS | 0x00030000 | Inappropriate Units
PT_ERROR_RANGE | 0x00040000 | Value Out of Range
PT_ERROR_SIZE | 0x00050000 | Value Too Big
PT_ERROR_KEYWORD | 0x00060000 | Invalid Keyword
PT_ERROR_REQUIRED | 0x00070000 | Value Required
PT_ERROR_DUPLICATE | 0x00080000 | Value Duplicated Elsewhere
PT_ERROR_OVERFLOW | 0x00090000 | Value Overflows Internal Data
PT_ERROR_WHOLE_UNITS | 0x000A0000 | Values May Be Whole Only
PT_ERROR_UNKNOWN_UNITS | 0x000B0000 | Unknown Units
PT_ERROR_CONFLICT | 0x000C0000 | Conflicting Parameters
PT_ERROR_CSS_PROPERTY_NAME | 0x000D0000 | Unknown CSS Property Name
PT_ERROR_CSS_UNKNOWN_SH_ITEM | 0x000E0000 | Unknown Item (CSS shorthand)
PT_ERROR_HEAP_OVERFLOW | 0x04000000 | Internal Heap Overflow (no offset)
Warnings
PT_WARNING_FRACTIONAL_UNITS | 0x01010000 | Fractional Units Not Allowed
Parameter Types
SGML
PT_INT | 0x00000000 | Unsigned Integer/Number (e.g., 23.23)
PT_SIGNED_INT | 0x08000000 | Signed Integer/Number (+/-, e.g., -2.2, +7)
PT_PERCENT | 0x18000000 | Percentage (e.g., 43.00%)
PT_RGB | 0x28000000 | Color (24-bit RGB or string)
PT_RGB_MASK | 0x00FFFFFF | Mask for Heap or Color
PT_RGB_HEAP_FLAG | 0x02000000 | Color Flag, Value on Heap XXXX/ss
PT_KEYWORD | 0x38000000 | Keyword Token: the value is the ordinal, which is dependent on the attribute or property defined in the DTD for HTML.
PT_KEYWORD_MASK | 0x0000FFFF | Keyword Mask (16-bit)
PT_CHAR | 0x48000000 | Character (8-bit ANSI)
PT_CHAR_MASK | 0x000000FF | Character Mask
PT_BOOL | 0x58000000 | Boolean (e.g., CHECKED=CHECKED)
CSS Size as Metric
PT_MM | 0x10000000 | Millimeters (+/-, e.g., 12.22mm)
PT_CM | 0x20000000 | Centimeters (+/-, e.g., 3.12cm)
CSS Size English
PT_IN_100 | 0x30000000 | Inch (100ths) (+/-, e.g., 2.50in)
PT_IN | 0x68000000 | Inch (10000ths) (+/-, e.g., 2.3250in)
CSS Size Typography
PT_PX | 0x40000000 | Pixel (+/-, e.g., 4.84px)
PT_EM | 0x50000000 | Em Spaces (+/-, e.g., 2.23em)
PT_EX | 0x60000000 | Ex Height (+/-, e.g., 1.15ex)
PT_PC | 0x70000000 | Picas (+/-, e.g., 12.50pc)
PT_PT | 0x80000000 | Points (+/-, e.g., 22.40pt)
CSS Angle
PT_DEG | 0x90000000 | Degrees (+/-, e.g., 4.01deg)
PT_GRAD | 0xA0000000 | Gradians (+/-, e.g., 21.22grad)
PT_RAD | 0xB0000000 | Radians (+/-, e.g., 2.77rad)
CSS Frequency and Time
PT_HZ | 0xC0000000 | Hertz (+, e.g., 122.12hz)
PT_KHZ | 0xD0000000 | Kilohertz (+, e.g., 12.11khz)
PT_MS | 0xE0000000 | Milliseconds (+, e.g., 12.11ms)
PT_S | 0xF0000000 | Seconds (+, e.g., 4.23s)
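Reading a signed measurement back out of a pvalue can be sketched as follows, in Python rather than Legato. The constants come from the table above; the helper names signed_data and format_pvalue are hypothetical, and only a few of the signed measurement types are included. The sketch assumes the lower 27 bits hold a two's-complement number, which is what OR-ing with PT_SIGN_EXTEND implies.

```python
PT_MASK = 0xF8000000
PT_VALUE_MASK = 0x07FFFFFF
PT_SIGN_BIT = 0x04000000
PT_SIGN_EXTEND = 0xF8000000

# type bits -> (unit suffix, scale); a subset of the signed measurement types
UNITS = {
    0x10000000: ("mm", 100),    # PT_MM
    0x20000000: ("cm", 100),    # PT_CM
    0x30000000: ("in", 100),    # PT_IN_100
    0x68000000: ("in", 10000),  # PT_IN
    0x40000000: ("px", 100),    # PT_PX
    0x80000000: ("pt", 100),    # PT_PT
}


def signed_data(pvalue):
    """Return the lower 27 bits as a signed Python integer."""
    data = pvalue & PT_VALUE_MASK
    if data & PT_SIGN_BIT:
        data |= PT_SIGN_EXTEND  # extend the sign to a full 32-bit pattern
        data -= 1 << 32         # reinterpret that pattern as a negative number
    return data


def format_pvalue(pvalue):
    """Render a measurement pvalue as text, e.g. '22.40pt'."""
    unit, scale = UNITS[pvalue & PT_MASK]
    data = signed_data(pvalue)
    digits = len(str(scale)) - 1  # 2 for 100ths, 4 for 10000ths
    whole, fraction = divmod(abs(data), scale)
    sign = "-" if data < 0 else ""
    return f"{sign}{whole}.{fraction:0{digits}d}{unit}"


print(format_pvalue(0x680004E2))         # PT_IN with data 1250
print(format_pvalue(0x80000000 | 2240))  # PT_PT with data 2240
print(format_pvalue(0x17FFFF06))         # PT_MM with a negative value
```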
Since pvalue formatting uses all bit positions (including 0x80000000), pvalues can easily be mistaken for formatted error codes. Programmers are cautioned against using IsError, IsNotError and related functions on data declared as a pvalue type.
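The overlap is easy to demonstrate. The Python sketch below (not Legato) assumes only what the caution states, namely that the 0x80000000 bit is significant to formatted error codes; how IsError actually tests a value is not shown in this section. The helper names are hypothetical and the type values are taken from the table above.

```python
HIGH_BIT = 0x80000000

PARAMETER_TYPES = {
    "PT_INT": 0x00000000, "PT_SIGNED_INT": 0x08000000, "PT_PERCENT": 0x18000000,
    "PT_RGB": 0x28000000, "PT_KEYWORD": 0x38000000, "PT_CHAR": 0x48000000,
    "PT_BOOL": 0x58000000, "PT_MM": 0x10000000, "PT_CM": 0x20000000,
    "PT_IN_100": 0x30000000, "PT_IN": 0x68000000, "PT_PX": 0x40000000,
    "PT_EM": 0x50000000, "PT_EX": 0x60000000, "PT_PC": 0x70000000,
    "PT_PT": 0x80000000, "PT_DEG": 0x90000000, "PT_GRAD": 0xA0000000,
    "PT_RAD": 0xB0000000, "PT_HZ": 0xC0000000, "PT_KHZ": 0xD0000000,
    "PT_MS": 0xE0000000, "PT_S": 0xF0000000,
}


def high_bit_set(dword):
    return (dword & HIGH_BIT) != 0


def as_signed_32(dword):
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    return dword - (1 << 32) if dword & HIGH_BIT else dword


# A perfectly valid measurement, 22.40pt, has the high bit set.
points = PARAMETER_TYPES["PT_PT"] | 2240
print(high_bit_set(points), as_signed_32(points))

# Parameter types whose valid values always carry the high bit.
flagged = sorted(name for name, value in PARAMETER_TYPES.items() if high_bit_set(value))
print(flagged)
```

Comparing the type bits against PT_ERROR after masking with PT_MASK is the unambiguous way to tell an error pvalue from a measurement.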
As mentioned above, elements, attributes and properties can be referenced by string name or by token value. Tokens are 32-bit dword values with a special type definition of TOKEN. The top bits classify the token types:
Defined Data | Value | Description
Token Control
TT_TYPE_MASK | 0xF0000000 | Token Type Mask
TT_TOKEN_MASK | 0x000FFFFF | Token Value Mask
TT_TOKEN_MASK_16 | 0x0000FFFF | Token Value Mask (non field)
TT_USER_FLAG | 0x00008000 | Token is user-defined
Fields
Note that fields can receive pseudo token status for SGML open/close for stacking and other purposes.
TT_SGML_FIELD_MASK | 0x000F0000 | Field Mask
TT_SGML_FIELD | 0x00030000 | Field Type/Name
SGML (HTML/XML)
TT_SGML_OPEN | 0x10000000 | SGML Start Element (e.g., TABLE)
TT_SGML_CLOSE | 0x20000000 | SGML End Element (e.g., /TABLE)
TT_ATTRIBUTE | 0x30000000 | SGML Attribute
TT_ENTITY | 0x40000000 | Entity
TT_VALUE | 0x50000000 | Named Entity Values (Properties as defined in a DTD such as NUMBER or %Length)
TT_NAMESPACE | 0x60000000 | XML Namespace
TT_NAMESPACE_DEFAULT | 0x60000000 | Default Namespace (zero token mask value)
CSS
TT_CSS_PROPERTY | 0x70000000 | CSS Property (e.g., border or font-size)
TT_CSS_RULE | 0x80000000 | CSS Rule (e.g., @import)
Miscellaneous
TT_NULL | 0xF0000000 | Item is null or empty (attribute, etc.)
TT_ERROR | (TT_NULL + 1) | Error in Item
TT_UNIVERSAL | (TT_NULL + 2) | Universal (e.g., * specified for a class name)
TT_UNIVERSAL_IMPLIED | (TT_NULL + 3) | Universal Implied (nothing specified, so universal is implied)
Like pvalues, tokens can use the high bit, so programmers should avoid using the IsError and IsNotError functions on tokens.
The SDK contains predefined token values for HTML and CSS. Programmers should not hard-code token values, since they can change from version to version of the application.
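A token can be taken apart with the masks in the table above. The following is a Python sketch, not Legato; the function name describe_token is hypothetical, and the ordinal values 0x42 and 1 are made up purely for illustration (real token ordinals come from the SDK and, as noted, should not be hard-coded).

```python
TT_TYPE_MASK = 0xF0000000
TT_TOKEN_MASK = 0x000FFFFF
TT_USER_FLAG = 0x00008000
TT_NULL = 0xF0000000

TOKEN_TYPES = {
    0x10000000: "TT_SGML_OPEN",
    0x20000000: "TT_SGML_CLOSE",
    0x30000000: "TT_ATTRIBUTE",
    0x40000000: "TT_ENTITY",
    0x50000000: "TT_VALUE",
    0x60000000: "TT_NAMESPACE",
    0x70000000: "TT_CSS_PROPERTY",
    0x80000000: "TT_CSS_RULE",
}

# Miscellaneous tokens are whole values, not a type plus an ordinal.
SPECIAL_TOKENS = {
    TT_NULL: "TT_NULL",
    TT_NULL + 1: "TT_ERROR",
    TT_NULL + 2: "TT_UNIVERSAL",
    TT_NULL + 3: "TT_UNIVERSAL_IMPLIED",
}


def describe_token(token):
    """Return (type name, token value or None, user-defined flag)."""
    if token in SPECIAL_TOKENS:
        return SPECIAL_TOKENS[token], None, False
    name = TOKEN_TYPES.get(token & TT_TYPE_MASK, "unknown")
    return name, token & TT_TOKEN_MASK, bool(token & TT_USER_FLAG)


# Hypothetical ordinals, for illustration only.
print(describe_token(0x30000000 | 0x42))
print(describe_token(0x70000000 | TT_USER_FLAG | 1))
print(describe_token(TT_NULL + 1))
```

Because an attribute and a CSS property carry different type bits, ‘COLOR’ and ‘color’ remain distinct as tokens even if their ordinals happened to match.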
11.1.8 SGML Classes and Objects
Major object/function groupings:
SGML Object — Low-level parsing, reading and writing.
DTD Object — Low-level functions for managing a Document Type Definition or XML schema.
HTML Table Object — Medium-level HTML table mapping and support.
HTML Header Object — Low-level support for HTML file headers.
HTML Outline Object — High-level document outline.
SGML Code Tools — High-level tools for testing and adjusting generic SGML.
HTML Code Tools — High-level tools for testing and adjusting HTML.
HTML Writer Object — High-level HTML writing.
HTML Page Break — Medium-level functions for creating, reading and managing structured page breaks.
HTML Fields — Low-level functions for managing proprietary HTML fields.
RSS Feed Object — Low-level functions for reading RSS or Atom feed files.
HTML Compare — Low-level functions for comparing HTML documents.
Each of these classes has its own object handle type and is discussed in the following sections.
© 2012-2024 Novaworks, LLC. All rights reserved worldwide. Unauthorized use, duplication or transmission prohibited by law. Portions of the software are protected by US Patents 10,095,672, 10,706,221 and 11,210,456. GoFiler™ and Legato™ are trademarks of Novaworks, LLC. EDGAR® is a federally registered trademark of the U.S. Securities and Exchange Commission. Novaworks is not affiliated with or approved by the U.S. Securities and Exchange Commission. All other trademarks are property of their respective owners. Use of the features specified in this language are subject to terms, conditions and limitations of the Software License Agreement.