This week we will be covering scanning, testing, and some functions for manipulating textual string data.
Friday, December 30, 2016
Legato Developers Corner #15: String Testing and Translation
Introduction
In certain cases, it is necessary to manually scan and test string content. One could manually step through a string character by character and perform specific tests. This would be slow and cumbersome. Fortunately, the Legato SDK contains hundreds of functions to process and manipulate strings. In this article, we will be discussing parsing, basic boolean test functions, word analysis functions, and some conversion functions.
Boolean and Bitwise Results
Functions that test or get data frequently return either a boolean value or a bitwise result. While both are fundamentally numbers, a boolean value of 0 means FALSE and any non-zero value (usually 1) means TRUE. Bitwise return values are typically defined as a dword, a 32-bit unsigned integer. Conventionally, the top bit (0x80000000) is not used as part of the result since it normally indicates that the value is a formatted error code.
With boolean return values, it is easy to perform operations such as:
if (HasText(mystring)) { ... }
Bitwise return values can contain a lot of information and are normally tested with bit and ordinal masks. This will be covered in detail later. Since working with raw values like 0x00010000 can be clumsy, the SDK contains named definitions for the values returned by all common bitwise functions.
A sample use of bitwise return values might look like this:
result = AnalyzeText(mystring);
if ((result & TEXT_TYPE_MASK) == TEXT_TYPE_HEADING) { ... }
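The difference between a flag test, an ordinal (masked) test, and the error-bit convention can be modeled outside of Legato. The following is a minimal Python sketch, not SDK code: the names and values of TEXT_TYPE_MASK, TEXT_TYPE_HEADING, TEXT_FLAG_BOLD, and ERROR_BIT are hypothetical and chosen only for illustration; only the 0x80000000 top-bit convention comes from the text above.

```python
# Hypothetical bit layout, for illustration only (not taken from the SDK).
TEXT_TYPE_MASK = 0x000F0000     # ordinal field: one of up to 16 type codes
TEXT_TYPE_HEADING = 0x00020000  # one value within the ordinal field
TEXT_FLAG_BOLD = 0x01000000     # independent single-bit flag
ERROR_BIT = 0x80000000          # top bit: value is an error code, not a result


def is_error(result):
    """True when the top bit marks the value as an error code."""
    return (result & ERROR_BIT) != 0


def has_flag(result, flag):
    """A flag is simply set or not set."""
    return (result & flag) != 0


def ordinal(result, mask):
    """An ordinal must be masked first and then compared to a defined value."""
    return result & mask


result = TEXT_TYPE_HEADING | TEXT_FLAG_BOLD
if not is_error(result) and ordinal(result, TEXT_TYPE_MASK) == TEXT_TYPE_HEADING:
    print("heading, bold =", has_flag(result, TEXT_FLAG_BOLD))
```

Checking the error bit before decoding any other field avoids misreading an error code as a set of flags.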
A Short Review of the Word Parse Object
Generally speaking, we have to have data to test. The easiest way to look through a string of text is by using the Word Parse Object. The object supports a series of functions that are specifically tailored to “parse” or process textual data. The general-purpose parse runs in three modes: text, tags, and program. The text mode provides word parsing for reading general text. The tags mode is tailored to work on XML, HTML, or SGML tags and character entities. Finally, program mode is made to parse typical program or script text.
We will be focusing on the default parsing mode or text mode (WP_GENERAL), which stops on word spaces, returns (line endings), and punctuation within the textual information.
Basic Operation
The general steps are as follows: create (get handle), load/set data, and iterate until the data has been exhausted. New data can be repeatedly loaded to the same object to process multiple buffers or lines. After completion, the Word Parse Object handle should be closed.
As each item is parsed, the leading spaces and statistics are stored. For example, the caller can check to see if there are leading spaces and even get the white space character as a string.
Once the source data is set, the source variable can be changed or released. The Word Parse Object makes an internal copy of the data.
Setting Up a Parse Operation
The first action is to create a Word Parse Object and retrieve a handle. That handle is then used in subsequent operations to move through the text and examine each parsed item. For example:
    handle hWP;
    string s1, s2;
    int spaces, count, pos;

    // Note the two spaces before "waiting" and the return (\r) before "End."
    s1 = "My favorite pastime is  waiting for my browser to load a page.\rEnd.";

    // Create the parse object (default text mode) and check the handle
    hWP = WordParseCreate();
    if (hWP == NULL_HANDLE) {
      MessageBox('x', "Error on handle");
      exit;
    }

    // Load the data and retrieve the first word
    WordParseSetData(hWP, s1);
    s2 = WordParseGetWord(hWP);

    // Iterate until the data has been exhausted
    while (s2 != "") {
      count++;
      pos = WordParseGetPosition(hWP);
      spaces = WordParseGetSpaceSize(hWP);
      AddMessage(" %3d %3d %3d :%s:", count, pos, spaces, s2);
      s2 = WordParseGetWord(hWP);
    }

    CloseHandle(hWP);
The result in the log:
       1   2   0 :My:
       2  11   1 :favorite:
       3  19   1 :pastime:
       4  22   1 :is:
       5  31   2 :waiting:
       6  35   1 :for:
       7  38   1 :my:
       8  46   1 :browser:
       9  49   1 :to:
      10  54   1 :load:
      11  56   1 :a:
      12  62   1 :page.:
      13  67   1 :End.:
In this case, the parse object is created with the default mode (text). A string is added to the parse object and then each successive word is retrieved along with certain attributes. When added to the log, we surround the returned string value with “::” to illustrate that the string does not contain white space. The three numeric columns are the word count, the parse position after the word, and the number of leading white space characters. Note that, as shown in the log, the first entry has no leading spaces. There is an additional space before “waiting” and a return before “End”.
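To make the three logged columns concrete, the following Python sketch models the behavior seen in the log above. It is an approximation written for this explanation, not the Word Parse Object itself: the function name parse_words is hypothetical, and the model splits only on white space, which happens to match the log for this sample (punctuation stays attached to the word). The sample string uses two spaces before “waiting”, as the log implies.

```python
import re


def parse_words(text):
    """Return (count, end_position, leading_space_count, word) per word.

    Hypothetical model of the logged values: end_position is the zero-based
    index just past the word, leading_space_count is the run of white space
    (spaces, returns, tabs) skipped before the word.
    """
    results = []
    for count, m in enumerate(re.finditer(r"(\s*)(\S+)", text), start=1):
        results.append((count, m.end(), len(m.group(1)), m.group(2)))
    return results


sample = "My favorite pastime is  waiting for my browser to load a page.\rEnd."
for count, pos, spaces, word in parse_words(sample):
    print(" %3d %3d %3d :%s:" % (count, pos, spaces, word))
```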
Functions are provided to retrieve and change the parsing position. In addition, a parse object can be used repeatedly, provided the parse mode remains the same.
Skipping Through a String
Another option while parsing is to skip through a string looking for text or spaces. A series of functions is provided, each of which takes a string and a zero-based index as parameters and returns a new index position.
Function | Description
SkipBackWordSpaces | Skips back from a specified index to the first non-word space character.
SkipToLineEnding | Skips forward to the next line ending character (0x0D/0x0A).
SkipToNonText | Skips forward to a character that is not alpha-numeric.
SkipToWordSpace | Skips forward until a word space is found.
SkipWordSpaces | Skips forward until not on a word space.
Once positions have been established, they can be used with the word parser or functions such as the GetStringSegment function.
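The index-in, index-out pattern of the skip functions can be illustrated with a small Python model. This is a sketch of the described behavior only, not the SDK implementation: the function names are Python stand-ins, the set of word space characters is taken from the IsWordSpace description later in this article, and stopping at the end of the string is an assumption.

```python
# Word space characters per the IsWordSpace description in this article:
# space, return, tab, new line, and 0x00.
WORD_SPACES = " \t\r\n\0"


def skip_word_spaces(text, index):
    """Model of SkipWordSpaces: advance until not on a word space."""
    while index < len(text) and text[index] in WORD_SPACES:
        index += 1
    return index


def skip_to_word_space(text, index):
    """Model of SkipToWordSpace: advance until a word space is found."""
    while index < len(text) and text[index] not in WORD_SPACES:
        index += 1
    return index


line = "   Total revenue"
start = skip_word_spaces(line, 0)      # first character of the first word
end = skip_to_word_space(line, start)  # just past the first word
word = line[start:end]                 # the segment between the two positions
print(start, end, word)
```

The two positions bracket a segment in the same way that positions from the skip functions can be handed to a function such as GetStringSegment.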
Testing a Word
To avoid having to test each character in a word to determine its type, a large set of SDK functions are provided to test a string, or in some cases a character, as meeting a particular criterion. The following is a partial list of the ‘Is’ or ‘Has’ functions that return TRUE (1) or FALSE (0) depending on the result.
Function | Description
HasNumeric | Tests a string for any numeric characters (digits).
HasText | Tests a string for any text (alpha characters).
IsAllLower | Tests a string for all lower case on any text that is present.
IsAllUpper | Tests a string for all upper case on any text that is present.
IsASCII | Tests a string for non-ASCII characters allowing for return and tab characters.
IsASCIICharacters | Tests a string for non-ASCII (no control characters).
IsAccounting | Tests a string for accounting characters (i.e., -123 or (34,555.44)).
IsAlpha8859 | Tests a character or string for ASCII Letters and ISO-8859 Latin letter characters.
IsAlphaNumeric8859 | Tests a character or string for ASCII letters, numbers, and ISO-8859 Latin letter characters.
IsAlpha | Tests a character or string for ASCII characters.
IsCurrency | Tests a character or string as currency group (allows . , and ( ) characters).
IsCurrencyFormatted | Tests a string as properly formatted currency (US, Euro, Pounds, Yen, cents).
IsCurrencyPrefix | Tests a string for currency leader (i.e., USD$ or CAN$) allowing for other ISO-4217 codes. Following number can be loosely formatted as accounting.
IsCurrencyProper | Tests a string for properly structured currency, i.e., 128,333, allowing for multiple commas and periods for US and European formats. It does not check the number of digits.
IsDrawing | Tests character or string for a limited set of characters commonly used for drawing such as ‘-’ or ‘=’.
IsFalse | Checks string for common terms such as ‘no’, ‘false’, ‘0’ or empty checkboxes as being the same as logical FALSE.
IsFootnoteReference | Checks string for typical footnote characters, numbers or letters in the hole: (1) or (b).
IsHTML | Tests a string as being HTML by checking for certain HTML tags.
IsHex | Tests a character or string as being valid hex characters. It cannot have the 0x prefix.
IsInString | Checks for all or part of string or character inside of a target string (like InString but with a boolean result).
IsLeaderBackFill | Looks backward in a string for leader fill characters. The string can contain text prior to the leader.
IsLeaderFill | Looks in string for leader fill characters.
IsLower | Checks a string (word) for being lower case. The word cannot contain any non-alpha characters.
IsNil | Checks a character or string as being a financial nil value.
IsNonBreakingSpace | Checks a character or a string as being non-breaking space character(s) (0xA0 or 160).
IsNonBreakingSpaceEntity | Checks for the start of a string being a non-breaking space (PCDATA) as &nbsp; or &#160; characters.
IsNonBreakingSpacePCDATA | Checks for string containing only non-breaking spaces and optional word spaces.
IsNumeric | Tests a string for strictly numeric (digits).
IsPCDATARequired | Checks a string for a requirement to encode as PCDATA.
IsPercentage | Checks a string as a percentage value such as 0% or 00.00% etc.
IsReal | Checks a string as being a real number.
IsRealStrict | Checks a string as a real number but with strict requirements.
IsRegexMatch | Performs a regular expression pattern match on string data.
IsRoman | Tests a character as a roman numeral or a string as a roman number.
IsSectionNumber | Tests a string for being a section number (i.e., 1, 2.2, 2.1.1., etc.).
IsStringPadded | Tests to see if a string has space padding either before or after.
IsTabbedString | Checks a string for tab characters.
IsText | Tests a string for being a textual word with or without conventional punctuation.
IsTrue | Tests a string for common terms such as ‘yes’, ‘true’, ‘1’ or various checked checkbox styles being the same as logical TRUE.
IsUpper | Checks a string (word) for being upper case. The word cannot contain any non-alpha characters.
IsSGMLCharacterEntity | Tests a string for basic SGML character entity structure. It does not check the actual character specification.
IsSGMLEmptyElement | Tests a string for basic SGML character entity structure. It does not check the actual character specification.
IsSGMLTag | Tests a string for basic SGML tag structure. Does not check the content other than for <a> or </a> structure.
IsValidSGMLAttribute | Tests a string for a valid syntax attribute name (with name space).
IsValidSGMLElement | Tests a string for a valid syntax element name (with name space).
IsWildListMatch | Matches a list of items against a target string with wildcards. The match string list can be a semicolon separated list of test cases.
IsWildMatch | Checks two strings for wild card match. Matches string 2 to string 1.
IsWildString | Tests a string for containing one or more wild card characters.
Some of the above functions will also accept and test an individual character. Functions that test only characters are also available:
Function | Description
IsANSISpace | Tests a character as white space including backspace (back tab, 0x08) and tab (0x09).
IsAlphaNumeric | Tests a character for ASCII letters or numbers.
IsDigit | Tests a character as a number (0-9).
IsExpressionCharacter | Tests a character that could be used in an expression (i.e., { < +, etc.).
IsExpressionGroup | Tests a character used in an expression group (i.e., " ' ( or [ ).
IsExtendedAlpha | Tests a character to see if it is part of ISO-8859 latin alpha set commonly used in English.
IsFinancial | Checks character as a number, currency or ‘,’ or ‘.’.
IsSentenceDelimiter | Tests character as one that delimits a sentence (. ! : ?).
IsValidSGMLCharacter | Tests a character that can be part of an SGML element or attribute.
IsValidSGMLStartCharacter | Tests a character as a start character in an SGML element or attribute.
IsValidVariableCharacter | Tests a character that can be used in a programming variable (no lead character exclusion).
IsVowel | Tests a character as western vowel (A E I O U, upper and lower case).
IsWordDelimiter | Tests a character as word style delimiter (‘' . , : ; ! ? ( ) [ ] { }’).
IsWordSpace | Tests a character as a Word Space (space, return, tab, new line, 0x00).
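Some pairs in the tables above differ only in how they treat non-alpha characters, for example IsAllUpper (“on any text that is present”) versus IsUpper (“cannot contain any non-alpha characters”). The Python sketch below models that distinction as it is worded in the table. It is an interpretation, not the SDK logic: the function names are stand-ins, only ASCII letters are considered, and returning False for a string with no letters is an assumption.

```python
import string

ASCII_LETTERS = set(string.ascii_letters)


def is_all_upper(text):
    """Model of IsAllUpper: every letter present is upper case.

    Digits and punctuation are ignored. Returning False when there are no
    letters at all is an assumption of this sketch.
    """
    letters = [c for c in text if c in ASCII_LETTERS]
    return bool(letters) and all(c.isupper() for c in letters)


def is_upper(word):
    """Model of IsUpper: the word is only letters, and all are upper case."""
    return word != "" and all(c in ASCII_LETTERS and c.isupper() for c in word)


for sample in ("ID", "ID-2016", "Id"):
    print(sample, is_all_upper(sample), is_upper(sample))
```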
Another powerful tool is the GetWordType function. The GetWordType function analyzes the content of a provided word and returns the type and attributes. The prototype:
dword = GetWordType ( string data );
The data parameter is a string containing a word without leading or trailing spaces. The returned value is a 32-bit dword (or int, unsigned) containing bitwise information. The results can be any of the following:
Definition | Bitwise | Description
Item Types
WT_TYPE_ITEM_MASK | 0x000F0000 | Item Type Mask
WT_TYPE_UNKNOWN | 0x00000000 | Unknown Value
WT_TYPE_WORD | 0x00010000 | Word (dog, cat, monkey)
WT_TYPE_NUMBER | 0x00020000 | Number
WT_TYPE_NUMBER_SERIAL | 0x00030000 | Serial Number (12, 63)
WT_TYPE_LEADER | 0x00040000 | Leader Line
WT_TYPE_RULER | 0x00050000 | Ruler (possible or dash, nil)
WT_TYPE_CURRENCY_LEADER | 0x00060000 | Opening Currency “$ 1,121”
WT_TYPE_NIL | 0x00070000 | Nil or Compound Nil “--(a)” or “—” or “$-”
WT_TYPE_DATE | 0x00080000 | Date “12/12/12”, “12.12.12”, “23:22” or ISO
Word Variations
WT_WORD_MASK | 0x00700000 | Word Type Mask
Types
WT_WORD_UNKNOWN | 0x00000000 | Unknown or General Word Type
WT_WORD_LOWER | 0x00100000 | Lower Case Word
WT_WORD_UPPER | 0x00200000 | Upper Case Word
WT_WORD_INITIAL | 0x00300000 | Initial Capital
Word Flags
WT_WORD_TRAIL_MASK | 0x000000FF | Punctuation (low in char)
WT_WORD_TRAIL_PUNCTUATION | 0x00800000 | Trails Punctuation (in low char)
WT_WORD_QUOTED | 0x01000000 | Word Quoted (can be partial)
WT_WORD_IN_HOLE | 0x02000000 | Word has Parenthesis or Brackets
WT_WORD_LEADER_TRAIL | 0x04000000 | Word has a Trailing Leader Line
Lexicon
WT_WORD_LEXICON_MASK | 0x70000000 | Lexicon Mask
WT_WORD_DATE_MONTH | 0x10000000 | Word is in Month Lexicon
WT_WORD_DATE_DAY | 0x20000000 | Word is in Day Lexicon
WT_WORD_HONORIFIC | 0x30000000 | Word is in Honorific Lexicon
Number Variations
WT_NUMBER_ALIGN_MASK | 0x000000FF | Alignment Position at Size
Types
WT_NUMBER_MASK | 0x00700000 | Number Type Mask
WT_NUMBER_UNKNOWN | 0x00000000 | Unknown Type
WT_NUMBER_YEAR | 0x00100000 | Number is Year (1900-2099)
WT_NUMBER_DAY | 0x00200000 | Number is Day (1-31)
WT_NUMBER_FORMATTED | 0x00300000 | Number is Formatted
WT_NUMBER_LIST | 0x00400000 | Part of a List (1-99 with trail)
Number Flags
WT_NUMBER_NEGATIVE | 0x01000000 | Negative Number (000) or -000
WT_NUMBER_IN_HOLE | 0x02000000 | Negative Number (000)
WT_NUMBER_FOOTNOTE | 0x04000000 | Has Footnote
WT_NUMBER_CURRENCY | 0x08000000 | Has Currency
WT_NUMBER_PERCENT | 0x10000000 | Has Percent
WT_NUMBER_IN_HOLE_ERROR | 0x20000000 | Error in Parenthetical
WT_NUMBER_BAD_FORMAT | 0x40000000 | Bad Format (characters, not structure)
Leader Variation
WT_LEADER_SIZE_MASK | 0x00000FFF | Word Type Mask (character in bottom)
Ruler Variations
WT_RULER_MASK | 0x00700000 | Drawing Character in the Lower 8-bits
WT_RULER_CHARACTER | 0x000000FF | Mask for Ruler Character
Ruler Types
WT_RULER_MIXED | 0x00000000 | Of Indeterminate Type
WT_RULER_SUBTOTAL | 0x00100000 | Subtotal Type
WT_RULER_TOTAL | 0x00200000 | Total Type
Ruler Flags
WT_RULER_DASH | 0x01000000 | Possible Connecting Dash
Date Variations
WT_DATE_MASK | 0x0F000000 | Date Code Mask
WT_DATE_AS_GENERAL | 0x00000000 | Date as Any Type (short mm/yy not supported)
WT_DATE_ISO_8601 | 0x01000000 | Date as ISO (in part, w w/o time)
WT_DATE_TIME_ONLY | 0x02000000 | A Time with Optional AM/PM
Unknown Word Data
WT_UNKNOWN_ALPHA | 0x0000000F | Alpha Count
WT_UNKNOWN_NUMERIC | 0x000000F0 | Numeric Count
WT_UNKNOWN_CURRENCY | 0x00000300 | Currency Count (4)
WT_UNKNOWN_PUNCTUATION | 0x00000C00 | Sentence Punctuation Count (4)
WT_UNKNOWN_COMMA_PERIOD | 0x00003000 | Comma Period Count
WT_UNKNOWN_GROUP | 0x0000C000 | Parenthesis/Brace Group
WT_UNKNOWN_QUOTE | 0x00300000 | Quote Character Count
WT_UNKNOWN_FOOTNOTE | 0x00C00000 | Footnote Type Characters
WT_UNKNOWN_RULE | 0x03000000 | Rule Character Count
WT_UNKNOWN_ELLIPSE | 0x0C000000 | Ellipse Count
WT_UNKNOWN_OTHER | 0x30000000 | Other Count
Depending on your programming background, bitwise operations may be a bit foreign. They are widely used under the hood in many environments and can be very efficient at conveying a lot of information in a small form factor. Generally, the binary information is segmented into flags and ordinals. Flags are simple: if a bit is set, then the condition is true. Ordinals, on the other hand, require a mask to isolate a group of associated bits. Those bits in turn represent one of a set of conditions. For example, the item type can be isolated by ‘ANDing’ the resulting dword with the WT_TYPE_ITEM_MASK value:
    code = GetWordType(word);
    switch (code & WT_TYPE_ITEM_MASK) {
      case WT_TYPE_UNKNOWN:
        break;
      case WT_TYPE_WORD:
        break;
      case WT_TYPE_NUMBER:
        break;
      case WT_TYPE_NUMBER_SERIAL:
        break;
      case WT_TYPE_LEADER:
        break;
      case WT_TYPE_RULER:
        break;
      case WT_TYPE_CURRENCY_LEADER:
        break;
      case WT_TYPE_NIL:
        break;
      case WT_TYPE_DATE:
        break;
    }
Each case section can then count or act upon the details of the item. For example, if the type is date, then the WT_DATE_ items can be tested to narrow the type of date.
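As a worked illustration of narrowing a result after the item type is known, the Python sketch below decodes a few raw values using the constants from the table above. It is a model for explanation rather than Legato code: the function describe and the dictionary layout are hypothetical, and the sample values are the hexadecimal codes shown in the log of the expanded example later in this article.

```python
# Constants copied from the GetWordType table in this article.
WT_TYPE_ITEM_MASK = 0x000F0000
WT_TYPE_WORD = 0x00010000
ITEM_TYPES = {
    0x00000000: "Unknown", 0x00010000: "Word", 0x00020000: "Number",
    0x00030000: "Number (Serial)", 0x00040000: "Leader", 0x00050000: "Ruler",
    0x00060000: "Currency Leader", 0x00070000: "Nil", 0x00080000: "Date",
}
WT_WORD_MASK = 0x00700000
WORD_CASES = {0x00000000: "unknown", 0x00100000: "lower",
              0x00200000: "upper", 0x00300000: "initial"}
WT_WORD_TRAIL_MASK = 0x000000FF
WT_WORD_IN_HOLE = 0x02000000
WT_WORD_LEXICON_MASK = 0x70000000
WT_WORD_DATE_MONTH = 0x10000000


def describe(code):
    """Decode the item type, then the word-specific fields if it is a word."""
    item = code & WT_TYPE_ITEM_MASK
    info = {"type": ITEM_TYPES.get(item, "?")}
    if item == WT_TYPE_WORD:
        info["case"] = WORD_CASES.get(code & WT_WORD_MASK, "?")           # ordinal
        info["in_hole"] = (code & WT_WORD_IN_HOLE) != 0                   # flag
        info["month"] = (code & WT_WORD_LEXICON_MASK) == WT_WORD_DATE_MONTH
        trail = code & WT_WORD_TRAIL_MASK                                 # low byte
        info["trail"] = chr(trail) if trail else ""
    return info


for word, code in (("July", 0x10310000), ("Company:", 0x00B1003A),
                   ("(i)", 0x02110000), ("$110", 0x08020004)):
    print(word, describe(code))
```

The same bits mean different things for different item types (for example, 0x02000000 is WT_WORD_IN_HOLE for a word and WT_NUMBER_IN_HOLE for a number), which is why the item type is decoded first.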
The GetWordType function is useful for aggregating information from a text stream to perform high level analysis. For example, a line of text can be parsed, information accumulated, and the first and last word data examined to determine the probability of the line being a heading, part of a paragraph, or perhaps a row of a table.
Analysis is performed on a gross level basis. That is, types of characters are counted and then run through logic to perform a basic analysis. For example, if one or two dashes are present without text, the content will be considered a “nil” value as would be seen in a table. On the other hand, three dashes would be considered as a possible rule or visual aid.
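The count-then-decide approach can be sketched in a few lines of Python. This is a simplified model of the dash example only, not the actual analysis logic: the function names and character classes are hypothetical, and treating more than three dashes as a rule is an assumption.

```python
def count_classes(word):
    """Count characters by broad class, as a first analysis pass."""
    counts = {"alpha": 0, "digit": 0, "dash": 0, "other": 0}
    for c in word:
        if c.isalpha():
            counts["alpha"] += 1
        elif c.isdigit():
            counts["digit"] += 1
        elif c == "-":
            counts["dash"] += 1
        else:
            counts["other"] += 1
    return counts


def classify(word):
    """Apply simple logic to the counts: dashes alone are a nil or a rule."""
    c = count_classes(word)
    only_dashes = c["dash"] > 0 and c["alpha"] == c["digit"] == c["other"] == 0
    if only_dashes:
        # One or two dashes read as a nil value; three or more (assumed)
        # read as a possible rule.
        return "nil" if c["dash"] <= 2 else "rule"
    return "other"


for sample in ("-", "--", "---", "a-"):
    print(repr(sample), classify(sample))
```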
In addition, there are the related functions GetListType and GetNumericType, which are similar in operation to GetWordType but return data specific to list items and numbers, respectively.
The words to test should be passed to the function without spaces. If the Word Parse Object is employed with WP_GENERAL mode, the data returned is compatible with analysis.
Converting Common String Forms
The Legato SDK also contains a number of functions for performing common string conversions and operations:
Function | Description
ChangeCase | Changes the case of a string, including HTML.
CharacterToLowerCase | Converts a character to lower case (ANSI only).
CharacterToUpperCase | Converts a character to upper case (ANSI only).
ConformAddressString | Conforms the case and style of an address line.
ConvertAddNewlines | Copies string and adds newline (0x0A) characters to return (0x0D) characters.
ConvertDeleteNewlines | Copies string while deleting newline (0x0A) characters.
ConvertFromEscapeCharacters | Copies from escaped characters (with backslash such as \r or \n).
ConvertFromUnderbars | Converts underbars in a string to spaces.
ConvertFromUnderlines | Removes the static control underline characters.
ConvertNoCodes | Converts a string and changes all control codes (including newlines, returns, tabs) to period (.) characters.
ConvertNoPunctuation | Converts a string by removing any punctuation.
ConvertNoSpaces | Converts a string by removing all space characters (0x20).
ConvertSoftBreaksToSpaces | Converts soft break characters (0x09, 0x0D, 0x0A) to spaces.
ConvertToEscapeCharacters | Copies to escaped characters (with backslash such as \r or \n).
ConvertToUnderbars | Copies with spaces changed to underbars.
ConvertToUnderlines | Copies to static control underline characters using escaped &.
ConvertWordSpaces | Converts all word spaces to single spaces.
MakeLowerCase | Makes a string lower case (ANSI only).
MakeUpperCase | Makes a string upper case (ANSI only).
PadString | Pads a string to a specified size with an optional fill string.
ReplaceInString | Replaces matching strings inside another string with or without case sensitivity.
ReplaceInStringRegex | Replaces matching strings inside another string using regular expression rules.
ReverseString | Reverses the character position content of a string.
TrailStringAfter | Trails off a string with an ellipsis (‘...’ characters) if it exceeds the specified size.
TrailStringAfterAlways | Trails off a string with an ellipsis (‘...’ characters) at specified size or always.
TrailStringBefore | Truncates a string and adds an ellipsis (‘...’ characters) at the start of the string if the length exceeds the specified size.
TrimNonBreakingSpaces | Trims non-breaking spaces (as raw characters).
TrimPadding | Trims the padding on both left and right sides of string.
TrimString | Trims the trailing spaces from the right side (end) of a string.
Changing case is a common operation, which can be performed using the MakeLowerCase and MakeUpperCase functions. The ChangeCase function is substantially more sophisticated and allows for the processing of sentences of data in a number of modes, such as title capitalization.
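For readers who want to see what a few of the simpler conversions amount to, here is a Python model of three of them based only on the one-line descriptions in the table above. The function names are Python stand-ins, not SDK calls, and the reading of ConvertWordSpaces as collapsing each run of spaces, tabs, returns, and newlines into one space is an assumption.

```python
import re


def convert_to_underbars(text):
    """Model of ConvertToUnderbars: spaces become underbars."""
    return text.replace(" ", "_")


def convert_from_underbars(text):
    """Model of ConvertFromUnderbars: underbars become spaces."""
    return text.replace("_", " ")


def convert_no_spaces(text):
    """Model of ConvertNoSpaces: remove all space characters (0x20)."""
    return text.replace(" ", "")


def convert_word_spaces(text):
    """Model of ConvertWordSpaces (assumed): collapse runs of word spaces."""
    return re.sub(r"[ \t\r\n]+", " ", text)


name = "Annual Report 2016"
print(convert_to_underbars(name))
print(convert_from_underbars(convert_to_underbars(name)))
print(convert_no_spaces(name))
print(convert_word_spaces("Annual \t Report\r\n2016"))
```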
Expanding Our Example
Let us add a few things to the above example:
    handle hWP;
    string s1, s2, s3;
    dword type;
    int spaces, count, pos;

    s1 = "On July 27, 2016, the Company: (i) purchased a dog; (ii) found a vet; ";
    s1 += "(iii) purchased a dog bed; and, (iv) spent $110 on a doggy ID chip. ";

    hWP = WordParseCreate();
    if (hWP == NULL_HANDLE) {
      MessageBox('x', "Error on handle");
      exit;
    }

    WordParseSetData(hWP, s1);
    s2 = WordParseGetWord(hWP);
    while (s2 != "") {
      count++;
      pos = WordParseGetPosition(hWP);
      spaces = WordParseGetSpaceSize(hWP);

      // Analyze the word before it is wrapped and padded for the log
      type = GetWordType(s2);
      s2 = ":" + s2 + ":";
      s2 = PadString(s2, 12);

      // Translate the item type to a friendly string
      switch (type & WT_TYPE_ITEM_MASK) {
        case WT_TYPE_UNKNOWN:         s3 = "Unknown";         break;
        case WT_TYPE_WORD:            s3 = "Word";            break;
        case WT_TYPE_NUMBER:          s3 = "Number";          break;
        case WT_TYPE_NUMBER_SERIAL:   s3 = "Number (Serial)"; break;
        case WT_TYPE_LEADER:          s3 = "Leader";          break;
        case WT_TYPE_RULER:           s3 = "Ruler";           break;
        case WT_TYPE_CURRENCY_LEADER: s3 = "Currency";        break;
        case WT_TYPE_NIL:             s3 = "Nil";             break;
        case WT_TYPE_DATE:            s3 = "Date";            break;
        default:                      s3 = "";
      }

      AddMessage(" %3d %3d %3d 0x%08X %s %s", count, pos, spaces, type, s2, s3);
      s2 = WordParseGetWord(hWP);
    }

    CloseHandle(hWP);
The result in the log would appear as:
       1   2   0 0x00310000 :On:         Word
       2   7   1 0x10310000 :July:       Word
       3  11   1 0x00230002 :27,:        Number (Serial)
       4  17   1 0x00030005 :2016,:      Number (Serial)
       5  21   1 0x00110000 :the:        Word
       6  30   1 0x00B1003A :Company::   Word
       7  34   1 0x02110000 :(i):        Word
       8  44   1 0x00110000 :purchased:  Word
       9  46   1 0x00110000 :a:          Word
      10  51   1 0x0011003B :dog;:       Word
      11  56   1 0x02110000 :(ii):       Word
      12  62   1 0x00110000 :found:      Word
      13  64   1 0x00110000 :a:          Word
      14  69   1 0x0011003B :vet;:       Word
      15  75   1 0x02110000 :(iii):      Word
      16  85   1 0x00110000 :purchased:  Word
      17  87   1 0x00110000 :a:          Word
      18  91   1 0x00110000 :dog:        Word
      19  96   1 0x0011003B :bed;:       Word
      20 101   1 0x0011002C :and,:       Word
      21 106   1 0x02110000 :(iv):       Word
      22 112   1 0x00110000 :spent:      Word
      23 117   1 0x08020004 :$110:       Number
      24 120   1 0x00110000 :on:         Word
      25 122   1 0x00110000 :a:          Word
      26 128   1 0x00110000 :doggy:      Word
      27 131   1 0x00210000 :ID:         Word
      28 137   1 0x0091002E :chip.:      Word
We are using the PadString function to make a fixed size field in the log for the word, and we are still maintaining the ‘::’ convention that contains the word. The return value from the GetWordType function is both translated to a friendly string and printed in hexadecimal form in the log. Note that the words “27,” and “2016,” are considered serial numbers, as in a list that could appear within narrative as opposed to a table cell.
Conclusion
Since Legato is a part of GoFiler and GoFiler specializes in converting and editing text, many of the foundational string functions are exposed as script functions. If you cannot find a function to match your particular operation, contact technical support as it may already exist.
Scott Theis is the President of Novaworks and has been involved in the EDGAR industry for over thirty years. He has worked with the EDGAR system at multiple levels: as a financial printer, a member of the EDGAR design team, and as a software developer. He has extensive expertise with EDGAR, HTML, XBRL, and other programming languages.