Last month, in “A Quick Web Link Builder”, I described a tool I built to create and insert web news links in my blog. I have since expanded it to load the linked web page and extract information from it to fill in the dialog, making linking an even quicker process.
Friday, February 28. 2020
LDC #167: A Quick Web Link Builder Part II
Introduction
The problem with adding news links is that one has to add not only the link but also the description and the citation. If you have not read the first article on inserting the links, I recommend taking a look at LDC #165: A Quick Web Link Builder.
In that article, the lines for the Look Up button were commented out of the resource. Here is the resource with them restored:
#beginresource

#define ASL_URL                 201
#define ASL_URL_LOOKUP          202
#define ASL_TEXT                203
#define ASL_CITE                204

LinkQuery01Dlg DIALOGEX 0, 0, 280, 118
STYLE DS_3DLOOK | WS_POPUP | WS_VISIBLE | WS_CAPTION
CAPTION "Add Quick Web Link"
FONT 8, "MS Shell Dlg"
{
 CONTROL "Link To:", -1, "static", SS_LEFT | WS_CHILD | WS_VISIBLE, 6, 4, 30, 8, 0
 CONTROL "", -1, "static", SS_ETCHEDFRAME | WS_CHILD | WS_VISIBLE, 36, 9, 236, 1, 0
 CONTROL "&URL:", -1, "static", SS_LEFT | WS_CHILD | WS_VISIBLE | WS_GROUP, 12, 18, 30, 8
 CONTROL "", ASL_URL, "edit", ES_AUTOHSCROLL | WS_CHILD | WS_VISIBLE | WS_BORDER | WS_TABSTOP, 45, 16, 170, 12
 CONTROL "Look Up", ASL_URL_LOOKUP, "button", BS_PUSHBUTTON | BS_CENTER | WS_CHILD | WS_VISIBLE | WS_TABSTOP, 225, 16, 40, 12, 0
 CONTROL "Parameters:", -1, "static", SS_LEFT | WS_CHILD | WS_VISIBLE, 6, 37, 40, 8, 0
 CONTROL "", -1, "static", SS_ETCHEDFRAME | WS_CHILD | WS_VISIBLE, 46, 42, 226, 1, 0
 CONTROL "&Text:", -1, "static", SS_LEFT | WS_CHILD | WS_VISIBLE | WS_GROUP, 12, 53, 30, 8, 0
 CONTROL "", ASL_TEXT, "edit", ES_LEFT | ES_AUTOHSCROLL | WS_CHILD | WS_VISIBLE | WS_BORDER | WS_TABSTOP, 45, 51, 220, 12, 0
 CONTROL "&Cite:", -1, "static", SS_LEFT | WS_CHILD | WS_VISIBLE | WS_GROUP, 12, 69, 30, 8, 0
 CONTROL "", ASL_CITE, "edit", ES_LEFT | ES_AUTOHSCROLL | WS_CHILD | WS_VISIBLE | WS_BORDER | WS_TABSTOP, 45, 67, 220, 12, 0
 CONTROL "", -1, "static", SS_ETCHEDFRAME | WS_CHILD | WS_VISIBLE, 6, 90, 268, 1, 0
 CONTROL "OK", IDOK, "BUTTON", BS_PUSHBUTTON | BS_CENTER | WS_CHILD | WS_VISIBLE | WS_TABSTOP, 168, 96, 50, 14
 CONTROL "Cancel", IDCANCEL, "BUTTON", BS_PUSHBUTTON | BS_CENTER | WS_CHILD | WS_VISIBLE | WS_TABSTOP, 223, 96, 50, 14
}

#endresource
By uncommenting the Look Up button and adding the code described below, we get new functionality: the “Look Up” button loads the specified URL and then captures metadata from the page source to fill in the Text and Cite controls for the URL.
In this post, we will add a couple of global variables, a host name lookup table, a button action function, and the code to retrieve the metadata.
The Button Control
After removing the comment from the resource, we need to add code to capture the button press:
void lq_action(int c_id, int c_ac) {
    int rc;

    if (c_id == ASL_URL_LOOKUP) {
        url = EditGetText(ASL_URL);
        if (url != "") {
            rc = get_url_meta_data();
            if (IsNotError(rc)) {
                EditSetText(ASL_TEXT, title);
                EditSetText(ASL_CITE, cite);
            }
        }
    }
}
The dialog procedure lq_action has two parameters: the control ID and the control action. Its name combines the prefix “lq_”, which is specified in the DialogBox function call, with the procedure name “action”, which is predefined by the Legato dialog processor. The control ID is specified by the resource. For buttons, the action code is not relevant.
Our snippet of code simply checks the ID, reads the URL text into the global url variable, and, if it is not empty, calls the get_url_meta_data function. Assuming that function succeeds, the title and cite global variables are loaded into the dialog controls.
We could put all the required lookup code in the action routine, but that is not advisable. To optimize operation, any complex action should be placed in a subroutine. The action routine is called constantly for each control to process key presses, focus changes, selections, and other messages, so any extra local variables declared in it waste processor time on every call. Keeping the lookup in its own function also makes debugging easier, as we shall see later.
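To illustrate the thin-dispatcher pattern outside Legato, here is a hypothetical Python analog (not the Legato dialog API). The dialog controls are modeled as a plain dict keyed by control ID, and the lookup is passed in as a function, which also makes it possible to exercise the dispatcher without a dialog or a network connection. The names Lookup and the (title, cite) return contract are assumptions made for this sketch.

```python
from typing import Callable, Dict, Optional, Tuple

# Control IDs copied from the dialog resource.
ASL_URL = 201
ASL_URL_LOOKUP = 202
ASL_TEXT = 203
ASL_CITE = 204

# Hypothetical contract: a lookup returns (title, cite) on success, None on failure.
Lookup = Callable[[str], Optional[Tuple[str, str]]]


def lq_action(control_id: int, action: int, controls: Dict[int, str], lookup: Lookup) -> None:
    """Thin dispatcher: react only to the Look Up button and delegate the work."""
    if control_id != ASL_URL_LOOKUP:
        return  # every other control message returns immediately
    # For a button, the action code is not relevant, so it is ignored here.
    url = controls.get(ASL_URL, "")
    if url == "":
        return
    result = lookup(url)
    if result is not None:  # on failure the dialog is left unchanged
        controls[ASL_TEXT], controls[ASL_CITE] = result
```

For example, `lq_action(ASL_URL_LOOKUP, 0, controls, my_lookup)` fills the Text and Cite entries of `controls` only when `my_lookup` succeeds; any other control ID returns without doing anything.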
Getting the Meta Data
The get_url_meta_data function consists of five sections: (i) get the target page data; (ii) get the title; (iii) get the cited host/domain; (iv) post-process the cite for certain domains; and (v) finish preparing the global variables title and cite. Here is the code:
int get_url_meta_data() {
    string page, s1, s2;
    int rc, ix;

    // (i) Get the target page
    title = "";
    cite = "";
    page = HTTPGetString(url);
    if (page == "") {
        return GetLastError();
    }

    // (ii) Get the title
    ix = FindInString(page, "<title>");
    if (ix > 0) {
        title = GetStringSegment(page, ix + 7, 200);
        ix = InString(title, "<");
        if (ix > 0) {
            title[ix] = 0;
        }
        title = TrimPadding(title);
    }

    // (iii) Get the cited host/domain
    s1 = GetURIHost(url);
    if (s1 != "") {
        ix = FindInTable(hostlist, s1);
        if (ix < 0) {
            cite = s1;
        }
        else {
            cite = hostlist[ix][1];
        }
    }

    // (iv) Post-process the cite for special hosts (YouTube)
    if (cite == "[1]") {
        ix = InString(title, " - YouTube");
        if (ix > 0) {
            title[ix] = 0;
        }
        s2 = ",\\\"author\\\":\\\"";
        ix = InString(page, s2);
        if (ix > 0) {
            ix += GetStringLength(s2);
            cite = GetStringSegment(page, ix, 40);
            ix = InString(cite, "\\\"");
            if (ix > 0) {
                cite[ix] = 0;
            }
        }
    }

    // (v) Finish the title and cite globals
    title = EntitiesToUTF(title);
    title = UTFToAnsi(title);
    cite = "(" + cite + " " + GetLocalTime(DS_MMDDYYYY) + ")";
    return ERROR_NONE;
}
The first action in the routine is to clear the global cite (for a citation) and title variables and load the page at the URL into a string. If the load fails, an error code is returned. (In this quick script, the action routine ignores the error, so pressing the button simply does nothing.) Assuming we have text, we get the title of the HTML document, which is located in the header between the <TITLE></TITLE> tags. This assumes that the author of the page follows best practices and actually used the TITLE element.
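For readers who want to try the title logic outside Legato, here is a minimal Python sketch of the same idea (not the Legato API): find the opening `<title>` tag, take up to 200 characters after it, cut at the next `<`, and trim. The function name `extract_title` is hypothetical, and the case-insensitive search is my assumption. Like the original, it will not match a title tag that carries attributes.

```python
def extract_title(page: str, max_len: int = 200) -> str:
    """Return the trimmed text between <title> and the next '<', or '' if absent."""
    tag = "<title>"
    start = page.lower().find(tag)  # case-insensitive search (an assumption)
    if start < 0:
        return ""
    begin = start + len(tag)
    segment = page[begin:begin + max_len]  # same 200-character window as the script
    end = segment.find("<")  # stop at the closing tag (or any other markup)
    if end >= 0:
        segment = segment[:end]
    return segment.strip()
```

The fixed window means a title longer than 200 characters is silently cut, which matches the behavior of the Legato routine.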
That takes care of returning one parameter: the title or text of the link. (The next version of Legato, 1.2i, improves the function HTMLHeaderGetTitle to allow the web source to be supplied as a string.)
The citation can be the host (domain name) or a translated value. The GetURIHost function pulls the host from the URL. Assuming that succeeds, we can use the FindInTable function to look for a translation in a table loaded at the start of the run function:
int run(int f_id, string mode) {
    handle hEO;
    string code;
    string s1;
    int c_x, c_y, rc;

    if (mode != "preprocess") {
        return ERROR_NONE;
    }
    if (ArrayGetAxisDepth(hostlist) == 0) {
        hostlist = CSVReadTable(GetScriptFolder() + "Host List.csv");
    }
    . . .
If the hostlist variable is empty, the CSVReadTable function is used to load the list. The list is very simple, two columns per row: the host name and its citation text. In CSV form:

"www.youtube.com","[1]"
"www.africanews.com","Africanews"
"www.arrl.org","ARRL"
"blog.scoutingmagazine.org","Bryan On Scouting"
"hackaday.com","Hackaday"
"www.iaru-r2.org","IARU"
"www.radioworld.com","Radio World"
"www.southgatearc.org","SouthgateARC"
"www.wbrc.com","WBRC"
If you look closely, you will see the notation “[1]” for YouTube. This flags the host for special handling of the citation later on. When the host name is found in the first column, the translated value from the second column is set into the cite variable. If a translation cannot be located, the raw domain name is used for the citation.
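The lookup table and its fallback can be sketched in Python. The helper names `load_host_list` and `cite_for_url` are hypothetical, and Python's `csv` and `urllib.parse` modules stand in for CSVReadTable and GetURIHost. Note that `urlsplit(...).hostname` lowercases the host, so the table entries are assumed to be lowercase, as they are in the list above.

```python
import csv
import io
from typing import Dict
from urllib.parse import urlsplit


def load_host_list(csv_text: str) -> Dict[str, str]:
    """Parse two-column CSV rows (host, citation) into a lookup dict."""
    table: Dict[str, str] = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) >= 2:  # skip blank or malformed rows
            table[row[0].strip()] = row[1].strip()
    return table


def cite_for_url(url: str, hosts: Dict[str, str]) -> str:
    """Return the translated citation if the host is listed, else the raw host."""
    host = urlsplit(url).hostname or ""
    return hosts.get(host, host)
```

A dict gives a constant-time lookup, whereas the script searches the first column of a table; for a list of a dozen hosts either approach is fine.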
As mentioned, the notation [1] is used for YouTube. You can add your own exceptions, but this is mine for YouTube:
if (cite == "[1]") {
    ix = InString(title, " - YouTube");
    if (ix > 0) {
        title[ix] = 0;
    }
    s2 = ",\\\"author\\\":\\\"";
    ix = InString(page, s2);
    if (ix > 0) {
        ix += GetStringLength(s2);
        cite = GetStringSegment(page, ix, 40);
        ix = InString(cite, "\\\"");
        if (ix > 0) {
            cite[ix] = 0;
        }
    }
}
In this section, I am relying on YouTube's page format, so the vendor could change that format at any time and possibly break my code. For now, this is a decent approach.
First, we want to remove some text from the title. We search for the text “ - YouTube” and truncate the string at that point. Other sites may also format titles this way, but the pattern is prominent on YouTube. For the citation, I really want the author (or channel name) as it appears on the page, not “YouTube”. To find the channel, I am performing a sloppy string search within the obfuscated JavaScript for the author property. I tested about ten YouTube pages; the formatting seems consistent and has been working fine with this tool.

The code gets the length of the match string and then captures about 40 characters of the page, starting at the match position plus that length (I am assuming most channel names will fit in that space). Then I look for the end of the string literal in the object and truncate the string there. There are obvious faults with this logic: a channel name longer than 40 characters will simply be cut off. Again, we can cross that bridge in a later version. At the end of the process, we have the citation name.
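The YouTube post-processing can be sketched in Python as well. The marker strings below are the literal characters `,\"author\":\"` and `\"` that the Legato code searches for; the function name `youtube_fixup` is hypothetical, and the 40-character window is taken from the text. The sketch shares the same fragility: it depends entirely on YouTube's current page format.

```python
from typing import Tuple

AUTHOR_MARKER = ',\\"author\\":\\"'  # the literal text ,\"author\":\" in the page script
END_MARKER = '\\"'                   # the literal text \" that closes the value


def youtube_fixup(title: str, cite: str, page: str, max_len: int = 40) -> Tuple[str, str]:
    """Strip ' - YouTube' from the title and pull the channel name from the page."""
    cut = title.find(" - YouTube")
    if cut >= 0:
        title = title[:cut]
    pos = page.find(AUTHOR_MARKER)
    if pos >= 0:
        start = pos + len(AUTHOR_MARKER)
        cite = page[start:start + max_len]  # names longer than max_len are cut off
        end = cite.find(END_MARKER)
        if end >= 0:
            cite = cite[:end]
    return title, cite  # cite is unchanged if the marker was not found
```

If the marker is missing, the cite keeps its incoming value (“[1]” in the script), so a format change on YouTube's side shows up immediately in the dialog.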
Finally, we need to condition the title and create the complete citation with the date. My format places the citation in parentheses. Preparing the title is a little ugly, since Legato does not currently have a single routine that converts HTML entities directly to ANSI text; instead, EntitiesToUTF converts the entities to UTF-8, and UTFToAnsi then converts the result to ANSI.
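Here is a Python sketch of this final step under two assumptions: that “ANSI” means the Windows-1252 code page, and that DS_MMDDYYYY produces a date such as 02/28/2020. In Python, `html.unescape` handles the entity decoding in one call; characters with no Windows-1252 equivalent are replaced with `?`. The function name `finish_meta_data` is hypothetical.

```python
import html
from datetime import date
from typing import Optional, Tuple


def finish_meta_data(title: str, cite: str, today: Optional[date] = None) -> Tuple[str, str]:
    """Decode entities, force the title into Windows-1252, and wrap the cite."""
    title = html.unescape(title)  # e.g. &amp; -> &, &#8211; -> en dash
    # Assumed ANSI conversion: unmappable characters become '?'.
    title = title.encode("cp1252", errors="replace").decode("cp1252")
    if today is None:
        today = date.today()
    cite = "(" + cite + " " + today.strftime("%m/%d/%Y") + ")"
    return title, cite
```

Passing the date in explicitly keeps the function deterministic, which makes it easy to check in a test jig.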
Testing the Routine
Debugging code inside dialog procedures is a bit cumbersome because the IDE does not allow code stepping in the dialog context. When I write a routine like get_url_meta_data, I usually write a test jig. In the example below, the jig consists of the global declarations, the function prototype, and main; get_url_meta_data is copied unchanged from the script:
string hostlist[][2];                   // Site Look up
string url, title, cite;                // Working Dialog Info

int get_url_meta_data();

int main() {
    int rc;

    hostlist = CSVReadTable(GetScriptFolder() + "Host List.csv");
    url = "https://www.youtube.com/watch?v=wOikIWz4wgc&t=436s";
    rc = get_url_meta_data();
    AddMessage("Error : 0x%08X", rc);
    AddMessage("URL : %s", url);
    AddMessage("Title : %s", title);
    AddMessage("Cite : %s", cite);
    return 0;
}

int get_url_meta_data() {
    string page, s1, s2;
    int rc, ix;

    title = "";
    cite = "";
    page = HTTPGetString(url);
    if (page == "") {
        return GetLastError();
    }
    ix = FindInString(page, "<title>");
    if (ix > 0) {
        title = GetStringSegment(page, ix + 7, 200);
        ix = InString(title, "<");
        if (ix > 0) {
            title[ix] = 0;
        }
        title = TrimPadding(title);
    }
    s1 = GetURIHost(url);
    if (s1 != "") {
        ix = FindInTable(hostlist, s1);
        if (ix < 0) {
            cite = s1;
        }
        else {
            cite = hostlist[ix][1];
        }
    }
    if (cite == "[1]") {
        ix = InString(title, " - YouTube");
        if (ix > 0) {
            title[ix] = 0;
        }
        s2 = ",\\\"author\\\":\\\"";
        ix = InString(page, s2);
        if (ix > 0) {
            ix += GetStringLength(s2);
            cite = GetStringSegment(page, ix, 40);
            ix = InString(cite, "\\\"");
            if (ix > 0) {
                cite[ix] = 0;
            }
        }
    }
    title = EntitiesToUTF(title);
    title = UTFToAnsi(title);
    cite = "(" + cite + " " + GetLocalTime(DS_MMDDYYYY) + ")";
    return ERROR_NONE;
}
This allows for setting different URL values, testing, and stepping through the code as needed. When run, the jig reports the error code, URL, title, and citation through AddMessage.
The Complete Revised Script
Here is the completed revised script:
/************************************************/

string a_class;
string a_text;
string a_url;
string a_cite;
string hostlist[][2];                   // Site Look up
string url, title, cite;                // Working Dialog Info

/************************************************/

int setup() {
    string fnScript;
    string item[10];
    int rc;

    item["Code"] = "EXTENSION_QUICK_WEB_LINK";
    item["MenuText"] = "&Quick Web Link";
    item["Description"] = "<B>Quick Web Link</B>\r\rAdds a custom external web hypertext link.";
    item["Class"] = "DocumentExtension";
    rc = MenuFindFunctionID(item["Code"]);
    if (IsNotError(rc)) {
        return ERROR_NONE;
    }
    rc = MenuAddFunction(item);
    if (IsError(rc)) {
        return ERROR_NONE;
    }
    fnScript = GetScriptFilename();
    MenuSetHook(item["Code"], fnScript, "run");
    ArrayClear(item);
    item["KeyCode"] = "L_KEY_CONTROL";
    item["ChordAKey"] = "Q";
    item["FunctionCode"] = "EXTENSION_QUICK_WEB_LINK";
    item["Description"] = "Insert quick web link";
    QuickKeyRegister("Page View", item);
    return ERROR_NONE;
}

/************************************************/

int main() {
    string s1;
    int rc;

    s1 = GetScriptParent();
    if (s1 == "LegatoIDE") {
        rc = MenuFindFunctionID("EXTENSION_QUICK_WEB_LINK");
        if (IsError(rc)) {
            setup();
        }
        else {
            MenuDeleteHook("EXTENSION_QUICK_WEB_LINK");
            s1 = GetScriptFilename();
            MenuSetHook("EXTENSION_QUICK_WEB_LINK", s1, "run");
        }
        MessageBox('i', "Hook running on IDE");
    }
    return ERROR_NONE;
}

/************************************************/

int run(int f_id, string mode) {
    handle hEO;
    string code;
    string s1;
    int c_x, c_y, rc;

    if (mode != "preprocess") {
        return ERROR_NONE;
    }
    if (ArrayGetAxisDepth(hostlist) == 0) {
        hostlist = CSVReadTable(GetScriptFolder() + "Host List.csv");
    }
    hEO = GetActiveEditObject();
    if (hEO == NULL_HANDLE) {
        return ERROR_CONTEXT;
    }
    rc = GetSelectMode(hEO);
    if (rc != EDO_NOT_SELECTED) {
        MessageBox('x', "Deselect to use this function");
        return ERROR_CANCEL;
    }
    c_y = GetCaretYPosition(hEO);
    c_x = GetCaretXPosition(hEO);
    rc = DialogBox("LinkQuery01Dlg", "lq_");
    if (IsError(rc)) {
        return rc;
    }
    a_class = "page";
    code = "<a " +
           "class=\"" + a_class + "\" " +
           "target=\"_blank\" " +
           "href=\"" + a_url + "\"" +
           ">";
    code += ANSITextToXML(a_text);
    code += "</a>";
    code += " <font style=\"font-size: 8pt\">";
    code += ANSITextToXML(a_cite);
    code += "</font>";
    WriteSegment(hEO, code, c_x, c_y);
    return ERROR_NONE;
}

/************************************************/

#beginresource

#define ASL_URL                 201
#define ASL_URL_LOOKUP          202
#define ASL_TEXT                203
#define ASL_CITE                204

LinkQuery01Dlg DIALOGEX 0, 0, 280, 118
STYLE DS_3DLOOK | WS_POPUP | WS_VISIBLE | WS_CAPTION
CAPTION "Add Quick Web Link"
FONT 8, "MS Shell Dlg"
{
 CONTROL "Link To:", -1, "static", SS_LEFT | WS_CHILD | WS_VISIBLE, 6, 4, 30, 8, 0
 CONTROL "", -1, "static", SS_ETCHEDFRAME | WS_CHILD | WS_VISIBLE, 36, 9, 236, 1, 0
 CONTROL "&URL:", -1, "static", SS_LEFT | WS_CHILD | WS_VISIBLE | WS_GROUP, 12, 18, 30, 8
 CONTROL "", ASL_URL, "edit", ES_AUTOHSCROLL | WS_CHILD | WS_VISIBLE | WS_BORDER | WS_TABSTOP, 45, 16, 170, 12
 CONTROL "Look Up", ASL_URL_LOOKUP, "button", BS_PUSHBUTTON | BS_CENTER | WS_CHILD | WS_VISIBLE | WS_TABSTOP, 225, 16, 40, 12, 0
 CONTROL "Parameters:", -1, "static", SS_LEFT | WS_CHILD | WS_VISIBLE, 6, 37, 40, 8, 0
 CONTROL "", -1, "static", SS_ETCHEDFRAME | WS_CHILD | WS_VISIBLE, 46, 42, 226, 1, 0
 CONTROL "&Text:", -1, "static", SS_LEFT | WS_CHILD | WS_VISIBLE | WS_GROUP, 12, 53, 30, 8, 0
 CONTROL "", ASL_TEXT, "edit", ES_LEFT | ES_AUTOHSCROLL | WS_CHILD | WS_VISIBLE | WS_BORDER | WS_TABSTOP, 45, 51, 220, 12, 0
 CONTROL "&Cite:", -1, "static", SS_LEFT | WS_CHILD | WS_VISIBLE | WS_GROUP, 12, 69, 30, 8, 0
 CONTROL "", ASL_CITE, "edit", ES_LEFT | ES_AUTOHSCROLL | WS_CHILD | WS_VISIBLE | WS_BORDER | WS_TABSTOP, 45, 67, 220, 12, 0
 CONTROL "", -1, "static", SS_ETCHEDFRAME | WS_CHILD | WS_VISIBLE, 6, 90, 268, 1, 0
 CONTROL "OK", IDOK, "BUTTON", BS_PUSHBUTTON | BS_CENTER | WS_CHILD | WS_VISIBLE | WS_TABSTOP, 168, 96, 50, 14
 CONTROL "Cancel", IDCANCEL, "BUTTON", BS_PUSHBUTTON | BS_CENTER | WS_CHILD | WS_VISIBLE | WS_TABSTOP, 223, 96, 50, 14
}

#endresource

int get_url_meta_data() {
    string page, s1, s2;
    int rc, ix;

    title = "";
    cite = "";
    page = HTTPGetString(url);
    if (page == "") {
        return GetLastError();
    }
    ix = FindInString(page, "<title>");
    if (ix > 0) {
        title = GetStringSegment(page, ix + 7, 200);
        ix = InString(title, "<");
        if (ix > 0) {
            title[ix] = 0;
        }
        title = TrimPadding(title);
    }
    s1 = GetURIHost(url);
    if (s1 != "") {
        ix = FindInTable(hostlist, s1);
        if (ix < 0) {
            cite = s1;
        }
        else {
            cite = hostlist[ix][1];
        }
    }
    if (cite == "[1]") {
        ix = InString(title, " - YouTube");
        if (ix > 0) {
            title[ix] = 0;
        }
        s2 = ",\\\"author\\\":\\\"";
        ix = InString(page, s2);
        if (ix > 0) {
            ix += GetStringLength(s2);
            cite = GetStringSegment(page, ix, 40);
            ix = InString(cite, "\\\"");
            if (ix > 0) {
                cite[ix] = 0;
            }
        }
    }
    title = EntitiesToUTF(title);
    title = UTFToAnsi(title);
    cite = "(" + cite + " " + GetLocalTime(DS_MMDDYYYY) + ")";
    return ERROR_NONE;
}

void lq_action(int c_id, int c_ac) {
    int rc;

    if (c_id == ASL_URL_LOOKUP) {
        url = EditGetText(ASL_URL);
        if (url != "") {
            rc = get_url_meta_data();
            if (IsNotError(rc)) {
                EditSetText(ASL_TEXT, title);
                EditSetText(ASL_CITE, cite);
            }
        }
    }
}

int lq_validate() {
    string parts[];
    string s1;
    int rc;

    a_url = EditGetText(ASL_URL, "URL", EGT_FLAG_REQUIRED);
    rc = GetLastError();
    if (IsError(rc)) {
        return rc;
    }
    parts = GetURIComponents(a_url);
    s1 = MakeLowerCase(parts["scheme"]);
    if ((s1 != "http:") && (s1 != "https:")) {
        MessageBox('x', "Need an HTTP or HTTPS scheme on link.");
        return ERROR_SOFT | ASL_URL;
    }
    a_text = EditGetText(ASL_TEXT, "Text", EGT_FLAG_REQUIRED);
    rc = GetLastError();
    if (IsError(rc)) {
        return rc;
    }
    a_cite = EditGetText(ASL_CITE, "Cite", EGT_FLAG_REQUIRED);
    rc = GetLastError();
    if (IsError(rc)) {
        return rc;
    }
    return ERROR_NONE;
}
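The markup assembled in run and the scheme check in lq_validate can also be sketched in Python. This is an illustration, not the script: `validate_url` and `build_link` are hypothetical names, and `xml.sax.saxutils.escape` stands in for ANSITextToXML. Unlike the original, the sketch also escapes the URL inside the href attribute so that query strings containing `&` stay well-formed.

```python
from typing import Optional
from urllib.parse import urlsplit
from xml.sax.saxutils import escape


def validate_url(url: str) -> Optional[str]:
    """Return an error message unless the URL uses the http or https scheme."""
    if url == "":
        return "URL is required."
    if urlsplit(url).scheme.lower() not in ("http", "https"):
        return "Need an HTTP or HTTPS scheme on link."
    return None


def build_link(url: str, text: str, cite: str, a_class: str = "page") -> str:
    """Assemble the anchor plus small-print citation in the same shape as run()."""
    href = escape(url, {'"': "&quot;"})  # also escape quotes inside the attribute
    return (
        '<a class="' + a_class + '" target="_blank" href="' + href + '">'
        + escape(text) + "</a>"
        + ' <font style="font-size: 8pt">' + escape(cite) + "</font>"
    )
```

Keeping validation separate from markup generation mirrors the script, where lq_validate runs when the dialog closes and run only formats values that have already passed the check.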
Conclusion
I have been using this since the last blog post and find that it works quite well. This demonstrates how a specialized tool can be added to the application to perform tasks that would otherwise take many steps or be very time-consuming.
Scott Theis is the President of Novaworks and the principal developer of the Legato scripting language. He has extensive expertise with EDGAR, HTML, XBRL, and other programming languages.