LDC #135: How To Automate Inserting Pictures into an HTML File

Friday, May 10, 2019

Whenever a task has to be done repeatedly and there’s not much thought that needs to go into running the task, it’s generally a great candidate for automating using a Legato script. I’ve been asked a few times in the past month or so how to quickly and easily insert multiple images into an HTML file. Since that’s a relatively easy task that just requires a lot of the same operation over and over again, I thought it would make a great example of automation for our Legato blog. So this week, we’ll take a look at a simple script that takes a folder as an input, scans it for images, and then inserts all of the images it finds into an HTML file.

Let’s start by taking a look at the defines that will be needed for this file:


#define PBNUM   "%%%PBNUM%%%"
#define FNAME   "%%%FN%%%"

#define EMPTY_PARA "<P STYLE=\"margin: 0\">&nbsp;</P>\r\n"

#define IMG_1   "<P STYLE=\"margin:0\"><img src=\"%%%FN%%%\"></P>\r\n"
#define IMG_2   "<!-- Field: Page; Sequence: %%%PBNUM%%% -->\r\n"
#define IMG_3   "  <DIV STYLE=\"margin-bottom: 6pt\"><TABLE CELLPADDING=\"0\" CELLSPACING=\"0\" STYLE=\"border-collapse: collapse; width: 100%; font-size: 10pt\"><TR STYLE=\"vertical-align: top; text-align: left\"><TD STYLE=\"width: 33%\">&nbsp;</TD><TD STYLE=\"width: 34%; text-align: center\">-<!-- Field: Sequence; Type: Arabic; Name: PageNo -->%%%PBNUM%%%<!-- Field: /Sequence -->-</TD><TD STYLE=\"width: 33%; text-align: right\">&nbsp;</TD></TR></TABLE></DIV>\r\n"
#define IMG_4   "  <DIV STYLE=\"page-break-before: always; margin-top: 6pt\"><TABLE CELLPADDING=\"0\" CELLSPACING=\"0\" STYLE=\"border-collapse: collapse; width: 100%; font-size: 10pt\"><TR STYLE=\"vertical-align: top; text-align: left\"><TD STYLE=\"width: 100%\">&nbsp;</TD></TR></TABLE></DIV>\r\n"
#define IMG_5   "  <!-- Field: /Page -->\r\n\r\n"

The defines PBNUM and FNAME are the page break number and the file name, respectively. For each image file we insert, we’ll need to do a find/replace operation on these values with the correct values for that image. The EMPTY_PARA define represents an empty paragraph in GoFiler. If we want to replace an empty paragraph with other content, we can replace this defined value with new content. The defines IMG_1 through IMG_5 are just snippets of HTML code. I could have put them all into a single define, though I felt it made it easier to understand when they were broken out like this. Each represents a single line of HTML we’re going to insert for each picture, starting with the image and then including a page break.
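To make the placeholder idea concrete, here is a minimal sketch of the substitution applied to a single snippet. It relies only on ReplaceInString and FormatString as they are called later in this script. The variable name line and the page number 3 are made up for illustration, and the 'x' message box type is used simply because it is the one that appears elsewhere in this post.

```legato
#define PBNUM   "%%%PBNUM%%%"
#define IMG_2   "<!-- Field: Page; Sequence: %%%PBNUM%%% -->\r\n"

int main(){

    string              line;

    // swap the page number placeholder for a real value (3 is arbitrary)
    line = ReplaceInString(IMG_2,PBNUM,FormatString("%d",3));

    // line is expected to hold: <!-- Field: Page; Sequence: 3 -->
    MessageBox('x',"%s",line);
    return ERROR_NONE;
    }
```

The same pattern applies to FNAME in IMG_1, with the image name as the replacement value.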

This script is pretty simple. It only includes four functions. Of these, setup and main are the same as in almost every other script that has been demonstrated on this blog, so they don’t need to be covered in further detail. The edgarize_filename function is also something that has been used in previous blog scripts, so it doesn’t need to be described again either. In summary, all this function does is take a file name string as an input and return an EDGAR compliant version of that filename. Our main function here that does all the work is, as usual, the run function, so let’s take a closer look at that.


int run(int f_id, string mode){

    string              src;
    string              file;
    string              o_fname;
    string              new_fname;
    string              images[];
    string              f_contents;
    int                 rx;
    int                 ix;
    int                 size;
    int                 rc;

    // only run once
    if(mode!="preprocess"){
      return ERROR_NONE;
      }

    // get the folder we're going to pull images from
    src = BrowseFolder("Select Source Folder");
    rc = GetLastError();
    if(rc!=ERROR_NONE){
      return rc;
      }

    // get the images
    images = EnumerateFiles(AddPaths(src,"*.jpg;*.gif"),FOLDER_LOAD_NO_FOLDER_NAV);
    size = ArrayGetAxisDepth(images);

The run function starts off with the normal variable declarations. Then it checks whether its mode is preprocess and returns immediately if it is not, which ensures the script doesn’t run twice (otherwise it would run once in preprocess mode and again in postprocess). Next, we can invoke the BrowseFolder function to get a folder to scan for images. We need to check the return code of that function immediately in case the user cancelled the operation. If the function returned anything other than ERROR_NONE, we can return here, because it means there was an error or the user cancelled the operation. In either case, we cannot continue.

After we have a valid folder, we can get all the images in that folder with the EnumerateFiles function and ascertain the number of images in the folder with the ArrayGetAxisDepth function.


    // if there are no images
    if(size==0){
      MessageBox('x',"No images in target folder found.");
      return ERROR_NONE;
      }

    // get the name of the file we're saving to
    BrowseAddSaveScope(src);
    file = BrowseSaveFile("Save File", "EDGAR HTM File|*.htm",src);
    rc = GetLastError();
    if(rc!=ERROR_NONE){
      BrowseAddSaveScope("");
      return rc;
      }
    BrowseAddSaveScope("");

If we have no images, we need to display an error and then stop the script because there’s nothing else to do. If we have images, we can have the user create a new HTML file in that folder. Using the BrowseAddSaveScope function here is a good idea, since we want the user to save the HTML file in the same folder as the images. This way, the user cannot accidentally pick a different folder. Of course the user could still put the HTML file in a sub folder and cause problems, but adding error handling for every single contingency would complicate the code even further, and this is supposed to be a quick script. Putting a script like this into production necessitates better handling of these error situations. If the user cancelled, or there was an error, we need to reset the save scope and return. Otherwise, we can simply reset the save scope and move on.
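For anyone who does want to guard against the subfolder case, one possible check is sketched below. The helper name is_in_folder is hypothetical and not part of the original script. It uses only AddPaths and GetFilename, which the script already calls, and it assumes a plain string comparison is good enough; differences in letter case or slash direction between the two paths could produce a false mismatch, so treat it as a starting point rather than a finished solution.

```legato
// hypothetical helper, not part of the original script
int is_in_folder(string folder, string file){

    // rebuild the path from the folder and the bare file name; if the
    // result matches the full path, the file sits directly in that folder
    if(AddPaths(folder,GetFilename(file))==file){
      return 1;
      }
    return 0;
    }

// possible use inside run, after the save scope has been reset:
//
//     if(is_in_folder(src,file)==0){
//       MessageBox('x',"Please save the HTML file in %s",src);
//       return ERROR_NONE;
//       }
```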


    // create the file template
    RunMenuFunction("FILE_NEW_HTML");
    RunMenuFunction("FILE_SAVE_AS","Filename:"+file);
    rc = GetLastError();
    if(IsError(rc)){
      MessageBox('x',"Cannot save file %s, error %0x",file,rc);
      return rc;
      }
    RunMenuFunction("FILE_CLOSE");
    f_contents = FileToString(file);

Now that we have a list of images and a place to save our HTML file, we need to actually create it. Rather than manually creating a file and writing the basic HTML code into it, we can just use GoFiler to do it. The RunMenuFunction function is used here to create a new HTML file before saving the file as whatever name the user selected. If the file cannot be saved, we need to display an error message and exit the script. If the file was saved correctly, we can close the file and read the contents of the file to memory so we can modify it before writing it back out.


    // for each image, make sure name is EDGAR compliant.
    rc = YesNoBox('q',"Conform names to EDGAR compliant standard?");
    if(rc == IDYES){
      for(ix=0;ix<size;ix++){
        o_fname = GetFilename(images[ix]);
        new_fname = edgarize_filename(o_fname);
        new_fname = AddPaths(src,new_fname);
        rc = RenameFile(AddPaths(src,images[ix]),new_fname);
        if(IsError(rc)){
          MessageBox('x',"Cannot rename file %s",o_fname);
          }
        else{
          images[ix] = GetFilename(new_fname);
          }
        }
      }
    SortList(images,SORT_ALPHA_NUMERIC);

While writing this, I realized that the images might come from a system that doesn’t produce EDGAR compliant names, so I added this section to compensate by renaming images. Before proceeding, it’s important to ask the user if they want to rename images, since this is going to change the names of files on their machine. If the user presses “Yes”, we can iterate over each image, and for each, we can run the edgarize_filename function to build a new name. After that, we prepend the path of the source folder to it to get the full path for our renamed image. Then we can use the RenameFile function to change the name of our image to our new EDGAR compliant name. The edgarize_filename function may actually change names that don’t need to be changed. For example, even though “img_001.jpg” is a perfectly compliant file name, our script will still change it to “img001.jpg” because it’s far easier to guarantee an EDGAR compliant name if it has no punctuation at all. This can potentially cause issues if you have “image_001.jpg” and “image-001.jpg” in the same folder though, since GoFiler will try to rename both to the same name. The second rename will fail, and the user will see an error message, so he or she can always go in afterwards and manually correct the image name.

After trying to rename the image, we need to set the new image name into our images array if it was successful. Once we loop through all images, we can sort our images array in ascending order to ensure images like “image001.jpg” come before images like “image002.jpg”. If the user wants some different ordering, this script could be modified to offer sorting options here, but for a basic, quick script this is sufficient to do the job.
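If name collisions such as “image_001.jpg” and “image-001.jpg” are a real concern, one option is to compute every target name up front and look for duplicates before renaming anything. The fragment below is only a sketch of that idea and is meant to sit inside run, just before the rename loop. The variables targets, jx, and clashes are hypothetical additions that would need to be declared with the other variables at the top of the function.

```legato
    // extra declarations assumed at the top of run (hypothetical names)
    string              targets[];
    int                 jx;
    int                 clashes;

    // build each target name, then compare it with the names before it
    clashes = 0;
    for(ix=0;ix<size;ix++){
      targets[ix] = edgarize_filename(GetFilename(images[ix]));
      for(jx=0;jx<ix;jx++){
        if(targets[jx]==targets[ix]){
          clashes++;
          }
        }
      }

    // warn before any file on disk has been touched
    if(clashes>0){
      MessageBox('x',"%d image name collisions would occur after renaming.",clashes);
      }
```

Because nothing is renamed until the check has finished, the user can fix the source folder and run the script again without having to untangle a half-renamed set of files.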


    // write the file
    for(ix=0;ix<size;ix++){
      f_contents = ReplaceInString(f_contents,EMPTY_PARA,IMG_1+IMG_2+IMG_3+IMG_4+IMG_5+EMPTY_PARA);
      f_contents = ReplaceInString(f_contents,PBNUM,FormatString("%d",ix+1));
      f_contents = ReplaceInString(f_contents,FNAME,images[ix]);
      }

    // write the output
    StringToFile(f_contents,file);
    RunMenuFunction("FILE_OPEN","Filename:"+file);
    return ERROR_NONE;
    }

The final step here, after renaming all our images and sorting them, is to insert the contents into our output file. For each image, we can run three replace in string operations. The first operation will replace our EMPTY_PARA with our image tag defines, followed by another empty paragraph. This is important, since the next image will need that empty paragraph at the end to replace, so its own code can be added. After we’ve inserted the code for our image and the page break, we can replace PBNUM with a string representation of the current page number (which is just the index of the image +1 to account for the zero indexed nature of arrays in Legato). After, we can replace FNAME with the name of the image to complete the image link. Once this is complete for each image, we can simply write the adjusted contents of our file back out to the file on the file system, open it for review, and then the script can exit.

Doing multiple find and replace operations like this is pretty inefficient, even though it will work fine. It’s important to consider the application and the maximum file size we’re working with. If we had a 300 MB file, doing this many find/replace operations on it could take a very long time. However, the maximum number of images in an EDGAR filing is 500 images. Adding each image increases the size of our file by 5 lines, so the maximum size of a file our script can create is ~2500 lines. This is still going to be an extremely small document, so in this case, efficiency is not as important.

Still, it’s important to consider the application of the script you’re writing, and consider if you will be working with larger files or not. If large files are a concern, then using more efficient means of writing out data, like perhaps a string pool with data appended to it, would be a better idea.
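As a rough illustration of the "build it once" approach, the fragment below assembles the HTML for all images in a separate string and then touches the template a single time. It uses plain string concatenation rather than a string pool, and it is meant to replace the "write the file" loop inside run. The variable block is a hypothetical addition that would need to be declared with the other variables, and the sketch assumes ReplaceInString behaves as it does in the original loop.

```legato
    // extra declaration assumed at the top of run (hypothetical name)
    string              block;

    // build the HTML for every image in one pass, filling in the
    // placeholders on the small snippets instead of the whole file
    block = "";
    for(ix=0;ix<size;ix++){
      block = block + ReplaceInString(IMG_1,FNAME,images[ix]);
      block = block + ReplaceInString(IMG_2+IMG_3,PBNUM,FormatString("%d",ix+1));
      block = block + IMG_4 + IMG_5;
      }

    // touch the template only once
    f_contents = ReplaceInString(f_contents,EMPTY_PARA,block+EMPTY_PARA);
```

The replacements here run against a few hundred characters each time instead of the full file contents, so the cost per image no longer grows with the size of the document.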

While writing this, I thought of quite a few different improvements that could be added, several of which I mentioned in the commentary above. However, as always we need to be mindful about the cost/benefit of adding these features. Would it be nice if it forced the user to save the file to the current folder instead of allowing users to dump the file to a subfolder? Sure. It would require multiple additional conditional checks though, which would take more time to write, and the current, simpler approach is probably good enough for most users. We could do things like adding more sorting modes or implementing a UI to pick only certain images out of the folder to insert. However, all of these things would drastically increase the time needed to write a script like this, and that somewhat defeats the purpose of doing a quick, one-off Legato script to solve a problem. If you take 10 hours to solve a problem that saves 30 seconds, it’s going to take a very long time indeed for that 10 hours to actually pay for itself.

Here’s the full version of the script without any commentary:


/*
 * Insert Pictures To File
 *
 * Author: Steven Horowitz
 */


#define PBNUM   "%%%PBNUM%%%"
#define FNAME   "%%%FN%%%"

#define EMPTY_PARA "<P STYLE=\"margin: 0\">&nbsp;</P>\r\n"

#define IMG_1   "<P STYLE=\"margin:0\"><img src=\"%%%FN%%%\"></P>\r\n"
#define IMG_2   "<!-- Field: Page; Sequence: %%%PBNUM%%% -->\r\n"
#define IMG_3   "  <DIV STYLE=\"margin-bottom: 6pt\"><TABLE CELLPADDING=\"0\" CELLSPACING=\"0\" STYLE=\"border-collapse: collapse; width: 100%; font-size: 10pt\"><TR STYLE=\"vertical-align: top; text-align: left\"><TD STYLE=\"width: 33%\">&nbsp;</TD><TD STYLE=\"width: 34%; text-align: center\">-<!-- Field: Sequence; Type: Arabic; Name: PageNo -->%%%PBNUM%%%<!-- Field: /Sequence -->-</TD><TD STYLE=\"width: 33%; text-align: right\">&nbsp;</TD></TR></TABLE></DIV>\r\n"
#define IMG_4   "  <DIV STYLE=\"page-break-before: always; margin-top: 6pt\"><TABLE CELLPADDING=\"0\" CELLSPACING=\"0\" STYLE=\"border-collapse: collapse; width: 100%; font-size: 10pt\"><TR STYLE=\"vertical-align: top; text-align: left\"><TD STYLE=\"width: 100%\">&nbsp;</TD></TR></TABLE></DIV>\r\n"
#define IMG_5   "  <!-- Field: /Page -->\r\n\r\n"
#define IMG_6   "<P STYLE=\"margin: 0\">&nbsp;</P>\r\n"


        int             setup();
        int             run(int f_id, string mode);
        string          edgarize_filename(string filename);

int  main(){

    // if run from development mode
    if(GetScriptParent()=="LegatoIDE"){
      run(0,"preprocess");
      setup();
      }
    return ERROR_NONE;
    }

// setup function
int setup(){

    string              fn;
    string              props[];

    // set up hook codes
    props["Code"] = "INSERT_PICS_TO_FILE";
    props["MenuText"] = "Insert Images Into HTML";
    props["Description"] = "Insert a folder of images into a new HTML file.";
    fn = GetScriptFilename();

    // add hooks to menu
    MenuAddFunction(props);
    MenuSetHook(props["Code"],fn,"run");
    return ERROR_NONE;
    }

// run function
int run(int f_id, string mode){

    string              src;
    string              file;
    string              o_fname;
    string              new_fname;
    string              images[];
    string              f_contents;
    int                 rx;
    int                 ix;
    int                 size;
    int                 rc;

    // only run once
    if(mode!="preprocess"){
      return ERROR_NONE;
      }

    // get the folder we're going to pull images from
    src = BrowseFolder("Select Source Folder");
    rc = GetLastError();
    if(rc!=ERROR_NONE){
      return rc;
      }

    // get the images
    images = EnumerateFiles(AddPaths(src,"*.jpg;*.gif"),FOLDER_LOAD_NO_FOLDER_NAV);
    size = ArrayGetAxisDepth(images);

    // if there are no images
    if(size==0){
      MessageBox('x',"No images in target folder found.");
      return ERROR_NONE;
      }

    // get the name of the file we're saving to
    BrowseAddSaveScope(src);
    file = BrowseSaveFile("Save File", "EDGAR HTM File|*.htm",src);
    rc = GetLastError();
    if(rc!=ERROR_NONE){
      BrowseAddSaveScope("");
      return rc;
      }
    BrowseAddSaveScope("");

    // create the file template
    RunMenuFunction("FILE_NEW_HTML");
    RunMenuFunction("FILE_SAVE_AS","Filename:"+file);
    rc = GetLastError();
    if(IsError(rc)){
      MessageBox('x',"Cannot save file %s, error %0x",file,rc);
      return rc;
      }
    RunMenuFunction("FILE_CLOSE");
    f_contents = FileToString(file);

    // for each image, make sure name is EDGAR compliant.
    rc = YesNoBox('q',"Conform names to EDGAR compliant standard?");
    if(rc == IDYES){
      for(ix=0;ix<size;ix++){
        o_fname = GetFilename(images[ix]);
        new_fname = edgarize_filename(o_fname);
        new_fname = AddPaths(src,new_fname);
        rc = RenameFile(AddPaths(src,images[ix]),new_fname);
        if(IsError(rc)){
          MessageBox('x',"Cannot rename file %s",o_fname);
          }
        else{
          images[ix] = GetFilename(new_fname);
          }
        }
      }
    SortList(images,SORT_ALPHA_NUMERIC);

    // write the file
    for(ix=0;ix<size;ix++){
      f_contents = ReplaceInString(f_contents,EMPTY_PARA,IMG_1+IMG_2+IMG_3+IMG_4+IMG_5+EMPTY_PARA);
      f_contents = ReplaceInString(f_contents,PBNUM,FormatString("%d",ix+1));
      f_contents = ReplaceInString(f_contents,FNAME,images[ix]);
      }

    // write the output
    StringToFile(f_contents,file);
    RunMenuFunction("FILE_OPEN","Filename:"+file);
    return ERROR_NONE;
    }

string edgarize_filename(string filename){

    string              firstchar;
    string              ext;

    ext = GetExtension(filename);
    ext = ChangeCase(ext,CASE_CHANGE_LOWER);
    filename = ClipFileExtension(filename);
    filename = ChangeCase(filename,CASE_CHANGE_LOWER);
    filename = ConvertNoPunctuation(filename);
    filename = ConvertNoSpaces(filename);
    if (GetStringLength(filename)>0){
      firstchar = GetStringSegment(filename,0,1);
      if((GetWordType(firstchar)&WT_TYPE_ITEM_MASK)==WT_TYPE_NUMBER){
        filename = "s"+filename;
        }
      }
    return filename+ext;
    }

Steven Horowitz has been working for Novaworks for over five years as a technical expert with a focus on EDGAR HTML and XBRL. Since the creation of the Legato language in 2015, Steven has been developing scripts to improve the GoFiler user experience. He is currently working toward a Bachelor of Sciences in Software Engineering at RIT and MCC.