Before we begin, I would like to take a second to get some of the legal discussion out of the way. Legato allows you to automate the conversion process. However, this feature is not intended to be used outside of the normal licensing of GoFiler. Use of the conversion features in no way abrogates the per-workstation licensing requirements of the application or its components. Use of the conversion SDK functions to serve commercial production services, web services, or large-scale filing is prohibited under the standard license agreement unless covered by a separate and specific license agreement. Essentially, this is provided as a tool to make your own conversion faster, not as a way to build the conversion into your own conversion software.
With that out of the way, let’s dive into what you can do with the ConvertFile function. This is the syntax:
int = ConvertFile ( string source, string destination, [handle hLog], [string options] );
This is an extremely flexible function. All it requires is the location of the file to convert and the location where Legato should write the converted file. The function does all of the heavy lifting for you. It returns an int: ERROR_NONE on success or a formatted error code on failure.
The ConvertFile function will use GoFiler’s conversion functions to convert a number of different file types to HTML or ASCII text files. Here is a short list of the conversions that you can use:
- CSV to HTML
- CSV to Text
- DOC(X) to HTML
- DOC(X) to Text
- HTML to HTML
- RTF to Text
- XLS(X) to HTML
- XLS(X) to Text
- Text to HTML
- PDF to HTML
- PPT to HTML
You tell the function which conversion you want by giving it two filenames, including their extensions. The function does the rest for you, including figuring out what conversion mode is necessary.
The conversion functions in GoFiler may use a number of outside programs under the hood in order to perform the conversion. For example, converting Word or Excel documents into HTML uses OLE automation to open these programs on the computer and get information out of them. This means that Microsoft Office must be installed for those conversions to work. If the OLE automation fails, an error will be returned and the conversion will fail.
An important detail to note is that if a file already exists at the destination path, the function will overwrite it without asking. If this is going to be a problem, you will have to check whether the file exists before you convert.
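As a rough sketch of guarding against an accidental overwrite, the snippet below looks for the destination file before converting. It uses only functions that appear in this post. It assumes that EnumerateFiles accepts an exact filename (no wildcard) as its pattern and returns one entry when that file exists; this is an assumption, not confirmed behavior. The D:\Test paths are illustrative.

```legato
string source;
string destination;
string existing[];
int rc;

source = "D:\\Test\\Test.docx";
destination = "D:\\Test\\Test.htm";

// Assumption: an exact filename works as the EnumerateFiles pattern and
// yields one entry if the file exists. The flag mirrors the example script.
existing = EnumerateFiles(destination, FOLDER_LOAD_FOLDER_NAMES);

if (ArrayGetAxisDepth(existing) > 0) {
    // Something already exists at the destination, so leave it alone.
    AddMessage("Skipped %s because %s already exists.", source, destination);
    }
else {
    rc = ConvertFile(source, destination);
    if (rc != ERROR_NONE) {
        AddMessage("File %s failed to convert with error code 0x%08X.", source, rc);
        }
    }
```

If the SDK version you are using offers a dedicated file-existence test, that would be a cleaner choice than enumerating.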
The optional parameters are a handle to a log object where conversion details will be put and a string of options for the conversion. If a log is not given to the function, the default log will be used instead. The options string can be a set of property: value; parameters that will override the application’s current conversion settings for the properties that are specified. If you do not need to override any of the conversion settings that are currently set in the application preferences, you can leave this parameter blank. For a full list of properties that can be set please refer to the Legato Script Reference, Chapter 22.1.2: Conversion Options and Parameters.
The final note about the ConvertFile function before I show an example script is that the available conversions are limited by the product that you are using. If the application does not support the conversion mode that is being requested, ERROR_UNSUPPORTED (0x86000000) will be returned. This will also happen if the pair of filenames you pass does not correspond to any conversion mode. An example of this would be if you ran:
ConvertFile("D:\\Test\\Test.docx", "D:\\Test\\Test.pdf");
Executing this code will return ERROR_UNSUPPORTED, because converting DOCX to PDF is not one of the conversions listed above. A product limitation produces the same error. For example, GoXBRL cannot convert PDF to HTML. Check the returned error code carefully so that you know what has occurred before continuing onward with your script.
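A minimal sketch of telling an unsupported conversion apart from other failures might look like the following. It assumes the unsupported case comes back as exactly ERROR_UNSUPPORTED, with no additional detail folded into the formatted code; if your results differ, log the raw value and compare.

```legato
int rc;

// The paths reuse the D:\Test example; DOCX to HTML is on the list above.
rc = ConvertFile("D:\\Test\\Test.docx", "D:\\Test\\Test.htm");

if (rc == ERROR_NONE) {
    AddMessage("Conversion succeeded.");
    }
else {
    if (rc == ERROR_UNSUPPORTED) {
        // Assumption: the code matches ERROR_UNSUPPORTED exactly.
        AddMessage("This conversion is not supported here (0x%08X).", rc);
        }
    else {
        // Any other failure, such as OLE automation failing.
        AddMessage("Conversion failed with error code 0x%08X.", rc);
        }
    }
```

Separating the two cases lets a script skip files it can never convert while still reporting genuine failures.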
Now let’s take a look at an example script that I have put together to show off the function in action. This is a fairly simple script where a user can choose a local folder and the script will convert all .docx files to HTML documents.
/*****************************************************************************************************************
    Conversion Example
    ------------------
    Revision:
        05-17-19  JCK  Initial creation
    Notes:
        -
    (c) 2019 Novaworks, LLC. All Rights Reserved.
*****************************************************************************************************************/

int rc;
string folder;
string files[];
string newname;
int numfiles;
int count;

folder = BrowseFolder("Select Folder to Convert");
rc = GetLastError();
files = EnumerateFiles(folder+"\\*.docx", FOLDER_LOAD_FOLDER_NAMES);
numfiles = ArrayGetAxisDepth(files);
while (numfiles == 0 && rc == ERROR_NONE) {
    MessageBox("No files found to convert. \r\nSelect a folder with Word documents.");
    folder = BrowseFolder("Select Folder to Convert", folder);
    rc = GetLastError();
    files = EnumerateFiles(folder+"\\*.docx", FOLDER_LOAD_FOLDER_NAMES);
    numfiles = ArrayGetAxisDepth(files);
    }

ProgressOpen("Converting Files");
count = 0;
while (count < numfiles) {
    ProgressUpdate(count+1, numfiles);
    ProgressSetStatus(1, "Converting file %d of %d", count+1, numfiles);
    ProgressSetStatus(2, "%s", files[count]);
    newname = ReplaceInString(files[count], ".docx", ".htm", false);
    if (newname != "") {
        rc = ConvertFile(folder+files[count], folder+newname);
        if (rc == ERROR_NONE) {
            AddMessage("File %s successfully converted.", files[count]);
            }
        else {
            AddMessage("File %s failed to convert with error code 0x%08X.", files[count], rc);
            }
        }
    count++;
    }
ProgressClose();
This script can be broken up into two halves: getting the folder from the user and converting all of the documents. As always, we start the script by declaring some variables and then we do this:
folder = BrowseFolder("Select Folder to Convert");
rc = GetLastError();
files = EnumerateFiles(folder+"\\*.docx", FOLDER_LOAD_FOLDER_NAMES);
numfiles = ArrayGetAxisDepth(files);
while (numfiles == 0 && rc == ERROR_NONE) {
    MessageBox("No files found to convert. \r\nSelect a folder with Word documents.");
    folder = BrowseFolder("Select Folder to Convert", folder);
    rc = GetLastError();
    files = EnumerateFiles(folder+"\\*.docx", FOLDER_LOAD_FOLDER_NAMES);
    numfiles = ArrayGetAxisDepth(files);
    }
We ask the user for a folder in which we can search for files to convert. When one is selected, we enumerate the folder, searching for any files that have a “.docx” extension using the EnumerateFiles function, and we then see how many of those files were found. If the number is zero and the user clicked on the “OK” button (if the user clicks on “Cancel”, rc will be ERROR_CANCEL), we show the user a message box asking them to select another folder. We can display the same dialog again for the user to do that, this time opening to the location that was selected last time, so if the user accidentally clicked on a folder one level above the intended one, it is easy to select the correct location. This loop will continue to run until a folder containing .docx files is selected or the user cancels the dialog. We then take the array of files we found for conversion and use it as the basis for our next loop:
ProgressOpen("Converting Files");
count = 0;
while (count < numfiles) {
    ProgressUpdate(count+1, numfiles);
    ProgressSetStatus(1, "Converting file %d of %d", count+1, numfiles);
    ProgressSetStatus(2, "%s", files[count]);
    newname = ReplaceInString(files[count], ".docx", ".htm", false);
    if (newname != "") {
        rc = ConvertFile(folder+files[count], folder+newname);
        if (rc == ERROR_NONE) {
            AddMessage("File %s successfully converted.", files[count]);
            }
        else {
            AddMessage("File %s failed to convert with error code 0x%08X.", files[count], rc);
            }
        }
    count++;
    }
ProgressClose();
We open a progress box with the ProgressOpen function as conversions can take some time and we do not want to leave the user hanging. We then enter a loop going through each file in the array. At the beginning of each time through the loop, we update the progress bar with the ProgressUpdate function and set the status with the ProgressSetStatus function to report not only how far we are through the process but also the current filename being converted. Files can take a longer or shorter time to convert depending on the size and complexity, so we can provide the user with exact information about which files are taking up processing cycles.
Our next step is to figure out what the new name of the file should be. In this case, since we are converting from a Word document to HTML, we can search for “.docx” in the name string and replace it with “.htm” with the ReplaceInString function. We also make sure that this search is case-insensitive so that we don’t miss a file. After that, we do our conversion with the ConvertFile function. We note the outcome in the log either way: a success message if the conversion returns ERROR_NONE, or the error code if it returns anything else. We then repeat this code for every file that we found in the folder, finish up our loop, and close our progress window. The default log will be shown after the script ends.
You’ll notice that this script does not include a main function or a hook like a lot of our examples. In this case I was writing an example that could be used as a separate function and included as part of a larger program, such as a start-of-day routine that runs before someone cleans up all of these newly converted documents.
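To illustrate that idea, here is one way the conversion loop could be wrapped in a function and called from a larger script. The function name convert_docx_folder and its failure-count return value are hypothetical choices for this sketch. The body reuses only calls from the script above, and, like that script, it assumes the folder string combines with the filenames as shown there.

```legato
// Hypothetical helper: converts every .docx file in a folder to HTML.
// Returns the number of files that failed to convert.
int convert_docx_folder(string folder) {

    string files[];
    string newname;
    int numfiles;
    int count;
    int failures;
    int rc;

    files = EnumerateFiles(folder+"\\*.docx", FOLDER_LOAD_FOLDER_NAMES);
    numfiles = ArrayGetAxisDepth(files);
    failures = 0;
    count = 0;
    while (count < numfiles) {
        newname = ReplaceInString(files[count], ".docx", ".htm", false);
        rc = ConvertFile(folder+files[count], folder+newname);
        if (rc != ERROR_NONE) {
            AddMessage("File %s failed to convert with error code 0x%08X.", files[count], rc);
            failures++;
            }
        count++;
        }
    return failures;
    }

// Sketch of a caller: only convert if the user picked a folder.
int main() {

    string folder;
    int failures;

    folder = BrowseFolder("Select Folder to Convert");
    if (GetLastError() == ERROR_NONE) {
        failures = convert_docx_folder(folder);
        AddMessage("Conversion finished with %d failure(s).", failures);
        }
    return ERROR_NONE;
    }
```

Returning a count rather than a single error code lets the caller decide whether a partial failure should stop the rest of the routine.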
Converting files is one of GoFiler’s most powerful features. Respecting the rules of the EDGAR system is no easy task, but the conversion tools offered make it easy to do, and using the ConvertFile function in Legato allows you to easily integrate this conversion into your own personal steps for creating filings and otherwise modifying files.
Joshua Kwiatkowski is a developer at Novaworks, primarily working on Novaworks’ cloud-based solution, GoFiler Online. He is a graduate of the Rochester Institute of Technology with a Bachelor of Science degree in Game Design and Development. He has been with the company since 2013.