The Philosophy of FileMaker - Parsing A Web Form

"No Holding Back FileMaker Blogging"
Navigation: About John Mark Blog Listing Blog Roulette Contact John Mark Subscribe to Newsletter FileMaker Friends
Support this site by clicking on a sponsor below or becoming a patron! Document Management Videos Become a patron of this FREE web site!
Recent Blogs:
Window Locking
Everything Changes
Subsummary Unique Count
Scripted Change Log
Abstracted Log

Meta-Consulting Meta-consulting is a service Database Pros offers to help other developers over programming hurdles via GoToMeeting. Email John Mark Osborne or call John Mark at (909) 393-4664 to find out more about this service.
Document Management Videos
Quick Tip:
Merge Field Madness I'm sure most of you have run into this problem at one point or another. You wish to place a merge field on the layout, but the name of the field extends beyond your available layout space. While the data still displays correctly in browse mode, the length of the merge field covers up nearby objects, making it difficult to work in layout mode. One solution is to shorten the name of your field. If you can't or don't want to do that, try changing the font size of your merge field so the text block fits in your available space. Keep the first less-than sign in your chevron (e.g. <) the font size you want to display the merge data. Even though most of the text in your merge field is a tiny font size, the first merge field character will set the size for the data. Pretty cool. This can also be done with symbols that are surrounded by curly brackets (e.g. {).

Document Management Videos
Fun Stuff:
The Wedge Claris Corporation and the FileMaker, Inc. have been housed in a building called the wedge in Santa Clara, California since they spun off from Apple, Inc. The building is nicknamed the wedge because it looks like a wedge. For some years they had an adjacent building called the interstitial but it was let go when the company downsized to focus on FileMaker products.

The Philosophy of FileMaker recommends PCI!
RSS Feed


`Parsing A Web Form`		Level: Advanced Version: FileMaker 17 Category: Calculations Tuesday, July 10, 2018

Have you ever gone back to modify a script six months later and wondered what the heck you did? Even with all the commenting and documentation, it still takes you a while to relearn the script. And, what if you want to move the script to another solution. You have to modify it to match the layout, field and table names in the new database. Dynamic scripting is about creating smart solutions. If you program a script with intelligence, it can adapt to layout, field, table and file modifications. It takes a lot of practice but can ultimately save you a lot of time and make you a better programmer.

FYI: Dynamic programming can also be referred to as Adaption or Indirection.

In this Article
In this article, an email generated from a form on a web site will be parsed. When a visitor clicks the submit button on a web form, their information is stored in a database or emailed via a CGI (Common Gateway Interface). In the case of the email, the data is stored in the body as a return separated list much like the following.

first_name: John
last_name: Osborne
address: 2066 Rancho Hills Drive
city: Chino Hills
state: CA
zip: 91709

HUH: Why not just publish your FileMaker database to the web using FileMaker Server and PHP? Unfortunately, not everyone has the luxury of a co-located machine running FMS so sometimes they need to go with a less expensive CGI option offered by their ISP. Besides, this is a common occurrence and we need a subject for this article. It's not always about the subject matter but what you learned that can be generalized to your own solutions.

The data from the body of the email can be moved into a single FileMaker Pro field via AppleScript, Visual Basic, a plug-in, some other third party scripting language or even using a simple manual copy and paste. The goal in this article is to parse the text from a single email field into separate fields and not to demonstrate how to get the email into FileMaker. For that you will have to go elsewhere. Here's a good article on Archiving Email in FileMaker using a plug-in.

Do your Homework
Before starting the complex text parsing routine, a warmup exercise of basic parsing will be covered to get your juices flowing. This material was thoroughly dissected in the previous article for this series titled Data Parsing. Let's see if you remember. The contents of the Name field for the formulas below is "John Osborne".

Left(Name; 4) = John

LeftWords(Name; 1) = John

Left(Name; Position(Name; " "; 1; 1) - 1) = John

Right(Name; 7) = Osborne

RightWords(Name; 1) = Osborne

Right(Name; Length(Name) - Position(Name; " "; 1; 1)) = Osborne

Simplest is not always best. If you've read other articles in this blog, I'm always touting simplicity. This is a good general rule to follow. Another rule is to choose the solution that best suits your needs. For example, the Left function is slightly faster than the other formulas since it uses the most basic algorithm. However, LeftWords is more dynamic since it will grab the first name for any entry length. Which is better depends on what you are trying to do. If you are using the Left function in a script that replaces the first 4 characters across a million records, it will be faster than the LeftWords function. If you have to grab the first word from a field then Left won't do the trick and you'll need the adaptive word separator algorithm built into the LeftWords function.

So, why would you use the Position function and complicate the formula even further? Well, the Word functions (LeftWords, RightWords and MiddleWords) predefine word separators. Spaces, commas, periods, colons and other characters are built into the intelligence of the Word functions. You have no control over what constitutes a word. However, the Position function allows you to to essentially define your word separator or even a string of characters. In other words, it's much more flexible.

The Position function uses four parameters to locate a value and return a number corresponding to the location. The first parameter is where Position looks. It can be a field, string or result of another function but is usually a field. The second parameter is what the Position function is looking for. It can be a field, string or result of another function but is usually a string. The third parameter is the starting character and the fourth parameter is the occurrence. These parameters are typically 1 but can be the result of a function or a field reference.

Position(String; Search; Start; Occurrence)

The Position function returns a value which is used in the Left function to make it adaptive like LeftWords. Again, the difference is you can look for any character(s), not just a space. One is subtracted because you don't want to include the space between the first and last name in the result.

If you aren't completely clear about the answers, please go back and read or reread the Data Parsing article. Otherwise, you're going to be lost in the complexity of the formulas presented below.

Parsing an Email
Let's use the Position function to parse an email from a web form. The first thing you want to do is examine the email and look for patterns in the text. Patterns can be spotted using the Position function and used to delineate the text you want to grab.

First_Name: John¶
Last_Name: Osborne¶
Address: 21521 Saint John Lane¶
City: Huntington Beach¶
State: CA¶
Zip: 92646¶

NOTE: The text above is a standard format for web form generated emails. The pilcrow characters (¶) represent the hidden return characters.

The first pattern you want to notice is that each line ends in a carriage return. The carriage returns are invisible but have been revealed in the text above. The other pattern is not as easy to spot. It varies from line to line but there is a pattern. Notice that each line starts with a field name followed by a colon and a space. Now that you have located the patterns, you can grab the text between. The calculation below accomplishes this task.

Middle("¶" & EMAIL::email & "¶"; Position("¶" & EMAIL::email & "¶"; Get(ActiveFieldName) & ":"; 1; 1) + Length(Get(ActiveFieldName)) + 2; Position("¶" & EMAIL::email & "¶"; "¶"; Position("¶" & EMAIL::email & "¶"; Get(ActiveFieldName) & ":"; 1; 1); 1) - (Position("¶" & EMAIL::email & "¶"; Get(ActiveFieldName) & ":"; 1; 1) + Length(Get(ActiveFieldName)) + 2))

The formula is quite complex until you notice that the same basic formula is repeated over and over. Let's start by adding returns and the Let function to the calculation to make it easier to read.

Let(

[@Email = "¶" & EMAIL::Email & "¶";
@Begin = Position(@Email; Get(ActiveFieldName) & ":"; 1; 1);
@FieldLen = Length(Get(ActiveFieldName)) + 2];

Middle(

@Email;

@Begin + @FieldLen;

Position(@Email; "¶"; @Begin; 1) - (@Begin + @FieldLen)

)

)

NOTE: The returns do not affect the result of the formula but do make it easier to read. You can easily spot each of the three parameters that constitute the Middle function. When returns are added to a formula, they are generally placed between parameters in the outside function. Spaces can also be used to indent very complicated nested calculations.

The Let function is declared at the beginning of the calculation to reduce redundant code and speed up the calculation of the formula. This allows the variables "@Email", "@Begin" and "@FieldLen" to reference the repetitive code and even makes it easier to read the formula. Returns are added to the Email field to make sure each line is surrounded by returns (you'll understand why, later in the exercise).

Let(

[@Email = "¶" & EMAIL::Email & "¶";
@Begin = Position(@Email; Get(ActiveFieldName) & ":"; 1; 1);
@FieldLen = Length(Get(ActiveFieldName)) + 2];

<the rest of the formula>

)

SIDE TIP: I use at signs (@) at the beginning of my variable names for a couple reasons. First, it makes the variable names stand out when you are searching for them in the calculation formula. Second, it allows you to declare a variable of any name. For example, if you declare a variable named "Left" it disrupts the functionality of the existing Left function. If you name your variable "@Left", you can still use the name you want and not prevent the use of the original Left function.

The actual result portion of the Let function is a Middle function call. The first parameter of the Middle function is the field to search. This is the Email field where the body of the email is contained. The second parameter is what character to start at when grabbing the text. Instead of supplying a static value, a nested formula consisting of the Position function is provided. The third parameter determines the number of characters to grab, but we'll get to that later.

Middle(

@Email;

@Begin + @FieldLen;

Position(@Email; "¶"; @Begin; 1) - (@Begin + @FieldLen)

)

Let's examine the second Middle function variable more closely. Below, the Let variables have been replaced with the actual code, returns have been added to highlight the four parameters of the Position function and some of the functions have been replaced with content from a sample record to aid in the learning process. Since the Get(ActiveFieldName) function returns the name of the currently selected field (where the cursor is blinking), the formula changes depending on the selected field. If the currently selected field is "First_Name", the formula will distill to the following.

Position(

"<Email Content>";

"First_Name:";

1;

1

)

+ Length("First_Name") + 2

The Position function locates the first occurrence of the text string "First_Name:" in the Email field starting at the first character of the Email field. A number is returned corresponding to the beginning of the string. This is very important to understand why the Length of the Get(ActiveFieldName) plus two is added onto the position. It is the end of the "First_Name:" string we want to locate in order to specify the beginning of the first name. The function with our sample data will ultimately return a thirteen corresponding to the thirteenth character in the Email field.

First_Name: John¶
12345678901234567

FYI: The result above has been numbered to assist in counting the number of characters. If the length of the field name plus two characters is not added to the position of the field inside the email, a number one will be returned. In the example above a thirteen is required to locate the beginning of the first name contents.

The third parameter in the Middle function specifies how many characters to grab, starting from the point specified in the second parameter. It looks even more daunting but it is not that difficult when you realize it repeats most of the same code from the previous steps.

Position(

"¶" & EMAIL::Email & "¶";

"¶";

Position("¶" & EMAIL::Email & "¶"; Get(ActiveFieldName) & ":"; 1; 1); 1) -

(Position("¶" & EMAIL::Email & "¶"; Get(ActiveFieldName) & ":"; 1; 1) + (Length(Get(ActiveFieldName)) + 2)

)

The first thing you will notice is that the code does not reference the variables. Even though it makes it more difficult to read, it helps in the learning process IMHO. It also makes you appreciate the Let function more. When I started working with FileMaker, there was no Let function so all formulas were written like this.

There are many carriage returns in the Email field. In order to locate the one after the current line, one Position function is nested within another. The inner Position function locates the starting point for the outer Position function. The inner Position function locates the current field name within the Email field and returns the corresponding character position. This is applied to the third parameter of the outside Position function and tells it to start looking at the beginning of the line containing the desired carriage return.

The location of the carriage return is subtracted from the location of the beginning of the first name. This is very similar to the technique discussed at the beginning of this section where the last name was grabbed by subtracting the length of the full name from the beginning of the last name. If the third parameter of the Middle function required the end character rather than how many characters to grab, the formula wouldn't require the portion of the formula following the minus sign. However, we have to work within the confines of the function parameters and must provide how many characters to grab for the third parameter of the Middle function.

NOTE: If the Email field was not surrounded by returns, the last line might not end in a return and cause the last field from the web form to not be parsed.

Scripting the Calculation
Let's take a look at the script where this formula is utilized. The Set Field script step in the screen shot is cut off at the right side, but no worries as it simply contains the formula previously discussed.

The script consists of two loops. The inner loop cycles through all the fields in the tab order on the current layout. This is often referred to as a field loop. The outer loop cycles through all the records in the current found set. This is often termed a record loop. The first step is a Freeze Window which speeds up any record looping script by preventing screen redraw that occurs from record to record. The Go to Record/Request/Page [First] is another common step that precedes a record loop. The purpose of this step is to initialize the record loop to ensure every record in the found set is parsed. If the current record is in the middle of the found set, the loop will start there and skip all the records prior to the current one.

The record loop begins by selecting a field on the layout using the Go to Next Field step. One may or may not be selected, but that won't matter. What is important is to select any field in the tab order so a starting point for the field loop can be established. All fields in the tab order are considered available for parsing and the reason why a special layout is best for this solution (see step 2). Otherwise, it's possible someone may accidentally include the email field in the tab order causing the original email to be destroyed. It's usually best not to marry a script to a layout but in this case it's likely to prevent more problems than it will cause.

The Set Variable script step outside the record loop records a starting point for the field loop since there is no way to exit a loop after the last field in the tab order. Although, this would be a cool option on the Go to Next Field script step... hint, hint FMI. The current field name is stored in a variable and will be used to exit the loop later.

The first step after entering the inside loop is to parse the currently selected field. A Set Field step with the formula discussed previously parses the currently selected field. No destination field is specified so FileMaker Pro places the result of the calculation into the currently selected field. Insert Calculated Result could also be used with the same results.

The script advances to the next field and determines if it should exit the loop when the $StartField field equals the currently selected field. Otherwise, it cycles through the steps once again to parse the next field. When the field loop exits, the record loop takes control again. It advances to the next record and enters the field loop again. If there are no more records, the outside loop will exit and the script will end.

Adaptability Test
Try this test using the sample file provided at the end of this article. Create a new field in Manage Database and add it to the "Parse" and "Form" layouts. Type the name of that field in the Email Data field with corresponding data, being careful to include the colon and space between the field and the data. It doesn't matter where the field is located as long as the name you type matches the Manage Database field exactly. When you run the script, the new field will be parsed without modifying the script. That's a truly dynamic script that adapts to new fields without modifying the script.

Extra Credit
The Position function searches for the field name and a colon to make sure it locates the correct field reference. For example, a user could enter a name of "Felicity". If a colon was not added to the field name, the word "city" inside the name "Felicity" would be located instead of the City field reference. How can you make the search value even more unique? That's right. Add an extra return to the search criteria for the Position function. Since the contents of the Email field was surrounded by an extra return, every field name is preceded by a return.

Let(

[@Email = "¶" & EMAIL::email & "¶";
@Field = "¶" & Get(ActiveFieldName) & ":";
@Begin = Position(@Email; @Field; 1; 1) + 1;
@FieldLen = Length(@Field)];

Middle(

@Email;

@Begin + @FieldLen;

Position(@Email; "¶"; @Begin; 1) - (@Begin + @FieldLen)

)

)

You'll see the change with the declaration of the @Field variable. The @Begin variable also has a one added to it to account for the return character added to the @Field parameter. Otherwise, the Position function will locate the return at the end of the preceding line.

Is it Worth it?
It's hard to say whether indirection is worth it or not. If you only use the script once then probably not. If you end up using the script hundreds of times then you just saved yourself a ton of time. If you are all about the journey then you'll always learn something when programming dynamically. I think you have to decide who's paying for the work. Is it your client or you.

Author:
John Mark Osborne
jmo@filemakerpros.com
www.databasepros.com

Example File (available to patrons at $5.00 per month): Download here if you are a patron

This blog is completely free. Please support it by clicking on one of the advertisers at the left side of the window or becoming a patron. Thanks so much!

Comments:

Mauricio Navarro

07/10/2018

Very Good. What do you do if "Name" is Mary Joe, not just Mary. Do you have to count the words in "Name" and assume the last word is "Last Name"?, what if "Last Name" is hyphenated? And further more, in Latin America we use two "Last Names", so a full name could go: MARIA DEL CARMEN GUTIERREZ RAMIREZ or worse most married women use their husband's last name w a preposition: MARIA DEL CARMEN GUTIERREZ RAMIREZ de AZUERO, where the first name is composed of three words and last name of two "de" is the preposition and "Azuero" is the husband's last name. How to go about that. I have a 2 million+ database (phone directory) that I have been trying to parse; the only way I am solving the "riddle" is by first having two fields where I manually type how many words for "name" and how many for "last name", then doing something similar to what you are showing here.

Response by: John Mark Osborne

07/11/2018

Yes, the scenarios you mention are more complicated. In the first article in this series, the issue was discussed (there's a link in this article to the first article). I went down the road of more complicated names for a bit in the first article but then referred the reader to another article on name parsing that covers all name variations. The focus for these articles are general data parsing skills so that's why I didn't confront the issue head on. However, feel free to refer to the article I linked. Happy FileMaking!



Add Comment:

First Name:	*
Last Name:
Email:	*
Web Site:
Comment:	* Email Addresses will not be shared on the web site!

Navigation:

Quick Tip:

Fun Stuff: