Issue Date: FoxTalk July 2000

It Might Be Valid, But It's Still Wrong

Paul Maskens and Andy Kramek

This month, Paul Maskens and Andy Kramek discuss the problems of validating data entry.

Paul: Okay, Andy, let's see what you make of this one. I have a simple data entry form for inserting a name, address, and phone number into a table. What I want to do is to make sure that the data is valid, so the question is: What sort of code should I be using, and where should I put it?

Andy: Well, I'd say that it all depends on what you mean by a "Form," what you mean by a "Table," and, most importantly of all, what you mean by "Valid."

Paul: That sort of cop-out won't do! This is a very real problem: just how should we set about handling validation of user input? In the good old FoxPro 2.6 days, I'd have simply used the VALID snippet of a control and put the code right in there.

Andy: But my answer wasn't a cop-out, Paul. The issue of what sort of code, and where to place it, does depend on those three things! For example, if the "Form" is a simple data entry form running inside a VFP application directly against a local VFP table, then there's no problem: just put the code into the Valid method of the controls. However, if your "Form" is actually being displayed in a Web browser, or is accessing remote data through ODBC, then you have a whole different set of problems.

Paul: And that's precisely my point: where should the validation go then? In the Database Container?

Andy: Well, that depends on what you mean by a "Table." Again, if we're talking about a pure VFP solution, then it might be possible. But it's not going to work if you're using SQL Pass-Through to access a back-end data server; you don't actually have tables in a DBC in that situation.

Paul: Hmph! We don't seem to be getting very far. Maybe we should review the possibilities and see where that takes us?

Andy: Sounds good to me. Let's start with the design (now there's a novel concept <g>).
What are you actually trying to build, and on what sort of platform is it going to run?

Paul: I don't know! Actually, on reflection, perhaps it would be more accurate to say that I can't predict how this application is going to evolve. What I can be sure of is that I can't afford to tie myself to a "pure VFP" solution. I must at least plan for the possibility that this application will have to run against a remote data source and might even require multiple user interfaces.

Andy: Good, we have a starting point, then. What you're saying is that we'll need to use at least a three-tier model, and we should probably plan for a true n-tier architecture.

Paul: I've never really been happy about this distinction. When does a three-tier model become n-tier?

Andy: I don't think it works like that. The basis of the designs is completely different. But I agree that the terminology is certainly unclear. I think that the difference is best illustrated diagrammatically. Figure 1 is a layered three-tier model.

Figure 1: Layered three-tier architecture.

Paul: Yes, that looks right. Each tier comprises a number of layers, and there are always at least three: an "upward" and a "downward" interface, and a core, which may itself be made up of a number of layers. The basic rule here is that any layer may have knowledge of only the layer immediately below itself.

Andy: That's a very important rule too! In this diagram, the middle tier's UI Interface Layer doesn't need to know anything about what exists above itself, but must know how to address the Rules Layer below. Conversely, all that the User Interface needs to know about is the Public Interface of the Middle Tier.

Paul: Which, yet again, emphasizes the importance of programming to interface, not implementation. So for the n-tier model,
what's different?

Andy: In practice, nothing. What we're really doing is dividing the tiers into separate functional components, as opposed to incorporating the functionality into layers within a single tier. An n-tier diagram might look something like Figure 2.

Figure 2: N-tier architecture.

Paul: Ah, I see. As we have the three-tier model drawn, the middle tier must include both the actual business rules and the necessary functionality to communicate with the data tier. So if more than one application wants to share the same data, we'd need to duplicate that code in a different middle-tier object.

Andy: Exactly. In the n-tier model, the task of communicating with the database is no longer part of the middle tier, but is separate in its own right.

Paul: That's very clear, Andy. A picture really is worth a thousand words in this case. I agree that the n-tier model is the way we should go, though the price to pay is obviously going to be a more complex set of interfaces and an increase in messaging. But at the risk of keeping to the point, where should our validation go then?

Andy: The diagram makes it pretty clear that it really has to go in either the Application Rules Tier, the Data Tier, or both.

Paul: I can see why you're saying the Application Rules: anything above that tier really has to be UI-specific and, since we're looking for data validation, that should be independent of the UI. I don't see why you'd have validation in both places, though.

Andy: That would depend on the implementation. But in general I'd say that business rules must be implemented only in the Rules Tiers and integrity rules only in the Data Tier.

Paul: I'm not sure that I follow your distinction here. We're not back into "what is data" again, are we?

Andy: Not quite <bg>. There's a distinction between rules that are required for enforcing the integrity of data and those that are purely business related.

Paul: So what you're saying is that a rule like this: "An entry to the Orders table must reference a valid entry in the Customer table." is a data integrity rule, while "All entries in the Customer table must have a telephone number." is merely a business rule?

Andy: Exactly. Violating the first would break the referential integrity of your database, and so it must be implemented in, and by, the database through whatever mechanism it provides for enforcing referential integrity. The second, however, doesn't really matter to the database one way or the other. If a customer record is missing the telephone number, it won't change my ability to access the database in any way (other than by searching via telephone number, of course).

Paul: I have no problem with the database enforcing referential integrity; that seems entirely proper. But you also seem to be saying that we should not be enforcing field- and, by extension, record-level validation in the database. If that were really the case, then why would databases have field- and record-level validation rules built into them?

Andy: Well, actually they don't. Built-in field/record-level validation is a feature of the Visual FoxPro DBC, but it's not a normal part of a SQL database. Such validation is enforced either through index constraints or in triggers and stored procedures. Some back-end databases will allow for default values to be specified in the table definition, but I don't know of one that permits field-level validation like Visual FoxPro does.

Paul: Ah! So it's Visual FoxPro that's out of line here, not other databases. I suppose this is a consequence of the way in which the DBC has to be implemented in a file-based database like Visual FoxPro.

Andy: That seems a reasonable assumption, but I wouldn't really know. So where are we now in terms of your original question?

Paul: Well, I think we're agreed that I need to implement an n-tier design and that the job of maintaining the referential integrity
will be left to the database. Business rules will be enforced in their own tier, which will provide a standard interface so that different presentation tiers can access them. That leaves only the issue of "assistive validation" to be addressed, then.

Andy: Assistive validation? What does that mean?

Paul: I mean validation placed in the UI with the objective of assisting the user during data input. One thing that really annoys me is the kind of interface where I enter a whole lot of data, hit the Save key, and the damn thing then sneers at me and says something like "You may not leave the telephone number field blank!" Assistive validation means trapping this sort of thing immediately when it occurs, rather than waiting for the results of the submission process.

Andy: Oh boy, Paul, you really must hate Web interfaces then <g>. Very few of them have this sort of validation, but I can see where you're coming from, and you do have a good point.

Paul: Of course, to implement it we should be making calls into the Application Tier rather than simply adding yet more code to the UI and duplicating functionality.

Andy: Good, that sounds very reasonable. Most importantly, it also provides for situations in which different business rules may apply to the same set of data in different circumstances.

Paul: Huh? How can two sets of rules apply to the same data? Surely that's not very sensible.

Andy: On the contrary, it can easily happen. Consider the case where you have a common table in which the addresses and phone numbers of all people with whom you deal are stored. It might be entirely reasonable to say that a "supplier" must have a telephone number, but surely an employee need not have one? If you were to enforce the "must have a telephone number" rule in the database, either you'd be unable to employ someone unless they had a telephone, or you'd need a special table for storing the addresses of employees.
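
To make Andy's supplier/employee example concrete, here is a minimal Python sketch of a rules-tier check that applies different business rules to rows of the same shared table. It is only an illustration under stated assumptions: the `Contact` class, its `role` field, the `PHONE_REQUIRED` table, and the `check_contact` function are hypothetical names that do not appear in the article, and treating unknown roles as requiring a phone number is an arbitrary choice.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Contact:
    """One row of the shared address table; 'role' is a hypothetical field."""
    name: str
    role: str          # e.g. "supplier" or "employee"
    phone: str = ""


# Hypothetical rule table: which roles must supply a telephone number.
PHONE_REQUIRED = {"supplier": True, "employee": False}


def check_contact(contact: Contact) -> List[str]:
    """Return the business-rule violations for a contact (empty list = acceptable)."""
    problems = []
    if not contact.name.strip():
        problems.append("name is required")
    # Assumption: roles missing from the table are treated as requiring a phone.
    if PHONE_REQUIRED.get(contact.role, True) and not contact.phone.strip():
        problems.append(f"a {contact.role} must have a telephone number")
    return problems


if __name__ == "__main__":
    print(check_contact(Contact("Acme Ltd", "supplier")))
    print(check_contact(Contact("Jo Smith", "employee")))
```

Because the function returns a list of messages rather than raising on the first failure, a UI could call it while the user is still typing (Paul's "assistive validation") as well as at save time, without duplicating the rules.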

Paul: Or, more likely, the users will just put some dummy or meaningless information into the telephone number field. Which brings us back to the specific problem I originally wanted to address: how should we validate user input fields?

Andy: I suppose that there are essentially four situations with which we have to deal, and each requires a different solution.

Paul: I think I see where you're going; let me guess. First is "Choose a Value from a List."

Andy: Absolutely! This one is easy to deal with because we can use a list or combo box and simply populate it with all the valid options, and only the valid options! If necessary, we can force a default by setting the ListIndex property to the appropriate value.

Paul: I'd go further and say that a default is always necessary when using a predefined list from which an option must be chosen.

Andy: I wouldn't disagree there. The second case is where there's a list of possible values, but new ones can be added by the user at runtime. You could just use a standard combo box to cover this.

Paul: No, I don't like that idea at all. The combo box will allow you to enter data for only one field directly. I'm much more likely to use a lookup table that includes a primary key, code, and description. I could add only one field in the combo, so I'd much prefer to have a proper data entry form called when a new entry to the list is required.

Andy: Good, we're agreed on that one too. So for both of these situations, the issue of validation has to be addressed when an item is added to the list rather than in the application at runtime.

Paul: Ah, but in the second case, the addition of the item is actually going to happen at runtime.

Andy: That could be true for either case. Don't you include pick-list maintenance screens in your application?

Paul: Of course, so what we're saying is that using list-based values merely shifts the problem of validation to the point at which the item is added to the list.

Andy: Yes, in reality we come down to only two situations. Either we're dealing with "formatted" or "unformatted" input.

Paul: What do you mean by "formatted"? You don't just mean using the Format or InputMask properties, do you?

Andy: No. Although they're useful in many circumstances, they're applicable only when we're actually using Visual FoxPro directly (either as DBC properties or as properties of native controls). I mean input where we can define a value's type, range, or both.

Paul: Specifying that a value must be a "date" would qualify it as formatted input, then?

Andy: Yes, because there are standard rules for checking dates. Similarly, specifying that a value must be a number between 0 and 10 would qualify as formatted.

Paul: But now you're implying that this should go into the User Interface by setting up a Visual FoxPro text box with a date value, for example.

Andy: Not at all! The fact that, in the Visual FoxPro-based UI, we can set up a text box to accept only date values is a bonus, but it doesn't relieve us of the necessity to ensure, in the Application Rules tier, that the value that's been supplied is actually appropriate.

Paul: Ah! You're talking about ensuring that the value supplied is actually valid for the purpose for which it's been entered! For example, if we want to specify a start and an end date for a reporting period, we can use a date text box control in the UI to ensure that the user can enter only dates. However, the check that the end date is later than the start date still belongs in the Application Rules.

Andy: I'd say that you need to check both that the value is really a date and that it conforms to the business rules. The same applies to any other formatted entry.

Paul: This implies that there are two stages to validating formatted inputs. The first is to ensure that the input supplied is of the correct type. This might be enforced in the UI, assuming it supports the necessary functionality, but should still be checked in the Application Rules. The second is to ensure that the input is valid in the context of the application, which must be done in the rules tier. That seems entirely reasonable, but what about unformatted input?

Andy: That's a more difficult problem. Since we're saying that the input isn't formatted, by definition we can't know what it's supposed to contain, and therefore the only approach we can take is to ensure that it is not invalid.
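
The two stages Paul describes for formatted input can be sketched in Python using his reporting-period example. This is a rough illustration, not the authors' implementation: the function names `parse_date` and `check_reporting_period` are hypothetical, and accepting dates only in `YYYY-MM-DD` form is an assumption made to keep the sketch short.

```python
from datetime import date, datetime
from typing import List, Optional


def parse_date(text: str) -> Optional[date]:
    """Stage one: is the input a date at all?

    Assumption: input arrives as text in YYYY-MM-DD form.
    Returns None when the text is not a real calendar date.
    """
    try:
        return datetime.strptime(text.strip(), "%Y-%m-%d").date()
    except ValueError:
        return None


def check_reporting_period(start_text: str, end_text: str) -> List[str]:
    """Run both validation stages and return any problems found."""
    problems = []
    start = parse_date(start_text)
    end = parse_date(end_text)
    if start is None:
        problems.append("start date is not a valid date")
    if end is None:
        problems.append("end date is not a valid date")
    # Stage two (the business rule) only makes sense once stage one has passed.
    if start is not None and end is not None and end <= start:
        problems.append("end date must be later than start date")
    return problems


if __name__ == "__main__":
    print(check_reporting_period("2000-07-01", "2000-07-31"))
    print(check_reporting_period("2000-07-31", "2000-07-01"))
```

Keeping stage one in its own function reflects the point that a date-only text box in the UI is a convenience: the rules tier repeats the type check rather than trusting whatever front end happens to be calling it.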

Paul: That suggests that the first stage of validation is actually the same whether we're dealing with formatted or unformatted data, then?

Andy: And so it is. The second-stage validation is what differs.

Paul: So maybe first-stage validation could be implemented as a separate layer. That would help with the assistive validation too!

Andy: Nice one! I hadn't thought of that.

Paul: But, as always, as soon as we start saying things are the same, we should be thinking in terms of abstracting the implicit functionality. At least, that's what you keep telling me!

Andy: Okay, I just hadn't thought of it in those terms. So, when we're dealing with formatted data, we can apply positive rules because we know what the data must look like. Conversely, for unformatted data, all we can apply are negative rules because all we know is what it can't look like.

Paul: Coming back to my name, address, and telephone number example (which is where we started), I can see that I'm in trouble. There's no universal format for a name, for an address, or even for a telephone number!

Andy: I'm afraid not. Of course you might be able to define some local rules. Addresses and phone numbers within the UK do conform to some basic standards, and so do those in the USA, although the standard is different, of course.

Paul: Yes, I can see how to do that all right. A simple root class, specialized for different locations and implemented at runtime using a strategy based on locale, would handle it. But the actual validation still bothers me.

Andy: It will, because what you're really dealing with in this situation is unformatted data and, as we just said, all you can do is apply negative rules.

Paul: So the best I can do is to say that the telephone number must be a character string containing at least nine and not more than 13 digits, can't be empty, and might (or might not) contain a "+" or a "-" or parentheses or periods.

Andy: If those are the rules that you want to apply, then yes.
Of course, you could include a look-up into a list of valid area codes and even apply formatting rules to the numbers if appropriate.

Paul: This is my real problem. I was hoping that I could stop my users from doing things (like entering telephone numbers that don't exist) to bypass the validation and so have a greater degree of confidence that this data would be valid. But I can see
that I can't.

Andy: Sorry to disagree with you there, Paul. You can always ensure that the data is valid; what you can't do is to ensure that it is right! This is the fundamental problem with all data entry. No matter how carefully you validate your data, there's no way to detect when an input is wrong if it meets the rules. After all, your software can't possibly know that the name "Paul Maskers" should really have been entered as "Paul Maskens."

Paul: Of course. So the conclusion is that, while we can check that data is entered according to rules, and we can even specify those rules at various levels, there's just no software solution to the problem of data that is "valid, but wrong."
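
The "negative rules" Paul lists for telephone numbers (not empty, between nine and 13 digits, optionally containing "+", "-", parentheses, or periods) can be written down in a few lines of Python. This is a sketch under assumptions: `check_phone` and the two constants are hypothetical names, and allowing spaces between digit groups is an addition that Paul's list does not mention.

```python
from typing import List

DIGITS = set("0123456789")
# Paul's list of permitted punctuation; the space is an added assumption.
ALLOWED_PUNCTUATION = set("+-(). ")


def check_phone(text: str) -> List[str]:
    """Apply negative rules only: report what the value must not look like."""
    if not text.strip():
        return ["telephone number must not be empty"]
    problems = []
    # Collect each disallowed character once, in a stable order.
    bad = sorted({ch for ch in text
                  if ch not in DIGITS and ch not in ALLOWED_PUNCTUATION})
    if bad:
        problems.append("unexpected characters: " + "".join(bad))
    digit_count = sum(1 for ch in text if ch in DIGITS)
    if not 9 <= digit_count <= 13:
        problems.append("must contain between 9 and 13 digits")
    return problems


if __name__ == "__main__":
    for candidate in ["", "12345", "+44 (0)1234 567890", "000-000-0000"]:
        print(repr(candidate), check_phone(candidate))
```

The last candidate, `000-000-0000`, passes every rule even though it is almost certainly a dummy entry, which is the article's closing point: checks like these can show that a value is valid, not that it is right.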