XHTML and Accessibility in ASP.NET Whidbey

Programming with web standards in mind, although largely ignored, is becoming more and more important. It almost seems that promoting ASP.NET took too long. Now that we're over the hill and "this stuff works", it is about time to start paying attention to web standards. In this article you will learn how to implement a response filter and plug it into the ASP.NET pipeline. The filter will transform outgoing HTML into XHTML 1.0-compliant markup.
A Call To Web Standards
Having read thousands of pages of articles on the web and in magazines, MSDN documentation, online forums, etc., I still don't recall seeing a call for producing ASP.NET code that complies with web standards. Wait! I take it back. I saw one: a post in Scott Guthrie's blog (http://weblogs.asp.net/scottgu/archive/2003/11/25/39620.aspx). This one is a must-read! According to Scott, the upcoming version of Visual Studio .NET will feature server controls that produce standards-compliant markup, accessibility validation, and more. Now that's very good news!
As of this writing, ASP.NET does not produce markup that passes validation in any of the Strict modes (see Eric Meyer's "Picking a Rendering Mode" and the W3C's "List of valid DTDs you can use in your document" for more information on DOCTYPEs). Enforcing XHTML-compliant output takes some effort: you have to implement automatic code cleaning (all right, fudging).
The point of this article is twofold: to reiterate the importance of web standards and to show how to implement response filters.
Anatomy of HTTP Response Filters
Instead of creating an abstract sample for this discussion, I'll refer to a real-world example of a filter application. This very site, www.AspNetResources.com, utilizes this filter to enforce XHTML 1.0 Strict compliancy.
The HttpResponse class has a very useful property:
public Stream Filter {get; set;}

MSDN provides a helpful description of this property: "Gets or sets a wrapping filter object used to modify the HTTP entity body before transmission." Confused? In other words, you can assign your own custom filter to each page response, and HttpResponse will send all content through it. The filter is invoked right before the response goes back to the user, which gives you a chance to transform it if need be. This can be extremely helpful if you need to transform output from "legacy" code or substitute placeholders (header, footer, navigation, you name it) with proper code. Besides, at times it's simply impossible to ensure that every server control plays by the rules and produces what you expect it to. Enter response filters.
The Filter property is of type System.IO.Stream. To create your own filter you need to derive a class from System.IO.Stream (which is an abstract class) and implement its abstract members.
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

namespace AspNetResources.Web
{
    /// <summary>
    /// PageFilter does all the dirty work of tinkering with the
    /// outgoing HTML stream. This is a good place to
    /// enforce some compliancy with web standards.
    /// </summary>
    public class PageFilter : Stream
    {
        // Matches the closing </html> tag; built once, not on every Write
        static readonly Regex eof = new Regex ("</html>", RegexOptions.IgnoreCase);

        Stream          responseStream;   // the stream we wrap
        long            position;
        MemoryStream    responseBuffer;   // raw bytes received so far

        public PageFilter (Stream inputStream)
        {
            responseStream = inputStream;
            responseBuffer = new MemoryStream ();
        }

        #region Filter overrides
        // The filter is write-only: ASP.NET only ever writes to it
        public override bool CanRead
        {
            get { return false; }
        }

        public override bool CanSeek
        {
            get { return false; }
        }

        public override bool CanWrite
        {
            get { return true; }
        }

        public override void Close()
        {
            responseStream.Close ();
        }

        public override void Flush()
        {
            responseStream.Flush ();
        }

        public override long Length
        {
            get { return 0; }
        }

        public override long Position
        {
            get { return position; }
            set { position = value; }
        }

        public override long Seek(long offset, SeekOrigin origin)
        {
            throw new NotSupportedException ();
        }

        public override void SetLength(long length)
        {
            throw new NotSupportedException ();
        }

        public override int Read(byte[] buffer, int offset, int count)
        {
            throw new NotSupportedException ();
        }
        #endregion

        #region Dirty work
        public override void Write(byte[] buffer, int offset, int count)
        {
            // Buffer raw bytes: decoding each chunk on its own would garble
            // a UTF-8 character split across two Write calls
            responseBuffer.Write (buffer, offset, count);

            // ---------------------------------
            // Wait for the closing </html> tag
            // ---------------------------------
            string strBuffer = Encoding.UTF8.GetString (buffer, offset, count);

            if (!eof.IsMatch (strBuffer))
                return;

            // The whole page has arrived (assumes UTF-8, ASP.NET's default)
            string finalHtml = Encoding.UTF8.GetString (responseBuffer.ToArray ());

            // Transform the response and write it back out

            byte[] data = Encoding.UTF8.GetBytes (finalHtml);

            responseStream.Write (data, 0, data.Length);
        }
        #endregion
    }
}

As you can see, most methods contain little more than stub code. The Write method does all the heavy lifting. ASP.NET may call Write several times, handing over the page one chunk at a time, so before we can transform the output we need to buffer it until the whole page has arrived. That's why a Regex looks for the closing </html> tag.
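To see why the buffering matters, here is a stripped-down sketch that runs outside ASP.NET. BufferingFilter is a hypothetical class, not part of the code above: a MemoryStream stands in for the real response stream, and a delegate stands in for the transformation step. Like PageFilter, it assumes UTF-8 output and checks only the current chunk for </html>; it buffers raw bytes rather than decoded strings, so a multi-byte character split across two Write calls is decoded correctly.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical, stripped-down cousin of PageFilter: it buffers every Write
// until a chunk containing </html> arrives, then transforms the whole page
// and writes it to the wrapped stream in one go. Assumes UTF-8.
public class BufferingFilter : Stream
{
    private static readonly Regex Eof = new Regex("</html>", RegexOptions.IgnoreCase);

    private readonly Stream inner;                    // stands in for the response stream
    private readonly Func<string, string> transform;  // stands in for the "dirty work"
    private readonly MemoryStream pending = new MemoryStream();

    public BufferingFilter(Stream inner, Func<string, string> transform)
    {
        this.inner = inner;
        this.transform = transform;
    }

    // Write-only wrapper: reading and seeking are not supported.
    public override bool CanRead => false;
    public override bool CanSeek => false;
    public override bool CanWrite => true;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override void Flush() => inner.Flush();
    public override int Read(byte[] buffer, int offset, int count) => throw new NotSupportedException();
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();

    public override void Write(byte[] buffer, int offset, int count)
    {
        // Keep the raw bytes; the page is decoded only once it is complete.
        pending.Write(buffer, offset, count);

        // Only the current chunk is checked for the closing tag.
        if (!Eof.IsMatch(Encoding.UTF8.GetString(buffer, offset, count)))
            return;

        string html = transform(Encoding.UTF8.GetString(pending.ToArray()));
        byte[] data = Encoding.UTF8.GetBytes(html);
        inner.Write(data, 0, data.Length);
    }
}
```

For example, writing "<html><body>", "line<br>" and "</body></html>" as three separate chunks through a filter whose delegate replaces "<br>" with "<br />" leaves the inner stream empty until the last chunk, which then receives the entire transformed page in a single write.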
Now that we have the entire HTML response string we can transform it. I really liked Julian Roberts' approach as laid out in his Ensuring XHTML compliancy in ASP.NET article, although I chose to redo the regular expressions to my liking.
Forcing XHTML Compliancy
Basically, this particular filter simply tries to fix a few of the known inconsistencies:
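  • Place the __VIEWSTATE hidden input in a <div>. In XHTML 1.0 Strict a <form> may directly contain only block-level elements (plus a handful of others such as script), so a bare <input> sitting directly inside it fails validation.
  • Remove the name attribute from the main form. By default your server-side form gets both a name and an id attribute. XHTML 1.0 Strict does not define a name attribute for <form>, so we need to get rid of it.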

  • My first take is to wrap the __VIEWSTATE input in a <div>:
    // Wrap the __VIEWSTATE tag in a div to pass validation
    Regex re = new Regex ("(<input.*?__VIEWSTATE.*?/>)",
                          RegexOptions.IgnoreCase);

    finalHtml = re.Replace (finalHtml,
                            new MatchEvaluator (ViewStateMatch));

    The Regex class allows you to wire up a MatchEvaluator delegate, which is invoked every time a match is found. The ViewStateMatch method behind that delegate is implemented as follows:
    private static string ViewStateMatch (Match m)
    {
      return string.Concat ("<div>", m.Groups[1].Value, "</div>");
    }
    

    If you were to implement step 2 (removing the form's name attribute) and use this filter as-is right now, you'd run into trouble with post-back processing. Why's that? View the page source. Your __doPostBack function will look something like this:
    function __doPostBack(eventTarget, eventArgument)
    {
     var theform;
     if (window.navigator.appName.toLowerCase().indexOf("netscape") > -1) 
     { theform = document.forms["mainForm"];}
     else 
     { theform = document.mainForm; }
    
     ...
    }
    

    The gotcha here is that the form is referenced by its name, not its id. If we get rid of the name attribute, post-backs stop working; if we keep it, the page is not valid XHTML. It looks like a catch-22.
    The following hack is of my own making. So far it has worked fine on this site and on our www.custfeedback.com site, so I can't complain. However, keep in mind that this is a hack: use it wisely and test your code well before going to production.
    I decided to rewrite the __doPostBack function to use the DOM (document.getElementById) as opposed to the "old ways". This is to say, "To hell with old and bad browsers." Browser usage stats show that browsers without DOM Level 1 support are almost extinct, but assess your audience and see whether this is going to work for you.
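    // If __doPostBack is registered, replace the way it finds the form
    if (finalHtml.IndexOf ("__doPostBack") > -1)
    {
        // The browser-sniffing lookup sits between "var theform;"
        // and the first use of theform.__EVENTTARGET
        int pos1 = finalHtml.IndexOf ("var theform;");
        int pos2 = (pos1 < 0) ? -1 :
                   finalHtml.IndexOf ("theform.__EVENTTARGET", pos1);

        if (pos2 > pos1)
        {
            string methodText = finalHtml.Substring (pos1, pos2 - pos1);

            // Grab the form name from document.forms["..."]; ASP.NET 1.x
            // names use ':' where the matching client ids use '_'
            string formID = Regex.Match (methodText,
                    "document\\.forms\\[\"(.*?)\"\\];",
                    RegexOptions.IgnoreCase).Groups[1].Value.Replace (":", "_");

            if (formID.Length > 0)
                finalHtml = finalHtml.Replace (methodText,
                    "var theform = document.getElementById ('" + formID + "');");
        }
    }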
    

    The transformed __doPostBack should look similar to this:
    function __doPostBack(eventTarget, eventArgument)
    {
     var theform = document.getElementById ('mainForm');
    
     ...
    }
    

    This one will keep the validator happy. And last, but not least, we need to remove the name attribute from the main form.
    // Remove the "name" attribute from <form> tag(s)
    re = new Regex ("<form\\s+(name=.*?\\s)", RegexOptions.IgnoreCase);
    finalHtml = re.Replace (finalHtml, new MatchEvaluator (FormNameMatch));
    

    The corresponding match evaluator method is implemented like this:
    private static string FormNameMatch (Match m)
    {
     return m.ToString ().Replace (m.Groups[1].Value, string.Empty);
    }
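The three fix-ups are easy to try out on plain strings before wiring them into a filter. The sketch below gathers them into a hypothetical static class, XhtmlFixups (the class and method names are mine, for illustration only), reusing the regular expressions above with lambdas in place of the named MatchEvaluator methods. It assumes ASP.NET 1.x-style markup like the samples in this article: the __VIEWSTATE input on a single line, and a __doPostBack that looks the form up via document.forms["..."];.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (names are illustrative) collecting the filter's three
// fix-ups so they can be tried on plain strings outside ASP.NET.
public static class XhtmlFixups
{
    // Wrap the __VIEWSTATE hidden input in a <div>.
    public static string WrapViewState(string html) =>
        Regex.Replace(html, "(<input.*?__VIEWSTATE.*?/>)",
            m => "<div>" + m.Groups[1].Value + "</div>",
            RegexOptions.IgnoreCase);

    // Drop the name="..." attribute that directly follows "<form".
    public static string RemoveFormName(string html) =>
        Regex.Replace(html, @"<form\s+(name=.*?\s)",
            m => m.Value.Replace(m.Groups[1].Value, string.Empty),
            RegexOptions.IgnoreCase);

    // Make __doPostBack look the form up by id instead of by name.
    public static string FixPostBack(string html)
    {
        int pos1 = html.IndexOf("var theform;", StringComparison.Ordinal);
        if (pos1 < 0) return html;

        int pos2 = html.IndexOf("theform.__EVENTTARGET", pos1, StringComparison.Ordinal);
        if (pos2 < 0) return html;

        // Everything between the declaration and the first use is the
        // browser-sniffing lookup we want to replace.
        string methodText = html.Substring(pos1, pos2 - pos1);
        string formID = Regex.Match(methodText, @"document\.forms\[""(.*?)""\];",
                RegexOptions.IgnoreCase)
            .Groups[1].Value.Replace(":", "_");
        if (formID.Length == 0) return html;

        // The trailing newline keeps the next statement on its own line.
        return html.Replace(methodText,
            "var theform = document.getElementById('" + formID + "');\n");
    }
}
```

Inside PageFilter.Write the equivalent calls would run on finalHtml just before it is converted back to bytes, e.g. finalHtml = XhtmlFixups.RemoveFormName(XhtmlFixups.FixPostBack(XhtmlFixups.WrapViewState(finalHtml)));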
    

Installing the Response Filter
I prefer to wire up a response filter in an HttpModule. The nuts and bolts of the HttpModule and HttpApplication classes are outside the scope of this article. You can find a brief overview in my other article, ASP.NET Custom Error Pages.
Below is the bare-bones code of an HttpModule:
// ---------------------------------------------
public void Init (HttpApplication app)
{
    app.ReleaseRequestState += new EventHandler(InstallResponseFilter);
}

// ---------------------------------------------
// IHttpModule also requires Dispose(); there is nothing to release here
public void Dispose () {}

// ---------------------------------------------
private void InstallResponseFilter(object sender, EventArgs e)
{
    HttpResponse response = HttpContext.Current.Response;

    // Only filter HTML pages (not images, style sheets, etc.)
    if (response.ContentType == "text/html")
        response.Filter = new PageFilter (response.Filter);
}

The app parameter passed to the Init method is of type System.Web.HttpApplication. You tap into the ASP.NET HTTP pipeline by wiring up handlers for the various HttpApplication events. The diagram on the left illustrates the sequence of these events. See how late in the game your page filter is called? In the code sample above I install the response filter in the ReleaseRequestState event handler. To make sure the filter processes only HTML pages, I explicitly check the content type:
if (response.ContentType == "text/html")
    response.Filter = new PageFilter (response.Filter);

The final step of plugging your HttpModule into the pipeline is listing it in web.config (also explained in my other article):
<system.web>
  <httpModules>
    <add name="MyHttpModule" type="MyAssembly.MyHttpModule, MyAssembly" />
  </httpModules>
</system.web>

Remember to replace MyHttpModule and MyAssembly with appropriate module and assembly names from your project.
Performance Considerations
Back when we were implementing the Write method, I used a string variable, finalHtml. Keep in mind that in .NET strings are immutable, i.e. you cannot change a string's length or modify any of its characters. For example:
finalHtml = re.Replace(finalHtml, new MatchEvaluator (FormNameMatch));

The finalHtml variable holds the entire HTML response. When the line of code above runs, a whole new string is allocated and assigned to finalHtml, and the old one is left behind as garbage. If you manipulate large strings again and again, this may hurt performance and put extra pressure on the garbage collector.
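To make this concrete, the sketch below (StringCost and its generated page are hypothetical) runs a single Regex.Replace pass over a page-sized string. When the pattern matches, Replace returns a brand-new string roughly as large as the whole page, while the original stays in memory until the garbage collector reclaims it. The sketch also keeps the Regex in a static field so the pattern is parsed once rather than on every request; that does not avoid the string copies, but it removes one avoidable per-request cost.

```csharp
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical demo: one Replace pass over a whole page.
public static class StringCost
{
    // Built once and reused, so the pattern is not re-parsed on every request.
    private static readonly Regex FormName =
        new Regex(@"<form\s+(name=.*?\s)", RegexOptions.IgnoreCase);

    // Generates a page with the given number of paragraphs.
    public static string BuildPage(int paragraphs)
    {
        var sb = new StringBuilder("<html><body><form name=\"mainForm\" id=\"mainForm\">");
        for (int i = 0; i < paragraphs; i++)
            sb.Append("<p>Lorem ipsum dolor sit amet.</p>");
        return sb.Append("</form></body></html>").ToString();
    }

    // Returns a new string when the pattern matches; the input is never modified.
    public static string RemoveFormName(string html) =>
        FormName.Replace(html, m => m.Value.Replace(m.Groups[1].Value, string.Empty));
}
```

With a 1,000-paragraph page, the result is a separate string object about as long as the input (only the 16 characters of name="mainForm" plus its trailing space are gone), and the input still contains the name attribute afterwards.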
When Filters Don't Work At All
One last issue before I wrap up this article: your filter won't be called at all if HttpApplication.CompleteRequest() gets called one way or another. The pipeline will bypass your filter and send an unmodified response. The following methods do call HttpApplication.CompleteRequest():
  • Server.Transfer()
  • Response.End()
  • Response.Redirect()

The only one that doesn't call HttpApplication.CompleteRequest() is Server.Execute() (and Response.Redirect(url, false), since Redirect() calls End() only when endResponse is true).
"You lie!!!" No, see for yourselves:
// --- HttpServerUtility.Transfer ---
public void Transfer(string path, bool preserveForm)
{
    if (this._context == null)
        throw new HttpException(...);

    this.ExecuteInternal(path, null, preserveForm);
    this._context.Response.End();
}

// --- HttpServerUtility.Execute ---
public void Execute(string path)
{
    this.ExecuteInternal(path, null, 1);
}

// --- HttpResponse.End ---
public void End()
{
    ...
    this.Flush();
    this._ended = true;
    this._context.ApplicationInstance.CompleteRequest();
}

// --- HttpResponse.Redirect ---
public void Redirect(string url, bool endResponse)
{
    ...
    if (endResponse)
        this.End();
}

If HttpApplication.CompleteRequest() is called during an event, the ASP.NET HTTP pipeline interrupts request processing once the event handling completes. If it's any consolation, it will still fire the EndRequest event.
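The consequence for the module above can be modeled with a toy pipeline. This is not ASP.NET code: ToyPipeline is hypothetical, lists a subset of the HttpApplication events in the order they fire (with "ProcessRequest" standing in for the page handler itself, which is not an event), and only imitates the behavior just described. Once a handler calls CompleteRequest(), every remaining step except EndRequest is skipped, so a filter that would be installed in ReleaseRequestState never gets installed.

```csharp
using System;
using System.Collections.Generic;

// Toy model, not ASP.NET: once CompleteRequest() is called, every remaining
// step is skipped except EndRequest.
public class ToyPipeline
{
    // A subset of HttpApplication events in firing order; "ProcessRequest"
    // stands in for the page handler itself, which is not an event.
    private static readonly string[] Steps =
    {
        "BeginRequest", "AuthenticateRequest", "AuthorizeRequest",
        "ResolveRequestCache", "AcquireRequestState", "PreRequestHandlerExecute",
        "ProcessRequest",
        "PostRequestHandlerExecute", "ReleaseRequestState",
        "UpdateRequestCache", "EndRequest"
    };

    private readonly Dictionary<string, Action<ToyPipeline>> handlers =
        new Dictionary<string, Action<ToyPipeline>>();
    private bool completed;

    // Steps that actually ran, in order.
    public List<string> Fired { get; } = new List<string>();

    public void On(string step, Action<ToyPipeline> handler) => handlers[step] = handler;

    public void CompleteRequest() => completed = true;

    public void Run()
    {
        foreach (string step in Steps)
        {
            if (completed && step != "EndRequest")
                continue;   // short-circuit straight to EndRequest

            Fired.Add(step);
            if (handlers.TryGetValue(step, out Action<ToyPipeline> handler))
                handler(this);
        }
    }
}
```

For example, registering a "ProcessRequest" handler that calls CompleteRequest() (the role Response.Redirect() plays in a real page) and a "ReleaseRequestState" handler that installs the filter leaves the filter uninstalled, while EndRequest still fires. Without the CompleteRequest() call, the ReleaseRequestState handler runs as usual.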
Conclusion
I hope this article was a wake-up call in terms of programming with web standards in mind. We looked at a real-world example of writing a response filter and enforcing XHTML 1.0 compliancy. This is a highly experimental article: you will most likely discover other gotchas when you validate your pages against the W3C MarkUp Validation Service.
As indicated at the beginning of the article, ASP.NET 2.0 is supposed to bring a host of useful features to the table. Why bother with response filters then? Why all this hacking? XHTML compliancy is promised anyway. Well, we could just sit around and drool over the features of Whidbey, Yukon and what have you, but we have real jobs, real projects and real paychecks. Besides, I illustrated only one practical application of a filter. There are many more.
There's a common misconception that ASP.NET is easy to master and that it just takes care of everything for you. Not so. ASP.NET is not easy. It's powerful. It puts you in the driver's seat. So hacking isn't going away any time soon.