SEO for Ajax and Silverlight Applications

This post is a writeup of an Ajax pattern I demonstrated at one of my MIX07 talks for improving the indexability of Ajax, Silverlight, and RIA applications in general, as part of an SEO effort...

Search engine optimization, or SEO, is key for web sites, and a key ingredient of SEO is indexability, which works well for static content. However, the model breaks down as applications become more dynamic and start using Ajax or RIA-based technologies such as Silverlight (or Flash) to dynamically fetch and display content using client-side logic. In fact, often the HTML page simply becomes a shell containing presentation and behavior, and doesn't contain the meaningful data that needs to be indexed. Because the application is data-driven, there wouldn't be much useful data to extract from the page, even if it were to be indexed. I'll additionally claim that most post-back based sites also suffer from a lack of indexability, because fundamentally the model that works with search engines is navigation.

At MIX07, I presented a pattern in my Ajax patterns talk for improving the indexability of Ajax and Silverlight applications, in the context of a slide show example. I am putting down in writing what I described on stage, with the hope of fleshing it out further based on comments. As feared, writing this turned out to be lengthier than expected. However, I hope this will be an interesting read. This is mostly a discussion of the pattern itself, but if you're interested in seeing a working sample as well, check out the SlideShow server control in the presentation download.

I still don't have a good name for the pattern. Perhaps I'll call it "HTML Data Channel for RIAs" ... suggestions on naming are also welcome... or is there an existing name for this approach? At a high level, the idea is to use static HTML, or more accurately, semantically correct markup, embedded in the page as the data delivery mechanism for an RIA.

The scenario from my demo is as follows: I have a page that has a flickr-like tag cloud and a slide show with fancy transition effects for displaying the photos matching the selected tag. I am using Ajax and Silverlight to implement the next/previous interaction rather than post-backs or navigation to separate pages. Furthermore, tag selections in the tag cloud are also handled in client-side code that uses XMLHttp to fetch the list of photos and updates the slide show in-place. The problem of course is that the application doesn't score high on indexability. The search engine sees this application as a page with an empty placeholder that the client-side code happens to fill with an image on the fly. The goal then is to serve up a page that is search engine friendly without trying to detect a search engine crawler on the server, and without maintaining a parallel static version of the page.

So here is what the SlideShow server control renders out.

<div style="width:0px;height:0px;overflow:hidden;">
  <object id="myPhotosSlideShow" class="slideShow" type="application/ag-plugin">
    ...
  </object>
</div>
<script type="text/javascript">
  document.write('<div style="display: none">');
</script>
<div>
  <div>
    <img src="[photo #1 url]" alt="[photo #1 title]" />
    <span>[photo #1 description]</span>
  </div>
  <div>
    <img src="[photo #2 url]" alt="[photo #2 title]" />
    <span>[photo #2 description]</span>
  </div>
  ...
</div>
<script type="text/javascript">
  document.write('</div>');
</script>

Essentially the SlideShow control does two things - first it renders out the rich view, a Silverlight <object> tag in this case, that is hidden by default, and dynamically turned visible by client script. Secondly, it offers the capabilities of a Repeater control with an ItemTemplate that it uses for generating some alternate static content that it surrounds with a dynamically rendered hidden container.

Here is what happens in a regular browser, with script enabled: The two script blocks, one at the top of alternate content and one at the bottom, execute and the document.write calls result in surrounding the static content within a <div style="display: none"> element. This ensures that the alternate content doesn't flash in the page as it first gets rendered by the browser, only to be instantly hidden by some initialization script (like you see in some Ajax pages). I used to generally dislike document.write, but sure enough, the API has a nice characteristic that can be leveraged here - it generates markup that is parsed in order as if it were part of the document rendered by the server.

Here is what happens when the page is seen by the search engine, where script is disabled: The two script blocks never get to execute, and as a result the static content isn't wrapped up in a hidden container. Furthermore the static content is designed to be simple and focused on expressing data, not layout or style, and uses a set of tags such as <img>, <span>, <h1>, <h2>, ..., <a> etc. for what they semantically mean (hence the notion of semantically correct markup) so that it indexes well.

You might be wondering why the <object> tag is invisible by default. This is so that it doesn't consume space on the page if the user is viewing the page with script turned off. If script is enabled, the script gets to toggle its visibility. An interesting fall-out of this mechanism is that a reasonable script-disabled experience of the page can be accommodated without too much pain, if that is a requirement for your application.
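As a rough illustration of that visibility toggle, here is a minimal sketch. It assumes the wrapper is the zero-sized <div> around the <object> tag shown earlier; the function name revealRichView and the 640x480 dimensions are hypothetical, and this is not necessarily what the SlideShow control itself emits.

```javascript
// Hypothetical helper (not from the SlideShow control): un-collapses the
// zero-sized wrapper around the <object> tag. Because this only runs when
// script is enabled, script-disabled clients never see an empty plugin area.
function revealRichView(wrapper, width, height) {
  wrapper.style.width = width + 'px';
  wrapper.style.height = height + 'px';
  wrapper.style.overflow = 'visible';
  return wrapper;
}

// Browser-only usage, guarded so the snippet is inert outside a page.
if (typeof document !== 'undefined') {
  var slideShow = document.getElementById('myPhotosSlideShow');
  if (slideShow && slideShow.parentNode) {
    revealRichView(slideShow.parentNode, 640, 480); // assumed dimensions
  }
}
```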

In terms of the scenario at hand, the script on the page extracts the images and descriptions to be shown in the slide show from the static HTML content (or the server could render out another copy of the data as a JSON blob). This has another benefit. In lots of Ajax pages, script often issues an XMLHttp request upon page load to fetch the initial data. This consumes an additional request back to the server, and also increases the perceived load time. This additional connection and latency can simply be avoided by serving up the first page's worth of data in the initial page rendering.
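The extraction step can be sketched roughly as follows. This assumes the static markup has the shape shown earlier (one <div> per photo containing an <img> and a <span>); the function name extractSlides and the shape of the returned objects are hypothetical, not the sample's actual code.

```javascript
// Hypothetical helper: reads slide data out of the static markup.
// `container` is the element wrapping the per-photo <div> blocks.
function extractSlides(container) {
  var slides = [];
  var images = container.getElementsByTagName('img');
  for (var i = 0; i < images.length; i++) {
    var img = images[i];
    // Each photo block is assumed to be <div><img/><span/></div>,
    // so the description is the first <span> next to the image.
    var spans = img.parentNode.getElementsByTagName('span');
    slides.push({
      url: img.getAttribute('src'),
      title: img.getAttribute('alt'),
      description: spans.length > 0 ? spans[0].textContent : ''
    });
  }
  return slides;
}
```

In a page, this might be called with the element that wraps the static content, for example extractSlides(document.getElementById('staticSlides')), where the id staticSlides is made up for illustration since the markup above does not give that container an id. The resulting array can then be handed to the slide show in place of an initial XMLHttp response.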

<aside>There is one side note I have to make here (particularly interesting to me given I get to work on both server and client sides of the platform)... server-side rendering of pages and HTML is still very much useful in the RIA world, even if post-backs are used less and less as logic moves into client-side code and services on the server.</aside>

To complete the indexability discussion, there is one additional problem to solve. If you have a large dataset, you aren't going to embed it all as static content in one shot. Instead, you're going to use some sort of paging or filtering mechanism. In this scenario, the tags on the page are the filters. Clicking on one of them causes the script to fetch the specific section of data and update the slide show in-place. However, you certainly want the search engine to index all your data. This is where site maps come into play to complement and complete this pattern for achieving indexability.

It is typical that you will use query string parameters (or better yet, rewritten friendly URLs) to filter down the results and the static data sent down to the client in the initial page request. Effectively your page URLs start to look like SlideShow.aspx?tag=travel or perhaps SlideShow.aspx/travel. Rather than simply having the one SlideShow.aspx URL as the single entry point for your RIA, each URL variation becomes an entry point. While the client-script on your page continues to work unchanged by simply extracting the data it operates on from the static data to start with, and using XMLHttp requests to fetch additional data as the user interacts with the page, your site as a whole now has a set of URLs representing every section of your entire data set, and they can be listed in a site map. Something like so:

<urlset>
  <url>
    <loc>http://.../SlideShow.aspx?tag=travel</loc>
  </url>
  <url>
    <loc>http://.../SlideShow.aspx?tag=nature</loc>
  </url>
  ...
</urlset>
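A site map like the one above could be generated from the list of tags. The following is only a sketch under a few assumptions: the helper names buildSitemap and escapeXml are hypothetical, the example.com address stands in for the host elided above, and the xmlns value is the namespace defined by the sitemaps.org protocol.

```javascript
// Hypothetical helper: escapes characters that are special in XML text.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Hypothetical helper: builds sitemap XML with one <url> entry per tag,
// using the ?tag=... query string form of the entry point URLs.
function buildSitemap(pageUrl, tags) {
  var lines = [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
  ];
  for (var i = 0; i < tags.length; i++) {
    var loc = pageUrl + '?tag=' + encodeURIComponent(tags[i]);
    lines.push('  <url>');
    lines.push('    <loc>' + escapeXml(loc) + '</loc>');
    lines.push('  </url>');
  }
  lines.push('</urlset>');
  return lines.join('\n');
}

// Placeholder host; substitute the real address of the page.
var sitemapXml = buildSitemap('https://example.com/SlideShow.aspx', ['travel', 'nature']);
```

The tag is URL-encoded before being placed in the query string, and the full URL is XML-escaped before being placed in <loc>, so tags containing spaces or ampersands still produce a well-formed file.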

If you have questions or some feedback, or related ideas please do share using the comment form... I am sure there are other ways to think of this problem... given the importance, and the amount of discussion around this topic.


Posted on Thursday, 5/24/2007 @ 2:36 PM | #Ajax


Comments

24 comments have been posted.

sdether

Posted on 5/24/2007 @ 4:12 PM
I guess this really counts as "separate script disabled experience" rather than separate rendering for search engines, since most engines consider rendering different content for their bots as deception and can get you blacklisted.

Considering that rich experiences are becoming so commonplace, you'd hope that search engines would allow some kind of attached or inline RDF markup to better service the SEO need. I suppose it would be just ripe for SEO abuse.

Nikhil Kothari

Posted on 5/24/2007 @ 4:27 PM
Actually the whole point is that you're rendering the same content for all clients, and your client logic is simply attaching a different visualization/behavior for the same content when script is enabled. Of course this depends on how you interpret "rendering different content" ... will be interesting... would certainly be interested in getting a point of view from one or more of the search engines...

Don D

Posted on 5/24/2007 @ 5:05 PM
To extend the hiding pattern a bit, you can take advantage of css selectors to make this a little easier. Say you have several places on a page that you want hidden: it would be a bit tedious to have each control render out the script and document.write and display:none bits for each one, and error prone if it doesn't come from one control. Instead, you can add a css rule that goes something like ".jsEnabled .niftyArea {display:none;}", add the .niftyArea class to the div/tag containing your special stuff, and then add the .jsEnabled class to the body tag with js (write a script tag with code doing this after the body tag of the document). To a robot without javascript, the css selector that does the hiding won't apply, because the .jsEnabled class never got added to anything. For everyone else, any block with the class .niftyArea will be hidden.

Henrik N

Posted on 5/24/2007 @ 10:23 PM
You are probably aware of this already, but there are a couple of wide-spread scripts for doing exactly this (some part at least) for flash. Check out SWFObject and UFO. (Was not allowed to include the links!)
I think they are really much nicer packaged than this document.write and inline style method. One should be able to do something similar here. Keep up the good work!

Nikhil Kothari

Posted on 5/24/2007 @ 10:31 PM
Don: Not sure the css class approach works - the goal is if script is disabled, the content should be visible, and if script is enabled, it should be invisible, but furthermore shouldn't flash away... If content is visible by default, then if you set it invisible in script using css classes or any other dynamic style setting approach it will result in a momentary flash/content relayout. If content is invisible by default, it defeats the purpose of this whole approach, since script may not run to turn it visible. Hence the document.write as the only way to satisfy all requirements.

Henrik: SWFObject and similar scripts are about adding an object tag into the page. The problem being discussed here is not about inserting an object tag dynamically but about using HTML markup as the data model. Two unrelated points...

Henrik N

Posted on 5/25/2007 @ 12:20 AM
Well, I don't think they are that unrelated. You write a good deal about indexability, and I think that one of the main reasons for using SWFObject is to be able to present alternative html content, just like you do with your document.write elements.

And if you use it together with a flash movie that extracts that html alternative content just like you describe (I have actually done just that in a project), you have a similar scenario (if I'm not totally misreading you).

Nikhil Kothari

Posted on 5/25/2007 @ 5:54 AM
Henrik: I didn't know SWFObject had the capability to present alternate content, but that said, if I am not mistaken in guessing how it works, it doesn't help here in the context of indexability, as it would essentially be running script to show the alternate content, and that script simply doesn't run when a page is indexed by a crawler.

Ian O

Posted on 5/25/2007 @ 7:39 AM
The CSS approach mentioned by Don should work, although I normally use something like:

document.documentElement.className = 'js'

in the head section of the html: this adds a class of 'js' to the html element. You then construct your CSS selectors like this:

div#static { display:block; }
.js div#static { display:none; }

Of course, you could always use the DOM to add the dynamic content in once the page has loaded to enhance the accessibility of the page.

Jason Nussbaum

Posted on 5/25/2007 @ 8:34 AM
Nikhil - the way SWFObject works is by replacing a div's content with the Flash content at runtime. Since the DIV's original content is/can be/should be filled in (either manually or via a server side tag/repeater/etc) with alternate content (eg: a list of thumbnails linking to full images), it does achieve the indexability required.

Nikhil Kothari

Posted on 5/25/2007 @ 9:00 AM
Ian - I know how you can apply css rules... but try actually using css to hide content when script is enabled, and to leave content visible when script is not, without resulting in an intermediate re-rendering-caused flash in the UI... it's not quite possible.

Jason - I don't see how ... the search engine does *not* run script, so anything you do at runtime doesn't achieve indexability.

Don D

Posted on 5/25/2007 @ 10:52 AM
You won't get the flash of unstyled content if you use the script to do the DOM modification (adding the css class to the body/html tag) soon enough (in the head or immediately after the body tag, in either case before you run other, potentially longer running, scripts). Yes, there is an inherent race condition there, but it is a fairly safe one to bet on. The same race is there in the original post's method (adding document.write with styled div tags) anyway, so the modification I'm suggesting isn't really different, just extending it for simplicity.

It would be nice to know exactly when a particular browser (re)applies css rules vs. scripts to see how best to avoid the flash of unstyled content. Most folks do it the empirical way: trial and error. This is a very useful technique for SEO and accessibility purposes, however you do it, and having it as a habit can make life much easier.

volkan

Posted on 5/27/2007 @ 11:45 PM
Hello Nikhil, your web development helper project files cannot be downloaded. Could you look into this, please?

Wesley Bakker

Posted on 5/28/2007 @ 3:03 AM
http://www.w3schools.com/tags/tag_noscript.asp

tried to make full comment, but all words unallowed

POhEe.com

Posted on 5/28/2007 @ 7:12 PM
Really good article. Would you mind telling us what the result was after you implemented this? Did it really push up your ranking in Google?

Nikhil Kothari

Posted on 5/28/2007 @ 11:00 PM
PohEe: The reason indexability is important is so that when the search engine crawler discovers your page, it has good and meaningful data/content to index. This ensures your content is searchable, and given 70+% of traffic to most sites comes from search engines, this is super-critical if you have a site. Searchability and indexability are essentially prerequisites for discoverability, monetization, etc. Ranking depends on indexability, but also depends on inbound links.

Martin Normark

Posted on 5/29/2007 @ 11:28 AM
Surely a great idea on how to make AJAX and RIAs more search engine friendly...

But isn't this another variation of what SEOs describe as cloaking? (Which also happens to be strictly forbidden by the search engines...)

Whether the distinction between robot and user is made by the web server itself before the page renders, or by using scripts, it still does something similar: it shows one version to the users, and another version to the search engines.

How do you look at that?

James Crowley

Posted on 6/14/2007 @ 1:25 PM
Nikhil - I'll be interested to read your full article when it arrives. This is something I'm actively investigating, given the number of clients who now want full "web 2.0" functionality, alongside full SEO-optimisation.

Just to follow up on Jason's point about SWFObject - he is entirely correct. You place the accessible, seo-ed content within the content area - this appears to search engines, and if javascript is disabled. You then attach a bunch of javascript, that when enabled, and when flash is available, swaps out the content area for the appropriate Flash object tag - which works very nicely. I'm sure the exact same could be done for Silverlight.

Robert S. Robbins

Posted on 7/16/2007 @ 8:36 AM
The worst SEO offense that Microsoft development tools commit is encouraging you to use underscores in your file names. Underscores in file names don't help you in the search engines because only dashes count to separate keywords in the file name. Somebody would have to be searching using keywords with underscores to find web pages named with underscores. I know FrontPage 2003 would encourage you to use underscores in file names by generating new pages that use them. I have not checked to see if Expression Web has retained that bad habit.

Stephen Cassels

Posted on 7/31/2007 @ 6:49 AM
Hi

I like your solution and have something similar on our site to present Image galleries with no AJAX only Javascript, see;

www.pkc.gov.uk/Education and learning/Schools/Schools - development projects/Investment in Learning/North Inch Community Campus/North Inch Community Campus image gallery.htm

The primal requirement being WCAG, test it out with JavaScript and CSS turned off and I believe you get the SEO version. I took an excellent idea developed by Thierry Koblentz and plugged it into a Microsoft Content Management Server template. My understanding is that SEO and Accessibility are very similar, the search engine bot is often talked of in terms of a screen reader user.

I'm glad to read that SEO and Accessibility are being worked on in the AJAX / Silverlight world.

Ali Rıza Babaoglan

Posted on 8/18/2007 @ 3:31 AM
I am interested in Search Engine Optimization. But it is the first time that I have heard about SEO for Silverlight Applications.
Thanks for a good summary.

Jason

Posted on 9/7/2007 @ 11:31 AM
If you called an ashx page as the src for a script tag and it output all the values and a document.write of the contents, would this be seen by the robots, or would this text not get picked up at all since it's in a script tag?

ie

<script src="myjsgen.ashx" type="text/javascript"></script>

This would output something like document.write('some content here');

Since looking at the source of the generated content does not show the document.write - would it be safe to assume that spiders / robots will not see content generated this way either?

Nikhil Kothari

Posted on 9/8/2007 @ 12:25 PM
Jason - Yes, you can certainly link to a script - that by itself shouldn't make a difference.

Benj Arriola

Posted on 11/30/2007 @ 11:17 AM
I totally agree with you Nikhil. Good points. I played around with AJAX and SEO just to learn how to do it well and ended up with this ajaxoptimize.com

Clean links on the plain content layer and my presentation design layer is what has the js events.

Softweb Solutions

Posted on 5/2/2008 @ 10:58 PM
Thanks for the good point for seo..

Regards
Arpit Kothari
Sr. SEO Expert