If you're developing a website, especially a public-facing site, search and indexability are probably high on your list of requirements and priorities. Yesterday, searchability of rich internet applications (RIAs) got extra attention with Adobe's announcement that it is providing technology to search engines to improve indexability of SWF/Flash-based applications.
This is particularly interesting to me because I think getting the right SEO behavior for RIAs depends on end-to-end solutions that involve complementary server-side techniques (not totally unexpected from someone like me who has been working on ASP.NET). I've also been presenting on search and RIAs, and on enabling indexability for Silverlight and Ajax apps, at various conferences for a couple of years; links to those are below.
There has always been a question mark around the indexability of RIAs, whether they're built in Flash, Silverlight, or even Ajax. The fundamental problem is that static indexing of an RIA is likely to turn up only the user interface of the application, and not the interesting and meaningful data fetched by application logic and presented dynamically to the user. Indexing an application binary or script is akin to having desktop search index winword.exe instead of your documents: not very useful. As applications become more and more dynamic, most folks now see indexing something like a raw SWF binary as less and less useful.
The two key things for improving SEO (besides general techniques like URL canonicalization, friendly URLs, and search-engine-friendly URL rewriting) are ensuring indexability and facilitating relevance. Indexability comes from addressing what content is visible to the crawler, as well as where the crawler should look. Relevance is primarily addressed by creating content that is deep-linkable and interesting (so folks actually link to it).
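To make the canonicalization and deep-linking point a bit more concrete, here is a minimal Python sketch of one possible normalization policy, so that each deep-linkable view of an app is reachable at exactly one URL. The rules (lowercased scheme and host, default port dropped, empty query parameters dropped, remaining parameters sorted, fragment discarded) are assumptions chosen for illustration, and `canonicalize_url` is a hypothetical helper, not part of any framework.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Ports implied by the scheme, which can be left out of the canonical form.
DEFAULT_PORTS = {"http": 80, "https": 443}


def canonicalize_url(url: str) -> str:
    """Hypothetical helper: map equivalent URLs for the same view onto one canonical URL.

    Assumption: plain host names only; IPv6 literals and user:password@ parts are not handled.
    """
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = parts.hostname or ""  # urlsplit already returns the hostname lowercased
    port = parts.port
    if port is None or port == DEFAULT_PORTS.get(scheme):
        netloc = host
    else:
        netloc = f"{host}:{port}"
    path = parts.path or "/"
    # parse_qsl drops blank values by default; sorting makes ?b=2&a=1 and ?a=1&b=2 identical.
    query = urlencode(sorted(parse_qsl(parts.query)))
    # The fragment is discarded: crawlers generally do not treat it as a separate page,
    # which is why this sketch assumes deep-link state lives in the query string.
    return urlunsplit((scheme, netloc, path, query, ""))


print(canonicalize_url("HTTP://Example.com:80/products?b=2&a=1#top"))
# http://example.com/products?a=1&b=2
```

In practice the canonical URL would be what the app emits for its own deep links and what the server redirects or points to, so link popularity accrues to a single address rather than being split across variants.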
The Adobe/Google announcement takes indexing a step beyond static binary content by attempting to simulate human behavior, interacting with the application to extract textual content and links from it. I can see how automated clicks and the like might let the crawler cause an application to execute some partial logic, but a lot of application interaction is driven by actual meaningful text input (e.g., keywords in a search box), where "meaningful" often depends on the specific application in question. The announcement does not go into any details, which I think is somewhat strange, so there is naturally some guessing going on. The comment stream also contains a good mix of folks questioning whether the approach will even work (for example, here and here).
It is interesting to see the buzz, and it is good to see search engines at least begin to think beyond indexing static HTML. Technically speaking, this sort of approach to indexability lends itself pretty easily to Silverlight apps as well. First, a Silverlight application packaged in a .xap file is easily cracked open without a special SDK; it is simply a zip file, after all. Any static textual XAML content is easily parsed by virtue of being XML. Second, it is easy to embed and extract metadata via an additional file within the zip archive. Third, the Silverlight DOM itself can be easily walked and inspected programmatically to detect all text, links, and images that are being visualized by the control. Finally, it is possible to automate the application thanks to the extensible API that Silverlight offers for enabling accessibility and screen reader capabilities. Silverlight apps can also support deep linking, which is important for facilitating relevance. Essentially, Silverlight provides simple APIs that let the app consume the URL it was loaded from and use information on the URL query string to load and display appropriate data.
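Because a .xap is an ordinary zip archive and XAML is XML, a crawler-side extractor for static content needs nothing beyond standard libraries. The following Python sketch illustrates only that first point; it is not anything Adobe, Google, or Microsoft ships. It assumes the XAML is stored as loose `.xaml` entries in the archive (XAML compiled into an assembly as a resource would need an extra step), and the attribute names it treats as text or links (`Text`, `Content`, `NavigateUri`) are an assumption about which properties carry indexable content. `extract_xap_text` is a hypothetical name.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# Attribute names assumed (for illustration) to carry user-visible text or links.
TEXT_ATTRIBUTES = {"Text", "Content"}
LINK_ATTRIBUTES = {"NavigateUri"}


def local_name(tag: str) -> str:
    """ElementTree reports namespaced names as '{namespace}Name'; keep only 'Name'."""
    return tag.rsplit("}", 1)[-1]


def extract_xap_text(xap_bytes: bytes) -> dict:
    """Collect static text and link targets from every loose .xaml entry in a .xap (zip)."""
    texts, links = [], []
    with zipfile.ZipFile(io.BytesIO(xap_bytes)) as xap:
        for name in xap.namelist():
            if not name.lower().endswith(".xaml"):
                continue  # skip assemblies and other resources
            root = ET.fromstring(xap.read(name))
            for element in root.iter():
                for attr, value in element.attrib.items():
                    # Markup extensions such as {Binding Title} are filled in at runtime,
                    # so there is nothing meaningful to index statically.
                    if not value or value.startswith("{"):
                        continue
                    if local_name(attr) in TEXT_ATTRIBUTES:
                        texts.append(value)
                    elif local_name(attr) in LINK_ATTRIBUTES:
                        links.append(value)
                # Text written as element content, e.g. <Run>Hello</Run>.
                if local_name(element.tag) in ("TextBlock", "Run") and element.text and element.text.strip():
                    texts.append(element.text.strip())
    return {"texts": texts, "links": links}
```

Note that anything data-bound, such as `Text="{Binding Title}"`, is skipped. That is exactly the limitation of static indexing: the interesting data only shows up once application logic runs.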
All that said, it will be interesting to see how well this approach pans out, as there are a number of challenges in simulating a user realistically, especially without any hints from the application developer. In the meantime, this is as good a time as any to share the slide deck on building indexable Silverlight applications that I used in my presentation at SMX, a search conference that took place here in Seattle last month.
The deck above illustrates the pattern for supplying alternate content, with sample markup. At a high level, the approach combines client-side logic with server-side rendering and sitemaps to address the "what" and "where" of indexability. What makes this implementation of the pattern interesting is that it produces alternate content without requiring the developer to implement two applications and do double work.
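The following Python sketch approximates only the server-side half of that idea, under stated assumptions. A single data source (a hypothetical `catalog` list) feeds both the server-rendered alternate content (the "what") and a sitemap listing one deep link per item (the "where"), so the crawlable fallback is generated from the same data rather than built as a second app. The `?item=` URL scheme and all helper names are hypothetical; the sitemap follows the public sitemaps.org 0.9 format. The RIA is assumed to load the same data and take over that region of the page when it runs; that client-side part is not shown.

```python
from html import escape
from urllib.parse import urlencode
from xml.etree.ElementTree import Element, SubElement, tostring

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def deep_link(base_url: str, item_id) -> str:
    """Hypothetical URL scheme: one query-string deep link per item."""
    return f"{base_url}?{urlencode({'item': item_id})}"


def render_alternate_content(base_url: str, items: list[dict]) -> str:
    """Render the data the RIA would display as plain HTML (the "what" for crawlers).

    Assumption: the host page emits this fragment where the plug-in's fallback content
    goes, so crawlers and script-less browsers see real text and real links.
    """
    rows = []
    for item in items:
        href = escape(deep_link(base_url, item["id"]))  # escapes '&' and quotes for HTML
        rows.append(f'<li><a href="{href}">{escape(item["title"])}</a></li>')
    return '<div class="alternate-content"><ul>' + "".join(rows) + "</ul></div>"


def build_sitemap(base_url: str, items: list[dict]) -> str:
    """List every deep link in sitemaps.org format (the "where" for crawlers)."""
    urlset = Element("urlset", xmlns=SITEMAP_NS)
    for item in items:
        url = SubElement(urlset, "url")
        SubElement(url, "loc").text = deep_link(base_url, item["id"])  # ElementTree escapes '&'
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + tostring(urlset, encoding="unicode")


catalog = [{"id": 42, "title": "Widgets & Gadgets"}, {"id": 7, "title": "Tools"}]
print(render_alternate_content("http://example.com/catalog", catalog))
print(build_sitemap("http://example.com/catalog", catalog))
```

Generating both outputs from one list is the point of the design: when the data changes, the fallback HTML and the sitemap change with it, and the deep links they advertise are the same URLs the app itself understands.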
I especially like the pattern because it works today across search engines, applies to Silverlight/Flash apps as well as Ajax apps, and has a number of side benefits around network optimization and graceful degradation in script-less environments, as listed in the deck above. It's always nice to pick a single pattern that helps solve multiple problems Web developers encounter regularly. I first blogged about this approach back in 2007, right after MIX07. I had a chance to present it again at MIX08, and you can check out the presentation and demo in the session video (skip ahead about 42 minutes into the talk for the part on indexability).
Any questions on the approach? Feel free to ask below. I'm also curious what your thoughts are on the alternate content approach, or on the overall subject of search for RIAs.
Posted on Wednesday, 7/2/2008 @ 6:34 AM
Silverlight