Tuesday, July 01, 2008

Flash content to be indexed better by Google and Yahoo

[UPDATE]: Looks like Peter Elst had a very similar reaction to this news and posted a better-worded post here.

There is a lot of news today about Flash files being indexed better by Google and Yahoo, thanks to some new 'Adobe Flash Player technology'. This 'new Flash Player technology' will let search engine crawlers go deeper into SWFs and so index areas of a SWF that were previously inaccessible.

If Adobe has developed a specialized Flash Player for search engines to crawl with, how about releasing that version of the Flash Player to individual developers as well? A Flash Player like that would help testing professionals build automated tests for Flash-based applications. Flex application testing can already be automated, but only if developers add hooks into their Flex applications so that testing software (QTP, etc.) can talk to them. This new Flash Player build might let developers do the same for Flash IDE-based applications, without any extra effort.

While the news as a whole seems interesting, I have a few concerns.

Both Google and Adobe say that developers and companies will not need to change anything to 'enable' their content to be searchable. Any text that appears in a SWF now becomes searchable. At a high level this sounds great, but it also raises alarms. In its own blog post about the news, Google says:
If you prefer Google to ignore your less informative content, such as a "copyright" or "loading" message, consider replacing the text within an image, which will make it effectively invisible to us.
Umm... really? How is that gonna be practical? It's not just 'loading' and 'copyright'. If you have a form, you'll also have to hide text like 'first name' and 'last name'. Will we have to create images for all of those? Wouldn't that add a lot of unnecessary hassle for developers? And if developers are lazy and don't do this, won't we start seeing a lot of results like this one?

There seems to be some confusion between Google and Adobe. Adobe says
"As a result, millions of pre-existing RIAs and dynamic Web experiences that utilize Adobe Flash technology, including content that loads at runtime, are immediately searchable ..."

and Google says
"We currently do not attach content from external resources that are loaded by your Flash files. If your Flash file loads an HTML file, an XML file, another SWF file, etc., Google will separately index that resource.."

Secondly, if an XML file that a SWF loads is indexed separately, the XML file, not the SWF, will appear in search results. What benefit would that give users? (Of course, Google says it is working on this, so presumably it will be fixed.)

If Google tries to index an RIA that has a Java backend, will the spider hit the service directly? (That's what the statement above seems to imply.) And if it really does, and the service returns different results each time, what will Google add to the index, and how? Kinda confusing...

I'd prefer that making a SWF's text content searchable be an opt-in feature. If developers could choose exactly which portions of their text search engines should index, wouldn't that be better?

Of course, search engines being able to follow the different navigation paths within a SWF and find content there is a good thing. But how will a search engine take the user directly to a particular state? (Flex apps have a deep-linking feature, but not many people use it, and in many places it might not make sense in a Flash app.)

Finally, am I the only one who sees that Adobe and Google are each trying to show how smart they are on their own? [UPDATE: Jack Schofield of guardian.co.uk did notice that the Google page didn't give any credit to Adobe. Umm, they did link to Adobe's press release though :P ] Adobe's press release says "Adobe Flash Technology Enhances Search Results for Dynamic Content and Rich Internet Applications", and Google goes "Google's ... new algorithm for indexing textual content in Flash files of all kinds". Hmm..
