At Yahoo!, I spend a fair amount of time on the Web. Not just wasting time on My Yahoo! and Digg, but trying to make sense of the manifold ways that sites expose structured (and not-so-structured) data. There isn't enough agreement about how a web page should describe, say, sports scores or movie showtimes or
WebPath is a little project of mine that provides a way to query the web of structured data hidden in messy markup, from the most simple query (like "does this page use microformats?") up to complex, web-spanning requests (like "can I get to a movie review within two clicks from here?").
The ideas behind WebPath were published at the XML 2007 conference, where lots of great conversations started. Now, more discussions are happening, since the full source code behind the project has been released under a BSD-style license. So, what makes WebPath worth looking at?
Regardless of what type of markup a site uses under the hood, my team -- the Structured Web Group -- looks for ways to pull that structured information into an index where it can be sliced and diced in ways that help searchers get instant gratification via immediate answers to their queries. To accomplish this, the team needed to get a feel for which approaches work best with particular sites. It turns out that as these requirements surfaced, a local Hack Day was coming up, providing the opportunity we needed to try some experimental approaches. In little more than 24 hours, the project progressed from scratches in a notebook to a working implementation.
WebPath at this point is a technical endeavor, suitable for those who enjoy command-line programming. It's written in Python. (It pretty much had to be, given the timeline of its development!) You could consider WebPath a kind of XPath engine, to use the W3C jargon, but its main focus is on the "Web" part of its name. WebPath includes access to a module that accepts messy and invalid pages--just like the ones all over the Web--and "tidies" them up into something easier to work with and analyze.
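To give a feel for the tidy-then-query idea, here is a minimal sketch using only the Python standard library. It is not WebPath's own code or API: the names `TidyingParser`, `tidy`, and `uses_hcard` are made up for illustration, and the tidying rules are deliberately crude. It treats the hCard `vcard` class as the sign that a page uses microformats.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Elements that never take a closing tag in HTML.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TidyingParser(HTMLParser):
    """Build a well-formed element tree from sloppy HTML (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.root = ET.Element("document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        attributes = {name: value or "" for name, value in attrs}
        element = ET.SubElement(self.stack[-1], tag, attributes)
        if tag not in VOID_TAGS:
            self.stack.append(element)

    def handle_endtag(self, tag):
        # Close the nearest matching open element, implicitly closing anything
        # left open inside it; stray end tags with no match are ignored.
        for depth in range(len(self.stack) - 1, 0, -1):
            if self.stack[depth].tag == tag:
                del self.stack[depth:]
                break

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):  # text after a child element belongs in its tail
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def tidy(html):
    """Return a well-formed tree for a possibly invalid HTML string."""
    parser = TidyingParser()
    parser.feed(html)
    parser.close()
    return parser.root


def uses_hcard(tree):
    """Answer "does this page use microformats?" for the hCard case."""
    return any("vcard" in el.get("class", "").split() for el in tree.iter())
```

With a tree in hand, the limited XPath subset in `xml.etree.ElementTree` is enough for simple queries, for example `tidy(page).findall(".//*[@class='fn']")` to pull out formatted names.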
What's next for WebPath as an open source project? Already developers are getting involved, using WebPath in their own projects, pitching in to offer new features, and, of course, helping fix bugs! I maintain a plain-text todo list of areas for immediate attention, but in the bigger picture I'd like to wire this up to Hadoop as a front end for truly web-scale queries. You too can help shape the future of this project. If you've taken a look at WebPath, I'd love to hear from you.
-m
Golden Web photo by Cyron.
Built for the most recent Yahoo! Hack Day, NewsGlobe is a fun new way to browse Yahoo! News Top Stories. It pulls together two existing Yahoo! services and takes advantage of the performance enhancements in the latest Adobe Flash Player.
The NewsGlobe consists of three basic pieces: a Yahoo! News Top Stories RSS feed, a geo-encoding web service from Yahoo! Maps, and a free, open-source library of 3D classes for ActionScript 3 called Papervision3D. The application loads the Yahoo! News RSS feed every few minutes and extracts the dateline for each story. It sends this descriptive textual information to the Maps service to find a matching location and thereby return a latitude/longitude coordinate. Then it's simply a matter of using the 3D classes in ActionScript to create a visually engaging experience that's either automated or interactive.
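The feed-to-coordinates pipeline described above can be sketched in a few lines of Python (NewsGlobe itself is ActionScript, so this is only an illustration). Everything specific here is an assumption: the dateline pattern, the function names `extract_datelines` and `locate`, and the `geocode` callable that stands in for the Maps web service, whose actual request format is not covered in this post.

```python
import re
import xml.etree.ElementTree as ET

# Assumed dateline shape: an upper-case place name opening the description,
# e.g. "NEW DELHI (Reuters) - ..." or "PARIS, France (AP) - ...".
DATELINE = re.compile(r"^\s*([A-Z][A-Z .'-]+[A-Z])\b")


def extract_datelines(rss_text):
    """Return (title, place) pairs for items whose description has a dateline."""
    stories = []
    for item in ET.fromstring(rss_text).iter("item"):
        title = item.findtext("title", default="")
        match = DATELINE.match(item.findtext("description", default=""))
        if match:
            stories.append((title, match.group(1).title()))
    return stories


def locate(stories, geocode):
    """Attach coordinates using a caller-supplied geocode(place) function.

    geocode is a stand-in for the real web service call; it should return
    (latitude, longitude), or None when no matching location is found.
    """
    located = []
    for title, place in stories:
        coords = geocode(place)
        if coords is not None:
            located.append((title, place, coords))
    return located
```

Stories with no recognizable dateline, or with a place the geocoder cannot match, are simply dropped, which matches the idea of drawing markers only where a coordinate could be found.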
Papervision3D makes it incredibly easy to create a 3D scene, add 3D objects to it, and specify where the camera (i.e., the user's viewpoint) should be located. For each story location where we could discern a lat-long coordinate, we draw a marker object and place it in the proper position on a sphere representing the Earth. The display is calculated and drawn in real time. This allows us to animate the view over time and even let the user change the view by interacting with the objects in the scene.
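Placing a marker on the globe comes down to converting a lat-long pair into a 3D point on a sphere. Here is a rough sketch of that math in Python rather than ActionScript; the function name `latlong_to_xyz` and the axis convention (y toward the north pole, latitude 0 / longitude 0 on the positive z axis) are assumptions, not necessarily what NewsGlobe or Papervision3D uses.

```python
import math


def latlong_to_xyz(lat_deg, lon_deg, radius=1.0):
    """Convert latitude/longitude in degrees to a point on a sphere.

    Assumed convention: y points to the north pole, and the point at
    latitude 0, longitude 0 lies on the positive z axis.
    """
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    x = radius * math.cos(lat) * math.sin(lon)
    y = radius * math.sin(lat)
    z = radius * math.cos(lat) * math.cos(lon)
    return (x, y, z)
```

A different engine may expect a different handedness or "up" axis, in which case the three components just need to be swapped or negated to match.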
Since the final product itself is a SWF file, NewsGlobe works online as a web application or off-network as a scaled-down, embedded, customizable badge. It could easily be integrated into a Yahoo! Widget or packaged as an Adobe AIR application to run locally on the desktop. By passing in different RSS feeds or search terms, it'd be possible to filter the news and watch stories occurring in a specific part of the world, from a particular category, or matching other keywords.
Who built it
Yossi Vardi, prime mover at ICQ many web years ago (recently covered by Scoble at Davos), quoted Teddy Roosevelt at a TechCrunch event a few months back:
"It is not the critic who counts; not the man who points out how the strong man stumbles, or where the doer of deeds could have done them better. The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood; who strives valiantly; who errs, who comes short again and again, because there is no effort without error and shortcoming; but who does actually strive to do the deeds; who knows great enthusiasms, the great devotions; who spends himself in a worthy cause; who at the best knows in the end the triumph of high achievement, and who at the worst, if he fails, at least fails while daring greatly, so that his place shall never be with those cold and timid souls who neither know victory nor defeat."
Nice going, hackers. You guys are -- seriously -- the reason why I work here at Yahoo!.
(Slideshow photos from Maximum Mitch)
PhotoSoup is a visual word puzzle generator built with tag-photo pairs from Flickr. This prototype uses public, Creative-Commons-licensed photos and Flickr's open APIs. The object of the game is to find all the tags hidden (up, down, across, and diagonal) in the puzzle square before time runs out. The photos around the perimeter of the puzzle are the clues: the player views the photos and tries to discover the associated tag words.
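For the curious, the grid-building part of a word search like this can be sketched as follows. This is not PhotoSoup's actual code (the front end is Flex and the back end isn't described here); it is a simple random-placement approach in Python, with the names `fits` and `build_puzzle` and the default grid size chosen purely for illustration. It also assumes tags are plain single words.

```python
import random
import string

# Eight directions as (row step, column step): across, down, up, and diagonals.
DIRECTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0), (1, 1), (1, -1), (-1, 1), (-1, -1)]


def fits(grid, word, row, col, d_row, d_col):
    """True if word fits from (row, col) without clashing with placed letters."""
    size = len(grid)
    for i, letter in enumerate(word):
        r, c = row + i * d_row, col + i * d_col
        if not (0 <= r < size and 0 <= c < size):
            return False
        if grid[r][c] not in (None, letter):
            return False
    return True


def build_puzzle(words, size=10, seed=None, attempts=500):
    """Hide each word in a size x size grid; return (grid, placements)."""
    rng = random.Random(seed)
    grid = [[None] * size for _ in range(size)]
    placements = {}
    # Place the longest words first, while the grid still has room for them.
    for word in sorted((w.upper() for w in words), key=len, reverse=True):
        for _ in range(attempts):
            d_row, d_col = rng.choice(DIRECTIONS)
            row, col = rng.randrange(size), rng.randrange(size)
            if fits(grid, word, row, col, d_row, d_col):
                for i, letter in enumerate(word):
                    grid[row + i * d_row][col + i * d_col] = letter
                placements[word] = (row, col, d_row, d_col)
                break
        else:
            raise ValueError(f"could not place {word!r}; try a larger grid")
    # Fill the remaining cells with random letters to hide the words.
    for row in grid:
        for c, cell in enumerate(row):
            if cell is None:
                row[c] = rng.choice(string.ascii_uppercase)
    return grid, placements
```

The `placements` dictionary records where each tag starts and which way it runs, which is what a front end would need to check a player's selection against.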
PhotoSoup began as a hack for Yahoo! Europe's internal hack day back in October 2007. When our puzzle hack won the "Coolest Hack" prize, we were inspired to finish what we'd started, and share the fun with the rest of the world.
To create and play a new puzzle, the player enters a topic (or tag), such as "zoo" or "landscape" or "food." Or, you can enter your own Flickr screen name to generate a puzzle built with images from your photostream. Of course, you can also try this with public photos from your friends and contacts, by entering their screen names.
Who built it
How do I get it
Finally, it's worth mentioning that it is also possible to embed a PhotoSoup puzzle on your homepage, or in a blog. Simply follow the link to embed the puzzle, and we will generate the code that you should include.
PhotoSoup started as a hack for an internal Yahoo! Europe Hack Day this past fall. Despite an approaching deadline for participation at the upcoming 2008 WWW conference, we could no longer resist the challenge, and had to participate. Over lunch in Barcelona, we started collecting and discussing ideas that we could implement in 24 hours and that had some potential to win.
We came up with PhotoSoup, a visual word puzzle generator that allows players to create word search puzzles with tag-photo pairs taken from Flickr. The tag is hidden in the puzzle, and the photo is shown as a clue. The objective is to find all hidden tags in the puzzle before you run out of time. The jury loved it, and we won the prize for "coolest hack."
Most of us on the PhotoSoup team work at Yahoo! Research in Barcelona, Spain. The Barcelona lab is one of Yahoo!'s three international research laboratories outside the U.S., and has a truly international character. We currently speak 14 different languages and represent an equal number of nationalities. The Barcelona lab is young -- just recently we celebrated our second anniversary. Our work is focused on topics related to Web retrieval, mining, natural language processing (NLP), and multimedia.
The photo above shows the PhotoSoup team members. Simon Overell is missing from this photo. At the time of our hack, he was an intern at Yahoo! Research, but he's now back in London on a mission to finish his PhD at Imperial College. Lluis Garcia (Yahoo! Spain) is our man with Flash running through his veins. Lluis single-handedly developed the PhotoSoup front-end in Flex and connected all the back-end components produced by the other team members.
Saludos desde España,
Roelof