ufXtract microformats parser

ufXtract is a new microformats parser I have built to help explore the real-world issues of creating portable social networks. Although I have previously designed a number of spiders that can find the most common hCard and XFN structures, this is my first full-blown parser. It has been built from the ground up to take configuration objects, which allow it to parse different microformats or POSH patterns. It was important that I could parse more general patterns, such as the joint hCard-XFN pattern being promoted by the microformats community for use with friends lists.
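To make the idea of a configuration-driven parser concrete, here is a minimal sketch in Python rather than ufXtract's C#. The configuration keys (`root_class`, `properties`, `attribute`), the `extract` function and the sample markup are all hypothetical and are not ufXtract's actual API. It also assumes well-formed XHTML input.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration object describing the joint hCard-XFN pattern.
# The key names are illustrative assumptions, not ufXtract's real format.
HCARD_XFN = {
    "name": "hcard-xfn",
    "root_class": "vcard",
    "properties": {
        "fn": {"class": "fn"},                          # text content
        "url": {"class": "url", "attribute": "href"},   # attribute value
        "rel": {"class": "url", "attribute": "rel"},    # XFN relationship
    },
}

# Assumed sample input: a friends list marked up with hCard and XFN.
SAMPLE = """<ul>
  <li class="vcard"><a class="fn url" rel="friend met" href="http://example.com/alice">Alice Example</a></li>
  <li class="vcard"><a class="fn url" rel="colleague" href="http://example.com/bob">Bob Example</a></li>
</ul>"""


def has_class(element, name):
    """True if `name` is one of the space-separated class tokens."""
    return name in element.get("class", "").split()


def extract(markup, config):
    """Return one dict per root element, filled in from the config rules."""
    root = ET.fromstring(markup)
    results = []
    for node in root.iter():
        if not has_class(node, config["root_class"]):
            continue
        item = {}
        for prop, rule in config["properties"].items():
            # Take the first descendant that matches the rule's class.
            for child in node.iter():
                if not has_class(child, rule["class"]):
                    continue
                attr = rule.get("attribute")
                if attr:
                    value = child.get(attr)
                else:
                    value = "".join(child.itertext()).strip()
                if value:
                    item[prop] = value
                    break
        results.append(item)
    return results


cards = extract(SAMPLE, HCARD_XFN)
```

Supporting another pattern then means writing another configuration dictionary instead of another parser. This sketch ignores nested roots and multi-valued properties, which a real parser would need to handle.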

http://lab.backnetwork.com/ufXtract/ (now superseded by a later version and site: http://ufxtract.com/)

After some further testing I am going to start producing a number of portable social network demos and posts. This should also provide others with experimental APIs. By sharing this early work I hope in some way to add to the important technical and architectural discussions that are taking place.

I have already added hCard-XFN, rel="me", rel="next" and hAtom to the parser. These are the four cornerstone microformats/patterns required to gather profiles and content from other social networks, although for technical/speed reasons ufXtract currently parses only the hEntry sub-element of hAtom.
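As an illustration of the two simplest patterns, rel="me" and rel="next", the following Python sketch collects the matching links from a page. It uses only the standard library. The class name `RelLinkCollector`, the sample page and the base URL are assumptions made for the example and have nothing to do with ufXtract's own code.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed sample page: one pagination link and two identity links.
PAGE = """
<html><head><link rel="next" href="/friends?page=2" /></head>
<body>
<a rel="me" href="http://blog.example.org/">My blog</a>
<a rel="friend met" href="http://example.com/bob">Bob</a>
<a rel="me nofollow" href="/about">About me</a>
</body></html>
"""


class RelLinkCollector(HTMLParser):
    """Collects href values of <a> and <link> elements by rel token."""

    def __init__(self, base_url, wanted=("me", "next")):
        super().__init__()
        self.base_url = base_url
        self.wanted = set(wanted)
        self.links = {rel: [] for rel in wanted}

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        # rel is a space-separated list, so "me nofollow" still counts as "me".
        for rel in (attributes.get("rel") or "").lower().split():
            if rel in self.wanted:
                # Resolve relative links against the page's own URL.
                self.links[rel].append(urljoin(self.base_url, href))


collector = RelLinkCollector("http://example.com/friends")
collector.feed(PAGE)
```

A spider can follow the `next` links to walk a paginated friends list, and the `me` links to find the same person's profiles on other sites.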

The component also has extendable output options. So far, I have built a simple text format for debugging, plus JSON and XML for building services. For the more technically minded, ufXtract is a .NET component written in C#. It uses a combination of DOM structures and XPath expressions, and can typically parse a page in 50-200ms.
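One way to picture extendable output options is a small registry of formatter functions that all accept the same parsed data. The Python sketch below is only an assumption about how such a design might look. The function names, the `FORMATTERS` registry and the output shapes are hypothetical and do not describe ufXtract's actual text, JSON or XML formats.

```python
import json
import xml.etree.ElementTree as ET

# Assumed parsed result: a list of flat property dictionaries.
ITEMS = [
    {"fn": "Alice Example", "url": "http://example.com/alice", "rel": "friend met"},
]


def to_text(items):
    """Indented plain text, intended for debugging."""
    lines = []
    for index, item in enumerate(items):
        lines.append(f"item {index}")
        for key, value in sorted(item.items()):
            lines.append(f"  {key}: {value}")
    return "\n".join(lines)


def to_json(items):
    """JSON document with the items under a single top-level key."""
    return json.dumps({"items": items}, indent=2, sort_keys=True)


def to_xml(items):
    """XML document with one <item> element per parsed item."""
    root = ET.Element("items")
    for item in items:
        node = ET.SubElement(root, "item")
        for key, value in sorted(item.items()):
            ET.SubElement(node, key).text = value
    return ET.tostring(root, encoding="unicode")


# Adding a new output format means registering one more function here.
FORMATTERS = {"text": to_text, "json": to_json, "xml": to_xml}


def render(items, output="json"):
    """Serialise parsed items with the named formatter."""
    return FORMATTERS[output](items)
```

Calling `render(ITEMS, "xml")` or `render(ITEMS, "text")` produces the other formats from the same data, so the parsing code never needs to know which output was requested.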

At the moment, I am building a test suite to fine-tune the component's compliance. It still has some small issues with most of the compound microformats, which I am trying to address. If you have any comments or want to point out issues, please add them below; I would like as much feedback as possible.

  • Microformats
  • Projects

Data formats:

API