Posts

 

Ident Engine

Without much conscious thought, most of us have built identities across the web. We have filled in profiles and uploaded photos, videos, reviews and bookmarks. The Ident Engine uses semantic web APIs to bring together these web footprints.

Ident Engine site
A List Apart – Ident Engine

The Ident Engine is part of a year-long personal project combining social graph data with other open data sources. With over 1.3 billion hCard profiles on the web and RDF growing in strength, the semantic web has reached a tipping point. I built the Ident Engine to show that constructing user experiences by blending semantic web APIs and other open data sources is not only practical, but offers exciting opportunities.

Related projects

Identify – Firefox add-on
Identify is a Firefox add-on that combines identities across various social network/media sites. It uses the Ident Engine library.

Social Graph Explorer
This is the original server-side application on which the Ident Engine library is based.

Microformat API – OAuth
This experimental demo shows how OAuth can be used to add privacy to open data sources like microformats and feeds.

Presentations

I have given a number of talks about blending social graph data and other open data sources over the last year. The slide deck from Twiist.be in Leuven (below) contains the most up-to-date version of my talks.

Articles

Resources

If you are interested in this area, I have listed a number of different resources you may find useful.

  • Identity
  • Microformats
  • Projects

Collecting favicons

Icons will help users quickly scan lists of social media sites. However, pulling favicons directly from sites is a lot more challenging than you may first think. Google have provided a simple API to do this.

http://www.google.com/s2/favicons?domain=flickr.com

The only drawback I have found with using Google’s favicons API is that all the icons are on white backgrounds. I have created a set of PNG favicons with transparent backgrounds for the 70 social media sites which are part of the Ident Engine download, or you could try Paul Lloyd’s beautifully designed set of Social Media Icons.
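
If you want to wire the icons up yourself, a minimal sketch along these lines works; the 'elsewhere' element ID is a hypothetical placeholder for your own list container:

    // Build favicon images using Google's favicon service (URL pattern shown above).
    function faviconFor(domain) {
      return 'http://www.google.com/s2/favicons?domain=' + encodeURIComponent(domain);
    }

    ['flickr.com', 'twitter.com', 'last.fm'].forEach(function (domain) {
      var img = document.createElement('img');
      img.src = faviconFor(domain);
      img.alt = domain + ' favicon';
      // 'elsewhere' is a placeholder ID for your own markup
      document.getElementById('elsewhere').appendChild(img);
    });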

When creating an “elsewhere on the web” listing, try to make the entries as easy to read as possible. Full URLs or even domain names are hard to scan-read; always try to use the site name, e.g. http://www.flickr.com/people/glennjonesnet/ is displayed just as Flickr (glennjonesnet). It is important to display the username to reassure the user that you have found the right account.
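
A rough sketch of turning a full profile URL into that friendlier form; the siteNames map and the assumption that the username is the last path segment are illustrative only, since real sites vary in their URL structures:

    // Turn a profile URL into a "Site name (username)" label.
    var siteNames = { 'www.flickr.com': 'Flickr', 'twitter.com': 'Twitter' };

    function accountLabel(profileUrl) {
      var a = document.createElement('a');
      a.href = profileUrl; // let the browser parse the URL for us
      var site = siteNames[a.hostname] || a.hostname;
      var segments = a.pathname.split('/').filter(Boolean);
      var username = segments[segments.length - 1];
      return site + ' (' + username + ')';
    }

    accountLabel('http://www.flickr.com/people/glennjonesnet/');
    // => "Flickr (glennjonesnet)"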

Ident Engine - Web addresses

The Ident Engine will accept four different ways of inputting a web account address:

  • http://twitter.com/glennjones
  • twitter.com glennjones
  • twitter glennjones
  • glennjones@twitter.com

There is currently a lot of discussion about finding the easiest way for a user to describe their ownership of a web account. Most systems currently provide a URL that represents an individual, but these can be long and hard to remember, for example:

  • http://twitter.com/glennjones
  • http://upcoming.yahoo.com/user/62673/
  • http://www.mybloglog.com/buzz/members/glennjones/hcard
  • http://www.linkedin.com/in/glennjones

The OpenID community has found that asking users to type in their full URL causes some user experience issues and often creates a barrier to the user.

Any type of web identity discovery system needs two visible pieces of information to work: the DNS name of the service to query and an identifier for the user. The hidden third element is the context of the query, i.e. we are searching for information about a person and not some other type of entity.

To facilitate easy parsing it is useful to have a delimiting character that divides the two pieces of information.

How you then construct the input string to identify an individual is less of a technical problem and more an issue of ease and convention. The one I personally find easiest to use is an abbreviated version of the URL:

http://twitter.com/glennjones becomes twitter.com glennjones

The pattern is the website’s DNS name, then a space, then the user identifier, i.e. the username. With a closed system like Ident Engine you can go one step further and accept just the name of the website.

The other pattern gaining traction at the moment is the Webfinger email structure. This aims to make use of the fairly well-ingrained understanding of how you construct an email address, translating that convention into creating an address for any web account:

http://twitter.com/glennjones becomes glennjones@twitter.com

This works best when there is a corresponding email account, but can feel a little strange where one does not exist, as in the Twitter example above.
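
As a sketch of what accepting all four forms involves (this is not Ident Engine’s own code; the knownSites map simply stands in for its list of supported sites):

    // Normalise the four input forms above into a { domain, username } pair.
    var knownSites = { twitter: 'twitter.com', flickr: 'www.flickr.com' };

    function parseAccount(input) {
      input = input.trim();

      // Full URL: everything after the domain is treated as the identifier
      var url = input.match(/^https?:\/\/([^\/]+)\/(.+?)\/?$/);
      if (url) {
        return { domain: url[1], username: url[2] };
      }

      // Webfinger-style address: glennjones@twitter.com
      var at = input.match(/^([^@\s]+)@([^@\s]+\.[^@\s]+)$/);
      if (at) {
        return { domain: at[2], username: at[1] };
      }

      // Space-delimited: "twitter.com glennjones" or "twitter glennjones"
      var parts = input.split(/\s+/);
      if (parts.length === 2) {
        var domain = parts[0].indexOf('.') > -1 ? parts[0] : knownSites[parts[0]];
        return { domain: domain, username: parts[1] };
      }

      return null; // unrecognised format
    }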

I have yet to find any publicly available user research which backs up either approach.

  • Identity
  • Projects

Ident Engine - On Google Code

As well as helping to manage the source code, Google’s Code site provides a number of other features, such as an RSS feed for change notifications and an area to raise issues. In the future, if people wish to contribute to the development of Ident Engine, the site will enable collaborative working.

http://code.google.com/p/identengine/

  • Projects

Experiments in Data Portability

I have posted a screencast with videos of the demos and synced audio from the podcast.

On Tuesday, I gave a Skillswap talk about some of the experimental work I have been engaged in. For the last few months I have had a growing interest in the data portability design patterns currently used in social media sites. As we start to see the maturing of earlier technologies such as OpenID and the growth of some exciting new ones like OAuth and XRDS-Simple, the focus is moving to the user experience. In this presentation I look at patterns of data portability that could be implemented now, and at some of the lessons that could be learnt from the early adopters.

I showed some of what is currently possible using the Google Social Graph API coupled with a microformats parser in a little application called the Social Network Explorer. The other major demo was my work on a Microformats API which makes use of OAuth to provide user-controlled release of private data.

Finally, I asked the question: do we really own our data in the same way as we own other property? If the value of data decays over time, should we be looking more at systems which make use of attention metrics?

Experiments in Data Portability


Skillswap has always been a vibrant part of the Brighton web scene, and it was great to finally get around to giving a talk, having listened to many great Skillswap talks in the past. Bruce also gave a great talk on OAuth and issues around the password anti-pattern.

James Box will be releasing a podcast of the audio soon.

URLs for topics in the presentation

  • Data Portability
  • Identity
  • Microformats

Microformats to Portable Contacts API converters

I have been doing some research work into the new Portable Contacts API. It’s designed to enable users to securely port their friend lists or address books from one site to another.

Currently, most social networking and address-book sharing sites have their own proprietary contacts APIs. These APIs often provide some sort of distributed authorisation model for a developer to code against. If you integrate a few of these different interfaces into your site, the levels of complexity become nightmarish.

From this tangle was born the password anti-pattern, where you ask users for their username and password, log in to another site as them, and scrape the data you need. This is a bad idea on many counts, but looking at the historical alternatives you can understand why most developers took this route.

OAuth was designed to help remove the complexity of multiple distributed authorisation models, whereas the Portable Contacts API tackles the second element of friend-list sharing by providing a common API and discovery mechanism.

The Portable Contacts API is built on open specifications which can be used across sites. It uses a number of pre-existing technologies, with its data structures based around OpenSocial and vCard, which should create a common access pattern and contact schema for everyone to use.

I have built a number of interfaces to help evaluate any data loss or added ambiguity that may occur when converting microformats into the Portable Contacts API data schema.

They do not yet provide the querying, sorting and pagination features, nor the endpoint URL elements, of the specification. It’s also worth mentioning that the Portable Contacts API is still in development and these interfaces are based on the Draft C specification.
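
To give a feel for the mapping involved, here is a minimal sketch; the output field names follow the Portable Contacts draft, while the input shape (fn, n, email, url, photo) is a hypothetical parsed hCard, since every parser structures its output slightly differently:

    // Map a parsed hCard (hypothetical shape) onto Portable Contacts fields.
    function hCardToPortableContact(hcard) {
      return {
        displayName: hcard.fn,
        name: {
          formatted: hcard.fn,
          givenName: hcard.n && hcard.n['given-name'],
          familyName: hcard.n && hcard.n['family-name']
        },
        emails: (hcard.email || []).map(function (value) { return { value: value }; }),
        urls: (hcard.url || []).map(function (value) { return { value: value }; }),
        photos: (hcard.photo || []).map(function (value) { return { value: value }; })
      };
    }

Even in a trivial case like this you can see where data loss or ambiguity creeps in, for example when typed hCard values have to be flattened into the Portable Contacts structures.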

  • Microformats
  • Projects

OAuth.net

OAuth is one of those technologies that, when first understood, makes your mind race with the possibilities. Authentication is a dry subject at the best of times, but OAuth shines not because the technology is cool, but because it has the capability to fundamentally change the way people use the web.

Today Madgex open-sourced OAuth.net, which, as it sounds, is an OAuth library for .NET. This is a spin-off from an internal research project we are working on.

OAuth.net provides full OAuth consumer and provider support for the core specification. The library facilitates secure API authentication in a simple, standard way for desktop and web applications. We are putting the full source code, with an MIT licence, on Google Code.
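
To give a sense of the kind of work the library takes off your hands, here is a minimal sketch of the OAuth 1.0 signing step, shown in JavaScript (Node.js) rather than .NET; this is not the OAuth.net API itself, and encodeURIComponent is only an approximation of the stricter percent-encoding the specification requires:

    // Build an OAuth 1.0 signature base string and sign it with HMAC-SHA1.
    var crypto = require('crypto');

    function sign(method, url, params, consumerSecret, tokenSecret) {
      // Sort and encode the request parameters into a single string.
      var normalised = Object.keys(params).sort().map(function (key) {
        return encodeURIComponent(key) + '=' + encodeURIComponent(params[key]);
      }).join('&');

      var baseString = [
        method.toUpperCase(),
        encodeURIComponent(url),
        encodeURIComponent(normalised)
      ].join('&');

      var signingKey = encodeURIComponent(consumerSecret) + '&' +
                       encodeURIComponent(tokenSecret || '');

      return crypto.createHmac('sha1', signingKey).update(baseString).digest('base64');
    }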

http://lab.madgex.com/oauth-net/
http://code.google.com/p/oauth-dot-net/

The thing that really stuck in my mind when I started to investigate OAuth was the joy of the user experience. With sometimes as little as two clicks I can share data between sites that would have taken me ages to re-enter. Somehow this experience has that nice feeling of using the iPhone interface for the first time. Maybe it’s just because I am so sick of re-entering things over and over again.

I have been fascinated with allowing users to share information for over three years now, whatever the name used: Portable Social Networks, Social Graph or Data Portability.

There is a vast amount of publicly available data about each of us embedded in the pages of sites we use. There are hundreds of millions of social network profile pages. The Microformats community has done a great job defining practical ways to extract semantic structures of data, but the issues of privacy, authentication and authorisation are beyond its scope.

OAuth allows users to share data between sites which they do not wish to make public. It does this without resorting to the heinous practice of asking a user to hand over their account details for other sites.

If you have a Google Mail account, take a look at this demo. It is a very simple demo, but it shows some of the power of the concept.

http://lab.madgex.com/oauth-net/googlecontacts/

There are also demos for Fire Eagle and extracting protected Microformat resources.

OAuth.net is the first full library for .NET. Our hope is that by sharing some of our work we will help move forward the adoption of OAuth. I will be talking about OAuth and OAuth.net at Barcamp London 5 on 27/28 September. Bruce Boughton, one of the library’s developers, will be talking about the project at Barcamp Brighton 3 on 6/7 September.

Enjoy

  • Projects

Microformats test-suite concept

I have been working on a concept for a new microformats test-suite. I need a comprehensive test-suite before I can move the development of UfXtract forward. Rather than just build something in isolation I thought it would be nice to find a way to share this work with the community.

I have written two POSH patterns, “testsuite” and “testfixture”. They follow the principles of microformats design:

  • They are self-describing and created with HTML, with no hidden metadata
  • Building a test should be easy, even for those who only author HTML
  • They are not linked to any one programming language and should be easy to share
  • They allow for the creation of an in-browser Testrunner

For example, go to

http://ufxtract.com/testsuite/hcard/

The earliest tests for hCard used vCards to describe the expected output. As the community has moved forward, it has designed microformats which are independent of external specifications. So this test-suite is designed around the concept of a standardised data structure, in this case expressed in JSON, although it could be converted into XPath to test XML or other formats.
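
As a purely hypothetical illustration of the idea, an expected-output fixture for a single hCard might look something like this (the exact property layout used by the real test-suite may differ):

    // Hypothetical parser-neutral expected output for one hCard test.
    var expected = {
      vcard: [{
        fn: ['John Doe'],
        url: ['http://example.com/'],
        email: ['john@example.com']
      }]
    };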

I have started to build a small console app, which will spider the HTML and create NUnit/C# class files for my build tests. Although this is specific to my own parser’s development, it should be easy to do the same for other programming languages and projects.

Parsing “testsuite” and “testfixture”

I have already set up UfXtract to parse these patterns into JSON/XML. It would not take much for other microformats parser developers to construct profiles for these POSH patterns.

Here is an example of the output:
http://lab.backnetwork.com/ufXtract/?url=http%3A%2F%2Fufxtract.com%2Ftestsuite%2Fhcard%2Fhcard1.htm&format=test-fixture&output=json

The Testrunner

If you go to http://ufxtract.com/testsuite/hcard/hcard1.htm and press Alt+X, you can see a working demonstration of the Testrunner.

I have observed that most parser developers use comparative testing as their main tool to quickly understand how the complex rules and optimisations are applied. So I have built a JavaScript Testrunner which allows for simple comparative testing between parsers.

It uses a number of techniques to standardise both access to the parsers’ APIs and the JSON output. Please note that at this stage the JSON standardisation process can cause a test to be marked as failed when it could be judged to have passed. Most of the current differences in parser output come down to whether a value is stored as a single property or an array of properties.
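
A minimal sketch of that kind of normalisation, wrapping every property value in an array before comparing so that single-value versus array differences do not register as failures; this is not the Testrunner’s actual code:

    // Force every object property into an array, then compare the results.
    function normalise(value) {
      if (Array.isArray(value)) {
        return value.map(normalise);
      }
      if (value && typeof value === 'object') {
        var out = {};
        Object.keys(value).forEach(function (key) {
          out[key] = [].concat(normalise(value[key]));
        });
        return out;
      }
      return value;
    }

    // Naive comparison: assumes properties appear in the same order in both objects.
    function isEqual(expected, actual) {
      return JSON.stringify(normalise(expected)) === JSON.stringify(normalise(actual));
    }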

At the moment the Testrunner only works with a single testfixture; it would not take much to extend it to run a whole test-suite.

I would love to add Operator and other parsers to the Testrunner.

Proof of concept

This is very early proof-of-concept stuff. I would like to ask a number of questions before moving it forward:

  • Are people interested in the idea of shared test-suites?
  • What do you think of the approach?
  • Can you see any big issues with the concepts?
  • If you already have tests/test-suites, would you be willing to add them to the project?
  • Would you be interested in contributing to a project like this?

  • Microformats
  • Projects

Semantic Camp London

I spent this weekend at Semantic Camp learning all about RDF and speaking about parsing microformats. Although the focus of the event was quite narrow, it attracted a nice group of people who were all passionate about open data portability in one way or another. I think I now have a much clearer view of how RDF fits into the concepts of the semantic web. Over the next few weeks I am going to take some time to explore SPARQL. I also had the chance to talk to a number of interesting people.

I used my talk to try to answer the question Drew McLellan posed a year ago in his presentation Can Your Website be Your API? I used some of my experience building ufXtract and parsing social network information to see if it is possible in the real world.

Presentation: Can your website be your API and real life (PDF, 1.25MB)


I believe that with a few small changes you can create successful ‘read only’ APIs using only microformats embedded in your HTML. The Q&A moved on to whether you could create ‘read/write’ APIs and some sort of HTTP verb discovery mechanism.
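
As a sketch of the idea, ignoring cross-origin restrictions and the need for a proper microformats parser; the example URL is a placeholder:

    // Treat a page as a read-only API: fetch the HTML and pull hCard data
    // out of it with plain DOM queries.
    fetch('http://example.com/profile/')
      .then(function (response) { return response.text(); })
      .then(function (html) {
        var doc = new DOMParser().parseFromString(html, 'text/html');
        var cards = [].slice.call(doc.querySelectorAll('.vcard')).map(function (card) {
          var fn = card.querySelector('.fn');
          var url = card.querySelector('a.url');
          return {
            fn: fn ? fn.textContent.trim() : null,
            url: url ? url.getAttribute('href') : null
          };
        });
        console.log(cards);
      });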

After my talk I spent some time chatting with Dan Brickley, author of FOAF. We exchanged a few ideas about where microformats and RDF are going and talked about parsing issues. Dan has made me think hard about the lack of a formal framework for developing microformats parsers. This was reinforced by Gareth Rushgrove’s discussion of different microformats parsers. There are three things that seem to be missing:

  1. A codified specification, something along the lines of XSD, or even a proprietary profile that all the current parsers could use.
  2. A strong set of tests which we could use to check whether a parser works to a pre-defined standard. We should have both positive and negative tests to make things like required attribute validation work.
  3. A standard output format so that developers can abstract libraries, easily swapping from one to another. It would also allow us to do programmatic comparative testing.

This whole area is something that needs to be raised on the microformats-dev list.

Andrew Walkingshaw gave a great talk on automatic indexing using natural-language processing. The whole area of NLP keeps resurfacing in my work at Madgex, and I am becoming more convinced that there is a place for this technology in the aggregation of structured and unstructured information sources. The guys from the BBC deserve a special mention for all the great work they demonstrated.

Jon Linklater-Johnson created Semantopoly for the event, yet another great effort from the man who gave us the very cool CSS specificity card game. Just take a look at the Flickr pictures; Semantopoly was a great game, made better by the tweets that were created around it.

I would like to thank Tom and Daniel for organising a fantastic event, and the sponsors who helped support it: BBC Backstage, Hedgehog Lab, OpenLink Software, Osmosoft and Talis.

  • Events
  • Microformats
