<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DATAMI</title>
	<atom:link href="http://www.datami.co.uk/?feed=rss2" rel="self" type="application/rss+xml" />
	<link>http://www.datami.co.uk</link>
	<description>Just another WordPress site</description>
	<lastBuildDate>Thu, 29 Mar 2012 11:08:50 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>The DATAMI Interface</title>
		<link>http://www.datami.co.uk/?p=82</link>
		<comments>http://www.datami.co.uk/?p=82#comments</comments>
		<pubDate>Thu, 29 Mar 2012 08:58:15 +0000</pubDate>
		<dc:creator>Mathieu d'Aquin</dc:creator>
				<category><![CDATA[development]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[datami]]></category>
		<category><![CDATA[demo]]></category>
		<category><![CDATA[html]]></category>
		<category><![CDATA[interaction]]></category>
		<category><![CDATA[interface]]></category>
		<category><![CDATA[javascript]]></category>
		<category><![CDATA[tag-cloud]]></category>
		<category><![CDATA[web]]></category>

		<guid isPermaLink="false">http://www.datami.co.uk/?p=82</guid>
		<description><![CDATA[As described in the previous post, one of the advantages of relying on flexible semantic technologies is that access to the information only requires querying the knowledge base that was built out of monitoring the web traffic of the user (with the datami-proxy) and processing the corresponding resources to be semantically annotated (with the datami [...]]]></description>
			<content:encoded><![CDATA[<p>
As described in <a href="http://www.datami.co.uk/?p=68">the previous post</a>, one of the advantages of relying on flexible semantic technologies is that access to the information only requires querying the knowledge base that was built out of monitoring the web traffic of the user (with the <a href="http://www.datami.co.uk/?p=24">datami-proxy</a>) and processing the corresponding resources to be semantically annotated (with the <a href="http://www.datami.co.uk/?p=59">datami text processing component</a>). That makes it possible for us to build an interface entirely in HTML/JavaScript that focuses on providing appropriate visualization and interaction mechanisms to the user.
</p>

<p>
The current version of this interface has now been <a href="http://code.google.com/p/datami/source/browse/#svn%2Ftrunk%2Fdatami-interface">released in our code base</a> (with the associated <a href="http://code.google.com/p/datami/wiki/RunningTheDatamiInterface">short documentation</a>). At the moment, it looks like the screenshot below. Compared to the version in the <a href="http://www.datami.co.uk/?p=46">demo video</a>, the ability to define filters based on the date has been added. It is now possible to ask about the popular entities that I encountered &#8220;last week&#8221;, for example.
</p>

<center>
<div id="attachment_86" class="wp-caption alignnone" style="width: 510px"><a href="http://www.datami.co.uk/wp-content/uploads/2012/03/Screen-shot-2012-03-28-at-15.16.37.png"><img src="http://www.datami.co.uk/wp-content/uploads/2012/03/Screen-shot-2012-03-28-at-15.16.37.png" alt="datami-interface screenshot" title="datami-interface screenshot" width="500" class="size-full wp-image-86" /></a><p class="wp-caption-text">datami-interface screenshot</p></div>
</center>

<p>
The interaction with this interface is reasonably straightforward: entities and websites are shown in tag clouds. Hovering over one of them shows more information about it in a tooltip, and clicking on it adds a filter. Using the calendar at the top, the two boundaries of a range of dates can also be selected as a filter. Hovering over the type of entities in one of the three entity clouds allows selecting the type to display, depending on the entities available with the current filters.
</p>

<p>
In terms of development, the core of the interface shows the results of SPARQL queries, similar to the <a href="http://www.datami.co.uk/?p=68">one described before on this blog</a>, with additional patterns added to reflect the current filters. Additional work is now required, especially to make the interface more appealing to users.
</p>
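
<p>
To illustrate the idea of adding patterns for the current filters, here is a minimal sketch in Python (the interface itself is written in JavaScript). It starts from the ranking query described in the previous post and appends one group of triple patterns per active filter. The function name, its parameters and the way each filter is expressed are assumptions made for this example, not the code of the released interface. In particular, the website filter assumes that websites are stored as IRIs.
</p>
<pre>
```python
RELATED_TO = "http://datami.co.uk/ontology/relatedTo"
ENTITY_REF = "http://fise.iks-project.eu/ontology/entity-reference"
ENTITY_TYPE = "http://fise.iks-project.eu/ontology/entity-type"
TO_SITE = "http://weblifelog.com/ontology/toSite"


def iri(value):
    """Wrap an IRI in the angle brackets SPARQL expects."""
    return "<" + value + ">"


def build_query(entity_type, entity_filters=(), site_filter=None, limit=25):
    """Return a ranked-entity query with extra patterns for each active filter."""
    patterns = [
        f"?r {iri(RELATED_TO)} ?ea .",
        f"?ea {iri(ENTITY_REF)} ?x .",
        f"?r {iri(TO_SITE)} ?ws .",
        f"?ea {iri(ENTITY_TYPE)} {iri(entity_type)} .",
    ]
    # Assumed semantics: the same request must also mention each selected entity.
    for i, entity in enumerate(entity_filters):
        patterns.append(f"?r {iri(RELATED_TO)} ?f{i} .")
        patterns.append(f"?f{i} {iri(ENTITY_REF)} {iri(entity)} .")
    # Assumed semantics: keep only requests sent to the selected website.
    if site_filter is not None:
        patterns.append(f"FILTER(?ws = {iri(site_filter)})")
    body = "\n   ".join(patterns)
    return (
        "select distinct ?x (count(distinct ?ws) as ?nws)\n"
        "where {\n   " + body + "\n} "
        f"group by ?x order by desc(?nws) limit {int(limit)}"
    )
```
</pre>
<p>
Calling <code>build_query</code> with the DBpedia Place type and no filters gives back the basic ranking query. Each entity clicked in a tag cloud would then add two more patterns to it.
</p>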
]]></content:encoded>
			<wfw:commentRss>http://www.datami.co.uk/?feed=rss2&#038;p=82</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Querying my semantic personal web history knowledge base</title>
		<link>http://www.datami.co.uk/?p=68</link>
		<comments>http://www.datami.co.uk/?p=68#comments</comments>
		<pubDate>Wed, 28 Mar 2012 22:28:12 +0000</pubDate>
		<dc:creator>Mathieu d'Aquin</dc:creator>
				<category><![CDATA[development]]></category>
		<category><![CDATA[discussion]]></category>
		<category><![CDATA[datami]]></category>
		<category><![CDATA[interface]]></category>
		<category><![CDATA[query]]></category>
		<category><![CDATA[rank]]></category>
		<category><![CDATA[score]]></category>
		<category><![CDATA[sparql]]></category>

		<guid isPermaLink="false">http://www.datami.co.uk/?p=68</guid>
		<description><![CDATA[The beauty of the DATAMI application is that, through the combination of the datami-proxy and the datami text processing component, what is being built is a complete knowledge base of what the user, me, encounters on the Web. The architecture of the application shows an additional component to score and rank entities from this knowledge [...]]]></description>
			<content:encoded><![CDATA[The beauty of the DATAMI application is that, through the combination of the <a href="http://www.datami.co.uk/?p=24">datami-proxy</a> and the <a href="http://www.datami.co.uk/?p=59">datami text processing component</a>, what is being built is a complete knowledge base of what the user, me, encounters on the Web. The <a href="http://www.datami.co.uk/?p=12">architecture</a> of the application shows an additional component to score and rank entities from this knowledge base. In reality, the interface (as in the <a href="http://www.datami.co.uk/?p=46">demo video</a>) only needs to send the right queries to the knowledge base (as materialized in a SPARQL 1.1 compliant triple store) to obtain scored and ranked entities. 
<br/>
<br/>

So let&#8217;s look at what sort of queries we need to get, say, the 25 most popular places in my web history.
<br/>
<br/>

The <a href="http://www.datami.co.uk/?p=24">datami-proxy</a> gives me an indication of the websites that have been accessed, through requests. So basically, if I want to know which websites I have accessed, I need to query for the &#8216;?ws&#8217; such that 
<pre>
        ?r &lt;http://weblifelog.com/ontology/toSite&gt; ?ws
</pre>
?r representing the request.
<br/>
<br/>

The annotations produced by the Stanbol Enhancer Service are then connected to the requests through a &#8220;related to&#8221; relation. So, if I wanted to know about all the entities ?x I encountered in websites, I would need to add:
<pre>
        ?r &lt;http://datami.co.uk/ontology/relatedTo&gt; ?ea.
        ?ea &lt;http://fise.iks-project.eu/ontology/entity-reference&gt; ?x
</pre>
using the structure of the entity annotations returned by the enhancer service (i.e. the &#8220;entity reference&#8221; relation). Adding to it that the type of the entity should be &#8220;place&#8221;:
<pre>
        ?ea &lt;http://fise.iks-project.eu/ontology/entity-type&gt; &lt;http://dbpedia.org/ontology/Place&gt;
</pre>
I get (as ?x) all the places I encountered through my online activities.
<br/>
<br/>

Now what we want is the 25 &#8220;most popular&#8221; ones. Starting simple, we can use as a score for popularity the number of websites mentioning the entity, so the query that would obtain the 25 most mentioned ones (in SPARQL 1.1) would be:
<pre>
select distinct ?x (count(distinct ?ws) as ?nws)
where {
   ?r &lt;http://datami.co.uk/ontology/relatedTo&gt; ?ea. 
   ?ea &lt;http://fise.iks-project.eu/ontology/entity-reference&gt; ?x.
   ?r &lt;http://weblifelog.com/ontology/toSite&gt; ?ws.
   ?ea &lt;http://fise.iks-project.eu/ontology/entity-type&gt; &lt;http://dbpedia.org/ontology/Place&gt;
} group by ?x order by desc(?nws) limit 25
</pre>
Easy!
<br/>
<br/>
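
To show how such a query could be sent to the knowledge base, here is a minimal Python sketch using the standard SPARQL protocol (query passed as the <code>query</code> parameter of an HTTP GET) and the SPARQL JSON results format. The function names are made up for this example, and the endpoint URL is whatever your triple store exposes. Any URL shown in a call is an assumption.
<pre>
```python
import json
import urllib.parse
import urllib.request


def build_request(endpoint, query):
    """Prepare a SPARQL protocol query (HTTP GET) that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_ranking(payload):
    """Turn SPARQL JSON results into (entity IRI, number of websites) pairs."""
    bindings = json.loads(payload)["results"]["bindings"]
    return [(row["x"]["value"], int(row["nws"]["value"])) for row in bindings]


def fetch_ranking(endpoint, query):
    """Run the query against a live endpoint; needs a reachable triple store."""
    with urllib.request.urlopen(build_request(endpoint, query)) as response:
        return parse_ranking(response.read().decode("utf-8"))
```
</pre>
Only <code>fetch_ranking</code> touches the network. The other two functions can be tried offline, which makes it easy to check the parsing against a saved response.
<br/>
<br/>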

Adding a tiny bit of complexity, we can then also retrieve the label of the entity and use the confidence returned by the enhancer service as part of the score. The query then becomes:
<pre>
select distinct ?x ?l (count(distinct ?ws) * avg(?conf) as ?score)
where {
   ?r &lt;http://datami.co.uk/ontology/relatedTo&gt; ?ea.
   ?ea &lt;http://fise.iks-project.eu/ontology/entity-reference&gt; ?x.
   ?r &lt;http://weblifelog.com/ontology/toSite&gt; ?ws.
   ?ea &lt;http://fise.iks-project.eu/ontology/confidence&gt; ?conf.
   ?ea &lt;http://fise.iks-project.eu/ontology/entity-label&gt; ?l.
   ?ea &lt;http://fise.iks-project.eu/ontology/entity-type&gt; &lt;http://dbpedia.org/ontology/Place&gt;
} group by ?x ?l order by desc(?score) limit 25
</pre> 
<br/>
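
To make the score expression concrete, here is a small Python sketch of the same arithmetic: for each entity, the number of distinct websites multiplied by the average confidence of its annotations. It is only an illustration of what the query computes, with a made-up function name and input format. In DATAMI the triple store does this work.
<pre>
```python
from collections import defaultdict


def rank_entities(rows, limit=25):
    """Score each entity as (distinct websites) * (mean confidence), highest first.

    rows: iterable of (entity, website, confidence), one per annotation match.
    """
    sites = defaultdict(set)
    confidences = defaultdict(list)
    for entity, website, confidence in rows:
        sites[entity].add(website)
        confidences[entity].append(confidence)
    scores = {
        entity: len(sites[entity]) * sum(confs) / len(confs)
        for entity, confs in confidences.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:limit]
```
</pre>
An entity mentioned on two websites with confidences 0.5 and 1.0 gets 2 * 0.75 = 1.5, so it ranks above an entity seen on a single website with confidence 0.75.
<br/>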

And that&#8217;s done! All that is left to get the DATAMI interface is to add filters for selected entities, websites and times, and the possibility to change the type of the entities considered. ]]></content:encoded>
			<wfw:commentRss>http://www.datami.co.uk/?feed=rss2&#038;p=68</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>The DATAMI text processing component</title>
		<link>http://www.datami.co.uk/?p=59</link>
		<comments>http://www.datami.co.uk/?p=59#comments</comments>
		<pubDate>Tue, 27 Mar 2012 15:42:26 +0000</pubDate>
		<dc:creator>Mathieu d'Aquin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.datami.co.uk/?p=59</guid>
		<description><![CDATA[Following the release of the datami-proxy, the next piece of the architecture of the DATAMI application to be made available is its text processing component. To a large extent, this is where the &#8216;magic&#8217; of the application happens: this tool is the part of the architecture in charge of identifying important/significant entities within the elements [...]]]></description>
			<content:encoded><![CDATA[<p>
Following the release of the <a href="http://www.datami.co.uk/?p=24">datami-proxy</a>, the next piece of the <a href="http://www.datami.co.uk/?p=12">architecture</a> of the DATAMI application to be made available is its text processing component. To a large extent, this is where the &#8216;magic&#8217; of the application happens: this tool is the part of the architecture in charge of identifying important/significant entities within the elements of the user&#8217;s web history. This is where we use <a href="http://incubator.apache.org/stanbol/">Apache Stanbol</a>, developed in the <a href="http://www.iks-project.eu/">IKS project</a>; specifically taking benefit from the included <a href="http://incubator.apache.org/stanbol/docs/trunk/enhancer.html">enhancer service</a>.
</p>
<p>
Since the Stanbol enhancer service is doing most of the work for us, the component is actually quite simple (see the <a href="http://code.google.com/p/datami/source/browse/#svn%2Ftrunk%2Fdatami-process-text">source code</a> and <a href="http://code.google.com/p/datami/wiki/RunningTheTextProcessingComponent">short documentation</a> on our code base). It basically queries the triple store populated by the datami-proxy to obtain new texts to process, calls the enhancer service from Stanbol and stores the resulting annotations into another triple store (in connection with the information about the web interactions that led to the text). 
</p>
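<p>
The three steps above can be sketched as a short loop. The Python below is not the released component, only a minimal outline under stated assumptions: the function names are hypothetical, the fetching and storing steps are passed in as callables so the sketch stays independent of any particular triple store, and the enhancer is assumed to accept plain text sent with HTTP POST and to return RDF (the exact endpoint URL depends on the Stanbol installation).
</p>
<pre>
```python
import urllib.request


def enhancer_request(enhancer_url, text):
    """Prepare a POST of plain text to an enhancer endpoint, asking for RDF back."""
    return urllib.request.Request(
        enhancer_url,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8", "Accept": "application/rdf+xml"},
        method="POST",
    )


def process_new_texts(fetch_new_texts, enhance, store):
    """Run one batch: fetch unprocessed texts, annotate each, store the result.

    fetch_new_texts() yields (request_iri, text) pairs; enhance(text) returns the
    annotations; store(request_iri, annotations) saves them linked to the request.
    """
    processed = 0
    for request_iri, text in fetch_new_texts():
        store(request_iri, enhance(text))
        processed += 1
    return processed
```
</pre>
<p>
Keeping the request identifier next to each text is what lets the stored annotations stay connected to the web interaction that produced them.
</p>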
<p>
The results however are very interesting: as the datami-proxy collects simple data about online resources and the text processing component enhances it with information about the semantic entities they contain, what is being built and constantly updated is a knowledge base of the things I encounter online. 
</p>]]></content:encoded>
			<wfw:commentRss>http://www.datami.co.uk/?feed=rss2&#038;p=59</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>DATAMI Initial Demo</title>
		<link>http://www.datami.co.uk/?p=46</link>
		<comments>http://www.datami.co.uk/?p=46#comments</comments>
		<pubDate>Wed, 21 Mar 2012 18:42:34 +0000</pubDate>
		<dc:creator>Mathieu d'Aquin</dc:creator>
				<category><![CDATA[development]]></category>
		<category><![CDATA[alpha]]></category>
		<category><![CDATA[datami]]></category>
		<category><![CDATA[demo]]></category>
		<category><![CDATA[interface]]></category>
		<category><![CDATA[video]]></category>

		<guid isPermaLink="false">http://www.datami.co.uk/?p=46</guid>
		<description><![CDATA[We have been very busy in the last couple of weeks setting up an initial interface and demo: a first demonstrator of what DATAMI will be. This is clearly still far from the final product as we envisage it, but it shows the potential of the idea: background processes collecting and processing information about my [...]]]></description>
			<content:encoded><![CDATA[We have been very busy in the last couple of weeks setting up an initial interface and demo: a first demonstrator of what DATAMI will be. This is clearly still far from the final product as we envisage it, but it shows the potential of the idea: background processes collecting and processing information about my online activities to provide me with a personal, semantic web history dashboard. But, since a video is better than I don&#8217;t really know how many words, here it is:<br/>

<iframe width="560" height="315" src="http://www.youtube.com/embed/LZPlQtQelp0?rel=0" frameborder="0" allowfullscreen></iframe><br/>

All the data on which this demo relies has been collected using the <a href="http://www.datami.co.uk/?p=24">datami-proxy</a> running on my personal laptop. It is then transparently processed (in batch running every hour) through the <a href="http://incubator.apache.org/stanbol/docs/trunk/enhancer.html">stanbol enhancer services</a> to identify entities in the texts of the websites I encountered, as well as additional information about these entities. All these data are kept in a semantic store, which is being directly queried by the javascript interface.

Many improvements are still needed before DATAMI could be made accessible, some of them trickier than others:
<ul>
<li>Adding a way to explore and filter by ranges of dates</li>
<li>Adding information about the connections between the displayed entities</li>
<li>Putting some design into the interface&#8230; it is rather ugly and in need of a better look and feel</li>
<li>Improving data processing and querying to make it faster, and so that it can fit entirely on a standard laptop. This will require, in particular, implementing clever data trimming and caching mechanisms</li>
<li>Improving the entity ranking mechanism. At the moment, we use a combination of the popularity of the entity (how many websites mention it) and of the confidence with which Stanbol identifies it. However, Stanbol sometimes comes up with rather funny results, and more sophisticated approaches might be needed to separate the important entities from the irrelevant ones.</li>
</ul>

The next few posts will be dedicated to the implementation of the components of this demo, and to updates regarding the progress on these improvements. So, watch this space!

]]></content:encoded>
			<wfw:commentRss>http://www.datami.co.uk/?feed=rss2&#038;p=46</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>First version of the datami-proxy</title>
		<link>http://www.datami.co.uk/?p=24</link>
		<comments>http://www.datami.co.uk/?p=24#comments</comments>
		<pubDate>Thu, 08 Mar 2012 11:29:58 +0000</pubDate>
		<dc:creator>Mathieu d'Aquin</dc:creator>
				<category><![CDATA[development]]></category>
		<category><![CDATA[collection]]></category>
		<category><![CDATA[http traffic]]></category>
		<category><![CDATA[local proxy]]></category>
		<category><![CDATA[logging]]></category>
		<category><![CDATA[proxy]]></category>
		<category><![CDATA[RDF]]></category>
		<category><![CDATA[recording]]></category>
		<category><![CDATA[semantic history]]></category>
		<category><![CDATA[triple store]]></category>
		<category><![CDATA[Web history]]></category>
		<category><![CDATA[web traffic]]></category>
		<category><![CDATA[weblifelog]]></category>
		<category><![CDATA[weblifelogging]]></category>

		<guid isPermaLink="false">http://www.datami.co.uk/?p=24</guid>
		<description><![CDATA[As shown in our initial Architecture Diagram, at the basis of the DATAMI application is a mechanism to record the Web traffic generated by a user, to generate a &#8220;Web History Store&#8221;, containing enough information for the IKS Stanbol services to do their &#8220;enrichment&#8221; magic. This mechanism is essentially based on a &#8220;Local Logging Proxy&#8221;, [...]]]></description>
			<content:encoded><![CDATA[As shown in our initial <a href="http://www.datami.co.uk/?p=12">Architecture Diagram</a>, at the basis of the DATAMI application is a mechanism to record the Web traffic generated by a user, to generate a &#8220;Web History Store&#8221;, containing enough information for the IKS Stanbol services to do their &#8220;enrichment&#8221; magic. This mechanism is essentially based on a &#8220;Local Logging Proxy&#8221;, called the datami-proxy, and the first operational version of this tool is now available in our <a href="http://code.google.com/p/datami/">code base</a>.

<h2>How does it work?</h2>

The datami-proxy is essentially (like most Web proxies) a Web server that, instead of directly delivering content, captures requests to other websites and redirects them to their intended destination. There can be a lot of different reasons to use a proxy, but in our case, the goal is to log information about the corresponding Web traffic. In other terms, this tool will monitor all the Web traffic coming in and out of the computer, and store (in a semantic triple store) information about the websites accessed, the time and the textual content of the responses. The aim is, in the next component of the architecture, to annotate this textual content in order to obtain a semantically enriched personal Web history.
<br/>

In practice, the datami-proxy runs as a background process, alongside a semantic triple-store. The computer/browser of the user needs to be configured so that the datami-proxy is used as Web proxy for all HTTP requests. A <a href="http://code.google.com/p/datami/wiki/RunningTheDatamiProxy">wiki page in the code base</a> provides all the instructions necessary to install, configure and run the datami-proxy on a user&#8217;s computer. Once configured, the datami-proxy should in principle run transparently for the user, not affecting their Web experience.
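<br/>

To give a feel for the principle, here is a minimal logging proxy sketched in Python with the standard library. It is not the datami-proxy itself: the class and function names are made up, the log is a plain list standing in for the triple store, and the sketch only handles plain HTTP GET requests (no HTTPS, cookies, sessions or error responses).
<pre>
```python
import datetime
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit

LOG = []  # stand-in for the triple store


def log_entry(url, when, text):
    """Record the website, the time and the textual content of one response."""
    return {"site": urlsplit(url).netloc, "time": when.isoformat(), "text": text}


class LoggingProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        # A browser configured to use a proxy sends the absolute URL as the request target.
        with urllib.request.urlopen(self.path) as upstream:
            body = upstream.read()
            content_type = upstream.headers.get("Content-Type", "application/octet-stream")
        # Only textual responses are kept for later annotation.
        if content_type.startswith("text/"):
            now = datetime.datetime.now(datetime.timezone.utc)
            LOG.append(log_entry(self.path, now, body.decode("utf-8", "replace")))
        # Relay the response to the browser.
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run(port=8888):
    """Serve until interrupted; point the browser's HTTP proxy at 127.0.0.1:port."""
    HTTPServer(("127.0.0.1", port), LoggingProxy).serve_forever()
```
</pre>
Calling <code>run()</code> starts the proxy on a local port (8888 is an arbitrary choice). Every textual page fetched through it then leaves an entry with the website, the time and the text.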

<h2>Tests, limitations and next steps</h2>

While the current version of the datami-proxy is only a development version, and many issues still need to be fixed, our overall experience with it is very positive. I have personally been running it on my computer for more than a month now, and I&#8217;m quite amazed at how, without me noticing, a knowledge base of everything I&#8217;m doing on the Web is slowly building up on my hard-drive (and being synced at regular intervals with an external store). In other words, I now have the supreme power to query my own Web history for all sorts of crazy things (how many Google searches do I do per day? How many of the websites I have visited mention my name? At what time of the day am I the most active on the Web?). Soon, thanks to semantic enhancement, I will actually have an interface to do that, which will connect the different entities I encounter on the Web with each other.
<br/>

Of course, as an &#8220;in development&#8221; system, there are still quite a few issues to be solved. For example, the datami-proxy does not handle cookies and sessions very well, meaning that I often have trouble authenticating to online websites (e.g., this blog). I currently tackle that by keeping a browser not configured to use the datami-proxy, and where I do all the things that the datami-proxy can&#8217;t handle. Besides tackling the other components of the DATAMI architecture, the continuous development and improvement of the tool will mean that, soon, we should obtain a more robust version that would seamlessly handle any type of Web traffic. ]]></content:encoded>
			<wfw:commentRss>http://www.datami.co.uk/?feed=rss2&#038;p=24</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Initial Architecture Diagram</title>
		<link>http://www.datami.co.uk/?p=12</link>
		<comments>http://www.datami.co.uk/?p=12#comments</comments>
		<pubDate>Thu, 02 Feb 2012 12:42:49 +0000</pubDate>
		<dc:creator>Mathieu d'Aquin</dc:creator>
				<category><![CDATA[development]]></category>
		<category><![CDATA[architecture]]></category>
		<category><![CDATA[diagram]]></category>
		<category><![CDATA[enhancing]]></category>
		<category><![CDATA[recording]]></category>
		<category><![CDATA[visualising]]></category>

		<guid isPermaLink="false">http://www.datami.co.uk/?p=12</guid>
		<description><![CDATA[The nice thing about a project like DATAMI, which is mostly being developed from scratch, is that we always get to start with a kind of architecture diagram. In my view, while it does not really get into the details of what&#8217;s going on, this is still the best way to do project planning: identify [...]]]></description>
			<content:encoded><![CDATA[The nice thing about a project like DATAMI, which is mostly being developed from scratch, is that we always get to start with a kind of architecture diagram. In my view, while it does not really get into the details of what&#8217;s going on, this is still the best way to do project planning: identify the individual components and tasks, the dependencies between them, and their requirements. We made one on a whiteboard a couple of weeks ago with <a href="http://kmi.open.ac.uk/people/member/carlo-allocca">Carlo</a> and I thought that translating it into a human readable form (turns out the DATAMI team, i.e., Carlo and I, are not exactly human) would be the right first &#8216;serious&#8217; post here.

<div id="attachment_15" class="wp-caption alignnone" style="width: 560px"><a href="http://www.datami.co.uk/wp-content/uploads/2012/02/datami-archi.png"><img src="http://www.datami.co.uk/wp-content/uploads/2012/02/datami-archi-1024x768.png" alt="Architecture overview of DATAMI" title="Architecture overview of DATAMI" width="550" class="size-large wp-image-15" /></a><p class="wp-caption-text">Architecture overview of DATAMI</p></div>

I won&#8217;t go into the details of it now, but the basic principle is that there are three main activities: recording, enhancing and visualising. Recording corresponds to capturing the online activities of the user, through a local web proxy. This tool already exists in an initial form and will be finalised soon. Enhancing is where the <a href="http://www.iks-project.eu/projects/apache-stanbol">IKS Stanbol services</a> are being used, extracting additional semantic information related to the online resources being encountered by the user through their online activities. Visualising is the last bit of the puzzle, where we will need to find a way to enable users to view, understand and explore their semantically enhanced web history. The whole data management and communication aspect of this architecture will be handled through the use of semantic technologies, with information being stored in RDF triple stores (<a href="http://incubator.apache.org/jena/documentation/serving_data/index.html">Fuseki</a>) and the exchange of information between components being tackled using the <a href="http://www.w3.org/TR/rdf-sparql-query/">SPARQL</a> and <a href="http://www.w3.org/TR/sparql11-update/">SPARQL-Update</a> protocols.
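
As an illustration of how a component could push information into a store, here is a minimal Python sketch that builds a SPARQL Update and prepares the HTTP request defined by the SPARQL 1.1 protocol (the update text is sent with POST using the <code>application/sparql-update</code> content type). The function names are hypothetical and the endpoint URL depends on how the triple store is set up. This is not the code of the DATAMI components.
<pre>
```python
import urllib.request


def iri(value):
    """Wrap an IRI in the angle brackets SPARQL expects."""
    return "<" + value + ">"


def insert_triple_update(subject, predicate, obj):
    """Build a SPARQL Update that inserts a single IRI-to-IRI triple."""
    return "INSERT DATA { " + " ".join(iri(term) for term in (subject, predicate, obj)) + " }"


def update_request(endpoint, update):
    """Prepare a SPARQL 1.1 protocol update: POST the update text directly."""
    return urllib.request.Request(
        endpoint,
        data=update.encode("utf-8"),
        headers={"Content-Type": "application/sparql-update"},
        method="POST",
    )
```
</pre>
Passing the prepared request to <code>urllib.request.urlopen</code> would send it to the store. The recording component could use this kind of call to add what it captures, while the visualising component only needs read queries.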
]]></content:encoded>
			<wfw:commentRss>http://www.datami.co.uk/?feed=rss2&#038;p=12</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Welcome to DATAMI</title>
		<link>http://www.datami.co.uk/?p=1</link>
		<comments>http://www.datami.co.uk/?p=1#comments</comments>
		<pubDate>Thu, 26 Jan 2012 17:01:38 +0000</pubDate>
		<dc:creator>Mathieu d'Aquin</dc:creator>
				<category><![CDATA[discussion]]></category>

		<guid isPermaLink="false">http://www.datami.co.uk/?p=1</guid>
		<description><![CDATA[The DATAMI project/application intends to support users in managing their interactions with content on the Web. More and more information is being exchanged between individual users and a large variety of online organizations. From the point of view of the Web user, these exchanges are happening in a fragmented, un-managed way, which makes it harder [...]]]></description>
			<content:encoded><![CDATA[The <a href="http://datami.co.uk">DATAMI</a> project/application intends to support users in managing their interactions with content on the Web. More and more information is being exchanged between individual users and a large variety of online organizations. From the point of view of the Web user, these exchanges are happening in a fragmented, un-managed way, which makes it harder for them to take full benefit from this content, to obtain an overview of their online activities and make efficient reuse of these activities. We will build on the Stanbol Services to produce a semantic personal Web history for users, enriched with information about the various types of entities (websites, people, organisations, places) they encounter. 

The need for such types of application relates in particular to scenarios where individuals rely on information found on the Web, but without necessarily having at their disposal the tools, time or capacity to organise the online content they interact with. In other words, what DATAMI intends to do is to provide ways to automatically, without affecting the user&#8217;s normal activities, monitor and organise the resources he/she is interacting with, so that they can later be explored, re-used and retrieved.
]]></content:encoded>
			<wfw:commentRss>http://www.datami.co.uk/?feed=rss2&#038;p=1</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
