First version of the datami-proxy

As shown in our initial Architecture Diagram, at the core of the DATAMI application is a mechanism that records the Web traffic generated by a user in order to build a “Web History Store”, containing enough information for the IKS Stanbol services to do their “enrichment” magic. This mechanism is essentially a “Local Logging Proxy” called the datami-proxy, and the first operational version of this tool is now available in our code base.

How does it work?

The datami-proxy is essentially (like most Web proxies) a Web server that, instead of delivering content itself, intercepts requests to other websites and forwards them to their intended destination. There are many reasons to use a proxy, but in our case the goal is to log information about the corresponding Web traffic. In other words, this tool monitors all the Web traffic going into and out of the computer, and stores (in a semantic triple store) information about the websites accessed, the time of access and the textual content of the responses. The next component of the architecture will then annotate this textual content in order to obtain a semantically enriched personal Web history.
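To make the capture/forward/log cycle concrete, here is a minimal sketch of a logging proxy in Python. It is not the actual datami-proxy code: it assumes plain HTTP `GET` requests only (no HTTPS `CONNECT` tunnelling), UTF-8 pages, a crude regex for stripping HTML tags, an in-memory list standing in for the triple store, and a hypothetical `http://example.org/webhistory#` vocabulary. `to_triples`, `LoggingProxy`, `NoRedirect` and `run` are illustrative names.

```python
import re
import threading
import uuid
from datetime import datetime, timezone
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.error import HTTPError
from urllib.request import HTTPRedirectHandler, ProxyHandler, Request, build_opener

# Hypothetical vocabulary: the real datami-proxy's RDF schema is not shown here.
HIST = "http://example.org/webhistory#"
# Hop-by-hop headers belong to a single connection and must not be copied across.
HOP_BY_HOP = {"connection", "proxy-connection", "keep-alive", "transfer-encoding",
              "te", "trailer", "upgrade", "proxy-authorization", "proxy-authenticate"}

STORE = []                     # in-memory stand-in for the semantic triple store
STORE_LOCK = threading.Lock()  # ThreadingHTTPServer handles requests concurrently


class NoRedirect(HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # let the 3xx surface as HTTPError so it is relayed, not followed


# Upstream opener: ignores system proxy settings (avoids looping through ourselves).
DIRECT = build_opener(ProxyHandler({}), NoRedirect())


def to_triples(url, when, body, content_type):
    """Describe one visit as (subject, predicate, object) triples."""
    visit = f"{HIST}visit/{uuid.uuid4()}"
    triples = [(visit, HIST + "url", url), (visit, HIST + "time", when.isoformat())]
    if content_type.lower().startswith("text/html"):
        html = body.decode("utf-8", errors="replace")  # assumes UTF-8 for simplicity
        text = re.sub(r"<[^>]+>", " ", html)           # crude tag stripping
        triples.append((visit, HIST + "text", " ".join(text.split())))
    return triples


class LoggingProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        url = self.path  # a request sent to a proxy carries the absolute URL
        skip = HOP_BY_HOP | {"host", "accept-encoding"}  # ask for uncompressed bodies
        headers = {k: v for k, v in self.headers.items() if k.lower() not in skip}
        try:
            resp = DIRECT.open(Request(url, headers=headers), timeout=30)
            status = resp.status
        except HTTPError as err:  # 3xx/4xx/5xx responses are relayed too
            resp, status = err, err.code
        body = resp.read()
        resp_headers = resp.headers
        resp.close()

        # 1. Relay the response to the browser.
        self.send_response(status)
        for k, v in resp_headers.items():
            if k.lower() not in HOP_BY_HOP | {"content-length"}:
                self.send_header(k, v)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

        # 2. Log the visit.
        triples = to_triples(url, datetime.now(timezone.utc), body,
                             resp_headers.get("Content-Type", ""))
        with STORE_LOCK:
            STORE.extend(triples)


def run(port=8080):
    """Serve until interrupted; port 8080 is an arbitrary choice."""
    ThreadingHTTPServer(("127.0.0.1", port), LoggingProxy).serve_forever()
```

Calling `run()` starts the sketch on `127.0.0.1:8080`; with the browser’s HTTP proxy pointed at that address, each page visited adds a URL, a timestamp and (for HTML) the page text to `STORE`. The upstream opener deliberately does not follow redirects, so the browser still receives `3xx` responses, and any cookies they set, itself.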
In practice, the datami-proxy runs as a background process, alongside a semantic triple store. The user’s computer or browser needs to be configured to use the datami-proxy as its Web proxy for all HTTP requests. A wiki page in the code base provides all the instructions necessary to install, configure and run the datami-proxy on a user’s computer. Once configured, the datami-proxy should in principle run transparently, without affecting the user’s Web experience.
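What “configured to use a Web proxy” means at the HTTP level can be shown in a few lines: the client sends every request to the proxy, with the full target URL in the request line instead of just the path. In this Python sketch, urllib’s `ProxyHandler` stands in for the browser’s proxy settings, and a throwaway local server stands in for the datami-proxy; it only echoes back the request line it received, so nothing is fetched from the target site. `EchoRequestLine` and `fetch_through_proxy` are hypothetical names.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, build_opener


class EchoRequestLine(BaseHTTPRequestHandler):
    """Stand-in proxy: replies with the raw request line it received."""

    def do_GET(self):
        body = self.requestline.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet


def fetch_through_proxy(url):
    server = HTTPServer(("127.0.0.1", 0), EchoRequestLine)  # port 0: OS picks one
    server.timeout = 10  # handle_request() gives up after 10 s without a request
    port = server.server_address[1]
    thread = threading.Thread(target=server.handle_request)  # serve one request
    thread.start()
    try:
        # Equivalent of setting "HTTP proxy = 127.0.0.1:<port>" in the browser.
        opener = build_opener(ProxyHandler({"http": f"http://127.0.0.1:{port}"}))
        with opener.open(url, timeout=10) as resp:
            return resp.read().decode("utf-8")
    finally:
        thread.join()
        server.server_close()
```

For example, `fetch_through_proxy("http://example.org/some/page")` should return `GET http://example.org/some/page HTTP/1.1`: the absolute URL in that line is what a logging proxy reads to know which site is being accessed.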

Tests, limitations and next steps

While the current version of the datami-proxy is only a development version, and many issues still need to be fixed, our overall experience with it is very positive. I have personally been running it on my computer for more than a month now, and I’m quite amazed at how, without my noticing, a knowledge base of everything I do on the Web is slowly building up on my hard drive (and being synced at regular intervals with an external store). In other words, I now have the supreme power to query my own Web history for all sorts of crazy things (How many Google searches do I do per day? How many of the websites I have visited mention my name? At what time of the day am I most active on the Web?). Soon, thanks to semantic enhancement, I will also have an interface for doing this, one that connects the different entities I encounter on the Web with each other.
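Once visits are stored, questions like these become simple queries. This Python sketch answers the three examples above over a hypothetical list of `(url, time, text)` records; in the real system the records would be read from the triple store, and the sample data, the name “Jane Doe” and the Google-search test (a rough hostname/path heuristic, not an exhaustive rule) are all assumptions for illustration.

```python
from collections import Counter
from datetime import datetime
from urllib.parse import urlsplit


def is_google_search(url):
    """Heuristic: <www.>google.<tld>/search, e.g. google.com or google.co.uk."""
    parts = urlsplit(url)
    labels = (parts.hostname or "").split(".")
    return "google" in labels[:2] and parts.path == "/search"


def searches_per_day(visits):
    """How many Google searches do I do per day?"""
    return Counter(when.date() for url, when, _ in visits if is_google_search(url))


def sites_mentioning(visits, name):
    """Which websites I visited mention my name (case-insensitive)?"""
    needle = name.lower()
    return {urlsplit(url).hostname for url, _, text in visits if needle in text.lower()}


def busiest_hour(visits):
    """At what hour of the day am I most active on the Web?"""
    hours = Counter(when.hour for _, when, _ in visits)
    return hours.most_common(1)[0][0] if hours else None


# Hypothetical sample records, for illustration only.
SAMPLE = [
    ("https://www.google.com/search?q=stanbol", datetime(2011, 5, 2, 9, 15), ""),
    ("https://www.google.co.uk/search?q=iks", datetime(2011, 5, 2, 21, 40), ""),
    ("http://example.org/blog", datetime(2011, 5, 2, 21, 5), "A post by Jane Doe"),
    ("https://www.google.com/search?q=rdf", datetime(2011, 5, 3, 21, 0), ""),
    ("https://maps.google.com/", datetime(2011, 5, 3, 10, 0), "jane doe's map"),
]
```

On `SAMPLE`, `busiest_hour` returns `21`, and `searches_per_day` counts two searches on 2 May and one on 3 May (the Maps visit is not a search).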
Of course, as an “in development” system, it still has quite a few issues to solve. For example, the datami-proxy does not handle cookies and sessions very well, which means I often have trouble logging in to websites (e.g., this blog). I currently work around this by keeping a second browser that is not configured to use the datami-proxy, in which I do everything the datami-proxy can’t handle. Alongside work on the other components of the DATAMI architecture, continued development of the tool should soon give us a more robust version that seamlessly handles any type of Web traffic.