The DATAMI text processing component


Following the release of the datami-proxy, the next piece of the DATAMI application's architecture to be made available is its text processing component. To a large extent, this is where the ‘magic’ of the application happens: this component is in charge of identifying significant entities within the elements of the user’s web history. To do so, it uses Apache Stanbol, developed in the IKS project, and specifically takes advantage of the included enhancer service.

Since the Stanbol enhancer service does most of the work for us, the component is actually quite simple (see the source code and short documentation on our code base). It queries the triple store populated by the datami-proxy to obtain new texts to process, calls the Stanbol enhancer service on each of them, and stores the resulting annotations in another triple store (linked to the information about the web interactions that led to the text).
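To make that loop concrete, here is a minimal Python sketch of the three steps. It is not the actual DATAMI code: the endpoint URLs, the `ex:` vocabulary, the "not yet processed" query and all function names are assumptions made up for illustration. What it does rely on is standard: a SPARQL SELECT over HTTP, a plain-text POST to the Stanbol enhancer (assumed here to run at its default `/enhancer` path), and a SPARQL Graph Store Protocol POST to add the returned RDF to a named graph.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoints: adjust to wherever the stores and Stanbol really run.
SOURCE_SPARQL = "http://localhost:3030/proxy/query"
TARGET_GRAPH_STORE = "http://localhost:3030/annotations/data"
STANBOL_ENHANCER = "http://localhost:8080/enhancer"

# Hypothetical vocabulary: the real datami-proxy schema may differ.
NEW_TEXTS_QUERY = """
PREFIX ex: <http://example.org/datami#>
SELECT ?doc ?text WHERE {
  ?doc ex:hasText ?text .
  FILTER NOT EXISTS { ?doc ex:processed true }
}
"""


def bindings_to_pairs(results):
    """Turn SPARQL JSON results into a list of (doc_uri, text) pairs."""
    return [(b["doc"]["value"], b["text"]["value"])
            for b in results["results"]["bindings"]]


def fetch_new_texts(endpoint=SOURCE_SPARQL, query=NEW_TEXTS_QUERY):
    """Step 1: ask the proxy's triple store for texts to process."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    req = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})
    with urllib.request.urlopen(req) as resp:
        return bindings_to_pairs(json.load(resp))


def enhance(text, endpoint=STANBOL_ENHANCER):
    """Step 2: POST the text to the enhancer, get annotations back as Turtle."""
    req = urllib.request.Request(
        endpoint,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8",
                 "Accept": "text/turtle"},
        method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


def store(doc_uri, turtle, endpoint=TARGET_GRAPH_STORE):
    """Step 3: add the annotations to a named graph in the second store.

    Using the document URI as the graph name is an assumption; it is one
    simple way to keep annotations tied to the resource they came from.
    """
    url = endpoint + "?" + urllib.parse.urlencode({"graph": doc_uri})
    req = urllib.request.Request(
        url,
        data=turtle.encode("utf-8"),
        headers={"Content-Type": "text/turtle"},
        method="POST")
    with urllib.request.urlopen(req) as resp:
        resp.read()


def process_new_texts(fetch=fetch_new_texts, enhance_fn=enhance, store_fn=store):
    """Run the whole loop once and return how many texts were processed."""
    count = 0
    for doc_uri, text in fetch():
        store_fn(doc_uri, enhance_fn(text))
        count += 1
    return count
```

The three steps are passed in as functions so that the loop can be tried out without a running triple store or Stanbol instance, for example by handing `process_new_texts` a `fetch` that returns a fixed list of pages.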

The results, however, are very interesting: as the datami-proxy collects simple data about online resources and the text processing component enriches it with information about the semantic entities they contain, what is being built and constantly updated is a knowledge base of the things I encounter online.
