Dynamic topic hierarchies and segmented rankings in textual OLAP technology.

Date
2017
Abstract
OLAP technology emerged 20 years ago and has recently been redesigned so that its dimensions, hierarchies and measures can support the particularities of textual data. Textual data can be organized hierarchically by means of topic hierarchies. Currently, the topic hierarchy is defined only once in the data cube, i.e., for the entire lattice of cuboids. However, such a hierarchy is sensitive to the content of the document collection. A data cube cell can contain a collection of documents distinct from those of other cells in the same cube, which can change the topic hierarchy. Furthermore, the text segment used in the OLAP analysis also changes this hierarchy. In this work, we present a textual data cube with multiple dynamic topic hierarchies for each cube cell. The hierarchies are multiple because the presented approach builds one topic hierarchy per text segment. Another contribution of this work concerns query response. State-of-the-art approaches normally return the top-k documents for the topic selected in the query. We go further by also returning other text segments, such as the most significant titles, abstracts and paragraphs. The approach is designed in four additional steps, and each step further attenuates the impact of building multiple topic hierarchies and segmented rankings per cube cell. Experiments using part of the DBLP papers as a document collection reinforce our hypotheses.
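The idea of per-cell, per-segment topics and segmented rankings can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy collection `DOCS`, the dimensions `venue` and `year`, the helper names, and above all the scoring. Plain term frequency stands in for a real topic model and ranking function, so this is not the method of the paper, only the shape of the data flow the abstract describes.

```python
from collections import Counter, defaultdict

# Hypothetical toy collection: each document carries cube dimensions
# (venue, year) and two text segments (title, abstract).
DOCS = [
    {"venue": "VLDB", "year": 2015,
     "title": "Text cube indexing",
     "abstract": "We index text cube cells for OLAP queries"},
    {"venue": "VLDB", "year": 2015,
     "title": "Topic models for OLAP",
     "abstract": "Topic models summarize text in cube cells"},
    {"venue": "KDD", "year": 2016,
     "title": "Graph mining at scale",
     "abstract": "We mine large graphs"},
]


def build_cells(docs, dims):
    """Group documents into cube cells keyed by their dimension values."""
    cells = defaultdict(list)
    for doc in docs:
        cells[tuple(doc[d] for d in dims)].append(doc)
    return cells


def top_terms(docs, segment, n=3):
    """Placeholder for a per-cell, per-segment topic model:
    returns the n most frequent terms of one segment type in one cell."""
    counts = Counter()
    for doc in docs:
        counts.update(doc[segment].lower().split())
    return [term for term, _ in counts.most_common(n)]


def rank_segments(docs, segment, topic_term, k=2):
    """Return the top-k segments of one type in a cell, scored by how
    often the topic term occurs (assumed scoring, for illustration only)."""
    scored = [(doc[segment].lower().split().count(topic_term), doc[segment])
              for doc in docs]
    scored = [pair for pair in scored if pair[0] > 0]
    # Highest score first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [text for _, text in scored[:k]]


cells = build_cells(DOCS, dims=("venue", "year"))
cell = cells[("VLDB", 2015)]

# One term list per segment type within the same cell.
title_terms = top_terms(cell, "title")
abstract_terms = top_terms(cell, "abstract")

# Segmented rankings: titles and abstracts are ranked separately.
title_ranking = rank_segments(cell, "title", "olap")
abstract_ranking = rank_segments(cell, "abstract", "cube")
```

In this sketch the same cell yields different term lists for titles and for abstracts, which mirrors the abstract's point that the topic hierarchy depends on both the cell's documents and the text segment analysed.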
Keywords
Data cube, Text database, Ranking, Topic hierarchy
Citation
SOUZA, A. N. de P. e; FORTES, R. S.; LIMA, J. de C. Dynamic topic hierarchies and segmented rankings in textual OLAP technology. Journal of Convergence Information Technology, Gyeongju, v. 12, p. 1-17, 2017. Available at: <http://www.globalcis.org/jcit/home/index.html>. Accessed on: 16 Jan. 2018.