INIS 16(7): e2

Research Article

Knowledge Extraction Framework for Building a Largescale Knowledge Base

@ARTICLE{10.4108/eai.21-4-2016.151157,
    author={Haklae Kim and Liang He and Ying Di},
    title={Knowledge Extraction Framework for Building a Largescale Knowledge Base},
    journal={EAI Endorsed Transactions on Industrial Networks and Intelligent Systems},
    volume={3},
    number={7},
    publisher={EAI},
    journal_a={INIS},
    year={2016},
    month={4},
    keywords={Knowledge base; knowledge extraction; knowledge graph},
    doi={10.4108/eai.21-4-2016.151157}
}
Haklae Kim¹*, Liang He², Ying Di²
  • 1: Samsung Electronics Co., Ltd., (Maetan dong) 129, Samsung-ro, Yeongtong-gu, Suwon-si, Gyeonggi-do, 16677, Korea
  • 2: Samsung Electronics (China) R&D Center, Chuqiaocheng, 57# Andemen Street, Yuhuatai District, Nanjing, Jiangsu, PRC 210012
*Contact email: haklaekim@gmail.com

Abstract

As the Web has permeated people's lifestyles, people tend to consume more data in online spaces and to share their activities with others. At the same time, various intelligent services, such as virtual assistants, semantic search, and intelligent recommendation, have become available. Most of these services rely on their own knowledge bases; however, constructing a knowledge base raises many different technical issues. In this paper, we propose a knowledge extraction framework that comprises several extraction components for processing the various data formats found in web documents, such as metadata and web tables. The framework can thus be used to extract a set of knowledge entities from large-scale web documents. Most existing methods and tools concentrate on obtaining knowledge from a single, specific format. In contrast, this framework handles various formats and, at the same time, interlinks the extracted entities to a knowledge base through automatic semantic matching. We describe the detailed features of each extractor and present an evaluation of them.
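To make the architecture summarized above more concrete, here is a minimal Python sketch of a multi-format extraction pipeline. It is a hypothetical illustration, not the authors' implementation: the names `extract_metadata`, `extract_tables`, `link_entities`, and `run_pipeline`, the toy knowledge base, and the extraction heuristics are all assumptions. The abstract does not specify how semantic matching works, so this sketch substitutes simple case-insensitive label matching in its place.

```python
from __future__ import annotations

from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Optional


@dataclass
class Entity:
    label: str
    source: str                   # which extractor produced the entity
    kb_id: Optional[str] = None   # filled in by the (hypothetical) linking step


class _MetaAndTableParser(HTMLParser):
    """Collects <meta name=... content=...> values and the cells of table rows."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: dict[str, str] = {}
        self.rows: list[list[str]] = []
        self._row: Optional[list[str]] = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name") and a.get("content"):
            self.meta[a["name"]] = a["content"]
        elif tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell and self._row is not None:
            self._row[-1] += data.strip()


def extract_metadata(parser: _MetaAndTableParser) -> list[Entity]:
    # Hypothetical heuristic: the author and keywords <meta> tags name entities.
    labels = [parser.meta["author"]] if "author" in parser.meta else []
    labels += [k.strip() for k in parser.meta.get("keywords", "").split(",") if k.strip()]
    return [Entity(label, "metadata") for label in labels]


def extract_tables(parser: _MetaAndTableParser) -> list[Entity]:
    # Hypothetical heuristic: the first row is a header, and the first
    # column of every other row names the subject entity of that row.
    return [Entity(row[0], "table") for row in parser.rows[1:] if row and row[0]]


# Each format-specific extractor is registered here; supporting a new format
# means adding one function to this list.
EXTRACTORS = [extract_metadata, extract_tables]


def link_entities(entities: list[Entity], kb: dict[str, str]) -> list[Entity]:
    # Stand-in for semantic matching: exact match on case-folded labels.
    index = {label.casefold(): kb_id for kb_id, label in kb.items()}
    for entity in entities:
        entity.kb_id = index.get(entity.label.casefold())
    return entities


def run_pipeline(html: str, kb: dict[str, str]) -> list[Entity]:
    parser = _MetaAndTableParser()
    parser.feed(html)
    parser.close()
    entities = [e for extract in EXTRACTORS for e in extract(parser)]
    return link_entities(entities, kb)


if __name__ == "__main__":
    page = """<html><head><meta name="author" content="Haklae Kim"></head>
    <body><table><tr><th>City</th><th>Country</th></tr>
    <tr><td>Suwon</td><td>Korea</td></tr>
    <tr><td>Nanjing</td><td>China</td></tr></table></body></html>"""
    kb = {"kb:suwon": "Suwon", "kb:nanjing": "Nanjing"}  # toy knowledge base
    for e in run_pipeline(page, kb):
        print(e.source, e.label, e.kb_id)
    # metadata Haklae Kim None
    # table Suwon kb:suwon
    # table Nanjing kb:nanjing
```

Keeping the extractors independent and merging their output before a single linking step mirrors the abstract's point that one framework handles several formats. A real system would replace the label lookup with a proper semantic matching method.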