eCite Digital Repository

A multimodel fusion engine for filtering webpages


Deng, Z and He, T and Ding, W and Cao, Z, A multimodel fusion engine for filtering webpages, IEEE Access, 6 pp. 66062-66071. ISSN 2169-3536 (2018) [Refereed Article]


Copyright Statement

Copyright 2018 IEEE.

DOI: 10.1109/ACCESS.2018.2878897


Fusing multiple existing models for filtering webpages can mitigate the shortcomings of individual filtering models. To support such fusion, we propose a multimodel fusion engine that filters webpages to extract target webpages. The engine can handle large datasets of webpages crawled from websites, and it supports five individual filtering models as well as the fusion of any two of them. Two fusion methods are available: a webpage is retained either when it simultaneously satisfies the conditions of both individual models, or when it satisfies the conditions of either of the two models. We present the functions, architecture, and software design of the proposed engine. We use recall ratio (RR) and precision ratio (PR) as the evaluation indices of the filtering models and propose rules describing how PR and RR change when individual models are fused. To verify these rules, we use as the experimental dataset 200 000 webpages collected by crawling the popular online shopping website "". The experimental results show that two-model fusion can improve either PR or RR. Thus, the proposed engine has good practical value for engineering applications.
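The two fusion methods and the RR/PR evaluation described above can be illustrated with a small Python sketch. This is not the engine's implementation. It assumes, hypothetically, that a webpage is represented by its URL string and that each filtering model is a boolean predicate. `fuse_and` keeps a page only if both models accept it (the first fusion method), `fuse_or` keeps it if either model accepts it (the second), and RR and PR use the standard definitions RR = |kept ∩ targets| / |targets| and PR = |kept ∩ targets| / |kept|. The two keyword models (`url_is_item_page`, `url_mentions_phone`) and the seven URLs are invented toy data, not the paper's models or dataset.

```python
from typing import Callable, Iterable, Set

# Hypothetical representation: a webpage is its URL, and a filtering model is
# any predicate that returns True when the page should be kept.
Page = str
Model = Callable[[Page], bool]


def fuse_and(m1: Model, m2: Model) -> Model:
    """Keep a page only if it satisfies the conditions of both models."""
    return lambda page: m1(page) and m2(page)


def fuse_or(m1: Model, m2: Model) -> Model:
    """Keep a page if it satisfies the conditions of either model."""
    return lambda page: m1(page) or m2(page)


def apply_filter(model: Model, pages: Iterable[Page]) -> Set[Page]:
    """Return the set of pages the model keeps."""
    return {p for p in pages if model(p)}


def recall_ratio(kept: Set[Page], targets: Set[Page]) -> float:
    """RR: fraction of target pages that were kept."""
    return len(kept & targets) / len(targets) if targets else 0.0


def precision_ratio(kept: Set[Page], targets: Set[Page]) -> float:
    """PR: fraction of kept pages that are target pages."""
    return len(kept & targets) / len(kept) if kept else 0.0


# Hypothetical individual models: simple URL keyword rules.
def url_is_item_page(page: Page) -> bool:
    return "/item/" in page


def url_mentions_phone(page: Page) -> bool:
    return "phone" in page


# Invented toy data: the target pages are phone product pages.
pages = [
    "shop.example/item/phone-1",    # target
    "shop.example/item/phone-2",    # target
    "shop.example/p/phone-3",       # target, non-standard product URL
    "shop.example/item/handset-4",  # target, URL lacks "phone"
    "shop.example/item/laptop-1",   # not a target
    "shop.example/list/phone",      # listing page, not a target
    "shop.example/help",            # not a target
]
targets = set(pages[:4])

models = [
    ("m1", url_is_item_page),
    ("m2", url_mentions_phone),
    ("m1 AND m2", fuse_and(url_is_item_page, url_mentions_phone)),
    ("m1 OR m2", fuse_or(url_is_item_page, url_mentions_phone)),
]
for name, model in models:
    kept = apply_filter(model, pages)
    print(f"{name}: RR={recall_ratio(kept, targets):.2f} "
          f"PR={precision_ratio(kept, targets):.2f}")
# m1: RR=0.75 PR=0.75
# m2: RR=0.75 PR=0.75
# m1 AND m2: RR=0.50 PR=1.00
# m1 OR m2: RR=1.00 PR=0.67
```

Because AND fusion keeps the intersection of the two models' outputs, its RR can never exceed that of either model, whereas OR fusion keeps the union, so its RR is at least the larger of the two; PR carries no such guarantee in either direction. In this toy run, AND fusion raises PR and OR fusion raises RR, which is consistent with the abstract's finding, but these set-algebra bounds are not necessarily the rules the paper itself states.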

Item Details

Item Type: Refereed Article
Keywords: multimodel, fusion, engine design, webpage filtering, data
Research Division: Information and Computing Sciences
Research Group: Computer vision and multimedia computation
Research Field: Pattern recognition
Objective Division: Information and Communication Services
Objective Group: Information systems, technologies and services
Objective Field: Information systems, technologies and services not elsewhere classified
UTAS Author: Cao, Z (Dr Zehong Cao)
ID Code: 131576
Year Published: 2018
Web of Science® Times Cited: 2
Deposited By: Information and Communication Technology
Deposited On: 2019-03-23
Last Modified: 2019-05-13
Downloads: 33