University of Tasmania

A multimodel fusion engine for filtering webpages

journal contribution
posted on 2023-05-20, 02:10 authored by Deng, Z, He, T, Ding, W, Cao, Z
Fusing multiple existing webpage-filtering models can mitigate the shortcomings of any individual model. To provide an engine for such fusion, we propose a multimodel fusion engine that filters webpages in order to extract target webpages. The engine can handle large datasets of webpages crawled from websites, and it supports five individual filtering models as well as the fusion of any two of them. Two fusion methods are available: one requires a webpage to satisfy the conditions of both individual models simultaneously, and the other requires it to satisfy the conditions of either of the two individual models. We present the functions, architecture, and software design of the proposed engine. We use recall ratio (RR) and precision ratio (PR) as the evaluation indices of the filtering models and propose rules describing how PR and RR change when individual models are fused. To verify these rules, we use an experimental dataset of 200 000 webpages collected by crawling the popular online shopping website “http://www.jd.com”. The experimental results show that two-model fusion can improve either PR or RR. Thus, the proposed engine has good practical value for engineering applications.
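
The two fusion methods and the two evaluation indices named in the abstract can be illustrated with a minimal Python sketch. It is not the engine's implementation: the two URL-based models, the page paths, and the target set below are hypothetical, invented only for this example. The sketch also assumes that PR is the share of selected pages that are targets, that RR is the share of targets that are selected, and that "one of the two individual models" means at least one.

```python
from typing import Callable, Iterable, Set, Tuple

# A filtering model is treated here as a predicate over a page URL (an assumption).
Filter = Callable[[str], bool]


def fuse_and(a: Filter, b: Filter) -> Filter:
    """Fusion that keeps a page only if both models accept it."""
    return lambda page: a(page) and b(page)


def fuse_or(a: Filter, b: Filter) -> Filter:
    """Fusion that keeps a page if at least one model accepts it."""
    return lambda page: a(page) or b(page)


def evaluate(model: Filter, pages: Iterable[str], targets: Set[str]) -> Tuple[float, float]:
    """Return (PR, RR) for a model on a labelled page set."""
    selected = {p for p in pages if model(p)}
    hits = len(selected & targets)
    pr = hits / len(selected) if selected else 0.0  # precision ratio
    rr = hits / len(targets) if targets else 0.0    # recall ratio
    return pr, rr


# Hypothetical individual models, for illustration only.
def model_a(page: str) -> bool:
    return "/item/" in page


def model_b(page: str) -> bool:
    return page.endswith(".html")


# Hypothetical toy data: six page paths, four of which are target pages.
PAGES = [
    "/item/1.html",
    "/item/2.html",
    "/item/reviews",
    "/help/faq.html",
    "/product/5.html",
    "/item/6",
]
TARGETS = {"/item/1.html", "/item/2.html", "/product/5.html", "/item/6"}

if __name__ == "__main__":
    candidates = [
        ("A", model_a),
        ("B", model_b),
        ("A and B", fuse_and(model_a, model_b)),
        ("A or B", fuse_or(model_a, model_b)),
    ]
    for name, model in candidates:
        pr, rr = evaluate(model, PAGES, TARGETS)
        print(f"{name}: PR={pr:.2f} RR={rr:.2f}")
```

On this toy data each individual model scores PR = RR = 0.75. Requiring both models raises PR to 1.00 and lowers RR to 0.50, while accepting either model raises RR to 1.00 and lowers PR to about 0.67. This shows one way that a two-model fusion can improve one index at the expense of the other; it does not reproduce the rules or results reported in the paper.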

History

Publication title

IEEE Access

Volume

6

Pagination

66062-66071

ISSN

2169-3536

Department/School

School of Information and Communication Technology

Publisher

Institute of Electrical and Electronics Engineers

Place of publication

United States

Rights statement

Copyright 2018 IEEE.

Repository Status

  • Open

Socio-economic Objectives

Information systems, technologies and services not elsewhere classified
