Web Document Analysis and its Application to Anti-phishing

The World Wide Web is expanding at a remarkable rate, and it is now the primary repository of information and knowledge. The large volume of web documents being gathered requires intelligent processing and analysis by automated programs. In this project, we explore web document analysis techniques and build anti-phishing software on top of them.

For web document analysis, we propose and implement a page segmentation approach based on visual cues. Using the W3C DOM model of HTML, the method first decomposes a full web page into many separate salient blocks. Each block is visually and semantically consistent internally but distinguishable from its adjacent blocks. Next, the method aggregates these salient blocks into semantically meaningful blocks according to their positions and visual cues in the page. Through this bottom-up process, the method ultimately builds a hierarchical tree of segmented blocks.

We apply our web page segmentation to the anti-phishing problem. Phishing web pages usually imitate the visual style and structure of their target pages. Based on web page segmentation, we propose three metrics (block-level similarity, layout similarity, and overall style similarity) to measure the visual similarity between a phishing page and its target. If any one of them exceeds a specified threshold, a phishing alarm is raised. We have built a prototype system to demonstrate the business model of our anti-phishing mechanism, and we believe the approach could serve as an enterprise anti-phishing solution.
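The abstract names the three metrics and the "any metric over its threshold" alarm rule, but not the formulas behind them. The sketch below illustrates only that decision rule. It assumes each segmented block has already been reduced to a bounding box plus a dominant color and font. The `Block` fields, the IoU-plus-style block score, the grid-based layout score, the area-weighted style histogram, and the 0.8 thresholds are all hypothetical stand-ins, not the definitions used in the thesis.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """A segmented page block. All fields are hypothetical visual features."""
    x: float      # left edge, as a fraction of page width
    y: float      # top edge, as a fraction of page height
    w: float      # width, as a fraction of page width
    h: float      # height, as a fraction of page height
    color: str    # dominant background color
    font: str     # dominant font family


def block_sim(a: Block, b: Block) -> float:
    """Assumed per-block score: mean of bounding-box IoU and style agreement."""
    ix = max(0.0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    iy = max(0.0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    inter = ix * iy
    union = a.w * a.h + b.w * b.h - inter
    iou = inter / union if union > 0 else 0.0
    style = ((a.color == b.color) + (a.font == b.font)) / 2
    return (iou + style) / 2


def block_level_similarity(suspect, target) -> float:
    """Average, over suspect blocks, of each block's best match in the target."""
    if not suspect or not target:
        return 0.0
    return sum(max(block_sim(a, b) for b in target) for a in suspect) / len(suspect)


def layout_similarity(suspect, target, grid: int = 4) -> float:
    """Jaccard overlap of the coarse grid cells holding each block's top-left corner."""
    def cells(blocks):
        return {(min(int(b.x * grid), grid - 1), min(int(b.y * grid), grid - 1))
                for b in blocks}
    cs, ct = cells(suspect), cells(target)
    union = cs | ct
    return len(cs & ct) / len(union) if union else 0.0


def style_similarity(suspect, target) -> float:
    """Histogram intersection of (color, font) pairs, weighted by block area."""
    def hist(blocks):
        counts = Counter()
        for b in blocks:
            counts[(b.color, b.font)] += b.w * b.h
        total = sum(counts.values())
        return {k: v / total for k, v in counts.items()} if total else {}
    hs, ht = hist(suspect), hist(target)
    return sum(min(v, ht.get(k, 0.0)) for k, v in hs.items())


def is_phishing(suspect, target, thresholds=(0.8, 0.8, 0.8)):
    """Raise an alarm if ANY of the three scores exceeds its (hypothetical) threshold."""
    scores = (block_level_similarity(suspect, target),
              layout_similarity(suspect, target),
              style_similarity(suspect, target))
    alarm = any(score > limit for score, limit in zip(scores, thresholds))
    return alarm, scores
```

For example, comparing a target page's block list against an exact copy of itself yields three scores of (approximately) 1.0 and an alarm, while a page with a single unrelated footer block scores 0.0 on all three and raises none. Using "any" rather than "all" mirrors the abstract: a phishing page only needs to copy one aspect of its target (its blocks, its layout, or its overall style) convincingly to be flagged.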

Chapter 1 Introduction
1.1 Problem Description
1.1.1 Webpage Segmentation
1.1.2 Anti-phishing
1.2 Motivation
1.3 Contributions
1.4 Thesis Organization

Chapter 2 Literature Review
2.1 Web Document Analysis
2.2 Webpage Segmentation
2.3 Phishing Webpage Detection
Chapter 3 Web Document Analysis
3.1 Related Works
3.2 Salient Block Decomposition
3.3 Block Clustering
3.3.1 Location and Appearance Clues
3.3.2 Semantic Clues
3.4 Experiments
3.4.1 Prototype System
3.4.2 Evaluation Results
3.5 Conclusions on Web Document Analysis
Chapter 4 Phishing Webpage Detection
4.1 Related Works
4.2 The Anti-Phishing Approach
4.3 Similarity Assessment
4.3.1 Block Level Similarity
4.3.2 Layout Similarity
4.3.3 Overall Style Similarity
4.4 Experiments
4.5 Conclusions on Phishing Webpage Detection….

Source: City University of Hong Kong
