An Effective Keyword Extraction Method for Videos in Web Pages by Analyzing their Layout Structures
Jongwon Lee Chungkang College 162 Chungkang-ro Majang-myun Ichon-si, Gyunggi-do 467-744, Korea
Abstract- This paper proposes an effective keyword extraction method for Web videos that analyzes the structure of the Web pages containing them. The proposed scheme calculates the relative importance (weight) of each text block to a video from the distance between the text block and the video. This distance, called the layout distance, indicates the degree of relevance of a text block to the video and can be estimated by analyzing the layout structure of the Web page. Since Web pages with several videos, such as pages posting UCC videos, have a special layout structure, this layout analysis helps to estimate precisely the relevance of each text block to the video. The weight of a text block is then used to compute the final weights of the keywords extracted from that block, whose importance within the block is obtained by analyzing HTML tags and applying other well-known techniques such as TF/IDF. Experiments with 1,087 Web pages containing a total of 2,462 videos show that the precision of the proposed extraction scheme is 17% higher than that of ImageRover [1].

Giseok Choi, Juyeon Jang, and Jongho Nang Sogang University 1 Sinsu-dong Mapo-gu, Seoul 121-742, Korea

weights are inversely proportional to the layout distances to the video; however, they are adjusted to reflect the structural characteristics of Web pages with videos. After the weights are assigned to the text blocks, keywords for the video are extracted from all text blocks in the Web page, together with their importance, using well-known techniques such as TF/IDF and HTML tag analysis. The final weight of each keyword for the video is calculated from the importance of the keyword within its text block and the layout distance of that text block to the video. Since the…...
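
The weighting idea described above can be illustrated with a minimal sketch. The text does not give the exact formulas, so everything here is an assumption made for illustration: the `TextBlock` type, the inverse form `1 / (1 + d)` for the block weight, and the summation of weighted scores across blocks are hypothetical choices, and the adjustment for the structural characteristics of video pages is omitted. The per-block keyword scores stand in for the importance values that TF/IDF and HTML tag analysis would produce.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TextBlock:
    """A text block on a Web page (hypothetical structure for illustration)."""
    layout_distance: float            # assumed non-negative layout distance to the video
    keyword_scores: Dict[str, float]  # keyword importance within the block, e.g. from TF/IDF


def block_weight(layout_distance: float) -> float:
    """Weight that decreases as layout distance grows.

    Assumption: 1 / (1 + d) is one simple inverse form; the paper's
    actual function and its structural adjustments are not shown here.
    """
    return 1.0 / (1.0 + layout_distance)


def keyword_weights(blocks: List[TextBlock]) -> Dict[str, float]:
    """Combine within-block importance with the block weight for each keyword."""
    totals: Dict[str, float] = defaultdict(float)
    for block in blocks:
        weight = block_weight(block.layout_distance)
        for keyword, score in block.keyword_scores.items():
            # Assumption: contributions from different blocks are summed.
            totals[keyword] += weight * score
    # Return keywords ordered from highest to lowest final weight.
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))


if __name__ == "__main__":
    blocks = [
        TextBlock(layout_distance=0.0, keyword_scores={"soccer": 0.5, "goal": 0.25}),
        TextBlock(layout_distance=3.0, keyword_scores={"soccer": 0.5, "ads": 1.0}),
    ]
    for keyword, weight in keyword_weights(blocks).items():
        print(keyword, weight)
```

In this sketch a keyword such as "ads" that scores highly inside a distant block still ends up with a low final weight, which is the effect the layout distance is meant to achieve.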

