A Survey on Python Libraries Used for Social Media Content Scraping

Thivaharan, S (2020) A Survey on Python Libraries Used for Social Media Content Scraping. In: 2020 International Conference on Smart Electronics and Communication (ICOSEC), Trichy, India.

Abstract

Python offers a rich set of libraries for extracting digital content spread across the internet. Three of them are widely used for this purpose: BeautifulSoup, LXml and RegEx. A statistical study carried out over the scattered data sets available shows that RegEx delivers a result in 153.7 ms on average. However, RegEx has the inherent drawback of limited rule extraction on web pages with many nested inner tags, and for this reason it is regarded as suitable only for moderately complex contexts. BeautifulSoup and LXml, by contrast, can extract web content under demanding conditions, with response times of 458.68 ms and 202.96 ms respectively. Both are based on the DOM model, which makes them the more scalable libraries. Modern content grading systems [1] developed specifically for regional languages on social media depend largely on web scrapers. This survey supports the strong performance of RegEx across differing scenarios.
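
To make the comparison concrete, the following is a minimal sketch of how the three approaches might be applied to the same page and timed. It is not the survey's benchmark: the sample HTML, the function names and the repeat count are all assumptions made for illustration, and the timings it produces will not match the figures reported in the abstract. BeautifulSoup and lxml are third-party packages, so the sketch skips either one if it is not installed.

```python
import re
import timeit

# Hypothetical sample page; the survey's actual data set is not reproduced here.
SAMPLE_HTML = """
<html><body>
  <div class="post"><p class="text">First <b>post</b></p></div>
  <div class="post"><p class="text">Second post</p></div>
</body></html>
"""


def extract_with_regex(html):
    # Pattern-based extraction: fast, but brittle once tags nest more deeply.
    fragments = re.findall(r'<p class="text">(.*?)</p>', html, flags=re.S)
    # Remove any inner tags left inside the captured fragment.
    return [re.sub(r"<[^>]+>", "", frag).strip() for frag in fragments]


def extract_with_bs4(html):
    # DOM-style extraction with BeautifulSoup (third-party package "bs4").
    from bs4 import BeautifulSoup
    soup = BeautifulSoup(html, "html.parser")
    return [p.get_text().strip() for p in soup.find_all("p", class_="text")]


def extract_with_lxml(html):
    # DOM-style extraction with lxml and an XPath query (third-party package).
    from lxml import html as lxml_html
    tree = lxml_html.fromstring(html)
    return [p.text_content().strip() for p in tree.xpath('//p[@class="text"]')]


def time_extractors(html, repeats=100):
    # Return the mean time per call in milliseconds for each available library.
    extractors = [
        ("regex", extract_with_regex),
        ("beautifulsoup", extract_with_bs4),
        ("lxml", extract_with_lxml),
    ]
    results = {}
    for name, func in extractors:
        try:
            func(html)  # warm-up call; also detects a missing package
        except ImportError:
            continue
        seconds = timeit.timeit(lambda f=func: f(html), number=repeats)
        results[name] = seconds / repeats * 1000.0
    return results


if __name__ == "__main__":
    for name, ms in time_extractors(SAMPLE_HTML).items():
        print(f"{name}: {ms:.3f} ms per call")
```

The regular expression here depends on the exact attribute text of the target tag, which illustrates the limited rule extraction noted in the abstract, whereas the two DOM-based functions query the parsed tree and tolerate changes in nesting.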

Item Type: Conference or Workshop Item (Paper)
Subjects: C Computer Science and Engineering > Database Management System
Divisions: Computer Science and Engineering
Date Deposited: 10 Apr 2024 08:59
Last Modified: 30 Apr 2024 09:05
URI: https://ir.psgitech.ac.in/id/eprint/271
