Spider

Web scraping with Python 3 (modules including requests, BeautifulSoup, XPath, re, Selenium, and wordcloud)

Install / Use

/learn @HuangCongQing/Spider

About this skill

Quality Score

0/100

Category

Development & Engineering

Supported Platforms

Universal

Tags

bf4 charles lxml python3 python3x re request requests selenium spider spiders xpath

README

spider

Assorted web-scraping techniques in Python 3.

Personal scraping notes: https://www.yuque.com/huangzhongqing/spider

@双愚: if you fork or star this project, please credit the source.

Notes

Introduction to web scraping: https://www.yuque.com/docs/share/edb944f3-880a-4a48-a053-df2953be56b4?# ("Web Scraping Fundamentals (Summary)")
notes/01数据爬取requests_note (fetching data with requests)
notes/02数据解析note (parsing data)

Modules

package/1request
package/1request-advanced: cookies and proxies
package/2BeautifulSoup4
package/3xpath
package/4re正则表达式 (regular expressions)
1. re.findall
2. re.search
package/5selenium
package/6wordcloud&jieba: word clouds
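
The `re.findall` and `re.search` entries above differ mainly in what they return. The following is a minimal sketch of that difference, not code taken from the repository; the HTML fragment and the pattern are made up for illustration.

```python
import re

# Hypothetical HTML fragment, invented for this illustration.
html = '<a href="/page/1">First</a> <a href="/page/2">Second</a>'

# Capture the href value and the link text of an anchor tag.
link_pattern = re.compile(r'<a href="([^"]+)">([^<]+)</a>')

# findall returns every non-overlapping match; with two capture groups,
# each match is a tuple of the captured strings.
all_links = link_pattern.findall(html)

# search returns a match object for the first match only, or None.
first = link_pattern.search(html)
first_href = first.group(1) if first else None

print(all_links)
print(first_href)
```

In short, `re.findall` suits collecting every occurrence on a page, while `re.search` suits checking for or extracting a single value. Regular expressions are fragile on real-world HTML, so a parser is usually the safer choice for anything beyond simple patterns.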

| Function | Package | Purpose |
| - | - | - |
| Data acquisition | requests | Fetches web pages |
| Data parsing | re | Regular expressions |
| | BeautifulSoup | |
| | xpath | Parses documents using XPath syntax |
| | lxml | Builds on the fast, capable libxml2; parses documents using XPath syntax and is faster than BeautifulSoup |
| Browser automation | Selenium | An automated testing tool for websites. Supports mainstream browsers with a GUI, including Chrome, Firefox, and Safari, as well as the headless PhantomJS browser. Can simulate clicks. |
| | PhantomJS | Headless browser |
| | pandas | |
| | jieba | Chinese word segmentation with jieba |
| | wordcloud | Word cloud package |
| | matplotlib | Plots charts |
| | random | |
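
To give a feel for XPath-style selection as described in the table, here is a small sketch. It deliberately swaps in the standard library's `xml.etree.ElementTree`, which supports only a limited subset of XPath, so that it runs without extra installs; it is not the lxml API and not code from the repository. The markup is hypothetical and well-formed, which real pages often are not.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup invented for this example.
doc = """
<html>
  <body>
    <div class="post"><a href="/a">Alpha</a></div>
    <div class="post"><a href="/b">Beta</a></div>
    <div class="ad"><a href="/x">Sponsored</a></div>
  </body>
</html>
"""

root = ET.fromstring(doc.strip())

# Select the <a> children of every <div> whose class attribute is "post".
links = root.findall(".//div[@class='post']/a")

titles = [a.text for a in links]
hrefs = [a.get("href") for a in links]

print(titles)
print(hrefs)
```

With lxml the same idea is typically expressed through an element's `xpath()` method, which accepts fuller XPath expressions and copes with messier HTML.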

Common code (output, tables)

common.ipynb

Scraping in practice

File operations

Reading and saving Excel, TXT, and other files
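
As a rough illustration of saving and reloading scraped data, the sketch below writes a plain-text file and a CSV file using only the standard library. It is an assumption-laden example rather than the repository's code: the rows, file names, and temporary directory are all hypothetical, and CSV stands in for Excel here because writing `.xlsx` requires a third-party library.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical scraped rows, invented for this example.
rows = [
    {"title": "Alpha", "url": "/a"},
    {"title": "Beta", "url": "/b"},
]

out_dir = Path(tempfile.mkdtemp())  # throwaway directory for the demo

# Plain text: one title per line, then read it back.
txt_path = out_dir / "titles.txt"
txt_path.write_text("\n".join(r["title"] for r in rows), encoding="utf-8")
titles_back = txt_path.read_text(encoding="utf-8").splitlines()

# CSV: newline="" lets the csv module control line endings.
csv_path = out_dir / "links.csv"
with csv_path.open("w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["title", "url"])
    writer.writeheader()
    writer.writerows(rows)

# Read the CSV back as a list of dictionaries.
with csv_path.open(newline="", encoding="utf-8") as f:
    rows_back = list(csv.DictReader(f))

print(titles_back)
print(rows_back)
```

For genuine Excel files, pandas (listed in the table above) offers `DataFrame.to_excel` and `read_excel`, which in turn rely on an engine such as openpyxl being installed.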

LICENSE

All content in this project is released under the MIT License.

HuangCongQing

GitHub Stars: 16

Category: Development

Updated: 4mo ago

Forks: 12

HuangCongQing/Spider

Languages

HTML

Security Score

92/100

Audited on Nov 4, 2025

No findings