Changes for version 0.06 - 2017-01-17
- Fixed a bug where initial URLs failed to match scraped URLs because of encoding differences.
- Changed asset name generator argument from URL to job.
- Changed hyperlink paths from root-relative to relative.
- Improved file extension detection.
- Improved HTML-like content-type detection.
- Improved the crawling rule to reduce requests for asset files that already exist.
- Added log_name attribute to write crawling logs.
Modules
Flatten web pages deeply and make them portable