Changes for version 0.06 - 2017-01-17

  • Fixed a bug where initial URLs did not match scraped URLs because of encoding differences.
  • Changed the asset name generator argument from a URL to a job.
  • Changed hyperlink paths from root-relative to relative.
  • Improved file extension detection.
  • Improved HTML-like content-type detection.
  • Improved the crawling rules to reduce requests for asset files that already exist.
  • Added the log_name attribute for writing crawl logs.

Modules

Flatten web pages deeply and make them portable