crawlee
=======
[Crawlee][1] is a web scraping and browser automation library.

```bash
$ docker run --rm -it -v $PWD:/tmp apify/actor-node:16 sh
>>> export PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD=1
>>> export PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=1
>>> npx crawlee create -t cheerio-js my-crawler
>>> mv my-crawler /tmp
>>> exit
$ docker-compose build my-crawler
$ docker-compose run --rm my-crawler
$ tree my-crawler/storage/
├── datasets
│   └── default
│       └── 000000001.json
├── key_value_stores
└── request_queues
```
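
The `docker-compose` commands above expect a `my-crawler` service to be defined next to the generated project. A minimal sketch of such a `docker-compose.yml` might look like the following. It assumes the generated project ships its own `Dockerfile`, and the container-side storage path is a guess that should be checked against the working directory of the image you build.

```yaml
# Hypothetical docker-compose.yml, not taken from this repository.
services:
  my-crawler:
    # Build from the project created by `npx crawlee create`
    # (assumes it contains a Dockerfile).
    build: ./my-crawler
    volumes:
      # Keep crawl results on the host so `tree my-crawler/storage/` can see them.
      # The container path is an assumption; adjust it to the image's WORKDIR.
      - ./my-crawler/storage:/usr/src/app/storage
```

With a mount like this, the dataset files written during `docker-compose run --rm my-crawler` remain on the host after the container is removed.
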
[1]: https://crawlee.dev/