Github 抓取參數

Web Scraper API Github 抓取參數

使用 Thordata 的 Web Scraper API 配置 Github 抓取參數，包括儲存庫 URL、搜尋 URL、程式碼 URL 和其他參數。

唯一标識：

token ，訪問令牌（必填）

此參數用作 API 存取令牌，以確保抓取的合法性。

請求範例：

Authorization: Bearer ********************

curl -X POST "https://scraperapi.thordata.com/builder" ^
  -H "Authorization: Bearer ********************" ^
  -H "Content-Type: application/x-www-form-urlencoded" ^
  -d "spider_name=github.com" ^
  -d "spider_id=github_repository_by-url" ^
  -d "spider_parameters=[{\"url\": \"https://github.com/TheAlgorithms/Python/blob/master/divide_and_conquer/power.py\"},{\"url\": \"https://github.com/AkarshSatija/msSync/blob/master/index.js\"}]" ^
  -d "spider_errors=true" ^
  -d "file_name={{TasksID}}"

一、產品-抓取儲存庫資訊：

1. Github - 透過倉庫 URL 抓取倉庫訊息

spider_id ，所屬抓取工具 (必填)

定義要使用的抓取工具。

請求範例：

spider_id=ithub_repository_by-repo-url

curl -X POST "https://scraperapi.thordata.com/builder" ^
  -H "Authorization: Bearer Token-ID" ^
  -H "Content-Type: application/x-www-form-urlencoded" ^
  -d "spider_name=github.com" ^
  -d "spider_id=github_repository_by-repo-url" ^
  -d "spider_parameters=[{\"repo_url\": \"https://github.com/TheAlgorithms/Python\"}]" ^
  -d "spider_errors=true" ^
  -d "file_name={{TasksID}}"

repo_url，倉庫URL（必填）

此參數指定要抓取的儲存庫 URL。

請求範例：

"repo_url": "https://github.com/TheAlgorithms/Python"

curl -X POST "https://scraperapi.thordata.com/builder" ^
  -H "Authorization: Bearer Token-ID" ^
  -H "Content-Type: application/x-www-form-urlencoded" ^
  -d "spider_name=github.com" ^
  -d "spider_id=github_repository_by-repo-url" ^
  -d "spider_parameters=[{\"repo_url\": \"https://github.com/TheAlgorithms/Python\"}]" ^
  -d "spider_errors=true" ^
  -d "file_name={{TasksID}}"

2.Github - 透過搜尋 URL 抓取倉庫訊息

spider_id ，所屬抓取工具 (必填)

定義要使用的抓取工具。

請求範例：

spider_id=github_repository_by-search-url

curl -X POST "https://scraperapi.thordata.com/builder" ^
  -H "Authorization: Bearer Token-ID" ^
  -H "Content-Type: application/x-www-form-urlencoded" ^
  -d "spider_name=github.com" ^
  -d "spider_id=github_repository_by-search-url" ^
  -d "spider_parameters=[{\"search_url\": \"https://github.com/search?q=ML%26type=repositories\",\"page_turning\": \"\",\"max_num\": \"1\"}]" ^
  -d "spider_errors=true" ^
  -d "file_name={{TasksID}}"

search_url ，搜尋URL（必填）

此參數指定要抓取的搜尋URL。

請求範例：

"search_url": "https://github.com/search?q=ML%26type=repositories"

curl -X POST "https://scraperapi.thordata.com/builder" ^
  -H "Authorization: Bearer Token-ID" ^
  -H "Content-Type: application/x-www-form-urlencoded" ^
  -d "spider_name=github.com" ^
  -d "spider_id=github_repository_by-search-url" ^
  -d "spider_parameters=[{\"search_url\": \"https://github.com/search?q=ML%26type=repositories\",\"page_turning\": \"\",\"max_num\": \"1\"}]" ^
  -d "spider_errors=true" ^
  -d "file_name={{TasksID}}"

page_turning ，翻頁（可選）

此參數指定抓取結果數量的限制，請輸入頁數。

請求範例：

"page_turning": "1"

curl -X POST "https://scraperapi.thordata.com/builder" ^
  -H "Authorization: Bearer Token-ID" ^
  -H "Content-Type: application/x-www-form-urlencoded" ^
  -d "spider_name=github.com" ^
  -d "spider_id=github_repository_by-search-url" ^
  -d "spider_parameters=[{\"search_url\": \"https://github.com/search?q=ML%26type=repositories\",\"page_turning\": \"1\",\"max_num\": \"1\"}]" ^
  -d "spider_errors=true" ^
  -d "file_name={{TasksID}}"

max_num ，最大數量（可選）

此參數指定要爬取的最大倉庫數量。

請求範例：

"max_num": "1"

curl -X POST "https://scraperapi.thordata.com/builder" ^
  -H "Authorization: Bearer Token-ID" ^
  -H "Content-Type: application/x-www-form-urlencoded" ^
  -d "spider_name=github.com" ^
  -d "spider_id=github_repository_by-search-url" ^
  -d "spider_parameters=[{\"search_url\": \"https://github.com/search?q=ML%26type=repositories\",\"page_turning\": \"1\",\"max_num\": \"1\"}]" ^
  -d "spider_errors=true" ^
  -d "file_name={{TasksID}}"

3.Github - 透過 URL 抓取儲存庫訊息

spider_id ，所屬抓取工具 (必填)

定義要使用的抓取工具。

請求範例：

spider_id=github_repository_by-url

curl -X POST "https://scraperapi.thordata.com/builder" ^
  -H "Authorization: Bearer Token-ID" ^
  -H "Content-Type: application/x-www-form-urlencoded" ^
  -d "spider_name=github.com" ^
  -d "spider_id=github_repository_by-url" ^
  -d "spider_parameters=[{\"url\": \"https://github.com/TheAlgorithms/Python/blob/master/divide_and_conquer/power.py\"},{\"url\": \"https://github.com/AkarshSatija/msSync/blob/master/index.js\"}]" ^
  -d "spider_errors=true" ^
  -d "file_name={{TasksID}}"

url，代碼網址（必填）

此參數指定要搜尋的程式碼 URL。

請求範例：

"url": "https://github.com/TheAlgorithms/Python/blob/master/divide_and_conquer/power.py"

curl -X POST "https://scraperapi.thordata.com/builder" ^
  -H "Authorization: Bearer Token-ID" ^
  -H "Content-Type: application/x-www-form-urlencoded" ^
  -d "spider_name=github.com" ^
  -d "spider_id=github_repository_by-url" ^
  -d "spider_parameters=[{\"url\": \"https://github.com/TheAlgorithms/Python/blob/master/divide_and_conquer/power.py\"}]" ^
  -d "spider_errors=true" ^
  -d "file_name={{TasksID}}"

如果您需要進一步的協助，請透過電子郵件聯繫 [email protected]。

PreviousYelp 抓取參數 NextTikTok 抓取參數

Last updated 3 months ago