GitHub Scraping Parameters

Configure GitHub scraping parameters using Thordata's Web Scraper API, including the repository URL, search URL, code URL, and other parameters.

Unique Identifier:

token: Access token (required)

This parameter supplies the API access token that authenticates the scraping request.

Request examples:

Authorization: Bearer ********************

curl -X POST "https://scraperapi.thordata.com/builder" ^
  -H "Authorization: Bearer ********************" ^
  -H "Content-Type: application/x-www-form-urlencoded" ^
  -d "spider_name=github.com" ^
  -d "spider_id=github_repository_by-url" ^
  -d "spider_parameters=[{\"url\": \"https://github.com/TheAlgorithms/Python/blob/master/divide_and_conquer/power.py\"},{\"url\": \"https://github.com/AkarshSatija/msSync/blob/master/index.js\"}]" ^
  -d "spider_errors=true" ^
  -d "file_name={{TasksID}}"
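The same request can be issued from Python. The sketch below builds the form-encoded body with the standard library only; the endpoint, field names, and values mirror the curl example above, and the token value is a placeholder you must replace with your own access token.

```python
import json
from urllib.parse import urlencode

API_URL = "https://scraperapi.thordata.com/builder"
TOKEN = "********************"  # replace with your access token

# spider_parameters is a JSON array with one object per target URL
spider_parameters = json.dumps([
    {"url": "https://github.com/TheAlgorithms/Python/blob/master/divide_and_conquer/power.py"},
    {"url": "https://github.com/AkarshSatija/msSync/blob/master/index.js"},
])

# urlencode produces the application/x-www-form-urlencoded body that
# curl assembles from the repeated -d flags above
body = urlencode({
    "spider_name": "github.com",
    "spider_id": "github_repository_by-url",
    "spider_parameters": spider_parameters,
    "spider_errors": "true",
    "file_name": "{{TasksID}}",
})
```

POST this body to the endpoint with the `Authorization: Bearer <token>` and `Content-Type: application/x-www-form-urlencoded` headers shown above.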

Product - Scrape repository information:

1. GitHub - Scrape repository information by repository URL

spider_id: Scraper tool (required)

Defines which scraper tool to use.

Request examples:

spider_id=github_repository_by-repo-url

curl -X POST "https://scraperapi.thordata.com/builder" ^
  -H "Authorization: Bearer Token-ID" ^
  -H "Content-Type: application/x-www-form-urlencoded" ^
  -d "spider_name=github.com" ^
  -d "spider_id=github_repository_by-repo-url" ^
  -d "spider_parameters=[{\"repo_url\": \"https://github.com/TheAlgorithms/Python\"}]" ^
  -d "spider_errors=true" ^
  -d "file_name={{TasksID}}"

repo_url: Repository URL (required)

This parameter specifies the repository URL to be scraped.

Request examples:

"repo_url": "https://github.com/TheAlgorithms/Python"

curl -X POST "https://scraperapi.thordata.com/builder" ^
  -H "Authorization: Bearer Token-ID" ^
  -H "Content-Type: application/x-www-form-urlencoded" ^
  -d "spider_name=github.com" ^
  -d "spider_id=github_repository_by-repo-url" ^
  -d "spider_parameters=[{\"repo_url\": \"https://github.com/TheAlgorithms/Python\"}]" ^
  -d "spider_errors=true" ^
  -d "file_name={{TasksID}}"
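In Python, the request body for this scraper can be built the same way as in the token example. The sketch below batches several repositories in one task; note that passing multiple objects in `spider_parameters` is shown on this page only for the by-url scraper, so applying it here is an assumption.

```python
import json
from urllib.parse import urlencode

# Assumption: multiple objects in spider_parameters work for repo_url
# targets the same way the by-url example on this page batches URLs.
repo_urls = [
    "https://github.com/TheAlgorithms/Python",
    "https://github.com/torvalds/linux",
]
spider_parameters = json.dumps([{"repo_url": u} for u in repo_urls])

body = urlencode({
    "spider_name": "github.com",
    "spider_id": "github_repository_by-repo-url",
    "spider_parameters": spider_parameters,
    "spider_errors": "true",
    "file_name": "{{TasksID}}",
})
```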

2. GitHub - Scrape repository information by search URL

spider_id: Scraper tool (required)

Defines which scraper tool to use.

Request examples:

spider_id=github_repository_by-search-url

curl -X POST "https://scraperapi.thordata.com/builder" ^
  -H "Authorization: Bearer Token-ID" ^
  -H "Content-Type: application/x-www-form-urlencoded" ^
  -d "spider_name=github.com" ^
  -d "spider_id=github_repository_by-search-url" ^
  -d "spider_parameters=[{\"search_url\": \"https://github.com/search?q=ML%26type=repositories\",\"page_turning\": \"\",\"max_num\": \"1\"}]" ^
  -d "spider_errors=true" ^
  -d "file_name={{TasksID}}"

search_url: Search URL (required)

This parameter specifies the search URL to be scraped.

Request examples:

"search_url": "https://github.com/search?q=ML%26type=repositories"

curl -X POST "https://scraperapi.thordata.com/builder" ^
  -H "Authorization: Bearer Token-ID" ^
  -H "Content-Type: application/x-www-form-urlencoded" ^
  -d "spider_name=github.com" ^
  -d "spider_id=github_repository_by-search-url" ^
  -d "spider_parameters=[{\"search_url\": \"https://github.com/search?q=ML%26type=repositories\",\"page_turning\": \"\",\"max_num\": \"1\"}]" ^
  -d "spider_errors=true" ^
  -d "file_name={{TasksID}}"

page_turning: Page turning (optional)

This parameter limits how many pages of search results are crawled; enter the number of pages.

Request examples:

"page_turning": "1"

curl -X POST "https://scraperapi.thordata.com/builder" ^
  -H "Authorization: Bearer Token-ID" ^
  -H "Content-Type: application/x-www-form-urlencoded" ^
  -d "spider_name=github.com" ^
  -d "spider_id=github_repository_by-search-url" ^
  -d "spider_parameters=[{\"search_url\": \"https://github.com/search?q=ML%26type=repositories\",\"page_turning\": \"1\",\"max_num\": \"1\"}]" ^
  -d "spider_errors=true" ^
  -d "file_name={{TasksID}}"

max_num: Maximum number (optional)

This parameter specifies the maximum number of repositories to scrape.

Request examples:

"max_num": "1"

curl -X POST "https://scraperapi.thordata.com/builder" ^
  -H "Authorization: Bearer Token-ID" ^
  -H "Content-Type: application/x-www-form-urlencoded" ^
  -d "spider_name=github.com" ^
  -d "spider_id=github_repository_by-search-url" ^
  -d "spider_parameters=[{\"search_url\": \"https://github.com/search?q=ML%26type=repositories\",\"page_turning\": \"1\",\"max_num\": \"1\"}]" ^
  -d "spider_errors=true" ^
  -d "file_name={{TasksID}}"
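The three search parameters combine as follows in Python. One detail worth noting: the curl examples pre-encode the "&" inside the search URL as %26 by hand, because curl's -d flag sends the value verbatim; urlencode performs that escaping automatically, so the URL can be written with a literal "&".

```python
import json
from urllib.parse import urlencode

# urlencode escapes the "&" inside the search URL (the curl examples
# pre-encode it by hand as %26)
spider_parameters = json.dumps([{
    "search_url": "https://github.com/search?q=ML&type=repositories",
    "page_turning": "1",   # crawl at most this many result pages
    "max_num": "1",        # return at most this many repositories
}])

body = urlencode({
    "spider_name": "github.com",
    "spider_id": "github_repository_by-search-url",
    "spider_parameters": spider_parameters,
    "spider_errors": "true",
    "file_name": "{{TasksID}}",
})
```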

3. GitHub - Scrape repository information by URL

spider_id: Scraper tool (required)

Defines which scraper tool to use.

Request examples:

spider_id=github_repository_by-url

curl -X POST "https://scraperapi.thordata.com/builder" ^
  -H "Authorization: Bearer Token-ID" ^
  -H "Content-Type: application/x-www-form-urlencoded" ^
  -d "spider_name=github.com" ^
  -d "spider_id=github_repository_by-url" ^
  -d "spider_parameters=[{\"url\": \"https://github.com/TheAlgorithms/Python/blob/master/divide_and_conquer/power.py\"},{\"url\": \"https://github.com/AkarshSatija/msSync/blob/master/index.js\"}]" ^
  -d "spider_errors=true" ^
  -d "file_name={{TasksID}}"

url: Code URL (required)

This parameter specifies the code (file) URL to be scraped.

Request examples:

"url": "https://github.com/TheAlgorithms/Python/blob/master/divide_and_conquer/power.py"

curl -X POST "https://scraperapi.thordata.com/builder" ^
  -H "Authorization: Bearer Token-ID" ^
  -H "Content-Type: application/x-www-form-urlencoded" ^
  -d "spider_name=github.com" ^
  -d "spider_id=github_repository_by-url" ^
  -d "spider_parameters=[{\"url\": \"https://github.com/TheAlgorithms/Python/blob/master/divide_and_conquer/power.py\"}]" ^
  -d "spider_errors=true" ^
  -d "file_name={{TasksID}}"
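All of the examples on this page share the same request shape, so they can be wrapped in a small Python helper. This is a sketch using only the standard library: the endpoint and form fields come from the examples above, the helper names (build_body, submit_task) are ours, and the response format is not documented in this section.

```python
import json
import urllib.request
from urllib.parse import urlencode

API_URL = "https://scraperapi.thordata.com/builder"

def build_body(spider_id: str, parameters: list) -> str:
    """Form-encode one task request, mirroring the curl examples above."""
    return urlencode({
        "spider_name": "github.com",
        "spider_id": spider_id,
        "spider_parameters": json.dumps(parameters),
        "spider_errors": "true",
        "file_name": "{{TasksID}}",
    })

def submit_task(token: str, spider_id: str, parameters: list) -> bytes:
    """POST the task; returns the raw response body."""
    req = urllib.request.Request(
        API_URL,
        data=build_body(spider_id, parameters).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

For example, the by-url task at the top of this page would be submitted as `submit_task(token, "github_repository_by-url", [{"url": "..."}])` with a valid access token.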

If you need further assistance, please contact us via email at [email protected].
