GitHub Scraping Parameters
Configure GitHub scraping parameters using Thordata's Web Scraper API, including the repository URL, search URL, code URL, and other parameters.
Unique Identifier:
token, Access token (required)
This parameter is the API access token that authenticates the scraping request.
Request examples:
Authorization: Bearer ********************
curl -X POST "https://scraperapi.thordata.com/builder" ^
-H "Authorization: Bearer ********************" ^
-H "Content-Type: application/x-www-form-urlencoded" ^
-d "spider_name=github.com" ^
-d "spider_id=github_repository_by-url" ^
-d "spider_parameters=[{\"url\": \"https://github.com/TheAlgorithms/Python/blob/master/divide_and_conquer/power.py\"},{\"url\": \"https://github.com/AkarshSatija/msSync/blob/master/index.js\"}]" ^
-d "spider_errors=true" ^
-d "file_name={{TasksID}}"
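The same request can be issued from code. Below is a minimal Python sketch using only the standard library; the token value and the `{{TasksID}}` placeholder are carried over unchanged from the curl example above.

```python
import json
import urllib.parse
import urllib.request

# Placeholder token; substitute your real Thordata access token.
TOKEN = "********************"

# spider_parameters is a JSON array sent as a single form field.
params = [
    {"url": "https://github.com/TheAlgorithms/Python/blob/master/divide_and_conquer/power.py"},
    {"url": "https://github.com/AkarshSatija/msSync/blob/master/index.js"},
]
form = {
    "spider_name": "github.com",
    "spider_id": "github_repository_by-url",
    "spider_parameters": json.dumps(params),
    "spider_errors": "true",
    "file_name": "{{TasksID}}",
}
body = urllib.parse.urlencode(form).encode()

req = urllib.request.Request(
    "https://scraperapi.thordata.com/builder",
    data=body,
    headers={
        "Authorization": f"Bearer {TOKEN}",
        "Content-Type": "application/x-www-form-urlencoded",
    },
)
# Uncomment to send the request (requires a valid token):
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())
```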
Product - Scrape repository information:
1. GitHub - Scrape repository information by repository URL
spider_id, Scraper tool (required)
Defines which scraper tool to use.
Request examples:
spider_id=github_repository_by-repo-url
curl -X POST "https://scraperapi.thordata.com/builder" ^
-H "Authorization: Bearer Token-ID" ^
-H "Content-Type: application/x-www-form-urlencoded" ^
-d "spider_name=github.com" ^
-d "spider_id=github_repository_by-repo-url" ^
-d "spider_parameters=[{\"repo_url\": \"https://github.com/TheAlgorithms/Python\"}]" ^
-d "spider_errors=true" ^
-d "file_name={{TasksID}}"
repo_url, Repository URL (required)
This parameter specifies the repository URL to be scraped.
Request examples:
"repo_url": "https://github.com/TheAlgorithms/Python"
curl -X POST "https://scraperapi.thordata.com/builder" ^
-H "Authorization: Bearer Token-ID" ^
-H "Content-Type: application/x-www-form-urlencoded" ^
-d "spider_name=github.com" ^
-d "spider_id=github_repository_by-repo-url" ^
-d "spider_parameters=[{\"repo_url\": \"https://github.com/TheAlgorithms/Python\"}]" ^
-d "spider_errors=true" ^
-d "file_name={{TasksID}}"
2. GitHub - Scrape repository information by search URL
spider_id, Scraper tool (required)
Defines which scraper tool to use.
Request examples:
spider_id=github_repository_by-search-url
curl -X POST "https://scraperapi.thordata.com/builder" ^
-H "Authorization: Bearer Token-ID" ^
-H "Content-Type: application/x-www-form-urlencoded" ^
-d "spider_name=github.com" ^
-d "spider_id=github_repository_by-search-url" ^
-d "spider_parameters=[{\"search_url\": \"https://github.com/search?q=ML%26type=repositories\",\"page_turning\": \"\",\"max_num\": \"1\"}]" ^
-d "spider_errors=true" ^
-d "file_name={{TasksID}}"
search_url, Search URL (required)
This parameter specifies the search URL to be scraped.
Request examples:
"search_url": "https://github.com/search?q=ML%26type=repositories"
curl -X POST "https://scraperapi.thordata.com/builder" ^
-H "Authorization: Bearer Token-ID" ^
-H "Content-Type: application/x-www-form-urlencoded" ^
-d "spider_name=github.com" ^
-d "spider_id=github_repository_by-search-url" ^
-d "spider_parameters=[{\"search_url\": \"https://github.com/search?q=ML%26type=repositories\",\"page_turning\": \"\",\"max_num\": \"1\"}]" ^
-d "spider_errors=true" ^
-d "file_name={{TasksID}}"
page_turning, Page Turning (optional)
This parameter limits the number of result pages to crawl; enter the number of pages.
Request examples:
"page_turning": "1"
curl -X POST "https://scraperapi.thordata.com/builder" ^
-H "Authorization: Bearer Token-ID" ^
-H "Content-Type: application/x-www-form-urlencoded" ^
-d "spider_name=github.com" ^
-d "spider_id=github_repository_by-search-url" ^
-d "spider_parameters=[{\"search_url\": \"https://github.com/search?q=ML%26type=repositories\",\"page_turning\": \"1\",\"max_num\": \"1\"}]" ^
-d "spider_errors=true" ^
-d "file_name={{TasksID}}"
max_num, Maximum number (optional)
This parameter specifies the maximum number of repositories to crawl.
Request examples:
"max_num": "1"
curl -X POST "https://scraperapi.thordata.com/builder" ^
-H "Authorization: Bearer Token-ID" ^
-H "Content-Type: application/x-www-form-urlencoded" ^
-d "spider_name=github.com" ^
-d "spider_id=github_repository_by-search-url" ^
-d "spider_parameters=[{\"search_url\": \"https://github.com/search?q=ML%26type=repositories\",\"page_turning\": \"1\",\"max_num\": \"1\"}]" ^
-d "spider_errors=true" ^
-d "file_name={{TasksID}}"
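Putting the three fields together, the form payload for the search-URL scraper can be assembled as follows. This is a sketch; the values mirror the curl examples above.

```python
import json
import urllib.parse

# One search task: page_turning caps the number of result pages,
# max_num caps the number of repositories returned (both optional).
params = [{
    "search_url": "https://github.com/search?q=ML%26type=repositories",
    "page_turning": "1",
    "max_num": "1",
}]
form = {
    "spider_name": "github.com",
    "spider_id": "github_repository_by-search-url",
    "spider_parameters": json.dumps(params),
    "spider_errors": "true",
    "file_name": "{{TasksID}}",
}
# Form-encoded body, ready to POST to https://scraperapi.thordata.com/builder.
body = urllib.parse.urlencode(form)
```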
3. GitHub - Scrape repository information by URL
spider_id, Scraper tool (required)
Defines which scraper tool to use.
Request examples:
spider_id=github_repository_by-url
curl -X POST "https://scraperapi.thordata.com/builder" ^
-H "Authorization: Bearer Token-ID" ^
-H "Content-Type: application/x-www-form-urlencoded" ^
-d "spider_name=github.com" ^
-d "spider_id=github_repository_by-url" ^
-d "spider_parameters=[{\"url\": \"https://github.com/TheAlgorithms/Python/blob/master/divide_and_conquer/power.py\"},{\"url\": \"https://github.com/AkarshSatija/msSync/blob/master/index.js\"}]" ^
-d "spider_errors=true" ^
-d "file_name={{TasksID}}"
url, Code URL (required)
This parameter specifies the code URL to be scraped.
Request examples:
"url": "https://github.com/TheAlgorithms/Python/blob/master/divide_and_conquer/power.py"
curl -X POST "https://scraperapi.thordata.com/builder" ^
-H "Authorization: Bearer Token-ID" ^
-H "Content-Type: application/x-www-form-urlencoded" ^
-d "spider_name=github.com" ^
-d "spider_id=github_repository_by-url" ^
-d "spider_parameters=[{\"url\": \"https://github.com/TheAlgorithms/Python/blob/master/divide_and_conquer/power.py\"}]" ^
-d "spider_errors=true" ^
-d "file_name={{TasksID}}"
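Because spider_parameters is a JSON array, several code URLs can be batched into a single task, as in the two-URL curl example earlier. A short sketch of building that field (the file paths are the ones used above):

```python
import json

# Each entry is one code URL to scrape; add as many as needed per task.
code_urls = [
    "https://github.com/TheAlgorithms/Python/blob/master/divide_and_conquer/power.py",
    "https://github.com/AkarshSatija/msSync/blob/master/index.js",
]
spider_parameters = json.dumps([{"url": u} for u in code_urls])
```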
If you need further assistance, please contact us via email at [email protected].