# Configuration

### How to configure the Thordata Scraping Browser

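This guide walks you through configuring and using the Thordata Scraping Browser end to end: obtaining credentials, basic configuration, running the example scripts, and managing live sessions. Follow it to get started quickly and scrape web data efficiently.\
Before you begin, have your account credentials ready, that is, the username and password used by your web automation tool.\
You can view these credentials directly on the "Playground" tab of the Thordata Scraping Browser area. This guide assumes you already have valid credentials; if you do not, request them from Thordata.\
Before using the Scraping Browser, you need to complete the basic environment setup. This guide covers configuring your credentials, setting the basic API parameters, and managing live browser sessions in the console.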
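As a minimal sketch of the credential step, the helper below builds the WebSocket endpoint from a username and password. The host `ws-browser.thordata.com` comes from the examples on this page; the environment variable names `THORDATA_USERNAME` and `THORDATA_PASSWORD` are hypothetical, and percent-encoding the credentials follows general URL rules rather than anything this page states about the service.

```python
import os
from urllib.parse import quote

# Host used by the Playwright and Puppeteer examples on this page.
WS_HOST = "ws-browser.thordata.com"


def build_ws_endpoint(username: str, password: str, host: str = WS_HOST) -> str:
    """Return a wss:// URL with the credentials embedded.

    Assumption: special characters in the credentials are percent-encoded,
    as is usual for the user-info part of a URL.
    """
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"wss://{user}:{pwd}@{host}"


def endpoint_from_env() -> str:
    """Read credentials from the environment instead of hard-coding them.

    THORDATA_USERNAME and THORDATA_PASSWORD are hypothetical names chosen
    for this sketch; use whatever names fit your deployment.
    """
    return build_ws_endpoint(
        os.environ["THORDATA_USERNAME"],
        os.environ["THORDATA_PASSWORD"],
    )
```

Keeping credentials out of the script makes it easier to share the examples without leaking the account password.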

### Scraping Browser quick-start examples

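We provide a set of scraping examples to help you get started. Replace the credentials and target URL in a script, then adapt and extend it to your needs. For more complex scraping logic, refer to the notes on supported frameworks and protocols in the official Thordata documentation.\
You can debug scripts online in the dashboard's "Playground", or run real scraping jobs in your local environment.\
To run locally, make sure the required dependencies are installed (see the frameworks and protocols Thordata supports) and your credentials are configured correctly, then run an example script to retrieve the target data.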
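Before running a script locally, it can help to confirm that the automation libraries are importable. The sketch below uses only the Python standard library; the module names `playwright` and `selenium` match the imports in the Python examples on this page, and the helper name `missing_modules` is made up for illustration.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


# Modules imported by the Python examples on this page.
print("Missing:", missing_modules(["playwright", "selenium"]))
```

An empty list means both libraries are installed in the current interpreter; any name listed needs to be installed before the matching example will run.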

{% tabs %}
{% tab title="Python-Playwright" %}

```python
import asyncio
from playwright.async_api import async_playwright

# Enter your credentials: username and password
AUTH = 'PROXY-FULL-ACCOUNT:PASSWORD'
SBR_WS_SERVER = f'wss://{AUTH}@ws-browser.thordata.com'

async def run(pw):
    print('Connecting to Browser API...')
    browser = await pw.chromium.connect_over_cdp(SBR_WS_SERVER)
    try:
        print('Connected! Navigating to target...')

        page = await browser.new_page()
        # Allow up to 2 minutes for the initial navigation
        await page.goto('https://example.com', timeout=2 * 60 * 1000)

        # Screenshot
        print('Taking a screenshot of the page')
        await page.screenshot(path='./remote_screenshot_page.png')
        # HTML content
        print('Scraping page content...')
        html = await page.content()
        print(html)

    finally:
        # Always close the browser so the remote session is released
        await browser.close()

async def main():
    async with async_playwright() as playwright:
        await run(playwright)

if __name__ == '__main__':
    asyncio.run(main())
```

{% endtab %}

{% tab title="Python-selenium" %}

```python
from selenium.webdriver import Remote, ChromeOptions  
from selenium.webdriver.chromium.remote_connection import ChromiumRemoteConnection  
from selenium.webdriver.common.by import By  

# Enter your credentials: username and password
AUTH = 'PROXY-FULL-ACCOUNT:PASSWORD'  
REMOTE_WEBDRIVER = f'https://{AUTH}@hs-browser.thordata.com'  

def main():  
    print('Connecting to Browser API...')  
    sbr_connection = ChromiumRemoteConnection(REMOTE_WEBDRIVER, 'goog', 'chrome')  
    with Remote(sbr_connection, options=ChromeOptions()) as driver:  

        # get target URL
        print('Connected! Navigating to target ...')  
        driver.get('https://example.com') 

        # screenshot 
        print('screenshot to png')  
        driver.get_screenshot_as_file('./remote_page.png')  

        # html content
        print('Get page content...')  
        html = driver.page_source  
        print(html)  
  
if __name__ == '__main__':  
    main()

```

{% endtab %}

{% tab title="NodeJS-Puppeteer" %}

```javascript
const puppeteer = require('puppeteer-core');

// Enter your credentials: username and password
const AUTH = 'PROXY-FULL-ACCOUNT:PASSWORD';
const SBR_WS_ENDPOINT = `wss://${AUTH}@ws-browser.thordata.com`;

(async () => {
    console.log('Connecting to Scraping Browser...');
    const browser = await puppeteer.connect({
        browserWSEndpoint: SBR_WS_ENDPOINT,
        defaultViewport: { width: 1920, height: 1080 },
    });
    try {
        console.log('Connected! Navigating to target URL');
        const page = await browser.newPage();

        // Allow up to 2 minutes for the initial navigation
        await page.goto('https://example.com', { timeout: 2 * 60 * 1000 });

        // 1. Screenshot
        console.log('Saving screenshot to remote_screenshot.png');
        await page.screenshot({ path: 'remote_screenshot.png' });
        console.log('Screenshot saved');

        // 2. Get content
        console.log('Get page content...');
        const html = await page.content();
        console.log('Source HTML: ', html);

    } finally {
        // Always close the browser after the script finishes so the remote session is released
        await browser.close();
    }
})();
```

{% endtab %}

{% tab title="NodeJS-Playwright" %}

```javascript
const pw = require('playwright');

// Enter your credentials: username and password
const AUTH = 'PROXY-FULL-ACCOUNT:PASSWORD';
const SBR_CDP = `wss://${AUTH}@ws-browser.thordata.com`;
  
async function main() {  
    console.log('Connecting to Browser API...');  
    const browser = await pw.chromium.connectOverCDP(SBR_CDP);  
    try {  
        console.log('Connected! Navigating to target...');  
        const page = await browser.newPage();
        // Target URL
        await page.goto('https://www.windows.com', { timeout: 2 * 60 * 1000 });  
        // Screenshot
        console.log('To Screenshot from page');  
        await page.screenshot({ path: './remote_screenshot_page.png'});  

        // html content
        console.log('Scraping page content...');  
        const html = await page.content();  
        console.log(html);  
    } finally {  
        // In order to better use the Browser API, be sure to close the browser after the script is executed
        await browser.close();  
   }  
}  
  
if (require.main === module) {  
    main().catch(err => {  
        console.error(err.stack || err);  
        process.exit(1);  
   });  
}
```

{% endtab %}
{% endtabs %}

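### Browser API initial navigation

Under the Scraping Browser's session management, each session allows only one initial navigation, that is, the first load of the target site for data extraction. Once this starting point is established, you can move freely within that site through interactions such as clicking and scrolling. However, any scraping task that needs to start again from the initial navigation stage, whether it targets the same site or a different one, must run in a new session.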
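One way to respect the one-initial-navigation rule is to open a fresh session for every target URL. The sketch below assumes a Playwright-style `chromium` object (for example `playwright.chromium` from `async_playwright()`), whose `connect_over_cdp`, `new_page`, `goto`, `content`, and `close` calls are the same ones used in the Python-Playwright example; the helper names `fetch_html` and `fetch_all` are illustrative only.

```python
import asyncio


async def fetch_html(chromium, endpoint, url, timeout_ms=2 * 60 * 1000):
    """Open a new remote session, perform its single initial navigation, return the HTML."""
    browser = await chromium.connect_over_cdp(endpoint)
    try:
        page = await browser.new_page()
        await page.goto(url, timeout=timeout_ms)
        return await page.content()
    finally:
        # Closing the browser ends this session; the next URL gets a new one.
        await browser.close()


async def fetch_all(chromium, endpoint, urls):
    """Scrape each URL in its own session, one after another."""
    results = {}
    for url in urls:
        results[url] = await fetch_html(chromium, endpoint, url)
    return results
```

To use it with Playwright, call `await fetch_all(playwright.chromium, SBR_WS_SERVER, [...])` inside an `async with async_playwright() as playwright:` block. The sessions run sequentially here, which also fits the one-active-session policy of the web console.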

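### Session time limits

Automatic timeout: every browser session has a maximum lifetime of 30 minutes. If a session is not explicitly terminated by your script, the system ends it automatically after this time.\
Web console restriction: in the web console, the system enforces one active session per account. To avoid resource conflicts and potential errors, make sure your automation scripts explicitly close the session.\
If you need further help with configuration, contact us at <support@thordata.com>.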
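To stay within the 30-minute session lifetime and always close the session explicitly, a script can run its work under a time budget. The sketch below uses only `asyncio`; the one-minute safety margin and the names `run_within_session_limit`, `work`, and `close` are assumptions made for illustration, not part of any Thordata API.

```python
import asyncio

SESSION_LIMIT_SECONDS = 30 * 60  # maximum session lifetime stated on this page
SAFETY_MARGIN_SECONDS = 60       # assumed margin so the script stops first


async def run_within_session_limit(
    work,
    close,
    budget_seconds=SESSION_LIMIT_SECONDS - SAFETY_MARGIN_SECONDS,
):
    """Run `work()` under a time budget and always call `close()` afterwards.

    `work` and `close` are zero-argument coroutine functions supplied by the
    caller, for example a scraping routine and `browser.close`.
    Raises asyncio.TimeoutError if the budget is exceeded.
    """
    try:
        return await asyncio.wait_for(work(), timeout=budget_seconds)
    finally:
        # Explicitly release the session whether the work succeeded, failed, or timed out.
        await close()
```

With Playwright this could be called as `await run_within_session_limit(lambda: scrape(page), browser.close)`, where `scrape` is your own coroutine.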

