Documentation - JavaScript Scenario

Interact with the webpage you want to scrape.

You can also discover this feature using our Postman collection covering every ScrapingBee's features.

💡 Important:
This page explains how to use a specific feature of our main web scraping API!
If you are not yet familiar with ScrapingBee web scraping API, you can read the documentation here.

Basic usage

If you want to interact with pages you want to scrape before we return your the HTML you can add JavaScript scenario to your API call.

For example, if you wish to click on a button, you will need to use this scenario.

{
    "instructions": [
       {"click": "#buttonId"}
    ]
}

You can use both CSS and XPath selectors in all instructions. All selectors beginning with / will be treated as XPath selectors. All other selectors will be treated as CSS selectors.

And so our scraper will scrape the webpage, click on the button #buttonId and then return you the HTML of the page.

Important: JavaScript scenarios are JSON formatted, and in order to pass them to a GET request, you need to stringify them.

# Install the Python ScrapingBee library:
# pip install scrapingbee

from scrapingbee import ScrapingBeeClient

client = ScrapingBeeClient(api_key='YOUR-API-KEY')
response = client.get(
    'https://www.scrapingbee.com/blog',
    params={
        'js_scenario': {"instructions": [{ "click": "#buttonId" }]},
    },
)
print('Response HTTP Status Code: ', response.status_code)
print('Response HTTP Response Body: ', response.content)

// request Axios
const axios = require('axios');

axios.get('https://app.scrapingbee.com/api/v1', {
    params: {
        'api_key': 'YOUR-API-KEY',
        'url': 'https://www.scrapingbee.com/blog',
        'js_scenario': '{"instructions": [{ "click": "#buttonId" }]}',
    }
}).then(function (response) {
    // handle success
    console.log(response);
})

String encoded_url = URLEncoder.encode("YOUR URL", "UTF-8");

require 'net/http'
require 'net/https'
require 'uri'

# Classic (GET )
def send_request
    js_scenario = URI::encode('{"instructions": [{ "click": "#buttonId" }]}')
    uri = URI('https://app.scrapingbee.com/api/v1/?api_key=YOUR-API-KEY&url=https://www.scrapingbee.com/blog&js_scenario=' + js_scenario)

    # Create client
    http = Net::HTTP.new(uri.host, uri.port)
    http.use_ssl = true
    http.verify_mode = OpenSSL::SSL::VERIFY_PEER

    # Create Request
    req =  Net::HTTP::Get.new(uri)

    # Fetch Request
    res = http.request(req)
    puts "Response HTTP Status Code: #{ res.code }"
    puts "Response HTTP Response Body: #{ res.body }"
rescue StandardError => e
    puts "HTTP Request failed (#{ e.message })"
end

send_request()

<?php

// get cURL resource
$ch = curl_init();

// set url
$js_scenario = urlencode('{"instructions": [{ "click": "#buttonId" }]}');

curl_setopt($ch, CURLOPT_URL, 'https://app.scrapingbee.com/api/v1/?api_key=YOUR-API-KEY&url=https://www.scrapingbee.com/blog&js_scenario=' . $js_scenario);

// set method
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'GET');

// return the transfer as a string
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);



// send the request and save response to $response
$response = curl_exec($ch);

// stop if fails
if (!$response) {
  die('Error: "' . curl_error($ch) . '" - Code: ' . curl_errno($ch));
}

echo 'HTTP Status Code: ' . curl_getinfo($ch, CURLINFO_HTTP_CODE) . PHP_EOL;
echo 'Response Body: ' . $response . PHP_EOL;

// close curl resource to free up system resources
curl_close($ch);

?>

package main

import (
	"fmt"
	"io/ioutil"
	"net/http"
    "net/url"
)

func sendClassic() {
	// Create client
	client := &http.Client{}


    // Stringify rules
    js_scenario := url.QueryEscape(`{"instructions": [{ "click": "#buttonId" }]}`)
	// Create request
	req, err := http.NewRequest("GET", "https://app.scrapingbee.com/api/v1/?api_key=YOUR-API-KEY&url=https://www.scrapingbee.com/blog&js_scenario=" + js_scenario, nil)


	parseFormErr := req.ParseForm()
	if parseFormErr != nil {
		fmt.Println(parseFormErr)
	}

	// Fetch Request
	resp, err := client.Do(req)

	if err != nil {
		fmt.Println("Failure : ", err)
	}

	// Read Response Body
	respBody, _ := ioutil.ReadAll(resp.Body)

	// Display Results
	fmt.Println("response Status : ", resp.Status)
	fmt.Println("response Headers : ", resp.Header)
	fmt.Println("response Body : ", string(respBody))
}

func main() {
    sendClassic()
}

You can add multiple instructions to the scenario, they will get executed one by one on our end.

Important: We strongly advice you to use JS scenario with json_response set to true (learn more), as it will return you a detailed report of the scenario execution under the js_scenario_report key. Very useful for debugging for example.

Below is a quick overview of all the different instruction you can use.

{"evaluate": "console.log('foo')"} # Run custom JavaScript
{"click": "#button_id"} # Click on a an element
{"wait": 1000} # Wait for a fixed duration in ms
{"wait_for": "#slow_div"} # Wait for an element to appear
{"wait_for_and_click": "#slow_div"} # Wait for an element to appear and then click on it
{"scroll_x": 1000} # Scroll the screen in the horizontal axis, in px
{"scroll_y": 1000} # Scroll the screen in the vertical axis, in px
{"fill": ["#input_1", "value_1"]} # Fill some input
{"evaluate": "console.log('toto')"} # Run custom JavaScript code
{"infinite_scroll": # Scroll the page until the end
    {
        "max_count": 0, # Maximum number of scroll, 0 for infinite
        "delay": 1000, # Delay between each scroll, in ms
        "end_click": {"selector": "#button_id"} # (optional) Click on a button when the end of the page is reached, usually a "load more" button
    }
}

Of course, you can choose to use them in the order you want, and you can use the same one multiple time in one scenario.

Here is an example of a scenario that wait for a button to appear, click on it and then scroll, wait a bit, and scroll again.

{
    "instructions": [
        {"wait_for_and_click": "#slow_button"},
        {"scroll_x": 1000},
        {"wait": 1000},
        {"scroll_x": 1000},
        {"wait": 1000}
    ]
}

Strict mode

By default, our JavaScript scenario are executed in "strict mode", which means that if an error occurs during the execution of the scenario, the whole scenario will be aborted and an error will be returned.

You can disable this behavior by setting the strict key to false in your scenario.

{
    "strict": false,
    "instructions": [
        {"wait_for_and_click": "#slow_button"},
        {"scroll_x": 1000},
        {"wait": 1000},
        {"scroll_x": 1000},
        {"wait": 1000}
    ]
}

Clicking on a button

click CSS/XPath selector

To click on a button, use this instruction with the CSS/XPath selector of the button you want to click on

If you want to click on the button whose id is secretButton you need to use this JavaScript scenario:

{
    "instructions": [
        {"click": "#secretButton"}
    ]
}

Wait for a fixed amount of time

wait duration in ms

To wait for a fixed amount of time, use this instruction with the duration, in ms, you want to wait for.

If you want to wait for 2 seconds, you need to use this JavaScript scenario:

{
    "instructions": [
        {"wait": 2000}
    ]
}

Wait for an element to appear

wait_for CSS/XPath selector

To wait for a particular element to appear, use this instruction with the CSS/XPath selector of the element you want to wait for.

If you want to wait for the element whose class is slow_div to appear before getting some results, you need to use this JavaScript scenario:

{
    "instructions": [
        {"wait_for": ".slow_div"}
    ]
}

Wait for an element to appear and click

wait_for_and_click CSS/XPath selector

To wait for a particular CSS/XPath element to appear, and then click on it, use this instruction.

If you want to wait for the element whose class is slow_div to appear before clicking on it, you need to use this JavaScript scenario:

{
    "instructions": [
        {"wait_for_and_click": ".slow_div"}
    ]
}

Note: this is exactly the same as using:

{
    "instructions": [
        {"wait_for": ".slow_div"},
        {"click": ".slow_div"}
    ]
}

Scroll Horizontally

scroll_x number of pixel

To scroll horizontally on a page, use this instruction with the number of pixels you want to scroll.

If you want to scroll down 1000px you need to use this JavaScript scenario:

{
    "instructions": [
        {"scroll_x": 1000}
    ]
}

Scroll Vertically

scroll_y number of pixel

To scroll vertically on a page, use this instruction with the number of pixels you want to scroll.

If you want to scroll down 1000px you need to use this JavaScript scenario:

{
    "instructions": [
        {"scroll_y": 1000}
    ]
}

Filling form input

fill [ selector, value ]

To fill an input, use this instruction with the CSS/XPath selector of the input you want to fill and the value you want to fill it with.

If you want to fill an input whose CSS/XPath selector is #input_1 with the value value_1 you need to use this JavaScript scenario:

{
    "instructions": [
        {"fill": ["input_1", "value_1"]}
    ]
}

Infinite scrolling

infinite_scroll configuration

To scroll the page until the end, use this instruction with the configuration you want to use.

{"infinite_scroll": # Scroll the page until the end
    {
        "max_count": 0,
        "delay": 1000,
        "end_click": {"selector": "#button_id", "selector_type": "css|auto|xpath"}
    }
}

max_count: The maximum number of scroll you want to do, 0 for infinite.
delay: The delay between each scroll, in ms.
end_click: (optional) A click instruction to click on a button when the end of the page is reached, usually a "load more" button. You can use both a CSS or XPath selector, but you will need to pass the correct selector_type value.

Note: This instruction is currently not supported when stealth proxies are used.

Executing custom JavaScript

evaluate JavaScript code

If you need more flexibility and need to run custom JavaScript, you need to use this instruction. However, this instruction is currently not supported when stealth proxies are used.

If you want to run the code console.log('foo') on the webpage you need to use this JavaScript scenario:

{
    "instructions": [
        {"evaluate": "console.log('foo')"}
    ]
}

💡 Good to know: The results of any evaluate instruction will be added to the evaluate_results key in the JSON response if json_response=True is used. You can read more about this here

Timeout

Your whole scenario should not take more than 40 seconds to complete, otherwise the API will timeout.