Data extraction in PHP

One of the most important features of ScrapingBee is the ability to extract exactly the data you need, without post-processing the response content with external libraries.

We can use this feature by adding a parameter named extract_rules. We specify a label for each element we want to extract along with its CSS selector, and ScrapingBee will do the rest!
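To make the shape of this parameter concrete, here is a minimal sketch of how the rules end up in the query string. It makes no network request; it only prints the JSON form of the rules and the percent-encoded query fragment produced from it. The labels and selectors are the ones used in this tutorial.

```php
<?php

// Each label maps to the CSS selector of the element to extract.
$rules = array(
    'title' => 'h1',
    'subtitle' => 'span.text-20'
);

// The rules are sent as a single JSON string...
$json = json_encode($rules);
echo $json . PHP_EOL;

// ...which http_build_query() percent-encodes like any other parameter.
$query = http_build_query(array('extract_rules' => $json));
echo $query . PHP_EOL;
```

Encoding the rules with json_encode() before calling http_build_query() matters: passing the PHP array directly would produce bracketed keys such as extract_rules[title] rather than one JSON value.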

Let’s say that we want to extract the title and the subtitle of the data extraction documentation page. Their CSS selectors are h1 and span.text-20, respectively. To check that they are correct, you can run the JavaScript function document.querySelector("CSS_SELECTOR") in the console of the browser’s developer tools on that page.

The full code will look like this:

<?php

// Get cURL resource
$ch = curl_init();

// Set base url & API key
$BASE_URL = "https://app.scrapingbee.com/api/v1/?";
$API_KEY = "YOUR-API-KEY";

// Set the extract rules:
$rules = array(
    'title' => 'h1',
    'subtitle' => 'span.text-20'
);

$rules = json_encode($rules);

// Set parameters
$parameters = array(
    'api_key' => $API_KEY,
    'url' => 'https://www.scrapingbee.com/documentation/data-extraction', // The URL to scrape
    'extract_rules' => $rules
);
// Building the URL query
$query = http_build_query($parameters);

// Set the URL for cURL
curl_setopt($ch, CURLOPT_URL, $BASE_URL.$query);

// Set method
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'GET');

// Return the transfer as a string
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

// Send the request and save response to $response
$response = curl_exec($ch);

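// Stop if the request fails (curl_exec() returns false on failure;
// an empty body is still a successful transfer)
if ($response === false) {
    die('Error: "' . curl_error($ch) . '" - Code: ' . curl_errno($ch));
}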

echo 'HTTP Status Code: ' . curl_getinfo($ch, CURLINFO_HTTP_CODE) . PHP_EOL;
echo 'Response Body: ' . $response . PHP_EOL;

// Close curl resource to free up system resources
curl_close($ch);
?>
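
Once the request succeeds, the extracted values still have to be read out of the response body. The sketch below assumes the body is a JSON object keyed by the labels from extract_rules. The helper name parseExtractedData and the sample body are hypothetical, made up for illustration; a real response would contain the page's actual text.

```php
<?php

// Hypothetical helper: decode a response body into an associative array.
function parseExtractedData(string $body): array
{
    $data = json_decode($body, true);
    if (!is_array($data)) {
        throw new RuntimeException('Response body is not a JSON object or array');
    }
    return $data;
}

// Made-up sample body standing in for $response from the code above.
$sampleBody = '{"title": "Example title", "subtitle": "Example subtitle"}';

$data = parseExtractedData($sampleBody);

// Fall back to a placeholder if a label is absent from the response.
echo 'Title: ' . ($data['title'] ?? '(missing)') . PHP_EOL;
echo 'Subtitle: ' . ($data['subtitle'] ?? '(missing)') . PHP_EOL;
```

In the full script, $response would be passed to the helper in place of $sampleBody, after checking the HTTP status code.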