One of the most important features of ScrapingBee is the ability to extract exactly the data you need, without having to post-process the response's content with external libraries.
We can use this feature by passing an additional parameter named extract_rules. We give each element we want to extract a label and a CSS selector, and ScrapingBee will do the rest!
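The extract_rules value is just a JSON object sent as a URL query parameter. The minimal sketch below builds that query string without sending a request, so you can see how the rules get encoded. It assumes plain label-to-selector rules stored in a map[string]string. buildQuery is a hypothetical helper name, and the API key is a placeholder.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/url"
)

// buildQuery (hypothetical helper) serializes the rules to JSON and returns
// the percent-encoded query string that would be appended to the API URL.
func buildQuery(apiKey, targetURL string, rules map[string]string) (string, error) {
	// json.Marshal emits map keys in sorted order.
	rawRules, err := json.Marshal(rules)
	if err != nil {
		return "", fmt.Errorf("encoding rules: %w", err)
	}
	q := url.Values{}
	q.Set("api_key", apiKey)
	q.Set("url", targetURL)
	q.Set("extract_rules", string(rawRules))
	// Encode sorts parameters by key and percent-encodes every value.
	return q.Encode(), nil
}

func main() {
	rules := map[string]string{"title": "h1", "subtitle": "span.text-20"}
	query, err := buildQuery("YOUR-API-KEY", "https://example.com", rules)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(query)
}
```

Because both json.Marshal (for map keys) and url.Values.Encode (for parameter names) produce sorted output, the encoded query is deterministic, which makes it easy to log or compare while debugging your rules.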
Let's say that we want to extract the title and the subtitle of the data extraction documentation page. Their CSS selectors are h1 and span.text-20, respectively. To make sure they're the correct ones, you can run the JavaScript function document.querySelector("CSS_SELECTOR") in the developer tools console on that page.
The full code will look like this:
```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"log"
	"net/http"
)

const (
	apiKey         = "YOUR-API-KEY"
	scrapingBeeURL = "https://app.scrapingbee.com/api/v1"
)

// extract asks ScrapingBee to fetch targetURL and apply the given
// extract_rules, returning the raw JSON response body.
func extract(targetURL string, rules interface{}) ([]byte, error) {
	// extract_rules is sent as a JSON string in the query string.
	rawRules, err := json.Marshal(rules)
	if err != nil {
		return nil, fmt.Errorf("failed to encode rules: %w", err)
	}

	req, err := http.NewRequest(http.MethodGet, scrapingBeeURL, nil)
	if err != nil {
		return nil, fmt.Errorf("failed to build the request: %w", err)
	}

	// Encode() percent-encodes every value, including the JSON rules.
	q := req.URL.Query()
	q.Add("api_key", apiKey)
	q.Add("url", targetURL)
	q.Add("extract_rules", string(rawRules))
	req.URL.RawQuery = q.Encode()

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, fmt.Errorf("failed to request ScrapingBee: %w", err)
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("ScrapingBee responded with status code %d", resp.StatusCode)
	}

	bodyBytes, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, fmt.Errorf("failed to read the response body: %w", err)
	}
	return bodyBytes, nil
}

func main() {
	targetURL := "https://www.scrapingbee.com/documentation/data-extraction"
	// Map each output label to the CSS selector that locates it.
	rules := map[string]interface{}{
		"title":    "h1",
		"subtitle": "span.text-20",
	}

	rawJSON, err := extract(targetURL, rules)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(rawJSON))
}
```
As you can see, the result is:
{"title": "Documentation - Data Extraction", "subtitle": "Extract data with CSS selector"}
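Since the response is plain JSON keyed by the labels from extract_rules, it can be decoded straight into a Go struct. The sketch below is one way to do that. The Extracted type and the parseExtracted helper are hypothetical names, and the sample body is the result shown above; in a real program you would pass the bytes returned by extract instead.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// Extracted (hypothetical type) mirrors the rule labels; the JSON tags must
// match the names used in extract_rules ("title", "subtitle").
type Extracted struct {
	Title    string `json:"title"`
	Subtitle string `json:"subtitle"`
}

// parseExtracted decodes a ScrapingBee extract_rules response body.
func parseExtracted(raw []byte) (Extracted, error) {
	var e Extracted
	if err := json.Unmarshal(raw, &e); err != nil {
		return Extracted{}, fmt.Errorf("decoding extract_rules result: %w", err)
	}
	return e, nil
}

func main() {
	// Sample body taken from the result above.
	raw := []byte(`{"title": "Documentation - Data Extraction", "subtitle": "Extract data with CSS selector"}`)
	e, err := parseExtracted(raw)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("Title: %s\nSubtitle: %s\n", e.Title, e.Subtitle)
}
```

A struct gives you typed fields and compile-time checked names; if your rules change often, decoding into a map[string]interface{} is a looser alternative.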
You can find more about this feature in our Data Extraction documentation, and more about CSS selectors on the W3Schools CSS Selectors page.