Getting Started with DataScrap Language

May 7, 2023
2 min read

Introduction to DataScrap Language

DataScrap Studio’s custom scripting language lets you extract data from websites without learning a general-purpose programming language such as Python or JavaScript. In this tutorial, we’ll cover the basics of the DataScrap language and show you how to create your first data extraction script.

Basic Syntax

The DataScrap language has a simple, intuitive syntax designed for non-technical users. Here’s a basic example:

FIND .product-card
EXTRACT name: .product-title, TEXT
EXTRACT price: .product-price, TEXT
EXTRACT image: img, ATTR(src)

This script will:

  1. Find all elements with the class product-card
  2. For each product card, extract the product name from elements with class product-title
  3. Extract the price from elements with class product-price
  4. Extract the image URL from the src attribute of any img tag within the product card
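To make the per-card behaviour concrete, here is a rough Python model of what the script above does. It is not how DataScrap Studio is implemented. It assumes well-formed (XHTML-style) markup so that the standard library's xml.etree.ElementTree can parse it, supports only single .class or tag-name selectors, assumes each EXTRACT takes the first match inside the card, and assumes TEXT values are whitespace-trimmed. The helper names has_class, first_match, extract_value and run_script are hypothetical.

```python
import xml.etree.ElementTree as ET

def has_class(el, name):
    # The class attribute holds a space-separated list of class names.
    return name in (el.get("class") or "").split()

def first_match(scope, selector):
    """First element strictly inside `scope` matching '.class' or a bare tag name."""
    for el in scope.iter():
        if el is scope:
            continue  # iter() yields scope itself first; skip it
        if selector.startswith("."):
            if has_class(el, selector[1:]):
                return el
        elif el.tag == selector:
            return el
    return None

def extract_value(el, kind):
    """Apply an EXTRACT type; this sketch covers TEXT and ATTR(name) only."""
    if el is None:
        return None  # the selector matched nothing inside this element
    if kind == "TEXT":
        # Assumption: text is concatenated and whitespace-trimmed.
        return "".join(el.itertext()).strip()
    if kind.startswith("ATTR(") and kind.endswith(")"):
        return el.get(kind[len("ATTR("):-1])
    raise ValueError(f"type not supported in this sketch: {kind}")

def run_script(root, find_class, rules):
    """FIND .find_class, then build one record per match from (name, selector, kind) rules."""
    records = []
    for found in root.iter():
        if has_class(found, find_class):
            records.append({name: extract_value(first_match(found, sel), kind)
                            for name, sel, kind in rules})
    return records
```

For example, run_script(page, "product-card", [("name", ".product-title", "TEXT"), ("price", ".product-price", "TEXT"), ("image", "img", "ATTR(src)")]) returns one dictionary per product card, with None for any field whose selector matched nothing inside that card.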

Key Commands

FIND

The FIND command selects elements from the page using a CSS selector. It is always the first command in your script, and each EXTRACT that follows is applied to every element it selects:

FIND .article

You can use any valid CSS selector with the FIND command:

FIND #main-content .product-listing li
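A selector such as #main-content .product-listing li uses the CSS descendant combinator (the space): it matches li elements anywhere inside a .product-listing element, which is itself anywhere inside the element with id main-content. The Python sketch below resolves such selectors one step at a time. It illustrates the CSS rule rather than DataScrap's own selector engine: it assumes well-formed markup parsed with xml.etree.ElementTree and supports only tag, .class and #id parts (including combinations such as a.product-link), not child (>), sibling or attribute selectors. The names select_descendants and matches_compound are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

def matches_compound(el, compound):
    """True if `el` matches e.g. 'li', '.price', '#main-content' or 'a.product-link'."""
    for token in re.findall(r"[#.]?[\w-]+", compound):
        if token.startswith("#"):
            if el.get("id") != token[1:]:
                return False
        elif token.startswith("."):
            if token[1:] not in (el.get("class") or "").split():
                return False
        elif el.tag != token:
            return False
    return True

def select_descendants(root, selector):
    """Resolve a whitespace-separated (descendant-combinator) selector against `root`."""
    context = [root]
    for compound in selector.split():
        matched = []
        for scope in context:
            for el in scope.iter():
                # iter() yields `scope` first; a descendant must be strictly inside it.
                if el is scope or not matches_compound(el, compound):
                    continue
                if not any(el is m for m in matched):  # avoid duplicates from nested scopes
                    matched.append(el)
        context = matched  # the next part is searched inside these matches
    return context
```

Each whitespace-separated part narrows the search to descendants of the previous part's matches, which is why li elements in a .product-listing outside #main-content are not returned.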

EXTRACT

The EXTRACT command pulls a value out of each element selected by FIND and stores it under a name of your choosing:

EXTRACT title: h1, TEXT
EXTRACT description: .description, TEXT
EXTRACT url: a, ATTR(href)
EXTRACT html: .content, HTML

The format is:

EXTRACT [name]: [selector], [type]

Where:

  • name is what you want to call this data in your results
  • selector is a CSS selector relative to the elements found by the FIND command
  • type can be:
    • TEXT - Gets the text content
    • HTML - Gets the HTML content
    • ATTR(attribute_name) - Gets a specific attribute value
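Because every EXTRACT line has the same shape, it can be parsed mechanically. The Python sketch below is one possible reading of the format described above, and it also enforces the rule that a script starts with FIND. It is an illustration, not DataScrap Studio's actual parser: it assumes names consist of word characters (letters, digits, underscore) and treats the selector as everything between the colon and the last comma. ExtractRule, parse_extract and parse_script are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# EXTRACT [name]: [selector], [type]
# The selector group is greedy, so the line is split at its LAST comma and a
# selector containing spaces (e.g. ".price .current") stays in one piece.
EXTRACT_RE = re.compile(
    r"^EXTRACT\s+(?P<name>\w+):\s*(?P<selector>.+),\s*"
    r"(?P<type>TEXT|HTML|ATTR\((?P<attr>[\w-]+)\))\s*$"
)

@dataclass
class ExtractRule:
    name: str
    selector: str
    kind: str                   # "TEXT", "HTML" or "ATTR"
    attr: Optional[str] = None  # attribute name, only set when kind == "ATTR"

def parse_extract(line: str) -> ExtractRule:
    m = EXTRACT_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a valid EXTRACT line: {line!r}")
    kind = "ATTR" if m.group("attr") else m.group("type")
    return ExtractRule(m.group("name"), m.group("selector").strip(), kind, m.group("attr"))

def parse_script(text: str) -> Tuple[str, List[ExtractRule]]:
    """Return (FIND selector, EXTRACT rules); the first non-blank line must be FIND."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith("FIND "):
        raise ValueError("a script must start with a FIND command")
    return lines[0][len("FIND "):].strip(), [parse_extract(ln) for ln in lines[1:]]
```

Under these assumptions, a line such as EXTRACT rating: .rating-stars, ATTR(data-rating) parses to ExtractRule(name='rating', selector='.rating-stars', kind='ATTR', attr='data-rating').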

Advanced Example

Here’s a more advanced example that extracts product information from an e-commerce site:

FIND .product-grid .product
EXTRACT name: .product-name, TEXT
EXTRACT price: .price .current, TEXT
EXTRACT originalPrice: .price .original, TEXT
EXTRACT rating: .rating-stars, ATTR(data-rating)
EXTRACT inStock: .stock-status, TEXT
EXTRACT imageUrl: .product-image img, ATTR(src)
EXTRACT productUrl: a.product-link, ATTR(href)

Next Steps

Now that you understand the basics of the DataScrap language, you can start creating your own data extraction scripts. In our next tutorial, we’ll cover more advanced features like pagination, filtering, and data transformation.

Ready to try it yourself? Download DataScrap Studio and start extracting data in minutes!

Sarah Chen

About the Author

Sarah is a data scientist with over 8 years of experience in web scraping and data analytics. She specializes in developing automated data extraction solutions for e-commerce and marketplace businesses.