Get Pages

Synopsis

Gets pages from URLs in an attribute and stores them into a new attribute.

Description

This operator retrieves the pages whose URLs are contained in the input data set. For each row in the data set, the URL is read from the specified attribute, a GET request is sent, and the returned page is stored in a new attribute whose name is given by the parameter page attribute.
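The per-row behavior described above can be illustrated with a minimal Python sketch. It is not the operator's implementation: the function names `fetch_page` and `get_pages`, the representation of rows as dictionaries, and the default timeout are assumptions made for illustration only.

```python
from urllib.request import Request, urlopen


def fetch_page(url, timeout_s=10.0):
    """Send a GET request to url and return the response body as text."""
    request = Request(url, method="GET")
    with urlopen(request, timeout=timeout_s) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def get_pages(rows, link_attribute, page_attribute, fetch=fetch_page):
    """Return copies of rows with the fetched page stored under page_attribute.

    rows is a list of dicts (a stand-in for an example set); fetch is
    injectable so the loop can be exercised without network access.
    """
    result = []
    for row in rows:
        new_row = dict(row)  # leave the input rows unchanged
        new_row[page_attribute] = fetch(row[link_attribute])
        result.append(new_row)
    return result
```

Passing a stub for `fetch` makes it possible to check the row handling separately from the HTTP request itself.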

Input

Example Set

The Example Set port.

Output

Example Set

The Example Set port.

Parameters

The attribute that contains the URLs.

Page attribute

The name of the attribute that should contain the pages.

Random user agent

Chooses a user agent at random from a set of 7000 user agents.

User agent

The user agent string to send with the request.
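As a rough illustration of how the two user agent parameters might interact, the sketch below builds a request whose `User-Agent` header is either a fixed string or one picked at random. The `USER_AGENTS` list, the `build_request` function, and the rule that the random choice overrides the fixed value are all assumptions; the operator's actual list and precedence are not documented here.

```python
import random
from urllib.request import Request

# Hypothetical stand-in for the operator's built-in set of user agents.
USER_AGENTS = [
    "ExampleAgent/1.0",
    "ExampleAgent/2.0",
    "ExampleAgent/3.0",
]


def build_request(url, user_agent=None, random_user_agent=False, rng=random):
    """Build a GET request, optionally with a fixed or random User-Agent."""
    if random_user_agent:
        # Assumption: a random choice takes precedence over the fixed value.
        user_agent = rng.choice(USER_AGENTS)
    headers = {"User-Agent": user_agent} if user_agent else {}
    return Request(url, headers=headers, method="GET")
```

Constructing the `Request` object does not contact the server; the header is only sent once the request is opened.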

Connection timeout

The timeout (in ms) for establishing the connection.

Read timeout

The timeout (in ms) for reading from the URL.

Follow redirects

Specifies whether redirects should be followed.

Accept cookies

Specifies whether cookies should be accepted.

Specifies the scope of the cookies used

Request method

Specifies the request method.

Delay

Specifies whether execution is not delayed, delayed by a fixed amount of time, or delayed by a random amount of time.

Delay amount

The delay amount in ms.

Min delay amount

The minimum delay amount in ms.

Max delay amount

The maximum delay amount in ms.
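The delay parameters above can be read as three modes. The following Python sketch shows one plausible interpretation; the mode names `"none"`, `"fixed"`, and `"random"`, the function names, and the inclusive uniform range are assumptions, not the operator's documented values.

```python
import random
import time


def delay_ms(mode, amount=0, min_amount=0, max_amount=0, rng=random):
    """Return the delay in milliseconds for the given (hypothetical) mode."""
    if mode == "none":
        return 0
    if mode == "fixed":
        return amount
    if mode == "random":
        # Assumption: uniform integer delay, both bounds inclusive.
        return rng.randint(min_amount, max_amount)
    raise ValueError("unknown delay mode: " + repr(mode))


def wait_before_request(mode, **kwargs):
    """Sleep for the computed delay before the next request is sent."""
    time.sleep(delay_ms(mode, **kwargs) / 1000.0)
```

For example, `wait_before_request("random", min_amount=500, max_amount=1500)` would pause for between 0.5 and 1.5 seconds, which is one common way to avoid sending requests to the same server at a fixed rhythm.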