Cut Document
Synopsis
Cuts an input document into segments using regular expressions specifying the start and end of each segment.
Description
This operator segments a text based on a starting and ending regular expression.
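The segmentation idea can be sketched outside the operator with Python's `re` module. This is only an illustration under stated assumptions: the function name `cut_document` is hypothetical, and whether the operator keeps the delimiters inside each segment is not stated here, so the sketch simply includes them.

```python
import re

def cut_document(text, start_pattern, end_pattern):
    """Collect every span that begins at a start match and ends at the next end match.

    Assumption: the matched delimiters are kept as part of each segment.
    """
    start_re = re.compile(start_pattern)
    end_re = re.compile(end_pattern)
    segments = []
    pos = 0
    while pos <= len(text):
        start = start_re.search(text, pos)
        if start is None:
            break
        end = end_re.search(text, start.end())
        if end is None:
            break
        segments.append(text[start.start():end.end()])
        # Always move forward, even if both patterns matched the empty string.
        pos = end.end() if end.end() > pos else pos + 1
    return segments

sample = "<item>a</item> noise <item>b</item>"
segments = cut_document(sample, r"<item>", r"</item>")
```

Text outside a start/end pair (here, " noise ") is dropped, which matches the idea of a collection containing only the segmented parts.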
Input
document
Output
documents
Description
Collection of the segmented document.
Parameters
Query type
Specifies the type of the query. The available query types are: String Matching, Regular Expression, Regular Region, Indexed, XPath and JSONPath.
String matching queries
Specifies a list of string matching start and end sequences. Everything between a start and an end sequence is used as the result. See the operator documentation for details on string matching.
Attribute type
Specifies the type of the resulting attributes. If Numerical or Binominal is chosen, ensure that the returned result can be interpreted as that type. The available types are: Nominal, Numerical and Binominal.
Regular expression queries
Specifies a list of attribute names and their corresponding regular expressions. The first matching group is used as value. See the operator documentation for details on regular expressions.
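The "first matching group" rule can be illustrated with a short Python sketch. The helper `extract_attributes` and the sample attribute names are hypothetical, and the behaviour for a query that does not match (here: `None`) is an assumption, not something this page specifies.

```python
import re

def extract_attributes(text, queries):
    """Map each attribute name to the first capturing group of its regular expression.

    Assumption: a query without a match yields None. Every pattern is expected
    to contain at least one capturing group; otherwise group(1) raises IndexError.
    """
    result = {}
    for name, pattern in queries.items():
        match = re.search(pattern, text)
        result[name] = match.group(1) if match else None
    return result

text = "Order 4711 shipped to Berlin"
attributes = extract_attributes(
    text,
    {"order_id": r"Order (\d+)", "city": r"to (\w+)", "weight": r"(\d+) kg"},
)
```

Only the parenthesised part of each expression becomes the attribute value; the surrounding pattern text serves purely as context for locating it.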
Regular region queries
Specifies a list of attribute names and their corresponding regular expressions. Two regular expressions can be specified to define the start and the end of a region. Everything between the two matches is returned as the result.
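A region query differs from a plain regular expression query in that the value is the text between two matches rather than a capturing group. A minimal Python sketch follows, with the hypothetical helper name `extract_region` and the assumption that a missing start or end match yields no value:

```python
import re

def extract_region(text, start_pattern, end_pattern):
    """Return the text between the first start match and the next end match."""
    start = re.search(start_pattern, text)
    if start is None:
        return None
    # Look for the end pattern only after the start match has finished.
    end = re.compile(end_pattern).search(text, start.end())
    if end is None:
        return None
    return text[start.end():end.start()]

html = "<head><title>Cut Document</title></head>"
region = extract_region(html, r"<title>", r"</title>")
```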
XPath queries
Specifies a list of attribute names and their corresponding XPath queries. See the operator documentation for details on XPath.
Namespaces
Specifies pairs of identifier and namespace for use in XPath queries. The namespace for (x)html is bound automatically to the identifier h.
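To show how an identifier bound to a namespace is used inside an XPath query, here is a sketch using Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath. The prefix `h` mirrors the automatic binding described above; the sample markup and variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET

XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body><p>first</p><p>second</p></body>
</html>"""

# Pairs of identifier and namespace URI, analogous to the Namespaces parameter.
namespaces = {"h": "http://www.w3.org/1999/xhtml"}

root = ET.fromstring(XHTML)
# Without the h: prefix the query would find nothing, because every element
# in the document lives in the XHTML namespace.
paragraphs = [p.text for p in root.findall(".//h:p", namespaces)]
unprefixed = root.findall(".//p")
```

The empty result for the unprefixed query is the usual reason an XPath expression that looks correct returns nothing on (X)HTML input.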
Ignore cdata
Indicates whether CDATA sections should be ignored when evaluating XPath expressions.
Assume html
If checked, a more tolerant XML parser is used. It copes with HTML constructs that are forbidden in XML, but it always assumes HTML and adds missing tags. Uncheck this for plain XML.
Index queries
Specifies a list of attribute names and their corresponding regions. Each region is specified as an offset index and a length.
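An offset-and-length region corresponds to a simple slice. The sketch below assumes zero-based offsets, which this page does not state; `extract_by_index` and the sample record are hypothetical.

```python
def extract_by_index(text, queries):
    """Map each attribute name to the substring given by (offset, length).

    Assumption: offsets are zero-based character positions.
    """
    return {
        name: text[offset:offset + length]
        for name, (offset, length) in queries.items()
    }

record = "2024-01-15 ERROR disk full"
fields = extract_by_index(record, {"date": (0, 10), "level": (11, 5)})
```

This style of query suits fixed-width records, where every field sits at the same position in each document.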
JSONPath queries
Specifies a list of attribute names and their corresponding JSONPath queries.
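As a rough illustration of what a JSONPath query selects, the following Python sketch resolves a very small subset of the syntax (dotted keys and numeric indices only). It is a hand-written stand-in, not the JSONPath engine used by the operator, and `resolve_simple_jsonpath` is a hypothetical name.

```python
import json
import re

def resolve_simple_jsonpath(data, path):
    """Resolve paths such as $.store.books[1].title.

    Supports only .key and [index] steps; wildcards, filters and
    recursive descent are deliberately left out of this sketch.
    """
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    current = data
    # Each step is either a dotted key or a bracketed list index.
    for key, index in re.findall(r"\.(\w+)|\[(\d+)\]", path[1:]):
        current = current[key] if key else current[int(index)]
    return current

doc = json.loads('{"store": {"books": [{"title": "A"}, {"title": "B"}]}}')
title = resolve_simple_jsonpath(doc, "$.store.books[1].title")
```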