Read CSV
Synopsis
This Operator reads an ExampleSet from the specified CSV file.
Description
CSV stands for Comma-Separated Values. CSV files store data (both numerical and text) in plain-text form. All values belonging to one Example are stored on one line of the file, and the values of different Attributes are separated by a separator character that stays the same for every row. Although the name 'CSV' suggests that values are separated by commas, other separators can also be used.
The easiest way to import a CSV file is to use the Import Configuration Wizard from the Parameters panel. All parameters can also be set directly in the Parameters panel. For more details about the Operator, see the description of the parameters.
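The one-line-per-Example layout with a constant separator can be illustrated outside the Operator. The following is a minimal Python sketch, not the Operator's implementation; the sample data and the helper name read_rows are hypothetical, and a semicolon is assumed as the separator.

```python
import csv
import io

# Hypothetical sample: one Example per line, three Attributes, ';' as separator.
sample = "name;age;city\nAnn;34;Oslo\nBob;27;Rome\n"

def read_rows(text, separator):
    """Split every line of text into values using one constant separator."""
    return list(csv.reader(io.StringIO(text), delimiter=separator))

rows = read_rows(sample, ";")
for row in rows:
    print(row)
```

Each inner list corresponds to one line of the file, and every line is split with the same separator.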
Please make sure that the CSV file is read correctly as an ExampleSet before building a Process that uses it.
Differentiation
There are many Read <source> Operators in the Data Access group and its Files/Read sub-group. For example, Read Excel, Read URL, Read SPSS, Read XML and other Operators read ExampleSets from other sources and file formats.
Input
file
A CSV file can optionally be passed in as a file object. Such an object can be created by Operators that have file output ports, such as the Read File Operator.
Output
output
This port delivers the ExampleSet created from the CSV file that was provided at the input port, imported through the Import Configuration Wizard, or loaded from the path given in the csv file parameter.
Parameters
Import configuration wizard
This wizard guides you through configuring the Operator to import the CSV file.
Csv file
The path of the CSV file is specified here. It can also be selected using the 'Choose a file' button.
Column separators
The column separators of the CSV file are specified here. They can also be given as a regular expression. The description of the Select Attributes Operator and its tutorial Processes provide a good introduction to regular expressions.
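To show what a regular expression as column separator means, here is a small Python sketch. It uses Python's re syntax, which may differ in details from the regular expression dialect the Operator accepts; the helper name split_with_pattern and the sample line are hypothetical.

```python
import re

def split_with_pattern(line, pattern):
    """Split one line into values wherever the separator pattern matches."""
    return re.split(pattern, line)

# Accept either ';' or ',' as separator, with optional surrounding whitespace.
values = split_with_pattern("a, b;c ;d", r"\s*[;,]\s*")
print(values)
```

A single pattern can therefore cover files that mix several separator characters.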
Trim lines
This parameter indicates whether lines should be trimmed (whitespace removed from the beginning and the end) before the column split is performed. This option can cause problems if TABs ('\t') are used as separators, since leading or trailing TABs would be removed as well.
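The interaction between trimming and TAB separators can be sketched in Python. This only mimics the described order of operations (trim first, then split) and is not the Operator's code; split_line and the sample row are hypothetical.

```python
def split_line(line, separator="\t", trim=False):
    """Split one line into column values, optionally trimming it first."""
    if trim:
        line = line.strip()  # removes leading/trailing whitespace, including TABs
    return line.split(separator)

# Hypothetical row whose first and last values are empty.
row = "\tb\tc\t"
untrimmed = split_line(row)
trimmed = split_line(row, trim=True)
print(untrimmed, trimmed)
```

Without trimming the row yields four values, two of them empty; with trimming the empty first and last columns are lost and the remaining values shift.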
Use quotes
This parameter indicates whether quotes should be taken into account. Quotes can be used to store special characters such as column separators inside a value. For example, if (,) is set as the column separator and (") is set as the quotes character, then the row (a,b,c,d) is read as 4 values for 4 columns, whereas ("a,b,c,d") is read as the single column value a,b,c,d. If this parameter is set to false, the quotes character parameter and the escape character parameter cannot be defined.
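The quoting example can be reproduced with Python's csv module. This is an illustrative sketch of the same idea, not the Operator's parser; parse_line and its use_quotes argument are hypothetical names chosen to mirror the parameter.

```python
import csv

def parse_line(line, use_quotes=True):
    """Parse one comma-separated line, with or without quote handling."""
    if use_quotes:
        reader = csv.reader([line], delimiter=",", quotechar='"')
    else:
        # QUOTE_NONE makes the reader treat quote characters as ordinary text.
        reader = csv.reader([line], delimiter=",", quoting=csv.QUOTE_NONE)
    return next(reader)

unquoted = parse_line('a,b,c,d')
quoted = parse_line('"a,b,c,d"')
ignored = parse_line('"a,b,c,d"', use_quotes=False)
print(unquoted, quoted, ignored)
```

With quote handling switched off, the quoted row is split at every comma and the quote characters stay part of the first and last values.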
Quotes character
This parameter defines the quotes character and is only available if use quotes is set to true.
Escape character
This parameter specifies the character used to escape the quotes and is only available if use quotes is set to true. For example, if (") is used as the quotes character and (\) is used as the escape character, then ("yes") will be translated as (yes) and (\"yes\") will be translated as ("yes").
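Escaping can be sketched with Python's csv module as well, assuming a backslash as the escape character and (") as the quotes character. The helper parse_with_escape is hypothetical, and the Operator's own parser may treat edge cases differently.

```python
import csv

def parse_with_escape(line):
    """Parse one line using '"' as quotes and a backslash as escape character."""
    reader = csv.reader(
        [line],
        delimiter=",",
        quotechar='"',
        escapechar="\\",
        doublequote=False,
    )
    return next(reader)

plain = parse_with_escape('"yes"')        # quotes delimit the value
escaped = parse_with_escape(r'\"yes\"')   # escaped quotes are kept as data
print(plain, escaped)
```

An unescaped quote delimits the value and is dropped, while an escaped quote is kept as part of the value.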
Skip comments
This parameter is used to ignore comments in the CSV file (if any). If this option is set to true, a comment character should be defined using the comment characters parameter.
Comment characters
This parameter is available if skip comments is set to true. Lines beginning with these characters are ignored. If a comment character appears in the middle of a line, everything after it on that line is ignored. The comment character itself is also ignored.
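The comment rules described here can be sketched in a few lines of Python. This is a simplified illustration with a hypothetical helper strip_comments; it assumes (#) as the comment character and, unlike a full parser, does not check whether the character sits inside a quoted value.

```python
def strip_comments(lines, comment_char="#"):
    """Cut each line at the comment character and skip lines left empty."""
    kept = []
    for line in lines:
        position = line.find(comment_char)
        if position >= 0:
            # Discard the comment character and everything after it.
            line = line[:position]
        if line.strip():
            kept.append(line)
    return kept

cleaned = strip_comments(["# header comment", "a,b,c", "1,2,3#trailing note"])
print(cleaned)
```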
Parse numbers
This parameter specifies whether numbers are parsed or not.
Decimal character
This character is used as the decimal character.
Grouped digits
This parameter decides whether grouped digits should be parsed or not. If this parameter is set to true, the grouping character parameter has to be specified.
Grouping character
This character is used as the grouping character. If this character is found between digits, the digits are combined and the character is ignored. For example, if "22-14" is present in the CSV file and "-" is set as the grouping character, then "2214" will be stored.
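How the decimal character and the grouping character work together can be sketched in Python. The function parse_number and its argument names are hypothetical and only mirror the parameters described here; the sketch does not handle signs or malformed input the way a full parser would.

```python
def parse_number(text, decimal_char=".", grouping_char=None):
    """Convert text to a float using the given decimal and grouping characters."""
    if grouping_char is not None:
        # Grouping characters are ignored and the digits around them are joined.
        text = text.replace(grouping_char, "")
    if decimal_char != ".":
        text = text.replace(decimal_char, ".")
    return float(text)

print(parse_number("22-14", grouping_char="-"))
print(parse_number("1.234,5", decimal_char=",", grouping_char="."))
```

The second call shows a notation in which (.) groups digits and (,) marks the decimal places.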
Infinity string
This parameter can be set to parse a specific infinity representation (e.g. "Infinity"). If it is not set, the locale-specific infinity representation will be used.