Write Database
Synopsis
This operator writes an ExampleSet to an SQL database.
Description
The Write Database operator is used for writing an ExampleSet to the specified SQL database. You need to have at least basic understanding of databases and database connections in order to use this operator properly. Go through the parameters and the attached Example Process to understand the flow of this operator.
The user can specify the database connection and a table name. Please note that the table will be created during writing if it does not exist. The most convenient way of defining the necessary parameters is the Manage Database Connections wizard. The most important parameters (database URL and user name) will be automatically determined by this wizard. At the end, you only have to define the table name. This operator only supports the writing of the complete ExampleSet consisting of all regular and special attributes and all examples. If this is not desired, perform some preprocessing operators like the Select Attributes or Filter Examples operators before applying the Write Database operator. Data from database tables can be read in RapidMiner by using the Read Database operator.
Input
input
This input port expects an ExampleSet. It is output of the Retrieve operator in the attached Example Process.
connection
This input port expects a Connection object if any. See the parameter connection entry for more information.
Output
through
The ExampleSet that was provided at the input port is delivered through this output port without any modifications. This is usually used to reuse the same ExampleSet in further operators of the process.
connection
This output port delivers the Connection object from the input port. If the input port is not connected the port delivers nothing.
Parameters
Define connection
This parameter indicates how the database connection should be specified. It gives you four options: repository, predefined, url and jndi. The parameter is not visible if the connection input port is connected.
Connection entry
This parameter is only available when the define connection parameter is set to repository. This parameter is used to specify a repository location that represents a connection entry. The connection can also be provided using the connection input port.
Connection
This parameter is only available when the define connection parameter is set to predefined. This parameter is used for connecting to the database using a predefined connection. You can have many predefined connections. You can choose one of them using the drop down box. You can add a new connection or modify previous connections using the button next to the drop down box. You may also accomplish this by clicking on **Manage Database Connections... ** from the Tools menu in the main window. A new window appears. This window asks for several details e.g. Host, Port, Database system, schema, username and password. The Test button in this new window will allow you to check whether the connection can be made. Save the connection once the test is successful. After saving a new connection, it can be chosen from the drop down box of the connection parameter. You need to have basic understanding of databases for configuring a connection.
Database system
This parameter is only available when the define connection parameter is set to url. This parameter is used for selecting the database system in use. It can have one of the following values: MySQL, PostgreSQL, Sybase, HSQLDB, ODBC Bridge (e.g. Access), Microsoft SQL Server (JTDS), Ingres, Oracle.
Database url
This parameter is only available when the define connection parameter is set to url. This parameter is used for defining the URL connection string for the database, e.g. 'jdbc:mysql://foo.bar:portnr/database'.
Username
This parameter is only available when the define connection parameter is set to url. This parameter is used for specifying the username of the database.
Password
This parameter is only available when the define connection parameter is set to url. This parameter is used for specifying the password of the database.
Jndi name
This parameter is only available when the define connection parameter is set to jndi. This parameter is used for giving the JNDI a name for a data source.
Table name
This parameter is used for selecting the required table from the specified database. Please note that you can also write a table name here, if the table does not exist it will be created during writing.
Overwrite mode
This parameter indicates if an existing table should be overwritten or data should be appended to the existing data.
Set default varchar length
This parameter allows you to set varchar columns to default length.
Default varchar length
This parameter is only available when the set default varchar length parameter is set to true. This parameter specifies the default length of varchar columns.
Add generated primary keys
This parameter indicates whether a new attribute holding the auto generated primary keys should be added to the table in the database.
Db key attribute name
This parameter is only available when the add generated primary keys parameter is set to true. This parameter specifies the name of the attribute for the auto generated primary keys.
Batch size
This parameter specifies the number of examples which are written at once with one single query to the database. Larger values can greatly improve the speed. However, too large values can drastically decrease the performance. Moreover, some databases have restrictions on the maximum number of values written at once. Set to 0 to disable batching.