Loop SharePoint
Synopsis
This operator loops over all files in the specified folder of a SharePoint site.
Description
After you have configured your SharePoint account (see below), you can process all the SharePoint files within the selected folder.
In order to access the SharePoint service, you need to set up an application connection with Microsoft. This involves a number of steps, and at the end you will have the credentials to create a RapidMiner SharePoint Connection, which this operator needs as an input. These steps require user-specific choices, but the overall workflow, along with some hints, is explained below.
The operator is based on Microsoft's Graph API and uses its v1.0 endpoint. Further information on the Graph API can be found at: https://developer.microsoft.com/en-us/graph.
- Log in to the Microsoft Azure Portal at https://portal.azure.com with your company's Active Directory account (the one you use to access SharePoint, Outlook or other Office 365 services).
- Search for "App Registrations" and then click on "New registration". Enter a name for your application and provide "https://storageauth.rapidminer.com" as the redirect URI of type Web.
- During App creation, the system will give you a "Client Secret", which is a hash string. Keep it, as it is needed for the Connection. You can create new secrets later under "Certificates and secrets".
- During App creation, please enable the two check-boxes for "Access tokens" and "ID tokens". You can change this later under "Authentication".
- After App creation, you need to grant it API-level permissions. Click on the "API permissions" link in the left panel. It will show you the "Microsoft Graph" API hierarchy.
- For the "Application Flow": select "Application permissions" and grant at least these: Application.ReadWrite.All, Application.ReadWrite.OwnedBy, Device.ReadWrite.All, Directory.Read.All, Directory.ReadWrite.All, Domain.ReadWrite.All, Sites.Manage.All, Sites.Read.All, Sites.ReadWrite.All, User.Read.All.
- For the "OAuth 2.0 Delegated Flow": select "Delegated permissions" and grant at least these: AllSites.Manage, AllSites.Read, AllSites.Write, MyFiles.Read, MyFiles.Write, Project.Read, Project.Write.
- After the App has been created, click on the name of your App. You will see some properties you’ll need for the Connection: Application (client) ID and Directory (tenant) ID.
- Your admin needs to approve your account. Approval is done by opening the URL below and following the instructions displayed. You can build the URL as
https://login.microsoftonline.com/{YOUR_DIRECTORY_TENANT_ID}/adminconsent?client_id={YOUR_APPLICATION_CLIENT_ID} (replace the {} placeholders with the previously obtained values).
- The SharePoint "site" is visible in your SharePoint URL. If the URL is https://company.sharepoint.com/sites/onboarding, the SharePoint URL is company.sharepoint.com and the SharePoint Site is onboarding.
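The last two steps can be sketched in Python. This is only an illustration and not part of the operator: the function names admin_consent_url and split_sharepoint_url and the sample IDs are made up for this example, and the parsing assumes a URL of the form https://<host>/sites/<site>.

```python
from typing import Tuple
from urllib.parse import quote, urlparse


def admin_consent_url(tenant_id: str, client_id: str) -> str:
    """Fill the two placeholders of the admin consent URL."""
    return (
        "https://login.microsoftonline.com/"
        f"{quote(tenant_id, safe='')}/adminconsent"
        f"?client_id={quote(client_id, safe='')}"
    )


def split_sharepoint_url(url: str) -> Tuple[str, str]:
    """Return (SharePoint URL, SharePoint Site) for https://<host>/sites/<site>."""
    parsed = urlparse(url)
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2 or parts[0] != "sites":
        raise ValueError(f"expected a /sites/<name> path, got {parsed.path!r}")
    return parsed.netloc, parts[1]


# Placeholder IDs; use the Directory (tenant) ID and Application (client) ID
# shown for your own App registration.
print(admin_consent_url("tenant-123", "client-456"))
print(split_sharepoint_url("https://company.sharepoint.com/sites/onboarding"))
```

The two values returned by split_sharepoint_url correspond to the "SharePoint URL" and "SharePoint Site" described in the last step.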
Be aware that the operator cannot read the file as an ExampleSet. For this reason, you must connect the file input in the inner process of this operator to an appropriate operator that processes the file. For example, if you want to load Excel files from your SharePoint account, you must connect the file input in the inner process to the Read Excel operator.
Input
in
Optional input data which is delivered to the inner process.
connection
This input port expects a Connection object if any. See the parameter connection entry for more information.
Output
out
Output data of the inner process.
connection
This output port delivers the Connection object from the input port. If the input port is not connected the port delivers nothing.
Parameters
Connection entry
This parameter is used to specify a repository location that represents a connection entry. The connection can also be provided using the connection input port.
Folder
Provide the name of the SharePoint 'folder' over which you want to loop.
Filter
Optional filter via a regular expression which is used to exclude files from looping over them, e.g. 'a.*b' for all files starting with 'a' and ending with 'b'. Ignored if empty.
Filtered string
Indicates which part of the file name is matched against the filter expression.
- file_name: Filtered on the name, e.g. 'myfolder/myfile.txt'
- full_path: Filtered on the full path, e.g. 'parentfolder/myfolder/myfile.txt'
- parent_path: Filtered on the parent folder, e.g. 'myfolder/'
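To illustrate how the filter expression and the filtered string interact, here is a small Python sketch. It is not the operator's implementation: the helper filter_matches is hypothetical, and it assumes the expression must match the whole selected string, which is how the 'a.*b' example reads. It only reports whether the selected string matches, and it returns None for an empty filter because the filter is then ignored.

```python
import re
from typing import Dict, Optional

# Example values copied from the parameter descriptions above.
current_file = {
    "file_name": "myfolder/myfile.txt",
    "full_path": "parentfolder/myfolder/myfile.txt",
    "parent_path": "myfolder/",
}


def filter_matches(
    file_parts: Dict[str, str], filtered_string: str, pattern: str
) -> Optional[bool]:
    """Check the string selected by 'filtered_string' against the expression.

    Returns None when the expression is empty, since the filter is then ignored.
    Assumption: the expression has to match the whole string.
    """
    if not pattern:
        return None
    return re.fullmatch(pattern, file_parts[filtered_string]) is not None


for mode in ("file_name", "full_path", "parent_path"):
    print(mode, filter_matches(current_file, mode, r".*\.txt"))
```

With the expression '.*\.txt', the file_name and full_path strings match while parent_path does not, so the choice of filtered string decides whether the same file is affected by the filter.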
File name macro
The name of the macro which will contain the name of the current file for each file the loop iterates over, e.g. 'myfolder/myfile.txt'.
File path macro
The name of the macro which will contain the full path of the current file for each file the loop iterates over, e.g. 'parentfolder/myfolder/myfile.txt'.
Parent path macro
The name of the macro which will contain the parent folder of the current file for each file the loop iterates over, e.g. 'myfolder/'.
Recursive
If selected, the loop will also iterate over all files in all subfolders of the selected folder. Otherwise, it will only iterate over the files in the selected folder.
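The difference between the two settings can be sketched with an in-memory folder tree. This is an illustration only: the nested dictionary, the file names and the helper iter_files are invented for the example and do not reflect how the operator talks to SharePoint.

```python
from typing import Iterator


def iter_files(folder: dict, recursive: bool, prefix: str = "") -> Iterator[str]:
    """Yield file paths; descend into subfolders only if 'recursive' is set."""
    for name, child in folder.items():
        if isinstance(child, dict):  # a dict stands for a subfolder
            if recursive:
                yield from iter_files(child, recursive, f"{prefix}{name}/")
        else:  # anything else stands for a file
            yield f"{prefix}{name}"


# Hypothetical folder content: two files and one nested subfolder.
selected_folder = {
    "a.xlsx": None,
    "b.xlsx": None,
    "archive": {"old.xlsx": None, "2020": {"older.xlsx": None}},
}

print(list(iter_files(selected_folder, recursive=False)))
print(list(iter_files(selected_folder, recursive=True)))
```

Without the recursive option only a.xlsx and b.xlsx are visited; with it, the files under archive and archive/2020 are visited as well.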