CSV Import
Concept¶
The manual configuration for importing CSV files consists of an XML configuration file for the Implex. This file is divided into several sections to ensure the login to the CortexEngine, the reading of the sources (ReaderModule) and the mapping between source and target. The structure of this configuration file is basically the same for every import and differs in the respective parameters for sources, ReaderModule and target mapping.
Important
The first row of a CSV file contains column headings (field identifiers). If this is not the case, you must change the value for the ColumnMode in the ReaderModule section of the configuration file. The options available here are HEADER (if the first row contains the identifiers), NUMERIC or ABC.
ReaderModule for CSV files¶
The CSV files show two records to be imported with the respective field identifiers in the first line.
Example of a CSV file (values separated by semicolons):¶
"LastName"; ‘FirstName’; ‘City’; ‘HobbyNo’; "Hobbies" "Smith"; ‘Max’; ‘New York’; ‘1,2’; "Soccer,Hockey" "Jones"; ‘Cynthia’; ‘Philadelphia’; ‘1,2’; "Dancing,Riding"
Example of a CSV file (tab-separated values):¶
"LastName" ‘FirstName’ ‘City’ ‘HobbyNo’ "Hobbies" "Smith" ‘Max’ ‘New York’ ‘1,2’ "Football,Hockey" "Jones" ‘Cynthia’ ‘Philadelphia’ ‘1,2’ "Dancing,Riding"
The section for the area <ReaderModule...="[...]">
shows the appropriate configuration settings for semicolon-separated values.
<ReaderModule type="csv">
<Filename>/Data/import/2024Mai.csv</Filename> <!-- relative path to implex in bin directory -->
<Separator>;</Separator> <!-- field separator -->
<Enclosure>"</Enclosure> <!-- field delimiter -->
<RepSeparator>,</RepSeparator> <!-- separator for repeated content in one field; e.g.: "email 1; email 2; email 3; ..." -->
<ColumnMode>HEADER</ColumnMode> <!-- column mode HEADER, NUMERIC, ABC -->
<Charset>UTF-8</Charset>
</ReaderModule>
To import the file successfully, you must
- specify the file name with the absolute path
- define the correct character set (<
Charset
>) of the import file so that umlauts and special characters are interpreted correctly (UTF-8, Windows-1251 and ISO-8859-1) - characters other than semicolons to separate the values must be specified in the FieldSeparator field.
- use the placeholder {TAB} when using tabs
If the examples shown so far are combined as a configuration file, the XML file looks like this:
<xml version="1.0" encoding="UTF-8"?>
<CtxImport>
<Global>
<LoginIP>127.0.0.1</LoginIP>
<LoginPort>29001</LoginPort>
<LoginUser>admin</LoginUser>
<LoginPW>adm#13qzy2!</LoginPW>
<ImportMode>nu</ImportMode>
</Global>
<ReaderModule type="csv">
<Filename>/Data/import/2024Mai.csv</Filename>
<Separator>;</Separator>
<Enclosure>"</Enclosure>
<RepSeparator>,</RepSeparator>
<ColumnMode>HEADER</ColumnMode>
<Charset>UTF-8</Charset>
</ReaderModule>
<ImportSection recordtype="[.......]">
[.......]
</ImportSection>
</CtxImport>
Import Section for CSV files¶
Within the <ImportSection ...="[...]"
you define the field assignments and references for the import. You can save field contents as history information and use further functions (e.g. define case distinctions based on certain contents or the modification of contents).
The following import section can be used for the CSV example above if fields with the synonyms "perName", "perFor" and "perLocation" are available:
<ImportSection recordtype="PERS">
<Reference>perName</Reference>
<Field>perName=getChar('Name')</Field>
<Field>perVorn=getChar('Vorname')</Field>
<Field>perAOrt=getChar('Ort')</Field>
</ImportSection>
- the abbreviation of the record type is defined as a parameter for the
<ImportSection ...="[...]"
(in the example shown<...> ...="PERS"
) - the field assignments thus refer to this record type
- new records are created if no record for an update can be found using the reference and the ImportMode has been set to
nu
orn
Update Records¶
If you want to update existing records, it is necessary to specify a reference field. You define this via the Reference entry.
You then assign all other source fields to a target by creating a
<Field>perName=getChar('Name')</Field>
In this example, the value from the 'Name' field is assigned to the target field 'perName'. Please note that the source fields are always enclosed with the getChar
function.
Importing CSV files without a Header¶
If CSV files exist without a header and either NUMERIC or ABC was specified in the ReaderModule, other specifications must also be made when reading the fields. In this case, the getChar
function cannot be applied to a field name, but to the numeric or alphabetic column entries.
<Field>perName=getChar('A')</Field>
<Field>perLNam=getChar('B')</Field>
Use of Functions¶
There are several functions available for configuration that you can use to change the content. You can find a complete overview in the chapter functions.
<ImportSection recordtype="PERS">
<Reference>perName</Reference>
<Field>perName=getChar('Name')</Field>
<Field>perVorn=getChar('Vorname')</Field>
<Field>perAOrt=getChar('Ort')</Field>
<Field>Quelle=file_woext(sourceFileName)()</Field>
<Field>perGes=iif(getChar('Vorname')== 'Max','M','W')</Field>
<Field>perSuch=getChar('Vorname')+getChar('Name')</Field>
</ImportSection>
The functions access various data and generate new information:
- the file name that is passed to the
file_woext
function must be specified without the file extension (2024May.csv thus becomes 2024May) - a case distinction is made using
iif
based on the first name: In the case of 'Max', the fieldperGes
(gender) receives the value 'M', otherwise 'W' - in addition, the source fields 'First name' and 'Last name' are combined into one value and written to the field 'perSuch'.
Note
Please note that the source values must first be declared as numerical values (casting) for arithmetic operations. You can do this with the cint
or cfloat
functions, for example. You may also have to perform an adjustment for other types (e.g. for converting numbers to text).
Repetition fields and Repetition field Groups¶
In the field configuration, you can define that a field should be saved multiple times within a record (e.g. hobby or e-mail address). These repetition fields can be combined as a group (e.g. bank details with account, sort code, IBAN,...). You can therefore also import the corresponding data from a source into CortexEngine using the import function.
The CSV files contain the hobbies and a number of the hobbies, which together form a repetition field group and must be configured accordingly.
The number and name of the hobbies should now be imported into the person's record type as a repetition field group. To do this, add the following information to the <ImportSection ...="[...]">
. This means that the first number and the first hobby are regarded as a group, followed by the second number and the second hobby, etc.
<RepGroup>
<Field>HobNr = getChar('HobbyNr')</Field>
<Field>HobBez = getChar('Hobbies')</Field>
</RepGroup>
If it is a single repeated field without any further grouping, the individual field is written to the group of repeated fields.
You can also include several repetition fields and repetition field groups in one record.
Hint
A group of fields can be updated if there is a reference to a leading field and the group that no longer exists in the source can be deleted. This requires an extension of the example above:
The reference
attribute activates the referencing of the group. The first field is always the reference field for updating the group.
The deltalist
attribute specifies that each record to be imported from the source must exist in the CortexEngine. Records that do not exist in the source (newly created or updated) are deleted from the CortexEngine after the import.
Caution
The combination of all attributes can possibly cause unwanted changes. A prior test is therefore recommended.
Import - Example¶
If the examples shown so far are combined as a configuration file, the XML file will look like this:
<?xml version="1.0" encoding="UTF-8"?>
<CtxImport>
<Global>
<LoginIP>127.0.0.1</LoginIP>
<LoginPort>29001</LoginPort>
<LoginUser>admin</LoginUser>
<LoginPW>adm#13qzy2!</LoginPW>
<ImportMode>nu</ImportMode>
</Global>
<ReaderModule type="csv">
<Filename>/Data/import/2013Mai.csv</Filename>
<Separator>;</Separator>
<Enclosure>"</Enclosure>
<RepSeparator>,</RepSeparator>
<ColumnMode>HEADER</ColumnMode>
<Charset>UTF-8</Charset>
</ReaderModule>
<ImportSection recordtype="PERS">
<Reference>perName</Reference>
<Field>perName=getChar('Name')</Field>
<Field>perVorn=getChar('Vorname')</Field>
<Field>perAOrt=getChar('Ort')</Field>
<Field>Quelle=file_woext(sourceFileName)()</Field>
<Field>perGes=iif(getChar('Vorname')== 'Max','M','W')</Field>
<Field>perSuch= getChar('Vorname')+getChar('Name')</Field>
<RepGroup reference='1'>
<Field>HobNr = getChar('HobbyNr')</Field>
<Field>HobBez = getChar('Hobbies')</Field>
</RepGroup>
</ImportSection>
</CtxImport>