Implex optimization¶
Speed¶
Essentially, there are three principles:
- Hardware plays a crucial role in import speed.
- SSD storage is much faster than conventional hard disks.
- An import without referencing is faster than an update of existing records.
Import mode of the server¶
The CortexEngine can be switched to import mode at the time of startup to speed up the data import.
- The data to be imported is read in without creating the management structures.
- Only after the actual data import is the CortexEngine terminated, restarted and reorganized in order to create these structures.
This is useful when the initial import loads a considerable amount of data and later imports only add supplementary information.
Import mode procedure¶
- exit the running CortexEngine
- start the CortexEngine manually with the parameter -m
- carry out the import
- exit the CortexEngine after the import
- start the CortexEngine regularly
- carry out a reorganization via Remote admin
Note
- The CortexEngine is started manually in the terminal.
- Under Windows, run the application cmd.exe and then start the CortexEngine: ctxserver64.exe -m
Reorganization via remote admin¶
- start the Remote admin
- click on Normalization
While the reorganization is running, it is not possible to work within the CortexEngine, as the management structures are being set up or corrected.
Tip
If your initial import or your regularly recurring imports involve very large data volumes (several hundred million or billions of records), our customer support (cortex-info@cortex-ag.com) will be happy to assist you.
Filter function¶
Within an import configuration, it is possible to configure a filter that only allows the import of certain records. Records that do not pass the filter are excluded from the outset instead of being processed and then skipped.
For this purpose, a single additional parameter, FilterFunktion, is added to the ImportSection to define the filter.
<ImportSection datensatztyp="Pers">
<FilterFunktion>
getChar('P_id')!=''
</FilterFunktion>
<Referenz>PersID</Referenz>
<Feld>PersID=getChar('P_id')</Feld>
<Feld>Vor=getChar('Vorname')</Feld>
<Feld>Nam=getChar('Name')</Feld>
</ImportSection>
- All records for which the source system supplies no record ID (P_id) are excluded.
- A record is only imported if the filter function returns true.
A combination of several functions is also possible.
<ImportSection datensatztyp="Pers">
<FilterFunktion>
AND(getChar('P_id')!='',getChar('Name')!='')
</FilterFunktion>
<Referenz>PersID</Referenz>
<Feld>PersID=getChar('P_id')</Feld>
<Feld>Vor=getChar('Vorname')</Feld>
<Feld>Nam=getChar('Name')</Feld>
</ImportSection>
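The effect of the combined filter can be illustrated outside of Implex. The following is a minimal Python sketch, not Implex code: the helper names get_char, passes_filter and filter_rows as well as the sample data are hypothetical, and it assumes that an empty source column is read as an empty string.

```python
import csv
import io

def get_char(row, column):
    # Assumption: a missing or empty source column yields an empty string.
    return row.get(column) or ""

def passes_filter(row):
    # Mirrors AND(getChar('P_id')!='',getChar('Name')!='')
    return get_char(row, "P_id") != "" and get_char(row, "Name") != ""

# Hypothetical source data: only the first row has both P_id and Name.
SAMPLE = """P_id,Vorname,Name
1,Anna,Schmidt
,Ben,Meyer
3,Cara,
"""

def filter_rows(text):
    # Keep only the rows for which the filter returns true.
    reader = csv.DictReader(io.StringIO(text))
    return [row for row in reader if passes_filter(row)]

imported = filter_rows(SAMPLE)
print([row["P_id"] for row in imported])  # prints ['1']
```

Rows two and three never reach the import step, which corresponds to the behaviour described for the FilterFunktion.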
Creation of a hash value¶
In addition to the reference import - i.e. updating existing data - it is possible to use a hash field. When importing CSV files, Implex supports the creation of a hash value for the record and writes this hash value to a separate field. If a new import takes place, the hash value is searched for in the CortexEngine before the reference search.
- If it is found, there is no change.
- If it is not found, the reference takes effect and an existing record is updated or a new record is created.
To use this function, an addition to the ImportSection block of the import configuration and the specification of an additional field are necessary.
<ImportSection datensatztyp="MyDS">
<HashFilter>hashfld</HashFilter>
<Referenz>myRefFl</Referenz>
<Feld>myField1=getChar('field1')</Feld>
<Feld>myField2=getChar('field2')</Feld>
<Feld>myFieldX=getChar('...')</Feld>
</ImportSection>
In this example, the hash value is stored in the hashfld field and compared against the stored value during the next import. The configuration of the other fields remains unchanged.
It is not necessary to create another line with an import configuration for the hash value.
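The lookup order described above (hash first, then reference) can be sketched in Python. This is only an illustration under assumptions: the hash algorithm Implex uses is not stated in this document, so SHA-256 over the sorted field values is a stand-in, and the in-memory store as well as the function names are hypothetical.

```python
import hashlib

def record_hash(row):
    # Assumption: the hash covers all field values, sorted by field name.
    parts = [f"{name}={row[name]}" for name in sorted(row)]
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()

def import_row(store, row, ref_field):
    """store maps a reference value to the stored record (including 'hashfld')."""
    digest = record_hash(row)
    # Step 1: search for the hash value before the reference search.
    if any(rec.get("hashfld") == digest for rec in store.values()):
        return "unchanged"
    # Step 2: the reference takes effect, so update or create the record.
    ref = row[ref_field]
    action = "updated" if ref in store else "created"
    store[ref] = {**row, "hashfld": digest}
    return action

store = {}
row = {"myRefFl": "A1", "myField1": "x"}
print(import_row(store, row, "myRefFl"))  # prints created
print(import_row(store, row, "myRefFl"))  # prints unchanged
```

The second call returns early because the identical record produces the same hash, so neither the reference search nor a write is needed.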
Delta import¶
The delta import enables the automatic processing of records that were not updated by an import.
For this purpose, the Implex maintains the DeltaList, which, after the source file has been processed, contains only the records that have not been updated. These remaining records (DeltaList) can be processed according to four different modes.
Settings of the DeltaList¶
Setting | Action | Description |
---|---|---|
c | clear | set all values to empty as of today's date, except the current contents of the reference fields |
l | delete | irrevocable deletion of the complete record |
a | archive | archive in the same way as Uniplex |
w | recoverable deletion | recoverable deletion in the same way as Uniplex |
<?xml version="1.0" encoding="UTF-8"?>
<CtxImport>
<Global>
<LoginIP>127.0.0.1</LoginIP>
<LoginPort>29000</LoginPort>
<LoginUser>import-user</LoginUser>
<LoginPW>userpasswd</LoginPW>
<ImportModus>u</ImportModus>
<DeltaListe>l</DeltaListe>
</Global>
<ReaderModul typ="csv">
<Dateiname>import-file.csv</Dateiname>
<FeldTrenner>,</FeldTrenner>
<FeldBegrenzer>"</FeldBegrenzer>
<WdhlTrenner>,</WdhlTrenner>
<SpaltenModus>HEADER</SpaltenModus>
<Charset>ISO-8859-1</Charset>
</ReaderModul>
<ImportSection datensatztyp="Pers">
<Referenz>Name</Referenz>
<Feld>Name=LastName</Feld>
<Feld>Vor=FirstName</Feld>
</ImportSection>
</CtxImport>
- Existing records are updated using the reference.
- Records that are not updated are irrevocably deleted.
Note
The DeltaList only uses the information of the specified reference field. The record type is not taken into account.
If the reference field is used in different record types, all records that do not belong to the type configured in the ImportSection are deleted.
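The bookkeeping behind the DeltaList can be pictured with a small Python sketch. It is a simplified model under assumptions, not the Implex implementation: the store is a plain dictionary keyed by the reference value, the function name delta_import is hypothetical, and the modes a and w are only represented by a made-up "_state" marker.

```python
def delta_import(store, source_rows, ref_field, mode):
    """Update store from source_rows and return the remaining DeltaList."""
    # Start with every existing reference value (record type is ignored).
    delta_list = set(store)
    for row in source_rows:
        ref = row[ref_field]
        store.setdefault(ref, {}).update(row)  # update or create via reference
        delta_list.discard(ref)                # this record was touched
    # Whatever is left was not part of the source file.
    for ref in delta_list:
        if mode == "l":      # irrevocable deletion
            del store[ref]
        elif mode == "c":    # empty everything except the reference field
            store[ref] = {k: (v if k == ref_field else "")
                          for k, v in store[ref].items()}
        elif mode == "a":    # hypothetical stand-in for archiving
            store[ref]["_state"] = "archived"
        elif mode == "w":    # hypothetical stand-in for recoverable deletion
            store[ref]["_state"] = "deleted-recoverable"
        else:
            raise ValueError(f"unknown DeltaList setting: {mode}")
    return sorted(delta_list)

store = {
    "Meyer": {"Name": "Meyer", "Vor": "Ben"},
    "Schmidt": {"Name": "Schmidt", "Vor": "Anna"},
}
print(delta_import(store, [{"Name": "Schmidt", "Vor": "Anne"}], "Name", "l"))
```

Because the sketch tracks reference values only, a record of a different type that shares the reference field would end up in the DeltaList as well, which is the situation the note above warns about.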
Delta import for repetition fields¶
Repetition fields can also be updated via delta import: entries of a repetition field are deleted if they no longer exist in the source.
<WiederholGruppe start="Hobbies">
<Feld deltaliste='d'>PeHob = getChar(Hobby)</Feld>
</WiederholGruppe>
- The contents of a person's hobby field are read from an XML file.
- If a hobby is already stored in the CortexEngine but no longer exists in the source, the corresponding field is removed.
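The intended result for a single repetition field can be expressed as a comparison of two lists. The Python sketch below is purely illustrative: the function name sync_repetition_field is hypothetical, and it assumes that entries missing in the CortexEngine are added while entries missing in the source are removed.

```python
def sync_repetition_field(stored, source):
    """Return (new entries, removed entries, added entries)."""
    source_set = set(source)
    stored_set = set(stored)
    # Entries that still exist in the source are kept in their stored order.
    kept = [entry for entry in stored if entry in source_set]
    # Entries that no longer exist in the source are removed.
    removed = [entry for entry in stored if entry not in source_set]
    # Assumption: entries that are new in the source are appended.
    added = [entry for entry in source if entry not in stored_set]
    return kept + added, removed, added

new_entries, removed, added = sync_repetition_field(
    ["Chess", "Tennis"], ["Chess", "Sailing"]
)
print(new_entries, removed, added)
```

In this example Tennis is removed because it no longer appears in the source, while Chess is kept and Sailing is added.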
Warning
The combination of the parameter deltaListe and the specification of a reference, as well as the use of the deltaListe in several fields of a repetition field group, can lead to unexpected results and should be tested sufficiently before final use.