Using regular expressions when profiling SQL Server with Talend Open Profiler

28 08 2009
Email column analysis

Email column analysis

By default, SQL Server does not provide integrated functions to do pattern matching with regular expressions. If you want to use Talend Open Profiler to analyze the matching of your data against some regular expressions (for example to analyze a column that contains emails, dates,…) you need to install a CLR User-Defined Function (UDF) on your SQL Server.

Amine who has done an internship at Talend has tried it. He provided us a step-by-step user guide about how to create the UDF, install it on your SQL Server and call it from Talend Open Profiler. This will allow you to use all predefined patterns to ensure the quality of your data.

You can also find new regular expressions for Talend Open Profiler in Talend Exchange.

Advertisements




Share your regular expressions on Talend Exchange

12 02 2009

A new service is offered on the Talend Forge website. It’s called Talend Exchange and is aimed at providing a place where you can share TOS components, TOS templates, TOS jobs and TOP patterns. It is an enhancement of the previously called “Ecosystem” service.

I have added a few new regular expressions for Talend Open Profiler which may be useful to French users. You may download them and import them in Talend Open Profiler-1.1.x via the “import patterns” menu available on the “Libraries/Patterns” folder.

You may add your own regular expressions too. For the Talend Open Profiler-1.1.x platform, you need to create a simple xls file with the following columns
* C1: Label: the label of the pattern
* C2: Purpose: the purpose of the pattern
* C3: Description: the description of the pattern
* C4: Regular expression: the regular expression applicable to all databases
* C5: MySQL Regexp: the regular expression applicable to MySQL databases
* C6: Oracle Regexp: the regular expression applicable to Oracle databases
* C7: Author: the author of the regular expression
Then compress it in the zip format and you’re ready to upload it on Talend Exchange.

For the current development version of Talend Open Profiler-1.2.0xx, the task is even easier because there is a menu to export your regular patterns and/or your SQL patterns in csv format. For this version, it is suggested to export your patterns individually. Then compress your csv file(s) and upload them on Talend Exchange to share them with other users. The “import” menu will be able to read these files (once unzipped).

We’ll work later on a more integrated mechanism to share your patterns in one click.