Using regular expressions when profiling SQL Server with Talend Open Profiler

28 08 2009
Email column analysis

Email column analysis

By default, SQL Server does not provide integrated functions to do pattern matching with regular expressions. If you want to use Talend Open Profiler to analyze the matching of your data against some regular expressions (for example to analyze a column that contains emails, dates,…) you need to install a CLR User-Defined Function (UDF) on your SQL Server.

Amine who has done an internship at Talend has tried it. He provided us a step-by-step user guide about how to create the UDF, install it on your SQL Server and call it from Talend Open Profiler. This will allow you to use all predefined patterns to ensure the quality of your data.

You can also find new regular expressions for Talend Open Profiler in Talend Exchange.

Advertisements

Actions

Information

7 responses

31 08 2009
Amine

Thanks Sebastiao ,

If any one needs more help to install this or others functions , be free to contact me.
hallam-dz.com

7 12 2010
Pinback

Thank you Sebatiao

I have gone through all the trouble of getting Visual Studio Pro 2005 and what not to find out that my TDQ 4.0.1 has not got the UDF function under preferences… are there any workarounds for this?

Regards

7 12 2010
scorreia

Hi Pinback,
The preference page existed in older version of the profiler because there was no access to the indicators. Now, you can edit the indicators directly in “Libraries/Indicators/System Indicators/”.
You need to add the SQL template appropriate to your database (MS SQL) in the Regex Matching indicator. See more details on Talend’s forum.

8 12 2010
Pinback

Hi scorreia

Thank you very much for the speedy reply, much appreciated. Will have a look at the topic

24 03 2011
Pinback

Hi Scorreia…

I have found the UDF in Talend, but what is the fuction i need to add in the field?

24 03 2011
Pinback

Sorry, my bad… what id like to know what the template should look like. Sorry, new to all of this 🙂

24 03 2011
scorreia

If your function is called RegExMatch, then the template should look like this:


SELECT COUNT(CASE WHEN dbo.RegExMatch(<%=__COLUMN_NAMES__%>,<%=__PATTERN_EXPR__%>) THEN 1 END), COUNT(*) FROM <%=__TABLE_NAME__%> <%=__WHERE_CLAUSE__%>

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s




%d bloggers like this: