<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments for My IT Weblog</title>
	<atom:link href="http://scorreiait.wordpress.com/comments/feed/" rel="self" type="application/rss+xml" />
	<link>http://scorreiait.wordpress.com</link>
	<description>Just another WordPress.com weblog</description>
	<lastBuildDate>Mon, 31 Aug 2009 16:12:30 +0000</lastBuildDate>
	<generator>http://wordpress.com/</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>Comment on Using regular expressions when profiling SQL Server with Talend Open Profiler by Amine</title>
		<link>http://scorreiait.wordpress.com/2009/08/28/using-regular-expressions-when-profiling-sql-server-with-talend-open-profiler/#comment-155</link>
		<dc:creator>Amine</dc:creator>
		<pubDate>Mon, 31 Aug 2009 16:12:30 +0000</pubDate>
		<guid isPermaLink="false">http://scorreiait.wordpress.com/?p=196#comment-155</guid>
		<description>Thanks Sebastiao,

If anyone needs more help installing this or other functions, feel free to contact me.
hallam-dz.com</description>
		<content:encoded><![CDATA[<p>Thanks Sebastiao,</p>
<p>If anyone needs more help installing this or other functions, feel free to contact me.<br />
hallam-dz.com</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on How to detect random text in a free text field? by scorreia</title>
		<link>http://scorreiait.wordpress.com/2009/03/13/how-to-detect-random-text-in-a-free-text-field/#comment-145</link>
		<dc:creator>scorreia</dc:creator>
		<pubDate>Thu, 02 Jul 2009 18:17:12 +0000</pubDate>
		<guid isPermaLink="false">http://scorreiait.wordpress.com/?p=171#comment-145</guid>
		<description>Yes, the physical proximity of the keys is what gave me this idea. But it&#039;s also because on the French keyboard there is no vowel on the home row. Hence words typed using only the home-row keys have a low probability of being valid words.

I have run some tests on real data with emails and company names. I found a small percentage of invalid data.

For example, I have found the following invalid emails:
szdfsdf@dfsdf.sdfsd
gddggh@nfngn.fhgfh
bnmbnm@gffdg.grgtrg
cdcdc@vde.dcdcw
fdghfdgh@fghfg.dfghfd
ff@fgfh.gbvgg
adsdc@dscsdcdsc.cddcsc
adsdc@dscsdcdsc.cddcsc
rhhtyhtyhtyh@rthgrthr.frtrgtr
vbcvb@cvbcvb.cvbcv
sdf@adf.pldff
ggugiug@ghghg.ckcjf
hkl@domain.sdfgsdfg


and for the companies:
fghfg
ssdfgfsdgvgx
SDCDSC
dslck
dscsd
sdcfscds
fdvdfv
xcvxcvx
dfsdf
sqsqsq
xvxvxv
sdfsdf
sddssdsdsd
cqwxd
ANPCyT
gdfgdf
fffhk
drgdrg
drgdrg
hjhkjhkj
hbhbjhb
hbhbjhb
nttcw
sgdfg
BCBSF
asdgggs
ytyrtytttttttttttttttttttttt
qwdqw
dddqd
sdfsdfsdfsdf
gfjhfg

In fact, I had to change my regular expression to search for 5 consecutive consonants instead of 4. The reason is that some company names contain 4 consecutive consonants: for example, SNCF, and many German company names contain the abbreviation GmbH.

But when I use my smaller regular expression limited to the home-row keys, these terms do not match.</description>
		<content:encoded><![CDATA[<p>Yes, the physical proximity of the keys is what gave me this idea. But it&#8217;s also because on the French keyboard there is no vowel on the home row. Hence words typed using only the home-row keys have a low probability of being valid words.</p>
<p>I have run some tests on real data with emails and company names. I found a small percentage of invalid data.</p>
<p>For example, I have found the following invalid emails:<br />
<a href="mailto:szdfsdf@dfsdf.sdfsd">szdfsdf@dfsdf.sdfsd</a><br />
<a href="mailto:gddggh@nfngn.fhgfh">gddggh@nfngn.fhgfh</a><br />
<a href="mailto:bnmbnm@gffdg.grgtrg">bnmbnm@gffdg.grgtrg</a><br />
<a href="mailto:cdcdc@vde.dcdcw">cdcdc@vde.dcdcw</a><br />
<a href="mailto:fdghfdgh@fghfg.dfghfd">fdghfdgh@fghfg.dfghfd</a><br />
<a href="mailto:ff@fgfh.gbvgg">ff@fgfh.gbvgg</a><br />
<a href="mailto:adsdc@dscsdcdsc.cddcsc">adsdc@dscsdcdsc.cddcsc</a><br />
<a href="mailto:adsdc@dscsdcdsc.cddcsc">adsdc@dscsdcdsc.cddcsc</a><br />
<a href="mailto:rhhtyhtyhtyh@rthgrthr.frtrgtr">rhhtyhtyhtyh@rthgrthr.frtrgtr</a><br />
<a href="mailto:vbcvb@cvbcvb.cvbcv">vbcvb@cvbcvb.cvbcv</a><br />
<a href="mailto:sdf@adf.pldff">sdf@adf.pldff</a><br />
<a href="mailto:ggugiug@ghghg.ckcjf">ggugiug@ghghg.ckcjf</a><br />
<a href="mailto:hkl@domain.sdfgsdfg">hkl@domain.sdfgsdfg</a></p>
<p>and for the companies:<br />
fghfg<br />
ssdfgfsdgvgx<br />
SDCDSC<br />
dslck<br />
dscsd<br />
sdcfscds<br />
fdvdfv<br />
xcvxcvx<br />
dfsdf<br />
sqsqsq<br />
xvxvxv<br />
sdfsdf<br />
sddssdsdsd<br />
cqwxd<br />
ANPCyT<br />
gdfgdf<br />
fffhk<br />
drgdrg<br />
drgdrg<br />
hjhkjhkj<br />
hbhbjhb<br />
hbhbjhb<br />
nttcw<br />
sgdfg<br />
BCBSF<br />
asdgggs<br />
ytyrtytttttttttttttttttttttt<br />
qwdqw<br />
dddqd<br />
sdfsdfsdfsdf<br />
gfjhfg</p>
<p>In fact, I had to change my regular expression to search for 5 consecutive consonants instead of 4. The reason is that some company names contain 4 consecutive consonants: for example, SNCF, and many German company names contain the abbreviation GmbH.</p>
<p>But when I use my smaller regular expression limited to the home-row keys, these terms do not match.</p>
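<p>The five-consonant rule described above could be sketched in Python roughly as follows. This is only an illustration, not the expression actually used in the profiler: the names CONSONANT_RUN and looks_random are hypothetical, and treating every ASCII letter other than a, e, i, o, u as a consonant (so y counts as one) is an assumption.</p>

```python
import re

# Assumption: a "consonant" is any ASCII letter except a, e, i, o, u.
# The character class lists the consonant ranges b-d, f-h, j-n, p-t, v-z.
CONSONANT_RUN = re.compile(r"[b-df-hj-np-tv-z]{5,}", re.IGNORECASE)


def looks_random(text):
    """Return True when text contains 5 or more consecutive consonants."""
    return CONSONANT_RUN.search(text) is not None


# Two of the invalid company names and the two valid abbreviations quoted above.
for value in ["fghfg", "sdcfscds", "SNCF", "GmbH"]:
    print(value, looks_random(value))
```

<p>With this sketch, fghfg and sdcfscds are flagged, while SNCF and GmbH (4 consecutive consonants each) are not.</p>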
]]></content:encoded>
	</item>
	<item>
		<title>Comment on How to detect random text in a free text field? by bronius</title>
		<link>http://scorreiait.wordpress.com/2009/03/13/how-to-detect-random-text-in-a-free-text-field/#comment-144</link>
		<dc:creator>bronius</dc:creator>
		<pubDate>Wed, 01 Jul 2009 19:53:28 +0000</pubDate>
		<guid isPermaLink="false">http://scorreiait.wordpress.com/?p=171#comment-144</guid>
		<description>Ha!  Clever idea. The general concept is &quot;detecting a number of characters entered whose keys are in physical proximity to one another&quot;.

I like it.  How is it working out for you in practice?</description>
		<content:encoded><![CDATA[<p>Ha!  Clever idea. The general concept is &#8220;detecting a number of characters entered whose keys are in physical proximity to one another&#8221;.</p>
<p>I like it.  How is it working out for you in practice?</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on How to detect random text in a free text field? by Datenqualität in Textfeldern mit RegExp überprüfen - dijit</title>
		<link>http://scorreiait.wordpress.com/2009/03/13/how-to-detect-random-text-in-a-free-text-field/#comment-112</link>
		<dc:creator>Datenqualität in Textfeldern mit RegExp überprüfen - dijit</dc:creator>
		<pubDate>Fri, 13 Mar 2009 18:09:09 +0000</pubDate>
		<guid isPermaLink="false">http://scorreiait.wordpress.com/?p=171#comment-112</guid>
		<description>[...] approach to searching text input for deliberately wrong entries was published by my colleague Sebastiao on his blog. He makes use of a very interesting fact - that [...]</description>
		<content:encoded><![CDATA[<p>[...] approach to searching text input for deliberately wrong entries was published by my colleague Sebastiao on his blog. He makes use of a very interesting fact &#8211; that [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Talend Open Studio vs Pentaho Data Integration by scorreia</title>
		<link>http://scorreiait.wordpress.com/2008/03/29/talend-open-studio-vs-pentaho-data-integration/#comment-109</link>
		<dc:creator>scorreia</dc:creator>
		<pubDate>Sun, 01 Mar 2009 13:19:32 +0000</pubDate>
		<guid isPermaLink="false">http://scorreiait.wordpress.com/?p=30#comment-109</guid>
		<description>Hi,

Sorry for the late reply; I missed your last comment. I don&#039;t know of any document related to Cognos.</description>
		<content:encoded><![CDATA[<p>Hi,</p>
<p>Sorry for the late reply; I missed your last comment. I don&#8217;t know of any document related to Cognos.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Talend Open Studio vs Pentaho Data Integration by Pawan</title>
		<link>http://scorreiait.wordpress.com/2008/03/29/talend-open-studio-vs-pentaho-data-integration/#comment-102</link>
		<dc:creator>Pawan</dc:creator>
		<pubDate>Mon, 02 Feb 2009 05:21:07 +0000</pubDate>
		<guid isPermaLink="false">http://scorreiait.wordpress.com/?p=30#comment-102</guid>
		<description>Thanks a lot for this nice benchmarking doc. Do you have any benchmarking doc related to &quot;Cognos vs Jasper or Pentaho&quot;?</description>
		<content:encoded><![CDATA[<p>Thanks a lot for this nice benchmarking doc. Do you have any benchmarking doc related to &#8220;Cognos vs Jasper or Pentaho&#8221;?</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Talend Open Studio vs Pentaho Data Integration by scorreia</title>
		<link>http://scorreiait.wordpress.com/2008/03/29/talend-open-studio-vs-pentaho-data-integration/#comment-101</link>
		<dc:creator>scorreia</dc:creator>
		<pubDate>Fri, 30 Jan 2009 15:35:41 +0000</pubDate>
		<guid isPermaLink="false">http://scorreiait.wordpress.com/?p=30#comment-101</guid>
		<description>Hi,

I don&#039;t know of an English version of this paper. But you may find &lt;a href=&quot;http://marcrussel.files.wordpress.com/2008/10/etlbenchmarks_manappsc221008.pdf&quot; rel=&quot;nofollow&quot;&gt;this benchmark&lt;/a&gt; interesting. It is discussed on &lt;a href=&quot;http://marcrussel.wordpress.com/2008/10/29/etl-benchmark-by-manapps/&quot; rel=&quot;nofollow&quot;&gt;Marc Russel&#039;s blog&lt;/a&gt; and on &lt;a href=&quot;http://blog.gobansaor.com/2008/10/30/open-source-metrics/&quot; rel=&quot;nofollow&quot;&gt;Goban Saor&#039;s blog&lt;/a&gt;.</description>
		<content:encoded><![CDATA[<p>Hi,</p>
<p>I don&#8217;t know of an English version of this paper. But you may find <a href="http://marcrussel.files.wordpress.com/2008/10/etlbenchmarks_manappsc221008.pdf" rel="nofollow">this benchmark</a> interesting. It is discussed on <a href="http://marcrussel.wordpress.com/2008/10/29/etl-benchmark-by-manapps/" rel="nofollow">Marc Russel&#8217;s blog</a> and on <a href="http://blog.gobansaor.com/2008/10/30/open-source-metrics/" rel="nofollow">Goban Saor&#8217;s blog</a>.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Talend Open Studio vs Pentaho Data Integration by Pawan</title>
		<link>http://scorreiait.wordpress.com/2008/03/29/talend-open-studio-vs-pentaho-data-integration/#comment-100</link>
		<dc:creator>Pawan</dc:creator>
		<pubDate>Fri, 30 Jan 2009 07:19:54 +0000</pubDate>
		<guid isPermaLink="false">http://scorreiait.wordpress.com/?p=30#comment-100</guid>
		<description>Hi, I am really interested in reading this paper, but I don&#039;t know French. Does anybody have an English version of this paper?</description>
		<content:encoded><![CDATA[<p>Hi, I am really interested in reading this paper, but I don&#8217;t know French. Does anybody have an English version of this paper?</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Talend Open Profiler by Cécile Maindron</title>
		<link>http://scorreiait.wordpress.com/2008/06/20/talend-open-profiler/#comment-40</link>
		<dc:creator>Cécile Maindron</dc:creator>
		<pubDate>Fri, 01 Aug 2008 11:58:47 +0000</pubDate>
		<guid isPermaLink="false">http://scorreiait.wordpress.com/?p=38#comment-40</guid>
		<description>Do you want to learn more about Talend Open Profiler? Are you interested in seeing a live demo of the tool? Reserve your Webinar seat now at: https://www1.gotomeeting.com/register/472014645 (English version)
https://www1.gotomeeting.com/register/256033456 (French version)
https://www1.gotomeeting.com/register/553490277 (German version)

To view a full calendar of Talend Webinars:
http://www.talend.com/campaign/campaign.php?id=125&amp;src=PostCalendarWebinar</description>
		<content:encoded><![CDATA[<p>Do you want to learn more about Talend Open Profiler? Are you interested in seeing a live demo of the tool? Reserve your Webinar seat now at: <a href="https://www1.gotomeeting.com/register/472014645" rel="nofollow">https://www1.gotomeeting.com/register/472014645</a> (English version)<br />
<a href="https://www1.gotomeeting.com/register/256033456" rel="nofollow">https://www1.gotomeeting.com/register/256033456</a> (French version)<br />
<a href="https://www1.gotomeeting.com/register/553490277" rel="nofollow">https://www1.gotomeeting.com/register/553490277</a> (German version)</p>
<p>To view a full calendar of Talend Webinars:<br />
<a href="http://www.talend.com/campaign/campaign.php?id=125&amp;src=PostCalendarWebinar" rel="nofollow">http://www.talend.com/campaign/campaign.php?id=125&amp;src=PostCalendarWebinar</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Talend Open Profiler by Database Management &#187; Blog Archive &#187; Talend Open Profiler</title>
		<link>http://scorreiait.wordpress.com/2008/06/20/talend-open-profiler/#comment-38</link>
		<dc:creator>Database Management &#187; Blog Archive &#187; Talend Open Profiler</dc:creator>
		<pubDate>Fri, 20 Jun 2008 22:31:12 +0000</pubDate>
		<guid isPermaLink="false">http://scorreiait.wordpress.com/?p=38#comment-38</guid>
		<description>[...] creativelinkjobs.com wrote an interesting post today on Here&#8217;s a quick excerpt I have been working on this project for a few months now and I am pleased to announce the first public release candidate of Talend Open Profiler. This tool helps you to browse, explore your databases and analyze your data. For each column that you want to analyze, you have several indicators at your disposal: row counts, null counts, duplicate counts… field length, frequency table, summary statistics… There are also indicators based on regular expressions. These indicators help you to discov [...]</description>
		<content:encoded><![CDATA[<p>[...] creativelinkjobs.com wrote an interesting post today on Here&#8217;s a quick excerpt I have been working on this project for a few months now and I am pleased to announce the first public release candidate of Talend Open Profiler. This tool helps you to browse, explore your databases and analyze your data. For each column that you want to analyze, you have several indicators at your disposal: row counts, null counts, duplicate counts… field length, frequency table, summary statistics… There are also indicators based on regular expressions. These indicators help you to discov [...]</p>
]]></content:encoded>
	</item>
</channel>
</rss>
