Introduction to quantum computer technology

24 04 2008

Here is an interesting article: An Overview of Quantum Computing for Technology Managers. As the title says, it is aimed at technology managers. This means that it contains no formulas and can be read by anyone.

In 20 pages, it gives an overview of the currently very active field of quantum computing. It briefly presents the two most important quantum algorithms: Grover’s algorithm (database search) and Shor’s algorithm (integer factorization). Simulation, cryptography and security are also discussed.

I did not read the whole article, but it does not seem to discuss concrete applications. One could think that we are still far from building a quantum computer. But a Canadian company seems to be much closer than the scientific community would have thought. See here and here for some comments about this company. Did they build a real quantum computer? Some people are skeptical…




Talend Open Studio vs Pentaho Data Integration

29 03 2008

You don’t know which open source ETL to choose? Have a look at this white paper (in French).

It contains some interesting benchmarks that can help you choose the best ETL for your needs.




Talend’s contribution in scientific domains

28 02 2008

As open source software, Talend Open Studio (now in version 2.3.1) can help you integrate your scientific data. Here is an example at the University for Health Sciences, Medical Informatics and Technology (UMIT) in Austria.

Moreover, although the article does not mention it, Talend Open Studio can convert your data to the Weka format if you need to do some data mining. The components are named tFileInputARFF and tFileOutputARFF.

They allow you to import data from a file in ARFF (Attribute-Relation File Format) into any supported storage system, or to export data from such a system into an ARFF file. The documentation for these components will come soon. If you need help, don’t hesitate to ask questions on the forum or to look at the documentation for Talend Open Studio.
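
For readers who have never seen an ARFF file, here is a minimal Java sketch of what such a file contains: a relation name, a list of attributes, then the data rows. This is only an illustration of the file layout, not the way tFileOutputARFF is implemented. The class ArffSketch, its method toArff and the sample relation are hypothetical names, and only numeric attributes are handled.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, for illustration only: builds the text of a tiny ARFF
// file with numeric attributes. It is not related to the Talend components.
final class ArffSketch {

    static String toArff(String relation, List<String> numericAttributes, List<double[]> rows) {
        StringBuilder sb = new StringBuilder();
        // Header: the relation name, then one declaration per attribute.
        sb.append("@relation ").append(relation).append("\n\n");
        for (String name : numericAttributes) {
            sb.append("@attribute ").append(name).append(" numeric\n");
        }
        // Data section: one comma-separated line per instance.
        sb.append("\n@data\n");
        for (double[] row : rows) {
            for (int i = 0; i < row.length; i++) {
                if (i > 0) {
                    sb.append(',');
                }
                sb.append(row[i]);
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample values invented for the example.
        List<String> attributes = Arrays.asList("temperature", "humidity");
        List<double[]> rows = Collections.singletonList(new double[] {21.5, 60.0});
        System.out.print(toArff("weather", attributes, rows));
    }
}
```

Real ARFF files can also declare nominal, string and date attributes, which this sketch leaves out.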




Java program as a Windows service

11 02 2008

A few days ago, I used JSL to turn a small Java program into a Windows service. Usage is really simple: define a class with static methods for starting and stopping the service. An example is the class TelnetEcho.java given with the sources of the library.

Then add “jsl.jar” to the classpath, adapt the “jsl.ini” file to your needs and run “jsl.exe -debug” to test that the service is configured correctly. This launches the service. Press “Ctrl-C” to stop the service cleanly.

If everything is OK in this debug test, simply run “jsl.exe -install” to install the service. The name of the service is set in the jsl.ini file.
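
To give an idea of what such a class can look like, here is a minimal sketch. It is not the TelnetEcho example from the library. The class name HeartbeatService and the method names main and stop are my own assumptions; the actual class and method names are the ones you declare in jsl.ini.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical service class: the class and method names are illustrative
// and must match whatever is configured in jsl.ini.
final class HeartbeatService {

    // Released by stop() so that the start method can return.
    private static final CountDownLatch STOP_SIGNAL = new CountDownLatch(1);

    // Start method: does its work, then blocks until stop() is called.
    public static void main(String[] args) {
        System.out.println("service started");
        try {
            STOP_SIGNAL.await();
        } catch (InterruptedException e) {
            // Restore the interrupt flag and fall through to shutdown.
            Thread.currentThread().interrupt();
        }
        System.out.println("service stopped");
    }

    // Stop method: asks the start method to finish.
    public static void stop() {
        STOP_SIGNAL.countDown();
    }

    // True once stop() has been called.
    public static boolean isStopRequested() {
        return STOP_SIGNAL.getCount() == 0;
    }
}
```

The latch keeps the start method alive without a busy loop, and calling the stop method from another thread lets it return normally.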

The service is then removed (if needed) with “jsl.exe -remove”.

Thanks Michael for this library.




What is Master Data?

7 02 2008

In this podcast, Malcolm Chisholm gives an interesting categorization of data.

In brief, there are six types of data:

  1. Meta Data (e.g. table names…)
  2. Reference Data (e.g. code tables…)
  3. Enterprise Structure Data (hierarchical enterprise organization of data, e.g. product line…)
  4. Transaction Structure Data (e.g. customer, product data…)
  5. Transaction Activity Data (transactional event data, e.g. orders…)
  6. Transaction Audit Data (state changes in transactional event data, e.g. logs…)

Master data is then the aggregation of reference data, enterprise structure data and transaction structure data. This means that there is more than one category of master data. Thus any good MDM tool must take the meaning of the different types of master data into account.
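
As a small illustration, the six categories and the master data rule above could be written down as a Java enum. This is only my own sketch of the classification; the names DataCategory and isMasterData are hypothetical and do not come from the podcast or from any MDM tool.

```java
// Hypothetical enum, for illustration only: the six categories listed above,
// each flagged according to whether it counts as master data.
enum DataCategory {
    META(false),
    REFERENCE(true),
    ENTERPRISE_STRUCTURE(true),
    TRANSACTION_STRUCTURE(true),
    TRANSACTION_ACTIVITY(false),
    TRANSACTION_AUDIT(false);

    private final boolean masterData;

    DataCategory(boolean masterData) {
        this.masterData = masterData;
    }

    // True for the three categories whose aggregation forms master data.
    boolean isMasterData() {
        return masterData;
    }
}
```

A tool could then treat, say, DataCategory.REFERENCE and DataCategory.TRANSACTION_STRUCTURE differently even though both are master data.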




Analyze your Java code

1 02 2008



New look for Talendforge

29 01 2008

The design of Talendforge has changed. Have a look.

And by the way: Happy New Year.

voeux_2008.jpg




Necessary metadata

28 01 2008

In this note, Bill Inmon complains about the endless task of documenting all possible metadata. Let’s think a bit about this.

If we define metadata as “data about data”, it is clear that documenting all metadata is an endless, impossible task. Because metadata is itself data, metadata can describe data but can also describe other metadata. Hence the endless loop of documenting the data used for documenting the data…

But not all metadata changes that much. As an example, see the CWM specification, which already describes most of the metadata needed in the data management domain.
Of course, CWM is not exhaustive and cannot be. But maybe CWM could play the role of the “necessary metadata” sought by Bill Inmon, at least in the domain of data management.

Then data like averages, maxima and other computed values are not really metadata as I understand the term. They do not describe the data; rather, they are computed from the data. They depend on instances and not on data classes. They are complementary information about the data, and I don’t think they should be called metadata. Otherwise everything is metadata, since everything is data about some other data. Then we could ask ourselves “what is data?”

Finally, it seems obvious that documenting all metadata is impossible. It’s like writing the perfect program: even for a simple program that takes an input and writes it to the output, it is possible to write pages of code to handle all the use cases we can think of. Even then, the program will not be able to handle some unexpected cases.




Nice desktop

24 11 2007

Amazing!




On the complexity of Computer Science

24 11 2007

Who said that physics is more complex than computer science? Have a look at how a simple physics equation, E=mc^2, is written using the Common Warehouse Metamodel. :-)

emc2.png