Monday, 11 May 2015

PDI - Multiway Merge Join


Dear Friends,

Let's have a look at how the Multiway Merge Join step works.

Step 1: Drag the Table input steps onto the workspace and give the DB credentials.

Step 2: For multiway merging, the output from each DB input must be sorted on the join keys; this is mandatory for merging in PDI. Otherwise, rows may be matched incorrectly or missed.

Step 3: Then drag the Multiway Merge Join step, found under the Joins tab on the left side.

Step 4: Double-click the Multiway Merge Join icon and give the input steps. Then press the Select Keys button to specify the join keys between the steps.

Step 5: To check the process, press Fn+F10. A preview will be shown.


Find the example screen above

Step 6: Change the join type as per the requirement.
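To make the idea behind the steps concrete, here is a minimal Python sketch of an inner join over several inputs that are already sorted on the join key. It is only an illustration of the technique, not PDI's actual implementation; the function names (`multiway_merge_join`, `_groups`) and the sample rows are hypothetical, and only the inner join type is covered.

```python
from itertools import groupby, product
from operator import itemgetter


def _groups(rows, key):
    """Yield (key value, list of rows sharing that value) from sorted rows."""
    for value, group in groupby(rows, key=itemgetter(key)):
        yield value, list(group)


def multiway_merge_join(inputs, key):
    """Inner-join any number of row lists that are ALREADY sorted on `key`."""
    if not inputs:
        return
    streams = [_groups(rows, key) for rows in inputs]
    heads = [next(stream, None) for stream in streams]
    # Stop as soon as any input is exhausted: no further matches are possible.
    while all(head is not None for head in heads):
        top = max(head[0] for head in heads)
        if all(head[0] == top for head in heads):
            # Every input has this key: emit all row combinations.
            for combo in product(*(head[1] for head in heads)):
                merged = {}
                for row in combo:
                    merged.update(row)
                yield merged
            heads = [next(stream, None) for stream in streams]
        else:
            # Advance only the inputs that are behind the largest key.
            heads = [
                head if head[0] == top else next(stream, None)
                for head, stream in zip(heads, streams)
            ]


# Hypothetical sample data, each list sorted on "id".
customers = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}, {"id": 3, "name": "Cy"}]
orders = [{"id": 1, "order": "A-10"}, {"id": 3, "order": "A-11"}]
payments = [{"id": 1, "paid": True}, {"id": 2, "paid": False}, {"id": 3, "paid": True}]

joined = list(multiway_merge_join([customers, orders, payments], "id"))
```

Each input is read only once and only forwards, which is why the inputs have to arrive sorted on the join key.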

Thanks,


Dilip

PDI Tutorials - Merging Two Tables from different servers


Let's have a look at the join function provided by PDI.

Step 1: Drag the Table input steps onto the workspace and give the DB credentials.

Step 2: For merging, the output from each DB input must be sorted on the join keys; this is mandatory for merging in PDI. Otherwise, rows may be matched incorrectly or missed.

Step 3: Then drag the Merge Join step, found under the Joins tab on the left side.

Step 4: Double-click the Merge Join icon and give the first step and second step respectively. Then press the Get Fields button to get all the fields from each step.

Step 5: Then remove the non-key fields from the list and click OK.

Step 6: To check the process, press Fn+F10. A preview will be shown.


Find the example screen above.
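To see why Step 2 insists on sorted input, here is a small Python sketch of a two-input merge join. It is an illustration only, not PDI's code: the function name `merge_join`, the sample rows and the assumption that each key appears once per input are all mine.

```python
def merge_join(left, right, key):
    """Inner-join two row lists in one forward pass.

    Assumes both lists are sorted on `key` and that keys are unique per side.
    """
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][key], right[j][key]
        if left_key == right_key:
            out.append({**left[i], **right[j]})
            i += 1
            j += 1
        elif left_key < right_key:
            i += 1  # left row has no partner yet, move on
        else:
            j += 1  # right row has no partner yet, move on
    return out


# Hypothetical rows from two different servers.
server_a = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}, {"id": 3, "name": "Cy"}]
server_b_sorted = [{"id": 1, "city": "Pune"}, {"id": 2, "city": "Delhi"}, {"id": 3, "city": "Goa"}]
server_b_unsorted = [{"id": 3, "city": "Goa"}, {"id": 1, "city": "Pune"}, {"id": 2, "city": "Delhi"}]

sorted_result = merge_join(server_a, server_b_sorted, "id")      # all 3 rows match
unsorted_result = merge_join(server_a, server_b_unsorted, "id")  # only id 3 is found
```

With the unsorted input the join silently skips ids 1 and 2, because the pass never goes backwards. That is the kind of wrong data the sort requirement protects against.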

Thanks,
Dilip

Sunday, 10 May 2015

Pentaho Data Integration Tutorial.

    Hi Guys,

In this tutorial, I am going to explain the features of Pentaho Data Integration (a.k.a. Kettle).

What is ETL:

In computing, Extract, Transform and Load (ETL) refers to a process in database usage, and especially in data warehousing, that consists of three steps:

  • Extract is the process of reading data from a database.
  • Transform is the process of converting the extracted data from its previous form into the form it needs to be in so that it can be placed into another database. Transformation occurs by using rules or lookup tables or by combining the data with other data.
  • Load is the process of writing the data into the target database.


                ETL is used to migrate data from one database to another, to form data marts and data warehouses and also to convert databases from one format or type to another.
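The three ETL steps described above can be sketched in a few lines of Python. This is a minimal illustration using in-memory SQLite databases, not anything PDI generates; the table names (`customers`, `dim_customer`), the column names and the lookup values are all hypothetical.

```python
import sqlite3

# Two in-memory databases stand in for the source and the target.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")

source.execute("CREATE TABLE customers (id INTEGER, name TEXT, country_code TEXT)")
source.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, " ann ", "IN"), (2, "BOB", "US")],
)

# Extract: read the data from the source database.
rows = source.execute(
    "SELECT id, name, country_code FROM customers ORDER BY id"
).fetchall()

# Transform: apply a cleaning rule and a lookup table.
country_lookup = {"IN": "India", "US": "United States"}
transformed = [
    (cid, name.strip().title(), country_lookup[code])
    for cid, name, code in rows
]

# Load: write the converted rows into the target database.
target.execute("CREATE TABLE dim_customer (id INTEGER, name TEXT, country TEXT)")
target.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)", transformed)
target.commit()

loaded = target.execute(
    "SELECT id, name, country FROM dim_customer ORDER BY id"
).fetchall()
```

In PDI, each of these three parts would be a step on the workspace instead of hand-written code.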

PDI provides a visual tool that eliminates much of the coding and complexity. Just drag and drop the required inputs and outputs, and transform the data as per the business logic.

PDI can also be used for other purposes:

  • Migrating data between applications or databases
  • Exporting data from databases to flat files
  • Loading data massively into databases
  • Data cleansing
  • Integrating applications

PDI - Loading Multiple files based on reg exp...

Let's have a look at a PDI transformation that loads multiple files at a time using a regular expression.

Step 1: Drag the Get File Names step from the Input category.



Step 2: Double-click on it and provide the directory from which to pick the files.



Step 3: Then, in the Regular Expression field, give the file prefix followed by .*\. and the file postfix.

For example: File_.*\.txt


Step 4: Then press the Add button and click Show filenames. The file names in that directory that match the file prefix (File_) and file postfix (txt) will be listed.


Step 5: Now drag the Table output step and load the files.
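If you want to check what the example expression from Step 3 selects before running the transformation, a quick Python sketch like the one below helps. The file names are made up, and I am assuming the expression has to match the whole file name, which is why `fullmatch` is used.

```python
import re

# The expression from Step 3: prefix "File_", then anything, then ".txt".
# ".*" means "any characters", and "\." is a literal dot.
pattern = re.compile(r"File_.*\.txt")

# Hypothetical directory listing.
names = [
    "File_2015_05.txt",
    "File_A.txt",
    "File_A.csv",
    "Report_1.txt",
    "File_B.txt.bak",
]

# Keep only names that match the expression from start to end.
matched = [name for name in names if pattern.fullmatch(name)]
```

Here `matched` contains only `File_2015_05.txt` and `File_A.txt`; the other names fail on either the prefix or the postfix.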


Thanks,
Dilip

Thursday, 7 May 2015

PDI - Loading CSV File to Table




Hi guys, let us follow these simple steps to load a CSV file into a table, and vice versa.

Step 1: Drag the Text file input step to the workspace.


Step 2: Double-click the file input and give the proper values on each tab.



Step 3: After giving the values, click Get Fields to get all the fields from the file.


Step 4: Then drag the Table output step and link the input and output.

 
Step 5: Give all the DB credentials in the connection box and give the field names.

 

 
Step 6: Then run the transformation; the data will be loaded into the table.
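For comparison, the same CSV-to-table flow can be sketched in plain Python. This is only a rough equivalent under my own assumptions, not what PDI runs internally: the CSV content, the table name `people` and the in-memory SQLite database are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content; in practice this would be a file on disk.
csv_text = "id,name,city\n1,Ann,Pune\n2,Bob,Delhi\n"

# Read the file; the header row gives the field names (like Get Fields).
reader = csv.DictReader(io.StringIO(csv_text))
fields = reader.fieldnames
rows = list(reader)

# Connect to the target database and create the table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER, name TEXT, city TEXT)")

# Map CSV fields to table columns and load every row.
conn.executemany(
    "INSERT INTO people (id, name, city) VALUES (:id, :name, :city)",
    rows,
)
conn.commit()

loaded = conn.execute("SELECT id, name, city FROM people ORDER BY id").fetchall()
```

The field-to-column mapping in the `INSERT` statement plays the role of the field names you give in the Table output step.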

 

Thanks.
Queries are highly appreciated.
Dilip Yadav S


Monday, 8 September 2014

Intro

Hi Guys,
 
I am Dilip Yadav S. I have been working as a BI developer on the Pentaho BI Suite for around 2.7 years and have a wide range of knowledge of Business Intelligence tools.

Pentaho :

Currently, Pentaho is the world's best open source Business Intelligence tool. Recently they have updated the look and feel, improved the performance, and merged the admin console and BI Server into a single GUI, which is easier for end users as well as developers.


When you look at the graph below, most clients and companies are using Pentaho as their BI tool because of its extraordinary features and its frequent updates with the latest techniques.