Pages

Pages - Menu

Thursday, October 25, 2012

Oracle Endeca Information Discovery - Clover ETL

Following to the overview on Endeca  today we will see how to create Clover ETL graph and populating data which will be used by MDEX engine for reporting (Studio)

Endeca Information discovery helps organization to answer quickly on relevant data of both structured and Un structured it helps to search and discover and analysis

Information is loaded from multiple data source systems and stored in a faceted data model that dynamically supports changing data .  Information discovery enables an iterative approach



Integration features a new ETL tool, The integrator (Clover ETL) that lets you extract source records from a variety of source types flat files to databases.
The oracle Endeca Server provides support for guided exploration search and analysis on structured,semi - structured/unstructured from inside/external sources 
In structured data  can be easily and directly loaded into a faceted model
Each row becomes a records
Each column becomes and attribute
Unstructured elements can be stored their own records for side by side analysis
Studio is highly interactive ,component-based environment for building analytic application it is built on web-based technology that enable building analytic applications delivered through use of web browsers
Integrator has below components 
Graph :   A graph is a pipeline of components that processes the data. . The simplest graph has one Reader component to read in the source data and one of the integrator connectors to write the data to the Oracle Endeca Server
Edge : The line that connects components ports, carries data between components
Transformer / Joins : Used to transform the data coming from source data system , and joiner is used to join data between two tables (Dim - Fact)
Once Endeca Information discovery installed you can see below components in Start Menu

Start Endeca Server --> Integrator

LoadData.grf(Graph)



In the CloverETL perspective, there is a Navigator pane on the left side of the window. In this pane, you can expand the project folder. After that, you will be presented with the folder structure. There are sub-folders for:


Purpose
Standard folder
Standard parameter
Parameter usage
all connections
conn
CONN_DIR
${CONN_DIR}
input data
data-in
DATAIN_DIR
${DATAIN_DIR}
output data
data-out
DATAOUT_DIR
${DATAOUT_DIR}
temporary data
data-tmp
DATATMP_DIR
${DATATMP_DIR}
graphs
graph
GRAPH_DIR
${GRAPH_DIR}
lookup tables
lookup
LOOKUP_DIR
${LOOKUP_DIR}
metadata
meta
META_DIR
${META_DIR}
sequences
seq
SEQ_DIR
${SEQ_DIR}
transformation definitions (both source files and classes)
trans
TRANS_DIR
${TRANS_DIR}

 Creating Project:

Start Endeca Server from start menu and integrator -->File -->New Project


We can Edit the CloverETL project sub directories names while creating the project

You can look at the workspace.prm file by clicking this item in the Navigator pane, by right-clicking and choosing Open WithText Editor from the context menu.

Default project parameters are stored in workspace.prm file located  in project root 
A new graph automatically links the workspace.prm

Metadata  describes record structure and format
Where structure defines what fields form a record
Format defines how to serialize a record from a data set

New database connection 

File --> New Project--> Clover ETL -- > Create Connection(Connection.cfg)
 


Data Store creation :
Copy the data store from Quick Start to our new project and add below entries
# Configuration parameters for running Endeca Data Store
ENDECA_SERVER_HOST=localhost
ENDECA_SERVER_PORT=7770
DATA_STORE_NAME=quickstart
 Where quick start would be our project name in below example it is  'Project_01'

2) Copy the InitDatastore graph from QuickStart to newly created project/graph folder and run it once(only Create Data Store last one) remove others


Metadata import :
File --> New Project--> Clover ETL--> Metadata (Extract from database) and navigate to your project related folder under workspace view

After importing the metadata (samp_offices_d) it will create a file with an extension of .fmt you can change the data type by editing .fmt files as shown below
Creating Graph:

Right click on Graph folder --->New-->ETL Graph

Drag the palettes  to the surface area and connect them by using EDGE palette



  Once the palette are connected using EDGE now we have to assign METADATA(DB connection) created in above steps as shown below



Once the Metadata is created you can edit the properties of the table column by changing the data type/add any new fields for calculation purpose and it will populate on MDEX engine 




Click on finish next on Edge(Data Input) we need to assign db connection and query URL,

Right click on DBInput Table -->Rename -->Products
DBInput Table --> Rename --> Revenue



Now we have to join Revenue and Products tables based on PROD_KEY by using ExtHashJoin and select the join Type





Now we have to load the data coming from both revenue & products to a single table ?

No, MDEX  will do in-memory calculations and it will store the data in tmp files(RAM) not on tables in order to achieve it transform the useful columns from SAMP_PRODUCTS_D & SAMP_REVENUE_F to a new table 'WC_REVENUE_PRODUCT'  as shown below



Make sure Transformation is valid   select the METADATA for the EDGE from ExtHashJoin to Bulk Add/Replace Records



Edit the Bulk Add/Replace records palette and select the Spec Attribute -->which should be unique and save the Graph and right click Run AS --> ETL Graph if everything is correct the Graph execution should be successful





 Reports creation in another post





No comments:

Post a Comment

.