Oracle BI, Big Data and ETL: Oracle Endeca Information Discovery

Following to the overview on Endeca today we will see how to create Clover ETL graph and populating data which will be used by MDEX engine for reporting (Studio)

Endeca Information discovery helps organization to answer quickly on relevant data of both structured and Un structured it helps to search and discover and analysis

Information is loaded from multiple data source systems and stored in a faceted data model that dynamically supports changing data . Information discovery enables an iterative approach

Integration features a new ETL tool, The integrator (Clover ETL) that lets you extract source records from a variety of source types flat files to databases.

The oracle Endeca Server provides support for guided exploration search and analysis on structured,semi - structured/unstructured from inside/external sources

In structured data can be easily and directly loaded into a faceted model

Each row becomes a records

Each column becomes and attribute

Unstructured elements can be stored their own records for side by side analysis

Studio is highly interactive ,component-based environment for building analytic application it is built on web-based technology that enable building analytic applications delivered through use of web browsers

Integrator has below components

Graph : A graph is a pipeline of components that processes the data. . The simplest graph has one Reader component to read in the source data and one of the integrator connectors to write the data to the Oracle Endeca Server

Edge : The line that connects components ports, carries data between components

Transformer / Joins : Used to transform the data coming from source data system , and joiner is used to join data between two tables (Dim - Fact)

Once Endeca Information discovery installed you can see below components in Start Menu

Start Endeca Server --> Integrator

LoadData.grf(Graph)

In the CloverETL perspective, there is a Navigator pane on the left side of the window. In this pane, you can expand the project folder. After that, you will be presented with the folder structure. There are sub-folders for:

Purpose	Standard folder	Standard parameter	Parameter usage
all connections	conn	CONN_DIR	${CONN_DIR}
input data	data-in	DATAIN_DIR	${DATAIN_DIR}
output data	data-out	DATAOUT_DIR	${DATAOUT_DIR}
temporary data	data-tmp	DATATMP_DIR	${DATATMP_DIR}
graphs	graph	GRAPH_DIR	${GRAPH_DIR}
lookup tables	lookup	LOOKUP_DIR	${LOOKUP_DIR}
metadata	meta	META_DIR	${META_DIR}
sequences	seq	SEQ_DIR	${SEQ_DIR}
transformation definitions (both source files and classes)	trans	TRANS_DIR	${TRANS_DIR}

Creating Project:

Start Endeca Server from start menu and integrator -->File -->New Project

We can Edit the CloverETL project sub directories names while creating the project

You can look at the workspace.prm file by clicking this item in the Navigator pane, by right-clicking and choosing Open With → Text Editor from the context menu.

Default project parameters are stored in workspace.prm file located in project root

A new graph automatically links the workspace.prm

Metadata describes record structure and format
Where structure defines what fields form a record
Format defines how to serialize a record from a data set

New database connection

File --> New Project--> Clover ETL -- > Create Connection(Connection.cfg)

Data Store creation :
Copy the data store from Quick Start to our new project and add below entries

# Configuration parameters for running Endeca Data Store
ENDECA_SERVER_HOST=localhost
ENDECA_SERVER_PORT=7770
DATA_STORE_NAME=quickstart

Where quick start would be our project name in below example it is 'Project_01'

2) Copy the InitDatastore graph from QuickStart to newly created project/graph folder and run it once(only Create Data Store last one) remove others

Metadata import :

File --> New Project--> Clover ETL--> Metadata (Extract from database) and navigate to your project related folder under workspace view

After importing the metadata (samp_offices_d) it will create a file with an extension of .fmt you can change the data type by editing .fmt files as shown below

Creating Graph:

Right click on Graph folder --->New-->ETL Graph

Drag the palettes to the surface area and connect them by using EDGE palette

Once the palette are connected using EDGE now we have to assign METADATA(DB connection) created in above steps as shown below

Once the Metadata is created you can edit the properties of the table column by changing the data type/add any new fields for calculation purpose and it will populate on MDEX engine

Click on finish next on Edge(Data Input) we need to assign db connection and query URL,

Right click on DBInput Table -->Rename -->Products
DBInput Table --> Rename --> Revenue

Now we have to join Revenue and Products tables based on PROD_KEY by using ExtHashJoin and select the join Type

Now we have to load the data coming from both revenue & products to a single table ?

No, MDEX will do in-memory calculations and it will store the data in tmp files(RAM) not on tables in order to achieve it transform the useful columns from SAMP_PRODUCTS_D & SAMP_REVENUE_F to a new table 'WC_REVENUE_PRODUCT' as shown below

Make sure Transformation is valid select the METADATA for the EDGE from ExtHashJoin to Bulk Add/Replace Records

Edit the Bulk Add/Replace records palette and select the Spec Attribute -->which should be unique and save the Graph and right click Run AS --> ETL Graph if everything is correct the Graph execution should be successful

Reports creation in another post

Oracle BI, Big Data and ETL

Pages

Pages - Menu

Thursday, October 25, 2012

Oracle Endeca Information Discovery - Clover ETL

No comments:

Post a Comment