Extract, Transform and Load (ETL) is a data warehousing process that extracts data from homogeneous or heterogeneous data sources, transforms it into a format or structure suited to querying and analysis, and loads it into a final target, generally a database, operational data store, data mart, or data warehouse.
The three phases generally run in parallel, because data extraction takes time. While data is still being pulled, a transformation process works on the data already received and prepares it for loading. As soon as some data is ready for the target, loading starts without waiting for the earlier phases to complete. ETL systems commonly integrate data from multiple applications, typically developed and supported by different vendors or hosted on separate computer hardware.
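The overlapping execution described above can be sketched with one thread per phase and a queue between each pair of phases. This is a minimal illustration, not how any particular ETL tool works: the function names, the row fields (name, amount) and the queue size are all assumptions made for the example.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the stream (illustrative convention)


def extract(source_rows, out_q):
    # Push rows downstream one at a time, as they become available.
    for row in source_rows:
        out_q.put(row)
    out_q.put(SENTINEL)


def transform(in_q, out_q):
    # Start transforming as soon as the first extracted row arrives.
    while True:
        row = in_q.get()
        if row is SENTINEL:
            out_q.put(SENTINEL)
            break
        out_q.put({
            "name": row["name"].strip().title(),
            "amount": float(row["amount"]),
        })


def load(in_q, target):
    # Start loading as soon as the first transformed row arrives.
    while True:
        row = in_q.get()
        if row is SENTINEL:
            break
        target.append(row)  # a list stands in for the real target


def run_pipeline(source_rows):
    extracted = queue.Queue(maxsize=100)    # bounded, so a slow phase
    transformed = queue.Queue(maxsize=100)  # slows down the one before it
    target = []
    threads = [
        threading.Thread(target=extract, args=(source_rows, extracted)),
        threading.Thread(target=transform, args=(extracted, transformed)),
        threading.Thread(target=load, args=(transformed, target)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return target
```

Because each phase only waits for the next row rather than for the whole previous phase, loading can begin while extraction is still in progress.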
Business intelligence is a set of techniques and tools for transforming raw data into meaningful and useful information for business analysis. Business intelligence technologies can handle large amounts of unstructured data to help identify, develop and otherwise create new strategic business opportunities. The goal of business intelligence is to allow easy interpretation of these large volumes of data. Identifying new opportunities and implementing an effective strategy based on insights can provide businesses with a competitive market advantage and long-term stability.
Extract, Transform and Load (ETL) Software
Extract, Transform and Load (ETL) software is available as both free software and proprietary software.
ETL processes can be set up using almost any programming language, but building them from scratch can become complex, so companies often buy ETL tools to help create them.
A good free ETL tool must be able to communicate with the many different relational databases and read the various file formats used throughout an organization. Free ETL tools have started to migrate into Enterprise Application Integration, or even Enterprise Service Bus, systems that now cover much more than just the extraction, transformation, and loading of data. Many free ETL tools now also offer data profiling, data quality, and metadata capabilities.
The first part of an ETL process involves extracting the data from the source systems. Most data warehousing projects consolidate data from different source systems, and each system may use a different data organization and/or format. Common data source formats are relational databases, XML and flat files, but sources may also include non-relational database structures such as Information Management System (IMS), other data structures such as Virtual Storage Access Method (VSAM) or Indexed Sequential Access Method (ISAM), or even outside sources reached through web spidering or screen scraping.
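As a rough sketch of consolidating the three common source formats named above, the following reads a flat file, an XML document and a relational table into one list of records using only the Python standard library. The customer data, the element and table names, and the helper function names are invented for illustration; an in-memory SQLite database stands in for a real relational source.

```python
import csv
import io
import sqlite3
import xml.etree.ElementTree as ET


def extract_csv(text):
    # Flat file: the header row supplies the field names.
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def extract_xml(text, record_tag):
    # XML: each <record_tag> element becomes one record.
    root = ET.fromstring(text)
    return [{child.tag: child.text for child in rec}
            for rec in root.findall(record_tag)]


def extract_sql(conn, query):
    # Relational database: the column names supply the field names.
    conn.row_factory = sqlite3.Row
    return [dict(row) for row in conn.execute(query)]


# Hypothetical sources, built inline so the sketch is self-contained.
csv_source = "id,name\n1,Alice\n"
xml_source = ("<customers><customer><id>2</id><name>Bob</name>"
              "</customer></customers>")
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, name TEXT)")
conn.execute("INSERT INTO customer VALUES (3, 'Carol')")

rows = (extract_csv(csv_source)
        + extract_xml(xml_source, "customer")
        + extract_sql(conn, "SELECT id, name FROM customer"))
```

Note that the CSV and XML extractors return every value as a string while the database returns a typed integer for id. Reconciling such differences is left to the transformation stage.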
The data transformation stage applies a series of rules or functions to the extracted data to derive the data for loading into the end target. An important function of data transformation is data cleansing, which aims to pass only proper data to the target.
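A cleansing step of the kind described above might look like the sketch below, which passes valid rows on and sets the rest aside for review. The validation rules (a non-empty name and a finite, non-negative amount) and the field names are assumptions chosen for the example; real rules depend on the target schema.

```python
import math


def cleanse(rows):
    """Split rows into (clean, rejected) using simple validation rules."""
    clean, rejected = [], []
    for row in rows:
        name = (row.get("name") or "").strip()
        try:
            amount = float(row.get("amount"))
        except (TypeError, ValueError):
            # Missing or non-numeric amount.
            rejected.append(row)
            continue
        if not name or not math.isfinite(amount) or amount < 0:
            rejected.append(row)
            continue
        # Only normalized, validated data is passed on to the target.
        clean.append({"name": name, "amount": amount})
    return clean, rejected
```

Keeping the rejected rows rather than silently dropping them makes it possible to report data quality problems back to the source system.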
The load phase loads the data into the end target, which may be a simple delimited flat file or a data warehouse. Some data warehouses overwrite existing information with cumulative information; extracted data is frequently updated on a daily, weekly, or monthly basis. Other data warehouses (or even other parts of the same data warehouse) add new data in a historical form at regular intervals, for example, hourly.
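The two loading styles described above, overwriting cumulative values and appending history, can be contrasted in a small sketch. It uses an in-memory SQLite database as a stand-in for the warehouse; the table names, columns and function names are hypothetical.

```python
import sqlite3


def make_target():
    # In-memory database standing in for the real target.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales_total (product TEXT PRIMARY KEY, total REAL)")
    conn.execute(
        "CREATE TABLE sales_history "
        "(product TEXT, amount REAL, loaded_at TEXT)")
    return conn


def load_overwrite(conn, rows):
    # Cumulative load: one row per product, the latest value replaces the old.
    conn.executemany(
        "INSERT OR REPLACE INTO sales_total (product, total) VALUES (?, ?)",
        rows)
    conn.commit()


def load_append(conn, rows, loaded_at):
    # Historical load: every run adds new rows stamped with the load time.
    conn.executemany(
        "INSERT INTO sales_history (product, amount, loaded_at) "
        "VALUES (?, ?, ?)",
        [(product, amount, loaded_at) for product, amount in rows])
    conn.commit()
```

Running load_overwrite twice for the same product leaves a single row with the newest total, while running load_append twice leaves two rows, one per load.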
Some of the activities that top free ETL tools can perform are selecting only certain columns to load, translating coded values, encoding free-form values, deriving a new calculated value, sorting (ordering the data based on a list of columns to improve search performance), joining data from multiple sources, generating surrogate-key values, transposing or pivoting, and splitting a column into multiple columns.
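Several of the activities listed above can be shown together in a short sketch: selecting columns, translating coded values, deriving a calculated value, generating surrogate keys, splitting a column and sorting. The input fields, the code table and the function name are all invented for illustration, and a dedicated ETL tool would normally express these steps through its own configuration rather than hand-written code.

```python
from itertools import count

# Hypothetical code table for translating coded values.
GENDER_CODES = {"M": "Male", "F": "Female"}


def apply_transformations(rows):
    surrogate_keys = count(1)  # simple sequence for surrogate-key values
    out = []
    for row in rows:
        # Splitting a column into multiple columns.
        first, _, last = row["full_name"].partition(" ")
        # Selecting only certain columns: anything not copied here is dropped.
        out.append({
            "customer_key": next(surrogate_keys),
            "first_name": first,
            "last_name": last,
            # Translating coded values.
            "gender": GENDER_CODES.get(row["gender"], "Unknown"),
            # Deriving a new calculated value.
            "total": row["qty"] * row["unit_price"],
        })
    # Sorting on a list of columns.
    out.sort(key=lambda r: (r["last_name"], r["first_name"]))
    return out
```

Surrogate keys are assigned in arrival order before sorting, so they stay independent of any business value in the row.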