What is meant by data federation?

By: Bart Baesens, Seppe vanden Broucke

This QA first appeared in Data Science Briefings, the DataMiningApps newsletter.


You asked: What is meant by data federation?

Our answer:

Data federation typically follows a pull approach: data is pulled from the underlying source systems on demand. Enterprise Information Integration (EII) is an example of a data federation technology (see figure below). EII can be implemented by defining a virtual business view on top of the dispersed underlying data sources. This view serves as a universal data access layer. No data needs to be moved or replicated, since all data stays in the source systems. Hence, a federation strategy enables real-time access to current data, whereas a data consolidation strategy works on a copy that is only as current as its last load.

Enterprise Information Integration (EII) as a data federation solution.
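
To make the pull approach concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not how any particular EII product works: the two in-memory SQLite databases stand in for separate source systems, and the class CustomerRevenueView and the table and column names are hypothetical. The view stores no data of its own and pulls from both sources each time it is queried.

```python
import sqlite3

# Two independent "source systems" (hypothetical), each with its own database.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])

billing = sqlite3.connect(":memory:")
billing.execute("CREATE TABLE invoices (customer_id INTEGER, amount REAL)")
billing.executemany(
    "INSERT INTO invoices VALUES (?, ?)", [(1, 100.0), (1, 50.0), (2, 75.0)]
)


class CustomerRevenueView:
    """Hypothetical virtual business view: holds no data, pulls on every query."""

    def __init__(self, crm_conn, billing_conn):
        self.crm = crm_conn
        self.billing = billing_conn

    def query(self):
        # Translate the view request into one query per source system.
        names = dict(self.crm.execute("SELECT id, name FROM customers"))
        totals = dict(
            self.billing.execute(
                "SELECT customer_id, SUM(amount) FROM invoices GROUP BY customer_id"
            )
        )
        # Combine the partial results in the federation layer.
        return {name: totals.get(cid, 0.0) for cid, name in names.items()}


view = CustomerRevenueView(crm, billing)
before = view.query()

# A change in a source system shows up on the next query; nothing is copied.
billing.execute("INSERT INTO invoices VALUES (2, 25.0)")
after = view.query()

print(before)
print(after)
```

Because the view reads the sources at query time, the second result already reflects the newly inserted invoice without any reload step.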

EII can be beneficial because it leaves data in place; a consolidated approach would copy that data and could dramatically increase overall storage requirements. One important disadvantage to remember is that EII generally performs worse. Since every query on the business view must be translated into queries against the underlying data sources, a performance hit is unavoidable.
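
The trade-off can be sketched with a toy model in Python. Everything here is an assumption made for illustration: the Source class, its query counter, and the sample rows are hypothetical, and the number of source queries is only a rough proxy for the translation overhead described above, not a real benchmark.

```python
class Source:
    """Hypothetical source system that counts how often it is queried."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.queries = 0

    def fetch(self):
        self.queries += 1
        return list(self.rows)


orders = Source([("o1", "c1", 40.0), ("o2", "c2", 60.0)])
customers = Source([("c1", "Alice"), ("c2", "Bob")])
sources = [orders, customers]


def total_source_queries():
    return sum(s.queries for s in sources)


def federated_lookup():
    # Each request is translated into fresh queries against every source.
    names = dict(customers.fetch())
    return [(names[cid], amount) for _, cid, amount in orders.fetch()]


# Consolidation: load a copy once, then answer every request from the copy.
names = dict(customers.fetch())
consolidated_copy = [(names[cid], amount) for _, cid, amount in orders.fetch()]
load_queries = total_source_queries()


def consolidated_lookup():
    return list(consolidated_copy)


for _ in range(3):
    federated_lookup()
federated_queries = total_source_queries() - load_queries

for _ in range(3):
    consolidated_lookup()
consolidated_queries = total_source_queries() - load_queries - federated_queries

extra_rows_stored = len(consolidated_copy)

print(federated_queries, consolidated_queries, extra_rows_stored)
```

In this model the federated lookups hit the sources on every request and store nothing extra, while the consolidated lookups never touch the sources after the initial load but keep a duplicate of every row.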