PyDI: Python Data Integration package

 

 

  PyDI is a lightweight, 100% Python, federated data integration system. Users can define custom schemas for disparate sources, then query those sources and expand the results across them to find related data. The primary use of PyDI is to facilitate graph-based data mining of biomedical repositories (or of any set of repositories that can be interlinked but are inherently fragmented and disparate).

Supported features:

  • Connects different data sources without requiring explicit linkages between source interfaces: make a single query and expand it automatically to retrieve related results.
  • Integrates multiple sources of information in different locations simultaneously, including the Web, databases, local flat files, sockets, etc.
  • A frame-based schema makes it straightforward to add new sources or to alter existing sources on the fly.
  • An extensible architecture lets other modules and packages wrap PyDI easily; data is passed in real time via a plug-in interface (graph images were generated via networkx and pydot).
  • Sessions can be saved and restored via pickling, and multiple sessions can be restored simultaneously.
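
The save/restore feature relies on Python's standard pickle module. The sketch below is only an illustration of that general technique, not PyDI's own code: the helper names save_sessions and restore_sessions and the dictionary layout of a "session" are assumptions made for this example. It shows how several sessions might be written to one stream and read back together.

```python
import io
import pickle


def save_sessions(sessions, stream):
    """Pickle a list of session objects into one binary stream.

    'sessions' is a hypothetical list of plain dictionaries; PyDI's real
    session objects are not described on this page.
    """
    pickle.dump(list(sessions), stream, protocol=pickle.HIGHEST_PROTOCOL)


def restore_sessions(stream):
    """Load every session stored in the stream with a single call."""
    return pickle.load(stream)


# Two made-up sessions, used purely as sample data.
original = [
    {"name": "protein", "results": {"UserQuery": ["MNSTTKHLLH"]}},
    {"name": "empty", "results": {}},
]

# Round-trip through an in-memory buffer; a file opened in binary mode
# ('wb' / 'rb') would be used the same way.
buffer = io.BytesIO()
save_sessions(original, buffer)
buffer.seek(0)
restored = restore_sessions(buffer)
```

Note that pickle should only be used to restore data from a trusted source, since unpickling can execute arbitrary code.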

    Example usage:

    Querying a sequence of unknown function across multiple biological data sources:
       # Instantiate a browser with the preferred schema, and with
       #  source interfaces in a module called 'generators'.
       >>> browser = BrowserEngine("biological_schema.txt","generators")
    
       # Wrap a graph around the engine, to explore the data later
       # using graph-theoretic methods.
       >>> G = NXBrowserGraph(browser)
    
       # Seed a protein query (the source 'UserQuery' and entity 'Protein'
       #  are defined in the schema).
       >>> browser.seed("UserQuery","Protein","Sequence","MNSTTKHLLH....")
       
       # Now expand the seeded result, up to two expansions deep
       >>> auto_expand_browser(browser,2)
       ->DB:MNSTTKHLLH... is loaded into database UserQuery.
       ->QUERY: NCBI.MNSTTKHLLH...
       ->QUERY: UniProt.MNSTTKHLLH...
       ...
       ...
       ...
    
       # Generate an image of the results graph.
       >>> generate_graph_jpg(browser,"unknown.jpg")
       >>>
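
The example above limits automatic expansion to two rounds. This page does not describe how auto_expand_browser works internally, so the following is only a minimal sketch of one common way to implement a depth-limited expansion: a breadth-first traversal. The function expand_breadth_first, the LINKS table, and every record name in it are hypothetical and are not part of PyDI.

```python
from collections import deque


def expand_breadth_first(seed, neighbors, max_depth):
    """Return {record: depth} for everything within max_depth hops of seed.

    'neighbors' is any callable mapping a record to related records; here
    it stands in for querying other sources (an assumption of this sketch).
    """
    depth_of = {seed: 0}
    queue = deque([seed])
    while queue:
        record = queue.popleft()
        depth = depth_of[record]
        if depth == max_depth:
            continue  # do not expand beyond the requested depth
        for related in neighbors(record):
            if related not in depth_of:  # skip records already reached
                depth_of[related] = depth + 1
                queue.append(related)
    return depth_of


# Toy cross-source links; all record names are invented placeholders.
LINKS = {
    "UserQuery:seq": ["NCBI:rec1", "UniProt:rec2"],
    "NCBI:rec1": ["SourceC:item1"],
    "UniProt:rec2": ["SourceC:item1", "SourceD:item2"],
    "SourceC:item1": ["SourceE:item3"],
}

reached = expand_breadth_first(
    "UserQuery:seq", lambda record: LINKS.get(record, []), 2
)
```

With a limit of two, the traversal stops before following links out of the second-round records, so records three hops from the seed are never queried.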
    		



  Copyright © 2008 Eithon Cadag