Skip to content

Getting Started

Installation

Stable release

To install PyFusion, run this command in your terminal:

$ pip install pyfusion

This is the preferred method to install PyFusion, as it will always install the most recent stable release. Documentation of changes made during each release can be found in the Changelog.

If you don't have pip installed, this Python installation guide can guide you through the process.

From source

The source for PyFusion can be downloaded from the Github repo.

You can either clone the public repository:

$ git clone git://github.com/jpmorganchase/fusion

Or download the tarball:

$ curl -OJL https://github.com/jpmorganchase/fusion/tarball/master

Once you have a copy of the source, you can install it with:

$ pip install .

Storing Credentials

In order to connect to the Fusion API via SDK, we recommend storing your credentials in a JSON file. This file can be generated using Fusion’s application registration page.

Your credentials file should be located in a directory accessible from the location you are using the SDK. By default, the SDK will look for the credentials file at the path 'config/client_credentials.json'.

Formatting Credentials File

Your credentials file should be formatted as follows.

{
    "client_id": "YOUR_CLIENT_ID",
    "client_secret": "YOUR_CLIENT_SECRET",
    "resource": "",
    "auth_url": "",
    "proxies": {}
}
  • client_id: Generated using Fusion's application registration page.
  • client_secret: Generated using Fusion's application registration page.
  • resource: Can be found on Fusion's calling the API page.
  • auth_url: Can be found on Fusion's calling the API page.
  • proxies: HTTP and HTTPS proxy values. Optional.

Populating Proxies

If your application is running behind a proxy, for example a corporate firewall, then the proxies value will also need to be defined. For example: "proxies" : {"http": "http://proxy.myfirm.com:8080", "https": "https://proxy.myfirm.com:8080"}

If you do not require proxies, either remove this argument or leave it as an empty dictionary {}.

Usage

Import Fusion

To begin using pyfusion, simply execute the import below.

from fusion import Fusion

Fusion Object

Connection to the Fusion platform can be easily established by instantiating a Fusion() object.

This object will act as a Fusion client, managing your credentials and connectivity to the API. This client also provides an extensive list of methods for browsing, retrieving, and creating metadata and data.

If your credenitals are stored in 'config/client_credentials.json', you can instantiate your client as follow:

fusion = Fusion()

If your credentials are stored in an alternative location, you can provide the appropriate path as the credentials argument:

fusion = Fusion(credentials="path/to/my/credentials.json")

Alternatively, if you wish to provide your credentials directly to the client, you can utilize the FusionCredentials object:

from fusion import FusionCredentials

credentials = FusionCredentials(
    client_id="<CLIENT_ID>",
    client_secret="<CLIENT_SECRET>",
    resource="<RESOURCE>"
)

fusion = Fusion(credentials=credentials)

View Available Methods

Once you have instantiated the client fusion, running the following cell will display its available methods:

fusion
The output will be a table containing the available methods along with a short description:
Fusion object 
Available methods:
+--------------------------------------+--------------------------------------------------------------------------------------------------------------+
| attribute                            | Instantiate an Attribute object with this client for metadata creation.                                      |
| attributes                           | Instantiate an Attributes object with this client for metadata creation.                                     |
| catalog_resources                    | List the resources contained within the catalog, for example products and datasets.                          |
| create_dataset_lineage               | Upload lineage to a dataset.                                                                                 |
| dataset                              | Instantiate a Dataset object with this client for metadata creation.                                         |
| dataset_resources                    | List the resources available for a dataset, currently this will always be a datasetseries.                   |
| datasetmember_resources              | List the available resources for a datasetseries member.                                                     |
| delete_all_datasetmembers            | Delete all dataset members within a dataset.                                                                 |
| delete_datasetmembers                | Delete dataset members.                                                                                      |
| download                             | Downloads the requested distributions of a dataset to disk.                                                  |
| from_bytes                           | Uploads data from an object in memory.                                                                       |
| get_async_fusion_vector_store_client | Returns Fusion Embeddings Search client.                                                                     |
| get_events                           | Run server sent event listener and print out the new events. Keyboard terminate to stop.                     |
| get_fusion_filesystem                | Retrieve Fusion file system instance.                                                                        |
| get_fusion_vector_store_client       | Returns Fusion Embeddings Search client.                                                                     |
| input_dataflow                       | Instantiate an Input Dataflow object with this client for metadata creation.                                 |
| list_catalogs                        | Lists the catalogs available to the API account.                                                             |
| list_dataset_attributes              | Returns the list of attributes that are in the dataset.                                                      |
| list_dataset_lineage                 | List the upstream and downstream lineage of the dataset.                                                     |
| list_datasetmembers                  | List the available members in the dataset series.                                                            |
| list_datasetmembers_distributions    | List the distributions of dataset members.                                                                   |
| list_datasets                        | Get the datasets contained in a catalog.                                                                     |
| list_distributions                   | List the available distributions (downloadable instances of the dataset with a format type).                 |
| list_indexes                         | List the indexes in a knowledge base.                                                                        |
| list_product_dataset_mapping         | get the product to dataset linking contained in  a catalog. A product is a grouping of datasets.             |
| list_products                        | Get the products contained in a catalog. A product is a grouping of datasets.                                |
| list_registered_attributes           | Returns the list of attributes in a catalog.                                                                 |
| listen_to_events                     | Run server sent event listener in the background. Retrieve results by running get_events.                    |
| output_dataflow                      | Instantiate an Output Dataflow object with this client for metadata creation.                                |
| product                              | Instantiate a Product object with this client for metadata creation.                                         |
| report                               | Instantiate Report object with this client for metadata creation for managing regulatory reporting metadata. |
| to_bytes                             | Returns an instance of dataset (the distribution) as a bytes object.                                         |
| to_df                                | Gets distributions for a specified date or date range and returns the data as a dataframe.                   |
| to_table                             | Gets distributions for a specified date or date range and returns the data as an arrow table.                |
| upload                               | Uploads the requested files/files to Fusion.                                                                 |
| default_catalog                      | Returns the default catalog.                                                                                 |
+--------------------------------------+--------------------------------------------------------------------------------------------------------------+

Executing Available Methods

While detailed documentation for each function is located on the Modules page, below are a few examples to get you started.

Retrieve your available catalogs

fusion.list_catalogs()

Browse products available within a catalog

fusion.list_products(catalog="<CATALOG_ID>")

Browse datasets available within a catalog

fusion.list_datasets(catalog="<CATALOG_ID>")

Browse datasets associated with a product

fusion.list_datasets(catalog="<CATALOG_ID>", product="<PRODUCT_ID>")

Display all metadata for datasets available within a catalog

fusion.list_datasets(catalog="<CATALOG_ID>", display_all_columns=True)

Retrieve attributes for a dataset

fusion.list_dataset_attributes(dataset="<DATASET_ID>", catalog="<CATALOG_ID>")

Additional Resources

Now that you are set up with the SDK, you can start exploring our pages on downloading and uploading.