Fusion - Metadata Creation
In [1]:
from fusion import Fusion
import pandas as pd
Establish the connection
In [2]:
fusion = Fusion()
Show the available functionality
In [3]:
fusion
Out[3]:
Fusion object
Available methods:
Method | Description |
---|---|
attribute | Instantiate an Attribute object with this client for metadata creation. |
attributes | Instantiate an Attributes object with this client for metadata creation. |
catalog_resources | List the resources contained within the catalog, for example products and datasets. |
create_dataset_lineage | Upload lineage to a dataset. |
dataset | Instantiate a Dataset object with this client for metadata creation. |
dataset_resources | List the resources available for a dataset, currently this will always be a datasetseries. |
datasetmember_resources | List the available resources for a datasetseries member. |
delete_all_datasetmembers | Delete all dataset members within a dataset. |
delete_datasetmembers | Delete dataset members. |
download | Downloads the requested distributions of a dataset to disk. |
from_bytes | Uploads data from an object in memory. |
get_events | Run server sent event listener and print out the new events. Keyboard terminate to stop. |
get_fusion_filesystem | Creates Fusion Filesystem. |
list_catalogs | Lists the catalogs available to the API account. |
list_dataset_attributes | Returns the list of attributes that are in the dataset. |
list_dataset_lineage | List the upstream and downstream lineage of the dataset. |
list_datasetmembers | List the available members in the dataset series. |
list_datasets | Get the datasets contained in a catalog. |
list_distributions | List the available distributions (downloadable instances of the dataset with a format type). |
list_product_dataset_mapping | get the product to dataset linking contained in a catalog. A product is a grouping of datasets. |
list_products | Get the products contained in a catalog. A product is a grouping of datasets. |
listen_to_events | Run server sent event listener in the background. Retrieve results by running get_events. |
product | Instantiate a Product object with this client for metadata creation. |
to_bytes | Returns an instance of dataset (the distribution) as a bytes object. |
to_df | Gets distributions for a specified date or date range and returns the data as a dataframe. |
to_table | Gets distributions for a specified date or date range and returns the data as an arrow table. |
upload | Uploads the requested files/files to Fusion. |
default_catalog | Returns the default catalog. |
Create Product
Create Product Object
In [4]:
my_product = fusion.product(
    identifier="PYFUSION_PRODUCT",
    title="PyFusion Product",
    description="A product created using the PyFusion SDK.",
    short_abstract="A product created using the PyFusion SDK.",
    is_restricted=True,
    maintainer="J.P. Morgan Fusion",
    region="Global",
    publisher="J.P. Morgan",
    theme="Research"
)
my_product
Out[4]:
Product(
    identifier='PYFUSION_PRODUCT',
    title='PyFusion Product',
    category=None,
    short_abstract='A product created using the PyFusion SDK.',
    description='A product created using the PyFusion SDK.',
    is_active=True,
    is_restricted=True,
    maintainer=['J.P. Morgan Fusion'],
    region=['Global'],
    publisher='J.P. Morgan',
    sub_category=None,
    tag=None,
    delivery_channel=['API'],
    theme='Research',
    release_date=None,
    language='English',
    status='Available',
    image='',
    logo='',
    dataset=None
)
Upload to catalog
In [ ]:
my_product.create()
Create Dataset
Create a dataset object
In [5]:
my_dataset = fusion.dataset(
    identifier="PYFUSION_DATASET",
    title="PyFusion Dataset",
    description="A dataset created using the PyFusion SDK.",
    is_restricted=True,
    maintainer="J.P. Morgan Fusion",
    region="Global",
    publisher="J.P. Morgan",
    product="PYFUSION_PRODUCT",
    is_raw_data=False,
)
my_dataset
Out[5]:
Dataset(
    identifier='PYFUSION_DATASET',
    title='PyFusion Dataset',
    category=None,
    description='A dataset created using the PyFusion SDK.',
    frequency='Once',
    is_internal_only_dataset=False,
    is_third_party_data=True,
    is_restricted=True,
    is_raw_data=False,
    maintainer='J.P. Morgan Fusion',
    source=None,
    region=['Global'],
    publisher='J.P. Morgan',
    product=['PYFUSION_PRODUCT'],
    sub_category=None,
    tags=None,
    created_date=None,
    modified_date=None,
    delivery_channel=['API'],
    language='English',
    status='Available',
    type_='Source',
    container_type='Snapshot-Full',
    snowflake=None,
    complexity=None,
    is_immutable=None,
    is_mnpi=None,
    is_pci=None,
    is_pii=None,
    is_client=None,
    is_public=None,
    is_internal=None,
    is_confidential=None,
    is_highly_confidential=None,
    is_active=None,
    owners=None,
    application_id=None
)
In [ ]:
my_dataset.create()
Create Attributes
Retrieve template for attributes
In [6]:
attributes_df = fusion.attributes().to_dataframe()
attributes_df
Out[6]:
| identifier | index | dataType | title | description | isDatasetKey | source | sourceFieldId | isInternalDatasetKey | isExternallyVisible | unit | multiplier | isPropagationEligible | isMetric | availableFrom | deprecatedFrom | term | dataset | attributeType |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | example_attribute | 0 | String | Example Attribute | Example Attribute | False | None | example_attribute | None | True | None | 1.0 | None | None | None | None | bizterm1 | None | None |
Download and edit
In [7]:
attributes_df.to_csv('attributes.csv', index=False)
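The edit step itself is not shown in the notebook; the CSV is presumably changed by hand before it is read back. As a minimal sketch, assuming the template holds a single example row and that only `identifier`, `index`, `title`, `description`, and `sourceFieldId` need to change, the same three rows could be produced with pandas. The `template` frame is a reduced stand-in for `attributes_df` (a subset of its columns), and `expand_template` is a hypothetical helper, not part of the SDK.

```python
import pandas as pd

# Reduced stand-in for attributes_df: only a subset of the template columns.
template = pd.DataFrame(
    [
        {
            "identifier": "example_attribute",
            "index": 0,
            "dataType": "String",
            "title": "Example Attribute",
            "description": "Example Attribute",
            "isDatasetKey": False,
            "sourceFieldId": "example_attribute",
            "term": "bizterm1",
        }
    ]
)


def expand_template(template: pd.DataFrame, count: int) -> pd.DataFrame:
    """Repeat the first template row `count` times with numbered fields."""
    base = template.iloc[0].to_dict()
    rows = []
    for i in range(count):
        row = dict(base)  # copy so each row is edited independently
        row["identifier"] = f"{base['identifier']}{i}"
        row["index"] = i
        row["title"] = f"{base['title']} {i}"
        row["description"] = f"{base['description']} {i}"
        row["sourceFieldId"] = f"{base['sourceFieldId']}_{i}"
        rows.append(row)
    # Keep the template's column order so the CSV layout is unchanged.
    return pd.DataFrame(rows, columns=template.columns)


edited = expand_template(template, 3)
```

Writing the result with `edited.to_csv('attributes.csv', index=False)` would then replace the manual edit, provided the full set of template columns is carried through.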
Convert to attributes list
In [8]:
attributes = pd.read_csv('attributes.csv')
attributes
Out[8]:
| identifier | index | dataType | title | description | isDatasetKey | source | sourceFieldId | isInternalDatasetKey | isExternallyVisible | unit | multiplier | isPropagationEligible | isMetric | availableFrom | deprecatedFrom | term | dataset | attributeType |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | example_attribute0 | 0 | String | Example Attribute 0 | Example Attribute 0 | False | NaN | example_attribute 0 | NaN | True | NaN | 1.0 | NaN | NaN | NaN | NaN | bizterm1 | NaN | NaN |
1 | example_attribute1 | 1 | String | Example Attribute 1 | Example Attribute 1 | False | NaN | example_attribute 1 | NaN | True | NaN | 1.0 | NaN | NaN | NaN | NaN | bizterm1 | NaN | NaN |
2 | example_attribute2 | 2 | String | Example Attribute 2 | Example Attribute 2 | False | NaN | example_attribute 2 | NaN | True | NaN | 1.0 | NaN | NaN | NaN | NaN | bizterm1 | NaN | NaN |
In [9]:
attributes_list = fusion.attributes().from_object(attributes)
attributes_list
Out[9]:
[
    ('example_attribute0', 0, <Types.String: 1>, 'Example Attribute 0', 'Example Attribute 0', False, None, 'example_attribute_0', None, True, None, 1.0, None, None, None, None, 'bizterm1', None, None),
    ('example_attribute1', 1, <Types.String: 1>, 'Example Attribute 1', 'Example Attribute 1', False, None, 'example_attribute_1', None, True, None, 1.0, None, None, None, None, 'bizterm1', None, None),
    ('example_attribute2', 2, <Types.String: 1>, 'Example Attribute 2', 'Example Attribute 2', False, None, 'example_attribute_2', None, True, None, 1.0, None, None, None, None, 'bizterm1', None, None)
]
Upload attributes to dataset on catalog
In [ ]:
attributes_list.create(dataset="PYFUSION_DATASET")
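The cells so far run in a fixed order: the product first, then the dataset that names that product, then the attributes that name that dataset. A minimal sketch of that sequence as one function follows. `publish_metadata` and the `DryRun` stand-in are hypothetical names introduced here so the order can be checked without contacting Fusion; the only SDK calls assumed are the `create()` methods already shown, and their return values are ignored.

```python
class DryRun:
    """Hypothetical stand-in that records calls instead of contacting Fusion."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def create(self, **kwargs):
        # Record which object was created and with which keyword arguments.
        self.log.append((self.name, kwargs))


def publish_metadata(product, dataset, attributes, dataset_id):
    """Create product, dataset, then attributes; return the steps completed."""
    done = []
    product.create()
    done.append("product")
    dataset.create()
    done.append("dataset")
    attributes.create(dataset=dataset_id)
    done.append("attributes")
    return done


log = []
steps = publish_metadata(
    DryRun("product", log),
    DryRun("dataset", log),
    DryRun("attributes", log),
    dataset_id="PYFUSION_DATASET",
)
```

With real objects the call would be `publish_metadata(my_product, my_dataset, attributes_list, "PYFUSION_DATASET")`. If any `create()` raises, the later steps are not attempted.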
Upload a file
In [11]:
file_df = pd.read_csv('sample.csv')
file_df
Out[11]:
| example_attribute0 | example_attribute1 | example_attribute2 |
---|---|---|---|
0 | A | A | A |
1 | B | B | B |
2 | C | C | C |
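The column names in the sample file match the attribute identifiers registered on the dataset. Whether Fusion enforces that match at upload time is not stated in this notebook, so the following is only a local pre-check, sketched under that assumption. `missing_columns` is a hypothetical helper, and the CSV text is inlined in place of `sample.csv` so the snippet is self-contained.

```python
import csv
import io


def missing_columns(csv_text: str, identifiers: list[str]) -> list[str]:
    """Return the attribute identifiers that have no matching CSV column."""
    # Only the header row is needed; an empty file yields an empty header.
    header = next(csv.reader(io.StringIO(csv_text)), [])
    present = {name.strip() for name in header}
    return [name for name in identifiers if name not in present]


# Inline copy of the sample data shown above.
sample = (
    "example_attribute0,example_attribute1,example_attribute2\n"
    "A,A,A\n"
    "B,B,B\n"
    "C,C,C\n"
)
expected = ["example_attribute0", "example_attribute1", "example_attribute2"]
gaps = missing_columns(sample, expected)
```

An empty `gaps` list means every registered identifier has a column in the file.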
In [ ]:
fusion.upload(
    path='sample.csv',
    dataset="PYFUSION_DATASET",
    dt_str="20241025",
)
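The `dt_str` value in the upload call reads as a date in `YYYYMMDD` form. As a minimal sketch, assuming that format (the API may accept other forms not shown in this notebook), the string can be built from a `date` object and sanity-checked before uploading. Both helper names are hypothetical.

```python
from datetime import date, datetime


def to_dt_str(day: date) -> str:
    """Format a date as an eight-digit YYYYMMDD string."""
    return day.strftime("%Y%m%d")


def is_valid_dt_str(value: str) -> bool:
    """Check that value is exactly eight digits forming a real calendar date."""
    if len(value) != 8 or not value.isdigit():
        return False
    try:
        datetime.strptime(value, "%Y%m%d")
    except ValueError:
        return False
    return True


dt_str = to_dt_str(date(2024, 10, 25))
```

The explicit length and digit check matters because `strptime` alone would also accept shorter strings with single-digit months or days.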
Create Raw Dataset
In [7]:
my_raw_dataset = fusion.dataset(
    identifier="PYFUSION_RAW_DATASET",
    title="PyFusion Raw Dataset",
    description="A dataset created using the PyFusion SDK.",
    is_restricted=True,
    maintainer="J.P. Morgan Fusion",
    region="Global",
    publisher="J.P. Morgan",
    product="PYFUSION_PRODUCT",
    is_raw_data=True,
)
my_raw_dataset
Out[7]:
Dataset(
    identifier='PYFUSION_RAW_DATASET',
    title='PyFusion Raw Dataset',
    category=None,
    description='A dataset created using the PyFusion SDK.',
    frequency='Once',
    is_internal_only_dataset=False,
    is_third_party_data=True,
    is_restricted=True,
    is_raw_data=True,
    maintainer='J.P. Morgan Fusion',
    source=None,
    region=['Global'],
    publisher='J.P. Morgan',
    product=['PYFUSION_PRODUCT'],
    sub_category=None,
    tags=None,
    created_date=None,
    modified_date=None,
    delivery_channel=['API'],
    language='English',
    status='Available',
    type_='Source',
    container_type='Snapshot-Full',
    snowflake=None,
    complexity=None,
    is_immutable=None,
    is_mnpi=None,
    is_pci=None,
    is_pii=None,
    is_client=None,
    is_public=None,
    is_internal=None,
    is_confidential=None,
    is_highly_confidential=None,
    is_active=None,
    owners=None,
    application_id=None
)
In [ ]:
my_raw_dataset.create()
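The raw dataset definition differs from the first dataset only in `identifier`, `title`, and `is_raw_data`. A minimal sketch of factoring out the shared arguments follows. `COMMON` and `dataset_kwargs` are hypothetical names, and the keyword names are the ones already passed to `fusion.dataset` in this notebook.

```python
# Arguments shared by both dataset definitions in this notebook.
COMMON = {
    "description": "A dataset created using the PyFusion SDK.",
    "is_restricted": True,
    "maintainer": "J.P. Morgan Fusion",
    "region": "Global",
    "publisher": "J.P. Morgan",
    "product": "PYFUSION_PRODUCT",
}


def dataset_kwargs(identifier: str, title: str, is_raw_data: bool) -> dict:
    """Combine the per-dataset fields with the shared arguments."""
    return {
        "identifier": identifier,
        "title": title,
        "is_raw_data": is_raw_data,
        **COMMON,
    }


structured = dataset_kwargs("PYFUSION_DATASET", "PyFusion Dataset", False)
raw = dataset_kwargs("PYFUSION_RAW_DATASET", "PyFusion Raw Dataset", True)

# Keys whose values differ between the two definitions.
differing = sorted(key for key in raw if raw[key] != structured[key])
```

Passing the result as `fusion.dataset(**raw)` supplies the same keyword arguments as the cell above.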
Upload data without schema
In [ ]:
fusion.upload(
    path='sample.csv',
    dataset="PYFUSION_RAW_DATASET",
    dt_str="20241025",
)