Fusion - Metadata Creation

In [1]:

from fusion import Fusion
import pandas as pd

Establish the connection

In [2]:

fusion = Fusion()

Show the available functionality

In [3]:

fusion

Out[3]:

Fusion object 
Available methods:
+------------------------------+--------------------------------------------------------------------------------------------------+
| attribute                    | Instantiate an Attribute object with this client for metadata creation.                          |
| attributes                   | Instantiate an Attributes object with this client for metadata creation.                         |
| catalog_resources            | List the resources contained within the catalog, for example products and datasets.              |
| create_dataset_lineage       | Upload lineage to a dataset.                                                                     |
| dataset                      | Instantiate a Dataset object with this client for metadata creation.                             |
| dataset_resources            | List the resources available for a dataset, currently this will always be a datasetseries.       |
| datasetmember_resources      | List the available resources for a datasetseries member.                                         |
| delete_all_datasetmembers    | Delete all dataset members within a dataset.                                                     |
| delete_datasetmembers        | Delete dataset members.                                                                          |
| download                     | Downloads the requested distributions of a dataset to disk.                                      |
| from_bytes                   | Uploads data from an object in memory.                                                           |
| get_events                   | Run server sent event listener and print out the new events. Keyboard terminate to stop.         |
| get_fusion_filesystem        | Creates Fusion Filesystem.                                                                       |
| list_catalogs                | Lists the catalogs available to the API account.                                                 |
| list_dataset_attributes      | Returns the list of attributes that are in the dataset.                                          |
| list_dataset_lineage         | List the upstream and downstream lineage of the dataset.                                         |
| list_datasetmembers          | List the available members in the dataset series.                                                |
| list_datasets                | Get the datasets contained in a catalog.                                                         |
| list_distributions           | List the available distributions (downloadable instances of the dataset with a format type).     |
| list_product_dataset_mapping | get the product to dataset linking contained in  a catalog. A product is a grouping of datasets. |
| list_products                | Get the products contained in a catalog. A product is a grouping of datasets.                    |
| listen_to_events             | Run server sent event listener in the background. Retrieve results by running get_events.        |
| product                      | Instantiate a Product object with this client for metadata creation.                             |
| to_bytes                     | Returns an instance of dataset (the distribution) as a bytes object.                             |
| to_df                        | Gets distributions for a specified date or date range and returns the data as a dataframe.       |
| to_table                     | Gets distributions for a specified date or date range and returns the data as an arrow table.    |
| upload                       | Uploads the requested files/files to Fusion.                                                     |
| default_catalog              | Returns the default catalog.                                                                     |
+------------------------------+--------------------------------------------------------------------------------------------------+

Create Product

Create Product Object

In [4]:

my_product = fusion.product(
    identifier="PYFUSION_PRODUCT",
    title="PyFusion Product",
    description="A product created using the PyFusion SDK.",
    short_abstract="A product created using the PyFusion SDK.",
    is_restricted=True,
    maintainer="J.P. Morgan Fusion",
    region="Global",
    publisher="J.P. Morgan",
    theme="Research"
)
my_product

Out[4]:

Product(
identifier='PYFUSION_PRODUCT',
 title='PyFusion Product',
 category=None,
 short_abstract='A product created using the PyFusion SDK.',
 description='A product created using the PyFusion SDK.',
 is_active=True,
 is_restricted=True,
 maintainer=['J.P. Morgan Fusion'],
 region=['Global'],
 publisher='J.P. Morgan',
 sub_category=None,
 tag=None,
 delivery_channel=['API'],
 theme='Research',
 release_date=None,
 language='English',
 status='Available',
 image='',
 logo='',
 dataset=None
)

Upload to catalog

In [ ]:

my_product.create()

Create Dataset

Create a dataset object

In [5]:

my_dataset = fusion.dataset(
    identifier="PYFUSION_DATASET",
    title="PyFusion Dataset",
    description="A dataset created using the PyFusion SDK.",
    is_restricted=True,
    maintainer="J.P. Morgan Fusion",
    region="Global",
    publisher="J.P. Morgan",
    product="PYFUSION_PRODUCT",
    is_raw_data=False,
)
my_dataset

Out[5]:

Dataset(
identifier='PYFUSION_DATASET',
 title='PyFusion Dataset',
 category=None,
 description='A dataset created using the PyFusion SDK.',
 frequency='Once',
 is_internal_only_dataset=False,
 is_third_party_data=True,
 is_restricted=True,
 is_raw_data=False,
 maintainer='J.P. Morgan Fusion',
 source=None,
 region=['Global'],
 publisher='J.P. Morgan',
 product=['PYFUSION_PRODUCT'],
 sub_category=None,
 tags=None,
 created_date=None,
 modified_date=None,
 delivery_channel=['API'],
 language='English',
 status='Available',
 type_='Source',
 container_type='Snapshot-Full',
 snowflake=None,
 complexity=None,
 is_immutable=None,
 is_mnpi=None,
 is_pci=None,
 is_pii=None,
 is_client=None,
 is_public=None,
 is_internal=None,
 is_confidential=None,
 is_highly_confidential=None,
 is_active=None,
 owners=None,
 application_id=None
)

In [ ]:

my_dataset.create()

Create Attributes

Retrieve template for attributes

In [6]:

attributes_df = fusion.attributes().to_dataframe()
attributes_df

Out[6]:

	identifier	index	dataType	title	description	isDatasetKey	source	sourceFieldId	isInternalDatasetKey	isExternallyVisible	unit	multiplier	isPropagationEligible	isMetric	availableFrom	deprecatedFrom	term	dataset	attributeType
0	example_attribute	0	String	Example Attribute	Example Attribute	False	None	example_attribute	None	True	None	1.0	None	None	None	None	bizterm1	None	None

Download and edit

In [7]:

attributes_df.to_csv('attributes.csv', index=False)

Convert to attributes list

In [8]:

attributes = pd.read_csv('attributes.csv')
attributes

Out[8]:

	identifier	index	dataType	title	description	isDatasetKey	source	sourceFieldId	isInternalDatasetKey	isExternallyVisible	unit	multiplier	isPropagationEligible	isMetric	availableFrom	deprecatedFrom	term	dataset	attributeType
0	example_attribute0	0	String	Example Attribute 0	Example Attribute 0	False	NaN	example_attribute 0	NaN	True	NaN	1.0	NaN	NaN	NaN	NaN	bizterm1	NaN	NaN
1	example_attribute1	1	String	Example Attribute 1	Example Attribute 1	False	NaN	example_attribute 1	NaN	True	NaN	1.0	NaN	NaN	NaN	NaN	bizterm1	NaN	NaN
2	example_attribute2	2	String	Example Attribute 2	Example Attribute 2	False	NaN	example_attribute 2	NaN	True	NaN	1.0	NaN	NaN	NaN	NaN	bizterm1	NaN	NaN
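
The manual edit of attributes.csv can also be scripted. The following is a minimal sketch using only the standard library, not part of the SDK: the column names are copied from the template output above, the defaults (String, multiplier 1.0, term bizterm1) mirror the template row, and `build_attribute_rows` and `rows_to_csv` are hypothetical helper names introduced for illustration.

```python
import csv
import io

# Column order copied from the attributes template shown above.
TEMPLATE_COLUMNS = [
    "identifier", "index", "dataType", "title", "description",
    "isDatasetKey", "source", "sourceFieldId", "isInternalDatasetKey",
    "isExternallyVisible", "unit", "multiplier", "isPropagationEligible",
    "isMetric", "availableFrom", "deprecatedFrom", "term", "dataset",
    "attributeType",
]


def build_attribute_rows(column_names):
    """Hypothetical helper: build one template-style row per data column."""
    rows = []
    for position, name in enumerate(column_names):
        title = name.replace("_", " ").title()
        # Start with every cell blank; pandas reads blank cells back as NaN.
        row = dict.fromkeys(TEMPLATE_COLUMNS, "")
        row.update(
            identifier=name,
            index=position,
            dataType="String",      # assumed default, as in the template row
            title=title,
            description=title,
            isDatasetKey=False,
            sourceFieldId=name,
            isExternallyVisible=True,
            multiplier=1.0,
            term="bizterm1",        # placeholder term from the template row
        )
        rows.append(row)
    return rows


def rows_to_csv(rows):
    """Serialise the rows as CSV text in template column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=TEMPLATE_COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = build_attribute_rows(
    ["example_attribute0", "example_attribute1", "example_attribute2"]
)
csv_text = rows_to_csv(rows)
```

Writing `csv_text` to attributes.csv would give a file that can be read back with `pd.read_csv` in the same way as the hand-edited one.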

In [9]:

attributes_list = fusion.attributes().from_object(attributes)
attributes_list

Out[9]:

[
('example_attribute0', 0, <Types.String: 1>, 'Example Attribute 0', 'Example Attribute 0', False, None, 'example_attribute_0', None, True, None, 1.0, None, None, None, None, 'bizterm1', None, None),
 ('example_attribute1', 1, <Types.String: 1>, 'Example Attribute 1', 'Example Attribute 1', False, None, 'example_attribute_1', None, True, None, 1.0, None, None, None, None, 'bizterm1', None, None),
 ('example_attribute2', 2, <Types.String: 1>, 'Example Attribute 2', 'Example Attribute 2', False, None, 'example_attribute_2', None, True, None, 1.0, None, None, None, None, 'bizterm1', None, None)
]

Upload attributes to dataset on catalog

In [ ]:

attributes_list.create(dataset="PYFUSION_DATASET")

Upload a file

In [11]:

file_df = pd.read_csv('sample.csv')
file_df

Out[11]:

	example_attribute0	example_attribute1	example_attribute2
0	A	A	A
1	B	B	B
2	C	C	C
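
The columns of sample.csv match the attribute identifiers registered earlier. This page does not say whether the service checks that at upload time, so an optional local check before uploading may be useful. The sketch below uses only the standard library; `missing_and_extra_columns` is a hypothetical helper, and the in-memory text stands in for the real file.

```python
import csv
import io


def missing_and_extra_columns(data_file, attribute_identifiers):
    """Hypothetical helper: compare a CSV header with the expected identifiers."""
    header = next(csv.reader(data_file))
    expected = list(attribute_identifiers)
    missing = [name for name in expected if name not in header]
    extra = [name for name in header if name not in expected]
    return missing, extra


# In-memory stand-in for sample.csv, matching the table above.
sample = io.StringIO(
    "example_attribute0,example_attribute1,example_attribute2\n"
    "A,A,A\n"
    "B,B,B\n"
    "C,C,C\n"
)
identifiers = ["example_attribute0", "example_attribute1", "example_attribute2"]

missing, extra = missing_and_extra_columns(sample, identifiers)
if missing or extra:
    print(f"Schema mismatch: missing={missing}, extra={extra}")
```

With a real file, the same helper can be called as `missing_and_extra_columns(open('sample.csv', newline=''), identifiers)`.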

In [ ]:

fusion.upload(
    path='sample.csv',
    dataset="PYFUSION_DATASET",
    dt_str="20241025",
)
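
The `dt_str` value in the call above looks like a date in YYYYMMDD form. Assuming that reading is right (the SDK may accept other forms that are not covered here), a date can be formatted and sanity-checked before it is passed to `upload`. Both `to_dt_str` and `is_valid_dt_str` are hypothetical helpers, not SDK functions.

```python
from datetime import date, datetime


def to_dt_str(day):
    """Hypothetical helper: format a date as YYYYMMDD, like the example value."""
    return day.strftime("%Y%m%d")


def is_valid_dt_str(value):
    """Return True if value is eight ASCII digits forming a real calendar date."""
    if len(value) != 8 or not (value.isascii() and value.isdigit()):
        return False
    try:
        datetime.strptime(value, "%Y%m%d")
    except ValueError:
        return False
    return True


dt_str = to_dt_str(date(2024, 10, 25))
```

Here `dt_str` equals the string used in the upload call, and a value such as "20241325" (month 13) is rejected by `is_valid_dt_str`.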

Create Raw Dataset

In [7]:

my_raw_dataset = fusion.dataset(
    identifier="PYFUSION_RAW_DATASET",
    title="PyFusion Raw Dataset",
    description="A dataset created using the PyFusion SDK.",
    is_restricted=True,
    maintainer="J.P. Morgan Fusion",
    region="Global",
    publisher="J.P. Morgan",
    product="PYFUSION_PRODUCT",
    is_raw_data=True,
)
my_raw_dataset

Out[7]:

Dataset(
identifier='PYFUSION_RAW_DATASET',
 title='PyFusion Raw Dataset',
 category=None,
 description='A dataset created using the PyFusion SDK.',
 frequency='Once',
 is_internal_only_dataset=False,
 is_third_party_data=True,
 is_restricted=True,
 is_raw_data=True,
 maintainer='J.P. Morgan Fusion',
 source=None,
 region=['Global'],
 publisher='J.P. Morgan',
 product=['PYFUSION_PRODUCT'],
 sub_category=None,
 tags=None,
 created_date=None,
 modified_date=None,
 delivery_channel=['API'],
 language='English',
 status='Available',
 type_='Source',
 container_type='Snapshot-Full',
 snowflake=None,
 complexity=None,
 is_immutable=None,
 is_mnpi=None,
 is_pci=None,
 is_pii=None,
 is_client=None,
 is_public=None,
 is_internal=None,
 is_confidential=None,
 is_highly_confidential=None,
 is_active=None,
 owners=None,
 application_id=None
)

In [ ]:

my_raw_dataset.create()

Upload data without schema

In [ ]:

fusion.upload(
    path='sample.csv',
    dataset="PYFUSION_RAW_DATASET",
    dt_str="20241025",
)