I don’t think SAP Data Hub requires an introduction anymore, but in case you need one, please refer to a product introduction here. This blog is about a very specific aspect of SAP Data Hub: “custom developed content”.
What is custom developed content?
SAP Data Hub differs significantly from most of SAP’s product portfolio. It brings SAP and non-SAP systems and data together; it is built from the ground up in a cloud-native architecture and runs 100% containerized (Docker containers) on its own Kubernetes cluster. This (by design) very open architecture allows for customization of data integration scenarios, custom functionality, and custom code. Although SAP Data Hub comes with a huge set of connectors, operators, and libraries out of the box, you can still extend it very easily. For now, such custom developed content can be custom operators, additional Docker images, or complete pipeline scenarios (we have more categories planned).
A custom operator could be a very specific piece of functionality (simple or complex) that Data Hub does not support out of the box but that you require. Let’s say, for example, you are trying to insert data from an S3 file into a database like SAP HANA, but the file is stored in JSON format, which does not lend itself well to being inserted into a table. You would have to change the format, for example from JSON to CSV. You could code the functionality just once in your pipeline, but you could also create a dedicated operator so that you can reuse it in other pipelines (it also encapsulates your custom functionality nicely for troubleshooting purposes). Although SAP Data Hub provides an operator called “Format Converter”, which includes type conversions between CSV, JSON, and XML, it does not cover this particular direction. So you could create a new operator from a base operator; you can find a detailed example in Jens Rannacher’s blog on how to create a custom operator from a base operator.
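To make the JSON-to-CSV example more concrete, here is a minimal sketch of the conversion logic such an operator might wrap. It uses only the Python standard library and assumes the input is a JSON array of flat objects; the function name `json_to_csv` is hypothetical, and the wiring to the operator’s input and output ports is deliberately left out, since that depends on the base operator you choose.

```python
import csv
import io
import json


def json_to_csv(json_text: str) -> str:
    """Convert a JSON array of flat objects into CSV text with a header row.

    Assumption: every record is a flat object (no nested structures).
    """
    records = json.loads(json_text)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of objects")

    # Collect column names in first-seen order so the header is stable.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fieldnames,
        restval="",            # fill missing keys with an empty cell
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

Keeping the conversion in a plain function like this makes it easy to test outside of a pipeline before you package it as an operator.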
If you decide to use a specific library in your preferred language (e.g. the pandas library in Python 3, which is not included by default in Data Hub) to accomplish your goal, you can include any available library in an additional Docker image, so that you can reference it in your custom operator later. You can find more details on this in Jens Rannacher’s blog on how to develop a custom operator with your own Dockerfile.
The third of the currently supported custom content categories is complete pipeline scenarios. These could include custom operators and additional libraries or tags in the form of your own Docker images, or they could simply derive from base functionality, including your own code, to provide a unique solution. They could be designed for a specific use case, such as a data extraction plus conversion, a data anonymization function for a specific set of data, or an API leveraged in a specific way. These are basically complete, ready-to-use sequences of operators that you have developed and want to share. You can reference and download examples on SAP’s API Business Hub here. You could build them just to share within your team or organization to help enforce data integration standards and guidelines; you could collect these pipelines as a repository of functions to realize data integration scenarios for your customers faster (which also significantly reduces repeated development effort); or you could offer them through SAP channels like SAP API Business Hub or the SAP Data Hub GitHub section.
How can I exchange custom content between SAP Data Hub installations?
In the Files tab of the System Management section, SAP Data Hub lets you browse through your repository of objects, mark the ones that make up the complete functionality you want to bundle (from a custom operator, to an additional Docker image, to a complete pipeline scenario), and export them as a solution. This creates a compressed tar file that can be stored, shared, and imported into other SAP Data Hub instances. You can find an example of such a file on the SAP API Business Hub for an integration of Data Hub with SAP Fieldglass leveraging the CDI (Cloud Data Integration) API.
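Because an exported solution is a compressed tar file, you can look inside it with standard tooling before importing it elsewhere. The following is a minimal sketch using Python’s `tarfile` module; the function name `list_archive_files` is hypothetical, and nothing here assumes a particular internal layout of the solution archive.

```python
import tarfile
from typing import BinaryIO, List


def list_archive_files(fileobj: BinaryIO) -> List[str]:
    """Return the sorted paths of all regular files in a tar archive.

    The mode "r:*" lets tarfile detect the compression transparently.
    """
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        return sorted(m.name for m in tar.getmembers() if m.isfile())
```

You could call it on a downloaded solution, for example `list_archive_files(open("my_solution.tgz", "rb"))`, where the file name is only a placeholder, to check that the operators, Docker files, and pipelines you expect are actually included.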
How can I share or sell my custom content?
As described above, you can list your custom content on GitHub as free-to-use examples, or on SAP API Business Hub, where entries can be free-to-use examples or links to the SAP App Center, where you can sell your custom content directly to SAP customers. In the latter case, however, you need to have your custom content validated by SAP. For this purpose, you can follow the instructions in SAP Note 2752041. Of course, you must be a registered SAP Partner to take advantage of the SAP App Center. If you want to learn more about SAP Partner options, please refer to an excellent detailed blog by Ivo van Barneveld here and the SAP App Center documentation here.
Why would I do this?
In many cases, partners and developers have great ideas and experience on how to solve a problem in their particular area of expertise. You may just want to share your solution with other developers, or you may want to sell your solution to SAP customers that run SAP Data Hub. Many partners provide SAP data integration solutions to their customers by simply replicating SAP data into a Hadoop or object store. Instead of relying on data that is outdated the moment it has been replicated, you could realize your data integration scenario using SAP Data Hub functionality and leverage real-time data from any SAP system within your data integration solution. You can even include ML (Machine Learning) functions for your scenario within SAP Data Hub in the form of Python code, using the built-in TensorFlow and OpenCV libraries. Have a look at all the built-in features of SAP Data Hub here. Additionally, listen to one of SAP’s partners, Avalon Consulting LLC, which has developed several SAP Data Hub scenarios, and hear why and how in a recent LinkedIn live session here and a recent SAP Data Bits & Bytes recording here.
A few words on SAP Data Intelligence
SAP Data Intelligence was announced during SAPPHIRE 2019 and went GA a few months ago. Data Intelligence focuses on the development, iteration, and operationalization of ML models and ML services and provides Jupyter notebook integration. Every ML model relates to a specific data set; these data sets have to be generated and maintained as well. Data Intelligence includes the entire set of functionality of SAP Data Hub. It utilizes the same code set. So, any of your custom code developed in SAP Data Hub can also be leveraged by SAP Data Intelligence.