Kaiser NCAP:
Migrated workflows from the Data Engineering Tool (DET, an internal toolkit) to Azure Data Factory.
Developed data pipelines and data flows to handle both historical and incremental data.
Initiated and developed a complete prototype of a Change Data Capture (CDC) pipeline, which was adopted as a plug-in component by other teams in their own data pipelines.
Extracted data, applied transformation logic in Azure Data Factory, and built staging and extract logic to move data from the tenant zone to the enriched zone.
Followed an ETL process: sent data from staging to extract as split files to the costing partner, then retrieved the processed data back for cost accounting.
Mentored two junior engineers throughout the project.
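The CDC prototype above can be illustrated with a snapshot-comparison sketch. This is a hypothetical, simplified Scala version (the actual pipeline was built in Azure Data Factory; the `Record` schema and `diff` helper are illustrative only): compare a previous and current snapshot keyed by id, and classify each row as an insert, update, or delete.

```scala
// Illustrative snapshot-comparison CDC sketch; schema and names are hypothetical.
case class Record(id: Int, value: String)

object CdcSketch {
  // Classify changes between two snapshots keyed by id:
  // rows new in curr are inserts, rows missing from curr are deletes,
  // rows present in both but changed are updates.
  def diff(prev: Seq[Record], curr: Seq[Record]): (Seq[Record], Seq[Record], Seq[Record]) = {
    val prevById = prev.map(r => r.id -> r).toMap
    val currIds  = curr.map(_.id).toSet
    val inserts  = curr.filterNot(r => prevById.contains(r.id))
    val deletes  = prev.filterNot(r => currIds.contains(r.id))
    val updates  = curr.filter(r => prevById.get(r.id).exists(_ != r))
    (inserts, updates, deletes)
  }
}
```

A plug-in CDC stage like this lets downstream pipelines consume only the changed rows instead of reprocessing full history on every run.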
Industry Asset Development:
Built features for Test Automation for Data Processing (TA), a patented Accenture accelerator project (publication number 20210133087).
Designed and developed a self-service framework (TA) in Spark using Scala to automate data ingestion, with a data validation library suite that handles both data at rest and real-time data.
Implemented multiple types of Change Data Capture logic in the TA framework.
In response to a client's ad hoc request, developed a Data Profiler that builds a statistical data frame to assess data quality; when the weighted score satisfies the threshold, it triggers the TA framework.
Derived meaningful insights by extracting Twitter data based on hashtags or keywords using Kafka and Spark Streaming.
Presented this product to five healthcare clients of Accenture, resulting in sales to three clients and earned revenue.
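The Data Profiler's threshold check described above can be sketched as follows. This is a minimal, assumption-laden Scala illustration (the column weights, score formula, and helper names are hypothetical, not the production implementation): profile each column's completeness, combine the per-column results into a weighted quality score, and gate the framework trigger on a threshold.

```scala
// Hypothetical sketch of the Data Profiler's quality gate; weights and
// scoring formula are illustrative, not the production logic.
object ProfilerSketch {
  // rows: each row maps column name -> optional value.
  // weights: column name -> relative importance of that column.
  // Returns a weighted completeness score in [0.0, 1.0].
  def qualityScore(rows: Seq[Map[String, Option[String]]],
                   weights: Map[String, Double]): Double = {
    val completeness = weights.keys.map { col =>
      val nonNull = rows.count(r => r.get(col).flatten.exists(_.nonEmpty))
      col -> (if (rows.isEmpty) 0.0 else nonNull.toDouble / rows.size)
    }.toMap
    weights.map { case (col, w) => w * completeness(col) }.sum / weights.values.sum
  }

  // The downstream framework runs only when the score meets the threshold.
  def shouldTrigger(score: Double, threshold: Double): Boolean = score >= threshold
}
```

Gating validation on a profiled score like this avoids running the full framework against data that is too incomplete to validate meaningfully.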
Research Works:
Designed and implemented a highly scalable, maintainable Scala routine for Hive-on-Spark, improving response performance by more than 60%.
Dynamically converted complex nested JSON into flat JSON, and then into a data frame, using Spark with Scala.
Configured and implemented Apache Kylin, using Kylin cubes to speed up retrieval of aggregation results.
Performed POCs on various big data tools, developing prototypes and solutions to data-driven challenges that were adopted in other ongoing projects.
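The nested-JSON flattening mentioned above can be sketched in plain Scala. This is a minimal illustration assuming the JSON is already parsed into nested `Map`s (the production version worked on Spark data frames via schema introspection instead): recursively walk the structure, joining nested keys with a dot.

```scala
// Minimal flattening sketch over parsed JSON represented as nested Maps;
// the Spark-based version operated on DataFrame schemas instead.
object FlattenSketch {
  // Recursively flatten nested maps, joining key paths with '.'
  // e.g. {"user": {"address": {"city": "x"}}} -> {"user.address.city": "x"}
  def flatten(json: Map[String, Any], prefix: String = ""): Map[String, Any] =
    json.flatMap {
      case (k, v: Map[String, Any] @unchecked) => flatten(v, s"$prefix$k.")
      case (k, v)                              => Map(s"$prefix$k" -> v)
    }
}
```

Flattening to dotted column names makes arbitrarily nested records loadable into a tabular data frame without hand-written schemas.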