* Create and maintain the underlying data pipeline architecture for the solution offerings from raw client data to final solution output.
* Create, populate, and maintain data structures for machine learning and other analytics.
* Use quantitative and statistical methods to derive insights from data.
* Guide the data technology stack used to build solution offerings.
* Combine machine learning, artificial intelligence (ontologies, inference engines, and rules), and natural language
processing under a holistic vision to scale and transform businesses across multiple functions and processes.
o Create and maintain optimal data pipeline architecture, incorporating data wrangling and Extract-Transform-Load (ETL) flows.
o Assemble large, complex data sets to meet analytical requirements, such as analytics tables and feature engineering.
o Build the infrastructure required for optimal, automated extraction, transformation, and loading of data
from a wide variety of data sources using SQL and other 'big data' technologies such as Databricks.
o Build automated analytics tools that utilize the data pipeline to derive actionable insights.
o Identify, design, and implement internal process improvements: automating manual processes,
optimizing data delivery, re-designing infrastructure for greater scalability, etc.
o Design and develop data integrations and a data quality framework.
o Develop appropriate testing strategies and reports for the solution as well as data from external sources.
o Configure data pipelines to accommodate client-specific requirements to onboard new clients.
o Perform regular operations tasks to ingest new and changing data, implementing automation where possible.
o Implement processes and tools to monitor data quality; investigate and remedy any data-related issues
in daily solution operations.
* Minimum of a bachelor's degree in Computer Science or related field (STEM subjects preferred)
* 3+ years of hands-on experience as a data engineer or in a similar position
* 3+ years of commercial experience with the Python or Scala programming language
* 3+ years of SQL and experience working with relational databases (Postgres preferred)
* Knowledge of at least one of the following: Databricks, Spark, Hadoop, or Kafka
* Demonstrable knowledge of and experience with developing data pipelines to automate data processing workflows
* Demonstrable experience in data modeling
* Demonstrable knowledge of data warehousing, business intelligence, and application data integration
* Demonstrable experience in developing applications and services that run on cloud infrastructure (Azure preferred)
* Excellent problem-solving and communication skills
The following additional skills would be beneficial:
* Knowledge of one or more of the following technologies: Data Science, Machine Learning, Natural Language
Processing, Business Intelligence, and Data Visualization
* Knowledge of statistics and experience using statistical or BI packages for analyzing large datasets (Excel, R,
Python, Power BI, Tableau, etc.)
* Experience with container management and deployment, e.g., Docker and Kubernetes