Key Responsibilities:
Build and maintain high-TPS, reliable, performant, and cost-effective data collection and extraction modules in Node.js and Python, backed by streaming solutions such as Kafka (a minimal consumer sketch follows this list).
Deploy, maintain, and support these modules on AWS and GCP.
Index, archive, and retain data in the persistence store that fits the use case: object storage (S3), key-value stores (DynamoDB), or Elasticsearch (see the persistence sketch after this list).
Manage the quality of collected data using data-quality libraries built with SQL, Python, and Spark on AWS Glue, with results exposed as monitoring dashboards in Amazon QuickSight and Kibana (see the data-quality sketch after this list).
Expose the collected data to downstream applications through RESTful APIs served by a Node.js backend.
Collaborate with engineers, researchers, and data implementation specialists to design and build elegant, efficient end-to-end competitive intelligence solutions.
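For illustration, a minimal sketch of the kind of Kafka-backed collection worker described above, assuming the kafka-python client, a hypothetical "raw-pages" topic, and a placeholder local broker:

```python
import json

from kafka import KafkaConsumer  # kafka-python client

# Minimal consumer sketch: reads raw collection events from a hypothetical
# "raw-pages" topic and hands each record to an extraction step.
consumer = KafkaConsumer(
    "raw-pages",                              # hypothetical topic name
    bootstrap_servers=["localhost:9092"],     # placeholder broker address
    group_id="extraction-workers",            # consumer group, so workers can scale out
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    enable_auto_commit=False,                 # commit only after successful processing
)

def extract(record: dict) -> dict:
    """Placeholder extraction step; real logic depends on the source format."""
    return {"url": record.get("url"), "fields": record.get("payload", {})}

for message in consumer:
    extracted = extract(message.value)
    # ... persist `extracted` to the appropriate store (see the next sketch) ...
    consumer.commit()  # commit offsets once the record is safely handled
```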
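A sketch of the multi-store persistence mentioned above, assuming boto3 for S3 and DynamoDB, the elasticsearch Python client (8.x `document=` keyword), and hypothetical bucket, table, and index names:

```python
import json

import boto3
from elasticsearch import Elasticsearch

# Hypothetical resource names, for illustration only.
BUCKET = "ci-raw-archive"
TABLE = "ci-extracted-items"
INDEX = "ci-items"

s3 = boto3.client("s3")
table = boto3.resource("dynamodb").Table(TABLE)
es = Elasticsearch("http://localhost:9200")  # placeholder endpoint

def persist(item_id: str, item: dict) -> None:
    """Archive the raw record to S3, keep a keyed copy in DynamoDB,
    and index it in Elasticsearch for search-driven use cases."""
    # Durable, cheap archive of the full payload.
    s3.put_object(Bucket=BUCKET, Key=f"items/{item_id}.json", Body=json.dumps(item))
    # Low-latency key-value lookup by item id.
    table.put_item(Item={"item_id": item_id, **item})
    # Full-text / structured search over the extracted fields.
    es.index(index=INDEX, id=item_id, document=item)
```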
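A sketch of the kind of data-quality checks such a Glue job might run, written against a plain SparkSession (a real Glue job would typically use GlueContext) with hypothetical S3 paths and field names:

```python
from pyspark.sql import Row, SparkSession, functions as F

spark = SparkSession.builder.appName("data-quality-checks").getOrCreate()

df = spark.read.json("s3://ci-raw-archive/items/")  # hypothetical input path

total = df.count()
metrics = {
    "row_count": total,
    # Share of rows missing a key field.
    "null_url_ratio": df.filter(F.col("url").isNull()).count() / max(total, 1),
    # Duplicate records by primary key.
    "duplicate_item_ids": total - df.select("item_id").distinct().count(),
}

# Write metrics where a QuickSight or Kibana dashboard can pick them up.
spark.createDataFrame([Row(**metrics)]).write.mode("append").json(
    "s3://ci-dq-metrics/daily/"  # hypothetical output path
)
```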
Skills Required:
Proven experience as a Software Development Engineer who has built, deployed, and operationally supported systems in production.
Excellent knowledge of programming languages such as Node.js and Python.
Strong understanding of software design patterns, algorithms, and data structures.
Experience with SQL & NoSQL databases.
Good communication and collaboration skills.
Demonstrates strong ownership and accountability.
Ability to work in a fast-paced and dynamic environment.
Experience writing high-volume, high-TPS, reliable crawlers and scrapers is a plus.
Bachelor's or master's degree in computer science or a related field.