Data Science Platform Administrator - Big Data Tools (8-15 yrs)

Bangalore Job Code: 421422

Role Summary/Purpose:

The Data Science Platform Administrator will be part of the Data Usage CoE and will be responsible for managing enterprise-wide open source environments (Anaconda Enterprise and Desktop, and H2O). This role will also work closely with the business teams to drive strategic programs and functional requirements. The role supports analytics development, collaboration and deployment for data scientists working in Python and R.

Essential Responsibilities :

- Oversee and manage distributed Data Science applications on the Anaconda and H2O platforms.

- Manage end-to-end Platform & Infrastructure requests with minimal direction from Functional Managers.

- Knowledge and experience using a variety of data transformation languages and tools to implement and operate data science infrastructure assets (e.g., Spark, Livy, Scala, Anaconda, R, Python, C, C++, Hadoop, Hive, HBase, Pig, MapReduce and other Hadoop ecosystem components).

- Maintain and update existing Anaconda Enterprise and H2O servers, test client applications to determine compatibility with upgrades, and work with internal customers to resolve any issues.

- Serve as the primary technical point of contact for Anaconda Enterprise environment engagements of moderate to high complexity, including service tickets requiring analysis.

- Participate in capacity planning and performance testing to ensure Anaconda and H2O environments are adequately sized and configured to meet current and projected demand.

- Demonstrated experience developing and managing complex technical projects involving parallel or distributed computing, including Hadoop, the Apache Stack and related technologies.

- Oversee administration guidelines for server upgrades, backups, patching, performance tuning, security and application administration.

- Manage users, groups, licenses and Active Directory integration, including authorization and permissions.

- Understand and propose improvements to underlying data models and infrastructure.

- Clearly communicate solutions to both technical and non-technical teams.

- Develop a set of best practices and share across user groups.

- Stay ahead of new data science capabilities and deliver training as needed to internal functional user groups.

- Perform trend analysis of system logs, audits, management data, etc. to identify performance issues and drive improvements to ensure optimal performance.


- Minimum of 2 years of experience with Big Data tools such as Anaconda, H2O, or Hadoop.

- 2 years of experience using a data lake to support data science work.

- 1 year of experience with package-management systems.

- 1 year of experience with Docker/Kubernetes.

- 1 year of experience gathering and translating end-user requirements into end

