Senior Consultant at Antal International Network
Data Engineer - Big Data/ETL (5-10 yrs)
Job Description :
- A Data Engineer in our client's tech team works on the data pipeline infrastructure that is the backbone of the business. In a day's job you would be writing elegant functional Scala code to crunch TBs of data on Hadoop clusters, mostly using Spark. In a week's job, you would be owning a data pipeline deployment to clusters: on-prem, AWS, Azure, or more.
- In a month's job, you would be managing Hadoop clusters, covering everything from security to reliability to high availability (HA). Did we mention we are building a pluggable, unified data lake from scratch?
- From time to time you will take on new challenges in automating and scaling tasks for the Data Science team. We constantly look to improve our framework and pipelines, so learning on the job is a given. This is the big picture of our big data system.
- Our expertise and requirements include, but are not limited to: Spark, Scala, HDFS, YARN, Hive, Kafka, distributed systems, Python, datastores (relational and NoSQL), and Airflow.
- In short, we are the data enablers for the entire company across all worldwide deployments of our IP. Can it get more exciting?
- Select and integrate any Big Data tools and frameworks required to provide the requested capabilities
- Work with data source providers to understand and validate source data, ensuring it is adequate
- Implement ETL processes and feature engineering
- Create automated end-to-end job flows covering ingestion, transformation, validation, feature engineering, model execution, and data export
- Implement full data lifecycle management
Skills and Qualifications :
- Proficient understanding of distributed computing principles
- Proficiency with Hadoop v2, MapReduce, HDFS
- Experience with Spark
- Experience with integration of data from multiple data sources
- Experience with NoSQL databases, such as HBase, Cassandra, MongoDB
- Knowledge of various ETL techniques and frameworks, such as Flume
- Experience with various messaging systems, such as Kafka or RabbitMQ
- Experience with Big Data ML toolkits, such as Mahout, SparkML, or H2O
- Good understanding of Lambda Architecture, along with its advantages and drawbacks
- Experience with Cloudera
- BS or MS in Computer Science
- At least 5 years of experience as software architect for large-scale enterprise software solutions (must have)
- At least 2 years of hands-on experience in Big Data stacks including Hadoop and Spark (must have)
- Working experience in multicultural/cross-border business environment (must have)
- At least a Master's degree in Computer Science (preferred)
About Us :
Our Client is a Fintech and Artificial Intelligence company that pioneers alternative credit scoring for emerging markets using telecom data and other new data sources.
They aim to provide 1 billion credit scores to unbanked consumers who currently do not have access to formal credit. They target emerging markets worldwide, with an initial focus on Southeast Asia and India. Having commercialized their solution in Vietnam, they are now expanding to other Asian markets including the Philippines, Indonesia, India and Bangladesh. In Vietnam, their clients include some of the top banks and consumer finance companies in the country.