- Hadoop development and implementation.
- Loading from disparate data sets.
- Pre-processing using Hive and Pig.
- Designing, building, installing, configuring and supporting Hadoop.
- Translate complex functional and technical requirements into detailed designs.
- Perform analysis of vast data stores and uncover insights.
- Maintain security and data privacy.
- Create scalable and high-performance web services for data tracking.
- High-speed querying.
- Managing and deploying HBase.
- Define all the possible test cases along with the test data.
- Implement ETL pipelines to support data and analytical needs as per the design.
- Perform unit testing of big data implementations.
- Convert SQL statements to equivalent Pig or Hive queries.
- Design and develop technical solutions on the big data platform.
- Work closely with the Big Data Analytics team (data engineers and data scientists) to quickly and accurately assemble appropriate databases for data mining.
- Define technical requirements.
- Load data into HDFS.
- Manage the Linux directory structure.
- Manage the HDFS framework.
- Manage Hive databases and data extraction.
- Perform data transformation, automate jobs, create complex reports, and set up jobs in production.
- Explore new big data technologies within a Massively Parallel Processing (MPP) environment.
- Mentor junior big data analysts, data engineers, and data scientists working on projects.
- Perform code reviews, suggest improvements, and test code written by other developers.
- Take part in conducting proofs of concept (POCs) by evaluating big data tools.
- Machine Learning and AI – theory and implementation.
- Predictive Analytics – advanced theory and implementation.
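As an illustration of the MapReduce model referenced in the responsibilities above, a toy word count can be sketched in plain Python. This is only a minimal in-process sketch of the map, shuffle, and reduce phases; a real Hadoop job would instead implement Mapper and Reducer classes in Java or use Hadoop Streaming.

```python
from collections import defaultdict

def map_phase(lines):
    # Mapper: emit a (word, 1) pair for every word in every input line.
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)

def shuffle(pairs):
    # Shuffle: group values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reducer: sum the counts emitted for each word.
    return {word: sum(counts) for word, counts in groups.items()}

# Hypothetical input standing in for lines read from HDFS.
lines = ["big data big insights", "data pipelines"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts)  # e.g. {'big': 2, 'data': 2, 'insights': 1, 'pipelines': 1}
```

The same three-phase structure scales out on a cluster because each mapper and reducer operates only on its own slice of the data.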
Required Experience, Skills and Qualifications
- B.Tech/B.E. in Computer Science/Engineering or equivalent relevant experience; should be strong in data structures and algorithms.
- Knowledge of Hadoop best practices and architecture.
- Proficiency in Hadoop, HDFS, MapReduce, Pig, Hive, HiveQL, Impala, HBase, Sqoop, and Oozie.
- Hands-on experience with Spark.
- Hands-on experience with Kafka.
- 2+ years of experience with at least one ETL tool (e.g., Talend, Pentaho, Informatica), preferably Pentaho.
- Experience in Python or R is preferred.
- Knowledge of NoSQL databases is a plus.
- Knowledge of data warehousing concepts is an added advantage.
- Knowledge of Agile methodology is a plus.
- Must be a team player with a positive attitude and the ability to collaborate effectively.
Job Category: Hadoop Developer