Manager Data Science at Wolters Kluwer
Wolters Kluwer - Senior Data Scientist - NLP/R/Python (7-10 yrs)
Senior Data Scientist (with focus on NLP) :
- In this role, you will lead a range of data analytics efforts and monitor and improve the performance of our Natural Language Understanding (NLU) models.
As the main person in charge of analysis and insights, your responsibilities will include :
- Dive deep into data, perform analysis, and discover patterns and root causes to generate insights that drive the product
- Analyze and evaluate the quality of data used for model training and testing
- Present proposals and results in a clear manner backed by data and coupled with actionable conclusions to drive business decisions
- Collaborate with scientists and engineers on data collection and feature design efforts
- Communicate your results to diverse audiences with effective writing and visualizations
BASIC QUALIFICATIONS :
- MSc in Statistics, Physics, Engineering, or related quantitative field
- Experience with analyzing and quantifying data collected through crowdsourcing protocols
- Strong experience with descriptive statistics and visualization tools
- Solid experience with Natural Language Processing (NLP)
- Expertise in Python or another scripting language (R, Perl) and in command-line usage (e.g., Bash); solid experience with SQL
- Knowledge of statistical modelling / machine learning techniques
- Experience with data selection methods: identifying which data to use for which experimental setup
- Excellent communication and organizational skills with significant attention to detail
- Experience with big data tools (Hive, Pig) and familiarity/experience with the AWS technology stack (S3, Redshift)
- Demonstrable track record of dealing well with ambiguity, prioritizing needs, and delivering results in an agile, dynamic environment
NLP: Text Extraction from various sources (MS Word, plain text files, PDF files, HTML pages, etc.), Text Cleaning, Text Pre-Processing, Tokenization, POS tagging, NER, Dependency Parsing, Coreference Resolution, Feature Vector Generation (binary, count, tf-idf, etc.), word2vec, doc2vec, GloVe, RAKE, document similarity (Cosine, Jaccard, etc.), fuzzy text matching, Lexical and Semantic Information Extraction
Techniques: text clustering (k-means, DBSCAN, etc.), text classification (Naive Bayes, MAXENT, SVM, Tree Based models, other ML & Deep Learning models)
Tools : Expert-level Python with packages like NLTK, spaCy, gensim, Pattern, TextBlob, Vocabulary, and Stanford CoreNLP Python wrappers. Text extraction tools like PDFMiner, Apache Tika with Python, PyPDF2, etc. Also pandas, sklearn, numpy, xgboost, matplotlib, keras, etc.
Other Guidelines :
1. The candidate should be very hands-on with various Data Science tools and techniques for NLP and should be ready to work independently as well as manage and mentor a team of junior Data Scientists
2. The candidate should be able to understand the nature of text for different use cases and apply the extraction/cleaning/pre-processing logic to generate the most useful data and features from it in the most efficient manner
3. Candidates with a good grasp of various NL constructs like Parts of Speech, Sentence structures, Subject - Verb - Object relationships, word dependencies (ROOT, compound, etc.) will be preferred
4. Experience working with multilingual data and an understanding of the nuances of working with different language scripts in NLP will be an added advantage