Public summary
We are seeking a Senior Data Scraping Engineer experienced with Python to work remotely on freelance projects involving advanced data extraction and processing. The role involves collaborating with AI systems and human agents to deliver accurate, structured datasets from complex web sources using tools like Apify and OpenRouter. This position offers flexible scheduling and is ideal for professionals skilled in dynamic content scraping, data validation, and automation within a hybrid AI-human environment. English proficiency at B2 level or higher is required.
Salary
USD 37.00 per hour
Responsibilities
- Manage end-to-end data extraction workflows for complex, dynamic websites while ensuring accuracy and completeness.
- Use internal platforms and custom scripts to speed up data collection, validation, and task execution.
- Adapt methods to handle JavaScript-rendered content and changes in website structure.
- Enforce data quality through validation, consistency checks, and adherence to required formats before delivery.
- Scale operations efficiently through batching or parallelization while monitoring for failures.
- Collaborate with AI agents in a hybrid system to improve automation and output quality.
Qualifications
- At least 3 years of experience in data engineering, web scraping, automation, or software development.
- Strong proficiency in Python-based web scraping tools and techniques (e.g., BeautifulSoup, Selenium).
- Experience extracting dynamic content (JavaScript-rendered pages, AJAX requests, infinite scrolling) and working with proxies.
- Proven ability to extract data from complex and inconsistently structured web pages.
- Solid background in data cleaning, normalization, and validation, with output to formats such as CSV, JSON, and Google Sheets.
- Hands-on experience with Large Language Models and AI frameworks is a plus.
- Strong attention to detail and the ability to troubleshoot independently.
- Upper-intermediate (B2) or higher English proficiency.
- A Bachelor’s or Master’s degree in Engineering, Applied Mathematics, Computer Science, or a related field is an advantage.
- A GitHub profile showcasing relevant work is a plus.