Public summary
A remote freelance position for a Python Data Scraping Engineer to manage specialized data extraction workflows that combine AI and human expertise. The role involves developing and maintaining complex web scraping pipelines and ensuring high-quality, accurate data delivery using a range of tools and custom methods. The schedule is flexible and part-time, with performance-based compensation.
Salary
USD 37.00 per hour
Responsibilities
Own end-to-end web data scraping workflows for complex sites, ensuring comprehensive coverage, accuracy, and reliable output of structured datasets.
Use internal tools and custom workflows (e.g., Apify, OpenRouter) to accelerate data gathering, validation, and task execution.
Adapt scraping methods to dynamic and interactive content such as JavaScript-rendered pages and infinite scroll, and integrate with APIs via proxies.
Maintain data quality through validation checks, cross-source consistency verification, and strict adherence to formatting standards.
Scale scraping operations efficiently for large datasets using batching or parallelization, while monitoring system stability and handling changes in site structure.
Qualifications
At least 3 years of experience in data engineering, web scraping, automation, or software development.
Proficiency with Python web scraping libraries and techniques (BeautifulSoup, Selenium, or similar), including handling dynamic web content and APIs.
Demonstrated ability to extract data from complex or inconsistent HTML structures.
Strong skills in data cleaning, normalization, and validation, and in delivering structured datasets in formats such as CSV, JSON, or Google Sheets.
Experience working with large language models and AI frameworks to enhance automation.
Detail-oriented, with a strong commitment to data accuracy.
Ability to work independently and troubleshoot technical challenges.
Upper-intermediate (B2) English proficiency or higher.
A degree in Engineering, Computer Science, Applied Mathematics, or a related field is a plus.
A link to a GitHub portfolio is an advantage.