Mechanical Turk Force Optimization with Automation for Retail

Optimize and automate mundane tasks to reduce headcount by 80% and cut costs

Objectives:
  • Reduce the overall headcount of the mechanical turk force by 80%.
  • Reduce the error rate of manual checks by automating many mundane tasks.
  • Cut the time needed to scrape 3,500 store locators from 45 days to a few days.
  • Reduce dependencies among the Data, Quality Control, and Research teams.
  • Improve the accuracy of data collection and of retail store geolocation by cross-checking results from multiple geocoding tools.
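Cross-checking several geocoders amounts to measuring how far each returned point lies from a consensus location. A minimal sketch, assuming each geocoding service has already returned a (latitude, longitude) pair for the same store address; the geocoder names, the coordinate-wise median consensus, and the 0.5 km tolerance are illustrative assumptions, not the project's actual rules.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius in kilometres


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def cross_check(results, tolerance_km=0.5):
    """Compare one store's coordinates as returned by several geocoders.

    `results` maps a (hypothetical) geocoder name to a (lat, lon) tuple.
    The consensus is the coordinate-wise median; geocoders farther than
    `tolerance_km` from it are returned as outliers for review.
    """
    consensus = (median(lat for lat, _ in results.values()),
                 median(lon for _, lon in results.values()))
    outliers = sorted(name for name, point in results.items()
                      if haversine_km(point, consensus) > tolerance_km)
    return consensus, outliers
```

The coordinate-wise median is a simplification that breaks down for stores near the antimeridian.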

Challenges:
  • Treselle had to take over from the original 47-member team and reduce the headcount to 10.
  • Treselle received only three one-hour knowledge transfer (KT) sessions from the original team.
  • 70% of the store locators were scraped manually, at frequencies that depended on each locator's importance.
  • Complex map-based, search-based, and infinite-scroll store locators took up most of the scraping team's time.
  • Most store locators changed their page layouts before the next scheduled scrape, leaving the existing scraping setup outdated.
  • Because of the team's inefficiency, only 30% of the store locators were processed monthly; the rest were processed less often, such as every 3 or 6 months.
  • 70% of the team worked on data quality checks of the scraped store locators and on mapping stores to their respective banners.
  • Identifying stores that had opened, closed, or relocated was tedious and error-prone because it relied on manually comparing historical scraped files with newly scraped data.
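The comparison in the last point is, at its core, a keyed difference between two snapshots of the same locator. A minimal sketch, assuming each scraped store carries a key that stays stable between scrapes (a hypothetical locator-provided store number) and that a changed address means a relocation; `normalize_address` and `classify_store_changes` are illustrative names. When no stable key exists, records first have to be paired with a fuzzy matcher.

```python
import re


def normalize_address(text):
    """Lower-case, drop punctuation, and collapse whitespace so cosmetic edits don't look like moves."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def classify_store_changes(previous, current):
    """Compare two scrapes of one store locator.

    Both arguments map a store key (assumed stable between scrapes) to the
    store's address string. Returns sorted key lists per status.
    """
    prev_keys, curr_keys = set(previous), set(current)
    common = prev_keys & curr_keys
    moved = {k for k in common
             if normalize_address(previous[k]) != normalize_address(current[k])}
    return {
        "opened": sorted(curr_keys - prev_keys),   # only in the new scrape
        "closed": sorted(prev_keys - curr_keys),   # only in the old scrape
        "relocated": sorted(moved),                # same key, different address
        "unchanged": sorted(common - moved),
    }
```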

Solution:
  • After the KT sessions, the Treselle team mapped the scraping and data quality check processes in a flowchart and identified the inefficient and manual stages.
  • Treselle's data engineering team automated 90% of store locator scraping with frameworks such as Selenium and Scrapy, and wrote custom programs that scrape map- and search-based locators by interpreting the POST calls those pages make.
  • Wrote programs that automate 80% of the manual data quality checks, including duplicate store checks, detection of data misplaced into the wrong columns of the stored files, keyword checks, store count checks, store split processes, store geocoding validation, and a dozen more.
  • Applied natural language processing and record linkage algorithms, such as a Naïve Bayes cost-minimizing probabilistic record matcher, Levenshtein distance (FuzzyWuzzy partial ratio), and Smith-Waterman local string alignment, to perform entity identification, linkage, and disambiguation and so determine each store's status.
  • Reduced 37 manual data quality checks to 4, automating the remaining 33 with custom programs.
  • Historical store locator files were stored in MongoDB collections for easy comparison with new locator results. The raw Excel files produced by the scraping process were kept unmodified in S3 for audit purposes.
  • Within 2 months of taking over, the overall process went from 45 days to 12 business days, and 100% of the locators were processed within the same month.
  • Used AWS EC2 Spot Instances to run scraping jobs on multiple instances in parallel, shutting the instances down as soon as the jobs finished to save cost.
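Map- and search-based locators often answer a query point plus a radius, so one way to replay their POST calls is to tile the coverage area with query points and send one request per point. The sketch below assumes a JSON API; the endpoint, the `latitude`/`longitude`/`radius` field names, and the helpers `grid_points` and `build_search_request` are hypothetical stand-ins for whatever the browser's network tab shows for a given locator.

```python
import json
import urllib.request


def grid_points(south, west, north, east, step_deg):
    """Yield (lat, lon) query centres tiling a bounding box at a fixed step in degrees."""
    # Integer counts avoid floating-point drift from repeated additions.
    n_lat = int((north - south) / step_deg + 1e-9) + 1
    n_lon = int((east - west) / step_deg + 1e-9) + 1
    for i in range(n_lat):
        for j in range(n_lon):
            yield (round(south + i * step_deg, 6), round(west + j * step_deg, 6))


def build_search_request(endpoint, lat, lon, radius_miles):
    """Build (but do not send) the POST a locator's search box might issue.

    The JSON field names are hypothetical; in practice they are copied from
    the request observed in the browser's developer tools.
    """
    payload = json.dumps({"latitude": lat, "longitude": lon,
                          "radius": radius_miles}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Each request can then be sent with `urllib.request.urlopen(req)`. The grid step should be small enough that neighbouring search circles overlap, and stores returned by more than one query are de-duplicated afterwards.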
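A few of the automated checks named above (duplicate stores, column misplacement, store counts) can be sketched as plain functions over the rows of one scraped file. This assumes hypothetical column names (`store_id`, `address`, `city`, `state`, `zip`), US ZIP codes as the misplacement signal, and a 10% tolerance on the store count; it is an illustration, not the project's actual rule set.

```python
import re
from collections import Counter

US_ZIP = re.compile(r"^\d{5}(-\d{4})?$")


def run_quality_checks(rows, expected_count=None, count_tolerance=0.1):
    """Return human-readable issues found in one scraped store locator file.

    `rows` is a list of dicts with hypothetical keys:
    store_id, address, city, state, zip.
    """
    issues = []

    # Duplicate store check: the same store_id appearing more than once.
    for store_id, n in sorted(Counter(r["store_id"] for r in rows).items()):
        if n > 1:
            issues.append(f"duplicate store_id {store_id} ({n} rows)")

    # Column misplacement check: a zip value that is not a US ZIP code
    # usually means fields shifted during scraping or export.
    for i, r in enumerate(rows):
        if not US_ZIP.match(r["zip"].strip()):
            issues.append(f"row {i}: zip column holds {r['zip']!r}")

    # Store count check: flag a large swing against the expected count
    # (for example, the previous run's count).
    if expected_count:
        change = abs(len(rows) - expected_count) / expected_count
        if change > count_tolerance:
            issues.append(f"store count {len(rows)} differs from expected "
                          f"{expected_count} by {change:.0%}")
    return issues
```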
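Levenshtein distance counts the single-character edits needed to turn one string into another. The sketch below implements it directly, rescales it to a 0-100 score, and adds a sliding-window `partial_similarity` that compares the shorter string with every equal-length window of the longer one. The window idea is in the spirit of FuzzyWuzzy's partial ratio, but this is a simplified assumption, not FuzzyWuzzy's exact algorithm; the function names are illustrative.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Levenshtein distance rescaled to 0-100, where 100 means identical strings."""
    if not a and not b:
        return 100
    return round(100 * (1 - levenshtein(a, b) / max(len(a), len(b))))


def partial_similarity(a, b):
    """Case-insensitive best similarity between the shorter string and any
    equal-length window of the longer one (a simplified partial ratio)."""
    short, long_ = sorted((a.lower(), b.lower()), key=len)
    if not short:
        return 100 if not long_ else 0
    width = len(short)
    return max(similarity(short, long_[k:k + width])
               for k in range(len(long_) - width + 1))
```

For example, `partial_similarity("Walmart Supercenter #1234", "walmart")` scores 100 even though the full strings differ a lot, which is the property that makes partial scores useful when mapping store names to banners.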
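Smith-Waterman finds the highest-scoring local alignment between two strings, so a short name or address fragment can match inside a longer one even when the surrounding text differs. A minimal sketch with a linear gap penalty; the match, mismatch, and gap scores are illustrative defaults, not values from the project.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best local alignment score, aligned part of a, aligned part of b)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    def pair(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the scoring matrix; scores never drop below zero (local alignment).
    for i in range(1, rows):
        for j in range(1, cols):
            H[i][j] = max(0, H[i - 1][j - 1] + pair(i, j),
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero cell ends the alignment.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```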
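The case study does not describe the internals of its Naïve Bayes record matcher. One standard formulation is the Fellegi-Sunter model: assuming field agreements are independent given the match status (the naive Bayes assumption), per-field log-likelihood ratios add up, and a minimum-expected-cost rule declares a match when the posterior odds exceed the ratio of the two error costs. The field list, the m/u probabilities, the prior, and the costs below are all hypothetical.

```python
import math

# Hypothetical per-field probabilities: m = P(field agrees | same store),
# u = P(field agrees | different stores). Real values would be estimated
# from labelled store pairs.
FIELD_PROBS = {
    "name": (0.95, 0.10),
    "street": (0.90, 0.01),
    "zip": (0.98, 0.05),
    "phone": (0.85, 0.001),
}


def log_likelihood_ratio(agreements):
    """Sum of per-field log2 weights; valid under the conditional-independence assumption."""
    total = 0.0
    for field, agrees in agreements.items():
        m, u = FIELD_PROBS[field]
        total += math.log2(m / u) if agrees else math.log2((1 - m) / (1 - u))
    return total


def decide(agreements, prior_match=0.001, cost_false_match=1.0, cost_missed_match=1.0):
    """Minimum-expected-cost decision: match when posterior odds exceed the cost ratio."""
    prior_log_odds = math.log2(prior_match / (1 - prior_match))
    posterior_log_odds = prior_log_odds + log_likelihood_ratio(agreements)
    return posterior_log_odds > math.log2(cost_false_match / cost_missed_match)
```

Raising `cost_false_match` above `cost_missed_match` makes the matcher more conservative; pairs close to the threshold could be routed to the few remaining manual checks.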

Entity Matching & Resolution Flow: