Coflnet

Case Study: Development of a Scraping System

Flou
#case-study #scraping #big data #ai
  • In 2023, Coflnet took on a project for an advertising company in northern Germany with around 10 employees. The company had been using a web scraping system that had served it well for years and had collected millions of articles from various websites. Over time, however, the old system reached its limits: it became unreliable and inefficient, and it could no longer properly scrape the pages that mattered most to the company.

    The company needed a new, more efficient solution that could specifically scrape the relevant websites. This is where Coflnet came in.

    We had already developed a system for this client with great success, so it was a natural step for us to take on this project as well.

    The problem

    The central challenge was not building something from scratch, but migrating away from an old, unreliable system. The existing scraping system, which the company had purchased earlier, was overwhelmed. Too often, important information from sites was not captured correctly, leading to inaccurate data and, later, wrong decisions. However, because several million articles had already been collected with the old system, it could not simply be abandoned.

    The task was to carry out this migration without causing major downtime, while at the same time developing a new, optimized scraping system that focused on the websites relevant to the advertising company.

    The approach

    Coflnet approached the project with two clear objectives: to keep the existing data reliable and accessible, and to develop a new scraping system that was faster, more robust and better tailored to the needs of the client.

    1. Development of a new scraping system: We built a completely new scraping engine tailored to the dynamic and constantly changing structure of websites. This system was specifically optimized to scrape the websites most important to the advertising company.

      • The scrapers were designed to be more adaptable and better able to cope with layout changes or popups.
      • The system was also optimized for speed so that new content could be scraped and saved quickly,
        providing the advertising team with up-to-date data. This matters because the most relevant articles
        for the advertising company are always the most recently published ones.
    2. Compatibility with the old system: The data generated by the new system had to remain compatible with the old system. It is therefore transferred to the target system via the same interface that the old scraping system used.
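
    The two points above can be illustrated with a minimal Python sketch. It is not Coflnet's actual implementation: the extraction rules, the `Article` type, the newest-first ordering and the legacy field names (`link`, `headline`, `date`) are all assumptions made for illustration. The idea is that several extraction rules are tried in order, so a layout change that breaks one rule does not necessarily break the scraper, and that results are converted into the record shape the old system is assumed to have produced before they are handed to the existing interface.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Optional


def _first_group(pattern: str, html: str) -> Optional[str]:
    """Return the first capture group of `pattern`, with inner tags stripped."""
    match = re.search(pattern, html, flags=re.IGNORECASE | re.DOTALL)
    if not match:
        return None
    text = re.sub(r"<[^>]+>", "", match.group(1)).strip()
    return text or None


# Hypothetical extraction rules, tried in order. If a redesign removes the
# element one rule relies on, a later rule may still find the headline.
TITLE_RULES: list[Callable[[str], Optional[str]]] = [
    lambda html: _first_group(r'<meta\s+property="og:title"\s+content="([^"]+)"', html),
    lambda html: _first_group(r"<h1[^>]*>(.*?)</h1>", html),
    lambda html: _first_group(r"<title[^>]*>(.*?)</title>", html),
]


def extract_title(html: str) -> Optional[str]:
    """Try each rule in turn and return the first non-empty result."""
    for rule in TITLE_RULES:
        title = rule(html)
        if title:
            return title
    return None


@dataclass
class Article:
    url: str
    title: str
    published_at: datetime


def newest_first(articles: list[Article]) -> list[Article]:
    """Order articles so the most recently published ones are handled first."""
    return sorted(articles, key=lambda a: a.published_at, reverse=True)


def to_legacy_record(article: Article) -> dict:
    """Map an Article onto the (assumed) record format of the old system."""
    return {
        "link": article.url,
        "headline": article.title,
        "date": article.published_at.strftime("%Y-%m-%d %H:%M:%S"),
    }
```

    In a setup like this, downstream consumers only ever see the legacy record format, so the scraping engine behind it can be replaced without changes on the receiving side.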

    Results

    The migration to the new scraping system was a complete success. The advertising company now has a fast, reliable and highly targeted solution that focuses on the websites that are most important to them.

    Outlook

    The new system continues to support the advertising company in its growth. With a scalable database and a flexible scraping engine, the company is well equipped to process more data, expand into new markets and adapt to changes in the digital world.

    This project demonstrates the importance of not only developing new technologies, but also migrating old systems effectively to ensure continuity, availability and reliability.
