Automating Internet Research with Apify and n8n
This article explores a system for automating internet research using Apify and n8n, demonstrating how to scrape data, make API calls, and generate reports delivered directly to your Slack channel. This automation can be applied to various tasks like gathering daily news, competitor updates, and lead generation, saving you significant time and effort.
Introduction: The Need for Automated Research
Staying informed on the latest AI news and trends
For an AI development company, staying informed on the latest technology, trends, and news is crucial. This requires constant research, which can be time-consuming. The system described here automates that research, delivering a daily report with relevant information directly to a designated Slack channel. This eliminates the need to manually search for articles, significantly streamlining the research workflow.
Demo: Automated AI News Delivery to Slack
The demonstration showcases a system that automatically researches the latest AI trends every morning and delivers a report to a Slack channel. The report contains links to relevant articles, eliminating a manual step in the research process.
For example, the Slack report displayed articles concerning "AI's growing role across B2B payments," "a revolution in robotics," and "AI-powered warehouses," all with direct links to the source material. This demonstrates how the automation gathers relevant articles and presents them in an easy-to-digest format, ready for review. The system is flexible and can be configured to deliver reports via email or WhatsApp as well. Beyond just linking to articles, the system can be further enhanced to summarize the retrieved information, offer deeper analysis using platforms like Perplexity, or trigger specific actions based on the data found.
Building the Workflow: Scraping with Apify and n8n
Visualizing the workflow in a Miro board
To illustrate the building process, the workflow is first visualized on a Miro board. The system utilizes Apify, a web scraping and automation platform, and n8n, a workflow automation tool.
The process begins with identifying a recurring task or frequently searched data. In the example, the presenter’s recurring task is researching emerging technologies and trends in AI.
First, determine what needs to be automated. Several departmental examples are provided:
- Competitor Research: Monitor competitor updates, pricing changes, and product launches.
- Market Research: Analyze customer reviews for pain points, track product prices across competitor websites, and monitor changes in compliance documents.
- Industry Monitoring: Scrape articles about AI automation, track trending topics on Google News, and monitor industry funding and investments.
- Lead Generation: Scrape contact information from various sources.
- Content Creation: Aggregate information from various sources for content creation.
Next, identify whether Apify has existing "actors" (prebuilt scraping modules exposed through APIs) applicable to the task. For example, scraping Google News for AI automation trends could leverage the Google News scraper actor.
The workflow in n8n begins with a "Cron" node, scheduling the automation to run daily at 9:00 AM. This node triggers the entire process each morning.
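For reference, this schedule corresponds to the standard cron expression `0 9 * * *` (minute 0, hour 9, every day of every month), which the Cron node accepts when set to a custom schedule.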
Making the API Call: Connecting n8n to Apify
Setting up the HTTP request node in n8n
The system interacts with Apify via HTTP requests in n8n. A dedicated "HTTP Request" node is used for this purpose.
Here's a detailed breakdown:
- API Endpoint: The API endpoint for the Apify actor is entered into the URL field of the HTTP Request node. This endpoint, along with the necessary API key, initiates the data scraping process on Apify. It can be found on the Apify platform under API -> Endpoints. For this specific workflow the endpoint is `api.apify.com/v2/actor-runs`.
- Headers: The HTTP Request node includes headers to specify the content type. For this workflow, `Content-Type` is set to `application/json`, indicating that n8n and Apify exchange JSON.
- Body: The body of the API call is formatted in JSON and defines the scraping parameters for the chosen Apify actor, including the search query ("Emerging AI Technologies") and the start and end dates for the search.
- Authentication: The Apify API key is included either in the header parameters or directly in the endpoint URL of the HTTP Request node to authenticate the call, allowing n8n to securely run the scraper on Apify. The memory allocation (`&memory=1024`, i.e. 1024 MB allocated to the actor) is appended to the URL as a query parameter. A sketch of the equivalent request in code follows this list.
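As a minimal sketch of that configuration in Python: the video's node targets `api.apify.com/v2/actor-runs`, while the example below uses Apify's documented run-actor endpoint (`POST https://api.apify.com/v2/acts/{actorId}/runs`), which starts a run the same way. The actor ID, token placeholder, and body field names here are illustrative assumptions; check your actor's API tab on Apify for the exact values.

```python
import requests

APIFY_TOKEN = "YOUR_APIFY_API_KEY"  # your Apify API token
ACTOR_ID = "username~google-news-scraper"  # hypothetical actor ID; use your own

# Token and memory allocation travel as URL query parameters,
# mirroring the &memory=1024 shown in the HTTP Request node.
run_url = (
    f"https://api.apify.com/v2/acts/{ACTOR_ID}/runs"
    f"?token={APIFY_TOKEN}&memory=1024"
)

# JSON body defining the scraping parameters; the field names are
# illustrative, since each actor documents its own input schema.
payload = {
    "query": "Emerging AI Technologies",
    "dateFrom": "2025-01-01",
    "dateTo": "2025-01-02",
}

response = requests.post(
    run_url,
    headers={"Content-Type": "application/json"},
    json=payload,
)
response.raise_for_status()

run = response.json()["data"]
print("Run ID:", run["id"])  # needed by the follow-up calls
```

In n8n this entire call lives in a single HTTP Request node; the sketch simply makes the moving parts explicit.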
Accessing the System and Utilizing the Findings
The completed n8n workflow is showcased, showing the connections and the order of tasks executed: a Cron node, a Set node, multiple HTTP Request nodes, and a Slack node. The system can be further customized. For instance, after retrieving the links, a separate node could use a service like Perplexity to summarize the articles before posting to the Slack channel, as sketched below.
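For illustration, such a summarization step could be a single HTTP call. The sketch below assumes Perplexity's OpenAI-compatible chat-completions endpoint (`https://api.perplexity.ai/chat/completions`); the model name and prompt are assumptions to verify against Perplexity's current documentation.

```python
import requests

PERPLEXITY_API_KEY = "YOUR_PERPLEXITY_API_KEY"

def summarize_article(url: str) -> str:
    """Ask Perplexity for a short summary of one article link (illustrative)."""
    response = requests.post(
        "https://api.perplexity.ai/chat/completions",
        headers={"Authorization": f"Bearer {PERPLEXITY_API_KEY}"},
        json={
            "model": "sonar",  # model name is an assumption; check current docs
            "messages": [
                {
                    "role": "user",
                    "content": f"Summarize this article in three bullet points: {url}",
                }
            ],
        },
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]
```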
Access to the complete workflow, including setup instructions, is available within a paid community linked in the video description. The content creator also invites business owners seeking custom AI solutions to apply via a link, also provided in the description.
Building this workflow entails three primary HTTP requests (a sketch of the full sequence follows the list):
- Initiating the Apify Actor: This call starts the scrape, passing custom settings such as the search topic and date parameters as JSON, and returns an actor run ID.
- Retrieving the Dataset ID: Using the run ID from step one, a second call retrieves the dataset ID, which grants access to the scraped data.
- Fetching Data and Sending to Slack: Finally, using the dataset ID from step two, a third call fetches all the data, which is then parsed and sent to Slack as the daily report.
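A minimal Python sketch of this three-call sequence, continuing from the run started in the earlier example, might look as follows. The run-details and dataset-items endpoints follow Apify's v2 API, the Slack incoming webhook is a placeholder standing in for the Slack node used in the video, and the field names in the scraped items are assumptions tied to the chosen actor.

```python
import time
import requests

APIFY_TOKEN = "YOUR_APIFY_API_KEY"
run_id = "RUN_ID_FROM_STEP_ONE"  # returned by the first call (see earlier sketch)

# Step 2: poll the run until it finishes, then read its default dataset ID.
while True:
    info = requests.get(
        f"https://api.apify.com/v2/actor-runs/{run_id}?token={APIFY_TOKEN}"
    ).json()["data"]
    if info["status"] in ("SUCCEEDED", "FAILED", "ABORTED", "TIMED-OUT"):
        break
    time.sleep(10)  # in n8n, a Wait node between requests plays this role

dataset_id = info["defaultDatasetId"]

# Step 3: fetch the scraped items and post a digest to Slack.
items = requests.get(
    f"https://api.apify.com/v2/datasets/{dataset_id}/items?token={APIFY_TOKEN}"
).json()

# Field names such as "title" and "link" depend on the actor's output schema.
report = "\n".join(f"- {item.get('title')}: {item.get('link')}" for item in items)

# Placeholder incoming-webhook URL, standing in for the Slack node in n8n.
SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/YOUR/WEBHOOK/URL"
requests.post(SLACK_WEBHOOK_URL, json={"text": f"Daily AI report:\n{report}"})
```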
This modular approach allows adaptation to other scraping tasks by simply changing the Apify actor and associated parameters within the workflow. The system could also be expanded to send the report to various platforms like email, WhatsApp, or Google Sheets, demonstrating broader practical application.