Claude Reddit Scraper:
ClaudeRedditScraper is a powerful tool designed to extract data from Reddit, leveraging the capabilities of Claude Code with Playwright MCP. This node allows you to scrape Reddit posts and comments efficiently, providing a structured way to gather information from various Reddit sources such as URLs, subreddits, search queries, or user profiles. It is particularly beneficial for users who need to analyze Reddit data for insights, trends, or content creation. The node automatically configures the scraping process and handles data extraction, making it accessible even to those without a technical background. By using this node, you can obtain detailed metadata, posts, and comments, which are organized in a clean JSON format, facilitating easy integration into your projects or analyses.
Claude Reddit Scraper Input Parameters:
source_type
This parameter specifies the type of Reddit source you want to scrape. It can be a URL, subreddit, search query, or user profile. The choice of source type determines the scope and focus of the data extraction. The default value is "subreddit", which is ideal for gathering data from specific communities. This parameter is crucial as it defines the starting point of your scraping task.
source
The source parameter is a string that represents the specific Reddit URL, subreddit name, search query, or username you wish to scrape. It acts as the target for the scraping operation. The default value is "programming", which can be changed to any valid Reddit identifier based on your needs. This parameter directly impacts the content and relevance of the data collected.
scrape_mode
This parameter determines what type of content you want to scrape from Reddit. Options include "comments", "posts", "both", and "metadata". The default setting is "comments", which focuses on extracting user discussions. Choosing the appropriate scrape mode is essential for tailoring the data to your specific requirements, whether you're interested in user interactions, content, or metadata.
max_items
The max_items parameter sets the maximum number of items to scrape, ranging from 1 to 100. The default value is 10. This parameter controls the volume of data collected, allowing you to manage the scope of your scraping task and ensure it aligns with your data processing capabilities.
model
This parameter specifies the Claude model to use during the scraping process. Available options are "default", "sonnet", and "opus", with "sonnet" as the default choice. The model selection can influence the efficiency and accuracy of the scraping operation, as different models may have varying strengths in handling specific types of data or tasks.
sort_by
The sort_by parameter allows you to define the sorting order of the scraped content. Options include "hot", "new", "top", etc., with "hot" as the default. This parameter affects the prioritization of content, enabling you to focus on the most relevant or popular items based on your objectives.
time_filter
This parameter sets the time frame for the content to be scraped, such as "day", "week", "month", etc. The default is "day". It helps in narrowing down the data to a specific period, which is useful for analyzing trends or changes over time.
include_metadata
A boolean parameter that determines whether to include metadata in the scraped data. The default is True. Including metadata can provide additional context and insights, such as timestamps and source details, enhancing the value of the collected data.
max_comment_depth
This parameter specifies the maximum depth of comments to scrape, with a default value of 2. It controls how deep into comment threads the scraper will go, which is important for capturing the full scope of discussions without overwhelming the data set with excessive detail.
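The effect of max_comment_depth can be illustrated with a depth-limited traversal over a nested comment tree. This is only a sketch: the tree structure and field names ("body", "replies") are assumptions for illustration, not the node's internal representation.

```python
def collect_comments(comment, max_depth, depth=1):
    """Collect a comment and its replies down to max_depth levels."""
    if depth > max_depth:
        return []
    collected = [comment["body"]]
    for reply in comment.get("replies", []):
        collected.extend(collect_comments(reply, max_depth, depth + 1))
    return collected

thread = {
    "body": "top-level comment",
    "replies": [
        {"body": "reply", "replies": [
            {"body": "nested reply", "replies": []},
        ]},
    ],
}

# With the default max_comment_depth of 2, the third-level reply is excluded.
print(collect_comments(thread, max_depth=2))  # ['top-level comment', 'reply']
```

Raising the depth to 3 would also capture the nested reply, at the cost of a larger data set.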
memory
An optional parameter that can be used to store intermediate data or states during the scraping process. It is useful for maintaining continuity in complex scraping tasks or when integrating with other systems.
previous_output
This optional parameter allows you to input data from a previous scraping operation, facilitating incremental data collection or updates. It is beneficial for ongoing projects where data needs to be refreshed or expanded over time.
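As a sketch of how previous_output might support incremental collection, the snippet below merges a newly scraped batch into earlier results, deduplicating by post id. The item shape and the "id" field are assumptions; the node's actual output keys may differ.

```python
def merge_results(previous_output, new_items):
    """Merge newly scraped items into previous results, skipping duplicates by id."""
    seen = {item["id"] for item in previous_output}
    merged = list(previous_output)
    for item in new_items:
        if item["id"] not in seen:
            merged.append(item)
            seen.add(item["id"])
    return merged

previous = [{"id": "abc1", "title": "First post"}]
fresh = [
    {"id": "abc1", "title": "First post"},   # already collected, skipped
    {"id": "abc2", "title": "Second post"},  # new, appended
]
print(len(merge_results(previous, fresh)))  # 2
```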
unique_id
A string parameter that assigns a unique identifier to the scraping task. This is useful for tracking and managing multiple scraping operations, ensuring that data is organized and easily retrievable.
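Taken together, a typical set of inputs might look like the dictionary below. This is a hypothetical illustration of the parameters described above, not the node's actual invocation API.

```python
# Hypothetical input configuration using the documented defaults.
scraper_inputs = {
    "source_type": "subreddit",   # "url", "subreddit", "search", or "user"
    "source": "programming",      # target subreddit name
    "scrape_mode": "comments",    # "comments", "posts", "both", or "metadata"
    "max_items": 10,              # allowed range: 1 to 100
    "model": "sonnet",            # "default", "sonnet", or "opus"
    "sort_by": "hot",
    "time_filter": "day",
    "include_metadata": True,
    "max_comment_depth": 2,
    "unique_id": "reddit-scrape-001",
}
```

Changing source_type to "search" and source to a query string, for example, would redirect the same configuration at Reddit search results instead of a single subreddit.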
Claude Reddit Scraper Output Parameters:
output
The output parameter is a dictionary containing the folder name, response summary, and metadata of the scraping operation. It provides a comprehensive overview of the task, including the location of saved data and a summary of the extracted content, which is essential for further analysis or reporting.
scraped_data
This parameter holds the actual data extracted from Reddit, structured in a clean JSON format. It includes posts, comments, and any additional information based on the scrape mode selected. This data is crucial for any subsequent processing, analysis, or integration into other applications.
summary
The summary parameter provides a concise overview of the scraping results, highlighting key findings or notable content. It serves as a quick reference for understanding the scope and outcome of the scraping task.
item_count
This parameter indicates the total number of items successfully scraped during the operation. It is useful for verifying the completeness of the data collection and ensuring that the task met its objectives.
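As a sketch of how these outputs might be consumed downstream, the snippet below reads a result shaped like the four parameters above. The exact keys and nesting are assumptions made for illustration.

```python
import json

# Hypothetical result mirroring the documented output parameters.
result = {
    "output": {
        "folder_name": "reddit_scrape_001",
        "response_summary": "Scraped 2 posts from r/programming",
    },
    "scraped_data": [
        {"title": "Post A", "comments": ["great write-up"]},
        {"title": "Post B", "comments": []},
    ],
    "summary": "2 posts from r/programming, sorted by hot",
    "item_count": 2,
}

# Sanity check: the reported count should match the data actually returned.
assert result["item_count"] == len(result["scraped_data"])

# The scraped_data is already JSON-serializable, so it can be saved directly.
with open("scraped_data.json", "w") as f:
    json.dump(result["scraped_data"], f, indent=2)
```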
Claude Reddit Scraper Usage Tips:
- To optimize performance, choose the appropriate scrape_mode based on your specific needs, such as focusing on "comments" for user interaction analysis or "metadata" for subreddit insights.
- Utilize the max_items parameter to control the volume of data collected, ensuring it aligns with your processing capabilities and project requirements.
- Experiment with different model options to find the one that best suits your data extraction needs, as different models may offer varying efficiencies and accuracies.
Claude Reddit Scraper Common Errors and Solutions:
Invalid source_type
- Explanation: The source_type parameter is not set to a valid option such as "url", "subreddit", "search", or "user".
- Solution: Ensure that the source_type is correctly specified and matches one of the allowed options.
Exceeded max_items limit
- Explanation: The max_items parameter exceeds the allowed range of 1 to 100.
- Solution: Adjust the max_items value to fall within the specified range to ensure proper execution.
Missing source parameter
- Explanation: The source parameter is not provided or is invalid, leaving the scraper unable to locate its target.
- Solution: Verify that the source parameter is correctly specified with a valid Reddit URL, subreddit name, search query, or username.
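The three errors above can be caught before a run with a small validation step, sketched below. The parameter names follow the descriptions in this document; the node's own validation logic may differ.

```python
VALID_SOURCE_TYPES = {"url", "subreddit", "search", "user"}

def validate_inputs(source_type, source, max_items):
    """Raise ValueError for the common misconfigurations described above."""
    if source_type not in VALID_SOURCE_TYPES:
        raise ValueError(f"Invalid source_type: {source_type!r}")
    if not source:
        raise ValueError("Missing source parameter")
    if not 1 <= max_items <= 100:
        raise ValueError(f"max_items must be between 1 and 100, got {max_items}")

validate_inputs("subreddit", "programming", 10)  # passes silently
```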
