Twitter Scraper

The Twitter Scraper component fetches tweets from Twitter that match a specified hashtag or keyword. It can fetch historical tweets or stream tweets in real time, which makes it useful for:

  • Social media sentiment analysis

  • Brand monitoring

  • Event tracking and trend analysis

Configuration Sections

The component's configuration is organized into the following sections:

  • Basic Information

  • Meta Information

  • Resource Configuration

  • Connection Validation

Basic Information Tab

The Basic Information tab defines general execution settings.

Field | Description | Required
Invocation Type | Select execution mode: Batch or Real-Time. | Yes
Deployment Type | Displays the deployment type of the component (pre-selected). | Yes
Container Image Version | Displays the Docker image version used (pre-selected). | Yes
Failover Event | Select a failover event to handle retries or errors. | Optional
Batch Size | Maximum number of records processed in one execution cycle (minimum: 10). | Yes
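
The constraints in this tab can be checked before a configuration is saved. The following is a minimal sketch, not the component's actual implementation: the `BasicInfo` class, the `validate_basic_info` function, and the constant names are hypothetical, and only the allowed Invocation Type values and the minimum Batch Size of 10 come from the field descriptions above.

```python
from dataclasses import dataclass
from typing import List, Optional

# Values taken from the field descriptions above; all names are hypothetical.
MIN_BATCH_SIZE = 10
INVOCATION_TYPES = ("Batch", "Real-Time")


@dataclass
class BasicInfo:
    invocation_type: str
    batch_size: int
    failover_event: Optional[str] = None  # the only optional field in this tab


def validate_basic_info(cfg: BasicInfo) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    if cfg.invocation_type not in INVOCATION_TYPES:
        errors.append(f"Invocation Type must be one of {INVOCATION_TYPES}")
    if cfg.batch_size < MIN_BATCH_SIZE:
        errors.append(f"Batch Size must be at least {MIN_BATCH_SIZE}")
    return errors


print(validate_basic_info(BasicInfo("Batch", 50)))   # no problems reported
print(validate_basic_info(BasicInfo("Hourly", 5)))   # two problems reported
```

Returning a list of problems rather than raising on the first one lets a form report every invalid field at once.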

Meta Information Tab

The Meta Information tab defines authentication and query parameters for fetching tweets.

Field | Description | Required
Consumer API Key | API key provided by Twitter Developer Portal. | Yes
Consumer API Secret Key | Secret key (acts as password) associated with the Consumer API Key. | Yes
Filter Text | Hashtag or keyword to filter tweets (e.g., #AI, #BigData). | Yes
Twitter Data Type | Select one of the following options: History (fetch past tweets) or Real-Time (fetch live tweets as they are posted). | Yes
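
Because the Consumer API Secret Key acts as a password, it is worth keeping it out of logs and printed output. The sketch below illustrates one way to model these four fields under that assumption. The `MetaInfo` class and `validate_meta_info` function are hypothetical names, not part of the component; only the field names and the two Twitter Data Type options come from the descriptions above.

```python
from dataclasses import dataclass, field
from typing import List

# Options listed in the Twitter Data Type description above.
DATA_TYPES = ("History", "Real-Time")


@dataclass
class MetaInfo:
    consumer_api_key: str
    # repr=False keeps the secret out of printed or logged representations.
    consumer_api_secret_key: str = field(repr=False)
    filter_text: str = ""
    twitter_data_type: str = "History"


def validate_meta_info(cfg: MetaInfo) -> List[str]:
    """Return a list of problems; all four fields are required."""
    errors = []
    if not cfg.consumer_api_key.strip():
        errors.append("Consumer API Key is required")
    if not cfg.consumer_api_secret_key.strip():
        errors.append("Consumer API Secret Key is required")
    if not cfg.filter_text.strip():
        errors.append("Filter Text is required")
    if cfg.twitter_data_type not in DATA_TYPES:
        errors.append(f"Twitter Data Type must be one of {DATA_TYPES}")
    return errors


cfg = MetaInfo("xxxxx", "yyyyy", "#AI", "History")
print(cfg)                      # the secret does not appear in the output
print(validate_meta_info(cfg))  # no problems reported
```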

Saving the Configuration

  1. Enter API credentials and filter details in the Meta Information tab.

  2. Click Save Component (the Storage icon).

  3. A success message confirms that the component properties have been saved.

  4. Activate the pipeline to start fetching tweets.

Example Workflow

  1. Configure the Twitter Scraper with:

    • Consumer API Key: xxxxx

    • Consumer API Secret Key: yyyyy

    • Filter Text: #ClimateChange

    • Twitter Data Type: Real-Time

  2. Start the pipeline.

  3. Tweets containing #ClimateChange are ingested into the pipeline and passed to downstream components for sentiment analysis and dashboard visualization.
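
The workflow above can be pictured as a filter followed by batching. The sketch below is an illustration only and does not call Twitter or reflect the component's internals: `matches_filter`, `batched`, and `ingest` are hypothetical helpers, the case-insensitive substring match is an assumption, and the tweets are made-up sample data standing in for a live stream.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def matches_filter(tweet_text: str, filter_text: str) -> bool:
    # Assumption: a case-insensitive substring match on the tweet text.
    return filter_text.lower() in tweet_text.lower()


def batched(items: Iterable[Dict], size: int) -> Iterator[List[Dict]]:
    """Group items into lists of at most `size` elements."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def ingest(stream: Iterable[Dict], filter_text: str,
           batch_size: int) -> Iterator[List[Dict]]:
    """Yield batches of matching tweets for downstream components."""
    matching = (t for t in stream if matches_filter(t["text"], filter_text))
    yield from batched(matching, batch_size)


# Made-up sample data: 12 matching tweets and 1 unrelated tweet.
sample = [{"text": f"Post {i} about #ClimateChange"} for i in range(12)]
sample.append({"text": "Unrelated post"})

for batch in ingest(sample, "#ClimateChange", batch_size=10):
    print(len(batch))  # sizes of the batches handed downstream
```

Because `ingest` is a generator, it works the same way on a finite list (History) and on an open-ended iterator (Real-Time), emitting a batch as soon as enough matching tweets have arrived.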