Best 15 Twitter Datasets for Natural Language Processing and Machine Learning In 2020

While it can be difficult for AI researchers and developers to find social media data for machine learning, one open source of data is Twitter. Various educational institutions, research groups, and independent researchers have scraped tweets from Twitter and made the data available for public use.

From sentiment analysis models to content moderation models and other NLP use cases, Twitter data can be used to train a variety of machine learning algorithms.

Below is a list of 15 open Twitter datasets for machine learning.

Best Twitter Datasets for Natural Language Processing and Machine Learning

1. Apple Twitter Sentiment

A dataset containing tweets about the big tech company Apple. The tweets in this dataset were compiled from tweets containing the hashtag #AAPL, the mention @apple, and others. The tweets were then classified as having positive, negative, or neutral sentiment.

2. Avengers Endgame Tweets

This dataset for machine learning contains 10,000 tweets that include the hashtag #AvengersEndgame.

3. Charlottesville on Twitter

This dataset includes 150,000 tweets discussing Charlottesville or including the #Charlottesville hashtag.
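Several datasets in this list were built by selecting tweets that mention a keyword or carry a hashtag. The following is a minimal sketch of that kind of filter in Python, not the method any of these dataset authors used. The function name `mentions_topic` and the sample tweets are hypothetical and exist only for illustration.

```python
import re


def mentions_topic(text: str, keyword: str) -> bool:
    """Return True if the tweet contains the keyword as a whole word,
    with or without a leading '#', ignoring case."""
    pattern = re.compile(
        rf"(?<!\w)#?{re.escape(keyword)}(?!\w)",
        re.IGNORECASE,
    )
    return pattern.search(text) is not None


# Hypothetical sample tweets, not taken from any real dataset.
tweets = [
    "Thinking of everyone in #Charlottesville today",
    "Road trip through charlottesville next week",
    "Nothing to do with the topic",
]

matching = [t for t in tweets if mentions_topic(t, "Charlottesville")]
```

The word-boundary checks keep the filter from matching longer words that merely contain the keyword, which is one reasonable design choice among several.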

4. Credibility Corpus in French and English

The Credibility Corpus in French and English was created to assess information credibility and detect misinformation and rumors. The dataset comprises both French and English tweets about rumors.

5. Customer Support on Twitter

This dataset is a large corpus of tweets and replies to and from customer support accounts on Twitter.

6. Every Donald Trump Tweet

The Every Donald Trump Tweet dataset is a collection of every tweet the president has ever posted. The data was later moved to the TrumpTwitterArchive, but can still be accessed.

7. FollowTheHashtag: Tokyo

From FollowTheHashtag, this dataset is a collection of 200,000 geolocated tweets from Tokyo.

8. FollowTheHashtag: USA

Also from FollowTheHashtag, this dataset is a collection of 200,000 geolocated tweets from the United States of America.

9. Game of Thrones Season 8 Tweets

The tweets collected for this dataset capture audience reactions to each episode: Game of Thrones related tweets were gathered after each episode of season 8 was released.

10. Pre-processed Twitter Tweets

This is a simple social media dataset consisting of pre-processed tweets for sentiment analysis. The tweets have been sorted into positive, neutral, and negative categories.
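To give a sense of what pre-processing tweets for sentiment analysis can involve, here is a minimal Python sketch. It is an illustration under assumptions, not the pipeline used to build this dataset: the function `preprocess_tweet`, the `<url>` and `<user>` placeholders, and the sample rows are all hypothetical.

```python
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
WHITESPACE_RE = re.compile(r"\s+")


def preprocess_tweet(text: str) -> str:
    """Lowercase a tweet and replace URLs and mentions with placeholders."""
    text = text.lower()
    text = URL_RE.sub("<url>", text)
    text = MENTION_RE.sub("<user>", text)
    text = text.replace("#", "")  # keep the hashtag word, drop the symbol
    return WHITESPACE_RE.sub(" ", text).strip()


# Hypothetical labelled rows (text, label); not taken from any real dataset.
rows = [
    ("Loving the new update! https://example.com", "positive"),
    ("@support my order never arrived", "negative"),
    ("Store opens at 9am #hours", "neutral"),
]

cleaned = [(preprocess_tweet(text), label) for text, label in rows]

# Count how many tweets fall into each sentiment category.
label_counts = Counter(label for _, label in cleaned)
```

Checking `label_counts` before training is a quick way to see whether the positive, neutral, and negative categories are balanced.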

11. Russian Troll Tweets

During an investigation into Russia’s influence on the 2016 US election, Twitter deleted 200,000 Russian troll tweets. This Twitter dataset contains data on both the individual tweets and the accounts from which they were posted.

12. Sentiment 140

Sentiment 140 is a tool for discovering the overall sentiment for a brand, topic, or product on Twitter. The company has also made its training data available for download on its website.

13. SMILE Twitter Emotion

A simple dataset for sentiment analysis, the SMILE Twitter Emotion Dataset contains 3,085 tweets, each expressing one of five emotions: anger, disgust, happiness, surprise, and sadness.

14. Stanford SNAP Twitter Dataset

From the SNAP library database at Stanford University, this dataset contains 476 million tweets from 20 million users over a 7-month period.

15. Top 20 Most-Followed Users on Twitter

This Twitter dataset consists of over 52,000 tweets from the 20 most-followed Twitter accounts. Retweets were not collected for this dataset.
