While it can be difficult for AI researchers and developers to find social media data for machine learning, one open source of data is Twitter. Various educational institutions, research groups, and independent researchers have scraped tweets from Twitter and made the data available for public use.
From sentiment analysis models to content moderation models and other NLP use cases, Twitter data can be used to train a variety of machine learning algorithms.
Below is a list of 20 open Twitter datasets for machine learning.
Best Twitter Datasets for Natural Language Processing and Machine Learning
1. Apple Twitter Sentiment
A dataset containing tweets about the big tech company Apple. The tweets in this dataset were compiled from tweets containing the hashtag #AAPL, the mention @apple, and others. The tweets were then labeled as positive, negative, or neutral.
2. Avengers Endgame Tweets
This machine learning dataset contains 10,000 tweets that include the hashtag #AvengersEndgame.
3. Charlottesville on Twitter
This dataset contains 150,000 tweets that mention Charlottesville or use the #Charlottesville hashtag.
4. Credibility Corpus in French and English
The Credibility Corpus in French and English was created to evaluate information credibility and to detect misinformation and rumors. The dataset consists of both French and English tweets about rumors.
5. Customer Support on Twitter
This dataset is a large corpus of tweets and replies to and from customer support lines on Twitter.
6. Every Donald Trump Tweet
The Every Donald Trump Tweet dataset is a collection of every tweet the president has ever posted. The data was later moved to the TrumpTwitterArchive, but it can still be accessed.
7. FollowTheHashtag: Tokyo
From FollowTheHashtag, this dataset is a collection of 200,000 geolocated tweets from Tokyo.
8. FollowTheHashtag: USA
Also from FollowTheHashtag, this dataset is a collection of 200,000 geolocated tweets from the United States of America.
9. Game of Thrones Season 8 Tweets
The tweets in this dataset capture audience reactions to every episode of Season 8, gathered by collecting Game of Thrones-related tweets after each episode aired.
10. Pre-processed Twitter Tweets
This is a simple social media dataset made up of pre-processed tweets for sentiment analysis. The tweets have been sorted into positive, neutral, and negative categories.
11. Russian Troll Tweets
During an investigation into Russia’s influence on the 2016 US election, Twitter deleted 200,000 Russian troll tweets. This Twitter dataset contains data on both the individual tweets and the accounts that posted them.
12. Sentiment 140
Sentiment 140 is a tool for discovering the overall sentiment toward a brand, topic, or product on Twitter. The company has also made its training data available for download on its website.
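Once downloaded, the training data can be turned into labeled examples for a sentiment classifier. The sketch below assumes the commonly described Sentiment 140 CSV layout: six unnamed columns (polarity, tweet id, date, query, user, text), with polarity coded as 0 = negative, 2 = neutral, and 4 = positive. The function name `load_sentiment140` and the sample rows are hypothetical, so check the layout against the actual file before relying on it.

```python
import csv
import io
from collections import Counter

# Assumed polarity codes for the Sentiment 140 CSV (verify against the download).
POLARITY = {"0": "negative", "2": "neutral", "4": "positive"}


def load_sentiment140(fileobj):
    """Yield (label, text) pairs from a Sentiment 140-style CSV.

    Hypothetical helper. Assumes six unnamed columns per row:
    polarity, tweet id, date, query, user, text.
    """
    for row in csv.reader(fileobj):
        if len(row) != 6:
            continue  # skip rows that do not match the assumed layout
        polarity, _tweet_id, _date, _query, _user, text = row
        label = POLARITY.get(polarity)
        if label is not None:  # ignore unknown polarity codes
            yield label, text


if __name__ == "__main__":
    # Hypothetical two-row sample written in the assumed format.
    sample = io.StringIO(
        '"0","1","Mon Apr 06 22:19:45 PDT 2009","NO_QUERY","user_a","my phone died again"\n'
        '"4","2","Mon Apr 06 22:20:00 PDT 2009","NO_QUERY","user_b","loving the sunshine today"\n'
    )
    pairs = list(load_sentiment140(sample))
    print(Counter(label for label, _ in pairs))
```

With the real file, you would pass an open file handle instead of the in-memory sample; depending on the download, it may need to be opened with a non-UTF-8 encoding such as `encoding="latin-1"`.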
13. SMILE Twitter Emotion
A simple dataset for sentiment analysis, the SMILE Twitter Emotion dataset contains 3,085 tweets, each labeled with one of the following emotions: anger, disgust, happiness, surprise, and sadness.
14. Stanford SNAP Twitter Dataset
From Stanford University’s SNAP dataset collection, this dataset contains 476 million tweets from 20 million users over a seven-month period.
15. Top 20 Most-Followed Users on Twitter
This Twitter dataset consists of over 52,000 tweets from the 20 most-followed Twitter accounts. Retweets were not collected for this dataset.