TweetParser
Parses raw Twitter JSON from stdin using Python. I'm only extracting a few fields for quick processing in Pig, and there is still a lot of work to do. Currently it extracts the tweet id, timestamp, client program, author, and tweet text. I'll add more fields, such as geo, if requested. The filenames for the output and bad-tweet files are currently hardcoded for my testing; I'll make them configurable shortly.
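Below is a minimal sketch of this kind of stdin-to-file parser. It is not the code in this repository. It assumes the classic Twitter JSON layout (`id`, `created_at`, `source`, `user.screen_name`, `text`) and tab-separated output, since tab is the default delimiter of Pig's `PigStorage`. The names `extract_fields`, `process`, `main`, `tweets.tsv`, and `bad_tweets.txt` are all hypothetical.

```python
import json
import sys
from typing import Iterable, Optional, TextIO, Tuple

# Hypothetical hardcoded filenames, standing in for the ones the README mentions.
OUTPUT_FILE = "tweets.tsv"
BAD_FILE = "bad_tweets.txt"

Record = Tuple[str, str, str, str, str]


def extract_fields(line: str) -> Optional[Record]:
    """Return (id, timestamp, client, author, text), or None for a bad tweet.

    Field names assume the classic Twitter JSON layout.
    """
    try:
        tweet = json.loads(line)
        return (
            str(tweet["id"]),
            tweet["created_at"],
            tweet["source"],               # client program, often an HTML link
            tweet["user"]["screen_name"],  # author
            tweet["text"],
        )
    except (ValueError, KeyError, TypeError):
        # Malformed JSON, or objects without tweet fields
        # (for example streaming-API delete notices).
        return None


def _clean(value: str) -> str:
    # Tabs or newlines inside a field would break one-record-per-line TSV.
    return value.replace("\t", " ").replace("\r", " ").replace("\n", " ")


def process(lines: Iterable[str], good: TextIO, bad: TextIO) -> Tuple[int, int]:
    """Write one TSV record per good tweet; copy unparseable lines verbatim."""
    n_good = n_bad = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank keep-alive lines
        record = extract_fields(line)
        if record is None:
            bad.write(line if line.endswith("\n") else line + "\n")
            n_bad += 1
        else:
            good.write("\t".join(_clean(f) for f in record) + "\n")
            n_good += 1
    return n_good, n_bad


def main() -> None:
    # Reads tweets from stdin and writes to the hardcoded files.
    with open(OUTPUT_FILE, "w", encoding="utf-8") as good, \
         open(BAD_FILE, "w", encoding="utf-8") as bad:
        process(sys.stdin, good, bad)
```

To run it as a script, call `main()` from an `if __name__ == "__main__":` guard and pipe tweet JSON into it, one object per line. Keeping `process` separate from file handling makes it easy to test with in-memory streams.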
Twitter parser (2010): accepts Twitter JSON from stdin and extracts the tweet id, username, timestamp, client used, and tweet text. The output is kept lightweight for performance reasons and so it can be processed quickly in map/reduce environments such as Apache Pig. Big to-do: allow overriding the default filenames for the output file and the bad file.
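One possible way to handle that to-do is to keep the current names as defaults and let command-line flags override them. The sketch below uses the standard `argparse` module. The flag names `-o/--output` and `-b/--bad-file` and the default filenames are hypothetical, not part of the project.

```python
import argparse
from typing import List, Optional

# Hypothetical defaults standing in for the currently hardcoded filenames.
DEFAULT_OUTPUT = "tweets.tsv"
DEFAULT_BAD = "bad_tweets.txt"


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse flags that override the output and bad-tweet filenames."""
    parser = argparse.ArgumentParser(
        description="Extract id, timestamp, client, author and text "
                    "from Twitter JSON on stdin."
    )
    parser.add_argument("-o", "--output", default=DEFAULT_OUTPUT,
                        help="file for tab-separated tweet records")
    # argparse stores --bad-file as the attribute bad_file.
    parser.add_argument("-b", "--bad-file", default=DEFAULT_BAD,
                        help="file collecting lines that could not be parsed")
    # With argv=None, argparse falls back to sys.argv[1:].
    return parser.parse_args(argv)
```

Running with no flags keeps today's behaviour. Something like `python tweetparser.py -o out.tsv -b rejects.txt < tweets.json` (script name hypothetical) would redirect both files without editing the source.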
