Java – how to use twitter4j to retrieve images in tweets?
I want to issue a query for a keyword or hashtag and retrieve all images from all tweets containing that keyword. I can easily issue queries and retrieve the resulting tweets using Twitter4J and Java. I know that when I open the http://t.co/xxxx link from a tweet, I can view the associated image in the browser, and that the picture itself is located at https://pbs.twimg.com/xxxxx. So all I need to do is reproduce this process in my code!
I can easily parse the http://t.co/xxxx link out of each tweet. However, when I retrieve all the HTML from this link, I don't see any https://pbs.twimg.com/xxxx images :(. What I think is happening is that Twitter loads these images through JavaScript.
Is there any way to easily retrieve images on each tweet?
This is what I have done so far:
package com.company;

import twitter4j.*;
import twitter4j.conf.ConfigurationBuilder;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) throws Exception {
        // Configure OAuth credentials
        ConfigurationBuilder cb = new ConfigurationBuilder();
        cb.setDebugEnabled(true)
                .setOAuthConsumerKey("xxxxxxxxxx")
                .setOAuthConsumerSecret("xxxxxxxxxxxx")
                .setOAuthAccessToken("xxxxxxxxx-xxx-xxxxxxxx")
                .setOAuthAccessTokenSecret("xxxxxxxxxxxxxxxxxxx");
        TwitterFactory tf = new TwitterFactory(cb.build());
        Twitter twitter = tf.getInstance();

        // Search for tweets containing the hashtag
        Query query = new Query("#hashtag");
        QueryResult result = twitter.search(query);

        // Shortened t.co links in the tweet text, and image URLs in the fetched HTML
        Pattern pattern = Pattern.compile("http://t.co/\\w{10}");
        Pattern imagePattern = Pattern.compile("https\\:\\/\\/pbs\\.twimg\\.com/media/\\w+\\.(png|jpg|gif)(:large)?");

        for (Status status : result.getTweets()) {
            if (status.isRetweet())
                continue;

            System.out.println("@" + status.getUser().getScreenName() + ":" + status.getText());

            Matcher matcher = pattern.matcher(status.getText());
            if (matcher.find()) {
                System.out.println("found a t.co url");

                // Download the linked page and scan it line by line for image URLs
                URL oracle = new URL(matcher.group());
                BufferedReader in = new BufferedReader(
                        new InputStreamReader(oracle.openStream()));
                String inputLine;
                while ((inputLine = in.readLine()) != null) {
                    matcher = imagePattern.matcher(inputLine);
                    if (matcher.find())
                        System.out.println("YAYAAYAYAYYAYAYAYAYAYAYAYAYAAYAYYAYAAYYAYAYAYA: " + matcher.group());
                }
                in.close();
            }
        }
    }
}
Solution
There is a simpler way to retrieve images from tweets. If an image is attached to the tweet, you can call getMediaEntities() on the status to get its media data, and then call getMediaURL() on each entity to retrieve the image's web address:
MediaEntity[] media = status.getMediaEntities(); // get the media entities from the status
for (MediaEntity m : media) { // iterate through the entities
    System.out.println(m.getMediaURL()); // print the image URL
}
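Printing the address is only part of what the question asks for, since the image bytes still have to be fetched. Below is a minimal sketch of that last step using only the JDK, assuming the string returned by getMediaURL() can be downloaded directly. The class name MediaDownloader, its two methods, and the sample file name in the comment are made up for illustration and are not part of Twitter4J.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of Twitter4J: saves the image behind a media URL to disk.
class MediaDownloader {

    // Derives a local file name from the last path segment of the URL,
    // dropping a size suffix such as ":large" if one is present
    // (e.g. ".../media/abc123.jpg:large" becomes "abc123.jpg").
    static String fileNameFor(String mediaUrl) {
        String path = URI.create(mediaUrl).getPath();
        String name = path.substring(path.lastIndexOf('/') + 1);
        int colon = name.indexOf(':');
        return colon >= 0 ? name.substring(0, colon) : name;
    }

    // Streams the content at mediaUrl into targetDir and returns the saved file's path.
    // An existing file with the same name is overwritten.
    static Path download(String mediaUrl, Path targetDir) throws IOException {
        Files.createDirectories(targetDir);
        Path target = targetDir.resolve(fileNameFor(mediaUrl));
        try (InputStream in = URI.create(mediaUrl).toURL().openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }
}
```

Inside the loop above, the println call could then be replaced with something like MediaDownloader.download(m.getMediaURL(), java.nio.file.Paths.get("images")), where the "images" directory is likewise just an example. Real code would probably also want to handle failed downloads per image rather than letting one IOException abort the whole run.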