How to use the new Apache Http Client to make a HEAD request
If you’ve upgraded your Apache HTTP Client code from version 4.2.x to the newest library (at the time of this writing, version 4.3.5 for httpclient and version 4.3.2 for httpcore), you’ll notice that some classes, like org.apache.http.impl.client.DefaultHttpClient or org.apache.http.params.HttpParams, have been deprecated. Well, I’ve been there, so in this post I’ll show how to get rid of the warnings by using the new classes.
1. Use case from Podcastpedia.org
The use case I will use for demonstration is simple: I have a batch job that checks whether new episodes are available for podcasts. To avoid having to get and parse the feed when there are no new episodes, I first verify whether the ETag or the last-modified headers of the feed resource have changed since the last call. This works only if the feed publisher supports these headers, which I highly recommend, as it saves bandwidth and processing power on the consumer side.
So how does it work? Initially, when a new podcast is added to the Podcastpedia.org directory, I check whether the headers are present for the feed resource and, if so, I store them in the database. To do that, I execute an HTTP HEAD request against the URL of the feed with the help of Apache Http Client. According to the HTTP/1.1 specification (RFC 2616), the meta-information contained in the HTTP headers in response to a HEAD request SHOULD be identical to the information sent in response to a GET request.
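To make the comparison step concrete, here is a minimal sketch of how stored header values could be checked against the ones returned by a later HEAD request. This is not the code from Podcastpedia-batch: the class FeedChangeCheck and the method hasChanged are hypothetical names, and the choice to treat a response without either header as "changed" is an assumption of mine.

```java
/** Hypothetical helper, not part of Podcastpedia-batch. */
final class FeedChangeCheck {

    private FeedChangeCheck() {
    }

    /**
     * Returns true when the feed should be fetched and parsed again.
     * Any argument may be null when the corresponding header was never sent.
     */
    static boolean hasChanged(String storedEtag, String storedLastModified,
                              String currentEtag, String currentLastModified) {
        // Assumption: without any validator we cannot tell, so treat the feed as changed.
        if (currentEtag == null && currentLastModified == null) {
            return true;
        }
        // Compare only the headers the server actually sent this time.
        boolean etagChanged = currentEtag != null && !currentEtag.equals(storedEtag);
        boolean lastModifiedChanged =
                currentLastModified != null && !currentLastModified.equals(storedLastModified);
        return etagChanged || lastModifiedChanged;
    }

    public static void main(String[] args) {
        // Same ETag as stored: no need to download the feed.
        System.out.println(hasChanged("\"abc\"", null, "\"abc\"", null)); // prints false
        // Different ETag: the feed has to be fetched again.
        System.out.println(hasChanged("\"abc\"", null, "\"def\"", null)); // prints true
    }
}
```

The header values are compared as plain strings on purpose: for this check it only matters whether the value differs from the one stored in the database, not what it means.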
In the following sections I will present how the code actually looks in Java, before and after the upgrade to version 4.3.x of the Apache Http Client.
2. Migration to the 4.3.x version
2.1. Software dependencies
To build my project, which by the way is now available on GitHub (Podcastpedia-batch), I am using Maven, so I have listed below the dependencies required for the Apache Http Client:
2.1.1. Before
Apache Http Client dependencies 4.2.x
<!-- Apache Http client -->
<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.2.5</version>
</dependency>
<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpcore</artifactId>
    <version>4.2.4</version>
</dependency>
2.1.2. After
Apache Http Client dependencies 4.3.x
<!-- Apache Http client -->
<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.3.5</version>
</dependency>
<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpcore</artifactId>
    <version>4.3.2</version>
</dependency>
2.2. HEAD request with Apache Http Client
2.2.1. Before (v4.2.x)
Example of executing a HEAD request with Apache HttpClient
private void setHeaderFieldAttributes(Podcast podcast) throws ClientProtocolException, IOException, DateParseException{

    HttpHead headMethod = null;
    headMethod = new HttpHead(podcast.getUrl());

    org.apache.http.client.HttpClient httpClient = new DefaultHttpClient(poolingClientConnectionManager);

    HttpParams params = httpClient.getParams();
    org.apache.http.params.HttpConnectionParams.setConnectionTimeout(params, 10000);
    org.apache.http.params.HttpConnectionParams.setSoTimeout(params, 10000);
    HttpResponse httpResponse = httpClient.execute(headMethod);
    int statusCode = httpResponse.getStatusLine().getStatusCode();

    if (statusCode != HttpStatus.SC_OK) {
        LOG.error("The introduced URL is not valid " + podcast.getUrl() + " : " + statusCode);
    }

    //set the new etag if existent
    org.apache.http.Header eTagHeader = httpResponse.getLastHeader("etag");
    if(eTagHeader != null){
        podcast.setEtagHeaderField(eTagHeader.getValue());
    }

    //set the new "last modified" header field if existent
    org.apache.http.Header lastModifiedHeader= httpResponse.getLastHeader("last-modified");
    if(lastModifiedHeader != null) {
        podcast.setLastModifiedHeaderField(DateUtil.parseDate(lastModifiedHeader.getValue()));
        podcast.setLastModifiedHeaderFieldStr(lastModifiedHeader.getValue());
    }

    // Release the connection.
    headMethod.releaseConnection();
}
If you are using a smart IDE, it will tell you that DefaultHttpClient, HttpParams and HttpConnectionParams are deprecated. If you look at their Javadocs, you’ll find a suggestion for their replacement, namely to use the HttpClientBuilder and the classes provided by org.apache.http.config instead.
So, as you’ll see in the coming section, that’s exactly what I did.
2.2.2. After (v4.3.x)
HEAD request example with Apache Http Client v 4.3.x
private void setHeaderFieldAttributes(Podcast podcast) throws ClientProtocolException, IOException, DateParseException{

    HttpHead headMethod = null;
    headMethod = new HttpHead(podcast.getUrl());

    RequestConfig requestConfig = RequestConfig.custom()
            .setSocketTimeout(TIMEOUT * 1000)
            .setConnectTimeout(TIMEOUT * 1000)
            .build();

    CloseableHttpClient httpClient = HttpClientBuilder
            .create()
            .setDefaultRequestConfig(requestConfig)
            .setConnectionManager(poolingHttpClientConnectionManager)
            .build();

    HttpResponse httpResponse = httpClient.execute(headMethod);
    int statusCode = httpResponse.getStatusLine().getStatusCode();

    if (statusCode != HttpStatus.SC_OK) {
        LOG.error("The introduced URL is not valid " + podcast.getUrl() + " : " + statusCode);
    }

    //set the new etag if existent
    Header eTagHeader = httpResponse.getLastHeader("etag");
    if(eTagHeader != null){
        podcast.setEtagHeaderField(eTagHeader.getValue());
    }

    //set the new "last modified" header field if existent
    Header lastModifiedHeader= httpResponse.getLastHeader("last-modified");
    if(lastModifiedHeader != null) {
        podcast.setLastModifiedHeaderField(DateUtil.parseDate(lastModifiedHeader.getValue()));
        podcast.setLastModifiedHeaderFieldStr(lastModifiedHeader.getValue());
    }

    // Release the connection.
    headMethod.releaseConnection();
}
Notice:
- how the HttpClientBuilder has been used to build a CloseableHttpClient, which is a base implementation of HttpClient that also implements Closeable
- the HttpParams from the previous version have been replaced by org.apache.http.client.config.RequestConfig, where I can set the socket and connection timeouts. This configuration is later passed to setDefaultRequestConfig() when building the HttpClient
The rest of the code is quite simple:
- the HEAD request is executed with httpClient.execute(headMethod)
- if present, the ETag and last-modified headers are persisted
- in the end the internal state of the request is reset, making it reusable, by calling headMethod.releaseConnection()
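The last-modified value arrives as an HTTP date string, which the code above parses with DateUtil.parseDate. If you would rather not depend on that helper, the JDK can parse the preferred HTTP date format on its own. The sketch below assumes Java 8 or later; LastModifiedParser is a hypothetical class name, and it handles only the RFC 1123 format (for example "Tue, 3 Jun 2008 11:05:30 GMT"), not the obsolete date formats some servers may still send.

```java
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;

/** Hypothetical helper, not part of Podcastpedia-batch. Assumes Java 8+. */
final class LastModifiedParser {

    private LastModifiedParser() {
    }

    /**
     * Parses an RFC 1123 HTTP date such as "Tue, 3 Jun 2008 11:05:30 GMT".
     * Returns an empty Optional for null or unparseable values instead of throwing.
     */
    static Optional<Instant> parse(String headerValue) {
        if (headerValue == null) {
            return Optional.empty();
        }
        try {
            ZonedDateTime parsed =
                    ZonedDateTime.parse(headerValue.trim(), DateTimeFormatter.RFC_1123_DATE_TIME);
            return Optional.of(parsed.toInstant());
        } catch (DateTimeParseException e) {
            // Not in RFC 1123 format; the caller decides how to handle a missing date.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("Tue, 3 Jun 2008 11:05:30 GMT")); // prints Optional[2008-06-03T11:05:30Z]
        System.out.println(parse("not a date"));                   // prints Optional.empty
    }
}
```

Returning an Optional instead of throwing keeps a malformed header from aborting the whole batch run; whether that is the right trade-off depends on how strictly you want to treat feed publishers.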
2.2.3. Make the http call from behind a proxy
If you are behind a proxy, you can easily configure the HTTP call by setting an org.apache.http.HttpHost proxy host on the RequestConfig:
HTTP call behind a proxy
HttpHost proxy = new HttpHost("xx.xx.xx.xx", 8080, "http");
RequestConfig requestConfig = RequestConfig.custom()
        .setSocketTimeout(TIMEOUT * 1000)
        .setConnectTimeout(TIMEOUT * 1000)
        .setProxy(proxy)
        .build();
Resources
Source Code – GitHub
- podcastpedia-batch – the job for adding new podcasts from a file to the podcast directory; it uses the code presented in this post to persist the ETag and last-modified headers. It is still a work in progress, so please make a pull request if you have any improvement proposals
Web
- Hypertext Transfer Protocol — HTTP/1.1
- Maven Repository
Reference: How to use the new Apache Http Client to make a HEAD request, from our JCG partner Adrian Matei at the Codingpedia.org blog.