You can easily download files in Java by opening a URLConnection. But what if you need to log in before you can access the file? How can you do that automatically, and once the login is automated, how does the code that opens the URLConnection make use of the logged-in session? If the session is tracked with a cookie, you can simply extract the cookie after logging in and resend it in the Cookie header of the request made through the URLConnection. Here is an example where I’ve used the Selenium web testing tool to do the login. The code is actually Scala, but you can easily convert it to Java:
import java.io.File
import java.net.URL
import org.openqa.selenium.By
import org.openqa.selenium.firefox.FirefoxDriver
// Scala 2.13+; on older versions import scala.collection.JavaConverters._ instead
import scala.jdk.CollectionConverters._

def getFileViaSelenium(): Unit = {
  println("Logging in via Selenium")
  val driver = new FirefoxDriver()
  driver.get("https://www.someurl.com/login")
  driver.findElement(By.id("username")).clear()
  driver.findElement(By.id("username")).sendKeys("John Smith")
  driver.findElement(By.id("password")).clear()
  driver.findElement(By.id("password")).sendKeys("password")
  driver.findElement(By.name("commit")).click()
  // now get the cookies
  val seleniumCookies = driver.manage().getCookies().asScala
  for (cookie <- seleniumCookies) {
    println("Cookie value: " + cookie.getValue())
  }
  // build the Cookie header value: name=value pairs separated by "; "
  val cookieString = seleniumCookies.toSeq.map(c => c.getName() + "=" + c.getValue()).mkString("; ")
  // the browser is no longer needed once we have the cookies
  driver.quit()
  println("Getting file")
  val url = new URL("https://www.someurl.com/somefile.csv")
  val con = url.openConnection()
  // resend the cookies with this request
  con.setRequestProperty("Cookie", cookieString)
  val contentType = con.getContentType()
  println("Content type: " + contentType)
  val in = con.getInputStream()
  // save file (inputToFile is a helper that copies the stream into the file)
  val fileOut = new File("C:\\my_download.csv") // the backslash must be escaped in a string literal
  inputToFile(in, fileOut)
  println("File saved")
}
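The example calls an inputToFile helper that is not defined in the snippet. Here is a minimal sketch of what such a helper might look like, assuming all it needs to do is copy the stream into the target file and close the stream afterwards. The body is my guess at its behaviour, not the original implementation, and returning the byte count is an extra I have added for convenience:

```scala
import java.io.{File, InputStream}
import java.nio.file.{Files, StandardCopyOption}

// Hypothetical implementation of the inputToFile helper used in the example.
// Copies everything from the stream into the file, replacing any existing file,
// and returns the number of bytes written.
def inputToFile(in: InputStream, fileOut: File): Long = {
  try {
    Files.copy(in, fileOut.toPath, StandardCopyOption.REPLACE_EXISTING)
  } finally {
    in.close() // always release the connection's stream, even if the copy fails
  }
}
```

The call in the example ignores the return value, so a version that returns nothing would work just as well there.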