Java – Too many open files (Selenium PhantomJSDriver)
In my embedded Selenium / PhantomJS driver, it seems that resources are not being cleaned up. Running the client synchronously results in millions of open files and eventually leads to a "Too many open files" exception.
I collected some lsof output after the program had been running for about a minute:
$ lsof | awk '{ print $2; }' | uniq -c | sort -rn | head
1221966 12180
34790 29773
31260 12138
20955 8414
17940 10343
16665 32332
9512 27713
7275 19226
5496 7153
5040 14065
$ lsof -p 12180 | awk '{ print $2; }' | uniq -c | sort -rn | head
2859 12180
1 PID
$ lsof -p 12180 -Fn | sort -rn | uniq -c | sort -rn | head
1124 npipe
536 nanon_inode
4 nsocket
3 n/opt/jdk/jdk1.8.0_60/jre/lib/jce.jar
3 n/opt/jdk/jdk1.8.0_60/jre/lib/charsets.jar
3 n/dev/urandom
3 n/dev/random
3 n/dev/pts/20
2 n/usr/share/sbt-launcher-packaging/bin/sbt-launch.jar
2 n/usr/share/java/jayatana.jar
I don't understand why the result set is smaller when I use the -p flag on lsof. In any case, most of the entries appear to be pipes and anon_inode descriptors.
The client is quite simple, at around 100 lines, and it calls driver.close() and driver.quit() when it is finished. I tried caching and reusing the client, but that did not reduce the number of open files.
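As a cross-check on the lsof numbers, the same breakdown can be taken from inside the JVM. The following is a minimal sketch, assuming a Linux host where /proc/self/fd is readable; FdCensus, category and census are hypothetical names introduced here and are not part of Selenium or BrowserMob.

```scala
import java.io.File
import java.nio.file.Files
import scala.util.Try

// Hypothetical helper: groups this JVM's open descriptors by kind (Linux only).
object FdCensus {
  /** Collapse a link target such as "pipe:[12345]" into "pipe"; keep real paths unchanged. */
  def category(target: String): String =
    if (target.startsWith("/")) target else target.takeWhile(_ != ':')

  /** Count open descriptors per category; returns an empty map if the directory cannot be listed. */
  def census(fdDir: File = new File("/proc/self/fd")): Map[String, Int] = {
    val entries = Option(fdDir.listFiles()).map(_.toList).getOrElse(Nil)
    entries
      // A descriptor may be closed between listing and reading, so skip failures.
      .flatMap(f => Try(Files.readSymbolicLink(f.toPath).toString).toOption)
      .groupBy(category)
      .map { case (name, hits) => name -> hits.size }
  }
}
```

Logging FdCensus.census() before and after each request would show which category keeps growing. Because it only reads the current process's own descriptor table, it may be easier to interpret than the system-wide lsof listing.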
case class HeadlessClient(
  country: String,
  userAgent: String,
  inheritSessionId: Option[Int] = None
) {
  protected var numberOfRequests: Int = 0
  protected val proxySessionId: Int = inheritSessionId.getOrElse(new Random().nextInt(Integer.MAX_VALUE))
  protected val address = InetAddress.getByName("proxy.domain.com")
  protected val host = address.getHostAddress
  protected val login: String = HeadlessClient.username + proxySessionId
  protected val windowSize = new org.openqa.selenium.Dimension(375, 667)

  protected val (mobProxy, seleniumProxy) = {
    val proxy = new BrowserMobProxyServer()
    proxy.setTrustAllServers(true)
    proxy.setChainedProxy(new InetSocketAddress(host, HeadlessClient.port))
    proxy.chainedProxyAuthorization(login, HeadlessClient.password, AuthType.BASIC)
    proxy.addLastHttpFilterFactory(new HttpFiltersSourceAdapter() {
      override def filterRequest(originalRequest: HttpRequest): HttpFilters = {
        new HttpFiltersAdapter(originalRequest) {
          override def proxyToServerRequest(httpObject: HttpObject): io.netty.handler.codec.http.HttpResponse = {
            httpObject match {
              case req: HttpRequest => req.headers().remove(HttpHeaders.Names.VIA)
              case _ =>
            }
            null
          }
        }
      }
    })
    proxy.enableHarCaptureTypes(CaptureType.REQUEST_CONTENT, CaptureType.RESPONSE_CONTENT)
    proxy.start(0)
    val seleniumProxy = ClientUtil.createSeleniumProxy(proxy)
    (proxy, seleniumProxy)
  }

  protected val driver: PhantomJSDriver = {
    val capabilities: DesiredCapabilities = DesiredCapabilities.chrome()
    val cliArgsCap = new util.ArrayList[String]
    cliArgsCap.add("--webdriver-loglevel=NONE")
    cliArgsCap.add("--ignore-ssl-errors=yes")
    cliArgsCap.add("--load-images=no")
    capabilities.setCapability(CapabilityType.PROXY, seleniumProxy)
    capabilities.setCapability("phantomjs.page.customHeaders.Referer", "")
    capabilities.setCapability("phantomjs.page.settings.userAgent", userAgent)
    capabilities.setCapability(PhantomJSDriverService.PHANTOMJS_CLI_ARGS, cliArgsCap)
    new PhantomJSDriver(capabilities)
  }

  driver.executePhantomJS(
    """
      |var navigation = [];
      |
      |this.onNavigationRequested = function(url, type, willNavigate, main) {
      |  navigation.push(url)
      |  console.log('Trying to navigate to: ' + url);
      |}
      |
      |this.onResourceRequested = function(request, net) {
      |  console.log("Requesting " + request.url);
      |  if (! (navigation.indexOf(request.url) > -1)) {
      |    console.log("Aborting " + request.url)
      |    net.abort();
      |  }
      |};
    """.stripMargin
  )

  driver.manage().window().setSize(windowSize)

  def follow(url: String)(implicit ec: ExecutionContext): List[HarEntry] = {
    try {
      Await.result(Future {
        mobProxy.newHar(url)
        driver.get(url)
        val entries = mobProxy.getHar.getLog.getEntries.asScala.toList
        shutdown()
        entries
      }, 45.seconds)
    } catch {
      case e: Exception =>
        try {
          shutdown()
        } catch {
          case shutdown: Exception =>
            throw new Exception(s"Error ${shutdown.getMessage} cleaning up after Exception: ${e.getMessage}")
        }
        throw e
    }
  }

  def shutdown() = {
    driver.close()
    driver.quit()
  }
}
I tried several versions of Selenium in case a newer build contained a fix. sbt:
libraryDependencies += "org.seleniumhq.selenium" % "selenium-java" % "3.0.1"
libraryDependencies += "net.lightbody.bmp" % "browsermob-core" % "2.1.2"
I also tried PhantomJS 2.0.1 and 2.1.1:
$ phantomjs --version
2.0.1-development
$ phantomjs --version
2.1.1
Is this a PhantomJS problem or a Selenium problem? Or is my client using the API improperly?
Solution
The resource usage is caused by BrowserMob Proxy. To shut down the proxy and release its resources, you must call stop() on it.
For this client, that means modifying the shutdown method:
def shutdown() = {
  mobProxy.stop()
  driver.close()
  driver.quit()
}
Alternatively, abort() shuts the proxy server down immediately, without waiting for in-flight traffic to stop.
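One caveat with a sequential shutdown method is that if an early call throws, the later calls never run and their resources stay open. A minimal sketch of one way to guard against that, using only the Scala standard library, is shown here; Cleanup and runAll are hypothetical names introduced for illustration and are not part of Selenium or BrowserMob.

```scala
import scala.util.{Failure, Try}

// Hypothetical helper: runs every cleanup step, even when an earlier one fails.
object Cleanup {
  /** Execute the steps in order and return the (step name, error) pairs of those that threw. */
  def runAll(steps: (String, () => Unit)*): List[(String, Throwable)] =
    steps.toList
      .map { case (name, step) => name -> Try(step()) } // Try captures non-fatal exceptions
      .collect { case (name, Failure(e)) => name -> e }
}
```

In the client above, shutdown could then be written as Cleanup.runAll("proxy" -> (() => mobProxy.stop()), "driver" -> (() => driver.quit())), assuming the mobProxy and driver fields from the question. Any returned failures can be logged or rethrown once both resources have been given a chance to close.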
