Java – Too many open files (Selenium PhantomJSDriver)
In my embedded Selenium/PhantomJSDriver driver, it seems that resources are not being cleaned up. Running the client synchronously results in millions of open files and eventually a "Too many open files" type of exception.
After the program had been running for about a minute, I collected some output from lsof:
    $ lsof | awk '{ print $2; }' | uniq -c | sort -rn | head
    1221966 12180
      34790 29773
      31260 12138
      20955 8414
      17940 10343
      16665 32332
       9512 27713
       7275 19226
       5496 7153
       5040 14065

    $ lsof -p 12180 | awk '{ print $2; }' | uniq -c | sort -rn | head
       2859 12180
          1 PID

    $ lsof -p 12180 -Fn | sort -rn | uniq -c | sort -rn | head
       1124 npipe
        536 nanon_inode
          4 nsocket
          3 n/opt/jdk/jdk1.8.0_60/jre/lib/jce.jar
          3 n/opt/jdk/jdk1.8.0_60/jre/lib/charsets.jar
          3 n/dev/urandom
          3 n/dev/random
          3 n/dev/pts/20
          2 n/usr/share/sbt-launcher-packaging/bin/sbt-launch.jar
          2 n/usr/share/java/jayatana.jar
I don't understand why the result set is smaller when using the -p flag on lsof. However, it appears that most of the entries are pipes and anon_inode entries.
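For tracking the count from inside the JVM rather than through lsof, the platform MXBean may be an option. This is a minimal sketch, assuming a Unix-like HotSpot/OpenJDK runtime where the operating-system bean implements com.sun.management.UnixOperatingSystemMXBean; FdCount and snapshot are hypothetical names used only for illustration:

```scala
import java.lang.management.ManagementFactory
import com.sun.management.UnixOperatingSystemMXBean

object FdCount {
  /** (open, max) file descriptor counts for this JVM process,
    * or None on platforms where the Unix-specific bean is unavailable. */
  def snapshot(): Option[(Long, Long)] =
    ManagementFactory.getOperatingSystemMXBean match {
      case unix: UnixOperatingSystemMXBean =>
        Some((unix.getOpenFileDescriptorCount(), unix.getMaxFileDescriptorCount()))
      case _ =>
        None
    }
}
```

Logging FdCount.snapshot() before and after each request would show whether descriptors are being released. It only covers the JVM process itself, not child processes such as phantomjs.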
The client is very simple, at roughly 100 lines, and at the end of use it calls driver.close() and driver.quit(). I experimented with caching and reusing clients, but that did not reduce the number of open files.
    import java.net.{InetAddress, InetSocketAddress}
    import java.util

    import scala.collection.JavaConverters._
    import scala.concurrent.{Await, ExecutionContext, Future}
    import scala.concurrent.duration._
    import scala.util.Random

    import io.netty.handler.codec.http.{HttpHeaders, HttpObject, HttpRequest}
    import net.lightbody.bmp.BrowserMobProxyServer
    import net.lightbody.bmp.client.ClientUtil
    import net.lightbody.bmp.core.har.HarEntry
    import net.lightbody.bmp.proxy.CaptureType
    import net.lightbody.bmp.proxy.auth.AuthType
    import org.littleshoot.proxy.{HttpFilters, HttpFiltersAdapter, HttpFiltersSourceAdapter}
    import org.openqa.selenium.phantomjs.{PhantomJSDriver, PhantomJSDriverService}
    import org.openqa.selenium.remote.{CapabilityType, DesiredCapabilities}

    case class HeadlessClient(
        country: String,
        userAgent: String,
        inheritSessionId: Option[Int] = None
    ) {
      protected var numberOfRequests: Int = 0
      protected val proxySessionId: Int =
        inheritSessionId.getOrElse(new Random().nextInt(Integer.MAX_VALUE))
      protected val address = InetAddress.getByName("proxy.domain.com")
      protected val host = address.getHostAddress
      protected val login: String = HeadlessClient.username + proxySessionId
      protected val windowSize = new org.openqa.selenium.Dimension(375, 667)

      // Local BrowserMob proxy, chained to the upstream proxy, used to capture HAR entries.
      protected val (mobProxy, seleniumProxy) = {
        val proxy = new BrowserMobProxyServer()
        proxy.setTrustAllServers(true)
        proxy.setChainedProxy(new InetSocketAddress(host, HeadlessClient.port))
        proxy.chainedProxyAuthorization(login, HeadlessClient.password, AuthType.BASIC)
        // Strip the Via header from outgoing requests.
        proxy.addLastHttpFilterFactory(new HttpFiltersSourceAdapter() {
          override def filterRequest(originalRequest: HttpRequest): HttpFilters = {
            new HttpFiltersAdapter(originalRequest) {
              override def proxyToServerRequest(httpObject: HttpObject): io.netty.handler.codec.http.HttpResponse = {
                httpObject match {
                  case req: HttpRequest => req.headers().remove(HttpHeaders.Names.VIA)
                  case _ =>
                }
                null
              }
            }
          }
        })
        proxy.enableHarCaptureTypes(CaptureType.REQUEST_CONTENT, CaptureType.RESPONSE_CONTENT)
        proxy.start(0)
        val seleniumProxy = ClientUtil.createSeleniumProxy(proxy)
        (proxy, seleniumProxy)
      }

      protected val driver: PhantomJSDriver = {
        val capabilities: DesiredCapabilities = DesiredCapabilities.chrome()
        val cliArgsCap = new util.ArrayList[String]
        cliArgsCap.add("--webdriver-loglevel=NONE")
        cliArgsCap.add("--ignore-ssl-errors=yes")
        cliArgsCap.add("--load-images=no")
        capabilities.setCapability(CapabilityType.PROXY, seleniumProxy)
        capabilities.setCapability("phantomjs.page.customHeaders.Referer", "")
        capabilities.setCapability("phantomjs.page.settings.userAgent", userAgent)
        capabilities.setCapability(PhantomJSDriverService.PHANTOMJS_CLI_ARGS, cliArgsCap)
        new PhantomJSDriver(capabilities)
      }

      // Abort every resource request that is not a navigation request.
      driver.executePhantomJS(
        """
          |var navigation = [];
          |
          |this.onNavigationRequested = function(url, type, willNavigate, main) {
          |  navigation.push(url)
          |  console.log('Trying to navigate to: ' + url);
          |}
          |
          |this.onResourceRequested = function(request, net) {
          |  console.log("Requesting " + request.url);
          |  if (! (navigation.indexOf(request.url) > -1)) {
          |    console.log("Aborting " + request.url)
          |    net.abort();
          |  }
          |};
        """.stripMargin
      )

      driver.manage().window().setSize(windowSize)

      def follow(url: String)(implicit ec: ExecutionContext): List[HarEntry] = {
        try {
          Await.result(Future {
            mobProxy.newHar(url)
            driver.get(url)
            val entries = mobProxy.getHar.getLog.getEntries.asScala.toList
            shutdown()
            entries
          }, 45.seconds)
        } catch {
          case e: Exception =>
            try {
              shutdown()
            } catch {
              case shutdown: Exception =>
                throw new Exception(s"Error ${shutdown.getMessage} cleaning up after Exception: ${e.getMessage}")
            }
            throw e
        }
      }

      def shutdown() = {
        driver.close()
        driver.quit()
      }
    }
I tried several versions of Selenium in case there was a bug fix. The build.sbt:
    libraryDependencies += "org.seleniumhq.selenium" % "selenium-java" % "3.0.1"
    libraryDependencies += "net.lightbody.bmp" % "browsermob-core" % "2.1.2"
In addition, I tried PhantomJS 2.0.1 and 2.1.1:
    $ phantomjs --version
    2.0.1-development
    $ phantomjs --version
    2.1.1
Is this a PhantomJS or a Selenium problem? Is my client using the API improperly?
Solution
The resource usage is caused by BrowserMob. To shut down the proxy and clean up its resources, you must call stop().
For this client, that means modifying the shutdown method:
    def shutdown() = {
      mobProxy.stop()
      driver.close()
      driver.quit()
    }
Another method, abort(), shuts down the proxy server immediately without waiting for traffic to stop.
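If one cleanup call throws, the calls after it never run, so a proxy or browser process could still leak. This is a minimal sketch of running every cleanup step regardless, using only the Scala standard library; Cleanup and releaseAll are hypothetical names, not part of Selenium or BrowserMob:

```scala
import scala.util.control.NonFatal

object Cleanup {
  /** Runs every step in order, even if an earlier one throws.
    * The first failure is rethrown at the end; later failures are
    * attached to it as suppressed exceptions. */
  def releaseAll(steps: (() => Unit)*): Unit = {
    var first: Option[Throwable] = None
    steps.foreach { step =>
      try step()
      catch {
        case NonFatal(e) =>
          first match {
            case Some(f) => if (f ne e) f.addSuppressed(e)
            case None    => first = Some(e)
          }
      }
    }
    first.foreach(e => throw e)
  }
}
```

With the client above, shutdown() could then be written as Cleanup.releaseAll(() => mobProxy.stop(), () => driver.quit()). WebDriver's quit() closes every associated window itself.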