|
Sometimes you'd like a build to spawn a process that lives longer than the build itself. For example, part of the build may be to launch a new application server with the result of the build. When you do this, you often run into a problem where the build doesn't terminate: the shell script, Ant, or Maven terminates as expected, but Hudson just insists on waiting, as if it didn't notice that the build is over.
Why?
This problem is caused by a file descriptor leak and by the way file descriptors are inherited from one process to another. Hudson and the child process are connected by three pipes (stdin, stdout, and stderr). This allows Hudson to capture the output from the child process. Since the child process may write a lot of data to the pipe and quit immediately after that, Hudson needs to make sure that it has drained the pipes before it considers the build to be over. Hudson does this by waiting for EOF.
When a process terminates for whatever reason, the operating system closes all the file descriptors it owned. So even if the process didn't close stdout/stderr, Hudson will nevertheless get EOF.
The complication happens when those file descriptors are inherited by other processes. Let's say the child process forks another process to the background. The background process (AKA daemon) inherits all the file descriptors of the parent, including the writing side of the stdout/stderr pipes that connect the child process and Hudson. If the daemon forgets to close them, Hudson won't get EOF on the pipes even when the child process exits, because the daemon still has those descriptors open. That's how this problem happens.
A good daemon program closes all file descriptors to avoid problems like this, but often there are bad ones that don't follow the rule.
Workaround
On Unix, you can use a wrapper such as daemonize to make the daemon behave.
On Windows, you can use the at command to launch the process in the background, for example from Ant:
<scriptdef name="get-next-minute" language="beanshell">
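  <!-- name of the property that receives the HH:mm timestamp -->
  <attribute name="property" />
  // the current time plus 60 seconds, formatted as HH:mm for "at"
  date = new java.text.SimpleDateFormat("HH:mm")
      .format(new Date(System.currentTimeMillis() + 60000));
  project.setProperty(attributes.get("property"), date);
</scriptdef>
<get-next-minute property="next-minute" />
<!-- ask the scheduler to start run.bat at the next minute -->
<exec executable="at">
  <arg value="${next-minute}" />
  <arg value="/interactive" />
  <arg value="${jboss.home}\bin\run.bat" />
</exec>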
Another similar workaround on Windows is to use a wrapper script and launch your program through it.
<exec executable="cscript.exe">
  <env key="ANTRUN_TITLE" value="Title for Window" /> <!-- optional -->
  <env key="ANTRUN_OUTPUT" value="output.log" /> <!-- optional -->
  <arg value="//NoLogo" />
  <arg value="antRunAsync.js" /> <!-- this script -->
  <arg value="real executable" />
</exec>
Another workaround for Windows XP and later is to schedule a permanent task and force it to run from the Ant script.
C:\>SCHTASKS /Create /RU SYSTEM /SC ONSTART /TN Tomcat /TR "C:\Program Files\Apache Software Foundation\Tomcat 6.0\bin\startup.bat"
Note that ONSTART can be replaced with ONCE if you do not want to keep Tomcat running.
<exec executable="SCHTASKS">
  <arg value="/Run"/>
  <arg value="/TN"/>
  <arg value="Tomcat"/>
</exec>
Another possibility that we can consider is to do something in Hudson. |
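The descriptor inheritance described under "Why?" can be reproduced outside Hudson. The following is a minimal Python sketch for Unix, not Hudson's actual code: it plays the role of the build server by reading a child's output through a pipe until EOF. The names LEAKY, CLEAN, and seconds_until_eof are made up for this illustration, and `sleep 2` stands in for a real daemon.

```python
import subprocess
import time

# Hypothetical stand-ins for a build step. Each command makes sh fork
# `sleep 2` into the background (our pretend daemon) and exit at once.
LEAKY = "sleep 2 &"                             # daemon keeps the inherited pipe
CLEAN = "sleep 2 </dev/null >/dev/null 2>&1 &"  # daemon gets its own stdio


def seconds_until_eof(script: str) -> float:
    """Run script under sh, capture its output through a pipe the way a
    build server would, and return how long it takes to see EOF."""
    start = time.monotonic()
    child = subprocess.Popen(
        ["sh", "-c", script],
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    # read() returns only once every process holding the write end of
    # the pipe has closed it, not merely when sh itself has exited.
    child.stdout.read()
    child.wait()
    return time.monotonic() - start


if __name__ == "__main__":
    print(f"leaky daemon: EOF after {seconds_until_eof(LEAKY):.1f}s")
    print(f"clean daemon: EOF after {seconds_until_eof(CLEAN):.1f}s")
```

On a typical Unix system the first call should take about as long as the sleep, because the background process still holds the write end of the pipe, while the second should return almost immediately. Redirecting the daemon's stdin, stdout, and stderr, as CLEAN does, is the same effect a well-behaved daemon or a wrapper achieves by closing the inherited descriptors.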
Comments (11)
Jul 04, 2007
Kohsuke Kawaguchi says:
For my bookkeeping purposes, this bug discusses the possibility of using NIO with processes.
May 23, 2009
Keith Clarke says:
At least for Tomcat 6, installing Tomcat as a service and using "net start tomcat6" and "net stop tomcat6" avoids this problem and is simple.
Jul 29, 2009
J. Michael McGarr says:
I am currently experiencing this issue on Solaris 10. I have Hudson running and I am trying to set up a job that stops and then starts a local instance of JBoss. I have been unsuccessful in convincing the SAs to install daemonize as is recommended above. I am trying to run a shell script to stop/start the JBoss instance, but when the Hudson job completes, the JBoss instance shuts down. Does anybody have any recommendations on how we can spawn an instance of JBoss from a Hudson job without using the daemonize script?
Aug 03, 2009
Duana Stanley says:
I'm having the same problem trying to start WebSphere from my Ant script via a Windows batch command (the batch command spawns a child process). I'd like to know why it works from Ant on the command line but not from Hudson. When run under Hudson, the child process that my Windows batch file creates gets killed somehow. I understand there may be some issues around open file descriptors etc., but if it's not a problem for Ant, why is it a problem for Hudson? The Hudson build doesn't hang or give any warnings. I'm working in a limited environment where I can't install services or schedule background tasks. I'm very interested to understand what's behind this problem.
Aug 09, 2009
Duana Stanley says:
According to issue 2729 https://hudson.dev.java.net/issues/show_bug.cgi?id=2729 , it sounds like Hudson is willfully killing my build job's beloved spawn.
They mention a workaround of setting the environment variable BUILD_ID in the job, e.g. set BUILD_ID=dontKillMe, which worked for me.
Oct 02, 2010
Noah Sussman says:
Thanks for the tip. It took me a while to realize that when a build is done, Hudson kills any child processes that are started by build steps.
Overriding that default behavior by setting BUILD_ID as you suggest worked for me too.
Aug 03, 2009
Duana Stanley says:
A simpler way, not involving BeanShell, to get the "next minute" in the Ant example above that uses the at command is
<tstamp>
<format property="next-minute" pattern="HH:mm" offset="1" unit="minute"/>
</tstamp>
Aug 04, 2009
Chetan Sarva says:
I wasn't able to get the daemonize method to work (not sure why) for a free-style project on Linux, but atd works great. I simply schedule my job like so:
$ echo <command to run> | at `date -d '+1 minute' +"%H:%M"`
Dec 17, 2009
pangzi says:
I used the 'cscript.exe' method to run 'c:\apache-tomcat-6.0.18\bin\startup.bat' on Windows 2000 and XP. It runs from a local ant command, but doesn't run in Hudson. On Windows 7 it is OK.
I also tried 'at': Windows 2000 can't execute it, XP is OK.
Why?
Started by user anonymous
[workspace] $ cmd /c call C:\DOCUME~1\ADMINI~1\LOCALS~1\Temp\4\hudson2567484063519403955.bat
C:\Documents and Settings\Administrator\.hudson\jobs\idpro-wap\workspace>cscript //nologo d:\builds\projects\idpro-wap\antRunAsync.js c:\apache-tomcat-6.0.18\bin\startup.bat
C:\Documents and Settings\Administrator\.hudson\jobs\idpro-wap\workspace>exit 0
Finished: SUCCESS
Jul 29, 2010
Shimsha Rao says:
Hi Pangzi,
How did you fix this problem? I am facing a similar problem too.
thanks,
Shimsha
May 06, 2010
Gerald Reinhart says:
I launch a background command with the plugin Post build task ... and Hudson does not kill it.