Splitting a big job into smaller jobs

A build is normally a fairly sequential process, and in a big project a full run can easily take hours. While you could run such a job on Hudson as-is, a long turn-around time tends to reduce the value of continuous integration. This page discusses a technique for coping with this problem.

The idea is to split a big build into multiple stages. Within a single build run the stages execute sequentially, but across runs they overlap like a CPU pipeline. This increases the throughput of CI and also reduces turn-around time, because each build spends less time sitting in the build queue.
In this arrangement, an earlier stage needs to pass files to later stages. A general way to do this is as follows:

  1. An earlier stage archives all the files into a zip/tgz file at the end of the build.
  2. Tell Hudson to archive this zip/tgz file as a post-build action, take a fingerprint of it, then trigger the next stage.
  3. The first thing the next stage does in its build is to obtain this bundle through the permalink for the last successful artifact, then unzip it. Do keep this archive file around because we'll take a fingerprint of it here, too.
  4. The build proceeds by using the files obtained from the earlier stage.
  5. Tell Hudson to fingerprint the zip/tgz file. This allows you to correlate executions of these stages to track the flow.

If you have more than two stages, you can repeat this process. In some cases this zip/tgz file will have to contain the entire workspace; if so, the next stage can use the URL SCM plugin to simplify the retrieval.
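The steps above can be sketched as a short shell simulation. This is only a minimal local sketch: the directory names are hypothetical, and the `cp` in stage 2 stands in for fetching the archive over HTTP from the upstream job's lastSuccessfulBuild/artifact permalink (e.g. with wget).

```shell
#!/bin/sh
# Minimal local simulation of the stage hand-off described above.
# Directory names are hypothetical; in a real setup, stage 2 would fetch
# bundle.tgz over HTTP from the upstream job's lastSuccessfulBuild/artifact
# permalink instead of copying it from a local path.
set -e

mkdir -p stage1-ws stage2-ws

# --- Stage 1: build, then bundle the outputs at the end of the build ---
echo "compiled output" > stage1-ws/app.jar
( cd stage1-ws && tar -czf bundle.tgz app.jar )
# Hudson post-build actions would now archive bundle.tgz, fingerprint it,
# and trigger the stage-2 job.

# --- Stage 2: obtain the bundle, unpack it, keep it for fingerprinting ---
cp stage1-ws/bundle.tgz stage2-ws/bundle.tgz   # stand-in for the permalink fetch
( cd stage2-ws && tar -xzf bundle.tgz )
# bundle.tgz stays in the stage-2 workspace so Hudson can fingerprint it
# here too, which is what lets you correlate runs of the two stages.
```

Because both jobs fingerprint the same bundle.tgz, Hudson can tell you which stage-1 run produced the files a given stage-2 run consumed.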

Issue #682 tracks an RFE for more explicit and better support of this use case. Please feel free to add yourself to the CC list, vote, and comment on the issue.

Note: there is a new plugin which helps with this problem: http://wiki.hudson-ci.org/display/HUDSON/Clone+Workspace+SCM+Plugin

  1. Sep 30, 2008

    Pete says:

    It would be nice to have an example of this. I would like to have projects set up to minimize the time spent pulling things that don't change often from CVS. I am new to Hudson and am learning every day!

     We want to do something like this:

    Project 1: ReallyBigCVSModules: Using a custom workspace, pulls Project/ThirdParty and Project/OtherHugefiles, then simply echoes completion.

    Project 2: BuildsSomething1: Using the same workspace as #1 above, pull Project/src and compile. If successful, tag both project 1 and 2.

    Or, is the idea to make all the pulls from CVS in #1 a tarball and have #2 untar it? If so, how do we tag both?

  2. Jan 12, 2009

    - says:

    I tried this method but did not like it at all for a few reasons.
    My biggest issue with this idea is that it totally ties the build process to Hudson: you can no longer run build scripts independently unless the Hudson server is available and running with exactly the right configuration. This suggestion therefore seems to contradict the recommended practice of merely launching independently runnable build scripts. It also takes considerable time and resources to download and unpack previously built projects in this way, but once it's set up it works OK.

    I tried this with a set of build jobs running before a final deploy-to-test job, but the more I did, the deeper the dependency on Hudson became. I am now trying to find a way to share common configuration files among the jobs without Hudson becoming the linchpin. I'd love to know how others have solved these issues.

  3. Jan 15, 2009

    Salim Fadhley says:

    Assume we have two builds, A and B. A compiles a library which will be used for all kinds of testing. B is one of the test-processes that will check the output of A.

    We let A do its thing, and at the end of the process it produces the required output, which is saved as an artifact.

    How does B get the artifact from the completed process A? Do I need to look it up via some kind of API call or is there a simpler way to ensure that an artifact from a previously completed build is available for another build to work on?

    Thanks

    1. May 06, 2009

      jackson ha says:

      Salim,

      Did you find a solution to your question? I think custom workspaces might work... but I haven't figured it out totally yet.

  4. Apr 23, 2009

    Vladislav Roshchin says:

    It's surprising and very bad that nobody answered that last question. Are the Hudson project and community dead?

    A simple question was asked here, and I have a similar one. Project A references project B (.NET). I can't find a way to build A unless, in project A, I reference not project B itself but an assembly built from B.

    I want to reference a project; this is how complex projects are stored in source control.

  5. May 06, 2009

    Mike Buchanan says:

    "3. The first thing the next stage does in its build is to obtain this bundle through the permalink for the last successful artifact"

    How do I do this?

    1. May 20, 2009

      jackson ha says:

      Mike...

       Install the URL SCM plugin. After you have the zip file made in stage 1, configure your stage 2's URL SCM to point to it.

      I got it working last week... works pretty well.

      1. Sep 02, 2009

        Christian Schneider says:

        IMO this would be nice if I could add a URL to copy from in addition to my regular (e.g. SVN) checkout, not as an alternative to it!

  6. Sep 01, 2009

    Austin Tam says:

    After scouring the web for a solution, this is what I've come up with:

    • I've setup two jobs: 'build' and 'test'.
    • 'Build' obviously builds my project (e.g. an ear - for me, it deploys to server as well).
    • 'Test' runs unit tests and selenium tests AFTER it executes a shell command to copy 'build' job's workspace into the 'test' workspace (see below)
    • The 'test' job is triggered by the 'build' job. e.g. 'build' job's post build actions builds 'test' OR 'test' job's build trigger is set to build after the 'build' job completes (e.g. 'Build after other projects are built')
    • SVN Tagging and Publish Test results are configured on the 'test' job. 
    • fingerprint is turned on (apparently this is required for downstream test aggregation)

    shell command to copy a job's workspace to another workspace:

    # replace this job's workspace with a copy of the other job's workspace
    cd ..
    rm -rf workspace
    cp -r /opt/tomcat5/.hudson/jobs/<job name>/workspace workspace

  7. Sep 02, 2009

    Christian Schneider says:

    I propose to add a check-box item to include the lastStableBuild / lastSuccessfulBuild artifacts, with a drop-down menu listing the available jobs.

    If no artifacts are available, the consuming job might just fail. This could even be a task for a new plugin.

  8. Sep 05, 2009

    Andy Tomlin says:

    I am a new user to Hudson and am also trying to get this working. Here is the problem I am trying to solve. I have a build of multiple targets from the same source tree (C source on Windows). I have already created a mechanism in Cygwin bash that allows as many machines as possible to execute and load-balance the build itself (this is done via file locking on a shared directory and file-system primitives). Anyway, all I want to do is execute the same command on multiple machines. I looked at two options, 1) multiple jobs and 2) distfork, and both seem to have problems.

    1) Multiple jobs

    If I have 5 build machines and create 1 main job to trigger 5 dependent jobs (one on each machine), I cannot see how to get info from the first build to the subsequent builds. In the first build I want to get the source, tgz it up, and distribute it to the other machines. If I could pass any kind of parameter between jobs I could get it to work, but I cannot see anything.

    2) distfork

    I do not see where to get the CLI.jar file. This almost looks like exactly what I want, although it is not clear, if I launch multiple commands, how I keep all the console messages straight.

    Any help would be appreciated

    1. Sep 06, 2009

      Andy Tomlin says:

      Ok, I think I got it working. I am using the multiple jobs scheme.

      I have 1 project that gets all the source code, gets some SCM info (AccuRev), tgz's the entire workspace, and stores the result as an artifact of the build. This project then triggers multiple sub-projects that operate in parallel, 1 per build server. These servers all get the tgz file.

      Project 1: Test1.

      accurev stat -R * >FileStats.txt
      accurev info >WsInfo.txt
      tar -cvzf source.tgz --exclude source.tgz *
      

      Project 2-n

      rm -rf *
      "c:\program files\gnuwin32\bin\wget" -nv http://hudson:8080/job/Test1/lastSuccessfulBuild/artifact/source.tgz
      tar -xvf source.tgz
      dobuild.bat
      

      The advantage of this scheme is that all the targets are guaranteed to have the same source atomically (if multiple machines fetch source code independently, they can end up with different versions).

      Note that Test1 gets source code and the other projects do not.

      1. Sep 06, 2009

        Andy Tomlin says:

        Here is the bash script used on all targets to balance the build. All the servers share a cfg file and each executes one line per build; others may find this file-system locking scheme useful.

        Note that COPYLOC is an environment variable pointing at a network share.

        get_locked_line()
        {
            LOCKFOLDER="$COPYLOC"
            LOCKFILE="buildlock.txt"
            LOCKSHARED="lock.txt"
            LOCKFILEFULL="$LOCKFOLDER/$LOCKSHARED"
            LOCKCOUNTFILE="$LOCKFOLDER/count.txt"
            LOCKFILECOUNT=0
            LOCKDONE="NO"
        
            echo "Lock file is $LOCKFILEFULL"
            echo "$COMPUTERNAME" >$LOCKFILE
        
            while [ $LOCKFILECOUNT -lt $cnt ] && [ "$LOCKDONE" == "NO" ]
            do
                #try to copy my lock file to share - may or may not be successful
                cp $LOCKFILE $LOCKFILEFULL
        
                #try to make file read only - may or may not be successful
                if chmod 444 $LOCKFILEFULL
                then
                   #only perform check on read only file so that all computers read same thing
                   if [ "$(cat $LOCKFILEFULL)" == "$COMPUTERNAME" ]
                   then
                      # Yay, I got the lock
                      if [ -f $LOCKCOUNTFILE ]
                      then
                         # file already exists so get current line number
                         LOCKFILECOUNT=$(cat $LOCKCOUNTFILE)
                      fi
                      #set return value of function
                      LOCKCOUNTRETURN=$LOCKFILECOUNT
                      LOCKDONE="YES"
                      echo "Lock obtained, cfg file line to process is $LOCKFILECOUNT"
                      let "LOCKFILECOUNT = $LOCKFILECOUNT + 1"
                      echo $LOCKFILECOUNT > $LOCKCOUNTFILE
                      rm -f $LOCKFILEFULL
        
                   else
                      echo "Waiting for lock Zzzz.."
                      sleep 1
                   fi
                fi
                echo "Lock loop"
            done
            echo "Lock done"
        }
        


  9. Jul 31, 2010

    Iftach Bar says:

    Hi, I've just posted something that shows a simple case of splitting the job into build and test jobs:

    Here is the link:

    http://barnashcode.blogspot.com/2010/07/split-hudson-jobs.html

    I hope you'll like it.