Monday, 19 April 2010

TF215097: An error occurred while initializing a build

I spent a significant amount of time resolving this issue while moving our team project and builds from TFS 2008 to TFS 2010, so I am writing it down here in the hope that anyone stuck in the same quagmire might find some help.

We had been looking to take advantage of some exciting new features in TFS 2010 and Team Build in our project, so when the RTM release was issued last Monday we decided to make the move immediately.

The migration process is simple, straightforward and well documented. I must give credit to Microsoft, who have done tremendous work to make the whole installation process much simpler, a far cry from the early days of TFS 2005.

Once it was installed, I followed the configuration wizard to upgrade my projects, which went pretty smoothly as well. The last piece of the puzzle was getting Team Build to work. So, I disabled the TFS 2008 build service and installed the TFS 2010 build service on our build machines. The wizard picked up the existing build agents and created a build controller and build agent for me. I ran the build and, bang, it all worked.

I ran the build again and got the following error:

TF215097: An error occurred while initializing a build for build definition \Gateway2.0\2.7_Gateway. There was no endpoint listening at http://ggtfs26build1.9191/Build/v3.0/Services/Controller/21 that could accept the message. This is often caused by an incorrect address or SOAP action. See InnerException, if present, for more details.

Confused as to why it worked the first time and not the second, I looked into the properties of the build controller where it failed. I clicked the Test Connection button and it worked correctly. I tried the build again and got the same error. Finally, after restarting my build agents, build controllers and TFS collections, I was able to run a build again, only for the next one to fail.
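For anyone who wants to repeat the reachability check outside the TFS admin UI, here is a minimal Python sketch. It is not something I ran during the migration, and the function name is made up for illustration. It only tests whether something accepts TCP connections on a given host and port from the machine where the script runs, so, like Test Connection, it can pass even when the build still fails.

```python
import socket


def endpoint_is_listening(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    This checks plain reachability only. It does not send a SOAP request
    and does not go through the same call path as the TFS job agent.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Example call (hypothetical host name; use the host and port from your
# own TF215097 message):
#   endpoint_is_listening("my-build-controller", 9191)
```

Running it from the TFS application tier rather than from your desktop is more telling, since that is where the failing calls originate.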

After talking to a few people in the Microsoft Visual Studio team, I learned that there are two methods involved: TestConnection and StartBuild. While TestConnection always succeeded, StartBuild would fail. After running some traces on my TFS server, they found that calls to StartBuild that go directly through w3wp.exe succeed, but calls made via TFSJobAgent fail. Since calls to TestConnection always go through w3wp.exe, they always succeeded.


The InnerException on my build machine suggested that the proxy could not be resolved. So, I changed my internet connection settings, restarted the TFSJobAgent service and ran a build. Again, the build ran fine the first time but failed on the second attempt. Although this did not resolve the issue, it helped to identify that the exception was coming from the Job Agent service.

The next step was to enable tracing in the TFSJobAgent service. You can do that by going to the C:\Program Files\Microsoft Team Foundation Server 2010\Application Tier\TFSJobAgent directory and adding the following inside the listeners element of the <system.diagnostics> section of the config file there:

<add name="myListener" type="System.Diagnostics.TextWriterTraceListener" initializeData="C:\logs\jobagent.log" />


Also, change the value from 0 to 4 in each of the switches. My <system.diagnostics> section looked like this:

<system.diagnostics>
  <trace autoflush="false" indentsize="4">
    <!--To enable tracing to file, simply uncomment listeners section and set trace switch(es) below.
        Directory specified for TextWriterTraceListener output must exist, and job agent service account must have write permissions. -->
    <listeners>
      <add name="myListener"
           type="System.Diagnostics.TextWriterTraceListener"
           initializeData="C:\logs\jobagent.log" />
      <remove name="Default" />
    </listeners>
  </trace>
  <switches>
    <!-- Trace Switches
         Each of the trace switches should be set to a value between 0 and 4, inclusive.
         0: No trace output
         1-4: Increasing levels of trace output; see System.Diagnostics.TraceLevel-->
    <add name="API" value="4" />
    <add name="Authentication" value="4" />
    <add name="Authorization" value="4" />
    <add name="Database" value="4" />
    <add name="General" value="4" />
    <add name="traceLevel" value="4" />
  </switches>
</system.diagnostics>
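If you have several servers to touch, flipping every switch by hand gets tedious. The following Python sketch shows one way to set all the switch values in one go. I edited the file by hand, so treat this as an illustration only. The function name is hypothetical, and it works on the XML text rather than on a file path, so you decide which file to read and write.

```python
import xml.etree.ElementTree as ET


def set_trace_switches(config_xml: str, level: int = 4) -> str:
    """Return config_xml with every <switches>/<add> value set to level.

    Only <add> elements that sit directly under a <switches> element are
    changed, so the <add> inside <listeners> is left alone.
    Note: ElementTree's default parser discards XML comments, so keep a
    backup of the original file before overwriting it with this output.
    """
    if not 0 <= level <= 4:
        raise ValueError("trace level must be between 0 and 4")
    root = ET.fromstring(config_xml)
    for switches in root.iter("switches"):
        for add in switches.findall("add"):
            add.set("value", str(level))
    return ET.tostring(root, encoding="unicode")
```

Setting the level back to 0 with the same function turns tracing off again once you have the log you need.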


I restarted the service and checked the entries in the job agent log. After a few minutes, the following error appeared in the log file:

[Error, PID 3916, TID 6784, 13:29:49.002] Exception: {
Exception Message: Team Foundation services are not available from server http://:8080/VSTSCI/WebServices/notifyservices.asmx?proj=7fd3fa6a-33bb-4ec8-8cfd-9e6a9cc59015.
Technical information (for administrator):
Unable to connect to the remote server (type TeamFoundationServiceUnavailableException)

Exception Stack Trace: at Microsoft.TeamFoundation.Client.TeamFoundationClientProxyBase.CreateSoapRequest(String methodName, HttpWebRequest& request, XmlWriter& requestXml)
at Microsoft.TeamFoundation.JobService.Extensions.Core.TeamFoundationNotificationClient.Notify(String eventXml, String tfsIdentityXml, Subscription subscription)
at Microsoft.TeamFoundation.JobService.Extensions.Core.NotificationJobExtension.SendSoapNotification(TeamFoundationRequestContext requestContext, TeamFoundationNotification notification, TeamFoundationIdentityService identityService)

Inner Exception Details:

Exception Message: Unable to connect to the remote server (type WebException)
Exception Stack Trace: at System.Net.HttpWebRequest.GetRequestStream(TransportContext& context)
at System.Net.HttpWebRequest.GetRequestStream()
at Microsoft.TeamFoundation.Client.TeamFoundationClientProxyBase.CreateSoapRequest(String methodName, HttpWebRequest& request, XmlWriter& requestXml)

Inner Exception Details:

Exception Message: No connection could be made because the target machine actively refused it OldTFSIPAddress:8080 (type SocketException)

Exception Stack Trace: at System.Net.Sockets.Socket.DoConnect(EndPoint endPointSnapshot, SocketAddress socketAddress)
at System.Net.ServicePoint.ConnectSocketInternal(Boolean connectFailure, Socket s4, Socket s6, Socket& socket, IPAddress& address, ConnectSocketState state, IAsyncResult asyncResult, Int32 timeout, Exception& exception)



It was rather strange to see web service calls being made to my old TFS server, which had been upgraded. So, I ran a repair of the TFS installation and, hurrah, the builds started running merrily.
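With the switches at level 4 the log grows quickly, and the interesting line is easy to miss. As a rough sketch (again not something I used at the time, and the function name is made up), this Python snippet pulls out every host and port that refused a connection, assuming the log lines look like the SocketException message in the excerpt above.

```python
import re
from collections import Counter

# Matches the tail of the SocketException message, for example:
#   "... actively refused it OldTFSIPAddress:8080 (type SocketException)"
REFUSED = re.compile(r"actively refused it (\S+):(\d+)")


def refused_targets(log_text: str) -> Counter:
    """Count refused connections per (host, port) found in the log text."""
    return Counter((host, int(port)) for host, port in REFUSED.findall(log_text))
```

If an address you thought was retired shows up in the result, something on the server is still pointing at the old installation.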

Now, I don’t know whether this is an upgrade issue or something peculiar to our team project that caused the error, but hopefully this post gives you an idea of how to resolve it. So, if you spot this issue:

1) Make sure that the build servers and controller can speak to each other. If the Test Connection button works, it is a good indication that there is no communication error.

2) Make sure that the proxy settings are correct on the TFS server for the user under whose context the TFS Build Agent is running.

3) If the error still exists, enable tracing on the TFSJobAgent service and see if it is logging any errors.

Apart from the above error, I also saw a case where the build would start but the build service could not find the build agent. Internally, it doesn’t start the build either.

4 comments:

santosh said...

Hi Hamid -- thank you so much for this detailed article I can not tell how much this helped me today.

our customers were so blocked with the jobagent failing to trigger with the exact same issue.

Once again thanks a lot -- keep sharing

Christer Romson Lande said...

I seem to have a similar problem. My builds often run, but sometimes I get the error: “TF215097: An error occurred while initializing a build for build definition \Obstetrix\Obstetrix Development 2.13.01.100: There was no endpoint listening at http://seuvy0016srv:9191/Build/v3.0/Services/Controller/4 that could accept the message. This is often caused by an incorrect address or SOAP action. See InnerException, if present, for more details.” and sometimes the build controller can’t find the build agent.

I tried to enable tracing in the TFSJobAgent service as you describe, but I don’t have any config files in the C:\Program Files\Microsoft Team Foundation Server 2010\Application Tier\TFSJobAgent\ directory. It’s empty except for a plugins subdirectory. The file’s on the build server not on the TFS server, right?

We have one machine running all the TFS tiers and another Hyper-V hosted virtual server running the build controller and build agent. These errors started once I cloned the build server. Now we have two build servers, each running a build controller and a build agent. The old one works as it always has, but the new one sometimes fails as described above.

Padda said...

Hi Christer,

Did you ever get to the bottom of this? I'm having a similar problem. Everything was working fine, until the Server Team updated all the windows patches last weekend. I now get this same behavior.

Was it a case of Add/Remove programs and repair Team Foundation Server?

Thanks

Christer Romson Lande said...

Nope, I still have the problem. Let me know if you solve it, please!