Arbitrary indexing error (XMLStreamException, Message: null)


Jens Görisch

Monday 31 August 2009 7:11:30 am

Hello,

As the title implies, I am having problems indexing into the eZ Find Solr index.

First, I want to make clear that I am not having problems with the eZ Find indexing script, or at least I have not checked whether this error also occurs with that script.

Explanation:
We are using a data model that is ezContentObject-compliant but more lightweight. To index it, we use the eZ Find schema file, since the core fields are the same.
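For readers unfamiliar with the XML packets that get posted to Solr, here is a minimal sketch of how documents from such a model might be serialized into a Solr XML `<add>` message. It is written in Python rather than the PHP used by eZ Find, and the field names in the usage example (`id`, `title`, `tags`) are hypothetical placeholders, not the real eZ Find schema fields.

```python
import xml.etree.ElementTree as ET

def build_add_packet(docs):
    """Serialize a list of {field_name: value} dicts into a Solr XML <add> message.

    Sketch only: field names are whatever the caller passes in (hypothetical here),
    and all values are converted to strings.
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # Multi-valued fields are written as repeated <field> elements.
            values = value if isinstance(value, (list, tuple)) else [value]
            for v in values:
                field = ET.SubElement(doc_el, "field", name=name)
                # ElementTree escapes &, < and > when serializing text.
                field.text = str(v)
    return ET.tostring(add, encoding="unicode")
```

Usage: `build_add_packet([{"id": "obj-1", "title": "A & B"}])` returns a single `<add>` string containing one `<doc>` with two `<field>` elements, with the ampersand escaped as `&amp;`.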

This model holds ~8,300 objects, which are indexed twice so that we can switch between the "searchable index" and the "indexing index". eZ Publish has 97,800 objects indexed, so the index holds more than 100k documents in total (roughly 97,800 + 2 × 8,300 ≈ 114,400). I did not notice this error with smaller indexes.

Now to the error itself:
During indexing, the update process sometimes fails ("sometimes" meaning a few XML packets, not a few whole indexing runs). Solr returns an empty response, and the log file contains the following entry:

SEVERE: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[4004,3038]
Message: null
	at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:586)
	at org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:321)
	at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:195)
	at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:123)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
	at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
	at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
	at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
	at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
	at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
	at org.mortbay.jetty.Server.handle(Server.java:285)
	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
	at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
	at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
	at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

I dumped the XML data and checked the reported position. It was always an ordinary character, though a different one each time. xmllint also reports the dumped XML as valid. Occasionally no error occurs at all and indexing succeeds.
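The check described above can be sketched as follows: look up the character at the [row,col] position reported in the Solr log, and confirm that the dumped packet is well-formed. This is a Python sketch with `xml.etree.ElementTree` standing in for xmllint; it assumes row and column are 1-based and counted in characters after XML line-ending normalization, which may not match exactly how the Xerces StAX reader counts columns.

```python
import xml.etree.ElementTree as ET

def char_at(xml_text, row, col, context=20):
    """Return (character, surrounding snippet) at a 1-based [row,col] position.

    Assumption: columns count characters, and line endings are normalized
    the way XML 1.0 does it (\r\n and \r become \n).
    """
    normalized = xml_text.replace("\r\n", "\n").replace("\r", "\n")
    lines = normalized.split("\n")
    line = lines[row - 1]
    idx = col - 1
    return line[idx], line[max(0, idx - context): idx + context + 1]

def is_well_formed(xml_text):
    """True if the text parses as XML (a rough stand-in for `xmllint --noout`)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False
```

For the error above, one would call `char_at(dump, 4004, 3038)` on the dumped packet to see the character and its neighbours.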

I've found a temporary workaround: simply retrying the affected packets until <i>eZSolrBase::addDocs()</i> returns <i>true</i> (up to 3 attempts). Strangely, the <b>same</b> XML works on the second or third try.
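The retry workaround could look roughly like this. It is a Python sketch of the control flow only: `add_docs` is a hypothetical callable standing in for the PHP method <i>eZSolrBase::addDocs()</i>, and it assumes "up to 3" means three attempts in total.

```python
def add_docs_with_retry(add_docs, packet, max_attempts=3):
    """Call add_docs(packet) until it returns True or the attempts run out.

    add_docs is a hypothetical stand-in for eZSolrBase::addDocs().
    Returns the (1-based) attempt that succeeded, or 0 if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        if add_docs(packet):
            return attempt
    return 0
```

Returning the attempt number instead of a plain boolean makes it easy to log how often a packet only went through on the second or third try.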

Has anybody run into similar problems? And has anyone perhaps already found a (real) solution, and the reason behind it?

Thanks in advance,

Jens Görisch
