org.apache.solr.handler.dataimport
Class EntityProcessorBase

java.lang.Object
  extended by org.apache.solr.handler.dataimport.EntityProcessor
      extended by org.apache.solr.handler.dataimport.EntityProcessorBase
Direct Known Subclasses:
FileListEntityProcessor, LineEntityProcessor, MailEntityProcessor, PlainTextEntityProcessor, SqlEntityProcessor, TikaEntityProcessor, XPathEntityProcessor

public class EntityProcessorBase
extends EntityProcessor

Base class for all implementations of EntityProcessor.

Most implementations of EntityProcessor extend this base class, which provides common functionality.

This API is experimental and subject to change.

Since:
solr 1.3
Version:
$Id: EntityProcessorBase.java 1071595 2011-02-17 12:32:39Z rmuir $

Field Summary
static String ABORT
           
static String CACHE_KEY
           
static String CACHE_LOOKUP
           
protected  String cachePk
          Only used by cache implementations
protected  String cacheVariableName
          Only used by cache implementations
protected  Map<String,Map<Object,List<Map<String,Object>>>> cacheWithWhereClause
          Only used by cache implementations
protected  Context context
           
static String CONTINUE
           
protected  List<Map<String,Object>> dataSourceRowCache
           
protected  String entityName
           
protected  boolean isFirstInit
           
static String ON_ERROR
           
protected  String onError
           
protected  String query
           
protected  Iterator<Map<String,Object>> rowIterator
           
protected  Map<String,List<Map<String,Object>>> simpleCache
          Only used by cache implementations
static String SKIP
           
static String SKIP_DOC
           
static String TRANSFORM_ROW
           
static String TRANSFORMER
           
protected  List<Transformer> transformers
           
 
Constructor Summary
EntityProcessorBase()
           
 
Method Summary
protected  void cacheInit()
          Only used by cache implementations
 void destroy()
          Invoked for each parent-row after the last row for this entity is processed.
protected  void firstInit(Context context)
          First-time init call.
protected  List<Map<String,Object>> getAllNonCachedRows()
          Get all the rows from the datasource for the given query.
protected  Map<String,Object> getFromRowCacheTransformed()
           
protected  Map<String,Object> getIdCacheData(String query)
          If the where clause is present, the cache is a Map of SQL query vs. Map of key vs. List of rows.
protected  Map<String,Object> getNext()
           
protected  Map<String,Object> getSimpleCacheData(String query)
          If the where clause is not present, the cache is a Map of query vs. List of rows.
 void init(Context context)
          This method is called when it starts processing an entity.
 Map<String,Object> nextDeletedRowKey()
          This is used during delta-import.
 Map<String,Object> nextModifiedParentRowKey()
          This is used during delta-import.
 Map<String,Object> nextModifiedRowKey()
          This is used for delta-import.
 Map<String,Object> nextRow()
          For a simple implementation, this is the only method that the sub-class should implement.
 
Methods inherited from class org.apache.solr.handler.dataimport.EntityProcessor
close, postTransform
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

isFirstInit

protected boolean isFirstInit

entityName

protected String entityName

context

protected Context context

rowIterator

protected Iterator<Map<String,Object>> rowIterator

transformers

protected List<Transformer> transformers

query

protected String query

onError

protected String onError

cachePk

protected String cachePk
Only used by cache implementations


cacheVariableName

protected String cacheVariableName
Only used by cache implementations


simpleCache

protected Map<String,List<Map<String,Object>>> simpleCache
Only used by cache implementations


cacheWithWhereClause

protected Map<String,Map<Object,List<Map<String,Object>>>> cacheWithWhereClause
Only used by cache implementations


dataSourceRowCache

protected List<Map<String,Object>> dataSourceRowCache

TRANSFORMER

public static final String TRANSFORMER
See Also:
Constant Field Values

TRANSFORM_ROW

public static final String TRANSFORM_ROW
See Also:
Constant Field Values

ON_ERROR

public static final String ON_ERROR
See Also:
Constant Field Values

ABORT

public static final String ABORT
See Also:
Constant Field Values

CONTINUE

public static final String CONTINUE
See Also:
Constant Field Values

SKIP

public static final String SKIP
See Also:
Constant Field Values

SKIP_DOC

public static final String SKIP_DOC
See Also:
Constant Field Values

CACHE_KEY

public static final String CACHE_KEY
See Also:
Constant Field Values

CACHE_LOOKUP

public static final String CACHE_LOOKUP
See Also:
Constant Field Values
Constructor Detail

EntityProcessorBase

public EntityProcessorBase()
Method Detail

init

public void init(Context context)
Description copied from class: EntityProcessor
This method is called when processing of an entity starts. It is called again each time processing returns to the entity, so any state can be reset at that point. For the root-most entity, it is called only once per ingestion. For sub-entities, it is called once for each row from the parent entity.

Specified by:
init in class EntityProcessor
Parameters:
context - The current context
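
The call pattern described above can be illustrated with a small stand-alone sketch. It does not use the Solr classes: `LifecycleSketch`, its counters, and the `runImport` driver are hypothetical names introduced only to show that a root entity would be initialised once while a sub-entity would be initialised once per parent row.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in that only counts lifecycle calls; not the Solr class.
class LifecycleSketch {
    int initCalls = 0;
    int destroyCalls = 0;

    // A real init(Context) would reset per-run state here.
    void init() { initCalls++; }

    void destroy() { destroyCalls++; }

    // Hypothetical driver: root initialised once, child once per parent row.
    static void runImport(LifecycleSketch root, LifecycleSketch child, List<String> parentRows) {
        root.init();
        for (String parentRow : parentRows) {
            child.init();
            // ... the child's rows for parentRow would be read here ...
            child.destroy();
        }
        root.destroy();
    }

    public static void main(String[] args) {
        LifecycleSketch root = new LifecycleSketch();
        LifecycleSketch child = new LifecycleSketch();
        runImport(root, child, Arrays.asList("p1", "p2", "p3"));
        System.out.println(root.initCalls + " " + child.initCalls); // prints: 1 3
    }
}
```

Because a sub-entity is re-initialised for every parent row, anything that must happen only once belongs in firstInit rather than init.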

firstInit

protected void firstInit(Context context)
First-time init call. Do one-time operations here.


getNext

protected Map<String,Object> getNext()

nextModifiedRowKey

public Map<String,Object> nextModifiedRowKey()
Description copied from class: EntityProcessor
This is used for delta-import. It gives the primary keys of the changed rows in this entity.

Specified by:
nextModifiedRowKey in class EntityProcessor
Returns:
the pk vs value of all changed rows

nextDeletedRowKey

public Map<String,Object> nextDeletedRowKey()
Description copied from class: EntityProcessor
This is used during delta-import. It gives the primary keys of the rows that are deleted from this entity. If this entity is the root entity, the Solr document is deleted. If this is a sub-entity, the Solr document is considered 'changed' and will be recreated.

Specified by:
nextDeletedRowKey in class EntityProcessor
Returns:
the pk vs value of all deleted rows

nextModifiedParentRowKey

public Map<String,Object> nextModifiedParentRowKey()
Description copied from class: EntityProcessor
This is used during delta-import. This gives the primary keys and their values of all the rows changed in a parent entity due to changes in this entity.

Specified by:
nextModifiedParentRowKey in class EntityProcessor
Returns:
the pk vs value of all changed rows in the parent entity

nextRow

public Map<String,Object> nextRow()
For a simple implementation, this is the only method that the sub-class should implement. This is intended to stream rows one-by-one. Return null to signal the end of rows.

Specified by:
nextRow in class EntityProcessor
Returns:
a row where the key is the name of the field and value can be any Object or a Collection of objects. Return null to signal end of rows
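
The streaming contract above (one row per call, then null) can be sketched without the Solr classes. The following is a minimal stand-in, not the real EntityProcessorBase: `NextRowSketch`, `drain`, and `row` are hypothetical names, and a real implementation would instead extend EntityProcessorBase and override nextRow().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in that mimics only the nextRow() contract;
// it does not extend the real Solr EntityProcessorBase.
class NextRowSketch {
    private final Iterator<Map<String, Object>> rowIterator;

    NextRowSketch(List<Map<String, Object>> rows) {
        this.rowIterator = rows.iterator();
    }

    // Returns one row per call, then null once the rows are exhausted.
    Map<String, Object> nextRow() {
        return rowIterator.hasNext() ? rowIterator.next() : null;
    }

    // Drains a processor the way a caller of nextRow() might.
    static List<Map<String, Object>> drain(NextRowSketch processor) {
        List<Map<String, Object>> out = new ArrayList<>();
        Map<String, Object> row;
        while ((row = processor.nextRow()) != null) {
            out.add(row);
        }
        return out;
    }

    // Builds a row keyed by field name; the field names are illustrative.
    static Map<String, Object> row(String id, Object title) {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put("id", id);
        r.put("title", title);
        return r;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(row("1", "first"));
        rows.add(row("2", Arrays.asList("a", "b"))); // a value may be a Collection
        System.out.println(drain(new NextRowSketch(rows)).size());
    }
}
```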

destroy

public void destroy()
Description copied from class: EntityProcessor
Invoked for each parent-row after the last row for this entity is processed. If this is the root-most entity, it will be called only once in the import, at the very end.

Specified by:
destroy in class EntityProcessor

cacheInit

protected void cacheInit()
Only used by cache implementations


getIdCacheData

protected Map<String,Object> getIdCacheData(String query)
If the where clause is present, the cache is a Map of SQL query vs. Map of key vs. List of rows. Only used by cache implementations.

Parameters:
query - the query string for which cached data is to be returned
Returns:
the cached row corresponding to the given query after all variables have been resolved
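
The nested cache shape described here (query, then key, then rows) matches the generic type of the cacheWithWhereClause field and can be illustrated with plain collections. This is a sketch only: `IdCacheSketch`, `put`, and `lookup` are hypothetical names and do not reflect how Solr populates or reads the cache.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of a query -> key -> rows cache; not Solr code.
class IdCacheSketch {
    // Same generic shape as the cacheWithWhereClause field.
    private final Map<String, Map<Object, List<Map<String, Object>>>> cache = new HashMap<>();

    // Files a row under its query and key value, creating levels as needed.
    void put(String query, Object key, Map<String, Object> row) {
        cache.computeIfAbsent(query, q -> new HashMap<>())
             .computeIfAbsent(key, k -> new ArrayList<>())
             .add(row);
    }

    // Returns the rows for the query and key, or null when nothing is cached.
    List<Map<String, Object>> lookup(String query, Object key) {
        Map<Object, List<Map<String, Object>>> byKey = cache.get(query);
        return byKey == null ? null : byKey.get(key);
    }

    public static void main(String[] args) {
        IdCacheSketch sketch = new IdCacheSketch();
        Map<String, Object> row = new HashMap<>();
        row.put("id", 7);
        sketch.put("select * from item", 7, row);
        System.out.println(sketch.lookup("select * from item", 7));
    }
}
```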

getAllNonCachedRows

protected List<Map<String,Object>> getAllNonCachedRows()

Get all the rows from the datasource for the given query. Only used by cache implementations.

This must be implemented by sub-classes which intend to provide a cached implementation.

Returns:
the list of all rows fetched from the datasource.

getSimpleCacheData

protected Map<String,Object> getSimpleCacheData(String query)
If the where clause is not present, the cache is a Map of query vs. List of rows. Only used by cache implementations.

Parameters:
query - string for which cached row is to be returned
Returns:
the cached row corresponding to the given query
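
A query-keyed cache of this shape can be sketched with plain collections. The example below is an assumption-laden illustration, not the Solr implementation: `SimpleCacheSketch`, `rowsFor`, the `loads` counter, and the `loader` function (standing in for a data source call such as getAllNonCachedRows()) are all hypothetical, and the load-on-first-request behaviour is assumed rather than taken from this page.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a query-keyed row cache; not the Solr implementation.
class SimpleCacheSketch {
    // query -> rows; same generic shape as the simpleCache field.
    private final Map<String, List<Map<String, Object>>> simpleCache = new HashMap<>();
    // Stand-in for the data source; assumed, not part of the documented API.
    private final Function<String, List<Map<String, Object>>> loader;
    int loads = 0;

    SimpleCacheSketch(Function<String, List<Map<String, Object>>> loader) {
        this.loader = loader;
    }

    // Returns the rows for the query, loading them on the first request only.
    List<Map<String, Object>> rowsFor(String query) {
        List<Map<String, Object>> rows = simpleCache.get(query);
        if (rows == null) {
            loads++;
            rows = new ArrayList<>(loader.apply(query));
            simpleCache.put(query, rows);
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("id", 1);
        SimpleCacheSketch cache = new SimpleCacheSketch(q -> Collections.singletonList(row));
        cache.rowsFor("select * from x");
        cache.rowsFor("select * from x");
        System.out.println(cache.loads); // prints 1
    }
}
```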

getFromRowCacheTransformed

protected Map<String,Object> getFromRowCacheTransformed()


Copyright © 2000-2011 Apache Software Foundation. All Rights Reserved.