[lucenerdd] save index to hdfs #159

Open: wants to merge 33 commits into base: develop

zouzias (Owner) commented Mar 14, 2019

WIP

Tries to fix #147

Anastasios Zouzias and others added 17 commits March 11, 2019 22:07
* Remove unused code (#141)

* Revert "Setting version to 0.3.4-SNAPSHOT"

This reverts commit 2f1d7be.

* README: update to 0.3.3

* README: fix javadoc badge

* remove unused param

* [sbt] version updates

* [conf] allow not_analyzed string fields (#145)

* [not-analyzed-fields] do not analyzed fields ending with _notanalyzed

* [hotfix] fixes issue 150

* [tests] issue 150

* fix typo

* [blockEntityLinkage] drop queryPartColumns
* [linkage] block linker with => Query

* [linkage] block linker is Row => Query

* remove Query analyzer on methods
zouzias force-pushed the feature/save_index_to_hdfs branch from e355378 to 4f6f0a1 on March 15, 2019 21:43
zouzias and others added 8 commits March 15, 2019 23:17
* [analyzers] custom analyzer

* test return null

* [travis] travis_wait 1 min

* Revert "[travis] travis_wait 1 min"

This reverts commit c79456e.

* use lucene examples

* custom analyzer return null

* fix java reflection

* add docs
* [lucene] upgrade to version 8.0.0

* [lucene] remove ngram analyzer

* delete ngram analyzer

* minor fix

* add scaladoc
yeikel (Contributor) commented Apr 8, 2019

@zouzias Do you need any support with this?

zouzias (Owner, Author) commented Apr 9, 2019

Hi @yeikel, if you want to work on the issue, feel free to continue the PR.

The missing items are:

  • From every executor, copy the local index folder to HDFS, i.e., to /path_in_hdfs/partition_id/lucene_index.
  • Implement a LuceneRDD constructor that reads the precomputed index back from HDFS.
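
For anyone picking this up, here is a rough sketch of the two missing items, not the PR's actual implementation. It assumes each partition's index already sits in a local directory on the executor and uses Hadoop's `FileSystem` API for the copy. The object name `IndexToHdfsSketch`, its method names, and the `localIndexDir` / `hdfsRoot` parameters are hypothetical and not part of LuceneRDD.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Hypothetical helper object; none of these names exist in spark-lucenerdd.
object IndexToHdfsSketch {

  /** Destination of one partition's index: <hdfsRoot>/<partitionId>/lucene_index */
  def partitionIndexPath(hdfsRoot: String, partitionId: Int): Path =
    new Path(new Path(hdfsRoot, partitionId.toString), "lucene_index")

  /** Copies a local index directory to its per-partition location on HDFS.
    * Assumes the destination does not exist yet; if it does, Hadoop may nest
    * the source directory inside it instead of replacing it.
    */
  def copyIndexToHdfs(localIndexDir: String, hdfsRoot: String,
                      partitionId: Int, conf: Configuration): Path = {
    val dst = partitionIndexPath(hdfsRoot, partitionId)
    val fs = dst.getFileSystem(conf)
    fs.mkdirs(dst.getParent) // create <hdfsRoot>/<partitionId> if missing
    // delSrc = false keeps the local copy, overwrite = true replaces existing files
    fs.copyFromLocalFile(false, true, new Path(localIndexDir), dst)
    dst
  }

  /** Reverse direction: fetches a partition's precomputed index to a local directory. */
  def copyIndexFromHdfs(hdfsRoot: String, partitionId: Int,
                        localIndexDir: String, conf: Configuration): Unit = {
    val src = partitionIndexPath(hdfsRoot, partitionId)
    val fs = src.getFileSystem(conf)
    // delSrc = false leaves the index on HDFS untouched
    fs.copyToLocalFile(false, src, new Path(localIndexDir))
  }
}
```

On an executor, the partition id could come from Spark's `TaskContext.getPartitionId()`, and the `Configuration` would likely need to be created inside the task closure since it is not serializable. How the local index directory is located per partition is left open here.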

zouzias added 3 commits April 9, 2019 22:36
* [issue_163] per field analysis

* [sbt] update scalatest to 3.0.7

* [issue_163] fix docs; order of arguments

* fixes on ShapeLuceneRDD

* [issue_163] fix test

* issue_163: minor fix

* introduce LuceneRDDParams case class

* fix apply in LuceneRDDParams

* [issue_163] remove duplicate apply defn

* add extra LuceneRDD.apply
[issue_165] throw runtime exception; handle multi-valued fields in DataFrames
* [refactor] configuration loading

* [travis] code hygiene
* WIP

* fix tests

* remove SparkDoc class

* make test compile

* use GenericRowWithSchema

* tests: getDouble score

* score is a float

* fix casting issue with Seq[String]

* tests: LuceneDocToSparkRowpec

* tests: LuceneDocToSparkRowpec

* more tests

* LuceneDocToSparkRowpec: more tests

* LuceneDocToSparkRowpec: fix tests

* LuceneDocToSparkRow: fix Number type inference

* LuceneDocToSparkRowpec: fix tests

* implicits: remove StoredField for Numeric types

* implicits: revert remove StoredField for Numeric types

* fix more tests

* fix more tests

* [tests] fix LuceneRDDResponse .toDF()

* fix multivalued fields

* fix score type issue

* minor

* stored fields for numerics

* hotfix: TextField must be stored using StoredField

* hotfix: stringToDocument implicit

* link issue 179

* fix tests

* remove _.toRow() calls

* fix compile issue
yeikel (Contributor) commented Apr 24, 2019

Any chance you can resolve the merge conflicts here?

zouzias (Owner, Author) commented Apr 25, 2019

> Any chance you can resolve the merge conflicts here?

Done
