deepset-ai · tholor · Sep 27, 2021 · Sep 22, 2021 · Sep 22, 2021 · Sep 22, 2021
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -16,3 +16,105 @@ Please give a concise description in the first comment in the PR that includes:
 - What are limitations?
 - Breaking changes (Example of before vs. after)
 - Link the issue that this relates to
+
+## Running tests
+
+### CI
+Tests will automatically run in our CI for every commit you push to your PR. This is the most convenient way for you and we encourage you to create early "WIP Pull requests".
+
+### Local
+However, you can also run the tests locally by executing pytest in your terminal from the `/test` folder.
+
+#### Running all tests
+**Important**: If you want to run **all** tests locally, you'll need **all** document stores running in the background before you run the tests.
+Many of the tests will then be executed multiple times with different document stores.
+
+You can launch them like this:
+```
+docker run -d -p 9200:9200 -e "discovery.type=single-node" -e "ES_JAVA_OPTS=-Xms128m -Xmx128m" elasticsearch:7.9.2
+docker run -d -p 19530:19530 -p 19121:19121 milvusdb/milvus:1.1.0-cpu-d050721-5e559c
+docker run -d -p 8080:8080 --name haystack_test_weaviate --env AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED='true' --env PERSISTENCE_DATA_PATH='/var/lib/weaviate' semitechnologies/weaviate:1.7.0
+docker run -d -p 7200:7200 --name haystack_test_graphdb deepset/graphdb-free:9.4.1-adoptopenjdk11
+docker run -d -p 9998:9998 -e "TIKA_CHILD_JAVA_OPTS=-JXms128m" -e "TIKA_CHILD_JAVA_OPTS=-JXmx128m" apache/tika:1.24.1
+```
+Then run all tests:
+```
+cd test
+pytest
+```
+
+#### Recommendation: Running a subset of tests
+In most cases you rather want to run a **subset of tests** locally that are related to your dev:
+
+The most important option to reduce the number of tests in a meaningful way, is to shrink the "test grid" of document stores.
+This is possible by adding the `--doc_store_type` arg to your pytest command. Possible values are: `"elasticsearch, faiss, memory, milvus, weaviate"`.
+For example, calling `pytest . --doc_store_type="memory"` will run all tests that can be run with the InMemoryDocumentStore, i.e.: 
+- all the tests that we typically run on the whole "document store grid" will only be run for InMemoryDocumentStore
+- any test that is specific to other document stores (e.g. elasticsearch) and is not supported by the chosen document store will be skipped (and marked in the logs accordingly)
+
+
+Run tests that are possible for a **selected document store**. The InMemoryDocument store is a very good starting point as it doesn't require any of the external docker containers from above: 
+```
+pytest . --doc_store_type="memory"
+```
+Run tests using a **combination of document stores**:
+```
+pytest . --doc_store_type="memory,elasticsearch"
+```
+*Note: You will need to launch the elasticsearch container here as described above'*
+
+Just run **one individual test**:
+```
+pytest -v test_retriever.py::test_dpr_embedding
+```
+Select a **logical subset of tests** via markers and the optional "not" keyword:
+```
+pytest -m not elasticsearch
+pytest -m elasticsearch
+pytest -m generator
+pytest -m tika
+pytest -m not slow
+...
+```
+
+
+## Writing tests
+
+If you are writing a test that depend on a document store, there are a few conventions to define on which document store type this test should/can run:
+
+### Option 1: The test should run on all document stores / those supplied in the CLI arg `--doc_store_type`: 
+Use one of the fixtures `document_store` or `document_store_with_docs` or `document_store_type`.
+Do not parameterize it yourself. 
+
+Example:
+```
+def test_write_with_duplicate_doc_ids(document_store):
+        ...
+        document_store.write(docs)
+        ....
+
+```
+
+### Option 2: The test is only compatible with certain document stores: 
+Some tests you don't want to run on all possible document stores. Either because the test is specific to one/few doc store(s) or the test is not really document store related and it's enough to test it on one document store and speed up the execution time.
+
+Example:
+```
+# Currently update_document_meta() is not implemented for InMemoryDocStore so it's not listed here as an option
+
+@pytest.mark.parametrize("document_store", ["elasticsearch", "faiss"], indirect=True)
+def test_update_meta(document_store):
+    ....
+``` 
+
+### Option 3: The test is not using a `document_store`/ fixture, but still has a hard requirement for a certain document store:
+
+Example:
+```
+@pytest.mark.elasticsearch
+def test_elasticsearch_custom_fields(elasticsearch_fixture):
+    client = Elasticsearch()
+    client.indices.delete(index='haystack_test_custom', ignore=[404])
+    document_store = ElasticsearchDocumentStore(index="haystack_test_custom", text_field="custom_text_field",
+                                                embedding_field="custom_embedding_field")
+``` 
diff --git a/README.md b/README.md
@@ -558,41 +558,11 @@ You will find the Swagger API documentation at
 
 ## :heart: Contributing
 
-We are very open to the community's contributions - be it a quick fix of a typo, or a completely new feature! You don't need to be a Haystack expert to provide meaningful improvements. To avoid any extra work on either side, please check our [Contributor Guidelines](https://github.com/deepset-ai/haystack/blob/master/CONTRIBUTING.md) first.
+We are very open to the community's contributions - be it a quick fix of a typo, or a completely new feature! You don't need to be a Haystack expert to provide meaningful improvements. 
+To avoid any extra work on either side, please check our [Contributor Guidelines](https://github.com/deepset-ai/haystack/blob/master/CONTRIBUTING.md) first.
+You can also find instructions to run the tests locally there.
+
 
 We'd also like to invite you to our Slack community channels. Please join [here](https://haystack.deepset.ai/community/join)!
 
-#### Tests
-Tests will automatically run in our CI for every commit you push to your PR. You can also run them locally by executing pytest in your terminal from the root folder of this repository.
-If you want to run **all** tests locally, you'll need **all** document stores running in the background.
-You can launch them like this:
-```
-docker run -d -p 9200:9200 -e "discovery.type=single-node" -e "ES_JAVA_OPTS=-Xms128m -Xmx128m" elasticsearch:7.9.2
-docker run -d -p 19530:19530 -p 19121:19121 milvusdb/milvus:1.1.0-cpu-d050721-5e559c
-docker run -d -p 8080:8080 --name haystack_test_weaviate --env AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED='true' --env PERSISTENCE_DATA_PATH='/var/lib/weaviate' semitechnologies/weaviate:1.7.0
-docker run -d -p 7200:7200 --name haystack_test_graphdb deepset/graphdb-free:9.4.1-adoptopenjdk11
-docker run -d -p 9998:9998 -e "TIKA_CHILD_JAVA_OPTS=-JXms128m" -e "TIKA_CHILD_JAVA_OPTS=-JXmx128m" apache/tika:1.24.1
-```
-Then run all tests:
-```
-cd test
-pytest
-```
-To just run individual tests:
-```
-pytest -v test_retriever.py::test_dpr_embedding
-```
-To just select a logical subset of tests via markers and the optional "not" keyword:
-```
-pytest -m not elasticsearch
-pytest -m elasticsearch
-pytest -m generator
-pytest -m tika
-pytest -m not slow
-...
-```
-If you want to run all test cases but not with all document store variants, you can make use of the arg `--document_store_type`:
-```
-pytest -v test_retriever.py::test_dpr_embedding --document_store_type="faiss"
-pytest -v test_retriever.py::test_dpr_embedding --document_store_type="faiss, memory"
-```
+
diff --git a/docs/_src/api/api/document_store.md b/docs/_src/api/api/document_store.md
@@ -1536,6 +1536,9 @@ Usage:
 1. Start a Weaviate server (see https://www.semi.technology/developers/weaviate/current/getting-started/installation.html)
 2. Init a WeaviateDocumentStore in Haystack
 
+Limitations:
+The current implementation is not supporting the storage of labels, so you cannot run any evaluation workflows.
+
 <a name="weaviate.WeaviateDocumentStore.__init__"></a>
 #### \_\_init\_\_
 

diff --git a/haystack/document_store/weaviate.py b/haystack/document_store/weaviate.py
@@ -29,6 +29,9 @@ class WeaviateDocumentStore(BaseDocumentStore):
     Usage:
     1. Start a Weaviate server (see https://www.semi.technology/developers/weaviate/current/getting-started/installation.html)
     2. Init a WeaviateDocumentStore in Haystack
+
+    Limitations:
+    The current implementation is not supporting the storage of labels, so you cannot run any evaluation workflows.
     """
 
     def __init__(

diff --git a/test/conftest.py b/test/conftest.py
@@ -36,26 +36,32 @@
 
 
 def pytest_addoption(parser):
-    parser.addoption("--document_store_type", action="store", default="all")
+    parser.addoption("--document_store_type", action="store", default="elasticsearch, faiss, memory, milvus, weaviate")
 
 
 def pytest_generate_tests(metafunc):
+    # Get selected docstores from CLI arg
+    document_store_type = metafunc.config.option.document_store_type
+    selected_doc_stores = [item.strip() for item in document_store_type.split(",")]
+
     # parametrize document_store fixture if it's in the test function argument list
     # but does not have an explicit parametrize annotation e.g
     # @pytest.mark.parametrize("document_store", ["memory"], indirect=False)
     found_mark_parametrize_document_store = False
     for marker in metafunc.definition.iter_markers('parametrize'):
-        if 'document_store' in marker.args[0]:
+        if 'document_store' in marker.args[0] or 'document_store_with_docs' in marker.args[0] or 'document_store_type' in marker.args[0]:
             found_mark_parametrize_document_store = True
             break
-
+    # for all others that don't have explicit parametrization, we add the ones from the CLI arg
     if 'document_store' in metafunc.fixturenames and not found_mark_parametrize_document_store:
-        document_store_type = metafunc.config.option.document_store_type
-        if "all" in document_store_type:
-            document_store_type = "elasticsearch, faiss, memory, milvus"
-
-        document_store_types = [item.strip() for item in document_store_type.split(",")]
-        metafunc.parametrize("document_store", document_store_types, indirect=True)
+        # TODO: Remove the following if-condition once weaviate is fully compliant
+        # Background: Currently, weaviate is not fully compliant (e.g. "_" in "meta_field", problems with uuids ...)
+        # Therefore, we have separate tests in test_weaviate.py and we don't want to parametrize our generic
+        # tests (e.g. in test_document_store.py) with the weaviate fixture. However, we still need the weaviate option
+        # in the CLI arg as we want to skip test_weaviate.py if weaviate is not selected from CLI
+        if "weaviate" in selected_doc_stores:
+            selected_doc_stores.remove("weaviate")
+        metafunc.parametrize("document_store", selected_doc_stores, indirect=True)
 
 
 def _sql_session_rollback(self, attr):
@@ -76,8 +82,11 @@ def _sql_session_rollback(self, attr):
 SQLDocumentStore.__getattribute__ = _sql_session_rollback
 
 
-def pytest_collection_modifyitems(items):
+def pytest_collection_modifyitems(config,items):
     for item in items:
+
+        # add pytest markers for tests that are not explicitly marked but include some keywords
+        # in the test name (e.g. test_elasticsearch_client would get the "elasticsearch" marker)
         if "generator" in item.nodeid:
             item.add_marker(pytest.mark.generator)
         elif "summarizer" in item.nodeid:
@@ -95,6 +104,15 @@ def pytest_collection_modifyitems(items):
         elif "weaviate" in item.nodeid:
             item.add_marker(pytest.mark.weaviate)
 
+        # if the cli argument "--document_store_type" is used, we want to skip all tests that have markers of other docstores
+        # Example: pytest -v test_document_store.py --document_store_type="memory" => skip all tests marked with "elasticsearch"
+        document_store_types_to_run = config.getoption("--document_store_type")
+        for cur_doc_store in ["elasticsearch", "faiss", "sql", "memory", "milvus", "weaviate"]:
+            if cur_doc_store in item.keywords and cur_doc_store not in document_store_types_to_run:
+                skip_docstore = pytest.mark.skip(
+                    reason=f'{cur_doc_store} is disabled. Enable via pytest --document_store_type="{cur_doc_store}"')
+                item.add_marker(skip_docstore)
+
 
 @pytest.fixture(scope="session")
 def elasticsearch_fixture():
@@ -271,7 +289,7 @@ def test_docs_xs():
     return [
         # current "dict" format for a document
         {"text": "My name is Carla and I live in Berlin", "meta": {"meta_field": "test1", "name": "filename1"}},
-        # meta_field at the top level for backward compatibility
+        # metafield at the top level for backward compatibility
         {"text": "My name is Paul and I live in New York", "meta_field": "test2", "name": "filename2"},
         # Document object for a doc
         Document(text="My name is Christelle and I live in Paris", meta={"meta_field": "test3", "name": "filename3"})
@@ -288,6 +306,7 @@ def reader_without_normalized_scores():
         use_confidence_scores=False
     )
 
+
 @pytest.fixture(params=["farm", "transformers"], scope="module")
 def reader(request):
     if request.param == "farm":
@@ -401,7 +420,7 @@ def get_retriever(retriever_type, document_store):
     return retriever
 
 
-@pytest.fixture(params=["elasticsearch", "faiss", "memory", "sql", "milvus"])
+@pytest.fixture(params=["elasticsearch", "faiss", "memory", "milvus"])
 def document_store_with_docs(request, test_docs_xs):
     document_store = get_document_store(request.param)
     document_store.write_documents(test_docs_xs)

diff --git a/test/pytest.ini b/test/pytest.ini
@@ -9,4 +9,4 @@ markers =
     pipeline: marks tests with pipeline
     summarizer: marks summarizer tests
     weaviate: marks tests that require weaviate container
-    vector_dim: marks usage of document store with non-default embedding dimension (e.g @pytest.mark.vector_dim(128))
+    vector_dim: marks usage of document store with non-default embedding dimension (e.g @pytest.mark.vector_dim(128))