Skip to content

Commit

Permalink
Merge branch 'master' into feat-windows-beta
Browse files Browse the repository at this point in the history
  • Loading branch information
deepankarm committed Sep 1, 2021
2 parents 1b2b440 + c1ce3eb commit 9f3edf3
Show file tree
Hide file tree
Showing 172 changed files with 9,059 additions and 6,007 deletions.
375 changes: 2 additions & 373 deletions .github/2.0/cookbooks/CleanCode.md
Original file line number Diff line number Diff line change
@@ -1,381 +1,10 @@
# Cookbook on Clean & Efficient Code

Jina is designed as a lean and efficient framework. Solutions built on top of Jina also mean to be so. Here are some
tips to help you write beautiful and efficient code.

<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->



<!-- END doctoc generated TOC please keep comment here to allow auto update -->

1. `from jina import Document, DocumentArray, Executor, Flow, requests` is all you need. Copy-paste it as the first line of your code.

1. Use [Python generator](https://docs.python.org/3/glossary.html#term-generator) as the input to the Flow. Generator can lazily build `Document` one at a time, instead of building all at once. This can greatly speedup the overall performance and reduces the memory footprint.

<table>
<tr>
<td>
<b><center>✅ Do</center></b>
</td>
<td>
<b><center>😔 Don't</center></b>
</td>
</tr>
<tr>
<td>

```python
from jina import Flow, Document

def my_input():
for _ in range(1000):
yield Document()

f = Flow()
with f:
f.post('/', my_input)
```


</td>
<td>

```python
from jina import Flow, Document, DocumentArray

my_input = DocumentArray([Document() for _ in range(1000)])

f = Flow()
with f:
f.post('/', my_input)
```

</td>
</tr>
</table>

1. Set `request_size`. `request_size` decides how many Documents in each request. When combining with Generator, `request_size` determines how long will it take before sending the first request. You can change `request_size` to overlap the time of request generation and Flow computation.


<table>
<tr>
<td>
<b><center>✅ Do</center></b>
</td>
<td>
<b><center>😔 Don't</center></b>
</td>
</tr>
<tr>
<td>

```python
from jina import Flow, Document

def my_input():
for _ in range(1000):
# big document
yield Document()

f = Flow().add(uses=...) # heavy computation
with f:
f.post('/', my_input, request_size=10)
```


</td>
<td>

```python
from jina import Flow, Document

def my_input():
for _ in range(1000):
# big document
yield Document()

f = Flow().add(uses=...) # heavy computation
with f:
f.post('/', my_input, request_size=10000)
```

</td>
</tr>
</table>

1. No need to implement `__init__` if your `Executor` does not contain initial states.

<table>
<tr>
<td>
<b><center>✅ Do</center></b>
</td>
<td>
<b><center>😔 Don't</center></b>
</td>
</tr>
<tr>
<td>

```python
from jina import Executor

class MyExecutor(Executor):
def foo(self, **kwargs):
...
```
</td>
<td>

```python
from jina import Executor

class MyExecutor(Executor):
def __init__(**kwargs):
super().__init__(**kwargs)

def foo(self, **kwargs):
...
```

</td>
</tr>
</table>

1. Use `@requests` without specifying `on=` if your function mean to work on all requests. You can use it for catching all requests that are not for this Executor.

<table>
<tr>
<td>
<b><center>✅ Do</center></b>
</td>
<td>
<b><center>😔 Don't</center></b>
</td>
</tr>
<tr>
<td>

```python
from jina import Executor, requests

class MyExecutor(Executor):

@requests
def _skip_all(self, **kwargs):
print('default do sth')
```

</td>
<td>

```python
from jina import Executor

class MyExecutor(Executor):
@requests(on='/index')
def _skip_index(self, **kwargs):
pass

@requests(on='/search')
def _skip_search(self, **kwargs):
pass
```

</td>
</tr>
</table>

1. Fold unnecessary arguments into `**kwargs`, only get what you need.

<table>
<tr>
<td>
<b><center>✅ Do</center></b>
</td>
<td>
<b><center>😔 Don't</center></b>
</td>
</tr>
<tr>
<td>

```python
from jina import Executor, requests

class MyExecutor(Executor):

@requests
def foo_need_pars_only(self, parameters, **kwargs):
print(parameters)
```
</td>
<td>

```python
from jina import Executor, requests

class MyExecutor(Executor):

@requests
def foo_need_pars_only(self, docs, parameters, docs_matrix, groundtruths_matrix, **kwargs):
print(parameters)
```

</td>
</tr>
</table>

1. To debug an `Executor`, there is no need to use it in the Flow. Simply initiate it as an object and call its method.

<table>
<tr>
<td>
<b><center>✅ Do</center></b>
</td>
<td>
<b><center>😔 Don't</center></b>
</td>
</tr>
<tr>
<td>

```python
from jina import Executor, requests, DocumentArray, Document


class MyExec(Executor):

@requests
def foo(self, docs, **kwargs):
for d in docs:
d.text = 'hello world'


m = MyExec()
da = DocumentArray([Document(text='test')])
m.foo(da)
print(da)
```

</td>
<td>

```python
from jina import Executor, requests, DocumentArray, Document, Flow


class MyExec(Executor):

@requests
def foo(self, docs, **kwargs):
for d in docs:
d.text = 'hello world'


da = DocumentArray([Document(text='test')])

with Flow().add(uses=MyExec) as f:
f.post('/', da, on_done=print)
```

</td>
</tr>
</table>

1. Send `parameters` only request to a Flow if you don't need `docs`.

<table>
<tr>
<td>
<b><center>✅ Do</center></b>
</td>
<td>
<b><center>😔 Don't</center></b>
</td>
</tr>
<tr>
<td>

```python
from jina import Executor, Flow, requests

class MyExecutor(Executor):

@requests
def foo_need_pars_only(self, parameters, **kwargs):
print(parameters)

f = Flow().add(uses=MyExecutor)

with f:
f.post('/foo', parameters={'hello': 'world'})
```

</td>
<td>

```python
from jina import Executor, Flow, Document, requests

class MyExecutor(Executor):

@requests
def foo_need_pars_only(self, parameters, **kwargs):
print(parameters)

f = Flow().add(uses=MyExecutor)

with f:
f.post('/foo', inputs=Document(), parameters={'hello': 'world'})
```

</td>
</tr>
</table>

1. Add `Chunks` to root `Document`, do not create them in one line to keep recursive document structure correct. This is because `chunks` use `ref_doc` to control its `granularity`, at `chunk` creation time, it didn't know anything about its parent, and will get a wrong `granularity` value.
This cookbook is deprecated. The content is revised and moved to https://docs.jina.ai/fundamentals/clean-code/

<table>
<tr>
<td>
<b><center>✅ Do</center></b>
</td>
<td>
<b><center>😔 Don't</center></b>
</td>
</tr>
<tr>
<td>

```python
from jina import Document

root_document = Document(text='i am root')
# add one chunk to root
root_document.chunks.append(Document(text='i am chunk 1'))
root_document.chunks.extend([
Document(text='i am chunk 2'),
Document(text='i am chunk 3'),
]) # add multiple chunks to root
```

</td>
<td>

```python
from jina import Document

root_document = Document(
text='i am root',
chunks=[
Document(text='i am chunk 2'),
Document(text='i am chunk 3'),
])
```

</td>
</tr>
</table>
If you want to contribute to this part of the doc, please commit to https://github.com/jina-ai/jina/blob/master/docs/fundamentals/clean-code.md
Loading

0 comments on commit 9f3edf3

Please sign in to comment.