Commit

Ensure consistent use of "computation" and "state computation"
"Computation" and "state computation" have technical
meanings in the context of Wallaroo core concepts and APIs.
This commit updates the book and other forms of documentation
to use "state computation" whenever we are referring to
state computation.

Closes #1941
jtfmumm authored and SeanTAllen committed Jan 15, 2018
1 parent 06d4c16 commit 0f716f8
Showing 27 changed files with 63 additions and 65 deletions.
10 changes: 5 additions & 5 deletions book/core-concepts/partitioning.md
@@ -1,6 +1,6 @@
# Partitioning

If all of the application state exists in one state object then only one computation at a time can access that state object. In order to leverage concurrency, that state needs to be divided into multiple distinct state objects. Wallaroo can then automatically distribute these objects in a way that allows them to be accessed by computations in parallel.
If all of the application state exists in one state object then only one state computation at a time can access that state object. In order to leverage concurrency, that state needs to be divided into multiple distinct state objects. Wallaroo can then automatically distribute these objects in a way that allows them to be accessed by state computations in parallel.

For example, in an application that keeps track of stock prices, the application state might be a dictionary where the stock symbol is used to look up the price of the stock.

@@ -34,7 +34,7 @@ func (s *Stocks) Set(symbol string, price float64) {
}
{%- endcodetabs %}

If a message came into the system with a new stock price, the computation would take that message, get the symbol and the price, and use them to update the state.
If a message came into the system with a new stock price, the state computation would take that message, get the symbol and the price, and use them to update the state.

{% codetabs name="Python", type="py" -%}
@wallaroo.state_computation("update stock")
@@ -57,14 +57,14 @@ func (us *UpdateStock) Compute(data interface{}, state interface{}) (interface{}
{%- endcodetabs %}


However, only one computation may access the state at a time, so in this case messages are handled one at a time.
However, only one state computation may access the state at a time, so in this case messages are handled one at a time.

If we could break the state into pieces and tell Wallaroo about those pieces then we could process many messages concurrently. In the example, each stock could be broken out into its own piece of state. This is possible because in the model the price of each stock is independent of the price of any other stock, so modifying one has no effect on any of the others.

## State Partitioning

Wallaroo supports parallel execution by way of _state partitioning_. The state is broken up into distinct parts, and Wallaroo manages access to each part so that they can be accessed in parallel.
To do this, a _partition function_ is used to determine which _state part_ a particular data should be applied to. Once the _part_ is determined, the data and the associated _state part_ are given to a Computation to perform the update logic.
To do this, a _partition function_ is used to determine which _state part_ a particular data should be applied to. Once the _part_ is determined, the data and the associated _state part_ are given to a state computation to perform the update logic.

### Partitioned State

@@ -82,7 +82,7 @@ type Stock struct {
}
{%- endcodetabs %}

Since the computation only has one stock in its state now, there is no need to do a dictionary look up. Instead, the computation can update the particular Stock's state right away.
Since the state computation only has one stock in its state now, there is no need to do a dictionary look up. Instead, the state computation can update the particular Stock's state right away.

{% codetabs name="Python", type="py" -%}
@wallaroo.state_computation(name="update stock")
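Reviewer note on the `partitioning.md` hunks above: the wording change matters because only state computations are serialized per state object. A minimal sketch of that idea follows, in plain Python rather than the Wallaroo API. The names `Stock`, `partition`, `update_stock`, and `run` are hypothetical, and the per-part lock is only an assumption standing in for however Wallaroo actually schedules access.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Stock:
    """One state part: the price of a single stock."""

    def __init__(self, symbol):
        self.symbol = symbol
        self.price = 0.0
        self.lock = threading.Lock()  # guards this state part only


def partition(message):
    """Hypothetical partition function: route a message by stock symbol."""
    return message["symbol"]


def update_stock(message, stock):
    """Hypothetical state computation: it sees only its own state part."""
    stock.price = message["price"]
    return (stock.symbol, stock.price)


def run(messages, parts):
    """Apply each message to the state part chosen by the partition function."""

    def handle(message):
        stock = parts[partition(message)]
        with stock.lock:  # one state computation per state part at a time
            return update_stock(message, stock)

    # Messages for different state parts can be handled concurrently.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(handle, messages))


parts = {symbol: Stock(symbol) for symbol in ("AAPL", "GOOG", "MSFT")}
messages = [
    {"symbol": "AAPL", "price": 171.5},
    {"symbol": "GOOG", "price": 1102.2},
    {"symbol": "MSFT", "price": 88.4},
]
results = run(messages, parts)
```

With a single shared dictionary as the state, the same lock would cover every symbol and the messages would be handled one at a time. Splitting the state by symbol is what lets the three updates proceed independently.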
11 changes: 5 additions & 6 deletions book/cpp/api/application.md
@@ -64,20 +64,19 @@ is responsible for building the computation that will be added.
### `Application *to_stateful(StateComputation *state_computation_, StateBuilder *state_builder_, const char* state_name_)`
Add a stateful computation to the current pipeline. The
`state_builder_` builds the state that will be used by the
Add a state computation to the current pipeline. The
`state_builder_` builds the state that will be used by the state
computation. state_name_ is the name of the collection of state objects
that we will run computations against. You can share state partitions across
pipelines by using the same name. Using different names for different
that we will run state computations against. You can share state partitions across pipelines by using the same name. Using different names for different
partitions, keeps them separate and in this way, acts as a sort of namespace.
### `Application *to_state_partition(StateComputation *state_computation_, StateBuilder *state_builder_, const char* state_name_, Partition *partition_, bool multi_worker_)`
Add a partitioned stateful computation to the current pipeline.
Add a partitioned state computation to the current pipeline.
### `Application *to_state_partition_u64(StateComputation *state_computation_, StateBuilder *state_builder_, const char* state_name_, PartitionU64 *partition_, bool multi_worker_)`
Add a partitioned stateful computation that uses a 64-bit integer partitioning key to the current pipeline.
Add a partitioned state computation that uses a 64-bit integer partitioning key to the current pipeline.
### `Application *to_sink(SinkEncoder *sink_encoder_)`
2 changes: 1 addition & 1 deletion book/cpp/api/state-builder.md
@@ -5,7 +5,7 @@ A state builder is responsible for creating state objects.
Two states with the same name are considered the same, while two
states with different names, even if they return the same type of
state object, are considered different. This allows two different
stateful computations to use the same state object if they have the
state computations to use the same state object if they have the
same name, while also allowing two different state objects of the same
type to be used in different places if they have different names.

2 changes: 1 addition & 1 deletion book/cpp/api/state-change.md
@@ -30,7 +30,7 @@ public:
### `virtual const char *name()`

This method returns the name of the state change object. This name is
used to look up the state change object inside stateful computations.
used to look up the state change object inside state computations.

### `virtual void apply(State *state_)`

4 changes: 2 additions & 2 deletions book/cpp/api/state-computation.md
@@ -22,7 +22,7 @@ used for logging and reporting.

#### `virtual void *compute(Data *input_, StateChangeRepository *state_change_repository_, void* state_change_respository_helper_, State *state_, void *none)`

This method performs a computation using the `input_` message data
This method performs a state computation using the `input_` message data
and state from the state object. The `state_change_repository_` and
`state_change_repository_helper_` are used to look up state change
objects.
@@ -35,7 +35,7 @@ the example application in [`examples/cpp/alphabet-cpp`](https://github.com/Wall
#### `virtual size_t get_number_of_state_change_builders()`

This method returns the number of state change builders associated
with this computation.
with this state computation.

#### `virtual StateChangeBuilder *get_state_change_builder(size_t idx_)`

2 changes: 1 addition & 1 deletion book/cpp/api/state.md
@@ -1,6 +1,6 @@
# State

A state object represents state that is passed into a stateful
A state object represents state that is passed into a state
computation. The application developer is responsible for subclassing
this class and adding fields and methods for dealing with whatever
data is appropriate to the state.
2 changes: 1 addition & 1 deletion book/cpp/api/user-functions.md
@@ -17,7 +17,7 @@ Objects of the following classes must be supported in the
`w_user_data_deserialize(...)` function:

* data passed as messages between steps (`wallaroo::Data`)
* stateful computations (`wallaroo::StateComputation`)
* state computations (`wallaroo::StateComputation`)
* sink encoders (`wallaroo::SinkEncoder`)
* keys (`wallaroo::Key`)
* state change builders (`wallaroo::StateChangeBuilder`)
4 changes: 2 additions & 2 deletions book/cpp/best-practices.md
@@ -8,11 +8,11 @@ Decoding is the process of taking data from the outside world and bringing it in

The systems with which your application communicates will most likely have their own data formats; these formats may contain fields that are not needed for your Wallaroo application. So while the source decoder may need to know about all of these fields, the serialization and deserialization methods only need be concerned with the information that is actually being used. By the same token, the sink decoder may need to write to a system that uses a format that is not the same as the one used by the serialization and deserialization code.

Another thing to keep in mind is that while encoding and decoding involve only message data, a Wallaroo application needs to be able to handle serialization and deserialization for more than just message data; it must also handle stateful computations, sink encoders, keys, and state change builders. Therefore, the serialization and deserialization formats must include information about the type of object that is being represented.
Another thing to keep in mind is that while encoding and decoding involve only message data, a Wallaroo application needs to be able to handle serialization and deserialization for more than just message data; it must also handle state computations, sink encoders, keys, and state change builders. Therefore, the serialization and deserialization formats must include information about the type of object that is being represented.

## Data Should Only Enter an Application From a Source

Wallaroo is designed around the idea of streaming data processing. Any state that is required by an application should be stored in a state object. And state objects should only be updated in response to incoming events. Accessing something like a database or a web service from inside a computation will have a performance impact on the whole application. If you need data from another system, you should stream that data into Wallaroo and store it in a state object so that it can then be used by a stateful computation.
Wallaroo is designed around the idea of streaming data processing. Any state that is required by an application should be stored in a state object. And state objects should only be updated in response to incoming events. Accessing something like a database or a web service from inside a state computation will have a performance impact on the whole application. If you need data from another system, you should stream that data into Wallaroo and store it in a state object so that it can then be used by a state computation.

## Do Not Store Mutable Data In Global Variables

14 changes: 7 additions & 7 deletions book/cpp/sample-application.md
@@ -122,9 +122,9 @@ LetterState::LetterState(): m_letter(' '), m_count(0)
}
```

In this case, the state stores a letter and the number of votes it has received. When a state computation is run, the state associated with that computation is passed to the computation. Our application partitions state, so there is one state object for each partition (remember that the partitions are for the letter a to z). So in this case each state object only has to keep track of the information for one letter, not all of the letters.
In this case, the state stores a letter and the number of votes it has received. When a state computation is run, its associated state is passed to the state computation. Our application partitions state, so there is one state object for each partition (remember that the partitions are for the letter a to z). So in this case each state object only has to keep track of the information for one letter, not all of the letters.

The application developer does not allocate state objects directly. Instead, the application developer provides a state builder that Wallaroo calls when it is ready to create a state object. In addition to creating a state object, the state builder also stores a name for the state. Wallaroo uses the name to determine if more than one state computation wants to use the same state object; if two builders return the same name then only one state object is generated and it is used by both computations.
The application developer does not allocate state objects directly. Instead, the application developer provides a state builder that Wallaroo calls when it is ready to create a state object. In addition to creating a state object, the state builder also stores a name for the state. Wallaroo uses the name to determine if more than one state computation wants to use the same state object; if two builders return the same name then only one state object is generated and it is used by both state computations.

Here's our state builder:

@@ -257,15 +257,15 @@ The `name` method returns the name of the state computation.

The `compute` method does the actual work. It gets the state change handle (this is required by Wallaroo) by calling `w_state_change_repository_lookup_by_name` (if no state change handle can be found then it returns a pointer to `None`, which can be tested by comparing it to the `none` pointer that is passed into the method). It then uses this to get the state change object by calling `w_state_change_get_state_change_object`. It updates the state change object using the `Votes` message data and creates a new outgoing `LetterTotal` message object. Note that the `LetterTotal` object adds the count from the `Votes` data to the count in the state data, because the state object does not get updated until after the `compute` message is run. Finally, the `compute` uses the `w_stateful_computation_get_return` function to generate the return value using the outgoing `LetterTotal` message data and the state change handle.

State functions have zero or more state change builders associated with them. Each state change builder can build a state change, which represents a way of updating the state object. Wallaroo uses the `get_number_of_state_change_builders` method to determine how many state change builders are associated with the computation, and then calls the `get_state_change_builder` method to get each state change builder. In our case there is only one state change builder.
State functions have zero or more state change builders associated with them. Each state change builder can build a state change, which represents a way of updating the state object. Wallaroo uses the `get_number_of_state_change_builders` method to determine how many state change builders are associated with the state computation, and then calls the `get_state_change_builder` method to get each state change builder. In our case there is only one state change builder.

State computations are passed to the worker that processes a given state partition, so they must implement the [Serializable](serialization.md) interface.

### State Changes

State objects are used by state computations, but they are not directly updated within the state computation. Instead, computations return state change objects which Wallaroo later applies to the state. The state change object also provides methods that are used to write state changes to the recovery log and read them from the log during recovery.
State objects are used by state computations, but they are not directly updated within the state computation. Instead, state computations return state change objects which Wallaroo later applies to the state. The state change object also provides methods that are used to write state changes to the recovery log and read them from the log during recovery.

If a new state change object was created each time a computation was run then there could be a negative performance impact on the system. In order to avoid that, state change objects are created once and then reused. They are accessed through the state change repository, which can be used to look up a state change handle and the associated state change.
If a new state change object was created each time a state computation was run then there could be a negative performance impact on the system. In order to avoid that, state change objects are created once and then reused. They are accessed through the state change repository, which can be used to look up a state change handle and the associated state change.

Our application only has one state change. It increases the vote count for a letter. Here's our application's state change:

@@ -349,7 +349,7 @@ uint64_t AddVotesStateChange::id()
}
```
Our state change object has an `update` method that is used to set the data that will be used to update the state. After the computation completes, Wallaroo calls the `apply` method to apply the stored
Our state change object has an `update` method that is used to set the data that will be used to update the state. After the state computation completes, Wallaroo calls the `apply` method to apply the stored
change to the state.
State change objects are not created directly by the developer, instead a state change builder object creates the state change object at some point. Our application's state change builder looks like this:
@@ -465,7 +465,7 @@ namespace SerializationTypes
const uint8_t Votes = 1;
const uint8_t LetterTotal = 2;
// stateful computations
// state computations
const uint8_t AddVotes = 3;
// state change builders
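Reviewer note on the "State Changes" hunks in `sample-application.md` above: the text describes a state computation that reads state but leaves the update to a reusable state change object, which the runtime applies afterwards. A minimal sketch of that flow follows, in plain Python and not the Wallaroo C++ API. `LetterState` and `AddVotesStateChange` borrow names from the sample application, but these Python classes and the `compute` function are hypothetical stand-ins, and the loop plays the role of the runtime.

```python
class LetterState:
    """State object: one letter and the votes it has received."""

    def __init__(self, letter):
        self.letter = letter
        self.count = 0


class AddVotesStateChange:
    """Hypothetical reusable state change holding a pending vote count."""

    def __init__(self):
        self.votes = 0

    def update(self, votes):
        # Record the data that will later be used to update the state.
        self.votes = votes

    def apply(self, state):
        # Called after the state computation completes.
        state.count += self.votes


def compute(votes, state, state_change):
    """Hypothetical state computation: reads the state, never mutates it."""
    state_change.update(votes)
    # The state has not been updated yet, so the outgoing total
    # adds the incoming votes to the current count.
    total = state.count + votes
    return total, state_change


state = LetterState("a")
change = AddVotesStateChange()  # created once, reused for every message

totals = []
for votes in (3, 4):
    total, pending = compute(votes, state, change)
    pending.apply(state)  # the runtime applies the change afterwards
    totals.append(total)
```

Reusing one `AddVotesStateChange` instance mirrors the point in the diffed text that allocating a new state change object per message could hurt performance.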
4 changes: 2 additions & 2 deletions book/cpp/serialization.md
@@ -1,11 +1,11 @@
# C++ Serialization

Wallaroo makes it possible to distribute computation across multiple workers. In order to do this, objects must be serialized and deserialized at various points. Wallaroo controls when objects are serialized and deserialized, but the programmer is left with a great deal of control over how this is done.
Wallaroo makes it possible to distribute computations across multiple workers. In order to do this, objects must be serialized and deserialized at various points. Wallaroo controls when objects are serialized and deserialized, but the programmer is left with a great deal of control over how this is done.

## Classes That Support Serialization

* data passed as messages between steps (`wallaroo::Data`)
* stateful computations (`wallaroo::StateComputation`)
* state computations (`wallaroo::StateComputation`)
* sink encoders (`wallaroo::SinkEncoder`)
* keys (`wallaroo::Key`)
* state change builders (`wallaroo::StateChangeBuilder`)
6 changes: 3 additions & 3 deletions book/go/api/api.md
@@ -80,7 +80,7 @@ Add a partitioned state computation that only returns one message to the pipelin

##### `ToStatePartitionMulti(stateComputation wallarooapi.StateComputationMulti, stateBuilder wallarooapi.StateBuilder, stateName string, partitionFunction wa.PartitionFunction, partitions []uint64) *pipelineBuilder`

Similar to `ToStatePartition`, but the computation can return more than one message.
Similar to `ToStatePartition`, but the state computation can return more than one message.

##### `ToSink(sinkConfig SinkConfig)`

@@ -410,7 +410,7 @@ The first return value is a message that we will send on to our next step. It sh

Why wouldn't we always return `true`? There are two answers:

1. Your computation might not have updated the state, in which case saving its state for recovery is wasteful.
1. Your state computation might not have updated the state, in which case saving its state for recovery is wasteful.
2. You might only want to save after some changes. Saving your state can be expensive for large objects. There's a tradeoff that can be made between performance and safety.

#### Example `StateComputation`
@@ -440,7 +440,7 @@ A `StateComputationMulti` is similar to a `StateComputation`, except that its `C

##### `Name() string`

Return the name of the computation as a string.
Return the name of the state computation as a string.

##### ` Compute(data interface{}, state interface{}) ([]interface {}, bool)`

2 changes: 1 addition & 1 deletion book/go/api/interworker-serialization-and-resilience.md
@@ -66,7 +66,7 @@ func Serialize(c interface{}) []byte {
}
```

The switch statement handles all of the types that need to be serialized. (Remember, the Reverse application does not use stateful computations. Cases for partition functions, state computations, and state objects do not appear in this example).
The switch statement handles all of the types that need to be serialized. (Remember, the Reverse application does not use state computations. Cases for partition functions, state computations, and state objects do not appear in this example).

* decoders -- the `Decoder` class
* encoders -- the `Encoder` class