-
-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Migrate attachments from IndexedDB to file system #2129
Conversation
14aef88
to
55b7350
Compare
d8d3199
to
a677b31
Compare
55b7350
to
bde8436
Compare
0aef6a2
to
47df1d5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Annotated PR for your convenience, @scottnonnenberg-signal 😄
!js/views/settings_view.js | ||
!js/backup.js | ||
!js/database.js |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just alphabetized it (using Sublime Sort Lines
) and found some dupes.
const toArrayBuffer = require('to-arraybuffer'); | ||
|
||
|
||
const PATH = 'attachments.noindex'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the piece that helps prevent indexing on macOS.
|
||
const prefix = name.slice(0, 2); | ||
return path.join(prefix, name); | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies, this file was written before we had our discussion about module.exports
so I haven’t converted it yet. Also, it’s nice to have =>
for some short functions without return
which would require to have function
keyword.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries, I'll amend my previous comment: exports.xxxx
is totally reasonable since export xxx
is the future. :0)
I'll get my preferred order in files when we move to that via export function xxx
.
js/models/conversations.js
Outdated
@@ -656,10 +657,13 @@ | |||
profileKey = storage.get('profileKey'); | |||
} | |||
|
|||
const loadData = Attachment.loadData(migrationContext.readAttachmentData); | |||
const attachmentsWithData = | |||
await Promise.all(messageWithSchema.attachments.map(loadData)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to surgically add places where we have to upgrade messages or load attachment metadata. I’d appreciate you thinking about places that I might have missed (and are therefore not in this PR) based on your extensive experience with the codebase — thanks 👀🙇
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are only a few potential sources which could create a new message:
- incoming, from contact
- incoming, sent to contact but from your own device
- import
- sending, created locally
There are a lot of places where 'sync' messages are sent, but those will never have attachments. Though perhaps you do want to upgrade their schema as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, that helps. I think I got (1) onMessageReceived
, (2) onSentMessage
, (4) Conversation::sendMessage
. Let me look at (3) imported messages (Update: On hold until I get the app startup migration working.)
Re: sync messages. Good point. Indeed, right now they don’t have attachments but we may have other schema upgrades in the future. I’ll see if I can fit this in here, or at least do Message::initializeSchema
so they’ll be picked up by our background task in the future.
initialize: function(attributes) { | ||
if (_.isObject(attributes)) { | ||
this.set(TypedMessage.initializeSchemaVersion(attributes)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had to add if
guard for when we call new Whisper.Message()
.
@@ -427,6 +444,7 @@ | |||
} | |||
} | |||
message.set({ | |||
schemaVersion : dataMessage.schemaVersion, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a place we have to look out for when adding new properties. Haven’t had time to think about how to make this less likely to be forgotten in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about in the constructor of the Message
object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added Signal.Types.Message.initializeSchema
to Whisper.Message::initialize
constructor. But sadly that won’t cover handleDataMessage
as it mutates Whisper.Message
through this set
call. In my experience, code is much easier to reason about when the constructor is the only place to construct a data type and it ensures the data type is valid after successful execution (Make Invalid States Impossible to Represent).
In fact, when you work with immutable data types, there simply is no such a thing as a set
after the fact to mutate an existing ‘instance’. Yes, you can have set
later but it returns a new data type. This means required fields (non-Maybe
s) have to be set in the constructor. Functions that do validation and data type construction are often called smart constructors. This article gives a good overview: https://github.com/paf31/purescript-book/blob/cf0b510052495e884059667e0585156cf68e354b/text/chapter14.md#smart-constructors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm. Maybe it would make sense to use a different technique to go from message
to dataMessage
? What keys don't we want to come through? What did you see when you spent some time with this part of the code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, ideally we’d only use constructors as the sole way to create an instance of a model so there is a single point to determine what data is valid vs not. Right now we have a mixture of new Whisper.Message(…)
and later various set
calls which makes it harder to reason about the final state of Message
. For specifics, I’d have to spend more time with it but at a high-level I see a few problematic patterns:
- We treat incoming, outgoing, verified-change, group update, end session, etc. messages all the same, i.e. as
Whisper.Message
. I understand we want to render them at the same time and persist them as a union (all properties flat and explicittype
field) inmessages
store at the storage level, but it’d be nice to make a distinction at the app level. This would allow us to break downhandleDataMessage
into shorter functions, e.g.handleIncomingMessage
,handleOutgoingMessage
,handleVerifiedChangeMessage
, etc. Happy to share how this would be modeled if we had support for union types. - We interweave pure construction of the data type as well as side-effects such as
Message::save
andConversation::save
. It’d be nice to break the two apart so we could test the side-effect free data type construction using unit tests.
What keys don't we want to come through?
For Whisper.Message
specifically we seem to be doing fairly well to pick data vs rejecting it. Going through my own database, I found dataMessage: null
in type: 'outgoing'
which seems to be astray. Other than that, it’d be simply nice if there was a single new Whisper.Message(…)
call that showed all the properties we initialize with.
|
||
// hasData :: Attachment -> Boolean | ||
exports.hasData = attachment => | ||
attachment.data instanceof ArrayBuffer || ArrayBuffer.isView(attachment.data); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One example where module.exports
would make the function longer. But not a huge deal 😄
BTW, this definition of data
is what window.crypto.subtle
expects for encryption.
|
||
const attachmentWithoutData = Object.assign({}, attachment, { path }); | ||
delete attachmentWithoutData.data; | ||
return attachmentWithoutData; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Grr, forgot to use omit
again. Can fix it before merging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not a big deal, I'm just honestly surprised that it's not second-nature for you! :0)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn’t start using Lodash until recently, that’s why. Deleting fields from a data type is also not something you do in statically typed languages 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed it. Thanks for the tip. This is one example where yes, we could say that comment is not worth it but I appreciate you having made it and I learned something new 😄
'`attachment.data` must be an `ArrayBuffer` or `ArrayBufferView`; got: ' + | ||
typeof attachment.data | ||
)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This helped improve our error messaging in case of problems and highlights the error sooner.
package.json
Outdated
"spectron": "^3.8.0" | ||
"spectron": "^3.8.0", | ||
"string-to-arraybuffer": "^1.0.0", | ||
"tmp": "^0.0.33" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I guess tmp
is now not needed since you added it to dependencies
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed it from devDependencies
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, great work. Some design things to think about, some questions about dependencies. And a strong recommendation to avoid ` in strings unless we really need it. :0)
// getPath :: AbsolutePath -> AbsolutePath | ||
exports.getPath = (userDataPath) => { | ||
if (!isString(userDataPath)) { | ||
throw new TypeError('`userDataPath` must be a string'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm. The use of ` to denote a name is bit confusing when scanning this text, since that character is used for strings as well. I don't thing we get enough value out of it here to justify that cost.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you clarify what the particular concern is? The rationale is to use the Markdown syntax we’re so used to as developers for denoting identifiers. I feel '\'userDataPath\' must be a string
is noisier and just 'userDataPath must be a string'
can get confusing once we have more generic arguments such as 'data must be a string'
, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Backticks are harder for me to scan and understand. It's a twitch thing, and makes the code take more time to parse. IMO, markdown syntax is only useful when it gets converted to markdown.
I'm not really worried about a confusing error message like 'data must be a string' because we'll have the stack and therefore line numbers to help us out with addressing the error. If we're really looking for clarity, we might want to put the name of the method in the error.
js/models/conversations.js
Outdated
@@ -656,10 +657,13 @@ | |||
profileKey = storage.get('profileKey'); | |||
} | |||
|
|||
const loadData = Attachment.loadData(migrationContext.readAttachmentData); | |||
const attachmentsWithData = | |||
await Promise.all(messageWithSchema.attachments.map(loadData)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are only a few potential sources which could create a new message:
- incoming, from contact
- incoming, sent to contact but from your own device
- import
- sending, created locally
There are a lot of places where 'sync' messages are sent, but those will never have attachments. Though perhaps you do want to upgrade their schema as well?
js/models/messages.js
Outdated
const attachments = this.get('attachments'); | ||
const deleteData = | ||
Attachment.deleteData(migrationContext.deleteAttachmentData); | ||
await Promise.all(attachments.map(deleteData)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As the last line of a function, how do we decide to use return
or await
? Is it because you explicitly don't want to make the results of those deleteData()
calls available to any caller?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I want onDestroy
to have return type void
. I’ll add an explicit return;
to make it clear 👍
@@ -427,6 +444,7 @@ | |||
} | |||
} | |||
message.set({ | |||
schemaVersion : dataMessage.schemaVersion, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about in the constructor of the Message
object?
js/models/messages.js
Outdated
this.revokeImageUrl(); | ||
const attachments = this.get('attachments'); | ||
const deleteData = | ||
Attachment.deleteData(migrationContext.deleteAttachmentData); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In terms of API, I would like to have the final deleteData
available to me here as an end-user, instead of needing to assemble it with context
. I can see why we arrange things this way at declaration, keeping things pure, keeping things easy to test and compose. But I think we could simplify things here by having things be pre-assembled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, that’d be nice. I will have to think about how to best expose it since the regular upgrade pipeline just needs the writeAttachmentData
function on its own.
test/app/attachments_test.js
Outdated
@@ -0,0 +1,105 @@ | |||
const fse = require('fs-extra'); | |||
const path = require('path'); | |||
const stringToArrayBuffer = require('string-to-arraybuffer'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks to me like we're pulling this in for a very simple function:
function str2ab (str) {
var array = new Uint8Array(str.length);
for(var i = 0; i < str.length; i++) {
array[i] = str.charCodeAt(i);
}
return array.buffer
}
I feel like the risk we take on in adding another dependency (even if a dev dependency) is not worthwhile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
|
||
const prefix = name.slice(0, 2); | ||
return path.join(prefix, name); | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries, I'll amend my previous comment: exports.xxxx
is totally reasonable since export xxx
is the future. :0)
I'll get my preferred order in files when we move to that via export function xxx
.
@@ -77,6 +78,7 @@ | |||
"spellchecker": "^3.4.4", | |||
"testcheck": "^1.0.0-rc.2", | |||
"tmp": "^0.0.33", | |||
"to-arraybuffer": "^1.0.1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you talk about the need for this dependency?
Unlike string-to-arraybuffer
, at least this function is has a nice performance optimization we probably wouldn't have written otherwise: https://github.com/jhiesey/to-arraybuffer/blob/6502d9850e70ba7935a7df4ad86b358fc216f9f0/index.js#L3-L27
|
||
const attachmentWithoutData = Object.assign({}, attachment, { path }); | ||
delete attachmentWithoutData.data; | ||
return attachmentWithoutData; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not a big deal, I'm just honestly surprised that it's not second-nature for you! :0)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review and the valuable feedback, especially pointing out where we create, send, and receive messages.
Re: to-arraybuffer
. I need that to convert from Node.js Buffer
(fs.readFile
) to ArrayBuffer
used in our UI.
js/models/conversations.js
Outdated
@@ -656,10 +657,13 @@ | |||
profileKey = storage.get('profileKey'); | |||
} | |||
|
|||
const loadData = Attachment.loadData(migrationContext.readAttachmentData); | |||
const attachmentsWithData = | |||
await Promise.all(messageWithSchema.attachments.map(loadData)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, that helps. I think I got (1) onMessageReceived
, (2) onSentMessage
, (4) Conversation::sendMessage
. Let me look at (3) imported messages (Update: On hold until I get the app startup migration working.)
Re: sync messages. Good point. Indeed, right now they don’t have attachments but we may have other schema upgrades in the future. I’ll see if I can fit this in here, or at least do Message::initializeSchema
so they’ll be picked up by our background task in the future.
@scottnonnenberg-signal Added the following changes based on feedback: https://github.com/signalapp/Signal-Desktop/pull/2129/files/47df1d52a4005362cba339cfe24f1041c12dec2a..9d2706c59e18585f9d1ddd4e134aee24b4d749e9 |
642f376
to
ead5d04
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! That attachment API is now much cleaner at all the call sites. :0)
Allows us to use `fs` with promises. Turns out it’s already a transitive dependency.
Useful for creating random temporary directories for testing.
Used for testing attachment data migration.
Using hashes, we get the benefit of deduplication but if a user receives two messages with the same attachment, deleting one would delete it for both since they are only stored once. To avoid the complexity of tracking number of references, we simply generate random file names similar to iMessage on MacOS (?) and Signal Android.
We already use `tmp`.
Allows us to pass in Electron/Node.js specific functions. This allows better unit testing in Mocha where we don’t have access to Electron APIs.
Prevent double rendering of attachments by multiple entries into `MessageView::render` using promises.
Our tests rely on that.
This way we can avoid an extra dependency.
Makes `migrationContext` obsolete.
The function is simple enough to inline and this allows us to reduce our dependencies surface area.
a86ac5d
to
7eaa6ef
Compare
- [x] Generate random file names. - [x] Generate random file paths that prevent too many files per folder using fan-out. - [x] Create attachment directory in user data folder. - [x] Investigate operating system file indexing on: - [x] Windows: Confirmed that `AppData` is not indexed by default. - [x] macOS: Confirmed that `~/Library` files are not indexed by default. Searching system files using Spotlight requires multi-step opt-in: https://lifehacker.com/5711409/how-to-search-for-hidden-packaged-and-system-files-in-os-x. More info https://apple.stackexchange.com/a/92785. Added `.noindex` suffix to `attachments` folder. - [x] Linux: n/a - [x] Save incoming attachment files to disk - [x] On received - [x] On sync - [x] Save outgoing attachments files to disk before sending - [x] Display attachments either from disk or memory in attachment view. Be robust to multiple render passes. - [x] Test that missing attachment on disk doesn’t break app. Behavior: Message is displayed without attachment. - [x] Delete attachment files when message is deleted. Relates to #1589.
Rebased (no changes) and merged (before CI). Will address any unlikely CI issues after all three PRs are in |
Update to electron 1.8.4 (#2186) Migrate all attachments from IndexedDB to file system in the background (#2208, #2193, #2165, #2162, #2129) Save attachments to disk when importing Chrome app export (#2212) Fixed: Read receipts setting would not be synchronized along with re-link (#2218) Fixed: Clicking conversation in left pane when already selected would remove focus on message composition field - thanks @colefranz! (#2032) Fixed: Searching for the phone number of an existing contact, then selecting 'start conversation' would erase contact details (#2191) Fixed: Selecting Settings menu option multiple times would open multiple instances of settings view - thanks @navdeepsinghkhalsa (#2167) Dev: - Relax Node.js version requirements (#2203) - Fix a few typos in documentation - thanks @Vinnl (#2171) - Update issue template to mention that translation should be via Transifex (#2157)
Update to electron 1.8.4 (signalapp#2186) Migrate all attachments from IndexedDB to file system in the background (signalapp#2208, signalapp#2193, signalapp#2165, signalapp#2162, signalapp#2129) Save attachments to disk when importing Chrome app export (signalapp#2212) Fixed: Read receipts setting would not be synchronized along with re-link (signalapp#2218) Fixed: Clicking conversation in left pane when already selected would remove focus on message composition field - thanks @colefranz! (signalapp#2032) Fixed: Searching for the phone number of an existing contact, then selecting 'start conversation' would erase contact details (signalapp#2191) Fixed: Selecting Settings menu option multiple times would open multiple instances of settings view - thanks @navdeepsinghkhalsa (signalapp#2167) Dev: - Relax Node.js version requirements (signalapp#2203) - Fix a few typos in documentation - thanks @Vinnl (signalapp#2171) - Update issue template to mention that translation should be via Transifex (signalapp#2157)
v1.7.0-beta.3 Update to electron 1.8.4 (#2186) Migrate all attachments from IndexedDB to file system in the background (#2208, #2193, #2165, #2162, #2129) Save attachments to disk when importing Chrome app export (#2212) Fixed: Read receipts setting would not be synchronized along with re-link (#2218) Fixed: Clicking conversation in left pane when already selected would remove focus on message composition field - thanks @colefranz! (#2032) Fixed: Searching for the phone number of an existing contact, then selecting 'start conversation' would erase contact details (#2191) Fixed: Selecting Settings menu option multiple times would open multiple instances of settings view - thanks @navdeepsinghkhalsa (#2167) Dev: - Relax Node.js version requirements (#2203) - Fix a few typos in documentation - thanks @Vinnl (#2171) - Update issue template to mention that translation should be via Transifex (#2157)
Note: this release is equivalent to v1.7.0-beta.3. The changes listed below are compared to the previous production release, v1.6.1. Update to electron 1.8.4 (#2186) Migrate all attachments from IndexedDB to file system in the background (#2208, #2193, #2165, #2162, #2129) Save attachments to disk when importing Chrome app export (#2212) New option in settings: delete all application data (383e02e, #2144 and #2153) Remove all configuration in database when we discover we are unlinked (9acb189 and 1c6d91b) Delete everything in database when we link with a different phone number from previous link (9acb189) Windows: Delete all data on uninstall (c855597) Fixed: Read receipts setting would not be synchronized along with re-link (#2218) Fixed: Clicking conversation in left pane when already selected would remove focus on message composition field - thanks @colefranz! (#2032) Fixed: Searching for the phone number of an existing contact, then selecting 'start conversation' would erase contact details (#2191) Fixed: Selecting Settings menu option multiple times would open multiple instances of settings view - thanks @navdeepsinghkhalsa (#2167) Dev: - Redact file paths in anything that goes to the log on disk (#2110) - When top-level process errors happen, don't show dialog with stack trace (#2110) - Add `nsp` to CI runs (fd056e1) - Add eslint-plugin-mocha to disallow exclusive tests using *.only (#2110) - Preparation for encrypted backups (cea42bd) - Updates to structure of exported data - messages.zip, flat list of attachments (6d8f4b7) - Relax Node.js version requirements (#2203) - Fix a few typos in documentation - thanks @Vinnl (#2171) - Update issue template to mention that translation should be via Transifex (#2157)
AppData
is not indexed by default.~/Library
files are not indexed by default. Searching system files using Spotlight requires multi-step opt-in: https://lifehacker.com/5711409/how-to-search-for-hidden-packaged-and-system-files-in-os-x. More info https://apple.stackexchange.com/a/92785. Added.noindex
suffix toattachments
folder.Relates to #1589.