
Optimize project file decoding #365

Merged — 10 commits, Feb 18, 2019
Conversation

CognitiveDisson
Contributor

Hi.
I was going to use your framework to implement a remote cache, but I ran into a blocking problem: reading my project file from disk takes about 12 seconds, and writing it back takes about 30 seconds. The project is really big — about 13 megabytes — but even for that size this performance is extremely bad.

I tried to find an issue about this but did not find one. I then looked at the source code and found a number of performance problems. Some of them I tried to fix with minimal changes; as a result, the project opening time went down from 12 to 3 seconds. A few problems remain in the code, but the main ones have been fixed. In general, I would recommend abandoning Codable in favor of plain dictionaries and Marshal — benchmarks.

Problem research

I used Xcode Instruments and a small utility class of my own to look for problems.

I made a demo project to check how long it takes to serialize a project file using a simple PropertyListSerialization.

    func readPlist(url: URL) {
        do {
            let plistXML = try Data(contentsOf: url)
            var propertyListFormat = PropertyListSerialization.PropertyListFormat.xml
            do {
                let plistData = try PropertyListSerialization.propertyList(from: plistXML, options: .mutableContainersAndLeaves, format: &propertyListFormat) as! [String: AnyObject]
                let objects = plistData["objects"] as! [String: AnyObject]
                let classes = plistData["classes"] as! [String: AnyObject]
                // Touch the parsed values so the work is actually performed.
                _ = objects.first
                _ = classes.first
            } catch {
                print("Error reading plist: \(error), format: \(propertyListFormat)")
            }
        } catch {
            print("Error loading file: \(error)")
        }
    }

It turned out to take less than a second — very fast — so the bottleneck had to be in decoding and serialization, not in reading the plist itself.

Steps to optimize

Concurrent decoding

First of all, I added concurrent decoding for objects in PBXObjects and implemented synchronization mechanisms for it.

Before:

try objectsDictionaries.forEach { reference, dictionary in
    let object = try PBXObject.parse(reference: reference, dictionary: dictionary, userInfo: decoder.userInfo)
    objects.add(object: object)
}

After:

let dictionary = NSDictionary(dictionary: objectsDictionaries)
dictionary.enumerateKeysAndObjects(options: .concurrent) { (key, obj, _) in
    guard let reference = key as? String,
        let dictionary = obj as? Dictionary<String, Any>
        else { return }
    let userInfo = decoder.userInfo
    let parser = PBXObjectParser(
        reference: reference,
        userInfo: userInfo
    )
    guard let object = try? parser.parse(dictionary: dictionary)
        else { return }
    objects.add(object: object)
}

I chose the standard enumerateKeysAndObjects method; I also tried manually selecting the number of threads, but there was no significant difference. This change saved about 4 seconds (8 seconds instead of 12). Instruments showed excellent parallelization of the work.
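The synchronization this requires can be sketched as follows. `SynchronizedStore` is a hypothetical stand-in for the mechanism added around `objects.add(object:)`, shown only to illustrate a lock-protected add under concurrent enumeration:

```swift
import Foundation

// Hypothetical stand-in for the synchronization around objects.add(object:).
// A lock guards the shared dictionary while NSDictionary enumerates concurrently.
final class SynchronizedStore {
    private var storage: [String: Int] = [:]
    private let lock = NSLock()

    func add(key: String, value: Int) {
        lock.lock()
        defer { lock.unlock() }
        storage[key] = value
    }

    var count: Int {
        lock.lock()
        defer { lock.unlock() }
        return storage.count
    }
}

let store = SynchronizedStore()
let source = Dictionary(uniqueKeysWithValues: (0..<1000).map { ("ref\($0)", $0) })
NSDictionary(dictionary: source).enumerateKeysAndObjects(options: .concurrent) { key, value, _ in
    guard let key = key as? String, let value = value as? Int else { return }
    store.add(key: key, value: value)   // safe: add is lock-protected
}
```

Without the lock, concurrent mutation of a Swift dictionary is a data race; the lock keeps the parallel parsing safe at a negligible cost compared to the parsing itself.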

Blind decoding order

I also added a very simple optimization based on parsing order: a project usually contains many more PBXBuildFile entries than PBXTarget ones, so blind decoding should start with the most frequent types, reducing the number of failed checks. The same applies to switches over the type. This gave another 1-2 seconds (6 seconds instead of 12).
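As an illustration (the isa names are real pbxproj types, but this list and the ordering logic are a hypothetical sketch, not the actual xcodeproj code), ordering the candidate types by expected frequency lets the common case exit after the first attempt:

```swift
// Candidate types ordered by how often they typically appear in a project:
// build-file entries vastly outnumber targets, so try them first.
let typesByFrequency = ["PBXBuildFile", "PBXFileReference", "PBXGroup", "PBXNativeTarget"]

// Returns how many candidates had to be tried before the isa matched,
// or nil if the type is unknown.
func attemptsToDecode(isa: String) -> Int? {
    for (index, candidate) in typesByFrequency.enumerated() where candidate == isa {
        return index + 1
    }
    return nil
}
```

Since PBXBuildFile entries dominate a typical project file, most lookups under this ordering finish after a single attempt instead of falling through several failed checks.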

Eliminating unnecessary serialization

After that I found a time consuming decoding here:

let objectsDictionary: [String: Any] = try container.decodeIfPresent([String: Any].self, forKey: .objects)

I realized that the project file can be deserialized into a dictionary once, and the value then simply taken from it. To make this work, I added a new property to the ProjectDecodingContext and read from it directly in the KeyedDecodingContainer extension. Currently this optimization only takes effect during PBXProj decoding; for other objects no key is provided, but the extra check does not affect their decoding time. (In the future this could be extended to nested objects.)
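The idea can be sketched with Codable's `userInfo` channel. Here `preParsedKey` and `Proj` are illustrative names, not the actual xcodeproj API, and `JSONDecoder` stands in for the real property-list decoding: the already-deserialized dictionary is stashed in `userInfo` so the decoder reads it directly instead of paying for `decode([String: Any].self, ...)` a second time.

```swift
import Foundation

// Illustrative key under which the pre-parsed plist dictionary is stored.
let preParsedKey = CodingUserInfoKey(rawValue: "preParsedObjects")!

struct Proj: Decodable {
    let objectCount: Int

    init(from decoder: Decoder) throws {
        if let objects = decoder.userInfo[preParsedKey] as? [String: Any] {
            objectCount = objects.count   // fast path: dictionary already parsed
        } else {
            objectCount = 0               // real code would fall back to decoding
        }
    }
}

let decoder = JSONDecoder()
decoder.userInfo[preParsedKey] = ["obj1": [:], "obj2": [:], "obj3": [:]] as [String: Any]
let proj = try! decoder.decode(Proj.self, from: Data("{}".utf8))
```

The fallback branch is what makes the extra check harmless for objects that have no pre-parsed dictionary available.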

Result

Finally, I got the cherished 3 seconds. Ideally I would like 1-2 seconds, but that requires global changes (such as replacing Codable with Marshal).

All tests pass (checked many times), and the project is parsed correctly (verified in my demo project).

I also plan to speed up project saving in a follow-up PR — I already know what needs to be done.

@keith
Contributor

keith commented Feb 13, 2019

I should have a chance to test this in the next few days; our pbxproj file is 34 MB, so it would be great to see an improvement. I'll have to work this into an XcodeGen build locally, though.

Usipov
Usipov previously approved these changes Feb 13, 2019
Contributor

@pepicrft pepicrft left a comment


Hi @CognitiveDisson. This is awesome ❤️ and very appreciated. Thanks for taking the time to bring these improvements to the codebase.
I left some minor comments, but overall it looks good. It's been a while since I worked with the codebase, so I needed to get some context beforehand to understand all the pieces that you touched.

@pepicrft
Contributor

> I should have a chance to test this in the next few days, our pbxproj file is 34 mbs so it would be great to see an improvement. I'll have to locally work this into an XcodeGen build though

It'd be great @keith if you could give us some numbers from that project.

@keith
Contributor

keith commented Feb 14, 2019

Looks like this saves ~4 seconds or ~10% on the largest version of our project 🎉 (I didn't test a ton)

beefon
beefon previously approved these changes Feb 15, 2019
@CognitiveDisson
Contributor Author

@keith Strange, I expected a better result. Can you provide more information about the project file? Which object types are most frequent?
If possible, I would like to debug on your project.

@pepibumur I have addressed most of the comments. Can I merge?

I implemented a custom decoder that works with the dictionary directly, which should remove the remaining extra work. The results show a slight but real improvement (from 3 seconds to 2).
I'll try it on other project files and make a separate PR.

@keith
Contributor

keith commented Feb 16, 2019

I can check in more detail, but we have ~700 targets and roughly 15k files.

pepicrft
pepicrft previously approved these changes Feb 16, 2019
Contributor

@pepicrft pepicrft left a comment


A few last comments, and we can merge afterwards.

@CognitiveDisson CognitiveDisson dismissed stale reviews from pepicrft and beefon via dab2f3d February 17, 2019 11:04
@pepicrft
Contributor

Awesome job @CognitiveDisson 🎉 . I'm merging it.
I've opened an issue to keep track of all the potential performance improvements that could be added to the project.

@pepicrft pepicrft merged commit 5fb4b32 into tuist:master Feb 18, 2019