Skip to content

Commit

Permalink
Merge pull request #2655 from joshcooper/perf_improvements
Browse files Browse the repository at this point in the history
Eliminate redundant token joining and const_get calls
  • Loading branch information
mhashizume authored Dec 11, 2023
2 parents 352e2a8 + a6cfedc commit 69b0978
Show file tree
Hide file tree
Showing 3 changed files with 67 additions and 23 deletions.
39 changes: 39 additions & 0 deletions docs/data-flow.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,39 @@
## Data Flow

This shows the general data flow when a user runs facter on the command to lookup facts `a.b` and `c`.

Generally, facter loads fact definitions (`LoadedFact`) to determine all of the things it could collect, including internal (aka core) facts, custom facts (implemented using the `Facter.add` API) and external facts (json, yaml, bash, etc). Each `LoadedFact` specifies a name like `os.family` and a class that can be called later to collect the values, e.g. `Facts::Linux::Os::Release`.

The `QueryParser` parse both user queries `a.b` and `c` and matches each query against all LoadedFacts, returning an array of `SearchedFacts`. These are more like SearchableFacts, since they haven't been searched yet.

Facter attempts to lookup the facts from the cache, otherwise it calls the `InternalFactManager` and `ExternalFactManager` to resolve facts.

For internal facts, facter wraps each `SearchedFact` with a `CoreFact`. The `CoreFact` calls the `call_the_resolver` method on the class that the `SearchedFact` references. The `call_the_resolver` method then typically delegates to a resolver and returns the fact value which may be scalar or structured data. For example, `os.family` returns a string, but `gce` returns a Hash.

```mermaid
flowchart TD
CLI[facter a.b c] --> Facter[Facter.to_user_output]
Facter --> FactManager[FactManager#resolve_facts]
FactManager --> FactLoader[FactLoader.load]
FactLoader -->|internal| InternalFactLoader[InternalLoader.core_facts]
FactLoader -->|custom| CustomFactLoader[ExternalFactLoader.custom_facts]
FactLoader -->|external| ExternalFactLoader[ExternalFactLoader.external_facts]
InternalFactLoader --> QueryParser[QueryParser.parse]
CustomFactLoader --> QueryParser
ExternalFactLoader --> QueryParser
QueryParser -->|empty query| AllSearchable[All loaded facts are searchable]
QueryParser -->|not empty| SomeSearchable[Match query tokens to loaded facts]
AllSearchable --> SearchedFacts[Array of SearchedFacts]
SomeSearchable --> SearchedFacts
SearchedFacts --> CacheManager[CacheManager.resolve_facts]
CacheManager -->|internal| InternalFactManager[InternalFactManager.resolve_facts]
InternalFactManager --> CoreFact[CoreFact#create]
CoreFact --> SearchedFact[SearchedFact -> call_the_resolver]
SearchedFact --> Resolvers[Facter::Resolvers::*.resolve]
CacheManager -->|external| ExternalFactManager[ExternalFactManager.resolve_facts]
Resolvers --> ResolvedFacts[Array of ResolvedFacts]
ExternalFactManager --> ResolvedFacts
ResolvedFacts --> CacheFacts[CacheManager.cache_facts]
CacheFacts --> FilterFacts[FactFilter#filter_facts!]
FilterFacts --> Formatter
```
7 changes: 4 additions & 3 deletions lib/facter/framework/core/fact_loaders/class_discoverer.rb
Original file line number Diff line number Diff line change
Expand Up @@ -21,9 +21,10 @@ def discover_classes(operating_system)

def find_nested_classes(mod, discovered_classes)
mod.constants.each do |constant_name|
if mod.const_get(constant_name).instance_of? Class
discovered_classes << mod.const_get(constant_name)
elsif mod.const_get(constant_name).instance_of? Module
obj = mod.const_get(constant_name)
if obj.instance_of? Class
discovered_classes << obj
elsif obj.instance_of? Module
find_nested_classes(Module.const_get("#{mod.name}::#{constant_name}"), discovered_classes)
end
end
Expand Down
44 changes: 24 additions & 20 deletions lib/facter/framework/parsers/query_parser.rb
Original file line number Diff line number Diff line change
Expand Up @@ -18,18 +18,18 @@ class << self
# Because a root fact will always be resolved by a collection of child facts,
# we can return one or more child facts for each parent.
#
# query - is the user input used to search for facts
# loaded_fact - is a list with all facts for the current operating system
# @param query_list [Array] The list of facts to search for
# @param loaded_facts [Array] All of the fact definitions for the current operating system
#
# Returns a list of SearchedFact objects that resolve the users query.
def parse(query_list, loaded_fact)
# @return [Array<SearchedFact>] a list of searchable facts that resolve the user's query
def parse(query_list, loaded_facts)
matched_facts = []
@query_list = query_list

return no_user_query(loaded_fact) unless query_list.any?
return no_user_query(loaded_facts) unless query_list.any?

query_list.each do |query|
found_facts = search_for_facts(query, loaded_fact)
found_facts = search_for_facts(query, loaded_facts)
matched_facts << found_facts
end

Expand All @@ -44,15 +44,17 @@ def no_user_query(loaded_facts)
searched_facts
end

def search_for_facts(query, loaded_fact_hash)
def search_for_facts(query, loaded_facts)
resolvable_fact_list = []
query = query.to_s
query_tokens = query.end_with?('.*') ? [query] : query.split('.')
size = query_tokens.size

# Try to match the most specific query_tokens to the least, returning the first match
size.times do |i|
query_token_range = 0..size - i - 1
resolvable_fact_list = get_facts_matching_tokens(query_tokens, query_token_range, loaded_fact_hash)
query_fact = query_tokens[query_token_range].join('.')
resolvable_fact_list = get_facts_matching_tokens(query_tokens, query_fact, loaded_facts)

return resolvable_fact_list if resolvable_fact_list.any?
end
Expand All @@ -62,12 +64,10 @@ def search_for_facts(query, loaded_fact_hash)
resolvable_fact_list
end

def get_facts_matching_tokens(query_tokens, query_token_range, loaded_fact_hash)
def get_facts_matching_tokens(query_tokens, query_fact, loaded_facts)
resolvable_fact_list = []

loaded_fact_hash.each do |loaded_fact|
query_fact = query_tokens[query_token_range].join('.')

loaded_facts.each do |loaded_fact|
next unless found_fact?(loaded_fact.name, query_fact)

searched_fact = construct_loaded_fact(query_tokens, loaded_fact)
Expand All @@ -79,16 +79,20 @@ def get_facts_matching_tokens(query_tokens, query_token_range, loaded_fact_hash)
end

def found_fact?(fact_name, query_fact)
# This is the case where the fact_name contains a wildcard like
# blockdevice_.*_model and we're querying for the legacy fact
# specifically using 'blockdevice_sba_model' and we don't want the query
# 'blockdevice.sba.model' to match
fact_with_wildcard = fact_name.include?('.*') && !query_fact.include?('.')

processed_equery_fact = query_fact.gsub('\\', '\\\\\\\\')

return false if fact_with_wildcard && !query_fact.match("^#{fact_name}$")

# Must escape metacharacters (like dots) to ensure the correct fact is found
return false unless fact_with_wildcard || fact_name.match("^#{Regexp.escape(processed_equery_fact)}($|\\.)")

true
if fact_with_wildcard
# fact_name contains wildcard, so we're intentially not escaping.
query_fact.match("^#{fact_name}$")
else
processed_equery_fact = query_fact.gsub('\\', '\\\\\\\\')
# Must escape metacharacters (like dots) to ensure the correct fact is found
fact_name.match("^#{Regexp.escape(processed_equery_fact)}($|\\.)")
end
end

def construct_loaded_fact(query_tokens, loaded_fact)
Expand Down

0 comments on commit 69b0978

Please sign in to comment.