fix duplicated search result
Summary:
Bug mentioned [here](https://fb.workplace.com/groups/factsaboutcode/permalink/2412053725810528/)

Symbols are duplicated in search results. I suspect we return descriptions from both `fbsource.fbcode.python` and `fbsource.fbcode.python.incr` (judging by the repo_hash values in the result, which match instances of these DBs).

Sledgehammer fix: keep only one (arbitrary but deterministic) description per key (symbol_id/repo) in the [search result](https://www.internalfb.com/code/fbsource/[e81c7e9fcf17]/fbcode/glean/glass/if/glass.thrift?lines=576).

There may be better ways to fix this: query only one DB, or return the most recent symbol.

```
> buck2 run  fbcode//glean/glass/facebook/cli:glass   -- search  -d --kinds Class
```
Before

```
fbsource/fbcode/neteng/drainer/services/driver/py/core/lib/option.py:13:1-52:45:fbsource/py/fbcode/neteng.drainer.services.driver.py.core.lib.option.EnumAction neteng.drainer.services.driver.py.core.lib.option.EnumAction
fbsource/fbcode/ricardo/src/main.py:1309:1-1364:44:fbsource/py/fbcode/ricardo.src.main.EnumAction ricardo.src.main.EnumAction
fbsource/fbcode/hphp/facebook/tools/automator/__init__.py:81:1-111:31:fbsource/py/fbcode/automator.EnumAction automator.EnumAction
fbsource/fbcode/neteng/drainer/services/driver/py/core/lib/option.py:13:1-52:45:fbsource/py/fbcode/neteng.drainer.services.driver.py.core.lib.option.EnumAction neteng.drainer.services.driver.py.core.lib.option.EnumAction
fbsource/fbcode/ricardo/src/main.py:1309:1-1364:44:fbsource/py/fbcode/ricardo.src.main.EnumAction ricardo.src.main.EnumAction
fbsource/fbcode/hphp/facebook/tools/automator/__init__.py:81:1-111:31:fbsource/py/fbcode/automator.EnumAction automator.EnumAction
```

After

```
fbsource/fbcode/ricardo/src/main.py:1309:1-1364:44:fbsource/py/fbcode/ricardo.src.main.EnumAction ricardo.src.main.EnumAction
fbsource/fbcode/hphp/facebook/tools/automator/__init__.py:81:1-111:31:fbsource/py/fbcode/automator.EnumAction automator.EnumAction
fbsource/fbcode/neteng/drainer/services/driver/py/core/lib/option.py:13:1-52:45:fbsource/py/fbcode/neteng.drainer.services.driver.py.core.lib.option.EnumAction neteng.drainer.services.driver.py.core.lib.option.EnumAction
```
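
The deduplication step can be sketched in isolation as follows. This is a minimal illustration, not the Glass code: `SymbolKey`, `Description`, `dedupByKey` and `sample` are hypothetical stand-ins for the real Thrift types, and it uses `Data.Map.Strict` from `containers` instead of the `Data.HashMap.Strict` used in the commit, so that it runs with a plain GHC install.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical stand-in for the symbol key (symbol_id/repo).
type SymbolKey = String

-- Hypothetical stand-in for a description; two DBs with the same
-- content are assumed to differ only in the repo hash.
data Description = Description
  { descName     :: String
  , descRepoHash :: String
  } deriving (Eq, Ord, Show)

type SingleSymbol = (SymbolKey, Maybe Description)

-- Keep exactly one description per key. When a key occurs more than
-- once, `max` picks the larger value under the derived Ord instance,
-- so the choice is arbitrary but deterministic.
dedupByKey :: [SingleSymbol] -> [SingleSymbol]
dedupByKey = Map.toList . Map.fromListWith max

-- The same two symbols as seen by a full DB and an incremental DB.
sample :: [SingleSymbol]
sample =
  [ ("automator.EnumAction",        Just (Description "EnumAction" "hash-full"))
  , ("ricardo.src.main.EnumAction", Just (Description "EnumAction" "hash-full"))
  , ("automator.EnumAction",        Just (Description "EnumAction" "hash-incr"))
  , ("ricardo.src.main.EnumAction", Just (Description "EnumAction" "hash-incr"))
  ]
```

Evaluating `dedupByKey sample` in GHCi yields two entries, one per key. With `Data.Map` the result is sorted by key; with a `HashMap`, as in the commit, the order of `toList` depends on the hashes, which would be consistent with the reordering seen in the "After" output.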

Reviewed By: pepeiborra

Differential Revision: D67762205

fbshipit-source-id: e284f0f7e9bc57a9d9e4632cae18efe0adfd22fe
Philippe Bidinger authored and facebook-github-bot committed Jan 16, 2025
1 parent 008b92b commit cef3f58
Showing 2 changed files with 18 additions and 3 deletions.
7 changes: 4 additions & 3 deletions glean/glass/Glean/Glass/Handler/Symbols.hs
@@ -82,7 +82,7 @@ import Glean.Glass.NameSearch (
     SearchQuery(..), SingleSymbol, FeelingLuckyResult(..),
     QueryExpr(..), RepoSearchResult, SymbolSearchData(..),
     toSearchResult, ToSearchResult(..), AngleSearch(..), srEntity,
-    buildLuckyContainerQuery, buildSearchQuery
+    buildLuckyContainerQuery, buildSearchQuery, dedupSearchResult
   )
 import Glean.Glass.XRefs ( GenXRef(..) )
 import Glean.Glass.Search as Search
@@ -408,13 +408,14 @@ joinSearchResults
 joinSearchResults mlimit terse sorted xs = SymbolSearchResult syms $
     if terse then [] else catMaybes descs
   where
+    uniqXs = dedupSearchResult <$> xs
     (syms,descs) = unzip $ nubOrd $ case (mlimit, sorted) of
       (Nothing, _) -> flattened
       (Just n, False) -> take n flattened
       -- codehub/aka "sorted" mode grouping, ranking and sampling
-      (Just n, True) -> takeFairN n (concatMap sortResults xs)
+      (Just n, True) -> takeFairN n (concatMap sortResults uniqXs)

-    flattened = concat xs
+    flattened = concat uniqXs

 --
 -- DFS to first singleton result.
14 changes: 14 additions & 0 deletions glean/glass/Glean/Glass/NameSearch.hs
@@ -17,6 +17,7 @@ module Glean.Glass.NameSearch
   , RepoSearchResult
   , FeelingLuckyResult(..)
   , SingleSymbol
+  , dedupSearchResult

   -- * Search
   -- ** Search flags
@@ -56,6 +57,7 @@ import Glean.Glass.Utils (splitOnAny, QueryType )
 import qualified Glean.Schema.CodemarkupTypes.Types as Code
 import qualified Glean.Schema.CodemarkupSearch.Types as CodeSearch
 import qualified Glean.Schema.Code.Types as Code
+import qualified Data.HashMap.Strict as Map

 --
 -- Finding entities by name search
@@ -651,6 +653,18 @@ instance ToSearchResult CodeSearch.SearchByScope where
 -- | Type of processed search results from a single scm repo
 type RepoSearchResult = [SingleSymbol]

+-- | Ensure all SymbolResults in a (repo-wide) search result are unique.
+--
+-- We can have multiple descriptions when querying dbs
+-- having the same content (for instance, incremental and full
+-- DBs). In that case, descriptions differ only in the repo_hash
+-- field.
+-- dedupSearchResult picks one arbitrarily. TODO pick the most
+-- recent revision one, and don't discard a description if it
+-- differs by more than the repo_hash field
+dedupSearchResult :: RepoSearchResult -> RepoSearchResult
+dedupSearchResult results = Map.toList $ Map.fromListWith max results
+
 -- An un-concatenated set of query results to search for unique hits in
 -- within one scm repo, across dbs, across queries, a set of result symbols.
 newtype FeelingLuckyResult = FeelingLuckyResult [[RepoSearchResult]]