Adding files with no extension to be searched #105
Comments
Open the ./include/classifier/database.json file in your editor of choice. Add a new entry like the following (changing pod to whatever extension you need):
You can leave keywords empty. Be sure to check that the file is still valid JSON. You can use a command line tool like jq to validate it like so:
If it's a common format, please post it back here and I will include it in the default list.
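As an alternative to jq for the validation step mentioned above, here is a minimal Python sketch using only the standard library. The helper names `validate_json_text` and `validate_json_file` are made up for illustration and are not part of searchcode server.

```python
import json


def validate_json_text(text):
    """Return (True, None) if text parses as JSON, else (False, reason)."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # lineno/colno point at the place where parsing failed
        return False, f"line {err.lineno}, column {err.colno}: {err.msg}"
    return True, None


def validate_json_file(path):
    """Validate the JSON file at path (hypothetical helper)."""
    with open(path, encoding="utf-8") as handle:
        return validate_json_text(handle.read())
```

Calling `validate_json_file("./include/classifier/database.json")` after an edit should report the position of the first syntax problem, such as a trailing comma left behind when copying an entry.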
Just realised I misread this. You want to add a file without an extension. Currently there is no way to do this. I will need to look at fixing this. TODO - Add support for files without extension to be classified.
Have some Fortran without extensions here too ;)
Hmmm, for that situation you would need to rely totally on the keyword checks. This specific issue can be solved by just looking for an explicit filename. Keyword checks are something that I have been playing around with locally. The idea is that if a file doesn't match anything based on extension, then keywords are used to guess what filetype it is. It will slow indexing down considerably due to the additional processing overhead. Assuming it's a minority of files, though, it shouldn't be a huge issue.
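The keyword fallback described above could look roughly like the following sketch. It is not the searchcode server implementation: the entry layout (`type`, `extensions`, `keywords`), the sample keywords, and the scoring rule (count of distinct keywords found) are all assumptions made for illustration.

```python
import re

# Hypothetical classifier entries; field names and keywords are assumptions.
CLASSIFIER = [
    {"type": "Fortran", "extensions": ["f90"],
     "keywords": ["subroutine", "implicit", "program", "end"]},
    {"type": "Ruby", "extensions": ["rb"],
     "keywords": ["def", "end", "require", "do"]},
]


def guess_type(content, classifier=CLASSIFIER):
    """Guess a file type by counting how many of each entry's keywords appear."""
    tokens = set(re.findall(r"[A-Za-z_]+", content.lower()))
    best_type, best_score = None, 0
    for entry in classifier:
        score = len(tokens & set(entry["keywords"]))
        if score > best_score:
            best_type, best_score = entry["type"], score
    # None means no keyword matched at all, so the file stays unclassified
    return best_type
```

Because every unmatched file has to be read and tokenised, a check like this only makes sense as a last resort after the extension and filename lookups have failed, which matches the performance concern raised above.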
Boyter, another way at least to deal with some of these would be something like this:
This would not fix quasarea's Fortran, but it would solve both the "License" and "Podfile" issues and allow people to easily ignore files in the classifier file... ;-) And rather than adding gitignore and npmignore to the "binary" file types, I could also add them to the classifier and put the "ignore" = true flag on them... Might be a more universal, cleaner fix for most issues, as everything is together then. Then your keyword checks could go into effect after this point to cover things like Fortran files w/o an extension...
That was the plan for your specific case. The fixedname thing. I would probably make it an array though, just to cover things like COPYING and LICENSE both generally being license files. I was going to keep the ignores inside the properties file though. I will have a think about it in this case. It might make more sense for specific types to live in the database file.
The only reason I suggest |
Valid reasons. The main issue is during upgrades. It's a little more painful to migrate your own changes into the database file. The CPU hit should in theory be nothing, thankfully. I might do it though, as I can see it being a better solution in the long term.
So I was looking into this, and turns out some of it is already done. The problem is that I didn't make the database name "extensions" very descriptive. If you add the following,
to the database, the file with the name cocoapod will be classified correctly. I made it such that if no file extension is specified with a . then the filename itself is treated as the extension. An example of this already happening is for Jenkins Buildfiles, which looks like this:
I will need to update the KB with this detail and probably add it as part of a readme in the directory itself. I will, however, be adding a check which tries to guess the file type given that nothing else matches. This will not be 100% accurate, as it will be based on the most common keywords in the database. The ignored functionality is also something I will be adding. I have also added Cocoapod to the database to save the effort of having to do this yourself in the future, b141810
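The filename-as-extension behaviour described above can be sketched as follows. This is a simplified illustration, not the project's code: the entries, the `type`/`extensions` field names, and the lowercasing of names are assumptions.

```python
# Hypothetical entries; a name without a dot is stored as if it were an extension.
CLASSIFIER = [
    {"type": "Cocoapod", "extensions": ["podfile"]},
    {"type": "Jenkins Buildfile", "extensions": ["jenkinsfile"]},
    {"type": "Python", "extensions": ["py"]},
]


def lookup_key(filename):
    """Return the text after the last dot, or the whole name if there is no dot."""
    name = filename.lower()  # case-insensitive matching is an assumption
    if "." in name:
        return name.rsplit(".", 1)[1]
    return name


def classify(filename, classifier=CLASSIFIER):
    """Return the type of the first entry listing the lookup key, else None."""
    key = lookup_key(filename)
    for entry in classifier:
        if key in entry["extensions"]:
            return entry["type"]
    return None
```

With this approach, `Podfile` and `Jenkinsfile` need no special handling: they fall through the same lookup as `main.py`, and only names that match nothing would be handed to a keyword-based guess.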
Logic to guess file type given no matches added. Can be enabled by setting the property
in the searchcode.properties file.
Documentation for KB updated https://searchcodeserver.com/knowledge-base/how-to-add-files-to-be-recognised.html
- BREAKING CHANGE Changed validation of repository names such that they must be alphanumeric, _ or - with client and server side validation
- BREAKING CHANGE Fix spelling of check_filerepo_chages to check_filerepo_changes for properties file
- Set follow symlinks to be configurable through properties file #99
- Clicking Remove will also clear the text box filters #98
- Improved stop/reset jobs logic, deleted jobs persist on searchcode restart #41
- Add logic to calculate project stats by lines not files and display next to existing #103
- Deep guess logic added to guess a file's type based on keyword heuristics #105
- Additional languages added to classifier database, F#, Mathematica, Parrot, Puppet, Rakefile, PKGBUILD, Cargo, Lock, License
- API auditing via logs added #57
- Search results now have RSS feed #114
- Can add custom HTML/CSS/JS to all pages #107
- Add average index time seconds to repo overview page #118
- Fix bug where unable to filter on html page #120
Is there a way to tag "Podfile" as a Cocoapod file? It has no extension, so I'm not sure how to add it to the include/classifier/database.json file... I'd rather have it come up as a known file...