From ea3f10fc3fb49f69595f2b76f75978331e54241c Mon Sep 17 00:00:00 2001 From: Vikas Bhansali <64532198+vibhansa-msft@users.noreply.github.com> Date: Mon, 3 Feb 2025 21:56:47 +0530 Subject: [PATCH] 2.4.1 Release Changes (#1587) * feat: support workload identity token (#1556) * feat: support workload identity token * Create block pool only once in child process (#1581) * create block pool in child only * Update golang.org/x/crypto to v0.31.0 (#1594) * Update golang.org/x/crypto to v0.31.0 * sync with main (#1603) * updated year in copyright message (#1601) * Use ListBlob for hns accounts (#1555) * Optimize HNS listing * Added statfs for block-cache (#1470) * Added statfs for block_cache * Add strong consistency check for data on disk (#1604) * Add strong consistency check for data on disk * bug in block cache open call (#1580) * current implementation of open file when opened in O_WRONLY will truncate the file to zero. This is incorrect behaviour. We don't see it in the normal scenario as write-back cache is on by default. Hence all the open calls with O_WRONLY will be redirected O_RDWR. To simulate this turn of the write-back cache and then open file in O_WRONLY. * Feature: Blob filter (#1595) * Integrating blob filter in azstorage * Serve getAttr call for destination file after the Copy finishes from the cache * Cleanup on start shall be set to cleanup temp cache (#1613) * Add Tests * Refactor the code and refresh the cache after copying the attributes * Automate blobfuse2 setup for new VM (#1575) added script for blobfuse setup and azsecpack setup in VM * * Update the Unit tests. * Refactor the Code * Update Changelog * do go fmt on src * Downgrade go version to 1.22.7 due to memory issues in 1.23 (#1619) * Enable ETAG based validation on every block download to provide higher consistency (#1608) * Make etag validation a defualt option * BUG#31069208: Fixed Prefix filtering from File Path (#1618) * Fixed the logic to filter out folder prefix from path * Added/Updated/Removed test case --------- Co-authored-by: weizhi Co-authored-by: Sourav Gupta <98318303+souravgupta-msft@users.noreply.github.com> Co-authored-by: Jan Jagusch <77677602+JanJaguschQC@users.noreply.github.com> Co-authored-by: ashruti-msft <137055338+ashruti-msft@users.noreply.github.com> Co-authored-by: syeleti-msft Co-authored-by: jainakanksha-msft --- CHANGELOG.md | 19 + MIGRATION.md | 2 +- NOTICE | 33 ++ README.md | 32 +- azure-pipeline-templates/huge-list-test.yml | 4 + cmd/mount.go | 2 +- common/types.go | 2 +- common/util.go | 13 + common/util_test.go | 10 + component/attr_cache/attr_cache.go | 22 +- component/attr_cache/attr_cache_test.go | 49 ++- component/azstorage/azauth.go | 1 + component/azstorage/azauthspn.go | 16 + component/azstorage/azstorage.go | 10 +- component/azstorage/block_blob.go | 395 ++++++++++++++------ component/azstorage/block_blob_test.go | 248 +++++++++--- component/azstorage/config.go | 83 ++-- component/azstorage/config_test.go | 4 +- component/azstorage/connection.go | 19 +- component/azstorage/datalake.go | 185 +++------ component/azstorage/datalake_test.go | 198 ++++++++-- component/azstorage/utils.go | 47 ++- component/azstorage/utils_test.go | 66 ++-- component/block_cache/block_cache.go | 159 ++++++-- component/block_cache/block_cache_test.go | 209 ++++++++++- component/file_cache/file_cache.go | 4 +- component/file_cache/file_cache_test.go | 2 +- component/libfuse/libfuse_handler.go | 12 +- component/loopback/loopback_fs.go | 7 + component/loopback/loopback_fs_test.go | 24 ++ go.mod | 25 
+- go.sum | 50 +-- go_installer.sh | 2 +- internal/attribute.go | 21 +- internal/component_options.go | 8 +- internal/mock_component.go | 6 + setup/advancedConfig.yaml | 1 + setup/baseConfig.yaml | 1 + setup/setupUBN.sh | 48 +++ setup/vmSetupAzSecPack.sh | 108 ++++++ 40 files changed, 1625 insertions(+), 522 deletions(-) create mode 100755 setup/setupUBN.sh create mode 100755 setup/vmSetupAzSecPack.sh diff --git a/CHANGELOG.md b/CHANGELOG.md index 2a55337ef..e354758af 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,3 +1,21 @@ + +## 2.4.1 (Unreleased) +**Bug Fixes** +- Create block pool only in the child process. +- Prevent block-cache from truncating the file size to zero when a file is opened in O_WRONLY mode and writeback-cache is disabled. +- Correct statfs results to reflect the in-memory cache status of block-cache. +- Do not wipe out the temp cache on start after an ungraceful unmount, if `cleanup-on-start` is not configured in file-cache. +- When a subdirectory is mounted, remove only the subdirectory prefix from file paths during file/folder operations. + +**Other Changes** +- Optimized listing operation on HNS accounts to support symlinks. +- Optimized the rename operation to make fewer REST calls. + +**Features** +- Mount a container or directory while restricting which blobs are visible. This feature is available only for read-only mounts. +- To protect against accidental overwrites of data stored by block-cache on the temp path, a CRC64 hash is validated on read. This feature can be enabled with the `--block-cache-strong-consistency` CLI flag. +- To provide a strong consistency check, the ETag of the file is preserved on open. For any subsequent block download with block-cache, the ETag is verified; if the blob has changed in the container, the download is declared a failure, resulting in a read failure. + ## 2.4.0 (2024-12-03) **Features** - Added 'gen-config' command to auto generate the recommended blobfuse2 config file based on computing resources and memory available on the node. Command details can be found with `blobfuse2 gen-config --help`. @@ -16,6 +34,7 @@ - `Stream` option automatically replaced with "Stream with Block-cache" internally for optimized performance. - Login via Managed Identify is supported with Object-ID for all versions of blobfuse except 2.3.0 and 2.3.2.To use Object-ID for these two versions, use AzCLI or utilize Application/Client-ID or Resource ID base authentication.. - Version check is now moved to a static website hosted on a public container. +- 'df' command output will present memory availability for block-cache when disk is not configured.
## 2.3.2 (2024-09-03) **Bug Fixes** diff --git a/MIGRATION.md b/MIGRATION.md index 7df237086..55317524d 100644 --- a/MIGRATION.md +++ b/MIGRATION.md @@ -99,7 +99,7 @@ Note: Blobfuse2 accepts all CLI parameters that Blobfuse does, but may ignore pa | --log-level=LOG_WARNING | --log-level=LOG_WARNING | logging.level | | | --use-attr-cache=true | --use-attr-cache=true | attr_cache | Add attr_cache to the components list | | --use-adls=false | --use-adls=false | azstorage.type | Specify either 'block' or 'adls' | -| --no-symlinks=false | --no-symlinks=true | attr_cache.no-symlinks | | +| --no-symlinks=false | --no-symlinks=false | attr_cache.no-symlinks | | | --cache-on-list=true | --cache-on-list=true | attr_cache.no-cache-on-list | This parameter has the opposite boolean semantics | | --upload-modified-only=true | --upload-modified-only=true | | Always on in blobfuse2 | | --max-concurrency=12 | --max-concurrency=12 | azstorage.max-concurrency | | diff --git a/NOTICE b/NOTICE index 7ceed1624..b944b7636 100644 --- a/NOTICE +++ b/NOTICE @@ -4093,4 +4093,37 @@ THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + + + + +**************************************************************************** + +============================================================================ +>>> github.com/vibhansa-msft/blobfilter +============================================================================== + +MIT License + +Copyright (c) 2024 Vikas Bhansali + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. + + --------------------- END OF THIRD PARTY NOTICE -------------------------------- diff --git a/README.md b/README.md index e857c9fb3..1a33fac54 100755 --- a/README.md +++ b/README.md @@ -66,6 +66,7 @@ One of the biggest BlobFuse2 features is our brand new health monitor. It allows - Set MD5 sum of a blob while uploading - Validate MD5 sum on download and fail file open on mismatch - Large file writing through write Block-Cache +- Blob filter to view only files matching given criteria for read-only mount ## Blobfuse2 performance compared to blobfuse(v1.x.x) - 'git clone' operation is 25% faster (tested with vscode repo cloning) @@ -139,9 +140,10 @@ To learn about a specific command, just include the name of the command (For exa * `--wait-for-mount=` : Let parent process wait for given timeout before exit to ensure child has started. * `--block-cache` : To enable block-cache instead of file-cache. This works only when mounted without any config file. 
* `--lazy-write` : To enable async close file handle call and schedule the upload in background. + * `--filter=`: Enable blob filters for a read-only mount to restrict which blobs the user can see or read. - Attribute cache options * `--attr-cache-timeout=`: The timeout for the attribute cache entries. - * `--no-symlinks=true`: To improve performance disable symlink support. + * `--no-symlinks=false`: Symlinks are supported by default; the performance overhead that existed earlier has been resolved. - Storage options * `--container-name=`: The container to mount. * `--cancel-list-on-mount-seconds=`: Time for which list calls will be blocked after mount. ( prevent billing charges on mounting) @@ -166,6 +168,8 @@ To learn about a specific command, just include the name of the command (For exa * `--block-cache-prefetch=`: Number of blocks to prefetch at max when sequential reads are in progress. Default - 2 times number of CPU cores. * `--block-cache-parallelism=`: Number of parallel threads doing upload/download operation. Default - 3 times number of CPU cores. * `--block-cache-prefetch-on-open=true`: Start prefetching on open system call instead of waiting for first read. Enhances perf if file is read sequentially from offset 0. + * `--block-cache-strong-consistency=true`: Enable strong data consistency checks in block-cache. This will increase load on your CPU and may introduce some latency. + This requires `xattr` support on your system; install it manually before using this CLI parameter. - Fuse options * `--attr-timeout=`: Time the kernel can cache inode attributes. * `--entry-timeout=`: Time the kernel can cache directory listing. @@ -235,6 +239,32 @@ Below diagrams guide you to choose right configuration for your workloads. - [Sample Block-Cache Config](./sampleBlockCacheConfig.yaml) - [All Config options](./setup/baseConfig.yaml) +## Blob Filter +- For a read-only mount, the user can configure a filter to restrict which blobs the mount can see or operate on. +- Blobfuse supports filters based on + - Name + - Size + - Last modified time + - File extension +- Blob Name based filter + - Supported operations are "=" and "!=" + - Name shall be a valid regular expression + - e.g. ```filter=name=^mine[0-1]\\d{3}.*``` +- Size based filter + - Supported operations are "<=", ">=", "!=", "<", ">" and "=" + - Size shall be provided in bytes + - e.g. ```filter=size > 1000``` +- Last Modified Date based filter + - Supported operations are "<=", ">=", "<", ">" and "=" + - Date shall be provided in RFC1123 format, e.g. "Mon, 24 Jan 1982 13:00:00 UTC" + - e.g. ```filter=modtime>Mon, 24 Jan 1982 13:00:00 UTC``` +- File Extension based filter + - Supported operations are "=" and "!=" + - Extension can be supplied as a string. Do not include "." in the filter + - e.g. ```--filter=format=pdf``` +- Multiple filters can be combined using the '&&' and '||' operators; however, grouping with '()' is not supported yet. + - e.g. ```--filter=name=^testfil.* && size>130000000``` + ## Frequently Asked Questions - How do I generate a SAS with permissions for rename?
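For illustration, here is a minimal sketch of how the new blob filter is evaluated, assuming only the blobfilter calls visible in this patch (`Configure`, `IsAcceptable`, and the `BlobAttr` fields `Name`, `Mtime`, `Size`); the filter string and attribute values below are illustrative and not part of this change:

```go
package main

import (
	"fmt"
	"time"

	"github.com/vibhansa-msft/blobfilter"
)

func main() {
	// Same filter syntax as the --filter CLI option documented above.
	f := &blobfilter.BlobFilter{}
	if err := f.Configure("name=^abcd.* && size>100"); err != nil {
		fmt.Println("invalid filter:", err)
		return
	}

	// azstorage builds a BlobAttr for each listed blob and drops the ones
	// the filter rejects (see processBlobItems and GetAttr in this patch).
	attr := &blobfilter.BlobAttr{
		Name:  "abcd1.txt",
		Mtime: time.Now(),
		Size:  4096,
	}
	fmt.Println("accepted:", f.IsAcceptable(attr))
}
```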
diff --git a/azure-pipeline-templates/huge-list-test.yml b/azure-pipeline-templates/huge-list-test.yml index c6f292077..03949d926 100755 --- a/azure-pipeline-templates/huge-list-test.yml +++ b/azure-pipeline-templates/huge-list-test.yml @@ -55,6 +55,10 @@ steps: env: mount_dir: ${{ parameters.mount_dir }} + - script: grep "OUTGOING REQUEST" blobfuse2-logs.txt | wc -l + displayName: 'HugeList: ${{ parameters.idstring }} Request Count' + continueOnError: true + - script: | cat blobfuse2-logs.txt displayName: 'View Logs' diff --git a/cmd/mount.go b/cmd/mount.go index 5daac3a1b..10610baa4 100644 --- a/cmd/mount.go +++ b/cmd/mount.go @@ -127,7 +127,7 @@ func (opt *mountOptions) validate(skipNonEmptyMount bool) error { var cleanupOnStart bool _ = config.UnmarshalKey("file_cache.cleanup-on-start", &cleanupOnStart) - if tempCachePath != "" && !cleanupOnStart { + if tempCachePath != "" && cleanupOnStart { if err = common.TempCacheCleanup(tempCachePath); err != nil { return fmt.Errorf("failed to cleanup file cache [%s]", err.Error()) } diff --git a/common/types.go b/common/types.go index 8e4b2a8f7..77de972c2 100644 --- a/common/types.go +++ b/common/types.go @@ -47,7 +47,7 @@ import ( // Standard config default values const ( - blobfuse2Version_ = "2.4.0" + blobfuse2Version_ = "2.4.1" DefaultMaxLogFileSize = 512 DefaultLogFileCount = 10 diff --git a/common/util.go b/common/util.go index cbedc2fe7..337059c23 100644 --- a/common/util.go +++ b/common/util.go @@ -39,7 +39,9 @@ import ( "crypto/aes" "crypto/cipher" "crypto/rand" + "encoding/binary" "fmt" + "hash/crc64" "io" "os" "os/exec" @@ -500,3 +502,14 @@ func WriteToFile(filename string, data string, options WriteToFileOptions) error return nil } + +func GetCRC64(data []byte, len int) []byte { + // Create a CRC64 hash using the ECMA polynomial + crc64Table := crc64.MakeTable(crc64.ECMA) + checksum := crc64.Checksum(data[:len], crc64Table) + + checksumBytes := make([]byte, 8) + binary.BigEndian.PutUint64(checksumBytes, checksum) + + return checksumBytes +} diff --git a/common/util_test.go b/common/util_test.go index a8d472bfd..4fdb132a3 100644 --- a/common/util_test.go +++ b/common/util_test.go @@ -363,6 +363,16 @@ func (suite *utilTestSuite) TestWriteToFile() { } +func (suite *utilTestSuite) TestCRC64() { + data := []byte("Hello World") + crc := GetCRC64(data, len(data)) + + data = []byte("Hello World!") + crc1 := GetCRC64(data, len(data)) + + suite.assert.NotEqual(crc, crc1) +} + func (suite *utilTestSuite) TestGetFuseMinorVersion() { i := GetFuseMinorVersion() suite.assert.GreaterOrEqual(i, 0) diff --git a/component/attr_cache/attr_cache.go b/component/attr_cache/attr_cache.go index ea8f347d0..7c8939125 100644 --- a/component/attr_cache/attr_cache.go +++ b/component/attr_cache/attr_cache.go @@ -223,6 +223,21 @@ func (ac *AttrCache) invalidateDirectory(path string) { ac.invalidatePath(path) } +// Copies the attr to the given path. +func (ac *AttrCache) updateCacheEntry(path string, attr *internal.ObjAttr) { + cacheEntry, found := ac.cacheMap[path] + if found { + // Copy the attr + cacheEntry.attr = attr + // Update the path inside the attr + cacheEntry.attr.Path = path + // Update the Existence of the entry + cacheEntry.attrFlag.Set(AttrFlagExists) + // Refresh the cache entry + cacheEntry.cachedAt = time.Now() + } +} + // invalidatePath: invalidates a path func (ac *AttrCache) invalidatePath(path string) { // Keys in the cache map do not contain trailing /, truncate the path before referencing a key in the map. 
@@ -360,14 +375,15 @@ func (ac *AttrCache) DeleteFile(options internal.DeleteFileOptions) error { // RenameFile : Mark the source file deleted. Invalidate the destination file. func (ac *AttrCache) RenameFile(options internal.RenameFileOptions) error { log.Trace("AttrCache::RenameFile : %s -> %s", options.Src, options.Dst) - + srcAttr := options.SrcAttr err := ac.NextComponent().RenameFile(options) if err == nil { + // Copy source attribute to destination. + // LMT of Source will be modified by next component if the copy is success. ac.cacheLock.RLock() defer ac.cacheLock.RUnlock() - + ac.updateCacheEntry(options.Dst, srcAttr) ac.deletePath(options.Src, time.Now()) - ac.invalidatePath(options.Dst) } return err diff --git a/component/attr_cache/attr_cache_test.go b/component/attr_cache/attr_cache_test.go index 6dbf0cbec..195baff6b 100644 --- a/component/attr_cache/attr_cache_test.go +++ b/component/attr_cache/attr_cache_test.go @@ -122,6 +122,25 @@ func assertUntouched(suite *attrCacheTestSuite, path string) { suite.assert.True(suite.attrCache.cacheMap[path].exists()) } +// This method is used when we transfer the attributes from the src to dst, and mark src as invalid +func assertAttributesTransferred(suite *attrCacheTestSuite, srcAttr *internal.ObjAttr, dstAttr *internal.ObjAttr) { + suite.assert.EqualValues(srcAttr.Size, dstAttr.Size) + suite.assert.EqualValues(srcAttr.Path, dstAttr.Path) + suite.assert.EqualValues(srcAttr.Mode, dstAttr.Mode) + suite.assert.EqualValues(srcAttr.Atime, dstAttr.Atime) + suite.assert.EqualValues(srcAttr.Mtime, dstAttr.Mtime) + suite.assert.EqualValues(srcAttr.Ctime, dstAttr.Ctime) + suite.assert.True(suite.attrCache.cacheMap[dstAttr.Path].exists()) + suite.assert.True(suite.attrCache.cacheMap[dstAttr.Path].valid()) +} + +// If next component changes the times of the attribute. +func assertSrcAttributeTimeChanged(suite *attrCacheTestSuite, srcAttr *internal.ObjAttr, srcAttrCopy internal.ObjAttr) { + suite.assert.NotEqualValues(suite, srcAttr.Atime, srcAttrCopy.Atime) + suite.assert.NotEqualValues(suite, srcAttr.Mtime, srcAttrCopy.Mtime) + suite.assert.NotEqualValues(suite, srcAttr.Ctime, srcAttrCopy.Ctime) +} + // Directory structure // a/ // @@ -676,15 +695,41 @@ func (suite *attrCacheTestSuite) TestRenameFile() { suite.assert.NotContains(suite.attrCache.cacheMap, src) suite.assert.NotContains(suite.attrCache.cacheMap, dst) - // Entry Already Exists + // Src, Dst Entry Already Exists addPathToCache(suite.assert, suite.attrCache, src, false) addPathToCache(suite.assert, suite.attrCache, dst, false) + options.SrcAttr = suite.attrCache.cacheMap[src].attr + options.SrcAttr.Size = 1 + options.SrcAttr.Mode = 2 + options.DstAttr = suite.attrCache.cacheMap[dst].attr + options.DstAttr.Size = 3 + options.DstAttr.Mode = 4 + srcAttrCopy := *options.SrcAttr + suite.mock.EXPECT().RenameFile(options).Return(nil) + err = suite.attrCache.RenameFile(options) + suite.assert.Nil(err) + assertDeleted(suite, src) + modifiedDstAttr := suite.attrCache.cacheMap[dst].attr + assertSrcAttributeTimeChanged(suite, options.SrcAttr, srcAttrCopy) + // Check the attributes of the dst are same as the src. 
+ assertAttributesTransferred(suite, options.SrcAttr, modifiedDstAttr) + // Src Entry Exist and Dst Entry Don't Exist + addPathToCache(suite.assert, suite.attrCache, src, false) + // Add negative entry to cache for Dst + suite.attrCache.cacheMap[dst] = newAttrCacheItem(&internal.ObjAttr{}, false, time.Now()) + options.SrcAttr = suite.attrCache.cacheMap[src].attr + options.DstAttr = suite.attrCache.cacheMap[dst].attr + options.SrcAttr.Size = 1 + options.SrcAttr.Mode = 2 + suite.mock.EXPECT().RenameFile(options).Return(nil) err = suite.attrCache.RenameFile(options) suite.assert.Nil(err) assertDeleted(suite, src) - assertInvalid(suite, dst) + modifiedDstAttr = suite.attrCache.cacheMap[dst].attr + assertSrcAttributeTimeChanged(suite, options.SrcAttr, srcAttrCopy) + assertAttributesTransferred(suite, options.SrcAttr, modifiedDstAttr) } // Tests Write File diff --git a/component/azstorage/azauth.go b/component/azstorage/azauth.go index 96113f7cf..450b3849b 100644 --- a/component/azstorage/azauth.go +++ b/component/azstorage/azauth.go @@ -63,6 +63,7 @@ type azAuthConfig struct { ClientID string ClientSecret string OAuthTokenFilePath string + WorkloadIdentityToken string ActiveDirectoryEndpoint string Endpoint string diff --git a/component/azstorage/azauthspn.go b/component/azstorage/azauthspn.go index c36795758..aeaa5d616 100644 --- a/component/azstorage/azauthspn.go +++ b/component/azstorage/azauthspn.go @@ -34,6 +34,8 @@ package azstorage import ( + "context" + "github.com/Azure/azure-sdk-for-go/sdk/azcore" "github.com/Azure/azure-sdk-for-go/sdk/azidentity" "github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/service" @@ -69,6 +71,20 @@ func (azspn *azAuthSPN) getTokenCredential() (azcore.TokenCredential, error) { log.Err("AzAuthSPN::getTokenCredential : Failed to generate token for SPN [%s]", err.Error()) return nil, err } + } else if azspn.config.WorkloadIdentityToken != "" { + log.Trace("AzAuthSPN::getTokenCredential : Going for fedrated token flow ") + + cred, err = azidentity.NewClientAssertionCredential( + azspn.config.TenantID, + azspn.config.ClientID, + func(ctx context.Context) (string, error) { + return azspn.config.WorkloadIdentityToken, nil + }, + &azidentity.ClientAssertionCredentialOptions{}) + if err != nil { + log.Err("AzAuthSPN::getTokenCredential : Failed to generate token for SPN [%s]", err.Error()) + return nil, err + } } else { log.Trace("AzAuthSPN::getTokenCredential : Using client secret for fetching token") diff --git a/component/azstorage/azstorage.go b/component/azstorage/azstorage.go index e054e5419..1cd7e450d 100644 --- a/component/azstorage/azstorage.go +++ b/component/azstorage/azstorage.go @@ -423,7 +423,7 @@ func (az *AzStorage) DeleteFile(options internal.DeleteFileOptions) error { func (az *AzStorage) RenameFile(options internal.RenameFileOptions) error { log.Trace("AzStorage::RenameFile : %s to %s", options.Src, options.Dst) - err := az.storage.RenameFile(options.Src, options.Dst) + err := az.storage.RenameFile(options.Src, options.Dst, options.SrcAttr) if err == nil { azStatsCollector.PushEvents(renameFile, options.Src, map[string]interface{}{src: options.Src, dest: options.Dst}) @@ -453,7 +453,8 @@ func (az *AzStorage) ReadInBuffer(options internal.ReadInBufferOptions) (length return 0, nil } - err = az.storage.ReadInBuffer(options.Handle.Path, options.Offset, dataLen, options.Data) + err = az.storage.ReadInBuffer(options.Handle.Path, options.Offset, dataLen, options.Data, options.Etag) + if err != nil { log.Err("AzStorage::ReadInBuffer : Failed to read %s 
[%s]", options.Handle.Path, err.Error()) } @@ -555,7 +556,7 @@ func (az *AzStorage) StageData(opt internal.StageDataOptions) error { } func (az *AzStorage) CommitData(opt internal.CommitDataOptions) error { - return az.storage.CommitBlocks(opt.Name, opt.List) + return az.storage.CommitBlocks(opt.Name, opt.List, opt.NewETag) } // TODO : Below methods are pending to be implemented @@ -665,6 +666,9 @@ func init() { preserveACL := config.AddBoolFlag("preserve-acl", false, "Preserve ACL and Permissions set on file during updates") config.BindPFlag(compName+".preserve-acl", preserveACL) + blobFilter := config.AddStringFlag("filter", "", "Filter string to match blobs") + config.BindPFlag(compName+".filter", blobFilter) + config.RegisterFlagCompletionFunc("container-name", func(cmd *cobra.Command, args []string, toComplete string) ([]string, cobra.ShellCompDirective) { return nil, cobra.ShellCompDirectiveNoFileComp }) diff --git a/component/azstorage/block_blob.go b/component/azstorage/block_blob.go index 4480faedd..6d0b1c13d 100644 --- a/component/azstorage/block_blob.go +++ b/component/azstorage/block_blob.go @@ -39,6 +39,7 @@ import ( "encoding/base64" "errors" "fmt" + "io" "math" "os" "path/filepath" @@ -57,6 +58,7 @@ import ( "github.com/Azure/azure-storage-fuse/v2/common/log" "github.com/Azure/azure-storage-fuse/v2/internal" "github.com/Azure/azure-storage-fuse/v2/internal/stats_manager" + "github.com/vibhansa-msft/blobfilter" ) const ( @@ -101,9 +103,10 @@ func (bb *BlockBlob) Configure(cfg AzStorageConfig) error { } bb.listDetails = container.ListBlobsInclude{ - Metadata: true, - Deleted: false, - Snapshots: false, + Metadata: true, + Deleted: false, + Snapshots: false, + Permissions: false, //Added to get permissions, acl, group, owner for HNS accounts } return nil @@ -283,26 +286,6 @@ func (bb *BlockBlob) DeleteFile(name string) (err error) { // DeleteDirectory : Delete a virtual directory in the container/virtual directory func (bb *BlockBlob) DeleteDirectory(name string) (err error) { log.Trace("BlockBlob::DeleteDirectory : name %s", name) - - pager := bb.Container.NewListBlobsFlatPager(&container.ListBlobsFlatOptions{ - Prefix: to.Ptr(filepath.Join(bb.Config.prefixPath, name) + "/"), - }) - for pager.More() { - listBlobResp, err := pager.NextPage(context.Background()) - if err != nil { - log.Err("BlockBlob::DeleteDirectory : Failed to get list of blobs %s", err.Error()) - return err - } - - // Process the blobs returned in this result segment (if the segment is empty, the loop body won't execute) - for _, blobInfo := range listBlobResp.Segment.BlobItems { - err = bb.DeleteFile(split(bb.Config.prefixPath, *blobInfo.Name)) - if err != nil { - log.Err("BlockBlob::DeleteDirectory : Failed to delete file %s [%s]", *blobInfo.Name, err.Error()) - } - } - } - err = bb.DeleteFile(name) // libfuse deletes the files in the directory before this method is called. // If the marker blob for directory is not present, ignore the ENOENT error. @@ -314,7 +297,11 @@ func (bb *BlockBlob) DeleteDirectory(name string) (err error) { // RenameFile : Rename the file // Source file must exist in storage account before calling this method. -func (bb *BlockBlob) RenameFile(source string, target string) error { +// When the rename is success, Data, metadata, of the blob will be copied to the destination. +// Creation time and LMT is not preserved for copyBlob API. +// Copy the LMT to the src attr if the copy is success. 
+// https://learn.microsoft.com/en-us/rest/api/storageservices/copy-blob?tabs=microsoft-entra-id +func (bb *BlockBlob) RenameFile(source string, target string, srcAttr *internal.ObjAttr) error { log.Trace("BlockBlob::RenameFile : %s -> %s", source, target) blobClient := bb.Container.NewBlockBlobClient(filepath.Join(bb.Config.prefixPath, source)) @@ -322,7 +309,7 @@ func (bb *BlockBlob) RenameFile(source string, target string) error { // not specifying source blob metadata, since passing empty metadata headers copies // the source blob metadata to destination blob - startCopy, err := newBlobClient.StartCopyFromURL(context.Background(), blobClient.URL(), &blob.StartCopyFromURLOptions{ + copyResponse, err := newBlobClient.StartCopyFromURL(context.Background(), blobClient.URL(), &blob.StartCopyFromURLOptions{ Tier: bb.Config.defaultTier, }) @@ -338,10 +325,15 @@ func (bb *BlockBlob) RenameFile(source string, target string) error { return err } - copyStatus := startCopy.CopyStatus + var dstLMT *time.Time = copyResponse.LastModified + + copyStatus := copyResponse.CopyStatus + var prop blob.GetPropertiesResponse + pollCnt := 0 for copyStatus != nil && *copyStatus == blob.CopyStatusTypePending { time.Sleep(time.Second * 1) - prop, err := newBlobClient.GetProperties(context.Background(), &blob.GetPropertiesOptions{ + pollCnt++ + prop, err = newBlobClient.GetProperties(context.Background(), &blob.GetPropertiesOptions{ CPKInfo: bb.blobCPKOpt, }) if err != nil { @@ -350,6 +342,14 @@ func (bb *BlockBlob) RenameFile(source string, target string) error { copyStatus = prop.CopyStatus } + if pollCnt > 0 { + dstLMT = prop.LastModified + } + + if copyStatus != nil && *copyStatus == blob.CopyStatusTypeSuccess { + modifyLMT(srcAttr, dstLMT) + } + log.Trace("BlockBlob::RenameFile : %s -> %s done", source, target) // Copy of the file is done so now delete the older file @@ -390,8 +390,8 @@ func (bb *BlockBlob) RenameDirectory(source string, target string) error { // Process the blobs returned in this result segment (if the segment is empty, the loop body won't execute) for _, blobInfo := range listBlobResp.Segment.BlobItems { srcDirPresent = true - srcPath := split(bb.Config.prefixPath, *blobInfo.Name) - err = bb.RenameFile(srcPath, strings.Replace(srcPath, source, target, 1)) + srcPath := removePrefixPath(bb.Config.prefixPath, *blobInfo.Name) + err = bb.RenameFile(srcPath, strings.Replace(srcPath, source, target, 1), nil) if err != nil { log.Err("BlockBlob::RenameDirectory : Failed to rename file %s [%s]", srcPath, err.Error) } @@ -417,7 +417,7 @@ func (bb *BlockBlob) RenameDirectory(source string, target string) error { } } - return bb.RenameFile(source, target) + return bb.RenameFile(source, target, nil) } func (bb *BlockBlob) getAttrUsingRest(name string) (attr *internal.ObjAttr, err error) { @@ -429,10 +429,10 @@ func (bb *BlockBlob) getAttrUsingRest(name string) (attr *internal.ObjAttr, err }) if err != nil { - e := storeBlobErrToErr(err) - if e == ErrFileNotFound { + serr := storeBlobErrToErr(err) + if serr == ErrFileNotFound { return attr, syscall.ENOENT - } else if e == InvalidPermission { + } else if serr == InvalidPermission { log.Err("BlockBlob::getAttrUsingRest : Insufficient permissions for %s [%s]", name, err.Error()) return attr, syscall.EACCES } else { @@ -453,10 +453,12 @@ func (bb *BlockBlob) getAttrUsingRest(name string) (attr *internal.ObjAttr, err Crtime: *prop.CreationTime, Flags: internal.NewFileBitMap(), MD5: prop.ContentMD5, + ETag: strings.Trim(string(*prop.ETag), `"`), } 
parseMetadata(attr, prop.Metadata) + // We do not get permissions as part of this getAttr call hence setting the flag to true attr.Flags.Set(internal.PropFlagModeDefault) return attr, nil @@ -516,10 +518,23 @@ func (bb *BlockBlob) GetAttr(name string) (attr *internal.ObjAttr, err error) { // To support virtual directories with no marker blob, we call list instead of get properties since list will not return a 404 if bb.Config.virtualDirectory { - return bb.getAttrUsingList(name) + attr, err = bb.getAttrUsingList(name) + } else { + attr, err = bb.getAttrUsingRest(name) } - return bb.getAttrUsingRest(name) + if bb.Config.filter != nil && attr != nil { + if !bb.Config.filter.IsAcceptable(&blobfilter.BlobAttr{ + Name: attr.Name, + Mtime: attr.Mtime, + Size: attr.Size, + }) { + log.Debug("BlockBlob::GetAttr : Filtered out %s", name) + return nil, syscall.ENOENT + } + } + + return attr, err } // List : Get a list of blobs matching the given prefix @@ -534,16 +549,11 @@ func (bb *BlockBlob) List(prefix string, marker *string, count int32) ([]*intern } }(marker)) - blobList := make([]*internal.ObjAttr, 0) - if count == 0 { count = common.MaxDirListCount } - listPath := filepath.Join(bb.Config.prefixPath, prefix) - if (prefix != "" && prefix[len(prefix)-1] == '/') || (prefix == "" && bb.Config.prefixPath != "") { - listPath += "/" - } + listPath := bb.getListPath(prefix) // Get a result segment starting with the blob indicated by the current Marker. pager := bb.Container.NewListBlobsHierarchyPager("/", &container.ListBlobsHierarchyOptions{ @@ -562,85 +572,143 @@ func (bb *BlockBlob) List(prefix string, marker *string, count int32) ([]*intern if err != nil { log.Err("BlockBlob::List : Failed to list the container with the prefix %s", err.Error) - return blobList, nil, err - } - - dereferenceTime := func(input *time.Time, defaultTime time.Time) time.Time { - if input == nil { - return defaultTime - } else { - return *input - } + return nil, nil, err } // Process the blobs returned in this result segment (if the segment is empty, the loop body won't execute) // Since block blob does not support acls, we set mode to 0 and FlagModeDefault to true so the fuse layer can return the default permission. + blobList, dirList, err := bb.processBlobItems(listBlob.Segment.BlobItems) + if err != nil { + return nil, nil, err + } + + // In case virtual directory exists but its corresponding 0 byte marker file is not there holding hdi_isfolder then just iterating + // over BlobItems will fail to identify that directory. In such cases BlobPrefixes help to list all directories + // dirList contains all dirs for which we got 0 byte meta file in this iteration, so exclude those and add rest to the list + // Note: Since listing is paginated, sometimes the marker file may come in a different iteration from the BlobPrefix. For such + // cases we manually call GetAttr to check the existence of the marker file. 
+ err = bb.processBlobPrefixes(listBlob.Segment.BlobPrefixes, dirList, &blobList) + if err != nil { + return nil, nil, err + } + + return blobList, listBlob.NextMarker, nil +} + +func (bb *BlockBlob) getListPath(prefix string) string { + listPath := filepath.Join(bb.Config.prefixPath, prefix) + if (prefix != "" && prefix[len(prefix)-1] == '/') || (prefix == "" && bb.Config.prefixPath != "") { + listPath += "/" + } + return listPath +} + +func (bb *BlockBlob) processBlobItems(blobItems []*container.BlobItem) ([]*internal.ObjAttr, map[string]bool, error) { + blobList := make([]*internal.ObjAttr, 0) // For some directories 0 byte meta file may not exists so just create a map to figure out such directories - var dirList = make(map[string]bool) - for _, blobInfo := range listBlob.Segment.BlobItems { - attr := &internal.ObjAttr{} - if blobInfo.Properties.CustomerProvidedKeySHA256 != nil && *blobInfo.Properties.CustomerProvidedKeySHA256 != "" { - log.Trace("BlockBlob::List : blob is encrypted with customer provided key so fetching metadata explicitly using REST") - attr, err = bb.getAttrUsingRest(*blobInfo.Name) - if err != nil { - log.Err("BlockBlob::List : Failed to get properties of blob %s", *blobInfo.Name) - return blobList, nil, err - } - } else { - attr = &internal.ObjAttr{ - Path: split(bb.Config.prefixPath, *blobInfo.Name), - Name: filepath.Base(*blobInfo.Name), - Size: *blobInfo.Properties.ContentLength, - Mode: 0, - Mtime: *blobInfo.Properties.LastModified, - Atime: dereferenceTime(blobInfo.Properties.LastAccessedOn, *blobInfo.Properties.LastModified), - Ctime: *blobInfo.Properties.LastModified, - Crtime: dereferenceTime(blobInfo.Properties.CreationTime, *blobInfo.Properties.LastModified), - Flags: internal.NewFileBitMap(), - MD5: blobInfo.Properties.ContentMD5, - } - parseMetadata(attr, blobInfo.Metadata) - attr.Flags.Set(internal.PropFlagModeDefault) + dirList := make(map[string]bool) + filterAttr := blobfilter.BlobAttr{} + + for _, blobInfo := range blobItems { + blobAttr, err := bb.getBlobAttr(blobInfo) + if err != nil { + return nil, nil, err } - blobList = append(blobList, attr) - if attr.IsDir() { + if blobAttr.IsDir() { // 0 byte meta found so mark this directory in map dirList[*blobInfo.Name+"/"] = true - attr.Size = 4096 + blobAttr.Size = 4096 + } + + if bb.Config.filter != nil && !blobAttr.IsDir() { + filterAttr.Name = blobAttr.Name + filterAttr.Mtime = blobAttr.Mtime + filterAttr.Size = blobAttr.Size + + if bb.Config.filter.IsAcceptable(&filterAttr) { + blobList = append(blobList, blobAttr) + } else { + log.Debug("BlockBlob::List : Filtered out blob %s", blobAttr.Name) + } + } else { + blobList = append(blobList, blobAttr) } } - // In case virtual directory exists but its corresponding 0 byte marker file is not there holding hdi_isfolder then just iterating - // over BlobItems will fail to identify that directory. In such cases BlobPrefixes help to list all directories - // dirList contains all dirs for which we got 0 byte meta file in this iteration, so exclude those and add rest to the list - // Note: Since listing is paginated, sometimes the marker file may come in a different iteration from the BlobPrefix. For such - // cases we manually call GetAttr to check the existence of the marker file. 
- for _, blobInfo := range listBlob.Segment.BlobPrefixes { + return blobList, dirList, nil +} + +func (bb *BlockBlob) getBlobAttr(blobInfo *container.BlobItem) (*internal.ObjAttr, error) { + if blobInfo.Properties.CustomerProvidedKeySHA256 != nil && *blobInfo.Properties.CustomerProvidedKeySHA256 != "" { + log.Trace("BlockBlob::List : blob is encrypted with customer provided key so fetching metadata explicitly using REST") + return bb.getAttrUsingRest(*blobInfo.Name) + } + mode, err := bb.getFileMode(blobInfo.Properties.Permissions) + if err != nil { + mode = 0 + log.Warn("BlockBlob::getBlobAttr : Failed to get file mode for %s [%s]", *blobInfo.Name, err.Error()) + } + + attr := &internal.ObjAttr{ + Path: removePrefixPath(bb.Config.prefixPath, *blobInfo.Name), + Name: filepath.Base(*blobInfo.Name), + Size: *blobInfo.Properties.ContentLength, + Mode: mode, + Mtime: *blobInfo.Properties.LastModified, + Atime: bb.dereferenceTime(blobInfo.Properties.LastAccessedOn, *blobInfo.Properties.LastModified), + Ctime: *blobInfo.Properties.LastModified, + Crtime: bb.dereferenceTime(blobInfo.Properties.CreationTime, *blobInfo.Properties.LastModified), + Flags: internal.NewFileBitMap(), + MD5: blobInfo.Properties.ContentMD5, + ETag: strings.Trim((string)(*blobInfo.Properties.ETag), `"`), + } + + parseMetadata(attr, blobInfo.Metadata) + if !bb.listDetails.Permissions { + // In case of HNS account do not set this flag + attr.Flags.Set(internal.PropFlagModeDefault) + } + + return attr, nil +} + +func (bb *BlockBlob) getFileMode(permissions *string) (os.FileMode, error) { + if permissions == nil { + return 0, nil + } + return getFileMode(*permissions) +} + +func (bb *BlockBlob) dereferenceTime(input *time.Time, defaultTime time.Time) time.Time { + if input == nil { + return defaultTime + } + return *input +} + +func (bb *BlockBlob) processBlobPrefixes(blobPrefixes []*container.BlobPrefix, dirList map[string]bool, blobList *[]*internal.ObjAttr) error { + for _, blobInfo := range blobPrefixes { if _, ok := dirList[*blobInfo.Name]; ok { // marker file found in current iteration, skip adding the directory continue } else { - // marker file not found in current iteration, so we need to manually check attributes via REST - _, err := bb.getAttrUsingRest(*blobInfo.Name) - // marker file also not found via manual check, safe to add to list - if err == syscall.ENOENT { - // For these dirs we get only the name and no other properties so hardcoding time to current time - name := strings.TrimSuffix(*blobInfo.Name, "/") - attr := &internal.ObjAttr{ - Path: split(bb.Config.prefixPath, name), - Name: filepath.Base(name), - Size: 4096, - Mode: os.ModeDir, - Mtime: time.Now(), - Flags: internal.NewDirBitMap(), + //Check to see if its a HNS account and we received properties in blob prefixes + if bb.listDetails.Permissions { + attr, err := bb.createDirAttrWithPermissions(blobInfo) + if err != nil { + return err + } + *blobList = append(*blobList, attr) + } else { + // marker file not found in current iteration, so we need to manually check attributes via REST + _, err := bb.getAttrUsingRest(*blobInfo.Name) + // marker file also not found via manual check, safe to add to list + if err == syscall.ENOENT { + attr := bb.createDirAttr(*blobInfo.Name) + *blobList = append(*blobList, attr) } - attr.Atime = attr.Mtime - attr.Crtime = attr.Mtime - attr.Ctime = attr.Mtime - attr.Flags.Set(internal.PropFlagModeDefault) - blobList = append(blobList, attr) } } } @@ -650,7 +718,54 @@ func (bb *BlockBlob) List(prefix string, marker *string, 
count int32) ([]*intern delete(dirList, k) } - return blobList, listBlob.NextMarker, nil + return nil +} + +func (bb *BlockBlob) createDirAttr(name string) *internal.ObjAttr { + // For these dirs we get only the name and no other properties so hardcoding time to current time + name = strings.TrimSuffix(name, "/") + attr := &internal.ObjAttr{ + Path: removePrefixPath(bb.Config.prefixPath, name), + Name: filepath.Base(name), + Size: 4096, + Mode: os.ModeDir, + Mtime: time.Now(), + Flags: internal.NewDirBitMap(), + } + attr.Atime = attr.Mtime + attr.Crtime = attr.Mtime + attr.Ctime = attr.Mtime + + // This is called only in case of FNS when blobPrefix is there but the marker does not exists + attr.Flags.Set(internal.PropFlagModeDefault) + return attr +} + +func (bb *BlockBlob) createDirAttrWithPermissions(blobInfo *container.BlobPrefix) (*internal.ObjAttr, error) { + if blobInfo.Properties == nil { + return nil, fmt.Errorf("failed to get properties of blobprefix %s", *blobInfo.Name) + } + + mode, err := bb.getFileMode(blobInfo.Properties.Permissions) + if err != nil { + mode = 0 + log.Warn("BlockBlob::createDirAttrWithPermissions : Failed to get file mode for %s [%s]", *blobInfo.Name, err.Error()) + } + + name := strings.TrimSuffix(*blobInfo.Name, "/") + attr := &internal.ObjAttr{ + Path: removePrefixPath(bb.Config.prefixPath, name), + Name: filepath.Base(name), + Size: *blobInfo.Properties.ContentLength, + Mode: mode, + Mtime: *blobInfo.Properties.LastModified, + Atime: bb.dereferenceTime(blobInfo.Properties.LastAccessedOn, *blobInfo.Properties.LastModified), + Ctime: *blobInfo.Properties.LastModified, + Crtime: bb.dereferenceTime(blobInfo.Properties.CreationTime, *blobInfo.Properties.LastModified), + Flags: internal.NewDirBitMap(), + } + + return attr, nil } // track the progress of download of blobs where every 100MB of data downloaded is being tracked. 
It also tracks the completion of download @@ -773,20 +888,26 @@ func (bb *BlockBlob) ReadBuffer(name string, offset int64, len int64) ([]byte, e } // ReadInBuffer : Download specific range from a file to a user provided buffer -func (bb *BlockBlob) ReadInBuffer(name string, offset int64, len int64, data []byte) error { +func (bb *BlockBlob) ReadInBuffer(name string, offset int64, len int64, data []byte, etag *string) error { // log.Trace("BlockBlob::ReadInBuffer : name %s", name) - blobClient := bb.Container.NewBlobClient(filepath.Join(bb.Config.prefixPath, name)) - opt := (blob.DownloadBufferOptions)(*bb.downloadOptions) - opt.BlockSize = len - opt.Range = blob.HTTPRange{ - Offset: offset, - Count: len, + if etag != nil { + *etag = "" } + blobClient := bb.Container.NewBlobClient(filepath.Join(bb.Config.prefixPath, name)) + ctx, cancel := context.WithTimeout(context.Background(), max_context_timeout*time.Minute) defer cancel() - _, err := blobClient.DownloadBuffer(ctx, data, &opt) + opt := &blob.DownloadStreamOptions{ + Range: blob.HTTPRange{ + Offset: offset, + Count: len, + }, + CPKInfo: bb.blobCPKOpt, + } + + downloadResponse, err := blobClient.DownloadStream(ctx, opt) if err != nil { e := storeBlobErrToErr(err) @@ -796,10 +917,32 @@ func (bb *BlockBlob) ReadInBuffer(name string, offset int64, len int64, data []b return syscall.ERANGE } - log.Err("BlockBlob::ReadInBuffer : Failed to download blob %s [%s]", name, err.Error()) + log.Err("BlockBlob::ReadInBufferWithETag : Failed to download blob %s [%s]", name, err.Error()) return err } + var streamBody io.ReadCloser = downloadResponse.NewRetryReader(ctx, nil) + dataRead, err := io.ReadFull(streamBody, data) + + if err != nil && err != io.EOF && err != io.ErrUnexpectedEOF { + log.Err("BlockBlob::ReadInBuffer : Failed to copy data from body to buffer for blob %s [%s]", name, err.Error()) + return err + } + + if dataRead < 0 { + log.Err("BlockBlob::ReadInBuffer : Failed to copy data from body to buffer for blob %s", name) + return errors.New("failed to copy data from body to buffer") + } + + err = streamBody.Close() + if err != nil { + log.Err("BlockBlob::ReadInBuffer : Failed to close body for blob %s [%s]", name, err.Error()) + } + + if etag != nil { + *etag = strings.Trim(string(*downloadResponse.ETag), `"`) + } + return nil } @@ -1050,7 +1193,7 @@ func (bb *BlockBlob) removeBlocks(blockList *common.BlockOffsetList, size int64, blk.Data = make([]byte, blk.EndIndex-blk.StartIndex) blk.Flags.Set(common.DirtyBlock) - err := bb.ReadInBuffer(name, blk.StartIndex, blk.EndIndex-blk.StartIndex, blk.Data) + err := bb.ReadInBuffer(name, blk.StartIndex, blk.EndIndex-blk.StartIndex, blk.Data, nil) if err != nil { log.Err("BlockBlob::removeBlocks : Failed to remove blocks %s [%s]", name, err.Error()) } @@ -1106,7 +1249,7 @@ func (bb *BlockBlob) TruncateFile(name string, size int64) error { size -= blkSize } - err = bb.CommitBlocks(blobName, blkList) + err = bb.CommitBlocks(blobName, blkList, nil) if err != nil { log.Err("BlockBlob::TruncateFile : Failed to commit blocks for %s [%s]", name, err.Error()) return err @@ -1174,12 +1317,12 @@ func (bb *BlockBlob) HandleSmallFile(name string, size int64, originalSize int64 var data = make([]byte, size) var err error if size > originalSize { - err = bb.ReadInBuffer(name, 0, 0, data) + err = bb.ReadInBuffer(name, 0, 0, data, nil) if err != nil { log.Err("BlockBlob::TruncateFile : Failed to read small file %s", name, err.Error()) } } else { - err = bb.ReadInBuffer(name, 0, size, data) + err = 
bb.ReadInBuffer(name, 0, size, data, nil) if err != nil { log.Err("BlockBlob::TruncateFile : Failed to read small file %s", name, err.Error()) } @@ -1254,7 +1397,7 @@ func (bb *BlockBlob) Write(options internal.WriteFileOptions) error { oldDataBuffer := make([]byte, oldDataSize+newBufferSize) if !appendOnly { // fetch the blocks that will be impacted by the new changes so we can overwrite them - err = bb.ReadInBuffer(name, fileOffsets.BlockList[index].StartIndex, oldDataSize, oldDataBuffer) + err = bb.ReadInBuffer(name, fileOffsets.BlockList[index].StartIndex, oldDataSize, oldDataBuffer, nil) if err != nil { log.Err("BlockBlob::Write : Failed to read data in buffer %s [%s]", name, err.Error()) } @@ -1445,14 +1588,14 @@ func (bb *BlockBlob) StageBlock(name string, data []byte, id string) error { } // CommitBlocks : persists the block list -func (bb *BlockBlob) CommitBlocks(name string, blockList []string) error { +func (bb *BlockBlob) CommitBlocks(name string, blockList []string, newEtag *string) error { log.Trace("BlockBlob::CommitBlocks : name %s", name) ctx, cancel := context.WithTimeout(context.Background(), max_context_timeout*time.Minute) defer cancel() blobClient := bb.Container.NewBlockBlobClient(filepath.Join(bb.Config.prefixPath, name)) - _, err := blobClient.CommitBlockList(ctx, + resp, err := blobClient.CommitBlockList(ctx, blockList, &blockblob.CommitBlockListOptions{ HTTPHeaders: &blob.HTTPHeaders{ @@ -1467,5 +1610,19 @@ func (bb *BlockBlob) CommitBlocks(name string, blockList []string) error { return err } + if newEtag != nil { + *newEtag = strings.Trim(string(*resp.ETag), `"`) + } + return nil } + +func (bb *BlockBlob) SetFilter(filter string) error { + if filter == "" { + bb.Config.filter = nil + return nil + } + + bb.Config.filter = &blobfilter.BlobFilter{} + return bb.Config.filter.Configure(filter) +} diff --git a/component/azstorage/block_blob_test.go b/component/azstorage/block_blob_test.go index 11c3347d9..e81ddcdfe 100644 --- a/component/azstorage/block_blob_test.go +++ b/component/azstorage/block_blob_test.go @@ -174,7 +174,6 @@ func newTestAzStorage(configuration string) (*AzStorage, error) { _ = config.ReadConfigFromReader(strings.NewReader(configuration)) az := NewazstorageComponent() err := az.Configure(true) - return az.(*AzStorage), err } @@ -485,61 +484,6 @@ func (s *blockBlobTestSuite) setupHierarchy(base string) (*list.List, *list.List return a, ab, ac } -func (s *blockBlobTestSuite) TestDeleteDirHierarchy() { - defer s.cleanupTest() - // Setup - base := generateDirectoryName() - a, ab, ac := s.setupHierarchy(base) - - err := s.az.DeleteDir(internal.DeleteDirOptions{Name: base}) - - s.assert.Nil(err) - - /// a paths should be deleted - for p := a.Front(); p != nil; p = p.Next() { - _, err = s.containerClient.NewBlobClient(p.Value.(string)).GetProperties(ctx, nil) - s.assert.NotNil(err) - } - ab.PushBackList(ac) // ab and ac paths should exist - for p := ab.Front(); p != nil; p = p.Next() { - _, err = s.containerClient.NewBlobClient(p.Value.(string)).GetProperties(ctx, nil) - s.assert.Nil(err) - } -} - -func (s *blockBlobTestSuite) TestDeleteSubDirPrefixPath() { - defer s.cleanupTest() - // Setup - base := generateDirectoryName() - a, ab, ac := s.setupHierarchy(base) - - s.az.storage.SetPrefixPath(base) - - attr, err := s.az.GetAttr(internal.GetAttrOptions{Name: "c1"}) - s.assert.Nil(err) - s.assert.NotNil(attr) - s.assert.True(attr.IsDir()) - - err = s.az.DeleteDir(internal.DeleteDirOptions{Name: "c1"}) - s.assert.Nil(err) - - // a paths under c1 should be 
deleted - for p := a.Front(); p != nil; p = p.Next() { - path := p.Value.(string) - _, err = s.containerClient.NewBlobClient(path).GetProperties(ctx, nil) - if strings.HasPrefix(path, base+"/c1") { - s.assert.NotNil(err) - } else { - s.assert.Nil(err) - } - } - ab.PushBackList(ac) // ab and ac paths should exist - for p := ab.Front(); p != nil; p = p.Next() { - _, err = s.containerClient.NewBlobClient(p.Value.(string)).GetProperties(ctx, nil) - s.assert.Nil(err) - } -} - func (s *blockBlobTestSuite) TestDeleteDirError() { defer s.cleanupTest() // Setup @@ -1192,6 +1136,70 @@ func (s *blockBlobTestSuite) TestReadInBuffer() { s.assert.EqualValues(testData[:5], output) } +func (bbTestSuite *blockBlobTestSuite) TestReadInBufferWithETAG() { + defer bbTestSuite.cleanupTest() + // Setup + name := generateFileName() + handle, _ := bbTestSuite.az.CreateFile(internal.CreateFileOptions{Name: name}) + testData := "test data" + data := []byte(testData) + bbTestSuite.az.WriteFile(internal.WriteFileOptions{Handle: handle, Offset: 0, Data: data}) + handle, _ = bbTestSuite.az.OpenFile(internal.OpenFileOptions{Name: name}) + + output := make([]byte, 5) + var etag string + len, err := bbTestSuite.az.ReadInBuffer(internal.ReadInBufferOptions{Handle: handle, Offset: 0, Data: output, Etag: &etag}) + bbTestSuite.assert.Nil(err) + bbTestSuite.assert.NotEqual(etag, "") + bbTestSuite.assert.EqualValues(5, len) + bbTestSuite.assert.EqualValues(testData[:5], output) + _ = bbTestSuite.az.CloseFile(internal.CloseFileOptions{Handle: handle}) +} + +func (bbTestSuite *blockBlobTestSuite) TestReadInBufferWithETAGMismatch() { + defer bbTestSuite.cleanupTest() + // Setup + name := generateFileName() + handle, _ := bbTestSuite.az.CreateFile(internal.CreateFileOptions{Name: name}) + testData := "test data 12345678910" + data := []byte(testData) + bbTestSuite.az.WriteFile(internal.WriteFileOptions{Handle: handle, Offset: 0, Data: data}) + _ = bbTestSuite.az.CloseFile(internal.CloseFileOptions{Handle: handle}) + + attr, err := bbTestSuite.az.GetAttr(internal.GetAttrOptions{Name: name}) + bbTestSuite.assert.Nil(err) + bbTestSuite.assert.NotNil(attr) + bbTestSuite.assert.NotEqual("", attr.ETag) + bbTestSuite.assert.Equal(int64(len(data)), attr.Size) + + output := make([]byte, 5) + var etag string + + handle, _ = bbTestSuite.az.OpenFile(internal.OpenFileOptions{Name: name}) + _, err = bbTestSuite.az.ReadInBuffer(internal.ReadInBufferOptions{Handle: handle, Offset: 0, Data: output, Etag: &etag}) + bbTestSuite.assert.Nil(err) + bbTestSuite.assert.NotEqual(etag, "") + etag = strings.Trim(etag, `"`) + bbTestSuite.assert.Equal(etag, attr.ETag) + + // Update the file in parallel using another handle + handle1, err := bbTestSuite.az.OpenFile(internal.OpenFileOptions{Name: name}) + bbTestSuite.assert.Nil(err) + testData = "test data 12345678910 123123123123123123123" + data = []byte(testData) + bbTestSuite.az.WriteFile(internal.WriteFileOptions{Handle: handle1, Offset: 0, Data: data}) + _ = bbTestSuite.az.CloseFile(internal.CloseFileOptions{Handle: handle1}) + + // Read data back using older handle + _, err = bbTestSuite.az.ReadInBuffer(internal.ReadInBufferOptions{Handle: handle, Offset: 5, Data: output, Etag: &etag}) + bbTestSuite.assert.Nil(err) + bbTestSuite.assert.NotEqual(etag, "") + etag = strings.Trim(etag, `"`) + bbTestSuite.assert.NotEqual(etag, attr.ETag) + + _ = bbTestSuite.az.CloseFile(internal.CloseFileOptions{Handle: handle}) +} + func (s *blockBlobTestSuite) TestReadInBufferLargeBuffer() { defer s.cleanupTest() // Setup @@ 
-2377,7 +2385,7 @@ func (s *blockBlobTestSuite) TestFlushFileUpdateChunkedFile() { updatedBlock := make([]byte, 2*MB) rand.Read(updatedBlock) h.CacheObj.BlockOffsetList.BlockList[1].Data = make([]byte, blockSize) - s.az.storage.ReadInBuffer(name, int64(blockSize), int64(blockSize), h.CacheObj.BlockOffsetList.BlockList[1].Data) + s.az.storage.ReadInBuffer(name, int64(blockSize), int64(blockSize), h.CacheObj.BlockOffsetList.BlockList[1].Data, nil) copy(h.CacheObj.BlockOffsetList.BlockList[1].Data[MB:2*MB+MB], updatedBlock) h.CacheObj.BlockOffsetList.BlockList[1].Flags.Set(common.DirtyBlock) @@ -2414,7 +2422,7 @@ func (s *blockBlobTestSuite) TestFlushFileTruncateUpdateChunkedFile() { // truncate block h.CacheObj.BlockOffsetList.BlockList[1].Data = make([]byte, blockSize/2) h.CacheObj.BlockOffsetList.BlockList[1].EndIndex = int64(blockSize + blockSize/2) - s.az.storage.ReadInBuffer(name, int64(blockSize), int64(blockSize)/2, h.CacheObj.BlockOffsetList.BlockList[1].Data) + s.az.storage.ReadInBuffer(name, int64(blockSize), int64(blockSize)/2, h.CacheObj.BlockOffsetList.BlockList[1].Data, nil) h.CacheObj.BlockOffsetList.BlockList[1].Flags.Set(common.DirtyBlock) // remove 2 blocks @@ -3259,7 +3267,7 @@ func (s *blockBlobTestSuite) TestDownloadBlobWithCPKEnabled() { s.assert.EqualValues(data, fileData) buf := make([]byte, len(data)) - err = s.az.storage.ReadInBuffer(name, 0, int64(len(data)), buf) + err = s.az.storage.ReadInBuffer(name, 0, int64(len(data)), buf, nil) s.assert.Nil(err) s.assert.EqualValues(data, buf) @@ -3387,6 +3395,87 @@ func (suite *blockBlobTestSuite) TestTruncateNoBlockFileToLarger() { suite.UtilityFunctionTruncateFileToLarger(200*MB, 300*MB) } +func (s *blockBlobTestSuite) TestBlobFilters() { + defer s.cleanupTest() + // Setup + var err error + name := generateDirectoryName() + err = s.az.CreateDir(internal.CreateDirOptions{Name: name}) + s.assert.Nil(err) + _, err = s.az.CreateFile(internal.CreateFileOptions{Name: name + "/abcd1.txt"}) + s.assert.Nil(err) + _, err = s.az.CreateFile(internal.CreateFileOptions{Name: name + "/abcd2.txt"}) + s.assert.Nil(err) + _, err = s.az.CreateFile(internal.CreateFileOptions{Name: name + "/abcd3.txt"}) + s.assert.Nil(err) + _, err = s.az.CreateFile(internal.CreateFileOptions{Name: name + "/abcd4.txt"}) + s.assert.Nil(err) + _, err = s.az.CreateFile(internal.CreateFileOptions{Name: name + "/bcd1.txt"}) + s.assert.Nil(err) + _, err = s.az.CreateFile(internal.CreateFileOptions{Name: name + "/cd1.txt"}) + s.assert.Nil(err) + _, err = s.az.CreateFile(internal.CreateFileOptions{Name: name + "/d1.txt"}) + s.assert.Nil(err) + err = s.az.CreateDir(internal.CreateDirOptions{Name: name + "/subdir"}) + s.assert.Nil(err) + + var iteration int = 0 + var marker string = "" + blobList := make([]*internal.ObjAttr, 0) + + for { + new_list, new_marker, err := s.az.StreamDir(internal.StreamDirOptions{Name: name + "/", Token: marker, Count: 50}) + s.assert.Nil(err) + blobList = append(blobList, new_list...) + marker = new_marker + iteration++ + + log.Debug("AzStorage::ReadDir : So far retrieved %d objects in %d iterations", len(blobList), iteration) + if new_marker == "" { + break + } + } + s.assert.EqualValues(8, len(blobList)) + err = s.az.storage.(*BlockBlob).SetFilter("name=^abcd.*") + s.assert.Nil(err) + + blobList = make([]*internal.ObjAttr, 0) + for { + new_list, new_marker, err := s.az.StreamDir(internal.StreamDirOptions{Name: name + "/", Token: marker, Count: 50}) + s.assert.Nil(err) + blobList = append(blobList, new_list...) 
+ marker = new_marker + iteration++ + + log.Debug("AzStorage::ReadDir : So far retrieved %d objects in %d iterations", len(blobList), iteration) + if new_marker == "" { + break + } + } + // Only 4 files matches the pattern but there is a directory as well and directories are not filtered by blobfilter + s.assert.EqualValues(5, len(blobList)) + err = s.az.storage.(*BlockBlob).SetFilter("name=^bla.*") + s.assert.Nil(err) + + blobList = make([]*internal.ObjAttr, 0) + for { + new_list, new_marker, err := s.az.StreamDir(internal.StreamDirOptions{Name: name + "/", Token: marker, Count: 50}) + s.assert.Nil(err) + blobList = append(blobList, new_list...) + marker = new_marker + iteration++ + + log.Debug("AzStorage::ReadDir : So far retrieved %d objects in %d iterations", len(blobList), iteration) + if new_marker == "" { + break + } + } + + s.assert.EqualValues(1, len(blobList)) + err = s.az.storage.(*BlockBlob).SetFilter("") + s.assert.Nil(err) +} + func (suite *blockBlobTestSuite) UtilityFunctionTestTruncateFileToSmaller(size int, truncatedLength int) { defer suite.cleanupTest() // Setup @@ -3452,6 +3541,47 @@ func (suite *blockBlobTestSuite) UtilityFunctionTruncateFileToLarger(size int, t } +func (s *blockBlobTestSuite) TestList() { + defer s.cleanupTest() + // Setup + s.tearDownTestHelper(false) // Don't delete the generated container. + config := fmt.Sprintf("azstorage:\n account-name: %s\n endpoint: https://%s.dfs.core.windows.net/\n type: block\n account-key: %s\n mode: key\n container: %s\n fail-unsupported-op: true", + storageTestConfigurationParameters.BlockAccount, storageTestConfigurationParameters.BlockAccount, storageTestConfigurationParameters.BlockKey, s.container) + s.setupTestHelper(config, s.container, true) + + base := generateDirectoryName() + s.setupHierarchy(base) + + blobList, marker, err := s.az.storage.List("", nil, 0) + s.assert.Nil(err) + emptyString := "" + s.assert.Equal(&emptyString, marker) + s.assert.NotNil(blobList) + s.assert.EqualValues(3, len(blobList)) + + // Test listing with prefix + blobList, marker, err = s.az.storage.List(base+"b/", nil, 0) + s.assert.Nil(err) + s.assert.Equal(&emptyString, marker) + s.assert.NotNil(blobList) + s.assert.EqualValues(1, len(blobList)) + s.assert.EqualValues("c1", blobList[0].Name) + + // Test listing with marker + blobList, marker, err = s.az.storage.List(base, to.Ptr("invalid-marker"), 0) + s.assert.NotNil(err) + s.assert.Equal(0, len(blobList)) + s.assert.Nil(marker) + + // Test listing with count + blobList, marker, err = s.az.storage.List("", nil, 1) + s.assert.Nil(err) + s.assert.NotNil(blobList) + s.assert.NotEmpty(marker) + s.assert.EqualValues(1, len(blobList)) + s.assert.EqualValues(base, blobList[0].Path) +} + // In order for 'go test' to run this suite, we need to create // a normal test function and pass our suite to suite.Run func TestBlockBlob(t *testing.T) { diff --git a/component/azstorage/config.go b/component/azstorage/config.go index 7b9715a39..86907a3c3 100644 --- a/component/azstorage/config.go +++ b/component/azstorage/config.go @@ -42,6 +42,7 @@ import ( "github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blockblob" "github.com/Azure/azure-storage-fuse/v2/common/config" "github.com/Azure/azure-storage-fuse/v2/common/log" + "github.com/vibhansa-msft/blobfilter" "github.com/JeffreyRichter/enum/enum" ) @@ -124,24 +125,25 @@ const DefaultMaxResultsForList int32 = 2 // https://github.com/Azure/go-autorest/blob/a46566dfcbdc41e736295f94e9f690ceaf50094a/autorest/adal/token.go#L788 // 
newServicePrincipalTokenFromMSI : reads them directly from env const ( - EnvAzStorageAccount = "AZURE_STORAGE_ACCOUNT" - EnvAzStorageAccountType = "AZURE_STORAGE_ACCOUNT_TYPE" - EnvAzStorageAccessKey = "AZURE_STORAGE_ACCESS_KEY" - EnvAzStorageSasToken = "AZURE_STORAGE_SAS_TOKEN" - EnvAzStorageIdentityClientId = "AZURE_STORAGE_IDENTITY_CLIENT_ID" - EnvAzStorageIdentityResourceId = "AZURE_STORAGE_IDENTITY_RESOURCE_ID" - EnvAzStorageIdentityObjectId = "AZURE_STORAGE_IDENTITY_OBJECT_ID" - EnvAzStorageSpnTenantId = "AZURE_STORAGE_SPN_TENANT_ID" - EnvAzStorageSpnClientId = "AZURE_STORAGE_SPN_CLIENT_ID" - EnvAzStorageSpnClientSecret = "AZURE_STORAGE_SPN_CLIENT_SECRET" - EnvAzStorageSpnOAuthTokenFilePath = "AZURE_OAUTH_TOKEN_FILE" - EnvAzStorageAadEndpoint = "AZURE_STORAGE_AAD_ENDPOINT" - EnvAzStorageAuthType = "AZURE_STORAGE_AUTH_TYPE" - EnvAzStorageBlobEndpoint = "AZURE_STORAGE_BLOB_ENDPOINT" - EnvAzStorageAccountContainer = "AZURE_STORAGE_ACCOUNT_CONTAINER" - EnvAzAuthResource = "AZURE_STORAGE_AUTH_RESOURCE" - EnvAzStorageCpkEncryptionKey = "AZURE_STORAGE_CPK_ENCRYPTION_KEY" - EnvAzStorageCpkEncryptionKeySha256 = "AZURE_STORAGE_CPK_ENCRYPTION_KEY_SHA256" + EnvAzStorageAccount = "AZURE_STORAGE_ACCOUNT" + EnvAzStorageAccountType = "AZURE_STORAGE_ACCOUNT_TYPE" + EnvAzStorageAccessKey = "AZURE_STORAGE_ACCESS_KEY" + EnvAzStorageSasToken = "AZURE_STORAGE_SAS_TOKEN" + EnvAzStorageIdentityClientId = "AZURE_STORAGE_IDENTITY_CLIENT_ID" + EnvAzStorageIdentityResourceId = "AZURE_STORAGE_IDENTITY_RESOURCE_ID" + EnvAzStorageIdentityObjectId = "AZURE_STORAGE_IDENTITY_OBJECT_ID" + EnvAzStorageSpnTenantId = "AZURE_STORAGE_SPN_TENANT_ID" + EnvAzStorageSpnClientId = "AZURE_STORAGE_SPN_CLIENT_ID" + EnvAzStorageSpnClientSecret = "AZURE_STORAGE_SPN_CLIENT_SECRET" + EnvAzStorageSpnOAuthTokenFilePath = "AZURE_OAUTH_TOKEN_FILE" + EnvAzStorageSpnWorkloadIdentityToken = "WORKLOAD_IDENTITY_TOKEN" + EnvAzStorageAadEndpoint = "AZURE_STORAGE_AAD_ENDPOINT" + EnvAzStorageAuthType = "AZURE_STORAGE_AUTH_TYPE" + EnvAzStorageBlobEndpoint = "AZURE_STORAGE_BLOB_ENDPOINT" + EnvAzStorageAccountContainer = "AZURE_STORAGE_ACCOUNT_CONTAINER" + EnvAzAuthResource = "AZURE_STORAGE_AUTH_RESOURCE" + EnvAzStorageCpkEncryptionKey = "AZURE_STORAGE_CPK_ENCRYPTION_KEY" + EnvAzStorageCpkEncryptionKeySha256 = "AZURE_STORAGE_CPK_ENCRYPTION_KEY_SHA256" ) type AzStorageOptions struct { @@ -157,6 +159,7 @@ type AzStorageOptions struct { ClientID string `config:"clientid" yaml:"clientid,omitempty"` ClientSecret string `config:"clientsecret" yaml:"clientsecret,omitempty"` OAuthTokenFilePath string `config:"oauth-token-path" yaml:"oauth-token-path,omitempty"` + WorkloadIdentityToken string `config:"workload-identity-token" yaml:"workload-identity-token,omitempty"` ActiveDirectoryEndpoint string `config:"aadendpoint" yaml:"aadendpoint,omitempty"` Endpoint string `config:"endpoint" yaml:"endpoint,omitempty"` AuthMode string `config:"mode" yaml:"mode,omitempty"` @@ -185,6 +188,7 @@ type AzStorageOptions struct { CPKEncryptionKey string `config:"cpk-encryption-key" yaml:"cpk-encryption-key"` CPKEncryptionKeySha256 string `config:"cpk-encryption-key-sha256" yaml:"cpk-encryption-key-sha256"` PreserveACL bool `config:"preserve-acl" yaml:"preserve-acl"` + Filter string `config:"filter" yaml:"filter"` // v1 support UseAdls bool `config:"use-adls" yaml:"-"` @@ -209,6 +213,7 @@ func RegisterEnvVariables() { config.BindEnv("azstorage.clientid", EnvAzStorageSpnClientId) config.BindEnv("azstorage.clientsecret", EnvAzStorageSpnClientSecret) 
config.BindEnv("azstorage.oauth-token-path", EnvAzStorageSpnOAuthTokenFilePath) + config.BindEnv("azstorage.workload-identity-token", EnvAzStorageSpnWorkloadIdentityToken) config.BindEnv("azstorage.objid", EnvAzStorageIdentityObjectId) @@ -451,14 +456,15 @@ func ParseAndValidateConfig(az *AzStorage, opt AzStorageOptions) error { az.stConfig.authConfig.ResourceID = opt.ResourceID case EAuthType.SPN(): az.stConfig.authConfig.AuthMode = EAuthType.SPN() - if opt.ClientID == "" || (opt.ClientSecret == "" && opt.OAuthTokenFilePath == "") || opt.TenantID == "" { + if opt.ClientID == "" || (opt.ClientSecret == "" && opt.OAuthTokenFilePath == "" && opt.WorkloadIdentityToken == "") || opt.TenantID == "" { //lint:ignore ST1005 ignore - return errors.New("Client ID, Tenant ID or Client Secret not provided") + return errors.New("Client ID, Tenant ID or Client Secret, OAuthTokenFilePath, WorkloadIdentityToken not provided") } az.stConfig.authConfig.ClientID = opt.ClientID az.stConfig.authConfig.ClientSecret = opt.ClientSecret az.stConfig.authConfig.TenantID = opt.TenantID az.stConfig.authConfig.OAuthTokenFilePath = opt.OAuthTokenFilePath + az.stConfig.authConfig.WorkloadIdentityToken = opt.WorkloadIdentityToken case EAuthType.AZCLI(): az.stConfig.authConfig.AuthMode = EAuthType.AZCLI() @@ -500,6 +506,12 @@ func ParseAndValidateConfig(az *AzStorage, opt AzStorageOptions) error { } az.stConfig.preserveACL = opt.PreserveACL + if opt.Filter != "" { + err = configureBlobFilter(az, opt) + if err != nil { + return err + } + } log.Crit("ParseAndValidateConfig : account %s, container %s, account-type %s, auth %s, prefix %s, endpoint %s, MD5 %v %v, virtual-directory %v, disable-compression %v, CPK %v", az.stConfig.authConfig.AccountName, az.stConfig.container, az.stConfig.authConfig.AccountType, az.stConfig.authConfig.AuthMode, @@ -508,11 +520,30 @@ func ParseAndValidateConfig(az *AzStorage, opt AzStorageOptions) error { log.Crit("ParseAndValidateConfig : Retry Config: retry-count %d, max-timeout %d, backoff-time %d, max-delay %d, preserve-acl: %v", az.stConfig.maxRetries, az.stConfig.maxTimeout, az.stConfig.backoffTime, az.stConfig.maxRetryDelay, az.stConfig.preserveACL) - log.Crit("ParseAndValidateConfig : Telemetry : %s, honour-ACL %v, disable-symlink %v", az.stConfig.telemetry, az.stConfig.honourACL, az.stConfig.disableSymlink) + log.Crit("ParseAndValidateConfig : Telemetry : %s, honour-ACL %v", az.stConfig.telemetry, az.stConfig.honourACL) return nil } +func configureBlobFilter(azStorage *AzStorage, opt AzStorageOptions) error { + readonly := false + _ = config.UnmarshalKey("read-only", &readonly) + if !readonly { + log.Err("configureBlobFilter: Blob filters are supported only in read-only mode") + return errors.New("blobfilter is supported only in read-only mode") + } + + azStorage.stConfig.filter = &blobfilter.BlobFilter{} + err := azStorage.stConfig.filter.Configure(opt.Filter) + if err != nil { + log.Err("configureBlobFilter : Failed to configure blob filter %s", err.Error()) + return errors.New("failed to configure blob filter") + } + + log.Crit("configureBlobFilter : Blob filter configured %s", opt.Filter) + return nil +} + // ParseAndReadDynamicConfig : On config change read only the required config func ParseAndReadDynamicConfig(az *AzStorage, opt AzStorageOptions, reload bool) error { log.Trace("ParseAndReadDynamicConfig : Reparsing config") @@ -560,16 +591,6 @@ func ParseAndReadDynamicConfig(az *AzStorage, opt AzStorageOptions, reload bool) az.stConfig.honourACL = false } - // by default symlink 
will be disabled - az.stConfig.disableSymlink = true - - if config.IsSet("attr_cache.no-symlinks") { - err := config.UnmarshalKey("attr_cache.no-symlinks", &az.stConfig.disableSymlink) - if err != nil { - log.Err("ParseAndReadDynamicConfig : Failed to unmarshal attr_cache.no-symlinks") - } - } - // Auth related reconfig switch opt.AuthMode { case "sas": diff --git a/component/azstorage/config_test.go b/component/azstorage/config_test.go index f13ff2162..c9153899a 100644 --- a/component/azstorage/config_test.go +++ b/component/azstorage/config_test.go @@ -340,13 +340,13 @@ func (s *configTestSuite) TestAuthModeSPN() { err := ParseAndValidateConfig(az, opt) assert.NotNil(err) assert.Equal(az.stConfig.authConfig.AuthMode, EAuthType.SPN()) - assert.Contains(err.Error(), "Client ID, Tenant ID or Client Secret not provided") + assert.Contains(err.Error(), "Client ID, Tenant ID or Client Secret, OAuthTokenFilePath, WorkloadIdentityToken not provided") opt.ClientID = "abc" err = ParseAndValidateConfig(az, opt) assert.NotNil(err) assert.Equal(az.stConfig.authConfig.AuthMode, EAuthType.SPN()) - assert.Contains(err.Error(), "Client ID, Tenant ID or Client Secret not provided") + assert.Contains(err.Error(), "Client ID, Tenant ID or Client Secret, OAuthTokenFilePath, WorkloadIdentityToken not provided") opt.ClientSecret = "123" opt.TenantID = "xyz" diff --git a/component/azstorage/connection.go b/component/azstorage/connection.go index de73e4cb5..6fa6be166 100644 --- a/component/azstorage/connection.go +++ b/component/azstorage/connection.go @@ -40,6 +40,7 @@ import ( "github.com/Azure/azure-storage-fuse/v2/common" "github.com/Azure/azure-storage-fuse/v2/common/log" "github.com/Azure/azure-storage-fuse/v2/internal" + "github.com/vibhansa-msft/blobfilter" ) // Example for azblob usage : https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/sdk/storage/azblob#pkg-examples @@ -73,15 +74,17 @@ type AzStorageConfig struct { maxResultsForList int32 disableCompression bool - telemetry string - honourACL bool - disableSymlink bool - preserveACL bool + telemetry string + honourACL bool + preserveACL bool // CPK related config cpkEnabled bool cpkEncryptionKey string cpkEncryptionKeySha256 string + + // Blob filters + filter *blobfilter.BlobFilter } type AzStorageConnection struct { @@ -107,7 +110,7 @@ type AzConnection interface { DeleteFile(name string) error DeleteDirectory(name string) error - RenameFile(string, string) error + RenameFile(string, string, *internal.ObjAttr) error RenameDirectory(string, string) error GetAttr(name string) (attr *internal.ObjAttr, err error) @@ -117,7 +120,7 @@ type AzConnection interface { ReadToFile(name string, offset int64, count int64, fi *os.File) error ReadBuffer(name string, offset int64, len int64) ([]byte, error) - ReadInBuffer(name string, offset int64, len int64, data []byte) error + ReadInBuffer(name string, offset int64, len int64, data []byte, etag *string) error WriteFromFile(name string, metadata map[string]*string, fi *os.File) error WriteFromBuffer(name string, metadata map[string]*string, data []byte) error @@ -131,9 +134,11 @@ type AzConnection interface { GetCommittedBlockList(string) (*internal.CommittedBlockList, error) StageBlock(string, []byte, string) error - CommitBlocks(string, []string) error + CommitBlocks(string, []string, *string) error UpdateServiceClient(_, _ string) error + + SetFilter(string) error } // NewAzStorageConnection : Based on account type create respective AzConnection Object diff --git a/component/azstorage/datalake.go 
b/component/azstorage/datalake.go index e23d69d34..73c68390f 100644 --- a/component/azstorage/datalake.go +++ b/component/azstorage/datalake.go @@ -36,17 +36,16 @@ package azstorage import ( "context" "fmt" - "io/fs" "net/url" "os" "path/filepath" "strings" "syscall" - "time" "github.com/Azure/azure-storage-fuse/v2/common" "github.com/Azure/azure-storage-fuse/v2/common/log" "github.com/Azure/azure-storage-fuse/v2/internal" + "github.com/vibhansa-msft/blobfilter" "github.com/Azure/azure-sdk-for-go/sdk/azcore" "github.com/Azure/azure-sdk-for-go/sdk/azcore/to" @@ -103,7 +102,13 @@ func (dl *Datalake) Configure(cfg AzStorageConfig) error { EncryptionAlgorithm: to.Ptr(directory.EncryptionAlgorithmTypeAES256), } } - return dl.BlockBlob.Configure(transformConfig(cfg)) + + err := dl.BlockBlob.Configure(transformConfig(cfg)) + + // List call shall always retrieve permissions for HNS accounts + dl.BlockBlob.listDetails.Permissions = true + + return err } // For dynamic config update the config here @@ -313,12 +318,13 @@ func (dl *Datalake) DeleteDirectory(name string) (err error) { } // RenameFile : Rename the file -func (dl *Datalake) RenameFile(source string, target string) error { +// While renaming the file, the creation time is preserved but the LMT of the destination blob is changed. +func (dl *Datalake) RenameFile(source string, target string, srcAttr *internal.ObjAttr) error { log.Trace("Datalake::RenameFile : %s -> %s", source, target) fileClient := dl.Filesystem.NewFileClient(url.PathEscape(filepath.Join(dl.Config.prefixPath, source))) - _, err := fileClient.Rename(context.Background(), filepath.Join(dl.Config.prefixPath, target), &file.RenameOptions{ + renameResponse, err := fileClient.Rename(context.Background(), filepath.Join(dl.Config.prefixPath, target), &file.RenameOptions{ CPKInfo: dl.datalakeCPKOpt, }) if err != nil { @@ -331,7 +337,7 @@ func (dl *Datalake) RenameFile(source string, target string) error { return err } } - + modifyLMT(srcAttr, renameResponse.LastModified) return nil } @@ -358,7 +364,7 @@ func (dl *Datalake) RenameDirectory(source string, target string) error { } // GetAttr : Retrieve attributes of the path -func (dl *Datalake) GetAttr(name string) (attr *internal.ObjAttr, err error) { +func (dl *Datalake) GetAttr(name string) (blobAttr *internal.ObjAttr, err error) { log.Trace("Datalake::GetAttr : name %s", name) fileClient := dl.Filesystem.NewFileClient(filepath.Join(dl.Config.prefixPath, name)) @@ -368,23 +374,23 @@ func (dl *Datalake) GetAttr(name string) (attr *internal.ObjAttr, err error) { if err != nil { e := storeDatalakeErrToErr(err) if e == ErrFileNotFound { - return attr, syscall.ENOENT + return blobAttr, syscall.ENOENT } else if e == InvalidPermission { log.Err("Datalake::GetAttr : Insufficient permissions for %s [%s]", name, err.Error()) - return attr, syscall.EACCES + return blobAttr, syscall.EACCES } else { log.Err("Datalake::GetAttr : Failed to get path properties for %s [%s]", name, err.Error()) - return attr, err + return blobAttr, err } } mode, err := getFileMode(*prop.Permissions) if err != nil { log.Err("Datalake::GetAttr : Failed to get file mode for %s [%s]", name, err.Error()) - return attr, err + return blobAttr, err } - attr = &internal.ObjAttr{ + blobAttr = &internal.ObjAttr{ Path: name, Name: filepath.Base(name), Size: *prop.ContentLength, @@ -394,12 +400,13 @@ func (dl *Datalake) GetAttr(name string) (attr *internal.ObjAttr, err error) { Ctime: *prop.LastModified, Crtime: *prop.LastModified, Flags: internal.NewFileBitMap(), + ETag: 
(string)(*prop.ETag), } - parseMetadata(attr, prop.Metadata) + parseMetadata(blobAttr, prop.Metadata) if *prop.ResourceType == "directory" { - attr.Flags = internal.NewDirBitMap() - attr.Mode = attr.Mode | os.ModeDir + blobAttr.Flags = internal.NewDirBitMap() + blobAttr.Mode = blobAttr.Mode | os.ModeDir } if dl.Config.honourACL && dl.Config.authConfig.ObjectID != "" { @@ -412,128 +419,30 @@ func (dl *Datalake) GetAttr(name string) (attr *internal.ObjAttr, err error) { if err != nil { log.Err("Datalake::GetAttr : Failed to get file mode from ACL for %s [%s]", name, err.Error()) } else { - attr.Mode = mode + blobAttr.Mode = mode } } } - return attr, nil + if dl.Config.filter != nil { + if !dl.Config.filter.IsAcceptable(&blobfilter.BlobAttr{ + Name: blobAttr.Name, + Mtime: blobAttr.Mtime, + Size: blobAttr.Size, + }) { + log.Debug("Datalake::GetAttr : Filtered out %s", name) + return nil, syscall.ENOENT + } + } + + return blobAttr, nil } // List : Get a list of path matching the given prefix // This fetches the list using a marker so the caller code should handle marker logic // If count=0 - fetch max entries func (dl *Datalake) List(prefix string, marker *string, count int32) ([]*internal.ObjAttr, *string, error) { - log.Trace("Datalake::List : prefix %s, marker %s", prefix, func(marker *string) string { - if marker != nil { - return *marker - } else { - return "" - } - }(marker)) - - pathList := make([]*internal.ObjAttr, 0) - - if count == 0 { - count = common.MaxDirListCount - } - - prefixPath := filepath.Join(dl.Config.prefixPath, prefix) - if prefix != "" && prefix[len(prefix)-1] == '/' { - prefixPath += "/" - } - - // Get a result segment starting with the path indicated by the current Marker. - pager := dl.Filesystem.NewListPathsPager(false, &filesystem.ListPathsOptions{ - Marker: marker, - MaxResults: &count, - Prefix: &prefixPath, - }) - - // Process the paths returned in this result segment (if the segment is empty, the loop body won't execute) - listPath, err := pager.NextPage(context.Background()) - if err != nil { - log.Err("Datalake::List : Failed to validate account with given auth %s", err.Error()) - m := "" - e := storeDatalakeErrToErr(err) - if e == ErrFileNotFound { // TODO: should this be checked for list calls - return pathList, &m, syscall.ENOENT - } else if e == InvalidPermission { - return pathList, &m, syscall.EACCES - } else { - return pathList, &m, err - } - } - - // Process the paths returned in this result segment (if the segment is empty, the loop body won't execute) - for _, pathInfo := range listPath.Paths { - var attr *internal.ObjAttr - var lastModifiedTime time.Time - if dl.Config.disableSymlink { - var mode fs.FileMode - if pathInfo.Permissions != nil { - mode, err = getFileMode(*pathInfo.Permissions) - if err != nil { - log.Err("Datalake::List : Failed to get file mode for %s [%s]", *pathInfo.Name, err.Error()) - m := "" - return pathList, &m, err - } - } else { - // This happens when a blob account is mounted with type:adls - log.Err("Datalake::List : Failed to get file permissions for %s", *pathInfo.Name) - } - - var contentLength int64 = 0 - if pathInfo.ContentLength != nil { - contentLength = *pathInfo.ContentLength - } else { - // This happens when a blob account is mounted with type:adls - log.Err("Datalake::List : Failed to get file length for %s", *pathInfo.Name) - } - - if pathInfo.LastModified != nil { - lastModifiedTime, err = time.Parse(time.RFC1123, *pathInfo.LastModified) - if err != nil { - log.Err("Datalake::List : Failed to get last 
modified time for %s [%s]", *pathInfo.Name, err.Error()) - } - } - attr = &internal.ObjAttr{ - Path: *pathInfo.Name, - Name: filepath.Base(*pathInfo.Name), - Size: contentLength, - Mode: mode, - Mtime: lastModifiedTime, - Atime: lastModifiedTime, - Ctime: lastModifiedTime, - Crtime: lastModifiedTime, - Flags: internal.NewFileBitMap(), - } - if pathInfo.IsDirectory != nil && *pathInfo.IsDirectory { - attr.Flags = internal.NewDirBitMap() - attr.Mode = attr.Mode | os.ModeDir - } - } else { - attr, err = dl.GetAttr(*pathInfo.Name) - if err != nil { - log.Err("Datalake::List : Failed to get properties for %s [%s]", *pathInfo.Name, err.Error()) - m := "" - return pathList, &m, err - } - } - - // Note: Datalake list paths does not return metadata/properties. - // To account for this and accurately return attributes when needed, - // we have a flag for whether or not metadata has been retrieved. - // If this flag is not set the attribute cache will call get attributes - // to fetch metadata properties. - // Any method that populates the metadata should set the attribute flag. - // Alternatively, if you want Datalake list paths to return metadata/properties as well. - // pass CLI parameter --no-symlinks=false in the mount command. - pathList = append(pathList, attr) - - } - - return pathList, listPath.Continuation, nil + return dl.BlockBlob.List(prefix, marker, count) } // ReadToFile : Download a file to a local file @@ -547,8 +456,8 @@ func (dl *Datalake) ReadBuffer(name string, offset int64, len int64) ([]byte, er } // ReadInBuffer : Download specific range from a file to a user provided buffer -func (dl *Datalake) ReadInBuffer(name string, offset int64, len int64, data []byte) error { - return dl.BlockBlob.ReadInBuffer(name, offset, len, data) +func (dl *Datalake) ReadInBuffer(name string, offset int64, len int64, data []byte, etag *string) error { + return dl.BlockBlob.ReadInBuffer(name, offset, len, data, etag) } // WriteFromFile : Upload local file to file @@ -686,6 +595,20 @@ func (dl *Datalake) StageBlock(name string, data []byte, id string) error { } // CommitBlocks : persists the block list -func (dl *Datalake) CommitBlocks(name string, blockList []string) error { - return dl.BlockBlob.CommitBlocks(name, blockList) +func (dl *Datalake) CommitBlocks(name string, blockList []string, newEtag *string) error { + return dl.BlockBlob.CommitBlocks(name, blockList, newEtag) +} + +func (dl *Datalake) SetFilter(filter string) error { + if filter == "" { + dl.Config.filter = nil + } else { + dl.Config.filter = &blobfilter.BlobFilter{} + err := dl.Config.filter.Configure(filter) + if err != nil { + return err + } + } + + return dl.BlockBlob.SetFilter(filter) } diff --git a/component/azstorage/datalake_test.go b/component/azstorage/datalake_test.go index f48006131..236efd3a8 100644 --- a/component/azstorage/datalake_test.go +++ b/component/azstorage/datalake_test.go @@ -123,7 +123,6 @@ func (s *datalakeTestSuite) setupTestHelper(configuration string, container stri storageTestConfigurationParameters.AdlsAccount, storageTestConfigurationParameters.AdlsAccount, storageTestConfigurationParameters.AdlsKey, s.container) } s.config = configuration - s.assert = assert.New(s.T()) s.az, _ = newTestAzStorage(configuration) @@ -458,7 +457,7 @@ func (s *datalakeTestSuite) TestIsDirEmptyError() { empty := s.az.IsDirEmpty(internal.IsDirEmptyOptions{Name: name}) - s.assert.False(empty) // Note: See comment in BlockBlob.List. 
BlockBlob behaves differently from Datalake + s.assert.True(empty) // Note: See comment in BlockBlob.List. BlockBlob behaves differently from Datalake // Directory should not be in the account dir := s.containerClient.NewDirectoryClient(name) @@ -494,19 +493,20 @@ func (s *datalakeTestSuite) TestReadDirHierarchy() { s.setupHierarchy(base) // ReadDir only reads the first level of the hierarchy + //Using listblob api lists the files before directories so the order is reversed entries, err := s.az.ReadDir(internal.ReadDirOptions{Name: base}) s.assert.Nil(err) s.assert.EqualValues(2, len(entries)) // Check the dir - s.assert.EqualValues(base+"/c1", entries[0].Path) - s.assert.EqualValues("c1", entries[0].Name) - s.assert.True(entries[0].IsDir()) - s.assert.False(entries[0].IsModeDefault()) - // Check the file - s.assert.EqualValues(base+"/c2", entries[1].Path) - s.assert.EqualValues("c2", entries[1].Name) - s.assert.False(entries[1].IsDir()) + s.assert.EqualValues(base+"/c1", entries[1].Path) + s.assert.EqualValues("c1", entries[1].Name) + s.assert.True(entries[1].IsDir()) s.assert.False(entries[1].IsModeDefault()) + // Check the file + s.assert.EqualValues(base+"/c2", entries[0].Path) + s.assert.EqualValues("c2", entries[0].Name) + s.assert.False(entries[0].IsDir()) + s.assert.False(entries[0].IsModeDefault()) } func (s *datalakeTestSuite) TestReadDirRoot() { @@ -524,21 +524,22 @@ func (s *datalakeTestSuite) TestReadDirRoot() { entries, err := s.az.ReadDir(internal.ReadDirOptions{Name: path}) s.assert.Nil(err) s.assert.EqualValues(3, len(entries)) + //Listblob api lists files before directories so the order is reversed // Check the base dir - s.assert.EqualValues(base, entries[0].Path) - s.assert.EqualValues(base, entries[0].Name) - s.assert.True(entries[0].IsDir()) - s.assert.False(entries[0].IsModeDefault()) - // Check the baseb dir - s.assert.EqualValues(base+"b", entries[1].Path) - s.assert.EqualValues(base+"b", entries[1].Name) + s.assert.EqualValues(base, entries[1].Path) + s.assert.EqualValues(base, entries[1].Name) s.assert.True(entries[1].IsDir()) s.assert.False(entries[1].IsModeDefault()) - // Check the basec file - s.assert.EqualValues(base+"c", entries[2].Path) - s.assert.EqualValues(base+"c", entries[2].Name) - s.assert.False(entries[2].IsDir()) + // Check the baseb dir + s.assert.EqualValues(base+"b", entries[2].Path) + s.assert.EqualValues(base+"b", entries[2].Name) + s.assert.True(entries[2].IsDir()) s.assert.False(entries[2].IsModeDefault()) + // Check the basec file + s.assert.EqualValues(base+"c", entries[0].Path) + s.assert.EqualValues(base+"c", entries[0].Name) + s.assert.False(entries[0].IsDir()) + s.assert.False(entries[0].IsModeDefault()) }) } } @@ -573,7 +574,7 @@ func (s *datalakeTestSuite) TestReadDirSubDirPrefixPath() { s.assert.Nil(err) s.assert.EqualValues(1, len(entries)) // Check the dir - s.assert.EqualValues(base+"/c1"+"/gc1", entries[0].Path) + s.assert.EqualValues("c1"+"/gc1", entries[0].Path) s.assert.EqualValues("gc1", entries[0].Name) s.assert.False(entries[0].IsDir()) s.assert.False(entries[0].IsModeDefault()) @@ -586,7 +587,7 @@ func (s *datalakeTestSuite) TestReadDirError() { entries, err := s.az.ReadDir(internal.ReadDirOptions{Name: name}) - s.assert.NotNil(err) // Note: See comment in BlockBlob.List. BlockBlob behaves differently from Datalake + s.assert.Nil(err) // Note: See comment in BlockBlob.List. 
BlockBlob behaves differently from Datalake s.assert.Empty(entries) // Directory should not be in the account dir := s.containerClient.NewDirectoryClient(name) @@ -1401,6 +1402,26 @@ func (s *datalakeTestSuite) TestReadInBuffer() { s.assert.EqualValues(testData[:5], output) } +func (suite *datalakeTestSuite) TestReadInBufferWithETAG() { + defer suite.cleanupTest() + // Setup + name := generateFileName() + fileHandle, _ := suite.az.CreateFile(internal.CreateFileOptions{Name: name}) + testData := "test data" + data := []byte(testData) + suite.az.WriteFile(internal.WriteFileOptions{Handle: fileHandle, Offset: 0, Data: data}) + fileHandle, _ = suite.az.OpenFile(internal.OpenFileOptions{Name: name}) + + output := make([]byte, 5) + var etag string + len, err := suite.az.ReadInBuffer(internal.ReadInBufferOptions{Handle: fileHandle, Offset: 0, Data: output, Etag: &etag}) + suite.assert.Nil(err) + suite.assert.NotEqual(etag, "") + suite.assert.EqualValues(5, len) + suite.assert.EqualValues(testData[:5], output) + _ = suite.az.CloseFile(internal.CloseFileOptions{Handle: fileHandle}) +} + func (s *datalakeTestSuite) TestReadInBufferLargeBuffer() { defer s.cleanupTest() // Setup @@ -2034,7 +2055,7 @@ func (s *datalakeTestSuite) TestFlushFileUpdateChunkedFile() { updatedBlock := make([]byte, 2*MB) rand.Read(updatedBlock) h.CacheObj.BlockOffsetList.BlockList[1].Data = make([]byte, blockSize) - s.az.storage.ReadInBuffer(name, int64(blockSize), int64(blockSize), h.CacheObj.BlockOffsetList.BlockList[1].Data) + s.az.storage.ReadInBuffer(name, int64(blockSize), int64(blockSize), h.CacheObj.BlockOffsetList.BlockList[1].Data, nil) copy(h.CacheObj.BlockOffsetList.BlockList[1].Data[MB:2*MB+MB], updatedBlock) h.CacheObj.BlockOffsetList.BlockList[1].Flags.Set(common.DirtyBlock) @@ -2072,7 +2093,7 @@ func (s *datalakeTestSuite) TestFlushFileTruncateUpdateChunkedFile() { // truncate block h.CacheObj.BlockOffsetList.BlockList[1].Data = make([]byte, blockSize/2) h.CacheObj.BlockOffsetList.BlockList[1].EndIndex = int64(blockSize + blockSize/2) - s.az.storage.ReadInBuffer(name, int64(blockSize), int64(blockSize)/2, h.CacheObj.BlockOffsetList.BlockList[1].Data) + s.az.storage.ReadInBuffer(name, int64(blockSize), int64(blockSize)/2, h.CacheObj.BlockOffsetList.BlockList[1].Data, nil) h.CacheObj.BlockOffsetList.BlockList[1].Flags.Set(common.DirtyBlock) // remove 2 blocks @@ -2489,7 +2510,7 @@ func (s *datalakeTestSuite) TestDownloadWithCPKEnabled() { s.assert.EqualValues(data, fileData) buf := make([]byte, len(data)) - err = s.az.storage.ReadInBuffer(name, 0, int64(len(data)), buf) + err = s.az.storage.ReadInBuffer(name, 0, int64(len(data)), buf, nil) s.assert.Nil(err) s.assert.EqualValues(data, buf) @@ -2693,6 +2714,131 @@ func (s *datalakeTestSuite) TestPermissionPreservationWithCommit() { s.assert.Contains(acl, "other::rwx") } +func (s *datalakeTestSuite) TestBlobFilters() { + defer s.cleanupTest() + // Setup + var err error + name := generateDirectoryName() + err = s.az.CreateDir(internal.CreateDirOptions{Name: name}) + s.assert.Nil(err) + _, err = s.az.CreateFile(internal.CreateFileOptions{Name: name + "/abcd1.txt"}) + s.assert.Nil(err) + _, err = s.az.CreateFile(internal.CreateFileOptions{Name: name + "/abcd2.txt"}) + s.assert.Nil(err) + _, err = s.az.CreateFile(internal.CreateFileOptions{Name: name + "/abcd3.txt"}) + s.assert.Nil(err) + _, err = s.az.CreateFile(internal.CreateFileOptions{Name: name + "/abcd4.txt"}) + s.assert.Nil(err) + _, err = s.az.CreateFile(internal.CreateFileOptions{Name: name + "/bcd1.txt"}) + 
s.assert.Nil(err) + _, err = s.az.CreateFile(internal.CreateFileOptions{Name: name + "/cd1.txt"}) + s.assert.Nil(err) + _, err = s.az.CreateFile(internal.CreateFileOptions{Name: name + "/d1.txt"}) + s.assert.Nil(err) + err = s.az.CreateDir(internal.CreateDirOptions{Name: name + "/subdir"}) + s.assert.Nil(err) + + var iteration int = 0 + var marker string = "" + blobList := make([]*internal.ObjAttr, 0) + + for { + new_list, new_marker, err := s.az.StreamDir(internal.StreamDirOptions{Name: name + "/", Token: marker, Count: 50}) + s.assert.Nil(err) + blobList = append(blobList, new_list...) + marker = new_marker + iteration++ + + log.Debug("AzStorage::ReadDir : So far retrieved %d objects in %d iterations", len(blobList), iteration) + if new_marker == "" { + break + } + } + s.assert.EqualValues(8, len(blobList)) + err = s.az.storage.(*Datalake).SetFilter("name=^abcd.*") + s.assert.Nil(err) + + blobList = make([]*internal.ObjAttr, 0) + for { + new_list, new_marker, err := s.az.StreamDir(internal.StreamDirOptions{Name: name + "/", Token: marker, Count: 50}) + s.assert.Nil(err) + blobList = append(blobList, new_list...) + marker = new_marker + iteration++ + + log.Debug("AzStorage::ReadDir : So far retrieved %d objects in %d iterations", len(blobList), iteration) + if new_marker == "" { + break + } + } + + s.assert.EqualValues(5, len(blobList)) + err = s.az.storage.(*Datalake).SetFilter("name=^bla.*") + s.assert.Nil(err) + + blobList = make([]*internal.ObjAttr, 0) + for { + new_list, new_marker, err := s.az.StreamDir(internal.StreamDirOptions{Name: name + "/", Token: marker, Count: 50}) + s.assert.Nil(err) + blobList = append(blobList, new_list...) + marker = new_marker + iteration++ + + log.Debug("AzStorage::ReadDir : So far retrieved %d objects in %d iterations", len(blobList), iteration) + if new_marker == "" { + break + } + } + + s.assert.EqualValues(1, len(blobList)) + err = s.az.storage.(*Datalake).SetFilter("") + s.assert.Nil(err) +} + +func (s *datalakeTestSuite) TestList() { + defer s.cleanupTest() + // Setup + s.tearDownTestHelper(false) // Don't delete the generated container. 
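+ // This runs against an HNS (ADLS Gen2) account: Datalake.List now delegates to BlockBlob.List with listDetails.Permissions enabled, so the entries checked below are expected to carry a non-zero Mode.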
+ config := fmt.Sprintf("azstorage:\n account-name: %s\n endpoint: https://%s.dfs.core.windows.net/\n type: adls\n account-key: %s\n mode: key\n container: %s\n", + storageTestConfigurationParameters.AdlsAccount, storageTestConfigurationParameters.AdlsAccount, storageTestConfigurationParameters.AdlsKey, s.container) + s.setupTestHelper(config, s.container, false) + + base := generateDirectoryName() + s.setupHierarchy(base) + + blobList, marker, err := s.az.storage.List(base, nil, 0) + s.assert.Nil(err) + emptyString := "" + s.assert.Equal(&emptyString, marker) + s.assert.NotNil(blobList) + s.assert.EqualValues(3, len(blobList)) + s.assert.NotEqual(0, blobList[0].Mode) + + // Test listing with prefix + blobList, marker, err = s.az.storage.List(base+"b/", nil, 0) + s.assert.Nil(err) + s.assert.Equal(&emptyString, marker) + s.assert.NotNil(blobList) + s.assert.EqualValues(1, len(blobList)) + s.assert.EqualValues("c1", blobList[0].Name) + s.assert.NotEqual(0, blobList[0].Mode) + + // Test listing with marker + blobList, marker, err = s.az.storage.List(base, to.Ptr("invalid-marker"), 0) + s.assert.NotNil(err) + s.assert.Equal(0, len(blobList)) + s.assert.Nil(marker) + + // Test listing with count + blobList, marker, err = s.az.storage.List("", nil, 1) + s.assert.Nil(err) + s.assert.NotNil(blobList) + s.assert.NotEmpty(marker) + s.assert.EqualValues(1, len(blobList)) + s.assert.EqualValues(base, blobList[0].Path) + s.assert.NotEqual(0, blobList[0].Mode) +} + // func (s *datalakeTestSuite) TestRAGRS() { // defer s.cleanupTest() // // Setup diff --git a/component/azstorage/utils.go b/component/azstorage/utils.go index 299702efc..962dc1387 100644 --- a/component/azstorage/utils.go +++ b/component/azstorage/utils.go @@ -533,24 +533,17 @@ func getFileMode(permissions string) (os.FileMode, error) { return mode, nil } -// Strips the prefixPath from the path and returns the joined string -func split(prefixPath string, path string) string { +// removePrefixPath removes the given prefixPath from the beginning of path, +// if it exists, and returns the resulting string without leading slashes. +func removePrefixPath(prefixPath, path string) string { if prefixPath == "" { return path } - - // Remove prefixpath from the given path - paths := strings.Split(path, prefixPath) - if paths[0] == "" { - paths = paths[1:] - } - - // If result starts with "/" then remove that - if paths[0][0] == '/' { - paths[0] = paths[0][1:] + path = strings.TrimPrefix(path, prefixPath) + if path[0] == '/' { + return path[1:] } - - return filepath.Join(paths...) 
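+ // The prefix and any leading '/' have already been stripped above, so the remainder is returned unchanged.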
+ return path } func sanitizeSASKey(key string) string { @@ -596,3 +589,29 @@ func removeLeadingSlashes(s string) string { } return s } + +func modifyLMT(attr *internal.ObjAttr, lmt *time.Time) { + if attr != nil { + attr.Atime = *lmt + attr.Mtime = *lmt + attr.Ctime = *lmt + } +} + +// func parseBlobTags(tags *container.BlobTags) map[string]string { + +// if tags == nil { +// return nil +// } + +// blobtags := make(map[string]string) +// for _, tag := range tags.BlobTagSet { +// if tag != nil { +// if tag.Key != nil { +// blobtags[*tag.Key] = *tag.Value +// } +// } +// } + +// return blobtags +// } diff --git a/component/azstorage/utils_test.go b/component/azstorage/utils_test.go index e72ad7ca1..b2b606fd0 100644 --- a/component/azstorage/utils_test.go +++ b/component/azstorage/utils_test.go @@ -77,45 +77,6 @@ type contentTypeVal struct { result string } -func (s *utilsTestSuite) TestPrefixPathRemoval() { - assert := assert.New(s.T()) - - type PrefixPath struct { - prefix string - path string - result string - } - - var inputs = []PrefixPath{ - {prefix: "", path: "abc.txt", result: "abc.txt"}, - {prefix: "", path: "ABC", result: "ABC"}, - {prefix: "", path: "ABC/DEF.txt", result: "ABC/DEF.txt"}, - {prefix: "", path: "ABC/DEF/1.txt", result: "ABC/DEF/1.txt"}, - - {prefix: "ABC", path: "ABC/DEF/1.txt", result: "DEF/1.txt"}, - {prefix: "ABC/", path: "ABC/DEF/1.txt", result: "DEF/1.txt"}, - {prefix: "ABC", path: "ABC/DEF", result: "DEF"}, - {prefix: "ABC/", path: "ABC/DEF", result: "DEF"}, - {prefix: "ABC/", path: "ABC/DEF/G/H/1.txt", result: "DEF/G/H/1.txt"}, - - {prefix: "ABC/DEF", path: "ABC/DEF/1.txt", result: "1.txt"}, - {prefix: "ABC/DEF/", path: "ABC/DEF/1.txt", result: "1.txt"}, - {prefix: "ABC/DEF", path: "ABC/DEF/A/B/c.txt", result: "A/B/c.txt"}, - {prefix: "ABC/DEF/", path: "ABC/DEF/A/B/c.txt", result: "A/B/c.txt"}, - - {prefix: "A/B/C/D/E", path: "A/B/C/D/E/F/G/H/I/j.txt", result: "F/G/H/I/j.txt"}, - {prefix: "A/B/C/D/E/", path: "A/B/C/D/E/F/G/H/I/j.txt", result: "F/G/H/I/j.txt"}, - } - - for _, i := range inputs { - s.Run(filepath.Join(i.prefix, i.path), func() { - output := split(i.prefix, i.path) - assert.EqualValues(i.result, output) - }) - } - -} - func (s *utilsTestSuite) TestGetContentType() { assert := assert.New(s.T()) var inputs = []contentTypeVal{ @@ -533,6 +494,33 @@ func (s *utilsTestSuite) TestRemoveLeadingSlashes() { } } +func (suite *utilsTestSuite) TestRemovePrefixPath() { + assert := assert.New(suite.T()) + + var inputs = []struct { + prefixPath string + path string + result string + }{ + {prefixPath: "", path: "abc.txt", result: "abc.txt"}, + {prefixPath: "", path: "ABC/DEF/abc.txt", result: "ABC/DEF/abc.txt"}, + {prefixPath: "ABC", path: "ABC/DEF/1.txt", result: "DEF/1.txt"}, + {prefixPath: "ABC/", path: "ABC/DEF/1.txt", result: "DEF/1.txt"}, + {prefixPath: "ABC/DEF", path: "ABC/DEF/1.txt", result: "1.txt"}, + {prefixPath: "ABC/DEF/", path: "ABC/DEF/1.txt", result: "1.txt"}, + {prefixPath: "ABC", path: "ABC/ABC.txt", result: "ABC.txt"}, + {prefixPath: "A/B/C/D/E/", path: "A/B/C/D/E/F/G/H/I/j.txt", result: "F/G/H/I/j.txt"}, + {prefixPath: "A/B/C/D/E/", path: "A/B/C/D/E/F/G/H/I/j.txt", result: "F/G/H/I/j.txt"}, + } + + for _, i := range inputs { + suite.Run(filepath.Join(i.prefixPath, i.path), func() { + output := removePrefixPath(i.prefixPath, i.path) + assert.EqualValues(i.result, output) + }) + } +} + func TestUtilsTestSuite(t *testing.T) { suite.Run(t, new(utilsTestSuite)) } diff --git a/component/block_cache/block_cache.go 
b/component/block_cache/block_cache.go index 4a9cb8bb2..858140b70 100644 --- a/component/block_cache/block_cache.go +++ b/component/block_cache/block_cache.go @@ -34,6 +34,7 @@ package block_cache import ( + "bytes" "container/list" "context" "encoding/base64" @@ -84,6 +85,7 @@ type BlockCache struct { maxDiskUsageHit bool // Flag to indicate if we have hit max disk usage noPrefetch bool // Flag to indicate if prefetch is disabled prefetchOnOpen bool // Start prefetching on file open call instead of waiting for first read + consistency bool // Flag to indicate if strong data consistency is enabled stream *Stream lazyWrite bool // Flag to indicate if lazy write is enabled fileCloseOpt sync.WaitGroup // Wait group to wait for all async close operations to complete @@ -99,6 +101,7 @@ type BlockCacheOptions struct { PrefetchCount uint32 `config:"prefetch" yaml:"prefetch,omitempty"` Workers uint32 `config:"parallelism" yaml:"parallelism,omitempty"` PrefetchOnOpen bool `config:"prefetch-on-open" yaml:"prefetch-on-open,omitempty"` + Consistency bool `config:"consistency" yaml:"consistency,omitempty"` } const ( @@ -135,7 +138,20 @@ func (bc *BlockCache) SetNextComponent(nc internal.Component) { func (bc *BlockCache) Start(ctx context.Context) error { log.Trace("BlockCache::Start : Starting component %s", bc.Name()) + bc.blockPool = NewBlockPool(bc.blockSize, bc.memSize) + if bc.blockPool == nil { + log.Err("BlockCache::Start : failed to init block pool") + return fmt.Errorf("config error in %s [failed to init block pool]", bc.Name()) + } + + bc.threadPool = newThreadPool(bc.workers, bc.download, bc.upload) + if bc.threadPool == nil { + log.Err("BlockCache::Start : failed to init thread pool") + return fmt.Errorf("config error in %s [failed to init thread pool]", bc.Name()) + } + // Start the thread pool and keep it ready for download + log.Debug("BlockCache::Start : Starting thread pool") bc.threadPool.Start() // If disk caching is enabled then start the disk eviction policy @@ -237,6 +253,8 @@ func (bc *BlockCache) Configure(_ bool) error { bc.diskTimeout = conf.DiskTimeout } + bc.consistency = conf.Consistency + bc.prefetchOnOpen = conf.PrefetchOnOpen bc.prefetch = uint32(math.Max((MIN_PREFETCH*2)+1, (float64)(2*runtime.NumCPU()))) bc.noPrefetch = false @@ -310,18 +328,6 @@ func (bc *BlockCache) Configure(_ bool) error { return fmt.Errorf("config error in %s [memory limit too low for configured prefetch]", bc.Name()) } - bc.blockPool = NewBlockPool(bc.blockSize, bc.memSize) - if bc.blockPool == nil { - log.Err("BlockCache::Configure : fail to init Block pool") - return fmt.Errorf("config error in %s [fail to init block pool]", bc.Name()) - } - - bc.threadPool = newThreadPool(bc.workers, bc.download, bc.upload) - if bc.threadPool == nil { - log.Err("BlockCache::Configure : fail to init thread pool") - return fmt.Errorf("config error in %s [fail to init thread pool]", bc.Name()) - } - if bc.tmpPath != "" { bc.diskPolicy, err = tlru.New(uint32((bc.diskSize)/bc.blockSize), bc.diskTimeout, bc.diskEvict, 60, bc.checkDiskUsage) if err != nil { @@ -330,8 +336,8 @@ func (bc *BlockCache) Configure(_ bool) error { } } - log.Crit("BlockCache::Configure : block size %v, mem size %v, worker %v, prefetch %v, disk path %v, max size %v, disk timeout %v, prefetch-on-open %t, maxDiskUsageHit %v, noPrefetch %v", - bc.blockSize, bc.memSize, bc.workers, bc.prefetch, bc.tmpPath, bc.diskSize, bc.diskTimeout, bc.prefetchOnOpen, bc.maxDiskUsageHit, bc.noPrefetch) + log.Crit("BlockCache::Configure : block size %v, mem 
size %v, worker %v, prefetch %v, disk path %v, max size %v, disk timeout %v, prefetch-on-open %t, maxDiskUsageHit %v, noPrefetch %v, consistency %v", + bc.blockSize, bc.memSize, bc.workers, bc.prefetch, bc.tmpPath, bc.diskSize, bc.diskTimeout, bc.prefetchOnOpen, bc.maxDiskUsageHit, bc.noPrefetch, bc.consistency) return nil } @@ -382,7 +388,7 @@ func (bc *BlockCache) CreateFile(options internal.CreateFileOptions) (*handlemap // OpenFile: Create a handle for the file user has requested to open func (bc *BlockCache) OpenFile(options internal.OpenFileOptions) (*handlemap.Handle, error) { - log.Trace("BlockCache::OpenFile : name=%s, flags=%d, mode=%s", options.Name, options.Flags, options.Mode) + log.Trace("BlockCache::OpenFile : name=%s, flags=%X, mode=%s", options.Name, options.Flags, options.Mode) attr, err := bc.NextComponent().GetAttr(internal.GetAttrOptions{Name: options.Name}) if err != nil { @@ -394,10 +400,14 @@ func (bc *BlockCache) OpenFile(options internal.OpenFileOptions) (*handlemap.Han handle.Mtime = attr.Mtime handle.Size = attr.Size + if attr.ETag != "" { + handle.SetValue("ETAG", attr.ETag) + } + log.Debug("BlockCache::OpenFile : Size of file handle.Size %v", handle.Size) bc.prepareHandleForBlockCache(handle) - if options.Flags&os.O_TRUNC != 0 || (options.Flags&os.O_WRONLY != 0 && options.Flags&os.O_APPEND == 0) { + if options.Flags&os.O_TRUNC != 0 { // If file is opened in truncate mode then we need to wipe out the data and consider the current file size as 0 log.Debug("BlockCache::OpenFile : Truncate %v to 0", options.Name) handle.Size = 0 @@ -982,35 +992,43 @@ func (bc *BlockCache) download(item *workItem) { _ = os.Remove(localPath) } else { var successfulRead bool = true - n, err := f.Read(item.block.data) + numberOfBytes, err := f.Read(item.block.data) if err != nil { log.Err("BlockCache::download : Failed to read data from disk cache %s [%s]", fileName, err.Error()) successfulRead = false _ = os.Remove(localPath) } - if n != int(bc.blockSize) && item.block.offset+uint64(n) != uint64(item.handle.Size) { - log.Err("BlockCache::download : Local data retrieved from disk size mismatch, Expected %v, OnDisk %v, fileSize %v", bc.getBlockSize(uint64(item.handle.Size), item.block), n, item.handle.Size) + if numberOfBytes != int(bc.blockSize) && item.block.offset+uint64(numberOfBytes) != uint64(item.handle.Size) { + log.Err("BlockCache::download : Local data retrieved from disk size mismatch, Expected %v, OnDisk %v, fileSize %v", bc.getBlockSize(uint64(item.handle.Size), item.block), numberOfBytes, item.handle.Size) successfulRead = false _ = os.Remove(localPath) } f.Close() - // We have read the data from disk so there is no need to go over network - // Just mark the block that download is complete + if successfulRead { - item.block.Ready(BlockStatusDownloaded) - return + // If user has enabled consistency check then compute the md5sum and match it in xattr + successfulRead = checkBlockConsistency(bc, item, numberOfBytes, localPath, fileName) + + // We have read the data from disk so there is no need to go over network + // Just mark the block that download is complete + if successfulRead { + item.block.Ready(BlockStatusDownloaded) + return + } } } } } + var etag string // If file does not exist then download the block from the container n, err := bc.NextComponent().ReadInBuffer(internal.ReadInBufferOptions{ Handle: item.handle, Offset: int64(item.block.offset), Data: item.block.data, + Etag: &etag, }) if item.failCnt > MAX_FAIL_CNT { @@ -1021,7 +1039,7 @@ func (bc *BlockCache) 
download(item *workItem) { return } - if err != nil { + if err != nil && err != io.EOF { // Fail to read the data so just reschedule this request log.Err("BlockCache::download : Failed to read %v=>%s from offset %v [%s]", item.handle.ID, item.handle.Path, item.block.id, err.Error()) item.failCnt++ @@ -1035,6 +1053,17 @@ func (bc *BlockCache) download(item *workItem) { return } + // Compare the ETAG value and fail download if blob has changed + if etag != "" { + etagVal, found := item.handle.GetValue("ETAG") + if found && etagVal != etag { + log.Err("BlockCache::download : Blob has changed for %v=>%s (index %v, offset %v)", item.handle.ID, item.handle.Path, item.block.id, item.block.offset) + item.block.Failed() + item.block.Ready(BlockStatusDownloadFailed) + return + } + } + if bc.tmpPath != "" { err := os.MkdirAll(filepath.Dir(localPath), 0777) if err != nil { @@ -1053,6 +1082,15 @@ func (bc *BlockCache) download(item *workItem) { f.Close() bc.diskPolicy.Refresh(diskNode.(*list.Element)) + + // If user has enabled consistency check then compute the md5sum and save it in xattr + if bc.consistency { + hash := common.GetCRC64(item.block.data, n) + err = syscall.Setxattr(localPath, "user.md5sum", hash, 0) + if err != nil { + log.Err("BlockCache::download : Failed to set md5sum for file %s [%v]", localPath, err.Error()) + } + } } } @@ -1060,6 +1098,30 @@ func (bc *BlockCache) download(item *workItem) { item.block.Ready(BlockStatusDownloaded) } +func checkBlockConsistency(blockCache *BlockCache, item *workItem, numberOfBytes int, localPath, fileName string) bool { + if !blockCache.consistency { + return true + } + // Compute the CRC64 checksum of the data read from disk + actualHash := common.GetCRC64(item.block.data, numberOfBytes) + + // Retrieve the checksum stored in the user.md5sum xattr + xattrHash := make([]byte, 8) + _, err := syscall.Getxattr(localPath, "user.md5sum", xattrHash) + if err != nil { + log.Err("BlockCache::download : Failed to get md5sum for file %s [%v]", fileName, err.Error()) + } else { + // Compare checksums + if !bytes.Equal(actualHash, xattrHash) { + log.Err("BlockCache::download : MD5 checksum mismatch for file %s, expected %v, got %v", fileName, xattrHash, actualHash) + _ = os.Remove(localPath) + return false + } + } + + return true +} + // WriteFile: Write to the local file func (bc *BlockCache) WriteFile(options internal.WriteFileOptions) (int, error) { // log.Debug("BlockCache::WriteFile : Writing %v bytes from %s", len(options.Data), options.Handle.Path) @@ -1450,6 +1512,15 @@ func (bc *BlockCache) upload(item *workItem) { } else { bc.diskPolicy.Refresh(diskNode.(*list.Element)) } + + // If user has enabled consistency check then compute the md5sum and save it in xattr + if bc.consistency { + hash := common.GetCRC64(item.block.data, int(blockSize)) + err = syscall.Setxattr(localPath, "user.md5sum", hash, 0) + if err != nil { + log.Err("BlockCache::upload : Failed to set md5sum for file %s [%v]", localPath, err.Error()) + } + } } } @@ -1502,12 +1573,17 @@ func (bc *BlockCache) commitBlocks(handle *handlemap.Handle) error { log.Debug("BlockCache::commitBlocks : Committing blocks for %s", handle.Path) // Commit the block list now - err = bc.NextComponent().CommitData(internal.CommitDataOptions{Name: handle.Path, List: blockIDList, BlockSize: bc.blockSize}) + var newEtag string = "" + err = bc.NextComponent().CommitData(internal.CommitDataOptions{Name: handle.Path, List: blockIDList, BlockSize: bc.blockSize, NewETag: &newEtag}) if err != nil { log.Err("BlockCache::commitBlocks : Failed to commit 
blocks for %s [%s]", handle.Path, err.Error()) return err } + if newEtag != "" { + handle.SetValue("ETAG", newEtag) + } + // set all the blocks as committed list, _ := handle.GetValue("blockList") listMap := list.(map[int64]*blockInfo) @@ -1758,6 +1834,36 @@ func (bc *BlockCache) SyncFile(options internal.SyncFileOptions) error { return nil } +func (bc *BlockCache) StatFs() (*syscall.Statfs_t, bool, error) { + var maxCacheSize uint64 + if bc.diskSize > 0 { + maxCacheSize = bc.diskSize + } else { + maxCacheSize = bc.memSize + } + + if maxCacheSize == 0 { + return nil, false, nil + } + + usage, _ := common.GetUsage(bc.tmpPath) + usage = usage * float64(_1MB) + + available := (float64)(maxCacheSize) - usage + statfs := &syscall.Statfs_t{} + err := syscall.Statfs("/", statfs) + if err != nil { + log.Debug("BlockCache::StatFs : statfs err [%s].", err.Error()) + return nil, false, err + } + statfs.Frsize = int64(bc.blockSize) + statfs.Blocks = uint64(maxCacheSize) / uint64(bc.blockSize) + statfs.Bavail = uint64(math.Max(0, available)) / uint64(bc.blockSize) + statfs.Bfree = statfs.Bavail + + return statfs, true, nil +} + // ------------------------- Factory ------------------------------------------- // Pipeline will call this method to create your object, initialize your variables here // << DO NOT DELETE ANY AUTO GENERATED CODE HERE >> @@ -1796,4 +1902,7 @@ func init() { blockCachePrefetchOnOpen := config.AddBoolFlag("block-cache-prefetch-on-open", false, "Start prefetching on open or wait for first read.") config.BindPFlag(compName+".prefetch-on-open", blockCachePrefetchOnOpen) + + strongConsistency := config.AddBoolFlag("block-cache-strong-consistency", false, "Enable strong data consistency for block cache.") + config.BindPFlag(compName+".consistency", strongConsistency) } diff --git a/component/block_cache/block_cache_test.go b/component/block_cache/block_cache_test.go index 0d31843e1..14a41e875 100644 --- a/component/block_cache/block_cache_test.go +++ b/component/block_cache/block_cache_test.go @@ -47,6 +47,7 @@ import ( "path/filepath" "strconv" "strings" + "syscall" "testing" "time" @@ -236,6 +237,58 @@ func (suite *blockCacheTestSuite) TestFreeDiskSpace() { suite.assert.LessOrEqual(difference, tolerance) } +func (suite *blockCacheTestSuite) TestStatfsMemory() { + emptyConfig := "read-only: true\n\nblock_cache:\n block-size-mb: 16\n" + tobj, err := setupPipeline(emptyConfig) + defer tobj.cleanupPipeline() + + suite.assert.Nil(err) + suite.assert.Equal(tobj.blockCache.Name(), "block_cache") + cmd := exec.Command("bash", "-c", "free -b | grep Mem | awk '{print $4}'") + var out bytes.Buffer + cmd.Stdout = &out + err = cmd.Run() + suite.assert.Nil(err) + free, err := strconv.Atoi(strings.TrimSpace(out.String())) + suite.assert.Nil(err) + expected := uint64(0.8 * float64(free)) + stat, ret, err := tobj.blockCache.StatFs() + suite.assert.Equal(ret, true) + suite.assert.Equal(err, nil) + suite.assert.NotEqual(stat, &syscall.Statfs_t{}) + actual := tobj.blockCache.memSize + difference := math.Abs(float64(actual) - float64(expected)) + tolerance := 0.10 * float64(math.Max(float64(actual), float64(expected))) + suite.assert.LessOrEqual(difference, tolerance) +} + +func (suite *blockCacheTestSuite) TestStatfsDisk() { + disk_cache_path := getFakeStoragePath("fake_storage") + config := fmt.Sprintf("read-only: true\n\nblock_cache:\n block-size-mb: 1\n path: %s", disk_cache_path) + tobj, err := setupPipeline(config) + defer tobj.cleanupPipeline() + + suite.assert.Nil(err) + 
suite.assert.Equal(tobj.blockCache.Name(), "block_cache") + + cmd := exec.Command("bash", "-c", fmt.Sprintf("df -B1 %s | awk 'NR==2{print $4}'", disk_cache_path)) + var out bytes.Buffer + cmd.Stdout = &out + err = cmd.Run() + suite.assert.Nil(err) + freeDisk, err := strconv.Atoi(strings.TrimSpace(out.String())) + suite.assert.Nil(err) + expected := uint64(0.8 * float64(freeDisk)) + stat, ret, err := tobj.blockCache.StatFs() + suite.assert.Equal(ret, true) + suite.assert.Equal(err, nil) + suite.assert.NotEqual(stat, &syscall.Statfs_t{}) + actual := tobj.blockCache.diskSize + difference := math.Abs(float64(actual) - float64(expected)) + tolerance := 0.10 * float64(math.Max(float64(actual), float64(expected))) + suite.assert.LessOrEqual(difference, tolerance) +} + func (suite *blockCacheTestSuite) TestInvalidPrefetchCount() { cfg := "read-only: true\n\nblock_cache:\n block-size-mb: 16\n mem-size-mb: 500\n prefetch: 8\n parallelism: 10\n path: abcd\n disk-size-mb: 100\n disk-timeout-sec: 5" tobj, err := setupPipeline(cfg) @@ -277,12 +330,12 @@ func (suite *blockCacheTestSuite) TestSomeInvalidConfigs() { cfg := "read-only: true\n\nblock_cache:\n block-size-mb: 8\n mem-size-mb: 800\n prefetch: 12\n parallelism: 0\n" _, err := setupPipeline(cfg) suite.assert.NotNil(err) - suite.assert.Contains(err.Error(), "fail to init thread pool") + suite.assert.Contains(err.Error(), "failed to init thread pool") cfg = "read-only: true\n\nblock_cache:\n block-size-mb: 1024000\n mem-size-mb: 20240000\n prefetch: 12\n parallelism: 1\n" _, err = setupPipeline(cfg) suite.assert.NotNil(err) - suite.assert.Contains(err.Error(), "fail to init block pool") + suite.assert.Contains(err.Error(), "failed to init block pool") cfg = "read-only: true\n\nblock_cache:\n block-size-mb: 8\n mem-size-mb: 800\n prefetch: 12\n parallelism: 5\n path: ./bctemp \n disk-size-mb: 100\n disk-timeout-sec: 0" _, err = setupPipeline(cfg) @@ -2625,6 +2678,158 @@ func (suite *blockCacheTestSuite) TestZZZZZStreamToBlockCacheConfig() { } } +func (suite *blockCacheTestSuite) TestSizeOfFileInOpen() { + // Write-back cache is turned on by default while mounting. 
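+ // An O_WRONLY open must no longer truncate the file; only O_TRUNC should reset the size to 0, which is what the check helper below verifies.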
+ config := "block_cache:\n block-size-mb: 1\n mem-size-mb: 20\n prefetch: 12\n parallelism: 1" + tobj, err := setupPipeline(config) + suite.assert.Nil(err) + defer tobj.cleanupPipeline() + + path := getTestFileName(suite.T().Name()) + storagePath := filepath.Join(tobj.fake_storage_path, path) + localPath := filepath.Join(tobj.disk_cache_path, path) + + // ------------------------------------------------------------------ + // Create a local file + fh, err := os.Create(localPath) + suite.assert.Nil(err) + + // write 1MB data at offset 0 + n, err := fh.WriteAt(dataBuff[:_1MB], 0) + suite.assert.Nil(err) + suite.assert.Equal(n, int(_1MB)) + + err = fh.Close() + suite.assert.Nil(err) + // ------------------------------------------------------------------ + // Create a file using Mountpoint + options := internal.CreateFileOptions{Name: path, Mode: 0777} + h, err := tobj.blockCache.CreateFile(options) + suite.assert.Nil(err) + suite.assert.NotNil(h) + suite.assert.Equal(h.Size, int64(0)) + suite.assert.False(h.Dirty()) + + // write 1MB data at offset 0 + n, err = tobj.blockCache.WriteFile(internal.WriteFileOptions{Handle: h, Offset: 0, Data: dataBuff[:_1MB]}) + suite.assert.Nil(err) + suite.assert.Equal(n, int(_1MB)) + suite.assert.True(h.Dirty()) + + err = tobj.blockCache.CloseFile(internal.CloseFileOptions{Handle: h}) + suite.assert.Nil(err) + //--------------------------------------------------------------------- + + //Open and close the file using the given flag in local and mountpoint and + // check the size is same or not. + check := func(flag int) int { + lfh, err := os.OpenFile(localPath, flag, 0666) + suite.assert.Nil(err) + suite.assert.NotNil(lfh) + err = lfh.Close() + suite.assert.Nil(err) + + openFileOptions := internal.OpenFileOptions{Name: path, Flags: flag, Mode: 0777} + rfh, err := tobj.blockCache.OpenFile(openFileOptions) + suite.assert.Nil(err) + err = tobj.blockCache.CloseFile(internal.CloseFileOptions{Handle: rfh}) + suite.assert.Nil(err) + + statInfoLocal, err := os.Stat(localPath) + suite.assert.Nil(err) + sizeInLocal := statInfoLocal.Size() + + statInfoMount, err := os.Stat(storagePath) + suite.assert.Nil(err) + sizeInMount := statInfoMount.Size() + suite.assert.Equal(sizeInLocal, sizeInMount) + return int(sizeInLocal) + } + size := check(os.O_WRONLY) // size of the file would be 1MB + suite.assert.Equal(size, int(_1MB)) + size = check(os.O_TRUNC) // size of the file would be zero here. 
+ suite.assert.Equal(size, int(0)) +} + +func (suite *blockCacheTestSuite) TestStrongConsistency() { + tobj, err := setupPipeline("") + defer tobj.cleanupPipeline() + + suite.assert.Nil(err) + suite.assert.NotNil(tobj.blockCache) + + tobj.blockCache.consistency = true + + path := getTestFileName(suite.T().Name()) + options := internal.CreateFileOptions{Name: path, Mode: 0777} + h, err := tobj.blockCache.CreateFile(options) + suite.assert.Nil(err) + suite.assert.NotNil(h) + suite.assert.Equal(h.Size, int64(0)) + suite.assert.False(h.Dirty()) + + storagePath := filepath.Join(tobj.fake_storage_path, path) + fs, err := os.Stat(storagePath) + suite.assert.Nil(err) + suite.assert.Equal(fs.Size(), int64(0)) + //Generate random size of file in bytes less than 2MB + + size := rand.Intn(2097152) + data := make([]byte, size) + + n, err := tobj.blockCache.WriteFile(internal.WriteFileOptions{Handle: h, Offset: 0, Data: data}) // Write data to file + suite.assert.Nil(err) + suite.assert.Equal(n, size) + suite.assert.Equal(h.Size, int64(size)) + + err = tobj.blockCache.CloseFile(internal.CloseFileOptions{Handle: h}) + suite.assert.Nil(err) + suite.assert.Nil(h.Buffers.Cooked) + suite.assert.Nil(h.Buffers.Cooking) + + localPath := filepath.Join(tobj.disk_cache_path, path+"::0") + + xattrMd5sumOrg := make([]byte, 32) + _, err = syscall.Getxattr(localPath, "user.md5sum", xattrMd5sumOrg) + suite.assert.Nil(err) + + h, err = tobj.blockCache.OpenFile(internal.OpenFileOptions{Name: path, Flags: os.O_RDWR}) + suite.assert.Nil(err) + suite.assert.NotNil(h) + _, _ = tobj.blockCache.ReadInBuffer(internal.ReadInBufferOptions{Handle: h, Offset: 0, Data: data}) + err = tobj.blockCache.CloseFile(internal.CloseFileOptions{Handle: h}) + suite.assert.Nil(err) + suite.assert.Nil(h.Buffers.Cooked) + suite.assert.Nil(h.Buffers.Cooking) + + xattrMd5sumRead := make([]byte, 32) + _, err = syscall.Getxattr(localPath, "user.md5sum", xattrMd5sumRead) + suite.assert.Nil(err) + suite.assert.EqualValues(xattrMd5sumOrg, xattrMd5sumRead) + + err = syscall.Setxattr(localPath, "user.md5sum", []byte("000"), 0) + suite.assert.Nil(err) + + xattrMd5sum1 := make([]byte, 32) + _, err = syscall.Getxattr(localPath, "user.md5sum", xattrMd5sum1) + suite.assert.Nil(err) + + h, err = tobj.blockCache.OpenFile(internal.OpenFileOptions{Name: path, Flags: os.O_RDWR}) + suite.assert.Nil(err) + suite.assert.NotNil(h) + _, _ = tobj.blockCache.ReadInBuffer(internal.ReadInBufferOptions{Handle: h, Offset: 0, Data: data}) + err = tobj.blockCache.CloseFile(internal.CloseFileOptions{Handle: h}) + suite.assert.Nil(err) + suite.assert.Nil(h.Buffers.Cooked) + suite.assert.Nil(h.Buffers.Cooking) + + xattrMd5sum2 := make([]byte, 32) + _, err = syscall.Getxattr(localPath, "user.md5sum", xattrMd5sum2) + suite.assert.Nil(err) + + suite.assert.NotEqualValues(xattrMd5sum1, xattrMd5sum2) +} + // In order for 'go test' to run this suite, we need to create // a normal test function and pass our suite to suite.Run func TestBlockCacheTestSuite(t *testing.T) { diff --git a/component/file_cache/file_cache.go b/component/file_cache/file_cache.go index 44b3186a2..6b2c8d571 100644 --- a/component/file_cache/file_cache.go +++ b/component/file_cache/file_cache.go @@ -298,9 +298,9 @@ func (c *FileCache) Configure(_ bool) error { err = syscall.Statfs(c.tmpPath, &stat) if err != nil { log.Err("FileCache::Configure : config error %s [%s]. 
Assigning a default value of 4GB or if any value is assigned to .disk-size-mb in config.", c.Name(), err.Error()) - c.maxCacheSize = 4192 * MB + c.maxCacheSize = 4192 } else { - c.maxCacheSize = 0.8 * float64(stat.Bavail) * float64(stat.Bsize) + c.maxCacheSize = (0.8 * float64(stat.Bavail) * float64(stat.Bsize)) / (MB) } if config.IsSet(compName+".max-size-mb") && conf.MaxSizeMB != 0 { diff --git a/component/file_cache/file_cache_test.go b/component/file_cache/file_cache_test.go index e6ae50409..dd7067a59 100644 --- a/component/file_cache/file_cache_test.go +++ b/component/file_cache/file_cache_test.go @@ -208,7 +208,7 @@ func (suite *fileCacheTestSuite) TestDefaultCacheSize() { freeDisk, err := strconv.Atoi(strings.TrimSpace(out.String())) suite.assert.Nil(err) expected := uint64(0.8 * float64(freeDisk)) - actual := suite.fileCache.maxCacheSize + actual := suite.fileCache.maxCacheSize * MB difference := math.Abs(float64(actual) - float64(expected)) tolerance := 0.10 * float64(math.Max(float64(actual), float64(expected))) suite.assert.LessOrEqual(difference, tolerance, "mssg:", actual, expected) diff --git a/component/libfuse/libfuse_handler.go b/component/libfuse/libfuse_handler.go index 23f8d70ac..be1d38532 100644 --- a/component/libfuse/libfuse_handler.go +++ b/component/libfuse/libfuse_handler.go @@ -979,7 +979,10 @@ func libfuse_rename(src *C.char, dst *C.char, flags C.uint) C.int { } } - err := fuseFS.NextComponent().RenameDir(internal.RenameDirOptions{Src: srcPath, Dst: dstPath}) + err := fuseFS.NextComponent().RenameDir(internal.RenameDirOptions{ + Src: srcPath, + Dst: dstPath, + }) if err != nil { log.Err("Libfuse::libfuse_rename : error renaming directory %s -> %s [%s]", srcPath, dstPath, err.Error()) return -C.EIO @@ -989,7 +992,12 @@ func libfuse_rename(src *C.char, dst *C.char, flags C.uint) C.int { libfuseStatsCollector.UpdateStats(stats_manager.Increment, renameDir, (int64)(1)) } else { - err := fuseFS.NextComponent().RenameFile(internal.RenameFileOptions{Src: srcPath, Dst: dstPath}) + err := fuseFS.NextComponent().RenameFile(internal.RenameFileOptions{ + Src: srcPath, + Dst: dstPath, + SrcAttr: srcAttr, + DstAttr: dstAttr, + }) if err != nil { log.Err("Libfuse::libfuse_rename : error renaming file %s -> %s [%s]", srcPath, dstPath, err.Error()) return -C.EIO diff --git a/component/loopback/loopback_fs.go b/component/loopback/loopback_fs.go index eb48f53b0..3f8bb38be 100644 --- a/component/loopback/loopback_fs.go +++ b/component/loopback/loopback_fs.go @@ -478,6 +478,13 @@ func (lfs *LoopbackFS) CommitData(options internal.CommitDataOptions) error { return err } + if len(options.List) == 0 { + err = blob.Truncate(0) + if err != nil { + return err + } + } + for idx, id := range options.List { path := fmt.Sprintf("%s_%s", filepath.Join(lfs.path, options.Name), strings.ReplaceAll(id, "/", "_")) info, err := os.Lstat(path) diff --git a/component/loopback/loopback_fs_test.go b/component/loopback/loopback_fs_test.go index f3843dae9..cd8eb6b90 100644 --- a/component/loopback/loopback_fs_test.go +++ b/component/loopback/loopback_fs_test.go @@ -309,6 +309,30 @@ func (suite *LoopbackFSTestSuite) TestStageAndCommitData() { assert.Nil(err) } +// This test is for opening the file in O_TRUNC on the existing file +// must result in resetting the filesize to 0 +func (suite *LoopbackFSTestSuite) TestCommitNilDataToExistingFile() { + defer suite.cleanupTest() + assert := assert.New(suite.T()) + + lfs := &LoopbackFS{} + + lfs.path = common.ExpandPath("~/blocklfstest") + err := 
os.MkdirAll(lfs.path, os.FileMode(0777)) + assert.Nil(err) + defer os.RemoveAll(lfs.path) + Filepath := filepath.Join(lfs.path, "testFile") + os.WriteFile(Filepath, []byte("hello"), 0777) + + blockList := []string{} + err = lfs.CommitData(internal.CommitDataOptions{Name: "testFile", List: blockList}) + assert.Nil(err) + + info, err := os.Stat(Filepath) + assert.Nil(err) + assert.Equal(info.Size(), int64(0)) +} + func TestLoopbackFSTestSuite(t *testing.T) { suite.Run(t, new(LoopbackFSTestSuite)) } diff --git a/go.mod b/go.mod index d9a7f3bb9..3cf3bdb78 100755 --- a/go.mod +++ b/go.mod @@ -1,14 +1,12 @@ module github.com/Azure/azure-storage-fuse/v2 -go 1.22.0 - -toolchain go1.23.1 +go 1.22.7 require ( - github.com/Azure/azure-sdk-for-go/sdk/azcore v1.16.0 - github.com/Azure/azure-sdk-for-go/sdk/azidentity v1.8.0 - github.com/Azure/azure-sdk-for-go/sdk/storage/azblob v1.5.0 - github.com/Azure/azure-sdk-for-go/sdk/storage/azdatalake v1.3.0 + github.com/Azure/azure-sdk-for-go/sdk/azcore v1.17.0 + github.com/Azure/azure-sdk-for-go/sdk/azidentity v1.8.1 + github.com/Azure/azure-sdk-for-go/sdk/storage/azblob v1.6.0 + github.com/Azure/azure-sdk-for-go/sdk/storage/azdatalake v1.4.0-beta.1 github.com/JeffreyRichter/enum v0.0.0-20180725232043-2567042f9cda github.com/fsnotify/fsnotify v1.8.0 github.com/golang/mock v1.6.0 @@ -21,6 +19,7 @@ require ( github.com/spf13/pflag v1.0.5 github.com/spf13/viper v1.19.0 github.com/stretchr/testify v1.10.0 + github.com/vibhansa-msft/blobfilter v0.0.0-20250115104552-d9d40722be3e github.com/vibhansa-msft/tlru v0.0.0-20240410102558-9e708419e21f go.uber.org/atomic v1.11.0 gopkg.in/ini.v1 v1.67.0 @@ -44,17 +43,17 @@ require ( github.com/pkg/browser v0.0.0-20240102092130-5ac0b6a4141c // indirect github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 // indirect github.com/russross/blackfriday/v2 v2.1.0 // indirect - github.com/sagikazarmark/locafero v0.6.0 // indirect + github.com/sagikazarmark/locafero v0.7.0 // indirect github.com/sagikazarmark/slog-shim v0.1.0 // indirect github.com/sourcegraph/conc v0.3.0 // indirect - github.com/spf13/afero v1.11.0 // indirect + github.com/spf13/afero v1.12.0 // indirect github.com/spf13/cast v1.7.1 // indirect github.com/subosito/gotenv v1.6.0 // indirect go.uber.org/multierr v1.11.0 // indirect - golang.org/x/crypto v0.31.0 // indirect - golang.org/x/exp v0.0.0-20241217172543-b2144cdd0a67 // indirect - golang.org/x/net v0.33.0 // indirect - golang.org/x/sys v0.28.0 // indirect + golang.org/x/crypto v0.32.0 // indirect + golang.org/x/exp v0.0.0-20250106191152-7588d65b2ba8 // indirect + golang.org/x/net v0.34.0 // indirect + golang.org/x/sys v0.29.0 // indirect golang.org/x/text v0.21.0 // indirect ) diff --git a/go.sum b/go.sum index bf59ac008..6f8084133 100644 --- a/go.sum +++ b/go.sum @@ -1,17 +1,17 @@ -github.com/Azure/azure-sdk-for-go/sdk/azcore v1.16.0 h1:JZg6HRh6W6U4OLl6lk7BZ7BLisIzM9dG1R50zUk9C/M= -github.com/Azure/azure-sdk-for-go/sdk/azcore v1.16.0/go.mod h1:YL1xnZ6QejvQHWJrX/AvhFl4WW4rqHVoKspWNVwFk0M= -github.com/Azure/azure-sdk-for-go/sdk/azidentity v1.8.0 h1:B/dfvscEQtew9dVuoxqxrUKKv8Ih2f55PydknDamU+g= -github.com/Azure/azure-sdk-for-go/sdk/azidentity v1.8.0/go.mod h1:fiPSssYvltE08HJchL04dOy+RD4hgrjph0cwGGMntdI= -github.com/Azure/azure-sdk-for-go/sdk/azidentity/cache v0.3.0 h1:+m0M/LFxN43KvULkDNfdXOgrjtg6UYJPFBJyuEcRCAw= -github.com/Azure/azure-sdk-for-go/sdk/azidentity/cache v0.3.0/go.mod h1:PwOyop78lveYMRs6oCxjiVyBdyCgIYH6XHIVZO9/SFQ= +github.com/Azure/azure-sdk-for-go/sdk/azcore v1.17.0 
h1:g0EZJwz7xkXQiZAI5xi9f3WWFYBlX1CPTrR+NDToRkQ= +github.com/Azure/azure-sdk-for-go/sdk/azcore v1.17.0/go.mod h1:XCW7KnZet0Opnr7HccfUw1PLc4CjHqpcaxW8DHklNkQ= +github.com/Azure/azure-sdk-for-go/sdk/azidentity v1.8.1 h1:1mvYtZfWQAnwNah/C+Z+Jb9rQH95LPE2vlmMuWAHJk8= +github.com/Azure/azure-sdk-for-go/sdk/azidentity v1.8.1/go.mod h1:75I/mXtme1JyWFtz8GocPHVFyH421IBoZErnO16dd0k= +github.com/Azure/azure-sdk-for-go/sdk/azidentity/cache v0.3.1 h1:Bk5uOhSAenHyR5P61D/NzeQCv+4fEVV8mOkJ82NqpWw= +github.com/Azure/azure-sdk-for-go/sdk/azidentity/cache v0.3.1/go.mod h1:QZ4pw3or1WPmRBxf0cHd1tknzrT54WPBOQoGutCPvSU= github.com/Azure/azure-sdk-for-go/sdk/internal v1.10.0 h1:ywEEhmNahHBihViHepv3xPBn1663uRv2t2q/ESv9seY= github.com/Azure/azure-sdk-for-go/sdk/internal v1.10.0/go.mod h1:iZDifYGJTIgIIkYRNWPENUnqx6bJ2xnSDFI2tjwZNuY= github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/storage/armstorage v1.6.0 h1:PiSrjRPpkQNjrM8H0WwKMnZUdu1RGMtd/LdGKUrOo+c= github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/storage/armstorage v1.6.0/go.mod h1:oDrbWx4ewMylP7xHivfgixbfGBT6APAwsSoHRKotnIc= -github.com/Azure/azure-sdk-for-go/sdk/storage/azblob v1.5.0 h1:mlmW46Q0B79I+Aj4azKC6xDMFN9a9SyZWESlGWYXbFs= -github.com/Azure/azure-sdk-for-go/sdk/storage/azblob v1.5.0/go.mod h1:PXe2h+LKcWTX9afWdZoHyODqR4fBa5boUM/8uJfZ0Jo= -github.com/Azure/azure-sdk-for-go/sdk/storage/azdatalake v1.3.0 h1:K0iyzgmfcq5zLxnD0kndh2G7kejTUZ5xO41IHYGOYVM= -github.com/Azure/azure-sdk-for-go/sdk/storage/azdatalake v1.3.0/go.mod h1:CgYxIvUeJo6+7LdnaArwd1Mpk02d9ATikuJviLrxU5E= +github.com/Azure/azure-sdk-for-go/sdk/storage/azblob v1.6.0 h1:UXT0o77lXQrikd1kgwIPQOUect7EoR/+sbP4wQKdzxM= +github.com/Azure/azure-sdk-for-go/sdk/storage/azblob v1.6.0/go.mod h1:cTvi54pg19DoT07ekoeMgE/taAwNtCShVeZqA+Iv2xI= +github.com/Azure/azure-sdk-for-go/sdk/storage/azdatalake v1.4.0-beta.1 h1:0yYjWwH2PzbU61ogcXZbYJWQs+WleiB42bj+wJMy7zg= +github.com/Azure/azure-sdk-for-go/sdk/storage/azdatalake v1.4.0-beta.1/go.mod h1:8wNWnf3Kk8c9moptloY3YxGAFNOMqu1JMtJbQ33rzVw= github.com/AzureAD/microsoft-authentication-extensions-for-go/cache v0.1.1 h1:WJTmL004Abzc5wDB5VtZG2PJk5ndYDgVacGqfirKxjM= github.com/AzureAD/microsoft-authentication-extensions-for-go/cache v0.1.1/go.mod h1:tCcJZ0uHAmvjsVYzEFivsRTN00oz5BEsRgQHu5JZ9WE= github.com/AzureAD/microsoft-authentication-library-for-go v1.3.2 h1:kYRSnvJju5gYVyhkij+RTJ/VR6QIUaCfWeaFm2ycsjQ= @@ -73,22 +73,22 @@ github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 h1:Jamvg5psRI github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4= github.com/radovskyb/watcher v1.0.7 h1:AYePLih6dpmS32vlHfhCeli8127LzkIgwJGcwwe8tUE= github.com/radovskyb/watcher v1.0.7/go.mod h1:78okwvY5wPdzcb1UYnip1pvrZNIVEIh/Cm+ZuvsUYIg= -github.com/redis/go-redis/v9 v9.6.1 h1:HHDteefn6ZkTtY5fGUE8tj8uy85AHk6zP7CpzIAM0y4= -github.com/redis/go-redis/v9 v9.6.1/go.mod h1:0C0c6ycQsdpVNQpxb1njEQIqkx5UcsM8FJCQLgE9+RA= +github.com/redis/go-redis/v9 v9.7.0 h1:HhLSs+B6O021gwzl+locl0zEDnyNkxMtf/Z3NNBMa9E= +github.com/redis/go-redis/v9 v9.7.0/go.mod h1:f6zhXITC7JUJIlPEiBOTXxJgPLdZcA93GewI7inzyWw= github.com/rogpeppe/go-internal v1.12.0 h1:exVL4IDcn6na9z1rAb56Vxr+CgyK3nn3O+epU5NdKM8= github.com/rogpeppe/go-internal v1.12.0/go.mod h1:E+RYuTGaKKdloAfM02xzb0FW3Paa99yedzYV+kq4uf4= github.com/russross/blackfriday/v2 v2.1.0 h1:JIOH55/0cWyOuilr9/qlrm0BSXldqnqwMsf35Ld67mk= github.com/russross/blackfriday/v2 v2.1.0/go.mod h1:+Rmxgy9KzJVeS9/2gXHxylqXiyQDYRxCVz55jmeOWTM= -github.com/sagikazarmark/locafero v0.6.0 
h1:ON7AQg37yzcRPU69mt7gwhFEBwxI6P9T4Qu3N51bwOk= -github.com/sagikazarmark/locafero v0.6.0/go.mod h1:77OmuIc6VTraTXKXIs/uvUxKGUXjE1GbemJYHqdNjX0= +github.com/sagikazarmark/locafero v0.7.0 h1:5MqpDsTGNDhY8sGp0Aowyf0qKsPrhewaLSsFaodPcyo= +github.com/sagikazarmark/locafero v0.7.0/go.mod h1:2za3Cg5rMaTMoG/2Ulr9AwtFaIppKXTRYnozin4aB5k= github.com/sagikazarmark/slog-shim v0.1.0 h1:diDBnUNK9N/354PgrxMywXnAwEr1QZcOr6gto+ugjYE= github.com/sagikazarmark/slog-shim v0.1.0/go.mod h1:SrcSrq8aKtyuqEI1uvTDTK1arOWRIczQRv+GVI1AkeQ= github.com/sevlyar/go-daemon v0.1.6 h1:EUh1MDjEM4BI109Jign0EaknA2izkOyi0LV3ro3QQGs= github.com/sevlyar/go-daemon v0.1.6/go.mod h1:6dJpPatBT9eUwM5VCw9Bt6CdX9Tk6UWvhW3MebLDRKE= github.com/sourcegraph/conc v0.3.0 h1:OQTbbt6P72L20UqAkXXuLOj79LfEanQ+YQFNpLA9ySo= github.com/sourcegraph/conc v0.3.0/go.mod h1:Sdozi7LEKbFPqYX2/J+iBAM6HpqSLTASQIKqDmF7Mt0= -github.com/spf13/afero v1.11.0 h1:WJQKhtpdm3v2IzqG8VMqrr6Rf3UYpEF239Jy9wNepM8= -github.com/spf13/afero v1.11.0/go.mod h1:GH9Y3pIexgf1MTIWtNGyogA5MwRIDXGUr+hbWNoBjkY= +github.com/spf13/afero v1.12.0 h1:UcOPyRBYczmFn6yvphxkn9ZEOY65cpwGKb5mL36mrqs= +github.com/spf13/afero v1.12.0/go.mod h1:ZTlWwG4/ahT8W7T0WQ5uYmjI9duaLQGy3Q2OAl4sk/4= github.com/spf13/cast v1.7.1 h1:cuNEagBQEHWN1FnbGEjCXL2szYEXqfJPbP2HNUaca9Y= github.com/spf13/cast v1.7.1/go.mod h1:ancEpBxwJDODSW/UG4rDrAqiKolqNNh2DX3mk86cAdo= github.com/spf13/pflag v1.0.5 h1:iy+VFUOCP1a+8yFto/drg2CJ5u0yRoB7fZw3DKv/JXA= @@ -99,6 +99,8 @@ github.com/stretchr/testify v1.10.0 h1:Xv5erBjTwe/5IxqUQTdXv5kgmIvbHo3QQyRwhJsOf github.com/stretchr/testify v1.10.0/go.mod h1:r2ic/lqez/lEtzL7wO/rwa5dbSLXVDPFyf8C91i36aY= github.com/subosito/gotenv v1.6.0 h1:9NlTDc1FTs4qu0DDq7AEtTPNw6SVm7uBMsUCUjABIf8= github.com/subosito/gotenv v1.6.0/go.mod h1:Dk4QP5c2W3ibzajGcXpNraDfq2IrhjMIvMSWPKKo0FU= +github.com/vibhansa-msft/blobfilter v0.0.0-20250115104552-d9d40722be3e h1:cwReArp3IJQj86DIFNcYO/0/RZjTwDzgFO7LqZ1ZK0o= +github.com/vibhansa-msft/blobfilter v0.0.0-20250115104552-d9d40722be3e/go.mod h1:7qlJNGhIwS5VRsa7FzXgw7GhdKuVebVwbIRJduXl8Ms= github.com/vibhansa-msft/tlru v0.0.0-20240410102558-9e708419e21f h1:KmQFbsVFi45PtwEWIXugkW0X9VSJ+rZtee/WCPG5unc= github.com/vibhansa-msft/tlru v0.0.0-20240410102558-9e708419e21f/go.mod h1:7G2C64UXEWNr8oUzspzcrymxCjD9fKAKTGbL7zO2GW8= github.com/yuin/goldmark v1.3.5/go.mod h1:mwnBkeHKe2W/ZEtQ+71ViKU8L12m81fl3OWwC1Zlc8k= @@ -108,16 +110,16 @@ go.uber.org/multierr v1.11.0 h1:blXXJkSxSSfBVBlC76pxqeO+LN3aDfLQo+309xJstO0= go.uber.org/multierr v1.11.0/go.mod h1:20+QtiLqy0Nd6FdQB9TLXag12DsQkrbs3htMFfDN80Y= golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2/go.mod h1:djNgcEr1/C05ACkg1iLfiJU5Ep61QUkGW8qpdssI0+w= golang.org/x/crypto v0.0.0-20191011191535-87dc89f01550/go.mod h1:yigFU9vqHzYiE8UmvKecakEJjdnWj3jj499lnFckfCI= -golang.org/x/crypto v0.31.0 h1:ihbySMvVjLAeSH1IbfcRTkD/iNscyz8rGzjF/E5hV6U= -golang.org/x/crypto v0.31.0/go.mod h1:kDsLvtWBEx7MV9tJOj9bnXsPbxwJQ6csT/x4KIN4Ssk= -golang.org/x/exp v0.0.0-20241217172543-b2144cdd0a67 h1:1UoZQm6f0P/ZO0w1Ri+f+ifG/gXhegadRdwBIXEFWDo= -golang.org/x/exp v0.0.0-20241217172543-b2144cdd0a67/go.mod h1:qj5a5QZpwLU2NLQudwIN5koi3beDhSAlJwa67PuM98c= +golang.org/x/crypto v0.32.0 h1:euUpcYgM8WcP71gNpTqQCn6rC2t6ULUPiOzfWaXVVfc= +golang.org/x/crypto v0.32.0/go.mod h1:ZnnJkOaASj8g0AjIduWNlq2NRxL0PlBrbKVyZ6V/Ugc= +golang.org/x/exp v0.0.0-20250106191152-7588d65b2ba8 h1:yqrTHse8TCMW1M1ZCP+VAR/l0kKxwaAIqN/il7x4voA= +golang.org/x/exp v0.0.0-20250106191152-7588d65b2ba8/go.mod h1:tujkw807nyEEAamNbDrEGzRav+ilXA7PCRAd6xsmwiU= golang.org/x/mod v0.4.2/go.mod 
h1:s0Qsj1ACt9ePp/hMypM3fl4fZqREWJwdYDEqhRiZZUA= golang.org/x/net v0.0.0-20190404232315-eb5bcb51f2a3/go.mod h1:t9HGtf8HONx5eT2rtn7q6eTqICYqUVnKs3thJo3Qplg= golang.org/x/net v0.0.0-20190620200207-3b0461eec859/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s= golang.org/x/net v0.0.0-20210405180319-a5a99cb37ef4/go.mod h1:p54w0d4576C0XHj96bSt6lcn1PtDYWL6XObtHCRCNQM= -golang.org/x/net v0.33.0 h1:74SYHlV8BIgHIFC/LrYkOGIwL19eTYXQ5wc6TBuO36I= -golang.org/x/net v0.33.0/go.mod h1:HXLR5J+9DxmrqMwG9qjGCxZ+zKXxBru04zlTvWlWuN4= +golang.org/x/net v0.34.0 h1:Mb7Mrk043xzHgnRM88suvJFwzVrRfHEHJEl5/71CKw0= +golang.org/x/net v0.34.0/go.mod h1:di0qlW3YNM5oh6GqDGQr92MyTozJPmybPK4Ev/Gm31k= golang.org/x/sync v0.0.0-20190423024810-112230192c58/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM= golang.org/x/sync v0.0.0-20210220032951-036812b2e83c/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM= golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY= @@ -126,8 +128,8 @@ golang.org/x/sys v0.0.0-20201119102817-f84b799fce68/go.mod h1:h1NjWce9XRLGQEsW7w golang.org/x/sys v0.0.0-20210330210617-4fbd30eecc44/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= golang.org/x/sys v0.0.0-20210510120138-977fb7262007/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= golang.org/x/sys v0.1.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= -golang.org/x/sys v0.28.0 h1:Fksou7UEQUWlKvIdsqzJmUmCX3cZuD2+P3XyyzwMhlA= -golang.org/x/sys v0.28.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA= +golang.org/x/sys v0.29.0 h1:TPYlXGxvx1MGTn2GiZDhnjPA9wZzZeGKHHmKhHYvgaU= +golang.org/x/sys v0.29.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA= golang.org/x/term v0.0.0-20201126162022-7de9c90e9dd1/go.mod h1:bj7SfCRtBDWHUb9snDiAeCFNEtKQo2Wmx5Cou7ajbmo= golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ= golang.org/x/text v0.3.3/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ= diff --git a/go_installer.sh b/go_installer.sh index cc27d13ae..d2f813ad4 100755 --- a/go_installer.sh +++ b/go_installer.sh @@ -1,6 +1,6 @@ #!/bin/bash work_dir=$(echo $1 | sed 's:/*$::') -version="1.23.1" +version="1.22.7" arch=`hostnamectl | grep "Arch" | rev | cut -d " " -f 1 | rev` if [ $arch != "arm64" ] diff --git a/internal/attribute.go b/internal/attribute.go index 3b6d63f09..f566be45d 100644 --- a/internal/attribute.go +++ b/internal/attribute.go @@ -69,16 +69,17 @@ const ( // ObjAttr : Attributes of any file/directory type ObjAttr struct { - Mtime time.Time // modified time - Atime time.Time // access time - Ctime time.Time // change time - Crtime time.Time // creation time - Size int64 // size of the file/directory - Mode os.FileMode // permissions in 0xxx format - Flags common.BitMap16 // flags - Path string // full path - Name string // base name of the path - MD5 []byte + Mtime time.Time // modified time + Atime time.Time // access time + Ctime time.Time // change time + Crtime time.Time // creation time + Size int64 // size of the file/directory + Mode os.FileMode // permissions in 0xxx format + Flags common.BitMap16 // flags + Path string // full path + Name string // base name of the path + MD5 []byte // MD5 of the blob as per last GetAttr + ETag string // ETag of the blob as per last GetAttr Metadata map[string]*string // extra information to preserve } diff --git a/internal/component_options.go b/internal/component_options.go index dc53ff2ba..f2fb4cea2 100644 --- a/internal/component_options.go +++ b/internal/component_options.go @@ 
-96,8 +96,10 @@ type CloseFileOptions struct { } type RenameFileOptions struct { - Src string - Dst string + Src string + Dst string + SrcAttr *ObjAttr + DstAttr *ObjAttr } type ReadFileOptions struct { @@ -107,6 +109,7 @@ type ReadInBufferOptions struct { Handle *handlemap.Handle Offset int64 + Etag *string Data []byte } @@ -202,6 +205,7 @@ type CommitDataOptions struct { Name string List []string BlockSize uint64 + NewETag *string } type CommittedBlock struct { diff --git a/internal/mock_component.go b/internal/mock_component.go index 80bb76baf..4702b9ef2 100644 --- a/internal/mock_component.go +++ b/internal/mock_component.go @@ -43,6 +43,7 @@ import ( handlemap "github.com/Azure/azure-storage-fuse/v2/internal/handlemap" reflect "reflect" "syscall" + "time" gomock "github.com/golang/mock/gomock" ) @@ -534,6 +535,11 @@ func (mr *MockComponentMockRecorder) RenameDir(arg0 interface{}) *gomock.Call { func (m *MockComponent) RenameFile(arg0 RenameFileOptions) error { m.ctrl.T.Helper() ret := m.ctrl.Call(m, "RenameFile", arg0) + if arg0.DstAttr != nil { + arg0.DstAttr.Atime = time.Now() + arg0.DstAttr.Mtime = time.Now() + arg0.DstAttr.Ctime = time.Now() + } ret0, _ := ret[0].(error) return ret0 } diff --git a/setup/advancedConfig.yaml b/setup/advancedConfig.yaml index 1c01a793f..5522e777c 100644 --- a/setup/advancedConfig.yaml +++ b/setup/advancedConfig.yaml @@ -126,6 +126,7 @@ azstorage: clientid: clientsecret: oauth-token-path: + workload-identity-token: # Optional use-http: true|false aadendpoint: diff --git a/setup/baseConfig.yaml b/setup/baseConfig.yaml index 6c2cb9cb8..ebed085da 100644 --- a/setup/baseConfig.yaml +++ b/setup/baseConfig.yaml @@ -50,3 +50,4 @@ azstorage: clientid: clientsecret: oauth-token-path: + workload-identity-token: diff --git a/setup/setupUBN.sh b/setup/setupUBN.sh new file mode 100755 index 000000000..85ba0efcf --- /dev/null +++ b/setup/setupUBN.sh @@ -0,0 +1,48 @@ +#!/bin/bash + +# This setup script can be used to install all the dependencies required to clone and run the project on Ubuntu machines + +# Run the go_installer script with the parent directory as an argument +./go_installer.sh ../ +echo "Installed go" +go version +sudo apt update -y +sudo apt install openssh-server -y +sudo apt install net-tools -y +sudo apt install git -y +sudo apt install gcc -y +sudo apt install libfuse-dev -y +sudo apt install fuse -y +sudo apt install fuse3 -y +sudo apt install libfuse3-dev -y +echo "Installed all dependencies" + +# Open the file /etc/fuse.conf and uncomment the line user_allow_other +sudo sed -i 's/#user_allow_other/user_allow_other/' /etc/fuse.conf +echo "Uncommented user_allow_other in /etc/fuse.conf" + +# Add Microsoft Linux repository for Ubuntu +wget -qO- https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add - +sudo add-apt-repository "$(wget -qO- https://packages.microsoft.com/config/ubuntu/$(lsb_release -rs)/prod.list)" +sudo apt update + +# Install Blobfuse2 +sudo apt install blobfuse2 -y +echo "Installed Blobfuse2" + +#Blobfuse2 version +blobfuse2 --version + +#Build blobfuse2 from repo +#Navigate to the parent directory of the project and run +#./build.sh + +# For not entering password every time on running sudo command, add this line at the end of the +# /etc/sudoers file, +# ALL=(ALL:ALL) NOPASSWD:ALL + +# Calling the setup script for AzSecPack setup +echo "Calling the setup script for AzSecPack setup" +setup/vmSetupAzSecPack.sh + + diff --git a/setup/vmSetupAzSecPack.sh b/setup/vmSetupAzSecPack.sh new 
file mode 100755 index 000000000..5898b7578 --- /dev/null +++ b/setup/vmSetupAzSecPack.sh @@ -0,0 +1,108 @@ +#!/bin/bash +# Script to set up AzSecPack on an Ubuntu VM as per recent SFI guidelines + +# Install Azure CLI +curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash + +# Update package lists +sudo apt-get update -y + +# Install required packages +sudo apt-get install apt-transport-https ca-certificates curl gnupg lsb-release -y + +# Create directory for Microsoft GPG key +sudo mkdir -p /etc/apt/keyrings + +# Download and install Microsoft GPG key +curl -sLS https://packages.microsoft.com/keys/microsoft.asc | gpg --dearmor | sudo tee /etc/apt/keyrings/microsoft.gpg > /dev/null + +# Set permissions for the GPG key +sudo chmod go+r /etc/apt/keyrings/microsoft.gpg + +# Get the distribution codename +AZ_DIST=$(lsb_release -cs) + +# Add Azure CLI repository to sources list +echo "Types: deb +URIs: https://packages.microsoft.com/repos/azure-cli/ +Suites: ${AZ_DIST} +Components: main +Architectures: $(dpkg --print-architecture) +Signed-by: /etc/apt/keyrings/microsoft.gpg" | sudo tee /etc/apt/sources.list.d/azure-cli.sources + +# Update package lists again +sudo apt-get update + +# Install Azure CLI again to ensure it's up to date +sudo apt-get install azure-cli -y + +# Remove unnecessary packages +sudo apt autoremove -y + +# Upgrade Azure CLI to the latest version +az upgrade -y + +#------------------------------------------------------------------------------------------------------- + +# Log in to Azure +# You will get a pop-up here; select your account and log in +echo "You will get a pop-up here; select your account and log in" +echo "PLEASE NOTE: After az login you should select the Subscription you are on and enter that Subscription ID : +\\n For Example: XCLient 116 is shown in the list of subscriptions, you should then enter 116" +az login --tenant 72f988bf-86f1-41af-91ab-2d7cd011db47 + +# Extracting VM name from hostname +vm_name=$(hostname) + +# Extracting resource group name from Azure Instance Metadata Service +resource_group=$(curl -H Metadata:true "http://169.254.169.254/metadata/instance?api-version=2021-02-01" -s | jq -r '.compute.resourceGroupName') + +# Check if VM name and resource group are not empty +if [ -z "$vm_name" ] || [ -z "$resource_group" ]; then + echo "Failed to retrieve VM name or resource group. 
You will have to manually insert these values in the upcoming commands" + echo "az vm extension set -n AzureMonitorLinuxAgent --publisher Microsoft.Azure.Monitor --version 1.0 --vm-name --resource-group --enable-auto-upgrade true --settings '{"GCS_AUTO_CONFIG": true}'" + echo "az vm extension set -n AzureSecurityLinuxAgent --publisher Microsoft.Azure.Security.Monitoring --version 2.0 --vm-name --resource-group --enable-auto-upgrade true --settings '{"enableGenevaUpload":true,"enableAutoConfig":true}'" + exit 1 +fi + +# Install Azure Monitor Linux Agent extension +# az vm extension set -n AzureMonitorLinuxAgent --publisher Microsoft.Azure.Monitor --version 1.0 --vm-name --resource-group --enable-auto-upgrade true --settings '{"GCS_AUTO_CONFIG": true}' +az vm extension set -n AzureMonitorLinuxAgent --publisher Microsoft.Azure.Monitor --version 1.0 --vm-name $vm_name --resource-group $resource_group --enable-auto-upgrade true --settings '{"GCS_AUTO_CONFIG": true}' + +# Install Azure Security Linux Agent extension +# az vm extension set -n AzureSecurityLinuxAgent --publisher Microsoft.Azure.Security.Monitoring --version 2.0 --vm-name --resource-group --enable-auto-upgrade true --settings '{"enableGenevaUpload":true,"enableAutoConfig":true}' +az vm extension set -n AzureSecurityLinuxAgent --publisher Microsoft.Azure.Security.Monitoring --version 2.0 --vm-name $vm_name --resource-group $resource_group --enable-auto-upgrade true --settings '{"enableGenevaUpload":true,"enableAutoConfig":true}' + +# Check the status of Azure Security Pack +status_output=$(sudo /usr/local/bin/azsecd status) + +# Check if AutoConfig is enabled +if echo "$status_output" | grep -Pzo "AutoConfig:\n\s+Enabled\(true\)" > /dev/null; then + autoconfig_enabled="true" +else + autoconfig_enabled="false" +fi +# Check if AzSecPack is present in ResourceTags +azsecpack_present=$(echo "$status_output" | grep -q 'AzSecPack:\s*IsPresent(true)' && echo "true" || echo "false") + +if [ "$autoconfig_enabled" = "true" ]; then + echo "AutoConfig is enabled." +else + echo "AutoConfig is not enabled. Please manually check if any installation step has failed." +fi + +if [ "$azsecpack_present" = "true" ]; then + echo "AzSecPack is present in ResourceTags." +else + echo "AzSecPack is not present in ResourceTags. Please manually check if any installation step has failed." +fi + +echo "Please check the status of Azure Security Pack by running 'sudo /usr/local/bin/azsecd status'" +echo "Installation of Azure Security Pack is complete. If you find any errors, please manually check the installation steps." +#------------------------------------------------------------------------------------------------------- + + +sleep 100 +# Check for pending updates, assess and install patches +az vm assess-patches --resource-group $resource_group --name $vm_name +az vm install-patches --resource-group $resource_group --name $vm_name --maximum-duration PT2H --reboot-setting IfRequired --classifications-to-include-linux Critical Security \ No newline at end of file
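
A minimal usage sketch for the two setup scripts added above (illustrative only: the clone directory name is assumed, and running from the repository root is inferred from the scripts' own relative calls to ./go_installer.sh and setup/vmSetupAzSecPack.sh):

# Hypothetical end-to-end invocation on a fresh Ubuntu VM
cd azure-storage-fuse                 # assumed clone location
./setup/setupUBN.sh                   # installs Go, FUSE packages and blobfuse2, then chains into the AzSecPack setup
sudo /usr/local/bin/azsecd status     # re-check AzSecPack once both scripts have finished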