How do I supply Files.contentType? #20

Closed
omikader opened this issue Jan 27, 2024 · 8 comments

Comments

@omikader

omikader commented Jan 27, 2024

How can I provide a value for my file's content type to the partitioning API?

I noticed that the code in unstructured-api determines how to partition each file using the content_type attribute attached to the FastAPI UploadFile. If one is not provided, it tries to infer the file type using the filename extension.
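To make the fallback concrete, here is a minimal sketch of that kind of resolution logic. It is only an illustration: the real unstructured-api code is Python, and the extension table and the `resolveContentType` name are assumptions made up for this example.

```typescript
// Hypothetical sketch of "use content_type if present, else guess from the
// filename extension". Not the actual unstructured-api implementation.
const EXTENSION_TYPES: Record<string, string> = {
  ".pdf": "application/pdf",
  ".txt": "text/plain",
  ".html": "text/html",
};

function resolveContentType(
  contentType: string | undefined,
  fileName: string,
): string | undefined {
  // An explicit content type always wins.
  if (contentType) return contentType;

  // Otherwise fall back to the extension, if the name has one.
  const dot = fileName.lastIndexOf(".");
  if (dot <= 0) return undefined; // no extension (or a dotfile)
  return EXTENSION_TYPES[fileName.slice(dot).toLowerCase()];
}

// A UUID-style name has no extension, so nothing can be inferred.
console.log(resolveContentType(undefined, "123e4567-e89b-12d3-a456-426614174000"));
console.log(resolveContentType(undefined, "report.pdf"));
```

With a name that has no extension and no explicit content type, a resolver like this has nothing to go on, which matches the behavior described here.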

My file names are arbitrary UUIDs (no filename extension) so when I try to partition them I get this error

{"detail":"File type None is not supported."}

I would like to manually provide a value for UploadFile.content_type to avoid the fallback behavior but I don't see a way to do that using the JS client. Can we modify the Files definition to include an optional value for contentType, which I presume would be used in the unstructured-api code and result in skipping the fallback path?

export declare class Files extends SpeakeasyBase {
    content: Uint8Array;
    fileName: string;
    // PROPOSING WE ADD THE FOLLOWING LINE
    contentType?: string;
}
@omikader omikader changed the title Should contentType be an optional parameter? How do I supply FIles.contentType? Jan 29, 2024
@omikader omikader changed the title How do I supply FIles.contentType? How do I supply Files.contentType? Jan 29, 2024
@awalker4
Collaborator

awalker4 commented Feb 3, 2024

Hi there, apologies for the delay. This is certainly something that should be in the client. I can do some digging and get back to you soon. We're also planning to improve the content type checking on the server side in the near term.

@omikader
Author

Hi @awalker4! Do you have any updates on this front? We'd like to upgrade to the new version of the JS client but we get the infamous {"detail":"File type None is not supported."} error when we try to provide a Blob type for the files argument. Unfortunately, when providing a Blob, you can no longer supply the fileName.

@awalker4
Collaborator

Hi Omar, sorry for the delays! We still need to get content_type in as a client param, and I'd like to get around to that this week. As a workaround, the latest client does still take the files object from before, so you can set the filename. Check out the TypeScript tab in the docs here. Let me know if this is sufficient for now, or if you're blocked on needing the content type.

Separately, I have an internal ticket to improve server side file handling. We can address the filetype None issue by actually inspecting the file and not just keying off of the extension.

@omikader
Author

No problem! Thanks for the quick response! Yes, once we upgrade we can continue to provide the Files object but we'd love to start using the Blob variant to avoid loading the entire file into memory at once.

For now, we've decided to stay on the older client version because we started running into fetch timeout issues at the 5 minute mark, and I believe this issue is related to nodejs/node#46375.

@alimoezzi

Also, when supplying files: new Blob([data], { type: file.mimetype }), splitPdfPage: true doesn't work and the client raises "Given file is not a PDF. Continuing without splitting."

@santiq

santiq commented Jul 29, 2024

@alimoezzi

You can do this

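import { openAsBlob } from 'node:fs'; // available in Node.js 19.8+

// Wrapping the Blob in a File gives it a name (and extension).
const blob = await openAsBlob('path/to/filename.pdf');
const name = 'filename.pdf';
const file = new File([blob], name);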
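One detail worth noting: the `File` constructor does not copy the `type` of a Blob passed as a part, so the resulting File has an empty type unless one is given in the options. Below is a minimal sketch of a wrapper that keeps both the name and the MIME type. `toNamedFile` is a hypothetical helper, not part of the JS client, and the sketch assumes a runtime where `Blob` and `File` are globals (for example Node.js 20+ or a browser).

```typescript
// Sketch: wrap a Blob in a File so it carries both a name and a MIME type.
// `toNamedFile` is a hypothetical helper, not an SDK function.
function toNamedFile(blob: Blob, fileName: string, contentType?: string): File {
  // Prefer an explicit content type; otherwise reuse the Blob's own type,
  // which new File([blob], name) alone would drop.
  return new File([blob], fileName, { type: contentType ?? blob.type });
}

const sourceBlob = new Blob(["%PDF-1.7\n"], { type: "application/pdf" });
const namedFile = toNamedFile(sourceBlob, "filename.pdf");
console.log(namedFile.name, namedFile.type);
```

Whether the client or server actually uses the File's type is not confirmed in this thread; the sketch only shows how to avoid losing it on the way in.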

@awalker4
Collaborator

awalker4 commented Jul 31, 2024

Hi all, we've merged a fix for the API that removes the naive extension check and performs actual file type detection. This will get rid of the "File type None is not supported" errors and should cover most of the cases where you'd need to explicitly send a content type. This is deployed in our hosted serverless and free tier APIs.
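For readers wondering what content-based detection looks like in general, here is a rough sketch that checks a file's leading "magic" bytes. This is not the server's implementation (which is not shown in this thread and likely handles many more formats); the signature table and the `sniffContentType` name are assumptions for illustration.

```typescript
// Hypothetical magic-byte sniffing; only a few well-known signatures.
const SIGNATURES: Array<{ mime: string; magic: number[] }> = [
  { mime: "application/pdf", magic: [0x25, 0x50, 0x44, 0x46, 0x2d] }, // "%PDF-"
  { mime: "image/png", magic: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { mime: "application/zip", magic: [0x50, 0x4b, 0x03, 0x04] }, // also the container for DOCX/XLSX/PPTX
];

function sniffContentType(bytes: Uint8Array): string | undefined {
  // Return the first signature whose bytes match the start of the file.
  const match = SIGNATURES.find(
    ({ magic }) =>
      magic.length <= bytes.length && magic.every((b, i) => bytes[i] === b),
  );
  return match?.mime;
}

// The file name plays no role here, so extension-less names are fine.
const header = new Uint8Array([0x25, 0x50, 0x44, 0x46, 0x2d, 0x31, 0x2e, 0x37]);
console.log(sniffContentType(header));
```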

@awalker4
Collaborator

awalker4 commented Aug 6, 2024

We've added the contentType parameter to the SDK, to coincide with the new API param here. Together with the better file type checking, this should resolve the issue. Apologies for the very long turnaround time on this :/

@alimoezzi I created #100 for the pdf page splitting bug.

@awalker4 awalker4 closed this as completed Aug 6, 2024