-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SFR-2476: Implement DSpace service and update DOAB ingest #531
Conversation
services/sources/dspace_service.py
Outdated
self.constants = get_constants() | ||
|
||
self.base_url = base_url | ||
self.SourceMapping = SourceMapping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can do source_mapping for both the param and class variable here.
It looks a bit funky when you instantiate an object but it's common practice to use snake case.
services/sources/dspace_service.py
Outdated
'oaire': 'https://raw.githubusercontent.com/rcic/openaire4/master/schemas/4.0/oaire.xsd' | ||
} | ||
|
||
def __init__(self, base_url, SourceMapping: XMLMapping): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fun cuz I've actually never done this in python so we are learning ha!
I think when we say that the SourceMapping type is XMLMapping, we are expecting an actual object that is an instance of XMLMapping.
Checkout typing for Python: https://docs.python.org/3/library/typing.html
I think we could do type[XMLMapping] or the like here.
services/sources/dspace_service.py
Outdated
self.s3_manager = S3Manager() | ||
self.s3_manager.createS3Client() | ||
self.s3_bucket = os.environ['FILE_BUCKET'] | ||
|
||
self.file_queue = os.environ['FILE_QUEUE'] | ||
self.file_route = os.environ['FILE_ROUTING_KEY'] | ||
|
||
self.rabbitmq_manager = RabbitMQManager() | ||
self.rabbitmq_manager.createRabbitConnection() | ||
self.rabbitmq_manager.createOrConnectQueue( | ||
self.file_queue, self.file_route) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm are these composed class instances being called? I think we can remove them if not and it makes it cleaner because this class is really only responsible for getting records from DSpace so it's following the single responsibility principle: https://stackify.com/solid-design-principles/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved. This was a good read, thanks!
processes/ingest/doab.py
Outdated
if self.process == 'daily': | ||
records = self.dSpace_service.get_records(offset=self.offset, limit=self.limit) | ||
elif self.process == 'complete': | ||
records = self.dSpace_service.get_records(full_or_partial=True, offset=self.offset, limit=self.limit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend changing the name full_or_partial
to full_import
since it's more clear what this variable does and the publisherBacklistProcess uses that variable name too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following up you could also just remove the full param!
I'm working on adding integration tests for the DSpace service |
self.saveRecords() | ||
self.commitChanges() | ||
|
||
logger.info(f'Ingested {len(self.records) if records else 1} DOAB records') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the result of the else conditional be 0 if the records array is empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this should still return 0 since the len of the records array would be 0 if it is empty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes are looking good so far Jackie!
services/sources/dspace_service.py
Outdated
|
||
return BytesIO(content) | ||
|
||
raise DSpaceError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can just raise an Exception here unless downstream we need to handle it in a specific way.
services/sources/dspace_service.py
Outdated
f'Received {response.status_code} status code from {url}') | ||
|
||
|
||
class DSpaceError(Exception): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is no longer needed - feel free to remove!
processes/ingest/doab.py
Outdated
class DOABProcess(CoreProcess): | ||
DOAB_BASE_URL = 'https://directory.doabooks.org/oai/request?' | ||
ROOT_NAMESPACE = {None: 'http://www.openarchives.org/OAI/2.0/'} | ||
DOAB_IDENTIFIER = "oai:directory.doabooks.org" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can do single quotes here! we really need a formatter lol
processes/ingest/doab.py
Outdated
|
||
self.constants = get_constants() | ||
self.dSpace_service = DSpaceService(base_url=self.DOAB_BASE_URL, source_mapping=DOABMapping) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
recommend doing d_space_service or dspace_service so that it's not a mix-match of camel and snake case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes look great! Just a few minor non-blocking comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes look really good Jackie!
Description
Testing
python main.py -p DOABProcess -e local -r "20.500.12854/62823"