Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tap refactoring using singer generator #47

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

rdeshmukh15
Copy link
Contributor

@rdeshmukh15 rdeshmukh15 commented Feb 27, 2025

Description of change

  • Complete tap refactoring
  • Fixes Dependabot issue #2
  • singer-python upgrade to 6.1.0 and backoff to 2.2.1
  • Unit test cases update

Manual QA steps

Risks

  • Changes are not tested due to missing credentials

Rollback steps

  • revert this branch

AI generated code

https://internal.qlik.dev/general/ways-of-working/code-reviews/#guidelines-for-ai-generated-code

  • this PR has been written with the help of GitHub Copilot or another generative AI tool

@rdeshmukh15 rdeshmukh15 marked this pull request as ready for review March 4, 2025 07:18
Comment on lines +70 to +77
```json
{
"start_date": "2019-01-01T00:00:00Z",
"user_agent": "tap-appsflyer <api_user_email@your_company.com>",
"app_id": "abc1e2swewe",
"api_token": "askawqewdqwer123445666"
...
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
```json
{
"start_date": "2019-01-01T00:00:00Z",
"user_agent": "tap-appsflyer <api_user_email@your_company.com>",
"app_id": "abc1e2swewe",
"api_token": "askawqewdqwer123445666"
...
}
```json
{
"start_date": "2019-01-01T00:00:00Z",
"user_agent": "tap-appsflyer <api_user_email@your_company.com>",
"app_id": "abc1e2swewe",
"api_token": "askawqewdqwer123445666"
...
}```

Comment on lines +95 to +98
See the Singer docs on discovery mode
[here](https://github.com/singer-io/getting-started/blob/master/docs/DISCOVERY_MODE.md

5. Run the Tap in Sync Mode (with catalog) and [write out to state file](https://github.com/singer-io/getting-started/blob/master/docs/RUNNING_AND_DEVELOPING.md
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
See the Singer docs on discovery mode
[here](https://github.com/singer-io/getting-started/blob/master/docs/DISCOVERY_MODE.md
5. Run the Tap in Sync Mode (with catalog) and [write out to state file](https://github.com/singer-io/getting-started/blob/master/docs/RUNNING_AND_DEVELOPING.md
See the Singer docs on discovery mode
[here](https://github.com/singer-io/getting-started/blob/master/docs/DISCOVERY_MODE.md)
5. Run the Tap in Sync Mode (with catalog) and [write out to state file](https://github.com/singer-io/getting-started/blob/master/docs/RUNNING_AND_DEVELOPING.md)


To [check the tap](https://github.com/singer-io/singer-tools)
```bash
> tap-mixpanel --config tap_config.json --catalog catalog.json | singer-check-tap > state.json
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
> tap-mixpanel --config tap_config.json --catalog catalog.json | singer-check-tap > state.json
> tap-appsflyer --config tap_config.json --catalog catalog.json | singer-check-tap > state.json

data following the [Singer spec](https://github.com/singer-io/getting-started/blob/master/SPEC.md).
This is a [Singer](https://singer.io) tap that produces JSON-formatted data
following the [Singer
spec](https://github.com/singer-io/getting-started/blob/master/SPEC.md).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
spec](https://github.com/singer-io/getting-started/blob/master/SPEC.md).
spec](https://github.com/singer-io/getting-started/blob/master/docs/SPEC.md).

url="http://singer.io",
classifiers=["Programming Language :: Python :: 3 :: Only"],
py_modules=["tap_appsflyer"],
install_requires= ["attrs==25.1.0", "singer-python==6.1.0", "requests==2.32.3", "backoff==2.2.1"],
Copy link
Contributor

@RushiT0122 RushiT0122 Mar 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
install_requires= ["attrs==25.1.0", "singer-python==6.1.0", "requests==2.32.3", "backoff==2.2.1"],
install_requires= ["attrs==25.1.0",
"singer-python==6.1.0",
"requests==2.32.3",
"backoff==2.2.1"],

Comment on lines +32 to +35
sync(client=client,
config=parsed_args.config,
catalog=parsed_args.catalog,
state=state)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
sync(client=client,
config=parsed_args.config,
catalog=parsed_args.catalog,
state=state)
sync(client=client,
config=parsed_args.config,
catalog=parsed_args.catalog,
state=state)

Comment on lines +54 to +57
if config_request_timeout and float(config_request_timeout):
self.request_timeout = float(config_request_timeout)
else:
self.request_timeout = REQUEST_TIMEOUT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logically your code is correct, just a suggestion.

Suggested change
if config_request_timeout and float(config_request_timeout):
self.request_timeout = float(config_request_timeout)
else:
self.request_timeout = REQUEST_TIMEOUT
self.request_timeout = float(config_request_timeout) if config_request_timeout else REQUEST_TIMEOUT

Comment on lines +206 to +217
while True:
response = self.client.get(
extraction_url, self.params, self.headers
)
if not response:
LOGGER.warning("No records found on Page %s", page_count)
break

with singer.metrics.http_request_timer(self.parse_source_from_url(self.client.base_url)) as timer:
resp = SESSION.send(response)
timer.tags[singer.metrics.Tag.http_status_code] = resp.status_code
return resp
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove while here.

Comment on lines +290 to +293
if value.lower() == "TRUE".lower():
record[field_name] = True
else:
record[field_name] = False
Copy link
Contributor

@RushiT0122 RushiT0122 Mar 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if value.lower() == "TRUE".lower():
record[field_name] = True
else:
record[field_name] = False
record[field_name] = value.lower() == "true"

Comment on lines +312 to +315
from_datetime = bookmark_date =self.get_bookmark(state)
to_datetime = self.get_stop(from_datetime, datetime.datetime.now(pytz.utc))

current_max_bookmark_date = bookmark_date_to_utc = bookmark_date
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
from_datetime = bookmark_date =self.get_bookmark(state)
to_datetime = self.get_stop(from_datetime, datetime.datetime.now(pytz.utc))
current_max_bookmark_date = bookmark_date_to_utc = bookmark_date
from_datetime = bookmark_date
bookmark_date = self.get_bookmark(state)
to_datetime = self.get_stop(from_datetime, datetime.datetime.now(pytz.utc))
current_max_bookmark_date = bookmark_date_to_utc
bookmark_date_to_utc = bookmark_date

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do the same wherever required?

Comment on lines +324 to +337
try:
first_line = next(csv_data)
# Push the line back into the iterator
csv_data = iter([first_line] + list(csv_data))
except StopIteration:
LOGGER.warning("No data available in the CSV.")
return state
reader = csv.DictReader(csv_data, fieldnames)

try:
next(reader) # Skip the header row
except StopIteration:
LOGGER.warning("No data available after header row.")
return state
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
try:
first_line = next(csv_data)
# Push the line back into the iterator
csv_data = iter([first_line] + list(csv_data))
except StopIteration:
LOGGER.warning("No data available in the CSV.")
return state
reader = csv.DictReader(csv_data, fieldnames)
try:
next(reader) # Skip the header row
except StopIteration:
LOGGER.warning("No data available after header row.")
return state
try:
reader = csv.DictReader(csv_data, fieldnames)
next(reader) # Skip the header row
except StopIteration:
LOGGER.warning("No data available in the CSV.")
return state

Comment on lines +339 to +354
for i, row in enumerate(reader):
xform_record = self.xform(row)
transformed_record = transformer.transform(
xform_record, schema, stream_metadata
)
try:
record_timestamp = strptime_to_utc(
transformed_record[self.replication_keys[0]]
)
except KeyError as _:
LOGGER.error(
"Unable to process Record, Exception occurred: %s for stream %s",
_,
self.__class__,
)
continue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for i, row in enumerate(reader):
xform_record = self.xform(row)
transformed_record = transformer.transform(
xform_record, schema, stream_metadata
)
try:
record_timestamp = strptime_to_utc(
transformed_record[self.replication_keys[0]]
)
except KeyError as _:
LOGGER.error(
"Unable to process Record, Exception occurred: %s for stream %s",
_,
self.__class__,
)
continue
for _, row in enumerate(reader):
xform_record = self.xform(row)
transformed_record = transformer.transform(
xform_record, schema, stream_metadata
)
try:
record_timestamp = strptime_to_utc(
transformed_record[self.replication_keys[0]]
)
except KeyError as ex:
LOGGER.error(
"Unable to process Record, Exception occurred: %s for stream %s",
ex,
self.__class__,
)
continue

Copy link
Contributor

@RushiT0122 RushiT0122 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requested changes inline.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants