The goal of this project is to convert PCAP files into Parquet format and make them available via Amazon Athena. The project uses libraries from Entrada and also contains code which was copied from Entrada and modified where needed.
PCAP to Athena is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. The PCAP to Athena code is the result of a proof-of-concept. It is not guaranteed to work and is not ready for production use.
For the columns used in the parquet files, see http://entrada.sidnlabs.nl/docs/concepts/data_model/#dns
The software automatically looks for an application.properties file in the user directory, that is, the directory from which the java command is launched. You can copy the application.properties from the source and adapt it to your environment.
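As an illustration of the lookup described above, the following is a minimal sketch that reads application.properties from the directory the JVM was started in. It is not taken from the project: the class name ConfigLocator and the method loadFrom are hypothetical, and the project may load its configuration through a framework instead.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper, not part of the project sources.
class ConfigLocator {

    // Reads application.properties from the given directory.
    static Properties loadFrom(Path dir) throws IOException {
        Path file = dir.resolve("application.properties");
        Properties props = new Properties();
        try (Reader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            props.load(reader);
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        // user.dir is the directory from which the java command was launched.
        Path userDir = Paths.get(System.getProperty("user.dir"));
        Properties props = loadFrom(userDir);
        System.out.println("Loaded " + props.size() + " properties from " + userDir);
    }
}
```

Running this from a directory without an application.properties file fails with an IOException, which is a quick way to confirm that the java command is being launched from the expected location.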
To enrich the Parquet files with information such as country and ASN, the software looks up each IP address in the Maxmind database. This database is a file that must be downloaded and made available to the software via the configuration.
The default configuration expects the Maxmind database in ${user.dir}/maxmind.
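To make the ${user.dir} placeholder concrete, here is a small sketch of how such a value could be expanded against Java system properties. This is an assumption about the behaviour, not the project's code: the actual expansion is presumably done by whatever configuration mechanism the project uses, and the class PlaceholderExpander is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the project sources.
class PlaceholderExpander {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replaces each ${name} with the system property of that name.
    // Placeholders with no matching system property are left untouched.
    static String expand(String value) {
        Matcher m = PLACEHOLDER.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = System.getProperty(m.group(1), m.group(0));
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints the directory where the Maxmind database is expected by default.
        System.out.println(expand("${user.dir}/maxmind"));
    }
}
```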
To download the Maxmind database, run:
cd maxmind && ./download_maxmind_geo_ip_db.sh
The Athena JDBC driver is needed to connect to Amazon Athena and execute SQL statements. To download the jar, run:
cd lib && ./download_libs.sh
The script also downloads pcaplib4java and dnslib4java.
This software uses S3 buckets. Creating the buckets is left to the reader.
Once the bucket names are defined, the configuration in application.properties must be updated, in particular pcap.bucket.name, pcap.archive.bucket.name and parquet.bucket.name.
A database and a table must be created in AWS Athena, and their names must be set in the configuration as athena.database.name and athena.table.name.
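Since several settings have to be filled in by hand, a quick check for missing values can save a failed run. The sketch below only covers the five keys named in this README and assumes that a blank value counts as missing. The class RequiredSettings is hypothetical and not part of the project.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// Hypothetical helper, not part of the project sources.
class RequiredSettings {

    // The keys this README says must be set in application.properties.
    static final List<String> REQUIRED = Arrays.asList(
            "pcap.bucket.name",
            "pcap.archive.bucket.name",
            "parquet.bucket.name",
            "athena.database.name",
            "athena.table.name");

    // Returns the required keys that are absent or blank, in declaration order.
    static List<String> missing(Properties props) {
        List<String> result = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                result.add(key);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("pcap.bucket.name", "example-pcap-bucket"); // placeholder value
        System.out.println("Missing settings: " + missing(props));
    }
}
```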
The table can be created with the SQL statement in src/resources/sql/athena-create-table.sql.
You need a credentials file, ~/.aws/credentials, with a [default] profile containing a valid pair of aws_access_key_id and aws_secret_access_key.
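The following sketch checks that the credentials file has a [default] profile with both keys present. It uses a deliberately simple INI-style parser written for this example and does not use the AWS SDK; it cannot tell whether the keys are actually valid, only that they are set. The class CredentialsCheck is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the project sources.
class CredentialsCheck {

    // Collects the key/value pairs found under the [profile] section header.
    static Map<String, String> readProfile(List<String> lines, String profile) {
        Map<String, String> values = new HashMap<>();
        boolean inProfile = false;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#") || line.startsWith(";")) {
                continue; // skip blank lines and comments
            }
            if (line.startsWith("[") && line.endsWith("]")) {
                inProfile = line.substring(1, line.length() - 1).trim().equals(profile);
                continue;
            }
            int eq = line.indexOf('=');
            if (inProfile && eq > 0) {
                values.put(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
            }
        }
        return values;
    }

    // True when both the access key id and the secret access key are non-empty.
    static boolean hasKeyPair(Map<String, String> profile) {
        return !profile.getOrDefault("aws_access_key_id", "").isEmpty()
                && !profile.getOrDefault("aws_secret_access_key", "").isEmpty();
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(System.getProperty("user.home"), ".aws", "credentials");
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        boolean ok = hasKeyPair(readProfile(lines, "default"));
        System.out.println(ok ? "[default] profile has a key pair" : "[default] profile is incomplete");
    }
}
```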
Be sure to have the correct permissions for S3 and Athena.
The project uses Maven to build the sources. Once the Maxmind database and the third-party libraries are downloaded, you can run
./mvnw validate package
This installs the third-party jars in your local Maven repository, then builds and packages the sources into a jar. The resulting jar is placed in target/.
The test folder contains both unit tests and integration tests.
The unit tests can be run with ./mvnw test.
Some tests require access to AWS and are skipped by default. Use ./run-integration-tests.sh to execute these tests.
To run these tests, you need a credentials file, ~/.aws/credentials, with a [default] profile containing a valid pair of aws_access_key_id and aws_secret_access_key.
The IAM user corresponding to these access keys should have write permissions on the S3 buckets used in the tests.
See src/test/resources/test-application.properties for the names of the S3 buckets.
This product includes ENTRADA created by SIDN Labs, available from http://entrada.sidnlabs.nl.