[PyPI](https://pypi.org/project/lbsntransform/) [GitLab](https://gitlab.vgiscience.de/lbsn/lbsntransform) [Commits](https://gitlab.vgiscience.de/lbsn/lbsntransform/-/commits/master) [Documentation](https://lbsn.vgiscience.org/lbsntransform/docs/)

# LBSNTransform

A Python package that uses the [common location based social network (LBSN) data structure][lbsnstructure] (ProtoBuf) to import, transform and export Social Media data from sources such as Twitter and Flickr.

## Motivation

The goal is to provide a common interface for handling Social Media data, without the need to adapt individually to the myriad of available API endpoints. As an example, consider the ProtoBuf spec [lbsn.Post][lbsnpost], which may represent a Tweet on Twitter, a photo shared on Flickr, or a post on Reddit. All of these objects share a common set of attributes, which is reflected in the lbsnstructure. The tool is based on a 4-Facet conceptual framework for LBSN, introduced in a paper by [Dunkel et al. (2018)](https://www.tandfonline.com/doi/full/10.1080/13658816.2018.1546390).

The GDPR explicitly requires Social Media network operators to allow users to transfer accounts and data between services. While there are attempts by Google, Facebook etc. (e.g. the [Data Transfer Project][data-transfer-project]), this is not currently possible. With the lbsnstructure, a primary motivation is to systematically characterize LBSN data aspects in a common, cross-network data scheme that enables privacy-by-design for connected software, data handling and database design.

## Description

This tool enables data import from a Postgres database, JSON, or CSV, and export to CSV, [LBSN ProtoBuf][lbsnstructure], or the [hll][hlldb] and [raw][rawdb] versions of the LBSN-prepared Postgres databases. The tool maps Social Media endpoints (e.g. Twitter tweets) to the common [LBSN Interchange Structure][lbsnstructure] format in ProtoBuf. LBSNTransform can be used from the command line (CLI) or imported into other Python projects with `import lbsntransform` for on-the-fly conversion.

## Quick Start

The recommended way to install lbsntransform, for both Linux and Windows, is through the conda package manager.

1. Create a conda env using `environment.yml`

    First, create an environment with the dependencies for lbsntransform, using the [environment.yml][environment.yml] that is provided in the root of the repository.

    ```bash
    git clone https://github.com/Sieboldianus/lbsntransform.git
    cd lbsntransform
    # not necessary, but recommended:
    conda config --env --set channel_priority strict
    conda env create -f environment.yml
    ```

2. Install lbsntransform without dependencies

    Afterwards, install lbsntransform using pip, without dependencies.

    ```bash
    conda activate lbsntransform
    pip install lbsntransform --no-deps --upgrade
    # or locally, from the latest commits on master:
    # pip install . --no-deps --upgrade
    ```

3. Import data using a mapping

    For each data source, a mapping must be provided that defines how the data is mapped to the [lbsnstructure][lbsnstructure]. The default mapping is [lbsnraw][lbsnraw]. Additional mappings can be loaded dynamically from a folder. Two [example mappings][mappings] are provided, for the [Flickr YFCC100M dataset][yfcc100m] (CSV) and for Twitter (JSON); see the note after this list on making a mapping available locally.
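If you work inside the repository cloned in step 1, the example mappings are already available under `./resources/mappings/`. If lbsntransform was installed from PyPI into a different working directory, the mapping file has to be fetched separately; the sketch below shows one possible way (the raw URL and the `master` branch are assumptions, adjust to your setup):

```bash
# Copy the example Twitter mapping into a local mappings folder.
# The raw.githubusercontent.com URL and the master branch are assumptions;
# any other way of obtaining resources/mappings/field_mapping_twitter.py works as well.
mkdir -p ./resources/mappings
curl -L -o ./resources/mappings/field_mapping_twitter.py \
    https://raw.githubusercontent.com/Sieboldianus/lbsntransform/master/resources/mappings/field_mapping_twitter.py
```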
For example, to import the first 1000 records of Twitter JSON data to the [lbsn raw database][rawdb], make [field_mapping_twitter.py][field_mapping_twitter] available in a local folder `./resources/mappings/` (see above), start the Docker [rawdb][rawdb] container, and use:

```shell
lbsntransform --origin 3 \
    --file_input \
    --file_type "json" \
    --mappings_path ./resources/mappings/ \
    --dbpassword_output "sample-key" \
    --dbuser_output "postgres" \
    --dbserveraddress_output "127.0.0.1:5432" \
    --dbname_output "rawdb" \
    --dbformat_output "lbsn" \
    --transferlimit 1000
```

With the above input args, the tool will:

- read local JSON from `./01_Input/`
- store lbsn records to the [lbsn rawdb][rawdb]

Conversely, to import data directly into the privacy-aware version of the lbsnstructure, the [hlldb][hlldb], start the Docker container and use:

```shell
lbsntransform --origin 3 \
    --file_input \
    --file_type "json" \
    --mappings_path ./resources/mappings/ \
    --dbpassword_output "sample-key" \
    --dbuser_output "postgres" \
    --dbserveraddress_output "127.0.0.1:25432" \
    --dbname_output "hlldb" \
    --dbformat_output "hll" \
    --dbpassword_hllworker "sample-key" \
    --dbuser_hllworker "postgres" \
    --dbserveraddress_hllworker "127.0.0.1:25432" \
    --dbname_hllworker "hlldb" \
    --include_lbsn_objects "origin,post" \
    --include_lbsn_bases hashtag,place,date,community \
    --transferlimit 1000
```

With the above input args, the tool will:

- read local JSON from `./01_Input/`
- store lbsn records to the privacy-aware [lbsn hlldb][hlldb]
- convert only lbsn objects of type [origin][lbsnorigin] and [post][lbsnpost]
- update the HyperLogLog (HLL) target tables `hashtag`, `place`, `date` and `community`

A full list of possible input and output args is available in the [documentation](https://lbsn.vgiscience.org/lbsntransform/docs/).

## Built With

* [lbsnstructure](https://pypi.org/project/lbsnstructure/) - A common, language-independent and cross-network social media data scheme
* [protobuf](https://github.com/google/protobuf) - Google's data interchange format
* [psycopg2](https://github.com/psycopg/psycopg2) - Python-PostgreSQL Database Adapter
* [ppygis3](https://github.com/AlexImmer/ppygis3) - A PPyGIS port for Python
* [shapely](https://github.com/Toblerity/Shapely) - Geometric object processing in Python
* [emoji](https://github.com/carpedm20/emoji/) - Emoji handling in Python

## Authors

* **Alexander Dunkel** - Initial work

See also the list of [contributors](/../graphs/master).

## License

This project is licensed under the GNU GPLv3 or any later version - see the [LICENSE.md](LICENSE.md) file for details.

[lbsnstructure]: https://lbsn.vgiscience.org/structure/
[lbsnpost]: https://lbsn.vgiscience.org/structure/#post
[lbsnorigin]: https://lbsn.vgiscience.org/structure/#origin
[data-transfer-project]: https://datatransferproject.dev/
[rawdb]: https://gitlab.vgiscience.de/lbsn/databases/rawdb
[hlldb]: https://gitlab.vgiscience.de/lbsn/databases/hlldb
[lbsnraw]: lbsntransform/input/mappings/field_mapping_lbsn.py
[mappings]: resources/mappings
[field_mapping_twitter]: resources/mappings/field_mapping_twitter.py
[yfcc100m]: http://projects.dfki.uni-kl.de/yfcc100m/
[environment.yml]: environment.yml