Key Tips for Managing Passwords in Sqoop
Sqoop is a popular data transfer tool for Hadoop. Sqoop allows easy import and export of data from structured data stores like relational databases, enterprise data warehouses, and NoSQL datastores. Sqoop also integrates with Hadoop-based systems such as Hive, HBase, and Oozie.
In this blog post, I will cover the different options available for managing passwords in Sqoop. Sqoop is a very stable and easy to use tool for transferring data into and out of Hadoop. However, if you want to automate the import/export process, the password has to be in plan text, and this can be abused. Here are some solutions to this challenge.
1. Use a Protected File in the Filesystem
Create separate files for Prod and Development (it is always a good pratice to use separate accounts for Prod and Development).
Set File permissions based on the user
ProdFile, Only Prod users have access
-rw------- 1 vgunnuProd staff 1.1M Oct 6 2014 password.txt
Dev File, Giving group read access – To share the file between team members
-rw-r----- 1 vgunnuDev staff 1.1M Oct 6 2014 password.txt sqoop import --connect jdbc:mysql://mapr.com/sqoop \ --username sqoop \ --password-file /mapr/democluster/opt/passwords/prodpass.txt --table vgunnu
2. Read Passwords from the Database
Maintain Dev and PROD database tables, and leverage MySQL grant premissions to lock down PROD password tables in order to give access to certain users from Prod edgenodes.
# Read password from database # Select Password from mapr.Prod where applicationdb='mapr.vgunnu' echo -n $Password > /mapr/democluster/opt/passwords/prodpass.txt sqoop import --connect jdbc:mysql://mapr.com/sqoop \ --username vgunnu \ --password-file /mapr/democluster/opt/passwords/prodpass.txt --table vgunnu rm /mapr/democluster/opt/passwords/prodpass.txt
3. Use the Database and Expect Script
In the above option, instead of saving the password to file, we can use a linux spawn and expect script.
#!/bin/bash # Read password from database # Select Password from mapr.Prod where applicationdb='mapr.vgunnu' # Read password to variable pwd /usr/bin/expect «EOF spawn sqoop import --connect jdbc:mysql://mapr.com/sqoop --username vgunnu -P --table vgunnu expect "Enter password:" send "$pwd\r\n" set timeout -1 # Wait for Sqoop to finish expect "~~~~~~~~~~~~" wait EOF
4. Use the Hadoop CredentialProvider API
In Hadoop 2.6, a fundamental feature was introduced – the Credential API. The CredentialProvider API in Hadoop allows for the separation of applications and how they store their required passwords/secrets. With Sqoop 1.4.5, the credentail API keystore is supported by Sqoop.
To generate an encrypted credentail keystore:
[mapr@maprdemo ~]$ hadoop credential create msql.vgunnudb -provider jceks://maprfs/user/mapr/mysql.password.m.jceks Enter password: Enter password again: msql.vgunnudb has been successfully created. org.apache.hadoop.security.alias.JavaKeyStoreProvider has been updated. [mapr@maprdemo ~]$ hadoop credential list -provider jceks://maprfs/user/mapr/mysql.password.m.jceks Listing aliases for CredentialProvider: jceks://maprfs/user/mapr/mysql.password.m.jceks msql.vgunnudb sqoop import -Dhadoop.security.credential.provider.path=jceks://maprfs/user/mapr/mysql.password.m.jceks \ --connect jdbc:mysql://mapr.com/sqoop \ --username vgunnu \ --password-file /mapr/democluster/opt/passwords/prodpass.txt --table vgunnu \ -–password-alias msql.vgunnudb
In this blog post, you learned about the different options available for managing passwords in Sqoop. If you have any further questions, please ask them in the comments section below.
Reference: | Key Tips for Managing Passwords in Sqoop from our JCG partner Venkat Gunnu at the Mapr blog. |