RUN .sql file from command line syntax error [duplicate] - postgresql

I'm running this command from PostgreSQL 9.4 on Windows 8.1:
psql -d dbname -f filenameincurrentdirectory.sql
The sql file has, for example, these commands:
INSERT INTO general_lookups ("name", "old_id") VALUES ('Open', 1);
INSERT INTO general_lookups ("name", "old_id") VALUES ('Closed', 2);`
When I run the psql command, I get this error message:
psql:filenameincurrentdirectory.sql:1: ERROR: syntax error at or near "ÿ_I0811a2h1"
LINE 1: ÿ_I0811a2h1 ru
How do I import a file of SQL commands using psql?
I have no problems utilizing pgAdmin in executing these sql files.

If your issue is a BOM (byte order mark), another option is sed. It is also kind of nice because, if a BOM is not your issue, it is non-destructive to your data. Download and install sed for Windows:
http://gnuwin32.sourceforge.net/packages/sed.htm
The package called "Complete package, except sources" contains additional required libraries that the "Binaries" package doesn't.
Once sed is installed run this command to remove the BOM from your file:
sed -i '1 s/^\xef\xbb\xbf//' filenameincurrentdirectory.sql
Particularly useful if your file is too large for Notepad++.
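If you want to confirm first that a BOM really is the culprit before rewriting anything, a quick diagnostic is to look at the first bytes of the file. This is just a sketch (the file name is the one from the question); as a hint, the ÿ at the start of the reported error is often how the 0xFF byte of a UTF-16 little-endian BOM shows up.
# Sketch: check whether the file starts with a UTF-8 or UTF-16 byte order mark.
with open('filenameincurrentdirectory.sql', 'rb') as f:
    head = f.read(4)
if head.startswith(b'\xef\xbb\xbf'):
    print('UTF-8 BOM')
elif head[:2] in (b'\xff\xfe', b'\xfe\xff'):
    print('UTF-16 BOM')
else:
    print('no BOM, first bytes: %r' % (head,))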

Okay, the problem does have to do with BOM, byte order marker. The file was generated by Microsoft Access. I opened the file in Notepad and saved it as UTF-8 instead of Unicode since Windows saves UTF-16 by default. That got this error message:
psql:filenameincurrentdirectory.sql:1: ERROR: syntax error at or near "INSERT"
LINE 1: INSERT INTO general_lookups ("name", "old_id" ) VAL...
I then learned from another website that Postgres doesn't utilize the BOM and that Notepad doesn't allow users to save without a BOM. So I had to download Notepad++, set the encoding to UTF-8 without BOM, save the file, and then import it. Voila!
An alternative to using Notepad++ is this little python script I wrote. Simply pass in the file name to convert.
import sys

# Convert a UTF-16 file (what Windows calls "Unicode") to UTF-8 in place.
if len(sys.argv) == 2:
    with open(sys.argv[1], 'rb') as source_file:
        contents = source_file.read()
    with open(sys.argv[1], 'wb') as dest_file:
        dest_file.write(contents.decode('utf-16').encode('utf-8'))
else:
    print "Please pass in a single file name to convert."

Related

How to pass string via STDIN into terminal command being executed within python script?

I need to generate a postgres schema from a dataframe. I found the csvkit library to come closest to matching the datatypes. I can run csvkit and generate a postgres schema from a csv on my desktop via the terminal using this command from the docs:
csvsql -i postgresql myFile.csv
csvkit docs - https://csvkit.readthedocs.io/en/stable/scripts/csvsql.html
And I can run the terminal command in my script via this code:
import os
a=os.popen("csvsql -i postgresql Desktop/myFile.csv").read()
However, I have a dataframe that I have converted to a csv string, and I need to generate the schema from that string, like so:
csvstr = df.to_csv()
In the docs it says that under positional arguments:
The CSV file(s) to operate on. If omitted, will accept input on STDIN.
How do I pass my variable csvstr into the line of code a=os.popen("csvsql -i postgresql csvstr").read() as a variable?
I tried the line of code below but got the error OSError: [Errno 7] Argument list too long: '/bin/sh':
a=os.popen("csvsql -i postgresql {}".format(csvstr)).read()
Thank you in advance
You can't pass such a big string via the command line! You have to save the data to a file and pass its path to csvsql.
csvstr = df.to_csv()
with open('my_cool_df.csv', 'w', newline='') as csvfile:
    # csvstr is already CSV-formatted text, so write it out as-is
    csvfile.write(csvstr)
And later:
a=os.popen("csvsql -i postgresql my_cool_df.csv")

How to gzip file with unicode encoding using linux cmd prompt?

I have a large tsv format file (30GB). I have to load all of that data into Google BigQuery, so I split the file into smaller chunks, gzipped all of those chunk files, and moved them to Google Cloud Storage. After that I call the Google BigQuery API to load the data from GCS, but I am facing the following encoding error:
file_data.part_0022.gz: Error detected while parsing row starting at position: 0. Error: Bad character (ASCII 0) encountered. (error code: invalid)
I am using the following unix commands in my python code for the splitting and gzip tasks:
cmd = [
    "split",
    "-l", "300000",
    "-d",
    "-a", "4",
    "%s%s" % (<my-dir>, file_name),
    "%s/%s.part_" % (<my temp dir>, file_prefix)
]
code = subprocess.check_call(cmd)

cmd = 'gzip %s%s/%s.part*' % (<my temp dir>, file_prefix, file_prefix)
logging.info("Running shell command: %s" % cmd)
code = subprocess.Popen(cmd, shell=True)
code.communicate()
The files are successfully split and gzipped (file_data.part_0001.gz, file_data.part_0002.gz, etc.), but when I try to load these files into BigQuery it throws the above error. I understand it is an encoding issue.
Is there any way to re-encode the files during the split and gzip operations? Or do we need to use a python file object to read line by line, do the unicode encoding, and write it out to a new gzip file (the pythonic way)?
Reason:
Error: Bad character (ASCII 0) encountered
This clearly states you have unicode (UTF-16) characters there which cannot be decoded.
The BigQuery service only supports UTF-8 and latin1 text encodings, so the file is supposed to be UTF-8 encoded.
Solution (I haven't tested it): use the -a or --ascii flag with the gzip command. It'll then be decoded ok by bigquery.
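If the gzip flag doesn't do it, the "pythonic way" mentioned in the question could look roughly like this. A sketch that assumes the source chunks are UTF-16 (adjust the encoding to whatever the dump actually uses) and uses placeholder file names:
import gzip

# Sketch: re-encode one split chunk to UTF-8 while gzipping it, line by line.
src_path = 'file_data.part_0001'        # placeholder paths
dest_path = 'file_data.part_0001.gz'

with open(src_path, 'r', encoding='utf-16') as src, \
        gzip.open(dest_path, 'wt', encoding='utf-8') as dest:
    for line in src:
        dest.write(line)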

How to import a text file with '|' delimited data to PostgreSQL database? [closed]

I have a text file with | delimited data that I want to import to a table in a PostgreSQL database. PgAdminIII only exports CSV files. I converted the file to a CSV file but was still unsuccessful in importing the data into the PostgreSQL database. It says an error has occurred:
Extra data after last expected column.
CONTEXT: COPY <file1>, line 1:
What am I doing wrong here?
Using the standard psql shell you can do this:
\copy table_name from 'filename' delimiter '|'
In the shell you can do
\h copy
to see more options and the complete syntax. Of course the manual about COPY is also worthwhile reading.

How to import csv data into postgres table

I tried to import csv file data into a postgres table by running the following line as a pgScript in pgAdmin:
\copy users_page_rank FROM E'C:\\Users\\GamulinN\\Desktop\\users page rank.csv' DELIMITER ';' CSV
it returned an error:
[ERROR ] 1.0: syntax error, unexpected character
Does anyone know what could be wrong here? I checked this post but couldn't figure out what the problem is.
To import a file into postgres with COPY you need one of the following:
1) Connect with psql to the DB and run your command:
\copy users_page_rank FROM E'C:\\Users\\GamulinN\\Desktop\\users page rank.csv' DELIMITER ';' CSV
It will copy the file from the current (client) computer into the table. Details here.
2) Connect with any tool to the DB and run this SQL script:
COPY users_page_rank FROM E'C:\\Users\\GamulinN\\Desktop\\users page rank.csv' DELIMITER ';' CSV
It will copy the file from the server running postgres into the table. Details here. (With this command you can only COPY from files that the postgres server process can read on the database server, so you may need to transfer the file there first.)

PyMySQL UnicodeEncodeError; python shell successes but cmd fails

I'm new to the pymysql module and trying to learn it. I have some simple code:
import pymysql

conn = pymysql.connect(host="127.0.0.1",
                       port=8080,
                       user="root",
                       passwd="mysql",
                       db="world",
                       charset="utf8",
                       use_unicode=True)
cur = conn.cursor()
cur.execute("SELECT * FROM world.city")
for line in cur:
    print(line)
cur.close()
conn.close()
I'm using Python Tools for Visual Studio. When I execute the code, it fails with this error:
Traceback (most recent call last):
  File "C:\Program Files (x86)\Microsoft Visual Studio 11.0\Common7\IDE\Extensions\Microsoft\Python Tools for Visual Studio\1.5\visualstudio_py_debugger.py", line 1788, in write
    self.old_out.write(value)
  File "C:\Python32\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 6-7: character maps to <undefined>
The failing line contains the city name: ´s-Hertogenbosch
I thought that maybe it's a problem related to cmd output, so I switched to the python shell, and there my script runs without any error.
So what is the problem I'm facing?
How can I solve it?
I really want to use Python Tools for Visual Studio, so answers that enable me to use PTVS will be most welcomed.
The problem probably is that the output encoding of the environment is set to cp437, and the unicode character cannot be converted to that encoding when doing print(line), which probably translates to the self.old_out.write(value) call.
Try replacing the print() inside the loop with writing to a file, like:
with open('myoutput.txt', 'w', encoding='utf-8') as f:
    for line in cur:
        f.write(line)
Well, but the cursor does not return a string line. It returns a row (I guess a tuple) of elements. Because of that you probably have to do something like this:
with open('myoutput.txt', 'w', encoding='utf-8') as f:
    for row in cur:
        f.write(repr(row))
This may be enough for diagnostic purposes. If you need a nicer string, you have to format it in some specific way.
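For example, one tab-separated line per result row could be written like this (just a sketch):
# Sketch: one tab-separated UTF-8 line per result row instead of repr().
with open('myoutput.txt', 'w', encoding='utf-8') as f:
    for row in cur:
        f.write('\t'.join(str(col) for col in row) + '\n')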
Also, you wrote:
charset="utf8",
use_unicode=True)
If the charset is used, then use_unicode=True can be left out (it is implied by using the charset). If I recall correctly, charset='utf8' is not a recognized encoding name for Python; you have to use charset='utf-8', i.e. with a dash or underscore between utf and 8. Correction: utf8 probably works, as it is one of the aliases.
UPDATE based on comments...
As the output to a file seems to be OK, the problem is related to the capabilities of the window used for the output of the print command. As cmd knows only cp437, you have to either use another window (like a Unicode-capable window of some GUI), or tell cmd to use another encoding. See the experience of others. Basically, you have to tell the console:
chcp 65001
to change the accepted output encoding to UTF-8, or you can use another (non-Unicode) encoding that supports the wanted characters. Also, the console font should be capable of displaying the characters (i.e. contain the glyphs, the images of the characters).
My guess is the data you're receiving is not in unicode despite the fact that your python script is trying to encode it in Unicode.
I would check the database and table specific charset & collation settings. utf8 & utf8_general_ci are your friends.
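A sketch of how those settings could be inspected over the same pymysql connection (variable names as in the question's script):
# Sketch: list the server's charset and collation variables to spot non-utf8 settings.
cur = conn.cursor()
cur.execute("SHOW VARIABLES LIKE 'character_set%'")
for name, value in cur.fetchall():
    print(name, value)
cur.execute("SHOW VARIABLES LIKE 'collation%'")
for name, value in cur.fetchall():
    print(name, value)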
