overhauled the project to get away from files (a little)
parent a13f92548c
commit 2bb66f037a
1
.gitignore
vendored
|
@ -2,3 +2,4 @@ venv/
|
||||||
*.pyc
|
*.pyc
|
||||||
__pycache__/
|
__pycache__/
|
||||||
|
|
||||||
|
.env
|
||||||
|
|
27
.vscode/launch.json
vendored
Normal file
@ -0,0 +1,27 @@
{
    // Use IntelliSense to learn about possible attributes.
    // Hover to view descriptions of existing attributes.
    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python: Flask",
            "type": "python",
            "request": "launch",
            "module": "flask",
            "envFile": "${workspaceFolder}/.env",
            "env": {
                "FLASK_APP": "server.py",
                "FLASK_ENV": "development",
                "FLASK_DEBUG": "0"
            },
            "args": [
                "run",
                "--no-debugger",
                "--no-reload"
            ],
            "jinja": true,
            "justMyCode": true
        }
    ]
}
40
README.md
|
@ -1,26 +1,26 @@
|
||||||
# csv2ankicards
|
# json2ankicards
|
||||||
|
|
||||||
A comprehensive toolkit that offers:
|
A comprehensive toolkit that offers:
|
||||||
- Conversion of CSV files into Anki deck packages (.apkg files).
|
- Conversion of JSON files into Anki deck packages (.apkg files).
|
||||||
- Conversion of image files in a directory to a text file using Optical Character Recognition (OCR).
|
- Conversion of image files in a directory to a text file using Optical Character Recognition (OCR).
|
||||||
- Generation of CSV format question-answer pairs from textual content using OpenAI's GPT-3 model.
|
- Generation of JSON format question-answer pairs from textual content using OpenAI's `gpt-3.5-turbo` chat model.
|
||||||
- **RESTful API endpoint to upload and convert multiple images directly into an Anki deck package.**
|
- **RESTful API endpoint to upload and convert multiple images directly into an Anki deck package.**
|
||||||
|
|
||||||
## Features
|
## Features
|
||||||
|
|
||||||
- Converts a CSV file with questions and answers into an Anki deck package.
|
- Converts a JSON file with questions and answers into an Anki deck package.
|
||||||
- Converts image files from a specified directory to a single text file using OCR.
|
- Converts image files from a specified directory to a single text file using OCR.
|
||||||
- Generates CSV formatted question-answer pairs based on a given text content, ideal for studying or summarization.
|
- Generates JSON formatted question-answer pairs based on a given text content, ideal for studying or summarization.
|
||||||
- For CSV: there are only two columns in the CSV file, separated by the first comma encountered.
|
- For JSON: the file holds a single object with a "DeckTitle" string and a "Cards" array.
|
||||||
- CSV files should have a "Front" column for questions and a "Back" column for answers.
|
- Each entry in "Cards" should have a "Title", a "Question" (the front of the card), and an "Answer" (the back).
|
||||||
- **API endpoint that accepts multiple image uploads, processes them through the pipeline, and returns an Anki deck package.**
|
- **API endpoint that accepts multiple image uploads, processes them through the pipeline, and returns an Anki deck package.**
|
||||||
|
|
||||||
## Installation
|
## Installation
|
||||||
|
|
||||||
1. Clone this repository:
|
1. Clone this repository:
|
||||||
```bash
|
```bash
|
||||||
git clone https://git.rudefox.io/bj/anki-csv2ankicards.git
|
git clone https://git.rudefox.io/bj/anki-json2ankicards.git
|
||||||
cd csv2ankicards
|
cd json2ankicards
|
||||||
```
|
```
|
||||||
|
|
||||||
2. Set up a virtual environment and activate it:
|
2. Set up a virtual environment and activate it:
|
||||||
|
@ -36,7 +36,7 @@ A comprehensive toolkit that offers:
|
||||||
|
|
||||||
## Configuration
|
## Configuration
|
||||||
|
|
||||||
Before using the `text2csvdeck.py` script, ensure that you have set the `OPENAI_API_KEY` environment variable:
|
Before using the `prompt4cards.py` script, ensure that you have set the `OPENAI_API_KEY` environment variable:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
export OPENAI_API_KEY=your_openai_api_key_here
|
export OPENAI_API_KEY=your_openai_api_key_here
|
||||||
|
@ -75,7 +75,7 @@ To convert a directory of images directly to an Anki deck package:
|
||||||
python pipeline.py /path/to/your/image_directory/
|
python pipeline.py /path/to/your/image_directory/
|
||||||
```
|
```
|
||||||
|
|
||||||
This will process the images, extract text, convert text to a set of questions and answers in CSV format, and then produce an `output.apkg` file ready for import into Anki.
|
This will process the images, extract text, convert text to a set of questions and answers in JSON format, and then produce an `output.apkg` file ready for import into Anki.
|
||||||
|
|
||||||
### Image to Text Conversion
|
### Image to Text Conversion
|
||||||
|
|
||||||
|
@ -91,31 +91,31 @@ This will produce a `final.txt` file which contains the text extracted from the
|
||||||
|
|
||||||
Currently supported formats for the images are: `.png`, `.jpg`, and `.jpeg`.
|
Currently supported formats for the images are: `.png`, `.jpg`, and `.jpeg`.
|
||||||
|
|
||||||
### Text to CSV Deck Generation
|
### Text to JSON Deck Generation
|
||||||
|
|
||||||
To generate a CSV deck of question-answer pairs from a given text file:
|
To generate a JSON deck of question-answer pairs from a given text file:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
python text2csvdeck.py /path/to/your/textfile.txt
|
python prompt4cards.py /path/to/your/textfile.txt
|
||||||
```
|
```
|
||||||
|
|
||||||
This will analyze the content of the given text file and generate a corresponding `_deck.csv` file with questions and answers that capture the main points and themes of the text.
|
This will analyze the content of the given text file and write an `output_deck.json` file with questions and answers that capture the main points and themes of the text.
|
||||||
|
|
||||||
**Note:** This script uses the OpenAI GPT-3 model. Ensure you have the necessary API key and OpenAI Python client installed.
|
**Note:** This script uses the OpenAI GPT-3 model. Ensure you have the necessary API key and OpenAI Python client installed.
|
||||||
|
|
||||||
### CSV to Anki Conversion
|
### JSON to Anki Conversion
|
||||||
|
|
||||||
To convert a CSV file into an Anki deck package:
|
To convert a JSON file into an Anki deck package:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
python csv2ankicards.py /path/to/your/csvfile.csv output.apkg
|
python json2deck.py /path/to/your/jsonfile.json output.apkg
|
||||||
```
|
```
|
||||||
|
|
||||||
This will produce an `output.apkg` file which can then be imported into Anki.
|
This will produce an `output.apkg` file which can then be imported into Anki.
|
||||||
|
|
||||||
#### CSV Format
|
#### JSON Format
|
||||||
|
|
||||||
The CSV file should follow this format:
|
The JSON file should follow this format:
|
||||||
|
|
||||||
```
|
```
|
||||||
Front,Back
|
Front,Back
|
||||||
|
|
21
ankiai.py
Normal file
@ -0,0 +1,21 @@
import sys

from images2text import main as ocr_images
from prompt4cards import prompt_for_card_content, response_to_json
from json2deck import to_package


def images_to_package(directory_path, outfile="output.apkg"):  # default keeps one-argument callers working
    ocr_text = ocr_images(directory_path)
    response_text = prompt_for_card_content(ocr_text)
    deck_json = response_to_json(response_text)
    to_package(deck_json).write_to_file(outfile)
    print(f"Deck created at: {outfile}")


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python ankiai.py <directory_path_containing_images>")
        sys.exit(1)

    images_to_package(sys.argv[1])
@ -1,49 +0,0 @@
import csv
import genanki
import sys

# Create a new model for our cards. This is necessary for genanki.
MY_MODEL = genanki.Model(
    1607392319,
    "Simple Model",
    fields=[
        {"name": "Question"},
        {"name": "Answer"},
    ],
    templates=[
        {
            "name": "Card 1",
            "qfmt": "{{Question}}",
            "afmt": "{{FrontSide}}<hr id='answer'>{{Answer}}",
        },
    ])

def csv_to_anki(csv_path, output_path):
    with open(csv_path, 'r', encoding='utf-8') as f:
        reader = csv.reader(f)
        # Skipping the header row
        next(reader, None)

        my_deck = genanki.Deck(2059400110, "CSV Deck")
        for row in reader:
            # Use row directly without splitting
            question = row[0]
            answer = ",".join(row[1:])

            note = genanki.Note(
                model=MY_MODEL,
                fields=[question, answer]
            )
            my_deck.add_note(note)
        genanki.Package(my_deck).write_to_file(output_path)

if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: python convert.py <input_csv> <output_apkg>")
        sys.exit(1)

    input_csv = sys.argv[1]
    output_apkg = sys.argv[2]
    csv_to_anki(input_csv, output_apkg)
    print(f"Deck created at: {output_apkg}")
|
@ -80,7 +80,7 @@ def main(directory_path):
|
||||||
f.write("\n".join(final_text))
|
f.write("\n".join(final_text))
|
||||||
|
|
||||||
print(f"All images processed! Final output saved to {FINAL_OUTPUT}")
|
print(f"All images processed! Final output saved to {FINAL_OUTPUT}")
|
||||||
return FINAL_OUTPUT # Add this line
|
return "\n".join(final_text)  # Return the OCR text itself instead of the output file path
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
|
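
`final_text` is joined with newlines before it is written to disk, which suggests it is a list of per-image strings. The caller in `ankiai.py` interpolates the return value of `main` into a prompt with `str.format`, so it may help to see how a list and a joined string differ once formatted. A small sketch; `TEMPLATE` and `pages` are illustrative stand-ins, not names from the repository.

```python
TEMPLATE = "Summarize:\n{content}"
pages = ["First page text.", "Second page text."]

# Formatting a list inserts its repr, brackets and quotes included.
as_list = TEMPLATE.format(content=pages)

# Formatting the joined string inserts the plain text.
as_text = TEMPLATE.format(content="\n".join(pages))

print(as_list)
print(as_text)
```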
|
55
json2deck.py
Normal file
@ -0,0 +1,55 @@
import json
import genanki
import sys

# Create a new model for our cards. This is necessary for genanki.
MY_MODEL = genanki.Model(
    1607372319,
    "Simple Model",
    fields=[
        {"name": "Title"},
        {"name": "Question"},
        {"name": "Answer"},
    ],
    templates=[
        {
            "name": "{{Title}}",
            "qfmt": "{{Question}}",
            "afmt": "{{FrontSide}}<hr id='answer'>{{Answer}}",
        },
    ])

def json_file_to_package(json_path):
    with open(json_path, 'r', encoding='utf-8') as f:
        json_data = json.load(f)
        package = to_package(json_data)

    return package

def to_package(deck_json):
    deck_title = deck_json["DeckTitle"]
    deck = genanki.Deck(1607372319, deck_title)

    for card_json in deck_json["Cards"]:
        title = card_json["Title"]
        question = card_json["Question"]
        answer = card_json["Answer"]

        note = genanki.Note(
            model=MY_MODEL,
            fields=[title, question, answer]
        )

        deck.add_note(note)

    return genanki.Package(deck)

if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: python json2deck.py <input_json> <output_apkg>")
        sys.exit(1)

    input_json = sys.argv[1]
    output_apkg = sys.argv[2]
    json_file_to_package(input_json).write_to_file(output_apkg)
    print(f"Deck created at: {output_apkg}")
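
`to_package` reads `DeckTitle` and `Cards` from the deck object, and `Title`, `Question`, and `Answer` from each card, so a missing key surfaces as a `KeyError` partway through deck construction. The sketch below shows the expected shape and one possible pre-flight check. `validate_deck_json` is a hypothetical helper, not something the commit provides, and it uses only the standard library.

```python
import json


def validate_deck_json(deck_json):
    """Return a list of problems; an empty list means the shape looks usable."""
    problems = []
    title = deck_json.get("DeckTitle")
    if not isinstance(title, str) or not title:
        problems.append("DeckTitle must be a non-empty string")
    cards = deck_json.get("Cards")
    if not isinstance(cards, list) or not cards:
        problems.append("Cards must be a non-empty list")
        return problems
    for index, card in enumerate(cards):
        for key in ("Title", "Question", "Answer"):
            if not isinstance(card.get(key), str):
                problems.append(f"Cards[{index}] is missing a string {key!r}")
    return problems


# A deck in the shape that to_package reads.
good_deck = json.loads("""
{
  "DeckTitle": "State Capitals",
  "Cards": [
    {"Title": "New York",
     "Question": "What is the capital of New York?",
     "Answer": "Albany"}
  ]
}
""")

# A deck with no title and a card that lacks its answer.
bad_deck = {"Cards": [{"Title": "New York", "Question": "Capital?"}]}

print(validate_deck_json(good_deck))
print(validate_deck_json(bad_deck))
```

Running a check like this before building the deck would let a caller report every problem at once rather than stopping at the first missing key.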
|
27
pipeline.py
@ -1,27 +0,0 @@
import sys
import os

from images2text import main as images_to_text
from text2csvdeck import text_file_to_csv_deck

CSV_DECK_NAME = "output_deck.csv"
APKG_NAME = "output.apkg"


def pipeline(directory_path):
    # 1. Convert images in the directory to a text file
    text_file_name = images_to_text(directory_path)

    # 2. Convert the text file to a CSV deck using ChatGPT
    text_file_to_csv_deck(text_file_name)

    # 3. Convert the CSV deck to an Anki package
    os.system(f"python csv2ankicards.py {CSV_DECK_NAME} {APKG_NAME}")


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python pipeline.py <directory_path_containing_images>")
        sys.exit(1)

    pipeline(sys.argv[1])
104
prompt4cards.py
Normal file
@ -0,0 +1,104 @@
import openai
import sys
import os
import json

CHAT_MODEL = "gpt-3.5-turbo"
OUTPUT_FILENAME = "output_deck.json"

API_KEY = os.environ.get("OPENAI_API_KEY")
if not API_KEY:
    raise ValueError("Please set the OPENAI_API_KEY environment variable.")

openai.api_key = API_KEY

# Given prompt template
PROMPT_TEMPLATE = """
Please come up with a title for the deck and a set of 10 index cards for memorization,
including a title, front, and back for each card. The index cards should completely
capture the main points and themes of the text. In addition, they should contain any
numbers or data that humans might find difficult to remember. The goal of the index
card set is that one who memorizes it can provide a summary of the text to someone
else, conveying the main points and themes.

You will provide the deck title, and the titles, questions, and answers for each card
in a structured format as follows:
```
Deck Title: Title of the Deck
Cards:
- Title: Card Title 1
  Front: What is the capital of New York?
  Back: Albany
- Title: Card Title 2
  Front: Where in the world is Carmen San Diego?
  Back: Nobody knows
```

{content}
"""


def prompt_for_card_content(text_content):
    # Prepare the prompt
    prompt = PROMPT_TEMPLATE.format(content=text_content)

    # Get completion from the OpenAI ChatGPT API
    response = openai.ChatCompletion.create(
        model=CHAT_MODEL,
        messages=[
            {"role": "user", "content": prompt}
        ],
        temperature=0,
    )

    # Extract the generated text from the response
    return response.choices[0]['message']['content']


def response_to_json(response_text):
    lines = [line.strip() for line in response_text.split("\n") if line.strip()]

    deck_title = None
    cards = []
    current_card = {}

    for line in lines:
        if "Deck Title:" in line and not deck_title:
            deck_title = line.split("Deck Title:", 1)[1].strip()
        elif "Title:" in line:
            if current_card:  # If there's a card being processed, add it to cards
                cards.append(current_card)
                current_card = {}
            current_card["Title"] = line.split("Title:", 1)[1].strip()
        elif "Front:" in line:
            current_card["Question"] = line.split("Front:", 1)[1].strip()
        elif "Back:" in line:
            current_card["Answer"] = line.split("Back:", 1)[1].strip()

    if current_card:  # Add the last card if it exists
        cards.append(current_card)

    return {
        "DeckTitle": deck_title,
        "Cards": cards
    }


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python prompt4cards.py <text_file_path>")
        sys.exit(1)

    text_file_path = sys.argv[1]

    # Read the text content
    with open(text_file_path, 'r') as file:
        text_content = file.read()

    response_text = prompt_for_card_content(text_content)
    deck_json = response_to_json(response_text)

    with open(OUTPUT_FILENAME, 'w') as json_file:
        json.dump(deck_json, json_file)

    print(f"Saved generated deck to {OUTPUT_FILENAME}")
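
`response_to_json` matches its keywords anywhere in a line, so a question whose text happens to contain `Title:` would be read as the start of a new card. One possible hardening, sketched here under the assumption that the model sticks to the `Key: value` layout from the prompt, is to split each line on its first colon and compare the key exactly. `parse_deck_response` is a hypothetical name and not part of the commit.

```python
def parse_deck_response(response_text):
    """Parse 'Key: value' lines, matching keys only at the start of a line."""
    deck_title = None
    cards = []
    current_card = None

    for raw_line in response_text.splitlines():
        line = raw_line.strip()
        if line.startswith("- "):          # drop the list bullet, if any
            line = line[2:].strip()
        if not line or line.startswith("```"):
            continue

        key, sep, value = line.partition(":")
        if not sep:
            continue                        # not a 'Key: value' line
        key, value = key.strip(), value.strip()

        if key == "Deck Title" and deck_title is None:
            deck_title = value
        elif key == "Title":
            current_card = {"Title": value}
            cards.append(current_card)
        elif key == "Front" and current_card is not None:
            current_card["Question"] = value
        elif key == "Back" and current_card is not None:
            current_card["Answer"] = value

    return {"DeckTitle": deck_title, "Cards": cards}


sample = """Deck Title: Card Anatomy
Cards:
- Title: Fields
  Front: What does the Title: line of a card hold?
  Back: A short label for the card
"""

deck = parse_deck_response(sample)
print(deck["Cards"][0]["Question"])
```

The output keeps the same `DeckTitle` and `Cards` shape that `to_package` reads, so a parser along these lines could be swapped in without touching the deck-building code.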
|
|
@ -4,7 +4,7 @@ import os
|
||||||
import tempfile
|
import tempfile
|
||||||
import shutil
|
import shutil
|
||||||
|
|
||||||
from pipeline import pipeline
|
from ankiai import images_to_package
|
||||||
|
|
||||||
app = Flask(__name__)
|
app = Flask(__name__)
|
||||||
|
|
||||||
|
@ -35,7 +35,7 @@ def deck_from_images():
|
||||||
save_uploaded_images(images, TEMP_DIR)
|
save_uploaded_images(images, TEMP_DIR)
|
||||||
|
|
||||||
try:
|
try:
|
||||||
pipeline(TEMP_DIR)
|
images_to_package(TEMP_DIR, OUTPUT_FILE)
|
||||||
return send_from_directory('.', OUTPUT_FILE, as_attachment=True)
|
return send_from_directory('.', OUTPUT_FILE, as_attachment=True)
|
||||||
except Exception as e: # Consider catching more specific exceptions
|
except Exception as e: # Consider catching more specific exceptions
|
||||||
return jsonify({'error': str(e)}), 500
|
return jsonify({'error': str(e)}), 500
|
||||||
|
|
@ -1,70 +0,0 @@
import openai
import sys
import os

CHAT_MODEL = "gpt-3.5-turbo"
OUTPUT_FILENAME = "output_deck.csv"

API_KEY = os.environ.get("OPENAI_API_KEY")
if not API_KEY:
    raise ValueError("Please set the OPENAI_API_KEY environment variable.")

openai.api_key = API_KEY

# Given prompt template
PROMPT_TEMPLATE = """
Please come up with a set of 10 index cards for memorization, including front and back.
The index cards should completely capture the main points and themes of the text.
In addition, they should contain any numbers or data that humans might find difficult to remember.
The goal of the index card set is that one who memorizes it can provide a summary of the text to someone else, conveying the main points and themes.

You will provide the questions and answers to me in CSV format, as follows:
```
Front,Back
What is the capital of New York?,Albany
Where in the world is Carmen San Diego?,Nobody knows
```

The question/answer pairs shall not be numbered or contain any signs of being ordered.

{content}
"""

def text_file_to_csv_deck(text_file_path):

    # Read the text content
    with open(text_file_path, 'r') as file:
        text_content = file.read()

    content_to_csv(text_content)


def content_to_csv(text_content):

    # Prepare the prompt
    prompt = PROMPT_TEMPLATE.format(content=text_content)

    # Get completion from the OpenAI ChatGPT API
    response = openai.ChatCompletion.create(
        model=CHAT_MODEL,
        messages=[
            {"role": "user", "content": prompt}
        ],
        temperature=0,
    )

    # Extract CSV content from response and save to a new file
    csv_content = response.choices[0]['message']['content']

    with open(OUTPUT_FILENAME, 'w') as csv_file:
        csv_file.write(csv_content)

    print(f"Saved generated deck to {OUTPUT_FILENAME}")


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python text2csvdeck.py <text_file_path>")
        sys.exit(1)

    text_file_to_csv_deck(sys.argv[1])