import re
from html.parser import HTMLParser
# Matches the href value of <a> tags in a post's HTML content.
url_re = re.compile(r'<a\s*href=[\'"](.*?)[\'"].*?>')


class HTMLFilter(HTMLParser):
    """Flattens HTML into plain text, keeping <br> and <p> as line breaks."""

    text = ""
    first_paragraph = True

    def handle_data(self, data):
        self.text += data

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self.text += "\n"
        elif tag == "p":
            # Separate paragraphs with a blank line, but don't lead with one.
            if self.first_paragraph:
                self.first_paragraph = False
            else:
                self.text += "\n\n"


def html_filter(data):
    f = HTMLFilter()
    f.feed(data)
    return f.text
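
# Illustrative sketch, not from the original source: html_filter flattens the HTML
# body of a status into readable plain text. The sample markup below is an assumed
# example of typical Mastodon API output, not a captured response.
#
#   html_filter('<p>Hello</p><p>world<br>again</p>')
#   => 'Hello\n\nworld\nagain'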


def find_item(item, listItems):
    # Return the index of item in listItems, matching either the item itself or,
    # for boosts, the post it reblogs; return None when there is no match.
    for i in range(len(listItems)):
        if listItems[i].id == item.id:
            return i
        if hasattr(item, "reblog") and item.reblog is not None and item.reblog.id == listItems[i].id:
            return i
    return None
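
# Illustrative sketch, not from the original source: a boost of an already-listed
# post resolves to the index of that post. The objects here are stand-ins built
# with types.SimpleNamespace, an assumption for the example.
#
#   from types import SimpleNamespace
#   post = SimpleNamespace(id=1, reblog=None)
#   boost = SimpleNamespace(id=2, reblog=SimpleNamespace(id=1))
#   find_item(boost, [post])
#   => 0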


def is_audio_or_video(post):
    if post.reblog is not None:
        return is_audio_or_video(post.reblog)
    # Check the Mastodon-native attachments for video or audio.
    for media in post.media_attachments:
        if media["type"] == "video" or media["type"] == "audio":
            return True
    return False


def is_image(post):
    if post.reblog is not None:
        return is_image(post.reblog)
    # Check the Mastodon-native attachments for images or gifs.
    for media in post.media_attachments:
        if media["type"] == "gifv" or media["type"] == "image":
            return True
    return False


def get_media_urls(post):
    # Collect the audio/video attachment URLs of a post (or of the post it reblogs).
    if hasattr(post, "reblog") and post.reblog is not None:
        return get_media_urls(post.reblog)
    urls = []
    for media in post.media_attachments:
        if media.get("type") == "audio" or media.get("type") == "video":
            # Prefer the remote URL, falling back to the local "url" field.
            url_keys = ["remote_url", "url"]
            for url_key in url_keys:
                if media.get(url_key) is not None:
                    urls.append(media.get(url_key))
                    break
    return urls
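
# Illustrative sketch, not from the original source: media_attachments entries are
# dict-like, so a minimal fake post (built with types.SimpleNamespace, an assumption
# for the example) is enough to exercise get_media_urls.
#
#   from types import SimpleNamespace
#   media = {"type": "video", "remote_url": None, "url": "https://example.com/clip.mp4"}
#   get_media_urls(SimpleNamespace(reblog=None, media_attachments=[media]))
#   => ['https://example.com/clip.mp4']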


def find_urls(post, include_tags=False):
    # Extract link targets from a post's HTML content, optionally dropping the
    # hashtag links Mastodon generates for each tag.
    urls = url_re.findall(post.content)
    if not include_tags:
        for tag in post.tags:
            # Iterate over a copy so entries can be removed safely.
            for url in urls[:]:
                if url.lower().endswith("/tags/" + tag["name"]):
                    urls.remove(url)
    return urls
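

if __name__ == "__main__":
    # Minimal self-check sketch, not part of the original module: the fake status
    # below only mimics the attributes these helpers touch (content, tags, reblog,
    # media_attachments); real Mastodon.py entities carry many more fields.
    from types import SimpleNamespace

    sample = SimpleNamespace(
        reblog=None,
        content='<p>Read <a href="https://example.com/post">this</a> '
                '<a href="https://example.social/tags/python">#python</a></p>',
        tags=[{"name": "python"}],
        media_attachments=[],
    )

    print(html_filter(sample.content))           # "Read this #python"
    print(find_urls(sample))                     # hashtag link filtered out
    print(find_urls(sample, include_tags=True))  # both links kept
    print(is_image(sample), is_audio_or_video(sample))  # False False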