[SOLVED] Postgresql rule on view and concurrency

Ramurd · 05-19-2016, 08:57 AM

I have created a database with the following code:

Code:

CREATE TABLE IF NOT EXISTS public.data_table
(
        id integer NOT NULL DEFAULT nextval('data_table_id_seq'::regclass),
        reference_type character varying(40) NOT NULL,
        reference_value character varying(100) NOT NULL,
        CONSTRAINT data_table_pkey PRIMARY KEY (id)
);

CREATE TABLE IF NOT EXISTS public.crossref
(
        source_ref integer NOT NULL,
        target_ref integer NOT NULL,
        CONSTRAINT crossref_pkey PRIMARY KEY (source_ref, target_ref),
        CONSTRAINT crossref_source_ref_fkey FOREIGN KEY (source_ref)
                REFERENCES public.data_table (id) MATCH SIMPLE
                ON UPDATE CASCADE ON DELETE CASCADE,
        CONSTRAINT crossref_target_ref_fkey FOREIGN KEY (target_ref)
                REFERENCES public.data_table (id) MATCH SIMPLE
                ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE INITIALLY IMMEDIATE
);

CREATE INDEX IF NOT EXISTS fki_crossref_target_ref_fkey
  ON public.crossref USING btree (target_ref);

-- this function will be called by the view
-- and assures the reference / value pair exists and will return the id.
CREATE OR REPLACE FUNCTION get_refid(_type varchar, _value varchar)
RETURNS integer AS
$BODY$DECLARE
        ref_id INTEGER;
BEGIN
        IF _type IS NULL OR _value IS NULL
        THEN
                RAISE EXCEPTION 'None of the arguments to get_refid(type,value) may be NULL';
        ELSE
                INSERT INTO data_table(reference_type,reference_value) VALUES (_type,_value) ON CONFLICT DO NOTHING;
                SELECT INTO ref_id id FROM data_table WHERE reference_type=_type AND reference_value=_value;
        END IF;
        RETURN ref_id;
EXCEPTION WHEN unique_violation THEN
        SELECT INTO ref_id id FROM data_table WHERE reference_type=_type AND reference_value=_value;
        RETURN ref_id;
END;$BODY$
LANGUAGE plpgsql VOLATILE;


-- view (supports select + insert; update will come in due time)
CREATE OR REPLACE VIEW public.crossview AS 
        SELECT xr1.reference_type AS source_reference_type,
                xr1.reference_value AS source_reference_value,
                xr2.reference_type AS target_reference_type,
                xr2.reference_value AS target_reference_value,
                crossref.valid_from,
                crossref.valid_to
        FROM crossref
        JOIN data_table xr1 ON crossref.source_ref = xr1.id
        JOIN data_table xr2 ON crossref.target_ref = xr2.id;

CREATE RULE crossview_insert AS
ON INSERT TO crossview 
DO INSTEAD
        INSERT INTO crossref(source_ref, target_ref) VALUES (get_refid(NEW.source_reference_type, NEW.source_reference_value), get_refid(NEW.target_reference_type,NEW.target_reference_value));

This does get the job done. I noticed double entries on the data table (added a unique index on reference_type + reference_value, as is actually logical.

Now, single simple inserts on the view work as intended; However, something is not entirely as I would like it:

If I do, for example:

Code:

INSERT INTO crossview(source_reference_type,source_reference_value,target_reference_type,target_reference_value)
VALUES
('name','John','street', 'livestreet'),
('name','John','city','atown'),
('name','John','hobby','atoms');

Without unique constraint, "name" + "John" will appear multiple times in data_table; however, with the unique constraint only once (of course); however: the id skips up.. meaning nextval is called several times by the stored procedure... What is the correct way to go around and solve this? (yes, id is only 'a number'; yet: it does not feel clean... but can't really put my hands on it.

NevemTeve · 05-20-2016, 04:46 AM

Yes there will be gaps in the generated numbers, but it hurts no-one.

sundialsvcs · 05-20-2016, 06:50 AM

Agree. In quite a few systems I've worked with over the years, the practice was to generate primary-keys at random, ostensibly to keep the index-trees in better balance but maybe just to emphasize that primary-key values must "mean nothing at all." You do not and must not care that the numbers are, or aren't, consecutive ... only that they are unique.

Ramurd · 05-24-2016, 02:01 AM

fair enough :-) Was just thinking that maybe there was, is a better approach to ensure no conflicts occur and things happen a bit cleaner.

Ramurd · 05-30-2016, 12:39 PM

In fact there was a small improvement to be made (actually, my fear was/is more than I run out of integers, rather than the gaps themselves; but then again, maybe I shouldn't have used the serial type for that.

Anyway the improvement is rather than insert and wait for the exception, is first lookup and act on "IF NOT FOUND" and then INSERT. Due to concurrent invocations of this procedure, I should still handle an exception and try to do the select again. So far this seems to work fine:

Code:

REATE OR REPLACE FUNCTION public.get_refid(
    _type character varying,
    _value character varying)
  RETURNS integer AS
$BODY$DECLARE
	ref_id INTEGER;
	_tid INTEGER;
BEGIN
	IF _value IS NULL
	THEN
		RAISE EXCEPTION 'get_refid() value may not be null.';
	ELSE
		_tid := get_typeid(_type);
		SELECT INTO ref_id id FROM ref_data WHERE type_id=_tid AND reference_value=_value;
		IF ref_id IS NULL
		THEN
			INSERT INTO ref_data(type_id,reference_value,create_ts) VALUES (_tid,_value,now()) ON CONFLICT DO NOTHING;
			SELECT INTO ref_id id FROM ref_data WHERE type_id=_tid AND reference_Value=_value;
		END IF;
	END IF;
	RETURN ref_id;
EXCEPTION WHEN unique_violation THEN
	SELECT INTO ref_id id FROM ref_data WHERE reference_type=_tid AND reference_value=_value;
	RETURN ref_id;
END;$BODY$
  LANGUAGE plpgsql;

sundialsvcs · 05-31-2016, 01:19 PM

I suggest that your first approach was much more efficient and that you ought not be concerned at all about "gaps in the sequence."

Your "if not found" strategy has a race condition, unless you use a transaction with a very-restrictive isolation level.

Your original "insert multiple rows" strategy, while it still must be performed "in a transaction," could tolerate a much more liberal level.

No one, apart from the database itself, should even know(!) what a "primary key" value is! If your application requires "an externally available unique identifier," it should generate such a value ... or, values ... by some means, and store them separately. Any external application that needs to refer to a particular row must use this "guaranteed unique, but otherwise nonsensical," identifier. ("UUID" a.k.a. "GUID" fields are very-commonly used for this purpose.)

Ramurd · 06-01-2016, 09:06 AM

Well, it was not the gaps in themselves that I felt could be an issue; rather: since the serial type is limited I feared the end of this sequence would be reached in a relatively short term. For some reason (can't really find the exact why right now) the get_ functions would be called so often, that I noticed huge gaps (of an order of 100k) within a few hours. That got me wondering if the sequence would likely recycle and if that would break the database.

Should've thought about using a UUID; I can easily create one by adding the extension pgcrpyto and calling gen_random_uuid()
The only thing is that I'll have to alter the schema a bit to use these uuids rather than the sequence and integers. But the gain is obvious. Thanks for pointing that out.

Edit: fixed most now, I think:
Altered the tables

Code:

DROP VIEW xreference_view; -- because the view relies on the tables that get recreated.

DROP TABLE ref_types;
CREATE TABLE ref_types
(
        id uuid NOT NULL DEFAULT gen_random_uuid(),
        reference_type character varying(40) NOT NULL,
        CONSTRAINT ref_types_pkey PRIMARY KEY (id)
);

DROP TABLE ref_data;
CREATE TABLE ref_data
(
        id uuid NOT NULL DEFAULT gen_random_uuid(),
        type_id uuid NOT NULL,
        reference_value character varying(100) NOT NULL,
        create_ts timestamp without time zone,
        CONSTRAINT ref_data_pkey PRIMARY KEY (id),
        CONSTRAINT type_id_fkey FOREIGN KEY (type_id) REFERENCES ref_types (id) MATCH SIMPLE ON UPDATE CASCADE ON DELETE CASCADE
);

CREATE TABLE xreference
(
        source_ref uuid NOT NULL,
        target_ref uuid NOT NULL,
        valid_from date,
        valid_to date,
        CONSTRAINT xreference_pkey PRIMARY KEY (source_ref, target_ref),
        CONSTRAINT xreference_source_ref_fkey FOREIGN KEY (source_ref) REFERENCES ref_data (id) MATCH SIMPLE ON UPDATE CASCADE ON DELETE CASCADE,
        CONSTRAINT xreference_target_ref_fkey FOREIGN KEY (target_ref) REFERENCES ref_data (id) MATCH SIMPLE ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE INITIALLY IMMEDIATE
);

CREATE OR REPLACE VIEW xreference_view AS 
SELECT 
        xr1.reference_type AS source_reference_type,
        xr1.reference_value AS source_reference_value,
        xr2.reference_type AS target_reference_type,
        xr2.reference_value AS target_reference_value,
        xreference.valid_from,
        xreference.valid_to
FROM xreference
        JOIN ( 
                SELECT 
                        ref_data.id, 
                        xt1.reference_type,
                         ref_data.reference_value
                FROM ref_data
                JOIN ref_types xt1 ON ref_data.type_id = xt1.id
        ) xr1 ON xreference.source_ref = xr1.id
        JOIN ( 
                SELECT 
                        ref_data.id,
                        xt1.reference_type,
                        ref_data.reference_value
                FROM ref_data
                        JOIN ref_types xt1 ON ref_data.type_id = xt1.id
        ) xr2 ON xreference.target_ref = xr2.id;

Instead of a set of rules, instead added a trigger:

Code:

CREATE OR REPLACE FUNCTION update_xreference_view()
RETURNS trigger AS
$BODY$
BEGIN
	--
	-- Perform the required operation on xreference_view. 
	--
	IF (TG_OP = 'DELETE') THEN
		DELETE FROM xreference
		WHERE 
			xreference.source_ref = get_refid(old.source_reference_type, old.source_reference_value) 
		AND	xreference.target_ref = get_refid(old.target_reference_type, old.target_reference_value);

		IF NOT FOUND 
		THEN 
			RETURN NULL; 
		END IF;

		RETURN OLD;
	ELSIF (TG_OP = 'UPDATE') THEN
		UPDATE xreference 
		SET 
			source_ref = get_refid(new.source_reference_type, new.source_reference_value), 
			target_ref = get_refid(new.target_reference_type, new.target_reference_value), 
			valid_from = NEW.valid_from, 
			valid_to = NEW.valid_to
		WHERE 
			xreference.source_ref = get_refid(old.source_reference_type, old.source_reference_value) 
		AND	xreference.target_ref = get_refid(old.target_reference_type, old.target_reference_value);

		RETURN NEW;
	ELSIF (TG_OP = 'INSERT') THEN
		INSERT INTO xreference(source_ref, target_ref, valid_from, valid_to) 
		VALUES (
			get_refid(NEW.source_reference_type, NEW.source_reference_value), 
			get_refid(NEW.target_reference_type,NEW.target_reference_value), 
			NEW.valid_from, NEW.valid_to)
		ON CONFLICT ON CONSTRAINT xreference_pkey DO 
		UPDATE SET 
			valid_from=EXCLUDED.valid_from, 
			valid_to=EXCLUDED.valid_to;

		RETURN NEW;
        END IF;
    END;
$BODY$
LANGUAGE plpgsql;

CREATE TRIGGER xreference_update
        INSTEAD OF INSERT OR UPDATE OR DELETE
        ON xreference_view
        FOR EACH ROW EXECUTE PROCEDURE update_xreference_view();

and the procedures to create/find the right identifiers were modified to use UUID rather than integer and safe in regard to the race condition:

Code:

DROP FUNCTION get_refid(varchar, varchar);
DROP FUNCTION get_typeid(varchar);

CREATE OR REPLACE FUNCTION get_typeid(_type character varying)
RETURNS uuid AS
$BODY$
DECLARE
        _tid uuid;
BEGIN
        IF _type IS NULL
        THEN
                RAISE EXCEPTION 'Argument to get_typeid may not be null.';
        ELSE
                INSERT INTO ref_types(reference_type) VALUES (_type) ON CONFLICT DO NOTHING;
                SELECT INTO _tid id FROM ref_types WHERE reference_type=_type;
        END IF;
        RETURN _tid;
EXCEPTION WHEN unique_violation THEN
        SELECT INTO _tid id FROM ref_types WHERE reference_type=_type;
        RETURN _tid;
END;
$BODY$
LANGUAGE plpgsql;